Evaluation Using Interface

Input:

  • Optional Inputs:
    • input: The column containing the input provided to the LLM that triggers the function call.
    • output: The column containing the resulting function call or response generated by the LLM.
    • context: The contextual information provided to the model.

Configuration Parameters:

  • Criteria: Description of the criteria used for evaluation.

Output:

  • Score: Percentage score between 0 and 100

Interpretation:

  • Higher scores: Indicate that the retrieved context is relevant and well-suited to the task.
  • Lower scores: Indicate that the context is not relevant or sufficient to produce an accurate and coherent output.

Evaluation Using Python SDK

Click here to learn how to set up evaluation using the Python SDK.


Input:

  • Optional Inputs:
    • input (string): The input provided to the LLM that triggers the function call.
    • output (string): The resulting function call or response generated by the LLM.
    • context (string or list[string]): The contextual information provided to the model.

Configuration Parameters:

  • criteria (string): Description of the criteria used for evaluation.

Output:

  • Score (float): Returns a score between 0 and 1.

from fi.evals import EvalClient
from fi.testcases import TestCase
from fi.evals.templates import ContextRetrieval

evaluator = EvalClient()  # configure with your API credentials as required

retrieval_eval = ContextRetrieval(config={
    "criteria": "Return quality of output based on relevance to the input and context"
})

test_case = TestCase(
    input="What are black holes?",
    output="Black holes are regions of spacetime where gravity is so strong that nothing can escape.",
    context="Black holes are cosmic objects with extremely strong gravitational fields"
)

result = evaluator.evaluate(eval_templates=[retrieval_eval], inputs=[test_case])
retrieval_score = result.eval_results[0].metrics[0].value
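
The SDK returns the score as a float between 0 and 1, while the interface displays it as a percentage between 0 and 100. A minimal follow-up using the retrieval_score variable from the example above (the 0.6 cut-off is purely illustrative):

# Convert the 0-1 SDK score to the percentage shown in the interface.
print(f"Context retrieval quality: {retrieval_score * 100:.0f}%")

# Illustrative cut-off; choose a threshold that suits your application.
if retrieval_score < 0.6:
    print("Low retrieval quality - consider refining the criteria or context.")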


What to do if Eval Context Retrieval Quality is Low

If the evaluation returns a low score, review the criteria to make sure they are well-defined, relevant, and aligned with the evaluation's objectives, and adjust them where needed for clarity and comprehensiveness. Also analyse the context for relevance and sufficiency, identify any gaps or inadequacies, and refine it so that it better supports the output.
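
For example, a vague criteria string can be replaced with a more explicit one and the evaluation re-run on the same data. The sketch below reuses the evaluator and test_case objects from the example above; the criteria wording is only illustrative and should be adapted to your use case:

from fi.evals.templates import ContextRetrieval

# A more explicit criteria description; adjust the wording as needed.
refined_eval = ContextRetrieval(config={
    "criteria": (
        "Score how relevant and sufficient the provided context is for answering "
        "the input, and whether the output can be produced from that context alone"
    )
})

result = evaluator.evaluate(eval_templates=[refined_eval], inputs=[test_case])
print(result.eval_results[0].metrics[0].value)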


Differentiating Eval Context Retrieval Quality from Context Adherence

Eval Context Retrieval Quality and Context Adherence serve different purposes. Eval Context Retrieval Quality assesses the overall quality and relevance of the retrieved context, ensuring it is sufficient and appropriate for generating a response. In contrast, Context Adherence focuses on whether the response strictly adheres to the provided context, preventing the introduction of external information.
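
To see the difference in practice, both evaluations can be run side by side on the same test case. The sketch below assumes the SDK exposes a ContextAdherence template in fi.evals.templates and that it accepts a config dictionary in the same way as ContextRetrieval; both the class name and the config shape are assumptions, so check your installed version. It reuses the evaluator and test_case objects from the earlier example:

# ContextAdherence is assumed to exist alongside ContextRetrieval; verify in your SDK version.
from fi.evals.templates import ContextAdherence, ContextRetrieval

retrieval_eval = ContextRetrieval(config={
    "criteria": "Return quality of output based on relevance to the input and context"
})
adherence_eval = ContextAdherence(config={
    "criteria": "Check that the output only uses information present in the provided context"
})

# Retrieval quality scores the context itself; adherence scores whether the
# output stays within that context. One result per template is assumed here.
result = evaluator.evaluate(
    eval_templates=[retrieval_eval, adherence_eval],
    inputs=[test_case]
)
for eval_result in result.eval_results:
    print(eval_result.metrics[0].value)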