Evaluation Using Interface

Input:

  • Required Inputs:
    • output: The output column generated by the model.
  • Optional Inputs:
    • context: The context column provided to the model.
    • input: The input column provided to the model.
  • Configuration Parameters:
    • criteria: Text description of the evaluation criteria (e.g., “Evaluate if the output directly answers the question in the input, considering the provided context for background information.”).
    • check_internet: Boolean - Whether to check external sources during evaluation based on the criteria.

Output:

  • Score: Percentage score between 0 and 100

Interpretation:

  • Higher scores: Indicate strong alignment between the input, output, and context according to the specified criteria.
  • Lower scores: Suggest that the output does not meet the defined criteria in relation to the input and context.

Evaluation Using Python SDK

Click here to learn how to set up evaluation using the Python SDK.


Input Type               | Parameter      | Type   | Description
Required Inputs          | output         | string | The output generated by the model.
Optional Inputs          | context        | string | The context provided to the model.
                         | input          | string | The input provided to the model.
Configuration Parameters | criteria       | string | The evaluation criteria.
                         | check_internet | bool   | Whether to check internet for evaluation based on the criteria.

Output | Type  | Description
Score  | float | Returns a score between 0 and 1, where higher values indicate better alignment based on criteria.
from fi.evals import EvalClient
from fi.testcases import TestCase
from fi.evals.templates import EvalOutput

# Create the evaluation client (supply API credentials as required by your setup,
# e.g. via environment variables)
evaluator = EvalClient()

# Configure the Eval Output template with the evaluation criteria
eval_output_eval = EvalOutput(config={
    "criteria": "Evaluate if the output directly answers the question in the input, considering the provided context for background information.",
    "check_internet": True
})

# Build a test case with the model input, its output, and the supporting context
test_case = TestCase(
    input="What is the solar system?",
    output="The solar system consists of the Sun and celestial objects bound to it",
    context=[
        "The solar system consists of the Sun and celestial objects bound to it",
        "Our solar system formed 4.6 billion years ago"
    ]
)

# Run the evaluation and read the score (a float between 0 and 1)
result = evaluator.evaluate(eval_templates=[eval_output_eval], inputs=[test_case])
eval_output_score = result.eval_results[0].metrics[0].value
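
The SDK returns the score as a float between 0 and 1, while the interface reports it as a percentage between 0 and 100. As a minimal follow-up, you can apply a threshold to the returned value; the 0.7 cutoff below is only an illustrative choice, not a recommended value.

# Interpret the 0-1 score; the 0.7 threshold is an arbitrary example value
if eval_output_score >= 0.7:
    print(f"Output aligns with the criteria (score: {eval_output_score:.2f})")
else:
    print(f"Output needs review (score: {eval_output_score:.2f})")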


What to Do When the Eval Output Evaluation Gives a Low Score

If the evaluation returns a low score, review the evaluation criteria to confirm they are clearly defined, comprehensive, and aligned with the evaluation's goals, and adjust them where they are not.

Additionally, analyze the output to identify misalignments between the input, context, and output. If discrepancies are found, refining the output or adjusting the evaluation criteria can help improve alignment.
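
As a rough sketch of this review loop, you can re-run the same test case with a revised criteria string and compare the two scores; the revised wording below is only an illustration, not a prescribed criterion.

# Re-run the evaluation with a more specific criteria string (illustrative wording)
revised_eval = EvalOutput(config={
    "criteria": (
        "Evaluate if the output answers the question in the input and "
        "uses only facts supported by the provided context."
    ),
    "check_internet": False
})

revised_result = evaluator.evaluate(eval_templates=[revised_eval], inputs=[test_case])
revised_score = revised_result.eval_results[0].metrics[0].value
print(f"Original score: {eval_output_score:.2f}, revised score: {revised_score:.2f}")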


Differentiating Eval Output from Context Adherence

Eval Output evaluation assesses the alignment between input, output, and context based on specified criteria, ensuring coherence. Context Adherence, on the other hand, checks if the output strictly stays within the given context without introducing external information.

Eval Output measures overall alignment, whereas Context Adherence focuses on maintaining contextual integrity.
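
In SDK terms, the two checks can be run side by side on the same test case. The sketch below assumes the SDK also exposes a ContextAdherence template that accepts no configuration; verify the exact class name and its parameters in your installed SDK version.

from fi.evals.templates import ContextAdherence  # assumed template name; confirm in your SDK version

# Context Adherence scores whether the output stays within the provided context,
# while the earlier Eval Output template scores alignment against the criteria
adherence_eval = ContextAdherence()

adherence_result = evaluator.evaluate(eval_templates=[adherence_eval], inputs=[test_case])
adherence_score = adherence_result.eval_results[0].metrics[0].value

print(f"Eval Output score: {eval_output_score:.2f}")
print(f"Context Adherence score: {adherence_score:.2f}")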