Eval Output

Evaluation Using Interface

Input:

Required Inputs:
- output: The output column generated by the model.
Optional Inputs:
- context: The context column provided to the model.
- input: The input column provided to the model.
Configuration Parameters:
- criteria: Text description of the evaluation criteria (e.g., “Evaluate if the output directly answers the question in the input, considering the provided context for background information.”).
- check_internet: Boolean - Whether to check external sources during evaluation based on the criteria.

Output:

Score: Percentage score between 0 and 100

Interpretation:

Higher scores: Indicate strong alignment between the input, output, and context according to the specified criteria.
Lower scores: Suggest that the output does not meet the defined criteria in relation to the input and context.

Evaluation Using Python SDK

To set up evaluation with the Python SDK, see the SDK setup guide.

| Input Type | Parameter | Type | Description |
|---|---|---|---|
| Required Inputs | `output` | `string` | The output generated by the model. |
| Optional Inputs | `context` | `string` | The context provided to the model. |
| Optional Inputs | `input` | `string` | The input provided to the model. |
| Configuration Parameters | `criteria` | `string` | The evaluation criteria. |
| Configuration Parameters | `check_internet` | `bool` | Whether to check the internet during evaluation, based on the criteria. |

| Output | Type | Description |
|---|---|---|
| `Score` | `float` | A score between 0 and 1, where higher values indicate better alignment with the criteria. |

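from fi.evals import Evaluator
from fi.testcases import TestCase
from fi.evals.templates import EvalOutput

# Create the evaluator client. The keyword names are assumed from the SDK
# setup guide; replace the placeholder credentials with your own.
evaluator = Evaluator(
    fi_api_key="your_api_key",
    fi_secret_key="your_secret_key",
)

# Configure the Eval Output template with the criteria to judge against.
eval_output_eval = EvalOutput(config={
    "criteria": "Evaluate if the output directly answers the question in the input, considering the provided context for background information.",
    "check_internet": True
})

# Build the test case: the model input, its output, and the supporting context.
test_case = TestCase(
    input="What is the solar system?",
    output="The solar system consists of the Sun and celestial objects bound to it",
    context=[
        "The solar system consists of the Sun and celestial objects bound to it",
        "Our solar system formed 4.6 billion years ago"
    ]
)

# Run the evaluation and read the score from the first metric of the first result.
result = evaluator.evaluate(eval_templates=[eval_output_eval], inputs=[test_case], model_name="turing_flash")
eval_output_score = result.eval_results[0].metrics[0].value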
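
The interface reports the score as a percentage, while the SDK returns a float between 0 and 1. As a minimal sketch, assuming the two scales differ only by a factor of 100 (the source does not state this explicitly), the helpers below convert an SDK score for display and label it against a threshold. The function names and the 0.5 threshold are hypothetical and not part of the SDK.

```python
def to_percentage(score: float) -> float:
    """Convert a 0-1 score to a 0-100 percentage (assumed linear mapping)."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be between 0 and 1, got {score}")
    return round(score * 100.0, 2)


def label_score(score: float, threshold: float = 0.5) -> str:
    """Label a score 'high' or 'low' against an illustrative threshold."""
    return "high" if score >= threshold else "low"


# Stand-in value for eval_output_score from the example above.
example_score = 0.82
print(f"{to_percentage(example_score)}% ({label_score(example_score)})")
```

Pick the threshold to suit your own criteria; nothing in the source prescribes a particular cut-off.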

What to Do When Eval Output Evaluation Gives a Low Score

If the evaluation returns a low score, start with a criteria review: check that the evaluation criteria are clearly defined and aligned with the evaluation’s goals, and adjust them if they are incomplete or irrelevant. Then perform an output analysis to identify misalignments between the input, context, and output. If you find discrepancies, refine the output or adjust the evaluation criteria to improve alignment.
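
One way to structure the criteria review is to re-score the same test cases under the original and a revised criteria string and see which scores move. The sketch below assumes a hypothetical `score_fn(criteria, case)` callable that wraps whatever evaluation call you use and returns a score between 0 and 1. Neither `score_fn` nor `compare_criteria` is part of the SDK, and the stub scorer exists only to make the example runnable.

```python
from typing import Callable, Dict, List

# Hypothetical scorer signature: (criteria, test case) -> score in [0, 1].
ScoreFn = Callable[[str, Dict[str, str]], float]


def compare_criteria(
    score_fn: ScoreFn,
    cases: List[Dict[str, str]],
    old_criteria: str,
    new_criteria: str,
    threshold: float = 0.5,
) -> List[Dict[str, object]]:
    """Score each case under both criteria and flag cases that stay low."""
    report = []
    for case in cases:
        old_score = score_fn(old_criteria, case)
        new_score = score_fn(new_criteria, case)
        report.append({
            "input": case["input"],
            "old_score": old_score,
            "new_score": new_score,
            "still_low": new_score < threshold,
        })
    return report


def stub_score_fn(criteria: str, case: Dict[str, str]) -> float:
    """Toy scorer: criteria that mention 'context' fail cases without context."""
    if "context" in criteria and not case.get("context"):
        return 0.0
    return 1.0


cases = [
    {"input": "What is the solar system?", "output": "The Sun and bound objects.", "context": ""},
]
for row in compare_criteria(
    stub_score_fn, cases,
    old_criteria="Answer using the context.",
    new_criteria="Answer the question directly.",
):
    print(row)
```

Cases whose score rises under the revised criteria suggest the criteria were the problem; cases flagged `still_low` are candidates for output analysis instead.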

Differentiating Eval Output from Context Adherence

Eval Output evaluation assesses the alignment between input, output, and context based on specified criteria, ensuring coherence. Context Adherence, on the other hand, checks if the output strictly stays within the given context without introducing external information. Eval Output measures overall alignment, whereas Context Adherence focuses on maintaining contextual integrity.
