Evaluation Using Interface

Input:

  • Required Inputs:
    • output: The generated response column from the model.
    • input: The user-provided input column to the model (acting as the source context).
  • Configuration Parameters:
    • None specified for this evaluation.

Output:

  • Score: Percentage score between 0 and 100

Interpretation:

  • Higher scores: Indicate that the output is well-grounded in the input.
  • Lower scores: Suggest that the output includes information not present in or supported by the input.

Evaluation Using Python SDK

Click here to learn how to set up evaluation using the Python SDK.


Input:

  • Required Inputs:
    • output (string): The generated response from the model.
    • input (string): The user-provided input to the model (acting as the source context).

Output:

  • Score (float): Returns a score between 0 and 1, where higher values indicate better grounding in the input.
from fi.evals import Evaluator
from fi.testcases import TestCase
from fi.evals.templates import Groundedness

# Instantiate the evaluation client with your API credentials configured
evaluator = Evaluator()

groundedness_eval = Groundedness()

test_case = TestCase(
    input="The Earth orbits around the Sun in how many days?",
    output="The Earth completes one orbit around the Sun every 365.25 days"
)

# Run the Groundedness template against the test case
result = evaluator.evaluate(eval_templates=[groundedness_eval], inputs=[test_case])

# Extract the 0-1 groundedness score from the first result
groundedness_score = result.eval_results[0].metrics[0].value
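The SDK returns the score as a float between 0 and 1, while the interface displays a percentage between 0 and 100. A minimal sketch of converting and thresholding the score — the 0.7 cutoff is an illustrative assumption, not an SDK default:

```python
def to_percentage(score: float) -> float:
    """Convert the SDK's 0-1 groundedness score to the interface's 0-100 scale."""
    return round(score * 100, 1)

def is_grounded(score: float, threshold: float = 0.7) -> bool:
    """Treat a response as grounded when its score clears a chosen cutoff.
    The 0.7 threshold here is an assumption for illustration only."""
    return score >= threshold

print(to_percentage(0.85))  # 85.0
print(is_grounded(0.85))    # True
print(is_grounded(0.4))     # False
```

Pick the threshold based on how costly unsupported claims are in your application; stricter use cases warrant a higher cutoff.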


What to do when Groundedness Evaluation Fails

If the evaluation fails, take two steps. First, Context Review: reassess the provided context for completeness and clarity, ensuring it contains all the information needed to support the response. Second, Response Analysis: examine the response for any claims not supported by the context, and revise it to align with the given information.
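For the Response Analysis step, a rough lexical heuristic can help triage which sentences to inspect first. This is a naive sketch based on word overlap, not the logic the evaluation itself uses — sentences whose content words rarely appear in the context are flagged for review:

```python
import string

def unsupported_sentences(context: str, response: str, min_overlap: float = 0.3):
    """Flag response sentences whose content words rarely appear in the context.
    A naive lexical triage aid, not the SDK's grounding logic; min_overlap is
    an illustrative assumption."""
    ctx_words = {w.strip(string.punctuation) for w in context.lower().split()}
    flagged = []
    for sentence in response.split("."):
        words = [w.strip(string.punctuation) for w in sentence.lower().split()]
        words = [w for w in words if len(w) > 3]  # keep content-bearing words
        if not words:
            continue
        overlap = sum(w in ctx_words for w in words) / len(words)
        if overlap < min_overlap:
            flagged.append(sentence.strip())
    return flagged

context = "The Earth completes one orbit around the Sun every 365.25 days."
response = "The Earth orbits around the Sun every year. Mars has two small moons."
print(unsupported_sentences(context, response))  # ['Mars has two small moons']
```

Anything this heuristic flags is worth checking against the context by hand; low lexical overlap does not prove a sentence is ungrounded, only that it deserves a closer look.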


Differentiating Groundedness from Context Adherence

While both evaluations assess context alignment, Groundedness checks whether every claim in the response is strictly supported by the provided context, whereas Context Adherence measures how well the response stays within the context without introducing external information. Both evaluations take a response and its context as inputs and judge the response by how faithfully it adheres to the provided information.