Evaluation Using Interface

Input:

  • Required Inputs:
    • output: The output column generated by the model.
    • context: The context column provided to the model.

Output:

  • Score: Percentage score between 0 and 100

Interpretation:

  • Higher scores: Indicate that the output is more contextually consistent.
  • Lower scores: Suggest that the output is less contextually consistent.

Evaluation Using Python SDK

Click here to learn how to set up evaluation using the Python SDK.

Input:

  • Required Inputs:
    • output: string - The output column generated by the model.
    • context: string - The context column provided to the model.

Output:

  • Score: float - A score between 0 and 1.

Interpretation:

  • Higher scores: Indicate that the output is more contextually consistent.
  • Lower scores: Suggest that the output is less contextually consistent.

from fi.testcase import TestCase
from fi.evals import ContextAdherence

# `evaluator` is assumed to have been initialized beforehand, as described
# in the Python SDK setup guide linked above.

# Build a test case from the model's output and the context it was given
test_case = TestCase(
    output="The output generated by the model",
    context="The context provided to the model"
)

# Select the Context Adherence evaluation template
adherence_template = ContextAdherence()

# Run the evaluation
response = evaluator.evaluate(eval_templates=[adherence_template], inputs=[test_case])

# Extract the adherence score (a float between 0 and 1) and its explanation
adherence_result = response.eval_results[0].metrics[0].value
reason = response.eval_results[0].reason
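
Once the score is returned, it can be acted on programmatically. A minimal sketch is shown below; the 0.8 cutoff is an arbitrary assumption for illustration, not an SDK default, and should be tuned for your use case.

ADHERENCE_THRESHOLD = 0.8  # assumed cutoff, not an SDK default

if adherence_result < ADHERENCE_THRESHOLD:
    print(f"Low context adherence ({adherence_result:.2f}): {reason}")
else:
    print(f"Output adheres to context ({adherence_result:.2f})")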


What to do when Context Adherence is Low

When context adherence is low, start by identifying statements in the output that are not supported by the provided context, and check whether information is implicit or explicit to assess potential misinterpretations.

Reviewing how the context is processed can help pinpoint inconsistencies. If necessary, expand context coverage to fill in gaps, clarify ambiguous details, and add missing relevant information.

To improve adherence, implement stricter context binding, integrate fact-checking mechanisms, and enhance overall context processing.
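
As a rough starting point for surfacing unsupported statements, a simple lexical-overlap heuristic can flag output sentences whose content words rarely appear in the context. This is a hand-rolled illustration for manual triage, not the method the Context Adherence eval itself uses:

import re

def flag_unsupported_sentences(output: str, context: str, min_overlap: float = 0.5):
    """Rough heuristic: flag output sentences whose words rarely
    appear in the context. Illustrative only, not the eval's method."""
    context_words = set(re.findall(r"\w+", context.lower()))
    flagged = []
    for sentence in re.split(r"(?<=[.!?])\s+", output.strip()):
        words = re.findall(r"\w+", sentence.lower())
        if not words:
            continue
        overlap = sum(w in context_words for w in words) / len(words)
        if overlap < min_overlap:
            flagged.append((sentence, overlap))
    return flagged

# Sentences with little lexical overlap are candidates for manual review
for sentence, overlap in flag_unsupported_sentences(
    "Paris is the capital of France. It has 12 million residents.",
    "Paris is the capital and largest city of France.",
):
    print(f"{overlap:.2f}  {sentence}")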


Comparing Context Adherence with Similar Evals

  1. Context Relevance: While Context Adherence focuses on staying within context bounds, Context Relevance evaluates if the provided context is sufficient and appropriate for the query.
  2. Prompt/Instruction Adherence: Context Adherence measures factual consistency with the context, while Prompt/Instruction Adherence evaluates whether the output follows the prompt's instructions and format requirements.
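
Because these evaluations are complementary, they can be run in a single call by passing multiple templates to evaluate. The sketch below assumes the SDK exposes a ContextRelevance template importable alongside ContextAdherence (an assumption; it may also require extra inputs such as the user query, so check the template docs):

from fi.testcase import TestCase
from fi.evals import ContextAdherence, ContextRelevance  # ContextRelevance import is assumed

test_case = TestCase(
    output="The output generated by the model",
    context="The context provided to the model"
)

# `evaluator` is assumed to be initialized as in the SDK setup guide
response = evaluator.evaluate(
    eval_templates=[ContextAdherence(), ContextRelevance()],
    inputs=[test_case]
)

# One eval_result per template, in the order the templates were passed
for result in response.eval_results:
    print(result.metrics[0].value, result.reason)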