Evaluation Using Interface

Input:

  • Required:
    • context: The contextual information provided to the model.
    • output: The response generated by the language model.
  • Optional:
    • input: The original query or instruction given to the model.

Output:

  • Score: A percentage score between 0 and 100.

Interpretation:

  • Higher scores: Indicate that the model effectively and extensively incorporated the provided context into its response.
  • Lower scores: Suggest that the model minimally used or ignored the provided context.

Evaluation Using Python SDK

See the SDK setup guide to learn how to set up evaluation using the Python SDK.

Input     Parameter  Type                    Description
Required  context    string or list[string]  The contextual information provided to the model.
          output     string                  The response generated by the language model.
Optional  input      string                  The original query or instruction given to the model.

Output  Type   Description
Score   float  Returns score between 0 and 1.

from fi.testcases import TestCase
from fi.evals.templates import ChunkUtilization

test_case = TestCase(
    context=[
        "Paris is the capital and largest city of France.",
        "France is a country in Western Europe.",
        "Paris is known for its art museums and fashion districts."
    ],
    output="According to the provided information, Paris is the capital city of France. It is a major European city and a global center for art, fashion, and culture.",
    input="What is the capital of France?"
)

template = ChunkUtilization()

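# `evaluator` is the evaluation client created during SDK setup (see the setup guide referenced above).
response = evaluator.evaluate(eval_templates=[template], inputs=[test_case])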

print(f"Score: {response.eval_results[0].metrics[0].value}")
print(f"Reason: {response.eval_results[0].reason}")
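
The SDK returns a score between 0 and 1, while the interface reports a percentage, so it can help to normalize the value before logging or comparing it. The helper below is a minimal sketch of that step. The function name `summarize_score` and the 0.5 cutoff are assumptions made for illustration and are not part of the SDK.

```python
def summarize_score(score: float, threshold: float = 0.5) -> str:
    """Format a 0-1 utilization score as a percentage and flag low values.

    The default threshold of 0.5 is a hypothetical cutoff chosen for
    illustration; pick a value that suits your own application.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"expected a score between 0 and 1, got {score}")
    label = "low" if score < threshold else "ok"
    return f"{score * 100:.0f}% ({label})"


# Example with hand-written scores rather than real evaluation results.
print(summarize_score(0.82))
print(summarize_score(0.3))
```

In practice the value passed in would be the metric value read from the evaluation response shown above.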


What to Do When Chunk Utilization Score is Low

  • Ensure that the context provided is relevant and sufficiently detailed for the model to use effectively.
  • Modify the input prompt to better guide the model in using the context. Clearer instructions may help the model understand how to incorporate the context into its response.
  • If the model consistently fails to use context, it may require retraining or fine-tuning with more examples that emphasize the importance of context utilization.
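
One way to act on the prompt advice above is to state explicitly that the answer should be built from the supplied passages. The sketch below shows one possible prompt layout. The function name `build_context_prompt` and the instruction wording are hypothetical, and whether this phrasing raises the score for a given model would need to be checked by re-running the evaluation.

```python
def build_context_prompt(question: str, chunks: list[str]) -> str:
    """Assemble a prompt that explicitly tells the model to rely on the context."""
    # Number each chunk so the model can refer back to specific passages.
    numbered = "\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        "Answer the question using only the context below. "
        "Draw on every relevant passage and cite passages by number.\n\n"
        f"Context:\n{numbered}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


prompt = build_context_prompt(
    "What is the capital of France?",
    [
        "Paris is the capital and largest city of France.",
        "France is a country in Western Europe.",
    ],
)
print(prompt)
```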

Differentiating Chunk Utilization from Chunk Attribution

Chunk Attribution assesses whether the model acknowledges and references the provided context at all, yielding a binary result: Pass if the context is used, or Fail if it is not. In contrast, Chunk Utilization evaluates how effectively the model incorporates that context into its response, producing a score that reflects the depth of its reliance on the information. While Attribution checks if the context was used, Utilization measures how well it was used to generate a meaningful and informed output.
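
The binary versus graded distinction can be illustrated with a toy word-overlap proxy. This is not how either evaluation is computed; the functions `chunk_overlap`, `toy_attribution`, and `toy_utilization` and the 0.5 overlap cutoff are all assumptions introduced only to show the shape of the two results.

```python
import re


def _tokens(text: str) -> set[str]:
    """Lowercase a string and return its distinct alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def chunk_overlap(chunk: str, output: str) -> float:
    """Fraction of a chunk's distinct words that also appear in the output."""
    chunk_tokens = _tokens(chunk)
    if not chunk_tokens:
        return 0.0
    return len(chunk_tokens & _tokens(output)) / len(chunk_tokens)


def toy_attribution(chunks: list[str], output: str, min_overlap: float = 0.5) -> bool:
    """Binary proxy: True if at least one chunk overlaps the output enough."""
    return any(chunk_overlap(c, output) >= min_overlap for c in chunks)


def toy_utilization(chunks: list[str], output: str) -> float:
    """Graded proxy: mean overlap across all chunks, between 0 and 1."""
    if not chunks:
        return 0.0
    return sum(chunk_overlap(c, output) for c in chunks) / len(chunks)


chunks = [
    "Paris is the capital of France.",
    "Paris is known for art museums.",
]
output = "Paris is the capital of France."

# The first chunk is fully reused and the second only partly, so the
# binary check passes while the graded score stays below 1.
print(toy_attribution(chunks, output))
print(round(toy_utilization(chunks, output), 2))
```

A word-overlap measure like this is easily inflated by common words and misses paraphrases, which is why it serves only as an illustration of the two output types.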