Evaluating how well a language model utilises provided context chunks is essential for ensuring accurate, relevant, and contextually appropriate responses. A model that effectively leverages context produces more reliable and factually grounded outputs, reducing hallucinations and improving consistency. Failure to properly utilise context can lead to:
- Incomplete responses – Missing relevant details that are present in the provided context.
- Context misalignment – Responses that contradict or ignore the context.
- Inefficient information usage – Not fully leveraging the available context to improve accuracy.
Export your API key and secret key as environment variables.
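A minimal setup sketch is shown below. The environment variable names (`FI_API_KEY`, `FI_SECRET_KEY`) and the `Evaluator` client are assumptions; confirm them against your installed SDK version.

```python
import os

from fi.evals import Evaluator  # assumed client class; confirm against your SDK version

# Assumed variable names; you can also export these from your shell instead.
os.environ["FI_API_KEY"] = "your-api-key"
os.environ["FI_SECRET_KEY"] = "your-secret-key"

# Client used by the examples below; assumed to read the keys from the environment.
evaluator = Evaluator()
```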
```python
from fi.testcases import TestCase
from fi.evals.templates import ChunkAttribution

# `evaluator` is the evals client initialized in the setup step above.
test_case = TestCase(
    input="What is the capital of France?",
    output="According to the provided information, Paris is the capital city of France. It is a major European city and a global center for art, fashion, and culture.",
    context=[
        "Paris is the capital and largest city of France.",
        "France is a country in Western Europe.",
        "Paris is known for its art museums and fashion districts."
    ]
)

template = ChunkAttribution()
response = evaluator.evaluate(eval_templates=[template], inputs=[test_case], model_name="turing_flash")

print(f"Score: {response.eval_results[0].metrics[0].value}")
print(f"Reason: {response.eval_results[0].reason}")
```
Chunk Utilization measures how effectively the model integrates the retrieved context chunks into its response. Unlike Chunk Attribution, which only checks whether chunks are referenced, Chunk Utilization assigns a score based on how much the context contributes to a meaningful response.
- Higher score – The model extensively incorporates relevant chunks.
- Lower score – The model includes minimal or no context.
```python
from fi.testcases import TestCase
from fi.evals.templates import ChunkUtilization

# `evaluator` is the evals client initialized in the setup step above.
test_case = TestCase(
    input="What is the capital of France?",
    output="According to the provided information, Paris is the capital city of France. It is a major European city and a global center for art, fashion, and culture.",
    context=[
        "Paris is the capital and largest city of France.",
        "France is a country in Western Europe.",
        "Paris is known for its art museums and fashion districts."
    ]
)

template = ChunkUtilization()
response = evaluator.evaluate(eval_templates=[template], inputs=[test_case], model_name="turing_flash")

print(f"Score: {response.eval_results[0].metrics[0].value}")
print(f"Reason: {response.eval_results[0].reason}")
```