Evaluating how well a language model utilises provided context chunks is essential for ensuring accurate, relevant, and contextually appropriate responses. A model that effectively leverages context can generate more reliable and factually grounded outputs, reducing hallucinations and improving consistency. Failure to properly utilise context can lead to:
  • Incomplete responses – Missing relevant details present in the provided context.
  • Context misalignment – The model generating responses that contradict or ignore the context.
  • Inefficient information usage – Not fully leveraging the available context to improve accuracy.
To assess this, two key evaluations are used: Chunk Attribution and Chunk Utilization. These evaluations help ensure that AI-generated responses are grounded in the provided information, improving reliability and accuracy.

1. Chunk Attribution

Evaluates whether a language model references the provided context chunks in its generated response. It checks whether the model acknowledges and utilises the context at all. Click here to read the eval definition of Chunk Attribution.

a. Using Interface

Required Inputs

  • output: The generated response.
  • context: The provided contextual information.

Optional Inputs

  • input: The original query or instruction.

Output

Returns a Pass/Fail result:
  • Pass – The response references the provided context.
  • Fail – The response does not utilise the context.

b. Using SDK

Export your API key and Secret key into your environment variables.
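The call below assumes an evaluator instance already exists. The setup sketch that follows is illustrative only: the import path, class name, and environment variable names are assumptions and may differ in your SDK version, so refer to the SDK setup guide for the exact details.

import os

# Assumed environment variable names; substitute whatever your SDK expects.
os.environ["FI_API_KEY"] = "<your-api-key>"
os.environ["FI_SECRET_KEY"] = "<your-secret-key>"

# Assumed import path and class name for the evaluator client.
from fi.evals import Evaluator

evaluator = Evaluator()  # picks up the API key and Secret key from the environment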
# Run the Chunk Attribution eval on a single example.
result = evaluator.evaluate(
    eval_templates="chunk_attribution",
    inputs={
        "context": "Honey never spoils because it has low moisture content and high acidity, creating an environment that resists bacteria and microorganisms. Archaeologists have even found pots of honey in ancient Egyptian tombs that are still perfectly edible.",
        "input": "Why doesn’t honey go bad?",
        "output": "Honey doesn’t spoil because its low moisture and high acidity prevent the growth of bacteria and other microbes."
    },
    model_name="turing_flash"
)

print(result.eval_results[0].output)  # Pass/Fail verdict
print(result.eval_results[0].reason)  # explanation for the verdict

2. Chunk Utilization

Measures how effectively the model integrates the retrieved context chunks into its response. Unlike Chunk Attribution, which only checks for references, Chunk Utilization assigns a score based on how well the context contributes to a meaningful response.
  • Higher Score – The model extensively incorporates relevant chunks.
  • Lower Score – The model includes minimal or no context.
Click here to read the eval definition of Chunk Utilization.

a. Using Interface

Required Inputs

  • output: The model’s generated response.
  • context: The provided context chunks.

Optional Inputs

  • input: The original query or instruction.

Output

  • Score between 0 and 1, where higher values indicate better utilization of context.

b. Using SDK

# Run the Chunk Utilization eval, reusing the evaluator instance created above.
result = evaluator.evaluate(
    eval_templates="chunk_utilization",
    inputs={
        "context": "Honey never spoils because it has low moisture content and high acidity, creating an environment that resists bacteria and microorganisms. Archaeologists have even found pots of honey in ancient Egyptian tombs that are still perfectly edible.",
        "output": "Honey doesn’t spoil because its low moisture and high acidity prevent the growth of bacteria and other microbes."
    },
    model_name="turing_flash"
)

print(result.eval_results[0].output)  # utilization score between 0 and 1
print(result.eval_results[0].reason)  # explanation for the score
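
Because Chunk Utilization returns a continuous score rather than a Pass/Fail verdict, you may want to gate on it in your own pipeline. The snippet below is an illustrative sketch only: the 0.5 threshold is an arbitrary example value, and it assumes the score is exposed as a numeric value on the result's output field.

# Illustrative post-processing of the utilization score.
utilization_score = result.eval_results[0].output  # assumed to be a float between 0 and 1

if utilization_score < 0.5:  # arbitrary example threshold, not an SDK recommendation
    print(f"Low context utilization ({utilization_score:.2f}); consider reviewing retrieval or prompting.")
else:
    print(f"Context utilization looks healthy ({utilization_score:.2f}).")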