Eval Definition
Chunk Utilization
Measures how effectively a language model leverages information from the provided context to produce a coherent and contextually appropriate output.
Evaluation Using Interface
Input:
- Required:
  - context: The contextual information provided to the model.
  - output: The response generated by the language model.
- Optional:
  - input: The original query or instruction given to the model.
Output:
- Score: A percentage score between 0 and 100.
Interpretation:
- Higher scores: Indicate that the model effectively and extensively incorporated the provided context into its response.
- Lower scores: Suggest that the model minimally used or ignored the provided context.
Evaluation Using Python SDK
Click here to learn how to set up evaluation using the Python SDK.
| Input | Parameter | Type | Description |
|---|---|---|---|
| Required | context | string or list[string] | The contextual information provided to the model. |
| | output | string | The response generated by the language model. |
| Optional | input | string | The original query or instruction given to the model. |
| Output | Type | Description |
|---|---|---|
| Score | float | Returns a score between 0 and 1. |
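The parameters listed above can be assembled and checked before they are handed to the evaluator. The following is a minimal sketch, not the SDK's own API: `build_eval_inputs` and `score_to_percent` are hypothetical helper names. The conversion also assumes that the SDK's 0 to 1 score and the interface's 0 to 100 percentage are the same measure on different scales.

```python
from typing import Optional, Sequence, Union


def build_eval_inputs(
    context: Union[str, Sequence[str]],
    output: str,
    input: Optional[str] = None,  # named to match the documented parameter
) -> dict:
    """Hypothetical helper: collect the documented fields into one dictionary."""
    # The context may be a single string or a list of strings; normalize to a list.
    chunks = [context] if isinstance(context, str) else list(context)
    if not chunks or not all(isinstance(c, str) and c.strip() for c in chunks):
        raise ValueError("context must be a non-empty string or a list of non-empty strings")
    if not isinstance(output, str) or not output.strip():
        raise ValueError("output must be a non-empty string")

    payload = {"context": chunks, "output": output}
    if input is not None:  # optional field, included only when supplied
        payload["input"] = input
    return payload


def score_to_percent(score: float) -> float:
    """Assumed mapping from the SDK's 0-1 score to the interface's 0-100 scale."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    return round(score * 100, 2)
```

Normalizing `context` to a list up front means downstream code only has to handle one shape, whichever form the caller passed.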
What to Do When Chunk Utilization Score is Low
- Ensure that the context provided is relevant and sufficiently detailed for the model to utilize effectively.
- Modify the input prompt to better guide the model in using the context. Clearer instructions may help the model understand how to incorporate the context into its response.
- If the model consistently fails to use context, it may require retraining or fine-tuning with more examples that emphasize the importance of context utilization.
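One way to act on the prompt-related suggestion above is to make the instruction to use the context explicit and to number the chunks so the model can tell them apart. This is only an illustrative sketch: `build_grounded_prompt` is a hypothetical helper, and the wording of the instruction is an assumption that should be tuned for the model in use.

```python
from typing import Sequence


def build_grounded_prompt(question: str, chunks: Sequence[str]) -> str:
    """Hypothetical helper: build a prompt that directs the model to rely on the context."""
    # Number each chunk so every passage is clearly delimited in the prompt.
    numbered = "\n".join(
        f"[{i}] {chunk.strip()}" for i, chunk in enumerate(chunks, start=1)
    )
    return (
        "Answer the question using only the context below.\n"
        "Draw on every relevant passage, and say so if the context "
        "does not contain the answer.\n\n"
        f"Context:\n{numbered}\n\n"
        f"Question: {question.strip()}\n"
        "Answer:"
    )
```

The returned string would be passed as the `input` to the model, with the same chunks supplied as `context` when the response is evaluated.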
Differentiating Chunk Utilization from Chunk Attribution
Chunk Attribution assesses whether the model acknowledges and references the provided context at all, yielding a binary result: Pass if the context is used, or Fail if it is not. In contrast, Chunk Utilization evaluates how effectively the model incorporates that context into its response, producing a score that reflects the depth of its reliance on the information. While Attribution checks if the context was used, Utilization measures how well it was used to generate a meaningful and informed output.
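The contrast between a binary result and a graded score can be illustrated with a toy lexical-overlap proxy. This is not how either evaluation is actually computed. The functions `toy_attribution` and `toy_utilization` and the stopword list are assumptions made purely for illustration.

```python
import re
from typing import Sequence

# Small illustrative stopword list (an assumption, not part of any evaluator).
_STOPWORDS = {"the", "a", "an", "of", "is", "in", "and", "to", "it", "was", "by"}


def _content_words(text: str) -> set:
    """Lowercased alphanumeric tokens with stopwords removed."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in _STOPWORDS}


def toy_attribution(chunks: Sequence[str], output: str) -> bool:
    """Binary proxy: True if the output shares any content word with any chunk."""
    out = _content_words(output)
    return any(_content_words(chunk) & out for chunk in chunks)


def toy_utilization(chunks: Sequence[str], output: str) -> float:
    """Graded proxy: fraction of the context's content words that reappear in the output."""
    ctx = set().union(*(_content_words(chunk) for chunk in chunks))
    if not ctx:
        return 0.0
    return len(ctx & _content_words(output)) / len(ctx)


chunks = ["The Eiffel Tower is in Paris.", "It was completed in 1889."]
partial = "The Eiffel Tower is in Paris."
full = "The Eiffel Tower in Paris was completed in 1889."

print(toy_attribution(chunks, partial), toy_utilization(chunks, partial))  # True 0.6
print(toy_attribution(chunks, full), toy_utilization(chunks, full))        # True 1.0
```

Both responses pass the binary check, but only the second draws on everything the context offers, which is the distinction the graded score is meant to capture.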