Eval Definition
Aggregated Metric
Combines output of multiple evaluation metrics into a single normalised score using different aggregated methods.
Purpose of Aggregated Metric Eval
- Provides a holistic evaluation by combining the strengths of different metrics e.g., BLEU for lexical overlap, ROUGE for recall-oriented matching, and Levenshtein for edit similarity. Useful when no single metric captures all aspects of quality.
- Supports custom weighting, allowing user to prioritize different metrics based on specific use-case (e.g., prioritizing factual accuracy vs. phrasing style).
Aggregated Metric using Future AGI’s Python SDK
Click here to learn how to setup evaluation using the Python SDK.
Input & Configuration:
Parameter | Type | Description | |
---|---|---|---|
Required Inputs | response | str | Model-generated output to be evaluated. |
expected_text | str or List[str] | One or more reference texts. | |
Required Config | metrics | List[EvalTemplate] | A list of objects from evaluators class like BLEUScore() , ROUGEScore() , etc. |
metric_names | List[str] | Display names for each metric used. Must match length of metrics . | |
aggregator | str | Aggregation strategy. Options: "average" or "weighted_average" . | |
weights | List[float] | Required if aggregator="weighted_average" . Defines relative importance of each metric (should sum to 1). |
Parameter Options:
Parameter - aggregator | Description |
---|---|
"average" | Takes the mean of the normalized metric scores. |
"weighted_average" | Takes a weighted mean based on the weights . (e.g. 0.7 for BLEU, 0.3 for ROUGE) |
Output:
Output Field | Type | Description |
---|---|---|
score | float | Aggregated score between 0 and 1. |
Example:
Output:
What if Aggregated Score is Low?
- Diagnose individual metric output.
- Adjust weights as per the required use-case.
Previous
OverviewThe Knowledge Base (KB) is the foundation for grounded, context-aware synthetic data generation and accurate evaluations. It ensures that every output whether it's data generation or evaluation is informed by your uploaded content, which is semantically processed and abstracted to reflect your organization’s unique domain.
Next