Purpose of Aggregated Metric Eval

  • Provides a holistic evaluation by combining the strengths of different metrics, e.g., BLEU for lexical overlap, ROUGE for recall-oriented matching, and Levenshtein for edit similarity. Useful when no single metric captures every aspect of quality.
  • Supports custom weighting, allowing users to prioritize metrics according to their specific use case (e.g., prioritizing factual accuracy over phrasing style).

Aggregated Metric using Future AGI’s Python SDK

Click here to learn how to set up evaluation using the Python SDK.

Input & Configuration:

  Required Inputs:
  • response (str): Model-generated output to be evaluated.
  • expected_text (str or List[str]): One or more reference texts.

  Required Config:
  • metrics (List[EvalTemplate]): A list of metric objects from the evaluator classes, such as BLEUScore(), ROUGEScore(), etc.
  • metric_names (List[str]): Display names for each metric used; must match the length of metrics.
  • aggregator (str): Aggregation strategy. Options: "average" or "weighted_average".
  • weights (List[float]): Required if aggregator="weighted_average". Defines the relative importance of each metric; weights should sum to 1.
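
For orientation, here is a minimal configuration sketch that ties the parameters above together. The sentences and the 0.6/0.4 weights are illustrative placeholders; the metric classes are the same ones used in the full example below.

from fi.evals.metrics import BLEUScore, ROUGEScore, AggregatedMetric
from fi.testcases import TestCase

# Required inputs live on the test case; expected_text may be a str or a List[str].
test_case = TestCase(
    response="Paris is the capital of France.",
    expected_text=["Paris is the capital of France.", "The capital of France is Paris."]
)

# Required config is passed to AggregatedMetric.
metric = AggregatedMetric(config={
    "metrics": [BLEUScore(), ROUGEScore(config={"rouge_type": "rouge1"})],
    "metric_names": ["bleu", "rouge1"],    # must match the length of metrics
    "aggregator": "weighted_average",      # "average" or "weighted_average"
    "weights": [0.6, 0.4]                  # only needed for weighted_average; should sum to 1
})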

Parameter Options:

  aggregator:
  • "average": Takes the mean of the normalized metric scores.
  • "weighted_average": Takes a weighted mean of the normalized metric scores using the supplied weights (e.g., 0.7 for BLEU, 0.3 for ROUGE).

Output:

  • score (float): Aggregated score between 0 and 1.

Example:

from fi.evals.metrics import BLEUScore, ROUGEScore, LevenshteinDistance, AggregatedMetric
from fi.testcases import TestCase

# Test input
test_case = TestCase(
    response="The quick brown fox jumps over the lazy dog.",
    expected_text="quick brown fox jumps over the lazy dog."
)

# Instantiate metrics
bleu = BLEUScore()
rouge = ROUGEScore(config={"rouge_type": "rouge1"})
levenshtein = LevenshteinDistance()

# 1. Simple average
avg_metric = AggregatedMetric(config={
    "metrics": [bleu, rouge],
    "metric_names": ["bleu", "rouge1"],
    "aggregator": "average"
})

# 2. Weighted average (70% BLEU, 30% ROUGE)
weighted_metric = AggregatedMetric(config={
    "metrics": [bleu, rouge],
    "metric_names": ["bleu", "rouge1"],
    "aggregator": "weighted_average",
    "weights": [0.7, 0.3]
})

# 3. Average with BLEU, ROUGE, Levenshtein
combined_metric = AggregatedMetric(config={
    "metrics": [bleu, rouge, levenshtein],
    "metric_names": ["bleu", "rouge1", "levenshtein"],
    "aggregator": "average"
})

# Run evaluation
for label, metric in {
    "BLEU + ROUGE (Average)": avg_metric,
    "BLEU + ROUGE (Weighted)": weighted_metric,
    "BLEU + ROUGE + Levenshtein (Average)": combined_metric
}.items():
    result = metric.evaluate([test_case])
    score = result.eval_results[0].metrics[0].value
    print(f"\n{label}")
    print(f"Aggregated Score: {score:.4f}")

Output:

BLEU + ROUGE (Average)
Aggregated Score: 0.8761

BLEU + ROUGE (Weighted)
Aggregated Score: 0.8710

BLEU + ROUGE + Levenshtein (Average)
Aggregated Score: 0.6144

What if Aggregated Score is Low?

  • Inspect each metric's individual score to identify which one is dragging the aggregate down (see the sketch below).
  • Adjust the weights to better reflect the priorities of your use case.
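
To diagnose, each underlying metric can be run on its own, reusing the metrics and test case from the example above (a sketch that assumes the individual metric classes expose the same evaluate() interface as AggregatedMetric):

# Evaluate each metric individually to see which one pulls the aggregate down
for name, metric in {"bleu": bleu, "rouge1": rouge, "levenshtein": levenshtein}.items():
    result = metric.evaluate([test_case])
    print(f"{name}: {result.eval_results[0].metrics[0].value:.4f}")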