Purpose of Numeric Difference Eval

  • It evaluates the accuracy of numerical values in model-generated outputs.
  • Unlike semantic or lexical metrics, which can overlook numeric discrepancies, NumericDiff measures numeric correctness explicitly.
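
The SDK's extraction and normalization details are not spelled out here, but the example output below is consistent with comparing the first number found in each text and dividing the absolute error by the expected value. A minimal sketch under those assumptions (numeric_diff is a hypothetical helper, not part of the SDK):

import re

def numeric_diff(response: str, expected_text: str, normalized_result: bool = True) -> float:
    # Assumption: compare the first numeric token found in each string.
    pred = float(re.search(r"-?\d+(?:\.\d+)?", response).group())
    target = float(re.search(r"-?\d+(?:\.\d+)?", expected_text).group())
    diff = abs(pred - target)
    # Assumption: normalize by the expected magnitude and clamp to [0, 1].
    return min(diff / abs(target), 1.0) if normalized_result else diff

print(numeric_diff("The result is 98.5", "result is 100"))         # 0.015
print(numeric_diff("The result is 98.5", "result is 100", False))  # 1.5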

Numeric Difference Eval using Future AGI’s Python SDK

Click here to learn how to set up evaluation using the Python SDK.

Input & Configuration:

Required Inputs:

  • response (str): Model-generated text containing the numeric prediction.
  • expected_text (str): Ground-truth text with the expected numeric value.

Optional Config:

  • normalized_result (bool): Whether to return a normalized score in [0, 1] or the absolute difference. If False, the raw absolute error is returned. Default: True.

Output:

  • score (float): Normalized difference in [0, 1] if normalized_result=True, otherwise the absolute difference. In both cases, lower values indicate closer numeric agreement.
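
In practice the score usually feeds a pass/fail gate. A sketch, where the 5% tolerance is an illustrative choice rather than an SDK default:

def within_tolerance(score: float, tolerance: float = 0.05) -> bool:
    # Lower normalized difference means closer agreement, so a test case
    # passes when its score is at or below the tolerance.
    return score <= tolerance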

Example:

from fi.evals.metrics import NumericDiff
from fi.testcases import TestCase

# Both texts contain one number each; the eval compares 98.5 against 100.
test_case = TestCase(
    response="The result is 98.5",
    expected_text="result is 100"
)

# Normalized mode: score is the relative difference, in [0, 1].
normalized_numeric_diff = NumericDiff(config={
    "normalized_result": True
})

result = normalized_numeric_diff.evaluate([test_case])
print(f"Normalized Difference: {result.eval_results[0].metrics[0].value:.4f}")

# Absolute mode: score is the raw absolute error.
absolute_numeric_diff = NumericDiff(config={
    "normalized_result": False
})

result = absolute_numeric_diff.evaluate([test_case])
print(f"Absolute Difference: {result.eval_results[0].metrics[0].value:.4f}")

Output:

Normalized Difference: 0.0150
Absolute Difference: 1.5000
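
These figures are consistent with the texts above: the extracted values are 98.5 and 100, so the absolute difference is |98.5 - 100| = 1.5 and the normalized score is 1.5 / 100 = 0.015.

Since evaluate accepts a list, several test cases can be scored in one call. A sketch that assumes eval_results aligns one-to-one with the input list (the test data here is made up for illustration):

test_cases = [
    TestCase(response="Total cost was 250 dollars", expected_text="expected cost of 240 dollars"),
    TestCase(response="Accuracy reached 0.91", expected_text="target accuracy 0.95"),
]

results = normalized_numeric_diff.evaluate(test_cases)
for eval_result in results.eval_results:
    print(f"Normalized Difference: {eval_result.metrics[0].value:.4f}")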