Purpose of Numeric Difference Eval

  • It evaluates the accuracy of numerical values in model-generated outputs.
  • Unlike semantic or lexical metrics, which can overlook numeric discrepancies, NumericDiff measures numeric correctness explicitly.
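
To illustrate why a lexical metric can miss a numeric error, here is a minimal sketch using only the Python standard library. It is not part of the Future AGI SDK; the helper `first_number` and the sample sentences are hypothetical and exist only for this illustration.

```python
import re
from difflib import SequenceMatcher

expected = "The tower is 324 meters tall."
response = "The tower is 342 meters tall."

# Character-level similarity: the two strings differ only by a digit swap,
# so a lexical comparison rates them as nearly identical.
lexical_similarity = SequenceMatcher(None, expected, response).ratio()


def first_number(text: str) -> float:
    """Return the first integer or decimal found in `text` (hypothetical helper)."""
    match = re.search(r"-?\d+(?:\.\d+)?", text)
    if match is None:
        raise ValueError(f"no number found in: {text!r}")
    return float(match.group())


# Comparing the extracted values directly exposes the discrepancy.
numeric_gap = abs(first_number(response) - first_number(expected))

print(f"lexical similarity: {lexical_similarity:.2f}")
print(f"numeric gap: {numeric_gap}")
```

The lexical score stays high even though the stated height is wrong, which is the kind of gap a dedicated numeric check is meant to catch.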

Numeric Difference Eval using Future AGI’s Python SDK

Click here to learn how to set up evaluation using the Python SDK.
Input & Configuration:
| | Parameter | Type | Description |
|---|---|---|---|
| Required Inputs | response | str | Model-generated text containing the numeric prediction. |
| | expected_text | str | Ground-truth text with the expected numeric value. |
| Optional Config | normalized_result | bool | Whether to return a normalized score in [0, 1] or the absolute difference. If False, the raw absolute error is returned. Default: True. |
Output:
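| Output Field | Type | Description |
|---|---|---|
| score | float | Score between 0 and 1 if normalized_result=True, else the absolute difference. Higher is better for the normalized score; for the absolute difference, a smaller value means a closer match. |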
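
The exact scoring formula is not given here, so the following is only a sketch of how the two output modes could relate. It assumes that the first number in each text is compared and that the normalized score is one minus the relative error, clipped to [0, 1]. The functions `extract_numbers` and `numeric_difference` are hypothetical and are not SDK calls.

```python
import re


def extract_numbers(text: str) -> list[float]:
    """Return every integer or decimal in `text`, in order of appearance."""
    return [float(m) for m in re.findall(r"-?\d+(?:\.\d+)?", text)]


def numeric_difference(response: str, expected_text: str,
                       normalized_result: bool = True) -> float:
    """Hypothetical scorer: compares the first number found in each text."""
    predicted = extract_numbers(response)
    expected = extract_numbers(expected_text)
    if not predicted or not expected:
        raise ValueError("both texts must contain at least one number")

    abs_error = abs(predicted[0] - expected[0])
    if not normalized_result:
        return abs_error  # raw absolute error, lower is better

    # Assumed normalization: relative error clipped to [0, 1], then inverted
    # so that 1.0 means an exact match.
    if abs_error == 0:
        return 1.0
    scale = abs(expected[0])
    relative_error = abs_error / scale if scale > 0 else 1.0
    return 1.0 - min(1.0, relative_error)


print(numeric_difference("It is 300 meters high.", "It stands 324 meters tall."))
print(numeric_difference("It is 300 meters high.", "It stands 324 meters tall.",
                         normalized_result=False))
```

Under these assumptions, an exact match scores 1.0 in normalized mode and 0.0 in absolute mode, which is why the two modes have to be read in opposite directions.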
Example:
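# `evaluator` is the evaluator client created during SDK setup (see the setup guide referenced above).
result = evaluator.evaluate(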
    eval_templates="numeric_similarity",
    inputs={
        "expected_text": "The Eiffel Tower is a famous landmark in Paris, built in 1889 for the World's Fair. It stands 324 meters tall.",
        "response": "The Eiffel Tower, located in Paris, was built in 1889 and is 324 meters high."
    },
    model_name="turing_flash"
)

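# Score returned for the first (and only) evaluation in this call
print(result.eval_results[0].output)
# Explanation accompanying that score
print(result.eval_results[0].reason)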