Evaluation Using Interface

Input:

  • Required Inputs:
    • input: The source material, context, or question.
    • output: The response to be evaluated for factual consistency.

Output:

  • Result: Returns ‘Passed’ if the output is factually consistent with the input, or ‘Failed’ if it contains inconsistencies.

Evaluation Using Python SDK

Click here to learn how to set up evaluation using the Python SDK.

Input:

  • Required Inputs:
    • input: string - The source material, context, or question.
    • output: string - The response to be evaluated for factual consistency.

Output:

  • Result: Returns a list containing ‘Passed’ if the output is factually consistent with the input, or ‘Failed’ if it contains inconsistencies.
  • Reason: Provides a detailed explanation of why the response was deemed factually consistent or inconsistent.
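
# Assumes `evaluator` has already been created as described in the Python SDK setup guide.
result = evaluator.evaluate(
    eval_templates="is_factually_consistent",
    inputs={
        "input": "Why doesn't honey go bad?",
        "output": "Honey doesn't spoil because its low moisture and high acidity prevent the growth of bacteria and other microbes."
    },
    model_name="turing_flash"
)

# The first metric value holds the Passed/Failed result; `reason` holds the explanation.
print(result.eval_results[0].metrics[0].value)
print(result.eval_results[0].reason)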

Example Output:

['Passed']
The evaluation is 'Passed' because the output accurately reflects the source material.

-   The output correctly states that honey's resistance to spoilage is due to its **low moisture** and **high acidity**, mirroring the source.
-   There are **no factual inaccuracies**, unsupported additions, or misrepresentations in the output.
-   A different value is not possible because the output is **factually consistent** with the source, with no evidence to suggest otherwise.
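
To act on the result in code, it can help to reduce the returned value to a boolean. The following is a minimal sketch, assuming the value has the list shape shown in the example output (`['Passed']`). `is_passed` is a hypothetical helper for illustration and is not part of the SDK.

```python
def is_passed(value):
    """Return True when an eval value such as ['Passed'] signals success."""
    # Assumption: the value is a list like ['Passed'] or a bare string like 'Passed'.
    labels = value if isinstance(value, (list, tuple)) else [value]
    # An empty result is treated as not passed rather than as a silent success.
    return len(labels) > 0 and all(
        str(label).strip().lower() == "passed" for label in labels
    )
```

With the example above, `is_passed(result.eval_results[0].metrics[0].value)` could then gate a downstream step, such as publishing the response only when the check succeeds.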

What to Do If You Get Undesired Results

If the content is evaluated as factually inconsistent (Failed) and you want to improve it:

  • Verify all facts against reliable sources or the provided context
  • Remove any claims or details not supported by the source material
  • Correct any inaccuracies, contradictions, or misrepresentations
  • Ensure numbers, dates, names, and specific details align with the source
  • Avoid extrapolating beyond what is explicitly stated in the source
  • Use qualifying language (like “may,” “could,” or “suggests”) when appropriate
  • Cite specific parts of the source material when providing information
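
Some of the checks above, such as aligning numbers and dates with the source, can be roughly approximated before re-running the evaluation. The sketch below is one illustrative way to do that, not part of the eval itself: `unsupported_numbers` is a hypothetical helper that flags numeric tokens appearing in the output but not in the source. It only compares literal digit strings, so it will miss paraphrased quantities such as "two thousand".

```python
import re

# Matches integers and numbers with , or . separators, e.g. 1937, 2,737, 3.14
NUMBER = re.compile(r"\d+(?:[.,]\d+)*")

def unsupported_numbers(source: str, output: str) -> list[str]:
    """Return numeric tokens found in `output` that never appear in `source`."""
    source_numbers = set(NUMBER.findall(source))
    flagged = []
    for token in NUMBER.findall(output):
        # Keep first-seen order and skip duplicates.
        if token not in source_numbers and token not in flagged:
            flagged.append(token)
    return flagged

source = "The bridge opened in 1937 and spans 2,737 meters."
output = "Opened in 1937, the bridge is 2,800 meters long."
print(unsupported_numbers(source, output))  # ['2,800']
```

Any token this flags is a candidate for correction or removal before the output is evaluated again. An empty list does not guarantee that the output is factually consistent.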

Comparing Is Factually Consistent with Similar Evals

  • Factual Accuracy: While Is Factually Consistent focuses on consistency with the provided input or context, Factual Accuracy might verify claims against broader world knowledge.
  • Groundedness: Is Factually Consistent evaluates whether output contradicts the source, while Groundedness measures how well the output is supported by the source.