Evaluation Using Interface

Input:

  • Required Inputs:
    • input: The original text content.
    • output: The summary to be evaluated.

Output:

  • Result: Returns ‘Passed’ if the summary effectively captures the key information of the original text, or ‘Failed’ if it does not.

Evaluation Using Python SDK

Click here to learn how to set up evaluation using the Python SDK.

Input:

  • Required Inputs:
    • input: string - The original text content.
    • output: string - The summary to be evaluated.
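
Since both fields are plain strings, it can help to validate them before calling the evaluator. The sketch below is one possible way to do that. `build_inputs` is a hypothetical helper, not part of the SDK; only the keys `input` and `output` come from the specification above.

```python
def build_inputs(original_text: str, summary: str) -> dict:
    """Build the inputs mapping for the eval (hypothetical helper, not an SDK function)."""
    for name, value in (("input", original_text), ("output", summary)):
        # Both required inputs are documented as strings.
        if not isinstance(value, str):
            raise TypeError(f"{name} must be a string, got {type(value).__name__}")
        # An empty or whitespace-only text gives the eval nothing to compare.
        if not value.strip():
            raise ValueError(f"{name} must not be empty")
    return {"input": original_text, "output": summary}
```

The returned dictionary can then be passed as the `inputs` argument of the evaluation call.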

Output:

  • Result: Returns a list containing ‘Passed’ if the summary effectively captures the key information, or ‘Failed’ if it doesn’t.
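  • Reason: Provides a detailed explanation of why the summary was deemed good or poor.

# `evaluator` is assumed to be an already initialized client (see the SDK setup guide referenced above).
result = evaluator.evaluate(
    eval_templates="is_good_summary",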
    inputs={
        "input": "Honey never spoils because it has low moisture content and high acidity, creating an environment that resists bacteria and microorganisms. Archaeologists have even found pots of honey in ancient Egyptian tombs that are still perfectly edible.",
        "output": "Honey doesn't spoil because its low moisture and high acidity prevent the growth of bacteria and other microbes."
    },
    model_name="turing_flash"
)

print(result.eval_results[0].metrics[0].value)
print(result.eval_results[0].reason)

Example Output:

['Passed']
The evaluation is 'Passed' because the summary effectively captures the core information from the original text.

*   The summary accurately reflects the main point about honey's resistance to spoilage due to low moisture and high acidity. The summary is clear and coherent.
*   The omission of the archaeological detail is considered minor and does not significantly impact the overall understanding. A different value is not possible because the summary maintains the essential meaning of the original text.
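
Because the result value is a list (as in `['Passed']` above), downstream code usually needs to unwrap it before branching on it. A minimal sketch, assuming the value is either a list of strings or a bare string; `is_passed` is a hypothetical helper, not an SDK function:

```python
def is_passed(value) -> bool:
    """Return True if the eval value signals 'Passed' (hypothetical helper)."""
    # The example prints a list such as ['Passed']; a bare string is accepted too.
    items = value if isinstance(value, (list, tuple)) else [value]
    # An empty list is treated as not passed.
    return len(items) > 0 and all(
        str(item).strip().lower() == "passed" for item in items
    )
```

With the call shown earlier, this could be used as `is_passed(result.eval_results[0].metrics[0].value)`.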

What to Do If You Get Undesired Results

If the summary is evaluated as ‘Failed’ and you want to improve it:

  • Ensure all key points from the original text are included
  • Maintain the core meaning and intent of the original text
  • Remove unnecessary details but keep essential information
  • Keep the summary concise while preserving important context
  • Avoid adding new information not present in the original text
  • Use clear language that accurately represents the original content
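
Two of the points above (conciseness and avoiding new information) can be roughly checked locally before re-running the evaluation. The sketch below is only a word-level heuristic and is not how the evaluator itself works. `summary_precheck` is a hypothetical helper, and words it flags may simply be paraphrases rather than added facts.

```python
import re


def summary_precheck(original: str, summary: str) -> dict:
    """Rough local checks on a summary (heuristic only, not the evaluator's logic)."""

    def tokenize(text: str) -> list:
        # Lowercase word tokens; apostrophes are kept so "doesn't" stays one token.
        return re.findall(r"[a-z0-9']+", text.lower())

    original_words = tokenize(original)
    summary_words = tokenize(summary)
    vocabulary = set(original_words)
    # Words absent from the original may indicate added information (or paraphrase).
    novel_words = sorted({w for w in summary_words if w not in vocabulary})
    # A ratio close to or above 1.0 suggests the summary is not concise.
    length_ratio = len(summary_words) / len(original_words) if original_words else 0.0
    return {"novel_words": novel_words, "length_ratio": length_ratio}
```

A long `novel_words` list or a high `length_ratio` is a prompt to re-read the summary by hand, not a verdict on its quality.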

Comparing Is Good Summary with Similar Evals

  • Summary Quality: While Is Good Summary provides a binary assessment (Passed/Failed), Summary Quality might offer more granular ratings of summary effectiveness.
  • Completeness: Is Good Summary focuses on the overall effectiveness of a summary, whereas Completeness specifically measures whether all required information is included.