Click here to learn how to set up evaluation using the Python SDK.
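For orientation, here is a minimal sketch of the initialization the example below assumes; the import path, class name, and credential argument names are placeholders, so follow the linked setup guide for the actual ones.

```python
# Placeholder setup sketch: the real import path, class name, and
# credential arguments are documented in the Python SDK setup guide.
from fi.evals import Evaluator  # assumed import path

evaluator = Evaluator(
    fi_api_key="your-api-key",      # placeholder credential names
    fi_secret_key="your-secret-key",
)
```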
Input:
Required Inputs:
* input (string): The original text content.
* output (string): The summary to be evaluated.
Output:
Result: Returns a list containing 'Passed' if the summary effectively captures the key information, or 'Failed' if it doesn't.
Reason: Provides a detailed explanation of why the summary was deemed good or poor.
```python
result = evaluator.evaluate(
    eval_templates="is_good_summary",
    inputs={
        "input": "Honey never spoils because it has low moisture content and high acidity, creating an environment that resists bacteria and microorganisms. Archaeologists have even found pots of honey in ancient Egyptian tombs that are still perfectly edible.",
        "output": "Honey doesn't spoil because its low moisture and high acidity prevent the growth of bacteria and other microbes."
    },
    model_name="turing_flash"
)
print(result.eval_results[0].metrics[0].value)
print(result.eval_results[0].reason)
```
Example Output:
```
['Passed']
The evaluation is 'Passed' because the summary effectively captures the core information from the original text.
* The summary accurately reflects the main point about honey's resistance to spoilage due to low moisture and high acidity. The summary is clear and coherent.
* The omission of the archaeological detail is considered minor and does not significantly impact the overall understanding.
A different value is not possible because the summary maintains the essential meaning of the original text.
```
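Because the metric value comes back as a list, downstream code can branch on its first element directly. A minimal sketch, reusing the result object and fields from the example above:

```python
# Branch on the first metric value, which the example above shows as ['Passed']
verdict = result.eval_results[0].metrics[0].value
if verdict[0] == "Passed":
    print("Summary accepted.")
else:
    # Surface the evaluator's explanation to help revise the summary
    print(f"Summary rejected: {result.eval_results[0].reason}")
```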
* Summary Quality: While Is Good Summary provides a binary assessment (Passed/Failed), Summary Quality might offer more granular ratings of summary effectiveness.
* Completeness: Is Good Summary focuses on the overall effectiveness of a summary, whereas Completeness specifically measures whether all required information is included.
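If several of these checks are needed at once, the plural eval_templates parameter suggests a list form may be accepted. A hedged sketch, assuming the related templates use the slugs "summary_quality" and "completeness" (both the list form and those slugs are assumptions, not confirmed here):

```python
# Assumption: eval_templates accepts a list, and the related templates
# are named "summary_quality" and "completeness" (hypothetical slugs).
result = evaluator.evaluate(
    eval_templates=["is_good_summary", "summary_quality", "completeness"],
    inputs={
        "input": original_text,   # the source document (defined elsewhere)
        "output": summary_text,   # the summary under evaluation
    },
    model_name="turing_flash",
)
for eval_result in result.eval_results:
    print(eval_result.metrics[0].value, eval_result.reason)
```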