Summaries play a critical role in distilling large amounts of information into concise, digestible content. Whether used in business reports, research articles, legal documents, or AI-generated content, summaries must maintain both clarity and accuracy while effectively representing the original document.

However, poorly generated summaries can lead to:

  • Loss of critical details – Important information may be omitted.
  • Misinterpretations – The summary may introduce errors that distort the original meaning.
  • Lack of relevance – Unnecessary or unrelated information may be included.

To ensure high-quality summaries, two key evaluations are conducted: Summary Quality and Summarization Accuracy.

Both evaluations work together to verify that summaries are concise, accurate, and representative of the original content, reducing the risk of misinformation or misleading conclusions.


1. Summary Quality

Evaluates whether a summary effectively captures the main points, maintains factual accuracy, and achieves an appropriate length while preserving the original meaning. Ensures key information is included while unnecessary details are excluded.

a. Using Interface

Required Inputs

  • output: The generated summary.

Optional Inputs

  • context: Additional background information (if applicable).
  • input: The original document or source content.

Config

  • Check Internet: Whether to verify information using external sources.

Output

Returns a float between 0 and 1, where higher values indicate better summary quality.

b. Using SDK

from fi.evals import EvalClient
from fi.testcases import TestCase
from fi.evals.templates import SummaryQuality

# Create the evaluation client (assumes API credentials are already
# configured, e.g. via environment variables)
evaluator = EvalClient()

summary_quality = SummaryQuality(config={"check_internet": False})

test_case = TestCase(
    output="Example output summary text",
    context="Example context text",
    input="Example input text",
)

result = evaluator.evaluate(eval_templates=[summary_quality], inputs=[test_case])
quality_score = result.eval_results[0].metrics[0].value
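
In a pipeline, the returned score can be used to gate summaries before they are published. A minimal sketch building on the snippet above; the 0.7 cutoff is an arbitrary example threshold, not a value prescribed by the eval:

# Gate on the quality score; 0.7 is an illustrative threshold,
# not one defined by the SummaryQuality eval
QUALITY_THRESHOLD = 0.7

if quality_score >= QUALITY_THRESHOLD:
    print(f"Summary accepted (score={quality_score:.2f})")
else:
    print(f"Summary flagged for review (score={quality_score:.2f})")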

2. Summarization Accuracy

Assesses how faithfully a generated summary reflects the original document. Checks for key information inclusion, factual accuracy, and conciseness while preserving the original meaning.

a. Using Interface

Required Inputs

  • document: The original text to be summarized.
  • response: The generated summary.

Config

  • model: The LLM model to use for evaluation.

Output

Returns a float between 0 and 1, where higher values indicate more accurate summarization.

b. Using SDK

from fi.evals import EvalClient
from fi.testcases import TestCase
from fi.evals.templates import SummaryAccuracy

# Create the evaluation client (assumes API credentials are already
# configured, e.g. via environment variables)
evaluator = EvalClient()

summary_eval = SummaryAccuracy(config={"model": "gpt-4o-mini"})

test_case = TestCase(
    document="Long original text...",
    response="Concise summary of the text...",
)

result = evaluator.evaluate(eval_templates=[summary_eval], inputs=[test_case])
accuracy_score = result.eval_results[0].metrics[0].value
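
Because the two evals expect different test-case fields, the simplest way to run both checks against one document is two separate evaluate calls. A minimal sketch reusing the client and templates from the examples above; the field pairings follow the input lists in the interface sections:

from fi.evals import EvalClient
from fi.testcases import TestCase
from fi.evals.templates import SummaryQuality, SummaryAccuracy

evaluator = EvalClient()  # assumes credentials are configured

document = "Long original text..."
summary = "Concise summary of the text..."

# Summary Quality scores the summary (output) against the source (input)
quality_result = evaluator.evaluate(
    eval_templates=[SummaryQuality(config={"check_internet": False})],
    inputs=[TestCase(output=summary, input=document)],
)

# Summarization Accuracy scores the response against the document
accuracy_result = evaluator.evaluate(
    eval_templates=[SummaryAccuracy(config={"model": "gpt-4o-mini"})],
    inputs=[TestCase(document=document, response=summary)],
)

quality_score = quality_result.eval_results[0].metrics[0].value
accuracy_score = accuracy_result.eval_results[0].metrics[0].value
print(f"quality={quality_score:.2f}, accuracy={accuracy_score:.2f}")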