OCR Evaluation

Evaluates the quality of OCR output by verifying that the extracted JSON content faithfully represents the information in the source PDF document.

```python
from fi.evals import Evaluator  # import path assumes the Future AGI Python SDK

evaluator = Evaluator()

result = evaluator.evaluate(
    eval_templates="ocr_evaluation",
    inputs={
        "input_pdf": "path/to/document.pdf",
        "json_content": '{"name": "John Doe", "date": "2024-01-01", "amount": "$100.00"}'
    },
    model_name="turing_flash"
)

print(result.eval_results[0].output)
print(result.eval_results[0].reason)
```
```typescript
import { Evaluator, Templates } from "@future-agi/ai-evaluation";

const evaluator = new Evaluator();

const result = await evaluator.evaluate(
  "ocr_evaluation",
  {
    input_pdf: "path/to/document.pdf",
    json_content: '{"name": "John Doe", "date": "2024-01-01", "amount": "$100.00"}'
  },
  {
    modelName: "turing_flash",
  }
);

console.log(result);
```
Input

| Required Input | Type | Description |
| --- | --- | --- |
| `input_pdf` | string | The PDF document to verify against |
| `json_content` | string | The JSON content extracted by OCR, to be evaluated |
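Note that `json_content` is a JSON *string*, not a native object. If your extraction pipeline produces a dict, serialize it first; a minimal sketch (the field names here are illustrative, matching the example above):

```python
import json

# Fields produced by an OCR extraction step (illustrative values).
fields = {"name": "John Doe", "date": "2024-01-01", "amount": "$100.00"}

# Serialize to a JSON string before passing it as `json_content`.
json_content = json.dumps(fields)
print(json_content)
```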
Output

| Field | Description |
| --- | --- |
| Result | A numeric score; higher values indicate more accurate OCR extraction |
| Reason | A detailed explanation of the OCR quality assessment |
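Because the result is numeric with higher values meaning better extraction, pipelines commonly gate on a score threshold before accepting the OCR output. A minimal sketch (the `0.8` cutoff and the `passes_ocr_check` helper are illustrative, not part of the SDK):

```python
def passes_ocr_check(score: float, threshold: float = 0.8) -> bool:
    """Accept an OCR extraction only if its evaluation score meets the threshold."""
    return score >= threshold

# Example: gate on the score returned by the evaluator.
print(passes_ocr_check(0.92))  # accepted
print(passes_ocr_check(0.45))  # rejected, route to manual review
```

Tune the threshold to your tolerance for extraction errors; invoices and legal documents typically warrant a stricter cutoff than free-form notes.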

What to Do When OCR Evaluation Score is Low

If the OCR evaluation score is lower than expected:

  • Check for poor scan quality or low-resolution images in the PDF
  • Verify that the OCR tool supports the fonts and languages present in the document
  • Review the JSON structure to ensure it maps correctly to the document fields
  • Look for misinterpreted characters (e.g., 0 vs O, 1 vs l)
  • Ensure tables and multi-column layouts are being parsed correctly
  • Consider pre-processing the PDF to improve contrast and clarity before OCR
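One of the checks above, spotting classic OCR lookalike confusions such as 0/O and 1/l, can be automated with a simple heuristic: flag tokens that mix letters and digits, since those often contain a substituted character. A sketch (the confusable sets are representative, not exhaustive, and the helper name is hypothetical):

```python
import re

# Lookalike pairs OCR engines commonly swap (illustrative set).
DIGIT_FOR_LETTER = {"0", "1", "5", "8"}   # may stand in for O, l, S, B
LETTER_FOR_DIGIT = {"O", "l", "S", "B"}   # may stand in for 0, 1, 5, 8

def suspicious_tokens(text: str) -> list[str]:
    """Flag tokens mixing letters and digits with a known confusable character,
    e.g. 'J0hn' (0 for O), so a reviewer can re-check them against the PDF."""
    flagged = []
    for token in re.findall(r"\w+", text):
        has_alpha = any(c.isalpha() for c in token)
        has_digit = any(c.isdigit() for c in token)
        if has_alpha and has_digit and any(
            c in DIGIT_FOR_LETTER or c in LETTER_FOR_DIGIT for c in token
        ):
            flagged.append(token)
    return flagged

print(suspicious_tokens("J0hn Doe paid $100.00"))  # ['J0hn']
```

This will also flag legitimate alphanumeric identifiers (e.g. invoice numbers), so treat hits as candidates for review rather than confirmed errors.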

Comparing OCR Evaluation with Similar Evals

  • Ground Truth Match: While OCR Evaluation checks the accuracy of structured extraction from a PDF, Ground Truth Match compares any generated output against a known expected value.