OCR Evaluation

Evaluates the quality of OCR output by verifying that the extracted JSON content faithfully represents the information in the source PDF document.

```python
from fi.evals import Evaluator  # import path assumes the Future AGI Python SDK

evaluator = Evaluator()

result = evaluator.evaluate(
    eval_templates="ocr_evaluation",
    inputs={
        "input_pdf": "path/to/document.pdf",
        "json_content": '{"name": "John Doe", "date": "2024-01-01", "amount": "$100.00"}'
    },
    model_name="turing_flash"
)

print(result.eval_results[0].output)
print(result.eval_results[0].reason)
```
```typescript
import { Evaluator, Templates } from "@future-agi/ai-evaluation";

const evaluator = new Evaluator();

const result = await evaluator.evaluate(
  "ocr_evaluation",
  {
    input_pdf: "path/to/document.pdf",
    json_content: '{"name": "John Doe", "date": "2024-01-01", "amount": "$100.00"}'
  },
  {
    modelName: "turing_flash",
  }
);

console.log(result);
```
Input

| Required Input | Type | Description |
| --- | --- | --- |
| `input_pdf` | string | The PDF document to verify against |
| `json_content` | string | The JSON content extracted by OCR, to be evaluated |
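Note that `json_content` is a JSON *string*, not a native object. If your extraction pipeline produces a dict, serialize it first; a minimal sketch (the field names here are illustrative, matching the example above):

```python
import json

# Fields produced by an OCR extraction step (illustrative values).
fields = {"name": "John Doe", "date": "2024-01-01", "amount": "$100.00"}

# Serialize to a JSON string before passing it as `json_content`.
json_content = json.dumps(fields)
print(json_content)
```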
Output

| Field | Description |
| --- | --- |
| Result | A numeric score; higher values indicate more accurate OCR extraction |
| Reason | A detailed explanation of the OCR quality assessment |
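Because the result is numeric with higher values meaning better extraction, pipelines commonly gate on a score threshold before accepting the OCR output. A minimal sketch (the `0.8` cutoff and the `passes_ocr_check` helper are illustrative, not part of the SDK):

```python
def passes_ocr_check(score: float, threshold: float = 0.8) -> bool:
    """Accept an OCR extraction only if its evaluation score meets the threshold."""
    return score >= threshold

# Example: gate on the score returned by the evaluator.
print(passes_ocr_check(0.92))  # accepted
print(passes_ocr_check(0.45))  # rejected, route to manual review
```

Tune the threshold to your tolerance for extraction errors; invoices and legal documents typically warrant a stricter cutoff than free-form notes.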

What to Do When OCR Evaluation Score is Low

If the OCR evaluation score is lower than expected:

  • Check for poor scan quality or low-resolution images in the PDF
  • Verify that the OCR tool supports the fonts and languages present in the document
  • Review the JSON structure to ensure it maps correctly to the document fields
  • Look for misinterpreted characters (e.g., 0 vs O, 1 vs l)
  • Ensure tables and multi-column layouts are being parsed correctly
  • Consider pre-processing the PDF to improve contrast and clarity before OCR
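One of the checks above, spotting classic OCR lookalike confusions such as 0/O and 1/l, can be automated with a simple heuristic: flag tokens that mix letters and digits, since those often contain a substituted character. A sketch (the confusable sets are representative, not exhaustive, and the helper name is hypothetical):

```python
import re

# Lookalike pairs OCR engines commonly swap (illustrative set).
DIGIT_FOR_LETTER = {"0", "1", "5", "8"}   # may stand in for O, l, S, B
LETTER_FOR_DIGIT = {"O", "l", "S", "B"}   # may stand in for 0, 1, 5, 8

def suspicious_tokens(text: str) -> list[str]:
    """Flag tokens mixing letters and digits with a known confusable character,
    e.g. 'J0hn' (0 for O), so a reviewer can re-check them against the PDF."""
    flagged = []
    for token in re.findall(r"\w+", text):
        has_alpha = any(c.isalpha() for c in token)
        has_digit = any(c.isdigit() for c in token)
        if has_alpha and has_digit and any(
            c in DIGIT_FOR_LETTER or c in LETTER_FOR_DIGIT for c in token
        ):
            flagged.append(token)
    return flagged

print(suspicious_tokens("J0hn Doe paid $100.00"))  # ['J0hn']
```

This will also flag legitimate alphanumeric identifiers (e.g. invoice numbers), so treat hits as candidates for review rather than confirmed errors.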

Comparing OCR Evaluation with Similar Evals

  • Ground Truth Match: While OCR Evaluation checks the accuracy of structured extraction from a PDF, Ground Truth Match compares any generated output against a known expected value.