Caption Hallucination

Evaluates whether an image caption contains fabricated information not actually visible in the image.

Python

# Assumed import path for the Future AGI Python SDK; it may differ by version.
from fi.evals import Evaluator

evaluator = Evaluator()  # assumes API credentials are set via environment variables

result = evaluator.evaluate(
    eval_templates="caption_hallucination",
    inputs={
        "image": "https://www.esparklearning.com/app/uploads/2024/04/Albert-Einstein-generated-by-AI-1024x683.webp",
        "caption": "old man"
    },
    model_name="turing_flash"
)

print(result.eval_results[0].output)
print(result.eval_results[0].reason)
TypeScript

import { Evaluator } from "@future-agi/ai-evaluation";

const evaluator = new Evaluator();

const result = await evaluator.evaluate(
  "caption_hallucination",
  {
    image: "https://www.esparklearning.com/app/uploads/2024/04/Albert-Einstein-generated-by-AI-1024x683.webp",
    caption: "old man"
  },
  {
    modelName: "turing_flash",
  }
);

console.log(result);
Input

| Required Input | Type | Description |
| --- | --- | --- |
| image | string | URL or file path to the image being captioned |
| caption | string | The caption text to evaluate |
Output

| Field | Description |
| --- | --- |
| Result | Returns Passed or Failed, where Passed indicates the caption accurately represents what’s in the image without hallucination and Failed indicates the caption contains hallucinated elements |
| Reason | Provides a detailed explanation of the evaluation |
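
To act on the verdict programmatically, you can branch on the output field. This is a minimal Python sketch assuming the result object from the example above and the Passed/Failed strings described in the table:

eval_result = result.eval_results[0]  # first (and here only) evaluation result

if eval_result.output == "Failed":
    # The caption contains elements not grounded in the image.
    print(f"Hallucination detected: {eval_result.reason}")
else:
    print("Caption is grounded in the image.")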

What to Do If You Get Undesired Results

If the caption is evaluated as containing hallucinations (Failed) and you want to improve it, apply the practices below (a sketch that re-checks a revised caption follows this list):

  • Stick strictly to describing what is visibly present in the image
  • Avoid making assumptions about:
    • People’s identities (unless clearly labeled or universally recognizable)
    • The location or setting (unless clearly identifiable)
    • Time periods or dates
    • Actions occurring before or after the captured moment
    • Emotions or thoughts of subjects
    • Objects that are partially obscured or ambiguous
  • Use qualifying language (like “appears to be,” “what looks like”) when uncertain
  • Focus on concrete visual elements rather than interpretations
  • For generic descriptions, stay high-level and avoid specifics that aren’t clearly visible
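
As an illustration, the revision below applies these guidelines (qualifying language, concrete visual elements) and re-runs the same evaluation. The caption text is invented for demonstration; the call mirrors the Python example above.

# Hypothetical revised caption that sticks to visible elements and hedges
# uncertain details with qualifying language.
revised_caption = (
    "An elderly man with gray hair and a mustache, wearing what appears to be "
    "a dark jacket, looking toward the camera"
)

result = evaluator.evaluate(
    eval_templates="caption_hallucination",
    inputs={
        "image": "https://www.esparklearning.com/app/uploads/2024/04/Albert-Einstein-generated-by-AI-1024x683.webp",
        "caption": revised_caption,
    },
    model_name="turing_flash",
)

print(result.eval_results[0].output)  # expect "Passed" if the caption is grounded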

Comparing Caption Hallucination with Similar Evals

  • Is AI Generated Image: Caption Hallucination evaluates the accuracy of image descriptions, while Is AI Generated Image determines if the image itself was created by AI.
  • Detect Hallucination: Caption Hallucination specifically evaluates image descriptions, whereas Detect Hallucination evaluates factual fabrication in text content more broadly.
  • Groundedness: Caption Hallucination focuses on whether descriptions match what’s visible in images, while Groundedness ensures text responses adhere strictly to provided context without adding external information.
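
If your target is plain text rather than an image caption, the same evaluate() call pattern applies with a different template. In the sketch below, the template slug "detect_hallucination" and its input keys are assumptions inferred from the naming convention above; verify the exact names against the template reference before use.

# Sketch: evaluating text for fabrication instead of an image caption.
# The slug "detect_hallucination" and the input keys are assumed, not confirmed.
result = evaluator.evaluate(
    eval_templates="detect_hallucination",
    inputs={
        "input": "Who published general relativity, and when?",
        "output": "General relativity was published by Albert Einstein in 1915.",
    },
    model_name="turing_flash",
)

print(result.eval_results[0].output)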