Input

| Required Input | Type | Description |
| --- | --- | --- |
| image | string | URL or file path to the image being captioned |
| caption | string | The caption text to evaluate |
Output

| Field | Description |
| --- | --- |
| Result | Returns Passed or Failed, where Passed indicates the caption accurately represents what’s in the image without hallucination and Failed indicates the caption contains hallucinated elements |
| Reason | Provides a detailed explanation of the evaluation |
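The tables above describe the fields the eval consumes and returns. The sketch below illustrates those shapes with hypothetical example values; the evaluator itself is invoked through the platform, so no specific SDK call is shown here.

```python
# Hypothetical example values only; field names follow the tables above.

inputs = {
    "image": "https://example.com/photos/beach.jpg",  # URL or file path to the image being captioned
    "caption": "A person stands next to a dog on a sandy beach.",  # caption text to evaluate
}

# Illustrative output for a caption with no hallucinated elements
output = {
    "Result": "Passed",  # "Passed" or "Failed"
    "Reason": "The caption only mentions elements that are visible in the image.",
}
```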
What to Do If You Get Undesired Results

If the caption is evaluated as containing hallucinations (Failed) and you want to improve it (see the example after this list):

- Stick strictly to describing what is visibly present in the image
- Avoid making assumptions about:
  - People’s identities (unless clearly labeled or universally recognizable)
  - The location or setting (unless clearly identifiable)
  - Time periods or dates
  - Actions occurring before or after the captured moment
  - Emotions or thoughts of subjects
  - Objects that are partially obscured or ambiguous
- Use qualifying language (like “appears to be,” “what looks like”) when uncertain
- Focus on concrete visual elements rather than interpretations
- For generic descriptions, stay high-level and avoid specifics that aren’t clearly visible
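As a concrete illustration of these guidelines, the sketch below contrasts a caption that would likely fail with one that would likely pass, assuming a hypothetical image of a person standing near a dog on a beach.

```python
# Hypothetical image content assumed: a person standing near a dog on a beach.

# Likely Failed: asserts an identity, location, time, and emotion that are not visible
hallucinated_caption = (
    "John walks his golden retriever along Malibu beach at sunset, "
    "feeling relaxed after a long week."
)

# Likely Passed: sticks to visible elements and uses qualifying language
grounded_caption = (
    "A person stands next to what appears to be a golden retriever "
    "on a sandy beach."
)
```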
Comparing Caption Hallucination with Similar Evals
- Is AI Generated Image: Caption Hallucination evaluates the accuracy of image descriptions, while Is AI Generated Image determines if the image itself was created by AI.
- Detect Hallucination: Caption Hallucination specifically evaluates image descriptions, whereas Detect Hallucination evaluates factual fabrication in text content more broadly.
- Factual Accuracy: Caption Hallucination focuses on whether descriptions match what’s visible in images, while Factual Accuracy evaluates correctness of factual statements more generally.