### Evaluation Using Interface

Input:
- Required Inputs:
  - `input`: URL or file path to the image being captioned.
  - `output`: The caption text to evaluate.

Output:
- Result: Returns `Passed` if the caption accurately represents what's in the image without hallucination, or `Failed` if the caption contains hallucinated elements.
- Reason: A detailed explanation of why the caption was classified as containing or not containing hallucinations.

### Evaluation Using SDK

Click here to learn how to set up evaluation using the SDK.
Input:
- Required Inputs:
  - `input`: string - URL or file path to the image being captioned.
  - `output`: string - The caption text to evaluate.

Output:
- Result: Returns a list containing `Passed` if the caption accurately represents what's in the image without hallucination, or `Failed` if the caption contains hallucinated elements.
- Reason: Provides a detailed explanation of the evaluation.
```python
# Assumes `evaluator` is an initialized Future AGI SDK client
# (see the SDK setup guide linked above).
result = evaluator.evaluate(
    eval_templates="caption_hallucination",
    inputs={
        "input": "https://www.esparklearning.com/app/uploads/2024/04/Albert-Einstein-generated-by-AI-1024x683.webp",
        "output": "old man"
    },
    model_name="turing_flash"
)

# Inspect the verdict and the model's explanation
print(result.eval_results[0].output)
print(result.eval_results[0].reason)
```
Output:

`['Passed']` The evaluation is `Passed` because the caption "old man" accurately describes what is clearly visible in the image without adding any hallucinated details.

- The image does indeed show an elderly male figure with characteristic features of advanced age (white/gray hair, wrinkles, aged appearance).
- The caption is minimalist but factually correct, avoiding any specific claims about identity, activity, setting, or other details that might constitute hallucination.
- While the caption doesn't capture the specific identity of the person (who appears to be Albert Einstein or an Einstein-like figure), simply describing the subject as an "old man" remains factually accurate without overreaching.

A different evaluation would only be warranted if the caption made claims about elements not visibly present in the image.
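For contrast, a caption that asserts details not supported by the image would be expected to fail. The sketch below re-runs the same evaluation with an overreaching caption; the caption and the expected verdict are illustrative assumptions, not guaranteed output.

```python
# Hypothetical example: a caption that names the person and invents a
# setting and date would likely be flagged as hallucinated.
result = evaluator.evaluate(
    eval_templates="caption_hallucination",
    inputs={
        "input": "https://www.esparklearning.com/app/uploads/2024/04/Albert-Einstein-generated-by-AI-1024x683.webp",
        "output": "Albert Einstein lecturing at Princeton in 1946"
    },
    model_name="turing_flash"
)

print(result.eval_results[0].output)  # expected: ['Failed']
print(result.eval_results[0].reason)  # explanation of which details are unsupported
```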

---

### What to Do If You Get Undesired Results

If the caption is evaluated as containing hallucinations (`Failed`) and you want to improve it:

- Stick strictly to describing what is visibly present in the image
- Avoid making assumptions about:
  - People's identities (unless clearly labeled or universally recognizable)
  - The location or setting (unless clearly identifiable)
  - Time periods or dates
  - Actions occurring before or after the captured moment
  - Emotions or thoughts of subjects
  - Objects that are partially obscured or ambiguous
- Use qualifying language (like "appears to be," "what looks like") when uncertain, as shown in the sketch after this list
- Focus on concrete visual elements rather than interpretations
- For generic descriptions, stay high-level and avoid specifics that aren't clearly visible
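As a quick check, you can re-run the evaluation after revising the caption to use only visible details and qualifying language. This is a minimal sketch assuming the same `evaluator` client as above; the revised caption is a hypothetical illustration.

```python
# Hypothetical revision: drop identity, setting, and date claims, and keep
# hedged, visually grounded wording, then re-evaluate the caption.
revised_caption = "an elderly man with white hair, in what appears to be a portrait photo"

result = evaluator.evaluate(
    eval_templates="caption_hallucination",
    inputs={
        "input": "https://www.esparklearning.com/app/uploads/2024/04/Albert-Einstein-generated-by-AI-1024x683.webp",
        "output": revised_caption
    },
    model_name="turing_flash"
)

print(result.eval_results[0].output)  # expected to move toward ['Passed']
```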

---

### Comparing Caption Hallucination with Similar Evals

- [**Is AI Generated Image**](https://docs.futureagi.com/future-agi/products/evaluation/eval-definition/is-AI-generated-image): Caption Hallucination evaluates the accuracy of image descriptions, while Is AI Generated Image determines if the image itself was created by AI.
- [**Detect Hallucination**](https://docs.futureagi.com/future-agi/products/evaluation/eval-definition/detect-hallucination): Caption Hallucination specifically evaluates image descriptions, whereas Detect Hallucination evaluates factual fabrication in text content more broadly.
- [**Factual Accuracy**](https://docs.futureagi.com/future-agi/products/evaluation/eval-definition/factual-accuracy): Caption Hallucination focuses on whether descriptions match what's visible in images, while Factual Accuracy evaluates correctness of factual statements more generally.