CLIP Score
Measures how well images match their text descriptions. Higher scores indicate better image-text alignment (range: 0–100).
```python
# Assumes the Evaluator client from the Future AGI Python SDK
# (import path may vary by SDK version).
from fi.evals import Evaluator

evaluator = Evaluator()

result = evaluator.evaluate(
    eval_templates="clip_score",
    inputs={
        "images": ["https://example.com/generated-image.jpg"],
        "text": ["a golden retriever playing in a park"]
    },
    model_name="turing_flash"
)

print(result.eval_results[0].output)
print(result.eval_results[0].reason)
```

```typescript
import { Evaluator, Templates } from "@future-agi/ai-evaluation";

const evaluator = new Evaluator();

const result = await evaluator.evaluate(
    "clip_score",
    {
        images: ["https://example.com/generated-image.jpg"],
        text: ["a golden retriever playing in a park"]
    },
    {
        modelName: "turing_flash",
    }
);

console.log(result);
```

Input

| Required Input | Type | Description |
|---|---|---|
| images | string or list[string] | Single image or list of images (URL or file path) to evaluate |
| text | string or list[string] | Text description or list of descriptions to compare against the images |
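Both fields accept lists, so several image-text pairs can be scored in one call. A minimal sketch of a batched `inputs` payload, assuming the evaluator matches images and descriptions element-wise (the URLs and prompts here are placeholders):

```python
# Placeholder URLs and prompts; each image is assumed to be compared
# against the description at the same position in the text list.
inputs = {
    "images": [
        "https://example.com/img-1.jpg",
        "https://example.com/img-2.jpg",
    ],
    "text": [
        "a golden retriever playing in a park",
        "a red sports car at sunset",
    ],
}

# Keep the two lists the same length so the pairs line up.
assert len(inputs["images"]) == len(inputs["text"])
```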
Output

| Field | Description |
|---|---|
| Result | Returns a numeric score from 0 to 100, where higher values indicate better alignment between the image and text description |
| Reason | Provides a detailed explanation of the image-text alignment assessment |
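Because the Result field is a plain 0–100 number, it is easy to gate on programmatically. A minimal sketch; the threshold of 60 is an arbitrary example for illustration, not a recommendation from the eval itself:

```python
def needs_review(score: float, threshold: float = 60.0) -> bool:
    """Flag a generation whose CLIP Score falls below a chosen bar.

    Scores range 0-100; higher means stronger image-text alignment.
    The default threshold is illustrative only.
    """
    return score < threshold

print(needs_review(42.5))  # weak alignment -> True
print(needs_review(85.0))  # strong alignment -> False
```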
What to Do When CLIP Score is Low
- Make the text description more specific and aligned with the visual content
- Check that the image actually depicts what the prompt requested
- Avoid overly abstract or ambiguous descriptions
- Ensure the image generation prompt used matches the evaluation text
- Consider refining the generation model or prompt engineering
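The refine-and-re-evaluate cycle described above can be sketched as a simple loop. Here `generate_image` and `score_fn` are hypothetical stand-ins for your generation model and a call to the clip_score eval; the threshold and attempt count are arbitrary examples:

```python
def refine_until_aligned(prompt, generate_image, score_fn,
                         threshold=70.0, max_attempts=3):
    """Regenerate until the CLIP Score clears the bar or attempts run out.

    generate_image(prompt) and score_fn(image, prompt) are placeholders
    for your own generation and evaluation calls.
    """
    best_image, best_score = None, float("-inf")
    for _ in range(max_attempts):
        image = generate_image(prompt)
        score = score_fn(image, prompt)
        if score > best_score:
            best_image, best_score = image, score
        if score >= threshold:
            break  # alignment is good enough; stop regenerating
    return best_image, best_score
```

Keeping the best-scoring image (rather than the last one) means a lucky early generation is never discarded by a worse retry.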
Comparing CLIP Score with Similar Evals
- FID Score: CLIP Score measures image-text alignment for individual pairs, while FID Score measures the distributional similarity between sets of real and generated images.
- Image Instruction Adherence: CLIP Score provides a statistical alignment metric, while Image Instruction Adherence uses an LLM to evaluate whether generated images meet detailed instruction criteria.