CLIP Score

Measures how well images match their text descriptions. Higher scores indicate better image-text alignment (range: 0–100).

Python

# Assumes `evaluator` is an already-initialized Evaluator client.
result = evaluator.evaluate(
    eval_templates="clip_score",
    inputs={
        "images": ["https://example.com/generated-image.jpg"],
        "text": ["a golden retriever playing in a park"]
    },
    model_name="turing_flash"
)

print(result.eval_results[0].output)  # numeric score, 0–100
print(result.eval_results[0].reason)  # explanation of the score

TypeScript

import { Evaluator, Templates } from "@future-agi/ai-evaluation";

const evaluator = new Evaluator();

const result = await evaluator.evaluate(
  "clip_score",
  {
    images: ["https://example.com/generated-image.jpg"],
    text: ["a golden retriever playing in a park"]
  },
  {
    modelName: "turing_flash",
  }
);

console.log(result);
Input
Required Input    Type                      Description
images            string or list[string]    Single image or list of images (URL or file path) to evaluate
text              string or list[string]    Text description or list of descriptions to compare against the images
Output
Field     Description
Result    Returns a numeric score from 0 to 100, where higher values indicate better alignment between the image and text description
Reason    Provides a detailed explanation of the image-text alignment assessment
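
For intuition about the 0 to 100 scale: a CLIP-style score is commonly defined as the cosine similarity between an image embedding and a text embedding, clamped at zero and multiplied by 100. The sketch below illustrates only that arithmetic, using made-up three-dimensional vectors in place of real CLIP embeddings. The function name `clip_style_score` and the toy values are assumptions for illustration, and the exact formula used by this evaluator is not stated on this page.

```python
import math


def clip_style_score(image_emb, text_emb):
    """Cosine similarity of two embeddings, clamped at 0 and scaled to 0-100."""
    dot = sum(a * b for a, b in zip(image_emb, text_emb))
    norm_img = math.sqrt(sum(a * a for a in image_emb))
    norm_txt = math.sqrt(sum(b * b for b in text_emb))
    if norm_img == 0 or norm_txt == 0:
        raise ValueError("embeddings must be non-zero vectors")
    cosine = dot / (norm_img * norm_txt)
    # Negative similarity is treated as "no alignment" rather than a negative score.
    return 100.0 * max(cosine, 0.0)


# Toy vectors standing in for real CLIP embeddings (hypothetical values).
aligned = clip_style_score([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])    # same direction
unrelated = clip_style_score([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])  # orthogonal
print(round(aligned, 2), round(unrelated, 2))
```

Vectors pointing in the same direction land at the top of the scale and orthogonal vectors at the bottom. Real image-text pairs usually fall well inside that range.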

What to Do When CLIP Score is Low

  • Make the text description more specific and aligned with the visual content
  • Check that the image actually depicts what the prompt requested
  • Avoid overly abstract or ambiguous descriptions
  • Ensure the image generation prompt used matches the evaluation text
  • Consider refining the generation model or prompt engineering
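
When evaluating many image-text pairs, it can help to surface the weakest ones first and then apply the steps above to those. The following is a minimal sketch, assuming the scores have already been collected as plain numbers. The helper `flag_low_scores`, the threshold of 25, and the sample scores are all hypothetical and should be tuned on your own data.

```python
def flag_low_scores(scored_pairs, threshold=25.0):
    """Return (text, score) pairs scoring below `threshold`, lowest first.

    `threshold` is an illustrative default, not a documented cutoff.
    """
    low = [(text, score) for text, score in scored_pairs if score < threshold]
    return sorted(low, key=lambda pair: pair[1])


# Hypothetical scores, as might be read from each result's output field.
scored_pairs = [
    ("a golden retriever playing in a park", 31.4),
    ("an abstract feeling of nostalgia", 14.2),
    ("a red bicycle leaning on a brick wall", 22.8),
]

for text, score in flag_low_scores(scored_pairs):
    print(f"{score:5.1f}  {text}")
```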

Comparing CLIP Score with Similar Evals

  • FID Score: CLIP Score measures image-text alignment for individual pairs, while FID Score measures the distributional similarity between sets of real and generated images.
  • Image Instruction Adherence: CLIP Score provides an embedding-based similarity metric for an image and its text, while Image Instruction Adherence uses an LLM to evaluate whether generated images meet detailed instruction criteria.