CLIP Score
Measures how well images match their text descriptions. Higher scores indicate better image-text alignment (range: 0–100).
```python
# Assumes the Evaluator client from the Future AGI Python SDK
# (import path may vary by SDK version).
from fi.evals import Evaluator

evaluator = Evaluator()

result = evaluator.evaluate(
    eval_templates="clip_score",
    inputs={
        "images": ["https://example.com/generated-image.jpg"],
        "text": ["a golden retriever playing in a park"]
    },
    model_name="turing_flash"
)

print(result.eval_results[0].output)
print(result.eval_results[0].reason)
```

```typescript
import { Evaluator, Templates } from "@future-agi/ai-evaluation";

const evaluator = new Evaluator();

const result = await evaluator.evaluate(
    "clip_score",
    {
        images: ["https://example.com/generated-image.jpg"],
        text: ["a golden retriever playing in a park"]
    },
    {
        modelName: "turing_flash",
    }
);

console.log(result);
```

Input

| Required Input | Type | Description |
|---|---|---|
| images | string or list[string] | Single image or list of images (URL or file path) to evaluate |
| text | string or list[string] | Text description or list of descriptions to compare against the images |
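Both fields accept lists, so several image-text pairs can be scored in one call. A minimal sketch of a batched `inputs` payload, assuming the evaluator matches images and descriptions element-wise (the URLs and prompts here are placeholders):

```python
# Placeholder URLs and prompts; each image is assumed to be compared
# against the description at the same position in the text list.
inputs = {
    "images": [
        "https://example.com/img-1.jpg",
        "https://example.com/img-2.jpg",
    ],
    "text": [
        "a golden retriever playing in a park",
        "a red sports car at sunset",
    ],
}

# Keep the two lists the same length so the pairs line up.
assert len(inputs["images"]) == len(inputs["text"])
```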
Output

| Field | Description |
|---|---|
| Result | Returns a numeric score from 0 to 100, where higher values indicate better alignment between the image and text description |
| Reason | Provides a detailed explanation of the image-text alignment assessment |
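Because the Result field is a plain 0–100 number, it is easy to gate on programmatically. A minimal sketch; the threshold of 60 is an arbitrary example for illustration, not a recommendation from the eval itself:

```python
def needs_review(score: float, threshold: float = 60.0) -> bool:
    """Flag a generation whose CLIP Score falls below a chosen bar.

    Scores range 0-100; higher means stronger image-text alignment.
    The default threshold is illustrative only.
    """
    return score < threshold

print(needs_review(42.5))  # weak alignment -> True
print(needs_review(85.0))  # strong alignment -> False
```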
What to Do When CLIP Score is Low
- Make the text description more specific and aligned with the visual content
- Check that the image actually depicts what the prompt requested
- Avoid overly abstract or ambiguous descriptions
- Ensure the image generation prompt used matches the evaluation text
- Consider refining the generation model or prompt engineering
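The refine-and-re-evaluate cycle described above can be sketched as a simple loop. Here `generate_image` and `score_fn` are hypothetical stand-ins for your generation model and a call to the clip_score eval; the threshold and attempt count are arbitrary examples:

```python
def refine_until_aligned(prompt, generate_image, score_fn,
                         threshold=70.0, max_attempts=3):
    """Regenerate until the CLIP Score clears the bar or attempts run out.

    generate_image(prompt) and score_fn(image, prompt) are placeholders
    for your own generation and evaluation calls.
    """
    best_image, best_score = None, float("-inf")
    for _ in range(max_attempts):
        image = generate_image(prompt)
        score = score_fn(image, prompt)
        if score > best_score:
            best_image, best_score = image, score
        if score >= threshold:
            break  # alignment is good enough; stop regenerating
    return best_image, best_score
```

Keeping the best-scoring image (rather than the last one) means a lucky early generation is never discarded by a worse retry.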
Comparing CLIP Score with Similar Evals
- FID Score: CLIP Score measures image-text alignment for individual pairs, while FID Score measures the distributional similarity between sets of real and generated images.
- Image Instruction Adherence: CLIP Score provides a statistical alignment metric, while Image Instruction Adherence uses an LLM to evaluate whether generated images meet detailed instruction criteria.