FID Score
Computes the Fréchet Inception Distance (FID) between two sets of images. Lower scores indicate more similar image distributions.
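For intuition, FID is conventionally defined by fitting a Gaussian to Inception-v3 features of each image set and computing ||mu_r - mu_g||^2 + Tr(Sigma_r + Sigma_g - 2 (Sigma_r Sigma_g)^(1/2)). The sketch below implements only that distance on precomputed feature arrays, using NumPy. It is not the evaluator's internal implementation, which this page does not document. The function name `frechet_distance` is hypothetical, and random arrays stand in for real Inception features.

```python
import numpy as np


def frechet_distance(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    """Frechet distance between Gaussians fitted to two (n_samples, n_features) arrays."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    sigma_a = np.atleast_2d(np.cov(feats_a, rowvar=False))
    sigma_b = np.atleast_2d(np.cov(feats_b, rowvar=False))

    # Tr((sigma_a sigma_b)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of sigma_a @ sigma_b. These are real and non-negative in
    # exact arithmetic, so tiny negative values from rounding are clipped.
    eigvals = np.linalg.eigvals(sigma_a @ sigma_b)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    mean_term = float(np.sum((mu_a - mu_b) ** 2))
    cov_term = float(np.trace(sigma_a) + np.trace(sigma_b) - 2.0 * trace_sqrt)
    return mean_term + cov_term


# Stand-in features (assumption): 500 samples with 8 features each.
rng = np.random.default_rng(0)
real = rng.normal(size=(500, 8))
fake = real + 3.0  # same spread, every feature mean shifted by 3

same_score = frechet_distance(real, real)     # approximately 0
shifted_score = frechet_distance(real, fake)  # approximately 72 = 8 * 3**2
print(same_score, shifted_score)
```

Because the shift leaves the covariance unchanged, the whole score in the second case comes from the mean term, which illustrates why lower values indicate closer distributions.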
# Assumes `evaluator` is an already-initialized Evaluator client.
result = evaluator.evaluate(
    eval_templates="fid_score",
    inputs={
        "real_images": ["https://example.com/real1.jpg", "https://example.com/real2.jpg"],
        "fake_images": ["https://example.com/generated1.jpg", "https://example.com/generated2.jpg"]
    },
    model_name="turing_flash"
)
print(result.eval_results[0].output)
print(result.eval_results[0].reason)

import { Evaluator, Templates } from "@future-agi/ai-evaluation";
const evaluator = new Evaluator();
const result = await evaluator.evaluate(
  "fid_score",
  {
    real_images: ["https://example.com/real1.jpg", "https://example.com/real2.jpg"],
    fake_images: ["https://example.com/generated1.jpg", "https://example.com/generated2.jpg"]
  },
  {
    modelName: "turing_flash",
  }
);
console.log(result);

Input

| Required Input | Type | Description |
|---|---|---|
| real_images | list[string] | List of URLs or file paths to the real/reference images |
| fake_images | list[string] | List of URLs or file paths to the generated/fake images |
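Since both inputs are lists of strings, it can help to validate them before calling the evaluator. The following is a minimal sketch using only the standard library. `build_fid_inputs` is a hypothetical helper, not part of the SDK, and the specific checks (non-empty lists, non-blank string entries) are assumptions about what is worth catching early.

```python
from typing import Dict, List, Sequence


def build_fid_inputs(
    real_images: Sequence[str], fake_images: Sequence[str]
) -> Dict[str, List[str]]:
    """Hypothetical helper: validate both lists and return the inputs dict."""
    inputs: Dict[str, List[str]] = {}
    for key, values in (("real_images", real_images), ("fake_images", fake_images)):
        # A bare string would otherwise be iterated character by character.
        if isinstance(values, str):
            raise TypeError(f"{key} must be a list of strings, not a single string")
        items = list(values)
        if not items:
            raise ValueError(f"{key} must contain at least one URL or file path")
        for item in items:
            if not isinstance(item, str) or not item.strip():
                raise ValueError(f"{key} contains an invalid entry: {item!r}")
        inputs[key] = [item.strip() for item in items]
    return inputs


inputs = build_fid_inputs(
    ["https://example.com/real1.jpg", " https://example.com/real2.jpg "],
    ["https://example.com/generated1.jpg", "https://example.com/generated2.jpg"],
)
print(inputs)
```

The returned dictionary has the same two keys as the table above, so it can be passed as the `inputs` argument in the Python example.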
Output

| Field | Description |
|---|---|
| Result | Returns a numeric FID score; lower values indicate more similar distributions between real and generated images |
| Reason | Provides a detailed explanation of the FID score assessment |
What to Do When FID Score is High
- Increase the size and diversity of both image sets; FID is estimated from sample means and covariances, so small sets give noisy scores
- Review the generation model for mode collapse or quality issues
- Ensure real and generated images are from the same domain and resolution
- Check preprocessing steps — both sets should be normalized consistently
- Consider fine-tuning the generation model on domain-specific data
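The resolution and preprocessing checks above can be partly automated by comparing image specifications across the two sets. Below is a minimal sketch, assuming each image has already been reduced to a `((width, height), mode)` pair. `spec_mismatches` is a hypothetical helper, and the example specs are made up for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# ((width, height), mode), for example ((512, 512), "RGB")
Spec = Tuple[Tuple[int, int], str]


def spec_mismatches(
    real_specs: Iterable[Spec], fake_specs: Iterable[Spec]
) -> Dict[str, Dict[Spec, int]]:
    """Report size/mode combinations that appear in only one of the two sets."""
    real_counts, fake_counts = Counter(real_specs), Counter(fake_specs)
    only_real = {s: n for s, n in real_counts.items() if s not in fake_counts}
    only_fake = {s: n for s, n in fake_counts.items() if s not in real_counts}
    return {"only_in_real": only_real, "only_in_fake": only_fake}


# Illustrative specs (assumption): one generated image differs in size and mode.
real_specs = [((512, 512), "RGB")] * 3
fake_specs = [((512, 512), "RGB"), ((256, 256), "RGBA")]

report = spec_mismatches(real_specs, fake_specs)
print(report)
```

For local files, the pairs could be collected with Pillow as `(img.size, img.mode)` after `img = Image.open(path)`. Any non-empty entry in the report suggests resizing or converting images so both sets share one format before scoring.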
Comparing FID Score with Similar Evals
- CLIP Score: FID Score measures distribution similarity between real and generated images, while CLIP Score measures how well images align with a text description.
- Synthetic Image Evaluator: FID Score evaluates the statistical quality of a batch of generated images, while Synthetic Image Evaluator classifies individual images as AI-generated or real.