FID Score: Fréchet Inception Distance for Image Sets

Computes the Fréchet Inception Distance (FID) between two sets of images. Lower scores indicate more similar image distributions.
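For reference, the standard definition of FID fits a Gaussian to the Inception-v3 features of each image set and measures the Fréchet distance between the two Gaussians. The symbols below are introduced here for illustration: $\mu_r, \Sigma_r$ are the feature mean and covariance of the real set, and $\mu_g, \Sigma_g$ those of the generated set. Whether this evaluator follows the standard definition exactly is an assumption.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

The score is 0 when both sets have identical feature statistics and grows as the means or covariances diverge.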

# Python: assumes `evaluator` is an already-initialized Evaluator client from the Future AGI SDK
result = evaluator.evaluate(
    eval_templates="fid_score",
    inputs={
        "real_images": ["https://example.com/real1.jpg", "https://example.com/real2.jpg"],
        "fake_images": ["https://example.com/generated1.jpg", "https://example.com/generated2.jpg"]
    },
    model_name="turing_flash"
)

print(result.eval_results[0].output)  # numeric FID score
print(result.eval_results[0].reason)  # explanation of the score

// TypeScript (ES module, so top-level await is available)
import { Evaluator } from "@future-agi/ai-evaluation";

const evaluator = new Evaluator();

const result = await evaluator.evaluate(
  "fid_score",
  {
    real_images: ["https://example.com/real1.jpg", "https://example.com/real2.jpg"],
    fake_images: ["https://example.com/generated1.jpg", "https://example.com/generated2.jpg"]
  },
  {
    modelName: "turing_flash",
  }
);

console.log(result);


Required Input	Type	Description
`real_images`	`list[string]`	List of URLs or file paths to the real/reference images
`fake_images`	`list[string]`	List of URLs or file paths to the generated/fake images

Output Field	Description
Result	Numeric FID score; lower values indicate more similar distributions between real and generated images
Reason	Detailed explanation of the FID score assessment
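To make the numeric score more concrete, here is a minimal sketch of the standard Fréchet distance computed from two feature matrices with NumPy. It is not the evaluator's implementation: `frechet_distance` is a hypothetical helper, and the random vectors stand in for Inception features that a real pipeline would extract from the images.

```python
import numpy as np


def frechet_distance(feats_real: np.ndarray, feats_fake: np.ndarray) -> float:
    """Fréchet distance between Gaussians fitted to two feature sets.

    Each input has shape (n_images, n_features), with n_images >= 2.
    """
    mu_r, mu_f = feats_real.mean(axis=0), feats_fake.mean(axis=0)
    # rowvar=False: rows are samples, columns are feature dimensions.
    cov_r = np.atleast_2d(np.cov(feats_real, rowvar=False))
    cov_f = np.atleast_2d(np.cov(feats_fake, rowvar=False))

    # Tr((cov_r cov_f)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_r @ cov_f, which are real and non-negative for
    # positive semi-definite inputs. Clip tiny negatives caused by round-off.
    eigvals = np.linalg.eigvals(cov_r @ cov_f)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    mean_term = float(np.sum((mu_r - mu_f) ** 2))
    cov_term = float(np.trace(cov_r) + np.trace(cov_f) - 2.0 * trace_sqrt)
    return mean_term + cov_term


# Stand-in features (assumption): random vectors, not real Inception activations.
rng = np.random.default_rng(0)
real = rng.normal(size=(500, 8))
fake_close = rng.normal(size=(500, 8))   # same distribution, different samples
fake_shifted = real + 3.0                # same spread, every feature mean moved by 3

print(frechet_distance(real, real))          # approximately 0 (up to round-off)
print(frechet_distance(real, fake_close))    # small but non-zero: sampling noise
print(frechet_distance(real, fake_shifted))  # approximately 72 (8 features x 3^2)
```

Even two samples drawn from the same distribution give a non-zero score, which is why a score should be read relative to a baseline computed on comparable set sizes rather than against zero.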

What to Do When FID Score is High

Increase the size and diversity of both image sets; FID estimated from small sets is noisy and biased upward, so larger sets give a more reliable score
Review the generation model for mode collapse or quality issues
Ensure real and generated images are from the same domain and resolution
Check preprocessing steps — both sets should be normalized consistently
Consider fine-tuning the generation model on domain-specific data
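Some of the checks above can be automated before an evaluation is submitted. The following is a minimal sketch, assuming the inputs are plain lists of URLs or paths as in the examples on this page. `check_fid_inputs` and `MIN_RECOMMENDED_IMAGES` are hypothetical names, not part of the SDK, and the 2048 threshold is a rule of thumb tied to the 2048-dimensional Inception features used by standard FID, not a documented platform limit.

```python
from collections import Counter

# Assumption: rule-of-thumb floor, not a limit enforced by the platform.
MIN_RECOMMENDED_IMAGES = 2048


def check_fid_inputs(real_images, fake_images, min_images=MIN_RECOMMENDED_IMAGES):
    """Return a list of human-readable warnings about the two image lists."""
    warnings = []
    for name, images in (("real_images", real_images), ("fake_images", fake_images)):
        # Small sets make the mean and covariance estimates unreliable.
        if len(images) < min_images:
            warnings.append(
                f"{name}: only {len(images)} images; FID estimates are noisy "
                f"below roughly {min_images}"
            )
        # Repeated entries reduce the effective diversity of a set.
        duplicates = [path for path, count in Counter(images).items() if count > 1]
        if duplicates:
            warnings.append(f"{name}: {len(duplicates)} duplicated entries")
    # The same image in both sets makes the score look better than it should.
    overlap = set(real_images) & set(fake_images)
    if overlap:
        warnings.append(f"{len(overlap)} entries appear in both sets")
    return warnings


for warning in check_fid_inputs(
    ["real1.jpg", "real2.jpg"],
    ["generated1.jpg", "generated1.jpg"],
):
    print(warning)
```

The helper only inspects the lists themselves; domain, resolution, and preprocessing mismatches still need to be checked against the actual image files.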

Comparing FID Score with Similar Evals

CLIP Score: FID Score measures distribution similarity between real and generated images, while CLIP Score measures how well images align with a text description.
Synthetic Image Evaluator: FID Score evaluates the statistical quality of a batch of generated images, while Synthetic Image Evaluator classifies individual images as AI-generated or real.
