FID Score

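Computes the Fréchet Inception Distance (FID) between a set of real images and a set of generated images. Lower scores indicate more similar image distributions; a score of 0 would mean the two sets have identical feature statistics.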

Python

# Assumes `evaluator` is an already-initialized Evaluator client from the SDK.
result = evaluator.evaluate(
    eval_templates="fid_score",
    inputs={
        "real_images": ["https://example.com/real1.jpg", "https://example.com/real2.jpg"],
        "fake_images": ["https://example.com/generated1.jpg", "https://example.com/generated2.jpg"]
    },
    model_name="turing_flash"
)

print(result.eval_results[0].output)  # numeric FID score
print(result.eval_results[0].reason)  # explanation of the score

TypeScript

import { Evaluator } from "@future-agi/ai-evaluation";

const evaluator = new Evaluator();

const result = await evaluator.evaluate(
  "fid_score",
  {
    real_images: ["https://example.com/real1.jpg", "https://example.com/real2.jpg"],
    fake_images: ["https://example.com/generated1.jpg", "https://example.com/generated2.jpg"]
  },
  {
    modelName: "turing_flash",
  }
);

console.log(result);

Input

| Required Input | Type | Description |
|---|---|---|
| real_images | list[string] | List of URLs or file paths to the real/reference images |
| fake_images | list[string] | List of URLs or file paths to the generated/fake images |

Output

| Field | Description |
|---|---|
| Result | Returns a numeric FID score; lower values indicate more similar distributions between real and generated images |
| Reason | Provides a detailed explanation of the FID score assessment |
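
To give a feel for what the numeric score represents, here is a minimal sketch of the standard Fréchet distance between two Gaussians fitted to feature vectors. It is an illustration only, not the evaluator's actual implementation, which this page does not document. The function name `frechet_distance` is hypothetical, and the random arrays stand in for the Inception features that a real FID pipeline would extract from the images.

```python
import numpy as np


def frechet_distance(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    """Fréchet distance between Gaussians fitted to two (n_samples, n_dims) arrays."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)

    # Tr((cov_a cov_b)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of cov_a @ cov_b; clip tiny negative values from rounding.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt)


rng = np.random.default_rng(0)
real_feats = rng.normal(size=(500, 4))  # stand-in for extracted image features
shifted_feats = real_feats + np.array([1.0, 2.0, 0.0, 0.0])  # same spread, moved mean

same_score = frechet_distance(real_feats, real_feats)
shifted_score = frechet_distance(real_feats, shifted_feats)
print(same_score, shifted_score)
```

Comparing a set with itself gives a distance of approximately 0, while shifting every feature vector by the same offset leaves the covariance unchanged, so the distance is approximately the squared length of the offset.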

What to Do When FID Score is High

  • Increase the diversity and size of both image sets for a more reliable score
  • Review the generation model for mode collapse or quality issues
  • Ensure real and generated images are from the same domain and resolution
  • Check preprocessing steps — both sets should be normalized consistently
  • Consider fine-tuning the generation model on domain-specific data
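
The first point in the list above, about set size, can be illustrated with a small simulation. This is a sketch under stated assumptions: `frechet_distance` and `score_for` are hypothetical helpers, and synthetic Gaussian vectors stand in for real image features. Both sets are drawn from the same distribution, so the true distance is 0, and any nonzero score comes purely from estimating the statistics from a limited number of samples.

```python
import numpy as np


def frechet_distance(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    """Fréchet distance between Gaussians fitted to two (n_samples, n_dims) arrays."""
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)
    # Trace of the matrix square root via eigenvalues of the covariance product
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt)


rng = np.random.default_rng(0)
dims = 8  # illustrative feature dimension, far smaller than real Inception features


def score_for(n_samples: int) -> float:
    # Two independent draws from the same distribution: the ideal score is 0
    set_a = rng.normal(size=(n_samples, dims))
    set_b = rng.normal(size=(n_samples, dims))
    return frechet_distance(set_a, set_b)


small_score = score_for(16)
large_score = score_for(4000)
print(small_score, large_score)
```

The small sets produce a noticeably higher score than the large ones even though nothing about the underlying distribution differs, which is why a high score on only a handful of images is weak evidence of a problem with the generator.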

Comparing FID Score with Similar Evals

  • CLIP Score: FID Score measures distribution similarity between real and generated images, while CLIP Score measures how well images align with a text description.
  • Synthetic Image Evaluator: FID Score evaluates the statistical quality of a batch of generated images, while Synthetic Image Evaluator classifies individual images as AI-generated or real.