Image Instruction Adherence

Measures how well generated images adhere to a given text instruction across subject, style, and composition.

Python

# Assumes `evaluator` is an initialized Evaluator client from the
# Python SDK (see the SDK setup docs for configuring API keys).
result = evaluator.evaluate(
    eval_templates="image_instruction_adherence",
    inputs={
        "instruction": "A photorealistic image of a red sports car on a mountain road at sunset",
        "images": ["https://example.com/generated-car.jpg"]
    },
    model_name="turing_flash"
)

print(result.eval_results[0].output)
print(result.eval_results[0].reason)
TypeScript

import { Evaluator } from "@future-agi/ai-evaluation";

const evaluator = new Evaluator();

const result = await evaluator.evaluate(
  "image_instruction_adherence",
  {
    instruction: "A photorealistic image of a red sports car on a mountain road at sunset",
    images: ["https://example.com/generated-car.jpg"]
  },
  {
    modelName: "turing_flash",
  }
);

console.log(result);
Input

  • instruction (string, required): The text instruction describing what the image should contain or depict
  • images (string or list[string], required): The generated image(s) to be evaluated against the instruction
Output

  • Result: A numeric score; higher values indicate closer adherence to the instruction
  • Reason: A detailed explanation of how well the image matches the instruction
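A common pattern is to use the adherence score as a quality gate before accepting a generated image. The sketch below is a minimal illustration, assuming the `output`/`reason` fields shown in the examples above; the `EvalResult` dataclass is a hypothetical stand-in for the SDK's result object, and the 0.7 threshold is an arbitrary assumption, not an SDK default.

```python
from dataclasses import dataclass

# Hypothetical stand-in mirroring the `output` and `reason` fields
# accessed via result.eval_results[0] in the examples above.
@dataclass
class EvalResult:
    output: float  # numeric adherence score (higher = closer adherence)
    reason: str    # judge's explanation of the score

def passes_adherence_gate(result: EvalResult, threshold: float = 0.7) -> bool:
    """Accept a generated image only if its adherence score meets the
    threshold. The 0.7 default is an assumption for illustration."""
    if result.output < threshold:
        # Surface the judge's reasoning so the prompt can be revised.
        print(f"Low adherence ({result.output:.2f}): {result.reason}")
        return False
    return True
```

In a generation loop, a failed gate would typically trigger a prompt revision and a re-generation rather than shipping the image.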

What to Do When Image Instruction Adherence Score is Low

  • Review the instruction for ambiguity and make it more specific
  • Check that all key elements mentioned in the instruction are present in the image
  • Verify that style, composition, and color requirements are reflected
  • Consider iterating on the generation prompt to better guide the model
  • Break complex instructions into simpler, more focused prompts
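The last tip, breaking a complex instruction into focused prompts, can be sketched as follows. The splitting heuristic is illustrative only; in practice you would decompose the instruction manually or with an LLM, then evaluate each sub-instruction separately to localize which element the image fails on.

```python
def split_instruction(instruction: str) -> list[str]:
    """Naively split a compound instruction into focused sub-instructions
    on commas and 'and'. Illustrative heuristic, not an SDK feature."""
    normalized = instruction.replace(" and ", ", ")
    parts = [part.strip() for part in normalized.split(",")]
    return [part for part in parts if part]

# Each sub-instruction can then be scored independently with
# eval_templates="image_instruction_adherence" to find the weak element.
subtasks = split_instruction("a red sports car, a mountain road and a sunset")
```

Evaluating per element turns one low aggregate score into an actionable signal about which part of the prompt the model ignored.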

Comparing Image Instruction Adherence with Similar Evals

  • CLIP Score: Image Instruction Adherence uses an LLM to reason about detailed instruction compliance, while CLIP Score computes a statistical alignment metric between image and text embeddings.
  • Caption Hallucination: Image Instruction Adherence evaluates whether a generated image matches its instruction, while Caption Hallucination checks whether a text caption accurately describes what is visible in an image.
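To make the CLIP Score contrast concrete: CLIP reduces image-text alignment to a single cosine similarity between embedding vectors, with no reasoning trace. The sketch below shows only the similarity computation on dummy vectors; real CLIP Score would embed the image and text with a CLIP model (e.g. via a vision-language library), which is outside the scope of this snippet.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors -- the core of a
    CLIP-style alignment score. Vectors here are placeholders; a real
    CLIP Score would use image and text embeddings from a CLIP model."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)
```

The single scalar this yields explains why CLIP Score is cheap but opaque, whereas the LLM-based template above also returns a Reason field describing which instruction elements matched.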