Evaluation Using Interface

Input:

  • Required Inputs:
    • input: The instruction or textual description column associated with the image (e.g., “A vibrant sunrise over a mountain”).
    • image_url: The URL column of the image being evaluated.
  • Configuration Parameters:
    • criteria: The evaluation standard that defines how the alignment is measured (e.g., colour accuracy, object representation, or stylistic features).

Output:

  • Score: A percentage score between 0 and 100

Interpretation:

  • Higher scores: Indicate strong alignment between the instructions and the image based on the specified criteria.
  • Lower scores: Suggest discrepancies or misalignment between the instructions and the image.

Evaluation Using Python SDK

Click here to learn how to set up evaluation using the Python SDK.


Input Type                 Parameter   Type     Description
Required Inputs            input       string   The instruction or textual description associated with the image.
                           image_url   string   The URL of the image being evaluated.
Configuration Parameters   criteria    string   The evaluation standard that defines how the alignment is measured.

Output   Type    Description
Score    float   Returns a score between 0 and 1, where higher values indicate better alignment.

from fi.evals import EvalClient
from fi.evals.templates import ImageInstruction
from fi.testcases import MLLMTestCase

# Define the test case: the textual instruction and the image to evaluate against it
test_case = MLLMTestCase(
    input="A serene beach landscape photo taken from a wooden boardwalk",
    image_url="https://example.com/beach_photo.jpg"
)

# Configure the evaluation criteria that define how alignment is measured
template = ImageInstruction(
    config={
        "criteria": """
        Evaluate the image based on:
        1. Instruction clarity and specificity
        2. Image composition alignment
        3. Scene elements accuracy
        4. Overall visual quality
        """
    }
)

# Initialise the evaluation client with your credentials
evaluator = EvalClient(
    fi_api_key="your_api_key",
    fi_secret_key="your_secret_key",
    fi_base_url="https://api.futureagi.com"
)

# Run the evaluation, then extract the score and reasoning from the response
response = evaluator.evaluate(eval_templates=[template], inputs=[test_case])

score = response.eval_results[0].metrics[0].value
reason = response.eval_results[0].reason

print(f"Evaluation Score: {score}")
print(f"Evaluation Reason: {reason}")


What to do if Eval Image Instruction has a Low Score

Start by reviewing the evaluation criteria to confirm they are clearly defined and aligned with the intended assessment goals; adjust them if they need to be more comprehensive or relevant. Then analyse the instruction and the image together to locate specific discrepancies or misalignments. Resolve these either by refining the instructions or by improving the image generation process until the two are consistent.
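As an illustration, one way to check whether vague criteria are depressing the score is to re-run the same test case with a more specific criteria string and compare results. This is a minimal sketch reusing the evaluator and test_case from the example above; the refined wording is illustrative, not prescriptive.

# Re-run the same test case with more specific criteria to see whether
# vague wording was the cause of the low score (illustrative criteria only).
refined_template = ImageInstruction(
    config={
        "criteria": """
        Check that the image shows:
        1. A beach scene photographed from a wooden boardwalk
        2. A calm, serene mood (soft light, no crowds)
        3. Realistic photographic quality rather than illustration
        """
    }
)

refined_response = evaluator.evaluate(
    eval_templates=[refined_template], inputs=[test_case]
)
print(refined_response.eval_results[0].metrics[0].value)
print(refined_response.eval_results[0].reason)

If the score improves with tighter criteria, the original criteria were likely too vague; if it stays low, the mismatch is more likely in the instruction or the image itself.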


Differentiating Eval Image Instruction with Score Eval

Eval Image Instruction focuses specifically on assessing the alignment between a textual instruction and an image, ensuring that the generated image accurately represents the given instruction. In contrast, Score Eval has a broader scope, evaluating coherence and alignment across multiple inputs and outputs, including both text and images.

Eval Image Instruction assesses instruction-image accuracy, whereas Score Eval examines overall coherence and adherence to instructions. Use Eval Image Instruction when precise image representation is the main concern, and Score Eval for complex scenarios involving multiple modalities, where comprehensive alignment and coherence matter.