Evaluation Using Interface

Input:

  • Required Inputs:
    • input: The instruction or textual description column associated with the image (e.g., “A vibrant sunrise over a mountain”).
    • image_url: The URL column of the image being evaluated.
  • Configuration Parameters:
    • criteria: The evaluation standard that defines how the alignment is measured (e.g., colour accuracy, object representation, or stylistic features).

Output:

  • Score: A percentage score between 0 and 100

Interpretation:

  • Higher scores: Indicate strong alignment between the instructions and the image based on the specified criteria.
  • Lower scores: Suggest discrepancies or misalignment between the instructions and the image.

Evaluation Using Python SDK

Click here to learn how to set up evaluation using the Python SDK.


Input Type                 Parameter   Type     Description
Required Inputs            input       string   The instruction or textual description associated with the image.
                           image_url   string   The URL of the image being evaluated.
Configuration Parameters   criteria    string   The evaluation standard that defines how the alignment is measured.

Output   Type    Description
Score    float   Returns a score between 0 and 1, where higher values indicate better alignment.

from fi.evals import EvalClient
from fi.evals.templates import ImageInstruction
from fi.testcases import MLLMTestCase

# Define the test case: the textual instruction and the image to evaluate against it
test_case = MLLMTestCase(
    input="A serene beach landscape photo taken from a wooden boardwalk",
    image_url="https://example.com/beach_photo.jpg"
)

# Configure the evaluation criteria that define how alignment is measured
template = ImageInstruction(
    config={
        "criteria": """
        Evaluate the image based on:
        1. Instruction clarity and specificity
        2. Image composition alignment
        3. Scene elements accuracy
        4. Overall visual quality
        """
    }
)

# Initialise the evaluation client with your credentials
evaluator = EvalClient(
    fi_api_key="your_api_key",
    fi_secret_key="your_secret_key",
    fi_base_url="https://api.futureagi.com"
)

# Run the evaluation, then extract the score and reasoning from the response
response = evaluator.evaluate(eval_templates=[template], inputs=[test_case])

score = response.eval_results[0].metrics[0].value
reason = response.eval_results[0].reason

print(f"Evaluation Score: {score}")
print(f"Evaluation Reason: {reason}")


What to do if Eval Image Instruction has a Low Score

Start by reviewing the evaluation criteria to confirm they are clearly defined and aligned with the intended assessment goals; adjust them if they need to be more comprehensive or relevant. Then analyse the instruction and the image together to locate specific discrepancies or misalignments. Resolve these either by refining the instructions or by improving the image generation process until the two are consistent.
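As an illustration, one way to check whether vague criteria are depressing the score is to re-run the same test case with a more specific criteria string and compare results. This is a minimal sketch reusing the evaluator and test_case from the example above; the refined wording is illustrative, not prescriptive.

# Re-run the same test case with more specific criteria to see whether
# vague wording was the cause of the low score (illustrative criteria only).
refined_template = ImageInstruction(
    config={
        "criteria": """
        Check that the image shows:
        1. A beach scene photographed from a wooden boardwalk
        2. A calm, serene mood (soft light, no crowds)
        3. Realistic photographic quality rather than illustration
        """
    }
)

refined_response = evaluator.evaluate(
    eval_templates=[refined_template], inputs=[test_case]
)
print(refined_response.eval_results[0].metrics[0].value)
print(refined_response.eval_results[0].reason)

If the score improves with tighter criteria, the original criteria were likely too vague; if it stays low, the mismatch is more likely in the instruction or the image itself.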


Differentiating Eval Image Instruction with Score Eval

Eval Image Instruction focuses specifically on assessing the alignment between a textual instruction and an image, ensuring that the generated image accurately represents the given instruction. In contrast, Score Eval has a broader scope, evaluating coherence and alignment across multiple inputs and outputs, including both text and images.

Eval Image Instruction assesses instruction-image accuracy, whereas Score Eval examines overall coherence and adherence to instructions. Use Eval Image Instruction when precise image representation is the main concern, and Score Eval for complex scenarios involving multiple modalities, where comprehensive alignment and coherence matter.