Definition

Scores the linkage between textual instructions and the resulting image, judged against the evaluation criteria you specify. A high score indicates strong alignment between the instructions and the image; a low score indicates discrepancies or misalignment.


Calculation

The evaluation begins by configuring the inputs: the instructions, the URL of the image to assess, and the evaluation criteria that guide the analysis. A linkage analysis then assesses how consistent and relevant the image is with respect to the instructions. Finally, a score is assigned from that analysis and compared against the predefined criteria to determine whether the image meets the expected standard.
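The three steps above (configure, analyze, score and compare) can be sketched in Python. This is an illustrative outline only, not the product's API: every name here (ImageInstructionInput, evaluate, placeholder_scorer), the [0, 1] score range, and the numeric pass threshold are assumptions made for the example, and the scorer is a stand-in for whatever judge model performs the real linkage analysis.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ImageInstructionInput:
    # Field names are hypothetical, chosen to mirror the inputs described above.
    instructions: str
    image_url: str
    criteria: str


@dataclass
class EvalResult:
    score: float
    passed: bool


# A scorer is any callable that maps the configured input to a score.
# The [0, 1] range is an assumption of this sketch.
Scorer = Callable[[ImageInstructionInput], float]


def evaluate(item: ImageInstructionInput, scorer: Scorer,
             threshold: float = 0.5) -> EvalResult:
    """Delegate the linkage analysis to `scorer`, then compare to a threshold."""
    score = scorer(item)
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    return EvalResult(score=score, passed=score >= threshold)


def placeholder_scorer(item: ImageInstructionInput) -> float:
    # Stand-in for a real judge model: always returns the same fixed score.
    return 0.8


example = ImageInstructionInput(
    instructions="A red bicycle leaning against a brick wall",
    image_url="https://example.com/bicycle.png",
    criteria="The image shows every object and color named in the instructions",
)
result = evaluate(example, scorer=placeholder_scorer, threshold=0.7)
print(result)
```

Passing the scorer in as a callable keeps the configure and compare steps testable without access to a model; a real setup would replace placeholder_scorer with a call to the actual evaluator.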


What to do if Eval Image Instruction has a Low Score

First, review the evaluation criteria to confirm they are clearly defined and match the intended assessment goals, and adjust them if they are incomplete or not relevant. Next, compare the instructions and the image in detail to identify where they diverge. Then address each discrepancy, either by revising the instructions or by improving the image generation process.
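One way to locate discrepancies is to score each criterion separately and look at the weakest ones first. The sketch below assumes you have rerun the evaluation once per criterion and collected the scores yourself; the source does not state that the eval returns a per-criterion breakdown, and the function name, criterion labels, scores, and threshold are all hypothetical.

```python
def weakest_criteria(scores: dict[str, float],
                     threshold: float) -> list[tuple[str, float]]:
    """Return (criterion, score) pairs below `threshold`, lowest score first."""
    failing = [(name, score) for name, score in scores.items() if score < threshold]
    return sorted(failing, key=lambda pair: pair[1])


# Hypothetical per-criterion scores for one instruction/image pair.
per_criterion = {
    "objects present": 0.9,
    "colors match": 0.4,
    "layout matches": 0.6,
}

for name, score in weakest_criteria(per_criterion, threshold=0.7):
    print(f"{name}: {score}")
```

Criteria that surface here are the ones to revisit, either by rewording the corresponding part of the instructions or by adjusting the image generation step.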


Differentiating Eval Image Instruction from Score Eval

Eval Image Instruction focuses specifically on the alignment between textual instructions and an image, checking that the generated image accurately represents the given instructions. In contrast, Score Eval has a broader scope, evaluating coherence and alignment across multiple inputs and outputs, including both text and images.

Eval Image Instruction is the better choice when precise image representation is the main concern, while Score Eval is better suited to complex scenarios that involve multiple modalities and require comprehensive alignment and coherence.