Definition

Score Eval scores the linkage between instructions, input images, and output images. It checks whether the output images accurately reflect the instructions and input images under the defined evaluation criteria. A high score indicates strong alignment and coherence; a low score indicates discrepancies or misalignment.


Calculation

The evaluation runs in three phases. Configuration setup defines the rule prompt, the input, and the evaluation criteria that guide the assessment. Linkage analysis then assesses how consistent and relevant the output images are to the instructions and input images under those criteria. Finally, result generation assigns a score based on the linkage analysis and compares it against the predefined criteria to determine whether the output meets the expected standard.
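The three phases can be pictured with a minimal Python sketch. This is an illustration under stated assumptions, not the product's actual API: the names `ScoreEvalConfig`, `ScoreEvalInput`, `ScoreEvalResult`, `run_score_eval`, and the `judge` callable are all hypothetical, as are the 0 to 1 score range and the `pass_threshold` field. The linkage analysis itself is represented by a pluggable function, since the real assessment is performed by the evaluator.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class ScoreEvalConfig:
    """Phase 1: configuration (hypothetical field names)."""
    rule_prompt: str              # prompt describing what to assess
    criteria: str                 # evaluation criteria guiding the assessment
    pass_threshold: float = 0.5   # assumed cutoff; not specified by the source


@dataclass(frozen=True)
class ScoreEvalInput:
    instructions: str
    input_images: Sequence[str]   # e.g. paths or URLs
    output_images: Sequence[str]


@dataclass(frozen=True)
class ScoreEvalResult:
    score: float
    passed: bool


# Phase 2 is delegated to a judge: any callable that returns a raw
# alignment score for the given configuration and sample.
Judge = Callable[[ScoreEvalConfig, ScoreEvalInput], float]


def run_score_eval(config: ScoreEvalConfig,
                   sample: ScoreEvalInput,
                   judge: Judge) -> ScoreEvalResult:
    # Phase 2: linkage analysis (performed by the supplied judge).
    raw_score = judge(config, sample)
    # Phase 3: result generation. Clamp to the assumed [0, 1] range,
    # then compare against the assumed threshold.
    score = min(1.0, max(0.0, raw_score))
    return ScoreEvalResult(score=score, passed=score >= config.pass_threshold)


if __name__ == "__main__":
    config = ScoreEvalConfig(
        rule_prompt="Check that the output follows the instructions.",
        criteria="Output must apply the requested edit to the input image.",
        pass_threshold=0.7,
    )
    sample = ScoreEvalInput(
        instructions="Make the sky red.",
        input_images=["input.png"],
        output_images=["output.png"],
    )
    # A constant stand-in judge, used only to show the control flow.
    result = run_score_eval(config, sample, judge=lambda c, s: 0.9)
    print(result)
```

Keeping the judge as a parameter separates the configuration and thresholding logic from the assessment itself, so the same flow can be exercised with a stub before wiring in a real evaluator.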


What to Do If Score Eval Gives a Low Score

The evaluation criteria should be reassessed to ensure they are clearly defined and aligned with the intended evaluation goals. Adjustments may be necessary to make the criteria more comprehensive and relevant.

Additionally, examining the output images for alignment with instructions and input images can help identify discrepancies or misalignments.

Refining the instructions or improving the image generation process can enhance the overall evaluation outcome.


Differentiating Score Eval from Eval Image Instruction

Eval Image Instruction focuses specifically on the alignment between textual instructions and a generated image, checking that the image accurately represents the given instructions. In contrast, Score Eval has a broader scope, evaluating coherence and alignment across multiple inputs and outputs, including both text and images.

Eval Image Instruction is ideal when precise representation of the instructions in the image is the main concern. Score Eval is better suited to complex scenarios involving multiple modalities, where alignment and coherence must hold across instructions, input images, and output images together.