Definition

The Grading Criteria evaluation assesses a response against custom-defined grading criteria. It uses a language model to determine whether the response meets the conditions those criteria specify.

A Passed result indicates that the response meets the criteria, while a Failed result indicates that the response does not satisfy the specified conditions.


Calculation

The calculation proceeds in three phases. It begins with configuration setup, where grading conditions are established to define what counts as passing and failing, and an appropriate language model is selected for the evaluation. Next, during response analysis, the system assesses the response against the grading criteria and identifies any elements that trigger a fail condition. Finally, in the result generation phase, the evaluation produces a binary Pass/Fail output that indicates whether the response adheres to the established criteria.
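
The three phases can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the product's implementation: GradingConfig, build_prompt, evaluate, and the PASS/FAIL prompt format are hypothetical names chosen for the example, and stub_judge is a keyword stand-in for a real language model call.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical judge signature: takes a prompt, returns the model's raw text.
Judge = Callable[[str], str]


@dataclass
class GradingConfig:
    """Configuration setup: the grading conditions plus the selected model."""
    criteria: str  # natural-language pass/fail conditions
    judge: Judge   # stand-in for the selected language model


def build_prompt(criteria: str, response: str) -> str:
    # Assumed prompt layout; a real evaluator may phrase this differently.
    return (
        "Grade the response against the criteria.\n"
        f"Criteria: {criteria}\n"
        f"Response: {response}\n"
        "Answer with exactly one word: PASS or FAIL."
    )


def evaluate(config: GradingConfig, response: str) -> str:
    # Response analysis: ask the judge to assess the response.
    verdict = config.judge(build_prompt(config.criteria, response))
    # Result generation: collapse the judge's text into a binary outcome.
    return "Passed" if verdict.strip().upper().startswith("PASS") else "Failed"


def stub_judge(prompt: str) -> str:
    # Demonstration only: looks for a phrase in the response line of the prompt.
    response_line = next(
        line for line in prompt.splitlines() if line.startswith("Response: ")
    )
    return "PASS" if "thank you" in response_line.lower() else "FAIL"


config = GradingConfig(criteria="The response must thank the user.", judge=stub_judge)
result_ok = evaluate(config, "Thank you for reaching out!")
result_bad = evaluate(config, "Your ticket is closed.")
```

Passing the judge in as a callable keeps the sketch runnable without network access. In practice, the stub would be replaced by a call to whichever model was selected during configuration.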


What to do when Grading Criteria Fails

If the evaluation fails, start with a criteria review by reassessing the grading conditions to ensure they are clear, appropriate, and aligned with the evaluation objectives. Verify that the criteria comprehensively cover the expected response structure. Next, conduct a response analysis to identify the specific elements that triggered the fail condition. If necessary, refine the response to better meet the grading standards or adjust the criteria to allow for more precise and fair evaluation.
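
One way to find the elements that triggered a fail is to split a compound criterion into individual conditions and grade each one separately. The sketch below assumes such a split is possible. failing_criteria and the Grader signature are hypothetical, and stub_grade is a keyword stand-in for a model-based grader.

```python
from typing import Callable, List

# Hypothetical grader: (criterion, response) -> True when the criterion is met.
Grader = Callable[[str, str], bool]


def failing_criteria(criteria: List[str], response: str, grade: Grader) -> List[str]:
    """Return the criteria the response does not meet, in their original order."""
    return [c for c in criteria if not grade(c, response)]


def stub_grade(criterion: str, response: str) -> bool:
    # Demonstration only: maps each criterion to a keyword it requires.
    required = {"mentions a refund": "refund", "includes an apology": "sorry"}
    return required[criterion] in response.lower()


criteria = ["mentions a refund", "includes an apology"]
failed = failing_criteria(criteria, "We have issued your refund.", stub_grade)
print(failed)
```

A non-empty list points to the conditions to review first, whether the fix is to revise the response or to reword a criterion that is too strict or ambiguous.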


Differentiating Grading Criteria from Deterministic Eval

Grading Criteria allows custom conditions and model-based evaluation, which makes it adaptable, while Deterministic Evaluation follows fixed rules, which ensures consistency at the cost of adaptability.

Use cases also differ: Grading Criteria suits subjective assessments that require nuanced judgment, whereas Deterministic Evaluation is ideal for cases that demand strict rule adherence.

The outputs differ as well: Grading Criteria returns a Pass/Fail result based on the evaluation conditions, whereas Deterministic Evaluation delivers consistent outputs according to predefined logic.
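
To make the contrast concrete, here is a hedged sketch of the deterministic side next to a condition that has no fixed rule. deterministic_eval and GRADING_CRITERIA are illustrative names, and the JSON rule is an assumed example rather than a check taken from the product.

```python
import json


def deterministic_eval(response: str) -> bool:
    """Fixed rule: the response must be a JSON object with a 'status' key."""
    try:
        data = json.loads(response)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and "status" in data


# A subjective condition like this cannot be reduced to a fixed rule,
# so it would be handed to a language model as a grading criterion.
GRADING_CRITERIA = "The response is polite and addresses the user's concern."

rule_pass = deterministic_eval('{"status": "ok"}')
rule_fail = deterministic_eval("Status is ok")
```

The rule returns the same answer for the same input every time, which is the consistency described above. The criterion string depends on model judgment, which is where the adaptability comes from.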