Definition

Evaluates an output deterministically by checking it against specific rules or patterns. The evaluation is versatile: it can be applied across multiple modalities, including text, images, conversations, and custom outputs. It verifies that the generated content adheres to predefined rules, formats, or expected patterns.


Calculation

The evaluation process utilises a structured rule-based system to validate inputs against predefined criteria. It begins with rule configuration, where the evaluator accepts a rule prompt that defines expected patterns, formats, and validation choices. The system supports both single-choice and multi-choice validation, allowing for custom input rules and specific choice definitions.
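
The rule configuration described above can be pictured as a small validated record. The sketch below is illustrative only: `RuleConfig`, its field names, and the `{{name}}` placeholder syntax are assumptions made for this example, not the evaluator's actual interface.

```python
import re
from dataclasses import dataclass
from typing import Set, Tuple


@dataclass(frozen=True)
class RuleConfig:
    """Hypothetical container for a rule; not a real SDK class."""

    rule_prompt: str            # text describing the expected pattern or format
    choices: Tuple[str, ...]    # the options the evaluator may return
    multi_choice: bool = False  # True when several choices may apply at once

    def __post_init__(self) -> None:
        # Reject configurations that could never produce a meaningful verdict.
        if not self.rule_prompt.strip():
            raise ValueError("rule_prompt must not be empty")
        if len(self.choices) < 2:
            raise ValueError("at least two choices are required")
        if len(set(self.choices)) != len(self.choices):
            raise ValueError("choices must be unique")

    def placeholders(self) -> Set[str]:
        # Assumed syntax: inputs are referenced in the prompt as {{name}}.
        return set(re.findall(r"\{\{(\w+)\}\}", self.rule_prompt))


config = RuleConfig(
    rule_prompt="Is {{output}} a valid ISO 8601 date?",
    choices=("Yes", "No"),
)
```

Validating the configuration up front means a malformed rule fails at setup time rather than surfacing later as an unexpected option.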

Once configured, the evaluation process progresses through multiple stages. In the input processing phase, the system verifies that the input adheres to the rule prompt specifications, checks for predefined choice patterns, and correctly handles both single and multi-choice scenarios. The pattern matching stage further refines validation based on input type: text is checked for exact matches and format compliance, images are validated against specified visual criteria, conversations are analysed to ensure responses align with structured rules, and custom types follow user-defined validation methods.
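
As a rough illustration of type-dependent pattern matching, the sketch below dispatches on the input type and applies a format check. All function names and the `VALIDATORS` registry are hypothetical, and the conversation format (a list of `role`/`content` dictionaries) is an assumption. Image validation is left out because it depends on visual criteria that a regular expression cannot express.

```python
import re
from typing import Callable, Dict, List


def check_text(value: str, pattern: str) -> bool:
    """Format compliance: the whole string must match the regular expression."""
    return re.fullmatch(pattern, value) is not None


def check_conversation(turns: List[dict], pattern: str) -> bool:
    """Every assistant turn must satisfy the same text rule."""
    replies = [t["content"] for t in turns if t.get("role") == "assistant"]
    return bool(replies) and all(check_text(r, pattern) for r in replies)


# Custom input types would register their own validator here.
VALIDATORS: Dict[str, Callable[..., bool]] = {
    "text": check_text,
    "conversation": check_conversation,
}


def validate(input_type: str, value, pattern: str) -> bool:
    """Route the value to the validator registered for its input type."""
    try:
        validator = VALIDATORS[input_type]
    except KeyError:
        raise ValueError(f"no validator registered for {input_type!r}") from None
    return validator(value, pattern)


ISO_DATE = r"\d{4}-\d{2}-\d{2}"
is_valid = validate("text", "2024-05-01", ISO_DATE)
```

Using `re.fullmatch` rather than `re.search` keeps the check strict: trailing or leading text around an otherwise valid value counts as a format violation.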

For scoring, the system applies binary scoring (Pass/Fail) for single-choice validation, weighted scoring for multi-choice evaluations, and custom scoring logic based on rule prompt specifications. The final step involves output generation, where results are returned in a structured format. The system provides detailed validation insights, explanations for any rule violations, and mappings to predefined choice options, ensuring transparency in evaluation outcomes.
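
The two scoring modes can be sketched as follows. This is a minimal illustration, not the evaluator's actual scoring code: the function names are hypothetical, and normalising the multi-choice score by the total weight is an assumption, since the weighting formula is not specified here.

```python
from typing import Dict, Iterable, List


def score_single(selected: str, expected: str) -> float:
    """Binary scoring: 1.0 for Pass, 0.0 for Fail."""
    return 1.0 if selected == expected else 0.0


def score_multi(selected: Iterable[str], weights: Dict[str, float]) -> float:
    """Weighted scoring: share of the total weight covered by the selection.

    Assumption: the score is normalised to the range 0..1.
    """
    chosen = set(selected)
    unknown = chosen - set(weights)
    if unknown:
        raise ValueError(f"choices not defined in the rule: {sorted(unknown)}")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(weights[c] for c in chosen) / total


def build_result(score: float, selected: List[str], reason: str) -> dict:
    """Structured output: the score, the mapped choices, and an explanation."""
    return {"score": score, "choices": sorted(selected), "reason": reason}


weights = {"cites_source": 2.0, "polite": 1.0, "concise": 1.0}
picked = ["cites_source", "polite"]
result = build_result(score_multi(picked, weights), picked,
                      "Response is not concise.")
# result["score"] is 0.75, because 3.0 of the 4.0 total weight was matched.
```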


What To Do When Deterministic Eval Does Not Return the Expected Option

  • Rule Refinement:
    • Review and clarify rule prompt definitions
    • Adjust pattern matching criteria
    • Update choice options if too restrictive
  • Input Validation:
    • Check input formatting
    • Verify rule string compatibility
    • Ensure choice options are comprehensive
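
Several of the checks above can be automated before re-running an evaluation. The helper below is a hypothetical sketch: `diagnose` is not part of any SDK, and it assumes that the rule string references inputs with `{{name}}` placeholders.

```python
import re
from typing import Dict, List, Sequence


def diagnose(rule_prompt: str, inputs: Dict[str, str],
             choices: Sequence[str], returned: str) -> List[str]:
    """Return human-readable reasons why an eval may have produced an odd option."""
    problems = []

    # Rule string compatibility: every placeholder needs a matching input.
    wanted = set(re.findall(r"\{\{(\w+)\}\}", rule_prompt))
    missing = sorted(wanted - set(inputs))
    if missing:
        problems.append(f"rule prompt references missing inputs: {missing}")

    # Input formatting: blank values give the rule nothing to check.
    blank = sorted(k for k in wanted & set(inputs) if not inputs[k].strip())
    if blank:
        problems.append(f"inputs are empty: {blank}")

    # Choice coverage: the returned option should be one of the defined choices.
    if returned not in choices:
        problems.append(f"returned option {returned!r} is not among the choices")

    return problems


issues = diagnose(
    rule_prompt="Is {{output}} consistent with {{context}}?",
    inputs={"output": "Paris is the capital of France."},
    choices=["Yes", "No"],
    returned="Maybe",
)
for issue in issues:
    print(issue)
```

An empty list means none of these three checks found a problem, which points back to the rule prompt wording itself as the next thing to refine.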

Comparing Deterministic Eval with Similar Evals

  1. Content Moderation: Content Moderation focuses on safety and appropriateness, whereas Deterministic Eval verifies pattern compliance and rule adherence.
  2. Prompt Perplexity: Prompt Perplexity measures a model’s understanding and confidence through perplexity calculations, which makes it useful for assessing comprehension and response certainty. Deterministic Eval instead follows a structured classification framework with explicit rules and criteria, ensuring strict adherence to predefined standards.