Eval Definition
Deterministic Eval
Evaluates an output against specific rules or patterns to determine whether it conforms to them. This evaluation is versatile: it can be applied across multiple modalities, including text, images, conversations, and custom outputs. It verifies whether the generated content adheres to predefined rules, formats, or expected patterns.
Evaluation Using Interface
Input:
- Configuration Parameters:
  - Input: The content generated by the model/system that needs to be evaluated against the rules.
  - Rule Prompt: A string defining the specific rules, patterns, or criteria the Input must adhere to. You can use double curly braces like {{column_name}}, which are substituted with the actual input data from the column column_name during evaluation.
  - Choices: A list of predefined options or categories. If Multi Choice is enabled, the evaluation checks whether the Input matches one of these choices based on the Rule Prompt.
  - Multi Choice: A boolean (true/false) indicating whether the evaluation selects from the predefined Choices (true) or simply evaluates the Input against the Rule Prompt (false).
Output:
- The result is the choice or choices, taken from the options provided by the user, that reflect the output’s adherence to the deterministic criteria.
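To make the {{column_name}} substitution concrete, here is a minimal Python sketch of how a Rule Prompt might be rendered against one row of data. The function name render_rule_prompt, the regular expression, and the sample row are illustrative assumptions, not the platform's actual implementation.

```python
import re

# Matches {{column_name}}, allowing optional spaces inside the braces (an assumption).
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_rule_prompt(rule_prompt: str, row: dict) -> str:
    """Replace each {{column_name}} with the value of that column in `row`."""

    def substitute(match: re.Match) -> str:
        column = match.group(1)
        if column not in row:
            # Fail loudly instead of leaving an unresolved placeholder in the prompt.
            raise KeyError(f"rule prompt references unknown column: {column}")
        return str(row[column])

    return PLACEHOLDER.sub(substitute, rule_prompt)


# Hypothetical row of input data keyed by column name.
row = {"response": "The capital of France is Paris.", "expected_city": "Paris"}
prompt = "Check whether {{response}} names {{expected_city}} as the capital."
rendered = render_rule_prompt(prompt, row)
print(rendered)
```

Raising on an unknown column is a design choice in this sketch: an unresolved placeholder would otherwise reach the evaluator as literal text and could produce a misleading result.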
Evaluation Using Python SDK
Click here to learn how to set up evaluation using the Python SDK.
Input Type | Parameter | Type | Description |
---|---|---|---|
Configuration Parameters | input | string | The actual output or content generated by the model/system that needs to be evaluated against the rules. |
Configuration Parameters | rule_prompt | string | A string defining the specific rules, patterns, or criteria the input must adhere to. You can use double curly braces like {{column_name}}, which are substituted with the actual input data from the column column_name during evaluation. |
Configuration Parameters | choices | list[string] | A list of predefined options or categories. Used when multi_choice is true. |
Configuration Parameters | multi_choice | bool | If true, evaluates whether the input matches one of the choices based on the rule_prompt. If false, evaluates the input against the rule_prompt. |
Output | Type | Description |
---|---|---|
Result | string / list[string] | Returns the matching choice(s) |
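As a rough illustration of how these parameters fit together, the sketch below models the configuration as a plain Python dataclass and normalizes a result to a list. DeterministicEvalConfig and normalize_result are hypothetical names introduced here for illustration only; they are not part of the SDK, so refer to the SDK documentation for the real classes and calls.

```python
from dataclasses import asdict, dataclass, field
from typing import List, Union


@dataclass
class DeterministicEvalConfig:
    """Hypothetical container mirroring the four configuration parameters."""

    input: str
    rule_prompt: str
    choices: List[str] = field(default_factory=list)
    multi_choice: bool = False


def normalize_result(result: Union[str, List[str]]) -> List[str]:
    """The result may be a string or a list of strings; always return a list."""
    return [result] if isinstance(result, str) else list(result)


# Example values are made up for illustration.
config = DeterministicEvalConfig(
    input="Dear Sir or Madam, thank you for your enquiry.",
    rule_prompt="Classify the tone of the input as one of the given choices.",
    choices=["formal", "casual", "neutral"],
    multi_choice=True,
)

# Plain dict with the parameter names from the table as keys.
config_dict = asdict(config)
print(config_dict)
```

Normalizing the result to a list lets downstream code treat the string and list[string] cases the same way.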
What To Do When Deterministic Eval Does Not Return Expected Option
- Rule Refinement:
  - Review and clarify rule prompt definitions
  - Adjust pattern matching criteria
  - Update choice options if too restrictive
- Input Validation:
  - Check input formatting
  - Verify rule string compatibility
  - Ensure choice options are comprehensive
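Several of these checks can be automated before an eval is run. The following sketch uses a hypothetical helper, find_config_problems, to flag placeholders that reference missing columns, an empty choice list when multi_choice is true, and choices that collide after trimming and lowercasing. The specific checks are assumptions about common failure modes, not behavior documented for the eval itself.

```python
import re
from typing import Iterable, List


def find_config_problems(
    rule_prompt: str,
    choices: List[str],
    multi_choice: bool,
    columns: Iterable[str],
) -> List[str]:
    """Return human-readable descriptions of likely configuration problems."""
    problems = []

    # Every {{column_name}} in the rule prompt should exist in the data.
    referenced = set(re.findall(r"\{\{\s*(\w+)\s*\}\}", rule_prompt))
    missing = sorted(referenced - set(columns))
    if missing:
        problems.append(f"rule prompt references missing columns: {missing}")

    # Choices are required when the eval selects from predefined options.
    if multi_choice and not choices:
        problems.append("multi_choice is true but no choices were given")

    # Near-duplicate choices make the returned option ambiguous.
    cleaned = [choice.strip().lower() for choice in choices]
    if len(set(cleaned)) != len(cleaned):
        problems.append("choices contain duplicates after trimming and lowercasing")

    return problems


# Hypothetical misconfigured eval: wrong column name and no choices.
problems = find_config_problems(
    rule_prompt="Is {{answer}} polite?",
    choices=[],
    multi_choice=True,
    columns=["response"],
)
for problem in problems:
    print(problem)
```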
Comparing Deterministic Eval with Similar Evals
- Content Moderation: While Content Moderation focuses on safety and appropriateness, Deterministic Evals verify pattern compliance and rule adherence.
- Prompt Perplexity: Prompt Perplexity measures a model’s understanding and confidence through perplexity calculations, making it useful for assessing comprehension and response certainty, whereas Deterministic Eval follows a structured classification framework with explicit rules and criteria, ensuring strict adherence to predefined standards.