Evaluation Using Interface

Input:

  • Configuration Parameters:
    • Input: The content generated by the model/system that needs to be evaluated against the rules.
    • Rule Prompt: A string defining the specific rules, patterns, or criteria the Input must adhere to. Double-curly braces such as {{column_name}} are substituted with the actual input data from column column_name during evaluation (see the sketch after this list).
    • Choices: A list of predefined options or categories that the evaluation selects from, guided by the Rule Prompt.
    • Multi Choice: A boolean (true/false) indicating whether the evaluation may return multiple matching options from Choices (true) or exactly one (false).
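
To make the {{column_name}} substitution concrete, here is a minimal Python sketch of the templating behaviour. The render_rule_prompt helper is purely illustrative and not part of the product; the platform performs this substitution for you during evaluation.

import re

def render_rule_prompt(rule_prompt: str, row: dict) -> str:
    # Replace each {{column_name}} placeholder with the value from `row`.
    return re.sub(r"\{\{(\w+)\}\}", lambda m: str(row[m.group(1)]), rule_prompt)

rule_prompt = "context : {{context}}, question : {{question}}. Choose Pass or Fail."
row = {
    "context": "Paris is the capital of France.",
    "question": "What is the capital of France?",
}

print(render_rule_prompt(rule_prompt, row))
# context : Paris is the capital of France., question : What is the capital of France?. Choose Pass or Fail.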

Output:

  • The result is the choice (or set of choices, when Multi Choice is enabled) from the user-provided Choices that reflects the Input’s adherence to the deterministic criteria.

Evaluation Using Python SDK

Click here to learn how to set up evaluation using the Python SDK.


| Input Type | Parameter | Type | Description |
| --- | --- | --- | --- |
| Configuration Parameters | input | string | The actual output or content generated by the model/system that needs to be evaluated against the rules. |
| | rule_prompt | string | A string defining the specific rules, patterns, or criteria the input must adhere to. Double-curly braces like {{column_name}} are substituted with the actual input data from column column_name during evaluation. |
| | choices | list[string] | A list of predefined options or categories the evaluation selects from. |
| | multi_choice | bool | If true, the evaluation may return multiple matching choices; if false, it returns a single choice based on the rule_prompt. |

| Output | Type | Description |
| --- | --- | --- |
| Result | string / list[string] | The matching choice(s) selected according to the rule_prompt. |

from fi.testcases import MLLMTestCase
from fi.evals import Deterministic

class DeterministicTestCase(MLLMTestCase):
    context: str
    question: str

deterministic_eval = Deterministic(config={
    "multi_choice": False,
    "choices": ["Pass", "Fail"],
    "rule_prompt": "context : {{input_key1}}, question : {{input_key2}}. Given the context and question, choose Pass if the question is grammatically correct, well-structured, and free of errors; choose Fail otherwise",
        "input": {
        "input_key1": "context",
        "input_key2": "question",
    }
})

# `dataset` is assumed to be a pandas DataFrame with `context` and `question`
# columns, and `evaluator` an initialized evaluation client (see the sketch
# after this block).
for index, row in dataset.iterrows():
    test_case = DeterministicTestCase(
        context=row["context"],
        question=row["question"],
    )
    result = evaluator.evaluate([deterministic_eval], [test_case])
    option = result.eval_results[0].metrics[0].value  # the selected choice, e.g. "Pass"
    reason = result.eval_results[0].reason            # explanation for the selection
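
The loop above assumes dataset and evaluator already exist. A minimal setup sketch follows; the Evaluator class name and its no-argument constructor are assumptions, so adjust the client initialization to match the SDK setup guide linked above.

import pandas as pd
from fi.evals import Evaluator  # assumed client class; name may differ by SDK version

# A toy dataset with the two columns referenced by the test case.
dataset = pd.DataFrame({
    "context": ["Paris is the capital of France."],
    "question": ["What is the capital of France?"],
})

# Credentials are typically supplied via environment variables or keyword
# arguments (assumed); consult the SDK setup guide for the exact signature.
evaluator = Evaluator()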


What To Do When Deterministic Eval Does Not Return the Expected Option

  • Rule Refinement (see the sketch after this list):
    • Review and clarify rule prompt definitions
    • Adjust pattern matching criteria
    • Update choice options if too restrictive
  • Input Validation:
    • Check input formatting
    • Verify rule string compatibility
    • Ensure choice options are comprehensive
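
As an illustration of rule refinement, the variant below restates the config from the earlier example with a more explicit rule prompt and an additional escape-hatch choice. The values are examples to adapt, not prescriptions.

from fi.evals import Deterministic

refined_eval = Deterministic(config={
    "multi_choice": False,
    # An "Unclear" option keeps ambiguous inputs from being forced into Pass/Fail.
    "choices": ["Pass", "Fail", "Unclear"],
    # Spell out the criteria for every choice so the selection is predictable.
    "rule_prompt": (
        "context : {{input_key1}}, question : {{input_key2}}. "
        "Choose Pass only if the question is grammatically correct and well-structured. "
        "Choose Fail if it contains grammatical or structural errors. "
        "Choose Unclear if the question cannot be judged from the context."
    ),
    "input": {
        "input_key1": "context",
        "input_key2": "question",
    },
})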

Comparing Deterministic Eval with Similar Evals

  1. Content Moderation: While Content Moderation focuses on safety and appropriateness, Deterministic Evals verify pattern compliance and rule adherence.
  2. Prompt Perplexity: Prompt Perplexity measures a model’s understanding and confidence through perplexity calculations, making it useful for assessing comprehension and response certainty, whereas Deterministic Eval follows a structured classification framework with explicit rules and criteria, ensuring strict adherence to predefined standards.