In most AI applications, predictable behaviour is essential for maintaining reliability, consistency, and system integrity. Deterministic evaluations help verify that AI models operate within expected constraints, minimising inconsistencies and unintended variability.
Future AGI provides Deterministic Eval, which evaluates outputs by comparing them against predefined rules or expected patterns. It checks whether:
- The output consistently adheres to a given rule prompt.
- It meets the structure or format defined by the evaluation (e.g., multiple-choice validation).
- Variability in output is minimised when the same input is provided.
Key Features of Deterministic Eval:
- Rule-Based Validation: Uses a customisable rule prompt to define the criteria for evaluating outputs.
- Multiple Choice Handling: Supports evaluation for tasks that involve multiple-choice questions.
- Customisable Inputs: Works with flexible input types and can be adapted for various AI-generated outputs, such as text, conversation, or image.
a. Using Interface
Required Parameters
- Input: The input to be evaluated.
- Choices: A set of predefined options for multiple-choice questions (if applicable).
- Rule Prompt: A rule or set of conditions that the output must meet (e.g., “Output must include X if input includes Y”).
- MultiChoice: A Boolean value indicating whether the output should be treated as a multiple-choice question.
The result is one of the user-provided choices, indicating the output's adherence to the deterministic criteria.
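When MultiChoice is enabled, the evaluation selects from the full set of user-defined choices rather than a binary pair. The config below is a minimal sketch of such a setup, assuming the same config schema used in the SDK examples in the next section; the choice labels and rule prompt are illustrative only.
deterministic_eval = Deterministic(config={
    "multi_choice": True,  # treat the evaluation as a multiple-choice question
    "choices": ["Excellent", "Good", "Poor"],  # illustrative labels, not a fixed set
    "rule_prompt": "answer : {{input_key1}}. Rate the grammatical quality of the answer as Excellent, Good, or Poor",
    "input": {
        "input_key1": "answer",  # binds the placeholder to a test-case field
    }
})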
b. Using SDK
i. For Text Use Case
import pandas as pd

from fi.evals import Deterministic
from fi.testcases import MLLMTestCase

# Define the fields each test case carries; the rule prompt's
# {{input_key...}} placeholders are bound to these attributes.
class DeterministicTestCase(MLLMTestCase):
    context: str
    question: str

deterministic_eval = Deterministic(config={
    "multi_choice": False,
    "choices": ["Pass", "Fail"],
    "rule_prompt": "context : {{input_key1}}, question : {{input_key2}}. Given the context and question, choose Pass if the question is grammatically correct, well-structured, and free of errors; choose Fail otherwise",
    "input": {
        "input_key1": "context",
        "input_key2": "question",
    }
})

complete_result = {}
options = []
reasons = []

# `dataset` is assumed to be a pandas DataFrame with "context" and
# "question" columns; `evaluator` is an initialised eval client
# (see the sketch after this snippet).
for index, row in dataset.iterrows():
    test_case = DeterministicTestCase(
        context=row["context"],
        question=row["question"]
    )
    result = evaluator.evaluate([deterministic_eval], [test_case])
    option = result.eval_results[0].metrics[0].value  # the chosen label
    reason = result.eval_results[0].reason            # justification for the choice
    options.append(option)
    reasons.append(reason)

complete_result["Error-Eval-Rating"] = options
complete_result["Error-Eval-Reason"] = reasons
result_df = pd.DataFrame(complete_result)
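The loop above assumes an already-initialised evaluator. A minimal sketch of creating one, assuming the EvalClient class from fi.evals and credentials supplied via environment variables:
import os

from fi.evals import EvalClient

# Assumes FI_API_KEY and FI_SECRET_KEY are set in the environment.
evaluator = EvalClient(
    fi_api_key=os.environ["FI_API_KEY"],
    fi_secret_key=os.environ["FI_SECRET_KEY"],
)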
ii. For Image Use Case
# Test-case fields for the image use case; image_url points at the
# image to be evaluated alongside the text fields.
class DeterministicTestCase(MLLMTestCase):
    question: str
    prompt: str
    image_url: str
    category: str

deterministic_eval = Deterministic(config={
    "multi_choice": False,
    "choices": ["Yes", "No"],
    "rule_prompt": "Prompt : {{input_key2}}, Image : {{input_key3}}. Given the prompt and the corresponding image, answer the Question : {{input_key1}}. Focus only on the {{input_key4}}",
    "input": {
        "input_key1": "question",
        "input_key2": "prompt",
        "input_key3": "image_url",
        "input_key4": "category"
    }
})

# `datapoint` is assumed to be a dict (or DataFrame row) with these keys.
test_case = DeterministicTestCase(
    question=datapoint['question'],
    prompt=datapoint['prompt'],
    image_url=datapoint['image_url'],
    category=datapoint['category']
)

batch_result = evaluator.evaluate([deterministic_eval], [test_case])
print(batch_result.eval_results[0].reason)
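To sanity-check the low-variability property described at the top of this page, the same test case can be evaluated twice and the chosen options compared. A minimal sketch, reusing evaluator and test_case from above; the assertion is illustrative and not part of the SDK:
# Run the same deterministic eval twice on an identical input.
first = evaluator.evaluate([deterministic_eval], [test_case])
second = evaluator.evaluate([deterministic_eval], [test_case])

first_option = first.eval_results[0].metrics[0].value
second_option = second.eval_results[0].metrics[0].value

# A deterministic rule prompt should yield the same choice on both runs.
assert first_option == second_option, "output varied for identical input"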