In most AI applications, predictable behaviour is essential for maintaining reliability, consistency, and system integrity. Deterministic evaluations help verify that AI models operate within expected constraints, minimising inconsistencies and unintended variability.
Future AGI provides Deterministic Eval, which evaluates outputs by comparing them against predefined rules or expected patterns. It checks whether:
- The output consistently adheres to a given rule prompt.
- It meets the structure or format defined by the evaluation (e.g., multiple-choice validation).
- Variability in output is minimised when the same input is provided.
Key Features of Deterministic Eval:
- Rule-Based Validation: Uses a customisable rule prompt to define the criteria for evaluating outputs.
- Multiple Choice Handling: Supports evaluation for tasks that involve multiple-choice questions.
- Customisable Inputs: Works with flexible input types and can be adapted for various AI-generated outputs, such as text, conversation, or image.
a. Using Interface
Required Parameters
- Input: The input to be evaluated.
- Choices: A set of predefined options for multiple-choice questions (if applicable).
- Rule Prompt: A rule or set of conditions that the output must meet (e.g., “Output must include X if input includes Y”).
- MultiChoice: A Boolean value indicating whether the output should be treated as a multiple-choice question.
The result is one of the user-provided choices, indicating the output's adherence to the deterministic criteria.
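When MultiChoice is enabled, the evaluation selects from the full set of user-defined choices rather than a binary pair. The config below is a minimal sketch of such a setup, assuming the same config schema used in the SDK examples in the next section; the choice labels and rule prompt are illustrative only.
deterministic_eval = Deterministic(config={
    "multi_choice": True,  # treat the evaluation as a multiple-choice question
    "choices": ["Excellent", "Good", "Poor"],  # illustrative labels, not a fixed set
    "rule_prompt": "answer : {{input_key1}}. Rate the grammatical quality of the answer as Excellent, Good, or Poor",
    "input": {
        "input_key1": "answer",  # binds the placeholder to a test-case field
    }
})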
b. Using SDK
i. For Text Use Case
import pandas as pd

from fi.evals import Deterministic
from fi.testcases import MLLMTestCase

# Define the fields each test case carries; the rule prompt's
# {{input_key...}} placeholders are bound to these attributes.
class DeterministicTestCase(MLLMTestCase):
    context: str
    question: str

deterministic_eval = Deterministic(config={
    "multi_choice": False,
    "choices": ["Pass", "Fail"],
    "rule_prompt": "context : {{input_key1}}, question : {{input_key2}}. Given the context and question, choose Pass if the question is grammatically correct, well-structured, and free of errors; choose Fail otherwise",
    "input": {
        "input_key1": "context",
        "input_key2": "question",
    }
})

complete_result = {}
options = []
reasons = []

# `dataset` is assumed to be a pandas DataFrame with "context" and
# "question" columns; `evaluator` is an initialised eval client
# (see the sketch after this snippet).
for index, row in dataset.iterrows():
    test_case = DeterministicTestCase(
        context=row["context"],
        question=row["question"]
    )
    result = evaluator.evaluate([deterministic_eval], [test_case])
    option = result.eval_results[0].metrics[0].value  # the chosen label
    reason = result.eval_results[0].reason            # justification for the choice
    options.append(option)
    reasons.append(reason)

complete_result["Error-Eval-Rating"] = options
complete_result["Error-Eval-Reason"] = reasons
result_df = pd.DataFrame(complete_result)
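The loop above assumes an already-initialised evaluator. A minimal sketch of creating one, assuming the EvalClient class from fi.evals and credentials supplied via environment variables:
import os

from fi.evals import EvalClient

# Assumes FI_API_KEY and FI_SECRET_KEY are set in the environment.
evaluator = EvalClient(
    fi_api_key=os.environ["FI_API_KEY"],
    fi_secret_key=os.environ["FI_SECRET_KEY"],
)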
ii. For Image Use Case
# Test-case fields for the image use case; image_url points at the
# image to be evaluated alongside the text fields.
class DeterministicTestCase(MLLMTestCase):
    question: str
    prompt: str
    image_url: str
    category: str

deterministic_eval = Deterministic(config={
    "multi_choice": False,
    "choices": ["Yes", "No"],
    "rule_prompt": "Prompt : {{input_key2}}, Image : {{input_key3}}. Given the prompt and the corresponding image, answer the Question : {{input_key1}}. Focus only on the {{input_key4}}",
    "input": {
        "input_key1": "question",
        "input_key2": "prompt",
        "input_key3": "image_url",
        "input_key4": "category"
    }
})

# `datapoint` is assumed to be a dict (or DataFrame row) with these keys.
test_case = DeterministicTestCase(
    question=datapoint['question'],
    prompt=datapoint['prompt'],
    image_url=datapoint['image_url'],
    category=datapoint['category']
)

batch_result = evaluator.evaluate([deterministic_eval], [test_case])
print(batch_result.eval_results[0].reason)
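To sanity-check the low-variability property described at the top of this page, the same test case can be evaluated twice and the chosen options compared. A minimal sketch, reusing evaluator and test_case from above; the assertion is illustrative and not part of the SDK:
# Run the same deterministic eval twice on an identical input.
first = evaluator.evaluate([deterministic_eval], [test_case])
second = evaluator.evaluate([deterministic_eval], [test_case])

first_option = first.eval_results[0].metrics[0].value
second_option = second.eval_results[0].metrics[0].value

# A deterministic rule prompt should yield the same choice on both runs.
assert first_option == second_option, "output varied for identical input"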