Create Custom Evals
Define custom evaluation criteria and rules for your use case beyond built-in templates.
What it is
Custom evals are user-defined evaluation templates with their own assessment criteria, rule prompt, and output format. Each custom eval supports configurable output types (pass/fail, score, or choices), mappable input variables, an optional internet check for up-to-date validation, and a choice of model. Once created, a custom eval is available alongside built-in templates across datasets, simulations, and the SDK.
Use cases
- Domain-specific validation — Assess content against industry or regulatory standards that aren’t in the default templates.
- Business rule compliance — Enforce your organization’s guidelines (tone, format, disclosures) in a repeatable eval.
- Complex or weighted scoring — Implement multi-criteria or custom scoring logic via your rule prompt.
- Custom output formats — Validate specific response structures or formats (e.g. JSON shape, required fields) with a tailored eval.
How to
You can create custom evals from the UI or programmatically by calling the REST API from your code. After the template is saved, run it from the UI or from the evaluation SDK using the template name.
Open evaluation creation
Open your dataset, click Evaluate in the top-right, then Add Evaluation. Select Create your own eval to start the custom-eval flow.

Configure basic settings
- Name — Unique name for the eval (lowercase letters, numbers, hyphens, and underscores only; cannot start or end with - or _). Used when you add the eval to a dataset or call it from the SDK.
- Model — Use Future AGI Models (e.g. turing_large, turing_flash, turing_small, protect, protect_flash) or Use other LLMs (your own or external providers). For model details, see Future AGI models and Use custom models.
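If you create templates programmatically, it can help to pre-validate the name before sending the request. This is a minimal sketch assuming a regex built from the naming rules stated above; the helper name is hypothetical, not part of the SDK:

```python
import re

# Hypothetical helper: checks an eval name against the documented rules --
# lowercase letters, digits, hyphens, underscores; must not start or end
# with '-' or '_'. Single-character names like "a" are allowed.
NAME_RE = re.compile(r"^[a-z0-9](?:[a-z0-9_-]*[a-z0-9])?$")

def is_valid_eval_name(name: str) -> bool:
    return bool(NAME_RE.fullmatch(name))
```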

Define evaluation rules
In Rule prompt (criteria), write the instructions the model will follow to evaluate each row. Use {{variable_name}} for placeholders; you’ll map these to dataset columns (or SDK input keys) when you add or run the eval. Be specific about what counts as pass/fail or how to score.
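As an illustration, a rule prompt with two criteria and explicit pass conditions might look like the following. The variable names (user_query, chatbot_response) are examples; use whatever placeholders match your dataset columns:

```python
# Illustrative rule prompt only -- not a prescribed format.
# {{...}} placeholders are mapped to dataset columns or SDK input keys.
rule_prompt = """\
Evaluate the chatbot response on two criteria:
1) Politeness: the response is courteous and professional.
2) Relevance: the response directly addresses {{user_query}}.
Response to evaluate: {{chatbot_response}}
Pass only if BOTH criteria are met; otherwise fail."""
```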

Configure output type
- Pass/Fail — Binary result (e.g. 1.0 pass, 0.0 fail).
- Percentage (score) — Numeric score between 0 and 100.
- Deterministic choices — Categorical result; provide a dict of allowed choices.
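In API terms, the output type is a field on the template payload. The Pass/Fail value below matches the creation example later on this page; the Percentage and choices shapes are assumptions sketched from the descriptions above, so verify the exact strings against the API reference:

```python
# Sketches of the output-type portion of a template payload.
# "Pass/Fail" matches the POST example on this page; the other two
# values and the choices-dict shape are assumptions for illustration.
pass_fail = {"output_type": "Pass/Fail"}    # binary: 1.0 pass, 0.0 fail
score = {"output_type": "Percentage"}       # numeric score, 0-100
choices = {
    "output_type": "Deterministic Choices",
    "choices": {"excellent": 1.0, "acceptable": 0.5, "poor": 0.0},
}
```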

Add optional settings and save
- Tags — For filtering and organization.
- Description — Shown in the evaluation list.
- Check Internet — Allow the eval to use up-to-date information when needed.
- Required keys — List the input variable names the eval expects (e.g. input, output, user_query, chatbot_response).

Click Create Evaluation to save; the new template appears in your list and can be added to datasets or called via the SDK.

Run the evaluation
In your dataset, click Evaluate → Add Evaluation, select the custom eval you created, map the columns to the rule-prompt variables, then click Add & Run. See Running your first eval for the full UI flow.
Creating a custom eval template requires a POST to the Future AGI API. Once created, run it using the Evaluator from the ai-evaluation SDK.
Install the SDK

```shell
pip install ai-evaluation
```

Create the custom eval template
Send a POST to /model-hub/create_custom_evals/ using your FI_API_KEY and FI_SECRET_KEY as headers.
```python
import requests

response = requests.post(
    "https://api.futureagi.com/model-hub/create_custom_evals/",
    headers={
        "X-Api-Key": "your-fi-api-key",
        "X-Secret-Key": "your-fi-secret-key",
    },
    json={
        "name": "chatbot_politeness_and_relevance",
        "description": "Evaluates if the response is polite and relevant.",
        "criteria": "Evaluate: 1) Politeness. 2) Relevance to: {{user_query}}. Response: {{chatbot_response}}. Pass only if both.",
        "output_type": "Pass/Fail",
        "required_keys": ["user_query", "chatbot_response"],
        "config": {"model": "turing_small"},
        "check_internet": False,
        "tags": ["customer-service"],
    },
)
print(response.json())  # {"eval_template_id": "..."}
```

Run the custom eval template
Use the template name you registered with Evaluator.evaluate():
```python
from fi.evals import Evaluator

evaluator = Evaluator(
    fi_api_key="your-fi-api-key",
    fi_secret_key="your-fi-secret-key",
)
result = evaluator.evaluate(
    eval_templates="chatbot_politeness_and_relevance",
    inputs={
        "user_query": "What is the return policy?",
        "chatbot_response": "Our return policy allows returns within 30 days.",
    },
)
print(result.eval_results[0].output)
print(result.eval_results[0].reason)
```

What you can do next
Evaluate via Platform & SDK
Run evals from the UI or SDK.
Eval groups
Add your custom eval to a group and run it with others.
Use custom models
Bring your own model for evaluations.
Future AGI models
Built-in models available for evals.
CI/CD pipeline
Run evals automatically in your pipeline.
Evaluation overview
How evaluation fits into the platform.