Create Custom Evals

Define custom evaluation criteria and rules for your use case beyond built-in templates.

What it is

Custom evals are user-defined evaluation templates with their own assessment criteria, rule prompt, and output format. Each custom eval supports configurable output types (pass/fail, score, or choices), mappable input variables, an optional internet check for up-to-date validation, and a choice of model. Once created, a custom eval is available alongside built-in templates across datasets, simulations, and the SDK.


Use cases

  • Domain-specific validation — Assess content against industry or regulatory standards that aren’t in the default templates.
  • Business rule compliance — Enforce your organization’s guidelines (tone, format, disclosures) in a repeatable eval.
  • Complex or weighted scoring — Implement multi-criteria or custom scoring logic via your rule prompt.
  • Custom output formats — Validate specific response structures or formats (e.g. JSON shape, required fields) with a tailored eval.

How to

You can create custom evals from the UI or via the SDK (by calling the REST API from your code). After the template is saved, run it from the UI or from the evaluation SDK using the template name.

Open evaluation creation

Open your dataset, click Evaluate in the top-right, then Add Evaluation. Select Create your own eval to start the custom-eval flow.

Configure basic settings

Name — Unique name for the eval (lowercase letters, numbers, hyphens, and underscores only; cannot start or end with - or _). Used when you add the eval to a dataset or call it from the SDK.
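The naming rules above can be sketched as a small validation helper; this is an illustrative check, not part of the SDK, and the regex is an assumption based on the stated rules.

```python
import re

# Mirrors the documented rules: lowercase letters, numbers, hyphens, and
# underscores only, and the name may not start or end with "-" or "_".
NAME_PATTERN = re.compile(r"^[a-z0-9](?:[a-z0-9_-]*[a-z0-9])?$")

def is_valid_eval_name(name: str) -> bool:
    return bool(NAME_PATTERN.fullmatch(name))

print(is_valid_eval_name("chatbot_politeness_and_relevance"))  # True
print(is_valid_eval_name("_bad_start"))                        # False
```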

Model — Use Future AGI Models (e.g. turing_large, turing_flash, turing_small, protect, protect_flash) or Use other LLMs (your own or external providers). For model details, see Future AGI models and Use custom models.

Define evaluation rules

In Rule prompt (criteria), write the instructions the model will follow to evaluate each row. Use {{variable_name}} for placeholders; you’ll map these to dataset columns (or SDK input keys) when you add or run the eval. Be specific about what counts as pass/fail or how to score.
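For example, a rule prompt with two placeholders might look like the string below; the variable names user_query and chatbot_response are illustrative, not required. The snippet also shows how to collect the {{...}} placeholders you will later map to dataset columns.

```python
import re

# A sample rule prompt (criteria) with {{...}} placeholders.
rule_prompt = (
    "Evaluate the chatbot response for politeness and relevance.\n"
    "User query: {{user_query}}\n"
    "Chatbot response: {{chatbot_response}}\n"
    "Pass only if the response is both polite and directly answers the query."
)

# Collect the placeholder names to know which columns or input keys to map.
placeholders = sorted(set(re.findall(r"\{\{(\w+)\}\}", rule_prompt)))
print(placeholders)  # ['chatbot_response', 'user_query']
```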

Configure output type

  • Pass/Fail — Binary result (e.g. 1.0 pass, 0.0 fail).
  • Percentage (score) — Numeric score between 0 and 100.
  • Deterministic choices — Categorical result; provide a dict of allowed choices.
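As a rough sketch, the three output types might map onto template fields like the fragments below. The output_type string "Pass/Fail" appears in the API example on this page, but the exact strings for the other two types and the field name for the choices dict are assumptions; check the API reference before relying on them.

```python
# Pass/Fail: binary result, scored 1.0 (pass) or 0.0 (fail).
pass_fail = {"output_type": "Pass/Fail"}

# Percentage: numeric score between 0 and 100. (Exact enum string assumed.)
score = {"output_type": "Percentage"}

# Deterministic choices: categorical result with a dict of allowed choices
# mapped to values. (Field name "choices" and enum string assumed.)
choices = {
    "output_type": "Deterministic Choices",
    "choices": {"excellent": 1.0, "acceptable": 0.5, "poor": 0.0},
}
print(pass_fail, score, choices)
```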

Add optional settings and save

  • Tags — For filtering and organization.
  • Description — Shown in the evaluation list.
  • Check Internet — Allow the eval to use up-to-date information when needed.
  • Required keys — List the input variable names the eval expects (e.g. input, output, user_query, chatbot_response).

Click Create Evaluation to save; the new template appears in your list and can be added to datasets or called via the SDK.
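At run time, the inputs you supply should cover every name in Required keys. A minimal pre-flight check (an illustrative helper, not part of the SDK):

```python
# Verify that run-time inputs supply every variable the eval expects.
required_keys = ["user_query", "chatbot_response"]
inputs = {
    "user_query": "What is the return policy?",
    "chatbot_response": "Returns are accepted within 30 days.",
}

missing = [k for k in required_keys if k not in inputs]
assert not missing, f"missing inputs: {missing}"
print("all required keys present")
```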

Run the evaluation

In your dataset, click Evaluate, then Add Evaluation; select the custom eval you created, map the columns to the rule-prompt variables, then click Add & Run. See Running your first eval for the full UI flow.

Creating a custom eval template requires a POST to the Future AGI API. Once created, run it using the Evaluator from the ai-evaluation SDK.

Install the SDK

pip install ai-evaluation

Create the custom eval template

Send a POST to /model-hub/create_custom_evals/ using your FI_API_KEY and FI_SECRET_KEY as headers.

import requests

response = requests.post(
    "https://api.futureagi.com/model-hub/create_custom_evals/",
    headers={
        "X-Api-Key": "your-fi-api-key",
        "X-Secret-Key": "your-fi-secret-key",
    },
    json={
        "name": "chatbot_politeness_and_relevance",
        "description": "Evaluates if the response is polite and relevant.",
        "criteria": "Evaluate: 1) Politeness. 2) Relevance to: {{user_query}}. Response: {{chatbot_response}}. Pass only if both.",
        "output_type": "Pass/Fail",
        "required_keys": ["user_query", "chatbot_response"],
        "config": {"model": "turing_small"},
        "check_internet": False,
        "tags": ["customer-service"],
    },
)
print(response.json())  # {"eval_template_id": "..."}

Run the custom eval template

Use the template name you registered with Evaluator.evaluate():

from fi.evals import Evaluator

evaluator = Evaluator(
    fi_api_key="your-fi-api-key",
    fi_secret_key="your-fi-secret-key",
)

result = evaluator.evaluate(
    eval_templates="chatbot_politeness_and_relevance",
    inputs={
        "user_query": "What is the return policy?",
        "chatbot_response": "Our return policy allows returns within 30 days.",
    },
)

print(result.eval_results[0].output)
print(result.eval_results[0].reason)
