Create Custom Evals

Define custom evaluation criteria and rules for your use case beyond built-in templates.

About

Every AI product has its own definition of a good response. Custom evals let you encode those rules and run them at scale: you write the criteria once, in plain language, and Future AGI scores every response against them automatically, returning a result and a reason for each one.

Once created, a custom eval works exactly like any built-in template: use it on a dataset, in a simulation, or call it from the SDK.


When to use

  • Domain-specific validation: Assess content against industry or regulatory standards that aren’t in the default templates.
  • Business rule compliance: Enforce your organization’s guidelines (tone, format, disclosures) in a repeatable eval.
  • Complex or weighted scoring: Implement multi-criteria or custom scoring logic via your rule prompt.
  • Custom output formats: Validate specific response structures or formats (e.g. JSON shape, required fields) with a tailored eval.

How to

You can create custom evals from the UI or via the SDK (by calling the REST API from your code). After the template is saved, run it from the UI or from the evaluation SDK using the template name.

Open evaluation creation

Open your dataset, click Evaluate in the top-right, then Add Evaluation. Select Create your own eval to start the custom-eval flow.

Configure basic settings

Name: unique name for the eval (lowercase letters, numbers, hyphens, and underscores only; cannot start or end with - or _). Used when you add the eval to a dataset or call it from the SDK.

Model: choose Use Future AGI Models (e.g. turing_large, turing_flash, turing_small, protect, protect_flash) or Use other LLMs (your own or external providers). For model details, see Future AGI models and Use custom models.
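The naming rule above can be checked locally before you submit a template. The following is a minimal sketch: the pattern is one reading of the documented rule, not the platform's own validator, and is_valid_eval_name is a hypothetical helper name.

```python
import re

# Lowercase letters, digits, hyphens, and underscores only; the first and
# last characters must be a letter or digit. This is our interpretation of
# the documented rule, not the platform's validator.
_EVAL_NAME = re.compile(r"[a-z0-9](?:[a-z0-9_-]*[a-z0-9])?")


def is_valid_eval_name(name: str) -> bool:
    """Return True if `name` satisfies the documented naming rule."""
    return _EVAL_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["chatbot_politeness_and_relevance", "_draft", "Tone-Check"]:
        print(candidate, is_valid_eval_name(candidate))
```

Uniqueness still has to be checked by the platform; this only catches formatting mistakes early.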

Define evaluation rules

In Rule prompt (criteria), write the instructions the model will follow to evaluate each row. Use {{variable_name}} for placeholders; you’ll map these to dataset columns (or SDK input keys) when you add or run the eval. Be specific about what counts as pass/fail or how to score.
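Because every {{variable_name}} placeholder must later be mapped to a column or input key, it can help to list the placeholders in a rule prompt before saving it. The sketch below assumes placeholders are written exactly as {{name}} with no inner whitespace; extract_placeholders and missing_keys are hypothetical helpers, not part of any SDK.

```python
import re

# Assumes placeholders look exactly like {{name}}, where name is made of
# letters, digits, and underscores.
_PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def extract_placeholders(criteria: str) -> list[str]:
    """Return the placeholder names in `criteria`, in order, without repeats."""
    seen: list[str] = []
    for name in _PLACEHOLDER.findall(criteria):
        if name not in seen:
            seen.append(name)
    return seen


def missing_keys(criteria: str, inputs: dict) -> list[str]:
    """Return the placeholders in `criteria` that `inputs` does not supply."""
    return [name for name in extract_placeholders(criteria) if name not in inputs]
```

Running missing_keys against a sample input dict before a large evaluation run surfaces unmapped variables without spending any model calls.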

Configure output type

  • Pass/Fail: binary result (e.g. 1.0 pass, 0.0 fail).
  • Percentage (score): numeric score between 0 and 100.
  • Deterministic choices: categorical result; provide a dict of allowed choices.

Add optional settings

  • Tags: for filtering and organization.
  • Description: shown in the evaluation list.
  • Check Internet: allow the eval to fetch up-to-date information when needed.
  • Required keys: list the input variable names the eval expects (e.g. input, output, user_query, chatbot_response).

Save the eval

Click Create Evaluation. The new template appears in your evaluation list and can be added to any dataset or called via the SDK using the name you gave it.

Run the evaluation

In your dataset, click Evaluate, then Add Evaluation. Select the custom eval you created, map the columns to the rule-prompt variables, then click Add & Run. See Running your first eval for the full UI flow.
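The column mapping you set in the UI has a simple equivalent in code: each rule-prompt variable is paired with a dataset column, and each row is turned into an inputs dict keyed by variable name. The sketch below illustrates the idea only; build_inputs and the column names are hypothetical, and this is not how the platform implements mapping internally.

```python
def build_inputs(row: dict, mapping: dict) -> dict:
    """Build an inputs dict for one row.

    `mapping` pairs each rule-prompt variable with the dataset column
    that should fill it, e.g. {"user_query": "question"}.
    """
    missing = [column for column in mapping.values() if column not in row]
    if missing:
        raise KeyError(f"columns not found in row: {missing}")
    return {variable: row[column] for variable, column in mapping.items()}


if __name__ == "__main__":
    # Hypothetical dataset row and column names.
    row = {"question": "What is the return policy?", "answer": "Returns within 30 days."}
    mapping = {"user_query": "question", "chatbot_response": "answer"}
    print(build_inputs(row, mapping))
```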

To create a custom eval template from code, send a POST request to the Future AGI API. Once the template exists, run it with the Evaluator from the ai-evaluation SDK.

Install the SDK

pip install ai-evaluation

Create the custom eval template using the API

Send a POST request to /model-hub/create_custom_evals/, passing your FI_API_KEY and FI_SECRET_KEY values in the X-Api-Key and X-Secret-Key headers.

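import requests

# Register the custom eval template. Replace the placeholder keys with your own.
response = requests.post(
    "https://api.futureagi.com/model-hub/create_custom_evals/",
    headers={
        "X-Api-Key": "your-fi-api-key",
        "X-Secret-Key": "your-fi-secret-key",
    },
    json={
        "name": "chatbot_politeness_and_relevance",  # used later to run the eval
        "description": "Evaluates if the response is polite and relevant.",
        # Rule prompt; {{...}} placeholders are filled from the inputs at run time.
        "criteria": "Evaluate: 1) Politeness. 2) Relevance to: {{user_query}}. Response: {{chatbot_response}}. Pass only if both.",
        "output_type": "Pass/Fail",
        "required_keys": ["user_query", "chatbot_response"],  # input variables the eval expects
        "config": {"model": "turing_small"},
        "check_internet": False,
        "tags": ["customer-service"],
    },
    timeout=30,  # avoid hanging indefinitely on a stalled connection
)
response.raise_for_status()  # raise on HTTP error responses instead of parsing them
print(response.json())  # {"eval_template_id": "..."}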
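Hardcoding keys in source files is easy to leak. One common alternative is to read them from the environment and build the headers in a single place. The sketch below assumes FI_API_KEY and FI_SECRET_KEY are set as environment variables; build_auth_headers is a hypothetical helper, and the header names are the ones used in the request above.

```python
import os


def build_auth_headers(env=None) -> dict:
    """Build the request headers from FI_API_KEY and FI_SECRET_KEY.

    `env` defaults to os.environ; pass a plain dict to make testing easier.
    """
    env = os.environ if env is None else env
    api_key = env.get("FI_API_KEY")
    secret_key = env.get("FI_SECRET_KEY")
    if not api_key or not secret_key:
        raise RuntimeError("Set FI_API_KEY and FI_SECRET_KEY before calling the API.")
    return {"X-Api-Key": api_key, "X-Secret-Key": secret_key}
```

The returned dict can be passed as the headers argument of requests.post in place of the literal values.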

Run the custom eval template

Pass the name of the template you created to Evaluator.evaluate():

from fi.evals import Evaluator

evaluator = Evaluator(
    fi_api_key="your-fi-api-key",
    fi_secret_key="your-fi-secret-key",
)

result = evaluator.evaluate(
    eval_templates="chatbot_politeness_and_relevance",
    inputs={
        "user_query": "What is the return policy?",
        "chatbot_response": "Our return policy allows returns within 30 days.",
    },
)

print(result.eval_results[0].output)  # the eval result for this input
print(result.eval_results[0].reason)  # the reason returned with the result
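To score several rows, the single call above can be wrapped in a small loop. The sketch below is hedged: run_rows is a hypothetical helper, it takes the evaluate callable as an argument (for example evaluator.evaluate), and it assumes each call returns an object with eval_results[0].output and eval_results[0].reason as in the example above. It makes one call per row and does not assume any batching support in the SDK.

```python
def run_rows(evaluate, template_name: str, rows: list) -> list:
    """Run one custom eval over several input dicts, one call per row.

    `evaluate` is a callable with the same keyword arguments as the
    evaluate() call shown above (eval_templates, inputs).
    """
    results = []
    for row in rows:
        result = evaluate(eval_templates=template_name, inputs=row)
        first = result.eval_results[0]
        results.append({"inputs": row, "output": first.output, "reason": first.reason})
    return results
```

With the evaluator from the previous snippet, a call would look like run_rows(evaluator.evaluate, "chatbot_politeness_and_relevance", rows), where rows is a list of dicts with user_query and chatbot_response keys.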

