Evaluate via Platform & SDK

Run evaluations via the Future AGI platform UI or the Python SDK.

What it is

Evaluate via Platform & SDK is the primary way to run evaluations on Future AGI — either through the platform UI on a dataset, or programmatically via the Python SDK. It supports built-in and custom eval templates, sync and async execution, and returns a score, pass/fail result, or reason for each evaluated input.


Use cases

  • Quick quality check — Run a single eval (e.g. tone) on one input to verify the pipeline before scaling.
  • Try built-in templates — Use Future AGI templates (e.g. tone) or your own custom template from the UI or SDK.
  • Automate evals — Call the SDK from scripts or CI to run evals programmatically (sync or async).
  • Run evals on a dataset — From the UI, open a dataset, add an evaluation, map columns, and run so every row is evaluated.

How to

Choose the UI or the SDK. The UI steps come first below, followed by the SDK steps.

Select a dataset

You need a dataset to run evals from the UI. If you don’t have one, add a dataset first. See Dataset overview.

Open the evaluation panel

Open your dataset, then click Evaluate in the top-right. The evaluation configuration panel opens.

Add and run an eval

Click Add Evaluation. You’re taken to the evaluation list: choose a built-in template (e.g. tone) or Create your own eval. For a template: click it, give the evaluation a name, and in config select the dataset column(s) to use as input (and output if the template requires it). Optionally enable Error Localization so that when a row fails, the platform can localize the error in the dataset. Choose a model if the template requires one (many built-in evals do). Click Add & Run to run the eval on the dataset.

Add dataset eval (API): The request includes name, template_id, config (column mapping), optional error_localizer, optional model, and run: true to run immediately.
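The request body described above can be sketched as a dictionary. The field names (name, template_id, config, error_localizer, model, run) follow the description; the placeholder values and column mapping shown here are assumptions — check the API reference for the exact schema and endpoint:

```python
# Hypothetical request body for adding and running a dataset eval via the API.
# Placeholder values ("<template-id>", "response_column") are illustrative only.
payload = {
    "name": "tone-check",            # display name for this evaluation
    "template_id": "<template-id>",  # ID of the built-in or custom template
    "config": {
        "input": "response_column",  # map template input keys to dataset columns
    },
    "error_localizer": True,         # optional: localize errors on failing rows
    "model": "turing_flash",         # optional: required by many built-in templates
    "run": True,                     # run the eval immediately after adding it
}
```

Posting this payload with run set to True both registers the evaluation on the dataset and starts it in one call.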

Optional: Create your own eval

From the Add Evaluation flow, click Create your own eval to define a custom template (name, model, rule prompt, output type, and optional settings). After you save it, the new eval appears in the evaluation list and you can add it to your dataset as in the step above. For full details on creating and configuring custom evals, see Create custom evals.

Install and initialise

Install the package ai-evaluation and create an Evaluator with your Future AGI API key and secret. Prefer setting FI_API_KEY and FI_SECRET_KEY in the environment instead of passing them in code. See Accessing API keys.

pip install ai-evaluation

from fi.evals import Evaluator

# Keys can also be supplied via the FI_API_KEY and FI_SECRET_KEY
# environment variables instead of being passed here.
evaluator = Evaluator(
    fi_api_key="your_api_key",
    fi_secret_key="your_secret_key",
)

Run a sync eval

Call evaluate with the eval template name (e.g. tone), inputs (dict with the keys the template expects, e.g. "input"), and model_name. Many built-in (system) templates require a model.

result = evaluator.evaluate(
    eval_templates="tone",
    inputs={
        "input": "Dear Sir, I hope this email finds you well. I look forward to any insights or advice you might have whenever you have a free moment"
    },
    model_name="turing_flash",
)
print(result.eval_results[0].output)
print(result.eval_results[0].reason)

Optional: Run async eval

For long-running or large runs, set is_async=True. The call returns immediately with an eval_id; the evaluation runs in the background.

result = evaluator.evaluate(
    eval_templates="tone",
    inputs={"input": "Your text here"},
    model_name="turing_flash",
    is_async=True,
)
eval_id = result.eval_results[0].eval_id

Retrieve async results

Use get_eval_result(eval_id) to fetch the result when the evaluation has finished. The same method works for both sync and async runs (e.g. to re-fetch a result).

result = evaluator.get_eval_result(eval_id)
print(result.eval_results[0].output)
print(result.eval_results[0].reason)

Use a custom template

To use a template you created in the UI, pass its name as eval_templates and supply the inputs dict with the keys your template’s required_keys expect (e.g. "input", "output"). Use the same template name you see in the evaluation list.

from fi.evals import evaluate

result = evaluate(
    eval_templates="name-of-your-eval",
    inputs={
        "input": "your_input_text",
        "output": "your_output_text"
    },
    model_name="model_name"
)

print(result.eval_results[0].output)
print(result.eval_results[0].reason)

Note

For system (built-in) eval templates, model_name is required and must be one of the models listed for that template. The backend validates required input keys from the template’s config.

Tip

For more eval templates and Future AGI models, see Built-in evals and Future AGI models.

