Running Evals in Simulation

Run evaluations in Future AGI simulations. Test AI agents against simulated customers and score interactions for quality, context retention, and escalation.

What is it?

Simulation is Future AGI’s agent testing product. It lets you run your AI agent against simulated customers in realistic scenarios — without real users, real calls, or production risk. You define who the customer is, what they want, and how they behave; the platform drives the conversation and scores every interaction using evaluations you configure. The result is a detailed breakdown of where your agent succeeds and where it fails, before you ship.


Before starting, make sure you have set up your Agent Definition and Scenarios.

Open the Run Simulation Dashboard

Navigate to your simulation and click Run Simulation. You’ll see the eval configuration panel where you can add evaluators before starting the run.

Run simulation dashboard

Add an Evaluation

Click Add Evaluation to open the eval drawer. Choose from Future AGI’s built-in simulation evals or create a custom one.

Eval drawer

Recommended built-in evals for simulation:

  • customer_agent_conversation_quality — overall conversation quality
  • customer_agent_query_handling — correct interpretation and relevant answers
  • customer_agent_context_retention — agent remembers earlier context
  • customer_agent_human_escalation — appropriate escalation to a human
  • customer_agent_loop_detection — detects repetitive or looping responses

See the full list of built-in evals here.
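For reference when scripting or documenting runs, the recommended evals above can be captured as a simple mapping from identifier to what it checks. This is plain Python data mirroring the list above, not an SDK call:

```python
# The five recommended built-in simulation evals and what each checks,
# taken from the list above. Plain data for reference, not an official schema.
BUILT_IN_SIMULATION_EVALS = {
    "customer_agent_conversation_quality": "overall conversation quality",
    "customer_agent_query_handling": "correct interpretation and relevant answers",
    "customer_agent_context_retention": "agent remembers earlier context",
    "customer_agent_human_escalation": "appropriate escalation to a human",
    "customer_agent_loop_detection": "detects repetitive or looping responses",
}

# All built-ins for customer-agent simulations share a common prefix.
assert all(name.startswith("customer_agent_") for name in BUILT_IN_SIMULATION_EVALS)
```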

Configure the Eval

After selecting an eval, a configuration drawer opens. Fill in the required fields:

Configure eval

  • Name — displayed in your simulation dashboard after the run
  • Language Model — recommended: TURING_LARGE
  • Required Inputs — map the eval’s input keys to your simulation columns:
    • conversation — Mono Voice Recording or Stereo Recording
    • input — person or situation
    • output — Mono Voice Recording, Stereo Recording, or outcome

Click Save Eval when done.
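Conceptually, the fields above combine into a single eval configuration. The sketch below shows what such a record might look like, with a helper that checks the input mapping against the column types listed above. Everything here — the field names, the `validate_mapping` helper — is an illustrative assumption, not Future AGI's actual schema:

```python
# Hypothetical eval configuration mirroring the fields in the drawer.
# Field names and allowed values are assumptions for illustration only.
ALLOWED_COLUMNS = {
    "conversation": {"Mono Voice Recording", "Stereo Recording"},
    "input": {"person", "situation"},
    "output": {"Mono Voice Recording", "Stereo Recording", "outcome"},
}

def validate_mapping(mapping: dict) -> list:
    """Return a list of mapping errors (empty means the mapping is valid)."""
    errors = []
    for key, column in mapping.items():
        allowed = ALLOWED_COLUMNS.get(key)
        if allowed is None:
            errors.append(f"unknown input key: {key}")
        elif column not in allowed:
            errors.append(f"{key} cannot map to {column}")
    return errors

config = {
    "name": "context-retention-run",        # shown in the dashboard after the run
    "model": "TURING_LARGE",                # the recommended language model
    "inputs": {"conversation": "Stereo Recording", "input": "person"},
}
assert validate_mapping(config["inputs"]) == []
```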

Save eval

Add More Evals (Optional)

The saved eval appears under Selected Evaluations. You can add multiple evals to a single run to test the agent more broadly.

Selected evaluations

Run the Simulation

Once you’ve added all the evals you need, click Next to run the simulation. Results appear in your simulation dashboard with a score for each eval.


Creating a Custom Eval

If the built-in evals don’t cover your use case, you can create your own.

Start a Custom Eval

In the eval drawer, click Create your own evals and provide a unique name.

Create custom eval

Write a Rule Prompt

Select a model (recommended: TURING_LARGE) and write your evaluation criteria using {{ }} for input variables.

Example: Given {{conversation}}, evaluate if the agent convinces the customer to purchase insurance.

Map {{conversation}} to Mono Voice Recording or Stereo Recording.
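The {{ }} placeholders behave like simple template variables: at evaluation time, each one is replaced by the value of the column it is mapped to. A minimal sketch of that substitution in generic Python (this is not Future AGI code — just an illustration of how the placeholder syntax works):

```python
import re

def render_rule_prompt(template: str, variables: dict) -> str:
    """Replace each {{name}} placeholder with its mapped value."""
    def substitute(match):
        name = match.group(1).strip()
        if name not in variables:
            raise KeyError(f"unmapped template variable: {name}")
        return str(variables[name])
    return re.sub(r"\{\{(.*?)\}\}", substitute, template)

template = ("Given {{conversation}}, evaluate if the agent convinces "
            "the customer to purchase insurance.")
prompt = render_rule_prompt(template, {"conversation": "<transcript text>"})
print(prompt)
# → Given <transcript text>, evaluate if the agent convinces the customer
#   to purchase insurance.
```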

Set the Output Type

Choose how the eval should score results:

  • Pass/Fail — recommended for most cases
  • Percentage — specify what 0% means
  • Categorical — define all possible output labels

Click Create Evaluation to save it as a reusable template under User Built evals.
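The three output types imply different result shapes: a boolean, a number in [0, 100], or a label from your defined set. A hedged sketch of how a caller might normalize them into one shape (the type names and normalization logic are assumptions for illustration, not platform behavior):

```python
def normalize_score(output_type: str, raw):
    """Map a raw eval result to a uniform (label, numeric) pair.
    The three output types mirror the options above; the normalization
    itself is an illustrative assumption, not documented platform behavior."""
    if output_type == "pass_fail":
        passed = bool(raw)
        return ("Pass" if passed else "Fail", 1.0 if passed else 0.0)
    if output_type == "percentage":
        pct = float(raw)
        if not 0.0 <= pct <= 100.0:
            raise ValueError("percentage must be in [0, 100]")
        return (f"{pct:g}%", pct / 100.0)
    if output_type == "categorical":
        return (str(raw), None)  # labels carry no inherent numeric score
    raise ValueError(f"unknown output type: {output_type}")

print(normalize_score("pass_fail", True))   # → ('Pass', 1.0)
print(normalize_score("percentage", 85))    # → ('85%', 0.85)
```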

Use the Custom Eval

Your custom eval now appears in the eval drawer. Select it, give it a run name, map the input columns, and click Save Eval.

Custom eval saved
