Running Evals in Simulation

Run evaluations in Future AGI simulations. Test AI agents against simulated customers and score interactions for quality, context retention, and escalation.

What is it?

Simulation is Future AGI’s agent testing product. It lets you run your AI agent against simulated customers in realistic scenarios — without real users, real calls, or production risk. You define who the customer is, what they want, and how they behave; the platform drives the conversation and scores every interaction using evaluations you configure. The result is a detailed breakdown of where your agent succeeds and where it fails, before you ship.


Before starting, make sure you have set up your Agent Definition and Scenarios.

Open the Run Simulation Dashboard

Navigate to your simulation and click Run Simulation. You’ll see the eval configuration panel where you can add evaluators before starting the run.

Run simulation dashboard

Add an Evaluation

Click Add Evaluation to open the eval drawer. Choose from Future AGI’s built-in simulation evals or create a custom one.

Eval drawer

Recommended built-in evals for simulation:

  • customer_agent_conversation_quality — overall conversation quality
  • customer_agent_query_handling — correct interpretation and relevant answers
  • customer_agent_context_retention — agent remembers earlier context
  • customer_agent_human_escalation — appropriate escalation to a human
  • customer_agent_loop_detection — detects repetitive or looping responses

See the full list of built-in evals here.
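For reference when scripting or documenting runs, the recommended evals above can be captured as a simple mapping from identifier to what it checks. This is plain Python data mirroring the list above, not an SDK call:

```python
# The five recommended built-in simulation evals and what each checks,
# taken from the list above. Plain data for reference, not an official schema.
BUILT_IN_SIMULATION_EVALS = {
    "customer_agent_conversation_quality": "overall conversation quality",
    "customer_agent_query_handling": "correct interpretation and relevant answers",
    "customer_agent_context_retention": "agent remembers earlier context",
    "customer_agent_human_escalation": "appropriate escalation to a human",
    "customer_agent_loop_detection": "detects repetitive or looping responses",
}

# All built-ins for customer-agent simulations share a common prefix.
assert all(name.startswith("customer_agent_") for name in BUILT_IN_SIMULATION_EVALS)
```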

Configure the Eval

After selecting an eval, a configuration drawer opens. Fill in the required fields:

Configure eval

  • Name — displayed in your simulation dashboard after the run
  • Language Model — recommended: TURING_LARGE
  • Required Inputs — map the eval’s input keys to your simulation columns:
    • conversation — Mono Voice Recording or Stereo Recording
    • input — person or situation
    • output — Mono Voice Recording, Stereo Recording, or outcome

Click Save Eval when done.
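Conceptually, the fields above combine into a single eval configuration. The sketch below shows what such a record might look like, with a helper that checks the input mapping against the column types listed above. Everything here — the field names, the `validate_mapping` helper — is an illustrative assumption, not Future AGI's actual schema:

```python
# Hypothetical eval configuration mirroring the fields in the drawer.
# Field names and allowed values are assumptions for illustration only.
ALLOWED_COLUMNS = {
    "conversation": {"Mono Voice Recording", "Stereo Recording"},
    "input": {"person", "situation"},
    "output": {"Mono Voice Recording", "Stereo Recording", "outcome"},
}

def validate_mapping(mapping: dict) -> list:
    """Return a list of mapping errors (empty means the mapping is valid)."""
    errors = []
    for key, column in mapping.items():
        allowed = ALLOWED_COLUMNS.get(key)
        if allowed is None:
            errors.append(f"unknown input key: {key}")
        elif column not in allowed:
            errors.append(f"{key} cannot map to {column}")
    return errors

config = {
    "name": "context-retention-run",        # shown in the dashboard after the run
    "model": "TURING_LARGE",                # the recommended language model
    "inputs": {"conversation": "Stereo Recording", "input": "person"},
}
assert validate_mapping(config["inputs"]) == []
```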

Save eval

Add More Evals (Optional)

The saved eval appears under Selected Evaluations. You can add multiple evals to a single run to test the agent more broadly.

Selected evaluations

Run the Simulation

Once you’ve added all the evals you need, click Next to run the simulation. Results appear in your simulation dashboard with a score for each eval.


Creating a Custom Eval

If the built-in evals don’t cover your use case, you can create your own.

Start a Custom Eval

In the eval drawer, click Create your own evals and provide a unique name.

Create custom eval

Write a Rule Prompt

Select a model (recommended: TURING_LARGE) and write your evaluation criteria using {{ }} for input variables.

Example: Given {{conversation}}, evaluate if the agent convinces the customer to purchase insurance.

Map {{conversation}} to Mono Voice Recording or Stereo Recording.
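The {{ }} placeholders behave like simple template variables: at evaluation time, each one is replaced by the value of the column it is mapped to. A minimal sketch of that substitution in generic Python (this is not Future AGI code — just an illustration of how the placeholder syntax works):

```python
import re

def render_rule_prompt(template: str, variables: dict) -> str:
    """Replace each {{name}} placeholder with its mapped value."""
    def substitute(match):
        name = match.group(1).strip()
        if name not in variables:
            raise KeyError(f"unmapped template variable: {name}")
        return str(variables[name])
    return re.sub(r"\{\{(.*?)\}\}", substitute, template)

template = ("Given {{conversation}}, evaluate if the agent convinces "
            "the customer to purchase insurance.")
prompt = render_rule_prompt(template, {"conversation": "<transcript text>"})
print(prompt)
# → Given <transcript text>, evaluate if the agent convinces the customer
#   to purchase insurance.
```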

Set the Output Type

Choose how the eval should score results:

  • Pass/Fail — recommended for most cases
  • Percentage — specify what 0% means
  • Categorical — define all possible output labels

Click Create Evaluation to save it as a reusable template under User Built evals.
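The three output types imply different result shapes: a boolean, a number in [0, 100], or a label from your defined set. A hedged sketch of how a caller might normalize them into one shape (the type names and normalization logic are assumptions for illustration, not platform behavior):

```python
def normalize_score(output_type: str, raw):
    """Map a raw eval result to a uniform (label, numeric) pair.
    The three output types mirror the options above; the normalization
    itself is an illustrative assumption, not documented platform behavior."""
    if output_type == "pass_fail":
        passed = bool(raw)
        return ("Pass" if passed else "Fail", 1.0 if passed else 0.0)
    if output_type == "percentage":
        pct = float(raw)
        if not 0.0 <= pct <= 100.0:
            raise ValueError("percentage must be in [0, 100]")
        return (f"{pct:g}%", pct / 100.0)
    if output_type == "categorical":
        return (str(raw), None)  # labels carry no inherent numeric score
    raise ValueError(f"unknown output type: {output_type}")

print(normalize_score("pass_fail", True))   # → ('Pass', 1.0)
print(normalize_score("percentage", 85))    # → ('85%', 0.85)
```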

Use the Custom Eval

Your custom eval now appears in the eval drawer. Select it, give it a run name, map the input columns, and click Save Eval.

Custom eval saved
