Simulate from the Prompt Workbench
Launch multi-turn chat simulations against any saved prompt version directly from the FutureAGI Prompts workbench — no SDK, no code, and no separate agent definition required.
| Time | Difficulty | Package |
|---|---|---|
| 10 min | Beginner | UI only |
Prerequisites
- FutureAGI account → app.futureagi.com
- At least one saved prompt version in the Prompts workbench (see Prompt Versioning if you need to create one)
- At least one chat scenario under Simulate → Scenarios (see Scenarios if you need to create one)
What is Prompt Workbench Simulation?
The Prompt Workbench has four tabs: Playground, Evaluation, Metrics, and Simulation. The Simulation tab lets you run multi-turn chat simulations where your saved prompt acts directly as the agent. The platform uses your prompt’s system message, model, and parameters to drive the conversation. You do not need a separate agent definition or any SDK code. Each scenario defines a simulated user persona and conversation goal; the platform runs one conversation per scenario row, up to 10 turns each.
Tutorial
Open your prompt in the workbench
Go to app.futureagi.com → Prompts (left sidebar under BUILD) → click the prompt template you want to test.
The workbench opens showing the Playground tab by default.
Navigate to the Simulation tab
Inside the prompt workbench, click the Simulation tab in the top tab bar (next to Playground, Evaluation, and Metrics).
Note
The Simulation tab is only clickable after the prompt has at least one saved version. If the tab shows a tooltip “Save your prompt to run simulations”, go back to the Playground tab and click Run Prompt — this executes the prompt and automatically saves it as a version.
Create a simulation
On the Simulation tab, click Create Simulation. A dialog opens — “Create Chat Simulation”.
Fill in the dialog:
- Simulation Name: Auto-populated as `Simulation - {Date} at {Time}`. Edit it to something descriptive, for example: `support-prompt-v2-test`.
- Prompt Version: Select which saved version of your prompt to test. The default version is pre-selected. Use the dropdown to switch versions.
- Description (optional): Notes about what you are testing, for example: `Testing revised tone instructions against return-request scenario.`
- Select Scenarios: Check one or more scenarios from the list. Each checked scenario produces one simulated conversation when the simulation runs.
Tip
If you have no scenarios yet, click Create New Chat Scenario at the top of the scenario list — it opens the scenario creation page in a new tab. After saving, return to this dialog and click the refresh icon to reload the list.
Click Create Simulation. The dialog closes and the simulation detail view opens automatically.
Review and adjust the simulation configuration
The simulation detail view shows the simulation name and a run count chip. The header toolbar includes three controls on the right: Version, Scenarios, and Evals.
Version dropdown: Use this to switch which prompt version the next run uses without recreating the simulation. Changing it updates the simulation immediately.
Scenarios button: Click to open a popover where you can add or remove scenarios. The count badge shows how many are currently attached.
Evals button: Click to open the evaluations drawer. You can add evaluations that will run automatically on each completed conversation. Click Add Evaluation inside the drawer to configure one.
Tip
Adding evaluations before running is recommended. Evaluations like Task Completion, Tone, and the Conversational agent evaluation group give you structured quality scores on top of raw CSAT. You can also add evaluations after the run and re-run them on completed conversations.
Run the simulation
Click Run Simulation in the top-right corner of the simulation detail header.
A success notification confirms execution has started. The simulation creates one chat conversation per attached scenario row. Each conversation runs up to 10 turns between your prompt (acting as the agent) and the simulated customer.
The executions grid below the header updates in real time. Each row is one conversation. You can search runs using the search bar above the grid.
View execution results
Once conversations complete, click any row in the executions grid to open the execution detail page at /dashboard/simulate/test/{simulationId}/{executionId}.
The execution detail page has three tabs: Simulated runs, Logs, and Analytics.
Simulated runs tab
Shows the full conversation transcript — every turn between the simulated user and your prompt. Review the dialogue to see how the prompt handled the scenario.
Analytics tab
Shows aggregate performance metrics across executions in this simulation:
| Metric group | What it shows |
|---|---|
| Chat Details | Total chats, completed count, completion percentage |
| System Metrics | Avg total tokens, avg input tokens, avg output tokens, avg chat latency (ms) |
| Evaluation Metrics | Average score per configured evaluation (e.g., Task Completion, Tone) |
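To make the aggregates concrete, here is a minimal sketch of how the Chat Details and System Metrics figures can be derived from per-conversation results. The record shape and field names (`status`, `total_tokens`, `latency_ms`) are hypothetical, invented for illustration — they are not the platform's export schema.

```python
# Hypothetical per-conversation records; field names are illustrative only,
# not the platform's actual data schema.
conversations = [
    {"status": "Completed", "total_tokens": 1800, "latency_ms": 950},
    {"status": "Completed", "total_tokens": 2200, "latency_ms": 1100},
    {"status": "Failed",    "total_tokens": 400,  "latency_ms": 700},
]

completed = [c for c in conversations if c["status"] == "Completed"]

# Chat Details: total chats, completed count, completion percentage
total_chats = len(conversations)
completed_count = len(completed)
completion_pct = 100 * completed_count / total_chats

# System Metrics: averages taken over all conversations
avg_total_tokens = sum(c["total_tokens"] for c in conversations) / total_chats
avg_latency_ms = sum(c["latency_ms"] for c in conversations) / total_chats
```

With the sample records above, completion percentage is about 66.7% and average latency is about 917 ms; the Analytics tab presents the same kind of aggregates without any of this manual work.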
Reading the executions grid
Back on the simulation detail view, the grid shows one row per conversation with these columns:
| Column | Description |
|---|---|
| Status | Completed, In Progress, or Failed |
| CSAT | Customer satisfaction score with color indicator |
| Total Tokens | Total tokens used in the conversation |
| Input Tokens | Prompt tokens |
| Output Tokens | Completion tokens |
| Average Latency (ms) | Average response time per turn |
| Turn Count | Number of back-and-forth turns |
| Evaluation Metrics | Per-eval results as colored tags |
Iterate — swap versions and re-run
Use the Version dropdown in the simulation header to switch to a different prompt version, then click Run Simulation again. Each run appends new rows to the executions grid — all previous runs are preserved. Compare CSAT and evaluation scores across runs to measure whether prompt changes improved results.
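To make "compare across runs" concrete, here is a minimal sketch of the kind of comparison the executions grid supports. The run labels and CSAT values are invented for illustration; in practice you would read the scores directly from the grid.

```python
# Hypothetical CSAT scores, one per scenario conversation in each run.
# Labels and values are invented for illustration.
runs = {
    "v1-run": [3.8, 4.0, 3.5],
    "v2-run": [4.4, 4.6, 4.1],
}

# Average CSAT per run, then identify the stronger prompt version.
averages = {name: sum(scores) / len(scores) for name, scores in runs.items()}
best = max(averages, key=averages.get)

for name, avg in averages.items():
    print(f"{name}: avg CSAT {avg:.2f}")
print(f"Higher average CSAT: {best}")
```

Because every run is preserved in the grid, this comparison stays available as you accumulate more versions — each re-run simply adds another set of rows to weigh against the earlier ones.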
What you built
You can now run multi-turn chat simulations against any prompt version, review CSAT scores and evaluation results, and iterate on prompt quality without writing any code.
- Opened a saved prompt in the Prompts workbench and navigated to the Simulation tab
- Created a chat simulation by selecting a prompt version and attaching scenarios
- Configured evaluations to score each completed conversation automatically
- Ran the simulation and reviewed per-conversation CSAT scores and transcripts in the execution detail view
- Iterated by switching prompt versions and re-running without leaving the workbench