Voice Replay: Debug Voice Agents from Production Calls
Replay real production voice calls from Future AGI Observe in simulation to debug, iterate, and improve your voice agent based on real usage.
What it is
Voice Replay (Observe → Simulate) lets you replay real production voice calls captured via Voice Observability and rerun them in a development environment using voice simulation. When something goes wrong in production (a misunderstood order, a wrong tool call, poor latency, or a bad tone), you can select the exact voice trace from Observe, create a replay session, turn it into a simulation scenario, and run a new voice call end-to-end against your dev agent. Change your agent (prompt, model, voice settings) and replay again to verify fixes. This closes the loop between voice observability and iteration.
Under the hood, the platform extracts the original voice configuration (system prompt, assistant settings, provider config) from the production trace’s raw call log, creates a voice agent definition with a configuration snapshot matching the original call, and generates a graph scenario from the production conversation. You then run the scenario via Voice Simulation (UI or SDK). Results include side-by-side transcript comparison, performance metrics comparison, and audio recording playback for both the baseline and replayed calls.
Note
Voice Replay currently supports Vapi as the primary provider. Retell is supported for transcript comparison, but config extraction during replay setup is optimized for Vapi’s data structure.
Use cases
- Debug voice agent failures: Reproduce misunderstood intents, wrong tool calls, or hallucinations from real production calls.
- Compare call quality: Replay the same conversation after changing your prompt, model, or voice settings and compare latency, WPM, and talk ratio side by side.
- Test provider changes: Switch from one voice provider or model to another and replay the same scenarios to measure the impact.
- Iterate on voice UX: Improve first messages, interruption handling, or response length by replaying real caller interactions.
- Turn failures into regression tests: Save the replayed scenario and add it to regular simulation runs or CI.
How to
You need Voice Observability integrated (so production voice calls are captured with their recordings and transcripts), and FI_API_KEY / FI_SECRET_KEY for the replay and simulation APIs.
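Because both keys are required for the replay and simulation APIs, it can help to fail fast when either is missing. A minimal Python sketch, assuming the keys are supplied as environment variables named FI_API_KEY and FI_SECRET_KEY; `FiCredentials` and `load_credentials` are hypothetical helpers, and how the platform's SDK actually consumes the keys is not shown here:

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class FiCredentials:
    """Hypothetical container for the two keys the replay/simulation APIs need."""
    api_key: str
    secret_key: str


def load_credentials(env: Optional[Mapping[str, str]] = None) -> FiCredentials:
    """Read FI_API_KEY / FI_SECRET_KEY, raising if either is missing or empty."""
    env = os.environ if env is None else env
    missing = [name for name in ("FI_API_KEY", "FI_SECRET_KEY") if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing required environment variables: {', '.join(missing)}")
    return FiCredentials(api_key=env["FI_API_KEY"], secret_key=env["FI_SECRET_KEY"])
```

Calling `load_credentials()` at startup surfaces a configuration problem before any replay session is created.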
The flow is: select voice traces → create a replay session → generate scenario (agent + scenario from audio/transcripts) → create run test → run voice simulation → compare with baseline and iterate.
Have Voice Observability capturing production calls
With Voice Observability integrated, your production voice calls (via Vapi, Retell, or other supported providers) are captured as traces with conversation-type spans. Each span stores the full call data, including transcripts, recordings, and call metrics. See Set Up Voice Observability for integration details.
Select voice traces and create a replay session
From the Observe experience, select the voice traces you want to replay. Create a replay session with:
- project_id: The Observe project that owns the voice traces.
- replay_type: "trace" (each voice trace is one complete call).
- ids: List of trace IDs to replay, or set select_all to include all voice traces.
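The replay-session parameters can be assembled into a request body before it is sent. A minimal Python sketch, assuming a dict with exactly the field names listed above; the endpoint and transport are not shown, `build_replay_session_payload` is a hypothetical helper, and treating ids and select_all as mutually exclusive is an assumption:

```python
from typing import Optional, Sequence


def build_replay_session_payload(
    project_id: str,
    ids: Optional[Sequence[str]] = None,
    select_all: bool = False,
) -> dict:
    """Build a replay-session body from project_id, replay_type, ids / select_all."""
    if not project_id:
        raise ValueError("project_id is required")
    # Assumption: ids and select_all are alternatives, so passing both is rejected.
    if select_all and ids:
        raise ValueError("pass either ids or select_all, not both")
    if not select_all and not ids:
        raise ValueError("pass at least one trace ID, or set select_all=True")
    payload = {
        "project_id": project_id,
        "replay_type": "trace",  # each voice trace is one complete call
        "select_all": select_all,
    }
    if ids:
        payload["ids"] = list(ids)
    return payload
```

For example, `build_replay_session_payload("proj-1", ids=["t1", "t2"])` targets two specific calls, while `select_all=True` targets every voice trace in the project.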
The platform detects that these are voice traces (by checking for conversation-type spans), extracts the original voice configuration from the raw call log (system prompt, assistant ID, provider, model, phone number), and returns suggestions including agent_type: "voice" and the extracted config.
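The detection and extraction step can be pictured as follows. This is an illustrative Python sketch, not the platform's implementation: the trace shape (`spans`, `span_type`, `raw_call_log`) and the config key names are hypothetical placeholders, not a real provider schema:

```python
from typing import Any, Optional


def is_voice_trace(trace: dict[str, Any]) -> bool:
    """A trace counts as voice if any of its spans is conversation-type."""
    return any(span.get("span_type") == "conversation" for span in trace.get("spans", []))


def extract_voice_config(trace: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Read the original call config from the first conversation span's raw call log."""
    for span in trace.get("spans", []):
        if span.get("span_type") != "conversation":
            continue
        log = span.get("raw_call_log", {})  # hypothetical key names throughout
        return {
            "system_prompt": log.get("system_prompt"),
            "assistant_id": log.get("assistant_id"),
            "provider": log.get("provider"),
            "model": log.get("model"),
            "phone_number": log.get("phone_number"),
        }
    return None  # no conversation span, so not a voice trace


def suggest(trace: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Mirror the documented suggestions: agent_type "voice" plus the extracted config."""
    config = extract_voice_config(trace)
    if config is None:
        return None
    return {"agent_type": "voice", "config": config}
```

Missing fields come back as `None` rather than raising, so a partially populated call log still yields a suggestion you can review.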

Generate scenario (agent + scenario from audio)
On the replay session, trigger Generate scenario. You provide:
- agent_name, scenario_name (required). agent_description is auto-extracted from the original call’s system prompt.
- agent_type: "voice".
- no_of_rows: How many scenario rows to generate (default 20).
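The Generate scenario inputs can be validated the same way. A minimal Python sketch, assuming a dict body with the documented field names; `build_generate_scenario_payload` is a hypothetical helper, and the positive-integer check on no_of_rows is an added assumption:

```python
def build_generate_scenario_payload(
    agent_name: str,
    scenario_name: str,
    no_of_rows: int = 20,  # documented default
) -> dict:
    """Build the Generate scenario inputs for a voice replay."""
    if not agent_name or not scenario_name:
        raise ValueError("agent_name and scenario_name are required")
    if no_of_rows < 1:
        raise ValueError("no_of_rows must be a positive integer")
    return {
        "agent_name": agent_name,
        "scenario_name": scenario_name,
        "agent_type": "voice",
        "no_of_rows": no_of_rows,
        # agent_description is left out: it is derived from the original system prompt.
    }
```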

The platform:
- Creates a voice agent definition with the original provider config (assistant ID, model, voice settings) preserved in the agent version’s configuration snapshot.
- Extracts user intents from each trace: if recording URLs are available, the audio is used for intent extraction; if no recordings exist, text transcripts are used as a fallback.
- Generates a graph scenario (source Session Replay) with persona, situation, and outcome columns derived from the call data.
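The audio-first, transcript-fallback rule for intent extraction can be sketched in a few lines of Python. The `recording_urls` and `transcript` keys are hypothetical, and what the platform does when a trace has neither is not documented; this sketch simply raises:

```python
from typing import Any


def pick_intent_source(trace: dict[str, Any]) -> tuple[str, list[Any]]:
    """Prefer recordings for intent extraction; fall back to the text transcript."""
    recordings = [url for url in trace.get("recording_urls", []) if url]
    if recordings:
        return "audio", recordings
    transcript = trace.get("transcript", [])
    if transcript:
        return "transcript", transcript
    # Not documented: this sketch treats a trace with neither input as an error.
    raise ValueError("trace has neither recording URLs nor a transcript")
```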
The replay session moves to GENERATING. When the workflow finishes, the scenario is ready.
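If you drive this step from a script, you need to wait for the session to leave GENERATING. A minimal Python polling sketch, assuming you supply a `fetch_status` callable that returns the session's current status string (how you fetch it is not shown) and that the first poll happens after generation has been triggered; `wait_for_generation` is hypothetical:

```python
import time
from typing import Callable


def wait_for_generation(
    fetch_status: Callable[[], str],
    poll_interval_s: float = 5.0,
    timeout_s: float = 600.0,
) -> str:
    """Poll until the replay session leaves GENERATING and return the status it ends in."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = fetch_status()
        if status != "GENERATING":
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("replay session still GENERATING after timeout")
        time.sleep(poll_interval_s)
```

Only GENERATING is documented here, so the sketch returns whatever status follows it and leaves interpreting that status to the caller.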

Once generated, you can review the scenario rows with persona, situation, and outcome details.

Map eval variables and start replay
After scenarios are generated, you can optionally map eval variables: connect scenario columns (such as expected outcome or situation context) to evaluation metrics so the platform can automatically score each replayed call. You can also add more evaluations after the replay.
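A mapping is easiest to reason about as "eval variable → scenario column". This Python sketch checks such a mapping before starting a replay; the eval variable names in the usage example (`expected_outcome`, `context`) are hypothetical, while persona, situation, and outcome are the scenario columns described earlier:

```python
def validate_eval_mapping(
    mapping: dict[str, str],
    required_variables: set[str],
    scenario_columns: set[str],
) -> list[str]:
    """Return a list of problems; an empty list means every variable maps to a real column."""
    problems = []
    for var in sorted(required_variables - set(mapping)):
        problems.append(f"eval variable '{var}' is not mapped")
    for var, column in sorted(mapping.items()):
        if column not in scenario_columns:
            problems.append(f"'{var}' maps to unknown column '{column}'")
    return problems
```

For example, `validate_eval_mapping({"expected_outcome": "outcome", "context": "situation"}, {"expected_outcome", "context"}, {"persona", "situation", "outcome"})` returns an empty list.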
Then click Start Replay to create a run test linked to the replay session.
Run the voice simulation
Once the run test is created, run the voice simulation. The platform calls the voice provider using the preserved configuration snapshot, so the replayed call uses the same assistant settings, model, and voice as the original production call. Each scenario row generates a new voice call.
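The "one snapshot, many calls" behavior can be sketched with a frozen dataclass, so the agent configuration cannot drift between rows. This is an illustrative Python sketch: `ConfigSnapshot`, `build_call_requests`, and all field names are hypothetical, and the request shape the platform actually sends to the provider is not shown:

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class ConfigSnapshot:
    """Hypothetical snapshot of the original call's agent configuration (immutable)."""
    provider: str
    assistant_id: str
    model: str
    voice: str


def build_call_requests(snapshot: ConfigSnapshot, rows: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """One call request per scenario row, each reusing the same frozen snapshot."""
    requests = []
    for index, row in enumerate(rows):
        requests.append({
            "row_index": index,
            "provider": snapshot.provider,
            "assistant_id": snapshot.assistant_id,
            "model": snapshot.model,
            "voice": snapshot.voice,
            # The row shapes the simulated caller, not the agent configuration.
            "persona": row.get("persona"),
            "situation": row.get("situation"),
        })
    return requests
```

Freezing the snapshot means any change to the agent has to be an explicit new snapshot, which keeps the baseline-versus-replay comparison meaningful.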
Compare with baseline and iterate
After the simulation completes, open a call execution and click Compare with baseline call to see a side-by-side comparison:
- Performance metrics: Call Duration, Turn Count, Avg Agent Latency (ms), User WPM, Bot WPM, and Talk Ratio, each showing the value, the absolute change, and the percentage change from the baseline call.
- Audio recordings: Play back both the baseline and replayed call recordings (stereo, mono combined, mono customer, mono assistant) directly in the UI.
- Transcript comparison: Side-by-side transcripts of the baseline call and the replayed call. Use Show Diff to highlight differences between the two conversations.
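The value / absolute change / percentage change triple is straightforward arithmetic against the baseline. A minimal Python sketch, assuming metrics arrive as plain dicts keyed by metric name (the key names are hypothetical); it returns `None` for the percentage when the baseline is zero, which is a choice of this sketch rather than documented behavior:

```python
from typing import Optional


def compare_metric(baseline: float, replay: float) -> dict:
    """Value, absolute change, and percentage change relative to the baseline call."""
    change = replay - baseline
    pct_change: Optional[float] = None if baseline == 0 else change / baseline * 100.0
    return {"value": replay, "change": change, "pct_change": pct_change}


def compare_calls(baseline: dict[str, float], replay: dict[str, float]) -> dict[str, dict]:
    """Compare every metric present in both calls."""
    return {name: compare_metric(baseline[name], replay[name]) for name in baseline if name in replay}
```

With illustrative numbers, a baseline average agent latency of 1200 ms and a replayed latency of 900 ms give a change of -300 ms, or -25%.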
Update your agent (prompt, model, voice settings, or tools) and replay again to verify improvements.
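When you want to compare transcripts outside the UI, for example in a script or CI job, a line-based diff gives a similar view to Show Diff. This Python sketch uses the standard library's `difflib.unified_diff` with one transcript turn per line; it is a swapped-in technique, and the platform's Show Diff view may highlight differences differently:

```python
import difflib


def transcript_diff(baseline_turns: list[str], replay_turns: list[str]) -> str:
    """Unified diff of two transcripts, one turn per line; empty string if identical."""
    return "\n".join(
        difflib.unified_diff(
            baseline_turns,
            replay_turns,
            fromfile="baseline",
            tofile="replay",
            lineterm="",
        )
    )
```

Lines prefixed with `-` appear only in the baseline call and lines prefixed with `+` only in the replayed call.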

Note
The Compare with baseline call button only appears for call executions that originated from a replay session (where a baseline trace exists to compare against).
What you can do next
- Chat Replay: Replay text-based production sessions using chat simulation.
- Voice Observability: Set up voice call monitoring for production calls.
- Scenarios: Understand scenarios and how replay creates graph scenarios from transcripts.
- Agent Definition: Configure voice agents for simulation, including provider settings and voice config.