Replay

Replay real production sessions in a dev environment using chat simulation to debug, iterate, and improve your agent.

What it is

Replay (Observe → Simulate) lets you replay real production conversations captured via Observe and rerun them in a development environment using chat simulation. When something goes wrong in production—a hallucination, tool failure, bad tone, or wrong decision—you can select the exact session or trace from Observe, create a replay session, turn it into a simulation scenario, and run the same conversation end-to-end against your dev agent. Change your agent (prompt, logic, tools) and replay again to verify fixes. This closes the loop between observability and iteration.

Under the hood, the platform creates a replay session (tied to your Observe project), generates a chat agent definition and a graph scenario from the production transcripts, and links them to a run test. You then run that run test via Chat Simulation (UI or SDK). Results appear in the same dashboard so you can compare replayed runs and spot regressions or improvements.

Replay types: session vs trace

  • Session — Replays all traces in a given session_id, ordered by span start time; one multi-turn conversation per session. Use when you want to replay full production conversations as multi-turn chat scenarios.
  • Trace — Replays each selected trace as a separate conversation with one turn (input → output). Use when you want to replay individual calls or single-turn interactions.
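The grouping difference between the two types can be sketched in a few lines. This is a minimal illustration with made-up trace records; the field names (session_id, span_start, input, output) are assumptions for the sketch, not the platform's schema:

```python
from collections import defaultdict

# Hypothetical trace records as Observe might capture them (fields are illustrative).
traces = [
    {"session_id": "s1", "span_start": 2, "input": "And in Paris?", "output": "Rainy."},
    {"session_id": "s1", "span_start": 1, "input": "Weather in Rome?", "output": "Sunny."},
    {"session_id": "s2", "span_start": 1, "input": "Reset my password", "output": "Done."},
]

def build_conversations(traces, replay_type):
    if replay_type == "session":
        # One multi-turn conversation per session_id, ordered by span start time.
        sessions = defaultdict(list)
        for t in traces:
            sessions[t["session_id"]].append(t)
        return [
            [(t["input"], t["output"]) for t in sorted(ts, key=lambda t: t["span_start"])]
            for ts in sessions.values()
        ]
    # replay_type == "trace": each trace becomes its own single-turn conversation.
    return [[(t["input"], t["output"])] for t in traces]

session_convs = build_conversations(traces, "session")  # 2 conversations, s1 has 2 turns
trace_convs = build_conversations(traces, "trace")      # 3 single-turn conversations
```

Note how the session replay reorders the two s1 traces by span start time, so the replayed conversation matches the order the user actually experienced.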

Note

Replay does not require a new integration. It builds on Observe (to capture production sessions/traces) and Chat Simulation (to run the replayed conversations).


Use cases

  • Debug real failures — Reproduce and fix issues from production instead of relying only on synthetic test cases.
  • Reproduce edge cases — Re-run conversations that only happened in production so you can iterate on them safely.
  • Compare before vs after — Change your agent and replay the same session to see how behavior and metrics change.
  • Test fixes safely — Validate prompt, model, or tool changes without impacting live users.
  • Turn failures into regression tests — Save the replayed scenario and add it to CI or regular simulation runs.

Common workflows: Debug a bad production response (replay → fix prompt/logic/tools → replay and compare); turn a failure into a regression test (replay → save the scenario in Simulate → add to regular run tests or CI); compare agent versions (replay the same session against different versions or prompts and compare metrics and transcripts).


How to

You need Observe integrated (so production sessions and traces are in the platform), and FI_API_KEY / FI_SECRET_KEY for the replay and simulation APIs. To run the simulation via the SDK you’ll also need a chat agent callback and any LLM provider keys it uses — see Chat Simulation Using SDK.

The flow is: select production data → create a replay session → generate scenario (agent + scenario from transcripts) → create run test → run simulation → view results and iterate.
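Along this flow, the replay session itself moves through a small set of states. The state names (INIT, GENERATING, COMPLETED) come from the steps below; modeling them as a transition table is an illustrative sketch, not platform code:

```python
# Replay-session lifecycle implied by the steps on this page.
TRANSITIONS = {
    ("INIT", "generate_scenario"): "GENERATING",     # scenario generation starts
    ("GENERATING", "create_run_test"): "COMPLETED",  # run test links the replay session
}

def advance(state: str, event: str) -> str:
    # Unknown events leave the state unchanged.
    return TRANSITIONS.get((state, event), state)
```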

Have Observe capturing production data

With Observe integrated, your production system sends sessions and traces to the platform; they are stored per project. Once that data is there, you can create a replay session from it — no extra setup for replay.

Select sessions or traces and create a replay session

From the Observe experience (e.g. your project’s sessions or traces), choose what to replay: either sessions (full multi-turn conversations by session_id) or traces (individual traces, each treated as one turn). Create a replay session with:

  • project_id — The Observe project that owns the data.
  • replay_type — "session" or "trace".
  • ids — List of session IDs or trace IDs to replay, or set select_all to include all sessions or all traces for the project.

The platform creates a replay session in INIT and returns suggestions (e.g. agent_name, scenario_name, agent_description) and, if you already have replay sessions for this project, an existing agent definition to reuse. You can use these when generating the scenario in the next step.
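As a concrete sketch, a create-replay-session request built from the parameters above might look like the following. The field names follow this page's parameter list, but the payload shape and the validation rule are assumptions, not the platform's documented schema:

```python
# Hypothetical request body for creating a replay session (shape is an assumption).
def validate_replay_request(payload: dict) -> None:
    assert payload.get("project_id"), "project_id is required"
    assert payload.get("replay_type") in {"session", "trace"}
    # Either an explicit ID list or select_all, not neither.
    assert payload.get("ids") or payload.get("select_all"), "pass ids or select_all"

create_replay_payload = {
    "project_id": "obs-project-123",   # the Observe project that owns the data
    "replay_type": "session",          # or "trace" for single-turn replays
    "ids": ["sess_a1", "sess_b2"],     # session IDs (trace IDs when replay_type="trace")
    # "select_all": True,              # alternatively, include everything in the project
}
validate_replay_request(create_replay_payload)
```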

Generate scenario (agent + scenario from transcripts)

On the replay session, trigger Generate scenario. You provide:

  • agent_name, scenario_name (required); agent_description (optional).
  • agent_type — "text" (chat) or "voice"; for replay → chat simulation, use "text".
  • no_of_rows — How many scenario rows to generate from the transcripts (default 20).
  • Optional: personas, custom_columns, graph, generate_graph.

The platform creates or updates an agent definition for the project, creates a graph scenario (source Session Replay) from the production transcripts, and starts the scenario generation workflow. The replay session moves to GENERATING. When the workflow finishes, the scenario is ready to use in a run test.
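Put together, a Generate scenario request assembled from the parameters above could look like this. The parameter names come from the list above; the payload shape itself is an assumption:

```python
# Hypothetical "Generate scenario" request body (shape is an assumption).
generate_payload = {
    "agent_name": "support-bot-replay",        # required
    "scenario_name": "prod-failures-week-12",  # required
    "agent_description": "Replayed production support conversations",  # optional
    "agent_type": "text",  # "text" for replay → chat simulation ("voice" also exists)
    "no_of_rows": 20,      # scenario rows generated from the transcripts (default 20)
    # Optional: "personas", "custom_columns", "graph", "generate_graph"
}

required = {"agent_name", "scenario_name"}
assert required <= generate_payload.keys()
```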

Create a run test and run the simulation

Once the scenario is ready, create a run test that uses the replay session’s agent definition and scenario. When creating the run test, pass replay_session_id so the platform can mark the replay session as COMPLETED and link it to the new run test.

Then run the simulation the same way you run any chat simulation: from the UI (Simulate → Run Simulation, then run the new run test) or via the Chat Simulation SDK (use the run test name and your agent callback). The replayed conversations run against your dev agent; transcripts and evals are stored in the dashboard.
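For the SDK path, the simulator drives your dev agent through a callback, one replayed user turn at a time. The callback signature and stub logic below are assumptions for illustration; see Chat Simulation Using SDK for the real interface:

```python
# Sketch of a chat-agent callback as a simulation SDK might invoke it.
def my_dev_agent(message: str, history: list[tuple[str, str]]) -> str:
    """Return the dev agent's reply for one replayed turn.

    In a real setup this would call your prompt/LLM/tool stack; here it is a stub.
    """
    if "refund" in message.lower():
        return "I can help with that refund. Could you share your order ID?"
    return "Thanks, let me look into that."

# The simulator would call the callback with each replayed user turn, e.g.:
reply = my_dev_agent("I want a refund", history=[])
```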

View results and iterate

Open the run test (or simulation) and inspect the test execution and call executions. You get the same kind of results as for any chat simulation.

Performance metrics (top of the execution view):

  • Chat details — total chats, completed count, completion percentage.
  • System metrics — avg output tokens, avg chat latency (ms), avg turn count, avg CSAT.
  • Evaluation metrics — aggregated eval scores (e.g. ground truth match, task completion) showing how closely the replayed agent matches or improves on the original production behavior.
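To make these aggregates concrete, here is how the chat details and system metrics could be derived from per-chat results. The record fields are illustrative, not the API schema:

```python
# Hypothetical per-chat results from one replayed run test (fields are illustrative).
chats = [
    {"completed": True,  "output_tokens": 120, "latency_ms": 900,  "turns": 4, "csat": 4.5},
    {"completed": True,  "output_tokens": 80,  "latency_ms": 700,  "turns": 3, "csat": 5.0},
    {"completed": False, "output_tokens": 30,  "latency_ms": 1500, "turns": 1, "csat": 2.0},
]

total_chats = len(chats)
completed_count = sum(c["completed"] for c in chats)
completion_pct = 100 * completed_count / total_chats
avg_output_tokens = sum(c["output_tokens"] for c in chats) / total_chats
avg_latency_ms = sum(c["latency_ms"] for c in chats) / total_chats
avg_turns = sum(c["turns"] for c in chats) / total_chats
avg_csat = sum(c["csat"] for c in chats) / total_chats
```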

  • Session list — Each row is one replayed session. Compare CSAT, token usage (total, input, output), and per-eval scores across runs.
  • Single session — Click a session to see the turn-by-turn transcript (and, where available, a diff or comparison to the original production conversation) so you can see exactly where the agent’s responses, tool calls, or decisions changed after your fix.

Update your agent (prompt, logic, tools, or model) and replay again to verify improvements.

