What is Observe → Simulate?
Observe → Simulate lets you replay real production conversations captured via Observe, and rerun them safely in a development environment using chat simulation. If something went wrong in production: a hallucination, tool failure, bad tone, or incorrect decision, you can:- Select the exact trace or session from Observe
- Click Replay
- Recreate the same user intent as a simulation scenario
- Re-run the full conversation end-to-end against your dev agent
- Modify your agent (prompt, logic, tools) and replay again
When is this important?
Use Observe → Simulate when you want to:- Debug real failures instead of synthetic test cases
- Reproduce edge cases seen only in production
- Compare before vs after agent behavior
- Safely test fixes without impacting users
- Turn production issues into repeatable regression tests
If you can observe it, you should be able to replay it.
How it works
-
Observe captures sessions
Your production system sends sessions, messages, tools, and metadata to Future AGI via Observe. -
You select a session
Choose a full session from the Observe UI. -
Replay generates scenarios
Future AGI automatically creates chat simulation scenarios that recreate the original user intent and flow. -
Simulation runs in dev
The replay is executed using Chat Simulation, calling your dev agent turn-by-turn. -
You iterate and re-run
Update prompts, logic, tools, or models and replay again.
Prerequisites
Before using Observe → Simulate, make sure you have:- Observe integrated in your production architecture
- Chat Simulation configured in Future AGI
- A chat agent callback available in your dev environment
FI_API_KEYandFI_SECRET_KEY
Integration overview
Observe → Simulate does not require a new integration. It builds directly on top of Chat Simulation.Required components
| Component | Purpose |
|---|---|
| Observe | Capture real production |
| Chat Agent Definition | Defines your chat agent |
| Scenarios (auto-generated) | Recreate user intent from production |
| Run Test (Chat) | Executes replayed sessions |
| Agent Callback | Your dev agent implementation |
Step 1: Integrate Observe (Production)
Once Observe is integrated, all sessions automatically appear in the Future AGI platform. No additional setup is required for replay.Step 2: Select a session to replay
From the Observe UI:- Open a session or trace
- Click Replay
- Choose:
- Environment (e.g.
dev) - Agent version
- Optional overrides (prompt, model, tools)
- Environment (e.g.
- Conversation turns
- User intent
- Tool usage
- Metadata
Step 3: Run replay using Chat Simulation
Follow the steps in chat simulation using SDK. Once the replay completes, you’ll see a results dashboard with a detailed breakdown of every replayed session:
Understanding the replay results
The replay results view gives you a side-by-side comparison of all replayed sessions so you can quickly identify regressions, improvements, and patterns. Performance Metrics at the top provide an aggregate summary:- Chat Details — Total chats, completed count, and completion percentage.
- System Metrics — Avg output tokens, avg chat latency (ms), avg turn count, and avg CSAT score across all sessions.
- Evaluation Metrics — Aggregated evaluation scores such as Ground Truth Match, showing how closely your agent’s responses align with the original production responses.
- Compare CSAT scores per session to spot which conversations improved or degraded.
- Review token usage (total, input, output) to understand cost and verbosity changes.
- Click into any session to see a turn-by-turn diff between the original production conversation and the replayed version, highlighting differences in agent responses, tool calls, and decision paths.
Step 4: Iterate and replay again
Update prompts, fix logic, change tools or models, and replay the same session again to verify improvements.Common workflows
Debug a bad production response
Replay → Fix → Replay againConvert a failure into a regression test
Replay → Save scenario → Add to CI runsCompare agent versions
Replay the same session across multiple agentsKey takeaway
Observe → Simulate turns production data into a development superpower.Every production failure becomes a reproducible test case.