## What it does
- Connects a simulated “customer” into your LiveKit room to talk with your deployed agent
- Records per-participant WAVs and a combined conversation WAV
- Produces a transcript and a structured report
- Integrates with ai-evaluation to score the quality of the agent’s performance
## Requirements
- LiveKit room with your agent already connected (Cloud or self-hosted)
- Python 3.12 recommended (works with 3.10–3.13)
- Environment variables:
  - `LIVEKIT_URL`, `LIVEKIT_API_KEY`, `LIVEKIT_API_SECRET`
  - `OPENAI_API_KEY` (for the simulator)
  - Optional: `FI_API_KEY`, `FI_SECRET_KEY` (for evaluations)
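A quick preflight sketch that checks only the variables listed above before a run:

```python
import os

# Required for the LiveKit connection and the simulated customer's stack.
required = ("LIVEKIT_URL", "LIVEKIT_API_KEY", "LIVEKIT_API_SECRET", "OPENAI_API_KEY")
missing = [name for name in required if not os.getenv(name)]
if missing:
    raise SystemExit(f"Missing environment variables: {', '.join(missing)}")
```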
## Install
## Quick start
- Minimal test run against a deployed agent:
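A sketch assembled from the public API documented below; the agent name, room, prompt, and persona values are placeholders, and `run_test` may need `await` depending on your version:

```python
import os

from fi.simulate import AgentDefinition, Persona, Scenario, TestRunner

# Your deployed agent, already connected to the LiveKit room.
agent = AgentDefinition(
    name="support-agent",                      # placeholder
    url=os.environ["LIVEKIT_URL"],
    room_name="support-room",                  # the room your agent joined
    system_prompt="You are a helpful support agent.",
)

# One simulated customer with a concrete goal and a "done" condition.
scenario = Scenario(
    name="billing-dispute",
    dataset=[
        Persona(
            persona={"name": "Alice"},
            situation="Alice was double-charged for her subscription and wants a refund.",
            outcome="A refund is confirmed and the call ends politely.",
        )
    ],
)

report = TestRunner().run_test(agent, scenario, record_audio=True)
for result in report.results:
    print(result.transcript)
    print(result.audio_combined_path)
```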
- The SDK base64-encodes any audio input mapped from a local file path (e.g., `audio_combined_path`) before sending it to the evaluator; your eval specs should reference the report field name directly.
- Mapping is strict: if a template expects `audio`, you must map to `audio`.
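A hedged sketch of wiring evaluations: the `eval_specs` shape and keyword are assumptions, but the strict `audio` -> `audio_combined_path` key mapping follows the rule above.

```python
from fi.simulate import evaluate_report

# Spec shape is an assumption; only the strict key mapping
# (`audio` -> `audio_combined_path`) is prescribed by this README.
eval_specs = [
    {
        "template": "conversation_quality",           # hypothetical template name
        "mapping": {"audio": "audio_combined_path"},  # template key -> report field
    }
]

# Local WAV paths referenced by the mapping are base64-encoded automatically.
evaluated = evaluate_report(report, eval_specs=eval_specs)
```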
## How recording works
- A passive recorder participant joins your room and subscribes to all remote audio tracks.
- Per-identity WAVs are written to `recordings/<room>-<identity>-track-<sid>.wav`.
- A persona-level combined WAV is mixed and attached to each result as `audio_combined_path`.

Audio fields (on `TestCaseResult`):
- `audio_input_path`: the simulated customer's recording
- `audio_output_path`: your agent's recording
- `audio_combined_path`: mono mix of the conversation
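For example, you can sanity-check the mixed recording with the standard-library `wave` module (using `report` from the quick start above):

```python
import wave

result = report.results[0]
if result.audio_combined_path:
    with wave.open(result.audio_combined_path, "rb") as w:
        duration = w.getnframes() / w.getframerate()
        print(f"Combined call audio: {duration:.1f}s at {w.getframerate()} Hz")
```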
## Simulator customization (STT/LLM/TTS/VAD)
- The deployed agent (your agent) is not modified by the SDK; you control its stack.
- The simulated customer can be configured via `SimulatorAgentDefinition` and passed to `TestRunner.run_test(...)`:
  - LLM: `model`, `temperature`
  - TTS: `model`, `voice`
  - STT: `language`
  - VAD: `provider` (e.g., Silero)
  - Turn-taking: `allow_interruptions`, `min_endpointing_delay`, `max_endpointing_delay`
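A sketch following the quick-reference field shapes below; the concrete model and voice values are placeholders, not defaults:

```python
from fi.simulate import SimulatorAgentDefinition, TestRunner

simulator = SimulatorAgentDefinition(
    instructions="You are Alice: polite, but in a hurry and easily distracted.",
    llm={"model": "gpt-4o-mini", "temperature": 0.7},  # placeholder model
    tts={"model": "tts-1", "voice": "alloy"},          # placeholder voice
    stt={"language": "en"},
    vad={"provider": "silero"},
    allow_interruptions=True,
    min_endpointing_delay=0.2,  # seconds
    max_endpointing_delay=2.2,  # seconds
)

# `agent` and `scenario` as in the quick start.
report = TestRunner().run_test(agent, scenario, simulator=simulator)
```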
## Ending calls
- The SDK waits for a natural session close or a hard timeout.
- Best practice: your agent should own hangups (e.g., an `end_call` tool) and ask for explicit confirmation before ending. Add turn/time gates if needed.
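For illustration, a framework-agnostic sketch of that gating logic; `hangup()` is a placeholder for whatever actually closes your LiveKit session, and the thresholds are arbitrary:

```python
async def hangup() -> None:
    """Placeholder: close the LiveKit session however your stack does it."""
    ...

async def end_call(confirmed: bool, turn_count: int, elapsed_s: float) -> str:
    # Require explicit confirmation plus simple turn/time gates before hanging up.
    if not confirmed:
        return "Just to confirm: would you like to end the call?"
    if turn_count < 2 or elapsed_s < 10.0:
        return "Before we wrap up, is there anything else I can help with?"
    await hangup()
    return "Call ended."
```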
## Troubleshooting
- No recordings
  - Ensure `LIVEKIT_API_KEY`/`LIVEKIT_API_SECRET` are set and valid
  - Leave `recorder_join_delay` <= 0.2 to catch early utterances
- Evaluations say “Audio upload failed”
  - Ensure your `eval_specs` map `audio` to `audio_combined_path`
  - The helper base64-encodes local paths automatically
- Stalls: “speech scheduling is paused”
  - Use STT turn detection; keep `allow_interruptions=True`; use balanced endpointing delays (≈0.2–2.2s)
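If recordings miss the first utterance, lower the join delay. This sketch assumes `recorder_join_delay` is a `run_test` keyword; the name comes from the note above, but its exact placement may differ in your version:

```python
report = TestRunner().run_test(
    agent,
    scenario,
    record_audio=True,
    recorder_join_delay=0.2,  # seconds; keep <= 0.2 to capture early utterances
)
```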
## Public API (import from `fi.simulate`)
- `AgentDefinition`
- `SimulatorAgentDefinition`
- `Scenario`, `Persona`
- `TestRunner`
- `TestReport`, `TestCaseResult`
- `ScenarioGenerator`
- `evaluate_report`
## Core classes quick reference
- `Persona`
  - persona: dict (e.g., `{"name": "Alice"}`)
  - situation: str (what the customer wants)
  - outcome: str (what “done” looks like)
- `Scenario`
  - name: str
  - dataset: list[Persona]
- `AgentDefinition` (your deployed agent under test)
  - name: str
  - url: str (LiveKit URL)
  - room_name: str
  - system_prompt: str
  - llm/tts/stt/vad: simple config knobs (optional; your deployment usually controls these)
- `SimulatorAgentDefinition` (simulated customer model/voice)
  - instructions: str (persona behavior)
  - llm: `{"model": "...", "temperature": ...}`
  - tts: `{"model": "...", "voice": "..."}`
  - stt: `{"language": "..."}`
  - vad: `{"provider": "silero"}`
  - allow_interruptions, min/max_endpointing_delay, use_tts_aligned_transcript (optional)
- `TestRunner`
  - run_test(agent_definition, scenario, simulator=None, record_audio=True, …) -> TestReport
  - Records per-speaker WAVs and creates a combined WAV per persona when enabled
- `TestReport`
  - results: list[TestCaseResult]
- `TestCaseResult`
  - persona: Persona
  - transcript: str
  - evaluation: dict | None
  - audio_input_path: str | None # simulated customer audio
  - audio_output_path: str | None # support agent audio
  - audio_combined_path: str | None # mixed mono WAV for the call