What it does
- Connects a simulated “customer” into your LiveKit room to talk with your deployed agent
- Records per-participant WAVs and a combined conversation WAV
- Produces a transcript and a structured report
- Integrates with ai-evaluation to score the quality of the agent’s performance
Requirements
- LiveKit room with your agent already connected (Cloud or self-host)
- Python 3.12 recommended (works with 3.10–3.13)
- Environment: `LIVEKIT_URL`, `LIVEKIT_API_KEY`, `LIVEKIT_API_SECRET`, `OPENAI_API_KEY` (for the simulator)
- Optional: `FI_API_KEY`, `FI_SECRET_KEY` (for evaluations)
Install
Quick start
- Minimal test run against a deployed agent (sketch below):
- The SDK base64‑encodes any audio input mapped from a local file path (e.g., `audio_combined_path`) before sending to the evaluator; your eval specs should reference the report field name directly.
- Mapping is strict: if a template expects `audio`, you must map to `audio`.
How recording works
- A passive recorder participant joins your room and subscribes to all remote audio tracks.
- Per-identity WAVs are written to `recordings/<room>-<identity>-track-<sid>.wav`.
- A persona‑level combined WAV is mixed and attached to each result as `audio_combined_path`.
- Result fields (on `TestCaseResult`):
  - `audio_input_path`: simulated customer’s recording
  - `audio_output_path`: your agent’s recording
  - `audio_combined_path`: mono mix of the conversation
Simulator customization (STT/LLM/TTS/VAD)
- The deployed agent (your agent) is not modified by the SDK; you control its stack.
- The simulated customer can be configured via `SimulatorAgentDefinition` and passed to `TestRunner.run_test(...)` (see the sketch after this list).
- LLM: `model`, `temperature`
- TTS: `model`, `voice`
- STT: `language`
- VAD: `provider` (e.g., Silero)
- Turn-taking: `allow_interruptions`, `min_endpointing_delay`, `max_endpointing_delay`
Ending calls
- The SDK waits for a natural session close or a hard timeout.
- Best practice: your agent should own hangups (e.g., an `end_call` tool) and ask for explicit confirmation before ending. Add turn/time gates if needed (sketch below).
Troubleshooting
- No recordings
  - Ensure `LIVEKIT_API_KEY`/`LIVEKIT_API_SECRET` are set and valid
  - Leave `recorder_join_delay <= 0.2` to catch early utterances
- Evaluations say “Audio upload failed”
  - Ensure your `eval_specs` map `audio` to `audio_combined_path`
  - The helper base64‑encodes local paths automatically
- Stalls: “speech scheduling is paused”
  - Use STT turn detection; keep `allow_interruptions=True`; use balanced endpointing delays (≈0.2–2.2 s)
Public API (import from `fi.simulate`)
- `AgentDefinition`
- `SimulatorAgentDefinition`
- `Scenario`, `Persona`
- `TestRunner`
- `TestReport`, `TestCaseResult`
- `ScenarioGenerator`
- `evaluate_report`
Core classes quick reference
- `Persona`
  - persona: dict (e.g., `{"name": "Alice"}`)
  - situation: str (what the customer wants)
  - outcome: str (what “done” looks like)
- `Scenario`
  - name: str
  - dataset: list[Persona]
- `AgentDefinition` (your deployed agent under test)
  - name: str
  - url: str (LiveKit URL)
  - room_name: str
  - system_prompt: str
  - llm/tts/stt/vad: simple config knobs (optional; your deployment usually controls these)
- `SimulatorAgentDefinition` (simulated customer model/voice)
  - instructions: str (persona behavior)
  - llm: `{"model": "...", "temperature": ...}`
  - tts: `{"model": "...", "voice": "..."}`
  - stt: `{"language": "..."}`
  - vad: `{"provider": "silero"}`
  - allow_interruptions, min/max_endpointing_delay, use_tts_aligned_transcript (optional)
- `TestRunner`
  - run_test(agent_definition, scenario, simulator=None, record_audio=True, …) -> TestReport
  - Records per-speaker WAVs and creates a combined WAV per persona when enabled
- `TestReport`
  - results: list[TestCaseResult]
- `TestCaseResult`
  - persona: Persona
  - transcript: str
  - evaluation: dict | None
  - audio_input_path: str | None  # simulated customer audio
  - audio_output_path: str | None  # support agent audio
  - audio_combined_path: str | None  # mixed mono WAV for the call