What it does

  • Connects a simulated “customer” into your LiveKit room to talk with your deployed agent
  • Records per-participant WAVs and a combined conversation WAV
  • Produces a transcript and a structured report
  • Integrates with ai-evaluation to score the quality of the agent’s performance

Requirements

  • LiveKit room with your agent already connected (Cloud or self-host)
  • Python 3.12 recommended (works with 3.10–3.13)
  • Environment variables (a quick preflight check is sketched after this list):
    • LIVEKIT_URL, LIVEKIT_API_KEY, LIVEKIT_API_SECRET
    • OPENAI_API_KEY (for the simulator)
    • Optional FI_API_KEY, FI_SECRET_KEY (for evaluations)
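
A minimal preflight check (plain Python, not an SDK helper) can fail fast when a required variable is missing:

import os

# Hypothetical preflight check: verify the required variables before a test run.
# FI_API_KEY / FI_SECRET_KEY are optional, so they are not checked here.
required = ["LIVEKIT_URL", "LIVEKIT_API_KEY", "LIVEKIT_API_SECRET", "OPENAI_API_KEY"]
missing = [name for name in required if not os.environ.get(name)]
if missing:
    raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")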

Install

pip install agent-simulate

Quick start

  • Minimal test run against a deployed agent:
from fi.simulate import AgentDefinition, Scenario, Persona, TestRunner, evaluate_report
import os, asyncio

async def main():
    agent = AgentDefinition(
        name="support-agent",
        url=os.environ["LIVEKIT_URL"],
        room_name=os.environ.get("AGENT_ROOM_NAME", "test-room-001"),
        system_prompt="Helpful support agent",
    )

    scenario = Scenario(
        name="Support Test",
        dataset=[
            Persona(
                persona={"name": "Alice"},
                situation="Login issues",
                outcome="Reset password successfully",
            )
        ],
    )

    runner = TestRunner()
    report = await runner.run_test(
        agent,
        scenario,
        record_audio=True,           # enable recorder participant
        recorder_sample_rate=8000,   # low-overhead
        recorder_join_delay=0.1,     # join recorder early
        max_seconds=300.0,           # hard timeout safety net
    )

    # Evaluate: map your evaluator inputs to report fields (strict mapping)
    eval_specs = [
        {"template": "task_completion", "map": {"input": "persona.situation", "output": "transcript"}},
        {"template": "tone", "map": {"output": "transcript"}},
        {"template": "audio_transcription", "map": {"audio": "audio_combined_path", "transcription": "transcript"}},
    ]
    report = evaluate_report(
        report,
        eval_specs=eval_specs,
        model_name="turing_large",
        api_key=os.getenv("FI_API_KEY"),
        secret_key=os.getenv("FI_SECRET_KEY"),
    )

    for r in report.results:
        print("Persona:", r.persona.persona["name"])
        print("Transcript:\n", r.transcript)
        print("Combined audio:", getattr(r, "audio_combined_path", None))
        print("Evaluation:", r.evaluation)

asyncio.run(main())
  • The SDK base64‑encodes any audio input mapped from a local file path (e.g., audio_combined_path) before sending to the evaluator; your eval specs should reference the report field name directly.
  • Mapping is strict: if a template expects an audio input, your spec must map the audio key explicitly (see the sketch below).
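
As a sketch of what strict mapping means, assuming the audio_transcription template from the quick start:

# Accepted: every input the template expects is mapped to a report field.
ok_spec = {
    "template": "audio_transcription",
    "map": {"audio": "audio_combined_path", "transcription": "transcript"},
}

# Rejected under strict mapping: the template expects "audio",
# but this spec never maps it.
bad_spec = {
    "template": "audio_transcription",
    "map": {"transcription": "transcript"},
}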

How recording works

  • A passive recorder participant joins your room and subscribes to all remote audio tracks.
  • Per-identity WAVs are written to recordings/<room>-<identity>-track-<sid>.wav.
  • A persona‑level combined WAV is mixed and attached to each result as audio_combined_path.
Result fields (on TestCaseResult):
  • audio_input_path: simulated customer’s recording
  • audio_output_path: your agent’s recording
  • audio_combined_path: mono mix of the conversation
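
The result paths are ordinary WAV files, so the standard-library wave module is enough to sanity-check a run; a minimal sketch using only the documented result fields:

import wave

# Sketch: confirm each persona's combined recording exists and print its shape.
for r in report.results:
    if r.audio_combined_path:
        with wave.open(r.audio_combined_path, "rb") as wav:
            seconds = wav.getnframes() / wav.getframerate()
            print(f"{r.audio_combined_path}: {seconds:.1f}s, "
                  f"{wav.getnchannels()} ch @ {wav.getframerate()} Hz")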

Simulator customization (STT/LLM/TTS/VAD)

  • The SDK never modifies your deployed agent; you control its stack.
  • The simulated customer can be configured via SimulatorAgentDefinition and passed to TestRunner.run_test(...).
Available knobs:
  • LLM: model, temperature
  • TTS: model, voice
  • STT: language
  • VAD: provider (e.g., Silero)
  • Turn-taking: allow_interruptions, min_endpointing_delay, max_endpointing_delay
Example:
from fi.simulate import SimulatorAgentDefinition

sim = SimulatorAgentDefinition(
    name="sim-customer",
    instructions="Be concise, ask clarifying questions, confirm resolution.",
    llm={"model": "gpt-4o-mini", "temperature": 0.6},
    tts={"model": "tts-1", "voice": "alloy"},
    stt={"language": "en"},
    vad={"provider": "silero"},
    allow_interruptions=True,
    min_endpointing_delay=0.3,
    max_endpointing_delay=4.0,
)

report = await runner.run_test(agent, scenario, simulator=sim, record_audio=True)

Ending calls

  • The SDK waits for a natural session close or a hard timeout.
  • Best practice: your agent should own hangups (e.g., an end_call tool) and ask for explicit confirmation before ending. Add turn/time gates if needed; a sketch follows below.
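
One way to give the agent that ownership, sketched with the LiveKit Agents Python SDK (livekit-agents 1.x); the tool name, docstring, and confirmation policy are illustrative choices, not SDK requirements:

from livekit import api
from livekit.agents import RunContext, function_tool, get_job_context

@function_tool
async def end_call(context: RunContext):
    """End the call. Invoke only after the user explicitly confirms they are done."""
    job_ctx = get_job_context()
    # Deleting the room disconnects every participant and closes the session.
    await job_ctx.api.room.delete_room(
        api.DeleteRoomRequest(room=job_ctx.room.name)
    )

Register the tool on your agent (e.g., Agent(..., tools=[end_call])) and restate the confirmation requirement in its instructions so the LLM does not hang up prematurely.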

Troubleshooting

  • No recordings
    • Ensure LIVEKIT_API_KEY/SECRET are set and valid
    • Keep recorder_join_delay at 0.2 s or less so the recorder joins in time for early utterances
  • Evaluations say “Audio upload failed”
    • Ensure your eval_specs map audio to audio_combined_path
    • The helper base64‑encodes local paths automatically
  • Stalls: “speech scheduling is paused”
    • Use STT turn detection, keep allow_interruptions=True, and set balanced endpointing delays (≈0.2–2.2 s); see the snippet below
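
In SimulatorAgentDefinition terms, that advice translates roughly to:

sim = SimulatorAgentDefinition(
    name="sim-customer",
    instructions="Be concise.",
    allow_interruptions=True,    # let the caller barge in instead of stalling
    min_endpointing_delay=0.2,   # lower end of the balanced range
    max_endpointing_delay=2.2,   # upper end of the balanced range
)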

Public API (import from fi.simulate)

  • AgentDefinition
  • SimulatorAgentDefinition
  • Scenario, Persona
  • TestRunner
  • TestReport, TestCaseResult
  • ScenarioGenerator
  • evaluate_report

Core classes quick reference

  • Persona
    • persona: dict (e.g., {"name": "Alice"})
    • situation: str (what the customer wants)
    • outcome: str (what “done” looks like)
  • Scenario
    • name: str
    • dataset: list[Persona]
  • AgentDefinition (your deployed agent under test)
    • name: str
    • url: str (LiveKit URL)
    • room_name: str
    • system_prompt: str
    • llm/tts/stt/vad: simple config knobs (optional; your deployment usually controls these)
  • SimulatorAgentDefinition (simulated customer model/voice)
    • instructions: str (persona behavior)
    • llm: {"model": "...", "temperature": ...}
    • tts: {"model": "...", "voice": "..."}
    • stt: {"language": "..."}
    • vad: {"provider": "silero"}
    • allow_interruptions, min/max_endpointing_delay, use_tts_aligned_transcript (optional)
  • TestRunner
    • run_test(agent_definition, scenario, simulator=None, record_audio=True, …) -> TestReport
    • Records per-speaker WAVs and creates a combined WAV per persona when enabled
  • TestReport
    • results: list[TestCaseResult]
  • TestCaseResult
    • persona: Persona
    • transcript: str
    • evaluation: dict | None
    • audio_input_path: str | None # simulated customer audio
    • audio_output_path: str | None # support agent audio
    • audio_combined_path: str | None # mixed mono WAV for the call