Testing a Voice AI Agent with Agent Simulate SDK
This cookbook demonstrates how to use the agent-simulate SDK to test a conversational voice AI agent.
We will:
- Install the necessary libraries.
- Start a local LiveKit development server.
- Set up environment variables.
- Define a simple, local support agent to act as the agent-under-test.
- Define a test scenario with a simulated customer persona.
- Run the simulation and record the conversation.
- Display the transcript and play back the recorded audio.
- Run evaluations on the conversation.
1. Installation
First, let’s install the agent-simulate SDK and other required Python packages.
pip install agent-simulate
Download VAD Model
The livekit-agents SDK uses the Silero VAD (Voice Activity Detection) plugin. We need to download its model weights before we can start the simulation.
from livekit.plugins import silero
print("Downloading Silero VAD model...")
silero.VAD.load()
print("Download complete.")
2. Start LiveKit Server
For this demo, we’ll run a local LiveKit development server. Open a new terminal and run the following commands to download and start the server:
curl -sSL https://get.livekit.io | bash
livekit-server --dev --bind 127.0.0.1
The server will keep running in that terminal.
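Before moving on, you can sanity-check that the server is reachable. Here is a minimal sketch (standard-library Python only; the `port_open` helper is ours, not part of any SDK) that tests whether a TCP port accepts connections:

```python
import socket

def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Demo against a throwaway local listener (a stand-in for livekit-server on 7880):
srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(1)
demo_port = srv.getsockname()[1]
print(port_open("127.0.0.1", demo_port))  # True
srv.close()
```

With the dev server running, `port_open("127.0.0.1", 7880)` should return True.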
3. Set Environment Variables
We need to configure our API keys and LiveKit server details. The livekit-server --dev command prints the key, secret, and URL you need.
Important:
- Copy the API Key, API Secret, and URL from the livekit-server output.
- You will also need an OPENAI_API_KEY for the simulated customer's LLM.
- If you want to run evaluations, you'll also need your FI_API_KEY and FI_SECRET_KEY.
import os
import getpass
os.environ["LIVEKIT_URL"] = "http://127.0.0.1:7880"
os.environ["LIVEKIT_API_KEY"] = "devkey" # From livekit-server output
os.environ["LIVEKIT_API_SECRET"] = "secret" # From livekit-server output
os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter your OpenAI API key: ")
# For evaluations
os.environ["FI_API_KEY"] = getpass.getpass("Enter your FI API key: ")
os.environ["FI_SECRET_KEY"] = getpass.getpass("Enter your FI secret key: ")
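A simulation run can take minutes, so it's worth failing fast on missing configuration. A quick guard (plain Python; the `missing_env_vars` helper is ours, not part of the SDK):

```python
import os

REQUIRED = ["LIVEKIT_URL", "LIVEKIT_API_KEY", "LIVEKIT_API_SECRET", "OPENAI_API_KEY"]

def missing_env_vars(required=REQUIRED, env=os.environ):
    """Return the names of required variables that are unset or empty."""
    return [name for name in required if not env.get(name)]

# Demo against a fake environment rather than the real one:
fake_env = {"LIVEKIT_URL": "http://127.0.0.1:7880", "LIVEKIT_API_KEY": "devkey"}
print(missing_env_vars(env=fake_env))  # ['LIVEKIT_API_SECRET', 'OPENAI_API_KEY']
```

Call `missing_env_vars()` with no arguments to check the real environment before starting a run.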
4. Define the Agent-Under-Test
Instead of connecting to a remote, deployed agent, we’ll define and run a simple SupportAgent locally. The TestRunner will manage spawning this agent for each test case.
import asyncio
import uuid
import contextlib
from dotenv import load_dotenv
from fi.simulate import AgentDefinition, Scenario, Persona, TestRunner, evaluate_report
from livekit import rtc
from livekit.api import AccessToken, VideoGrants
from livekit.agents import Agent, AgentSession, function_tool
from livekit.plugins import openai, silero
from livekit.agents.voice.room_io import RoomInputOptions, RoomOutputOptions
import logging
logging.basicConfig(level=logging.INFO)
class SupportAgent(Agent):
def __init__(self, *, room: rtc.Room, **kwargs):
super().__init__(**kwargs)
self._room = room
@function_tool()
async def end_call(self) -> None:
self.session.say("I'm glad I could help. Have a great day! Goodbye.")
await asyncio.sleep(0.2)
self.session.shutdown()
        # Disconnect the room if it's still connected. Some SDK versions expose
        # isconnected as a method, others as a property, so handle both;
        # contextlib.suppress keeps shutdown best-effort.
        with contextlib.suppress(Exception):
            connected = getattr(self._room, "isconnected", None)
            if callable(connected):
                connected = connected()
            if connected:
                await self._room.disconnect()
async def run_support_agent(lk_url: str, lk_api_key: str, lk_api_secret: str, room_name: str):
token = (
AccessToken(lk_api_key, lk_api_secret)
.with_identity("support-agent")
.with_grants(VideoGrants(room_join=True, room=room_name))
.to_jwt()
)
room = rtc.Room()
await room.connect(lk_url, token)
agent = SupportAgent(
room=room,
stt=openai.STT(),
llm=openai.LLM(model="gpt-4o-mini", temperature=0.7),
tts=openai.TTS(voice="alloy"),
vad=silero.VAD.load(),
allow_interruptions=True,
min_endpointing_delay=0.4,
max_endpointing_delay=2.2,
instructions=(
"You are a helpful support agent. Be friendly and proactive. "
"Ask clarifying questions and provide step-by-step guidance. "
"Keep the conversation going for at least 6 turns unless the issue is resolved. "
"When the customer confirms their issue is resolved or they say they're done, "
"call the `end_call` tool to gracefully end the call."
),
)
session = AgentSession(
stt=agent.stt,
llm=agent.llm,
tts=agent.tts,
vad=None,
turn_detection="stt",
allow_interruptions=True,
discard_audio_if_uninterruptible=True,
min_interruption_duration=0.25,
min_endpointing_delay=0.35,
max_endpointing_delay=2.0,
preemptive_generation=True,
)
await session.start(
agent,
room=room,
room_input_options=RoomInputOptions(
delete_room_on_close=False,
# ensure the agent hears both simulator and other agents
participant_kinds=[rtc.ParticipantKind.PARTICIPANT_KIND_STANDARD,
rtc.ParticipantKind.PARTICIPANT_KIND_AGENT],
),
room_output_options=RoomOutputOptions(transcription_enabled=False),
)
# small delay so tracks publish before the greeting
await asyncio.sleep(0.6)
session.say("Hello! How can I help you today?")
# Wait until session closes
closed = asyncio.Event()
session.on("close", lambda ev: closed.set())
await closed.wait()
    # Ensure disconnect (best-effort; isconnected may be a method or a property)
    with contextlib.suppress(Exception):
        connected = getattr(room, "isconnected", None)
        if callable(connected):
            connected = connected()
        if connected:
            await room.disconnect()
5. Define Test Scenario & Persona
Now we’ll use the agent-simulate SDK to define the test case. We need two main components:
- AgentDefinition: tells the TestRunner how to spawn our local SupportAgent.
- Scenario: contains one or more Persona objects that define the simulated customer's details.
from fi.simulate import AgentDefinition, Scenario, Persona, TestRunner
room_name = "test-room-1"
# 1. Define the agent to be tested.
# We point it at the local LiveKit server and the room our SupportAgent joins.
agent_definition = AgentDefinition(
    name="local-support-agent",
url=os.environ["LIVEKIT_URL"],
room_name=room_name,
system_prompt="Helpful support agent",
)
# 2. Create a test scenario
scenario = Scenario(
name="Account Login Support",
dataset=[
Persona(
persona={"name": "Fubar", "mood": "annoyed"},
situation="He is trying to log into his account but keeps getting an 'invalid password' error, even though he's sure it's correct.",
outcome="The agent should calmly guide him to reset his password.",
),
]
)
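A Scenario's dataset can hold multiple Persona entries, and each one becomes its own simulated conversation. As plain data (illustrative only; "Dana" is a made-up example mirroring the fields above, not part of the SDK):

```python
# Hypothetical persona data mirroring the Persona fields used above
extra_personas = [
    {
        "persona": {"name": "Dana", "mood": "calm"},
        "situation": "She wants to update the email address on her account.",
        "outcome": "The agent should walk her through the account settings.",
    },
]

def summarize(p):
    """One-line summary, handy when logging which persona is running."""
    return f"{p['persona']['name']} ({p['persona']['mood']}): {p['situation']}"

print(summarize(extra_personas[0]))
```

To use an extra persona for real, wrap it in a Persona(...) and append it to the scenario's dataset list.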
6. Run the Simulation
Now we’ll instantiate the TestRunner and call run_test. This will:
- Create a new, unique LiveKit room for this test.
- Spawn our SupportAgent and connect it to the room.
- Connect the simulated customer ("Fubar") to the room.
- Record the full conversation.
- Return a TestReport containing the results.
# This can take a few minutes to run
support_task = asyncio.create_task(
run_support_agent(
os.environ["LIVEKIT_URL"],
os.environ["LIVEKIT_API_KEY"],
os.environ["LIVEKIT_API_SECRET"],
room_name,
)
)
try:
    runner = TestRunner()
    report = await runner.run_test(
        agent_definition,
        scenario,
        record_audio=True,
        max_seconds=240.0,
    )
    # Print the report for inspection
    print(report.model_dump_json(indent=2))
except Exception as e:
    print(f"Error: {e}")
finally:
    # Stop the locally spawned SupportAgent task
    support_task.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await support_task
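The structure of this cell, a long-lived background task plus a foreground run followed by cleanup, can be sketched in miniature with plain asyncio (the sleeps stand in for the agent and for run_test; none of the SDK is involved):

```python
import asyncio

async def background_agent():
    # Stands in for run_support_agent: runs until cancelled.
    await asyncio.sleep(3600)

async def main():
    task = asyncio.create_task(background_agent())
    # Stands in for run_test; asyncio.sleep passes `result` through on completion.
    report = await asyncio.sleep(0.01, result="report")
    task.cancel()  # the agent task never finishes on its own, so cancel it
    try:
        await task
    except asyncio.CancelledError:
        pass
    return report

print(asyncio.run(main()))  # report
```

In a notebook kernel an event loop is already running, which is why the real cell uses a bare top-level `await` instead of `asyncio.run`.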
7. View Results
The TestReport object contains the full transcript and paths to the recorded audio files. Let’s display the transcript. In an interactive notebook, you could use IPython.display.Audio to play back the combined conversation.
for result in report.results:
print("--- Transcript ---")
print(result.transcript)
print("\n--- Audio Playback ---")
if result.audio_combined_path and os.path.exists(result.audio_combined_path):
print(f"Audio file saved at: {result.audio_combined_path}")
else:
print("Combined audio file not found.")
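Assuming the combined recording is a standard WAV file (an assumption worth verifying against your actual output), you can inspect it with the standard-library wave module. A sketch using a generated demo file rather than a real recording:

```python
import wave

def wav_duration_seconds(path: str) -> float:
    """Return the duration of a WAV file in seconds."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()

# Demo: write a 1-second silent mono file (16 kHz, 16-bit) and measure it.
with wave.open("demo.wav", "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)
    wf.setframerate(16000)
    wf.writeframes(b"\x00\x00" * 16000)

print(wav_duration_seconds("demo.wav"))  # 1.0
```

Passing `result.audio_combined_path` to `wav_duration_seconds` gives a quick check that the recording covers the whole conversation.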
8. Run Evaluations
The agent-simulate SDK includes a helper function, evaluate_report, to easily run evaluations on your test results using the ai-evaluation library.
You define a list of eval_specs, which map fields from the TestReport (like transcript or audio_combined_path) to the inputs required by your chosen evaluation templates.
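Conceptually, each mapping value is a dotted path into a test result. The helper below is hypothetical (not the SDK's actual resolver) but illustrates how such a path might be followed through nested dicts and attributes:

```python
def resolve_path(obj, dotted: str):
    """Resolve 'a.b.c' against nested dicts/objects (illustrative only)."""
    for part in dotted.split("."):
        obj = obj[part] if isinstance(obj, dict) else getattr(obj, part)
    return obj

# A dict standing in for one test result:
result = {
    "persona": {"situation": "Cannot log in despite a correct password."},
    "transcript": "Agent: Hello! ...",
}
print(resolve_path(result, "persona.situation"))
```

So a spec like `{"input": "persona.situation", "output": "transcript"}` feeds the persona's situation and the full transcript into the chosen evaluation template.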
from fi.simulate.evaluation import evaluate_report
# Ensure you have set your FI_API_KEY and FI_SECRET_KEY in step 3
if os.environ.get("FI_API_KEY"):
eval_specs = [
{"template": "task_completion", "map": {"input": "persona.situation", "output": "transcript"}},
{"template": "tone", "map": {"output": "transcript"}},
{"template": "is_harmful_advice", "map": {"output": "transcript"}},
{"template": "answer_refusal", "map": {"input": "persona.situation", "output": "transcript"}}
]
report = evaluate_report(
report,
eval_specs=eval_specs,
model_name="turing_large",
api_key=os.environ.get("FI_API_KEY"),
secret_key=os.environ.get("FI_SECRET_KEY"),
)
print("\n--- Test Report ---")
for result in report.results:
print(f"\n--- Persona: {result.persona.persona['name']} ---")
print("Transcript:")
print(result.transcript)
if getattr(result, "audio_combined_path", None):
print(f"Combined audio: {result.audio_combined_path}")
if result.evaluation:
print("Evaluation:")
for k, v in result.evaluation.items():
print(f" - {k}: {v}")
print("\n--- End of Report ---")
else:
print("Skipping evaluations. Set FI_API_KEY and FI_SECRET_KEY to run.")
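Once evaluations are attached, you may want an aggregate view across personas. The shape of each evaluation entry below (a dict with a `passed` flag) is an assumption; adapt the accessor to what your report actually contains:

```python
# Assumed shape: evaluation maps template name -> {"passed": bool, ...}.
# Check your actual TestReport before relying on this.
results = [
    {"evaluation": {"tone": {"passed": True}, "task_completion": {"passed": True}}},
    {"evaluation": {"tone": {"passed": False}, "task_completion": {"passed": True}}},
]

def pass_rate(results, template: str) -> float:
    """Fraction of evaluated results where `template` passed."""
    evals = [r["evaluation"][template]["passed"] for r in results if r.get("evaluation")]
    return sum(evals) / len(evals) if evals else 0.0

print(pass_rate(results, "tone"))  # 0.5
```

Aggregates like this make it easy to track regressions across repeated simulation runs.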