Run Test

This comprehensive guide walks you through creating and running simulation tests to evaluate your AI agents. We’ll continue with our insurance sales agent example to demonstrate the complete testing workflow.

Overview

Running tests in FutureAGI involves a 5-step wizard that guides you through:
  1. Test configuration
  2. Scenario selection
  3. Simulation agent selection
  4. Evaluation configuration
  5. Review and execution

Creating a Test

Step 1: Test Configuration

Navigate to Simulations → Run Tests and click “Create Test” to start the test creation wizard.

Basic Information

Configure your test with meaningful information:
Test Name (Required)
  • Enter a descriptive name for your test
  • Example: Insurance Sales Agent - Q4 Performance Test
  • Best practice: Include agent type, purpose, and timeframe
Description (Optional)
  • Provide context about what this test evaluates
  • Example: Testing our insurance sales agent's ability to handle diverse customer profiles, with focus on objection handling and conversion rates
  • Include test goals and success criteria
Click “Next” to proceed to scenario selection.

Step 2: Select Test Scenarios

Choose one or more scenarios that your agent will be tested against. This screen shows all available scenarios with their details.

Scenario Selection Features

Search Bar
  • Search scenarios by name or description
  • Real-time filtering as you type
  • Example: Search “insurance” to find relevant scenarios
Scenario List
Each scenario card displays:
  • Name: Scenario identifier
  • Description: What the scenario tests
  • Type Badge: Dataset, Graph, Script, or Auto-generated
  • Row Count: Number of test cases (for dataset scenarios)
Multi-Select
  • Check multiple scenarios to test various situations
  • Selected scenarios are highlighted with a primary border
  • Counter shows total selected: “Scenarios (3)”
Pagination
  • Navigate through scenarios if you have many
  • Adjust items per page (10, 25, 50)

Empty State

If no scenarios exist, you’ll see:
  • Empty state message
  • Direct link to create scenarios
  • Documentation link
Select your scenarios and click “Next”.

Step 3: Select Test Agent

Choose the simulation agent that will interact with your insurance sales agent. This agent simulates customer behavior during tests.

Agent Selection Features

Search Functionality
  • Search agents by name
  • Filter to find specific customer personas
Agent Cards
Each agent shows:
  • Name: Agent identifier (e.g., “Insurance Customer Simulator”)
  • Radio Button: Single selection only
  • Clean, simple interface for quick selection
Empty State
If no simulation agents exist:
  • Helpful message about creating agents
  • Direct button to add simulator agent
  • Links to documentation
Select your simulation agent and click “Next”.

Step 4: Select Evaluations

Configure evaluation metrics to measure your agent’s performance. This step is crucial for defining success criteria.

Important Notice

A warning banner explains:
  • Selected evaluations will be created and linked to this test run
  • Evaluations become part of your test configuration
  • They’ll run automatically during test execution

Adding Evaluations

Initial State
When no evaluations are selected:
  • Empty state with clear message
  • Prominent “Add Evaluations” button
Evaluation Selection Dialog
Clicking “Add Evaluations” opens a comprehensive dialog that includes:
  • Search bar: Find evaluations by name or type
  • Category tabs: System, Custom, or All evaluations
  • Evaluation list: Available evaluation templates
Common evaluations for insurance sales:
  • Conversation Quality: Measures professionalism and clarity
  • Sales Effectiveness: Tracks conversion and objection handling
  • Compliance Check: Ensures regulatory requirements are met
  • Product Knowledge: Verifies accurate information
  • Customer Satisfaction: Simulated CSAT score

Selected Evaluations View

After adding evaluations, you’ll see:
  • Total count: “Selected Evaluations (5)”
  • “Add More” button for additional evaluations
  • List of selected evaluations with:
    • Name and description
    • Configuration details (if any)
    • Mapped fields shown as chips
    • Remove button (trash icon)

Evaluation Configuration

Some evaluations require field mapping:
  • Map evaluation inputs to your data fields
  • Example: Map “customer_response” to “agent_reply”
  • Configured mappings show as chips
Click “Next” to review your configuration.
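Field mapping is essentially a renaming step: each evaluation input is pointed at the field your data actually uses. A minimal sketch in Python (the field names and the `apply_mapping` helper are illustrative, not part of FutureAGI’s API):

```python
# Hypothetical example of applying a field mapping before an evaluation runs.
# Field names are illustrative, not FutureAGI's actual schema.

def apply_mapping(record: dict, mapping: dict) -> dict:
    """Rename record keys so they match the evaluation's expected inputs."""
    return {eval_field: record[data_field] for eval_field, data_field in mapping.items()}

# Map the evaluation input "customer_response" to our data field "agent_reply"
mapping = {"customer_response": "agent_reply"}
record = {"agent_reply": "Our term life policy starts at $25/month."}

eval_inputs = apply_mapping(record, mapping)
# eval_inputs now has the key the evaluation expects
```

Once a mapping like this is configured in the dialog, it is shown as a chip on the evaluation card.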

Step 5: Summary

Review your full test configuration before creating the test. The summary is organized into clear sections:

Test Configuration Section

Shows your basic test setup:
  • Test name
  • Description (if provided)
  • Creation timestamp

Selected Test Scenarios Section

Displays all chosen scenarios:
  • Total count: “3 scenario(s) selected”
  • Each scenario shows:
    • Name and description
    • Row count for datasets
    • Gray background for easy scanning

Selected Test Agent Section

Shows your chosen simulation agent:
  • Agent name
  • Description (if available)
  • Highlighted in gray box

Selected Evaluations Section

Lists all evaluation metrics:
  • Total count: “5 evaluation(s) selected”
  • Each evaluation shows:
    • Name and description
    • Any configured mappings
    • Gray background boxes

Action Buttons

  • Back: Return to modify any section
  • Create Test: Finalize and create the test

Creating the Test

When you click “Create Test”:
  1. Loading State
    • Button shows “Creating…” with spinner
    • All inputs are disabled
    • Prevents duplicate submissions
  2. Success
    • Success notification appears
    • Automatically redirects to test list
    • Your test appears at the top
  3. Error Handling
    • Clear error messages
    • Specific guidance on issues
    • Ability to retry

Running Tests

Once created, tests appear in your test list. Here’s how to run them:

Test List View

Navigate to Simulations → Run Tests to see all your tests. Each test row shows:
  • Name & Description: Test identifier and purpose
  • Scenarios: Count of included scenarios
  • Agent: Which sales agent is being tested
  • Testing Agent: Customer simulator being used
  • Data Points: Total test cases from all scenarios
  • Evaluations: Number of metrics being tracked
  • Created: Timestamp
  • Actions: Run, view details, edit, delete

Running a Test

Click on a test to view its details and run options.

Test Detail Header

Shows test information and primary actions:
  • Test name and description
  • Run Test button (primary action)
  • Navigation breadcrumbs
  • Quick stats (scenarios, evaluations, etc.)

Test Runs Tab

The default view shows all test runs.
Run Test Button
Click “Run Test” to start execution:
  1. Confirmation dialog appears
  2. Shows estimated duration
  3. Option to run all or select specific scenarios
Scenario Selection
Advanced option to run specific scenarios:
  • Click “Scenarios (X)” button
  • Opens scenario selector
  • Check/uncheck scenarios to include
  • Shows row count for each
Test Execution Status
Once running, the test shows:
  • Status Badge: Running, Completed, Failed
  • Progress Bar: Real-time completion percentage
  • Duration: Elapsed time
  • Start Time: When test began
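If you monitor runs programmatically rather than in the UI, the status lifecycle above can be polled with a simple loop. This is a generic sketch: the `fetch_status` callable and its payload shape are hypothetical, not a documented FutureAGI endpoint.

```python
# Generic polling loop against a hypothetical status payload like
# {"status": "Running", "progress": 0.4}. Not a FutureAGI API.
import time

def wait_for_completion(fetch_status, poll_interval_s: float = 5.0, timeout_s: float = 3600.0):
    """Poll fetch_status() until the run reaches a terminal state or times out."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = fetch_status()
        if status["status"] in ("Completed", "Failed"):
            return status
        time.sleep(poll_interval_s)
    raise TimeoutError("test run did not finish in time")

# Fake fetcher for demonstration: reports Running twice, then Completed.
states = iter([{"status": "Running", "progress": 0.3},
               {"status": "Running", "progress": 0.7},
               {"status": "Completed", "progress": 1.0}])
final = wait_for_completion(lambda: next(states), poll_interval_s=0.0)
print(final["status"])  # Completed
```

Using `time.monotonic` for the deadline avoids surprises if the system clock is adjusted mid-run.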

Monitoring Test Progress

Click on a running test to monitor progress.
Real-time Updates
  • Overall progress percentage
  • Current scenario being executed
  • Completed vs total test cases
  • Live duration counter
Execution Grid
Shows individual test case status:
  • Scenario: Which scenario is running
  • Status: Pending, In Progress, Completed, Failed
  • Duration: Time per test case
  • Result: Pass/Fail indicator

Call Logs Tab

View detailed conversation logs from your tests. Features:
  • Search conversations by content
  • Filter by status, duration, or evaluation results
  • Export logs for analysis
  • Pagination for large result sets
Each log entry shows:
  • Timestamp and duration
  • Scenario used
  • Conversation preview
  • Evaluation scores
  • Detailed view link
Detailed Call View
Click any call to see:
  • Full conversation transcript
  • Turn-by-turn analysis
  • Evaluation results per metric
  • Audio playback (if enabled)
  • Key moments flagged by evaluations

Test Results & Analytics

After test completion, comprehensive results are available:

Test Run Summary

Access it from the test runs list by clicking a completed test.
Key Metrics Dashboard
  • Overall Score: Aggregate performance (e.g., 85/100)
  • Pass Rate: Percentage of successful test cases
  • Average Duration: Mean conversation length
  • Conversion Rate: For sales scenarios
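These dashboard numbers are straightforward aggregates over the individual test cases. A sketch of the arithmetic (the field names and rounding here are assumptions, not FutureAGI’s exact formulas):

```python
# Illustrative sketch of the dashboard aggregates; field names are assumed.

def summarize(cases: list) -> dict:
    """Compute pass rate, average duration, and conversion rate over test cases."""
    total = len(cases)
    passed = sum(1 for c in cases if c["passed"])
    converted = sum(1 for c in cases if c.get("converted"))
    return {
        "pass_rate": round(100 * passed / total, 1),
        "avg_duration_s": round(sum(c["duration_s"] for c in cases) / total, 1),
        "conversion_rate": round(100 * converted / total, 1),
    }

cases = [
    {"passed": True,  "duration_s": 180, "converted": True},
    {"passed": True,  "duration_s": 240, "converted": False},
    {"passed": False, "duration_s": 300, "converted": False},
    {"passed": True,  "duration_s": 120, "converted": True},
]
print(summarize(cases))
# {'pass_rate': 75.0, 'avg_duration_s': 210.0, 'conversion_rate': 50.0}
```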

Evaluation Results

View performance across all evaluation metrics.
Per-Evaluation Breakdown:
  • Score distribution graph
  • Pass/fail percentages
  • Detailed insights
  • Comparison to benchmarks
Insurance Sales Specific Metrics:
  • Compliance Score: 98% (regulatory adherence)
  • Product Accuracy: 92% (correct information)
  • Objection Handling: 87% (successful responses)
  • Conversion Rate: 65% (sales closed)
  • Customer Satisfaction: 4.2/5 (simulated CSAT)

Detailed Analysis

Conversation Analysis
  • Common failure points
  • Successful patterns
  • Word clouds of key terms
  • Sentiment progression
Scenario Performance
Compare how your agent performs across different scenarios:
  • Bar charts by scenario
  • Identify weak areas
  • Drill down capabilities

Export Options

Export your test results for further analysis. The Export button is located in the test run header.
Export Formats:
  • PDF Report: Executive summary with graphs
  • CSV Data: Raw evaluation scores
  • JSON: Complete test data
  • Call Recordings: Audio files (if enabled)
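Once exported, the CSV of raw evaluation scores can be post-processed with standard tooling. A sketch using only the Python standard library, assuming hypothetical column names (check the header of your actual export before adapting this):

```python
# Hypothetical post-processing of a CSV export; column names are assumed.
import csv
import io

csv_export = """evaluation,score,threshold
Compliance Check,0.98,0.9
Product Knowledge,0.92,0.9
Objection Handling,0.87,0.9
"""

rows = list(csv.DictReader(io.StringIO(csv_export)))
# Flag evaluations whose score fell below their configured threshold
failing = [r["evaluation"] for r in rows if float(r["score"]) < float(r["threshold"])]
print(failing)  # ['Objection Handling']
```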

Advanced Features

Scheduled Tests

Set up recurring test runs:
  1. In test details, click “Schedule” button
  2. Configure:
    • Frequency (daily, weekly, monthly)
    • Time and timezone
    • Notification preferences
    • Auto-report generation

Test Comparison

Compare multiple test runs:
  1. Select tests to compare (checkbox)
  2. Click “Compare” button
  3. View side-by-side metrics
  4. Identify improvements or regressions
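The side-by-side view boils down to diffing metric values between a baseline run and a candidate run. A sketch of that regression check (the metric names, structure, and tolerance handling are illustrative assumptions):

```python
# Illustrative regression check between two runs' metric summaries.

def compare_runs(baseline: dict, candidate: dict, tolerance: float = 0.0) -> dict:
    """Flag any metric that dropped by more than `tolerance` versus baseline."""
    diffs = {}
    for metric, base in baseline.items():
        delta = candidate.get(metric, 0) - base
        diffs[metric] = {"delta": round(delta, 2), "regressed": delta < -tolerance}
    return diffs

run_a = {"compliance": 98, "conversion_rate": 65, "csat": 4.2}
run_b = {"compliance": 98, "conversion_rate": 61, "csat": 4.4}

report = compare_runs(run_a, run_b, tolerance=1.0)
# conversion_rate dropped by 4 points, so it is flagged as a regression
```

A small tolerance keeps normal run-to-run noise from being reported as a regression.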

Evaluation Management

From the test detail view:
  • Add new evaluations
  • Remove underperforming metrics
  • Adjust evaluation thresholds
  • Create custom evaluations

Best Practices

Test Strategy

  1. Start Small: Begin with 5-10 test cases
  2. Increase Gradually: Add scenarios as you improve
  3. Regular Cadence: Run tests daily or weekly
  4. Version Control: Track agent changes between tests

Scenario Coverage

For insurance sales agents:
  • Demographics: Test all age groups and income levels
  • Products: Cover all insurance types
  • Objections: Include common customer concerns
  • Edge Cases: Difficult or unusual situations

Evaluation Selection

Choose evaluations that match your goals:
  • Quality: Conversation flow and professionalism
  • Accuracy: Product information correctness
  • Compliance: Regulatory requirement adherence
  • Business: Conversion and revenue metrics

Results Analysis

  1. Look for Patterns: Identify common failure points
  2. Compare Scenarios: Find which situations challenge your agent
  3. Track Trends: Monitor improvement over time
  4. Act on Insights: Update agent based on results

Troubleshooting

Common Issues

Test Won’t Start
  • Verify agent definition has valid API credentials
  • Check simulation agent is properly configured
  • Ensure scenarios have valid data
  • Confirm you have sufficient credits
Low Scores
  • Review evaluation thresholds
  • Check if scenarios match agent training
  • Analyze failure patterns in call logs
  • Adjust agent prompts based on feedback
Long Execution Times
  • Reduce concurrent test cases
  • Simplify complex scenarios
  • Check for timeout settings
  • Monitor resource usage

Getting Help

  • Documentation: Detailed guides for each feature
  • Support: Contact the support team for assistance
  • Community: Share experiences with other users
  • Updates: Regular feature improvements

Next Steps

After mastering test execution:
  1. Optimize Your Agent: Use insights to improve performance
  2. Expand Testing: Add more scenarios and evaluations
  3. Automate: Set up scheduled tests and CI/CD integration
  4. Scale: Test multiple agents and versions
For advanced topics: