Fix My Agent
In-depth diagnostics and targeted fixes for your agent's performance issues based on simulation results
After running simulations, Future AGI’s Fix My Agent feature automatically analyzes your agent’s performance and provides actionable recommendations to improve quality, reduce failures, and enhance overall effectiveness. Instead of manually debugging issues, get intelligent suggestions with one click.
What it is
Fix My Agent analyzes your simulation results — call metrics, transcripts, and eval scores — and surfaces a prioritized list of issues with specific recommended fixes. After a run, instead of manually reviewing each call to find patterns, you get a clear breakdown of what’s failing, how many calls it affected, and what to change. You can then implement fixes, re-run, and compare results to validate improvements.
Note
Fix My Agent gives you instant diagnostics and suggestions. For advanced prompt refinement, the platform also offers optimization algorithms (later in this guide) that automatically generate and test multiple prompt variations.
Use cases
- Quick diagnostics — Get instant, prioritized suggestions after every simulation run without manual debugging.
- Reduce failures — Address high-priority issues (e.g. latency, brevity, end-of-speech) that affect the most calls.
- Validate changes — Implement fixes, re-run the simulation, and compare metrics to confirm improvements.
- Auto-optimization (optional) — Use algorithms (Random Search, Bayesian Search, Meta-Prompt, ProTeGi, PromptWizard, GEPA) to generate and evaluate optimized prompts when manual fixes aren’t enough.
How to
Use Fix My Agent from the execution detail page after a simulation run. Recommended flow: run simulation → open Fix My Agent → review and apply suggestions → re-run to validate. Optionally run auto-optimization for systematic prompt refinement.
Navigate to execution results
After your simulation run completes, open the execution detail page.
What you see (field meanings):
| Field | Meaning |
|---|---|
| Call Details | Total calls, connected calls, connection rate for this run. |
| System Metrics | CSAT scores, agent latency, WPM (words per minute). |
| Evaluation Metrics | Results from the evaluations you attached to the simulation. |
This is where Fix My Agent runs its analysis.
Open Fix My Agent
Click Fix My Agent in the top-right of the execution page. A side panel opens.
What the panel shows (field meanings):
| Field | Meaning |
|---|---|
| Suggestions | Total number of issues the analysis identified. |
| Priority | High / Medium / Low — urgency of each issue. |
| Issue categories | Type of problem (e.g. latency, response brevity, detection tuning). |
| Affected calls | How many calls in this run showed each issue. |
| Last updated | When the analysis was last run (refresh to get a new analysis). |
No configuration required—suggestions are generated from the run.
Review and apply suggestions
Each suggestion in the panel has these parts:
| Field | Meaning |
|---|---|
| Issue description | What’s wrong (e.g. pipeline latency, response length, end-of-speech detection). |
| Recommended fix | What to change (e.g. switch to a faster model, add a token limit, adjust VAD parameters). |
| Priority | High / Medium / Low — tackle High first. |
| Affected calls | Number of calls that showed this issue. |
| View issue | Opens specific call examples so you can see the problem in context. |
Example suggestion types: Aggressively Reduce Pipeline Latency (e.g. a faster model for lower TTFT), Enforce Strict Response Brevity (e.g. a hard token limit), and Tune End-of-Speech Detection (e.g. adjusted VAD parameters). Implement the recommended changes in your agent’s system prompt or pipeline configuration, then re-run the simulation to validate. Start with High priority, apply 1–2 fixes per iteration, and re-run to verify each change before moving on.
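The three example fixes above map naturally onto pipeline settings. A minimal sketch, using hypothetical field names (illustrative only, not the platform’s actual configuration schema):

```python
# Hypothetical pipeline settings illustrating the three example fixes.
# All field names here are illustrative, not the platform's real schema.
baseline = {
    "model": "large-slow-model",
    "max_response_tokens": None,       # no brevity limit yet
    "vad_silence_threshold_ms": 1000,  # end-of-speech detection too slow
}

fixes = {
    "model": "small-fast-model",       # fix 1: faster model for lower TTFT
    "max_response_tokens": 80,         # fix 2: hard token limit for brevity
    "vad_silence_threshold_ms": 500,   # fix 3: tuned end-of-speech detection
}

def apply_fixes(config: dict, fixes: dict) -> dict:
    """Return a new config with the fixes applied; the baseline stays
    untouched so you can re-run and compare against it."""
    return {**config, **fixes}

updated = apply_fixes(baseline, fixes)
```

Keeping the baseline intact makes the compare step easy: re-run the simulation with `updated`, then diff metrics against the original run.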
Optional: Run auto-optimization
To have the platform generate and test prompt variations, click Optimize My Agent in the Fix My Agent panel.
Configuration fields:
| Field | Meaning |
|---|---|
| Name | Label for this optimization run (e.g. “opt1”, “latency-v2”). |
| Optimizer | Algorithm that generates and evaluates prompt variations (see below). |
| Language model | LLM used for the optimization (teacher model). |
| Parameters | Optimizer-specific settings (e.g. number of variations, rounds, trials). |
Choose an optimizer — Select from the algorithms below:
Random Search
Best for: Quick baseline testing and initial exploration.
How it works: Generates random prompt variations using a teacher model and evaluates each candidate.
Characteristics:
- ⚡⚡⚡ Fast execution
- ⭐⭐ Basic quality improvements
- 💰 Low cost
- Ideal for: 10-30 examples
Use when: You need quick results or want to establish a performance baseline before trying more sophisticated algorithms.
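The random-search loop described above can be sketched in a few lines. This is a toy illustration, not the platform’s implementation: `propose` stands in for the teacher model and `score` for evaluation against your metrics.

```python
import random

def random_search(base_prompt, propose, score, n_candidates=10, seed=0):
    """Random-search optimization: sample prompt variations with a teacher
    model, evaluate each candidate, and keep the best scorer."""
    rng = random.Random(seed)
    candidates = [base_prompt] + [propose(base_prompt, rng) for _ in range(n_candidates)]
    return max(candidates, key=score)

# Toy stand-ins: real runs use a teacher LLM and your attached eval metrics.
def propose(prompt, rng):
    return prompt + " " + rng.choice(["Be concise.", "Answer in one sentence."])

def score(prompt):
    return sum(kw in prompt for kw in ("concise", "one sentence"))

best = random_search("You are a support agent.", propose, score)
```

Because candidates are independent random samples, this is the cheapest loop to run, but nothing learned from one candidate guides the next, which is what the later algorithms improve on.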
Bayesian Search
Best for: Few-shot learning tasks and intelligent example selection.
How it works: Uses Bayesian optimization to intelligently select few-shot examples and prompt configurations.
Characteristics:
- ⚡⚡ Medium speed
- ⭐⭐⭐⭐ High quality
- 💰💰 Medium cost
- Ideal for: 15-50 examples
Use when: Your dataset contains good examples and you want to leverage few-shot learning effectively.
Meta-Prompt
Best for: Complex reasoning tasks requiring deep analysis.
How it works: Analyzes failed examples, formulates hypotheses, and rewrites the entire prompt through deep reasoning.
Characteristics:
- ⚡⚡ Medium speed
- ⭐⭐⭐⭐ High quality
- 💰💰💰 Higher cost
- Ideal for: 20-40 examples
Use when: Your agent handles complex reasoning tasks or you need holistic prompt redesign.
ProTeGi
Best for: Identifying and fixing specific error patterns.
How it works: Generates critiques of failures and applies targeted improvements using beam search to maintain multiple candidates.
Characteristics:
- ⚡ Slower execution
- ⭐⭐⭐⭐ High quality
- 💰💰💰 Higher cost
- Ideal for: 20-50 examples
Use when: You have clear failure patterns and want systematic error fixing.
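The beam-search idea described above can be sketched as follows. This is an illustration of the technique, not the actual ProTeGi implementation: `expand` stands in for the LLM critique-and-edit step, and `score` for re-evaluation against your metrics.

```python
def beam_search(base_prompt, expand, score, beam_width=2, rounds=3):
    """Keep the best `beam_width` candidates each round; expand each with
    targeted edits derived from critiques of its failures."""
    beam = [base_prompt]
    for _ in range(rounds):
        candidates = beam + [p for prompt in beam for p in expand(prompt)]
        unique = list(dict.fromkeys(candidates))  # dedupe, order-preserving
        beam = sorted(unique, key=score, reverse=True)[:beam_width]
    return beam[0]

# Toy stand-ins: real expansion asks an LLM to critique failing calls and
# propose a targeted fix; real scoring re-runs your evaluation metrics.
def expand(prompt):
    return [prompt + " Fix A.", prompt + " Fix B."]

def score(prompt):
    return prompt.count("Fix A.")  # pretend only "Fix A." actually helps

best = beam_search("Base prompt.", expand, score)
```

Maintaining several candidates at once is what distinguishes this from greedy refinement: a promising edit that scores poorly in one round can still survive and pay off later.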
PromptWizard
Best for: Creative exploration and diverse prompt variations.
How it works: Combines mutation with different “thinking styles”, then critiques and refines top performers.
Characteristics:
- ⚡ Slower execution
- ⭐⭐⭐⭐ High quality
- 💰💰💰 Higher cost
- Ideal for: 15-40 examples
Use when: You want creative exploration or diverse conversational approaches.
GEPA
Best for: Production deployments requiring state-of-the-art performance.
How it works: Uses evolutionary algorithms with reflective learning and mutation strategies inspired by natural selection.
Characteristics:
- ⚡ Slower execution
- ⭐⭐⭐⭐⭐ Excellent quality
- 💰💰💰💰 Highest cost
- Ideal for: 30-100 examples
Use when: You need production-grade optimization with robust results and have sufficient evaluation budget.
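The evolutionary loop described above can be sketched like this. It is a toy illustration only: real mutation uses an LLM plus reflective feedback from failed calls, and the reflective-learning component is not modeled here.

```python
import random

def evolve(population, mutate, score, generations=5, survivors=2, seed=0):
    """Evolutionary sketch: each generation, mutate the survivors and keep
    only the fittest candidates (selection pressure over prompts)."""
    rng = random.Random(seed)
    for _ in range(generations):
        offspring = [mutate(p, rng) for p in population for _ in range(2)]
        population = sorted(population + offspring, key=score, reverse=True)[:survivors]
    return population[0]

# Toy stand-ins: real mutation is LLM-driven and informed by reflection on
# failures; real scoring re-runs your evaluation metrics.
def mutate(prompt, rng):
    return prompt + rng.choice([" Be brief.", " Confirm intent."])

def score(prompt):
    return prompt.count("Be brief.") + prompt.count("Confirm intent.")

best = evolve(["You are a support agent."], mutate, score)
```

The repeated mutate-and-select cycle is why this family costs the most API calls: every generation re-scores a full pool of candidates.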
Click Start Optimizing your agent to begin the automated prompt generation process. The optimization engine will:
1. Analyze your simulation data and Fix My Agent suggestions.
2. Generate multiple system prompt variations using the selected algorithm.
3. Evaluate each variation against your test scenarios.
4. Score performance improvements.
5. Select the best-performing optimized prompt.
View results in the Optimization Runs tab: performance comparison, best prompt, and optimization history. Review the improved prompt, test it on scenarios not in the original set, then update your agent and re-run to validate.
Tip
Most users find that manually implementing Fix My Agent suggestions is the fastest path to improvement. Use auto-optimization when you need to test many prompt variations or want production-grade automated refinement.
View results and deploy
After implementing fixes or running auto-optimization, review your results and deploy as follows.
After implementing Fix My Agent suggestions:
- Re-run simulations with your updated prompt
- Compare metrics to baseline in the execution dashboard
- Review new suggestions from Fix My Agent
- Iterate until performance meets your goals
- Deploy to production when satisfied
If you used automated optimization, view results in the Optimization Runs tab:
Performance comparison — Original prompt baseline scores, auto-generated prompt scores, improvement percentage.
Best prompt — The highest-performing variation, changes from the original, evaluation scores across metrics.
Optimization history — All variations tested, performance trajectory, iteration details.
Copy the best prompt into your agent, test on new scenarios, then deploy. Always validate with test cases that weren’t in the optimization set to avoid overfitting.
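One way to check for overfitting is to compare average scores on the optimization set against a held-out set of scenarios. The helper below is a generic sketch, not a platform API; the scenario names and scores are made up for illustration.

```python
def overfit_gap(score_fn, optimization_set, holdout_set):
    """Average score on each set; a large positive gap suggests the prompt
    is overfit to the optimization scenarios."""
    opt = sum(score_fn(s) for s in optimization_set) / len(optimization_set)
    held = sum(score_fn(s) for s in holdout_set) / len(holdout_set)
    return opt, held, opt - held

# Toy per-scenario scores (real scores come from re-running your evals).
scores = {"billing": 0.9, "refund": 0.8, "new-topic": 0.5}
opt, held, gap = overfit_gap(scores.get, ["billing", "refund"], ["new-topic"])
```

A prompt that scores well on `billing` and `refund` but poorly on the unseen `new-topic` scenario has likely learned the optimization set rather than a general improvement.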
Whether implementing manually or using auto-optimization:
✓ Review the improved prompt carefully
✓ Test with additional scenarios not in the original dataset
✓ Update your agent definition with the new prompt
✓ Re-run simulations to validate improvements
✓ Monitor performance in production
Warning
Always validate with new test cases before production deployment. Both manual and automated approaches can overfit to the evaluation dataset.
Algorithm Comparison
| Algorithm | Speed | Quality | Cost | Best Dataset Size |
|---|---|---|---|---|
| Random Search | ⚡⚡⚡ | ⭐⭐ | 💰 | 10-30 examples |
| Bayesian Search | ⚡⚡ | ⭐⭐⭐⭐ | 💰💰 | 15-50 examples |
| Meta-Prompt | ⚡⚡ | ⭐⭐⭐⭐ | 💰💰💰 | 20-40 examples |
| ProTeGi | ⚡ | ⭐⭐⭐⭐ | 💰💰💰 | 20-50 examples |
| PromptWizard | ⚡ | ⭐⭐⭐⭐ | 💰💰💰 | 15-40 examples |
| GEPA | ⚡ | ⭐⭐⭐⭐⭐ | 💰💰💰💰 | 30-100 examples |
Note
- Speed: ⚡ = Slow, ⚡⚡ = Medium, ⚡⚡⚡ = Fast
- Quality: ⭐ = Basic, ⭐⭐⭐⭐⭐ = Excellent
- Cost: 💰 = Low, 💰💰💰💰 = High (based on API calls)
Decision Tree
Do you need production-grade optimization?
├─ Yes → Use GEPA
└─ No → Do you have clear error patterns to fix?
    ├─ Yes → Use ProTeGi
    └─ No → Is your task reasoning-heavy or complex?
        ├─ Yes → Use Meta-Prompt
        └─ No → Do you need few-shot learning optimization?
            ├─ Yes → Use Bayesian Search
            └─ No → Do you want creative exploration?
                ├─ Yes → Use PromptWizard
                └─ No → Use Random Search (baseline)
Next Steps
Run Simulation
Learn how to run comprehensive agent simulations
Create Scenarios
Build diverse test scenarios for better diagnostics
Agent Definition
Configure your agent for optimal performance
Optimization Algorithms (Advanced)
Deep dive into auto-optimization algorithm details