Fix My Agent
In-depth diagnostics and targeted fixes for your agent's performance issues based on simulation results
After running simulations, Future AGI’s Fix My Agent feature automatically analyzes your agent’s performance and provides actionable recommendations to improve quality, reduce failures, and enhance overall effectiveness. Instead of manually debugging issues, get intelligent suggestions with one click.
What it is
Fix My Agent analyzes your simulation results — call metrics, transcripts, and eval scores — and surfaces a prioritized list of issues with specific recommended fixes. After a run, instead of manually reviewing each call to find patterns, you get a clear breakdown of what’s failing, how many calls it affected, and what to change. You can then implement fixes, re-run, and compare results to validate improvements.
Note
Fix My Agent gives you instant diagnostics and suggestions. For advanced prompt refinement, the platform also offers optimization algorithms (later in this guide) that automatically generate and test multiple prompt variations.
Use cases
- Quick diagnostics — Get instant, prioritized suggestions after every simulation run without manual debugging.
- Reduce failures — Address high-priority issues (e.g. latency, brevity, end-of-speech) that affect the most calls.
- Validate changes — Implement fixes, re-run the simulation, and compare metrics to confirm improvements.
- Auto-optimization (optional) — Use algorithms (Random Search, Bayesian Search, Meta-Prompt, ProTeGi, PromptWizard, GEPA) to generate and evaluate optimized prompts when manual fixes aren’t enough.
How to
Use Fix My Agent from the execution detail page after a simulation run. Recommended flow: run simulation → open Fix My Agent → review and apply suggestions → re-run to validate. Optionally run auto-optimization for systematic prompt refinement.
Navigate to execution results
After your simulation run completes, open the execution detail page.
What you see (field meanings):
| Field | Meaning |
|---|---|
| Call Details | Total calls, connected calls, connection rate for this run. |
| System Metrics | CSAT scores, agent latency, WPM (words per minute). |
| Evaluation Metrics | Results from the evaluations you attached to the simulation. |
This is where Fix My Agent runs its analysis.
Open Fix My Agent
Click Fix My Agent in the top-right of the execution page. A side panel opens.
What the panel shows (field meanings):
| Field | Meaning |
|---|---|
| Suggestions | Total number of issues the analysis identified. |
| Priority | High / Medium / Low — urgency of each issue. |
| Issue categories | Type of problem (e.g. latency, response brevity, detection tuning). |
| Affected calls | How many calls in this run showed each issue. |
| Last updated | When the analysis was last run (refresh to get a new analysis). |
No configuration required—suggestions are generated from the run.
Review and apply suggestions
Each suggestion in the panel has these parts:
| Field | Meaning |
|---|---|
| Issue description | What’s wrong (e.g. pipeline latency, response length, end-of-speech detection). |
| Recommended fix | What to change (e.g. switch to a faster model, add a token limit, adjust VAD parameters). |
| Priority | High / Medium / Low — tackle High first. |
| Affected calls | Number of calls that showed this issue. |
| View issue | Opens specific call examples so you can see the problem in context. |
Example suggestion types: Aggressively Reduce Pipeline Latency (e.g. a faster model for lower TTFT), Enforce Strict Response Brevity (e.g. a hard token limit), and Tune End-of-Speech Detection (e.g. adjusted VAD parameters). Implement the recommended changes in your agent’s system prompt or pipeline configuration, then re-run the simulation to validate. Start with High priority, apply 1–2 fixes per iteration, and re-run to verify each change before moving on.
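The three example fixes above map naturally onto pipeline settings. A minimal sketch, using hypothetical field names (illustrative only, not the platform’s actual configuration schema):

```python
# Hypothetical pipeline settings illustrating the three example fixes.
# All field names here are illustrative, not the platform's real schema.
baseline = {
    "model": "large-slow-model",
    "max_response_tokens": None,       # no brevity limit yet
    "vad_silence_threshold_ms": 1000,  # end-of-speech detection too slow
}

fixes = {
    "model": "small-fast-model",       # fix 1: faster model for lower TTFT
    "max_response_tokens": 80,         # fix 2: hard token limit for brevity
    "vad_silence_threshold_ms": 500,   # fix 3: tuned end-of-speech detection
}

def apply_fixes(config: dict, fixes: dict) -> dict:
    """Return a new config with the fixes applied; the baseline stays
    untouched so you can re-run and compare against it."""
    return {**config, **fixes}

updated = apply_fixes(baseline, fixes)
```

Keeping the baseline intact makes the compare step easy: re-run the simulation with `updated`, then diff metrics against the original run.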
Optional: Run auto-optimization
To have the platform generate and test prompt variations, click Optimize My Agent in the Fix My Agent panel.
Configuration fields:
| Field | Meaning |
|---|---|
| Name | Label for this optimization run (e.g. “opt1”, “latency-v2”). |
| Optimizer | Algorithm that generates and evaluates prompt variations (see below). |
| Language model | LLM used for the optimization (teacher model). |
| Parameters | Optimizer-specific settings (e.g. number of variations, rounds, trials). |
Choose an optimizer — Select from the algorithms below:
Random Search
Best for: Quick baseline testing and initial exploration.
How it works: Generates random prompt variations using a teacher model and evaluates each candidate.
Characteristics:
- ⚡⚡⚡ Fast execution
- ⭐⭐ Basic quality improvements
- 💰 Low cost
- Ideal for: 10-30 examples
Use when: You need quick results or want to establish a performance baseline before trying more sophisticated algorithms.
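The random-search loop described above can be sketched in a few lines. This is a toy illustration, not the platform’s implementation: `propose` stands in for the teacher model and `score` for evaluation against your metrics.

```python
import random

def random_search(base_prompt, propose, score, n_candidates=10, seed=0):
    """Random-search optimization: sample prompt variations with a teacher
    model, evaluate each candidate, and keep the best scorer."""
    rng = random.Random(seed)
    candidates = [base_prompt] + [propose(base_prompt, rng) for _ in range(n_candidates)]
    return max(candidates, key=score)

# Toy stand-ins: real runs use a teacher LLM and your attached eval metrics.
def propose(prompt, rng):
    return prompt + " " + rng.choice(["Be concise.", "Answer in one sentence."])

def score(prompt):
    return sum(kw in prompt for kw in ("concise", "one sentence"))

best = random_search("You are a support agent.", propose, score)
```

Because candidates are independent random samples, this is the cheapest loop to run, but nothing learned from one candidate guides the next, which is what the later algorithms improve on.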
Bayesian Search
Best for: Few-shot learning tasks and intelligent example selection.
How it works: Uses Bayesian optimization to intelligently select few-shot examples and prompt configurations.
Characteristics:
- ⚡⚡ Medium speed
- ⭐⭐⭐⭐ High quality
- 💰💰 Medium cost
- Ideal for: 15-50 examples
Use when: Your dataset contains good examples and you want to leverage few-shot learning effectively.
Meta-Prompt
Best for: Complex reasoning tasks requiring deep analysis.
How it works: Analyzes failed examples, formulates hypotheses, and rewrites the entire prompt through deep reasoning.
Characteristics:
- ⚡⚡ Medium speed
- ⭐⭐⭐⭐ High quality
- 💰💰💰 Higher cost
- Ideal for: 20-40 examples
Use when: Your agent handles complex reasoning tasks or you need holistic prompt redesign.
ProTeGi
Best for: Identifying and fixing specific error patterns.
How it works: Generates critiques of failures and applies targeted improvements using beam search to maintain multiple candidates.
Characteristics:
- ⚡ Slower execution
- ⭐⭐⭐⭐ High quality
- 💰💰💰 Higher cost
- Ideal for: 20-50 examples
Use when: You have clear failure patterns and want systematic error fixing.
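The beam-search idea described above can be sketched as follows. This is an illustration of the technique, not the actual ProTeGi implementation: `expand` stands in for the LLM critique-and-edit step, and `score` for re-evaluation against your metrics.

```python
def beam_search(base_prompt, expand, score, beam_width=2, rounds=3):
    """Keep the best `beam_width` candidates each round; expand each with
    targeted edits derived from critiques of its failures."""
    beam = [base_prompt]
    for _ in range(rounds):
        candidates = beam + [p for prompt in beam for p in expand(prompt)]
        unique = list(dict.fromkeys(candidates))  # dedupe, order-preserving
        beam = sorted(unique, key=score, reverse=True)[:beam_width]
    return beam[0]

# Toy stand-ins: real expansion asks an LLM to critique failing calls and
# propose a targeted fix; real scoring re-runs your evaluation metrics.
def expand(prompt):
    return [prompt + " Fix A.", prompt + " Fix B."]

def score(prompt):
    return prompt.count("Fix A.")  # pretend only "Fix A." actually helps

best = beam_search("Base prompt.", expand, score)
```

Maintaining several candidates at once is what distinguishes this from greedy refinement: a promising edit that scores poorly in one round can still survive and pay off later.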
PromptWizard
Best for: Creative exploration and diverse prompt variations.
How it works: Combines mutation with different “thinking styles”, then critiques and refines top performers.
Characteristics:
- ⚡ Slower execution
- ⭐⭐⭐⭐ High quality
- 💰💰💰 Higher cost
- Ideal for: 15-40 examples
Use when: You want creative exploration or diverse conversational approaches.
GEPA
Best for: Production deployments requiring state-of-the-art performance.
How it works: Uses evolutionary algorithms with reflective learning and mutation strategies inspired by natural selection.
Characteristics:
- ⚡ Slower execution
- ⭐⭐⭐⭐⭐ Excellent quality
- 💰💰💰💰 Highest cost
- Ideal for: 30-100 examples
Use when: You need production-grade optimization with robust results and have sufficient evaluation budget.
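The evolutionary loop described above can be sketched like this. It is a toy illustration only: real mutation uses an LLM plus reflective feedback from failed calls, and the reflective-learning component is not modeled here.

```python
import random

def evolve(population, mutate, score, generations=5, survivors=2, seed=0):
    """Evolutionary sketch: each generation, mutate the survivors and keep
    only the fittest candidates (selection pressure over prompts)."""
    rng = random.Random(seed)
    for _ in range(generations):
        offspring = [mutate(p, rng) for p in population for _ in range(2)]
        population = sorted(population + offspring, key=score, reverse=True)[:survivors]
    return population[0]

# Toy stand-ins: real mutation is LLM-driven and informed by reflection on
# failures; real scoring re-runs your evaluation metrics.
def mutate(prompt, rng):
    return prompt + rng.choice([" Be brief.", " Confirm intent."])

def score(prompt):
    return prompt.count("Be brief.") + prompt.count("Confirm intent.")

best = evolve(["You are a support agent."], mutate, score)
```

The repeated mutate-and-select cycle is why this family costs the most API calls: every generation re-scores a full pool of candidates.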
Click Start Optimizing your agent to begin the automated prompt generation process. The optimization engine will:
1. Analyze your simulation data and Fix My Agent suggestions.
2. Generate multiple system prompt variations using the selected algorithm.
3. Evaluate each variation against your test scenarios.
4. Score performance improvements.
5. Select the best-performing optimized prompt.
View results in the Optimization Runs tab: performance comparison, best prompt, and optimization history. Review the improved prompt, test it on scenarios not in the original set, then update your agent and re-run to validate.
Tip
Most users find that manually implementing Fix My Agent suggestions is the fastest path to improvement. Use auto-optimization when you need to test many prompt variations or want production-grade automated refinement.
View results and deploy
After implementing fixes or running auto-optimization, review your results and deploy as follows.
After implementing Fix My Agent suggestions:
- Re-run simulations with your updated prompt
- Compare metrics to baseline in the execution dashboard
- Review new suggestions from Fix My Agent
- Iterate until performance meets your goals
- Deploy to production when satisfied
If you used automated optimization, view results in the Optimization Runs tab:
Performance comparison — Original prompt baseline scores, auto-generated prompt scores, improvement percentage.
Best prompt — The highest-performing variation, changes from the original, evaluation scores across metrics.
Optimization history — All variations tested, performance trajectory, iteration details.
Copy the best prompt into your agent, test on new scenarios, then deploy. Always validate with test cases that weren’t in the optimization set to avoid overfitting.
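One way to check for overfitting is to compare average scores on the optimization set against a held-out set of scenarios. The helper below is a generic sketch, not a platform API; the scenario names and scores are made up for illustration.

```python
def overfit_gap(score_fn, optimization_set, holdout_set):
    """Average score on each set; a large positive gap suggests the prompt
    is overfit to the optimization scenarios."""
    opt = sum(score_fn(s) for s in optimization_set) / len(optimization_set)
    held = sum(score_fn(s) for s in holdout_set) / len(holdout_set)
    return opt, held, opt - held

# Toy per-scenario scores (real scores come from re-running your evals).
scores = {"billing": 0.9, "refund": 0.8, "new-topic": 0.5}
opt, held, gap = overfit_gap(scores.get, ["billing", "refund"], ["new-topic"])
```

A prompt that scores well on `billing` and `refund` but poorly on the unseen `new-topic` scenario has likely learned the optimization set rather than a general improvement.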
Whether implementing manually or using auto-optimization:
✓ Review the improved prompt carefully
✓ Test with additional scenarios not in the original dataset
✓ Update your agent definition with the new prompt
✓ Re-run simulations to validate improvements
✓ Monitor performance in production
Warning
Always validate with new test cases before production deployment. Both manual and automated approaches can overfit to the evaluation dataset.
Algorithm Comparison
| Algorithm | Speed | Quality | Cost | Best Dataset Size |
|---|---|---|---|---|
| Random Search | ⚡⚡⚡ | ⭐⭐ | 💰 | 10-30 examples |
| Bayesian Search | ⚡⚡ | ⭐⭐⭐⭐ | 💰💰 | 15-50 examples |
| Meta-Prompt | ⚡⚡ | ⭐⭐⭐⭐ | 💰💰💰 | 20-40 examples |
| ProTeGi | ⚡ | ⭐⭐⭐⭐ | 💰💰💰 | 20-50 examples |
| PromptWizard | ⚡ | ⭐⭐⭐⭐ | 💰💰💰 | 15-40 examples |
| GEPA | ⚡ | ⭐⭐⭐⭐⭐ | 💰💰💰💰 | 30-100 examples |
Note
- Speed: ⚡ = Slow, ⚡⚡ = Medium, ⚡⚡⚡ = Fast
- Quality: ⭐ = Basic, ⭐⭐⭐⭐⭐ = Excellent
- Cost: 💰 = Low, 💰💰💰💰 = High (based on API calls)
Decision Tree
Do you need production-grade optimization?
├─ Yes → Use GEPA
└─ No → Do you have clear error patterns to fix?
    ├─ Yes → Use ProTeGi
    └─ No → Is your task reasoning-heavy or complex?
        ├─ Yes → Use Meta-Prompt
        └─ No → Do you need few-shot learning optimization?
            ├─ Yes → Use Bayesian Search
            └─ No → Do you want creative exploration?
                ├─ Yes → Use PromptWizard
                └─ No → Use Random Search (baseline)
Next Steps
Run Simulation
Learn how to run comprehensive agent simulations
Create Scenarios
Build diverse test scenarios for better diagnostics
Agent Definition
Configure your agent for optimal performance
Optimization Algorithms (Advanced)
Deep dive into auto-optimization algorithm details