Understanding Optimization

How prompt optimization works: the feedback loop, key components, algorithms, and how to choose the right one.

About

Prompt optimization is the process of iteratively improving a prompt using evaluation scores as feedback. You start with a baseline prompt, run it against your data, score the outputs, and let an algorithm generate better versions. Each round, the optimizer adjusts the prompt based on what scored well and what didn’t.

This is different from experimentation, which compares two or more fixed prompts side by side. Optimization takes one prompt and makes it better over multiple rounds.

How It Works

The optimization loop has four components:

  1. Dataset: A set of input/output examples that the prompt runs against (e.g. questions and expected answers, articles and target summaries)
  2. Generator: The LLM that runs the prompt and produces outputs (e.g. GPT-4o-mini)
  3. Evaluator: Scores each output using an eval template (e.g. summary_quality, tone, groundedness)
  4. Optimizer: The algorithm that generates new prompt variations based on scores from previous rounds
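The four components can be sketched as plain Python. This is an illustrative sketch only, not Future AGI's API; all names (`generate`, `evaluate`, `propose`) are hypothetical, and the generator and optimizer are stubbed:

```python
# 1. Dataset: input/output examples the prompt runs against.
dataset = [
    {"input": "What is 2 + 2?", "expected": "4"},
    {"input": "Capital of France?", "expected": "Paris"},
]

# 2. Generator: runs the prompt against one input.
# Stubbed here; a real generator would call an LLM such as GPT-4o-mini.
def generate(prompt: str, example: dict) -> str:
    return f"{prompt} {example['input']} -> answer"

# 3. Evaluator: scores one output (exact-match stand-in for an eval template
# like summary_quality or groundedness).
def evaluate(output: str, example: dict) -> float:
    return 1.0 if example["expected"] in output else 0.0

# 4. Optimizer: proposes new prompt variants from scored candidates.
# Stubbed: a real optimizer uses the score signal far more cleverly.
def propose(scored: list[tuple[str, float]]) -> list[str]:
    best_prompt, _ = max(scored, key=lambda pair: pair[1])
    return [best_prompt + " Be concise.", best_prompt + " Think step by step."]
```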

The process:

Baseline prompt
  ↓
Run on dataset → generate outputs
  ↓
Score outputs with evaluator
  ↓
Optimizer generates new prompt variations
  ↓
Run variations on dataset → score again
  ↓
Repeat for N rounds
  ↓
Return best prompt + score

Each round, the optimizer sees which prompts scored higher and uses that signal to generate the next set of candidates. After all rounds complete, you get the best prompt and its score.
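The loop above can be sketched in a few lines. This is a minimal, self-contained sketch with stubbed scoring and a stubbed optimizer; the function names are illustrative, not the platform's API:

```python
import random

random.seed(0)

def run_and_score(prompt: str) -> float:
    """Stub: run the prompt on the dataset and return a mean eval score.
    A real setup would call the generator, then the evaluator, per example."""
    return random.random()

def propose_variants(prompt: str, score: float) -> list[str]:
    """Stub optimizer: mutate the current best prompt."""
    return [f"{prompt} (variant {i})" for i in range(3)]

def optimize(baseline: str, rounds: int = 5) -> tuple[str, float]:
    best_prompt, best_score = baseline, run_and_score(baseline)
    for _ in range(rounds):
        for candidate in propose_variants(best_prompt, best_score):
            score = run_and_score(candidate)
            if score > best_score:  # keep the highest-scoring prompt seen
                best_prompt, best_score = candidate, score
    return best_prompt, best_score  # best prompt + its score

best, score = optimize("Summarize the article in two sentences.")
```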

Optimization vs Experimentation

|             | Optimization                               | Experimentation                                          |
| ----------- | ------------------------------------------ | -------------------------------------------------------- |
| Goal        | Improve one prompt iteratively             | Compare multiple fixed prompts                            |
| Process     | Algorithmic (automated rounds)             | Manual (you define the variants)                          |
| Output      | Best prompt found + score                  | Side-by-side comparison of scores                         |
| When to use | You have a prompt and want to make it better | You have multiple candidates and want to pick the best one |

Typically you’d experiment first to find a promising prompt direction, then optimize that prompt to squeeze out more quality.


Choosing an Algorithm

Future AGI supports six optimization algorithms. Use the tables below to pick one.

Quick Selection

| Use case                | Recommended optimizer | Why                                          |
| ----------------------- | --------------------- | -------------------------------------------- |
| Few-shot learning       | Bayesian Search       | Selects and formats examples intelligently   |
| Complex reasoning       | Meta-Prompt           | Deep failure analysis and full prompt rewrite |
| Fixing specific errors  | ProTeGi               | Identifies and fixes failure patterns        |
| Creative / open-ended   | PromptWizard          | Diverse prompt exploration                   |
| Production deployments  | GEPA                  | Strong evolutionary search with good budgeting |
| Quick baseline          | Random Search         | Fast, simple baseline                        |

Performance Comparison

| Optimizer      | Speed  | Quality   | Cost      | Dataset size |
| -------------- | ------ | --------- | --------- | ------------ |
| Random Search  | Fast   | Basic     | Low       | 10-30        |
| Bayesian Search | Medium | High     | Medium    | 15-50        |
| Meta-Prompt    | Medium | High      | High      | 20-40        |
| ProTeGi        | Slow   | High      | High      | 20-50        |
| PromptWizard   | Slow   | High      | High      | 15-40        |
| GEPA           | Slow   | Excellent | Very High | 30-100       |

Decision Tree

Do you need production-grade optimization?
├─ Yes → Use GEPA
└─ No → Do you have few-shot examples in your dataset?
   ├─ Yes → Use Bayesian Search
   └─ No → Is your task reasoning-heavy or complex?
      ├─ Yes → Use Meta-Prompt
      └─ No → Do you have clear failure patterns to fix?
         ├─ Yes → Use ProTeGi
         └─ No → Do you want creative exploration?
            ├─ Yes → Use PromptWizard
            └─ No → Use Random Search (baseline)
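The tree above reads directly as a chain of checks. A small helper function can make the selection logic explicit; the parameter names here are hypothetical, but the optimizer names and ordering match this page:

```python
def choose_optimizer(
    production_grade: bool = False,
    has_few_shot_examples: bool = False,
    reasoning_heavy: bool = False,
    clear_failure_patterns: bool = False,
    creative_exploration: bool = False,
) -> str:
    """Walk the decision tree top to bottom; the first 'Yes' wins."""
    if production_grade:
        return "GEPA"
    if has_few_shot_examples:
        return "Bayesian Search"
    if reasoning_heavy:
        return "Meta-Prompt"
    if clear_failure_patterns:
        return "ProTeGi"
    if creative_exploration:
        return "PromptWizard"
    return "Random Search"  # baseline
```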

For detailed parameters and configuration of each algorithm, see the individual algorithm pages linked in the sidebar.


Combining Optimizers

You can run multiple optimizers sequentially for best results:

# Stage 1: Quick exploration with Random Search
random_result = random_optimizer.optimize(...)
initial_prompts = [h.prompt for h in random_result.history[:3]]

# Stage 2: Deep refinement with Meta-Prompt
meta_result = meta_optimizer.optimize(
    initial_prompts=initial_prompts,
    ...
)

# Stage 3: Few-shot enhancement with Bayesian Search
final_result = bayesian_optimizer.optimize(
    initial_prompts=[meta_result.best_generator.get_prompt_template()],
    ...
)
