Optimization fundamentals

Compare optimization algorithms and choose the right strategy for your use case.

What it is

Experimentation allows users to compare different prompt or model configurations, but it does not refine a single prompt in a systematic, data-driven way. Once an experiment identifies a well-performing prompt, optimization takes it a step further by making iterative improvements. This process enhances clarity, response quality, and efficiency while reducing ambiguity that can cause inconsistencies in AI outputs.

Since LLMs generate responses probabilistically, even the same input can produce different outputs. Optimization ensures that prompts are structured to deliver the most consistent, high-quality results while minimizing unnecessary token usage.


How optimization works

An optimization run needs four inputs: a dataset, an initial prompt, evaluation metrics to score performance, and an optimization algorithm. The optimizer first scores the baseline prompt, then enters a loop: it generates new prompt candidates, runs them against the dataset, scores them with your evals, and feeds the results back into the next round. When the run finishes, you get the best-performing prompt (and, optionally, a history of all trials). The project or dataset that optimization runs against can be changed in settings; the change applies to all runs.
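The loop above can be sketched in a few lines. In this sketch, `evaluate` and `propose` are hypothetical stand-ins for your eval metric and the algorithm's candidate-generation step; this is not any library's actual API.

```python
def optimize(dataset, initial_prompt, evaluate, propose, rounds=5):
    """Generic optimization loop: score the baseline, then repeatedly
    propose candidates, score them, and keep the best so far."""
    best_prompt = initial_prompt
    best_score = evaluate(initial_prompt, dataset)
    history = [(initial_prompt, best_score)]
    for _ in range(rounds):
        candidate = propose(best_prompt, history)  # new prompt candidate
        score = evaluate(candidate, dataset)       # score with your evals
        history.append((candidate, score))
        if score > best_score:                     # feedback for next round
            best_prompt, best_score = candidate, score
    return best_prompt, history
```

Every algorithm below fits this shape; they differ only in how `propose` uses the history.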


Choosing an optimization strategy

The library provides six algorithms. Use the tables and decision tree below to pick one.

Quick selection

| Use case | Recommended optimizer | Why |
| --- | --- | --- |
| Few-shot learning | Bayesian Search | Selects and formats examples intelligently |
| Complex reasoning | Meta-Prompt | Deep failure analysis and full prompt rewrite |
| Fixing specific errors | ProTeGi | Identifies and fixes failure patterns |
| Creative / open-ended | PromptWizard | Diverse prompt exploration |
| Production deployments | GEPA | Strong evolutionary search with good budgeting |
| Quick baseline | Random Search | Fast, simple baseline |

Performance comparison

| Optimizer | Speed | Quality | Cost | Dataset size |
| --- | --- | --- | --- | --- |
| Bayesian Search | ⚡⚡ | ⭐⭐⭐⭐ | 💰💰 | 15–50 |
| Meta-Prompt | ⚡⚡ | ⭐⭐⭐⭐ | 💰💰💰 | 20–40 |
| ProTeGi | ⚡ | ⭐⭐⭐⭐ | 💰💰💰 | 20–50 |
| PromptWizard | ⚡ | ⭐⭐⭐⭐ | 💰💰💰 | 15–40 |
| GEPA | ⚡ | ⭐⭐⭐⭐⭐ | 💰💰💰💰 | 30–100 |
| Random Search | ⚡⚡⚡ | ⭐⭐ | 💰 | 10–30 |

Note

Speed: ⚡ = slow → ⚡⚡⚡ = fast. Quality: ⭐ = basic → ⭐⭐⭐⭐⭐ = excellent. Cost: 💰 = low → 💰💰💰💰 = high.

Optimizer details

Search-based optimizers explore the prompt space systematically.

Random Search

How it works: Generates random prompt variations using a teacher model and tests each one.

Strengths: Very fast; simple to understand and debug; good baseline for comparison.

Limitations: No learning from previous attempts; may miss optimal solutions; quality depends on teacher model.

Random Search →
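As a sketch of the idea (not the library's implementation): every trial is drawn independently, so there is no learning between attempts. `mutate` below is a hypothetical stand-in for the teacher-model call.

```python
import random

def random_search(base_prompt, dataset, evaluate, n_trials=10, seed=0):
    """Random-search sketch: sample independent prompt variations
    and keep the highest-scoring one."""
    rng = random.Random(seed)

    def mutate(prompt):
        # Stand-in for a teacher-model rewrite: append a random
        # instruction suffix to create variation.
        suffixes = ["", " Be concise.", " Think step by step.", " Answer in JSON."]
        return prompt + rng.choice(suffixes)

    trials = []
    for _ in range(n_trials):
        candidate = mutate(base_prompt)            # independent draw
        trials.append((candidate, evaluate(candidate, dataset)))
    return max(trials, key=lambda t: t[1])         # best (prompt, score)
```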

Bayesian Search

How it works: Uses Bayesian optimization to select few-shot examples and prompt configurations.

Strengths: Efficient search; excellent for few-shot learning; can infer optimal example templates.

Limitations: Requires examples in your dataset; may need many trials for complex spaces; best for structured tasks.

Bayesian Search →

The following optimizers iteratively improve prompts through analysis.

Meta-Prompt

How it works: Analyzes failed examples, formulates hypotheses, and rewrites the entire prompt.

Strengths: Deep failure understanding; holistic redesign; strong for complex tasks.

Limitations: Slower than search-based; higher API cost; may overfit to eval set.

Meta-Prompt →
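A single round of this analyze-and-rewrite loop might look like the sketch below; `run`, `evaluate`, and `rewrite` are hypothetical callables, the last standing in for the teacher model that redesigns the whole prompt.

```python
def meta_prompt_round(prompt, dataset, run, evaluate, rewrite):
    """One Meta-Prompt-style round (sketch): collect failing examples,
    then ask a teacher model to rewrite the entire prompt based on them."""
    failures = []
    for example in dataset:
        output = run(prompt, example)       # run the candidate on one example
        score = evaluate(output, example)
        if score < 1.0:                     # treat non-perfect scores as failures
            failures.append((example, output, score))
    if not failures:
        return prompt                       # nothing to fix
    return rewrite(prompt, failures)        # full rewrite, not a local patch
```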

ProTeGi

How it works: Generates critiques of failures and applies targeted improvements using beam search.

Strengths: Systematic error fixing; keeps multiple candidates; good exploration/refinement balance.

Limitations: Can be expensive; needs clear failure signals; may need several rounds.

ProTeGi →
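The beam-search core can be sketched generically; `expand` below is a hypothetical critique-and-edit step, not ProTeGi's actual interface.

```python
def beam_search(initial_prompts, evaluate, expand, beam_width=3, rounds=2):
    """Beam-search skeleton: each round, expand every surviving candidate
    into critiqued variants, then keep only the top `beam_width` by score."""
    beam = sorted(initial_prompts, key=evaluate, reverse=True)[:beam_width]
    for _ in range(rounds):
        candidates = list(beam)                 # parents stay in the running
        for prompt in beam:
            candidates.extend(expand(prompt))   # targeted edits from critiques
        beam = sorted(set(candidates), key=evaluate, reverse=True)[:beam_width]
    return beam[0]                              # best candidate found
```

Keeping several candidates per round is what gives the exploration/refinement balance mentioned above.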

PromptWizard

How it works: Combines mutation with different “thinking styles,” then critiques and refines top performers.

Strengths: Creative exploration; structured refinement; diverse variations.

Limitations: Multiple stages can be slow; requires a good teacher model; may produce unconventional prompts.

PromptWizard →

The following optimizers use evolutionary strategies inspired by natural selection.

GEPA

How it works: Evolutionary algorithms with reflective learning and mutation strategies.

Strengths: Strong performance; efficient eval budgeting; robust to local optima; production-ready.

Limitations: Requires the external gepa library; more complex setup; higher compute.

GEPA →

Decision tree

Do you need production-grade optimization?
├─ Yes → Use GEPA
└─ No
   Do you have few-shot examples in your dataset?
   ├─ Yes → Use Bayesian Search
   └─ No
      Is your task reasoning-heavy or complex?
      ├─ Yes → Use Meta-Prompt
      └─ No
         Do you have clear failure patterns to fix?
         ├─ Yes → Use ProTeGi
         └─ No
            Do you want creative exploration?
            ├─ Yes → Use PromptWizard
            └─ No → Use Random Search (baseline)

Combining optimizers

You can run multiple optimizers sequentially for best results:

```python
# Stage 1: Quick exploration with Random Search
random_result = random_optimizer.optimize(...)
# Reuse the first three trial candidates as seeds for the next stage
initial_prompts = [h.prompt for h in random_result.history[:3]]

# Stage 2: Deep refinement with Meta-Prompt
meta_result = meta_optimizer.optimize(
    initial_prompts=initial_prompts,
    ...
)

# Stage 3: Few-shot enhancement with Bayesian Search
final_result = bayesian_optimizer.optimize(
    initial_prompts=[meta_result.best_generator.get_prompt_template()],
    ...
)
```
