GEPA: Evolutionary Prompt Optimization
Discover GEPA (Genetic Pareto), an evolutionary algorithm that evolves prompts over generations using reflection and mutation for complex, high-stakes optimization.
GEPA (Genetic Pareto) is a state-of-the-art evolutionary algorithm that evolves a population of prompts over multiple generations. It uses a strong "reflection" language model to analyze failures and produce feedback, which guides the mutation and evolution process toward better-performing prompts. GEPA is designed for complex, high-stakes problems where achieving the best possible performance justifies the extra compute.
When to Use GEPA
✅ Best For
- Complex, agentic AI systems
- High-stakes optimization problems
- Finding state-of-the-art prompts
- Production-grade deployments
- Effective alternative to Reinforcement Learning
❌ Not Ideal For
- Simple, straightforward tasks
- Quick experiments or baseline testing
- Projects with a low computational budget
- Note: requires the external `gepa` library to be installed
How It Works
GEPA uses a sophisticated evolutionary loop to systematically refine prompts. The process is managed by the external gepa library, which our optimizer adapts to.
1. Initialization
The process starts with a single seed_candidate prompt. An adapter is initialized to bridge our evaluation framework with the GEPA engine.
2. Evaluation
GEPA’s engine runs the current generation of prompts against the dataset. Our internal adapter calls our standard Evaluator to score the outputs, feeding the results back to GEPA.
3. Reflection
GEPA uses a powerful reflection_lm to analyze the evaluation results, especially the failures. It creates a “reflective dataset” that contains detailed feedback on why certain outputs were poor.
4. Evolution (Mutation)
The reflective dataset is used to guide the evolution process. The reflection model generates a new population of candidate prompts (mutations) that are specifically designed to avoid the failures of the previous generation.
5. Selection & Repetition
The new generation of prompts is evaluated, and the best-performing ones are selected to continue. This cycle repeats until a predefined budget (e.g., max_metric_calls) is exhausted, ensuring the process is efficient.
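The five steps above can be sketched as a toy loop. Everything in this sketch is a stand-in: the real optimizer delegates initialization, reflection, mutation, and selection to the external `gepa` library and scoring to your `Evaluator`; the keyword-based `evaluate` and the fixed list of "fixes" are invented purely for illustration.

```python
# Toy sketch of GEPA's evolutionary loop (illustration only; the real
# optimizer delegates all of this to the external `gepa` library).

def evaluate(prompt: str) -> float:
    # Stand-in for the Evaluator: reward prompts that contain key instructions.
    keywords = ["summarize", "article", "concise", "avoid filler"]
    return sum(k in prompt.lower() for k in keywords) / len(keywords)

def reflect_and_mutate(prompt: str) -> list[str]:
    # Stand-in for the reflection LM: propose children, each adding one
    # instruction that "feedback" suggested was missing.
    fixes = ["Summarize.", "Use the article.", "Be concise.", "Avoid filler."]
    return [prompt + " " + fix for fix in fixes]

def gepa_loop(seed: str, max_metric_calls: int = 20) -> tuple[str, float]:
    best, best_score = seed, evaluate(seed)        # 1. initialization
    calls = 1
    while calls < max_metric_calls:                # budget-bounded repetition
        children = reflect_and_mutate(best)        # 3-4. reflection + mutation
        for child in children:                     # 2. evaluate the generation
            if calls >= max_metric_calls:
                break
            score = evaluate(child)
            calls += 1
            if score > best_score:                 # 5. selection
                best, best_score = child, score
    return best, best_score

prompt, score = gepa_loop("Write a summary: {article}")
print(score)  # the best candidate's score improves over the seed's
```

The loop terminates when the evaluation budget (`max_metric_calls`) is exhausted, mirroring how the real optimizer bounds cost.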
Basic Usage
To use the GEPA optimizer, you need to provide two key models: one for reflection and one for generation.
```python
from fi.opt.optimizers import GEPAOptimizer
from fi.opt.datamappers import BasicDataMapper
from fi.opt.base.evaluator import Evaluator

# 1. Set up the evaluator to score prompt performance
evaluator = Evaluator(
    eval_template="summary_quality",
    eval_model_name="turing_flash",
    fi_api_key="your_key",
    fi_secret_key="your_secret",
)

# 2. Set up the data mapper
data_mapper = BasicDataMapper(
    key_map={"input": "article", "output": "generated_output"}
)

# 3. Initialize the GEPA optimizer
# The reflection_model should be a powerful LLM (e.g., GPT-4 Turbo);
# the generator_model is the model your final prompt will run on.
optimizer = GEPAOptimizer(
    reflection_model="gpt-4-turbo",
    generator_model="gpt-4o-mini",
)

# 4. Run the optimization
# GEPA works toward a budget of total evaluations (max_metric_calls)
result = optimizer.optimize(
    evaluator=evaluator,
    data_mapper=data_mapper,
    dataset=my_dataset,
    initial_prompts=["Summarize this article concisely: {article}"],
    max_metric_calls=200,  # total number of evaluations to perform
)

print(f"Best prompt found: {result.best_generator.get_prompt_template()}")
print(f"Final score: {result.final_score:.4f}")
```
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `reflection_model` | `str` | required | Model used for reflection and mutation (e.g. `gpt-4-turbo`, `claude-3-opus`) |
| `generator_model` | `str` | `gpt-4o-mini` | Model that runs the generated prompts (typically your production model) |
| `max_metric_calls` | `int` | `150` | Total evaluation budget across all generations |
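To build intuition for the budget: `max_metric_calls` caps the total number of evaluator calls across all generations, so it implicitly limits how many generations GEPA can run. The population size and dataset size below are illustrative assumptions, not values fixed by the optimizer:

```python
# Rough budget arithmetic (illustrative numbers, not GEPA internals):
# each generation evaluates every candidate prompt on every dataset example.
max_metric_calls = 150   # default evaluation budget
population_size = 5      # assumed candidate prompts per generation
dataset_size = 10        # assumed examples scored per candidate

calls_per_generation = population_size * dataset_size
generations = max_metric_calls // calls_per_generation
print(generations)  # -> 3 full generations fit within the default budget
```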
Key Concepts
`GEPAOptimizer` is a wrapper around the external `gepa` library. An internal adapter translates between this framework's `Evaluator` and GEPA's engine: it evaluates candidate prompts and formats the results into the reflective dataset GEPA expects. The `gepa` package must be installed for this optimizer to work.
Tips
- Use a strong reflection model; the quality of its feedback drives the quality of the mutations.
- Budget realistically: a `max_metric_calls` of 100–150 is reasonable for experiments; raise it for production-grade runs.
- If you see a "library not found" error, install the external `gepa` library.
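Since a missing dependency is the most common setup failure, a quick guard before constructing the optimizer gives a clearer message than a deep import traceback. The import name `gepa` is assumed here to match the package name:

```python
# Fail fast with a clear message if the external `gepa` package is absent.
import importlib.util

if importlib.util.find_spec("gepa") is None:
    print("gepa not installed; run: pip install gepa")
else:
    print("gepa available")
```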
Underlying Research
GEPA is based on recent advancements in evolutionary algorithms for prompt engineering, showing significant gains over traditional methods.
- Core Paper: The method is detailed in "GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning", which demonstrates that it can outperform RL-based methods with far fewer evaluations.
- Efficiency: As highlighted by the Databricks Blog, GEPA can lead to massive cost reductions for agent optimization. It is integrated into leading optimization frameworks like Opik and SuperOptiX.