GEPA: Evolutionary Prompt Optimization
Discover GEPA (Genetic Pareto), an evolutionary algorithm that evolves prompts over generations using reflection and mutation for complex, high-stakes optimization.
GEPA (Genetic Pareto) is a state-of-the-art evolutionary algorithm that evolves a population of prompts over multiple generations. It uses a strong "reflection" language model to analyze failures and produce feedback, which guides the mutation and evolution process toward better-performing prompts. GEPA is designed for complex, high-stakes problems where achieving the best possible performance justifies the extra compute.
When to Use GEPA
✅ Best For
- Complex, agentic AI systems
- High-stakes optimization problems
- Finding state-of-the-art prompts
- Production-grade deployments
- Effective alternative to Reinforcement Learning
❌ Not Ideal For
- Simple, straightforward tasks
- Quick experiments or baseline testing
- Projects with a low computational budget
- Note: requires the external `gepa` library to be installed
How It Works
GEPA uses a sophisticated evolutionary loop to systematically refine prompts. The process is managed by the external gepa library, which our optimizer adapts to.
1. Initialization
The process starts with a single seed_candidate prompt. An adapter is initialized to bridge our evaluation framework with the GEPA engine.
2. Evaluation
GEPA’s engine runs the current generation of prompts against the dataset. Our internal adapter calls our standard Evaluator to score the outputs, feeding the results back to GEPA.
3. Reflection
GEPA uses a powerful reflection_lm to analyze the evaluation results, especially the failures. It creates a “reflective dataset” that contains detailed feedback on why certain outputs were poor.
4. Evolution (Mutation)
The reflective dataset is used to guide the evolution process. The reflection model generates a new population of candidate prompts (mutations) that are specifically designed to avoid the failures of the previous generation.
5. Selection & Repetition
The new generation of prompts is evaluated, and the best-performing ones are selected to continue. This cycle repeats until a predefined budget (e.g., max_metric_calls) is exhausted, ensuring the process is efficient.
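The five steps above can be sketched as a toy loop. Everything in this sketch is a stand-in: the real optimizer delegates initialization, reflection, mutation, and selection to the external `gepa` library and scoring to your `Evaluator`; the keyword-based `evaluate` and the fixed list of "fixes" are invented purely for illustration.

```python
# Toy sketch of GEPA's evolutionary loop (illustration only; the real
# optimizer delegates all of this to the external `gepa` library).

def evaluate(prompt: str) -> float:
    # Stand-in for the Evaluator: reward prompts that contain key instructions.
    keywords = ["summarize", "article", "concise", "avoid filler"]
    return sum(k in prompt.lower() for k in keywords) / len(keywords)

def reflect_and_mutate(prompt: str) -> list[str]:
    # Stand-in for the reflection LM: propose children, each adding one
    # instruction that "feedback" suggested was missing.
    fixes = ["Summarize.", "Use the article.", "Be concise.", "Avoid filler."]
    return [prompt + " " + fix for fix in fixes]

def gepa_loop(seed: str, max_metric_calls: int = 20) -> tuple[str, float]:
    best, best_score = seed, evaluate(seed)        # 1. initialization
    calls = 1
    while calls < max_metric_calls:                # budget-bounded repetition
        children = reflect_and_mutate(best)        # 3-4. reflection + mutation
        for child in children:                     # 2. evaluate the generation
            if calls >= max_metric_calls:
                break
            score = evaluate(child)
            calls += 1
            if score > best_score:                 # 5. selection
                best, best_score = child, score
    return best, best_score

prompt, score = gepa_loop("Write a summary: {article}")
print(score)  # the best candidate's score improves over the seed's
```

The loop terminates when the evaluation budget (`max_metric_calls`) is exhausted, mirroring how the real optimizer bounds cost.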
Basic Usage
To use the GEPA optimizer, you need to provide two key models: one for reflection and one for generation.
```python
from fi.opt.optimizers import GEPAOptimizer
from fi.opt.datamappers import BasicDataMapper
from fi.opt.base.evaluator import Evaluator

# 1. Set up the evaluator to score prompt performance
evaluator = Evaluator(
    eval_template="summary_quality",
    eval_model_name="turing_flash",
    fi_api_key="your_key",
    fi_secret_key="your_secret",
)

# 2. Set up the data mapper
data_mapper = BasicDataMapper(
    key_map={"input": "article", "output": "generated_output"}
)

# 3. Initialize the GEPA optimizer
# The reflection_model should be a powerful LLM (e.g., GPT-4 Turbo);
# the generator_model is the model your final prompt will run on.
optimizer = GEPAOptimizer(
    reflection_model="gpt-4-turbo",
    generator_model="gpt-4o-mini",
)

# 4. Run the optimization
# GEPA works toward a budget of total evaluations (max_metric_calls)
result = optimizer.optimize(
    evaluator=evaluator,
    data_mapper=data_mapper,
    dataset=my_dataset,
    initial_prompts=["Summarize this article concisely: {article}"],
    max_metric_calls=200,  # total number of evaluations to perform
)

print(f"Best prompt found: {result.best_generator.get_prompt_template()}")
print(f"Final score: {result.final_score:.4f}")
```
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `reflection_model` | `str` | required | Model used for reflection and mutation (e.g. `gpt-4-turbo`, `claude-3-opus`) |
| `generator_model` | `str` | `gpt-4o-mini` | Model that runs the generated prompts (typically your production model) |
| `max_metric_calls` | `int` | `150` | Total evaluation budget across all generations |
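To build intuition for the budget: `max_metric_calls` caps the total number of evaluator calls across all generations, so it implicitly limits how many generations GEPA can run. The population size and dataset size below are illustrative assumptions, not values fixed by the optimizer:

```python
# Rough budget arithmetic (illustrative numbers, not GEPA internals):
# each generation evaluates every candidate prompt on every dataset example.
max_metric_calls = 150   # default evaluation budget
population_size = 5      # assumed candidate prompts per generation
dataset_size = 10        # assumed examples scored per candidate

calls_per_generation = population_size * dataset_size
generations = max_metric_calls // calls_per_generation
print(generations)  # -> 3 full generations fit within the default budget
```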
Key Concepts
`GEPAOptimizer` is a wrapper around the external `gepa` library. An internal adapter translates between this framework's `Evaluator` and GEPA's engine: it evaluates candidate prompts and formats the results into the reflective dataset GEPA expects. The `gepa` package must be installed for this optimizer to work.
Tips
- Use a strong reflection model; the quality of its feedback drives the quality of the mutations.
- Budget realistically: a `max_metric_calls` of 100–150 is reasonable for experiments; raise it for production-grade runs.
- If you see a "library not found" error, install the external `gepa` library.
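Since a missing dependency is the most common setup failure, a quick guard before constructing the optimizer gives a clearer message than a deep import traceback. The import name `gepa` is assumed here to match the package name:

```python
# Fail fast with a clear message if the external `gepa` package is absent.
import importlib.util

if importlib.util.find_spec("gepa") is None:
    print("gepa not installed; run: pip install gepa")
else:
    print("gepa available")
```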
Underlying Research
GEPA is based on recent advancements in evolutionary algorithms for prompt engineering, showing significant gains over traditional methods.
- Core Paper: The method is detailed in "GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning", which demonstrates that it can outperform RL-based methods with far fewer evaluations.
- Efficiency: As highlighted by the Databricks Blog, GEPA can lead to massive cost reductions for agent optimization. It is integrated into leading optimization frameworks like Opik and SuperOptiX.