GEPA (Genetic Pareto) is a state-of-the-art evolutionary algorithm that evolves a population of prompts over multiple generations. It uses a powerful “reflection” language model to analyze failures and provide feedback, which guides the mutation and evolution process toward better-performing prompts. It is designed for complex, high-stakes problems where achieving the best possible performance is critical.

When to Use GEPA

✅ Best For

  • Complex, agentic AI systems
  • High-stakes optimization problems
  • Finding state-of-the-art prompts
  • Production-grade deployments
  • Effective alternative to Reinforcement Learning

❌ Not Ideal For

  • Simple, straightforward tasks
  • Quick experiments or baseline testing
  • Projects with a low computational budget
  • Environments where the external gepa library cannot be installed

How It Works

GEPA uses a sophisticated evolutionary loop to systematically refine prompts. The process is managed by the external gepa library, which our optimizer connects to through an adapter.

1. Initialization

The process starts with a single seed_candidate prompt. An adapter is initialized to bridge our evaluation framework with the GEPA engine.
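
In code, the starting state looks roughly like the sketch below. The seed_candidate structure and the adapter’s constructor arguments are illustrative assumptions, not the exact gepa or internal API; evaluator and data_mapper are the components configured as in Basic Usage below.

# Illustrative sketch only -- field and argument names are assumptions.
seed_candidate = {"prompt": "Summarize this article concisely: {article}"}

# The adapter bridges GEPA's engine and our evaluation framework.
adapter = _InternalGEPAAdapter(
    evaluator=evaluator,      # scores generated outputs
    data_mapper=data_mapper,  # maps dataset fields to prompt/eval inputs
)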

2. Evaluation

GEPA’s engine runs the current generation of prompts against the dataset. Our internal adapter calls our standard Evaluator to score the outputs, feeding the results back to GEPA.

3. Reflection

GEPA uses a powerful reflection_lm to analyze the evaluation results, especially the failures. It creates a “reflective dataset” that contains detailed feedback on why certain outputs were poor.
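
The precise schema is defined by the gepa library, but a single record in the reflective dataset might look roughly like this (field names are illustrative assumptions):

# One illustrative reflective-dataset record; field names are assumptions.
reflective_record = {
    "inputs": {"article": "Text of the source article..."},
    "generated_output": "A summary that missed the main finding.",
    "score": 0.35,  # low evaluator score marks this as a failure case
    "feedback": "Summary omits the key finding and exceeds the length limit.",
}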

4. Evolution (Mutation)

The reflective dataset is used to guide the evolution process. The reflection model generates a new population of candidate prompts (mutations) that are specifically designed to avoid the failures of the previous generation.

5. Selection & Repetition

The new generation of prompts is evaluated, and the best performers are selected to continue. This cycle repeats until a predefined budget (e.g., max_metric_calls) is exhausted, which keeps the total cost and duration of the run bounded and predictable.
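
For example, with max_metric_calls=200 and a 20-example dataset, the engine can afford at most 200 / 20 = 10 full passes over the data, spread across however many generations the evolution produces.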

Basic Usage

To use the GEPA optimizer, you need to provide two key models: one for reflection and one for generation.
from fi.opt.optimizers import GEPAOptimizer
from fi.opt.datamappers import BasicDataMapper
from fi.opt.base.evaluator import Evaluator

# 1. Setup the evaluator to score prompt performance
evaluator = Evaluator(
    eval_template="summary_quality",
    eval_model_name="turing_flash",
    fi_api_key="your_key",
    fi_secret_key="your_secret"
)

# 2. Setup the data mapper
data_mapper = BasicDataMapper(
    key_map={"input": "article", "output": "generated_output"}
)

# 3. Initialize the GEPA optimizer
# The reflection_model should be a powerful LLM (e.g., GPT-4 Turbo)
# The generator_model is the model your final prompt will use
optimizer = GEPAOptimizer(
    reflection_model="gpt-4-turbo",
    generator_model="gpt-4o-mini"
)

# 4. Run the optimization
# GEPA works towards a budget of total evaluations (max_metric_calls)
result = optimizer.optimize(
    evaluator=evaluator,
    data_mapper=data_mapper,
    dataset=my_dataset,
    initial_prompts=["Summarize this article concisely: {article}"],
    max_metric_calls=200  # Total number of evaluations to perform
)

print(f"Best prompt found: {result.best_generator.get_prompt_template()}")
print(f"Final score: {result.final_score:.4f}")

Configuration Parameters

reflection_model (str, required)
The name of a powerful language model (e.g., gpt-4-turbo, claude-3-opus) that GEPA will use for its high-level reflection and mutation steps. The success of the optimization heavily depends on this model’s reasoning capabilities.

generator_model (str, default: "gpt-4o-mini")
The model that will be used to generate outputs with the prompts being optimized. This is typically a smaller, faster, or more cost-effective model that you intend to use in production.

max_metric_calls (int, default: 150)
The total budget for the optimization process, defined as the maximum number of individual evaluations to perform across all generations. This provides a predictable upper bound on the cost and duration of the optimization.
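
Putting the three parameters together, the Basic Usage call above expands to something like this, with every tunable spelled out (values are examples; evaluator, data_mapper, and my_dataset are defined as in Basic Usage):

# Every parameter spelled out; setup otherwise identical to Basic Usage.
optimizer = GEPAOptimizer(
    reflection_model="claude-3-opus",  # required: a strong reasoning model
    generator_model="gpt-4o-mini",     # the documented default, shown explicitly
)

result = optimizer.optimize(
    evaluator=evaluator,
    data_mapper=data_mapper,
    dataset=my_dataset,
    initial_prompts=["Summarize this article concisely: {article}"],
    max_metric_calls=150,  # the documented default budget
)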

Under the Hood: The GEPA Adapter

The GEPAOptimizer acts as a wrapper around the external gepa library. To make the two compatible, we use an internal adapter (_InternalGEPAAdapter) that translates between the two systems:
  1. Evaluation Requests: When GEPA’s engine needs to evaluate a prompt, it calls the adapter’s evaluate method. The adapter then uses our framework’s LiteLLMGenerator and Evaluator to perform the task and returns the scores in the format GEPA expects.
  2. Reflection Data: The adapter’s make_reflective_dataset method formats the evaluation results, including scores and failure reasons, into a structured dataset that GEPA’s reflection model can analyze to guide the next evolutionary step.
This design allows us to leverage GEPA’s powerful, cutting-edge optimization algorithm while still using our framework’s standardized components for evaluation and data handling.
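
A minimal sketch of the adapter’s two responsibilities is shown below. The method names follow the description above, but the exact signatures the gepa engine expects, and the real internals of _InternalGEPAAdapter, will differ; generator.generate, data_mapper.map, and evaluator.score are placeholder calls, not confirmed framework APIs.

# Minimal sketch, not the real _InternalGEPAAdapter. Signatures and the
# generator/data_mapper/evaluator methods used here are assumptions.
class SketchGEPAAdapter:
    def __init__(self, generator, data_mapper, evaluator):
        self.generator = generator      # e.g., a LiteLLMGenerator
        self.data_mapper = data_mapper  # maps dataset fields to eval inputs
        self.evaluator = evaluator      # scores generated outputs

    def evaluate(self, batch, candidate):
        """Run the candidate prompt over a batch and score each output."""
        results = []
        for example in batch:
            output = self.generator.generate(candidate["prompt"], example)
            score = self.evaluator.score(self.data_mapper.map(example, output))
            results.append({"example": example, "output": output, "score": score})
        return results

    def make_reflective_dataset(self, results):
        """Turn scored results into feedback records for the reflection model."""
        return [
            {
                "inputs": r["example"],
                "generated_output": r["output"],
                "score": r["score"],
                "feedback": "failure" if r["score"] < 0.5 else "success",
            }
            for r in results
        ]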

Underlying Research

GEPA is based on recent advancements in evolutionary algorithms for prompt engineering, showing significant gains over traditional methods.
  • Core Paper: The method is detailed in “GEPA: Reflective Prompt Evolution Can Outperform Reinforcement …”, which demonstrates that it can outperform RL-based methods with far fewer evaluations.
  • Efficiency: As highlighted by the Databricks Blog, GEPA can lead to massive cost reductions for agent optimization. It is integrated into leading optimization frameworks like Opik and SuperOptiX.

Next Steps
