The GEPAOptimizer is an adapter for the powerful, state-of-the-art GEPA (Genetic-Pareto) library. It uses an evolutionary algorithm that treats prompts like DNA, iteratively mutating them based on rich, reflective feedback from a “teacher” model to find highly optimized solutions. This cookbook will guide you through setting up and running the GEPAOptimizer for production-grade prompt optimization.
This optimizer requires the gepa library. If you haven’t already, install it with: pip install gepa.

When to Use GEPA

GEPA is your most powerful tool, ideal for scenarios where achieving the absolute best performance is critical.

✅ Best For

  • Critical, production-grade applications
  • Complex, multi-component systems (e.g., RAG)
  • High-stakes tasks where small improvements matter
  • When you have a larger evaluation budget

❌ Not Ideal For

  • Quick, simple experiments
  • Very small budgets or datasets
  • Initial exploration (use Random Search first)

How It Works

Our GEPAOptimizer acts as a clean adapter to the external gepa library, handling the complex setup for you. The core evolutionary loop proceeds in steps:
1. Evaluate: GEPA first tests the performance of the current best prompt(s) on a sample of your dataset to establish a baseline.

2. Reflect: It uses a powerful "reflection" model to analyze the results, especially the failures, and generates rich, textual feedback on why the prompt failed.

3. Mutate: Based on this reflection, the reflection model rewrites the prompt to create new, improved "offspring" prompts (mutations). This step also includes paraphrasing to increase diversity.

4. Select & Repeat: GEPA uses Pareto-aware selection (powered by a UCB bandit algorithm) to efficiently choose the most promising new prompts to carry forward to the next generation. The cycle then repeats.
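The four steps above can be sketched as a simple loop. This is an illustrative toy, not the gepa library's actual implementation: the real scorer and mutator are LLM-backed (the evaluator and the reflection model), whereas here they are plain callables you pass in.

```python
def evolutionary_loop(initial_prompt, score_fn, mutate_fn, budget):
    """Toy evaluate -> reflect/mutate -> select cycle under a metric-call budget.

    score_fn stands in for the evaluator; mutate_fn stands in for the
    reflection model proposing an improved offspring prompt.
    """
    population = [initial_prompt]
    best, best_score = initial_prompt, score_fn(initial_prompt)
    calls = 1  # the baseline evaluation already spent one metric call
    while calls < budget:
        # 1. Evaluate every candidate, stopping if the budget runs out.
        scored = []
        for prompt in population:
            if calls >= budget:
                break
            scored.append((score_fn(prompt), prompt))
            calls += 1
        # 2-3. Keep the best candidate and derive offspring from it.
        scored.sort(reverse=True)
        if scored and scored[0][0] > best_score:
            best_score, best = scored[0]
        offspring = [mutate_fn(best) for _ in range(2)]
        # 4. Select: carry the champion plus its offspring into the next generation.
        population = [best] + offspring
    return best, best_score
```

The key property the sketch preserves is that every evaluation, including the baseline, counts against the budget, which is exactly why `max_metric_calls` must exceed your dataset size (see the warning in step 3 below).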

1. Prepare Your Dataset and Initial Prompt

A high-quality dataset is crucial for GEPA. For this example, we’ll aim to optimize a summarization prompt. A good dataset should contain a diverse set of articles and their ideal, “golden” summaries.
# A high-quality dataset is key for GEPA's success.
# 30-100 examples are recommended for a good optimization run.
dataset = [
    {
        "article": "The James Webb Space Telescope (JWST) has captured stunning new images of the Pillars of Creation, revealing previously unseen details of star formation within the dense clouds of gas and dust.",
        "target_summary": "The JWST has taken new, detailed pictures of star formation in the Pillars of Creation."
    },
    {
        "article": "Researchers at the University of Austin have discovered a new enzyme capable of breaking down polyethylene terephthalate (PET), the plastic commonly found in beverage bottles, in a matter of hours.",
        "target_summary": "A new enzyme that rapidly breaks down PET plastic has been discovered by researchers."
    },
    # ... more examples
]

# This is our starting point—a simple prompt we want GEPA to evolve.
initial_prompt = "Summarize this article concisely: {article}"

2. Configure the GEPA Optimizer

GEPA requires two key models and an evaluation budget.
from fi.opt.optimizers import GEPAOptimizer
from fi.opt.datamappers import BasicDataMapper
from fi.opt.base import Evaluator

# a. Setup the evaluator to score prompt performance.
# We'll use the FutureAGI platform for a high-quality, semantic evaluation.
evaluator = Evaluator(
    eval_template="summary_quality",
    eval_model_name="turing_flash",
    fi_api_key="YOUR_FI_API_KEY",
)

# b. Setup the data mapper to connect our components.
data_mapper = BasicDataMapper(
    key_map={
        "input": "article",          # Map our dataset's 'article' to the evaluator's 'input'
        "output": "generated_output" # Map the generator's output to the evaluator's 'output'
    }
)
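To make the `key_map` concrete, here is a toy illustration of the renaming a data mapper performs, turning a dataset record (plus the generator's output) into the payload shape the evaluator expects. This is a sketch of the idea only; `BasicDataMapper`'s real behavior may differ.

```python
def apply_key_map(key_map, record):
    """Rename record fields to the names the evaluator expects.

    key_map maps evaluator-side names to record-side names,
    mirroring the {"input": "article", "output": "generated_output"}
    mapping configured above.
    """
    return {eval_key: record[source_key]
            for eval_key, source_key in key_map.items()}
```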

# c. Initialize the GEPA optimizer.
optimizer = GEPAOptimizer(
    # A powerful model for reflection is crucial for good results.
    reflection_model="gpt-5",
    
    # The "student" model whose prompt we are optimizing.
    generator_model="gpt-4o-mini"
)

3. Run the Optimization

With everything configured, call the .optimize() method. The most important parameter is max_metric_calls, which defines your total budget for the entire evolutionary process.
Important: max_metric_calls includes all evaluations, even for initial prompt outputs. If your dataset has 300 rows and max_metric_calls is 200, the budget will be exhausted just evaluating the first prompt, preventing any actual optimization. Ensure max_metric_calls is significantly larger than your dataset size.
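The budget arithmetic in that warning can be checked up front. This helper is a rough sketch under the assumption that each generation evaluates the full dataset; gepa's internal accounting may sample instead, so treat the result as an upper bound on wasted runs, not an exact count.

```python
def estimate_generations(max_metric_calls, dataset_size):
    """Rough count of full-dataset evaluation passes the budget allows."""
    if max_metric_calls <= dataset_size:
        raise ValueError(
            "Budget would be exhausted by the baseline evaluation alone; "
            "increase max_metric_calls or shrink the dataset."
        )
    return max_metric_calls // dataset_size
```

For instance, a budget of 200 calls over a 50-example dataset leaves room for roughly four full passes, while the 300-row example from the warning above would raise immediately.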
# Run the optimization with a budget of 200 evaluations.
# A larger budget allows for more generations and potentially better results.

result = optimizer.optimize(
    evaluator=evaluator,
    data_mapper=data_mapper,
    dataset=dataset,
    initial_prompts=[initial_prompt],
    max_metric_calls=200
)

4. Analyze the Results

The result object contains the best prompt found, its score, and the history of the run. GEPA’s strength is finding highly optimized prompts that often contain specific, nuanced instructions learned from analyzing failures.
print("--- GEPA Optimization Complete ---")
print(f"Best Score: {result.final_score:.4f}")

print("\n--- Initial Prompt ---")
print(initial_prompt)

print("\n--- Best Prompt Found by GEPA ---")
print(result.best_generator.get_prompt_template())

# The optimized prompt might look something like this:
#
# You are an expert summarizer. Your task is to generate a single, concise sentence
# that captures the main takeaway of the provided article.
#
# Key requirements:
# 1.  **Fidelity:** Ensure the summary is factually consistent with the source text.
# 2.  **Brevity:** Do not exceed 20 words.
# 3.  **Key Entities:** The summary must include the primary subject of the article.
#
# Article: {article}
# Summary:
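Once the run finishes, you will usually want to persist the winning prompt for use in your application. A minimal sketch, using only the result attributes shown above (`final_score` and `best_generator.get_prompt_template()`):

```python
import json

def save_best_prompt(path, score, prompt_template):
    """Write the optimized prompt and its score to a JSON file."""
    payload = {"score": round(score, 4), "prompt": prompt_template}
    with open(path, "w") as f:
        json.dump(payload, f, indent=2)
    return payload

# e.g. save_best_prompt("best_prompt.json", result.final_score,
#                       result.best_generator.get_prompt_template())
```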

Performance Tips

GEPA is powerful but data-hungry. Its evolutionary process shines with a larger budget. A max_metric_calls of 150-300 is a good starting point for real tasks. A small budget (< 50) may not be enough for the algorithm to evolve past the initial prompt.
The quality of the optimization is heavily dependent on the reflection_model. Using a top-tier model such as gpt-5, claude-4.5-sonnet, or gemini-2.5-pro for this role is highly recommended for generating insightful critiques and high-quality mutations.
While GEPA can work from a very simple prompt, providing a reasonably well-structured initial prompt gives the evolutionary process a better starting point and can lead to faster convergence on a high-quality solution.