ProTeGi (Prompt Optimization with Textual Gradients) systematically improves prompts by identifying failure patterns, generating targeted natural-language critiques ("textual gradients"), and applying the specific fixes they suggest. It uses beam search to maintain multiple candidate prompts and refines them over successive rounds.

When to Use ProTeGi

✅ Best For

  • Debugging specific failure modes
  • Systematic error correction
  • Tasks with clear failure patterns
  • Iterative refinement workflows

❌ Not Ideal For

  • Quick experiments (multi-stage process)
  • Tasks where failures are random
  • Very small datasets
  • Budget-constrained projects

How It Works

ProTeGi follows a structured expansion and selection process (see the sketch after this list):

1. **Identify Failures**: Run the current prompts and collect examples with low scores.
2. **Generate Critiques**: A teacher model analyzes the failures and generates multiple specific critiques ("gradients").
3. **Apply Improvements**: For each critique, generate improved prompt variations.
4. **Beam Selection**: Evaluate all candidates and keep the top N prompts.
5. **Iterate**: Repeat the expansion from the best-performing prompts.

ProTeGi maintains a "beam" of candidate prompts throughout optimization, preventing premature convergence to a local optimum.
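The following is a minimal sketch of one optimization round, not the library's actual internals. The helper functions (`score`, `generate_critiques`, `rewrite_prompt`) and the 0.5 failure threshold are hypothetical stand-ins for the evaluator- and teacher-backed steps:

```python
# A minimal sketch of one ProTeGi round. Helper functions (score,
# generate_critiques, rewrite_prompt) are hypothetical, not part of fi.opt.
def protegi_round(beam, dataset, num_gradients, errors_per_gradient,
                  prompts_per_gradient, beam_size):
    candidates = list(beam)
    for prompt in beam:
        # Step 1: collect examples the current prompt scores poorly on
        # (0.5 is an arbitrary illustrative threshold).
        failures = [ex for ex in dataset if score(prompt, ex) < 0.5]
        # Step 2: the teacher turns a sample of failures into critiques
        # ("textual gradients").
        critiques = generate_critiques(
            prompt, failures[:errors_per_gradient], n=num_gradients
        )
        # Step 3: apply each critique to produce improved prompt variants.
        for critique in critiques:
            candidates += [rewrite_prompt(prompt, critique)
                           for _ in range(prompts_per_gradient)]
    # Steps 4-5: score all candidates and keep the top beam_size
    # as the beam for the next round.
    candidates.sort(key=lambda p: sum(score(p, ex) for ex in dataset),
                    reverse=True)
    return candidates[:beam_size]
```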

Basic Usage

```python
from fi.opt.optimizers import ProTeGi
from fi.opt.generators import LiteLLMGenerator
from fi.opt.datamappers import BasicDataMapper
from fi.opt.base.evaluator import Evaluator

# Setup teacher model
teacher = LiteLLMGenerator(
    model="gpt-4o",
    prompt_template="{prompt}"
)

# Setup evaluator
evaluator = Evaluator(
    eval_template="context_relevance",
    eval_model_name="turing_flash",
    fi_api_key="your_key",
    fi_secret_key="your_secret"
)

# Setup data mapper
data_mapper = BasicDataMapper(
    key_map={"input": "question", "output": "generated_output"}
)

# Create optimizer
optimizer = ProTeGi(
    teacher_generator=teacher,
    num_gradients=4,
    errors_per_gradient=4,
    prompts_per_gradient=1,
    beam_size=4
)

# Run optimization
result = optimizer.optimize(
    evaluator=evaluator,
    data_mapper=data_mapper,
    dataset=dataset,
    initial_prompts=["Answer the question: {question}"],
    num_rounds=3,
    eval_subset_size=32
)
```
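With the settings above, the per-round budget can be estimated directly from the parameters. This back-of-the-envelope calculation assumes the surviving beam is re-scored alongside new candidates and one teacher call per critique and per rewrite; the library's exact accounting may differ:

```python
beam_size, num_gradients, prompts_per_gradient = 4, 4, 1
num_rounds, eval_subset_size = 3, 32

new_per_round = beam_size * num_gradients * prompts_per_gradient   # 16 new prompts
scored_per_round = beam_size + new_per_round                       # 20 candidates
evals_per_round = scored_per_round * eval_subset_size              # 640 generations + evals
teacher_calls = beam_size * num_gradients * (1 + prompts_per_gradient)  # 32 per round

print(num_rounds * evals_per_round)  # ~1920 student evaluations in total
print(num_rounds * teacher_calls)    # ~96 teacher calls in total
```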

Underlying Research

ProTeGi was introduced in Pryzant et al., "Automatic Prompt Optimization with 'Gradient Descent' and Beam Search" (EMNLP 2023). It adapts concepts from numerical optimization to natural language: critiques act as gradients, prompt rewrites as descent steps, and beam search controls exploration.

Configuration Parameters

Core Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `teacher_generator` | `LiteLLMGenerator` | required | Powerful model for generating critiques and improved prompts. Recommended: gpt-4o, claude-3-opus. |
| `num_gradients` | `int` | `4` | Number of distinct critiques to generate for each prompt. More gradients = more diverse improvement directions. |
| `errors_per_gradient` | `int` | `4` | Number of failed examples shown to the teacher when generating each critique. Higher = more context but more expensive. |
| `prompts_per_gradient` | `int` | `1` | Number of new prompts to generate from each critique. Set to 2-3 for more exploration. |
| `beam_size` | `int` | `4` | Number of top-performing prompts to keep each round. Larger beam = more diversity but slower. |
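To see how these trade-offs combine, the constructor from Basic Usage can be tuned toward more exploration. The values below are illustrative, not recommendations:

```python
# A more exploratory configuration: more rewrites per critique and a wider
# beam. Roughly triples the candidates scored per round
# (6 + 6*4*2 = 54 vs. 4 + 4*4*1 = 20), with a matching increase in cost.
optimizer = ProTeGi(
    teacher_generator=teacher,
    num_gradients=4,
    errors_per_gradient=4,
    prompts_per_gradient=2,  # two improved prompts per critique
    beam_size=6              # keep more survivors each round
)
```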

Optimization Parameters