When to Use ProTeGi
✅ Best For
- Debugging specific failure modes
- Systematic error correction
- Tasks with clear failure patterns
- Iterative refinement workflows
❌ Not Ideal For
- Quick experiments (the multi-stage process adds overhead)
- Tasks where failures are random
- Very small datasets
- Budget-constrained projects
How It Works
ProTeGi follows a structured expansion and selection process:
1. Identify Failures: Run the current prompts and identify examples with low scores.
2. Generate Critiques: The teacher model analyzes the failures and generates multiple specific critiques (“gradients”).
3. Apply Improvements: For each critique, generate improved prompt variations.
4. Beam Selection: Evaluate all candidates and keep the top N prompts.
5. Iterate: Repeat the expansion from the best-performing prompts.
ProTeGi maintains a “beam” of candidate prompts throughout optimization, which reduces the risk of premature convergence to a local optimum.
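The expand-and-select loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the library's implementation: the function `protegi_loop`, its arguments, and the toy `score`, `failures`, `critique`, and `rewrite` callables are all hypothetical stand-ins. In a real run, the critique and rewrite steps would be teacher-model calls, and scoring would run the prompt against an evaluation dataset.

```python
from typing import Callable, List, Tuple

def protegi_loop(
    seed_prompt: str,
    score: Callable[[str], float],
    failures: Callable[[str], List[str]],
    critique: Callable[[str, List[str]], List[str]],
    rewrite: Callable[[str, str], List[str]],
    beam_width: int = 2,
    rounds: int = 3,
) -> List[Tuple[float, str]]:
    """Hypothetical sketch of critique-driven expansion with beam selection."""
    beam = [(score(seed_prompt), seed_prompt)]
    for _ in range(rounds):
        # Parents stay in the pool so a round can never lower the best score.
        candidates = {prompt: s for s, prompt in beam}
        for _, prompt in beam:
            failed = failures(prompt)            # step 1: identify failures
            if not failed:
                continue
            for c in critique(prompt, failed):   # step 2: textual "gradients"
                for new_prompt in rewrite(prompt, c):  # step 3: apply improvements
                    if new_prompt not in candidates:
                        candidates[new_prompt] = score(new_prompt)
        # Step 4: beam selection keeps the top `beam_width` prompts.
        ranked = sorted(
            ((s, p) for p, s in candidates.items()),
            key=lambda pair: pair[0],
            reverse=True,
        )
        beam = ranked[:beam_width]               # step 5: iterate from the best
    return beam

# Toy stand-ins (assumptions): a prompt "fails" on every keyword it lacks.
KEYWORDS = ["concise", "cite", "json"]

def toy_score(prompt: str) -> float:
    return sum(k in prompt for k in KEYWORDS) / len(KEYWORDS)

def toy_failures(prompt: str) -> List[str]:
    return [k for k in KEYWORDS if k not in prompt]

def toy_critique(prompt: str, failed: List[str]) -> List[str]:
    return [f"missing instruction: {k}" for k in failed]

def toy_rewrite(prompt: str, crit: str) -> List[str]:
    return [prompt + " " + crit.split(": ")[1]]

final_beam = protegi_loop(
    "Answer the question.", toy_score, toy_failures, toy_critique, toy_rewrite
)
```

Keeping the parent prompts in the candidate pool is a design choice of this sketch: it guarantees the best score never decreases between rounds, at the cost of less turnover in the beam.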
Basic Usage
Underlying Research
ProTeGi introduces a novel, gradient-inspired approach to prompt optimization, adapting concepts from numerical optimization to natural language.
- Core Paper: The method originates from the paper “Automatic Prompt Optimization with “Gradient Descent” and Beam Search”, which details how to create “textual gradients” (critiques) to guide prompt improvement.
- Extensions: The core idea has been extended in subsequent research, such as “Momentum-Aided Gradient Descent Prompt Optimization”, which incorporates momentum to accelerate convergence.
- Classification: In surveys on automatic prompt engineering, ProTeGi is categorized as a pioneering gradient-based method for its innovative approach to error-driven refinement.
Configuration Parameters
Core Parameters
Powerful model for generating critiques and improved prompts. Recommended: gpt-4o, claude-3-opus.
Number of distinct critiques to generate for each prompt. More gradients = more diverse improvement directions.
Number of failed examples shown to teacher when generating each critique. Higher = more context but more expensive.
Number of new prompts to generate from each critique. Set to 2-3 for more exploration.
Number of top-performing prompts to keep each round. Larger beam = more diversity but slower.
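Because these settings multiply together, it can help to estimate the per-round workload before starting a run. The sketch below is a rough back-of-the-envelope calculation, assuming every prompt in the beam receives the full number of critiques and every critique yields the full number of new prompts. The class and field names (`ProTeGiBudget`, `num_gradients`, `errors_per_gradient`, `prompts_per_gradient`, `beam_width`) are illustrative assumptions, not the library's actual parameter names.

```python
from dataclasses import dataclass

@dataclass
class ProTeGiBudget:
    """Hypothetical container for the four search-size settings described above."""
    num_gradients: int         # critiques generated per prompt
    errors_per_gradient: int   # failed examples shown per critique
    prompts_per_gradient: int  # new prompts generated per critique
    beam_width: int            # prompts kept each round

    def candidates_per_round(self) -> int:
        # Upper bound on new prompts to evaluate once the beam is full.
        return self.beam_width * self.num_gradients * self.prompts_per_gradient

    def failed_examples_shown_per_round(self) -> int:
        # Upper bound on failed examples sent to the teacher across all critiques.
        return self.beam_width * self.num_gradients * self.errors_per_gradient

budget = ProTeGiBudget(
    num_gradients=4,
    errors_per_gradient=4,
    prompts_per_gradient=2,
    beam_width=3,
)
print(budget.candidates_per_round())             # 3 * 4 * 2 = 24
print(budget.failed_examples_shown_per_round())  # 3 * 4 * 4 = 48
```

Doubling any one of the first three settings doubles the corresponding per-round count, which is why larger values trade diversity against cost and speed.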