When to Use PromptWizard
✅ Best For
- Creative domains and content generation
- Improving prompt style and meta-instructions
- Complex tasks requiring reasoning
- When you need a balance of exploration and refinement
❌ Not Ideal For
- Quick, simple optimizations
- When teacher model quality is low
- Projects with tight computational budgets
- Tasks with very narrow, specific failure modes (ProTeGi may be better)
How It Works
PromptWizard follows a sophisticated, multi-stage loop for a set number of `refine_iterations`. Each iteration evolves the best prompt from the previous round.
1. Mutate & Expand
The optimizer takes the current best prompt and generates numerous creative variations. It uses a powerful teacher model and a list of diverse “thinking styles” (e.g., “Think step-by-step,” “Analyze from different perspectives”) to create a large pool of candidate prompts.
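In code, this step amounts to asking the teacher model for rewrites under each thinking style. The sketch below is illustrative only: `call_teacher` is a hypothetical stand-in for whatever LLM client you use, and the style list is abbreviated.

```python
# Illustrative sketch, not the library's code. `call_teacher` stands in for a
# single call to the teacher model via any LLM client you already use.
def call_teacher(instruction: str) -> str:
    """Hypothetical helper: send `instruction` to the teacher model, return its text."""
    raise NotImplementedError("wire this to your LLM client")

THINKING_STYLES = [
    "Think step-by-step.",
    "Analyze the problem from different perspectives.",
    "Simplify the problem before solving it.",
]

def mutate(best_prompt: str, mutate_rounds: int) -> list[str]:
    """Step 1: build a candidate pool by rewriting the current best prompt
    under every thinking style, repeated for several rounds."""
    candidates = []
    for _ in range(mutate_rounds):
        for style in THINKING_STYLES:
            candidates.append(call_teacher(
                f"Rewrite this task prompt. Thinking style to apply: {style}\n\n{best_prompt}"
            ))
    return candidates
```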
2. Score & Select
All candidate prompts in the pool are evaluated against a subset of the dataset. Their performance is scored, and the top prompts are selected based on the `beam_size`. This ensures that only the most promising variations proceed.
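A minimal sketch of the scoring and beam selection, continuing the example above; `score_prompt` is another hypothetical helper that returns a candidate's mean score on the dataset subset.

```python
def score_prompt(prompt: str, dataset: list[dict]) -> float:
    """Hypothetical helper: run `prompt` over the dataset subset with the task
    model and return its mean score (0.0 - 1.0)."""
    raise NotImplementedError("wire this to your evaluation harness")

def select_beam(candidates: list[str], dataset: list[dict], beam_size: int) -> list[str]:
    """Step 2: keep only the top `beam_size` candidates by score."""
    ranked = sorted(candidates, key=lambda p: score_prompt(p, dataset), reverse=True)
    return ranked[:beam_size]
```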
3. Critique Failures
For each of the top-performing prompts, the optimizer identifies specific examples from the dataset where it performed poorly (i.e., received a low score). The teacher model then generates a detailed critique, explaining the likely reasons for failure.
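A sketch of the critique step, reusing `call_teacher` from the first sketch; `run_example` (returning a score and the model's output for one example) and the `input`/`answer` keys are assumptions about the dataset format.

```python
def run_example(prompt: str, example: dict) -> tuple[float, str]:
    """Hypothetical helper: evaluate one dataset example, return (score, model output)."""
    raise NotImplementedError

def critique(prompt: str, dataset: list[dict], threshold: float = 0.5) -> str:
    """Step 3: collect low-scoring examples and ask the teacher model to explain
    the likely reasons the prompt failed on them."""
    failures = []
    for example in dataset:
        score, output = run_example(prompt, example)
        if score < threshold:
            failures.append(
                f"Input: {example['input']}\nExpected: {example['answer']}\nGot: {output}"
            )
    return call_teacher(
        "The prompt below gave wrong answers on these examples. "
        "Explain the most likely reasons for the failures.\n\n"
        f"Prompt:\n{prompt}\n\nFailed examples:\n" + "\n---\n".join(failures)
    )
```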
4. Refine with Feedback
Using the original prompt, the failed examples, and the generated critique, the teacher model rewrites the prompt to address the identified weaknesses. This creates a new set of refined prompts.
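The refinement step then feeds the prompt and its critique back to the teacher model; again a sketch built on the hypothetical `call_teacher` helper above.

```python
def refine(prompt: str, critique_text: str) -> str:
    """Step 4: rewrite the prompt so it addresses the weaknesses in the critique."""
    return call_teacher(
        "Rewrite the task prompt so it fixes the weaknesses described in the "
        "critique, while keeping the original intent.\n\n"
        f"Prompt:\n{prompt}\n\nCritique:\n{critique_text}"
    )
```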
5. Final Selection & Iteration
The refined prompts are scored again. The single best-performing prompt becomes the input for the next full iteration of the mutate-critique-refine cycle. This process repeats, progressively enhancing the prompt’s quality.
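Putting the pieces together, the outer loop looks roughly like the sketch below. This is an illustration of the cycle described above, not the library's actual implementation.

```python
def optimize(seed_prompt: str, dataset: list[dict], refine_iterations: int = 3,
             mutate_rounds: int = 3, beam_size: int = 4) -> str:
    """One possible shape of the full mutate-score-critique-refine loop."""
    best = seed_prompt
    for _ in range(refine_iterations):
        candidates = mutate(best, mutate_rounds)                    # 1. mutate & expand
        beam = select_beam(candidates, dataset, beam_size)          # 2. score & select
        refined = [refine(p, critique(p, dataset)) for p in beam]   # 3-4. critique & refine
        best = select_beam(refined, dataset, beam_size=1)[0]        # 5. keep the single best
    return best
```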
Basic Usage
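Since the exact library call signature is not shown here, the example below simply drives the illustrative `optimize` sketch from the previous section; the dataset format (`input`/`answer` keys) and the seed prompt are assumptions for illustration.

```python
# Illustrative only: exercises the `optimize` sketch defined above.
seed_prompt = "Answer the customer's question politely and concisely."
train_examples = [
    {"input": "Where is my order #1234?", "answer": "It ships tomorrow."},
    {"input": "Can I return a used item?", "answer": "Yes, within 30 days."},
]

best_prompt = optimize(
    seed_prompt,
    train_examples,
    refine_iterations=3,   # full mutate -> score -> critique -> refine cycles
    mutate_rounds=3,       # variation rounds per iteration
    beam_size=2,           # top prompts kept after each scoring pass
)
print(best_prompt)
```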
Configuration Parameters
- Teacher model: A powerful language model used for the mutation, critique, and refinement steps. The quality of the optimization is highly dependent on this model's capabilities. Recommended: `gpt-4o`, `claude-3-opus`.
- Mutation rounds: The number of times the teacher model is called to generate variations of the prompt during the mutation phase of each iteration. More rounds create a more diverse candidate pool.
- `refine_iterations`: The total number of full cycles (Mutate -> Score -> Critique -> Refine) the optimizer will run. Each iteration builds upon the best prompt from the previous one.
- `beam_size`: The number of top-performing prompts to select from the candidate pool after scoring. These selected prompts are the ones that will be critiqued and refined. A larger beam size allows for more parallel exploration but increases computational cost.
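For a rough sense of cost, teacher-model usage grows multiplicatively with these settings. The arithmetic below follows the loop sketched earlier and assumes a list of ten thinking styles (an assumption); scoring calls to the task model come on top of this.

```python
# Back-of-the-envelope estimate of teacher-model calls, per the loop above.
refine_iterations = 3
mutate_rounds = 3
num_thinking_styles = 10   # assumed length of the thinking-style list
beam_size = 4

teacher_calls_per_iteration = (
    mutate_rounds * num_thinking_styles  # mutation: one call per style per round
    + beam_size                          # critique: one call per beam prompt
    + beam_size                          # refinement: one call per beam prompt
)
print(teacher_calls_per_iteration * refine_iterations)  # -> 114 teacher calls in total
```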
Comparison with ProTeGi
PromptWizard and ProTeGi both use a teacher model to refine prompts, but their core strategies are different.

| Aspect | PromptWizard | ProTeGi |
|---|---|---|
| Primary Strategy | Exploration then Refinement: Starts by creatively exploring a wide range of prompt styles (mutate), then refines the most successful ideas. | Error-Driven Correction: Focuses intensely on fixing what’s wrong. It generates specific critiques (“textual gradients”) for failures and applies targeted fixes. |
| Initial Step | Generates many diverse variations using “thinking styles” to see what might work. | Identifies specific examples where the current prompt fails. |
| Refinement Focus | Holistic improvement based on a high-level critique of the prompt’s general weaknesses. | Micro-level improvement based on multiple, specific critiques for a set of failures. |
| Best For | Finding novel phrasings, improving prompt style, and creative tasks where the “best” structure is unknown. | Systematically debugging a prompt with known, repeatable failure modes (e.g., always fails on JSON formatting). |
| Analogy | A brainstorming session followed by a focused workshop. | A debugging session with a senior engineer. |
Choose PromptWizard when you want to discover better ways to phrase your prompt. Choose ProTeGi when you know your prompt is close but has specific, identifiable flaws that need fixing.
Underlying Research
PromptWizard is based on the concept of self-evolving prompts, where an LLM iteratively improves its own instructions.
- Core Paper: The framework is introduced in “PromptWizard: Task-Aware Prompt Optimization Framework” from Microsoft Research.
- Self-Evolution: The underlying mechanism is detailed in “Optimizing Prompts via Task-Aware, Feedback-Driven Self-Evolution”, which discusses the joint optimization of instructions and examples. The Microsoft Research Blog highlights this as a key direction for the future of prompt optimization.