When to Use PromptWizard
✅ Best For
- Creative domains and content generation
- Improving prompt style and meta-instructions
- Complex tasks requiring reasoning
- When you need a balance of exploration and refinement
❌ Not Ideal For
- Quick, simple optimizations
- When teacher model quality is low
- Projects with tight computational budgets
- Tasks with very narrow, specific failure modes (ProTeGi may be better)
How It Works
PromptWizard follows a sophisticated, multi-stage loop for a set number of `refine_iterations`. Each iteration evolves the best prompt from the previous round.
1. Mutate & Expand
The optimizer takes the current best prompt and generates numerous creative variations. It uses a powerful teacher model and a list of diverse “thinking styles” (e.g., “Think step-by-step,” “Analyze from different perspectives”) to create a large pool of candidate prompts.
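In code, this step amounts to asking the teacher model for rewrites under each thinking style. The sketch below is illustrative only: `call_teacher` is a hypothetical stand-in for whatever LLM client you use, and the style list is abbreviated.

```python
# Illustrative sketch, not the library's code. `call_teacher` stands in for a
# single call to the teacher model via any LLM client you already use.
def call_teacher(instruction: str) -> str:
    """Hypothetical helper: send `instruction` to the teacher model, return its text."""
    raise NotImplementedError("wire this to your LLM client")

THINKING_STYLES = [
    "Think step-by-step.",
    "Analyze the problem from different perspectives.",
    "Simplify the problem before solving it.",
]

def mutate(best_prompt: str, mutate_rounds: int) -> list[str]:
    """Step 1: build a candidate pool by rewriting the current best prompt
    under every thinking style, repeated for several rounds."""
    candidates = []
    for _ in range(mutate_rounds):
        for style in THINKING_STYLES:
            candidates.append(call_teacher(
                f"Rewrite this task prompt. Thinking style to apply: {style}\n\n{best_prompt}"
            ))
    return candidates
```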
2. Score & Select
All candidate prompts in the pool are evaluated against a subset of the dataset. Their performance is scored, and the top prompts are selected based on the `beam_size`. This ensures that only the most promising variations proceed.
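A minimal sketch of the scoring and beam selection, continuing the example above; `score_prompt` is another hypothetical helper that returns a candidate's mean score on the dataset subset.

```python
def score_prompt(prompt: str, dataset: list[dict]) -> float:
    """Hypothetical helper: run `prompt` over the dataset subset with the task
    model and return its mean score (0.0 - 1.0)."""
    raise NotImplementedError("wire this to your evaluation harness")

def select_beam(candidates: list[str], dataset: list[dict], beam_size: int) -> list[str]:
    """Step 2: keep only the top `beam_size` candidates by score."""
    ranked = sorted(candidates, key=lambda p: score_prompt(p, dataset), reverse=True)
    return ranked[:beam_size]
```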
3. Critique Failures
For each of the top-performing prompts, the optimizer identifies specific examples from the dataset where it performed poorly (i.e., received a low score). The teacher model then generates a detailed critique, explaining the likely reasons for failure.
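A sketch of the critique step, reusing `call_teacher` from the first sketch; `run_example` (returning a score and the model's output for one example) and the `input`/`answer` keys are assumptions about the dataset format.

```python
def run_example(prompt: str, example: dict) -> tuple[float, str]:
    """Hypothetical helper: evaluate one dataset example, return (score, model output)."""
    raise NotImplementedError

def critique(prompt: str, dataset: list[dict], threshold: float = 0.5) -> str:
    """Step 3: collect low-scoring examples and ask the teacher model to explain
    the likely reasons the prompt failed on them."""
    failures = []
    for example in dataset:
        score, output = run_example(prompt, example)
        if score < threshold:
            failures.append(
                f"Input: {example['input']}\nExpected: {example['answer']}\nGot: {output}"
            )
    return call_teacher(
        "The prompt below gave wrong answers on these examples. "
        "Explain the most likely reasons for the failures.\n\n"
        f"Prompt:\n{prompt}\n\nFailed examples:\n" + "\n---\n".join(failures)
    )
```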
4. Refine with Feedback
Using the original prompt, the failed examples, and the generated critique, the teacher model rewrites the prompt to address the identified weaknesses. This creates a new set of refined prompts.
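The refinement step then feeds the prompt and its critique back to the teacher model; again a sketch built on the hypothetical `call_teacher` helper above.

```python
def refine(prompt: str, critique_text: str) -> str:
    """Step 4: rewrite the prompt so it addresses the weaknesses in the critique."""
    return call_teacher(
        "Rewrite the task prompt so it fixes the weaknesses described in the "
        "critique, while keeping the original intent.\n\n"
        f"Prompt:\n{prompt}\n\nCritique:\n{critique_text}"
    )
```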
5. Final Selection & Iteration
The refined prompts are scored again. The single best-performing prompt becomes the input for the next full iteration of the mutate-critique-refine cycle. This process repeats, progressively enhancing the prompt’s quality.
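Putting the pieces together, the outer loop looks roughly like the sketch below. This is an illustration of the cycle described above, not the library's actual implementation.

```python
def optimize(seed_prompt: str, dataset: list[dict], refine_iterations: int = 3,
             mutate_rounds: int = 3, beam_size: int = 4) -> str:
    """One possible shape of the full mutate-score-critique-refine loop."""
    best = seed_prompt
    for _ in range(refine_iterations):
        candidates = mutate(best, mutate_rounds)                    # 1. mutate & expand
        beam = select_beam(candidates, dataset, beam_size)          # 2. score & select
        refined = [refine(p, critique(p, dataset)) for p in beam]   # 3-4. critique & refine
        best = select_beam(refined, dataset, beam_size=1)[0]        # 5. keep the single best
    return best
```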
Basic Usage
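Since the exact library call signature is not shown here, the example below simply drives the illustrative `optimize` sketch from the previous section; the dataset format (`input`/`answer` keys) and the seed prompt are assumptions for illustration.

```python
# Illustrative only: exercises the `optimize` sketch defined above.
seed_prompt = "Answer the customer's question politely and concisely."
train_examples = [
    {"input": "Where is my order #1234?", "answer": "It ships tomorrow."},
    {"input": "Can I return a used item?", "answer": "Yes, within 30 days."},
]

best_prompt = optimize(
    seed_prompt,
    train_examples,
    refine_iterations=3,   # full mutate -> score -> critique -> refine cycles
    mutate_rounds=3,       # variation rounds per iteration
    beam_size=2,           # top prompts kept after each scoring pass
)
print(best_prompt)
```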
Configuration Parameters
- Teacher model: A powerful language model used for the mutation, critique, and refinement steps. The quality of the optimization is highly dependent on this model's capabilities. Recommended: `gpt-4o`, `claude-3-opus`.
- Mutation rounds: The number of times the teacher model is called to generate variations of the prompt during the mutation phase of each iteration. More rounds create a more diverse candidate pool.
- `refine_iterations`: The total number of full cycles (Mutate -> Score -> Critique -> Refine) the optimizer will run. Each iteration builds upon the best prompt from the previous one.
- `beam_size`: The number of top-performing prompts to select from the candidate pool after scoring. These selected prompts are the ones that will be critiqued and refined. A larger beam size allows for more parallel exploration but increases computational cost.
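For a rough sense of cost, teacher-model usage grows multiplicatively with these settings. The arithmetic below follows the loop sketched earlier and assumes a list of ten thinking styles (an assumption); scoring calls to the task model come on top of this.

```python
# Back-of-the-envelope estimate of teacher-model calls, per the loop above.
refine_iterations = 3
mutate_rounds = 3
num_thinking_styles = 10   # assumed length of the thinking-style list
beam_size = 4

teacher_calls_per_iteration = (
    mutate_rounds * num_thinking_styles  # mutation: one call per style per round
    + beam_size                          # critique: one call per beam prompt
    + beam_size                          # refinement: one call per beam prompt
)
print(teacher_calls_per_iteration * refine_iterations)  # -> 114 teacher calls in total
```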
Comparison with ProTeGi
PromptWizard and ProTeGi both use a teacher model to refine prompts, but their core strategies are different.

| Aspect | PromptWizard | ProTeGi |
|---|---|---|
| Primary Strategy | Exploration then Refinement: Starts by creatively exploring a wide range of prompt styles (mutate), then refines the most successful ideas. | Error-Driven Correction: Focuses intensely on fixing what’s wrong. It generates specific critiques (“textual gradients”) for failures and applies targeted fixes. |
| Initial Step | Generates many diverse variations using “thinking styles” to see what might work. | Identifies specific examples where the current prompt fails. |
| Refinement Focus | Holistic improvement based on a high-level critique of the prompt’s general weaknesses. | Micro-level improvement based on multiple, specific critiques for a set of failures. |
| Best For | Finding novel phrasings, improving prompt style, and creative tasks where the “best” structure is unknown. | Systematically debugging a prompt with known, repeatable failure modes (e.g., always fails on JSON formatting). |
| Analogy | A brainstorming session followed by a focused workshop. | A debugging session with a senior engineer. |
Choose PromptWizard when you want to discover better ways to phrase your prompt. Choose ProTeGi when you know your prompt is close but has specific, identifiable flaws that need fixing.
Underlying Research
PromptWizard is based on the concept of self-evolving prompts, where an LLM iteratively improves its own instructions.
- Core Paper: The framework is introduced in “PromptWizard: Task-Aware Prompt Optimization Framework” from Microsoft Research.
- Self-Evolution: The underlying mechanism is detailed in “Optimizing Prompts via Task-Aware, Feedback-Driven Self-Evolution”, which discusses the joint optimization of instructions and examples. The Microsoft Research Blog highlights this as a key direction for the future of prompt optimization.