PromptWizard Optimizer

Learn about PromptWizard, a multi-stage feedback-driven optimizer that improves prompts through a cycle of mutation, critique, and refinement.

PromptWizard is a feedback-driven optimizer that improves prompts through a multi-stage process. It first explores creative variations of a prompt using different “thinking styles,” identifies the most promising candidates, critiques their failures, and then systematically refines them. It uses beam search to maintain and evolve the best-performing prompts over several iterations.


When to Use PromptWizard

✅ Best For

  • Creative domains and content generation
  • Improving prompt style and meta-instructions
  • Complex tasks requiring reasoning
  • When you need a balance of exploration and refinement

❌ Not Ideal For

  • Quick, simple optimizations
  • When teacher model quality is low
  • Projects with tight computational budgets
  • Tasks with very narrow, specific failure modes (ProTeGi may be better)

How It Works

PromptWizard runs a multi-stage loop for a set number of refine_iterations. Each iteration evolves the best prompt from the previous round.

1. Mutate & Expand

The optimizer takes the current best prompt and generates numerous creative variations. It uses a powerful teacher model and a list of diverse “thinking styles” (e.g., “Think step-by-step,” “Analyze from different perspectives”) to create a large pool of candidate prompts.
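
To make the mechanics concrete, here is a minimal Python sketch of the mutation step. It assumes a teacher callable that takes a prompt string and returns the model's text response; the thinking styles and instruction wording below are illustrative, not the optimizer's actual templates.

THINKING_STYLES = [
    "Think step-by-step.",
    "Analyze the problem from different perspectives.",
    "Prefer the simplest explanation that fits.",
]

def mutate(teacher, base_prompt, task_description, mutate_rounds=3):
    """Generate one candidate prompt per thinking style per round."""
    candidates = []
    for _ in range(mutate_rounds):
        for style in THINKING_STYLES:
            instruction = (
                f"Task: {task_description}\n"
                f"Thinking style: {style}\n"
                f"Write a new variation of this prompt:\n{base_prompt}"
            )
            candidates.append(teacher(instruction))
    return candidates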

2. Score & Select

All candidate prompts in the pool are evaluated against a subset of the dataset. Their performance is scored, and the top prompts are selected based on the beam_size. This ensures that only the most promising variations proceed.
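
A sketch of the selection step, assuming a score(prompt, examples) function that returns a float (for instance, a thin wrapper around the Evaluator shown in Basic Usage below) and a dataset that is a plain list of examples:

def select_beam(candidates, score, dataset, beam_size=2, eval_subset_size=20):
    """Score every candidate on a dataset subset and keep the top beam_size."""
    subset = dataset[:eval_subset_size]  # assumes dataset is a list of examples
    scored = [(prompt, score(prompt, subset)) for prompt in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:beam_size]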

3. Critique Failures

For each of the top-performing prompts, the optimizer identifies specific examples from the dataset where it performed poorly (i.e., received a low score). The teacher model then generates a detailed critique, explaining the likely reasons for failure.
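
A sketch of the critique step. The wording of the critique request is an assumption, and failures stands for the low-scoring (input, output, score) records collected during evaluation:

def critique(teacher, prompt, failures):
    """Ask the teacher to explain why the prompt's worst outputs failed."""
    examples = "\n\n".join(
        f"Input: {f['input']}\nOutput: {f['output']}\nScore: {f['score']:.2f}"
        for f in failures
    )
    return teacher(
        "The prompt below produced these low-scoring outputs.\n"
        f"Prompt: {prompt}\n\n{examples}\n\n"
        "Explain the most likely reasons for these failures."
    )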

4. Refine with Feedback

Using the original prompt, the failed examples, and the generated critique, the teacher model rewrites the prompt to address the identified weaknesses. This creates a new set of refined prompts.
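
A sketch of the refinement step, again with assumed instruction wording:

def refine(teacher, prompt, critique_text):
    """Rewrite the prompt so it addresses the weaknesses in the critique."""
    return teacher(
        "Improve the prompt below so it avoids the weaknesses described "
        "in the critique. Return only the rewritten prompt.\n"
        f"Prompt: {prompt}\nCritique: {critique_text}"
    )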

5. Final Selection & Iteration

The refined prompts are scored again. The single best-performing prompt becomes the input for the next full iteration of the mutate-critique-refine cycle. This process repeats, progressively enhancing the prompt’s quality.
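
Putting the steps together, here is one possible wiring of the full outer loop using the hypothetical helpers from the sketches above (worst_examples, which collects a prompt's lowest-scoring dataset examples, is also hypothetical). The real optimizer adds batching, caching, and error handling that this sketch omits.

def optimize_prompt(teacher, score, dataset, initial_prompt, task_description,
                    refine_iterations=2, mutate_rounds=3, beam_size=2):
    """One possible wiring of the mutate -> score -> critique -> refine loop."""
    best = initial_prompt
    for _ in range(refine_iterations):
        pool = mutate(teacher, best, task_description, mutate_rounds)
        beam = select_beam(pool, score, dataset, beam_size)
        refined = [best]  # keep the incumbent in case refinement regresses
        for prompt, _ in beam:
            failures = worst_examples(prompt, score, dataset)  # hypothetical helper
            feedback = critique(teacher, prompt, failures)
            refined.append(refine(teacher, prompt, feedback))
        # re-score and carry the single best prompt into the next cycle
        best = max(refined, key=lambda p: score(p, dataset))
    return best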


Basic Usage

from fi.opt.optimizers import PromptWizardOptimizer
from fi.opt.generators import LiteLLMGenerator
from fi.opt.datamappers import BasicDataMapper
from fi.opt.base.evaluator import Evaluator

# 1. Setup a powerful teacher model for the optimization process
teacher = LiteLLMGenerator(
    model="gpt-4o",
    prompt_template="{prompt}"
)

# 2. Setup the evaluator to score prompt performance
evaluator = Evaluator(
    eval_template="summary_quality",
    eval_model_name="turing_flash",
    fi_api_key="your_key",
    fi_secret_key="your_secret"
)

# 3. Setup the data mapper
data_mapper = BasicDataMapper(
    key_map={"input": "article", "output": "generated_output"}
)

# 4. Initialize the PromptWizard optimizer
optimizer = PromptWizardOptimizer(
    teacher_generator=teacher,
    mutate_rounds=3,        # Number of mutation rounds per iteration
    refine_iterations=2,    # Total number of refinement cycles
    beam_size=2             # Keep top 2 prompts for critique/refinement
)

# 5. Run the optimization
result = optimizer.optimize(
    evaluator=evaluator,
    data_mapper=data_mapper,
    dataset=my_dataset,
    initial_prompts=["Summarize the following article: {article}"],
    task_description="Generate a concise, one-sentence summary of the article.",
    eval_subset_size=20
)

print(f"Best prompt found: {result.best_generator.get_prompt_template()}")
print(f"Final score: {result.final_score:.4f}")

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| teacher_generator | LiteLLMGenerator | required | Model used for mutation, critique, and refinement (e.g. gpt-4o) |
| mutate_rounds | int | 3 | Mutation calls per iteration; more rounds yield a more diverse candidate pool |
| refine_iterations | int | 2 | Number of full Mutate → Score → Critique → Refine cycles |
| beam_size | int | 1 | Number of top prompts kept for critique and refinement |

Tips

  • Use a strong teacher model; start with mutate_rounds=3 and refine_iterations=2.
  • If optimization is slow, reduce mutate_rounds, refine_iterations, or eval_subset_size; see the budget-friendly sketch below.
  • If you see little improvement, write a more specific task_description, or switch to ProTeGi when the failure patterns are clear and narrow.
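
For example, a lower-budget run reusing the objects from Basic Usage might look like this (the specific values are only a starting point, not tuned recommendations):

optimizer = PromptWizardOptimizer(
    teacher_generator=teacher,
    mutate_rounds=2,        # fewer mutation rounds = fewer teacher calls
    refine_iterations=1,    # a single cycle for a quick first pass
    beam_size=1
)

result = optimizer.optimize(
    evaluator=evaluator,
    data_mapper=data_mapper,
    dataset=my_dataset,
    initial_prompts=["Summarize the following article: {article}"],
    task_description="Generate a concise, one-sentence summary of the article.",
    eval_subset_size=10     # smaller subset = faster but noisier scoring
)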

PromptWizard vs. ProTeGi

PromptWizard explores first (mutating with “thinking styles”) and then refines, which makes it the stronger choice for discovering novel phrasings and improving style. ProTeGi is error-driven: it targets specific failures, so it works best when you have clearly identifiable flaws to fix.


Underlying Research

PromptWizard is based on the concept of self-evolving prompts, where an LLM iteratively improves its own instructions.

