PromptWizard is a feedback-driven optimizer that improves prompts through a multi-stage process. It first explores creative variations of a prompt using different “thinking styles,” identifies the most promising candidates, critiques their failures, and then systematically refines them. It uses beam search to maintain and evolve the best-performing prompts over several iterations.

When to Use PromptWizard

✅ Best For

  • Creative domains and content generation
  • Improving prompt style and meta-instructions
  • Complex tasks requiring reasoning
  • When you need a balance of exploration and refinement

❌ Not Ideal For

  • Quick, simple optimizations
  • When teacher model quality is low
  • Projects with tight computational budgets
  • Tasks with very narrow, specific failure modes (ProTeGi may be better)

How It Works

PromptWizard follows a sophisticated, multi-stage loop for a set number of refine_iterations. Each iteration aims to evolve the best prompt from the previous round.

1. Mutate & Expand

The optimizer takes the current best prompt and generates numerous creative variations. It uses a powerful teacher model and a list of diverse “thinking styles” (e.g., “Think step-by-step,” “Analyze from different perspectives”) to create a large pool of candidate prompts.
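A minimal sketch of what this mutation step amounts to is shown below. The thinking styles, the mutate helper, and the teacher.generate() call are illustrative assumptions, not the actual fi.opt internals.

# Illustrative sketch of the mutation step; helper names and the
# teacher.generate() call are assumptions, not the real fi.opt API.
THINKING_STYLES = [
    "Think step-by-step before answering.",
    "Analyze the task from multiple perspectives.",
    "Break the problem into smaller sub-tasks.",
]

def mutate(teacher, base_prompt: str, mutate_rounds: int) -> list[str]:
    """Generate a pool of creative prompt variations via the teacher model."""
    candidates = []
    for _ in range(mutate_rounds):
        for style in THINKING_STYLES:
            instruction = (
                f"Using the thinking style '{style}', rewrite the prompt below so it "
                f"better fulfils the task. Keep any {{placeholders}} intact.\n\n"
                f"Prompt:\n{base_prompt}"
            )
            candidates.append(teacher.generate(instruction))  # assumed generator call
    return candidates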

2. Score & Select

All candidate prompts in the pool are evaluated against a subset of the dataset. Their performance is scored, and the top prompts are selected based on the beam_size. This ensures that only the most promising variations proceed.
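Conceptually, the selection step is a top-k cut over the scored candidates. The select_beam helper below is a sketch of that idea, not the library's implementation.

# Illustrative beam selection: keep the beam_size highest-scoring prompts.
def select_beam(candidates: list[str], scores: list[float], beam_size: int) -> list[str]:
    """Rank candidates by score and keep the top beam_size of them."""
    ranked = sorted(zip(candidates, scores), key=lambda pair: pair[1], reverse=True)
    return [prompt for prompt, _ in ranked[:beam_size]]

# Example: with beam_size=2, only the two strongest candidates survive.
# select_beam(["p1", "p2", "p3"], [0.61, 0.83, 0.72], beam_size=2) -> ["p2", "p3"]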

3. Critique Failures

For each of the top-performing prompts, the optimizer identifies specific examples from the dataset where it performed poorly (i.e., received a low score). The teacher model then generates a detailed critique, explaining the likely reasons for failure.
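A hedged sketch of the critique step, assuming a simple failed-example record with input, output, and score fields, and the same hypothetical teacher.generate() call as above:

# Sketch of the critique step; the prompt wording and example fields are assumptions.
def critique(teacher, prompt: str, failed_examples: list[dict]) -> str:
    """Ask the teacher model to explain why the prompt failed on specific examples."""
    examples_text = "\n\n".join(
        f"Input: {ex['input']}\nModel output: {ex['output']}\nScore: {ex['score']}"
        for ex in failed_examples
    )
    instruction = (
        "The prompt below scored poorly on the examples that follow. "
        "Explain the most likely reasons for these failures.\n\n"
        f"Prompt:\n{prompt}\n\nFailed examples:\n{examples_text}"
    )
    return teacher.generate(instruction)  # assumed generator call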

4. Refine with Feedback

Using the original prompt, the failed examples, and the generated critique, the teacher model rewrites the prompt to address the identified weaknesses. This creates a new set of refined prompts.
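The refinement call can be pictured as follows; again, the refine helper and its prompt wording are assumptions for illustration, not the library's own text.

# Sketch of the refinement step; wording is illustrative.
def refine(teacher, prompt: str, critique_text: str) -> str:
    """Rewrite the prompt so it addresses the weaknesses named in the critique."""
    instruction = (
        "Rewrite the prompt below so it fixes the weaknesses described in the "
        "critique. Keep any {placeholders} intact.\n\n"
        f"Prompt:\n{prompt}\n\nCritique:\n{critique_text}"
    )
    return teacher.generate(instruction)  # assumed generator call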

5. Final Selection & Iteration

The refined prompts are scored again. The single best-performing prompt becomes the input for the next full iteration of the mutate-critique-refine cycle. This process repeats, progressively enhancing the prompt’s quality.
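Read end to end, one refinement cycle corresponds roughly to the loop below. It reuses the hypothetical helpers from the sketches above; worst_examples and score_fn are likewise assumed stand-ins for the dataset lookup and the evaluator.

# Illustrative outer loop, reusing the mutate/select_beam/critique/refine sketches above.
def run_cycles(teacher, score_fn, seed_prompt: str,
               refine_iterations: int, mutate_rounds: int, beam_size: int) -> str:
    """Conceptual view of the full mutate-score-critique-refine cycle."""
    best_prompt = seed_prompt
    for _ in range(refine_iterations):
        pool = mutate(teacher, best_prompt, mutate_rounds)        # 1. mutate & expand
        scores = [score_fn(p) for p in pool]                      # 2. score ...
        beam = select_beam(pool, scores, beam_size)               #    ... & select
        refined = []
        for prompt in beam:
            failures = worst_examples(prompt)                     # 3. low-scoring cases (assumed helper)
            feedback = critique(teacher, prompt, failures)        #    critique failures
            refined.append(refine(teacher, prompt, feedback))     # 4. refine with feedback
        best_prompt = max(refined, key=score_fn)                  # 5. keep the single best prompt
    return best_prompt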

Basic Usage

from fi.opt.optimizers import PromptWizardOptimizer
from fi.opt.generators import LiteLLMGenerator
from fi.opt.datamappers import BasicDataMapper
from fi.opt.base.evaluator import Evaluator

# 1. Setup a powerful teacher model for the optimization process
teacher = LiteLLMGenerator(
    model="gpt-4o",
    prompt_template="{prompt}"
)

# 2. Setup the evaluator to score prompt performance
evaluator = Evaluator(
    eval_template="summary_quality",
    eval_model_name="turing_flash",
    fi_api_key="your_key",
    fi_secret_key="your_secret"
)

# 3. Setup the data mapper
data_mapper = BasicDataMapper(
    key_map={"input": "article", "output": "generated_output"}
)

# 4. Initialize the PromptWizard optimizer
optimizer = PromptWizardOptimizer(
    teacher_generator=teacher,
    mutate_rounds=3,        # Number of mutation rounds per iteration
    refine_iterations=2,    # Total number of refinement cycles
    beam_size=2             # Keep top 2 prompts for critique/refinement
)

# 5. Run the optimization
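# my_dataset is assumed to be a list of records, each containing an "article" field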
result = optimizer.optimize(
    evaluator=evaluator,
    data_mapper=data_mapper,
    dataset=my_dataset,
    initial_prompts=["Summarize the following article: {article}"],
    task_description="Generate a concise, one-sentence summary of the article.",
    eval_subset_size=20
)

print(f"Best prompt found: {result.best_generator.get_prompt_template()}")
print(f"Final score: {result.final_score:.4f}")

Configuration Parameters

teacher_generator (LiteLLMGenerator, required)
  A powerful language model used for the mutation, critique, and refinement steps. The quality of the optimization is highly dependent on this model’s capabilities. Recommended: gpt-4o, claude-3-opus.

mutate_rounds (int, default: 3)
  The number of times the teacher model is called to generate variations of the prompt during the mutation phase of each iteration. More rounds create a more diverse candidate pool.

refine_iterations (int, default: 2)
  The total number of full cycles (Mutate -> Score -> Critique -> Refine) the optimizer will run. Each iteration builds upon the best prompt from the previous one.

beam_size (int, default: 1)
  The number of top-performing prompts to select from the candidate pool after scoring. These selected prompts are the ones that will be critiqued and refined. A larger beam size allows for more parallel exploration but increases computational cost.

Comparison with ProTeGi

PromptWizard and ProTeGi both use a teacher model to refine prompts, but their core strategies are different.
Primary Strategy
  • PromptWizard: Exploration then Refinement. Starts by creatively exploring a wide range of prompt styles (mutate), then refines the most successful ideas.
  • ProTeGi: Error-Driven Correction. Focuses intensely on fixing what’s wrong by generating specific critiques (“textual gradients”) for failures and applying targeted fixes.

Initial Step
  • PromptWizard: Generates many diverse variations using “thinking styles” to see what might work.
  • ProTeGi: Identifies specific examples where the current prompt fails.

Refinement Focus
  • PromptWizard: Holistic improvement based on a high-level critique of the prompt’s general weaknesses.
  • ProTeGi: Micro-level improvement based on multiple, specific critiques for a set of failures.

Best For
  • PromptWizard: Finding novel phrasings, improving prompt style, and creative tasks where the “best” structure is unknown.
  • ProTeGi: Systematically debugging a prompt with known, repeatable failure modes (e.g., always fails on JSON formatting).

Analogy
  • PromptWizard: A brainstorming session followed by a focused workshop.
  • ProTeGi: A debugging session with a senior engineer.
Choose PromptWizard when you want to discover better ways to phrase your prompt. Choose ProTeGi when you know your prompt is close but has specific, identifiable flaws that need fixing.

Underlying Research

PromptWizard is based on the concept of self-evolving prompts, where an LLM iteratively improves its own instructions.

Next Steps
