PromptWizard Optimizer

Learn about PromptWizard, a multi-stage feedback-driven optimizer that improves prompts through a cycle of mutation, critique, and refinement.

PromptWizard is a feedback-driven optimizer that improves prompts through a multi-stage process. It first explores creative variations of a prompt using different “thinking styles,” identifies the most promising candidates, critiques their failures, and then systematically refines them. It uses beam search to maintain and evolve the best-performing prompts over several iterations.


When to Use PromptWizard

✅ Best For

  • Creative domains and content generation
  • Improving prompt style and meta-instructions
  • Complex tasks requiring reasoning
  • When you need a balance of exploration and refinement

❌ Not Ideal For

  • Quick, simple optimizations
  • When teacher model quality is low
  • Projects with tight computational budgets
  • Tasks with very narrow, specific failure modes (ProTeGi may be better)

How It Works

PromptWizard runs a multi-stage loop for a set number of refine_iterations. Each iteration evolves the best prompt from the previous round.

1. Mutate & Expand

The optimizer takes the current best prompt and generates numerous creative variations. It uses a powerful teacher model and a list of diverse “thinking styles” (e.g., “Think step-by-step,” “Analyze from different perspectives”) to create a large pool of candidate prompts.
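
To make the mechanics concrete, here is a minimal Python sketch of the mutation step. It assumes a teacher callable that takes a prompt string and returns the model's text response; the thinking styles and instruction wording below are illustrative, not the optimizer's actual templates.

THINKING_STYLES = [
    "Think step-by-step.",
    "Analyze the problem from different perspectives.",
    "Prefer the simplest explanation that fits.",
]

def mutate(teacher, base_prompt, task_description, mutate_rounds=3):
    """Generate one candidate prompt per thinking style per round."""
    candidates = []
    for _ in range(mutate_rounds):
        for style in THINKING_STYLES:
            instruction = (
                f"Task: {task_description}\n"
                f"Thinking style: {style}\n"
                f"Write a new variation of this prompt:\n{base_prompt}"
            )
            candidates.append(teacher(instruction))
    return candidates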

2. Score & Select

All candidate prompts in the pool are evaluated against a subset of the dataset. Their performance is scored, and the top prompts are selected based on the beam_size. This ensures that only the most promising variations proceed.
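
A sketch of the selection step, assuming a score(prompt, examples) function that returns a float (for instance, a thin wrapper around the Evaluator shown in Basic Usage below) and a dataset that is a plain list of examples:

def select_beam(candidates, score, dataset, beam_size=2, eval_subset_size=20):
    """Score every candidate on a dataset subset and keep the top beam_size."""
    subset = dataset[:eval_subset_size]  # assumes dataset is a list of examples
    scored = [(prompt, score(prompt, subset)) for prompt in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:beam_size]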

3. Critique Failures

For each of the top-performing prompts, the optimizer identifies specific examples from the dataset where it performed poorly (i.e., received a low score). The teacher model then generates a detailed critique, explaining the likely reasons for failure.
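
A sketch of the critique step. The wording of the critique request is an assumption, and failures stands for the low-scoring (input, output, score) records collected during evaluation:

def critique(teacher, prompt, failures):
    """Ask the teacher to explain why the prompt's worst outputs failed."""
    examples = "\n\n".join(
        f"Input: {f['input']}\nOutput: {f['output']}\nScore: {f['score']:.2f}"
        for f in failures
    )
    return teacher(
        "The prompt below produced these low-scoring outputs.\n"
        f"Prompt: {prompt}\n\n{examples}\n\n"
        "Explain the most likely reasons for these failures."
    )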

4. Refine with Feedback

Using the original prompt, the failed examples, and the generated critique, the teacher model rewrites the prompt to address the identified weaknesses. This creates a new set of refined prompts.
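
A sketch of the refinement step, again with assumed instruction wording:

def refine(teacher, prompt, critique_text):
    """Rewrite the prompt so it addresses the weaknesses in the critique."""
    return teacher(
        "Improve the prompt below so it avoids the weaknesses described "
        "in the critique. Return only the rewritten prompt.\n"
        f"Prompt: {prompt}\nCritique: {critique_text}"
    )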

5. Final Selection & Iteration

The refined prompts are scored again. The single best-performing prompt becomes the input for the next full iteration of the mutate-critique-refine cycle. This process repeats, progressively enhancing the prompt’s quality.
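
Putting the steps together, here is one possible wiring of the full outer loop using the hypothetical helpers from the sketches above (worst_examples, which collects a prompt's lowest-scoring dataset examples, is also hypothetical). The real optimizer adds batching, caching, and error handling that this sketch omits.

def optimize_prompt(teacher, score, dataset, initial_prompt, task_description,
                    refine_iterations=2, mutate_rounds=3, beam_size=2):
    """One possible wiring of the mutate -> score -> critique -> refine loop."""
    best = initial_prompt
    for _ in range(refine_iterations):
        pool = mutate(teacher, best, task_description, mutate_rounds)
        beam = select_beam(pool, score, dataset, beam_size)
        refined = [best]  # keep the incumbent in case refinement regresses
        for prompt, _ in beam:
            failures = worst_examples(prompt, score, dataset)  # hypothetical helper
            feedback = critique(teacher, prompt, failures)
            refined.append(refine(teacher, prompt, feedback))
        # re-score and carry the single best prompt into the next cycle
        best = max(refined, key=lambda p: score(p, dataset))
    return best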


Basic Usage

from fi.opt.optimizers import PromptWizardOptimizer
from fi.opt.generators import LiteLLMGenerator
from fi.opt.datamappers import BasicDataMapper
from fi.opt.base.evaluator import Evaluator

# 1. Setup a powerful teacher model for the optimization process
teacher = LiteLLMGenerator(
    model="gpt-4o",
    prompt_template="{prompt}"
)

# 2. Setup the evaluator to score prompt performance
evaluator = Evaluator(
    eval_template="summary_quality",
    eval_model_name="turing_flash",
    fi_api_key="your_key",
    fi_secret_key="your_secret"
)

# 3. Setup the data mapper
data_mapper = BasicDataMapper(
    key_map={"input": "article", "output": "generated_output"}
)

# 4. Initialize the PromptWizard optimizer
optimizer = PromptWizardOptimizer(
    teacher_generator=teacher,
    mutate_rounds=3,        # Number of mutation rounds per iteration
    refine_iterations=2,    # Total number of refinement cycles
    beam_size=2             # Keep top 2 prompts for critique/refinement
)

# 5. Run the optimization
result = optimizer.optimize(
    evaluator=evaluator,
    data_mapper=data_mapper,
    dataset=my_dataset,
    initial_prompts=["Summarize the following article: {article}"],
    task_description="Generate a concise, one-sentence summary of the article.",
    eval_subset_size=20
)

print(f"Best prompt found: {result.best_generator.get_prompt_template()}")
print(f"Final score: {result.final_score:.4f}")

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| teacher_generator | LiteLLMGenerator | required | Model used for mutation, critique, and refinement (e.g. gpt-4o) |
| mutate_rounds | int | 3 | Mutation calls per iteration; more rounds yield a more diverse candidate pool |
| refine_iterations | int | 2 | Number of full Mutate → Score → Critique → Refine cycles |
| beam_size | int | 1 | Number of top prompts kept for critique and refinement |

Tips

  • Use a strong teacher model; start with mutate_rounds=3 and refine_iterations=2.
  • If optimization is slow, reduce mutate_rounds, refine_iterations, or eval_subset_size; see the budget-friendly sketch below.
  • If you see little improvement, write a more specific task_description, or switch to ProTeGi when the failure patterns are clear and narrow.
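
For example, a lower-budget run reusing the objects from Basic Usage might look like this (the specific values are only a starting point, not tuned recommendations):

optimizer = PromptWizardOptimizer(
    teacher_generator=teacher,
    mutate_rounds=2,        # fewer mutation rounds = fewer teacher calls
    refine_iterations=1,    # a single cycle for a quick first pass
    beam_size=1
)

result = optimizer.optimize(
    evaluator=evaluator,
    data_mapper=data_mapper,
    dataset=my_dataset,
    initial_prompts=["Summarize the following article: {article}"],
    task_description="Generate a concise, one-sentence summary of the article.",
    eval_subset_size=10     # smaller subset = faster but noisier scoring
)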

PromptWizard vs. ProTeGi

PromptWizard explores first (mutating with “thinking styles”) and then refines, which makes it the stronger choice for discovering novel phrasings and improving style. ProTeGi is error-driven: it targets specific failures, so it works best when you have clearly identifiable flaws to fix.


Underlying Research

PromptWizard is based on the concept of self-evolving prompts, where an LLM iteratively improves its own instructions.

