Meta-Prompt uses a powerful teacher LLM to analyze how your prompt performs, understand why it fails on specific examples, formulate hypotheses about improvements, and completely rewrite the prompt. This approach is inspired by the promptim library and excels at tasks requiring deep reasoning.

When to Use Meta-Prompt

✅ Best For

  • Complex reasoning tasks
  • Tasks where understanding failures helps
  • Refining well-scoped prompts
  • Deep iterative improvement

❌ Not Ideal For

  • Quick experiments (slower)
  • Simple classification tasks
  • Very large datasets (costly)
  • Tasks with unclear failure patterns

How It Works

Meta-Prompt follows a systematic analysis-and-rewrite cycle:
  1. Evaluate Current Prompt - Run the current prompt on a subset of your dataset and collect scores.
  2. Identify Failures - Focus on examples with low scores to understand what went wrong.
  3. Formulate Hypothesis - The teacher model analyzes the failures and proposes a specific improvement theory.
  4. Rewrite Prompt - Generate a complete new prompt implementing the hypothesis.
  5. Repeat - Continue for multiple rounds, building on previous insights.

Unlike optimizers that tweak parts of a prompt, Meta-Prompt rewrites the entire prompt each iteration based on deep analysis.
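Conceptually, one optimization run follows the loop sketched below (simplified pseudocode; evaluate() and teacher_analyze_and_rewrite() are hypothetical stand-ins for the optimizer's internal steps, not real library functions):
# Simplified sketch of the Meta-Prompt loop -- illustrative only
prompt = initial_prompt
for round_num in range(num_rounds):
    # 1. Evaluate the current prompt on a subset of the dataset
    results = evaluate(prompt, dataset[:eval_subset_size])
    # 2. Identify failures: focus on the low-scoring examples
    failures = [r for r in results if r.score < 0.5]
    # 3 + 4. The teacher analyzes the failures, states a hypothesis,
    # and returns a complete rewrite of the prompt
    hypothesis, prompt = teacher_analyze_and_rewrite(prompt, failures, task_description)
    # 5. Repeat, carrying previous attempts forward so mistakes are not repeated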

Basic Usage

from fi.opt.optimizers import MetaPromptOptimizer
from fi.opt.generators import LiteLLMGenerator
from fi.opt.datamappers import BasicDataMapper
from fi.opt.base.evaluator import Evaluator

# Setup teacher model (use a powerful model for analysis)
teacher = LiteLLMGenerator(
    model="gpt-4o",
    prompt_template="{prompt}"
)

# Setup evaluator
evaluator = Evaluator(
    eval_template="summary_quality",
    eval_model_name="turing_flash",
    fi_api_key="your_key",
    fi_secret_key="your_secret"
)

# Setup data mapper
data_mapper = BasicDataMapper(
    key_map={"input": "text", "output": "generated_output"}
)

# Create optimizer
optimizer = MetaPromptOptimizer(
    teacher_generator=teacher
)

# Run optimization
result = optimizer.optimize(
    evaluator=evaluator,
    data_mapper=data_mapper,
    dataset=dataset,
    initial_prompts=["Summarize this text: {text}"],
    task_description="Create concise, informative summaries",
    num_rounds=5,
    eval_subset_size=40
)

print(f"Improvement: {result.final_score:.2%}")
print(f"Best prompt:\n{result.best_generator.get_prompt_template()}")

Configuration Parameters

Core Parameters

teacher_generator (LiteLLMGenerator, required)
A powerful language model used for analyzing failures and generating improved prompts. Recommended: gpt-4o, gpt-4-turbo, or claude-3-opus.
teacher = LiteLLMGenerator("gpt-4o", "{prompt}")

task_description (str, default: "I want to improve my prompt.")
Description of what you want the optimized prompt to achieve. More specific descriptions lead to better results.
task_description="Generate summaries that capture key points while being under 50 words"

num_rounds (int, default: 5)
Number of analysis-and-rewrite iterations. More rounds can lead to better results but cost more.

eval_subset_size (int, default: 40)
Number of examples to evaluate each round. Smaller = faster but less reliable signal.
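For reference, a minimal call that leaves num_rounds, eval_subset_size, and task_description at their defaults would look like the sketch below (reusing the optimizer, evaluator, data mapper, and dataset from Basic Usage above):
result = optimizer.optimize(
    evaluator=evaluator,
    data_mapper=data_mapper,
    dataset=dataset,
    initial_prompts=["Summarize this text: {text}"],
    # num_rounds=5, eval_subset_size=40, and the generic default
    # task_description are used when these arguments are omitted
)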

The Meta-Prompt Process

What the Teacher Model Sees

In each round, the teacher model receives:
  1. Current Prompt - The prompt being evaluated
  2. Previous Failed Attempts - Prompts that performed worse (to avoid repeating mistakes)
  3. Performance Data - Detailed results showing which examples failed and why
  4. Task Description - Your goal for the optimization
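The exact payload is assembled internally by the optimizer, but conceptually it bundles those four pieces into something like the sketch below (the field names are illustrative, not the library's actual schema):
# Illustrative sketch of the context the teacher model sees each round.
# Field names are hypothetical -- the real structure is internal to MetaPromptOptimizer.
teacher_context = {
    "current_prompt": "Summarize this text: {text}",
    "previous_failed_attempts": [
        {"prompt": "Briefly summarize: {text}", "score": 0.52},
    ],
    "performance_data": [
        {"input": "Long multi-topic article...", "score": 0.35, "reason": "Missed two of the three key points"},
    ],
    "task_description": "Create concise, informative summaries",
}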

What the Teacher Model Returns

The teacher provides two things:
{
  "hypothesis": "The prompt fails on complex multi-sentence texts because it doesn't specify a structure. Adding explicit instruction to identify main points first should improve clarity.",
  "improved_prompt": "First identify the 2-3 main points in the following text. Then write a single concise sentence that captures these points:\n\n{text}"
}

Underlying Research

The Meta-Prompt optimizer is inspired by meta-learning and reflective AI systems, where a model improves its own processes.
  • Meta-Learning: The core idea is formalized in research like “System Prompt Optimization with Meta-Learning”, which uses bilevel optimization. Another related work is “metaTextGrad”, which optimizes both prompts and their surrounding structures.
  • Industry Tools: This reflective approach is used in tools like Google’s Vertex AI Prompt Optimizer and is a key feature in advanced models for self-improvement.
  • Frameworks: The concept is explored in libraries like promptim and is classified in surveys as a leading LLM-driven optimization method.

Advanced Examples

With Detailed Task Description

optimizer = MetaPromptOptimizer(
    teacher_generator=LiteLLMGenerator("gpt-4o", "{prompt}")
)

result = optimizer.optimize(
    evaluator=evaluator,
    data_mapper=data_mapper,
    dataset=dataset,
    initial_prompts=[initial_prompt],
    
    # Provide detailed context
    task_description="""
    I want to extract structured information from customer support tickets.
    The prompt should:
    - Identify the main issue
    - Extract customer sentiment (positive/negative/neutral)
    - Determine urgency level (low/medium/high)
    - Suggest appropriate department routing
    
    The output must be in JSON format and handle incomplete information gracefully.
    """,
    
    num_rounds=7,
    eval_subset_size=30
)

With More Rounds for Complex Tasks

# For very complex tasks, use more rounds
optimizer = MetaPromptOptimizer(
    teacher_generator=LiteLLMGenerator("gpt-4o", "{prompt}")
)

result = optimizer.optimize(
    evaluator=evaluator,
    data_mapper=data_mapper,
    dataset=complex_dataset,
    initial_prompts=[initial_prompt],
    task_description=detailed_description,
    num_rounds=10,  # More iterations for complex refinement
    eval_subset_size=50  # More examples for reliable signal
)

# Analyze the evolution
for i, iteration in enumerate(result.history):
    print(f"\nRound {i+1} Score: {iteration.average_score:.4f}")
    print(f"Prompt: {iteration.prompt[:150]}...")

Combining with Other Optimizers

Use Meta-Prompt for deep refinement after initial exploration:
# Stage 1: Quick exploration
random_result = random_search_optimizer.optimize(...)

# Stage 2: Deep refinement on best candidate
meta_optimizer = MetaPromptOptimizer(
    teacher_generator=LiteLLMGenerator("gpt-4o", "{prompt}")
)

final_result = meta_optimizer.optimize(
    evaluator=evaluator,
    data_mapper=data_mapper,
    dataset=dataset,
    initial_prompts=[random_result.best_generator.get_prompt_template()],
    task_description="Refine for clarity and consistency",
    num_rounds=5
)

Understanding the Results

Tracking Hypothesis Evolution

Meta-Prompt’s hypotheses show its reasoning process:
result = optimizer.optimize(...)

# View the optimization journey
for i, iteration in enumerate(result.history):
    print(f"\n{'='*60}")
    print(f"Round {i+1}")
    print(f"Score: {iteration.average_score:.4f}")
    print(f"\nPrompt:\n{iteration.prompt}")
    
    # Note: Hypothesis is internal to teacher model, 
    # but you can infer it from prompt evolution

Analyzing Improvement Patterns

scores = [iteration.average_score for iteration in result.history]

import matplotlib.pyplot as plt
plt.plot(scores, marker='o')
plt.xlabel('Round')
plt.ylabel('Score')
plt.title('Meta-Prompt Optimization Progress')
plt.show()

# Calculate improvement
initial_score = scores[0]
final_score = scores[-1]
improvement = ((final_score - initial_score) / initial_score) * 100
print(f"Total improvement: {improvement:.1f}%")

Performance Tips

  • Use a powerful teacher model: Meta-Prompt's quality depends heavily on the teacher model's reasoning ability. Use gpt-4o, claude-3-opus, or similar high-end models.
  • Write a specific task description: specific task descriptions help the teacher make targeted improvements. Include constraints, desired output format, and edge cases to handle.
  • Start with 5 rounds: that is usually enough for meaningful improvement. Increase to 7-10 only for very complex tasks where you see continued progress.
  • Choose eval_subset_size carefully:
      • Too small (< 20): unreliable signal, may optimize for noise
      • Too large (> 50): slow and expensive
      • Sweet spot: 30-40 examples
  • Inspect failures: look at low-scoring examples in each round to understand what the optimizer is trying to fix:
for iteration in result.history:
    failures = [r for r in iteration.individual_results if r.score < 0.5]
    print(f"Round failures: {len(failures)}")
    for f in failures[:3]:  # Show first 3
        print(f"  - Score: {f.score:.2f}, Reason: {f.reason}")

Common Patterns

Complex Reasoning Tasks

dataset = [
    {
        "problem": "Multi-step math word problem...",
        "solution": "Step-by-step solution..."
    }
]

optimizer = MetaPromptOptimizer(
    teacher_generator=LiteLLMGenerator("gpt-4o", "{prompt}")
)

result = optimizer.optimize(
    evaluator=evaluator,
    data_mapper=BasicDataMapper({
        "input": "problem",
        "output": "generated_output"
    }),
    dataset=dataset,
    initial_prompts=["Solve this problem: {problem}"],
    task_description="""
    Generate step-by-step solutions that:
    - Show clear reasoning at each step
    - Explain why each step is necessary
    - Arrive at the correct final answer
    """,
    num_rounds=8
)

Creative Writing with Constraints

optimizer = MetaPromptOptimizer(
    teacher_generator=LiteLLMGenerator("gpt-4o", "{prompt}")
)

result = optimizer.optimize(
    evaluator=evaluator,
    data_mapper=data_mapper,
    dataset=creative_dataset,
    initial_prompts=["Write a story based on: {prompt}"],
    task_description="""
    Generate engaging short stories (200-300 words) that:
    - Have a clear beginning, middle, and end
    - Include vivid sensory details
    - Match the tone specified in the prompt
    - Are appropriate for a general audience
    """,
    num_rounds=6,
    eval_subset_size=25
)

Data Transformation Tasks

optimizer = MetaPromptOptimizer(
    teacher_generator=LiteLLMGenerator("gpt-4o", "{prompt}")
)

result = optimizer.optimize(
    evaluator=evaluator,
    data_mapper=data_mapper,
    dataset=transformation_dataset,
    initial_prompts=["Convert this data: {input_data}"],
    task_description="""
    Transform unstructured text into JSON format with these fields:
    - name (string)
    - date (YYYY-MM-DD format)
    - amount (number)
    - category (one of: personal, business, travel)
    
    Handle missing fields by using null. Infer dates from context when possible.
    """,
    num_rounds=5
)

Troubleshooting

Problem: Improvement stops after 2-3 rounds
Solution:
  • Your initial prompt might already be good - check if score is already high
  • Make task description more specific to guide further refinement
  • Try a different teacher model for fresh perspective
  • Increase eval_subset_size for more reliable signal
Problem: Each iteration adds more instructions, making prompts unwieldy
Solution:
  • Add to task description: “Keep the prompt concise and under 200 words”
  • Manually select a mid-optimization prompt that balances quality and length (see the sketch after this list)
  • Use fewer rounds (3-4 instead of 7-8)
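If prompts keep growing, one option is to pick a shorter prompt from an earlier round whose score is still close to the best, using the same result.history fields shown in the examples above (a rough sketch; the 0.02 tolerance is an arbitrary choice):
# Pick the shortest prompt scoring within 0.02 of the best round.
# Uses iteration.average_score / iteration.prompt as shown earlier;
# the tolerance is arbitrary -- tune it to your quality/length trade-off.
best_score = max(it.average_score for it in result.history)
candidates = [it for it in result.history if it.average_score >= best_score - 0.02]
compact = min(candidates, key=lambda it: len(it.prompt))
print(f"Selected score: {compact.average_score:.4f}, prompt length: {len(compact.prompt)} chars")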
Problem: Optimization is expensive with GPT-4
Solution:
  • Reduce num_rounds to 3-5
  • Decrease eval_subset_size to 20-30
  • Use gpt-4o-mini as teacher for initial experiments
  • Run on a smaller dataset subset first to validate the approach (a budget configuration is sketched below)
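A budget-friendly setup along these lines might look like the sketch below (all parameters are ones documented above; using gpt-4o-mini as the teacher trades analysis quality for cost):
# Cheaper configuration for validating the approach before a full run
cheap_optimizer = MetaPromptOptimizer(
    teacher_generator=LiteLLMGenerator("gpt-4o-mini", "{prompt}")  # cheaper teacher
)

result = cheap_optimizer.optimize(
    evaluator=evaluator,
    data_mapper=data_mapper,
    dataset=dataset[:100],      # smaller dataset subset first
    initial_prompts=[initial_prompt],
    task_description=task_description,
    num_rounds=3,               # fewer analysis-and-rewrite rounds
    eval_subset_size=20,        # fewer examples per round
)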
Problem: Score goes up and down between rounds
Solution:
  • Increase eval_subset_size for more stable measurements
  • Check if your evaluation metric is too noisy
  • Ensure dataset examples are high-quality and representative
  • Consider using a different evaluation metric

Comparison with Other Optimizers

Aspect             | Meta-Prompt        | Bayesian Search    | ProTeGi
Approach           | Analysis & rewrite | Few-shot selection | Error-driven fixing
Best for           | Complex reasoning  | Structured tasks   | Systematic debugging
Speed              | Medium             | Fast               | Slow
Prompt changes     | Complete rewrites  | Example selection  | Targeted edits
Teacher dependency | High               | Medium             | High

Next Steps