Bayesian Search uses Bayesian optimization (via Optuna) to intelligently explore the space of few-shot prompt configurations. Instead of randomly trying different prompts, it learns from each trial to make smarter choices about which examples and configurations to test next.

✅ Best For

  • Few-shot learning tasks
  • Efficient exploration
  • Structured Q&A or classification
  • Limited evaluation budget

❌ Not Ideal For

  • Tasks without examples in dataset
  • Purely zero-shot scenarios
  • Very creative/open-ended tasks
  • Tiny datasets (< 10 examples)

How It Works

  1. Few-Shot Selection: Intelligently samples different numbers and combinations of examples from your dataset
  2. Template Optimization: Can automatically infer the best way to format examples (optional)
  3. Bayesian Learning: Uses previous trial results to guide future selections
  4. Efficient Search: Converges faster than random search by learning from history
  1. Initialize Search Space: Define the range of few-shot examples (e.g., 2-8) and other configuration options
  2. Sample Configuration: The Bayesian optimizer suggests a number of examples and which ones to use
  3. Build Prompt: Format the selected examples and combine them with the base prompt
  4. Evaluate: Generate outputs and score them on the eval subset
  5. Update & Repeat: The optimizer learns from the results and suggests the next configuration
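
In code, this loop maps naturally onto Optuna's study/trial API. Here is a minimal sketch of its shape, assuming hypothetical helpers build_prompt, run_model, and score, plus a base_prompt, dataset, and eval_subset already in scope; this is not the library's actual internals:

import optuna

def objective(trial):
    # Steps 1-2: sample how many examples to use and which ones
    k = trial.suggest_int("num_examples", 2, 8)
    indices = [
        trial.suggest_categorical(f"example_{slot}", list(range(len(dataset))))
        for slot in range(k)
    ]
    # Step 3: format the chosen examples and combine with the base prompt
    prompt = build_prompt(base_prompt, [dataset[i] for i in indices])
    # Step 4: generate outputs and score them on the eval subset
    outputs = run_model(prompt, eval_subset)
    return score(outputs)  # Step 5: the sampler learns from this value

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)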

Basic Usage

from fi.opt.optimizers import BayesianSearchOptimizer
from fi.opt.datamappers import BasicDataMapper
from fi.opt.base.evaluator import Evaluator

# Setup evaluator
evaluator = Evaluator(
    eval_template="summary_quality",
    eval_model_name="turing_flash",
    fi_api_key="your_key",
    fi_secret_key="your_secret"
)

# Setup data mapper
data_mapper = BasicDataMapper(
    key_map={"input": "text", "output": "generated_output"}
)

# Create optimizer
optimizer = BayesianSearchOptimizer(
    inference_model_name="gpt-4o-mini",
    n_trials=20,
    min_examples=2,
    max_examples=8
)

# Run optimization
result = optimizer.optimize(
    evaluator=evaluator,
    data_mapper=data_mapper,
    dataset=dataset,
    initial_prompts=["Summarize: {text}"]
)

Configuration Parameters

Search Space

min_examples (int, default: 2)
Minimum number of few-shot examples to try.

max_examples (int, default: 8)
Maximum number of few-shot examples to try.

allow_repeats (bool, default: false)
Whether the same example can be used multiple times in the few-shot block.

fixed_example_indices (List[int], default: [])
Specific example indices that must always be included:

fixed_example_indices=[0, 5]  # Always include examples at index 0 and 5
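
Putting the search-space parameters together, a configuration might look like this (values are illustrative):

optimizer = BayesianSearchOptimizer(
    inference_model_name="gpt-4o-mini",
    min_examples=2,
    max_examples=6,
    allow_repeats=False,
    fixed_example_indices=[0, 5]  # pinned; the optimizer varies the rest
)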

Optimization Control

n_trials (int, default: 10)
Number of different configurations to try. More trials generally means better results but higher cost.

seed (int, default: 42)
Random seed for reproducibility.

direction (str, default: "maximize")
Optimization direction. Use "maximize" for scores, "minimize" for loss/error rates.
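
For example, a reproducible run with a larger trial budget (values are illustrative):

optimizer = BayesianSearchOptimizer(
    inference_model_name="gpt-4o-mini",
    n_trials=30,          # larger budget explores more configurations
    seed=42,              # reproducible sampling
    direction="maximize"  # the evaluator returns a quality score
)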

Model Configuration

inference_model_name (str, default: "gpt-4o-mini")
Model used to generate outputs during optimization.

inference_model_kwargs (dict, default: {})
Additional arguments passed to the inference model:

inference_model_kwargs={"temperature": 0.7, "max_tokens": 200}

Example Formatting

example_template (str, default: None)
Template string for formatting examples using Python .format() syntax:

example_template="Q: {question}\nA: {answer}"

example_template_fields (List[str], default: None)
List of fields to include when no template is provided:

example_template_fields=["question", "answer"]

field_aliases (Dict[str, str], default: {})
Custom labels for fields in examples:

field_aliases={"question": "Input", "answer": "Output"}

example_separator (str, default: "\n")
String used to separate multiple examples in the few-shot block:

example_separator="\n\n---\n\n"

few_shot_position (str, default: "append")
Where to place few-shot examples: "append" (after the base prompt) or "prepend" (before it).

few_shot_title (str, default: None)
Optional title/header for the few-shot examples section:

few_shot_title="Here are some examples:"

Teacher-Guided Template Inference

infer_example_template_via_teacher (bool, default: false)
Use a teacher model to automatically infer the best example format from your data.

teacher_model_name (str, default: "gpt-5")
Powerful model used for template inference.

teacher_model_kwargs (dict, default: {'temperature': 1.0, 'max_tokens': 16000})
Arguments for the teacher model.

template_infer_n_samples (int, default: 8)
Number of dataset examples shown to the teacher for template inference.
Template inference is powerful but costs extra API calls. Use it when you’re unsure how to format examples.

Evaluation Controls

eval_subset_size (int, default: None)
Number of examples to evaluate per trial (for speed). If None, the entire dataset is used.

eval_subset_strategy (str, default: "random")
How to select the eval subset: "random", "first", or "all".
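
For example, to score each trial on a small random slice of a larger dataset:

optimizer = BayesianSearchOptimizer(
    inference_model_name="gpt-4o-mini",
    n_trials=20,
    eval_subset_size=20,           # evaluate 20 examples per trial
    eval_subset_strategy="random"  # sample them at random
)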

Underlying Research

Bayesian Search builds on established principles of Bayesian optimization, adapted for the unique challenges of prompt engineering.
  • Core Concept: The method is detailed in papers like “A Bayesian approach for prompt optimization in pre-trained models”, which explores mapping discrete prompts to continuous embeddings for more efficient searching.
  • Few-Shot Learning: Its application in few-shot scenarios is highlighted by tools like Comet's Opik, which features a "Few-Shot Bayesian Optimizer".
  • Advanced Implementations: Recent research, such as “Searching for Optimal Solutions with LLMs via Bayesian Optimization (BOPRO)”, investigates using Bayesian optimization to navigate complex LLM search spaces. The popular BayesianOptimization library on GitHub provides the foundational Gaussian process-based modeling.
This approach is noted for its efficiency in prominent frameworks like DSPy and is recognized in surveys for its effectiveness in few-shot learning contexts.

Advanced Examples

With Automatic Template Inference

Let the teacher model determine the best example format:
optimizer = BayesianSearchOptimizer(
    inference_model_name="gpt-4o-mini",
    teacher_model_name="gpt-4o",
    n_trials=25,
    min_examples=3,
    max_examples=6,
    
    # Enable automatic template inference
    infer_example_template_via_teacher=True,
    template_infer_n_samples=10,
    
    # Evaluation settings
    eval_subset_size=15,
    eval_subset_strategy="random"
)

result = optimizer.optimize(
    evaluator=evaluator,
    data_mapper=data_mapper,
    dataset=dataset,
    initial_prompts=[initial_prompt]
)

print(f"Best score: {result.final_score}")
print(f"Optimized prompt:\n{result.best_generator.get_prompt_template()}")

With Custom Example Formatting

Full control over example formatting:
def custom_formatter(example: dict) -> str:
    """Custom function to format each example."""
    return (
        f"Context: {example['context']}\n"
        f"Question: {example['question']}\n"
        f"Answer: {example['answer']}\n"
        "---"
    )

optimizer = BayesianSearchOptimizer(
    inference_model_name="gpt-4o-mini",
    n_trials=20,
    min_examples=2,
    max_examples=5,
    
    # Use custom formatter
    example_formatter=custom_formatter,
    few_shot_position="prepend",
    few_shot_title="## Example Q&A Pairs"
)

With Custom Prompt Builder

Control how few-shot examples integrate with base prompt:
def custom_prompt_builder(base_prompt: str, few_shot_blocks: list) -> str:
    """Custom function to build the final prompt."""
    few_shot_text = few_shot_blocks[0] if few_shot_blocks else ""
    return (
        "# Task Instructions\n"
        f"{base_prompt}\n\n"
        "# Reference Examples\n"
        f"{few_shot_text}\n\n"
        "# Your Turn\n"
        "Now apply these instructions to the following:"
    )

optimizer = BayesianSearchOptimizer(
    inference_model_name="gpt-4o-mini",
    n_trials=15,
    min_examples=2,
    max_examples=4,
    prompt_builder=custom_prompt_builder
)

With Fixed Examples

Always include certain critical examples:
# Suppose examples at indices 0, 5, and 10 are particularly important
optimizer = BayesianSearchOptimizer(
    inference_model_name="gpt-4o-mini",
    n_trials=20,
    min_examples=5,  # Will always have at least 5 (3 fixed + 2 additional)
    max_examples=10,
    
    # These will always be included
    fixed_example_indices=[0, 5, 10],
    
    # Optimizer will vary the additional examples
    allow_repeats=False
)

Understanding the Results

Analyzing Optimization History

result = optimizer.optimize(...)

# See all tried configurations
for i, iteration in enumerate(result.history):
    print(f"\nTrial {i+1}:")
    print(f"Score: {iteration.average_score:.4f}")
    print(f"Prompt snippet: {iteration.prompt[:200]}...")
    
    # Count number of examples used
    num_examples = iteration.prompt.count("Q:") - 1  # Adjust based on your format
    print(f"Examples used: ~{num_examples}")

Extracting Best Configuration

# Get the best prompt
best_prompt = result.best_generator.get_prompt_template()

# Extract few-shot examples from the prompt
# (Pattern depends on your formatting)
import re
examples = re.findall(r"Q: (.*?)\nA: (.*?)\n", best_prompt)
print(f"Best configuration used {len(examples)} examples")

Performance Tips

  • Begin with n_trials=10 to validate your setup, then increase to 20-30 for production.
  • Set eval_subset_size=20 when you have 50+ examples to speed up optimization significantly.
  • Pick example ranges to match the task:
      • Classification: min_examples=2, max_examples=5
      • Complex reasoning: min_examples=3, max_examples=8
      • Creative tasks: min_examples=1, max_examples=4
  • Run a quick optimization with infer_example_template_via_teacher=True, save the inferred template, then use it explicitly in future runs to save costs (see the sketch below).
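
For instance, once a teacher-guided run has produced a template, you might persist it and pass it back via example_template on later runs. How you recover the inferred template is left abstract here (e.g., by inspecting the optimized prompt or your logs); the literal below is a placeholder:

from pathlib import Path

# Store the template recovered from a teacher-guided run (placeholder value)
Path("example_template.txt").write_text("Q: {question}\nA: {answer}")

# Later runs reuse it directly, skipping teacher inference and its API costs
optimizer = BayesianSearchOptimizer(
    inference_model_name="gpt-4o-mini",
    n_trials=20,
    example_template=Path("example_template.txt").read_text()
)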

Common Patterns

Question Answering with Context

dataset = [
    {
        "context": "...",
        "question": "...",
        "answer": "..."
    }
]

optimizer = BayesianSearchOptimizer(
    inference_model_name="gpt-4o-mini",
    n_trials=20,
    min_examples=2,
    max_examples=6,
    example_template="Context: {context}\nQ: {question}\nA: {answer}",
    example_separator="\n\n",
    few_shot_position="prepend"
)

Text Classification

dataset = [
    {
        "text": "Product review text...",
        "label": "positive"  # or "negative", "neutral"
    }
]

optimizer = BayesianSearchOptimizer(
    inference_model_name="gpt-4o-mini",
    n_trials=15,
    min_examples=3,
    max_examples=8,
    example_template="Text: {text}\nSentiment: {label}",
    eval_subset_size=25
)

Data Extraction

dataset = [
    {
        "input_text": "John Doe lives in NYC...",
        "extracted_name": "John Doe",
        "extracted_location": "NYC"
    }
]

optimizer = BayesianSearchOptimizer(
    inference_model_name="gpt-4o-mini",
    n_trials=20,
    min_examples=2,
    max_examples=5,
    example_template_fields=["input_text", "extracted_name", "extracted_location"],
    field_aliases={
        "input_text": "Input",
        "extracted_name": "Name",
        "extracted_location": "Location"
    }
)

Troubleshooting

Problem: KeyError when formatting examples
Solution: Ensure all fields in example_template exist in your dataset examples. Use example_template_fields to explicitly list available fields.
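A quick pre-flight check in plain Python can catch this before any API calls are made:

import string

# List the fields referenced by the template, then verify each dataset
# example provides all of them.
fields = [f for _, f, _, _ in string.Formatter().parse(example_template) if f]
for i, example in enumerate(dataset):
    missing = [f for f in fields if f not in example]
    if missing:
        print(f"Example {i} is missing fields: {missing}")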
Problem: Scores stop improving after a few trials
Solution:
  • Increase max_examples to explore larger few-shot sizes
  • Try infer_example_template_via_teacher=True
  • Check whether your dataset has sufficient diversity
Problem: Each trial takes too long
Solution:
  • Set eval_subset_size=10 or smaller
  • Use a faster inference model
  • Reduce max_examples
Problem: Adding examples doesn't improve scores
Solution:
  • Verify examples are high-quality and diverse
  • Check that example_template formats them clearly
  • Your task might not benefit from few-shot (try Meta-Prompt instead)

Next Steps