Bayesian Search uses Bayesian optimization (via Optuna) to intelligently explore the space of few-shot prompt configurations. Instead of randomly trying different prompts, it learns from each trial to make smarter choices about which examples and configurations to test next.

✅ Best For

  • Few-shot learning tasks
  • Efficient exploration
  • Structured Q&A or classification
  • Limited evaluation budget

❌ Not Ideal For

  • Tasks without examples in dataset
  • Purely zero-shot scenarios
  • Very creative/open-ended tasks
  • Tiny datasets (< 10 examples)

How It Works

  1. Few-Shot Selection: Intelligently samples different numbers and combinations of examples from your dataset
  2. Template Optimization: Can automatically infer the best way to format examples (optional)
  3. Bayesian Learning: Uses previous trial results to guide future selections
  4. Efficient Search: Converges faster than random search by learning from history
  1. Initialize Search Space: Define the range of few-shot examples (e.g., 2-8) and other configuration options
  2. Sample Configuration: The Bayesian optimizer suggests a number of examples and which ones to use
  3. Build Prompt: Format the selected examples and combine them with the base prompt
  4. Evaluate: Generate outputs and score them on the eval subset
  5. Update & Repeat: The optimizer learns from the results and suggests the next configuration
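
In code, this loop maps naturally onto Optuna's study/trial API. Here is a minimal sketch of its shape, assuming hypothetical helpers build_prompt, run_model, and score, plus a base_prompt, dataset, and eval_subset already in scope; this is not the library's actual internals:

import optuna

def objective(trial):
    # Steps 1-2: sample how many examples to use and which ones
    k = trial.suggest_int("num_examples", 2, 8)
    indices = [
        trial.suggest_categorical(f"example_{slot}", list(range(len(dataset))))
        for slot in range(k)
    ]
    # Step 3: format the chosen examples and combine with the base prompt
    prompt = build_prompt(base_prompt, [dataset[i] for i in indices])
    # Step 4: generate outputs and score them on the eval subset
    outputs = run_model(prompt, eval_subset)
    return score(outputs)  # Step 5: the sampler learns from this value

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)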

Basic Usage

from fi.opt.optimizers import BayesianSearchOptimizer
from fi.opt.datamappers import BasicDataMapper
from fi.opt.base.evaluator import Evaluator

# Setup evaluator
evaluator = Evaluator(
    eval_template="summary_quality",
    eval_model_name="turing_flash",
    fi_api_key="your_key",
    fi_secret_key="your_secret"
)

# Setup data mapper
data_mapper = BasicDataMapper(
    key_map={"input": "text", "output": "generated_output"}
)

# Create optimizer
optimizer = BayesianSearchOptimizer(
    inference_model_name="gpt-4o-mini",
    n_trials=20,
    min_examples=2,
    max_examples=8
)

# Run optimization
result = optimizer.optimize(
    evaluator=evaluator,
    data_mapper=data_mapper,
    dataset=dataset,
    initial_prompts=["Summarize: {text}"]
)

Configuration Parameters

Search Space

min_examples (int, default: 2)
Minimum number of few-shot examples to try.

max_examples (int, default: 8)
Maximum number of few-shot examples to try.

allow_repeats (bool, default: false)
Whether the same example can be used multiple times in the few-shot block.

fixed_example_indices (List[int], default: [])
Specific example indices that must always be included:

fixed_example_indices=[0, 5]  # Always include examples at index 0 and 5
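
Putting the search-space parameters together, a configuration might look like this (values are illustrative):

optimizer = BayesianSearchOptimizer(
    inference_model_name="gpt-4o-mini",
    min_examples=2,
    max_examples=6,
    allow_repeats=False,
    fixed_example_indices=[0, 5]  # pinned; the optimizer varies the rest
)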

Optimization Control

n_trials (int, default: 10)
Number of different configurations to try. More trials generally means better results but higher cost.

seed (int, default: 42)
Random seed for reproducibility.

direction (str, default: "maximize")
Optimization direction. Use "maximize" for scores, "minimize" for loss/error rates.
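
For example, a reproducible run with a larger trial budget (values are illustrative):

optimizer = BayesianSearchOptimizer(
    inference_model_name="gpt-4o-mini",
    n_trials=30,          # larger budget explores more configurations
    seed=42,              # reproducible sampling
    direction="maximize"  # the evaluator returns a quality score
)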

Model Configuration

inference_model_name (str, default: "gpt-4o-mini")
Model used to generate outputs during optimization.

inference_model_kwargs (dict, default: {})
Additional arguments passed to the inference model:

inference_model_kwargs={"temperature": 0.7, "max_tokens": 200}

Example Formatting

example_template (str, default: None)
Template string for formatting examples using Python .format() syntax:

example_template="Q: {question}\nA: {answer}"

example_template_fields (List[str], default: None)
List of fields to include when no template is provided:

example_template_fields=["question", "answer"]

field_aliases (Dict[str, str], default: {})
Custom labels for fields in examples:

field_aliases={"question": "Input", "answer": "Output"}

example_separator (str, default: "\n")
String used to separate multiple examples in the few-shot block:

example_separator="\n\n---\n\n"

few_shot_position (str, default: "append")
Where to place few-shot examples: "append" (after the base prompt) or "prepend" (before it).

few_shot_title (str, default: None)
Optional title/header for the few-shot examples section:

few_shot_title="Here are some examples:"

Teacher-Guided Template Inference

infer_example_template_via_teacher (bool, default: false)
Use a teacher model to automatically infer the best example format from your data.

teacher_model_name (str, default: "gpt-5")
Powerful model used for template inference.

teacher_model_kwargs (dict, default: {'temperature': 1.0, 'max_tokens': 16000})
Arguments for the teacher model.

template_infer_n_samples (int, default: 8)
Number of dataset examples shown to the teacher for template inference.
Template inference is powerful but costs extra API calls. Use it when you’re unsure how to format examples.

Evaluation Controls

eval_subset_size (int, default: None)
Number of examples to evaluate per trial (for speed). If None, the entire dataset is used.

eval_subset_strategy (str, default: "random")
How to select the eval subset: "random", "first", or "all".
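
For example, to score each trial on a small random slice of a larger dataset:

optimizer = BayesianSearchOptimizer(
    inference_model_name="gpt-4o-mini",
    n_trials=20,
    eval_subset_size=20,           # evaluate 20 examples per trial
    eval_subset_strategy="random"  # sample them at random
)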

Underlying Research

Bayesian Search builds on established principles of Bayesian optimization, adapted for the unique challenges of prompt engineering.
  • Core Concept: The method is detailed in papers like “A Bayesian approach for prompt optimization in pre-trained models”, which explores mapping discrete prompts to continuous embeddings for more efficient searching.
  • Few-Shot Learning: Its application in few-shot scenarios is highlighted by tools like Comet's Opik, which features a "Few-Shot Bayesian Optimizer".
  • Advanced Implementations: Recent research, such as “Searching for Optimal Solutions with LLMs via Bayesian Optimization (BOPRO)”, investigates using Bayesian optimization to navigate complex LLM search spaces. The popular BayesianOptimization library on GitHub provides the foundational Gaussian process-based modeling.
This approach is noted for its efficiency in prominent frameworks like DSPy and is recognized in surveys for its effectiveness in few-shot learning contexts.

Advanced Examples

With Automatic Template Inference

Let the teacher model determine the best example format:
optimizer = BayesianSearchOptimizer(
    inference_model_name="gpt-4o-mini",
    teacher_model_name="gpt-4o",
    n_trials=25,
    min_examples=3,
    max_examples=6,
    
    # Enable automatic template inference
    infer_example_template_via_teacher=True,
    template_infer_n_samples=10,
    
    # Evaluation settings
    eval_subset_size=15,
    eval_subset_strategy="random"
)

result = optimizer.optimize(
    evaluator=evaluator,
    data_mapper=data_mapper,
    dataset=dataset,
    initial_prompts=[initial_prompt]
)

print(f"Best score: {result.final_score}")
print(f"Optimized prompt:\n{result.best_generator.get_prompt_template()}")

With Custom Example Formatting

Full control over example formatting:
def custom_formatter(example: dict) -> str:
    """Custom function to format each example."""
    return (
        f"Context: {example['context']}\n"
        f"Question: {example['question']}\n"
        f"Answer: {example['answer']}\n"
        "---"
    )

optimizer = BayesianSearchOptimizer(
    inference_model_name="gpt-4o-mini",
    n_trials=20,
    min_examples=2,
    max_examples=5,
    
    # Use custom formatter
    example_formatter=custom_formatter,
    few_shot_position="prepend",
    few_shot_title="## Example Q&A Pairs"
)

With Custom Prompt Builder

Control how few-shot examples integrate with base prompt:
def custom_prompt_builder(base_prompt: str, few_shot_blocks: list) -> str:
    """Custom function to build the final prompt."""
    few_shot_text = few_shot_blocks[0] if few_shot_blocks else ""
    return (
        "# Task Instructions\n"
        f"{base_prompt}\n\n"
        "# Reference Examples\n"
        f"{few_shot_text}\n\n"
        "# Your Turn\n"
        "Now apply these instructions to the following:"
    )

optimizer = BayesianSearchOptimizer(
    inference_model_name="gpt-4o-mini",
    n_trials=15,
    min_examples=2,
    max_examples=4,
    prompt_builder=custom_prompt_builder
)

With Fixed Examples

Always include certain critical examples:
# Suppose examples at indices 0, 5, and 10 are particularly important
optimizer = BayesianSearchOptimizer(
    inference_model_name="gpt-4o-mini",
    n_trials=20,
    min_examples=5,  # Will always have at least 5 (3 fixed + 2 additional)
    max_examples=10,
    
    # These will always be included
    fixed_example_indices=[0, 5, 10],
    
    # Optimizer will vary the additional examples
    allow_repeats=False
)

Understanding the Results

Analyzing Optimization History

result = optimizer.optimize(...)

# See all tried configurations
for i, iteration in enumerate(result.history):
    print(f"\nTrial {i+1}:")
    print(f"Score: {iteration.average_score:.4f}")
    print(f"Prompt snippet: {iteration.prompt[:200]}...")
    
    # Count number of examples used
    num_examples = iteration.prompt.count("Q:") - 1  # Adjust based on your format
    print(f"Examples used: ~{num_examples}")

Extracting Best Configuration

# Get the best prompt
best_prompt = result.best_generator.get_prompt_template()

# Extract few-shot examples from the prompt
# (Pattern depends on your formatting)
import re
examples = re.findall(r"Q: (.*?)\nA: (.*?)\n", best_prompt)
print(f"Best configuration used {len(examples)} examples")

Performance Tips

  • Begin with n_trials=10 to validate your setup, then increase to 20-30 for production.
  • Set eval_subset_size=20 when you have 50+ examples to speed up optimization significantly.
  • Pick example ranges to match the task:
      • Classification: min_examples=2, max_examples=5
      • Complex reasoning: min_examples=3, max_examples=8
      • Creative tasks: min_examples=1, max_examples=4
  • Run a quick optimization with infer_example_template_via_teacher=True, save the inferred template, then use it explicitly in future runs to save costs (see the sketch below).
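
For instance, once a teacher-guided run has produced a template, you might persist it and pass it back via example_template on later runs. How you recover the inferred template is left abstract here (e.g., by inspecting the optimized prompt or your logs); the literal below is a placeholder:

from pathlib import Path

# Store the template recovered from a teacher-guided run (placeholder value)
Path("example_template.txt").write_text("Q: {question}\nA: {answer}")

# Later runs reuse it directly, skipping teacher inference and its API costs
optimizer = BayesianSearchOptimizer(
    inference_model_name="gpt-4o-mini",
    n_trials=20,
    example_template=Path("example_template.txt").read_text()
)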

Common Patterns

Question Answering with Context

dataset = [
    {
        "context": "...",
        "question": "...",
        "answer": "..."
    }
]

optimizer = BayesianSearchOptimizer(
    inference_model_name="gpt-4o-mini",
    n_trials=20,
    min_examples=2,
    max_examples=6,
    example_template="Context: {context}\nQ: {question}\nA: {answer}",
    example_separator="\n\n",
    few_shot_position="prepend"
)

Text Classification

dataset = [
    {
        "text": "Product review text...",
        "label": "positive"  # or "negative", "neutral"
    }
]

optimizer = BayesianSearchOptimizer(
    inference_model_name="gpt-4o-mini",
    n_trials=15,
    min_examples=3,
    max_examples=8,
    example_template="Text: {text}\nSentiment: {label}",
    eval_subset_size=25
)

Data Extraction

dataset = [
    {
        "input_text": "John Doe lives in NYC...",
        "extracted_name": "John Doe",
        "extracted_location": "NYC"
    }
]

optimizer = BayesianSearchOptimizer(
    inference_model_name="gpt-4o-mini",
    n_trials=20,
    min_examples=2,
    max_examples=5,
    example_template_fields=["input_text", "extracted_name", "extracted_location"],
    field_aliases={
        "input_text": "Input",
        "extracted_name": "Name",
        "extracted_location": "Location"
    }
)

Troubleshooting

Problem: KeyError when formatting examples
Solution: Ensure all fields in example_template exist in your dataset examples. Use example_template_fields to explicitly list available fields.
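A quick pre-flight check in plain Python can catch this before any API calls are made:

import string

# List the fields referenced by the template, then verify each dataset
# example provides all of them.
fields = [f for _, f, _, _ in string.Formatter().parse(example_template) if f]
for i, example in enumerate(dataset):
    missing = [f for f in fields if f not in example]
    if missing:
        print(f"Example {i} is missing fields: {missing}")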
Problem: Scores stop improving after a few trials
Solution:
  • Increase max_examples to explore larger few-shot sizes
  • Try infer_example_template_via_teacher=True
  • Check whether your dataset has sufficient diversity
Problem: Each trial takes too long
Solution:
  • Set eval_subset_size=10 or smaller
  • Use a faster inference model
  • Reduce max_examples
Problem: Adding examples doesn't improve scores
Solution:
  • Verify examples are high-quality and diverse
  • Check that example_template formats them clearly
  • Your task might not benefit from few-shot (try Meta-Prompt instead)

Next Steps