How It Differs from Prompt Optimization
Prompt Optimization
- Targets agent system prompts
- Uses simulation (conversational) evaluation
- Best for improving agent behavior
- Input: Agent configuration
Dataset Optimization
- Targets dataset column prompts
- Uses direct (input/output) evaluation
- Best for improving training & eval data
- Input: Dataset column + evaluation templates
Key Concepts
Optimization Run
An optimization run connects the following components (a configuration sketch follows the list):

- Column – The dataset column containing the prompt template to optimize.
- Optimizer Algorithm – The strategy used to find better prompts (e.g., Bayesian Search, ProTeGi).
- Evaluation Templates – The evaluations used to score how well each prompt variation performs.
- Teacher Model – The LLM used for optimization decisions (generating new prompt candidates).
- Inference Model – The LLM used to execute prompts and generate outputs during each trial.
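To make these parts concrete, here is a minimal sketch of how such a run configuration could be expressed in Python. Every name here (`OptimizationRunConfig`, its fields, and the model identifiers) is a hypothetical illustration, not this platform's actual API:

```python
from dataclasses import dataclass, field

# Hypothetical illustration of the pieces an optimization run ties together.
# None of these names come from the platform's SDK.
@dataclass
class OptimizationRunConfig:
    dataset_column: str        # column holding the prompt template to optimize
    optimizer: str             # e.g. "bayesian_search", "protegi", "gepa"
    teacher_model: str         # LLM that proposes new prompt candidates
    inference_model: str       # LLM that executes prompts during each trial
    evaluation_templates: list[str] = field(default_factory=list)

config = OptimizationRunConfig(
    dataset_column="summary_prompt",
    optimizer="bayesian_search",
    teacher_model="gpt-4o",            # assumption: a capable reasoning model
    inference_model="gpt-4o-mini",     # assumption: a cheaper model for bulk trials
    evaluation_templates=["summary_quality", "tone"],
)
```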
Trials
Each optimization run consists of multiple trials:

- Baseline Trial – The original prompt is evaluated first to establish a baseline score.
- Variation Trials – New prompt variations generated by the optimizer algorithm.
- Each trial receives an average score based on the configured evaluation templates (see the scoring sketch after this list).
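The per-trial score is simply a mean over rows and templates. A minimal sketch of that aggregation, with hypothetical function names:

```python
# Minimal sketch of trial scoring: average each evaluation template's per-row
# scores, then average across templates. All names are hypothetical.
def score_trial(outputs, references, evaluators):
    template_means = []
    for evaluate in evaluators:            # each evaluator scores one row in [0, 1]
        row_scores = [evaluate(out, ref) for out, ref in zip(outputs, references)]
        template_means.append(sum(row_scores) / len(row_scores))
    return sum(template_means) / len(template_means)   # the trial's average score
```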
Evaluation Templates
Evaluation templates define how each prompt variation is scored across dataset rows. You can use:

- Built-in templates – Pre-configured evaluations like `summary_quality`, `context_adherence`, `tone`, and more.
- Custom templates – Define your own evaluation criteria tailored to your specific use case (a sketch follows this list).
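A custom template ultimately reduces to a scoring rule applied per row. The sketch below shows one plausible shape as a plain Python function; the structure and keywords are illustrative assumptions, not the platform's actual template format:

```python
# Hypothetical custom evaluation template: rewards outputs that stay within a
# word budget and mention required domain keywords. Returns a score in [0, 1].
def concise_and_on_topic(output: str, reference: str) -> float:
    within_budget = len(output.split()) <= 100
    required = {"refund", "order"}                  # assumption: domain keywords
    on_topic = required.issubset(output.lower().split())
    return (within_budget + on_topic) / 2
```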
How It Works
Create an Optimization Run
Navigate to your dataset and click the Optimize button in the top action bar. Select the column containing the prompt template you want to optimize, choose an optimizer algorithm, configure the teacher and inference models, and select your evaluation templates.
Baseline Evaluation
The system evaluates your original prompt against the dataset to establish a baseline score. This score serves as the benchmark for measuring improvements.
Optimization Loop
The optimizer generates new prompt variations, runs them against dataset rows (up to 50) using the inference model, and scores them using your evaluation templates. Each variation is saved as a trial with its results.
Optimizations can be paused and resumed – the optimizer state is persisted after each trial, so you won't lose progress if a run is interrupted.
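In outline, the loop looks something like the Python sketch below. `ask_teacher` and `evaluate_prompt` are placeholder stand-ins for the teacher-model call and the inference-plus-evaluation step (later sketches reuse them); the checkpoint-after-every-trial pattern is the behavior described above:

```python
import json
import random

# Placeholder stand-ins for the real components.
def ask_teacher(instruction: str) -> str:
    return instruction.splitlines()[-1] + " Be concise."   # toy stand-in only

def evaluate_prompt(prompt: str, rows: list) -> float:
    return random.random()       # a real trial runs inference + eval templates

def optimize(prompt, rows, n_trials, state_path="optimizer_state.json"):
    rows = rows[:50]                                 # at most 50 rows per trial
    best, best_score = prompt, evaluate_prompt(prompt, rows)   # baseline trial
    history = [{"prompt": prompt, "score": best_score}]
    for _ in range(n_trials):
        candidate = ask_teacher(f"Improve this prompt:\n{best}")
        score = evaluate_prompt(candidate, rows)
        history.append({"prompt": candidate, "score": score})
        if score > best_score:
            best, best_score = candidate, score
        with open(state_path, "w") as f:   # persist state after every trial so
            json.dump(history, f)          # an interrupted run can be resumed
    return best, best_score
```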
Supported Optimizers
All six optimization algorithms are available for dataset optimization:

| Optimizer | Speed | Quality | Best For |
|---|---|---|---|
| Random Search | ⚡⚡⚡ | ⭐⭐ | Quick baselines |
| Bayesian Search | ⚡⚡ | ⭐⭐⭐⭐ | Few-shot learning tasks |
| Meta-Prompt | ⚡⚡ | ⭐⭐⭐⭐ | Complex reasoning tasks |
| ProTeGi | ⚡ | ⭐⭐⭐⭐ | Fixing specific error patterns |
| PromptWizard | ⚡ | ⭐⭐⭐⭐ | Creative/open-ended exploration |
| GEPA | ⚡ | ⭐⭐⭐⭐⭐ | Production deployments |

Speed: ⚡ = Slow, ⚡⚡ = Medium, ⚡⚡⚡ = Fast
Quality: ⭐ = Basic, ⭐⭐⭐⭐⭐ = Excellent
When to Use Each Optimizer
Random Search – Quick Baseline
Best for: Getting a fast baseline to compare against.

Generates random prompt variations using a teacher model. No learning from previous attempts, but very fast.
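As a sketch, random search is little more than independent draws from the teacher model, keeping the best scorer. It reuses the hypothetical `ask_teacher` and `evaluate_prompt` helpers from the loop sketch above:

```python
def random_search(base_prompt, rows, n_trials):
    # Every candidate is generated from the original prompt alone; no feedback
    # from earlier trials is used, which is what makes this fast but basic.
    candidates = [ask_teacher(f"Rewrite this prompt a different way:\n{base_prompt}")
                  for _ in range(n_trials)]
    return max(candidates, key=lambda c: evaluate_prompt(c, rows))
```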
Bayesian Search – Few-Shot Learning
Best for: Tasks that benefit from few-shot examples in the prompt.

Uses Bayesian optimization to intelligently select the best combination and number of few-shot examples.
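The search space here is which examples to include and how many. The sketch below explores that space with random sampling for simplicity; a real Bayesian optimizer would fit a surrogate model over past trials to pick the next configuration. Helpers as above:

```python
import random

def few_shot_search(base_prompt, examples, rows, n_trials):
    # examples: a list of pre-formatted few-shot example strings.
    # Each trial picks a count k and a specific subset; a true Bayesian
    # optimizer replaces random.sample with a surrogate-guided choice.
    best, best_score = base_prompt, evaluate_prompt(base_prompt, rows)
    for _ in range(n_trials):
        k = random.randint(1, min(5, len(examples)))
        candidate = base_prompt + "\n\nExamples:\n" + "\n".join(random.sample(examples, k))
        score = evaluate_prompt(candidate, rows)
        if score > best_score:
            best, best_score = candidate, score
    return best
```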
Meta-Prompt – Deep Reasoning
Best for: Complex tasks where the prompt needs holistic redesign.

Analyzes failed examples, formulates hypotheses about what went wrong, and rewrites the entire prompt.
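The distinctive step is building a meta-prompt that shows the teacher model the failures and asks for a hypothesis plus a full rewrite. A hedged sketch, with illustrative meta-prompt wording and the same hypothetical helpers:

```python
def meta_prompt_step(prompt, failures):
    # failures: (input, bad_output) pairs gathered from low-scoring rows.
    report = "\n".join(f"Input: {i}\nOutput: {o}" for i, o in failures)
    meta = (
        "The prompt below produced the failures shown after it.\n"
        f"Prompt:\n{prompt}\n\nFailures:\n{report}\n\n"
        "Hypothesize what went wrong, then rewrite the entire prompt to fix it."
    )
    return ask_teacher(meta)   # returns a holistically redesigned prompt
```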
ProTeGi – Error-Driven Fixes
Best for: When you can identify clear failure patterns in outputs.

Generates critiques of failures and applies targeted improvements using beam search.
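One round of the critique-then-edit cycle might look like this sketch, where a small beam of prompts is expanded with targeted fixes and only the top scorers survive (helpers as above):

```python
def protegi_round(beam, rows, beam_width=3):
    # Expand each prompt in the beam with a critique-driven edit, then keep
    # only the top-scoring prompts: beam search over the space of edits.
    expanded = list(beam)
    for prompt in beam:
        critique = ask_teacher(f"Critique this prompt's failures:\n{prompt}")
        expanded.append(ask_teacher(f"Apply this critique:\n{critique}\n\nto:\n{prompt}"))
    expanded.sort(key=lambda p: evaluate_prompt(p, rows), reverse=True)
    return expanded[:beam_width]
```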
PromptWizard – Creative Exploration
Best for: Open-ended tasks where you want diverse prompt variations.

Combines mutation with different "thinking styles", then critiques and refines top performers.
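In sketch form, the mutation step pairs the prompt with different thinking styles to diversify candidates, and the best ones are then critiqued and refined. The style list is an illustrative assumption (helpers as above):

```python
THINKING_STYLES = [              # illustrative examples of "thinking styles"
    "think step by step",
    "consider edge cases first",
    "reason by analogy",
]

def promptwizard_round(prompt, rows, top_k=2):
    # Mutate with each style, keep the top performers, then critique-and-refine.
    mutants = [ask_teacher(f"Rewrite this prompt to make the model {style}:\n{prompt}")
               for style in THINKING_STYLES]
    mutants.sort(key=lambda p: evaluate_prompt(p, rows), reverse=True)
    return [ask_teacher(f"Critique and refine this prompt:\n{p}") for p in mutants[:top_k]]
```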
GEPA – Production-Grade
Best for: Production deployments requiring the highest quality results.

Uses evolutionary algorithms with reflective learning and mutation strategies. State-of-the-art performance.
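At a very high level, an evolutionary round maintains a population of prompts, selects parents by score, and produces offspring through reflective mutation. This sketch compresses GEPA's actual machinery considerably and should be read as a shape, not a specification (helpers as above):

```python
import random

def gepa_generation(population, rows, size=6):
    # Score the population, keep the top half as parents, and fill the rest
    # with reflective mutations (the teacher reasons about results first).
    scored = sorted(population, key=lambda p: evaluate_prompt(p, rows), reverse=True)
    parents = scored[: max(2, len(scored) // 2)]
    children = [ask_teacher(f"Reflect on this prompt's performance, then mutate it:\n"
                            f"{random.choice(parents)}")
                for _ in range(size - len(parents))]
    return parents + children            # survivors plus mutated offspring
```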
Best Practices
Dataset Size
- Optimal range: 15–50 rows for optimization.
- The system evaluates up to 50 rows per trial for efficiency.
- Smaller datasets run faster but may produce less reliable scores.
Evaluation Templates
- Use 1–3 evaluation templates that directly measure what matters for your task.
- Avoid conflicting evaluations that may confuse the optimizer.
- Use clear pass/fail criteria where possible (see the sketch after this list).
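A pass/fail criterion can be expressed as a binary score, which gives the optimizer an unambiguous signal to climb. A hypothetical sketch:

```python
# Hypothetical pass/fail template: 1.0 if the output meets all hard
# requirements, 0.0 otherwise. Binary signals are easy to optimize against.
def follows_format(output: str, reference: str) -> float:
    passes = output.strip().startswith("Summary:") and len(output.split()) <= 120
    return 1.0 if passes else 0.0
```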
Choosing an Optimizer

Start with Random Search to establish a quick baseline, then move up the quality column of the table above once your evaluation templates are stable: Bayesian Search or Meta-Prompt for a balance of speed and quality, ProTeGi for known error patterns, and GEPA for production deployments.
Troubleshooting
No improvement after optimization
Cause: Dataset may be too small or not diverse enough.

Solution: Use more diverse examples (15–50 rows recommended) and ensure your evaluation templates clearly distinguish good from bad outputs.
High score variance between trials
Cause: Inconsistent or conflicting evaluation templates.

Solution: Simplify your evaluations – use 1–2 clear templates instead of many overlapping ones.
Optimization running too slowly
Cause: Too many dataset rows or a slow optimizer.

Solution: Reduce dataset size, or switch to a faster optimizer like Random Search or Bayesian Search.
Run failed mid-way
Cause: API errors or timeout.

Solution: Create a new run with the same configuration – the system can resume from where it left off using persisted optimizer state.