Overview

Create, manage, and analyze datasets for AI model development and evaluation

What it is

Datasets are the core data layer for evaluation and experimentation. Each dataset is a table made up of columns (e.g. “user query”, “expected answer”, “score”), rows (one row per example), and cells (the value of a given column in a given row). Prompts, evals, experiments, and optimizations all run on datasets, which makes them the single source of truth for that data.
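To make the columns/rows/cells model concrete, here is a minimal in-memory sketch in Python. It is an illustration of the table structure only, not the product's API: the `Dataset` class, its `add_row` and `cell` methods, and the example column values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Dataset:
    """Hypothetical in-memory model of a dataset: named columns, rows of cells."""

    name: str
    columns: list[str]
    # Each row maps a column name to that row's cell value.
    rows: list[dict[str, Any]] = field(default_factory=list)

    def add_row(self, cells: dict[str, Any]) -> None:
        # Reject cells for columns the dataset does not define.
        unknown = set(cells) - set(self.columns)
        if unknown:
            raise KeyError(f"unknown columns: {sorted(unknown)}")
        # Columns left unset get an empty (None) cell, so every row has every column.
        self.rows.append({col: cells.get(col) for col in self.columns})

    def cell(self, row: int, column: str) -> Any:
        # A cell is addressed by (row index, column name).
        return self.rows[row][column]


qa = Dataset("support-qa", columns=["user query", "expected answer", "score"])
qa.add_row({
    "user query": "How do I reset my password?",
    "expected answer": "Use the 'Forgot password' link.",
})
print(len(qa.rows), qa.cell(0, "score"))  # prints: 1 None
```

The "score" cell stays empty here because no eval has run yet; in this sketch an eval would fill it in later, row by row.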

Purpose

  • Store and manage test/eval data in one place.
  • Run prompts and evals over the same structured data.
  • Compare model or prompt performance across experiments.
  • Support building datasets from product usage (e.g. from observed traces) as well as from uploads, the API, or synthetic generation.
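The idea of running evals over the same structured data and comparing results across experiments can be sketched in a few lines of Python. This is a hypothetical illustration, not the product's API: the two "variants" are plain functions standing in for prompt or model versions, and `exact_match` is a deliberately simple eval chosen for the example.

```python
from collections.abc import Callable

# Hypothetical dataset rows: one dict per example, keyed by column name.
rows = [
    {"user query": "2 + 2", "expected answer": "4"},
    {"user query": "capital of France", "expected answer": "Paris"},
    {"user query": "3 * 3", "expected answer": "9"},
]


# Stand-ins for two prompt/model variants under comparison (hypothetical).
def variant_a(query: str) -> str:
    answers = {"2 + 2": "4", "capital of France": "Paris", "3 * 3": "6"}
    return answers.get(query, "")


def variant_b(query: str) -> str:
    answers = {"2 + 2": "4", "capital of France": "Paris", "3 * 3": "9"}
    return answers.get(query, "")


def exact_match(output: str, expected: str) -> float:
    # Simple eval: 1.0 when the output matches the expected answer exactly.
    return 1.0 if output.strip() == expected.strip() else 0.0


def run_experiment(model: Callable[[str], str]) -> float:
    # Apply the same eval to every row and report the mean score.
    scores = [exact_match(model(r["user query"]), r["expected answer"]) for r in rows]
    return sum(scores) / len(scores)


results = {name: run_experiment(fn) for name, fn in [("A", variant_a), ("B", variant_b)]}
print(results)
```

Because both variants are scored against the same rows with the same eval, the difference in their mean scores reflects the variants themselves rather than differences in test data.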

