Overview

Create, manage and analyze datasets for AI model development and evaluation

About

Datasets are the core data layer for evaluation and experimentation in Future AGI. Each dataset is a table made of columns (e.g. “user query”, “expected answer”, “score”), rows (one row per example), and cells (the value a given row holds in a given column).
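To make the table structure concrete, here is a minimal in-memory sketch of a dataset as named columns plus one record per row. The `Dataset` class, `add_row`, and `cell` are hypothetical names for illustration only. This is not the Future AGI SDK.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Dataset:
    """Hypothetical in-memory model: named columns plus one dict per row (example)."""

    columns: list[str]
    rows: list[dict[str, Any]] = field(default_factory=list)

    def add_row(self, values: dict[str, Any]) -> None:
        # Reject rows whose keys do not match the declared columns exactly.
        if set(values) != set(self.columns):
            raise ValueError(f"expected columns {self.columns}, got {sorted(values)}")
        self.rows.append(dict(values))

    def cell(self, row_index: int, column: str) -> Any:
        # A cell is the value one row holds in one column.
        return self.rows[row_index][column]


ds = Dataset(columns=["user query", "expected answer"])
ds.add_row({"user query": "What is 2 + 2?", "expected answer": "4"})
ds.add_row({"user query": "Capital of France?", "expected answer": "Paris"})
print(len(ds.rows), ds.cell(1, "expected answer"))  # 2 Paris
```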

Datasets are the single source of truth that prompts, evaluations, experiments, and optimizations run on. You can create them from file uploads, the SDK, observed production traces, or synthetic generation.

Column Types

Datasets support two types of columns:

  • Static columns: Data you add directly, either manually, via file upload, or through the SDK. These hold your inputs, expected outputs, ground truth labels, or any fixed data.
  • Dynamic columns: Generated on the fly by running a prompt, evaluation, or model against your dataset rows. For example, running GPT-4o on every row creates a dynamic column with the model’s responses.

This distinction matters because dynamic columns let you add model outputs, evaluation scores, and computed fields to your dataset without duplicating data.
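The sketch below illustrates the static/dynamic split under simple assumptions. Static cells are plain values in each row. A dynamic column is produced by applying a function to every row and storing the result next to the existing cells. `add_dynamic_column` and `fake_model` are hypothetical helpers, and `fake_model` only stands in for a real model call such as GPT-4o.

```python
from typing import Any, Callable

Row = dict[str, Any]


def add_dynamic_column(rows: list[Row], name: str, fn: Callable[[Row], Any]) -> None:
    """Compute column `name` for every row from that row's existing cells (hypothetical)."""
    for row in rows:
        # The generated value is stored in the same row as the static cells,
        # so the inputs are not copied into a separate table.
        row[name] = fn(row)


def fake_model(row: Row) -> str:
    # Stand-in for a real model call; it just upper-cases the query.
    return row["user query"].upper()


rows: list[Row] = [
    {"user query": "hello", "expected answer": "HELLO"},  # static cells
    {"user query": "bye", "expected answer": "SEE YOU"},
]
add_dynamic_column(rows, "model response", fake_model)
print([r["model response"] for r in rows])  # ['HELLO', 'BYE']
```

Because the dynamic values are keyed by row, every model response stays aligned with the input and expected answer it came from.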

How Datasets Connect to Other Features

  • Evaluation: Run 70+ built-in metrics across your dataset rows to score model outputs. Results are stored as new columns.
  • Experiments: Compare two prompts or models by running both against the same dataset and comparing scores side by side.
  • Optimization: Use datasets as the training ground for prompt optimization algorithms.
  • Observe: Build datasets from production traces to test against real user queries.
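As a rough illustration of how evaluation and experiments build on the same rows, this sketch runs two stand-in variants over one dataset. It writes each variant's outputs and scores as new columns, then compares the mean scores side by side. The exact-match metric, `run_variant`, and both variants are hypothetical simplifications. They are not Future AGI's built-in metrics or experiment API.

```python
from statistics import mean
from typing import Any, Callable

Row = dict[str, Any]


def exact_match(output: str, expected: str) -> float:
    # Stand-in metric: 1.0 when the output equals the expected answer, else 0.0.
    return 1.0 if output.strip() == expected.strip() else 0.0


def run_variant(rows: list[Row], variant: str, generate: Callable[[str], str]) -> float:
    """Add an output column and a score column for one variant; return its mean score."""
    out_col, score_col = f"{variant} output", f"{variant} score"
    for row in rows:
        row[out_col] = generate(row["user query"])
        row[score_col] = exact_match(row[out_col], row["expected answer"])
    return mean(row[score_col] for row in rows)


rows: list[Row] = [
    {"user query": "2 + 2", "expected answer": "4"},
    {"user query": "3 + 3", "expected answer": "6"},
]

# Two hypothetical variants (stubs standing in for two prompts or models).
answers = {"2 + 2": "4", "3 + 3": "6"}
results = {
    "A": run_variant(rows, "A", lambda q: answers[q]),
    "B": run_variant(rows, "B", lambda q: "4"),
}
print(results)  # {'A': 1.0, 'B': 0.5}
```

Both variants see identical inputs, so any score difference comes from the variants themselves rather than from the data.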
