Overview
Create, manage, and analyze datasets for AI model development and evaluation
About
Datasets are the core data layer for evaluation and experimentation in Future AGI. Each dataset is a table with columns (e.g. “user query”, “expected answer”, “score”), rows (one row per example), and cells (the value in each column for each row).
Datasets are the single source of truth that prompts, evaluations, experiments, and optimizations run on. You can create them from file uploads, the SDK, observed production traces, or synthetic generation.
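To make the table model concrete, here is a minimal sketch in plain Python. It is not the Future AGI SDK: the column names, the sample values, and the helpers `cell` and `column_values` are all assumptions made up for illustration.

```python
# Illustrative only: a plain-Python stand-in for the table model,
# not the Future AGI SDK. All names and values here are hypothetical.
columns = ["user_query", "expected_answer", "score"]

# One dict per row; each key is a column, each value is a cell.
rows = [
    {"user_query": "What is 2 + 2?", "expected_answer": "4", "score": 1.0},
    {"user_query": "Capital of France?", "expected_answer": "Paris", "score": 0.5},
]


def cell(rows, row_index, column):
    """Return the value stored at one row/column intersection."""
    return rows[row_index][column]


def column_values(rows, column):
    """Return every cell in one column, in row order."""
    return [row[column] for row in rows]


print(cell(rows, 1, "expected_answer"))
print(column_values(rows, "score"))
```

Each row carries one value per column, so a cell is always addressed by a row position plus a column name.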
Column Types
Datasets support two types of columns:
- Static columns: Data you add directly: manually, via file upload, or through the SDK. These hold your inputs, expected outputs, ground truth labels, or any other fixed data.
- Dynamic columns: Generated on the fly by running a prompt, evaluation, or model against your dataset rows. For example, running GPT-4o on every row creates a dynamic column with the model’s responses.
This distinction matters because dynamic columns let you add model outputs, evaluation scores, and computed fields to your dataset without duplicating data.
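The difference between the two column types can be sketched in plain Python. This is an illustration under stated assumptions, not the platform's implementation: `add_dynamic_column`, `fake_model`, and `exact_match` are hypothetical names, and `fake_model` merely upper-cases the query in place of a real model call.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


def add_dynamic_column(rows: List[Row], name: str,
                       compute: Callable[[Row], object]) -> None:
    """Fill a new column by running `compute` once per row (in place)."""
    for row in rows:
        row[name] = compute(row)


def fake_model(row: Row) -> str:
    """Hypothetical stand-in for a model call: upper-cases the query."""
    return str(row["user_query"]).upper()


def exact_match(row: Row) -> float:
    """Hypothetical evaluation: 1.0 if the response equals the expected answer."""
    return 1.0 if row["model_response"] == row["expected_answer"] else 0.0


# Static columns: values supplied directly.
rows = [
    {"user_query": "paris", "expected_answer": "PARIS"},
    {"user_query": "rome", "expected_answer": "Roma"},
]

# Dynamic columns: values computed from each row.
add_dynamic_column(rows, "model_response", fake_model)
add_dynamic_column(rows, "exact_match", exact_match)

for row in rows:
    print(row)
```

The static cells are never rewritten. Each dynamic column is derived from the row it sits in, and a later dynamic column (here the score) can read an earlier one (the response).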
How Datasets Connect to Other Features
- Evaluation: Run 70+ built-in metrics across your dataset rows to score model outputs. Results are stored as new columns.
- Experiments: Compare two prompts or models by running both against the same dataset and comparing scores side by side.
- Optimization: Use datasets as the training ground for prompt optimization algorithms.
- Observe: Build datasets from production traces to test against real user queries.
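As a rough sketch of how an experiment uses a shared dataset, the plain-Python example below runs two variants over the same rows, stores each variant's outputs and scores as new columns, and compares the mean scores. Everything here is assumed for illustration: `variant_a`, `variant_b`, and `run_experiment` are hypothetical, the variants are simple functions rather than real prompts or models, and exact match stands in for a built-in metric.

```python
from statistics import mean

rows = [
    {"user_query": "2 + 2", "expected_answer": "4"},
    {"user_query": "3 + 5", "expected_answer": "8"},
    {"user_query": "10 - 7", "expected_answer": "3"},
]


def variant_a(query: str) -> str:
    """Hypothetical variant: handles both addition and subtraction."""
    left, op, right = query.split()
    result = int(left) + int(right) if op == "+" else int(left) - int(right)
    return str(result)


def variant_b(query: str) -> str:
    """Hypothetical flawed variant: treats every operator as addition."""
    left, _, right = query.split()
    return str(int(left) + int(right))


def run_experiment(rows, variants):
    """Run every variant on every row; store outputs and scores as columns."""
    for name, fn in variants.items():
        for row in rows:
            output = fn(row["user_query"])
            row[f"{name}_output"] = output
            # Exact match stands in for an evaluation metric.
            row[f"{name}_score"] = 1.0 if output == row["expected_answer"] else 0.0
    # Mean score per variant, for a side-by-side comparison.
    return {name: mean(row[f"{name}_score"] for row in rows) for name in variants}


summary = run_experiment(rows, {"variant_a": variant_a, "variant_b": variant_b})
print(summary)
```

Because both variants see identical rows, any difference in the mean score comes from the variant rather than from the data.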
Getting Started with Datasets
- Create New Dataset: Create datasets using SDK integration, file upload, or synthetic data generation
- Add Rows to Dataset: Learn how to add individual records or bulk import data rows
- Add Columns to Dataset: Extend your dataset structure with additional data fields
- Run Prompts: Test and execute prompts against your dataset entries
- Experiments: Design and conduct controlled experiments to compare approaches
- Annotate Dataset: Add metadata and annotations to enrich your dataset
Next Steps
- Understanding Datasets: Deeper dive into dataset concepts, column types, and best practices
- Generate Synthetic Data: Create realistic datasets from scratch when real data is unavailable
- Import from HuggingFace: Bring existing HuggingFace datasets into Future AGI