Future AGI Datasets: Structure, Column Types, and Lifecycle

Each row is one example; each column is an attribute. Datasets are the foundation for running prompts, evals, experiments, and optimizations in Future AGI.

About

A dataset in Future AGI is a table of structured data. Each row is one example (e.g. a user query and its expected answer). Each column is an attribute (e.g. “input”, “expected_output”, “model_response”, “score”). Datasets are the foundation for running prompts, evaluations, experiments, and optimizations.

Here’s what a simple dataset looks like:

input                           expected_output  model_response       is_correct
What is the capital of France?  Paris            Paris                true
Who wrote Hamlet?               Shakespeare      William Shakespeare  true
What is 2+2?                    4                The answer is 4      true

The first two columns (input, expected_output) are static columns: you supply their values yourself. The last two are dynamic columns that the platform fills in for each row: model_response by running a prompt, and is_correct by running an evaluation on that response.
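The split between static and dynamic columns can be illustrated in plain Python. This is a minimal sketch and does not use the Future AGI SDK: `fake_model` and `contains_expected` are hypothetical stand-ins for a prompt run and an evaluation, and the canned responses are copied from the table above.

```python
# Static columns: values supplied by hand.
rows = [
    {"input": "What is the capital of France?", "expected_output": "Paris"},
    {"input": "Who wrote Hamlet?", "expected_output": "Shakespeare"},
    {"input": "What is 2+2?", "expected_output": "4"},
]

# Hypothetical stand-in for an LLM call: returns the canned responses from the table.
CANNED = {
    "What is the capital of France?": "Paris",
    "Who wrote Hamlet?": "William Shakespeare",
    "What is 2+2?": "The answer is 4",
}


def fake_model(prompt: str) -> str:
    return CANNED[prompt]


def contains_expected(response: str, expected: str) -> bool:
    """Hypothetical evaluation: case-insensitive substring match."""
    return expected.lower() in response.lower()


# Dynamic columns: derived for each row from the static ones.
for row in rows:
    row["model_response"] = fake_model(row["input"])
    row["is_correct"] = contains_expected(row["model_response"], row["expected_output"])
```

A substring match is used here only because it reproduces the is_correct values in the table; a real evaluation would likely be more careful.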

Structure

Every dataset has three core components:

Rows: Each row is one data point or test case. You can add rows manually, import from files, generate them synthetically, or pull them from production traces.
Columns: Each column defines an attribute. Columns have a name, a data type (text, number, boolean, JSON, etc.), and are either static (you provide the data) or dynamic (the platform generates it).
Metadata: Each dataset has a name, description, and organization-level permissions that control who can view and edit it.
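As a rough mental model, the three components could be represented with dataclasses. The `Column` and `Dataset` classes below are illustrative assumptions, not types from the Future AGI SDK, and permissions are left out for brevity.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Column:
    name: str
    data_type: str         # e.g. "text", "number", "boolean", "json"
    dynamic: bool = False  # False = static (you provide it), True = generated


@dataclass
class Dataset:
    # Metadata
    name: str
    description: str = ""
    # Schema and data
    columns: list[Column] = field(default_factory=list)
    rows: list[dict[str, Any]] = field(default_factory=list)

    def add_row(self, **values: Any) -> None:
        """Append one row; columns without a value are stored as None."""
        known = [c.name for c in self.columns]
        unknown = set(values) - set(known)
        if unknown:
            raise KeyError(f"unknown columns: {sorted(unknown)}")
        self.rows.append({name: values.get(name) for name in known})


ds = Dataset(
    name="qa-smoke",
    description="Small question-answering test set",
    columns=[
        Column("input", "text"),
        Column("expected_output", "text"),
        Column("model_response", "text", dynamic=True),
    ],
)
ds.add_row(input="What is 2+2?", expected_output="4")
```

The dynamic column starts out empty (None) and would be filled in later by a prompt run.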

How to Create a Dataset

There are several ways to get data into a dataset:

Manual creation: Define the structure and add rows through the UI or SDK.
File import: Upload CSV, Excel, JSON, or JSONL files.
Synthetic generation: Describe the schema and let the platform generate realistic test data.
From HuggingFace: Import existing datasets from HuggingFace directly.
From production traces: Convert observed production data from the Observe module into datasets for regression testing.
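Before uploading a file, it can help to check that it parses into the rows you expect. The sketch below uses only the Python standard library to read CSV and JSONL text into row dictionaries; the helper names are hypothetical, and the upload step itself is not shown because it depends on the platform.

```python
import csv
import io
import json


def rows_from_csv(text: str) -> list[dict[str, str]]:
    """Parse CSV text; the header line becomes the column names."""
    return list(csv.DictReader(io.StringIO(text)))


def rows_from_jsonl(text: str) -> list[dict]:
    """Parse JSONL text: one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


csv_text = "input,expected_output\nWhat is 2+2?,4\n"
jsonl_text = '{"input": "Who wrote Hamlet?", "expected_output": "Shakespeare"}\n'

rows = rows_from_csv(csv_text) + rows_from_jsonl(jsonl_text)
```

Note that CSV values always arrive as strings, while JSONL preserves numbers and booleans, so the two formats can produce different data types for the same column.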

Dataset Lifecycle

1. Create

Start with a schema (columns and types) and populate it with data using any of the methods above.

2. Enrich

Add more columns to your dataset over time:

Run prompts: Send each row through an LLM and store the responses as a new column.
Run evaluations: Score model outputs using 70+ built-in metrics. Results are stored as new columns.
Add annotations: Manually label rows with custom tags and scores. Future AGI also supports auto-annotations that learn from your labels.

3. Experiment

Use the same dataset to compare different prompts, models, or configurations side by side. Each experiment run adds new columns so you can see results next to each other.
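The idea of side-by-side columns can be sketched as follows. Everything here is illustrative: `prompt_a` and `prompt_b` are hypothetical configurations with canned responses, and the correctness check is a simple substring match rather than a Future AGI evaluation.

```python
rows = [
    {"input": "What is 2+2?", "expected_output": "4"},
    {"input": "Who wrote Hamlet?", "expected_output": "Shakespeare"},
]

# Hypothetical configurations, each with canned responses per question.
CANNED = {
    "prompt_a": {
        "What is 2+2?": "4",
        "Who wrote Hamlet?": "Christopher Marlowe",
    },
    "prompt_b": {
        "What is 2+2?": "The answer is 4",
        "Who wrote Hamlet?": "William Shakespeare",
    },
}


def run_config(name: str, question: str) -> str:
    return CANNED[name][question]


accuracy = {}
for name in CANNED:
    for row in rows:
        response = run_config(name, row["input"])
        # Each run adds its own columns next to the existing ones.
        row[f"{name}_response"] = response
        row[f"{name}_correct"] = row["expected_output"].lower() in response.lower()
    accuracy[name] = sum(row[f"{name}_correct"] for row in rows) / len(rows)
```

Because every configuration writes to its own columns, the original inputs stay untouched and each row shows all results at once.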

4. Maintain

Datasets evolve over time. You can:

Add or remove columns without disrupting existing data
Add new rows as you discover edge cases
Archive or delete old datasets to keep your workspace clean
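For rows held as dictionaries, schema changes that leave existing data alone might look like the sketch below. The helpers `add_column` and `remove_column` are hypothetical and only model the behavior described above; they are not SDK functions.

```python
rows = [
    {"input": "What is 2+2?", "expected_output": "4", "draft_score": 0.4},
]


def add_column(rows: list[dict], name: str, default=None) -> None:
    """Add a column to every row without overwriting values already present."""
    for row in rows:
        row.setdefault(name, default)


def remove_column(rows: list[dict], name: str) -> None:
    """Drop a column from every row; other columns are unaffected."""
    for row in rows:
        row.pop(name, None)


add_column(rows, "notes")
remove_column(rows, "draft_score")

# A newly discovered edge case is just another row with the same columns.
rows.append({"input": "What is 0/0?", "expected_output": "undefined", "notes": "edge case"})
```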

Next Steps

Static Columns: Data you add directly to your dataset
Dynamic Columns: Data generated by prompts, evaluations, or models
Synthetic Data: Generate realistic test data from a schema
Create a Dataset: Get started with your first dataset
