Understanding Dataset

A dataset organizes data in rows and columns, where each row represents an instance and each column defines an attribute of that instance. Datasets provide the context, inputs, and evaluation references needed for prompt execution and iterative improvement.

Core Components of a Dataset

Dataset Name: A user-defined label to distinguish different datasets.
Column Order & Configuration: Maintains the structure of dataset columns, data types, and processing configurations.
Organization & Permissions: Defines access control, ensuring datasets are linked to specific teams or projects.
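
To make the three components concrete, here is a minimal Python sketch of a dataset as a name, an ordered column configuration, and an owning organization. It is not Future AGI's SDK or internal schema: the `Column` and `Dataset` classes, their field names, and the sample values are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Column:
    name: str
    dtype: type  # expected Python type for values in this column


@dataclass
class Dataset:
    # Hypothetical model of the components described above; not a real API.
    name: str                                             # user-defined label
    organization: str                                     # owning team or project
    columns: list[Column] = field(default_factory=list)   # order is significant
    rows: list[dict[str, Any]] = field(default_factory=list)

    def add_row(self, row: dict[str, Any]) -> None:
        expected = [c.name for c in self.columns]
        if set(row) != set(expected):
            raise ValueError(f"row keys {sorted(row)} do not match columns {expected}")
        for col in self.columns:
            if not isinstance(row[col.name], col.dtype):
                raise TypeError(f"column {col.name!r} expects {col.dtype.__name__}")
        # Store values in the configured column order, not the caller's order.
        self.rows.append({name: row[name] for name in expected})


qa = Dataset(
    name="support-qa",
    organization="team-alpha",
    columns=[Column("question", str), Column("expected_answer", str)],
)
qa.add_row({"expected_answer": "Use the reset link.", "question": "How do I reset my password?"})
```

Each row is one instance, and the column list fixes both the attribute names and their order, so rows come out in a consistent shape regardless of how they were supplied.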

Dataset Lifecycle

The dataset system is designed to support a full lifecycle of data management, ensuring flexibility, scalability, and usability across different AI workflows.

1. Creation

Datasets can be created through multiple methods:

Manual Creation: Users can create datasets by defining structure and adding data manually.
Automated Generation: The system can generate synthetic datasets for controlled testing.
Importing from External Sources: Future AGI supports imports from CSV, Excel, JSON, JSONL, and Hugging Face datasets.
Derived from Experiments: Users can convert experiment results into datasets, allowing further analysis and refinements.
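
As a rough illustration of what importing involves, the sketch below parses CSV and JSONL text into the same row-of-dicts shape using only the Python standard library. It does not use or represent Future AGI's import feature; the helper names `rows_from_csv` and `rows_from_jsonl` and the sample data are assumptions made for this example.

```python
import csv
import io
import json


def rows_from_csv(text: str) -> list[dict[str, str]]:
    # The header line supplies the column names; every value is read as a string.
    return list(csv.DictReader(io.StringIO(text)))


def rows_from_jsonl(text: str) -> list[dict]:
    # JSONL holds one JSON object per non-empty line.
    return [json.loads(line) for line in text.splitlines() if line.strip()]


csv_text = "question,expected_answer\nWhat is 2+2?,4\n"
jsonl_text = '{"question": "Capital of France?", "expected_answer": "Paris"}\n'

rows = rows_from_csv(csv_text) + rows_from_jsonl(jsonl_text)
```

Note that CSV carries no type information, so the value `4` arrives as the string `"4"`, while JSONL preserves JSON types. An import step typically has to reconcile that difference against the dataset's column configuration.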

2. Enrichment

Datasets can be enriched with additional metadata and evaluations, including:

Annotations: Users can label a dataset manually, defining their own rules and label sets. Future AGI also provides auto-annotation, which learns from human-in-the-loop labels and helps annotate the remaining data points.
Evaluations: Users can run Future AGI Evaluations on a dataset, for example to identify and filter out noisy rows.
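
Enrichment can be pictured as adding columns to existing rows: one column for labels, another for evaluation scores that can then drive filtering. The following is a minimal sketch under that assumption. The `annotate`, `evaluate`, and `toy_scorer` functions are hypothetical, and the scoring rule is a deliberately trivial stand-in, not a Future AGI evaluation.

```python
rows = [
    {"text": "Great product, works well."},
    {"text": "asdf qwer zxcv"},
]


def annotate(rows, labels):
    # Attach one human-provided label per row as a new "label" column.
    if len(rows) != len(labels):
        raise ValueError("need exactly one label per row")
    return [{**row, "label": label} for row, label in zip(rows, labels)]


def evaluate(rows, scorer, column="eval_score"):
    # Attach the scorer's output to each row as a new column.
    return [{**row, column: scorer(row)} for row in rows]


def toy_scorer(row):
    # Placeholder rule: text ending in sentence punctuation counts as clean.
    return 1.0 if row["text"].rstrip().endswith((".", "!", "?")) else 0.0


labeled = annotate(rows, ["positive", "noise"])
scored = evaluate(labeled, toy_scorer)

# Keep only rows whose score clears the threshold.
kept = [row for row in scored if row["eval_score"] >= 0.5]
```

Both helpers return new rows rather than mutating their input, so the original data stays available for comparison after enrichment.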

3. Maintenance

Datasets are dynamic and evolve over time. The system enables:

Schema Updates: Columns and metadata can be modified without disrupting existing data.
Archival & Cleanup: Old datasets can be archived, merged, or deleted, keeping workflows optimized.
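
One way to read "without disrupting existing data" is that a schema change should leave every existing value intact. The sketch below shows that property for adding and renaming a column on rows stored as dictionaries. The `add_column` and `rename_column` helpers are hypothetical and do not describe how Future AGI implements schema updates.

```python
def add_column(rows, name, default=None):
    # New column is appended to each row; existing values are untouched.
    if any(name in row for row in rows):
        raise ValueError(f"column {name!r} already exists")
    return [{**row, name: default} for row in rows]


def rename_column(rows, old, new):
    # Only the key changes; values and column order are preserved.
    if any(new in row for row in rows):
        raise ValueError(f"column {new!r} already exists")
    return [{(new if key == old else key): value for key, value in row.items()}
            for row in rows]


original = [{"question": "What is 2+2?", "answer": "4"}]

updated = add_column(original, "difficulty", default="unrated")
updated = rename_column(updated, "answer", "expected_answer")
```

After both steps the row keeps its original values under the columns `question`, `expected_answer`, and `difficulty`, in that order, and `original` is unchanged.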
