Core Components of a Dataset
- Dataset Name: A user-defined label to distinguish different datasets.
- Column Order & Configuration: Maintains the structure of dataset columns, data types, and processing configurations.
- Organization & Permissions: Defines access control, ensuring datasets are linked to specific teams or projects.
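To make the three components concrete, here is a minimal sketch of how they could be modeled in plain Python. The `Column` and `Dataset` classes, their field names, and the `can_access` check are hypothetical illustrations, not Future AGI's actual data model or SDK.

```python
from dataclasses import dataclass, field


@dataclass
class Column:
    # Hypothetical column descriptor: name, data type, and processing config.
    name: str
    dtype: str
    config: dict = field(default_factory=dict)


@dataclass
class Dataset:
    # Hypothetical container mirroring the three components listed above.
    name: str                      # user-defined label
    columns: list[Column]          # list order doubles as the column order
    organization: str              # owning team or project
    allowed_users: set[str] = field(default_factory=set)

    def column_order(self) -> list[str]:
        # Column names in their stored order.
        return [c.name for c in self.columns]

    def can_access(self, user: str) -> bool:
        # Simplified permission check: membership in an allow-list.
        return user in self.allowed_users


ds = Dataset(
    name="support-tickets-v1",
    columns=[Column("question", "text"), Column("answer", "text"), Column("score", "float")],
    organization="ml-team",
    allowed_users={"alice"},
)
```

In this sketch, keeping columns in an ordered list means the column order and the per-column configuration live in one place.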
Dataset Lifecycle
The dataset system is designed to support the full lifecycle of data management, ensuring flexibility, scalability, and usability across different AI workflows.
1. Creation
Datasets can be created through multiple methods:
- Manual Creation: Users can create datasets by defining the structure and adding data manually.
- Automated Generation: The system can generate synthetic datasets for controlled testing.
- Importing from External Sources: Future AGI supports imports from CSV, Excel, JSON, JSONL, and Hugging Face datasets.
- Derived from Experiments: Users can convert experiment results into datasets, allowing further analysis and refinement.
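As an illustration of the import path, the sketch below normalizes CSV and JSONL input into the same row format (a list of dictionaries) using only the Python standard library. The helper names `rows_from_csv` and `rows_from_jsonl` are assumptions made for this example and do not reflect how Future AGI performs imports internally.

```python
import csv
import io
import json


def rows_from_csv(text: str) -> list[dict]:
    # Each CSV record becomes a dict keyed by the header line.
    return list(csv.DictReader(io.StringIO(text)))


def rows_from_jsonl(text: str) -> list[dict]:
    # Each non-empty line holds one JSON object.
    return [json.loads(line) for line in text.splitlines() if line.strip()]


csv_text = "question,answer\nWhat is 2+2?,4\n"
jsonl_text = '{"question": "What is 2+2?", "answer": "4"}\n'

csv_rows = rows_from_csv(csv_text)
jsonl_rows = rows_from_jsonl(jsonl_text)
```

Note that CSV values always arrive as strings, whereas JSONL preserves JSON types, so a real import step would typically need a type-coercion pass driven by the column configuration.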
2. Enrichment
Datasets can be enriched with additional metadata and evaluations, including:
- Annotations: Users can label a dataset manually, defining their own rules and label set. Future AGI also provides auto-annotation, which learns from human-in-the-loop labels and helps annotate the remaining data points.
- Evaluations: Users can run Future AGI Evaluations on a dataset, for example to identify and filter out noisy data points.
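The filtering idea can be sketched as follows, assuming each row already carries a numeric evaluation score. The `eval_score` key, the `0.5` threshold, and the `filter_by_score` helper are hypothetical; in practice the scores would come from whichever evaluation was run on the dataset.

```python
def filter_by_score(rows, score_key="eval_score", threshold=0.5):
    # Keep rows whose score meets the threshold.
    # Rows that have not been evaluated (no score) are dropped.
    return [
        row for row in rows
        if row.get(score_key) is not None and row[score_key] >= threshold
    ]


rows = [
    {"text": "clear answer", "eval_score": 0.9},
    {"text": "garbled ###", "eval_score": 0.2},
    {"text": "not yet evaluated"},
]
kept = filter_by_score(rows)
```

Dropping unevaluated rows is one possible policy; keeping them for later review would be equally reasonable, depending on the workflow.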
4. Maintenance
Datasets are dynamic and evolve over time. The system enables:
- Schema Updates: Columns and metadata can be modified without disrupting existing data.
- Archival & Cleanup: Old datasets can be archived, merged, or deleted, keeping workflows optimized.
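A non-disruptive schema update can be illustrated with rows stored as dictionaries: adding or renaming a column produces new rows while leaving the existing values untouched. The `add_column` and `rename_column` helpers below are assumptions for illustration only, not part of any Future AGI API.

```python
def add_column(rows, name, default=None):
    # Return new rows with an extra column; existing values are preserved.
    return [{**row, name: row.get(name, default)} for row in rows]


def rename_column(rows, old, new):
    # Return new rows with `old` renamed to `new`, keeping column order.
    return [{(new if key == old else key): value for key, value in row.items()} for row in rows]


rows = [{"question": "What is 2+2?", "answer": "4"}]
rows_v2 = add_column(rows, "label", default="unreviewed")
rows_v3 = rename_column(rows_v2, "answer", "response")
```

Both helpers return new lists rather than mutating their input, so the original rows remain available if a change needs to be rolled back.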