A dataset in Future AGI is a structured collection of data that serves as the foundation for executing LLM prompts, conducting experiments, and optimizing AI-generated responses.
It organizes data in rows and columns: each row represents one instance, and each column defines an attribute of that instance. Datasets provide the context, inputs, and evaluation references needed for prompt execution and iterative improvement.
The dataset system is designed to support the full lifecycle of data management, from creation through enrichment to ongoing updates, so that datasets stay flexible, scalable, and usable across different AI workflows.
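To make the row-and-column model concrete, here is a minimal sketch in plain Python. It does not use the Future AGI SDK; the column names (`question`, `context`, `expected_answer`), the sample rows, and the `render_prompt` helper are all hypothetical, chosen only to illustrate how one row can supply both the inputs for a prompt and a reference for evaluation.

```python
from typing import Any

# Hypothetical columns: each one is an attribute shared by every row.
columns = ["question", "context", "expected_answer"]

# Each row is one instance; its keys match the columns above.
rows: list[dict[str, Any]] = [
    {
        "question": "What is the capital of France?",
        "context": "France is a country in Western Europe.",
        "expected_answer": "Paris",
    },
    {
        "question": "How many days are in a week?",
        "context": "A week is a unit of time.",
        "expected_answer": "7",
    },
]


def render_prompt(template: str, row: dict[str, Any]) -> str:
    """Fill a prompt template with values from a single row.

    Hypothetical helper: columns not named in the template are ignored.
    """
    return template.format(**row)


# The prompt uses two columns as inputs; the third column is kept aside
# as the evaluation reference for the generated response.
template = "Context: {context}\nQuestion: {question}"
prompts = [render_prompt(template, row) for row in rows]
references = [row["expected_answer"] for row in rows]
```

In this sketch, `prompts` holds one rendered prompt per row and `references` holds the matching expected answers, which mirrors the split between prompt inputs and evaluation references described above.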
Datasets can be created through multiple methods:
Datasets can be enriched with additional metadata and evaluations, including:
Datasets are dynamic and evolve over time. The system enables: