Create New Dataset

Learn how to create datasets and run experiments on them.

What it is

Creating a new dataset adds a table under your organization, either blank or filled from a source. The dataset has a name and, optionally, columns and rows, and you can use it to run prompts, evals, experiments, and optimization. The dataset is the container; you can keep editing it after creation.

Use cases

  • Evaluate a prompt or model – You need a set of inputs and (optionally) expected outputs or scores. Creating a dataset gives you that table so you can run prompts and evals on it.
  • Reuse production data – You have traces/spans from your app and want to turn them into eval data. Creating a dataset from “observe” (or adding to a new one) turns selected traces into rows.
  • Import existing data – You already have test cases in CSV/Excel or on Hugging Face. Creating a dataset from file or Hugging Face imports that data so you don’t have to type it in.
  • Generate test data – You don’t have real data yet but know the kind of examples you need. Creating a synthetic dataset (with description, objective, patterns) generates rows for you.
  • Branch from an experiment – You ran an experiment and want to keep that snapshot as a standalone dataset to edit or reuse. Creating a dataset from that experiment copies it into a new dataset.

How to

Choose how you want to create your dataset:

Use the SDK to import your data into Future AGI.

Assign Dataset Name

Assign a name to your dataset and click on “Next” to proceed.

Add Rows via SDK

You will see a screen with a code snippet for adding rows to your dataset.

# pip install futureagi

import os
from fi.datasets import Dataset
from fi.datasets.types import (
    Cell,
    Column,
    DatasetConfig,
    DataTypeChoices,
    ModelTypes,
    Row,
    SourceChoices,
)

# Set environment variables
os.environ["FI_API_KEY"] = "<fi_api_key>"
os.environ["FI_SECRET_KEY"] = "<fi_secret_key>"

# Get existing dataset
config = DatasetConfig(name="my-dataset", model_type=ModelTypes.GENERATIVE_LLM)
dataset = Dataset(dataset_config=config)
dataset = Dataset.get_dataset_config("my-dataset")

# Define columns
columns = [
    Column(
        name="user_query",
        data_type=DataTypeChoices.TEXT,
        source=SourceChoices.OTHERS
    ),
    Column(
        name="response_quality",
        data_type=DataTypeChoices.INTEGER,
        source=SourceChoices.OTHERS
    ),
    Column(
        name="is_helpful",
        data_type=DataTypeChoices.BOOLEAN,
        source=SourceChoices.OTHERS
    )
]

# Define rows
rows = [
    Row(
        order=1,
        cells=[
            Cell(column_name="user_query", value="What is machine learning?"),
            Cell(column_name="response_quality", value=8),
            Cell(column_name="is_helpful", value=True)
        ]
    ),
    Row(
        order=2,
        cells=[
            Cell(column_name="user_query", value="Explain quantum computing"),
            Cell(column_name="response_quality", value=9),
            Cell(column_name="is_helpful", value=True)
        ]
    )
]

try:
    # Add columns and rows to dataset
    dataset = dataset.add_columns(columns=columns)
    dataset = dataset.add_rows(rows=rows)
    print("✓ Data added successfully")

except Exception as e:
    print(f"Failed to add data: {e}")

import { Dataset, DataTypeChoices, createRow, createCell } from "@future-agi/sdk";

process.env["FI_API_KEY"] = "<fi_api_key>";
process.env["FI_SECRET_KEY"] = "<fi_secret_key>";

async function main() {
    try {
        const dsName = "my-dataset";

        // 1) Open the dataset (fetch if it exists, create if not)
        const dataset = await Dataset.open(dsName);

        // 2) Define columns
        const columns = [
            { name: "user_query", dataType: DataTypeChoices.TEXT },
            { name: "response_quality", dataType: DataTypeChoices.INTEGER },
            { name: "is_helpful", dataType: DataTypeChoices.BOOLEAN },
        ];

        // 3) Define rows
        const rows = [
            createRow({
                cells: [
                    createCell({ columnName: "user_query", value: "What is machine learning?" }),
                    createCell({ columnName: "response_quality", value: 8 }),
                    createCell({ columnName: "is_helpful", value: true }),
                ],
            }),
            createRow({
                cells: [
                    createCell({ columnName: "user_query", value: "Explain quantum computing" }),
                    createCell({ columnName: "response_quality", value: 9 }),
                    createCell({ columnName: "is_helpful", value: true }),
                ],
            }),
        ];

        // 4) Add columns and rows
        await dataset.addColumns(columns);
        await dataset.addRows(rows);
        console.log("✓ Data added successfully");
    } catch (err) {
        console.error("Failed to add data:", err);
    }
}

main();

curl --request POST \
    --url https://api.futureagi.com/model-hub/develops/<dataset_id>/add_columns/ \
    --header 'X-Api-Key: <fi_api_key>' \
    --header 'X-Secret-Key: <fi_secret_key>' \
    --header 'content-type: application/json' \
    --data '{
"new_columns_data": [
    {
    "name": "user_query",
    "data_type": "text"
    },
    {
    "name": "response_quality",
    "data_type": "integer"
    },
    {
    "name": "is_helpful",
    "data_type": "boolean"
    }
]
}'
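
If you prefer to issue the same add_columns request from Python without the SDK, it can be sketched with the standard library alone. The endpoint, header names, and payload below are copied from the cURL example above; the helper name build_add_columns_request is hypothetical and not part of the SDK. The sketch only builds the request and leaves sending it to you.

```python
import json
import urllib.request

# Base URL taken from the cURL example above.
API_BASE = "https://api.futureagi.com/model-hub/develops"


def build_add_columns_request(dataset_id, api_key, secret_key, columns):
    """Build (but do not send) the POST request shown in the cURL example.

    Hypothetical helper for illustration; not part of the Future AGI SDK.
    """
    payload = {"new_columns_data": columns}
    return urllib.request.Request(
        url=f"{API_BASE}/{dataset_id}/add_columns/",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "X-Api-Key": api_key,
            "X-Secret-Key": secret_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )


columns = [
    {"name": "user_query", "data_type": "text"},
    {"name": "response_quality", "data_type": "integer"},
    {"name": "is_helpful", "data_type": "boolean"},
]
request = build_add_columns_request(
    "<dataset_id>", "<fi_api_key>", "<fi_secret_key>", columns
)
print(request.get_method(), request.full_url)

# To actually send it, replace the placeholders and call:
# urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to inspect the URL and payload before any data leaves your machine.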

Note

Click here to access API Key and Secret Key.

Upload CSV, JSON or JSONL files to create a dataset.
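
If your test cases live in a CSV file and you would rather upload JSONL, a conversion can be sketched as below. It assumes the general JSONL convention of one JSON object per line with column names as keys; this page does not state the exact schema the uploader expects, so treat the layout as an assumption. The function name csv_to_jsonl and the file names are hypothetical.

```python
import csv
import json
import tempfile
from pathlib import Path


def csv_to_jsonl(csv_path, jsonl_path):
    """Convert a CSV file with a header row into JSONL (one JSON object per line).

    Values are written as strings because the csv module does not infer types.
    """
    count = 0
    with open(csv_path, newline="", encoding="utf-8") as src, \
            open(jsonl_path, "w", encoding="utf-8") as dst:
        for record in csv.DictReader(src):
            dst.write(json.dumps(record, ensure_ascii=False) + "\n")
            count += 1
    return count


# Demo with a throwaway CSV file in a temporary directory.
workdir = Path(tempfile.mkdtemp())
csv_file = workdir / "test_cases.csv"
csv_file.write_text(
    "user_query,response_quality\n"
    "What is machine learning?,8\n"
    "Explain quantum computing,9\n",
    encoding="utf-8",
)
jsonl_file = workdir / "test_cases.jsonl"
written = csv_to_jsonl(csv_file, jsonl_file)
print(f"Wrote {written} rows to {jsonl_file.name}")
```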

Generate synthetic data and run experiments on it.

Add Details

Provide basic details about the dataset you want to generate.

| Property | Description |
|---|---|
| Name | Name of the dataset |
| Knowledge Base (optional) | Select which knowledge base you want to use |
| Description | Describe the dataset you want to generate |
| Objective (optional) | Use case of the dataset |
| Pattern (optional) | Style, tone or behavioral traits of the generated dataset |
| No. of Rows | Row count of the generated dataset (minimum 10 rows) |
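
To plan these fields before opening the form, it can help to write them down as a small structure and check them. The sketch below is only a planning aid: SyntheticDatasetDetails and its field names are hypothetical and not an SDK type, and it assumes that the fields not marked optional above (Name, Description, No. of Rows) are required.

```python
from dataclasses import dataclass
from typing import Optional

MIN_ROWS = 10  # minimum row count stated in the field list above


@dataclass
class SyntheticDatasetDetails:
    """Hypothetical container mirroring the form fields; not an SDK type."""

    name: str
    description: str
    num_rows: int
    knowledge_base: Optional[str] = None
    objective: Optional[str] = None
    pattern: Optional[str] = None

    def validate(self):
        """Return a list of problems; an empty list means nothing is missing."""
        problems = []
        if not self.name.strip():
            problems.append("Name is required")
        if not self.description.strip():
            problems.append("Description is required")
        if self.num_rows < MIN_ROWS:
            problems.append(f"No. of Rows must be at least {MIN_ROWS}")
        return problems


details = SyntheticDatasetDetails(
    name="support-queries",
    description="Customer support questions about billing",
    num_rows=5,
)
print(details.validate())
```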

Add Column Properties

Define the type and properties of each column.

| Property | Description |
|---|---|
| Column Name | Name of the column |
| Column Type | Type of the column (available types: text, boolean, integer, float, json, array, datetime) |

Add Description

Now add a description for each column. Describe in detail what values you want in that column.

Create Dataset

Click the “Create Dataset” button to generate the dataset. Your synthetic dataset will be generated in a few seconds and will be available in your dataset dashboard.

If you are not satisfied with the generated dataset, click the “Configure Synthetic Data” button to edit the fields and generate the dataset again.

Manually create a dataset from scratch.

Provide Basic Details

To create a dataset manually from scratch, provide the name you want to assign and the number of columns and rows you want. An empty dataset then opens with the name you assigned and with empty rows and columns.

Populating the dataset

You can populate the dataset by double-clicking the empty cell you want to fill. This opens an editor where you can enter the value for that cell.

Search Hugging Face Dataset

Search for the dataset you want to import from Hugging Face. You can refine the search with the filters on the left side.

Import Dataset

Click the dataset you want to import. This opens a panel where you can select which subset and split to import.

You can also select the number of rows you want to import. By default, all rows are imported.
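
If you want to preview a dataset locally before importing it, the same three choices (subset, split, and row count) can be expressed with the Hugging Face `datasets` package, which accepts a slice such as `train[:100]` as the split. The sketch below only builds that split string; build_split_spec is a hypothetical helper, and the commented load_dataset call assumes the `datasets` package is installed and uses placeholder names.

```python
def build_split_spec(split, max_rows=None):
    """Return a Hugging Face split string, optionally limited to the first max_rows rows."""
    if max_rows is None:
        # No limit: all rows, matching the default described above.
        return split
    if max_rows <= 0:
        raise ValueError("max_rows must be positive")
    return f"{split}[:{max_rows}]"


print(build_split_spec("train"))
print(build_split_spec("validation", 50))

# With the `datasets` package installed (pip install datasets), a local
# preview could look like this; the dataset and subset names are placeholders:
# from datasets import load_dataset
# preview = load_dataset(
#     "<dataset_name>", name="<subset>", split=build_split_spec("train", 100)
# )
```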

Click the “Start Experimenting” button to start the import. Once it finishes, the dataset appears in your dataset dashboard.

You can create a subset from an existing dataset.

Choose the existing dataset

Assign a name to the new dataset and, from the dropdown, choose the existing dataset you want to create a subset from. You can import the dataset in two ways:

  1. Import Data: Imports only the original columns from the existing dataset.
  2. Import Data and Prompt Configuration: Imports the original columns along with the prompt columns from that dataset.

Map the columns

Choose which columns you want to use from the existing dataset. You can also assign a new name to each column you keep.
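
Conceptually, column mapping is a select-and-rename step. The sketch below illustrates the idea on plain Python dictionaries; it is not how the platform implements it, and map_columns, the sample rows, and the new column names are all hypothetical.

```python
def map_columns(rows, column_map):
    """Keep only the columns named in column_map and rename them to the mapped names."""
    mapped = []
    for row in rows:
        missing = [col for col in column_map if col not in row]
        if missing:
            raise KeyError(f"Row is missing columns: {missing}")
        mapped.append({new: row[old] for old, new in column_map.items()})
    return mapped


source_rows = [
    {"user_query": "What is machine learning?", "response_quality": 8, "is_helpful": True},
    {"user_query": "Explain quantum computing", "response_quality": 9, "is_helpful": True},
]

# Keep two of the three columns and give them new names.
subset = map_columns(
    source_rows, {"user_query": "question", "response_quality": "score"}
)
print(subset[0])
```

Columns left out of the mapping (here is_helpful) are simply not carried over into the new dataset.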

Import the dataset

Click the “Add” button to create the new dataset. It then appears in your dataset dashboard.
