Datasets

Create, populate, and manage datasets for evaluation. Upload CSV/JSON files, import from HuggingFace, add LLM-generated columns, and run evaluations at scale.

📝 TL;DR
  • pip install futureagi (already included if you installed ai-evaluation)
  • Create datasets from scratch, CSV/JSON files, or HuggingFace
  • Chain operations: create → add columns → add rows → run evals → download results

Datasets hold your test data and evaluation scores: create one, fill it with data, run evals across every row, and download the results. For the full platform guide, see the Dataset docs.

Note

Requires pip install futureagi and FI_API_KEY + FI_SECRET_KEY in your environment. If you installed ai-evaluation, you already have futureagi.

Quick Example

from fi.datasets import Dataset, DatasetConfig

# Create a dataset
config = DatasetConfig(name="my-eval-data", model_type="GenerativeLLM")
dataset = Dataset(dataset_config=config)
dataset.create()

# Add columns and rows
dataset.add_columns([
    {"name": "question", "data_type": "text"},
    {"name": "answer", "data_type": "text"},
])

dataset.add_rows([
    {"cells": [{"column_name": "question", "value": "What is Python?"}, {"column_name": "answer", "value": "A programming language."}]},
    {"cells": [{"column_name": "question", "value": "What is 2+2?"}, {"column_name": "answer", "value": "4"}]},
])

# Download as a pandas DataFrame
df = dataset.download(load_to_pandas=True)
print(df)
#           question                   answer
# 0  What is Python?  A programming language.
# 1     What is 2+2?                        4

DatasetConfig

Every dataset needs a config with a name and model type.

from fi.datasets import DatasetConfig

config = DatasetConfig(
    name="my-dataset",          # required, max 255 chars
    model_type="GenerativeLLM", # "GenerativeLLM" or "GenerativeImage"
)

Creating Datasets

Empty dataset

from fi.datasets import Dataset, DatasetConfig

config = DatasetConfig(name="my-dataset", model_type="GenerativeLLM")
dataset = Dataset(dataset_config=config).create()

From a CSV or JSON file

dataset = Dataset(dataset_config=DatasetConfig(name="from-file", model_type="GenerativeLLM"))
dataset.create(source="path/to/data.csv")
# Supported: .csv, .json, .jsonl, .xlsx, .xls
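If you build test data programmatically, you can write it to one of the supported formats first and pass the path to create(). A minimal sketch using the standard csv module (the file name and columns are illustrative):

```python
import csv

# Rows in plain-dict form; keys become the CSV header / dataset columns.
rows = [
    {"question": "What is Python?", "answer": "A programming language."},
    {"question": "What is 2+2?", "answer": "4"},
]

with open("data.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["question", "answer"])
    writer.writeheader()
    writer.writerows(rows)

# Then upload it:
# dataset.create(source="data.csv")
```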

From HuggingFace

from fi.datasets.types import HuggingfaceDatasetConfig

hf = HuggingfaceDatasetConfig(name="squad", subset="default", split="train", num_rows=100)
dataset = Dataset(dataset_config=DatasetConfig(name="squad-sample", model_type="GenerativeLLM"))
dataset.create(source=hf)

Columns and Rows

Adding columns

Pass a list of dicts with name and data_type.

dataset.add_columns([
    {"name": "input", "data_type": "text"},
    {"name": "output", "data_type": "text"},
    {"name": "score", "data_type": "float"},
    {"name": "metadata", "data_type": "json"},
])

Column types: text, boolean, integer, float, json, array, image, datetime, audio.
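If your source data is already a pandas DataFrame, a small helper (hypothetical, not part of the SDK) can map pandas dtypes onto these column types before calling add_columns:

```python
import pandas as pd

def columns_from_dataframe(df: pd.DataFrame) -> list[dict]:
    """Map pandas dtypes to dataset column types (illustrative mapping)."""
    dtype_map = {"int64": "integer", "float64": "float", "bool": "boolean"}
    return [
        {"name": col, "data_type": dtype_map.get(str(dtype), "text")}
        for col, dtype in df.dtypes.items()
    ]

df = pd.DataFrame({"question": ["What is 2+2?"], "score": [0.85], "passed": [True]})
print(columns_from_dataframe(df))

# Then: dataset.add_columns(columns_from_dataframe(df))
```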

Adding rows

Each row is a dict with a cells list. Each cell maps a column name to a value.

dataset.add_rows([
    {"cells": [
        {"column_name": "input", "value": "Summarize this article"},
        {"column_name": "output", "value": "The article discusses..."},
        {"column_name": "score", "value": 0.85},
    ]},
])

Tip

You can also use typed Column, Row, and Cell objects from fi.datasets.types instead of dicts. Both work the same way — dicts are simpler for most cases.

Running LLM Prompts on a Dataset

Run an LLM on every row to generate outputs. Use {{column_name}} in your messages to reference column values.

dataset.add_run_prompt(
    name="gpt4o_response",
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Answer this question: {{question}}"},
    ],
    max_tokens=500,
    temperature=0.7,
)

A new column gpt4o_response appears with the LLM output for each row.
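Conceptually, the {{column_name}} placeholders are resolved per row before each request. A pure-Python sketch of that substitution (not the SDK's actual implementation):

```python
import re

def render(template: str, row: dict) -> str:
    """Replace {{column}} placeholders with the row's values."""
    return re.sub(r"\{\{(\w+)\}\}", lambda m: str(row[m.group(1)]), template)

row = {"question": "What is Python?"}
print(render("Answer this question: {{question}}", row))
# Answer this question: What is Python?
```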

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| name | str | required | Column name for the generated outputs |
| model | str | required | LLM model name (e.g. "gpt-4o-mini") |
| messages | list | required | Chat messages with {{column}} placeholders |
| max_tokens | int | 500 | Maximum tokens per response |
| temperature | float | 0.5 | Sampling temperature |
| concurrency | int | 5 | Parallel requests |
| top_p | float | 1 | Top-p sampling |
| tools | list or None | None | Tool definitions for function calling |
| response_format | dict or None | None | Structured output format |

Running Evaluations on a Dataset

Score every row using an evaluation template. Map the template’s required inputs to your dataset columns.

dataset.add_evaluation(
    name="tone_check",
    eval_template="tone",
    model="turing_flash",
    required_keys_to_column_names={
        "output": "gpt4o_response",
    },
)

This adds a tone_check column with the evaluation score for each row.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| name | str | required | Column name for the scores |
| eval_template | str | required | Template name (see Cloud Evals) |
| model | str | required | Turing model (turing_flash, turing_small, turing_large) |
| required_keys_to_column_names | dict | required | Maps template inputs to column names |
| reason_column | bool | False | Add a column with the reasoning |
| config | dict or None | None | Template-specific config |
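Conceptually, required_keys_to_column_names tells the evaluator which dataset column feeds each template input for every row. A pure-Python sketch of that resolution for one row (not the SDK's internals):

```python
def resolve_inputs(row: dict, required_keys_to_column_names: dict) -> dict:
    """Build a template's inputs from a row using the key-to-column mapping."""
    return {
        template_key: row[column_name]
        for template_key, column_name in required_keys_to_column_names.items()
    }

row = {"question": "What is Python?", "gpt4o_response": "A programming language."}
mapping = {"input": "question", "output": "gpt4o_response"}
print(resolve_inputs(row, mapping))
# {'input': 'What is Python?', 'output': 'A programming language.'}
```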

Downloading Results

# As a pandas DataFrame
df = dataset.download(load_to_pandas=True)
print(df.head())

# To a file
dataset.download(file_path="results.csv")
# Supported: .csv, .json, .xlsx
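Once downloaded as a DataFrame, score columns are ordinary pandas columns, so you can filter or aggregate directly. For example (the DataFrame below is a stand-in for the result of dataset.download(load_to_pandas=True), with column names mirroring the runs above):

```python
import pandas as pd

# Stand-in for: df = dataset.download(load_to_pandas=True)
df = pd.DataFrame({
    "question": ["What is Python?", "What is 2+2?"],
    "tone_check": [0.92, 0.41],
})

# Keep only low-scoring rows for manual review
low_scores = df[df["tone_check"] < 0.5]
print(low_scores)
```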

Deleting Datasets

dataset.delete()

Chaining

Most methods return self, so you can chain them:

from fi.datasets import Dataset, DatasetConfig

dataset = (
    Dataset(dataset_config=DatasetConfig(name="pipeline", model_type="GenerativeLLM"))
    .create(source="questions.csv")
    .add_run_prompt(
        name="response",
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Answer: {{question}}"}],
    )
    .add_evaluation(
        name="helpfulness",
        eval_template="is_helpful",
        model="turing_flash",
        required_keys_to_column_names={"input": "question", "output": "response"},
    )
    .download(file_path="scored.csv")
)

Class Methods

For one-off operations by dataset name, without creating an instance first:

| Method | What it does |
| --- | --- |
| Dataset.create_dataset(config, source) | Create a dataset |
| Dataset.download_dataset(name, load_to_pandas=True) | Download by name |
| Dataset.delete_dataset(name) | Delete by name |
| Dataset.get_dataset_config(name) | Get config by name (cached) |
| Dataset.add_dataset_columns(name, columns) | Add columns by name |
| Dataset.add_dataset_rows(name, rows) | Add rows by name |