Datasets
Create, populate, and manage datasets for evaluation. Upload CSV/JSON files, import from HuggingFace, add LLM-generated columns, and run evaluations at scale.
- pip install futureagi (or comes with ai-evaluation)
- Create datasets from scratch, CSV/JSON files, or HuggingFace
- Chain operations: create → add columns → add rows → run evals → download results
Datasets hold your test data and evaluation scores. Create one, fill it with data, run evals across every row, and download the results. For the full platform guide, see the Dataset docs.
Note
Requires pip install futureagi and FI_API_KEY + FI_SECRET_KEY in your environment. If you installed ai-evaluation, you already have futureagi.
Quick Example
from fi.datasets import Dataset, DatasetConfig
# Create a dataset
config = DatasetConfig(name="my-eval-data", model_type="GenerativeLLM")
dataset = Dataset(dataset_config=config)
dataset.create()
# Add columns and rows
dataset.add_columns([
{"name": "question", "data_type": "text"},
{"name": "answer", "data_type": "text"},
])
dataset.add_rows([
{"cells": [{"column_name": "question", "value": "What is Python?"}, {"column_name": "answer", "value": "A programming language."}]},
{"cells": [{"column_name": "question", "value": "What is 2+2?"}, {"column_name": "answer", "value": "4"}]},
])
# Download as a pandas DataFrame
df = dataset.download(load_to_pandas=True)
print(df)
# question answer
# 0 What is Python? A programming language.
# 1 What is 2+2? 4
DatasetConfig
Every dataset needs a config with a name and model type.
from fi.datasets import DatasetConfig
config = DatasetConfig(
name="my-dataset", # required, max 255 chars
model_type="GenerativeLLM", # "GenerativeLLM" or "GenerativeImage"
)
Creating Datasets
Empty dataset
from fi.datasets import Dataset, DatasetConfig
config = DatasetConfig(name="my-dataset", model_type="GenerativeLLM")
dataset = Dataset(dataset_config=config).create()
From a CSV or JSON file
dataset = Dataset(dataset_config=DatasetConfig(name="from-file", model_type="GenerativeLLM"))
dataset.create(source="path/to/data.csv")
# Supported: .csv, .json, .jsonl, .xlsx, .xls
From HuggingFace
from fi.datasets.types import HuggingfaceDatasetConfig
hf = HuggingfaceDatasetConfig(name="squad", subset="default", split="train", num_rows=100)
dataset = Dataset(dataset_config=DatasetConfig(name="squad-sample", model_type="GenerativeLLM"))
dataset.create(source=hf)
Columns and Rows
Adding columns
Pass a list of dicts with name and data_type.
dataset.add_columns([
{"name": "input", "data_type": "text"},
{"name": "output", "data_type": "text"},
{"name": "score", "data_type": "float"},
{"name": "metadata", "data_type": "json"},
])
Column types: text, boolean, integer, float, json, array, image, datetime, audio.
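If your rows already exist as plain Python values, a column payload can be derived from a sample row. The helpers below are hypothetical (not part of the SDK); the type mapping follows the list above.

```python
# Hypothetical helpers (not part of the SDK): build an add_columns()
# payload by inferring column data_type from one sample row of values.
from datetime import datetime

def infer_data_type(value):
    """Map a Python value to one of the dataset column types."""
    if isinstance(value, bool):  # check bool before int: bool subclasses int
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "float"
    if isinstance(value, datetime):
        return "datetime"
    if isinstance(value, (list, tuple)):
        return "array"
    if isinstance(value, dict):
        return "json"
    return "text"

def columns_from_sample(row):
    """Build an add_columns() payload from one sample row of plain values."""
    return [{"name": k, "data_type": infer_data_type(v)} for k, v in row.items()]
```

For example, columns_from_sample({"question": "...", "score": 0.8}) yields a text column and a float column, ready to pass to add_columns.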
Adding rows
Each row is a dict with a cells list. Each cell maps a column name to a value.
dataset.add_rows([
{"cells": [
{"column_name": "input", "value": "Summarize this article"},
{"column_name": "output", "value": "The article discusses..."},
{"column_name": "score", "value": 0.85},
]},
])
Tip
You can also use typed Column, Row, and Cell objects from fi.datasets.types instead of dicts. Both work the same way — dicts are simpler for most cases.
Running LLM Prompts on a Dataset
Run an LLM on every row to generate outputs. Use {{column_name}} in your messages to reference column values.
dataset.add_run_prompt(
name="gpt4o_response",
model="gpt-4o-mini",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Answer this question: {{question}}"},
],
max_tokens=500,
temperature=0.7,
)
A new column gpt4o_response appears with the LLM output for each row.
| Parameter | Type | Default | Description |
|---|---|---|---|
| name | str | required | Column name for the generated outputs |
| model | str | required | LLM model name (e.g. "gpt-4o-mini") |
| messages | list | required | Chat messages with {{column}} placeholders |
| max_tokens | int | 500 | Maximum tokens per response |
| temperature | float | 0.5 | Sampling temperature |
| concurrency | int | 5 | Parallel requests |
| top_p | float | 1 | Top-p sampling |
| tools | list or None | None | Tool definitions for function calling |
| response_format | dict or None | None | Structured output format |
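Conceptually, the {{column}} placeholders behave like a per-row template substitution: each row's column values are spliced into the message contents before the LLM call. A minimal sketch of that behavior (illustrative only, not the SDK's internal implementation):

```python
import re

# Illustrative sketch (not the SDK's actual code): resolve {{column}}
# placeholders in chat messages against one row's column values.
def render_messages(messages, row):
    def fill(text):
        return re.sub(r"\{\{(\w+)\}\}", lambda m: str(row[m.group(1)]), text)
    return [{**msg, "content": fill(msg["content"])} for msg in messages]

rendered = render_messages(
    [{"role": "user", "content": "Answer this question: {{question}}"}],
    {"question": "What is 2+2?"},
)
# rendered[0]["content"] == "Answer this question: What is 2+2?"
```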
Running Evaluations on a Dataset
Score every row using an evaluation template. Map the template’s required inputs to your dataset columns.
dataset.add_evaluation(
name="tone_check",
eval_template="tone",
model="turing_flash",
required_keys_to_column_names={
"output": "gpt4o_response",
},
)
This adds a tone_check column with the evaluation score for each row.
| Parameter | Type | Default | Description |
|---|---|---|---|
| name | str | required | Column name for the scores |
| eval_template | str | required | Template name (see Cloud Evals) |
| model | str | required | Turing model (turing_flash, turing_small, turing_large) |
| required_keys_to_column_names | dict | required | Maps template inputs to column names |
| reason_column | bool | False | Add a column with the reasoning |
| config | dict or None | None | Template-specific config |
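A common failure mode is leaving a template input unmapped, or mapping it to a column that doesn't exist. A pre-flight check along these lines (hypothetical helper, not part of the SDK) can catch both before submitting:

```python
# Hypothetical pre-flight check (not part of the SDK): verify that every
# template input is mapped, and every mapped column exists in the dataset.
def validate_mapping(required_keys, mapping, dataset_columns):
    errors = []
    missing_keys = set(required_keys) - set(mapping)
    unknown_cols = set(mapping.values()) - set(dataset_columns)
    if missing_keys:
        errors.append(f"unmapped template inputs: {sorted(missing_keys)}")
    if unknown_cols:
        errors.append(f"unknown columns: {sorted(unknown_cols)}")
    return errors

# validate_mapping(["output"], {"output": "gpt4o_response"},
#                  ["question", "gpt4o_response"])  -> no errors
```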
Downloading Results
# As a pandas DataFrame
df = dataset.download(load_to_pandas=True)
print(df.head())
# To a file
dataset.download(file_path="results.csv")
# Supported: .csv, .json, .xlsx
Deleting Datasets
dataset.delete()
Chaining
Most methods return self, so you can chain them:
from fi.datasets import Dataset, DatasetConfig
dataset = (
Dataset(dataset_config=DatasetConfig(name="pipeline", model_type="GenerativeLLM"))
.create(source="questions.csv")
.add_run_prompt(
name="response",
model="gpt-4o-mini",
messages=[{"role": "user", "content": "Answer: {{question}}"}],
)
.add_evaluation(
name="helpfulness",
eval_template="is_helpful",
model="turing_flash",
required_keys_to_column_names={"input": "question", "output": "response"},
)
.download(file_path="scored.csv")
)
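The pattern behind this is simply that each mutating method ends with return self, so calls compose left to right. A minimal sketch of the idea in plain Python (illustrative, not the SDK's code):

```python
# Minimal sketch of the fluent pattern the Dataset class follows:
# each mutating method returns self, so calls compose left to right.
class Pipeline:
    def __init__(self):
        self.steps = []

    def add(self, step):
        self.steps.append(step)
        return self  # returning self is what enables chaining

result = Pipeline().add("create").add("add_rows").add("evaluate").steps
```

Note that a chain ends at the first call that returns something other than self, which is why download(file_path=...) comes last in the example above.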
Class Methods
For one-off operations by dataset name, without creating an instance first:
| Method | What it does |
|---|---|
| Dataset.create_dataset(config, source) | Create a dataset |
| Dataset.download_dataset(name, load_to_pandas=True) | Download by name |
| Dataset.delete_dataset(name) | Delete by name |
| Dataset.get_dataset_config(name) | Get config by name (cached) |
| Dataset.add_dataset_columns(name, columns) | Add columns by name |
| Dataset.add_dataset_rows(name, rows) | Add rows by name |