Datasets

Create, populate, and manage datasets for evaluation. Upload CSV/JSON files, import from HuggingFace, add LLM-generated columns, and run evaluations at scale.

📝 TL;DR
  • pip install futureagi (already included if you installed ai-evaluation)
  • Create datasets from scratch, CSV/JSON files, or HuggingFace
  • Chain operations: create → add columns → add rows → run evals → download results

Datasets hold your test data and evaluation scores: create one, fill it with data, run evals across every row, and download the results. For the full platform guide, see the Dataset docs.

Note

Requires pip install futureagi and FI_API_KEY + FI_SECRET_KEY in your environment. If you installed ai-evaluation, you already have futureagi.

Quick Example

from fi.datasets import Dataset, DatasetConfig

# Create a dataset
config = DatasetConfig(name="my-eval-data", model_type="GenerativeLLM")
dataset = Dataset(dataset_config=config)
dataset.create()

# Add columns and rows
dataset.add_columns([
    {"name": "question", "data_type": "text"},
    {"name": "answer", "data_type": "text"},
])

dataset.add_rows([
    {"cells": [{"column_name": "question", "value": "What is Python?"}, {"column_name": "answer", "value": "A programming language."}]},
    {"cells": [{"column_name": "question", "value": "What is 2+2?"}, {"column_name": "answer", "value": "4"}]},
])

# Download as a pandas DataFrame
df = dataset.download(load_to_pandas=True)
print(df)
#           question                   answer
# 0  What is Python?  A programming language.
# 1     What is 2+2?                        4

DatasetConfig

Every dataset needs a config with a name and model type.

from fi.datasets import DatasetConfig

config = DatasetConfig(
    name="my-dataset",          # required, max 255 chars
    model_type="GenerativeLLM", # "GenerativeLLM" or "GenerativeImage"
)

Creating Datasets

Empty dataset

from fi.datasets import Dataset, DatasetConfig

config = DatasetConfig(name="my-dataset", model_type="GenerativeLLM")
dataset = Dataset(dataset_config=config).create()

From a CSV or JSON file

dataset = Dataset(dataset_config=DatasetConfig(name="from-file", model_type="GenerativeLLM"))
dataset.create(source="path/to/data.csv")
# Supported: .csv, .json, .jsonl, .xlsx, .xls
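If you build test data programmatically, you can write it to one of the supported formats first and pass the path to create(). A minimal sketch using the standard csv module (the file name and columns are illustrative):

```python
import csv

# Rows in plain-dict form; keys become the CSV header / dataset columns.
rows = [
    {"question": "What is Python?", "answer": "A programming language."},
    {"question": "What is 2+2?", "answer": "4"},
]

with open("data.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["question", "answer"])
    writer.writeheader()
    writer.writerows(rows)

# Then upload it:
# dataset.create(source="data.csv")
```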

From HuggingFace

from fi.datasets.types import HuggingfaceDatasetConfig

hf = HuggingfaceDatasetConfig(name="squad", subset="default", split="train", num_rows=100)
dataset = Dataset(dataset_config=DatasetConfig(name="squad-sample", model_type="GenerativeLLM"))
dataset.create(source=hf)

Columns and Rows

Adding columns

Pass a list of dicts with name and data_type.

dataset.add_columns([
    {"name": "input", "data_type": "text"},
    {"name": "output", "data_type": "text"},
    {"name": "score", "data_type": "float"},
    {"name": "metadata", "data_type": "json"},
])

Column types: text, boolean, integer, float, json, array, image, datetime, audio.
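If your source data is already a pandas DataFrame, a small helper (hypothetical, not part of the SDK) can map pandas dtypes onto these column types before calling add_columns:

```python
import pandas as pd

def columns_from_dataframe(df: pd.DataFrame) -> list[dict]:
    """Map pandas dtypes to dataset column types (illustrative mapping)."""
    dtype_map = {"int64": "integer", "float64": "float", "bool": "boolean"}
    return [
        {"name": col, "data_type": dtype_map.get(str(dtype), "text")}
        for col, dtype in df.dtypes.items()
    ]

df = pd.DataFrame({"question": ["What is 2+2?"], "score": [0.85], "passed": [True]})
print(columns_from_dataframe(df))

# Then: dataset.add_columns(columns_from_dataframe(df))
```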

Adding rows

Each row is a dict with a cells list. Each cell maps a column name to a value.

dataset.add_rows([
    {"cells": [
        {"column_name": "input", "value": "Summarize this article"},
        {"column_name": "output", "value": "The article discusses..."},
        {"column_name": "score", "value": 0.85},
    ]},
])

Tip

You can also use typed Column, Row, and Cell objects from fi.datasets.types instead of dicts. Both work the same way — dicts are simpler for most cases.

Running LLM Prompts on a Dataset

Run an LLM on every row to generate outputs. Use {{column_name}} in your messages to reference column values.

dataset.add_run_prompt(
    name="gpt4o_response",
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Answer this question: {{question}}"},
    ],
    max_tokens=500,
    temperature=0.7,
)

A new column gpt4o_response appears with the LLM output for each row.
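Conceptually, the {{column_name}} placeholders are resolved per row before each request. A pure-Python sketch of that substitution (not the SDK's actual implementation):

```python
import re

def render(template: str, row: dict) -> str:
    """Replace {{column}} placeholders with the row's values."""
    return re.sub(r"\{\{(\w+)\}\}", lambda m: str(row[m.group(1)]), template)

row = {"question": "What is Python?"}
print(render("Answer this question: {{question}}", row))
# Answer this question: What is Python?
```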

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| name | str | required | Column name for the generated outputs |
| model | str | required | LLM model name (e.g. "gpt-4o-mini") |
| messages | list | required | Chat messages with {{column}} placeholders |
| max_tokens | int | 500 | Maximum tokens per response |
| temperature | float | 0.5 | Sampling temperature |
| concurrency | int | 5 | Parallel requests |
| top_p | float | 1 | Top-p sampling |
| tools | list or None | None | Tool definitions for function calling |
| response_format | dict or None | None | Structured output format |

Running Evaluations on a Dataset

Score every row using an evaluation template. Map the template’s required inputs to your dataset columns.

dataset.add_evaluation(
    name="tone_check",
    eval_template="tone",
    model="turing_flash",
    required_keys_to_column_names={
        "output": "gpt4o_response",
    },
)

This adds a tone_check column with the evaluation score for each row.

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| name | str | required | Column name for the scores |
| eval_template | str | required | Template name (see Cloud Evals) |
| model | str | required | Turing model (turing_flash, turing_small, turing_large) |
| required_keys_to_column_names | dict | required | Maps template inputs to column names |
| reason_column | bool | False | Add a column with the reasoning |
| config | dict or None | None | Template-specific config |
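Conceptually, required_keys_to_column_names tells the evaluator which dataset column feeds each template input for every row. A pure-Python sketch of that resolution for one row (not the SDK's internals):

```python
def resolve_inputs(row: dict, required_keys_to_column_names: dict) -> dict:
    """Build a template's inputs from a row using the key-to-column mapping."""
    return {
        template_key: row[column_name]
        for template_key, column_name in required_keys_to_column_names.items()
    }

row = {"question": "What is Python?", "gpt4o_response": "A programming language."}
mapping = {"input": "question", "output": "gpt4o_response"}
print(resolve_inputs(row, mapping))
# {'input': 'What is Python?', 'output': 'A programming language.'}
```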

Downloading Results

# As a pandas DataFrame
df = dataset.download(load_to_pandas=True)
print(df.head())

# To a file
dataset.download(file_path="results.csv")
# Supported: .csv, .json, .xlsx
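Once downloaded as a DataFrame, score columns are ordinary pandas columns, so you can filter or aggregate directly. For example (the DataFrame below is a stand-in for the result of dataset.download(load_to_pandas=True), with column names mirroring the runs above):

```python
import pandas as pd

# Stand-in for: df = dataset.download(load_to_pandas=True)
df = pd.DataFrame({
    "question": ["What is Python?", "What is 2+2?"],
    "tone_check": [0.92, 0.41],
})

# Keep only low-scoring rows for manual review
low_scores = df[df["tone_check"] < 0.5]
print(low_scores)
```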

Deleting Datasets

dataset.delete()

Chaining

Most methods return self, so you can chain them:

from fi.datasets import Dataset, DatasetConfig

dataset = (
    Dataset(dataset_config=DatasetConfig(name="pipeline", model_type="GenerativeLLM"))
    .create(source="questions.csv")
    .add_run_prompt(
        name="response",
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Answer: {{question}}"}],
    )
    .add_evaluation(
        name="helpfulness",
        eval_template="is_helpful",
        model="turing_flash",
        required_keys_to_column_names={"input": "question", "output": "response"},
    )
    .download(file_path="scored.csv")
)

Class Methods

For one-off operations by dataset name, without creating an instance first:

| Method | What it does |
| --- | --- |
| Dataset.create_dataset(config, source) | Create a dataset |
| Dataset.download_dataset(name, load_to_pandas=True) | Download by name |
| Dataset.delete_dataset(name) | Delete by name |
| Dataset.get_dataset_config(name) | Get config by name (cached) |
| Dataset.add_dataset_columns(name, columns) | Add columns by name |
| Dataset.add_dataset_rows(name, rows) | Add rows by name |