Datasets
Reference for the Dataset class in the Future AGI Python SDK.
Dataset
Class
The Dataset
class is the primary client for managing datasets in the Future AGI SDK. It supports both class-level (static) and instance-level operations for creating, downloading, modifying, and deleting datasets, as well as adding columns, rows, prompts, and evaluations.
Initialization
Arguments:
dataset_config
(Optional[DatasetConfig]): The configuration for the dataset. If provided and has no ID, the config will be fetched by name.fi_api_key
(Optional[str]): API key for authentication.fi_secret_key
(Optional[str]): Secret key for authentication.fi_base_url
(Optional[str]): Base URL for the API.**kwargs
: Additional keyword arguments for advanced configuration.
Instance Methods
create
Creates a new dataset (optionally from a file or Huggingface config)
- Returns:
Dataset
instance
download
Downloads the dataset to a file or as a pandas DataFrame.
- Returns:
- File path (
str
) - DataFrame
Dataset
instance
- File path (
delete
Deletes the current dataset.
- Returns:
- None
get_config
- Returns:
DatasetConfig
instance
add_columns
Adds columns to the dataset.
- Arguments:
columns
(List[Union[Column, dict]]): A list ofColumn
objects or dictionaries.
- Returns:
Dataset
instance
add_rows
Adds rows to the dataset.
- Arguments:
rows
(List[Union[Row, dict]]): A list ofRow
objects or dictionaries.
- Returns:
Dataset
instance
get_column_id
Returns the column ID for a given column name.
- Arguments:
column_name
(str): The name of the column.
- Returns:
- The column ID (
str
)
- The column ID (
add_run_prompt
Adds a run prompt column to the dataset.
- Arguments:
name
(str): The name of the run prompt column.model
(str): The model to use for the run prompt column.messages
(List[Dict[str, str]]): The messages to use for the run prompt column.output_format
(str): The output format to use for the run prompt column.concurrency
(int): The concurrency to use for the run prompt column.max_tokens
(int): The max tokens to use for the run prompt column.temperature
(float): The temperature to use for the run prompt column.presence_penalty
(float): The presence penalty to use for the run prompt column.frequency_penalty
(float): The frequency penalty to use for the run prompt column.top_p
(float): The top p to use for the run prompt column.tools
(Optional[List[Dict]]): The tools to use for the run prompt column.tool_choice
(Optional[Any]): The tool choice to use for the run prompt column.response_format
(Optional[Dict]): The response format to use for the run prompt column.
- Returns:
Dataset
instance
add_evaluation
Adds an evaluation to the dataset.
- Arguments:
name
(str): The name of the evaluation.eval_template
(str): The evaluation template to use for the evaluation.required_keys_to_column_names
(Dict[str, str]): The required keys to column names to use for the evaluation.save_as_template
(bool): Whether to save the evaluation as a template.run
(bool): Whether to run the evaluation.reason_column
(bool): Whether to add a reason column to the evaluation.config
(Optional[Dict[str, Any]]): The configuration to use for the evaluation.
- Returns:
Dataset
instance
get_eval_stats
Returns evaluation statistics for the dataset.
- Returns:
- A dictionary containing evaluation statistics.
add_optimization
Adds an optimization task to the dataset.
- Arguments:
optimization_name
(str): The name of the optimization task.prompt_column_name
(str): The name of the prompt column to optimize.optimize_type
(str): The type of optimization to perform.model_config
(Optional[Dict[str, Any]]): The model configuration to use for the optimization.
- Returns:
Dataset
instance
Class Methods
create_dataset
Creates a dataset using the provided config.
- Arguments:
dataset_config
(DatasetConfig): The configuration for the dataset.source
(Optional[Union[str, HuggingfaceDatasetConfig]]): The source to use for the dataset.
- Returns:
Dataset
instance
download_dataset
Downloads a dataset by name.
- Arguments:
dataset_name
(str): The name of the dataset.file_path
(Optional[str]): The file path to save the dataset to.load_to_pandas
(bool): Whether to load the dataset to a pandas DataFrame.
- Returns:
- The file path (
str
) - DataFrame
- The file path (
delete_dataset
Deletes a dataset by name.
- Arguments:
dataset_name
(str): The name of the dataset.
- Returns:
- None
get_dataset_config
Fetches and caches the dataset configuration.
- Arguments:
dataset_name
(str): The name of the dataset.excluded_datasets
(Optional[List[str]]): The datasets to exclude from the configuration.
- Returns:
Dataset
instance
add_dataset_columns
Adds columns to a dataset.
- Arguments:
dataset_name
(str): The name of the dataset.columns
(List[Union[Column, dict]]): The columns to add to the dataset.
- Returns:
Dataset
instance
add_dataset_rows
Adds rows to a dataset.
- Arguments:
dataset_name
(str): The name of the dataset.rows
(List[Union[Row, dict]]): The rows to add to the dataset.
- Returns:
Dataset
instance