Dataset
Class
The Dataset
class is the primary client for managing datasets in the Future AGI SDK. It supports both class-level (static) and instance-level operations for creating, downloading, modifying, and deleting datasets, as well as adding columns, rows, prompts, and evaluations.
Initialization
dataset_config
(Optional[DatasetConfig]): The configuration for the dataset. If provided and has no ID, the config will be fetched by name.fi_api_key
(Optional[str]): API key for authentication.fi_secret_key
(Optional[str]): Secret key for authentication.fi_base_url
(Optional[str]): Base URL for the API.**kwargs
: Additional keyword arguments for advanced configuration.
Instance Methods
create
Creates a new dataset (optionally from a file or Huggingface config)
- Returns:
Dataset
instance
download
Downloads the dataset to a file or as a pandas DataFrame.
- Returns:
- File path (
str
) - DataFrame
Dataset
instance
- File path (
delete
Deletes the current dataset.
- Returns:
- None
get_config
- Returns:
DatasetConfig
instance
add_columns
Adds columns to the dataset.
- Arguments:
columns
(List[Union[Column, dict]]): A list ofColumn
objects or dictionaries.
- Returns:
Dataset
instance
add_rows
Adds rows to the dataset.
- Arguments:
rows
(List[Union[Row, dict]]): A list ofRow
objects or dictionaries.
- Returns:
Dataset
instance
get_column_id
Returns the column ID for a given column name.
- Arguments:
column_name
(str): The name of the column.
- Returns:
- The column ID (
str
)
- The column ID (
add_run_prompt
Adds a run prompt column to the dataset.
- Arguments:
name
(str): The name of the run prompt column.model
(str): The model to use for the run prompt column.messages
(List[Dict[str, str]]): The messages to use for the run prompt column.output_format
(str): The output format to use for the run prompt column.concurrency
(int): The concurrency to use for the run prompt column.max_tokens
(int): The max tokens to use for the run prompt column.temperature
(float): The temperature to use for the run prompt column.presence_penalty
(float): The presence penalty to use for the run prompt column.frequency_penalty
(float): The frequency penalty to use for the run prompt column.top_p
(float): The top p to use for the run prompt column.tools
(Optional[List[Dict]]): The tools to use for the run prompt column.tool_choice
(Optional[Any]): The tool choice to use for the run prompt column.response_format
(Optional[Dict]): The response format to use for the run prompt column.
- Returns:
Dataset
instance
add_evaluation
Adds an evaluation to the dataset.
- Arguments:
name
(str): The name of the evaluation.eval_template
(str): The evaluation template to use for the evaluation.required_keys_to_column_names
(Dict[str, str]): The required keys to column names to use for the evaluation.save_as_template
(bool): Whether to save the evaluation as a template.run
(bool): Whether to run the evaluation.reason_column
(bool): Whether to add a reason column to the evaluation.config
(Optional[Dict[str, Any]]): The configuration to use for the evaluation.
- Returns:
Dataset
instance
get_eval_stats
Returns evaluation statistics for the dataset.
- Returns:
- A dictionary containing evaluation statistics.
add_optimization
Adds an optimization task to the dataset.
- Arguments:
optimization_name
(str): The name of the optimization task.prompt_column_name
(str): The name of the prompt column to optimize.optimize_type
(str): The type of optimization to perform.model_config
(Optional[Dict[str, Any]]): The model configuration to use for the optimization.
- Returns:
Dataset
instance
Class Methods
create_dataset
Creates a dataset using the provided config.
- Arguments:
dataset_config
(DatasetConfig): The configuration for the dataset.source
(Optional[Union[str, HuggingfaceDatasetConfig]]): The source to use for the dataset.
- Returns:
Dataset
instance
download_dataset
Downloads a dataset by name.
- Arguments:
dataset_name
(str): The name of the dataset.file_path
(Optional[str]): The file path to save the dataset to.load_to_pandas
(bool): Whether to load the dataset to a pandas DataFrame.
- Returns:
- The file path (
str
) - DataFrame
- The file path (
delete_dataset
Deletes a dataset by name.
- Arguments:
dataset_name
(str): The name of the dataset.
- Returns:
- None
get_dataset_config
Fetches and caches the dataset configuration.
- Arguments:
dataset_name
(str): The name of the dataset.excluded_datasets
(Optional[List[str]]): The datasets to exclude from the configuration.
- Returns:
Dataset
instance
add_dataset_columns
Adds columns to a dataset.
- Arguments:
dataset_name
(str): The name of the dataset.columns
(List[Union[Column, dict]]): The columns to add to the dataset.
- Returns:
Dataset
instance
add_dataset_rows
Adds rows to a dataset.
- Arguments:
dataset_name
(str): The name of the dataset.rows
(List[Union[Row, dict]]): The rows to add to the dataset.
- Returns:
Dataset
instance