Datasets
Reference for the Dataset class in the Future AGI Python SDK.
Dataset Class
The Dataset class is the primary client for managing datasets in the Future AGI SDK. It supports both class-level (static) and instance-level operations for creating, downloading, modifying, and deleting datasets, as well as adding columns, rows, prompts, and evaluations.
Initialization
def __init__(
self,
dataset_config: Optional[DatasetConfig] = None,
fi_api_key: Optional[str] = None,
fi_secret_key: Optional[str] = None,
fi_base_url: Optional[str] = None,
**kwargs,
)
Arguments:
dataset_config(Optional[DatasetConfig]): The configuration for the dataset. If provided and has no ID, the config will be fetched by name.fi_api_key(Optional[str]): API key for authentication.fi_secret_key(Optional[str]): Secret key for authentication.fi_base_url(Optional[str]): Base URL for the API.**kwargs: Additional keyword arguments for advanced configuration.
Instance Methods
create
Creates a new dataset (optionally from a file or Huggingface config)
def create(self, source: Optional[Union[str, HuggingfaceDatasetConfig]] = None) -> "Dataset"
- Returns:
Datasetinstance
download
Downloads the dataset to a file or as a pandas DataFrame.
def download(self, file_path: Optional[str] = None, load_to_pandas: bool = False) -> Union[str, pd.DataFrame, "Dataset"]
- Returns:
- File path (
str) - DataFrame
Datasetinstance
- File path (
delete
Deletes the current dataset.
def delete(self) -> None
- Returns:
- None
get_config
def get_config(self) -> DatasetConfig
- Returns:
DatasetConfiginstance
add_columns
Adds columns to the dataset.
def add_columns(self, columns: List[Union[Column, dict]]) -> "Dataset"
- Arguments:
columns(List[Union[Column, dict]]): A list ofColumnobjects or dictionaries.
- Returns:
Datasetinstance
add_rows
Adds rows to the dataset.
def add_rows(self, rows: List[Union[Row, dict]]) -> "Dataset"
- Arguments:
rows(List[Union[Row, dict]]): A list ofRowobjects or dictionaries.
- Returns:
Datasetinstance
get_column_id
Returns the column ID for a given column name.
def get_column_id(self, column_name: str) -> Optional[str]
- Arguments:
column_name(str): The name of the column.
- Returns:
- The column ID (
str)
- The column ID (
add_run_prompt
Adds a run prompt column to the dataset.
def add_run_prompt(
self,
name: str,
model: str,
messages: List[Dict[str, str]],
output_format: str = "string",
concurrency: int = 5,
max_tokens: int = 500,
temperature: float = 0.5,
presence_penalty: float = 1,
frequency_penalty: float = 1,
top_p: float = 1,
tools: Optional[List[Dict]] = None,
tool_choice: Optional[Any] = None,
response_format: Optional[Dict] = None,
) -> "Dataset"
- Arguments:
name(str): The name of the run prompt column.model(str): The model to use for the run prompt column.messages(List[Dict[str, str]]): The messages to use for the run prompt column.output_format(str): The output format to use for the run prompt column.concurrency(int): The concurrency to use for the run prompt column.max_tokens(int): The max tokens to use for the run prompt column.temperature(float): The temperature to use for the run prompt column.presence_penalty(float): The presence penalty to use for the run prompt column.frequency_penalty(float): The frequency penalty to use for the run prompt column.top_p(float): The top p to use for the run prompt column.tools(Optional[List[Dict]]): The tools to use for the run prompt column.tool_choice(Optional[Any]): The tool choice to use for the run prompt column.response_format(Optional[Dict]): The response format to use for the run prompt column.
- Returns:
Datasetinstance
add_evaluation
Adds an evaluation to the dataset.
def add_evaluation(
self,
name: str,
eval_template: str,
required_keys_to_column_names: Dict[str, str],
save_as_template: bool = False,
run: bool = True,
reason_column: bool = False,
config: Optional[Dict[str, Any]] = None,
) -> "Dataset"
- Arguments:
name(str): The name of the evaluation.eval_template(str): The evaluation template to use for the evaluation.required_keys_to_column_names(Dict[str, str]): The required keys to column names to use for the evaluation.save_as_template(bool): Whether to save the evaluation as a template.run(bool): Whether to run the evaluation.reason_column(bool): Whether to add a reason column to the evaluation.config(Optional[Dict[str, Any]]): The configuration to use for the evaluation.
- Returns:
Datasetinstance
get_eval_stats
Returns evaluation statistics for the dataset.
def get_eval_stats(self) -> Dict[str, Any]
- Returns:
- A dictionary containing evaluation statistics.
add_optimization
Adds an optimization task to the dataset.
def add_optimization(
self,
optimization_name: str,
prompt_column_name: str,
optimize_type: str = "PROMPT_TEMPLATE",
model_config: Optional[Dict[str, Any]] = None,
) -> "Dataset"
- Arguments:
optimization_name(str): The name of the optimization task.prompt_column_name(str): The name of the prompt column to optimize.optimize_type(str): The type of optimization to perform.model_config(Optional[Dict[str, Any]]): The model configuration to use for the optimization.
- Returns:
Datasetinstance
Class Methods
create_dataset
Creates a dataset using the provided config.
@classmethod
def create_dataset(cls, dataset_config: DatasetConfig, source: Optional[Union[str, HuggingfaceDatasetConfig]] = None, **kwargs) -> "Dataset"
- Arguments:
dataset_config(DatasetConfig): The configuration for the dataset.source(Optional[Union[str, HuggingfaceDatasetConfig]]): The source to use for the dataset.
- Returns:
Datasetinstance
download_dataset
Downloads a dataset by name.
@classmethod
def download_dataset(cls, dataset_name: str, file_path: Optional[str] = None, load_to_pandas: bool = False, **kwargs) -> Union[str, pd.DataFrame]
- Arguments:
dataset_name(str): The name of the dataset.file_path(Optional[str]): The file path to save the dataset to.load_to_pandas(bool): Whether to load the dataset to a pandas DataFrame.
- Returns:
- The file path (
str) - DataFrame
- The file path (
delete_dataset
Deletes a dataset by name.
@classmethod
def delete_dataset(cls, dataset_name: str, **kwargs) -> None
- Arguments:
dataset_name(str): The name of the dataset.
- Returns:
- None
get_dataset_config
Fetches and caches the dataset configuration.
@classmethod
def get_dataset_config(cls, dataset_name: str, excluded_datasets: Optional[List[str]] = None, **kwargs) -> "Dataset"
- Arguments:
dataset_name(str): The name of the dataset.excluded_datasets(Optional[List[str]]): The datasets to exclude from the configuration.
- Returns:
Datasetinstance
add_dataset_columns
Adds columns to a dataset.
@classmethod
def add_dataset_columns(cls, dataset_name: str, columns: List[Union[Column, dict]], **kwargs)
- Arguments:
dataset_name(str): The name of the dataset.columns(List[Union[Column, dict]]): The columns to add to the dataset.
- Returns:
Datasetinstance
add_dataset_rows
Adds rows to a dataset.
@classmethod
def add_dataset_rows(cls, dataset_name: str, rows: List[Union[Row, dict]], **kwargs)
- Arguments:
dataset_name(str): The name of the dataset.rows(List[Union[Row, dict]]): The rows to add to the dataset.
- Returns:
Datasetinstance