Datasets

Reference for the Dataset class in the Future AGI Python SDK.

Dataset Class

The Dataset class is the primary client for managing datasets in the Future AGI SDK. It supports both class-level (static) and instance-level operations for creating, downloading, modifying, and deleting datasets, as well as adding columns, rows, prompts, and evaluations.

Initialization

def __init__(
    self,
    dataset_config: Optional[DatasetConfig] = None,
    fi_api_key: Optional[str] = None,
    fi_secret_key: Optional[str] = None,
    fi_base_url: Optional[str] = None,
    **kwargs,
)

Arguments:

  • dataset_config (Optional[DatasetConfig]): The configuration for the dataset. If provided and has no ID, the config will be fetched by name.
  • fi_api_key (Optional[str]): API key for authentication.
  • fi_secret_key (Optional[str]): Secret key for authentication.
  • fi_base_url (Optional[str]): Base URL for the API.
  • **kwargs: Additional keyword arguments for advanced configuration.

Instance Methods

create

Creates a new dataset (optionally from a file or Huggingface config)

def create(self, source: Optional[Union[str, HuggingfaceDatasetConfig]] = None) -> "Dataset"
  • Returns:
    • Dataset instance

download

Downloads the dataset to a file or as a pandas DataFrame.

def download(self, file_path: Optional[str] = None, load_to_pandas: bool = False) -> Union[str, pd.DataFrame, "Dataset"]
  • Returns:
    • File path (str)
    • DataFrame
    • Dataset instance

delete

Deletes the current dataset.

def delete(self) -> None
  • Returns:
    • None

get_config

def get_config(self) -> DatasetConfig
  • Returns:
    • DatasetConfig instance

add_columns

Adds columns to the dataset.

def add_columns(self, columns: List[Union[Column, dict]]) -> "Dataset"
  • Arguments:
    • columns (List[Union[Column, dict]]): A list of Column objects or dictionaries.
  • Returns:
    • Dataset instance

add_rows

Adds rows to the dataset.

def add_rows(self, rows: List[Union[Row, dict]]) -> "Dataset"
  • Arguments:
    • rows (List[Union[Row, dict]]): A list of Row objects or dictionaries.
  • Returns:
    • Dataset instance

get_column_id

Returns the column ID for a given column name.

def get_column_id(self, column_name: str) -> Optional[str]
  • Arguments:
    • column_name (str): The name of the column.
  • Returns:
    • The column ID (str)

add_run_prompt

Adds a run prompt column to the dataset.

def add_run_prompt(
    self,
    name: str,
    model: str,
    messages: List[Dict[str, str]],
    output_format: str = "string",
    concurrency: int = 5,
    max_tokens: int = 500,
    temperature: float = 0.5,
    presence_penalty: float = 1,
    frequency_penalty: float = 1,
    top_p: float = 1,
    tools: Optional[List[Dict]] = None,
    tool_choice: Optional[Any] = None,
    response_format: Optional[Dict] = None,
) -> "Dataset"
  • Arguments:
    • name (str): The name of the run prompt column.
    • model (str): The model to use for the run prompt column.
    • messages (List[Dict[str, str]]): The messages to use for the run prompt column.
    • output_format (str): The output format to use for the run prompt column.
    • concurrency (int): The concurrency to use for the run prompt column.
    • max_tokens (int): The max tokens to use for the run prompt column.
    • temperature (float): The temperature to use for the run prompt column.
    • presence_penalty (float): The presence penalty to use for the run prompt column.
    • frequency_penalty (float): The frequency penalty to use for the run prompt column.
    • top_p (float): The top p to use for the run prompt column.
    • tools (Optional[List[Dict]]): The tools to use for the run prompt column.
    • tool_choice (Optional[Any]): The tool choice to use for the run prompt column.
    • response_format (Optional[Dict]): The response format to use for the run prompt column.
  • Returns:
    • Dataset instance

add_evaluation

Adds an evaluation to the dataset.

def add_evaluation(
    self,
    name: str,
    eval_template: str,
    required_keys_to_column_names: Dict[str, str],
    save_as_template: bool = False,
    run: bool = True,
    reason_column: bool = False,
    config: Optional[Dict[str, Any]] = None,
) -> "Dataset"
  • Arguments:
    • name (str): The name of the evaluation.
    • eval_template (str): The evaluation template to use for the evaluation.
    • required_keys_to_column_names (Dict[str, str]): The required keys to column names to use for the evaluation.
    • save_as_template (bool): Whether to save the evaluation as a template.
    • run (bool): Whether to run the evaluation.
    • reason_column (bool): Whether to add a reason column to the evaluation.
    • config (Optional[Dict[str, Any]]): The configuration to use for the evaluation.
  • Returns:
    • Dataset instance

get_eval_stats

Returns evaluation statistics for the dataset.

def get_eval_stats(self) -> Dict[str, Any]
  • Returns:
    • A dictionary containing evaluation statistics.

add_optimization

Adds an optimization task to the dataset.

def add_optimization(
    self,
    optimization_name: str,
    prompt_column_name: str,
    optimize_type: str = "PROMPT_TEMPLATE",
    model_config: Optional[Dict[str, Any]] = None,
) -> "Dataset"
  • Arguments:
    • optimization_name (str): The name of the optimization task.
    • prompt_column_name (str): The name of the prompt column to optimize.
    • optimize_type (str): The type of optimization to perform.
    • model_config (Optional[Dict[str, Any]]): The model configuration to use for the optimization.
  • Returns:
    • Dataset instance

Class Methods

create_dataset

Creates a dataset using the provided config.

@classmethod
def create_dataset(cls, dataset_config: DatasetConfig, source: Optional[Union[str, HuggingfaceDatasetConfig]] = None, **kwargs) -> "Dataset"
  • Arguments:
    • dataset_config (DatasetConfig): The configuration for the dataset.
    • source (Optional[Union[str, HuggingfaceDatasetConfig]]): The source to use for the dataset.
  • Returns:
    • Dataset instance

download_dataset

Downloads a dataset by name.

@classmethod
def download_dataset(cls, dataset_name: str, file_path: Optional[str] = None, load_to_pandas: bool = False, **kwargs) -> Union[str, pd.DataFrame]
  • Arguments:
    • dataset_name (str): The name of the dataset.
    • file_path (Optional[str]): The file path to save the dataset to.
    • load_to_pandas (bool): Whether to load the dataset to a pandas DataFrame.
  • Returns:
    • The file path (str)
    • DataFrame

delete_dataset

Deletes a dataset by name.

@classmethod
def delete_dataset(cls, dataset_name: str, **kwargs) -> None
  • Arguments:
    • dataset_name (str): The name of the dataset.
  • Returns:
    • None

get_dataset_config

Fetches and caches the dataset configuration.

@classmethod
def get_dataset_config(cls, dataset_name: str, excluded_datasets: Optional[List[str]] = None, **kwargs) -> "Dataset"
  • Arguments:
    • dataset_name (str): The name of the dataset.
    • excluded_datasets (Optional[List[str]]): The datasets to exclude from the configuration.
  • Returns:
    • Dataset instance

add_dataset_columns

Adds columns to a dataset.

@classmethod
def add_dataset_columns(cls, dataset_name: str, columns: List[Union[Column, dict]], **kwargs)
  • Arguments:
    • dataset_name (str): The name of the dataset.
    • columns (List[Union[Column, dict]]): The columns to add to the dataset.
  • Returns:
    • Dataset instance

add_dataset_rows

Adds rows to a dataset.

@classmethod
def add_dataset_rows(cls, dataset_name: str, rows: List[Union[Row, dict]], **kwargs)
  • Arguments:
    • dataset_name (str): The name of the dataset.
    • rows (List[Union[Row, dict]]): The rows to add to the dataset.
  • Returns:
    • Dataset instance

Was this page helpful?

Questions & Discussion