Dataset Class

The Dataset class is the primary client for managing datasets in the Future AGI SDK. It supports both class-level (static) and instance-level operations for creating, downloading, modifying, and deleting datasets, as well as adding columns, rows, prompts, and evaluations.

Initialization

def __init__(
    self,
    dataset_config: Optional[DatasetConfig] = None,
    fi_api_key: Optional[str] = None,
    fi_secret_key: Optional[str] = None,
    fi_base_url: Optional[str] = None,
    **kwargs,
)

Arguments:

  • dataset_config (Optional[DatasetConfig]): The configuration for the dataset. If provided and has no ID, the config will be fetched by name.
  • fi_api_key (Optional[str]): API key for authentication.
  • fi_secret_key (Optional[str]): Secret key for authentication.
  • fi_base_url (Optional[str]): Base URL for the API.
  • **kwargs: Additional keyword arguments for advanced configuration.

Instance Methods

create

Creates a new dataset (optionally from a file or Huggingface config)

def create(self, source: Optional[Union[str, HuggingfaceDatasetConfig]] = None) -> "Dataset"
  • Returns:
    • Dataset instance

download

Downloads the dataset to a file or as a pandas DataFrame.

def download(self, file_path: Optional[str] = None, load_to_pandas: bool = False) -> Union[str, pd.DataFrame, "Dataset"]
  • Returns:
    • File path (str)
    • DataFrame
    • Dataset instance

delete

Deletes the current dataset.

def delete(self) -> None
  • Returns:
    • None

get_config

def get_config(self) -> DatasetConfig
  • Returns:
    • DatasetConfig instance

add_columns

Adds columns to the dataset.

def add_columns(self, columns: List[Union[Column, dict]]) -> "Dataset"
  • Arguments:
    • columns (List[Union[Column, dict]]): A list of Column objects or dictionaries.
  • Returns:
    • Dataset instance

add_rows

Adds rows to the dataset.

def add_rows(self, rows: List[Union[Row, dict]]) -> "Dataset"
  • Arguments:
    • rows (List[Union[Row, dict]]): A list of Row objects or dictionaries.
  • Returns:
    • Dataset instance

get_column_id

Returns the column ID for a given column name.

def get_column_id(self, column_name: str) -> Optional[str]
  • Arguments:
    • column_name (str): The name of the column.
  • Returns:
    • The column ID (str)

add_run_prompt

Adds a run prompt column to the dataset.

def add_run_prompt(
    self,
    name: str,
    model: str,
    messages: List[Dict[str, str]],
    output_format: str = "string",
    concurrency: int = 5,
    max_tokens: int = 500,
    temperature: float = 0.5,
    presence_penalty: float = 1,
    frequency_penalty: float = 1,
    top_p: float = 1,
    tools: Optional[List[Dict]] = None,
    tool_choice: Optional[Any] = None,
    response_format: Optional[Dict] = None,
) -> "Dataset"
  • Arguments:
    • name (str): The name of the run prompt column.
    • model (str): The model to use for the run prompt column.
    • messages (List[Dict[str, str]]): The messages to use for the run prompt column.
    • output_format (str): The output format to use for the run prompt column.
    • concurrency (int): The concurrency to use for the run prompt column.
    • max_tokens (int): The max tokens to use for the run prompt column.
    • temperature (float): The temperature to use for the run prompt column.
    • presence_penalty (float): The presence penalty to use for the run prompt column.
    • frequency_penalty (float): The frequency penalty to use for the run prompt column.
    • top_p (float): The top p to use for the run prompt column.
    • tools (Optional[List[Dict]]): The tools to use for the run prompt column.
    • tool_choice (Optional[Any]): The tool choice to use for the run prompt column.
    • response_format (Optional[Dict]): The response format to use for the run prompt column.
  • Returns:
    • Dataset instance

add_evaluation

Adds an evaluation to the dataset.

def add_evaluation(
    self,
    name: str,
    eval_template: str,
    required_keys_to_column_names: Dict[str, str],
    save_as_template: bool = False,
    run: bool = True,
    reason_column: bool = False,
    config: Optional[Dict[str, Any]] = None,
) -> "Dataset"
  • Arguments:
    • name (str): The name of the evaluation.
    • eval_template (str): The evaluation template to use for the evaluation.
    • required_keys_to_column_names (Dict[str, str]): The required keys to column names to use for the evaluation.
    • save_as_template (bool): Whether to save the evaluation as a template.
    • run (bool): Whether to run the evaluation.
    • reason_column (bool): Whether to add a reason column to the evaluation.
    • config (Optional[Dict[str, Any]]): The configuration to use for the evaluation.
  • Returns:
    • Dataset instance

get_eval_stats

Returns evaluation statistics for the dataset.

def get_eval_stats(self) -> Dict[str, Any]
  • Returns:
    • A dictionary containing evaluation statistics.

add_optimization

Adds an optimization task to the dataset.

def add_optimization(
    self,
    optimization_name: str,
    prompt_column_name: str,
    optimize_type: str = "PROMPT_TEMPLATE",
    model_config: Optional[Dict[str, Any]] = None,
) -> "Dataset"
  • Arguments:
    • optimization_name (str): The name of the optimization task.
    • prompt_column_name (str): The name of the prompt column to optimize.
    • optimize_type (str): The type of optimization to perform.
    • model_config (Optional[Dict[str, Any]]): The model configuration to use for the optimization.
  • Returns:
    • Dataset instance

Class Methods

create_dataset

Creates a dataset using the provided config.

@classmethod
def create_dataset(cls, dataset_config: DatasetConfig, source: Optional[Union[str, HuggingfaceDatasetConfig]] = None, **kwargs) -> "Dataset"
  • Arguments:
    • dataset_config (DatasetConfig): The configuration for the dataset.
    • source (Optional[Union[str, HuggingfaceDatasetConfig]]): The source to use for the dataset.
  • Returns:
    • Dataset instance

download_dataset

Downloads a dataset by name.

@classmethod
def download_dataset(cls, dataset_name: str, file_path: Optional[str] = None, load_to_pandas: bool = False, **kwargs) -> Union[str, pd.DataFrame]
  • Arguments:
    • dataset_name (str): The name of the dataset.
    • file_path (Optional[str]): The file path to save the dataset to.
    • load_to_pandas (bool): Whether to load the dataset to a pandas DataFrame.
  • Returns:
    • The file path (str)
    • DataFrame

delete_dataset

Deletes a dataset by name.

@classmethod
def delete_dataset(cls, dataset_name: str, **kwargs) -> None
  • Arguments:
    • dataset_name (str): The name of the dataset.
  • Returns:
    • None

get_dataset_config

Fetches and caches the dataset configuration.

@classmethod
def get_dataset_config(cls, dataset_name: str, excluded_datasets: Optional[List[str]] = None, **kwargs) -> "Dataset"
  • Arguments:
    • dataset_name (str): The name of the dataset.
    • excluded_datasets (Optional[List[str]]): The datasets to exclude from the configuration.
  • Returns:
    • Dataset instance

add_dataset_columns

Adds columns to a dataset.

@classmethod
def add_dataset_columns(cls, dataset_name: str, columns: List[Union[Column, dict]], **kwargs)
  • Arguments:
    • dataset_name (str): The name of the dataset.
    • columns (List[Union[Column, dict]]): The columns to add to the dataset.
  • Returns:
    • Dataset instance

add_dataset_rows

Adds rows to a dataset.

@classmethod
def add_dataset_rows(cls, dataset_name: str, rows: List[Union[Row, dict]], **kwargs)
  • Arguments:
    • dataset_name (str): The name of the dataset.
    • rows (List[Union[Row, dict]]): The rows to add to the dataset.
  • Returns:
    • Dataset instance