• Access to Diverse Datasets: Leverage high-quality, curated datasets for training and testing.
  • Preprocessed Data: Many datasets are already formatted and ready to use.
  • Easy Integration: Datasets import directly into the platform without manual conversion.

This feature streamlines working with established datasets, so you can start data-driven experiments faster.


Steps to Import a Hugging Face Dataset

1. Access the Dataset Import Section

  • Navigate to the Datasets & Experiments section.

  • Click “Add Dataset” to open the dataset creation options.

  • Select “Import from Hugging Face” from the available choices.

2. Browse and Select a Dataset

  • The system presents a catalog of datasets sourced from Hugging Face.
  • Each dataset includes key metadata such as:
    • Dataset Name (e.g., databricks-dolly-15k)
    • Source: the publishing organization (e.g., OpenAI, Hugging Face, Microsoft)
    • Record Count
    • Usage popularity and other metadata
  • Use the search function to find a specific dataset.
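Searching the catalog amounts to filtering the metadata listed above. The following is a minimal, hypothetical model of such a filter, not the platform's implementation: `CatalogEntry` and `search_catalog` are illustrative names, matching is assumed to be a case-insensitive substring test on the name and source, and `popularity` stands in for whatever usage metric the catalog displays.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class CatalogEntry:
    # Hypothetical fields mirroring the metadata shown in the catalog.
    name: str          # e.g. "databricks-dolly-15k"
    source: str        # publishing organization
    record_count: int  # number of rows in the dataset
    popularity: int    # assumed usage score, e.g. a download count


def search_catalog(entries: Iterable[CatalogEntry], query: str) -> List[CatalogEntry]:
    """Return entries whose name or source contains `query` (case-insensitive),
    ordered from most to least popular."""
    q = query.strip().lower()
    matches = [e for e in entries if q in e.name.lower() or q in e.source.lower()]
    return sorted(matches, key=lambda e: e.popularity, reverse=True)
```

To search the Hugging Face Hub directly instead of the platform's catalog, the `huggingface_hub` library offers `HfApi().list_datasets(search=...)`.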

3. Configure Dataset Parameters

  • After you select a dataset, a configuration panel appears with the following settings:
    • Dataset Overview: Summary, source, and dataset reference link.

    • Subset Selection: Options include Default, Train, or Split.

    • Number of Rows: Specify how many records to import.

    • Additional Preferences: Optionally enable “Add selected rows” to filter the imported records more precisely.
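If you want to reproduce a similar selection in code, these settings correspond roughly to arguments of the Hugging Face `datasets` library. The sketch below is an assumption about that mapping, not a description of how the platform performs the import: it treats the subset as a dataset configuration name, the split as a split name such as `train`, and the row count as a `datasets` split slice (for example `train[:500]`). The helper names `build_split_spec` and `load_rows` are hypothetical.

```python
from typing import Any, Callable, Optional


def build_split_spec(split: str = "train", num_rows: Optional[int] = None) -> str:
    """Build a split string in the `datasets` slicing syntax.

    build_split_spec("train", 500) -> "train[:500]"  (first 500 rows)
    build_split_spec("train")      -> "train"        (all rows)
    """
    if not split:
        raise ValueError("split must be a non-empty name such as 'train'")
    if num_rows is None:
        return split
    if num_rows <= 0:
        raise ValueError("num_rows must be a positive integer")
    return f"{split}[:{num_rows}]"


def load_rows(
    dataset_id: str,
    split: str = "train",
    num_rows: Optional[int] = None,
    subset: Optional[str] = None,
    loader: Optional[Callable[..., Any]] = None,
) -> Any:
    """Load the first `num_rows` rows of `split` from a Hub dataset.

    `subset` is passed as the configuration name (None selects the dataset's
    default configuration). `loader` defaults to `datasets.load_dataset` and is
    injectable so the function can be exercised without network access.
    """
    if loader is None:
        from datasets import load_dataset  # requires `pip install datasets`
        loader = load_dataset
    return loader(dataset_id, subset, split=build_split_spec(split, num_rows))
```

For example, `load_rows("databricks/databricks-dolly-15k", "train", 500)` would request the first 500 rows of the Dolly training split; this call needs the `datasets` package and network access to the Hub.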

4. Initiate the Import Process

  • Click “Start Experimenting” to begin importing the dataset.
  • The imported dataset then appears in the Datasets & Experiments section, ready for further processing and experiments.