Why Import from Hugging Face?
- Access to Diverse Datasets: Leverage high-quality, curated datasets for training and testing.
- Preprocessed Data: Many datasets are already formatted and ready to use.
- Easy Integration: Datasets import directly into the platform without manual conversion.
Steps to Import a Hugging Face Dataset
1. Access the Dataset Import Section
- Navigate to the Datasets & Experiments section.
- Click on “Add Dataset” to access dataset creation options.
- Select “Import from Hugging Face” from the available choices.
2. Browse and Select a Dataset
- The system presents a catalog of datasets sourced from Hugging Face.
- Each dataset listing shows key metadata, such as:
  - Dataset Name (e.g., databricks-dolly-15k)
  - Source (e.g., OpenAI, Hugging Face, Microsoft)
  - Record Count
  - Popularity and other usage metadata
- Use the search functionality to locate a specific dataset.
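The catalog search can be pictured as a filter over the metadata fields listed above. The sketch below is a hypothetical model, not the platform's implementation. `CatalogEntry`, `search_catalog`, and the sample entries are made-up placeholders. It assumes the search does a case-insensitive substring match on the dataset name and ranks matches by a numeric popularity score.

```python
from dataclasses import dataclass


@dataclass
class CatalogEntry:
    """Hypothetical model of one catalog listing; fields mirror the metadata above."""
    name: str
    source: str
    record_count: int
    popularity: int  # assumed numeric score, e.g. a download count


def search_catalog(entries: list[CatalogEntry], query: str) -> list[CatalogEntry]:
    """Return entries whose name contains `query` (case-insensitive), most popular first."""
    needle = query.strip().lower()
    matches = [e for e in entries if needle in e.name.lower()]
    return sorted(matches, key=lambda e: e.popularity, reverse=True)


# Placeholder entries for illustration only; names and numbers are not real catalog data.
catalog = [
    CatalogEntry("demo-instructions-small", "Example Org A", 1_000, 50),
    CatalogEntry("demo-instructions-large", "Example Org B", 20_000, 900),
    CatalogEntry("demo-summaries", "Example Org A", 5_000, 300),
]

for entry in search_catalog(catalog, "Instructions"):
    print(entry.name, entry.record_count)
```

Sorting after filtering keeps the ranking rule separate from the matching rule, so either can be changed (for example, sorting by record count instead) without touching the other.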
3. Configure Dataset Parameters
- After you select a dataset, a configuration panel displays:
  - Dataset Overview: Summary, source, and a link to the dataset's reference page.
  - Subset Selection: Choose which portion of the dataset to import; options include Default, Train, or Split.
  - Number of Rows: Specify how many records to import.
  - Additional Preferences: Optionally enable “Add selected rows” for precise filtering.
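How the platform applies these settings internally is not documented. As a rough, hypothetical sketch, the same choices can be expressed as arguments to `load_dataset` from the Hugging Face `datasets` library: the subset maps to the `name` argument (a dataset configuration; `None` selects the default), and the split plus row count map to a split-slicing expression such as `train[:500]`. `build_load_args` is an illustrative helper, not part of the platform or the library, and it assumes the requested rows are taken from the start of the split.

```python
from typing import Optional


def build_load_args(dataset_id: str, subset: Optional[str], split: str,
                    num_rows: Optional[int]) -> dict:
    """Hypothetical helper: translate panel choices into load_dataset keyword arguments."""
    if num_rows is not None and num_rows <= 0:
        raise ValueError("num_rows must be a positive integer")
    # Split-slicing syntax from the datasets library: "train[:500]" keeps the first 500 rows.
    # Assumption: "Number of Rows" means the first N rows of the chosen split.
    split_expr = split if num_rows is None else f"{split}[:{num_rows}]"
    args = {"path": dataset_id, "split": split_expr}
    if subset is not None:
        args["name"] = subset  # `name` selects a dataset configuration (subset)
    return args


args = build_load_args("databricks/databricks-dolly-15k", None, "train", 500)
print(args)  # {'path': 'databricks/databricks-dolly-15k', 'split': 'train[:500]'}
```

With the `datasets` package installed and network access, `datasets.load_dataset(**args)` would return the first 500 rows of the train split. Keeping the translation in a pure function makes it easy to check without downloading anything.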
4. Initiate the Import Process
- Click “Start Experimenting” to begin importing the dataset.
- The imported dataset then appears in the Datasets & Experiments section, where it is available for further processing and use.
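Before building experiments on the imported data, it can help to confirm that the import is complete. The sketch below is hypothetical: it assumes the imported rows can be exported from the platform as a list of Python dictionaries, and `check_import` is an illustrative helper, not a platform feature. The field names match the databricks-dolly-15k schema (`instruction`, `context`, `response`, `category`); the row contents are placeholders.

```python
def check_import(rows: list[dict], expected_count: int, required_fields: set[str]) -> list[str]:
    """Hypothetical post-import check: return a list of problems (empty if none were found)."""
    problems = []
    # Compare the number of rows received with the number requested in step 3.
    if len(rows) != expected_count:
        problems.append(f"expected {expected_count} rows, got {len(rows)}")
    # Flag any row that lacks a column the downstream experiment relies on.
    for i, row in enumerate(rows):
        missing = required_fields.difference(row.keys())
        if missing:
            problems.append(f"row {i} is missing {sorted(missing)}")
    return problems


# Field names follow the databricks-dolly-15k schema; row contents are placeholders.
DOLLY_FIELDS = {"instruction", "context", "response", "category"}
rows = [
    {"instruction": "...", "context": "", "response": "...", "category": "open_qa"},
    {"instruction": "...", "response": "..."},
]
print(check_import(rows, expected_count=2, required_fields=DOLLY_FIELDS))
# ["row 1 is missing ['category', 'context']"]
```

Returning a list of messages rather than raising on the first problem reports every incomplete row in one pass.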