Adding a Dataset
Import from Hugging Face
Import datasets from Hugging Face to build on high-quality, pre-existing data.
Why Import from Hugging Face?
- Access to Diverse Datasets: Leverage high-quality, curated datasets for training and testing.
- Preprocessed Data: Many datasets are formatted and ready to use.
- Easy Integration: Datasets import directly into the platform without manual conversion.
This feature streamlines the process of working with established datasets, making it faster and more efficient to get started with data-driven experiments.
Steps to Import a Hugging Face Dataset
1. Access the Dataset Import Section
- Navigate to the Datasets & Experiments section.
- Click on “Add Dataset” to access dataset creation options.
- Select “Import from Hugging Face” from the available choices.
2. Browse and Select a Dataset
- The system presents a catalog of datasets sourced from Hugging Face.
- Each dataset includes key metadata such as:
- Dataset Name (e.g., databricks-dolly-15k)
- Source (e.g., OpenAI, Hugging Face, Microsoft)
- Record Count
- Usage Popularity and Metadata
- Use the search functionality to locate a specific dataset.
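The catalog search described above amounts to a match over dataset metadata. A minimal sketch in Python, assuming an in-memory catalog; the entries, field names, and record counts here are illustrative stand-ins, not the platform's actual schema:

```python
# Illustrative catalog entries; names, sources, and counts are assumptions,
# not the platform's actual data.
CATALOG = [
    {"name": "databricks-dolly-15k", "source": "Databricks", "records": 15011},
    {"name": "gsm8k", "source": "OpenAI", "records": 8792},
    {"name": "orca-math-word-problems-200k", "source": "Microsoft", "records": 200035},
]

def search_catalog(catalog, query):
    """Case-insensitive substring match on the dataset name."""
    q = query.lower()
    return [entry for entry in catalog if q in entry["name"].lower()]

matches = search_catalog(CATALOG, "dolly")
print([m["name"] for m in matches])  # ['databricks-dolly-15k']
```
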
3. Configure Dataset Parameters
- Upon selection, a configuration panel appears displaying:
- Dataset Overview: Summary, source, and dataset reference link.
- Subset and Split Selection: Choose which subset (e.g., default) and split (e.g., train) to import.
- Number of Rows: Specify the number of records to be imported.
- Additional Preferences: Optionally enable “Add selected rows” for precise filtering.
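The split and row-count parameters above correspond to slicing the chosen split before import. A minimal sketch of that selection logic, assuming the dataset is a mapping from split names to record lists; the structure, helper name, and sample data are illustrative:

```python
def select_rows(splits, split="train", num_rows=None):
    """Return records from one split, optionally truncated to num_rows."""
    rows = splits[split]
    return rows if num_rows is None else rows[:num_rows]

# Stand-in for an imported dataset; real data is fetched from Hugging Face.
dataset = {
    "train": [{"instruction": f"q{i}", "response": f"a{i}"} for i in range(10)],
    "test": [{"instruction": f"q{i}", "response": f"a{i}"} for i in range(3)],
}

sample = select_rows(dataset, split="train", num_rows=5)
print(len(sample))  # 5
```

With the Hugging Face `datasets` library, the same idea is expressed via slice notation in `load_dataset`, e.g. `split="train[:100]"`.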
4. Initiate the Import Process
- Click “Start Experimenting” to begin importing the dataset.
- Once the import completes, the dataset appears in the Datasets & Experiments section for further processing.