Create Synthetic Data

Prototype AI Applications — Build and test applications with representative data before collecting real data
Augment Training Sets — Expand limited datasets with diverse synthetic examples to improve model performance
Test Edge Cases — Generate rare scenarios that might be difficult to find in real-world data
Ensure Privacy Compliance — Avoid data privacy concerns by using synthetic alternatives to sensitive information
Balance Datasets — Create balanced class distributions for more effective model training

1. Open the Tool

Navigate to the Dataset section in the sidebar and click Add Dataset → Create Synthetic Data. This opens the interface where you define the structure and patterns of the synthetic dataset you want to generate.

2. Set Dataset Details

Start by providing basic metadata:

Name (required): Give your dataset a clear, descriptive title.
Description (required): Describe the dataset you are generating and the purpose it serves.
Use Case: Specify what the dataset will be used for, for example:
- “Simulated customer support logs for LLM fine-tuning”
- “Classification dataset with evenly distributed labels”
Pattern (optional): Describe the structure or style the generated data should follow, for example:
- “Follow a conversational pattern while generating the dataset”
- “Keep the tone formal for all the data points”

This context helps organize datasets in large projects and enables team collaboration.

3. Define the Schema

Click Add Column to define the structure of each row. For every column:

Name: Name of the column (e.g., message, label, timestamp, transcript)
Type: Choose from:
- text, float, integer, boolean, array, json, datetime
Properties:
- Add constraints (such as min/max or string patterns) to keep values within realistic ranges.
- For the Value property, you can either specify the categorical labels yourself or choose dynamic and let the generator decide the labels.
- You can create additional properties for your use case by specifying a name and description for each one.

This step is where you define how your data behaves—whether it mimics user queries, numerical values, or system logs.

3.1 Example Schema Definition

Let’s illustrate with an example. Suppose you’re creating a dataset for product reviews. You might define the following columns:

Column 1:
- Name: review_text
- Type: text
- Properties: None specific, as the content is freeform.
Column 2:
- Name: rating
- Type: integer
- Properties:
  - min: 1 (Ensures ratings are at least 1 star)
  - max: 5 (Ensures ratings are at most 5 stars)
Column 3:
- Name: sentiment
- Type: text
- Properties:
  - Value: positive, negative, neutral (Specifies allowed categorical values)
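
To make the constraints concrete, the example schema can also be written down as plain data and used to check rows by hand. The following is a minimal Python sketch, not a platform API: the `Column` class, `REVIEW_SCHEMA`, and `validate_row` are hypothetical names introduced only for illustration, and the property keys simply mirror the form fields above.

```python
from dataclasses import dataclass, field
from typing import Any


# Hypothetical local mirror of one "Add Column" entry; not part of any SDK.
@dataclass
class Column:
    name: str
    type: str  # text, float, integer, boolean, array, json, or datetime
    properties: dict[str, Any] = field(default_factory=dict)


# The three columns from the product review example above.
REVIEW_SCHEMA = [
    Column("review_text", "text"),
    Column("rating", "integer", {"min": 1, "max": 5}),
    Column("sentiment", "text", {"value": ["positive", "negative", "neutral"]}),
]


def validate_row(row: dict[str, Any], schema: list[Column]) -> list[str]:
    """Return a list of problems; an empty list means the row fits the schema."""
    problems = []
    for col in schema:
        if col.name not in row:
            problems.append(f"{col.name}: missing")
            continue
        value = row[col.name]
        # Only the two types used in this example are checked here.
        if col.type == "integer" and (isinstance(value, bool) or not isinstance(value, int)):
            problems.append(f"{col.name}: expected integer")
            continue
        if col.type == "text" and not isinstance(value, str):
            problems.append(f"{col.name}: expected text")
            continue
        if "min" in col.properties and value < col.properties["min"]:
            problems.append(f"{col.name}: below min {col.properties['min']}")
        if "max" in col.properties and value > col.properties["max"]:
            problems.append(f"{col.name}: above max {col.properties['max']}")
        allowed = col.properties.get("value")
        if allowed is not None and value not in allowed:
            problems.append(f"{col.name}: not one of {allowed}")
    return problems
```

Calling `validate_row({"review_text": "Great", "rating": 7, "sentiment": "positive"}, REVIEW_SCHEMA)` would report the rating as above the maximum, which is the kind of value the min/max properties are meant to rule out.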

4. Set Row Count

Specify how many rows the dataset should contain. The generator creates that many entries based on your schema. Then click Next.

5. Define Column Descriptions

Describe each column you defined. These descriptions give the generator the context it needs to produce rich, realistic values for every column.

6. Generate the Dataset

Click Next to preview the schema and example values.
Review and make adjustments if needed.
Click Create to generate the full dataset.

Once complete, the dataset is saved and ready for exploration or use in downstream tasks.
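
If class balance was one of your goals, a quick check of the label distribution can confirm the result before you use the dataset downstream. The sketch below assumes the rows are already available locally as a list of dictionaries (how you obtain them is outside the scope of this guide). `label_distribution` is a hypothetical helper, and the sample rows are hand-written for illustration, not generator output.

```python
from collections import Counter


def label_distribution(rows, column):
    """Return each label's share of the rows for the given column."""
    counts = Counter(row[column] for row in rows)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: count / total for label, count in counts.items()}


# Hand-written sample rows shaped like the product review example.
rows = [
    {"review_text": "Loved it", "rating": 5, "sentiment": "positive"},
    {"review_text": "Broke in a week", "rating": 1, "sentiment": "negative"},
    {"review_text": "It works", "rating": 3, "sentiment": "neutral"},
    {"review_text": "Would buy again", "rating": 4, "sentiment": "positive"},
]

shares = label_distribution(rows, "sentiment")
for label, share in sorted(shares.items()):
    print(f"{label}: {share:.0%}")
```

A heavily skewed distribution suggests revisiting the Value property or the column description for that label and generating again.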

What’s Next?

Once your synthetic dataset is created, you can:

Explore the Data: Click on the dataset name to view the generated rows and columns.
Use in Experiments: Integrate your dataset into Experimentation Workflows.
Add Annotations: Enhance the dataset with Annotations.
