Generate Synthetic Data

Generate synthetic datasets with Future AGI. Define schemas, column types, and constraints to create realistic data for training and evaluation.

What is it?

Dataset is Future AGI’s data management product. The synthetic data generation feature lets you create realistic, structured datasets from scratch — without collecting or exposing real user data. You define the schema, column types, and constraints; the platform generates rows that match your specification. Use it to build training sets, test edge cases, prototype AI pipelines, or create evaluation datasets when real data is unavailable or restricted.


Open the Tool

Navigate to the Dataset section in the sidebar. Click Add Dataset, then select Create Synthetic Data.

This opens the interface where you’ll define the structure and patterns for your synthetic dataset.

Set Dataset Details

Provide the basic metadata for your dataset:

  • Name (required) — A clear, descriptive title for the dataset.
  • Description (required) — What the dataset is for and how it will be used.
  • Use Case — The intended application, e.g. “Simulated customer support logs for LLM fine-tuning”.
  • Pattern (optional) — Structural or stylistic rules, e.g. “Follow a conversational pattern” or “Keep tone formal”.
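
It can help to draft these fields before opening the form. The following is a minimal local sketch only: the `DatasetDetails` class and its field names are hypothetical and simply mirror the form fields listed above; it does not call or represent any Future AGI API.

```python
from dataclasses import dataclass


@dataclass
class DatasetDetails:
    """Hypothetical local record mirroring the form fields; not a Future AGI API."""

    name: str          # required in the form
    description: str   # required in the form
    use_case: str = ""
    pattern: str = ""  # optional in the form

    def missing_required(self) -> list[str]:
        """Return the required fields that are empty or whitespace-only."""
        required = {"name": self.name, "description": self.description}
        return [field for field, value in required.items() if not value.strip()]


details = DatasetDetails(
    name="support-logs-v1",
    description="Simulated support chats for fine-tuning experiments.",
    use_case="Simulated customer support logs for LLM fine-tuning",
    pattern="Keep tone formal",
)
print(details.missing_required())  # prints []
```

An empty list means both required fields are filled in; any names returned are the fields still to complete.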

Define the Schema

Click Add Column to define the structure of each row. For every column:

  • Name — e.g. message, label, transcript
  • Type — text, float, integer, boolean, array, json, or datetime
  • Properties — Constraints such as min/max values or string patterns. For categorical columns, list the allowed values, or leave the column dynamic to let the generator decide.

Example schema for a product reviews dataset:

Column       Type     Properties
review_text  text     None (freeform content)
rating       integer  min: 1, max: 5
sentiment    text     Values: positive, negative, neutral
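
To make the constraints concrete, the example schema can be written as plain Python data and used to check rows locally. This is a sketch under stated assumptions: the `SCHEMA` layout, the `PYTHON_TYPES` mapping, and the `validate_row` helper are all hypothetical and are not part of the platform or its SDK; they only illustrate what "matching the specification" means for this example.

```python
# Hypothetical local representation of the example schema; not a Future AGI API.
SCHEMA = {
    "review_text": {"type": "text"},
    "rating": {"type": "integer", "min": 1, "max": 5},
    "sentiment": {"type": "text", "values": ["positive", "negative", "neutral"]},
}

# Only the column types used in this example are mapped here.
PYTHON_TYPES = {"text": str, "integer": int}


def validate_row(row: dict) -> list[str]:
    """Return a list of problems; an empty list means the row conforms."""
    problems = []
    for column, spec in SCHEMA.items():
        if column not in row:
            problems.append(f"{column}: missing")
            continue
        value = row[column]
        expected = PYTHON_TYPES[spec["type"]]
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(value, expected) or isinstance(value, bool):
            problems.append(f"{column}: expected {spec['type']}")
            continue
        if "min" in spec and value < spec["min"]:
            problems.append(f"{column}: below min {spec['min']}")
        if "max" in spec and value > spec["max"]:
            problems.append(f"{column}: above max {spec['max']}")
        if "values" in spec and value not in spec["values"]:
            problems.append(f"{column}: not one of {spec['values']}")
    return problems


good = {"review_text": "Arrived quickly, works well.", "rating": 5, "sentiment": "positive"}
bad = {"review_text": "Broke in a week.", "rating": 7, "sentiment": "angry"}

print(validate_row(good))  # prints []
print(validate_row(bad))   # reports the out-of-range rating and the unknown sentiment
```

A check like this can be run on a handful of preview rows to confirm that the min/max and categorical constraints were entered as intended.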

Define Column Descriptions

Add a description for each column you defined. This gives the generator the context it needs to produce rich, relevant data for each field.
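
As an illustration, descriptions for the product reviews columns might be drafted as follows. The description texts and the `undescribed` helper are hypothetical examples written for this sketch, not output or API of the platform; the helper just flags columns that still lack a description.

```python
# Hypothetical column names and descriptions for the product reviews example.
columns = ["review_text", "rating", "sentiment"]

descriptions = {
    "review_text": "A customer's free-form review of a purchased product, 1-3 sentences.",
    "rating": "Star rating from 1 (worst) to 5 (best), consistent with the review text.",
    "sentiment": "Overall sentiment expressed in the review text.",
}


def undescribed(columns: list[str], descriptions: dict[str, str]) -> list[str]:
    """Return the columns that have no description or only whitespace."""
    return [c for c in columns if not descriptions.get(c, "").strip()]


print(undescribed(columns, descriptions))  # prints []
```

Descriptions that state how a column relates to the others (for example, that the rating should agree with the review text) give the generator more context than a bare restatement of the column name.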

Generate the Dataset

Review the schema and example values in the preview. Make any adjustments needed, then click Create to generate the full dataset.

Once complete, the dataset is saved and ready to explore or use in downstream tasks.
