Synthetic Data Generation from Column Schemas
Define column schemas with types and categorical distributions, then generate structured test datasets from the Future AGI dashboard. No code required.
Synthetic Data Generation lets you define a column schema with types, constraints, and categorical distributions, then generate structured test datasets directly from the FutureAGI dashboard. Review, iterate, and run quality evals on the output — all without writing code.
| Time | Difficulty | Package |
|---|---|---|
| 10 min | Beginner | Dashboard only |
- FutureAGI account → app.futureagi.com
Tutorial
Start the synthetic data wizard
- Go to app.futureagi.com, then Dataset, then Add Dataset
- Select Create Synthetic Data
Add details
| Field | Value |
|---|---|
| Name | support-qa-synthetic |
| Description | Customer support Q&A pairs for an e-commerce company covering returns, shipping, billing, and account issues |
| Objective | Fine-tuning a support chatbot |
| Pattern | Questions phrased naturally as a customer would ask. Answers professional, concise, and actionable. |
| Enter No. of rows | 20 |
Select knowledge base (optional): If you have a Knowledge Base with your product docs, select it here to ground the generated data in your domain. The generator will use your KB documents as context and produce Q&A pairs that are verifiable against your actual content. Leave empty to generate without domain grounding.
To set up a KB first, see the Knowledge Base cookbook. You can also start directly from the KB detail view; click Create Synthetic data in the action bar, and the wizard opens with your KB pre-selected.
Click Next.
Add column properties
Add three columns using the Add columns button:
Column 1: question
- Column Type: Text
- Properties: Min Length = 20, Max Length = 200
Column 2: answer
- Column Type: Text
- Properties: Min Length = 50, Max Length = 500
Column 3: category
- Column Type: Text
- Properties: Set Value to Categorical with:
  - shipping: 25%
  - billing: 25%
  - returns: 25%
  - account: 25%
Note
Category percentages must sum to 100%. Use Add more properties to add constraints per column. See Dataset overview for all supported column types and properties.
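The 100% rule can be checked before you type the values into the wizard. The sketch below is only an illustration: `expected_counts` is a hypothetical helper, not part of any FutureAGI SDK, and it assumes whole-number percentages. It also shows the row counts an exact split would give; the platform may sample categories instead, so the counts in a generated dataset can differ.

```python
def expected_counts(distribution, n_rows):
    """Validate a categorical distribution and return an exact per-category split.

    `distribution` maps category name -> whole-number percentage.
    Leftover rows are assigned by largest remainder (an assumption made
    for this sketch, not documented platform behaviour).
    """
    total = sum(distribution.values())
    if total != 100:
        raise ValueError(f"Percentages sum to {total}, expected 100")

    # Integer arithmetic avoids floating-point rounding surprises.
    counts = {name: (n_rows * pct) // 100 for name, pct in distribution.items()}
    remainders = {name: (n_rows * pct) % 100 for name, pct in distribution.items()}

    leftover = n_rows - sum(counts.values())
    by_remainder = sorted(remainders, key=remainders.get, reverse=True)
    for name in by_remainder[:leftover]:
        counts[name] += 1
    return counts


# The distribution from this tutorial, for the 20 rows requested earlier.
category_split = expected_counts(
    {"shipping": 25, "billing": 25, "returns": 25, "account": 25}, 20
)
print(category_split)
```

With 20 rows and four equal categories, an exact split is 5 rows per category.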
Click Next.
Add description
Write a description for each column. Use {{column_name}} to reference other columns — this creates dependencies so generated values are contextually related.
Column 1: question
A realistic customer support question about {{category}} issues.
Phrased as a real customer would type it in a chat widget.
Column 2: answer
A professional support response to {{question}} about {{category}}.
Directly addresses the concern with a clear next step.
Column 3: category
The support category this Q&A pair belongs to.
Generate
Click Create Dataset. The platform generates rows server-side and redirects you to the new dataset.
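The {{column_name}} references in the column descriptions imply an order in which values have to exist: a column can only be described in terms of another column once that column has a value. How the platform resolves this internally is not documented here, so the following is only a sketch of the idea. It uses Python's standard `graphlib` module, and `references` and `generation_order` are hypothetical helper names.

```python
import re
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Column descriptions from this tutorial (first line of each).
descriptions = {
    "question": "A realistic customer support question about {{category}} issues.",
    "answer": "A professional support response to {{question}} about {{category}}.",
    "category": "The support category this Q&A pair belongs to.",
}


def references(text):
    """Return the set of column names referenced as {{name}} in a description."""
    return set(re.findall(r"\{\{\s*(\w+)\s*\}\}", text))


def generation_order(descs):
    """Order columns so that every referenced column comes before its dependents."""
    graph = {name: references(text) for name, text in descs.items()}
    unknown = {ref for refs in graph.values() for ref in refs} - graph.keys()
    if unknown:
        raise ValueError(f"Unknown column reference(s): {sorted(unknown)}")
    # static_order() raises graphlib.CycleError if columns reference each other in a loop.
    return list(TopologicalSorter(graph).static_order())


order = generation_order(descriptions)
print(order)
```

For the three columns above, category has no references, question depends on category, and answer depends on both, so category comes first and answer last. A check like this also catches typos in a reference name and circular references before you submit the schema.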
Review and iterate
- Sort/filter rows to inspect quality
- To re-generate or modify: click Configure Synthetic Data in the dataset toolbar. The Synthetic Data Details drawer opens.
  - Re-Generate same Configuration: retry with the same settings
  - Edit Configuration: modify the settings, then choose one of:
    - Replace the current dataset: overwrite with new rows
    - Create as new dataset: keep the original and generate a separate dataset
    - Add it to existing dataset: append new rows
Run evals on the generated data
- Click Evaluate in the dataset toolbar
- Add Evaluations → select completeness
- Map keys: output → answer, input → question
- Add & Run
Scores appear as a new column. Filter out low-quality rows before using the dataset for fine-tuning.
For batch evaluation via SDK, see Dataset SDK: Batch Evaluation.
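If you work with the scored rows outside the dashboard, filtering low-quality rows amounts to a threshold on the eval column. The sketch below makes several assumptions: the rows are available locally as a list of records, the score column is named `completeness`, scores are numeric on a 0 to 1 scale, and 0.7 is a reasonable cutoff. None of these are confirmed by this tutorial, and the sample rows are placeholders, not real output.

```python
# Placeholder rows standing in for a scored dataset (values are illustrative only).
rows = [
    {"question": "<question 1>", "answer": "<answer 1>", "category": "returns", "completeness": 0.92},
    {"question": "<question 2>", "answer": "<answer 2>", "category": "billing", "completeness": 0.41},
    {"question": "<question 3>", "answer": "<answer 3>", "category": "shipping", "completeness": 0.78},
    {"question": "<question 4>", "answer": "<answer 4>", "category": "account", "completeness": None},
]


def filter_by_score(records, score_column, threshold):
    """Split records into (kept, dropped) by a numeric score threshold.

    Records with a missing or non-numeric score are dropped rather than kept,
    so unscored rows never slip into a fine-tuning set unnoticed.
    """
    kept, dropped = [], []
    for record in records:
        score = record.get(score_column)
        if isinstance(score, (int, float)) and score >= threshold:
            kept.append(record)
        else:
            dropped.append(record)
    return kept, dropped


# Hypothetical threshold; choose one that fits your own score distribution.
kept, dropped = filter_by_score(rows, "completeness", 0.7)
print(f"kept {len(kept)} rows, dropped {len(dropped)} rows")
```

After filtering, compare the category mix of the kept rows against the 25% targets, since dropping rows can skew the distribution.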
What you built
You can now generate synthetic datasets with categorical distribution, iterate on the output, and run quality evals from the FutureAGI dashboard.
- Generated 20 synthetic Q&A rows with categorical distribution across support topics
- Used {{column_name}} references to create interdependent columns
- Reviewed and iterated on generation via the Configure Synthetic Data drawer
- Ran quality evals on the generated data