Evaluation Groups
Organize multiple evaluations into groups and run them together across datasets, simulations, and more.
What it is
Evaluation groups are reusable bundles of multiple eval templates that can be applied together across datasets, simulations, experiments, and prompts. Instead of configuring evals individually each time, a group defines a consistent eval set once and can be mapped wherever needed — keeping quality checks uniform across the platform.
Use cases
- Batch execution — Run several evals at once (e.g. tone, task completion, safety) instead of adding each eval separately.
- Consistent configuration — Apply the same group to multiple datasets or run tests with one mapping step so settings stay aligned.
- Reusability — Save a group and reuse it for future runs or in experiments and prompt workbench.
- Organization — Keep related evals (e.g. “quality bundle”, “safety checks”) in one place for easier management.
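The ideas above (a reusable bundle of evals executed together) can be sketched in plain Python. This is a conceptual illustration only, not the platform's SDK; the `Eval` and `EvalGroup` names and the toy scoring functions are invented for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Eval:
    """A single evaluation: a name plus a scoring function."""
    name: str
    score: Callable[[str, str], float]  # (input_text, output_text) -> score

@dataclass
class EvalGroup:
    """A reusable bundle of evals that always runs together."""
    name: str
    evals: List[Eval] = field(default_factory=list)

    def run(self, rows):
        """Apply every eval in the group to each (input, output) row."""
        return [
            {e.name: e.score(inp, out) for e in self.evals}
            for inp, out in rows
        ]

# A "quality bundle" defined once and reused across datasets,
# instead of configuring each eval separately per run.
tone = Eval("tone", lambda i, o: 1.0 if "please" in o.lower() else 0.0)
completeness = Eval("completeness", lambda i, o: min(len(o) / 50, 1.0))
group = EvalGroup("quality-bundle", [tone, completeness])

rows = [("Summarize the doc", "Please find the summary attached.")]
print(group.run(rows))
```

The point of the pattern is that the group, not each run, owns the eval configuration, so every dataset scored with `quality-bundle` gets an identical set of checks.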
How to
Open Eval groups in Evaluation, create a new group, select the evals, add a name and description, and save. You can then apply the group to a dataset, simulation, experiment, or prompt, and edit it later to add or remove templates.
Open eval groups in Evaluation
Go to Evaluation in the platform and open Eval groups (or Groups). You’ll see your existing eval groups and an option to create a new one.

Create a new group
Click Create (or New group). The create-group flow opens.
Select evals
Choose the eval templates to include in the group. You can add multiple templates and mix built-in and custom evals; select everything you want to run together in this group.

Add name and description, then save
Enter a name and optional description for the group, then save. The group appears in your eval groups list and is ready to apply.

Apply the group
Open the context where you want to run evals (e.g. a dataset, a run test in Simulate, an experiment, or a prompt in Prompt Workbench), then:
1. Choose Apply eval group (or equivalent) and select your group.
2. Set the mapping (e.g. which dataset columns or fields map to each eval’s input/output).
3. Optionally pick a model and other filters (e.g. error localization) if the UI offers them.
4. Deselect any evals from the group that you don’t want to include in this run.
5. Apply; the platform creates the individual eval configs from the group and runs them together.
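The mapping step can be pictured as a small lookup from each eval's expected fields to the columns of the dataset at hand. This sketch is illustrative, not the platform's actual mapping format; the field and column names are assumptions.

```python
# One row of a hypothetical dataset; column names are examples.
dataset_row = {"question": "What is RAG?", "answer": "Retrieval-augmented generation..."}

# Mapping chosen in the UI: eval field name -> dataset column name.
mapping = {"input": "question", "output": "answer"}

def resolve(row, mapping):
    """Build an eval's inputs from a dataset row using the mapping."""
    return {eval_field: row[column] for eval_field, column in mapping.items()}

print(resolve(dataset_row, mapping))
# {'input': 'What is RAG?', 'output': 'Retrieval-augmented generation...'}
```

Because the group stores the evals and the run stores the mapping, the same group can score datasets whose columns are named differently.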

Edit or reuse the group
You can edit an existing group to add or remove eval templates. The same group can be reused on other datasets, run tests, or experiments — just apply it again and adjust the mapping for that context.
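Reuse across contexts then amounts to running the same group with a different per-context mapping. A minimal sketch, again with invented names rather than the real SDK:

```python
# Hypothetical reuse: one group of evals applied to two contexts,
# adjusting only the column mapping for each dataset.
def apply_group(group_evals, rows, mapping):
    resolved = [{f: row[c] for f, c in mapping.items()} for row in rows]
    return [
        {name: fn(r["input"], r["output"]) for name, fn in group_evals.items()}
        for r in resolved
    ]

# The group: a single toy eval shared by both runs.
evals = {"non_empty": lambda i, o: 1.0 if o.strip() else 0.0}

qa_rows = [{"question": "Hi?", "answer": "Hello!"}]
chat_rows = [{"user_msg": "Hi?", "bot_reply": ""}]

print(apply_group(evals, qa_rows, {"input": "question", "output": "answer"}))       # [{'non_empty': 1.0}]
print(apply_group(evals, chat_rows, {"input": "user_msg", "output": "bot_reply"}))  # [{'non_empty': 0.0}]
```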
Tip
You can mix built-in and custom evaluations in the same group to build a single assessment workflow.
Tip
Eval groups can be used across the platform: on datasets, Prompt Workbench, Simulation (run tests), Experiments, and elsewhere eval groups are supported.
What you can do next
Evaluate via Platform & SDK
Run a single eval from the UI or SDK.
Create custom evals
Define your own eval rules and criteria.
Use custom models
Bring your own model for evaluations.
Future AGI models
Built-in models available for evals.
CI/CD pipeline
Run evals automatically in your pipeline.
Evaluation overview
How evaluation fits into the platform.