Evaluation Groups

Organize multiple evaluations into groups and run them together across datasets, simulations, and more.

What it is

Evaluation groups are reusable bundles of eval templates that you can apply together to datasets, simulations, experiments, and prompts. Instead of configuring each eval individually every time, you define a consistent set of evals once in a group and map it wherever it is needed, which keeps quality checks uniform across the platform.


Use cases

  • Batch execution: Run several evals at once (e.g. tone, task completion, safety) instead of adding each eval separately.
  • Consistent configuration: Apply the same group to multiple datasets or run tests in a single mapping step so settings stay aligned.
  • Reusability: Save a group and reuse it in future runs, experiments, and Prompt Workbench.
  • Organization: Keep related evals (e.g. “quality bundle”, “safety checks”) in one place for easier management.

How to

Open Eval groups in Evaluation, create a new group, select the evals, add a name and description, and save. You can then apply the group to a dataset, simulation, experiment, or prompt, and edit it later to add or remove templates.

Open eval groups in Evaluation

Go to Evaluation in the platform and open Eval groups (or Groups). You’ll see your existing eval groups and an option to create a new one.

Create a new group

Click Create (or New group). The create-group flow opens.

Select evals

Choose the eval templates to include in the group, that is, all the evals you want to run together. You can add multiple templates and mix built-in and custom evals.

Add name and description, then save

Enter a name and an optional description for the group, then save. The group appears in your eval groups list and is ready to apply.
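The platform does not document how a group is stored, but the create flow above collects three things: a name, an optional description, and one or more eval templates, which may mix built-in and custom evals. As a minimal, hypothetical sketch (the class and field names are illustrative, not the platform's API; the validation rules and the custom eval `brand_voice` are assumptions), a group could be modeled like this:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EvalTemplate:
    # Hypothetical template record: a name plus a flag for built-in vs custom.
    name: str
    custom: bool = False


@dataclass
class EvalGroup:
    # Hypothetical model of what the create-group flow collects.
    name: str
    templates: list[EvalTemplate]
    description: Optional[str] = None  # the description is optional

    def __post_init__(self) -> None:
        # Assumed rules: a name is required and the group holds at least one template.
        if not self.name.strip():
            raise ValueError("an eval group needs a name")
        if not self.templates:
            raise ValueError("an eval group needs at least one eval template")


# A group that mixes built-in evals with one (illustrative) custom eval.
quality = EvalGroup(
    name="quality bundle",
    templates=[
        EvalTemplate("tone"),
        EvalTemplate("task_completion"),
        EvalTemplate("brand_voice", custom=True),
    ],
)
```

Keeping templates as a plain list means editing a group later is simply adding or removing entries.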

Apply the group

Open the context where you want to run evals (e.g. a dataset, a run test in Simulate, an experiment, or a prompt in Prompt Workbench). Choose Apply eval group (or equivalent) and select your group. Set the mapping, that is, which dataset columns or fields map to each eval’s input and output. If the UI offers them, optionally pick a model and other options (e.g. error localization). You can deselect specific evals in the group if you don’t want to run all of them in this run. Click Apply: the platform creates the individual eval configs from the group and runs them together.
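The apply step can be pictured as expanding one group into one eval config per selected template, each carrying the mapping for the current context. The sketch below is hypothetical: `apply_group`, `EvalConfig`, the per-template `required_keys`, the column names, and the mapping format (eval input key to dataset column) are illustrative assumptions, not the platform's API.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class EvalTemplate:
    # Hypothetical template: its name and the input keys it needs mapped.
    name: str
    required_keys: tuple[str, ...]


@dataclass
class EvalConfig:
    # One concrete eval config created from the group for this run.
    template: str
    mapping: dict[str, str]  # eval input key -> dataset column or field
    model: Optional[str] = None


def apply_group(
    templates: Iterable[EvalTemplate],
    mapping: dict[str, str],
    deselected: Iterable[str] = (),
    model: Optional[str] = None,
) -> list[EvalConfig]:
    """Expand a group into one config per selected eval (hypothetical)."""
    skip = set(deselected)
    configs = []
    for t in templates:
        if t.name in skip:
            continue  # deselected for this run only; the group is unchanged
        missing = [k for k in t.required_keys if k not in mapping]
        if missing:
            raise ValueError(f"{t.name}: no mapping for {missing}")
        # Each eval receives only the keys it needs from the shared mapping.
        needed = {k: mapping[k] for k in t.required_keys}
        configs.append(EvalConfig(t.name, needed, model))
    return configs


group = [
    EvalTemplate("tone", ("output",)),
    EvalTemplate("task_completion", ("input", "output")),
    EvalTemplate("safety", ("output",)),
]
configs = apply_group(
    group,
    mapping={"input": "user_question", "output": "model_answer"},
    deselected={"safety"},
)
for c in configs:
    print(c.template, c.mapping)
```

In this model, reusing the group in another context means calling the same expansion with a different mapping; the group definition itself does not change.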

Edit or reuse the group

You can edit an existing group to add or remove eval templates. The same group can be reused on other datasets, run tests, or experiments: apply it again and adjust the mapping for that context.

Tip

You can mix built-in and custom evaluations in the same group to build a single assessment workflow.

Tip

Eval groups can be used across the platform: on datasets, in Prompt Workbench, in Simulation (run tests), in Experiments, and anywhere else eval groups are supported.

