Eval Task Aggregations

Aggregate eval-task results as per-eval rollups, per-span pivots, or both.

GET https://api.futureagi.com/tracer/eval-task/get_usage/

Authentication

X-Api-Key API Key Required

Your Future AGI API key used to authenticate requests. You can find and manage your API keys in the Dashboard under Settings.

X-Secret-Key Secret Key Required

Your Future AGI secret key, used alongside the API key for request authentication. This is generated when you create an API key in the Dashboard.

Query parameters

eval_task_id UUID Required

The eval task whose runs should be aggregated.

eval_aggregation boolean Optional

When true, the response includes the eval_aggregation object — one rollup per CustomEvalConfig that ran in the task, keyed by eval name. Defaults to false. At least one of eval_aggregation or span_aggregation must be true.

span_aggregation boolean Optional

When true, the response includes the span_aggregation object — one entry per span the task evaluated, keyed by span_id, with the raw value of every eval that touched it. Defaults to false. At least one of eval_aggregation or span_aggregation must be true.

start_date ISO-8601 datetime Optional

Inclusive lower bound on the span’s created_at — only eval runs whose linked span was created at or after this instant are aggregated. When omitted, no lower bound is applied.

end_date ISO-8601 datetime Optional

Inclusive upper bound on the span’s created_at — only eval runs whose linked span was created at or before this instant are aggregated. When omitted, no upper bound is applied.
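
A request can be assembled from the headers and query parameters above as in the following minimal sketch. The helper name build_usage_request is hypothetical, the ID and credentials are placeholders, and sending booleans as the lowercase strings true and false is an assumption this reference does not confirm. Only the URL, header names, and parameter names come from this page.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

# Endpoint URL as documented on this page.
BASE_URL = "https://api.futureagi.com/tracer/eval-task/get_usage/"


def build_usage_request(eval_task_id, api_key, secret_key, *,
                        eval_aggregation=False, span_aggregation=False,
                        start_date=None, end_date=None):
    """Return (url, headers) for the aggregation request (hypothetical helper)."""
    # The reference requires at least one aggregation flag to be true.
    if not (eval_aggregation or span_aggregation):
        raise ValueError("set eval_aggregation and/or span_aggregation to True")

    params = {
        "eval_task_id": eval_task_id,
        # Assumption: booleans are sent as lowercase "true"/"false".
        "eval_aggregation": "true" if eval_aggregation else "false",
        "span_aggregation": "true" if span_aggregation else "false",
    }
    # Optional bounds are sent as ISO-8601 strings and omitted when unset.
    if start_date is not None:
        params["start_date"] = start_date.isoformat()
    if end_date is not None:
        params["end_date"] = end_date.isoformat()

    headers = {"X-Api-Key": api_key, "X-Secret-Key": secret_key}
    return f"{BASE_URL}?{urlencode(params)}", headers


# Placeholder ID and credentials, for illustration only.
url, headers = build_usage_request(
    "00000000-0000-0000-0000-000000000000", "my-api-key", "my-secret-key",
    eval_aggregation=True,
    start_date=datetime(2024, 1, 1, tzinfo=timezone.utc),
)
print(url)
```

Checking the flags on the client side avoids a round trip for a request the reference says is invalid.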

Response

200 OK
eval_task_id string
UUID of the eval task that was aggregated. Echoed back from the request.
eval_aggregation object

Per-eval rollup. Present only when eval_aggregation=true. Keys are CustomEvalConfig names; values are one rollup object per eval.

id string
UUID of the eval config.
name string
Eval config name (same as the parent key).
output_type string
Normalised output type for the eval: percentage, pass_fail, or deterministic. Drives the shape of aggregated_score.
aggregated_score number | object | null

The eval-level rollup. Shape depends on output_type:
percentage: number (4-dp average across non-error runs, e.g. 0.7421).
pass_fail: number (pass rate as 0–100 with 2 dp, e.g. 87.5).
deterministic: object mapping each observed choice to its occurrence percentage 0–100 with 2 dp, e.g. {"positive": 62.5, "neutral": 25.0}. Only choices that actually appeared in the data are included.
null when no aggregatable rows exist (all errors / empty).
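
Because aggregated_score changes shape with output_type, client code usually has to branch on it. The sketch below is one way to do that; the function name describe_aggregated_score and the wording of the returned strings are illustrative assumptions, while the three output types and the null case are taken from the descriptions above.

```python
def describe_aggregated_score(rollup):
    """Turn one eval rollup object into a short human-readable summary."""
    output_type = rollup["output_type"]
    score = rollup["aggregated_score"]

    # null (None in Python) means no aggregatable rows existed.
    if score is None:
        return "no aggregatable runs"
    if output_type == "percentage":
        # A single number: the average across non-error runs.
        return f"average score {score:.4f}"
    if output_type == "pass_fail":
        # A single number on a 0-100 scale: the pass rate.
        return f"pass rate {score:.2f}%"
    if output_type == "deterministic":
        # An object mapping each observed choice to its percentage.
        parts = ", ".join(
            f"{choice}: {pct:.2f}%" for choice, pct in sorted(score.items())
        )
        return f"choice distribution {parts}"
    raise ValueError(f"unknown output_type: {output_type!r}")


print(describe_aggregated_score(
    {"output_type": "pass_fail", "aggregated_score": 87.5}
))  # pass rate 87.50%
```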

span_aggregation object

Per-span pivot. Present only when span_aggregation=true. Outer keys are span_id (one per span the task evaluated); inner keys are eval names; inner values are one entry per eval that touched the span.

id string
UUID of the eval config.
name string
Eval config name.
output_type string
Normalised output type for the eval: percentage, pass_fail, or deterministic. Drives the shape of value.
value number | boolean | array | null

The raw per-row eval result — no averaging. Shape depends on output_type:
percentage: number (e.g. 0.82).
pass_fail: boolean.
deterministic: array of choice strings (e.g. ["positive"]).
When the same (span, eval) pair has multiple runs (re-runs), the latest by created_at wins.
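
The nested span_aggregation object (span_id, then eval name, then entry) can be flattened into rows for tabular processing. The following is a minimal sketch: the function name flatten_span_aggregation is hypothetical, and the span IDs, config IDs, and eval names in the sample are made-up placeholders rather than real API output.

```python
def flatten_span_aggregation(span_aggregation):
    """Return one (span_id, eval_name, output_type, value) row per entry."""
    rows = []
    # Outer keys are span IDs; inner keys are eval names.
    for span_id, evals in span_aggregation.items():
        for eval_name, entry in evals.items():
            rows.append(
                (span_id, eval_name, entry["output_type"], entry["value"])
            )
    return rows


# Placeholder data shaped like the span_aggregation object described above.
sample = {
    "span-1": {
        "tone": {"id": "cfg-1", "name": "tone",
                 "output_type": "deterministic", "value": ["positive"]},
        "is_safe": {"id": "cfg-2", "name": "is_safe",
                    "output_type": "pass_fail", "value": True},
    },
    "span-2": {
        "relevance": {"id": "cfg-3", "name": "relevance",
                      "output_type": "percentage", "value": 0.82},
    },
}

for row in flatten_span_aggregation(sample):
    print(row)
```

Each (span, eval) pair appears at most once in the response, so the flattened rows need no further deduplication.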

Note

Soft-deleted eval runs are skipped in both aggregations so the rollups reflect the user’s current view of the data.

Both eval_aggregation and span_aggregation only include span-linked eval runs — session-target eval runs (where there is no underlying span) are excluded from both rollups, regardless of whether a date range is supplied.

start_date and end_date filter on the span’s creation time (observation_span.created_at), not on when the eval ran. The aggregation results therefore reflect only those spans that were created in the supplied window — eval runs against spans created outside the window are dropped from both rollups. When neither parameter is supplied, every span linked to the eval task is included.
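
The inclusive window semantics can be illustrated with a small predicate over a span's creation time. This is a sketch of the documented rule only, not the server's implementation; the function name span_in_window is hypothetical.

```python
from datetime import datetime, timezone


def span_in_window(created_at, start_date=None, end_date=None):
    """Inclusive check on a span's created_at; a missing bound is unbounded."""
    if start_date is not None and created_at < start_date:
        return False
    if end_date is not None and created_at > end_date:
        return False
    return True


start = datetime(2024, 1, 1, tzinfo=timezone.utc)
end = datetime(2024, 1, 31, 23, 59, 59, tzinfo=timezone.utc)

print(span_in_window(start, start, end))  # True: lower bound is inclusive
print(span_in_window(end, start, end))    # True: upper bound is inclusive
print(span_in_window(datetime(2024, 2, 1, tzinfo=timezone.utc), start, end))  # False
print(span_in_window(datetime(2023, 6, 1, tzinfo=timezone.utc)))  # True: no bounds
```

Note that the value tested is the span's creation time, so an eval that ran inside the window against a span created before it would still be excluded.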

Errors

400 Bad Request

eval_task_id is missing, or no eval task with that ID exists in the caller’s organization.

401 Unauthorized

Invalid or missing API credentials.

500 Internal Server Error

Unexpected server error.
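
A client might map these statuses to actionable messages as in the sketch below, which uses only the Python standard library. The names explain_status and fetch_usage are hypothetical, the hint wording paraphrases the descriptions above, and treating the response body as JSON is an assumption.

```python
import json
import urllib.error
import urllib.request

# Hints paraphrased from the error descriptions in this reference.
ERROR_HINTS = {
    400: "eval_task_id is missing or does not exist in your organization.",
    401: "Invalid or missing API credentials.",
    500: "Unexpected server error.",
}


def explain_status(status_code):
    """Return a hint for a documented status, or a generic fallback."""
    return ERROR_HINTS.get(status_code, f"Unexpected HTTP status {status_code}.")


def fetch_usage(url, headers, timeout=30):
    """GET the aggregation URL and decode the body (assumed to be JSON)."""
    request = urllib.request.Request(url, headers=headers, method="GET")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return json.loads(response.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        # Re-raise with a message tied to the documented error cases.
        raise RuntimeError(explain_status(err.code)) from err


print(explain_status(401))  # Invalid or missing API credentials.
```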
