How Error Feed Works: Trace Analysis and Issue Grouping

The mental model behind Error Feed: how traces become analyzed issues, how similar errors are grouped, and how findings surface in the UI.

About

This page walks through what happens between a raw trace arriving and an issue showing up in the Feed. It’s the mental model, not the internals.

The pipeline in four steps

Traces flow in from Observe

Every trace sent to a Future AGI Observe project is a candidate for analysis. Error Feed works on a sample of those traces, configurable per project. See Sampling.

No extra instrumentation needed. If your agent is already instrumented with any of the supported integrations, it’s already sending what Error Feed needs.
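
What a supported integration emits varies, but as a rough sketch, an OpenTelemetry-style setup looks like the following. This is an illustration, not Future AGI's actual configuration; the endpoint, tracer name, and span attributes are placeholders.

```python
from opentelemetry import trace
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter

# Placeholder endpoint: point this at whatever collector your integration uses.
provider = TracerProvider()
provider.add_span_processor(
    BatchSpanProcessor(OTLPSpanExporter(endpoint="https://collector.example.com/v1/traces"))
)
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("my-agent")

# Spans like this one, with their inputs, outputs, and tool metadata,
# are what Error Feed reads downstream.
with tracer.start_as_current_span("product_lookup") as span:
    span.set_attribute("tool.name", "catalog_search")
    # ... agent logic here ...
```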

Each trace is analyzed and scored

For every sampled trace, Error Feed:

  • Reads the full span tree: inputs, outputs, tool calls, LLM responses, errors, metadata
  • Checks for failures across the error taxonomy (five categories covering reasoning, safety, tool failures, workflow gaps, and reflection)
  • Scores the trace on four quality dimensions, 0–5: Factual Grounding, Privacy & Safety, Instruction Adherence, Optimal Plan Execution

Traces that pass without issues still get scored. A score isn't a severity; it's a quality signal.
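
For a concrete picture of the output, here is a hypothetical shape for one analyzed trace. The field names are illustrative, not the actual Error Feed schema:

```python
from dataclasses import dataclass, field

# Hypothetical shape for one analyzed trace; not the actual Error Feed schema.
# Scores are 0-5 on the four quality dimensions.
@dataclass
class TraceAnalysis:
    trace_id: str
    scores: dict[str, float] = field(default_factory=lambda: {
        "factual_grounding": 0.0,
        "privacy_safety": 0.0,
        "instruction_adherence": 0.0,
        "optimal_plan_execution": 0.0,
    })
    detected_errors: list[str] = field(default_factory=list)  # taxonomy labels; empty if clean

# A clean trace still gets scored: no detected errors, but scores are present.
clean = TraceAnalysis(trace_id="t-123")
clean.scores["factual_grounding"] = 4.5
```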

Similar failures are grouped into clusters

When multiple traces fail in semantically similar ways (same error type, same part of the workflow), Error Feed groups them into a single issue. The cluster name describes what’s going wrong, e.g. “Hallucinated entity in product lookup”.

The number of traces in a cluster is its trace count. One cluster might represent a single noisy span seen once; another might represent a systematic failure across thousands of traces.

The point is to triage problems, not individual trace failures.
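
Conceptually, the grouping is a similarity clustering over failure descriptions. Below is a deliberately simplified sketch, assuming an embedding per failure already exists; the real logic is internal to Error Feed and certainly more sophisticated than a fixed cosine threshold.

```python
import numpy as np

# Simplified sketch of similarity grouping. Each row of `embeddings` is one
# failure description; failures similar enough to a cluster's seed join it.
def group_failures(embeddings: np.ndarray, threshold: float = 0.85) -> list[list[int]]:
    clusters: list[list[int]] = []
    for i, vec in enumerate(embeddings):
        for cluster in clusters:
            seed = embeddings[cluster[0]]
            cos = float(vec @ seed / (np.linalg.norm(vec) * np.linalg.norm(seed)))
            if cos >= threshold:
                cluster.append(i)  # same issue, trace count grows
                break
        else:
            clusters.append([i])  # no match: a new issue
    return clusters
```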

Issues appear in the Feed with analysis

Each issue in the list shows:

  • Error name and its taxonomy category
  • Severity (Critical / High / Medium / Low)
  • Status (Unresolved, Acknowledged, Resolved, Escalating)
  • Trace count (cluster size)
  • A sparkline showing whether the issue is getting worse, improving, or stable

Click an issue to open the detail view: description, root causes, evidence, agent flow, recommendations, and every trace in the cluster.
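
Put together, a Feed row carries roughly the shape below. The field names are hypothetical, for illustration only:

```python
from dataclasses import dataclass

# Hypothetical shape of one Feed row; all field names are illustrative.
@dataclass
class FeedIssue:
    name: str          # e.g. "Hallucinated entity in product lookup"
    category: str      # taxonomy category the error belongs to
    severity: str      # "Critical" | "High" | "Medium" | "Low"
    status: str        # "Unresolved" | "Acknowledged" | "Resolved" | "Escalating"
    trace_count: int   # cluster size
    trend: list[int]   # occurrence counts over time, rendered as the sparkline
```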

Two levels of analysis

There are two kinds of analysis, at different cost points:

Continuous scan runs automatically on every sampled trace. It produces the description, root cause, immediate fix, long-term recommendation, evidence snippets, and quality scores on the Overview tab. Always on.

Deep Analysis is on-demand. It runs a more thorough investigation on the cluster’s representative trace, producing more detailed pattern analysis and recommendations. Trigger it manually from the metadata panel when the continuous scan finds something worth digging into.
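
A self-contained sketch of that cost split is below. None of these functions exist in an SDK, and the deterministic hash-based sampling is an assumption about how per-trace sampling could work:

```python
import hashlib

# Hypothetical: deterministic sampling so the same trace is always in or out.
def is_sampled(trace_id: str, rate: float) -> bool:
    bucket = int(hashlib.sha256(trace_id.encode()).hexdigest(), 16) % 100
    return bucket < rate * 100

# Always on: cheap per-trace pass producing the Overview-tab fields.
def continuous_scan(trace_id: str) -> dict:
    return {"trace_id": trace_id, "root_cause": "...", "evidence": [], "scores": {}}

# On-demand: thorough investigation of the cluster's representative trace.
def deep_analysis(representative_trace_id: str) -> dict:
    return {"trace_id": representative_trace_id, "patterns": [], "recommendations": []}

for tid in ("t-1", "t-2", "t-3"):
    if is_sampled(tid, rate=0.2):
        continuous_scan(tid)   # runs automatically on every sampled trace
# deep_analysis(...) runs only when you trigger it from the metadata panel
```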

What “representative trace” means

When a cluster has many traces, Error Feed picks one to stand in for the cluster throughout the detail view. The Overview tab’s analysis, Agent Flow diagram, and Deep Analysis all run against this representative trace. The Traces tab shows every trace in the cluster, so you can jump to any specific one.
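
For intuition only, one plausible selection heuristic is the medoid: the trace whose embedding sits closest to the cluster's centroid. The actual heuristic isn't documented here and may differ.

```python
import numpy as np

# Illustrative medoid selection: return the index of the trace whose
# embedding is closest to the cluster's mean embedding.
def representative(embeddings: np.ndarray) -> int:
    centroid = embeddings.mean(axis=0)
    return int(np.argmin(np.linalg.norm(embeddings - centroid, axis=1)))
```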

Continuous vs. sampled coverage

Error Feed doesn’t analyze 100% of traces by default. The sampling rate controls what fraction of traces gets analyzed. Lower rates are faster and cheaper but can miss infrequent errors; 100% gives full coverage at higher cost.

Important: issues are formed only from traces that were actually analyzed. At 20% sampling, five identical errors in a batch of 25 traces might show up in the cluster as one occurrence — the one that got sampled.
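
To put numbers on it: assuming each trace is sampled independently at rate p, the chance that at least one of k occurrences of an error gets analyzed is 1 - (1 - p)^k.

```python
# Chance that at least one of k identical errors is analyzed at sampling
# rate p, assuming independent per-trace sampling.
def detection_probability(p: float, k: int) -> float:
    return 1 - (1 - p) ** k

print(detection_probability(0.2, 5))  # ~0.67: five occurrences at 20% sampling
print(detection_probability(0.2, 1))  # 0.20: a one-off error usually slips through
```

So even an error that occurs five times has roughly a one-in-three chance of going entirely unnoticed at 20% sampling.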

See Sampling for per-project configuration.

Tip

Set sampling to 100% during development or testing. In production with high volume, 10–20% is a reasonable starting point.
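
One way to encode that convention in your own deployment tooling, purely as an illustration (the environment variable and function are made up; the actual rate is set per project on the Sampling page):

```python
import os

# Hypothetical convention: full coverage outside production, sampled in prod.
# Apply whichever rate this returns on the project's Sampling page.
def default_sampling_rate() -> float:
    env = os.getenv("APP_ENV", "development")
    return 1.0 if env in ("development", "test", "staging") else 0.15
```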

Scores vs. errors

A trace can have a low quality score with no detected error, or a detected error with otherwise decent scores. The score measures overall quality on four dimensions; error detection flags specific failure patterns from the taxonomy. Both show up in the issue detail: scores in the metadata panel’s Evaluations section, errors on the Overview tab.

If an issue has a low Factual Grounding score but no Hallucinated Content error was flagged, it’s still worth a look: the classifier may have missed something the score caught.
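
To hunt for those gaps programmatically, the idea reduces to a simple filter over per-trace analyses. This sketch reuses the hypothetical TraceAnalysis shape from the scoring example above; the score floor and error label are illustrative:

```python
# Surface analyzed traces where the score caught something the error
# classifier didn't flag. Field names follow the hypothetical TraceAnalysis
# shape sketched earlier; "hallucinated_content" is an illustrative label.
def score_error_gaps(analyses, floor: float = 2.0):
    return [
        a for a in analyses
        if a.scores["factual_grounding"] < floor
        and "hallucinated_content" not in a.detected_errors
    ]
```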

