Alerts and monitors

Define monitors on Observe project metrics (system or evaluation) and get notified by email or Slack when values cross a threshold.

What it is

Alerts and monitors notify you when a project metric crosses a threshold, so a regression in error rate, latency, cost, or evaluation quality triggers an email or Slack message instead of depending on someone watching the dashboard. Monitors cover system metrics (errors, response time, token usage) and evaluation metrics (e.g. toxicity, bias). Each monitor evaluates on a schedule and, when the threshold is breached, creates a critical or warning alert and sends notifications. Alert history is stored so past triggers can be reviewed and marked resolved; monitors can be muted without being deleted.

Use cases

  • Error and reliability — Alert when error rate, LLM API failure rate, or error-free session rate crosses a threshold so you catch outages or degradation early.
  • Latency and performance — Monitor span or LLM response time and get notified when p95 or average exceeds a limit.
  • Cost and usage — Track token usage or daily/monthly tokens spent and alert when spend crosses a budget threshold.
  • Evaluation quality — Monitor an eval (e.g. fail rate for a pass/fail eval, or a numeric score) and alert when quality drops below or goes above a value.
  • Notifications — Send alerts to up to five email addresses and/or a Slack webhook so the right people are informed without checking the UI.

How to

Choose the metric

Create a monitor for an Observe project and select the metric type:

  • System metrics — e.g. count of errors, error-free session rates, LLM API failure rates, span response time, LLM response time, token usage, daily/monthly tokens spent.
  • Evaluation metrics — Attach a CustomEvalConfig (eval) for that project. For pass/fail or choice evals you can set threshold_metric_value to the specific value to monitor (e.g. fail rate or a choice label).

The monitor is scoped to one project (Observe projects only).

Define the threshold

Set how the alert is triggered:

  • threshold_operator — Greater than or Less than (the current metric value is compared to the threshold).
  • threshold_type — How the threshold is determined:
    • Static — You set fixed critical_threshold_value and optionally warning_threshold_value. Alert fires when the metric is greater than (or less than) these values.
    • Percentage change — Threshold is based on percentage change from a baseline (e.g. historical mean over a time window). You set critical_threshold_value and optionally warning_threshold_value as percentage values. auto_threshold_time_window (default one week, in minutes) defines the window used to compute the baseline.

When the condition is met, the system creates an alert log (critical or warning) and triggers notifications.
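The threshold logic described above can be sketched in Python. This is an illustrative sketch, not the Observe implementation: the string values "greater_than", "less_than", "static", and "percentage_change", the function names, the percentage-change formula, and the choice to check the critical threshold before the warning threshold are all assumptions made for this example.

```python
from typing import Optional


def percent_change(current: float, baseline: float) -> float:
    """Percentage change of `current` relative to `baseline` (assumed formula)."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero for a percentage change")
    return (current - baseline) / abs(baseline) * 100.0


def breaches(value: float, operator: str, threshold: float) -> bool:
    """Compare a value to a threshold using the monitor's operator."""
    if operator == "greater_than":
        return value > threshold
    if operator == "less_than":
        return value < threshold
    raise ValueError(f"unknown threshold_operator: {operator}")


def classify(
    current: float,
    threshold_operator: str,
    threshold_type: str,
    critical_threshold_value: float,
    warning_threshold_value: Optional[float] = None,
    baseline: Optional[float] = None,
) -> Optional[str]:
    """Return "critical", "warning", or None for one evaluation of a monitor."""
    if threshold_type == "static":
        # Static: compare the raw metric value to the fixed thresholds.
        value = current
    elif threshold_type == "percentage_change":
        # Percentage change: compare the change from a baseline (for example
        # a historical mean over auto_threshold_time_window) to the thresholds.
        if baseline is None:
            raise ValueError("percentage_change requires a baseline")
        value = percent_change(current, baseline)
    else:
        raise ValueError(f"unknown threshold_type: {threshold_type}")

    # Assumption: the more severe level wins when both thresholds are breached.
    if breaches(value, threshold_operator, critical_threshold_value):
        return "critical"
    if warning_threshold_value is not None and breaches(
        value, threshold_operator, warning_threshold_value
    ):
        return "warning"
    return None
```

For example, under these assumptions, classify(150.0, "greater_than", "percentage_change", 40.0, 20.0, baseline=100.0) returns "critical", because the metric is 50% above its baseline and the critical threshold is 40%.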

Set alert frequency

alert_frequency is how often the monitor is evaluated, in minutes (minimum 5, default 60). The monitor runs on this schedule and checks the metric over the relevant time window; if the threshold is breached, an alert is created and notifications are sent.
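The scheduling values mentioned so far (alert_frequency with a minimum of 5 and a default of 60 minutes, and auto_threshold_time_window with a default of one week in minutes) can be captured in a small client-side sketch. The MonitorSchedule class and its runs_per_day helper are hypothetical, and whether the service rejects or clamps a frequency below the minimum is not stated in this page; raising an error here is an assumption.

```python
from dataclasses import dataclass

MIN_ALERT_FREQUENCY_MINUTES = 5
DEFAULT_ALERT_FREQUENCY_MINUTES = 60
ONE_WEEK_MINUTES = 7 * 24 * 60  # 10080 minutes


@dataclass
class MonitorSchedule:
    """Hypothetical container for a monitor's timing fields."""

    alert_frequency: int = DEFAULT_ALERT_FREQUENCY_MINUTES
    auto_threshold_time_window: int = ONE_WEEK_MINUTES

    def __post_init__(self) -> None:
        # Assumption: values below the documented minimum are treated as errors.
        if self.alert_frequency < MIN_ALERT_FREQUENCY_MINUTES:
            raise ValueError(
                f"alert_frequency must be at least {MIN_ALERT_FREQUENCY_MINUTES} minutes"
            )
        if self.auto_threshold_time_window <= 0:
            raise ValueError("auto_threshold_time_window must be positive")

    def runs_per_day(self) -> float:
        """How many times per day the monitor would be evaluated."""
        return 24 * 60 / self.alert_frequency
```

With the defaults, a monitor is evaluated 24 times a day; at the 5-minute minimum it is evaluated 288 times a day, which is worth keeping in mind when choosing how noisy a notification channel can be.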

Configure notifications

  • Email — Add up to five addresses in notification_emails. They receive an email when an alert is triggered (subject and body include alert name, message, and type).
  • Slack — Set slack_webhook_url to your Slack incoming webhook. Optional slack_notes are included in the message.

You can use email only, Slack only, or both. Mute a monitor with is_mute to stop notifications without deleting it.
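The notification rules above can be sketched as follows. The helper names and the message layout are hypothetical and do not reflect the exact format Observe sends; the only external fact relied on is that a Slack incoming webhook accepts a JSON body with a "text" field. The sketch builds the payload but does not send it.

```python
import json
from typing import List, Optional

MAX_NOTIFICATION_EMAILS = 5  # limit stated for notification_emails


def validate_notification_emails(emails: List[str]) -> List[str]:
    """Strip blanks and enforce the five-address limit."""
    cleaned = [e.strip() for e in emails if e.strip()]
    if len(cleaned) > MAX_NOTIFICATION_EMAILS:
        raise ValueError(
            f"notification_emails accepts at most {MAX_NOTIFICATION_EMAILS} addresses"
        )
    return cleaned


def build_slack_payload(
    alert_name: str,
    alert_type: str,
    message: str,
    slack_notes: Optional[str] = None,
) -> bytes:
    """Build a JSON body for a Slack incoming webhook (layout is illustrative)."""
    lines = [f"[{alert_type.upper()}] {alert_name}", message]
    if slack_notes:
        lines.append(f"Notes: {slack_notes}")
    return json.dumps({"text": "\n".join(lines)}).encode("utf-8")


def should_notify(
    is_mute: bool, emails: List[str], slack_webhook_url: Optional[str]
) -> bool:
    """A muted monitor sends nothing; otherwise at least one channel is needed."""
    return not is_mute and bool(emails or slack_webhook_url)
```

The bytes returned by build_slack_payload could be sent as the body of an HTTP POST to slack_webhook_url with a Content-Type of application/json.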

View and resolve alerts

Alert history is stored as UserAlertMonitorLog records (critical/warning, message, time window, link). You can list logs for a monitor, see when each alert fired, and mark them resolved. Use the monitor detail view in the UI to see trend data and unresolved count.

Note

Monitors are only available for projects with trace_type observe. Optional filters (same structure as eval-task filters) can narrow which spans are included when computing the metric.

