Observability
Monitor Prism Gateway with logs, metrics, and distributed tracing.
About
Prism logs every request and response, exports metrics to Prometheus and OpenTelemetry, and propagates trace IDs for distributed tracing. Basic logging requires no additional setup; it is on by default.
Request logging
Every request through Prism is logged with:
- Request ID, trace ID, session ID
- Model requested and model actually used
- Provider that handled the request
- Input/output token counts
- Cost
- Latency
- Cache status (hit/miss/skip)
- Guardrail results
- Any errors or fallback events
Logs sync to the Future AGI dashboard automatically. View them in Prism > Logs.
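Because every entry carries token counts, cost, latency, and cache status, log exports are easy to aggregate client-side. A minimal sketch (the dict keys below are assumed field names for illustration, not Prism's actual log schema):

```python
# Sketch: aggregate Prism-style log entries client-side.
# The field names below are assumptions, not Prism's documented schema.
entries = [
    {"request_id": "req-1", "model": "gpt-4o", "provider": "openai",
     "input_tokens": 12, "output_tokens": 40, "cost": 0.00015,
     "latency_ms": 342, "cache": "miss"},
    {"request_id": "req-2", "model": "gpt-4o", "provider": "openai",
     "input_tokens": 12, "output_tokens": 40, "cost": 0.0,
     "latency_ms": 8, "cache": "hit"},
]

total_cost = sum(e["cost"] for e in entries)
hits = sum(1 for e in entries if e["cache"] == "hit")
hit_rate = hits / len(entries)

print(f"cost=${total_cost:.5f} cache_hit_rate={hit_rate:.0%}")
```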
Log levels
| Level | What’s logged |
|---|---|
| error | Failed requests, provider errors, guardrail blocks |
| warn | Fallbacks, slow requests, budget warnings |
| info | Every request (default) |
| debug | Full request/response bodies, header details |
For self-hosted deployments, set the log level in config.yaml:
```yaml
logging:
  level: info
```
Distributed tracing
Prism propagates trace IDs across the request lifecycle. Set x-prism-trace-id on incoming requests and the same ID appears in all downstream provider calls and logs.
```python
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    trace_id="trace-from-my-app-abc123",
    user_id="user-42",
)

print(response.prism.trace_id)   # trace-from-my-app-abc123
print(response.prism.provider)   # openai
print(response.prism.latency_ms) # 342
print(response.prism.cost)       # 0.00015
```

To read Prism's response headers directly, use the raw response:

```python
raw = client.chat.completions.with_raw_response.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    extra_headers={
        "x-prism-trace-id": "trace-from-my-app-abc123",
        "x-prism-user-id": "user-42",
    },
)

print(raw.headers.get("x-prism-trace-id"))
print(raw.headers.get("x-prism-cost"))
```

The same headers work with any HTTP client:

```bash
curl -i https://gateway.futureagi.com/v1/chat/completions \
  -H "Authorization: Bearer sk-prism-your-key" \
  -H "x-prism-trace-id: trace-from-my-app-abc123" \
  -H "x-prism-user-id: user-42" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello"}]}'
# Look for x-prism-trace-id in the response headers
```

If you don’t set a trace ID, Prism generates one automatically. Use it to correlate gateway logs with your application logs.
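A common pattern is to generate the trace ID in your application, tag your own log records with it, and send the same ID to Prism. A minimal sketch (the logging setup is illustrative; only the x-prism-trace-id header comes from Prism):

```python
import logging
import uuid

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("my-app")

# Generate one trace ID per logical operation and reuse it everywhere.
trace_id = f"trace-{uuid.uuid4().hex}"

# 1. Tag your own application logs with it...
log.info("handling chat request trace_id=%s", trace_id)

# 2. ...and attach the same ID to the Prism request so gateway logs carry it.
headers = {"x-prism-trace-id": trace_id}
# client.chat.completions.create(..., extra_headers=headers)
```

Searching gateway logs and application logs for the same ID then yields the full lifecycle of one request.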
OpenTelemetry integration
Self-hosted deployments can export traces to any OpenTelemetry-compatible backend:
```yaml
telemetry:
  traces:
    enabled: true
    exporter: otlp
    endpoint: "http://otel-collector:4317"
    service_name: "prism-gateway"
```
Metrics
Prism exports Prometheus metrics on the /-/metrics endpoint.
Available metrics
| Metric | Type | Description |
|---|---|---|
| prism_requests_total | Counter | Total requests by model, provider, status code |
| prism_request_duration_seconds | Histogram | Request latency distribution |
| prism_tokens_total | Counter | Total tokens (input + output) by model |
| prism_cost_total | Counter | Total cost in USD by model and provider |
| prism_cache_hits_total | Counter | Cache hits by strategy (exact/semantic) |
| prism_cache_misses_total | Counter | Cache misses |
| prism_provider_errors_total | Counter | Provider errors by provider and error code |
| prism_circuit_breaker_state | Gauge | Circuit breaker state (0=closed, 1=open, 2=half-open) |
| prism_rate_limit_exceeded_total | Counter | Rate limit rejections by key |
| prism_guardrail_triggered_total | Counter | Guardrail triggers by guardrail name and action |
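With these metrics scraped, standard PromQL covers the usual dashboards. For example (query shapes assume the metric and label names above; thresholds and windows are illustrative):

```promql
# Error rate over 5 minutes, by provider
sum(rate(prism_provider_errors_total[5m])) by (provider)

# P95 request latency
histogram_quantile(0.95, sum(rate(prism_request_duration_seconds_bucket[5m])) by (le))

# Cache hit ratio
sum(rate(prism_cache_hits_total[5m]))
  / (sum(rate(prism_cache_hits_total[5m])) + sum(rate(prism_cache_misses_total[5m])))
```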
Scrape configuration
```yaml
# prometheus.yml
scrape_configs:
  - job_name: "prism-gateway"
    scrape_interval: 15s
    metrics_path: "/-/metrics"
    static_configs:
      - targets: ["prism-gateway:8080"]
```
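The endpoint serves the standard Prometheus text exposition format, so you can spot-check it without Prometheus. A sketch (the sample payload is made up; real label values come from your traffic):

```python
# Parse Prometheus text-format lines into (name, labels, value) tuples
# and sum one counter. The sample below is a made-up scrape.
import re

sample = """\
# TYPE prism_requests_total counter
prism_requests_total{model="gpt-4o",provider="openai",status="200"} 1042
prism_requests_total{model="gpt-4o",provider="azure",status="200"} 17
"""

line_re = re.compile(r'^(\w+)\{(.*)\}\s+([\d.eE+-]+)$')

total = 0.0
for line in sample.splitlines():
    m = line_re.match(line)
    if not m:
        continue  # skip comment and blank lines
    name, labels, value = m.groups()
    if name == "prism_requests_total":
        total += float(value)

print(total)  # 1059.0
```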
Self-hosted metrics config
```yaml
telemetry:
  metrics:
    enabled: true
    prometheus:
      enabled: true
      path: "/-/metrics"
```
Session tracking
Group related requests into sessions for conversation-level analytics. Set x-prism-session-id on each request in a conversation:
```python
session_id = "user-123-conversation-456"
messages = []

# Each turn in the conversation shares the same session_id
messages.append({"role": "user", "content": "What's the capital of France?"})
response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    session_id=session_id,
    user_id="user-123",
)

messages.append({"role": "assistant", "content": response.choices[0].message.content})
messages.append({"role": "user", "content": "What's its population?"})
response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    session_id=session_id,
    user_id="user-123",
)
```
Sessions appear in the dashboard under Prism > Sessions and show:
- Total requests in the session
- Cumulative cost
- Models and providers used
- Timeline of requests
Alerting
Configure alerts to get notified about issues. See Cost tracking > Budget alerts for alert configuration.
| Event | When it fires |
|---|---|
| Budget threshold crossed | Spend exceeds configured percentage |
| Error rate spike | Error rate exceeds threshold over a time window |
| Latency spike | P95 latency exceeds threshold |
| Guardrail triggered | A guardrail blocks or flags a request |
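If you scrape Prism with Prometheus, the error-rate and latency events above can also be expressed as ordinary Prometheus alerting rules built on the metrics listed earlier. A sketch (rule names and thresholds are illustrative, not Prism defaults):

```yaml
# prism-alerts.yml -- illustrative thresholds, not Prism defaults
groups:
  - name: prism-gateway
    rules:
      - alert: PrismErrorRateSpike
        expr: sum(rate(prism_provider_errors_total[5m])) > 1
        for: 5m
        labels:
          severity: warning
      - alert: PrismLatencySpike
        expr: histogram_quantile(0.95, sum(rate(prism_request_duration_seconds_bucket[5m])) by (le)) > 2
        for: 10m
        labels:
          severity: warning
```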