Observability
Monitor Prism Gateway with logs, metrics, and distributed tracing.
About
Prism logs every request and response, exports metrics to Prometheus and OpenTelemetry, and propagates trace IDs for distributed tracing. Basic logging requires no additional setup; it is on by default.
Request logging
Every request through Prism is logged with:
- Request ID, trace ID, session ID
- Model requested and model actually used
- Provider that handled the request
- Input/output token counts
- Cost
- Latency
- Cache status (hit/miss/skip)
- Guardrail results
- Any errors or fallback events
Logs sync to the Future AGI dashboard automatically. View them in Prism > Logs.
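Because every entry carries token counts, cost, latency, and cache status, log exports are easy to aggregate client-side. A minimal sketch (the dict keys below are assumed field names for illustration, not Prism's actual log schema):

```python
# Sketch: aggregate Prism-style log entries client-side.
# The field names below are assumptions, not Prism's documented schema.
entries = [
    {"request_id": "req-1", "model": "gpt-4o", "provider": "openai",
     "input_tokens": 12, "output_tokens": 40, "cost": 0.00015,
     "latency_ms": 342, "cache": "miss"},
    {"request_id": "req-2", "model": "gpt-4o", "provider": "openai",
     "input_tokens": 12, "output_tokens": 40, "cost": 0.0,
     "latency_ms": 8, "cache": "hit"},
]

total_cost = sum(e["cost"] for e in entries)
hits = sum(1 for e in entries if e["cache"] == "hit")
hit_rate = hits / len(entries)

print(f"cost=${total_cost:.5f} cache_hit_rate={hit_rate:.0%}")
```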
Log levels
| Level | What’s logged |
|---|---|
| error | Failed requests, provider errors, guardrail blocks |
| warn | Fallbacks, slow requests, budget warnings |
| info | Every request (default) |
| debug | Full request/response bodies, header details |
For self-hosted deployments, set the log level in config.yaml:
```yaml
logging:
  level: info
```
Distributed tracing
Prism propagates trace IDs across the request lifecycle. Set x-prism-trace-id on incoming requests and the same ID appears in all downstream provider calls and logs.
```python
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    trace_id="trace-from-my-app-abc123",
    user_id="user-42",
)

print(response.prism.trace_id)   # trace-from-my-app-abc123
print(response.prism.provider)   # openai
print(response.prism.latency_ms) # 342
print(response.prism.cost)       # 0.00015
```

To read Prism's response headers directly, use the raw response:

```python
raw = client.chat.completions.with_raw_response.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    extra_headers={
        "x-prism-trace-id": "trace-from-my-app-abc123",
        "x-prism-user-id": "user-42",
    },
)

print(raw.headers.get("x-prism-trace-id"))
print(raw.headers.get("x-prism-cost"))
```

The same headers work with any HTTP client:

```bash
curl -i https://gateway.futureagi.com/v1/chat/completions \
  -H "Authorization: Bearer sk-prism-your-key" \
  -H "x-prism-trace-id: trace-from-my-app-abc123" \
  -H "x-prism-user-id: user-42" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello"}]}'
# Look for x-prism-trace-id in the response headers
```

If you don’t set a trace ID, Prism generates one automatically. Use it to correlate gateway logs with your application logs.
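A common pattern is to generate the trace ID in your application, tag your own log records with it, and send the same ID to Prism. A minimal sketch (the logging setup is illustrative; only the x-prism-trace-id header comes from Prism):

```python
import logging
import uuid

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("my-app")

# Generate one trace ID per logical operation and reuse it everywhere.
trace_id = f"trace-{uuid.uuid4().hex}"

# 1. Tag your own application logs with it...
log.info("handling chat request trace_id=%s", trace_id)

# 2. ...and attach the same ID to the Prism request so gateway logs carry it.
headers = {"x-prism-trace-id": trace_id}
# client.chat.completions.create(..., extra_headers=headers)
```

Searching gateway logs and application logs for the same ID then yields the full lifecycle of one request.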
OpenTelemetry integration
Self-hosted deployments can export traces to any OpenTelemetry-compatible backend:
```yaml
telemetry:
  traces:
    enabled: true
    exporter: otlp
    endpoint: "http://otel-collector:4317"
    service_name: "prism-gateway"
```
Metrics
Prism exports Prometheus metrics on the /-/metrics endpoint.
Available metrics
| Metric | Type | Description |
|---|---|---|
| prism_requests_total | Counter | Total requests by model, provider, status code |
| prism_request_duration_seconds | Histogram | Request latency distribution |
| prism_tokens_total | Counter | Total tokens (input + output) by model |
| prism_cost_total | Counter | Total cost in USD by model and provider |
| prism_cache_hits_total | Counter | Cache hits by strategy (exact/semantic) |
| prism_cache_misses_total | Counter | Cache misses |
| prism_provider_errors_total | Counter | Provider errors by provider and error code |
| prism_circuit_breaker_state | Gauge | Circuit breaker state (0=closed, 1=open, 2=half-open) |
| prism_rate_limit_exceeded_total | Counter | Rate limit rejections by key |
| prism_guardrail_triggered_total | Counter | Guardrail triggers by guardrail name and action |
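With these metrics scraped, standard PromQL covers the usual dashboards. For example (query shapes assume the metric and label names above; thresholds and windows are illustrative):

```promql
# Error rate over 5 minutes, by provider
sum(rate(prism_provider_errors_total[5m])) by (provider)

# P95 request latency
histogram_quantile(0.95, sum(rate(prism_request_duration_seconds_bucket[5m])) by (le))

# Cache hit ratio
sum(rate(prism_cache_hits_total[5m]))
  / (sum(rate(prism_cache_hits_total[5m])) + sum(rate(prism_cache_misses_total[5m])))
```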
Scrape configuration
```yaml
# prometheus.yml
scrape_configs:
  - job_name: "prism-gateway"
    scrape_interval: 15s
    metrics_path: "/-/metrics"
    static_configs:
      - targets: ["prism-gateway:8080"]
```
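The endpoint serves the standard Prometheus text exposition format, so you can spot-check it without Prometheus. A sketch (the sample payload is made up; real label values come from your traffic):

```python
# Parse Prometheus text-format lines into (name, labels, value) tuples
# and sum one counter. The sample below is a made-up scrape.
import re

sample = """\
# TYPE prism_requests_total counter
prism_requests_total{model="gpt-4o",provider="openai",status="200"} 1042
prism_requests_total{model="gpt-4o",provider="azure",status="200"} 17
"""

line_re = re.compile(r'^(\w+)\{(.*)\}\s+([\d.eE+-]+)$')

total = 0.0
for line in sample.splitlines():
    m = line_re.match(line)
    if not m:
        continue  # skip comment and blank lines
    name, labels, value = m.groups()
    if name == "prism_requests_total":
        total += float(value)

print(total)  # 1059.0
```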
Self-hosted metrics config
```yaml
telemetry:
  metrics:
    enabled: true
    prometheus:
      enabled: true
      path: "/-/metrics"
```
Session tracking
Group related requests into sessions for conversation-level analytics. Set x-prism-session-id on each request in a conversation:
```python
session_id = "user-123-conversation-456"
messages = []

# Each turn in the conversation shares the same session_id
messages.append({"role": "user", "content": "What's the capital of France?"})
response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    session_id=session_id,
    user_id="user-123",
)

messages.append({"role": "assistant", "content": response.choices[0].message.content})
messages.append({"role": "user", "content": "What's its population?"})
response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
    session_id=session_id,
    user_id="user-123",
)
```
Sessions appear in the dashboard under Prism > Sessions and show:
- Total requests in the session
- Cumulative cost
- Models and providers used
- Timeline of requests
Alerting
Configure alerts to get notified about issues. See Cost tracking > Budget alerts for alert configuration.
| Event | When it fires |
|---|---|
| Budget threshold crossed | Spend exceeds configured percentage |
| Error rate spike | Error rate exceeds threshold over a time window |
| Latency spike | P95 latency exceeds threshold |
| Guardrail triggered | A guardrail blocks or flags a request |
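If you scrape Prism with Prometheus, the error-rate and latency events above can also be expressed as ordinary Prometheus alerting rules built on the metrics listed earlier. A sketch (rule names and thresholds are illustrative, not Prism defaults):

```yaml
# prism-alerts.yml -- illustrative thresholds, not Prism defaults
groups:
  - name: prism-gateway
    rules:
      - alert: PrismErrorRateSpike
        expr: sum(rate(prism_provider_errors_total[5m])) > 1
        for: 5m
        labels:
          severity: warning
      - alert: PrismLatencySpike
        expr: histogram_quantile(0.95, sum(rate(prism_request_duration_seconds_bucket[5m])) by (le)) > 2
        for: 10m
        labels:
          severity: warning
```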