Request & response headers
Reference for the x-prism-* request headers accepted by the Prism AI Gateway and the response headers it returns.
About
Prism reads x-prism-* request headers to control per-request behavior (caching, sessions, routing) and writes x-prism-* response headers to report what happened (which provider, latency, cost, cache status).
The Prism SDK handles these automatically. With the OpenAI SDK, generate them with create_headers(); with cURL or any other HTTP client, set them manually.
Request headers
Tracking and correlation
| Header | Value | Description |
|---|---|---|
| x-prism-trace-id | string | Custom trace ID for distributed tracing. If omitted, the gateway generates one. |
| x-prism-session-id | string | Group related requests into a logical session for analytics. |
| x-prism-session-name | string | Human-readable label for the session (used alongside x-prism-session-id). |
| x-prism-session-path | string | Hierarchical path within a session, e.g. /search/rerank. |
| x-prism-request-id | string | Client-generated request ID for idempotency and log correlation. |
| x-prism-user-id | string | User identifier for per-user tracking, budgets, and analytics. |
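For clients that set these headers by hand, a minimal sketch using only the standard library follows. The helper name `tracking_headers` and the `trace-<hex>` ID format are illustrative assumptions, not part of the Prism SDK; the table above only requires string values.

```python
import uuid


def tracking_headers(session_id, user_id, session_path=None, trace_id=None):
    """Build tracking headers by hand (hypothetical helper, not an SDK function)."""
    headers = {
        # Generating the trace ID client-side lets you log it before the response arrives.
        # The "trace-<hex>" format is an assumption; any string should be accepted.
        "x-prism-trace-id": trace_id or f"trace-{uuid.uuid4().hex}",
        "x-prism-session-id": session_id,
        "x-prism-user-id": user_id,
    }
    if session_path is not None:
        headers["x-prism-session-path"] = session_path
    return headers


headers = tracking_headers("sess-abc", "user-42", session_path="/search/rerank")
```

The resulting dict can be passed as `extra_headers` or `default_headers` to an HTTP client or the OpenAI SDK.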
Metadata and properties
| Header | Value | Description |
|---|---|---|
| x-prism-metadata | JSON string | Arbitrary key-value pairs for cost attribution and filtering. Example: {"team":"ml","env":"prod"} |
| x-prism-property-{key} | string | Individual key-value properties. x-prism-property-env: prod is equivalent to including "env":"prod" in metadata. |
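The two forms can be produced from plain dictionaries. The sketch below is one possible way to do it; `metadata_headers` is a hypothetical helper name, and the compact, key-sorted JSON encoding is a choice made here for short, deterministic header values rather than a documented gateway requirement.

```python
import json


def metadata_headers(metadata=None, properties=None):
    """Encode metadata as one JSON header and properties as one header per key."""
    headers = {}
    if metadata:
        # json.dumps escapes non-ASCII characters by default, which keeps the value header-safe.
        headers["x-prism-metadata"] = json.dumps(
            metadata, separators=(",", ":"), sort_keys=True
        )
    for key, value in (properties or {}).items():
        headers[f"x-prism-property-{key}"] = str(value)
    return headers


headers = metadata_headers(
    metadata={"team": "ml", "env": "prod"},
    properties={"feature": "search"},
)
print(headers)
```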
Cache control
| Header | Value | Description |
|---|---|---|
| x-prism-cache-ttl | integer (seconds) | Override the cache TTL for this request. |
| x-prism-cache-namespace | string | Route to a specific cache namespace for isolation (e.g. prod, staging). |
| x-prism-cache-force-refresh | true | Bypass cache, fetch a fresh response from the provider, and update the cache with the new result. |
| Cache-Control | no-store | Disable caching entirely for this request. The response is not read from or written to cache. |
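The difference between force-refresh (skip the read, still write) and no-store (skip both) is easy to mix up, so a small sketch may help. `cache_headers` is a hypothetical helper; dropping the other cache headers when `no_store` is set is an assumption made here for clarity, since how the gateway resolves conflicting cache headers is not specified above.

```python
def cache_headers(ttl=None, namespace=None, force_refresh=False, no_store=False):
    """Build per-request cache headers (hypothetical helper, not an SDK function)."""
    if no_store:
        # Nothing is read from or written to cache, so TTL and namespace are not sent.
        return {"Cache-Control": "no-store"}
    headers = {}
    if ttl is not None:
        if ttl < 0:
            raise ValueError("ttl must be a non-negative number of seconds")
        headers["x-prism-cache-ttl"] = str(int(ttl))
    if namespace:
        headers["x-prism-cache-namespace"] = namespace
    if force_refresh:
        # Skips the cache read but still stores the fresh response.
        headers["x-prism-cache-force-refresh"] = "true"
    return headers


refresh = cache_headers(ttl=300, namespace="prod", force_refresh=True)
bypass = cache_headers(ttl=300, no_store=True)
```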
Routing control
| Header | Value | Description |
|---|---|---|
| x-prism-provider-lock | string | Force this request to a specific provider, bypassing the routing strategy. Example: openai. |
| x-prism-complexity-override | string | Override complexity-based routing tier. Pass the tier name (e.g. simple, moderate, complex). |
Guardrails
| Header | Value | Description |
|---|---|---|
| x-prism-guardrail-policy | string | Comma-separated list of guardrail policy IDs to apply to this request. Overrides org-level guardrail config. |
Gateway config (full override)
| Header | Value | Description |
|---|---|---|
| x-prism-config | JSON string | Full GatewayConfig serialized as JSON. Overrides all per-request settings (cache, retry, fallback, guardrails, routing, timeouts). The Prism SDK’s GatewayConfig.to_headers() generates this automatically. |
| x-prism-request-timeout | integer (ms) | Total request timeout in milliseconds. Also set automatically when using TimeoutConfig.total in the SDK. The gateway echoes the applied timeout back as x-prism-timeout-ms in the response. |
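Because the applied timeout is echoed back, a client can confirm that its override took effect. The sketch below shows the round trip with plain dictionaries standing in for real request and response headers; both helper names are hypothetical.

```python
def timeout_headers(total_ms):
    """Build the request header for a total timeout in milliseconds."""
    if total_ms <= 0:
        raise ValueError("timeout must be a positive number of milliseconds")
    return {"x-prism-request-timeout": str(int(total_ms))}


def applied_timeout_ms(response_headers):
    """Read the timeout the gateway reports it applied, or None if the header is absent."""
    value = response_headers.get("x-prism-timeout-ms")
    return int(value) if value is not None else None


request_headers = timeout_headers(30000)
# Stand-in for the headers of a real gateway response.
applied = applied_timeout_ms({"x-prism-timeout-ms": "30000"})
```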
Response headers
Always present
| Header | Example | Description |
|---|---|---|
| x-prism-request-id | req-a1b2c3 | Unique identifier for this request. Use this when filing support tickets or searching logs. |
| x-prism-trace-id | trace-x7y8z9 | Trace ID for distributed tracing. Matches the request header if one was sent. |
| x-prism-provider | openai | Which provider served this request. |
| x-prism-model-used | gpt-4o-2024-08-06 | Actual model returned by the provider. May differ from the requested model if routing redirected the request. |
| x-prism-latency-ms | 342 | Total gateway latency in milliseconds, including the provider call. |
| x-prism-timeout-ms | 30000 | Timeout that was applied to this request. |
Conditional
| Header | Present when | Value |
|---|---|---|
| x-prism-cost | Model has pricing data | Estimated cost in USD (e.g. 0.00234). Returns 0 on exact cache hits. |
| x-prism-cache | Caching is enabled | hit, hit_exact, hit_semantic, miss, or skip |
| x-prism-guardrail-triggered | A guardrail fired | true |
| x-prism-fallback-used | A provider fallback occurred | true |
| x-prism-routing-strategy | A routing policy is active | Strategy name: round-robin, weighted, least-latency, cost-optimized, adaptive, fastest |
| x-prism-credits-remaining | Managed key with credit balance | Remaining USD balance (e.g. 12.50) |
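Since conditional headers may be missing, client code that reads them by hand needs defaults. The sketch below parses a header mapping into a typed record; the `GatewayInfo` dataclass and `parse_gateway_headers` function are illustrative names, and treating an absent boolean header as `False` is an assumption based on the "present when" column above.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class GatewayInfo:
    provider: str
    model_used: str
    latency_ms: int
    cost: Optional[float]
    cache_status: Optional[str]
    guardrail_triggered: bool
    fallback_used: bool


def parse_gateway_headers(headers: Mapping[str, str]) -> GatewayInfo:
    # HTTP header names are case-insensitive, so normalise before lookup.
    h = {k.lower(): v for k, v in headers.items()}
    cost = h.get("x-prism-cost")
    return GatewayInfo(
        provider=h["x-prism-provider"],
        model_used=h["x-prism-model-used"],
        latency_ms=int(h["x-prism-latency-ms"]),
        cost=float(cost) if cost is not None else None,
        cache_status=h.get("x-prism-cache"),
        # These headers only appear when the event occurred.
        guardrail_triggered=h.get("x-prism-guardrail-triggered") == "true",
        fallback_used=h.get("x-prism-fallback-used") == "true",
    )


info = parse_gateway_headers({
    "X-Prism-Provider": "openai",
    "x-prism-model-used": "gpt-4o-2024-08-06",
    "x-prism-latency-ms": "342",
    "x-prism-cost": "0.00234",
    "x-prism-cache": "miss",
})
```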
Rate limit headers
Present when rate limiting is enabled for the key or org.
| Header | Description |
|---|---|
| x-ratelimit-limit-requests | Maximum requests allowed per minute |
| x-ratelimit-remaining-requests | Requests remaining in the current window |
| x-ratelimit-reset-requests | Unix timestamp when the window resets |
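A client can use these headers to pause until the window resets instead of sending requests that will be rejected. The following sketch assumes the reset value is a Unix timestamp in seconds; `seconds_until_reset` is a hypothetical helper, and the `now` parameter exists only to make the example deterministic.

```python
import time


def seconds_until_reset(headers, now=None):
    """Return how long to wait before the next request, or 0.0 if none is needed."""
    remaining = headers.get("x-ratelimit-remaining-requests")
    reset = headers.get("x-ratelimit-reset-requests")
    if remaining is None or reset is None:
        return 0.0  # rate limiting is not enabled for this key or org
    if int(remaining) > 0:
        return 0.0  # budget left in the current window
    now = time.time() if now is None else now
    # Clamp at zero in case the window has already reset.
    return max(0.0, float(reset) - now)


wait = seconds_until_reset(
    {
        "x-ratelimit-limit-requests": "60",
        "x-ratelimit-remaining-requests": "0",
        "x-ratelimit-reset-requests": "1700000030",
    },
    now=1700000000,
)
```

In a real client, pass the returned value to `time.sleep()` before retrying.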
Reading headers
Prism SDK
Every response from the Prism SDK has a .prism attribute with typed access to all gateway metadata:
from prism import Prism

client = Prism(
    api_key="sk-prism-your-key",
    base_url="https://gateway.futureagi.com",
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)

print(response.choices[0].message.content)
print(response.prism.provider)             # openai
print(response.prism.latency_ms)           # 342
print(response.prism.cost)                 # 0.00015
print(response.prism.cache_status)         # miss
print(response.prism.model_used)           # gpt-4o-2024-08-06
print(response.prism.request_id)           # req-a1b2c3
print(response.prism.trace_id)             # trace-x7y8z9
print(response.prism.guardrail_triggered)  # False
print(response.prism.fallback_used)        # False
print(response.prism.routing_strategy)     # None (or "weighted", etc.)

# Rate limit info (when enabled)
if response.prism.ratelimit:
    print(response.prism.ratelimit.limit)
    print(response.prism.ratelimit.remaining)
    print(response.prism.ratelimit.reset)
OpenAI SDK
The OpenAI SDK doesn’t have response.prism. Use with_raw_response to read headers:
from openai import OpenAI

client = OpenAI(
    base_url="https://gateway.futureagi.com/v1",
    api_key="sk-prism-your-key",
)

# with_raw_response exposes the HTTP headers alongside the parsed body
raw = client.chat.completions.with_raw_response.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)

print(raw.headers.get("x-prism-provider"))
print(raw.headers.get("x-prism-cost"))

# Recover the usual ChatCompletion object
response = raw.parse()
cURL
Use the -i flag to include response headers in the output:
curl -i -X POST https://gateway.futureagi.com/v1/chat/completions \
  -H "Authorization: Bearer sk-prism-your-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
Setting request headers
Prism SDK
The SDK accepts tracking parameters directly on each create() call:
from prism import Prism

client = Prism(
    api_key="sk-prism-your-key",
    base_url="https://gateway.futureagi.com",
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    session_id="sess-abc",
    trace_id="trace-123",
    user_id="user-42",
    request_metadata={"team": "ml", "feature": "search"},
    properties={"env": "prod"},
)
For gateway config, pass a GatewayConfig to the client constructor (applies to all requests) or override per-request with extra_headers:
from prism import Prism, GatewayConfig, CacheConfig, RetryConfig

# Client-level config (applies to all requests)
client = Prism(
    api_key="sk-prism-your-key",
    base_url="https://gateway.futureagi.com",
    config=GatewayConfig(
        cache=CacheConfig(ttl=300, namespace="prod"),
        retry=RetryConfig(max_retries=3),
    ),
)

# Per-request override
override = GatewayConfig(cache=CacheConfig(force_refresh=True))
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    extra_headers=override.to_headers(),
)
OpenAI SDK with create_headers()
Use create_headers() to generate all x-prism-* headers for the OpenAI SDK:
from openai import OpenAI
from prism import create_headers, GatewayConfig, CacheConfig

headers = create_headers(
    config=GatewayConfig(cache=CacheConfig(strategy="semantic", ttl=600)),
    trace_id="trace-abc",
    session_id="sess-123",
    user_id="user-42",
    metadata={"team": "ml", "env": "production"},
)

# default_headers attaches the generated headers to every request from this client
client = OpenAI(
    api_key="sk-prism-your-key",
    base_url="https://gateway.futureagi.com/v1",
    default_headers=headers,
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)
cURL
Pass headers with -H:
curl -X POST https://gateway.futureagi.com/v1/chat/completions \
  -H "Authorization: Bearer sk-prism-your-key" \
  -H "x-prism-session-id: sess-abc" \
  -H "x-prism-trace-id: trace-123" \
  -H "x-prism-user-id: user-42" \
  -H "x-prism-metadata: {\"team\":\"ml\",\"env\":\"prod\"}" \
  -H "x-prism-cache-ttl: 300" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello"}]
  }'