Request & response headers

Reference for the x-prism-* request headers that the Prism AI Gateway reads and the x-prism-* response headers it returns.

About

Prism reads x-prism-* request headers to control per-request behavior (caching, sessions, routing) and writes x-prism-* response headers to report what happened (which provider, latency, cost, cache status).

The Prism SDK sets and reads these headers automatically. With the OpenAI SDK, generate request headers with create_headers() or set them manually. With cURL, pass each header manually with -H.

Request headers

Tracking and correlation

Header	Value	Description
`x-prism-trace-id`	string	Custom trace ID for distributed tracing. If omitted, the gateway generates one.
`x-prism-session-id`	string	Group related requests into a logical session for analytics.
`x-prism-session-name`	string	Human-readable label for the session, used alongside `x-prism-session-id`.
`x-prism-session-path`	string	Hierarchical path within a session, e.g. `/search/rerank`.
`x-prism-request-id`	string	Client-generated request ID for idempotency and log correlation.
`x-prism-user-id`	string	User identifier for per-user tracking, budgets, and analytics.
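Without the Prism SDK, you can assemble the tracking headers yourself. The sketch below uses a hypothetical helper, `tracking_headers`, which is not part of any Prism package. It leaves `x-prism-trace-id` out unless you supply one, so the gateway generates it. It also fills in a UUID-based `x-prism-request-id` when none is given. The `req-` prefix is an assumed client-side convention, not a documented format requirement.

```python
import uuid
from typing import Optional


def tracking_headers(
    session_id: Optional[str] = None,
    session_name: Optional[str] = None,
    session_path: Optional[str] = None,
    trace_id: Optional[str] = None,
    request_id: Optional[str] = None,
    user_id: Optional[str] = None,
) -> dict[str, str]:
    """Build x-prism-* tracking headers (hypothetical helper), skipping None values."""
    values = {
        "x-prism-session-id": session_id,
        "x-prism-session-name": session_name,
        "x-prism-session-path": session_path,
        # Left out when None so the gateway generates its own trace ID.
        "x-prism-trace-id": trace_id,
        # One ID per logical request; reuse it on client retries so the
        # attempts share an ID for idempotency and log correlation.
        "x-prism-request-id": request_id or f"req-{uuid.uuid4().hex}",
        "x-prism-user-id": user_id,
    }
    return {name: value for name, value in values.items() if value is not None}
```

Pass the resulting dict as `extra_headers` or `default_headers` in the OpenAI SDK, or as `-H` flags in cURL.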

Metadata and properties

Header	Value	Description
`x-prism-metadata`	JSON string	Arbitrary key-value pairs for cost attribution and filtering. Example: `{"team":"ml","env":"prod"}`
`x-prism-property-{key}`	string	Individual key-value properties. `x-prism-property-env: prod` is equivalent to including `"env":"prod"` in metadata.
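The two encodings carry the same key-value pairs. The functions below are a minimal sketch of both, and their names are hypothetical. `json.dumps` with compact separators produces exactly the `{"team":"ml","env":"prod"}` form shown in the table. HTTP header names are case-insensitive, so lowercase property keys are the safer choice.

```python
import json


def metadata_header(metadata: dict[str, str]) -> dict[str, str]:
    """Encode all pairs into a single x-prism-metadata JSON header."""
    # Compact separators keep the header short: {"team":"ml","env":"prod"}
    return {"x-prism-metadata": json.dumps(metadata, separators=(",", ":"))}


def property_headers(properties: dict[str, str]) -> dict[str, str]:
    """Encode each pair as its own x-prism-property-{key} header."""
    return {f"x-prism-property-{key}": str(value) for key, value in properties.items()}
```

A single JSON header keeps all attribution data in one place. Property headers are easier to add one at a time, for example from a proxy or a cURL one-liner.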

Cache control

Header	Value	Description
`x-prism-cache-ttl`	integer (seconds)	Override the cache TTL for this request.
`x-prism-cache-namespace`	string	Route to a specific cache namespace for isolation (e.g. `prod`, `staging`).
`x-prism-cache-force-refresh`	`true`	Bypass cache, fetch a fresh response from the provider, and update the cache with the new result.
`Cache-Control`	`no-store`	Disable caching entirely for this request. The response is not read from or written to cache.
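The following sketch maps the four cache controls onto headers with a hypothetical helper, `cache_headers`. It rejects `no-store` combined with any other cache option, because a response that is never read from or written to cache makes those options meaningless. That rule, and the non-negative TTL check, are choices made by this helper. They are not documented gateway validation.

```python
from typing import Optional


def cache_headers(
    ttl_seconds: Optional[int] = None,
    namespace: Optional[str] = None,
    force_refresh: bool = False,
    no_store: bool = False,
) -> dict[str, str]:
    """Build per-request cache headers (hypothetical helper)."""
    if no_store:
        # no-store disables caching entirely, so other options are rejected here.
        if ttl_seconds is not None or namespace or force_refresh:
            raise ValueError("no_store cannot be combined with other cache options")
        return {"Cache-Control": "no-store"}
    headers: dict[str, str] = {}
    if ttl_seconds is not None:
        if ttl_seconds < 0:
            raise ValueError("ttl_seconds must be non-negative")
        headers["x-prism-cache-ttl"] = str(ttl_seconds)
    if namespace:
        headers["x-prism-cache-namespace"] = namespace
    if force_refresh:
        # Skips the cached response and overwrites the entry with the fresh one.
        headers["x-prism-cache-force-refresh"] = "true"
    return headers
```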

Routing control

Header	Value	Description
`x-prism-provider-lock`	string	Force this request to a specific provider, bypassing the routing strategy. Example: `openai`.
`x-prism-complexity-override`	string	Override the tier chosen by complexity-based routing. Pass the tier name (e.g. `simple`, `moderate`, `complex`).
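This sketch builds the routing headers with a hypothetical helper, `routing_headers`. The allowed tier set is an assumption taken from the example names in the table. Your deployment may define different tiers, so adjust `KNOWN_TIERS` or drop the check.

```python
from typing import Optional

# Assumed tier names, copied from the examples in this reference.
KNOWN_TIERS = {"simple", "moderate", "complex"}


def routing_headers(
    provider: Optional[str] = None, complexity_tier: Optional[str] = None
) -> dict[str, str]:
    """Build routing-control headers for one request (hypothetical helper)."""
    headers: dict[str, str] = {}
    if provider:
        # Pins the request to one provider, bypassing the routing strategy.
        headers["x-prism-provider-lock"] = provider
    if complexity_tier:
        if complexity_tier not in KNOWN_TIERS:
            raise ValueError(f"unknown complexity tier: {complexity_tier!r}")
        headers["x-prism-complexity-override"] = complexity_tier
    return headers
```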

Guardrails

Header	Value	Description
`x-prism-guardrail-policy`	string	Comma-separated list of guardrail policy IDs to apply to this request. Overrides org-level guardrail config.
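The header value is a comma-separated list, so the only encoding concern is joining the IDs cleanly. The hypothetical helper below trims whitespace, drops empty and duplicate IDs while keeping their order, and rejects IDs that contain a comma, since such an ID would be read back as two. The policy IDs in the usage line are made up for illustration. Because the header overrides the org-level guardrail config, list every policy you want applied, not only the additions.

```python
def guardrail_policy_header(policy_ids: list[str]) -> dict[str, str]:
    """Join guardrail policy IDs into one comma-separated header value."""
    cleaned = [pid.strip() for pid in policy_ids if pid.strip()]
    if not cleaned:
        raise ValueError("at least one policy ID is required")
    for pid in cleaned:
        # A comma inside an ID would split it into two IDs.
        if "," in pid:
            raise ValueError(f"policy ID contains a comma: {pid!r}")
    # dict.fromkeys removes duplicates while preserving the original order.
    unique = list(dict.fromkeys(cleaned))
    return {"x-prism-guardrail-policy": ",".join(unique)}


# Hypothetical policy IDs.
print(guardrail_policy_header(["pii-redact", " toxicity ", "pii-redact"]))
```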

Gateway config (full override)

Header	Value	Description
`x-prism-config`	JSON string	Full `GatewayConfig` serialized as JSON. Overrides all per-request settings (cache, retry, fallback, guardrails, routing, timeouts). The Prism SDK’s `GatewayConfig.to_headers()` generates this automatically.
`x-prism-request-timeout`	integer (ms)	Total request timeout in milliseconds. Also set automatically when using `TimeoutConfig.total` in the SDK. The gateway echoes the applied timeout back as `x-prism-timeout-ms` in the response.
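The schema that `GatewayConfig.to_headers()` serializes into `x-prism-config` isn't given in this reference, so the following sketch covers only the timeout round trip. It sends a timeout as whole milliseconds and reads back the value the gateway reports in `x-prism-timeout-ms`. Both function names are hypothetical. Header names are lowercased before lookup so a plain dict behaves like a case-insensitive headers object.

```python
from typing import Mapping, Optional


def timeout_header(total_seconds: float) -> dict[str, str]:
    """Convert a timeout in seconds to the integer-millisecond request header."""
    if total_seconds <= 0:
        raise ValueError("timeout must be positive")
    # Round to the nearest millisecond; the header expects an integer.
    return {"x-prism-request-timeout": str(round(total_seconds * 1000))}


def applied_timeout_ms(response_headers: Mapping[str, str]) -> Optional[int]:
    """Return the timeout the gateway applied, or None if it was not reported."""
    h = {name.lower(): value for name, value in response_headers.items()}
    value = h.get("x-prism-timeout-ms")
    return int(value) if value is not None else None
```

Comparing the value you sent with `applied_timeout_ms(...)` shows whether the gateway honored your timeout or applied a different one.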

Response headers

Always present

Header	Example	Description
`x-prism-request-id`	`req-a1b2c3`	Unique identifier for this request. Use this when filing support tickets or searching logs.
`x-prism-trace-id`	`trace-x7y8z9`	Trace ID for distributed tracing. Matches the request header if one was sent.
`x-prism-provider`	`openai`	Which provider served this request.
`x-prism-model-used`	`gpt-4o-2024-08-06`	Actual model returned by the provider. May differ from the requested model if routing redirected the request.
`x-prism-latency-ms`	`342`	Total gateway latency in milliseconds, including the provider call.
`x-prism-timeout-ms`	`30000`	Timeout applied to this request, in milliseconds.
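When you read headers from a client that has no `.prism` attribute, such as the OpenAI SDK's `raw.headers`, a small typed wrapper keeps the parsing in one place. `PrismResponseInfo` and `parse_prism_headers` are hypothetical names. The Prism SDK already exposes the same values on `response.prism`. A missing header raises `KeyError`, which surfaces a request that did not go through the gateway.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass
class PrismResponseInfo:
    request_id: str
    trace_id: str
    provider: str
    model_used: str
    latency_ms: int
    timeout_ms: int


def parse_prism_headers(headers: Mapping[str, str]) -> PrismResponseInfo:
    """Extract the always-present x-prism-* response headers."""
    # Header names are case-insensitive; normalize so a plain dict works too.
    h = {name.lower(): value for name, value in headers.items()}
    return PrismResponseInfo(
        request_id=h["x-prism-request-id"],
        trace_id=h["x-prism-trace-id"],
        provider=h["x-prism-provider"],
        model_used=h["x-prism-model-used"],
        latency_ms=int(h["x-prism-latency-ms"]),
        timeout_ms=int(h["x-prism-timeout-ms"]),
    )
```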

Conditional

Header	Present when	Value
`x-prism-cost`	Model has pricing data	Estimated cost in USD (e.g. `0.00234`). Returns `0` on exact cache hits.
`x-prism-cache`	Caching is enabled	`hit`, `hit_exact`, `hit_semantic`, `miss`, or `skip`
`x-prism-guardrail-triggered`	A guardrail fired	`true`
`x-prism-fallback-used`	A provider fallback occurred	`true`
`x-prism-routing-strategy`	A routing policy is active	Strategy name: `round-robin`, `weighted`, `least-latency`, `cost-optimized`, `adaptive`, `fastest`
`x-prism-credits-remaining`	Managed key with credit balance	Remaining USD balance (e.g. `12.50`)
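Because these headers are conditional, code that reads them has to treat absence as meaningful. The sketch below, with hypothetical function names, parses the cost as a `Decimal` so summed costs don't pick up floating-point error. It treats all three hit variants as cache hits. It also reads the boolean flags as `False` when absent, since the table lists them only as `true` when present.

```python
from decimal import Decimal
from typing import Mapping, Optional

CACHE_HIT_VALUES = {"hit", "hit_exact", "hit_semantic"}


def _lower(headers: Mapping[str, str]) -> dict[str, str]:
    return {name.lower(): value for name, value in headers.items()}


def cost_usd(headers: Mapping[str, str]) -> Optional[Decimal]:
    """Estimated cost in USD, or None when the model has no pricing data."""
    value = _lower(headers).get("x-prism-cost")
    return Decimal(value) if value is not None else None


def served_from_cache(headers: Mapping[str, str]) -> bool:
    """True for any hit variant; False for miss, skip, or caching disabled."""
    return _lower(headers).get("x-prism-cache") in CACHE_HIT_VALUES


def flag(headers: Mapping[str, str], name: str) -> bool:
    """Read a boolean header such as x-prism-fallback-used; absent means False."""
    return _lower(headers).get(name.lower()) == "true"
```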

Rate limit headers

Present when rate limiting is enabled for the key or org.

Header	Description
`x-ratelimit-limit-requests`	Maximum requests allowed per minute
`x-ratelimit-remaining-requests`	Requests remaining in the current window
`x-ratelimit-reset-requests`	Unix timestamp when the window resets
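A client can use these headers to pause before it runs out of requests rather than waiting to be rejected. This sketch assumes the reset value is a Unix timestamp in seconds. The table says Unix timestamp but does not state the unit. Both function names are hypothetical.

```python
import time
from typing import Mapping, Optional


def seconds_until_reset(
    headers: Mapping[str, str], now: Optional[float] = None
) -> Optional[float]:
    """Seconds until the request window resets, or None without the header."""
    h = {name.lower(): value for name, value in headers.items()}
    reset = h.get("x-ratelimit-reset-requests")
    if reset is None:
        return None
    current = time.time() if now is None else now
    # Clamp at zero: a reset time in the past means the window already rolled over.
    return max(0.0, float(reset) - current)


def should_pause(headers: Mapping[str, str], min_remaining: int = 1) -> bool:
    """True if fewer than min_remaining requests are left in the window."""
    h = {name.lower(): value for name, value in headers.items()}
    remaining = h.get("x-ratelimit-remaining-requests")
    return remaining is not None and int(remaining) < min_remaining
```

A caller might check `should_pause(...)` after each response and, if it returns `True`, `time.sleep(seconds_until_reset(...) or 0)` before the next request.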

Reading headers

Prism SDK

Every response from the Prism SDK has a .prism attribute with typed access to all gateway metadata:

from prism import Prism

client = Prism(
    api_key="sk-prism-your-key",
    base_url="https://gateway.futureagi.com",
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)

print(response.choices[0].message.content)
print(response.prism.provider)            # openai
print(response.prism.latency_ms)          # 342
print(response.prism.cost)                # 0.00015
print(response.prism.cache_status)        # miss
print(response.prism.model_used)          # gpt-4o-2024-08-06
print(response.prism.request_id)          # req-a1b2c3
print(response.prism.trace_id)            # trace-x7y8z9
print(response.prism.guardrail_triggered) # False
print(response.prism.fallback_used)       # False
print(response.prism.routing_strategy)    # None (or "weighted", etc.)

# Rate limit info (when enabled)
if response.prism.ratelimit:
    print(response.prism.ratelimit.limit)
    print(response.prism.ratelimit.remaining)
    print(response.prism.ratelimit.reset)

OpenAI SDK

The OpenAI SDK doesn’t have response.prism. Use with_raw_response to read headers:

from openai import OpenAI

client = OpenAI(
    base_url="https://gateway.futureagi.com/v1",
    api_key="sk-prism-your-key",
)

raw = client.chat.completions.with_raw_response.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)

print(raw.headers.get("x-prism-provider"))
print(raw.headers.get("x-prism-cost"))

response = raw.parse()

cURL

Use the -i flag to include response headers in the output:

curl -i -X POST https://gateway.futureagi.com/v1/chat/completions \
  -H "Authorization: Bearer sk-prism-your-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
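To script against the `curl -i` output, you can split the header block from the JSON body. The sketch below is a hypothetical parser. It assumes a single response, with no interim `100 Continue` block. Status line first, then headers up to the first blank line. If you only need the headers, `curl -s -D - -o /dev/null ...` prints them without the body.

```python
def parse_header_block(raw: str) -> dict[str, str]:
    """Parse the header section of `curl -i` output into a lowercase dict."""
    headers: dict[str, str] = {}
    lines = raw.replace("\r\n", "\n").split("\n")
    # Skip the status line (e.g. "HTTP/2 200"); headers end at the first
    # blank line, after which the JSON body begins.
    for line in lines[1:]:
        if not line.strip():
            break
        # partition splits on the first colon only, so values may contain colons.
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return headers
```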

Setting request headers

Prism SDK

The SDK accepts tracking parameters directly on each create() call:

from prism import Prism

client = Prism(
    api_key="sk-prism-your-key",
    base_url="https://gateway.futureagi.com",
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    session_id="sess-abc",
    trace_id="trace-123",
    user_id="user-42",
    request_metadata={"team": "ml", "feature": "search"},
    properties={"env": "prod"},
)

For gateway config, pass a GatewayConfig to the client constructor (applies to all requests) or override per-request with extra_headers:

from prism import Prism, GatewayConfig, CacheConfig, RetryConfig

# Client-level config (applies to all requests)
client = Prism(
    api_key="sk-prism-your-key",
    base_url="https://gateway.futureagi.com",
    config=GatewayConfig(
        cache=CacheConfig(ttl=300, namespace="prod"),
        retry=RetryConfig(max_retries=3),
    ),
)

# Per-request override
override = GatewayConfig(cache=CacheConfig(force_refresh=True))
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    extra_headers=override.to_headers(),
)

OpenAI SDK with create_headers()

Use create_headers() to generate all x-prism-* headers for the OpenAI SDK:

from openai import OpenAI
from prism import create_headers, GatewayConfig, CacheConfig

headers = create_headers(
    config=GatewayConfig(cache=CacheConfig(strategy="semantic", ttl=600)),
    trace_id="trace-abc",
    session_id="sess-123",
    user_id="user-42",
    metadata={"team": "ml", "env": "production"},
)

client = OpenAI(
    api_key="sk-prism-your-key",
    base_url="https://gateway.futureagi.com/v1",
    default_headers=headers,
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)

cURL

Pass headers with -H:

curl -X POST https://gateway.futureagi.com/v1/chat/completions \
  -H "Authorization: Bearer sk-prism-your-key" \
  -H "x-prism-session-id: sess-abc" \
  -H "x-prism-trace-id: trace-123" \
  -H "x-prism-user-id: user-42" \
  -H "x-prism-metadata: {\"team\":\"ml\",\"env\":\"prod\"}" \
  -H "x-prism-cache-ttl: 300" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
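For scripts that can't install an SDK, the same request can be built with the Python standard library. `build_chat_request` is a hypothetical helper that mirrors the cURL example above. Building the metadata header with `json.dumps` also removes the shell quoting that the cURL version needs. The function only constructs the request, so nothing is sent until you pass it to `urlopen`.

```python
import json
import urllib.request

GATEWAY_URL = "https://gateway.futureagi.com/v1/chat/completions"


def build_chat_request(api_key: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a chat request carrying x-prism-* headers."""
    body = json.dumps(
        {"model": "gpt-4o", "messages": [{"role": "user", "content": prompt}]}
    ).encode("utf-8")
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
        "x-prism-session-id": "sess-abc",
        "x-prism-trace-id": "trace-123",
        "x-prism-user-id": "user-42",
        # json.dumps produces the JSON string directly; no escaping needed.
        "x-prism-metadata": json.dumps({"team": "ml", "env": "prod"}, separators=(",", ":")),
        "x-prism-cache-ttl": "300",
    }
    return urllib.request.Request(GATEWAY_URL, data=body, headers=headers, method="POST")
```

To send it, call `urllib.request.urlopen(build_chat_request("sk-prism-your-key", "Hello"))`. The response's `headers.get("x-prism-provider")` lookup is case-insensitive.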

Next Steps

Chat completions

Caching

Configuration

Cost tracking
