Request & response headers

Reference for the x-prism-* request headers accepted by the Prism AI Gateway and the response headers it returns.

About

Prism reads x-prism-* request headers to control per-request behavior (caching, sessions, routing) and writes x-prism-* response headers to report what happened (which provider, latency, cost, cache status).

The Prism SDK handles these automatically. If you’re using the OpenAI SDK or cURL, set them manually or use create_headers() to generate them.


Request headers

Tracking and correlation

| Header | Value | Description |
| --- | --- | --- |
| x-prism-trace-id | string | Custom trace ID for distributed tracing. If omitted, the gateway generates one. |
| x-prism-session-id | string | Group related requests into a logical session for analytics. |
| x-prism-session-name | string | Human-readable label for the session (used alongside x-prism-session-id). |
| x-prism-session-path | string | Hierarchical path within a session, e.g. /search/rerank. |
| x-prism-request-id | string | Client-generated request ID for idempotency and log correlation. |
| x-prism-user-id | string | User identifier for per-user tracking, budgets, and analytics. |
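
As a rough illustration of the tracking headers above, the sketch below assembles them into a plain dict that could be passed as extra headers by any HTTP client. The helper name tracking_headers is hypothetical, and using a UUID for the request ID is an assumption; the reference only says the ID is client-generated.

```python
import uuid


def tracking_headers(session_id, user_id, session_path=None):
    """Build tracking headers for one request (hypothetical helper)."""
    headers = {
        # Assumption: any unique string works; a UUID4 is one reasonable choice.
        "x-prism-request-id": f"req-{uuid.uuid4().hex}",
        "x-prism-session-id": session_id,
        "x-prism-user-id": user_id,
    }
    if session_path is not None:
        headers["x-prism-session-path"] = session_path
    return headers


headers = tracking_headers("sess-abc", "user-42", session_path="/search/rerank")
print(headers)
```

Generating the request ID on the client means the same ID can be written to your own logs before the request is sent, which makes later correlation easier.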

Metadata and properties

| Header | Value | Description |
| --- | --- | --- |
| x-prism-metadata | JSON string | Arbitrary key-value pairs for cost attribution and filtering. Example: {"team":"ml","env":"prod"} |
| x-prism-property-{key} | string | Individual key-value properties. x-prism-property-env: prod is equivalent to including "env":"prod" in metadata. |
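
The two forms above can be produced from ordinary dicts. This is a minimal sketch using only the standard library; metadata_headers is a hypothetical helper name, not part of the Prism SDK.

```python
import json


def metadata_headers(metadata=None, properties=None):
    """Serialize metadata and properties into header form (hypothetical helper)."""
    headers = {}
    if metadata:
        # Compact separators keep the header value short; json.dumps also
        # escapes non-ASCII characters by default, which is safe for headers.
        headers["x-prism-metadata"] = json.dumps(metadata, separators=(",", ":"))
    for key, value in (properties or {}).items():
        # One header per property, with the key embedded in the header name.
        headers[f"x-prism-property-{key}"] = str(value)
    return headers


headers = metadata_headers(
    metadata={"team": "ml", "env": "prod"},
    properties={"feature": "search"},
)
print(headers["x-prism-metadata"])  # {"team":"ml","env":"prod"}
```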

Cache control

| Header | Value | Description |
| --- | --- | --- |
| x-prism-cache-ttl | integer (seconds) | Override the cache TTL for this request. |
| x-prism-cache-namespace | string | Route to a specific cache namespace for isolation (e.g. prod, staging). |
| x-prism-cache-force-refresh | true | Bypass cache, fetch a fresh response from the provider, and update the cache with the new result. |
| Cache-Control | no-store | Disable caching entirely for this request. The response is not read from or written to cache. |
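
A small helper can keep cache headers consistent across call sites. The sketch below is illustrative only: cache_headers is a hypothetical name, and refusing to combine no-store with the other options is a choice made here because this reference does not define how the gateway resolves that combination.

```python
def cache_headers(ttl=None, namespace=None, force_refresh=False, no_store=False):
    """Build per-request cache headers (hypothetical helper)."""
    if no_store:
        # Assumption: precedence between no-store and the x-prism-cache-*
        # headers is not documented, so this helper rejects the combination.
        if ttl is not None or namespace is not None or force_refresh:
            raise ValueError("no_store cannot be combined with other cache options")
        return {"Cache-Control": "no-store"}

    headers = {}
    if ttl is not None:
        if ttl < 0:
            raise ValueError("ttl must be a non-negative number of seconds")
        headers["x-prism-cache-ttl"] = str(int(ttl))
    if namespace is not None:
        headers["x-prism-cache-namespace"] = namespace
    if force_refresh:
        headers["x-prism-cache-force-refresh"] = "true"
    return headers


print(cache_headers(ttl=300, namespace="staging"))
print(cache_headers(no_store=True))
```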

Routing control

| Header | Value | Description |
| --- | --- | --- |
| x-prism-provider-lock | string | Force this request to a specific provider, bypassing the routing strategy. Example: openai. |
| x-prism-complexity-override | string | Override complexity-based routing tier. Pass the tier name (e.g. simple, moderate, complex). |

Guardrails

| Header | Value | Description |
| --- | --- | --- |
| x-prism-guardrail-policy | string | Comma-separated list of guardrail policy IDs to apply to this request. Overrides org-level guardrail config. |
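
Because the value is a comma-separated list, an ID that itself contains a comma would be split incorrectly. The sketch below joins IDs defensively; guardrail_header is a hypothetical helper, and the policy IDs shown are made up for illustration.

```python
def guardrail_header(policy_ids):
    """Join guardrail policy IDs into one header value (hypothetical helper)."""
    cleaned = [policy_id.strip() for policy_id in policy_ids]
    # Reject empty lists, empty IDs, and IDs that would break the comma format.
    if not cleaned or any(not p or "," in p for p in cleaned):
        raise ValueError("policy IDs must be non-empty and must not contain commas")
    return {"x-prism-guardrail-policy": ",".join(cleaned)}


# "pii-redact" and "toxicity-block" are placeholder IDs, not real policies.
headers = guardrail_header(["pii-redact", "toxicity-block"])
print(headers)
```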

Gateway config (full override)

| Header | Value | Description |
| --- | --- | --- |
| x-prism-config | JSON string | Full GatewayConfig serialized as JSON. Overrides all per-request settings (cache, retry, fallback, guardrails, routing, timeouts). The Prism SDK’s GatewayConfig.to_headers() generates this automatically. |
| x-prism-request-timeout | integer (ms) | Total request timeout in milliseconds. Also set automatically when using TimeoutConfig.total in the SDK. The gateway echoes the applied timeout back as x-prism-timeout-ms in the response. |
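
Since the timeout header is expressed in milliseconds while most Python clients think in seconds, a conversion step is an easy place for mistakes. The following sketch, with hypothetical helper names, converts seconds to the header value and reads back the echoed x-prism-timeout-ms. It assumes the response headers are available as a mapping with lowercase keys.

```python
def timeout_header(seconds):
    """Convert a timeout in seconds to the millisecond header (hypothetical helper)."""
    if seconds <= 0:
        raise ValueError("timeout must be positive")
    return {"x-prism-request-timeout": str(round(seconds * 1000))}


def applied_timeout_ms(response_headers):
    """Return the timeout the gateway reports it applied, or None if absent."""
    value = response_headers.get("x-prism-timeout-ms")
    return int(value) if value is not None else None


request_headers = timeout_header(2.5)
print(request_headers)

# Simulated response headers; a real client would supply these.
print(applied_timeout_ms({"x-prism-timeout-ms": "30000"}))
```

Comparing the echoed value with the one you sent is a cheap way to confirm that the override was picked up.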

Response headers

Always present

| Header | Example | Description |
| --- | --- | --- |
| x-prism-request-id | req-a1b2c3 | Unique identifier for this request. Use this when filing support tickets or searching logs. |
| x-prism-trace-id | trace-x7y8z9 | Trace ID for distributed tracing. Matches the request header if one was sent. |
| x-prism-provider | openai | Which provider served this request. |
| x-prism-model-used | gpt-4o-2024-08-06 | Actual model returned by the provider. May differ from the requested model if routing redirected the request. |
| x-prism-latency-ms | 342 | Total gateway latency in milliseconds, including the provider call. |
| x-prism-timeout-ms | 30000 | Timeout that was applied to this request. |

Conditional

| Header | Present when | Value |
| --- | --- | --- |
| x-prism-cost | Model has pricing data | Estimated cost in USD (e.g. 0.00234). Returns 0 on exact cache hits. |
| x-prism-cache | Caching is enabled | hit, hit_exact, hit_semantic, miss, or skip |
| x-prism-guardrail-triggered | A guardrail fired | true |
| x-prism-fallback-used | A provider fallback occurred | true |
| x-prism-routing-strategy | A routing policy is active | Strategy name: round-robin, weighted, least-latency, cost-optimized, adaptive, fastest |
| x-prism-credits-remaining | Managed key with credit balance | Remaining USD balance (e.g. 12.50) |
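
Because these headers may be absent, client code should treat each one as optional. The sketch below shows one way to turn raw header strings into typed values; GatewayInfo and parse_gateway_headers are hypothetical names, and the input is assumed to be a mapping with lowercase header names.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

# Cache statuses that count as a hit, per the table above.
CACHE_HIT_VALUES = {"hit", "hit_exact", "hit_semantic"}


@dataclass
class GatewayInfo:
    cost_usd: Optional[float]
    cache_status: Optional[str]
    cache_hit: bool
    guardrail_triggered: bool
    fallback_used: bool
    routing_strategy: Optional[str]
    credits_remaining_usd: Optional[float]


def parse_gateway_headers(headers: Mapping[str, str]) -> GatewayInfo:
    """Parse conditional x-prism-* response headers, tolerating missing ones."""
    cost = headers.get("x-prism-cost")
    cache = headers.get("x-prism-cache")
    credits = headers.get("x-prism-credits-remaining")
    return GatewayInfo(
        cost_usd=float(cost) if cost is not None else None,
        cache_status=cache,
        cache_hit=cache in CACHE_HIT_VALUES,
        # These flags are only sent as "true"; absence is read as False.
        guardrail_triggered=headers.get("x-prism-guardrail-triggered") == "true",
        fallback_used=headers.get("x-prism-fallback-used") == "true",
        routing_strategy=headers.get("x-prism-routing-strategy"),
        credits_remaining_usd=float(credits) if credits is not None else None,
    )


info = parse_gateway_headers({
    "x-prism-cost": "0.00234",
    "x-prism-cache": "hit_semantic",
    "x-prism-fallback-used": "true",
})
print(info)
```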

Rate limit headers

Present when rate limiting is enabled for the key or org.

| Header | Description |
| --- | --- |
| x-ratelimit-limit-requests | Maximum requests allowed per minute |
| x-ratelimit-remaining-requests | Requests remaining in the current window |
| x-ratelimit-reset-requests | Unix timestamp when the window resets |
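
A client can use these values to back off before the gateway rejects a request. The sketch below is one possible approach, with hypothetical helper names; it assumes lowercase header keys and that the reset value is a Unix timestamp in seconds, as the table describes.

```python
import time


def seconds_until_reset(headers, now=None):
    """Seconds until the rate limit window resets, or None if not reported."""
    reset = headers.get("x-ratelimit-reset-requests")
    if reset is None:
        return None
    now = time.time() if now is None else now
    # Clamp at zero in case the window has already reset.
    return max(0.0, float(reset) - now)


def should_pause(headers):
    """True when the current window has no requests left."""
    remaining = headers.get("x-ratelimit-remaining-requests")
    return remaining is not None and int(remaining) <= 0


# Simulated response headers with a fixed clock for a reproducible result.
sample = {
    "x-ratelimit-limit-requests": "60",
    "x-ratelimit-remaining-requests": "0",
    "x-ratelimit-reset-requests": "1700000060",
}
if should_pause(sample):
    wait = seconds_until_reset(sample, now=1700000000)
    print(f"rate limited, retry in {wait:.0f}s")
```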

Reading headers

Prism SDK

Every response from the Prism SDK has a .prism attribute with typed access to all gateway metadata:

from prism import Prism

client = Prism(
    api_key="sk-prism-your-key",
    base_url="https://gateway.futureagi.com",
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)

print(response.choices[0].message.content)
print(response.prism.provider)            # openai
print(response.prism.latency_ms)          # 342
print(response.prism.cost)                # 0.00015
print(response.prism.cache_status)        # miss
print(response.prism.model_used)          # gpt-4o-2024-08-06
print(response.prism.request_id)          # req-a1b2c3
print(response.prism.trace_id)            # trace-x7y8z9
print(response.prism.guardrail_triggered) # False
print(response.prism.fallback_used)       # False
print(response.prism.routing_strategy)    # None (or "weighted", etc.)

# Rate limit info (when enabled)
if response.prism.ratelimit:
    print(response.prism.ratelimit.limit)
    print(response.prism.ratelimit.remaining)
    print(response.prism.ratelimit.reset)

OpenAI SDK

The OpenAI SDK doesn’t have response.prism. Use with_raw_response to read headers:

from openai import OpenAI

client = OpenAI(
    base_url="https://gateway.futureagi.com/v1",
    api_key="sk-prism-your-key",
)

raw = client.chat.completions.with_raw_response.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)

print(raw.headers.get("x-prism-provider"))
print(raw.headers.get("x-prism-cost"))

# Parse the raw response into the usual ChatCompletion object
response = raw.parse()

cURL

Use the -i flag to include response headers in the output:

curl -i -X POST https://gateway.futureagi.com/v1/chat/completions \
  -H "Authorization: Bearer sk-prism-your-key" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

Setting request headers

Prism SDK

The SDK accepts tracking parameters directly on each create() call:

from prism import Prism

client = Prism(
    api_key="sk-prism-your-key",
    base_url="https://gateway.futureagi.com",
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    session_id="sess-abc",
    trace_id="trace-123",
    user_id="user-42",
    request_metadata={"team": "ml", "feature": "search"},
    properties={"env": "prod"},
)

For gateway config, pass a GatewayConfig to the client constructor (applies to all requests) or override per-request with extra_headers:

from prism import Prism, GatewayConfig, CacheConfig, RetryConfig

# Client-level config (applies to all requests)
client = Prism(
    api_key="sk-prism-your-key",
    base_url="https://gateway.futureagi.com",
    config=GatewayConfig(
        cache=CacheConfig(ttl=300, namespace="prod"),
        retry=RetryConfig(max_retries=3),
    ),
)

# Per-request override
override = GatewayConfig(cache=CacheConfig(force_refresh=True))
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    extra_headers=override.to_headers(),
)

OpenAI SDK with create_headers()

Use create_headers() to generate all x-prism-* headers for the OpenAI SDK:

from openai import OpenAI
from prism import create_headers, GatewayConfig, CacheConfig

headers = create_headers(
    config=GatewayConfig(cache=CacheConfig(strategy="semantic", ttl=600)),
    trace_id="trace-abc",
    session_id="sess-123",
    user_id="user-42",
    metadata={"team": "ml", "env": "production"},
)

client = OpenAI(
    api_key="sk-prism-your-key",
    base_url="https://gateway.futureagi.com/v1",
    default_headers=headers,
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)

cURL

Pass headers with -H:

curl -X POST https://gateway.futureagi.com/v1/chat/completions \
  -H "Authorization: Bearer sk-prism-your-key" \
  -H "x-prism-session-id: sess-abc" \
  -H "x-prism-trace-id: trace-123" \
  -H "x-prism-user-id: user-42" \
  -H "x-prism-metadata: {\"team\":\"ml\",\"env\":\"prod\"}" \
  -H "x-prism-cache-ttl: 300" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
