Troubleshooting
Step-by-step solutions for common Prism Gateway issues.
About
Common issues and how to diagnose them when requests through Prism fail.
Debug checklist
When something isn’t working, start here:
- Check the `x-prism-request-id` response header and search for it in your logs
- Check `x-prism-provider` to confirm which provider handled the request
- Check `x-prism-model-used` to confirm the actual model (may differ from requested if routing changed it)
- Compare `x-prism-latency-ms` against your expected latency
- Check `x-prism-cost` to verify pricing is as expected
Use `curl -i` to see all response headers:
```bash
curl -i https://gateway.futureagi.com/v1/chat/completions \
  -H "Authorization: Bearer sk-prism-your-key" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "hi"}]}'
```
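To work through the checklist programmatically, a small helper can pull the debug headers out of any response's header map. This is a sketch; only the header names come from the checklist above:

```python
# Sketch: collect the x-prism-* debug headers from a response's header map.
DEBUG_HEADERS = [
    "x-prism-request-id",
    "x-prism-provider",
    "x-prism-model-used",
    "x-prism-latency-ms",
    "x-prism-cost",
]

def summarize_debug_headers(headers: dict) -> dict:
    """Return the Prism debug headers, matching names case-insensitively."""
    lowered = {k.lower(): v for k, v in headers.items()}
    return {name: lowered.get(name) for name in DEBUG_HEADERS}

summary = summarize_debug_headers({
    "Content-Type": "application/json",
    "X-Prism-Request-Id": "req_abc123",
    "X-Prism-Provider": "openai",
    "X-Prism-Latency-Ms": "842",
})
print(summary["x-prism-request-id"])  # req_abc123
```

Missing headers come back as `None`, which is itself a signal (for example, no `x-prism-cache` header usually means caching never ran).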
Common issues
"model not found" but the model exists
Symptom: `404` with `model_not_found` even though the model appears in `GET /v1/models`.
Quick fix: Try the `provider/model` format to bypass model resolution:
```bash
# Check available models
curl https://gateway.futureagi.com/v1/models \
  -H "Authorization: Bearer sk-prism-your-key" | jq '.data[].id'

# Use explicit provider prefix
curl https://gateway.futureagi.com/v1/chat/completions \
  -H "Authorization: Bearer sk-prism-your-key" \
  -H "Content-Type: application/json" \
  -d '{"model": "openai/gpt-4o", "messages": [{"role": "user", "content": "hi"}]}'
```
If that works, set up a model map. See Error handling for all causes.
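Why the prefix helps: a bare model name goes through alias resolution, while `provider/model` skips it. A minimal sketch of that distinction (hypothetical resolver and alias map, not the gateway's actual code):

```python
# Hypothetical resolver illustrating why "openai/gpt-4o" can work when
# "gpt-4o" 404s: the prefixed form skips alias resolution entirely.
KNOWN_ALIASES = {"gpt-4o": ("openai", "gpt-4o")}  # assumed alias map

def resolve_model(model_id: str) -> tuple:
    if "/" in model_id:
        # Explicit provider prefix: route directly, no lookup needed.
        provider, model = model_id.split("/", 1)
        return provider, model
    if model_id in KNOWN_ALIASES:
        return KNOWN_ALIASES[model_id]
    # A bare name with no matching alias is what produces model_not_found.
    raise LookupError(f"model_not_found: {model_id}")

print(resolve_model("openai/gpt-4o"))  # ('openai', 'gpt-4o')
```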
Provider returns 404 upstream
Symptom: `502` with `provider_404`.
The gateway reached the provider, but the provider rejected the request. Most common cause: the provider API key is invalid or doesn’t have access to the model. For OpenAI project-scoped keys (`sk-proj-...`), enable models in Project Settings > Model access.
See Error handling for details.
Responses are slow
Symptom: High `x-prism-latency-ms` values.
Possible causes:
- Provider latency: Check if the provider itself is slow. Compare `x-prism-latency-ms` with direct provider calls.
- No caching: Repeated identical requests hit the provider every time. Enable caching.
- Wrong routing strategy: `least-latency` routing picks the fastest provider automatically. See routing.
- Large prompts: Token count affects latency. Check `usage.prompt_tokens` in the response.
- Guardrail overhead: Pre-request guardrails add latency. Check if guardrails are processing-heavy.
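To separate provider latency from gateway overhead, compare median `x-prism-latency-ms` values against timings from direct provider calls. A quick sketch:

```python
from statistics import median

def overhead_ms(prism_samples, direct_samples):
    """Median gateway latency (from x-prism-latency-ms) minus median direct
    latency. A large, consistent gap points at gateway-side causes such as
    guardrails or routing; a small gap means the provider itself is slow."""
    return median(prism_samples) - median(direct_samples)

print(overhead_ms([1250, 1180, 1320], [1100, 1050, 1210]))  # 150
```

Use medians rather than averages here so a single provider-side spike doesn't skew the comparison.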
Cache isn’t working
Symptom: `x-prism-cache` always shows `miss` or doesn’t appear.
Checklist:
- Is caching enabled? Check your org config or `GatewayConfig`.
- Are you sending streaming requests? Streaming bypasses the cache entirely.
- Are the requests identical? Exact-match caching requires identical model, messages, temperature, and all other parameters.
- Is the TTL too short? Cached responses may expire before the next identical request arrives.
- Are you using different cache namespaces? Each namespace is isolated.
```python
# Force a cache test: send the same non-streaming request twice
from prism import Prism, GatewayConfig, CacheConfig

client = Prism(
    api_key="sk-prism-your-key",
    base_url="https://gateway.futureagi.com",
    config=GatewayConfig(cache=CacheConfig(enabled=True, strategy="exact", ttl=300)),
)

# First call
r1 = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What is 2+2?"}],
)
print(f"Call 1 cache: {r1.prism.cache_status}")  # miss or None

# Second call (same input)
r2 = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What is 2+2?"}],
)
print(f"Call 2 cache: {r2.prism.cache_status}")  # hit_exact
```
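Why "almost identical" requests still miss: an exact-match cache typically keys on a hash of the entire canonicalized request. A sketch of that idea (illustrative, not Prism's actual key function):

```python
import hashlib
import json

def exact_cache_key(model, messages, **params):
    """Illustrative exact-match cache key (not Prism's real implementation):
    hash the full canonicalized request, so any change to the model,
    messages, or parameters produces a different key, and therefore a miss."""
    payload = json.dumps(
        {"model": model, "messages": messages, "params": params},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()

a = exact_cache_key("gpt-4o", [{"role": "user", "content": "What is 2+2?"}])
b = exact_cache_key("gpt-4o", [{"role": "user", "content": "What is 2+2?"}])
c = exact_cache_key("gpt-4o", [{"role": "user", "content": "What is 2+2?"}],
                    temperature=0.7)
print(a == b, a == c)  # True False
```

The `temperature=0.7` call misses against the first two even though the prompt is identical, which matches the "all parameters" item in the checklist.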
Guardrails blocking legitimate requests
Symptom: `403` with `content_blocked` on requests that should be allowed.
Diagnosis:
- Check which guardrail fired: the error message includes the guardrail name
- Check `x-prism-guardrail-triggered: true` in the response headers
- Switch the guardrail from `enforce` to `log` mode temporarily to see what’s being flagged without blocking
See Guardrails for configuration options including fail-open behavior.
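The `enforce` vs `log` distinction comes down to whether a triggered guardrail blocks the request or only flags it. A toy sketch of that behavior (hypothetical, not the SDK's API):

```python
# Toy model of guardrail modes (hypothetical, not the SDK's API): in
# "enforce" mode a triggered guardrail blocks with content_blocked; in
# "log" mode the request proceeds but is flagged for review.
def apply_guardrail(triggered: bool, mode: str) -> dict:
    if triggered and mode == "enforce":
        return {"blocked": True, "error": "content_blocked"}
    if triggered and mode == "log":
        return {"blocked": False, "flagged": True}
    return {"blocked": False, "flagged": False}

print(apply_guardrail(True, "log"))  # flagged, but not blocked
```

Running in `log` mode for a while gives you a list of flagged-but-allowed requests to tune thresholds against before re-enabling `enforce`.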
Rate limits hit unexpectedly
Symptom: `429` errors before you expect to hit limits.
Check the response headers:
```
x-ratelimit-limit-requests: 100
x-ratelimit-remaining-requests: 0
x-ratelimit-reset-requests: 1714000000
```
Common causes:
- Per-key limits are lower than per-org limits. The most restrictive limit applies.
- Multiple services share the same API key
- Burst traffic from retries (each retry counts against the limit)
Fix: Increase limits in Rate limiting, use separate keys per service, or add backoff to retry logic.
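Since each retry counts against the limit, back off past the reset time instead of retrying immediately. A sketch that treats `x-ratelimit-reset-requests` as a Unix timestamp (as in the example headers above):

```python
def retry_delay_s(reset_epoch: float, attempt: int, now: float) -> float:
    """Sleep until the rate-limit window resets, plus capped exponential
    backoff so several queued retries don't all fire the moment the
    window reopens (each retry counts against the limit)."""
    until_reset = max(0.0, reset_epoch - now)
    backoff = min(2 ** attempt, 30)  # 1, 2, 4, 8, ... capped at 30 s
    return until_reset + backoff

# Window resets in 5 s and this is the second retry (attempt=2):
print(retry_delay_s(1714000000, 2, 1713999995.0))  # 9.0
```

Adding jitter to the backoff term is also worthwhile when multiple services share a key, so their retries don't synchronize.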
Cost is higher than expected
Diagnosis:
- Check `x-prism-cost` on individual requests to find expensive calls
- Use metadata tagging to identify which team/feature is driving costs:

```python
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    request_metadata={"team": "search", "feature": "autocomplete"},
)
```

- Check the analytics dashboard for cost-by-model breakdown
- Look for missing cache hits on repeated queries
- Check if the `race` routing strategy is enabled (bills all providers, not just the winner)
See Cost tracking for attribution and budgets.
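Once requests carry `request_metadata`, per-tag cost attribution is a simple aggregation over your own logs. A sketch with a hypothetical log-record shape (cost taken from `x-prism-cost`):

```python
from collections import defaultdict

def cost_by_tag(records, tag: str) -> dict:
    """Sum per-request cost (from x-prism-cost) grouped by a request_metadata
    tag. The `records` shape here is a hypothetical log format."""
    totals = defaultdict(float)
    for r in records:
        totals[r["metadata"].get(tag, "untagged")] += r["cost"]
    return dict(totals)

records = [
    {"cost": 0.012, "metadata": {"team": "search"}},
    {"cost": 0.030, "metadata": {"team": "search"}},
    {"cost": 0.005, "metadata": {"team": "support"}},
]
print(round(cost_by_tag(records, "team")["search"], 3))  # 0.042
```

An `"untagged"` bucket that keeps growing is a sign some callers aren't sending metadata, which makes attribution unreliable.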
Failover isn’t working
Symptom: Requests fail with provider errors but don’t route to backup providers.
Checklist:
- Is failover enabled in your routing config?
- Does `failover_on` include the status code you’re seeing? (Default: `[429, 500, 502, 503, 504]`)
- Are backup providers configured with valid credentials?
- Check `x-prism-fallback-used: true` to confirm failover happened (or didn’t)
- Check `x-prism-provider` to see which provider ultimately handled the request
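The checklist maps to a simple control flow: statuses listed in `failover_on` advance to the next provider, and everything else is returned as-is. A sketch (illustrative, not the gateway's implementation):

```python
# Illustrative failover loop (not the gateway's implementation): statuses
# in failover_on advance to the next provider; anything else is returned.
def call_with_failover(providers, call, failover_on=(429, 500, 502, 503, 504)):
    last = None
    for provider in providers:
        status, body = call(provider)
        if status not in failover_on:
            return provider, status, body  # success, or a non-retryable error
        last = (provider, status, body)
    return last  # every provider failed; surface the last error

def fake_call(provider):
    # Stand-in for a real upstream request.
    return {"openai": (502, "bad gateway"), "anthropic": (200, "ok")}[provider]

print(call_with_failover(["openai", "anthropic"], fake_call))
# ('anthropic', 200, 'ok')
```

Note the implication: a status outside `failover_on` (for example a `404`) never triggers failover, which is why a misconfigured `failover_on` list looks like "failover isn't working".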
Getting help
If you can’t resolve the issue:
- Collect the `x-prism-request-id` from the failing request
- Note the timestamp and error message
- Check the Error handling guide for the specific error code
- Contact support with the request ID - it links to the full request/response log on our end