Troubleshooting
Step-by-step solutions for common Prism Gateway issues.
About
Common issues and how to diagnose them when requests through Prism fail.
Debug checklist
When something isn’t working, start here:
- Check the `x-prism-request-id` response header and search for it in your logs
- Check `x-prism-provider` to confirm which provider handled the request
- Check `x-prism-model-used` to confirm the actual model (may differ from requested if routing changed it)
- Compare `x-prism-latency-ms` against your expected latency
- Check `x-prism-cost` to verify pricing is as expected
Use `curl -i` to see all response headers:
```bash
curl -i https://gateway.futureagi.com/v1/chat/completions \
  -H "Authorization: Bearer sk-prism-your-key" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "hi"}]}'
```
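To work through the checklist programmatically, a small helper can pull the debug headers out of any response's header map. This is a sketch; only the header names come from the checklist above:

```python
# Sketch: collect the x-prism-* debug headers from a response's header map.
DEBUG_HEADERS = [
    "x-prism-request-id",
    "x-prism-provider",
    "x-prism-model-used",
    "x-prism-latency-ms",
    "x-prism-cost",
]

def summarize_debug_headers(headers: dict) -> dict:
    """Return the Prism debug headers, matching names case-insensitively."""
    lowered = {k.lower(): v for k, v in headers.items()}
    return {name: lowered.get(name) for name in DEBUG_HEADERS}

summary = summarize_debug_headers({
    "Content-Type": "application/json",
    "X-Prism-Request-Id": "req_abc123",
    "X-Prism-Provider": "openai",
    "X-Prism-Latency-Ms": "842",
})
print(summary["x-prism-request-id"])  # req_abc123
```

Missing headers come back as `None`, which is itself a signal (for example, no `x-prism-cache` header usually means caching never ran).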
Common issues
"model not found" but the model exists
Symptom: `404` with `model_not_found` even though the model appears in `GET /v1/models`.
Quick fix: Try the `provider/model` format to bypass model resolution:
```bash
# Check available models
curl https://gateway.futureagi.com/v1/models \
  -H "Authorization: Bearer sk-prism-your-key" | jq '.data[].id'

# Use explicit provider prefix
curl https://gateway.futureagi.com/v1/chat/completions \
  -H "Authorization: Bearer sk-prism-your-key" \
  -H "Content-Type: application/json" \
  -d '{"model": "openai/gpt-4o", "messages": [{"role": "user", "content": "hi"}]}'
```
If that works, set up a model map. See Error handling for all causes.
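Why the prefix helps: a bare model name goes through alias resolution, while `provider/model` skips it. A minimal sketch of that distinction (hypothetical resolver and alias map, not the gateway's actual code):

```python
# Hypothetical resolver illustrating why "openai/gpt-4o" can work when
# "gpt-4o" 404s: the prefixed form skips alias resolution entirely.
KNOWN_ALIASES = {"gpt-4o": ("openai", "gpt-4o")}  # assumed alias map

def resolve_model(model_id: str) -> tuple:
    if "/" in model_id:
        # Explicit provider prefix: route directly, no lookup needed.
        provider, model = model_id.split("/", 1)
        return provider, model
    if model_id in KNOWN_ALIASES:
        return KNOWN_ALIASES[model_id]
    # A bare name with no matching alias is what produces model_not_found.
    raise LookupError(f"model_not_found: {model_id}")

print(resolve_model("openai/gpt-4o"))  # ('openai', 'gpt-4o')
```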
Provider returns 404 upstream
Symptom: `502` with `provider_404`.
The gateway reached the provider, but the provider rejected the request. Most common cause: the provider API key is invalid or doesn’t have access to the model. For OpenAI project-scoped keys (`sk-proj-...`), enable models in Project Settings > Model access.
See Error handling for details.
Responses are slow
Symptom: High `x-prism-latency-ms` values.
Possible causes:
- Provider latency: Check if the provider itself is slow. Compare `x-prism-latency-ms` with direct provider calls.
- No caching: Repeated identical requests hit the provider every time. Enable caching.
- Wrong routing strategy: `least-latency` routing picks the fastest provider automatically. See routing.
- Large prompts: Token count affects latency. Check `usage.prompt_tokens` in the response.
- Guardrail overhead: Pre-request guardrails add latency. Check if guardrails are processing-heavy.
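To separate provider latency from gateway overhead, compare median `x-prism-latency-ms` values against timings from direct provider calls. A quick sketch:

```python
from statistics import median

def overhead_ms(prism_samples, direct_samples):
    """Median gateway latency (from x-prism-latency-ms) minus median direct
    latency. A large, consistent gap points at gateway-side causes such as
    guardrails or routing; a small gap means the provider itself is slow."""
    return median(prism_samples) - median(direct_samples)

print(overhead_ms([1250, 1180, 1320], [1100, 1050, 1210]))  # 150
```

Use medians rather than averages here so a single provider-side spike doesn't skew the comparison.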
Cache isn’t working
Symptom: `x-prism-cache` always shows `miss` or doesn’t appear.
Checklist:
- Is caching enabled? Check your org config or `GatewayConfig`.
- Are you sending streaming requests? Streaming bypasses the cache entirely.
- Are the requests identical? Exact-match caching requires identical model, messages, temperature, and all other parameters.
- Is the TTL too short? Cached responses may expire before the next identical request arrives.
- Are you using different cache namespaces? Each namespace is isolated.
```python
# Force a cache test: send the same non-streaming request twice
from prism import Prism, GatewayConfig, CacheConfig

client = Prism(
    api_key="sk-prism-your-key",
    base_url="https://gateway.futureagi.com",
    config=GatewayConfig(cache=CacheConfig(enabled=True, strategy="exact", ttl=300)),
)

# First call
r1 = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What is 2+2?"}],
)
print(f"Call 1 cache: {r1.prism.cache_status}")  # miss or None

# Second call (same input)
r2 = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What is 2+2?"}],
)
print(f"Call 2 cache: {r2.prism.cache_status}")  # hit_exact
```
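Why "almost identical" requests still miss: an exact-match cache typically keys on a hash of the entire canonicalized request. A sketch of that idea (illustrative, not Prism's actual key function):

```python
import hashlib
import json

def exact_cache_key(model, messages, **params):
    """Illustrative exact-match cache key (not Prism's real implementation):
    hash the full canonicalized request, so any change to the model,
    messages, or parameters produces a different key, and therefore a miss."""
    payload = json.dumps(
        {"model": model, "messages": messages, "params": params},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode()).hexdigest()

a = exact_cache_key("gpt-4o", [{"role": "user", "content": "What is 2+2?"}])
b = exact_cache_key("gpt-4o", [{"role": "user", "content": "What is 2+2?"}])
c = exact_cache_key("gpt-4o", [{"role": "user", "content": "What is 2+2?"}],
                    temperature=0.7)
print(a == b, a == c)  # True False
```

The `temperature=0.7` call misses against the first two even though the prompt is identical, which matches the "all parameters" item in the checklist.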
Guardrails blocking legitimate requests
Symptom: `403` with `content_blocked` on requests that should be allowed.
Diagnosis:
- Check which guardrail fired: the error message includes the guardrail name
- Check `x-prism-guardrail-triggered: true` in the response headers
- Switch the guardrail from `enforce` to `log` mode temporarily to see what’s being flagged without blocking
See Guardrails for configuration options including fail-open behavior.
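The `enforce` vs `log` distinction comes down to whether a triggered guardrail blocks the request or only flags it. A toy sketch of that behavior (hypothetical, not the SDK's API):

```python
# Toy model of guardrail modes (hypothetical, not the SDK's API): in
# "enforce" mode a triggered guardrail blocks with content_blocked; in
# "log" mode the request proceeds but is flagged for review.
def apply_guardrail(triggered: bool, mode: str) -> dict:
    if triggered and mode == "enforce":
        return {"blocked": True, "error": "content_blocked"}
    if triggered and mode == "log":
        return {"blocked": False, "flagged": True}
    return {"blocked": False, "flagged": False}

print(apply_guardrail(True, "log"))  # flagged, but not blocked
```

Running in `log` mode for a while gives you a list of flagged-but-allowed requests to tune thresholds against before re-enabling `enforce`.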
Rate limits hit unexpectedly
Symptom: `429` errors before you expect to hit limits.
Check the response headers:
```
x-ratelimit-limit-requests: 100
x-ratelimit-remaining-requests: 0
x-ratelimit-reset-requests: 1714000000
```
Common causes:
- Per-key limits are lower than per-org limits. The most restrictive limit applies.
- Multiple services share the same API key
- Burst traffic from retries (each retry counts against the limit)
Fix: Increase limits in Rate limiting, use separate keys per service, or add backoff to retry logic.
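Since each retry counts against the limit, back off past the reset time instead of retrying immediately. A sketch that treats `x-ratelimit-reset-requests` as a Unix timestamp (as in the example headers above):

```python
def retry_delay_s(reset_epoch: float, attempt: int, now: float) -> float:
    """Sleep until the rate-limit window resets, plus capped exponential
    backoff so several queued retries don't all fire the moment the
    window reopens (each retry counts against the limit)."""
    until_reset = max(0.0, reset_epoch - now)
    backoff = min(2 ** attempt, 30)  # 1, 2, 4, 8, ... capped at 30 s
    return until_reset + backoff

# Window resets in 5 s and this is the second retry (attempt=2):
print(retry_delay_s(1714000000, 2, 1713999995.0))  # 9.0
```

Adding jitter to the backoff term is also worthwhile when multiple services share a key, so their retries don't synchronize.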
Cost is higher than expected
Diagnosis:
- Check `x-prism-cost` on individual requests to find expensive calls
- Use metadata tagging to identify which team/feature is driving costs:

```python
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
    request_metadata={"team": "search", "feature": "autocomplete"},
)
```

- Check the analytics dashboard for cost-by-model breakdown
- Look for missing cache hits on repeated queries
- Check if the `race` routing strategy is enabled (bills all providers, not just the winner)
See Cost tracking for attribution and budgets.
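Once requests carry `request_metadata`, per-tag cost attribution is a simple aggregation over your own logs. A sketch with a hypothetical log-record shape (cost taken from `x-prism-cost`):

```python
from collections import defaultdict

def cost_by_tag(records, tag: str) -> dict:
    """Sum per-request cost (from x-prism-cost) grouped by a request_metadata
    tag. The `records` shape here is a hypothetical log format."""
    totals = defaultdict(float)
    for r in records:
        totals[r["metadata"].get(tag, "untagged")] += r["cost"]
    return dict(totals)

records = [
    {"cost": 0.012, "metadata": {"team": "search"}},
    {"cost": 0.030, "metadata": {"team": "search"}},
    {"cost": 0.005, "metadata": {"team": "support"}},
]
print(round(cost_by_tag(records, "team")["search"], 3))  # 0.042
```

An `"untagged"` bucket that keeps growing is a sign some callers aren't sending metadata, which makes attribution unreliable.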
Failover isn’t working
Symptom: Requests fail with provider errors but don’t route to backup providers.
Checklist:
- Is failover enabled in your routing config?
- Does `failover_on` include the status code you’re seeing? (Default: `[429, 500, 502, 503, 504]`)
- Are backup providers configured with valid credentials?
- Check `x-prism-fallback-used: true` to confirm failover happened (or didn’t)
- Check `x-prism-provider` to see which provider ultimately handled the request
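The checklist maps to a simple control flow: statuses listed in `failover_on` advance to the next provider, and everything else is returned as-is. A sketch (illustrative, not the gateway's implementation):

```python
# Illustrative failover loop (not the gateway's implementation): statuses
# in failover_on advance to the next provider; anything else is returned.
def call_with_failover(providers, call, failover_on=(429, 500, 502, 503, 504)):
    last = None
    for provider in providers:
        status, body = call(provider)
        if status not in failover_on:
            return provider, status, body  # success, or a non-retryable error
        last = (provider, status, body)
    return last  # every provider failed; surface the last error

def fake_call(provider):
    # Stand-in for a real upstream request.
    return {"openai": (502, "bad gateway"), "anthropic": (200, "ok")}[provider]

print(call_with_failover(["openai", "anthropic"], fake_call))
# ('anthropic', 200, 'ok')
```

Note the implication: a status outside `failover_on` (for example a `404`) never triggers failover, which is why a misconfigured `failover_on` list looks like "failover isn't working".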
Getting help
If you can’t resolve the issue:
- Collect the `x-prism-request-id` from the failing request
- Note the timestamp and error message
- Check the Error handling guide for the specific error code
- Contact support with the request ID - it links to the full request/response log on our end