# Prism AI Gateway
A single API that routes requests across 100+ LLM providers with built-in safety, caching, cost tracking, and reliability.
## What is Prism?
Prism is Future AGI’s AI Gateway — a proxy layer that sits between your application and LLM providers. Instead of managing separate API keys, rate limits, and error handling for each provider, you connect to Prism once and it handles routing, failover, safety checks, caching, and cost tracking automatically.
You send requests to Prism using the standard OpenAI API format. Prism routes them to the appropriate provider, applies any guardrails and caching rules, and returns a response — with extra metadata headers showing which provider was used, how much the request cost, and whether the cache was hit.
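That metadata arrives as response headers. The exact header names are documented in Core Concepts; the `x-prism-*` names below are illustrative assumptions, used only to sketch how a client might read them:

```python
# Sketch: reading Prism's response metadata headers.
# The x-prism-* header names are ASSUMED for illustration; see the
# Core Concepts page for the actual names Prism returns.

def read_prism_metadata(headers: dict) -> dict:
    """Extract routing, cost, and cache metadata from (hypothetical) headers."""
    h = {k.lower(): v for k, v in headers.items()}
    return {
        "provider": h.get("x-prism-provider"),         # which provider served it
        "cost_usd": float(h.get("x-prism-cost", 0)),   # per-request cost
        "cache_hit": h.get("x-prism-cache") == "hit",  # served from cache?
    }

meta = read_prism_metadata({
    "X-Prism-Provider": "anthropic",
    "X-Prism-Cost": "0.00042",
    "X-Prism-Cache": "miss",
})
```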
> **Note:** Already using the OpenAI SDK? You can keep your existing code. Point `base_url` at `https://gateway.futureagi.com`, swap your API key for a Prism key, and switch between 100+ providers by changing the model name. No other code changes are needed.
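As a concrete sketch of the request shape, here is an OpenAI-format chat completion aimed at Prism, built with only the standard library. The `/v1/chat/completions` path follows the standard OpenAI API; the model name and key are placeholders:

```python
import json
import urllib.request

# Sketch of an OpenAI-format chat request sent through Prism.
# The path and model name are assumptions; substitute your own Prism key.
PRISM_API_KEY = "sk-prism-..."  # placeholder

payload = {
    "model": "gpt-4o",  # switch providers by changing just this string
    "messages": [{"role": "user", "content": "Hello from Prism"}],
}

req = urllib.request.Request(
    "https://gateway.futureagi.com/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={
        "Authorization": f"Bearer {PRISM_API_KEY}",
        "Content-Type": "application/json",
    },
    method="POST",
)
# To actually send it: body = urllib.request.urlopen(req).read()
```

Because the request body is plain OpenAI format, the same payload works whether Prism routes it to OpenAI, Anthropic, or a self-hosted model.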
## Get started
- **Quickstart**: Make your first LLM request through Prism in under 5 minutes
- **Core Concepts**: Understand the building blocks (gateways, virtual keys, organizations, and providers)
## What Prism does
- Route requests across 100+ providers — OpenAI, Anthropic, Google Gemini, AWS Bedrock, Azure, Mistral, Groq, and more. Switch providers by changing the model name. No client code changes.
- Add safety guardrails — 18+ built-in checks including PII detection, prompt injection prevention, content moderation, and secret detection. Enforce, monitor, or log.
- Balance load and fail over — Weighted, latency-based, and cost-optimized routing with automatic failover, retries, and circuit breaking.
- Cache responses — Exact match and semantic caching at the gateway level. Repeated queries return instantly without calling the provider.
- Track costs and set budgets — Per-request cost in every response header. Budget limits block requests when exceeded. Cost attribution by team, feature, or user.
- Stream in real time — Full streaming pass-through. Responses stream token-by-token using the standard format, across all providers.
- Observe everything — Request logs, latency metrics, error tracking, and a full analytics dashboard.
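Prism performs routing and failover inside the gateway, but the weighted load-balancing idea can be pictured with a small client-side sketch. The provider names, weights, and failure simulation below are made up purely for illustration:

```python
import random

# Conceptual sketch of weighted routing with failover.
# Prism does this server-side; nothing here is its actual implementation.
PROVIDERS = [("openai", 0.6), ("anthropic", 0.3), ("groq", 0.1)]

def pick_provider(rng: random.Random) -> str:
    """Weighted random choice, as in weighted load balancing."""
    names, weights = zip(*PROVIDERS)
    return rng.choices(names, weights=weights, k=1)[0]

def call_with_failover(rng: random.Random, attempt):
    """Try the weighted pick first, then fail over through the rest."""
    first = pick_provider(rng)
    order = [first] + [n for n, _ in PROVIDERS if n != first]
    for name in order:
        try:
            return attempt(name)
        except ConnectionError:
            continue  # move on to the next provider
    raise RuntimeError("all providers failed")

def flaky(name: str) -> str:
    """Simulated backend where one provider is down."""
    if name == "openai":
        raise ConnectionError("simulated outage")
    return f"served by {name}"

result = call_with_failover(random.Random(0), flaky)
```

A real gateway layers retries, latency measurement, and circuit breaking on top of this basic pick-then-fall-through loop.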
## Explore features
- **Manage Providers**: Connect and configure LLM providers
- **Set Up Guardrails**: Add safety policies and content moderation
- **Configure Routing**: Set up load balancing and failover
- **Enable Caching**: Reduce costs and latency with response caching
- **Cost Tracking**: Monitor spend and set budget limits
- **Streaming**: Stream responses in real time
## Supported providers
Prism connects to cloud providers, API services, and self-hosted models. For providers with different native APIs (Anthropic, Gemini, Bedrock), requests and responses are automatically translated between the standard OpenAI format and each provider's native format, so your code stays the same regardless of which provider handles the request.
| Provider | Type |
|---|---|
| OpenAI | Cloud API |
| Anthropic | Cloud API |
| Google Gemini | Cloud API |
| AWS Bedrock | Cloud API |
| Azure OpenAI | Cloud API |
| Cohere | Cloud API |
| Groq, Together AI, Fireworks | Cloud API |
| Mistral AI, DeepInfra, Perplexity | Cloud API |
| xAI, OpenRouter | Cloud API |
| Ollama, vLLM, LM Studio | Self-hosted |
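The translation described above can be pictured with a simplified sketch. Anthropic's native Messages API, for instance, takes the system prompt as a separate top-level `system` field rather than as a message in the list, so a gateway has to restructure OpenAI-format input. This is a conceptual illustration only, not Prism's actual translation layer:

```python
# Simplified sketch of OpenAI -> Anthropic request translation.
# Conceptual only; a real translation layer also handles tools,
# images, streaming chunks, stop reasons, and so on.

def openai_to_anthropic(openai_req: dict) -> dict:
    """Lift system messages into Anthropic's top-level `system` field."""
    system_parts = [m["content"] for m in openai_req["messages"] if m["role"] == "system"]
    chat = [m for m in openai_req["messages"] if m["role"] != "system"]
    out = {
        "model": openai_req["model"],
        "max_tokens": openai_req.get("max_tokens", 1024),  # required by Anthropic
        "messages": chat,
    }
    if system_parts:
        out["system"] = "\n".join(system_parts)
    return out

translated = openai_to_anthropic({
    "model": "claude-3-5-sonnet",
    "messages": [
        {"role": "system", "content": "Be terse."},
        {"role": "user", "content": "Hi"},
    ],
})
```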
For the full list of endpoints, request headers, and response metadata headers, see Core Concepts.