# Self-Hosted Deployment
Deploy Prism AI Gateway on your own infrastructure using Docker or a Go binary.
## About
Prism is distributed as a Go binary and Docker image. Self-hosting gives you full control over data residency, network topology, and configuration. All requests stay within your infrastructure.
Whether you’re running a single instance for development or scaling to production, Prism handles routing, failover, caching, and rate limiting across multiple LLM providers.
## Requirements
- Docker (for container deployment) or Go 1.23+ (to build from source)
- A publicly routable endpoint (if self-hosted LLM providers need to connect back to Prism)
- Provider API keys for any cloud LLM providers you want to use
- At least 256MB of available memory
## Quick start with Docker

### Create a configuration file

Save this as `config.yaml`:

```yaml
server:
  port: 8080

providers:
  openai:
    api_key: "${OPENAI_API_KEY}"
    api_format: "openai"
    models:
      - gpt-4o
      - gpt-4o-mini

auth:
  enabled: true
  keys:
    - name: "my-key"
      key: "sk-prism-my-key-here"

logging:
  level: info
```

### Set your API key

```bash
export OPENAI_API_KEY="sk-..."
```

### Run the container

```bash
docker run -d \
  -p 8080:8080 \
  -v $(pwd)/config.yaml:/app/config.yaml \
  -e OPENAI_API_KEY="$OPENAI_API_KEY" \
  --name prism-gateway \
  futureagi/prism-gateway:latest
```

### Verify it's running

```bash
curl http://localhost:8080/healthz
```

Expected response: `{"status":"ok"}`
**Note:** Replace `config.yaml` with your actual configuration file. Environment variables referenced in the config (like `${OPENAI_API_KEY}`) are resolved at runtime.
## Configuration file

### Basic configuration

Here's a minimal config for getting started with OpenAI:

```yaml
server:
  port: 8080
  host: "0.0.0.0"

providers:
  openai:
    api_key: "${OPENAI_API_KEY}"
    api_format: "openai"
    models:
      - gpt-4o
      - gpt-4o-mini

auth:
  enabled: true
  keys:
    - name: "my-key"
      key: "sk-prism-my-key-here"

logging:
  level: info
```
### Adding multiple providers

Combine OpenAI, Anthropic, and a self-hosted Ollama instance:

```yaml
server:
  port: 8080

providers:
  openai:
    api_key: "${OPENAI_API_KEY}"
    api_format: "openai"
    models:
      - gpt-4o
      - gpt-4o-mini
  anthropic:
    api_key: "${ANTHROPIC_API_KEY}"
    api_format: "anthropic"
    models:
      - claude-sonnet-4-20250514
  ollama:
    base_url: "http://localhost:11434"
    api_format: "openai"
    type: "ollama"

auth:
  enabled: true
  keys:
    - name: "my-key"
      key: "sk-prism-my-key-here"

logging:
  level: info
```

**Tip:** For Ollama, models are auto-discovered from the `/v1/models` endpoint. You don't need to list them explicitly.
### Enabling routing and failover

Add intelligent routing across multiple providers:

```yaml
routing:
  default_strategy: "round-robin"
  failover:
    enabled: true
    max_attempts: 3
    on_status_codes: [429, 500, 502, 503, 504]
    on_timeout: true
  circuit_breaker:
    enabled: true
    failure_threshold: 5
    success_threshold: 2
    cooldown: 30s
  retry:
    enabled: true
    max_retries: 2
    initial_delay: 500ms
    max_delay: 10s
    multiplier: 2.0
```
This configuration:
- Routes requests round-robin across providers
- Fails over to the next provider on 429 responses, 5xx errors, or timeouts
- Opens the circuit breaker after 5 consecutive failures
- Retries automatically with exponential backoff
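To make the backoff behavior concrete, the retry delays implied by the settings above (`initial_delay: 500ms`, `multiplier: 2.0`, `max_delay: 10s`) can be computed as below. This is an illustrative sketch of exponential backoff, not Prism's actual implementation:

```python
def backoff_schedule(initial_delay: float, multiplier: float,
                     max_delay: float, max_retries: int) -> list[float]:
    """Delay (in seconds) to wait before each retry, capped at max_delay."""
    delays = []
    delay = initial_delay
    for _ in range(max_retries):
        delays.append(min(delay, max_delay))
        delay *= multiplier
    return delays

# With the config above (max_retries: 2):
print(backoff_schedule(0.5, 2.0, 10.0, 2))  # [0.5, 1.0]
```

With more retries, the delays keep doubling until the `max_delay` cap takes over.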
### Enabling caching

Cache responses to reduce latency and API costs:

```yaml
cache:
  enabled: true
  default_ttl: 5m
  max_entries: 10000
```
**Warning:** Caching is based on request content. Ensure your use case is compatible with cached responses (e.g., deterministic queries, not real-time data).
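To see why only byte-for-byte-equivalent requests share a cache entry, a content-based cache key can be modeled as a hash of the canonicalized request body. This is purely illustrative; Prism's exact key derivation is not specified here:

```python
import hashlib
import json

def cache_key(request_body: dict) -> str:
    """Hash the canonical JSON form of the request so that
    semantically identical requests map to the same cache entry."""
    canonical = json.dumps(request_body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()

a = cache_key({"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello!"}]})
b = cache_key({"messages": [{"role": "user", "content": "Hello!"}], "model": "gpt-4o"})
print(a == b)  # True: key order is normalized away
```

Any change to the model, messages, or other parameters produces a different key, and therefore a cache miss.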
### Rate limiting

Control request volume:

```yaml
rate_limiting:
  enabled: true
  global_rpm: 1000
```

Set `global_rpm: 0` for unlimited requests.
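A requests-per-minute limit like `global_rpm` can be modeled as a sliding-window counter. The class below is a hedged sketch of the semantics (including `rpm: 0` meaning unlimited), not Prism's internal limiter:

```python
from collections import deque

class RpmLimiter:
    """Allow at most `rpm` requests in any 60-second window; rpm=0 means unlimited."""
    def __init__(self, rpm: int):
        self.rpm = rpm
        self.timestamps: deque[float] = deque()

    def allow(self, now: float) -> bool:
        if self.rpm == 0:
            return True
        # Drop requests that fell out of the 60-second window.
        while self.timestamps and now - self.timestamps[0] >= 60:
            self.timestamps.popleft()
        if len(self.timestamps) < self.rpm:
            self.timestamps.append(now)
            return True
        return False

limiter = RpmLimiter(rpm=2)
print([limiter.allow(t) for t in (0, 1, 2, 61)])  # [True, True, False, True]
```

The third request is rejected because two requests already landed in the current window; by t=61 both have expired.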
### Authentication

Restrict access with API keys:

```yaml
auth:
  enabled: true
  keys:
    - name: "dev-key"
      key: "sk-prism-dev-key-for-testing"
      owner: "dev-team"
      models:
        - gpt-4o
        - gpt-4o-mini
    - name: "prod-key"
      key: "sk-prism-prod-key-here"
      owner: "production"
```

The `models` field is optional. If omitted, the key can access all models.
## Server configuration reference

| Setting | Default | Description |
|---|---|---|
| `server.port` | `8080` | Port to listen on |
| `server.host` | `0.0.0.0` | Host to bind to |
| `server.read_timeout` | `5s` | Request read timeout |
| `server.write_timeout` | `300s` | Response write timeout |
| `server.idle_timeout` | `120s` | Idle connection timeout |
| `server.shutdown_timeout` | `30s` | Graceful shutdown timeout |
| `server.max_request_body_size` | `10485760` | Max request body size in bytes (10 MB) |
| `server.default_request_timeout` | `60s` | Default timeout for provider requests |
## Provider configuration reference

Each provider in the `providers:` section supports:

| Setting | Required | Description |
|---|---|---|
| `api_key` | Yes (cloud providers; not needed for local providers like `ollama`) | API key (can use `${ENV_VAR}` syntax) |
| `api_format` | Yes | Format: `openai`, `anthropic`, `gemini`, `bedrock`, `cohere`, `azure` |
| `base_url` | No | Custom endpoint (auto-filled for known providers) |
| `type` | No | Provider shorthand: `groq`, `mistral`, `ollama`, `vllm`, etc. |
| `models` | No | List of available models (auto-discovered for some providers) |
| `default_timeout` | No | Request timeout for this provider |
| `max_concurrent` | No | Max concurrent requests |
| `conn_pool_size` | No | Connection pool size |
## Health checks

Verify the gateway is running and ready:

```bash
curl http://localhost:8080/healthz
curl http://localhost:8080/readyz
```

Both endpoints return `{"status":"ok"}` when healthy.
## Connecting your application

Once running, point your application to the self-hosted gateway:

```python
from prism import Prism

client = Prism(
    api_key="sk-prism-my-key-here",
    base_url="http://localhost:8080",
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```

```typescript
import { Prism } from "@futureagi/prism";

const client = new Prism({
  apiKey: "sk-prism-my-key-here",
  baseUrl: "http://localhost:8080",
});

const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Hello!" }],
});
console.log(response.choices[0].message.content);
```

```bash
curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer sk-prism-my-key-here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```

**Tip:** For production, use a public endpoint (e.g., behind a reverse proxy with TLS). Replace `http://localhost:8080` with your actual gateway URL.
## Building from source

If you prefer to build the binary yourself:

```bash
git clone https://github.com/futureagi/core-backend.git
cd core-backend/prism-gateway
go build -o prism-gateway ./cmd/prism
./prism-gateway --config config.yaml
```
## Environment variables

All values in `config.yaml` that use `${VAR_NAME}` syntax are resolved from environment variables at startup. For example:

```yaml
providers:
  openai:
    api_key: "${OPENAI_API_KEY}"
```

Set the variable before running:

```bash
export OPENAI_API_KEY="sk-..."
docker run -e OPENAI_API_KEY="$OPENAI_API_KEY" ...
```
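The substitution step can be modeled as a simple `${VAR}` expansion over the config text. The sketch below only illustrates the behavior; it is not the gateway's actual config loader, and how Prism treats unset variables is an assumption here:

```python
import os
import re

def resolve_env(text: str) -> str:
    """Replace each ${VAR} with the value of environment variable VAR.
    Unset variables are left untouched in this sketch (a real loader
    might raise an error instead)."""
    return re.sub(
        r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}",
        lambda m: os.environ.get(m.group(1), m.group(0)),
        text,
    )

os.environ["OPENAI_API_KEY"] = "sk-test"
print(resolve_env('api_key: "${OPENAI_API_KEY}"'))  # api_key: "sk-test"
```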
## Logging

Control verbosity with the `logging.level` setting:

```yaml
logging:
  level: debug  # debug, info, warn, error
```

View logs from the container:

```bash
docker logs -f prism-gateway
```