Self-Hosted Deployment

Deploy Prism AI Gateway on your own infrastructure using Docker or a Go binary.

About

Prism is distributed as a Go binary and Docker image. Self-hosting gives you full control over data residency, network topology, and configuration. All requests stay within your infrastructure.

Whether you’re running a single instance for development or scaling to production, Prism handles routing, failover, caching, and rate limiting across multiple LLM providers.

Requirements

  • Docker (for container deployment) or Go 1.23+ (to build from source)
  • A publicly routable endpoint (if clients outside your network need to reach Prism)
  • Provider API keys for any cloud LLM providers you want to use
  • At least 256MB of available memory

Quick start with Docker

Create a configuration file

Save this as config.yaml:

server:
  port: 8080

providers:
  openai:
    api_key: "${OPENAI_API_KEY}"
    api_format: "openai"
    models:
      - gpt-4o
      - gpt-4o-mini

auth:
  enabled: true
  keys:
    - name: "my-key"
      key: "sk-prism-my-key-here"

logging:
  level: info

Set your API key

export OPENAI_API_KEY="sk-..."

Run the container

docker run -d \
  -p 8080:8080 \
  -v $(pwd)/config.yaml:/app/config.yaml \
  -e OPENAI_API_KEY="$OPENAI_API_KEY" \
  --name prism-gateway \
  futureagi/prism-gateway:latest

Verify it's running

curl http://localhost:8080/healthz

Expected response: {"status":"ok"}

Note

Replace config.yaml with your actual configuration file. Environment variables referenced in the config (like ${OPENAI_API_KEY}) are resolved at runtime.
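If you prefer Docker Compose, an equivalent setup might look like the following sketch (same image, port, volume mount, and environment variable as the docker run command above; adjust paths to your environment):

```yaml
services:
  prism-gateway:
    image: futureagi/prism-gateway:latest
    ports:
      - "8080:8080"
    volumes:
      # Mount your configuration file into the container
      - ./config.yaml:/app/config.yaml
    environment:
      # Forwarded from the host so ${OPENAI_API_KEY} resolves at runtime
      - OPENAI_API_KEY=${OPENAI_API_KEY}
    restart: unless-stopped
```

Start it with docker compose up -d and check logs with docker compose logs -f prism-gateway.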

Configuration file

Basic configuration

Here’s a minimal config for getting started with OpenAI:

server:
  port: 8080
  host: "0.0.0.0"

providers:
  openai:
    api_key: "${OPENAI_API_KEY}"
    api_format: "openai"
    models:
      - gpt-4o
      - gpt-4o-mini

auth:
  enabled: true
  keys:
    - name: "my-key"
      key: "sk-prism-my-key-here"

logging:
  level: info

Adding multiple providers

Combine OpenAI, Anthropic, and a self-hosted Ollama instance:

server:
  port: 8080

providers:
  openai:
    api_key: "${OPENAI_API_KEY}"
    api_format: "openai"
    models:
      - gpt-4o
      - gpt-4o-mini

  anthropic:
    api_key: "${ANTHROPIC_API_KEY}"
    api_format: "anthropic"
    models:
      - claude-sonnet-4-20250514

  ollama:
    base_url: "http://localhost:11434"
    api_format: "openai"
    type: "ollama"

auth:
  enabled: true
  keys:
    - name: "my-key"
      key: "sk-prism-my-key-here"

logging:
  level: info

Tip

For Ollama, models are auto-discovered from the /v1/models endpoint. You don’t need to list them explicitly.

Enabling routing and failover

Add intelligent routing across multiple providers:

routing:
  default_strategy: "round-robin"
  failover:
    enabled: true
    max_attempts: 3
    on_status_codes: [429, 500, 502, 503, 504]
    on_timeout: true
  circuit_breaker:
    enabled: true
    failure_threshold: 5
    success_threshold: 2
    cooldown: 30s
  retry:
    enabled: true
    max_retries: 2
    initial_delay: 500ms
    max_delay: 10s
    multiplier: 2.0

This configuration:

  • Routes requests round-robin across providers
  • Fails over to the next provider on 429, 5xx errors, or timeouts
  • Opens the circuit breaker after 5 consecutive failures, closing it again after 2 successes once the 30s cooldown elapses
  • Automatically retries with exponential backoff
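As a sketch of how the retry settings above compose (illustrative only, not Prism's internal code): the wait before each retry starts at initial_delay, is multiplied by multiplier after every attempt, and is capped at max_delay.

```python
def backoff_delays(max_retries: int, initial_delay: float,
                   max_delay: float, multiplier: float) -> list[float]:
    """Return the wait (in seconds) before each retry attempt."""
    delays = []
    delay = initial_delay
    for _ in range(max_retries):
        delays.append(min(delay, max_delay))  # never exceed max_delay
        delay *= multiplier
    return delays

# With the config above (max_retries: 2, initial_delay: 500ms, multiplier: 2.0):
print(backoff_delays(2, 0.5, 10.0, 2.0))  # [0.5, 1.0]
```

With only two retries the 10s cap is never reached; it matters for longer retry budgets, where the schedule plateaus at max_delay.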

Enabling caching

Cache responses to reduce latency and API costs:

cache:
  enabled: true
  default_ttl: 5m
  max_entries: 10000

Warning

Caching is based on request content. Ensure your use case is compatible with cached responses (e.g., deterministic queries, not real-time data).
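To illustrate the idea (a sketch of content-based caching in general, not the gateway's actual implementation): the cache key is derived from the canonicalized request body, so any two requests with identical content hit the same cache entry until the TTL expires.

```python
import hashlib
import json

def cache_key(request_body: dict) -> str:
    # Canonicalize so key order and whitespace don't produce distinct keys.
    canonical = json.dumps(request_body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

a = cache_key({"model": "gpt-4o", "messages": [{"role": "user", "content": "Hi"}]})
b = cache_key({"messages": [{"role": "user", "content": "Hi"}], "model": "gpt-4o"})
print(a == b)  # True: same content, same cache entry
```

This is why non-deterministic or time-sensitive queries are a poor fit: a stale answer is indistinguishable from a fresh one at the key level.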

Rate limiting

Control request volume:

rate_limiting:
  enabled: true
  global_rpm: 1000

Set global_rpm: 0 for unlimited requests.

Authentication

Restrict access with API keys:

auth:
  enabled: true
  keys:
    - name: "dev-key"
      key: "sk-prism-dev-key-for-testing"
      owner: "dev-team"
      models:
        - gpt-4o
        - gpt-4o-mini

    - name: "prod-key"
      key: "sk-prism-prod-key-here"
      owner: "production"

The models field is optional. If omitted, the key can access all models.
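That rule can be sketched as follows (a hypothetical helper mirroring the config semantics above, not Prism's own code):

```python
def key_allows_model(key_config: dict, model: str) -> bool:
    # A key with no "models" list may access every model.
    allowed = key_config.get("models")
    return allowed is None or model in allowed

dev_key = {"name": "dev-key", "models": ["gpt-4o", "gpt-4o-mini"]}
prod_key = {"name": "prod-key"}  # no models field: unrestricted

print(key_allows_model(dev_key, "claude-sonnet-4-20250514"))   # False
print(key_allows_model(prod_key, "claude-sonnet-4-20250514"))  # True
```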

Server configuration reference

Setting                          Default    Description
server.port                      8080       Port to listen on
server.host                      0.0.0.0    Host to bind to
server.read_timeout              5s         Request read timeout
server.write_timeout             300s       Response write timeout
server.idle_timeout              120s       Idle connection timeout
server.shutdown_timeout          30s        Graceful shutdown timeout
server.max_request_body_size     10485760   Max request body in bytes (10MB)
server.default_request_timeout   60s        Default timeout for provider requests

Provider configuration reference

Each provider in the providers: section supports:

Setting           Required   Description
api_key           Yes        API key (can use ${ENV_VAR} syntax)
api_format        Yes        Format: openai, anthropic, gemini, bedrock, cohere, azure
base_url          No         Custom endpoint (auto-filled for known providers)
type              No         Provider shorthand: groq, mistral, ollama, vllm, etc.
models            No         List of available models (auto-discovered for some providers)
default_timeout   No         Request timeout for this provider
max_concurrent    No         Max concurrent requests
conn_pool_size    No         Connection pool size
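For example, a provider entry combining the type shorthand with the optional tuning fields might look like this (provider name and tuning values are illustrative, not recommendations):

```yaml
providers:
  groq:
    api_key: "${GROQ_API_KEY}"
    api_format: "openai"
    type: "groq"           # base_url auto-filled for known providers
    default_timeout: 30s   # per-provider request timeout
    max_concurrent: 32     # cap on in-flight requests to this provider
    conn_pool_size: 16
```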

Health checks

Verify the gateway is running and ready:

curl http://localhost:8080/healthz
curl http://localhost:8080/readyz

Both endpoints return {"status":"ok"} when healthy.

Connecting your application

Once running, point your application to the self-hosted gateway:

Python

from prism import Prism

client = Prism(
    api_key="sk-prism-my-key-here",
    base_url="http://localhost:8080",
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)

TypeScript

import { Prism } from "@futureagi/prism";

const client = new Prism({
  apiKey: "sk-prism-my-key-here",
  baseUrl: "http://localhost:8080",
});

const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Hello!" }],
});

console.log(response.choices[0].message.content);

cURL

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer sk-prism-my-key-here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Tip

For production, use a public endpoint (e.g., behind a reverse proxy with TLS). Replace http://localhost:8080 with your actual gateway URL.

Building from source

If you prefer to build the binary yourself:

git clone https://github.com/futureagi/core-backend.git
cd core-backend/prism-gateway
go build -o prism-gateway ./cmd/prism
./prism-gateway --config config.yaml

Environment variables

All values in config.yaml that use ${VAR_NAME} syntax are resolved from environment variables at startup. For example:

providers:
  openai:
    api_key: "${OPENAI_API_KEY}"

Set the variable before running:

export OPENAI_API_KEY="sk-..."
docker run -e OPENAI_API_KEY="$OPENAI_API_KEY" ...
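Resolution behaves roughly like the following sketch (illustrative, not Prism's actual code): each ${VAR_NAME} reference is replaced with the value of that environment variable at startup.

```python
import os
import re

def resolve_env_refs(text: str) -> str:
    # Replace each ${VAR_NAME} with the value of that environment variable
    # (empty string if unset, in this sketch).
    return re.sub(r"\$\{(\w+)\}", lambda m: os.environ.get(m.group(1), ""), text)

os.environ["OPENAI_API_KEY"] = "sk-example"
print(resolve_env_refs('api_key: "${OPENAI_API_KEY}"'))  # api_key: "sk-example"
```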

Logging

Control verbosity with the logging.level setting:

logging:
  level: debug  # debug, info, warn, error

View logs from the container:

docker logs -f prism-gateway
