Self-Hosted Deployment

Deploy Prism AI Gateway on your own infrastructure using Docker or a Go binary.

About

Prism is distributed as a Go binary and Docker image. Self-hosting gives you full control over data residency, network topology, and configuration. All requests stay within your infrastructure.

Whether you’re running a single instance for development or scaling to production, Prism handles routing, failover, caching, and rate limiting across multiple LLM providers.

Requirements

  • Docker (for container deployment) or Go 1.23+ (to build from source)
  • A publicly routable endpoint (if clients outside your network need to reach Prism)
  • Provider API keys for any cloud LLM providers you want to use
  • At least 256MB of available memory

Quick start with Docker

Create a configuration file

Save this as config.yaml:

server:
  port: 8080

providers:
  openai:
    api_key: "${OPENAI_API_KEY}"
    api_format: "openai"
    models:
      - gpt-4o
      - gpt-4o-mini

auth:
  enabled: true
  keys:
    - name: "my-key"
      key: "sk-prism-my-key-here"

logging:
  level: info

Set your API key

export OPENAI_API_KEY="sk-..."

Run the container

docker run -d \
  -p 8080:8080 \
  -v $(pwd)/config.yaml:/app/config.yaml \
  -e OPENAI_API_KEY="$OPENAI_API_KEY" \
  --name prism-gateway \
  futureagi/prism-gateway:latest

Verify it's running

curl http://localhost:8080/healthz

Expected response: {"status":"ok"}

Note

Replace config.yaml with your actual configuration file. Environment variables referenced in the config (like ${OPENAI_API_KEY}) are resolved at runtime.
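If you prefer Docker Compose, an equivalent setup might look like the following sketch (same image, port, volume mount, and environment variable as the docker run command above; adjust paths to your environment):

```yaml
services:
  prism-gateway:
    image: futureagi/prism-gateway:latest
    ports:
      - "8080:8080"
    volumes:
      # Mount your configuration file into the container
      - ./config.yaml:/app/config.yaml
    environment:
      # Forwarded from the host so ${OPENAI_API_KEY} resolves at runtime
      - OPENAI_API_KEY=${OPENAI_API_KEY}
    restart: unless-stopped
```

Start it with docker compose up -d and check logs with docker compose logs -f prism-gateway.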

Configuration file

Basic configuration

Here’s a minimal config for getting started with OpenAI:

server:
  port: 8080
  host: "0.0.0.0"

providers:
  openai:
    api_key: "${OPENAI_API_KEY}"
    api_format: "openai"
    models:
      - gpt-4o
      - gpt-4o-mini

auth:
  enabled: true
  keys:
    - name: "my-key"
      key: "sk-prism-my-key-here"

logging:
  level: info

Adding multiple providers

Combine OpenAI, Anthropic, and a self-hosted Ollama instance:

server:
  port: 8080

providers:
  openai:
    api_key: "${OPENAI_API_KEY}"
    api_format: "openai"
    models:
      - gpt-4o
      - gpt-4o-mini

  anthropic:
    api_key: "${ANTHROPIC_API_KEY}"
    api_format: "anthropic"
    models:
      - claude-sonnet-4-20250514

  ollama:
    base_url: "http://localhost:11434"
    api_format: "openai"
    type: "ollama"

auth:
  enabled: true
  keys:
    - name: "my-key"
      key: "sk-prism-my-key-here"

logging:
  level: info

Tip

For Ollama, models are auto-discovered from the /v1/models endpoint. You don’t need to list them explicitly.

Enabling routing and failover

Add intelligent routing across multiple providers:

routing:
  default_strategy: "round-robin"
  failover:
    enabled: true
    max_attempts: 3
    on_status_codes: [429, 500, 502, 503, 504]
    on_timeout: true
  circuit_breaker:
    enabled: true
    failure_threshold: 5
    success_threshold: 2
    cooldown: 30s
  retry:
    enabled: true
    max_retries: 2
    initial_delay: 500ms
    max_delay: 10s
    multiplier: 2.0

This configuration:

  • Routes requests round-robin across providers
  • Fails over to the next provider on 429, 5xx errors, or timeouts
  • Opens the circuit breaker after 5 consecutive failures, closing it again after 2 successes once the 30s cooldown elapses
  • Automatically retries with exponential backoff
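As a sketch of how the retry settings above compose (illustrative only, not Prism's internal code): the wait before each retry starts at initial_delay, is multiplied by multiplier after every attempt, and is capped at max_delay.

```python
def backoff_delays(max_retries: int, initial_delay: float,
                   max_delay: float, multiplier: float) -> list[float]:
    """Return the wait (in seconds) before each retry attempt."""
    delays = []
    delay = initial_delay
    for _ in range(max_retries):
        delays.append(min(delay, max_delay))  # never exceed max_delay
        delay *= multiplier
    return delays

# With the config above (max_retries: 2, initial_delay: 500ms, multiplier: 2.0):
print(backoff_delays(2, 0.5, 10.0, 2.0))  # [0.5, 1.0]
```

With only two retries the 10s cap is never reached; it matters for longer retry budgets, where the schedule plateaus at max_delay.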

Enabling caching

Cache responses to reduce latency and API costs:

cache:
  enabled: true
  default_ttl: 5m
  max_entries: 10000

Warning

Caching is based on request content. Ensure your use case is compatible with cached responses (e.g., deterministic queries, not real-time data).
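To illustrate the idea (a sketch of content-based caching in general, not the gateway's actual implementation): the cache key is derived from the canonicalized request body, so any two requests with identical content hit the same cache entry until the TTL expires.

```python
import hashlib
import json

def cache_key(request_body: dict) -> str:
    # Canonicalize so key order and whitespace don't produce distinct keys.
    canonical = json.dumps(request_body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

a = cache_key({"model": "gpt-4o", "messages": [{"role": "user", "content": "Hi"}]})
b = cache_key({"messages": [{"role": "user", "content": "Hi"}], "model": "gpt-4o"})
print(a == b)  # True: same content, same cache entry
```

This is why non-deterministic or time-sensitive queries are a poor fit: a stale answer is indistinguishable from a fresh one at the key level.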

Rate limiting

Control request volume:

rate_limiting:
  enabled: true
  global_rpm: 1000

Set global_rpm: 0 for unlimited requests.

Authentication

Restrict access with API keys:

auth:
  enabled: true
  keys:
    - name: "dev-key"
      key: "sk-prism-dev-key-for-testing"
      owner: "dev-team"
      models:
        - gpt-4o
        - gpt-4o-mini

    - name: "prod-key"
      key: "sk-prism-prod-key-here"
      owner: "production"

The models field is optional. If omitted, the key can access all models.
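That rule can be sketched as follows (a hypothetical helper mirroring the config semantics above, not Prism's own code):

```python
def key_allows_model(key_config: dict, model: str) -> bool:
    # A key with no "models" list may access every model.
    allowed = key_config.get("models")
    return allowed is None or model in allowed

dev_key = {"name": "dev-key", "models": ["gpt-4o", "gpt-4o-mini"]}
prod_key = {"name": "prod-key"}  # no models field: unrestricted

print(key_allows_model(dev_key, "claude-sonnet-4-20250514"))   # False
print(key_allows_model(prod_key, "claude-sonnet-4-20250514"))  # True
```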

Server configuration reference

Setting                          Default    Description
server.port                      8080       Port to listen on
server.host                      0.0.0.0    Host to bind to
server.read_timeout              5s         Request read timeout
server.write_timeout             300s       Response write timeout
server.idle_timeout              120s       Idle connection timeout
server.shutdown_timeout          30s        Graceful shutdown timeout
server.max_request_body_size     10485760   Max request body in bytes (10MB)
server.default_request_timeout   60s        Default timeout for provider requests

Provider configuration reference

Each provider in the providers: section supports:

Setting           Required   Description
api_key           Yes        API key (can use ${ENV_VAR} syntax)
api_format        Yes        Format: openai, anthropic, gemini, bedrock, cohere, azure
base_url          No         Custom endpoint (auto-filled for known providers)
type              No         Provider shorthand: groq, mistral, ollama, vllm, etc.
models            No         List of available models (auto-discovered for some providers)
default_timeout   No         Request timeout for this provider
max_concurrent    No         Max concurrent requests
conn_pool_size    No         Connection pool size
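For example, a provider entry combining the type shorthand with the optional tuning fields might look like this (provider name and tuning values are illustrative, not recommendations):

```yaml
providers:
  groq:
    api_key: "${GROQ_API_KEY}"
    api_format: "openai"
    type: "groq"           # base_url auto-filled for known providers
    default_timeout: 30s   # per-provider request timeout
    max_concurrent: 32     # cap on in-flight requests to this provider
    conn_pool_size: 16
```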

Health checks

Verify the gateway is running and ready:

curl http://localhost:8080/healthz
curl http://localhost:8080/readyz

Both endpoints return {"status":"ok"} when healthy.

Connecting your application

Once running, point your application to the self-hosted gateway:

Python

from prism import Prism

client = Prism(
    api_key="sk-prism-my-key-here",
    base_url="http://localhost:8080",
)

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)

TypeScript

import { Prism } from "@futureagi/prism";

const client = new Prism({
  apiKey: "sk-prism-my-key-here",
  baseUrl: "http://localhost:8080",
});

const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Hello!" }],
});

console.log(response.choices[0].message.content);

cURL

curl -X POST http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer sk-prism-my-key-here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'

Tip

For production, use a public endpoint (e.g., behind a reverse proxy with TLS). Replace http://localhost:8080 with your actual gateway URL.

Building from source

If you prefer to build the binary yourself:

git clone https://github.com/futureagi/core-backend.git
cd core-backend/prism-gateway
go build -o prism-gateway ./cmd/prism
./prism-gateway --config config.yaml

Environment variables

All values in config.yaml that use ${VAR_NAME} syntax are resolved from environment variables at startup. For example:

providers:
  openai:
    api_key: "${OPENAI_API_KEY}"

Set the variable before running:

export OPENAI_API_KEY="sk-..."
docker run -e OPENAI_API_KEY="$OPENAI_API_KEY" ...
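Resolution behaves roughly like the following sketch (illustrative, not Prism's actual code): each ${VAR_NAME} reference is replaced with the value of that environment variable at startup.

```python
import os
import re

def resolve_env_refs(text: str) -> str:
    # Replace each ${VAR_NAME} with the value of that environment variable
    # (empty string if unset, in this sketch).
    return re.sub(r"\$\{(\w+)\}", lambda m: os.environ.get(m.group(1), ""), text)

os.environ["OPENAI_API_KEY"] = "sk-example"
print(resolve_env_refs('api_key: "${OPENAI_API_KEY}"'))  # api_key: "sk-example"
```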

Logging

Control verbosity with the logging.level setting:

logging:
  level: debug  # debug, info, warn, error

View logs from the container:

docker logs -f prism-gateway
