Configuration

How organization configuration works in Prism: sections, hierarchy, and real-time updates.

About

Prism is configured at the organization level. Each organization has its own set of providers, guardrails, routing rules, rate limits, and budgets. Configuration changes are pushed to the gateway in real time with no restart required.


Configuration Hierarchy

When a setting is specified in multiple places, Prism applies the most specific one:

Request Headers > API Key Config > Organization Config > Global Config

For example, a cache TTL set via the x-prism-cache-ttl request header overrides the TTL set in the organization config.
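The precedence rule can be sketched as a simple resolver (a hypothetical helper for illustration only, not part of the Prism SDK): scan the layers from most to least specific and take the first value that is set.

```python
# Illustrative sketch of Prism's precedence rule. The function and its
# parameter names are hypothetical; they only model the lookup order.
def resolve(header_value=None, api_key_value=None, org_value=None, global_value=None):
    """Return the most specific value that is set, or None if none are."""
    for value in (header_value, api_key_value, org_value, global_value):
        if value is not None:
            return value
    return None

# A TTL sent via the x-prism-cache-ttl header wins over the org config:
ttl = resolve(header_value=60, org_value=3600)
# ttl == 60
```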


Configuration Sections

Section          What it controls
providers        Which LLM services are available and their credentials
guardrails       Safety checks applied to requests and responses
routing          How requests are distributed across providers (strategy, failover, retries)
cache            Caching mode, TTL, and namespace settings
rate_limiting    Maximum request rate per API key or organization
budgets          Spending limits per period and alert thresholds
cost_tracking    Cost calculation and attribution settings
ip_acl           IP access control list: which source IP addresses are permitted
alerting         Email or webhook alerts for budget events, errors, and guardrail triggers
privacy          Data retention periods and request logging policies
tool_policy      Which tool and function calls are permitted
mcp              Model Context Protocol integration settings
model_map        Custom model name aliases, mapping a friendly name like "my-gpt" to an actual model
audit            Audit log configuration and retention settings
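For instance, the model_map section aliases a friendly name to a real model, so clients can request "my-gpt" and have the gateway substitute the underlying model. A minimal sketch (the exact field layout is an assumption):

```json
{
  "model_map": {
    "my-gpt": "gpt-4o"
  }
}
```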

Example Configuration

A minimal organization configuration that sets up two providers with weighted routing, caching, and a monthly budget:

{
  "providers": {
    "openai": {
      "api_key": "sk-...",
      "models": ["gpt-4o", "gpt-4o-mini"]
    },
    "anthropic": {
      "api_key": "sk-ant-...",
      "models": ["claude-sonnet-4-6", "claude-haiku-4-5"]
    }
  },
  "routing": {
    "strategy": "weighted",
    "weights": { "openai": 70, "anthropic": 30 },
    "failover": {
      "enabled": true,
      "providers": ["openai", "anthropic"]
    }
  },
  "cache": {
    "enabled": true,
    "mode": "exact",
    "ttl_seconds": 3600
  },
  "budgets": {
    "limit": 500.00,
    "period": "monthly",
    "alert_threshold_percent": 80
  }
}

Note

Changes to organization configuration are pushed to the gateway in real time; no restart or redeployment is needed.


SDK configuration

The Prism SDK lets you apply configuration at two levels: client-level (affects all requests) and per-request (overrides for a single call).

Client-level config

Pass a GatewayConfig to the client constructor. It applies to every request made with that client:

Python:

from prism import Prism, GatewayConfig, CacheConfig, RetryConfig, FallbackConfig, FallbackTarget

client = Prism(
    api_key="sk-prism-your-key",
    base_url="https://gateway.futureagi.com",
    config=GatewayConfig(
        cache=CacheConfig(strategy="exact", ttl=300, namespace="prod"),
        retry=RetryConfig(max_retries=3, on_status_codes=[429, 500, 502, 503]),
        fallback=FallbackConfig(
            targets=[FallbackTarget(model="gpt-4o-mini")],
        ),
    ),
)

# All requests through this client use the cache, retry, and fallback settings
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)
TypeScript:

import { Prism } from "@futureagi/prism";

const client = new Prism({
  apiKey: "sk-prism-your-key",
  baseUrl: "https://gateway.futureagi.com",
  config: {
    cache: { strategy: "exact", ttl: 300, namespace: "prod" },
    retry: { max_retries: 3, on_status_codes: [429, 500, 502, 503] },
    fallback: {
      targets: [{ model: "gpt-4o-mini" }],
    },
  },
});

const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Hello" }],
});

Per-request overrides

Override the configuration for a single request using extra_headers. The GatewayConfig.to_headers() method serialises the config into the x-prism-config header:

from prism import GatewayConfig, CacheConfig

# Force a cache refresh for this specific request
override = GatewayConfig(cache=CacheConfig(force_refresh=True))
headers = override.to_headers()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What time is it?"}],
    extra_headers=headers,
)
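Conceptually, to_headers() carries the override to the gateway as JSON in the x-prism-config header. The sketch below models that idea; the SDK's actual wire format is an assumption here and may differ.

```python
import json

# Hypothetical illustration of what to_headers() might produce: the
# override config serialised as JSON into the x-prism-config header.
override = {"cache": {"force_refresh": True}}
headers = {"x-prism-config": json.dumps(override)}

print(headers["x-prism-config"])  # {"cache": {"force_refresh": true}}
```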

Using with the OpenAI SDK

If you’re not using the Prism SDK, use create_headers() to generate x-prism-* headers for any OpenAI-compatible client:

from openai import OpenAI
from prism import create_headers, GatewayConfig, CacheConfig

headers = create_headers(
    api_key="sk-prism-your-key",
    config=GatewayConfig(cache=CacheConfig(strategy="semantic", ttl=600)),
    trace_id="trace-abc",
    metadata={"team": "ml", "env": "production"},
)

client = OpenAI(
    base_url="https://gateway.futureagi.com/v1",
    default_headers=headers,
)

Override precedence

Per-request headers override client-level config, which overrides org config. See Configuration Hierarchy above.

