Configuration
How organization configuration works in Prism: sections, hierarchy, and real-time updates.
About
Prism is configured at the organization level. Each organization has its own set of providers, guardrails, routing rules, rate limits, and budgets. Configuration changes are pushed to the gateway in real time with no restart required.
Configuration Hierarchy
When a setting is specified in multiple places, Prism applies the most specific one:
Request Headers > API Key Config > Organization Config > Global Config
For example, a cache TTL set via the x-prism-cache-ttl request header overrides the TTL set in the organization config.
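On the wire, a header override might look like this (the endpoint path, TTL value, and auth scheme here are illustrative, not confirmed by this page):

```http
POST /v1/chat/completions HTTP/1.1
Host: gateway.futureagi.com
Authorization: Bearer sk-prism-your-key
Content-Type: application/json
x-prism-cache-ttl: 600
```

Because the header is the most specific scope, this request caches for 600 seconds regardless of the TTL in the organization config.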
Configuration Sections
| Section | What it controls |
|---|---|
| providers | Which LLM services are available and their credentials |
| guardrails | Safety checks applied to requests and responses |
| routing | How requests are distributed across providers (strategy, failover, retries) |
| cache | Caching mode, TTL, and namespace settings |
| rate_limiting | Maximum request rate per API key or organization |
| budgets | Spending limits per period and alert thresholds |
| cost_tracking | Cost calculation and attribution settings |
| ip_acl | IP access control list: which source IP addresses are permitted |
| alerting | Email or webhook alerts for budget events, errors, and guardrail triggers |
| privacy | Data retention periods and request logging policies |
| tool_policy | Which tool and function calls are permitted |
| mcp | Model Context Protocol integration settings |
| model_map | Custom model name aliases. Map a friendly name like "my-gpt" to an actual model |
| audit | Audit log configuration and retention settings |
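Each section is a top-level key in the organization config JSON. As a rough sketch of sections not shown in the example below, a rate_limiting and model_map entry might look like the following (the field names are illustrative assumptions, not a confirmed schema):

```json
{
  "rate_limiting": {
    "requests_per_minute": 600,
    "scope": "api_key"
  },
  "model_map": {
    "my-gpt": "gpt-4o-mini"
  }
}
```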
Example Configuration
A minimal organization configuration that sets up two providers with weighted routing, caching, and a monthly budget:
```json
{
  "providers": {
    "openai": {
      "api_key": "sk-...",
      "models": ["gpt-4o", "gpt-4o-mini"]
    },
    "anthropic": {
      "api_key": "sk-ant-...",
      "models": ["claude-sonnet-4-6", "claude-haiku-4-5"]
    }
  },
  "routing": {
    "strategy": "weighted",
    "weights": { "openai": 70, "anthropic": 30 },
    "failover": {
      "enabled": true,
      "providers": ["openai", "anthropic"]
    }
  },
  "cache": {
    "enabled": true,
    "mode": "exact",
    "ttl_seconds": 3600
  },
  "budgets": {
    "limit": 500.00,
    "period": "monthly",
    "alert_threshold_percent": 80
  }
}
```
Note
Changes to organization configuration are pushed to the gateway in real time. No restart or redeployment needed.
SDK configuration
The Prism SDK lets you apply configuration at two levels: client-level (affects all requests) and per-request (overrides for a single call).
Client-level config
Pass a GatewayConfig to the client constructor. It applies to every request made with that client:
```python
from prism import Prism, GatewayConfig, CacheConfig, RetryConfig, FallbackConfig, FallbackTarget

client = Prism(
    api_key="sk-prism-your-key",
    base_url="https://gateway.futureagi.com",
    config=GatewayConfig(
        cache=CacheConfig(strategy="exact", ttl=300, namespace="prod"),
        retry=RetryConfig(max_retries=3, on_status_codes=[429, 500, 502, 503]),
        fallback=FallbackConfig(
            targets=[FallbackTarget(model="gpt-4o-mini")],
        ),
    ),
)

# All requests through this client use the cache, retry, and fallback settings
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)
```

```typescript
import { Prism } from "@futureagi/prism";

const client = new Prism({
  apiKey: "sk-prism-your-key",
  baseUrl: "https://gateway.futureagi.com",
  config: {
    cache: { strategy: "exact", ttl: 300, namespace: "prod" },
    retry: { max_retries: 3, on_status_codes: [429, 500, 502, 503] },
    fallback: {
      targets: [{ model: "gpt-4o-mini" }],
    },
  },
});

// All requests through this client use the cache, retry, and fallback settings
const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Hello" }],
});
```

Per-request overrides
Override config for a single request using extra_headers. The GatewayConfig.to_headers() method serialises the config to x-prism-config:
```python
from prism import GatewayConfig, CacheConfig

# Force a cache refresh for this specific request
override = GatewayConfig(cache=CacheConfig(force_refresh=True))
headers = override.to_headers()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What time is it?"}],
    extra_headers=headers,
)
```
Using with the OpenAI SDK
If you’re not using the Prism SDK, use create_headers() to generate x-prism-* headers for any OpenAI-compatible client:
```python
from openai import OpenAI
from prism import create_headers, GatewayConfig, CacheConfig

headers = create_headers(
    api_key="sk-prism-your-key",
    config=GatewayConfig(cache=CacheConfig(strategy="semantic", ttl=600)),
    trace_id="trace-abc",
    metadata={"team": "ml", "env": "production"},
)

client = OpenAI(
    base_url="https://gateway.futureagi.com/v1",
    default_headers=headers,
)
```
Override precedence
Per-request headers override client-level config, which overrides org config. See Configuration Hierarchy above.
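The precedence chain behaves like a lookup through scopes from most to least specific. A minimal sketch of that logic (this resolver is illustrative only, not part of the Prism SDK):

```python
def resolve_setting(key, request_headers=None, api_key_config=None,
                    org_config=None, global_config=None):
    """Return the most specific value for a setting:
    request headers > API key config > org config > global config."""
    for scope in (request_headers, api_key_config, org_config, global_config):
        if scope is not None and key in scope:
            return scope[key]
    return None

# A per-request header TTL wins over the org-level TTL
ttl = resolve_setting(
    "cache_ttl",
    request_headers={"cache_ttl": 600},
    org_config={"cache_ttl": 3600},
)
# ttl is 600; without the header it would fall back to 3600
```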