Configuration

How organization configuration works in Prism: sections, hierarchy, and real-time updates.

About

Prism is configured at the organization level. Each organization has its own set of providers, guardrails, routing rules, rate limits, and budgets. Configuration changes are pushed to the gateway in real time with no restart required.


Configuration Hierarchy

When a setting is specified in multiple places, Prism applies the most specific one:

Request Headers > API Key Config > Organization Config > Global Config

For example, a cache TTL set via the x-prism-cache-ttl request header overrides the TTL set in the organization config.
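The precedence rule can be sketched as a simple resolver (a hypothetical helper for illustration only, not part of the Prism SDK): scan the layers from most to least specific and take the first value that is set.

```python
# Illustrative sketch of Prism's precedence rule. The function and its
# parameter names are hypothetical; they only model the lookup order.
def resolve(header_value=None, api_key_value=None, org_value=None, global_value=None):
    """Return the most specific value that is set, or None if none are."""
    for value in (header_value, api_key_value, org_value, global_value):
        if value is not None:
            return value
    return None

# A TTL sent via the x-prism-cache-ttl header wins over the org config:
ttl = resolve(header_value=60, org_value=3600)
# ttl == 60
```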


Configuration Sections

Section          What it controls
providers        Which LLM services are available and their credentials
guardrails       Safety checks applied to requests and responses
routing          How requests are distributed across providers (strategy, failover, retries)
cache            Caching mode, TTL, and namespace settings
rate_limiting    Maximum request rate per API key or organization
budgets          Spending limits per period and alert thresholds
cost_tracking    Cost calculation and attribution settings
ip_acl           IP access control list: which source IP addresses are permitted
alerting         Email or webhook alerts for budget events, errors, and guardrail triggers
privacy          Data retention periods and request logging policies
tool_policy      Which tool and function calls are permitted
mcp              Model Context Protocol integration settings
model_map        Custom model name aliases, mapping a friendly name like "my-gpt" to an actual model
audit            Audit log configuration and retention settings
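For instance, the model_map section aliases a friendly name to a real model, so clients can request "my-gpt" and have the gateway substitute the underlying model. A minimal sketch (the exact field layout is an assumption):

```json
{
  "model_map": {
    "my-gpt": "gpt-4o"
  }
}
```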

Example Configuration

A minimal organization configuration that sets up two providers with weighted routing, caching, and a monthly budget:

{
  "providers": {
    "openai": {
      "api_key": "sk-...",
      "models": ["gpt-4o", "gpt-4o-mini"]
    },
    "anthropic": {
      "api_key": "sk-ant-...",
      "models": ["claude-sonnet-4-6", "claude-haiku-4-5"]
    }
  },
  "routing": {
    "strategy": "weighted",
    "weights": { "openai": 70, "anthropic": 30 },
    "failover": {
      "enabled": true,
      "providers": ["openai", "anthropic"]
    }
  },
  "cache": {
    "enabled": true,
    "mode": "exact",
    "ttl_seconds": 3600
  },
  "budgets": {
    "limit": 500.00,
    "period": "monthly",
    "alert_threshold_percent": 80
  }
}

Note

Changes to organization configuration are pushed to the gateway in real time; no restart or redeployment is needed.


SDK configuration

The Prism SDK lets you apply configuration at two levels: client-level (affects all requests) and per-request (overrides for a single call).

Client-level config

Pass a GatewayConfig to the client constructor. It applies to every request made with that client:

Python:

from prism import Prism, GatewayConfig, CacheConfig, RetryConfig, FallbackConfig, FallbackTarget

client = Prism(
    api_key="sk-prism-your-key",
    base_url="https://gateway.futureagi.com",
    config=GatewayConfig(
        cache=CacheConfig(strategy="exact", ttl=300, namespace="prod"),
        retry=RetryConfig(max_retries=3, on_status_codes=[429, 500, 502, 503]),
        fallback=FallbackConfig(
            targets=[FallbackTarget(model="gpt-4o-mini")],
        ),
    ),
)

# All requests through this client use the cache, retry, and fallback settings
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)
TypeScript:

import { Prism } from "@futureagi/prism";

const client = new Prism({
  apiKey: "sk-prism-your-key",
  baseUrl: "https://gateway.futureagi.com",
  config: {
    cache: { strategy: "exact", ttl: 300, namespace: "prod" },
    retry: { max_retries: 3, on_status_codes: [429, 500, 502, 503] },
    fallback: {
      targets: [{ model: "gpt-4o-mini" }],
    },
  },
});

const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Hello" }],
});

Per-request overrides

Override the configuration for a single request using extra_headers. The GatewayConfig.to_headers() method serialises the config into the x-prism-config header:

from prism import GatewayConfig, CacheConfig

# Force a cache refresh for this specific request
override = GatewayConfig(cache=CacheConfig(force_refresh=True))
headers = override.to_headers()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What time is it?"}],
    extra_headers=headers,
)
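Conceptually, to_headers() carries the override to the gateway as JSON in the x-prism-config header. The sketch below models that idea; the SDK's actual wire format is an assumption here and may differ.

```python
import json

# Hypothetical illustration of what to_headers() might produce: the
# override config serialised as JSON into the x-prism-config header.
override = {"cache": {"force_refresh": True}}
headers = {"x-prism-config": json.dumps(override)}

print(headers["x-prism-config"])  # {"cache": {"force_refresh": true}}
```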

Using with the OpenAI SDK

If you’re not using the Prism SDK, use create_headers() to generate x-prism-* headers for any OpenAI-compatible client:

from openai import OpenAI
from prism import create_headers, GatewayConfig, CacheConfig

headers = create_headers(
    api_key="sk-prism-your-key",
    config=GatewayConfig(cache=CacheConfig(strategy="semantic", ttl=600)),
    trace_id="trace-abc",
    metadata={"team": "ml", "env": "production"},
)

client = OpenAI(
    base_url="https://gateway.futureagi.com/v1",
    default_headers=headers,
)

Override precedence

Per-request headers override client-level config, which overrides org config. See Configuration Hierarchy above.

