Routing & Reliability
Configure load balancing, failover, retries, and circuit breaking across LLM providers.
What it is
Prism’s routing layer distributes requests across multiple providers and models to maximize reliability and performance. If one provider is down or slow, traffic automatically shifts to healthy alternatives. This ensures your application stays responsive even when individual providers experience outages or rate limiting.
Use cases
- High availability — Automatic failover to backup providers when primary is down or rate-limited
- Cost optimization — Route to the cheapest provider that supports the requested model
- Latency reduction — Route to the fastest provider based on recent response times
- Traffic distribution — Split traffic across providers by weight for capacity management
Key concepts
| Term | Definition |
|---|---|
| Failover | Automatic rerouting of requests to a backup provider when the primary provider fails or returns errors (429, 5xx) |
| Retries | Repeated attempts to send a request after a failure, using exponential backoff to avoid overwhelming the provider |
| Circuit breaking | A protection mechanism that stops sending requests to a failing provider entirely, then gradually tests recovery before resuming full traffic |
| Timeouts | Maximum duration Prism waits for a provider response before treating the request as failed |
| Routing strategy | The algorithm Prism uses to select which provider handles each request (e.g., round robin, weighted, latency-based) |
Configuration parameters
These parameters appear in the JSON configuration blocks throughout this page.
Failover:
| Parameter | Type | Description |
|---|---|---|
| enabled | boolean | Turn failover on or off |
| providers | string[] | Ordered list of providers to try when one fails |
| failover_on | number[] | HTTP status codes that trigger failover (e.g., 429, 500, 502, 503, 504) |
Retries:
| Parameter | Type | Default | Description |
|---|---|---|---|
| max_retries | number | 3 | Maximum number of retry attempts before giving up |
| initial_backoff_ms | number | 100 | Wait time (ms) before the first retry |
| max_backoff_ms | number | 10000 | Upper limit on wait time between retries |
| backoff_multiplier | number | 2 | Multiplier applied to backoff after each retry (e.g., 100ms → 200ms → 400ms) |
Circuit breaker:
| Parameter | Type | Description |
|---|---|---|
| enabled | boolean | Turn circuit breaking on or off |
| error_threshold_percent | number | Error rate (%) that trips the circuit open |
| min_requests | number | Minimum request count before the error threshold is evaluated |
| open_duration_seconds | number | How long (seconds) the circuit stays open before testing recovery |
| half_open_max_requests | number | Number of trial requests allowed during the half-open recovery test |
Timeouts:
| Parameter | Type | Description |
|---|---|---|
| request_timeout_seconds | number | Maximum total time for the entire request (including retries and failovers) |
| provider_timeout_seconds | number | Maximum time to wait for a single provider response |
Routing strategies
| Strategy | How it works |
|---|---|
| Round Robin | Distributes requests evenly across providers in rotation |
| Weighted | Splits traffic according to assigned weights (e.g., 70% OpenAI, 30% Anthropic) |
| Latency-based | Routes to the fastest provider based on recent response times |
| Cost-optimized | Routes to the cheapest provider that supports the requested model |
| Adaptive | Dynamically adjusts weights based on real-time performance |
| Fastest | Sends to all providers simultaneously, returns the first response — note that you are billed for every call made, including those whose responses are discarded |
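To make the weighted strategy concrete, here is a minimal sketch of weight-proportional provider selection. This is illustrative only, not Prism's internal implementation; `pick_provider` is a hypothetical helper.

```python
import random

def pick_provider(weights: dict[str, int]) -> str:
    """Select a provider with probability proportional to its weight."""
    providers = list(weights)
    return random.choices(providers, weights=[weights[p] for p in providers])[0]

# Over many requests, roughly 70% land on openai and 30% on anthropic.
counts = {"openai": 0, "anthropic": 0}
for _ in range(10_000):
    counts[pick_provider({"openai": 70, "anthropic": 30})] += 1
print(counts)
```

Weighted selection is per-request and probabilistic, so the split converges to the configured ratio over volume rather than holding exactly on every small sample.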
Configuring a routing policy

- Open the Prism dashboard at https://app.futureagi.com/dashboard/gateway/routing
- Navigate to Routing
- Click Create Policy
- Enter a name and an optional description
- Select a strategy
- Configure the strategy-specific settings
- Click Save
```python
from prism import Prism

client = Prism(
    api_key="sk-prism-your-key",
    base_url="https://gateway.futureagi.com",
    control_plane_url="https://api.futureagi.com",
)

# Create a weighted routing policy
policy = client.routing.create(
    name="Production routing",
    strategy="weighted",
    config={"weights": {"openai": 70, "anthropic": 30}},
    description="70/30 split between OpenAI and Anthropic",
)

# List all routing policies
policies = client.routing.list()

# Update an existing policy
client.routing.update(
    policy["id"],
    strategy="least-latency",
    config={"providers": ["openai", "anthropic", "gemini"], "failover_on": [429, 500, 502, 503, 504]},
)
```

```typescript
import { Prism } from "@futureagi/prism";

const client = new Prism({
  apiKey: "sk-prism-your-key",
  baseUrl: "https://gateway.futureagi.com",
  controlPlaneUrl: "https://api.futureagi.com",
});

// Create a weighted routing policy
const policy = await client.routing.create({
  name: "Production routing",
  strategy: "weighted",
  config: { weights: { openai: 70, anthropic: 30 } },
  description: "70/30 split between OpenAI and Anthropic",
});

// List all routing policies
const policies = await client.routing.list();

// Update an existing policy
await client.routing.update(policy.id, {
  strategy: "least-latency",
  config: { providers: ["openai", "anthropic", "gemini"], failoverOn: [429, 500, 502, 503, 504] },
});
```

Failover
Failover triggers on specific HTTP status codes and error conditions: 429 (rate limit), 5xx (server errors), timeouts, and connection errors. The providers array defines the failover order. When the primary provider fails, Prism automatically routes to the next provider in the list.
```json
{
  "failover": {
    "enabled": true,
    "providers": ["openai", "anthropic", "gemini"],
    "failover_on": [429, 500, 502, 503, 504]
  }
}
```
Note
The providers array defines the failover order. Prism will attempt each provider in sequence until one succeeds.
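The try-in-order behavior can be pictured as a simple loop. This is a minimal sketch of the semantics described above, not Prism's internals; `ProviderError` and `call_provider` are hypothetical stand-ins for a real provider request.

```python
FAILOVER_STATUSES = {429, 500, 502, 503, 504}

class ProviderError(Exception):
    def __init__(self, status: int):
        self.status = status

def call_with_failover(providers, call_provider):
    """Try each provider in order; fail over only on the configured statuses."""
    last_error = None
    for provider in providers:
        try:
            return call_provider(provider)
        except ProviderError as err:
            if err.status not in FAILOVER_STATUSES:
                raise  # non-retryable error (e.g., 401): surface immediately
            last_error = err  # retryable: move on to the next provider
    raise last_error  # every provider in the list failed

# openai is rate-limited here, so the request lands on anthropic.
def fake_call(provider):
    if provider == "openai":
        raise ProviderError(429)
    return f"response from {provider}"

print(call_with_failover(["openai", "anthropic", "gemini"], fake_call))
# → response from anthropic
```

Note that errors outside failover_on (such as an invalid API key) are raised straight back to the caller rather than retried against other providers.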
Retries
Prism uses exponential backoff for retries. This means it waits progressively longer between each retry attempt — for example, 100ms, then 200ms, then 400ms. This gives struggling providers time to recover instead of flooding them with rapid retry requests.
| Setting | Description | Default |
|---|---|---|
| max_retries | Maximum number of retry attempts | 3 |
| initial_backoff_ms | Initial backoff duration in milliseconds | 100 |
| max_backoff_ms | Maximum backoff duration in milliseconds | 10000 |
| backoff_multiplier | Multiplier for exponential backoff | 2 |
```json
{
  "retries": {
    "max_retries": 3,
    "initial_backoff_ms": 100,
    "max_backoff_ms": 10000,
    "backoff_multiplier": 2
  }
}
```
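With these defaults, the wait before each retry grows geometrically until it hits max_backoff_ms. A quick sketch of the schedule (illustrative only; `backoff_schedule` is a hypothetical helper, not part of the SDK):

```python
def backoff_schedule(max_retries=3, initial_ms=100, max_ms=10_000, multiplier=2):
    """Wait time (ms) before each retry attempt, capped at max_ms."""
    return [min(initial_ms * multiplier ** i, max_ms) for i in range(max_retries)]

print(backoff_schedule())               # [100, 200, 400]
print(backoff_schedule(max_retries=8))  # [100, 200, 400, 800, 1600, 3200, 6400, 10000]
```

The cap matters for longer retry budgets: without max_backoff_ms, the eighth attempt above would wait 12.8 seconds instead of 10.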
Circuit breaking
Think of a circuit breaker like a fuse box. When a provider starts failing repeatedly, the circuit “trips” — Prism stops sending requests to that provider entirely and routes to healthy alternatives instead. After a recovery window, Prism tests the provider with a few trial requests. If those succeed, the circuit “closes” and normal routing resumes. This prevents a single failing provider from degrading your entire application.
Circuit breaking prevents cascading failures by stopping requests to a provider that is experiencing issues. The circuit breaker has three states: Closed (normal operation), Open (rejecting requests), and Half-Open (testing recovery).
| State | Behavior |
|---|---|
| Closed | Normal operation, requests pass through |
| Open | Requests rejected immediately, no calls to provider |
| Half-Open | Limited requests allowed to test if provider recovered |
```json
{
  "circuit_breaker": {
    "enabled": true,
    "error_threshold_percent": 50,
    "min_requests": 10,
    "open_duration_seconds": 60,
    "half_open_max_requests": 3
  }
}
```
Tip
Circuit breaking works seamlessly with failover. When a circuit opens, Prism automatically routes to the next available provider.
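The three states can be sketched as a small state machine using the parameters from the JSON above. This is an illustrative model of the behavior, not Prism's implementation.

```python
import time

class CircuitBreaker:
    """Sketch: closed -> open on high error rate, open -> half-open after a
    cool-down, half-open -> closed after enough successful trial requests."""

    def __init__(self, error_threshold_percent=50, min_requests=10,
                 open_duration_seconds=60, half_open_max_requests=3):
        self.threshold = error_threshold_percent
        self.min_requests = min_requests
        self.open_duration = open_duration_seconds
        self.trial_budget = half_open_max_requests
        self.state = "closed"
        self.successes = self.failures = 0
        self.opened_at = 0.0
        self.trials_ok = 0

    def allow_request(self) -> bool:
        if self.state == "open":
            if time.monotonic() - self.opened_at >= self.open_duration:
                self.state = "half_open"  # cool-down elapsed: test recovery
                self.trials_ok = 0
            else:
                return False              # still open: reject immediately
        return True

    def record(self, success: bool):
        if self.state == "half_open":
            if not success:
                self._trip()              # trial request failed: reopen
            else:
                self.trials_ok += 1
                if self.trials_ok >= self.trial_budget:
                    self.state = "closed"  # provider recovered
                    self.successes = self.failures = 0
            return
        self.successes += success
        self.failures += not success
        total = self.successes + self.failures
        if total >= self.min_requests and 100 * self.failures / total >= self.threshold:
            self._trip()

    def _trip(self):
        self.state = "open"
        self.opened_at = time.monotonic()
```

With the defaults, the breaker only trips once at least 10 requests have been observed and 50% or more of them failed; min_requests prevents a single early error from opening the circuit.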
Timeouts
Configure per-request and per-provider timeouts to prevent hanging requests.
```json
{
  "timeouts": {
    "request_timeout_seconds": 30,
    "provider_timeout_seconds": 25
  }
}
```
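provider_timeout_seconds bounds each individual provider attempt, while request_timeout_seconds bounds the whole request including retries and failovers. One way to picture how the two interact (a hypothetical helper, not the gateway's code):

```python
import time

def remaining_budget(started_at: float, request_timeout: float,
                     provider_timeout: float) -> float:
    """Timeout for the next provider attempt: the per-provider cap,
    shrunk to whatever is left of the overall request budget."""
    elapsed = time.monotonic() - started_at
    return max(0.0, min(provider_timeout, request_timeout - elapsed))
```

For example, 28 seconds into a 30-second request budget, the next attempt gets at most 2 seconds even though the per-provider cap is 25. Keeping provider_timeout_seconds below request_timeout_seconds leaves room for at least one failover attempt before the overall budget expires.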
Example: High-availability setup
This configuration combines weighted routing, failover, retries, and circuit breaking for a production-grade setup:
```json
{
  "name": "Production HA",
  "strategy": "weighted",
  "config": {
    "weights": {
      "openai": 60,
      "anthropic": 30,
      "gemini": 10
    },
    "failover": {
      "enabled": true,
      "providers": ["openai", "anthropic", "gemini"],
      "failover_on": [429, 500, 502, 503, 504]
    },
    "retries": {
      "max_retries": 3,
      "initial_backoff_ms": 100,
      "max_backoff_ms": 10000,
      "backoff_multiplier": 2
    },
    "circuit_breaker": {
      "enabled": true,
      "error_threshold_percent": 50,
      "min_requests": 10,
      "open_duration_seconds": 60,
      "half_open_max_requests": 3
    },
    "timeouts": {
      "request_timeout_seconds": 30,
      "provider_timeout_seconds": 25
    }
  }
}
```