# Quickstart

Make your first LLM request through Prism in under 5 minutes.
## What is Prism?
Prism is Future AGI’s AI Gateway — a proxy layer between your application and LLM providers. It provides a single API that handles routing across 100+ providers, enforces safety guardrails, caches responses, tracks costs, and delivers full observability into your LLM usage.
> **Note**
> Already using the OpenAI SDK? You can skip SDK installation entirely: point your existing client's `base_url` at `https://gateway.futureagi.com` and swap in your Prism API key. All providers work through the same OpenAI-format API.
## Prerequisites
Before you begin:

- Future AGI account — Sign up at app.futureagi.com if you do not have one.
- Prism API key — Available in your dashboard under Settings → API Keys. Prism keys start with `sk-prism-`.
- At least one provider configured — If you have not added a provider yet, see Manage Providers. You will need your own API key for the provider you want to use (e.g., an OpenAI key).
## Install the SDK
Install the Prism SDK for your language:

```bash
# Python
pip install prism-ai
```

```bash
# Node.js
npm install @futureagi/prism
```

## Set credentials
Set your API key and gateway URL as environment variables:

```bash
export PRISM_API_KEY=sk-prism-your-api-key-here
export PRISM_BASE_URL=https://gateway.futureagi.com
```

Your API key starts with `sk-prism-` and is available in your Prism dashboard under Settings → API Keys.
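Your application can then read these variables instead of hardcoding credentials. A minimal sketch using the standard library (`load_prism_config` is an illustrative helper, not part of the SDK):

```python
import os

def load_prism_config(env=os.environ):
    """Read Prism credentials from the environment variables set above."""
    api_key = env.get("PRISM_API_KEY")
    if not api_key:
        # Failing loudly at startup is easier to debug than a 401 later.
        raise RuntimeError("PRISM_API_KEY is not set")
    base_url = env.get("PRISM_BASE_URL", "https://gateway.futureagi.com")
    return api_key, base_url
```

Pass the returned values to the client constructors shown in the next section.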
## Make your first request
Send a chat completion request to any supported provider:
```python
from prism import Prism

client = Prism(
    api_key="sk-prism-your-api-key-here",
    base_url="https://gateway.futureagi.com"
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)

print(response.choices[0].message.content)
# Output: Paris
```

```javascript
import { Prism } from '@futureagi/prism';

const client = new Prism({
  apiKey: 'sk-prism-your-api-key-here',
  baseUrl: 'https://gateway.futureagi.com'
});

const response = await client.chat.completions.create({
  model: 'gpt-4o-mini',
  messages: [
    { role: 'user', content: 'What is the capital of France?' }
  ]
});

console.log(response.choices[0].message.content);
// Output: Paris
```

```bash
curl -X POST https://gateway.futureagi.com/v1/chat/completions \
  -H "Authorization: Bearer sk-prism-your-api-key-here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```

For the cURL request, you should see a JSON response like:
```json
{
  "choices": [{
    "message": { "role": "assistant", "content": "Paris" },
    "finish_reason": "stop"
  }],
  "model": "gpt-4o-mini"
}
```

## Check response headers
Every Prism response includes metadata headers that tell you what happened: which provider handled the request, how long it took, what it cost, and whether the cache was used. This is useful for debugging and cost monitoring.
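Because each response reports its own cost, a few lines of client code are enough for rough spend tracking across requests. A minimal sketch (`CostTracker` is an illustrative helper, not part of the SDK; it takes any mapping of response headers):

```python
class CostTracker:
    """Accumulate per-request cost from X-Prism-Cost response headers."""

    def __init__(self):
        self.total = 0.0
        self.requests = 0

    def record(self, headers):
        # The cost header is a decimal string; a response may omit it,
        # in which case the request is still counted.
        cost = headers.get("X-Prism-Cost")
        if cost is not None:
            self.total += float(cost)
        self.requests += 1

tracker = CostTracker()
tracker.record({"X-Prism-Cost": "0.000045", "X-Prism-Cache": "miss"})
tracker.record({"X-Prism-Cache": "hit"})
```

The examples below show how to read these headers from an actual response in each language.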
```python
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "user", "content": "What is the capital of France?"}
    ]
)

# Access response metadata
print(f"Request ID: {response.headers.get('X-Prism-Request-Id')}")
print(f"Provider: {response.headers.get('X-Prism-Provider')}")
print(f"Latency: {response.headers.get('X-Prism-Latency-Ms')}ms")
print(f"Cost: ${response.headers.get('X-Prism-Cost')}")
print(f"Cache: {response.headers.get('X-Prism-Cache')}")
```

```javascript
const response = await client.chat.completions.create({
  model: 'gpt-4o-mini',
  messages: [
    { role: 'user', content: 'What is the capital of France?' }
  ]
});

// Access response metadata
console.log(`Request ID: ${response.headers.get('X-Prism-Request-Id')}`);
console.log(`Provider: ${response.headers.get('X-Prism-Provider')}`);
console.log(`Latency: ${response.headers.get('X-Prism-Latency-Ms')}ms`);
console.log(`Cost: $${response.headers.get('X-Prism-Cost')}`);
console.log(`Cache: ${response.headers.get('X-Prism-Cache')}`);
```

```bash
curl -X POST https://gateway.futureagi.com/v1/chat/completions \
  -H "Authorization: Bearer sk-prism-your-api-key-here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }' -i | grep -i "x-prism"
```

A typical response looks like:
```text
X-Prism-Request-Id: 01HX4K9QZJP7YXMN3T8WVFR2C
X-Prism-Provider: openai
X-Prism-Model-Used: gpt-4o-mini
X-Prism-Latency-Ms: 423
X-Prism-Cost: 0.000045
X-Prism-Cache: miss
```

## Try streaming
Stream responses in real time for a better user experience:
```python
with client.chat.completions.stream(
    model="gpt-4o-mini",
    messages=[
        {"role": "user", "content": "Write a short poem about AI"}
    ]
) as stream:
    for text in stream.text_stream:
        print(text, end="", flush=True)
```

```javascript
const stream = await client.chat.completions.stream({
  model: 'gpt-4o-mini',
  messages: [
    { role: 'user', content: 'Write a short poem about AI' }
  ]
});

for await (const chunk of stream) {
  process.stdout.write(chunk.choices[0]?.delta?.content || '');
}
```

```bash
curl -X POST https://gateway.futureagi.com/v1/chat/completions \
  -H "Authorization: Bearer sk-prism-your-api-key-here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      {"role": "user", "content": "Write a short poem about AI"}
    ],
    "stream": true
  }'
```

## Switch providers
Use the same API with different providers by changing the model name. Prism translates between provider formats behind the scenes — your code stays identical.
```python
# OpenAI
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello"}]
)

# Anthropic
response = client.chat.completions.create(
    model="anthropic/claude-haiku-4-5",
    messages=[{"role": "user", "content": "Hello"}]
)

# Google Gemini
response = client.chat.completions.create(
    model="gemini/gemini-2.0-flash",
    messages=[{"role": "user", "content": "Hello"}]
)
```

```javascript
// OpenAI
const response1 = await client.chat.completions.create({
  model: 'gpt-4o-mini',
  messages: [{ role: 'user', content: 'Hello' }]
});

// Anthropic
const response2 = await client.chat.completions.create({
  model: 'anthropic/claude-haiku-4-5',
  messages: [{ role: 'user', content: 'Hello' }]
});

// Google Gemini
const response3 = await client.chat.completions.create({
  model: 'gemini/gemini-2.0-flash',
  messages: [{ role: 'user', content: 'Hello' }]
});
```

```bash
# OpenAI
curl -X POST https://gateway.futureagi.com/v1/chat/completions \
  -H "Authorization: Bearer sk-prism-your-api-key-here" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4o-mini", "messages": [{"role": "user", "content": "Hello"}]}'

# Anthropic
curl -X POST https://gateway.futureagi.com/v1/chat/completions \
  -H "Authorization: Bearer sk-prism-your-api-key-here" \
  -H "Content-Type: application/json" \
  -d '{"model": "anthropic/claude-haiku-4-5", "messages": [{"role": "user", "content": "Hello"}]}'

# Google Gemini
curl -X POST https://gateway.futureagi.com/v1/chat/completions \
  -H "Authorization: Bearer sk-prism-your-api-key-here" \
  -H "Content-Type: application/json" \
  -d '{"model": "gemini/gemini-2.0-flash", "messages": [{"role": "user", "content": "Hello"}]}'
```

## Error responses
Understanding common errors helps you handle them in your application.
### Guardrail blocked (403)
When a guardrail is set to Enforce mode and triggers on a request, Prism returns 403 before the LLM is ever called:
```json
{
  "error": {
    "type": "guardrail_triggered",
    "code": "forbidden",
    "message": "Request blocked by guardrail: pii-detector",
    "guardrail": "pii-detector"
  }
}
```
### Budget exceeded (429)
When your organization’s spending limit is reached, new requests are blocked until the next billing period:
```json
{
  "error": {
    "type": "budget_exceeded",
    "code": "rate_limit_exceeded",
    "message": "Organization monthly budget of $100.00 exceeded"
  }
}
```
### Provider unavailable (502)
When the selected provider is down or unreachable and no failover is configured:
```json
{
  "error": {
    "type": "provider_error",
    "code": "bad_gateway",
    "message": "Provider openai returned 503: Service Unavailable"
  }
}
```
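Since each error body carries a machine-readable `type`, your application can branch on it rather than parsing messages. A minimal sketch — the mapping from error type to action is an illustrative client-side policy, not part of the SDK:

```python
def classify_error(status, body):
    """Map a Prism error response to a coarse client-side action."""
    err = body.get("error", {})
    etype = err.get("type")
    if status == 403 and etype == "guardrail_triggered":
        return "rejected"         # do not retry; the request itself was blocked
    if status == 429 and etype == "budget_exceeded":
        return "wait_for_budget"  # retrying sooner will fail the same way
    if status == 502 and etype == "provider_error":
        return "retry_elsewhere"  # transient; another provider may succeed
    return "unknown"
```

For example, `classify_error(502, {"error": {"type": "provider_error"}})` returns `"retry_elsewhere"`, which your application could use to trigger a fallback request.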
> **Tip**
> To avoid provider failures affecting your users, configure routing with failover so Prism automatically retries with a backup provider.
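If you have not yet configured gateway-side failover, a simple client-side fallback over the model names from the "Switch providers" section looks like this. This is a sketch: `send` stands in for any of the request functions above, and in practice you would catch the SDK's specific provider error rather than a bare `Exception`.

```python
def complete_with_fallback(send, models, messages):
    """Try each model in order, returning the first successful response."""
    last_error = None
    for model in models:
        try:
            return send(model=model, messages=messages)
        except Exception as exc:  # narrow this to the SDK's provider error
            last_error = exc
    # Every model failed; surface the last error to the caller.
    raise last_error
```

Gateway-side failover is still preferable — it avoids an extra client round trip and applies your routing policy consistently across all callers.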