Quickstart
Make your first LLM request through Prism in under 5 minutes.
Warning
The prism-ai Python package and @futureagi/prism TypeScript package are being renamed. The current packages will continue to work but are deprecated. Watch for the updated package names in an upcoming release.
About
Point your existing OpenAI SDK at Prism by changing two lines: base_url and api_key. All providers work through the same API. No new SDK required.
Prerequisites
- Future AGI account - sign up at app.futureagi.com
- Prism API key - found in your dashboard under Settings > API Keys. Keys start with sk-prism-.
- At least one provider configured - add a provider (OpenAI, Anthropic, Google, etc.) in Prism > Providers
Make your first request
If you already use the OpenAI SDK, change two lines and you’re done:
Prism SDK (Python):

pip install prism-ai

from prism import Prism

client = Prism(
    api_key="sk-prism-your-api-key-here",
    base_url="https://gateway.futureagi.com",
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)

print(response.choices[0].message.content)
# Output: Paris

OpenAI SDK (Python):

from openai import OpenAI
# Already using OpenAI? Just swap base_url and api_key
client = OpenAI(
    base_url="https://gateway.futureagi.com/v1",
    api_key="sk-prism-your-api-key-here",
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)

print(response.choices[0].message.content)
# Output: Paris

LiteLLM (Python):

import litellm
response = litellm.completion(
    model="openai/gpt-4o-mini",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    api_key="sk-prism-your-api-key-here",
    base_url="https://gateway.futureagi.com/v1",
)

print(response.choices[0].message.content)
# Output: Paris

cURL:

curl -X POST https://gateway.futureagi.com/v1/chat/completions \
-H "Authorization: Bearer sk-prism-your-api-key-here" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o-mini",
"messages": [
{"role": "user", "content": "What is the capital of France?"}
]
}' That’s it. Your existing code works with Prism. Every request now gets routing, caching, guardrails, and cost tracking automatically.
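Because Prism speaks the OpenAI wire format, your existing error handling should carry over as well. A minimal sketch, assuming the gateway surfaces standard HTTP errors through the OpenAI SDK's usual exception classes:

# Sketch: standard OpenAI SDK exceptions, assuming the gateway
# returns ordinary HTTP error codes.
from openai import OpenAI, APIStatusError, AuthenticationError

client = OpenAI(
    base_url="https://gateway.futureagi.com/v1",
    api_key="sk-prism-your-api-key-here",
)

try:
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "ping"}],
    )
    print(response.choices[0].message.content)
except AuthenticationError:
    print("Invalid Prism API key - check Settings > API Keys")
except APIStatusError as e:
    print(f"Gateway returned {e.status_code}: {e.message}")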
Check response headers
Prism adds metadata to every response so you can see what happened. Using the OpenAI SDK client from the first example:

# Using the OpenAI SDK client from the first example
response = client.chat.completions.with_raw_response.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello"}],
)

print(f"Provider: {response.headers.get('x-prism-provider')}")
print(f"Latency: {response.headers.get('x-prism-latency-ms')}ms")
print(f"Cost: ${response.headers.get('x-prism-cost')}")
print(f"Cache: {response.headers.get('x-prism-cache')}")
print(f"Model: {response.headers.get('x-prism-model-used')}")

# Parse the actual response body
completion = response.parse()
print(f"Response: {completion.choices[0].message.content}")

Example output:
Provider: openai
Latency: 423ms
Cost: $0.000045
Cache: miss
Model: gpt-4o-mini
Response: Hello! How can I help you today?
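Because cost comes back as a header on every call, you can keep a running total without any extra endpoint. A small sketch reusing the raw-response client above; the header name is the one shown here, and parsing its value as a float is an assumption about the format:

# Sketch: tally spend across calls from the x-prism-cost header.
total_cost = 0.0
for question in ["What is 2+2?", "Name three primes."]:
    raw = client.chat.completions.with_raw_response.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": question}],
    )
    cost = raw.headers.get("x-prism-cost")
    if cost is not None:  # assumes the header carries a plain decimal
        total_cost += float(cost)
print(f"Total cost: ${total_cost:.6f}")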
Switch providers

Change the model name to route to a different provider. Using the same client as before:
# OpenAI
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello"}],
)

# Anthropic
response = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Hello"}],
)

# Google Gemini
response = client.chat.completions.create(
    model="gemini-2.0-flash",
    messages=[{"role": "user", "content": "Hello"}],
)

Prism translates the request to each provider's native format. Your code doesn't change.
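Since only the model string changes, comparing providers side by side is a short loop. A sketch using the same client and the model IDs from the examples above:

# Sketch: one client, three providers, identical request shape.
for model in ["gpt-4o-mini", "claude-sonnet-4-6", "gemini-2.0-flash"]:
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "Hello"}],
    )
    print(f"{model}: {response.choices[0].message.content}")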
Try streaming
Stream responses to show output as it arrives:
Prism SDK or OpenAI SDK (Python) - the code is identical for both clients:

stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Write a short poem about AI"}],
    stream=True,
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
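If your application is async, the same pattern should work through the OpenAI SDK's AsyncOpenAI client. A sketch, assuming Prism's OpenAI compatibility extends to async streaming:

# Sketch: async streaming via the OpenAI SDK's async client.
import asyncio
from openai import AsyncOpenAI

async def main() -> None:
    aclient = AsyncOpenAI(
        base_url="https://gateway.futureagi.com/v1",
        api_key="sk-prism-your-api-key-here",
    )
    stream = await aclient.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "Write a short poem about AI"}],
        stream=True,
    )
    async for chunk in stream:
        if chunk.choices[0].delta.content:
            print(chunk.choices[0].delta.content, end="", flush=True)

asyncio.run(main())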
model="gpt-4o-mini",
messages=[{"role": "user", "content": "Write a short poem about AI"}],
stream=True,
)
for chunk in stream:
if chunk.choices[0].delta.content:
print(chunk.choices[0].delta.content, end="", flush=True) import litellm
stream = litellm.completion(
    model="openai/gpt-4o-mini",
    messages=[{"role": "user", "content": "Write a short poem about AI"}],
    api_key="sk-prism-your-api-key-here",
    base_url="https://gateway.futureagi.com/v1",
    stream=True,
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)

cURL:

curl -X POST https://gateway.futureagi.com/v1/chat/completions \
-H "Authorization: Bearer sk-prism-your-api-key-here" \
-H "Content-Type: application/json" \
-d '{
"model": "gpt-4o-mini",
"messages": [
{"role": "user", "content": "Write a short poem about AI"}
],
"stream": true
}' Using a framework?
Prism works with any OpenAI-compatible client. If you use LangChain, LlamaIndex, or any other framework that supports custom base URLs, just point it at https://gateway.futureagi.com/v1 with your Prism key.
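For example, with LangChain (a sketch assuming the langchain-openai package; base_url and api_key are the same two settings used throughout this guide):

# Sketch: pointing LangChain's OpenAI-compatible chat model at Prism.
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="gpt-4o-mini",
    base_url="https://gateway.futureagi.com/v1",
    api_key="sk-prism-your-api-key-here",
)

print(llm.invoke("What is the capital of France?").content)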
Next Steps
- How it works - understand the request pipeline and plugin architecture
- Supported providers - add and configure LLM providers
- Guardrails - add safety checks to requests and responses
- Routing - set up load balancing and failover
- Chat completions - full endpoint reference with function calling and vision
- All endpoints - see every API endpoint available