Quickstart

Make your first LLM request through Prism in under 5 minutes.

About

Point your existing OpenAI SDK at Prism by changing two lines: base_url and api_key. All providers work through the same API. No new SDK required.

Prerequisites

  1. Future AGI account - sign up at app.futureagi.com
  2. Prism API key - found in your dashboard under Settings > API Keys. Keys start with sk-prism-.
  3. At least one provider configured - add a provider (OpenAI, Anthropic, Google, etc.) in Prism > Providers
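Rather than pasting the key into source code, a common pattern is to keep it in an environment variable. The variable name PRISM_API_KEY below is just a convention used in this sketch, not something Prism requires:

```python
import os

# Read the Prism key from an environment variable instead of hard-coding it.
# "PRISM_API_KEY" is a naming convention for this example only; the fallback
# is the placeholder used throughout this page.
api_key = os.environ.get("PRISM_API_KEY", "sk-prism-your-api-key-here")
```

Pass `api_key` to the client constructors shown in the examples below.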

Make your first request

If you already use the OpenAI SDK, change two lines and you’re done. Pick whichever client you already use:

Prism SDK:

pip install prism-ai

from prism import Prism

client = Prism(
    api_key="sk-prism-your-api-key-here",
    base_url="https://gateway.futureagi.com",
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)

print(response.choices[0].message.content)
# Output: Paris
OpenAI SDK:

from openai import OpenAI

# Already using OpenAI? Just swap base_url and api_key
client = OpenAI(
    base_url="https://gateway.futureagi.com/v1",
    api_key="sk-prism-your-api-key-here",
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)

print(response.choices[0].message.content)
# Output: Paris
LiteLLM:

import litellm

response = litellm.completion(
    model="openai/gpt-4o-mini",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    api_key="sk-prism-your-api-key-here",
    base_url="https://gateway.futureagi.com/v1",
)

print(response.choices[0].message.content)
# Output: Paris
cURL:

curl -X POST https://gateway.futureagi.com/v1/chat/completions \
  -H "Authorization: Bearer sk-prism-your-api-key-here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'

That’s it. Your existing code works with Prism. Every request now gets routing, caching, guardrails, and cost tracking automatically.

Check response headers

Prism adds metadata to every response so you can see what happened. Using the client from the first request above:

# Using the OpenAI SDK client from the first request above
response = client.chat.completions.with_raw_response.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello"}],
)

print(f"Provider:  {response.headers.get('x-prism-provider')}")
print(f"Latency:   {response.headers.get('x-prism-latency-ms')}ms")
print(f"Cost:      ${response.headers.get('x-prism-cost')}")
print(f"Cache:     {response.headers.get('x-prism-cache')}")
print(f"Model:     {response.headers.get('x-prism-model-used')}")

# Parse the actual response
completion = response.parse()
print(f"Response:  {completion.choices[0].message.content}")

Example output:

Provider:  openai
Latency:   423ms
Cost:      $0.000045
Cache:     miss
Model:     gpt-4o-mini
Response:  Hello! How can I help you today?
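The header values arrive as strings. If you want to log or aggregate them, a small helper can convert them into typed values — a sketch, assuming only the header names shown above:

```python
def parse_prism_headers(headers):
    """Convert Prism's x-prism-* response headers (all strings) into typed values."""
    return {
        "provider": headers.get("x-prism-provider"),
        "latency_ms": int(headers["x-prism-latency-ms"]) if "x-prism-latency-ms" in headers else None,
        "cost_usd": float(headers["x-prism-cost"]) if "x-prism-cost" in headers else None,
        "cache_hit": headers.get("x-prism-cache") == "hit",
        "model": headers.get("x-prism-model-used"),
    }

# Works the same on response.headers; shown here with the example values above
meta = parse_prism_headers({
    "x-prism-provider": "openai",
    "x-prism-latency-ms": "423",
    "x-prism-cost": "0.000045",
    "x-prism-cache": "miss",
    "x-prism-model-used": "gpt-4o-mini",
})
print(meta["cost_usd"])  # 4.5e-05
```

Summing `cost_usd` across requests gives you a running spend figure without waiting for the dashboard.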

Switch providers

Change the model name to route to a different provider. Using the same client as before:

# OpenAI
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello"}]
)

# Anthropic
response = client.chat.completions.create(
    model="claude-sonnet-4-6",
    messages=[{"role": "user", "content": "Hello"}]
)

# Google Gemini
response = client.chat.completions.create(
    model="gemini-2.0-flash",
    messages=[{"role": "user", "content": "Hello"}]
)

Prism translates the request to each provider’s native format. Your code doesn’t change.
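Because every model goes through the same client, falling back between providers in your own code is just a loop over model names. This is a sketch of application-side fallback (Prism has its own routing features; this is not that), using the model names from the examples above:

```python
def complete_with_fallback(client, models, messages):
    """Try each model in order through the same client; return the
    first successful response, or raise if every model fails."""
    last_error = None
    for model in models:
        try:
            return client.chat.completions.create(model=model, messages=messages)
        except Exception as exc:  # in real code, catch the SDK's specific error types
            last_error = exc
    raise RuntimeError(f"All models failed: {models}") from last_error
```

For example: `complete_with_fallback(client, ["gpt-4o-mini", "claude-sonnet-4-6"], messages)`.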

Try streaming

Stream responses to show output as it arrives:

Python (Prism or OpenAI SDK):

stream = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Write a short poem about AI"}],
    stream=True,
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
LiteLLM:

import litellm

stream = litellm.completion(
    model="openai/gpt-4o-mini",
    messages=[{"role": "user", "content": "Write a short poem about AI"}],
    api_key="sk-prism-your-api-key-here",
    base_url="https://gateway.futureagi.com/v1",
    stream=True,
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
cURL:

curl -X POST https://gateway.futureagi.com/v1/chat/completions \
  -H "Authorization: Bearer sk-prism-your-api-key-here" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      {"role": "user", "content": "Write a short poem about AI"}
    ],
    "stream": true
  }'
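If you also need the complete text after streaming finishes (for logging or caching on your side), collect the deltas as they arrive. A small helper, assuming the chunk shape used in the loops above:

```python
def collect_stream(stream):
    """Print streamed deltas as they arrive and return the full text."""
    parts = []
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:  # the final chunk's delta content is typically None
            print(delta, end="", flush=True)
            parts.append(delta)
    print()
    return "".join(parts)
```

Call it as `full_text = collect_stream(stream)` in place of the bare `for` loop.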

Using a framework?

Prism works with any OpenAI-compatible client. If you use LangChain, LlamaIndex, or any other framework that supports custom base URLs, just point it at https://gateway.futureagi.com/v1 with your Prism key.

