Assistants API

Use the OpenAI Assistants API through Prism for managed conversations with tool use and file retrieval.

About

Prism fully proxies the OpenAI Assistants API. Create assistants with instructions and tools, manage conversation threads, and execute runs - all through the gateway. You get the same Assistants API you’d use with OpenAI directly, plus Prism’s routing, cost tracking, rate limiting, and logging on every call.

The Assistants API is stateful (OpenAI stores threads and messages server-side), so it only works with OpenAI as the provider. Use the OpenAI SDK pointed at Prism.

Warning

Routing and failover do not apply to the Assistants API. Threads and runs are stored on OpenAI’s servers, so the assistant’s model must be an OpenAI model.


Endpoints

Assistants

| Method | Path | Description |
|--------|------|-------------|
| POST | /v1/assistants | Create an assistant |
| GET | /v1/assistants | List assistants |
| GET | /v1/assistants/{id} | Get an assistant |
| POST | /v1/assistants/{id} | Update an assistant |
| DELETE | /v1/assistants/{id} | Delete an assistant |

Threads

| Method | Path | Description |
|--------|------|-------------|
| POST | /v1/threads | Create a thread |
| GET | /v1/threads/{id} | Get a thread |
| POST | /v1/threads/{id} | Update a thread |
| DELETE | /v1/threads/{id} | Delete a thread |

Messages

| Method | Path | Description |
|--------|------|-------------|
| POST | /v1/threads/{id}/messages | Add a message to a thread |
| GET | /v1/threads/{id}/messages | List messages in a thread |
| GET | /v1/threads/{id}/messages/{msg_id} | Get a message |
| POST | /v1/threads/{id}/messages/{msg_id} | Update a message |
| DELETE | /v1/threads/{id}/messages/{msg_id} | Delete a message |

Runs

| Method | Path | Description |
|--------|------|-------------|
| POST | /v1/threads/{id}/runs | Create a run |
| GET | /v1/threads/{id}/runs | List runs |
| GET | /v1/threads/{id}/runs/{run_id} | Get a run |
| POST | /v1/threads/{id}/runs/{run_id} | Update a run |
| POST | /v1/threads/{id}/runs/{run_id}/cancel | Cancel a run |
| POST | /v1/threads/{id}/runs/{run_id}/submit_tool_outputs | Submit tool outputs |
| GET | /v1/threads/{id}/runs/{run_id}/steps | List run steps |
| POST | /v1/threads/runs | Create thread and run in one call |
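Notice how the thread-scoped resources (messages, runs, run steps) nest under /v1/threads/{id}. A tiny illustrative helper (hypothetical, not part of any SDK) shows how IDs compose into these paths:

```python
def assistants_path(*parts: str) -> str:
    """Join path segments into a gateway path (illustrative helper only)."""
    return "/v1/" + "/".join(parts)

# Top-level resource: an assistant is addressed by its own ID
assert assistants_path("assistants", "asst_123") == "/v1/assistants/asst_123"

# Nested resources: a run lives under its thread, and actions hang off the run
assert (
    assistants_path("threads", "th_1", "runs", "run_9", "cancel")
    == "/v1/threads/th_1/runs/run_9/cancel"
)
```

In practice the OpenAI SDK builds these paths for you; the helper is only to make the nesting explicit.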

Quick example

Create an assistant, start a conversation, and get a response:

from openai import OpenAI

client = OpenAI(
    base_url="https://gateway.futureagi.com/v1",
    api_key="sk-prism-your-key",
)

# 1. Create an assistant
assistant = client.beta.assistants.create(
    name="Math Tutor",
    instructions="You are a math tutor. Explain concepts step by step.",
    model="gpt-4o",
)

# 2. Create a thread
thread = client.beta.threads.create()

# 3. Add a message
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="Explain the Pythagorean theorem",
)

# 4. Run the assistant
run = client.beta.threads.runs.create_and_poll(
    thread_id=thread.id,
    assistant_id=assistant.id,
)

# 5. Get the response
if run.status == "completed":
    messages = client.beta.threads.messages.list(thread_id=thread.id)
    for msg in messages.data:
        if msg.role == "assistant":
            print(msg.content[0].text.value)
            break
The same flow with curl (note the OpenAI-Beta: assistants=v2 header, which the Assistants API requires):

# 1. Create an assistant
ASSISTANT_ID=$(curl -s -X POST https://gateway.futureagi.com/v1/assistants \
  -H "Authorization: Bearer sk-prism-your-key" \
  -H "Content-Type: application/json" \
  -H "OpenAI-Beta: assistants=v2" \
  -d '{
    "name": "Math Tutor",
    "instructions": "You are a math tutor. Explain concepts step by step.",
    "model": "gpt-4o"
  }' | jq -r '.id')

# 2. Create a thread
THREAD_ID=$(curl -s -X POST https://gateway.futureagi.com/v1/threads \
  -H "Authorization: Bearer sk-prism-your-key" \
  -H "Content-Type: application/json" \
  -H "OpenAI-Beta: assistants=v2" \
  -d '{}' | jq -r '.id')

# 3. Add a message
curl -s -X POST "https://gateway.futureagi.com/v1/threads/$THREAD_ID/messages" \
  -H "Authorization: Bearer sk-prism-your-key" \
  -H "Content-Type: application/json" \
  -H "OpenAI-Beta: assistants=v2" \
  -d '{"role": "user", "content": "Explain the Pythagorean theorem"}'

# 4. Create a run
RUN_ID=$(curl -s -X POST "https://gateway.futureagi.com/v1/threads/$THREAD_ID/runs" \
  -H "Authorization: Bearer sk-prism-your-key" \
  -H "Content-Type: application/json" \
  -H "OpenAI-Beta: assistants=v2" \
  -d "{\"assistant_id\": \"$ASSISTANT_ID\"}" | jq -r '.id')

# 5. Poll until the run completes, then get messages
STATUS="queued"
while [ "$STATUS" = "queued" ] || [ "$STATUS" = "in_progress" ]; do
  sleep 1
  STATUS=$(curl -s "https://gateway.futureagi.com/v1/threads/$THREAD_ID/runs/$RUN_ID" \
    -H "Authorization: Bearer sk-prism-your-key" \
    -H "OpenAI-Beta: assistants=v2" | jq -r '.status')
done
curl -s "https://gateway.futureagi.com/v1/threads/$THREAD_ID/messages" \
  -H "Authorization: Bearer sk-prism-your-key" \
  -H "OpenAI-Beta: assistants=v2" | jq '.data[0].content[0].text.value'

Tool use

Assistants can call tools (functions you define) during a run. When the run enters requires_action status, you submit tool outputs to continue.

import json

# Create assistant with tools
assistant = client.beta.assistants.create(
    name="Weather Bot",
    instructions="You help users check the weather.",
    model="gpt-4o",
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                },
                "required": ["city"],
            },
        },
    }],
)

thread = client.beta.threads.create()
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="What's the weather in Tokyo?",
)

run = client.beta.threads.runs.create(
    thread_id=thread.id,
    assistant_id=assistant.id,
)

# Poll until the run needs action or completes
import time
while run.status in ("queued", "in_progress"):
    time.sleep(1)
    run = client.beta.threads.runs.retrieve(
        thread_id=thread.id,
        run_id=run.id,
    )

if run.status == "requires_action":
    tool_calls = run.required_action.submit_tool_outputs.tool_calls

    # Process each tool call
    tool_outputs = []
    for call in tool_calls:
        args = json.loads(call.function.arguments)
        # Your actual function call here
        result = f"22°C and sunny in {args['city']}"
        tool_outputs.append({
            "tool_call_id": call.id,
            "output": result,
        })

    # Submit outputs and wait for completion
    run = client.beta.threads.runs.submit_tool_outputs_and_poll(
        thread_id=thread.id,
        run_id=run.id,
        tool_outputs=tool_outputs,
    )

if run.status == "completed":
    messages = client.beta.threads.messages.list(thread_id=thread.id)
    print(messages.data[0].content[0].text.value)
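If you register several functions, a small dispatch table keeps the requires_action handler tidy. A sketch, reusing the get_weather tool from above (the weather lookup is a stand-in, and the TOOLS/run_tool_call names are hypothetical):

```python
import json

# Map tool names to local Python callables (stand-in implementations)
TOOLS = {
    "get_weather": lambda city: f"22°C and sunny in {city}",
}

def run_tool_call(name: str, arguments: str) -> str:
    """Decode the JSON arguments string and invoke the matching function."""
    args = json.loads(arguments)
    return TOOLS[name](**args)

print(run_tool_call("get_weather", '{"city": "Tokyo"}'))  # 22°C and sunny in Tokyo
```

Inside the requires_action branch you would call run_tool_call(call.function.name, call.function.arguments) for each tool call instead of hard-coding one function.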

File search

Assistants can search uploaded files using vector stores. Upload files, attach them to a vector store, then give the assistant access:

# Upload a file
file = client.files.create(
    file=open("knowledge_base.pdf", "rb"),
    purpose="assistants",
)

# Create a vector store and add the file; create_and_poll waits
# until indexing finishes, so the file is searchable before any run starts
vector_store = client.beta.vector_stores.create(name="Knowledge Base")
client.beta.vector_stores.files.create_and_poll(
    vector_store_id=vector_store.id,
    file_id=file.id,
)

# Create assistant with file search
assistant = client.beta.assistants.create(
    name="Research Assistant",
    instructions="Answer questions using the provided documents.",
    model="gpt-4o",
    tools=[{"type": "file_search"}],
    tool_resources={
        "file_search": {
            "vector_store_ids": [vector_store.id],
        }
    },
)

# Ask a question about the uploaded file
thread = client.beta.threads.create()
client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="What does the document say about quarterly revenue?",
)

run = client.beta.threads.runs.create_and_poll(
    thread_id=thread.id,
    assistant_id=assistant.id,
)

if run.status == "completed":
    messages = client.beta.threads.messages.list(thread_id=thread.id)
    print(messages.data[0].content[0].text.value)

Streaming runs

Stream run events for real-time UI updates instead of polling:

from openai import AssistantEventHandler

class MyHandler(AssistantEventHandler):
    def on_text_created(self, text):
        print("\nassistant > ", end="", flush=True)

    def on_text_delta(self, delta, snapshot):
        print(delta.value, end="", flush=True)

    def on_tool_call_created(self, tool_call):
        print(f"\n  Tool call: {tool_call.type}", flush=True)

# Using thread and assistant from earlier examples
with client.beta.threads.runs.stream(
    thread_id=thread.id,    # from the thread you created
    assistant_id=assistant.id,  # from the assistant you created
    event_handler=MyHandler(),
) as stream:
    stream.until_done()

What Prism adds

Since Prism proxies every Assistants API call, you get:

  • Cost tracking: Every run, message creation, and retrieval call is logged with cost in the x-prism-cost header
  • Rate limiting: Per-key and per-org limits apply to all Assistants API calls
  • Logging: Full request/response logging for debugging and compliance
  • Access control: Virtual key restrictions (allowed models, IP ACL) apply to the assistant’s model

The x-prism-* response headers are returned on every Assistants API response, just like any other Prism endpoint.
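Since runs in one thread span many API calls, you could total spend per thread by reading that header from each response. A sketch, assuming the x-prism-cost header from above; the CostTracker class is hypothetical:

```python
from collections import defaultdict

class CostTracker:
    """Accumulate per-thread spend from x-prism-cost response headers."""

    def __init__(self):
        self.spend = defaultdict(float)

    def record(self, thread_id: str, headers: dict) -> None:
        # The gateway reports the call's cost as a decimal string
        cost = headers.get("x-prism-cost")
        if cost is not None:
            self.spend[thread_id] += float(cost)

tracker = CostTracker()
tracker.record("th_1", {"x-prism-cost": "0.0042"})
tracker.record("th_1", {"x-prism-cost": "0.0013"})
print(round(tracker.spend["th_1"], 4))  # 0.0055
```

With the OpenAI SDK you can reach response headers through the raw-response interface (e.g. client.beta.threads.runs.with_raw_response) and feed them to a tracker like this.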

