- Portkey answers: “What happened, how fast, and how much did it cost?” As an AI gateway, Portkey acts as the operational layer. It unifies your API calls, manages your keys, and gives you a centralized dashboard to monitor crucial operational metrics like latency, cost, and request volume.
- FutureAGI answers: “How good was the response?” As a tracing and evaluation platform, FutureAGI acts as the quality layer. It captures the full context of each request and runs automated evaluations that score the model’s output across modalities such as text, image, and audio. It also supports custom evaluation metrics for your data.
In this cookbook, we’ll put both layers to work. Our goal is to create a system that can:
- Test multiple LLMs (like GPT-4o, Claude 3.7 Sonnet, and Llama) concurrently on a variety of tasks.
- Measure performance metrics like response time and token usage.
- Automatically evaluate the quality of each model’s response using FutureAGI’s built-in evaluators (e.g., conciseness, context adherence, task completion).
- Generate a comprehensive comparison report to easily identify the best model for a given set of tasks.
## Core Concepts
- Portkey: An AI Gateway that provides a single, unified API for interacting with various LLM providers. It simplifies key management through Virtual Keys, adds resilience with fallbacks and retries, and caches responses to save costs.
- Future AGI Tracing: An AI lifecycle platform designed to support enterprises throughout their AI journey. It combines rapid prototyping, rigorous evaluation, continuous observability, and reliable deployment to help build, monitor, optimize, and secure generative AI applications.
## Prerequisites
- Python Environment: Ensure you have Python 3.8+ installed.
- API Keys:
  - A Portkey API Key.
  - Virtual Keys for each provider you want to test (OpenAI, Anthropic, Vertex AI, Groq, etc.), set up in your Portkey dashboard (https://app.portkey.ai/virtual-keys).
  - A Future AGI API Key (https://app.futureagi.com/dashboard/keys).
- Install Libraries: see the snippet below.
- .env File: Create a `.env` file in your project root to securely store your Portkey API Key, as shown below.
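A typical install and `.env` layout might look like the following. The package names and variable names are assumptions based on the imports used later in this cookbook; check the Portkey and FutureAGI docs for the versions you’re on:

```bash
pip install portkey-ai traceai-portkey python-dotenv
```

```env
# .env — hypothetical variable names; match whatever your code actually reads
PORTKEY_API_KEY="your-portkey-api-key"
FI_API_KEY="your-futureagi-api-key"
FI_SECRET_KEY="your-futureagi-secret-key"
```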
## Step-by-Step Guide
You can use this Colab notebook to run the Portkey instrumentation in FutureAGI.
### Step 1: Basic Setup and Imports
First, we’ll import the necessary libraries and configure logging. We use `dataclasses` to create structured objects for our model configurations and test results, which makes the code cleaner and more maintainable. A sketch follows below.
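A minimal sketch of this setup, assuming the `.env` layout above; the exact fields on `ModelConfig` and `TestResult` are illustrative, not canonical:

```python
import logging
import os
import time
from dataclasses import dataclass
from typing import Optional

from dotenv import load_dotenv

load_dotenv()  # read PORTKEY_API_KEY (and FutureAGI keys) from .env

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


@dataclass
class ModelConfig:
    name: str          # display name, e.g. "GPT-4o"
    model_id: str      # provider model id, e.g. "gpt-4o"
    virtual_key: str   # Portkey Virtual Key for this provider


@dataclass
class TestResult:
    model_name: str
    scenario: str
    response_text: str
    response_time_s: float
    prompt_tokens: int
    completion_tokens: int
    error: Optional[str] = None
```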
### Step 2: Setting Up Tracing with FutureAGI Evals
This is the most critical step for automated evaluation. The `setup_tracing` method configures FutureAGI.
- `register()`: Initializes a tracing project. We give it a `project_name` and a `project_version_name` to organize our experiments.
- `eval_tags`: This is where the magic happens. We define a list of `EvalTag` objects that tell FutureAGI what to evaluate. On each `EvalTag`:
  - `type` & `value`: Specify that this evaluation should run on every LLM call span.
  - `eval_name`: The built-in evaluation to use (e.g., `CONTEXT_ADHERENCE`).
  - `custom_eval_name`: A user-friendly name that will appear in the FutureAGI dashboard (e.g., “Response_Quality”).
  - `mapping`: This is crucial. It tells the evaluator where to find the necessary data within the trace. Here, we map the LLM’s input prompt to the evaluator’s `context` parameter and the LLM’s response to its `output` parameter.
- `PortkeyInstrumentor().instrument()`: This line activates the instrumentation, linking our FutureAGI setup to any Portkey client created afterward.
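A sketch of `setup_tracing` assembled from the pieces described above. The import paths (`fi_instrumentation`, `traceai_portkey`) and enum values follow FutureAGI’s traceAI instrumentation packages, but verify them against your installed version:

```python
from fi_instrumentation import register
from fi_instrumentation.fi_types import (
    EvalName,
    EvalSpanKind,
    EvalTag,
    EvalTagType,
    ProjectType,
)
from traceai_portkey import PortkeyInstrumentor


def setup_tracing():
    # Initialize a tracing project to organize our experiments.
    trace_provider = register(
        project_type=ProjectType.EXPERIMENT,
        project_name="portkey-model-comparison",
        project_version_name="v1",
        eval_tags=[
            # Run the built-in context-adherence eval on every LLM call span.
            EvalTag(
                type=EvalTagType.OBSERVATION_SPAN,
                value=EvalSpanKind.LLM,
                eval_name=EvalName.CONTEXT_ADHERENCE,
                custom_eval_name="Response_Quality",
                # Map trace fields to the evaluator's parameters.
                mapping={"context": "raw.input", "output": "raw.output"},
            ),
        ],
    )
    # Activate instrumentation for every Portkey client created afterward.
    PortkeyInstrumentor().instrument(tracer_provider=trace_provider)
```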
### Step 3: Defining Models and Test Scenarios
We define the models we want to test and the prompts for our test scenarios. This structure makes it easy to add or remove models and tests; feel free to add more test prompts of your own. A sample definition follows.
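For example (the model IDs and Virtual Key names below are placeholders, not values from this cookbook):

```python
MODELS = [
    ModelConfig(name="GPT-4o", model_id="gpt-4o", virtual_key="openai-virtual-key"),
    ModelConfig(name="Claude 3.7 Sonnet", model_id="claude-3-7-sonnet-latest", virtual_key="anthropic-virtual-key"),
    ModelConfig(name="Llama 3.3 70B", model_id="llama-3.3-70b-versatile", virtual_key="groq-virtual-key"),
]

TEST_SCENARIOS = {
    "summarization": "Summarize the plot of Hamlet in three sentences.",
    "reasoning": "If a train leaves at 3 PM traveling 60 mph, how far does it travel by 5:30 PM?",
    "code_generation": "Write a Python function that reverses a linked list.",
}
```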
### Step 4: Executing a Test and Capturing Results
The `test_model` function orchestrates a single test run (sketched after this list).
- It creates a `Portkey` client using the model-specific Virtual Key.
- It constructs the request payload.
- It calls `client.chat.completions.create()`. Because of our instrumentation in Step 2, this call is automatically traced.
- It measures the time taken and parses the response and token usage.
- It returns a structured `TestResult` object.
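A sketch of `test_model`, reusing the `ModelConfig` and `TestResult` dataclasses from Step 1; the error handling and field names are illustrative:

```python
from portkey_ai import Portkey


def test_model(model: ModelConfig, scenario: str, prompt: str) -> TestResult:
    # A Portkey client scoped to this provider via its Virtual Key.
    client = Portkey(
        api_key=os.environ["PORTKEY_API_KEY"],
        virtual_key=model.virtual_key,
    )
    start = time.time()
    try:
        # Automatically traced thanks to PortkeyInstrumentor in Step 2.
        response = client.chat.completions.create(
            model=model.model_id,
            messages=[{"role": "user", "content": prompt}],
        )
        return TestResult(
            model_name=model.name,
            scenario=scenario,
            response_text=response.choices[0].message.content,
            response_time_s=time.time() - start,
            prompt_tokens=response.usage.prompt_tokens,
            completion_tokens=response.usage.completion_tokens,
        )
    except Exception as exc:
        # Record failures instead of crashing the whole comparison run.
        return TestResult(
            model_name=model.name,
            scenario=scenario,
            response_text="",
            response_time_s=time.time() - start,
            prompt_tokens=0,
            completion_tokens=0,
            error=str(exc),
        )
```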
### Step 5: Orchestrate with a main Function
The `main` function ties everything together. It gets the models and scenarios, then loops through them, calling our `test_model` function for each combination (see the sketch below).
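A minimal sketch; this version runs sequentially, and you could wrap the calls in a `concurrent.futures.ThreadPoolExecutor` to get the concurrent runs mentioned at the start:

```python
def main():
    setup_tracing()
    results = []
    for model in MODELS:
        for scenario, prompt in TEST_SCENARIOS.items():
            result = test_model(model, scenario, prompt)
            results.append(result)
            logger.info(
                "%s | %s | %.2fs | %d tokens",
                result.model_name,
                result.scenario,
                result.response_time_s,
                result.prompt_tokens + result.completion_tokens,
            )


if __name__ == "__main__":
    main()
```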

### Trace Analysis
Click into the experiment to see traces for each API call. In the trace details, you’ll find the results of your automated EvalTags (Response_Quality, Task_Completion), giving you an objective score for the model’s performance.

## Portkey Dashboard - The Operational View
- Unified Logs: See a single, unified log of all requests sent to OpenAI, Anthropic, and Groq.
- Cost and Latency: Portkey automatically tracks the cost and latency for every single call, allowing you to easily compare these crucial operational metrics.
