# Linked Traces
Associate prompts with production traces to monitor latency, token usage, and cost per prompt version in the Prompt Workbench.
## About
Every time your application sends a prompt to a model, Future AGI records it as a trace: the inputs, outputs, latency, tokens used, and cost. On their own, those traces tell you how your application is performing. Linked traces connect each trace back to the specific prompt and version that produced it.
Once linked, the Prompt Workbench shows aggregated metrics per prompt version alongside the prompt itself. Instead of searching through individual traces, you see a consolidated view: how many times a prompt was called, its typical latency and cost, and how those metrics shift as you iterate.
## When to use
- Validating a prompt change in production: Compare latency and cost between versions on real traffic, not just test runs.
- Diagnosing a cost spike: Metrics per prompt version show exactly which prompt or version is driving spend.
- Comparing active versions: See real-world performance across prompt versions side by side to decide which to keep.
- Auditing prompt usage: Trace count shows which prompts are actively being called and which are stale or abandoned.
## Linked Traces vs Raw Traces
| | Raw traces | Linked traces |
|---|---|---|
| What you see | Application-level metrics | Metrics per prompt and version |
| Attribution | Anonymous API calls | Tied to a specific template and version |
| Where to view | Observe / tracing dashboard | Prompt Workbench Metrics tab |
| Setup required | SDK instrumentation | SDK instrumentation + template reference in request |
## How to
To link prompts to traces, associate the prompt used in a generation with its corresponding trace. The process is described in the observability and manual tracing docs under Log prompt templates. Once your application sends traces that include the prompt template (or its template ID), Future AGI links those traces to the matching prompt in the Prompt Workbench.
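As a rough illustration of what "including the template in the trace" means, the sketch below builds span attributes that tie one generation to a template and version. The attribute keys follow OpenInference-style naming and the `prompt_template_attributes` helper is a hypothetical assumption, not Future AGI's documented API; consult the Log prompt templates docs for the exact mechanism and key names.

```python
def prompt_template_attributes(template: str, version: str, variables: dict) -> dict:
    """Build flat span attributes that associate one generation with a
    specific prompt template and version (OpenInference-style keys are
    an assumption here)."""
    attrs = {
        "llm.prompt_template.template": template,
        "llm.prompt_template.version": version,
    }
    # Record each template variable under its own attribute key.
    for name, value in variables.items():
        attrs[f"llm.prompt_template.variables.{name}"] = str(value)
    return attrs


attrs = prompt_template_attributes(
    template="Summarize the following text: {text}",
    version="v3",
    variables={"text": "..."},
)
# In an OpenTelemetry-instrumented app, each pair would then be attached
# to the LLM span, e.g. span.set_attribute(key, value) inside the call.
```

With attributes like these on every LLM span, the platform can group traces by template and version instead of treating each call as anonymous.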
## Metrics and Analytics
After linking, open your prompt in the dashboard and go to the Metrics tab.
| Metric | What it tells you |
|---|---|
| Median Latency | Typical time for the model to produce a response. Lower is better for responsiveness; use it to spot slow prompts or model changes. |
| Median Input Tokens | Typical size of the prompt sent to the model. Helps you see verbosity and compare input length across versions. |
| Median Output Tokens | Typical length of the model’s reply. Useful for cost and length control; compare after changing instructions or max tokens. |
| Median Costs | Typical cost per generation for this prompt. Use it to compare cost across prompt versions or models. |
| Traces Count | How many times this prompt was used in the selected period. Shows which prompts are active and where to focus optimization. |
| First and Last Generation | When the prompt was first and last used. Confirms the time range of the data you’re viewing. |
Compare the same metric across prompt versions or time ranges to see if a change improved latency, cost, or token usage.
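The same per-version comparison can be sketched offline from exported trace records. This is a minimal illustration, assuming each record is a dict with `version`, `latency_ms`, and `cost` fields (hypothetical field names, not an actual export schema):

```python
from statistics import median


def metrics_by_version(traces):
    """Aggregate trace count and median latency/cost per prompt version,
    mirroring the per-version view in the Metrics tab."""
    by_version = {}
    for t in traces:
        by_version.setdefault(t["version"], []).append(t)
    return {
        v: {
            "traces": len(ts),
            "median_latency_ms": median(t["latency_ms"] for t in ts),
            "median_cost": median(t["cost"] for t in ts),
        }
        for v, ts in by_version.items()
    }


# Illustrative records for two versions of one prompt.
traces = [
    {"version": "v2", "latency_ms": 420, "cost": 0.0031},
    {"version": "v2", "latency_ms": 380, "cost": 0.0028},
    {"version": "v3", "latency_ms": 510, "cost": 0.0022},
]
summary = metrics_by_version(traces)
```

Comparing `summary["v2"]` against `summary["v3"]` shows the same trade-off the Metrics tab surfaces: a version may cut cost while adding latency.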