Linked Traces

Linking prompts to traces is essential for monitoring and improving the performance of your language model applications. By establishing this connection, you can track metrics and evaluations for each prompt version, facilitating iterative enhancements over time.

What it is

Linking prompts to traces means associating each prompt (or prompt version) you use in your application with the corresponding trace recorded in Future AGI. Once linked, you can see metrics and analytics per prompt—latency, token usage, cost, and trace count—so you can monitor performance and improve prompts over time. The connection is established by logging the prompt template (or its ID) when you send a generation request; the platform then attributes that trace to the prompt in the Workbench.
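Conceptually, a linked trace is just a trace record that carries the prompt's identifier alongside the usual generation metadata. A minimal sketch of what such a record might hold — the field names here are illustrative, not the platform's actual schema:

```python
from dataclasses import dataclass

@dataclass
class LinkedTrace:
    """Illustrative shape of a trace attributed to a prompt version."""
    trace_id: str
    prompt_id: str       # which prompt in the Workbench this trace belongs to
    prompt_version: str  # version label, so metrics can be grouped per version
    latency_ms: float
    input_tokens: int
    output_tokens: int
    cost_usd: float

# A generation logged with its prompt ID becomes attributable to that prompt:
t = LinkedTrace("tr-001", "support-reply", "v2", 812.5, 430, 120, 0.0031)
```

Because every record names its prompt and version, the platform can aggregate latency, tokens, cost, and trace count per prompt in the Workbench.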


Use cases

  • Monitor prompt performance — See median latency, input/output tokens, and cost per prompt version in one place.
  • Compare versions — Track how different prompt versions or model settings affect latency and cost over time.
  • Debug and iterate — Use trace count and timestamps to see when a prompt was used and refine based on real usage.
  • Cost and usage — Understand which prompts drive the most usage and cost so you can optimize.

How to

To link prompts to traces, log the prompt template (or its ID) with each generation request, as described in the observability and manual tracing docs under Log prompt templates. Once your application sends traces that include the prompt template (or template ID), Future AGI links those traces to the prompt in the Prompt Workbench.
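As a rough illustration of the logging step, the sketch below builds the attributes that would travel with a generation's trace. The `record_generation` helper and the OpenInference-style attribute keys are assumptions for illustration only; in real code, use the attribute names given in the Log prompt templates docs.

```python
def record_generation(template: str, version: str, variables: dict) -> dict:
    """Hypothetical helper: build the trace attributes that let the
    platform attribute this generation to a prompt version."""
    # Attribute keys below are illustrative; the real keys are defined
    # in the Log prompt templates documentation.
    attributes = {
        "llm.prompt_template.template": template,
        "llm.prompt_template.version": version,
        "llm.prompt_template.variables": variables,
    }
    # ...send the generation request and export these attributes
    # alongside the resulting trace...
    return attributes

attrs = record_generation(
    template="Summarize the ticket: {ticket}",
    version="v2",
    variables={"ticket": "Login fails after password reset"},
)
```

The key point is that the template and version ride along with the trace itself, so no separate linking call is needed after the fact.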

Metrics and analytics

After linking prompts to traces, you can view metrics for each prompt in the dashboard. Open your prompt in the Future AGI dashboard and go to the Metrics tab.

  • Median Latency — Typical time for the model to produce a response. Lower is better for responsiveness; use it to spot slow prompts or model changes.
  • Median Input Tokens — Typical size of the prompt sent to the model. Helps you see verbosity and compare input length across versions.
  • Median Output Tokens — Typical length of the model’s reply. Useful for cost and length control; compare after changing instructions or max tokens.
  • Median Costs — Typical cost per generation for this prompt. Use it to compare cost across prompt versions or models.
  • Traces Count — How many times this prompt was used in the selected period. Shows which prompts are active and where to focus optimization.
  • First and Last Generation — When the prompt was first and last used. Confirms the time range of the data you’re viewing.

Using these metrics: Compare the same metric across prompt versions or time ranges to see if a change improved latency, cost, or token usage. A higher traces count with stable or improving latency and cost usually indicates a healthy, well-used prompt.
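The same comparison can be done outside the dashboard if you export traces. A minimal sketch, assuming traces are available as simple records with a version, latency, and cost, that computes per-version medians with the standard library:

```python
from statistics import median
from collections import defaultdict

def medians_by_version(traces):
    """Group trace records by prompt version and compute median
    latency (ms), median cost (USD), and trace count per group."""
    groups = defaultdict(list)
    for t in traces:
        groups[t["version"]].append(t)
    return {
        v: {
            "median_latency_ms": median(t["latency_ms"] for t in ts),
            "median_cost_usd": median(t["cost_usd"] for t in ts),
            "traces_count": len(ts),
        }
        for v, ts in groups.items()
    }

traces = [
    {"version": "v1", "latency_ms": 900, "cost_usd": 0.004},
    {"version": "v1", "latency_ms": 1100, "cost_usd": 0.005},
    {"version": "v2", "latency_ms": 700, "cost_usd": 0.003},
]
stats = medians_by_version(traces)
# Here v2's single trace is both faster and cheaper than v1's median,
# suggesting the newer version improved latency and cost.
```

This mirrors what the Metrics tab shows: medians per version plus a trace count, so a version change can be judged on real usage rather than single runs.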

