In-line Evaluations

Run evaluations directly inside a traced span so results are automatically attached to that span in the Future AGI dashboard.

What it is

In-line evaluations let you run an evaluation from the Future AGI AI Evaluations library directly within the context of an active span. Instead of setting up separate evaluation tasks with filters, you call evaluator.evaluate() with trace_eval=True inside a span, and the evaluation results are automatically attached to that span as attributes.

This lets you see evaluation results in the exact context of the operation being traced — such as an LLM call — without any extra wiring.

Use cases

  • Per-span quality checks — Attach groundedness, relevance, or custom eval scores directly to the LLM span that produced the output.
  • Simplified evaluation setup — Skip configuring separate evaluation tasks and filters; run evals inline where your logic runs.
  • Side-by-side tracing and evaluation — View both the trace data and the evaluation result in the same span in the Future AGI dashboard.

How to

Set up your environment

Register a tracer provider and initialize the Evaluator with your API credentials.

import os
import openai
from fi_instrumentation import register, FITracer
from fi_instrumentation.fi_types import ProjectType
from fi.evals import Evaluator


# Register the tracer
trace_provider = register(
    project_type=ProjectType.OBSERVE,
    project_name="YOUR_PROJECT_NAME",
    set_global_tracer_provider=True
)

# Initialize the Evaluator
evaluator = Evaluator(
    fi_api_key=os.getenv("FI_API_KEY"),
    fi_secret_key=os.getenv("FI_SECRET_KEY"),
)

client = openai.OpenAI()
tracer = FITracer(trace_provider.get_tracer(__name__))
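The snippet above reads credentials from environment variables. A typical shell setup might look like the following; `FI_API_KEY` and `FI_SECRET_KEY` are the variable names used in the code above, and `OPENAI_API_KEY` is the standard variable the `openai` client reads implicitly (the placeholder values are, of course, assumptions):

```shell
# Future AGI credentials, read by the Evaluator via os.getenv(...)
export FI_API_KEY="your-fi-api-key"
export FI_SECRET_KEY="your-fi-secret-key"

# Standard OpenAI credential, picked up automatically by openai.OpenAI()
export OPENAI_API_KEY="your-openai-api-key"
```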

Run an evaluation inside a span

Call evaluator.evaluate() with trace_eval=True inside an active span. The evaluation result will be automatically linked to that span.

with tracer.start_as_current_span("parent_span") as span:
    completion = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "hi how are you?"}],
    )

    span.set_attribute("raw.input", "hi how are you?")
    span.set_attribute("raw.output", completion.choices[0].message.content)

    # Define the evaluation config
    config_groundedness = {
        "eval_templates": "groundedness",
        "inputs": {
            "input": "hi how are you?",
            "output": completion.choices[0].message.content,
        },
        "model_name": "turing_large"
    }

    # Run the evaluation with trace_eval=True
    eval_result = evaluator.evaluate(
        **config_groundedness,
        custom_eval_name="groundedness_check",
        trace_eval=True
    )

    print(eval_result)
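Because `custom_eval_name` must be unique per evaluation instance, you can run several evaluations inside the same span and tell them apart in the dashboard. The calling pattern can be sketched with a stand-in evaluator so it runs without credentials; the real `fi.evals.Evaluator` sends the inputs to the Future AGI backend, and the second template name (`"relevance"`) is an assumption for illustration:

```python
# Stand-in for fi.evals.Evaluator, used only to illustrate the calling
# pattern of running multiple in-line evaluations in one span.
class StubEvaluator:
    def __init__(self):
        self.calls = []

    def evaluate(self, eval_templates, inputs, model_name,
                 custom_eval_name, trace_eval=False):
        # Record which named evaluation was run; the real client would
        # score the inputs and attach the result to the active span.
        self.calls.append(custom_eval_name)
        return {"eval": eval_templates, "name": custom_eval_name}

evaluator = StubEvaluator()
output = "I'm doing well, thanks for asking!"

# Two evaluations in the same span, distinguished by custom_eval_name
for name, template in [("groundedness_check", "groundedness"),
                       ("relevance_check", "relevance")]:
    result = evaluator.evaluate(
        eval_templates=template,
        inputs={"input": "hi how are you?", "output": output},
        model_name="turing_large",
        custom_eval_name=name,
        trace_eval=True,
    )

print(evaluator.calls)  # → ['groundedness_check', 'relevance_check']
```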

Key concepts

  • trace_eval=True — The essential parameter that enables in-line evaluation. It tells the system to find the current active span and attach the evaluation results to it as span attributes.
  • custom_eval_name — Required. A unique, human-readable name for this evaluation instance. It distinguishes between multiple evaluations of the same type within a trace and appears as the label in the UI.
  • Evaluator — The Future AGI evaluations client. Initialize it with your FI_API_KEY and FI_SECRET_KEY credentials.
  • eval_templates — The name of the evaluation template from the Future AGI AI Evaluations library (e.g., "groundedness").
  • Active span context — The evaluation must be called while a span is active (inside a with tracer.start_as_current_span(...) block) so the system knows which span to attach results to.
