In-line Evaluations

Run evaluations directly inside a traced span so results are automatically attached to that span in the Future AGI dashboard.

About

Evaluation results are most useful when they sit next to the data that produced them. Running evals as a separate step means matching results back to specific spans after the fact. In-line evaluations remove that gap by running evaluator.evaluate() with trace_eval=True inside an active span. The evaluation result is automatically attached to that span as attributes, so both the trace data and the eval score appear together in the dashboard.


When to use

  • Per-span quality checks: Attach groundedness, relevance, or custom eval scores directly to the LLM span that produced the output.
  • Simplified evaluation setup: Skip configuring separate evaluation tasks and filters. Run evals inline where the logic runs.
  • Side-by-side tracing and evaluation: View both the trace data and the evaluation result in the same span in the dashboard.

How to

Set up your environment

Register a tracer provider and initialize the Evaluator with your API credentials.

import os
import openai
from fi_instrumentation import register, FITracer
from fi_instrumentation.fi_types import ProjectType
from fi.evals import Evaluator


# Register the tracer
trace_provider = register(
    project_type=ProjectType.OBSERVE,
    project_name="YOUR_PROJECT_NAME",
    set_global_tracer_provider=True
)

# Initialize the Evaluator
evaluator = Evaluator(
    fi_api_key=os.getenv("FI_API_KEY"),
    fi_secret_key=os.getenv("FI_SECRET_KEY"),
)

client = openai.OpenAI()
tracer = FITracer(trace_provider.get_tracer(__name__))

Run an evaluation inside a span

Call evaluator.evaluate() with trace_eval=True inside an active span. The evaluation result is automatically attached to that span.

with tracer.start_as_current_span("parent_span") as span:
    completion = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "hi how are you?"}],
    )

    span.set_attribute("raw.input", "hi how are you?")
    span.set_attribute("raw.output", completion.choices[0].message.content)

    # Define the evaluation config
    config_groundedness = {
        "eval_templates": "groundedness",
        "inputs": {
            "input": "hi how are you?",
            "output": completion.choices[0].message.content,
        },
        "model_name": "turing_large",
    }

    # Run the evaluation with trace_eval=True
    eval_result = evaluator.evaluate(
        **config_groundedness,
        custom_eval_name="groundedness_check",
        trace_eval=True
    )

    print(eval_result)

Key concepts

  • trace_eval=True: The essential parameter that enables in-line evaluation. It tells the system to find the current active span and attach the evaluation results to it as span attributes.
  • custom_eval_name: Required. A unique, human-readable name for this evaluation instance. It distinguishes multiple evaluations of the same type within a trace and appears as the label in the UI.
  • Evaluator: The Future AGI evaluations client. Initialize it with your FI_API_KEY and FI_SECRET_KEY credentials.
  • eval_templates: The name of the evaluation template from the Future AGI AI Evaluations library (e.g., "groundedness").
  • Active span context: The evaluation must be called while a span is active (inside a with tracer.start_as_current_span(...) block) so the system knows which span to attach results to.
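To see why custom_eval_name must be unique per evaluation, consider this small standard-library sketch (the attribute key shape is hypothetical; the real names are set by the SDK): two results attached to the same span stay distinguishable only because their labels differ.

```python
# Sketch: two evaluation results on one span, kept apart by their
# custom_eval_name labels (key shape is hypothetical).
class Span:
    def __init__(self):
        self.attributes = {}

def attach_eval(span, custom_eval_name, score):
    span.attributes[f"eval.{custom_eval_name}.score"] = score

span = Span()
attach_eval(span, "groundedness_check", 0.92)
attach_eval(span, "relevance_check", 0.81)
print(sorted(span.attributes))
# ['eval.groundedness_check.score', 'eval.relevance_check.score']
```

If both evaluations reused the same name, the second result would overwrite the first in the span's attributes.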
