Overview

In-line evaluations provide a streamlined way to add evaluations directly to any span within your trace, without separately setting span attributes and defining evaluation tasks with filters. You define and run an evaluation from our AI Evaluations library within the context of a specific span, and the results are automatically linked to that span.

How it works

When you call evaluator.evaluate() with the trace_eval=True parameter inside an active span, the evaluation is executed, and its results are attached to that span as attributes. This allows you to see evaluation results directly in the context of the operation you are tracing, like an LLM call.
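
In essence, the feature is a single evaluator.evaluate() call nested inside an active span. Here is a minimal sketch of the pattern (the tracer and evaluator objects are created in the Usage section below, and run_llm_call() is a hypothetical stand-in for whatever operation you are tracing):

with tracer.start_as_current_span("my_operation") as span:
    output = run_llm_call()  # hypothetical helper representing your traced LLM call

    evaluator.evaluate(
        eval_templates="groundedness",
        inputs={"input": "user prompt here", "output": output},
        model_name="turing_large",
        custom_eval_name="groundedness_check",
        trace_eval=True,  # attach the result to the currently active span
    )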

Usage

Here’s how to get started with in-line evaluations.

1. Setup and Initialization

First, you need to set up your environment, register a tracer, and initialize the Evaluator.

import os
import openai
from fi_instrumentation import register, FITracer
from fi_instrumentation.fi_types import (
    ProjectType
)
from fi.evals import Evaluator


# Register the tracer
trace_provider = register(
    project_type=ProjectType.OBSERVE,
    project_name="YOUR_PROJECT_NAME",
    set_global_tracer_provider=True
)

# Initialize the Evaluator
evaluator = Evaluator(fi_api_key=os.getenv("FI_API_KEY"), fi_secret_key=os.getenv("FI_SECRET_KEY"))

# Create the OpenAI client and an FITracer for starting spans manually
client = openai.OpenAI()
tracer = FITracer(trace_provider.get_tracer(__name__))
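
The Evaluator reads your Future AGI credentials from the FI_API_KEY and FI_SECRET_KEY environment variables, and the OpenAI client expects OPENAI_API_KEY. A quick guard like the following catches missing keys early (a sketch; adapt it to however you manage secrets):

# Fail fast if any required credentials are missing (sketch; adjust to your secret management)
required = ["FI_API_KEY", "FI_SECRET_KEY", "OPENAI_API_KEY"]
missing = [name for name in required if not os.getenv(name)]
if missing:
    raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")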

2. Configure and Run In-line Evaluations

To attach an evaluation to a specific part of your code, run it within that span’s context; the evaluation result is automatically linked to the active span.

with tracer.start_as_current_span("parent_span") as span:
    completion = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "hi how are you?"}],
    )
    
    span.set_attribute("raw.input", "hi how are you?")
    span.set_attribute("raw.output", completion.choices[0].message.content)

    # Define evaluation configs
    config_groundedness = {
        "eval_templates": "groundedness",
        "inputs": {
            "input": "hi how are you?",
            "output": completion.choices[0].message.content,
        },
        "model_name": "turing_large",
    }

    # Run the evaluations with trace_eval=True
    eval_result1 = evaluator.evaluate(
        **config_groundedness, 
        custom_eval_name="groundedness_check", 
        trace_eval=True
    )

    print(eval_result1)

Key Parameters

When calling evaluator.evaluate():

  • trace_eval=True: This is the essential parameter that enables the in-line evaluation feature. It tells the system to find the current active span and attach the evaluation results to it.
  • custom_eval_name: Required. A unique, human-readable name for this evaluation instance. It distinguishes multiple evaluations, especially several of the same type, within a trace, as shown in the sketch below. The name appears in the UI.
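
For example, several evaluations of the same type can be attached to one span as long as each call gets a distinct custom_eval_name (a minimal sketch that reuses the objects from the setup above; the two names are illustrative, and in practice each call would typically evaluate a different input/output pair):

with tracer.start_as_current_span("parent_span") as span:
    completion = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "hi how are you?"}],
    )
    inputs = {
        "input": "hi how are you?",
        "output": completion.choices[0].message.content,
    }

    # Same template, two distinct names -- each result shows up separately in the UI
    evaluator.evaluate(
        eval_templates="groundedness",
        inputs=inputs,
        model_name="turing_large",
        custom_eval_name="groundedness_check_1",
        trace_eval=True,
    )
    evaluator.evaluate(
        eval_templates="groundedness",
        inputs=inputs,
        model_name="turing_large",
        custom_eval_name="groundedness_check_2",
        trace_eval=True,
    )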