Langfuse Integration

Integrate Future AGI evaluations with Langfuse to attach evaluation results directly to your Langfuse traces.

About

Langfuse provides tracing but does not have a built-in evaluation engine. This integration adds that missing piece. By setting platform="langfuse" on evaluator.evaluate(), Future AGI runs the evaluation and attaches the result as a score directly to the active Langfuse span. Metrics like tone, groundedness, and relevance appear alongside trace data in the Langfuse dashboard.


When to use

  • Monitor LLM quality in Langfuse: Correlate evaluation metrics (tone, groundedness, etc.) with specific spans and traces in the Langfuse UI.
  • Per-span evaluation scores: Attach evaluation results to any Langfuse span without configuring separate evaluation tasks.
  • End-to-end observability: Combine Future AGI evaluation templates with Langfuse tracing for comprehensive LLM application monitoring.

How to

Install the required packages

Install the necessary Python packages before you begin.

pip install ai-evaluation fi-instrumentation-otel

Set up your environment

Initialize both the Langfuse and Future AGI clients.

import os
from langfuse import Langfuse
from fi.evals import Evaluator


# 1. Initialize Langfuse
langfuse = Langfuse(
  secret_key=os.getenv("LANGFUSE_SECRET_KEY"),
  public_key=os.getenv("LANGFUSE_PUBLIC_KEY"),
  host=os.getenv("LANGFUSE_HOST")
)

# 2. Initialize the Future AGI Evaluator
evaluator = Evaluator(
    fi_api_key=os.getenv("FI_API_KEY"),
    fi_secret_key=os.getenv("FI_SECRET_KEY"),
)

Note

Make sure you have LANGFUSE_SECRET_KEY, LANGFUSE_PUBLIC_KEY, and LANGFUSE_HOST in your .env file, or pass them directly when initializing the Evaluator:

evaluator = Evaluator(
    fi_api_key=os.getenv("FI_API_KEY"),
    fi_secret_key=os.getenv("FI_SECRET_KEY"),
    langfuse_secret_key=os.getenv("LANGFUSE_SECRET_KEY"),
    langfuse_public_key=os.getenv("LANGFUSE_PUBLIC_KEY"),
    langfuse_host=os.getenv("LANGFUSE_HOST")
)

Run an evaluation within a Langfuse span

Call evaluator.evaluate() with platform="langfuse" inside an active Langfuse span. The evaluation result will be automatically linked to that span as a score.

from openai import OpenAI

# Assumes OPENAI_API_KEY is set in the environment
client = OpenAI()

# Your application logic, e.g. an LLM call
user_query = "Summarise our latest release notes in one sentence."

# Start a Langfuse span
with langfuse.start_as_current_observation(
    name="OpenAI call",
    input={"user_query": user_query},
) as span:

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "user", "content": user_query}
        ]
    )

    result = response.choices[0].message.content
    span.update(output={"response": result})

    # Evaluate the tone of the OpenAI response
    evaluator.evaluate(
        eval_templates="tone",
        inputs={
            "input": result
        },
        custom_eval_name="evaluate_tone",
        model_name="turing_large",
        platform="langfuse"
    )

The results will appear as scores for the span in your Langfuse project.


Key concepts

  • platform="langfuse": The essential parameter that directs evaluation results to Langfuse and links them with the current active span.
  • custom_eval_name: Required. A unique, human-readable name for your evaluation instance. This name appears as the score label in the Langfuse UI, helping you distinguish between different evaluations.
  • eval_templates: The name of the evaluation template from the Future AGI AI Evaluations library (e.g., "tone", "groundedness").
  • inputs: The data passed to the evaluation template (e.g., input, output, context, depending on the template).
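To make the last two bullets concrete, here is a small sketch of how the inputs dict changes shape per template. Only the "tone" keys come from the example above; the "groundedness" key names (output, context) are assumptions for illustration, so check the template's documentation for the exact names:

```python
# Taken from the tone example above: a single "input" key.
tone_inputs = {
    "input": "this is a sample response.",
}

# Hypothetical shape for a groundedness-style template,
# which also needs the model answer and the source context.
groundedness_inputs = {
    "input": "What is the capital of France?",    # user query
    "output": "Paris is the capital of France.",  # model answer
    "context": "Paris has been France's capital since 987.",
}

# Each dict would be passed as inputs= to evaluator.evaluate(),
# alongside eval_templates="tone" or "groundedness".
print(sorted(tone_inputs))
print(sorted(groundedness_inputs))
```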
