Langfuse Integration

Integrate Future AGI evaluations with Langfuse to attach evaluation results directly to your Langfuse traces.

What it is

Langfuse Integration lets you attach evaluation results from the Future AGI AI Evaluations library directly to your Langfuse traces. When you call evaluator.evaluate() with platform="langfuse" inside an active Langfuse span, the evaluation runs and the results are automatically attached as scores to that specific span in your Langfuse dashboard.

Use cases

  • Monitor LLM quality in Langfuse — Correlate evaluation metrics (tone, groundedness, etc.) with specific spans and traces in the Langfuse UI.
  • Per-span evaluation scores — Attach evaluation results to any Langfuse span without configuring separate evaluation tasks.
  • End-to-end observability — Combine Future AGI evaluation templates with Langfuse tracing for comprehensive LLM application monitoring.

How to

Install the required packages

Install the necessary Python packages before you begin.

pip install ai-evaluation fi-instrumentation-otel

Set up your environment

Initialize both the Langfuse and Future AGI clients.

import os
from langfuse import Langfuse
from fi.evals import Evaluator


# 1. Initialize Langfuse
langfuse = Langfuse(
    secret_key=os.getenv("LANGFUSE_SECRET_KEY"),
    public_key=os.getenv("LANGFUSE_PUBLIC_KEY"),
    host=os.getenv("LANGFUSE_HOST")
)

# 2. Initialize the Future AGI Evaluator
evaluator = Evaluator(
    fi_api_key=os.getenv("FI_API_KEY"),
    fi_secret_key=os.getenv("FI_SECRET_KEY"),
)
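Before constructing the clients, it can help to fail fast when a credential is missing. The helper below is a minimal sketch and not part of either SDK; the variable names match the ones used on this page.

```python
import os

# Hypothetical helper (not part of either SDK): returns the names of
# required credentials that are absent from the environment.
REQUIRED_VARS = [
    "FI_API_KEY", "FI_SECRET_KEY",
    "LANGFUSE_SECRET_KEY", "LANGFUSE_PUBLIC_KEY", "LANGFUSE_HOST",
]

def missing_env_vars(names=REQUIRED_VARS):
    return [name for name in names if not os.getenv(name)]

# Call before constructing the clients, e.g.:
# missing = missing_env_vars()
# if missing:
#     raise RuntimeError(f"Missing env vars: {', '.join(missing)}")
```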

Note

Make sure you have LANGFUSE_SECRET_KEY, LANGFUSE_PUBLIC_KEY, and LANGFUSE_HOST in your .env file, or pass them directly when initializing the Evaluator:

evaluator = Evaluator(
    fi_api_key=os.getenv("FI_API_KEY"),
    fi_secret_key=os.getenv("FI_SECRET_KEY"),
    langfuse_secret_key=os.getenv("LANGFUSE_SECRET_KEY"),
    langfuse_public_key=os.getenv("LANGFUSE_PUBLIC_KEY"),
    langfuse_host=os.getenv("LANGFUSE_HOST")
)
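For reference, a .env file covering both clients might look like this (placeholder values; the Langfuse host shown is the default cloud endpoint):

```shell
FI_API_KEY=your-futureagi-api-key
FI_SECRET_KEY=your-futureagi-secret-key
LANGFUSE_SECRET_KEY=sk-lf-your-secret-key
LANGFUSE_PUBLIC_KEY=pk-lf-your-public-key
LANGFUSE_HOST=https://cloud.langfuse.com
```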

Run an evaluation within a Langfuse span

Call evaluator.evaluate() with platform="langfuse" inside an active Langfuse span. The evaluation result will be automatically linked to that span as a score.

from openai import OpenAI

# Your application logic, e.g. an LLM call
client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
user_query = "What is a sample response?"

# Start a Langfuse span
with langfuse.start_as_current_span(
    name="OpenAI call",
    input={"user_query": user_query},
) as span:

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "user", "content": user_query}
        ]
    )

    result = response.choices[0].message.content
    span.update(output={"response": result})

    # Evaluate the tone of the OpenAI response
    evaluator.evaluate(
        eval_templates="tone",
        inputs={
            "input": result
        },
        custom_eval_name="evaluate_tone",
        model_name="turing_large",
        platform="langfuse"
    )

The results will appear as scores for the span in your Langfuse project.


Key concepts

  • platform="langfuse" — The essential parameter that directs evaluation results to Langfuse and links them with the current active span.
  • custom_eval_name — Required. A unique, human-readable name for your evaluation instance. This name appears as the score label in the Langfuse UI, helping you distinguish between different evaluations.
  • eval_templates — The name of the evaluation template from the Future AGI AI Evaluations library (e.g., "tone", "groundedness").
  • inputs — The data passed to the evaluation template (e.g., input, output, context depending on the template).
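To illustrate how these parameters fit together, here is a hedged sketch of the keyword arguments for two evaluate() calls. The template names ("tone", "groundedness") come from this page; the exact input keys each template expects should be verified against the AI Evaluations library documentation.

```python
# Sketch only: dictionaries mirroring the keyword arguments shown above.
# "tone" scores a single text; "groundedness" (per this page) may also
# take output/context keys -- verify against the evaluation library docs.
tone_call = dict(
    eval_templates="tone",
    inputs={"input": "Thanks for reaching out! Happy to help."},
    custom_eval_name="evaluate_tone",
    platform="langfuse",  # routes the score to the active Langfuse span
)

groundedness_call = dict(
    eval_templates="groundedness",
    inputs={
        "input": "What is the capital of France?",
        "output": "Paris is the capital of France.",
        "context": "Paris is the capital and largest city of France.",
    },
    custom_eval_name="evaluate_groundedness",
    platform="langfuse",
)

# Inside an active Langfuse span you would then call, e.g.:
# evaluator.evaluate(**tone_call)
```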
