Langfuse Integration

Integrate Future AGI evaluations with Langfuse to attach evaluation results directly to your Langfuse traces.

About

Langfuse provides tracing but does not have a built-in evaluation engine. This integration adds that missing piece. By setting platform="langfuse" on evaluator.evaluate(), Future AGI runs the evaluation and attaches the result as a score directly to the active Langfuse span. Metrics like tone, groundedness, and relevance appear alongside trace data in the Langfuse dashboard.


When to use

  • Monitor LLM quality in Langfuse: Correlate evaluation metrics (tone, groundedness, etc.) with specific spans and traces in the Langfuse UI.
  • Per-span evaluation scores: Attach evaluation results to any Langfuse span without configuring separate evaluation tasks.
  • End-to-end observability: Combine Future AGI evaluation templates with Langfuse tracing for comprehensive LLM application monitoring.

How to

Install the required packages

Install the necessary Python packages before you begin.

pip install ai-evaluation fi-instrumentation-otel

Set up your environment

Initialize both the Langfuse and Future AGI clients.

import os
from langfuse import Langfuse
from fi.evals import Evaluator


# 1. Initialize Langfuse
langfuse = Langfuse(
  secret_key=os.getenv("LANGFUSE_SECRET_KEY"),
  public_key=os.getenv("LANGFUSE_PUBLIC_KEY"),
  host=os.getenv("LANGFUSE_HOST")
)

# 2. Initialize the Future AGI Evaluator
evaluator = Evaluator(
    fi_api_key=os.getenv("FI_API_KEY"),
    fi_secret_key=os.getenv("FI_SECRET_KEY"),
)

Note

Make sure you have LANGFUSE_SECRET_KEY, LANGFUSE_PUBLIC_KEY, and LANGFUSE_HOST in your .env file, or pass them directly when initializing the Evaluator:

evaluator = Evaluator(
    fi_api_key=os.getenv("FI_API_KEY"),
    fi_secret_key=os.getenv("FI_SECRET_KEY"),
    langfuse_secret_key=os.getenv("LANGFUSE_SECRET_KEY"),
    langfuse_public_key=os.getenv("LANGFUSE_PUBLIC_KEY"),
    langfuse_host=os.getenv("LANGFUSE_HOST")
)

Run an evaluation within a Langfuse span

Call evaluator.evaluate() with platform="langfuse" inside an active Langfuse span. The evaluation result will be automatically linked to that span as a score.

from openai import OpenAI

# Assumes OPENAI_API_KEY is set in the environment
client = OpenAI()

# Your application logic, e.g. an LLM call
user_query = "Summarise our latest release notes in one sentence."

# Start a Langfuse span
with langfuse.start_as_current_observation(
    name="OpenAI call",
    input={"user_query": user_query},
) as span:

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "user", "content": user_query}
        ]
    )

    result = response.choices[0].message.content
    span.update(output={"response": result})

    # Evaluate the tone of the OpenAI response
    evaluator.evaluate(
        eval_templates="tone",
        inputs={
            "input": result
        },
        custom_eval_name="evaluate_tone",
        model_name="turing_large",
        platform="langfuse"
    )

The results will appear as scores for the span in your Langfuse project.


Key concepts

  • platform="langfuse": The essential parameter that directs evaluation results to Langfuse and links them with the current active span.
  • custom_eval_name: Required. A unique, human-readable name for your evaluation instance. This name appears as the score label in the Langfuse UI, helping you distinguish between different evaluations.
  • eval_templates: The name of the evaluation template from the Future AGI AI Evaluations library (e.g., "tone", "groundedness").
  • inputs: The data passed to the evaluation template (e.g., input, output, context, depending on the template).
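To make the last two bullets concrete, here is a small sketch of how the inputs dict changes shape per template. Only the "tone" keys come from the example above; the "groundedness" key names (output, context) are assumptions for illustration, so check the template's documentation for the exact names:

```python
# Taken from the tone example above: a single "input" key.
tone_inputs = {
    "input": "this is a sample response.",
}

# Hypothetical shape for a groundedness-style template,
# which also needs the model answer and the source context.
groundedness_inputs = {
    "input": "What is the capital of France?",    # user query
    "output": "Paris is the capital of France.",  # model answer
    "context": "Paris has been France's capital since 987.",
}

# Each dict would be passed as inputs= to evaluator.evaluate(),
# alongside eval_templates="tone" or "groundedness".
print(sorted(tone_inputs))
print(sorted(groundedness_inputs))
```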
