Langfuse Integration
Integrate Future AGI evaluations with Langfuse to attach evaluation results directly to your Langfuse traces.
About
Langfuse provides tracing but does not have a built-in evaluation engine. This integration adds that missing piece. By setting platform="langfuse" on evaluator.evaluate(), Future AGI runs the evaluation and attaches the result as a score directly to the active Langfuse span. Metrics like tone, groundedness, and relevance appear alongside trace data in the Langfuse dashboard.
When to use
- Monitor LLM quality in Langfuse: Correlate evaluation metrics (tone, groundedness, etc.) with specific spans and traces in the Langfuse UI.
- Per-span evaluation scores: Attach evaluation results to any Langfuse span without configuring separate evaluation tasks.
- End-to-end observability: Combine Future AGI evaluation templates with Langfuse tracing for comprehensive LLM application monitoring.
How to
Install the required packages
Install the necessary Python packages before you begin.
```bash
pip install ai-evaluation fi-instrumentation-otel
```
Set up your environment
Initialize both the Langfuse and Future AGI clients.
```python
import os

from langfuse import Langfuse
from fi.evals import Evaluator

# 1. Initialize Langfuse
langfuse = Langfuse(
    secret_key=os.getenv("LANGFUSE_SECRET_KEY"),
    public_key=os.getenv("LANGFUSE_PUBLIC_KEY"),
    host=os.getenv("LANGFUSE_HOST"),
)

# 2. Initialize the Future AGI Evaluator
evaluator = Evaluator(
    fi_api_key=os.getenv("FI_API_KEY"),
    fi_secret_key=os.getenv("FI_SECRET_KEY"),
)
```
Note
Make sure you have LANGFUSE_SECRET_KEY, LANGFUSE_PUBLIC_KEY, and LANGFUSE_HOST in your .env file, or pass them directly when initializing the Evaluator:
```python
evaluator = Evaluator(
    fi_api_key=os.getenv("FI_API_KEY"),
    fi_secret_key=os.getenv("FI_SECRET_KEY"),
    langfuse_secret_key=os.getenv("LANGFUSE_SECRET_KEY"),
    langfuse_public_key=os.getenv("LANGFUSE_PUBLIC_KEY"),
    langfuse_host=os.getenv("LANGFUSE_HOST"),
)
```
Run an evaluation within a Langfuse span
Call evaluator.evaluate() with platform="langfuse" inside an active Langfuse span. The evaluation result will be automatically linked to that span as a score.
```python
from openai import OpenAI

# Your application logic, e.g. an LLM call
client = OpenAI()  # reads OPENAI_API_KEY from the environment
user_query = "What is the capital of France?"

# Start a Langfuse span
with langfuse.start_as_current_observation(
    name="OpenAI call",
    input={"user_query": user_query},
) as span:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "user", "content": user_query}
        ],
    )
    result = response.choices[0].message.content
    span.update(output={"response": result})

    # Evaluate the tone of the OpenAI response
    evaluator.evaluate(
        eval_templates="tone",
        inputs={
            "input": result
        },
        custom_eval_name="evaluate_tone",
        model_name="turing_large",
        platform="langfuse",
    )
```
The results appear as scores on the span in your Langfuse project. The Langfuse SDK batches events, so in a short-lived script call langfuse.flush() before exiting to make sure queued scores are sent.
Key concepts
- platform="langfuse": The essential parameter that directs evaluation results to Langfuse and links them to the current active span.
- custom_eval_name: Required. A unique, human-readable name for your evaluation instance. This name appears as the score label in the Langfuse UI, helping you distinguish between different evaluations.
- eval_templates: The name of the evaluation template from the Future AGI AI Evaluations library (e.g., "tone", "groundedness").
- inputs: The data passed to the evaluation template (e.g., input, output, or context, depending on the template).
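The parameters above can be bundled into a small helper that fails fast when a required field is missing. This is a hypothetical convenience wrapper, not part of the Langfuse or Future AGI SDKs:

```python
# Hypothetical helper -- not part of either SDK. It packages the
# parameters described above and validates the required ones before
# the call is made.
def build_eval_kwargs(eval_templates, inputs, custom_eval_name,
                      model_name="turing_large"):
    if not custom_eval_name:
        # custom_eval_name becomes the score label in the Langfuse UI
        raise ValueError("custom_eval_name is required")
    if not inputs:
        raise ValueError(
            f"template {eval_templates!r} needs at least one input field"
        )
    return {
        "eval_templates": eval_templates,
        "inputs": inputs,
        "custom_eval_name": custom_eval_name,
        "model_name": model_name,
        "platform": "langfuse",  # routes the score to the active span
    }

kwargs = build_eval_kwargs("tone", {"input": "Hello!"}, "evaluate_tone")
# Inside an active Langfuse span: evaluator.evaluate(**kwargs)
print(kwargs["platform"])
```

Keeping platform="langfuse" inside the helper means every evaluation built this way is routed to the active span by default.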
Next Steps
Running Your First Eval
Learn how to run evaluations using the Future AGI AI Evaluations library.
In-line Evaluations
Run evaluations directly inside a traced span with Future AGI tracing.
Set Up Tracing
Register a tracer provider and add instrumentation.
Auto Instrumentation
Browse all supported framework instrumentors.