Langfuse Integration
Integrate Future AGI evaluations with Langfuse
Overview
Future AGI’s evaluation platform can be seamlessly integrated with Langfuse, allowing you to attach evaluation results from our AI Evaluations library directly to your Langfuse traces. This enables you to monitor the performance and quality of your LLM applications within the Langfuse UI, correlating evaluation metrics with specific spans and traces.
How it works
When you call evaluator.evaluate() with the platform="langfuse" parameter inside an active Langfuse span, the evaluation is executed and its results are automatically attached as scores to that span in your Langfuse dashboard.
Usage
1. Installation
Before you begin, install the necessary Python packages: the Future AGI AI Evaluations library and the Langfuse SDK.
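To confirm that the installation succeeded, a quick check like the one below can help. It is only a sketch: the distribution names "ai-evaluation" and "langfuse" are assumptions about how the two packages are published, so adjust them if your package index uses different names.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Assumed distribution names; change these if your environment differs.
for name in ("ai-evaluation", "langfuse"):
    version = installed_version(name)
    print(f"{name}: {version or 'not installed'}")
```

If either line prints "not installed", install that package before continuing.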
2. Setup and Initialization
First, you need to set up your environment by initializing both the Langfuse and Future AGI clients.
Make sure LANGFUSE_SECRET_KEY, LANGFUSE_PUBLIC_KEY, and LANGFUSE_HOST are set in your .env file, or pass them as arguments when initializing the Evaluator class.
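Before creating the clients, it can be useful to verify that the three Langfuse settings are actually visible to your process. The helper below is a minimal sketch using only the standard library; the function name missing_langfuse_vars is hypothetical. Values in a .env file are not loaded into the environment automatically, so this assumes you have already loaded the file (for example with python-dotenv's load_dotenv) or exported the variables in your shell.

```python
import os

# The three settings named in this guide.
REQUIRED_VARS = ("LANGFUSE_SECRET_KEY", "LANGFUSE_PUBLIC_KEY", "LANGFUSE_HOST")


def missing_langfuse_vars(env=None):
    """Return the names of required Langfuse settings that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_langfuse_vars()
if missing:
    print("Missing Langfuse settings:", ", ".join(missing))
else:
    print("All Langfuse settings are present.")
```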
3. Configure and Run Evaluations within a Langfuse Span
To link an evaluation to a specific operation in your code, run the evaluation within the context of a Langfuse span. The evaluation result will be automatically linked to that span.
In this example, we will run a levenshtein_similarity evaluation.
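The pattern can be sketched roughly as follows. This is an illustration under stated assumptions, not the library's confirmed signature: the keyword arguments mirror the parameter names described in this guide, the input keys "response" and "expected_text" and the evaluation name are hypothetical, and start_as_current_span is the span context manager from the Langfuse Python SDK v3 (older SDK versions expose a different interface). The initialized clients are passed in as arguments so the sketch itself imports nothing.

```python
def evaluate_in_span(langfuse, evaluator, response, expected_text):
    """Run a levenshtein_similarity evaluation inside an active Langfuse span.

    `langfuse` is an initialized Langfuse client and `evaluator` an initialized
    Future AGI Evaluator; both are created during setup and passed in here.
    """
    # Everything inside this block runs with the span as the current span.
    with langfuse.start_as_current_span(name="levenshtein-similarity-check"):
        # Assumed call shape: check the Future AGI reference for exact arguments.
        return evaluator.evaluate(
            eval_templates="levenshtein_similarity",
            inputs={"response": response, "expected_text": expected_text},  # assumed keys
            custom_eval_name="levenshtein_similarity_check",  # hypothetical name
            platform="langfuse",
        )
```

Because the evaluation call sits inside the with block, the span is still active when the results are reported, which is what allows them to be linked to it.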
The results will appear as scores for the span in your Langfuse project.
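For intuition about what such a score represents, the sketch below computes a normalized Levenshtein similarity in plain Python. It is an independent illustration, not Future AGI's implementation; the platform's exact normalization and scale may differ.

```python
def levenshtein_distance(a, b):
    """Minimum number of single-character insertions, deletions, or substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep when equal
            ))
        previous = current
    return previous[-1]


def levenshtein_similarity(a, b):
    """Map the distance to [0, 1], where 1.0 means the strings are identical."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein_distance(a, b) / longest
```

For example, "kitten" and "sitting" differ by three edits over a maximum length of seven, so this sketch gives a similarity of 1 - 3/7, roughly 0.57.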
To learn more about running other evaluations, refer to the evaluations documentation.
When calling evaluator.configure_evaluations():
platform="langfuse": This essential parameter directs the evaluation results to Langfuse and links them to the currently active span.
custom_eval_name: This required parameter provides a unique, human-readable name for your evaluation instance. The name appears on the score in the Langfuse UI, helping you distinguish between different evaluations.
eval_config: This dictionary contains the configuration for the evaluation, including the eval_templates to use and the inputs for the evaluation.
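The three parameters above can be assembled as in the sketch below. The helper name build_evaluation_kwargs and the example input keys are hypothetical; only the parameter names platform, custom_eval_name, and eval_config, and the eval_templates and inputs keys, come from this guide.

```python
def build_evaluation_kwargs(custom_eval_name, eval_template, inputs):
    """Assemble the keyword arguments described above into one dictionary."""
    # custom_eval_name is required, so reject empty or whitespace-only names early.
    if not custom_eval_name or not custom_eval_name.strip():
        raise ValueError("custom_eval_name is required")
    return {
        "platform": "langfuse",
        "custom_eval_name": custom_eval_name,
        "eval_config": {
            "eval_templates": eval_template,
            "inputs": dict(inputs),  # copy so later edits do not affect the caller
        },
    }


kwargs = build_evaluation_kwargs(
    custom_eval_name="levenshtein_similarity_check",  # hypothetical name
    eval_template="levenshtein_similarity",
    inputs={"response": "hello world", "expected_text": "hello word"},  # assumed keys
)
print(kwargs["eval_config"]["eval_templates"])
```

Under these assumptions, the resulting dictionary could then be unpacked into the call, for example evaluator.configure_evaluations(**kwargs), from within an active Langfuse span.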