Step-by-Step Guide to Creating an Eval Task
1. Set Filters Based on Span Kind
Begin by defining a set of filters to narrow down the data you want to evaluate. Filters can be based on various properties, such as:
- Node Type
- Created At
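As a rough illustration of what such filters do, the sketch below applies a node-type filter and a created-at filter to a few in-memory span records. The record layout, the field names `node_type` and `created_at`, and the helper `filter_spans` are hypothetical and chosen only for this example; they are not the platform's API.

```python
from datetime import datetime, timezone

# Hypothetical span records; field names are illustrative, not a real schema.
spans = [
    {"id": "a", "node_type": "llm", "created_at": datetime(2024, 5, 1, tzinfo=timezone.utc)},
    {"id": "b", "node_type": "retriever", "created_at": datetime(2024, 5, 2, tzinfo=timezone.utc)},
    {"id": "c", "node_type": "llm", "created_at": datetime(2024, 4, 1, tzinfo=timezone.utc)},
]


def filter_spans(spans, node_type, created_after):
    """Keep spans of the given node type created on or after the cutoff."""
    return [
        s for s in spans
        if s["node_type"] == node_type and s["created_at"] >= created_after
    ]


cutoff = datetime(2024, 4, 15, tzinfo=timezone.utc)
selected_ids = [s["id"] for s in filter_spans(spans, "llm", cutoff)]
```

Only span `a` satisfies both conditions here: `b` has a different node type and `c` was created before the cutoff.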
2. Choose Data Type
Decide whether you want to run the Evals on:
- Historic Data: Apply Evals to a specified time range of already-collected data.
- Continuous Data: Run the Evals automatically as new data arrives. Recommended for continuous monitoring in a production environment.
3. Define Sampling Rate
Set a sampling rate to determine the percentage of data to process. A sampling rate of 100% means all data items are used, whereas 50% means only half of the available data is used for evaluation. This helps control costs and manage data volume.
4. Set Maximum Number of Spans
Define the maximum number of spans for each evaluation run. This ensures your evaluation scales well and avoids processing excessive amounts of data at once.
5. Select Evals to Run
Choose from a list of preset or previously configured evaluations (Evals) to apply to your filtered data. This selection determines which evaluations are executed.
Each evaluation requires specific keys. Bias Detection, for example, requires an input key. Every span contains key-value pairs, known as span attributes, where the data is stored, and you supply one of these span attributes as the input. For instance, if you pass llm.output_messages.0.message.content as the input, the Bias Detection evaluation determines whether that content is biased. The evaluation returns Passed if the content is neutral, or Failed if any bias is detected.
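The key mapping described above can be pictured with a small sketch. It assumes span attributes are available as a flat dictionary keyed by attribute name; the helpers `resolve_inputs` and `to_result`, the `key_mapping` variable, and the sample text are hypothetical. The bias check itself is not implemented here; only the attribute lookup and the Passed/Failed labeling are shown.

```python
# Span attributes as flat key-value pairs (the value is made up for illustration).
span_attributes = {
    "llm.output_messages.0.message.content": "Revenue grew 4% over the previous quarter.",
}

# Hypothetical mapping: the key the eval requires -> the span attribute that supplies it.
key_mapping = {"input": "llm.output_messages.0.message.content"}


def resolve_inputs(attributes, mapping):
    """Look up each required eval key in the span attributes."""
    missing = [attr for attr in mapping.values() if attr not in attributes]
    if missing:
        raise KeyError(f"span is missing attributes: {missing}")
    return {key: attributes[attr] for key, attr in mapping.items()}


def to_result(is_biased):
    """Translate a bias verdict into the labels the guide describes."""
    return "Failed" if is_biased else "Passed"


eval_inputs = resolve_inputs(span_attributes, key_mapping)
```

A span that lacks the mapped attribute raises a `KeyError` in this sketch, which is one way to surface a misconfigured key early; how the actual product handles missing attributes is not stated here.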
For more information on the evaluations we support, please refer to the evals documentation.
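To make steps 3 and 4 concrete, the sketch below shows one way a sampling rate and a span cap could combine. It is an illustration under stated assumptions: the sampling rate is expressed as a fraction between 0 and 1, sampling happens before the cap is applied, and the function `sample_spans` and its parameters are hypothetical. The guide does not specify how the product orders or implements these two settings.

```python
import random


def sample_spans(span_ids, sampling_rate, max_spans, seed=0):
    """Randomly keep a fraction of the spans, then cap the result at max_spans.

    Assumption: sampling is applied first and the cap second.
    """
    if not 0.0 <= sampling_rate <= 1.0:
        raise ValueError("sampling_rate must be between 0 and 1")
    rng = random.Random(seed)  # fixed seed so the example is repeatable
    keep = int(len(span_ids) * sampling_rate)
    sampled = rng.sample(span_ids, keep)
    return sampled[:max_spans]


span_ids = [f"span-{i}" for i in range(1000)]

all_spans = sample_spans(span_ids, sampling_rate=1.0, max_spans=1000)
half = sample_spans(span_ids, sampling_rate=0.5, max_spans=1000)
capped = sample_spans(span_ids, sampling_rate=0.5, max_spans=100)
```

With 1,000 available spans, a rate of 1.0 keeps all of them, a rate of 0.5 keeps 500, and adding a cap of 100 limits the run to 100 spans regardless of the sampling rate.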