Eval Definition
Semantic List Contains
Evaluates whether a generated response semantically contains one or more reference phrases or keywords.
Overview
This evaluation checks whether the model’s output is semantically close to any of the key phrases provided. It is especially useful when the exact wording may differ but the meaning is preserved, or when the reference is a set of expected keywords.
How Does the Semantic List Contains Eval Work?
- Encodes both the response and the reference text into dense vectors using a SentenceTransformer.
- Computes the cosine similarity between the response and each phrase.
- Compares each similarity with a configurable threshold (e.g., 0.7).
- Returns 1.0 (match) or 0.0 (no match), depending on the matching mode:
  - Any match (match_all = False, default)
  - All match (match_all = True)
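The steps above can be sketched in plain Python. This is a minimal illustration, not Future AGI's implementation: the function name `semantic_list_contains`, the pluggable `embed` argument, the use of `>=` for the threshold comparison, and the hand-made `TOY_VECTORS` are all assumptions made so the example runs without downloading a model.

```python
import math
from typing import Callable, List, Sequence, Union

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_list_contains(
    response: str,
    expected_text: Union[str, List[str]],
    embed: Callable[[str], Vector],  # hypothetical hook: any text -> vector encoder
    similarity_threshold: float = 0.7,
    match_all: bool = False,
) -> float:
    """Return 1.0 if the response matches any (or all) phrases, else 0.0."""
    phrases = [expected_text] if isinstance(expected_text, str) else list(expected_text)
    if not phrases:
        return 0.0
    response_vec = embed(response)
    # Assumption: a phrase counts as matched when similarity >= threshold.
    hits = [
        cosine_similarity(response_vec, embed(phrase)) >= similarity_threshold
        for phrase in phrases
    ]
    matched = all(hits) if match_all else any(hits)
    return 1.0 if matched else 0.0


# Hand-made 3-dimensional "embeddings" standing in for a real sentence encoder.
TOY_VECTORS = {
    "The capital of France is Paris.": (0.9, 0.4, 0.1),
    "Paris is France's capital": (0.8, 0.5, 0.2),
    "stock market news": (0.0, 0.1, 0.9),
}


def toy_embed(text: str) -> Vector:
    return TOY_VECTORS[text]


response = "The capital of France is Paris."
phrases = ["Paris is France's capital", "stock market news"]

any_score = semantic_list_contains(response, phrases, toy_embed)
all_score = semantic_list_contains(response, phrases, toy_embed, match_all=True)
print(any_score, all_score)
```

With the toy vectors, only the first phrase clears the 0.7 threshold, so the any-match mode passes while the all-match mode does not. In practice, `embed` could wrap a real encoder such as the `encode` method of a `sentence_transformers.SentenceTransformer` model; which model the eval itself uses is not stated here.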
Semantic List Contains Eval using Future AGI’s Python SDK
Click here to learn how to set up evaluation using the Python SDK.
Input & Configuration:
Group | Parameter | Type | Description |
---|---|---|---|
Required Inputs | response | str | Model-generated output to be evaluated |
Required Inputs | expected_text | str or List[str] | A single phrase or list of phrases that the response is expected to semantically include |
Optional Config | case_insensitive | bool | Whether to lowercase input texts before comparison. Default: True |
Optional Config | remove_punctuation | bool | Whether to strip punctuation from texts. Default: True |
Optional Config | match_all | bool | If True, all phrases must be semantically present; if False, any one match is enough. Default: False |
Optional Config | similarity_threshold | float | Similarity threshold for considering a match. Typical range: 0.5–0.9. Default: 0.7 |
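To illustrate what the case_insensitive and remove_punctuation options might do to the texts before they are encoded, here is a small sketch. The helper name `normalize` and the whitespace collapsing at the end are assumptions for illustration; the SDK's actual preprocessing may differ.

```python
import string


def normalize(
    text: str,
    case_insensitive: bool = True,
    remove_punctuation: bool = True,
) -> str:
    """Hypothetical preprocessing applied to both response and phrases."""
    if case_insensitive:
        text = text.lower()
    if remove_punctuation:
        # Drop every ASCII punctuation character.
        text = text.translate(str.maketrans("", "", string.punctuation))
    # Assumption: collapse runs of whitespace left behind by the steps above.
    return " ".join(text.split())


normalized = normalize("Hello, World!  It's sunny.")
print(normalized)  # hello world its sunny
```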
Output:
Output Field | Type | Description |
---|---|---|
score | float | Returns a float between 0.0 and 1.0: closer to 1.0 when the match criteria are better satisfied, closer to 0.0 otherwise |
metadata | dict | Contains similarity values for each phrase, the threshold, and match logic used |
Example:
Output:
What if the Semantic List Contains Eval Score Is Low?
- Lower the similarity_threshold value (if your use case allows relaxed semantic matches).
- Use match_all = False if partial coverage is acceptable.
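The effect of these two adjustments can be seen with a short sketch that recomputes the score from per-phrase similarity values. The helper `score_from_similarities`, the `>=` comparison, and the similarity numbers are hypothetical; in practice the values would come from the eval's metadata.

```python
from typing import Dict


def score_from_similarities(
    similarities: Dict[str, float],
    similarity_threshold: float,
    match_all: bool,
) -> float:
    """Recompute a 1.0/0.0 score from per-phrase similarity values."""
    hits = [value >= similarity_threshold for value in similarities.values()]
    if not hits:
        return 0.0
    return 1.0 if (all(hits) if match_all else any(hits)) else 0.0


# Hypothetical similarities for two expected phrases.
similarities = {"refund policy": 0.64, "30-day window": 0.41}

strict_all = score_from_similarities(similarities, 0.7, match_all=True)
strict_any = score_from_similarities(similarities, 0.7, match_all=False)
relaxed_any = score_from_similarities(similarities, 0.6, match_all=False)
relaxed_all = score_from_similarities(similarities, 0.4, match_all=True)
print(strict_all, strict_any, relaxed_any, relaxed_all)
```

Here neither phrase clears 0.7, so both strict settings fail. Dropping the threshold to 0.6 is enough for the any-match mode, while the all-match mode only passes once the threshold falls to 0.4, which may be too permissive for many use cases.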