Overview

This evaluation checks whether the model's output semantically matches any of the provided key phrases. The metric is especially useful when exact wording may differ but meaning is preserved, or when the reference is a set of expected keywords.

How Does the Semantic List Contains Eval Work?

  1. Encodes both the response and the reference phrases into dense vectors using a SentenceTransformer.
  2. Computes cosine similarity between the response and each phrase.
  3. Compares each similarity score against a configurable threshold (e.g., 0.7).
  4. Returns 1.0 (match) or 0.0 (no match) depending on whether:
    • any phrase matches (match_all = False, default), or
    • all phrases match (match_all = True).
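The steps above can be sketched in plain Python. This is an illustrative implementation, not the SDK's internal code: `encode` stands in for a SentenceTransformer's `model.encode`, replaced here with a toy character-frequency embedding so the sketch runs without downloading a model.

```python
import math

def encode(text):
    # Toy stand-in for SentenceTransformer.encode: a fixed-size
    # character-frequency vector. The real eval uses dense sentence embeddings.
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def semantic_list_contains(response, expected_text,
                           similarity_threshold=0.7, match_all=False):
    # Accept a single phrase or a list of phrases.
    phrases = [expected_text] if isinstance(expected_text, str) else expected_text
    response_vec = encode(response)                                   # step 1
    similarities = {p: cosine_similarity(response_vec, encode(p))
                    for p in phrases}                                 # step 2
    matches = [s >= similarity_threshold for s in similarities.values()]  # step 3
    matched = all(matches) if match_all else any(matches)             # step 4
    return (1.0 if matched else 0.0), similarities
```

With `match_all=False`, a single phrase clearing the threshold yields a score of 1.0; with `match_all=True`, every phrase must clear it.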

Evaluation Using SDK

Click here to learn how to set up evaluation using the SDK.
Input & Configuration:
| Parameter | Type | Description |
| --- | --- | --- |
| **Required Inputs** | | |
| `response` | `str` | Model-generated output to be evaluated |
| `expected_text` | `str` or `List[str]` | A single phrase or list of phrases that the response is expected to semantically include |
| **Optional Config** | | |
| `case_insensitive` | `bool` | Whether to lowercase input texts before comparison. Default: `True` |
| `remove_punctuation` | `bool` | Whether to strip punctuation from texts. Default: `True` |
| `match_all` | `bool` | If `True`, all phrases must be semantically present; if `False`, any one match is enough. Default: `False` |
| `similarity_threshold` | `float` | Similarity threshold for considering a match. Typical range: 0.5 to 0.9. Default: `0.7` |
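When `case_insensitive` or `remove_punctuation` is enabled, both texts are normalized before encoding. A minimal sketch of that preprocessing (illustrative; the SDK's exact normalization may differ):

```python
import string

def preprocess(text, case_insensitive=True, remove_punctuation=True):
    # Normalize text before embedding, mirroring the optional config flags.
    if case_insensitive:
        text = text.lower()
    if remove_punctuation:
        text = text.translate(str.maketrans("", "", string.punctuation))
    return text
```

For example, `preprocess("Hello, World!")` yields `"hello world"`, so surface differences in casing and punctuation do not affect the similarity scores.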
Output:
| Output Field | Type | Description |
| --- | --- | --- |
| `score` | `float` | Float between 0.0 and 1.0; 1.0 if the match criteria are satisfied, 0.0 otherwise |
| `metadata` | `dict` | Contains similarity values for each phrase, the threshold, and the match logic used |
result = evaluator.evaluate(
    eval_templates="semantic_list_contains",
    inputs={
        "expected_text": "The Eiffel Tower is a famous landmark in Paris, built in 1889 for the World's Fair. It stands 324 meters tall.",
        "response": "The Eiffel Tower, located in Paris, was built in 1889 and is 324 meters high."
    },
    model_name="turing_flash"
)

print(result.eval_results[0].output)
print(result.eval_results[0].reason)
Output:
Score: 1.0
matches: [True, False, False]
similarities: {'brown fox': 0.6240062713623047, 'lazy dog': 0.5937517639250626, 'dancing giraffe': 0.28756572530065383}
threshold: 0.6
match_all: False
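The score follows directly from the metadata: with `match_all=False`, one similarity clearing the 0.6 threshold is enough. Reproducing that aggregation (note this sample output reflects a three-phrase reference list, not the single-string `expected_text` in the snippet above):

```python
# Similarity values taken from the sample metadata above.
similarities = {
    "brown fox": 0.6240062713623047,
    "lazy dog": 0.5937517639250626,
    "dancing giraffe": 0.28756572530065383,
}
threshold = 0.6
match_all = False

matches = [sim >= threshold for sim in similarities.values()]
# Only 'brown fox' (0.624) clears the 0.6 threshold.
matched = all(matches) if match_all else any(matches)
score = 1.0 if matched else 0.0
```

Here `matches` is `[True, False, False]` and `score` is `1.0`, matching the output shown.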

What if Semantic List Contains Eval Score is Low?

  • Lower the `similarity_threshold` value (if your use case allows relaxed semantic matches).
  • Set `match_all=False` (the default) if partial coverage is acceptable; any single matching phrase then yields a score of 1.0.