# `evaluator` is assumed to be an already-initialized evaluation client.
result = evaluator.evaluate(
    eval_templates="semantic_list_contains",
    inputs={
        "expected": "The Eiffel Tower is a famous landmark in Paris, built in 1889 for the World's Fair. It stands 324 meters tall.",
        "output": "The Eiffel Tower, located in Paris, was built in 1889 and is 324 meters high."
    },
    model_name="turing_flash"
)

print(result.eval_results[0].output)
print(result.eval_results[0].reason)
Input

Required Input | Type | Description
output | string | The content to check for semantic containment of the reference phrases.
expected | string or List[string] | A single phrase or list of phrases that the response is expected to semantically include.
Output

Field | Description
Result | Returns a score indicating whether the response semantically contains the expected phrases; higher values indicate better containment.
Reason | Provides a detailed explanation of the semantic list contains evaluation.

About Semantic List Contains

This evaluation checks whether the model’s output is semantically close to one or more of the key phrases provided. It is especially useful when the exact wording may differ while the meaning is preserved, or when the reference is a set of expected keywords.

How Do Semantic List Contains Evals Work?

  1. Encodes both the response and each reference phrase into dense vectors using a SentenceTransformer.
  2. Computes the cosine similarity between the response and each phrase.
  3. Compares each similarity with a configurable threshold (e.g., 0.7).
  4. Returns 1.0 (match) or 0.0 (no match), depending on the matching mode:
    • Any match (match_all = False, default): at least one phrase meets the threshold.
    • All match (match_all = True): every phrase meets the threshold.
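
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not the evaluator's actual implementation: `toy_embed` is a hypothetical bag-of-words stand-in for the SentenceTransformer encoder so the example runs without downloading a model, the function name and signature are assumptions, and treating a similarity equal to the threshold as a match (`>=`) is also an assumption.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List, Union

Vector = Dict[str, float]


def toy_embed(text: str) -> Vector:
    """Hypothetical stand-in for a sentence encoder: a bag-of-words count vector."""
    return dict(Counter(re.findall(r"[a-z0-9]+", text.lower())))


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(value * b.get(token, 0.0) for token, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_list_contains(
    output: str,
    expected: Union[str, List[str]],
    similarity_threshold: float = 0.7,
    match_all: bool = False,
    embed: Callable[[str], Vector] = toy_embed,
) -> float:
    """Return 1.0 if the output matches the expected phrase(s), else 0.0."""
    phrases = [expected] if isinstance(expected, str) else list(expected)
    if not phrases:
        return 0.0  # assumption: an empty reference never matches

    # Step 1: encode the response once, then each reference phrase.
    output_vector = embed(output)
    # Steps 2 and 3: cosine similarity per phrase, compared with the threshold.
    hits = [
        cosine_similarity(output_vector, embed(phrase)) >= similarity_threshold
        for phrase in phrases
    ]
    # Step 4: any-match by default, all-match when match_all is True.
    matched = all(hits) if match_all else any(hits)
    return 1.0 if matched else 0.0


phrases = ["the eiffel tower is in paris", "bananas are yellow"]
any_score = semantic_list_contains("The Eiffel Tower is in Paris", phrases)
all_score = semantic_list_contains("The Eiffel Tower is in Paris", phrases, match_all=True)
```

With the toy encoder, the first phrase shares every token with the output and the second shares none, so the default any-match mode passes while `match_all=True` fails. A real sentence encoder would also score paraphrases highly, which a bag-of-words vector cannot do.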

What if the Semantic List Contains Eval Score Is Low?

  • Lower the similarity_threshold value (if your use case allows relaxed semantic matches).
  • Set match_all = False if partial coverage is acceptable.
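
To see why these two adjustments raise the score, the sketch below applies the threshold and matching mode to a list of precomputed similarity values. The `decide` helper and the similarity numbers are hypothetical, chosen only for illustration, and the `>=` comparison is an assumption about how the threshold is applied.

```python
from typing import List


def decide(similarities: List[float], similarity_threshold: float, match_all: bool) -> float:
    """Turn per-phrase similarities into a 1.0 / 0.0 score (illustrative helper)."""
    hits = [s >= similarity_threshold for s in similarities]
    if not hits:
        return 0.0  # assumption: no phrases means no match
    matched = all(hits) if match_all else any(hits)
    return 1.0 if matched else 0.0


# Made-up similarities between one output and two expected phrases.
scores = [0.82, 0.64]

strict = decide(scores, similarity_threshold=0.7, match_all=True)
relaxed_threshold = decide(scores, similarity_threshold=0.6, match_all=True)
any_match = decide(scores, similarity_threshold=0.7, match_all=False)

print(strict, relaxed_threshold, any_match)
```

In this made-up case the strict configuration fails because the second phrase falls just under 0.7. Either lowering the threshold to 0.6 or switching to any-match mode turns the result into a pass.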
