# Embedding Similarity
Measures semantic similarity between the generated and reference text.
## Overview
It evaluates how similar two texts are in meaning by comparing their vector embeddings using distance-based similarity measures. Traditional metrics like BLEU or ROUGE rely on word overlap and can fail when the generated output is a valid paraphrase with no lexical match.
## How Is Similarity Calculated
Once both texts are encoded into high-dimensional vector representations, the similarity between the two vectors *u* and *v* is computed using one of the following methods:
- Cosine Similarity: Measures the cosine of the angle between vectors.
- Euclidean Distance: Measures the straight-line distance between vectors (L2 norm).
- Manhattan Distance: Measures the sum of absolute differences between vectors (L1 norm).
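The three measures above can be sketched in NumPy. Note that `similarity` here is a hypothetical helper for illustration, not part of the SDK; its `method` and `normalize` arguments mirror the configuration parameters described below.

```python
import numpy as np


def similarity(u: np.ndarray, v: np.ndarray,
               method: str = "cosine", normalize: bool = True) -> float:
    """Compare two embedding vectors with a chosen similarity/distance measure."""
    if normalize:
        # Scale each vector to unit length so magnitude does not dominate.
        u = u / np.linalg.norm(u)
        v = v / np.linalg.norm(v)
    if method == "cosine":
        # Cosine of the angle between the vectors, in [-1, 1].
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))
    if method == "euclidean":
        # Straight-line (L2) distance between the vectors.
        return float(np.linalg.norm(u - v))
    if method == "manhattan":
        # Sum of absolute coordinate differences (L1 distance).
        return float(np.sum(np.abs(u - v)))
    raise ValueError(f"unknown method: {method}")
```

Identical vectors yield a cosine similarity of 1 and a distance of 0 under either distance measure; for the distance methods, *smaller* raw values mean the texts are closer in meaning.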
## Embedding Similarity Eval Using Future AGI’s Python SDK
Click here to learn how to set up evaluation using the Python SDK.
Input & Configuration:

| Category | Parameter | Type | Description |
|---|---|---|---|
| Required Inputs | `response` | `str` | Model-generated output to be evaluated. |
| Required Inputs | `expected_text` | `str` or `List[str]` | One or more reference texts for comparison. |
| Optional Config | `similarity_method` | `str` | Distance function used to compare embedding vectors. Options: `"cosine"` (default), `"euclidean"`, `"manhattan"`. |
| Optional Config | `normalize` | `bool` | Whether to normalize embedding vectors before computing similarity. Default is `True`. |
Parameter Options:

| `similarity_method` option | Description |
|---|---|
| `cosine` | Measures the cosine of the angle between two vectors. |
| `euclidean` | Computes the straight-line (L2) distance between vectors. |
| `manhattan` | Computes the L1 (absolute) distance between vectors. |
Output:
| Output Field | Type | Description |
|---|---|---|
| `score` | `float` | Value between 0 and 1 representing semantic similarity. Higher values indicate stronger similarity. |
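Since raw cosine similarity lies in [-1, 1] and the two distance measures are unbounded, the raw value must be mapped into the [0, 1] score range. The exact mapping is not specified here; the sketch below uses one common convention (shifting cosine, and `1 / (1 + d)` for distances), so treat it as an assumption rather than the SDK's actual formula. `bounded_score` is a hypothetical helper.

```python
import numpy as np


def bounded_score(u, v, method: str = "cosine") -> float:
    """Map a raw similarity/distance value into a [0, 1] score.

    Assumed convention for illustration; the SDK's exact mapping may differ.
    """
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    if method == "cosine":
        cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
        # Shift cosine from [-1, 1] into [0, 1].
        return float((cos + 1.0) / 2.0)
    if method == "euclidean":
        d = float(np.linalg.norm(u - v))
    elif method == "manhattan":
        d = float(np.sum(np.abs(u - v)))
    else:
        raise ValueError(f"unknown method: {method}")
    # Distance 0 maps to score 1; larger distances approach 0.
    return 1.0 / (1.0 + d)
```

Under this convention, identical embeddings score 1.0 regardless of the method chosen, and the score decays smoothly as the embeddings move apart.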
Example:
Output: