Answer Similarity
Assesses the similarity between an expected response and an actual response. This evaluation uses various comparison methods to determine how closely the actual response matches the expected one.
Evaluation Using Interface
Input:
- Required Inputs:
  - expected_response: The reference answer column.
  - response: The generated answer column.
- Configuration Parameters:
  - Comparator: The method used for comparison (e.g., Cosine, Exact Match).
  - Failure Threshold: Float (e.g., 0.7). The similarity score below which the evaluation is considered a failure.
Output:
- Score: Percentage score between 0 and 100
Interpretation:
- Scores ≥ (Failure Threshold * 100): The generated response is sufficiently similar to the expected_response based on the chosen Comparator.
- Scores < (Failure Threshold * 100): The response deviates significantly from the expected_response.
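The threshold logic above can be sketched in a few lines; the function name `passes` is illustrative, not part of the product:

```python
def passes(score_percent: float, failure_threshold: float) -> bool:
    """Return True when a 0-100 similarity score meets the configured threshold."""
    # The interface reports a percentage, while the threshold is a 0-1 float,
    # so the threshold is scaled by 100 before comparison.
    return score_percent >= failure_threshold * 100

print(passes(85.0, 0.7))  # a score of 85 passes at threshold 0.7
print(passes(60.0, 0.7))  # a score of 60 fails at threshold 0.7
```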
Evaluation Using Python SDK
Click here to learn how to set up evaluation using the Python SDK.
| Input Type | Parameter | Type | Description |
|---|---|---|---|
| Required Inputs | expected_response | string | The reference answer. |
| Required Inputs | response | string | The generated answer. |
| Configuration Parameters | comparator | string | The method to use for comparison (e.g., Comparator.COSINE.value). |
| Configuration Parameters | failure_threshold | float | The threshold below which the evaluation fails (e.g., 0.7). |
| Comparator Name | Class Name |
|---|---|
| Cosine Similarity | Comparator.COSINE.value |
| Jaccard Similarity | Comparator.JACCARD.value |
| Normalised Levenshtein Similarity | Comparator.NORMALISED_LEVENSHTEIN.value |
| Jaro-Winkler Similarity | Comparator.JARO_WINKLER.value |
| Sorensen-Dice Similarity | Comparator.SORENSEN_DICE.value |
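To make the first two comparators concrete, here is a minimal sketch of token-level cosine and Jaccard similarity in plain Python. These are simplified stand-ins, not the SDK's internal implementations:

```python
import math
from collections import Counter

def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between token-count vectors of two strings."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in set(va) & set(vb))
    norm = math.sqrt(sum(v * v for v in va.values())) * \
           math.sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0

def jaccard_similarity(a: str, b: str) -> float:
    """Jaccard similarity: intersection over union of token sets."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

print(cosine_similarity("the cat sat", "the cat sat"))   # identical strings score 1.0
print(jaccard_similarity("the cat", "the dog"))          # {the} / {the, cat, dog}
```

Cosine weighs repeated tokens, while Jaccard only cares about which tokens appear; that difference is one reason to try an alternative comparator when scores look off.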
| Output | Type | Description |
|---|---|---|
| Score | float | A score between 0 and 1. Values ≥ failure_threshold indicate sufficient similarity. |
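The tables above can be tied together with a self-contained toy evaluator. Everything here is a stand-in built for illustration: the `Comparator` enum mirrors the names listed above, and `answer_similarity` uses `difflib` as a simple similarity measure rather than the SDK's actual comparators:

```python
import difflib
from enum import Enum

class Comparator(Enum):
    # Illustrative enum mirroring the comparator names listed above.
    COSINE = "cosine"
    JACCARD = "jaccard"

def answer_similarity(expected_response: str, response: str,
                      comparator: str, failure_threshold: float) -> dict:
    """Toy stand-in for the SDK call: score in [0, 1] plus a pass flag."""
    # difflib's sequence ratio substitutes here for the named comparator.
    score = difflib.SequenceMatcher(None, expected_response, response).ratio()
    return {"score": score, "passed": score >= failure_threshold}

result = answer_similarity(
    expected_response="Paris is the capital of France.",
    response="The capital of France is Paris.",
    comparator=Comparator.COSINE.value,
    failure_threshold=0.7,
)
print(result)
```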
What to Do When Answer Similarity Evaluation is Low
Review the actual response to reassess its alignment with the expected response and identify discrepancies. If necessary, adjust the comparator, selecting an alternative similarity measure that better captures nuanced differences in meaning.
Differentiating Answer Similarity from Context Sufficiency
Answer Similarity specifically measures how closely two responses align in meaning, whereas Context Sufficiency determines whether a given context provides enough information to answer a query.
From an input perspective, Answer Similarity requires both an expected and actual response for comparison, while Context Sufficiency evaluates a query against its provided context.