Answer Similarity
Definition
Assesses how similar an actual response is to an expected (reference) response. The evaluation applies a selected comparison method to measure how closely the actual response matches the expected one.
A high score indicates that the actual response is similar to the expected response, while a low score suggests significant differences.
Calculation
The evaluation begins by defining the expected and actual responses. A similarity comparison method, such as Cosine Similarity or Jaccard Similarity, is then selected based on the evaluation needs.
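The sketch below illustrates the two comparators named above on plain bag-of-words tokens. It is a minimal sketch, assuming simple lowercase whitespace tokenization and term-frequency vectors. A production comparator may tokenize differently or compute cosine similarity over embedding vectors instead. The function names are hypothetical.

```python
import math
from collections import Counter


def tokenize(text: str) -> list:
    # Assumed tokenizer: lowercase, then split on whitespace (no stemming or punctuation handling).
    return text.lower().split()


def jaccard_similarity(expected: str, actual: str) -> float:
    # |A ∩ B| / |A ∪ B| over the sets of unique tokens.
    a, b = set(tokenize(expected)), set(tokenize(actual))
    if not a and not b:
        return 1.0  # two empty responses are treated as identical
    return len(a & b) / len(a | b)


def cosine_similarity(expected: str, actual: str) -> float:
    # Cosine of the angle between term-frequency (bag-of-words) vectors.
    a, b = Counter(tokenize(expected)), Counter(tokenize(actual))
    if not a and not b:
        return 1.0
    if not a or not b:
        return 0.0
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)
```

Consider the expected response "Paris is the capital of France". The reordered response "The capital of France is Paris" gets a perfect score on both measures, because bag-of-words comparison ignores word order. "Lyon is a city in France" shares only "is" and "france" with the expected text. That gives a Jaccard score of 2/10 = 0.2 and a cosine score of about 0.33.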
During similarity analysis, the chosen comparator calculates a similarity score between the expected and actual responses. This score is then compared against a predefined failure threshold to determine alignment.
The evaluation returns a similarity score that quantifies how well the responses align. If the score falls below the threshold, the evaluation is flagged as a failure, indicating a significant deviation from the expected response.
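The threshold step can be sketched as follows. The names `SimilarityResult`, `evaluate_answer_similarity`, and `failure_threshold` are hypothetical, and so is the default threshold of 0.5. Following the description above, only a score strictly below the threshold counts as a failure. Any comparator with the signature `(expected, actual) -> float` can be plugged in.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class SimilarityResult:
    score: float   # similarity between expected and actual responses
    failed: bool   # True when the score falls below the failure threshold


def jaccard(expected: str, actual: str) -> float:
    # Word-level Jaccard similarity used as the default comparator in this sketch.
    a, b = set(expected.lower().split()), set(actual.lower().split())
    return len(a & b) / len(a | b) if a | b else 1.0


def evaluate_answer_similarity(
    expected: str,
    actual: str,
    comparator: Callable[[str, str], float] = jaccard,
    failure_threshold: float = 0.5,  # hypothetical default; tune per use case
) -> SimilarityResult:
    score = comparator(expected, actual)
    # A score strictly below the threshold is flagged as a failure.
    return SimilarityResult(score=score, failed=score < failure_threshold)
```

The sketch takes the expected response "Paris is the capital of France" in both cases below. Against the actual response "Lyon is a city in France", it returns a score of 0.2 and flags a failure. The reordered "The capital of France is Paris" scores 1.0 and passes.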
What to Do When Answer Similarity Evaluation is Low
Review the actual response to reassess how well it aligns with the expected response and to identify the specific discrepancies. If necessary, adjust the comparator by selecting an alternative similarity measure that better captures nuanced differences in meaning.
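One way to support a comparator adjustment is to score a flagged pair with several candidate measures side by side before choosing one. The minimal sketch below compares word-level Jaccard with character-trigram Jaccard. The comparator names and the `review_pair` helper are hypothetical. A measure that captures meaning rather than surface overlap would typically rely on embedding-based cosine similarity, which is not shown here.

```python
from typing import Callable, Dict

Comparator = Callable[[str, str], float]


def word_jaccard(expected: str, actual: str) -> float:
    # Overlap of unique lowercase words.
    a, b = set(expected.lower().split()), set(actual.lower().split())
    return len(a & b) / len(a | b) if a | b else 1.0


def char_trigrams(text: str) -> set:
    # All overlapping 3-character substrings of the lowercased text.
    t = text.lower()
    return {t[i:i + 3] for i in range(len(t) - 2)}


def trigram_jaccard(expected: str, actual: str) -> float:
    # Overlap of character trigrams; more tolerant of small word-form changes (cat/cats).
    a, b = char_trigrams(expected), char_trigrams(actual)
    return len(a & b) / len(a | b) if a | b else 1.0


COMPARATORS: Dict[str, Comparator] = {
    "word_jaccard": word_jaccard,
    "trigram_jaccard": trigram_jaccard,
}


def review_pair(expected: str, actual: str) -> Dict[str, float]:
    # Score one flagged pair with every candidate comparator.
    return {name: fn(expected, actual) for name, fn in COMPARATORS.items()}
```

Compare "The cats are sleeping" with "The cat is sleeping". Word-level Jaccard gives 2/6 ≈ 0.33, because "cats"/"cat" and "are"/"is" count as full mismatches. Character-trigram Jaccard gives 12/24 = 0.5, because the shared substrings still overlap. Which behavior is preferable depends on how much surface variation should be tolerated.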
Differentiating Answer Similarity from Context Sufficiency
Answer Similarity specifically measures how closely two responses align in meaning, whereas Context Sufficiency determines whether a given context provides enough information to answer a query.
From an input perspective, Answer Similarity requires both an expected and actual response for comparison, while Context Sufficiency evaluates a query against its provided context.
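The difference in inputs can be summarized as two record types. The class and field names below are hypothetical, chosen only to illustrate what each evaluation consumes.

```python
from dataclasses import dataclass


@dataclass
class AnswerSimilarityInput:
    # Compares two responses with each other; no query or context is needed.
    expected_response: str
    actual_response: str


@dataclass
class ContextSufficiencyInput:
    # Checks whether the context holds enough information to answer the query.
    query: str
    context: str


similarity_case = AnswerSimilarityInput(
    expected_response="Paris is the capital of France",
    actual_response="The capital of France is Paris",
)
sufficiency_case = ContextSufficiencyInput(
    query="What is the capital of France?",
    context="France is a country in Western Europe. Its capital is Paris.",
)
```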