Levenshtein Distance
Measures the minimum number of single-character edits (insertions, deletions, or substitutions) required to transform the generated text into the reference text.
Overview
Levenshtein Distance is a character-level edit distance metric that quantifies how similar two sequences are by counting the minimum number of operations needed to transform one sequence into the other. The output is normalized to a score between 0 and 1, where 1 indicates an exact match and 0 indicates maximum dissimilarity. The metric is useful for spelling correction, OCR, and deterministic text matching.
Edit Operations
Levenshtein Distance allows three operations:
- Insertion: Add a character (e.g., `kitten -> kitteng`)
- Deletion: Remove a character (e.g., `kitten -> kiten`)
- Substitution: Replace one character with another (e.g., `kitten -> sitten`)

Each operation has a cost of 1. The final distance is the total cost of the smallest set of operations needed to make the two strings match.
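The three operations and their unit cost can be illustrated with a standard dynamic-programming implementation. This is a minimal sketch for illustration only, not the SDK's internal code; the function name `levenshtein_distance` is a hypothetical helper introduced here.

```python
def levenshtein_distance(source: str, target: str) -> int:
    """Minimum number of insertions, deletions, and substitutions (cost 1 each)."""
    # previous[j] is the distance between the processed prefix of source
    # and the first j characters of target.
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]  # turning source[:i] into "" takes i deletions
        for j, t_char in enumerate(target, start=1):
            substitution = previous[j - 1] + (s_char != t_char)  # free if chars match
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    for edited in ("kitteng", "kiten", "sitten"):
        print("kitten ->", edited, levenshtein_distance("kitten", edited))
```

Each of the single-edit examples above yields a distance of 1, while a pair such as `kitten` and `sitting` needs three edits.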
Normalized Levenshtein Score
- Score of 1 means the two strings are identical.
- Score of 0 means maximum dissimilarity, i.e., the number of edits needed is as large as it can be for the two strings' lengths.
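One common way to obtain such a score is to divide the raw distance by the length of the longer string and subtract the result from 1. The sketch below assumes that normalization; the page does not state the exact formula the SDK uses, and both function names are hypothetical.

```python
def levenshtein_distance(source: str, target: str) -> int:
    """Unit-cost edit distance (insert, delete, substitute)."""
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]
        for j, t_char in enumerate(target, start=1):
            current.append(min(
                previous[j - 1] + (s_char != t_char),  # substitution or match
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
            ))
        previous = current
    return previous[-1]


def normalized_levenshtein_score(response: str, expected_text: str) -> float:
    """Assumed normalization: 1 - distance / max(len). Not confirmed by the SDK docs."""
    longest = max(len(response), len(expected_text))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical
    return 1.0 - levenshtein_distance(response, expected_text) / longest


if __name__ == "__main__":
    print(normalized_levenshtein_score("kitten", "kitten"))
    print(normalized_levenshtein_score("kitten", "sitten"))
    print(normalized_levenshtein_score("abc", "xyz"))
```

Under this assumption, identical strings score 1, and a pair such as `abc` and `xyz`, where every character must be replaced, scores 0.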
Levenshtein Distance Eval using Future AGI’s Python SDK
Click here to learn how to set up evaluation using the Python SDK.
Input & Configuration:
Category | Parameter | Type | Description |
---|---|---|---|
Required Inputs | response | str | Model-generated output to be evaluated. |
Required Inputs | expected_text | str | Reference string against which the output is compared. |
Optional Config | case_insensitive | bool | Whether to ignore casing during comparison. Default: False. |
Optional Config | remove_punctuation | bool | Whether to remove punctuation before computing distance. Default: False. |
Parameter Options:
Parameter - case_insensitive | Description |
---|---|
True | Converts both strings to lowercase before comparison. |
False | Performs strict case-sensitive matching. |
Parameter - remove_punctuation | Description |
---|---|
True | Strips all punctuation marks before computing distance. |
False | Evaluates exact string, including punctuation. |
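The effect of the two options can be pictured as a preprocessing step applied to both strings before the distance is computed. The sketch below is an assumption about how such preprocessing might work, not the SDK's actual code: `preprocess` is a hypothetical helper, and it treats "punctuation" as the ASCII set in Python's `string.punctuation`.

```python
import string

# Translation table that deletes every ASCII punctuation character.
_PUNCTUATION_TABLE = str.maketrans("", "", string.punctuation)


def preprocess(text: str, case_insensitive: bool = False,
               remove_punctuation: bool = False) -> str:
    """Hypothetical normalization applied to both strings before comparison."""
    if case_insensitive:
        text = text.lower()
    if remove_punctuation:
        text = text.translate(_PUNCTUATION_TABLE)
    return text


if __name__ == "__main__":
    print(preprocess("Hello, World!"))
    print(preprocess("Hello, World!", case_insensitive=True))
    print(preprocess("Hello, World!", case_insensitive=True, remove_punctuation=True))
```

With both options enabled, `Hello, World!` and `hello world` reduce to the same string, so differences in casing and punctuation no longer count as edits.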
Output:
Output Field | Type | Description |
---|---|---|
score | float | Normalized Levenshtein score between 0 and 1. A score of 1.0 means a perfect match. |
Example:
Output:
What if Levenshtein Distance Score is Low?
- Enable `case_insensitive` if the low score is due to casing.
- Enable `remove_punctuation` if the formatting symbols are not relevant.
- If meaning is more important than exact form, combine this with semantic or lexical metrics like `ROUGE`, `BLEU`, or `EmbeddingSimilarity` using `AggregatedMetric`.