Levenshtein Distance

Overview

Levenshtein distance is a character-level edit distance metric that quantifies how similar two sequences are by counting the minimum number of single-character edits needed to transform one sequence into the other. The output is normalized to a score between 0 and 1, where 1 indicates an exact match and 0 indicates maximum dissimilarity. The metric is useful for spelling correction, OCR evaluation, and deterministic text matching.

Edit Operations

Levenshtein distance allows three edit operations:
- Insertion: Add a character (e.g., kitten -> kitteng)
- Deletion: Remove a character (e.g., kitten -> kiten)
- Substitution: Replace one character with another (e.g., kitten -> sitten)
Each operation has a cost of 1. The distance is the smallest total cost of any sequence of operations that transforms one string into the other.
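The unit-cost rule above can be sketched with the standard dynamic-programming recurrence. This is a minimal illustration, not the SDK's implementation; the function name `levenshtein_distance` is hypothetical.

```python
def levenshtein_distance(source: str, target: str) -> int:
    """Minimum number of insertions, deletions, and substitutions (cost 1 each)."""
    # previous_row[j] is the distance between the source prefix processed
    # so far and target[:j]. For the empty prefix, that is j insertions.
    previous_row = list(range(len(target) + 1))
    for i, source_char in enumerate(source, start=1):
        current_row = [i]  # turning source[:i] into "" takes i deletions
        for j, target_char in enumerate(target, start=1):
            insertion = current_row[j - 1] + 1
            deletion = previous_row[j] + 1
            # A matching character costs nothing; a mismatch costs 1.
            substitution = previous_row[j - 1] + (source_char != target_char)
            current_row.append(min(insertion, deletion, substitution))
        previous_row = current_row
    return previous_row[-1]


print(levenshtein_distance("kitten", "kitteng"))  # 1 (one insertion)
print(levenshtein_distance("kitten", "kiten"))    # 1 (one deletion)
print(levenshtein_distance("kitten", "sitten"))   # 1 (one substitution)
```

Keeping only two rows of the table bounds memory by the length of the target string, while the running time stays proportional to the product of the two lengths.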

Normalized Levenshtein Score

\text{Score} = 1 - \frac{\text{Levenshtein Distance}}{\max(\text{Length of Prediction}, \text{Length of Reference})}

A score of 1 means the two strings are identical.
A score of 0 means the distance equals the length of the longer string, as happens when the two strings have no characters in common.
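The formula can be worked through in a few lines of Python. This is a sketch under stated assumptions: `levenshtein_distance` and `normalized_score` are hypothetical helper names, and returning 1.0 for two empty strings is an assumption made here to avoid dividing by zero, not documented SDK behavior.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Unit-cost edit distance computed row by row."""
    previous_row = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current_row = [i]
        for j, char_b in enumerate(b, start=1):
            current_row.append(min(
                current_row[j - 1] + 1,                  # insertion
                previous_row[j] + 1,                     # deletion
                previous_row[j - 1] + (char_a != char_b),  # substitution or match
            ))
        previous_row = current_row
    return previous_row[-1]


def normalized_score(prediction: str, reference: str) -> float:
    """Score = 1 - distance / max(len(prediction), len(reference))."""
    longest = max(len(prediction), len(reference))
    if longest == 0:
        return 1.0  # assumption: two empty strings count as identical
    return 1.0 - levenshtein_distance(prediction, reference) / longest


print(f"{normalized_score('kitten', 'sitting'):.4f}")  # 0.5714 (distance 3, length 7)
print(f"{normalized_score('abc', 'abc'):.4f}")         # 1.0000 (identical)
print(f"{normalized_score('abc', 'xyz'):.4f}")         # 0.0000 (no characters in common)
print(f"{normalized_score('abc', 'bca'):.4f}")         # 0.3333 (distance 2, length 3)
```

The last case shows that strings which differ at every position can still score above 0, because a deletion plus an insertion is cheaper than three substitutions.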

Levenshtein Distance Eval using Future AGI’s Python SDK

Click here to learn how to set up evaluation using the Python SDK.

Input & Configuration:

Category	Parameter	Type	Description
Required Inputs	`response`	`str`	Model-generated output to be evaluated.
Required Inputs	`expected_text`	`str`	Reference string against which the output is compared.
Optional Config	`case_insensitive`	`bool`	Whether to ignore casing during comparison. Default: `False`.
Optional Config	`remove_punctuation`	`bool`	Whether to remove punctuation before computing distance. Default: `False`.

Parameter Options:

Parameter - `case_insensitive`	Description
`True`	Converts both strings to lowercase before comparison.
`False`	Performs strict case-sensitive matching.

Parameter - `remove_punctuation`	Description
`True`	Strips all punctuation marks before computing distance.
`False`	Evaluates exact string, including punctuation.
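As a rough illustration of what the two options do to the inputs before the distance is computed, the sketch below applies lowercasing and punctuation stripping to a string. The helper name `preprocess` is hypothetical, and treating "punctuation" as the ASCII set in `string.punctuation` is an assumption; the SDK's exact normalization rules are not specified here.

```python
import string

# Translation table that deletes every ASCII punctuation character.
# Assumption: the SDK may use a different punctuation set.
_PUNCTUATION_TABLE = str.maketrans("", "", string.punctuation)


def preprocess(text: str, case_insensitive: bool = False,
               remove_punctuation: bool = False) -> str:
    """Normalize a string according to the two optional flags."""
    if case_insensitive:
        text = text.lower()
    if remove_punctuation:
        text = text.translate(_PUNCTUATION_TABLE)
    return text


print(preprocess("Hello, World!"))                           # Hello, World!
print(preprocess("Hello, World!", case_insensitive=True))    # hello, world!
print(preprocess("Hello, World!", remove_punctuation=True))  # Hello World
```

With both flags at their default of `False`, the string passes through unchanged, which corresponds to strict, exact matching.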

Output:

Output Field	Type	Description
`score`	`float`	Normalized Levenshtein similarity score between 0 and 1. A score of `1.0` means a perfect match.

Example:

from fi.evals.metrics import LevenshteinDistance
from fi.testcases import TestCase

# Pair the model output with the reference text it is compared against.
test_case = TestCase(
    response="The dog lazy the over jumps fox brown quick.",
    expected_text="The quick brown fox jumps over the lazy dog."
)

# Ignore casing and strip punctuation before computing the distance.
evaluator = LevenshteinDistance(config={
    "case_insensitive": True,
    "remove_punctuation": True
})

# Evaluate the single test case and print its normalized score.
result = evaluator.evaluate([test_case])
print(f"{result.eval_results[0].metrics[0].value:.4f}")

Output:

0.7442

What if Levenshtein Distance Score is Low?

- Enable `case_insensitive` if the low score is due to casing.
- Enable `remove_punctuation` if the formatting symbols are not relevant.
- If meaning is more important than exact form, combine this with semantic or lexical metrics like ROUGE, BLEU, or EmbeddingSimilarity using AggregatedMetric.
