Levenshtein Similarity: Edit Distance Evaluation Metric

Measures text similarity based on the minimum number of single-character edits required to transform one text into another.

# Assumes `evaluator` is an already-initialized Evaluator client from the SDK.
result = evaluator.evaluate(
    eval_templates="levenshtein_similarity",
    inputs={
        "expected": "The Eiffel Tower is a famous landmark in Paris, built in 1889 for the World's Fair. It stands 324 meters tall.",
        "output": "The Eiffel Tower, located in Paris, was built in 1889 and is 324 meters high."
    },
    model_name="turing_flash"
)

print(result.eval_results[0].output)
print(result.eval_results[0].reason)

import { Evaluator } from "@future-agi/ai-evaluation";

const evaluator = new Evaluator();

const result = await evaluator.evaluate(
  "levenshtein_similarity",
  {
    expected: "The Eiffel Tower is a famous landmark in Paris, built in 1889 for the World's Fair. It stands 324 meters tall.",
    output: "The Eiffel Tower, located in Paris, was built in 1889 and is 324 meters high."
  },
  {
    modelName: "turing_flash",
  }
);

console.log(result);


Required Input	Type	Description
`expected`	`string`	Reference content to compare against the model-generated output.
`output`	`string`	Model-generated content to be evaluated for similarity.

Output
	Field	Description
	Result	Returns a score between 0 and 1, where a higher score indicates greater similarity.
	Reason	Provides a detailed explanation of the similarity assessment.

About Levenshtein Similarity

Levenshtein Similarity is a character-level metric that quantifies how similar two text sequences are by calculating the minimum number of single-character edit operations needed to transform one sequence into the other. The result is normalized to a score between 0 and 1, where 1 indicates an exact match and 0 indicates maximum dissimilarity. This metric is useful for spelling correction, OCR, and deterministic text matching.

Edit Operations

The Levenshtein calculation allows three single-character operations:
- Insertion: Add a character (e.g., kitten -> kitteng)
- Deletion: Remove a character (e.g., kitten -> kiten)
- Substitution: Replace one character with another (e.g., kitten -> sitten)

Each operation has a cost of 1. The final distance is the smallest total number of operations needed to turn one string into the other.
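
The operations above can be sketched as a standard dynamic-programming routine. This is a minimal illustration of the textbook algorithm, not the evaluator's actual implementation; the function name `levenshtein_distance` is hypothetical.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    # previous[j] is the distance between the prefix of `a` processed so far
    # (initially empty) and the first j characters of `b`.
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into the empty string takes i deletions
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + substitution_cost,  # substitute, or keep a match
            ))
        previous = current
    return previous[-1]


print(levenshtein_distance("kitten", "kitteng"))  # 1 (one insertion)
print(levenshtein_distance("kitten", "kiten"))    # 1 (one deletion)
print(levenshtein_distance("kitten", "sitten"))   # 1 (one substitution)
```

Keeping only two rows of the table at a time holds memory proportional to the length of the second string, which is usually enough when only the distance, and not the edit path, is needed.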

Normalized Levenshtein Score

$$\text{Score} = 1 - \frac{\text{Levenshtein Distance}}{\max(\text{Length of Prediction},\ \text{Length of Reference})}$$

A score of 1 means the two strings are identical.
A score of 0 means the edit distance equals the length of the longer string, which is the maximum possible distance (for example, abc versus xyz).
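
As a worked instance of the normalization, take the common textbook pair kitten (reference, 6 characters) and sitting (prediction, 7 characters). The pair is an illustrative choice and is not taken from the evaluator itself. Transforming one into the other needs three edits (substitute k with s, substitute e with i, insert g), so the Levenshtein Distance is 3:

$$\text{Score} = 1 - \frac{3}{\max(7,\ 6)} = 1 - \frac{3}{7} \approx 0.571$$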

What to Do If You Get Undesired Results

If the Levenshtein similarity score is lower than expected:

- Check case: the comparison is typically case-sensitive, so Paris and paris differ by one edit.
- Check for whitespace and punctuation differences, which count as edits.
- Remember that this metric measures character-level similarity, not semantic similarity.
- For meaning-based comparison, or for texts with similar meaning but different wording, consider metrics such as ROUGE, BLEU, or embedding similarity.
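
If case, whitespace, or punctuation differences are not meaningful for your use case, one option is to normalize both strings before passing them to the evaluator. The sketch below is an assumption about a reasonable pre-processing step, not a documented feature of the evaluator; the helper name `normalize` is hypothetical, and it strips ASCII punctuation only.

```python
import string


def normalize(text: str) -> str:
    """Lowercase, drop ASCII punctuation, and collapse runs of whitespace."""
    text = text.casefold()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())


expected = "The Eiffel Tower  is in Paris."
output = "the eiffel tower is in paris"

# After normalization the two strings are character-for-character identical,
# so a character-level metric would no longer penalize the surface differences.
print(normalize(expected) == normalize(output))  # True
```

Apply the same normalization to both `expected` and `output`; normalizing only one side would add edits instead of removing them.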

Comparing Levenshtein Similarity with Similar Evals

- Fuzzy Match: While Levenshtein Similarity focuses on character-level edits, Fuzzy Match may use different algorithms for approximate string matching.
- Embedding Similarity: Levenshtein Similarity measures character-level edits, whereas Embedding Similarity captures semantic similarity through vector representations.
- BLEU Score: Levenshtein Similarity operates at the character level, while BLEU focuses on n-gram precision between the candidate and reference texts.
