Levenshtein Similarity

Measures text similarity based on the minimum number of single-character edits required to transform one text into another.

Python:

# Assumes `evaluator` is an already-initialized Evaluator client from the SDK.
result = evaluator.evaluate(
    eval_templates="levenshtein_similarity",
    inputs={
        "expected": "The Eiffel Tower is a famous landmark in Paris, built in 1889 for the World's Fair. It stands 324 meters tall.",
        "output": "The Eiffel Tower, located in Paris, was built in 1889 and is 324 meters high."
    },
    model_name="turing_flash"
)

print(result.eval_results[0].output)
print(result.eval_results[0].reason)

TypeScript:

import { Evaluator, Templates } from "@future-agi/ai-evaluation";

const evaluator = new Evaluator();

const result = await evaluator.evaluate(
  "levenshtein_similarity",
  {
    expected: "The Eiffel Tower is a famous landmark in Paris, built in 1889 for the World's Fair. It stands 324 meters tall.",
    output: "The Eiffel Tower, located in Paris, was built in 1889 and is 324 meters high."
  },
  {
    modelName: "turing_flash",
  }
);

console.log(result);

Input

| Required Input | Type | Description |
|---|---|---|
| expected | string | Reference content to compare the model-generated output against. |
| output | string | Model-generated content to be evaluated for similarity. |

Output

| Field | Description |
|---|---|
| Result | Returns a score, where a higher score indicates greater similarity. |
| Reason | Provides a detailed explanation of the similarity assessment. |

About Levenshtein Similarity

Levenshtein Similarity is a character-level metric that quantifies how similar two text sequences are by calculating the minimum number of single-character edits needed to transform one sequence into the other. The output is normalized to a score between 0 and 1, where 1 indicates an exact match and 0 indicates maximum dissimilarity. This metric is useful for spelling correction, OCR evaluation, and deterministic text matching.

Edit Operations

  • The Levenshtein calculation allows three operations:
    • Insertion: Add a character (e.g., kitten -> kitteng)
    • Deletion: Remove a character (e.g., kitten -> kiten)
    • Substitution: Replace one character with another (e.g., kitten -> sitten)
  • Each operation has a cost of 1. The final distance is the smallest total cost of any sequence of operations that transforms one string into the other.
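
The edit operations above can be sketched with the standard dynamic-programming approach. This is a minimal illustration, not the evaluator's actual implementation; the function name `levenshtein_distance` is a hypothetical helper introduced here.

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions turning a into b."""
    # The distance is symmetric, so keep the shorter string in the inner loop
    # to keep each row small.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] is the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # turning i characters of a into the empty string costs i deletions
        for j, char_b in enumerate(b, start=1):
            insertion = current[j - 1] + 1
            deletion = previous[j] + 1
            substitution = previous[j - 1] + (char_a != char_b)  # free if characters match
            current.append(min(insertion, deletion, substitution))
        previous = current
    return previous[-1]


print(levenshtein_distance("kitten", "kitteng"))  # 1 (one insertion)
print(levenshtein_distance("kitten", "kiten"))    # 1 (one deletion)
print(levenshtein_distance("kitten", "sitten"))   # 1 (one substitution)
print(levenshtein_distance("kitten", "sitting"))  # 3
```

Only two rows of the table are kept at a time, so memory grows with the shorter string rather than with the product of both lengths.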

Normalized Levenshtein Score

$$
\text{Score} = 1 - \frac{\text{Levenshtein Distance}}{\max(\text{Length of Prediction},\ \text{Length of Reference})}
$$

  • A score of 1 means the two strings are identical.
  • A score of 0 means the distance equals the length of the longer string, which happens, for example, when the two strings have no characters in common.
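
As a worked illustration of the normalization (the strings "kitten" and "sitting" are chosen here as an example and do not come from the evaluator): transforming one into the other takes three edits (substitute k with s, substitute e with i, insert g), and the longer string has 7 characters.

```latex
\text{Score} = 1 - \frac{3}{\max(6,\ 7)} = 1 - \frac{3}{7} = \frac{4}{7} \approx 0.571
```

Because the denominator is the longer of the two lengths, the result does not depend on which string is treated as the prediction and which as the reference.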

What to Do If You Get Undesired Results

If the Levenshtein similarity score is lower than expected:

  • Consider case sensitivity: the comparison is typically case-sensitive, so differences in capitalization count as edits.
  • Check for whitespace and punctuation differences, which also count as edits.
  • For meaning-based comparison rather than exact character matching, consider using semantic similarity metrics.
  • For texts with similar meaning but different wording, consider metrics such as ROUGE, BLEU, or embedding similarity.
  • Remember that this metric measures character-level similarity, not semantic similarity.
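
If case, whitespace, or punctuation differences are not meaningful for your use case, one option is to normalize both strings before passing them as `expected` and `output`. The sketch below is an assumption about a reasonable preprocessing step, not part of the evaluator; `normalize` and its `strip_punctuation` flag are hypothetical names.

```python
import re
import string


def normalize(text: str, strip_punctuation: bool = False) -> str:
    """Lowercase, optionally drop ASCII punctuation, and collapse whitespace."""
    text = text.lower()
    if strip_punctuation:
        # Remove every character listed in string.punctuation (ASCII only).
        text = text.translate(str.maketrans("", "", string.punctuation))
    # Replace any run of whitespace (spaces, tabs, newlines) with one space.
    return re.sub(r"\s+", " ", text).strip()


print(normalize("  The Eiffel  Tower,\nParis. "))
# the eiffel tower, paris.
print(normalize("The Eiffel Tower, Paris.", strip_punctuation=True))
# the eiffel tower paris
```

Whether this is appropriate depends on the task: for OCR or spelling checks, case and punctuation errors are often exactly what should be penalized, so normalization would hide them.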

Comparing Levenshtein Similarity with Similar Evals

  • Fuzzy Match: While Levenshtein Similarity focuses on character-level edits, Fuzzy Match may use different algorithms for approximate string matching.
  • Embedding Similarity: Levenshtein Similarity measures character-level edits, whereas Embedding Similarity captures semantic similarity through vector representations.
  • BLEU Score: Levenshtein operates at character level, while BLEU focuses on n-gram precision between the candidate and reference texts.