Evaluation Using Interface

Input:

  • Required Inputs:
    • expected_text: The reference text against which to compare.
    • response: The text to be evaluated.

Output:

  • Score: A numeric score between 0 and 1, where 1 represents perfect similarity.
  • Reason: A detailed explanation of the similarity assessment.

Evaluation Using Python SDK

Click here to learn how to set up evaluation using the Python SDK.

Input:

  • Required Inputs:
    • expected_text: string - The reference text against which to compare.
    • response: string - The text to be evaluated.

Output:

  • Score: Returns a float value between 0 and 1, where higher values indicate greater similarity.
  • Reason: Provides a detailed explanation of the similarity assessment.

# `evaluator` is assumed to be an Evaluator instance initialized during SDK setup
result = evaluator.evaluate(
    eval_templates="levenshtein_similarity", 
    inputs={
        "expected_text": "The Eiffel Tower is a famous landmark in Paris, built in 1889 for the World's Fair. It stands 324 meters tall.",
        "response": "The Eiffel Tower, located in Paris, was built in 1889 and is 324 meters high."
    },
    model_name="turing_flash"
)

print(result.eval_results[0].metrics[0].value)
print(result.eval_results[0].reason)

Example Output:

[Output would show the similarity score and detailed reason]

Overview

Levenshtein Similarity is a character-level metric that quantifies how similar two text sequences are by calculating the minimum number of edit operations needed to transform one sequence into the other. The output is normalized to a score between 0 and 1, where 1 indicates an exact match and 0 indicates maximum dissimilarity. This metric is useful for use cases such as spelling correction, OCR post-processing, and deterministic text matching.

Edit Operations

  • Three edit operations are allowed in the Levenshtein calculation:
    • Insertion: Add a character (e.g., kitten -> kitteng)
    • Deletion: Remove a character (e.g., kitten -> kiten)
    • Substitution: Replace one character with another (e.g., kitten -> sitten)
  • Each operation has a cost of 1. The final distance is the sum of all such operations needed to match the two strings.
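The SDK's internal implementation is not shown in this guide; the sketch below is a minimal version of the standard Wagner-Fischer dynamic-programming algorithm for the edit distance described above, using unit cost for each operation:

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Minimum number of insertions, deletions, and substitutions
    needed to transform string a into string b (each costs 1)."""
    # prev[j] holds the distance between a[:i-1] and b[:j] for the previous row.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution (or match)
        prev = curr
    return prev[-1]

print(levenshtein_distance("kitten", "sitting"))  # → 3
```

The classic example: "kitten" → "sitting" requires two substitutions (k→s, e→i) and one insertion (g), for a distance of 3.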

Normalized Levenshtein Score

Score = 1 - (Levenshtein Distance / max(Length of Prediction, Length of Reference))
  • Score of 1 means the two strings are identical.
  • Score of 0 means the edit distance equals the length of the longer string, i.e., complete dissimilarity.
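The normalization step can be sketched as follows (a self-contained illustration, not the SDK's own code; the distance helper is the same dynamic-programming routine):

```python
def levenshtein_distance(a: str, b: str) -> int:
    """Standard unit-cost edit distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]

def levenshtein_similarity(expected: str, response: str) -> float:
    """Score = 1 - distance / max(len(expected), len(response))."""
    if not expected and not response:
        return 1.0  # two empty strings count as identical
    distance = levenshtein_distance(expected, response)
    return 1 - distance / max(len(expected), len(response))

print(levenshtein_similarity("kitten", "kitten"))  # → 1.0
print(levenshtein_similarity("abc", "xyz"))        # → 0.0
```

Identical strings have distance 0, so the score is 1; "abc" vs "xyz" needs 3 substitutions over a maximum length of 3, so the score is 0.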

What to Do If You Get Undesired Results

If the Levenshtein similarity score is lower than expected:

  • Check case sensitivity: the comparison is typically case-sensitive
  • Check for whitespace and punctuation differences, which count as edits
  • For meaning-based comparison rather than exact character matching, consider using semantic similarity metrics
  • For texts with similar meaning but different wording, consider metrics like ROUGE, BLEU, or embedding similarity
  • Remember that this metric measures character-level similarity, not semantic similarity

Comparing Levenshtein Similarity with Similar Evals

  • Fuzzy Match: While Levenshtein Similarity focuses on character-level edits, Fuzzy Match may use different algorithms for approximate string matching.
  • Embedding Similarity: Levenshtein Similarity measures character-level edits, whereas Embedding Similarity captures semantic similarity through vector representations.
  • BLEU Score: Levenshtein operates at character level, while BLEU focuses on n-gram precision between the candidate and reference texts.
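To illustrate the first comparison concretely: Python's standard-library `difflib.SequenceMatcher` implements a Ratcliff/Obershelp-style fuzzy ratio, a different approximate-matching algorithm than Levenshtein edit distance. The two can disagree on the same pair of strings (the Levenshtein helper here is a minimal sketch, not the SDK's implementation):

```python
from difflib import SequenceMatcher

def levenshtein_distance(a: str, b: str) -> int:
    """Standard unit-cost edit distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]

expected = "The Eiffel Tower is in Paris"
response = "The Eiffel Tower, located in Paris"

lev_score = 1 - levenshtein_distance(expected, response) / max(len(expected), len(response))
fuzzy_score = SequenceMatcher(None, expected, response).ratio()
print(f"Levenshtein: {lev_score:.2f}, fuzzy ratio: {fuzzy_score:.2f}")
```

Both scores fall between 0 and 1 here, but they are computed differently: Levenshtein counts individual character edits, while the fuzzy ratio rewards the longest matching subsequences.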