Measures n-gram overlap precision between the generated and reference text.
For example, the sentence "The quick brown fox jumps over the lazy dog" yields:

- 1-grams: "The", "quick", "brown", "fox", ...
- 2-grams: "The quick", "quick brown", "brown fox", ...
- 3-grams: "The quick brown", "quick brown fox", ...
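The n-grams above can be extracted with a few lines of Python (a minimal sketch; the `ngrams` helper is illustrative, not part of the SDK):

```python
def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list, joined as strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "The quick brown fox jumps over the lazy dog".split()
print(ngrams(tokens, 1)[:4])  # ['The', 'quick', 'brown', 'fox']
print(ngrams(tokens, 2)[:3])  # ['The quick', 'quick brown', 'brown fox']
print(ngrams(tokens, 3)[:2])  # ['The quick brown', 'quick brown fox']
```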
Refer to the SDK documentation to learn how to set up evaluation.

Input & Configuration:
| | Parameter | Type | Description |
|---|---|---|---|
| Required Inputs | reference | str or List[str] | One or more reference texts. |
| | hypothesis | str | Model-generated output to be evaluated. |
| Output Field | Type | Description |
|---|---|---|
| score | float | Score between 0 and 1. Higher values indicate greater lexical overlap. |
Short outputs may contain few or no higher-order n-grams; consider lowering `max_n_gram` in such a case.
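To see why short outputs are a problem, here is an illustrative sketch (the `precisions` helper is hypothetical, assuming `max_n_gram` caps the highest n-gram order used in the score):

```python
from collections import Counter

def precisions(hypothesis, reference, max_n):
    """Clipped n-gram precision for each order 1..max_n (illustrative helper)."""
    hyp, ref = hypothesis.split(), reference.split()
    out = []
    for n in range(1, max_n + 1):
        hyp_c = Counter(tuple(hyp[i:i + n]) for i in range(len(hyp) - n + 1))
        ref_c = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        total = sum(hyp_c.values())
        matched = sum(min(c, ref_c[g]) for g, c in hyp_c.items())
        out.append(matched / total if total else 0.0)
    return out

# A 3-token output has no 4-grams, so order-4 precision is 0, and an
# unsmoothed score over orders 1..4 collapses to 0. Capping the maximum
# n-gram order at 3 avoids this.
print(precisions("lazy dog sleeps", "the lazy dog sleeps soundly", 4))
# [1.0, 1.0, 1.0, 0.0]
```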
You can combine this metric with Embedding Similarity using an Aggregated Metric to get a holistic view when comparing generated text with reference text.
Set `smooth="method2"` or `smooth="method4"` to mitigate the impact of zero matches at higher n-gram levels and obtain more stable scores for short or diverse outputs. The guidelines below suggest which smoothing method to use in different scenarios:
Scenario | Suggested Smoothing Method |
---|---|
Short outputs | method1 or method2 |
High variance in phrasing | method4 or method5 |
Very strict evaluation | method0 (no smoothing) |
General use | method1 (default) or method2 (balanced smoothing) |
Sparse references or low match rate (e.g., summaries) | method3 |
Mixed-length outputs with partial n-gram match | method6 |
Strictness early on, flexibility after the first break in match continuity | method7 |
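The method names above follow NLTK's `SmoothingFunction`, which exposes `method0` through `method7`. Below is a sketch comparing an unsmoothed score with `method2` smoothing; it assumes the evaluator's `smooth` parameter maps onto these NLTK functions and that `nltk` is installed:

```python
# Sketch using NLTK's BLEU implementation (pip install nltk). The SDK's
# smooth parameter is assumed to correspond to these smoothing methods.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = ["The quick brown fox jumps over the lazy dog".split()]
hypothesis = "The fast brown fox leaps over the lazy dog".split()

smoother = SmoothingFunction()
# method0: no smoothing (strict); method2: add-1 smoothing (balanced)
strict = sentence_bleu(reference, hypothesis, smoothing_function=smoother.method0)
smoothed = sentence_bleu(reference, hypothesis, smoothing_function=smoother.method2)
print(f"method0 (strict): {strict:.3f}  method2 (smoothed): {smoothed:.3f}")
```

Because add-1 smoothing raises every n-gram precision that is below 1, the smoothed score is never lower than the strict one for the same pair of texts.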