Recall-oriented measurement of lexical overlap between generated text and reference text.
Click here to learn how to set up evaluation using the SDK.

Input & Configuration:
| Parameter | Type | Description |
|---|---|---|
| `reference` (required) | `str` | Reference text used for evaluation. |
| `hypothesis` (required) | `str` | Model-generated output to be evaluated. |
| Output Field | Type | Description |
|---|---|---|
| `precision` | `float` | Fraction of predicted tokens that matched the reference. |
| `recall` | `float` | Fraction of reference tokens that were found in the prediction. |
| `fmeasure` | `float` | Harmonic mean of precision and recall; represents the final ROUGE score. |
"rougeL"
if the phrasing of generated text is different but the meaning is preserved.use_stemmer=True
to improve the robustness in word form variation.Embedding Similarity
using Aggregated Metric
to have a holistic view of comparing generated text with reference text.