Evaluation Using Interface
Input:
- Required Inputs:
  - expected_response: The reference answer column.
  - response: The generated answer column.
- Configuration Parameters:
  - Comparator: The method used for comparison (e.g., Cosine, Exact Match).
  - Failure Threshold: Float (e.g., 0.7). The similarity score below which the evaluation is considered a failure.

Output:
- Score: Percentage score between 0 and 100.
  - Scores ≥ (Failure Threshold * 100): Indicate that the generated response is sufficiently similar to the expected_response based on the chosen Comparator.
  - Scores < (Failure Threshold * 100): Suggest that the response deviates significantly from the expected_response.
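
The pass/fail rule above can be sketched in a few lines. This is only an illustration of the arithmetic, not the platform's code: the function name `interpret_score` and its return values are hypothetical, and the only facts taken from the text are that the score is on a 0 to 100 scale and that the Failure Threshold is a float such as 0.7.

```python
def interpret_score(score_percent: float, failure_threshold: float) -> str:
    """Classify a 0-100 score against a 0-1 failure threshold.

    Hypothetical helper: mirrors the rule "score >= threshold * 100 passes".
    """
    if not 0.0 <= failure_threshold <= 1.0:
        raise ValueError("failure_threshold must be between 0 and 1")
    # Express the threshold on the same 0-100 scale as the score.
    cutoff = failure_threshold * 100
    return "pass" if score_percent >= cutoff else "fail"


print(interpret_score(82.5, 0.7))  # score above the cutoff of 70
print(interpret_score(64.0, 0.7))  # score below the cutoff of 70
```

A score exactly equal to the cutoff counts as a pass, since the rule uses ≥.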
Evaluation Using Python SDK
Click here to learn how to set up evaluation using the Python SDK.
Input Type | Parameter | Type | Description |
---|---|---|---|
Required Inputs | expected_response | string | The reference answer. |
Required Inputs | response | string | The generated answer. |
Configuration Parameters | comparator | string | The method to use for comparison (e.g., Comparator.COSINE.value ). |
Configuration Parameters | failure_threshold | float | The threshold below which the evaluation fails (e.g., 0.7). |
Comparator Name | Class Name |
---|---|
Cosine Similarity | Comparator.COSINE.value |
Jaccard Similarity | Comparator.JACCARD.value |
Normalised Levenshtein Similarity | Comparator.NORMALISED_LEVENSHTEIN.value |
Jaro-Winkler Similarity | Comparator.JARO_WINKLER.value |
Sørensen-Dice Similarity | Comparator.SORENSEN_DICE.value |
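
To give a feel for what some of these comparators measure, here is a minimal pure-Python sketch of three of them. It is not the SDK's implementation. The function names are hypothetical, and the choices made here (lowercased whitespace tokens for Jaccard and Sørensen-Dice, character-level edits for Levenshtein, normalisation by the longer string's length) are assumptions that the SDK may not share.

```python
def _tokens(text: str) -> set:
    # Assumption: compare sets of lowercased, whitespace-separated words.
    return set(text.lower().split())


def jaccard(a: str, b: str) -> float:
    """|A ∩ B| / |A ∪ B| over word sets."""
    sa, sb = _tokens(a), _tokens(b)
    if not sa and not sb:
        return 1.0  # two empty strings are treated as identical
    return len(sa & sb) / len(sa | sb)


def sorensen_dice(a: str, b: str) -> float:
    """2 * |A ∩ B| / (|A| + |B|) over word sets."""
    sa, sb = _tokens(a), _tokens(b)
    if not sa and not sb:
        return 1.0
    return 2 * len(sa & sb) / (len(sa) + len(sb))


def levenshtein(a: str, b: str) -> int:
    """Character-level edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def normalised_levenshtein(a: str, b: str) -> float:
    """1 - distance / max(len(a), len(b)), so 1.0 means identical."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


print(jaccard("the cat sat", "the cat ran"))
print(sorensen_dice("the cat sat", "the cat ran"))
print(normalised_levenshtein("kitten", "sitting"))
```

All three functions return a value between 0 and 1, which matches the scale of the SDK's Score output.
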
Output | Type | Description |
---|---|---|
Score | float | Returns a score between 0 and 1. Values ≥ failure_threshold indicate sufficient similarity. |
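
As a rough end-to-end illustration of the inputs and output described above, the sketch below scores rows containing `expected_response` and `response` and compares each score with `failure_threshold`. It does not call the SDK. The `evaluate` function and the bag-of-words cosine comparator are assumptions made for this example, and the real Cosine comparator may be computed differently.

```python
import math
from collections import Counter


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between word-count vectors (assumed tokenisation)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    sq_a = sum(c * c for c in va.values())
    sq_b = sum(c * c for c in vb.values())
    if sq_a == 0 or sq_b == 0:
        return 0.0  # an empty string has no direction to compare
    return dot / math.sqrt(sq_a * sq_b)


def evaluate(rows, failure_threshold: float = 0.7):
    """Hypothetical helper: return a 0-1 score and pass flag for each row."""
    results = []
    for row in rows:
        score = cosine_similarity(row["expected_response"], row["response"])
        results.append({"score": score, "passed": score >= failure_threshold})
    return results


rows = [
    {"expected_response": "Paris is the capital of France",
     "response": "Paris is the capital of France"},
    {"expected_response": "Paris is the capital of France",
     "response": "Berlin is in Germany"},
]
results = evaluate(rows, failure_threshold=0.7)
for result in results:
    print(result)
```

The first row is an exact match and passes. The second shares only one word with its reference, so its score falls well below 0.7 and it fails.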