Evaluation Using Interface
Input:
- Required Inputs:
  - expected_response: The reference answer column.
  - response: The generated answer column.
- Configuration Parameters:
  - Comparator: The method used for comparison (e.g., Cosine, Exact Match).
  - Failure Threshold: Float (e.g., 0.7). The similarity score below which the evaluation is considered a failure.
Output:
- Score: Percentage score between 0 and 100.
  - Scores ≥ (Failure Threshold * 100): Indicate that the generated response is sufficiently similar to the expected_response based on the chosen Comparator.
  - Scores < (Failure Threshold * 100): Suggest that the response deviates significantly from the expected_response.
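
The threshold rule above compares a percentage score (0 to 100) against a threshold given as a fraction (0 to 1). The sketch below shows only that arithmetic. The function name `passes_threshold` and its parameters are hypothetical, chosen for illustration, and are not part of the interface or the SDK.

```python
def passes_threshold(score_percent: float, failure_threshold: float) -> bool:
    """Return True when a 0-100 score meets a 0-1 failure threshold.

    Hypothetical helper: it mirrors the rule described in the text,
    not the product's actual implementation.
    """
    if not 0.0 <= failure_threshold <= 1.0:
        raise ValueError("failure_threshold must be between 0 and 1")
    if not 0.0 <= score_percent <= 100.0:
        raise ValueError("score_percent must be between 0 and 100")
    # The score is a percentage, so scale the threshold to the same range.
    return score_percent >= failure_threshold * 100


print(passes_threshold(82.5, 0.7))  # True: 82.5 is above 70
print(passes_threshold(64.0, 0.7))  # False: 64.0 is below 70
```

With a Failure Threshold of 0.7, the cut-off is a score of 70: anything at or above it passes, anything below it is flagged as a failure.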
Evaluation Using Python SDK
Click here to learn how to set up evaluation using the Python SDK.
| Input Type | Parameter | Type | Description |
|---|---|---|---|
| Required Inputs | expected_response | string | The reference answer. |
| | response | string | The generated answer. |
| Configuration Parameters | comparator | string | The method to use for comparison (e.g., Comparator.COSINE.value). |
| | failure_threshold | float | The threshold below which the evaluation fails (e.g., 0.7). |

| Comparator Name | SDK Value |
|---|---|
| Cosine Similarity | Comparator.COSINE.value |
| Jaccard Similarity | Comparator.JACCARD.value |
| Normalised Levenshtein Similarity | Comparator.NORMALISED_LEVENSHTEIN.value |
| Jaro-Winkler Similarity | Comparator.JARO_WINKLER.value |
| Sørensen-Dice Similarity | Comparator.SORENSEN_DICE.value |
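
To give a feel for what two of these comparators measure, here is a minimal pure-Python sketch of Jaccard similarity and normalised Levenshtein similarity. It is an illustration under stated assumptions, not the SDK's implementation: the function names are hypothetical, Jaccard is computed over lowercase whitespace-separated tokens, and Levenshtein is normalised by the length of the longer string. The SDK may tokenise or normalise differently.

```python
def jaccard_similarity(a: str, b: str) -> float:
    """Jaccard similarity over lowercase whitespace-separated tokens (assumed tokenisation)."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a and not set_b:
        return 1.0  # two empty texts are treated as identical
    return len(set_a & set_b) / len(set_a | set_b)


def levenshtein_distance(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, keeping one row at a time."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep a match)
            ))
        previous = current
    return previous[-1]


def normalised_levenshtein_similarity(a: str, b: str) -> float:
    """1 minus the edit distance divided by the longer string's length (assumed normalisation)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein_distance(a, b) / longest


print(jaccard_similarity("the cat sat", "the cat ran"))  # 0.5
print(round(normalised_levenshtein_similarity("kitten", "sitting"), 3))  # 0.571
```

Both functions return a value between 0 and 1, which is the same range as the SDK score, so the result can be compared directly against a failure_threshold such as 0.7.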

| Output | Type | Description |
|---|---|---|
| Score | float | Returns a score between 0 and 1. Values ≥ failure_threshold indicate sufficient similarity. |