Toxicity
Toxicity assesses whether content contains harmful or toxic language. This evaluation is crucial for ensuring that text does not contain language that could be offensive, abusive, or harmful to individuals or groups.
Evaluation Using Interface
Input:
- Required Inputs:
- output: The output column generated by the model.
Output:
- Result: Passed / Failed
Interpretation:
- Passed: The output does not contain toxic language.
- Failed: The output contains toxic language.
Evaluation Using Python SDK
Click here to learn how to set up evaluation using the Python SDK. A minimal usage sketch follows the field reference below.
Input:
- Required Inputs:
- output: string - The output column generated by the model.
Output:
- Result: bool - 0/1
Interpretation:
- 0: The output contains toxic language.
- 1: The output does not contain toxic language.
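The snippet below is only a minimal sketch of running this eval programmatically. The import path, client class, `evaluate` method, and the `"toxicity"` eval identifier are assumed names for illustration, not the SDK's confirmed API; refer to the setup guide linked above for the exact calls.

```python
# Minimal sketch -- the import path, client name, and evaluate() signature
# are assumptions for illustration; see the SDK setup guide for the real API.
from my_eval_sdk import EvalClient  # hypothetical import

client = EvalClient(api_key="YOUR_API_KEY")  # hypothetical client

result = client.evaluate(
    eval_name="toxicity",  # assumed identifier for this eval
    inputs={"output": "Thanks for reaching out! Happy to help."},
)

# Result is 0/1: 1 means no toxic language, 0 means toxic language detected.
if result == 1:
    print("Passed: no toxic language detected.")
else:
    print("Failed: toxic language detected.")
```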
What to do when Toxicity is Detected
If toxicity is detected in your response, the first step is to remove or rephrase harmful language to ensure the text remains safe and appropriate. Implementing content moderation policies can help prevent the dissemination of toxic language by enforcing guidelines for acceptable communication.
Additionally, enhancing toxicity detection mechanisms can improve accuracy, reducing false positives while ensuring that genuinely harmful content is effectively identified and addressed.
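To make the first step concrete, the sketch below shows one way to gate responses behind a post-generation toxicity check: score the output, and if it crosses a threshold, ask the model to rephrase before falling back to a safe refusal. The `toxicity_score` and `regenerate` callables, the 0.5 threshold, and the retry count are all assumptions for illustration; plug in whatever detector and generation hooks your stack provides.

```python
# Sketch of a post-generation toxicity gate. `toxicity_score` is a placeholder
# for whatever detector you use (a moderation API, an open-source classifier,
# or this eval itself); the threshold and retry strategy are assumptions.
from typing import Callable

TOXICITY_THRESHOLD = 0.5  # assumed cut-off; tune against your own labeled data


def moderate_output(
    text: str,
    toxicity_score: Callable[[str], float],
    regenerate: Callable[[str], str],
    max_retries: int = 2,
) -> str:
    """Return `text` if it passes the toxicity check, otherwise ask the
    model to rephrase it up to `max_retries` times before falling back
    to a safe refusal message."""
    candidate = text
    for _ in range(max_retries + 1):
        if toxicity_score(candidate) < TOXICITY_THRESHOLD:
            return candidate
        # Rephrase the harmful language rather than returning it verbatim.
        candidate = regenerate(candidate)
    return "I can't share that response, but I'm happy to help another way."
```

Rephrasing before refusing keeps responses helpful while still enforcing the moderation policy; tightening or loosening the threshold trades false positives against missed toxic content.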
Comparing Toxicity with Similar Evals
- Content Moderation: Assesses text for overall safety and appropriateness, identifying harmful or offensive content across various categories. In contrast, Toxicity Evaluation specifically targets toxic language, such as hate speech, threats, or highly inflammatory remarks.
- Tone Analysis: Evaluates the emotional tone and sentiment of the text, determining whether it is neutral, positive, or negative. While it provides insight into how a message may be perceived, Toxicity Evaluation is concerned with identifying language that is explicitly harmful or offensive, regardless of sentiment.