Eval Definition
Content Safety Violation
Detects harmful, unsafe, or prohibited content that violates safety guidelines.
Evaluation Using Interface
Input:
- Required Inputs:
- input: The text content to evaluate for safety violations.
Output:
- Result: Returns ‘Passed’ if no safety violations are detected, ‘Failed’ if safety violations are detected.
- Reason: A detailed explanation of why the content was classified as safe or unsafe.
Evaluation Using Python SDK
Click here to learn how to setup evaluation using the Python SDK.
Input:
- Required Inputs:
- input:
string
- The text content to evaluate for safety violations.
- input:
Output:
- Result: Returns a list containing ‘Passed’ if no safety violations are detected, or ‘Failed’ if safety violations are detected.
- Reason: Provides a detailed explanation of the evaluation.
Example Output:
What to do If you get Undesired Results
If the content is flagged for safety violations (Failed) and you want to improve it:
- Remove any violent, threatening, or harassing language
- Eliminate content that promotes illegal activities or harmful behaviors
- Remove sexually explicit material or inappropriate references
- Avoid hate speech, discriminatory content, or derogatory language
- Remove content that could be seen as encouraging self-harm or harm to others
- Eliminate language that exploits or sexualizes minors in any way
- Avoid sharing personal information that could compromise privacy or security
- Replace extremist content or dangerous misinformation with factual, balanced information
Comparing Content Safety Violation with Similar Evals
- Is Compliant: Content Safety Violation focuses specifically on harmful or unsafe content, while Is Compliant provides a broader assessment of adherence to guidelines and policies.
- Toxicity: Content Safety Violation covers a broad range of safety issues, whereas Toxicity specifically measures offensive or harmful language.
- Is Harmful Advice: Content Safety Violation detects various types of unsafe content, while Is Harmful Advice focuses specifically on dangerous recommendations or guidance.