Input

| Required Input | Description |
|---|---|
| output | Model-generated content to evaluate for safety violations |

Output

| Field | Description |
|---|---|
| Result | Returns Passed if no safety violations are detected, or Failed if safety violations are detected |
| Reason | Provides a detailed explanation of the evaluation |
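To make the input/output contract concrete, here is a minimal Python sketch of the shapes described in the tables above. The names `SafetyEvalResult` and `check_safety` are hypothetical illustrations of the contract, not the actual evaluator API; in practice the eval runs on the platform.

```python
from dataclasses import dataclass

# Hypothetical types for illustration only; real names and signatures may differ.

@dataclass
class SafetyEvalResult:
    result: str  # "Passed" if no safety violations are detected, else "Failed"
    reason: str  # detailed explanation of the evaluation

def check_safety(output: str) -> SafetyEvalResult:
    """Stub standing in for the Content Safety Violation eval.

    A real implementation would call the hosted evaluator here; this stub
    only illustrates the contract: one required input (`output`, the
    model-generated content) and a Result/Reason pair in the response.
    """
    return SafetyEvalResult(
        result="Passed",
        reason="No safety violations detected.",
    )

# Usage: pass the model-generated content as `output`.
verdict = check_safety(output="The capital of France is Paris.")
print(verdict.result, "-", verdict.reason)
```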
What to Do If You Get Undesired Results

If the content is flagged for safety violations (Failed) and you want to improve it:

- Remove any violent, threatening, or harassing language
- Eliminate content that promotes illegal activities or harmful behaviors
- Remove sexually explicit material or inappropriate references
- Avoid hate speech, discriminatory content, or derogatory language
- Remove content that could be seen as encouraging self-harm or harm to others
- Eliminate language that exploits or sexualizes minors in any way
- Avoid sharing personal information that could compromise privacy or security
- Replace extremist content or dangerous misinformation with factual, balanced information
Comparing Content Safety Violation with Similar Evals
- Is Compliant: Content Safety Violation focuses specifically on harmful or unsafe content, while Is Compliant provides a broader assessment of adherence to guidelines and policies.
- Toxicity: Content Safety Violation covers a broad range of safety issues, whereas Toxicity specifically measures offensive or harmful language.
- Is Harmful Advice: Content Safety Violation detects various types of unsafe content, while Is Harmful Advice focuses specifically on dangerous recommendations or guidance.