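The example below runs the Content Safety Violation template against a single model response and prints the verdict along with its explanation. It assumes `evaluator` is an evaluation client that has already been initialized following the SDK's setup instructions.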
result = evaluator.evaluate(
    eval_templates="content_safety_violation",  # template that flags unsafe or harmful content
    inputs={
        "output": "Dear Sir, I hope this email finds you well. I look forward to any insights or advice you might have whenever you have a free moment"
    },
    model_name="turing_flash"
)

# Verdict ("Passed" or "Failed") and the explanation behind it
print(result.eval_results[0].output)
print(result.eval_results[0].reason)
Input

  Required Input    Description
  output            Model generated content to evaluate for safety violations

Output

  Field     Description
  Result    Returns Passed if no safety violations are detected, or Failed if safety violations are detected
  Reason    Provides a detailed explanation of the evaluation
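
Because Result is a plain Passed/Failed verdict, a common pattern is to gate downstream use of the content on it. A minimal sketch, reusing the `result` object from the snippet above; the field access mirrors that example, while the gating logic itself is illustrative:

verdict = result.eval_results[0].output  # "Passed" or "Failed"
if verdict == "Failed":
    # Surface the evaluator's explanation so the content can be revised
    print(f"Safety violation detected: {result.eval_results[0].reason}")
else:
    print("Content passed the safety check.")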

What to Do If You Get Undesired Results

If the content is flagged for safety violations (Failed) and you want to improve it, work through the points below and then re-run the eval on the revised text (see the sketch after this list):
  • Remove any violent, threatening, or harassing language
  • Eliminate content that promotes illegal activities or harmful behaviors
  • Remove sexually explicit material or inappropriate references
  • Avoid hate speech, discriminatory content, or derogatory language
  • Remove content that could be seen as encouraging self-harm or harm to others
  • Eliminate language that exploits or sexualizes minors in any way
  • Avoid sharing personal information that could compromise privacy or security
  • Replace extremist content or dangerous misinformation with factual, balanced information
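
A minimal sketch of that re-check step, assuming the same `evaluator` client and call shape as in the example above:

revised_output = "..."  # the content after applying the fixes above
result = evaluator.evaluate(
    eval_templates="content_safety_violation",
    inputs={"output": revised_output},
    model_name="turing_flash"
)
if result.eval_results[0].output == "Failed":
    # Still failing: use the explanation to guide another revision pass
    print(result.eval_results[0].reason)
else:
    print("Revised content passes the safety check.")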

Comparing Content Safety Violation with Similar Evals

  • Is Compliant: Content Safety Violation focuses specifically on harmful or unsafe content, while Is Compliant provides a broader assessment of adherence to guidelines and policies.
  • Toxicity: Content Safety Violation covers a broad range of safety issues, whereas Toxicity specifically measures offensive or harmful language.
  • Is Harmful Advice: Content Safety Violation detects various types of unsafe content, while Is Harmful Advice focuses specifically on dangerous recommendations or guidance.
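
One way to decide between these evals is to run each against the same output in separate calls and compare the verdicts. A minimal sketch; the template ids for Is Compliant, Toxicity, and Is Harmful Advice are assumed snake_case names here and should be verified against the template catalogue:

output_text = "Dear Sir, I hope this email finds you well."
# "is_compliant", "toxicity", and "is_harmful_advice" are assumed ids; verify before use
for template in ["content_safety_violation", "is_compliant", "toxicity", "is_harmful_advice"]:
    result = evaluator.evaluate(
        eval_templates=template,
        inputs={"output": output_text},
        model_name="turing_flash"
    )
    print(template, result.eval_results[0].output)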