Evaluation Using Interface

Input:

  • Required Inputs:
    • input: The text content to evaluate for safety violations.

Output:

  • Result: Returns ‘Passed’ if no safety violations are detected, or ‘Failed’ if safety violations are detected.
  • Reason: A detailed explanation of why the content was classified as safe or unsafe.

Evaluation Using Python SDK

Click here to learn how to set up evaluation using the Python SDK.

Input:

  • Required Inputs:
    • input: string - The text content to evaluate for safety violations.

Output:

  • Result: Returns a list containing ‘Passed’ if no safety violations are detected, or ‘Failed’ if safety violations are detected.
  • Reason: Provides a detailed explanation of the evaluation.
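
For example, the following snippet runs the Content Safety Violation eval on a benign email: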
# Assumes `evaluator` has already been initialised as described in the
# Python SDK setup guide linked above.
result = evaluator.evaluate(
    eval_templates="content_safety_violation",
    inputs={
        "input": "Dear Sir, I hope this email finds you well. I look forward to any insights or advice you might have whenever you have a free moment"
    },
    model_name="turing_flash"
)

# metrics[0].value holds the Result; reason holds the explanation.
print(result.eval_results[0].metrics[0].value)
print(result.eval_results[0].reason)

Example Output:

['Passed']
The evaluation is 'Passed' because the content does not contain any safety violations or harmful material.

*   The text is a **standard professional email greeting** without any concerning language.
*   There is **no harmful content** such as hate speech, threats, harassment, or discriminatory language.
*   The message does not contain any **instructions for illegal activities, violence, or self-harm**.
*   There are no elements of **sexual content, child exploitation, or other prohibited material**.
*   The language is **respectful** and does not attempt to manipulate or deceive the recipient.

A different assessment is not possible because the content is clearly benign and professional in nature.

What to Do If You Get Undesired Results

If the content is flagged for safety violations (Failed) and you want to improve it, revise the text along these lines and then re-run the eval (see the sketch after this list):

  • Remove any violent, threatening, or harassing language
  • Eliminate content that promotes illegal activities or harmful behaviors
  • Remove sexually explicit material or inappropriate references
  • Avoid hate speech, discriminatory content, or derogatory language
  • Remove content that could be seen as encouraging self-harm or harm to others
  • Eliminate language that exploits or sexualizes minors in any way
  • Avoid sharing personal information that could compromise privacy or security
  • Replace extremist content or dangerous misinformation with factual, balanced information
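
After revising the content, re-run the same eval to confirm the violation is resolved. A minimal sketch, assuming the same evaluator object from the example above and a hypothetical revised_text string standing in for your edited content:

# `revised_text` is a hypothetical placeholder for your edited content.
revised_text = "Thank you for your feedback. I would welcome the chance to discuss these points further."

result = evaluator.evaluate(
    eval_templates="content_safety_violation",
    inputs={"input": revised_text},
    model_name="turing_flash"
)

# Expect 'Passed' once the flagged language has been removed.
print(result.eval_results[0].metrics[0].value)
print(result.eval_results[0].reason)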

Comparing Content Safety Violation with Similar Evals

  • Is Compliant: Content Safety Violation focuses specifically on harmful or unsafe content, while Is Compliant provides a broader assessment of adherence to guidelines and policies.
  • Toxicity: Content Safety Violation covers a broad range of safety issues, whereas Toxicity specifically measures offensive or harmful language.
  • Is Harmful Advice: Content Safety Violation detects various types of unsafe content, while Is Harmful Advice focuses specifically on dangerous recommendations or guidance.
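
To see how these evals differ in practice, you can run them side by side on the same input. A minimal sketch, assuming eval_templates also accepts a list of template names (suggested by the plural parameter name and the eval_results list) and that "toxicity" is the template identifier for the Toxicity eval; check the SDK's template reference for the exact names:

# The list form of eval_templates and the "toxicity" template name are
# assumptions; verify both against the SDK's template reference.
result = evaluator.evaluate(
    eval_templates=["content_safety_violation", "toxicity"],
    inputs={
        "input": "Dear Sir, I hope this email finds you well."
    },
    model_name="turing_flash"
)

# One result per requested template, in order.
for eval_result in result.eval_results:
    print(eval_result.metrics[0].value, "-", eval_result.reason)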