Evaluation Using Interface

Input:

  • Required Inputs:
    • input: The text content to evaluate for safety violations.

Output:

  • Result: Returns ‘Passed’ if no safety violations are detected, or ‘Failed’ if safety violations are detected.
  • Reason: A detailed explanation of why the content was classified as safe or unsafe.

Evaluation Using Python SDK

Click here to learn how to set up evaluation using the Python SDK.

Input:

  • Required Inputs:
    • input: string - The text content to evaluate for safety violations.

Output:

  • Result: Returns a list containing ‘Passed’ if no safety violations are detected, or ‘Failed’ if safety violations are detected.
  • Reason: Provides a detailed explanation of the evaluation.
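
For example, the following snippet runs the Content Safety Violation eval on a benign email: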
# Assumes `evaluator` has already been initialised as described in the
# Python SDK setup guide linked above.
result = evaluator.evaluate(
    eval_templates="content_safety_violation",
    inputs={
        "input": "Dear Sir, I hope this email finds you well. I look forward to any insights or advice you might have whenever you have a free moment"
    },
    model_name="turing_flash"
)

# metrics[0].value holds the Result; reason holds the explanation.
print(result.eval_results[0].metrics[0].value)
print(result.eval_results[0].reason)

Example Output:

['Passed']
The evaluation is 'Passed' because the content does not contain any safety violations or harmful material.

*   The text is a **standard professional email greeting** without any concerning language.
*   There is **no harmful content** such as hate speech, threats, harassment, or discriminatory language.
*   The message does not contain any **instructions for illegal activities, violence, or self-harm**.
*   There are no elements of **sexual content, child exploitation, or other prohibited material**.
*   The language is **respectful** and does not attempt to manipulate or deceive the recipient.

A different assessment is not possible because the content is clearly benign and professional in nature.

What to Do If You Get Undesired Results

If the content is flagged for safety violations (Failed) and you want to improve it, revise the text along these lines and then re-run the eval (see the sketch after this list):

  • Remove any violent, threatening, or harassing language
  • Eliminate content that promotes illegal activities or harmful behaviors
  • Remove sexually explicit material or inappropriate references
  • Avoid hate speech, discriminatory content, or derogatory language
  • Remove content that could be seen as encouraging self-harm or harm to others
  • Eliminate language that exploits or sexualizes minors in any way
  • Avoid sharing personal information that could compromise privacy or security
  • Replace extremist content or dangerous misinformation with factual, balanced information
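
After revising the content, re-run the same eval to confirm the violation is resolved. A minimal sketch, assuming the same evaluator object from the example above and a hypothetical revised_text string standing in for your edited content:

# `revised_text` is a hypothetical placeholder for your edited content.
revised_text = "Thank you for your feedback. I would welcome the chance to discuss these points further."

result = evaluator.evaluate(
    eval_templates="content_safety_violation",
    inputs={"input": revised_text},
    model_name="turing_flash"
)

# Expect 'Passed' once the flagged language has been removed.
print(result.eval_results[0].metrics[0].value)
print(result.eval_results[0].reason)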

Comparing Content Safety Violation with Similar Evals

  • Is Compliant: Content Safety Violation focuses specifically on harmful or unsafe content, while Is Compliant provides a broader assessment of adherence to guidelines and policies.
  • Toxicity: Content Safety Violation covers a broad range of safety issues, whereas Toxicity specifically measures offensive or harmful language.
  • Is Harmful Advice: Content Safety Violation detects various types of unsafe content, while Is Harmful Advice focuses specifically on dangerous recommendations or guidance.
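
To see how these evals differ in practice, you can run them side by side on the same input. A minimal sketch, assuming eval_templates also accepts a list of template names (suggested by the plural parameter name and the eval_results list) and that "toxicity" is the template identifier for the Toxicity eval; check the SDK's template reference for the exact names:

# The list form of eval_templates and the "toxicity" template name are
# assumptions; verify both against the SDK's template reference.
result = evaluator.evaluate(
    eval_templates=["content_safety_violation", "toxicity"],
    inputs={
        "input": "Dear Sir, I hope this email finds you well."
    },
    model_name="turing_flash"
)

# One result per requested template, in order.
for eval_result in result.eval_results:
    print(eval_result.metrics[0].value, "-", eval_result.reason)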