Safe for Work Text Eval
The Safe for Work Text eval assesses whether content is appropriate for professional environments. It verifies that text content meets workplace standards and is suitable for workplace consumption, free from explicit, offensive, or otherwise NSFW (Not Safe For Work) material.
Evaluation Using Interface
Input:
- Required Inputs:
  - response: The text content column to evaluate for workplace appropriateness.
- Configuration Parameters:
  - None specified for this evaluation.
Output:
- Result: Passed / Failed
Interpretation:
- Passed: Indicates the response content is considered appropriate for a general workplace environment (no NSFW content was detected).
- Failed: Signifies that the response content contains material potentially inappropriate for a general workplace environment (e.g., explicit, offensive, or harmful content).
Evaluation Using Python SDK
Click here to learn how to set up evaluation using the Python SDK.
| Input Type | Parameter | Type | Description |
| --- | --- | --- | --- |
| Required Inputs | response | string | The text content to evaluate for workplace appropriateness. |
| Output | Type | Description |
| --- | --- | --- |
| Result | float | Returns 1.0 if the content is deemed safe for work (Passed), 0.0 if it is not safe for work (Failed). |
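The exact client and method names vary by SDK version, so the snippet below is only a minimal sketch of calling the eval and interpreting its float result: `SafeForWorkTextEval` and its `run` method are hypothetical placeholders standing in for the real SDK objects, and the stub always returns 1.0 so the example runs end to end.

```python
# Minimal sketch of running the Safe for Work Text eval from Python.
# NOTE: `SafeForWorkTextEval` and `run` are hypothetical placeholders --
# substitute the identifiers from your SDK's documentation.


class SafeForWorkTextEval:
    """Hypothetical stand-in for the SDK's evaluator object."""

    def run(self, response: str) -> float:
        # A real evaluator would score `response` remotely and return
        # 1.0 (safe for work / Passed) or 0.0 (not safe / Failed).
        # This stub always passes so the sketch runs end to end.
        return 1.0


evaluator = SafeForWorkTextEval()
score = evaluator.run(response="Quarterly results improved across all regions.")
print("Passed" if score == 1.0 else "Failed")  # maps 1.0 -> Passed, 0.0 -> Failed
```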
What to do when NSFW Text is Detected
Remove or flag the inappropriate content to prevent its dissemination. If necessary, request content revision to ensure compliance with workplace standards.
Implementing robust content filtering policies can help prevent such content from being generated or shared. If detection accuracy needs improvement, adjust detection thresholds, update NSFW content patterns to reflect evolving standards, and strengthen validation rules to enhance filtering effectiveness. One possible gating pattern is sketched below.
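As a concrete illustration of such a policy, this sketch gates content on the eval's binary score and routes failures to a review queue. The `run_sfw_eval` callable, the `gate_content` helper, and the keyword-based demo evaluator are all assumptions made for this example, not part of any SDK.

```python
# Illustrative gating pattern: block or flag content that fails the
# Safe for Work Text eval before it reaches end users.
# `run_sfw_eval` is an assumed callable returning the eval's 1.0/0.0 score.

from typing import Callable

review_queue: list[str] = []  # stand-in for a real moderation queue


def gate_content(text: str, run_sfw_eval: Callable[[str], float]) -> bool:
    """Return True if `text` may be published, False if it was flagged."""
    if run_sfw_eval(text) == 1.0:  # Passed: safe for work
        return True
    review_queue.append(text)  # Failed: hold for revision or human review
    return False


# Example usage with a trivial stand-in evaluator that flags one keyword.
demo_eval = lambda t: 0.0 if "explicit" in t.lower() else 1.0
print(gate_content("Team offsite agenda attached.", demo_eval))  # True
print(gate_content("Some explicit material...", demo_eval))      # False
print(review_queue)  # flagged items awaiting review
```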
Differentiating Safe for Work Text Eval from Toxicity
Safe for Work evaluation assesses whether content is appropriate for professional environments, ensuring it aligns with workplace standards. In contrast, Toxicity evaluation focuses on detecting harmful or offensive language, identifying content that may be aggressive, inflammatory, or inappropriate, regardless of context.