Safe for Work Text Eval
Definition
The Safe for Work Text eval assesses whether text is appropriate for professional environments: free of explicit, inappropriate, or otherwise NSFW (Not Safe For Work) content, and suitable for workplace consumption.
Calculation
It employs the GuardrailsAI NSFWText validator, which checks the text sentence by sentence for inappropriate content.
A configurable threshold determines acceptability: if the text meets workplace standards, the eval returns a “Pass”; if NSFW content is detected, it returns a “Fail,” ensuring that only suitable content is delivered in professional settings.
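The sentence-level pass/fail logic can be sketched as follows. Note this is a minimal illustration of the threshold mechanic, not the validator itself: `nsfw_score` is a hypothetical keyword-based stand-in for the trained classifier that GuardrailsAI's NSFWText validator actually uses, and `BLOCKLIST` is an invented example.

```python
import re

# Hypothetical stand-in for the real NSFW classifier (assumption:
# the actual validator scores sentences with a trained model).
BLOCKLIST = {"explicit", "nsfw"}

def nsfw_score(sentence: str) -> float:
    """Toy scorer: 1.0 if any blocklisted word appears, else 0.0."""
    words = set(re.findall(r"[a-z']+", sentence.lower()))
    return 1.0 if words & BLOCKLIST else 0.0

def sfw_eval(text: str, threshold: float = 0.8) -> str:
    """Split text into sentences; fail if any sentence's NSFW score
    meets or exceeds the configurable threshold."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    for sentence in sentences:
        if nsfw_score(sentence) >= threshold:
            return "Fail"
    return "Pass"
```

A single offending sentence is enough to fail the whole text, which matches sentence-level validation: the threshold applies per sentence, not to an average over the document.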
What to Do When NSFW Text Is Detected
Remove or flag the inappropriate content to prevent its dissemination. If necessary, request content revision to ensure compliance with workplace standards.
Implementing robust content filtering policies can help prevent such content from being generated or shared. If detection accuracy needs improvement, adjust detection thresholds, update NSFW content patterns to reflect evolving standards, and strengthen validation rules to enhance filtering effectiveness.
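The remove-or-flag step above can be sketched as a small post-processing pass. This assumes per-sentence NSFW scores are already available from an upstream validator; the redaction marker and the returned flag list (used to request revision) are illustrative choices, not part of any library API.

```python
def remediate(scored_sentences, threshold: float = 0.8):
    """Redact sentences whose NSFW score meets the threshold and
    collect them so a content revision can be requested.

    scored_sentences: list of (sentence, score) pairs, assumed to come
    from an upstream sentence-level validator.
    Returns (cleaned_text, flagged_sentences).
    """
    cleaned, flagged = [], []
    for sentence, score in scored_sentences:
        if score >= threshold:
            flagged.append(sentence)              # keep for the revision request
            cleaned.append("[REDACTED: revision required]")
        else:
            cleaned.append(sentence)
    return " ".join(cleaned), flagged
```

Lowering the threshold makes this filter stricter; raising it reduces false positives. Tuning that trade-off is the "adjust detection thresholds" step described above.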
Differentiating Safe for Work Text Eval from Toxicity
Safe for Work evaluation assesses whether content is appropriate for professional environments, ensuring it aligns with workplace standards. In contrast, Toxicity evaluation focuses on detecting harmful or offensive language, identifying content that may be aggressive, inflammatory, or inappropriate, regardless of context.