Evaluation Using Interface

Input:

  • Required Inputs:
    • text: The text content column to moderate.

Output:

  • Score: Float score between 0 and 1

Interpretation:

  • Higher scores (closer to 1): Indicate safer content that is less likely to contain harmful, inappropriate, or unsafe material.
  • Lower scores (closer to 0): Indicate potentially inappropriate content that may require review or filtering.

Evaluation Using Python SDK

Click here to learn how to set up evaluation using the Python SDK.


Input Type         Parameter   Type     Description
Required Inputs    text        string   The text content to moderate.

Output   Type    Description
Score    float   Returns a score between 0 and 1. Higher values indicate safer content; lower values indicate potentially inappropriate content.

from fi.evals import EvalClient
from fi.testcases import TestCase
from fi.evals.templates import ContentModeration

# Initialise the evaluation client with your Future AGI credentials
evaluator = EvalClient(
    fi_api_key="your_api_key",
    fi_secret_key="your_secret_key",
    fi_base_url="https://api.futureagi.com"
)

# Instantiate the Content Moderation evaluation template
moderation_eval = ContentModeration()

# Build a test case containing the text to moderate
test_case = TestCase(
    text="This is a sample text to check for content moderation."
)

# Run the evaluation and read the safety score (between 0 and 1)
result = evaluator.evaluate(eval_templates=[moderation_eval], inputs=[test_case])
moderation_result = result.eval_results[0].metrics[0].value
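
The returned value can then be compared against a threshold suited to your application. A minimal sketch is shown below; the 0.5 cutoff is only an illustrative assumption, not an SDK default.

# Illustrative only: 0.5 is an assumed threshold, not an SDK default
SAFE_THRESHOLD = 0.5

if moderation_result >= SAFE_THRESHOLD:
    print(f"Content considered safe (score={moderation_result:.2f})")
else:
    print(f"Content flagged for review (score={moderation_result:.2f})")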


What to do when Content Moderation Fails

When content moderation fails, the first step is to analyse the flagged content by identifying which safety categories triggered the failure and reviewing the specific problematic sections. Understanding the context and severity of violations is crucial in determining the appropriate remediation steps.

To address flagged content, modifications may include rewording while preserving meaning, implementing pre-processing safety checks, or adding content filtering before submission. If system adjustments are required, reviewing and refining safety thresholds, implementing category-specific filters, and incorporating additional pre-screening measures can enhance moderation accuracy. For more robust filtering, a multi-stage moderation pipeline may be considered; a minimal pre-screening sketch follows below.
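
The sketch below illustrates one way such a pre-processing safety check could look. It reuses the evaluator, TestCase, and ContentModeration objects from the SDK example above; the is_safe_to_submit helper and the 0.7 threshold are illustrative assumptions, not part of the SDK.

# Pre-screening sketch that reuses `evaluator`, `TestCase`, and
# `ContentModeration` from the example above. The helper name and the
# 0.7 threshold are illustrative assumptions, not part of the SDK.
def is_safe_to_submit(text, threshold=0.7):
    result = evaluator.evaluate(
        eval_templates=[ContentModeration()],
        inputs=[TestCase(text=text)]
    )
    score = result.eval_results[0].metrics[0].value
    return score >= threshold

draft = "Example text that a user wants to publish."
if not is_safe_to_submit(draft):
    print("Blocked: route the draft to manual review before submission.")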


Comparing Content Moderation with Similar Evals

  1. Safe for Work Text: While Content Moderation provides comprehensive safety analysis, Safe for Work Text specifically focuses on workplace appropriateness. Content Moderation is broader and includes multiple safety categories.
  2. Not Gibberish Text: Content Moderation focuses on safety aspects, while Not Gibberish Text evaluates text coherence and meaningfulness. They can be used together for comprehensive content quality assessment, as shown in the sketch below.
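
To illustrate the second point, the sketch below runs both evaluations in a single call, reusing the evaluator from the earlier example. The NotGibberishText class name is assumed to mirror the naming of ContentModeration; confirm the exact identifier in fi.evals.templates before relying on it.

# Combined quality check: safety plus coherence in one call.
# NotGibberishText is an assumed class name; verify it in fi.evals.templates.
from fi.evals.templates import ContentModeration, NotGibberishText

test_case = TestCase(text="This is a sample text to assess.")

result = evaluator.evaluate(
    eval_templates=[ContentModeration(), NotGibberishText()],
    inputs=[test_case]
)

# Assumes results follow the order of eval_templates
safety_score = result.eval_results[0].metrics[0].value      # Content Moderation
coherence_score = result.eval_results[1].metrics[0].value   # Not Gibberish Text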