Toxicity

The Toxicity eval checks content for harmful or toxic language. It helps ensure that generated text contains nothing offensive, abusive, or harmful to individuals or groups.

```python
# Assumes an Evaluator instance from the Future AGI SDK has already been created.
result = evaluator.evaluate(
    eval_templates="toxicity",
    inputs={
        "output": "Dear Sir, I hope this email finds you well. I look forward to any insights or advice you might have whenever you have a free moment"
    },
    model_name="turing_flash"
)

print(result.eval_results[0].output)
print(result.eval_results[0].reason)
```
```typescript
import { Evaluator, Templates } from "@future-agi/ai-evaluation";

const evaluator = new Evaluator();

const result = await evaluator.evaluate(
  "toxicity",
  {
    output: "Dear Sir, I hope this email finds you well. I look forward to any insights or advice you might have whenever you have a free moment"
  },
  {
    modelName: "turing_flash",
  }
);

console.log(result);
```
Input

| Input | Type | Description |
|---|---|---|
| `output` (required) | string | Content to evaluate for toxicity. |

Output

| Field | Description |
|---|---|
| Result | Returns `Passed` if no toxicity is detected, or `Failed` if toxicity is detected. |
| Reason | A detailed explanation of why the content was classified as containing or not containing toxicity. |

What to do when Toxicity is Detected

If toxicity is detected in a response, first remove or rephrase the harmful language so the text remains safe and appropriate. Content moderation policies can also help prevent toxic language from reaching users by enforcing guidelines for acceptable communication.

Additionally, enhancing toxicity detection mechanisms can improve accuracy, reducing false positives while ensuring that genuinely harmful content is effectively identified and addressed.
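The remediation steps above can be wired into a simple gate on the eval's `Result` field. This is a minimal sketch, not part of the SDK: the `moderate_response` helper and its fallback message are hypothetical names, and it assumes the eval returns the `Passed`/`Failed` strings described in the Output table.

```python
def moderate_response(
    eval_result: str,
    response: str,
    fallback: str = "I'm sorry, I can't share that response.",
) -> str:
    """Gate a response on the toxicity eval's Result field.

    eval_result: "Passed" (no toxicity detected) or "Failed"
    (toxicity detected), matching the Output table above.
    """
    if eval_result == "Passed":
        return response  # safe to deliver as-is
    # Toxicity detected: block the response (or route it to a
    # rephrasing step) instead of sending it to the user.
    return fallback


print(moderate_response("Passed", "Hello there!"))
print(moderate_response("Failed", "<toxic text>"))
```

In production you would typically log the `Reason` field alongside the blocked response so that false positives can be reviewed and the detection thresholds tuned.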

Comparing Toxicity with Similar Evals

  • Bias Detection: Toxicity detects harmful or offensive language such as hate speech and threats, while Bias Detection identifies subtler forms of prejudice including gender, racial, or ideological bias.
  • Tone: Toxicity identifies language that is explicitly harmful or offensive, while Tone evaluates the overall emotional sentiment of the text (neutral, positive, or negative).