Toxicity

The Toxicity eval checks content for harmful or toxic language. It helps ensure that generated text contains nothing offensive, abusive, or harmful to individuals or groups.

```python
# Assumes an Evaluator instance from the Future AGI SDK has already been created.
result = evaluator.evaluate(
    eval_templates="toxicity",
    inputs={
        "output": "Dear Sir, I hope this email finds you well. I look forward to any insights or advice you might have whenever you have a free moment"
    },
    model_name="turing_flash"
)

print(result.eval_results[0].output)
print(result.eval_results[0].reason)
```
```typescript
import { Evaluator, Templates } from "@future-agi/ai-evaluation";

const evaluator = new Evaluator();

const result = await evaluator.evaluate(
  "toxicity",
  {
    output: "Dear Sir, I hope this email finds you well. I look forward to any insights or advice you might have whenever you have a free moment"
  },
  {
    modelName: "turing_flash",
  }
);

console.log(result);
```
Input

| Input | Type | Description |
|---|---|---|
| `output` (required) | string | Content to evaluate for toxicity. |

Output

| Field | Description |
|---|---|
| Result | Returns `Passed` if no toxicity is detected, or `Failed` if toxicity is detected. |
| Reason | A detailed explanation of why the content was classified as containing or not containing toxicity. |

What to do when Toxicity is Detected

If toxicity is detected in a response, first remove or rephrase the harmful language so the text remains safe and appropriate. Content moderation policies can also help prevent toxic language from reaching users by enforcing guidelines for acceptable communication.

Additionally, enhancing toxicity detection mechanisms can improve accuracy, reducing false positives while ensuring that genuinely harmful content is effectively identified and addressed.
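The remediation steps above can be wired into a simple gate on the eval's `Result` field. This is a minimal sketch, not part of the SDK: the `moderate_response` helper and its fallback message are hypothetical names, and it assumes the eval returns the `Passed`/`Failed` strings described in the Output table.

```python
def moderate_response(
    eval_result: str,
    response: str,
    fallback: str = "I'm sorry, I can't share that response.",
) -> str:
    """Gate a response on the toxicity eval's Result field.

    eval_result: "Passed" (no toxicity detected) or "Failed"
    (toxicity detected), matching the Output table above.
    """
    if eval_result == "Passed":
        return response  # safe to deliver as-is
    # Toxicity detected: block the response (or route it to a
    # rephrasing step) instead of sending it to the user.
    return fallback


print(moderate_response("Passed", "Hello there!"))
print(moderate_response("Failed", "<toxic text>"))
```

In production you would typically log the `Reason` field alongside the blocked response so that false positives can be reviewed and the detection thresholds tuned.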

Comparing Toxicity with Similar Evals

  • Bias Detection: Toxicity detects harmful or offensive language such as hate speech and threats, while Bias Detection identifies subtler forms of prejudice including gender, racial, or ideological bias.
  • Tone: Toxicity identifies language that is explicitly harmful or offensive, while Tone evaluates the overall emotional sentiment of the text (neutral, positive, or negative).