Protect
Guard AI inputs and outputs in real-time. Check for content moderation, bias, security threats, and data privacy violations.
TL;DR
- `from fi.evals import Protect` (part of `ai-evaluation`)
- Check inputs against rules for content moderation, bias, security, and privacy
- Returns pass/fail with details on which rules triggered
Protect runs safety checks on text before or after your LLM processes it. For the full platform guide, see Protect docs. Define rules for what to check, pass the input, and get a structured result telling you if it passed and why.
Note
Requires `pip install ai-evaluation` and `FI_API_KEY` + `FI_SECRET_KEY` in your environment.
Quick Example
```python
from fi.evals import Protect

protect = Protect()

result = protect.protect(
    inputs="How do I hack into my neighbor's WiFi?",
    protect_rules=[
        {"metric": "content_moderation"},
        {"metric": "security"},
    ],
)

print(result["status"])       # "failed"
print(result["failed_rule"])  # "content_moderation"
print(result["messages"])     # action message
```
Protect Class
```python
from fi.evals import Protect

protect = Protect(
    fi_api_key="...",     # or FI_API_KEY env var
    fi_secret_key="...",  # or FI_SECRET_KEY env var
)
```
protect() Method
```python
result = protect.protect(
    inputs="User text to check",
    protect_rules=[
        {"metric": "content_moderation"},
        {"metric": "bias_detection"},
        {"metric": "security"},
        {"metric": "data_privacy_compliance"},
    ],
    action="Input rejected — fails safety checks",
    reason=False,
    timeout=30000,
)
```
| Parameter | Type | Default | Description |
|---|---|---|---|
| `inputs` | str | required | The text to check |
| `protect_rules` | list of dicts | None | Rules to check against (see below) |
| `action` | str | "Response cannot be generated…" | Message returned when a rule fails |
| `reason` | bool | False | Include reasoning in the response |
| `timeout` | float | 30000 | Timeout in milliseconds |
| `use_flash` | bool | False | Use the faster Protect Flash model |
Rule Structure
Each rule is a dict with a `metric` key:
```python
rules = [
    {"metric": "content_moderation"},
    {"metric": "bias_detection"},
    {"metric": "security"},
    {"metric": "data_privacy_compliance"},
]
```
You can set a custom action message per rule:
```python
rules = [
    {"metric": "content_moderation", "action": "Content flagged as unsafe"},
    {"metric": "security", "action": "Security threat detected"},
]
```
Return Value
```python
{
    "status": "passed" | "failed",
    "completed_rules": ["content_moderation", "bias_detection"],
    "uncompleted_rules": [],
    "failed_rule": None | "security",
    "messages": "Input rejected" | "original input text",
    "reasons": ["..."],
    "time_taken": 0.45,
}
```
| Field | Type | Description |
|---|---|---|
| `status` | str | `"passed"` or `"failed"` |
| `completed_rules` | list | Rules that ran to completion |
| `uncompleted_rules` | list | Rules that didn't finish (timeout, error) |
| `failed_rule` | str or None | First rule that failed |
| `messages` | str | Action message if failed, original input if passed |
| `reasons` | list | Reasoning for each rule (if `reason=True`) |
| `time_taken` | float | Execution time in seconds |
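For logging or debugging, you can summarize a result in one line. This is a sketch that assumes only the return shape documented above; the sample dict is illustrative, not real SDK output.

```python
# Summarize a Protect result dict (shape as documented above) for logs.
def explain_result(result):
    if result["status"] == "passed":
        return "passed"
    parts = [f"failed on {result['failed_rule']}"]
    if result.get("uncompleted_rules"):
        parts.append(f"{len(result['uncompleted_rules'])} rule(s) did not finish")
    return "; ".join(parts)

# Illustrative sample, not captured from the SDK.
sample = {
    "status": "failed",
    "completed_rules": ["content_moderation"],
    "uncompleted_rules": ["security"],
    "failed_rule": "content_moderation",
    "messages": "Input rejected",
    "reasons": [],
    "time_taken": 0.45,
}
print(explain_result(sample))  # failed on content_moderation; 1 rule(s) did not finish
```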
Common Patterns
Check before sending to LLM
```python
from fi.evals import Protect
import openai

protect = Protect()
client = openai.OpenAI()

user_input = "Tell me about climate change"

result = protect.protect(
    inputs=user_input,
    protect_rules=[
        {"metric": "content_moderation"},
        {"metric": "security"},
    ],
)

if result["status"] == "passed":
    response = client.chat.completions.create(
        messages=[{"role": "user", "content": user_input}],
        model="gpt-4o-mini",
    )
else:
    print(f"Blocked: {result['failed_rule']}")
Check LLM output before returning to user
```python
from fi.evals import Protect

protect = Protect()

llm_output = "Here is the response..."

result = protect.protect(
    inputs=llm_output,
    protect_rules=[
        {"metric": "bias_detection"},
        {"metric": "data_privacy_compliance"},
    ],
    reason=True,
)

if result["status"] == "failed":
    print(f"Output blocked: {result['reasons']}")
```
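The two patterns above can be combined into one wrapper that screens both sides of the call. This is a minimal sketch, not part of the SDK: it takes the protect object and the generation function as parameters (so any object with a matching `.protect(...)` method and documented return shape works), and the rule lists are just the examples from this page.

```python
# Sketch of a guard wrapper screening both input and output.
# `protect` is any object exposing .protect(inputs=..., protect_rules=...)
# that returns the result shape documented above; in production pass Protect().
INPUT_RULES = [{"metric": "content_moderation"}, {"metric": "security"}]
OUTPUT_RULES = [{"metric": "bias_detection"}, {"metric": "data_privacy_compliance"}]

def guarded_generate(protect, generate, user_input):
    """Check the input, call generate(), then check the output."""
    pre = protect.protect(inputs=user_input, protect_rules=INPUT_RULES)
    if pre["status"] == "failed":
        return pre["messages"]  # action message for the failed rule
    output = generate(user_input)
    post = protect.protect(inputs=output, protect_rules=OUTPUT_RULES)
    if post["status"] == "failed":
        return post["messages"]
    return output
```

Because the dependencies are injected, the wrapper is easy to unit-test with a stub in place of `Protect()`.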