Protect

Guard AI inputs and outputs in real-time. Check for content moderation, bias, security threats, and data privacy violations.

📝
TL;DR
  • from fi.evals import Protect (part of ai-evaluation)
  • Check inputs against rules for content moderation, bias, security, and privacy
  • Returns pass/fail with details on which rules triggered

Protect runs safety checks on text before or after your LLM processes it: define rules for what to check, pass the input, and get a structured result telling you whether it passed and why. For the full platform guide, see the Protect docs.

Note

Requires pip install ai-evaluation and FI_API_KEY + FI_SECRET_KEY in your environment.
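Setup is the usual install-and-export flow (the key values below are placeholders):

```shell
# Install the SDK
pip install ai-evaluation

# Credentials picked up automatically by Protect()
export FI_API_KEY="your-api-key"
export FI_SECRET_KEY="your-secret-key"
```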

Quick Example

from fi.evals import Protect

protect = Protect()

result = protect.protect(
    inputs="How do I hack into my neighbor's WiFi?",
    protect_rules=[
        {"metric": "content_moderation"},
        {"metric": "security"},
    ],
)

print(result["status"])          # "failed"
print(result["failed_rule"])     # "content_moderation"
print(result["messages"])        # action message

Protect Class

from fi.evals import Protect

protect = Protect(
    fi_api_key="...",       # or FI_API_KEY env var
    fi_secret_key="...",    # or FI_SECRET_KEY env var
)

protect() Method

result = protect.protect(
    inputs="User text to check",
    protect_rules=[
        {"metric": "content_moderation"},
        {"metric": "bias_detection"},
        {"metric": "security"},
        {"metric": "data_privacy_compliance"},
    ],
    action="Input rejected — fails safety checks",
    reason=False,
    timeout=30000,
)
| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| `inputs` | str | required | The text to check |
| `protect_rules` | list of dicts | None | Rules to check against (see below) |
| `action` | str | "Response cannot be generated…" | Message returned when a rule fails |
| `reason` | bool | False | Include reasoning in the response |
| `timeout` | float | 30000 | Timeout in milliseconds |
| `use_flash` | bool | False | Use the faster Protect Flash model |

Rule Structure

Each rule is a dict with a metric key:

rules = [
    {"metric": "content_moderation"},
    {"metric": "bias_detection"},
    {"metric": "security"},
    {"metric": "data_privacy_compliance"},
]

You can set a custom action message per rule:

rules = [
    {"metric": "content_moderation", "action": "Content flagged as unsafe"},
    {"metric": "security", "action": "Security threat detected"},
]
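Since each rule is just a dict, the list can also be built programmatically. A small sketch (the `make_rules` helper is illustrative, not part of the SDK):

```python
def make_rules(metrics, actions=None):
    """Build a protect_rules list from metric names.

    actions: optional dict mapping a metric name to a custom action message.
    """
    actions = actions or {}
    rules = []
    for metric in metrics:
        rule = {"metric": metric}
        if metric in actions:
            rule["action"] = actions[metric]
        rules.append(rule)
    return rules

rules = make_rules(
    ["content_moderation", "security"],
    actions={"security": "Security threat detected"},
)
# rules == [
#     {"metric": "content_moderation"},
#     {"metric": "security", "action": "Security threat detected"},
# ]
```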

Return Value

{
    "status": "passed" | "failed",
    "completed_rules": ["content_moderation", "bias_detection"],
    "uncompleted_rules": [],
    "failed_rule": None | "security",
    "messages": "Input rejected" | "original input text",
    "reasons": ["..."],
    "time_taken": 0.45,
}
| Field | Type | Description |
| --- | --- | --- |
| `status` | str | `"passed"` or `"failed"` |
| `completed_rules` | list | Rules that ran to completion |
| `uncompleted_rules` | list | Rules that didn't finish (timeout, error) |
| `failed_rule` | str or None | First rule that failed |
| `messages` | str | Action message if failed, original input if passed |
| `reasons` | list | Reasoning for each rule (if `reason=True`) |
| `time_taken` | float | Execution time in seconds |
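Given the shape above, the result can be turned into a simple allow/deny decision. A minimal sketch (the `GuardrailError` exception and `check_result` helper are illustrative, not part of the SDK):

```python
class GuardrailError(Exception):
    """Raised when a Protect rule fails or doesn't complete."""

def check_result(result, strict=False):
    """Return the passed-through text, or raise GuardrailError.

    strict: also reject when some rules didn't finish (e.g. timeout).
    """
    if result["status"] == "failed":
        raise GuardrailError(f"{result['failed_rule']}: {result['messages']}")
    if strict and result["uncompleted_rules"]:
        raise GuardrailError(
            f"rules did not complete: {result['uncompleted_rules']}"
        )
    # On a pass, messages carries the original input text
    return result["messages"]

passed = {
    "status": "passed",
    "completed_rules": ["content_moderation"],
    "uncompleted_rules": [],
    "failed_rule": None,
    "messages": "Tell me about climate change",
    "reasons": [],
    "time_taken": 0.3,
}
print(check_result(passed))  # Tell me about climate change
```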

Common Patterns

Check before sending to LLM

from fi.evals import Protect
import openai

protect = Protect()
client = openai.OpenAI()

user_input = "Tell me about climate change"

result = protect.protect(
    inputs=user_input,
    protect_rules=[
        {"metric": "content_moderation"},
        {"metric": "security"},
    ],
)

if result["status"] == "passed":
    response = client.chat.completions.create(
        messages=[{"role": "user", "content": user_input}],
        model="gpt-4o-mini",
    )
else:
    print(f"Blocked: {result['failed_rule']}")

Check LLM output before returning to user

from fi.evals import Protect

protect = Protect()
llm_output = "Here is the response..."

result = protect.protect(
    inputs=llm_output,
    protect_rules=[
        {"metric": "bias_detection"},
        {"metric": "data_privacy_compliance"},
    ],
    reason=True,
)

if result["status"] == "failed":
    print(f"Output blocked: {result['reasons']}")
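The two patterns above compose into one guarded call. A sketch that wires input and output checks around any model call (`guarded_call` is an illustrative helper; `llm_fn` stands in for your own LLM function):

```python
def guarded_call(protect, llm_fn, user_input, input_rules, output_rules):
    """Check the input, call the LLM, then check the output.

    Returns (ok, text): ok is False if either check failed, in which
    case text is the action message from the failing check.
    """
    pre = protect.protect(inputs=user_input, protect_rules=input_rules)
    if pre["status"] == "failed":
        return False, pre["messages"]

    output = llm_fn(user_input)

    post = protect.protect(inputs=output, protect_rules=output_rules)
    if post["status"] == "failed":
        return False, post["messages"]
    return True, output
```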