How To
Detect Prompt Injection
Prompt Injection is a security threat where adversarial inputs manipulate a language model’s behaviour, bypass security mechanisms, or override intended instructions. Identifying and mitigating prompt injection attacks is critical to ensuring the security, reliability, and integrity of AI systems.
To address this threat, Prompt Injection Detection is used to identify injection patterns, context manipulation, and security vulnerabilities.
Click here to read the eval definition of Prompt Injection
a. Using Interface
Required Input
- input: The user-provided prompt to be analysed for injection attempts.
Output
Returns a Passed/Failed result:
- Passed – No prompt injection attempts detected.
- Failed – Suspicious patterns identified, requiring mitigation.