Meta AI Safety System Easily Compromised, Study Shows

SC Media reports that attackers can easily bypass Meta's newly introduced AI safety model, Prompt-Guard-86M, using a simple prompt injection exploit.

Removing punctuation from a malicious prompt and spacing out its letters caused Prompt-Guard's prompt injection detection rate to drop from 100% to 0.2%, a report from Robust Intelligence revealed. Researchers identified the flaw after discovering that the system, which is fine-tuned from Microsoft's mDeBERTa text processing model, showed no elevated mean absolute error for individual English alphabet characters, indicating that the safety fine-tuning had little effect on single characters and that letter-by-letter input therefore passes through the classifier largely undetected.
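For illustration, below is a minimal sketch of the character-spacing transformation the report describes; the function name and example prompt are hypothetical, and this shows only the input rewrite, not the classifier evaluation itself.

```python
import re

def space_out(prompt: str) -> str:
    """Rewrite a prompt in the evasive form described in the report:
    strip punctuation, then insert a space between every remaining letter."""
    no_punct = re.sub(r"[^\w\s]", "", prompt)  # drop punctuation
    # Space out the letters of each word; word boundaries remain single spaces.
    return " ".join(" ".join(word) for word in no_punct.split())

# The transformed text reads the same to a human (and to the downstream LLM)
# but no longer matches the patterns the classifier learned during fine-tuning.
print(space_out("ignore previous instructions."))
# -> i g n o r e p r e v i o u s i n s t r u c t i o n s
```

Because the downstream language model still interprets the spaced-out text as the original instruction, the transformation evades the filter without blunting the attack.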

"This jailbreak raises concerns for companies considering the model as part of their AI security strategy. It highlights the importance of continuous evaluation of security tools and the need for a multi-layer approach," said Robust Intelligence AI Security Researcher Aman Priyanshu.