What is the acceptable number of alerts to ignore? 20%? 50%? For many SOCs, the answer is around 70%. It's hard to place the blame in one area. There are a lot of tools generating a lot of alerts, and because no one wants to miss something important, those tools are prone to false positives. The result is an endless analyst queue filled with benign, low-fidelity, and redundant alerts, each of which takes time to rule out as a serious threat.

Wasn't security automation supposed to solve this problem? Automation has helped, but significantly reducing alert noise at scale has proved very challenging. Doing it with a SIEM is extremely difficult, XDR has not yet been up to the task, and legacy SOAR tools have struggled to match the necessary scale and reliability.

To automate the dismissal of alerts, you need high confidence, and high confidence requires massive amounts of processing power. Let's consider why. A workflow that provides sufficient confidence about the nature of an alert might include as many as 500 actions. If that sounds crazy, we will dig into it in the next section. If a large MSSP is monitoring one million alerts per day, that's 500 million steps to process every day, without ever stopping.
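To make that volume concrete, here is a quick back-of-envelope calculation in plain Python using the figures above; the sustained-rate arithmetic is the only thing added:

```python
# Back-of-envelope math for the scale described above.
alerts_per_day = 1_000_000   # large MSSP alert volume (figure from the article)
actions_per_alert = 500      # actions in a high-confidence triage workflow

actions_per_day = alerts_per_day * actions_per_alert
actions_per_second = actions_per_day / 86_400  # seconds in a day

print(f"{actions_per_day:,} actions per day")              # 500,000,000
print(f"~{actions_per_second:,.0f} actions per second, sustained")  # ~5,787
```

That works out to roughly 5,800 workflow actions every second, around the clock.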
There are not many tools that automate reliably at that scale. Why not, and what are the missing pieces?

The Workflow Engine
High-confidence triage at scale relies on the strength of the workflow engine you are using, which needs to be able to handle hundreds of actions at a time. Why does it need this much capacity? If an alert requires advanced queries to collect all indicators of compromise (IOCs), the workflow needs to unpack the alert, run the queries, and analyze the incoming IOCs. This analysis requires unique enrichment and logic. Any dismissed alerts also need to be closed in the original data source. These actions quickly add up, and all of them are necessary to avoid dismissing serious alerts.

Some types of alerts naturally lead to expansive workflows, as more and more relevant data is brought into the playbook. For example, a phishing playbook might pull in hundreds of related emails that are part of the same campaign, each with its own IOCs to investigate. Even before triage, some workflows must be able to unwind hundreds of alerts. To take one example, in a tool that does not automatically deduplicate alerts, an ingestion playbook might look like this (see the sketch after the list):

- The playbook queries an integrated AWS S3 bucket to ingest a list of 500 alerts into the automation tool.
- The playbook then unwinds that list, creating a parallel path for each alert.
- For each alert, the playbook then executes a series of 15-20 actions to parse that alert, normalize its fields, and check a global list to see if the alert ID is already in the system.
- If the alert ID is not in the global list, then a new event is created, and triage can begin.

Across 500 alerts, that simple playbook can represent as many as 10,000 actions, just to ingest, deduplicate, and correlate alerts.
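For illustration, here is a minimal Python sketch of an ingestion playbook along these lines. The bucket and key names, the alert fields, and the `normalize_alert` helper are hypothetical, and a real playbook would fan the per-alert branches out in parallel and persist the deduplication list rather than keeping it in memory:

```python
import json
import boto3

s3 = boto3.client("s3")

# Hypothetical location of the exported alert batch (an S3 object holding a JSON list).
ALERTS_BUCKET = "example-soc-alert-exports"
ALERTS_KEY = "exports/latest-alerts.json"

# Stand-in for the tool's global list; a real playbook would use persistent storage.
seen_alert_ids: set[str] = set()


def normalize_alert(raw: dict) -> dict:
    """Parse and normalize one raw alert into a common schema (fields are illustrative)."""
    return {
        "id": raw.get("alert_id") or raw.get("id"),
        "source": raw.get("source", "unknown"),
        "severity": str(raw.get("severity", "informational")).lower(),
        "indicators": raw.get("iocs", []),
    }


def create_event(alert: dict) -> None:
    """Placeholder for handing a new, deduplicated alert to the triage workflow."""
    print(f"New event created for alert {alert['id']}")


def run_ingestion_playbook() -> None:
    # 1. Query the S3 bucket and ingest the list of alerts.
    body = s3.get_object(Bucket=ALERTS_BUCKET, Key=ALERTS_KEY)["Body"].read()
    alerts = json.loads(body)  # e.g. a list of ~500 raw alert dicts

    # 2. Unwind the list, handling each alert on its own path
    #    (sequential here; a workflow engine would run these branches in parallel).
    for raw in alerts:
        # 3. Parse and normalize the alert, then check the global list.
        alert = normalize_alert(raw)
        if alert["id"] in seen_alert_ids:
            continue  # duplicate: already in the system

        # 4. Not seen before: record it and create a new event so triage can begin.
        seen_alert_ids.add(alert["id"])
        create_event(alert)


if __name__ == "__main__":
    run_ingestion_playbook()
```

Even this stripped-down version shows why the action count climbs so quickly: every alert in the batch passes through its own parse, normalize, and deduplication steps before triage has even started.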