
Security pros brace for manual system-by-system fix to CrowdStrike outage


Reaction from the industry on the CrowdStrike outage has been coming in fast and furious, with security pros advising that the fix to the cybersecurity firm's Falcon sensor will require a manual, computer-by-computer approach, and some pointing to the need for a crash rejection feature for system updates.

The faulty update does not affect Mac and Linux users, said CrowdStrike CEO George Kurtz, who added in a post on X that the incident is tied to a content update and is not a cybersecurity incident or cyberattack.


Omer Grossman, chief information officer at CyberArk, said the CrowdStrike incident looks like it will go down as one of the most significant cyber issues of 2024, even though it's only July. Grossman said the damage to business processes at the global level is dramatic, adding that the glitch is due to a software update of CrowdStrike's EDR product.

“This is a product that runs with high privileges that protects endpoints,” explained Grossman. “A malfunction in this can, as we are seeing in the current incident, cause the operating system to crash.”

Grossman pointed out that there are two main issues on the agenda. The first is how customers get back online and regain continuity of business processes. Because the endpoints have crashed (the so-called "Blue Screen of Death"), they cannot be updated remotely and teams must solve the problem manually, endpoint-by-endpoint, a process that will take days.

The second is what caused the malfunction.

“The range of possibilities ranges from human error — for instance a developer who downloaded an update without sufficient quality control — to the complex and intriguing scenario of a deep cyberattack, prepared ahead of time and involving an attacker activating a ‘doomsday command’ or ‘kill switch,’” said Grossman. “CrowdStrike's analysis and updates in the coming days will be of the utmost interest.”

Andy Ellis, operating partner at cybersecurity VC firm YL Ventures, said we need to talk about crash rejection, possibly the single most important safety feature in any system with dynamic updates.

“We had to build this into Akamai’s updating systems in May 2004 some 20 years ago after a similar incident to what CrowdStrike is going through,” said Ellis. “That was a bad year for DDoS incidents. A metadata update — the configuration files that specify how to handle each customer’s traffic — went out to all our servers and a bad interaction with the software caused widespread issues, including crashing servers.”
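Ellis does not describe how Akamai's mechanism works, so the following is only a minimal sketch of the general idea behind crash rejection, not Akamai's or CrowdStrike's implementation. It assumes an updater that records which update it is about to load before loading it, and on the next startup treats a leftover record as evidence that the update crashed the system. The names `new_state`, `boot` and `faulty_load` are hypothetical, the dictionary stands in for state that a real system would persist to disk, and an exception stands in for a crash.

```python
from typing import Any, Callable, Dict

State = Dict[str, Any]


def new_state(known_good: str) -> State:
    # Stand-in for state persisted to disk so that it survives a crash.
    return {"good": known_good, "rejected": []}


def boot(state: State, incoming: str, load: Callable[[str], None]) -> str:
    """Run one startup cycle and return the update that ended up active."""
    # A marker left over from the previous boot means that load never
    # finished, so the update it names is blamed for the crash.
    crashed_on = state.pop("pending", None)
    if crashed_on is not None and crashed_on not in state["rejected"]:
        state["rejected"].append(crashed_on)

    # Fall back to the last known good update if the incoming one was rejected.
    candidate = state["good"] if incoming in state["rejected"] else incoming

    if candidate != state["good"]:
        state["pending"] = candidate  # recorded before the risky load
    load(candidate)                   # a crash here leaves the marker behind
    state.pop("pending", None)        # load survived, so clear the marker
    state["good"] = candidate
    return candidate


def faulty_load(update: str) -> None:
    # Simulated loader: "update-2" stands in for a bad content update.
    if update == "update-2":
        raise RuntimeError("simulated crash while loading " + update)


state = new_state("update-1")
try:
    boot(state, "update-2", faulty_load)       # first boot crashes mid-load
except RuntimeError:
    pass                                       # the "machine" restarts
active = boot(state, "update-2", faulty_load)  # second boot rejects the update
print(active, state["rejected"])
```

In this simulated run, the second call to `boot` falls back to `update-1` and records `update-2` as rejected, so the same bad update is not loaded again after the restart.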

Alan Stephenson-Brown, chief executive officer at Evolve, said news of a global IT outage that has caused problems at airlines, media and banks is a timely reminder that operational resilience should be at the forefront of the business agenda.

Stephenson-Brown said the outage demonstrates that even large corporations aren't immune to IT troubles, and highlights the importance of having distributed data centers and rerouted connectivity so that business can continue functioning when cloud infrastructure is disrupted.

“By prioritizing both contingency planning and preventative measures, IT systems can be protected,” said Stephenson-Brown. “I urge business leaders to seriously appraise the systems they have in place to identify potential vulnerabilities before they find themselves the subject of the next IT outages headline.”

IT industry reactions to CrowdStrike outage

Comments from cybersecurity professionals quickly filled the inboxes of SC Magazine reporters and editors as the fallout from the faulty update began to come into focus. Here's a sampling of their hot takes:

Martin Greenfield, CEO of cybersecurity monitoring firm Quod Orbis: "The widespread impact of this outage also highlights the interconnectedness of global IT systems and the potential for cascading failures. Companies must conduct thorough risk assessments, not just of their own systems, but of their entire supply chain and third-party dependencies. This incident demonstrates how a single point of failure can have far-reaching consequences across multiple sectors and geographies."

Maxine Holt, senior director of cybersecurity for Omdia: "Omdia analysts connect the dots: this isn't a cyberattack, but it’s unquestionably a cybersecurity disaster. ... Cybersecurity's role is to protect and ensure uninterrupted business operations. Today, on 19 July 2024, many organizations are failing to operate, proving that even non-malicious cybersecurity failures can bring businesses to their knees. The workaround, involving booting into safe mode, is a nightmare for cloud customers. Cloud-dependent businesses are facing severe disruptions."

Tom Lysemose Hansen, CTO of Norwegian cybersecurity company Promon: "The errors made today will cost the affected organizations millions and leave their reputations significantly damaged due to a compromised experience for their customers.

"The sheer breadth of this disruption also serves as a grim reminder of the impact of mismanaging such a widely used operating system when inadequate measures are taken against preventing a breach and subsequent outage. The job of these cybersecurity firms and professionals is to mitigate against these threats and it is clear that a serious and disappointing error has been made."
