Seven tips that offer short-term and long-term fixes following the CrowdStrike outage

The recent CrowdStrike outage serves as a reminder for cybersecurity defenders about the importance of robust testing and incident response strategies. As practitioners, our role is to anticipate, mitigate, and manage the impacts of such disruptions. Here are seven actions and considerations drawn from the CrowdStrike incident and similar events, such as the Microsoft Windows 10 October 2018 Update.

CrowdStrike’s outage was caused by a bug in the Memory Scanning prevention policy that led to widespread performance issues across systems running the Falcon sensor for Windows. This bug, which resulted in the sensor consuming 100% of a CPU core, was not caught during standard testing procedures. The incident underscores the necessity for comprehensive testing and the critical need for rapid and effective incident response mechanisms.

Immediate actions to take

Activate the organization’s incident response plan: As soon as a widespread issue like this is identified, put the incident response plan into action. Ensure that all team members are aware of their roles and responsibilities. The plan should include steps for communication, mitigation, and resolution.
Communicate with stakeholders: Clear, transparent communication with affected stakeholders is vital. Inform users about the issue, the steps being taken to resolve it, and any actions they need to take. Consistent updates can help manage expectations and maintain trust.
Apply system reboots and patches: In the case of the CrowdStrike bug, the primary remediation involved system reboots. Ensure that critical systems that cannot afford downtime are managed carefully. Plan reboots during off-peak hours if possible and communicate clearly to minimize operational impact.

Long-term strategies

Deploy enhanced testing procedures: Traditional testing may not always catch complex bugs. Adopt a multi-layered testing approach. First, test updates in a sandbox: a controlled environment that closely mimics the organization’s live systems. Next, release updates gradually to small groups of users before a full-scale deployment, monitor the results, and be ready to roll back if issues arise. Finally, encourage and act on feedback from early adopters to quickly identify and address issues.
Diversify the company’s vendor strategy: Relying too heavily on a single vendor can amplify the impact of such outages. Diversify your cybersecurity solutions to mitigate the risk of a single point of failure.
Develop regular training and drills: Conduct regular training sessions and simulated drills to ensure the team is prepared to handle real incidents efficiently. This includes recognizing phishing attempts and social engineering attacks that often follow major disruptions.
Review and update security policies: Post-incident, review all security policies and procedures. Identify any gaps or weaknesses exposed by the incident and update your protocols accordingly.
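
The gradual release described under enhanced testing procedures can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of how CrowdStrike or any other vendor ships updates: the names staged_rollout and RolloutResult, and the deploy, is_healthy, and rollback callbacks, are hypothetical placeholders for whatever deployment and monitoring tooling the organization already uses.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class RolloutResult:
    completed: bool          # True if every ring received the update
    updated: list[str]       # hosts still running the update at the end
    rolled_back: list[str]   # hosts reverted after a failed health check


def staged_rollout(
    rings: Sequence[Sequence[str]],
    deploy: Callable[[str], None],      # hypothetical: push the update to one host
    is_healthy: Callable[[str], bool],  # hypothetical: post-update health probe
    rollback: Callable[[str], None],    # hypothetical: revert one host
    max_failure_rate: float = 0.0,
) -> RolloutResult:
    """Deploy ring by ring; stop and revert everything if a ring looks unhealthy."""
    updated: list[str] = []
    for ring in rings:
        for host in ring:
            deploy(host)
            updated.append(host)
        failures = sum(1 for host in ring if not is_healthy(host))
        if ring and failures / len(ring) > max_failure_rate:
            # Revert every host touched so far, newest first.
            reverted = list(reversed(updated))
            for host in reverted:
                rollback(host)
            return RolloutResult(False, [], reverted)
    return RolloutResult(True, updated, [])


if __name__ == "__main__":
    rings = [["canary-1"], ["pilot-1", "pilot-2"], ["prod-1", "prod-2", "prod-3"]]
    broken = {"pilot-2"}  # simulate one host that fails after the update
    result = staged_rollout(
        rings,
        deploy=lambda host: None,
        is_healthy=lambda host: host not in broken,
        rollback=lambda host: None,
    )
    print(result)
```

In this sketch the production ring is never touched once the pilot ring fails its health check, which is the point of releasing to small groups first. A real rollout would also wait between rings long enough for problems to surface.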

Follow-on threats

After major incidents like the CrowdStrike outage, it’s common to see a rise in follow-on threats. Attackers often exploit the confusion and urgency created by the initial disruption to launch secondary attacks.

These follow-on threats can include phishing and social engineering attacks, where criminals pose as support personnel from the affected company, attempting to trick users into providing sensitive information or installing malware. For example, after the Microsoft 2018 update incident, there were numerous reports of phishing emails and tech support scams. Another threat involves fake update notifications, where users might receive prompts to download "urgent" updates that are actually malware. It’s crucial to ensure that users only download updates from official sources. Additionally, attackers may increase their scanning activity to identify systems that are still vulnerable or improperly patched.
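
One way to back up the advice about official sources is to check a downloaded update against a digest the vendor has published. The sketch below assumes the vendor publishes a SHA-256 digest through a channel the organization already trusts. The function names are illustrative, and the check is only as good as the source of the published digest. Where the vendor signs its packages, verifying the signature is the stronger control.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large installers do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: Path, published_sha256: str) -> bool:
    """Return True only if the file's SHA-256 equals the vendor-published value."""
    actual = sha256_of_file(path)
    # Normalize whitespace and case before a constant-time comparison.
    return hmac.compare_digest(actual, published_sha256.strip().lower())
```

A help desk script could call matches_published_hash before allowing an installer to run and refuse anything that does not match.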

Drawing parallels from the Microsoft 2018 incident, we can expect CrowdStrike to face significant reputational damage and customer trust issues. Microsoft had to halt the rollout and work extensively to fix the problems, which included restoring lost files for some users. Similarly, CrowdStrike will need to dedicate substantial resources to remediation and rebuilding trust.

The CrowdStrike outage reminds us of the complexities and challenges in cybersecurity. By adopting comprehensive testing practices, diversifying vendor dependencies, maintaining clear communication, and enhancing social engineering defenses, practitioners can better prepare for and respond to such incidents. Continuous improvement and vigilance are vital to minimizing the impact of inevitable disruptions in our increasingly interconnected digital ecosystem.

Callie Guenther, senior manager, cyber threat research, Critical Start
