Editorial: CrowdStrike Outage: Is Our IT Too Fragile?
The recent CrowdStrike outage has caused global IT disruptions, impacting businesses and raising serious concerns about cybersecurity and disaster recovery.
The recent CrowdStrike outage has sent shockwaves through the global business community, highlighting the vulnerabilities inherent in our increasingly interconnected digital world. This incident, marked by simultaneous failures in Microsoft 365 services and a faulty CrowdStrike update, has caused widespread chaos, underscoring the urgent need for robust cybersecurity measures and comprehensive disaster recovery plans.
Simon Pardo, Director of Technology Specialists at Computer Care, captured the gravity of the situation succinctly:
“An IT failure double whammy has brought the world to its knees this morning, and the ongoing chaos will be costing businesses billions of pounds every hour. Two separate problems have struck at the same time, with issues hitting Microsoft’s 365 services, and an update from anti-malware product CrowdStrike that is pushing computers into a blue screen of death.”
The simultaneous failure of two critical services raises important questions about the testing and rollout of software updates. Pardo emphasized the importance of rigorous pre-release testing to prevent such disasters:
“Updates should always be thoroughly tested before they are pushed out to users, and it’s hard to know whether this is human error or a failure of process.”
The impact of these failures has been immediate and severe, with businesses around the world scrambling to mitigate the damage. This incident serves as a stark reminder of the necessity for robust disaster recovery plans. Pardo advised:
“This is a wake-up call for all the companies that have been floored by this attack. Organisations need to urgently review their disaster recovery plans to make sure they can deal with such problems. For all business leaders my recommendation is to challenge your IT team – ask them: how do we roll out updates and patches? How would we recover from a CrowdStrike-type failure? And make sure you run a test and prove your disaster recovery plan works.”
The nature of the disruption has also sparked concerns about potential cyberattacks. Iain James, Computer Systems Manager at Fusion eCare Solutions, pointed out:
“It appears that CrowdStrike is at the centre of a significant issue, raising alarms across the tech world. Whether this disruption stems from a botched update or a sophisticated cyber attack remains uncertain. Given CrowdStrike’s history of being targeted by Russian cyber operatives, the possibility of a malicious attack cannot be ruled out.”
This situation is particularly troubling as it extends to Microsoft systems that rely on CrowdStrike, potentially tarnishing Microsoft’s reputation by association.
Richard May and John Murray from virtualDCS provided further technical insights into the challenges posed by the outage:
“It appears that the issue is not the infamous ‘blue screen of death’, but rather a boot loop preventing machines from starting up. This raises questions about why so many other infrastructure elements are also affected. From my perspective, Microsoft might be at fault here. It seems CrowdStrike is blocking the operating system from starting. Without booting up, these machines can’t connect to the network, meaning they’ll all need manual fixes – potentially taking weeks.”
They also highlighted the significant risk businesses take if they disable CrowdStrike to get their systems running, as doing so could leave them vulnerable to cyberattacks.
The implications of deploying commercial-grade software in critical systems were starkly highlighted by Dan O’Dowd, founder of The Dawn Project:
“The dangers of deploying commercial grade software in safety critical systems cannot be understated. The immense body of software developed using Silicon Valley’s ‘move fast and break things’ culture means that the software our lives depend on is riddled with defects and vulnerabilities. Defects in this software can result in a mass failure event even more serious than the one we have seen today.”
O’Dowd’s call to action is clear:
“We must convince the CEOs and Boards of Directors of the companies that build the systems our lives depend on to rewrite their software so that it never fails and can’t be hacked. The clock is ticking down to a Cyber Armageddon. Secure and reliable software exists – it is already deployed in military applications and on commercial airliners. These companies will not take cybersecurity seriously until the public demands it. And we must demand it now, before a major disaster strikes.”
Finally, Alina Timofeeva, a strategic advisor in Data and Technology, raised fundamental questions about the architecture of our IT systems:
“Why do so many systems depend on just a single vendor? Why the whole OS can be crippled by a software update? What does that tell about the current IT architecture?”
She emphasized the need for companies to invest in operational resilience:
“One of the key things to be concerned about is systemic or concentration risk, of being dependent on one provider. It is very key for Companies to invest into Operational Resilience which is broader than just Technology. It would cover Technology, Data, Third Parties, Processes, and People.”
The CrowdStrike outage serves as a stark reminder of the critical importance of cybersecurity and disaster recovery planning. Businesses must take this opportunity to reevaluate their IT strategies and ensure that they are prepared for future disruptions. This incident is not just a technical failure; it is a wake-up call for the entire digital ecosystem to prioritize security, reliability, and resilience.