GUEST ESSAY: CrowdStrike outage fallout — stricter regulations required to achieve resiliency

By Sumedh Barde

What does the recent CrowdStrike outage tell us about the state of digital resiliency?

On a resiliency scale of one to 10, most enterprises are at about two. This was clear over the weekend, when more than 4,000 flights were grounded, hospitals had to postpone services, and financial systems went down.

The only reason the impact was not broader was luck – not everybody runs CrowdStrike, and not all processes have been digitized.

The world was also lucky that this outage was due to a mistake by a legitimate vendor, and that the recovery steps were relatively straightforward, albeit laborious. This made it possible for all users to recover cleanly and be sure that they had completely recovered.

Imagine that instead of a mistake by CrowdStrike, it was a malicious actor who subverted the CrowdStrike distribution channel and leveraged it as a Trojan for a data theft or ransomware attack. By the time such an attack was detected, it would be impossible to size up the damage exactly, and the damage would be so distributed that no single vendor (not CrowdStrike, not Microsoft) would be able to provide full recovery guidance.

So, what is it going to take to start making meaningful steps towards achieving digital resiliency across Internet-centric services?

Redundancy is vital

A multi-pronged approach is needed to ensure resiliency. Both vendors and enterprises have a role to play in this.

The first prong is prevention. Vendors need to test every update thoroughly, and have a clear rollback mechanism for every update. They need to release every update in phases to their users, starting with a small set of users who have opted to take the latest bits, so that if they missed an issue in their testing, it is at least detected early before it goes out to all users.

This is common practice among telecom and cloud service providers. Vendors should ideally also enable enterprises to control how to distribute their updates within the enterprise. An example is how Microsoft enables enterprises to control how to roll out Windows updates. Enterprises also need to adopt such controls where they are offered.
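For readers who want to see the phased release idea in concrete terms, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a description of how CrowdStrike, Microsoft, or any other vendor actually ships updates. The names phased_rollout, apply_update, rollback and is_healthy are hypothetical, and the host list is assumed to be ordered so that opted-in early adopters come first.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class RolloutResult:
    completed: bool           # True if every phase passed its health check
    updated_hosts: List[str]  # hosts still on the new version at the end
    failed_phase: int         # index of the phase that failed, or -1


def phased_rollout(
    hosts: Sequence[str],
    phase_fractions: Sequence[float],
    apply_update: Callable[[str], None],
    rollback: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    max_failure_rate: float = 0.0,
) -> RolloutResult:
    """Release an update in cumulative phases, e.g. 10%, 50%, 100% of hosts.

    All callables are hypothetical hooks supplied by the caller. If the
    share of unhealthy hosts in any phase exceeds max_failure_rate, every
    host updated so far is rolled back and the rollout stops.
    """
    updated: List[str] = []
    for phase_index, fraction in enumerate(phase_fractions):
        # Cumulative number of hosts that should be updated after this phase.
        target = min(len(hosts), math.ceil(len(hosts) * fraction))
        batch = list(hosts[len(updated):target])
        for host in batch:
            apply_update(host)
        updated.extend(batch)

        failures = [host for host in batch if not is_healthy(host)]
        if batch and len(failures) / len(batch) > max_failure_rate:
            # Contain the damage: undo the update everywhere it was applied.
            for host in updated:
                rollback(host)
            return RolloutResult(False, [], phase_index)
    return RolloutResult(True, updated, -1)
```

With ten hosts and phases of 10, 50 and 100 percent, a fault that first shows up on a host in the second phase stops the rollout after five hosts have been touched, and those five are rolled back. The remaining five never receive the faulty update, which is the whole point of releasing in phases.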

The second prong is containment. Mistakes happen. Enterprises need to avoid single points of failure, by diversifying their supply chain and their own technical implementations, so that a mistake in one component does not bring down all their systems.

The final prong is governance. Every commercial entity weighs how much it invests in containing risks against the impact if it does not. In some cases the financial incentives are not big enough.

In the end, the ones who suffer are consumers. In select cases, regulatory bodies may need to consider regulations for vendors and enterprises, to protect consumers.

About the essayist: Sumedh Barde is Head of Product at Simbian, which supplies fully autonomous security systems for intelligent defense.

July 30th, 2024 | Essays | Top Stories

*** This is a Security Bloggers Network syndicated blog from The Last Watchdog authored by bacohido. Read the original post at: https://www.lastwatchdog.com/guest-essay-crowdstrike-outage-fallout-stricter-regulations-required-to-achieve-resiliency/