Typo Set In Motion Chain Of Events That Shut Down AWS S3 Cloud

You hear about accident investigations on a regular basis. When an airliner goes down, a train comes off the rails or any other serious accident occurs, an investigation starts along with the grim task of recovering the dead and injured.

Usually, there will be a briefing by the investigating authority at the start and then you won’t hear anything for months. Few people know what the investigators are even looking for.

That’s because it can take months for the investigators to go through every detail before determining what caused the accident.

Inside AWS outage

The investigations are elaborate because there’s rarely a single cause to a serious accident. Eventually the investigation will show that a sequence of events occurred and it’s possible that the accident could have been prevented if any one of those events had changed.

Investigations of this type actually happen for accidents of all sorts, not just transportation catastrophes. Companies and regulators follow similar procedures for a wide variety of unplanned events.

In fact, companies will launch such an investigation when an accident causes a major loss, as Amazon did after the outage that took out Amazon Web Services and its S3 storage services on February 28.

I observed this first-hand in the late spring of 1971, when I was sent up a mountain near Roanoke, Virginia, to cover an airplane crash for the television station where I’d just started working. On that mountain, World War II hero and Hollywood actor Audie Murphy and five others had died when the airplane in which they were riding slammed into the fog-shrouded mountaintop.

Around me as I climbed the side of the mountain with the rest of the news crew were representatives from the National Transportation Safety Board, already taking photos and making measurements of the crash site. Later, they would take all the components they could find of the shattered aircraft to a hangar for examination and further investigation.

Investigation

To me, as I reported from that mountainside, the reason for the crash seemed obvious. The pilot must have been lost in the fog, and failed to see the mountain. But the truth was much more complicated than that.

The investigators had to learn why the pilot had been lost like that near a major airport. Why hadn’t he performed an instrument landing at that airport after the weather had turned bad? The questions were eventually answered, and ultimately a lesson was learned.

Fortunately, not every accident results in tragic deaths. But every serious accident must be investigated to learn how it happened and how it can be prevented from happening again.

This was the case with the Feb. 28 event when Amazon Web Services’ S3 storage services shut down for hours. This time the losses were measured not in lives, but in millions of dollars lost by Amazon and its clients because of the downtime. Clearly an investigation was in order.

But as Amazon explained in a report it released on March 2 along with an apology to its customers, it was a chain of events that started with the smallest of errors, a typo in a server update command.

Originally published on eWeek

Wayne Rash

Wayne Rash is senior correspondent for eWEEK and a writer with 30 years of experience. His career includes IT work for the US Air Force.
