Microsoft Blames Firmware Update For Cloud Failure

Microsoft has identified the cause of the access problems to Hotmail, Outlook.com and portions of SkyDrive earlier this month, and it wasn’t bad weather, power grid troubles or even a wayward administrator.

The trio of cloud services suffered extended downtime – although SkyDrive to a lesser extent – due to a glitch that underscores the importance of proper and uninterrupted cooling in data centre environments, particularly those of cloud service providers. While Microsoft did not reveal the exact nature of the technical fault, it did involve a firmware update gone awry.

Temperature Rising

On 12 March, Microsoft confirmed reports of the outage on the Live.com service status page. “We’re having a problem accessing email. You might not be able to see all your email messages. We’re working to restore service right now,” reported Microsoft at 5:35 pm.

As the hours passed, the company supplied frequent, but no less cryptic, updates until the matter was resolved on 13 March. Now, a clearer picture has emerged.

In a post on Office Blogs, Microsoft Vice President Arthur de Haan explained: “On the afternoon of the 12th, in one physical region of one of our data centres, we performed our regular process of updating the firmware on a core part of our physical plant. This is an update that had been done successfully previously, but failed in this specific instance in an unexpected way.”

As in finance, Microsoft’s cloud computing team discovered that in IT, past performance does not guarantee future results.

“This failure resulted in a rapid and substantial temperature spike in the data centre. This spike was significant enough before it was mitigated that it caused our safeguards to come in to place for a large number of servers in this part of the data center,” de Haan wrote.

Those safeguards may have spared the physical infrastructure some damage, but they caused headaches for users who rely on Hotmail and Outlook.com for their email or SkyDrive for their cloud storage. Not only were inboxes and some online file stores rendered inaccessible, but failover operations never kicked in.

De Haan’s post also provides a clue as to why the outage stretched into the morning hours of 13 March.

“Based on the failure scenario, there was a mix of infrastructure software and human intervention that was needed to bring the core infrastructure back online. Requiring this kind of human intervention is not the norm for our services and added significant time to the restoration,” he wrote.

Microsoft is keen to restore confidence in its cloud ecosystem. “Now that we’re through the resolution, we’re also hard at work on ensuring this doesn’t happen again,” de Haan wrote.

A lot is at stake for the software giant as it makes the transition to a software as a service (SaaS) provider.

On 18 February, David Law, Outlook.com director of product management, announced that in the six months since launch, Outlook.com had attracted 60 million active users. And on 29 January, as with most big product releases, Microsoft CEO Steve Ballmer announced the availability of Office 365 Home Premium, a cloud-enabled version of the Office productivity software suite.

Originally published on eWeek.

Pedro Hernandez

Pedro Hernandez covers Microsoft products and services, such as Office, Windows, Windows Phone, Azure and Skype.
