Microsoft Says Sorry For BPOS Outage

Microsoft experienced its own outage last week after its BPOS service denied customers access to email

Customers using Microsoft’s BPOS service suffered an email outage last week, prompting fresh cloud concerns.

On 10 May, malformed email traffic sparked a growing message backlog that affected some customers for six to nine hours. The issue recurred on 12 May, compounded by a separate but related problem that led to customer delays of up to three hours.

Then, just to top off what was already a stressful week for Microsoft’s BPOS engineering teams, a failure in the Domain Name Service hosting mail.microsoftonline.com stopped users from accessing Outlook Web Access hosted in the Americas. That issue also affected Microsoft Outlook and Microsoft Exchange ActiveSync devices.

Microsoft Apology

Microsoft solved the issues and issued a mea culpa. “I’d like to apologise to you, our customers and partners, for the obvious inconveniences these issues caused,” Dave Thompson, corporate vice president of Microsoft Online Services, wrote in a 12 May posting on the Microsoft Online Services Team Blog. “We know that email is a critical part of your business communication, and my team and I fully recognise our responsibility as your partner and service provider.”

In the wake of the issues, Microsoft has taken steps to improve its communications with users. “Effective today, we updated our communications procedures to be more extensive and timely,” Thompson wrote. “The primary mechanism for communicating to our customers on issues has been and will continue to be the Service Health Dashboard.”

He also insisted that the issues gripping BPOS haven’t affected Office 365, Microsoft’s cloud-based productivity platform that recently launched its public beta, or other company services. (Office 365 is effectively the rebranding of BPOS.)

But the outages also raise some key questions about the cloud.

Cloud Commitment

Microsoft is “all in” with regard to cloud services. Indeed, CEO Steve Ballmer and other executives have spent much of the past year taking every opportunity to tout the company’s upcoming subscription platforms as the wave of its future. Office 365, Windows Azure and other platforms represent Microsoft’s attempts to expand its revenue base beyond traditional, desktop-bound software such as Windows and Office.

But while the cloud offers businesses some noted advantages – chief among them, removing the need to maintain on-site IT infrastructure – it also comes with certain risks. In April, an outage at Amazon Web Services led to service disruptions across the Internet, affecting popular websites such as Reddit, Quora and Hootsuite.

The issues with Amazon led some companies to revert to on-premises solutions. “We are currently setting up dedicated servers with hard-wired storage,” wrote Andy Singleton, president of Assembla, a software development tools and services provider affected by the EC2 outage. Nonetheless, he touted the benefits of Amazon’s cloud: “We recommend it because their truly on-demand server resources make it possible to rapidly try things, fix things and innovate. Innovation speed is important.”

Amazon isn’t alone in its outages. Google lost some of its users’ email data in February, and launched an aggressive effort at restoration. The possibility of at least some downtime is baked into cloud contracts; the question is what happens when the outage is so catastrophic that it results in data loss, or delays so lengthy they cause a client to lose revenue. For most companies, including Microsoft, the response to an event like the one that hit BPOS last week is to issue some sort of credit for the cloud-time lost.

Even such well-publicised incidents, though, don’t seem to be shaking businesses’ belief in the ultimate benefits of the cloud. “Clouds will have downtime – it’s a fundamental issue,” Andi Mann, chief cloud strategy guru at CA Technologies, told eWEEK. “But you need to be ready for downtime, whether it’s your own infrastructure or cloud infrastructure. You need to understand what the risk is. It’s all just about risk management.”

In other words, the more businesses gravitate toward the cloud – and the more companies go “all in” on offering cloud services – the more well-publicised cloud incidents will occur. But with each incident, it seems that companies like Microsoft, Google and Amazon learn a little more about what works and what doesn’t – and take steps to improve their services accordingly. Their future revenues depend on it.