Facebook Crashes Twice In A Comedy Of Errors

Facebook went down on Thursday for two and a half hours because of a mishandling of an error condition in the social network’s system.

Web performance management company AlertSite logged that the site availability dropped to 38.46 per cent yesterday evening. Robert Johnson, director of software engineering at Facebook, wrote an apology to the affected users and detailed the problem.

Errors Flagging Errors

Basically, a routine used to handle invalid data found during error-checking was itself flagged as being in error. This caused the system to try to replace it, but the only replacement code available was identical to the flagged routine. On top of that, the checker was still receiving calls from the rest of the system, which ground the whole system to a halt.

From the user viewpoint, their only friend on Facebook was a message saying that there was a “DNS error”. For Facebook’s IT team, it meant a few red faces in their new green data centre.

The error-checker, unsurprisingly, found the replacement code to be in error too, and so an infinite loop began. It was a classic case of a developer not thinking outside the box, and a literal comedy of errors resulted.
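The loop described above can be sketched in a few lines of Python. This is an illustration only, not Facebook's code: the names is_valid, fetch_replacement and repair, the integer validity rule and the retry cap are all assumptions made for the example. It shows why a checker whose only available replacement is itself invalid never finishes unless something bounds the retries.

```python
# Hypothetical sketch: none of these names come from Facebook's systems.

def is_valid(value):
    # Stand-in validity rule (an assumption): only non-negative integers pass.
    return isinstance(value, int) and value >= 0


def fetch_replacement(store):
    # The only replacement on offer is whatever the store currently holds.
    return store["config"]


def repair(value, store, max_attempts=5):
    """Replace an invalid value from the store, giving up after max_attempts.

    Returns (value, attempts). Without the cap, a store that holds an
    invalid value would keep this loop running forever.
    """
    attempts = 0
    while not is_valid(value):
        if attempts >= max_attempts:
            raise RuntimeError(f"gave up after {attempts} attempts")
        value = fetch_replacement(store)
        attempts += 1
    return value, attempts
```

With a healthy store, repair(-1, {"config": 7}) returns (7, 1) after a single fetch. With a store that holds the same bad value, repair(-1, {"config": -1}) raises RuntimeError after five fetches, where an uncapped version would loop indefinitely.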

“The way to stop the feedback cycle was quite painful,” Johnson wrote. “We had to stop all traffic to this database cluster, which meant turning off the site. Once the databases had recovered and the root cause had been fixed, we slowly allowed more people back onto the site.”

Facebook engineers have yet to provide a fix for the condition. In the meantime, the reconfiguration module has been switched out. Presumably, Facebook executives have crossed their fingers that this will not adversely affect the system again.

Johnson’s missive ends: “We apologise again for the site outage, and we want you to know that we take the performance and reliability of Facebook very seriously.”

It is the worst outage that Facebook has had in the past four years, but it is also the second in two days. The first outage, a day earlier, was a lot shorter, affected fewer people and was put down to issues at a third-party networking provider.


Eric Doyle, ChannelBiz

Eric is a veteran British tech journalist, currently editing ChannelBiz for NetMediaEurope, with expertise in security, the channel and, through his TechBritannia initiative, Britain's startup culture.
