Japan Disaster Demonstrates Importance Of Backup
The Japan earthquake should serve as a wake-up call for IT managers about preparing for such events
The 8.9-magnitude earthquake that struck off the northeast coast of Japan on 11 March, and the Pacific Ocean tsunami that followed, destroyed or severely damaged several cities and towns on the island of Honshu and knocked out utilities and communications links, disrupting much of the Pacific Rim.
It isn’t immediately known how many IT facilities or data centres were washed away in the disaster, but the mere fact that this horrific crisis happened serves to remind IT managers about their own business continuity systems and how well-prepared they are for such an event.
Don’t Take Small Events for Granted
It is human nature to become complacent as time passes without a real disaster affecting an IT system. An event like the 11 March quake should serve to wake up those who might not have been testing their systems regularly – or scare those who, in fact, have no backup systems in place at all.
“First of all, everybody should be threat-aware at all times,” Bill Hughes, a business continuity specialist at SunGard Availability Services, told eWEEK. “They also should be looking at all their company’s locations and supply chain in terms of people, when they are checking their data centre resiliency and data recovery systems.”
Hughes said SunGard does a lot of business in California, where earthquakes are more common than in most other regions of the United States. The earthquake hazard zone, however, extends well beyond the US West Coast. (See this USGS map; PDF)
“IT managers there have learned to live with that cloud over their heads, thinking that they’ve been through this quake and that quake, that it’s not a big deal, and they can get through another as needed,” Hughes said. “It’s certainly different than where I’m from in the Midwest, but there’s a tendency to get complacent, and they need to not allow that to happen.”
IT managers are certainly aware of the need to protect their data centres and power connections, but they might not always consider how a regional disaster could affect their staff at home, disrupt transportation or touch other parts of the business, Hughes said.
“You need to think about how your different locations might be affected [by a disaster], how your people and supply chain partners are affected. People also tend to think a disaster happens, and then it’s over with. Follow-up events also need to be considered,” Hughes said.
The initial 8.9-magnitude shock on 11 March was followed by more than 20 aftershocks, each measuring more than 6.0. Each of those was a serious quake that could knock out a data centre on its own.
The Importance of Regular DR Testing
Testing disaster recovery and business continuity systems is a pain in the rear for everybody. It can be time-consuming, is not exciting to perform, and often seems like a useless exercise of people and equipment. Furthermore, it’s very difficult to get a reasonable-facsimile test of an IT system that’s in full operation without stopping the business and taking down the entire works.
However, you cannot overemphasise the importance of testing DR systems, Hughes said.
“You don’t send a football player onto the field without him knowing the playbook and having the practice. You just can’t expect [DR systems] to perform, especially under those circumstances, without having them tested,” Hughes said.
One way to handle DR testing is to not make it an “event,” Hughes said.
“Try to be opportunistic. For example, if you can test whenever you put a new system in – at least test your recovery procedures, recovery scripts and your backups. Now it’s not the same as testing all your systems, but doing these things and making them part of the ongoing process is an important way to keep on top of those things,” Hughes said.
“Doing this certainly isn’t the same as doing an integrated test, but it will keep your documentation up-to-date, your people sharp, and it keeps that issue in front of them.”
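As a rough illustration of the kind of small, routine check Hughes describes, the following Python sketch restores a backup archive into a scratch directory and compares checksums against the live files. It is a minimal example under stated assumptions, not a description of SunGard's tooling: the function names (make_backup, verify_backup, sha256_of) are hypothetical, the backup is assumed to be a gzip-compressed tar archive created by a trusted source, and the data is assumed not to change between the backup and the check.

```python
import hashlib
import tarfile
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_backup(source_dir: Path, archive_path: Path) -> None:
    """Hypothetical backup step: pack source_dir into a .tar.gz archive.

    The archive should live outside source_dir so it is not packed into itself.
    """
    with tarfile.open(archive_path, "w:gz") as archive:
        archive.add(source_dir, arcname=".")


def verify_backup(source_dir: Path, archive_path: Path) -> list[str]:
    """Restore the archive to a scratch directory and compare it to the source.

    Returns a list of problems; an empty list means every live file was
    restored with identical contents. Only use this on archives you trust.
    """
    problems = []
    with tempfile.TemporaryDirectory() as scratch:
        with tarfile.open(archive_path, "r:gz") as archive:
            archive.extractall(scratch)
        restored_root = Path(scratch)
        for original in sorted(source_dir.rglob("*")):
            if not original.is_file():
                continue
            relative = original.relative_to(source_dir)
            restored = restored_root / relative
            if not restored.is_file():
                problems.append(f"missing: {relative}")
            elif sha256_of(restored) != sha256_of(original):
                problems.append(f"changed: {relative}")
    return problems
```

A check like this could be run whenever a new system goes in, with any non-empty result treated as a prompt to review the backup job. It exercises only the restore path for one data set and is no substitute for an integrated DR test.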
The bottom line on testing, Hughes said, is this: “You have to ask yourself: Can we afford not to do it?”