IBM Building 120 Petabyte Storage System

IBM’s latest project is a 120PB storage system that will support simulations of complex real-world phenomena

IBM is building a hard disk system capable of storing 120 petabytes, or 120 million gigabytes – around 10 times bigger than any previously put together, the company said.

The system, made up of 200,000 conventional hard disk drives working together, is being built for an unnamed client by IBM’s Almaden, California research lab.

Simulations

The storage array should be able to hold around one trillion files and is intended for a client that is building a supercomputer for detailed simulations of complex real-world phenomena, IBM said.

Keeping track of the files’ attributes alone will take up around two petabytes of the system’s capacity, IBM said.
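To put those figures in proportion, the quoted numbers can be turned into rough per-drive and per-file averages. The sketch below is a back-of-the-envelope calculation only. It assumes decimal units (one petabyte as 10^15 bytes, consistent with "120 million gigabytes") and ignores any capacity set aside for redundancy, which IBM did not quantify; the variable names are illustrative.

```python
# Back-of-the-envelope averages from the figures IBM quoted.
# Assumption: decimal units (1 PB = 10**15 bytes, 1 GB = 10**9 bytes).
PB = 10**15
GB = 10**9

total_capacity = 120 * PB       # total system capacity in bytes
drive_count = 200_000           # conventional hard disk drives
file_count = 10**12             # about one trillion files
metadata_capacity = 2 * PB      # space used to track file attributes

# Average capacity per drive, in gigabytes (ignores redundancy overhead).
per_drive_gb = total_capacity // drive_count // GB

# Average attribute storage per file, in bytes.
metadata_bytes_per_file = metadata_capacity // file_count

# Share of the whole system consumed by file attributes.
metadata_share = metadata_capacity / total_capacity

print(f"Average per drive: {per_drive_gb} GB")
print(f"Attributes per file: {metadata_bytes_per_file} bytes")
print(f"Attribute share of capacity: {metadata_share:.1%}")
```

On these assumptions each drive averages 600 GB, and the file attributes work out to about 2,000 bytes per file, or under 2 percent of the total capacity.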

The software and hardware technologies IBM is developing for the project could be commercialised for systems aimed at weather forecasts, seismic processing in the petroleum industry and molecular studies of genomes or proteins, IBM said.

“This 120 petabyte system is on the lunatic fringe now, but in a few years it may be that all cloud computing systems are like it,” Bruce Hillsberg, director of storage research at IBM and leader of the project, told MIT’s Technology Review.

The new hardware developed for the system includes technology for making the thousands of drives work together and a water-based cooling system.

New software

IBM developed software techniques for ensuring that data is protected from the inevitable drive failures, while maintaining high performance. The company uses a file system called GPFS, developed at IBM Almaden, which spreads files across multiple disks in order to speed up data access.
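The general idea of spreading a file across several disks, usually called striping, can be illustrated with a toy example. This is a minimal sketch of simple round-robin striping, not GPFS's actual placement algorithm, which the article does not describe; the function names `stripe` and `reassemble`, the block size and the in-memory "disks" are all hypothetical.

```python
from typing import List


def stripe(data: bytes, num_disks: int, block_size: int) -> List[List[bytes]]:
    """Split data into fixed-size blocks and deal them out round-robin."""
    disks: List[List[bytes]] = [[] for _ in range(num_disks)]
    for offset in range(0, len(data), block_size):
        block_index = offset // block_size
        # Block 0 goes to disk 0, block 1 to disk 1, and so on, wrapping around.
        disks[block_index % num_disks].append(data[offset:offset + block_size])
    return disks


def reassemble(disks: List[List[bytes]]) -> bytes:
    """Read the blocks back in their original order."""
    blocks = []
    longest = max(len(disk) for disk in disks)
    for row in range(longest):
        for disk in disks:
            if row < len(disk):
                blocks.append(disk[row])
    return b"".join(blocks)


payload = bytes(range(10))
striped = stripe(payload, num_disks=3, block_size=2)
restored = reassemble(striped)
```

Because consecutive blocks sit on different disks, a real system could read them in parallel, which is where the speed-up in data access comes from. This toy version has no redundancy, so it would lose data if any one disk failed.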

In July IBM used GPFS to index 10 billion files in 43 minutes, breaking a previous record of one billion files scanned in three hours.
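The two records are easier to compare as scan rates. The short calculation below uses only the figures quoted above and assumes that "three hours" means exactly three hours; the variable names are illustrative.

```python
# Compare the two file-scanning records as files per second.
new_files = 10 * 10**9          # 10 billion files
new_seconds = 43 * 60           # 43 minutes

old_files = 1 * 10**9           # one billion files
old_seconds = 3 * 60 * 60       # assumed to be exactly three hours

new_rate = new_files / new_seconds
old_rate = old_files / old_seconds
speedup = new_rate / old_rate

print(f"New record: {new_rate:,.0f} files per second")
print(f"Old record: {old_rate:,.0f} files per second")
print(f"Speed-up:   {speedup:.1f}x")
```

On those figures the new record works out to roughly 3.9 million files per second against about 93,000 for the old one, a rate around 42 times higher.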

“Big data” is becoming a major industry trend, with IT giants such as Yahoo grooming technology such as Apache Hadoop into an industry standard for data analytics.

HP’s recent acquisition of British software firm Autonomy was seen as another indicator of the importance of big data. Autonomy is a fast-growing, multifaceted IT provider that placed itself on the global storage map by acquiring Iron Mountain’s digital archiving, e-discovery and online backup business for $380 million (£230m) in cash in May 2011.