
Hadoop Databases Expose 5 Petabytes Of Data To The Internet

Unsecured Hadoop databases are exposing a massive 5 petabytes (PB) of data to the Internet, putting it at risk from ransom attacks, according to a researcher.

The findings follow a spate of ransom attacks that began in January, when hackers discovered they could steal exposed data and demand payment for its return.

Exposed data

Those attacks affected tens of thousands of databases, and most targeted MongoDB, Elasticsearch and Redis instances because of their popularity.

But John Matherly, creator of the Shodan search engine, said that while fewer Hadoop instances are exposed, the amount of data those databases contain is far greater than that found on MongoDB.


Shodan found only 4,487 exposed databases using Hadoop’s HDFS file system, about one-tenth of the 47,820 exposed MongoDB instances.

But those Hadoop instances expose more than 200 times the amount of data found on the MongoDB instances, at 5,120 terabytes (or 5.1 PB) compared to 25 TB, Matherly said.
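
As a rough check on the figures above, the ratios can be recomputed from the four numbers Matherly reports. The sketch below is illustrative only: the constant and function names are made up for this example, and the per-instance averages assume 1 TB = 1,000 GB, which the article does not specify.

```python
# Figures quoted in this article from Matherly's Shodan results.
HDFS_INSTANCES = 4_487
MONGODB_INSTANCES = 47_820
HDFS_EXPOSED_TB = 5_120
MONGODB_EXPOSED_TB = 25


def summarize():
    """Return the ratios behind the 'one-tenth' and '200 times' comparisons."""
    # Share of exposed instances: HDFS relative to MongoDB.
    instance_ratio = HDFS_INSTANCES / MONGODB_INSTANCES
    # Exposed data volume: HDFS relative to MongoDB.
    data_ratio = HDFS_EXPOSED_TB / MONGODB_EXPOSED_TB
    # Average exposed data per instance in GB (assumption: 1 TB = 1,000 GB).
    hdfs_avg_gb = HDFS_EXPOSED_TB * 1000 / HDFS_INSTANCES
    mongodb_avg_gb = MONGODB_EXPOSED_TB * 1000 / MONGODB_INSTANCES
    return instance_ratio, data_ratio, hdfs_avg_gb, mongodb_avg_gb


if __name__ == "__main__":
    instance_ratio, data_ratio, hdfs_avg_gb, mongodb_avg_gb = summarize()
    print(f"HDFS/MongoDB instance ratio: {instance_ratio:.3f}")
    print(f"HDFS/MongoDB data ratio: {data_ratio:.1f}")
    print(f"Average per HDFS instance: {hdfs_avg_gb:.0f} GB")
    print(f"Average per MongoDB instance: {mongodb_avg_gb:.2f} GB")
```

With these inputs the data ratio works out to 204.8 and the instance ratio to roughly 0.094, in line with the “more than 200 times” and “about one-tenth” descriptions. The per-instance averages are derived here, not reported by Matherly.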

“In terms of data volume it turns out that HDFS is the real juggernaut,” he wrote in a blog post.

No authentication

The findings are consistent with figures that predate the ransom attacks, with BinaryEdge finding in 2015 that Redis, MongoDB, Memcached and Elasticsearch database instances together exposed only about 1.1 PB of data.

The ransom attacks initially focused on the more numerous servers as hackers looked to amass a large number of ransom payments, with different groups competing to extort payments from the same compromised server, researchers said.

They later moved on to hit hundreds of Hadoop databases as well.

Matherly found the disparity continues today, with “most” of the MongoDB instances appearing to have been compromised, while ransom notes were found on only 207 Hadoop clusters.

Most of the Hadoop instances are located in the US (1,900) and China (1,426), with nearly all being hosted in the cloud – the top providers being Amazon, which hosts 1,059 of the databases, and Alibaba, which hosts 507.

The exposed servers are vulnerable because, due to misconfiguration or other issues, they’re accessible from the Internet without any authentication enabled, Matherly said.

Shodan is better known for its use in locating unsecured Internet-connected devices such as webcams, routers and set-top boxes.

The large number of such devices poses a security risk, since they can be hijacked and used to carry out disruptive denial-of-service attacks.


Matthew Broersma

Matt Broersma is a long-standing tech freelance who has worked for Ziff-Davis, ZDNet and other leading publications.
