Hadoop World Highlights Dataset Security Issues
While some Hadoop World speakers touted the software for analysing unstructured data, others focused on security
Organisations planning to use Hadoop to aggregate and analyse data from multiple sources need to consider potential security issues beforehand, according to IT professionals at the Hadoop World conference in New York this week.
Hadoop makes it easier for organisations to get a handle on the large volumes of data, or Big Data, being generated each day, but it can also create problems related to security, data access, monitoring, high availability and business continuity, Larry Feinsmith, managing director of IT operations at banking giant JPMorgan Chase, said in a keynote speech.
Big data growing bigger
Data is growing faster than ever before, thanks to blogs, social media networks, machine sensors and location-based data from mobile devices. Companies can analyse the data to gain insights into customers and industry trends that were previously out of reach.
However, organisations are faced with the prospect of somehow managing and securing petabytes and petabytes of data, Richard Clayton, a software engineer with Berico Technologies, said in a security panel at the conference.
The data is not monolithic, as there may be mixed classifications and varying levels of security sensitivity, Clayton said. As an IT services contractor for federal agencies, Berico Technologies had to consider varying encryption technologies, retention policies and access requirements for individual pieces of data.
Most organisations do not have the visibility they need to understand what data they hold and to secure it properly, Ken Cheney, vice-president of business development and marketing at storage management software vendor Likewise, told eWEEK before the conference. That visibility is essential to “know who owns the data, and who has access to it,” Cheney said.
Enterprises need to implement appropriate security controls for enforcing role-based access to the data, according to Clayton. However, he felt that built-in Hadoop Distributed File System (HDFS) security features, such as Access Control Lists and Kerberos, were not adequate to meet enterprise needs.
Many organisations tie the data being stored to identity management systems, such as Active Directory or LDAP, as the “source of truth,” according to Cheney. By linking the data with an actual identity, IT departments can track what is being done with the data and by whom, he said.
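To make the idea concrete, here is a minimal sketch of a role-based access check that resolves a user's roles from an identity store and records every attempt, so that IT can later see what was requested and by whom. It is illustrative only: the `DIRECTORY` and `DATASET_READERS` tables, the user names and the `can_read` function are all hypothetical stand-ins, not part of Hadoop, Active Directory or any product mentioned by the speakers.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical stand-in for a directory such as Active Directory or LDAP:
# maps each user identity to the set of roles it holds.
DIRECTORY = {
    "alice": {"fraud-analyst"},
    "bob": {"marketing"},
}

# Hypothetical policy table: which roles may read which dataset.
DATASET_READERS = {
    "transactions": {"fraud-analyst"},
    "web-logs": {"fraud-analyst", "marketing"},
}


@dataclass
class AuditLog:
    """In-memory record of every access decision."""
    entries: list = field(default_factory=list)

    def record(self, user, dataset, allowed):
        self.entries.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "dataset": dataset,
            "allowed": allowed,
        })


def can_read(user, dataset, audit):
    """Allow the read only if the user holds a role permitted for the dataset.

    Unknown users and unknown datasets are denied. Every decision,
    allowed or not, is written to the audit log.
    """
    roles = DIRECTORY.get(user, set())
    allowed = bool(roles & DATASET_READERS.get(dataset, set()))
    audit.record(user, dataset, allowed)
    return allowed


audit = AuditLog()
print(can_read("alice", "transactions", audit))  # alice is a fraud analyst
print(can_read("bob", "transactions", audit))    # marketing role is not permitted
```

In a real deployment the role lookup would query the directory service and the log would be written to tamper-resistant storage; the sketch only shows how tying each request to an identity yields an audit trail.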
Dataset protection
Another big concern for organisations using Hadoop is that analysing the data within the environment creates new datasets that also need to be protected, Clayton said. Aggregating the data in one place also increases the risk of data theft or accidental disclosure, he said. An effective data security approach in many Hadoop environments would be to encrypt the data at the individual record level, whether it is in transit or being stored, according to Clayton.
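One way to think about protecting derived datasets is to have each new dataset inherit the most restrictive label of any input it was built from. The sketch below illustrates that rule under stated assumptions: the four `Sensitivity` levels and the `derived_sensitivity` function are hypothetical and were not described by the panel, and real classification schemes are usually more involved.

```python
from enum import IntEnum


class Sensitivity(IntEnum):
    """Hypothetical sensitivity levels, ordered from least to most restrictive."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3


def derived_sensitivity(input_labels):
    """Return the label for a dataset produced from the given inputs.

    The derived dataset takes the highest (most restrictive) label
    among its inputs, so joining sensitive data with public data
    never lowers the protection applied to the result.
    """
    labels = list(input_labels)
    if not labels:
        raise ValueError("a derived dataset needs at least one input")
    return max(labels)


# Joining public web logs with confidential transaction data.
label = derived_sensitivity([Sensitivity.PUBLIC, Sensitivity.CONFIDENTIAL])
print(label.name)
```

Taking the maximum is a deliberately conservative choice: it may over-protect aggregates that are in fact harmless, but it avoids leaving a newly created dataset with no label at all.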
Many government agencies are putting Hadoop-stored data into separate “enclaves”, or network segments, to ensure that only people with the proper level of security clearance can view the information, he said. Others are building firewalls that protect Hadoop environments and restrict access, Clayton said.
Some agencies have opted out of using Hadoop altogether because of these data access concerns, according to Clayton.
Large companies such as IBM, Yahoo and Google have been using Hadoop for years, but it’s only recently that large enterprises have started looking at Hadoop to rein in their out-of-control data.
JPMorgan Chase has been using the open source storage and data analysis framework for almost three years in various applications, such as fraud detection, IT risk management and self-service, Feinsmith said. The bank relies on Hadoop to collect and store web logs, transaction data and social media information on a common platform, and runs data mining and analytics applications on it to gather intelligence, he said.