EMC Moves into Big Database World with Greenplum Appliance

EMC’s answer to the Oracle Exadata and IBM Netezza servers is the Greenplum Data Computing Appliance

As amazing as it seems, EMC, the world’s largest independent data storage and protection company, really did not have a competitive data warehousing choice in its product portfolio until it bought Greenplum in July.

That part of the market has historically belonged to Teradata, which currently holds about 70 percent, and, to a lesser extent, Netezza (bought by IBM for $1.7 billion). It has somehow eluded EMC all these years.

Following the Greenplum acquisition, EMC also partnered with Cloudera to handle some other “big data” business and appeared to be set in the data warehousing world for at least a few years.

Pitching Against IBM And Oracle

EMC has had no new data warehousing products until now. Filling this gap, EMC launched its generically named Greenplum Data Computing Appliance (DCA), a data warehousing system that integrates with a data centre and will serve as a marketplace answer to the Oracle Exadata and IBM Netezza DW servers.

The hardware in each Greenplum rack includes 16 commodity servers, 192 Intel cores and Ethernet connectivity. These are high-transactional machines that handle huge workloads, and they are not inexpensive by any means. Racks in this class are said to cost at least $1 million each, with Exadata apparently the highest-priced at $1.5 million.

The Greenplum-developed appliance will serve as the first product of the new EMC Data Computing Products Division, which is led by former Greenplum CEO and co-founder Luke Lonergan.

Lonergan described Greenplum as EMC’s new “key enabler of ‘big data’ cloud systems”, which include self-service healthcare, financial and scientific analytics.

“This will allow organisations to store, manage and closely analyse terabytes of detailed data for faster business insight, conclusions and revelations,” Lonergan told eWEEK.

The Greenplum appliance has significance beyond the release of a new high-end data centre product, Lonergan said. “This really marks a stepping out of EMC and VMware into data computing,” he said. “It’s not just about storing the data; it’s about using the data. That’s really what’s been behind Greenplum selling our data warehouse since 2006.”

The Greenplum DCA, which runs the parallel Greenplum Database 4.0, has been tested at a data-loading performance of 10TB an hour. This is twice as fast as Oracle Exadata and five times faster than the best systems from Netezza and Teradata, Lonergan said.
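As a back-of-the-envelope reading of those ratios, the short Python sketch below works out the loading rates they imply. It assumes that “twice as fast” and “five times faster” are simple multiples of the 10TB-an-hour figure; the function names are illustrative, and the results restate Lonergan’s claims rather than independent measurements.

```python
# Figure quoted in the article: tested data-loading rate of the Greenplum DCA.
DCA_LOAD_TB_PER_HOUR = 10.0


def implied_rate(dca_rate_tb_per_hour: float, speedup: float) -> float:
    """Rate a rival system would load at if the DCA is `speedup` times as fast.

    Assumption: the quoted ratios are plain multiples of the DCA rate.
    """
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return dca_rate_tb_per_hour / speedup


def hours_to_load(data_tb: float, rate_tb_per_hour: float) -> float:
    """Hours needed to load `data_tb` terabytes at a constant rate."""
    if rate_tb_per_hour <= 0:
        raise ValueError("rate must be positive")
    return data_tb / rate_tb_per_hour


if __name__ == "__main__":
    # Speed-up factors as claimed by Lonergan, relative to each rival.
    claims = [
        ("Greenplum DCA", 1.0),
        ("Oracle Exadata", 2.0),
        ("Netezza / Teradata", 5.0),
    ]
    for name, speedup in claims:
        rate = implied_rate(DCA_LOAD_TB_PER_HOUR, speedup)
        # 36TB is the single-rack capacity cited elsewhere in the article.
        hours = hours_to_load(36.0, rate)
        print(f"{name}: {rate:.1f} TB/hour, {hours:.1f} hours to load 36TB")
```

On that reading, the claims put Exadata at about 5TB an hour and the best Netezza and Teradata systems at about 2TB an hour.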

“Exadata just doesn’t work,” Lonergan said.

There are three main things that set Greenplum apart from Exadata and Netezza, Lonergan said.

“These are: scalability from one rack to 24 racks with one call to EMC – and it will do that while everything [is] online,” Lonergan said. “That would be from 36TB in one rack to low single-digit petabytes, uncompressed. This all scales online, that’s a key.

“Secondly, it uses Fibre Channel over Ethernet [FCoE], a converged networking stack [with 10 Gigabit Ethernet] with 16 FC connections from each rack that can be used to connect into your existing SAN [storage area network]. This enhances the appliance for high availability.

“The third piece of this is that it’s private-cloud ready – it’s virtualisation-capable,” he said. “It snaps into existing VMware deployments.”

Lonergan said Greenplum has already shipped seven of these systems and expects to ship several hundred more of them in the following quarters.

“This fits into the appliance category but, at the same time, it leverages all the existing investments that the EMC customers [have] in their core storage area network,” Lonergan said.

The Greenplum Data Computing Appliance is available in flexible half-rack [eight boxes], full-rack and multiple-rack appliance configurations for terabyte- to petabyte-scale requirements. It is natively integrated with EMC’s replication, backup and recovery and deduplication software.
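The configurations can be sketched in a few lines of Python using the per-rack figures quoted in this article (16 servers, 192 cores, 36TB uncompressed, up to 24 racks). The sketch assumes cores are spread evenly across servers and that capacity scales linearly with rack count, including for the half-rack; neither assumption is confirmed by EMC, and the names used are illustrative.

```python
from dataclasses import dataclass

# Per-rack figures quoted in the article.
SERVERS_PER_RACK = 16
CORES_PER_RACK = 192
UNCOMPRESSED_TB_PER_RACK = 36.0
MAX_RACKS = 24


@dataclass(frozen=True)
class DcaConfig:
    racks: float
    servers: int
    cores: int
    uncompressed_tb: float


def describe(racks: float) -> DcaConfig:
    """Estimate servers, cores and capacity for a half-rack or N full racks.

    Assumptions: cores are evenly distributed across servers, and
    capacity scales linearly with the number of racks.
    """
    is_half_rack = racks == 0.5
    is_whole_racks = racks >= 1 and racks == int(racks) and racks <= MAX_RACKS
    if not (is_half_rack or is_whole_racks):
        raise ValueError("racks must be 0.5 or a whole number from 1 to 24")
    cores_per_server = CORES_PER_RACK // SERVERS_PER_RACK  # 192 / 16 = 12
    servers = int(racks * SERVERS_PER_RACK)
    return DcaConfig(
        racks=racks,
        servers=servers,
        cores=servers * cores_per_server,
        uncompressed_tb=racks * UNCOMPRESSED_TB_PER_RACK,
    )


if __name__ == "__main__":
    for racks in (0.5, 1, 2):
        print(describe(racks))
```

Under these assumptions a half-rack works out to eight servers and 96 cores, which matches the “eight boxes” noted above.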

“The Greenplum Database software stores a large amount of structured data in [a] format that allows queries and other access methods to complete much faster than if this ‘big data’ was stored in a traditional relational database,” Enterprise Strategy Group analyst Brian Babineau told eWEEK.

“This is not an HPC [high-performance computing] play for EMC; it is a horizontal market opportunity that spans multiple industries where large data warehouses and business intelligence systems support critical operations. Oracle is targeting this market with Exadata, and IBM acquired Netezza for similar reasons – these warehouses are getting so big that response times are not satisfactory and each vendor is trying to solve it with an integrated system,” Babineau said.

“The difference is that the integrated systems are built using different components. EMC is using a unique, new database approach. Oracle is shifting some of its software intelligence to the hardware. IBM/Netezza built an integrated system that changes where the analytics are actually executed.”

Babineau said that, in the context of “databases”, customers usually view EMC as a premier storage systems supplier.

“EMC always had a system configuration that could address any size database with specific performance requirements,” he said. “Now, EMC owns the database software that can help address the performance and operational challenges that customers have today. The DCA is significant because it combines the new database, Greenplum, with the storage into a single, yet extremely scalable system.

“And that storage is not your traditional EMC storage (Symmetrix or Clariion), it is server-attached disk [storage] that is all managed and optimised by the Greenplum database. The bottom line is EMC is finding new ways to solve database performance issues outside of selling more of the same.”

Greenplum Database 4.0 is now shipping as a licensed software-only product for deployment on industry-standard x86 hardware and integrated infrastructure packages, such as the Virtual Computing Environment coalition’s Vblock cloud infrastructure packages.

Vblocks are preintegrated, preconfigured computing systems consisting of networking equipment from Cisco Systems, storage, security and system management from EMC, and virtualisation software from VMware. The resulting cloud computing systems will range in size from hundreds of virtual machines to more than 6,000 virtual machines, depending upon the needs of the customer.