Google Launches Service For Managing Hadoop, Spark Clusters

Big data analytics technologies such as Hadoop and Spark can help organizations extract business value from massive data sets, but they can be complex to administer and manage.

Hoping to help reduce some of that complexity, Google Wednesday announced the launch of a new service dubbed Cloud Dataproc for customers of its cloud platform. The service is currently available only in beta and is designed to minimize the time businesses spend on administering and managing computing clusters in Hadoop and Spark environments.

The company described Cloud Dataproc as a managed Spark and Hadoop service that lets customers of Google’s Cloud Platform create clusters more quickly, manage them more efficiently and save money by allowing them to turn clusters on and off as needed.

In a blog post Wednesday, Google Product Manager James Malone listed several features of Cloud Dataproc that he claimed make the service better than on-premises products and competing services.

Cloud Dataproc, for instance, makes it much faster for enterprises to create and run Spark and Hadoop clusters compared to doing the same thing with on-premises clusters and rival infrastructure-as-a-service (IaaS) platforms, Malone said.

The average time it takes with Cloud Dataproc to start, scale or shut down Hadoop and Spark clusters is 90 seconds or less per operation, compared with between 5 and 30 minutes with on-premises technologies and other IaaS vendors, he claimed.
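To put those figures in perspective, the arithmetic can be sketched as below. This is only an illustration: the 90-second and 5-to-30-minute numbers are the averages Malone cited, and the function name `lifecycle_overhead_minutes` and the three-operation scenario are assumptions made for the example, not anything Google published.

```python
# Figures cited by Malone (averages per start, scale or shutdown operation).
DATAPROC_MAX_SECONDS = 90        # "90 seconds or less"
OTHER_MIN_SECONDS = 5 * 60       # low end of "between 5 and 30 minutes"
OTHER_MAX_SECONDS = 30 * 60      # high end of the same range


def lifecycle_overhead_minutes(operations: int) -> tuple[float, float, float]:
    """Return (Dataproc upper bound, other low, other high) in minutes.

    `operations` is the number of start/scale/shutdown operations in a run.
    Hypothetical helper: it simply multiplies the quoted per-operation times.
    """
    if operations < 0:
        raise ValueError("operations must be non-negative")
    return (
        operations * DATAPROC_MAX_SECONDS / 60,
        operations * OTHER_MIN_SECONDS / 60,
        operations * OTHER_MAX_SECONDS / 60,
    )


# Example: a job that starts a cluster, scales it once and shuts it down.
dataproc, other_low, other_high = lifecycle_overhead_minutes(3)
print(f"Cloud Dataproc: up to {dataproc} min; others: {other_low} to {other_high} min")
```

On those claimed averages, a run with three cluster operations would spend at most 4.5 minutes waiting on Cloud Dataproc, against 15 to 90 minutes elsewhere.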

Cloud Dataproc is also tightly integrated with other Google cloud services such as BigQuery, Cloud Logging, Cloud Monitoring and Cloud Storage, making it a comprehensive data platform, Malone said. “For example, you can use Cloud Dataproc to effortlessly ETL [extract, transform and load] terabytes of raw log data directly into BigQuery for business reporting,” he noted.

Malone touted Cloud Dataproc’s pricing model as another advantage over alternative options. Google, for instance, currently charges only 1 cent per hour per CPU in a cluster, he said. That price can go down even further if a business chooses Google’s recently announced pre-emptible virtual machines option for running its workloads, he said.

Google’s pre-emptible VMs let enterprises rent spare infrastructure capacity from the company at low cost to run short-duration workloads, on the condition that the capacity can be pre-empted at any time to run regular workloads. “Instead of rounding your usage up to the nearest hour, Cloud Dataproc charges you only for what you really use with minute-by-minute billing and a low, ten-minute-minimum billing period,” Malone said.
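The billing model Malone describes can be sketched in a few lines. This is a rough illustration, not Google's billing logic: it uses only the 1-cent-per-CPU-per-hour rate and the ten-minute minimum quoted in this article, and it assumes partial minutes are rounded up. The function names are hypothetical, and the sketch covers only the quoted Cloud Dataproc rate, not the cost of the underlying virtual machines.

```python
import math

RATE_USD_PER_CPU_HOUR = 0.01   # "1 cent per hour per CPU", as quoted
MINIMUM_BILLED_MINUTES = 10    # "ten-minute-minimum billing period", as quoted


def dataproc_fee_usd(cpus: int, runtime_seconds: float) -> float:
    """Estimate the per-minute-billed fee for one cluster run.

    Assumption: a partial minute is billed as a full minute.
    """
    if cpus < 0 or runtime_seconds < 0:
        raise ValueError("cpus and runtime_seconds must be non-negative")
    billed_minutes = max(MINIMUM_BILLED_MINUTES, math.ceil(runtime_seconds / 60))
    return cpus * RATE_USD_PER_CPU_HOUR * billed_minutes / 60


def hourly_rounded_fee_usd(cpus: int, runtime_seconds: float) -> float:
    """Same rate, but with usage rounded up to the nearest hour for comparison."""
    if cpus < 0 or runtime_seconds < 0:
        raise ValueError("cpus and runtime_seconds must be non-negative")
    billed_hours = max(1, math.ceil(runtime_seconds / 3600))
    return cpus * RATE_USD_PER_CPU_HOUR * billed_hours


# Example: a 24-CPU cluster that runs for 25 minutes.
per_minute = dataproc_fee_usd(24, 25 * 60)
per_hour = hourly_rounded_fee_usd(24, 25 * 60)
print(f"per-minute billing: ${per_minute:.2f}, hourly rounding: ${per_hour:.2f}")
```

Under these assumptions, the 25-minute run is billed for 25 minutes rather than a full hour, and a run shorter than ten minutes is still billed for ten.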

Cloud administrators do not need to learn any new tools or APIs to use Cloud Dataproc. Google’s Developer Console allows administrators to interact with Spark and Hadoop clusters without any handholding, he added.

Cloud Dataproc adds to a rapidly growing portfolio of tools from Google for working with large datasets and workloads in the cloud. In August, for example, the company boosted performance of its BigQuery data analytics service with new user-defined functions and usability improvements.

Earlier this year, the company announced a new Cloud Monitoring service designed to let enterprises monitor performance, availability and capacity of key Google services like App Engine, Cloud SQL and Compute Engine.

Originally published on eWeek.

Jaikumar Vijayan
