Google Launches Service For Managing Hadoop, Spark Clusters

Big data analytics technologies such as Hadoop and Spark can help organizations extract business value from massive data sets, but they can be very complex to administer and manage.

Hoping to reduce some of that complexity, Google on Wednesday announced Cloud Dataproc, a new service for customers of its cloud platform. The service, currently available only in beta, is designed to minimize the time businesses spend administering and managing Hadoop and Spark clusters.

The company described Cloud Dataproc as a managed Spark and Hadoop service that lets customers of Google’s Cloud Platform create clusters more quickly, manage them more efficiently and save money by allowing them to turn clusters on and off as needed.

Google Hadoop

In a blog post on Wednesday, Google Product Manager James Malone listed several features of Cloud Dataproc that he claimed make the service better than on-premises products and competing services.

Cloud Dataproc, for instance, lets enterprises create and run Spark and Hadoop clusters much faster than they can with on-premises clusters or rival infrastructure-as-a-service (IaaS) platforms, Malone said.

Starting, scaling or shutting down a Hadoop or Spark cluster with Cloud Dataproc takes 90 seconds or less per operation on average, compared with 5 to 30 minutes on premises or with other IaaS vendors, he claimed.

Cloud Dataproc is also tightly integrated with other Google cloud services such as BigQuery, Cloud Logging, Cloud Monitoring and Cloud Storage, making it a comprehensive data platform, Malone said. “For example, you can use Cloud Dataproc to effortlessly ETL [extract, transform and load] terabytes of raw log data directly into BigQuery for business reporting,” he noted.

Malone also touted Cloud Dataproc’s pricing model as an advantage over alternative options. Google currently charges only 1 cent per hour for each CPU in a cluster, he said, and that price can fall further if a business runs its workloads on Google’s recently announced pre-emptible virtual machines.

Cloud tools

Google’s pre-emptible VMs let enterprises rent spare infrastructure capacity from the company at low cost to run short-duration workloads, on the condition that the capacity can be pre-empted at any time to run regular workloads. Billing for Cloud Dataproc itself is also granular. “Instead of rounding your usage up to the nearest hour, Cloud Dataproc charges you only for what you really use with minute-by-minute billing and a low, ten-minute-minimum billing period,” Malone said.
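To make the billing claims concrete, here is a back-of-the-envelope sketch in Python. It assumes the beta prices Malone quoted (1 cent per CPU per hour, minute-by-minute billing with a ten-minute minimum), assumes that partial minutes are rounded up, and models only the Cloud Dataproc fee, not the cost of the underlying virtual machines. The function names are hypothetical and are not part of any Google API.

```python
import math

# Beta-launch figures quoted by Malone; treated here as assumptions,
# since actual pricing may differ or have changed.
PRICE_PER_CPU_HOUR_USD = 0.01   # "1 cent per hour per CPU"
MINIMUM_BILLED_MINUTES = 10     # "ten-minute-minimum billing period"


def dataproc_fee_usd(cpus: int, runtime_seconds: float) -> float:
    """Estimate the Dataproc fee for one cluster run under per-minute billing.

    Assumption: partial minutes round up to a whole minute, and any run
    shorter than the minimum is billed as the minimum. VM costs are ignored.
    """
    billed_minutes = max(MINIMUM_BILLED_MINUTES, math.ceil(runtime_seconds / 60))
    return cpus * PRICE_PER_CPU_HOUR_USD * billed_minutes / 60


def hourly_rounded_fee_usd(cpus: int, runtime_seconds: float) -> float:
    """Same cluster and rate, but usage rounded up to the nearest full hour."""
    billed_hours = math.ceil(runtime_seconds / 3600)
    return cpus * PRICE_PER_CPU_HOUR_USD * billed_hours


if __name__ == "__main__":
    # A hypothetical 40-CPU cluster that runs for 75 minutes.
    print(f"per-minute billing: ${dataproc_fee_usd(40, 75 * 60):.2f}")
    print(f"hourly rounding:    ${hourly_rounded_fee_usd(40, 75 * 60):.2f}")
```

For a 40-CPU cluster that runs for 75 minutes, per-minute billing comes to $0.50, whereas rounding up to the nearest hour would bill two full hours, or $0.80. A short 90-second job on 4 CPUs would still be billed for the ten-minute minimum under these assumptions.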

Cloud administrators do not need to learn any new tools or APIs to use Cloud Dataproc, and Google’s Developer Console lets them interact with Spark and Hadoop clusters without any hand-holding, Malone added.

Cloud Dataproc adds to Google’s rapidly growing portfolio of tools for working with large datasets and workloads in the cloud. In August, for example, the company boosted the performance of its BigQuery data analytics service and added user-defined functions and usability improvements.

Earlier this year, the company announced a new Cloud Monitoring service designed to let enterprises monitor the performance, availability and capacity of key Google services such as App Engine, Cloud SQL and Compute Engine.

Originally published on eWeek.

Jaikumar Vijayan
