Google Takes Dataset Search Out Of Beta

Custom-built framework aims to help researchers find the vast amount of data published online by labs, governments, universities and other institutions

Google has brought its dataset search tool out of the beta-testing phase, while adding new features.

Google Dataset Search was originally released in September 2018 to try to make datasets more accessible to researchers.

According to the search company, large amounts of such data are published online by organisations including universities, governments and labs, but they can be difficult to find via standard searches.

Along with the search tool, Google also released a set of open metadata tags, urging publishers to add them to pages containing datasets to make the information easier for search engines to index.
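As a rough illustration of what such metadata can look like, the sketch below builds a dataset description using the schema.org Dataset vocabulary, serialised as JSON-LD for embedding in a web page. The helper names, the example dataset and the example.org URL are hypothetical and are not taken from Google's announcement; publishers should check the current guidelines for the exact properties expected.

```python
import json


def dataset_jsonld(name, description, license_url, spatial_coverage,
                   encoding_format, content_url):
    """Return a schema.org Dataset description as a JSON-LD string.

    Hypothetical helper for illustration only; the property names come
    from the public schema.org vocabulary.
    """
    record = {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "license": license_url,
        "spatialCoverage": spatial_coverage,
        "distribution": [
            {
                "@type": "DataDownload",
                "encodingFormat": encoding_format,
                "contentUrl": content_url,
            }
        ],
    }
    return json.dumps(record, indent=2)


def as_script_tag(jsonld):
    """Wrap a JSON-LD string in the script tag used to embed it in HTML."""
    return '<script type="application/ld+json">\n' + jsonld + "\n</script>"


# Made-up example values: a penguin population table published as CSV.
example = dataset_jsonld(
    name="Penguin population counts",
    description="Annual counts of penguin colonies (illustrative only).",
    license_url="https://creativecommons.org/publicdomain/zero/1.0/",
    spatial_coverage="Antarctica",
    encoding_format="text/csv",
    content_url="https://example.org/penguins.csv",
)
print(as_script_tag(example))
```

Fields such as the licence, the file format and the spatial coverage are the kind of information a search engine would need in order to filter results by usage rights, data type or geographic area.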

A Google data centre in Oklahoma. Image credit: Google

Metadata framework

Google’s tool has now indexed some 25 million datasets, in areas ranging from penguin populations to volcanic eruptions to medical data.

The information can be used for purposes such as testing hypotheses or training AI algorithms.

Casual users can also use Google’s dataset search to find information related to their interests, such as a list of the fastest skiers.

Google said hundreds of thousands of users have tried Dataset Search since its launch, and that the reaction from the scientific community has been positive overall.

The journal Nature, for instance, has begun requiring that data sharing take place with the proper metadata, said Natasha Noy, research scientist at Google Research.

New search features include the ability to filter data by type, such as tables, images or text, as well as by whether the data is free to use and by the geographic area covered.

Data discovery

The search engine is now available to use on mobile devices and has expanded dataset descriptions.

The biggest areas currently indexed include geosciences, biology and agriculture, with the most common queries being “education”, “weather”, “cancer”, “crime”, “soccer”… and “dogs”.

The US is the leader in open government dataset publishing, making more than 2 million available online.

Noy said Google plans to keep releasing updates to the search engine now that the beta-testing period has ended.

The company said its ultimate goal is to “help foster an ecosystem” for publishing, discovering and using datasets.