Six Libraries To Archive A Copy Of British Internet

From Saturday, six major libraries in the UK will start archiving digital content, hoping to eventually hold a copy of every website hosted in the .uk domain space.

New Legal Deposit regulations will enable the British Library, the National Library of Scotland, the National Library of Wales, the Bodleian Libraries, Cambridge University Library and Trinity College Library in Dublin to collect a copy of every digital publication in Britain, just as they do with print editions.

The archive will serve to preserve the nation’s cultural heritage and make it available to future generations. In time, it could evolve into a database holding every public tweet or Facebook page.

Digital content comes of age

Libraries in the UK have been archiving printed media for centuries; the British Library alone stores 150 million physical entries “representing every age of written civilisation”. This enormous collection was made possible by Legal Deposit, a practice enshrined in English law since 1662 that requires publishers to submit copies of their work to one of the officially certified libraries.

A decade ago, the government decided to update the law with the Legal Deposit Libraries Act of 2003, which extended the rules to include e-books, CDs, DVDs and websites. The extended rules will officially come into force on 6 April.

“Preserving and maintaining a record of everything that has been published provides a priceless resource for the researchers of today and the future. So it’s right that these long-standing arrangements have now been brought up to date for the 21st century, covering the UK’s digital publications for the first time,” said Culture Minister Ed Vaizey.

“Digital content can now be effectively archived and our academic and literary heritage preserved, in whatever form it takes.”

Initially, the project will ‘harvest’ 4.8 million websites containing over a billion pages, including copies of password-protected or paid-for content. By the end of this year, the results of the first archiving crawl of the .uk domain will be available to researchers. As for the public, access to non-print materials will be offered through on-site reading room facilities at each of the participating libraries.

The capacity of the archive is expected to be constantly upgraded over the coming years. The system – which includes the open source Hadoop software – has been built by IBM.

According to project leader Lucie Burgess, a lot of material related to events such as the 7/7 bombings or the 2008 financial crisis has already been lost or taken down.

“We will have to distinguish between content published in the UK and elsewhere, but in principle we will be able to archive the publicly available tweets of any individual, company or organisation,” Burgess told AFP.

“Ten years ago, there was a very real danger of a black hole opening up and swallowing our digital heritage, with millions of web pages, e-publications and other non-print items falling through the cracks of a system that was devised primarily to capture ink and paper,” said Roly Keating, CEO of the British Library.

“The regulations now coming into force make digital legal deposit a reality, and ensure that the Legal Deposit Libraries themselves are able to evolve – collecting, preserving and providing long-term access to the profusion of cultural and intellectual content appearing online or in other digital formats.”

Max Smolaks

Max 'Beast from the East' Smolaks covers open source, public sector, startups and technology of the future at TechWeekEurope. If you find him looking lost on the streets of London, feed him coffee and sugar.
