
Archiving German-language Twitter – Thank you!


On 20 February 2023, the Science Data Center for Literature and the German National Library launched an initiative calling for a concerted effort to download as many German-language Tweets as possible from the Twitter archive. The aim was to use crowdsourcing to build the most complete archive of German-language Tweets possible. The German National Library provided archive servers for permanent storage.

Twitter and its archive are an important source for research in many academic disciplines. However, the platform has been going through turbulent times since its takeover by a consortium of investors led by Elon Musk. Twitter has already begun remodelling the platform, and further changes are expected. Growing numbers of users have been distancing themselves from Twitter or leaving it altogether.

In light of these developments, scholarly access to the Twitter archive appeared increasingly uncertain. From the perspective of cultural history and archival science, it was vital to back up and preserve at least part of the Twitter archive.

Some four billion German-language Tweets have been published since Twitter was launched. The academic access API provided access to the entire Twitter archive but allowed each account to download no more than ten million Tweets a month. A single account would therefore have needed 400 months to download 4 billion Tweets, whereas 400 accounts could have downloaded all of German-language Twitter in just one month.
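The arithmetic above can be checked with a short Python sketch. It assumes, as the text implies, that the ten-million cap applies per account and that every participating account uses its full monthly quota in parallel. The function name `months_needed` is purely illustrative.

```python
# Figures taken from the text above.
TOTAL_TWEETS = 4_000_000_000             # approx. German-language Tweets ever published
MONTHLY_CAP_PER_ACCOUNT = 10_000_000     # assumed per-account monthly download limit


def months_needed(accounts: int,
                  total: int = TOTAL_TWEETS,
                  cap: int = MONTHLY_CAP_PER_ACCOUNT) -> int:
    """Months required if every account downloads its full quota in parallel.

    Uses integer ceiling division so that a partial month counts as a whole month.
    """
    if accounts <= 0:
        raise ValueError("at least one account is required")
    return -(-total // (accounts * cap))
```

For example, `months_needed(1)` gives 400 and `months_needed(400)` gives 1, matching the figures in the text; with 300 accounts the job would spill into a second month.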

Twitter closed "Academic Research" access to the Twitter API at the end of April 2023. This marked the end of our initiative. Thanks to the contributions made by our supporters, we succeeded in harvesting 200 million German-language Tweets from around 5.7 million accounts. These cover the period from March 2006 up to and including May 2011.

How did we go about it?

We ran a search query to select all Tweets that Twitter had classified as German-language. To download such large volumes, we used Twitter’s count API to divide them into individual batches, each containing no more than one million Tweets. Supporters could use a web application we provided either to donate their access tokens for automated downloads (token donation) or to reserve batches and download them independently.
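The batching step can be illustrated with a minimal Python sketch. It assumes that the count API returns Tweet counts per time bucket (for example, per day) and greedily groups consecutive buckets into batches of at most one million Tweets. The input format, the `Batch` class and the `build_batches` function are hypothetical; the text does not describe exactly how the initiative's web application split the time range.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Maximum number of Tweets per batch, as stated in the text.
MAX_BATCH = 1_000_000


@dataclass
class Batch:
    """A contiguous time window to be downloaded as one unit (hypothetical)."""
    start: str   # start of the first bucket in the batch (inclusive)
    end: str     # end of the last bucket in the batch (exclusive)
    tweets: int  # total Tweets in the window according to the counts


def build_batches(buckets: List[Tuple[str, str, int]],
                  max_batch: int = MAX_BATCH) -> List[Batch]:
    """Greedily group consecutive (start, end, count) buckets into batches.

    Buckets are assumed to be sorted by time and non-overlapping.
    Each resulting batch holds at most max_batch Tweets.
    """
    batches: List[Batch] = []
    current: Optional[Batch] = None
    for start, end, count in buckets:
        if count > max_batch:
            # A single bucket is too large; it would have to be
            # re-counted at a finer time granularity.
            raise ValueError(
                f"bucket starting {start} holds {count} Tweets (limit {max_batch})")
        if current is not None and current.tweets + count > max_batch:
            # Adding this bucket would exceed the limit: close the current batch.
            batches.append(current)
            current = None
        if current is None:
            current = Batch(start=start, end=end, tweets=0)
        # Extend the current batch to cover this bucket.
        current.end = end
        current.tweets += count
    if current is not None:
        batches.append(current)
    return batches
```

Greedy grouping keeps each batch a contiguous time window, which is easy to express as a start and end time in a search query. A bucket that on its own exceeds the limit cannot be split this way and would need a finer count granularity.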

What happens next?

The Tweets are stored at the German National Library, which guarantees their preservation. The German National Library intends to make the collection available for automated analysis (text and data mining) for scientific purposes. Such analysis will only be permitted within the German National Library's infrastructure and on its premises.

Last updated: 23 May 2023
Contact: twarchiv@dnb.de
