PETRUS - Process-supporting software for the digital German National Library
The aim of the PETRUS (Process-supporting software for the digital German National Library) project was to introduce machine-based cataloguing processes. The German National Library is looking to reduce traditional indexing in areas where it is no longer feasible because of the continually growing number of publications, or no longer necessary because of technological developments. Software-supported cataloguing methods are intended to handle the increasing number of cataloguing tasks, to close gaps and resolve inconsistencies in the bibliographic documentation systems, to further consolidate cataloguing of the media and to reduce processing times.
The PETRUS project created the basis for a modular, software-supported cataloguing system. Automated cataloguing modules were developed and implemented for four specific scenarios of descriptive and subject cataloguing of online publications. Software tools from the fields of data and text analysis, computational linguistics, machine learning and information retrieval were used to generate new metadata for searching and indexing from bibliographic data and machine-readable publications (full text documents or digitised tables of contents, for example). Workflows, data structures and quality requirements were adapted as well.
Since March 2011, the bibliographic records of parallel editions of a publication, for example the online and print versions of the same work, have been linked automatically. Publication-independent bibliographic information, such as subject classifications, subject headings and references to the Integrated Authority File, is exchanged reciprocally between the linked records.
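The reciprocal exchange between parallel editions can be pictured with a minimal sketch. It assumes records are plain dictionaries and that the shared, publication-independent fields are called `subject_classes`, `subject_headings` and `authority_links`; these names, like the function `exchange_metadata`, are hypothetical and do not reflect the library's actual data model.

```python
# Hypothetical field names for publication-independent metadata.
SHARED_FIELDS = ("subject_classes", "subject_headings", "authority_links")


def exchange_metadata(print_rec: dict, online_rec: dict) -> None:
    """Link two parallel editions and merge their shared fields in place."""
    # Cross-reference the two records.
    print_rec["parallel_edition"] = online_rec["id"]
    online_rec["parallel_edition"] = print_rec["id"]
    for name in SHARED_FIELDS:
        combined = print_rec.get(name, []) + online_rec.get(name, [])
        # Union of both sides, order-preserving and without duplicates.
        merged = list(dict.fromkeys(combined))
        # Give each record its own copy so later edits stay independent.
        print_rec[name] = list(merged)
        online_rec[name] = list(merged)
```

After the call, both records carry the same subject information, so a heading assigned manually to the print edition also becomes searchable on the online edition.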
All personal names transferred into the bibliographic record as phrases from external sources have been linked automatically to the standardised entries in the Integrated Authority File since mid-2011. If the personal name already exists there, the bibliographic record is linked to it directly; otherwise a new authority record is created first.
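The link-or-create step described above can be sketched as follows. The class `AuthorityFile`, its in-memory dictionary, the identifier format and the simple normalisation are all assumptions made for illustration; real name matching against the Integrated Authority File will involve considerably more disambiguation than this.

```python
from dataclasses import dataclass, field


@dataclass
class AuthorityFile:
    """Toy stand-in for an authority file: normalised name -> record id."""
    records: dict = field(default_factory=dict)
    next_id: int = 1

    @staticmethod
    def normalise(name: str) -> str:
        # Illustrative normalisation only: collapse whitespace, ignore case.
        return " ".join(name.split()).casefold()

    def link_or_create(self, name: str) -> tuple:
        """Return (record_id, created) for a personal name phrase."""
        key = self.normalise(name)
        if key in self.records:
            # Name already known: link to the existing authority record.
            return self.records[key], False
        # Name unknown: create a new authority record first, then link.
        record_id = f"A{self.next_id}"
        self.next_id += 1
        self.records[key] = record_id
        return record_id, True


def link_names(bib_record: dict, authority: AuthorityFile) -> dict:
    """Return a copy of the record with name phrases resolved to links."""
    links = [authority.link_or_create(n)[0] for n in bib_record["names"]]
    return {**bib_record, "name_links": links}
```

Calling `link_or_create` twice with differently spaced or capitalised variants of the same name returns the same identifier, and only the first call reports that a record was created.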
Since January 2012, the roughly one hundred DDC (Dewey Decimal Classification) subject classes have been assigned automatically to German and English monographs of the bibliography series O. The classification models were created in advance using machine learning methods trained on manually catalogued publications.
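The text does not say which learning method was used. Purely as an illustration of training a model on manually catalogued publications and then applying it to new ones, the sketch below uses a small multinomial naive Bayes classifier with Laplace smoothing. The function names, the whitespace tokenisation and the class labels are assumptions, not the project's actual implementation.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """Build a model from (text, subject_class) pairs catalogued by hand."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        tokens = text.lower().split()
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return {"classes": class_counts, "words": word_counts, "vocab": vocab}


def classify(model, text):
    """Return the subject class with the highest log-probability."""
    total_docs = sum(model["classes"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, n_docs in model["classes"].items():
        counts = model["words"][label]
        denom = sum(counts.values()) + vocab_size  # Laplace smoothing
        score = math.log(n_docs / total_docs)      # class prior
        for token in text.lower().split():
            if token in model["vocab"]:  # ignore words never seen in training
                score += math.log((counts[token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

A model trained on a handful of labelled titles, for instance physics titles under class 530 and German literature titles under class 830, will assign a new title to whichever class shares more of its vocabulary. A production system would need far more training data and a quality threshold below which records are passed to a human cataloguer.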
In future, subject headings from the controlled vocabulary of the Integrated Authority File are also to be assigned automatically to German online publications. Work on this subject indexing process is still continuing.
The German National Library is continuing this initiative beyond the end of the project. New application scenarios are gradually being developed, and further types of media are being included in the automated cataloguing processes.
2009 - 2011
Last update: 12.4.2013