Subject cataloguing of scientific publications by machine learning

Project Description

The project aims to develop machine learning models based on the extensive metadata holdings of the German National Library (DNB). These models are intended to represent the content of scientific publications mathematically, so that publications can be understood at an abstract level and related to one another. This will facilitate content-based searches and other functions. Freely available, pre-trained language models serve as the foundation; however, the project differs from the large language models used by technology corporations. Using a training dataset created specifically from DNB holdings, the project is working on a model that is as streamlined as possible while being optimised for this application. The researchers are also developing a web application that will allow users to visualise how close or distant publications are in terms of content and to search for entries with similar content.
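As a rough illustration of what searching for entries with similar content can look like once publications are represented as numeric vectors, here is a minimal sketch. It is not the project's implementation: the vectors are made-up toy values rather than output of a trained model, the publication identifiers are hypothetical, and cosine similarity is assumed as the proximity measure only because it is a common choice for comparing embeddings.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A zero vector has no direction; treat it as unrelated to everything.
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query_id, embeddings, top_k=2):
    """Rank all other publications by similarity to the query publication."""
    query = embeddings[query_id]
    scored = [
        (other_id, cosine_similarity(query, vector))
        for other_id, vector in embeddings.items()
        if other_id != query_id
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy embeddings (assumption: real ones would come from a
# trained language model and have hundreds of dimensions).
EMBEDDINGS = {
    "pub-physics-1": [0.9, 0.1, 0.0],
    "pub-physics-2": [0.8, 0.2, 0.1],
    "pub-history-1": [0.0, 0.1, 0.9],
}

if __name__ == "__main__":
    for pub_id, score in most_similar("pub-physics-1", EMBEDDINGS):
        print(f"{pub_id}: {score:.3f}")
```

In this toy data the two physics entries point in nearly the same direction and therefore rank as closest, while the history entry ranks last. A visualisation of proximity and distance would typically start from the same pairwise scores.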

The project was proposed and is being carried out by Linus Herterich and Max Schaible.

Duration

October 2024 – March 2025

Contact

DH-Stipendien@dnb.de

Last changes: 04.11.2024