Subject cataloguing of scientific publications by machine learning
Project Description
The aim of the project is to develop machine learning models based on the extensive metadata holdings of the German National Library (DNB). These models will catalogue the content of scientific publications mathematically, so that publications can be understood at an abstract level and related to one another. This will facilitate content-based searches and other functions. Freely available, pre-trained language models serve as the foundation; however, the project differs from the large language models used by technology corporations. Using a training dataset created specifically from DNB holdings, the team is working on a model that is as lean as possible yet optimised for this application. The researchers are also developing a web application that will allow users to visualise how close or distant publications are in terms of content and to search for entries with similar content.
The project was proposed by Linus Herterich and Max Schaible, who are also carrying it out.
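The idea of searching for entries with similar content can be illustrated with a minimal sketch. It assumes that each publication is represented by a numeric vector and that proximity is measured by cosine similarity; the project description does not state which representation or measure is actually used. The identifiers, the toy vectors and the function names `cosine_similarity` and `most_similar` are hypothetical and chosen only for illustration.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # A zero vector has no direction; treat it as dissimilar to everything.
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query_id, embeddings, top_k=2):
    """Rank all other publications by similarity to the query publication."""
    query = embeddings[query_id]
    scored = [
        (other_id, cosine_similarity(query, vector))
        for other_id, vector in embeddings.items()
        if other_id != query_id
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy vectors standing in for model output (assumption, not DNB data).
embeddings = {
    "pub_physics_1": [0.9, 0.1, 0.0],
    "pub_physics_2": [0.8, 0.2, 0.1],
    "pub_history_1": [0.0, 0.1, 0.9],
}

results = most_similar("pub_physics_1", embeddings, top_k=2)
for publication_id, score in results:
    print(publication_id, round(score, 3))
```

In this toy setting the second physics entry ranks above the history entry because its vector points in almost the same direction as the query. A real system would obtain the vectors from a trained model and would probably use an index structure instead of comparing every pair.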
Duration
October 2024 – March 2025
Contact
DH-Stipendien@dnb.de
Last changes: 04.11.2024