
Using natural language processing to extract missing entities and supplement data


Project description

This research project in the field of Jewish studies aims to supplement the Compact Memory and JudaicaLink knowledge graph projects with missing data from the German National Library.

The Compact Memory collection of the Specialised Information Service (FID) for Jewish Studies at the J.C. Senckenberg University Library (Frankfurt am Main) contains more than 500 digitised journals dating from 1768 to 1988. However, the collection is missing a few important editions, most of which were lost during the Holocaust. As yet, there is no comprehensive overview of which journals are complete and where there are gaps. The project aims to create as complete an overview as possible of the metadata of the existing editions of the journals, then to add the missing editions and extract the entities from the digitised texts in order to enrich the data already available.

This database is of great interest for research on topics such as German Jewish orthodoxy, the "science of Judaism", the Jewish Enlightenment (Haskalah) or exile, particularly in a central European context.


Against this background, an overview of the metadata will be created in order to identify the gaps. The gaps will be compared with the German National Library's data to determine whether the German National Library holds the missing editions and whether these can be added. If they are already digitised, full texts will be generated using Optical Character Recognition (OCR). Entities will then be extracted from the full texts using Natural Language Processing (NLP). The entities found will be exported to JudaicaLink and enriched with data from other knowledge graphs. The enriched data will supplement the Integrated Authority File. The German Exile Archive 1933–1945 is making data available for this purpose. If the time allowed for the project turns out to be insufficient, the extraction and data enrichment steps will be performed for just a few sample texts. The German Exile Archive may also contain other material relevant to the field of Jewish studies, such as letters, manuscripts, address books, directories and books.
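The first two steps, identifying gaps in the metadata and checking them against a second library's holdings, could be sketched roughly as follows. This is only an illustration and not the project's actual implementation: the record layout (journal title, year, issue number), the sample records and the function names `find_gaps` and `fillable` are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical metadata records: (journal title, year, issue number).
# The real Compact Memory metadata will look different; this layout is an assumption.
compact_memory = [
    ("Journal A", 1901, 1),
    ("Journal A", 1901, 2),
    ("Journal A", 1901, 4),
    ("Journal B", 1925, 1),
    ("Journal B", 1925, 2),
]

# Hypothetical holdings of a second library, in the same assumed layout.
other_holdings = [
    ("Journal A", 1901, 3),
    ("Journal A", 1901, 4),
]


def find_gaps(records):
    """Return issues missing between the lowest and highest known
    issue number of each (journal, year) pair."""
    issues = defaultdict(set)
    for title, year, number in records:
        issues[(title, year)].add(number)

    gaps = []
    for (title, year), numbers in sorted(issues.items()):
        for n in range(min(numbers), max(numbers) + 1):
            if n not in numbers:
                gaps.append((title, year, n))
    return gaps


def fillable(gaps, holdings):
    """Return the gaps that also appear in the second library's holdings."""
    held = set(holdings)
    return [gap for gap in gaps if gap in held]


gaps = find_gaps(compact_memory)
candidates = fillable(gaps, other_holdings)
print("Gaps:", gaps)
print("Can be filled from other holdings:", candidates)
```

A sketch like this only finds gaps inside the known range of issue numbers. Missing first or last issues of a year, or entire missing years, would need additional information such as the publication frequency of each journal.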

The project was proposed and realised by Benjamin Schnabel.

Duration

September 2023 – February 2024

Contact

DH-Stipendien@dnb.de

Last changes: 31.05.2024
