
Updating large data sets

The OAI (Open Archives Initiative) interface can be used for regular updates of large amounts of data. Continuous synchronisation is only possible after an up-to-date data dump has first been imported into a separate database. Access to the OAI interface is free of charge, as is access to the initial data dump. You will find further information in Dialog mit Bibliotheken, issue 2013,1.

An overview of all available metadata as well as the different options of obtaining data is given here.

Background

OAI (Open Archives Initiative) is an initiative with the task of defining an open interface for the exchange of metadata. Communication via this interface takes place between the German National Library as the data provider and a service provider that requests the data. The data is collected automatically by a so-called “OAI harvester”. The protocol used for this communication is known as OAI-PMH (OAI Protocol for Metadata Harvesting).

OAI interface standard

Protocol: OAI-PMH Version 2.0

OAI-PMH protocol

The OAI-PMH protocol is web-based. The OAI harvester sends simple HTTP GET or POST requests and receives an HTTP response from the data provider in return. This response contains the requested metadata embedded in an XML structure.

OAI harvester

In order to use OAI to synchronise data between the German National Library and a service provider, the service provider must have implemented an OAI harvester. The OAI harvester runs repeatedly in a continuous loop. In each run, it executes a “ListRecords” command (see OAI-Functions) limited to the dataset (catalogue) defined for the service provider, and the time of the last retrieval is appended to the “ListRecords” command as a time stamp. This ensures that

  • no change is missed.
  • changes appear in the service provider’s database as soon as possible.
  • no data irrelevant to the service provider is transported.
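A single run of such a loop can be sketched in Python. This is a minimal illustration under stated assumptions, not DNB code: the helper name `incremental_list_records_url` is hypothetical, and the sketch assumes the harvester keeps the time of its last retrieval as a timezone-aware `datetime`, which it formats to the second in UTC and passes as the `from` parameter.

```python
from datetime import datetime, timezone
from urllib.parse import urlencode

BASE_URL = "https://services.dnb.de/oai/repository"


def incremental_list_records_url(set_spec: str, metadata_prefix: str,
                                 last_retrieval: datetime) -> str:
    """Build a ListRecords request for one set, limited to changes since last_retrieval.

    Hypothetical helper: last_retrieval must be timezone-aware; it is converted
    to UTC and formatted to the second (YYYY-MM-DDThh:mm:ssZ).
    """
    if last_retrieval.tzinfo is None:
        raise ValueError("last_retrieval must be timezone-aware")
    stamp = last_retrieval.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    params = {
        "verb": "ListRecords",
        "from": stamp,  # time of the last retrieval
        "metadataPrefix": metadata_prefix,
        "set": set_spec,  # the catalogue defined for this service provider
    }
    # safe=":" keeps colons readable in time stamps and set names such as dnb-all:online
    return BASE_URL + "?" + urlencode(params, safe=":")


if __name__ == "__main__":
    last = datetime(2021, 4, 21, 8, 0, tzinfo=timezone.utc)
    print(incremental_list_records_url("authorities", "MARC21-xml", last))
```

In a real harvester, each run would send this request, process the response and then store the new retrieval time for the next run.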

OAI-Functions

The OAI-PMH protocol includes six basic functions which are appended to the baseURL (e.g. "https://services.dnb.de/oai/repository") using "?verb=":

  • Identify: Display general information about the OAI repository, e.g.

    • repositoryName
    • baseURL
    • protocolVersion
  • ListSets: Information about all data sets (catalogues) available in the OAI repository
  • ListMetadataFormats: List of all data formats (information on available export formats)
  • ListRecords: Harvests data records by specifying a data set and, optionally, a time period (from/until)
  • ListIdentifiers: Harvests identifiers of data records (PPN/IDN) by specifying a data set and, optionally, a time period (from/until). Times are given in Coordinated Universal Time (UTC). ListRecords and ListIdentifiers are the core functions of OAI. They support selective harvesting, i.e. the harvester can restrict its request to data records that

    • come from a specific catalogue and
    • were generated or modified during a specific period.

    Parameters:

    • set: the catalogue from which the data records originate
    • from/until (optional): points in time which define the period for selective harvesting. Depending on the OAI repository, these can be specified to the day (YYYY-MM-DD) or to the second (YYYY-MM-DDThh:mm:ssZ).
    • metadataPrefix: the data format to be delivered. The available values can be requested via the function ListMetadataFormats (see above).
    • resumptionToken: enables the delivery of partial responses. The OAI harvester receives a resumption token, which it uses in a new request to obtain the next partial response from the OAI repository.
  • GetRecord: Retrieves an individual data record by its identifier. The identification number of the data record concerned must be known for this (MARC 21: field 035 $a with prefix DE-101 or DE-599).
    Parameters:

    • identifier: Identification number of the required data record
    • metadataPrefix: Name of the data format in which the data record is to be delivered. The available values can be requested using the ListMetadataFormats command (see above).
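The parameters each function accepts can be checked before a request is sent. The following Python sketch encodes the argument rules of the OAI-PMH 2.0 specification (required, optional and exclusive arguments per verb); the function `build_oai_url` and the table `VERB_ARGUMENTS` are hypothetical names, and the sketch only builds URLs without sending them. Arguments are passed as a dictionary because `from` is a reserved word in Python and cannot be used as a keyword argument.

```python
from typing import Dict, Optional, Set, Tuple
from urllib.parse import urlencode

BASE_URL = "https://services.dnb.de/oai/repository"

# Per verb, as in OAI-PMH 2.0: (required arguments, optional arguments, exclusive argument)
VERB_ARGUMENTS: Dict[str, Tuple[Set[str], Set[str], Optional[str]]] = {
    "Identify": (set(), set(), None),
    "ListSets": (set(), set(), "resumptionToken"),
    "ListMetadataFormats": (set(), {"identifier"}, None),
    "ListIdentifiers": ({"metadataPrefix"}, {"from", "until", "set"}, "resumptionToken"),
    "ListRecords": ({"metadataPrefix"}, {"from", "until", "set"}, "resumptionToken"),
    "GetRecord": ({"identifier", "metadataPrefix"}, set(), None),
}


def build_oai_url(verb: str, args: Optional[Dict[str, str]] = None) -> str:
    """Return a request URL after checking which arguments the verb allows."""
    if verb not in VERB_ARGUMENTS:
        raise ValueError(f"unknown verb: {verb}")
    args = dict(args or {})
    required, optional, exclusive = VERB_ARGUMENTS[verb]
    given = set(args)
    if exclusive is not None and exclusive in given:
        # An exclusive argument (resumptionToken) may not be combined with others.
        if given != {exclusive}:
            raise ValueError(f"{exclusive} must be the only argument besides verb")
    else:
        missing = required - given
        if missing:
            raise ValueError(f"missing argument(s) for {verb}: {sorted(missing)}")
        not_allowed = given - required - optional
        if not_allowed:
            raise ValueError(f"argument(s) not allowed for {verb}: {sorted(not_allowed)}")
    # Keep ':' and '/' unescaped so identifiers such as oai:dnb.de/... stay readable
    return BASE_URL + "?" + urlencode({"verb": verb, **args}, safe=":/")
```

For example, `build_oai_url("GetRecord", {"metadataPrefix": "MARC21-xml", "identifier": "oai:dnb.de/authorities/118540238"})` yields the GetRecord request listed under Practical examples.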

Note: For a targeted search of single data records we recommend our SRU interface.

Access requirements

The OAI interface can be accessed free of charge and without registration.

German National Library catalogues (sets) available through OAI

Bibliographic data (catalogue, followed by the selection value for the “set” parameter)

  • Deutsche Nationalbibliografie excluding Integrated Authority File (GND)
    • dnb
    • dnb:wv (only data records for which compilation is complete)
  • Deutsche Nationalbibliografie, series A (publications from the publishers’ book trade)
    • dnb:reiheA (incl. data records which are still undergoing compilation and those for which compilation is complete)
    • dnb:wv:reiheA (only data records for which compilation is complete)
  • Deutsche Nationalbibliografie, series B (publications from outside the publishers’ book trade)
    • dnb:reiheB (incl. data records which are still undergoing compilation and those for which compilation is complete)
    • dnb:wv:reiheB (only data records for which compilation is complete)
  • Deutsche Nationalbibliografie, series C (maps)
    • dnb:reiheC (incl. data records which are still undergoing compilation and those for which compilation is complete)
    • dnb:wv:reiheC (only data records for which compilation is complete)
  • Deutsche Nationalbibliografie, series H (university publications)
    • dnb:reiheH (incl. data records which are still undergoing compilation and those for which compilation is complete)
    • dnb:wv:reiheH (only data records for which compilation is complete)
  • Deutsche Nationalbibliografie, series M (printed music)
    • dnb:reiheM (incl. data records which are still undergoing compilation and those for which compilation is complete)
    • dnb:wv:reiheM (only data records for which compilation is complete)
  • Deutsche Nationalbibliografie, series T (recorded music)
    • dnb:reiheT (incl. data records which are still undergoing compilation and those for which compilation is complete)
    • dnb:wv:reiheT (only data records for which compilation is complete)
  • Deutsche Nationalbibliografie, series O (online publications)
    • dnb:reiheO (incl. data records which are still undergoing compilation and those for which compilation is complete)
    • dnb:wv:reiheO (only data records for which compilation is complete)
  • Deutsche Nationalbibliografie, restriction to one subject category
    • dnb:sg020 (example for library and information sciences)
  • Deutsche Nationalbibliografie, series A, restriction to one subject category
    • dnb:wv:reiheA:sg720 (example for architecture)
  • Deutsche Nationalbibliografie: digitised tables of contents
    • dnb:toc
  • Catalogue of the German National Library with New Release Service and online publications (without GND)
    • dnb-all
  • New Release Service (information from publishers on announcements and new releases)
    • dnb-all:reiheN
  • Online publications without restrictions
    • dnb-all:online
    • restriction to one subject category: dnb-all:online:sg020 (example for library and information sciences)
  • Online dissertations (including records not catalogued from the item itself)
    • dnb-all:online:dissertations
    • restriction to one subject category: dnb-all:online:dissertations:sg720 (example for architecture)
  • German Music Archive (DMA, incl. collection of historical sound recordings)
    • dnb-all:dma
  • German Exile Archive 1933–1945 (DEA)
    • dea
  • German Museum of Books and Writing (DBSM)
    • dbsm

Authority data (catalogue, followed by the selection value for the “set” parameter)

  • Integrated Authority File (GND)
    • authorities
  • Integrated Authority File, GND entity Geographical Entity*
    • authorities:geografikum
  • Integrated Authority File, GND entity Congress*
    • authorities:kongress
  • Integrated Authority File, GND entity Corporate Body*
    • authorities:koerperschaft
  • Integrated Authority File, GND entity Person*
    • authorities:person
  • Integrated Authority File, GND entity Subject Term*
    • authorities:sachbegriff
  • Integrated Authority File, GND entity Work*
    • authorities:werk

German Union Catalogue of Serials/ISIL and library codes (catalogue, followed by the selection value for the “set” parameter)

  • German Union Catalogue of Serials (ZDB)**
    • zdb
  • ZDB, holdings
    • zdb:holdings
  • ZDB, restriction to one subject category
    • zdb:sg010 (example for computer science)
  • ZDB, online publications
    • zdb:online
  • ZDB, online publications, holdings
    • zdb:online:holdings
  • ZDB, online publications, restriction to one subject category
    • zdb:online:sg010 (example for computer science)
  • ZDB, free online publications
    • zdb:online:free
  • ZDB, free online publications, holdings
    • zdb:online:free:holdings
  • ZDB, free online publications, restriction to one subject category
    • zdb:online:free:sg010 (example for computer science)
  • ISIL and Library Code Directory***
    • bib

* When individual OAI subsets of the Integrated Authority File (GND) are harvested through the OAI interface, the relationships between linked GND data records cannot be traced if the linked records belong to another OAI subset (i.e. another GND entity). As usual, the links are included in the subset both as an identifier (MARC 21: fields 5XX $0) and as a text string (MARC 21: fields 5XX $a); however, the linked data record itself is not included if it belongs to another entity.

**The German Union Catalogue of Serials is a service provided jointly by the Staatsbibliothek zu Berlin - Preußischer Kulturbesitz (Berlin State Library – Prussian Cultural Heritage) and the German National Library.

***The ISIL and Library Code Directory is the address file of the German ISIL and Library Code Agency at Berlin State Library.

Datasets of free selections of digitized objects are available via the OAI2 interface. You will find more information in the DNBLab.

Formats

Detailed information on the formats available is given here.

Terms of use and provision

Detailed information on terms of use and provision is given here.

Practical examples

Syntax of an OAI query

Request to OAI server of the German National Library

https://services.dnb.de/oai/repository

Command to server

?verb=ListIdentifiers

Parameter “from” defines the beginning of the period requested

&from=2021-04-21

Parameter “until” defines the end of the period requested

&until=2021-04-22

Format desired for OAI response

&metadataPrefix=MARC21-xml

Defines the catalogue or set

&set=authorities
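The same request can be assembled programmatically. This Python sketch simply joins the parts listed above with `urllib.parse.urlencode`; the variable names are illustrative only.

```python
from urllib.parse import urlencode

# Parts of the example request above, in the order shown
base_url = "https://services.dnb.de/oai/repository"
params = {
    "verb": "ListIdentifiers",      # command to server
    "from": "2021-04-21",           # beginning of the period requested
    "until": "2021-04-22",          # end of the period requested
    "metadataPrefix": "MARC21-xml", # format desired for the OAI response
    "set": "authorities",           # catalogue or set
}
request_url = base_url + "?" + urlencode(params)
print(request_url)
```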

Syntax of an OAI request with a specific ID number

https://services.dnb.de/oai/repository?verb=GetRecord&metadataPrefix=MARC21-xml&identifier=oai:dnb.de/authorities/118540238

Syntax of an OAI request for online dissertations in the subject category “Social Sciences, Sociology, Anthropology” from a defined period

https://services.dnb.de/oai/repository?verb=ListRecords&from=2020-04-01T14:55:00Z&until=2020-07-08T09:54:59Z&metadataPrefix=oai_dc&set=dnb-all:online:dissertations:sg300

Frequently asked questions (FAQ)

What is the difference between SRU and OAI?


SRU enables you to conduct targeted searches without having your own database.
OAI facilitates the continual synchronisation of large amounts of data. This requires the import of an up-to-date initial data set into a separate database.

What can be requested using OAI?

An OAI request returns all data records that were newly created or modified during the period defined. Retrospective harvesting over several months is not recommended: automated processes continually update the data records, and each update moves a record’s change date forward. A request for a period several months back therefore misses every record that has been changed since then, so any data harvested retrospectively would be incomplete. We provide complete sets of any retrospective data required.

What cannot be requested using OAI?

OAI cannot be used to request data records using criteria other than the date on which they were changed. Please use the SRU interface for such requests.
The OAI interface cannot be used to request all new data records. Why not?
When a new data record is created, its change date is the same as its creation date. If this record is changed manually or automatically at a later date, the change date is updated, and a request to the OAI interface for the period in which the record was created will no longer return it, even though the record was actually created at that time. The record will only be delivered if the period of its last change is specified, as this is the only criterion recognised by OAI.
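The effect can be illustrated with a toy model in Python. The `Record` class, the `oai_select` function and the two sample records are hypothetical; the sketch assumes only what is stated above, namely that selection depends solely on the date of the last change, with both ends of the period included as in OAI-PMH.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class Record:
    """Toy record: OAI selection only looks at last_changed (the change date)."""
    identifier: str
    created: date
    last_changed: date


def oai_select(records: List[Record], from_date: date, until_date: date) -> List[str]:
    """Return identifiers whose last change lies in the period, bounds included."""
    return [r.identifier for r in records if from_date <= r.last_changed <= until_date]


records = [
    Record("A", created=date(2021, 4, 21), last_changed=date(2021, 4, 21)),  # never changed
    Record("B", created=date(2021, 4, 21), last_changed=date(2021, 5, 3)),   # changed later
]
print(oai_select(records, date(2021, 4, 21), date(2021, 4, 22)))  # ['A']: B is missing
print(oai_select(records, date(2021, 5, 3), date(2021, 5, 3)))    # ['B']
```

Both records were created on the same day, but only the unchanged one is found for that day; the other is delivered only for the period of its last change.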

How often can the repository be searched using OAI?

The German National Library advises users not to search the repository at intervals of less than one minute, since data may otherwise be delivered in duplicate (this does not apply to OAI requests using “resumptionToken”). The interval between searches should not be shorter than the period requested in each search.

Is there a limit on the number of data records per OAI response?

Hit lists are limited to 100,000 data records each. An error message is returned if the number of hits is greater. If this occurs, the search period, and thus the interval between searches, must be shortened accordingly.

Which time period can be searched?

A limited search period should be defined to ensure that no more than 100,000 hits are returned. Recommended search period/frequency for searches that are not time-critical: 30 minutes. For smaller sets (e.g. online dissertations), harvesting can be restricted to once a day or once a week; data records changed several times during this period are then harvested only once, which keeps the number of hits down.
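Splitting a longer period into consecutive 30-minute requests could look like the following Python sketch. The helper `harvest_windows` is hypothetical; it sets each `until` one second before the next `from`, on the assumption that the repository accepts times to the second and treats both bounds as inclusive, as OAI-PMH specifies.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Tuple

FMT = "%Y-%m-%dT%H:%M:%SZ"


def harvest_windows(start: datetime, end: datetime,
                    step: timedelta = timedelta(minutes=30)) -> List[Tuple[str, str]]:
    """Split [start, end) into consecutive (from, until) pairs of at most `step`.

    Hypothetical helper: start and end must be timezone-aware. Each `until` lies
    one second before the next `from`, because OAI-PMH treats both bounds as
    inclusive and this avoids requesting the same second twice.
    """
    if start.tzinfo is None or end.tzinfo is None:
        raise ValueError("start and end must be timezone-aware")
    windows = []
    current = start
    while current < end:
        nxt = min(current + step, end)
        until = nxt - timedelta(seconds=1)
        windows.append((current.astimezone(timezone.utc).strftime(FMT),
                        until.astimezone(timezone.utc).strftime(FMT)))
        current = nxt
    return windows


for window in harvest_windows(datetime(2021, 4, 21, 8, 0, tzinfo=timezone.utc),
                              datetime(2021, 4, 21, 9, 0, tzinfo=timezone.utc)):
    print(window)
```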
We also recommend using the time given in the “responseDate” element, e.g. <responseDate>2017-08-30T08:12:54Z</responseDate>, as the retrieval time (“from”) for the next request, as this time most accurately reflects the current availability of data in our repository. We also recommend harvesting with a small time overlap (“responseDate” minus one minute = “from”).
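Deriving the next “from” value from a response could be sketched as follows in Python. The function `next_from` is a hypothetical helper; it assumes the standard OAI-PMH 2.0 namespace and a “responseDate” given to the second in UTC. Because of the overlap, some records will arrive twice; a harvester would typically store records by identifier so that duplicates simply overwrite earlier copies (a design assumption, not part of the sketch).

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}
FMT = "%Y-%m-%dT%H:%M:%SZ"


def next_from(response_xml, overlap: timedelta = timedelta(minutes=1)) -> str:
    """Return responseDate minus `overlap` as the next 'from' value.

    response_xml may be str or bytes; the time stamp is treated as UTC.
    """
    root = ET.fromstring(response_xml)
    text = root.findtext("oai:responseDate", namespaces=OAI_NS)
    if text is None:
        raise ValueError("no responseDate element found")
    response_date = datetime.strptime(text.strip(), FMT)
    return (response_date - overlap).strftime(FMT)


sample = ('<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">'
          '<responseDate>2017-08-30T08:12:54Z</responseDate></OAI-PMH>')
print(next_from(sample))  # 2017-08-30T08:11:54Z
```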

What happens when data records undergo extensive changes?

In the case of special operations requiring changes to more than 50,000 data records in one day, we proceed as follows:

  • Extensive special operations are carried out at weekends/on public holidays.
  • The changes to the data records are not logged, which means that they are not visible via the OAI index.
    They only become visible in the OAI index later, once the OAI index has been completely rebuilt.
  • OAI deletions are not delivered.

A system that harvests data promptly after a change will therefore miss these changes, even if it is running continually. If the changes are relevant to the harvesting system, its harvesting interval can be reset as necessary.

Are redirected and deleted records delivered too?

OAI deletions are not delivered.
For ZDB bibliographic and local data and the GND, deleted and redirected records are delivered as truncated data records and labelled as such; they contain the ID numbers as well as standard and date fields.
Local data marked as deleted is available via the OAI interface for no more than one week (further information in FAQ on ILTIS).
The process for title changes has not yet been finalised.

Multi-part monographs and parts of continuing and integrating resources: harvesting superordinate data records

Only the data records for each part (with dependent or independent title) are delivered via the OAI interface. In contrast, the data service delivers both the data records for each part and the superordinate data records via SFTP/WWW servers. The superordinate records can, however, generally be retrieved via OAI-PMH by specifying their identifiers. Depending on the type of data record (with dependent or independent title) and format, this identifier is stored in different fields or elements of the data records for each part, cf. overview (available only in German).

Can specific entries be expected in a certain time period?

No, because data records are constantly being changed and can only be searched via OAI using the change date, which means that it is impossible to predict whether a particular record will be contained in an OAI response.

Using a “resumptionToken” when a search returns 51 or more data records

A “resumptionToken” facilitates the delivery of partial responses when a search returns 51 or more data records. The OAI harvester receives a token which it has to use for an additional request from the OAI repository in order to get the next partial response. If not all records have yet been delivered, a new “resumptionToken” is generated with each partial response and must be used to obtain the next. The element “resumptionToken” contains the attribute values of the current list position and the total number of data records (cursor="50" completeListSize="xxxxxx"). Each OAI response delivers a maximum of 50 data records. A “resumptionToken” is valid for no more than 30 minutes.

Example using a “resumptionToken”:

https://services.dnb.de/oai/repository?verb=ListRecords&resumptionToken=xxxxxxxxxx_insert_resumptionToken_here_xxxxxxxx
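A harvester that follows resumption tokens until a list is complete might look like this Python sketch. It is not DNB code: `list_records` is a hypothetical helper, the `fetch` function that performs the HTTP request is supplied by the caller (for example a wrapper around `urllib.request.urlopen`), and the sketch assumes the standard OAI-PMH 2.0 response structure, in which a missing or empty “resumptionToken” marks the last partial response.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Dict, Iterator
from urllib.parse import urlencode

BASE_URL = "https://services.dnb.de/oai/repository"
NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records(fetch: Callable[[str], bytes], params: Dict[str, str]) -> Iterator[ET.Element]:
    """Yield <record> elements, following resumptionTokens until the list is complete.

    `fetch` is supplied by the caller and returns the response body for a URL.
    """
    query = {"verb": "ListRecords", **params}
    while True:
        root = ET.fromstring(fetch(BASE_URL + "?" + urlencode(query, safe=":")))
        error = root.find("oai:error", NS)
        if error is not None:
            if error.get("code") == "noRecordsMatch":
                return  # nothing changed in the requested period
            raise RuntimeError(f"OAI error {error.get('code')}: {error.text}")
        yield from root.iterfind("oai:ListRecords/oai:record", NS)
        token = root.find("oai:ListRecords/oai:resumptionToken", NS)
        # A missing or empty token marks the last partial response.
        if token is None or not (token.text or "").strip():
            return
        # Follow-up requests carry only the verb and the token, which expires
        # after 30 minutes, so the next request should follow without long pauses.
        query = {"verb": "ListRecords", "resumptionToken": token.text.strip()}
```

For example, `list_records(fetch, {"metadataPrefix": "MARC21-xml", "set": "authorities", "from": "2021-04-21", "until": "2021-04-22"})` would iterate over all records of that period, however many partial responses that takes.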

What is Coordinated Universal Time (UTC)?

Coordinated Universal Time (UTC) is the basis from which local time is calculated.
Example for Germany:
Local time for Berlin is UTC+1 (during summer time it is UTC+2). The OAI server gives the time in UTC.
If the local time is 10:30, it therefore only makes sense to harvest until UTC 09:29, or until UTC 08:29 in summer time.
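The conversion can be written as a small Python sketch. The helper `harvest_until` is hypothetical and takes the UTC offset explicitly (1 in standard time, 2 in summer time) to mirror the example above; in Python 3.9 and later, `zoneinfo.ZoneInfo("Europe/Berlin")` could determine the offset automatically where time zone data is installed.

```python
from datetime import datetime, timedelta, timezone

FMT = "%Y-%m-%dT%H:%M:%SZ"


def harvest_until(local_time: datetime, utc_offset_hours: int) -> str:
    """Return the latest sensible UTC 'until' value for a German local time.

    utc_offset_hours: 1 in standard time (UTC+1), 2 in summer time (UTC+2).
    One minute is subtracted, matching the example of 10:30 local time and 09:29 UTC.
    """
    local = local_time.replace(tzinfo=timezone(timedelta(hours=utc_offset_hours)))
    until = local.astimezone(timezone.utc) - timedelta(minutes=1)
    return until.strftime(FMT)


print(harvest_until(datetime(2021, 1, 15, 10, 30), 1))  # 2021-01-15T09:29:00Z
print(harvest_until(datetime(2021, 7, 15, 10, 30), 2))  # 2021-07-15T08:29:00Z
```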

How are mark-up characters transported in an OAI response?

Mark-up characters (<, > and &) in an OAI response are transported in a CDATA section.
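A standard XML parser unwraps CDATA sections automatically, so a harvester normally does not need to unescape these characters itself. The fragment below is made up for illustration and parsed with Python’s `xml.etree.ElementTree`:

```python
import xml.etree.ElementTree as ET

# Made-up fragment: a field value containing mark-up characters inside a CDATA section
fragment = '<subfield code="a"><![CDATA[Fish & Chips <2nd ed.>]]></subfield>'
element = ET.fromstring(fragment)
print(element.text)  # Fish & Chips <2nd ed.>
```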

Queries/error messages

Please report any specific queries or error messages concerning the OAI interface without delay. This is the only way to ensure transparency and help resolve problems.
Please send queries or error messages to schnittstellen-service@dnb.de together with the following information:

  • Syntax of OAI-PMH request (set, format, time period)
  • Error message/description
  • Context (repeated, sporadic, client used etc.)

How should the time-out be set?

We recommend setting the time-out to at least 2 minutes so that the harvesting process is not interrupted. If extensive changes are made, the response times for certain sets (e.g. zdb:holdings) may exceed 90 seconds, even if short query periods are specified.
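In Python’s standard library, for example, the time-out is passed per request to `urllib.request.urlopen`. This is a minimal sketch: the constant `HARVEST_TIMEOUT_SECONDS` and the `fetch` helper are hypothetical names, and a production harvester would also need error handling and retries.

```python
import urllib.request

# Time-out in seconds; at least 2 minutes, as recommended above
HARVEST_TIMEOUT_SECONDS = 120


def fetch(url: str) -> bytes:
    """Fetch one OAI response body, giving slow sets (e.g. zdb:holdings) time to answer."""
    with urllib.request.urlopen(url, timeout=HARVEST_TIMEOUT_SECONDS) as response:
        return response.read()
```

If the limit is exceeded, an exception is raised that the caller would need to handle, for example by retrying the request later.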

Further processing

If you are just getting started with processing metadata, useful programs include the Catmandu software suite, OpenRefine and Metafacture, while data can be analysed with the Konstanz Information Miner (KNIME) or the Metadata Quality Assurance Framework. A more detailed overview is provided in the presentation slides "Open Source Software zur Verarbeitung und Analyse von Metadaten" (available only in German) and the article "Survey of Tools for Linked Data Consumption".

How can I get important information, e.g. about changes and disruptions at the OAI interface?

We suggest you sign up to the OAI interface mailing list to make sure you receive timely information about changes, new developments, disruptions and maintenance work.

Contact

schnittstellen-service@dnb.de


Last changes: 24.10.2024
Short-URL: https://www.dnb.de/EN/oai
