OAI - frequently asked questions (FAQs) / Troubleshooting
What is the difference between SRU and OAI?
SRU permits users to conduct a specific search without the need for a database of their own.
OAI permits continuous synchronisation of large amounts of data; the data is imported from an up-to-date base stock into a separate database.
What can be queried using OAI?
An OAI search yields all data records which were changed or created in the specified time period.
What cannot be queried using OAI?
It is not possible to use OAI to carry out a search filtered by anything other than the time period. It is therefore impossible to search all data records for a criterion other than the date of change. The SRU interface must be used for such a search.
Why is it not possible to search for all new records using the OAI interface?
If a new data record is created, the date of change is the same as the date of creation. If this data record is then changed at a later date, manually or automatically, the date of change will be adjusted and a search of the OAI interface based on the time period of the date of creation would not yield any hits although the record was actually created at that time. This record would only be issued if the time period of the last change is entered, as this is the only criterion recognised by OAI.
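The effect can be illustrated with a toy model. This is only a sketch, not the DNB implementation: the record fields ("created", "changed") and the harvest function are hypothetical and simply mimic a selection by date of change.

```python
from datetime import date

# Two invented records created on the same day; record B was changed later.
records = [
    {"id": "A", "created": date(2017, 8, 1), "changed": date(2017, 8, 1)},
    {"id": "B", "created": date(2017, 8, 1), "changed": date(2017, 9, 15)},
]

def harvest(records, start, end):
    """Return the ids whose date of change lies in [start, end].

    The date of creation is deliberately ignored, because the date of
    change is the only selection criterion available.
    """
    return [r["id"] for r in records if start <= r["changed"] <= end]

# Querying the period in which both records were created only finds A.
print(harvest(records, date(2017, 8, 1), date(2017, 8, 31)))  # ['A']
# B only appears when the period of its last change is queried.
print(harvest(records, date(2017, 9, 1), date(2017, 9, 30)))  # ['B']
```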
How often can the DNB repository be searched using OAI?
The DNB recommends not launching two searches within one minute, as this could produce duplicated data output (except when using a "resumptionToken"). The interval between searches should not be shorter than the time period being queried.
Is the number of data records per OAI response limited?
Hit lists are limited to 100,000 data records. An error message is issued if the hit list is larger than this. The search period and frequency should be limited accordingly.
Which time period can be queried? Is there a recommendation?
In order to avoid creating a hit list with more than 100,000 data records, the query period should not be too large. For non-time-critical processes, the recommended query time period/frequency is 30 minutes. Harvesting once per day or once per week is sufficient for small sets (e.g. online dissertations), as a data record which has been changed several times during this period would only be harvested once, yet the hit list would still not be too large.
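A longer period can be split into 30-minute query windows before harvesting. The following is a minimal sketch; the function name query_windows and the sample times are assumptions made for illustration, and only the timestamp layout follows the OAI-PMH convention.

```python
from datetime import datetime, timedelta, timezone

STAMP = "%Y-%m-%dT%H:%M:%SZ"  # UTC timestamp layout used in OAI-PMH requests

def query_windows(start, end, step=timedelta(minutes=30)):
    """Yield consecutive (from, until) string pairs covering start..end."""
    if step <= timedelta(0):
        raise ValueError("step must be positive")
    cursor = start
    while cursor < end:
        # The last window is shortened so that it never passes `end`.
        upper = min(cursor + step, end)
        yield cursor.strftime(STAMP), upper.strftime(STAMP)
        cursor = upper

start = datetime(2017, 8, 30, 8, 0, tzinfo=timezone.utc)
end = datetime(2017, 8, 30, 9, 15, tzinfo=timezone.utc)
for window in query_windows(start, end):
    print(window)
```

Adjacent windows share their boundary timestamp. Because "from" and "until" are inclusive in OAI-PMH, a record changed exactly on a boundary may be delivered twice, so the receiving database should overwrite records by identifier.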
We also recommend using the time setting in the "responseDate" as the retrieval time ("from"), e.g. <responseDate>2017-08-30T08:12:54Z</responseDate>, as this time most accurately reflects the current availability of the data in our repository. We also recommend harvesting with a small time overlap ("responseDate" minus one minute = "from").
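The rule "responseDate" minus one minute = "from" could be computed as follows. This is a sketch under assumptions: the function name next_from is hypothetical and the sample response is reduced to the one element needed here.

```python
from datetime import datetime, timedelta, timezone
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"  # OAI-PMH 2.0 namespace
STAMP = "%Y-%m-%dT%H:%M:%SZ"

def next_from(response_xml, overlap=timedelta(minutes=1)):
    """Return the "from" value for the next harvest.

    Takes the responseDate of the previous OAI response and subtracts
    a small overlap (one minute by default).
    """
    root = ET.fromstring(response_xml)
    stamp = root.findtext(f"{OAI_NS}responseDate")
    if stamp is None:
        raise ValueError("response contains no responseDate element")
    when = datetime.strptime(stamp.strip(), STAMP).replace(tzinfo=timezone.utc)
    return (when - overlap).strftime(STAMP)

# Shortened sample response using the timestamp from the text above.
sample = (
    '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">'
    "<responseDate>2017-08-30T08:12:54Z</responseDate>"
    "</OAI-PMH>"
)
print(next_from(sample))  # 2017-08-30T08:11:54Z
```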
How does the German National Library prevent extensive processing from delaying the harvesting of current data?
In the case of special operations involving changes to more than 50,000 data sets, the procedure is as follows:
- Extensive special operations are carried out on weekends / national holidays
- The changes to the data records are not logged, meaning that this data is not visible via the OAI index
- The data changes only become visible in the OAI index at a later date when the OAI index is generated afresh
A fine-grained harvesting system therefore picks up none of these changes, even if it is continually in operation. If the changes are relevant to a harvesting system, the harvest time interval can be reset accordingly.
Are deletions issued, too?
OAI deletions are not issued.
Deleted and redirected records are delivered for ZDB bibliographic, local and GND data as truncated data records; they are labelled as such and have ID numbers, standard and date fields.
Local records marked for deletion remain available for a maximum of one week.
The process for title changes has not yet been finally established.
Can specific entries be expected in a specified time period?
No, because data records can be changed repeatedly and data records can only be searched via OAI using the date of change, meaning that it is impossible to predict if a particular record is contained in an OAI reply.
What is the period of validity of a "resumption token" and how is it used?
Resumption tokens permit the return of partial responses. The OAI harvester is sent a resumption token, which it can use in a new request to the OAI repository to obtain the next part of the response. The current list position and the size of the entire list (cursor="50" completeListSize="XXXXXXX") are returned with the token. From experience, a resumption token is valid for roughly 40-45 minutes in a ListRecords request. The list position of the next OAI response can be set freely using the figure after the underscore, e.g. resumptionToken=XXXXXXX_150 (start at list position 150). 50 records are delivered per OAI response.
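Reading the token from a response and setting a different list position might look like this. It is a sketch only: the helper names read_token and token_at are hypothetical, the sample token and list size are invented, and the token layout (prefix, underscore, list position) is assumed from the description above.

```python
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"  # OAI-PMH 2.0 namespace

def read_token(response_xml):
    """Return token, cursor and list size, or None if no token is present."""
    root = ET.fromstring(response_xml)
    node = root.find(f".//{OAI_NS}resumptionToken")
    if node is None or not (node.text or "").strip():
        return None  # an absent or empty token means the list is complete
    return {
        "token": node.text.strip(),
        "cursor": int(node.get("cursor", "0")),
        "complete_list_size": int(node.get("completeListSize", "0")),
    }

def token_at(token, position):
    """Replace the figure after the last underscore with a new list position."""
    prefix, sep, _ = token.rpartition("_")
    if not sep:
        raise ValueError("token contains no underscore")
    return f"{prefix}_{position}"

# Invented sample response; real tokens and list sizes will differ.
sample = (
    '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListRecords>'
    '<resumptionToken cursor="50" completeListSize="1234">abc123_100</resumptionToken>'
    "</ListRecords></OAI-PMH>"
)
info = read_token(sample)
print(info["cursor"], info["complete_list_size"])  # 50 1234
print(token_at(info["token"], 150))  # abc123_150
```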
What is UTC?
UTC (Coordinated Universal Time) is the time standard used as the basis for calculating local time.
Example for Germany:
Local time for Berlin is UTC+1 (during summer time = UTC+2). The OAI server gives times in UTC.
If the local time is 10:30 it therefore only makes sense to harvest until UTC 09:29, or until UTC 08:29 in summer time.
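The conversion in this example can be reproduced with a few lines of code. This is a minimal sketch: the function name latest_until is hypothetical, and the two fixed offsets are written out by hand to match the example (the standard-library zoneinfo module could select the offset for "Europe/Berlin" automatically).

```python
from datetime import datetime, timedelta, timezone

STAMP = "%Y-%m-%dT%H:%M:%SZ"
CET = timezone(timedelta(hours=1))   # Berlin local time outside summer time
CEST = timezone(timedelta(hours=2))  # Berlin local time during summer time

def latest_until(local_now, margin=timedelta(minutes=1)):
    """Convert an offset-aware local time to the latest sensible UTC "until"."""
    if local_now.tzinfo is None:
        raise ValueError("local_now must carry a UTC offset")
    utc_now = local_now.astimezone(timezone.utc)
    return (utc_now - margin).strftime(STAMP)

print(latest_until(datetime(2017, 1, 16, 10, 30, tzinfo=CET)))   # 2017-01-16T09:29:00Z
print(latest_until(datetime(2017, 8, 30, 10, 30, tzinfo=CEST)))  # 2017-08-30T08:29:00Z
```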
How is markup transported in an SRU response?
Markup is transported in a CDATA section.
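For a client this usually requires no special handling, because an XML parser returns the content of a CDATA section as ordinary text. A small sketch with an invented fragment (the element content is made up for illustration and is not real DNB output):

```python
import xml.etree.ElementTree as ET

# Invented fragment: markup wrapped in a CDATA section.
fragment = "<recordData><![CDATA[<b>Title</b> & subtitle]]></recordData>"

element = ET.fromstring(fragment)
# The parser hands back the CDATA content verbatim, including the
# angle brackets and the unescaped ampersand.
markup = element.text
print(markup)  # <b>Title</b> & subtitle
```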
Queries / Error messages
Please send any specific queries and report any faults in the OAI interface without delay. This is the only way to ensure transparency and to help resolve the problem.
Please forward your queries / error messages, including the following information, to the Interface Service:
- Syntax of the OAI-PMH query (set, format, period)
- Error message / description
- Context (repeated, sporadic, client used ...)
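When reporting the query syntax, it helps to quote the complete request. The sketch below shows how set, format and period map to the standard OAI-PMH parameters "set", "metadataPrefix", "from" and "until". The function name list_records_url, the base URL and the set name are placeholders, not real DNB values.

```python
from urllib.parse import urlencode

def list_records_url(base_url, metadata_prefix, set_spec=None,
                     from_=None, until=None):
    """Build a ListRecords request URL from format, set and period."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    # Optional arguments are only added when they are given.
    if set_spec:
        params["set"] = set_spec
    if from_:
        params["from"] = from_
    if until:
        params["until"] = until
    return f"{base_url}?{urlencode(params)}"

url = list_records_url(
    "https://example.org/oai",  # placeholder base URL
    "oai_dc",                   # Dublin Core format defined by OAI-PMH
    set_spec="example-set",     # placeholder set name
    from_="2017-08-30T08:00:00Z",
    until="2017-08-30T08:30:00Z",
)
print(url)
```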
Last update: 12.10.2017