The Portal for the Social Sciences

Documentation based on http://www.openarchives.org/OAI/openarchivesprotocol.html

The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) is a low-barrier mechanism for repository interoperability. Data Providers are repositories that expose structured metadata via OAI-PMH. Service Providers then make OAI-PMH service requests to harvest that metadata. OAI-PMH defines six verbs (request types) that are invoked over HTTP.

The Open Archives Initiative Protocol for Metadata Harvesting provides an application-independent interoperability framework based on metadata harvesting.

A harvester is operated by a service provider as a means of collecting metadata from repositories.

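A repository is managed by a data provider to expose metadata to harvesters. To allow various repository configurations, the OAI-PMH distinguishes between three entities related to the metadata it makes accessible: the resource (the object that the metadata is about), the item (a constituent of a repository from which metadata about a resource can be disseminated), and the record (metadata in a specific metadata format). Each item has an identifier that is unique within the scope of the repository of which it is a constituent.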

Sowiport OAI Help

Unique Identifiers

A unique identifier unambiguously identifies an item within a repository; the unique identifier is used in OAI-PMH requests for extracting metadata from the item. Items may contain metadata in multiple formats. The unique identifier maps to the item, and all possible records available from a single item share the same unique identifier.

The format of the unique identifier must correspond to that of the URI (Uniform Resource Identifier) syntax. Individual communities may develop community-specific URI schemes for coordinated use across repositories. The scheme component of the unique identifiers must not correspond to that of a recognized URI scheme unless the identifiers conform to that scheme. Repositories may implement the oai-identifier syntax described in the accompanying Implementation Guidelines document.

Unique identifiers play two roles in the protocol:

  1. Response: Identifiers are returned by both the ListIdentifiers and ListRecords requests.
  2. Request: An identifier, in combination with a metadataPrefix, is used in the GetRecord request as a means of requesting a record in a specific metadata format from an item.


A record is metadata expressed in a single format. A record is returned in an XML-encoded byte stream in response to an OAI-PMH request for metadata from an item. A record is identified unambiguously by the combination of the unique identifier of the item from which the record is available, the metadataPrefix identifying the metadata format of the record, and the datestamp of the record.


The XML-encoding of records is organized into the following parts:

  • header -- contains the unique identifier of the item and properties necessary for selective harvesting. The header consists of the following parts:
    • the unique identifier -- the unique identifier of an item in a repository;
    • the datestamp -- the date of creation, modification or deletion of the record for the purpose of selective harvesting;
    • zero or more setSpec elements -- the set membership of the item for the purpose of selective harvesting.
  • metadata -- a single manifestation of the metadata from an item. The OAI-PMH supports items with multiple manifestations (formats) of metadata. At a minimum, repositories must be able to return records with metadata expressed in the Dublin Core format, without any qualification. Optionally, a repository may also disseminate other formats of metadata. The specific metadata format of the record to be disseminated is specified by means of an argument -- the metadataPrefix -- in the GetRecord or ListRecords request that produces the record. The ListMetadataFormats request returns the list of all metadata formats available from a repository, or for a specific item (which can be specified as an argument to the ListMetadataFormats request).


The following example shows an XML-encoding of a record and its components:

  • the header part with:
    • a unique identifier of the item from which the record was disseminated;
    • the datestamp of the record, equal to 2016-08-24T13:28:05Z;
    • one setSpec, GESIS-SSOAR, indicating that the item from which the record was disseminated belongs to that set of the repository;
  • the metadata part. This consists of a single root tag - in the example the tag oai_dc:dc - with the nested tags belonging to the corresponding metadata format - in the example, Dublin Core elements such as dc:title.
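The parts listed above can be read out of a record with a few lines of Python. The following is a minimal sketch, not the Sowiport implementation: the identifier and title in the sample record are placeholders invented for illustration, and only the datestamp and the setSpec are taken from the description above. The namespace URIs are the ones defined by OAI-PMH 2.0 and Dublin Core.

```python
import xml.etree.ElementTree as ET

# Namespace URIs used by OAI-PMH 2.0 responses and the oai_dc format.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical record: identifier and title are placeholders, not real
# Sowiport data. Datestamp and setSpec follow the description in the text.
SAMPLE_RECORD = """\
<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header>
    <identifier>oai:example.org:item-1</identifier>
    <datestamp>2016-08-24T13:28:05Z</datestamp>
    <setSpec>GESIS-SSOAR</setSpec>
  </header>
  <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>Placeholder title</dc:title>
    </oai_dc:dc>
  </metadata>
</record>
"""


def parse_record(xml_text):
    """Return the header fields and Dublin Core titles of one record."""
    record = ET.fromstring(xml_text)
    header = record.find("oai:header", NS)
    return {
        "identifier": header.findtext("oai:identifier", namespaces=NS),
        "datestamp": header.findtext("oai:datestamp", namespaces=NS),
        # A header may carry zero or more setSpec elements.
        "set_specs": [s.text for s in header.findall("oai:setSpec", NS)],
        "titles": [
            t.text
            for t in record.findall("oai:metadata/oai_dc:dc/dc:title", NS)
        ],
    }


parsed = parse_record(SAMPLE_RECORD)
print(parsed)
```

The header is parsed separately from the metadata part because a harvester typically needs only the header fields (identifier, datestamp, set membership) to decide whether a record must be re-fetched.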


Sowiport-OAI uses the Dublin Core elements from the DCMI schema (oai_dc).


A set is an optional construct for grouping items for the purpose of selective harvesting. Repositories may organize items into sets. Set organization may be flat, i.e. a simple list, or hierarchical. Multiple hierarchies with distinct, independent top-level nodes are allowed. Hierarchical organization of sets is expressed in the syntax of the setSpec parameter, as a colon-separated path from the root of the hierarchy to the respective node. When a repository defines a set organization it must include set membership information in the headers of items returned in response to the ListIdentifiers, ListRecords and GetRecord requests.
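In the OAI-PMH specification a setSpec is a colon-separated path, so membership in a nested set implies membership in every set above it. The sketch below expands a setSpec into that chain of sets. The function name and the sample value "institution:department" are hypothetical and chosen only for illustration.

```python
def set_ancestry(set_spec):
    """Expand a setSpec into the list of sets from the root down to the node."""
    parts = set_spec.split(":")
    # Re-join the first 1, 2, ... n path segments to name each enclosing set.
    return [":".join(parts[:i]) for i in range(1, len(parts) + 1)]


# Hypothetical hierarchical setSpec.
print(set_ancestry("institution:department"))

# A flat setSpec has only itself as ancestry.
print(set_ancestry("GESIS-SSOAR"))
```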

The following is an example of one set in the Sowiport Repository:

Selective harvesting

Selective harvesting allows harvesters to limit harvest requests to portions of the metadata available from a repository. The OAI-PMH supports selective harvesting with two types of harvesting criteria that may be combined in an OAI-PMH request: datestamps and set membership.
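As an illustration of combining the two criteria, the following sketch builds a ListRecords request that is limited by datestamp and by set. The argument names from, until and set are those defined by OAI-PMH. The base URL is a hypothetical placeholder, not the Sowiport endpoint, and the helper name is an assumption made for this example.

```python
from urllib.parse import urlencode

# Hypothetical base URL; replace with the repository's real endpoint.
BASE_URL = "http://repository.example.org/oai"


def list_records_url(base_url, metadata_prefix,
                     from_date=None, until_date=None, set_spec=None):
    """Build a ListRecords URL with optional datestamp and set criteria."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    # Only add the selective-harvesting arguments that were actually given.
    if from_date is not None:
        params["from"] = from_date
    if until_date is not None:
        params["until"] = until_date
    if set_spec is not None:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


selective_url = list_records_url(
    BASE_URL, "oai_dc", from_date="2016-08-01", set_spec="GESIS-SSOAR"
)
print(selective_url)
```

Omitting all three optional arguments yields an unrestricted harvest of every record available in the given metadata format.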

HTTP Request Format

In addition to the base URL, all requests consist of a list of keyword arguments, which take the form of key=value pairs. Arguments may appear in any order and multiple arguments must be separated by ampersands [&]. Each OAI-PMH request must have at least one key=value pair that specifies the OAI-PMH request issued by the harvester:

key is the string 'verb';

value is one of the defined OAI-PMH requests.

The number and nature of additional key=value pairs depends on the arguments for the individual request.

URLs for GET requests have keyword arguments appended to the base URL, separated from it by a question mark [?]. For example, the URL of a GetRecord request to a repository with base URL that is


Special characters in URIs must be encoded.
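The request format described in this section can be sketched in Python with the standard library, which also takes care of encoding special characters. This is an illustrative sketch only: the base URL and the identifier are hypothetical placeholders, and the helper name is not part of any OAI-PMH library. The verb list is the set of six requests defined by OAI-PMH.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# The six request types defined by OAI-PMH.
OAI_VERBS = {
    "GetRecord",
    "Identify",
    "ListIdentifiers",
    "ListMetadataFormats",
    "ListRecords",
    "ListSets",
}

# Hypothetical base URL; replace with the repository's real endpoint.
BASE_URL = "http://repository.example.org/oai"


def build_request_url(base_url, verb, **arguments):
    """Append verb and arguments to the base URL as encoded key=value pairs."""
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    params = {"verb": verb, **arguments}
    # urlencode joins the pairs with '&' and percent-encodes special
    # characters such as ':' and '/' in the values.
    return f"{base_url}?{urlencode(params)}"


# Hypothetical identifier containing characters that need encoding.
request_url = build_request_url(
    BASE_URL,
    "GetRecord",
    identifier="oai:example.org:item/1",
    metadataPrefix="oai_dc",
)
print(request_url)

# Decoding the query string recovers the original argument values.
decoded = parse_qs(urlsplit(request_url).query)
print(decoded["identifier"])
```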

For more information, see http://www.openarchives.org