Issue 17786

Implement OAI-PMH with payloads for EML and DublinCore

17786
Reporter: trobertson
Assignee: mblissett
Type: Epic
Summary: Implement OAI-PMH with payloads for EML and DublinCore
Priority: Blocker
Resolution: Done
Status: Done
Created: 2015-09-03 13:23:04.265
Updated: 2016-02-05 17:59:11.377
        
Description: To integrate with the national open data infrastructure, NLBIF (Netherlands) seeks to provide an Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) interface that exposes all national datasets mediated by NLBIF.

The existing data mobilization infrastructure of the GBIF network, including the Integrated Publishing Toolkit (IPT) and the GBIF API and discovery system, is working well for NLBIF. It has therefore been decided to augment the GBIF API to include the new service centrally, rather than deploy it on e.g. the IPT or some other technology.

The OAI-PMH specification is well documented (https://www.openarchives.org/pmh/). The registry web services component of the GBIF infrastructure will be modified to support it with OAI-PMH feeds for 1) all datasets and 2) data mobilised nationally (i.e. from registered organizations associated with a country).

The payload offered by the service will support both the Ecological Metadata Language (EML) format for domain-specific metadata and the Dublin Core format, which allows the feed to be integrated into cross-disciplinary databases.
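As a rough illustration of how a harvester might address the two feeds and the two payload formats, the sketch below builds OAI-PMH ListRecords request URLs. The verb and the `metadataPrefix`, `set`, `from` and `until` parameters come from the OAI-PMH specification, and `oai_dc` is the standard Dublin Core prefix. The base URL, the `eml` prefix and the `country:NL` set name are assumptions made for the example only; this issue does not define them.

```python
from urllib.parse import urlencode

# Hypothetical endpoint; the issue does not say where the service will live.
BASE_URL = "https://registry.example.org/oai-pmh"


def list_records_url(metadata_prefix, set_spec=None, from_date=None, until_date=None):
    """Build an OAI-PMH ListRecords request URL.

    metadata_prefix: payload format, e.g. "oai_dc" (standard) or "eml" (assumed name).
    set_spec: optional OAI-PMH set, e.g. a per-country set (naming is assumed).
    from_date / until_date: optional datestamp bounds for selective harvesting.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    if from_date:
        params["from"] = from_date
    if until_date:
        params["until"] = until_date
    return BASE_URL + "?" + urlencode(params)


# Feed 1: all datasets, Dublin Core payload.
all_dc = list_records_url("oai_dc")

# Feed 2: datasets for one country, EML payload, changed since a given date.
nl_eml = list_records_url("eml", set_spec="country:NL", from_date="2015-09-01")
```

Modelling the national feed as an OAI-PMH set is one possible design; it keeps a single endpoint and lets harvesters narrow the feed with a request parameter.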

The service must be able to provide a feed of new, modified and deleted datasets. Support for surfacing deleted datasets is optional in the OAI-PMH specification, but the GBIF network regularly deletes datasets, so this must be included.
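To make the deleted-dataset requirement concrete, here is a minimal sketch of how a harvester could tell deleted datasets from live ones. In OAI-PMH, a deleted record is marked with `status="deleted"` on its `header` element. The sample response and the identifiers in it are invented for illustration and are not output from the GBIF registry.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the OAI-PMH 2.0 specification.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hand-written sample response; identifiers and datestamps are made up.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListIdentifiers>
    <header>
      <identifier>dataset-a</identifier>
      <datestamp>2015-10-01T12:00:00Z</datestamp>
    </header>
    <header status="deleted">
      <identifier>dataset-b</identifier>
      <datestamp>2015-10-02T08:30:00Z</datestamp>
    </header>
  </ListIdentifiers>
</OAI-PMH>
"""


def split_headers(xml_text):
    """Return (live_ids, deleted_ids) from a ListIdentifiers/ListRecords response."""
    root = ET.fromstring(xml_text.strip())
    live, deleted = [], []
    for header in root.iterfind(".//oai:header", OAI_NS):
        identifier = header.findtext("oai:identifier", namespaces=OAI_NS)
        # A header with status="deleted" signals that the record was removed.
        if header.get("status") == "deleted":
            deleted.append(identifier)
        else:
            live.append(identifier)
    return live, deleted


live_ids, deleted_ids = split_headers(SAMPLE_RESPONSE)
```

A harvester would typically combine this with a `from` datestamp so that each run only sees datasets created, modified or deleted since the previous harvest.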

CC [~cgendreau]
    


Author: trobertson@gbif.org
Comment: https://github.com/EKT/EnhancedOAIServer and http://pubserv.oclc.org/oaicat/jars/docs/ *might* be of interest here, but we should be very careful about bringing in unwanted dependencies (i.e. forking might be appropriate, and license restrictions should be checked).
Created: 2015-09-04 10:22:21.898
Updated: 2015-09-04 10:22:21.898


Author: mblissett
Created: 2015-09-04 10:47:28.202
Updated: 2015-09-04 10:47:55.343
        
EnhancedOAIServer uses OAI-CAT, which is no longer maintained: https://github.com/OCLC-Research/oaicat/blob/wiki/ProjectHome.md (though that doesn't mean it's not useful).

There's https://github.com/DSpace/xoai, which has recent commits and merged pull requests, but more dependencies.
    


Author: trobertson@gbif.org
Comment: Please also consider that if dependencies become a concern in the registry-ws module of the registry, we could create a new registry-oaipmh module which is a runnable artifact (i.e. a new service). It could 1) communicate through HTTP using the registry-ws-client or 2) have a dependency on the registry persistence code, which we would probably have to pull out of registry-ws into a new registry-persistence module. These are just for consideration; ideally it can be added to registry-ws (cleanly) without complicating the project further.
Created: 2015-09-04 10:54:20.78
Updated: 2015-09-04 10:54:20.78


Author: cgendreau
Created: 2015-10-21 12:03:43.373
Updated: 2015-10-21 12:04:18.152
        
After review, the following changes should be made to the OAI-PMH response:
- Add "purpose" provided in the IPT as a  element
- Add "application/dwca+zip" as 
- Add 3  elements : "geographicDescription", "boundingCoordinates" and "temporalCoverage"
- Add a  element containing the number of records