Issue 14347

All datasets have a DOI

Reporter: trobertson
Assignee: trobertson
Type: Epic
Summary: All datasets have a DOI
Priority: Blocker
Status: ReadyForDev
Created: 2013-11-11 14:11:48.611
Updated: 2015-11-03 12:32:10.076
DueDate: 2014-12-31 00:00:00.0
        
Description: The GBIF 2014 work program commits to all datasets having a Digital Object Identifier (DOI).  A user of the GBIF infrastructure should be able to determine the dataset with which any resource (e.g. an occurrence) is associated and should then be able to find the DOI for that dataset.

*Portal and IPT updated to handle DOIs as stable identifiers for datasets: milestone Sep 2014* - delayed until the end of 2014

*Rationale*
The DOI is fast becoming the preferred mechanism for citation, particularly within scholarly publishing in journals.  By adopting DOIs as identifiers for datasets, GBIF aims to:
# provide the means to offer consistent citation of datasets
# leverage the DOI monitoring systems that allow traceability of use, by following citation chains
# adopt a familiar identifier format that is already largely accepted in the community

*Existing work*

Proposal on DOIs for datasets (circulated, call closed Sep 2013, responses filed in LiveLink, http://livelink.gbif.org/gbif/livelink?func=ll&objId=4420060, briefing document and summary attached)

*Required components*
- communication explaining the planned changes (give publishers the chance to mint their own DOIs, and guidance on what the implications are); this consultation needs to include the Science Committee
- official contract with issuing authority
- decision: are multiple DOIs for a dataset supported (snapshot DOIs/versions)? This may require a consultation with publishers / Nodes
- decision on how to handle data paper DOIs - in practical terms, they reference a dataset, though in the form of a publication; similarly Plazi checklist datasets
- information for publishers / Nodes, including "what to do if you want to issue your own DOI"; coordinate this with the communication concerning endorsement (GBIF-29)
- [documented policy on OccurrenceIDs as stable identifiers for data records (milestone Mar 2014) - not in scope]
- portal: can handle DOIs as stable identifiers for datasets (milestone Sep 2014)
- IPT: can handle DOIs as stable identifiers for datasets (milestone Sep 2014)

*DOI in use within GBIF*
Some publishers already use DOIs and GBIF should consider accommodating existing DOIs where already known in order to satisfy the trackability requirements.
At the time of writing, 2,715,251 records from 6,806 datasets [1] use a DOI as a [dwc:collectionCode|http://rs.tdwg.org/dwc/terms/#collectionCode].  These are published exclusively by [PANGAEA|http://www.gbif.org/publisher/d5778510-eb28-11da-8629-b8a03c50a862].
Additionally, DOIs are provided in citations on some datasets such as [Royal Ontario Museum|http://www.gbif.org/dataset/d9522343-146c-4d6b-b312-543d4d8ca0e8] published by Canadensys, who document the rationale on the [Canadensys blog|http://www.canadensys.net/2012/link-love-dois-for-darwin-core-archives] and use the GBIF IPT to publish the datasets.  It is important to note here that Canadensys do not use the DOI for the collectionCode, and theoretically multiple collectionCodes can exist within a single dataset.


[1] Source query executed Nov 11th 2013: http://pastebin.com/4igkTnih

Also see: https://code.google.com/p/gbif-providertoolkit/issues/detail?id=978#c7
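
To accommodate existing DOIs, the records that carry a DOI in collectionCode would first need to be identified. The following is only a minimal sketch of such a check, not the query referenced in [1]. The regular expression is an assumed, permissive approximation of the DOI shape, the function names and the record layout (a dict with a "collectionCode" key) are hypothetical, and the sample values are made up for illustration.

```python
import re

# Assumption: a permissive DOI shape ("10.", a 4-9 digit registrant code, "/",
# then a non-empty suffix), optionally preceded by "doi:" or a doi.org resolver URL.
DOI_PATTERN = re.compile(
    r"^(?:doi:|https?://(?:dx\.)?doi\.org/)?(10\.\d{4,9}/\S+)$",
    re.IGNORECASE,
)


def extract_doi(value):
    """Return the bare DOI if the value looks like one, otherwise None."""
    if value is None:
        return None
    match = DOI_PATTERN.match(value.strip())
    return match.group(1) if match else None


def count_doi_collection_codes(records):
    """Count records whose collectionCode looks like a DOI.

    `records` is a hypothetical iterable of dicts keyed by Darwin Core term name.
    """
    return sum(1 for record in records if extract_doi(record.get("collectionCode")))


if __name__ == "__main__":
    # Made-up sample records, for illustration only.
    sample = [
        {"collectionCode": "doi:10.1594/PANGAEA.123456"},
        {"collectionCode": "BIRDS"},
        {"collectionCode": None},
    ]
    print(count_doi_collection_codes(sample))
```

A check like this only finds DOIs that sit in collectionCode. DOIs that appear solely in a dataset citation, as in the Canadensys case, would need to be collected from the dataset metadata instead.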

*Involvement*
IT group, EOT; (participation group)
    
Attachment stable-identifiers-consultation-responses.docx
Attachment Stable identifiers for GBIF mediated data_v04.docx