Issue 16722

Update default dataset citation format on dataset page

16722
Reporter: kbraak
Assignee: ahahn
Type: Improvement
Summary: Update default dataset citation format on dataset page
Priority: Critical
Resolution: Fixed
Status: Closed
Created: 2014-12-08 10:51:52.832
Updated: 2017-10-10 10:13:21.823
Resolved: 2017-10-10 10:12:51.178
        
Description: Currently the default dataset citation format uses the GBIF dataset UUID as the citation identifier. For example, the default citation for http://www.gbif.org/dataset/c89babc3-4e67-4cdd-84e5-dd62f1193053 is:

{quote}
Default citation
GBIF Benin: Census of species recorded during phytosociological surveys in Benin, 2015-10-28.
Accessed via http://www.gbif.org/dataset/c89babc3-4e67-4cdd-84e5-dd62f1193053 on 2015-11-03
{quote}

Now that each dataset in GBIF has been assigned a DOI (see GBIF-1), the DOI should be used as the citation identifier.

Ideally, the portal should use the [IPT citation format|https://github.com/gbif/ipt/wiki/IPT2Citation.wiki], which is based on DataCite’s preferred citation format, and satisfies the [Joint Declaration of Data Citation Principles|https://www.force11.org/datacitation].

For example citations, see here: https://github.com/gbif/ipt/wiki/IPT2Citation.wiki#example-citations
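
As a rough illustration of the proposal, a DOI-based citation could be assembled along the following lines. This is only a sketch: the `DatasetCitation` class, its field names, and the assumed ordering (creators, year, title, version, publisher, resource type, DOI) are hypothetical and only loosely modelled on DataCite-style citations. It is not the portal's or the IPT's actual implementation, and all sample values, including the DOI, are made-up placeholders.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DatasetCitation:
    """Hypothetical container for the components of a dataset citation."""
    creators: List[str]
    year: int
    title: str
    version: str
    publisher: str
    resource_type: str
    doi: str  # bare DOI, e.g. "10.1234/abcd" (placeholder value)

    def format(self) -> str:
        # Assumed ordering:
        # Creators (Year): Title. vVersion. Publisher. ResourceType. DOI URL
        creators = ", ".join(self.creators)
        return (
            f"{creators} ({self.year}): {self.title}. v{self.version}. "
            f"{self.publisher}. {self.resource_type}. https://doi.org/{self.doi}"
        )


# Made-up sample values, for illustration only.
citation = DatasetCitation(
    creators=["Doe J", "Roe R"],
    year=2015,
    title="Example occurrence dataset",
    version="1.2",
    publisher="Example Institute",
    resource_type="Dataset/Occurrence",
    doi="10.1234/abcd",
)
print(citation.format())
```

The point of the sketch is that the DOI, rather than the dataset UUID URL, ends up as the identifier at the end of the citation.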

    
Attachment DOI.pptx


Author: kbraak@gbif.org
Created: 2016-06-15 14:24:15.983
Updated: 2016-06-15 14:24:15.983
        
Burke is currently reengineering the dataset page. [~ahahn@gbif.org] and [~kylecopas], your help is requested in deciding which dataset citation(s) get shown on the page.

I would like to make the following proposal:

a. We only show the GBIF generated (default) citation on the GBIF dataset page. That citation should be formatted according to the [IPT citation format|https://github.com/gbif/ipt/wiki/IPT2Citation.wiki]. This guarantees that it includes all the essential components such as the identifier (DOI).
b. We no longer show the user-supplied citation. The user-supplied citation corresponds to use of the original version of their dataset - not the GBIF indexed version of their dataset.

Ultimately the dataset page should make clear GBIF is not a repository for the original data, and that users wishing to retrieve/use the original (raw and potentially richer) data should visit the dataset homepage. Ideally the dataset homepage should show the user-supplied citation format. 
    


Author: dschigel
Created: 2016-06-15 15:15:13.257
Updated: 2016-06-15 15:15:13.257
        
I second this. GBIF needs to recommend unambiguously how to cite the datasets / dataset pages as displayed at GBIF.org. In principle, we cannot stop a publisher from recommending a custom citation for the source webpage. Most journal instructions would require a URL and date of access, but they would seldom forbid extra information such as a DOI.
Journals ask you to state how and where you accessed the reference, not where it is stored, so GBIF's DOI is fine for that. As GBIF's DOIs resolve to the dataset webpages (not to datasets as such), having the DOIs in clickable form (with http etc.) will satisfy the journal requirement for a URL and our needs for DOI use. I hope DOIs remain traceable when turned into URLs. I guess the same format would be used for downloads.
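
To illustrate the "clickable form" point: a bare DOI can be turned into a resolvable URL by prefixing it with a DOI resolver address. The following is a minimal sketch, assuming the https://doi.org/ resolver; the helper name `doi_to_url` is hypothetical and the DOI used is a placeholder.

```python
def doi_to_url(doi: str, resolver: str = "https://doi.org/") -> str:
    """Return a clickable URL for a DOI (hypothetical helper)."""
    doi = doi.strip()
    # Accept the common "doi:" prefix and strip it.
    if doi.lower().startswith("doi:"):
        doi = doi[4:].strip()
    # All DOIs start with the directory indicator "10."
    if not doi.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    return resolver + doi


print(doi_to_url("doi:10.1234/abcd"))
```

Because the DOI itself is kept intact inside the URL, it stays traceable as an identifier while also serving as the URL that journals ask for.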
    


Author: ahahn@gbif.org
Comment: I agree with the procedure, provided we have the necessary documentation / communication around this. Some publishers do provide custom citations, so we need to explain that a) those will still be available with the original data at the access point, and b) the advantage is that in future we will be able to properly track citations through the literature and report them to the dataset owners.
Created: 2016-06-15 15:32:22.109
Updated: 2016-06-15 15:32:22.109


Author: kbraak@gbif.org
Created: 2016-06-15 15:35:28.371
Updated: 2016-06-15 15:35:28.371
        
Thanks Dmitry.

Your comment highlights the fact that we should assign every dataset registered with GBIF a GBIF DOI (that resolves to the GBIF dataset page).

This will allow us to produce a GBIF generated citation for each dataset with an identifier (GBIF DOI) that resolves to the GBIF dataset page.

Currently GBIF only assigns a GBIF DOI to a dataset if during registration the publisher hasn't specified an existing DOI. There are bugs with this DOI-assigning functionality though (see [POR-3116|http://dev.gbif.org/issues/browse/POR-3116]).
    


Author: dschigel
Created: 2016-06-15 15:46:49.808
Updated: 2016-06-15 16:16:17.816
        
Hmm, this is an interesting twist on the issue. I always thought it was neat that we don't issue a DOI if there is one already. Why was it set up so that GBIF's DOIs resolve to the GBIF webpage, not to the dataset at source, as publisher-provided DOIs do (if that is correct in most cases at least)?

    


Author: kbraak@gbif.org
Created: 2016-06-15 16:16:14.939
Updated: 2016-06-15 16:16:14.939
        
The GBIF DOI should resolve to the GBIF dataset page (promoting the GBIF indexed and interpreted version of the data) so that the GBIF generated citation complies with Principle #7 "Specificity and Verifiability" of the [Joint Declaration of Data Citation Principles|https://www.force11.org/group/joint-declaration-data-citation-principles-final] that states:

bq. "Data citations should facilitate identification of, access to, and verification of the specific data that support a claim. Citations or citation metadata should include information about provenance and fixity sufficient to facilitate verifying that the specific timeslice, version and/or granular portion of data retrieved subsequently is the same as was originally cited."

The dataset's homepage, on the other hand, makes the original raw version of the dataset accessible. This is not the same version of the data that was included in the GBIF download, and used by the researcher to support their claim.


    


Author: dschigel
Comment: OK, in that case we do need to issue a new DOI for the GBIF-interpreted version of a dataset, even if there is a DOI provided by the publisher. This implies a "new version - new DOI" scheme, which, however, we don't follow yet. I add my DOI slide. This was also noticed before: even if a publisher changes a dataset 100% when publishing a new "version" of it, the DOI for the dataset in our current model stays unchanged.
Created: 2016-06-15 16:20:57.326
Updated: 2016-06-15 16:22:35.525


Author: kbraak@gbif.org
Created: 2016-06-15 17:37:02.293
Updated: 2016-06-15 17:37:02.293
        
Publishers complying with DataCite's best practices for versioning datasets ensure that each version of a dataset can be unambiguously identified and in doing so make it possible for users to see when significant changes to the dataset occurred. This is what the [IPT's versioning policy|https://github.com/gbif/ipt/wiki/IPT2Versioning.wiki] is based on.

Indeed the publisher may have changed their dataset 100% without telling GBIF. GBIF doesn't have intimate knowledge of the dataset and doesn't support dataset versioning, therefore GBIF relies on the publisher to tell us whenever scientifically significant changes have been made to their dataset. In accordance with DataCite's best practices for versioning datasets, publishers can do this by assigning the dataset a new DOI and new major version number. GBIF can monitor the dataset to see if its publisher-assigned DOI has changed and consequently assign the GBIF indexed version of the dataset a new GBIF DOI as well.
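
The monitoring idea could look roughly like the sketch below. It is a hypothetical illustration only: the function `needs_new_gbif_doi`, the in-memory `last_seen` mapping, the dataset keys and the DOIs are all assumptions made for the example, not existing GBIF registry code.

```python
from typing import Dict, Optional


def needs_new_gbif_doi(
    dataset_key: str,
    current_publisher_doi: Optional[str],
    last_seen: Dict[str, Optional[str]],
) -> bool:
    """Return True when the publisher-assigned DOI differs from the one last recorded.

    Hypothetical helper: `last_seen` maps dataset keys to the publisher DOI
    observed at the previous check and is updated in place.
    """
    first_sighting = dataset_key not in last_seen
    previous = last_seen.get(dataset_key)
    last_seen[dataset_key] = current_publisher_doi
    # A first sighting is not treated as a version change.
    return not first_sighting and previous != current_publisher_doi


seen: Dict[str, Optional[str]] = {}
first = needs_new_gbif_doi("dataset-a", "10.1234/v1", seen)    # first check
changed = needs_new_gbif_doi("dataset-a", "10.1234/v2", seen)  # publisher issued a new DOI
same = needs_new_gbif_doi("dataset-a", "10.1234/v2", seen)     # nothing changed since
print(first, changed, same)
```

A check like this only detects changes the publisher chose to signal with a new DOI, which is exactly why it depends on publishers following the versioning best practices.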

Each GBIF download is immutable, persisted and assigned a DOI. Because the GBIF download includes a list of publisher-assigned citations (one for each dataset included in the download) the user can actually trace what version of the dataset those indexed records came from, provided of course the publisher-assigned citation included the version number of the dataset. That's why we need to encourage publishers to comply with DataCite's best practices for versioning datasets and generating citations. Currently only IPTs configured with an EZID or DataCite account are likely to comply with these best practices.
    


Author: dschigel
Created: 2016-06-16 09:42:35.961
Updated: 2016-06-16 09:42:35.961
        
Thanks Kyle, good summary.

One version - one DOI would solve the ambiguity problem, but it can easily become ridiculous, as you probably don't want a new DOI if you change one comma. How about adding one record? Perhaps recommendations = best practice + common sense will keep it working in a relatively uniform manner.

If we are unlikely to go the one version - one DOI way, then, as you write, uniqueness and disambiguation in a citation currently require indicating the DOI and a dataset version number or version ID. Here I would not rely on best practices, and I'd rather leave no room for misinterpretation. It would be nice to make the indication of version in a citation obligatory, with the latest version by default (the user can change to an earlier version when needed), and the version would be indicated even if there is only one. Date of access sort of does this job, but of course on the same date one can access and cite any version available.
    


Author: hoefft
Comment: This has since been discussed intensively and implemented in the API. If issues still exist, please reopen.
Created: 2017-10-10 10:12:51.287
Updated: 2017-10-10 10:13:21.818