Issue 16042

Source data/indexing issue ?

Reporter: feedback bot
Assignee: jlegind
Type: Task
Summary: Source data/indexing issue ?
Priority: Critical
Resolution: Fixed
Status: Closed
Created: 2014-07-04 15:39:46.758
Updated: 2016-02-28 10:55:41.101
Resolved: 2016-02-26 14:02:17.664
        
        
Description: Hi everyone,

About two weeks ago, our friends at GBIF Togo published two datasets through the France-hosted IPT: http://www.gbif.org/dataset/76ce538f-412d-48c3-a4ac-8b4af5ead610 and http://www.gbif.org/dataset/cbe48af4-952d-4cfe-946f-0817c57547f3.

While the metadata appeared instantly on publishing, the data itself still seems to be invisible in the portal (0 occurrences/species/taxa).

Is the delay normal? Or is there some indexing issue with their data? I'll help them fix it if necessary!

Thanks,

*Reporter*: Nicolas Noé
*E-mail*: [mailto:n.noe@biodiversity.be]
    


Author: kbraak@gbif.org
Comment: [~jlegind@gbif.org] can you please follow up with Nicolas? Thanks
Created: 2014-07-09 09:52:09.099
Updated: 2014-07-09 09:52:09.099


Author: jlegind@gbif.org
Comment: Publisher + Nicolas have been contacted
Created: 2014-07-10 10:56:15.266
Updated: 2014-07-10 10:56:15.266


Author: rdmpage
Created: 2016-02-28 10:51:07.422
Updated: 2016-02-28 10:55:41.096
        
I realise this issue is closed, but this dataset still has some problems. It's a great example of why GBIF should either:

- think like a journal publisher and review datasets before publishing
- or, fork data and treat it like source code

This dataset is a poor representation of the data published in the paper http://dx.doi.org/10.5252/z2011n3a4. The original paper has latitudes and longitudes for the occurrences; these are not present in this dataset. The field "taxonID" in the dataset is NOT a taxonID but an occurrence id (the values correspond to the specimen codes listed in material examined for each taxon, minus the institution or collector prefix). These should be reinterpreted as institution codes and catalogue numbers. This would also help users recognise that a number of these records are already in GBIF (some of the material in this dataset is museum material already indexed).

The "identifiedBy" column ("Segniagbeto et al. 2011" for every record) should really be in the "identificationReferences" column, as it's the source paper (http://dx.doi.org/10.5252/z2011n3a4).
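
The two field corrections described above could be sketched roughly as follows. This is only an illustration, not how GBIF or the publisher actually processes the data: the function name `remap_record` and the sample values are hypothetical, and the only things taken from the dataset are the Darwin Core field names and the DOI. Because the institution or collector prefix has been stripped from the codes, "institutionCode" cannot be recovered here and would still have to come from the paper.

```python
# Minimal sketch (assumption: each record is a dict keyed by Darwin Core term).
SOURCE_DOI = "http://dx.doi.org/10.5252/z2011n3a4"


def remap_record(row: dict) -> dict:
    """Return a copy of a record with the misused fields moved."""
    out = dict(row)

    # The value published as taxonID is really a specimen code, so treat it
    # as a catalogue number. The stripped prefix cannot be restored here.
    if "taxonID" in out:
        out["catalogNumber"] = out.pop("taxonID")

    # identifiedBy holds a citation of the source paper, not a person, so
    # move it to identificationReferences and attach the DOI.
    citation = out.pop("identifiedBy", "")
    if citation:
        out["identificationReferences"] = f"{citation} ({SOURCE_DOI})"

    return out


# Hypothetical record; the taxonID and scientificName values are made up.
example = {
    "taxonID": "12345",
    "scientificName": "Example species",
    "identifiedBy": "Segniagbeto et al. 2011",
}
fixed = remap_record(example)
```

Returning a copy rather than editing in place keeps the original record available for comparison, which fits the "fork the data" idea.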

This could have been a nice, useful dataset with georeferenced specimens and an external link (DOI) to the data source (the original paper). I know this is a question of resources, but if we had decent peer review of data in place (where we worked with the data submitters to improve the quality and utility of their submission), or people to screen data, augment and enrich it (say, via forking it on GitHub), we could do so much more with data like this (as could our users, who will mostly ignore this data as it's not georeferenced).