Issue 18618
Issues with the updated NatureServe dataset
Reporter: kylecopas
Assignee: jlegind
Type: Feedback
Summary: Issues with the updated NatureServe dataset
Status: InProgress
Created: 2016-06-29 15:09:24.246
Updated: 2016-07-04 14:32:18.999
Description: Okay, so a closer look calls a few items to my attention.
1. Are we double counting this dataset? It's not a precise factor of 2, but the source IPT has 929,144 records (http://services.natureserve.org/ipt) while our index shows 1,864,180.
2. They're running an old IPT, so they've set the license to the unsupportable BY-NC-SA. We ought to flag this for them ahead of the final licensing implementation/updates.
3. Aside from the potential duplication, the majority of these records have 'unknown' as their basis of record. I doubt the basis is genuinely unknown for all of them.
4. Is there a better way we could suggest for attributing the actual sources of these data within their network? They're using 'collectionCode' for this: e.g., in http://www.gbif.org/occurrence/1274178819 the code is IL-NHP, which translates to the Illinois Natural Heritage Program. The chain of provenance gets pretty murky using this method.
5. NEW: Default citation provided is two-plus years out of date: 'NatureServe Central Databases (accessed through GBIF data portal, http://data.gbif.org/datasets/resource/607, [DATE])'
I'd like to use this as an opportunity to engage my former colleagues and improve the quality of the data that's provided, if possible.
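For reference, a quick sanity check on the counts in item 1. This is only a sketch using the two figures quoted above; the variable names are made up for illustration. It shows the index holds roughly 2.006 times the source count, i.e. 5,892 records beyond an exact doubling.

```python
# Record counts quoted in item 1 of the description.
source_ipt_records = 929_144    # records reported by the NatureServe IPT
gbif_index_records = 1_864_180  # records shown in our index

# How far is the index from an exact doubling of the source?
ratio = gbif_index_records / source_ipt_records
excess_over_double = gbif_index_records - 2 * source_ipt_records

print(f"index / source ratio: {ratio:.4f}")
print(f"records beyond an exact doubling: {excess_over_double}")
```

The small excess suggests the index may also contain some records that are not simply a second copy of the current source, though the figures alone cannot confirm that.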
Author: jlegind@gbif.org
Comment: Publisher contacted regarding duplication. The issue was changes to the record identifier.
Created: 2016-07-01 15:42:40.605
Updated: 2016-07-01 15:42:40.605
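Note on the cause above: a minimal sketch of how changed record identifiers can roughly double a count. It assumes an index that upserts by record identifier and does not remove records missing from a later crawl; that model, the `ingest` helper, and the sample identifiers are all hypothetical and are not taken from the actual indexing code.

```python
def ingest(index, records):
    """Upsert records into a dict keyed by record identifier (hypothetical index model)."""
    for record in records:
        index[record["id"]] = record
    return index


# The same three occurrences, published twice with different identifiers.
first_crawl = [{"id": f"old-{n}", "name": f"occurrence {n}"} for n in range(3)]
second_crawl = [{"id": f"new-{n}", "name": f"occurrence {n}"} for n in range(3)]

# Identifiers changed between crawls: nothing matches, so every record is added again.
index_with_changed_ids = {}
ingest(index_with_changed_ids, first_crawl)
ingest(index_with_changed_ids, second_crawl)
count_after_id_change = len(index_with_changed_ids)

# Identifiers stable between crawls: the second crawl overwrites the first.
index_with_stable_ids = {}
ingest(index_with_stable_ids, first_crawl)
ingest(index_with_stable_ids, first_crawl)
count_with_stable_ids = len(index_with_stable_ids)

print(count_after_id_change, count_with_stable_ids)
```

Under these assumptions the count doubles only when the identifiers change, which is consistent with the near-2x figure reported in the description.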
Author: jlegind@gbif.org
Comment: Publisher contacted regarding the other items.
Created: 2016-07-04 14:32:18.999
Updated: 2016-07-04 14:32:18.999