Issue 18384
Incorrect Hyperlink
Reporter: feedback bot
Type: Feedback
Summary: Incorrect Hyperlink
Status: Reopened
Created: 2016-04-07 20:33:48.231
Updated: 2016-04-11 10:46:46.719
Description: Please take a look at the hyperlink for complete classification for this record, http://webcache.googleusercontent.com/search?q=cache:d0bjBgZx1FAJ:www.gbif.org/species/103050321+&cd=10&hl=en&ct=clnk&gl=ca
The hyperlink takes one to a different organism. The original hyperlink is for a scarab beetle, but the complete classification link takes one to a species of dragonfly.
Author: mdoering@gbif.org
Comment: Indeed, the id is now pointing to a different record. But this is for the NCBI checklist dataset, not the GBIF Backbone. If a publisher does not keep its local identifiers stable, the GBIF ids will also change, as in this case. There is not much we can do about this: dwc:taxonID needs to be stable in the published source for GBIF to keep the non-backbone ids stable.
Created: 2016-04-07 21:55:13.912
Updated: 2016-04-07 21:55:13.912
Author: rdmpage
Created: 2016-04-08 09:10:30.333
Updated: 2016-04-08 09:10:30.333
[~mdoering@gbif.org] I think there's something a little more complicated going on here. NCBI taxon ids are pretty stable, so I'm surprised there is an issue like this. However, looking at the Darwin Core Archive for the NCBI taxonomy, it looks like there are two sets of ids: one is the NCBI tax_id (stable) and the other is a sequential number with the prefix "e" (e1, e2, etc.). The "e" ids are assigned to the synonyms, e.g.
e328876 Tetrathemis corduleformis 753953 misspelling
753953 Tetrathemis corduliformis species 333463
e328877 Tetrathemis corduliformis Longfield, 1936 753953 authority
753953 is the tax_id. NCBI doesn't have distinct identifiers for synonyms (every name is linked to the same tax_id), so arbitrary ones have been created (e328876 and e328877, the 328876th and 328877th synonyms in this archive). Every time this file is generated (who does this?) the tax_ids are likely to be stable (NCBI does occasionally merge some, but not many), but the "e" ids will likely be different each time :(
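To illustrate why sequential ids of this kind are fragile, here is a minimal sketch. It assumes the "e" ids are simply assigned in file order each time the archive is built, which is my reading of the data rather than anything confirmed; the function name and the two "Placeholder" names are made up for the example.

```python
def assign_synonym_ids(synonym_names):
    """Number synonyms in file order: e1, e2, ... (assumed scheme, not confirmed)."""
    return {f"e{n}": name for n, name in enumerate(synonym_names, start=1)}


# First build of the archive (the first entry is a placeholder, not real data).
first_export = [
    "Placeholder synonym A",
    "Tetrathemis corduleformis",
    "Tetrathemis corduliformis Longfield, 1936",
]

# Next build: one new synonym appears ahead of the others.
second_export = ["Placeholder synonym B"] + first_export

ids_v1 = assign_synonym_ids(first_export)
ids_v2 = assign_synonym_ids(second_export)

# Every "e" id from the first build now points at a different name.
changed = {key for key in ids_v1 if ids_v2.get(key) != ids_v1[key]}
print(sorted(changed))
```

A single added synonym near the top of the file is enough to shift every later "e" id by one, while a row keyed by tax_id would be unaffected.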
GBIF generates taxa from each row in the NCBI Darwin Core archive, so rows whose taxonID is a tax_id will keep the same GBIF nub id, but those with an "e" prefix will likely change. This is what happened here: the id "e327195" keeps the same GBIF nub id even though it now refers to a different taxon. Hence the user is rightly confused about why this GBIF page has changed (if they'd picked http://www.gbif.org/species/104648044 they wouldn't have seen any changes).
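A rough sketch of the effect described above, assuming the index hands out one key per source taxonID and reuses it on every update. `KeyRegistry`, `index_rows`, the key values and the "Placeholder scarab name" row are all hypothetical and are not GBIF's actual code or data.

```python
from itertools import count


class KeyRegistry:
    """Hypothetical stand-in for an index that keys records by source taxonID."""

    def __init__(self, start=1):
        self._next_key = count(start)
        self._keys = {}

    def key_for(self, taxon_id):
        # Reuse the existing key when the taxonID has been seen before.
        if taxon_id not in self._keys:
            self._keys[taxon_id] = next(self._next_key)
        return self._keys[taxon_id]


def index_rows(registry, rows):
    """Map each assigned key to the name found in that row of the archive."""
    return {registry.key_for(taxon_id): name for taxon_id, name in rows}


registry = KeyRegistry(start=1000)

# First update: "e2" happens to be a scarab name (placeholder).
v1 = index_rows(registry, [
    ("753953", "Tetrathemis corduliformis"),
    ("e2", "Placeholder scarab name"),
])

# Second update: the same "e2" id now sits on a dragonfly synonym.
v2 = index_rows(registry, [
    ("753953", "Tetrathemis corduliformis"),
    ("e2", "Tetrathemis corduleformis"),
])

print(v1[1000] == v2[1000])  # record keyed by tax_id is unchanged
print(v1[1001], "->", v2[1001])  # same key, different taxon
```

Under these assumptions the key attached to the tax_id row stays on the same taxon, while the key attached to the "e" row silently moves to whatever name received that sequential number in the new archive.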
Maybe we can reopen this issue, as there's a problem with how the NCBI data is generated and parsed, and at the moment it pretty much guarantees that many NCBI taxa will have different nub ids with each update.