Issue 18403

Global Names Usage Bank considered harmful - mass duplication of names

Reporter: rdmpage
Type: Feedback
Summary: Global Names Usage Bank considered harmful - mass duplication of names
Priority: Major
Resolution: Fixed
Status: Closed
Created: 2016-04-14 14:32:21.602
Updated: 2017-01-17 15:54:09.922
Resolved: 2017-01-17 12:37:18.671
        
Description: The Global Names Usage Bank is filling GBIF species searches with lots of "junk". For example, a search for the mosquito _Aedes albopictus_ has 113 hits! Most of these are from GNUB, and (a) are incorrectly formed, e.g. _Aedes albopictus_ (Skuse, 1894) Skuse 1894, and (b) are the same name.

It seems that GNUB is using a model of "names usage" where every citation of a name in a different publication is treated as a name, hence taxa with lots of citations flood the search results. This is a horrible experience for the user. A better model would be one name and a bibliography.
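
To make the contrast concrete, here is a minimal sketch of collapsing per-citation usage records into one name with a bibliography. The `NameUsage` and `Name` classes, the `collapse_usages` function, and the "Publication A/B" entries are hypothetical illustrations, not GNUB's or GBIF's actual schema or data.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class NameUsage:
    """One citation of a name in one publication (hypothetical record shape)."""
    scientific_name: str
    authorship: str
    cited_in: str


@dataclass
class Name:
    """One name, with every publication that cites it (hypothetical record shape)."""
    scientific_name: str
    authorship: str
    bibliography: list[str] = field(default_factory=list)


def collapse_usages(usages: list[NameUsage]) -> list[Name]:
    """Group usages by (name, authorship) so each name appears only once."""
    names: dict[tuple[str, str], Name] = {}
    for usage in usages:
        key = (usage.scientific_name, usage.authorship)
        name = names.setdefault(key, Name(usage.scientific_name, usage.authorship))
        # Keep each citing publication once, in first-seen order.
        if usage.cited_in not in name.bibliography:
            name.bibliography.append(usage.cited_in)
    return list(names.values())


# Placeholder data: the publications are invented for illustration only.
usages = [
    NameUsage("Aedes albopictus", "(Skuse, 1894)", "Publication A"),
    NameUsage("Aedes albopictus", "(Skuse, 1894)", "Publication B"),
    NameUsage("Aedes albopictus", "(Skuse, 1894)", "Publication A"),
]
names = collapse_usages(usages)
```

Under this model a search would return a single record for the name, with the citations attached to it instead of appearing as separate hits.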
    


Author: mdoering@gbif.org
Comment: Agree, Rod. I have asked Rich for more than a year now to correct the GNUB dataset, which also contains badly linked data (it's not compliant with the DwC-A format we use). And instead of GNUB, GBIF would be much more interested to see the pure ZooBank view as a dataset on its own. Maybe we should remove GNUB until that is settled.
Created: 2016-04-14 14:40:37.785
Updated: 2016-04-14 14:40:37.785


Author: trobertson@gbif.org
Created: 2016-04-14 14:49:17.294
Updated: 2016-04-14 14:49:17.294
        
I've lost all track of GN* stuff, but one of the earlier editions integrated a lot of name sources which were just wrong.

I can say that because I was responsible for the version of the GBIF backbone integrated in 2010 (ish), which was absolutely full of garbage because of 1) bugs in my integration code and 2) the backbone's use of names from occurrence data (which often had major issues like genus and order swapped). If that edition of the GBIF backbone persists in its current making, it will simply not be useful.
    


Author: rdmpage
Created: 2016-04-14 14:53:06.379
Updated: 2016-04-14 14:53:06.379
        
I'd be in favour of dropping GNUB for the moment. I tried to gauge the possible impact of losing names that only GNUB provides, but the stats page doesn't seem to be working: http://www.gbif.org/dataset/34a96ebe-e51c-4222-9d08-5c2043c39dec/stats

ZooBank would be nicer, although it uses the same data model, so the question would be whether Rich could export it in the form GBIF expects.
    


Author: mdoering@gbif.org
Comment: ZooBank added: http://www.gbif.org/dataset/b9a214b7-c368-4d22-aa53-b1fc16a1210a
Created: 2017-01-17 12:37:09.834
Updated: 2017-01-17 12:37:09.834


Author: rdmpage
Comment: [~mdoering@gbif.org] Nice to see this, I've grabbed a copy of the DwCA file to have a closer look :)
Created: 2017-01-17 15:54:09.922
Updated: 2017-01-17 15:54:09.922