Issue 18615

University of Florida Vertebrate Paleontology - massive index fail, 70% of names missed

Reporter: rdmpage
Assignee: mdoering
Type: Task
Summary: University of Florida Vertebrate Paleontology - massive index fail, 70% of names missed
Priority: Major
Status: Open
Created: 2016-06-26 18:50:26.657
Updated: 2016-07-06 10:50:09.427
        
Description: The University of Florida Vertebrate Paleontology dataset http://www.gbif.org/dataset/2fba9985-ac30-46cb-99bf-91ccde0d8d2f has not been indexed properly, and the problem affects approximately 70% of records (based on a sample of the first 1000 records).

The problem is that the taxonomic names in this dataset are routinely not parsed properly. Names that GBIF already has are not recognised, possibly because they are in ALL CAPS. You can see this by comparing the GBIF and verbatim views of the same record, e.g. http://www.gbif.org/occurrence/1133267166 which the dataset has as ARCHAEOHIPPUS BLACKBERGI and GBIF treats as Archaeohippus.

Something is clearly up with the name parsing: (a) the name isn't parsed properly, and (b) no flag is raised even though the interpreted name doesn't match the input.
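To illustrate point (b), here is a minimal sketch of the kind of check that could raise a flag when the interpreted name covers only part of the verbatim name. It is not GBIF's actual code: the function `match_flags` and the flag names `NAME_NOT_MATCHED`, `NAME_PARTIAL_MATCH` and `NAME_MISMATCH` are hypothetical, and the comparison assumes simple whitespace-separated names with no authorship.

```python
from typing import List


def match_flags(verbatim: str, interpreted: str) -> List[str]:
    """Compare a verbatim name with its interpreted form, ignoring case.

    Returns a list of hypothetical flag names (not real GBIF issue flags).
    """
    verbatim_words = verbatim.strip().lower().split()
    interpreted_words = interpreted.strip().lower().split()

    if not interpreted_words:
        # Nothing was interpreted at all.
        return ["NAME_NOT_MATCHED"]
    if verbatim_words == interpreted_words:
        # Same name apart from capitalisation: nothing to flag.
        return []
    if verbatim_words[:len(interpreted_words)] == interpreted_words:
        # Interpreted name is only a leading part of the verbatim name,
        # e.g. genus only when a full binomial was supplied.
        return ["NAME_PARTIAL_MATCH"]
    return ["NAME_MISMATCH"]


print(match_flags("ARCHAEOHIPPUS BLACKBERGI", "Archaeohippus"))
```

With the record above, the verbatim binomial reduced to the genus alone would be reported as a partial match rather than passing silently.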
    


Author: mdoering@gbif.org
Comment: As our backbone has huge gaps in paleontological names, maybe we should contact them to see if they have names that could be shared and included in our backbone builds?
Created: 2016-07-04 14:03:49.729
Updated: 2016-07-04 14:03:49.729


Author: rdmpage
Created: 2016-07-06 10:50:09.427
Updated: 2016-07-06 10:50:09.427
        
[~mdoering@gbif.org] Just to be clear, *GBIF already has a lot of these names* (including _Archaeohippus blackbergi_ http://www.gbif.org/species/4969356 ), the problem seems to be that the name parsing code barfs all over these ALL CAPS names, either failing to match them to existing names, or failing to recognise when it's made a partial match.
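As a rough illustration of what handling ALL CAPS input might look like, here is a minimal sketch that rewrites an all-uppercase name into conventional capitalisation before it is matched. This is an assumption about one possible fix, not a description of the GBIF name parser: the helper `normalise_case` is hypothetical, and it deliberately ignores authorship strings, which would be wrongly lowercased by this approach.

```python
def normalise_case(name: str) -> str:
    """Rewrite an ALL CAPS name with a capitalised first word and lowercase
    remaining words; names that are not entirely uppercase are left untouched.
    """
    words = name.split()
    if not words or not name.isupper():
        return name
    return " ".join([words[0].capitalize()] + [w.lower() for w in words[1:]])


print(normalise_case("ARCHAEOHIPPUS BLACKBERGI"))
```

A name normalised this way could then be looked up against existing names such as the _Archaeohippus blackbergi_ record linked above.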