Issue 17843

indet checklist species and strains are matched to their genus

Reporter: mdoering
Type: Bug
Summary: indet checklist species and strains are matched to their genus
Priority: Critical
Status: Open
Created: 2015-09-25 12:51:30.137
Updated: 2015-10-06 18:06:08.445
        
Description: Checklist Bank claims to have over 41000 records for Lepidoptera in NCBI alone: http://www.gbif.org/species/search?q=Lepidoptera&dataset_key=fab88965-e69d-4491-a04d-e3198b626e52

These are mostly undetermined species or strain names. They should not be matched to the genus record in the backbone; they should remain unmatched.

The new author-aware nub matching used for checklists should fix this, but until it is live (it requires a new nub) this issue should be kept open.
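
To illustrate the intended behaviour, here is a minimal sketch of a matcher that refuses to match undetermined names at all. It is not the actual Checklist Bank or nub matching code: the marker list in `INDET_PATTERN`, the function names, and the backbone keys in the usage example are all assumptions made up for illustration.

```python
import re

# Hypothetical markers of undetermined or strain-level names.
# This list is an assumption, not the rule set used by the real matcher.
INDET_PATTERN = re.compile(
    r"\b(sp|spp|cf|aff|indet|nr)\.?(\s|$)"   # open nomenclature qualifiers
    r"|\b(strain|isolate|clone)\b"           # strain-level designations
    r"|\bBOLD:\w+",                          # barcode cluster identifiers
    re.IGNORECASE,
)


def is_indeterminate(name: str) -> bool:
    """Return True if the name looks like an undetermined species or strain."""
    return INDET_PATTERN.search(name) is not None


def match_to_backbone(name: str, backbone: dict):
    """Return the backbone key for an exact name match.

    Undetermined names are never matched, so they cannot end up
    attached to their genus record.
    """
    if is_indeterminate(name):
        return None
    return backbone.get(name)


# Usage with invented backbone keys:
backbone = {"Danaus": 1, "Danaus plexippus": 2}
print(match_to_backbone("Danaus plexippus", backbone))
print(match_to_backbone("Danaus sp. BOLD:AAA1234", backbone))
```

With this guard, the second lookup yields no match instead of falling back to the genus.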
    


Author: mdoering@gbif.org
Created: 2015-09-25 13:05:28.142
Updated: 2015-09-25 13:06:47.579
        
There are 142 nubKeys that each return more than 500 usages from a single dataset.
As an immediate remedy I am removing those relations. Nearly all of them come from NCBI and are caused by the same issue.

Here are the non-NCBI ones (datasetKey, nubKey, counts):

  datasetKey                           |  nubKey | count | dataset
  16c3f9cb-4b19-4553-ac8e-ebb90003aa02 |     797 |  1632 | Wikipedia
  672aca30-f1b5-43d3-8a2b-c1606125fa1b |    9456 |  1252 | Mammal Species of the World
  672aca30-f1b5-43d3-8a2b-c1606125fa1b |    5510 |   918 | Mammal Species of the World
  672aca30-f1b5-43d3-8a2b-c1606125fa1b |    9614 |   525 | Mammal Species of the World
  16c3f9cb-4b19-4553-ac8e-ebb90003aa02 | 2770879 |   533 | Wikipedia
  34a96ebe-e51c-4222-9d08-5c2043c39dec | 1626096 |   654 | GNUB
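
For reference, the kind of count behind these numbers can be sketched as follows. This is only an illustration of the grouping logic, assuming the usages are available as `(datasetKey, nubKey)` pairs; the function name and the input shape are hypothetical, and the real check was presumably run against the database rather than in Python.

```python
from collections import Counter


def overloaded_nub_keys(usages, threshold=500):
    """Count usages per (datasetKey, nubKey) pair and keep the suspicious ones.

    `usages` is an iterable of (dataset_key, nub_key) tuples, one per
    name usage. Pairs with more than `threshold` usages are returned
    together with their counts.
    """
    counts = Counter((dataset_key, nub_key) for dataset_key, nub_key in usages)
    return {pair: n for pair, n in counts.items() if n > threshold}


# Usage with made-up data: only the first pair exceeds the threshold.
usages = [("ds1", 1)] * 501 + [("ds1", 2)] * 500 + [("ds2", 1)] * 3
print(overloaded_nub_keys(usages))
```

A single backbone key that collects hundreds of usages from one checklist is a strong hint that many distinct names were collapsed onto a higher-rank record.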