Issue 17843
indet checklist species and strains are matched to their genus
Reporter: mdoering
Type: Bug
Summary: indet checklist species and strains are matched to their genus
Priority: Critical
Status: Open
Created: 2015-09-25 12:51:30.137
Updated: 2015-10-06 18:06:08.445
Description: Checklist Bank claims to have over 41000 records for Lepidoptera in NCBI alone: http://www.gbif.org/species/search?q=Lepidoptera&dataset_key=fab88965-e69d-4491-a04d-e3198b626e52
These are mostly undetermined species or strain names. They should not be matched to the genus record in the backbone; they should be left unmatched.
The new author-aware nub matching used for checklists should fix this, but until it is live (it requires a new nub) this issue should be kept open.
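To illustrate the intended behaviour, here is a minimal Python sketch that leaves undetermined and strain names unmatched instead of falling back to the genus. It is not the Checklist Bank matcher: the marker list, `is_indeterminate`, `match_to_backbone` and the `backbone` dictionary are all hypothetical names introduced for this example.

```python
import re

# Hypothetical markers of an undetermined or strain-level name. This list is
# an illustrative assumption, not the rule set used by Checklist Bank.
_INDET_PATTERN = re.compile(
    r"\b(sp|spp|cf|aff|indet|strain|isolate)\b",
    re.IGNORECASE,
)


def is_indeterminate(scientific_name: str) -> bool:
    """Return True if the name contains one of the assumed indet markers."""
    return bool(_INDET_PATTERN.search(scientific_name))


def match_to_backbone(scientific_name: str, backbone: dict):
    """Return the backbone key for an exact name match, or None.

    `backbone` is a hypothetical mapping of canonical name -> nub key.
    Undetermined names are never matched, not even to their genus.
    """
    if is_indeterminate(scientific_name):
        return None
    return backbone.get(scientific_name)


if __name__ == "__main__":
    # Keys below are made-up placeholders, not real nub keys.
    backbone = {"Lepidoptera": 1, "Papilio machaon": 2}
    for name in ("Lepidoptera sp. ABC-2015", "Papilio machaon"):
        print(name, "->", match_to_backbone(name, backbone))
```

A real implementation would rely on a proper name parser rather than a regular expression; the point here is only that an indet name yields no match at all.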
Author: mdoering@gbif.org
Created: 2015-09-25 13:05:28.142
Updated: 2015-09-25 13:06:47.579
There are 142 nubKeys that return more than 500 usages from a single dataset.
As an immediate remedy I am removing those relations. Nearly all of them come from NCBI and are the same issue.
Here are the non-NCBI ones (datasetKey | nubKey | count):
16c3f9cb-4b19-4553-ac8e-ebb90003aa02 | 797 | 1632 (Wikipedia)
672aca30-f1b5-43d3-8a2b-c1606125fa1b | 9456 | 1252 (Mammal Species of the World)
672aca30-f1b5-43d3-8a2b-c1606125fa1b | 5510 | 918
672aca30-f1b5-43d3-8a2b-c1606125fa1b | 9614 | 525
16c3f9cb-4b19-4553-ac8e-ebb90003aa02 | 2770879 | 533
34a96ebe-e51c-4222-9d08-5c2043c39dec | 1626096 | 654 (GNUB)
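
For reference, the kind of check behind the list above can be sketched in Python as follows. This is an assumption about the approach, not the query that was actually run: `suspicious_matches` and the input shape (an iterable of `(dataset_key, nub_key)` pairs, one per usage) are hypothetical.

```python
from collections import Counter


def suspicious_matches(usages, threshold=500):
    """Flag (dataset_key, nub_key) pairs with more than `threshold` usages.

    `usages` is an iterable of (dataset_key, nub_key) tuples, one per usage.
    Returns (dataset_key, nub_key, count) tuples, highest count first.
    """
    counts = Counter(usages)
    flagged = [
        (dataset_key, nub_key, count)
        for (dataset_key, nub_key), count in counts.items()
        if count > threshold
    ]
    return sorted(flagged, key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    # Tiny made-up sample with a low threshold, for demonstration only.
    sample = [("dataset-a", 10)] * 3 + [("dataset-b", 20)]
    for row in suspicious_matches(sample, threshold=2):
        print(" | ".join(str(value) for value in row))
```

A single backbone record collecting hundreds of usages from one dataset is a strong hint that many distinct names were collapsed onto a higher-rank record.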