Issue 13511

Block IRMNG genera without child species

13511
Reporter: mdoering
Assignee: mdoering
Type: Improvement
Summary: Block IRMNG genera without child species
Description: The current IRMNG homonyms list, according to Tony, includes some 7,000-8,000 genera which are misspellings and not true homonyms. We add all of these to our backbone, which results in the same number of bad genera. Similar to Open Tree of Life, we can identify most of these by looking at the number of child taxa after the final nub assembly. If there are none and the genus comes from IRMNG, we should remove those bad genus taxa.
Priority: Major
Resolution: Fixed
Status: Resolved
Created: 2013-07-16 12:25:35.708
Updated: 2015-09-18 20:33:13.954
Resolved: 2015-09-18 20:33:13.919


Author: mdoering@gbif.org
Created: 2013-08-05 10:47:47.151
Updated: 2013-08-05 10:47:47.151
        
The current nub contains 1 order, 5,390 families and 183,290 genera which are accepted but have no child taxa. Removing them all would be a significant change, but might lead to a much cleaner backbone.

The majority come from IRMNG (163k genera and 2,245 families alone), but Index Fungorum provides 1,467 families and 500 genera, and IPNI, the third largest source, provides 12k genera and 195 families.

Cutting them all out seems wrong, but IRMNG contains a lot of alternative misspellings which really should not be in the nub. Hopefully the new nub building, which includes fuzzy matching, will reduce the problem.
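
For illustration only, counts like the ones above could be produced by a sketch along these lines. The NubRow class, its fields and the ChildlessReport helper are hypothetical and are not the ChecklistBank data model; the sketch assumes the nub is available as flat rows with a parent id, a rank, a source name and an accepted flag.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical flat row of the nub; field names are illustrative only.
class NubRow {
    final int id;
    final Integer parentId;   // null for root taxa
    final String rank;        // e.g. "ORDER", "FAMILY", "GENUS", "SPECIES"
    final String source;      // e.g. "IRMNG", "IPNI"
    final boolean accepted;

    NubRow(int id, Integer parentId, String rank, String source, boolean accepted) {
        this.id = id;
        this.parentId = parentId;
        this.rank = rank;
        this.source = source;
        this.accepted = accepted;
    }
}

class ChildlessReport {
    // Counts accepted taxa above species rank that no other row names as its parent.
    // Keys look like "IRMNG/GENUS"; the TreeMap keeps the report sorted by key.
    static Map<String, Integer> countChildless(List<NubRow> rows) {
        // First pass: collect every id that is used as a parent.
        Set<Integer> parents = new HashSet<>();
        for (NubRow r : rows) {
            if (r.parentId != null) {
                parents.add(r.parentId);
            }
        }
        // Second pass: tally accepted higher taxa that never appear as a parent.
        Map<String, Integer> counts = new TreeMap<>();
        for (NubRow r : rows) {
            boolean higherRank = !"SPECIES".equals(r.rank);
            if (r.accepted && higherRank && !parents.contains(r.id)) {
                counts.merge(r.source + "/" + r.rank, 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

Grouping by source and rank keeps the IRMNG share visible next to Index Fungorum and IPNI, which matters when deciding whether to treat the sources differently.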
    


Author: mdoering@gbif.org
Created: 2015-09-18 20:28:39.782
Updated: 2015-09-18 20:28:39.782
        
We flag all genera and other higher taxa without any accepted species as doubtful:
https://github.com/gbif/checklistbank/blob/master/checklistbank-cli/src/main/java/org/gbif/checklistbank/nub/NubBuilder.java#L412
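
As a rough illustration of the idea, and not a copy of the linked NubBuilder code, the flagging could look like the sketch below. The Node class, the status strings and the DoubtfulFlagger helper are assumptions made for this example; the sketch assumes the backbone is held as an in-memory tree and that ranks below species can be ignored.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory tree node; not the ChecklistBank model.
class Node {
    final String name;
    final String rank;     // e.g. "FAMILY", "GENUS", "SPECIES"
    String status;         // "ACCEPTED", "DOUBTFUL" or "SYNONYM"
    final List<Node> children = new ArrayList<>();

    Node(String name, String rank, String status) {
        this.name = name;
        this.rank = rank;
        this.status = status;
    }

    Node add(Node child) {
        children.add(child);
        return this;
    }
}

class DoubtfulFlagger {
    // Walks the subtree depth-first and returns true if it contains at least
    // one accepted species. Accepted higher taxa without one become DOUBTFUL.
    static boolean flag(Node n) {
        if ("SPECIES".equals(n.rank)) {
            return "ACCEPTED".equals(n.status);
        }
        boolean hasAcceptedSpecies = false;
        for (Node child : n.children) {
            // Non-short-circuit |= so that every child subtree is still visited.
            hasAcceptedSpecies |= flag(child);
        }
        if (!hasAcceptedSpecies && "ACCEPTED".equals(n.status)) {
            n.status = "DOUBTFUL";
        }
        return hasAcceptedSpecies;
    }
}
```

Flagging rather than deleting keeps the names in the backbone, so the decision can be revisited later without rebuilding from the sources.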
    


Author: mdoering@gbif.org
Created: 2015-09-18 20:33:13.952
Updated: 2015-09-18 20:33:13.952
        
Closing the issue for now, as those names are flagged as doubtful and are therefore less likely to be matched against occurrences.
After reviewing the new nub, we can still consider excluding all or some IRMNG genera.