  1. Portal
  2. POR-405

Block IRMNG genera without child species

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Checklistbank
    • Labels:

      Description

      According to Tony, the current IRMNG homonyms list includes some 7,000-8,000 genera that are misspellings rather than true homonyms. We add all of these to our backbone, which results in the same number of bad genera. Similar to Open Tree of Life, we can identify most of these by looking at the number of child taxa after the final nub assembly. If a genus has none and comes from IRMNG, we should remove it.
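
The proposed rule (drop a genus if it comes from IRMNG and has no child taxa once the nub is assembled) could be sketched roughly as follows. This is only an illustration under stated assumptions: the `Taxon` class, its fields, the rank and source labels, and `prune_childless_genera` are hypothetical names and do not reflect the actual ChecklistBank data model or nub builder.

```python
from dataclasses import dataclass, field


@dataclass
class Taxon:
    """Hypothetical, minimal stand-in for a backbone taxon."""
    name: str
    rank: str      # assumed labels such as "FAMILY", "GENUS", "SPECIES"
    source: str    # assumed source label such as "IRMNG"
    children: list = field(default_factory=list)


def prune_childless_genera(root, source="IRMNG"):
    """Remove genera from `source` that have no child taxa.

    Walks the tree below `root` depth-first, edits it in place,
    and returns the names of the removed genera.
    """
    removed = []

    def visit(node):
        kept = []
        for child in node.children:
            visit(child)  # prune deeper levels first
            if (child.rank == "GENUS"
                    and child.source == source
                    and not child.children):
                removed.append(child.name)
            else:
                kept.append(child)
        node.children = kept

    visit(root)
    return removed
```

Childless genera from other sources are left untouched in this sketch, which matches the concern raised in the comments that cutting out all childless taxa regardless of source seems too drastic.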

          Activity

          Markus Döring added a comment -

          The current nub contains 1 order, 5,390 families and 183,290 genera which are accepted but have no child taxa. Removing them all would be a significant change, but might lead to a much cleaner backbone.

          The majority come from IRMNG (163k genera and 2,245 families alone), but Index Fungorum provides 1,467 families and 500 genera, and IPNI, the third-largest source, provides 12k genera and 195 families.

          Cutting them all out seems wrong, but IRMNG contains many alternative misspellings that really should not be in the nub. Hopefully the new nub build, which includes fuzzy matching, will reduce the problem.

          Markus Döring added a comment -

          We flag all genera and other higher taxa without any accepted species as doubtful: https://github.com/gbif/checklistbank/blob/master/checklistbank-cli/src/main/java/org/gbif/checklistbank/nub/NubBuilder.java#L412
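
As a rough illustration of the flagging approach described in this comment, the sketch below counts accepted species under each higher taxon and marks those with none as doubtful. It is not the code in `NubBuilder.java`; the `Taxon` class, the status strings, and `flag_doubtful` are assumptions made for the example, and ranks below species are ignored for simplicity.

```python
from dataclasses import dataclass, field


@dataclass
class Taxon:
    """Hypothetical, minimal stand-in for a backbone taxon."""
    name: str
    rank: str                   # assumed labels such as "GENUS", "SPECIES"
    status: str = "ACCEPTED"    # assumed labels: "ACCEPTED", "SYNONYM", "DOUBTFUL"
    children: list = field(default_factory=list)


def flag_doubtful(node):
    """Return the number of accepted species at or below `node`.

    Any accepted taxon above species rank whose subtree contains no
    accepted species is marked "DOUBTFUL" in place.
    """
    if node.rank == "SPECIES":
        return 1 if node.status == "ACCEPTED" else 0
    accepted = sum(flag_doubtful(child) for child in node.children)
    if accepted == 0 and node.status == "ACCEPTED":
        node.status = "DOUBTFUL"
    return accepted
```

Unlike removal, flagging keeps the names in the tree, so a later review can still decide whether some of them should be excluded.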
          Markus Döring added a comment -

          Will close the issue for now, as those names are flagged as doubtful and are therefore less likely to be matched against in occurrences.
          After reviewing the new nub we can still consider excluding all or some IRMNG genera.


            People

            • Assignee: Markus Döring
            • Reporter: Markus Döring
            • Votes: 0
            • Watchers: 2
