Portal / POR-2989

Too eager basionym merging during nub builds?

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Checklistbank
    • Labels:

      Description

      When building the nub from the Mantodea checklist alone, we end up with more recombinations synonymised under basionyms than the original checklist contains, even though it is a very well-curated expert dataset with basionym information:

      http://www.gbif.org/dataset/99948a8b-63b2-41bf-9d10-6e007e967789

      SOURCE Mantodea:
      Hymenopodidae [family]
        Galinthias meruensis Sjostedt, 1909 [species]
          *Galinthias usambarica Sjostedt, 1909 [species]
        Oxypiloidea (Catasigerpes) nigericus (Giglio-Tos, 1915) [species]
        Oxypilus (Anoxypilus) meruensis (Sjostedt, 1909) [species]
          *Oxypilus (Anoxypilus) nigericus (Beier, 1930) [species]
        Pseudoharpax nigericus Giglio-Tos, 1915 [species]
      
      
      NUB:
      Hymenopodidae [family]
        $Galinthias meruensis Sjostedt, 1909 [species]
          *Galinthias usambarica Sjostedt, 1909 [species]
          *Oxypilus meruensis (Sjostedt, 1909) [species]
          *Oxypilus nigericus (Beier, 1930) [species]
        $Pseudoharpax nigericus Giglio-Tos, 1915 [species]
          *Oxypiloidea nigericus (Giglio-Tos, 1915) [species]
      

      Investigate whether this additional nub grouping could actually be correct or is clearly an error. If it is an error, we need to think about how to avoid such errors. Currently we detect basionyms within a family, which might be too eager.
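A minimal sketch of the kind of family-scoped detection described above, assuming names arrive already parsed into genus, epithet, plain author and bracketed author. The `Name` class and `group_by_family_epithet_author` are hypothetical illustrations, not the checklistbank implementation; subgenera are ignored, as in the nub output.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Name:
    """Hypothetical pre-parsed name; the subgenus is ignored."""
    family: str
    genus: str
    epithet: str
    authorship: str                           # author outside parentheses
    bracket_authorship: Optional[str] = None  # author inside parentheses, if any

    def label(self) -> str:
        bracket = f"({self.bracket_authorship}) " if self.bracket_authorship else ""
        return f"{self.genus} {self.epithet} {bracket}{self.authorship}".strip()


def group_by_family_epithet_author(names: List[Name]) -> Dict[Name, List[Name]]:
    """Naive rule: inside one family, a name with a bracketed author is treated
    as a recombination of the unbracketed name with the same epithet and author."""
    # If two original names share a key, the last one silently wins here.
    originals = {
        (n.family, n.epithet, n.authorship): n
        for n in names
        if n.bracket_authorship is None
    }
    groups: Dict[Name, List[Name]] = defaultdict(list)
    for n in names:
        if n.bracket_authorship is None:
            continue
        basionym = originals.get((n.family, n.epithet, n.bracket_authorship))
        if basionym is not None:
            groups[basionym].append(n)
    return dict(groups)


# The four Hymenopodidae names from the Mantodea source, authorship as given there.
mantodea = [
    Name("Hymenopodidae", "Galinthias", "meruensis", "Sjostedt, 1909"),
    Name("Hymenopodidae", "Oxypilus", "meruensis", "", "Sjostedt, 1909"),
    Name("Hymenopodidae", "Pseudoharpax", "nigericus", "Giglio-Tos, 1915"),
    Name("Hymenopodidae", "Oxypiloidea", "nigericus", "", "Giglio-Tos, 1915"),
]

if __name__ == "__main__":
    for basionym, recombs in group_by_family_epithet_author(mantodea).items():
        print(basionym.label(), "<-", [r.label() for r in recombs])
```

Run on the source names, this links Oxypilus meruensis to Galinthias meruensis and Oxypiloidea nigericus to Pseudoharpax nigericus, i.e. the two name-based groupings that appear in the nub. Nothing in the rule checks whether the bracketed author is justified.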

          Activity

          Markus Döring added a comment:

          Roderic D. M. Page, do you perhaps have more insight into this case? I cannot find much information apart from the Mantodea Species File, which treats Oxypilus (Anoxypilus) meruensis (Sjostedt, 1909) and Galinthias meruensis Sjostedt, 1909 as two distinct taxa:

          http://mantodea.speciesfile.org/Common/basic/Taxa.aspx?TaxonNameID=1182721
          http://mantodea.speciesfile.org/Common/basic/Taxa.aspx?TaxonNameID=1182837

          From the names alone it seems very likely that Oxypilus meruensis is a recombination based on the same type. Both genera, Oxypilus and Galinthias, predate the original species description.

          Markus Döring added a comment:

          It looks like both species were originally described in 1909 in this expedition paper from Kilimanjaro and are thus not based on the same type:
          pages 67 and 69: http://www.biodiversitylibrary.org/item/16983#page/183/mode/1up

          Roderic D. M. Page added a comment:

          Markus Döring: the family-based method of checking for duplicated species names may fail if two species with the same name are described in different genera in the same family. We could perhaps avoid some errors by looking for patterns where one genus has A. b. Linn. and the other has C. b. (Linn.), but we could still make mistakes.

          I'm puzzled by this example, though, because it's not clear to me why Oxypilus (Anoxypilus) meruensis (Sjostedt, 1909) is written with (Sjostedt, 1909) in parentheses. He is the author of the species name and placed it in Oxypilus, as you've discovered (original reference here: http://biostor.org/reference/161030), but the Mantodea checklist seems to have added the parentheses. At some point Oxypilus meruensis was placed in the subgenus Anoxypilus, but I don't follow why this should mean the author is in parentheses.
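Rod's "A. b. Linn. versus C. b. (Linn.)" pattern can be sketched as a predicate over two names, assuming the authorship arrives as a single string. `parse_authorship` and `looks_like_recombination` are hypothetical helpers; epithets are compared literally, so gender agreement (e.g. lobatus/lobata) is not handled.

```python
import re
from typing import NamedTuple, Optional

# Optional "(bracket author)" followed by the combination author, e.g. "(L.) Tzvelev".
_AUTHORSHIP = re.compile(r"^\s*(?:\((?P<bracket>[^)]*)\))?\s*(?P<combination>.*?)\s*$")


class Authorship(NamedTuple):
    bracket: Optional[str]
    combination: str


def parse_authorship(text: str) -> Authorship:
    """Split an authorship string into bracketed and combination author."""
    m = _AUTHORSHIP.match(text)  # the pattern matches any string
    bracket = (m.group("bracket") or "").strip()
    return Authorship(bracket or None, m.group("combination"))


def looks_like_recombination(orig_genus: str, orig_epithet: str, orig_auth: str,
                             new_genus: str, new_epithet: str, new_auth: str) -> bool:
    """'A. b. Linn.' in one genus and 'C. b. (Linn.)' in another."""
    orig, new = parse_authorship(orig_auth), parse_authorship(new_auth)
    return (
        orig.bracket is None              # the original name has no bracketed author
        and new.bracket is not None       # the candidate recombination does
        and orig_genus != new_genus
        and orig_epithet == new_epithet   # literal comparison, no gender agreement
        and new.bracket == orig.combination
    )


if __name__ == "__main__":
    # Passes: the parentheses added by the Mantodea checklist satisfy the pattern.
    print(looks_like_recombination("Galinthias", "meruensis", "Sjostedt, 1909",
                                   "Oxypilus", "meruensis", "(Sjostedt, 1909)"))
    # Fails: the same names with the authorship as published, without parentheses.
    print(looks_like_recombination("Galinthias", "meruensis", "Sjostedt, 1909",
                                   "Oxypilus", "meruensis", "Sjostedt, 1909"))
```

The pattern is only as good as the authorship data: a spurious pair of parentheses is enough to make two independently described names look like an original and its recombination.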

          Markus Döring added a comment:

          Yes, I do not quite understand that either, and it is exactly what causes the potentially false grouping. If the author were given without parentheses, we would not consider the name a recombination and would leave it as an accepted name. I guess we need to do a more detailed analysis of the new backbone over time, as it is impossible to tell from the names alone and each case needs a thorough manual investigation.

          Roderic D. M. Page added a comment:

          (Sigh). Yes, a lot of manual work may be needed. I'm trying to lessen some of this by fleshing out BioNames to include more original publications, so that it's possible to track down the name. I also have an as-yet unreleased project doing the same for IPNI (some 300,000 names are now linked to a digital identifier for the publication of the name). It would be nice if these sorts of resources could be expressed in a way that enabled automatic resolution of problems like the one you've encountered. If, say, we had names linked to type specimens, we'd be able to spot false hits pretty quickly, but it remains to be seen whether we'll ever have sufficiently linked data to do that. Meanwhile, manual investigation seems the order of the day. Not quite the brave new world of automated reasoning yet.
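If names were linked to type specimens, a false hit could be vetoed mechanically, because a recombination shares the name-bearing type of its basionym. A minimal sketch, assuming a hypothetical `type_specimen` identifier per name; no such field exists in the data discussed here, and missing type data simply defers to the name-based rules.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TypedName:
    scientific_name: str
    type_specimen: Optional[str] = None  # hypothetical identifier of the name-bearing type


def type_verdict(a: TypedName, b: TypedName) -> Optional[bool]:
    """True/False if both names are linked to a type, None if the data cannot tell."""
    if a.type_specimen is None or b.type_specimen is None:
        return None
    return a.type_specimen == b.type_specimen


def keep_basionym_link(basionym: TypedName, recombination: TypedName) -> bool:
    """A known type mismatch vetoes a name-based link; missing type data does not."""
    return type_verdict(basionym, recombination) is not False
```

The three-valued result keeps "different types" (reject) apart from "unknown" (fall back to the existing rules), so sparse type data can only remove false groupings, never add new ones.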

          Markus Döring added a comment:

          To make the basionym grouping less eager, we could:

          • only create basionym relations programmatically if they are free of conflicts and there is a clear, single original name (i.e. not two original names with the same epithet and author in one family)
          • maintain a blacklist of families, authors and/or epithets to exclude from the automated grouping
          • indicate more clearly that a basionym was derived
          • turn off basionym detection altogether: this would bring us back to the situation with many, many more accepted names, see POR-1389. I hope we do not have to do that and see it as the last resort. Better to have a few basionyms wrong than thousands of false accepted names.
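The first three options can be combined in one pass. A minimal sketch, assuming pre-parsed names and a crude, hypothetical gender-ending stripper (`epithet_stem`) so that millefolium/millefolia or lobatus/lobata compare equal; `strict_basionym_links`, the family blacklist and the `derived` flag are illustrations, not the nub builder's code.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import AbstractSet, List, Optional

# Hypothetical endings, tried longest first.
_ENDINGS = ("ium", "ius", "ia", "um", "us", "is", "a", "e")


def epithet_stem(epithet: str) -> str:
    for ending in _ENDINGS:
        if epithet.endswith(ending) and len(epithet) - len(ending) >= 3:
            return epithet[: -len(ending)]
    return epithet


@dataclass(frozen=True)
class Name:
    family: str
    genus: str
    epithet: str
    author: str                           # author outside parentheses
    bracket_author: Optional[str] = None  # set only for presumed recombinations


@dataclass(frozen=True)
class BasionymLink:
    recombination: Name
    basionym: Name
    derived: bool = True  # marks links inferred by the builder, not read from a source


def strict_basionym_links(names: List[Name],
                          blacklisted_families: AbstractSet[str] = frozenset()) -> List[BasionymLink]:
    """Link a recombination only if exactly one original name in its family matches."""
    candidates = defaultdict(list)
    for n in names:
        if n.bracket_author is None and n.family not in blacklisted_families:
            candidates[(n.family, epithet_stem(n.epithet), n.author)].append(n)
    links = []
    for n in names:
        if n.bracket_author is None or n.family in blacklisted_families:
            continue
        key = (n.family, epithet_stem(n.epithet), n.bracket_author)
        originals = [o for o in candidates.get(key, []) if o.genus != n.genus]
        if len(originals) == 1:  # a single, unambiguous original name
            links.append(BasionymLink(n, originals[0]))
        # zero or several candidates: leave the name ungrouped and accepted
    return links


asteraceae = [
    Name("Asteraceae", "Achillea", "millefolium", "L."),
    Name("Asteraceae", "Anthemis", "millefolia", "L."),
    Name("Asteraceae", "Tanacetum", "millefolium", "Tzvelev", "L."),
]
fabaceae = [
    Name("Fabaceae", "Dolichos", "lobatus", "Willd."),
    Name("Fabaceae", "Pueraria", "lobata", "Ohwi", "Willd."),
]

if __name__ == "__main__":
    print(strict_basionym_links(asteraceae))  # two candidates for Tanacetum, so no link
    print(strict_basionym_links(fabaceae))    # one link: Pueraria lobata -> Dolichos lobatus
```

With two candidate originals (Achillea millefolium L. and Anthemis millefolia L.), Tanacetum millefolium (L.) Tzvelev stays ungrouped instead of being attached to the wrong basionym. The cost is that genuine recombinations sharing that epithet and author are left ungrouped too.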
          Markus Döring added a comment (edited):

          Another example of a too eagerly synonymized species is Tanacetum millefolium (L.) Tzvelev, which has the basionym Anthemis millefolia L. and not Achillea millefolium L., which is what the new backbone uses as its single basionym: http://mdoering.github.io/nub-browser/app/#/taxon/3120060

          Achillea millefolium L. 
            is the basionym of:
          
          Santolina millefolium (L.) Baill., 1882 species
          Chamaemelum millefolium (L.) E.H.L.Krause species
          Tanacetum millefolium (L.) Tzvelev species
          Anthemis millefolia L. species
          Chrysanthemum millefolium (L.) L. species
          Chrysanthemum millefolium (L.) Nyár. species
          Anthemis millefolium (L.) Schrank, 1789 species
          Alitubus millefolium (L.) Dulac species
          
          Markus Döring added a comment:

          Another example in Fabaceae: Pueraria montana, Anthyllis montana & Astragalus montanus L.
          See http://dev.gbif.org/issues/browse/PF-2513

          Markus Döring added a comment:

          Another problematic case from Rod's blog post http://iphylo.blogspot.de/2013/09/the-quality-of-gbif-taxonomic.html:
          Helianthus atrorubens L. & Hebeclinium atrorubens Lemaire

          Markus Döring added a comment (edited):

          Another example in Fabaceae appears to be
          Acacia anceps var. angustifolia Benth. and Monopteryx angustifolia Benth.

          Monopteryx angustifolia apparently is endemic to Brazil:
          http://www.iucnredlist.org/details/full/19892850/0
          http://www.catalogueoflife.org/col/details/species/id/19a389479bdb0eeb17ac0298d8f5edfa

          ... while Acacia is an Australian plant:
          http://bie.ala.org.au/species/urn:lsid:biodiversity.org.au:apni.taxon:296027#tab_names

          If the basionym detection were less greedy and strictly required a basionym author in brackets, this case would not have happened.


          Discovered while looking at unmatched names from APNI, which are almost all infraspecific autonyms:
          http://www.gbif.org/species/search?dataset_key=ccd12960-b471-469d-bd77-ae296e91bfab&issue=BACKBONE_MATCH_NONE

          These autonyms are removed from the backbone if no accepted infraspecific taxon remains in the species.
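The autonym rule can be sketched as a filter over the infraspecific children of one species, assuming each child records its species and infraspecific epithets and whether it is accepted. `Infraspecific` and `drop_orphan_autonyms` are hypothetical; the check is per species, following the wording above, not per rank.

```python
from typing import List, NamedTuple


class Infraspecific(NamedTuple):
    genus: str
    species_epithet: str
    rank_marker: str          # e.g. "subsp." or "var."
    infra_epithet: str
    accepted: bool = True

    @property
    def is_autonym(self) -> bool:
        # An autonym repeats the species epithet, e.g. Anthyllis montana subsp. montana
        return self.infra_epithet == self.species_epithet


def drop_orphan_autonyms(children: List[Infraspecific]) -> List[Infraspecific]:
    """Drop autonyms when no other accepted infraspecific taxon remains in the species."""
    others_accepted = any(c.accepted and not c.is_autonym for c in children)
    return [c for c in children if others_accepted or not c.is_autonym]
```

An autonym such as Anthyllis montana subsp. montana survives only while a sibling like subsp. atropurpurea is accepted; once it stands alone, it is removed, which explains why such names end up unmatched.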

          Markus Döring added a comment (edited):

          Attached is the outcome of building a nub from all current backbone names in Asteraceae, Fabaceae and Aves only. As the input names are neither marked as synonyms nor carry basionym links, ALL synonyms in the output come from the implemented rules alone, which makes it easy to eyeball them for unwanted results.

          Synonyms in this text format are prefixed with an asterisk (*). Basionyms are prefixed with a dollar sign ($).

          The attached logs document the detailed building steps name by name.
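To eyeball larger outputs programmatically, a reader for this text format might look like the sketch below. It assumes two spaces of indentation per level starting at column 0 and a trailing `[rank]` on every name line; `parse_tree` is a hypothetical helper, not part of the attached logs or any GBIF tool.

```python
import re
from typing import List, NamedTuple

# Leading spaces, optional "*" / "$" flags, the name, then "[rank]" at the end of the line.
_LINE = re.compile(r"^(?P<indent> *)(?P<flags>[*$]*)(?P<name>.+?)\s*\[(?P<rank>[^\]]+)\]\s*$")


class TreeNode(NamedTuple):
    depth: int
    synonym: bool
    basionym: bool
    name: str
    rank: str


def parse_tree(text: str, indent_width: int = 2) -> List[TreeNode]:
    nodes = []
    for line in text.splitlines():
        m = _LINE.match(line)
        if not m:
            continue  # skip blank lines and free text such as "See http://..."
        flags = m.group("flags")
        nodes.append(TreeNode(
            depth=len(m.group("indent")) // indent_width,
            synonym="*" in flags,
            basionym="$" in flags,
            name=m.group("name"),
            rank=m.group("rank"),
        ))
    return nodes


sample = """$Dolichos luteus Sw. [species]
  *Vigna lutea (Sw.) A.Gray [species]
  *Vigna repens var. lutea (Sw.) Kuntze [variety]"""

if __name__ == "__main__":
    for node in parse_tree(sample):
        print(node)
```

Counting nodes with `synonym=True` per basionym then gives a quick list of the largest rule-generated groups to review first.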
          ------
          There are far fewer synonymizations. The first basionym group is a correct one:
          Aquila africanus (Cassin, 1865); Aquila africana (Cassin, 1865); Spizaetus africanus (Cassin, 1865)

          All of the cases reported above appear fixed, in the sense that they are not synonymized at all now:

          • Achillea millefolium L.; Tanacetum millefolium L.; Anthemis millefolia L.; ...
          • Pueraria montana, Anthyllis montana & Astragalus montanus L.
          • Helianthus atrorubens L. & Hebeclinium atrorubens Lemaire
          • Acacia anceps var. angustifolia Benth. and Monopteryx angustifolia Benth.

          Examples of other verified good groupings:

          Anthyllis montana L. [species]
            Anthyllis montana subsp. atropurpurea (Vuk.) Pignatti [subspecies]
              *$Anthyllis montana f. atropurpurea Vuk. [form]
              *Anthyllis atropurpurea (Vuk.) Schloss. & Vuk. [species]
            Anthyllis montana subsp. hispanica (Degen & Hervier) Cullen [subspecies]
              *$Anthyllis montana var. hispanica Degen & Hervier [variety]
            Anthyllis montana subsp. jacquinii (A.Kern.) Hayek [subspecies]
              *$Anthyllis jacquinii A.Kern. [species]
            Anthyllis montana subsp. montana [subspecies]
            $Anthyllis montana var. jacquinii Rchb. f. [variety]
              *Anthyllis montana subsp. jacquinii (Rchb. f.) Rohlena [subspecies]
            Anthyllis montana var. montana [variety]
            Anthyllis montana var. sericea Jeanb. & Timb.-Lagr., 1879 [variety]
          
          $Anthyllis sampaiona Rothm., 1941 [species]
            *Anthyllis vulneraria subsp. sampaiona (Rothm.) Vasc., 1962 [subspecies]
          
          $Hebeclinium megalophyllum Lemaire [species]
            *Eupatorium megalophyllum (Lem.) Klatt [species]
            *Eupatorium megalophyllum (Lem.) N.E.Br. [species]
          

          See http://www.theplantlist.org/tpl1.1/record/gcc-24280

          $Dolichos lobatus Willd. [species]
            *Pueraria lobata (Willd.) Ohwi [species]
            *Pueraria lobata subsp. chinensis (Ohwi) Ohwi [subspecies]
            *Pueraria lobata subsp. lobata [subspecies]
            *Pueraria lobata subsp. thomsonii (Benth.) H.Ohashi & Tateishi [subspecies]
            *Pueraria lobata var. chinensis Ohwi [variety]
            *Pueraria lobata var. lobata [variety]
            *Pueraria lobata var. montana (Lour.) Maesen [variety]
            *Pueraria lobata var. thomsoni (Benth.) Maesen [variety]
            *Pueraria lobata var. thomsonii (Benth.) Maesen [variety]
            *Pueraria montana var. lobata (Willd.) Sanjappa & Pradeep [variety]
          

          See http://www.theplantlist.org/tpl1.1/record/ild-40432

          $Dolichos luteus Sw. [species]
            *Vigna lutea (Sw.) A.Gray [species]
            *Vigna repens var. lutea (Sw.) Kuntze [variety]
          

          See http://www.theplantlist.org/tpl1.1/record/ild-30245

          $Dolichos maritimus Aubl. [species]
            *Canavalia maritima (Aubl.) Thouars [species]
            *Canavalia maritima (Aubl.) Urb., 1919 [species]
          

          See http://www.theplantlist.org/tpl1.1/record/ild-3633

          $Dolichos miniatus Kunth [species]
            *Canavalia miniata (Kunth) DC. [species]
          

          See http://www.theplantlist.org/tpl1.1/record/ild-3633

          $Dolichos niloticus Delile [species]
            *Vigna nilotica (Delile) Hook.f. [species]
          

          See http://www.theplantlist.org/tpl1.1/record/ild-3509

          Roderic D. M. Page added a comment:

          Did I miss the attachment...?

          Markus Döring added a comment:

          Sorry Rod, I had to replace them with a correct version. Here they are!

          Markus Döring added a comment:

          The new, strict grouping appears to be good.

          Markus Döring added a comment:

          Fixed occurrence maps for:
          Monopteryx angustifolia is endemic to Brazil: http://www.gbif-uat.org/species/2945505
          Acacia anceps is Australian again: http://www.gbif-uat.org/species/2979404


            People

            • Assignee: Markus Döring
            • Reporter: Markus Döring
            • Votes: 0
            • Watchers: 2
