[POR-2989] Too eager basionym merging during nub builds? Created: 04/Dec/15  Updated: 02/Aug/16  Resolved: 27/Jul/16

Status: Closed
Project: Portal
Component/s: Checklistbank
Affects Version/s: None
Fix Version/s: None

Type: Improvement Priority: Blocker
Reporter: Markus Döring Assignee: Markus Döring
Resolution: Fixed Votes: 0
Labels: basionym, nub
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Attachments: Zip Archive nub-synonyms.log.zip     Zip Archive nub-synonyms.txt.zip    
Issue Links:
blocks PF-2513 Synonymy issue for Pueraria lobata Closed
blocks PF-2514 Amygdalus mongolica Closed
blocks PF-2475 Lessingianthus rosmarinifolius and Sa... Closed
blocks PF-2481 Gilbertiodendron straussianum and Bau... Closed
blocks PF-2552 Bartlettina is a mess Closed
blocks PF-2387 Hedyotis microphylla homonym results ... Closed
relates to POR-398 Merge species by their epithet and au... Resolved
relates to POR-2794 Plant objective synonyms treated as "... Closed
Epic Link: Improve Backbone Building August 2016


When building the nub using just the Mantodea checklist, we get more synonymised recombinations of basionyms than in the original, even though it is a very well curated expert dataset with basionym information:


SOURCE Mantodea:
Hymenopodidae [family]
  Galinthias meruensis Sjostedt, 1909 [species]
    *Galinthias usambarica Sjostedt, 1909 [species]
  Oxypiloidea (Catasigerpes) nigericus (Giglio-Tos, 1915) [species]
  Oxypilus (Anoxypilus) meruensis (Sjostedt, 1909) [species]
    *Oxypilus (Anoxypilus) nigericus (Beier, 1930) [species]
  Pseudoharpax nigericus Giglio-Tos, 1915 [species]

NUB:
Hymenopodidae [family]
  $Galinthias meruensis Sjostedt, 1909 [species]
    *Galinthias usambarica Sjostedt, 1909 [species]
    *Oxypilus meruensis (Sjostedt, 1909) [species]
    *Oxypilus nigericus (Beier, 1930) [species]
  $Pseudoharpax nigericus Giglio-Tos, 1915 [species]
    *Oxypiloidea nigericus (Giglio-Tos, 1915) [species]

Investigate whether this additional nub grouping could actually be correct or is clearly an error. If it is an error, we need to think about how to avoid such errors. Currently we detect basionyms within a family, which might be too eager.
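
To make the concern concrete, here is a minimal Python sketch of family-scoped grouping, assuming names have already been parsed into genus, epithet and authorship parts (the subgenus, e.g. Anoxypilus, is simply ignored). The ParsedName type and the eager_groups function are hypothetical illustrations, not the actual checklistbank implementation. Keyed only on family, epithet and original author, it puts Galinthias meruensis Sjostedt, 1909 and Oxypilus meruensis (Sjostedt, 1909) into one group, which is exactly the kind of grouping questioned above.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, str, Optional[str]]


@dataclass(frozen=True)
class ParsedName:
    # Hypothetical, simplified parsed name holding only what the grouping needs.
    genus: str
    epithet: str
    author: Optional[str]          # author outside parentheses, if any
    bracket_author: Optional[str]  # author in parentheses, set for recombinations


def original_author(n: ParsedName) -> Optional[str]:
    # For a recombination the original (basionym) author is the bracketed one.
    return n.bracket_author or n.author


def eager_groups(family: str, names: List[ParsedName]) -> Dict[Key, List[ParsedName]]:
    """Group every name sharing family, epithet and original author,
    regardless of genus, and keep only keys with more than one name."""
    groups: Dict[Key, List[ParsedName]] = defaultdict(list)
    for n in names:
        groups[(family, n.epithet, original_author(n))].append(n)
    return {key: members for key, members in groups.items() if len(members) > 1}


hymenopodidae = [
    ParsedName("Galinthias", "meruensis", "Sjostedt, 1909", None),
    ParsedName("Oxypilus", "meruensis", None, "Sjostedt, 1909"),
    ParsedName("Pseudoharpax", "nigericus", "Giglio-Tos, 1915", None),
    ParsedName("Oxypiloidea", "nigericus", None, "Giglio-Tos, 1915"),
    ParsedName("Oxypilus", "nigericus", None, "Beier, 1930"),
]
for key, members in eager_groups("Hymenopodidae", hymenopodidae).items():
    print(key, [m.genus for m in members])
```

Run on the Hymenopodidae names from the example, this yields two groups (the meruensis and the nigericus pairs), while Oxypilus nigericus (Beier, 1930) stays on its own.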

Comment by Markus Döring [ 11/Feb/16 ]

Roderic D. M. Page, do you maybe have more insight into this case? I cannot find much information apart from the Mantodea Species File, which treats Oxypilus (Anoxypilus) meruensis (Sjostedt, 1909) and Galinthias meruensis Sjostedt, 1909 as two distinct taxa:


From the names alone, it seems very likely that Oxypilus meruensis is a recombination based on the same type. Both genera, Oxypilus and Galinthias, predate the original species description.

Comment by Markus Döring [ 11/Feb/16 ]

Looks like both species were originally described in 1909 in this paper on the Kilimandjaro expedition, and thus are not based on the same type:
pages 67 & 69: http://www.biodiversitylibrary.org/item/16983#page/183/mode/1up

Comment by Roderic D. M. Page [ 11/Feb/16 ]

Markus Döring The family-based method of checking for duplicated species names may fail if two species with the same name are described in different genera in the same family. We could maybe try to avoid some errors by looking for patterns where one genus has A. b. Linn. and the other has C. b. (Linn.), but we could still make mistakes.

I'm puzzled by this example, though, because it's not clear to me why Oxypilus (Anoxypilus) meruensis (Sjostedt, 1909) is written with (Sjostedt, 1909) in parentheses. He's the author of the species name and placed it in Oxypilus, as you've discovered (original reference: http://biostor.org/reference/161030), but the Mantodea checklist seems to have added the parentheses. At some point Oxypilus meruensis was placed in the subgenus Anoxypilus, but I don't follow why this should mean the author is in parentheses.
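
The pattern Rod describes (one genus with A. b. Linn., another with C. b. (Linn.)) can be sketched as a pure string check on the authorship. The helpers below are hypothetical and assume the authorship is available as a separate string. They also show why the Mantodea case still matches: a string check cannot tell a genuine recombination from a name that was merely given parentheses.

```python
import re
from typing import Optional, Tuple

# Optional "(bracket author)" followed by an optional combination author.
_AUTHORSHIP = re.compile(r"^\s*(?:\((?P<bracket>[^)]*)\))?\s*(?P<combination>.*?)\s*$")


def split_authorship(authorship: str) -> Tuple[Optional[str], Optional[str]]:
    """Split e.g. '(L.) Tzvelev' into ('L.', 'Tzvelev'); missing parts are None."""
    m = _AUTHORSHIP.match(authorship)  # always matches: every part is optional
    bracket = m.group("bracket")
    combination = m.group("combination")
    return (bracket.strip() if bracket else None), (combination or None)


def looks_like_recombination(original: Tuple[str, str, str],
                             candidate: Tuple[str, str, str]) -> bool:
    """Each argument is (genus, epithet, authorship).
    True for the pattern 'A. b. Linn.' vs 'C. b. (Linn.)'."""
    o_genus, o_epithet, o_auth = original
    c_genus, c_epithet, c_auth = candidate
    o_bracket, o_author = split_authorship(o_auth)
    c_bracket, _ = split_authorship(c_auth)
    return (
        o_bracket is None            # the original must not itself be a recombination
        and c_bracket is not None    # the candidate must carry a bracketed author
        and o_genus != c_genus
        and o_epithet == c_epithet
        and c_bracket == o_author
    )
```

With the parentheses present, Galinthias meruensis Sjostedt, 1909 and Oxypilus meruensis (Sjostedt, 1909) satisfy the check; without them, the pair is no longer treated as a recombination.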

Comment by Markus Döring [ 12/Feb/16 ]

Yes, I do not quite understand that either. And this is actually what causes the potentially false grouping: if the author were not in parentheses, we would not consider the name a recombination and would leave it as an accepted name. I guess we need to do a more detailed analysis of the new backbone over time, as it is impossible to tell from the names alone and each case needs a thorough manual investigation.

Comment by Roderic D. M. Page [ 12/Feb/16 ]

(Sigh.) Yes, a lot of manual work may be needed. I'm trying to lessen some of this by fleshing out BioNames to include more original publications, so that it's possible to track down the name. I also have an as-yet unreleased project doing the same for IPNI (some 300,000 names are now linked to a digital identifier for the publication of the name). It would be nice if these sorts of resources could be expressed in a way that enabled automatic resolution of problems like the one you've encountered. If, say, we had names linked to type specimens, we'd be able to spot false hits pretty quickly, but the question is whether we'll ever have sufficiently linked data to do that. In the meantime, manual investigation seems to be the order of the day. Not quite the brave new world of automated reasoning yet.

Comment by Markus Döring [ 17/Mar/16 ]

To reduce the too eager basionym grouping we could:

  • only create basionym relations programmatically if they are without conflicts and there is a clear, single original name (not 2 original names with the same epithet and author per family)
  • maintain a blacklist of families, authors and/or epithets to exclude from the automated grouping
  • better indicate that the basionym was derived
  • turn off basionym detection altogether: this would bring us back to the situation with many, many more accepted names (see POR-1389). I hope we do not have to do that and see it as the last resort. Better to have a few basionyms wrong than thousands of false accepted names.
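
The first two options can be sketched together: attach a recombination only when its family, epithet and bracketed author point to exactly one original name, and skip any key that touches a blacklisted family, epithet or author. This is a hypothetical Python illustration; the Name tuple, the strict_groups function and the blacklist as a plain set of strings are assumptions, and epithets are compared exactly, without gender-ending normalisation.

```python
from collections import defaultdict
from typing import AbstractSet, Dict, List, NamedTuple, Optional, Tuple


class Name(NamedTuple):
    # Hypothetical pre-parsed name; bracket_author is None for original names.
    family: str
    genus: str
    epithet: str
    author: Optional[str]
    bracket_author: Optional[str] = None


Key = Tuple[str, str, str]


def strict_groups(names: List[Name],
                  blacklist: AbstractSet[str] = frozenset()) -> Dict[Name, List[Name]]:
    """Map each unambiguous original name to the recombinations based on it.

    A (family, epithet, original author) key is skipped if it has no original
    name, more than one original name, or any blacklisted component."""
    originals: Dict[Key, List[Name]] = defaultdict(list)
    recombinations: Dict[Key, List[Name]] = defaultdict(list)
    for n in names:
        if n.bracket_author:
            recombinations[(n.family, n.epithet, n.bracket_author)].append(n)
        elif n.author:
            originals[(n.family, n.epithet, n.author)].append(n)

    result: Dict[Name, List[Name]] = {}
    for key, recs in recombinations.items():
        if blacklist & set(key):
            continue  # manually excluded from automated grouping
        candidates = originals.get(key, [])
        if len(candidates) != 1:
            continue  # no original, or conflicting originals: do not guess
        result[candidates[0]] = recs
    return result
```

Note that in this sketch the Mantodea pair still passes the conflict check, because Galinthias meruensis is the only original name with that epithet and author in Hymenopodidae; only a blacklist entry (or type information) would keep the two apart.
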
Comment by Markus Döring [ 22/Mar/16 ]

Another example of a too eagerly synonymized species is Tanacetum millefolium (L.) Tzvelev, which has the basionym Anthemis millefolia L. and not Achillea millefolium L., which is what the new backbone uses as the single basionym: http://mdoering.github.io/nub-browser/app/#/taxon/3120060

Achillea millefolium L. 
  is the basionym of:

Santolina millefolium (L.) Baill., 1882 species
Chamaemelum millefolium (L.) E.H.L.Krause species
Tanacetum millefolium (L.) Tzvelev species
Anthemis millefolia L. species
Chrysanthemum millefolium (L.) L. species
Chrysanthemum millefolium (L.) Nyár. species
Anthemis millefolium (L.) Schrank, 1789 species
Alitubus millefolium (L.) Dulac species
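
The examples in this issue suggest that epithets are compared regardless of gender ending (millefolium/millefolia, montana/montanus, africanus/africana). A crude, hypothetical normalisation of Latin adjective endings is enough to show why a conflict-aware rule should refuse to pick a single basionym here: Achillea millefolium L. and Anthemis millefolia L. collapse to the same stem and author, so the family holds two candidate originals for Tanacetum millefolium (L.) Tzvelev. The suffix list is an assumption for illustration only; real epithet normalisation needs more rules (for example -er/-ra/-rum).

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical suffix list, longest endings first so that "-ium" wins over "-um".
_ENDINGS = ("ium", "ius", "ia", "us", "um", "is", "a", "e")


def epithet_stem(epithet: str) -> str:
    """Strip one common Latin adjective ending, keeping a stem of at least 3 letters."""
    e = epithet.lower()
    for ending in _ENDINGS:
        if e.endswith(ending) and len(e) - len(ending) >= 3:
            return e[: -len(ending)]
    return e


def conflicting_originals(originals: Iterable[Tuple[str, str, str]]) -> Dict[Tuple[str, str], List[str]]:
    """originals: (genus, epithet, author) of names without a bracketed author.
    Returns the (stem, author) keys claimed by more than one original name."""
    by_key: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for genus, epithet, author in originals:
        by_key[(epithet_stem(epithet), author)].append(f"{genus} {epithet} {author}")
    return {key: found for key, found in by_key.items() if len(found) > 1}
```

Feeding both Asteraceae originals in returns one conflicting key, ("millefol", "L."), which under a strict rule means Tanacetum millefolium (L.) Tzvelev gets no automatically derived basionym rather than the wrong one.
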
Comment by Markus Döring [ 01/Jul/16 ]

Another example in Fabaceae: Pueraria montana, Anthyllis montana & Astragalus montanus L.
See http://dev.gbif.org/issues/browse/PF-2513

Comment by Markus Döring [ 04/Jul/16 ]

Another problematic case from Rod's blog post http://iphylo.blogspot.de/2013/09/the-quality-of-gbif-taxonomic.html:
Helianthus atrorubens L. & Hebeclinium atrorubens Lemaire

Comment by Markus Döring [ 11/Jul/16 ]

Another example in Fabaceae appears to be:
Acacia anceps var. angustifolia Benth. and Monopteryx angustifolia Benth.

Monopteryx angustifolia is apparently endemic to Brazil:

... while Acacia is an Australian plant:

If the basionym detection were less greedy and strictly required a basionym author in brackets, this case would not have happened.
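
The difference between the greedy and the strict rule can be sketched on full name strings. The parser below is hypothetical and deliberately crude: it assumes the authorship starts at the first token that is not all lowercase (so lowercase author particles such as "de" would break it), and it compares terminal epithets across ranks, as in Anthyllis montana f. atropurpurea Vuk. becoming subsp. atropurpurea (Vuk.) Pignatti.

```python
import re
from typing import NamedTuple, Optional

RANK_MARKERS = {"subsp.", "ssp.", "var.", "subvar.", "f.", "forma"}


class Parsed(NamedTuple):
    genus: str
    epithet: str                   # terminal epithet, whatever the rank
    author: Optional[str]          # author outside parentheses
    bracket_author: Optional[str]  # author in parentheses


def parse(name: str) -> Parsed:
    """Lowercase tokens after the genus are epithets or rank markers;
    everything from the first other token on is the authorship."""
    tokens = name.split()
    genus, rest = tokens[0], tokens[1:]
    epithets, i = [], 0
    while i < len(rest) and rest[i].islower():
        if rest[i] not in RANK_MARKERS:
            epithets.append(rest[i])
        i += 1
    authorship = " ".join(rest[i:])
    m = re.match(r"\((?P<bracket>[^)]*)\)\s*(?P<author>.*)", authorship)
    if m:
        bracket, author = m.group("bracket").strip() or None, m.group("author").strip() or None
    else:
        bracket, author = None, authorship or None
    return Parsed(genus, epithets[-1] if epithets else "", author, bracket)


def greedy_match(a: Parsed, b: Parsed) -> bool:
    # Same terminal epithet and same original author, parentheses or not.
    orig_a = a.bracket_author or a.author
    orig_b = b.bracket_author or b.author
    return a != b and a.epithet == b.epithet and orig_a is not None and orig_a == orig_b


def strict_match(original: Parsed, recombination: Parsed) -> bool:
    # The later name must cite the original's author in parentheses.
    return (original.bracket_author is None
            and recombination.bracket_author is not None
            and original.epithet == recombination.epithet
            and recombination.bracket_author == original.author)
```

Acacia anceps var. angustifolia Benth. and Monopteryx angustifolia Benth. pass greedy_match but fail strict_match in both directions, since neither cites Bentham in parentheses.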

Discovered while looking at unmatched names from APNI, which are almost all infraspecific autonyms:

These autonyms are removed from the backbone if no accepted infraspecific taxon is left in the species.
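
The autonym rule can be sketched as follows, assuming a hypothetical flat list of infraspecific taxa carrying their species and status. An autonym repeats the species epithet (e.g. Anthyllis montana subsp. montana); in this sketch it is dropped when its species has no other accepted infraspecific taxon, with the check done per species as stated above. The Infraspecific record and both functions are illustrative assumptions.

```python
from typing import List, NamedTuple, Set


class Infraspecific(NamedTuple):
    # Hypothetical record for one infraspecific name.
    species: str    # binomial, e.g. "Anthyllis montana"
    rank: str       # rank marker, e.g. "subsp." or "var."
    epithet: str    # infraspecific epithet
    accepted: bool


def is_autonym(t: Infraspecific) -> bool:
    # The infraspecific epithet repeats the species epithet.
    return t.epithet == t.species.split()[-1]


def prune_autonyms(taxa: List[Infraspecific]) -> List[Infraspecific]:
    """Drop autonyms of species that have no other accepted infraspecific taxon."""
    supported: Set[str] = {t.species for t in taxa if t.accepted and not is_autonym(t)}
    return [t for t in taxa if not is_autonym(t) or t.species in supported]
```
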

Comment by Markus Döring [ 12/Jul/16 ]

Attached is the outcome of building a nub with all current backbone names from just Asteraceae, Fabaceae and Aves as input. As the input names are neither marked as synonyms nor carry basionym links, ALL synonyms in the output come from the implemented rules alone, which makes it easy to eyeball them for unwanted results.

Synonyms in this text format are prefixed with an asterisk (*). Basionyms are prefixed with a dollar sign ($), so a basionym that is also a synonym starts with *$.
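
For eyeballing output like this it can help to read the text format programmatically. A minimal sketch, assuming two spaces of indentation per level and an optional trailing [rank] as in the excerpts in this issue; the parse_tree function and the Node fields are hypothetical, not part of any GBIF tool.

```python
import re
from typing import List, NamedTuple, Optional

# indentation, optional "*"/"$" markers, the name, and an optional " [rank]" suffix
LINE = re.compile(r"^(?P<indent> *)(?P<flags>[*$]*)(?P<name>.+?)(?: \[(?P<rank>[^\]]+)\])?\s*$")


class Node(NamedTuple):
    depth: int
    name: str
    rank: Optional[str]
    synonym: bool           # line prefixed with "*"
    basionym: bool          # line prefixed with "$"
    parent: Optional[str]   # nearest less-indented name above this line


def parse_tree(text: str, indent: int = 2) -> List[Node]:
    nodes: List[Node] = []
    ancestors: List[str] = []  # names on the path from the root to the current line
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines between blocks
        m = LINE.match(line)  # matches any non-blank line
        depth = len(m.group("indent")) // indent
        del ancestors[depth:]  # drop siblings and deeper levels
        flags, name = m.group("flags"), m.group("name")
        parent = ancestors[-1] if ancestors else None
        nodes.append(Node(depth, name, m.group("rank"), "*" in flags, "$" in flags, parent))
        ancestors.append(name)
    return nodes
```

Counting the nodes with synonym set then gives the number of rule-derived synonyms, and parent shows which accepted name each one was attached to.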

The attached logs document the detailed building steps name by name.

There are far fewer synonymizations. The first basionym group is a correct one:
Aquila africanus (Cassin, 1865); Aquila africana (Cassin, 1865); Spizaetus africanus (Cassin, 1865)

All of the cases reported above appear fixed, in the sense that they are no longer synonymized at all:

  • Achillea millefolium L.; Tanacetum millefolium L.; Anthemis millefolia L.; ...
  • Pueraria montana, Anthyllis montana & Astragalus montanus L.
  • Helianthus atrorubens L. & Hebeclinium atrorubens Lemaire
  • Acacia anceps var. angustifolia Benth. and Monopteryx angustifolia Benth.

Examples of other verified good groupings:

Anthyllis montana L. [species]
  Anthyllis montana subsp. atropurpurea (Vuk.) Pignatti [subspecies]
    *$Anthyllis montana f. atropurpurea Vuk. [form]
    *Anthyllis atropurpurea (Vuk.) Schloss. & Vuk. [species]
  Anthyllis montana subsp. hispanica (Degen & Hervier) Cullen [subspecies]
    *$Anthyllis montana var. hispanica Degen & Hervier [variety]
  Anthyllis montana subsp. jacquinii (A.Kern.) Hayek [subspecies]
    *$Anthyllis jacquinii A.Kern. [species]
  Anthyllis montana subsp. montana [subspecies]
  $Anthyllis montana var. jacquinii Rchb. f. [variety]
    *Anthyllis montana subsp. jacquinii (Rchb. f.) Rohlena [subspecies]
  Anthyllis montana var. montana [variety]
  Anthyllis montana var. sericea Jeanb. & Timb.-Lagr., 1879 [variety]
$Anthyllis sampaiona Rothm., 1941 [species]
  *Anthyllis vulneraria subsp. sampaiona (Rothm.) Vasc., 1962 [subspecies]
$Hebeclinium megalophyllum Lemaire [species]
  *Eupatorium megalophyllum (Lem.) Klatt [species]
  *Eupatorium megalophyllum (Lem.) N.E.Br. [species]

See http://www.theplantlist.org/tpl1.1/record/gcc-24280

$Dolichos lobatus Willd. [species]
  *Pueraria lobata (Willd.) Ohwi [species]
  *Pueraria lobata subsp. chinensis (Ohwi) Ohwi [subspecies]
  *Pueraria lobata subsp. lobata [subspecies]
  *Pueraria lobata subsp. thomsonii (Benth.) H.Ohashi & Tateishi [subspecies]
  *Pueraria lobata var. chinensis Ohwi [variety]
  *Pueraria lobata var. lobata [variety]
  *Pueraria lobata var. montana (Lour.) Maesen [variety]
  *Pueraria lobata var. thomsoni (Benth.) Maesen [variety]
  *Pueraria lobata var. thomsonii (Benth.) Maesen [variety]
  *Pueraria montana var. lobata (Willd.) Sanjappa & Pradeep [variety]

See http://www.theplantlist.org/tpl1.1/record/ild-40432

$Dolichos luteus Sw. [species]
  *Vigna lutea (Sw.) A.Gray [species]
  *Vigna repens var. lutea (Sw.) Kuntze [variety]

See http://www.theplantlist.org/tpl1.1/record/ild-30245

$Dolichos maritimus Aubl. [species]
  *Canavalia maritima (Aubl.) Thouars [species]
  *Canavalia maritima (Aubl.) Urb., 1919 [species]

See http://www.theplantlist.org/tpl1.1/record/ild-3633

$Dolichos miniatus Kunth [species]
  *Canavalia miniata (Kunth) DC. [species]

See http://www.theplantlist.org/tpl1.1/record/ild-3633

$Dolichos niloticus Delile [species]
  *Vigna nilotica (Delile) Hook.f. [species]

See http://www.theplantlist.org/tpl1.1/record/ild-3509

Comment by Roderic D. M. Page [ 12/Jul/16 ]

Did I miss the attachment...?

Comment by Markus Döring [ 12/Jul/16 ]

Sorry Rod, I had to replace them with a correct version. Here they are!

Comment by Markus Döring [ 27/Jul/16 ]

The new, strict grouping appears to be good.

Comment by Markus Döring [ 02/Aug/16 ]

Fixed occurrence maps for:
Monopteryx angustifolia is endemic to Brazil: http://www.gbif-uat.org/species/2945505
Acacia anceps is Australian again: http://www.gbif-uat.org/species/2979404

Generated at Fri Apr 20 08:46:58 CEST 2018 using JIRA 6.3.14#6345-sha1:47b2bb0a76c6e60bffb16fa45719b26a7e5e0c78.