Issue 18076

Too eager basionym merging during nub builds?

Reporter: mdoering
Assignee: mdoering
Type: Improvement
Summary: Too eager basionym merging during nub builds?
Priority: Blocker
Resolution: Fixed
Status: Closed
Created: 2015-12-04 12:19:57.801
Updated: 2016-08-02 10:35:21.327
Resolved: 2016-07-27 13:18:58.757
        
Description: When building the nub using just the Mantodea checklist, we get more synonymised recombinations of basionyms than in the original, even though it is a very well curated expert dataset with basionym information:

http://www.gbif.org/dataset/99948a8b-63b2-41bf-9d10-6e007e967789

{noformat}
SOURCE Mantodea:
Hymenopodidae [family]
  Galinthias meruensis Sjostedt, 1909 [species]
    *Galinthias usambarica Sjostedt, 1909 [species]
  Oxypiloidea (Catasigerpes) nigericus (Giglio-Tos, 1915) [species]
  Oxypilus (Anoxypilus) meruensis (Sjostedt, 1909) [species]
    *Oxypilus (Anoxypilus) nigericus (Beier, 1930) [species]
  Pseudoharpax nigericus Giglio-Tos, 1915 [species]


NUB:
Hymenopodidae [family]
  $Galinthias meruensis Sjostedt, 1909 [species]
    *Galinthias usambarica Sjostedt, 1909 [species]
    *Oxypilus meruensis (Sjostedt, 1909) [species]
    *Oxypilus nigericus (Beier, 1930) [species]
  $Pseudoharpax nigericus Giglio-Tos, 1915 [species]
    *Oxypiloidea nigericus (Giglio-Tos, 1915) [species]
{noformat}

Investigate whether this additional nub grouping could actually be correct or is clearly an error. If it is an error, we need to think about how to avoid such cases. Currently we detect basionyms within a family, which might be too eager.
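
To make the suspected failure mode concrete, here is a minimal Python sketch of a family-scoped grouping that keys names only on epithet and bracket-stripped authorship. It is an illustration under stated assumptions, not the actual nub builder code: the regular expression, `parse` and `eager_groups` are hypothetical, and the parser only handles the zoological-style names listed above.

```python
import re
from collections import defaultdict

# Hypothetical, simplified pattern: "Genus (Subgenus) epithet Authorship".
NAME = re.compile(
    r"^(?P<genus>[A-Z][a-z]+)"
    r"(?: \((?P<subgenus>[A-Z][a-z]+)\))?"
    r" (?P<epithet>[a-z]+)"
    r" (?P<authorship>.+)$"
)

def parse(name):
    m = NAME.match(name)
    if m is None:
        raise ValueError(f"cannot parse: {name}")
    authorship = m.group("authorship")
    # Fully bracketed authorship marks a (claimed) recombination.
    bracketed = authorship.startswith("(") and authorship.endswith(")")
    return {
        "name": name,
        "epithet": m.group("epithet"),
        "author": authorship.strip("()"),
        "bracketed": bracketed,
    }

def eager_groups(family_names):
    """Group all names of ONE family by (epithet, author), ignoring the genus."""
    groups = defaultdict(list)
    for n in map(parse, family_names):
        groups[(n["epithet"], n["author"])].append(n)
    result = {}
    for key, members in groups.items():
        originals = [m["name"] for m in members if not m["bracketed"]]
        recombinations = [m["name"] for m in members if m["bracketed"]]
        # Keep only keys that pair an original with a bracketed name.
        if originals and recombinations:
            result[key] = (originals, recombinations)
    return result

hymenopodidae = [
    "Galinthias meruensis Sjostedt, 1909",
    "Galinthias usambarica Sjostedt, 1909",
    "Oxypiloidea (Catasigerpes) nigericus (Giglio-Tos, 1915)",
    "Oxypilus (Anoxypilus) meruensis (Sjostedt, 1909)",
    "Oxypilus (Anoxypilus) nigericus (Beier, 1930)",
    "Pseudoharpax nigericus Giglio-Tos, 1915",
]
groups = eager_groups(hymenopodidae)
for (epithet, author), (originals, recombinations) in groups.items():
    print(epithet, author, originals, recombinations)
```

With the six source names as input, this naive key pairs Galinthias meruensis with Oxypilus meruensis and Pseudoharpax nigericus with Oxypiloidea nigericus, which are the two groupings that appear in the nub but not in the source.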
    
Attachment nub-synonyms.log.zip
Attachment nub-synonyms.txt.zip


Author: mdoering@gbif.org
Created: 2016-02-11 12:19:33.087
Updated: 2016-02-11 12:19:33.087
        
[~rdmpage] do you have any more insight into this case, maybe? I cannot find much information apart from the Mantodea Species File, which treats Oxypilus (Anoxypilus) meruensis (Sjostedt, 1909) and Galinthias meruensis Sjostedt, 1909 as 2 distinct taxa:

http://mantodea.speciesfile.org/Common/basic/Taxa.aspx?TaxonNameID=1182721
http://mantodea.speciesfile.org/Common/basic/Taxa.aspx?TaxonNameID=1182837

From the names alone it seems very likely that Oxypilus meruensis is a recombination based on the same type. Both genera, Oxypilus and Galinthias, predate the original species description.
    


Author: mdoering@gbif.org
Created: 2016-02-11 12:35:39.295
Updated: 2016-02-11 12:35:39.295
        
Looks like both species were described originally in 1909 in this expedition paper from the Kilimandjaro and thus are not based on the same type:
page 67&69: http://www.biodiversitylibrary.org/item/16983#page/183/mode/1up
    


Author: rdmpage
Created: 2016-02-11 22:50:55.482
Updated: 2016-02-11 22:50:55.482
        
[~mdoering@gbif.org] The family-based method of checking for duplicated species names may fail if two species with the same name are described in different genera in the same family. We could maybe try to avoid some errors by looking for patterns where one genus has A. b. Linn and the other has C. b. (Linn), but we could still make mistakes.

I'm puzzled by this example though, because it's not clear to me why Oxypilus (Anoxypilus) meruensis (Sjostedt, 1909) is written with (Sjostedt, 1909) in parentheses. He's the author of the species name and placed it in Oxypilus as you've discovered (original reference here http://biostor.org/reference/161030 ), but the Mantodea checklist seems to have added parentheses. At some point Oxypilus meruensis was placed in the subgenus Anoxypilus, but I don't follow why this should mean the name is in parentheses.
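
The "A. b. Linn" versus "C. b. (Linn)" pattern could be checked roughly as in the following Python sketch. The regular expression and the function names `split_name` and `may_be_recombination_of` are hypothetical and only cover simple binomials with an optional subgenus; they are not taken from any existing parser.

```python
import re

# Hypothetical pattern: genus, optional "(Subgenus)", epithet, then an
# optional bracketed basionym author and an optional combination author.
NAME = re.compile(
    r"^(?P<genus>[A-Z][a-z]+)(?: \([A-Z][a-z]+\))? (?P<epithet>[a-z]+)"
    r"(?: \((?P<bas_author>[^)]+)\))?(?: (?P<author>.+))?$"
)

def split_name(name):
    m = NAME.match(name)
    if m is None:
        raise ValueError(f"cannot parse: {name}")
    return m.groupdict()

def may_be_recombination_of(candidate, original):
    """True if candidate looks like 'C. b. (Linn)' for original 'A. b. Linn'."""
    c, o = split_name(candidate), split_name(original)
    return (
        o["bas_author"] is None              # original has no brackets
        and c["bas_author"] is not None      # candidate cites a bracketed author
        and c["bas_author"] == o["author"]   # same author string
        and c["epithet"] == o["epithet"]     # same epithet
        and c["genus"] != o["genus"]         # but a different genus
    )

# As published in the checklist, with the added parentheses:
with_brackets = may_be_recombination_of(
    "Oxypilus (Anoxypilus) meruensis (Sjostedt, 1909)",
    "Galinthias meruensis Sjostedt, 1909",
)
# As it would read without the parentheses:
without_brackets = may_be_recombination_of(
    "Oxypilus (Anoxypilus) meruensis Sjostedt, 1909",
    "Galinthias meruensis Sjostedt, 1909",
)
print(with_brackets, without_brackets)
```

The check is only as good as the brackets in the source data: with the parentheses present the pair still looks like a recombination, and without them it does not.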
    


Author: mdoering@gbif.org
Comment: Yes, I do not quite understand that either. And this is actually what causes the potentially false grouping. If the author were given without parentheses, we would not consider the name a recombination and would leave it as an accepted one. I guess we need to do some more detailed analysis on the new backbone over time, as it is impossible to tell from just the names we have, and each case needs a thorough manual investigation.
Created: 2016-02-12 10:26:43.799
Updated: 2016-02-12 10:26:43.799


Author: rdmpage
Comment: (Sigh). Yes, a lot of manual work may be needed. I'm trying to lessen some of this by fleshing out BioNames to include more original publications so it's possible to track down the name. I've also an as-yet unreleased project doing the same for IPNI (some 300,000 names are now linked to a digital identifier for the publication of the name). It would be nice if these sorts of resources could be expressed in a way that enabled automatic resolution of problems like the one you've encountered. If, say, we had names linked to type specimens we'd be able to pretty quickly spot false hits, but it's unclear whether we'll ever have sufficiently linked data to be able to do that. Meantime, manual investigation seems the order of the day. Not quite the brave new world of automated reasoning yet ;)
Created: 2016-02-12 11:01:39.777
Updated: 2016-02-12 11:01:39.777


Author: mdoering@gbif.org
Created: 2016-03-17 10:47:15.999
Updated: 2016-03-17 10:47:15.999
        
To reduce the too eager basionym grouping we could:
 - only create basionym relations programmatically if they are without conflicts and there is a clear, single original name (not 2 original names with the same epithet and author per family)
 - maintain a blacklist of families, authors and/or epithets to exclude from the automated grouping
 - better indicate that the basionym was derived
 - turn off basionym detection altogether: this would bring us back to the situation with many, many more accepted names, see POR-1389. I would hope we do not have to do that and see this as the last resort. Better to have a few basionyms wrong than thousands of false accepted names.
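
A rough Python sketch of the first two options, assuming names are already parsed into fields. The `Name` record, the crude `stem` helper and `strict_basionym_links` are hypothetical illustrations, not existing nub builder classes.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Name:
    family: str
    genus: str
    epithet: str
    author: str                             # author of this combination
    basionym_author: Optional[str] = None   # bracketed author, if any

def stem(epithet):
    """Crude, hypothetical gender normalisation: millefolia ~ millefolium."""
    for suffix in ("um", "us", "a"):
        if epithet.endswith(suffix):
            return epithet[: -len(suffix)]
    return epithet

def strict_basionym_links(names, blacklist=frozenset()):
    """Link a recombination only to a single, unambiguous original name."""
    originals = defaultdict(list)
    for n in names:
        if n.basionym_author is None:
            originals[(n.family, stem(n.epithet), n.author)].append(n)
    links = {}
    for n in names:
        if n.basionym_author is None:
            continue
        key = (n.family, stem(n.epithet), n.basionym_author)
        candidates = originals.get(key, [])
        # Skip blacklisted keys and any key with zero or several originals.
        if len(candidates) == 1 and key not in blacklist:
            links[n] = candidates[0]
    return links

achillea = Name("Asteraceae", "Achillea", "millefolium", "L.")
anthemis = Name("Asteraceae", "Anthemis", "millefolia", "L.")
tanacetum = Name("Asteraceae", "Tanacetum", "millefolium", "Tzvelev", "L.")
dolichos = Name("Fabaceae", "Dolichos", "luteus", "Sw.")
vigna = Name("Fabaceae", "Vigna", "lutea", "A.Gray", "Sw.")

links = strict_basionym_links([achillea, anthemis, tanacetum, dolichos, vigna])
for recombination, basionym in links.items():
    print(recombination.genus, recombination.epithet, "<-", basionym.genus, basionym.epithet)
```

In this sketch a recombination with two candidate originals in the same family is simply left unlinked, so it stays an accepted name rather than being attached to the wrong basionym.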
    


Author: mdoering@gbif.org
Created: 2016-03-22 10:22:44.299
Updated: 2016-03-22 10:23:06.929
        
Another example of a too eagerly synonymized species is Tanacetum millefolium (L.) Tzvelev, which has the basionym Anthemis millefolia L. and not Achillea millefolium L., which is what the new backbone uses as the single basionym: http://mdoering.github.io/nub-browser/app/#/taxon/3120060

{noformat}
Achillea millefolium L.
  is the basionym of:

Santolina millefolium (L.) Baill., 1882 species
Chamaemelum millefolium (L.) E.H.L.Krause species
Tanacetum millefolium (L.) Tzvelev species
Anthemis millefolia L. species
Chrysanthemum millefolium (L.) L. species
Chrysanthemum millefolium (L.) Nyár. species
Anthemis millefolium (L.) Schrank, 1789 species
Alitubus millefolium (L.) Dulac species
{noformat}

    


Author: mdoering@gbif.org
Created: 2016-07-01 17:54:47.559
Updated: 2016-07-01 17:54:47.559
        
Another example in Fabaceae: Pueraria montana,  Anthyllis montana & Astragalus montanus L.
See http://dev.gbif.org/issues/browse/PF-2513
    


Author: mdoering@gbif.org
Created: 2016-07-04 14:20:12.511
Updated: 2016-07-04 14:20:12.511
        
Another problematic case from Rod's blog post http://iphylo.blogspot.de/2013/09/the-quality-of-gbif-taxonomic.html:
Helianthus atrorubens L. & Hebeclinium atrorubens Lemaire
    


Author: mdoering@gbif.org
Created: 2016-07-11 10:48:58.99
Updated: 2016-07-11 10:50:23.221
        
Another example in Fabaceae appears to be
Acacia anceps var. angustifolia Benth. and Monopteryx angustifolia Benth.

Monopteryx angustifolia apparently is endemic to Brazil:
http://www.iucnredlist.org/details/full/19892850/0
http://www.catalogueoflife.org/col/details/species/id/19a389479bdb0eeb17ac0298d8f5edfa

... while Acacia is an Australian plant:
http://bie.ala.org.au/species/urn:lsid:biodiversity.org.au:apni.taxon:296027#tab_names

If the basionym detection were less greedy and strictly required a basionym author in brackets, this case would not have happened.

-----
Discovered while looking at unmatched names from APNI which are mostly all infraspecific autonyms:
http://www.gbif.org/species/search?dataset_key=ccd12960-b471-469d-bd77-ae296e91bfab&issue=BACKBONE_MATCH_NONE

These autonyms are removed in the backbone in case there is no accepted infraspecific taxon left in the species.
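
For illustration, the autonym rule could look roughly like this in Python. The pattern, the list of rank markers and the function names are assumptions made for the sketch and do not reflect the actual backbone code.

```python
import re

# Hypothetical pattern for a botanical infraspecific autonym: the
# infraspecific epithet repeats the species epithet and no author follows.
AUTONYM = re.compile(
    r"^[A-Z][a-z]+ (?P<sp>[a-z-]+) (?:subsp\.|var\.|f\.) (?P=sp)$"
)

def is_autonym(name):
    return AUTONYM.match(name) is not None

def prune_autonyms(accepted_infraspecifics):
    """Given the accepted infraspecific names of ONE species, drop the
    autonyms if no other accepted infraspecific taxon is left."""
    others = [n for n in accepted_infraspecifics if not is_autonym(n)]
    return list(accepted_infraspecifics) if others else []

only_autonym = prune_autonyms(["Pueraria lobata var. lobata"])
with_sibling = prune_autonyms([
    "Anthyllis montana subsp. montana",
    "Anthyllis montana subsp. jacquinii (A.Kern.) Hayek",
])
print(only_autonym, with_sibling)
```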
    


Author: mdoering@gbif.org
Created: 2016-07-12 15:40:41.627
Updated: 2016-07-13 12:43:42.001
        
Attached is the outcome of building a nub with all current backbone names from just Asteraceae, Fabaceae and Aves as input. As the input names are neither marked as synonyms nor carry basionym links, ALL synonyms in the output come from the implemented rules alone, which makes it useful for eyeballing unwanted results.

Synonyms in this text format are prefixed with an asterisk (*). Basionyms are also prefixed with a dollar symbol ($).
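
For anyone who wants to process the text files rather than eyeball them, a single line of this format could be read with a small Python helper like the one below. The two-spaces-per-level indentation and the trailing "[rank]" are inferred from the samples in this issue, and `parse_line` is a hypothetical name, not part of any GBIF tool.

```python
import re

# indentation, optional "*" and/or "$" flags, the name, then " [rank]"
LINE = re.compile(
    r"^(?P<indent> *)(?P<flags>[*$]*)(?P<name>.+?) \[(?P<rank>[^\]]+)\]$"
)

def parse_line(line):
    m = LINE.match(line.rstrip())
    if m is None:
        raise ValueError(f"unexpected line: {line!r}")
    flags = m.group("flags")
    return {
        "depth": len(m.group("indent")) // 2,  # assumes 2 spaces per level
        "synonym": "*" in flags,
        "basionym": "$" in flags,
        "name": m.group("name"),
        "rank": m.group("rank"),
    }

first = parse_line("    *$Anthyllis montana f. atropurpurea Vuk. [form]")
second = parse_line("  $Pseudoharpax nigericus Giglio-Tos, 1915 [species]")
print(first)
print(second)
```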

The attached logs document the detailed building steps name by name
------
There are far fewer synonymizations. The first basionym group is a correct one:
Aquila africanus (Cassin, 1865);  Aquila africana (Cassin, 1865);  Spizaetus africanus (Cassin, 1865)

All of the above reported cases appear fixed in the sense that they are not synonymized at all now:
 - Achillea millefolium L.; Tanacetum millefolium L.; Anthemis millefolia L.; ...
 - Pueraria montana, Anthyllis montana & Astragalus montanus L.
 - Helianthus atrorubens L. & Hebeclinium atrorubens Lemaire
 - Acacia anceps var. angustifolia Benth. and Monopteryx angustifolia Benth.


Examples of other verified good groupings:

{noformat}
Anthyllis montana L. [species]
  Anthyllis montana subsp. atropurpurea (Vuk.) Pignatti [subspecies]
    *$Anthyllis montana f. atropurpurea Vuk. [form]
    *Anthyllis atropurpurea (Vuk.) Schloss. & Vuk. [species]
  Anthyllis montana subsp. hispanica (Degen & Hervier) Cullen [subspecies]
    *$Anthyllis montana var. hispanica Degen & Hervier [variety]
  Anthyllis montana subsp. jacquinii (A.Kern.) Hayek [subspecies]
    *$Anthyllis jacquinii A.Kern. [species]
  Anthyllis montana subsp. montana [subspecies]
  $Anthyllis montana var. jacquinii Rchb. f. [variety]
    *Anthyllis montana subsp. jacquinii (Rchb. f.) Rohlena [subspecies]
  Anthyllis montana var. montana [variety]
  Anthyllis montana var. sericea Jeanb. & Timb.-Lagr., 1879 [variety]
{noformat}

{noformat}
$Anthyllis sampaiona Rothm., 1941 [species]
  *Anthyllis vulneraria subsp. sampaiona (Rothm.) Vasc., 1962 [subspecies]
{noformat}

{noformat}
$Hebeclinium megalophyllum Lemaire [species]
  *Eupatorium megalophyllum (Lem.) Klatt [species]
  *Eupatorium megalophyllum (Lem.) N.E.Br. [species]
{noformat}
See http://www.theplantlist.org/tpl1.1/record/gcc-24280

{noformat}
$Dolichos lobatus Willd. [species]
  *Pueraria lobata (Willd.) Ohwi [species]
  *Pueraria lobata subsp. chinensis (Ohwi) Ohwi [subspecies]
  *Pueraria lobata subsp. lobata [subspecies]
  *Pueraria lobata subsp. thomsonii (Benth.) H.Ohashi & Tateishi [subspecies]
  *Pueraria lobata var. chinensis Ohwi [variety]
  *Pueraria lobata var. lobata [variety]
  *Pueraria lobata var. montana (Lour.) Maesen [variety]
  *Pueraria lobata var. thomsoni (Benth.) Maesen [variety]
  *Pueraria lobata var. thomsonii (Benth.) Maesen [variety]
  *Pueraria montana var. lobata (Willd.) Sanjappa & Pradeep [variety]
{noformat}
See http://www.theplantlist.org/tpl1.1/record/ild-40432

{noformat}
$Dolichos luteus Sw. [species]
  *Vigna lutea (Sw.) A.Gray [species]
  *Vigna repens var. lutea (Sw.) Kuntze [variety]
{noformat}
See http://www.theplantlist.org/tpl1.1/record/ild-30245

{noformat}
$Dolichos maritimus Aubl. [species]
  *Canavalia maritima (Aubl.) Thouars [species]
  *Canavalia maritima (Aubl.) Urb., 1919 [species]
{noformat}
See http://www.theplantlist.org/tpl1.1/record/ild-3633

{noformat}
$Dolichos miniatus Kunth [species]
  *Canavalia miniata (Kunth) DC. [species]
{noformat}
See http://www.theplantlist.org/tpl1.1/record/ild-3633

{noformat}
$Dolichos niloticus Delile [species]
  *Vigna nilotica (Delile) Hook.f. [species]
{noformat}
See http://www.theplantlist.org/tpl1.1/record/ild-3509
    


Author: rdmpage
Comment: Did I miss the attachment...?
Created: 2016-07-12 15:57:37.583
Updated: 2016-07-12 15:57:37.583


Author: mdoering@gbif.org
Comment: Sorry Rod, I had to replace them with a correct version. Here they are!
Created: 2016-07-12 16:37:32.006
Updated: 2016-07-12 16:37:32.006


Author: mdoering@gbif.org
Comment: The new, strict grouping appears to be good.
Created: 2016-07-27 13:18:58.796
Updated: 2016-07-27 13:18:58.796


Author: mdoering@gbif.org
Created: 2016-08-02 10:35:21.327
Updated: 2016-08-02 10:35:21.327
        
Fixed occurrence maps for:
Monopteryx angustifolia is endemic to Brazil: http://www.gbif-uat.org/species/2945505
Acacia anceps is Australian again: http://www.gbif-uat.org/species/2979404