13584
Reporter: mdoering
Assignee: mdoering
Type: Improvement
Summary: Merge species by their epithet and authorship
Priority: Critical
Resolution: Fixed
Status: Resolved
Created: 2013-08-14 17:17:27.849
Updated: 2016-07-04 14:16:41.284
Resolved: 2015-09-18 20:30:56.094
Description: To avoid creating multiple accepted taxa based on the same holotype we should try to detect original name relations based on the authorship and the species epithet. Try to spot recombinations of the same original name within a family by looking at the epithet and authorship & year.
See http://iphylo.blogspot.de/2013/08/cluster-maps-papaya-plots-and-trouble.html#disqus_thread
and http://gist.neo4j.org/?a91e351279438d2ec1e6
Author: mdoering@gbif.org
Created: 2015-08-25 22:08:44.366
Updated: 2015-08-25 22:33:47.254
I am exploring the largest families in plants, Asteraceae, and animals, Curculionidae. A file for each family is attached which lists all species or infraspecific names in the current GBIF backbone that share the same terminal epithet. At first glance it appears safe to assert a basionym relation if the basionym author of a recombination is the same as the primary author of a name with the same epithet.
In addition I am attaching a family file for two mammal families, the largest rodent family Muridae and the bat family used by Rod in his gists above, Molossidae:
http://www.gbif-uat.org/species/5510
http://www.gbif-uat.org/species/5719
For reference the SQL executed:
{noformat}
\copy (select coalesce(infra_specific_epithet,specific_epithet) as epithet, string_agg(scientific_name, '|' order by bracket_authorship, scientific_name) from name_usage u join name n on name_fk=n.id where dataset_key='d7dddbf4-2cf0-4f39-9b2a-bb099caae36c' and specific_epithet is not null and family_fk=5719 group by epithet having count(*) > 1) to 'molossidae.txt';
{noformat}
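The first-glance rule above (same terminal epithet, and the bracketed author of a recombination equals the primary author of another name) can be sketched roughly as follows. This is only an illustration, not the code used in the backbone build: `parse` and `find_basionym` are hypothetical helpers, the regular expression only understands simple zoological-style authorships of the form "Author, Year", and the year is compared as well, as suggested in the issue description.

```python
import re
from typing import List, NamedTuple, Optional

# Trailing "Author, Year", optionally wrapped in parentheses (assumed format).
_AUTHORSHIP = re.compile(r"\s(\()?([A-Z][^(),]*),\s*(\d{4})\)?$")


class Parsed(NamedTuple):
    name: str
    epithet: str
    author: Optional[str]
    year: Optional[str]
    bracketed: bool  # True when the authorship is in parentheses


def parse(name: str) -> Parsed:
    """Split a name into terminal epithet and authorship (very naive)."""
    m = _AUTHORSHIP.search(name)
    taxon = name[: m.start()] if m else name
    epithet = taxon.split()[-1]
    if not m:
        return Parsed(name, epithet, None, None, False)
    return Parsed(name, epithet, m.group(2).strip(), m.group(3), m.group(1) is not None)


def find_basionym(recombination: str, candidates: List[str]) -> Optional[str]:
    """Return the candidate whose unbracketed authorship and epithet match."""
    r = parse(recombination)
    if not r.bracketed:
        return None  # not a recombination, nothing to look up
    for c in map(parse, candidates):
        same = (c.epithet, c.author, c.year) == (r.epithet, r.author, r.year)
        if same and not c.bracketed and c.author is not None:
            return c.name
    return None


print(find_basionym("Zyzomys woodwardi (Thomas, 1909)",
                    ["Laomys woodwardi Thomas, 1909"]))
```

Names without any authorship are never returned as a basionym in this sketch, since there is nothing to compare against.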
Author: mdoering@gbif.org
Created: 2015-08-25 22:24:55.268
Updated: 2015-08-25 22:24:55.268
There are many cases in zoological names where the actual basionym (protonym) is not known. But we can still assert that several names all refer to the same unknown protonym, and therefore share the type specimen and should be synonymous.
For example:
{quote}
Chaerephon bemmeleni (Jentink, 1879)
Tadarida bemmeleni (Jentink, 1879)
Chaerophon bemmeleni
{quote}
{quote}
Chaerephon bivittata (Heuglin, 1861)
Tadarida bivittata (Heuglin, 1861)
Chaerophon bivittata
{quote}
In this case we will create a temporary placeholder basionym during the nub build, which will be removed before export to Postgres (because we do not know the name).
Other cases are crystal clear:
{quote}
Zyzomys woodwardi (Thomas, 1909)
Laomys woodwardi Thomas, 1909
{quote}
{quote}
Mesocricetus raddei (Nehring, 1894)
Cricetus nigricans subsp. raddei
Cricetus raddei Nehring, 1894
{quote}
{quote}
Peromyscus polionotus (Wagner, 1843)
Mus polionotus Wagner, 1843
{quote}
In the following group only the Eversmann group should be created, i.e. Microtus obscurus and Mus obscurus:
{quote}
Microtus obscurus (Eversmann, 1841)
Cricetulus obscurus (Milne-Edwards, 1867)
Bolomys obscurus (Waterhouse, 1837)
Mus obscurus Eversmann, 1841
Praomys obscurus Hutterer & Dieterlen, 1992
{quote}
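The grouping described in the examples above could look roughly like this. It is a minimal sketch under several assumptions: names arrive already parsed into a hypothetical `Name` record, names without authorship are left ungrouped, a group needs at least two members, and `PLACEHOLDER` stands in for the temporary basionym that is created when only recombinations are known.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import List, Optional, Tuple

PLACEHOLDER = "<unknown basionym>"  # hypothetical marker, dropped before export


@dataclass(frozen=True)
class Name:
    scientific_name: str
    epithet: str
    author: Optional[str] = None
    year: Optional[str] = None
    bracketed: bool = False  # True when the authorship is in parentheses


def basionym_groups(names: List[Name]) -> List[Tuple[str, List[str]]]:
    """Return (basionym, recombinations) pairs for names of one family."""
    by_key = defaultdict(list)
    for n in names:
        if n.author and n.year:  # names without authorship stay ungrouped
            by_key[(n.epithet, n.author, n.year)].append(n)

    groups = []
    for members in by_key.values():
        if len(members) < 2:
            continue  # a single name has nothing to be related to
        originals = [n for n in members if not n.bracketed]
        if len(originals) > 1:
            continue  # ambiguous, leave for a human to resolve
        basionym = originals[0].scientific_name if originals else PLACEHOLDER
        recombinations = sorted(n.scientific_name for n in members if n.bracketed)
        groups.append((basionym, recombinations))
    return groups
```

Applied to the five obscurus names, only the Eversmann pair forms a group. The two bracketed Jentink, 1879 names form a group whose basionym is the placeholder.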
Author: mdoering@gbif.org
Created: 2015-08-25 22:31:44.203
Updated: 2015-08-25 22:31:53.343
An example from PESI which contains a basionym group with different (infraspecific) ranks:
http://www.eu-nomen.eu/portal/taxon.php?GUID=74D5F715-C3BB-4273-A7EE-88B9967C912C
{quote}
Centaurea phrygia subsp. abbreviata (K. Koch) Dostál [ACCEPTED]
Centaurea salicifolia subsp. abbreviata K. Koch [BASIONYM]
Centaurea abbreviata (K. Koch) Hand.-Mazz.
Jacea abbreviata (K. Koch) Soják
Centaurea phrygia subsp. abbreviata (K. Koch) Dostál
{quote}
Author: mdoering@gbif.org
Created: 2015-08-25 22:51:20.149
Updated: 2015-08-25 22:51:54.892
Looking at all names in class Aves (which has seen many name changes, and for which the GBIF backbone is built on competing source classifications), one can see that basionyms do occur across families. See the Ridgway, 1893 names:
{noformat}
abbotti
Muscicapidae
Copsychus malabaricus subsp. abbotti
Luscinia svecica subsp. abbotti
Nectariniidae
Cinnyris abbotti
Cinnyris souimanga subsp. abbotti
Psittacidae
Cacatua sulphurea abbotti (Oberholser, 1917)
Psittacula alexandri abbotti (Oberholser, 1919)
Psittinus cyanurus abbotti Richmond, 1902
Sulidae
Papasula abbotti (Ridgway, 1893)
Sula abbotti Ridgway, 1893
Sylviidae
Malacocincla abbotti Blyth, 1845
Malacocincla abbotti subsp. abbotti
Threskiornithidae
Threskiornis bernieri abbotti (Ridgway, 1893)
Threskiornis aethiopicus subsp. abbotti
{noformat}
But as this is a new feature we prefer to stay on the safe side and only group names within the same family.
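The effect of restricting groups to a single family can be illustrated with the Ridgway, 1893 names from the listing above. This is a sketch only: the `cluster` function and the row layout (family, name, epithet, author, year) are assumptions made for the example.

```python
from collections import defaultdict


def cluster(rows, within_family=True):
    """Cluster names by epithet and authorship, optionally per family."""
    clusters = defaultdict(list)
    for family, name, epithet, author, year in rows:
        key = (epithet, author, year)
        if within_family:
            key += (family,)  # the conservative choice: never cross families
        clusters[key].append(name)
    # Only clusters with more than one name say anything about relations.
    return [sorted(names) for names in clusters.values() if len(names) > 1]


rows = [
    ("Sulidae", "Papasula abbotti (Ridgway, 1893)", "abbotti", "Ridgway", "1893"),
    ("Sulidae", "Sula abbotti Ridgway, 1893", "abbotti", "Ridgway", "1893"),
    ("Threskiornithidae", "Threskiornis bernieri abbotti (Ridgway, 1893)",
     "abbotti", "Ridgway", "1893"),
]

print(cluster(rows))                       # one cluster with the two Sulidae names
print(cluster(rows, within_family=False))  # one cluster with all three names
```

With the family in the key, the Threskiornis name stays out of the Sula cluster, at the price of missing genuine cross-family basionyms.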
Author: mdoering@gbif.org
Created: 2015-08-26 09:36:35.202
Updated: 2015-08-26 09:36:35.202
A bit more troublesome are these cases found in Aves families, where the year differs slightly (as does the order of the authors, which should be handled gracefully):
{quote}
aequatorialis: Thraupidae
Tangara arthus aequatorialis (Taczanowski & Berlepsch, 1885)
Dacnis lineata aequatorialis Berlepsch & Taczanowski, 1884
{quote}
The above subspecies appear to be distinct taxa with different types:
http://avibase.bsc-eoc.org/species.jsp?avibaseid=BBE57B94BF75B977
http://avibase.bsc-eoc.org/species.jsp?avibaseid=303530B0EEC98857
{quote}
aequatorialis: Trochilidae
Heliodoxa rubinoides aequatorialis (Gould, 1860)
Androdon aequatorialis Gould, 1863
Campylopterus largipennis aequatorialis Gould, 1861
{quote}
We will leave those cases unresolved for now, to avoid creating synonyms too eagerly by program.
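One way to express this policy is to compare author teams as unordered sets and to require an exact year match, reporting everything else as unresolved or different. The sketch below is an assumption about how such a comparison might be written; `author_team` and `compare_authorship` are hypothetical names, and real author strings need far more normalisation (initials, abbreviations, spelling variants).

```python
import re


def author_team(authors: str) -> frozenset:
    """Split an author string into a case-insensitive, unordered team."""
    parts = re.split(r"\s*(?:&|,|\band\b|\bet\b)\s*", authors)
    return frozenset(p.strip().lower() for p in parts if p.strip())


def compare_authorship(authors_a: str, year_a: str,
                       authors_b: str, year_b: str) -> str:
    """Return 'match', 'unresolved' (same team, other year) or 'different'."""
    if author_team(authors_a) != author_team(authors_b):
        return "different"
    return "match" if year_a == year_b else "unresolved"


# Same team in a different order, but the years differ by one.
print(compare_authorship("Taczanowski & Berlepsch", "1885",
                         "Berlepsch & Taczanowski", "1884"))
```

Only a "match" would be used to assert a basionym relation; "unresolved" pairs such as the Gould names above are left untouched.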
Author: rdmpage
Created: 2015-08-26 10:36:09.458
Updated: 2015-08-26 10:36:24.574
[~mdoering@gbif.org] Great to see progress on this, it's clearly not easy. The birds Tangara arthus aequatorialis and Dacnis lineata aequatorialis are different, their descriptions are in http://biostor.org/reference/108278 and http://biostor.org/reference/99650, respectively.
For the plant name Centaurea phrygia subsp. abbreviata it's interesting that IPNI doesn't have basionym links :(
Author: mdoering@gbif.org
Created: 2015-08-26 16:05:46.094
Updated: 2015-08-26 16:05:46.094
New BasionymSorter class created to group basionyms from a list of names with tests covering most of the above:
https://github.com/gbif/checklistbank/blob/master/checklistbank-common/src/test/java/org/gbif/checklistbank/authorship/BasionymSorterTest.java#L31
https://github.com/gbif/checklistbank/commit/9ae8f809de671d632be4596e17df64614dcd4051#diff-c92a56042d4c37f7e598e599fdf2ea49R26
Author: mdoering@gbif.org
Created: 2015-08-27 17:23:53.866
Updated: 2015-08-27 17:23:53.866
Assuming we find several accepted species names in a basionym group of the GBIF backbone and we have identified a primary accepted name by using the most trusted source, what needs to happen with the other accepted name(s) in that group?
For now the GBIF backbone will change their status to Doubtful and raise an issue flag STATUS_DERIVED.
Alternatively we could try to automatically convert them into homotypical synonyms. But that would lead to subsequent problems, primarily what to do with potential child species or infraspecies. If we relink the children to the primary accepted name they might need to be recombined into a new genus or species, and we might see all sorts of nomenclatural issues only a human can properly resolve.
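The current behaviour (keep the accepted name from the most trusted source, mark the others Doubtful and flag them) can be sketched as follows. The `Usage` record, the numeric `source_rank` and the status strings are assumptions for illustration; only the flag name STATUS_DERIVED is taken from the comment above.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Usage:
    name: str
    status: str       # "ACCEPTED", "SYNONYM" or "DOUBTFUL" (assumed values)
    source_rank: int  # hypothetical: lower number = more trusted source
    issues: Set[str] = field(default_factory=set)


def demote_competing_accepted(group: List[Usage]) -> Optional[Usage]:
    """Keep one accepted usage per basionym group, mark the rest doubtful."""
    accepted = [u for u in group if u.status == "ACCEPTED"]
    primary = min(accepted, key=lambda u: u.source_rank, default=None)
    for u in accepted:
        if u is not primary:
            u.status = "DOUBTFUL"
            u.issues.add("STATUS_DERIVED")
    return primary


# Statuses and ranks below are invented purely for the example.
group = [
    Usage("Microtus obscurus (Eversmann, 1841)", "ACCEPTED", source_rank=1),
    Usage("Mus obscurus Eversmann, 1841", "ACCEPTED", source_rank=2),
]
primary = demote_competing_accepted(group)
print(primary.name, [(u.name, u.status) for u in group])
```

Children of a demoted usage are deliberately not touched here, which mirrors the concern about relinking raised above.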
Author: rdmpage
Created: 2015-08-27 18:44:53.228
Updated: 2015-08-27 18:44:53.228
In an ideal world the nomenclatural issues would be computable. If you know the types, and the dates of publication, then the names to use follow automatically.
Maybe another way to tackle this is to have clusters of names that are in some sense related, and if a user searches for one of these they arrive at that cluster. I need to think this through a bit more, but I envisage something like a suffix tree of names which could be used to generate all the possible names one might encounter (e.g., species names with different generic names, inclusion of subgenera, suspicious, etc.). I think if we decouple recognising associated sets of names from assertions about which one is accepted, we could avoid some of these problems.
Author: mdoering@gbif.org
Created: 2015-08-27 18:53:47.628
Updated: 2015-08-27 18:53:47.628
What would happen to subspecies of a species which should be a synonym because it belongs to the basionym group (which we called "nomenclatural group" in the days with Dave: https://code.google.com/p/gbif-ecat/wiki/ChecklistBank#Nomenclatural_Group)?
Should these subspecies also be synonymized? I guess not, as they should be based on a different type.
Author: rdmpage
Created: 2015-08-27 19:01:42.21
Updated: 2015-08-27 19:01:42.21
I'd need to play with an example, but I wonder if there's any way to postpone making a decision? Can we not simply say that these names are associated in some way, so that a user discovers information associated with related names, but avoid the strong assertion that a name is accepted or not? If we separated names and taxa, this would be a fairly easy thing to do, I suspect...
Author: mdoering@gbif.org
Created: 2015-08-27 20:27:24.249
Updated: 2015-08-27 20:27:24.249
We will basically create this discoverable cluster of names by having the basionym relation established. Then one can see the list of all names in such a group from the basionym and potentially vice versa.
BUT in order to include these names in occurrence searches, on maps or in statistics, they need to be synonyms in our backbone. That's how the system works and I think it makes sense that way. It actually gives the fuzzy term synonym some concrete meaning in GBIF.
PS: Note that I keep calling it basionym because I'm coming from the botanical world. Think of it as protonyms or chresonyms if you prefer that.
Author: mdoering@gbif.org
Created: 2015-08-28 10:46:59.318
Updated: 2015-08-28 10:55:05.257
A good visualization of "homotypical groups" within a synonym list is often found in botanical literature. The CDM software of the BBM does a pretty nice job of showing which names are all based on the same type. I have attached a few screenshots of extensive synonymies:
http://cichorieae.e-taxonomy.net/portal/cdm_dataportal/taxon/469b48a7-a2c9-4769-bd69-49b68674ba72/synonymy
http://cichorieae.e-taxonomy.net/portal/cdm_dataportal/taxon/209399b6-0d3c-4f5a-9f0d-b49ebe0f9403/synonymy
http://cichorieae.e-taxonomy.net/portal/cdm_dataportal/taxon/ccd1ceaf-c100-44a4-ba36-3f83bfed86e6/synonymy
This synonymy list is gigantic: http://cichorieae.e-taxonomy.net/portal/cdm_dataportal/taxon/7b3f0f40-63f2-44a4-a72b-6a8f49dd430f/synonymy
Author: mdoering@gbif.org
Created: 2015-09-10 15:06:08.61
Updated: 2015-09-10 15:06:08.61
Suppose there is an accepted name Cichorium intybus L. with a synonym Cichorium glabratum C. Presl,
and another source then has an accepted recombination Cichorium intybus subsp. glabratum (C. Presl) Arcang.
Should the subspecies be accepted in the backbone, or become a synonym because the primary source, which we trust more, does not accept C. glabratum, the basionym based on the same type as the subspecies? I would think so.
Author: mdoering@gbif.org
Created: 2015-09-18 20:30:36.157
Updated: 2015-09-18 20:30:36.157
Implemented here: https://github.com/gbif/checklistbank/blob/master/checklistbank-cli/src/main/java/org/gbif/checklistbank/nub/NubBuilder.java#L737
test:
https://github.com/gbif/checklistbank/blob/master/checklistbank-cli/src/test/java/org/gbif/checklistbank/nub/NubBuilderTest.java#L106
based on these 2 sources:
https://github.com/gbif/checklistbank/blob/master/checklistbank-cli/src/test/resources/nub-sources/dataset25.txt
https://github.com/gbif/checklistbank/blob/master/checklistbank-cli/src/test/resources/nub-sources/dataset26.txt