Issue 18419

Backbone (infra)species lacking epithets

Reporter: mdoering
Assignee: mdoering
Type: Bug
Summary: Backbone (infra)species lacking epithets
Priority: Critical
Status: InProgress
Created: 2016-04-20 10:17:49.882
Updated: 2017-02-20 16:55:01.8
        
Description: There are 250 taxon names with "taxonRank=SPECIES" that are in fact genus-level names lacking a specific epithet (e.g. 7348906, 7350813, 8232585, etc.)

And 140 taxon names with "taxonRank=VARIETY|FORM" for which the infraspecific epithet is "null" (e.g. 7407832, 8181923, 8189733, etc.)

    


Author: mdoering@gbif.org
Created: 2016-07-15 12:40:38.019
Updated: 2016-07-15 12:40:38.019
        
example:
http://www.gbif.org/species/7594570
Senecio jacobaea null proles (Bertol.) Rouy, 1903

The source usage has been removed, so it was likely from CoL.
    


Author: mdoering@gbif.org
Comment: This still persists in the August 2016 backbone. The missing infraspecific epithet is stored as the literal string "null".
Created: 2016-07-27 14:24:57.922
Updated: 2016-07-27 14:24:57.922
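
A check for this symptom could be sketched in Python as below. It is only an illustration and assumes that the standalone token "null" is never a legitimate epithet; the helper name `has_literal_null` is hypothetical and not part of any GBIF library.

```python
import re

# Hypothetical helper: flags names in which a missing epithet was rendered
# as the literal word "null". Assumes "null" is never a valid epithet.
_NULL_TOKEN = re.compile(r"(?<!\S)null(?!\S)")


def has_literal_null(scientific_name: str) -> bool:
    """Return True if the name contains "null" as a whitespace-delimited token."""
    return _NULL_TOKEN.search(scientific_name) is not None
```

Applied to the example above, `has_literal_null("Senecio jacobaea null proles (Bertol.) Rouy, 1903")` returns `True`.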


Author: mdoering@gbif.org
Created: 2016-09-07 14:35:48.7
Updated: 2016-09-07 14:36:28.083
        
"Chamaemy" of rank species is a current example:
http://api.gbif.org/v1/species/7457557

It has the same name as its IMPLICIT_NAME genus parent:
http://api.gbif.org/v1/species/8035685

The bad species comes from "Chamaemy Panzer, (1806-1809)" from the Official Lists and Indexes of Names in Zoology:
http://api.gbif.org/v1/species/100082720

Originally it says:
{noformat}elegans, Chamaemy[i]a, Panzer, (1806-1809), Fauna Ins. germ. (105): 12 (specific name of the
type species of Chamaemyia Meigen, 1803) (Insecta, Diptera). Op. 847 ..
{noformat}

That's one source to fix, as it is managed by us and Rod: https://github.com/gbif/iczn-lists/
    


Author: mdoering@gbif.org
Created: 2016-09-07 17:18:40.492
Updated: 2016-09-07 17:18:40.492
        
This is partly a problem with our name parser, which fails to deal with a name like this:
{noformat}
Chamaemy[i]a elegans
{noformat}

See http://api.gbif.org/v1/species/100082720/verbatim

Fixed by updating the ICZN lists dwca: https://github.com/gbif/iczn-lists/commit/12dd52b96ca6d3885df92b09359ebf4c51c0c812
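
A pre-processing step that would let a parser cope with such names might look like the Python sketch below. It assumes that square brackets inside a word mark an editorially inserted letter that belongs to the intended spelling; that fits this one example but has not been checked against other ICZN entries. The function name `strip_inline_brackets` is hypothetical.

```python
import re

# Lowercase letters in square brackets directly after a letter,
# e.g. the "[i]" in "Chamaemy[i]a".
_INLINE_BRACKETS = re.compile(r"(?<=[A-Za-z])\[([a-z]+)\]")


def strip_inline_brackets(name: str) -> str:
    """Drop the brackets but keep the bracketed letters (assumed to be intended)."""
    return _INLINE_BRACKETS.sub(r"\1", name)
```

With this, `strip_inline_brackets("Chamaemy[i]a elegans")` gives `"Chamaemyia elegans"`, which an ordinary binomial parser can handle.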
    


Author: mdoering@gbif.org
Created: 2016-09-08 10:04:47.181
Updated: 2016-09-08 10:05:19.409
        
{noformat}
select u.id, u.constituent_key, u.source_taxon_key, n.genus_or_above, n.scientific_name
from name_usage u
  join name n on u.name_fk = n.id
where u.deleted is null
  and u.dataset_key = nubKey()
  and u.rank = 'SPECIES'
  and n.specific_epithet is null
  and n.genus_or_above is not null;
--> yields 436 usages

select u.constituent_key, count(*)
from name_usage u
  join name n on u.name_fk = n.id
where u.deleted is null
  and u.dataset_key = nubKey()
  and u.rank = 'SPECIES'
  and n.specific_epithet is null
  and n.genus_or_above is not null
group by u.constituent_key;
--> yields 27 constituents.
The main ones, contributing > 90%:

 7ddf754f-d193-4cc9-b351-99906754a03b:  169
 046bbc50-cae2-47ff-aa43-729fbf53f7c5:   93
 0938172b-2086-439c-a1dd-c21cb0109ed5:   67
 de8934f4-a136-481c-a87a-b0b202b80a31:   21
 2d59e5db-57ad-41ff-97d6-11f5fb264527:   19
 d9a4eedb-e985-4456-ad46-3df8472e00e8:   13
 9ca92552-f23a-41a8-a140-01abaa31c931:   11

{noformat}
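
As a quick arithmetic check, the share of the 436 usages covered by the seven constituents listed above can be computed directly from the counts:

```python
# Counts per constituent, copied from the listing above.
top_counts = [169, 93, 67, 21, 19, 13, 11]
total_usages = 436

top_total = sum(top_counts)
share = top_total / total_usages
print(top_total, f"{share:.1%}")  # 393 90.1%
```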

    


Author: mdoering@gbif.org
Created: 2016-09-08 10:44:22.091
Updated: 2016-09-08 10:44:22.091
        
The remaining ones look mostly like name-parsing problems.

The following doesn't parse properly and should be fixed in the name parser:
{noformat}Angiopteris d'urvilleana de Vriese{noformat}

These are badly formatted source names:
{noformat}Homozygosphaera Schilleri (Kamptner) Okada & McIntyre, 1977{noformat}
The parser fails to parse it with authorship and falls back to canonical-only parsing, ignoring anything but the first genus name. Suggestion: during the nub build, check whether the parsed name matches the rank, and either reject non-virus species with no specific epithet or use the unparsed, full scientific name instead of the bad canonical one.
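
The suggested check could be sketched in Python as follows. This is not the nub build code: the function `name_for_nub`, its parameters, and the `reject_mismatch` switch are all hypothetical and only illustrate the two proposed options (reject, or fall back to the full scientific name).

```python
from typing import Optional


def name_for_nub(rank: str,
                 scientific_name: str,
                 canonical: Optional[str],
                 specific_epithet: Optional[str],
                 is_virus: bool = False,
                 reject_mismatch: bool = False) -> Optional[str]:
    """Return the name string to use, or None if the usage should be rejected."""
    # A species-rank name without a specific epithet does not match its rank,
    # unless it is a virus name, which does not follow the binomial pattern.
    mismatch = rank == "SPECIES" and not specific_epithet and not is_virus
    if not mismatch:
        return canonical or scientific_name
    # Option 1: reject the usage. Option 2: keep the unparsed, full name.
    return None if reject_mismatch else scientific_name
```

For the example above, a parse that yields only the canonical name "Homozygosphaera" at rank SPECIES would either be rejected or replaced by the full string "Homozygosphaera Schilleri (Kamptner) Okada & McIntyre, 1977", depending on `reject_mismatch`.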

{noformat}
Acer √ó hillieri Lancaster
Agave √ó franzosinii Hort.Hanb. ex W.Wats.
{noformat}
Badly formatted IPNI names. Again, these are parsed only to the genus, with the authors-parsed flag set to false.
The bad characters appear to represent the hybrid symbol, as in Acer × hillieri. A potential addition to the name parser would be to recognise them as such.
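
One way such a repair could work is sketched below in Python. The character pair "√ó" is exactly what the UTF-8 bytes of "×" look like when decoded as Mac Roman; that this is how the IPNI names were actually garbled is an assumption, and `repair_hybrid_marker` is a hypothetical helper name.

```python
# The UTF-8 bytes of "×" (0xC3 0x97) decode to "√ó" in Mac Roman.
MOJIBAKE_HYBRID = "×".encode("utf-8").decode("mac_roman")


def repair_hybrid_marker(name: str) -> str:
    """Replace the garbled hybrid marker with a proper multiplication sign."""
    return name.replace(MOJIBAKE_HYBRID, "×")
```

`repair_hybrid_marker("Acer √ó hillieri Lancaster")` then returns `"Acer × hillieri Lancaster"`. Replacing only this one known sequence, rather than re-decoding the whole string, leaves correctly encoded names untouched.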

{noformat}
Polana (Bulbusana) vana DeLong & Freytag 1972
Tabanus 4punctatus Fabricius, 1805
{noformat}
Name parser failures; these need to be fixed in the parser!
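
To illustrate the patterns involved, the Python sketch below splits off the genus, an optional subgenus in parentheses, and an epithet that may start with digits or contain an apostrophe. It is a simplified regular expression written for these examples, not the GBIF name parser's grammar, and `split_binomial` is a hypothetical name. It deliberately ignores authorship and would misread a lowercase author particle directly after a genus as an epithet.

```python
import re
from typing import Optional

_BINOMIAL = re.compile(
    r"^(?P<genus>[A-Z][a-z]+)"                      # genus, e.g. "Polana"
    r"(?:\s+\((?P<subgenus>[A-Z][a-z]+)\))?"        # optional "(Bulbusana)"
    r"\s+(?P<epithet>[0-9]*[a-z]+(?:['-][a-z]+)*)"  # "vana", "4punctatus", "d'urvilleana"
    r"(?=\s|$)"                                     # epithet must end at a word break
)


def split_binomial(name: str) -> Optional[dict]:
    """Return genus, subgenus and epithet, or None if the name does not fit."""
    match = _BINOMIAL.match(name)
    return match.groupdict() if match else None
```

For "Polana (Bulbusana) vana DeLong & Freytag 1972" this yields genus "Polana", subgenus "Bulbusana" and epithet "vana"; for "Tabanus 4punctatus Fabricius, 1805" the epithet is "4punctatus". A capitalised epithet such as "Schilleri" is not matched and returns `None`.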
    


Author: mdoering@gbif.org
Comment: Still true for 400 species in the January 2017 edition.
Created: 2017-02-20 16:54:51.384
Updated: 2017-02-20 16:54:51.384