Issue 17400

Parse microbial ranks in scientific names

Reporter: mdoering
Assignee: mdoering
Type: Improvement
Summary: Parse microbial ranks in scientific names
Priority: Major
Resolution: Fixed
Status: Closed
Created: 2015-03-06 16:25:26.085
Updated: 2015-03-07 00:07:15.704
Resolved: 2015-03-07 00:07:15.681
        
Description: The name parser does not yet handle microbial ranks well.
The microbial code lists about 10 ranks that we should support:
http://www.ncbi.nlm.nih.gov/books/NBK8812/table/A844/?report=objectonly


Examples:
Convallaria majalis convar. latifolia (Mill.) Ponert
Prunus insititia convar. syriaca (Borkh.) Dostál
Prunus insititia convar. pomariorum (Boutigny) Dostál
Leptospira noguchii serovar Panama str. CZ, 214
Bacillus thuringiensis serovar sinensis
Bacillus thuringiensis serovar kunthalaRX24
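
As a rough illustration of what the examples above require, here is a minimal Python sketch that splits a name into genus, species, rank marker, epithet and any trailing text. It is not the name-parser's actual implementation: the `RANK_MARKERS` list, the `NAME` pattern and the `split_name` helper are all hypothetical, and the marker list is assembled only from terms that appear in this issue.

```python
import re

# Hypothetical marker list built from terms mentioned in this issue;
# it is an assumption, not the vocabulary the real parser uses.
RANK_MARKERS = (
    "convar", "serovar", "biovar", "pathovar",
    "chemovar", "genomovar", "cultivar",
)

# genus, species, rank marker (optionally followed by a dot), epithet,
# then whatever remains (authorship, strain information, ...).
NAME = re.compile(
    r"^(?P<genus>[A-Z][a-z]+)\s+"
    r"(?P<species>[a-z]+)\s+"
    r"(?P<rank>" + "|".join(RANK_MARKERS) + r")\.?\s+"
    r"(?P<epithet>\S+)"
    r"(?:\s+(?P<rest>.+))?$"
)


def split_name(name):
    """Return the name parts as a dict, or None if no rank marker is found."""
    match = NAME.match(name)
    return match.groupdict() if match else None


if __name__ == "__main__":
    for example in (
        "Convallaria majalis convar. latifolia (Mill.) Ponert",
        "Leptospira noguchii serovar Panama str. CZ, 214",
        "Bacillus thuringiensis serovar kunthalaRX24",
    ):
        print(split_name(example))
```

With this pattern the marker ends up in `rank` and the word after it in `epithet`, so `serovar` is no longer mistaken for the epithet itself. The sketch deliberately ignores everything else a real parser has to cope with, such as hybrids, subgenera and authorship parsing.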

Among the names already parsed in Checklist Bank, more than 2700 are badly parsed microbial names whose infraspecific epithet holds a rank marker such as serovar or cultivar instead of the actual epithet:

checklistbank=> select infra_specific_epithet,count(*) from name where infra_specific_epithet ~ '(var|type|form)$' group by infra_specific_epithet;
 infra_specific_epithet | count
------------------------+-------
 cultivar               |   288
 agvar                  |     4
 elevar                 |     2
 type                   |   240
 genomovar              |    59
 savannah-type          |     1
 biovar                 |   237
 genonovar              |     3
 agamovar               |    11
 morphotype             |     9
 chemovar               |     4
 var                    |    50
 n-var                  |     4
 pathovar               |     3
 bolivar                |     2
 serovar                |  1294
 biotype                |    19
 convar                 |    73
 pseudovar              |     9
 genotype               |    43
 nvar                   |    47
 form                   |     7
 provar                 |    18
 cytoform               |    19
 serotype               |   303
 ecotype                |     4
(26 rows)
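
The "> 2700" figure can be checked against the query output above. The following Python sketch parses the psql rows and sums the counts; the `PSQL_OUTPUT` constant and the `parse_rows` helper are names introduced here for illustration only, and the data is copied verbatim from the result set in this issue.

```python
# Rows copied from the psql output in this issue.
PSQL_OUTPUT = """\
 infra_specific_epithet | count
------------------------+-------
 cultivar               |   288
 agvar                  |     4
 elevar                 |     2
 type                   |   240
 genomovar              |    59
 savannah-type          |     1
 biovar                 |   237
 genonovar              |     3
 agamovar               |    11
 morphotype             |     9
 chemovar               |     4
 var                    |    50
 n-var                  |     4
 pathovar               |     3
 bolivar                |     2
 serovar                |  1294
 biotype                |    19
 convar                 |    73
 pseudovar              |     9
 genotype               |    43
 nvar                   |    47
 form                   |     7
 provar                 |    18
 cytoform               |    19
 serotype               |   303
 ecotype                |     4
"""


def parse_rows(text):
    """Map each epithet to its count, skipping the header and separator."""
    rows = {}
    for line in text.splitlines():
        if "|" not in line:
            continue  # separator line
        name, count = (part.strip() for part in line.split("|"))
        if count.isdigit():  # the header row has "count" here
            rows[name] = int(count)
    return rows


rows = parse_rows(PSQL_OUTPUT)
total = sum(rows.values())

if __name__ == "__main__":
    print(len(rows), total)
```

The 26 rows add up to 2753 names, of which serovar alone accounts for 1294.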
    


Author: mdoering@gbif.org
Created: 2015-03-07 00:07:15.701
Updated: 2015-03-07 00:07:15.701
        
https://github.com/gbif/name-parser/commit/194f184c6c4f729446ba83f0eb84223209e43fc6

https://github.com/gbif/name-parser/commit/90c95c48952574fab9b6090f7812a433d114fef9