Issue 17400
Parse microbial ranks in scientific names
17400
Reporter: mdoering
Assignee: mdoering
Type: Improvement
Summary: Parse microbial ranks in scientific names
Priority: Major
Resolution: Fixed
Status: Closed
Created: 2015-03-06 16:25:26.085
Updated: 2015-03-07 00:07:15.704
Resolved: 2015-03-07 00:07:15.681
Description: The name parser does not handle microbial ranks well yet.
There are about 10 ranks listed in the microbial code that we should support:
http://www.ncbi.nlm.nih.gov/books/NBK8812/table/A844/?report=objectonly
Examples:
Convallaria majalis convar. latifolia (Mill.) Ponert
Prunus insititia convar. syriaca (Borkh.) Dostál
Prunus insititia convar. pomariorum (Boutigny) Dostál
Leptospira noguchii serovar Panama str. CZ, 214
Bacillus thuringiensis serovar sinensis
Bacillus thuringiensis serovar kunthalaRX24
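A minimal sketch of what recognising these markers could look like, assuming names follow the simple shape `Genus species marker epithet [remainder]`. The `parse_infraspecific` function and the `ParsedName` type are hypothetical illustrations, not the GBIF name-parser API. The marker list is only a subset taken from the examples and the query output in this issue, not the full list from the microbial code:

```python
import re
from typing import NamedTuple, Optional

# Hypothetical subset of rank markers, taken from the examples and the query
# output in this issue; NOT the authoritative list from the microbial code.
RANK_MARKERS = ("convar", "serovar", "biovar", "pathovar", "chemovar",
                "genomovar", "agamovar", "provar", "cultivar",
                "serotype", "biotype")

# Genus, species, a rank marker (optionally abbreviated with a dot), the
# infraspecific epithet, then anything left over (authorship, strain, ...).
NAME_RE = re.compile(
    r"^(?P<genus>[A-Z][a-z]+)\s+"
    r"(?P<species>[a-z-]+)\s+"
    r"(?P<rank>" + "|".join(RANK_MARKERS) + r")\.?\s+"
    r"(?P<epithet>\S+)"
    r"(?:\s+(?P<rest>.*))?$"
)


class ParsedName(NamedTuple):
    genus: str
    species: str
    rank: str
    infraspecific_epithet: str
    remainder: Optional[str]  # unparsed tail, e.g. authorship or strain


def parse_infraspecific(name: str) -> Optional[ParsedName]:
    """Split a name into parts, keeping the rank marker out of the epithet.

    Returns None when the name does not fit the assumed shape, e.g. when a
    marker is present but no epithet follows it.
    """
    m = NAME_RE.match(name.strip())
    if m is None:
        return None
    return ParsedName(m["genus"], m["species"], m["rank"],
                      m["epithet"], m["rest"])


if __name__ == "__main__":
    for n in ("Convallaria majalis convar. latifolia (Mill.) Ponert",
              "Leptospira noguchii serovar Panama str. CZ, 214",
              "Bacillus thuringiensis serovar sinensis"):
        print(parse_infraspecific(n))
```

With this approach the marker is kept as the rank and the following token becomes the infraspecific epithet, so `Leptospira noguchii serovar Panama str. CZ, 214` yields the epithet `Panama` and leaves `str. CZ, 214` as an unparsed remainder. Authorships and strain designations would still need their own handling, which this sketch does not attempt.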
Looking at the existing parsed names in Checklist Bank, there are more than 2,700 badly parsed microbial names in which a rank marker such as serovar or cultivar ended up as the infraspecific epithet:
checklistbank=> select infra_specific_epithet,count(*) from name where infra_specific_epithet ~ '(var|type|form)$' group by infra_specific_epithet;
 infra_specific_epithet | count
------------------------+-------
 cultivar               |   288
 agvar                  |     4
 elevar                 |     2
 type                   |   240
 genomovar              |    59
 savannah-type          |     1
 biovar                 |   237
 genonovar              |     3
 agamovar               |    11
 morphotype             |     9
 chemovar               |     4
 var                    |    50
 n-var                  |     4
 pathovar               |     3
 bolivar                |     2
 serovar                |  1294
 biotype                |    19
 convar                 |    73
 pseudovar              |     9
 genotype               |    43
 nvar                   |    47
 form                   |     7
 provar                 |    18
 cytoform               |    19
 serotype               |   303
 ecotype                |     4
(26 rows)
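As a quick check of the "> 2700" figure, the counts from the query output can be totalled in a few lines of Python. The dictionary simply copies the 26 rows, and the regular expression mirrors the SQL filter (PostgreSQL's `~` is an unanchored, case-sensitive match, like Python's `re.search`). This is an illustration only, not part of the name parser:

```python
import re

# The 26 rows returned by the query: infra_specific_epithet -> count.
COUNTS = {
    "cultivar": 288, "agvar": 4, "elevar": 2, "type": 240, "genomovar": 59,
    "savannah-type": 1, "biovar": 237, "genonovar": 3, "agamovar": 11,
    "morphotype": 9, "chemovar": 4, "var": 50, "n-var": 4, "pathovar": 3,
    "bolivar": 2, "serovar": 1294, "biotype": 19, "convar": 73,
    "pseudovar": 9, "genotype": 43, "nvar": 47, "form": 7, "provar": 18,
    "cytoform": 19, "serotype": 303, "ecotype": 4,
}

# Same pattern as the SQL filter `infra_specific_epithet ~ '(var|type|form)$'`.
SQL_FILTER = re.compile(r"(var|type|form)$")


def total_bad_epithets(counts: dict) -> int:
    """Sum the counts of all epithets matched by the SQL filter."""
    return sum(n for epithet, n in counts.items() if SQL_FILTER.search(epithet))


if __name__ == "__main__":
    print(len(COUNTS), total_bad_epithets(COUNTS))  # 26 2753
```

The total of 2,753 confirms the estimate; serovar alone accounts for 1,294 of those names.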
Author: mdoering@gbif.org
Created: 2015-03-07 00:07:15.701
Updated: 2015-03-07 00:07:15.701
https://github.com/gbif/name-parser/commit/194f184c6c4f729446ba83f0eb84223209e43fc6
https://github.com/gbif/name-parser/commit/90c95c48952574fab9b6090f7812a433d114fef9