Issue 18370

Cyrillic names confuse species matching

Reporter: mdoering
Type: Feedback
Summary: Cyrillic names confuse species matching
Priority: Minor
Status: Open
Created: 2016-04-06 11:08:00.584
Updated: 2016-07-25 16:55:25.25
Description: Even though the species name is given in perfect Latin, there is no species match for occurrences where all the higher classification is given in Cyrillic. The species matching should probably be modified to ignore non-Latin names, and we should at least try to look up kingdoms in non-Latin alphabets. Maybe a simple transliteration even works?
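
A minimal sketch of the "ignore non-Latin names" idea, in Python. This is not the matching service's code; the function names, the dict-based classification, and the example taxa in the usage note are all hypothetical. It drops any rank whose name contains letters outside the Latin script before the classification is handed to the matcher:

```python
import unicodedata


def is_latin_name(name: str) -> bool:
    """Return True if every alphabetic character in `name` is a Latin-script letter."""
    letters = [ch for ch in name if ch.isalpha()]
    # unicodedata.name() raises ValueError for unnamed code points unless a
    # default is given, so "" is passed as the fallback.
    return bool(letters) and all(
        unicodedata.name(ch, "").startswith("LATIN") for ch in letters
    )


def drop_non_latin_ranks(classification: dict) -> dict:
    """Keep only the ranks whose names are written in Latin script.

    `classification` is a hypothetical rank -> name mapping, not the
    service's real request model.
    """
    return {
        rank: name
        for rank, name in classification.items()
        if name and is_latin_name(name)
    }
```

With a made-up record such as {"kingdom": "Растения", "family": "Asparagaceae"}, only the family would survive, so the Cyrillic kingdom could no longer contradict the Latin species name.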

A species match with just the name is fine:

But once the Cyrillic higher taxa are added, the matching service thinks this is an entirely different classification: most importantly, the kingdom and family do not match up. The match still works, though, with the Latin family given:

I assume the match was not working when we did not strip authorships from higher taxa, see
A reindexing of this dataset should solve it.

Created: 2016-04-06 13:30:29.721
Updated: 2016-04-06 13:30:29.721
Reprocessing matched all species, so this is a minor issue now:

Author: mblissett
Created: 2016-07-25 16:55:25.25
Updated: 2016-07-25 16:55:25.25

Transliterating the Cyrillic names with an online transliterator gives:

kingdom=Rastenija (means plants)
phylum=cvetkovye (means flowering plants)
class=odnodol'nye (means monocots)
order=Sparzhacvetnye (means Asparagales)

So the Russian terms aren't simple transliterations of Latin.
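
Since transliteration does not produce the Latin names, looking up kingdoms would need an explicit table of vernacular names instead. A rough sketch in Python, assuming a hand-maintained mapping; the table contents and the function name are hypothetical and not part of the matching service:

```python
# Hypothetical lookup from lower-cased Russian kingdom names to Latin kingdoms.
# The entries are illustrative; a real table would need curating per language.
KINGDOM_LOOKUP = {
    "растения": "Plantae",
    "животные": "Animalia",
    "грибы": "Fungi",
}


def normalise_kingdom(value: str) -> str:
    """Map a known vernacular kingdom name to Latin, else return the input unchanged."""
    key = value.strip().casefold()
    return KINGDOM_LOOKUP.get(key, value)
```

Values that are not in the table pass through untouched, so kingdoms already given in Latin are unaffected.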