Issue 11402

Detect garbage vernacular names and automatically flag them

Reporter: mdoering
Assignee: mdoering
Type: Improvement
Summary: Detect garbage vernacular names and automatically flag them
Priority: Major
Resolution: Fixed
Status: Closed
Created: 2012-06-11 11:28:29.28
Updated: 2013-12-09 13:40:59.763
Resolved: 2012-06-11 11:58:34.688
        
Description: We can detect major character encoding issues in vernacular names by spotting characters that never occur in real names.
For example, for the list below these are ¶§∞Ω∫∏≈±≤

More examples: http://ecat-dev.gbif.org/usage/5231240

Rio Mayo Titi Monkey [en]
Rio Mayo titi [en]
Rio Mayo titi (TEMPLATE) [en]
Tocón Colorado [es]
리오마요티티 [ko]
리오마요티티 [ko]
-ê-Ω-¥—Å-∫-∏ —Ç-∏—Ç-∏ [sr]
-ê-Ω-¥—Å-∫-∏ —Ç-∏—Ç-∏ [sr]
    


Author: mdoering@gbif.org
Created: 2012-06-11 11:38:32.379
Updated: 2012-06-11 11:38:32.379
        
List of characters to start out with:
£§©ª®°±¶ºΩ‡‰™∏√∞∫≈≤≥◊
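
A minimal sketch of how the check could work, assuming a plain set-membership test against the starter characters above. The function names (find_suspicious_chars, is_garbage_name) are hypothetical and not taken from the actual fix; the sample names come from the description.

```python
# Starter set copied from the comment above. Treat it as a tunable assumption,
# not a definitive list.
SUSPICIOUS_CHARS = frozenset("£§©ª®°±¶ºΩ‡‰™∏√∞∫≈≤≥◊")


def find_suspicious_chars(name):
    """Return the set of suspicious characters that occur in a name."""
    return set(name) & SUSPICIOUS_CHARS


def is_garbage_name(name):
    """Flag a vernacular name if it contains at least one suspicious character."""
    return bool(find_suspicious_chars(name))


# Sample names from the issue description, without the language suffix.
names = [
    "Rio Mayo Titi Monkey",
    "Tocón Colorado",
    "리오마요티티",
    "-ê-Ω-¥—Å-∫-∏ —Ç-∏—Ç-∏",
]

for name in names:
    status = "FLAG" if is_garbage_name(name) else "ok"
    print(status, name)
```

With this set only the last sample is flagged; legitimate non-ASCII names such as "Tocón Colorado" and the Korean name pass. One caveat: Ω is also a regular Greek capital letter, so a real implementation might need to skip it for Greek-language names.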