Issue 17302

Flag names with UTF garbage as doubtful

17302
Reporter: mdoering
Type: Improvement
Summary: Flag names with UTF garbage as doubtful
Priority: Major
Status: Open
Created: 2015-02-23 10:08:32.988
Updated: 2016-02-05 17:44:46.445
        
Description: There are some names in our backbone (e.g. Adiantum confine Fée) whose authorship is garbled by UTF-8 encoding errors. It would be good to detect these and then either drop the authorship entirely, repair it by applying the correct UTF-8 decoding, or (the easiest option) use the authorship from a different backbone source that also treats that name.

http://www.gbif.org/species/3748123
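
A rough sketch of what the detection step could look like, not the backbone's actual code. It assumes the most common failure mode, UTF-8 bytes read as ISO-8859-1, which leaves an Â or Ã followed by a character in the U+0080..U+00BF range, and it also treats the replacement character U+FFFD as suspicious. The names `MOJIBAKE` and `looks_garbled` are made up for illustration.

```python
import re

# Assumption: UTF-8 read as ISO-8859-1 turns each two-byte sequence for a
# Latin-1 character into U+00C2 or U+00C3 followed by U+0080..U+00BF.
# U+FFFD is the replacement character left behind by a failed decode.
MOJIBAKE = re.compile("[\u00c2\u00c3][\u0080-\u00bf]|\ufffd")


def looks_garbled(authorship: str) -> bool:
    """Return True if the authorship string contains typical mojibake."""
    return MOJIBAKE.search(authorship) is not None


# "F\u00c3\u00a9e" is "Fée" after its UTF-8 bytes were read as ISO-8859-1.
print(looks_garbled("F\u00c3\u00a9e"))  # garbled form
print(looks_garbled("F\u00e9e"))        # clean form
```

This is only a heuristic: a legitimate Ã followed by one of those characters would be flagged too, so a hit is better treated as "doubtful" than as proof.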
    


Author: mblissett
Created: 2016-02-05 17:44:46.445
Updated: 2016-02-05 17:44:46.445
        
I noticed some of these in Catalogue of Life.

Some errors can be corrected:

{{echo BÃ¢n | iconv -f utf-8 -t iso-8859-1 | iconv -f utf-8}}
{{Bân}}
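
The same repair as a small Python function, in case it is useful in code rather than on the command line. It is a sketch that assumes the damage is exactly one round of "UTF-8 read as ISO-8859-1"; the function name `repair` is hypothetical.

```python
from typing import Optional


def repair(text: str) -> Optional[str]:
    """Undo one round of UTF-8-read-as-ISO-8859-1, or return None."""
    try:
        # Turn the characters back into the original bytes, then decode
        # those bytes as the UTF-8 they were meant to be.
        return text.encode("iso-8859-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not representable in ISO-8859-1, or the bytes are not valid UTF-8:
        # this string was not damaged in the assumed way.
        return None


# "B\u00c3\u00a2n" is the garbled form of "B\u00e2n" (Bân).
print(repair("B\u00c3\u00a2n"))
```

Pure ASCII strings come back unchanged, and an already correct non-ASCII string such as Bân returns None, so the function should only be applied to names already flagged as suspicious.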

It happened because UTF-8 was treated as ISO-8859-1:

{{echo Bân | iconv -f iso-8859-1}}
{{BÃ¢n}}

However, {{Keßler}} in the same dataset cannot be corrected this way; I think the ß became some invalid character.
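
One possible explanation, sketched below as an assumption rather than a confirmed diagnosis: ß is the bytes C3 9F in UTF-8, and 0x9F in ISO-8859-1 is a C1 control character. If any later step drops control characters, the second byte is gone and the round trip can no longer work. The helper names and the control-stripping step are hypothetical.

```python
import unicodedata


def corrupt(text: str) -> str:
    """Simulate UTF-8 bytes being read as ISO-8859-1."""
    return text.encode("utf-8").decode("iso-8859-1")


def strip_controls(text: str) -> str:
    """Hypothetical cleaning step that drops control characters (Cc)."""
    return "".join(c for c in text if unicodedata.category(c) != "Cc")


def recoverable(text: str) -> bool:
    """True if the ISO-8859-1 to UTF-8 round trip still succeeds."""
    try:
        text.encode("iso-8859-1").decode("utf-8")
        return True
    except (UnicodeEncodeError, UnicodeDecodeError):
        return False


garbled = corrupt("Ke\u00dfler")        # ß becomes U+00C3 followed by U+009F
print(recoverable(garbled))             # bytes intact, repair still possible
print(recoverable(strip_controls(garbled)))  # U+009F dropped, repair fails
```

By contrast, the second byte of â (0xA2) maps to the printable character ¢, which survives such cleaning, and that would explain why Bân can be repaired while Keßler cannot.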

But maybe this kind of fix belongs higher up the chain.