Issue 14612

And yet more character encoding bugs

14612
Reporter: feedback bot
Assignee: jlegind
Type: Task
Summary: And yet more character encoding bugs
Status: Open
Created: 2014-01-10 16:54:12.472
Updated: 2014-01-13 11:45:54.717
        
        
Description: Follow up to http://dev.gbif.org/issues/browse/PF-1435

Realized that, of course, mis-parsed UTF-8 isn't the only problem.

select institution_code, collection_code, catalogue_number, country, locality from raw_occurrence_record where locality like _utf8 x'25efbfbd25'  collate utf8_bin and institution_code not in ('nrm') limit 30;

This points out
  http://www.gbif.org/occurrence/475213387
among others.
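
For anyone reading the query above: the hex literal x'25efbfbd25' is the UTF-8 byte sequence for '%', U+FFFD (the Unicode replacement character), '%', so the LIKE matches any locality that contains a replacement character. A minimal Python sketch of what that pattern is and how such a character typically ends up in the data. The string "São Paulo" is a made-up illustration, not taken from the affected records, and the actual cause for these records may differ:

```python
# Decode the hex literal used in the LIKE pattern.
pattern = bytes.fromhex("25efbfbd25").decode("utf-8")
# Result: SQL wildcard, U+FFFD REPLACEMENT CHARACTER, SQL wildcard.
print(repr(pattern))

# Hypothetical example: text saved as Latin-1 but decoded as UTF-8
# with a lenient decoder. The invalid byte is swapped for U+FFFD,
# and the original character is lost for good.
raw = "São Paulo".encode("latin-1")
decoded = raw.decode("utf-8", errors="replace")
print(repr(decoded))

# For contrast, the opposite mistake (UTF-8 bytes read as Latin-1)
# produces two-character mojibake that is still recoverable.
mojibake = "São Paulo".encode("utf-8").decode("latin-1")
print(repr(mojibake))
```

Unlike the mojibake case, a value that already contains U+FFFD cannot be repaired from the database alone; the original bytes have to come from the publisher.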

Thanks!
*E-mail*: [mailto:cmccallum, at fas dot harvard dot edu]
    

Attachment Screen Shot 2014-01-13 at 11.37.56 AM.png



Author: kbraak@gbif.org
Comment: Looking at the source file (occurrence.txt from the DwC-Archive) shows that the encoding problems are already present at the source.
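
A check like this can be scripted. The following is only a rough sketch, assuming the file is meant to be UTF-8; the function name find_encoding_problems is hypothetical and not part of any GBIF tooling:

```python
REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER


def find_encoding_problems(path):
    """Return (line_number, reason) pairs for suspicious lines in a text file."""
    problems = []
    # Read raw bytes so that decoding errors are visible instead of hidden.
    with open(path, "rb") as handle:
        for number, raw in enumerate(handle, start=1):
            try:
                text = raw.decode("utf-8")
            except UnicodeDecodeError:
                # The bytes are not valid UTF-8 (e.g. Latin-1 content).
                problems.append((number, "not valid UTF-8"))
                continue
            if REPLACEMENT in text:
                # The damage was already baked into the file upstream.
                problems.append((number, "contains U+FFFD"))
    return problems
```

Calling find_encoding_problems("occurrence.txt") would list the line numbers to point the publisher at. Lines reported as "contains U+FFFD" were damaged before the file was written, so they can only be fixed from the publisher's original data.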
Created: 2014-01-13 11:39:53.695
Updated: 2014-01-13 11:39:53.695


Author: kbraak@gbif.org
Comment: Jan, can you please contact the publishers of http://www.gbif.org/dataset/a9e763c8-f674-4492-94a8-4fd4eb9342a5 and help them resolve their encoding problems? Thanks
Created: 2014-01-13 11:41:49.092
Updated: 2014-01-13 11:41:49.092