Issue 14612
And yet more character encoding bugs
Reporter: feedback bot
Assignee: jlegind
Type: Task
Summary: And yet more character encoding bugs
Status: Open
Created: 2014-01-10 16:54:12.472
Updated: 2014-01-13 11:45:54.717
Description: Follow-up to http://dev.gbif.org/issues/browse/PF-1435
I realized that, of course, mis-parsed UTF-8 isn't the only problem.
select institution_code, collection_code, catalogue_number, country, locality from raw_occurrence_record where locality like _utf8 x'25efbfbd25' collate utf8_bin and institution_code not in ('nrm') limit 30;
This query turns up http://www.gbif.org/occurrence/475213387, among others.
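For reference, the hex literal in the query decodes to `%`, the three bytes EF BF BD, and `%`. EF BF BD is the UTF-8 encoding of U+FFFD REPLACEMENT CHARACTER, so the LIKE pattern matches any locality that contains U+FFFD. Lenient decoders substitute this character for bytes they cannot interpret, and the original character is lost at that point. The minimal Python sketch below reproduces the pattern and applies the same filter to in-memory strings. The sample localities and the helper name `matches_pattern` are hypothetical and serve only as illustration.

```python
# Sketch: what the hex literal x'25efbfbd25' in the query matches.
REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER

# 0x25 is '%', and EF BF BD is the UTF-8 encoding of U+FFFD.
pattern = bytes.fromhex("25efbfbd25").decode("utf-8")
assert pattern == "%" + REPLACEMENT + "%"

# A lenient decoder emits U+FFFD for bytes that are not valid UTF-8,
# e.g. a Latin-1 encoded "São Paulo" read as UTF-8 (hypothetical value).
latin1_bytes = "São Paulo".encode("latin-1")        # b"S\xe3o Paulo"
damaged = latin1_bytes.decode("utf-8", errors="replace")

def matches_pattern(value: str) -> bool:
    """Hypothetical helper: Python equivalent of LIKE '%<U+FFFD>%'."""
    return REPLACEMENT in value

# Hypothetical locality values, for illustration only.
localities = ["São Paulo", damaged, "Montréal"]
flagged = [loc for loc in localities if matches_pattern(loc)]
print(flagged)
```

Only the damaged value is flagged. Nothing in the damaged string records that the lost character was "ã", so the original text cannot be recovered from it.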
Thanks!
*E-mail*: [mailto:cmccallum, at fas dot harvard dot edu]
Attachment: Screen Shot 2014-01-13 at 11.37.56 AM.png
Author: kbraak@gbif.org
Comment: Looking at the source file (occurrence.txt from the DwC-Archive) reveals that the encoding problems originate at the source.
Created: 2014-01-13 11:39:53.695
Updated: 2014-01-13 11:39:53.695
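A check like the one described above could be sketched as follows. The sketch assumes the archive's occurrence.txt is meant to be UTF-8. It reads the file as raw bytes and attempts a strict UTF-8 decode of each line. Any line that fails contains invalid bytes. Any line that decodes but already contains U+FFFD was damaged before the file was written. The function name `scan_encoding` is hypothetical and is not part of any GBIF tool.

```python
import sys
from pathlib import Path

REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER

def scan_encoding(path: Path) -> list[tuple[int, str]]:
    """Hypothetical helper: return (line number, problem) pairs for a file
    that is assumed to be UTF-8 encoded."""
    problems = []
    # Open in binary mode so Python does not silently decode or replace bytes.
    # Splitting on b"\n" is safe for UTF-8: 0x0A never occurs inside a
    # multi-byte sequence.
    with path.open("rb") as fh:
        for lineno, raw in enumerate(fh, start=1):
            try:
                text = raw.decode("utf-8")  # strict: raises on invalid bytes
            except UnicodeDecodeError as exc:
                bad = raw[exc.start]
                problems.append(
                    (lineno, f"invalid UTF-8 byte 0x{bad:02x} at offset {exc.start}")
                )
                continue
            if REPLACEMENT in text:
                problems.append((lineno, "contains U+FFFD (damaged before export)"))
    return problems

if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python scan_encoding.py occurrence.txt
    for lineno, problem in scan_encoding(Path(sys.argv[1])):
        print(f"line {lineno}: {problem}")
```

The two failure modes call for different fixes. Invalid bytes usually mean the file was exported in another encoding, such as Latin-1, and can often be re-exported correctly. Embedded U+FFFD characters mean the publisher's own data already lost the original characters.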
Author: kbraak@gbif.org
Comment: Jan, can you please contact the publishers of http://www.gbif.org/dataset/a9e763c8-f674-4492-94a8-4fd4eb9342a5 and help them resolve their encoding problems? Thanks.
Created: 2014-01-13 11:41:49.092
Updated: 2014-01-13 11:41:49.092