Issue 18748

Null bytes present in downloads

Reporter: nickyn
Type: Bug
Summary: Null bytes present in downloads
Priority: Unassessed
Status: Open
Created: 2016-09-27 11:42:13.679
Updated: 2016-09-29 10:24:34.453
        
Description: Null bytes present in data fields are passed through to the simple CSV and DwCA format downloads, which complicates loading these files into PostgreSQL.
An example is occurrence record 1137618601, which contains a null byte in the locality field.
Would it be possible to implement a pre-processing step that strips null bytes, either when the data are collated or when the download is produced? The data can be retained as-is in the verbatim format. A similar pre-processing step already appears to be in place to strip newlines from data.

Verbatim view of the example record here:
http://www.gbif.org/occurrence/1137618601/fragment
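
Until this is fixed at the source, a possible client-side workaround is to filter the downloaded file before loading it. The following is only a minimal sketch, assuming the download is UTF-8 encoded (where the byte 0x00 never occurs inside a multi-byte character, so dropping it byte-wise is safe). The function name `strip_nul_bytes` and the chunk size are illustrative choices, not part of any GBIF tooling.

```python
import io


def strip_nul_bytes(src, dst, chunk_size=64 * 1024):
    """Copy the binary stream src to dst, dropping NUL bytes.

    Returns the number of NUL bytes removed. Assumes UTF-8 (or another
    encoding in which 0x00 only ever represents the NUL character).
    """
    removed = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        cleaned = chunk.replace(b"\x00", b"")
        removed += len(chunk) - len(cleaned)
        dst.write(cleaned)
    return removed


if __name__ == "__main__":
    # Made-up sample row; the locality value is not the real record's content.
    source = io.BytesIO(b"1137618601\tSome\x00where\n")
    target = io.BytesIO()
    count = strip_nul_bytes(source, target)
    print(count, target.getvalue())
```

With real files, `src` and `dst` would be file objects opened in binary mode (`"rb"` and `"wb"`), and the cleaned file can then be loaded as usual.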
    


Author: mblissett
Comment: These ought to be stripped; I thought they already were.  [~fmendez@gbif.org] / [~cgendreau]?
Created: 2016-09-28 17:00:49.478
Updated: 2016-09-28 17:01:15.119


Author: cgendreau
Created: 2016-09-29 10:23:53.977
Updated: 2016-09-29 10:24:34.449
        
The previous issue was POR-3123, where we introduced a NUL through a wrong data type mapping.

This time the NUL is in the source data and we do not strip it. We need to apply these pre-processing steps before storing the data, and we should not rely on the library that reads the data to do it (probably the cause of PF-2625, which comes from TAPIR).
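
To illustrate the kind of pre-processing step discussed above, here is a rough sketch that strips NUL characters from every string field of a record before it is stored or written to a tab-delimited download. It is not the actual GBIF pipeline code: the record layout (a plain dict), the field names, and the helper names `strip_nul`, `clean_record` and `write_tsv` are all assumptions made for the example.

```python
import csv
import io

NUL = "\x00"


def strip_nul(value):
    """Return value with NUL characters removed; non-strings pass through."""
    if isinstance(value, str):
        return value.replace(NUL, "")
    return value


def clean_record(record):
    """Return a copy of the record with NULs stripped from every field."""
    return {field: strip_nul(value) for field, value in record.items()}


def write_tsv(records, fieldnames):
    """Serialize cleaned records as tab-delimited text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for record in records:
        writer.writerow(clean_record(record))
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical record; the locality text is invented for the example.
    sample = [{"gbifID": "1137618601", "locality": "Made\x00up place"}]
    print(write_tsv(sample, ["gbifID", "locality"]))
```

Cleaning at the record level, before storage, means every downstream output gets the same treatment, while the verbatim record can still be kept untouched elsewhere.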