Issue 13529

Hi, I found something a little strange debatabl...

Reporter: feedback bot
Assignee: fmendez
Type: Bug
Summary: Hi,    I found something a little strange debatabl...
Resolution: Fixed
Status: Closed
Created: 2013-07-23 16:47:18.303
Updated: 2016-09-28 17:04:04.844
Resolved: 2013-07-29 13:57:43.294
Description: Hi,

I found something a little strange (and debatable) while playing with the occurrence export.

As the export format seems to be a small superset of DwC-A, I'm actually adding "support" for it into

The issue is:

Within the attached file (export from the portal), you can notice that in line 189433446, one field (one of the last) contains the Unicode new line character (

When using external tools to parse the occurrence file (the Python standard library in my case) and setting them up for UTF-8 (since meta.xml reports the file as UTF-8), a line is automatically considered finished when this character is encountered.
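
The behaviour described above can be reproduced with a small sketch. It is an illustration only: U+2028 (LINE SEPARATOR) is used as a stand-in because the report does not say which character was involved, and the sample records are invented.

```python
# Illustration only: U+2028 is an assumed stand-in for the character in the
# report, and the two sample records below are made up.
data = (
    "id1\tsome remark with \u2028 inside\tlast\n"
    "id2\tplain remark\tlast\n"
)

# str.splitlines() treats several Unicode characters as line boundaries,
# so the first record is cut in two.
pieces_splitlines = data.splitlines()

# Splitting on "\n" alone (the linesTerminatedBy value) keeps records whole.
# The final element is the empty string after the last "\n", so drop it.
pieces_newline_only = data.split("\n")[:-1]

print(len(pieces_splitlines))    # 3
print(len(pieces_newline_only))  # 2
```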

I have mixed feelings about whether this is a bug or not:

- on one side, as the metafile specifies \n in linesTerminatedBy, it is not unreasonable to ignore all other characters when splitting the file into lines.
- on the other, it is still strange to encounter this character in a UTF-8 file and simply ignore it. I guess many consumer tools will have problems like this. In the case of Python, I'll probably need to subclass the standard File class.
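
A possible consumer-side workaround, sketched under assumptions: in Python 3, opening the file through `io.open` with `newline="\n"` makes the text layer end lines only at `\n`, which may avoid the need for a subclass. The helper name `iter_rows`, the tab field separator, and the sample data are hypothetical and not taken from the export itself.

```python
import io
import os
import tempfile


def iter_rows(path, field_sep="\t", encoding="utf-8"):
    # Hypothetical helper: yield each record as a list of fields.
    # newline="\n" tells the text layer that only "\n" terminates a line,
    # so other Unicode line-break characters stay inside the field.
    with io.open(path, encoding=encoding, newline="\n") as handle:
        for line in handle:
            yield line.rstrip("\n").split(field_sep)


# Small demonstration with an invented two-record file; U+2028 is an
# assumed stand-in for the character mentioned in the report.
sample = "1\tremark with \u2028 inside\n2\tplain remark\n"
fd, sample_path = tempfile.mkstemp(suffix=".txt")
with io.open(fd, "w", encoding="utf-8", newline="\n") as out:
    out.write(sample)

rows = list(iter_rows(sample_path))
os.remove(sample_path)

print(len(rows))  # 2
```

This keeps the parsing rule aligned with linesTerminatedBy in the metafile rather than with whatever a given tool considers a line break.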

*Reporter*: Nicolas Noé

Created: 2013-07-29 13:57:43.324
Updated: 2013-07-29 13:57:43.324
Tabs and "line break" characters are removed from uninterpreted text fields
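
A minimal sketch of what such a clean-up could look like, not the portal's actual implementation. The helper name `clean_field` and the exact character set are assumptions: the set below is the tab plus the characters that Python's `str.splitlines()` treats as line boundaries.

```python
import re

# Assumed character set: tab plus the line boundaries recognised by
# str.splitlines() (LF, CR, VT, FF, FS, GS, RS, NEL, U+2028, U+2029).
_BREAKS = re.compile("[\t\n\r\x0b\x0c\x1c\x1d\x1e\x85\u2028\u2029]")


def clean_field(value, replacement=""):
    # Hypothetical helper: strip tabs and line-break characters from one
    # text field so it cannot split a record or shift columns.
    return _BREAKS.sub(replacement, value)


print(clean_field("a\tb\u2028c\nd"))  # abcd
```

Passing `replacement=" "` instead would keep words from running together, at the cost of altering the field slightly differently.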