Issue 12059

Raw occurrence record coordinates that contain commas (,) are not correctly interpreted into their processed values in the occurrence_record table

Reporter: jcuadra
Assignee: omeyn
Type: Bug
Summary: Raw occurrence record coordinates that contain commas (,) are not correctly interpreted into their processed values in the occurrence_record table
Priority: Major
Resolution: Fixed
Status: Closed
Created: 2012-10-22 15:39:56.579
Updated: 2014-08-06 15:38:34.341
Resolved: 2014-08-06 15:38:34.221
        
Description: When a raw occurrence record (ROR) is interpreted into an occurrence record (OR), coordinates on the ROR that use a comma (,) as the decimal separator are not interpreted properly into the OR table.

The behavior currently observed is that the OR coordinates are set to null and the geospatial_issue flag stays at 0.

Example:

SELECT id, latitude, longitude FROM raw_occurrence_record o where id = 201111358

SELECT id, latitude, longitude, geospatial_issue FROM occurrence_record o where id = 201111358
    


Author: lfrancke@gbif.org
Comment: This is a gbif-parsers issue I think.
Created: 2012-10-22 15:47:28.061
Updated: 2012-10-22 15:47:28.061


Author: jcuadra@gbif.org
Comment: Do you think it is possible to solve this problem for the next rollover? I haven't worked with the gbif-parsers project, so I'm afraid I can't fix the problem myself. We have a request from a user who has spotted the issue (coordinates with commas not getting processed correctly). Any help will be appreciated.
Created: 2012-10-30 11:26:09.33
Updated: 2012-10-30 11:26:09.33


Author: lfrancke@gbif.org
Created: 2012-11-07 17:12:41.048
Updated: 2012-11-07 17:12:41.048
        
gbif-parsers {{GeospatialParseUtils}} is where this needs to be implemented. We're using {{Double.parseDouble}} for this, which isn't locale-aware. We can use {{java.text.NumberFormat}} instead and try a couple of different Locales, I guess.

I'll try to come up with a patch.
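
A minimal sketch of the {{java.text.NumberFormat}} idea above, not the actual gbif-parsers code. The class name, the method name and the choice of candidate locales are assumptions made for illustration. Grouping is switched off because a lenient parser would otherwise read the comma in "12,5" as a US thousands separator and return 125.

```java
import java.text.NumberFormat;
import java.text.ParsePosition;
import java.util.Locale;

// Hypothetical helper; not part of GeospatialParseUtils.
final class LocaleAwareCoordinateParser {

  // Candidate locales, tried in order. This list is an assumption.
  private static final Locale[] CANDIDATES = {Locale.US, Locale.GERMANY};

  private LocaleAwareCoordinateParser() {}

  /** Returns the parsed value, or null if no candidate locale consumes the whole input. */
  static Double parse(String raw) {
    if (raw == null) {
      return null;
    }
    String trimmed = raw.trim();
    if (trimmed.isEmpty()) {
      return null;
    }
    for (Locale locale : CANDIDATES) {
      NumberFormat format = NumberFormat.getNumberInstance(locale);
      // Without this, Locale.US would silently accept "12,5" as 125.
      format.setGroupingUsed(false);
      ParsePosition position = new ParsePosition(0);
      Number number = format.parse(trimmed, position);
      // Accept the result only if the entire string was consumed.
      if (number != null && position.getIndex() == trimmed.length()) {
        return number.doubleValue();
      }
    }
    return null;
  }
}
```

With these assumptions, both "12.5" and "12,5" come back as 12.5, while input that no candidate locale can fully consume yields null, so the caller can still tell a parse failure apart from a valid value.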
    


Author: lfrancke@gbif.org
Created: 2012-11-07 17:49:53.128
Updated: 2012-11-07 17:49:53.128
        
Two things. The XML Schema specification (and the DiGIR XSD) actually mandates a period as the decimal separator, so in this case and for this example it should in theory be fixed upstream.

I agree though that it'd be nice to parse other decimal separators as well: https://en.wikipedia.org/wiki/Decimal_mark

But it's now a matter of fuzzy parsing. What does 1,000 mean? At the moment we're ignoring half of the world (my half even!) but it's not easy to fix so I won't spend more time on this at the moment.
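
The ambiguity can be made concrete with a small sketch. It assumes the JDK's default number formats for Locale.US and Locale.GERMANY, with grouping left enabled; the class and method names are hypothetical.

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.util.Locale;

// Hypothetical demonstration class; not part of gbif-parsers.
final class DecimalMarkAmbiguity {

  private DecimalMarkAmbiguity() {}

  /** Parses raw with the locale's default number format (grouping enabled). */
  static double parseWith(Locale locale, String raw) {
    try {
      return NumberFormat.getNumberInstance(locale).parse(raw).doubleValue();
    } catch (ParseException e) {
      throw new IllegalArgumentException("Not a number in " + locale + ": " + raw, e);
    }
  }

  /** True when the US and German readings of the same string differ. */
  static boolean isAmbiguous(String raw) {
    return parseWith(Locale.US, raw) != parseWith(Locale.GERMANY, raw);
  }
}
```

Here parseWith(Locale.US, "1,000") reads the comma as a thousands separator and gives 1000.0, whereas parseWith(Locale.GERMANY, "1,000") reads it as a decimal mark and gives 1.0. Both readings are valid for their locale, so the string alone does not say which one the data publisher meant.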
    


Author: jcuadra@gbif.org
Comment: But then on processing, for these records, we should set the geospatial_issue flag to something other than 0 (which means they did not have any geospatial issues).
Created: 2012-11-07 18:02:42.214
Updated: 2012-11-07 18:02:42.214
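
As a rough sketch of the flagging suggested in the last comment: a raw value that is present but cannot be parsed could produce a null coordinate together with a non-zero issue flag. Only the meaning of 0 (no geospatial issue) comes from this ticket. The class name, the value 1 and the field names are assumptions, and the real geospatial_issue encoding in the processing code may differ.

```java
// Hypothetical result type; not the actual occurrence processing code.
final class CoordinateInterpretation {

  static final int NO_ISSUE = 0;               // meaning taken from the ticket
  static final int UNPARSEABLE_COORDINATE = 1; // assumed value, for illustration only

  final Double value;        // null when nothing usable was parsed
  final int geospatialIssue; // 0 when no problem was detected

  private CoordinateInterpretation(Double value, int geospatialIssue) {
    this.value = value;
    this.geospatialIssue = geospatialIssue;
  }

  static CoordinateInterpretation interpret(String raw) {
    if (raw == null || raw.trim().isEmpty()) {
      // No coordinate was supplied, so there is nothing to flag.
      return new CoordinateInterpretation(null, NO_ISSUE);
    }
    try {
      return new CoordinateInterpretation(Double.parseDouble(raw.trim()), NO_ISSUE);
    } catch (NumberFormatException e) {
      // A value was supplied but could not be read: record that instead of staying at 0.
      return new CoordinateInterpretation(null, UNPARSEABLE_COORDINATE);
    }
  }
}
```

With this shape, interpret("12,5") yields a null value and a non-zero flag, so such records can be found with a query on geospatial_issue instead of looking like records that never had coordinates.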