Issue 14528

eventDate interpreted using wrong timezone

Reporter: kbraak
Assignee: cgendreau
Type: Bug
Summary: eventDate interpreted using wrong timezone
Priority: Critical
Resolution: Fixed
Status: Closed
Created: 2014-01-06 12:44:08.392
Updated: 2016-08-08 15:22:51.802
Resolved: 2016-08-08 15:22:51.7
        
Description: E.g.

http://api.gbif.org/v0.9/occurrence/814046005/verbatim
http://api.gbif.org/v0.9/occurrence/814046005

Our interpretation appears to read the occurrenceDate as 1992-04-07 00:00 in an assumed Copenhagen timezone (+0200 in summer), so the record is shown as 1992-04-06 22:00 UTC.
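A minimal reproduction of the shift, using plain JDK classes rather than the GBIF code or Jackson. It assumes the server runs in Europe/Copenhagen. The class and method names are hypothetical. A date-only value is parsed as local midnight and the same instant is then written in UTC, which is roughly what a GMT-based JSON serializer does:

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical demo class, not part of the GBIF codebase.
class CopenhagenShiftDemo {
    // Parses a yyyy-MM-dd value as local midnight in 'serverZone', then writes
    // that same instant in UTC (the behaviour of a GMT-based serializer).
    static String shiftedToUtc(String verbatimDate, String serverZone) {
        SimpleDateFormat parser = new SimpleDateFormat("yyyy-MM-dd", Locale.ROOT);
        parser.setTimeZone(TimeZone.getTimeZone(serverZone));
        Date instant;
        try {
            instant = parser.parse(verbatimDate);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not a yyyy-MM-dd date: " + verbatimDate, e);
        }
        SimpleDateFormat utcWriter =
            new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSZ", Locale.ROOT);
        utcWriter.setTimeZone(TimeZone.getTimeZone("UTC"));
        return utcWriter.format(instant);
    }

    public static void main(String[] args) {
        // Copenhagen is at +0200 on 1992-04-07, so midnight becomes 22:00 the day before.
        System.out.println(shiftedToUtc("1992-04-07", "Europe/Copenhagen"));
        // With a UTC server zone, the calendar day is preserved.
        System.out.println(shiftedToUtc("1992-04-07", "UTC"));
    }
}
```

The first call prints 1992-04-06T22:00:00.000+0000 and the second prints 1992-04-07T00:00:00.000+0000. The only difference between them is the zone used to interpret the date-only input.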

    


Author: trobertson@gbif.org
Created: 2014-04-29 09:42:03.661
Updated: 2014-04-29 09:42:03.661
        
Possibly related: when processing the EU BON report, I noticed spikes that fall on the year 2000 in the 2007 index, but on 1999 in the 2013 index.
Both have effectively been run through the occurrence-0.17-SNAPSHOT date handling, as follows. Note that the inputs are separate year, month and day fields. This might be unrelated, but it seems very similar to this issue:

{code}
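-- 2007 snapshot: parse the separate year/month/day fields
SELECT
  parseDate(year, month, day) d
FROM raw_2007_12_31;

-- 2013 snapshot: same function, same input fields
SELECT
  parseDate(year, month, day) d
FROM raw_2013_12_31;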
{code}
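One way a year/month/day triple could flip from 2000 to 1999 is sketched below. The helper is a hypothetical stand-in for parseDate, and whether parseDate actually behaves like this is an assumption. It builds midnight on 2000-01-01 in Copenhagen time and then reads the year of that instant in UTC:

```java
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.TimeZone;

// Hypothetical demo class; midnightMillis is NOT the real parseDate implementation.
class YearBoundaryDemo {
    // Builds midnight of the given day in 'zone' and returns epoch milliseconds.
    static long midnightMillis(int year, int month, int day, TimeZone zone) {
        Calendar cal = new GregorianCalendar(zone);
        cal.clear();                    // zero all time fields
        cal.set(year, month - 1, day);  // Calendar months are 0-based
        return cal.getTimeInMillis();
    }

    // Returns the calendar year of the instant as seen from 'zone'.
    static int yearIn(long millis, TimeZone zone) {
        Calendar cal = new GregorianCalendar(zone);
        cal.setTimeInMillis(millis);
        return cal.get(Calendar.YEAR);
    }

    public static void main(String[] args) {
        TimeZone cph = TimeZone.getTimeZone("Europe/Copenhagen");
        TimeZone utc = TimeZone.getTimeZone("UTC");
        long t = midnightMillis(2000, 1, 1, cph);
        // Copenhagen is at +0100 in January, so UTC sees 1999-12-31T23:00.
        System.out.println(yearIn(t, cph) + " vs " + yearIn(t, utc)); // 2000 vs 1999
    }
}
```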
    


Author: cgendreau
Created: 2016-01-13 16:56:56.156
Updated: 2016-01-13 16:56:56.156
        
The source of the issue is the JSON serialization: the following line makes Jackson write each Date as a String instead of a numeric timestamp:
MAPPER.configure(SerializationConfig.Feature.WRITE_DATES_AS_TIMESTAMPS, false);

Source:
https://github.com/gbif/gbif-common-ws/blob/43aae02becde20adbd4c84d9d59f830011a9bd40/src/main/java/org/gbif/ws/json/JacksonJsonContextResolver.java#L37

    


Author: cgendreau
Created: 2016-01-13 17:12:51.227
Updated: 2016-01-13 17:12:51.227
        
The solution is to call setDateFormat() on the MAPPER:

MAPPER.setDateFormat(new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS"));
Note: SimpleDateFormat is NOT thread-safe, so Apache Commons Lang FastDateFormat may be a better choice.
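On Java 8 or later, java.time's DateTimeFormatter is another thread-safe option (swapped in here in place of FastDateFormat). The sketch below shows only the formatting with the proposed zone-less pattern. Wiring it into Jackson, for example via a custom serializer, is not shown, and the class name is hypothetical:

```java
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.util.Date;
import java.util.Locale;

// Hypothetical helper class, not GBIF or Jackson API.
class ZonelessDateWriter {
    // DateTimeFormatter is immutable and thread-safe, so one shared instance
    // can serve all request threads (unlike SimpleDateFormat).
    private static final DateTimeFormatter NO_OFFSET =
        DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSS", Locale.ROOT);

    // Renders the instant as wall-clock time in 'zone', with no offset suffix.
    static String format(Date date, ZoneId zone) {
        return NO_OFFSET.format(date.toInstant().atZone(zone));
    }

    public static void main(String[] args) {
        ZoneId cph = ZoneId.of("Europe/Copenhagen");
        // The stored value from the example: local midnight in Copenhagen.
        Date stored = Date.from(LocalDate.of(1992, 4, 7).atStartOfDay(cph).toInstant());
        System.out.println(format(stored, cph)); // 1992-04-07T00:00:00.000
    }
}
```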

Ideally, I would NOT return the timezone, to avoid confusion. However, I'm not sure whether this would be a breaking change for people consuming the API response.

If we need to return it, I think we should return UTC but without applying the time difference.
For example, verbatim 1992-04-07 would return 1992-04-07T00:00:00.000+0000, NOT 1992-04-06T22:00:00.000+0000.
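A sketch of the "UTC without the time difference" option using java.time (Java 8+). The idea is to keep the verbatim calendar date and label midnight of that day as UTC, instead of converting a Copenhagen midnight into UTC. The class and method names are hypothetical:

```java
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

// Hypothetical helper class for illustration only.
class UtcWithoutShift {
    // Pattern letter 'Z' prints the offset as +HHMM, i.e. "+0000" for UTC,
    // matching the style of the current API output.
    private static final DateTimeFormatter WITH_OFFSET =
        DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSZ", Locale.ROOT);

    // Treats the verbatim yyyy-MM-dd value as midnight UTC; no zone conversion happens.
    static String asUtcMidnight(String verbatimDate) {
        LocalDate date = LocalDate.parse(verbatimDate);
        return WITH_OFFSET.format(date.atStartOfDay(ZoneOffset.UTC));
    }

    public static void main(String[] args) {
        System.out.println(asUtcMidnight("1992-04-07")); // 1992-04-07T00:00:00.000+0000
    }
}
```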

[~mdoering@gbif.org] [~fmendez@gbif.org] What do you think?

    


Author: mdoering@gbif.org
Created: 2016-01-13 17:25:31.324
Updated: 2016-01-13 17:25:31.324
        
According to the Jackson documentation, it defaults to GMT for dates:
http://wiki.fasterxml.com/JacksonFAQDateHandling

We once agreed in GBIF to use GMT for all our data, not Copenhagen time. I still think that's a good solution, and we should also state it in the API and/or processing docs.

So it appears to me that the problem occurs earlier, when we process the data and apparently create a date in the Copenhagen timezone? That should be GMT.
    


Author: cgendreau
Created: 2016-01-13 22:06:04.137
Updated: 2016-01-13 22:06:04.137
        
Jackson converts it to GMT based on the server's timezone (using Calendar), so it removes 2 hours. The date we have stored in HBase is correct.
This conversion is actually correct for probably all fields except eventDate, since that field really represents a LocalDate.

The problem appears when no time is specified and we set it to 00:00. Removing 2 hours from that moves it to the previous day.
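To illustrate why only the defaulted 00:00 case breaks, this sketch (java.time, Java 8+, hypothetical class name) applies the same Copenhagen-to-UTC conversion to midnight and to noon on the example date:

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

// Hypothetical demo class.
class MidnightBoundary {
    // Interprets a wall-clock time in the server zone and returns the UTC calendar date.
    static LocalDate utcDateOf(LocalDateTime local, ZoneId serverZone) {
        return local.atZone(serverZone).withZoneSameInstant(ZoneOffset.UTC).toLocalDate();
    }

    public static void main(String[] args) {
        ZoneId cph = ZoneId.of("Europe/Copenhagen");
        LocalDate day = LocalDate.of(1992, 4, 7);
        // No time given, defaulted to 00:00: the -2h shift crosses midnight.
        System.out.println(utcDateOf(day.atStartOfDay(), cph));         // 1992-04-06
        // An explicit time of day: the same shift stays on the same date.
        System.out.println(utcDateOf(day.atTime(LocalTime.NOON), cph)); // 1992-04-07
    }
}
```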

My suggestion is to only change the behavior for eventDate. This means I would not touch the ObjectMapper but only change the Jackson config on eventDate in the model.

Now we need to agree on what we should return.

I think we should avoid returning a timezone, since there really is no timezone. I'm not sure whether this could affect users (e.g. those parsing the date with R).

Eventually, I would like to revisit the handling of eventDate and probably add eventLocalDate to make it even clearer. But that is for a future discussion; for now we can focus on the timezone.



    


Author: cgendreau
Comment: [~jlegind@gbif.org],[~mblissett] Do you think returning eventDate: "1985-04-02T00:00:00.000", instead of eventDate: "1985-04-02T00:00:00.000+0000" could be an issue for the R, Python, JS users of the API?
Created: 2016-01-14 09:56:00.173
Updated: 2016-01-14 09:56:09.336


Author: mdoering@gbif.org
Comment: I think it's fine to return the date without a timezone. The timezone-less format is still valid ISO 8601, so it's not a real change.
Created: 2016-01-14 11:48:01.37
Updated: 2016-01-14 11:48:01.37
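To illustrate from one consumer's side (Java 8+, not the R/Python/JS clients discussed here), the proposed zone-less value parses directly with the standard ISO local date-time format. The class name is hypothetical:

```java
import java.time.LocalDate;
import java.time.LocalDateTime;

// Hypothetical consumer-side helper.
class ZonelessParse {
    // LocalDateTime.parse uses ISO_LOCAL_DATE_TIME, which accepts the
    // timezone-less value including its optional fractional seconds.
    static LocalDate eventDay(String eventDate) {
        return LocalDateTime.parse(eventDate).toLocalDate();
    }

    public static void main(String[] args) {
        // No offset means no conversion, so every client sees the same calendar day.
        System.out.println(eventDay("1985-04-02T00:00:00.000")); // 1985-04-02
    }
}
```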


Author: mblissett
Created: 2016-01-14 12:37:44.539
Updated: 2016-01-14 12:37:44.539
        
I think that's fine.  Libraries will cope, and anyone doing anything manually will probably be ignoring the timezone.

Note that we're currently at +0100 in Copenhagen; it's only +0200 during "summer" (from the last Sunday of March to the last Sunday of October). This is exactly why we shouldn't use a local timezone!
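The JDK's timezone rules can confirm the offsets mentioned above. This sketch (java.time, Java 8+, hypothetical class name) looks up the Copenhagen offset at noon on the date of this comment and on the example record's date:

```java
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.zone.ZoneRules;

// Hypothetical demo class.
class CopenhagenOffsets {
    private static final ZoneRules RULES = ZoneId.of("Europe/Copenhagen").getRules();

    // UTC offset in effect in Copenhagen at noon on the given date.
    static ZoneOffset offsetAtNoon(LocalDate date) {
        return RULES.getOffset(date.atTime(12, 0));
    }

    public static void main(String[] args) {
        System.out.println(offsetAtNoon(LocalDate.of(2016, 1, 14))); // +01:00 (winter)
        System.out.println(offsetAtNoon(LocalDate.of(1992, 4, 7)));  // +02:00 (summer)
    }
}
```

The same lookup shows the switch around the last Sunday of March 2016 (27 March). It gives +01:00 at noon on 26 March and +02:00 at noon on 27 March.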

    


Author: jlegind@gbif.org
Created: 2016-01-29 13:40:08.694
Updated: 2016-01-29 13:40:08.694
        
[~cgendreau] The date works fine without the timezone.
For Python there are many ways to skin this cat: time.strptime(mydate, "%Y-%m-%dT%H:%M:%S.%f") turns it into a named tuple that can be manipulated. The Arrow library also makes all of this easy.

It also works in R without a hitch.

    


Author: cgendreau
Created: 2016-04-15 15:13:17.211
Updated: 2016-04-15 15:13:17.211
        
This issue also prevents finding a record by its date:
http://api.gbif.org/v1/occurrence/817858116
It cannot be found even with a date range query:
http://api.gbif.org/v1/occurrence/search?eventDate=2008-04-02,2008-04-07&taxonKey=2498252
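The actual date of record 817858116 is not given above, and how the search index evaluates ranges is not described here. The sketch below therefore uses a hypothetical record dated 2008-04-02, at the lower bound of the range, together with a hypothetical inclusive day filter applied to UTC dates. It shows one way a shifted value could miss:

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.ZoneOffset;

// Hypothetical demo class; inRange is NOT the GBIF search implementation.
class RangeQueryMiss {
    // Inclusive day-range check evaluated on the UTC calendar date of the stored instant.
    static boolean inRange(Instant stored, LocalDate from, LocalDate to) {
        LocalDate utcDay = stored.atZone(ZoneOffset.UTC).toLocalDate();
        return !utcDay.isBefore(from) && !utcDay.isAfter(to);
    }

    public static void main(String[] args) {
        LocalDate from = LocalDate.of(2008, 4, 2);
        LocalDate to = LocalDate.of(2008, 4, 7);
        // Hypothetical record with verbatim date 2008-04-02.
        Instant shifted = from.atStartOfDay(ZoneId.of("Europe/Copenhagen")).toInstant(); // 2008-04-01T22:00Z
        Instant utc = from.atStartOfDay(ZoneOffset.UTC).toInstant();                     // 2008-04-02T00:00Z
        System.out.println(inRange(shifted, from, to)); // false
        System.out.println(inRange(utc, from, to));     // true
    }
}
```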
    


Author: cgendreau
Created: 2016-07-11 15:52:41.917
Updated: 2016-07-11 15:52:41.917
        
From occurrence-0.48 onwards, the parsed date is always UTC, but for the moment the JSON response will still display +0000.
Records must be reinterpreted.