Summary: eventDate interpreted using wrong timezone
Created: 2014-01-06 12:44:08.392
Updated: 2016-08-08 15:22:51.802
Resolved: 2016-08-08 15:22:51.7
Our interpretation appears to read the occurrenceDate as 1992-04-07 00:00 with an assumed Copenhagen timezone (+0200 in the summer), and so records it as 1992-04-06 22:00 UTC.
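A minimal sketch of the suspected shift, for illustration only. It assumes a fixed +02:00 offset as a stand-in for Copenhagen summer time; the helper name `to_utc_from_local_midnight` is hypothetical and is not part of the occurrence code.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed +02:00 offset stands in for Copenhagen summer time.
COPENHAGEN_SUMMER = timezone(timedelta(hours=2))


def to_utc_from_local_midnight(verbatim_date: str) -> datetime:
    """Read a date-only value as local midnight, then convert it to UTC."""
    local_midnight = datetime.strptime(verbatim_date, "%Y-%m-%d").replace(
        tzinfo=COPENHAGEN_SUMMER
    )
    return local_midnight.astimezone(timezone.utc)


shifted = to_utc_from_local_midnight("1992-04-07")
print(shifted.strftime("%Y-%m-%d %H:%M"))  # 1992-04-06 22:00
```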
Created: 2014-04-29 09:42:03.661
Updated: 2014-04-29 09:42:03.661
Possibly related: when processing the EU BON report, I notice spikes that fall on year 2000 in the 2007 index but show on 1999 in the 2013 index.
Effectively both have been run through the occurrence0.17 SNAPSHOT date handling as follows. Note that these are year, month, day. This might be unrelated, but seems highly similar to this issue:
Created: 2016-01-13 16:56:56.156
Updated: 2016-01-13 16:56:56.156
The source of the issue is the JSON serialization when the Date is turned into a String by the following line:
Created: 2016-01-13 17:12:51.227
Updated: 2016-01-13 17:12:51.227
The solution is to call setDateFormat() on the MAPPER.
Note: SimpleDateFormat is NOT thread safe, so Apache Commons Lang's FastDateFormat may be a better choice.
Ideally, I would NOT return the Timezone to avoid confusion. But, I'm not sure if this would be a breaking change for people consuming the response of the API.
If we need to return it, I think we should return UTC but without applying the time difference.
For example, a verbatim 1992-04-07 would return 1992-04-07T00:00:00.000+0000 and NOT 1992-04-06T22:00:00.000+0000.
[~email@example.com] [~firstname.lastname@example.org] What do you think?
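To make the proposal concrete, here is a rough sketch of "return UTC but without applying the time difference". It only illustrates the intended output; the real fix would live in the Jackson configuration, and the function name `format_as_utc_without_shift` is made up for this example.

```python
from datetime import datetime, timezone


def format_as_utc_without_shift(verbatim_date: str) -> str:
    """Label the verbatim date as UTC directly, with no offset arithmetic."""
    parsed = datetime.strptime(verbatim_date, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    # Build the millisecond part by hand, since strftime only offers microseconds.
    millis = f"{parsed.microsecond // 1000:03d}"
    return parsed.strftime("%Y-%m-%dT%H:%M:%S.") + millis + parsed.strftime("%z")


print(format_as_utc_without_shift("1992-04-07"))  # 1992-04-07T00:00:00.000+0000
```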
Created: 2016-01-13 17:25:31.324
Updated: 2016-01-13 17:25:31.324
Jackson says it defaults to use GMT for the dates:
We once agreed in GBIF to use GMT for all our data, not Copenhagen time. I still think that's a good solution, and we should also state it in the API and/or processing docs.
So it appears to me that the problem occurs earlier, when we process the data and apparently create a date in the Copenhagen timezone? That should be GMT.
Created: 2016-01-13 22:06:04.137
Updated: 2016-01-13 22:06:04.137
Jackson converts it to GMT based on the server time zone (using Calendar), so it subtracts 2 hours. The date we have stored in HBase is correct.
Actually this is correct for probably all fields except eventDate since this field really represents a LocalDate.
The problem appears when no time is specified and we set 0:00. From there, removing 2 hours moves it to the day before.
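A small illustration of why only date-only values are visibly affected. It assumes the server offset is a fixed +02:00 (an assumption for this sketch), and `utc_calendar_day` is a hypothetical helper: a value with a daytime hour stays on the same calendar day after conversion, while midnight falls back to the previous day.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the server runs at a fixed +02:00 offset.
SERVER_OFFSET = timezone(timedelta(hours=2))


def utc_calendar_day(local_value: datetime) -> str:
    """Return the calendar day seen after converting server-local time to UTC."""
    as_utc = local_value.replace(tzinfo=SERVER_OFFSET).astimezone(timezone.utc)
    return as_utc.date().isoformat()


date_only = datetime(1992, 4, 7, 0, 0)    # no time given, so 0:00 was filled in
with_time = datetime(1992, 4, 7, 14, 30)  # an explicit afternoon time

print(utc_calendar_day(date_only))  # 1992-04-06
print(utc_calendar_day(with_time))  # 1992-04-07
```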
My suggestion is to only change the behavior for eventDate. This means I would not touch the ObjectMapper but only change the Jackson config on eventDate in the model.
Now we need to agree on what we should return.
I think we should avoid returning a TimeZone since there is really no TimeZone. I'm not sure if this could affect the users (e.g. those parsing the date with R).
Eventually, I would like to address the handling of eventDate and probably add eventLocalDate to make it even more clear. But this will be for future discussion, now we can focus on TimeZone.
Comment: [~email@example.com],[~mblissett] Do you think returning eventDate: "1985-04-02T00:00:00.000", instead of eventDate: "1985-04-02T00:00:00.000+0000" could be an issue for the R, Python, JS users of the API?
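For reference, a sketch of the timezone-less form being proposed, treating eventDate as a plain local date. This is an illustration in Python rather than the Jackson configuration itself, and `format_local_event_date` is a hypothetical name.

```python
from datetime import date, datetime, time


def format_local_event_date(event_day: date) -> str:
    """Render a local date as midnight with milliseconds and no offset suffix."""
    return datetime.combine(event_day, time.min).isoformat(timespec="milliseconds")


print(format_local_event_date(date(1985, 4, 2)))  # 1985-04-02T00:00:00.000
```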
Created: 2016-01-14 09:56:00.173
Updated: 2016-01-14 09:56:09.336
Comment: I think it's fine to return the date without a timezone. The timezone-less format is still valid according to the ISO format, so it's not a real change.
Created: 2016-01-14 11:48:01.37
Updated: 2016-01-14 11:48:01.37
Created: 2016-01-14 12:37:44.539
Updated: 2016-01-14 12:37:44.539
I think that's fine. Libraries will cope, and anyone doing anything manually will probably be ignoring the timezone.
Note that we're currently +0100 in Copenhagen, it's only +0200 during the "summer" (last Sunday of March to last Sunday of October). And this is why we shouldn't use a local timezone!
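As a rough sketch of the rule quoted above (last Sunday of March to last Sunday of October), the offset can be approximated per calendar day. This is a simplification: it ignores the exact switch-over hour, applies the rule as stated even though historic years may have used different end dates, and both function names are hypothetical.

```python
import calendar
from datetime import date, timedelta


def last_sunday(year: int, month: int) -> date:
    """Return the last Sunday of the given month."""
    last_day = date(year, month, calendar.monthrange(year, month)[1])
    # weekday() is Monday=0 .. Sunday=6, so step back to the nearest Sunday.
    return last_day - timedelta(days=(last_day.weekday() + 1) % 7)


def copenhagen_offset_hours(day: date) -> int:
    """Approximate offset: +2 in summer time, +1 otherwise (day granularity)."""
    start = last_sunday(day.year, 3)
    end = last_sunday(day.year, 10)
    return 2 if start <= day < end else 1


print(copenhagen_offset_hours(date(2016, 1, 14)))  # 1
print(copenhagen_offset_hours(date(2016, 7, 11)))  # 2
```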
Created: 2016-01-29 13:40:08.694
Updated: 2016-01-29 13:40:08.694
[~cgendreau] The date works fine without the timezone.
For Python there are many ways to skin this cat: time.strptime(mydate, "%Y-%m-%dT%H:%M:%S.%f") turns it into a named tuple that can be manipulated. There is also the Arrow library, which makes all this easy.
It also works in R without a hitch.
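A minimal sketch of a client that copes with both shapes of the response, using only the standard library. The helper name `parse_event_date` and the two sample strings are illustrative; they follow the formats discussed in this thread.

```python
from datetime import datetime

# Try the form with a numeric offset first, then the timezone-less form.
FORMATS = ("%Y-%m-%dT%H:%M:%S.%f%z", "%Y-%m-%dT%H:%M:%S.%f")


def parse_event_date(value: str) -> datetime:
    """Parse eventDate with or without a trailing offset such as +0000."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except ValueError:
            continue
    raise ValueError(f"Unrecognised eventDate: {value!r}")


with_offset = parse_event_date("1985-04-02T00:00:00.000+0000")
without_offset = parse_event_date("1985-04-02T00:00:00.000")
print(with_offset.date() == without_offset.date())  # True
```

Only the first result carries a tzinfo; the second is a naive datetime, which matches the idea that eventDate is really a local date.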
Created: 2016-04-15 15:13:17.211
Updated: 2016-04-15 15:13:17.211
This issue also prevents finding a record by its date:
It cannot be found even with a date range query:
Created: 2016-07-11 15:52:41.917
Updated: 2016-07-11 15:52:41.917
From occurrence-0.48, the parsed date is always UTC but the JSON response will still display +0000 for the moment.
Records must be reinterpreted.