Issue 14165

Misinterpretation of absence data

Reporter: ahahn
Assignee: jlegind
Type: Task
Summary: Misinterpretation of absence data
Priority: Major
Resolution: WontFix
Status: Closed
Created: 2013-10-07 13:15:43.518
Updated: 2016-02-14 12:11:52.307
Resolved: 2014-09-10 10:50:45.545
Description: See (you may need to fix the URLs):
e.g. a herring in Leeds -

The data set page indicates that a unit of -1 means “Land” and 0 means no data:

To us, both mean “present” …


Created: 2013-10-07 13:38:45.999
Updated: 2013-10-07 13:38:45.999
This issue, for now, probably needs fixing on the data publisher side (filter absence records from the published data):

Our current index cannot interpret absence data, i.e. records that state that something was _not_ found in a specific place. We have been discussing this for a while, but for the time being a published record means "occurrence" to us, hence "presence", not "absence". Their metadata explains that an empty field means no data and -1 means land, but we currently have no way to interpret that, and neither will other consumers of the DiGIR-wrapped dataset. For that reason, such records should be filtered at source so that the wrapper does not transmit them. Otherwise they all appear on maps, as in the given example.
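As a rough illustration of what "filtered at source" could look like on the publisher side, here is a minimal Python sketch. It is not the publisher's actual export code. The field name `unit`, the record layout, and the function names are hypothetical, and the set of sentinel values (empty, -1, 0) is an assumption pieced together from the dataset description quoted in this issue.

```python
# Sentinel values assumed to mean "no observation" (hypothetical; adjust to
# whatever the dataset's own metadata actually defines).
NON_PRESENCE_VALUES = {"", "-1", "0"}


def is_presence(record, count_field="unit"):
    """Return True if the record looks like an actual observation.

    A missing field, None, an empty string, or one of the sentinel values
    is treated as "not a presence record".
    """
    raw = record.get(count_field)
    value = "" if raw is None else str(raw).strip()
    return value not in NON_PRESENCE_VALUES


def filter_presence(records, count_field="unit"):
    """Keep only the records that should be published as occurrences."""
    return [r for r in records if is_presence(r, count_field)]


if __name__ == "__main__":
    sample = [
        {"name": "Clupea harengus", "unit": "12"},  # real observation
        {"name": "Clupea harengus", "unit": "-1"},  # sentinel: land
        {"name": "Clupea harengus", "unit": ""},    # sentinel: no data
    ]
    for rec in filter_presence(sample):
        print(rec)
```

Applying a filter like this before records reach the wrapper would keep the sentinel rows in the original archive while preventing them from being indexed as presences.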


Comment: The PANGAEA contacts have been notified about this issue. They have also been asked to review their other resources for similar patterns.
Created: 2013-10-11 12:01:45.268
Updated: 2013-10-11 12:01:45.268

Created: 2013-10-28 14:41:19.593
Updated: 2013-10-28 15:02:32.189
The contact communicated that data sets served to GBIF are filtered down to the occurrence data. In their view, the content that reaches GBIF is actually something between data and metadata. A great part of the overall context is lost, including the reference to the original source of the data, when compiled data are downloaded from GBIF.

He further specified that, in a scientific sense, the information that a species has not been observed in a specific temporal or geographical range also has value, in particular when a range of species was investigated.
He thinks that, to improve the GBIF services, things need to change on both sides: GBIF and the data providers.