Issue 11099

Logging geospatial issues: combination of several issues not logged to occurrence_record.geospatial_issue

Reporter: ahahn
Assignee: lfrancke
Type: Bug
Summary: Logging geospatial issues: combination of several issues not logged to occurrence_record.geospatial_issue
Priority: Major
Resolution: WontFix
Status: Closed
Created: 2012-05-15 14:47:31.286
Updated: 2013-12-17 15:23:03.887
Resolved: 2013-09-13 11:14:51.058
Description: Geospatial issues detected during rollover processing get logged in two places: as an entry in gbif_log_message, and in occurrence_record.geospatial_issue. The geospatial_issue flag value encodes issues concerning coordinates, country names, altitude and depth. It is a single integer value, but it can encode either a single issue or a combination of two or more detected issues.
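
To illustrate how a single integer can carry several issues at once, here is a minimal sketch of decoding a geospatial_issue value bit by bit. The class GeospatialIssueDecoder and its methods are hypothetical helpers written for this ticket, not part of the rollover code; the flag names and hex values are the six handled constants quoted in the comment further down.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of the rollover code base.
final class GeospatialIssueDecoder {

  // Bit value -> flag name, using the six constants quoted in this issue.
  private static final Map<Integer, String> FLAGS = new LinkedHashMap<>();
  static {
    FLAGS.put(0x01, "GEOSPATIAL_PRESUMED_NEGATED_LATITUDE");
    FLAGS.put(0x02, "GEOSPATIAL_PRESUMED_NEGATED_LONGITUDE");
    FLAGS.put(0x04, "GEOSPATIAL_PRESUMED_INVERTED_COORDINATES");
    FLAGS.put(0x08, "GEOSPATIAL_ZERO_COORDINATES");
    FLAGS.put(0x10, "GEOSPATIAL_COORDINATES_OUT_OF_RANGE");
    FLAGS.put(0x20, "GEOSPATIAL_COUNTRY_COORDINATE_MISMATCH");
  }

  private GeospatialIssueDecoder() {}

  // Returns the name of every known flag whose bit is set in the stored value.
  static List<String> decode(int geospatialIssue) {
    List<String> names = new ArrayList<>();
    for (Map.Entry<Integer, String> flag : FLAGS.entrySet()) {
      if ((geospatialIssue & flag.getKey()) != 0) {
        names.add(flag.getValue());
      }
    }
    return names;
  }

  // A value with more than one bit set is a combination of several issues.
  static boolean isCombination(int geospatialIssue) {
    return Integer.bitCount(geospatialIssue) > 1;
  }
}
```

Under this encoding, a record with both a negated latitude and a negated longitude would be stored as 0x01 | 0x02 = 3, and decode(3) would list both flag names. Values such as 3 are what one would expect to see among the distinct geospatial_issue values if combinations were being written.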

In the index version live on May 15, 2012, a SELECT DISTINCT on occurrence_record.geospatial_issue suggests that the field contains only single issues and no combinations of several detected issues. In the past there were always records with, e.g., both GEOSPATIAL_PRESUMED_NEGATED_LATITUDE and GEOSPATIAL_PRESUMED_NEGATED_LONGITUDE, so it is suspicious that there are now none at all. This looks like something that the current rollover processing does not handle.

select geospatial_issue,count(*) from occurrence_record where geospatial_issue != 0 group by 1;

flag value | occurrence record count -> this flag stands for hex=issue description

Attachment: file explaining the interpretation of geospatial issue flags and combinations of issues
Attachment HowToInterpretGeospatialIssueFlags.doc

Created: 2013-03-06 10:24:12.869
Updated: 2013-03-06 10:24:12.869
So on the geospatial issue, here is a summary of the current state:

We only have the following single (!) values encoded in the rollover:
  public static final int GEOSPATIAL_PRESUMED_NEGATED_LATITUDE      = 0x01;
  public static final int GEOSPATIAL_PRESUMED_NEGATED_LONGITUDE     = 0x02;
  public static final int GEOSPATIAL_PRESUMED_INVERTED_COORDINATES  = 0x04;
  public static final int GEOSPATIAL_ZERO_COORDINATES               = 0x08;
  public static final int GEOSPATIAL_COORDINATES_OUT_OF_RANGE       = 0x10;
  public static final int GEOSPATIAL_COUNTRY_COORDINATE_MISMATCH    = 0x20;

And there is no code to handle the following:
  public static final int GEOSPATIAL_UNKNOWN_COUNTRY_NAME               = 0x40;
  public static final int GEOSPATIAL_ALTITUDE_OUT_OF_RANGE              = 0x80;
  public static final int GEOSPATIAL_PRESUMED_ERRONOUS_ALTITUDE         = 0x100;
  public static final int GEOSPATIAL_PRESUMED_MIN_MAX_ALTITUDE_REVERSED = 0x200;
  public static final int GEOSPATIAL_PRESUMED_DEPTH_IN_FEET             = 0x400;
  public static final int GEOSPATIAL_DEPTH_OUT_OF_RANGE                 = 0x800;
  public static final int GEOSPATIAL_PRESUMED_MIN_MAX_DEPTH_REVERSED    = 0x1000;
  public static final int GEOSPATIAL_PRESUMED_ALTITUDE_IN_FEET          = 0x2000;
  public static final int GEOSPATIAL_PRESUMED_ALTITUDE_NON_NUMERIC      = 0x4000;
  public static final int GEOSPATIAL_PRESUMED_DEPTH_NON_NUMERIC         = 0x8000;

So to fix the situation, two things need to happen:

1) We need to handle the missing values
2) We need to merge the detected issues into one value rather than store single values (I think this is rooted in this class)

I suspect we need to craft a new UDF that takes depth and altitude as well, and then builds it properly.
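
As a rough sketch of what such a function might compute, the example below accumulates flags with bitwise OR so that several issues end up in one value. Everything here is an assumption made for illustration: the class GeospatialIssueSketch, the evaluate signature, and the altitude and depth limits are hypothetical placeholders, and only four of the constants above are covered. It is a plain static method that a UDF could wrap, not the rollover implementation.

```java
// Hypothetical sketch; names, signature and limits are assumptions, not the rollover code.
final class GeospatialIssueSketch {

  // Constants as quoted in this issue.
  static final int GEOSPATIAL_ZERO_COORDINATES = 0x08;
  static final int GEOSPATIAL_COORDINATES_OUT_OF_RANGE = 0x10;
  static final int GEOSPATIAL_ALTITUDE_OUT_OF_RANGE = 0x80;
  static final int GEOSPATIAL_DEPTH_OUT_OF_RANGE = 0x800;

  // Placeholder limits in metres, chosen only for the example.
  static final int MIN_ALTITUDE = -1000;
  static final int MAX_ALTITUDE = 10000;
  static final int MAX_DEPTH = 12000;

  private GeospatialIssueSketch() {}

  // Null arguments mean "value not supplied" and are skipped.
  static int evaluate(Double latitude, Double longitude, Integer altitude, Integer depth) {
    int issues = 0;

    if (latitude != null && longitude != null) {
      double lat = latitude;
      double lng = longitude;
      if (lat == 0.0 && lng == 0.0) {
        issues |= GEOSPATIAL_ZERO_COORDINATES;
      }
      if (Math.abs(lat) > 90.0 || Math.abs(lng) > 180.0) {
        issues |= GEOSPATIAL_COORDINATES_OUT_OF_RANGE;
      }
    }

    // Each check ORs its bit into the result instead of overwriting it.
    if (altitude != null && (altitude < MIN_ALTITUDE || altitude > MAX_ALTITUDE)) {
      issues |= GEOSPATIAL_ALTITUDE_OUT_OF_RANGE;
    }
    if (depth != null && (depth < 0 || depth > MAX_DEPTH)) {
      issues |= GEOSPATIAL_DEPTH_OUT_OF_RANGE;
    }
    return issues;
  }
}
```

With these placeholder limits, evaluate(95.0, 10.0, 20000, null) yields 0x10 | 0x80 = 0x90, a combined value carrying both a coordinate issue and an altitude issue.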

Comment: If I understand correctly this should be fixed by the new realtime architecture.
Created: 2013-09-13 11:14:51.08
Updated: 2013-09-13 11:14:51.08