Issue 11099

Logging geospatial issues: combination of several issues not logged to occurrence_record.geospatial_issue

Reporter: ahahn
Assignee: lfrancke
Type: Bug
Summary: Logging geospatial issues: combination of several issues not logged to occurrence_record.geospatial_issue
Priority: Major
Resolution: WontFix
Status: Closed
Created: 2012-05-15 14:47:31.286
Updated: 2013-12-17 15:23:03.887
Resolved: 2013-09-13 11:14:51.058
        
Description: Geospatial issues detected during rollover processing are logged in two places: as an entry in gbif_log_message, and in occurrence_record.geospatial_issue. The geospatial_issue flag encodes issues concerning coordinates, country names, altitude and depth. It is a single integer value, but it can encode either a single issue or a combination of two or more detected issues.
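
To illustrate how a single integer can carry several issues at once, here is a minimal Java sketch. It assumes the flag is a plain bitmask, as the hex values quoted in this ticket suggest. The class name GeospatialIssueFlagsSketch and the methods combine and decode are hypothetical helpers for illustration and are not taken from the GBIF code base; only the constant names and values come from this ticket.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of the GBIF code base.
final class GeospatialIssueFlagsSketch {
  // Constant names and values as quoted in this ticket.
  static final int GEOSPATIAL_PRESUMED_NEGATED_LATITUDE     = 0x01;
  static final int GEOSPATIAL_PRESUMED_NEGATED_LONGITUDE    = 0x02;
  static final int GEOSPATIAL_PRESUMED_INVERTED_COORDINATES = 0x04;
  static final int GEOSPATIAL_ZERO_COORDINATES              = 0x08;
  static final int GEOSPATIAL_COORDINATES_OUT_OF_RANGE      = 0x10;
  static final int GEOSPATIAL_COUNTRY_COORDINATE_MISMATCH   = 0x20;

  // Insertion-ordered so decode() reports issues from the lowest bit upwards.
  private static final Map<Integer, String> NAMES = new LinkedHashMap<>();
  static {
    NAMES.put(GEOSPATIAL_PRESUMED_NEGATED_LATITUDE, "GEOSPATIAL_PRESUMED_NEGATED_LATITUDE");
    NAMES.put(GEOSPATIAL_PRESUMED_NEGATED_LONGITUDE, "GEOSPATIAL_PRESUMED_NEGATED_LONGITUDE");
    NAMES.put(GEOSPATIAL_PRESUMED_INVERTED_COORDINATES, "GEOSPATIAL_PRESUMED_INVERTED_COORDINATES");
    NAMES.put(GEOSPATIAL_ZERO_COORDINATES, "GEOSPATIAL_ZERO_COORDINATES");
    NAMES.put(GEOSPATIAL_COORDINATES_OUT_OF_RANGE, "GEOSPATIAL_COORDINATES_OUT_OF_RANGE");
    NAMES.put(GEOSPATIAL_COUNTRY_COORDINATE_MISMATCH, "GEOSPATIAL_COUNTRY_COORDINATE_MISMATCH");
  }

  // Merge any number of single-issue values into one flag with bitwise OR.
  static int combine(int... issues) {
    int flag = 0;
    for (int issue : issues) {
      flag |= issue;
    }
    return flag;
  }

  // List the names of all issues whose bit is set in the flag.
  static List<String> decode(int flag) {
    List<String> result = new ArrayList<>();
    for (Map.Entry<Integer, String> entry : NAMES.entrySet()) {
      if ((flag & entry.getKey()) != 0) {
        result.add(entry.getValue());
      }
    }
    return result;
  }

  public static void main(String[] args) {
    int flag = combine(GEOSPATIAL_PRESUMED_NEGATED_LATITUDE, GEOSPATIAL_PRESUMED_NEGATED_LONGITUDE);
    // prints: 3 -> [GEOSPATIAL_PRESUMED_NEGATED_LATITUDE, GEOSPATIAL_PRESUMED_NEGATED_LONGITUDE]
    System.out.println(flag + " -> " + decode(flag));
  }
}
```

Under this reading, a record with both negated latitude and negated longitude would carry the value 3, which is not a power of two. A column holding only the values 1, 2, 4, 8, 16 and 32 would therefore contain no combined issues.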

In the index version live on May 15, 2012, a SELECT DISTINCT on occurrence_record.geospatial_issue suggests that the field contains only single issues, never a combination of several detected issues. In the past there were always records with, e.g., both GEOSPATIAL_PRESUMED_NEGATED_LATITUDE and GEOSPATIAL_PRESUMED_NEGATED_LONGITUDE, so it is suspicious that there are now none at all. This looks like something the current rollover processing does not handle.

select geospatial_issue,count(*) from occurrence_record where geospatial_issue != 0 group by 1;

flag value | occurrence record count -> hex value=issue description
 1 |    42092 -> 0x01=GEOSPATIAL_PRESUMED_NEGATED_LATITUDE
 2 |   334324 -> 0x02=GEOSPATIAL_PRESUMED_NEGATED_LONGITUDE
 4 |    23952 -> 0x04=GEOSPATIAL_PRESUMED_INVERTED_COORDINATES
 8 |   241202 -> 0x08=GEOSPATIAL_ZERO_COORDINATES
16 |   147551 -> 0x10=GEOSPATIAL_COORDINATES_OUT_OF_RANGE
32 |  9424169 -> 0x20=GEOSPATIAL_COUNTRY_COORDINATE_MISMATCH

Attachment: file explaining the interpretation of geospatial issue flags and combinations of issues
    
Attachment HowToInterpretGeospatialIssueFlags.doc


Author: trobertson@gbif.org
Created: 2013-03-06 10:24:12.869
Updated: 2013-03-06 10:24:12.869
        
So on the geospatial issue, here is a summary of the current state:

We only have the following single (!) values encoded in the rollover:
  public static final int GEOSPATIAL_PRESUMED_NEGATED_LATITUDE      = 0x01;
  public static final int GEOSPATIAL_PRESUMED_NEGATED_LONGITUDE     = 0x02;
  public static final int GEOSPATIAL_PRESUMED_INVERTED_COORDINATES  = 0x04;
  public static final int GEOSPATIAL_ZERO_COORDINATES               = 0x08;
  public static final int GEOSPATIAL_COORDINATES_OUT_OF_RANGE       = 0x10;
  public static final int GEOSPATIAL_COUNTRY_COORDINATE_MISMATCH    = 0x20;

And there is no code to handle the following:
  public static final int GEOSPATIAL_UNKNOWN_COUNTRY_NAME               = 0x40;
  public static final int GEOSPATIAL_ALTITUDE_OUT_OF_RANGE              = 0x80;
  public static final int GEOSPATIAL_PRESUMED_ERRONOUS_ALTITUDE         = 0x100;
  public static final int GEOSPATIAL_PRESUMED_MIN_MAX_ALTITUDE_REVERSED = 0x200;
  public static final int GEOSPATIAL_PRESUMED_DEPTH_IN_FEET             = 0x400;
  public static final int GEOSPATIAL_DEPTH_OUT_OF_RANGE                 = 0x800;
  public static final int GEOSPATIAL_PRESUMED_MIN_MAX_DEPTH_REVERSED    = 0x1000;
  public static final int GEOSPATIAL_PRESUMED_ALTITUDE_IN_FEET          = 0x2000;
  public static final int GEOSPATIAL_PRESUMED_ALTITUDE_NON_NUMERIC      = 0x4000;
  public static final int GEOSPATIAL_PRESUMED_DEPTH_NON_NUMERIC         = 0x8000;

So to fix the situation, two things need to happen:

1) We need to handle the missing values
2) We need to merge all detected issues into one combined value instead of storing single values (I think the current behaviour is rooted in this class: https://code.google.com/p/gbif-occurrencestore/source/browse/trunk/occurrence-store/src/main/java/org/gbif/occurrencestore/hive/udf/CoordinateParsingUDF.java )

I suspect we need to craft a new UDF that takes depth and altitude as well, and then builds it properly.
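
As a rough sketch of what "building it properly" could look like, the following plain Java method runs several independent checks and ORs every detected issue into one value instead of returning only the first one. It is not a Hive UDF and does not reproduce CoordinateParsingUDF. The class name GeospatialIssueMergeSketch, the method evaluate, and the altitude and depth limits are all assumptions made up for illustration; only the constant names and values come from the lists above.

```java
// Hypothetical sketch for illustration only; not the actual CoordinateParsingUDF.
final class GeospatialIssueMergeSketch {
  // Constant names and values as quoted in this ticket.
  static final int GEOSPATIAL_ZERO_COORDINATES         = 0x08;
  static final int GEOSPATIAL_COORDINATES_OUT_OF_RANGE = 0x10;
  static final int GEOSPATIAL_ALTITUDE_OUT_OF_RANGE    = 0x80;
  static final int GEOSPATIAL_DEPTH_OUT_OF_RANGE       = 0x800;

  // Placeholder limits, assumed only so the example runs; not GBIF's real thresholds.
  static final double MIN_ALTITUDE_METRES = -1_000;
  static final double MAX_ALTITUDE_METRES = 10_000;
  static final double MAX_DEPTH_METRES    = 12_000;

  // Evaluate every check and accumulate the results in a single bitmask.
  // altitude and depth may be null when the record does not supply them.
  static int evaluate(double latitude, double longitude, Double altitude, Double depth) {
    int issues = 0;
    if (latitude == 0.0 && longitude == 0.0) {
      issues |= GEOSPATIAL_ZERO_COORDINATES;
    }
    if (Math.abs(latitude) > 90 || Math.abs(longitude) > 180) {
      issues |= GEOSPATIAL_COORDINATES_OUT_OF_RANGE;
    }
    if (altitude != null && (altitude < MIN_ALTITUDE_METRES || altitude > MAX_ALTITUDE_METRES)) {
      issues |= GEOSPATIAL_ALTITUDE_OUT_OF_RANGE;
    }
    if (depth != null && (depth < 0 || depth > MAX_DEPTH_METRES)) {
      issues |= GEOSPATIAL_DEPTH_OUT_OF_RANGE;
    }
    return issues;
  }

  public static void main(String[] args) {
    // Zero coordinates plus an implausible altitude yield a combined value.
    int issues = evaluate(0.0, 0.0, 20_000.0, null);
    // prints: 0x88
    System.out.println("0x" + Integer.toHexString(issues));
  }
}
```

The point of the sketch is only the accumulation pattern: each check contributes its own bit, so a record with two problems ends up with a value such as 0x88 rather than 0x08 or 0x80 alone.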
    


Author: lfrancke@gbif.org
Comment: If I understand correctly, this should be fixed by the new realtime architecture.
Created: 2013-09-13 11:14:51.08
Updated: 2013-09-13 11:14:51.08