Issue 12818

Ensure that geospatial issue is correctly handled

Reporter: trobertson
Assignee: mdoering
Type: Task
Summary: Ensure that geospatial issue is correctly handled
Priority: Major
Resolution: Fixed
Status: Closed
Created: 2013-02-27 10:09:04.733
Updated: 2013-08-29 14:45:17.972
Resolved: 2013-03-20 16:07:48.262
        
Description: In the occurrence handling for georeferenced coordinates, we check that the point falls in the stated country (for example).  This information is stored in the geospatial issue flag.

When doing searches / downloads for bounding boxes, or polygons etc, we therefore need to respect this flag.

I propose that the UI have a checkbox
"include records that have:
  [x] no known coordinate issues
  [ ] known issues with coordinates"

This would then control the geospatial issue filter (=0 or !=0) in SOLR and in the Hive query (AND geospatial_issue=0 / AND geospatial_issue!=0, with appropriate logic).
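
A minimal sketch of how the two checkboxes might map onto the Hive predicate. The class and method names are hypothetical, and returning a match-nothing clause when neither box is ticked is an assumption, not agreed behaviour:

```java
/** Hypothetical helper: maps the two proposed checkboxes to a Hive WHERE fragment. */
final class GeospatialIssueFilter {

  private GeospatialIssueFilter() {}

  /**
   * @param includeClean  "no known coordinate issues" is ticked
   * @param includeIssues "known issues with coordinates" is ticked
   * @return a fragment to append to the WHERE clause; empty when no filter is needed
   */
  static String hivePredicate(boolean includeClean, boolean includeIssues) {
    if (includeClean && includeIssues) {
      return ""; // both ticked: every record qualifies
    }
    if (includeClean) {
      return " AND geospatial_issue = 0";
    }
    if (includeIssues) {
      return " AND geospatial_issue != 0";
    }
    return " AND 1 = 0"; // neither ticked: match nothing (assumed behaviour)
  }
}
```

If geospatial_issue can be NULL, the "no known issues" case would probably need to be (geospatial_issue IS NULL OR geospatial_issue = 0), since a plain comparison does not match NULL rows.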

It is useful, for example, to download a dataset's records with known issues to help people clean their content, so the explicit flag is good to have.

[~mdoering] for information.

    


Author: mdoering@gbif.org
Created: 2013-02-27 10:15:48.107
Updated: 2013-02-27 10:15:48.107
        
For the UI it's good enough to have a single checkbox, no?
{noformat}
[x] include doubtful coordinates
{noformat}

and then an info popup to explain what these doubts/issues potentially are?

    


Author: fmendez@gbif.org
Created: 2013-02-27 10:17:16.131
Updated: 2013-02-27 10:17:16.131
        
OK, FYI: the occurrence index uses the following logic to decide whether the coordinate field is indexed:
if ((latitude != null || longitude != null) && (geospatialIssue == null || geospatialIssue == 0)) {
  doc.setField(COORDINATE.getFieldName(), COORD_JOINER.join(latitude, longitude));
} else {
  doc.setField(COORDINATE.getFieldName(), null);
}
    


Author: fmendez@gbif.org
Comment: According to https://code.google.com/p/gbif-dataportal/source/browse/trunk/portal-core/src/main/java/org/gbif/portal/util/db/OccurrenceRecordUtils.java the values related to coordinate issues are 0x01, 0x02, 0x04, 0x08, 0x10 and 0x20; the remaining values relate to altitude, depth and country name. Are these the same values we are using for the new occurrence/HBase table?
Created: 2013-03-05 14:52:39.17
Updated: 2013-03-05 14:52:39.17


Author: fmendez@gbif.org
Created: 2013-03-05 16:47:14.722
Updated: 2013-03-05 16:55:19.807
        
There's a problem with implementing this issue: for the datatype we use to store the geospatial search fields, Solr only accepts real coordinates. We can't store coordinates from records with geospatial issues such as GEOSPATIAL_COORDINATES_OUT_OF_RANGE = 0x10 or GEOSPATIAL_PRESUMED_INVERTED_COORDINATES = 0x04, because the latitude and longitude values of those documents could be beyond the allowed limits (latitude -90 to 90, longitude -180 to 180). This means that we are only storing records without geospatial issues, so the filter "[x] no known coordinate issues / [ ] known issues with coordinates" couldn't have any impact on the search results.
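
One possible way around the limit problem, sketched under assumptions: test the values themselves against the allowed ranges before indexing, so that a flagged record whose coordinates are still in range could be stored. The class and method names are hypothetical and this is not what the index builder currently does:

```java
/** Hypothetical range check run before a coordinate is handed to the index. */
final class CoordinateRange {

  private CoordinateRange() {}

  /** True only when both values are present and inside the limits quoted above. */
  static boolean isIndexable(Double latitude, Double longitude) {
    if (latitude == null || longitude == null) {
      return false;
    }
    // NaN fails every comparison, so it is rejected here as well.
    return latitude >= -90.0 && latitude <= 90.0
        && longitude >= -180.0 && longitude <= 180.0;
  }
}
```

Records that fail this check would still have no indexed coordinate, so they could only be found through the issue flag, not through a bounding box or polygon search.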

Maybe the indexing logic has an error: the occurrence index builder is excluding records whose geospatial issues concern fields other than latitude and longitude. To avoid this, the validation must be something like the following (I overlooked this! I assumed that the geospatial issues related only to coordinates):
if ((latitude != null || longitude != null)
    && (geospatialIssue == null || geospatialIssue == 0 || geospatialIssue > 0x10)) { // geospatial issues greater than 0x10 relate to altitude, depth and country names
  doc.setField(COORDINATE.getFieldName(), COORD_JOINER.join(latitude, longitude));
} else {
  doc.setField(COORDINATE.getFieldName(), null);
}
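
A possible refinement, assuming geospatialIssue is a bit field in which several flags can be set at once (which the power-of-two values suggest, but which is not confirmed here). A numeric comparison such as "> 0x10" would then let through 0x20, listed earlier in this thread as a coordinate issue, as well as combined values such as 0x41. A mask test avoids both. The class, constant and method names below are hypothetical:

```java
/** Hypothetical mask-based variant of the indexing check. */
final class CoordinateIssueMask {

  private CoordinateIssueMask() {}

  // Coordinate-related flags as listed earlier in this thread (assumed complete).
  static final int COORDINATE_ISSUE_MASK = 0x01 | 0x02 | 0x04 | 0x08 | 0x10 | 0x20; // 0x3F

  /** True when at least one coordinate-related bit is set. */
  static boolean hasCoordinateIssue(Integer geospatialIssue) {
    return geospatialIssue != null && (geospatialIssue & COORDINATE_ISSUE_MASK) != 0;
  }

  /**
   * Index the coordinate only when both values are present and no
   * coordinate-related flag is set; altitude, depth and country flags are ignored.
   */
  static boolean shouldIndexCoordinate(Double latitude, Double longitude, Integer geospatialIssue) {
    return latitude != null && longitude != null && !hasCoordinateIssue(geospatialIssue);
  }
}
```

Note that this sketch requires both latitude and longitude to be non-null, whereas the check above accepts a record when either one is present.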