Issue 14271

Logging geospatial issues: combination of several issues not logged

14271
Reporter: ahahn
Assignee: mblissett
Type: Task
Summary: Logging geospatial issues: combination of several issues not logged
Priority: Major
Status: Open
Created: 2013-10-22 10:39:20.578
Updated: 2016-02-05 16:25:18.672
        
Description: When checking georeferenced records for potential errors, a combination of several errors may be spotted (e.g. "country coordinate mismatch" plus "presumed negated longitude"). The intention of the geospatial issue flag in the old index was to report all combinations of issues in the single geospatial_issue field.

At present, the new index also only contains one field for such issues, but seems to only report on isolated issues:

- select geospatial_issue, count(*) from occurrence_hdfs where geospatial_issue is not null group by geospatial_issue;

results in:
geospatial_issue	count(*)
0	357018257
1	63335 (= 0x01, GEOSPATIAL_PRESUMED_NEGATED_LATITUDE)
2	424530 (= 0x02, GEOSPATIAL_PRESUMED_NEGATED_LONGITUDE)
4	59430 (= 0x04, GEOSPATIAL_PRESUMED_INVERTED_COORDINATES)
8	239488 (= 0x08, GEOSPATIAL_ZERO_COORDINATES)
16	164155 (= 0x10, GEOSPATIAL_COORDINATES_OUT_OF_RANGE)
32	8416302 (= 0x20, GEOSPATIAL_COUNTRY_COORDINATE_MISMATCH)

-> combinations of issues do not seem to exist at all, and neither do issues > 32 (e.g. unknown country name, altitude out of range etc)
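
For illustration, a minimal Python sketch of how a combined value in a single geospatial_issue field could be interpreted. The bit values and names are the ones listed in the query result above; the class and the helper functions `decode` and `is_combination` are hypothetical and not part of the actual index code.

```python
from enum import IntFlag


class GeospatialIssue(IntFlag):
    # Bit values as listed in the query result above.
    GEOSPATIAL_PRESUMED_NEGATED_LATITUDE = 0x01
    GEOSPATIAL_PRESUMED_NEGATED_LONGITUDE = 0x02
    GEOSPATIAL_PRESUMED_INVERTED_COORDINATES = 0x04
    GEOSPATIAL_ZERO_COORDINATES = 0x08
    GEOSPATIAL_COORDINATES_OUT_OF_RANGE = 0x10
    GEOSPATIAL_COUNTRY_COORDINATE_MISMATCH = 0x20


def decode(value):
    """Return the names of all issues whose bit is set in value (hypothetical helper)."""
    return [flag.name for flag in GeospatialIssue if value & flag.value]


def is_combination(value):
    """True if more than one bit is set, i.e. the value encodes several issues."""
    return value & (value - 1) != 0


# "country coordinate mismatch" plus "presumed negated longitude" would be 0x20 | 0x02 = 34
combined = 0x20 | 0x02
print(combined, decode(combined), is_combination(combined))
```

Under this reading, every value in the result above (0, 1, 2, 4, 8, 16, 32) has at most one bit set, which is why no combinations appear to be stored.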

We should at least try to re-introduce the combination of several spotted coordinate issues (whether in a combined field as before, or as individual flags), and at least evaluate whether the same should be done for the country-name, altitude and depth issue flags.

This was closed as won't fix on ROL-13, but remains an issue in the new setup, as we must be underreporting geospatial issues. From the proportion of issues reported within georeferenced records (about 2.7%, comparable to the proportion since 2007) I assume that it is a "first-one-(or last-one-)wins", rather than no issue reported at all.
    
Attachment HowToInterpretGeospatialIssueFlags.doc


Author: ahahn@gbif.org
Created: 2013-10-22 10:44:02.189
Updated: 2013-10-22 10:44:02.189
        
Also see "Tim Robertson added a comment - 06/Mar/13 10:24" on ROL-13

    


Author: omeyn@gbif.org
Comment: Your assumptions are all correct, Andrea, and the new processing makes the same mistakes as the rollover. I'm not sure if it's logged here, but Tim and I have talked about it a bit and will probably move to a flag-per-issue model rather than the current combination model. That would make life easier for searching specific issues. Will be part of the general Occurrence refactor coming up in Nov/Dec 2013.
Created: 2013-10-22 11:16:17.185
Updated: 2013-10-22 11:16:17.185
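
A rough sketch of what a flag-per-issue model might look like, compared with the combined value. This is an assumption for illustration only: the mapping `ISSUE_BITS`, the function `to_flag_columns`, and the sample records are hypothetical and do not reflect the planned Occurrence refactor.

```python
# Bit values taken from the issue description; everything else is hypothetical.
ISSUE_BITS = {
    0x01: "GEOSPATIAL_PRESUMED_NEGATED_LATITUDE",
    0x02: "GEOSPATIAL_PRESUMED_NEGATED_LONGITUDE",
    0x04: "GEOSPATIAL_PRESUMED_INVERTED_COORDINATES",
    0x08: "GEOSPATIAL_ZERO_COORDINATES",
    0x10: "GEOSPATIAL_COORDINATES_OUT_OF_RANGE",
    0x20: "GEOSPATIAL_COUNTRY_COORDINATE_MISMATCH",
}


def to_flag_columns(mask):
    """Expand a combined bitmask into one boolean per issue."""
    return {name: bool(mask & bit) for bit, name in ISSUE_BITS.items()}


# Two made-up records: one with two issues, one with a single issue.
records = [
    {"id": 1, **to_flag_columns(0x20 | 0x02)},
    {"id": 2, **to_flag_columns(0x08)},
]

# Searching for one specific issue becomes a plain boolean filter,
# with no bit arithmetic needed on the query side.
mismatches = [r["id"] for r in records if r["GEOSPATIAL_COUNTRY_COORDINATE_MISMATCH"]]
print(mismatches)
```

With one boolean per issue, a record can carry any number of issues at once, and each issue can be counted or filtered independently.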