Issue 16517

Download has 0 records - should have 522,721

Reporter: kbraak
Assignee: fmendez
Type: Bug
Summary: Download has 0 records - should have 522,721
Priority: Critical
Resolution: Fixed
Status: Resolved
Created: 2014-10-03 16:48:02.9
Updated: 2015-03-04 14:03:29.014
Resolved: 2015-03-04 14:03:28.986
        
Description: I just reran this download on 3 Oct 2014 at 16:39.

I received an email saying it finished successfully:

{quote}

Your download 0000369-141002143147142 is ready at the following address: http://api.gbif.org/v1/occurrence/download/request/0000369-141002143147142.zip (13.2 kB - 522,721 records - 0 datasets)
For downloaded data usage reference please visit http://www.gbif.org/faq/datause
Download Information
Creation Date: 2014-10-03 16:32:30 CEST
Filter used:
	LAST_INTERPRETED: 2014-06
	BASIS_OF_RECORD: Human Observation
	MONTH: >=August
	COUNTRY: Denmark
	GEOMETRY: POLYGON((7.866210 56.776808,7.910156 55.813629,9.514160 55.329144,11.228027 55.924585,11.271972 57.397624,10.612792 57.774517,7.866210 56.776808))
	HAS_GEOSPATIAL_ISSUE: false
	PUBLISHING_COUNTRY: DK

{quote}

Unfortunately, the downloaded file contained 0 records, not the 522,721 reported in the email. I attach the zipped download to this issue.

To rerun the occurrence search use:

{quote}

http://www.gbif.org/occurrence/search?MONTH=8%2C*&HAS_GEOSPATIAL_ISSUE=false&BASIS_OF_RECORD=HUMAN_OBSERVATION&LAST_INTERPRETED=2014-06&COUNTRY=DK&GEOMETRY=7.866210+56.776808%2C7.910156+55.813629%2C9.514160+55.329144%2C11.228027+55.924585%2C11.271972+57.397624%2C10.612792+57.774517%2C7.866210+56.776808&PUBLISHING_COUNTRY=DK

{quote}
    
Attachment 0000369-141002143147142.zip


Author: kbraak@gbif.org
Comment: [~trobertson@gbif.org] please take note of this issue.
Created: 2014-10-03 16:48:59.504
Updated: 2014-10-03 16:48:59.504


Author: kbraak@gbif.org
Comment: Download attached.
Created: 2014-10-03 16:49:26.57
Updated: 2014-10-03 16:49:26.57


Author: trobertson@gbif.org
Created: 2014-10-04 17:59:06.812
Updated: 2014-10-04 17:59:06.812
        
This query simulates the same download and returns 0 records, which should help in debugging the cause.

{code}
add jar /Users/tim/dev/git/gbif/occurrence/occurrence-download-workflow/target/oozie-workflow/lib/occurrence-hive-0.21.1-SNAPSHOT.jar;
add jar /Users/tim/dev/git/gbif/occurrence/occurrence-download-workflow/target/oozie-workflow/lib/jts-1.13.jar;
CREATE TEMPORARY FUNCTION contains AS 'org.gbif.occurrence.hive.udf.ContainsUDF';
CREATE TABLE tim.por_2474 STORED AS RCFILE
AS SELECT gbifid
FROM prod_b.occurrence_hdfs
WHERE
((hasgeospatialissues = false) AND
(contains("POLYGON((7.866210 56.776808,7.910156 55.813629,9.514160 55.329144,11.228027 55.924585,11.271972 57.397624,10.612792 57.774517,7.866210 56.776808))", decimallatitude, decimallongitude)) AND
(countrycode = "DK") AND (publishingcountry = "DK") AND
(basisofrecord = "HUMAN_OBSERVATION") AND
(lastinterpreted = 1401573600000) AND
(month >= 8));
{code}
    


Author: trobertson@gbif.org
Created: 2014-10-04 18:16:08.429
Updated: 2014-10-04 18:16:08.429
        
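This fails because the {{LAST_INTERPRETED: 2014-06}} filter results in {{WHERE lastinterpreted = 1401573600000}} being added to the Hive query, +*which is the bug*+. The equality test only matches records whose {{lastinterpreted}} value is exactly that millisecond, rather than any time within June 2014.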
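
To illustrate the difference between an exact-instant match and a month-wide match, here is a minimal Python sketch. It is not the fix that was committed; it only assumes that {{2014-06}} is meant to cover the whole month and that the timestamp was built at UTC+2 (CEST), which is what the value 1401573600000 (2014-06-01 00:00 at UTC+2) suggests. The function name {{month_range_millis}} and the {{utc_offset_hours}} parameter are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple


def month_range_millis(partial_date: str, utc_offset_hours: int = 2) -> Tuple[int, int]:
    """Return half-open [start, end) epoch-millisecond bounds for a 'YYYY-MM' string.

    The fixed UTC offset is an assumption made for this sketch (CEST = UTC+2).
    """
    year, month = (int(part) for part in partial_date.split("-"))
    tz = timezone(timedelta(hours=utc_offset_hours))
    start = datetime(year, month, 1, tzinfo=tz)
    # First day of the following month, rolling over the year in December.
    end = datetime(year + (month == 12), month % 12 + 1, 1, tzinfo=tz)
    return int(start.timestamp() * 1000), int(end.timestamp() * 1000)


start_ms, end_ms = month_range_millis("2014-06")

# Predicate as generated in the failing download: matches a single millisecond.
exact_predicate = f"(lastinterpreted = {start_ms})"

# Range predicate covering the whole month instead.
range_predicate = f"(lastinterpreted >= {start_ms} AND lastinterpreted < {end_ms})"

print(exact_predicate)
print(range_predicate)
```

The lower bound reproduces the value seen in the Hive query, so the same conversion with a half-open range would select every record interpreted during the month.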

The [same query without the interpreted date filter|http://www.gbif.org/occurrence/search?MONTH=8%2C*&HAS_GEOSPATIAL_ISSUE=false&BASIS_OF_RECORD=HUMAN_OBSERVATION&COUNTRY=DK&GEOMETRY=7.866210+56.776808%2C7.910156+55.813629%2C9.514160+55.329144%2C11.228027+55.924585%2C11.271972+57.397624%2C10.612792+57.774517%2C7.866210+56.776808&PUBLISHING_COUNTRY=DK] returns 523,088 records:
{quote}
Your download is ready
Your [download|http://api.gbif.org/v1/occurrence/download/request/0000645-141002143147142.zip] is ready for download since 10/4/14 6:08:40 PM
Download information: 50.9 MB - 523,088 records - 10 datasets
{quote}

The downloaded file contains the correct number of records; the line count is one higher because it includes the header row.
{code}
$ wc -l occurrence.txt
  523089 occurrence.txt
{code}
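
The same check can be scripted against the zip archive directly, which makes it easy to compare the count in the notification email with what the file actually contains. This is a rough sketch: it assumes the archive holds a member named {{occurrence.txt}} with a single header row, as the {{wc -l}} output above implies. The helper name {{count_data_rows}} and the sample archive contents are hypothetical.

```python
import io
import zipfile


def count_data_rows(zip_source, member: str = "occurrence.txt") -> int:
    """Count the lines of `member` inside the archive, excluding one header row."""
    with zipfile.ZipFile(zip_source) as archive:
        with archive.open(member) as handle:
            total_lines = sum(1 for _ in handle)
    # An empty file yields 0 rather than -1.
    return max(total_lines - 1, 0)


# Build a tiny in-memory archive to demonstrate; a real check would pass
# the path of the downloaded zip file instead.
sample = io.BytesIO()
with zipfile.ZipFile(sample, "w") as archive:
    archive.writestr("occurrence.txt", "gbifid\n1\n2\n3\n")
sample.seek(0)

rows = count_data_rows(sample)
print(rows)
```

Comparing this number with the record count stated in the email would have flagged the mismatch reported in this issue.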
    


Author: fmendez@gbif.org
Comment: https://github.com/gbif/occurrence/commit/d52f339dcbefeaddca8f58b4adc1ff9640ee7666
Created: 2015-03-04 14:03:29.012
Updated: 2015-03-04 14:03:29.012