Issue 15096

Create a fake dataset for crawling into dev to test interpretation

Reporter: omeyn
Assignee: kbraak
Type: Task
Summary: Create a fake dataset for crawling into dev to test interpretation
Priority: Major
Resolution: Fixed
Status: Closed
Created: 2014-02-17 11:25:13.867
Updated: 2014-02-21 14:00:30.319
Resolved: 2014-02-21 14:00:30.294
        
Description: Done when:
- the DwC-A zip file is internet-addressable and crawlable in dev and UAT (registered in the dev and UAT registries)
- the expected outcome is documented for each record in the dataset (i.e. expected issues, country, lat/lng, etc.) and for the dataset as a whole in summary (so that it can be compared with the stats page in the portal)
- Andrea and Jan have been consulted to add problematic cases
    


Author: kbraak@gbif.org
Created: 2014-02-18 17:49:08.691
Updated: 2014-02-18 17:49:08.691
        
Test Excel file added to the occurrence-processor project.

Each row tests 1 thing only, and is uniquely identified by its occurrenceID.

The occurrenceRemarks contains information about what's being tested, and the expected result.
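The one-row-per-test convention can be checked mechanically before crawling. The following is a minimal Python sketch, assuming the gzipped export is comma-delimited UTF-8 with a header row containing the Darwin Core terms occurrenceID and occurrenceRemarks. The function name load_test_expectations is hypothetical and not part of the occurrence-processor project.

```python
import csv
import gzip
import io


def load_test_expectations(gz_bytes: bytes) -> dict[str, str]:
    """Map each occurrenceID to its occurrenceRemarks (what is tested + expected result).

    Assumption: comma-delimited UTF-8 CSV with a header row that includes
    the occurrenceID and occurrenceRemarks columns.
    """
    text = gzip.decompress(gz_bytes).decode("utf-8")
    expectations: dict[str, str] = {}
    for row in csv.DictReader(io.StringIO(text)):
        occ_id = row["occurrenceID"]
        # Each row must be uniquely identified by its occurrenceID.
        if occ_id in expectations:
            raise ValueError(f"duplicate occurrenceID: {occ_id}")
        expectations[occ_id] = row["occurrenceRemarks"]
    return expectations
```

Usage: read the bytes of the gzipped CSV from disk, pass them in, and compare `len(result)` with the expected number of test records. Raising on a duplicate occurrenceID keeps two rows from silently testing under the same identifier.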

Progress so far: a few test conditions for the modified date.

File committed in https://github.com/gbif/occurrence/commit/e41fd1a0ff73e3e6d2363f6d3f7cbdb2e8674aef
    


Author: kbraak@gbif.org
Created: 2014-02-20 17:39:08.038
Updated: 2014-02-20 17:39:08.038
        
The fake dataset, in Excel and gzipped CSV formats, has been committed to GitHub along with its eml.xml.

Still outstanding: add the test dataset to the dev registry, using the following endpoints:

DwC-Archive endpoint: https://github.com/gbif/occurrence/raw/master/occurrence-processor/src/test/resources/dataset_to_test_gbif_interpretation.csv.gz

EML endpoint: https://raw2.github.com/gbif/occurrence/master/occurrence-processor/src/test/resources/eml.xml
    


Author: kbraak@gbif.org
Comment: The dataset has now been registered in the dev registry, and is ready for crawling/interpretation. See here: http://www.gbif-dev.org/dataset/5e296a86-f28d-4e53-bf83-b8493e18c71b
Created: 2014-02-21 12:35:14.064
Updated: 2014-02-21 12:35:14.064


Author: kbraak@gbif.org
Created: 2014-02-21 13:59:54.064
Updated: 2014-02-21 13:59:54.064
        
A couple more test records were added, see https://github.com/gbif/occurrence/commit/99fb7b5c8e006e426ef65e5b3a870e61a15e6e00

The gzipped csv test dataset was rebuilt, see https://github.com/gbif/occurrence/commit/57cd770c298d76b77d0ac5510f7c3ca6f71b1d51
    


Author: kbraak@gbif.org
Created: 2014-02-21 14:00:30.317
Updated: 2014-02-21 14:00:30.317
        
Dataset completed, registered, and ready for crawling/interpretation.

In total, there are 74 test records.