Issue 12591
Adapt crawling infrastructure to work with DwC-A as well as XML based datasets

12591
Reporter: lfrancke
Assignee: lfrancke
Type: Task
Summary: Adapt crawling infrastructure to work with DwC-A as well as XML based datasets
Priority: Major
Resolution: Fixed
Status: Closed
Created: 2013-01-15 13:24:27.678
Updated: 2013-12-17 16:13:00.378
Resolved: 2013-01-18 15:52:05.667
        
Description: This needs changes in:

* gbif-api (to expose two different queue sets)
* crawler coordinator (needs to put each dataset on the proper queue depending on its endpoint type)
* xml crawlserver (adapting to new paths)
* dwca crawlserver
* web service (needs new resources)
* the CRAP monitor (needs to use the new web service resources)
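
As a rough illustration of the coordinator change, here is a minimal sketch of routing a dataset to one of two queues based on its endpoint type. It is not the actual coordinator code: the class name `CrawlQueues`, the queue attributes `xml` and `dwca`, and the endpoint type values are all assumptions made for this example, and in-memory deques stand in for whatever queue storage is really used.

```python
from collections import deque
from enum import Enum


class EndpointType(Enum):
    # Assumed endpoint types for illustration; the real list is defined in gbif-api.
    DIGIR = "DIGIR"
    TAPIR = "TAPIR"
    BIOCASE = "BIOCASE"
    DWC_ARCHIVE = "DWC_ARCHIVE"


# Endpoint types assumed to be served by the XML crawlserver.
XML_TYPES = {EndpointType.DIGIR, EndpointType.TAPIR, EndpointType.BIOCASE}


class CrawlQueues:
    """Hypothetical holder for the two queue sets (XML and DwC-A)."""

    def __init__(self):
        self.xml = deque()
        self.dwca = deque()

    def enqueue(self, dataset_key, endpoint_type):
        """Put the dataset on the queue that matches its endpoint type."""
        if endpoint_type is EndpointType.DWC_ARCHIVE:
            self.dwca.append(dataset_key)
        elif endpoint_type in XML_TYPES:
            self.xml.append(dataset_key)
        else:
            # Guard for endpoint types added later without a queue mapping.
            raise ValueError(f"No crawl queue for endpoint type {endpoint_type}")


if __name__ == "__main__":
    queues = CrawlQueues()
    queues.enqueue("dataset-1", EndpointType.DIGIR)
    queues.enqueue("dataset-2", EndpointType.DWC_ARCHIVE)
    print("xml:", list(queues.xml))
    print("dwca:", list(queues.dwca))
```

Keeping the mapping in one place means the XML crawlserver and the DwC-A crawlserver each only ever see datasets they can handle.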

To test it I'm going to do the following:
* Build a new crawler.jar with my uncommitted changes and deploy it on b14g2
* Kill Crawler & Coordinator on b14g2
* Clean ZK
* Start Coordinator & Crawler (or hold off on starting the Crawler until after the next step)
* Queue all Datasets and see if CRAP works
* Manually call WebService to see if the DwC-A datasets were enqueued successfully as well
* Kill Crawler so that it's easier to monitor DwC-A crawling
* Start DwC-A downloader
* Start DwC-A fragmenter