Issue 15624

Recently crawled DwC-A datasets block re-crawling of the same datasets

Reporter: jlegind
Type: Bug
Summary: Recently crawled DwC-A datasets block re-crawling of the same datasets
Priority: Critical
Status: Open
Created: 2014-05-19 12:14:07.269
Updated: 2016-02-15 13:45:38.512
        
Description: After a Darwin Core Archive dataset has been crawled, it seems to remain tied up in the crawling queue, which blocks further crawls of that particular dataset even though the crawl has finished.

One example is Guadeloupe_Bananier (dataset key 1b93b558-b8ce-4e1a-af90-5a84e7f1c038).

Here is an excerpt from the Kibana logs:

19.05 10:38:57	crawler-coordinator	ERROR	Caught exception while trying to enqueue crawl [1b93b558-b8ce-4e1a-af90-5a84e7f1c038]

Also, the process does not write logs to the registry console. These are the CRAM logs:

{
  "datasetKey": "1b93b558-b8ce-4e1a-af90-5a84e7f1c038",
  "crawlJob": {
    "datasetKey": "1b93b558-b8ce-4e1a-af90-5a84e7f1c038",
    "endpointType": "DWC_ARCHIVE",
    "targetUrl": "http://www.gbif.fr:8080/ipt/archive.do?r=guadeloupe_bananier",
    "attempt": 8,
    "properties": {}
  },
  "startedCrawling": "2014-05-15T14:13:45.720+0000",
  "finishedCrawling": "2014-05-15T14:13:46.440+0000",
  "finishReason": "NORMAL",
  "pagesCrawled": 1,
  "pagesFragmentedSuccessful": 1,
  "pagesFragmentedError": 0,
  "fragmentsEmitted": 461,
  "fragmentsReceived": 461,
  "rawOccurrencesPersistedNew": 461,
  "rawOccurrencesPersistedUpdated": 0,
  "rawOccurrencesPersistedUnchanged": 0,
  "rawOccurrencesPersistedError": 0,
  "fragmentsProcessed": 461,
  "verbatimOccurrencesPersistedSuccessful": 461,
  "verbatimOccurrencesPersistedError": 0,
  "interpretedOccurrencesPersistedSuccessful": 0,
  "interpretedOccurrencesPersistedError": 0
}

Compare with the registry console logs:
http://registry.gbif.org/web/index.html#/dataset/1b93b558-b8ce-4e1a-af90-5a84e7f1c038/crawl

Notice that CRAM is at crawl attempt 8 while the registry console is at 6, and that no records have been indexed into the portal.
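
The two symptoms noted above (a finished crawl with no interpreted records, and an attempt number that disagrees with the registry console) could be checked mechanically. The following is only a minimal sketch, assuming the CRAM status is available as JSON shaped like the record pasted in this issue. The function name `crawl_anomalies` and the `console_attempt` parameter are hypothetical and not part of any GBIF tool; the field names and values are copied from the record above, trimmed to the fields that are used.

```python
import json
from typing import List, Optional

# Trimmed copy of the CRAM record pasted in this issue (only the fields used below).
SAMPLE = json.loads("""
{
  "datasetKey": "1b93b558-b8ce-4e1a-af90-5a84e7f1c038",
  "crawlJob": {"endpointType": "DWC_ARCHIVE", "attempt": 8},
  "finishedCrawling": "2014-05-15T14:13:46.440+0000",
  "finishReason": "NORMAL",
  "verbatimOccurrencesPersistedSuccessful": 461,
  "interpretedOccurrencesPersistedSuccessful": 0
}
""")


def crawl_anomalies(status: dict, console_attempt: Optional[int] = None) -> List[str]:
    """Return warnings for a crawl status record (hypothetical helper)."""
    problems = []

    finished = status.get("finishedCrawling") is not None
    verbatim = status.get("verbatimOccurrencesPersistedSuccessful", 0)
    interpreted = status.get("interpretedOccurrencesPersistedSuccessful", 0)

    # Symptom 1: the crawl finished and stored verbatim records,
    # but nothing was interpreted (so nothing reaches the portal).
    if finished and verbatim > 0 and interpreted == 0:
        problems.append(
            f"finished crawl persisted {verbatim} verbatim records but interpreted none"
        )

    # Symptom 2: the attempt number in CRAM differs from the one the
    # registry console reports (the caller supplies the console value).
    attempt = status.get("crawlJob", {}).get("attempt")
    if console_attempt is not None and attempt is not None and attempt != console_attempt:
        problems.append(
            f"CRAM is at attempt {attempt}, registry console is at {console_attempt}"
        )

    return problems


if __name__ == "__main__":
    for warning in crawl_anomalies(SAMPLE, console_attempt=6):
        print(warning)
```

For the record in this issue, passing the console's attempt number (6) flags both symptoms; the check says nothing about why the enqueue call in the crawler-coordinator fails.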
    


Author: omeyn@gbif.org
Comment: Needs verification on recent code, then might no longer be "tiny".
Created: 2015-03-02 15:56:04.407
Updated: 2015-03-02 15:56:04.407