Issue 15624
Recently crawled DwC-A datasets block re-crawling of the same datasets
Reporter: jlegind
Type: Bug
Summary: Recently crawled DwC-A datasets block re-crawling of the same datasets
Priority: Critical
Status: Open
Created: 2014-05-19 12:14:07.269
Updated: 2016-02-15 13:45:38.512
Description: After a Darwin Core Archive dataset has been crawled, it appears to keep holding the crawl queue for that dataset even though the crawl has finished.
One example is Guadeloupe_Bananier (1b93b558-b8ce-4e1a-af90-5a84e7f1c038).
Here is an excerpt from the Kibana logs:
19.05 10:38:57 crawler-coordinator ERROR Caught exception while trying to enqueue crawl [1b93b558-b8ce-4e1a-af90-5a84e7f1c038]
In addition, the process does not write any logs to the registry console. These are the CRAM logs:
{
"datasetKey": "1b93b558-b8ce-4e1a-af90-5a84e7f1c038",
"crawlJob": {
"datasetKey": "1b93b558-b8ce-4e1a-af90-5a84e7f1c038",
"endpointType": "DWC_ARCHIVE",
"targetUrl": "http://www.gbif.fr:8080/ipt/archive.do?r=guadeloupe_bananier",
"attempt": 8,
"properties": {}
},
"startedCrawling": "2014-05-15T14:13:45.720+0000",
"finishedCrawling": "2014-05-15T14:13:46.440+0000",
"finishReason": "NORMAL",
"pagesCrawled": 1,
"pagesFragmentedSuccessful": 1,
"pagesFragmentedError": 0,
"fragmentsEmitted": 461,
"fragmentsReceived": 461,
"rawOccurrencesPersistedNew": 461,
"rawOccurrencesPersistedUpdated": 0,
"rawOccurrencesPersistedUnchanged": 0,
"rawOccurrencesPersistedError": 0,
"fragmentsProcessed": 461,
"verbatimOccurrencesPersistedSuccessful": 461,
"verbatimOccurrencesPersistedError": 0,
"interpretedOccurrencesPersistedSuccessful": 0,
"interpretedOccurrencesPersistedError": 0
}
Compare these with the registry console logs:
http://registry.gbif.org/web/index.html#/dataset/1b93b558-b8ce-4e1a-af90-5a84e7f1c038/crawl
Notice that CRAM is at crawl attempt 8 while the registry console is at attempt 6, and that no records have been indexed into the portal.
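For checking other datasets for the same symptoms, here is a minimal diagnostic sketch in Python. It assumes the CRAM status has the JSON shape quoted above and that the registry console's attempt number is copied in by hand. The helper name `find_crawl_anomalies` is hypothetical and not part of any GBIF tool. It flags the two inconsistencies noted here: the attempt counters disagree, and a crawl that finished NORMAL persisted verbatim records but interpreted none.

```python
import json


def find_crawl_anomalies(cram_status, registry_attempt):
    """Return a list of inconsistencies found in one CRAM crawl status.

    Hypothetical helper: `registry_attempt` is the latest attempt number
    read off the registry console by hand; no registry API is called.
    """
    anomalies = []

    # CRAM and the registry console should agree on the attempt counter.
    cram_attempt = cram_status["crawlJob"]["attempt"]
    if cram_attempt != registry_attempt:
        anomalies.append(
            f"attempt mismatch: CRAM={cram_attempt}, registry={registry_attempt}"
        )

    # A NORMAL finish with verbatim records persisted but zero interpreted
    # records fits the "crawled but nothing indexed" symptom.
    verbatim = cram_status.get("verbatimOccurrencesPersistedSuccessful", 0)
    interpreted = cram_status.get("interpretedOccurrencesPersistedSuccessful", 0)
    if (
        cram_status.get("finishReason") == "NORMAL"
        and verbatim > 0
        and interpreted == 0
    ):
        anomalies.append(
            f"finished NORMAL but 0 of {verbatim} verbatim records interpreted"
        )

    return anomalies


# Abbreviated copy of the CRAM status quoted in this issue.
sample = json.loads("""
{
  "datasetKey": "1b93b558-b8ce-4e1a-af90-5a84e7f1c038",
  "crawlJob": {"endpointType": "DWC_ARCHIVE", "attempt": 8},
  "finishReason": "NORMAL",
  "verbatimOccurrencesPersistedSuccessful": 461,
  "interpretedOccurrencesPersistedSuccessful": 0
}
""")

for problem in find_crawl_anomalies(sample, registry_attempt=6):
    print(problem)
```

With the values from this issue, both checks fire: the attempt mismatch (8 vs 6) and the 461 verbatim records with none interpreted.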
Author: omeyn@gbif.org
Comment: Needs verification against recent code; after that it might no longer be "tiny".
Created: 2015-03-02 15:56:04.407
Updated: 2015-03-02 15:56:04.407