Issue 17723
crawler cleanup not cleaning up pensoft checklists
Reporter: mdoering
Assignee: trobertson
Type: Bug
Summary: crawler cleanup not cleaning up pensoft checklists
Priority: Major
Status: Open
Created: 2015-07-24 15:41:45.529
Updated: 2015-07-27 14:27:01.381
Description: The checklist has been indexed by both the clb and occ CLIs, but the occ state still says it is running and the crawl does not get cleaned up:
"startedCrawling" : 1437648596882,
"finishedCrawling" : 1437648597089,
"crawlContext" : null,
"finishReason" : "NORMAL",
"processStateOccurrence" : "RUNNING",
"processStateChecklist" : "FINISHED",
Author: trobertson@gbif.org
Created: 2015-07-27 11:11:11.816
Updated: 2015-07-27 11:11:11.816
Here is the full ZK dump from crawler.gbif.org for an example dataset:
{code}
{
  "datasetKey": "e0971ffe-6108-4a29-9007-d68a8eaa5b26",
  "crawlJob": {
    "datasetKey": "e0971ffe-6108-4a29-9007-d68a8eaa5b26",
    "endpointType": "DWC_ARCHIVE",
    "targetUrl": "http://bdj.pensoft.net/lib/ajax_srv/archive_download.php?archive_type=2&document_id=1037",
    "attempt": 12,
    "properties": {}
  },
  "startedCrawling": "2015-07-23T10:50:02.662+0000",
  "finishedCrawling": "2015-07-23T10:50:02.857+0000",
  "finishReason": "NORMAL",
  "processStateOccurrence": "RUNNING",
  "processStateChecklist": "FINISHED",
  "pagesCrawled": 1,
  "pagesFragmentedSuccessful": 1,
  "pagesFragmentedError": 0,
  "fragmentsEmitted": 24,
  "fragmentsReceived": 24,
  "rawOccurrencesPersistedNew": 0,
  "rawOccurrencesPersistedUpdated": 0,
  "rawOccurrencesPersistedUnchanged": 24,
  "rawOccurrencesPersistedError": 0,
  "fragmentsProcessed": 24,
  "verbatimOccurrencesPersistedSuccessful": 0,
  "verbatimOccurrencesPersistedError": 0,
  "interpretedOccurrencesPersistedSuccessful": 0,
  "interpretedOccurrencesPersistedError": 0
}
{code}
It appears that when all fragments result in unchanged raw occurrence records (here 24 of 24, so the verbatim and interpreted counts stay at 0), the cleanup does not detect that this is actually a successful exit outcome.
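To make the suspected condition concrete, here is a minimal Python sketch of a completion check that treats "all raw records unchanged" as a finished state. It is not the crawler's actual cleanup logic: the function name `occurrence_counts_balanced` and the exact set of equalities are assumptions made for illustration, and only the counter names are taken from the ZK dump above.

```python
from typing import Any, Mapping


def occurrence_counts_balanced(status: Mapping[str, Any]) -> bool:
    """Hypothetical check: is every emitted fragment accounted for downstream?"""

    def n(field: str) -> int:
        # Missing counters are treated as 0 (an assumption of this sketch).
        return int(status.get(field, 0))

    # Only new or updated raw records are expected to flow on to the
    # verbatim and interpreted stages; unchanged ones stop at the raw stage.
    raw_changed = n("rawOccurrencesPersistedNew") + n("rawOccurrencesPersistedUpdated")
    raw_total = (
        raw_changed
        + n("rawOccurrencesPersistedUnchanged")
        + n("rawOccurrencesPersistedError")
    )
    verbatim_total = (
        n("verbatimOccurrencesPersistedSuccessful")
        + n("verbatimOccurrencesPersistedError")
    )
    interpreted_total = (
        n("interpretedOccurrencesPersistedSuccessful")
        + n("interpretedOccurrencesPersistedError")
    )

    return (
        status.get("finishedCrawling") is not None
        and n("pagesCrawled") == n("pagesFragmentedSuccessful") + n("pagesFragmentedError")
        and n("fragmentsEmitted") == n("fragmentsReceived")
        and n("fragmentsReceived") == n("fragmentsProcessed")
        and raw_total == n("fragmentsReceived")
        and verbatim_total == raw_changed
        and interpreted_total == n("verbatimOccurrencesPersistedSuccessful")
    )


# Counters copied from the dump above.
dump = {
    "finishedCrawling": "2015-07-23T10:50:02.857+0000",
    "pagesCrawled": 1,
    "pagesFragmentedSuccessful": 1,
    "pagesFragmentedError": 0,
    "fragmentsEmitted": 24,
    "fragmentsReceived": 24,
    "rawOccurrencesPersistedNew": 0,
    "rawOccurrencesPersistedUpdated": 0,
    "rawOccurrencesPersistedUnchanged": 24,
    "rawOccurrencesPersistedError": 0,
    "fragmentsProcessed": 24,
    "verbatimOccurrencesPersistedSuccessful": 0,
    "verbatimOccurrencesPersistedError": 0,
    "interpretedOccurrencesPersistedSuccessful": 0,
    "interpretedOccurrencesPersistedError": 0,
}

# Hypothetical variant: 4 records changed but have not reached verbatim yet.
lagging = dict(dump, rawOccurrencesPersistedNew=4, rawOccurrencesPersistedUnchanged=20)
```

Under these assumptions the dump above counts as finished, because zero changed raw records legitimately produce zero verbatim and interpreted records, while the `lagging` variant does not. A check that instead waits for a non-zero verbatim or interpreted count would never complete for this dataset, which would match the stuck "RUNNING" state.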