Issue 18436
Show reason last crawl attempt failed on dataset page
Reporter: kbraak
Assignee: bko
Type: Improvement
Summary: Show reason last crawl attempt failed on dataset page
Priority: Unassessed
Status: Open
Created: 2016-04-27 10:30:35.964
Updated: 2017-10-10 15:21:27.976
Description: The crawl history maintained in the registry is insufficient: the root cause of a failure usually needs to be extracted from the Kibana logs, or determined manually.
If a crawl attempt fails and the reason is unknown, the dataset page could show the status "Failed - under investigation".
After determining the root cause, the GBIF Data Manager [~jlegind] could push a more detailed status update, such as "Failed - missing unique occurrenceIDs".
Author: mblissett
Created: 2016-04-29 18:00:11.232
Updated: 2016-04-29 18:02:38.743
The logs could be extracted using the Elasticsearch API. Queries can be built and exported from Kibana.
I don't know how reliable this would be — would useful messages be the most recent ones?
{code}
curl -XGET 'http://kibana2.gbif.org/logstash-2016.04.29/_search?pretty' -d '{
  "query": {
    "filtered": {
      "filter": {
        "bool": {
          "must": [
            {
              "fquery": {
                "query": {
                  "query_string": {
                    "query": "datasetKey:(\"d5162873-89e0-40c7-8472-2e735c2443fd\")"
                  }
                },
                "_cache": true
              }
            }
          ]
        }
      }
    }
  },
  "size": 10,
  "sort": [
    {
      "@timestamp": {
        "order": "desc",
        "ignore_unmapped": true
      }
    }
  ]
}'
{code}
Extract from result:
{code}
"_source" : {
"message" : "DwC-A for dataset [d5162873-89e0-40c7-8472-2e735c2443fd] not modified. Crawl finished",
"@version" : "1",
"@timestamp" : "2016-04-29T15:44:19.611Z",
"type" : "dwca-downloader",
"host" : "130.226.238.174:50460",
"path" : "org.gbif.crawler.dwca.downloader.DwcaCrawlConsumer",
"priority" : "INFO",
"logger_name" : "org.gbif.crawler.dwca.downloader.DwcaCrawlConsumer",
"thread" : "QueueBuilder-6",
"log_timestamp" : 1461944660575,
"attempt" : "3",
"datasetKey" : "d5162873-89e0-40c7-8472-2e735c2443fd"
},
{code}
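As a rough sketch of how a failure status could be derived from such a response: the snippet below picks the most recent high-severity log message for a dataset out of a search result shaped like the one above. This is illustrative only; the function name and sample data are hypothetical, and only the field names ({{@timestamp}}, {{priority}}, {{message}}) are taken from the extract.

```python
# Hypothetical sketch: given a parsed Elasticsearch search response (as
# returned by the curl query above), surface the most recent ERROR/WARN
# message as a candidate "reason the last crawl attempt failed".
def last_failure_reason(response):
    hits = response["hits"]["hits"]
    # Keep only high-severity entries; field names match the extract above.
    errors = [h["_source"] for h in hits
              if h["_source"].get("priority") in ("ERROR", "WARN")]
    if not errors:
        return None
    # ISO-8601 timestamps sort correctly as strings; newest first.
    errors.sort(key=lambda s: s["@timestamp"], reverse=True)
    return errors[0]["message"]

# Hypothetical sample data, shaped like the extract above.
sample = {
    "hits": {
        "hits": [
            {"_source": {"message": "Crawl finished",
                         "@timestamp": "2016-04-29T15:44:19.611Z",
                         "priority": "INFO"}},
            {"_source": {"message": "Missing unique occurrenceIDs",
                         "@timestamp": "2016-04-29T15:43:10.000Z",
                         "priority": "ERROR"}},
        ]
    }
}

print(last_failure_reason(sample))
```

Whether the most recent ERROR-level message is actually the root cause would need checking against real crawl logs, per the reliability question above.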