Issue 14801

Crawling is not triggered when resources are (re)published in IPT

14801
Reporter: jlegind
Assignee: omeyn
Type: Bug
Summary: Crawling is not triggered when resources are (re)published in IPT
Priority: Critical
Resolution: Fixed
Status: Resolved
Created: 2014-01-16 12:27:01.434
Updated: 2014-06-11 14:01:30.834
Resolved: 2014-06-11 14:01:30.801
        
Description: A dataset from the Korean IPT was published but wasn't picked up by the crawler.
http://registry.gbif.org/web/index.html#/dataset/84352188-f762-11e1-a439-00145eb45e9a/crawl
It was only crawled once, 7 days ago.

Likewise, Xander has pointed out that a dataset updated from the Dutch IPT on 13 January (http://registry.gbif.org/web/index.html#/dataset/c68d6fdc-f921-47f3-95c6-ca9766280ad4/crawl) was not updated in the portal.

I suspect this will also affect custom HTTP installations such as GBIF Ireland.
    


Author: omeyn@gbif.org
Comment: I believe this was because the registry couldn't send RabbitMQ messages. RabbitMQ had been rebooted, but the registry was not restarted when RabbitMQ came back (there are lots of "Connection already closed" errors in the registry log). I've restarted the registry and it can send messages again. Reopen this issue if the problem reappears.
Created: 2014-01-23 10:56:13.633
Updated: 2014-01-23 10:56:13.633
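The failure mode described in the comment above (a long-lived publisher holding a connection that went stale when RabbitMQ was rebooted) can be sketched as follows. This is a hypothetical illustration, not the registry's actual code: `ReconnectingPublisher`, `ConnectionClosedError`, `FakeConnection` and the routing key `crawl.start` are invented names, and a real RabbitMQ client library has its own exception types and recovery options. The idea is that instead of needing a manual restart, the publisher drops the stale connection, reconnects once, and retries.

```python
from typing import Any, Callable, List, Optional, Tuple


class ConnectionClosedError(Exception):
    """Hypothetical stand-in for a client library's 'connection already closed' error."""


class ReconnectingPublisher:
    """Hypothetical publisher that reopens its broker connection once on failure.

    `connect` is any factory returning an object with a `publish(routing_key, body)` method.
    """

    def __init__(self, connect: Callable[[], Any]) -> None:
        self._connect = connect
        self._conn: Optional[Any] = None

    def publish(self, routing_key: str, body: bytes) -> None:
        if self._conn is None:
            self._conn = self._connect()
        try:
            self._conn.publish(routing_key, body)
        except ConnectionClosedError:
            # The broker was restarted: discard the stale connection, reconnect, retry once.
            self._conn = self._connect()
            self._conn.publish(routing_key, body)


class FakeConnection:
    """In-memory connection used only to demonstrate the behaviour without a broker."""

    def __init__(self) -> None:
        self.alive = True
        self.sent: List[Tuple[str, bytes]] = []

    def publish(self, routing_key: str, body: bytes) -> None:
        if not self.alive:
            raise ConnectionClosedError("Connection already closed")
        self.sent.append((routing_key, body))


if __name__ == "__main__":
    first, second = FakeConnection(), FakeConnection()
    connections = iter([first, second])
    publisher = ReconnectingPublisher(lambda: next(connections))
    publisher.publish("crawl.start", b"dataset-1")
    first.alive = False  # simulate the broker being rebooted under a running publisher
    publisher.publish("crawl.start", b"dataset-2")
    print(first.sent, second.sent)
```

Retrying only once keeps the sketch simple; a production publisher would more likely use the client library's built-in connection recovery or a bounded backoff.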


Author: jlegind@gbif.org
Created: 2014-02-20 12:17:42.888
Updated: 2014-02-20 12:17:42.888
        
[~omeyn@gbif.org]
I believe we are seeing the same issue again. Artdata (http://registry.gbif.org/web/index.html#/dataset/38b4c89f-584c-41bb-bd8f-cd1def33e92f/crawl) was last crawled 21 days ago, yet it was republished on 19 February: http://www.gbif.se/ipt/
Four VertNet IPT datasets from Colorado were also not picked up; Laura Russel contacted me about them.

Artdata last emitted 34,933,136 fragments and the new version of the dataset has 35,037,918 records.

The Norwegian dataset 'NINA Vanndata fisk' will be republished/updated tomorrow (2014-02-21), so we will see whether the crawler picks it up, but I do not expect it to.
http://registry.gbif.org/web/index.html#/dataset/a639542a-654a-427b-9cf1-bde1953bbb52

    


Author: omeyn@gbif.org
Comment: It turns out this was a mistake in the original code: on update, we were only checking whether the owning organization had changed, and no crawl was triggered. Fixed in trunk; it will be in production as part of the next production release (first half of March).
Created: 2014-02-26 09:52:14.312
Updated: 2014-02-26 09:52:14.312
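A minimal sketch of the bug and the fix as described in the last comment, assuming a hypothetical update handler that receives the old and new dataset records plus a callback that requests a crawl. `Dataset`, `handle_update_original`, `handle_update_fixed` and the callback are illustrative names, not the registry's actual API; the dataset key is the Artdata key from this issue.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Dataset:
    key: str
    owning_org: str


# Hypothetical callback that publishes a "start crawl" request for a dataset key.
CrawlRequester = Callable[[str], None]


def handle_update_original(old: Dataset, new: Dataset, request_crawl: CrawlRequester) -> None:
    """Behaviour as described: only an owning-organization change is inspected."""
    if old.owning_org != new.owning_org:
        pass  # hypothetical ownership handling; note that no crawl is ever requested


def handle_update_fixed(old: Dataset, new: Dataset, request_crawl: CrawlRequester) -> None:
    """Sketch of the fix: every update (e.g. an IPT republication) requests a crawl."""
    if old.owning_org != new.owning_org:
        pass  # ownership handling unchanged
    request_crawl(new.key)


if __name__ == "__main__":
    sent: List[str] = []
    old = Dataset("38b4c89f-584c-41bb-bd8f-cd1def33e92f", "org-A")
    new = Dataset(old.key, "org-A")  # republished by the same publisher
    handle_update_original(old, new, sent.append)
    print(sent)  # []
    handle_update_fixed(old, new, sent.append)
    print(sent)  # ['38b4c89f-584c-41bb-bd8f-cd1def33e92f']
```

A republication from an IPT normally leaves the owning organization unchanged, which is why the original check never fired for the datasets reported here.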