Issue 17227
IPT update is NOT triggering crawls
Reporter: trobertson
Assignee: mdoering
Type: Bug
Summary: IPT update is NOT triggering crawls
Priority: Blocker
Resolution: Fixed
Status: Closed
Created: 2015-02-12 21:04:44.091
Updated: 2015-02-17 11:42:32.474
Resolved: 2015-02-17 11:42:32.443
Description: I just had a conference call with VertNet (Laura and John W). Here is what we observed:
2 out of 3 times when Laura hit publish we saw:
i. the DOI was minted on the dataset page
ii. the “publication date” was NOT set to today
iii. the crawl was not initiated
The update button in the IPT is not triggering the desired behaviour. When I manually clicked “crawl” in the registry, it did exactly what we would expect. Only 1 out of 3 times did this happen automatically.
I had Rabbit consoles open and the systems were all idle during this period. The registry is not emitting "crawl me now" messages on IPT updates correctly. Perhaps it is incorrectly looking for changes in metadata?
Author: trobertson@gbif.org
Created: 2015-02-13 14:50:40.642
Updated: 2015-02-13 14:50:40.642
This commit should go a long way towards solving this: https://github.com/gbif/registry/commit/a83b52d8070de80a5345cfc1d876ca2688388bda
The issue is a race condition.
Changes to the dataset table immediately trigger crawling by broadcasting a message through Rabbit.
However, immediately after the dataset is changed, its endpoints are deleted and then re-added (e.g. the second step of syncing the POSTed update XML from an IPT).
The crawl fires up and finds no endpoints, as they have just been deleted.
When the endpoints are added, the dataset is scheduled for a recrawl.
That second request, however, is discarded: the crawler coordinator sees that it has just handled a crawl for this dataset and assumes the repeated call is a mistake.
This shows up in the logs as the "Ignoring update..." lines:
{code}
[crap@bla6 logs]$ grep 3ad882bb-cd21-4201-8b83-3684bfc6d830 *log
registry-change_stdout.log:INFO [2015-12-02 20:13:50,524+0100] [pool-7-thread-1] org.gbif.occurrence.cli.registry.RegistryChangeListener: Sending crawl for updated dataset [3ad882bb-cd21-4201-8b83-3684bfc6d830]
registry-change_stdout.log:INFO [2015-12-02 20:13:50,923+0100] [pool-7-thread-1] org.gbif.occurrence.cli.registry.RegistryChangeListener: Ignoring update of dataset [3ad882bb-cd21-4201-8b83-3684bfc6d830] because either no crawlable endpoints or we just sent a crawl
registry-change_stdout.log:INFO [2015-12-02 20:13:51,126+0100] [pool-7-thread-1] org.gbif.occurrence.cli.registry.RegistryChangeListener: Ignoring update of dataset [3ad882bb-cd21-4201-8b83-3684bfc6d830] because either no crawlable endpoints or we just sent a crawl
registry-change_stdout.log:INFO [2015-12-02 20:20:18,386+0100] [pool-7-thread-1] org.gbif.occurrence.cli.registry.RegistryChangeListener: Sending crawl for updated dataset [3ad882bb-cd21-4201-8b83-3684bfc6d830]
[crap@bla6 logs]$
{code}
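To make the sequence above easier to follow, here is a minimal sketch of the race in Python. It is not the registry or crawler code: `Dataset`, `ChangeListener`, `crawl`, `racy_sync`, `ordered_sync` and the 60 second de-duplication window are all hypothetical names and values chosen for illustration. The ordered variant shows one possible way to avoid the race and is not necessarily what the linked commit does.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    key: str
    endpoints: list = field(default_factory=list)


class ChangeListener:
    """Hypothetical stand-in for the component that turns dataset changes
    into crawl requests and drops requests that repeat too quickly."""

    def __init__(self, dedup_window=60.0):
        self.dedup_window = dedup_window  # assumed value, in seconds
        self.last_sent = {}               # dataset key -> time of last request
        self.log = []

    def on_change(self, dataset, now):
        last = self.last_sent.get(dataset.key)
        if last is not None and now - last < self.dedup_window:
            self.log.append("ignored")    # "we just sent a crawl"
            return False
        self.last_sent[dataset.key] = now
        self.log.append("sent")
        return True


def crawl(dataset):
    """A crawl only succeeds if it finds at least one endpoint."""
    return bool(dataset.endpoints)


def racy_sync(dataset, new_endpoints, listener):
    """Notify on the dataset change first, then replace the endpoints."""
    results = []
    # Step 1: the dataset row changes and a crawl request goes out at once.
    requested = listener.on_change(dataset, now=0.0)
    # Step 2: the endpoints are deleted as part of the same sync.
    dataset.endpoints = []
    # The crawler handles the request while the endpoints are missing.
    if requested:
        results.append(crawl(dataset))
    # Step 3: the endpoints are re-added, which triggers a second notification.
    dataset.endpoints = list(new_endpoints)
    if listener.on_change(dataset, now=0.4):
        results.append(crawl(dataset))
    return results


def ordered_sync(dataset, new_endpoints, listener):
    """Replace the endpoints first, then notify once."""
    dataset.endpoints = []
    dataset.endpoints = list(new_endpoints)
    results = []
    if listener.on_change(dataset, now=0.0):
        results.append(crawl(dataset))
    return results


if __name__ == "__main__":
    racy_listener = ChangeListener()
    racy = racy_sync(Dataset("d1", ["old"]), ["new"], racy_listener)
    print(racy, racy_listener.log)

    ordered_listener = ChangeListener()
    ordered = ordered_sync(Dataset("d2", ["old"]), ["new"], ordered_listener)
    print(ordered, ordered_listener.log)
```

In the racy ordering, the only crawl that runs sees an empty endpoint list and the follow-up notification is dropped, so the dataset is never crawled. In a real system the outcome depends on timing: if the crawler happens to pick up the first request after the endpoints are back, the crawl succeeds, which would be consistent with the update working 1 out of 3 times.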