Issue 17241

Wikipedia crawl not removed from zookeeper

Reporter: mdoering
Type: Bug
Summary: Wikipedia crawl not removed from zookeeper
Description: A crawl of the Wikipedia checklist started on Feb 16th was still marked as "running" in ZooKeeper the next day. CRAM shows it, and the coordinator cleanup thread keeps visiting it (see attached screenshots).
Priority: Major
Resolution: Fixed
Status: Closed
Created: 2015-02-17 11:05:43.156
Updated: 2015-02-18 18:07:53.163
Resolved: 2015-02-18 18:07:53.134


Author: mdoering@gbif.org
Created: 2015-02-17 18:31:51.804
Updated: 2015-02-17 18:31:51.804
        
The original crawl was shot down during DwC-A metasyncing: the registry returned an HTTP 500 error when the EML was posted, see POR-2657.

-----
INFO  [2015-02-16 17:50:01,797+0100] [pool-9-thread-2] org.gbif.crawler.dwca.metasync.DwcaMetasyncService: Updating metadata from DwC-A for dataset [cbb6498e-8927-405a-916b-576d00a6289b]
ERROR [2015-02-16 17:50:02,586+0100] [pool-9-thread-2] org.gbif.crawler.dwca.metasync.DwcaMetasyncService: Exception caught during metasyncing DwC-A [cbb6498e-8927-405a-916b-576d00a6289b]

Problem accessing /dataset/cbb6498e-8927-405a-916b-576d00a6289b/document. Reason:



Author: mdoering@gbif.org
Created: 2015-02-18 17:06:39.408
Updated: 2015-02-18 17:06:39.408
        
Apart from misconfigurations on the UAT CLIs, a ZooKeeper/Curator exception that led to ZooKeeper never being updated by the checklistbank-cli has been fixed: https://github.com/gbif/checklistbank/commit/1901d663c79e19159543d007631c4151f7d6e08b

The same exception may well show up in the crawler or occurrence CLI, as the code was copied from there. All Curator ZK paths must be absolute, i.e. start with a /; relative paths are rejected:

ERROR [2015-02-18 12:51:42,683+0100] [main] org.gbif.checklistbank.cli.common.ZookeeperUtils: Exception while deleting ZooKeeper path crawls/cbb6498e-8927-405a-916b-576d00a6289b
java.lang.IllegalArgumentException: Path must start with / character
        at org.apache.curator.utils.PathUtils.validatePath(PathUtils.java:54) ~[checklistbank-cli.jar:2.11]
        at org.apache.curator.utils.PathUtils.validatePath(PathUtils.java:37) ~[checklistbank-cli.jar:2.11]
        at org.apache.curator.utils.ZKPaths.fixForNamespace(ZKPaths.java:63) ~[checklistbank-cli.jar:2.11]
        at org.apache.curator.framework.imps.NamespaceImpl.fixForNamespace(NamespaceImpl.java:82) ~[checklistbank-cli.jar:2.11]
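
For illustration only, here is a minimal sketch of how a CLI could guard against this by normalizing paths before handing them to Curator. It is not the change made in the commit linked above; the class `ZkPathSketch` and the methods `absolute` and `crawlPath` are hypothetical names, and the `crawls/` prefix is simply taken from the log line.

```java
public final class ZkPathSketch {

    private ZkPathSketch() {
        // static helpers only
    }

    /**
     * Returns the given ZooKeeper path with a leading "/" added if it is missing.
     * Hypothetical helper: it only addresses the "Path must start with / character"
     * check and performs no other validation.
     */
    public static String absolute(String path) {
        if (path == null || path.isEmpty()) {
            throw new IllegalArgumentException("Path cannot be null or empty");
        }
        return path.startsWith("/") ? path : "/" + path;
    }

    /** Builds the absolute path of a crawl node for a dataset key (assumed layout). */
    public static String crawlPath(String datasetKey) {
        return absolute("crawls/" + datasetKey);
    }

    public static void main(String[] args) {
        // prints /crawls/cbb6498e-8927-405a-916b-576d00a6289b
        System.out.println(crawlPath("cbb6498e-8927-405a-916b-576d00a6289b"));
    }
}
```

Passing the result of `crawlPath(...)` rather than the raw relative string to the Curator delete call would presumably avoid the `IllegalArgumentException` shown above, assuming no Curator namespace is configured that changes how paths are resolved.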