Issue 17242

Failed deploys leave multiple ws instances running

Reporter: omeyn
Assignee: fmendez
Type: Bug
Summary: Failed deploys leave multiple ws instances running
Priority: Major
Status: Open
Created: 2015-02-17 11:09:53.279
Updated: 2015-02-17 11:52:31.275
        
Description: I'd expect only a single instance running.

[root@uatapps2 geocode-ws]# ps aux | grep geo
root      2354  0.4  1.6 3048128 201984 ?      Sl   10:43   0:06 java -Xms256m -Xmx512m -XX:HeapDumpPath=/usr/local/gbif/services/geocode-ws/jvm-dumps -XX:+HeapDumpOnOutOfMemoryError -XX:MaxPermSize=256m -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=60 -jar geocode-ws-0.6.jar -conf /usr/local/gbif/services/geocode-ws/0.6/1424166135/application.properties -host apps2.gbif-uat.org -httpAdminPort 9030 -httpPort 9029 -externalAdminPort 9030 -externalPort 9029 -stopSecret stop -timestamp 1424166135 -zkHost prodmaster1-vh.gbif.org:2181,prodmaster2-vh.gbif.org:2181,prodmaster3-vh.gbif.org:2181 -zkPath uat/services
root      6233  1.9  1.7 3048128 213444 ?      Sl   11:03   0:06 java -Xms256m -Xmx512m -XX:HeapDumpPath=/usr/local/gbif/services/geocode-ws/jvm-dumps -XX:+HeapDumpOnOutOfMemoryError -XX:MaxPermSize=256m -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=60 -jar geocode-ws-0.6.jar -conf /usr/local/gbif/services/geocode-ws/0.6/1424167339/application.properties -host apps2.gbif-uat.org -httpAdminPort 9046 -httpPort 9045 -externalAdminPort 9046 -externalPort 9045 -stopSecret stop -timestamp 1424167339 -zkHost prodmaster1-vh.gbif.org:2181,prodmaster2-vh.gbif.org:2181,prodmaster3-vh.gbif.org:2181 -zkPath uat/services
root      6940  0.0  0.0 112644   932 pts/0    S+   11:08   0:00 grep --color=auto geo
root     13497  0.1  1.3 3058408 160992 ?      Sl   Feb11  12:00 java -Xms256m -Xmx512m -XX:HeapDumpPath=/usr/local/gbif/services/geocode-ws/jvm-dumps -XX:+HeapDumpOnOutOfMemoryError -XX:MaxPermSize=256m -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=60 -jar geocode-ws-0.6.jar -conf /usr/local/gbif/services/geocode-ws/0.6/1423658554/application.properties -host apps2.gbif-uat.org -httpAdminPort 8099 -httpPort 8098 -externalAdminPort 8099 -externalPort 8098 -stopSecret stop -timestamp 1423658554 -zkHost zk1.gbif.org:2181,zk2.gbif.org:2181,zk3.gbif.org:2181 -zkPath uat/services
root     21700  0.1  1.0 3048128 132248 ?      Sl   Feb16   1:09 java -Xms256m -Xmx512m -XX:HeapDumpPath=/usr/local/gbif/services/geocode-ws/jvm-dumps -XX:+HeapDumpOnOutOfMemoryError -XX:MaxPermSize=256m -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=60 -jar geocode-ws-0.7-20141120.153043-1.jar -conf /usr/local/gbif/services/geocode-ws/0.7-SNAPSHOT/1424102667/application.properties -host apps2.gbif-uat.org -httpAdminPort 9018 -httpPort 9017 -externalAdminPort 9018 -externalPort 9017 -stopSecret stop -timestamp 1424102667 -zkHost zk1.gbif.org:2181,zk2.gbif.org:2181,zk3.gbif.org:2181 -zkPath uat/services
    


Author: trobertson@gbif.org
Created: 2015-02-17 11:36:57.335
Updated: 2015-02-17 11:36:57.335
        
Were they all registered in ZK? That is important information to include, as we need to understand whether they are "runaway" instances (which may be an additional bug in the microservice) or whether they are registered as a parallel deploy (which is a deployment procedure bug).

If you see it again, please can you post a screenshot of a ZK browser or the terminal listing?
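
To make the distinction concrete, here is a minimal sketch of the cross-check, assuming each ZK entry can be tied to the -timestamp value of the process that registered it (an assumption; this issue does not show what the registry entries contain). The class and method names are hypothetical. The running timestamps are taken from the ps listing in the description; the registered ones are passed on the command line, copied from the ZK listing.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of gbif-microservice.
class InstanceAudit {

  /** Labels each running instance by whether its deploy timestamp also appears in the registry. */
  static Map<String, String> classify(List<String> runningTimestamps, Set<String> registeredTimestamps) {
    Map<String, String> result = new LinkedHashMap<>();
    for (String ts : runningTimestamps) {
      // Assumption: a registry entry and a process can be matched on the -timestamp value.
      result.put(ts, registeredTimestamps.contains(ts) ? "registered" : "runaway");
    }
    return result;
  }

  public static void main(String[] args) {
    // -timestamp values of the four java processes in the ps listing above.
    List<String> running = List.of("1424166135", "1424167339", "1423658554", "1424102667");
    // Timestamps found in ZK are supplied as arguments; none are hard-coded here.
    Set<String> registered = new HashSet<>(Arrays.asList(args));
    classify(running, registered).forEach((ts, label) -> System.out.println(ts + " " + label));
  }
}
```

More than one "registered" line would point at a parallel deploy; any "runaway" line would point at an instance the registry no longer knows about.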
    


Author: omeyn@gbif.org
Comment: This was one failed deploy after INF-116, so at most 3 of these were in ZK; the others were runaway.
Created: 2015-02-17 11:44:18.413
Updated: 2015-02-17 11:44:18.413


Author: trobertson@gbif.org
Created: 2015-02-17 11:52:31.275
Updated: 2015-02-17 11:52:31.275
        
As far as I can tell from reading https://github.com/gbif/gbif-microservice/blob/master/src/main/java/org/gbif/ws/discovery/lifecycle/DiscoveryLifeCycle.java and http://curator.apache.org/curator-x-discovery/, the registration and deregistration in the ZK service registry are triggered only by lifecycle changes on the container.

[~fmendez@gbif.org] - this is not what is happening here, but out of curiosity: what would happen if the ZK registry were manually edited to remove the entry? Would Jetty continue to run?
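
For illustration only, a toy model of the lifecycle-driven registration as I read it, with an in-memory set standing in for the ZK registry. Everything here (Registry, Container, the instance id) is hypothetical; it does not use the Curator or Jetty APIs and does not model ZooKeeper session behaviour. It only shows what follows if start and stop really are the sole triggers: nothing watches the registry, so a manual removal would go unnoticed by the container. Whether the real microservice behaves this way is exactly the open question.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

class LifecycleSketch {

  /** Hypothetical stand-in for the ZK service registry. */
  static class Registry {
    private final Set<String> entries = ConcurrentHashMap.newKeySet();
    void register(String id) { entries.add(id); }
    void unregister(String id) { entries.remove(id); }
    boolean contains(String id) { return entries.contains(id); }
  }

  /** Hypothetical container: start and stop are the only events that touch the registry. */
  static class Container {
    private final String id;
    private final Registry registry;
    private boolean running;

    Container(String id, Registry registry) {
      this.id = id;
      this.registry = registry;
    }

    void start() {
      running = true;
      registry.register(id);   // registration happens on the "started" lifecycle event
    }

    void stop() {
      running = false;
      registry.unregister(id); // deregistration happens on the "stopped" lifecycle event
    }

    boolean isRunning() { return running; }
  }

  public static void main(String[] args) {
    Registry registry = new Registry();
    String id = "geocode-ws-example"; // made-up instance id
    Container ws = new Container(id, registry);

    ws.start();
    System.out.println(registry.contains(id)); // true

    registry.unregister(id);                   // simulates a manual edit of the registry
    System.out.println(registry.contains(id)); // false
    System.out.println(ws.isRunning());        // true: the model has no watcher on the registry

    ws.stop();
  }
}
```

In this model, an instance that keeps running without ever reaching stop() also never deregisters itself, which is one way a registered parallel instance could survive a failed deploy.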