Issue 11310

registry-sync: long running biocase sync never terminates properly after encountering error

11310
Reporter: kbraak
Assignee: fmendez
Type: Bug
Summary: registry-sync: long running biocase sync never terminates properly after encountering error
Priority: Critical
Resolution: Fixed
Status: Closed
Created: 2012-06-07 17:22:55.264
Updated: 2013-12-16 17:50:49.244
Resolved: 2012-06-08 17:29:50.977
        
Description: Technical installation uuid = f861f398-abee-11e1-9773-0024e8565763
ACCESS POINT URL = http://biocase.senckenberg.de/biocase/pywrapper.cgi

last log msgs from registry-sync:

16:56:54.187 o.g.r.s.s.b.BiocaseMetadataSynchronizer An error occured while parsing dataset, the process will continue trying to get rest of metadata
org.gbif.registry.sync.util.ParseException: Object could not be interpreted, parser returned null
	at org.gbif.registry.sync.util.digest.DigesterUtils.digestObject(DigesterUtils.java:238) ~[registry-sync-1.0-SNAPSHOT.jar:1.0-SNAPSHOT]
	at org.gbif.registry.service.sync.biocase.BiocaseMetadataParser.parseDataset(BiocaseMetadataParser.java:61) ~[registry-sync-1.0-SNAPSHOT.jar:1.0-SNAPSHOT]
	at org.gbif.registry.service.sync.biocase.BiocaseMetadataSynchronizer.parseResourceFromTitle(BiocaseMetadataSynchronizer.java:343) [registry-sync-1.0-SNAPSHOT.jar:1.0-SNAPSHOT]
	at org.gbif.registry.service.sync.biocase.BiocaseMetadataSynchronizer.getResourceFromTitle(BiocaseMetadataSynchronizer.java:208) [registry-sync-1.0-SNAPSHOT.jar:1.0-SNAPSHOT]
	at org.gbif.registry.service.sync.biocase.BiocaseMetadataSynchronizer.parseAgent(BiocaseMetadataSynchronizer.java:290) [registry-sync-1.0-SNAPSHOT.jar:1.0-SNAPSHOT]
	at org.gbif.registry.service.sync.biocase.BiocaseMetadataSynchronizer.getResources(BiocaseMetadataSynchronizer.java:232) [registry-sync-1.0-SNAPSHOT.jar:1.0-SNAPSHOT]
	at org.gbif.registry.service.sync.biocase.BiocaseMetadataSynchronizer.synchronize(BiocaseMetadataSynchronizer.java:437) [registry-sync-1.0-SNAPSHOT.jar:1.0-SNAPSHOT]
	at org.gbif.registry.service.sync.messaging.RegistrySyncListener.synchronize(RegistrySyncListener.java:160) [registry-sync-1.0-SNAPSHOT.jar:1.0-SNAPSHOT]
	at org.gbif.registry.service.sync.messaging.RegistrySyncListener.onMessage(RegistrySyncListener.java:102) [registry-sync-1.0-SNAPSHOT.jar:1.0-SNAPSHOT]
	at org.gbif.registry.service.sync.messaging.RegistrySyncListener.onMessage(RegistrySyncListener.java:40) [registry-sync-1.0-SNAPSHOT.jar:1.0-SNAPSHOT]
	at org.gbif.messaging.amqp.AMQPListener.getMessage(AMQPListener.java:75) [gbif-messaging-1.0-20120605.073808-26.jar:1.0-SNAPSHOT]
	at org.gbif.messaging.amqp.AMQPThreadListener.startListening(AMQPThreadListener.java:73) [gbif-messaging-1.0-20120605.073808-26.jar:1.0-SNAPSHOT]
	at org.gbif.messaging.amqp.AMQPThreadListener.run(AMQPThreadListener.java:57) [gbif-messaging-1.0-20120605.073808-26.jar:1.0-SNAPSHOT]
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) [na:1.6.0_22]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) [na:1.6.0_22]
	at java.lang.Thread.run(Thread.java:662) [na:1.6.0_22]
16:56:54.192 o.g.registry.sync.util.TemplateUtils Template has been successfully loaded from: org/gbif/harvest/biocase/biocase_1_3/template/search.vm
16:56:54.196 o.g.registry.sync.util.TemplateUtils Template and parameters merge to: 

0.98 Jun 7, 2012 4:56:54 PM 130.226.238.143 http://biocase.senckenberg.de/biocase/pywrapper.cgi?dsa=gpit search
http://www.tdwg.org/schemas/abcd/2.06 http://www.tdwg.org/schemas/abcd/2.06 true
16:56:54.741 o.g.r.s.s.b.BiocaseMetadataSynchronizer Search response : nt 2.6.6 (r266:84297, Aug 24 2010, 18:46:32) [MSC v.1500 32 bit (Intel)] 2.5.3 pymssql Server module v0.1 using pymssq1.0.2 2012-06-07T16:56:46.973000 gpit@biocase.senckenberg.de 130.226.238.143 search 0 Datasource wrapper gpit requested Reading PSF from E:\biocase\config\datasources\gpit\provider_setup_file.xml PSF: PSF=E:\biocase\config\datasources\gpit\provider_setup_file.xml, recLimit=10, loglevel=20, user=wrapper, database=tuebingen, dbIP=172.30.5.40, dbms=pymssql, encoding=latin_1, schemas={u'http://www.tdwg.org/schemas/abcd/2.06': <biocase.wrapper.psf_handler.SupportedSchema instance at 0x00BEBB70>}, tablegraph=GRAPH: graph: unit-wr_fundzeiten, unit-wr_abcdmetadata, unit-wr_typus, unit-wr_biosystematik, unit-wr_fundorte, +++ ALIAS2TABLE: {u'wr_typus': u'wr_typus', u'wr_fundorte': u'wr_fundorte', u'wr_abcdmetadata': u'abcdmetadata_wr', u'wr_fundzeiten': u'wr_fundzeiten', u'wr_biosystematik': u'wr_biosystematik', u'unit': u'wr_objekt'} BioCASe protocol used. HTTP parameter 'request' used for building the request. The biocase.wrapper.dbmod.dbmod_pymssql database module has been used. Try to get CMF for namespace http://www.tdwg.org/schemas/abcd/2.06 Load CMFile 'E:\biocase\config\datasources\gpit\cmf_ABCD_2.06.xml' Executing SQL: 'SELECT COUNT([unit].[objekt_id]) FROM [wr_objekt] AS [unit]' Hits: 1 ]]>


Author: kbraak@gbif.org
Comment: This error actually appears to have killed the registry-sync service altogether!
Created: 2012-06-07 17:24:59.364
Updated: 2012-06-07 17:24:59.364


Author: kbraak@gbif.org
Created: 2012-06-07 17:42:37.182
Updated: 2012-06-07 17:42:37.182
        
And exact same thing happened again, this time with this bioCase tech installation.

Common thing in the HIT logs is: f861f398-abee-11e1-9773-0024e8565763

2012-06-07 17:36:13.0	An error occured while parsing dataset, the process will continue trying to get rest of metadata
2012-06-07 17:36:12.0	Parsing dataset: NO_TITLE
2012-06-07 17:36:11.0	Synchronizing service d58b5ec4-adb5-4312-984f-2274472a12df, url: http://biocase.senckenberg.de/biocase/pywrapper.cgi?dsa=gpit

Similarly, the service dies completely after the error
    


Author: kbraak@gbif.org
Comment: A test after Fede's last commit (http://code.google.com/p/gbif-registry/source/detail?r=2973) appears to have fixed this problem. 
Created: 2012-06-08 17:29:51.153
Updated: 2012-06-08 17:29:51.153