Issue 11473

inventory DanBIF pythonprovider: inventory fail needs more informative error message

11473
Reporter: ahahn
Assignee: kbraak
Type: Bug
Summary: inventory DanBIF pythonprovider: inventory fail needs more informative error message
Description: Indexing on http://kyle.gbif.org:8080/hit/datasource/list.html?filter=datasource&value=the+phycology+herbarium fails at an inventory request, reporting "Inventory request could not be constructed". Need more informative error message to be able to interpret and fix the issue
Priority: Critical
Resolution: WontFix
Status: Closed
Created: 2012-06-22 16:41:23.797
Updated: 2012-10-12 15:02:58.868
Resolved: 2012-10-12 15:02:58.84


Author: kbraak@gbif.org
Created: 2012-06-22 16:46:16.461
Updated: 2012-06-22 16:46:16.461
        
Some logs to assist:

[harvest-webapp] 2012-06-22 16:45:38,300 [pool-1-thread-20] WARN  (DelegatingI18nLog.java:141) - Error interpreting parameter recordsPerResponse null taken from Metadata response:
java.lang.NumberFormatException: null
	at java.lang.Integer.parseInt(Integer.java:417)
	at java.lang.Integer.valueOf(Integer.java:554)
	at org.gbif.harvest.digir.DigirHarvester.inventory(DigirHarvester.java:459)
	at org.gbif.harvest.digir.DigirHarvester.inventory(DigirHarvester.java:424)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.commons.beanutils.MethodUtils.invokeExactMethod(MethodUtils.java:404)
	at org.gbif.harvest.scheduler.SimpleScheduler.triggerMethodInvocation(SimpleScheduler.java:1001)
	at org.gbif.harvest.scheduler.SimpleScheduler.access$400(SimpleScheduler.java:52)
	at org.gbif.harvest.scheduler.SimpleScheduler$RunnableJob.run(SimpleScheduler.java:268)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)
[harvest-webapp] 2012-06-22 16:45:38,308 [pool-1-thread-20] INFO  (DelegatingI18nLog.java:105) - Inventory response size defaulting to 200 records.
[harvest-webapp] 2012-06-22 16:45:38,314 [Thread-3] INFO  (JobManagerImpl.java:473) - ***There is already a job running for jobGroup=Datasource[31544]
[harvest-webapp] 2012-06-22 16:45:38,314 [pool-1-thread-20] ERROR (DelegatingI18nLog.java:85) - Inventory request could not be constructed.
org.apache.velocity.exception.ResourceNotFoundException: Unable to find resource 'org/gbif/harvest/digir/null/template/inventory.vm'
	at org.apache.velocity.runtime.resource.ResourceManagerImpl.loadResource(ResourceManagerImpl.java:474)
	at org.apache.velocity.runtime.resource.ResourceManagerImpl.getResource(ResourceManagerImpl.java:352)
	at org.apache.velocity.runtime.RuntimeInstance.getTemplate(RuntimeInstance.java:1533)
	at org.apache.velocity.runtime.RuntimeInstance.getTemplate(RuntimeInstance.java:1514)
	at org.apache.velocity.runtime.RuntimeSingleton.getTemplate(RuntimeSingleton.java:299)
	at org.apache.velocity.app.Velocity.getTemplate(Velocity.java:358)
	at org.gbif.harvest.util.TemplateUtils.getTemplate(TemplateUtils.java:79)
	at org.gbif.harvest.util.TemplateUtils.getAndMerge(TemplateUtils.java:58)
	at org.gbif.harvest.digir.DigirHarvester.fireInventory(DigirHarvester.java:177)
	at org.gbif.harvest.digir.DigirHarvester.inventory(DigirHarvester.java:489)
	at org.gbif.harvest.digir.DigirHarvester.inventory(DigirHarvester.java:424)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.commons.beanutils.MethodUtils.invokeExactMethod(MethodUtils.java:404)
	at org.gbif.harvest.scheduler.SimpleScheduler.triggerMethodInvocation(SimpleScheduler.java:1001)
	at org.gbif.harvest.scheduler.SimpleScheduler.access$400(SimpleScheduler.java:52)
	at org.gbif.harvest.scheduler.SimpleScheduler$RunnableJob.run(SimpleScheduler.java:268)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)
[harvest-webapp] 2012-06-22 16:45:38,323 [pool-1-thread-20] ERROR (SimpleScheduler.java:275) - !*!*!*! Job pythonprovider.danbif.dk - inventory failed: null
java.lang.reflect.InvocationTargetException
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.commons.beanutils.MethodUtils.invokeExactMethod(MethodUtils.java:404)
	at org.gbif.harvest.scheduler.SimpleScheduler.triggerMethodInvocation(SimpleScheduler.java:1001)
	at org.gbif.harvest.scheduler.SimpleScheduler.access$400(SimpleScheduler.java:52)
	at org.gbif.harvest.scheduler.SimpleScheduler$RunnableJob.run(SimpleScheduler.java:268)
	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
	at java.lang.Thread.run(Thread.java:662)
Caused by: org.gbif.harvest.exception.HarvesterException: Unable to find resource 'org/gbif/harvest/digir/null/template/inventory.vm'
	at org.gbif.harvest.digir.DigirHarvester.fireInventory(DigirHarvester.java:181)
	at org.gbif.harvest.digir.DigirHarvester.inventory(DigirHarvester.java:489)
	at org.gbif.harvest.digir.DigirHarvester.inventory(DigirHarvester.java:424)
	... 11 more
Caused by: org.apache.velocity.exception.ResourceNotFoundException: Unable to find resource 'org/gbif/harvest/digir/null/template/inventory.vm'
	at org.apache.velocity.runtime.resource.ResourceManagerImpl.loadResource(ResourceManagerImpl.java:474)
	at org.apache.velocity.runtime.resource.ResourceManagerImpl.getResource(ResourceManagerImpl.java:352)
	at org.apache.velocity.runtime.RuntimeInstance.getTemplate(RuntimeInstance.java:1533)
	at org.apache.velocity.runtime.RuntimeInstance.getTemplate(RuntimeInstance.java:1514)
	at org.apache.velocity.runtime.RuntimeSingleton.getTemplate(RuntimeSingleton.java:299)
	at org.apache.velocity.app.Velocity.getTemplate(Velocity.java:358)
	at org.gbif.harvest.util.TemplateUtils.getTemplate(TemplateUtils.java:79)
	at org.gbif.harvest.util.TemplateUtils.getAndMerge(TemplateUtils.java:58)
	at org.gbif.harvest.digir.DigirHarvester.fireInventory(DigirHarvester.java:177)
	... 13 more
    


Author: kbraak@gbif.org
Created: 2012-06-22 18:19:36.523
Updated: 2012-06-22 18:19:36.523
        
First, I started to look at the metadata update and whether that was successful.

I had to restart it a couple times, but the second time it was successful.

On further analysis, the access point ( http://pythonprovider.danbif.dk/botanik/DigirProvider/index_html ) reported 4 conceptual schemas! This is the old DiGIR endpoint, so perhaps this is perfectly allowable, I don't know.

Anyways, from the logs I saw this:

2012-06-22 17:42:17.0	No BioDatasource was created for this Dataset: DiGIR ConceptualSchema not in white list: http://rs.tdwg.org/dwc/curatorial/
2012-06-22 17:42:17.0	Proceding with case 3 (case 3: one of DiGIR, BioCASE, or TAPIR)
2012-06-22 17:42:17.0	Synchronizing dataset with key: e1ce6650-bb9f-11e1-9773-0024e8565763

Indeed, the conceptual schema chosen, isn't in our DiGIR conceptual schema white list, therefore, the corresponding bioDatasource ( http://kyle.gbif.org:8080/hit/datasource/list.html?filter=datasource&value=Botanical+Museum%2C+Copenhagen%2C+the+Phycology+Herbarium ) wasn't updated with all the metadata required for harvesting. That explains why the inventory request could not be built, and you saw the logs:

2012-06-22 16:26:16.0	Inventory request could not be constructed.
2012-06-22 16:26:16.0	Inventory response size defaulting to 200 records.
2012-06-22 16:26:16.0	Error interpreting parameter recordsPerResponse null taken from Metadata response:
2012-06-22 16:26:16.0	Start inventory

So the question comes down to whether the Registry metadata synchronizer should parse all 4 conceptual schemas and just select the whitelisted one, or it just adds them all to the registry, and it's then the HIT's job to iterate through them, looking for the whitelisted one.

Any insights into the workings of the old provider toolkit andrea would be welcome. Thanks.


 
    


Author: trobertson@gbif.org
Created: 2012-06-23 09:50:15.78
Updated: 2012-06-23 09:50:15.78
        
A couple of observations on this:

i) There is an Integer parsing that is non null safe and that is a bug in the code (original exception) and should be addressed.

ii)  The DiGIR schema [1] says for conceptualSchema that the maxOccurs is "unbounded" and that "The namespace, and corresponding schemaLocation the resource understands/supports. A resource can support N schemas, including multiple versions of the same schema. In any case, the host of the resource must be able to understand them as well."

Therefore I think whatever is reading that inventory should look at all schemas, determine the appropriate one to use for harvesting.

[1] http://digir.sourceforge.net/schema/protocol/2003/1.0/digir.xsd



    


Author: kbraak@gbif.org
Comment: Tim, I believe the old DanBIF provider toolkit is the only setup we have ever encountered having support for mutltiple conceptualSchemas. Since DanBIF's installation is soon to be decommissioned, do you support 1) marking this issue as won't fix? or 2) ensuring the current and future crawlers are capable of handling this rare case? Thanks  
Created: 2012-10-12 14:43:27.201
Updated: 2012-10-12 14:43:35.941


Author: trobertson@gbif.org
Created: 2012-10-12 15:00:42.356
Updated: 2012-10-12 15:00:42.356
        
Go for 1)

The issue can always be reopened should it resurface.
    


Author: kbraak@gbif.org
Comment: Thanks Tim. Marking as won't fix. 
Created: 2012-10-12 15:02:58.866
Updated: 2012-10-12 15:02:58.866