Issue 12004

Natural History Museum Rotterdam - harvested text could not be found error, DwC archive

Reporter: jlegind
Assignee: kbraak
Type: Bug
Summary: Natural History Museum Rotterdam - harvested text could not be found error, DwC archive
Priority: Major
Resolution: Fixed
Status: Closed
Created: 2012-10-10 16:31:51.23
Updated: 2013-01-15 14:40:04.776
Resolved: 2012-11-22 15:49:31.56
        
Description: Indexing aborts on this DwC archive.
http://hit.gbif.org/console/list.html?datasourceId=12456

Log excerpt:

2012-10-10 15:53:35.0 	Synchronisation was interrupted due to an error: null
org.gbif.harvest.portal.synchronise.GBIFPortalSynchroniser. synchroniseDwc ( GBIFPortalSynchroniser.java: 3045 )
org.gbif.harvest.portal.synchronise.GBIFPortalSynchroniser. synchronise ( GBIFPortalSynchroniser.java: 1649 )
org.gbif.harvest.portal.synchronise.GBIFPortalSynchroniser. synchronise ( GBIFPortalSynchroniser.java: 1451 )
sun.reflect.GeneratedMethodAccessor332. invoke ( : -1 )
sun.reflect.DelegatingMethodAccessorImpl. invoke ( DelegatingMethodAccessorImpl.java: 25 )
java.lang.reflect.Method. invoke ( Method.java: 597 )
org.apache.commons.beanutils.MethodUtils. invokeExactMethod ( MethodUtils.java: 404 )
org.gbif.harvest.scheduler.SimpleScheduler. triggerMethodInvocation ( SimpleScheduler.java: 1090 )
org.gbif.harvest.scheduler.SimpleScheduler. access$500 ( SimpleScheduler.java: 57 )
org.gbif.harvest.scheduler.SimpleScheduler$RunnableJob. run ( SimpleScheduler.java: 334 )
java.util.concurrent.ThreadPoolExecutor$Worker. runTask ( ThreadPoolExecutor.java: 886 )
java.util.concurrent.ThreadPoolExecutor$Worker. run ( ThreadPoolExecutor.java: 908 )
java.lang.Thread. run ( Thread.java: 662 )
2012-10-10 15:53:34.0 	File: /mnt/fiber/super_hit/natural_history_museum_rotterdam-3ead5cf3/fa87d03e-4959-451d-865f-ff03bb798339/harvested.txt could not be found: File '/mnt/fiber/super_hit/natural_history_museum_rotterdam-3ead5cf3/fa87d03e-4959-451d-865f-ff03bb798339/harvested.txt' does not exist
    


Author: kbraak@gbif.org
Created: 2012-10-12 16:55:42.571
Updated: 2012-10-12 16:55:42.571
        
Result of the initial investigation:

On the file system, I located the harvested.txt file in the following folder:

/mnt/fiber/super_hit/natural_history_museum_rotterdam-3ead5cf3/fa87d03e-4959-451d-865f-ff03bb798339/natural_history_museum_rotterdam__nl__-_ifsm_collection_of_mollusca

The BioDatasource directory is configured to persist here:
"directory":"natural_history_museum_rotterdam-3ead5cf3/fa87d03e-4959-451d-865f-ff03bb798339"

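The download and processHarvested steps do not save to this directory. harvested.txt ends up in the subfolder above, while synchronise looks for it directly in the configured directory.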
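A quick way to confirm this kind of mismatch is to check whether harvested.txt sits directly in the configured directory and, if it does not, search beneath that directory for it. The sketch below is a hypothetical diagnostic helper, not part of the GBIF harvester. The class and method names are illustrative, and it assumes the configured "directory" value is resolved against the /mnt/fiber/super_hit base path seen in the log.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical diagnostic helper (not part of the GBIF harvester codebase).
public class HarvestedFileLocator {

    static final String HARVESTED_FILE = "harvested.txt";

    // True if harvested.txt sits directly in the configured directory,
    // which is where the synchronise step expected to find it.
    public static boolean isInConfiguredDirectory(Path configuredDir) {
        return Files.isRegularFile(configuredDir.resolve(HARVESTED_FILE));
    }

    // Lists every harvested.txt found anywhere beneath the configured directory,
    // e.g. in a per-resource subfolder written by an earlier step.
    public static List<Path> findAll(Path configuredDir) throws IOException {
        try (Stream<Path> paths = Files.walk(configuredDir)) {
            return paths
                    .filter(p -> HARVESTED_FILE.equals(String.valueOf(p.getFileName())))
                    .filter(Files::isRegularFile)
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: base path and "directory" value taken from the log and config above.
        Path configuredDir = Paths.get("/mnt/fiber/super_hit",
                "natural_history_museum_rotterdam-3ead5cf3/fa87d03e-4959-451d-865f-ff03bb798339");
        if (isInConfiguredDirectory(configuredDir)) {
            System.out.println("OK: " + configuredDir.resolve(HARVESTED_FILE));
        } else if (Files.isDirectory(configuredDir)) {
            System.out.println("Not in configured directory; found instead: " + findAll(configuredDir));
        } else {
            System.out.println("Configured directory does not exist: " + configuredDir);
        }
    }
}
```

For this issue, the helper would report the file under the natural_history_museum_rotterdam__nl__-_ifsm_collection_of_mollusca subfolder rather than in the configured directory.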
    


Author: kbraak@gbif.org
Comment: I have just run a download -> processHarvested -> synchronise cycle, which successfully updated the index. The correct directory path was used in all steps, so the error did not recur. We can keep this issue open for a while and validate other DwC archives in the meantime.
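The successful run depends on every step agreeing on one directory. A minimal sketch of that idea, assuming a hypothetical HarvestStep interface (the real harvester's download, processHarvested and synchronise methods are not shown in this issue and may have different signatures), is to resolve the BioDatasource directory once and hand the same path to each step.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: every step receives the same, once-resolved directory,
// so the step that writes harvested.txt and the step that reads it cannot disagree.
public class HarvestPipeline {

    // Hypothetical step abstraction; not the GBIF harvester API.
    @FunctionalInterface
    public interface HarvestStep {
        void run(Path datasourceDir) throws Exception;
    }

    private final Path datasourceDir;
    private final List<HarvestStep> steps = new ArrayList<>();

    public HarvestPipeline(Path baseDir, String configuredDirectory) {
        // Resolve the "directory" value from the BioDatasource config exactly once.
        this.datasourceDir = baseDir.resolve(configuredDirectory).normalize();
    }

    // Appends a step (e.g. download, processHarvested, synchronise) in run order.
    public HarvestPipeline then(HarvestStep step) {
        steps.add(step);
        return this;
    }

    public Path directory() {
        return datasourceDir;
    }

    public void run() throws Exception {
        for (HarvestStep step : steps) {
            // No step derives its own path; all share datasourceDir.
            step.run(datasourceDir);
        }
    }
}
```

The design choice is simply to remove per-step path construction, which is where a subfolder such as the one found above could slip in.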
Created: 2012-10-16 10:55:38.069
Updated: 2012-10-16 10:55:38.069


Author: kbraak@gbif.org
Created: 2012-11-22 15:49:31.601
Updated: 2012-11-22 15:49:31.601
        
After a period of observation, the issue has not reappeared.

Closing issue.