Issue 16099

Crawling refuses to unzip EBI dataset 'Geographically tagged INSDC sequences'

Reporter: jlegind
Assignee: omeyn
Type: Bug
Summary: Crawling refuses to unzip EBI dataset 'Geographically tagged INSDC sequences' 
Priority: Major
Resolution: Fixed
Status: Closed
Created: 2014-07-10 12:16:47.806
Updated: 2014-09-17 10:50:38.088
Resolved: 2014-09-17 10:50:38.051
        
Description: Hi Oliver, this is the log from the first attempt at crawling the EBI dataset:

UUID ad43e954-dd79-4986-ae34-9ccdbd8bf568

DEBUG	Could not uncompress archive for dataset [ad43e954-dd79-4986-ae34-9ccdbd8bf568]
Field	Action	Value
@fields.attempt		1
@fields.datasetKey		ad43e954-dd79-4986-ae34-9ccdbd8bf568
@fields.level		DEBUG
@fields.logger_name		org.gbif.crawler.dwca.downloader.CrawlConsumer
@fields.stack_trace		*org.gbif.utils.file.CompressionUtil$UnsupportedCompressionType: Unknown compression type. Neither zip nor gzip*
    at org.gbif.utils.file.CompressionUtil.decompressFile(CompressionUtil.java:115) ~[crawler-cli.jar:na]
    at org.gbif.crawler.dwca.downloader.CrawlConsumer.doCrawl(CrawlConsumer.java:195) [crawler-cli.jar:na]
    at org.gbif.crawler.dwca.downloader.CrawlConsumer.consumeMessage(CrawlConsumer.java:97) [crawler-cli.jar:na]
    at org.gbif.crawler.dwca.downloader.CrawlConsumer.consumeMessage(CrawlConsumer.java:48) [crawler-cli.jar:na]
    at org.apache.curator.framework.recipes.queue.DistributedQueue.processMessageBytes(DistributedQueue.java:672) ~[crawler-cli.jar:na]
    at org.apache.curator.framework.recipes.queue.DistributedQueue.processWithLockSafety(DistributedQueue.java:743) ~[crawler-cli.jar:na]
    at org.apache.curator.framework.recipes.queue.DistributedQueue$5.run(DistributedQueue.java:619) ~[crawler-cli.jar:na]
    at com.google.common.util.concurrent.MoreExecutors$SameThreadExecutorService.execute(MoreExecutors.java:297) ~[crawler-cli.jar:na]
    at org.apache.curator.framework.recipes.queue.DistributedQueue.processChildren(DistributedQueue.java:608) ~[crawler-cli.jar:na]
    at org.apache.curator.framework.recipes.queue.DistributedQueue.runLoop(DistributedQueue.java:560) ~[crawler-cli.jar:na]
    at org.apache.curator.framework.recipes.queue.DistributedQueue.access$000(DistributedQueue.java:64) ~[crawler-cli.jar:na]
    at org.apache.curator.framework.recipes.queue.DistributedQueue$1.call(DistributedQueue.java:195) ~[crawler-cli.jar:na]
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) ~[na:1.7.0_25]
    at java.util.concurrent.FutureTask.run(FutureTask.java:166) ~[na:1.7.0_25]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_25]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) ~[na:1.7.0_25]
    at java.lang.Thread.run(Thread.java:724) ~[na:1.7.0_25]
Caused by: java.io.FileNotFoundException: /home/crap/storage/dwca/ad43e954-dd79-4986-ae34-9ccdbd8bf568/meta.xml (No such file or directory)
    at java.io.FileOutputStream.open(Native Method) ~[na:1.7.0_25]
    at java.io.FileOutputStream.<init>(FileOutputStream.java:212) ~[na:1.7.0_25]
    at java.io.FileOutputStream.<init>(FileOutputStream.java:165) ~[na:1.7.0_25]
    at org.gbif.utils.file.CompressionUtil.ungzipFile(CompressionUtil.java:154) ~[crawler-cli.jar:na]
    at org.gbif.utils.file.CompressionUtil.decompressFile(CompressionUtil.java:112) ~[crawler-cli.jar:na]
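
Note on the trace above: decompressFile (CompressionUtil.java:112) reached ungzipFile, which failed while opening its output file, and the outer exception then reports "Neither zip nor gzip". One way to check what kind of archive was actually downloaded is to inspect its leading magic bytes before choosing a decompressor. The following is only a minimal sketch of that idea in Java. ArchiveSniffer, Kind and detect are hypothetical names used for illustration; they are not part of org.gbif.utils, and this is not necessarily how the problem was resolved.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

/** Hypothetical helper: guesses the archive type from its first bytes. */
public class ArchiveSniffer {

    public enum Kind { ZIP, GZIP, UNKNOWN }

    /** Classifies a header that has already been read into memory. */
    public static Kind detect(byte[] header) {
        // Zip files start with "PK" followed by 03 04 (local file header)
        // or 05 06 (end-of-central-directory record of an empty archive).
        if (header.length >= 4 && header[0] == 0x50 && header[1] == 0x4B
                && ((header[2] == 0x03 && header[3] == 0x04)
                    || (header[2] == 0x05 && header[3] == 0x06))) {
            return Kind.ZIP;
        }
        // Gzip streams start with the two bytes 1F 8B.
        if (header.length >= 2
                && (header[0] & 0xFF) == 0x1F && (header[1] & 0xFF) == 0x8B) {
            return Kind.GZIP;
        }
        return Kind.UNKNOWN;
    }

    /** Reads up to four bytes from the file and classifies them. */
    public static Kind detect(Path file) throws IOException {
        byte[] header = new byte[4];
        int n = 0;
        try (InputStream in = Files.newInputStream(file)) {
            // read() may return fewer bytes than requested, so loop until
            // the buffer is full or the stream ends.
            while (n < header.length) {
                int r = in.read(header, n, header.length - n);
                if (r < 0) {
                    break;
                }
                n += r;
            }
        }
        return detect(Arrays.copyOf(header, n));
    }
}
```

Calling ArchiveSniffer.detect on the downloaded file for dataset ad43e954-dd79-4986-ae34-9ccdbd8bf568 would show whether it is a zip, a gzip stream, or something else (for example an HTML error page), which helps separate a bad download from a decompression bug.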
    


Author: omeyn@gbif.org
Comment: Fixed by Markus's update to the crawler's unzip handling.
Created: 2014-09-17 10:50:38.085
Updated: 2014-09-17 10:50:38.085