Issue 17591

Indexing dwca validator fails to read valid zip archives

Reporter: mdoering
Type: Bug
Summary: Indexing dwca validator fails to read valid zip archives
Priority: Critical
Status: Open
Created: 2015-05-20 16:09:09.149
Updated: 2015-05-20 16:09:35.557
        
Description: Validating the following two archives manually with http://tools.gbif.org/dwca-validator/ succeeds, but the validator embedded in our indexing crawler fails on both with an UnsupportedArchiveException.
-----

 - http://www.gbif.org/dataset/80b4b440-eaca-4860-aadf-d0dfdd3e856e
     - https://github.com/gbif/iczn-lists/archive/master.zip

 - http://www.gbif.org/dataset/c5e74a7d-f81d-43d5-9cb3-dc31d914e3ed
     - https://dl.dropboxusercontent.com/u/457027/IndicesNominumSupragenericorumPlantarumVascularium.zip

-----
WARN  [2015-05-20 16:04:00,753+0200] [pool-9-thread-1] org.gbif.crawler.dwca.validator.ValidatorService: Invalid Dwc archive for dataset 80b4b440-eaca-4860-aadf-d0dfdd3e856e
org.gbif.dwc.text.UnsupportedArchiveException: The archive given is a folder with more or less than 1 data files having a txt or csv suffix
	at org.gbif.dwc.text.ArchiveFactory.openArchive(ArchiveFactory.java:337) ~[crawler-cli.jar:na]
	at org.gbif.crawler.dwca.LenientArchiveFactory.openArchive(LenientArchiveFactory.java:40) ~[crawler-cli.jar:na]
	at org.gbif.crawler.dwca.LenientArchiveFactory.openArchive(LenientArchiveFactory.java:57) ~[crawler-cli.jar:na]
	at org.gbif.crawler.dwca.validator.ValidatorService$DwcaDownloadFinishedMessageCallback.handleMessage(ValidatorService.java:90) [crawler-cli.jar:na]
	at org.gbif.crawler.dwca.validator.ValidatorService$DwcaDownloadFinishedMessageCallback.handleMessage(ValidatorService.java:54) [crawler-cli.jar:na]
	at org.gbif.common.messaging.MessageConsumer.handleCallback(MessageConsumer.java:101) [crawler-cli.jar:na]
	at org.gbif.common.messaging.MessageConsumer.handleDelivery(MessageConsumer.java:65) [crawler-cli.jar:na]
	at com.rabbitmq.client.impl.ConsumerDispatcher$4.run(ConsumerDispatcher.java:121) [crawler-cli.jar:na]
	at com.rabbitmq.client.impl.ConsumerWorkService$WorkPoolRunnable.run(ConsumerWorkService.java:76) [crawler-cli.jar:na]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) [na:1.7.0_75]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_75]
	at java.lang.Thread.run(Thread.java:745) [na:1.7.0_75]
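
One possible cause, not verified against these two archives: the exception message suggests the factory treated the unpacked folder as having no meta.xml and then did not find exactly one txt/csv data file directly in it. Zip files that GitHub generates for a branch wrap all content in a single top-level directory, so the first archive at least may be nested one level deep. The sketch below is a standalone diagnostic, not part of dwca-reader or the crawler; the class and method names are hypothetical. It lists the entries of a zip and reports whether everything sits under one wrapper directory and how many txt/csv files are at each level.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.Locale;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical diagnostic helper; not part of dwca-reader or crawler-cli.
public class ArchiveLayoutCheck {

    /** Returns the single top-level directory shared by all entries, or null if there is none. */
    public static String commonTopLevelDir(List<String> entryNames) {
        String common = null;
        for (String name : entryNames) {
            int slash = name.indexOf('/');
            if (slash < 0) {
                return null; // at least one entry sits directly at the zip root
            }
            String top = name.substring(0, slash);
            if (common == null) {
                common = top;
            } else if (!common.equals(top)) {
                return null; // more than one top-level directory
            }
        }
        return common;
    }

    /** Counts .txt/.csv files directly under the given prefix ("" means the zip root). */
    public static int countDataFiles(List<String> entryNames, String prefix) {
        int count = 0;
        for (String name : entryNames) {
            if (!name.startsWith(prefix)) {
                continue;
            }
            String rest = name.substring(prefix.length());
            if (rest.isEmpty() || rest.contains("/")) {
                continue; // skip the directory entry itself and anything nested deeper
            }
            String lower = rest.toLowerCase(Locale.ROOT);
            if (lower.endsWith(".txt") || lower.endsWith(".csv")) {
                count++;
            }
        }
        return count;
    }

    /** Reads all entry names from a zip file on disk. */
    public static List<String> entryNames(String zipPath) throws IOException {
        List<String> names = new ArrayList<String>();
        try (ZipFile zip = new ZipFile(zipPath)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        List<String> names = entryNames(args[0]);
        String wrapper = commonTopLevelDir(names);
        System.out.println("data files at zip root: " + countDataFiles(names, ""));
        if (wrapper != null) {
            System.out.println("single wrapper directory: " + wrapper);
            System.out.println("data files inside wrapper: " + countDataFiles(names, wrapper + "/"));
        }
    }
}
```

Run it with the path to a downloaded copy of each archive as the only argument. If a wrapper directory is reported and the root contains no data files, the embedded validator may need to descend into that directory before looking for meta.xml, as the online validator apparently already does.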