Issue 17754

Implement MD5 checking on DwC-A Downloader

17754
Reporter: trobertson
Type: Improvement
Summary: Implement MD5 checking on DwC-A Downloader
Priority: Major
Status: Open
Created: 2015-08-06 11:26:54.384
Updated: 2015-08-06 11:27:07.279
        
Description: The DwC-A downloader uses conditional GETs so that it pulls only archives that have changed. However, when an installation does not support conditional GET, we download the archive and run it through indexing on every crawl, even if nothing has changed. This is inefficient, and many installations are in this situation.

We should try a conditional GET first. If it is not supported, calculate the MD5 of the archive we currently hold, download the archive again, calculate the MD5 of the new copy and compare the two. If they match, stop because there is no change; otherwise continue.

This will significantly reduce unnecessary load on the occurrence stream processing.
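
A minimal sketch of the MD5 comparison step, using only the JDK. The class name Md5ChangeCheck and the methods md5 and isUnchanged are hypothetical and are not part of the existing crawler code. The sketch also assumes that the previously downloaded archive is still on disk when the new copy arrives.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.DigestInputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of the crawler code base.
public class Md5ChangeCheck {

  /** Streams the file through an MD5 digest and returns the raw 16-byte hash. */
  public static byte[] md5(Path file) throws IOException, NoSuchAlgorithmException {
    MessageDigest digest = MessageDigest.getInstance("MD5");
    try (InputStream in = new DigestInputStream(Files.newInputStream(file), digest)) {
      byte[] buffer = new byte[8192];
      while (in.read(buffer) != -1) {
        // Nothing to do here: reading through DigestInputStream updates the digest.
      }
    }
    return digest.digest();
  }

  /**
   * Returns true when both archives have the same MD5, meaning the crawl can stop.
   * Assumption: a missing previous archive counts as "changed", so a first
   * download always continues to processing.
   */
  public static boolean isUnchanged(Path previousArchive, Path downloadedArchive)
      throws IOException, NoSuchAlgorithmException {
    if (previousArchive == null || !Files.exists(previousArchive)) {
      return false;
    }
    return MessageDigest.isEqual(md5(previousArchive), md5(downloadedArchive));
  }
}
```

In the downloader this check would run only on the fallback path, after a conditional GET has turned out to be unsupported. Hashing the existing archive before it is overwritten, or downloading to a temporary file first, keeps both copies available for the comparison.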


https://github.com/gbif/crawler/blob/master/crawler-cli/src/main/java/org/gbif/crawler/dwca/downloader/DwcaCrawlConsumer.java#L77