Issue 17754
Implement MD5 checking on DwC-A Downloader
17754
Reporter: trobertson
Type: Improvement
Summary: Implement MD5 checking on DwC-A Downloader
Priority: Major
Status: Open
Created: 2015-08-06 11:26:54.384
Updated: 2015-08-06 11:27:07.279
Description: The DwC-A downloader uses conditional GET requests to pull only archives that have changed. However, for installations that do not support conditional GET, we download the archive and go through indexing every time. This is inefficient, and many installations are in this situation.
We should try a conditional GET and, if it is not supported, calculate the MD5 of the archive we currently hold, download the new archive, calculate its MD5 and compare the two. If there is no change we stop; otherwise we continue.
This will significantly reduce unnecessary load on the occurrence stream processing.
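A minimal sketch of the MD5 fallback described above, using only the JDK. The class name ArchiveChangeCheck and the methods md5 and hasChanged are hypothetical and are not taken from the crawler code. It assumes the MD5 of the existing archive is computed before the download, and that a null digest means no earlier archive exists.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of the gbif crawler code base.
public class ArchiveChangeCheck {

  /** Streams the file through an MD5 digest so large archives are not held in memory. */
  static byte[] md5(Path file) throws IOException, NoSuchAlgorithmException {
    MessageDigest digest = MessageDigest.getInstance("MD5");
    try (InputStream in = Files.newInputStream(file)) {
      byte[] buffer = new byte[8192];
      int read;
      while ((read = in.read(buffer)) != -1) {
        digest.update(buffer, 0, read);
      }
    }
    return digest.digest();
  }

  /**
   * Compares the MD5 taken before the download with the MD5 of the new file.
   * Assumption: previousMd5 is null when there is no earlier archive,
   * in which case the download is always treated as changed.
   */
  static boolean hasChanged(byte[] previousMd5, Path downloaded)
      throws IOException, NoSuchAlgorithmException {
    if (previousMd5 == null) {
      return true;
    }
    return !MessageDigest.isEqual(previousMd5, md5(downloaded));
  }
}
```

The caller would compute md5 on the existing archive, perform the download, and skip indexing when hasChanged returns false. The archive still has to be downloaded in this fallback, so the saving is in the indexing step only.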
https://github.com/gbif/crawler/blob/master/crawler-cli/src/main/java/org/gbif/crawler/dwca/downloader/DwcaCrawlConsumer.java#L77