Issue 18183

Two downloads of the same dataset differ in size?

Reporter: feedback bot
Assignee: mblissett
Type: Feedback
Summary: Two downloads of the same dataset differ in size?
Resolution: WontFix
Status: Closed
Created: 2016-01-28 11:26:43.868
Updated: 2016-01-29 09:44:49.37
Resolved: 2016-01-28 12:25:19.844
Description: Hi GBIF,

I accidentally started two downloads of the same dataset in a few seconds interval (doi:10.15468/dl.hq24pa and doi:10.15468/dl.f60ih2). Strangely, the file size for those two downloads is quite different.

No big deal for me, but I thought you'd want to know!


Nico

Author: mblissett
Created: 2016-01-28 12:25:19.882
Updated: 2016-01-28 12:25:19.882
The user has downloaded the records while the dataset was being recrawled.  The dataset now has 192k occurrences:

(Compare with UAT, which — for the moment — has the old dataset: 75k occurrences: )

I don't think there's a way to avoid this.  Delaying downloads until any crawling of the data they depend on has finished could mean extremely long delays, and many downloads retrieve records from lots of datasets.

Should we show information on the crawl status on the dataset page?

Created: 2016-01-28 12:30:04.673
Updated: 2016-01-28 12:30:04.673
It would be a nice addition to show a note on a dataset page while it's being crawled. We don't even clearly show the last time we indexed it; see POR-1736.
[~hoefft], potentially also a page that shows the datasets currently being crawled, as not all users access our data via a single dataset key.

Author: hoefft
Comment: Seems reasonable. Would it be useful to have a section in "my downloads" where I can see whether the same query would give very different results if downloaded today, with notes such as "twice the number of records" or "has been cleaned"? For datasets, that might be interesting.
Created: 2016-01-29 09:44:38.828
Updated: 2016-01-29 09:44:38.828