Issue 14303

Problem downloading large batches of data (around 2 GB or more)

Reporter: feedback bot
Assignee: omeyn
Type: Feedback
Summary: Problem downloading large batches of data (around 2 GB or more)
Resolution: Fixed
Status: Closed
Created: 2013-10-25 17:54:08.976
Updated: 2013-12-19 16:48:28.371
Resolved: 2013-11-06 09:50:05.5
Description: We are working on a model for freshwater species, and we would like to download data from GBIF locally to work on it.
I have noticed that whenever I download a file of around 2 GB or more, I do not get all the data.
The archive looks fine, but when I extract it, I get a message saying that there is an error with the occurrence.txt file.
Then, when I load it into my database, I get an error because the last line of the data is incomplete. Every attempt I have made ends with the same kind of problem.
I have run various tests from different computers, but the result is always the same...
I think the problem comes from the GBIF side, because the zip file is not corrupted, and it is not normal that the occurrence.txt file never ends cleanly (at the end of a data line).

Cheers, Onésime.

*Reporter*: Onésime Prud'homme
*E-mail*: []

Comment:  "the last line of the data which is not complete" - any ideas []?
Created: 2013-10-28 11:41:44.386
Updated: 2013-10-28 11:41:44.386

Comment: I'm emailing with him, we'll see.
Created: 2013-10-28 11:46:36.211
Updated: 2013-10-28 11:46:36.211

Created: 2013-10-28 13:21:54.082
Updated: 2013-10-28 13:21:54.082
His response (note: this may be the bad zip problem fixed with Java 7; I'll check):

No worries, and thank you for answering.

I am working on a model for freshwater species, so we would like to get occurrences for species linked to freshwater. For the moment I am working on Swedish fishes.

As there is no filter for groups such as Fishes, Birds, Mammals, etc., I tried to download all the Swedish occurrences, but that does not work.

So I then tried only "Cyprinidae", and that works well: because there are fewer occurrences, the file is quite small (3.9 MB).

For all my download tests, I always used at least:
Country = Sweden
Basis of records = all except "fossil"
Ideally, I would like to be able to download all the data for Sweden, so that I can work on my computer without downloading many small groups of species.

When I unzip the archive file, I get this error message: "CRC failed in 'occurrence.txt. File is broken".
And when I open the file with a text editor, I see that the last line is incomplete.

Below is the email with one of the links.

Thanks for your help. If you need more details, do not hesitate to ask.


-------- Original message --------
Subject: Your GBIF data download is ready
Date: Thu, 17 Oct 2013 16:13:19 +0200 (CEST)
From:
To:
Your download 0001571-130906152512535 is ready at the following address:

Comment: It was the unzip-on-Windows (Java 6) problem; he's sorted now.
Created: 2013-11-06 09:50:05.697
Updated: 2013-11-06 09:50:05.697
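
For anyone hitting the same symptoms, the two checks described in this thread (a CRC failure on extraction and an incomplete last line) can be sketched in Python as below. This is only a minimal sketch, not GBIF tooling: the function names first_corrupt_member and last_line_is_complete are hypothetical, and the second check assumes that every record in the extracted occurrence.txt ends with a newline.

```python
import os
import zipfile


def first_corrupt_member(zip_path):
    """Return the name of the first archive member that fails its CRC check, or None.

    Hypothetical helper: it only wraps ZipFile.testzip(), which reads every
    member of the archive and verifies its CRC and header.
    """
    with zipfile.ZipFile(zip_path) as zf:
        return zf.testzip()


def last_line_is_complete(text_path):
    """Return True if the extracted file ends with a newline.

    Assumption: each record is terminated by a newline, so a file that ends
    without one was probably cut off in the middle of a record.
    """
    with open(text_path, "rb") as fh:
        fh.seek(0, os.SEEK_END)
        if fh.tell() == 0:
            return True  # an empty file has no partial last line
        fh.seek(-1, os.SEEK_END)
        return fh.read(1) == b"\n"
```

If first_corrupt_member returns None for the downloaded archive while another unzip tool reports a CRC failure, the extraction tool rather than the archive may be at fault, which would be consistent with how this issue was closed.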