Issue 18301

Deleting old downloads to free space

18301
Reporter: mblissett
Assignee: mblissett
Type: Bug
Summary: Deleting old downloads to free space
Priority: Blocker
Resolution: Fixed
Status: Closed
Created: 2016-03-01 18:18:04.075
Updated: 2016-03-02 15:35:34.753
Resolved: 2016-03-02 15:35:34.682
        
Description: I've reduced the replication factor from 3→2 for every download that was at 3. That's saved 4TB, usage is now 78%.

All downloads use 47TB, or 77TB in total once replicas are included.

I propose deleting the 520 downloads larger than 10GB and from 2013 or 2014.  That should free an additional 17TB.
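As a rough illustration of that selection rule, here is a minimal Python sketch. The `Download` record, its field names, and the decimal `GB` constant are assumptions made for the example; they are not the real download registry schema. Space freed is counted as size multiplied by replication factor, so that replica space is included.

```python
from dataclasses import dataclass
from datetime import date

GB = 10**9  # assumption: decimal gigabytes


@dataclass(frozen=True)
class Download:
    """Hypothetical record; the field names are illustrative only."""
    key: str
    size_bytes: int
    created: date
    replication: int = 2


def deletion_candidates(downloads, min_size_bytes=10 * GB, years=(2013, 2014)):
    """Return downloads larger than the threshold and created in the given years."""
    return [
        d for d in downloads
        if d.size_bytes > min_size_bytes and d.created.year in years
    ]


def space_freed_bytes(downloads):
    """Raw disk space released, counting every replica of each download."""
    return sum(d.size_bytes * d.replication for d in downloads)


# Made-up sample data, only to show the filter in action.
sample = [
    Download("a", 12 * GB, date(2013, 5, 1)),
    Download("b", 50 * GB, date(2015, 1, 1)),                    # too recent
    Download("c", 5 * GB, date(2014, 3, 1)),                     # too small
    Download("d", 20 * GB, date(2014, 12, 31), replication=3),
]
candidates = deletion_candidates(sample)
print([d.key for d in candidates], space_freed_bytes(candidates) / GB, "GB freed")
```

Listing the candidates separately from deleting them makes it possible to review the list and the expected saving before anything is removed.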

How much free space do we want to aim for?  Deleting downloads from heavy users (Jörg etc.) will free up something like 7-12TB (including replica space).  Deleting downloads larger than 10GB and more than 6 months old will free up an additional 2×20TB.

It's not worth bothering about downloads < 1GB; they take up about 1½×3.5TB. We could perhaps put everything < 0.1GB up to replication factor 2, which would cost an extra 500GB or so.

{code}
Max size, Space used (GB)
      1k,      0.0004
     10k,      0.09
    100k,      2.5
      1m,     14.6
     10m,     70.2
    100m,    304.1
      1g,    812.5
     10g,   2053.9
    100g,  37483.2
{code}
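
A breakdown like the one above could be produced along these lines. This is only a sketch under stated assumptions: sizes are in bytes, units are decimal (1k = 1000 bytes), and each row counts only the downloads between the previous maximum and its own maximum (not a running total). The names `LABELS`, `BOUNDS`, and `space_by_bucket` are made up for the example.

```python
# Bucket labels and their upper bounds in bytes (assumed decimal: 1k = 10**3).
LABELS = ["1k", "10k", "100k", "1m", "10m", "100m", "1g", "10g", "100g"]
BOUNDS = [10**e for e in range(3, 12)]  # 10**3 through 10**11


def space_by_bucket(sizes_bytes):
    """Sum download sizes into the first bucket whose upper bound fits."""
    totals = dict.fromkeys(LABELS, 0)
    for size in sizes_bytes:
        for label, bound in zip(LABELS, BOUNDS):
            if size <= bound:
                totals[label] += size
                break
        else:
            # No bucket matched: the download exceeds the largest bound.
            raise ValueError(f"download of {size} bytes exceeds the largest bucket")
    return totals


# Made-up sizes, only to show the output shape.
totals = space_by_bucket([500, 2_000, 1_500_000_000])
for label in LABELS:
    print(f"{label:>5}, {totals[label] / 10**9:.4f}")
```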

DataCite "will soon be releasing an improved service for DOI resolution statistics", which would be useful: https://www.datacite.org/services/get-your-doi-statistics.html (currently we can only see our top 10 DOI uses, which are all dataset DOIs).
    


Author: trobertson@gbif.org
Created: 2016-03-02 09:35:33.955
Updated: 2016-03-02 09:46:45.806
        
Your proposal sounds good: "520 downloads larger than 10GB and from 2013 or 2014. That should free an additional 17TB."

Can you please send me the names (offline) of the "heavy users", whom I will most likely recognise?  We can remove their downloads as well once we know what they are for.

For anyone stumbling on this discussion: This is an interim solution before an "opt-in" approach is developed in the future.
    


Author: mblissett
Created: 2016-03-02 15:35:34.719
Updated: 2016-03-02 15:35:34.719
        
Done — there's now 53.5TB or 40% free.