Issue 17524

Serve downloads from a more reliable storage

17524
Reporter: fmendez
Assignee: omeyn
Type: Story
Summary: Serve downloads from a more reliable storage
Priority: Major
Resolution: Fixed
Status: Closed
Created: 2015-03-24 22:02:34.267
Updated: 2015-04-14 09:54:43.04
Resolved: 2015-04-14 09:54:43.011
        
Description:     We currently believe we should store and serve downloads straight from HDFS. If we do this, there are a few things to consider:
       - The Hadoop NFS Gateway might be suitable, as it would allow existing code to simply write to HDFS. OM points out that it might have security implications: we believe the NFS mount has to mount the root of HDFS, making it trivial for a user to accidentally issue “rm -fr /mnt/hdfs” on a Linux box.
       - The Hadoop HTTP gateway might be used to serve the downloads, with writes going through the HDFS Java API.
       - We could code a proxy that serves incoming requests from HDFS, avoiding additional CDH-managed services (the NFS or HTTP gateways).
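For the HTTP gateway option, HttpFS exposes the WebHDFS REST API, so a download could be streamed to a client by opening the file with {{op=OPEN}}. The sketch below only builds the request URL. The host name, file path and user name are hypothetical placeholders, and port 14000 assumes HttpFS's default; none of this reflects our actual setup.

```shell
#!/usr/bin/env bash
# Sketch: build an HttpFS (WebHDFS REST API) URL that reads a file out of HDFS.
# Host, path and user are hypothetical placeholders, not real GBIF values.

httpfs_open_url() {
  local host=$1   # HttpFS server (hypothetical)
  local path=$2   # absolute HDFS path of the download file
  local user=$3   # user name for simple (pseudo) authentication
  # 14000 is the default HttpFS port; op=OPEN returns the file contents.
  echo "http://${host}:14000/webhdfs/v1${path}?op=OPEN&user.name=${user}"
}
```

A client could then fetch a file with something like {{curl -L -o example.zip "$(httpfs_open_url httpfs-host /occurrence-download/example.zip someuser)"}}, where every argument is again a placeholder.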
    


Author: trobertson@gbif.org
Created: 2015-03-25 07:45:19.875
Updated: 2015-03-25 07:45:19.875
        
On the NFS guide it says to mount using
{code}
mount -t nfs -o vers=3,proto=tcp,nolock $server:/  $mount_point
{code}

Presumably you could therefore mount {{$server:/occurrence_download}} instead, which might address the concern about accidental deletion.

http://docs.hortonworks.com/HDPDocuments/HDP1/HDP-1.3.9/bk_user-guide/content/user-guide-hdfs-nfs.html
    


Author: omeyn@gbif.org
Comment: I propose we use the NFS gateway, set up to export only /occurrence-downloads, and even then mount it read-only. Throughput is limited to 50 MB/s, but I figure that should be enough to handle around 10-20 concurrent downloads at typical user internet speeds (20-50 Mbit/s). It is also self-limiting, in that we can't bring down the HDFS cluster with too much load (as we might with HttpFS or a custom proxy).
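The concurrency estimate can be checked with a quick back-of-the-envelope calculation. This sketch assumes 1 MB = 8 Mbit and that every user downloads at full speed. It ignores protocol overhead, so treat it as an upper bound.

```shell
#!/usr/bin/env bash
# Rough capacity estimate for the NFS gateway, using the figures quoted above.
gateway_mbyte_per_s=50                               # quoted gateway throughput, MB/s
gateway_mbit_per_s=$(( gateway_mbyte_per_s * 8 ))    # 400 Mbit/s

# Number of full-speed downloads that fit at a given per-user speed (integer division).
concurrent_downloads() {
  local user_mbit_per_s=$1
  echo $(( gateway_mbit_per_s / user_mbit_per_s ))
}

echo "at 50 Mbit/s per user: $(concurrent_downloads 50)"   # 8
echo "at 20 Mbit/s per user: $(concurrent_downloads 20)"   # 20
```

That gives roughly 8-20 simultaneous downloads, which is in line with the 10-20 figure above.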
Created: 2015-04-07 13:35:18.23
Updated: 2015-04-07 13:35:18.23


Author: omeyn@gbif.org
Comment: reopening until implemented
Created: 2015-04-07 13:43:36.257
Updated: 2015-04-07 13:43:36.257


Author: omeyn@gbif.org
Comment: The prod NFS gateway runs on prodmaster1-vh and the UAT NFS gateway on prodmaster2-vh. Both are read-only and export their respective /occurrence-download/{env}-downloads directories.
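On a client, mounting one of these exports only needs the gateway host and the export path. This is a minimal sketch that assumes clients mount the same path the gateway exports (you can confirm this with {{showmount -e <gateway>}}). The environment value passed in is hypothetical, because the issue only gives the {env} pattern.

```shell
#!/usr/bin/env bash
# Sketch: build the NFS mount source for one environment's read-only download export.
# Assumes the client-visible export path equals the HDFS directory named in the issue;
# the env value substituted for {env} is a hypothetical example.

downloads_export() {
  local gateway=$1   # e.g. prodmaster1-vh (prod) or prodmaster2-vh (UAT)
  local env=$2       # environment name substituted into {env}
  echo "${gateway}:/occurrence-download/${env}-downloads"
}
```

For example, {{sudo mount -t nfs -o vers=3,proto=tcp,nolock,ro "$(downloads_export prodmaster1-vh prod)" /mnt/occurrence-download}} reuses the mount options from the NFS guide and adds {{ro}} on the client side. The mount point and the {{prod}} value are placeholders.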
Created: 2015-04-14 09:54:43.035
Updated: 2015-04-14 09:54:43.035