Issue 13765

POST chunked content broken

13765
Reporter: mdoering
Assignee: fmendez
Type: Bug
Summary: POST chunked content broken
Priority: Major
Resolution: Done
Status: Closed
Created: 2013-09-02 18:39:55.39
Updated: 2015-03-06 16:54:12.371
Resolved: 2015-03-06 16:54:12.305
        
        
Description: might be varnish related, needs checking

ERROR [2013-09-02 18:36:15,300+0200] [http-bio-8080-exec-78] org.apache.catalina.core.ContainerBase.[Catalina].[localhost].[/occurrence-download-ws].[default]: Servlet.service() for servlet [default] in context with path [/occurrence-download-ws] threw exception [java.io.EOFException: No content to map to Object due to end of input] with root cause
java.io.EOFException: No content to map to Object due to end of input
        at org.codehaus.jackson.map.ObjectMapper._initForReading(ObjectMapper.java:2775) ~[jackson-mapper-asl-1.9.12.jar:1.9.12]
        at org.codehaus.jackson.map.ObjectMapper._readValue(ObjectMapper.java:2691) ~[jackson-mapper-asl-1.9.12.jar:1.9.12]
    


Author: mdoering@gbif.org
Created: 2013-09-03 18:10:10.93
Updated: 2013-09-04 10:18:22.715
        
This proves more difficult than initially expected. Using the clients directly against our webservices works, and using curl through Varnish also works fine.
Debugging shows that Varnish removes the actual body content. The HTTP spec apparently mandates that a POST with a body sets either Content-Length or Transfer-Encoding. The Jersey clients do not produce Content-Length. Curl does, which is why it works.

Varnishlog shows that the original client request does have a transfer encoding header though:
{noformat}
7 RxHeader     c Transfer-Encoding: chunked
{noformat}

We use Varnish 3.0.4 on apidev.gbif.org, which is the latest stable version.

Varnish turns this into the following, which the Jersey webservices do not seem to understand:
{noformat}
    7 Debug        c Transfer-Encoding in request
{noformat}

    


Author: mdoering@gbif.org
Created: 2013-09-03 22:12:32.135
Updated: 2013-09-04 09:59:21.163
        
If I use chunked transfer encoding in curl it also fails:
--header "Transfer-Encoding: chunked"

It appears varnish removes chunked content:
https://www.varnish-cache.org/trac/wiki/Future_Feature#Chunkedencodingclientrequests
https://www.varnish-cache.org/trac/ticket/452

According to http://en.wikipedia.org/wiki/Chunked_transfer_encoding Content-Length and chunked transfers are alternative ways of transmitting the body entity
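
To make the two framings concrete, here is a minimal sketch of how the same body looks on the wire with Content-Length versus chunked transfer encoding. The class and method names (BodyFraming, withContentLength, chunked) and the sample body are made up for illustration. Only the framing header and the body are shown, not a full request, and an ASCII body is assumed so that a chunk boundary never splits a character.

```java
import java.nio.charset.StandardCharsets;

// Illustrative helper, not part of any GBIF or Jersey code.
class BodyFraming {

    /** Content-Length framing: the header states the size, the bytes follow unchanged. */
    public static String withContentLength(String body) {
        byte[] bytes = body.getBytes(StandardCharsets.US_ASCII);
        return "Content-Length: " + bytes.length + "\r\n\r\n" + body;
    }

    /**
     * Chunked framing: each chunk is prefixed with its size in hex,
     * and a zero-sized chunk marks the end of the body.
     */
    public static String chunked(String body, int chunkSize) {
        byte[] bytes = body.getBytes(StandardCharsets.US_ASCII);
        StringBuilder out = new StringBuilder("Transfer-Encoding: chunked\r\n\r\n");
        for (int off = 0; off < bytes.length; off += chunkSize) {
            int len = Math.min(chunkSize, bytes.length - off);
            out.append(Integer.toHexString(len)).append("\r\n");
            out.append(new String(bytes, off, len, StandardCharsets.US_ASCII)).append("\r\n");
        }
        out.append("0\r\n\r\n"); // terminating chunk, no trailers
        return out.toString();
    }

    public static void main(String[] args) {
        String body = "hello world"; // stand-in for the JSON download request
        System.out.println(withContentLength(body));
        System.out.println(chunked(body, 4));
    }
}
```

A proxy that only handles the first form has no length to rely on in the second one, which would fit the empty body the backend ends up seeing.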
    


Author: mdoering@gbif.org
Created: 2013-09-03 22:56:28.087
Updated: 2013-09-03 23:01:17.991
        
Experimenting with the latest Jersey 2.2 shows that by default Content-Length is set when using HttpURLConnection and Jackson. If I use the Apache HTTP client, chunked encoding is used and Varnish even returns a 503.

If we used the latest Jersey with the default connector, we could probably also use the clients in UDFs and bypass the HTTP client version problem. Maybe an option?
    


Author: mdoering@gbif.org
Created: 2013-09-04 10:05:53.804
Updated: 2013-09-04 10:05:53.804
        
Options to potentially get around this problem are:

1) make the backends public and keep things as they are, bypassing Varnish
2) try to find a solution so that Varnish does not drop the chunked content
3) upgrade to Jersey 2.2
  3.1) only upgrade the clients, which is "just" a simple refactoring. After all, our webservices are just RESTful services, and we should be able to use different Jersey versions on the client and server side without a problem
  3.2) upgrade all of Jersey, but then we face the Guice problem. There is no guice-servlet support in Jersey 2.2 so far. Simple injections of our Guice modules into Jersey resources should work with the HK2 bridge, but it's surely painful and we definitely cannot use any guice-servlet specific annotations like @RequestScoped

https://java.net/jira/browse/JERSEY-1950
https://java.net/jira/browse/HK2-121
https://hk2.java.net/guice-bridge/index.html
    


Author: trobertson@gbif.org
Created: 2013-09-04 10:46:57.503
Updated: 2013-09-04 10:46:57.503
        
Just from reading about other people trying this, 3.2) looks scary. From a Jersey committer on a Jira issue I saw (sorry, I forgot the issue number and it was late last night) I read something like "Jersey 2.0+ has no Guice support. Period. Any of these workarounds will not work and it requires Jersey development".

I would have thought there was a 4)

4) Investigate why Jersey 1.0 does chunking even for a post(String) when using Apache 4 client, and see if it can be patched / configured not to
    


Author: mdoering@gbif.org
Created: 2013-09-04 10:52:41.609
Updated: 2013-09-04 10:52:41.609
        
... or whether we really need the Apache client, if some other connector can handle non-chunked POSTs.

I would like to start by investigating Varnish though, as chunked transfers are nothing uncommon and this will also pose a problem for external clients, maybe using PHP or who knows what. Keep in mind that requesting a download already requires a POST.
    


Author: trobertson@gbif.org
Comment: Lars F: Or 5) we stop using Varnish and look at something like Apache Traffic Server
Created: 2013-09-04 11:26:52.464
Updated: 2013-09-04 11:26:52.464


Author: mdoering@gbif.org
Comment: before we go ahead with difficult solutions we will wait for replies to this varnish forum post: https://www.varnish-cache.org/forum/topic/1184
Created: 2013-09-04 12:04:59.126
Updated: 2013-09-04 12:04:59.126


Author: mdoering@gbif.org
Comment: Use download ws directly for now: https://code.google.com/p/gbif-portal/source/detail?r=1945
Created: 2013-09-04 12:07:01.59
Updated: 2013-09-04 12:07:01.59


Author: mdoering@gbif.org
Created: 2013-09-04 15:53:23.882
Updated: 2013-09-04 16:16:26.027
        
When using HTTP client 3.1 or the default UrlConnection in a simple test example, Content-Length is used and it all works.
See https://github.com/mdoering/chunked-test
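
The same behaviour can be reproduced with the JDK alone. The following sketch, which is not the code from the test repository above, POSTs a small body to a throwaway local com.sun.net.httpserver.HttpServer and reports which framing header the server received. The class name FramingProbe, the sample body and the local server are assumptions for illustration. Varnish is not involved here, and Java 9+ is assumed for readAllBytes.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicReference;

// Illustrative probe, not part of any GBIF client.
class FramingProbe {

    /** POSTs a small body to a local server and returns the framing header the server saw. */
    public static String probe(boolean chunked) throws Exception {
        AtomicReference<String> seen = new AtomicReference<>();
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/", exchange -> {
            String te = exchange.getRequestHeaders().getFirst("Transfer-Encoding");
            String cl = exchange.getRequestHeaders().getFirst("Content-Length");
            seen.set(te != null ? "Transfer-Encoding: " + te : "Content-Length: " + cl);
            exchange.getRequestBody().readAllBytes(); // drain the request body
            exchange.sendResponseHeaders(204, -1);    // no response body
            exchange.close();
        });
        server.start();
        try {
            byte[] body = "{\"demo\":true}".getBytes(StandardCharsets.UTF_8);
            URI uri = URI.create("http://127.0.0.1:" + server.getAddress().getPort() + "/");
            HttpURLConnection conn = (HttpURLConnection) uri.toURL().openConnection();
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            if (chunked) {
                conn.setChunkedStreamingMode(0); // 0 selects the default chunk size
            }
            // Without a streaming mode the body is buffered and Content-Length is sent.
            try (OutputStream out = conn.getOutputStream()) {
                out.write(body);
            }
            conn.getResponseCode(); // forces the request to complete
            return seen.get();
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("default mode: " + probe(false));
        System.out.println("chunked mode: " + probe(true));
    }
}
```

In default mode HttpURLConnection buffers the body and sends Content-Length, while chunked streaming mode sends Transfer-Encoding: chunked, which is the variant that fails behind Varnish in this issue.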
    


Author: mdoering@gbif.org
Comment: Verified that I can create a new download on apidev when using http client 3.1 with our clients
Created: 2013-09-04 16:30:19.769
Updated: 2013-09-04 16:30:19.769


Author: mdoering@gbif.org
Created: 2013-09-04 23:08:05.344
Updated: 2013-09-04 23:08:05.344
        
NGINX can be used both as a routing frontend with rewrites and, rather recently, as a caching reverse proxy:
http://serverfault.com/questions/30705/how-to-set-up-nginx-as-a-caching-reverse-proxy
http://nginx.org/en/docs/http/ngx_http_proxy_module.html
    


Author: mdoering@gbif.org
Created: 2013-09-05 22:14:01.993
Updated: 2013-09-05 22:14:40.236
        
Update on Jersey 2 and Guice: it is pretty straightforward to use the HK2 Guice bridge and inject instances from Guice, and the other way around. What is problematic is using the guice-servlet extension and its new scopes. If you wire up Jersey in the regular way, with an Application that is put into the web.xml, it is very simple to install Guice modules in that application and then inject them into Jersey resources. I don't think we really need the guice-servlet extension, and this would already work fine for our server code if we ever want to upgrade to Jersey 2!

I've tested all this in a small local project, so I'm sure it works.
    


Author: mdoering@gbif.org
Comment: Got a varnish reply saying chunked POSTs are not supported right now, but hopefully in the next release
Created: 2013-09-10 14:18:43.66
Updated: 2013-09-10 14:18:43.66


Author: mdoering@gbif.org
Comment: Bypassing the issue in the one critical request download method: https://code.google.com/p/gbif-occurrencestore/source/detail?r=2144
Created: 2013-09-10 16:28:57.611
Updated: 2013-09-10 16:28:57.611


Author: trobertson@gbif.org
Comment: Can be closed?
Created: 2013-09-19 15:34:54.517
Updated: 2013-09-19 15:34:54.517


Author: mdoering@gbif.org
Created: 2013-09-26 17:10:30.379
Updated: 2013-09-26 17:10:30.379
        
The create download client works now, but any other POSTs via a client will still have the chunked problem.
Keeping the issue open and leaving it as major. I can imagine it is critical for users of the registry client, especially for our own internal use. We presumably cannot use the cached API right now when we want to POST things. Needs verification.
    


Author: fmendez@gbif.org
Created: 2015-03-06 16:54:12.364
Updated: 2015-03-06 16:54:12.364
        
Occurrence downloads use the registry client and the Varnish/API URLs without any problem, so we can close this issue.
see: https://github.com/gbif/gbif-configuration/blob/master/occurrence-download-workflow/profiles.xml#L92