Issue 12401

Random NPE during Paging

Reporter: lfrancke
Assignee: mdoering
Type: Bug
Summary: Random NPE during Paging
Priority: Blocker
Resolution: Fixed
Status: Closed
Created: 2012-11-23 12:36:40.851
Updated: 2013-12-16 17:50:33.452
Resolved: 2012-12-03 11:20:53.401
        
Description: Repeating this request gives different results: http://staging.gbif.org:8080/registry-ws/dataset/?offset=13000&limit=1000

HTTP Error 500
{quote}
java.lang.NullPointerException
	java.lang.String$CaseInsensitiveComparator.compare(String.java:1217)
	java.lang.String$CaseInsensitiveComparator.compare(String.java:1211)
	java.lang.String.compareToIgnoreCase(String.java:1258)
	org.gbif.registry.service.DatasetServiceImpl$1.compare(DatasetServiceImpl.java:61)
	org.gbif.registry.service.DatasetServiceImpl$1.compare(DatasetServiceImpl.java:57)
	java.util.Arrays.mergeSort(Arrays.java:1270)
	java.util.Arrays.mergeSort(Arrays.java:1281)
	java.util.Arrays.mergeSort(Arrays.java:1281)
	java.util.Arrays.mergeSort(Arrays.java:1282)
	java.util.Arrays.mergeSort(Arrays.java:1282)
	java.util.Arrays.mergeSort(Arrays.java:1282)
	java.util.Arrays.mergeSort(Arrays.java:1281)
	java.util.Arrays.mergeSort(Arrays.java:1282)
	java.util.Arrays.sort(Arrays.java:1210)
	java.util.Collections.sort(Collections.java:157)
	org.gbif.registry.service.DatasetServiceImpl.list(DatasetServiceImpl.java:310)
	org.gbif.registry.ws.resources.NetworkEntityResource.list(NetworkEntityResource.java:136)
	org.gbif.registry.ws.resources.NetworkEntityResource.list(NetworkEntityResource.java:157)
	sun.reflect.GeneratedMethodAccessor1936.invoke(Unknown Source)
	sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	java.lang.reflect.Method.invoke(Method.java:597)
	com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
	com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185)
	com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
	com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288)
	com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
	com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
	com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
	com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1483)
	com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1414)
	com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1363)
	com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1353)
	com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:414)
	com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537)
	com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:708)
	javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
	com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263)
	com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178)
	com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
	com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62)
	com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
	com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
{quote}

{quote}
java.lang.NullPointerException
	org.gbif.registry.metadata.service.FileSystemMetadataService.listExternalNetworks(FileSystemMetadataService.java:104)
	org.gbif.registry.metadata.service.FileSystemMetadataService.listExternalDocuments(FileSystemMetadataService.java:134)
	org.gbif.registry.metadata.service.FileSystemMetadataService.listExternal(FileSystemMetadataService.java:232)
	org.gbif.registry.service.DatasetServiceImpl.list(DatasetServiceImpl.java:305)
	org.gbif.registry.ws.resources.NetworkEntityResource.list(NetworkEntityResource.java:136)
	org.gbif.registry.ws.resources.NetworkEntityResource.list(NetworkEntityResource.java:157)
	sun.reflect.GeneratedMethodAccessor1936.invoke(Unknown Source)
	sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	java.lang.reflect.Method.invoke(Method.java:597)
	com.sun.jersey.spi.container.JavaMethodInvokerFactory$1.invoke(JavaMethodInvokerFactory.java:60)
	com.sun.jersey.server.impl.model.method.dispatch.AbstractResourceMethodDispatchProvider$TypeOutInvoker._dispatch(AbstractResourceMethodDispatchProvider.java:185)
	com.sun.jersey.server.impl.model.method.dispatch.ResourceJavaMethodDispatcher.dispatch(ResourceJavaMethodDispatcher.java:75)
	com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288)
	com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108)
	com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147)
	com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84)
	com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1483)
	com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1414)
	com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1363)
	com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1353)
	com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:414)
	com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537)
	com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:708)
	javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
	com.google.inject.servlet.ServletDefinition.doService(ServletDefinition.java:263)
	com.google.inject.servlet.ServletDefinition.service(ServletDefinition.java:178)
	com.google.inject.servlet.ManagedServletPipeline.service(ManagedServletPipeline.java:91)
	com.google.inject.servlet.FilterChainInvocation.doFilter(FilterChainInvocation.java:62)
	com.google.inject.servlet.ManagedFilterPipeline.dispatch(ManagedFilterPipeline.java:118)
	com.google.inject.servlet.GuiceFilter.doFilter(GuiceFilter.java:113)
{quote}

Sometimes zero datasets are returned, sometimes a few, sometimes many.

The random nature of this suggests a threading issue, but I might be wrong.
    


Author: lfrancke@gbif.org
Created: 2012-11-23 12:43:26.557
Updated: 2012-11-23 12:43:26.557
        
{code:title=FileSystemMetadataService.listExternalNetworks}
    for (File networkDirectory : metadataDirectory.listFiles(directoryFilter)) {
{code}

{code:title=listFiles}
     * @return  An array of abstract pathnames denoting the files and
     *          directories in the directory denoted by this abstract pathname.
     *          The array will be empty if the directory is empty.  Returns
     *          {@code null} if this abstract pathname does not denote a
     *          directory, or if an I/O error occurs.
{code}

So this needs to check for null. I can't say if this _should_ ever happen though or if it is an error happening somewhere else.
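
A minimal sketch of such a null check, assuming a missing or unreadable directory can simply be treated as "no entries". The names `SafeListing`, `DIRECTORY_FILTER` and `listDirectories` are hypothetical and not taken from the registry code; only `File.listFiles(FileFilter)` is the real JDK call.

```java
import java.io.File;
import java.io.FileFilter;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SafeListing {

  /** Accepts directories only; an assumed stand-in for the directoryFilter used above. */
  public static final FileFilter DIRECTORY_FILTER = new FileFilter() {
    @Override
    public boolean accept(File f) {
      return f.isDirectory();
    }
  };

  /**
   * Returns the sub-directories of parent, or an empty list when
   * listFiles() returns null (parent is not a directory, or an I/O error occurred).
   */
  public static List<File> listDirectories(File parent) {
    File[] children = parent.listFiles(DIRECTORY_FILTER);
    if (children == null) {
      // Nothing to iterate over; callers can loop over the result safely.
      return new ArrayList<File>();
    }
    return new ArrayList<File>(Arrays.asList(children));
  }

  public static void main(String[] args) {
    // A path that is assumed not to exist, so listFiles() yields null internally.
    List<File> dirs = listDirectories(new File("no-such-directory-12401"));
    System.out.println("entries: " + dirs.size());
  }
}
```

Returning an empty list hides the distinction between "empty directory" and "I/O error"; if that distinction matters, logging or throwing in the null branch may be the better choice.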
    


Author: lfrancke@gbif.org
Created: 2012-11-23 12:45:27.45
Updated: 2012-11-23 12:45:27.45
        
{code:title=DatasetServiceImpl}
  /**
   * Comparator which compares two names lexicographically to be able to sort
   * them in ascending lexicographic order.
   */
  private final Ordering<Dataset> orderingByName = new Ordering<Dataset>() {

    @Override
    public int compare(Dataset left, Dataset right) {
      return Strings.nullToEmpty(left.getTitle()).compareToIgnoreCase(right.getTitle());
    }
  };
{code}

{code}
      if (!externals.isEmpty()) {
        // the non-augmented (internal) list is already sorted, but as it may include datasets
        // from the metadataService, it needs to be sorted again.
        Collections.sort(results, orderingByName);
      }
{code}
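
The first stack trace in the description ends in `String.compareToIgnoreCase`, which is consistent with the comparator above: `Strings.nullToEmpty` guards only the left title, so a `null` right title is passed straight into `compareToIgnoreCase`. Below is a minimal sketch of a comparator that guards both sides. It uses a plain `java.util.Comparator` instead of Guava's `Ordering` to stay self-contained, and `TitledItem` is a hypothetical stand-in for `Dataset`; it is not necessarily the fix that was committed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class NullSafeTitleSort {

  /** Hypothetical stand-in for Dataset; only the title matters here. */
  static class TitledItem {
    private final String title;
    TitledItem(String title) { this.title = title; }
    String getTitle() { return title; }
  }

  /** Same idea as Guava's Strings.nullToEmpty, inlined to avoid the dependency. */
  public static String nullToEmpty(String s) {
    return s == null ? "" : s;
  }

  /** Case-insensitive ordering by title that treats a null title as "" on both sides. */
  public static final Comparator<TitledItem> BY_TITLE = new Comparator<TitledItem>() {
    @Override
    public int compare(TitledItem left, TitledItem right) {
      return nullToEmpty(left.getTitle()).compareToIgnoreCase(nullToEmpty(right.getTitle()));
    }
  };

  /** Wraps the titles in items, sorts them with BY_TITLE and returns the titles in sorted order. */
  public static List<String> sortedTitles(String... titles) {
    List<TitledItem> items = new ArrayList<TitledItem>();
    for (String t : titles) {
      items.add(new TitledItem(t));
    }
    Collections.sort(items, BY_TITLE);
    List<String> out = new ArrayList<String>();
    for (TitledItem item : items) {
      out.add(item.getTitle());
    }
    return out;
  }

  public static void main(String[] args) {
    // The null title sorts first because it is compared as the empty string.
    System.out.println(sortedTitles("beta", null, "Alpha")); // prints [null, Alpha, beta]
  }
}
```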

    


Author: lfrancke@gbif.org
Created: 2012-11-23 12:48:36.464
Updated: 2012-11-23 12:48:36.464
        
[~mdoering@gbif.org] You wrote the code in my first comment. Could you please take a look?

[~jcuadra@gbif.org] You wrote the code in the second comment. Could you please take a look as well?

Shall I create two separate issues?
    


Author: lfrancke@gbif.org
Created: 2012-11-23 14:23:37.021
Updated: 2012-11-23 14:48:08.35
        
These are all the ranges and dataset keys that fail for me:

Key: 2344f83d-eefb-4635-afed-fb2a1c9bd466:PISCO_OSU_Intertidal_Tidbit_Temperature_Protocol.50.1
http://staging.gbif.org:8080/registry-ws/dataset/?offset=13336&limit=2
http://staging.gbif.org:8080/registry-ws/dataset/?offset=13337&limit=1

Key: 2344f83d-eefb-4635-afed-fb2a1c9bd466:PISCO_OSU_Intertidal_Tidbit_Temperature_Protocol.50.1
http://staging.gbif.org:8080/registry-ws/dataset/?offset=16736&limit=2
http://staging.gbif.org:8080/registry-ws/dataset/?offset=16737&limit=1

Key: 2344f83d-eefb-4635-afed-fb2a1c9bd466:PISCO_OSU_Intertidal_Temperature_Logger_Protocols.50.1
http://staging.gbif.org:8080/registry-ws/dataset/?offset=20006&limit=2
http://staging.gbif.org:8080/registry-ws/dataset/?offset=20007&limit=1

Key: 2344f83d-eefb-4635-afed-fb2a1c9bd466:PISCO_Spatially_Nested_Biodiversity_Survey_Protocol.50.1
http://staging.gbif.org:8080/registry-ws/dataset/?offset=22084&limit=2
http://staging.gbif.org:8080/registry-ws/dataset/?offset=22085&limit=1

Key: 2344f83d-eefb-4635-afed-fb2a1c9bd466:PISCO_OSU_Intertidal_Bionic_Mussel_Protocol.50.1
http://staging.gbif.org:8080/registry-ws/dataset/?offset=22800&limit=2
http://staging.gbif.org:8080/registry-ws/dataset/?offset=22801&limit=1

Key: 2344f83d-eefb-4635-afed-fb2a1c9bd466:PISCO_OSU_Mooring_Instrument_Protocols.50.3
http://staging.gbif.org:8080/registry-ws/dataset/?offset=26920&limit=2
http://staging.gbif.org:8080/registry-ws/dataset/?offset=26921&limit=1

Key: 2344f83d-eefb-4635-afed-fb2a1c9bd466:PISCO_OSU_Barnacle_Fecundity_Protocol.50.1
http://staging.gbif.org:8080/registry-ws/dataset/?offset=28706&limit=2
http://staging.gbif.org:8080/registry-ws/dataset/?offset=28707&limit=1

Key: http://staging.gbif.org:8080/registry-ws/dataset/?offset=31147&limit=1
http://staging.gbif.org:8080/registry-ws/dataset/?offset=31146&limit=2
http://staging.gbif.org:8080/registry-ws/dataset/?offset=31147&limit=1

Key: 2344f83d-eefb-4635-afed-fb2a1c9bd466:PISCO_OSU_Intertidal_Terrestrial_IButton_Protocol.50.1
http://staging.gbif.org:8080/registry-ws/dataset/?offset=33765&limit=2
http://staging.gbif.org:8080/registry-ws/dataset/?offset=33766&limit=1

Key: 2344f83d-eefb-4635-afed-fb2a1c9bd466:PISCO_OSU_Tidbit_Temperature_Protocol.50.1
http://staging.gbif.org:8080/registry-ws/dataset/?offset=34885&limit=2
http://staging.gbif.org:8080/registry-ws/dataset/?offset=34886&limit=1
    


Author: mdoering@gbif.org
Created: 2012-11-26 11:01:26.398
Updated: 2012-11-26 11:01:26.398
        
Wow, these datasets just don't have a title, which is apparently expected and, one would think, required. What should we do with those cases?
Actually, it seems they don't have any parsed property at all. It's just the key and some fixed values for all external docs. Probably some parsing exceptions happen (or should happen) that we need to handle. On the other hand, such broken documents should never have made it into the repository; they are here because we put them in manually. If we need to be able to page over broken documents and deal with them at runtime, I can't see a way to skip them. But we could return an empty document, as we do now, and add an empty-string title.
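
A rough sketch of that last option, assuming parsing can fail either by throwing or by yielding no title. Everything here (`Doc`, `Parser`, `parseOrPlaceholder`) is hypothetical and only illustrates the idea; the real EML parsing code is not shown in this issue.

```java
class PlaceholderOnParseFailure {

  /** Hypothetical minimal document: a key plus a title. */
  static class Doc {
    final String key;
    final String title;
    Doc(String key, String title) {
      this.key = key;
      this.title = title;
    }
  }

  /** Hypothetical parser interface standing in for the real metadata parser. */
  interface Parser {
    Doc parse(String key, String rawXml) throws Exception;
  }

  /**
   * Parses the document; if parsing throws or returns null, or the title is
   * missing, returns a document that keeps the key and has an empty-string title.
   */
  public static Doc parseOrPlaceholder(Parser parser, String key, String rawXml) {
    try {
      Doc parsed = parser.parse(key, rawXml);
      if (parsed != null && parsed.title != null) {
        return parsed;
      }
    } catch (Exception e) {
      // Broken or non-dataset document: fall through to the placeholder.
      // A real implementation would probably log the failure here.
    }
    return new Doc(key, "");
  }

  public static void main(String[] args) {
    Parser failing = new Parser() {
      @Override
      public Doc parse(String key, String rawXml) throws Exception {
        throw new Exception("not a dataset document");
      }
    };
    Doc d = parseOrPlaceholder(failing, "some-key", "<protocol/>");
    System.out.println(d.key + " -> '" + d.title + "'");
  }
}
```

With a guaranteed non-null title, paging and sorting can proceed over broken documents, at the cost of returning entries that carry no real metadata.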
    


Author: mdoering@gbif.org
Created: 2012-11-28 17:13:06.517
Updated: 2012-11-28 17:13:06.517
        
Interestingly, EML not only describes datasets but is also used to describe scientific protocols and other things. For example, this "dataset" is in fact a protocol, so it doesn't parse at all and has no title:

http://staging.gbif.org:8080/registry-ws/dataset/2344f83d-eefb-4635-afed-fb2a1c9bd466:PISCO_OSU_Tidbit_Temperature_Protocol.50.1/document

http://staging.gbif.org:8080/registry-ws/dataset/2344f83d-eefb-4635-afed-fb2a1c9bd466:PISCO_OSU_Tidbit_Temperature_Protocol.50.1
    


Author: mdoering@gbif.org
Created: 2012-11-30 12:58:49.84
Updated: 2012-11-30 12:58:49.84
        
http://code.google.com/p/gbif-registry/source/detail?r=3354
http://code.google.com/p/gbif-registry/source/detail?r=3355
http://code.google.com/p/gbif-registry/source/detail?r=3356
http://code.google.com/p/gbif-registry/source/detail?r=3357
http://code.google.com/p/gbif-registry/source/detail?r=3358
    


Author: mdoering@gbif.org
Created: 2012-11-30 13:55:22.343
Updated: 2012-11-30 13:55:22.343
        
Still, some datasets appear completely empty, now even without a key:
http://staging.gbif.org:8080/registry-ws/dataset/?offset=13337&limit=2
    


Author: mdoering@gbif.org
Created: 2012-12-03 11:20:53.438
Updated: 2012-12-03 11:20:53.438
        
http://code.google.com/p/gbif-registry/source/detail?r=3361
http://code.google.com/p/gbif-registry/source/detail?r=3362