Issue 17617

too many open files: exception during plazi checklist indexing

Reporter: mdoering
Assignee: mdoering
Type: Bug
Summary: too many open files: exception during plazi checklist indexing
Priority: Critical
Resolution: Fixed
Status: Resolved
Created: 2015-06-08 15:14:19.167
Updated: 2015-06-12 17:22:36.017
Resolved: 2015-06-12 17:22:35.997
        
Description: When indexing all 1000 checklist datasets of Plazi (most really tiny with just 2-8 names) we see nice throughput of 2-3 checklists per second being indexed. At some point after a few hundred datasets, though, there are exceptions in both the normalizer and the importer that there are too many open files. The OS cannot create any new threads anymore and eventually a non-root user can't even issue a "ps" command.

This does not happen when the normalizer runs on its own. Then all 1000 Plazi lists are processed fine. So it seems the problem lies in the importer...
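The classic cause of this symptom is a per-dataset resource whose close() is skipped on the exception path, so each failing import leaks one file handle until the process hits its descriptor limit. A minimal sketch of the leaky pattern and its try-with-resources fix (DatasetHandle is a hypothetical stand-in, not actual checklistbank code):

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

public class LeakSketch {
    static final AtomicInteger open = new AtomicInteger();

    // Hypothetical stand-in for a per-dataset resource (stream, db handle, ...).
    static class DatasetHandle implements AutoCloseable {
        DatasetHandle() { open.incrementAndGet(); }
        void index() throws IOException { throw new IOException("import failed"); }
        @Override public void close() { open.decrementAndGet(); }
    }

    // Leaky pattern: close() is skipped when index() throws,
    // so every failing dataset leaks one open handle.
    static void importLeaky() {
        DatasetHandle h = new DatasetHandle();
        try {
            h.index();
            h.close();
        } catch (IOException e) { /* handle never closed */ }
    }

    // Safe pattern: try-with-resources closes even on exception.
    static void importSafe() {
        try (DatasetHandle h = new DatasetHandle()) {
            h.index();
        } catch (IOException e) { /* resource already released */ }
    }

    public static void main(String[] args) {
        for (int i = 0; i < 100; i++) importLeaky();
        int leaked = open.get();              // handles still open after leaky runs
        for (int i = 0; i < 100; i++) importSafe();
        int afterSafe = open.get() - leaked;  // safe variant leaks none
        System.out.println(leaked + " " + afterSafe);
    }
}
```

Run against 100 failing datasets, the leaky variant leaves 100 handles open while the safe variant leaves none.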
    


Author: mdoering@gbif.org
Comment: Still unclear what the real cause is. When running the normalizer with 3 threads and just 1 for the importer, indexing all of Plazi on UAT worked. The same setup on production failed though. A test trying to reproduce multithreaded normalization and importing works fine on OSX and does not yield much insight: https://github.com/gbif/checklistbank/blob/nub-build/checklistbank-cli/src/test/java/org/gbif/checklistbank/cli/TooManyOpenFilesLeakTest.java
Created: 2015-06-09 17:15:10.285
Updated: 2015-06-09 17:15:10.285


Author: mdoering@gbif.org
Created: 2015-06-12 17:22:36.015
Updated: 2015-06-12 17:22:36.015
        
Successfully crawled all of Plazi in UAT without error.
The fix shuts down the neo db properly in case of import exceptions.
See https://github.com/gbif/checklistbank/commit/369095405df43aa836b482b01e8a14b7175e1653 and https://github.com/gbif/checklistbank/commit/b1802126f776fbb74628d017ecca79847c0b7a58
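The fix pattern boils down to shutting the embedded database down in a finally block, so the file handles it holds are released even when the import step throws. A minimal sketch of that idea (GraphDb here is a hypothetical stand-in for Neo4j's GraphDatabaseService, not the actual checklistbank code in the commits above):

```java
public class ShutdownSketch {
    // Hypothetical stand-in for Neo4j's embedded GraphDatabaseService.
    interface GraphDb { void shutdown(); }

    // Shut the db down unconditionally so its open files are
    // released even when the import step throws.
    static void runImport(GraphDb db, Runnable importStep) {
        try {
            importStep.run();
        } finally {
            db.shutdown();
        }
    }

    public static void main(String[] args) {
        final boolean[] closed = {false};
        GraphDb db = () -> closed[0] = true;
        try {
            runImport(db, () -> { throw new RuntimeException("import failed"); });
        } catch (RuntimeException expected) { /* propagated after shutdown */ }
        System.out.println(closed[0]);  // the db was shut down despite the failure
    }
}
```

With one db instance per dataset, skipping this shutdown on the exception path would leak the db's file handles for every failing import.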