Issue 17828

clb solr indexing throws "temporary file size exceeds temp_file_limit 52GB"

17828
Reporter: mdoering
Assignee: mdoering
Type: Feedback
Summary: clb solr indexing throws "temporary file size exceeds temp_file_limit 52GB"
Priority: Critical
Resolution: Fixed
Status: Closed
Created: 2015-09-18 13:57:36.204
Updated: 2015-09-18 16:54:52.336
Resolved: 2015-09-18 16:54:52.277
        
Description: Exception thrown when rebuilding a clb solr index on uat.
prod/uat postgres is configured with a maximum of 52GB temporary files.
That should really be enough...

{noformat}
INFO  [2015-09-18 13:50:51,658+0200] [pool-2-thread-5] org.gbif.checklistbank.index.NameUsageIndexingJob: Adding usages from id 113595569 to 113605568
ERROR [2015-09-18 13:50:53,232+0200] [pool-2-thread-1] org.gbif.checklistbank.index.NameUsageIndexingJob: Error indexing document for usage 100151407
org.apache.ibatis.exceptions.PersistenceException:
### Error querying database.  Cause: org.postgresql.util.PSQLException: ERROR: temporary file size exceeds temp_file_limit (52428800kB)
### The error may exist in org/gbif/checklistbank/service/mybatis/mapper/UsageMapper.xml
### The error may involve org.gbif.checklistbank.service.mybatis.mapper.UsageMapper.listParents-Inline
### The error occurred while setting parameters
### SQL: WITH RECURSIVE a AS (       SELECT id, parent_fk       FROM name_usage       WHERE id = ?     UNION ALL       SELECT p.id, p.parent_fk       FROM a JOIN name_usage p ON a.parent_fk = p.id     )     SELECT id FROM a     WHERE id!=?     LIMIT 100;
### Cause: org.postgresql.util.PSQLException: ERROR: temporary file size exceeds temp_file_limit (52428800kB)
	at org.apache.ibatis.exceptions.ExceptionFactory.wrapException(ExceptionFactory.java:30) ~[checklistbank-solr.jar:2.24]
	at org.apache.ibatis.session.defaults.DefaultSqlSession.selectList(DefaultSqlSession.java:122) ~[checklistbank-solr.jar:2.24]
	at org.apache.ibatis.session.defaults.DefaultSqlSession.selectList(DefaultSqlSession.java:113) ~[checklistbank-solr.jar:2.24]
	at sun.reflect.GeneratedMethodAccessor58.invoke(Unknown Source) ~[na:na]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_45]
	at java.lang.reflect.Method.invoke(Method.java:497) ~[na:1.8.0_45]
	at org.apache.ibatis.session.SqlSessionManager$SqlSessionInterceptor.invoke(SqlSessionManager.java:341) ~[checklistbank-solr.jar:2.24]
	at com.sun.proxy.$Proxy26.selectList(Unknown Source) ~[na:na]
	at org.apache.ibatis.session.SqlSessionManager.selectList(SqlSessionManager.java:190) ~[checklistbank-solr.jar:2.24]
	at org.apache.ibatis.binding.MapperMethod.executeForMany(MapperMethod.java:122) ~[checklistbank-solr.jar:2.24]
	at org.apache.ibatis.binding.MapperMethod.execute(MapperMethod.java:64) ~[checklistbank-solr.jar:2.24]
	at org.apache.ibatis.binding.MapperProxy.invoke(MapperProxy.java:53) ~[checklistbank-solr.jar:2.24]
	at com.sun.proxy.$Proxy28.listParents(Unknown Source) ~[na:na]
	at org.gbif.checklistbank.service.mybatis.UsageServiceMyBatis.listParents(UsageServiceMyBatis.java:66) ~[checklistbank-solr.jar:2.24]
	at org.gbif.checklistbank.index.NameUsageIndexingJob.call(NameUsageIndexingJob.java:117) [checklistbank-solr.jar:2.24]
	at org.gbif.checklistbank.index.NameUsageIndexingJob.call(NameUsageIndexingJob.java:28) [checklistbank-solr.jar:2.24]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_45]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) [na:1.8.0_45]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_45]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_45]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_45]
	at java.lang.Thread.run(Thread.java:745) [na:1.8.0_45]
Caused by: org.postgresql.util.PSQLException: ERROR: temporary file size exceeds temp_file_limit (52428800kB)
{noformat}]]>
    


Author: mdoering@gbif.org
Created: 2015-09-18 14:48:27.23
Updated: 2015-09-18 14:48:27.23
        
current configs involved:
{noformat}
work_mem=64MB
shared_buffers = 512MB
temp_buffers = 8MB
maintenance_work_mem = 512MB
max_stack_depth = 2MB

# 50GB limit
temp_file_limit = 52428800
{noformat}
    


Author: mdoering@gbif.org
Comment: suggest to increase the work_mem to 256MB which is used for sorting, hashing and subqueries
Created: 2015-09-18 14:54:08.748
Updated: 2015-09-18 14:54:08.748


Author: mdoering@gbif.org
Created: 2015-09-18 14:55:28.975
Updated: 2015-09-18 14:55:28.975
        
An troublesome example query is this recursive one:
{noformat}
EXPLAIN ANALYZE
WITH RECURSIVE a AS (
	SELECT id, parent_fk
	FROM name_usage
	WHERE id = 100151407
  UNION ALL
	SELECT p.id, p.parent_fk
	FROM a JOIN name_usage p ON a.parent_fk = p.id
)
SELECT id FROM a WHERE id!=100151407 LIMIT 100;
{noformat}


It turns out the parent_fk points to itself, so it appears we end up in some endless loop that is not caught by the limit 100:

{noformat}
checklistbank=> select id,parent_fk from name_usage where id=100151407;
    id     | parent_fk
-----------+-----------
 100151407 | 100151407
{noformat}

    


Author: mdoering@gbif.org
Created: 2015-09-18 14:59:43.387
Updated: 2015-09-18 14:59:43.387
        
There are only 2 records like this in clb:
{noformat}
checklistbank=> select dataset_key,id from name_usage where parent_fk=id;
             dataset_key              |    id
--------------------------------------+-----------
 aacd816d-662c-49d2-ad1a-97e66e2a2908 | 113689859
 52a423d2-0486-4e77-bcee-6350d708d6ff | 100151407
{noformat}

That is from the Brazilian Flora Checklist and Prokaryotic Nomenclature Up-to-date respectively.
    


Author: mdoering@gbif.org
Comment: Fixing those manually for now in the db by setting the parent_fk to null
Created: 2015-09-18 15:00:09.794
Updated: 2015-09-18 15:00:09.794


Author: mdoering@gbif.org
Comment: Both cases were created by the checklist normalizer when cleaning up synonym relations. This is fixed in code by making sure the accepted and parent are not the same: https://github.com/gbif/checklistbank/commit/84ef264d507b843908c8a6d8e7c6520eb5a3cdc8
Created: 2015-09-18 15:11:04.087
Updated: 2015-09-18 15:50:49.701


Author: mdoering@gbif.org
Comment: as various postgres tune sites recommend 25% of available memory to be used for shared_buffers I suggest to increase that setting too to 5GB
Created: 2015-09-18 15:53:31.386
Updated: 2015-09-18 15:54:40.593


Author: mdoering@gbif.org
Comment: Avoid eternal loops in recursive sql: https://github.com/gbif/checklistbank/commit/0a765bab6fa17d3de4c3a783f3fa376cd0aa6b46
Created: 2015-09-18 16:11:40.383
Updated: 2015-09-18 16:11:40.383