Issue 17530

species match TooComplexToDeterminizeException

Reporter: mdoering
Assignee: mdoering
Type: Bug
Summary: species match TooComplexToDeterminizeException
Priority: Major
Resolution: Fixed
Status: Closed
Created: 2015-03-26 12:19:47.6
Updated: 2015-03-26 17:50:30.953
Resolved: 2015-03-26 17:50:30.928
        
Description: Found in recent nub lookup logs. It seems the internal Lucene fuzzy matching throws an error, which we should catch within NubIndex.matchByName and fall back to a non-fuzzy search in those cases.

WARN  [2015-03-26 12:15:59,271+0100] [qtp402394288-44284] org.eclipse.jetty.servlet.ServletHandler: /species/match
org.apache.lucene.util.automaton.TooComplexToDeterminizeException: Determinizing automaton would result in more than 10000 states.
	at org.apache.lucene.util.automaton.Operations.determinize(Operations.java:743) ~[checklistbank-nub-ws-2.12.jar:2.12]
	at org.apache.lucene.util.automaton.RunAutomaton.<init>(RunAutomaton.java:138) ~[checklistbank-nub-ws-2.12.jar:2.12]
	at org.apache.lucene.util.automaton.ByteRunAutomaton.<init>(ByteRunAutomaton.java:32) ~[checklistbank-nub-ws-2.12.jar:2.12]
	at org.apache.lucene.util.automaton.CompiledAutomaton.<init>(CompiledAutomaton.java:203) ~[checklistbank-nub-ws-2.12.jar:2.12]
	at org.apache.lucene.util.automaton.CompiledAutomaton.<init>(CompiledAutomaton.java:104) ~[checklistbank-nub-ws-2.12.jar:2.12]
	at org.apache.lucene.search.FuzzyTermsEnum.initAutomata(FuzzyTermsEnum.java:176) ~[checklistbank-nub-ws-2.12.jar:2.12]
	at org.apache.lucene.search.FuzzyTermsEnum.getAutomatonEnum(FuzzyTermsEnum.java:152) ~[checklistbank-nub-ws-2.12.jar:2.12]
	at org.apache.lucene.search.FuzzyTermsEnum.maxEditDistanceChanged(FuzzyTermsEnum.java:211) ~[checklistbank-nub-ws-2.12.jar:2.12]
	at org.apache.lucene.search.FuzzyTermsEnum.bottomChanged(FuzzyTermsEnum.java:205) ~[checklistbank-nub-ws-2.12.jar:2.12]
	at org.apache.lucene.search.FuzzyTermsEnum.<init>(FuzzyTermsEnum.java:143) ~[checklistbank-nub-ws-2.12.jar:2.12]
	at org.apache.lucene.search.FuzzyQuery.getTermsEnum(FuzzyQuery.java:155) ~[checklistbank-nub-ws-2.12.jar:2.12]
	at org.apache.lucene.search.MultiTermQuery$RewriteMethod.getTermsEnum(MultiTermQuery.java:76) ~[checklistbank-nub-ws-2.12.jar:2.12]
	at org.apache.lucene.search.TermCollectingRewrite.collectTerms(TermCollectingRewrite.java:64) ~[checklistbank-nub-ws-2.12.jar:2.12]
	at org.apache.lucene.search.TopTermsRewrite.rewrite(TopTermsRewrite.java:67) ~[checklistbank-nub-ws-2.12.jar:2.12]
	at org.apache.lucene.search.MultiTermQuery.rewrite(MultiTermQuery.java:288) ~[checklistbank-nub-ws-2.12.jar:2.12]
	at org.apache.lucene.search.IndexSearcher.rewrite(IndexSearcher.java:636) ~[checklistbank-nub-ws-2.12.jar:2.12]
	at org.apache.lucene.search.IndexSearcher.createNormalizedWeight(IndexSearcher.java:683) ~[checklistbank-nub-ws-2.12.jar:2.12]
	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:281) ~[checklistbank-nub-ws-2.12.jar:2.12]
	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:269) ~[checklistbank-nub-ws-2.12.jar:2.12]
	at org.gbif.nub.lookup.NubIndex.matchByName(NubIndex.java:192) ~[checklistbank-nub-ws-2.12.jar:2.12]
    


Author: mdoering@gbif.org
Created: 2015-03-26 12:21:10.652
Updated: 2015-03-26 12:21:10.652
        
Could be related to this request, which happened at the same time:

WARN  [2015-03-26 12:15:59,271+0100] [qtp402394288-44284] org.eclipse.jetty.server.HttpChannel: /species/match?phylum=Magnoliophyta&order=Caryophyllales&kingdom=Plantae&family=Cactaceae&name=Matucana+haynei+(Otto+ex+Salm-Dyck)+Britton+%26+Rose+x+Borzicactus+hempelianus+(G%C3%83%C2%BCrke)+Donald+var.+rettigii+(Quehl)+Donald&class=Magnoliopsida&genus=Matucana
java.lang.NoSuchMethodError: javax.servlet.http.HttpServletRequest.isAsyncStarted()Z
	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:684) ~[checklistbank-nub-ws-2.12.jar:2.12]
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) ~[checklistbank-nub-ws-2.12.jar:2.12]

    


Author: mdoering@gbif.org
Created: 2015-03-26 12:41:34.154
Updated: 2015-03-26 12:41:34.154
        
isAsyncStarted() is part of the Servlet 3.0 API.
Upgrading the servlet API from 2.5 to 3.1, which is what our jetty 2.7 dependency uses.
For 3.x the artifactId has also changed, to javax.servlet-api.
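
A quick way to confirm which servlet API ends up on the classpath is a reflection check like the one below. This is only a diagnostic sketch; the class name ServletApiCheck and the helper hasMethod are hypothetical and not part of checklistbank.

```java
class ServletApiCheck {

    /**
     * Returns true if the named class can be loaded and declares (or inherits)
     * a public no-argument method with the given name.
     */
    static boolean hasMethod(String className, String methodName) {
        try {
            // Load without initializing the class; we only inspect its methods.
            Class<?> type = Class.forName(
                className, false, ServletApiCheck.class.getClassLoader());
            type.getMethod(methodName);
            return true;
        } catch (ClassNotFoundException | NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // false here means the servlet API is missing or older than 3.0.
        System.out.println(
            hasMethod("javax.servlet.http.HttpServletRequest", "isAsyncStarted"));
    }
}
```

The check is only meaningful when run with the same classpath as the web service, since an older servlet-api jar earlier on the classpath would shadow the 3.x one.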
    


Author: mdoering@gbif.org
Comment: https://github.com/gbif/checklistbank/commit/aade232df13c30b8b42fc4b1a4e6fb93abe07c80
Created: 2015-03-26 17:50:30.95
Updated: 2015-03-26 17:50:30.95