17530
Reporter: mdoering
Assignee: mdoering
Type: Bug
Summary: species match TooComplexToDeterminizeException
Priority: Major
Resolution: Fixed
Status: Closed
Created: 2015-03-26 12:19:47.6
Updated: 2015-03-26 17:50:30.953
Resolved: 2015-03-26 17:50:30.928
Description: Found in recent nub lookup logs. The internal Lucene fuzzy matching appears to throw an exception for some names. We should catch it in NubIndex.matchByName and fall back to a non-fuzzy search in those cases.
WARN [2015-03-26 12:15:59,271+0100] [qtp402394288-44284] org.eclipse.jetty.servlet.ServletHandler: /species/match
org.apache.lucene.util.automaton.TooComplexToDeterminizeException: Determinizing automaton would result in more than 10000 states.
at org.apache.lucene.util.automaton.Operations.determinize(Operations.java:743) ~[checklistbank-nub-ws-2.12.jar:2.12]
at org.apache.lucene.util.automaton.RunAutomaton.<init>(RunAutomaton.java:138) ~[checklistbank-nub-ws-2.12.jar:2.12]
at org.apache.lucene.util.automaton.ByteRunAutomaton.<init>(ByteRunAutomaton.java:32) ~[checklistbank-nub-ws-2.12.jar:2.12]
at org.apache.lucene.util.automaton.CompiledAutomaton.<init>(CompiledAutomaton.java:203) ~[checklistbank-nub-ws-2.12.jar:2.12]
at org.apache.lucene.util.automaton.CompiledAutomaton.<init>(CompiledAutomaton.java:104) ~[checklistbank-nub-ws-2.12.jar:2.12]
at org.apache.lucene.search.FuzzyTermsEnum.initAutomata(FuzzyTermsEnum.java:176) ~[checklistbank-nub-ws-2.12.jar:2.12]
at org.apache.lucene.search.FuzzyTermsEnum.getAutomatonEnum(FuzzyTermsEnum.java:152) ~[checklistbank-nub-ws-2.12.jar:2.12]
at org.apache.lucene.search.FuzzyTermsEnum.maxEditDistanceChanged(FuzzyTermsEnum.java:211) ~[checklistbank-nub-ws-2.12.jar:2.12]
at org.apache.lucene.search.FuzzyTermsEnum.bottomChanged(FuzzyTermsEnum.java:205) ~[checklistbank-nub-ws-2.12.jar:2.12]
at org.apache.lucene.search.FuzzyTermsEnum.<init>(FuzzyTermsEnum.java:143) ~[checklistbank-nub-ws-2.12.jar:2.12]
at org.apache.lucene.search.FuzzyQuery.getTermsEnum(FuzzyQuery.java:155) ~[checklistbank-nub-ws-2.12.jar:2.12]
at org.apache.lucene.search.MultiTermQuery$RewriteMethod.getTermsEnum(MultiTermQuery.java:76) ~[checklistbank-nub-ws-2.12.jar:2.12]
at org.apache.lucene.search.TermCollectingRewrite.collectTerms(TermCollectingRewrite.java:64) ~[checklistbank-nub-ws-2.12.jar:2.12]
at org.apache.lucene.search.TopTermsRewrite.rewrite(TopTermsRewrite.java:67) ~[checklistbank-nub-ws-2.12.jar:2.12]
at org.apache.lucene.search.MultiTermQuery.rewrite(MultiTermQuery.java:288) ~[checklistbank-nub-ws-2.12.jar:2.12]
at org.apache.lucene.search.IndexSearcher.rewrite(IndexSearcher.java:636) ~[checklistbank-nub-ws-2.12.jar:2.12]
at org.apache.lucene.search.IndexSearcher.createNormalizedWeight(IndexSearcher.java:683) ~[checklistbank-nub-ws-2.12.jar:2.12]
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:281) ~[checklistbank-nub-ws-2.12.jar:2.12]
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:269) ~[checklistbank-nub-ws-2.12.jar:2.12]
at org.gbif.nub.lookup.NubIndex.matchByName(NubIndex.java:192) ~[checklistbank-nub-ws-2.12.jar:2.12]
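The catch-and-fall-back idea from the description could look roughly like the sketch below. It is not the actual NubIndex code: `FuzzyFallback`, `search`, and `TooComplexException` are hypothetical names, and `TooComplexException` is only a stand-in for Lucene's `TooComplexToDeterminizeException` (also an unchecked exception) so that the example compiles without Lucene on the classpath.

```java
import java.util.List;
import java.util.function.Function;

/** Sketch of "try fuzzy first, fall back to exact". All names are hypothetical. */
final class FuzzyFallback {

    /**
     * Stand-in for org.apache.lucene.util.automaton.TooComplexToDeterminizeException.
     * In real code the Lucene exception would be caught instead of this class.
     */
    static final class TooComplexException extends RuntimeException {
        TooComplexException(String message) {
            super(message);
        }
    }

    private FuzzyFallback() {
    }

    /**
     * Runs the fuzzy search and, only if the fuzzy automaton turns out to be
     * too complex to build, retries the same name with the non-fuzzy search.
     */
    static List<String> search(String name,
                               Function<String, List<String>> fuzzySearch,
                               Function<String, List<String>> exactSearch) {
        try {
            return fuzzySearch.apply(name);
        } catch (TooComplexException e) {
            // Long or unusual names can blow up the fuzzy automaton; degrade
            // to an exact match instead of failing the whole request.
            return exactSearch.apply(name);
        }
    }
}
```

Catching only the one exception type keeps other failures (I/O errors, bad queries) visible instead of silently turning them into exact searches.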
Author: mdoering@gbif.org
Created: 2015-03-26 12:21:10.652
Updated: 2015-03-26 12:21:10.652
Could be related to this request, which happened at the same time:
WARN [2015-03-26 12:15:59,271+0100] [qtp402394288-44284] org.eclipse.jetty.server.HttpChannel: /species/match?phylum=Magnoliophyta&order=Caryophyllales&kingdom=Plantae&family=Cactaceae&name=Matucana+haynei+(Otto+ex+Salm-Dyck)+Britton+%26+Rose+x+Borzicactus+hempelianus+(G%C3%83%C2%BCrke)+Donald+var.+rettigii+(Quehl)+Donald&class=Magnoliopsida&genus=Matucana
java.lang.NoSuchMethodError: javax.servlet.http.HttpServletRequest.isAsyncStarted()Z
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:684) ~[checklistbank-nub-ws-2.12.jar:2.12]
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143) ~[checklistbank-nub-ws-2.12.jar:2.12]
Author: mdoering@gbif.org
Created: 2015-03-26 12:41:34.154
Updated: 2015-03-26 12:41:34.154
isAsyncStarted() is part of the 3.0 servlet API.
Upgrading the servlet API from 2.5 to 3.1, which is the version used by our jetty 2.7 dependency.
For 3.x the artifactId has also changed, to javax.servlet-api.
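To confirm which servlet API actually ends up on the runtime classpath after such an upgrade, a small reflection check along these lines may help. This is only a diagnostic sketch: `ServletApiCheck` and `hasPublicMethod` are hypothetical names, and the check simply reports `false` when no servlet API is visible at all.

```java
import java.lang.reflect.Method;

/** Diagnostic sketch: is a given public method visible on a class at runtime? */
final class ServletApiCheck {

    private ServletApiCheck() {
    }

    /** Returns true if the class can be loaded and exposes a public method with that name. */
    static boolean hasPublicMethod(String className, String methodName) {
        try {
            Class<?> type = Class.forName(className);
            // getMethods() also includes public methods inherited from super-interfaces.
            for (Method m : type.getMethods()) {
                if (m.getName().equals(methodName)) {
                    return true;
                }
            }
            return false;
        } catch (ClassNotFoundException | LinkageError e) {
            // Class missing or not loadable: treat as "method not available".
            return false;
        }
    }

    public static void main(String[] args) {
        // isAsyncStarted() was introduced with the 3.0 servlet API.
        boolean servlet3 = hasPublicMethod("javax.servlet.http.HttpServletRequest", "isAsyncStarted");
        System.out.println("Servlet 3.0+ API visible: " + servlet3);
    }
}
```

If an old 2.5 `servlet-api` jar is still pulled in transitively next to `javax.servlet-api`, whichever class is loaded first decides the result, so a `false` here after the upgrade would point at a leftover dependency.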