Issue 17529

species match TooComplexToDeterminizeException

Reporter: mdoering
Type: Bug
Summary: species match TooComplexToDeterminizeException
Priority: Major
Resolution: Duplicate
Status: Closed
Created: 2015-03-26 12:19:46.019
Updated: 2015-04-02 11:24:42.871
Resolved: 2015-04-02 11:24:42.85
        
Description: Found in recent nub lookup logs. It seems Lucene's internal fuzzy matching throws an exception when the automaton built for a fuzzy query becomes too complex to determinize. We should catch it in NubIndex.matchByName and fall back to a non-fuzzy search in those cases.

WARN  [2015-03-26 12:15:59,271+0100] [qtp402394288-44284] org.eclipse.jetty.servlet.ServletHandler: /species/match
org.apache.lucene.util.automaton.TooComplexToDeterminizeException: Determinizing automaton would result in more than 10000 states.
	at org.apache.lucene.util.automaton.Operations.determinize(Operations.java:743) ~[checklistbank-nub-ws-2.12.jar:2.12]
	at org.apache.lucene.util.automaton.RunAutomaton.<init>(RunAutomaton.java:138) ~[checklistbank-nub-ws-2.12.jar:2.12]
	at org.apache.lucene.util.automaton.ByteRunAutomaton.<init>(ByteRunAutomaton.java:32) ~[checklistbank-nub-ws-2.12.jar:2.12]
	at org.apache.lucene.util.automaton.CompiledAutomaton.<init>(CompiledAutomaton.java:203) ~[checklistbank-nub-ws-2.12.jar:2.12]
	at org.apache.lucene.util.automaton.CompiledAutomaton.<init>(CompiledAutomaton.java:104) ~[checklistbank-nub-ws-2.12.jar:2.12]
	at org.apache.lucene.search.FuzzyTermsEnum.initAutomata(FuzzyTermsEnum.java:176) ~[checklistbank-nub-ws-2.12.jar:2.12]
	at org.apache.lucene.search.FuzzyTermsEnum.getAutomatonEnum(FuzzyTermsEnum.java:152) ~[checklistbank-nub-ws-2.12.jar:2.12]
	at org.apache.lucene.search.FuzzyTermsEnum.maxEditDistanceChanged(FuzzyTermsEnum.java:211) ~[checklistbank-nub-ws-2.12.jar:2.12]
	at org.apache.lucene.search.FuzzyTermsEnum.bottomChanged(FuzzyTermsEnum.java:205) ~[checklistbank-nub-ws-2.12.jar:2.12]
	at org.apache.lucene.search.FuzzyTermsEnum.<init>(FuzzyTermsEnum.java:143) ~[checklistbank-nub-ws-2.12.jar:2.12]
	at org.apache.lucene.search.FuzzyQuery.getTermsEnum(FuzzyQuery.java:155) ~[checklistbank-nub-ws-2.12.jar:2.12]
	at org.apache.lucene.search.MultiTermQuery$RewriteMethod.getTermsEnum(MultiTermQuery.java:76) ~[checklistbank-nub-ws-2.12.jar:2.12]
	at org.apache.lucene.search.TermCollectingRewrite.collectTerms(TermCollectingRewrite.java:64) ~[checklistbank-nub-ws-2.12.jar:2.12]
	at org.apache.lucene.search.TopTermsRewrite.rewrite(TopTermsRewrite.java:67) ~[checklistbank-nub-ws-2.12.jar:2.12]
	at org.apache.lucene.search.MultiTermQuery.rewrite(MultiTermQuery.java:288) ~[checklistbank-nub-ws-2.12.jar:2.12]
	at org.apache.lucene.search.IndexSearcher.rewrite(IndexSearcher.java:636) ~[checklistbank-nub-ws-2.12.jar:2.12]
	at org.apache.lucene.search.IndexSearcher.createNormalizedWeight(IndexSearcher.java:683) ~[checklistbank-nub-ws-2.12.jar:2.12]
	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:281) ~[checklistbank-nub-ws-2.12.jar:2.12]
	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:269) ~[checklistbank-nub-ws-2.12.jar:2.12]
	at org.gbif.nub.lookup.NubIndex.matchByName(NubIndex.java:192) ~[checklistbank-nub-ws-2.12.jar:2.12]


Author: mdoering@gbif.org
Comment: Duplicate of POR-2725
Created: 2015-04-02 11:24:42.868
Updated: 2015-04-02 11:24:42.868