Issue 11468

Species search unusably slow

Reporter: trobertson
Assignee: trobertson
Type: Bug
Summary: Species search unusably slow
Priority: Blocker
Resolution: Fixed
Status: Closed
Created: 2012-06-22 14:33:12.724
Updated: 2013-08-29 14:44:34.625
Resolved: 2012-08-27 14:41:55.169
        
Description: Searches take 20 secs. Try searching for Animalia on a freshly started Tomcat.

Could be a cache warming issue, so suggest considering this in investigations.

/etc/tomcat6/tomcat6.conf has the JAVA_OPTS to tune.

The following can be used to warm the OS-level page cache:
 cat / > _*.* > /dev/null
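
A rough Python equivalent of the cache-warming command, as a sketch only. The function name is made up for illustration and the directory to pass in is not given in this ticket. Unlike the `_*.*` pattern above, it reads every file under the directory and discards the bytes, which pulls the files into the OS page cache as a side effect.

```python
import os


def warm_page_cache(directory, chunk_size=1024 * 1024):
    """Read every regular file under `directory` and discard the bytes.

    Reading a file makes the OS load it into its page cache, which is
    the same effect as `cat <files> > /dev/null`.
    Returns the total number of bytes read.
    """
    total = 0
    for root, _dirs, files in os.walk(directory):
        for name in files:
            path = os.path.join(root, name)
            with open(path, "rb") as handle:
                # Read in chunks so large index files are never held in memory.
                while True:
                    chunk = handle.read(chunk_size)
                    if not chunk:
                        break
                    total += len(chunk)
    return total
```

Whether the files then stay cached depends on how much free memory the machine has relative to the index size.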

I believe this drive is partitioned for reasonably large files, but check with Andrei.


Author: mdoering@gbif.org
Created: 2012-06-26 22:33:11.855
Updated: 2012-06-26 22:33:11.855
        
A pure search on the web service without facets takes ~300 ms from Copenhagen, measured from outside the GBIF LAN and including ~90 ms of network latency:
http://jawa.gbif.org:8080/checklistbank-search-ws/search?q=Abies%20alba

This seems pretty acceptable.
With all 7 facets turned on it still takes the same time, at least for a binomial search:
http://jawa.gbif.org:8080/checklistbank-search-ws/search?q=Abies%20alba&facet=RANK&facet=CHECKLIST&facet=HIGHERTAXON&facet=EXTINCT&facet=TAXSTATUS&facet=MARINE&facet=THREAT


For a monomial search it still takes the same time:
http://jawa.gbif.org:8080/checklistbank-search-ws/search?q=Abies&facet=RANK&facet=CHECKLIST&facet=HIGHERTAXON&facet=EXTINCT&facet=TAXSTATUS&facet=MARINE&facet=THREAT


So it appears that the slow search is actually caused by the portal's use of it.
The original guess still stands: looking up all facet values, namely the titles for datasets, causes the slow search.
Needs to be further investigated.
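
To repeat the comparison above, something like the following could be used. This is only a sketch: the helper names are made up for illustration, the base URL is the one from this comment and may not be reachable from elsewhere, and the timing includes network latency just as the figures above do.

```python
import time
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# Base URL taken from the search links in this comment.
BASE_URL = "http://jawa.gbif.org:8080/checklistbank-search-ws/search"


def build_search_url(query, facets=()):
    """Build a search URL with one `facet` parameter per requested facet."""
    params = [("q", query)] + [("facet", facet) for facet in facets]
    # quote_via=quote encodes spaces as %20, matching the URLs above.
    return BASE_URL + "?" + urlencode(params, quote_via=quote)


def time_request(url, fetch=urlopen):
    """Return the seconds taken to fetch `url` and read the full response."""
    start = time.perf_counter()
    with fetch(url) as response:
        response.read()
    return time.perf_counter() - start
```

Calling time_request(build_search_url("Abies alba")) and then the same with the facet list gives the two numbers to compare. The first call after a restart and repeated calls should be timed separately, since caching may hide the difference.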


Author: mdoering@gbif.org
Created: 2012-07-03 10:24:06.458
Updated: 2012-07-03 10:24:06.458
        
On a search page like this:
http://jawa.gbif.org:8080/portal-web/species/search?q=frog&checklist=d7dddbf4-2cf0-4f39-9b2a-bb099caae36c

The portal needs to do a lookup for 35 checklist titles and 100 name usage titles, all being separate, uncached web service calls.
When this lookup is commented out, it's much, much faster.


Author: mdoering@gbif.org
Created: 2012-07-03 10:25:17.359
Updated: 2012-07-03 10:25:17.359
        
Possible improvements:
- cache for checklists and higher usages
- new api method that takes a list of ids and returns a list of titles only
- load all titles asynchronously
- load the initially hidden "see all" titles only on request asynchronously
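
The first two ideas could be combined roughly as follows. This is a minimal sketch, not the portal's code: `TitleCache` and `fetch_titles` are hypothetical names, and `fetch_titles` stands in for the proposed API method that takes a list of ids and returns their titles in one call. There is no expiry or size limit here, which a real cache would need.

```python
class TitleCache:
    """In-memory cache in front of a batch title lookup.

    `fetch_titles` is a hypothetical callable that takes a list of ids
    and returns a dict mapping each id to its title (one call, many ids).
    """

    def __init__(self, fetch_titles):
        self._fetch_titles = fetch_titles
        self._titles = {}

    def get_many(self, ids):
        # Ask the backend only for ids not seen before, in a single call.
        missing = [i for i in dict.fromkeys(ids) if i not in self._titles]
        if missing:
            self._titles.update(self._fetch_titles(missing))
        # Ids the backend did not return map to None.
        return {i: self._titles.get(i) for i in ids}
```

With this in place, a search page needing 35 checklist titles and 100 usage titles would make at most one call per request for the ids not yet cached, instead of 135 separate calls.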


Author: trobertson@gbif.org
Created: 2012-07-03 10:34:54.823
Updated: 2012-07-03 10:35:13.146
        
The comments above are not entirely accurate - yes, a load of get-by-ID calls can add some time and are great candidates for caching, but the actual search itself is slow for a new term, and that can't be cached:

http://jawa.gbif.org:8080/checklistbank-search-ws/search?q=&facet=RANK&facet=CHECKLIST&facet=HIGHERTAXON&facet=EXTINCT&facet=TAXSTATUS&facet=MARINE&facet=THREAT

Repeated searches are immediate, presumably due to internal SOLR-level caching. That suggests the improvements outlined above may still help, but the search itself is indeed problematic.


Author: trobertson@gbif.org
Comment: The species search has been addressed, and the get-by-key lookups are addressed by using the caching service between the web app and the web services.
Created: 2012-08-27 14:41:55.2
Updated: 2012-08-27 14:41:55.2