Issue 10474

facet counts wrong

Reporter: mdoering
Assignee: fmendez
Type: Bug
Summary: facet counts wrong
Priority: Major
Resolution: Fixed
Status: Closed
Created: 2011-12-05 13:44:18.215
Updated: 2013-12-09 13:40:40.843
Resolved: 2011-12-09 10:46:02.51
        
Description: When searching for fishes using several highertaxon facet filters, the counts cannot be right:

http://localhost:8080/species/search?highertaxon=119&highertaxon=120&highertaxon=121&highertaxon=204&highertaxon=238&highertaxon=239&highertaxon=4853178&highertaxon=3238258&highertaxon=4836892&highertaxon=4815623

There are apparently more than 10 million species in the nub, while there are only 2 million or so across all kingdoms:
species (10,469,859)

Also, the counts for the higher taxa seem completely wrong. And how can a query like this match insects?
Animalia (6,248,627)
Arthropoda (4,749,187)
Plantae (4,098,652)
Insecta (4,015,472)
    


Author: mdoering@gbif.org
Created: 2011-12-05 13:45:48.944
Updated: 2011-12-05 13:45:48.944
        
Oh, the counts don't take facet filters into account.
So neither the limit to the nub nor the fish filters influence the counts.

So basically these counts are for the entire solr index. In that case they are probably right.

But is this in any way understandable for a user?
    


Author: ahahn@gbif.org
Comment: I would propose not to display counts if they do not relate to the current filtered set. Their only purpose is to allow a user to decide what else to filter on and assess the impact ("there are only three accepted names, so if I add this filter, I will only have three results left"), so giving full index counts is not useful.
Created: 2011-12-05 13:54:06.309
Updated: 2011-12-05 13:54:06.309


Author: fmendez@gbif.org
Created: 2011-12-06 12:43:52.55
Updated: 2011-12-06 12:43:52.55
        
Markus is right: the counts are for the entire index. In this case the "user" is searching for everything, then the counts are calculated, and finally the displayed results are filtered by the default filter: GBIF Nub key.

The semantics of the results page are: total results XXXXXX, viewing XXXX filtered by the facets.
    


Author: mdoering@gbif.org
Created: 2011-12-06 12:56:32.055
Updated: 2011-12-06 12:56:32.055
        
The counts are only for the query string, not the facet filters.
Is this useful at all then? If the default search is limited to the nub and you get counts across all checklists, I don't think this makes sense.
    


Author: mdoering@gbif.org
Created: 2011-12-08 10:29:28.163
Updated: 2011-12-08 10:29:28.163
        
A site using both multiselects and single filter:
http://www.musiciansfriend.com/gibson?_requestid=808272#fT=2012:0.00-25.00|25.00-50.00&gP=1&pS=20&v=g&sB=bS&lP=b&brandId=988

ALA only allows single facet filters and no multiselects. This removes the problem, but I think we want multiselects, for most facets at least:
http://bie.ala.org.au/search?q=Passer&fq=australian_s:recorded

    


Author: mdoering@gbif.org
Created: 2011-12-08 10:56:33.532
Updated: 2011-12-08 10:56:33.532
        
Story 1:

 - "GBIF Taxonomic Backbone" is preselected.
 - I choose "Lepidoptera" under Higher Taxon - all counts in higher taxon still show.
 - I choose "species" for rank, the Lepidoptera count goes down
 - I add "subspecies" for rank, the Lepidoptera count goes slightly up
    


Author: mdoering@gbif.org
Created: 2011-12-08 11:02:49.222
Updated: 2011-12-08 11:02:49.222
        
As a general rule we consider this the best approach, ignoring "state" as we want a stateless application.
So just think of a url that defines x,y,z filters, with no change over time.


A url with a filter for rank=species&checklist=NUB
 - the counts for the taxonomic status facet would use both filters
 - the counts for the checklist facet would use only the species filter
 - similarly, the counts for the rank facet would use only the nub filter

Adding a third filter, status=accepted, so rank=species&checklist=NUB&status=accepted
 - rank count would be for status=accepted&checklist=nub
 - status count uses rank=species&checklist=NUB
 - checklist counts use rank=species&status=accepted

A multi facet url with 2 checklists selected: status=accepted&(checklist=nub or checklist=ITIS)?
 - the rank counts would use status=accepted&(checklist=nub or checklist=ITIS)
 - the checklist counts would use status=accepted
 - the status counts would use checklist=nub or checklist=ITIS
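
The multiselect case above can be checked on a toy data set. The following is only a minimal in-memory sketch of the rule, not the portal's implementation: the records, the field names and the `facet_counts` helper are made up for illustration, and values within one facet are treated as OR while different facets are combined with AND.

```python
from collections import Counter


def matches(record, filters):
    """True if the record passes every filter (OR within a facet, AND across facets)."""
    return all(record.get(field) in allowed for field, allowed in filters.items())


def facet_counts(records, filters, facet_fields):
    """Count values per facet, applying all filters except the facet's own."""
    counts = {}
    for facet in facet_fields:
        # Drop the filter on the facet being counted, keep all the others.
        others = {f: v for f, v in filters.items() if f != facet}
        counts[facet] = Counter(r[facet] for r in records if matches(r, others))
    return counts


# Hypothetical records, only for illustration.
RECORDS = [
    {"checklist": "NUB", "rank": "species", "status": "accepted"},
    {"checklist": "NUB", "rank": "species", "status": "synonym"},
    {"checklist": "NUB", "rank": "genus", "status": "accepted"},
    {"checklist": "ITIS", "rank": "species", "status": "accepted"},
    {"checklist": "COL", "rank": "species", "status": "accepted"},
    {"checklist": "COL", "rank": "genus", "status": "synonym"},
]

# status=accepted&(checklist=NUB or checklist=ITIS)
filters = {"status": {"accepted"}, "checklist": {"NUB", "ITIS"}}
counts = facet_counts(RECORDS, filters, ["rank", "checklist", "status"])
print(counts)
```

Because the checklist counts ignore the checklist filter, the unselected checklist (COL here) still shows a count, which is what lets a user judge the effect of adding it to the selection.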


In general, all facets can be multiselects, and each shows counts that apply all filters except the ones on the facet the counts are calculated for.
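
Since the index is in Solr, one way such counts could be requested is with Solr's tag/exclude local params: each filter query is tagged with `{!tag=...}` and each facet field excludes its own tag with `{!ex=...}`. The sketch below only builds the request parameters. The field names, the `solr_facet_params` helper and the one-tag-per-field convention are assumptions for illustration, and values are quoted but not otherwise escaped.

```python
from urllib.parse import urlencode


def solr_facet_params(query, filters, facet_fields):
    """Build Solr params where each facet ignores the filter on its own field."""
    params = [("q", query), ("facet", "true")]
    for field, values in sorted(filters.items()):
        # Multiselect within one facet is an OR over the selected values.
        joined = " OR ".join(f'"{v}"' for v in sorted(values))
        # Tag the filter query so facets can exclude it by name.
        params.append(("fq", f"{{!tag={field}}}{field}:({joined})"))
    for field in facet_fields:
        # Exclude the filter tagged with this facet's own name, if any.
        params.append(("facet.field", f"{{!ex={field}}}{field}"))
    return params


# Hypothetical field names, mirroring the url parameters used above.
params = solr_facet_params(
    "*:*",
    {"rank": {"species", "subspecies"}, "checklist": {"NUB"}},
    ["rank", "checklist", "status"],
)
query_string = urlencode(params)
print(query_string)
```

This keeps the application stateless: the parameters are derived from the url filters alone, and a single request returns the results and every facet's counts.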