Issue 10411

Add higher taxon facets to species search

10411
Reporter: mdoering
Assignee: mdoering
Type: Improvement
Summary: Add higher taxon facets to species search
Priority: Major
Resolution: Fixed
Status: Closed
Created: 2011-11-16 00:08:20.847
Updated: 2013-12-09 14:01:27.902
Resolved: 2011-12-01 21:58:47.643


Author: mdoering@gbif.org
Created: 2011-11-21 12:52:34.466
Updated: 2011-11-21 12:53:30.437
        
User story POR-4 says "higher taxa will be links that on click will add a facet for higher taxa" for search results.
Right now they are links to the name usage detail page.

What is best?

    


Author: mdoering@gbif.org
Created: 2011-11-21 12:58:54.54
Updated: 2011-11-21 12:58:54.54
        
For entering higher taxa a true facet that lists all matching higher taxa is probably unrealistic.
Would a simple autocomplete across all names (or only subgenus and above) be good enough?


    


Author: trobertson@gbif.org
Created: 2011-11-21 13:08:17.633
Updated: 2011-11-21 13:08:17.633
        
A facet would be unrealistic for something like au*, but in reality that is not a very real use case.

A real use case might be: "I want to compare the classifications across checklists of the plants in Oenanthe"

- User enters Oenanthe
- User deselects the GBIF Backbone taxonomy only
- User realises that there are plants and birds in there
- User uses the higher taxa facet to select "Plantae" and results stop showing the birds

I suggest we compile a collection of these, and then determine what functionality the facet should support.
I don't understand what an autocomplete higher taxa would do - it is intended to be a facet not a search filter.


I suspect a true facet will be possible for many searches, and for others (e.g. wildcard of A*) it might not make any sense anyway.  Perhaps it will result in offering the facet only if the facetable options are of a reasonable number?
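
A minimal sketch of the "offer the facet only if the options are of a reasonable number" idea. The cut-off of 20 and the function name are assumptions made for illustration; nothing in this thread fixes a threshold.

```python
from typing import Dict, List, Optional, Tuple

MAX_FACET_VALUES = 20  # hypothetical cut-off, not taken from the thread


def facet_to_offer(
    counts: Dict[str, int], max_values: int = MAX_FACET_VALUES
) -> Optional[List[Tuple[str, int]]]:
    """Return facet values sorted by descending count, or None when the
    facet is empty or has too many distinct values to be useful."""
    if not counts or len(counts) > max_values:
        return None
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
```

With this, a search for Oenanthe would get a small kingdom facet, while a wildcard such as A* would simply get no higher taxon facet at all.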

    


Author: trobertson@gbif.org
Created: 2011-11-21 13:10:39.304
Updated: 2011-11-21 13:10:39.304
        
Another use case relates to the fact that this page is the only page in the portal that allows you to compile and download a species checklist, so the facets need to support those use cases.  E.g. I want a checklist of the endangered bird species (requires a facet to allow selection of "Aves")


    


Author: mdoering@gbif.org
Created: 2011-11-21 13:16:45.885
Updated: 2011-11-21 13:18:13.011
        
Homonyms are one use case (also within vernacular names where they might appear much more often).

Another could be searching by the species epithet alone. Zoologists frequently use epithets alone, so it would make sense to facet a search for "vulgaris" by higher taxa down to, but excluding, genus.

The above-genus limit seems like a general characteristic. If you know the genus already you can search for it directly, so only higher taxa of rank family or above make sense here.

The autocomplete was the original idea we had with vizzuality on how to select the higher taxon. It would work against the body of all higher nub taxa instead of the ones matching the current query.
    


Author: mdoering@gbif.org
Comment: The above Aves download example mandates the option for having a list of taxa combined by OR
Created: 2011-11-21 13:17:49.186
Updated: 2011-11-21 13:17:49.186


Author: mdoering@gbif.org
Created: 2011-11-21 13:20:44.873
Updated: 2011-11-21 13:20:44.873
        
It appears that the higher taxon must also be derived from the nub only, as the nubKey is the only key that relates usages from different checklists.
Otherwise only a single checklist would match and the facet would overlap with the checklist facet.
    


Author: mdoering@gbif.org
Created: 2011-11-21 13:42:17.536
Updated: 2011-11-21 13:42:17.536
        
Thinking about the user experience again, the links from the higher classification in the search results alone appear to work well for most cases.
Adding a higher taxon facet only makes sense when you have many hits. Having no autocomplete but only links from matching records requires that the higher taxon you want to add shows up on the first or second result page at least. This might not always be the case, but it's probably rare. Also, one can drill down from something like kingdom via class to family in case there are that many matches and the family you want to filter by does not show up on the first result page.
    


Author: mdoering@gbif.org
Comment: Suggest to start by only making the higher classification links "add facet" links?
Created: 2011-11-21 13:42:53.373
Updated: 2011-11-21 13:42:53.373


Author: ahahn@gbif.org
Created: 2011-11-21 13:58:44.529
Updated: 2011-11-21 13:58:44.529
        
Can we have a true facet (= narrowing down of the result set) at least for kingdoms, so that the user has a one-glance overview of whether there are hits across kingdoms? Part of the facet search idea was to allow quick judgement of search result content, which we lose if we have autocomplete-only.

I am not sure I understand the case of epithet search only ("% vulgaris"). Does the user really ever want all names across several groups that end in "vulgaris", or is it a lazy search by someone focussed on a certain group, omitting the otherwise known Genus name? Or something else?

If we decide to use autocomplete, I would still plead for using it on the current result set only. If a not-really-a-facet-filter gives frequent "no match" results because it includes terms from the full higher taxonomy that do not match any of the records of the previous filter, it can be a very frustrating user experience. The function of the filter should always be to further limit the current selection, but not down to NULL, or at least not without displaying an available facet with a (0) records count info.
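
A rough sketch of an autocomplete that only ever suggests values from the current result set, so it can never narrow the search down to nothing. It assumes the facet counts for the current query are already available as a name-to-count mapping; the function and parameter names are hypothetical.

```python
from typing import Dict, List, Tuple


def suggest_higher_taxa(
    facet_counts: Dict[str, int], typed: str, limit: int = 10
) -> List[Tuple[str, int]]:
    """Suggest higher taxa that start with the typed text and still have
    at least one matching record in the current result set."""
    prefix = typed.casefold()
    hits = [
        (name, count)
        for name, count in facet_counts.items()
        if count > 0 and name.casefold().startswith(prefix)
    ]
    # Most frequent first, ties broken alphabetically.
    hits.sort(key=lambda kv: (-kv[1], kv[0]))
    return hits[:limit]
```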

I still like the original idea best of "higher taxa will be links that on click will add a facet for higher taxa" for search results. As we are in the middle of a search filter composition at this point, it feels more natural than jumping out of the search and straight into the higher taxon species page for an individual result record(?)

    


Author: ahahn@gbif.org
Comment: vote for: "Suggest to start by only making the higher classification links 'add facet' links?"
Created: 2011-11-21 14:07:37.47
Updated: 2011-11-21 14:10:20.756


Author: trobertson@gbif.org
Created: 2011-11-21 14:11:35.475
Updated: 2011-11-21 14:11:35.475
        
+1 on the higher classification links.

Just to clarify:

- User searches for Oenanthe
- Results show:

  Oenanthe
  Animalia Chordata Aves ...

  Oenanthe
  Plantae ....


- User can click on Animalia, Chordata, Plantae etc. and this will limit the results to only those records

Probably this covers most use cases anyway, so is a good start (and should be relatively trivial)
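
The "add facet" link could be little more than the current search URL with one extra parameter. A sketch, assuming the filter is passed as a repeatable `highertaxon` query parameter (the name used in the staging URLs quoted elsewhere in this thread); the function name and the key value in the test are made up.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_higher_taxon_filter(search_url: str, taxon_key: int) -> str:
    """Return the search URL with one more highertaxon filter appended.
    Existing parameters are kept; an already present filter is not repeated."""
    parts = urlsplit(search_url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    pair = ("highertaxon", str(taxon_key))
    if pair not in params:
        params.append(pair)
    return urlunsplit(parts._replace(query=urlencode(params)))
```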
    


Author: fmendez@gbif.org
Created: 2011-11-21 14:40:18.924
Updated: 2011-11-21 14:40:18.924
        
The higher taxa facet is useful for any kind of search: it gives a nice summary of counts across the classification of all results, and from my perspective it is better if it works like the other facets. For the UI I was thinking of at least 3 options:
1) Display name + checklist:
   Animalia (992929) (GBIF Backbone)
   Plantae  (23423) (Wikispecies)
   Plantae  (12333) (GBIF Backbone)

If we already have a checklist selected, we can remove the "(Checklist)" label from the UI.
This requires using the key values to build the facets, because the same classification names come from different checklists.

2) Enable the facet link only when the user selects a "checklist" and never use the "(Checklist)".

3) Display the labels only:
   Animalia (992929)
   Plantae  (35756)
This could be confusing (I don't know how much) given that Name Usages for Plantae come from 2 checklists.


Implementation implications:
1) At the moment we have a multivalue field "higher_taxon" containing the values for k,p,c,o,f,g,sg,s. Calculating a facet on multivalue fields carries a performance penalty; we can try a new field "higher_taxon_key" that will hold the key values for k,p,c,o,f,g,sg,s (*_key).

2) From the Solr point of view, higher_taxa is actually 8 facets, 1 for each classification: k,p,c,o,f,g,sg,s. We can calculate each facet individually and then present them in the UI as 1 facet; this approach could be better in terms of performance.
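
Option 2 could look roughly like the sketch below: request one facet per rank with Solr's repeatable `facet.field` parameter, then merge the per-field counts into the single list the UI shows. The eight field names are assumptions following the "*_key" pattern mentioned above, and the keys in the usage example are invented.

```python
from typing import Dict, List, Tuple
from urllib.parse import urlencode

# Hypothetical single-valued key fields, one per rank (k,p,c,o,f,g,sg,s).
RANK_KEY_FIELDS = [
    "kingdom_key", "phylum_key", "class_key", "order_key",
    "family_key", "genus_key", "subgenus_key", "species_key",
]


def facet_params(q: str) -> str:
    """Build a Solr query string asking for one facet per rank field."""
    params = [("q", q), ("rows", "0"), ("facet", "true")]
    params += [("facet.field", field) for field in RANK_KEY_FIELDS]
    return urlencode(params)


def merge_rank_facets(per_field: Dict[str, Dict[int, int]]) -> List[Tuple[int, int]]:
    """Flatten the per-rank facet counts into one list for the UI,
    sorted by descending count."""
    merged: Dict[int, int] = {}
    for counts in per_field.values():
        for key, count in counts.items():
            merged[key] = merged.get(key, 0) + count
    return sorted(merged.items(), key=lambda kv: (-kv[1], kv[0]))
```

For example, `merge_rank_facets({"kingdom_key": {1: 10}, "class_key": {212: 4, 216: 6}})` yields the three taxa ordered by count, regardless of which rank field they came from.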
    


Author: mdoering@gbif.org
Created: 2011-11-21 15:10:19.316
Updated: 2011-11-21 15:10:19.316
        
I don't think we should mix the checklist facet with the higher taxon one. Doing that actually prevents the use of it in searches across all checklists. And that is probably one of the important areas where we need it, as there will be far more results that need to be somehow limited.

We therefore need to work with the nubKeys only - but that has another serious implication on our implementation. We cannot use the NameUsage.kingdomKey etc, as these are keys to the checklist usage of that very checklist and not the nub.

I will start implementing this by populating the (renamed) higher_taxon_nub_key Solr multivalue field.
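
A sketch of what populating that field might involve, assuming each usage's classification has already been resolved to nub keys and arrives as a rank-to-key mapping. Only the field name higher_taxon_nub_key comes from this thread; the input shape, the "key" field, the helper names and the choice to stop at family are illustrative assumptions.

```python
from typing import Dict, List, Optional

# Ranks to index; stopping at family is an assumption for this sketch.
HIGHER_RANKS = ("kingdom", "phylum", "class", "order", "family")


def higher_taxon_nub_keys(nub_classification: Dict[str, Optional[int]]) -> List[int]:
    """Collect the nub keys of the higher ranks, skipping ranks that have
    no nub match for this usage."""
    keys = []
    for rank in HIGHER_RANKS:
        key = nub_classification.get(rank)
        if key is not None:
            keys.append(key)
    return keys


def to_index_doc(usage_key: int, nub_classification: Dict[str, Optional[int]]) -> dict:
    """Build the part of the index document relevant to the facet."""
    return {
        "key": usage_key,  # hypothetical id field
        "higher_taxon_nub_key": higher_taxon_nub_keys(nub_classification),
    }
```

Because the values are nub keys rather than per-checklist keys, usages from different checklists that share a backbone taxon end up under the same facet value.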
    


Author: mdoering@gbif.org
Comment: And yes, I think it's good and feasible this way to use true facets and already show the ones with most entries, just as we do for other facets too. Plus making the classification clickable.
Created: 2011-11-21 15:11:38.614
Updated: 2011-11-21 15:11:38.614


Author: mdoering@gbif.org
Created: 2011-11-23 12:55:06.975
Updated: 2011-11-23 12:55:06.975
        
Ok, implemented this mostly now.
But I'd like some feedback on whether this is intuitive behavior.
The issues that spring to mind:

a) right now you can select a higher taxon filter below family, but there is no data in the index so you get 0 results. Either we deactivate the links for genus and below or add the data to the index so it yields results

b) all filtering is done via the nub. So "Plants" as kingdom is the same as "Plantae" - is this confusing and should be done on the plain canonical instead?

c) (applies to all facets): even though you can filter by several values of the same facet, there is no way of entering multiple values. Once you select a higher taxon, you won't see the "outside" values of that facet. If plants are selected, no animals will show up in the facet. An example link to show you can have multiple values selected: http://staging.gbif.org:8080/portal-web-dynamic/species/search?q=vulgaris&rank=species&rank=variety

    


Author: ahahn@gbif.org
Created: 2011-11-23 13:49:51.694
Updated: 2011-11-23 13:49:51.694
        
a) not sure I found the right place to look. The index surely contains genera? Apart from that: I am not sure whether including the genus in the higher taxon filter is necessary. If you are interested in species under a genus, you probably started your search by that, anyway
b) Need to check the data content there. The current selection under Higher Taxon does not give "Plants", but a whole lot of numbers, which does not make sense. In general I agree that we should not have duplication here (synonyms or otherwise) - only accepted canonical names
c) Initially, we had expected different kinds of facet filters (sliders for date ranges, tick boxes for multiple-select, etc). Somehow we lost track of that, but will probably want it back. Also: when and why did we introduce the "selected filters" section at the top? The displayed filters should be self-explanatory without this duplication; filter removal had been attached to each filter section before
    


Author: trobertson@gbif.org
Created: 2011-11-23 14:13:22.715
Updated: 2011-11-23 14:13:22.715
        
Wow, this is huge progress since I last looked - on first impressions, very nice work indeed to you all.

I was playing, and everything looked ok for a while, but then please see:
http://staging.gbif.org:8080/portal-web-dynamic/species/search?checklist=GBIF%20Taxonomic%20Backbone&q=puma&initDefault=false&highertaxon=1&rank=species&highertaxon=216
Could it be that Insecta and Animalia are OR'ed and not AND'ed?  That is not a good user experience.

When I search for "Felidae", I see "Felidae" in the higher taxon facet - bug?


    


Author: mdoering@gbif.org
Created: 2011-11-24 13:08:11.59
Updated: 2011-11-24 13:08:11.59
        
Andrea: you are right. The selected filters could as well be shown instead of the available facet values. Would that be better? Or would it be confusing to mix selected filters with options/counts?

Tim: Multiple search filters for the same parameter are logically combined as ORs. Do you think this is incorrect? My user experience was pretty good like that. But maybe replacing the current higher taxon filter instead of adding a new one is the right thing to do?

Regarding Felidae in the facets: this is not a first-class "bug", but probably not nice to have either. We simply show all higher nub taxa of all matching classifications in the facets. Excluding some from the UI, like the currently searched nub (not trivial to detect which usage that would be, though), would need extra code
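
To make the rule concrete, here is a sketch of how repeated request parameters might be turned into Solr filter queries: values of the same parameter are OR'ed inside one `fq`, and separate `fq` clauses are intersected by Solr. The parameter-to-field mapping and the function name are assumptions, and the values are not escaped, which a real implementation would need to do.

```python
from typing import List
from urllib.parse import parse_qs, urlsplit

# Hypothetical mapping from request parameter to index field.
FILTER_FIELDS = {"rank": "rank", "highertaxon": "higher_taxon_nub_key"}


def filter_queries(search_url: str) -> List[str]:
    """Build one fq clause per filter parameter, OR-ing repeated values."""
    params = parse_qs(urlsplit(search_url).query)
    clauses = []
    for param, field in FILTER_FIELDS.items():
        values = params.get(param, [])
        if values:
            clauses.append("%s:(%s)" % (field, " OR ".join(values)))
    return clauses
```

For the puma URL above, this gives one clause for rank and one clause with `1 OR 216` for the higher taxon, which matches the OR behaviour Tim observed.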
    


Author: ahahn@gbif.org
Created: 2011-11-25 12:14:48.022
Updated: 2011-11-25 12:14:48.022
        
I was really advocating to remove the SELECTED FILTERS section altogether, and let the facets speak for themselves.

From today's discussion:
- single selection filters show the selected value within the facets section. They need a "remove" option to reset the facet category to display all available values again, so that a user can change their filter on the facet
- multiple selection filters (e.g.: extinction status, higher taxonomy, distribution) should ideally continue to display the available values after select / refresh, with the selected ones marked. With the non-selected ones showing, they can be added to the selection through marking them. Marked ones can be removed by deselecting the mark (tick box or similar)
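
One possible way to keep the non-selected values visible is Solr's filter tagging and exclusion (`{!tag=...}` on the filter, `{!ex=...}` on the facet field), which counts the facet as if its own filter were not applied. Whether the portal does this is not stated in this thread; the sketch below only shows the shape of the request, with a hypothetical function name and field.

```python
from typing import List, Tuple


def multiselect_facet_params(q: str, field: str, selected: List[str]) -> List[Tuple[str, str]]:
    """Build Solr parameters so the facet on `field` still lists values
    outside the current selection."""
    params = [("q", q), ("facet", "true")]
    if selected:
        # Tag the filter so the facet can ignore it when counting.
        params.append(("fq", "{!tag=%s}%s:(%s)" % (field, field, " OR ".join(selected))))
        params.append(("facet.field", "{!ex=%s}%s" % (field, field)))
    else:
        params.append(("facet.field", field))
    return params
```

The UI can then render every returned value with a tick box and mark the ones present in `selected`.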

NB: in a download, it would still be good service to add a summary of the search that generated this download, in a form that allows the user to reproduce this later. But this should rather be a text file supplied with the download, not something to copy off the screen.