Issue 14027

Collector as search field, with no standardization is useless

Reporter: feedback bot
Type: Improvement
Summary: Collector as search field, with no standardization is useless
Priority: Major
Status: Open
Created: 2013-09-24 01:46:57.228
Updated: 2015-11-11 09:56:35.725
        
        
Description: It appears that searches on collector name have to be entered exactly as the name was entered into the database: misspelled or not, with or without capitals, with first initial or full first name, surname first or last, with or without additional collectors, and so on, ad infinitum. Impossible. Trash in; nothing out. Such fields need to be standardized. Everyone knows that. It's useless otherwise.

*Reporter*: Barry Hammel
*E-mail*: [mailto:barry.hammel@mobot.org]
    


Author: jlegind@gbif.org
Created: 2015-11-10 14:39:42.709
Updated: 2015-11-10 15:58:56.374
        
+1

{quote}
I am trying to use your GBIF … a very good and useful idea to geolocate the samples on a map, but alas the filter by name doesn’t work (or maybe I don’t know how to use it!).
I’m trying to find Perrier’s collections from Madagascar (his full name is Henri Perrier de la Bâthie); I have 17367 records on the MNHN website (https://science.mnhn.fr/institution/mnhn/collection/p/item/list?countryCode=MG&recordedBy=perrier) but 0 records on your website (http://www.gbif.org/occurrence/search?COUNTRY=MG&RECORDED_BY=Perrier&DATASET_KEY=b5cdf794-8fa4-4a85-8b26-755d087bf531).
{quote}

||Record count||Collector||
|13|Perrier de la Bâthie, H|
|112|Perrier de la Bâthie|
|5|Perrier de la Bathie, H.|
|3|Perrier de la Bâthie, h;|
|1|Perrier|
|1|Perrier de la Bâthie, A.|
|7|Perrier de la Bâthie, J.M.H.A.|
|17184|Perrier de la Bâthie, H.|
|1|Perrier, R.|
|45|Perrier, E.|
|12|Perrier de la Bâthie, P.|
|3|Perrier de la Bathie|
|35|Perrier, A.|
|17|Perrier, H.|
|2|[Perrier], H.|
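As a rough illustration of how far simple normalisation gets you (not a proposal for how GBIF should do it), here is a Python sketch with a hypothetical `name_key` function that strips accents and punctuation and folds case. Applied to some of the rows above, several spellings collapse onto one key, but "Perrier, H." and "Perrier de la Bâthie, H." still stay apart, so real standardisation would still need curation:

```python
import re
import unicodedata
from collections import Counter


def name_key(name: str) -> str:
    """Hypothetical normalisation: strip accents, drop punctuation, casefold, collapse spaces."""
    # NFKD splits "â" into "a" + a combining circumflex; drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Replace anything that is not a letter, digit, underscore or whitespace with a space.
    cleaned = re.sub(r"[^\w\s]", " ", no_accents)
    return " ".join(cleaned.casefold().split())


# A few (count, verbatim recordedBy) rows copied from the table above.
rows = [
    (13, "Perrier de la Bâthie, H"),
    (5, "Perrier de la Bathie, H."),
    (3, "Perrier de la Bâthie, h;"),
    (17184, "Perrier de la Bâthie, H."),
    (112, "Perrier de la Bâthie"),
    (3, "Perrier de la Bathie"),
]

# Sum record counts per normalised key.
totals = Counter()
for count, verbatim in rows:
    totals[name_key(verbatim)] += count

for key, total in totals.most_common():
    print(total, key)
```

The four "H." spellings merge into one key, and the two un-initialled spellings merge into another. Note that `name_key("[Perrier], H.")` and `name_key("Perrier, H.")` both give `"perrier h"`, a key that still differs from the full surname form.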
    


Author: rdmpage
Created: 2015-11-10 22:08:01.785
Updated: 2015-11-10 22:15:36.109
        
[~jlegind@gbif.org] I think there are two things here:

1. GBIF doesn't support proper text searching on these fields, so users need to know exactly what they're looking for :(

2. The field the user actually needs to search on is *recordedBy* which AFAIK isn't indexed anyway.
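To make point 1 concrete, here is a toy Python comparison of exact-value matching (how the filter appears to behave) with a simple token-based match over a few *recordedBy* values from the table above. The helper names are made up for illustration, and this is not how GBIF or SOLR actually tokenise text:

```python
import unicodedata


def fold(text: str) -> str:
    """Lowercase and strip accents so 'Bâthie' and 'bathie' compare equal."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).casefold()


def tokens(text: str) -> set[str]:
    """Split on whitespace and common punctuation into folded tokens."""
    for sep in ",.;[]":
        text = text.replace(sep, " ")
    return set(fold(text).split())


def exact_match(query: str, values: list[str]) -> list[str]:
    # Keyword-field behaviour: the whole stored value must be identical to the query.
    return [v for v in values if v == query]


def token_match(query: str, values: list[str]) -> list[str]:
    # Full-text style: every query token must appear somewhere in the value.
    wanted = tokens(query)
    return [v for v in values if wanted <= tokens(v)]


recorded_by = [
    "Perrier de la Bâthie, H.",
    "Perrier de la Bathie",
    "Perrier, A.",
    "[Perrier], H.",
]

print(exact_match("Perrier", recorded_by))          # -> []
print(token_match("Perrier", recorded_by))          # -> all four values
print(token_match("perrier bathie", recorded_by))   # -> the two "de la B(â)thie" values
```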

If you go to the MNHN site you see, for example, https://science.mnhn.fr/institution/mnhn/collection/p/item/p030746?listIndex=1&listCount=17374 which links to the GBIF record http://www.gbif.org/occurrence/439286066/verbatim where the *recordedBy* field has the value *Perrier de la Bâthie, H.*, which is what the user is after.

Standardisation would be nice, but a huge job that few would be willing to take on, especially if decent full-text indexing would get the user what they were after. I guess scaling this would be an issue, but perhaps we could have separate indexes, such as one that was solely for people's names.
    


Author: trobertson@gbif.org
Created: 2015-11-11 09:44:01.845
Updated: 2015-11-11 09:44:01.845
        
Thanks [~rdmpage]

*recordedBy* is indexed, so the correct URL is:
  http://www.gbif.org/occurrence/search?COUNTRY=MG&RECORDED_BY=Perrier%20de%20la%20B%C3%A2thie,%20H.&DATASET_KEY=b5cdf794-8fa4-4a85-8b26-755d087bf531
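The percent-encoded *recordedBy* value in that URL can be produced with Python's standard `urllib.parse.quote`. A minimal sketch, assuming the same query parameter names as the URL above (COUNTRY, RECORDED_BY, DATASET_KEY); `search_url` is a hypothetical helper, and `safe=","` keeps the comma literal so the output matches the URL as written:

```python
from urllib.parse import quote

BASE = "http://www.gbif.org/occurrence/search"


def search_url(country: str, recorded_by: str, dataset_key: str) -> str:
    """Build an occurrence-search URL; recorded_by must be the exact verbatim string."""
    # quote() UTF-8 encodes non-ASCII ("â" -> %C3%A2) and turns spaces into %20.
    # safe="," leaves the comma unescaped, as in the URL above.
    value = quote(recorded_by, safe=",")
    return f"{BASE}?COUNTRY={country}&RECORDED_BY={value}&DATASET_KEY={dataset_key}"


url = search_url("MG", "Perrier de la Bâthie, H.", "b5cdf794-8fa4-4a85-8b26-755d087bf531")
print(url)
```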

FYI (and off topic for this issue): we have working versions of full-text indexing in-house on a demo system and will be rolling that out, together with faceted occurrence search (e.g. facets of species contained, countries contained) and maps of the results (most likely back to some kind of grid for ad hoc maps, like SIB Colombia), in the first half of 2016. This all looks possible thanks to work on SOLR 5, which we are running on a new SOLR cloud here to understand its capabilities.

    


Author: rdmpage
Created: 2015-11-11 09:56:35.725
Updated: 2015-11-11 09:56:35.725
        
Thanks [~trobertson@gbif.org]; not sure how I missed that. Interestingly, the look-ahead feature for the filter doesn't offer "Perrier de la Bâthie, H." as an option: typing "Perrier" gives some suggestions, but "Perrier de" gives none. If that worked, the user might have discovered that s/he needed to type exactly "Perrier de la Bâthie, H." to get the desired results.
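A sketch of the kind of look-ahead that would have helped, assuming the suggestions are drawn from the distinct *recordedBy* values with their record counts. The class and method names are hypothetical, and how the portal actually builds its suggestions is not known here. Matching is a case- and accent-insensitive prefix match on the whole value, ranked by record count:

```python
import bisect
import unicodedata


def fold(text: str) -> str:
    """Casefold and strip accents so the typed prefix need not match diacritics."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).casefold()


class PrefixSuggester:
    """Hypothetical look-ahead over distinct recordedBy values and their record counts."""

    def __init__(self, counts: dict[str, int]):
        self.counts = counts
        # Sorted (folded, original) pairs allow a binary search for the first candidate.
        self.entries = sorted((fold(name), name) for name in counts)
        self.keys = [folded for folded, _ in self.entries]

    def suggest(self, typed: str, limit: int = 5) -> list[str]:
        prefix = fold(typed)
        start = bisect.bisect_left(self.keys, prefix)
        matches = []
        for folded, name in self.entries[start:]:
            if not folded.startswith(prefix):
                break  # sorted order: no later key can share the prefix
            matches.append(name)
        # Most frequently used spellings first.
        matches.sort(key=lambda name: self.counts[name], reverse=True)
        return matches[:limit]


# Counts copied from a few rows of the table earlier in this issue.
suggester = PrefixSuggester({
    "Perrier de la Bâthie, H.": 17184,
    "Perrier de la Bâthie": 112,
    "Perrier, E.": 45,
    "Perrier, A.": 35,
    "Perrier de la Bathie, H.": 5,
})
print(suggester.suggest("Perrier de"))
```

Ranking by count puts the spelling that actually holds the records, "Perrier de la Bâthie, H.", first.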

The SOLR work sounds cool; looking forward to seeing it.