Issue 12712

When using foreign characters, some dataset searches do not bring back results

12712
Reporter: jcuadra
Assignee: fmendez
Type: Bug
Summary: When using foreign characters, some dataset searches do not bring back results
Priority: Critical
Resolution: Fixed
Status: Closed
Created: 2013-02-06 14:28:27.994
Updated: 2013-08-29 14:44:19.902
Resolved: 2013-02-06 19:35:05.82
        
Description: When using some "foreign" characters, some dataset searches do not bring back results.

Some examples:

=======================
*"Friedhof Friedenstraße (Berlin)"*
http://staging.gbif.org:8080/portal/dataset/8a772b0e-f762-11e1-a439-00145eb45e9a

no results...
http://staging.gbif.org:8080/portal/dataset/search?q=Friedhof+Friedenstraße+(Berlin)

=======================
*"Museo de Ciencias Naturales de Tenerife-Entomología"*
http://staging.gbif.org:8080/portal/dataset/7a4bd518-f762-11e1-a439-00145eb45e9a

no results...
http://staging.gbif.org:8080/portal/dataset/search?q=Museo+de+Ciencias+Naturales+de+Tenerife-Entomología

=======================
*"臺灣冷杉林植物群落之動態研究-以合歡山臺灣冷杉林永久樣區為例"*
http://staging.gbif.org:8080/portal/dataset/2344f83d-eefb-4635-afed-fb2a1c9bd466:junchunwang.3.1

no results...
http://staging.gbif.org:8080/portal/dataset/search?q=臺灣冷杉林植物群落之動態研究-以合歡山臺灣冷杉林永久樣區為例

    


Author: mdoering@gbif.org
Created: 2013-02-06 14:41:43.019
Updated: 2013-02-06 14:42:03.404
        
Holy crap, that is bad. I'm surprised. It also doesn't work in common name species searches:

Searching for "Löwe" yields nothing:
http://staging.gbif.org:8080/portal-web-dynamic/species/search?dataset_key=d7dddbf4-2cf0-4f39-9b2a-bb099caae36c&q=l%C3%B6we

But that name exists:
http://ecat-dev.gbif.org/usage/5219404
http://staging.gbif.org:8080/portal/species/5219404

This search, spelled with "ss", does hit the name containing the German "double s" ß:
http://staging.gbif.org:8080/portal/species/search?dataset_key=d7dddbf4-2cf0-4f39-9b2a-bb099caae36c&q=weisstanne

But searching with the actual character doesn't yield anything:
http://staging.gbif.org:8080/portal/species/search?dataset_key=d7dddbf4-2cf0-4f39-9b2a-bb099caae36c&q=wei%C3%9Ftannen


I am pretty sure this used to work
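
For reference, the escaped forms in the URLs above (l%C3%B6we, wei%C3%9Ftannen) are the UTF-8 percent-encodings of the search terms. A minimal sketch for reproducing them follows; the helper name encode_query is made up for illustration and is not part of the portal code.

```python
from urllib.parse import quote, unquote


def encode_query(term: str) -> str:
    """Percent-encode a search term as UTF-8 for use in a ?q= parameter.

    Hypothetical helper for illustration only.
    """
    # safe="" makes quote() escape every reserved character as well.
    return quote(term, safe="")


if __name__ == "__main__":
    print(encode_query("löwe"))        # l%C3%B6we
    print(encode_query("weißtannen"))  # wei%C3%9Ftannen
    print(unquote("wei%C3%9Ftannen"))  # weißtannen
```

If the encoded term matches what the browser sent, the request itself is well formed and the mismatch is more likely on the index side.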
    


Author: jcuadra@gbif.org
Created: 2013-02-06 14:57:48.916
Updated: 2013-02-06 14:57:48.916
        
The issue goes all the way down to the web services:
http://staging.gbif.org:8080/registry-search-ws/search?q=Entomología

http://staging.gbif.org:8080/registry-search-ws/search?q=Friedhof+Friedenstraße+(Berlin)
    


Author: mdoering@gbif.org
Comment: Fede, can you also check the species search? I'm nearly sure we have the ASCIIFolding filter in there already.
Created: 2013-02-06 17:29:00.726
Updated: 2013-02-06 17:29:00.726


Author: fmendez@gbif.org
Comment: The new index has the ASCIIFolding filter; it works now!
Created: 2013-02-06 19:35:00.86
Updated: 2013-02-06 19:35:00.86
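
To illustrate what an ASCII folding step does to the search terms in this issue, here is a rough Python approximation. It assumes the filter behaves like Lucene's ASCIIFoldingFilter, which maps accented Latin characters to ASCII equivalents and leaves other characters untouched. The function names and the small EXTRA_FOLDS table are assumptions made for this sketch; this is not the actual index configuration.

```python
import unicodedata

# Characters with no canonical decomposition need explicit mappings.
# Illustrative and far from complete (assumption for this sketch).
EXTRA_FOLDS = {"ß": "ss", "ø": "o", "Ø": "O", "æ": "ae", "Æ": "AE"}


def fold_to_ascii(text: str) -> str:
    """Replace accented Latin characters with plain ASCII equivalents."""
    out = []
    for ch in text:
        if ch in EXTRA_FOLDS:
            out.append(EXTRA_FOLDS[ch])
            continue
        # Split the character into base letter plus combining marks,
        # then drop the marks (e.g. "ö" becomes "o").
        decomposed = unicodedata.normalize("NFD", ch)
        base = "".join(c for c in decomposed if not unicodedata.combining(c))
        # Keep the folded form only if it is ASCII; otherwise keep the
        # original character (CJK text passes through unchanged).
        out.append(base if base and base.isascii() else ch)
    return "".join(out)


def normalize(text: str) -> str:
    """Fold and lowercase, applied to both indexed values and queries."""
    return fold_to_ascii(text).lower()


if __name__ == "__main__":
    for term in ("Löwe", "Weißtanne", "Entomología", "臺灣冷杉林"):
        print(term, "->", normalize(term))
```

The point is that the same folding has to be applied at index time and at query time, so that "weißtanne" and "weisstanne" reduce to the same token. Note that folding leaves the Chinese title as-is, so this sketch says nothing about how that case is matched.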