Some dataset names are prefixed with the word = ENGLISH
14075
Reporter: jlegind
Assignee: lfrancke
Type: Bug
Summary: Some dataset names are prefixed with the word = ENGLISH
Priority: Blocker
Resolution: Fixed
Status: Closed
Created: 2013-09-26 15:48:16.016
Updated: 2013-12-16 17:50:31.064
Resolved: 2013-09-27 10:36:52.898
Description: Dataset UUID 17a729a0-7ed1-11df-8c4a-0800200c9a66 has the title "ENGLISHSANT-Algae" while the correct name is "SANT-Algae".
This affects all un-deleted resources belonging to Herbario Sant (UUID def87a70-0837-11d9-acb2-b8a03c50a862) on crawler_registry on registry staging.gbif.org
This behavior is not seen on "registry" or on my localhost version of registry using the migration script from 25-09-2013
See Crawler portal entries here: http://crawler.gbif.org/dataset/search?q=fishbase
Also here: http://crawler.gbif.org/dataset/search?q=herbario+sant
Seems to affect TAPIR resources edited 7 hours ago.
Author: omeyn@gbif.org
Comment: This must be a metadata sync error - something that's never happened on the non-crawler registries.
Created: 2013-09-26 15:51:54.401
Updated: 2013-09-26 15:51:54.401
Author: jlegind@gbif.org
Created: 2013-09-26 16:15:41.158
Updated: 2013-09-26 16:15:41.158
I am fairly sure that this behavior is limited to TAPIR datasets.
http://crawler.gbif.org/dataset/search?q=SPANISHinstituto
http://crawler.gbif.org/dataset/387e98a0-16fa-11df-b5b3-b8a03c50a862
Notice that the description is also prefixed.
Author: lfrancke@gbif.org
Created: 2013-09-26 16:20:32.582
Updated: 2013-09-26 16:20:32.582
Yes, this is because we get metadata in different languages from TAPIR.
I agree that this looks shit, but I'm not sure what we want to do. We could always just take English... but I _think_ some people only provide Spanish titles (I might be remembering wrong). So we need some logic. To be discussed.
Author: jlegind@gbif.org
Created: 2013-09-26 16:30:52.739
Updated: 2013-09-26 16:30:52.739
->Lars - Ok, but the designator, in capital letters, should not creep into the title:
Full Title
"ENGLISHPlants galls"
http://crawler.gbif.org/dataset/387e98a0-16fa-11df-b5b3-b8a03c50a862
Author: lfrancke@gbif.org
Created: 2013-09-27 10:03:13.892
Updated: 2013-09-27 10:03:13.892
What would you suggest we do?
Let's say we get a Spanish and an English title. Or French and German...
Suggestion A:
1) Pick English if available
2) Pick any
Suggestion B:
* Concatenate all titles separated by "/" or something like that (but not including the language as it is done now)
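
To make the two suggestions concrete, here is a minimal Python sketch. It is not the registry's actual code: the (language, title) pair representation, the function names `pick_title` and `join_titles`, and the " / " separator are all assumptions made for illustration. The language designators follow the "ENGLISH"/"SPANISH" prefixes seen in the affected titles.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical representation: (language designator, title) pairs in the
# order they were received, e.g. ("ENGLISH", "SANT-Algae").
LocalizedTitle = Tuple[str, str]


def pick_title(titles: Sequence[LocalizedTitle],
               preferred: str = "ENGLISH") -> Optional[str]:
    """Suggestion A: the preferred language if present, else the first title."""
    for language, title in titles:
        if language.upper() == preferred:
            return title
    # No title in the preferred language: fall back to whichever came first.
    return titles[0][1] if titles else None


def join_titles(titles: Sequence[LocalizedTitle],
                separator: str = " / ") -> str:
    """Suggestion B: all titles joined, without the language designator."""
    return separator.join(title for _, title in titles)
```

With the made-up input `[("SPANISH", "Titulo"), ("ENGLISH", "SANT-Algae")]`, `pick_title` returns the English title and `join_titles` returns both titles separated by " / ". In neither case does the designator end up in the result.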
Author: ahahn@gbif.org
Created: 2013-09-27 10:18:46.837
Updated: 2013-09-27 10:18:46.837
I would go for suggestion A - English if available, and otherwise the first language version that pops up. This might introduce a risk of flip-flopping between French and Spanish or whatever if someone tries to be really thorough but changes the sequence, but I assume we can live with that.
Where does this surface, though? At http://portaldev.gbif.org/dataset/17a729a0-7ed1-11df-8c4a-0800200c9a66, the title looks ok. Is it only the crawler environment that is concerned?
Author: omeyn@gbif.org
Comment: [~ahahn@gbif.org] yes only crawler env right now, but as soon as we do metasync with the new crawler into live registry, we'll see this. Option A sounds good to me.
Created: 2013-09-27 10:28:05.221
Updated: 2013-09-27 10:28:05.221
Author: jlegind@gbif.org
Comment: I second option A.
Created: 2013-09-27 10:29:48.566
Updated: 2013-09-27 10:29:48.566