Issue 14075

Some dataset names are prefixed with the word = ENGLISH

14075
Reporter: jlegind
Assignee: lfrancke
Type: Bug
Summary: Some dataset names are prefixed with the word = ENGLISH
Priority: Blocker
Resolution: Fixed
Status: Closed
Created: 2013-09-26 15:48:16.016
Updated: 2013-12-16 17:50:31.064
Resolved: 2013-09-27 10:36:52.898
        
Description: Dataset UUID 17a729a0-7ed1-11df-8c4a-0800200c9a66 has title "ENGLISHSANT-Algae" while the correct name is "SANT-Algae".

This affects all un-deleted resources belonging to Herbario Sant (UUID def87a70-0837-11d9-acb2-b8a03c50a862) on crawler_registry on registry staging.gbif.org

This behavior is not seen on "registry" or on my localhost version of registry using the migration script from 25-09-2013

See Crawler portal entries here: http://crawler.gbif.org/dataset/search?q=fishbase
Also here: http://crawler.gbif.org/dataset/search?q=herbario+sant

Seems to affect TAPIR resources edited 7 hours ago
    


Author: trobertson@gbif.org
Comment: Isolated to crawler environment
Created: 2013-09-26 15:50:43.916
Updated: 2013-09-26 15:50:43.916


Author: omeyn@gbif.org
Comment: This must be a metadata sync error - something that's never happened on the non-crawler registries.
Created: 2013-09-26 15:51:54.401
Updated: 2013-09-26 15:51:54.401


Author: jlegind@gbif.org
Created: 2013-09-26 16:15:41.158
Updated: 2013-09-26 16:15:41.158
        
I am fairly sure that this behavior is limited to TAPIR datasets.
http://crawler.gbif.org/dataset/search?q=SPANISHinstituto

http://crawler.gbif.org/dataset/387e98a0-16fa-11df-b5b3-b8a03c50a862

Notice that the description is also prefixed.
    


Author: lfrancke@gbif.org
Created: 2013-09-26 16:20:32.582
Updated: 2013-09-26 16:20:32.582
        
Yes, this is because we get metadata in different languages from Tapir.

I agree that this looks shit, but I'm not sure what we want to do. We could always just take English... but I _think_ some people only provide Spanish titles (I might be remembering wrong). So we need some logic. To be discussed.
    


Author: jlegind@gbif.org
Created: 2013-09-26 16:30:52.739
Updated: 2013-09-26 16:30:52.739
        
->Lars - Ok, but the designator, in capital letters, should not creep into the title:

Full Title
"ENGLISHPlants galls"

http://crawler.gbif.org/dataset/387e98a0-16fa-11df-b5b3-b8a03c50a862
    


Author: lfrancke@gbif.org
Created: 2013-09-27 10:03:13.892
Updated: 2013-09-27 10:03:13.892
        
What would you suggest we do?

Let's say we get a Spanish and an English title. Or French and German...

Suggestion A:
1) Pick English if available
2) Pick any

Suggestion B:
* Concatenate all titles separated by "/" or something like that (but not including the language as it is done now)
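
The two suggestions could be sketched roughly as follows. This assumes the Tapir metadata arrives as a list of (language, title) pairs; that input shape, the function names `pick_title` and `concat_titles`, and the language labels are hypothetical and not taken from the registry code.

```python
from typing import List, Optional, Tuple

# Hypothetical input shape: (language, title) pairs in the order the
# Tapir endpoint returned them. The real metasync data model may differ.
LocalizedTitle = Tuple[str, str]


def pick_title(titles: List[LocalizedTitle], preferred: str = "english") -> Optional[str]:
    """Suggestion A: the preferred language if available, otherwise any (here: the first)."""
    for language, title in titles:
        if language.lower() == preferred:
            return title
    # No preferred-language title found: fall back to the first one, if any.
    return titles[0][1] if titles else None


def concat_titles(titles: List[LocalizedTitle], separator: str = " / ") -> str:
    """Suggestion B: all titles joined by a separator, without the language names."""
    return separator.join(title for _, title in titles)
```

With a made-up Spanish title next to the English one from this issue, `pick_title([("SPANISH", "Título de ejemplo"), ("ENGLISH", "Plants galls")])` returns `"Plants galls"`, and `concat_titles` on the same input returns `"Título de ejemplo / Plants galls"`. In neither case does the language designator end up in the title.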
    


Author: ahahn@gbif.org
Created: 2013-09-27 10:18:46.837
Updated: 2013-09-27 10:18:46.837
        
I would go for suggestion A - English if available, and otherwise the first language version that pops up. This might introduce a risk of flip-flopping between French and Spanish or whatever if someone tries to be really thorough but changes the sequence, but I assume we can live with that.
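
If the flip-flopping ever did become a problem, one possible way around it (not something proposed in this thread) would be to make the fallback independent of the order in which the titles arrive. A minimal sketch, assuming titles keyed by language label; the function name `pick_stable_title` and the input shape are hypothetical:

```python
from typing import Dict, Optional


def pick_stable_title(titles_by_language: Dict[str, str], preferred: str = "english") -> Optional[str]:
    # Normalise the language labels so "ENGLISH" and "english" match.
    normalised = {language.lower(): title for language, title in titles_by_language.items()}
    if preferred in normalised:
        return normalised[preferred]
    if not normalised:
        return None
    # Fall back to the alphabetically first language rather than the first
    # one received, so reordering the source metadata does not change the pick.
    return normalised[min(normalised)]
```

Here a provider supplying French and Spanish titles would always get the French one, whichever comes first in the response. The alphabetical rule is arbitrary; any fixed ordering would serve the same purpose.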

Where does this surface, though? At http://portaldev.gbif.org/dataset/17a729a0-7ed1-11df-8c4a-0800200c9a66, the title looks ok. Is it only the crawler environment that is concerned?
    


Author: lfrancke@gbif.org
Comment: Okay, great. I'll implement option A now.
Created: 2013-09-27 10:24:48.079
Updated: 2013-09-27 10:24:48.079


Author: omeyn@gbif.org
Comment: [~ahahn@gbif.org] yes only crawler env right now, but as soon as we do metasync with the new crawler into live registry, we'll see this. Option A sounds good to me.
Created: 2013-09-27 10:28:05.221
Updated: 2013-09-27 10:28:05.221


Author: jlegind@gbif.org
Comment: I second option A.
Created: 2013-09-27 10:29:48.566
Updated: 2013-09-27 10:29:48.566