Issue 11809

Incorporate French and Spanish country names

11809
Reporter: lfrancke
Assignee: lfrancke
Type: Improvement
Summary: Incorporate French and Spanish country names
Priority: Major
Resolution: Fixed
Status: Closed
Created: 2012-09-04 16:55:03.488
Updated: 2013-12-06 12:06:45.91
Resolved: 2012-09-05 13:46:45.281
        
Description: Andrea provided a list of country names in French and Spanish that should go into the parsers project.

I need to check for collisions.
    
Attachment CountryNamesForRollover_FrenchAndSpanish.txt


Author: lfrancke@gbif.org
Created: 2012-09-04 17:34:11.884
Updated: 2012-09-04 18:28:49.564
        
As a reminder for myself.

Convert the list from UTF-16 to UTF-8, strip the DOS line endings, remove the header row, concatenate it with the existing file (countryName.txt), sort the result, drop every line whose first column has already been seen, clean up whitespace after the tab, and write the result to newCountryNames.txt:
{code}
iconv -f utf-16 -t utf-8 CountryNamesForRollover_FrenchAndSpanish.txt | tr -d '\r' | sed '1d' | cat countryName.txt - | sort | awk -F'\t' '{ if ($1 in stored_lines) x = 1; else print; stored_lines[$1] = 1 }' | perl -pe 's|\t\s|\t|' > newCountryNames.txt
{code}

I checked the result by diffing it against the old version, and a few issues popped up: IREL is mapped to both IE and GB, and SABA is ambiguous.