Issue 13164

Nub creates duplicates for names with diacritic marks

Reporter: mdoering
Assignee: mdoering
Type: Bug
Summary: Nub creates duplicates for names with diacritic marks
Priority: Critical
Resolution: Fixed
Status: Closed
Created: 2013-05-02 22:43:35.959
Updated: 2016-05-12 11:17:05.589
Resolved: 2015-09-18 20:27:12.686
        
Description: Taken from the original Google Code issue:
https://code.google.com/p/gbif-ecat/issues/detail?id=98

Taxon names that differ only in the presence/absence of diacritic marks ought to be treated as spelling variants rather than as separate names.

Examples:
4307394 Achelous
6458829 Acheloüs

4377003 Achnanthes plonensis
4920773 Achnanthes plönensis

4919066 Bütschliella
6009122 Butschliella

5977298 Pseudopanthera oberthuri
1954573 Pseudopanthera oberthüri

I think there are quite a few of these. There are also many cases where the names are clearly synonyms but the spelling difference reflects a transliteration of a diacritic, for example:

2630477 Achnanthes ploenensis
4893234 Buetschliella

Best
Jonathan
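
To illustrate the two kinds of variant described above, here is a minimal Java sketch of how such names could be reduced to a shared comparison key. It is not taken from the nub builder; the class DiacriticKeys and the methods foldDiacritics and looseKey are hypothetical names chosen for this example. The first method removes diacritic marks, the second also collapses the ae/oe/ue transliterations. The loose key is deliberately lossy, so it is probably only suitable for flagging candidate duplicates for review, not for merging names automatically.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical helper class, written for illustration only.
class DiacriticKeys {

    // Decompose to NFD so each accented letter becomes base letter + combining mark,
    // then drop every combining mark (Unicode category M).
    static String foldDiacritics(String name) {
        String decomposed = Normalizer.normalize(name, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    // Assumption: transliterated umlauts appear as ae/oe/ue. Collapsing them to a/o/u
    // maps the plain, accented and transliterated spellings to one lower-case key.
    // This also shortens names that legitimately contain these letter pairs.
    static String looseKey(String name) {
        return foldDiacritics(name)
                .toLowerCase(Locale.ROOT)
                .replace("ae", "a")
                .replace("oe", "o")
                .replace("ue", "u");
    }

    public static void main(String[] args) {
        // Non-ASCII letters are written as Unicode escapes to stay encoding independent:
        // \u00fc is u with diaeresis, \u00f6 is o with diaeresis.
        System.out.println(foldDiacritics("Achelo\u00fcs"));
        System.out.println(looseKey("Achnanthes pl\u00f6nensis"));
        System.out.println(looseKey("Achnanthes ploenensis"));
        System.out.println(looseKey("Buetschliella"));
    }
}
```

With these keys, the three Achnanthes spellings listed in this issue share one loose key, as do the three Bütschliella spellings, while names such as Achelous and Pseudopanthera oberthuri only need the diacritic folding step.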
    


Author: mdoering@gbif.org
Comment: https://github.com/gbif/checklistbank/blob/master/checklistbank-cli/src/test/java/org/gbif/checklistbank/nub/NubBuilderTest.java#L130
Created: 2015-09-18 20:27:12.739
Updated: 2015-09-18 20:27:12.739