Issue 13164
Nub creates duplicates for names with diacritic marks
Reporter: mdoering
Assignee: mdoering
Type: Bug
Summary: Nub creates duplicates for names with diacritic marks
Priority: Critical
Resolution: Fixed
Status: Closed
Created: 2013-05-02 22:43:35.959
Updated: 2016-05-12 11:17:05.589
Resolved: 2015-09-18 20:27:12.686
Description: Taken from the original Google Code issue:
https://code.google.com/p/gbif-ecat/issues/detail?id=98
Taxon names that differ only in the presence/absence of diacritic marks ought to be treated as spelling variants rather than as separate names.
Examples:
4307394 Achelous
6458829 Acheloüs
4377003 Achnanthes plonensis
4920773 Achnanthes plönensis
4919066 Bütschliella
6009122 Butschliella
5977298 Pseudopanthera oberthuri
1954573 Pseudopanthera oberthüri
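A minimal Java sketch of the matching rule described above. It assumes names are compared through a hypothetical key: Unicode canonical decomposition (NFD) followed by removal of all combining marks. It uses only `java.text.Normalizer` and is an illustration, not the actual NubBuilder code. Letters with no canonical decomposition (for example ø or æ) pass through unchanged.

```java
import java.text.Normalizer;

public class DiacriticFolding {
    // Hypothetical helper: decompose (e.g. ü -> u + U+0308), then drop every combining mark.
    static String foldDiacritics(String name) {
        String decomposed = Normalizer.normalize(name, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }

    public static void main(String[] args) {
        // Pairs from the issue; Unicode escapes keep the source file ASCII-only.
        String[][] pairs = {
            {"Achelous", "Achelo\u00fcs"},                                   // Acheloüs
            {"Achnanthes plonensis", "Achnanthes pl\u00f6nensis"},           // plönensis
            {"Butschliella", "B\u00fctschliella"},                           // Bütschliella
            {"Pseudopanthera oberthuri", "Pseudopanthera oberth\u00fcri"}    // oberthüri
        };
        for (String[] p : pairs) {
            boolean same = foldDiacritics(p[0]).equals(foldDiacritics(p[1]));
            // Print only the folded (ASCII) keys to avoid console encoding issues.
            System.out.println(foldDiacritics(p[1]) + " vs " + p[0] + ": same key = " + same);
        }
    }
}
```

All four pairs collapse to the same key, so a builder using such a key would treat them as one name with spelling variants instead of creating duplicates.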
I think there are quite a few of these. There are also many cases where the names are obviously synonyms but the spelling difference reflects a transliteration of a diacritic (ö written as oe, ü as ue), for example:
2630477 Achnanthes ploenensis
4893234 Buetschliella
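Stripping marks turns ü into u, not ue, so the transliterated spellings need a second key. The following is a sketch under the same assumptions. The helper names and the umlaut table are hypothetical. A diaeresis in a classical name such as Acheloüs is not a German umlaut, so the transliterated key is only an extra match candidate alongside the stripped key, never a replacement for it.

```java
import java.text.Normalizer;
import java.util.Map;

public class TransliterationKeys {
    // Hypothetical table: German umlauts and their conventional two-letter spellings.
    private static final Map<Character, String> UMLAUTS = Map.of(
        '\u00e4', "ae", '\u00f6', "oe", '\u00fc', "ue",   // ä ö ü
        '\u00c4', "Ae", '\u00d6', "Oe", '\u00dc', "Ue");  // Ä Ö Ü

    // Key 1: drop all combining marks (ü -> u).
    static String strippedKey(String name) {
        return Normalizer.normalize(name, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
    }

    // Key 2: spell umlauts out (ü -> ue), then drop any remaining marks.
    static String transliteratedKey(String name) {
        String composed = Normalizer.normalize(name, Normalizer.Form.NFC);
        StringBuilder sb = new StringBuilder();
        for (char c : composed.toCharArray()) {
            String repl = UMLAUTS.get(c);
            sb.append(repl != null ? repl : String.valueOf(c));
        }
        return strippedKey(sb.toString());
    }

    // Two names are candidate spelling variants if any key of one equals any key of the other.
    static boolean isSpellingVariant(String a, String b) {
        String[] ka = {strippedKey(a), transliteratedKey(a)};
        String[] kb = {strippedKey(b), transliteratedKey(b)};
        for (String x : ka) {
            for (String y : kb) {
                if (x.equals(y)) return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isSpellingVariant("Achnanthes pl\u00f6nensis", "Achnanthes ploenensis")); // true
        System.out.println(isSpellingVariant("B\u00fctschliella", "Buetschliella"));                 // true
        System.out.println(isSpellingVariant("Achnanthes plonensis", "Achnanthes ploenensis"));     // false
    }
}
```

Neither key links plonensis to ploenensis directly. The two spellings only meet through the umlauted form plönensis, so a builder relying on keys like these would need to group names transitively.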
Best
Jonathan
Author: mdoering@gbif.org
Comment: https://github.com/gbif/checklistbank/blob/master/checklistbank-cli/src/test/java/org/gbif/checklistbank/nub/NubBuilderTest.java#L130
Created: 2015-09-18 20:27:12.739
Updated: 2015-09-18 20:27:12.739