Portal / POR-284

Nub creates duplicates for names with diacritic marks

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: Checklistbank
    • Labels:

      Description

      Taken from the original Google Code issue:
      https://code.google.com/p/gbif-ecat/issues/detail?id=98

      Taxon names that differ only in the presence/absence of diacritic marks ought to be treated as spelling variants rather than as separate names.

      Examples:
      4307394 Achelous
      6458829 Acheloüs

      4377003 Achnanthes plonensis
      4920773 Achnanthes plönensis

      4919066 Bütschliella
      6009122 Butschliella

      5977298 Pseudopanthera oberthuri
      1954573 Pseudopanthera oberthüri

      I think there are quite a few of these. There are also many cases where the names are obviously synonyms but the spelling differs because a diacritic has been transliterated, for example:

      2630477 Achnanthes ploenensis
      4893234 Buetschliella

      Best
      Jonathan
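
One way to treat such names as spelling variants is to compare them on a normalized key. The following is a minimal Python sketch, not the ChecklistBank implementation: the function names `strip_diacritics` and `comparison_key` and the digraph table are assumptions made for illustration. It decomposes each name to Unicode NFD, drops the combining marks, and optionally folds the transliterations ae/oe/ue to their base vowel.

```python
import unicodedata

# Hypothetical digraph table (an assumption, not from the issue):
# common transliterations of ä, ö, ü mapped to the bare vowel.
_TRANSLITERATIONS = {"ae": "a", "oe": "o", "ue": "u"}


def strip_diacritics(name: str) -> str:
    """Decompose to NFD and drop combining marks, e.g. 'ü' becomes 'u'."""
    decomposed = unicodedata.normalize("NFD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def comparison_key(name: str, fold_transliterations: bool = False) -> str:
    """Return a lowercase key under which diacritic variants compare equal.

    With fold_transliterations=True the digraphs ae/oe/ue are also reduced
    to a/o/u, so 'ploenensis' and 'plönensis' share a key.
    """
    key = strip_diacritics(name).lower()
    if fold_transliterations:
        for digraph, base in _TRANSLITERATIONS.items():
            key = key.replace(digraph, base)
    return key


if __name__ == "__main__":
    # Names taken from the examples in this issue.
    names = ["Bütschliella", "Butschliella", "Buetschliella"]
    for n in names:
        print(n, comparison_key(n), comparison_key(n, fold_transliterations=True))
```

Folding digraphs is a heuristic: it can also merge unrelated names in which ae, oe or ue is the original spelling, which is why it is off by default in this sketch.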

          Activity

          Markus Döring added a comment - https://github.com/gbif/checklistbank/blob/master/checklistbank-cli/src/test/java/org/gbif/checklistbank/nub/NubBuilderTest.java#L130

            People

            • Assignee: Markus Döring
            • Reporter: Markus Döring
            • Votes: 0
            • Watchers: 1
