Issue 13010

d21209bf-b1e0-4a97-9861-9868cb461786 contains multiple records for same triplet

13010
Reporter: omeyn
Assignee: jlegind
Type: Improvement
Summary: d21209bf-b1e0-4a97-9861-9868cb461786 contains multiple records for same triplet
Description: Colliding records vary only slightly, typically in collector name and/or eventDate.
Priority: Major
Resolution: Fixed
Status: Closed
Created: 2013-03-14 15:58:23.419
Updated: 2014-09-19 16:41:11.841
Resolved: 2014-09-19 16:41:11.813
Attachment d21209bf-b1e0-4a97-9861-9868cb461786_occurrences.xlsx


Author: ahahn@gbif.org
Created: 2013-03-14 16:46:31.666
Updated: 2013-03-14 16:48:00.885
        
http://gbrds.gbif.org/browse/agent?uuid=d21209bf-b1e0-4a97-9861-9868cb461786
"Anillamiento de Aves - Proyecto Cruzando el Caribe" dataset of "Asociación Selva" (owner)
-> see attached excel file for data content

Similar issue to Herbier de Strasbourg: the data served at source needs to be examined to find out where the duplicated ID triplets come from. Possible causes:
- mapping of database tables (cartesian product), artificial duplication
- data content (multiple capture of the same object by e.g. different people)
- errors in ID values (e.g. several source datasets with overlapping catalog numbers merged under a single collection code)
etc.
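To see how widespread the collisions are before deciding which cause applies, the records can be grouped by their ID triplet. The sketch below assumes the triplet is the Darwin Core fields institutionCode, collectionCode and catalogNumber, and that each record is a plain dict (e.g. one row of the attached export). The function name find_duplicate_triplets is hypothetical, not part of any GBIF tool.

```python
from collections import defaultdict

# Assumption: the "ID triplet" is these three Darwin Core fields.
TRIPLET_FIELDS = ("institutionCode", "collectionCode", "catalogNumber")


def find_duplicate_triplets(records):
    """Hypothetical helper: group records by ID triplet and keep only
    triplets that are shared by two or more records."""
    groups = defaultdict(list)
    for rec in records:
        key = tuple(rec.get(f) for f in TRIPLET_FIELDS)
        groups[key].append(rec)
    return {key: recs for key, recs in groups.items() if len(recs) > 1}


if __name__ == "__main__":
    # Illustrative records only, not taken from the dataset.
    sample = [
        {"occurrenceID": "a", "institutionCode": "I", "collectionCode": "C", "catalogNumber": "1"},
        {"occurrenceID": "b", "institutionCode": "I", "collectionCode": "C", "catalogNumber": "1"},
        {"occurrenceID": "c", "institutionCode": "I", "collectionCode": "C", "catalogNumber": "2"},
    ]
    for triplet, recs in find_duplicate_triplets(sample).items():
        print(triplet, [r["occurrenceID"] for r in recs])
```

A dict keyed by the triplet tuple keeps this a single pass over the data. The per-record GUID (here assumed to be occurrenceID) is carried along so each colliding record can still be identified individually.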

NB: the records with duplicate ID triplets nevertheless have unique GUIDs.

Worth checking more closely after a quick review:
- recordedBy: duplicates differ in this field, which might indicate multiple data capture
- catalogNumber: 314 records have the catalog number "999999", likely an internal placeholder. Such records should be excluded from the public dataset
- identifiedBy: varies in the same way as recordedBy. The scientific names and other field contents, on the other hand, look identical
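The checks above can be partly automated: drop the placeholder catalog number, then report which fields actually differ within each colliding triplet. This is a minimal sketch under stated assumptions. Records are plain dicts, the triplet is institutionCode/collectionCode/catalogNumber, "999999" is the only placeholder, and occurrenceID holds the GUID (ignored because it is always unique). The names profile_collisions and varying_fields are hypothetical.

```python
from collections import defaultdict

# Assumptions: triplet fields, placeholder value and GUID field name.
TRIPLET_FIELDS = ("institutionCode", "collectionCode", "catalogNumber")
PLACEHOLDER_CATALOG_NUMBERS = {"999999"}
IGNORED_FIELDS = {"occurrenceID"}  # GUIDs are unique, so never informative here


def varying_fields(records):
    """Return the names of fields whose values differ across the records."""
    fields = set().union(*(rec.keys() for rec in records)) - IGNORED_FIELDS
    return {f for f in fields if len({rec.get(f) for rec in records}) > 1}


def profile_collisions(records):
    """Exclude placeholder catalog numbers, group the rest by triplet and
    map each colliding triplet to the set of fields that vary within it."""
    kept = [r for r in records if r.get("catalogNumber") not in PLACEHOLDER_CATALOG_NUMBERS]
    groups = defaultdict(list)
    for rec in kept:
        groups[tuple(rec.get(f) for f in TRIPLET_FIELDS)].append(rec)
    return {key: varying_fields(recs) for key, recs in groups.items() if len(recs) > 1}
```

If the result shows mostly recordedBy and identifiedBy for each triplet, that would point towards multiple capture of the same object rather than a cartesian product of joined tables.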

Proposed next step: contact the publisher and alert them to the issue, copying the node contact for information

Author: jlegind@gbif.org
Comment: Publisher contacted.
Created: 2013-03-19 14:27:13.258
Updated: 2013-03-19 14:27:13.258


Author: jlegind@gbif.org
Comment: The dataset (dwc-archive) has been indexed successfully, which means it has been cleaned.
Created: 2014-09-19 16:41:11.838
Updated: 2014-09-19 16:41:11.838