Issue 15634

Duplicates within GBIF index of dataset (duplicates not in dataset)

Reporter: rdmpage
Type: Bug
Summary: Duplicates within GBIF index of dataset (duplicates not in dataset)
Priority: Major
Status: Open
Created: 2014-05-20 16:48:41.314
Updated: 2015-03-02 15:44:51.382
        
Description: The dataset MVZ Herp Collection (Arctos) http://www.gbif.org/dataset/09c4287e-e6d5-4552-a07f-bff8a00833d8 has duplicates, e.g. http://www.gbif.org/occurrence/780508428 and http://www.gbif.org/occurrence/896632930 are the same specimen. They have the same catalogue number, as well as the same individualID.

I grabbed the Darwin Core archive and it has a single copy of this record. It also has 264,671 records, whereas the GBIF page says 529,314 (almost exactly twice). It looks like the GBIF index has two copies of this dataset (despite the fact the provider has unique ids for each occurrence).
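
For anyone wanting to repeat this check, here is a minimal sketch of how the record count and id uniqueness of an archive's occurrence table could be verified. It assumes a tab-delimited table with a header row and an `occurrenceID` column; the actual file name and column layout depend on the archive's meta.xml. The function name `count_duplicates` and the sample rows are made up for illustration. The last lines compare the two counts quoted above.

```python
import csv
import io
from collections import Counter


def count_duplicates(lines, id_column="occurrenceID", delimiter="\t"):
    """Return (total_rows, unique_ids, duplicated_ids) for a delimited table.

    `lines` is any iterable of text lines with a header row, e.g. an open
    file handle. `id_column` is an assumption about the archive's layout.
    """
    reader = csv.DictReader(lines, delimiter=delimiter, quoting=csv.QUOTE_NONE)
    counts = Counter(row[id_column] for row in reader)
    total = sum(counts.values())
    duplicated = {key: n for key, n in counts.items() if n > 1}
    return total, len(counts), duplicated


# Made-up sample rows (not taken from the MVZ archive).
sample = io.StringIO(
    "occurrenceID\tcatalogNumber\n"
    "id-1\t100\n"
    "id-2\t101\n"
    "id-1\t100\n"
)
total, unique, duplicated = count_duplicates(sample)
print(total, unique, duplicated)

# Counts quoted in this report: records in the archive vs. the GBIF page.
archive_count = 264_671
index_count = 529_314
print(index_count / archive_count)        # ratio of index to archive
print(2 * archive_count - index_count)    # shortfall from exactly double
```

On a real archive, the same function could be called with an open file handle for the occurrence table instead of the in-memory sample. If the unique count equals the row count, the duplicates are not coming from the archive itself.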

I think this is also a bug in the MVZ (Birds) dataset.