Issue 13170

Overlap/duplication between datasets

13170
Reporter: jlegind
Assignee: jlegind
Type: Task
Summary: Overlap/duplication between datasets
Priority: Major
Resolution: CantReproduce
Status: Closed
Created: 2013-05-07 10:05:57.819
Updated: 2014-09-19 16:47:12.663
Resolved: 2014-09-19 16:47:12.631
        
Description: Samy has produced a page that displays datasets suspected of overlap or duplication: http://samy.gbif.org/
The analysis flags scientific names for which the same georeference and date appear in large numbers across two or more datasets.

"...also - I think we need to do more than just ask the publishers.
It should not be difficult to find, say, 10 examples and then inspect them. Samy is checking common species, location and day.
We should look at the rest of the fields and see if really 2 people have observed the same thing and published in 2 systems, or if 1 person has published through 2 systems." Tim R.
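
The check described above (same species, location and day appearing in more than one dataset) could be sketched roughly as follows. This is only an illustration, not Samy's actual implementation: the function name `suspected_overlaps`, the record field names and the `min_shared` threshold are all assumptions made for the example, and coordinates are compared by exact equality.

```python
from collections import defaultdict
from itertools import combinations


def suspected_overlaps(records, min_shared=2):
    """Count shared (name, location, day) keys for each pair of datasets.

    `records` is an iterable of dicts with the hypothetical fields
    dataset, scientific_name, latitude, longitude and event_date.
    Returns {(dataset_a, dataset_b): shared_key_count} for pairs that
    share at least `min_shared` distinct keys.
    """
    # Map each (name, lat, lon, date) key to the datasets it occurs in.
    datasets_by_key = defaultdict(set)
    for rec in records:
        key = (
            rec["scientific_name"],
            rec["latitude"],
            rec["longitude"],
            rec["event_date"],
        )
        datasets_by_key[key].add(rec["dataset"])

    # Every key seen in two or more datasets counts once per dataset pair.
    shared = defaultdict(int)
    for datasets in datasets_by_key.values():
        for pair in combinations(sorted(datasets), 2):
            shared[pair] += 1

    return {pair: n for pair, n in shared.items() if n >= min_shared}
```

Pairs returned by such a check are only candidates. As noted in the quote, the remaining fields of the matching records would still need to be inspected to tell two independent observations apart from one record published through two systems.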
    


Author: jlegind@gbif.org
Created: 2014-09-19 16:47:12.661
Updated: 2014-09-19 16:47:12.661
        
The worst offenders on the list were cleaned.
Not all of the datasets identified had bad data.

The webpage is no longer available.