Issue 10932

Develop strategy to handle dataset copies served by different publishers

Reporter: ahahn
Type: Improvement
Summary: Develop strategy to handle dataset copies served by different publishers
Description: Occasionally, more or less identical copies of a dataset are served by different publishers. For indexing purposes, such duplicates should be flagged so that double indexing can be avoided. This might be handled through a new relationship type (non-preferred copy?).
Priority: Major
Status: Open
Created: 2012-03-15 16:37:53.897
Updated: 2016-02-15 13:45:37.622


Author: kbraak@gbif.org
Comment: For cases where a dataset is migrated on purpose, it has been proposed to let the publisher do this via their IPT. Please see: http://code.google.com/p/gbif-providertoolkit/issues/detail?id=843
Created: 2012-03-16 10:31:43.324
Updated: 2012-03-16 10:31:43.324


Author: ahahn@gbif.org
Comment: In the new registry model, such a duplicate will be marked by a relationship to the to-be-indexed one via dataset.duplicate_of_dataset_key. Check that this is handled correctly by the indexer, then close.
Created: 2013-10-01 10:00:09.341
Updated: 2013-10-01 10:00:09.341
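
The check described in the comment above could be sketched roughly as follows. This is only an illustration of the intended behaviour, not the actual indexer or crawler code: the `Dataset` class, the `datasets_to_index` function, and the sample records are hypothetical, and the only name taken from this issue is the `duplicate_of_dataset_key` field. The assumption is that a dataset with this field set points to the copy that should be indexed and is itself skipped.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Dataset:
    # Hypothetical in-memory stand-in for a registry dataset record.
    key: str
    title: str
    # Mirrors dataset.duplicate_of_dataset_key; None means "not a duplicate".
    duplicate_of_dataset_key: Optional[str] = None


def datasets_to_index(datasets: Iterable[Dataset]) -> List[Dataset]:
    """Return only the datasets that are not flagged as duplicates."""
    return [d for d in datasets if d.duplicate_of_dataset_key is None]


# Made-up sample records: "b2" is flagged as a duplicate of "a1".
registry = [
    Dataset(key="a1", title="Birds of Denmark"),
    Dataset(key="b2", title="Birds of Denmark (copy)", duplicate_of_dataset_key="a1"),
    Dataset(key="c3", title="Fungi of Norway"),
]

indexable_keys = [d.key for d in datasets_to_index(registry)]
```

With these sample records, only "a1" and "c3" would be passed on for indexing. Whether the real crawler applies an equivalent filter is exactly what this issue asks to verify.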


Author: kbraak@gbif.org
Created: 2013-12-10 16:36:07.963
Updated: 2013-12-10 16:36:07.963
        
[~omeyn@gbif.org] Can you please confirm that the crawler respects dataset.duplicate_of_dataset_key?

If so, I can close this issue.


Author: omeyn@gbif.org
Comment: This needs some digging; please leave the issue open.
Created: 2013-12-11 11:04:45.334
Updated: 2013-12-11 11:04:45.334