Issue 12027
registry-metadata-sync: duplicate datasets on BioCASE
Reporter: kbraak
Assignee: fmendez
Type: Bug
Summary: registry-metadata-sync: duplicate datasets on BioCASE
Priority: Blocker
Resolution: WontFix
Status: Closed
Created: 2012-10-15 15:30:17.362
Updated: 2013-12-16 17:50:20.188
Resolved: 2012-10-18 10:23:47.24
Description: A BioCASE technical installation ( http://gbrds.gbif.org/browse/agent?uuid=603b2dd6-f762-11e1-a439-00145eb45e9a ) has 4 registered endpoints:
BIOCASE - http://dsibib.mnhn.fr/biocase/pywrapper.cgi?dsa=arachne
BIOCASE - http://dsibib.mnhn.fr/biocase/pywrapper.cgi?dsa=coleoptere
BIOCASE - http://dsibib.mnhn.fr/biocase/pywrapper.cgi?dsa=mycobase
BIOCASE - http://dsibib.mnhn.fr/biocase/pywrapper.cgi?dsa=reptamph
As a result of metadata synchronization, each dataset is registered 4 times, once per endpoint, and datasets beyond the 4 named in the endpoint URLs appear as well. Here's the result:
4 x Arachnides datasets
4 x Coleopteres datasets
4 x Ensiferes datasets
4 x Marine invertebrates, mollusca and crustacea datasets
4 x MNHN Reptiles and Amphibians datasets
4 x Ressources fongiques datasets
Actually, the inventory (scan) from the endpoint http://dsibib.mnhn.fr/biocase/pywrapper.cgi?dsa=arachne returns:
Arachnides
Coleopteres
Ensiferes
MNHN Reptiles and Amphibians Collection Catalog
Marine invertebrates, mollusca and crustacea
Ressources fongiques
I suspect it is because all 4 endpoints expose all 6 datasets behind them, and each time one endpoint synchronizes, each dataset gets added again instead of being updated.
Surely they all use the same dataset name (code) and we can update each dataset instead of creating a new one for each endpoint?
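The update-instead-of-create idea could be sketched roughly as follows. This is only an illustration in Python, not the registry-metadata-sync code: the `Registry` class and its `upsert` method are hypothetical, and it assumes (as suspected above, not confirmed) that every endpoint returns the same 6 dataset names and that the name is a usable key within one technical installation.

```python
from dataclasses import dataclass, field


@dataclass
class Registry:
    """Hypothetical in-memory stand-in for the registry; not the real GBIF API."""
    datasets: dict = field(default_factory=dict)

    def upsert(self, installation_key, dataset_name, endpoint_url):
        # Key on (installation, dataset name) rather than on the endpoint,
        # so a second endpoint exposing the same name updates the record.
        key = (installation_key, dataset_name)
        record = self.datasets.setdefault(
            key, {"name": dataset_name, "endpoints": []}
        )
        if endpoint_url not in record["endpoints"]:
            record["endpoints"].append(endpoint_url)
        return record


INSTALLATION = "603b2dd6-f762-11e1-a439-00145eb45e9a"
BASE = "http://dsibib.mnhn.fr/biocase/pywrapper.cgi?dsa="
ENDPOINTS = [BASE + dsa for dsa in ("arachne", "coleoptere", "mycobase", "reptamph")]

# Assumption: each endpoint's inventory returns these same 6 names.
INVENTORY = [
    "Arachnides",
    "Coleopteres",
    "Ensiferes",
    "MNHN Reptiles and Amphibians Collection Catalog",
    "Marine invertebrates, mollusca and crustacea",
    "Ressources fongiques",
]

registry = Registry()
for endpoint in ENDPOINTS:
    for name in INVENTORY:
        registry.upsert(INSTALLATION, name, endpoint)

print(len(registry.datasets))
```

With this keying, synchronizing all 4 endpoints leaves one record per dataset name, each listing the endpoints that expose it. Whether the name alone is safe to use as an identifier is exactly the open question in this issue.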
I attach a copy of the logs from the synchronization on the technical installation for our reference.
Thanks
Attachment Screen Shot 2012-10-15 at 3.29.37 PM.png
Attachment agent-603b2dd6-f762-11e1-a439-00145eb45e9a.log
Author: fmendez@gbif.org
Comment: The metadata-sync processes each endpoint separately, which means each endpoint is treated as having its own set of datasets. I understand that in this case we have several endpoints serving the same six datasets? That case is not handled by the synchronizer at all, and I don't know if we should handle it; for the metadata-sync those are different datasets with the same name behind different endpoints.
Created: 2012-10-17 12:22:40.12
Updated: 2012-10-17 12:22:40.12
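The per-endpoint behaviour described in this comment can be illustrated with a small simulation. It is a sketch only: `sync_endpoint` is a hypothetical function, not the synchronizer's real code, and the inventory is assumed to be identical for all 4 endpoints.

```python
from collections import Counter


def sync_endpoint(registry, endpoint_url, inventory):
    # Each endpoint is treated as owning its datasets, so every name in its
    # inventory becomes a new record regardless of what already exists.
    for name in inventory:
        registry.append({"endpoint": endpoint_url, "name": name})


BASE = "http://dsibib.mnhn.fr/biocase/pywrapper.cgi?dsa="
ENDPOINTS = [BASE + dsa for dsa in ("arachne", "coleoptere", "mycobase", "reptamph")]

# Assumption: every endpoint returns the same 6 dataset names.
INVENTORY = [
    "Arachnides",
    "Coleopteres",
    "Ensiferes",
    "MNHN Reptiles and Amphibians Collection Catalog",
    "Marine invertebrates, mollusca and crustacea",
    "Ressources fongiques",
]

registry = []
for endpoint in ENDPOINTS:
    sync_endpoint(registry, endpoint, INVENTORY)

copies_per_name = Counter(record["name"] for record in registry)
print(len(registry), dict(copies_per_name))
```

Under these assumptions the registry ends up with 4 records per dataset name, which matches the duplicates listed in the description.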
Author: fmendez@gbif.org
Comment: The real issue here is that we are harvesting 3 endpoints that serve the same 6 datasets. The metadata synchronizer cannot know whether those datasets are the same or not; it assumes that each endpoint serves its own datasets.
Created: 2012-10-18 10:23:47.264
Updated: 2012-10-18 10:23:47.264