Issue 11046

registry-sync: DiGIR metadata update duplicates then deletes resources

Reporter: kbraak
Assignee: fmendez
Type: Bug
Summary: registry-sync: DiGIR metadata update duplicates then deletes resources
Priority: Major
Resolution: Fixed
Status: Closed
Created: 2012-04-30 16:06:24.175
Updated: 2013-12-16 17:50:17.527
Resolved: 2012-05-08 13:34:41.434
TimeOriginalEstimate: 0
TimeEstimate: 0
TimeSpent: 14400
        
Description: A metadata sync was performed on the Field Museum's (uuid = 7b8aff00-a9f8-11d8-944b-b8a03c50a862) technical installation (installation uuid = 56e438d2-894d-11e1-a453-a4093f22ed41).

The result was a duplication of all its resources, followed by a deletion of the old agent. I suppose the sync was able to recognize an existing resource during the metadata update, but then deleted it instead of the new one. Please see the attached screenshot showing the outcome on mogo.registry_staging.

Does this have something to do with the sync key?
    

Attachment Screen Shot 2012-04-30 at 4.04.06 PM.png


Attachment Screen Shot 2012-05-11 at 11.52.11 AM.png

Attachment metadata_response.000


Author: fmendez@gbif.org
Comment: Bug in the persistence layer: services were not being loaded eagerly, and that was causing the deletion of resources.
Created: 2012-05-08 13:34:41.466
Updated: 2012-05-08 13:34:41.466


Author: kbraak@gbif.org
Comment: The bug is still happening. Take a look at what's happening with the service URLs in the attached screenshot.
Created: 2012-05-11 11:55:31.645
Updated: 2012-05-11 11:55:31.645


Author: kbraak@gbif.org
Created: 2012-05-14 16:24:17.278
Updated: 2012-05-14 16:24:17.278
        
When a DiGIR resource exists in the registry, but not in the metadata response, do we delete the resources?

I have a problem with Pangaea:

The current metadata response has 6843 resources (see attached)

The old HIT doesn't do any resource cleanup; it has a total of 7470 resources.

The dataset-aware registry has a total of 7705 resources. That's 235 more than the old HIT knows about.

At the very least, we need to store the agent's created and modified dates to keep track of which resources get updated and which ones don't. I see that the created and modified columns on the agent_relation and service tables are set, but they are certainly not being updated.
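
To make the question above concrete, here is a minimal sketch of how a sync could compare the registry against a metadata response and keep created/modified dates current. It is not the registry-sync implementation: the `Resource` record, its field names, the `reconcile` function, and the use of the sync key as the matching key are all assumptions made for illustration. Resources missing from the response are only reported, not deleted, since that policy is the open question here.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Resource:
    # Hypothetical record; field names are illustrative, not the registry schema.
    sync_key: str
    name: str
    created: datetime
    modified: datetime


def reconcile(registry, response, now):
    """Compare registry resources with a metadata response, keyed by sync key.

    registry: dict mapping sync key -> Resource (updated in place)
    response: dict mapping sync key -> resource name from the metadata response
    Returns (to_create, to_update, missing) as sorted lists of sync keys.
    """
    registry_keys = set(registry)
    response_keys = set(response)

    to_create = sorted(response_keys - registry_keys)
    # In the registry but absent from the response: reported only, never deleted here.
    missing = sorted(registry_keys - response_keys)
    to_update = sorted(
        key for key in registry_keys & response_keys
        if registry[key].name != response[key]
    )

    for key in to_create:
        registry[key] = Resource(key, response[key], created=now, modified=now)
    for key in to_update:
        registry[key].name = response[key]
        registry[key].modified = now  # created is left untouched

    return to_create, to_update, missing


if __name__ == "__main__":
    t0 = datetime(2012, 5, 1, tzinfo=timezone.utc)
    t1 = datetime(2012, 5, 14, tzinfo=timezone.utc)
    registry = {
        "a": Resource("a", "Birds", t0, t0),
        "b": Resource("b", "Fishes", t0, t0),
        "c": Resource("c", "Plants", t0, t0),
    }
    response = {"a": "Birds", "b": "Fish", "d": "Insects"}
    print(reconcile(registry, response, t1))
```

With this shape, the `missing` list is what a cleanup policy would act on, and the `modified` date shows which resources a given sync actually touched.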

    


Author: kbraak@gbif.org
Comment: Pangaea metadata response attached.
Created: 2012-05-14 16:24:43.313
Updated: 2012-05-14 16:24:51.857


Author: kbraak@gbif.org
Comment: I do notice that the updated and created timestamps for agent get set, at least for TAPIR resources.
Created: 2012-05-15 13:29:33.832
Updated: 2012-05-15 13:29:33.832