Issue 12403

registry-metadata-sync: BioCASE problem duplicating datasets

Reporter: kbraak
Assignee: fmendez
Type: Bug
Summary: registry-metadata-sync: BioCASE problem duplicating datasets
Priority: Blocker
Resolution: Fixed
Status: Closed
Created: 2012-11-23 17:09:53.79
Updated: 2013-12-16 17:50:56.345
Resolved: 2013-03-06 15:43:32.36
        
Description: Fieldjournal is a real dataset that should exist in the Registry and the Index under the same UUID. Instead, it was re-created several times for no apparent reason, so the Registry and the Index now hold different dataset UUIDs for it.

I believe the root problem has something to do with how the BioCASE synchronization uniquely recognizes datasets, and with the fact that it is writing services with a NULL remote_id_at_url.

In the attached screenshot, you will see 3 selects taken from the Registry db, each one taken after a metadata synchronization on Technical Installation (UUID = 603ee53e-f762-11e1-a439-00145eb45e9a) triggered by the Super HIT.

After the 1st update, instead of agent 14451 being updated,

-agent 14451 was deleted,
-a new agent 14452 was created,
-service 15452 (with NOT-NULL remote_id_at_url) was not updated,
-a new service 15453 (with NULL remote_id_at_url) was created.

After the 2nd update, instead of agent 14452 being updated,

-agent 14452 was deleted,
-a new agent 14453 was created,
-service 15453 (with NULL remote_id_at_url) was not updated,
-a new service 15454 (with NOT-NULL remote_id_at_url) was created.

On subsequent metadata synchronizations for the same Technical Installation (more than 5 now), I could not reproduce this problem. Each update modified agent 14453 in place, as you would expect.
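
To make the suspected mechanism concrete, here is a minimal sketch, not the actual registry-metadata-sync code. It assumes the lookup for an existing service is keyed only on remote_id_at_url and follows SQL semantics, where NULL never equals anything. The `Service` class, the `access_url` field, the `find_by_remote_id` and `sync` functions, and the example.org URL are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Service:
    id: int
    access_url: str                  # hypothetical field, for illustration only
    remote_id_at_url: Optional[str]  # may be NULL, as seen in the Registry db


def find_by_remote_id(services: List[Service], remote_id: Optional[str]) -> Optional[Service]:
    # Mimics SQL comparison semantics: a NULL key matches nothing, not even NULL.
    if remote_id is None:
        return None
    for service in services:
        if service.remote_id_at_url == remote_id:
            return service
    return None


def sync(services: List[Service], access_url: str,
         remote_id: Optional[str], next_id: int) -> Service:
    """Return the matching service, or create a new one when nothing matches."""
    match = find_by_remote_id(services, remote_id)
    if match is None:
        match = Service(next_id, access_url, remote_id)
        services.append(match)
    return match


services = [Service(15452, "http://example.org/biocase", "dataset-a")]

# If a sync cannot resolve the remote id, it passes None: nothing matches,
# so a second row is created instead of 15452 being updated.
created = sync(services, "http://example.org/biocase", None, 15453)

# A later sync with None cannot find 15453 either, so a third row appears.
again = sync(services, "http://example.org/biocase", None, 15454)
```

Under these assumptions, every synchronization that arrives without a remote id adds a row. This only illustrates why a nullable match key is fragile; it does not explain why the problem stopped after the 2nd update.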

To further help you debug this problem:

Before today, there were 3 different UUIDs for this dataset called "Fieldjournal.org observation database":

Dataset UUIDs, in order earliest to latest:

1. 827b3d6e-f762-11e1-a439-00145eb45e9a
2. 8d7ee6c4-fe4d-11e1-9738-00145eb45e9a
3. d7554d22-0633-11e2-8b66-00145eb45e9a
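
The ordering above can be cross-checked from the UUIDs themselves, since all three appear to be version 1 (time-based) UUIDs. The sketch below uses only the Python standard library; the helper name `uuid1_created` is hypothetical, and the decoded timestamps reflect the clock of the machine that generated each UUID, which is assumed to be roughly correct.

```python
import uuid
from datetime import datetime, timedelta

# Version 1 UUID timestamps count 100-nanosecond intervals since this date.
GREGORIAN_EPOCH = datetime(1582, 10, 15)


def uuid1_created(value: str) -> datetime:
    """Decode the creation time embedded in a version 1 UUID."""
    parsed = uuid.UUID(value)
    if parsed.version != 1:
        raise ValueError(f"{value} is not a version 1 UUID")
    return GREGORIAN_EPOCH + timedelta(microseconds=parsed.time // 10)


dataset_uuids = [
    "827b3d6e-f762-11e1-a439-00145eb45e9a",
    "8d7ee6c4-fe4d-11e1-9738-00145eb45e9a",
    "d7554d22-0633-11e2-8b66-00145eb45e9a",
]

created_times = [uuid1_created(value) for value in dataset_uuids]

# True when the list is already in earliest-to-latest order.
is_chronological = created_times == sorted(created_times)

for value, created_at in zip(dataset_uuids, created_times):
    print(value, created_at.isoformat())
```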

If you look at the Super HIT log table for this Technical installation (http://hit.gbif.org/console/list.html?datasourceId=861), the dataset was...

1 recreated:

2012-09-14 11:21:23.0	Dataset deleted 827b3d6e-f762-11e1-a439-00145eb45e9a
2012-09-14 11:21:23.0	Error parsing dataset Fieldjournal.org observation database
2012-09-14 11:17:22.0	Parsing dataset: Fieldjournal.org observation database
2012-09-14 11:17:21.0	Synchronizing service d62a1c05-3dd0-4c5b-8c66-08928d9aed7d, url: http://pontikka.fmnh.helsinki.fi/biocase/pywrapper.cgi?dsa=mustikka2
2012-09-14 11:17:21.0	Error occurred obtaining capabilities response: Internal Server Error

2 recreated again:

2012-09-24 12:37:29.0	Dataset deleted 8d7ee6c4-fe4d-11e1-9738-00145eb45e9a
2012-09-24 12:35:47.0	Parsing dataset: Fieldjournal.org observation database
2012-09-24 12:35:45.0	Synchronizing service d62a1c05-3dd0-4c5b-8c66-08928d9aed7d, url: http://pontikka.fmnh.helsinki.fi/biocase/pywrapper.cgi?dsa=mustikka2
2012-09-24 12:35:45.0	Error occurred obtaining capabilities response: Internal Server Error

3 updated:

2012-09-24 12:40:15.0	Dataset updated/synchronized uuid:d7554d22-0633-11e2-8b66-00145eb45e9a, name:Fieldjournal.org observation database
2012-09-24 12:39:24.0	Parsing dataset: Fieldjournal.org observation database
2012-09-24 12:39:22.0	Synchronizing service d62a1c05-3dd0-4c5b-8c66-08928d9aed7d, url: http://pontikka.fmnh.helsinki.fi/biocase/pywrapper.cgi?dsa=mustikka2
2012-09-24 12:39:22.0	Error occurred obtaining capabilities response: Internal Server Error

4 recreated again:

2012-11-23 10:14:32.0	Dataset deleted d7554d22-0633-11e2-8b66-00145eb45e9a
2012-11-23 10:14:31.0	Error parsing dataset Fieldjournal.org observation database
2012-11-23 10:10:31.0	Parsing dataset: Fieldjournal.org observation database
2012-11-23 10:10:29.0	Synchronizing service d62a1c05-3dd0-4c5b-8c66-08928d9aed7d, url: http://pontikka.fmnh.helsinki.fi/biocase/pywrapper.cgi?dsa=mustikka2
2012-11-23 10:10:29.0	Error occurred obtaining capabilities response: Internal Server Error

(The complete logs, copied and pasted from the HIT, are attached.)

Can you please find out the true root of the problem in the code, Fede? Thanks.

PS: As we know, the fact that agents are being logically deleted is a separate problem.
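
For anyone going through the attached logs, the delete and update events can be pulled out with a small script. This is a rough sketch that assumes each log line has the shape shown in the excerpts above (timestamp, whitespace, message); the function name `lifecycle_events` and the regular expression are illustrative and cover only the two message types quoted in this issue.

```python
import re
from datetime import datetime

UUID_PATTERN = r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"

# Matches "Dataset deleted <uuid>" and "Dataset updated/synchronized uuid:<uuid>".
EVENT_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\.\d+\s+"
    r"Dataset (?P<action>deleted|updated/synchronized)\s+(?:uuid:)?"
    r"(?P<uuid>" + UUID_PATTERN + r")"
)


def lifecycle_events(lines):
    """Return (timestamp, action, uuid) tuples, oldest first."""
    events = []
    for line in lines:
        match = EVENT_RE.match(line.strip())
        if match:
            timestamp = datetime.strptime(match["ts"], "%Y-%m-%d %H:%M:%S")
            events.append((timestamp, match["action"], match["uuid"]))
    return sorted(events)


# Three lines copied from the excerpts in this issue.
sample = [
    "2012-09-24 12:40:15.0\tDataset updated/synchronized uuid:d7554d22-0633-11e2-8b66-00145eb45e9a, name:Fieldjournal.org observation database",
    "2012-09-14 11:21:23.0\tDataset deleted 827b3d6e-f762-11e1-a439-00145eb45e9a",
    "2012-09-14 11:17:22.0\tParsing dataset: Fieldjournal.org observation database",
]

events = lifecycle_events(sample)
for timestamp, action, dataset_uuid in events:
    print(timestamp, action, dataset_uuid)
```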
    
Attachment fieldjournal_logs.txt
Attachment fieldjournal.txt


Author: kbraak@gbif.org
Comment: This problem may relate to similar old issues, so I have linked them for reference.
Created: 2012-11-23 18:20:39.55
Updated: 2012-11-23 18:20:39.55


Author: kbraak@gbif.org
Comment: Unless we can fix this problem, this case is definitely +1 in favor of resurrecting datasets. 
Created: 2012-11-26 10:34:14.477
Updated: 2012-11-26 10:34:14.477


Author: fmendez@gbif.org
Comment: This issue was fixed some time ago; it is no longer valid.
Created: 2013-03-06 15:43:32.383
Updated: 2013-03-06 15:43:32.383