Issue 14188

Key reused for dataset and network

Reporter: kbraak
Assignee: jlegind
Type: Bug
Summary: Key reused for dataset and network 
Priority: Critical
Status: Open
Created: 2013-10-09 15:32:25.888
Updated: 2015-03-02 16:27:01.42
        
Description: The UUID key for the COL dataset is being reused for the network with the same name:

http://api.gbif.org/v0.9/network/7ddf754f-d193-4cc9-b351-99906754a03b
http://api.gbif.org/v0.9/dataset/7ddf754f-d193-4cc9-b351-99906754a03b
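
To check whether other keys are affected in the same way, something along these lines could be run over an export of the registry keys. This is only a sketch: the function name `find_shared_keys` and the shape of the input (entity type mapped to a list of keys) are assumptions for illustration, not part of the registry code. The two UUIDs are the ones quoted in this issue.

```python
from collections import defaultdict


def find_shared_keys(keys_by_entity):
    """Return keys that occur under more than one entity type.

    keys_by_entity maps an entity type name (e.g. "dataset") to an
    iterable of UUID strings. The result maps each offending key to the
    sorted list of entity types that use it.
    """
    seen = defaultdict(set)
    for entity, keys in keys_by_entity.items():
        for key in keys:
            # UUIDs are case-insensitive, so compare them in lower case.
            seen[key.lower()].add(entity)
    return {key: sorted(entities) for key, entities in seen.items() if len(entities) > 1}


# Hypothetical export; only the UUIDs themselves come from this issue.
export = {
    "dataset": ["7ddf754f-d193-4cc9-b351-99906754a03b"],
    "network": ["7ddf754f-d193-4cc9-b351-99906754a03b"],
    "organization": ["f4ce3c03-7b38-445e-86e6-5f6b04b649d4"],
}
shared = find_shared_keys(export)
print(shared)
```

With the sample input only the COL key is reported, listed under both dataset and network.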

[~ahahn] reckons COL shouldn't be a network.

[~jlegind] it was you who made this commit, which causes COL to be added as a new network:
https://code.google.com/p/gbif-labs/source/detail?r=170&path=/registry2/trunk/registry-migration/src/main/resources/migrate-networks.xml

Can you please explain? Thanks
    


Author: ahahn@gbif.org
Created: 2013-10-09 15:37:53.815
Updated: 2013-10-09 15:37:53.815
        
In the old registry, CoL is a dataset that has constituent datasets (http://gbrds.gbif.org/browse/agent?uuid=7ddf754f-d193-4cc9-b351-99906754a03b), not a network. The way to migrate this should be to add parent_dataset_keys to all constituent datasets, unless it has been discussed and decided to handle this otherwise, [~mdoering@gbif.org], [~trobertson@gbif.org]?

If it has been decided to change that and model it as a network instead, the newly generated network entry should certainly not have the same UUID as the dataset.
    


Author: jlegind@gbif.org
Created: 2013-10-09 15:38:34.96
Updated: 2013-10-09 15:38:34.96
        
My memory about this particular commit is a bit fuzzy, but I believe it was a case of this CoL having the status of both organization AND network at the same time.

Can you please run it without the "OR uuid=xxxxxxxx" part on a test machine to see if it is going to break anything during migration?
    


Author: kbraak@gbif.org
Comment: Before I do, I'll run this by [~mdoering@gbif.org] for comment. 
Created: 2013-10-09 15:41:51.126
Updated: 2013-10-09 15:41:51.126


Author: ahahn@gbif.org
Created: 2013-10-09 15:56:03.088
Updated: 2013-10-09 15:58:00.871
        
Just to summarize content of the old registry (pre-migration situation):
- organization: "The Catalogue of Life Partnership" (http://gbrds.gbif.org/browse/agent?uuid=f4ce3c03-7b38-445e-86e6-5f6b04b649d4)
- endorsed by: Species 2000 (http://gbrds.gbif.org/browse/agent?uuid=03e816b3-8f58-49ae-bc12-4e18b358d6d9)
- the organization owns a load of datasets, including both a lot of smaller original ones (constituents) and
- dataset Catalogue of Life (http://gbrds.gbif.org/browse/agent?uuid=7ddf754f-d193-4cc9-b351-99906754a03b).
- The smaller contributing datasets are connected to the Catalogue of Life dataset as "has constituent"
- there is no network "Catalogue of Life", and there has not been any in the past

The endorsement link has been changed relatively recently. All other relationships should have been in place at the last migration.
    


Author: mdoering@gbif.org
Created: 2013-10-11 10:39:18.416
Updated: 2013-10-11 10:39:37.266
        
The network is superfluous, I'd say. The important piece is that CoL is a dataset with some 130 constituent datasets, please make sure this remains!

In addition, it would be very nice to migrate the constituent tags that indicate which datasetID they are known under within the CoL. For example: http://gbrds.gbif.org/browse/agent?uuid=9a942908-2cb8-4bb0-bc3d-adf3b8158d54 has the tag dataset_id=67

This should become a machine tag on the constituent dataset in the new registry with properties:
 namespace=crawler.gbif.org
 name=dataset_id
 value=67
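
As a rough illustration of that mapping, here is a minimal sketch that turns an old-registry tag string into the three properties listed above. The `MachineTag` class and `migrate_dataset_id_tag` function are made-up names for this example and are not the registry's actual API; only the namespace, name and value come from this comment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MachineTag:
    """Hypothetical stand-in for a machine tag in the new registry."""
    namespace: str
    name: str
    value: str


def migrate_dataset_id_tag(old_tag, namespace="crawler.gbif.org"):
    """Convert an old 'name=value' tag such as 'dataset_id=67' into a MachineTag."""
    name, sep, value = old_tag.partition("=")
    name, value = name.strip(), value.strip()
    if not sep or not name or not value:
        raise ValueError(f"not a name=value tag: {old_tag!r}")
    return MachineTag(namespace=namespace, name=name, value=value)


tag = migrate_dataset_id_tag("dataset_id=67")
print(tag)
```

The value is kept as a string, on the assumption that machine tag values are stored as text rather than as numbers.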
    


Author: kbraak@gbif.org
Created: 2013-12-12 15:43:41.141
Updated: 2013-12-12 15:43:41.141
        
If the Network with duplicate UUID is superfluous, I suggest we just assign it a new UUID directly in the DB. Does this sound sensible [~mdoering@gbif.org]?

Markus, also please open a new issue for the creation of new machine tags for CoL constituents if necessary.


    


Author: mdoering@gbif.org
Comment: Do we need to constrain the uniqueness of the UUID keys across all entities? A single new table with just the key might be needed for this...
Created: 2015-03-02 15:02:43.617
Updated: 2015-03-02 15:02:43.617
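
One possible shape for such a key table, sketched with an in-memory SQLite database so it can be tried without touching the registry. All table and column names here (`entity_key`, `dataset`, `network`, `key`, `title`) are assumptions for illustration; the real registry schema and database will differ. The idea is that every entity row must first claim its key in the shared table, so a second entity cannot reuse it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE entity_key (key TEXT PRIMARY KEY);
CREATE TABLE dataset (key TEXT PRIMARY KEY REFERENCES entity_key(key), title TEXT);
CREATE TABLE network (key TEXT PRIMARY KEY REFERENCES entity_key(key), title TEXT);
""")


def insert_entity(conn, table, key, title):
    """Claim the key in entity_key, then insert the entity row, in one transaction."""
    if table not in ("dataset", "network"):
        raise ValueError(f"unknown entity table: {table}")
    with conn:  # commits on success, rolls back if either insert fails
        conn.execute("INSERT INTO entity_key (key) VALUES (?)", (key,))
        conn.execute(f"INSERT INTO {table} (key, title) VALUES (?, ?)", (key, title))


col_key = "7ddf754f-d193-4cc9-b351-99906754a03b"
insert_entity(conn, "dataset", col_key, "Catalogue of Life")

try:
    insert_entity(conn, "network", col_key, "Catalogue of Life")
    reused = True
except sqlite3.IntegrityError:
    reused = False  # the shared key table rejected the duplicate

print("key reused:", reused)
```

In this sketch the second insert fails on the primary key of `entity_key`, so the network row is never written.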