Issue 11223

Dataset API object and DatasetFactory might not be compatible

Reporter: jcuadra
Type: Bug
Summary: Dataset API object and DatasetFactory might not be compatible
Priority: Blocker
Resolution: Fixed
Status: Closed
Created: 2012-05-22 16:28:14.678
Updated: 2013-12-16 17:51:01.981
Resolved: 2012-05-23 11:28:02.404
        
Description: The problem is as follows; I will explain it step by step.

In DatasetWSClientITests there is a test that checks that the "hosting" relation is persisted and returned by a subsequent GET request.
==========================

1)    writableDataset.setHostingOrganizationKey(hostingOrganization);
2)    writableDataset.setOwningOrganizationKey(owningOrganization);
3)    UUID key = getClient().create(writableDataset);

4)    Dataset obj = getClient().get(key);
7)    Assert.assertNotNull(obj.getHostingOrganizationKey());
8)    Assert.assertNotNull(obj.getOwningOrganizationKey());

=======
The tests were failing at 7), and it took me a while to discover the cause. This is the inconsistency I am seeing:

=======
DatasetFactory does NOT set the hosting organization anywhere. All that is there is a leftover comment:

    // can't set the hosting organization, we need to know the technical installation agent

which is not followed by any code
(http://code.google.com/p/gbif-registry/source/browse/trunk/registry-service/src/main/java/org/gbif/registry/service/factory/DatasetFactory.java#119)

I am not sure why this was completely overlooked, as it is of vital importance in our factory conversion.


=========
A separate issue: in the same DatasetFactory we have the build(Agent) method, which calls getHostingOrganization(). This is particularly confusing to me, because getHostingOrganization() does the following:

1. It first checks whether there is an entity that SERVES the dataset (presumably a TECH_Install SERVES a DS, I think).

2. It then gets the Organization that hosts that TECH_Install and returns the UUID of this organization.
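
To make the two-step lookup above concrete, here is a minimal in-memory sketch. It is not the DatasetFactory code: the class HostingLookupSketch, the RelationType enum, and the relate/hostingOrganization methods are hypothetical names, and relations are assumed to be stored as plain (from, type, to) triples. It only illustrates that, under this logic, the organization is reachable solely through an installation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.UUID;

// Hypothetical in-memory model of agent relations; NOT the registry's real classes.
final class HostingLookupSketch {

    enum RelationType { SERVES, HAS_INSTALLATION }

    // A directed relation: from --type--> to.
    private static final class Relation {
        final UUID from;
        final RelationType type;
        final UUID to;

        Relation(UUID from, RelationType type, UUID to) {
            this.from = from;
            this.type = type;
            this.to = to;
        }
    }

    private final List<Relation> relations = new ArrayList<>();

    void relate(UUID from, RelationType type, UUID to) {
        relations.add(new Relation(from, type, to));
    }

    // Finds the first agent that has a relation of the given type pointing at target.
    private Optional<UUID> sourceOf(RelationType type, UUID target) {
        return relations.stream()
                .filter(r -> r.type == type && r.to.equals(target))
                .map(r -> r.from)
                .findFirst();
    }

    // Step 1: find the agent that SERVES the dataset (assumed to be an installation).
    // Step 2: find the organization that HAS_INSTALLATION that installation.
    Optional<UUID> hostingOrganization(UUID datasetKey) {
        return sourceOf(RelationType.SERVES, datasetKey)
                .flatMap(installation -> sourceOf(RelationType.HAS_INSTALLATION, installation));
    }
}
```

In this sketch a dataset with no SERVES relation resolves to an empty result, and so does a dataset that is served directly by an organization, because the second hop finds no HAS_INSTALLATION relation pointing at that organization.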


This seems like good logic, but it forces us to always have a triangle, as in:

TechInstallXYZ SERVES datasetKLM
OrganizationABC HAS_INSTALLATION TechInstallXYZ

I am not sure this is the way to go. Shouldn't it be possible to link a Dataset directly to its "hostingOrganization"? We have documented this possibility at http://dev.gbif.org/wiki/display/POR/Agent+types+and+relations, where we could just have

OrganizationABC SERVES datasetKLM


    


Author: trobertson@gbif.org
Created: 2012-05-22 16:34:20.878
Updated: 2012-05-22 16:34:20.878
        
One thing to consider.  Can a dataset exist but not be tied to a technical installation at all?
E.g. an undigitized collection
    


Author: jcuadra@gbif.org
Created: 2012-05-22 16:51:47.22
Updated: 2012-05-22 16:51:47.22
        
I don't think I offered any solutions, but I would support

A) the idea of being able to link a hostingOrganization directly to the Dataset. This would not require us to change the API at all.

Otherwise, I think we would need to include a hostingTechnicalInstallation attribute on the WritableDataset API object as well, as this would be the only way to reach its hostingOrganization (under the current factory implementation). **BUT** we have Datasets that were registered by the likes of Ireland and the UK using their own scripts (http://gbrds.gbif.org/browse/agent?uuid=51E25910-C7B5-11DE-B279-E8B0507C4765), and they are not linked to any technical installation (Tim: so it's a yes to your question).

I made some modifications in a revision that worked as explained in (A). I reverted these changes afterwards, as I don't want to mess up code I didn't write, but they can be found here:
http://code.google.com/p/gbif-registry/source/diff?spec=svn2850&r=2850&format=side&path=/trunk/registry-service/src/main/java/org/gbif/registry/service/factory/DatasetFactory.java


    


Author: trobertson@gbif.org
Created: 2012-05-22 16:55:24.886
Updated: 2012-05-22 16:55:24.886
        
Perhaps you need to add the TechnicalInstallationKey to the WritableDataset, and let the caller provide all three keys.
Should the host and the "host of the technical installation" conflict, then that should result in a bad request

?
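
A minimal sketch of that conflict check, under stated assumptions: the class HostingKeyValidationSketch, the resolveHost method, and the hostOfInstallation lookup function are hypothetical names, not part of the registry API, and IllegalArgumentException merely stands in for whatever the web service would map to an HTTP 400 Bad Request.

```java
import java.util.Optional;
import java.util.UUID;
import java.util.function.Function;

// Hypothetical validation helper; NOT the registry's real code.
final class HostingKeyValidationSketch {

    // Returns the effective hosting organization, or empty when it cannot be determined.
    // Either key may be null. Throws IllegalArgumentException (a stand-in for a
    // 400 Bad Request) when both are given and the installation's host differs.
    static Optional<UUID> resolveHost(UUID hostingOrganizationKey,
                                      UUID technicalInstallationKey,
                                      Function<UUID, Optional<UUID>> hostOfInstallation) {
        Optional<UUID> declared = Optional.ofNullable(hostingOrganizationKey);
        Optional<UUID> derived = Optional.ofNullable(technicalInstallationKey)
                .flatMap(hostOfInstallation);

        if (declared.isPresent() && derived.isPresent() && !declared.equals(derived)) {
            throw new IllegalArgumentException(
                    "hostingOrganizationKey conflicts with the host of the technical installation");
        }
        // Prefer the explicitly declared host; otherwise fall back to the derived one.
        return declared.isPresent() ? declared : derived;
    }
}
```

A caller that supplies only one of the two keys passes the check trivially; only the case where both are present and disagree is rejected.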
    


Author: jcuadra@gbif.org
Created: 2012-05-22 17:16:54.623
Updated: 2012-05-22 17:17:24.191
        
This seems good to me. A client could then decide whether to use a) the hostingInstallationKey, b) the hostingOrganizationKey, or c) both, in which case we would check for the bad request. Am I correct?

And on the way back (GET request), we populate any hosting*Key that is possible to populate, right?

Nevertheless, we need to modify the DatasetFactory build() methods in any case, but this is quick.
    


Author: mdoering@gbif.org
Created: 2012-05-22 21:32:19.022
Updated: 2012-05-22 21:32:19.022
        
I fear that if we introduce options, things can get more complicated in the long run than if we accept a potential redundancy but have a clear policy.
What about
a) requiring a technical installation for all datasets that are served, i.e. are hosted. For custom scripts it would be a dummy 1:1 TI with no real information, maybe of type CUSTOM_SCRIPT
b) forcing all datasets to have a hostedByOrg key, regardless of any linked TI. The hostedByKey could only be updated directly if no TI exists; otherwise, changing the host of the TI would trigger an update to all linked datasets

I quite like the first option with a required TI - anything that makes this option impossible?
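
For comparison, the update rule in option b) could look roughly like the sketch below. Everything here is an assumption for illustration: the class HostedByCascadeSketch, its three maps, and the method names are hypothetical, and persistence is replaced by in-memory maps.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical in-memory illustration of option b); NOT the registry's real code.
final class HostedByCascadeSketch {

    private final Map<UUID, UUID> installationHost = new HashMap<>();    // TI -> hosting org
    private final Map<UUID, UUID> datasetInstallation = new HashMap<>(); // dataset -> TI
    private final Map<UUID, UUID> datasetHost = new HashMap<>();         // dataset -> hostedBy org

    // Sets or changes the host of a TI and cascades the change to all linked datasets.
    void setInstallationHost(UUID installation, UUID host) {
        installationHost.put(installation, host);
        for (Map.Entry<UUID, UUID> link : datasetInstallation.entrySet()) {
            if (link.getValue().equals(installation)) {
                datasetHost.put(link.getKey(), host);
            }
        }
    }

    // Links a dataset to a known TI; the dataset inherits the TI's host.
    void linkDataset(UUID dataset, UUID installation) {
        UUID host = installationHost.get(installation);
        if (host == null) {
            throw new IllegalArgumentException("unknown technical installation");
        }
        datasetInstallation.put(dataset, installation);
        datasetHost.put(dataset, host);
    }

    // Direct updates are only allowed while the dataset has no linked TI.
    void setDatasetHost(UUID dataset, UUID host) {
        if (datasetInstallation.containsKey(dataset)) {
            throw new IllegalStateException("host is derived from the linked technical installation");
        }
        datasetHost.put(dataset, host);
    }

    // Returns the hostedBy organization, or null if none has been recorded.
    UUID hostOf(UUID dataset) {
        return datasetHost.get(dataset);
    }
}
```

The redundancy is visible in datasetHost, which duplicates what installationHost already implies for linked datasets; the cascade in setInstallationHost is what keeps the two consistent.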
    


Author: jcuadra@gbif.org
Created: 2012-05-23 10:09:21.001
Updated: 2012-05-23 10:09:21.001
        
In a), will we be forcing all these external clients to search for a hostingTechnicallInstallationKey inside our registry before registering anything (besides the owningOrganizationKey)? I don't like adding more constraints to registrations.



    


Author: jcuadra@gbif.org
Comment: The Dataset API object will now include a technicallInstallationKey instead of the hostingOrganizationKey. 
Created: 2012-05-23 11:28:02.433
Updated: 2012-05-23 11:28:02.433