Issue 10134

Investigate pros/cons of migrating to a maven repo

10134
Reporter: omeyn
Type: Technical
Summary: Investigate pros/cons of migrating to a maven repo
Priority: Major
Resolution: Fixed
Status: Closed
Created: 2011-11-07 15:36:17.523
Updated: 2013-12-06 13:16:02.898
Resolved: 2011-11-07 15:37:34.623


Author: omeyn@gbif.org
Created: 2011-11-07 15:36:34.962
Updated: 2011-11-07 15:36:34.962
        
from kyle:

A mavenized version of our vocabularies could be done by:
- Extracting the vocabularies from rs.gbif.org (project) into a new project under Subversion
- Describing the new project with a pom file
There is the assumption that all vocabularies will change, some more often than others.
The advantages of having a separate mavenized project are:
- Projects relying on it would not have to rely on rs.gbif.org always being available. They would need to connect to the maven repository to build, but not to run.
- The mavenized project would have a version number, so projects relying on it as a dependency would just have to update to a newer version number to incorporate updates.
The disadvantages of having a separate mavenized project are:
- Every time a vocabulary changed, the project would need to be assigned a new version number. If vocabularies changed often, our projects would need to update to the new version number just as often in order to incorporate the updated vocabularies. This could get annoying.
- Extremely large vocabularies could make the project quite large and inconvenient to include as a dependency in our projects.
Whether the vocabularies exist in rs.gbif.org (project) or in this new proposed project, an update script will still have to synchronize them with http://rs.gbif.org. A project like the IPT, which relies on the vocabularies' public URIs, depends on http://rs.gbif.org, so that site must be maintained.
    


Author: omeyn@gbif.org
Created: 2011-11-07 15:36:48.546
Updated: 2011-11-07 15:36:48.546
        
from markus:

The vocabs, like all other definitions in rs.gbif.org, are already in SVN here:
http://code.google.com/p/gbif-registry/source/browse/trunk/rs.gbif.org/
I would argue they should stay together, but we might want to move them to a project other than the registry one.
Another serious disadvantage of bundling static vocabularies with jars is that projects need to update their dependency to get the latest vocabulary. That defeats the whole purpose of these live vocabs that continuously evolve, in particular with translations, descriptions and examples.
I could imagine we release static snapshots for testing etc., but nevertheless those classes should expose a way to update to the latest online version if they are used in real applications.
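
A minimal sketch of what "static snapshot plus a way to update" could look like. Everything here is an assumption for illustration: the class name RefreshableVocabulary, the refresh method, and the idea of passing the online source in as a Callable (in a real application it would fetch and parse the file from rs.gbif.org). Terms are plain strings to keep it short.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.concurrent.Callable;

/** Hypothetical holder: starts from a bundled snapshot and can be refreshed at runtime. */
class RefreshableVocabulary {
    // Replaced as a whole on refresh, so readers always see a complete set of terms.
    private volatile Set<String> terms;

    RefreshableVocabulary(Set<String> bundledSnapshot) {
        this.terms = Collections.unmodifiableSet(new LinkedHashSet<>(bundledSnapshot));
    }

    Set<String> terms() {
        return terms;
    }

    /**
     * Tries to replace the current terms with whatever the online source returns.
     * On any failure (or an empty result) the current terms are kept.
     * Returns true only if the terms were replaced.
     */
    boolean refresh(Callable<Set<String>> onlineSource) {
        try {
            Set<String> latest = onlineSource.call();
            if (latest == null || latest.isEmpty()) {
                return false;
            }
            terms = Collections.unmodifiableSet(new LinkedHashSet<>(latest));
            return true;
        } catch (Exception e) {
            // Source unreachable or unparsable: keep working with what we have.
            return false;
        }
    }
}
```

With something like this, tests can run against the bundled snapshot alone, while a deployed application can call refresh on startup or on a schedule and still works when the online source is unavailable.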
    


Author: omeyn@gbif.org
Created: 2011-11-07 15:36:58.784
Updated: 2011-11-07 15:36:58.784
        
from lars:

What you call a disadvantage I call an advantage: You need to make a conscious choice to update to a new set of (or updated) vocabularies because that might have ramifications for your existing data. So in my opinion: If we use an XML thing as a source, the files need to be versioned somehow, e.g. http://rs.gbif.org/vocabulary/iso/3166-1_alpha2v3.xml or whatever. This is also necessary for reproducible builds and for reproducible results of processing runs.
It seems to me like we have two different needs for these things, so we might want to make a fundamental decision first: Do we want to keep everything in one place or not? I'm leaning towards the "or not" part and just having a simple project that has the Java enums we need and that can be versioned and evolve as needed.
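
To make the "simple project with Java enums" idea concrete, a rough sketch. The enum name Country, the method names, and the choice of three codes are assumptions for illustration only; a real enum would list every code, and its version would simply be the version of the Maven artifact that ships it.

```java
import java.util.Optional;

/** Hypothetical enum covering only three ISO 3166-1 alpha-2 codes for illustration. */
enum Country {
    DENMARK("DK"),
    GERMANY("DE"),
    SPAIN("ES");

    private final String iso2;

    Country(String iso2) {
        this.iso2 = iso2;
    }

    String iso2() {
        return iso2;
    }

    /** Case-insensitive lookup; empty when the code is not part of this release. */
    static Optional<Country> fromIso2(String code) {
        if (code == null) {
            return Optional.empty();
        }
        for (Country c : values()) {
            if (c.iso2.equalsIgnoreCase(code.trim())) {
                return Optional.of(c);
            }
        }
        return Optional.empty();
    }
}
```

Because the set of values is fixed at compile time, picking up a new or changed code means bumping the dependency version, which is exactly the conscious choice argued for above.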
    


Author: omeyn@gbif.org
Created: 2011-11-07 15:37:10.461
Updated: 2011-11-07 15:37:10.461
        
from lars:

And thanks Kyle for figuring this out and writing it down this way.
This writeup and the discussions we had yesterday made me realize that we might need to take a step back first.
    


Author: omeyn@gbif.org
Created: 2011-11-07 15:37:20.158
Updated: 2011-11-07 15:37:20.158
        
from markus:

Indeed. To do it properly we should probably see how and where we use those vocabs. Plus there will be a new community authoring tool for these that Dag is working on. This is likely to publish versions into our rs.gbif.org.
For the IPT, for example, it's vital that an existing installation can update its vocabularies without the need to reinstall the software. The same goes for the dwca validator. Clb indexing also updates its internal vocabs from time to time using rs.gbif.org.
I don't think I would trade these benefits for the sake of better or simpler "testability".