Merge our API projects into one single API project
11811
Reporter: lfrancke
Assignee: lfrancke
Type: Improvement
Summary: Merge our API projects into one single API project
Priority: Major
Resolution: Fixed
Status: Closed
Created: 2012-09-05 14:31:50.636
Updated: 2013-12-11 17:42:12.265
Resolved: 2012-09-11 12:39:08.59
Description: * registry API
* checklistbank API
* occurrence API
* metrics API
* occurrence download API
* gbif-common-api]]>
Author: lfrancke@gbif.org
Created: 2012-09-05 15:13:37.55
Updated: 2012-09-05 15:13:37.55
The structure seems relatively clear with a few exceptions
* org.gbif.registry.api.model.Tag depends on org.gbif.registry.api.util.TagSerDe and vice versa. So it's a circular dependency between those two packages. We should fix that while we're at it
* The packages are all called org.gbif.{registry,checklistbank,metrics,occurrencestore,api}.api. Do we want to flatten that to org.gbif.api.{registry,checklistbank,metrics,occurrence}?
** The common API things currently live in org.gbif.api. Do we want to keep it there or move to org.gbif.api.common?
** I suggest renaming org.gbif.api.occurrencestore to org.gbif.api.occurrence.
** What about the download API stuff?
** We could also rename to org.gbif.api.{model,service,vocabulary,search,paging,...}
** From looking at the package structure it seems like we're doing things slightly different everywhere but it should be relatively easy to come up with a common structure.
* checklistbank-api depends on commons-beanutils for the usage of just one method in NameUsageContainer. I'm in favor of cutting that.
* registry-api depends on bval-jsr303 and I can't find why. It's used in a test but the dependency plugin doesn't report it as unused. Can anyone tell me what this might be used for that I'm missing?
* registry-api depends on slf4j-api without using it?
* registry-api depends on gbif-metadata-profile for the sole use of DateUtils in the DateRange class. This should be cut. Either move that DateUtils class to gbif-common or just replicate that functionality in registry-api. Seems like a simple thing
* gbif-common-api depends on commons-validator for E-Mail validation. I'm in favor of cutting this dependency.
* As mentioned occurrence-download-api depends on Jackson but we decided that's okay
Author: mdoering@gbif.org
Created: 2012-09-05 15:19:58.728
Updated: 2012-09-05 15:19:58.728
Sounds all good Lars, but lets pelase wait with any refactoring of package names until the merger has successfully completed. In general Im definitely in favor of cleaning up package names.
Author: lfrancke@gbif.org
Created: 2012-09-05 15:26:47.519
Updated: 2012-09-05 15:26:47.519
* InterpretedEnum and InterpretedField should live in the same package
* What's the Project class in registry-api and why does it live in its own package? Should be moved to the parent package.
Author: lfrancke@gbif.org
Comment: I'm in favor of doing it all at once. I don't see why we would want to start in a bad state if we could just clean it all up in one go.
Created: 2012-09-05 15:27:26.421
Updated: 2012-09-05 15:27:26.421
Author: trobertson@gbif.org
Comment: Much of the registry package structure is largely there to separate concerns in the GBIF dataset metadata profile (geospatial, taxonomic, temporal etc). Following along this approach is the only reason why project exists as a package.
Created: 2012-09-05 15:40:18.384
Updated: 2012-09-05 15:40:18.384
Author: lfrancke@gbif.org
Created: 2012-09-07 12:07:55.936
Updated: 2012-09-07 12:29:11.809
This is an overview over our merged package structure. I didn't change anything yet so this really only represents the merged package level structure of all the API projects.
* api
** constraints
*** constraints.validator
** messaging
** model (6)
*** model.{citation,collection,curatorial,data,geospatial,graph,keyword,project,sampling,taxonomic,temporal}
*** model.constants
*** model.predicate
*** model.search (2)
*** model.vocabulary (3)
** paging
** search
** service (6)
*** service.search
** util
** vocabulary
** ws.json
Any opinions on how we want to streamline this? The numbers in brackets are how often we have that package.
Author: lfrancke@gbif.org
Created: 2012-09-07 12:17:03.885
Updated: 2012-09-07 12:17:03.885
Doesn't actually look that bad.
* util, messaging and constrains.validator look suspicious.
* vocabulary and model.vocabulary as well as search and model.search are the only ones that really do seem like they might be the same.
* Not sure about model.constants and if that should be vocabulary.
* ws.json is from occurrence-download-api and I'll look at that.
* That'd leave paging as a top level package, not sure if that should go into model
I have not looked at any code or class names even. This is purely from a package-level view.
Author: mdoering@gbif.org
Created: 2012-09-07 12:35:20.414
Updated: 2012-09-07 12:35:20.414
I understood we try to get to a structure like this:
{noformat}
api
model
common
messaging
checklistbank
registry
... citation,collection,curatorial,data,geospatial,graph,keyword,project,sampling,taxonomic,temporal
occurrence
download
vocabulary
service
common
messaging
checklistbank
registry
occurrence
download
util
{noformat}
I would consider not to keep separate search or paging packages, but rather merge them into the model/service common ones or the specific ones like registry.
If vocabulary should be below model, in parallel or below the individual model domains I dont know. A single package with all enums might be a good thing.
Author: lfrancke@gbif.org
Comment: Yes, the "graph" I put up in my comment was the unchanged but merged structure of our packages when I stripped them of the registry etc. names.
Created: 2012-09-07 13:22:43.895
Updated: 2012-09-07 13:22:43.895
Author: lfrancke@gbif.org
Created: 2012-09-10 11:53:53.56
Updated: 2012-09-10 11:53:53.56
* I have not changed Tag or TagSerDe yet. Not sure what to do there, it seems kinda messy that they depend on each other though.
* I have removed the EMailValidator without a replacement. We should just send a verification mail. So I'm thinking of also removing the ValidEmail annotation
* I have not changed the structure of the registry project with its various subpackages
I received this feedback from Markus:
{quote}
- treat the search facet param enums all the same - either in a search subpackage (aka clb model) or right in the model package. In occurrence its in model.constants
- the download packages both have just 1 class. Should we remove the downlaod package and merge it with occurrence, but keep the predicates as a subpackage?
- vocbulary is still all over the place. I think Id opt for a single package next to model
{quote}
Which I will look at now.
Author: lfrancke@gbif.org
Created: 2012-09-10 12:12:45.663
Updated: 2012-09-10 12:12:45.663
{quote}treat the search facet param enums all the same - either in a search subpackage (aka clb model) or right in the model package. In occurrence its in model.constants{quote}
* _project_.model.search
* _project_.model
* search._project_
Would be possibilities. I have no opinion but if I don't hear anything I'll just decide on one.
{quote}the download packages both have just 1 class. Should we remove the downlaod package and merge it with occurrence, but keep the predicates as a subpackage?{quote}
Hmm I always wanted to keep it separate but you're right in that it doesn't really make much sense now so I think I agree with you. Other opinions?
{quote}vocbulary is still all over the place. I think Id opt for a single package next to model{quote}
Okay makes sense. Would you keep the project separation? (vocabulary.checklistbank, vocabulary.registry, ...)
Author: fmendez@gbif.org
Comment: I prefer project.model.search, would like to have 1 vocabulary package if possible.
Created: 2012-09-10 12:19:31.413
Updated: 2012-09-10 12:19:31.413
Author: mdoering@gbif.org
Comment: As we seem to move more and more enums from specific projects to the common one I would suggest to throw them all into a single package
Created: 2012-09-10 12:21:38.189
Updated: 2012-09-10 12:21:38.189
Author: mdoering@gbif.org
Comment: Id vote for removing all search subpackages and merge the classes with the respective model or service package. In the service.occurrence.search package we also have OccurrenceSearchRequest which is a model object in clb (NameUsageSearch) and doesnt even exist in registry.
Created: 2012-09-10 12:27:29.363
Updated: 2012-09-10 12:27:29.363
Author: mdoering@gbif.org
Created: 2012-09-10 12:32:00.435
Updated: 2012-09-10 12:32:00.435
We have implemented various ways of converting a string to an enum. Id prefer to converge on the same strategy everywhere by having a simple static method in each enum. Named for example fromString(...), toEnum(...) infer(...) or lookup(...). There is also a VocabularyUtils class that contains a simple parser based on the enum names that could be our default implementation for all vocabulary enums?
For more complex parsing we should of course make use of the parser project, but you often find the need of converting from the name to the enum in a case and whitespace insensitive way. So I'd really like to keep a simple static method on each enum.
Author: lfrancke@gbif.org
Comment: According to Markus the two classes {{MemberFacetParameter}} and {{MemberSearchResult}} are unused and we have no plans on using them so I'll remove them.
Created: 2012-09-10 12:44:19.712
Updated: 2012-09-10 12:44:19.712
Author: lfrancke@gbif.org
Created: 2012-09-10 14:34:43.587
Updated: 2012-09-10 14:34:43.587
Okay I've done the occurrence.download and vocabulary move now. There's a class called "ErrorType" in vocabulary which comes from metrics. This name is now not descriptive enough anymore (with a merged vocabulary package). Any suggestions for a new name?
Now we need a decision about the search package stuff. Any other opinions to break the tie?
Author: lfrancke@gbif.org
Created: 2012-09-10 15:02:52.94
Updated: 2012-09-10 15:02:52.94
{quote}We have implemented various ways of converting a string to an enum. Id prefer to converge on the same strategy everywhere by having a simple static method in each enum. Named for example fromString(...), toEnum(...) infer(...) or lookup(...). There is also a VocabularyUtils class that contains a simple parser based on the enum names that could be our default implementation for all vocabulary enums?{quote}
I agree that this should be fixed.
The VocabularyUtils#lookupEnum thing is inefficient but some of that can probably be fixed.
The problem I see here is that we're mixing things. ContactType for example has a manually build map of strings to enum values. If every enum value needs a fixed string that is not to change then it should be a parameter of the enum. This way we can't do lookups the other way around. In Extension for example we support strings with or without underscores etc. I'm very much in favor of removing all that flexibility and just relying on the name of the enum itself and not some other magic string which only differs in delimiter and case. Then it'd be easy to provide a generic reverse lookup mechanism.
That probably has implications for a lot of our code and maybe even database stuff so I guess this is outside of the scope of this issue?
Author: lfrancke@gbif.org
Created: 2012-09-10 15:05:56.066
Updated: 2012-09-10 15:05:56.066
{quote}In the service.occurrence.search package we also have OccurrenceSearchRequest which is a model object in clb (NameUsageSearch) and doesnt even exist in registry.{quote}
I don't understand. Can you clarify that?
I've moved OccurrenceSearchRequest to org.gbif.api.model.occurence.search.
Author: mdoering@gbif.org
Created: 2012-09-10 16:05:20.61
Updated: 2012-09-10 16:05:20.61
Ive added a DatasetSearchRequest and renamed NameUsageSearch into NameUsageSearchRequest.
Now we have 3 different search request objects all named XyzSearchRequest, which was my issue above.
https://github.com/gbif/gbif-api/commit/ab748e88c83ac2c2ffebc143454f455cbe0857ee
Author: mdoering@gbif.org
Comment: For the string to enum conversion we could come up with a new general solution while keeping the old way marked as deprecated?
Created: 2012-09-10 16:06:20.146
Updated: 2012-09-10 16:06:41.223
Author: omeyn@gbif.org
Comment: I'm in favour of just using the enum.toString and fromString wherever possible - if there's some display issue we can have a converter, but would keep that for special cases.
Created: 2012-09-11 11:31:34.492
Updated: 2012-09-11 11:31:34.492
Author: jcuadra@gbif.org
Created: 2012-09-11 11:32:25.994
Updated: 2012-09-11 11:32:25.994
"There's a class called "ErrorType" in vocabulary which comes from metrics. This name is now not descriptive enough anymore (with a merged vocabulary package). Any suggestions for a new name?"
These are error types that will occur during processing of occurrence records (from RAW to PROCESSED). ProcessingErrorTypes I would call them.
Author: lfrancke@gbif.org
Comment: Final patch between unchanged but merged state of all API projects and what's been initially committed to SVN
Created: 2012-09-11 12:32:50.537
Updated: 2012-09-11 12:32:50.537