Issue 10166

Decide on NameUsage vs Species - one model or two?

Reporter: omeyn
Type: Feedback
Summary: Decide on NameUsage vs Species - one model or two?
Priority: Major
Resolution: Fixed
Status: Closed
Created: 2011-11-08 09:46:00.287
Updated: 2013-08-29 14:46:02.684
Resolved: 2011-11-08 09:48:05.558


Author: omeyn@gbif.org
Created: 2011-11-08 09:46:24.694
Updated: 2011-11-08 09:46:24.694
        
Oliver Meyn Fri, 23 Sep at 8:43am
In discussion with Markus, decided that they should share the same model, but that the model must have a method on it that exposes whether it is a Species or NameUsage (i.e. nub or not-nub).  While they share the same model their handling in the portal will definitely be different so they must be represented as distinct resources.  There is only an interpreted view of Species, but there are interpreted and verbatim views required for NameUsage.  These comments have been added to the clbapi doc.

Lars Francke Fri, 23 Sep at 10:32am
Can you elaborate a bit on this?
Are Species and NameUsage the same or not?
This sounds like this quote may apply:
"Such tagged classes have numerous shortcomings. They are cluttered with boilerplate, including enum declarations, tag fields, and switch statements. Readability is further harmed because multiple implementations are jumbled together in a single class. Memory footprint is increased because instances are burdened with irrelevant fields belonging to other flavors. Fields can’t be made final unless constructors initialize irrelevant fields, resulting in more boilerplate. Constructors must set the tag field and initialize the right data fields with no help from the compiler: if you initialize the wrong fields, the program will fail at runtime. You can’t add a flavor to a tagged class unless you can modify its source file. If you do add a flavor, you must remember to add a case to every switch statement, or the class will fail at runtime. Finally, the data type of an instance gives no clue as to its flavor. In short, tagged classes are verbose, error-prone, and inefficient.
Luckily, object-oriented languages such as Java offer a far better alternative for defining a single data type capable of representing objects of multiple flavors: subtyping. A tagged class is just a pallid imitation of a class hierarchy."

Markus Döring Fri, 23 Sep at 11:02am
I didn't really think about subclassing, I have to admit. That makes sense, in particular if those classes deviate more in the future.
I tend towards subclasses now...
PS: A tagged class can be a really good thing though, and subclassing is often not the way to go because in Java, at least, you cannot inherit from multiple parent classes. And in ontologies there are other reasons why people prefer tagging to class hierarchies. Sometimes it just comes down to personal flavors.

Markus Döring Fri, 23 Sep at 11:38am
Species and name usages are very, very alike. They are stored in the clb db as the same thing. And all current ecat classes treat them the same too.
First of all, on terminology:
name usage:
it's not a widely recognized taxonomic/biological term, but it's not overloaded like all the other options such as taxon, taxon concept or species
it boils down to a scientific name being used somewhere, so it can be a synonym or accepted or not even classified at all, just a bare name
species:
in the sense of the portal design here, it is a name usage belonging to the backbone nub taxonomy.
it's of course not a species in the real sense; it can just as well be another rank like family or subspecies
it's hard to find a good, not already overloaded term for this. We need to file this for Andrea and David: https://gbif.basecamphq.com/projects/7935093-portal/todo_items/108055024/comments
The properties are the same, but different SQL is needed to retrieve subresources (image, description, etc).
A species does not have a verbatim view, as it's assembled by us artificially (nub building) and not based on an indexed source record. It therefore also lacks a link to some external source webpage.
We try to link all other usages to a corresponding nub usage (see nubKey on name usage), so it becomes a true backbone to which we attach all other records and through which we can provide crosswalks from one usage to another (these are the related usages). All subresources (think images) of a species are also retrieved via a crosswalk to their original usage. Imagine we had 10 checklists that all treat Puma concolor in some way. They would all have the same nubKey=2435099, and when we ask for images for the species Puma concolor we would return all images linked to any of those 10 usages. See related sources in the ecat prototype http://ecat-dev.gbif.org/usage/2435099
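
The nubKey crosswalk described above could be sketched roughly as follows. This is only an in-memory illustration, not the actual clb SQL: the UsageRecord and NubCrosswalk classes, their fields and the image URLs are assumptions made up for the example; only the nubKey idea comes from the explanation above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified usage record: only the fields needed for the crosswalk.
final class UsageRecord {
    final int key;                 // key of this usage within its checklist
    final Integer nubKey;          // key of the matching backbone (nub) usage, null if unmatched
    final List<String> imageUrls;  // images attached to this particular usage

    UsageRecord(int key, Integer nubKey, List<String> imageUrls) {
        this.key = key;
        this.nubKey = nubKey;
        this.imageUrls = imageUrls;
    }
}

final class NubCrosswalk {
    // Collect the images of every usage that points at the given nub usage.
    static List<String> imagesForNub(int nubKey, List<UsageRecord> usages) {
        List<String> images = new ArrayList<>();
        for (UsageRecord usage : usages) {
            // Unmatched usages (nubKey == null) never contribute to a species.
            if (usage.nubKey != null && usage.nubKey == nubKey) {
                images.addAll(usage.imageUrls);
            }
        }
        return images;
    }
}
```

With ten checklist usages of Puma concolor all carrying nubKey=2435099, a call such as NubCrosswalk.imagesForNub(2435099, usages) would gather the images of all ten, which is the behaviour described for the species view.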

Lars Francke Fri, 23 Sep at 11:41am
Thank you for the detailed explanation!

Markus Döring Fri, 23 Sep at 11:51am
Would be good to keep something like that in the API docs or, even better, the model Javadocs?

Markus Döring Fri, 23 Sep at 11:53am
if we go for a subclass approach, how about naming the 3 classes like this:
NameUsage (superclass)
NubUsage
ChecklistUsage or SourceUsage
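
A minimal sketch of what that hierarchy might look like, assuming the ChecklistUsage name is chosen. Only the three class names come from the proposal; the fields, the constructor signatures and the hasVerbatimView() method are assumptions picked to mirror the differences discussed earlier (no verbatim view and no source link for nub usages).

```java
// Shared properties live in the superclass.
abstract class NameUsage {
    private final int key;
    private final String scientificName;

    protected NameUsage(int key, String scientificName) {
        this.key = key;
        this.scientificName = scientificName;
    }

    public int getKey() { return key; }
    public String getScientificName() { return scientificName; }

    // Hypothetical hook: each subclass states whether a verbatim view exists.
    public abstract boolean hasVerbatimView();
}

// A usage in the backbone taxonomy, assembled during nub building.
final class NubUsage extends NameUsage {
    NubUsage(int key, String scientificName) {
        super(key, scientificName);
    }

    @Override
    public boolean hasVerbatimView() { return false; } // no indexed source record behind it
}

// A usage indexed from a source checklist.
final class ChecklistUsage extends NameUsage {
    private final Integer nubKey;   // matching nub usage, null if unmatched
    private final String sourceUrl; // hypothetical link to the external source webpage

    ChecklistUsage(int key, String scientificName, Integer nubKey, String sourceUrl) {
        super(key, scientificName);
        this.nubKey = nubKey;
        this.sourceUrl = sourceUrl;
    }

    public Integer getNubKey() { return nubKey; }
    public String getSourceUrl() { return sourceUrl; }

    @Override
    public boolean hasVerbatimView() { return true; }
}
```

The point of the split is that the source-only fields (nubKey, sourceUrl) cannot be set on a NubUsage by accident, which is the compiler help the quote about tagged classes refers to.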

Oliver Meyn Mon, 26 Sep at 5:17am
clbAPI document updated with superclass/subclass recommendation


Author: omeyn@gbif.org
Comment: In the meantime we have now decided to go back to a single model with an isNub() method to differentiate between the two.  This is because, after building the ChecklistUsage, building the NubUsage was all copy and paste with only very, very small differences between them that didn't warrant an entire separate ws-client, ws, model, mapper stack.
Created: 2011-11-08 09:48:01.837
Updated: 2011-11-08 09:48:01.837
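
For illustration, the single-model approach with isNub() might look roughly like the sketch below. Only the class name NameUsage and the isNub() method are taken from the comment above; how isNub() is actually decided is not stated in this issue, so the checklistKey field, the NUB_CHECKLIST_KEY constant and hasVerbatimView() are assumptions.

```java
// One model for both nub and checklist usages.
final class NameUsage {
    // Hypothetical key identifying the backbone (nub) checklist.
    static final int NUB_CHECKLIST_KEY = 1;

    private final int key;
    private final int checklistKey; // assumption: every usage knows its checklist
    private final String scientificName;

    NameUsage(int key, int checklistKey, String scientificName) {
        this.key = key;
        this.checklistKey = checklistKey;
        this.scientificName = scientificName;
    }

    public int getKey() { return key; }
    public String getScientificName() { return scientificName; }

    // True for usages that belong to the backbone taxonomy ("Species" in the portal).
    public boolean isNub() {
        return checklistKey == NUB_CHECKLIST_KEY;
    }

    // Nub usages are assembled artificially, so only non-nub usages have a verbatim view.
    public boolean hasVerbatimView() {
        return !isNub();
    }
}
```

Callers that need different handling in the portal would branch on isNub() instead of on the runtime type, which trades the compile-time separation of the subclass approach for a single ws-client, ws, model and mapper stack.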