Issue 11895

Add higher taxa counts to the occurrence cube

Reporter: trobertson
Assignee: trobertson
Type: Bug
Summary: Add higher taxa counts to the occurrence cube
Priority: Critical
Resolution: Fixed
Status: Closed
Created: 2012-09-14 08:34:52.742
Updated: 2013-12-05 11:05:08.907
Resolved: 2012-10-12 14:29:57.143
        
Description: The occurrence cube does not currently have higher taxa counts.
Ideally we want to be able to ask the cube questions like "what is the count of records for nub concept X" where X might be at any rank (Kingdom etc).

This means we want to get counts for kingdom, phylum etc all into a single dimension (the nub key).

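If nubKey were the only dimension, you could simply write once per distinct taxon key:

{code}
import com.google.common.collect.Sets;
import java.util.Set;

// Distinct nub keys of the record across all ranks (a Set, so repeated keys collapse)
Set<Integer> taxa =
        Sets.newHashSet(o.getKingdomID(), o.getPhylumID(), o.getClassID(), o.getOrderID(), o.getFamilyID(), o.getGenusID(), o.getSpeciesID(),
          o.getTaxonID());

for (Integer i : taxa) {
  // write to the cube
}
{code}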

However, if you have rollups which do not include the nubKey, the counts for those rollups will be incremented once per taxon key instead of once per record.

Consider a record with kingdom:Animalia (nubKey 1) and species:Puma concolor (nubKey 2435099), and rollup definitions of datasetId, nubKey, and datasetId+nubKey:

cube.write(dataset[1], nubKey[1]);
cube.write(dataset[1], nubKey[2435099]);

After these two writes, the cube would return the following:

cube.read(nubKey[1]) -> 1 (correct)
cube.read(nubKey[2435099]) -> 1 (correct)
cube.read(dataset[1], nubKey[1]) -> 1 (correct)
cube.read(dataset[1], nubKey[2435099]) -> 1 (correct)
cube.read(dataset[1]) -> 2 (WRONG: there is only one record, so this should be 1)
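
The miscount can be reproduced without the DataCube library. The following is a minimal sketch, not the DataCube API: the class OvercountDemo, its string addresses and its write/read methods are hypothetical stand-ins, and it assumes only the three rollups named above (datasetId, nubKey, datasetId+nubKey).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Toy stand-in for the cube: one counter per rollup address (hypothetical, not DataCube). */
public class OvercountDemo {
  private final Map<String, Long> counters = new HashMap<>();

  /** One write increments every rollup: datasetId, nubKey and datasetId+nubKey. */
  public void write(int datasetId, int nubKey) {
    counters.merge("dataset=" + datasetId, 1L, Long::sum);
    counters.merge("nubKey=" + nubKey, 1L, Long::sum);
    counters.merge("dataset=" + datasetId + ",nubKey=" + nubKey, 1L, Long::sum);
  }

  /** Returns the counter stored at an address, or 0 if nothing was written there. */
  public long read(String address) {
    return counters.getOrDefault(address, 0L);
  }

  /** Naive approach: a single occurrence record is written once per distinct taxon key. */
  public static OvercountDemo writeRecord(int datasetId, Set<Integer> taxonKeys) {
    OvercountDemo cube = new OvercountDemo();
    for (int key : taxonKeys) {
      cube.write(datasetId, key);
    }
    return cube;
  }

  public static void main(String[] args) {
    // One record: kingdom Animalia (1) and species Puma concolor (2435099)
    Set<Integer> taxa = new LinkedHashSet<>(Arrays.asList(1, 2435099));
    OvercountDemo cube = writeRecord(1, taxa);

    System.out.println(cube.read("nubKey=1"));                   // 1
    System.out.println(cube.read("dataset=1,nubKey=2435099"));   // 1
    System.out.println(cube.read("dataset=1"));                  // 2, although only one record exists
  }
}
```

The datasetId rollup is touched by both writes, which is why its counter ends up at 2 for a single record.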

How can we solve this?  Possible options:

i) use separate cube for the taxa counts (nubKey would be mandatory in all rollups)

ii) investigate a compound object for the dimension, e.g. whether a Dimension typed on a collection of keys rather than on a single key is feasible
 - this might need changes to the DataCube Rollup procedures

iii) investigate if this can be handled in a Bucketer (DataCube); I don't think this is a correct use of a Bucketer

iv) use a dimension per rank (makes reading the cube harder as you need to know the rank up front)
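
As a rough illustration of option i), the sketch below keeps the taxa counts in their own structure, so that each record is written once to the rollups without nubKey and once per taxon key to the rollups with nubKey. It uses plain maps rather than the DataCube API, and the class SeparateTaxaCubeSketch and all of its method names are hypothetical. It is not necessarily the approach taken in NormalizationTest2.java.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Sketch of option i): two separate counters, hypothetical names, plain maps only. */
public class SeparateTaxaCubeSketch {
  // Rollups without nubKey: incremented exactly once per record
  private final Map<String, Long> recordCube = new HashMap<>();
  // Rollups that always include nubKey: incremented once per distinct taxon key
  private final Map<String, Long> taxaCube = new HashMap<>();

  /** Writes one occurrence record with the distinct nub keys of all its ranks. */
  public void writeRecord(int datasetId, Set<Integer> taxonKeys) {
    recordCube.merge("dataset=" + datasetId, 1L, Long::sum);
    for (int key : taxonKeys) {
      taxaCube.merge("nubKey=" + key, 1L, Long::sum);
      taxaCube.merge("dataset=" + datasetId + ",nubKey=" + key, 1L, Long::sum);
    }
  }

  /** Number of records in a dataset, regardless of taxon. */
  public long readRecords(int datasetId) {
    return recordCube.getOrDefault("dataset=" + datasetId, 0L);
  }

  /** Number of records under a nub concept at any rank. */
  public long readTaxon(int nubKey) {
    return taxaCube.getOrDefault("nubKey=" + nubKey, 0L);
  }

  /** Number of records under a nub concept within one dataset. */
  public long readTaxon(int datasetId, int nubKey) {
    return taxaCube.getOrDefault("dataset=" + datasetId + ",nubKey=" + nubKey, 0L);
  }
}
```

The trade-off is that a caller has to know which of the two structures answers a given question, but no rollup can be incremented more than once per record by accident.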


    
Attachment NormalizationTest2.java
Attachment NormalizationTest.java


Author: trobertson@gbif.org
Comment: Illustrates the problem in a simple example, and shared with UrbanAirship devs
Created: 2012-10-10 15:05:43.168
Updated: 2012-10-10 15:05:43.168


Author: trobertson@gbif.org
Created: 2012-10-10 18:04:48.581
Updated: 2012-10-10 18:04:48.581
        
I believe NormalizationTest2.java shows a sane way to handle this problem; the earlier NormalizationTest.java contained a bug.

Waiting for confirmation from UrbanAirship.
    


Author: trobertson@gbif.org
Created: 2012-10-12 14:29:57.168
Updated: 2012-10-12 14:29:57.168
        
Fixed with:
  http://code.google.com/p/gbif-metrics/source/detail?r=63