Issue 14908

Reconsider generating multiple occurrences from a single ABCD unit

Reporter: mdoering
Type: Story
Summary: Reconsider generating multiple occurrences from a single ABCD unit
Priority: Critical
Status: Open
Created: 2014-01-22 13:57:26.41
Updated: 2015-12-16 17:09:09.898
        
Description: Occurrence parsing historically produces multiple occurrence records from a single ABCD unit, with a distinct unitQualifier to distinguish them. For dwc records we do not do this if there is an identification history. Review this decision and consider creating a single occurrence record, ditching the unit qualifier altogether.

For historical identifications this does not make sense, and we will have to cater for multiple identifications, possibly within the occurrence record itself (see the dwc-a identification extension). For a specimen consisting of multiple taxa this makes some sense, but I doubt we use the same approach anywhere else yet, so removing it for ABCD still seems fine.

In addition to ABCD, dwc archives might also have an identification extension that supplies multiple identifications, see http://rs.gbif.org/extension/dwc/identification.xml
    


Author: mdoering@gbif.org
Comment: [~rdmpage], did you have any problems with, or thoughts on, how we duplicate multi-identification records at GBIF?
Created: 2015-12-15 14:57:43.99
Updated: 2015-12-15 14:57:43.99


Author: rdmpage
Created: 2015-12-16 15:54:09.147
Updated: 2015-12-16 15:54:09.147
        
[~mdoering@gbif.org] Haven't thought much about this, but having multiple identifications per specimen seems important. Some providers already use this, see e.g. http://www.gbif.org/occurrence/search?datasetKey=6eb0c925-06b4-4dec-a153-c4a28cb4eb9d where two different identifications are provided (this gets lost by GBIF).

I can think of lots of cases where multiple identifications will be important, especially once we match specimens in GenBank, BOLD, and GBIF. Each may have a different identification. As taxonomy changes and old specimens become types for new names, we will accumulate more identifications (the Megachile occurrence above is an example).

Hope I've understood your question.
    


Author: mdoering@gbif.org
Created: 2015-12-16 17:09:09.898
Updated: 2015-12-16 17:09:09.898
        
The question is how GBIF handles multiple identifications. Apparently not at all for Darwin Core so far, as you pointed out:
http://www.gbif.org/occurrence/779870399/verbatim

But for ABCD we apparently create multiple occurrence records for the same specimen if there is not a single preferred determination. That has a long tradition.
I think we should instead track multiple determinations per occurrence record - which is a serious change for our data model.
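
A minimal sketch of the difference between the two approaches, for illustration only. All class, field, and function names here are hypothetical and do not come from the GBIF data model or the occurrence parser. The splitting rule is only a reading of the behaviour described above, and the "#n" suffix merely stands in for the unitQualifier.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Determination:
    # Hypothetical minimal determination: a name plus a preferred flag.
    scientific_name: str
    preferred: bool = False


@dataclass
class Occurrence:
    # Proposed shape: one record per specimen, carrying all its determinations.
    occurrence_id: str
    determinations: List[Determination] = field(default_factory=list)

    def preferred_determination(self) -> Optional[Determination]:
        """Return the determination only if exactly one is flagged as preferred."""
        flagged = [d for d in self.determinations if d.preferred]
        return flagged[0] if len(flagged) == 1 else None


def split_like_legacy(unit: Occurrence) -> List[Occurrence]:
    """Assumed legacy behaviour: a single record when there is one preferred
    determination, otherwise one record per determination."""
    preferred = unit.preferred_determination()
    if preferred is not None:
        return [Occurrence(unit.occurrence_id, [preferred])]
    return [
        Occurrence(f"{unit.occurrence_id}#{n}", [d])  # "#n" stands in for unitQualifier
        for n, d in enumerate(unit.determinations, start=1)
    ]
```

With two determinations and no preferred one, `split_like_legacy` yields two records for the same specimen, whereas the single `Occurrence` keeps both determinations together.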

Finally found an ABCD example:
http://ww2.bgbm.org/herbarium/view_biocase.cfm?SpecimenPK=2633
http://www.gbif.org/occurrence/893482420/fragment
http://www.gbif.org/occurrence/search?OCCURRENCE_ID=http%3A%2F%2Fherbarium.bgbm.org%2Fobject%2FBW19266010&DATASET_KEY=85714c48-f762-11e1-a439-00145eb45e9a

I thought we created a new record for each determination, preferred or not. That would have been bad in my opinion.
Things are not as bad as I thought if the preferred determination is used, so I guess I can close this issue, leaving only the feature request open:
http://dev.gbif.org/issues/browse/POR-2458