Issue 12612

Review parallelization of ABCD.UnitID with DwC.CatalogNumber

Reporter: ahahn
Type: Task
Summary: Review parallelization of ABCD.UnitID with DwC.CatalogNumber
Priority: Critical
Status: Open
Created: 2013-01-22 10:45:00.714
Updated: 2016-02-10 10:53:11.492
        
Description: So far, the three-part ID of DarwinCore (InstitutionCode, CollectionCode, CatalogNumber) has been parallelized with the ABCD concepts SourceInstitutionID, SourceID and UnitID. According to the element definition, the UnitID is "A unique identifier for the unit record within the data source. Preferrably, the ID should be stable in the database, so that it also can be used to find the same record again (e.g. for data exchange purposes). Third part of the record identifier."

The terminology has been taken over from DwC, which was originally targeted at physical collections. Since then, the concept of the CatalogNumber, also in the use of DwC, has shifted towards a general "unique identifier(...) within the data source". In consequence, explicit documentation of an actually existing catalogue number (accession number) is no longer easily possible.

User comments suggest that an accession number e.g. cited in the literature should be both searchable and consistently displayed under a label like "Catalog number", while the mix of system-internal IDs and real catalog numbers in that field in the current data portal is confusing. What adds to the confusion is that even if real catalog numbers do exist, they will not be used for that concept mapping if they are not available for all records - as part of the key triplet, unique values for all records are mandatory (http://iphylo.blogspot.dk/2013/01/more-gbif-specimen-identifier.html).
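The constraint described above (a field can only serve as the third part of the key triplet if every record has a value and no value repeats) can be sketched as a simple check. This is an illustration only: the function name `usable_as_triplet_key` and the sample values are hypothetical, and the check assumes all values come from a single institution/collection pair.

```python
from typing import Iterable, Optional


def usable_as_triplet_key(values: Iterable[Optional[str]]) -> bool:
    """Return True only if every record has a non-empty value and no value repeats.

    Hypothetical helper: it mirrors the rule that the third part of the
    triplet must be present and unique for all records of a data source.
    """
    seen = set()
    for value in values:
        if value is None or not value.strip():
            return False  # a record without a value rules the field out
        if value in seen:
            return False  # a duplicate value rules the field out
        seen.add(value)
    return True
```

Under this rule, a column of real catalog numbers with even one gap fails the check, which is why publishers fall back to system-internal IDs for the mapping.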

With the introduction of globally unique occurrence ids (or their consistent interim replacement in a form like "URN:catalog:[InstitutionCode]:[CollectionCode]:[CatalogNumber]"), the use of the related concepts should be reviewed. A suggested new element correspondence could be:
- ABCD.UnitGUID ->  DwC.occurrenceID
- ABCD.(...)SpecimenUnit/Accessions/AccessionNumber (plus others?) -> DwC.CatalogNumber
- ABCD.UnitID -> DwC.?

As this has far-reaching consequences, it needs wider consultation and a documentation follow-up. Current documentation confirms the parallelization of UnitID with CatalogNumber (http://rs.tdwg.org/dwc/terms/history/dwctoabcd/index.htm; http://www.bgbm.org/TDWG/CODATA/Schema/Mappings/DwC2.0.htm). If the ABCD.UnitGUID is to be used, publishers need to change their mapping to supply it; alternatively, a combination of ABCD.SourceInstitutionID+SourceID+UnitID would supply the value. The impact on crawling/indexing and on the data portal also needs further checking.
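A minimal sketch of the two options for supplying the occurrence id: use ABCD.UnitGUID when the publisher provides it, otherwise build the interim URN from SourceInstitutionID, SourceID and UnitID. The function name `occurrence_id` and the sample values in the usage note are hypothetical; only the URN pattern is taken from the description above.

```python
from typing import Optional


def occurrence_id(
    unit_guid: Optional[str],
    source_institution_id: str,
    source_id: str,
    unit_id: str,
) -> str:
    """Prefer the publisher-supplied UnitGUID; otherwise fall back to the interim URN.

    Assumption: an empty or whitespace-only UnitGUID counts as "not supplied".
    """
    if unit_guid and unit_guid.strip():
        return unit_guid.strip()
    # Interim form: URN:catalog:[InstitutionCode]:[CollectionCode]:[CatalogNumber]
    return f"URN:catalog:{source_institution_id}:{source_id}:{unit_id}"
```

For example, `occurrence_id(None, "INST", "COLL", "12345")` yields `URN:catalog:INST:COLL:12345`, while any non-empty UnitGUID is passed through unchanged.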
    


Author: kbraak@gbif.org
Created: 2016-01-07 15:19:08.232
Updated: 2016-01-07 15:19:08.232
        
ABCD mappings provided by BGBM:
http://tinyurl.com/termMappings
    


Author: mdoering@gbif.org
Created: 2016-02-10 10:53:11.492
Updated: 2016-02-10 10:53:11.492
        
In BGBM's ABCD mappings, ID values are mapped as:

{noformat}
DataSets/DataSet/DataSetGUID -> dwc:datasetID
DataSets/DataSet/Units/Unit/SourceInstitutionID -> dwc:institutionCode & dwc:institutionID
DataSets/DataSet/Units/Unit/SourceID -> dwc:collectionCode & dwc:collectionID

DataSets/DataSet/Units/Unit/UnitID -> dwc:catalogNumber
DataSets/DataSet/Units/Unit/UnitGUID -> dwc:occurrenceID
DataSets/DataSet/Units/Unit/CollectorsFieldNumber -> dwc:recordNumber

DataSets/DataSet/Units/Unit/Identifications/Identification/Result/TaxonIdentified/ScientificName/NameAtomised/Zoological/NamedIndividual -> individualID
DataSets/DataSet/Units/Unit/ObservationUnit/ObservationUnitIdentifiers/ObservationUnitIdentifier -> individualID
DataSets/DataSet/Units/Unit/SpecimenUnit/Accessions/AccessionNumber -> individualID
{noformat}

I would question the mapping of AccessionNumber and ObservationUnitIdentifier to individualID, as ABCD users themselves have reported that they would like the accession number to appear as catalogNumber.
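One way the alternative could look, as a sketch only: take dwc:catalogNumber from AccessionNumber when a record has one. The fallback to UnitID for records without an accession number is an assumption made here for illustration; the issue leaves the future target of UnitID open. The function name `catalog_number` and the flat path-to-value dictionary are hypothetical.

```python
from typing import Mapping, Optional

# ABCD paths as listed in the mapping above.
UNIT = "DataSets/DataSet/Units/Unit/"
ACCESSION_PATH = UNIT + "SpecimenUnit/Accessions/AccessionNumber"
UNIT_ID_PATH = UNIT + "UnitID"


def catalog_number(unit: Mapping[str, str]) -> Optional[str]:
    """Pick a value for dwc:catalogNumber from a flat {ABCD path: value} record.

    Assumption: a real accession number wins; UnitID is only a fallback.
    """
    accession = (unit.get(ACCESSION_PATH) or "").strip()
    if accession:
        return accession
    unit_id = (unit.get(UNIT_ID_PATH) or "").strip()
    return unit_id or None
```

With this rule, a record carrying both values shows its accession number under "Catalog number", and a record with only a system-internal UnitID still gets a value.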