Issue 12046

Make use of dataset level metadata during BasisOfRecord interpretation

Reporter: trobertson
Type: NewFeature
Summary: Make use of dataset level metadata during BasisOfRecord interpretation
Priority: Major
Status: Open
Created: 2012-10-18 16:15:56.98
Updated: 2016-02-15 13:45:37.83
        
Description: Many occurrence records have no basis of record set. Some 20% of the total data are set to UNKNOWN.

However, some metadata profiles (e.g. EML, DiGIR) can provide information that asserts the basis of record for the entire dataset. We wish to use that information during interpretation of occurrence records, so that a record with no value of its own defaults to the value given for the dataset. Furthermore, someone may read a description of the dataset and tag that dataset in the registry accordingly.

This will most likely depend on new features for the dataset metadata synchronizer, which would store the value as a tag along the lines of:
  - namespace: indexing.gbif
  - subject: dwc:basisOfRecord
  - value: [HUMAN_OBSERVATION, LIVING_SPECIMEN, etc.]
That tag could then be used for the interpretation.

The rules for when a dataset tag can be used might be something along the lines of:

If the tag was created by the metadata synchronizer, or is in a trusted namespace (e.g. the Andrea / Jan type administrator namespace), and there is only a single dwc:basisOfRecord value, then use that value for any occurrence records that have no BoR set.
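
A minimal sketch of how such a rule might look, for discussion only. The Tag class, the creator identifier "metadata-synchronizer", and the contents of TRUSTED_NAMESPACES are hypothetical and do not reflect an existing registry API. The sketch also assumes that "only a single value" means a single distinct value among the trusted tags.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Hypothetical constants: the issue does not define these identifiers.
TRUSTED_NAMESPACES = {"indexing.gbif"}
SYNCHRONIZER = "metadata-synchronizer"
BOR_SUBJECT = "dwc:basisOfRecord"
UNKNOWN = "UNKNOWN"


@dataclass(frozen=True)
class Tag:
    """Hypothetical dataset tag, mirroring the fields listed in the issue."""
    namespace: str
    subject: str
    value: str
    created_by: str


def dataset_default_bor(tags: Iterable[Tag]) -> Optional[str]:
    """Return the dataset-level BoR, or None if it is absent or ambiguous."""
    values = {
        t.value
        for t in tags
        if t.subject == BOR_SUBJECT
        and (t.created_by == SYNCHRONIZER or t.namespace in TRUSTED_NAMESPACES)
    }
    # Only a single distinct trusted value may be used as a default.
    return next(iter(values)) if len(values) == 1 else None


def interpret_bor(record_bor: Optional[str], tags: Iterable[Tag]) -> str:
    """Keep the record's own BoR; otherwise fall back to the dataset tag."""
    if record_bor and record_bor != UNKNOWN:
        return record_bor
    return dataset_default_bor(tags) or UNKNOWN
```

With one trusted HUMAN_OBSERVATION tag, interpret_bor(None, tags) would return HUMAN_OBSERVATION, while a record that already carries a value keeps it. Two conflicting trusted tags leave the record at UNKNOWN.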


Author: mdoering@gbif.org
Created: 2012-10-18 18:49:41.519
Updated: 2012-10-18 18:49:41.519
        
Good idea, but the tag's predicate should not contain a colon, as it's used to delimit the namespace and the predicate.
And if the namespace is already the HIT (which I think makes a lot of sense), we can't use any other trusted namespace. We would rather have to make sure only trusted people have access to modify the HIT namespace.
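
To illustrate the delimiter concern, here is a small sketch. It assumes tags are serialized as a single "namespace:predicate" key, which is an assumption about the format rather than something stated in this issue. The helper names are hypothetical.

```python
def split_on_first_colon(key: str) -> tuple[str, str]:
    """Treat everything before the first colon as the namespace."""
    namespace, _, predicate = key.partition(":")
    return namespace, predicate


def split_on_last_colon(key: str) -> tuple[str, str]:
    """Treat everything before the last colon as the namespace."""
    namespace, _, predicate = key.rpartition(":")
    return namespace, predicate


# Assumed serialized form of the tag proposed in the description.
key = "indexing.gbif:dwc:basisOfRecord"

# The two equally plausible parsers disagree once the predicate has a colon.
print(split_on_first_colon(key))
print(split_on_last_colon(key))
```

The first parser reads the predicate as dwc:basisOfRecord, the second reads the namespace as indexing.gbif:dwc. A predicate without a colon, such as basisOfRecord, avoids the ambiguity.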