Issue 12335

Improve percentage of occurrence records stating the basis of record

12335
Reporter: ahahn
Assignee: ahahn
Type: Epic
Summary: Improve percentage of occurrence records stating the basis of record
Priority: Critical
Status: Open
Created: 2012-11-20 15:55:29.597
Updated: 2012-11-21 12:10:52.015
        
Description: About 20% of all occurrence records give the basis of record as "unknown". As this is an important filter for users, this share needs to come down, especially since it is hard to explain why it should not be clear whether an occurrence is a physical object in a collection, a field observation, a fossil or a specimen in an ex situ living collection.

There are several possible points of failure:
1. the source data are a mixed legacy source where the nature of the occurrence is indeed unknown. This probably accounts for only a minority of the cases, though.
2. the source data are a mixed set, and while the nature of the individual occurrences is known in principle (possibly only implicitly), there are no resources to make this knowledge explicit in the database. Such cases would need handling on a case-by-case basis, weighing the value of the investment.
3. the source data supply a value for the basis of record, but the value cannot be interpreted during index processing. Possible sub-cases: a) the value supplied does not make sense (e.g. due to a mis-mapping or a misunderstanding of the concept). This requires a better automated information flow (flagging, notification, possibly helpdesk support). b) the value supplied is an unknown variant of a meaningful value. In that case, the interpretation could be extended with an additional mapping.
4. the source data uniformly represent a single basis of record, which is declared in the dataset metadata. In such cases, it is not obvious to the publisher why the same information should be repeated at record level. Here, the interpretation workflow should make use of the dataset metadata and populate the individual records with the value derived from it.
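Cases 3b and 4 both point at changes in the interpretation step. Below is a minimal Python sketch of how a mapping table for value variants and a fallback to the dataset metadata could be combined. The target vocabulary (Darwin Core style class names such as `PreservedSpecimen`), the `VARIANTS` table and all function names are hypothetical illustrations, not the actual indexing code.

```python
from typing import Optional

UNKNOWN = "UNKNOWN"

# Hypothetical mapping table from normalised raw strings to an assumed target
# vocabulary (Darwin Core style class names). Adding entries here is the
# "extend the interpretation" fix for case 3b.
VARIANTS = {
    "preservedspecimen": "PreservedSpecimen",
    "specimen": "PreservedSpecimen",
    "humanobservation": "HumanObservation",
    "observation": "HumanObservation",
    "fieldobservation": "HumanObservation",
    "fossilspecimen": "FossilSpecimen",
    "fossil": "FossilSpecimen",
    "livingspecimen": "LivingSpecimen",
    "livingcollection": "LivingSpecimen",
}


def normalise(raw: str) -> str:
    """Lower-case and keep letters only, so 'Preserved specimen',
    'preserved_specimen' and 'PreservedSpecimen' all compare equal."""
    return "".join(ch for ch in raw.lower() if ch.isalpha())


def lookup(raw: Optional[str]) -> Optional[str]:
    """Return the mapped term for a raw value, or None if absent or unmapped."""
    if raw is None or not raw.strip():
        return None
    return VARIANTS.get(normalise(raw))


def interpret_basis_of_record(record_value: Optional[str],
                              dataset_value: Optional[str] = None) -> str:
    """Interpret one record's basis of record (hypothetical workflow).

    A non-empty record-level value always wins; if it cannot be mapped
    (case 3) the result stays UNKNOWN rather than being overwritten.
    Only when the record has no value at all is the value declared in
    the dataset metadata used (case 4).
    """
    if record_value is not None and record_value.strip():
        return lookup(record_value) or UNKNOWN
    return lookup(dataset_value) or UNKNOWN


# Usage examples
print(interpret_basis_of_record("Preserved specimen"))      # PreservedSpecimen
print(interpret_basis_of_record(None, "HumanObservation"))  # HumanObservation
print(interpret_basis_of_record("sighting?", "fossil"))     # UNKNOWN
```

The sketch is deliberately conservative: a record-level value that cannot be mapped is not silently replaced by the dataset default, because it may signal a mis-mapping (case 3a) that should be flagged back to the publisher rather than hidden.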

To do:
- run some statistics to estimate the impact of 3. and 4.
- identify the easiest wins to improve record numbers with known basis of record
- define the required changes (indexing workflow, reporting workflow, helpdesk activity, etc.) in more detail, determining priorities based on the outcome
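The statistics in the first to-do item could start from a rough classification of the records currently interpreted as unknown. The Python sketch below assumes a hypothetical `Record` shape holding the verbatim record-level value, the interpreted value and the basis of record declared in the dataset metadata; these field names are illustrative only. It cannot separate 3a from 3b, or 1 from 2, so those stay grouped, but the list of most frequent uninterpreted values gives a starting point for deciding between new mappings and publisher feedback.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple

UNKNOWN = "UNKNOWN"


@dataclass
class Record:
    # Hypothetical minimal record shape; field names are illustrative only.
    verbatim_basis: Optional[str]   # record-level value as supplied, if any
    interpreted_basis: str          # value produced by index interpretation
    dataset_basis: Optional[str]    # value declared in dataset metadata, if any


def _has_value(value: Optional[str]) -> bool:
    return value is not None and bool(value.strip())


def classify_unknown(rec: Record) -> Optional[str]:
    """Assign an UNKNOWN record to a likely failure case (None if known)."""
    if rec.interpreted_basis != UNKNOWN:
        return None
    if _has_value(rec.verbatim_basis):
        return "case 3"    # supplied but not interpreted (3a and 3b mixed)
    if _has_value(rec.dataset_basis):
        return "case 4"    # only declared in the dataset metadata
    return "case 1/2"      # no value anywhere; cannot be told apart here


def summarise(records: Iterable[Record]) -> Counter:
    """Count all records, all unknowns, and unknowns per likely case."""
    counts: Counter = Counter()
    for rec in records:
        counts["total"] += 1
        cause = classify_unknown(rec)
        if cause is not None:
            counts["unknown"] += 1
            counts[cause] += 1
    return counts


def top_unmapped_values(records: Iterable[Record],
                        n: int = 10) -> List[Tuple[str, int]]:
    """Most frequent verbatim values that failed interpretation."""
    values = Counter(
        rec.verbatim_basis.strip()
        for rec in records
        if rec.interpreted_basis == UNKNOWN and _has_value(rec.verbatim_basis)
    )
    return values.most_common(n)


# Usage with a few made-up records
records = [
    Record("specimen", "PreservedSpecimen", None),
    Record("obs?", UNKNOWN, None),
    Record("obs?", UNKNOWN, "HumanObservation"),
    Record(None, UNKNOWN, "HumanObservation"),
    Record(None, UNKNOWN, None),
]
print(summarise(records))
print(top_unmapped_values(records))  # [('obs?', 2)]
```

A record that carries both an uninterpretable value and a dataset-level declaration is counted under case 3 here, since fixing its record-level value (or mapping) would take precedence over the metadata fallback.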
    


Author: ahahn@gbif.org
Comment: To judge the impact of dataset metadata and interpretation problems, some summary counts are needed.
Created: 2012-11-20 16:52:38.384
Updated: 2012-11-20 16:52:38.384