Issue 14373

Portal reports on data quality and fitness-for-use for each dataset

14373
Reporter: ahahn
Type: Epic
Summary: Portal reports on data quality and fitness-for-use for each dataset
Priority: Major
Status: ToDo
Created: 2013-11-18 12:37:54.509
Updated: 2016-02-25 14:05:22.983
DueDate: 2014-12-31 00:00:00.0
        
Description: Ideal future scenario: any data user has the information needed to judge whether the content of a dataset suits their intended use, and can set filters so that a download includes only data meeting those criteria. The criteria include the outcomes of fitness-for-use working groups, among others for thematic areas such as invasive alien species or mountain biodiversity. On the workflow side, the endorsement of data publishers and datasets involves not only Nodes but also fitness-for-use working groups and other stakeholders, so that expert views can be communicated (peer review, tagging).

*Portal reports on data quality and fitness-for-use for each dataset (and species): milestone Dec 2014*

*Rationale*
To allow users to judge whether the content of a dataset is suitable for their purpose, the necessary information needs to be available in a standardised form that supports filtering on those criteria. The areas that need to be covered include:
- standards compliance (controlled vocabularies etc.)
- metadata completeness (see GBIF-5, especially for metadata critical to qualifying data content)
- presence of key data elements
- automated checking and flagging of issues and outliers
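
A minimal sketch of what record-level checks for three of these areas (controlled vocabulary, key data elements, simple range flags) might look like. Everything named here is an assumption for illustration: the epic does not specify the key elements, the vocabulary, or the flag names. The field names follow Darwin Core terms, and the vocabulary is only a subset of commonly used basisOfRecord values.

```python
from typing import Dict, List

# Hypothetical list of key data elements; the epic leaves the real list
# to be consolidated (see "Required components").
KEY_ELEMENTS = (
    "scientificName",
    "eventDate",
    "decimalLatitude",
    "decimalLongitude",
    "basisOfRecord",
)

# Illustrative subset of a controlled vocabulary, not an authoritative list.
BASIS_OF_RECORD_VOCAB = {
    "PreservedSpecimen",
    "FossilSpecimen",
    "LivingSpecimen",
    "HumanObservation",
    "MachineObservation",
}


def check_record(record: Dict[str, str]) -> List[str]:
    """Return a list of issue flags for one record (empty list = no issues)."""
    issues: List[str] = []

    # Presence of key data elements: missing or blank values are flagged.
    for term in KEY_ELEMENTS:
        if not (record.get(term) or "").strip():
            issues.append(f"MISSING_{term}")

    # Standards compliance: a supplied value must come from the vocabulary.
    basis = (record.get("basisOfRecord") or "").strip()
    if basis and basis not in BASIS_OF_RECORD_VOCAB:
        issues.append("BASIS_OF_RECORD_NOT_IN_VOCABULARY")

    # Simple automated flagging: coordinates must parse and be in range.
    for term, limit in (("decimalLatitude", 90.0), ("decimalLongitude", 180.0)):
        raw = (record.get(term) or "").strip()
        if not raw:
            continue  # already reported as missing above
        try:
            value = float(raw)
        except ValueError:
            issues.append(f"INVALID_{term}")
            continue
        if abs(value) > limit:
            issues.append(f"OUT_OF_RANGE_{term}")

    return issues
```

Returning flags rather than rejecting records keeps the decision about fitness with the user, who can then filter on the flags that matter for their intended use.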

Definitions of criteria coming from the fitness-for-use working groups are to be integrated into the process. At the same time, procedures for the endorsement or tagging of datasets and data publishers by Nodes, working groups and other stakeholders are to be defined, seeking a broader basis for understanding quality and recognition.

*Required components*
- consolidate a list of quality markers / criteria that are already well-understood and defined
- specify useful formats for working group outcomes, to help focus their effort and allow implementation
- establish or connect with fitness-for-use working groups (for details, see relevant epic)
- IPT: include checking routines for key data elements, standards compliance and metadata completeness according to a) the list of available criteria and b) the first working group recommendations
- portal: handle data fitness and quality criteria in metadata overview, searches and downloads
- harvester: include the identified checking routines for issues (including missing elements or content) and outliers. Potentially allow identified issues to be reported back to the dataset curator.
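
As a rough illustration of how per-record flags from the IPT or harvester could feed a per-dataset report and a download filter in the portal, the sketch below aggregates issue flags into a summary. The function names, the report keys and the threshold rule are all hypothetical; the epic does not define a report format.

```python
from collections import Counter
from typing import Dict, Iterable, List


def summarise_issues(per_record_issues: Iterable[List[str]]) -> Dict[str, object]:
    """Aggregate per-record issue flags into a dataset-level summary."""
    counts: Counter = Counter()
    total = 0
    flagged = 0
    for issues in per_record_issues:
        total += 1
        if issues:
            flagged += 1
            # Count each flag once per record, even if it was raised twice.
            counts.update(set(issues))
    return {
        "records": total,
        "flagged_records": flagged,
        "flagged_share": flagged / total if total else 0.0,
        "issue_counts": dict(counts),
    }


def meets_threshold(report: Dict[str, object], max_flagged_share: float) -> bool:
    """Hypothetical filter: accept a non-empty dataset if few enough records are flagged."""
    return report["records"] > 0 and report["flagged_share"] <= max_flagged_share


# Usage: four records, two of which carry flags.
report = summarise_issues([[], ["MISSING_eventDate"], ["MISSING_eventDate", "OUT_OF_RANGE"], []])
print(report["flagged_share"], meets_threshold(report, 0.5))
```

The same summary could be sent back to the dataset curator, so that the report shown to users and the feedback to publishers stay consistent.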

*Note*
This epic is huge and requires further breakdown, not least for the revision of the endorsement workflow. For the 2014 WP, realistic "done" criteria need to be specified, while keeping notes on additional components for 2015+. Including working group outcomes within 2014 may not be realistic, depending on how quickly the working group(s) can be established and run.