Issue 14374

Portal reports on data quality and fitness-for-use for each species

14374
Reporter: ahahn
Type: Epic
Summary: Portal reports on data quality and fitness-for-use for each species
Priority: Major
Status: ToDo
Created: 2013-11-18 13:53:05.145
Updated: 2014-02-25 13:23:51.077
DueDate: 2014-12-31 00:00:00.0
        
Description: Ideal future scenario: any user of species-related data has the information needed to judge whether the available data are suitable for their intended use, and can set appropriate filters so that only data meeting these criteria are included in a download.

*Portal reports on data quality and fitness-for-use for each (dataset and) species: milestone Dec 2014*

*Rationale*
To allow a user to judge whether the data available about a species are suitable for their purpose, the necessary information needs to be available in a standardised form that allows filtering on those criteria. The areas that need to be covered include:
- standards compliance (controlled vocabularies, adherence to syntax rules for names, etc.)
- metadata completeness (see GBIF-5, especially in the area of metadata critical to qualify data content)
- presence of key data elements
- automated checks and flagging of issues and outliers
- data coverage analyses and qualification, e.g. suitability for niche modelling
- coverage and completeness: are the available data representative of the species, e.g. compared against distribution maps
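
A minimal sketch of how some of these areas (presence of key data elements, controlled vocabularies, automated flagging) could be expressed as record-level flags that a download filter acts on. Everything here is an assumption for illustration: the choice of key elements, the vocabulary subset, the flag names and the functions flag_record and filter_records are hypothetical, and only the Darwin Core term names are taken from the standard.

```python
from typing import Dict, Iterable, List, Set

# Assumed key data elements (Darwin Core term names); the epic does not fix this list.
KEY_ELEMENTS = ("scientificName", "decimalLatitude", "decimalLongitude", "eventDate")
# Assumed subset of a controlled vocabulary for basisOfRecord.
BASIS_OF_RECORD = {"PreservedSpecimen", "HumanObservation", "MachineObservation"}


def _value(record: Dict[str, str], term: str) -> str:
    """Return the trimmed value of a term, or an empty string if absent."""
    return (record.get(term) or "").strip()


def flag_record(record: Dict[str, str]) -> Set[str]:
    """Return the set of (hypothetical) quality flags raised by one record."""
    # Presence of key data elements.
    flags = {f"MISSING_{term}" for term in KEY_ELEMENTS if not _value(record, term)}

    # Standards compliance: value must come from the controlled vocabulary.
    basis = _value(record, "basisOfRecord")
    if basis and basis not in BASIS_OF_RECORD:
        flags.add("BASIS_OF_RECORD_NOT_IN_VOCABULARY")

    # Automated check: coordinates must parse and lie within valid ranges.
    lat, lon = _value(record, "decimalLatitude"), _value(record, "decimalLongitude")
    if lat and lon:
        try:
            in_range = -90 <= float(lat) <= 90 and -180 <= float(lon) <= 180
        except ValueError:
            in_range = False
        if not in_range:
            flags.add("COORDINATE_INVALID")
    return flags


def filter_records(records: Iterable[Dict[str, str]], excluded: Set[str]) -> List[Dict[str, str]]:
    """Keep only records that raise none of the flags the user chose to exclude."""
    return [r for r in records if not (flag_record(r) & excluded)]
```

A user who only cares about usable coordinates would call filter_records(records, {"COORDINATE_INVALID"}) and still receive records that lack an event date. Keeping flags separate from the filter lets each user decide which criteria matter for their purpose.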

Definitions of criteria coming from the fitness-for-use working groups are to be integrated into the process. At the same time, procedures for the endorsement or tagging of (reference) datasets by Nodes, working groups and other stakeholders are to be defined, seeking a broader basis for understanding quality and recognition.

*Required components*
- consolidate a list of quality markers / criteria that are already well-understood and defined
- specify useful formats for working group outcomes, to help focus their effort and allow implementation
- establish or connect with fitness-for-use working groups (for details, see relevant epic)
- IPT: include checking routines for key data elements, standards compliance and metadata completeness according to a) the list of available criteria and b) the first working group recommendations
- portal: handle data fitness and quality criteria in metadata overview, searches and downloads
- harvester: include identified checking routines for issues (including missing elements or content) and outliers. Potentially allow reporting of identified issues back to the dataset curator.
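
As a rough illustration of how the harvester and portal components could fit together, the sketch below aggregates issues recorded per occurrence into a per-species summary of the kind a portal report might display. It assumes that each harvested record already carries a species name and a list of issue codes; the record layout, the key names and the function species_report are hypothetical and not part of any existing GBIF component.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Mapping


def species_report(records: Iterable[Mapping]) -> Dict[str, dict]:
    """Summarise issue counts per species.

    Each record is assumed to look like
    {"species": "<name>", "issues": ["<ISSUE_CODE>", ...]}.
    """
    totals = Counter()                  # records seen per species
    clean = Counter()                   # records without any issue per species
    issue_counts = defaultdict(Counter) # per species: issue code -> record count

    for rec in records:
        species = rec["species"]
        issues = set(rec.get("issues") or ())  # count each issue once per record
        totals[species] += 1
        if not issues:
            clean[species] += 1
        for issue in issues:
            issue_counts[species][issue] += 1

    return {
        species: {
            "records": n,
            "records_without_issues": clean[species],
            "issue_counts": dict(issue_counts[species]),
        }
        for species, n in totals.items()
    }
```

The same counts, grouped by dataset instead of species, could serve as the basis for reporting identified issues back to the dataset curator.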

*Note*
Like GBIF-7, this epic is huge and requires further breakdown. For the 2014 WP, realistic "done" criteria need to be specified, while keeping notes about additional components for 2015+. It may not be possible to include working group outcomes within 2014, depending on how quickly the working group(s) can be established and come back with recommendations that are specific enough for implementation.