Issue 18642

Skip crawling occurrence records from datasets without a supported license

Reporter: kbraak
Assignee: cgendreau
Type: NewFeature
Summary: Skip crawling occurrence records from datasets without a supported license
Priority: Unassessed
Status: Open
Created: 2016-07-14 17:44:52.159
Updated: 2016-08-29 17:25:02.582
        
Description: The following classes of datasets can have occurrence records:
* occurrence
* checklist (that can have associated occurrences)
* sampling event (that can have associated occurrences)

We need to skip crawling occurrence records from datasets that do not have a GBIF-supported license assigned to them.

For new datasets, after downloading the dataset, crawling must determine whether the dataset has a supported license by parsing the dataset metadata for a machine-readable license. _Note that this step isn't necessary if the machine-readable license is parsed at the time of registration - see POR-3133._

Similarly for existing datasets, crawling should determine if the dataset's license has changed. If a supported license was previously assigned to the dataset (e.g. assigned manually by the data manager via the registry console) and no machine-readable license was supplied, proceed with crawling and do not override the license.
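
The decision rules above could be sketched roughly as follows. This is only an illustration, not the crawler's actual code: the names `SUPPORTED_LICENSES`, `CrawlOutcome` and `decide_crawl` are hypothetical, and the license identifiers are placeholders. The issue does not say what should happen when the metadata supplies a machine-readable license that is not supported, so the sketch assumes the occurrence crawl is skipped and the registered license is left untouched in that case.

```python
from dataclasses import dataclass
from typing import Optional

# Placeholder identifiers (assumption); the real list of supported
# licenses lives in the registry, not in this sketch.
SUPPORTED_LICENSES = frozenset({"CC0_1_0", "CC_BY_4_0", "CC_BY_NC_4_0"})


@dataclass
class CrawlOutcome:
    crawl_occurrences: bool           # should occurrence records be crawled?
    license_to_keep: Optional[str]    # license the registry holds afterwards


def decide_crawl(parsed_license: Optional[str],
                 registered_license: Optional[str]) -> CrawlOutcome:
    """Hypothetical helper applying the rules from the description.

    parsed_license:     machine-readable license found in the downloaded
                        metadata, or None if none was supplied.
    registered_license: license already assigned in the registry (for
                        example manually by a data manager), or None.
    """
    if parsed_license is None:
        # No machine-readable license supplied: proceed only if a supported
        # license was previously assigned, and do not override it.
        ok = registered_license in SUPPORTED_LICENSES
        return CrawlOutcome(ok, registered_license)

    if parsed_license in SUPPORTED_LICENSES:
        # New dataset, or an existing dataset whose license changed to
        # another supported one: crawl and record the parsed license.
        return CrawlOutcome(True, parsed_license)

    # Assumption (not stated in the issue): an unsupported machine-readable
    # license means the occurrence crawl is skipped.
    return CrawlOutcome(False, registered_license)
```

For example, `decide_crawl(None, "CC0_1_0")` proceeds with crawling and keeps the manually assigned license, while `decide_crawl(None, None)` skips the occurrence records.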

    


Author: ahahn@gbif.org
Comment: There is an issue here: the license implementation only applies to occurrence datasets, and in particular not to checklists. We therefore cannot expect accepted licenses for checklist datasets, but we should not exclude their occurrence records from being indexed. Does this have any implications for how license assignment is implemented?
Created: 2016-07-22 12:07:07.759
Updated: 2016-07-22 12:07:07.759


Author: kbraak@gbif.org
Created: 2016-07-22 16:54:18.567
Updated: 2016-07-22 16:54:18.567
        
[~ahahn@gbif.org]

Since IPT v2.2 was released in March 2015, the IPT has prevented registration of any dataset that was not assigned a GBIF-supported license. It is important to note that this requirement has always applied both to occurrence datasets and to datasets with associated occurrence records, such as checklists and sampling event datasets.

Additional consultation will be needed with non-IPT publishers of checklists and sampling event datasets who want to publish occurrence records associated with them. To be fair, I agree that we will have to relax our requirements for these publishers and allow their occurrence records to be indexed until that consultation is completed.

Until we can ensure that all datasets (of type occurrence or with associated occurrence records) have a supported license, we will not be able to ensure that all indexed occurrence records also have a supported license (see POR-3146).