Issue 18611

Ensure all datasets are assigned a license

18611
Reporter: kbraak
Assignee: kbraak
Type: NewFeature
Summary: Ensure all datasets are assigned a license
Priority: Unassessed
Resolution: Fixed
Status: Closed
Created: 2016-06-24 16:58:36.326
Updated: 2016-08-29 17:27:50.659
Resolved: 2016-08-29 17:27:50.625
        
Description: * Do NOT add @NotNull to the license field on Dataset (model object in gbif-api), to maintain backwards compatibility in our API. Otherwise existing consumers of our API would have to ensure that Datasets have a license whenever they change them.
* In the registry-ws:
** Ensure a license is set on the dataset prior to executing create/update dataset operations, defaulting to CC_BY_4_0 when the license hasn't been specified or cannot be parsed. Note this applies to all dataset types: occurrence, checklist, sampling event and metadata only.
*** Try to parse license from EML when adding a new Endpoint (with type EML) to the dataset, or when adding a new preferred EML metadata document.
*** When creating datasets, if the license was successfully parsed but isn't a GBIF-supported license, set it to UNSUPPORTED.
*** When updating datasets, if the license was successfully parsed but isn't a GBIF-supported license, do not override the existing license. Log the URI or title that was interpreted as UNSUPPORTED. Later, use the information from these logs to make LicenseParser smarter.
** Update legacy (IPT) API accordingly and ensure IPTs can still register and update datasets.
* In the registry database:
** Add a license enum (liquibase changeset - registry-ws)
** Add license field to dataset table of type license enum (liquibase changeset - registry-ws)
** Insert license for each dataset in dataset table using SQL script, defaulting to UNSPECIFIED if license is unknown. ([~ahahn] to provide the list upon demand)
* In the registry-metadata:
** Parse machine readable license in EML (and DC) and set license on dataset
** Do not support free-text rights statements any more.
* In registry-metasync:
** Try to parse machine readable license specified in metadata responses for TAPIR and BioCASE.
** Do not support free-text rights statements any more.
** Continue creating new datasets published from DiGIR, BioCASE, and TAPIR using default license CC_BY_4_0.
** Continue updating datasets, but only override existing license when a supported license is detected.
* In registry-ws-client:
** Update tests accordingly
* In registry-examples:
** Update tests accordingly
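
The create/update rules listed for registry-ws could be sketched roughly as follows. This is only an illustration, not the registry-ws implementation: the class name, the method names and the nested License enum are hypothetical stand-ins (the real enum lives in gbif-api), and it assumes the parser reports "not specified or not parseable" as null or UNSPECIFIED. It also assumes that an update without a parseable license keeps the existing one, which the description above does not state explicitly.

```java
import java.util.logging.Logger;

// Illustrative sketch only; all names here are hypothetical, not registry-ws code.
final class LicenseResolutionSketch {

  private static final Logger LOG =
      Logger.getLogger(LicenseResolutionSketch.class.getName());

  /** Hypothetical stand-in for the License enum in gbif-api. */
  enum License {
    CC0_1_0, CC_BY_4_0, CC_BY_NC_4_0, UNSPECIFIED, UNSUPPORTED;

    /** True only for a concrete, GBIF-supported license. */
    boolean isSupported() {
      return this != UNSPECIFIED && this != UNSUPPORTED;
    }
  }

  static final License DEFAULT_LICENSE = License.CC_BY_4_0;

  /** Assumption: null or UNSPECIFIED means "not specified or could not be parsed". */
  private static boolean isMissing(License license) {
    return license == null || license == License.UNSPECIFIED;
  }

  /** Create: default when missing, otherwise keep the parsed value (may be UNSUPPORTED). */
  static License resolveForCreate(License parsed) {
    if (isMissing(parsed)) {
      return DEFAULT_LICENSE;
    }
    return parsed;
  }

  /**
   * Update: only a supported license overrides the existing one.
   * rawLicense is the URI or title that was parsed, logged when it maps to UNSUPPORTED.
   */
  static License resolveForUpdate(License existing, License parsed, String rawLicense) {
    if (!isMissing(parsed) && parsed.isSupported()) {
      return parsed;
    }
    if (parsed == License.UNSUPPORTED) {
      LOG.warning("License interpreted as UNSUPPORTED, keeping existing: " + rawLicense);
    }
    // Assumption: keep the existing license; fall back to the default only if none is set.
    return isMissing(existing) ? DEFAULT_LICENSE : existing;
  }
}
```

With this shape, a dataset that previously got UNSUPPORTED still moves to a supported license as soon as one is parsed from a new metadata document, while an unsupported or missing license on update leaves the stored value untouched.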

    


Author: ahahn@gbif.org
Comment: Concerning "Reject update dataset operation (...)": we need some pathway that allows adding a different (supported) license to an already registered dataset that previously got UNSUPPORTED. Is the intention here to keep that process fully manual, or what is intended with "should also apply when trying to update a dataset by inserting a new preferred EML metadata document"?
Created: 2016-07-22 12:04:35.196
Updated: 2016-07-22 12:04:35.196


Author: ahahn@gbif.org
Comment: To consider: the license implementation only concerns occurrence datasets, not checklists - how best to handle this in the application of licenses?
Created: 2016-07-22 12:12:00.778
Updated: 2016-07-22 12:12:00.778


Author: kbraak@gbif.org
Created: 2016-07-22 16:27:36.867
Updated: 2016-07-22 16:27:36.867
        
Thanks for your feedback [~ahahn@gbif.org]

I believe we should try to automate the process of updating licenses on datasets as much as possible.

For example, a DwC-A dataset could have its license changed by uploading a new metadata document that specifies a new machine readable license. We should be able to support these types of changes rather than wait for a helpdesk manager to make them manually, as this would be a bottleneck in the workflow.

I did some research into support for machine readable licenses by DiGIR, BioCASE, and TAPIR. Apparently ABCD 2.06 also has support for machine readable licenses. Therefore we could also try to automatically parse licenses from BioCASE and TAPIR datasets using ABCD 2.06. Here are my findings for legacy protocols' support for machine readable licenses:
* *DiGIR* metadata response has no license (see [example DiGIR metadata response|http://digir.sourceforge.net/prot/current/metadataResponseExample.xml]). Therefore new DiGIR datasets will have to be evaluated manually.

* *BioCASE*:
** has support for machine readable licenses in its metadata via its [License|http://www.bgbm.org/TDWG/CODATA/Schema/ABCD_2.06/HTML/ABCD_2.06.html#element_License_Link031DA3E8] element. It is not mandatory, and BioCASE publishers would need to be trained to populate it properly by specifying one of the three GBIF supported licenses - assuming we want to support machine readable licenses for ABCD.
** can also export data to DwC-A, however, its EML.xml has no intellectualRights block, and thus currently has no possibility of specifying a license. Adding a machine readable license to the exports' EML.xml would be a nice enhancement. BioCASE DwC-As will get treated the same as other DwC-As during indexing, so more follow up is needed with BioCASE developers and BioCASE publishers to ensure machine readable licenses are used.
* *TAPIR* installations have support for a variety of standards: DwC 1.0, DwC 1.4, DwC 1.4 Geospatial, DwC 1.4 Curatorial, ABCD 1.2, ABCD 2.06. Only ABCD 2.06 supports machine readable licenses (see above) via the [License|http://www.bgbm.org/TDWG/CODATA/Schema/ABCD_2.06/HTML/ABCD_2.06.html#element_License_Link031DA3E8] element. This element isn't mandatory, and TAPIR/ABCD 2.06 publishers would need to be trained to use it properly by specifying one of the three GBIF supported licenses - assuming we want to support machine readable licenses for ABCD.

Thanks for highlighting that the round of consultation with publishers only covered occurrence datasets. Checklists and sampling event datasets could specify a machine readable license in their EML.xml file. Otherwise their license will be set to UNSPECIFIED. More details about how to handle the removal of occurrence records associated with checklists and sampling event datasets without a supported license will be discussed in POR-3146.
    


Author: kbraak@gbif.org
Comment: Work completed. Closing issue. 
Created: 2016-08-29 17:27:50.655
Updated: 2016-08-29 17:27:50.655