Issue 18494

Override or Interpret dc:license

18494
Reporter: kbraak
Assignee: fmendez
Type: Bug
Summary: Override or Interpret dc:license
Priority: Unassessed
Status: Open
Created: 2016-05-24 15:36:51.554
Updated: 2016-08-29 17:26:21.672
        
Description: Currently we do not show dc:license on the occurrence detail page even if it is populated, e.g. http://www.gbif.org/occurrence/1234530904

dc:license replaces the deprecated term dc:rights. See [here|https://github.com/gbif/ipt/blob/master/v2.3-spec.md#deprecated-terms] for a list of all DwC terms that were recently deprecated.

With regards to dc:license, GBIF.org needs to either:

1. Override user-supplied values for dc:license, with the license that was applied to the whole dataset. Note machine-readable licences can only be applied to the dataset in the EML document, or manually.

2. Interpret user-supplied values for dc:license, and show them on the occurrence detail page, provided that they aren't more restrictive than the license applied to the whole dataset. Note that dc:license is free text, so if we try to interpret it we will have to support a range of licenses written in different formats and with different version numbers, e.g.

* http://creativecommons.org/publicdomain/zero/1.0/legalcode
* http://creativecommons.org/publicdomain/zero/1.0
* CC0
* CC Zero
* CC BY 2.0, CC BY 3.0, CC BY 4.0
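To give a sense of what interpreting dc:license (option 2) would involve, here is a minimal Python sketch that maps the free-text variants listed above onto one canonical string per license. The canonical forms ("CC0 1.0", "CC BY 4.0", ...) and the function name normalise_license are hypothetical, and only Creative Commons URLs and short names are covered; anything else returns None and would need flagging.

```python
import re
from typing import Optional

# Creative Commons URLs, e.g.
#   http://creativecommons.org/publicdomain/zero/1.0/legalcode
#   https://creativecommons.org/licenses/by-nc/4.0/
_CC_URL = re.compile(
    r"^https?://(?:www\.)?creativecommons\.org/"
    r"(?:publicdomain/(?P<zero>zero)|licenses/(?P<kind>[a-z-]+))"
    r"/(?P<version>\d\.\d)(?:/.*)?$"
)
# Short CC0 forms: "CC0", "CC Zero", "CC0 1.0".
_CC0_SHORT = re.compile(r"^cc[\s-]*(?:0|zero)(?:[\s-]+1\.0)?$")
# Short forms of the other CC licenses: "CC BY 4.0", "CC-BY-NC 4.0", "CC BY".
_CC_SHORT = re.compile(
    r"^cc[\s-]+(?P<kind>by(?:[\s-]+(?:nc|sa|nd))*)(?:[\s-]+(?P<version>\d\.\d))?$"
)


def normalise_license(raw: str) -> Optional[str]:
    """Map a free-text dc:license value to a canonical name, or None if unrecognised."""
    text = raw.strip().lower()

    m = _CC_URL.match(text)
    if m:
        if m.group("zero"):
            return "CC0 " + m.group("version")
        return "CC " + m.group("kind").upper() + " " + m.group("version")

    if _CC0_SHORT.match(text):
        # CC0 has only been published as version 1.0.
        return "CC0 1.0"

    m = _CC_SHORT.match(text)
    if m:
        # Unify separators so "by nc" and "by-nc" give the same result.
        kind = re.sub(r"[\s-]+", "-", m.group("kind")).upper()
        version = m.group("version")
        # Without a version number the license cannot be pinned down fully.
        return f"CC {kind} {version}" if version else f"CC {kind}"

    return None


for value in ["http://creativecommons.org/publicdomain/zero/1.0/legalcode", "CC Zero", "CC BY 3.0"]:
    print(value, "->", normalise_license(value))
```

Even this small sketch shows the maintenance cost of option 2: every new spelling seen in the wild needs another pattern.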

In the meantime, we should still show dc:license instead of dc:rights on the Occurrence Detail page (see POR-3111, POR-3112).
    


Author: ahahn@gbif.org
Created: 2016-05-25 13:48:23.513
Updated: 2016-05-25 13:48:23.513
        
Some datasets will declare, as their metadata-level license, the most restrictive of the licenses supplied for their individual records, while many of those records carry more open licenses. This would speak against option 1, though it is likely a rare case.

I cannot judge which implementation makes the most sense, but we should make sure that
- the dataset license is, as stated above, not overridden by a more restrictive license within the record
-- I propose that a violation of this should flag a warning at indexing time, so that it can be reported back to the publisher. Erring on the side of caution, the record should not be indexed if it has a license that is unsupported or stricter than the dataset license.
- all occurrence records do carry a license after indexing - if none is stated, then the dataset license is used. A dataset that does not state one of the permitted licenses is not indexed.
- while there is no specification for dc:license, we cannot really enforce any specific format, so trying to interpret it and match its representation to that of the metadata sounds like a reasonable interim solution. Flagging any records that do not match any of the recognised values, and following up by extending the interpretation vocabulary, might be reasonable, but is probably an implementation decision.
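A minimal Python sketch of the indexing-time checks proposed above, assuming record and dataset licenses have already been normalised to canonical strings. The supported set and its ordering (CC0 1.0 most open, CC BY-NC 4.0 most restrictive) are assumptions for illustration, as are the names check_record and RecordDecision.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Assumed supported licenses, ranked from most open (0) to most restrictive.
LICENSE_RANK = {"CC0 1.0": 0, "CC BY 4.0": 1, "CC BY-NC 4.0": 2}


@dataclass
class RecordDecision:
    index: bool                 # should the record be indexed?
    license: Optional[str]      # license the indexed record carries
    warnings: List[str] = field(default_factory=list)  # to report back to the publisher


def check_record(dataset_license: Optional[str], record_license: Optional[str]) -> RecordDecision:
    # A dataset without one of the permitted licenses is not indexed at all.
    if dataset_license not in LICENSE_RANK:
        return RecordDecision(False, None, ["dataset license missing or unsupported"])
    # No record-level license: the record inherits the dataset license.
    if record_license is None:
        return RecordDecision(True, dataset_license)
    # Erring on the side of caution: unsupported record licenses block indexing.
    if record_license not in LICENSE_RANK:
        return RecordDecision(False, None, [f"unsupported record license: {record_license}"])
    # A record may not override the dataset license with a stricter one.
    if LICENSE_RANK[record_license] > LICENSE_RANK[dataset_license]:
        return RecordDecision(
            False,
            None,
            [f"record license {record_license} is stricter than dataset license {dataset_license}"],
        )
    # Same as or more open than the dataset license: keep the record's own license.
    return RecordDecision(True, record_license)


print(check_record("CC BY 4.0", "CC BY-NC 4.0"))
```

Returning warnings alongside the decision keeps the "report back to the publisher" step separate from the indexing decision itself.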
    


Author: kbraak@gbif.org
Created: 2016-07-14 16:42:13.56
Updated: 2016-07-14 16:42:13.56
        
[~ahahn@gbif.org]

Below I summarise how GBIF has decided to handle dc:license.

Note dc:license is now shown on the occurrence detail page following implementation of POR-3112.

Of the two options for implementation listed above, GBIF has decided to implement #1. "Override user-supplied values for dc:license, with the license that was applied to the whole dataset. Note machine-readable licences can only be applied to the dataset in the EML document, or manually."

Because GBIF won't interpret record-level licenses, GBIF won't check whether the license applied to records is more restrictive than the license applied to the dataset. It will be the publisher's responsibility to assign the correct (adequately restrictive) license to the dataset. Of course, GBIF needs to be able to properly parse and interpret dataset-level licenses supplied in machine-readable format in the metadata (see blocking issue POR-3133).
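As a rough illustration of the parsing that POR-3133 is about, the Python sketch below pulls a license URL out of dataset metadata. It assumes the license is given as a link (a ulink element with a url attribute) inside the EML intellectualRights block; the sample document is an assumed, trimmed-down example, other publishing tools may express the license differently, and dataset_license_url is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def _local_name(tag: str) -> str:
    # ElementTree writes namespaced tags as "{uri}name"; keep only "name".
    return tag.rsplit("}", 1)[-1]


def dataset_license_url(eml_xml: str) -> Optional[str]:
    """Return the first ulink URL inside intellectualRights, or None if there is none."""
    root = ET.fromstring(eml_xml)
    for element in root.iter():
        if _local_name(element.tag) != "intellectualRights":
            continue
        for child in element.iter():
            url = child.get("url")
            if _local_name(child.tag) == "ulink" and url:
                return url
    # Free-text rights statements without a link are left for manual handling.
    return None


# Assumed, trimmed-down example of dataset metadata.
SAMPLE_EML = """<eml:eml xmlns:eml="eml://ecoinformatics.org/eml-2.1.1">
  <dataset>
    <title>Example dataset</title>
    <intellectualRights>
      <para>This work is licensed under a <ulink url="http://creativecommons.org/licenses/by/4.0/legalcode"><citetitle>Creative Commons Attribution (CC-BY) 4.0 License</citetitle></ulink>.</para>
    </intellectualRights>
  </dataset>
</eml:eml>"""

print(dataset_license_url(SAMPLE_EML))
```

The extracted URL would still have to be matched against the set of supported licenses before the dataset is accepted.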

Furthermore, only datasets with a supported license will ever be indexed. Therefore, all indexed occurrence records will have a supported license assigned to them. Datasets with an unsupported license (or no license at all) will not be indexed by GBIF.
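The decided behaviour (option 1 override plus the supported-license gate) can be sketched in Python as follows. Records are plain dicts, the license key and the index_dataset function are hypothetical, and the supported set is an assumption for illustration.

```python
from typing import Dict, Iterable, List, Optional

# Assumed set of licenses accepted at dataset level.
SUPPORTED_LICENSES = {"CC0 1.0", "CC BY 4.0", "CC BY-NC 4.0"}


def index_dataset(
    dataset_license: Optional[str], records: Iterable[Dict[str, str]]
) -> Optional[List[Dict[str, str]]]:
    """Return the interpreted records, or None if the dataset must not be indexed."""
    if dataset_license not in SUPPORTED_LICENSES:
        # Unsupported or missing dataset license: nothing is indexed.
        return None
    interpreted = []
    for record in records:
        row = dict(record)  # leave the verbatim record untouched
        # Option 1: whatever dc:license says, the dataset license applies.
        row["license"] = dataset_license
        interpreted.append(row)
    return interpreted


records = [{"occurrenceID": "a", "license": "CC BY 2.0"}, {"occurrenceID": "b"}]
print(index_dataset("CC BY-NC 4.0", records))
print(index_dataset(None, records))
```

Returning None rather than an empty list keeps a rejected dataset distinguishable from a dataset that simply has no records.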


    


Author: ahahn@gbif.org
Comment: Note: the decision summarized above describes the short-term implementation. Longer term, we may still want to evaluate supplied record-level licenses and allow filters and downloads to include records with a more lenient license, even if the dataset as a whole applies a more restrictive one. Likewise, records with licenses more restrictive than the dataset license should ideally be flagged at indexing time. This is, however, not part of the first round of implementation. Publishers should be aware that a license assigned at the dataset metadata level applies to the dataset and all its records.
Created: 2016-07-18 11:40:14.204
Updated: 2016-07-18 11:40:14.204
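For the longer-term idea, a Python sketch of how a download filter could use record-level licenses: a record keeps its own license only when it is supported and no stricter than the dataset license; otherwise the dataset license applies, as in the short-term decision. The ranking, the license key and the function names are assumptions for illustration, and flagging of stricter record licenses is left out.

```python
from typing import Dict, Iterable, List

# Assumed supported licenses, ranked from most open (0) to most restrictive.
LICENSE_RANK = {"CC0 1.0": 0, "CC BY 4.0": 1, "CC BY-NC 4.0": 2}


def effective_license(record: Dict[str, str], dataset_license: str) -> str:
    """Record license if supported and not stricter than the dataset's, else the dataset license."""
    own = record.get("license")
    if own in LICENSE_RANK and LICENSE_RANK[own] <= LICENSE_RANK[dataset_license]:
        return own
    return dataset_license


def filter_by_license(
    records: Iterable[Dict[str, str]], dataset_license: str, most_restrictive_allowed: str
) -> List[Dict[str, str]]:
    """Keep records whose effective license is no more restrictive than the one requested."""
    limit = LICENSE_RANK[most_restrictive_allowed]
    return [r for r in records if LICENSE_RANK[effective_license(r, dataset_license)] <= limit]


# A CC BY-NC 4.0 dataset whose records 1 and 3 carry more open licenses.
records = [{"id": "1", "license": "CC0 1.0"}, {"id": "2"}, {"id": "3", "license": "CC BY 4.0"}]
print([r["id"] for r in filter_by_license(records, "CC BY-NC 4.0", "CC BY 4.0")])
```

With this approach, a user requesting only CC BY 4.0 or more open data could still receive the openly licensed records from a dataset whose overall license is CC BY-NC 4.0.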