Issue 14352

All datasets have clear, machine-readable licences

Reporter: ahahn
Assignee: trobertson
Type: Epic
Summary: All datasets have clear, machine-readable licences
Priority: Major
Status: ReadyForDev
Created: 2013-11-13 14:01:48.092
Updated: 2014-12-11 14:40:08.212
DueDate: 2014-12-31 00:00:00.0
        
Description: [By the end of 2014], each dataset that is indexed into the GBIF portal has a clear, machine-readable licence assigned. The licence is chosen from a limited, pre-agreed set and is agreed to by the data publisher.

*All mobilized data sets are associated with a supported machine-readable data licence: milestone Dec 2014*

*Rationale*
To date, most datasets lack clear, unambiguous machine-readable licences. Users of data are requested to review the individual licence texts and adjust their data use accordingly. In practical terms, this is not realistically possible. In order to allow automated filtering of downloads for e.g. datasets excluding commercial use, licences need to be limited to an agreed, standardized set, and all datasets have to have one of those licences assigned.
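
To illustrate the rationale, here is a minimal Python sketch of the kind of filtering a closed licence set makes possible. The licence codes (CC0, CC-BY, CC-BY-NC) are assumed for illustration, and the `Dataset` class and `usable_for` function are hypothetical names, not part of any GBIF service.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed closed set of licence codes, for illustration only.
SUPPORTED_LICENCES = {"CC0", "CC-BY", "CC-BY-NC"}
# Codes in the assumed set that exclude commercial use.
NON_COMMERCIAL = {"CC-BY-NC"}


@dataclass(frozen=True)
class Dataset:
    """Hypothetical dataset record; not a GBIF registry type."""
    key: str
    licence: Optional[str]  # None models a free-text-only rights statement


def usable_for(datasets, commercial_use):
    """Keep datasets whose licence permits the intended use.

    Datasets without a supported licence are dropped, because a
    machine cannot decide what their rights text allows.
    """
    selected = []
    for d in datasets:
        if d.licence not in SUPPORTED_LICENCES:
            continue
        if commercial_use and d.licence in NON_COMMERCIAL:
            continue
        selected.append(d)
    return selected


datasets = [
    Dataset("a", "CC0"),
    Dataset("b", "CC-BY-NC"),
    Dataset("c", None),
]
print([d.key for d in usable_for(datasets, commercial_use=True)])   # ['a']
print([d.key for d in usable_for(datasets, commercial_use=False)])  # ['a', 'b']
```

With free-text rights statements the first check has nothing to compare against, which is why every dataset needs a licence from the agreed set.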

*Existing work*
Consultation from Aug 2013, call attached. Responses in IMAP folder "licensing", summary list attached, extended summary for circulation/response in comment below.
In total: 31 responses, all in favour of machine-readable licences. NC support Y/N about 33/66%.
Consultation from Apr 2014, resulting in a proposal for decision at GB21 (links to the materials: see comment from 03/Sep/14, below). The Governing Board voted in favour of the core proposal (attached), but not Option 2 of the same document (original at http://livelink.gbif.org/gbif/livelink/overview/4803827); summary at http://www.gbif.org/page/9773

*Required components*
- settle agreement on set of licences to be used - done
- IPT: enforce selection from the set of licences for a dataset registration
- gbif.org: handle licences in filters and downloads, i.a.
- gbif.org: support citations / attribution
- information (guidelines on usage) for data users (e.g. make clear that CC0 still means that the GBIF Data Use Agreement applies)
- community norms to be developed
- guidance on what is considered to be "non-commercial"
- analyse the mapping of current rights statements to the set of supported licences (draft version: http://livelink.gbif.org/gbif/livelink/overview/4894766)
- communication with publishers to inform of the background, action required, and alternatives in case of disagreement
- follow-up communication with any publisher who did not assign a valid licence by [deadline 1]
- potential follow-up action on datasets without an assigned licence by [deadline 2]
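
The mapping analysis in the list above could start from a conservative matcher like the sketch below (Python). The patterns and the three licence codes are assumptions for illustration and do not reproduce the draft mapping linked above; `map_rights` is a hypothetical helper. Any statement that does not match exactly one supported licence is left unmapped, since interpreting licence equivalents needs expert input.

```python
import re
from typing import Optional

# Assumed supported set and match patterns, for illustration only.
# The lookaheads stop variants such as CC-BY-SA or CC-BY-NC-ND from
# being treated as equivalent to a supported licence.
_PATTERNS = [
    (re.compile(r"creativecommons\.org/publicdomain/zero/|\bcc0\b", re.I),
     "CC0"),
    (re.compile(r"creativecommons\.org/licenses/by-nc/"
                r"|\bcc[ -]by[ -]nc\b(?![ -](?:sa|nd)\b)", re.I),
     "CC-BY-NC"),
    (re.compile(r"creativecommons\.org/licenses/by/"
                r"|\bcc[ -]by\b(?![ -](?:nc|sa|nd)\b)", re.I),
     "CC-BY"),
]


def map_rights(statement: str) -> Optional[str]:
    """Return a supported licence code, or None if the text is ambiguous."""
    matches = {code for pattern, code in _PATTERNS if pattern.search(statement)}
    # Exactly one hit is a confident mapping; zero or several need a human.
    return matches.pop() if len(matches) == 1 else None


statements = {
    "ds1": "http://creativecommons.org/publicdomain/zero/1.0/",
    "ds2": "Licensed under CC-BY-NC 4.0",
    "ds3": "Free for research use; contact the curator",
}
mapped = {key: map_rights(text) for key, text in statements.items()}
# Datasets left unmapped are candidates for follow-up with the publisher.
follow_up = [key for key, code in mapped.items() if code is None]
print(mapped)      # {'ds1': 'CC0', 'ds2': 'CC-BY-NC', 'ds3': None}
print(follow_up)   # ['ds3']
```

Returning None rather than a best guess keeps uncertain cases visible, so they end up in the publisher follow-up steps instead of being relicensed silently.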

*Risks*
Legal implications (successful consultation with Nodes is critical; the interpretation of licence equivalents requires some expert input; the drafting of final agreements (incl. Data Use and Data Sharing) should get legal input to ensure Nagoya-compliance)
    
Attachment GB21_PRE_20_Options_in_response_to_consultations.docx
Attachment GBIF_Consultation_Standard_Data_Licences.pdf
Attachment LicensingReponses.xlsx


Author: ahahn@gbif.org
Created: 2013-11-18 11:33:33.734
Updated: 2013-11-18 11:34:34.527
        
Check with Peter D. on https://github.com/Datafable/gbif-data-licenses/blob/master/data/licenses.csv, https://raw.github.com/Datafable/gbif-data-licenses/master/data/joined_data.csv, https://github.com/Datafable/gbif-data-licenses

    


Author: trobertson@gbif.org
Created: 2014-03-19 17:23:38.566
Updated: 2014-03-19 17:23:38.566
        
In August 2013, GBIF initiated a call for consultation on the licensing of data. The call is attached (Word doc) and concisely summarizes the issues.

In the call, respondents were asked to comment on the need for machine-readable licenses, which licenses should be considered, and whether GBIF should implement a solution to better differentiate between non-commercial (NC) and commercial uses.

32 responses were received, ranging from the thoughts of individuals to official responses from institutions and working groups; some of them are very detailed and clearly from domain experts. The attached spreadsheet lists the respondents. Many of the responses included significant commentary, e.g. http://dx.doi.org/10.6084/m9.figshare.799766

There are 2 key messages emerging from the responses, which I paraphrase here:


1) Around 1/3 of respondents believe the only suitable license is the CC0 waiver of rights:

- Copyright law does not apply to data (these are factual data for the most part so excluded)
- Data use agreements / contracts are costly, and stifle use and reproducibility (e.g. archiving for future scientists to rerun procedures)
- Non-commercial restrictions (if they could even be applied) inhibit scientific use, since most journals / data repositories engage in commercial transactions.
- Data use cannot possibly be controlled (“how can you prove someone acted on the information for commercial benefit?”)
- Enforcing anything requires expensive litigation processes

          Recommendations from this body are to:
              1. Promote, educate and use CC0 licensing (waive rights)
              2. Adopt “Norms” by defining the clear best practices for citation, relying on a self-policing community
              3. Focus on tools to assist in citation, and infrastructure to track use


2) Around 1/3 of respondents believe that restrictions for commercial use are indeed needed; several believing, rightly or wrongly, that CC-BY-NC was the way to achieve this.

- Many agree that defining what constitutes commercial use is difficult, and requires clear explanation
- Some specifically referred to the concern being limited only to large-scale reuse generating significant revenue to a commercial venture.



This is obviously going to require some discussion, but based on what I have learnt here I tend to think we should:

- Implement our plan to issue a DOI for the citation during data access – many concerns were really about getting better attribution.
- Draft community “Norms” to give guidance to those publishing data. Canadensys Norms could form the basis for this: http://www.canadensys.net/about/norms
- Educate the GBIF community about licenses; many don’t know, don’t care or have misconceptions.
- Make a decision if we are going to embrace commercial restrictions or not.  If we decide to, then we need to:
  - Document what is considered “commercial use”
  - Determine the means for allowing publishers to declare their intent is “not for commercial use”.  CC-BY-NC might be applicable for this, but is by no means definitive (see http://www.pensoft.net/J_FILES/1/articles/2189/2189-G-3-layout.pdf for a good discussion on this).
  - Add a filter to the portal to limit these data on download based on this flag

One way could be to allow people to use CC-BY-NC for this purpose with clear guidance in the “Norms” that it is unlikely to stand up to a legal challenge – however, GBIF would not further disseminate CC-BY-NC data through (e.g.) the portal unless people tick the “not commercial use” box.


If anyone else has been involved in this process and has other comments, suggestions or ideas, I would gratefully receive them.

Thanks,
Tim
    


Author: trobertson@gbif.org
Comment: quick summary of the responses
Created: 2014-03-19 17:24:06.809
Updated: 2014-03-19 17:24:06.809


Author: ahahn@gbif.org
Created: 2014-09-03 11:11:55.378
Updated: 2014-09-03 11:12:53.892
        
Second consultation round April 2014:
Call: http://livelink.gbif.org/gbif/livelink?func=ll&objaction=overview&objid=4662029
Summary of responses: http://livelink.gbif.org/gbif/livelink?func=ll&objaction=overview&objid=4763660
Consultation summary (July 2014, both licensing and endorsement consultations): http://livelink.gbif.org/gbif/livelink?func=ll&objaction=overview&objid=4766941
Links to public materials: http://www.gbif.org/newsroom/consultations#licensing
    


Author: ahahn@gbif.org
Comment: Newsroom publication summarizing the GB21 decisions: http://www.gbif.org/page/9773
Created: 2014-10-20 12:14:02.513
Updated: 2014-10-20 12:14:02.513


Author: ahahn@gbif.org
Created: 2014-12-11 14:40:08.212
Updated: 2014-12-11 14:40:08.212
        
IT side changes:
- registry changes
- licences on all records in the occurrence store (-> Hive)
- SOLR index
- filters
- downloads
- IPT