Issue 14341

Data usage reports

Reporter: ahahn
Type: Task
Summary: Data usage reports
Priority: Major
Status: Reopened
Created: 2013-11-08 11:46:44.178
Updated: 2016-06-11 17:41:56.842
        
Description: interpreted issue (ah 13.1.14):
Re-supply the functionality from the old data portal that allowed a data publisher to retrieve data usage reports for individual datasets or all datasets of a given institution. In the new portal environment, this is limited to data downloads. Old portal example: http://data.gbif.org/datasets/provider/66/logs/?resource=&event=3001-3999&logGroup=&logLevel=&sd_day=01&sd_month=01&sd_year=2013&ed_day=31&ed_month=12&ed_year=2013

Merged information from PF-1992 (ah, 3.6.16)

Requested content:
- download event count by date (registry information)
- download record volume count contributed per dataset by date, and aggregated by publisher (registry information)
- graphs corresponding to the above by time (month, year)
- source of download request (country of user) - would technically be doable as we require countries at user registration time. Privacy issues need to be observed.
- the search parameters that the download responded to (registry information)
- key information to identify downloaded records (ids, scientific name, download time)

(- grouping records by kingdom, country of occurrences or other occurrence properties is far more demanding and we cannot promise that this will be implemented, but it might still be interesting to know if any of those questions are high on the shopping list of report users)
(- getting stats about data access other than downloads (e.g. portal page views or external use of our non-download web services) is a completely new endeavour for us which we should avoid at this stage)
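
Illustration only: a minimal Python sketch of the first two requested aggregations (download event count by month, and record volume per dataset by month, rolled up by publisher). The record shape DownloadContribution and all of its field names are hypothetical; they are not taken from the registry and only stand in for "one dataset's share of one download event".

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class DownloadContribution:
    """One dataset's share of one download event (hypothetical shape)."""
    download_key: str    # identifies the download event
    day: date            # date the download was made
    dataset_key: str     # dataset that contributed records
    publisher_key: str   # publisher owning that dataset
    record_count: int    # records this dataset contributed


def events_by_month(rows):
    """Count distinct download events per (year, month)."""
    seen = defaultdict(set)
    for r in rows:
        seen[(r.day.year, r.day.month)].add(r.download_key)
    return {month: len(keys) for month, keys in sorted(seen.items())}


def records_by_dataset_and_month(rows):
    """Sum contributed records per (dataset, year, month)."""
    totals = defaultdict(int)
    for r in rows:
        totals[(r.dataset_key, r.day.year, r.day.month)] += r.record_count
    return dict(totals)


def records_by_publisher(rows):
    """Sum contributed records per publisher across all its datasets."""
    totals = defaultdict(int)
    for r in rows:
        totals[r.publisher_key] += r.record_count
    return dict(totals)
```

Counting distinct download keys rather than rows matters here, because a single download that touches several datasets would otherwise be counted once per dataset.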

Interface considerations:
- allow selecting a dataset and/or a publisher, or integrate into the stats tabs of these entities
- allow setting a date range for the reporting period
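
Illustration only: a minimal Python sketch of the selection described in the interface considerations above (optional dataset and/or publisher, plus an inclusive date range). UsageRow, select_rows and their parameters are hypothetical names chosen for this sketch, not an existing API.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class UsageRow:
    """One usage entry for a dataset on a given day (hypothetical shape)."""
    day: date
    dataset_key: str
    publisher_key: str
    record_count: int


def select_rows(
    rows: Iterable[UsageRow],
    start: date,
    end: date,
    dataset_key: Optional[str] = None,
    publisher_key: Optional[str] = None,
) -> List[UsageRow]:
    """Return rows inside [start, end] that match the optional filters."""
    if start > end:
        raise ValueError("start must not be after end")
    selected = []
    for r in rows:
        if not (start <= r.day <= end):
            continue  # outside the reporting period
        if dataset_key is not None and r.dataset_key != dataset_key:
            continue  # a dataset was chosen and this row is not from it
        if publisher_key is not None and r.publisher_key != publisher_key:
            continue  # a publisher was chosen and this row is not from it
        selected.append(r)
    return selected
```

Leaving both keys unset returns everything in the date range, which would correspond to a report across all datasets.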

Use cases:
- Participant reporting (e.g. Annual Reports)
- prioritization of digitization and data quality improvement efforts

------
original requests:

As a data provider, we are interested in tracking usage of our data, namely:
- downloads by real users, but also
- data usage through gbif portal browses
- data usage through gbif web services

This applies whether requests are made by real users of the GBIF portal or by any kind of bot that provides data through other pages. We are interested in ALL data usage.

In this scenario, what will be the way to collect these usage data?
Currently we use http://data.gbif.org/datasests/resource/[resourceID]/logs

We are very happy with the information provided in that link, but:
- Will that page be available in the future?
- Does it also reflect data usage through the new data portal?
- If not, how can I sum up both logs (old + new data portals) in order to generate our 2013 annual report?

In summary, how can I get the same info I used to get from a link like http://data.gbif.org/datasests/resource/[resourceID]/logs?

(received via email from David, ES)

follow-up 13.1.14:

So we wonder if:

1) you can provide the information about each downloaded record (at least scientificname + institutioncode + collectioncode + catalognumber + download datetime). Perhaps in a separate sheet of the same xls file?

2) As a new (but useful) thing I would suggest the downloads sheet could include information about the country which the info was downloaded from (based on IP).

What for?
We are pretty interested in knowing what info is requested from Spain: this way we could give priority to providing more accurate information about the taxa (see 1) that are being studied by researchers from our own country (see 2).
For example, selecting specimens to digitize (making new high-resolution scanned images accessible through the ImageURL dwc concept).

(further request: see http://dev.gbif.org/issues/browse/PF-1992, merged above)
    


Author: kbraak@gbif.org
Comment: Duplicates POR-1442
Created: 2014-01-09 15:30:35.498
Updated: 2014-01-09 15:30:35.498


Author: ahahn@gbif.org
Comment: Reopening this issue as the main tracker for data usage report implementation for individual datasets and publishers, to keep this separate from POR-1442 (comparative statistics across Participants)
Created: 2014-01-13 11:19:30.664
Updated: 2014-01-13 11:25:10.126


Author: kbraak@gbif.org
Comment: [~ahahn@gbif.org] PF-1992 duplicates this issue. There are some valuable comments in this issue though. Can you please review those comments, merge them into this issue, and close PF-1992? Thanks.
Created: 2016-06-03 11:17:01.226
Updated: 2016-06-03 11:17:01.226


Author: ahahn@gbif.org
Comment: Suggested next steps (TH, 3.6.16 via email): summarize this as concrete proposals for implementing this reporting service, based on needs that we are already aware of (double-checking implementation cost). Then we could have a short consultation phase (probably through a nodes communication) to check whether this meets expectations and maybe get ideas for tweaks or enhancements.
Created: 2016-06-03 16:27:42.276
Updated: 2016-06-03 16:27:51.341


Author: kbraak@gbif.org
Comment: We also need to track the usage of datasets in papers. I scope out how we can implement this in POR-3120.
Created: 2016-06-11 17:41:56.842
Updated: 2016-06-11 17:41:56.842