Issue 18155

Citizen science statistics

Reporter: kylecopas
Assignee: jlegind
Type: Task
Summary: Citizen science statistics
Priority: Critical
Resolution: Fixed
Status: Closed
Created: 2016-01-14 16:57:50.868
Updated: 2016-04-04 16:17:48.132
Resolved: 2016-04-04 16:17:48.073
        
Description: Okay, at long last, you have it in writing.

Attached is an edited version of the datasets report you gave us on Tuesday, Jan.


The legend of the colour-coding goes like this.
• Red = remove, exclude; certain not to include, or unlikely to include, any significant cit-sci contribution.
• Yellow = proposed for inclusion in the ‘upper bound’ version of the statistics. Interpretation from metadata or other sources suggests the inclusion of some notable but indeterminate proportion of cit-sci records.
• Green or blank = include in both upper and lower bound calculations; solidly connected to cit-sci

We have no additions to the datasets you've flagged, only deletions (red) and qualifications (yellow, for upper bound calculations only).

Once you've got them, we'd like upper- and lower-bound totals for the remaining datasets calculated by
1. country
2. higher level taxa per the country reports (e.g., http://www.gbif.org/country/CR/report), PLUS ONE ADDITIONAL TAXON: the order Lepidoptera

We ask for this additional taxon because of the persistent but perhaps anecdotal claim that butterflies, moths, and skippers are better represented in citizen science.
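
To make the request concrete, here is a minimal sketch of how the two bounds could be tallied from the colour codes above. It assumes each dataset row carries a colour flag, a country code and a record count; the row layout, dataset names and figures are hypothetical and not taken from the attached spreadsheet.

```python
from collections import defaultdict

# Hypothetical rows: (dataset, colour flag, country, record count).
DATASETS = [
    ("dataset-a", "green", "DK", 1000),
    ("dataset-b", "yellow", "DK", 400),
    ("dataset-c", "red", "DK", 250),
    ("dataset-d", "", "CR", 300),       # blank is treated like green
    ("dataset-e", "yellow", "CR", 50),
]


def bounds_by_country(rows):
    """Return {country: (lower_bound, upper_bound)} record totals."""
    lower = defaultdict(int)
    upper = defaultdict(int)
    for _name, colour, country, count in rows:
        colour = colour.strip().lower()
        if colour == "red":
            continue  # red datasets are excluded from both bounds
        upper[country] += count  # green, blank and yellow all count here
        if colour in ("green", ""):
            lower[country] += count  # only solidly cit-sci datasets
    return {country: (lower[country], upper[country]) for country in upper}


totals = bounds_by_country(DATASETS)
```

The same loop could be keyed on a (country, taxon) pair instead of country alone to cover the second breakdown.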

Happy to help refine this as and when needed—tusind tak!

Kyle
    
Attachment citsci_adds-160218.xlsx
Attachment citsci_Feb16_v2.xlsx
Attachment citsci_Jan-2016-REV1.xlsx


Author: kylecopas
Created: 2016-02-20 14:33:37.728
Updated: 2016-02-20 14:33:37.728
        
UPDATED FOR REPROCESSING.

The new spreadsheet includes four worksheets containing datasets that need to be added to our most recent citizen science analyses or, in the case of 'NBN-REMOVE', removed from them.

The working definition guiding this analysis covers datasets generated in part by activities in which "Volunteer(s), who are not necessarily experts, collect and/or process data as part of scientific enquiry".

The sources are (in order):
• Datasets that NBN identifies as containing citizen science contributions ('NBN' tab)
• Previously included datasets that should be removed from the analyses, because NBN reports they are unlikely to contain cit-sci records (NBN-REMOVE)
• Datasets classified as citizen science in two newly available sources: 1) the supplementary data from Quentin Groom’s manuscript ("Is Citizen Science Open Science?"), now under review; 2) Citizen Science providers/resources with species occurrence data in Europe, a list compiled by NBIC as part of EUBON work package 1 and received from Nils Valland (Groom-NBIC)
• Datasets from the GBIF publisher GEO-Tag der Artenvielfalt (also identified by Groom)

In addition, the ALA dataset for Eremaea should be removed from these analyses, as Miles Nicholls reports that it is now duplicated by the eBird EOD (see detail in PF-2359, http://dev.gbif.org/issues/browse/PF-2359).

Given the time lag between the analyses, I'd recommend that we recompile the datasets for a fresh run rather than making these adjustments and then adding them to previous ones. I'd also request the inclusion of the datasets' DOIs, even if they're in addition to a URL or datasetkey.
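
As a rough illustration of what recompiling for a fresh run could look like, the sketch below merges the previous dataset list with the additions and then drops the removals in a single pass. The function name and the keys are hypothetical placeholders, not real datasetkeys.

```python
def recompile(previous, additions, removals):
    """Return the sorted, de-duplicated dataset keys for a fresh run."""
    return sorted((set(previous) | set(additions)) - set(removals))


# Hypothetical keys standing in for real datasetkeys.
previous = ["key-1", "key-2", "key-eremaea"]
additions = ["key-3", "key-2"]        # a repeated key collapses to one entry
removals = ["key-eremaea", "key-9"]   # keys not in the list are ignored

fresh = recompile(previous, additions, removals)
```

Working from one recompiled list avoids having to reconcile totals from runs made at different times.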

Finally, I'd also like to ask that we explore the option of creating a DOI for the compiled datasets, on the thought that it may provide an efficient means of replicating and sharing the data behind our analysis, particularly, say, as supplementary material for the manuscript.
    


Author: kylecopas
Comment: Excel doc includes dataset additions and deletions for first analyses 
Created: 2016-02-20 14:43:57.999
Updated: 2016-02-20 14:43:57.999


Author: kylecopas
Comment: Clean sheets for citsci data analyses
Created: 2016-02-29 16:39:43.058
Updated: 2016-02-29 16:39:43.058


Author: kylecopas
Created: 2016-02-29 16:39:45.734
Updated: 2016-03-04 15:23:43.183
        
Okay, here we go.

Updated Excel doc has two worksheets containing datasets for inclusion in this analysis. First sheet contains all datasets (280) to include in this analysis except for the 1,192 from GEO-Tag. Second sheet contains the GEO-Tag datasets.

In addition, here are details on the desired facets of the analysis.

1. Total citsci occurrences, based on all 1,472 identified datasets

2. Total citsci occurrences by publishing country

3. Citsci occurrences by publishing country by the following taxa
    KINGDOM
    Animalia
    Plantae
    Fungi
    Other (Protozoa + Virus + Chromista + Archaea + Unknown)

    SUB-KINGDOM
{noformat}
    (class)

    Lepidoptera
    Aves
    Mammalia
    Osteichthyes (Actinopterygii + Sarcopterygii + Tetrapoda)
    Amphibia
    Insecta
    Reptilia
    Arachnida
{noformat}
{noformat}
    (phyla)

    Magnoliophyta
    Gymnospermae (Pinophyta/Coniferophyta, Ginkgophyta, Cycadophyta, Gnetophyta [Gnetum, Ephedra, Welwitschia])
    Pteridophyta
    Bryophyta
    Ascomycota
    Basidiomycota
    Mollusca
{noformat}
4. Citsci occurrences by country (record location)

5. Citsci occurrences by country (location) by the following taxa
    KINGDOM
    Animalia
    Plantae
    Fungi
    Other (Protozoa + Virus + Chromista + Archaea + Unknown)

    SUB-KINGDOM
    Aves
    Lepidoptera

6. Total citsci occurrences by continent (Africa, Asia, Europe, North America, Oceania, South America)

7. Citsci occurrences by continent by the following taxa:
    KINGDOM
    Animalia
    Plantae
    Fungi
    Other (Protozoa + Virus + Chromista + Archaea + Unknown)

    SUB-KINGDOM
    Aves
    Lepidoptera
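
For the kingdom-level facets, a minimal sketch of the roll-up into the four reporting buckets might look like the following. It assumes each occurrence record can be reduced to a (continent, kingdom) pair; the function names and sample records are hypothetical.

```python
from collections import Counter

# Kingdoms reported on their own; everything else rolls up into "Other".
MAIN_KINGDOMS = {"Animalia", "Plantae", "Fungi"}


def kingdom_bucket(kingdom):
    """Map a kingdom name (or None for unknown) to a reporting bucket."""
    return kingdom if kingdom in MAIN_KINGDOMS else "Other"


def count_by_continent_and_kingdom(records):
    """Count occurrences per (continent, kingdom bucket)."""
    counts = Counter()
    for continent, kingdom in records:
        counts[(continent, kingdom_bucket(kingdom))] += 1
    return counts


# Hypothetical occurrence records: (continent, kingdom).
records = [
    ("Europe", "Animalia"),
    ("Europe", "Chromista"),
    ("Europe", "Protozoa"),
    ("Africa", "Plantae"),
    ("Africa", None),  # unknown kingdom
]
counts = count_by_continent_and_kingdom(records)
```

Swapping the continent for a publishing country or record-location country would give the equivalent country-level facets.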

    


Author: kylecopas
Comment: Two small corrections. Deleted 'NULL' set from Observation.org/Waarneming.nl and restored four Tela Botanica datasets.
Created: 2016-03-02 16:13:26.528
Updated: 2016-03-02 16:13:26.528


Author: kylecopas
Comment: One remaining correction made to dataset list, spreadsheet replaced
Created: 2016-03-02 16:13:29.965
Updated: 2016-03-02 16:13:29.965


Author: jlegind@gbif.org
Comment: Please reopen if a new statistical facet is needed.
Created: 2016-04-04 16:17:48.102
Updated: 2016-04-04 16:17:48.102