18155
Reporter: kylecopas
Assignee: jlegind
Type: Task
Summary: Citizen science statistics
Priority: Critical
Resolution: Fixed
Status: Closed
Created: 2016-01-14 16:57:50.868
Updated: 2016-04-04 16:17:48.132
Resolved: 2016-04-04 16:17:48.073
Description: Okay, at long last, you have it in writing.
Attached is an edited version of the datasets report you gave us on Tuesday, Jan.
The legend of the colour-coding goes like this.
• Red = remove, exclude; certain not, or unlikely, to include any significant cit-sci contribution.
• Yellow = proposed for inclusion in the ‘upper bound’ version of the statistics. Interpretation from metadata or other sources suggests the inclusion of some notable but indeterminate proportion of cit-sci records.
• Green or blank = include in both upper and lower bound calculations; solidly connected to cit-sci
We have no additions to the datasets you've flagged, only deletions (red) and qualifications (yellow, for upper bound calculations only).
Once you've got them, we'd like upper- and lower-bound totals for the remaining datasets calculated by
1. country
2. higher-level taxa per the country reports (e.g., http://www.gbif.org/country/CR/report), PLUS ONE ADDITIONAL TAXON: the order Lepidoptera
We ask for this addition because of the persistent but perhaps anecdotal claim that butterflies, moths, and skippers are better represented in citizen science.
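As a rough illustration of how the colour legend maps onto the two totals, here is a minimal sketch. The row layout (dataset key, colour flag, country code, record count) and the sample values are assumptions made up for the example, not the structure of the attached report.

```python
from collections import defaultdict

# Hypothetical rows: (dataset_key, colour_flag, country_code, record_count).
# The values are invented for illustration only.
ROWS = [
    ("ds-a", "green", "CR", 100),
    ("ds-b", "yellow", "CR", 40),
    ("ds-c", "red", "CR", 999),
    ("ds-d", "", "DK", 10),
    ("ds-e", "yellow", "DK", 5),
]


def bounds_by_country(rows):
    """Return {country: (lower_bound, upper_bound)} following the colour legend."""
    lower = defaultdict(int)
    upper = defaultdict(int)
    for _key, flag, country, count in rows:
        flag = flag.strip().lower()
        if flag in ("green", ""):
            # Green or blank: counted in both bounds.
            lower[country] += count
            upper[country] += count
        elif flag == "yellow":
            # Yellow: counted in the upper bound only.
            upper[country] += count
        # Red (or any unrecognised flag) is excluded from both bounds.
    countries = set(lower) | set(upper)
    return {c: (lower[c], upper[c]) for c in countries}


if __name__ == "__main__":
    for country, (low, high) in sorted(bounds_by_country(ROWS).items()):
        print(country, low, high)
```

The same grouping would apply to the taxon breakdown by swapping the country code for a taxon label.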
Happy to help refine this as and when needed—tusind tak!
Kyle
Author: kylecopas
Created: 2016-02-20 14:33:37.728
Updated: 2016-02-20 14:33:37.728
UPDATED FOR REPROCESSING.
New spreadsheet includes four worksheets containing datasets that need to be added—or in the case of 'NBN-REMOVE', removed from—our most recent citizen science analyses.
The working definition guiding this analysis covers datasets generated in part when "Volunteer(s), who are not necessarily experts, collect and/or process data as part of scientific enquiry".
The sources are (in order):
• Datasets that NBN identifies as containing citizen science contributions ('NBN' tab)
• Previously included datasets that should be removed from the analyses, because NBN reports they are unlikely to contain cit-sci records (NBN-REMOVE)
• Datasets classified as citizen science in two newly available sources: 1) the supplementary data from Quentin Groom’s manuscript ("Is Citizen Science Open Science?") now under review; 2) Citizen Science providers/resources with species occurrence data in Europe, a list compiled by NBIC as part of EUBON work package 1 received from Nils Valland (Groom-NBIC)
• Datasets from the GBIF publisher GEO-Tag der Artenvielfalt (also identified by Groom)
In addition, the ALA dataset for Eremaea should be removed from these analyses, as Miles Nicholls reports that it is now duplicated by the eBird EOD. (see detail in PF-2359 http://dev.gbif.org/issues/browse/PF-2359).
Given the time lag between the analyses, I'd recommend that we recompile the datasets for a fresh run rather than making these adjustments and then adding them to previous ones. I'd also request the inclusion of the datasets' DOIs, even if they're in addition to a URL or datasetkey.
Finally, I'd also like to ask that we explore the option of creating a DOI for the compiled datasets, on the thought that it may provide an efficient means of replicating and sharing the data behind our analysis, particularly, say, as supplementary material for the manuscript.
Author: kylecopas
Comment: Excel doc includes dataset additions and deletions for first analyses
Created: 2016-02-20 14:43:57.999
Updated: 2016-02-20 14:43:57.999
Author: kylecopas
Comment: Clean sheets for citsci data analyses
Created: 2016-02-29 16:39:43.058
Updated: 2016-02-29 16:39:43.058
Author: kylecopas
Created: 2016-02-29 16:39:45.734
Updated: 2016-03-04 15:23:43.183
Okay, here we go.
Updated Excel doc has two worksheets containing datasets for inclusion in this analysis. First sheet contains all datasets (280) to include in this analysis except for the 1,192 from GEO-Tag. Second sheet contains the GEO-Tag datasets.
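A quick sanity check on the two worksheets could look like the sketch below: it merges the two key lists and refuses to continue if any dataset appears on both, so that 280 + 1,192 really does give 1,472 distinct datasets. The function name and the idea of passing plain lists of dataset keys are assumptions for illustration; the actual spreadsheet columns are not described here.

```python
def combine_dataset_keys(main_sheet_keys, geotag_sheet_keys):
    """Merge two lists of dataset keys, rejecting any overlap between them."""
    main = set(main_sheet_keys)
    geotag = set(geotag_sheet_keys)
    overlap = main & geotag
    if overlap:
        # A dataset on both sheets would be double-counted in the totals.
        raise ValueError(f"datasets listed on both sheets: {sorted(overlap)}")
    return main | geotag


if __name__ == "__main__":
    # Toy keys only; the real sheets would supply 280 and 1,192 keys.
    combined = combine_dataset_keys(["a", "b"], ["c"])
    print(len(combined))
```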
In addition, here are details on the desired facets of the analysis.
1. Total citsci occurrences, based on all 1,472 identified datasets
2. Total citsci occurrences by publishing country
3. Citsci occurrences by publishing country by the following taxa
KINGDOM
Animalia
Plantae
Fungi
Other (Protozoa + Virus + Chromista + Archaea + Unknown)
SUB-KINGDOM
{noformat}
|
(class)
|
Lepidoptera
Aves
Mammalia
Osteichthyes (Actinopterygii + Sarcopterygii + Tetrapoda)
Amphibia
Insecta
Reptilia
Arachnida
{noformat}
{noformat}
|
(phyla)
|
Magnoliophyta
Gymnospermae (Pinophyta/Coniferophyta, Ginkgophyta, Cycadophyta, Gnetophyta [Gnetum, Ephedra, Welwitschia])
Pteridophyta
Bryophyta
Ascomycota
Basidiomycota
Mollusca
{noformat}
4. Citsci occurrences by country (record location)
5. Citsci occurrences by country (location) by the following taxa
KINGDOM
Animalia
Plantae
Fungi
Other (Protozoa + Virus + Chromista + Archaea + Unknown)
SUB-KINGDOM
Aves
Lepidoptera
6. Total citsci occurrences by continent (Africa, Asia, Europe, North America, Oceania, South America)
7. Citsci occurrences by continent by the following taxa:
KINGDOM
Animalia
Plantae
Fungi
Other (Protozoa + Virus + Chromista + Archaea + Unknown)
SUB-KINGDOM
Aves
Lepidoptera
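To make the taxon grouping for facets 5 and 7 concrete, here is a minimal sketch of how records might be bucketed. The record field names ("kingdom", "class", "order", "country", "continent") and the sample records are assumptions for illustration, not the actual occurrence schema. Note that the Aves and Lepidoptera counts overlap with Animalia by design, so they are tallied alongside the kingdom totals rather than subtracted from them.

```python
from collections import Counter

# Kingdoms reported on their own; everything else is folded into "Other".
MAIN_KINGDOMS = {"Animalia", "Plantae", "Fungi"}


def kingdom_group(kingdom):
    """Map a kingdom name to Animalia, Plantae, Fungi, or Other."""
    return kingdom if kingdom in MAIN_KINGDOMS else "Other"


def sub_group(record):
    """Return 'Aves' or 'Lepidoptera' when the record belongs to one, else None."""
    if record.get("class") == "Aves":
        return "Aves"
    if record.get("order") == "Lepidoptera":
        return "Lepidoptera"
    return None


def facet_counts(records, area_field):
    """Count records per (area, taxon group) for the given area field."""
    counts = Counter()
    for rec in records:
        area = rec.get(area_field) or "Unknown"
        counts[(area, kingdom_group(rec.get("kingdom")))] += 1
        sub = sub_group(rec)
        if sub is not None:
            counts[(area, sub)] += 1
    return counts


# Invented sample records, for illustration only.
SAMPLE = [
    {"kingdom": "Animalia", "class": "Aves", "country": "DK", "continent": "Europe"},
    {"kingdom": "Animalia", "class": "Insecta", "order": "Lepidoptera",
     "country": "DK", "continent": "Europe"},
    {"kingdom": "Chromista", "country": "CR", "continent": "North America"},
]

if __name__ == "__main__":
    for key, n in sorted(facet_counts(SAMPLE, "continent").items()):
        print(key, n)
```

Passing "country" instead of "continent" as the area field would give the facet 5 breakdown from the same function.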
Author: kylecopas
Comment: Two small corrections. Deleted 'NULL' set from Observation.org/Waarneming.nl and restored four Tela Botanica datasets.
Created: 2016-03-02 16:13:26.528
Updated: 2016-03-02 16:13:26.528
Author: kylecopas
Comment: One remaining correction made to dataset list, spreadsheet replaced
Created: 2016-03-02 16:13:29.965
Updated: 2016-03-02 16:13:29.965
Author: jlegind@gbif.org
Comment: Please reopen if a new statistical facet is needed.
Created: 2016-04-04 16:17:48.102
Updated: 2016-04-04 16:17:48.102