Issue 18451

Make Sampling Event Detail Page

18451
Reporter: kbraak
Type: Improvement
Summary: Make Sampling Event Detail Page
Priority: Unassessed
Status: Open
Created: 2016-04-29 18:04:05.57
Updated: 2017-10-10 15:28:42.04
        
Description: * Blocked until GBIF indexes Event records

A sampling event should have verbatim and interpreted detail pages, just like the occurrence detail pages. Please note that a sampling event may or may not have associated occurrences and measurements or facts.

*Verbatim detail page*: lists all verbatim fields entered for that sampling event record, plus all its extension records.

*Interpreted detail page*: lists all interpreted fields for that sampling event record, plus all interpreted fields for its extension records. Interpretation will be done on:
* the _*location*_ (WKT or lat/long location) is interpreted and shown on a map (see below)
* if the _*location*_ is referenced by other datasets (e.g. permanent plots reused in other datasets), then the list of these associated datasets will be shown.
* the map will show the location of all _*occurrences*_ derived from the sampling event, as long as each occurrence has a location distinct from the event itself.
* count how many _*occurrences*_ are derived from the sampling event (if any exist), with a link to a filtered occurrence search page listing them all.
* the _*samplingProtocol*_ will be interpreted (unified to a controlled vocabulary of sampling protocols), thereby allowing the user to search for datasets using sampling events with the same protocol
* provide a link to its _*parent event*_ (via parentEventID if it exists).
* list all the event's _*child events*_ (if other events link to its eventID in their parentEventID) and show their location within the same map.
* for parent events: for each species being monitored/measured, show its change in abundance based on all _*occurrences*_ of that species derived from all child events.
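The parent/child, occurrence-count and abundance items above could be backed by something like the following minimal Python sketch. It assumes events and occurrences have already been indexed into simple in-memory records; the class and function names are hypothetical, the fields only mirror the Darwin Core terms eventID, parentEventID, scientificName and individualCount, and summing individualCount per child event is just one assumed way of expressing "change in abundance".

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Event:
    event_id: str                          # mirrors dwc:eventID
    parent_event_id: Optional[str] = None  # mirrors dwc:parentEventID


@dataclass
class Occurrence:
    occurrence_id: str     # mirrors dwc:occurrenceID
    event_id: str          # eventID of the event the occurrence was derived from
    scientific_name: str   # mirrors dwc:scientificName
    individual_count: int  # mirrors dwc:individualCount


def parent_of(event_id: str, events_by_id: Dict[str, Event]) -> Optional[str]:
    """Return the parentEventID of an event, or None if it has no parent."""
    event = events_by_id.get(event_id)
    return event.parent_event_id if event else None


def index_children(events: List[Event]) -> Dict[str, List[str]]:
    """Map each eventID to the eventIDs whose parentEventID points at it."""
    children: Dict[str, List[str]] = defaultdict(list)
    for e in events:
        if e.parent_event_id is not None:
            children[e.parent_event_id].append(e.event_id)
    return children


def count_occurrences(event_id: str, occurrences: List[Occurrence]) -> int:
    """Number of occurrences derived directly from one event."""
    return sum(1 for o in occurrences if o.event_id == event_id)


def abundance_by_child(parent_id: str,
                       children: Dict[str, List[str]],
                       occurrences: List[Occurrence]) -> Dict[str, Dict[str, int]]:
    """For a parent event: species -> {child eventID -> summed individualCount}."""
    child_ids = set(children.get(parent_id, []))
    table: Dict[str, Dict[str, int]] = defaultdict(dict)
    for o in occurrences:
        if o.event_id in child_ids:
            per_child = table[o.scientific_name]
            per_child[o.event_id] = per_child.get(o.event_id, 0) + o.individual_count
    return dict(table)
```

Keeping the child index separate from the events means the "list all child events" item and the abundance item can share one pass over parentEventID.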

Attachment Screen Shot 2016-04-29 at 11.17.15.png

Attachment Screen Shot 2016-05-09 at 11.47.07.png


Author: kbraak@gbif.org
Created: 2016-04-29 18:31:59.547
Updated: 2016-04-29 18:31:59.547
        
Christian Svindseth (GBIF Norway) has produced a prototype sample event browser in the data.gbif.no portal [here|http://data.gbif.no/datasets/events/events/f7b8df8b-3629-4e90-82a1-f646d2b82d36]. Please take a look!

This prototype was presented at the [EU Nodes Workshop in Lisbon 18-19 April|http://www.gbif.pt/EuropeanNodesMeeting/workshop] and helped participants better imagine how sample event datasets can be visualised. It will help collect feedback and provide inspiration for how sample events should be visualised on GBIF.org, which is the work tracked by this issue.

Everyone please note, this prototype is using test data and will undergo much more development. [~DagEndresen] and Christian will certainly appreciate your feedback on the prototype to continue its development, thanks.


Author: kbraak@gbif.org
Created: 2016-04-29 18:39:00.951
Updated: 2016-04-29 18:39:00.951
        
See the attached image from the [SIVIM|http://www.sivim.info/sivi/] map, showing the locations of relevés within 1x1 km UTM squares. SIVIM allows users to search for relevés matching, for example, particular vegetation communities (syntaxa).

"SIVIM (Sistema de Información de la Vegetación Ibérica y Macaronésica) is an information system designed for capturing, hosting, editing, analyzing and outputting georeferenced plot data of Iberian and Macaronesian vegetation. It currently hosts 86,000 relevés, mainly from the northern half of the Iberian Peninsula and the Balearic Islands, and will grow to 100,000 relevés in the near future. SIVIM has been conceived to offer direct and free on-line access to relevés, tables, as well as to floristic, syntaxonomical and bibliographical records." - http://afsv.de/download/literatur/waldoekologie-online/waldoekologie-online_heft-9-8.pdf


Author: rdmpage
Created: 2016-05-03 11:48:50.186
Updated: 2016-05-03 11:48:50.186
        
If I follow this correctly, the idea is to be able to group a set of occurrences that share the same sampling event, in this case based on information provided in the same dataset.

Can we extend this to the idea of clustering occurrences NOT in the same dataset? The rationale is that GBIF has lots of duplicate occurrences (e.g., herbarium specimens shared by multiple herbaria, specimen data from museums that is duplicated by DNA sequence datasets, etc.). At the moment we have no way of grouping these together. One approach would be to have a *parentOccurrenceID* field that all linked occurrences share, but perhaps we could link them by *eventID*. This would require that GBIF indexes these, makes them searchable, and displays multiple occurrences linked to the same *eventID*. It also requires that *eventID* be stable and discoverable, so that people like me who may invest effort in making these crosslinks don't have our work undone each time the data gets reloaded/indexed.

Can we come up with a strategy for stable *eventIDs*? One approach might be to generate IDs based on the metadata. For any lat/lon pair we can have a unique geohash, for example, and for any date/time we can have a unique string (e.g., the relevant ISO date string). What if we generate identifiers from those strings (or use them as accompanying identifiers)?
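To make the geohash + ISO date idea concrete, here is a minimal Python sketch. It implements the standard geohash encoding (interleaving longitude and latitude bisection bits, five bits per base-32 character), joins it with the ISO 8601 date, and derives a deterministic version 5 UUID from that key. The key format, the default precision of 9 characters (a cell a few metres across) and the namespace `events.example.org` are all assumptions for illustration, not an agreed GBIF scheme.

```python
import uuid
from datetime import date

# Standard geohash base-32 alphabet (omits a, i, l and o).
_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

# Hypothetical namespace for deterministic (version 5) event UUIDs.
EVENT_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "events.example.org")


def geohash(lat: float, lon: float, precision: int = 9) -> str:
    """Encode a lat/lon pair as a geohash of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits, value = 0, 0
    use_lon = True  # geohash bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                value = (value << 1) | 1
                lon_lo = mid
            else:
                value <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                value = (value << 1) | 1
                lat_lo = mid
            else:
                value <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bits += 1
        if bits == 5:  # every 5 bits become one base-32 character
            chars.append(_BASE32[value])
            bits, value = 0, 0
    return "".join(chars)


def stable_event_id(lat: float, lon: float, event_date: date, precision: int = 9):
    """Derive a reproducible key and UUID from location and date only."""
    key = f"{geohash(lat, lon, precision)}|{event_date.isoformat()}"
    return key, uuid.uuid5(EVENT_NAMESPACE, key)


if __name__ == "__main__":
    key, event_uuid = stable_event_id(57.64911, 10.40744, date(2016, 5, 3))
    print(key, event_uuid)
```

Because the UUID is computed rather than assigned, re-indexing the same data yields the same identifier. The trade-off is that two distinct sampling events at the same place on the same day would collide; adding eventTime or samplingProtocol to the key would be one way to separate them.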

Author: dagendresen
Created: 2016-05-03 17:01:31.672
Updated: 2016-05-03 17:01:31.672

Yes, our idea at GBIF Norway is absolutely to collect information about things such as events (and occurrences) from across different datasets. GBIF Norway offers a resolver for eventIDs and occurrenceIDs provided by Norwegian data publishers using globally unique identifiers (UUIDs). The resolver will collect information about the same dwc:Event or dwc:Occurrence from across any of the Norwegian datasets. The idea is to have the resolver automatically crawl all new Norwegian datasets, and dataset updates when they are republished. The resolver will not have any search functions, only lookup by identifier. Yes, this mandates stable identifiers: to begin with, we plan to only include events and occurrences with UUID identifiers, but could eventually expand to recognize other persistent identifier schemes such as DOIs. The resolver is still in development and does not yet cover all the Norwegian datasets.

Our ultimate goal is simply to redirect the resolver to the global GBIF portal once the GBIF portal offers the same functionality.
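A lookup-only resolver that initially accepts only UUIDs could, in principle, be as simple as an index keyed on the normalised UUID, collecting every record that uses that identifier across datasets. The Python sketch below is hypothetical and is not the GBIF Norway implementation: records are plain dicts with an assumed `id` field, and the dataset keys are made up. Normalising with `uuid.UUID` means upper- and lower-case spellings of the same UUID resolve together.

```python
import uuid
from collections import defaultdict
from typing import Dict, List, Tuple


def is_uuid(value) -> bool:
    """True if `value` parses as a UUID string (any case, with or without hyphens)."""
    try:
        uuid.UUID(value)
        return True
    except (ValueError, AttributeError, TypeError):
        return False


def build_resolver(datasets: Dict[str, List[dict]]) -> Dict[str, List[Tuple[str, dict]]]:
    """Index records from many datasets by canonical UUID; non-UUID ids are skipped."""
    index: Dict[str, List[Tuple[str, dict]]] = defaultdict(list)
    for dataset_key, records in datasets.items():
        for record in records:
            rid = record.get("id")
            if not is_uuid(rid):
                continue  # only UUID identifiers are resolved in this sketch
            index[str(uuid.UUID(rid))].append((dataset_key, record))
    return index


def resolve(index: Dict[str, List[Tuple[str, dict]]], identifier: str) -> List[Tuple[str, dict]]:
    """Lookup by identifier only: every (datasetKey, record) sharing this UUID."""
    if not is_uuid(identifier):
        return []
    return index.get(str(uuid.UUID(identifier)), [])
```

There is deliberately no search function here, matching the lookup-only scope described above; recognising other identifier schemes such as DOIs would mean adding another normaliser next to `is_uuid`.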

The idea for the dataset portal (the visualization example described here) is for each dataset to display information provided within the dataset, but also additional information from the resolver (kept separate from the information provided within the dataset). We also plan to include information provided as annotations connected to the thing identified by the globally unique identifier (UUID). This could, for example, be information from AnnoSys provided by users at the GBIF portal. However, the resolver for the eventID would not include the list of occurrences at the event, and we do not plan to search for occurrences reported as having a given eventID. So, at least for now, we only plan to show the list of occurrences described within the dataset.

We have already published a few datasets that include only events (dwc:Event) and no occurrences (dwc:Occurrence), with the occurrences referencing those events via the eventID from the dataset where the events are described. Many of the occurrences from these events will be specimens held by the university museums in Oslo and Trondheim. These specimens (at different museums) will carry the same eventID UUIDs as declared in the event-only datasets. I am not sure how much of this database work is already done, but it is a procedure that has been agreed with the collection curators (based on a project grant from the GBIF-node).

Example:
http://doi.org/10.15468/hwvr0m (280 collecting events)
http://doi.org/10.15468/y6cctp (data publication in progress)

I am not sure how a parentOccurrenceID would be used. Data records sharing the same dwc:eventID and the same dwc:organismID would refer to the same thing, e.g. the same animal observed or sampled at the same time; still, there could be valid reasons why they have different dwc:occurrenceIDs. And at the same event you could observe different animals, of different species.
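The point about dwc:eventID plus dwc:organismID can be sketched as a grouping step: records sharing both values are flagged as the same organism at the same event, while their distinct occurrenceIDs are kept rather than merged. This is a hypothetical Python illustration (records as plain dicts keyed by Darwin Core term names), not an existing GBIF function.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_same_organism(records: List[dict]) -> Dict[Tuple[str, str], List[str]]:
    """Group occurrenceIDs that share both eventID and organismID.

    Records missing either term are ignored, and only groups with more than
    one occurrenceID are returned; the occurrenceIDs themselves stay distinct.
    """
    groups: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for r in records:
        event_id = r.get("eventID")
        organism_id = r.get("organismID")
        if event_id and organism_id:
            groups[(event_id, organism_id)].append(r["occurrenceID"])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}
```

Different organisms at the same event (same eventID, different organismID) end up in different groups, which matches the point that one event can cover several animals and species.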


Author: rdmpage
Comment: Thanks for the comments [~dag.endresen@gmail.com]. I guess one use for *parentOccurrenceID* is to handle cases where two different occurrences are the "same", but because the data is incomplete in one or the other (or both), we can't match them based on eventID. For example, a museum might have a specimen code and a lat/lon pair, GenBank might have a specimen code and a sequence, and the mapping is made on the code alone.
Created: 2016-05-03 17:15:51.32
Updated: 2016-05-03 17:15:51.32


Author: dagendresen
Comment: We could annotate the data record from GenBank with the occurrenceID used by the museum? Or, if the museum is using a simple non-unique specimen code (catalog number), perhaps we want to find a more persistent identifier, e.g. the GBIF occurrence number, and annotate both data records with this number (for the museum specimen) as the occurrenceID. Or perhaps (if we want to invest more time) annotate both records with an identical eventID and organismID?
Created: 2016-05-03 17:36:08.725
Updated: 2016-05-03 17:36:08.725


Author: rdmpage
Created: 2016-05-03 17:51:29.047
Updated: 2016-05-03 17:51:29.047
        
All of these are possible. GBIF occurrence-numbers *ARE NOT* stable in my experience (their stability is a function of how stable the data provider ids and/or Darwin Core triples are, so not very).

But they would be an obvious candidate for being able to express something like "this occurrence in this data set I'm uploading now is related to this one already in GBIF".

What I'm hoping is that, if we're thinking about linking events together, we can also tackle linking occurrences. If we can, then we can make big strides in cleaning up the duplication of occurrences in GBIF.


Author: hoefft
Comment: The idea, but not the comments, is duplicated at https://github.com/gbif/portal16/issues/613 
Created: 2017-10-10 15:28:42.04
Updated: 2017-10-10 15:28:42.04