Issue 11289

Citation is null in the Dataset object

Reporter: trobertson
Assignee: jcuadra
Type: Bug
Summary: Citation is null in the Dataset object
Priority: Blocker
Resolution: Fixed
Status: Closed
Created: 2012-06-02 09:09:12.326
Updated: 2013-12-16 17:50:21.605
Resolved: 2012-11-06 17:07:43.095
        
Description: A call to /Dataset/ appears to return a null Citation.
This should be populated with either the default citation (as per the current portal) or the citation string that is recognized in metadata crawling (e.g. DiGIR, BioCASe, TAPIR or from an EML document).
    


Author: trobertson@gbif.org
Created: 2012-06-04 16:47:23.165
Updated: 2012-06-04 16:47:23.165
        
Please refer to this email thread, read from the bottom up:

Hi Tim,

Yes, since we are generating them we stick with what is visible through the portal.

Regards,
Vishwas

From: Tim Robertson [GBIF] [mailto:trobertson@gbif.org]
Sent: Monday, June 04, 2012 3:19 PM
To: Vishwas Chavan [GBIF]
Cc: 'GBIF Developers Developers'; 'Tim Hirsch (GBIF)'
Subject: Re: Citation text
Importance: High

Thanks Vishwas.

Can you please confirm what we should do for the "total no. of records" when generating citation strings from the Portal for datasets which don't declare a specific citation?
A dataset can have say 1000 records, but due to databasing issues only 999 are actually accessible through their publishing endpoint [1], and due to indexing procedures at GBIF [2] we may deem that only 998 records are suitable for discovery.

I propose that in this scenario, since we are generating a citation string for the users of the portal, the number of records would be 998.  My justification being that there are 998 possible records available to the portal user and therefore that would be accurate to their customized citation.

Would you agree?

Thanks,
Tim



[1] Potentially this applies to DiGIR, TAPIR and BioCASE endpoints and not IPTs
[2] E.g. ensuring that there is some kind of identification that looks biological to remove Paleo records such as "The remains of a wooden horse" (this has been known to happen)

On Jun 4, 2012, at 3:09 PM, Vishwas Chavan [GBIF] wrote:


Hi Tim,

1. Last month we released ‘Recommended practices for citation of data published through the GBIF network’ (http://www.gbif.org/orc/?doc_id=4659&l=en).
2. It recommends six styles of publisher-based citations. These are authored or described by the owner, custodian or publisher of the dataset. The unit for citation is the dataset. All those responsible for the dataset's development should be individually cited, and the role played by each contributor can be recorded in the citation. These six styles are listed in Table 2 of the above-mentioned document. Hypothetical examples are provided in Table 3 of the same document.
3. Given that this is a retrospective fixing of ‘citation’, I feel that the following process should be followed:
   a. Human intervention is needed to select whichever of the six styles is most appropriate for the dataset.
   b. Auto-generate the citation according to the chosen style using the available metadata. Knowing the state of the available metadata, I presume that the majority of citations will be blank, with information to be enriched.
   c. Publishers are asked to fill in the incomplete information so that the citation is enriched.

Hope this helps in moving forward. Let me know if we need to discuss this further in order to evolve a coordinated approach on this topic.

Regards,
Vishwas

From: Tim Robertson [GBIF] [mailto:trobertson@gbif.org]
Sent: Monday, June 04, 2012 9:38 AM
To: Vishwas Chavan [GBIF]
Cc: GBIF Developers Developers; Tim Hirsch (GBIF)
Subject: Citation text

Hi Vishwas,

When a data publisher has not provided a citation text in their metadata, we currently assemble a citation such as:

PonTaurus collection database 1999 (accessed through GBIF data portal, http://data.gbif.org/datasets/resource/1099, 2012-06-04)
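
As a rough illustration of the assembled format above, here is a minimal Python sketch. The function name `portal_default_citation` and its parameters are hypothetical and are not the portal's actual code; the sketch only reproduces the string layout shown in the example.

```python
from datetime import date


def portal_default_citation(dataset_name: str, resource_url: str, accessed: date) -> str:
    """Assemble a fallback citation in the layout of the example above.

    Hypothetical helper: name and signature are assumptions for illustration.
    """
    # ISO 8601 date (YYYY-MM-DD), matching the "2012-06-04" in the example.
    return (
        f"{dataset_name} (accessed through GBIF data portal, "
        f"{resource_url}, {accessed.isoformat()})"
    )


print(
    portal_default_citation(
        "PonTaurus collection database 1999",
        "http://data.gbif.org/datasets/resource/1099",
        date(2012, 6, 4),
    )
)
```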

Can you please provide an example of the desired format from the citation groups you have been working with, so we can determine the best way to include this in the download service that is now being developed?  These citations are assembled into a multidataset citation for all datasets included in the download.  Please note that I am only interested in what we should generate when none is explicitly stated by the publisher at this point, which is the majority of resources.

Thanks,
Tim
    


Author: mdoering@gbif.org
Comment: I'd suggest not including HTML tags in the citation, but rather inserting anchor tags later in the portal.
Created: 2012-08-21 21:03:08.832
Updated: 2012-08-21 21:03:08.832


Author: mdoering@gbif.org
Comment: still null here http://jawa.gbif.org:8080/registry-ws/dataset/4259c130-c065-11e1-9773-0024e8565763
Created: 2012-08-21 21:23:26.134
Updated: 2012-08-21 21:23:26.134


Author: jcuadra@gbif.org
Comment: All citations on registry_staging (and, I am guessing, jawa) are empty. The citations are stored in the "extended_property" table.
Created: 2012-08-22 10:14:03.532
Updated: 2012-08-22 10:14:03.532


Author: mdoering@gbif.org
Created: 2012-09-11 12:37:48.781
Updated: 2012-09-11 12:37:48.781
        
The staging WS still contains empty citations:
http://staging.gbif.org:8080/registry-ws/dataset/d7dddbf4-2cf0-4f39-9b2a-bb099caae36c
    


Author: jcuadra@gbif.org
Created: 2012-09-24 17:35:44.442
Updated: 2012-09-24 17:35:44.442
        
I have committed some changes and the default citation now gets populated as

<Publisher as Institution>, <Title of data resource>

There have been several discussions on this issue on Skype spanning multiple days. The last indication I received was when you guys were in Madrid, when Markus sent me a Skype message that reads

---------------
[8/22/12 4:50:40 PM] GBIF - Markus D: Jose just talked with tim again and he also thinks its best to replace null citations in the factory
[8/22/12 4:50:50 PM] GBIF - Markus D: with only the info we got about a dataset, so no numbers
---------------

So I haven't included the number of records or the date of creation. My pending question before closing this issue is whether you guys agree on the current <Publisher as Institution>, <Title of data resource> format, or would like to see something more.
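
A minimal Python sketch of that fallback, for illustration only. This is not the registry's actual code: the names `default_citation_text` and `with_default_citation`, and the `title` field on the dataset, are assumptions. It fills in the `<Publisher as Institution>, <Title of data resource>` text only when a dataset carries no citation text of its own, and adds no record counts or dates.

```python
from typing import Optional


def default_citation_text(publisher: Optional[str], title: Optional[str]) -> str:
    """Join publisher and title as '<publisher>, <title>', skipping missing parts."""
    parts = [p.strip() for p in (publisher, title) if p and p.strip()]
    return ", ".join(parts)


def with_default_citation(dataset: dict, publisher_name: Optional[str]) -> dict:
    """Return a copy of `dataset` whose citation text is never null or empty.

    A citation supplied by the publisher is left untouched.
    """
    citation = dataset.get("citation") or {}
    if citation.get("text"):
        return dataset
    text = default_citation_text(publisher_name, dataset.get("title"))
    return {**dataset, "citation": {**citation, "text": text}}


dataset = {
    "title": "Botanic Garden of the Finnish Museum of Natural History",
    "citation": None,
}
filled = with_default_citation(dataset, "Finnish Museum of Natural History")
print(filled["citation"]["text"])
```

Returning a copy rather than mutating the input keeps the fallback easy to apply at more than one place where datasets are built.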



Example dataset response:

     ...
     "technicalInstallationKey": "404b1e20-c065-11e1-9773-0024e8565763",
     "citation": {
         "text": "Finnish Museum of Natural History, Botanic Garden of the Finnish Museum of Natural History" },
     "type": "OCCURRENCE",
     "constituents": false,
     ...</body>
    </Action>
</pre><hr/>
<pre>

Author: mdoering@gbif.org
Created: 2012-09-25 12:17:21.73
Updated: 2012-09-25 12:17:21.73
        
The citation guidelines above usually include:
 - year first published
 - date last updated
 - some persistent identifier
 - some primary access point

Examples given are:

Chavan, V. S. (1996). Amphibians of the west coast of India. 1223 records, published online, http://www.vishwaschavan.in/indfauna/amphibians_west_coast/, released on 12 June 1998, doi: 10.5284/1000164.

Johnson, D. K. (2002 -). Observational dataset of the mammals of South Africa, 32001 records, Online http://www.satol.ac.za/mammalsdb/, 01/10/ 2002, version 1.2 (last updated on 01/01/2012), doi: 10.1000/123.


I think I would therefore include:
 - the GBIF registry URL or at least the UUID (once we have replaced by DOI). For the URL, the question is which domain we use right now: simply the current GBRDS? Or should we prefer the homepage URL of a dataset, if it exists, over a GBIF URL?
 - the date created = date first registered?

The date last updated is unknown to us, isn't it? And should it reflect the dataset as we have indexed it (in which case it's the last indexed date) or the date the DwC-A was last modified?


So in your example ideally:

Finnish Museum of Natural History (2012). Botanic Garden of the Finnish Museum of Natural History, http://gbrds.gbif.org/browse/agent?uuid=404b1e20-c065-11e1-9773-0024e8565763, last updated on 2012-09-21.</body>
    </Action>
</pre><hr/>
<pre>

Author: jcuadra@gbif.org
Created: 2012-10-04 15:24:41.01
Updated: 2012-10-04 15:24:41.01
        
Answering you:

 * the GBIF registry URL or at least the UUID (once we have replaced by DOI). For the URL, the question is which domain we use right now: simply the current GBRDS? Or should we prefer the homepage URL of a dataset, if it exists, over a GBIF URL?

I guess we could use the current gbrds.gbif.org URL, but in the end I think we should aim for the new portal URL, wherever it is going to reside.

* the date created = date first registered?
Yes.

* The date last updated is unknown to us, isn't it? And should it reflect the dataset as we have indexed it (in which case it's the last indexed date) or the date the DwC-A was last modified?

We have a modified timestamp whenever a publisher updates the dataset's metadata (either via IPT-->Registry WS or by using the WS directly). The "records last updated" date will be difficult to assess; we are not storing this info, as far as I am aware.


</body>
    </Action>
</pre><hr/>
<pre>

Author: mdoering@gbif.org
Comment: I would suggest using the GBRDS URLs now, but moving, as you say, to the new portal pages as soon as they are public.
Created: 2012-10-04 16:06:28.74
Updated: 2012-10-04 16:06:28.74
</pre><hr/>
<pre>

Author: jcuadra@gbif.org
Created: 2012-10-05 17:12:27.253
Updated: 2012-10-05 17:12:27.253
        
Changes have been committed on http://code.google.com/p/gbif-registry/source/detail?r=3276

The only thing still missing is the "last updated" part of the citation, which has been converted into a subtask of this issue at: http://dev.gbif.org/issues/browse/REG-322</body>
    </Action>
</pre><hr/>
<pre>

Author: mdoering@gbif.org
Created: 2012-10-17 16:31:01.267
Updated: 2012-10-17 16:31:01.267
        
The citation of the GBIF Backbone now looks like this as the default:

The Global Biodiversity Information Facility (2011): GBIF Backbone Taxonomy, http://gbrds.gbif.org/browse/agent?uuid=d7dddbf4-2cf0-4f39-9b2a-bb099caae36c
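
For illustration, the default shown above could be assembled roughly as in the following Python sketch. It is not the registry's actual code: `registry_default_citation`, its parameters and `GBRDS_AGENT_URL` are assumed names. The year is taken from the registry creation date, and the optional "last updated" suffix follows the wording proposed earlier in this thread, which is still pending in REG-322.

```python
from datetime import datetime
from typing import Optional

# URL pattern copied from the GBRDS links quoted in this thread.
GBRDS_AGENT_URL = "http://gbrds.gbif.org/browse/agent?uuid={key}"


def registry_default_citation(
    publisher: str,
    title: str,
    key: str,
    created: datetime,
    last_updated: Optional[datetime] = None,
) -> str:
    """Build '<publisher> (<year created>): <title>, <GBRDS URL>'.

    Hypothetical helper. `created` is the date the dataset was first
    registered, so the year is a registration year, not a publication year.
    """
    text = f"{publisher} ({created.year}): {title}, {GBRDS_AGENT_URL.format(key=key)}"
    if last_updated is not None:
        # Assumed suffix format; the real one is tracked in REG-322.
        text += f", last updated on {last_updated.date().isoformat()}"
    return text
```

Because the year comes straight from `created`, a dataset that is republished later would still carry its registration year in the citation.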

Does it really make sense to use the date created in the registry as the publishing date? To me this reads as if the current backbone dates from 2011, but it doesn't.</body>
    </Action>
</pre><hr/>
<pre>

Author: mdoering@gbif.org
Created: 2012-10-29 09:56:10.139
Updated: 2012-10-29 09:56:10.139
        
What about citations for external datasets?
http://staging.gbif.org:8080/portal-web-dynamic/dataset/2344f83d-eefb-4635-afed-fb2a1c9bd466:knb-lter-kbs.2.16</body>
    </Action>
</pre><hr/>
<pre>

Author: jcuadra@gbif.org
Created: 2012-11-02 14:50:23.144
Updated: 2012-11-02 14:50:23.144
        
Completely forgot about external datasets, argh. And all changes were made on the DatasetFactory, which is not used for creating the external ones..bummer

changing this now...</body>
    </Action>
</pre><hr/>
<pre>

Author: jcuadra@gbif.org
Created: 2012-11-06 17:07:43.129
Updated: 2012-11-06 17:07:43.129
        
A default citation is now populated.

Nevertheless, this issue has been split into several issues for easier resolution, as there are still outstanding points that need to be addressed:

http://dev.gbif.org/issues/browse/REG-322
http://dev.gbif.org/issues/browse/REG-339
http://dev.gbif.org/issues/browse/REG-346</body>
    </Action>
</pre><hr/>
</body>
</html>