Issue 13436

As far as I have seen, all photos on species pages...

Reporter: thirsch
Assignee: mdoering
Type: Bug
Summary: As far as I have seen, all photos on species pages...
Resolution: Fixed
Status: Closed
Created: 2013-07-02 14:38:26.6
Updated: 2013-07-18 15:47:48.249
Resolved: 2013-07-18 15:47:48.226
Description: As far as I have seen, all photos on species pages have undefined source, photographer and copyright. E.g.

*Reporter*: Tim Hirsch
*E-mail*: [mailto:thirsch]

Created: 2013-07-02 14:41:42.986
Updated: 2013-07-02 14:41:42.986
That might actually be the case. Here is what we know about that image:

Here is one where we do have further information:

Comment: In that case we seem to have a problem with the way the Wikimedia API passes the information to us; see the details of the same image at . This would explain why a very large number of the photos appear to be unsourced.
Created: 2013-07-02 14:49:28.285
Updated: 2013-07-02 14:49:28.285

Created: 2013-07-04 13:55:40.601
Updated: 2013-07-04 13:55:40.601
Wikimedia Commons has an API we can use to retrieve the metadata details of an image:

For example, their (slow) Toolserver API returns the following for that image:

Here is another API they offer, though it is aimed more at Wikipedia pages in general:
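As an illustration of pulling attribution fields programmatically, here is a minimal Python sketch. It assumes the standard MediaWiki Action API on Commons (`api.php` with `prop=imageinfo&iiprop=extmetadata`), which is not necessarily either of the services linked above. The file title and the sample response are hypothetical, and real `extmetadata` values (notably `Artist`) often contain HTML that would still need stripping.

```python
import json
from urllib.parse import urlencode

# Assumption: the standard MediaWiki Action API endpoint on Wikimedia Commons.
COMMONS_API = "https://commons.wikimedia.org/w/api.php"

# extmetadata keys of interest, mapped to illustrative field names of our own.
FIELDS = {
    "artist": "Artist",
    "credit": "Credit",
    "license": "LicenseShortName",
    "license_url": "LicenseUrl",
}


def build_imageinfo_url(file_title: str) -> str:
    """Build a query URL requesting extended metadata for one Commons file."""
    params = {
        "action": "query",
        "format": "json",
        "titles": file_title,  # e.g. "File:Example.jpg" (hypothetical)
        "prop": "imageinfo",
        "iiprop": "extmetadata",
    }
    return COMMONS_API + "?" + urlencode(params)


def extract_attribution(response_text: str) -> dict:
    """Return attribution fields from an imageinfo JSON response.

    Fields the API does not supply come back as None, so missing metadata
    can be detected explicitly instead of being rendered as "undefined".
    """
    result = {name: None for name in FIELDS}
    pages = json.loads(response_text).get("query", {}).get("pages", {})
    for page in pages.values():
        infos = page.get("imageinfo") or [{}]
        meta = infos[0].get("extmetadata", {})
        for name, key in FIELDS.items():
            entry = meta.get(key)
            if entry is not None:
                result[name] = entry.get("value")
        break  # only a single title is queried
    return result


if __name__ == "__main__":
    print(build_imageinfo_url("File:Example.jpg"))
    # Hypothetical, trimmed response in the nested shape the API uses.
    sample = json.dumps({"query": {"pages": {"1": {
        "title": "File:Example.jpg",
        "imageinfo": [{"extmetadata": {
            "Artist": {"value": "Jane Doe"},
            "LicenseShortName": {"value": "CC BY-SA 3.0"},
        }}],
    }}}})
    print(extract_attribution(sample))
```

Returning `None` for absent fields, rather than a placeholder string, keeps "no information" distinguishable from real values when deciding what to display.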

Created: 2013-07-04 13:59:26.221
Updated: 2013-07-04 13:59:26.221
I am trying to use these services in the dwc builder, but this will only solve the issue for the Wikipedia checklist datasets.
What if another dataset contains a Wikipedia image without proper metadata? What if an occurrence dataset links images with missing information? Is GBIF responsible for showing only images whose exact license information we know, and blocking all other content?

Wikipedia is quite prominent on our species pages, but we need to have a general position across all datasets.

We could also consider adding image metadata afterwards, once we index image (multimedia) records. That way any Wikimedia file would get its attribution, and we could even refresh it from Wikipedia far more often without relying on the dataset publisher to update their dwc archive (though I am not sure BioCASE or DiGIR even allow specifying all that metadata).

Created: 2013-07-04 14:22:07.127
Updated: 2013-07-04 14:22:07.127
I would say that if we solve this for the Wikipedia checklist dataset we have solved most of the problem, as it accounts for some 150,000 species and, I would guess, a large proportion of the images. I have just conferred with Donald, and he agrees that we should only display images where we have at least some source information, if not full licensing details. It is better to err on the side of caution, as we would be badly exposed if we showed images on the portal without proper authority or attribution.
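The rule agreed here (display an image only if at least some source information is known, even when full licensing details are missing) can be expressed as a simple filter. A minimal Python sketch, assuming a hypothetical `ImageRecord` with `source`, `creator` and `license` fields; the portal's actual data model is not described in this issue.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class ImageRecord:
    """Hypothetical image record; field names are illustrative only."""
    url: str
    source: Optional[str] = None
    creator: Optional[str] = None
    license: Optional[str] = None


def _present(value: Optional[str]) -> bool:
    """True if a metadata field holds a non-blank value."""
    return value is not None and value.strip() != ""


def has_source_info(img: ImageRecord) -> bool:
    """Minimum bar: at least a source or a creator is known."""
    return _present(img.source) or _present(img.creator)


def has_full_license(img: ImageRecord) -> bool:
    """Stricter bar: source information plus an explicit license."""
    return has_source_info(img) and _present(img.license)


def displayable(images: Iterable[ImageRecord]) -> List[ImageRecord]:
    """Keep only images that meet the minimum source-information bar."""
    return [img for img in images if has_source_info(img)]


if __name__ == "__main__":
    images = [
        ImageRecord("a.jpg"),                      # no metadata at all: hidden
        ImageRecord("b.jpg", creator="Jane Doe"),  # shown, license unknown
        ImageRecord("c.jpg", source="Wikimedia Commons", license="CC BY-SA 3.0"),
        ImageRecord("d.jpg", source="   "),        # blank source: hidden
    ]
    for img in displayable(images):
        print(img.url, "full license" if has_full_license(img) else "source only")
```

Keeping the two checks separate lets the portal show source-only images while still flagging which ones lack full licensing details.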


Created: 2013-07-04 23:14:24.6
Updated: 2013-07-04 23:14:24.6
I agree that it is better to be careful here. But would that mean we have to block specimen images if we don't get proper attribution? Most formats through which we receive images don't even allow specifying such details.

On the plus side, it looks feasible to get the Wikimedia Commons image metadata through simple screen scraping. I might get it out tomorrow.

Comment: I think specimen images provided by the institution where a collection is housed (and from which the data are published) are in a somewhat different category from species-level images served via a checklist like Wikipedia. I do not know how images of the first type are currently displayed on the portal, so if you could send some examples for context, we can make decisions. It is probably better to continue this offline; once the Wikipedia licensing details are implemented, we can consider this issue closed.
Created: 2013-07-05 08:56:27.84
Updated: 2013-07-05 08:56:27.84

Created: 2013-07-05 12:31:45.424
Updated: 2013-07-05 12:31:45.424
We do not show any specimen images yet; that is work still to be done. Let's discuss it once we get there.
Wikipedia image metadata should now be included in the next dwc archive build. The modified code is here:

I'll add a comment once it shows up on staging so we can review it.

Created: 2013-07-16 10:13:16.227
Updated: 2013-07-16 10:13:16.227
The German Wikipedia image licenses can be previewed in this DwC-A validation report (see the bottom table):

Created: 2013-07-18 15:47:43.691
Updated: 2013-07-18 15:47:43.691
The Wikipedia datasets now have license and further image metadata indexed; see, for example, this spider page:

License information is also indexed and presented for Wikipedia textual descriptions.