Issue 13436

As far as I have seen, all photos on species pages...

Reporter: thirsch
Assignee: mdoering
Type: Bug
Summary: As far as I have seen, all photos on species pages...
Resolution: Fixed
Status: Closed
Created: 2013-07-02 14:38:26.6
Updated: 2013-07-18 15:47:48.249
Resolved: 2013-07-18 15:47:48.226
        
        
Description: As far as I have seen, all photos on species pages have undefined source, photographer and copyright. E.g. http://uat.gbif.org/species/3239105

*Reporter*: Tim Hirsch
*E-mail*: [mailto:thirsch]
    


Author: mdoering@gbif.org
Created: 2013-07-02 14:41:42.986
Updated: 2013-07-02 14:41:42.986
        
That might actually be the case, here is what we know about that image:
http://api.gbif.org/name_usage/3239105/images

Here is one where we do have further information:
http://uat.gbif.org/species/109496311#images
http://api.gbif.org/name_usage/109496311/images
    


Author: thirsch@gbif.org
Comment: In that case we seem to have a problem in the way the Wikimedia API is getting the information to us; see the details of the same image at http://commons.wikimedia.org/wiki/File:Tritylodon_BW.jpg . This would explain why a very large number of the photos seem to be unsourced.
Created: 2013-07-02 14:49:28.285
Updated: 2013-07-02 14:49:28.285


Author: mdoering@gbif.org
Created: 2013-07-04 13:55:40.601
Updated: 2013-07-04 13:55:40.601
        
Wikimedia Commons has an API we can use to get the metadata details for an image:
http://commons.wikimedia.org/wiki/Commons:Commons_API

Via their (slow) toolserver API, for example, one gets this for that image:
http://toolserver.org/~magnus/commonsapi.php?image=Tritylodon_BW.jpg&meta

Here is another API they offer, but it's more targeted at Wikipedia pages in general:
https://www.mediawiki.org/wiki/API:Main_page
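
As a rough illustration of the metadata lookup described above, here is a minimal Python sketch. It uses the general MediaWiki API rather than the toolserver script, and it assumes the Commons endpoint at https://commons.wikimedia.org/w/api.php with `prop=imageinfo` and `iiprop=extmetadata`, as exposed by Wikimedia Commons today. The function names and the output keys (`photographer`, `license`, `source`) are hypothetical and are not taken from the dwc builder code.

```python
from urllib.parse import urlencode

# Assumption: the standard MediaWiki API endpoint for Wikimedia Commons.
API_ENDPOINT = "https://commons.wikimedia.org/w/api.php"

# Assumption: these extmetadata keys carry the attribution we care about.
# The right-hand names are hypothetical labels chosen for this sketch.
WANTED_FIELDS = {
    "Artist": "photographer",
    "LicenseShortName": "license",
    "Credit": "source",
}


def build_metadata_url(file_name: str) -> str:
    """Build a query URL asking for the extended metadata of one file."""
    params = {
        "action": "query",
        "titles": f"File:{file_name}",
        "prop": "imageinfo",
        "iiprop": "extmetadata",
        "format": "json",
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


def extract_attribution(response: dict) -> dict:
    """Pull the wanted fields out of a decoded JSON response.

    Fields missing from the response are simply left out of the result.
    Values are returned as delivered and may contain HTML markup.
    """
    result = {}
    pages = response.get("query", {}).get("pages", {})
    for page in pages.values():
        for info in page.get("imageinfo", []):
            metadata = info.get("extmetadata", {})
            for key, label in WANTED_FIELDS.items():
                if key in metadata:
                    result[label] = metadata[key].get("value")
    return result
```

Calling `build_metadata_url("Tritylodon_BW.jpg")` gives the URL to fetch; the decoded JSON body can then be passed to `extract_attribution`. Fetching is left out on purpose so the sketch stays independent of any particular HTTP client.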
    


Author: mdoering@gbif.org
Created: 2013-07-04 13:59:26.221
Updated: 2013-07-04 13:59:26.221
        
I am trying to use these services in the dwc builder, but this will only solve the issue for the wikipedia checklist datasets.
What if another dataset contains a Wikipedia image without proper metadata? What if an occurrence dataset has images linked with missing information? Is GBIF responsible for showing only images where we know the exact license information, and for blocking all other content?

Wikipedia is quite prominent on our species pages, but we need to have a general position across all datasets.

We could also think about adding image metadata subsequently, once we have indexed image (multimedia) records. That way any Wikimedia file would get its attribution, and we could even update it from Wikipedia far more often without relying on the dataset publisher to update their dwc archive (not sure if BioCASE or DiGIR would even allow specifying all that metadata, btw).
    


Author: thirsch@gbif.org
Created: 2013-07-04 14:22:07.127
Updated: 2013-07-04 14:22:07.127
        
I would say that if we solve this for the Wikipedia checklist dataset we have solved most of the problem, as this accounts for some 150,000 species and, I would guess, a great proportion of the images. I have just conferred with Donald, and he agrees that we should only display images where we have at least some source information, if not full licensing details. It is better to err on the side of caution, as we would be badly exposed if we were showing images on the portal without proper authority or attribution.
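
To make the display rule above concrete, here is a minimal Python sketch of such a filter. It is only one possible reading of "at least some source information": an image passes if any one of source, creator or license is present and non-blank. The `ImageRecord` class and its field names are hypothetical and do not reflect the portal's actual data model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ImageRecord:
    """Hypothetical image record; the field names are assumptions."""
    url: str
    source: Optional[str] = None
    creator: Optional[str] = None
    license: Optional[str] = None


def has_source_information(image: ImageRecord) -> bool:
    """True if at least one attribution field is present and non-blank."""
    fields = (image.source, image.creator, image.license)
    return any(value and value.strip() for value in fields)


def displayable(images: List[ImageRecord]) -> List[ImageRecord]:
    """Keep only the images that carry some attribution, in input order."""
    return [image for image in images if has_source_information(image)]
```

A stricter policy, for example requiring a license in every case, would only need a different `has_source_information`.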

    


Author: mdoering@gbif.org
Created: 2013-07-04 23:14:24.6
Updated: 2013-07-04 23:14:24.6
        
I agree that it is better to be careful here. But would that mean we have to block images of specimens if we don't get proper attributions? Most formats through which we get images don't even allow such details to be specified.

The good news is that it looks feasible to get the Wikimedia Commons image metadata through simple screen scraping. I might get it out tomorrow.
    


Author: thirsch@gbif.org
Comment: I think specimen images provided by the institution where a collection is housed (and from which the data are published) are in a somewhat different category from species-level images served via a checklist like Wikipedia. I am not aware of how the first type of image is currently displayed on the portal, so if you could send some examples to help with context, we can make decisions. It is probably better to continue this offline, and once the Wikipedia licensing details are implemented, we can consider this issue closed.
Created: 2013-07-05 08:56:27.84
Updated: 2013-07-05 08:56:27.84


Author: mdoering@gbif.org
Created: 2013-07-05 12:31:45.424
Updated: 2013-07-05 12:31:45.424
        
We do not show any specimen images yet, that is work to be done. Let's discuss it once we get there.
Wikipedia image metadata should now be included in the next dwc archive build; the modified code is here: https://github.com/mdoering/wikipedia-dwca

I'll add a comment once it shows up on staging so we can review it.
    


Author: mdoering@gbif.org
Created: 2013-07-16 10:13:16.227
Updated: 2013-07-16 10:13:16.227
        
The German Wikipedia image licenses can be previewed in this dwca validation report; see the bottom table:
http://tools.gbif.org/dwca-reports/197-2950889704639469747.html
    


Author: mdoering@gbif.org
Created: 2013-07-18 15:47:43.691
Updated: 2013-07-18 15:47:43.691
        
The Wikipedia datasets now have license and further image metadata indexed; for example, see this spider page:
http://staging.gbif.org:8080/portal/species/116771927#images

License information is also indexed and presented for Wikipedia textual descriptions.