Issue 11635

DiscoverLife checklist dwca

11635
Reporter: mdoering
Type: Task
Summary: DiscoverLife checklist dwca
Priority: Major
Status: Open
Created: 2012-07-30 12:56:24.599
Updated: 2017-01-05 15:14:43.066
        
Description: DiscoverLife will export a DwC-A with an extension containing image URLs.
  i) they can include some kind of priority flag for the preferred image
  ii) they can provide images in various sizes
  iii) they need our portal to link to their page
  iv) they need their single copyright statement shown

All they need from us is the format to produce.
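As a starting point for "the format to produce", here is a minimal sketch that generates a meta.xml for a Taxon core plus an image extension. It assumes the GBIF Simple Images extension (rowType http://rs.gbif.org/terms/1.0/Image), tab-delimited files named taxon.txt and images.txt with one header line each, and an eml.xml metadata file. The file names, column layout and helper names are illustrative assumptions, not something DiscoverLife has agreed to.

```python
import xml.etree.ElementTree as ET

# Namespace used by Darwin Core Archive descriptor files (meta.xml).
DWC_TEXT_NS = "http://rs.tdwg.org/dwc/text/"

# Assumed core columns: taxonID first, then kingdom..genus, then scientificName.
TAXON_TERMS = [
    "http://rs.tdwg.org/dwc/terms/taxonID",
    "http://rs.tdwg.org/dwc/terms/kingdom",
    "http://rs.tdwg.org/dwc/terms/phylum",
    "http://rs.tdwg.org/dwc/terms/class",
    "http://rs.tdwg.org/dwc/terms/order",
    "http://rs.tdwg.org/dwc/terms/family",
    "http://rs.tdwg.org/dwc/terms/genus",
    "http://rs.tdwg.org/dwc/terms/scientificName",
]

# Assumed image columns; column 0 holds the taxonID and is declared via <coreid>.
IMAGE_TERMS = [
    None,
    "http://purl.org/dc/terms/identifier",  # URL of the image file
    "http://purl.org/dc/terms/references",  # HTML page showing the image
    "http://purl.org/dc/terms/license",     # licence / copyright statement
]


def file_element(tag, row_type, location, id_tag, terms):
    """Build a <core> or <extension> element describing one tab-delimited file."""
    el = ET.Element(tag, {
        "encoding": "UTF-8",
        "fieldsTerminatedBy": "\\t",  # literal backslash-t, as meta.xml expects
        "linesTerminatedBy": "\\n",
        "fieldsEnclosedBy": "",
        "ignoreHeaderLines": "1",
        "rowType": row_type,
    })
    files = ET.SubElement(el, "files")
    ET.SubElement(files, "location").text = location
    ET.SubElement(el, id_tag, {"index": "0"})  # <id> for the core, <coreid> for extensions
    for index, term in enumerate(terms):
        if term is not None:
            ET.SubElement(el, "field", {"index": str(index), "term": term})
    return el


def build_meta_xml():
    """Return the meta.xml document for the assumed taxon + image archive."""
    root = ET.Element("archive", {"xmlns": DWC_TEXT_NS, "metadata": "eml.xml"})
    root.append(file_element("core", "http://rs.tdwg.org/dwc/terms/Taxon",
                             "taxon.txt", "id", TAXON_TERMS))
    root.append(file_element("extension", "http://rs.gbif.org/terms/1.0/Image",
                             "images.txt", "coreid", IMAGE_TERMS))
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_meta_xml())
```

The rowType and term URIs are worth double-checking against the extension definitions registered with GBIF before this goes to them.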
    
Attachment discoverlife.zip


Author: mdoering@gbif.org
Created: 2012-07-30 13:03:30.908
Updated: 2012-07-30 13:03:30.908
        
Sample data: http://www.discoverlife.org/export/img_list.moth_name.txt
for ~600k species images
    


Author: mdoering@gbif.org
Comment: Basic sample dwca attached
Created: 2012-07-30 13:39:45.379
Updated: 2012-07-30 13:39:45.379


Author: trobertson@gbif.org
Created: 2012-07-30 13:50:46.061
Updated: 2012-07-30 13:52:45.693
        
Thanks for the example. This doesn't seem to address points i) to iv) in the issue above.

a) how will it handle homonyms? They can provide higher taxa, but I am not sure if they have species/taxon IDs (we need one, right? Concatenate Kingdom:Species?)
b) there is no dataset metadata - can we provide a basic sample?
c) it needs a link to their page
d) it needs copyright statement

What image size should we use? I'm not sure what sizes they have, but please see:
  http://www.discoverlife.org/mp/20p?see=I_LHT607&res=320
  http://www.discoverlife.org/mp/20p?see=I_LHT607&res=640

The more complete our example, the easier this will all be; they will just copy the example.

Thanks!



    


Author: mdoering@gbif.org
Created: 2012-07-30 14:46:39.096
Updated: 2012-07-30 14:46:39.096
        
a) they need to provide some id that links the taxon to the image. As long as they know they don't have homonyms, they could just use the scientific name. But maybe they have some taxon id, let's see.

b) I will attach a basic EML shortly

c) a taxon page link is simple, but I assume they want a link to the image / occurrence page? That's not in the image extension so far. If required, we could update the extension and add such a link.

d) a license is included already. Do you think a separate dc:rights field is needed? So far I think we show the license as the copyright on the images here: http://portal-static.gbif.org/species/name_usage.html

e) for the size I always suggest the largest possible size, the original if possible. We need to crop/resize them for our portal use anyway to have consistent sizes. If we want several sizes to be published, we might want to consider using Audubon Core, but I still think it is experimental and definitely not supported by any of our software so far.

    


Author: mdoering@gbif.org
Comment: Lat / Lon might be good too, I'll add that.
Created: 2012-07-30 14:47:49.985
Updated: 2012-07-30 14:47:57.398


Author: trobertson@gbif.org
Created: 2012-07-30 14:51:45.952
Updated: 2012-07-30 14:51:45.952
        
a) they have homonyms.  "Animalia:Puma concolor" ?

b) thanks

c) taxon page would be fine I think

d) as long as we display it, license field would be fine

e) large is HUGE!  Given it is primarily for the species page on that widget... I propose 640px.  Remember this is loaded by all browsers...

    


Author: trobertson@gbif.org
Created: 2012-07-30 14:55:56.632
Updated: 2012-07-30 14:55:56.632
        
http://www.discoverlife.org/mp/20p?see=I_LHT607&res=mx

See the sizes listed underneath.
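The size variants in these links differ only in the res parameter, so the export could carry one image id and we could build whichever size we settle on. A minimal sketch, assuming the 20p?see=<id>&res=<size> pattern from the links above holds for all their images; only the res values 320, 640 and mx actually appear in this thread, and the function name is hypothetical.

```python
from urllib.parse import urlencode

# Image URL pattern taken from the links quoted in this issue.
IMAGE_BASE = "http://www.discoverlife.org/mp/20p"

# Resolutions seen in the thread; other values may exist but are unverified.
KNOWN_RES = ("320", "640", "mx")


def discoverlife_image_url(image_id: str, res: str = "640") -> str:
    """Build a DiscoverLife image URL for an image id at a given resolution.

    Defaults to 640, the size proposed for the species page widget.
    """
    if res not in KNOWN_RES:
        raise ValueError(f"unexpected res {res!r}, expected one of {KNOWN_RES}")
    return f"{IMAGE_BASE}?{urlencode({'see': image_id, 'res': res})}"


if __name__ == "__main__":
    for res in KNOWN_RES:
        print(discoverlife_image_url("I_LHT607", res))
```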
    


Author: mdoering@gbif.org
Created: 2012-07-30 14:59:15.672
Updated: 2012-07-30 14:59:15.672
        
c) I'll add a taxon page link like this: http://www.discoverlife.org/mp/20q?search=Falco+sparverius

e) it's an outstanding issue, but don't you think it's rather unlikely we can get away without caching and resizing images? Some of them load very slowly even though they are small.
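If we do end up caching and resizing, the target dimensions for the 640px proposal come down to scaling the longer edge while keeping the aspect ratio. A minimal sketch of that arithmetic only; the actual resampling and caching would need an imaging library and storage that are not shown, the 640px limit is just the value proposed above, and fit_within is a hypothetical name.

```python
from typing import Tuple


def fit_within(width: int, height: int, max_edge: int = 640) -> Tuple[int, int]:
    """Scale (width, height) so the longer edge is at most max_edge.

    The aspect ratio is preserved, and images that are already small
    enough are returned unchanged (never upscaled).
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    longest = max(width, height)
    if longest <= max_edge:
        return width, height
    scale = max_edge / longest
    # Round each edge to the nearest whole pixel, never below 1.
    return max(1, round(width * scale)), max(1, round(height * scale))


if __name__ == "__main__":
    print(fit_within(3000, 2000))  # landscape original
    print(fit_within(1000, 4000))  # portrait original
    print(fit_within(500, 300))    # already small, left as-is
```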
    


Author: mdoering@gbif.org
Comment: c) the image extension actually provides a concept for HTML links to images too, http://purl.org/dc/terms/references. I'll add both.
Created: 2012-07-30 15:15:47.342
Updated: 2012-07-30 15:15:47.342
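With both links in place, each image row would carry the image file URL (dcterms:identifier) and its HTML page (dcterms:references), and the taxon core can link to the species search page quoted earlier. A minimal sketch, assuming a tab-delimited images.txt with the columns coreid, identifier, references, license in that order; the helper names are hypothetical and the example.org URLs in the usage are placeholders.

```python
from urllib.parse import quote_plus

# Taxon search page pattern from the example link in this thread.
TAXON_PAGE_BASE = "http://www.discoverlife.org/mp/20q?search="


def taxon_page_url(scientific_name: str) -> str:
    """Link to the DiscoverLife page for a name (e.g. dcterms:references in the core)."""
    return TAXON_PAGE_BASE + quote_plus(scientific_name)


def image_row(taxon_id: str, image_url: str, page_url: str, license_text: str) -> str:
    """Format one tab-delimited images.txt line (assumed column order).

    Columns: coreid, dcterms:identifier, dcterms:references, dcterms:license.
    """
    fields = [taxon_id, image_url, page_url, license_text]
    # A tab or newline inside a value would shift or split the columns.
    for value in fields:
        if "\t" in value or "\n" in value:
            raise ValueError(f"value contains a tab or newline: {value!r}")
    return "\t".join(fields)


if __name__ == "__main__":
    print(taxon_page_url("Falco sparverius"))
    print(image_row(
        "Falconidae:Falco sparverius",
        "http://example.org/image.jpg",   # placeholder image URL
        "http://example.org/image-page",  # placeholder HTML page URL
        "placeholder licence text",
    ))
```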


Author: mdoering@gbif.org
Comment: Attaching new dwca with all above issues fixed and EML file included. Validator report: http://tools.gbif.org/dwca-reports/212-3672433171397159244.html
Created: 2012-07-30 15:42:25.276
Updated: 2012-07-30 15:42:25.276


Author: trobertson@gbif.org
Created: 2012-07-31 12:15:48.209
Updated: 2012-07-31 12:15:48.209
        
This appears to have homonym issues, no? Should it not concatenate kingdom:species or family:species in the taxonID?


    


Author: mdoering@gbif.org
Created: 2012-07-31 12:24:11.562
Updated: 2012-07-31 12:24:11.562
        
Hard to say if they have homonyms without knowing the data. If they do, there is no simple rule to avoid them. Many homonyms, for example, are within the animal kingdom, but then again binomial homonyms are very, very rare. If you enter our classic Oenanthe homonym into their search, they don't disambiguate them and show bird names as plants: http://www.discoverlife.org/mp/20q?search=Oenanthe

If they always had a family, adding the family would be good enough. I'll write to John to see what their data looks like so we can come up with a strategy.
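Whether family is enough can be checked on their data before the key is fixed: any name that turns up under more than one kingdom/family pair is a homonym the key has to separate. A minimal sketch, assuming the export can be read as (kingdom, family, name) tuples; find_homonyms is a hypothetical name. The usage reuses the Oenanthe case above (a bird genus in Muscicapidae and a plant genus in Apiaceae).

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Placement = Tuple[str, str]  # (kingdom, family)


def find_homonyms(records: Iterable[Tuple[str, str, str]]) -> Dict[str, Set[Placement]]:
    """Return names that occur under more than one (kingdom, family) pair.

    Each record is (kingdom, family, scientific_name). Repeated rows for the
    same name in the same family are duplicates, not homonyms, and collapse.
    """
    placements: Dict[str, Set[Placement]] = defaultdict(set)
    for kingdom, family, name in records:
        placements[name].add((kingdom, family))
    return {name: where for name, where in placements.items() if len(where) > 1}


if __name__ == "__main__":
    sample = [
        ("Animalia", "Muscicapidae", "Oenanthe"),
        ("Plantae", "Apiaceae", "Oenanthe"),
        ("Animalia", "Felidae", "Puma concolor"),
        ("Animalia", "Felidae", "Puma concolor"),  # duplicate row, not a homonym
    ]
    for name, where in find_homonyms(sample).items():
        print(name, sorted(where))
```

A name repeated within a single family would not show up here, and family:name would not separate it either, so that case would need its own check.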
    


Author: trobertson@gbif.org
Comment: I have already. We did some analysis in the US and he believes he has 6 cross-kingdom binomials, but I gave him a list of 100 or so from our stuff, which they are now checking in case they missed some. They do have homonyms... Should we suggest "family:name"?
Created: 2012-07-31 12:27:40.794
Updated: 2012-07-31 12:27:40.794


Author: mdoering@gbif.org
Comment: if they have a family for those cases, at least that sounds safe, yes!
Created: 2012-07-31 12:30:29.045
Updated: 2012-07-31 12:30:29.045


Author: trobertson@gbif.org
Created: 2012-07-31 12:39:35.049
Updated: 2012-07-31 12:39:35.049
        
Please see the new (smaller) DwC-A attached with:
a) reduced records as it is only for example
b) family:name as the key
c) reordered fields to improve readability (k,p,c,o,f,g,sn)

Can you please confirm this is the format we want to request of them, Markus?
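For the request to them, the key and column order from this version could be spelled out as a small script. A minimal sketch, assuming tab-delimited output with a header line and input rows given as dicts keyed by the Darwin Core term names; make_taxon_id and write_taxon_file are hypothetical helpers. Rows without a family are rejected rather than guessed, since family:name is only safe when the family is present.

```python
import csv
import io
from typing import Dict, Iterable, TextIO

# Column order from the revised example: the key first, then k, p, c, o, f, g, sn.
COLUMNS = ["taxonID", "kingdom", "phylum", "class", "order",
           "family", "genus", "scientificName"]


def make_taxon_id(family: str, scientific_name: str) -> str:
    """Build the agreed family:name key, e.g. 'Geometridae:Melanolophia signataria'."""
    if not family:
        raise ValueError(f"no family for {scientific_name!r}, cannot build family:name key")
    return f"{family}:{scientific_name}"


def write_taxon_file(rows: Iterable[Dict[str, str]], out: TextIO) -> None:
    """Write taxon rows as tab-delimited text with a header, refusing duplicate keys."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n",
                        quoting=csv.QUOTE_NONE)
    writer.writerow(COLUMNS)
    seen = set()
    for row in rows:
        taxon_id = make_taxon_id(row.get("family", ""), row["scientificName"])
        if taxon_id in seen:
            raise ValueError(f"duplicate taxonID {taxon_id!r}")
        seen.add(taxon_id)
        writer.writerow([taxon_id] + [row.get(col, "") for col in COLUMNS[1:]])


if __name__ == "__main__":
    buf = io.StringIO()
    write_taxon_file([{
        "kingdom": "Animalia", "phylum": "Arthropoda", "class": "Insecta",
        "order": "Lepidoptera", "family": "Geometridae",
        "genus": "Melanolophia", "scientificName": "Melanolophia signataria",
    }], buf)
    print(buf.getvalue(), end="")
```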
    


Author: trobertson@gbif.org
Created: 2012-07-31 12:42:17.21
Updated: 2012-07-31 12:42:17.21
        
http://tools.gbif.org/dwca-reports/213-480599337726800466.html

Why are the "Geometridae:Melanolophia signataria" records missing?
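The thread doesn't record why those rows were missing (the next upload mentions a cleaned meta.xml). One thing worth ruling out on any future export is image rows whose coreid has no matching taxonID in the core, for example because of a typo or stray whitespace in the key. A minimal sketch, assuming both files are tab-delimited with one header line and the id in the first column; the function names are hypothetical and the sample rows are made up for illustration.

```python
import csv
import io
from typing import List, Set, TextIO


def read_first_column(stream: TextIO) -> List[str]:
    """Read the first column of a tab-delimited file, skipping one header line."""
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    next(reader, None)  # header line, as with ignoreHeaderLines="1"
    return [row[0] for row in reader if row]


def orphan_coreids(core: TextIO, extension: TextIO) -> Set[str]:
    """Return extension coreid values that do not match any core taxonID."""
    core_ids = set(read_first_column(core))
    return {cid for cid in read_first_column(extension) if cid not in core_ids}


if __name__ == "__main__":
    taxa = io.StringIO("taxonID\tscientificName\n"
                       "Geometridae:Melanolophia canadaria\tMelanolophia canadaria\n")
    # The second image key has a trailing space, so it matches no core row.
    images = io.StringIO("coreid\tidentifier\n"
                         "Geometridae:Melanolophia canadaria\thttp://example.org/1.jpg\n"
                         "Geometridae:Melanolophia signataria \thttp://example.org/2.jpg\n")
    print(orphan_coreids(taxa, images))
```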
    


Author: mdoering@gbif.org
Comment: Uploading updated version with license in data file and cleaned meta.xml. Validator result has all images: http://tools.gbif.org/dwca-reports/213-3873068880386366569.html
Created: 2012-07-31 15:22:26.435
Updated: 2012-07-31 15:22:26.435


Author: mdoering@gbif.org
Comment: If we can get data like this that would be awesome!
Created: 2012-07-31 15:23:13.425
Updated: 2012-07-31 15:23:13.425