Issue 10631

Support extra dataset constituents metadata in dwc archives

10631
Reporter: mdoering
Assignee: mdoering
Type: NewFeature
Summary: Support extra dataset constituents metadata in dwc archives
Priority: Critical
Resolution: Fixed
Status: Closed
Created: 2012-01-15 12:21:51.327
Updated: 2013-12-17 15:46:36.308
DueDate: 2012-11-30 00:00:00.0
Resolved: 2012-11-21 09:34:00.162
        
Description: The Catalogue of Life DwC checklist archive comes with extra EML files, one per dataset constituent (GSD = Global Species Database). The dwc:datasetID with ".xml" appended is used as the EML file name within the dataset subfolder of the archive.

For example see dataset/45.xml in the attached archive.
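A minimal sketch of locating the constituent EML files, assuming only the layout described above (a "dataset/" folder holding one "<datasetID>.xml" file per constituent). The function name constituent_eml_paths is hypothetical and is not the dwca reader API:

```python
import zipfile
from pathlib import PurePosixPath

# Assumed layout, per the description: constituent EML files sit in a
# folder named "dataset" and are named "<dwc:datasetID>.xml".
CONSTITUENT_DIR = "dataset"


def constituent_eml_paths(archive):
    """Map each dwc:datasetID to the zip entry holding its constituent EML.

    `archive` may be a file path or a binary file-like object.
    """
    constituents = {}
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            path = PurePosixPath(name)
            # Keep only .xml files whose immediate parent folder is "dataset";
            # directory entries like "dataset/" have no suffix and are skipped.
            if path.parent.name == CONSTITUENT_DIR and path.suffix == ".xml":
                constituents[path.stem] = name
    return constituents
```

For the attached archive this would yield an entry such as "45" -> "dataset/45.xml", alongside the main eml.xml, which is not a constituent.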

Every included constituent needs to be registered as a separate dataset, with its constituent relationship to the full archive dataset recorded. To reliably update existing constituents, the datasetID needs to be tracked (as tags?).
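One way the tracking could work, sketched under the assumption that each registered constituent carries its dwc:datasetID (for example in a tag) so it can be matched back to the archive. SyncPlan and plan_constituent_sync are hypothetical names, not part of the crawler or registry:

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class SyncPlan:
    """Hypothetical plan for bringing registered constituents in line with an archive."""
    to_create: List[str] = field(default_factory=list)      # datasetIDs new in the archive
    to_update: Dict[str, str] = field(default_factory=dict)  # datasetID -> existing registry key
    orphaned: Dict[str, str] = field(default_factory=dict)   # registered, but gone from the archive


def plan_constituent_sync(archive_ids: Iterable[str], registered: Dict[str, str]) -> SyncPlan:
    """Compare datasetIDs found in the archive with those already registered.

    `registered` maps the tracked datasetID (assumed stored as a tag) to the
    registry key of the constituent dataset.
    """
    archive_set = set(archive_ids)
    plan = SyncPlan()
    for dataset_id in sorted(archive_set):
        if dataset_id in registered:
            # Known constituent: update in place instead of registering a duplicate.
            plan.to_update[dataset_id] = registered[dataset_id]
        else:
            plan.to_create.append(dataset_id)
    for dataset_id, key in registered.items():
        if dataset_id not in archive_set:
            plan.orphaned[dataset_id] = key
    return plan
```

Keying on the datasetID rather than on titles or file order is what makes repeated syncs idempotent; how orphaned constituents are handled (deleted or just flagged) is left open here.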

Attachment archive-kingdom-plantae-family-araucariaceae-bl3.zip


Author: mdoering@gbif.org
Comment: Is this a job for the crawler or the synchronizer, and how do they play together?
Created: 2012-10-26 21:25:15.258
Updated: 2012-10-26 21:25:15.258


Author: mdoering@gbif.org
Comment: This is part of a deliverable for the i4Life project in January 2013. To leave some time for testing, the due date should be before Christmas.
Created: 2012-11-01 17:57:09.4
Updated: 2012-11-01 21:26:06.518


Author: mdoering@gbif.org
Created: 2012-11-21 09:33:42.783
Updated: 2012-11-21 09:33:42.783
        
Added a method to the dwca reader to get a map of constituents:
http://code.google.com/p/darwincore/source/detail?r=1584

Added support for it in the dwca metasync:
http://code.google.com/p/gbif-crawler/source/detail?r=145
http://code.google.com/p/gbif-crawler/source/detail?r=148