Issue 10533

Registry-metadata-service: Complete the EMLParser for dataset

Reporter: trobertson
Assignee: kbraak
Type: Bug
Summary: Registry-metadata-service: Complete the EMLParser for dataset
Priority: Critical
Resolution: Fixed
Status: Closed
Created: 2011-12-15 10:51:21.54
Updated: 2013-12-16 17:50:41.418
Resolved: 2012-01-23 11:27:38.217
        
Description: The EMLParser and the related EMLParserTest contain a lot of commented-out code.

This code needs to be worked through and ported from building the EML to building the Dataset object.
    


Author: kbraak@gbif.org
Created: 2011-12-19 13:23:11.658
Updated: 2011-12-19 13:23:11.658
        
Things not currently parsed:

-metadata language as xml:lang attribute
-GUID as packageId attribute
-support for titles in other languages
-additionalInfo
-intellectualRights
-purpose
-specimenPreservationMethod
-hierarchyLevel
-parentCollectionIdentifier
-collectionIdentifier
-collectionName
-pubDate
-dateStamp

Missing jgtiCuratorialUnit info:

-jgtiUnitType
-jgtiUnitRanges (beginRange & endRange)
-jgtiUnits

Missing attributes from studyAreaDescription:

-studyAreaDescription/descriptor@name
-studyAreaDescription/descriptor@citableClassificationSystem

Missing PhysicalData:

-physical/objectName
-physical/characterEncoding
-physical/dataFormat/externallyDefinedFormat/formatName
-physical/dataFormat/externallyDefinedFormat/formatVersion
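
To illustrate what reading a few of the items above involves, here is a minimal sketch in Python (not the actual EMLParser code). The sample document, the `extract_fields` function and the output key names are all hypothetical, and the position of the `physical` element in the sample is an assumption, which is why it is looked up anywhere in the tree.

```python
import xml.etree.ElementTree as ET

# The xml: prefix is always bound to this namespace, so ElementTree
# exposes the xml:lang attribute under this qualified key.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical, heavily trimmed sample; a real EML file carries far more.
SAMPLE_EML = """<eml:eml xmlns:eml="eml://ecoinformatics.org/eml-2.1.1"
    packageId="example-guid/v1" xml:lang="en">
  <dataset>
    <title xml:lang="en">Example dataset</title>
    <pubDate>2012-01-20</pubDate>
    <physical>
      <objectName>occurrences.txt</objectName>
      <characterEncoding>UTF-8</characterEncoding>
      <dataFormat>
        <externallyDefinedFormat>
          <formatName>Darwin Core Archive</formatName>
          <formatVersion>1.0</formatVersion>
        </externallyDefinedFormat>
      </dataFormat>
    </physical>
  </dataset>
</eml:eml>"""


def extract_fields(eml_text):
    """Return a dict with a handful of the fields listed above (None if absent)."""
    root = ET.fromstring(eml_text)
    # Assumption: take the first physical element found anywhere in the tree.
    physical = root.find(".//physical")

    def physical_text(path):
        # Guard against documents that have no physical element at all.
        return physical.findtext(path) if physical is not None else None

    return {
        "metadataLanguage": root.get(XML_LANG),   # xml:lang on the root
        "packageId": root.get("packageId"),       # GUID as packageId attribute
        "pubDate": root.findtext(".//pubDate"),
        "objectName": physical_text("objectName"),
        "characterEncoding": physical_text("characterEncoding"),
        "formatName": physical_text(
            "dataFormat/externallyDefinedFormat/formatName"),
        "formatVersion": physical_text(
            "dataFormat/externallyDefinedFormat/formatVersion"),
    }


fields = extract_fields(SAMPLE_EML)
for key, value in fields.items():
    print(key, "=", value)
```

Missing elements simply come back as None, so each field can be mapped onto the Dataset object independently of the others.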


    


Author: kbraak@gbif.org
Comment: We now need to decide which fields should be added to the Dataset object, and which of the remaining unparsed elements actually correspond to existing Dataset fields.
Created: 2011-12-20 11:22:19.658
Updated: 2011-12-20 11:22:19.658


Author: kbraak@gbif.org
Created: 2012-01-20 10:55:06.746
Updated: 2012-01-20 10:55:06.746
        
Added recently to Dataset are:

-metadata language as xml:lang attribute
-additionalInfo
-intellectualRights
-purpose
-specimenPreservationMethod
-parentCollectionIdentifier
-collectionIdentifier
-collectionName
-pubDate

-jgtiUnitType
-jgtiUnitRanges (beginRange & endRange)
-jgtiUnits
-jgti uncertaintyMeasure

-physical/objectName
-physical/characterEncoding
-physical/dataFormat/externallyDefinedFormat/formatName
-physical/dataFormat/externallyDefinedFormat/formatVersion

Outstanding fields still not present in the Dataset and not parsed from the EML (with reasons why):

-GUID as packageId attribute (not really important, used to version the metadata)
-dateStamp (same as pubDate)
-hierarchyLevel (means "Dataset level to which the metadata applies; default value is dataset" - I don't know of any deviation from the default)
-studyAreaDescription/descriptor@name and studyAreaDescription/descriptor@citableClassificationSystem (The IPT populates these fields with name="generic" citableClassificationSystem="false", and I reckon they are rarely populated with anything else anyway)
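
As a rough way to check the assumption about the descriptor attributes, the following Python sketch flags descriptors that carry only the IPT values quoted above. The helper name `has_only_ipt_defaults` and the sample snippet are hypothetical and are not part of the registry code.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: one descriptor with the IPT values, one without.
SAMPLE = """<studyAreaDescription>
  <descriptor name="generic" citableClassificationSystem="false">
    <descriptorValue>Some study area</descriptorValue>
  </descriptor>
  <descriptor name="biome" citableClassificationSystem="true">
    <descriptorValue>Temperate forest</descriptorValue>
  </descriptor>
</studyAreaDescription>"""


def has_only_ipt_defaults(descriptor):
    """True when both attributes hold the values the IPT writes."""
    return (descriptor.get("name") == "generic"
            and descriptor.get("citableClassificationSystem") == "false")


descriptors = ET.fromstring(SAMPLE).findall("descriptor")
# Descriptors that would actually lose information if the attributes are skipped.
informative = [d for d in descriptors if not has_only_ipt_defaults(d)]
print(len(informative), "of", len(descriptors), "descriptors deviate from the IPT values")
```

Running a check like this over harvested EML documents would show how often the two attributes hold anything other than the IPT values.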

    


Author: kbraak@gbif.org
Comment: By the way, these new fields are now also shown on the portal Dataset page.
Created: 2012-01-20 18:59:15.156
Updated: 2012-01-20 18:59:15.156


Author: kbraak@gbif.org
Comment: At this point, we are parsing all required fields and can confidently close this.
Created: 2012-01-23 11:27:38.266
Updated: 2012-01-23 11:27:38.266