Registry-metadata-service: Complete the EMLParser for dataset
10533
Reporter: trobertson
Assignee: kbraak
Type: Bug
Summary: Registry-metadata-service: Complete the EMLParser for dataset
Priority: Critical
Resolution: Fixed
Status: Closed
Created: 2011-12-15 10:51:21.54
Updated: 2013-12-16 17:50:41.418
Resolved: 2012-01-23 11:27:38.217
Description: The EMLParser and the related EMLParserTest contain a lot of commented-out code.
This code needs to be worked through and ported from building the EML to building the Dataset object.
Author: kbraak@gbif.org
Created: 2011-12-19 13:23:11.658
Updated: 2011-12-19 13:23:11.658
Things not currently parsed:
-metadata language as xml:lang attribute
-GUID as packageId attribute
-support for titles in other languages
-additionalInfo
-intellectualRights
-purpose
-specimenPreservationMethod
-hierarchyLevel
-parentCollectionIdentifier
-collectionIdentifier
-collectionName
-pubDate
-dateStamp
Missing jgtiCuratorialUnit info:
-jgtiUnitType
-jgtiUnitRanges (beginRange & endRange)
-jgtiUnits
Missing attributes from studyAreaDescription
-studyAreaDescription/descriptor@name
-studyAreaDescription/descriptor@citableClassificationSystem
Missing PhysicalData:
-physical/objectName
-physical/characterEncoding
-physical/dataFormat/externallyDefinedFormat/formatName
-physical/dataFormat/externallyDefinedFormat/formatVersion
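For illustration only, a minimal Python sketch of reading a few of the items listed above (the root xml:lang and packageId attributes, pubDate, and the physical elements). This is not the Java EMLParser code. The sample document, its values, the assumed nesting of physical under additionalMetadata/metadata/gbif, and the function name parse_eml_fields are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# xml:lang lives in the reserved XML namespace, so ElementTree exposes it
# under this fully qualified attribute name.
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Hypothetical, heavily simplified EML document (all values are made up).
SAMPLE_EML = """<eml:eml xmlns:eml="eml://ecoinformatics.org/eml-2.1.1"
         packageId="example-guid/v1" xml:lang="en">
  <dataset>
    <title>Example dataset</title>
    <pubDate>2011-12-15</pubDate>
  </dataset>
  <additionalMetadata>
    <metadata>
      <gbif>
        <physical>
          <objectName>occurrences.csv</objectName>
          <characterEncoding>UTF-8</characterEncoding>
          <dataFormat>
            <externallyDefinedFormat>
              <formatName>CSV</formatName>
              <formatVersion>1.0</formatVersion>
            </externallyDefinedFormat>
          </dataFormat>
        </physical>
      </gbif>
    </metadata>
  </additionalMetadata>
</eml:eml>"""


def _text(parent, path):
    """Return the stripped text of the first match for path, or None."""
    if parent is None:
        return None
    node = parent.find(path)
    if node is None or node.text is None:
        return None
    return node.text.strip()


def parse_eml_fields(xml_text):
    """Collect a handful of EML values into a plain dict (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    # Assumed location of the physical block in the sample above.
    physical = root.find("additionalMetadata/metadata/gbif/physical")
    fmt = "dataFormat/externallyDefinedFormat/"
    return {
        "language": root.get("{%s}lang" % XML_NS),
        "packageId": root.get("packageId"),
        "pubDate": _text(root, "dataset/pubDate"),
        "objectName": _text(physical, "objectName"),
        "characterEncoding": _text(physical, "characterEncoding"),
        "formatName": _text(physical, fmt + "formatName"),
        "formatVersion": _text(physical, fmt + "formatVersion"),
    }


fields = parse_eml_fields(SAMPLE_EML)
```

Elements that are absent simply come back as None, so a missing physical block does not raise an error.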
Author: kbraak@gbif.org
Comment: We now need to decide which fields should be added to the Dataset object, and which of the remaining unparsed elements actually correspond to existing Dataset fields.
Created: 2011-12-20 11:22:19.658
Updated: 2011-12-20 11:22:19.658
Author: kbraak@gbif.org
Created: 2012-01-20 10:55:06.746
Updated: 2012-01-20 10:55:06.746
Added recently to Dataset are:
-metadata language as xml:lang attribute
-additionalInfo
-intellectualRights
-purpose
-specimenPreservationMethod
-parentCollectionIdentifier
-collectionIdentifier
-collectionName
-pubDate
-jgtiUnitType
-jgtiUnitRanges (beginRange & endRange)
-jgtiUnits
-jgti uncertaintyMeasure
-physical/objectName
-physical/characterEncoding
-physical/dataFormat/externallyDefinedFormat/formatName
-physical/dataFormat/externallyDefinedFormat/formatVersion
Outstanding fields still not present in the Dataset and not parsed from the EML (with reasons why):
-GUID as packageId attribute (not really important, used to version the metadata)
-dateStamp (same as pubDate)
-hierarchyLevel (means "Dataset level to which the metadata applies; default value is dataset" - I don't know of any deviation from the default)
-studyAreaDescription/descriptor@name and studyAreaDescription/descriptor@citableClassificationSystem (the IPT populates these attributes with name="generic" and citableClassificationSystem="false", and I reckon they are rarely populated with anything else anyway)
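A rough way to check the assumption about the descriptor attributes is to scan EML documents for descriptor elements whose attributes deviate from the IPT defaults quoted above. The Python sketch below is illustrative only: the sample XML, the second descriptor's values, and the function name non_default_descriptors are made up for the example.

```python
import xml.etree.ElementTree as ET

# Defaults the IPT is said to write, as quoted in the list above.
IPT_DEFAULTS = {"name": "generic", "citableClassificationSystem": "false"}

# Hypothetical fragment: one descriptor with defaults, one without.
SAMPLE = """<dataset>
  <project>
    <studyAreaDescription>
      <descriptor name="generic" citableClassificationSystem="false">
        <descriptorValue>Coastal wetlands</descriptorValue>
      </descriptor>
      <descriptor name="thematic" citableClassificationSystem="true">
        <descriptorValue>Marine</descriptorValue>
      </descriptor>
    </studyAreaDescription>
  </project>
</dataset>"""


def non_default_descriptors(xml_text):
    """Return the attribute dicts of descriptors that differ from the IPT defaults."""
    root = ET.fromstring(xml_text)
    deviating = []
    for descriptor in root.iter("descriptor"):
        attrs = {key: descriptor.get(key) for key in IPT_DEFAULTS}
        if attrs != IPT_DEFAULTS:
            deviating.append(attrs)
    return deviating


deviations = non_default_descriptors(SAMPLE)
```

If a scan like this over real metadata returned nothing, that would support leaving the two attributes unparsed.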
Author: kbraak@gbif.org
Comment: These new fields are also showing on the portal Dataset page, by the way.
Created: 2012-01-20 18:59:15.156
Updated: 2012-01-20 18:59:15.156
Author: kbraak@gbif.org
Comment: At this point, we are parsing all required fields and can confidently close this.
Created: 2012-01-23 11:27:38.266
Updated: 2012-01-23 11:27:38.266