Issue 17805

Generated EML doesn't validate against its own XSD

Reporter: cgendreau
Assignee: kbraak
Type: Bug
Summary: Generated EML doesn't validate against its own XSD
Priority: Major
Resolution: Fixed
Status: Closed
Created: 2015-09-16 15:15:15.522
Updated: 2016-09-09 11:14:21.536
Resolved: 2016-09-09 11:14:21.417
        
Description: EML documents generated by the EMLWriter class don't validate against the declared schema (http://rs.gbif.org/schema/eml-gbif-profile/1.0.2/eml.xsd).

The following validation issues currently exist:
* Improper use of system attribute in alternateIdentifier: _cvc-type.3.1.1: Element 'alternateIdentifier' is a simple type, so it cannot have attributes, ... However, the attribute, 'system' was found._
* Improper use of userId element: _cvc-complex-type.2.4.a: Invalid content was found starting with element 'userId'. One of '{organizationName, individualName, positionName, address, phone, electronicMailAddress, onlineUrl}' is expected._
* Empty temporalCoverage when no singleDate or dateRange exists: _cvc-complex-type.2.4.b: The content of element 'temporalCoverage' is not complete. One of '{rangeOfDates, singleDateTime}' is expected._
* Invalid value of attribute 'name' on element descriptor: _cvc-attribute.3: The value 'description' of attribute 'name' on element 'descriptor' is not valid with respect to its type, 'descriptorEnum'._
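For quick triage before running a full schema validation, the first three problems above can be detected with a small stdlib-only check. This is a hypothetical sketch, not a substitute for validating against the XSD: the function name `find_profile_problems` is illustrative, it assumes the profile's child elements are unqualified (no namespace), as in GBIF-generated EML, and it leaves out the `descriptor` rule because the allowed values of `descriptorEnum` are defined in the XSD and not reproduced here. The sample document uses placeholder values.

```python
import xml.etree.ElementTree as ET


def find_profile_problems(eml_xml):
    """Hypothetical pre-check mirroring three of the reported XSD errors.

    Assumes profile elements such as alternateIdentifier are unqualified.
    Returns a list of human-readable problem descriptions.
    """
    root = ET.fromstring(eml_xml)
    problems = []

    # cvc-type.3.1.1: alternateIdentifier is a simple type, so no attributes.
    for alt in root.iter("alternateIdentifier"):
        if "system" in alt.attrib:
            problems.append("alternateIdentifier has 'system' attribute")

    # cvc-complex-type.2.4.a: userId is not in the expected party content.
    for _ in root.iter("userId"):
        problems.append("userId element is not allowed")

    # cvc-complex-type.2.4.b: temporalCoverage needs one of these children.
    for cov in root.iter("temporalCoverage"):
        if cov.find("rangeOfDates") is None and cov.find("singleDateTime") is None:
            problems.append("temporalCoverage has no rangeOfDates or singleDateTime")

    return problems


sample = """<eml:eml xmlns:eml="eml://ecoinformatics.org/eml-2.1.1">
  <dataset>
    <alternateIdentifier system="http://example.org">10.1234/example</alternateIdentifier>
    <coverage><temporalCoverage/></coverage>
  </dataset>
</eml:eml>"""

print(find_profile_problems(sample))
```

A clean result from this check only means these specific constructs are absent; the generated document should still be validated against the declared schema.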

I would like to know which component needs to be fixed (XSD or FTL template) and the possible impacts on other projects.

cc [~mblissett]
    
Attachment GBIF annotated version (EML) - 2016-08-02.xml


Author: cgendreau
Created: 2015-09-17 10:33:03.325
Updated: 2015-09-17 10:33:03.325
        
More information:
According to its XSD, the EML GBIF profile version 1.0.2 does not support the attribute 'system' on the element 'alternateIdentifier'. The same applies to version 1.1 of the GBIF profile.

This attribute is used in EML 2.1.1: http://rs.gbif.org/schema/eml-2.1.1/eml-entity.xsd
The definition is:
"The information management system within which this identifier has relevance. Generally, the identifier would be unique within the "system" and would be sufficient to retrieve the entity from the system. The system is often a URL or URI that identifies the main entry point for the information management system."
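The schema difference described above can be illustrated with a writer that emits the 'system' attribute only when the target schema allows it. This is a hypothetical sketch, not the EMLWriter's or the FreeMarker template's actual logic: the helper name `alternate_identifier`, its `allow_system` flag and the identifier and system values are placeholders.

```python
import xml.etree.ElementTree as ET


def alternate_identifier(value, system=None, allow_system=False):
    """Build an <alternateIdentifier> element (hypothetical helper).

    EML 2.1.1 defines an optional 'system' attribute on this element,
    but the GBIF profile XSDs (1.0.2 and 1.1) do not allow it, so the
    attribute is written only when allow_system is True.
    """
    elem = ET.Element("alternateIdentifier")
    elem.text = value
    if system is not None and allow_system:
        elem.set("system", system)
    return elem


ident = "10.1234/example"  # placeholder identifier
# Targeting EML 2.1.1: the attribute is kept.
print(ET.tostring(alternate_identifier(ident, "http://example.org", allow_system=True), encoding="unicode"))
# Targeting the GBIF profile: the attribute is dropped.
print(ET.tostring(alternate_identifier(ident, "http://example.org"), encoding="unicode"))
```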

The 'metadata-registry' EMLWriter currently uses it only when a DOI exists for a dataset:

    


Author: kbraak@gbif.org
Comment: The attached file "GBIF annotated version (EML).xml", downloaded today from http://www.gbif.org/dataset/38f06820-08c5-42b2-94f6-47cc3e83a54a, demonstrates an array of validation issues in our own generated EML files.
Created: 2016-08-02 16:17:57.799
Updated: 2016-08-02 16:17:57.799


Author: kbraak@gbif.org
Created: 2016-09-09 11:14:21.489
Updated: 2016-09-09 11:14:21.489
        
Issue fixed.

Our generated EML files now use v1.1 of our schema, are produced using an updated version of the [freemarker template|https://github.com/gbif/registry/blob/master/registry-metadata/src/main/resources/gbif-eml-profile-template/eml-dataset-1.1.ftl] and are confirmed to validate successfully, e.g. http://api.gbif.org/v1/dataset/38f06820-08c5-42b2-94f6-47cc3e83a54a/document