Issue 16684

Update registry-metadata to parse new version of GBIF Metadata Profile (1.1)

Reporter: kbraak
Assignee: kbraak
Type: Improvement
Summary: Update registry-metadata to parse new version of GBIF Metadata Profile (1.1)
Priority: Critical
Resolution: Fixed
Status: Closed
Created: 2014-11-27 08:08:46.986
Updated: 2016-08-29 14:10:02.529
Resolved: 2016-08-29 14:10:02.474
        
Description: GBIF Metadata Profile version 1.1 will be released shortly. Currently, all our projects use version 1.0.2.

Please update the DatasetParser to accommodate the changes introduced in version 1.1:

# Parse multiple collections (previously the parser may have expected a single collection). _This does not require a change to the API, because Dataset already has List collections_.
# Parse multiple contacts, creators, and metadataProviders (previously the parser may have expected a single one of each). _This does not require a change to the API, because Dataset already has List_. See issue POR-523.
# Parse multiple userIds for any agent (requires aggregating the directory attribute and value together). _This does not require an API change, because Contact already has List userId_.
# Parse the maintenanceUpdateFrequency, which must use a controlled vocabulary. *This vocabulary needs to be added to our API, and a field added to the Dataset API object*. Please note that an enum implementation already exists in the gbif-metadata-profile project [here|https://code.google.com/p/gbif-common-resources/source/browse/gbif-metadata-profile/trunk/src/main/java/org/gbif/metadata/eml/MaintenanceUpdateFrequency.java]. Also parse maintenanceDescription, which is free text.
# Parse multiple project personnel (previously the parser may have expected only a single person). _This does not require an API change, because Project already has List_.
# Parse the project ID attribute. *This requires a new field to be added to the API [Project|https://github.com/gbif/gbif-api/blob/master/src/main/java/org/gbif/api/model/registry/eml/Project.java]*.
# Parse the project description. Multiple paragraphs must be parsed and concatenated into Project.description. _In this way, no API change is required_.
# Parse the <ulink> element in intellectualRights used to store the license name and license URL. Our schema allows <ulink> inside any <para> element, so it could be a good idea to apply the parsing rule to all <para> elements. The gbif-metadata-profile converts it into an HTML anchor inside the intellectualRights string. *Alternatively, the license name and URL could be parsed and stored in new fields, but this will require an API change.*
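
For illustration only, here is a rough sketch of four of the rules above: aggregating the userId directory and value, reading the project ID attribute, concatenating description paragraphs, and turning a license link into an HTML anchor. It is not the DatasetParser code. The class and method names, the made-up sample fragment and its element names (including ulink and citetitle), the plain string join of directory and value, and the newline separator between paragraphs are all assumptions.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Illustrative only: NOT the real DatasetParser. All names here are hypothetical.
class EmlRulesSketch {

  // Made-up fragment; element names are assumptions, not copied from the schema.
  static final String SAMPLE =
      "<dataset>"
      + "<creator>"
      + "<userId directory=\"http://orcid.org/\">0000-0002-1825-0097</userId>"
      + "<userId directory=\"http://example.org/people/\">42</userId>"
      + "</creator>"
      + "<project id=\"P-1\"><abstract>"
      + "<para> First paragraph. </para><para>Second paragraph.</para>"
      + "</abstract></project>"
      + "<intellectualRights><para>Licensed under "
      + "<ulink url=\"http://example.org/license\"><citetitle>Example License</citetitle></ulink>"
      + "</para></intellectualRights>"
      + "</dataset>";

  // Parses the XML and returns the first element with the given tag name.
  static Element first(String xml, String tag) {
    try {
      byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
      return (Element) DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new ByteArrayInputStream(bytes))
          .getElementsByTagName(tag)
          .item(0);
    } catch (Exception e) {
      throw new IllegalStateException("Could not parse sample XML", e);
    }
  }

  // Every userId under an agent: directory attribute joined with the element value.
  static List<String> userIds(Element agent) {
    List<String> ids = new ArrayList<>();
    NodeList nodes = agent.getElementsByTagName("userId");
    for (int i = 0; i < nodes.getLength(); i++) {
      Element e = (Element) nodes.item(i);
      ids.add(e.getAttribute("directory").trim() + e.getTextContent().trim());
    }
    return ids;
  }

  // All non-empty para elements under parent, trimmed and joined by a newline.
  static String joinParagraphs(Element parent) {
    List<String> paras = new ArrayList<>();
    NodeList nodes = parent.getElementsByTagName("para");
    for (int i = 0; i < nodes.getLength(); i++) {
      String text = nodes.item(i).getTextContent().trim();
      if (!text.isEmpty()) {
        paras.add(text);
      }
    }
    return String.join("\n", paras);
  }

  // Link element rendered as an HTML anchor: url attribute plus link text.
  static String ulinkToAnchor(Element ulink) {
    return "<a href=\"" + ulink.getAttribute("url") + "\">"
        + ulink.getTextContent().trim() + "</a>";
  }

  public static void main(String[] args) {
    Element project = first(SAMPLE, "project");
    System.out.println(userIds(first(SAMPLE, "creator")));
    System.out.println(project.getAttribute("id"));
    System.out.println(joinParagraphs(project));
    System.out.println(ulinkToAnchor(first(SAMPLE, "ulink")));
  }
}
```

A real implementation would also need to escape the URL and link text before embedding them in HTML, and to decide how a missing directory attribute is handled.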
    


Author: kbraak@gbif.org
Comment: The API changes should be done before starting this work, see POR-2562.
Created: 2014-11-27 08:52:31.939
Updated: 2014-11-27 08:52:31.939


Author: kbraak@gbif.org
Created: 2016-08-29 14:10:02.526
Updated: 2016-08-29 14:10:02.526
        
Work completed. [~hoefft], [~bko@gbif.org]: please note that a wealth of new information is now available to show on the dataset page. Please see the issue description for a list of what has changed in the Dataset API response. With regard to POR-3091, note that there can now be multiple contacts, creators, and metadataProviders, and that all contacts can now have a userId such as an ORCID.

It is rare to find a real dataset with all metadata fields filled in. Therefore, during development I recommend using [this EML file|https://raw.githubusercontent.com/gbif/registry/master/registry-metadata/src/test/resources/eml-metadata-profile/sample4-v1.1.xml], which uses fake data and has all new fields populated.

Remember that you can update any dataset with this EML file by using the following API command:

{quote}
curl -i --user "username":password -H "Content-Type: application/xml" -H "Accept: application/json" -X POST -T "/tmp/eml-filename.xml" http://api.gbif-[ENV].org/v1/dataset/[UUID]/document
{quote}
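
If it is more convenient to do this from Java, the same request could be sketched with the JDK HTTP client (Java 11+). This is only an approximation of the curl command above: the class name, method name, and argument order are hypothetical, and the API base URL (scheme and host for the chosen environment) is left as a parameter.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

// Hypothetical helper mirroring the curl command; not part of any GBIF library.
class DatasetDocumentRequest {

  // Builds a POST to {apiBase}/v1/dataset/{datasetKey}/document with the EML as body.
  static HttpRequest build(String apiBase, String datasetKey,
                           String user, String password, String eml) {
    // HTTP Basic authentication, equivalent to curl's --user option.
    String credentials = Base64.getEncoder()
        .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
    return HttpRequest.newBuilder(
            URI.create(apiBase + "/v1/dataset/" + datasetKey + "/document"))
        .header("Authorization", "Basic " + credentials)
        .header("Content-Type", "application/xml")
        .header("Accept", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString(eml, StandardCharsets.UTF_8))
        .build();
  }

  // Usage: java DatasetDocumentRequest <apiBase> <uuid> <user> <password> <emlFile>
  public static void main(String[] args) throws Exception {
    if (args.length < 5) {
      System.err.println("Expected: <apiBase> <uuid> <user> <password> <emlFile>");
      return;
    }
    String eml = Files.readString(Path.of(args[4]), StandardCharsets.UTF_8);
    HttpRequest request = build(args[0], args[1], args[2], args[3], eml);
    HttpResponse<String> response =
        HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
    System.out.println(response.statusCode());
    System.out.println(response.body());
  }
}
```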

Closing issue.