Issue 17306

Metadata is not updated according to the EML file

Reporter: jlegind
Assignee: mdoering
Type: Bug
Summary: Metadata is not updated according to the EML file
Priority: Critical
Resolution: Fixed
Status: Closed
Created: 2015-02-23 14:52:58.381
Updated: 2015-02-26 12:58:50.381
Resolved: 2015-02-26 12:58:50.356
        
Description: I tried crawling two datasets from GBIF Spain with new EML content, but the portal pages do not reflect this. I checked this three hours after indexing completed. I fear it is a general issue...

These datasets were tested:
MCNB-Art  - http://registry.gbif.org/web/index.html#/dataset/7a498826-f762-11e1-a439-00145eb45e9a
MCNB-Tissue - http://registry.gbif.org/web/index.html#/dataset/71cfcf5c-f762-11e1-a439-00145eb45e9a

Basically there should be a new METADATA PROVIDER : Jordi Agulló Villaronga added to both datasets.

http://www.gbif.org/dataset/7a498826-f762-11e1-a439-00145eb45e9a
Original EML http://www.gbif.es:8080/ipt/eml.do?r=MCNB-Art&v=6

http://www.gbif.org/dataset/71cfcf5c-f762-11e1-a439-00145eb45e9a
Original EML http://www.gbif.es:8080/ipt/eml.do?r=MCNB-Tissue

There is a doi-updater-listener reject message, not sure if it matters
Crawl log: http://b6g8.gbif.org:5601/index.html#eyJzZWFyY2giOiJAZmllbGRzLmRhdGFzZXRLZXk9XCI3YTQ5ODgyNi1mNzYyLTExZTEtYTQzOS0wMDE0NWViNDVlOWFcIiIsImZpZWxkcyI6WyJAdHlwZSIsIkBmaWVsZHMubGV2ZWwiLCJAbWVzc2FnZSJdLCJvZmZzZXQiOjAsInRpbWVmcmFtZSI6IjYwNDgwMCIsImdyYXBobW9kZSI6ImNvdW50IiwidGltZSI6eyJ1c2VyX2ludGVydmFsIjowfSwic3RhbXAiOjE0MjQ2OTk5ODQ3NjN9

    


Author: mdoering@gbif.org
Created: 2015-02-25 17:24:56.351
Updated: 2015-02-25 17:24:56.351
        
The GBIF cached version of the EML is still at version 1, while the IPT is on v3:
http://api.gbif.org/v1/dataset/71cfcf5c-f762-11e1-a439-00145eb45e9a/document
http://www.gbif.es:8080/ipt/eml.do?r=MCNB-Tissue

    


Author: mdoering@gbif.org
Created: 2015-02-25 17:33:14.161
Updated: 2015-02-25 17:33:14.161
        
I have tried to POST the EML document manually using curl as such:

'''curl -i --user markus:password -H "Content-Type: application/xml" -H "Accept: application/json" -X POST -d @eml.xml  http://api.gbif.org/v1/dataset/71cfcf5c-f762-11e1-a439-00145eb45e9a/document'''

This was successful, but the registry logs say the dataset is locked for auto updates!

"Dataset 71cfcf5c-f762-11e1-a439-00145eb45e9a locked for automatic updates. Uploaded metadata document not does not modify registered dataset information"

Which is true, as you can see (and change) here:
http://registry.gbif.org/web/index.html#/dataset/71cfcf5c-f762-11e1-a439-00145eb45e9a

    


Author: mdoering@gbif.org
Created: 2015-02-25 17:34:50.199
Updated: 2015-02-25 17:40:24.232
        
After disabling the update lock and posting the EML again I hit this registry bug:
http://dev.gbif.org/issues/browse/POR-2675

Caused by an empty associated party contact:







    


Author: jlegind@gbif.org
Created: 2015-02-26 09:15:27.469
Updated: 2015-02-26 09:30:37.982
        
Thanks Markus, I hadn't considered the 'locked for auto updates' flag.

The GBIF schema also requires that surname is not null. Maybe it is worth considering bringing the XML schema and the DB constraints into alignment.   
    


Author: mdoering@gbif.org
Comment: [~jlegind@gbif.org] the XML schema and the DB pretty much match. The problem is a) we never validate any incoming EML docs, and if we did, my estimate is that 80-90% of the currently existing ones would fail; b) most of the validation failures are due to the requirement of having at least a single non-whitespace character in most places where a string is expected, and we often see empty strings or just spaces in EML.
Created: 2015-02-26 11:25:47.057
Updated: 2015-02-26 11:25:47.057
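
For illustration only, a minimal sketch of the kind of pre-check described above: it lists leaf elements in an XML document whose text is empty or whitespace-only. This is not the registry's actual validation code; the function name find_blank_elements and the simplified SAMPLE fragment are assumptions made up for this example, and the sample is not taken from the datasets in this issue.

```python
import xml.etree.ElementTree as ET


def find_blank_elements(xml_text):
    """Return slash-separated paths of leaf elements with no non-whitespace text.

    Hypothetical helper for illustration; it does not validate against the
    EML schema, it only flags empty or whitespace-only leaf elements.
    """
    root = ET.fromstring(xml_text)
    blank = []

    def walk(elem, path):
        children = list(elem)
        # A leaf element with missing or whitespace-only text is reported.
        if not children and not (elem.text or "").strip():
            blank.append(path)
        for child in children:
            walk(child, f"{path}/{child.tag}")

    walk(root, root.tag)
    return blank


# Hypothetical, heavily simplified EML-like fragment (no namespaces).
SAMPLE = """<eml>
  <dataset>
    <associatedParty>
      <individualName><surName>   </surName></individualName>
      <role></role>
    </associatedParty>
    <metadataProvider>
      <individualName><surName>Agulló Villaronga</surName></individualName>
    </metadataProvider>
  </dataset>
</eml>"""

if __name__ == "__main__":
    for path in find_blank_elements(SAMPLE):
        print(path)
```

On the sample, this reports the whitespace-only surName and the empty role under associatedParty. Note that the sketch also flags leaf elements that carry only attributes, so a real check would need to consult the schema to decide which elements actually require text.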


Author: mdoering@gbif.org
Created: 2015-02-26 12:58:50.379
Updated: 2015-02-26 12:58:50.379
        
After updating our prod registry to 2.25 I successfully updated the EML for the dataset and the new contacts show on the details page now:

http://api.gbif.org/v1/dataset/71cfcf5c-f762-11e1-a439-00145eb45e9a/document

http://www.gbif.org/dataset/71cfcf5c-f762-11e1-a439-00145eb45e9a