Issue 12756

Be lenient on ids

Reporter: trobertson
Assignee: omeyn
Type: Task
Summary: Be lenient on ids
Priority: Major
Resolution: WontFix
Status: Closed
Created: 2013-02-19 11:36:41.148
Updated: 2013-12-17 15:17:07.881
Resolved: 2013-03-18 16:30:53.357
Description: Please see the attached DwC-A.
In this archive, the meta.xml tells us that the row type is occurrence and that the ID is at column index 0. However, the current HIT does not persist the ID, because no explicit mapping states that occurrenceID is at index 0.

This came from the IPT and is quite easy to reproduce. Please ensure that when the row type is known to be occurrence, an ID is present, and no occurrenceID is mapped explicitly, we infer that the ID is the occurrenceID.
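The inference rule requested above could look roughly like the sketch below. It is an illustration only, not the HIT's code: the meta.xml is a hypothetical example written in the Darwin Core text format (namespace `http://rs.tdwg.org/dwc/text/`), and the helper name `occurrence_id_index` is made up. An explicit occurrenceID mapping wins; otherwise an occurrence core with an `<id>` element falls back to the ID column.

```python
import xml.etree.ElementTree as ET

DWC_TEXT_NS = "http://rs.tdwg.org/dwc/text/"
OCCURRENCE_ROW_TYPE = "http://rs.tdwg.org/dwc/terms/Occurrence"
OCCURRENCE_ID_TERM = "http://rs.tdwg.org/dwc/terms/occurrenceID"

# Hypothetical meta.xml: occurrence core, ID at index 0, no explicit occurrenceID field.
META_XML = """<archive xmlns="http://rs.tdwg.org/dwc/text/">
  <core rowType="http://rs.tdwg.org/dwc/terms/Occurrence">
    <files><location>occurrence.txt</location></files>
    <id index="0"/>
    <field index="1" term="http://rs.tdwg.org/dwc/terms/scientificName"/>
  </core>
</archive>"""


def occurrence_id_index(meta_xml):
    """Return the column index to read occurrenceID from, or None (hypothetical rule)."""
    root = ET.fromstring(meta_xml)
    core = root.find(f"{{{DWC_TEXT_NS}}}core")
    # Only apply the rule when the core row type is known to be occurrence.
    if core is None or core.get("rowType") != OCCURRENCE_ROW_TYPE:
        return None
    # 1. An explicit occurrenceID mapping always takes precedence.
    for field in core.findall(f"{{{DWC_TEXT_NS}}}field"):
        if field.get("term") == OCCURRENCE_ID_TERM and field.get("index") is not None:
            return int(field.get("index"))
    # 2. Otherwise, infer occurrenceID from the core <id> column.
    id_elem = core.find(f"{{{DWC_TEXT_NS}}}id")
    if id_elem is not None and id_elem.get("index") is not None:
        return int(id_elem.get("index"))
    return None


print(occurrence_id_index(META_XML))  # prints 0
```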

This is why the dataset behind the latest data paper did not index properly.

Please be aware of this, guys ([~mdoering] [~kbraak] [~jlegind] [~ahahn]), and comment if you have concerns.

Created: 2013-02-19 11:43:00.171
Updated: 2013-02-19 11:43:00.171
This is the registration:

And the IPT:

However, we are likely to ask them to add an explicit mapping, so see the attachment to view the error (by the time you look at this, they may have fixed it).

Created: 2013-02-19 11:58:53.014
Updated: 2013-02-19 11:58:53.014
The occurrences are missing the mandatory properties collectionCode, catalogNumber and scientificName. I believe the resource will index correctly once these elements are included.

Publisher is being contacted.
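The diagnosis above amounts to comparing the archive's mapped terms against a required set. A minimal sketch follows; the required set is the one named in the comment, while the function name and the use of simple term names instead of full term URIs are assumptions for illustration.

```python
# Terms named above as mandatory for the HIT (simple names rather than full URIs).
MANDATORY_TERMS = ("collectionCode", "catalogNumber", "scientificName")


def missing_mandatory_terms(mapped_terms):
    """Return the mandatory terms absent from an archive's mapped terms, in a fixed order."""
    mapped = set(mapped_terms)
    return [term for term in MANDATORY_TERMS if term not in mapped]


# A hypothetical archive that maps only the ID and a locality.
print(missing_mandatory_terms(["id", "locality"]))
# ['collectionCode', 'catalogNumber', 'scientificName']
```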

Created: 2013-02-19 12:01:50.706
Updated: 2013-02-19 12:01:50.706
Thanks []

Please do verify that this is handled in the new processing, though; we should interpret whatever we can, since only the ID is required.

Comment: The HIT does persist the ID as an identifier record. However, the triplet remains mandatory in the HIT, because it is the only way to identify an occurrence record during synchronization.
Created: 2013-02-20 15:50:37.175
Updated: 2013-02-20 15:50:37.175
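To show why a stable key matters for synchronization, the sketch below keys records by a triplet and classifies them as added, updated or deleted between two harvests. It assumes the triplet is institutionCode, collectionCode and catalogNumber (the usual GBIF convention, not spelled out in this issue); the record layout and function names are hypothetical.

```python
# Assumption: the triplet is institutionCode + collectionCode + catalogNumber.
TRIPLET = ("institutionCode", "collectionCode", "catalogNumber")


def triplet_key(record):
    """Build the synchronization key; raise if any triplet term is missing or empty."""
    values = tuple(record.get(term) for term in TRIPLET)
    if any(not v for v in values):
        raise ValueError(f"record lacks a complete triplet: {values}")
    return values


def diff_by_triplet(previous, current):
    """Classify records as added, updated or deleted between two harvests."""
    old = {triplet_key(r): r for r in previous}
    new = {triplet_key(r): r for r in current}
    added = sorted(new.keys() - old.keys())
    deleted = sorted(old.keys() - new.keys())
    updated = sorted(k for k in new.keys() & old.keys() if new[k] != old[k])
    return added, updated, deleted


run1 = [{"institutionCode": "I", "collectionCode": "C", "catalogNumber": "1",
         "scientificName": "Puma concolor"}]
run2 = [{"institutionCode": "I", "collectionCode": "C", "catalogNumber": "1",
         "scientificName": "Puma concolor concolor"},
        {"institutionCode": "I", "collectionCode": "C", "catalogNumber": "2",
         "scientificName": "Lynx lynx"}]
print(diff_by_triplet(run1, run2))
# ([('I', 'C', '2')], [('I', 'C', '1')], [])
```

A record without a complete triplet raises an error instead of being silently keyed, which mirrors the point that the triplet stays mandatory.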

Comment: After discussion, we decided that the ID column of a DwC-A cannot be trusted, especially from the IPT, because the IDs change (by design) every time the archive is published.
Created: 2013-03-18 16:30:53.381
Updated: 2013-03-18 16:30:53.381
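The reasoning behind this decision can be illustrated with a small simulation. If the ID column is regenerated on every publication, matching two publications of the same records by ID finds no overlap, so every record would look deleted and re-added, while a content-based key such as catalogNumber still matches. The `publish` function and its ID scheme are hypothetical and do not model the IPT's actual ID generation.

```python
def publish(records, version):
    """Hypothetical publisher that assigns a fresh row ID on every publication."""
    return [dict(r, id=f"v{version}-{i}") for i, r in enumerate(records)]


def matched_ids(previous, current):
    """IDs present in both publications, i.e. records an ID-based sync could track."""
    return {r["id"] for r in previous} & {r["id"] for r in current}


records = [{"catalogNumber": "1"}, {"catalogNumber": "2"}]
first, second = publish(records, 1), publish(records, 2)

print(len(matched_ids(first, second)))  # prints 0: nothing matches by ID
matched_catalog = {r["catalogNumber"] for r in first} & {r["catalogNumber"] for r in second}
print(len(matched_catalog))  # prints 2: both records still match by catalogNumber
```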