Issue 12756

Be lenient on ids

Reporter: trobertson
Assignee: omeyn
Type: Task
Summary: Be lenient on ids
Priority: Major
Resolution: WontFix
Status: Closed
Created: 2013-02-19 11:36:41.148
Updated: 2013-12-17 15:17:07.881
Resolved: 2013-03-18 16:30:53.357
        
Description: Please see the attached DwC-A.
In this archive, the meta.xml declares the core row type as occurrence and the ID column as index 0. However, the current HIT does not persist this ID, because there is no explicit mapping stating that occurrenceID is index 0.

This came from the IPT and is quite easy to reproduce. Please ensure that, where the row type is known to be occurrence and an ID column is declared but no occurrenceID is mapped explicitly, we infer that the ID is the occurrenceID.
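The inference rule requested above can be sketched against meta.xml roughly as follows. This is a minimal Python sketch, not HIT code: the function name `occurrence_id_index` is hypothetical, and it assumes the standard Darwin Core text namespace and the Darwin Core `Occurrence` and `occurrenceID` term URIs. An explicit occurrenceID field mapping wins; otherwise, for an occurrence core, it falls back to the `<id>` column.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Assumed URIs: Darwin Core text namespace and Darwin Core terms.
DWC_TEXT_NS = "{http://rs.tdwg.org/dwc/text/}"
OCCURRENCE_ROW_TYPE = "http://rs.tdwg.org/dwc/terms/Occurrence"
OCCURRENCE_ID_TERM = "http://rs.tdwg.org/dwc/terms/occurrenceID"


def occurrence_id_index(meta_xml: str) -> Optional[int]:
    """Return the column index to read occurrenceID from, or None.

    Hypothetical helper illustrating the proposed rule: an explicit
    occurrenceID mapping wins; otherwise, for an occurrence core,
    fall back to the core <id> column.
    """
    root = ET.fromstring(meta_xml)
    core = root.find(f"{DWC_TEXT_NS}core")
    if core is None or core.get("rowType") != OCCURRENCE_ROW_TYPE:
        return None  # the rule only applies to occurrence cores

    # 1. Explicit mapping, e.g. <field index="3" term=".../occurrenceID"/>
    for field in core.findall(f"{DWC_TEXT_NS}field"):
        if field.get("term") == OCCURRENCE_ID_TERM and field.get("index") is not None:
            return int(field.get("index"))

    # 2. Inferred mapping: the core <id index="..."/> column
    id_elem = core.find(f"{DWC_TEXT_NS}id")
    if id_elem is not None and id_elem.get("index") is not None:
        return int(id_elem.get("index"))
    return None
```

For an archive like the attached one, where the occurrence core declares `<id index="0"/>` and maps no occurrenceID field, this sketch would return 0.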

This is why the dataset behind the latest data paper did not index properly.

Guys, please be aware of this ([~mdoering] [~kbraak] [~jlegind] [~ahahn]) and comment if you have concerns.
    
Attachment: dwca-macrobenthos.zip


Author: trobertson@gbif.org
Created: 2013-02-19 11:43:00.171
Updated: 2013-02-19 11:43:00.171
        
This is the registration:
  http://gbrds.gbif.org/browse/agent?uuid=dd92c709-8f7d-4bf3-9897-901aa88486e5

And the IPT:
  http://ipt.biodiversity.aq/resource.do?r=macrobenthos

We are likely to ask them to add an explicit mapping, so use the attached archive to view the error (they might have fixed it by the time you look at this).
    


Author: jlegind@gbif.org
Created: 2013-02-19 11:58:53.014
Updated: 2013-02-19 11:58:53.014
        
The occurrences are missing the mandatory properties collectionCode, catalogNumber and scientificName. I believe that the resource will index fine when these elements are included.

The publisher is being contacted.
    


Author: trobertson@gbif.org
Created: 2013-02-19 12:01:50.706
Updated: 2013-02-19 12:01:50.706
        
Thanks [~jlegind@gbif.org]

Please do verify that this is handled in the new processing, though: only the ID is required, so we should be able to interpret whatever else is present.
    


Author: kbraak@gbif.org
Comment: The HIT does persist this as an identifier record. The triplet remains mandatory in the HIT, however, as the only way of identifying an occurrence record during synchronization. 
Created: 2013-02-20 15:50:37.175
Updated: 2013-02-20 15:50:37.175
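The point above, that the triplet is the only key the HIT can use to match a record across harvests, can be illustrated with a small sketch. It assumes the triplet is the conventional Darwin Core combination of institutionCode, collectionCode and catalogNumber (this issue does not list the three terms) and that blank values count as missing; `triplet_key` is a hypothetical name, not a HIT function.

```python
from typing import Mapping, Optional, Tuple

# Assumed triplet terms; the HIT's exact rules are not stated in this issue.
TRIPLET_TERMS = ("institutionCode", "collectionCode", "catalogNumber")


def triplet_key(record: Mapping[str, str]) -> Optional[Tuple[str, ...]]:
    """Return a synchronization key for a record, or None if incomplete."""
    # Treat absent and whitespace-only values alike.
    values = tuple((record.get(term) or "").strip() for term in TRIPLET_TERMS)
    if not all(values):
        return None  # the record cannot be matched across harvests
    return values
```

Under these assumptions, a record lacking collectionCode or catalogNumber, as in the attached archive, yields no key, whereas an IPT-generated ID would be present but, as noted below, unstable between publications.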


Author: omeyn@gbif.org
Comment: After discussion, we decided that the ID column of a DwC-A cannot be trusted, especially from the IPT, because the IDs change (by design) every time the archive is published.
Created: 2013-03-18 16:30:53.381
Updated: 2013-03-18 16:30:53.381