Issue 15618

Occurrence missing publishingOrgKey

15618
Reporter: mdoering
Type: Bug
Summary: Occurrence missing publishingOrgKey
Priority: Critical
Status: Open
Created: 2014-05-16 19:28:07.3
Updated: 2016-02-15 13:45:38.55
        
Description: Reported by Matthew Neilson: this record lacks a publishingOrgKey:
http://api.gbif.org/v0.9/occurrence/895528607

The dataset has a publishing org linked, so this appears to be some HBase data inconsistency?
http://www.gbif.org/dataset/2a79f202-3f3a-4d54-88fa-09aa8de1ac73
---
I'm working on an internal application that will use the new GBIF API, and I had a question regarding the fields returned in the occurrence response body. Some occurrence records return the publishingOrgKey (e.g., http://api.gbif.org/v0.9/occurrence/199238464) whereas others do not (e.g., http://api.gbif.org/v0.9/occurrence/895528607). Any particular reason for this?
    


Author: mdoering@gbif.org
Created: 2014-05-19 13:19:23.945
Updated: 2014-05-19 13:20:41.367
        
The problem affects 20 datasets in the live HBase tables:

{noformat}
datasetkey	numMissing
1b93b558-b8ce-4e1a-af90-5a84e7f1c038	461
262f8270-f9c2-4bc6-a562-8ed71c0790e6	27
2a79f202-3f3a-4d54-88fa-09aa8de1ac73	74208
3633e0e7-8c25-4c3d-b9c7-078c0be25665	5
47881e45-febd-4622-b7a1-6efbce4fd7b3	1641
51fa0155-a545-4154-ac20-b89dbb2c312b	3
5f1568db-5d5b-4597-9c66-dfbd0d7ddd7b	27299
78dcfcbd-03c4-4d47-8618-830aac6f2ee5	2
813b435e-f762-11e1-a439-00145eb45e9a	1082
85b1cfb6-f762-11e1-a439-00145eb45e9a	541641
91aa5e23-9cad-4751-86e0-241da77d7407	1
95ed873a-f762-11e1-a439-00145eb45e9a	42428
962cceea-f762-11e1-a439-00145eb45e9a	19562
962e2f2e-f762-11e1-a439-00145eb45e9a	4400
b720d45b-d251-484b-a2bd-64cf161a7881	2056
d6cc311c-c5ab-4f23-9a20-10514f9eb9c4	162
d7ce3688-e91d-4f26-b2bb-333357c6da9f	24422
e70580b8-b1df-4566-9a33-b32f30aab526	58164
ea9f5b0b-ad97-45ea-935a-ba2784c80cbb	14753
f11db245-3f9f-4fc6-a0cc-12b4124d081b	62
{noformat}


----
HIVE QUERY USED:

{noformat}
SELECT t1.datasetKey, SUM(t1.hasNone) missing FROM
(
  SELECT gbifID, datasetKey, IF(publishingOrgKey IS NULL, 1, 0) hasNone
  FROM occurrence_hbase
) t1
GROUP BY t1.datasetKey
HAVING SUM(t1.hasNone) > 0
{noformat}
    


Author: mdoering@gbif.org
Comment: owning_organization_key in the registry exists for the first few datasets checked, so this appears to be a processing error. What happens if registry web service exceptions occur during processing: does the record still make it into an interpreted occurrence? The model class has pubOrgKey annotated with @NotNull, and downstream code relies on that. We need to make sure this cannot happen.
Created: 2014-05-19 13:27:02.282
Updated: 2014-05-19 13:27:02.282
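One way to make sure a record violating the @NotNull contract never gets written is a fail-fast guard right before persistence. The sketch below is purely illustrative: InterpretedOccurrence and OccurrenceWriteGuard are hypothetical, heavily simplified stand-ins, not the actual GBIF model or writer classes.

```java
import java.util.UUID;

// Hypothetical, simplified stand-in for the interpreted occurrence model;
// the real model class has many more fields.
class InterpretedOccurrence {
    final long gbifId;
    final UUID datasetKey;
    final UUID publishingOrgKey; // downstream code assumes this is never null

    InterpretedOccurrence(long gbifId, UUID datasetKey, UUID publishingOrgKey) {
        this.gbifId = gbifId;
        this.datasetKey = datasetKey;
        this.publishingOrgKey = publishingOrgKey;
    }
}

// Hypothetical guard invoked just before an interpreted record is persisted.
class OccurrenceWriteGuard {
    static void checkWritable(InterpretedOccurrence occ) {
        // Fail fast instead of silently storing a record that breaks
        // the @NotNull contract on publishingOrgKey.
        if (occ.publishingOrgKey == null) {
            throw new IllegalStateException(
                "Refusing to persist occurrence " + occ.gbifId
                    + " from dataset " + occ.datasetKey
                    + ": publishingOrgKey is null");
        }
    }
}
```

Throwing here turns a silent data inconsistency into a visible processing failure; whether to throw or to flag the record instead is a separate design decision.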


Author: omeyn@gbif.org
Created: 2014-05-19 15:38:42.771
Updated: 2014-05-19 15:38:42.771
        
Here are the results from UAT:

{noformat}
datasetkey	numMissing
34c0bc41-cb59-4927-9e41-53fb0a5ce44b	9
427a6290-0c65-11dd-84d2-b8a03c50a862	50
b124e1e0-4755-430f-9eab-894f25a9b59c	100
{noformat}

Interesting: this must mean a real transient error occurred during recent processing in production, since UAT is a copy from not that long ago (~1 month).
    


Author: omeyn@gbif.org
Comment: Sending a reinterpretation for the prod datasets fixed all but the 5 records from 3633e0e7-8c25-4c3d-b9c7-078c0be25665. Investigating some more.
Created: 2014-05-19 16:56:20.237
Updated: 2014-05-19 16:56:20.237


Author: omeyn@gbif.org
Comment: Ran it again and it's done. The problem still needs solving for real, though, i.e. interpretation has to keep trying until it gets a legitimate publishing org key.
Created: 2014-05-19 17:03:48.898
Updated: 2014-05-19 17:03:48.898
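"Keep trying" could look roughly like the sketch below: retry the registry lookup with exponential backoff, treating both exceptions and null answers as failures. PublishingOrgLookup and PublishingOrgResolver are hypothetical names, and the attempt limit and backoff values are assumptions rather than anything the interpretation code is known to use.

```java
import java.util.Optional;
import java.util.UUID;

// Hypothetical abstraction over the registry web service call that maps a
// dataset key to its publishing organization key; may throw on transient errors.
@FunctionalInterface
interface PublishingOrgLookup {
    UUID lookup(UUID datasetKey) throws Exception;
}

class PublishingOrgResolver {
    private final PublishingOrgLookup lookup;
    private final int maxAttempts;
    private final long initialBackoffMillis;

    PublishingOrgResolver(PublishingOrgLookup lookup, int maxAttempts, long initialBackoffMillis) {
        this.lookup = lookup;
        this.maxAttempts = maxAttempts;
        this.initialBackoffMillis = initialBackoffMillis;
    }

    // Retries the lookup with exponential backoff until it yields a non-null key.
    // Returns Optional.empty() if every attempt failed, so the caller can hold the
    // record back (or flag it) instead of writing it without a publishingOrgKey.
    Optional<UUID> resolve(UUID datasetKey) throws InterruptedException {
        long backoff = initialBackoffMillis;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                UUID key = lookup.lookup(datasetKey);
                if (key != null) {
                    return Optional.of(key);
                }
                // A null answer is treated like a transient failure and retried.
            } catch (InterruptedException e) {
                throw e; // never swallow interruption
            } catch (Exception e) {
                // Transient registry error: fall through to the backoff below.
            }
            if (attempt < maxAttempts) {
                Thread.sleep(backoff);
                backoff *= 2;
            }
        }
        return Optional.empty();
    }
}
```

The retries are bounded on purpose. An unbounded loop during a registry outage would stall the whole pipeline, so after the last attempt the caller has to decide explicitly what happens to the record.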


Author: mdoering@gbif.org
Comment: [~omeyn@gbif.org] should we do something about this in the interpretation code before we rerun UAT? At the very least, a record must be flagged with some OccurrenceIssue when an exception occurs. As a start, maybe even just something like INCOMPLETE_PROCESSING, so that we can at least spot these records immediately. Better still would be to apply an issue for each processed section, using a try/catch.
Created: 2014-05-20 10:38:48.399
Updated: 2014-05-20 10:38:48.399
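The per-section try/catch idea might be sketched as follows. Each interpretation section runs in its own try/catch, and a failure adds both a section-specific flag and INCOMPLETE_PROCESSING. The ProcessingIssue enum and SectionedInterpreter class are hypothetical. INCOMPLETE_PROCESSING is only the name suggested above, and the per-section values are illustrative, not existing OccurrenceIssue members.

```java
import java.util.EnumSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical issue flags; not actual OccurrenceIssue values.
enum ProcessingIssue {
    INCOMPLETE_PROCESSING,
    PUBLISHER_INTERPRETATION_FAILED,
    TAXONOMY_INTERPRETATION_FAILED,
    LOCATION_INTERPRETATION_FAILED
}

class SectionedInterpreter {
    // One interpretation step (e.g. publisher, taxonomy, location) that may throw.
    interface Section {
        void interpret() throws Exception;
    }

    // Maps the issue to raise on failure to the section it guards; insertion order is kept.
    private final Map<ProcessingIssue, Section> sections = new LinkedHashMap<>();

    SectionedInterpreter add(ProcessingIssue issueOnFailure, Section section) {
        sections.put(issueOnFailure, section);
        return this;
    }

    // Runs every section, catching failures individually so that one failing
    // section (say, a registry timeout) does not abort the others, and returns
    // the set of issues to attach to the record.
    Set<ProcessingIssue> run() {
        Set<ProcessingIssue> issues = EnumSet.noneOf(ProcessingIssue.class);
        for (Map.Entry<ProcessingIssue, Section> entry : sections.entrySet()) {
            try {
                entry.getValue().interpret();
            } catch (Exception e) {
                issues.add(entry.getKey());
                issues.add(ProcessingIssue.INCOMPLETE_PROCESSING);
            }
        }
        return issues;
    }
}
```

With this arrangement, a query on INCOMPLETE_PROCESSING would surface affected records immediately. The section-specific flag shows which step needs to be rerun.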


Author: omeyn@gbif.org
Comment: Marked as tiny for the check to see whether this is still happening. If it is, it is probably not tiny to fix, but the priority should be bumped.
Created: 2015-03-02 16:05:16.592
Updated: 2015-03-02 16:05:16.592