15618
Reporter: mdoering
Type: Bug
Summary: Occurrence missing publishingOrgKey
Priority: Critical
Status: Open
Created: 2014-05-16 19:28:07.3
Updated: 2016-02-15 13:45:38.55
Description: Reported by Matthew Neilson: this record lacks a publishingOrgKey:
http://api.gbif.org/v0.9/occurrence/895528607
The dataset has a publishing org linked, so this appears to be some HBase data inconsistency?
http://www.gbif.org/dataset/2a79f202-3f3a-4d54-88fa-09aa8de1ac73
---
I'm working on an internal application that will use the new GBIF API, and I had a question regarding the fields returned in the occurrence response body. Some occurrence records return the publishingOrgKey (e.g., http://api.gbif.org/v0.9/occurrence/199238464) whereas others do not (e.g., http://api.gbif.org/v0.9/occurrence/895528607). Any particular reason for this?
Author: mdoering@gbif.org
Created: 2014-05-19 13:19:23.945
Updated: 2014-05-19 13:20:41.367
The problem affects 20 datasets in the live HBase tables:
{noformat}
datasetkey numMissing
1b93b558-b8ce-4e1a-af90-5a84e7f1c038 461
262f8270-f9c2-4bc6-a562-8ed71c0790e6 27
2a79f202-3f3a-4d54-88fa-09aa8de1ac73 74208
3633e0e7-8c25-4c3d-b9c7-078c0be25665 5
47881e45-febd-4622-b7a1-6efbce4fd7b3 1641
51fa0155-a545-4154-ac20-b89dbb2c312b 3
5f1568db-5d5b-4597-9c66-dfbd0d7ddd7b 27299
78dcfcbd-03c4-4d47-8618-830aac6f2ee5 2
813b435e-f762-11e1-a439-00145eb45e9a 1082
85b1cfb6-f762-11e1-a439-00145eb45e9a 541641
91aa5e23-9cad-4751-86e0-241da77d7407 1
95ed873a-f762-11e1-a439-00145eb45e9a 42428
962cceea-f762-11e1-a439-00145eb45e9a 19562
962e2f2e-f762-11e1-a439-00145eb45e9a 4400
b720d45b-d251-484b-a2bd-64cf161a7881 2056
d6cc311c-c5ab-4f23-9a20-10514f9eb9c4 162
d7ce3688-e91d-4f26-b2bb-333357c6da9f 24422
e70580b8-b1df-4566-9a33-b32f30aab526 58164
ea9f5b0b-ad97-45ea-935a-ba2784c80cbb 14753
f11db245-3f9f-4fc6-a0cc-12b4124d081b 62
{noformat}
----
HIVE QUERY USED:
SELECT t1.datasetKey, SUM(t1.hasNone) missing FROM
(
SELECT gbifID, datasetKey, IF(publishingOrgKey IS NULL, 1, 0) hasNone
FROM occurrence_hbase
) t1
GROUP BY t1.datasetKey
HAVING SUM(t1.hasNone) > 0
Author: mdoering@gbif.org
Comment: owning_organization_key exists in the registry for the first few datasets checked, so this appears to be a processing error. What happens if registry ws exceptions occur during processing: does the record still make it into an interpreted occurrence? The model class has pubOrgKey annotated with @NotNull and downstream code relies on that fact. We need to make sure this cannot happen.
Created: 2014-05-19 13:27:02.282
Updated: 2014-05-19 13:27:02.282
Author: omeyn@gbif.org
Created: 2014-05-19 15:38:42.771
Updated: 2014-05-19 15:38:42.771
Here are the results from UAT:
34c0bc41-cb59-4927-9e41-53fb0a5ce44b 9
427a6290-0c65-11dd-84d2-b8a03c50a862 50
b124e1e0-4755-430f-9eab-894f25a9b59c 100
Interesting: this must mean a real transient error occurred during recent processing in production, since UAT is a copy from not that long ago (~1 month).
Author: omeyn@gbif.org
Comment: Sending reinterpretation for the prod datasets has fixed all but the 5 records from 3633e0e7-8c25-4c3d-b9c7-078c0be25665. Investigating some more.
Created: 2014-05-19 16:56:20.237
Updated: 2014-05-19 16:56:20.237
Author: omeyn@gbif.org
Comment: Ran again and done. The problem still needs solving for real though, i.e. interpretation has to keep trying until it gets a legit pub org key.
Created: 2014-05-19 17:03:48.898
Updated: 2014-05-19 17:03:48.898
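A minimal sketch of the "keep trying until it gets a legit pub org key" idea, assuming the registry lookup can be wrapped as a function from dataset key to organization key. The class name PublishingOrgRetry, the method lookupWithRetry and its parameters are hypothetical and not taken from the actual interpretation code. The retry count is bounded so the caller can decide what to do with a record that still has no key.

```java
import java.util.Optional;
import java.util.UUID;
import java.util.function.Function;

public class PublishingOrgRetry {

    /**
     * Calls the (hypothetical) registry lookup until it returns a non-null key
     * or maxAttempts is exhausted. Runtime exceptions and null results are
     * both treated as transient failures and retried with doubling backoff.
     */
    public static Optional<UUID> lookupWithRetry(Function<UUID, UUID> lookup,
                                                 UUID datasetKey,
                                                 int maxAttempts,
                                                 long initialBackoffMillis) {
        long backoff = initialBackoffMillis;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                UUID orgKey = lookup.apply(datasetKey);
                if (orgKey != null) {
                    return Optional.of(orgKey);
                }
            } catch (RuntimeException e) {
                // Assumed transient (e.g. a registry ws exception); retry below.
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(backoff);
                } catch (InterruptedException ie) {
                    // Preserve the interrupt flag and give up early.
                    Thread.currentThread().interrupt();
                    return Optional.empty();
                }
                backoff *= 2;
            }
        }
        // No key obtained: the caller should not persist the record as complete.
        return Optional.empty();
    }
}
```

An empty result leaves the decision to the caller, for example re-queueing the record instead of writing an interpreted occurrence without a publishingOrgKey.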
Author: mdoering@gbif.org
Comment: [~omeyn@gbif.org] should we do something about this in the interpretation code before we rerun UAT? At the very least, a record must be flagged with some OccurrenceIssue when an exception occurs. As a start, maybe even just something like INCOMPLETE_PROCESSING, so that we can at least spot these records immediately. Better would be to apply a separate issue for each processed section, using a try/catch per section.
Created: 2014-05-20 10:38:48.399
Updated: 2014-05-20 10:38:48.399
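The per-section try/catch proposed above could look roughly like the following sketch. Everything here is illustrative: the Issue enum is a local stand-in and not the real OccurrenceIssue, only the name INCOMPLETE_PROCESSING comes from this thread, and the Section interface and the field map are assumptions made to keep the example self-contained.

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

public class SectionFlagging {

    /** Hypothetical flags; only INCOMPLETE_PROCESSING is named in this thread. */
    public enum Issue { INCOMPLETE_PROCESSING, PUBLISHING_ORG_FAILED, TAXONOMY_FAILED }

    /** One interpretation step that writes into the record's field map. */
    public interface Section {
        void apply(Map<String, String> fields) throws Exception;
    }

    /**
     * Runs every section in its own try/catch. A failing section adds its own
     * issue plus the generic INCOMPLETE_PROCESSING flag, and the remaining
     * sections still run.
     */
    public static Set<Issue> interpret(Map<String, String> fields,
                                       Map<Issue, Section> sections) {
        Set<Issue> issues = EnumSet.noneOf(Issue.class);
        for (Map.Entry<Issue, Section> entry : sections.entrySet()) {
            try {
                entry.getValue().apply(fields);
            } catch (Exception e) {
                issues.add(entry.getKey());
                issues.add(Issue.INCOMPLETE_PROCESSING);
            }
        }
        return issues;
    }
}
```

With flags like these stored on the record, affected occurrences could be found by filtering on the issue rather than by scanning for null fields.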
Author: omeyn@gbif.org
Comment: Marked as tiny for the check to see if still happening. If so, probably not tiny to fix, but priority should be bumped.
Created: 2015-03-02 16:05:16.592
Updated: 2015-03-02 16:05:16.592