Issue 11628

404 not found for usage 6103443

11628
Reporter: bko
Assignee: mdoering
Type: Bug
Summary: 404 not found for usage 6103443
Environment: Safari 5 / Mac OS X 10.7
Priority: Major
Resolution: Invalid
Status: Closed
Created: 2012-07-23 21:23:35.01
Updated: 2013-12-09 13:40:35.118
Resolved: 2012-07-30 17:45:10.97
        
Description: Unlike other nonsense nub keys, which simply produce empty JSON output, this one returns 404 Not Found.

http://jawa.gbif.org:8080/checklistbank-ws/name_usage/6103443


Author: bko@gbif.org
Comment: The same error occurs for http://jawa.gbif.org:8080/checklistbank-ws/name_usage/5854394
Created: 2012-07-23 22:24:25.713
Updated: 2012-07-23 22:24:25.713


Author: bko@gbif.org
Comment: More IDs that cause the same issue: 6086304, 2221533, 6091219, 6009701, 2565937, 3393241, 6094233.
Created: 2012-07-24 11:03:28.99
Updated: 2012-07-24 11:03:28.99


Author: mdoering@gbif.org
Comment: Where do those IDs come from? 6103443, at least, doesn't exist in the database, so it's correct to see a 404.
Created: 2012-07-30 17:02:08.395
Updated: 2012-07-30 17:02:08.395


Author: bko@gbif.org
Created: 2012-07-30 17:38:20.063
Updated: 2012-07-30 17:38:20.063
        
I guess it's because the ID I got from the taxon_concept table was outdated. I made a dump to my local machine to test-run the process, and it's very likely that the checklist bank index was updated during the past few weeks.
I will soon use the latest version of the index to run the country checklist generation. If a similar issue appears then, it will probably be easier to trace the cause.


Author: bko@gbif.org
Created: 2012-07-31 10:50:54.28
Updated: 2012-07-31 10:50:54.28
        
Now some nub_concept_id values from occurrence_record in the portal (mogo) database also point to non-existent records:

5860113
5854394
6103443
6009701
3393241
2221533
2565937
3243420
6091219
6086304
6094233
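
A list like the one above could be re-checked with a small script. The following is only a minimal sketch: the base URL is copied from the description of this issue and may not be reachable from everywhere, and the function names `http_status` and `find_dead_ids` are made up for illustration.

```python
import urllib.error
import urllib.request

# Base URL copied from the issue description; the host is an assumption
# and may not be reachable outside the internal network.
BASE_URL = "http://jawa.gbif.org:8080/checklistbank-ws/name_usage/"


def http_status(usage_id, base_url=BASE_URL, timeout=10):
    """Return the HTTP status code for one name usage lookup."""
    try:
        with urllib.request.urlopen(f"{base_url}{usage_id}", timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # urlopen raises on 4xx/5xx responses; the code is on the exception.
        return err.code


def find_dead_ids(usage_ids, status_of=http_status):
    """Return the ids whose lookup answers 404, preserving input order."""
    return [uid for uid in usage_ids if status_of(uid) == 404]
```

Passing a different `status_of` callable makes it possible to try the logic without touching the web service at all.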


Author: mdoering@gbif.org
Created: 2012-07-31 11:42:17.393
Updated: 2012-07-31 11:42:17.393
        
Apparently portal_rollover is the current live portal db which should be in sync with the current checklistbank. "portal" is the previous one and *might* contain dead ids.

At least the first of the dead IDs above are artificially created autonyms for infraspecific synonyms or undetermined species that caused interpretation trouble. These shouldn't have existed, and the latest nub algorithm correctly doesn't create them anymore. For example:

db: portal
taxon_id / occ_id
5860113 / 331972069   INTERPRETED: Rhipicephalus evertsi evertsi  RAW:Rhipicephalus evertsi evertsi
5854394 / 459876216   INTERPRETED: Spionida [species] RAW:Spionida sp.
6103443 / 40571737    INTERPRETED: Squamata [species] RAW:Squamata indet.


db: portal_rollover
The taxon IDs don't exist anymore, as you say. Querying for the above occurrence IDs gives the following, which looks much cleaner, as it binds the undetermined occurrences to the higher taxon and doesn't create a non-existent species record:
taxon_id / occ_id
4900917 / 331972069   INTERPRETED: Rhipicephalus evertsi  RAW:Rhipicephalus evertsi evertsi
474     / 459876216   INTERPRETED: Spionida [order] RAW:Spionida sp.
715     / 40571737    INTERPRETED: Squamata [order] RAW:Squamata indet.
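
Since the occurrence IDs stay the same across both databases, they could be used to translate dead taxon IDs into their current counterparts. This is a rough sketch using only the three rows quoted above; `remap_taxon_ids` is a hypothetical helper, and how the two mappings would actually be queried from the databases is left out.

```python
def remap_taxon_ids(old_by_occ, new_by_occ):
    """Map each old taxon id to the taxon id the same occurrence has now.

    Both arguments are dicts of occurrence id -> taxon id. If several
    occurrences share an old taxon id but disagree on the new one, the
    last one seen wins; a real script would need to handle that case.
    """
    remap = {}
    for occ_id, old_taxon in old_by_occ.items():
        new_taxon = new_by_occ.get(occ_id)
        if new_taxon is not None and new_taxon != old_taxon:
            remap[old_taxon] = new_taxon
    return remap


# The three example rows from the comparison above.
portal = {331972069: 5860113, 459876216: 5854394, 40571737: 6103443}
portal_rollover = {331972069: 4900917, 459876216: 474, 40571737: 715}

print(remap_taxon_ids(portal, portal_rollover))
# {5860113: 4900917, 5854394: 474, 6103443: 715}
```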


Author: bko@gbif.org
Comment: I see. Perhaps I should try to construct the parent_child file according to portal_rollover.
Created: 2012-07-31 11:48:35.052
Updated: 2012-07-31 11:48:35.052


Author: mdoering@gbif.org
Comment: For now, yes. We switch between those two DBs every time we do a rollover, so I'm afraid your script also needs to follow those switches if it wants to be correct.
Created: 2012-07-31 11:53:39.367
Updated: 2012-07-31 11:53:39.367


Author: bko@gbif.org
Comment: Is there any setting stored somewhere that can be accessed by the script, so the script would know whether to use 'portal' or 'portal_rollover'?
Created: 2012-07-31 11:58:12.488
Updated: 2012-07-31 11:58:12.488


Author: mdoering@gbif.org
Comment: I think there is a note stuck to the wall next to Andreas, but that's it :)
Created: 2012-07-31 12:04:05.752
Updated: 2012-07-31 12:04:05.752


Author: bko@gbif.org
Created: 2012-07-31 12:08:19.454
Updated: 2012-07-31 12:08:19.454
        
Okay... so to automate that I'll probably need a webcam...
I'll figure it out. ;)


Author: trobertson@gbif.org
Created: 2012-07-31 12:10:57.208
Updated: 2012-07-31 12:10:57.208
        
It depends on what you are trying to do, Burke.
If you are looking for the transient databases used during rollover, then mogo:portal represents the last index and mogo:rollover_portal represents the new one. Rancor:portal and Krayt:portal will represent the live web app and harvesting databases, but they swap on each rollover.


Author: bko@gbif.org
Created: 2012-07-31 13:26:16.297
Updated: 2012-07-31 13:26:16.297
        
I think that's good to know for now, thanks Tim! I imagine the script will need to access the latest index only when the exported nub_concept_id values must be persistent and the export will be used for synchronisation purposes or, for example, to link to species pages on the live data portal. The first part is quite challenging, and there are other parts of the workflow to consider.

So from my end, at this point I'll just test whether the script works okay with both portal and portal_rollover.
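
One way to keep the database choice out of the script itself would be to pass it on the command line. The sketch below is an assumption about how that might look: the actual script is not shown in this issue, so the `--db` flag and the `databases_to_test` function are invented for illustration. Only the two database names come from this thread.

```python
import argparse

# The two database names discussed in this issue.
KNOWN_DATABASES = ("portal", "portal_rollover")


def databases_to_test(argv=None):
    """Return the database names to run against.

    With no --db argument, both known databases are returned so the
    script is exercised against each of them.
    """
    parser = argparse.ArgumentParser(
        description="Run the checklist export against one or more databases."
    )
    parser.add_argument(
        "--db",
        nargs="+",
        choices=KNOWN_DATABASES,
        default=list(KNOWN_DATABASES),
        help="database name(s) to use; defaults to both",
    )
    return parser.parse_args(argv).db
```

Calling `databases_to_test(["--db", "portal_rollover"])` would restrict a run to the new index, while calling it with no arguments covers both.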