Issue 14486

Metadata sync on TAPIR installations fails if the first endpoint in the list is invalid

14486
Reporter: jlegind
Assignee: lfrancke
Type: Bug
Summary: Metadata sync on TAPIR installations fails if the first endpoint in the list is invalid
Priority: Critical
Resolution: Invalid
Status: Closed
Created: 2013-12-19 11:57:08.572
Updated: 2015-03-02 16:55:15.157
Resolved: 2015-03-02 16:55:15.136
        
Description: If the first entry in the list under Endpoints () is an endpoint that is no longer valid (removed by the publisher), the sync cannot get a proper capabilities response.

The Finnish Museum of Natural History TAPIR installation was 'fixed' by removing two endpoints that no longer exist and that were at the top of the list.
http://registry.gbif.org/web/index.html#/installation/605b9238-f762-11e1-a439-00145eb45e9a
(See synchronization history)

Is it possible that an error in the first endpoint stops the loop from iterating through the remaining endpoints?

https://code.google.com/p/gbif-registry/source/browse/registry/trunk/registry-metasync/src/main/java/org/gbif/registry/metasync/protocols/tapir/TapirMetadataSynchroniser.java#121
    


Author: lfrancke@gbif.org
Created: 2013-12-19 12:24:15.704
Updated: 2013-12-19 12:24:15.704
        
The assumption is correct: https://code.google.com/p/gbif-registry/source/browse/registry/trunk/registry-metasync/src/main/java/org/gbif/registry/metasync/protocols/tapir/TapirMetadataSynchroniser.java#70

I stop iterating through Endpoints once we encounter a bad one. I'll take a look at this now.
    


Author: lfrancke@gbif.org
Created: 2013-12-19 13:57:27.6
Updated: 2013-12-19 13:57:27.6
        
I remember why I did this.

The logic is as follows:

* For each Endpoint for an Installation:
** Get Capabilities, abort Metasync for this Installation if that throws an exception
** Match Capabilities to an existing Dataset for this Installation
*** If one could be found it is considered to be updated
*** If none could be found it is considered to be new
* For each Dataset for the Installation check if it was found in the _updated_ or _new_ list
** If not consider it _deleted_
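
As a rough sketch of the steps above (not the actual TapirMetadataSynchroniser code): the class name, the `sync` method, the `Result` type and the `getCapabilities` function are all hypothetical, and datasets are reduced to plain string keys for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Hypothetical illustration only; none of these names come from gbif-registry.
class MetasyncSketch {

    /** Outcome of one run: dataset keys sorted into new, updated and deleted. */
    static final class Result {
        final List<String> added = new ArrayList<>();
        final List<String> updated = new ArrayList<>();
        final List<String> deleted = new ArrayList<>();
    }

    /**
     * @param endpoints       endpoint URLs of one installation
     * @param existing        dataset keys currently registered for that installation
     * @param getCapabilities assumed lookup that returns the dataset key served by
     *                        an endpoint and throws if the endpoint does not respond
     */
    static Result sync(List<String> endpoints, Set<String> existing,
                       Function<String, String> getCapabilities) {
        Result result = new Result();
        for (String endpoint : endpoints) {
            // An exception here propagates out of sync() and aborts the whole run,
            // so no dataset is marked deleted because of one failing endpoint.
            String datasetKey = getCapabilities.apply(endpoint);
            if (existing.contains(datasetKey)) {
                result.updated.add(datasetKey);
            } else {
                result.added.add(datasetKey);
            }
        }
        // Anything registered but not seen in this run is considered deleted.
        for (String key : existing) {
            if (!result.updated.contains(key) && !result.added.contains(key)) {
                result.deleted.add(key);
            }
        }
        return result;
    }
}
```

In this sketch a failing endpoint anywhere in the list, including the first one, ends the run before the deletion step is reached.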

The problem now is that if a Capabilities request fails, I currently err on the safe side and abort the Metasync, because we can't distinguish between transient and permanent failures. If it is transient, we don't want to delete the Dataset. If it's permanent, we do. Because the Metasync is aborted (in an admittedly non-obvious way), you can fix the Registry after a manual check and then rerun the Metasync.

That was my reason for doing things this way. I'm happy to implement it in another way.
    


Author: jlegind@gbif.org
Created: 2013-12-19 17:08:13.88
Updated: 2013-12-19 17:08:13.88
        
Thank you, Lars, for this eloquent and to-the-point answer.

I can certainly understand the safety feature and it should stay in place, but in that case I would like a more detailed sync history response that lists the non-responsive endpoints, or at least the endpoint that caused the abort.
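
To illustrate the kind of history entry I have in mind, here is a minimal sketch. It is not based on the existing metasync code: `SyncHistorySketch`, `HistoryEntry`, `run` and `getCapabilities` are hypothetical names, and endpoints are plain strings.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical illustration only; none of these names come from gbif-registry.
class SyncHistorySketch {

    /** One sync history entry that records which endpoint caused an abort. */
    static final class HistoryEntry {
        final List<String> processedEndpoints = new ArrayList<>();
        String failedEndpoint;  // null when every endpoint responded
        String failureMessage;  // null when every endpoint responded

        boolean aborted() {
            return failedEndpoint != null;
        }
    }

    /**
     * @param getCapabilities assumed lookup that throws a RuntimeException
     *                        when an endpoint does not respond properly
     */
    static HistoryEntry run(List<String> endpoints,
                            Function<String, String> getCapabilities) {
        HistoryEntry entry = new HistoryEntry();
        for (String endpoint : endpoints) {
            try {
                getCapabilities.apply(endpoint);
                entry.processedEndpoints.add(endpoint);
            } catch (RuntimeException e) {
                // Keep the safety feature: still abort, but record the cause.
                entry.failedEndpoint = endpoint;
                entry.failureMessage = e.getMessage();
                break;
            }
        }
        return entry;
    }
}
```

With something like this, the synchronization history could show the offending endpoint directly instead of only a failed run.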