Issue 15637

Pensoft/Biodiversity Data Journal - datasets created using our webservice crawl without having an endpoint

15637
Reporter: jlegind
Assignee: kbraak
Type: Bug
Summary: Pensoft/Biodiversity Data Journal - datasets created using our webservice crawl without having an endpoint
Priority: Blocker
Resolution: Fixed
Status: Resolved
Created: 2014-05-22 14:46:09.542
Updated: 2014-05-28 17:03:36.969
Resolved: 2014-05-23 16:15:20.911
        
Description: Pensoft's custom setup, which uses the registry web services, creates each dataset as a first step and associates its endpoint later.
Creating the dataset triggers a crawl, which fails because the endpoint has not yet been associated.

Jordan Biserkov shared an overview that displayed around 50 new datasets that were not indexed due to this issue: http://pwt.pensoft.net/dev/gbif.html

Those resources have since been crawled manually. What is needed is a trigger that crawls a dataset when its endpoint changes. We need to be careful, however, not to send two messages in quick succession such that both fail: for example, a "crawl dataset" message followed by a "crawl endpoint" message, where the first crawl is still running (and will eventually fail) when the second message arrives and is rejected because the dataset is already being crawled.

Pieces needed:
1) The registry needs to send a "dataset changed" message when an endpoint is added or altered for that dataset (MessageSendingEventListener).
2) In the occurrence CLI, the registry-change-listener needs to check incoming datasets for endpoints and send the "crawl now" message only for those datasets that have endpoints.
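
The check described in piece 2 could be sketched roughly as follows. This is only an illustration in Python, not the code of the actual registry-change-listener: the function name `crawl_request_for` and the shape of the returned crawl request are hypothetical, and the field names (`objectClass`, `newObject`, `endpoints`, `key`) are taken from the change message quoted later in this issue. A dataset that arrives without endpoints is skipped, so the only crawl that gets requested is the one triggered by the later endpoint change.

```python
from typing import Any, Dict, Optional

DATASET_CLASS = "org.gbif.api.model.registry.Dataset"


def crawl_request_for(change: Dict[str, Any]) -> Optional[Dict[str, str]]:
    """Return a crawl request for a registry change message, or None to skip it.

    The returned dict is a hypothetical stand-in for the real "crawl now" message.
    """
    # Only dataset changes are relevant to crawling.
    if change.get("objectClass") != DATASET_CLASS:
        return None

    # Assumption: a missing or null newObject means there is nothing to crawl.
    dataset = change.get("newObject")
    if not dataset:
        return None

    # The core of piece 2: no endpoints, no crawl.
    if not dataset.get("endpoints"):
        return None

    return {"datasetKey": dataset["key"]}
```

With this filter in place, the "dataset created" message that Pensoft's first step produces would be ignored, and the "dataset changed" message from piece 1 would be the one that starts the crawl.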
    


Author: kbraak@gbif.org
Comment: [~omeyn@gbif.org], the required registry change is under review: http://dev.gbif.org/code/cru/CR-POR-168
Created: 2014-05-22 17:35:50.696
Updated: 2014-05-22 17:35:50.696


Author: kbraak@gbif.org
Comment: An alternative approach was taken. A new patch has been added for review at http://dev.gbif.org/code/cru/CR-POR-168
Created: 2014-05-23 11:25:52.035
Updated: 2014-05-23 11:25:52.035


Author: kbraak@gbif.org
Created: 2014-05-23 16:15:20.934
Updated: 2014-05-23 16:15:20.934
        
Change committed: https://github.com/gbif/registry/commit/e03112f3d1830a1a35855b0e3f636a423e26a233

A test on /dev confirmed that the message arrived successfully in the RabbitMQ registry change queue.

Message copied below:

Message 1

The server reported 0 messages remaining.

Exchange	registry
Routing Key	registry.change.updated.dataset
Redelivered	○
Properties
priority:	0
delivery_mode:	1
content_type:	text/plain
Payload
10882 bytes
Encoding: string
{"changeType":"UPDATED","objectClass":"org.gbif.api.model.registry.Dataset","oldObject":{"objectClass":"org.gbif.api.model.registry.Dataset","key":"af3bce08-0599-45a6-9bfc-08188bcd868e","parentDatasetKey":null,"duplicateOfDatasetKey":null,"installationKey":"6aca019f-ee9d-4088-859b-5a601d4b093f","owningOrganizationKey":"456058db-f70b-4005-97ad-e08570cf0c56","external":false,"numConstituents":0,"type":"OCCURRENCE","subtype":null,"title":"Aves Tanzanian collection at the Natural History Museum of Denmark (SNM)","alias":null,"abbreviation":null,"description":"Collections from Tanzania and the Eastern Arc Mts in particular, in the Zoological Museum, Natural History Museum of Denmark.  The Zoological Museum of the Natural History Museum has a long tradition for field work in Tanzania, and in particular in the forests of the Eastern Arc Mountains. In addition to these recent collecting efforts, the museum houses bird material collected 1947-67 in all parts of the country by the Danish plantation owner Thorkild Andersen. Today, our bird collections from Tanzania are among the largest in the world. We should like to see our collections and the associated data being increasingly utilized for scientific research, conservation and education, and to fulfill this goal we are here making part of the data available online. 9605 records, covering 18 Orders; 64 Families; 283 Genera; 621 Species. 
1503 records with image links.","language":"ENGLISH","homepage":null,"logoUrl":null,"citation":{"text":"Aves Tanzanian collection at the Natural History Museum of Denmark (SNM)","identifier":null},"rights":"Zoological Museum - Natural History Museum of Denmark This work is licensed under a Creative Commons Attribution-Noncommercial License (http://creativecommons.org/licenses/by-nc/3.0/).","lockedForAutoUpdate":false,"createdBy":"registry-migration.gbif.org","modifiedBy":"crawler.gbif.org","created":1325775423000,"modified":1400844614592,"deleted":null,"contacts":[{"key":47435,"type":"ADMINISTRATIVE_POINT_OF_CONTACT","primary":true,"firstName":"Jon","lastName":"Fjeldså","position":"Professor","description":null,"email":"jfjeldsaa@snm.ku.dk","phone":"+45 35 32 10 23","organization":"The natural History Museum of Denmark","address":"Universitetsparken 15","city":"Copenhagen OE","province":null,"country":"DK","postalCode":"2100","createdBy":"crawler.gbif.org","modifiedBy":"crawler.gbif.org","created":1400844614436,"modified":1400844614436},{"key":47434,"type":"METADATA_AUTHOR","primary":true,"firstName":"Jon","lastName":"Fjeldså","position":"Professor","description":null,"email":"jfjeldsaa@snm.ku.dk","phone":"+45 35 32 10 23","organization":"The natural History Museum of Denmark","address":"Universitetsparken 15","city":"Copenhagen OE","province":null,"country":"DK","postalCode":"2100","createdBy":"crawler.gbif.org","modifiedBy":"crawler.gbif.org","created":1400844614393,"modified":1400844614393},{"key":47433,"type":"ORIGINATOR","primary":true,"firstName":"Jon","lastName":"Fjeldså","position":"Professor","description":null,"email":"jfjeldsaa@snm.ku.dk","phone":"+45 35 32 10 23","organization":"The natural History Museum of Denmark","address":"Universitetsparken 15","city":"Copenhagen 
OE","province":null,"country":"DK","postalCode":"2100","createdBy":"crawler.gbif.org","modifiedBy":"crawler.gbif.org","created":1400844614360,"modified":1400844614360}],"endpoints":[{"key":17599,"type":"DWC_ARCHIVE","url":"http://danbif.au.dk/ipt/archive.do?r=aves_tanza","description":null,"createdBy":"456058db-f70b-4005-97ad-e08570cf0c56","modifiedBy":"456058db-f70b-4005-97ad-e08570cf0c56","created":1384957707711,"modified":1384957707711,"machineTags":[]},{"key":17598,"type":"EML","url":"http://danbif.au.dk/ipt/eml.do?r=aves_tanza","description":null,"createdBy":"456058db-f70b-4005-97ad-e08570cf0c56","modifiedBy":"456058db-f70b-4005-97ad-e08570cf0c56","created":1384957707689,"modified":1384957707689,"machineTags":[]}],"machineTags":[{"key":110728,"namespace":"crawler.gbif.org","name":"crawl_attempt","value":"53","createdBy":"crawler.gbif.org","created":1400844612796}],"tags":[],"identifiers":[{"key":11647,"type":"GBIF_PORTAL","identifier":"13987","createdBy":"registry-migration.gbif.org","created":1346853862000}],"comments":[],"bibliographicCitations":[],"curatorialUnits":[],"taxonomicCoverages":[{"description":null,"coverages":[{"scientificName":"Animalia","commonName":null,"rank":{"verbatim":"kingdom","interpreted":"KINGDOM"}},{"scientificName":"Chordata","commonName":null,"rank":{"verbatim":"phylum","interpreted":"PHYLUM"}},{"scientificName":"Aves","commonName":null,"rank":{"verbatim":"class","interpreted":"CLASS"}}]}],"geographicCoverageDescription":null,"geographicCoverages":[{"description":"Tanzania","boundingBox":{"minLatitude":-90.0,"maxLatitude":90.0,"minLongitude":-180.0,"maxLongitude":180.0,"globalCoverage":true}}],"temporalCoverages":[],"keywordCollections":[{"thesaurus":"GBIF Dataset Type Vocabulary: http://rs.gbif.org/vocabulary/gbif/dataset_type.xml","keywords":["Occurrence"]},{"thesaurus":"GBIF Dataset Subtype Vocabulary: 
http://rs.gbif.org/vocabulary/gbif/dataset_subtype.xml","keywords":["Specimen"]}],"project":null,"samplingDescription":null,"countryCoverage":[],"collections":[],"dataDescriptions":[],"dataLanguage":"ENGLISH","purpose":null,"additionalInfo":null,"pubDate":1398636000000},"newObject":{"objectClass":"org.gbif.api.model.registry.Dataset","key":"af3bce08-0599-45a6-9bfc-08188bcd868e","parentDatasetKey":null,"duplicateOfDatasetKey":null,"installationKey":"6aca019f-ee9d-4088-859b-5a601d4b093f","owningOrganizationKey":"456058db-f70b-4005-97ad-e08570cf0c56","external":false,"numConstituents":0,"type":"OCCURRENCE","subtype":null,"title":"Aves Tanzanian collection at the Natural History Museum of Denmark (SNM)","alias":null,"abbreviation":null,"description":"Collections from Tanzania and the Eastern Arc Mts in particular, in the Zoological Museum, Natural History Museum of Denmark.  The Zoological Museum of the Natural History Museum has a long tradition for field work in Tanzania, and in particular in the forests of the Eastern Arc Mountains. In addition to these recent collecting efforts, the museum houses bird material collected 1947-67 in all parts of the country by the Danish plantation owner Thorkild Andersen. Today, our bird collections from Tanzania are among the largest in the world. We should like to see our collections and the associated data being increasingly utilized for scientific research, conservation and education, and to fulfill this goal we are here making part of the data available online. 9605 records, covering 18 Orders; 64 Families; 283 Genera; 621 Species. 
1503 records with image links.","language":"ENGLISH","homepage":null,"logoUrl":null,"citation":{"text":"Aves Tanzanian collection at the Natural History Museum of Denmark (SNM)","identifier":null},"rights":"Zoological Museum - Natural History Museum of Denmark This work is licensed under a Creative Commons Attribution-Noncommercial License (http://creativecommons.org/licenses/by-nc/3.0/).","lockedForAutoUpdate":false,"createdBy":"registry-migration.gbif.org","modifiedBy":"crawler.gbif.org","created":1325775423000,"modified":1400844614592,"deleted":null,"contacts":[{"key":47435,"type":"ADMINISTRATIVE_POINT_OF_CONTACT","primary":true,"firstName":"Jon","lastName":"Fjeldså","position":"Professor","description":null,"email":"jfjeldsaa@snm.ku.dk","phone":"+45 35 32 10 23","organization":"The natural History Museum of Denmark","address":"Universitetsparken 15","city":"Copenhagen OE","province":null,"country":"DK","postalCode":"2100","createdBy":"crawler.gbif.org","modifiedBy":"crawler.gbif.org","created":1400844614436,"modified":1400844614436},{"key":47434,"type":"METADATA_AUTHOR","primary":true,"firstName":"Jon","lastName":"Fjeldså","position":"Professor","description":null,"email":"jfjeldsaa@snm.ku.dk","phone":"+45 35 32 10 23","organization":"The natural History Museum of Denmark","address":"Universitetsparken 15","city":"Copenhagen OE","province":null,"country":"DK","postalCode":"2100","createdBy":"crawler.gbif.org","modifiedBy":"crawler.gbif.org","created":1400844614393,"modified":1400844614393},{"key":47433,"type":"ORIGINATOR","primary":true,"firstName":"Jon","lastName":"Fjeldså","position":"Professor","description":null,"email":"jfjeldsaa@snm.ku.dk","phone":"+45 35 32 10 23","organization":"The natural History Museum of Denmark","address":"Universitetsparken 15","city":"Copenhagen 
OE","province":null,"country":"DK","postalCode":"2100","createdBy":"crawler.gbif.org","modifiedBy":"crawler.gbif.org","created":1400844614360,"modified":1400844614360}],"endpoints":[{"key":37995,"type":"DWC_ARCHIVE","url":"http://ipt.gbif.org/archive.do?r=kyleTest","description":"Testing testing 1, 2, 3","createdBy":"Kyle Braak","modifiedBy":"Kyle Braak","created":1400852236099,"modified":1400852236099,"machineTags":[]},{"key":17599,"type":"DWC_ARCHIVE","url":"http://danbif.au.dk/ipt/archive.do?r=aves_tanza","description":null,"createdBy":"456058db-f70b-4005-97ad-e08570cf0c56","modifiedBy":"456058db-f70b-4005-97ad-e08570cf0c56","created":1384957707711,"modified":1384957707711,"machineTags":[]},{"key":17598,"type":"EML","url":"http://danbif.au.dk/ipt/eml.do?r=aves_tanza","description":null,"createdBy":"456058db-f70b-4005-97ad-e08570cf0c56","modifiedBy":"456058db-f70b-4005-97ad-e08570cf0c56","created":1384957707689,"modified":1384957707689,"machineTags":[]}],"machineTags":[{"key":110728,"namespace":"crawler.gbif.org","name":"crawl_attempt","value":"53","createdBy":"crawler.gbif.org","created":1400844612796}],"tags":[],"identifiers":[{"key":11647,"type":"GBIF_PORTAL","identifier":"13987","createdBy":"registry-migration.gbif.org","created":1346853862000}],"comments":[],"bibliographicCitations":[],"curatorialUnits":[],"taxonomicCoverages":[{"description":null,"coverages":[{"scientificName":"Animalia","commonName":null,"rank":{"verbatim":"kingdom","interpreted":"KINGDOM"}},{"scientificName":"Chordata","commonName":null,"rank":{"verbatim":"phylum","interpreted":"PHYLUM"}},{"scientificName":"Aves","commonName":null,"rank":{"verbatim":"class","interpreted":"CLASS"}}]}],"geographicCoverageDescription":null,"geographicCoverages":[{"description":"Tanzania","boundingBox":{"minLatitude":-90.0,"maxLatitude":90.0,"minLongitude":-180.0,"maxLongitude":180.0,"globalCoverage":true}}],"temporalCoverages":[],"keywordCollections":[{"thesaurus":"GBIF Dataset Type Vocabulary: 
http://rs.gbif.org/vocabulary/gbif/dataset_type.xml","keywords":["Occurrence"]},{"thesaurus":"GBIF Dataset Subtype Vocabulary: http://rs.gbif.org/vocabulary/gbif/dataset_subtype.xml","keywords":["Specimen"]}],"project":null,"samplingDescription":null,"countryCoverage":[],"collections":[],"dataDescriptions":[],"dataLanguage":"ENGLISH","purpose":null,"additionalInfo":null,"pubDate":1398636000000}}
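
For readers who want to see what changed in the message above without reading the whole payload, here is a small Python sketch that compares the `endpoints` arrays of `oldObject` and `newObject`. The helper name `added_endpoints` is hypothetical, and the sample is a heavily trimmed copy of the payload that keeps only the fields the helper reads.

```python
import json
from typing import Any, Dict, List


def added_endpoints(change: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return endpoints present in newObject but not in oldObject, matched by key."""
    old = change.get("oldObject") or {}
    new = change.get("newObject") or {}
    old_keys = {endpoint["key"] for endpoint in old.get("endpoints") or []}
    return [e for e in new.get("endpoints") or [] if e["key"] not in old_keys]


# Trimmed copy of the message payload: endpoint keys, types and URLs only.
payload = json.loads("""
{
  "changeType": "UPDATED",
  "objectClass": "org.gbif.api.model.registry.Dataset",
  "oldObject": {
    "key": "af3bce08-0599-45a6-9bfc-08188bcd868e",
    "endpoints": [
      {"key": 17599, "type": "DWC_ARCHIVE", "url": "http://danbif.au.dk/ipt/archive.do?r=aves_tanza"},
      {"key": 17598, "type": "EML", "url": "http://danbif.au.dk/ipt/eml.do?r=aves_tanza"}
    ]
  },
  "newObject": {
    "key": "af3bce08-0599-45a6-9bfc-08188bcd868e",
    "endpoints": [
      {"key": 37995, "type": "DWC_ARCHIVE", "url": "http://ipt.gbif.org/archive.do?r=kyleTest"},
      {"key": 17599, "type": "DWC_ARCHIVE", "url": "http://danbif.au.dk/ipt/archive.do?r=aves_tanza"},
      {"key": 17598, "type": "EML", "url": "http://danbif.au.dk/ipt/eml.do?r=aves_tanza"}
    ]
  }
}
""")

print([(e["key"], e["url"]) for e in added_endpoints(payload)])
# prints [(37995, 'http://ipt.gbif.org/archive.do?r=kyleTest')]
```

The single added endpoint (key 37995) is the test endpoint whose creation caused the registry to emit this "dataset updated" message.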
    


Author: kbraak@gbif.org
Comment: An additional change required by the crawler has been committed: https://github.com/gbif/registry/commit/49be8122ce3993264f3e387587173b983c43fe11
Created: 2014-05-28 17:03:36.969
Updated: 2014-05-28 17:03:36.969