17757
Reporter: mdoering
Assignee: mdoering
Type: NewFeature
Summary: Create new backbone dwca archive
Priority: Major
Resolution: Fixed
Status: Closed
Created: 2015-08-10 14:05:33.421
Updated: 2016-05-13 11:39:56.626
Resolved: 2016-05-13 11:39:56.516
Description: After a new backbone is built, a new dwc archive of the latest backbone needs to be created and placed under the currently registered endpoint: http://rs.gbif.org/datasets/backbone/backbone.zip
This could be done by the nubchanged checklistbank CLI, which already listens to BackboneChanged and updates the sources network and nub dataset metadata.
Author: mdoering@gbif.org
Created: 2015-12-17 11:42:51.046
Updated: 2015-12-17 11:42:51.046
It would be nice for the regular dwca to be the fully normalized file with all records.
In addition, the archive should include 3 extra files for spreadsheet users, as this format has often been requested at the GBIF helpdesk:
- the normalized higher taxonomy down to families
- a list of all genera with their denormalized higher classification and a familyID pointer
- a list of all species or infraspecific names with their denormalized higher classification and a genusID pointer
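As a rough illustration of the second file (genera with denormalized higher classification and a familyID pointer), here is a minimal Python sketch. The record layout (a dict keyed by taxon id with "parent", "rank" and "name" entries), the RANKS list and the function name are assumptions made for the example, not the actual ChecklistBank model or export code.

```python
from typing import Dict, List, Optional

# Hypothetical higher ranks to flatten into columns (assumption for this sketch).
RANKS = ["kingdom", "phylum", "class", "order", "family"]


def denormalize_genera(taxa: Dict[str, dict]) -> List[dict]:
    """Build one flat row per genus from a normalized parent-pointer table.

    `taxa` maps a taxon id to {"parent": id or None, "rank": str, "name": str}.
    Each returned row carries the names of the higher ranks found while walking
    up the parent chain, plus a familyID pointer to the family record.
    """
    rows = []
    for taxon_id, taxon in taxa.items():
        if taxon["rank"] != "genus":
            continue
        row: Dict[str, Optional[str]] = {
            "taxonID": taxon_id,
            "scientificName": taxon["name"],
            "familyID": None,
        }
        for rank in RANKS:
            row[rank] = None
        # Walk up the classification; `seen` guards against cyclic parent links.
        seen = set()
        parent_id = taxon["parent"]
        while parent_id is not None and parent_id in taxa and parent_id not in seen:
            seen.add(parent_id)
            parent = taxa[parent_id]
            if parent["rank"] in RANKS:
                row[parent["rank"]] = parent["name"]
                if parent["rank"] == "family":
                    row["familyID"] = parent_id
            parent_id = parent["parent"]
        rows.append(row)
    return rows
```

The species file could follow the same pattern, starting from species and infraspecific records and adding a genusID pointer instead.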
Author: mdoering@gbif.org
Created: 2015-12-17 16:33:26.964
Updated: 2016-05-11 07:45:33.104
Keep the archive and its associated log and change files in a folder named by date:
/backbone_releases/2015-12-24
Should we use HDFS for storing many copies?
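A small Python sketch of the dated folder layout, assuming the /backbone_releases root shown above and ISO dates (YYYY-MM-DD) as folder names. The helper names are hypothetical; nothing here reflects an existing script.

```python
from datetime import date
from pathlib import PurePosixPath
from typing import Iterable, Optional


def release_dir(release_date: date, root: str = "/backbone_releases") -> PurePosixPath:
    """Return the folder for one release, e.g. /backbone_releases/2015-12-24."""
    return PurePosixPath(root) / release_date.isoformat()


def latest_release(folder_names: Iterable[str]) -> Optional[str]:
    """Pick the newest release among folder names, ignoring non-date entries."""
    dated = []
    for name in folder_names:
        try:
            dated.append(date.fromisoformat(name))
        except ValueError:
            continue  # not a release folder, skip it
    return max(dated).isoformat() if dated else None
```

Naming folders by ISO date keeps them sorted chronologically in a plain directory listing, which makes it easy to find the most recent release.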
Author: mdoering@gbif.org
Comment: The CLB admin CLI now supports a manual dwca export command. I have used it to manually create a dwca copy of the current backbone and placed it here: http://rs.gbif.org/datasets/backbone/ specifically http://rs.gbif.org/datasets/backbone/backbone-2016-04-13.dwca.gz
Created: 2016-05-13 11:38:15.245
Updated: 2016-05-13 11:38:15.245
Author: mdoering@gbif.org
Comment: As we do not update the nub too frequently, this can remain a slightly manual process of exporting the dwca and placing it on rs.gbif.org. The registered symlink http://rs.gbif.org/datasets/backbone/backbone-current.zip should always point to the most recent copy.
Created: 2016-05-13 11:39:56.623
Updated: 2016-05-13 11:39:56.623
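The manual step of repointing the symlink could be scripted roughly as follows. This is only a sketch: it assumes a POSIX filesystem on the server behind rs.gbif.org, and the function name and temporary link name are made up for the example. It creates a temporary link and renames it over the old one, so the registered URL never points at a missing file during the switch.

```python
import os


def point_symlink(link_path: str, target: str) -> None:
    """Repoint `link_path` at `target` by swapping in a freshly created symlink."""
    link_dir = os.path.dirname(os.path.abspath(link_path))
    tmp_link = os.path.join(link_dir, ".tmp-" + os.path.basename(link_path))
    # Clear any leftover temporary link from an interrupted earlier run.
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(target, tmp_link)
    # On POSIX, rename replaces the existing link in a single step.
    os.replace(tmp_link, link_path)
```

For example, point_symlink("backbone-current.zip", "backbone-2016-04-13.dwca.gz"), run in the backbone folder, would make the registered link resolve to the export mentioned above.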