Issue 14219

Store CoL GSD annotations

14219
Reporter: mdoering
Assignee: mdoering
Type: NewFeature
Summary: Store CoL GSD annotations
Priority: Major
Status: Open
Created: 2013-10-14 12:23:36.293
Updated: 2015-12-14 18:38:22.611
        
Description: http://www.catalogueoflife.org/piping_devel/webservice/gbp/GBIF/annotated.zip

contains a DwC-A file with a single taxonNote column that holds the GSD annotations. The GSD can be identified via datasetName.

Example lines:
{noformat}
"taxonID"	"scientificName" ... "taxonRemarks"	"source"	"datasetName"

"NULL"	"Abarema adenophora"	"rejected:Misspelled name in ILDIS: Abarema adenophorum (Ducke) Barneby & J.W.Grimes. Correct name Abarema adenophora (Ducke) Barneby & J.W.Grimes:Misspelled name"	"NULL"	"ILDIS"

"NULL"	"Acmispon nevadensis"	"rejected:"Acmispon nevadensis (S.Watson) Brouillet (missing in ILDIS), basionym - Hosackia decumbens Benth. var. nevadensis S.Watson (already in ILDIS).":Others"	"NULL"	"ILDIS"

"NULL"	"Acacia ataxantha"	"rejected:Already in ILDIS:Others"	"NULL"	"ILDIS"



"NULL"	"Acacia atrox"	"placed:Acacia atrox Kodela - basionym of Racosperma atrox ( Kodela ) Pedley:Incomplete name"	"NULL"	"ILDIS"

"NULL"	"Abarema laeta"	"rejected:In ILDIS as Pithecellobium laetum (Poepp.) Benth. Correct name Abarema laeta (Benth.) Barneby & J.W. Grimes:Others"	"NULL"	"ILDIS"

"NULL"	"Acacia acanthophora"	"rejected:Acacia acanthophora Steud.:Name with unresolved nomenclatural status"	"NULL"	"ILDIS"


"NULL"	"Acacia amambayensis" "placed:Acacia amambayensis Hassl.:Others"	"NULL"	"ILDIS"

"NULL"	"Acacia anisophylla"	"placed:Acacia anisophylla S.Watson:Incomplete name"	"NULL"	"ILDIS"


"NULL"	"Macrodorcus recta"	"idem id84341"	"NULL"	"Scarabs"

"NULL"	"Colophon westwoodi"	"idem TaxonID 5161"	"NULL"	"Scarabs"

{noformat}
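
Judging only from the example lines above, most annotations seem to follow a "status:note:category" pattern, where the note itself may contain colons, while the Scarabs lines ("idem ...") do not follow it at all. A minimal Python sketch of splitting such a value, assuming that pattern holds; the names Annotation and parse_annotation are hypothetical and not part of any CoL or GBIF code:

```python
from typing import NamedTuple, Optional


class Annotation(NamedTuple):
    status: str    # e.g. "rejected" or "placed" (the only values seen in the examples)
    note: str      # free text, may itself contain colons
    category: str  # e.g. "Others", "Incomplete name"


def parse_annotation(remarks: str) -> Optional[Annotation]:
    """Split 'status:note:category'; return None if the pattern does not match."""
    # The status is everything before the FIRST colon.
    status, sep, rest = remarks.partition(":")
    if not sep:
        return None  # no colon at all, e.g. "idem id84341"
    # The category is everything after the LAST colon, so colons
    # inside the note are preserved.
    note, sep, category = rest.rpartition(":")
    if not sep:
        return None  # only one colon, not enough parts
    return Annotation(status.strip(), note.strip(), category.strip())


if __name__ == "__main__":
    print(parse_annotation("rejected:Already in ILDIS:Others"))
    print(parse_annotation("idem id84341"))
```

Values that return None (such as the "idem" lines) would need separate handling or would have to be stored as plain text.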
    


Author: mdoering@gbif.org
Created: 2013-10-15 12:31:16.944
Updated: 2013-10-15 12:48:03.659
        
Viktor: "The system is updated once per month. Files are not versioned - annotated names are loaded from files provided by GSDs during each update. If there is new update from GSDs - old annotations will be kept, but there is an option to delete them. Whether to keep or delete them should be agreed before starting a new project involving GSDs.

We do not have an option to keep annotated files strictly in sync with the last provided checklists from GBIF or other providers as GSDs annotate only if they have projects/resources allocated for doing the job and then it takes a few months for them to complete. However during each new update from GBIF or other providers (GBPs) only new names are added to the buffer database of the piping tools, whereas for duplicated names only name counter in the record is incremented, but no new name record inserted, to prevent GSDs from annotating the same name again and again."
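
The buffer behaviour Viktor describes (insert a name once, afterwards only increment its counter) could be sketched roughly as follows. This is an illustration under assumptions, not the piping tools' actual code; NameBuffer and its in-memory dict are hypothetical stand-ins for the buffer database:

```python
class NameBuffer:
    """Keeps one record per name and counts how often it was submitted."""

    def __init__(self) -> None:
        self.counts: dict[str, int] = {}

    def add(self, name: str) -> bool:
        """Return True if a new record was inserted, False if only the counter grew."""
        if name in self.counts:
            # Duplicate: no new record, so the GSD is not asked to annotate it again.
            self.counts[name] += 1
            return False
        self.counts[name] = 1
        return True


if __name__ == "__main__":
    buf = NameBuffer()
    for name in ["Acacia atrox", "Abarema laeta", "Acacia atrox"]:
        print(name, "new" if buf.add(name) else "duplicate")
```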
    


Author: mdoering@gbif.org
Created: 2013-10-15 12:49:12.581
Updated: 2013-10-15 12:49:12.581
        
Statistics on the names piping from CoL can be found in this RSS feed:
http://www.catalogueoflife.org/piping_devel/webservice/rss/

The feed shows a weird number for ILDIS names from GBIF:
{noformat}
ILDIS	GBIF	550526	541	6238	6779
{noformat}

How can GBIF pipe half a million Fabaceae records to ILDIS?

The GBIF backbone contains a little less than 50,000 Fabales species:
http://www.gbif.org/species/5386

and no more than 88,000 names of any rank:
http://www.gbif.org/species/search?dataset_key=d7dddbf4-2cf0-4f39-9b2a-bb099caae36c&highertaxon_key=1370
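
To put the mismatch in numbers, here is a quick Python check using the figures quoted above. It assumes that 550526 in the feed line is the count of names piped from GBIF to ILDIS; since the backbone figures are upper bounds, the resulting factors are lower bounds. The function name is just for illustration:

```python
PIPED_TO_ILDIS = 550_526           # from the RSS feed line (assumed meaning of the column)
BACKBONE_FABALES_SPECIES = 50_000  # "a little less than" this many species
BACKBONE_NAMES_ANY_RANK = 88_000   # "no more than" this many names


def excess_factor(piped: int, available: int) -> float:
    """How many times more names were piped than the backbone could supply."""
    return piped / available


if __name__ == "__main__":
    print(f"vs. species:   {excess_factor(PIPED_TO_ILDIS, BACKBONE_FABALES_SPECIES):.1f}x")
    print(f"vs. all names: {excess_factor(PIPED_TO_ILDIS, BACKBONE_NAMES_ANY_RANK):.1f}x")
```

Even against all names of any rank, the piped count is more than six times what the backbone holds.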

    


Author: mdoering@gbif.org
Created: 2013-10-15 15:19:49.086
Updated: 2013-10-16 13:28:22.416
        
The annotation archive is invalid and has various issues:
http://dev.4d4life.eu:8081/browse/PIPTO-15
http://dev.4d4life.eu:8081/browse/PIPTO-16
http://dev.4d4life.eu:8081/browse/PIPTO-17