Issue 10815

NCBI taxonomy feeding common names into Nub

10815
Reporter: ahahn
Assignee: mdoering
Type: Task
Summary: NCBI taxonomy feeding common names into Nub
Priority: Critical
Resolution: Fixed
Status: Closed
Created: 2012-02-14 11:10:14.375
Updated: 2013-12-06 17:55:13.8
Resolved: 2012-02-14 13:04:59.16
        
Description: A search for a term like "orange" in the data portal (http://data.gbif.org/search/orange) lists a number of common names in the Scientific Names category, e.g. "Orange little leaf" and "Rice orange leaf". Most of these seem to come from the NCBI checklist data source: http://ecat-dev.gbif.org/search?q=orange&rkey=1. Check whether the NCBI dataset ranking for nub inclusion needs tuning.

    


Author: mdoering@gbif.org
Created: 2012-02-14 11:27:00.799
Updated: 2012-02-14 11:27:00.799
        
Indeed a problem. Maybe it's best to simply exclude NCBI from nub building, as it contains too much noise. For virus names many of these names are correct, but as they do not follow a strict nomenclature we cannot validate their correctness and have to simply admit any name.

I see these 2 options:

1) exclude NCBI and either
  1a) rely solely on CoL virus and bacteria names, or
  1b) try to find additional dedicated virus or bacteria sources (e.g. the respective authorities: ictvonline.org / http://en.wikipedia.org/wiki/Bacterial_taxonomy#Authorities)

2) use NCBI only for virus names


I tend to go for option 1b ...
    


Author: mdoering@gbif.org
Created: 2012-02-14 11:31:04.508
Updated: 2012-02-14 11:31:04.508
        
ICTV has an Excel list here that is easily converted to dwca:
http://talk.ictvonline.org/files/ictv_documents/m/msl/1231.aspx
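A minimal sketch of such a conversion, assuming the sheet was first saved as CSV with the hypothetical column headers Order, Family, Subfamily, Genus and Species (the real ICTV list headers may differ). It builds a Darwin Core Taxon core with parentNameUsageID links plus a meta.xml and zips both; the function names are illustrative only, not an existing GBIF tool.

```python
import csv
import io
import zipfile

# Hypothetical CSV headers, highest rank first; adjust to the real ICTV sheet.
RANKS = ["Order", "Family", "Subfamily", "Genus", "Species"]

# Descriptor mapping taxon.txt columns to Darwin Core terms (column 0 is the id).
META_XML = """<?xml version="1.0" encoding="UTF-8"?>
<archive xmlns="http://rs.tdwg.org/dwc/text/">
  <core encoding="UTF-8" fieldsTerminatedBy="\\t" linesTerminatedBy="\\n"
        ignoreHeaderLines="1" rowType="http://rs.tdwg.org/dwc/terms/Taxon">
    <files><location>taxon.txt</location></files>
    <id index="0"/>
    <field index="1" term="http://rs.tdwg.org/dwc/terms/scientificName"/>
    <field index="2" term="http://rs.tdwg.org/dwc/terms/taxonRank"/>
    <field index="3" term="http://rs.tdwg.org/dwc/terms/parentNameUsageID"/>
  </core>
</archive>
"""


def rows_to_taxa(rows):
    """Flatten hierarchical rows into unique (id, name, rank, parentId) tuples."""
    taxa = []
    seen = {}  # (rank, name) -> assigned taxonID
    for row in rows:
        parent = ""
        for rank in RANKS:
            name = (row.get(rank) or "").strip()
            if not name:
                # e.g. no subfamily: the next taxon links to the nearest higher rank
                continue
            key = (rank, name)
            if key not in seen:
                seen[key] = str(len(seen) + 1)
                taxa.append((seen[key], name, rank.lower(), parent))
            parent = seen[key]
    return taxa


def write_dwca(csv_text, out):
    """Write a zipped DwC-A (meta.xml + taxon.txt) to a path or binary file object."""
    taxa = rows_to_taxa(csv.DictReader(io.StringIO(csv_text)))
    lines = ["taxonID\tscientificName\ttaxonRank\tparentNameUsageID"]
    lines += ["\t".join(t) for t in taxa]
    with zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("meta.xml", META_XML)
        zf.writestr("taxon.txt", "\n".join(lines) + "\n")
    return taxa


if __name__ == "__main__":
    sample = (
        "Order,Family,Subfamily,Genus,Species\n"
        "Mononegavirales,Rhabdoviridae,,Lyssavirus,Rabies virus\n"
    )
    buf = io.BytesIO()
    print(len(write_dwca(sample, buf)), "taxa written")
```

Keying taxa on (rank, name) assumes names are unique within a rank in the list; if that does not hold, the parent chain would need to be part of the key.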
    


Author: mdoering@gbif.org
Created: 2012-02-14 11:35:19.667
Updated: 2012-02-14 11:35:19.667
        
CoL contains 2,083 names from ICTV 2008:
http://www.catalogueoflife.org/details/database/id/14

The latest ICTV list above, version 10, published on August 24, 2011, contains 2,285 virus names.
    


Author: mdoering@gbif.org
Created: 2012-02-14 13:03:39.026
Updated: 2012-02-14 13:03:39.026
        
I've registered a new ICTV dwca:
http://gbrds.gbif.org/browse/agent?uuid=e01b0cbb-a10a-420c-b5f3-a3b20cc266ad

dwca:
http://dl.dropbox.com/u/457027/ictv.zip
    


Author: mdoering@gbif.org
Comment: NCBI is now excluded in the clb indexing db; with the next nub build this issue should be fixed.
Created: 2012-02-14 13:04:50.139
Updated: 2012-02-14 13:04:50.139


Author: mdoering@gbif.org
Created: 2012-02-14 13:25:35.86
Updated: 2012-02-14 13:25:35.86
        
We should try to get the List of Prokaryotic names with Standing in Nomenclature (LPSN) into clb
http://en.wikipedia.org/wiki/LPSN

Added the most important journal for new bacterial names to our publication index:
http://ijs.sgmjournals.org/