Issue 10757

Better strategy for handling bad scientific names

Reporter: ahahn
Assignee: kbraak
Type: Task
Summary: Better strategy for handling bad scientific names
Priority: Major
Resolution: Fixed
Status: Closed
Created: 2012-02-03 14:33:14.659
Updated: 2015-07-27 11:14:27.023
Resolved: 2015-07-27 11:14:26.974
        
Description: Reported by project member kyle.br...@gmail.com, Dec 14, 2011 (transferred from http://code.google.com/p/gbif-crawler/issues/detail?id=2)

Currently there is a good chance of catching garbage scientific names during construction of name ranges from an inventory of scientific names.

Once harvesting moves to auto-generated ranges (aaa-aba, aba-aca, etc.) in the new crawler, whole XML responses will be synchronised to the db as soon as they have been persisted.
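
To make the range scheme concrete, here is a minimal sketch of how such ranges could be generated. The function name, the three-letter lowercase bounds, and the open-ended final range are assumptions for illustration only, not the crawler's actual implementation.

```python
from itertools import product
from string import ascii_lowercase


def name_ranges():
    """Yield (lower, upper) name-range bounds: ('aaa', 'aba'), ('aba', 'aca'), ...

    Hypothetical helper: each bound is two letters followed by 'a'.
    """
    bounds = [first + second + "a"
              for first, second in product(ascii_lowercase, repeat=2)]
    # Pair each bound with its successor to form a half-open range.
    for lower, upper in zip(bounds, bounds[1:]):
        yield lower, upper
    # Assumption: the last range has no upper bound (everything from 'zza' up).
    yield bounds[-1], None
```

Under these assumptions the ranges no longer depend on an inventory of names, so a garbage name is first seen when a response is synchronised rather than while ranges are being built.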

One of the most important fields harvested is the scientific name. Question: what strategy will we employ to handle (detect, log, parse) scientific names during synchronisation?

For example, take a bad scientific name:

&times; <i>Odontioda</i> <i>Bohnhoffiae</i>  grex

Would we try to trim out a valid name, or identify it as garbage and get the provider to fix it before trying to index again?
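
One possible middle ground, sketched here only as an illustration: strip markup and decode entities to obtain a candidate name, and flag any name that had to be modified so that it can be logged and reported back to the provider. The function name and return shape are hypothetical; only Python standard-library calls are used.

```python
import html
import re

# Matches any HTML/XML tag such as <i> or </i>.
TAG = re.compile(r"<[^>]+>")


def clean_name(raw):
    """Return (cleaned_name, was_modified) for a raw scientific name.

    Hypothetical helper: it only removes markup debris; it does not
    validate the name against any nomenclatural rules.
    """
    text = html.unescape(raw)      # decode entities, e.g. '&times;' -> '×'
    text = TAG.sub("", text)       # drop tags such as <i>...</i>
    text = " ".join(text.split())  # collapse runs of whitespace
    return text, text != raw
```

Applied to the example above, this yields the candidate "× Odontioda Bohnhoffiae grex" with the modified flag set. Whether that candidate is then parsed or rejected remains the open question.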







Author: trobertson@gbif.org
Comment: Aaa - aab crawling has been operational for years now.
Created: 2015-07-27 11:14:27.02
Updated: 2015-07-27 11:14:27.02