Issue 18207

New NUB usages without authors (regression)

Reporter: mblissett
Assignee: mdoering
Type: Bug
Summary: New NUB usages without authors (regression)
Priority: Blocker
Resolution: Fixed
Status: Closed
Created: 2016-02-04 12:53:25.753
Updated: 2016-03-21 16:28:07.074
Resolved: 2016-03-21 15:56:43.636
        
Description: The usage http://www.gbif-uat.org/species/7609276 is new, and its genus is http://www.gbif-uat.org/species/7804592 which doesn't have an author.

The old usage http://www.gbif-uat.org/species/2490384 has genus http://www.gbif-uat.org/species/2490383, which does have an author. This one has been deleted.

(There are probably others, but this one has the most occurrences, around 4M.)
    


Author: mblissett
Created: 2016-03-04 14:47:51.669
Updated: 2016-03-04 14:47:51.669
        
(I updated the links since this was found using a previous test NUB.)

    


Author: mdoering@gbif.org
Created: 2016-03-04 16:17:44.476
Updated: 2016-03-04 16:17:44.476
        
The id change is clearly wrong.
But the genus without authorship comes from the Catalogue of Life (CoL), which does not have any author for it:
http://www.gbif-uat.org/species/115784064

Here are all usages in the UAT CLB for Cardinalis:
{noformat}
    id     |             dataset_key              |                   dataset_title                   | rank  |      scientific_name
-----------+--------------------------------------+---------------------------------------------------+-------+----------------------------
 104076772 | 046bbc50-cae2-47ff-aa43-729fbf53f7c5 | International Plant Names Index                   | GENUS | Cardinalis Fabr.
 104076774 | 046bbc50-cae2-47ff-aa43-729fbf53f7c5 | International Plant Names Index                   | GENUS | Cardinalis Rupp.
 107867149 | 0938172b-2086-439c-a1dd-c21cb0109ed5 | Interim Register of Marine and Nonmarine Genera   | GENUS | Cardinalis Bonaparte, 1831
 107868809 | 0938172b-2086-439c-a1dd-c21cb0109ed5 | Interim Register of Marine and Nonmarine Genera   | GENUS | Cardinalis Bonaparte, 1838
 108476876 | 0938172b-2086-439c-a1dd-c21cb0109ed5 | Interim Register of Marine and Nonmarine Genera   | GENUS | Cardinalis Fabricius, 1759
 107880512 | 0938172b-2086-439c-a1dd-c21cb0109ed5 | Interim Register of Marine and Nonmarine Genera   | GENUS | Cardinalis Jarocki, 1821
 115190422 | 16c3f9cb-4b19-4553-ac8e-ebb90003aa02 | Wikipedia Species Pages - German                  | GENUS | Cardinalis
 114110809 | 3e9a9493-47e4-4dc9-a73a-00c23156b100 | Colaboraciones Americanas Sobre Aves              | GENUS | Cardinalis
 101956984 | 714c64e3-2dc1-4bb7-91e4-54be5af4da12 | IRMNG Homonym List                                | GENUS | Cardinalis Bonaparte, 1831
 101957229 | 714c64e3-2dc1-4bb7-91e4-54be5af4da12 | IRMNG Homonym List                                | GENUS | Cardinalis Bonaparte, 1838
 102019108 | 714c64e3-2dc1-4bb7-91e4-54be5af4da12 | IRMNG Homonym List                                | GENUS | Cardinalis Fabricius, 1759
 116897153 | 71667154-257d-4d8e-a2a5-711aaf9b2d74 | Phthiraptera.info                                 | GENUS | Cardinalis
 115784064 | 7ddf754f-d193-4cc9-b351-99906754a03b | Catalogue of Life                                 | GENUS | Cardinalis
 100074882 | 80b4b440-eaca-4860-aadf-d0dfdd3e856e | Official Lists and Indexes of Names in Zoology    | GENUS | Cardinalis Bonaparte, 1838
 100094644 | 80b4b440-eaca-4860-aadf-d0dfdd3e856e | Official Lists and Indexes of Names in Zoology    |       | Cardinalis Jarocki, 1821
 116802460 | 88f4e35a-bdf8-4aa2-9a1b-56401d4eed15 |                                                   | GENUS | Cardinalis
 102094242 | 9ca92552-f23a-41a8-a140-01abaa31c931 | Integrated Taxonomic Information System (ITIS)    | GENUS | Cardinalis Bonaparte, 1838
 113865628 | a6c6cead-b5ce-4a4e-8cf5-1542ba708dec | Artsnavnebasen                                    | GENUS | Cardinalis
 114998727 | bd0a2b6d-69d1-4650-8bb1-829c8f92035f | Biodiversity inventories in high gear: DNA barcod | GENUS | Cardinalis
 100225628 | c696e5ee-9088-4d11-bdae-ab88daffab78 | IOC World Bird List, version 3.4                  | GENUS | Cardinalis
 115340076 | cbb6498e-8927-405a-916b-576d00a6289b | English Wikipedia - Species Pages                 | GENUS | Cardinalis
 115337597 | cbb6498e-8927-405a-916b-576d00a6289b | English Wikipedia - Species Pages                 | GENUS | Cardinalis
 113366329 | cbb6498e-8927-405a-916b-576d00a6289b | English Wikipedia - Species Pages                 | GENUS | Cardinalis Bonaparte, 1838
 113590088 | cbb6498e-8927-405a-916b-576d00a6289b | English Wikipedia - Species Pages                 |       | Cardinalis Fabr.
 116914154 | d7435f14-dfc9-4aaa-bef3-5d1ed22d65bf | Taxonomy in Flux Checklist                        | GENUS | Cardinalis
   7804592 | d7dddbf4-2cf0-4f39-9b2a-bb099caae36c | GBIF Backbone Taxonomy                            | GENUS | Cardinalis
   2490383 | d7dddbf4-2cf0-4f39-9b2a-bb099caae36c | GBIF Backbone Taxonomy                            | GENUS | Cardinalis Bonaparte, 1831
   3241527 | d7dddbf4-2cf0-4f39-9b2a-bb099caae36c | GBIF Backbone Taxonomy                            | GENUS | Cardinalis Bonaparte, 1838
   3232102 | d7dddbf4-2cf0-4f39-9b2a-bb099caae36c | GBIF Backbone Taxonomy                            | GENUS | Cardinalis Fabricius, 1759
   7650745 | d7dddbf4-2cf0-4f39-9b2a-bb099caae36c | GBIF Backbone Taxonomy                            | GENUS | Cardinalis Jarocki, 1821
   7904806 | d7dddbf4-2cf0-4f39-9b2a-bb099caae36c | GBIF Backbone Taxonomy                            | GENUS | Cardinalis Rupp.
 104120238 | fab88965-e69d-4491-a04d-e3198b626e52 | NCBI Taxonomy                                     | GENUS | Cardinalis
{noformat}

    


Author: mdoering@gbif.org
Created: 2016-03-04 17:27:15.045
Updated: 2016-03-04 17:27:15.045
        
This is unexpected. When I create a test with all of the names above and build a nub in the current dataset priority order, I get the following result, which looks exactly as expected:
{noformat}
Animalia [kingdom]
  Cardinalidae [family]
    Cardinalis Bonaparte, 1838 [genus]
      Cardinalis cardinalis (Linnaeus, 1758) [species]
  Cardinalis Jarocki, 1821 [genus]
Plantae [kingdom]
  Campanulaceae [family]
    Cardinalis Fabr. [genus]
    Cardinalis Rupp. [genus]
{noformat}

Cardinalis Jarocki and Cardinalis Rupp. will be flagged as doubtful, so only one accepted genus per kingdom remains. Occurrences should then be attached to these accepted genera.
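The expected homonym handling described above (one accepted genus per kingdom, with any further homonyms in that kingdom flagged as doubtful) can be sketched in Java. This is a hypothetical illustration, not the NubBuilder implementation: the `Genus` record, the `Status` enum, the integer `priority` (lower meaning a higher-priority source dataset) and the rule that the highest-priority homonym stays accepted are all assumptions.

```java
import java.util.*;

/** Hypothetical nub genus candidate: canonical name, authorship, kingdom and source priority. */
record Genus(String name, String authorship, String kingdom, int priority) {}

enum Status { ACCEPTED, DOUBTFUL }

final class HomonymResolver {

    /**
     * Keeps the highest-priority genus per (kingdom, canonical name) as ACCEPTED
     * and flags every other homonym in the same kingdom as DOUBTFUL.
     * Lower priority values are assumed to mean higher dataset priority.
     */
    static Map<Genus, Status> resolve(List<Genus> genera) {
        List<Genus> sorted = new ArrayList<>(genera);
        // List.sort is stable, so equal priorities keep their input order.
        sorted.sort(Comparator.comparingInt(Genus::priority));

        Set<String> acceptedKeys = new HashSet<>();
        Map<Genus, Status> result = new LinkedHashMap<>();
        for (Genus g : sorted) {
            String key = g.kingdom() + "|" + g.name();
            // Set.add returns false when this kingdom already has an accepted homonym.
            result.put(g, acceptedKeys.add(key) ? Status.ACCEPTED : Status.DOUBTFUL);
        }
        return result;
    }
}
```

With the four genera from the test result above and illustrative priorities 1 to 4 in the listed order (Bonaparte, 1838; Jarocki, 1821; Fabr.; Rupp.), `resolve` keeps Cardinalis Bonaparte, 1838 and Cardinalis Fabr. accepted and flags Jarocki, 1821 and Rupp. as doubtful. The priorities are illustrative only, not the actual dataset order.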
    


Author: mdoering@gbif.org
Created: 2016-03-10 14:13:25.361
Updated: 2016-03-10 14:13:25.361
        
After rebuilding a new backbone, we still see the issue.
These are all Cardinalis genera in the nub; the two old Bonaparte genera are now deleted:

{noformat}
    id   | rank  |          deleted           | scientific_name
---------+-------+----------------------------+----------------------------
 2490383 | GENUS | 2016-03-10 04:55:00.207585 | Cardinalis Bonaparte, 1831
 3232102 | GENUS |                            | Cardinalis Fabricius, 1759
 3241527 | GENUS | 2016-03-10 04:55:27.932974 | Cardinalis Bonaparte, 1838
 7370558 | GENUS |                            | Cardinalis Jarocki, 1821
 8244778 | GENUS |                            | Cardinalis
 7756296 | GENUS |                            | Cardinalis Rupp.
{noformat}
    


Author: mdoering@gbif.org
Created: 2016-03-10 20:09:49.704
Updated: 2016-03-10 20:09:49.704
        
There are two issues here.

1) The ids are not stable. This is dealt with in a new JIRA issue, POR-3060.

2) The genus Cardinalis lacks an authorship even though the sources provide one. This remains the topic of this issue.
    


Author: mdoering@gbif.org
Created: 2016-03-21 12:57:34.562
Updated: 2016-03-21 12:59:11.461
        
This also happens for other homonyms, e.g. the Oenanthes:
http://www.gbif-uat.org/species/search?q=oenanthe&dataset_key=d7dddbf4-2cf0-4f39-9b2a-bb099caae36c&rank=GENUS

That looks pretty bad; raising the priority to Blocker.
Oddly, there are tests for exactly this case, and they do not fail:
https://github.com/gbif/checklistbank/blob/master/checklistbank-cli/src/test/java/org/gbif/checklistbank/nub/NubBuilderIT.java#L442
    


Author: mdoering@gbif.org
Created: 2016-03-21 13:59:51.513
Updated: 2016-03-21 13:59:51.513
        
neo4j and the usageDAO contain the right data, with authors:
{noformat}
neo4j-sh (?)$ match (n:TAXON) where n.canonicalName='Oenanthe' return n;
+------------------------------------------------------------------------------------------+
| Node[1575439]{scientificName:"Oenanthe Vieillot, 1816",rank:19,canonicalName:"Oenanthe"} |
| Node[2104649]{rank:19,canonicalName:"Oenanthe",scientificName:"Oenanthe L."}             |
| Node[4244677]{scientificName:"Oenanthe Pallas, 1771",canonicalName:"Oenanthe",rank:19}   |
| Node[4426626]{rank:19,canonicalName:"Oenanthe",scientificName:"Oenanthe"}                |
| Node[4527873]{rank:19,canonicalName:"Oenanthe",scientificName:"Oenanthe"}                |
+------------------------------------------------------------------------------------------+
{noformat}

{noformat}
NUB: NubUsage{usageKey=8239683, publishedIn=null, scientificNameID=null, rank=GENUS, origin=SOURCE, parsedName=Oenanthe G:Oenanthe R:gen. A:Vieillot Y:1816 [SCIENTIFIC], status=ACCEPTED, nomStatus=[], node=Node[1575439], kingdom=ANIMALIA, sourceIds=[115698472, 106343802, 102100548, 107871625, 101499267], issues=[], remarks=[], datasetKey=7ddf754f-d193-4cc9-b351-99906754a03b}

USAGE: NameUsage{key=8239683, kingdom=null, phylum=null, clazz=null, order=null, family=null, genus=null, subgenus=null, species=null, kingdomKey=null, phylumKey=null, classKey=null, orderKey=null, familyKey=null, genusKey=null, subgenusKey=null, speciesKey=null, datasetKey=null, subDatasetKey=7ddf754f-d193-4cc9-b351-99906754a03b, nubKey=null, parentKey=1575041, parent=Muscicapidae, proParteKey=null, acceptedKey=null, accepted=null, basionymKey=null, basionym=null, scientificName=Oenanthe Vieillot, 1816, canonicalName=Oenanthe, vernacularName=null, authorship=null, nameType=null, taxonomicStatus=ACCEPTED, nomenclaturalStatus=[], rank=GENUS, publishedIn=null, accordingTo=null, numDescendants=0, isSynonym=false, origin=SOURCE, remarks=, references=null, taxonID=gbif:8239683, modified=null, deleted=null, lastCrawled=null, lastInterpreted=null, issues=[]}
{noformat}
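The dumps narrow the bug down: the NubUsage's parsed name carries the author (A:Vieillot) and year (Y:1816), yet the resulting NameUsage has authorship=null while its scientificName still includes the author. A minimal sketch of the invariant the conversion should keep, assuming simplified hypothetical `Parsed` and `Usage` records in place of the real checklistbank classes, and an assumed zoological "Author, Year" format for the authorship string:

```java
/** Hypothetical stand-in for a parsed name: canonical name plus authorship parts. */
record Parsed(String canonicalName, String author, String year) {

    /** Builds an authorship string such as "Vieillot, 1816" (assumed zoological format). */
    String authorship() {
        if (author == null || author.isBlank()) {
            return null; // no authorship known, as for the CoL "Cardinalis"
        }
        return (year == null || year.isBlank()) ? author : author + ", " + year;
    }

    /** Full scientific name: canonical name followed by the authorship, if any. */
    String scientificName() {
        String a = authorship();
        return a == null ? canonicalName : canonicalName + " " + a;
    }
}

/** Hypothetical target mirroring the two NameUsage fields that disagree in the dump above. */
record Usage(String scientificName, String authorship) {

    /** Derives BOTH fields from the same parsed name, so they cannot drift apart. */
    static Usage from(Parsed pn) {
        return new Usage(pn.scientificName(), pn.authorship());
    }
}
```

Deriving both fields from one source makes a state like `scientificName=Oenanthe Vieillot, 1816, authorship=null` impossible by construction.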
    


Author: mdoering@gbif.org
Comment: https://github.com/gbif/checklistbank/commit/3e3d8cda49d962c3150072c669e5ba3291ddc8b2
Created: 2016-03-21 15:56:43.733
Updated: 2016-03-21 15:56:43.733


Author: mdoering@gbif.org
Created: 2016-03-21 16:27:41.724
Updated: 2016-03-21 16:27:41.724
        
Added a clb-admin CLI method to update all wrongly parsed names:
https://github.com/gbif/checklistbank/commit/b6ce6f0eb33b52d959f737ce5130249563de9bb1
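As a rough illustration of how a repair command of this kind might select affected names, the hypothetical heuristic below flags rows whose scientific name contains text beyond the canonical name while the stored authorship is empty. `NameRow`, its fields and the heuristic itself are assumptions for illustration, not the clb-admin code.

```java
import java.util.*;

/** Hypothetical name row: id, full scientific name, canonical name and stored authorship. */
record NameRow(long id, String scientificName, String canonicalName, String authorship) {}

final class WrongParsedNames {

    /**
     * Heuristic: the scientific name extends beyond the canonical name (so it carries
     * an author), but the stored authorship is empty. Such rows are re-parse candidates.
     */
    static boolean suspicious(NameRow r) {
        boolean hasExtraText = r.scientificName() != null
                && r.canonicalName() != null
                && r.scientificName().startsWith(r.canonicalName())
                && r.scientificName().length() > r.canonicalName().length();
        boolean missingAuthorship = r.authorship() == null || r.authorship().isBlank();
        return hasExtraText && missingAuthorship;
    }

    /** Returns the ids of all rows that the heuristic flags, in input order. */
    static List<Long> findIdsToReparse(List<NameRow> rows) {
        List<Long> ids = new ArrayList<>();
        for (NameRow r : rows) {
            if (suspicious(r)) {
                ids.add(r.id());
            }
        }
        return ids;
    }
}
```

A bare "Oenanthe" with no authorship is left alone, since there is nothing to recover from its scientific name.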