Issue 11049

checklistbank-ws/name_usage/[TaxonID]/descendants call doesn't respond

11049
Reporter: bko
Type: Bug
Summary: checklistbank-ws/name_usage/[TaxonID]/descendants call doesn't respond
Priority: Major
Resolution: WontFix
Status: Resolved
Created: 2012-05-02 10:41:11.734
Updated: 2014-01-20 11:34:56.966
Resolved: 2014-01-20 11:34:56.933
        
Description: This call produces a JSON object with descendants, like for a genus taxon, it lists descendant species and their synonyms.
Right now the API seems not functioning properly, using a browser, there is nothing return and seems waiting for loading.]]>
    


Author: mdoering@gbif.org
Created: 2012-05-02 10:46:32.41
Updated: 2012-05-02 10:46:32.41
        
Testing with http://staging.gbif.org:8080/checklistbank-ws/name_usage/2864092/descendants

    


Author: bko@gbif.org
Comment: This url doesn't return anything to me.
Created: 2012-05-02 10:53:56.424
Updated: 2012-05-02 10:53:56.424


Author: mdoering@gbif.org
Created: 2012-05-02 12:27:32.502
Updated: 2012-05-02 12:27:32.502
        
The SQL statement seems to run forever. Its a rather large query with many joins:


EXPLAIN
SELECT u.id, u.nub_fk, chk.resource_key, u.dataset_fk,
 CASE WHEN u.is_synonym THEN par.parent_fk ELSE u.parent_fk END as parentKey,
 CASE WHEN u.is_synonym THEN coalesce(parcn2.scientific_name, parn2.scientific_name) ELSE coalesce(parcn.scientific_name, parn.scientific_name) END as parent,
 CASE WHEN u.is_synonym THEN u.parent_fk ELSE null END as acceptedKey,
 CASE WHEN u.is_synonym THEN parn.scientific_name ELSE null END as accepted,
 u.basionym_fk, basn.scientific_name as basionym,
 n.scientific_name scientificName, cn.scientific_name as canonicalName, n.type as nameType,
 authorship_full(np.authorship,np.year,np.authorship_basionym, np.year_basionym) as authorship,
 pub.citation as publishedIn, accto.citation as accordingTo,
 ilnk.identifier as link, isrc.identifier as sourceId,
 ku.id as kid, coalesce(kcn.scientific_name,ksn.scientific_name) as kingdom, 0 as num_k,
 pu.id as pid, coalesce(pcn.scientific_name,psn.scientific_name) as phylum, u.num_p,
 cu.id as cid, coalesce(ccn.scientific_name,csn.scientific_name) as class, u.num_c,
 ou.id as oid, coalesce(ocn.scientific_name,osn.scientific_name) as order, u.num_o,
 fu.id as fid, coalesce(fcn.scientific_name,fsn.scientific_name) as family, u.num_f,
 gu.id as gid, coalesce(gcn.scientific_name,gsn.scientific_name) as genus, u.num_g,
 sgu.id as sgid, coalesce(sgcn.scientific_name,sgsn.scientific_name) as subgenus, u.num_sg,
 su.id as sid, coalesce(ssn.scientific_name,ssn.scientific_name) as species, u.num_s,
 u.num_children, (u.rgt-u.lft-1)/2 as num_descendants, occ.num_occurrence as num_occurrences, u.num_synonyms,
 u.is_synonym, u.pp_synonym_fk, u.origin
 , taxstat.preferred_term_fk as taxonomicStatus, taxstat.preferred_term_fk as nomenclaturalStatus, rnk.preferred_term_fk as rank

 FROM
 name_usage u
 join checklist chk on u.checklist_fk=chk.id
 join name_string n on u.name_fk=n.id
 left join name_string cn on n.canonical_name_fk=cn.id
 left join parsed_name np on u.name_fk=np.name_fk

 left join term rnk on u.rank_fk=rnk.id
 left join term taxstat on u.tax_status_fk=taxstat.id
 left join term nomstat on u.nom_status_fk=nomstat.id

 left join citation pub on u.name_published_in_fk=pub.id
 left join citation accto on u.according_to_fk=accto.id

 left join identifier isrc on isrc.usage_fk=u.id and isrc.type_fk=2007
 left join identifier ilnk on ilnk.usage_fk=u.id and ilnk.type_fk=2000

 left join occurrence_metrics_usage occ on occ.nub_usage_fk=u.nub_fk

 left join name_usage ku on u.kingdom_fk=ku.id
 left join name_string ksn on ku.name_fk=ksn.id
 left join name_string kcn on ksn.canonical_name_fk=kcn.id

 left join name_usage pu on u.phylum_fk=pu.id
 left join name_string psn on pu.name_fk=psn.id
 left join name_string pcn on psn.canonical_name_fk=pcn.id

 left join name_usage cu on u.class_fk=cu.id
 left join name_string csn on cu.name_fk=csn.id
 left join name_string ccn on csn.canonical_name_fk=ccn.id

 left join name_usage ou on u.order_fk=ou.id
 left join name_string osn on ou.name_fk=osn.id
 left join name_string ocn on osn.canonical_name_fk=ocn.id

 left join name_usage fu on u.family_fk=fu.id
 left join name_string fsn on fu.name_fk=fsn.id
 left join name_string fcn on fsn.canonical_name_fk=fcn.id

 left join name_usage gu on u.genus_fk=gu.id
 left join name_string gsn on gu.name_fk=gsn.id
 left join name_string gcn on gsn.canonical_name_fk=gcn.id

 left join name_usage sgu on u.subgenus_fk=sgu.id
 left join name_string sgsn on sgu.name_fk=sgsn.id
 left join name_string sgcn on sgsn.canonical_name_fk=sgcn.id

 left join name_usage su on u.species_fk=su.id
 left join name_string ssn on su.name_fk=ssn.id
 left join name_string scn on ssn.canonical_name_fk=scn.id

 left join name_usage par on u.parent_fk=par.id
 left join name_string parn on par.name_fk=parn.id
 left join name_string parcn on parn.canonical_name_fk=parcn.id

 left join name_usage par2 on par.parent_fk=par2.id
 left join name_string parn2 on par2.name_fk=parn2.id
 left join name_string parcn2 on parn2.canonical_name_fk=parcn2.id

 left join name_usage bas on u.basionym_fk=bas.id
 left join name_string basn on bas.name_fk=basn.id

 join name_usage ad on ad.id=u.id or (u.parent_fk=ad.id and u.is_synonym=true)
 join name_usage p on p.checklist_fk=ad.checklist_fk and ad.lft > p.lft and ad.rgt < p.rgt
 WHERE p.id=2864092
 ORDER BY 2,1
 LIMIT 100 OFFSET 0;

    


Author: mdoering@gbif.org
Comment: Changing the order to use only the u.id which speeds things up in the query analyzer by 100 to 1000 fold, committed change needs 5-10 minutes to be online on staging
Created: 2012-05-02 12:28:20.021
Updated: 2012-05-02 12:28:20.021


Author: mdoering@gbif.org
Comment: Still too slow
Created: 2012-05-02 13:19:23.928
Updated: 2012-05-02 13:19:23.928


Author: bko@gbif.org
Comment: hum I remember the first time you showed me was quite quick, there are not many descendants under genus Halodule though...
Created: 2012-05-02 13:40:50.74
Updated: 2012-05-02 13:40:50.74


Author: mdoering@gbif.org
Created: 2012-05-02 14:00:41.139
Updated: 2012-05-02 14:00:41.139
        
even a simpler query takes half a minute already, strange:

SELECT u.id, u.nub_fk, n.scientific_name
FROM name_usage u
join name_string n on u.name_fk=n.id
join name_usage ad on ad.id=u.id or (u.parent_fk=ad.id and u.is_synonym=true)
join name_usage p on p.checklist_fk=ad.checklist_fk and ad.lft > p.lft and ad.rgt < p.rgt
WHERE p.id=2864092
ORDER BY 1
LIMIT 2 OFFSET 0;
    


Author: bko@gbif.org
Created: 2012-05-03 12:00:00.905
Updated: 2012-05-03 12:00:00.905
        
I can see last time it worked was 19 Apr. So probably revert back to the version around that time?

I also think perhaps in terms of getting names of synonyms of descendants, I should try to make several calls to complete the retrieval, to avoid this heavy SQL processing. What do you think?
    


Author: mdoering@gbif.org
Created: 2012-06-07 23:52:51.751
Updated: 2012-06-07 23:52:51.751
        
Tuning the sql needs more time and potentially a new index or even column based on intarray or ltree:
http://www.postgresql.org/docs/9.1/static/intarray.html
http://www.postgresql.org/docs/current/static/ltree.html

Shifting to next version
    


Author: mdoering@gbif.org
Created: 2012-06-28 14:03:55.328
Updated: 2012-06-28 14:03:55.328
        
According to http://gbif.blogspot.dk/2012/06/taxonomic-trees-in-postgresql.html the nested set with ordering by lft has been used and its working fine. But because the nested sets in clb right now are only for accepted taxa, an additional ordering by u.id is needed to guarantee a stable sorting for paging of synonyms. That additional sorting slows down the query dramatically though, so its unusable.

It should be considered to build a nested set for all usages incl synonyms.
    


Author: mdoering@gbif.org
Created: 2012-07-05 11:24:05.803
Updated: 2012-07-05 11:24:05.803
        
Plant descendants doesn't work (its a million descendants or so):
http://jawa.gbif.org:8080/checklistbank-ws/name_usage/6/descendants
    


Author: mdoering@gbif.org
Comment: Sorting of name usages relies on the nested set indices sort order. Make sure the new nested set indices place incertae sedis last!
Created: 2012-07-05 11:24:56.424
Updated: 2012-07-05 11:24:56.424


Author: mdoering@gbif.org
Comment: Removed from API
Created: 2014-01-20 11:34:56.964
Updated: 2014-01-20 11:34:56.964