15979
Reporter: mdoering
Assignee: jlegind
Type: Feedback
Summary: Images exist for only 109 records
Status: InProgress
Created: 2014-06-19 16:07:42.654
Updated: 2016-02-10 10:20:12.504
Description: Many (all?) records in the dataset http://www.gbif.org/dataset/962f59bc-f762-11e1-a439-00145eb45e9a have multimedia, according to Guy Colling, who sent us an email. But only 109 records, all with species names starting with A, have images in our portal.
Can we try to re-harvest them? It looks like indexing failed and stopped partway through. The verbatim views of other records lack any multimedia information, for example http://www.gbif.org/occurrence/768468162/verbatim In fact, they lack an XML fragment entirely: http://www.gbif.org/occurrence/768468162/fragment
Author: omeyn@gbif.org
Created: 2014-06-19 16:48:12.105
Updated: 2014-06-19 16:48:12.105
I tried a crawl: a few pages went OK, but then it hit a protocol exception. The BioCASe server reports a bad request (bad XML), even though the first request sent in the crawl returned fine. On subsequent attempts, resending that first (known good) request also fails with a bad request error. It seems something is wrong on their end. Here's the request that worked once and not afterwards:
http://extranet.mnhn.lu/biocase/pywrapper.cgi?dsa=BiocaseMNHNL&request=%3C%3Fxml+version%3D%271.0%27+encoding%3D%27UTF-8%27%3F%3E%0A%3Crequest+xmlns%3D%27http%3A%2F%2Fwww.biocase.org%2Fschemas%2Fprotocol%2F1.3%27%0A+++++++++xmlns%3Axsi%3D%27http%3A%2F%2Fwww.w3.org%2F2001%2FXMLSchema-instance%27%0A+++++++++xsi%3AschemaLocation%3D%27http%3A%2F%2Fwww.biocase.org%2Fschemas%2Fprotocol%2F1.3+http%3A%2F%2Fwww.bgbm.org%2Fbiodivinf%2FSchema%2Fprotocol_1_3.xsd%27%3E%0A++%3Cheader%3E%0A++++%3Ctype%3Esearch%3C%2Ftype%3E%0A++%3C%2Fheader%3E%0A++%3Csearch%3E%0A++++%3CrequestFormat%3Ehttp%3A%2F%2Fwww.tdwg.org%2Fschemas%2Fabcd%2F2.06%3C%2FrequestFormat%3E%0A++++%3CresponseFormat+start%3D%220%22+limit%3D%221000%22%3Ehttp%3A%2F%2Fwww.tdwg.org%2Fschemas%2Fabcd%2F2.06%3C%2FresponseFormat%3E%0A++++%3Cfilter%3E%0A++++++%3Cand%3E%0A++++++++%3Cequals+path%3D%22%2FDataSets%2FDataSet%2FMetadata%2FDescription%2FRepresentation%2FTitle%22%3EBiological+and+palaeontological+collection+and+observation+data+MNHNL%3C%2Fequals%3E%0A++++++++%3Cand%3E%0A++++++++++%3ClessThan+path%3D%22%2FDataSets%2FDataSet%2FUnits%2FUnit%2FIdentifications%2FIdentification%2FResult%2FTaxonIdentified%2FScientificName%2FFullScientificNameString%22%3EAaa%3C%2FlessThan%3E%0A++++++++%3C%2Fand%3E%0A++++++%3C%2Fand%3E%0A++++%3C%2Ffilter%3E%0A++++%3Ccount%3Efalse%3C%2Fcount%3E%0A++%3C%2Fsearch%3E%0A%3C%2Frequest%3E%0A]
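To rule out the request itself as the cause of the "bad xml" error, the URL-encoded `request` parameter can be decoded and checked for well-formedness locally. The following is only a minimal sketch, not part of the crawler: the function names `decode_biocase_request` and `is_well_formed` and the short `example.org` URL are made up for illustration, and the check covers XML well-formedness only, not validity against the BioCASe protocol schema.

```python
from urllib.parse import urlsplit, parse_qs
import xml.etree.ElementTree as ET


def decode_biocase_request(url):
    """Return the decoded XML carried in the `request` query parameter."""
    # parse_qs undoes both the %XX escapes and the '+' (space) encoding.
    query = parse_qs(urlsplit(url).query)
    return query["request"][0]


def is_well_formed(xml_text):
    """Return True if xml_text parses as XML (well-formedness only)."""
    try:
        # Parse as bytes so an XML declaration with an encoding is honoured.
        ET.fromstring(xml_text.encode("utf-8"))
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    # Hypothetical short URL; substitute the full crawl URL to inspect it.
    sample = (
        "http://example.org/pywrapper.cgi?dsa=Demo"
        "&request=%3Crequest%3E%3Ctype%3Esearch%3C%2Ftype%3E%3C%2Frequest%3E"
    )
    xml_text = decode_biocase_request(sample)
    print(xml_text)
    print("well-formed:", is_well_formed(xml_text))
```

If the decoded request parses cleanly, that is consistent with the problem being on the server side rather than in what the crawler sends.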
Author: jlegind@gbif.org
Created: 2014-12-17 14:35:25.659
Updated: 2014-12-17 14:35:25.659
The endpoint returns bad XML, but erratically, which basically confirms what Oliver said.
Something to be aware of is that the installation is BioCASe 2.4.2, which is from 2010.
The publisher has been contacted.
Author: mdoering@gbif.org
Comment: [~jlegind@gbif.org] any news on this one?
Created: 2015-02-23 17:35:07.329
Updated: 2015-02-23 17:35:07.329