Issue 16204

occ counts 7x higher than expected in Ohio State University Fish Division (OSUM)

16204
Reporter: mdoering
Assignee: jlegind
Type: Bug
Summary: occ counts 7x higher than expected in Ohio State University Fish Division (OSUM)
Priority: Critical
Resolution: Fixed
Status: Closed
Created: 2014-07-24 18:15:57.766
Updated: 2018-05-31 16:33:39.383
Resolved: 2018-05-31 16:33:39.304
        
Description: Originally reported by Alex Thompson on the API users list.

-----
Hi Alex,

At least the counts in our metrics cube and our Solr search index line up, and we do indeed have that large number of records in our index:
http://api.gbif.org/v1/occurrence/count?datasetKey=813b435e-f762-11e1-a439-00145eb45e9a
http://api.gbif.org/v1/occurrence/search?datasetKey=813b435e-f762-11e1-a439-00145eb45e9a

So it is not a bug in the API, but we need to figure out why we have that inflation.
Usually this is caused by identifier/triplet changes, but we need more time to investigate before we can say anything more.

It might be related to the fact that the DwC archive of that dataset apparently does not even validate:
http://tools.gbif.org/dwca-reports/205-8707255612452827269.html

Rather strange, as it is an IPT that reports your 98,439 records:
http://hymfiles.biosci.ohio-state.edu:8080/ipt/resource.do?r=osum-fish
    


Author: jlegind@gbif.org
Created: 2014-09-11 16:10:28.631
Updated: 2014-09-11 16:10:28.631
        

#ct	#collectioncode	#institutioncode	date_crawled

1	Fish	OSUM	2013-12-16
87733	Fish	OSUM	2013-09-07
374	Fish	OSUM	2013-12-17
55	Insects	OSUMT	2013-12-17
451382	Insects	OSUMU	2013-12-17
51	Fishes	Ohio State University - Fish Division, Columbus, OH (OSUM)	2014-03-15
98439	Fishes	Ohio State University - Fish Division, Columbus, OH (OSUM)	2014-08-26

These are the counts by collection code and institution code, grouped by date crawled. This suggests that the original dataset was split into two and that this is the Fish collection that came out of it.
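
As a rough cross-check of the table above, the following minimal Python sketch sums the rows and compares the total with the count from the most recent crawl. It assumes that every row listed is still present in the index for this dataset, which the table itself does not confirm. The names `ROWS` and `summarize` are illustrative only and do not come from any GBIF tool.

```python
# Rows copied from the table above:
# (count, collection code, institution code, date crawled)
ROWS = [
    (1, "Fish", "OSUM", "2013-12-16"),
    (87733, "Fish", "OSUM", "2013-09-07"),
    (374, "Fish", "OSUM", "2013-12-17"),
    (55, "Insects", "OSUMT", "2013-12-17"),
    (451382, "Insects", "OSUMU", "2013-12-17"),
    (51, "Fishes", "Ohio State University - Fish Division, Columbus, OH (OSUM)", "2014-03-15"),
    (98439, "Fishes", "Ohio State University - Fish Division, Columbus, OH (OSUM)", "2014-08-26"),
]


def summarize(rows):
    """Return (total, latest_crawl_count, older_crawl_count) for the rows."""
    total = sum(count for count, _, _, _ in rows)
    # ISO-formatted dates sort correctly as plain strings.
    latest_date = max(date for _, _, _, date in rows)
    latest = sum(count for count, _, _, date in rows if date == latest_date)
    return total, latest, total - latest


total, latest, older = summarize(ROWS)
print(f"total records in table:   {total}")
print(f"records from latest crawl: {latest}")
print(f"records from older crawls: {older}")
print(f"inflation factor:          {total / latest:.1f}x")
```

Under that assumption, the table total is about 6.5 times the 98,439 records from the latest crawl, which is roughly in line with the inflation described in the summary.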

Deletion is moving forward and the publisher will be contacted.
    


Author: mblissett
Comment: It's in sync now.
Created: 2018-05-31 16:33:39.374
Updated: 2018-05-31 16:33:39.374