Issue 17737
Crawling: Super slow after a deletion
Reporter: trobertson
Assignee: trobertson
Type: Bug
Summary: Crawling: Super slow after a deletion
Priority: Major
Resolution: Fixed
Status: Closed
Created: 2015-07-31 07:56:36.706
Updated: 2015-07-31 13:40:41.541
Resolved: 2015-07-31 13:40:41.47
Description: When records are deleted from a dataset and the dataset is then recrawled, it brings the system to its knees.
Crawling runs at around 15 records/s, showing this in the logs:
{code}
Could not retrieve Fragment with key [812995720] even though it exists in lookup table - deleting lookup and inserting Fragment as NEW.
{code}
When you have 5 million records, this blocks crawling for 90+ hours, which is not acceptable.
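As a quick sanity check on the 90+ hour figure, the arithmetic can be sketched as below. The function name `crawl_hours` is made up for illustration, and the calculation assumes a steady 15 records per second with no other overhead.

```python
def crawl_hours(records: int, rate_per_second: float) -> float:
    """Hours needed to process `records` at a steady `rate_per_second`."""
    return records / rate_per_second / 3600


# 5 million records at roughly 15 records/s, as observed in this issue.
hours = crawl_hours(5_000_000, 15)
print(f"{hours:.1f} hours")  # prints "92.6 hours"
```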
If we can't speed this up, we should just remove all lookups on a delete, knowing that we can't restore records with the same id.
CC [~omeyn]
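The stale-lookup path in the log message, and the fallback of removing lookups on delete, can be illustrated with a toy in-memory model. This is only a sketch: `FragmentStore`, `upsert`, `delete`, and `purge_lookups` are hypothetical names, and the real lookup and fragment tables are not plain dictionaries.

```python
class FragmentStore:
    """Toy model: a lookup table (identifier -> key) and a fragment table (key -> fragment)."""

    def __init__(self):
        self.lookups = {}     # identifier (e.g. an occurrenceID) -> key
        self.fragments = {}   # key -> fragment
        self.next_key = 1
        self.stale_hits = 0   # how often the slow "stale lookup" path ran

    def upsert(self, identifier, fragment):
        key = self.lookups.get(identifier)
        if key is not None and key not in self.fragments:
            # Lookup exists but the fragment is gone: drop the lookup
            # and fall through to insert the fragment as new.
            self.stale_hits += 1
            del self.lookups[identifier]
            key = None
        if key is None:
            key = self.next_key
            self.next_key += 1
            self.lookups[identifier] = key
        self.fragments[key] = fragment
        return key

    def delete(self, key, purge_lookups=True):
        self.fragments.pop(key, None)
        if purge_lookups:
            # Remove every lookup that still points at the deleted key.
            for ident in [i for i, k in self.lookups.items() if k == key]:
                del self.lookups[ident]


def recrawl_after_delete(purge_lookups):
    store = FragmentStore()
    key = store.upsert("occ-1", "fragment v1")
    store.delete(key, purge_lookups=purge_lookups)
    store.upsert("occ-1", "fragment v2")
    return store.stale_hits


print(recrawl_after_delete(purge_lookups=False))  # prints 1
print(recrawl_after_delete(purge_lookups=True))   # prints 0
```

In this model the re-inserted record gets a new key either way, which matches the trade-off noted above: once the lookups are removed, a record with the same id cannot be restored under its old key.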
Author: trobertson@gbif.org
Comment: This happens because of POR-995, which is blocked by POR-989. I am fixing all 3.
Created: 2015-07-31 11:15:04.546
Updated: 2015-07-31 11:15:04.546
Author: trobertson@gbif.org
Created: 2015-07-31 13:40:41.539
Updated: 2015-07-31 13:40:41.539
Comment: Fixed with https://github.com/gbif/occurrence/commit/df8f062e7c3904d5f8600507ede7ecc96aadca48
The slowness occurred because occurrenceID was never deleted.