Issue 17737

Crawling: Super slow after a deletion

Reporter: trobertson
Assignee: trobertson
Type: Bug
Summary: Crawling: Super slow after a deletion
Priority: Major
Resolution: Fixed
Status: Closed
Created: 2015-07-31 07:56:36.706
Updated: 2015-07-31 13:40:41.541
Resolved: 2015-07-31 13:40:41.47
        
Description: When records are deleted from a dataset and the dataset is then recrawled, it brings the system to its knees.

Crawling runs at around 15 records/s, showing this in the logs:
{code}
Could not retrieve Fragment with key [812995720] even though it exists in lookup table - deleting lookup and inserting Fragment as NEW.
{code}

When you have 5 million records, this blocks crawling for 90+ hours, which is not acceptable.
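
As a rough check of the 90+ hour figure, here is a minimal sketch. It assumes the 15/s rate means records per second and stays constant for the whole crawl; the function name is hypothetical.

```python
def crawl_hours(records: int, rate_per_second: float) -> float:
    """Estimated crawl duration in hours at a constant processing rate."""
    seconds = records / rate_per_second
    return seconds / 3600


# 5 million records at the observed ~15 records/s (assumed constant)
hours = crawl_hours(5_000_000, 15)
print(f"{hours:.1f} hours")  # prints "92.6 hours"
```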

If we can't speed this up, we should just remove all lookups on a delete, knowing that we can't restore records with the same id.

CC [~omeyn]



Author: trobertson@gbif.org
Comment: This happens because of POR-995, which is blocked by POR-989. I am fixing all 3.
Created: 2015-07-31 11:15:04.546
Updated: 2015-07-31 11:15:04.546


Author: trobertson@gbif.org
Created: 2015-07-31 13:40:41.539
Updated: 2015-07-31 13:40:41.539
        
Fixed with https://github.com/gbif/occurrence/commit/df8f062e7c3904d5f8600507ede7ecc96aadca48

The slowness occurred because the occurrenceID was never deleted.
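
The failure mode can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions, not the code from the commit: the class, its methods, and the `occurrenceID:` / `triplet:` identifier prefixes are all hypothetical. It assumes a lookup table mapping identifiers to fragment keys, and that a delete which leaves the occurrenceID lookup behind forces every recrawled record through the "exists in lookup table" path from the log message.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FragmentStore:
    """Toy model of a fragment table plus identifier lookups (hypothetical)."""

    fragments: dict[int, str] = field(default_factory=dict)  # key -> fragment
    lookups: dict[str, int] = field(default_factory=dict)    # identifier -> key
    next_key: int = 1
    stale_hits: int = 0  # lookups that pointed at a missing fragment

    def upsert(self, identifiers: list[str], fragment: str) -> int:
        for ident in identifiers:
            key = self.lookups.get(ident)
            if key is None:
                continue
            if key in self.fragments:
                self.fragments[key] = fragment  # normal update path
                return key
            # Lookup exists but the fragment is gone: drop the stale lookup
            # and fall through to insert the fragment as new.
            self.stale_hits += 1
            del self.lookups[ident]
        key = self.next_key
        self.next_key += 1
        self.fragments[key] = fragment
        for ident in identifiers:
            self.lookups[ident] = key
        return key

    def delete(self, key: int, remove_all_lookups: bool = True) -> None:
        self.fragments.pop(key, None)
        for ident in [i for i, k in self.lookups.items() if k == key]:
            # With remove_all_lookups=False, occurrenceID lookups are left
            # behind, mimicking the behaviour described in this issue.
            if remove_all_lookups or not ident.startswith("occurrenceID:"):
                del self.lookups[ident]


ids = ["triplet:inst/coll/cat1", "occurrenceID:abc"]

buggy = FragmentStore()
buggy.delete(buggy.upsert(ids, "v1"), remove_all_lookups=False)
buggy.upsert(ids, "v2")  # recrawl hits the stale occurrenceID lookup

fixed = FragmentStore()
fixed.delete(fixed.upsert(ids, "v1"))
fixed.upsert(ids, "v2")  # recrawl sees no lookups and inserts cleanly

print(buggy.stale_hits, fixed.stale_hits)  # prints "1 0"
```

In this model, removing every lookup on delete means a recrawled record never hits the stale path. As noted in the description, the trade-off is that a deleted record cannot be restored under its old key.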