Issue 12980

Generate raw throughput numbers of major processor and persistence components

Reporter: omeyn
Assignee: omeyn
Type: Task
Summary: Generate raw throughput numbers of major processor and persistence components
Priority: Critical
Resolution: Fixed
Status: Closed
Created: 2013-03-11 11:48:48.716
Updated: 2013-12-17 15:17:02.584
Resolved: 2013-06-17 10:29:08.019
        
Description: From Tim:

1) Fragment persisting which could be bound by:
  i) the primary key determination stuff (sync HBase ops)
    - what is the raw throughput of that please?
  ii) the sending of messages (perhaps)
    - what is the raw throughput of that please?
  iii) HBase persisting has to be synchronous since the message sent only has the key, and a client is going to read that immediately (batching will cause bad reads)
    - is that around 1400 rec/s ? (as per my cube writes)
  [Is there XML parsing, in which case CPU might limit?]

2) The frag -> dwc fields which could be bound by:
  i) CPU on XML parsing
  ii) the sending of messages (perhaps)
  iii) SVN shows HBase ops are synchronous (presumably since the message sent only has the key, and a client is going to read that immediately)
    - is that around 1400 rec/s ? (as per my cube writes)

3) The dwc fields -> interpreted which could be bound by:
  i) Nub lookup WS (presume huge bottleneck)
  ii) Geocode lookup WS (presume no bottleneck?)
  iii) SVN shows HBase ops are synchronous (presumably since the message sent only has the key, and a client is going to read that immediately)
    - these could be batched, as the messages for sure contain the new values
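
One way to get the raw numbers asked for above is to time a fixed number of sequential calls to a single component and divide. The sketch below is only an illustration of that approach, not the harness actually used: `measure_throughput` and `fake_sync_write` are hypothetical names, and the sleep is a stand-in for one synchronous HBase write or key lookup.

```python
import time
from typing import Callable


def measure_throughput(op: Callable[[int], None], n_ops: int) -> float:
    """Call op(i) sequentially n_ops times and return operations per second."""
    if n_ops <= 0:
        raise ValueError("n_ops must be positive")
    start = time.perf_counter()
    for i in range(n_ops):
        op(i)
    elapsed = time.perf_counter() - start
    # Guard against a zero-length interval for extremely fast operations.
    return n_ops / elapsed if elapsed > 0 else float("inf")


def fake_sync_write(i: int) -> None:
    # Hypothetical stand-in for one synchronous write (assumption, not real HBase code).
    time.sleep(0.001)


if __name__ == "__main__":
    rate = measure_throughput(fake_sync_write, 200)
    print(f"{rate:.0f} ops/s")
```

Swapping `fake_sync_write` for the real key-generation, message-send, or persist call would give a per-component figure; running the calls sequentially matches the synchronous behaviour described above, so contention between concurrent writers is not captured.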
    


Author: omeyn@gbif.org
Comment: HBase key generation: ~3000 keys/s (if there's no contention - the usual case)
Created: 2013-03-11 16:25:45.977
Updated: 2013-03-11 16:25:45.977


Author: omeyn@gbif.org
Comment: Message sending is no longer a bottleneck since upgrading RabbitMQ, giving it more CPU, and applying Tim's channel-sharing patch.
Created: 2013-03-15 10:26:55.421
Updated: 2013-03-15 10:26:55.421


Author: omeyn@gbif.org
Comment: Given that the processing pieces are together achieving the goal of 1M/hour, generating more numbers isn't necessary right now.
Created: 2013-06-17 10:29:08.057
Updated: 2013-06-17 10:29:08.057
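
For context, the 1M/hour goal can be converted to a per-second rate and compared with the per-component figures mentioned in this issue. This is a rough sketch that assumes "1M/hour" means one million records per hour and that each record costs one operation per component; the 1400 rec/s figure was only an estimate in the description, and `per_second` and `headroom` are illustrative helper names.

```python
def per_second(per_hour: float) -> float:
    """Convert an hourly rate to a per-second rate."""
    return per_hour / 3600.0


def headroom(component_rate: float, target_rate: float) -> float:
    """Return how many times faster a component is than the target rate."""
    return component_rate / target_rate


if __name__ == "__main__":
    target = per_second(1_000_000)  # about 277.8 per second
    print(f"target: {target:.1f}/s")
    # ~3000 keys/s was measured for HBase key generation (no contention).
    print(f"key generation headroom: {headroom(3000, target):.1f}x")
    # ~1400 rec/s was the estimate for synchronous HBase writes.
    print(f"sync write headroom: {headroom(1400, target):.2f}x")
```

Under those assumptions both figures sit well above the target rate, which is consistent with closing the issue without further measurements.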