Issue 12980
Generate raw throughput numbers of major processor and persistence components
Reporter: omeyn
Assignee: omeyn
Type: Task
Summary: Generate raw throughput numbers of major processor and persistence components
Priority: Critical
Resolution: Fixed
Status: Closed
Created: 2013-03-11 11:48:48.716
Updated: 2013-12-17 15:17:02.584
Resolved: 2013-06-17 10:29:08.019
Description: From Tim:
1) Fragment persisting which could be bound by:
i) the primary key determination stuff (sync HBase ops)
- what is the raw throughput of that please?
ii) the sending of messages (perhaps)
- what is the raw throughput of that please?
iii) HBase persisting has to be synchronous since the message sent only has the key, and a client is going to read that immediately (batching will cause bad reads)
- is that around 1400 rec/s? (as per my cube writes)
[Is there XML parsing, in which case CPU might limit?]
2) The frag -> dwc fields which could be bound by:
i) CPU on XML parsing
ii) the sending of messages (perhaps)
iii) SVN shows HBase ops are synchronous (presumably since the message sent only has the key, and a client is going to read that immediately)
- is that around 1400 rec/s? (as per my cube writes)
3) The dwc fields -> interpreted which could be bound by:
i) Nub lookup WS (presume huge bottleneck)
ii) Geocode lookup WS (presume no bottleneck?)
iii) SVN shows HBase ops are synchronous (presumably since the message sent only has the key, and a client is going to read that immediately)
- these could be batched, as the messages for sure contain the new values
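One way to get a raw number for any of the synchronous steps listed above is to time a fixed count of back-to-back calls. The following is only a minimal sketch: `measure_throughput` and `stand_in_op` are hypothetical names, and the stand-in operation does no real HBase or messaging work, so it would need to be replaced with the actual call under test.

```python
import time


def measure_throughput(op, n):
    """Call op() n times back to back and return completed calls per second."""
    start = time.perf_counter()
    for _ in range(n):
        op()
    elapsed = time.perf_counter() - start
    # Guard against a zero reading on very coarse clocks.
    return n / elapsed if elapsed > 0 else float("inf")


def stand_in_op():
    """Hypothetical placeholder for one synchronous operation (e.g. a single put)."""
    sum(range(100))


if __name__ == "__main__":
    rate = measure_throughput(stand_in_op, 10_000)
    print(f"{rate:.0f} ops/s")
```

Because the calls run one at a time in a single thread, the result reflects the no-contention case only.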
Author: omeyn@gbif.org
Comment: HBase key generation: ~3000 keys/s (if there's no contention - the usual case)
Created: 2013-03-11 16:25:45.977
Updated: 2013-03-11 16:25:45.977
Author: omeyn@gbif.org
Comment: Message sending is no longer a bottleneck since upgrading RabbitMQ, giving it more CPU, and applying Tim's channel-sharing patch.
Created: 2013-03-15 10:26:55.421
Updated: 2013-03-15 10:26:55.421
Author: omeyn@gbif.org
Comment: Given that the processing pieces are together achieving the goal of 1M/hour, generating more numbers isn't necessary right now.
Created: 2013-06-17 10:29:08.057
Updated: 2013-06-17 10:29:08.057
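For comparing the figures quoted in this issue, the per-second and per-hour rates can be put on the same scale. This is a small sketch under two assumptions: that "1M/hour" means one million records per hour, and that 1400 rec/s is the estimate raised in the description rather than a confirmed measurement. The helper names are illustrative only.

```python
SECONDS_PER_HOUR = 3600


def per_hour(rate_per_second):
    """Convert a per-second rate to a per-hour rate."""
    return rate_per_second * SECONDS_PER_HOUR


def per_second(rate_per_hour):
    """Convert a per-hour rate to a per-second rate."""
    return rate_per_hour / SECONDS_PER_HOUR


# Goal from the closing comment, assumed to be records per hour.
target_per_second = per_second(1_000_000)

# Estimate from the description (unconfirmed) and the key generation figure.
hbase_writes_per_hour = per_hour(1400)
key_generation_per_hour = per_hour(3000)

print(f"target: {target_per_second:.1f} rec/s")
print(f"1400 rec/s: {hbase_writes_per_hour} rec/hour")
print(f"3000 keys/s: {key_generation_per_hour} keys/hour")
```

Under those assumptions the 1M/hour goal works out to roughly 278 rec/s, well below both the 1400 rec/s estimate and the ~3000 keys/s key generation rate.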