Tuesday, August 19, 2014

HBase write throughput


HBase write throughput as a function of number of column qualifiers

In HBase, every cell value is stored together with its full coordinates (row key, column family, column qualifier and timestamp), as follows:

rowkey:columnfamily:columnqualifier:timestamp:value
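For reference, the HBase reference guide describes the serialized KeyValue layout as a 4-byte key length, a 4-byte value length, the key (2-byte row length, row, 1-byte family length, family, qualifier, 8-byte timestamp, 1-byte key type), and then the value. The minimal sketch below is only a size calculator, not HBase code. It assumes that classic layout and ignores tags, block encoding and compression. The function name `keyvalue_size` is hypothetical. It shows that the row key, family and qualifier are repeated in full for every cell.

```python
# Field sizes of the classic HBase KeyValue layout (assumed: no tags,
# no block encoding, no compression).
KEY_LENGTH_FIELD = 4     # int: length of the key part
VALUE_LENGTH_FIELD = 4   # int: length of the value part
ROW_LENGTH_FIELD = 2     # short: length of the row key
FAMILY_LENGTH_FIELD = 1  # byte: length of the column family
TIMESTAMP_FIELD = 8      # long: cell timestamp
KEY_TYPE_FIELD = 1       # byte: Put, Delete, ...


def keyvalue_size(row: bytes, family: bytes, qualifier: bytes, value: bytes) -> int:
    """Hypothetical helper: bytes one cell occupies in the KeyValue format."""
    key_size = (ROW_LENGTH_FIELD + len(row)
                + FAMILY_LENGTH_FIELD + len(family)
                + len(qualifier)
                + TIMESTAMP_FIELD + KEY_TYPE_FIELD)
    return KEY_LENGTH_FIELD + VALUE_LENGTH_FIELD + key_size + len(value)


if __name__ == "__main__":
    # Same 10-byte payload stored as one cell, then split across two cells:
    # the second layout pays the row/family/qualifier overhead twice.
    one_cell = keyvalue_size(b"row1", b"cf", b"col", b"0123456789")
    two_cells = (keyvalue_size(b"row1", b"cf", b"c1", b"01234")
                 + keyvalue_size(b"row1", b"cf", b"c2", b"56789"))
    print(one_cell, two_cells)
```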

Hypothetically, let us assume the following:

Data payload size                   = 10 kb
rowkey size                         = 64 kb
columnfamily:columnqualifier size   = 60 kb

To write a row with, say, 2 columns, the total number of bytes transferred and written is (ignoring the timestamp for simplicity):

2 * (5 kb + 64 kb + 60 kb) = 258 kb (the 10 kb payload is split between the two columns)

To write the same row with a single column, the total is:

1 * (10 kb + 64 kb + 60 kb) = 134 kb
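The arithmetic above generalizes. With a total payload P split evenly across n columns, a row key of size R and a family:qualifier of size C, the bytes written are n * (P/n + R + C) = P + n * (R + C). The payload stays constant while the key overhead grows linearly with the number of columns. A small sketch of this simplified model follows. Like the example, it ignores timestamps and length fields, and `row_write_size_kb` is a hypothetical name.

```python
def row_write_size_kb(num_columns: int, payload_kb: float,
                      rowkey_kb: float, qualifier_kb: float) -> float:
    """Hypothetical helper for the simplified model above: every column
    repeats the row key and the family:qualifier, while the payload is
    split evenly across the columns. Timestamps are ignored."""
    per_cell_payload = payload_kb / num_columns
    return num_columns * (per_cell_payload + rowkey_kb + qualifier_kb)


if __name__ == "__main__":
    for n in (1, 2, 5, 10):
        size = row_write_size_kb(n, payload_kb=10, rowkey_kb=64, qualifier_kb=60)
        print(f"{n} column(s): {size:g} kb")
```

For 5 and 10 columns, the same model gives 630 kb and 1250 kb for an unchanged 10 kb payload.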

The larger the total size, the more data is transferred across the network and the faster the MemStore fills up, so it has to be flushed more often. Both effects reduce write throughput.
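To make the MemStore argument concrete, here is a rough sketch that counts how many times a single MemStore would fill up while writing a given number of rows. It assumes that all writes go to one region and that a flush happens exactly when hbase.hregion.memstore.flush.size is reached. That setting defaults to 128 MB in recent HBase versions. The sketch also ignores MemStore bookkeeping overhead, global memory limits and multiple regions. The helper `memstore_fills` is hypothetical.

```python
# Assumed default of hbase.hregion.memstore.flush.size (128 MB).
DEFAULT_FLUSH_SIZE = 128 * 1024 * 1024


def memstore_fills(num_rows: int, bytes_per_row: int,
                   flush_size: int = DEFAULT_FLUSH_SIZE) -> int:
    """Hypothetical helper: number of times one MemStore reaches flush_size
    (idealized: one region, no overhead, flush exactly at the threshold)."""
    return (num_rows * bytes_per_row) // flush_size


if __name__ == "__main__":
    rows = 10_000
    print(memstore_fills(rows, 258 * 1024))  # two-column layout from the example
    print(memstore_fills(rows, 134 * 1024))  # one-column layout from the example
```

With the example sizes and 10,000 rows, the two-column layout fills the MemStore 19 times, compared with 10 times for the single-column layout.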

Verifying this behaviour with the HBase load testing tool (LoadTestTool):

Summary

- Rows                          :   10k            10k            10k
- Columns per row (average)     :   2              5              10
- Payload per column (average)  :   512 bytes      200 bytes      100 bytes
- Total payload per row         :   ~1,000 bytes   1,000 bytes    1,000 bytes
- Throughput (overall)          :   405 keys/s     252 keys/s     175 keys/s

Details

We start with 10,000 rows and an average of 2 columns per row, with an average payload of 512 bytes per cell. This is specified by -write 2:512:20, where the last field is the number of writer threads.

$ hbase org.apache.hadoop.hbase.util.LoadTestTool -write 2:512:20 -num_keys 10000
14/08/19 11:47:26 INFO Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
Key range: [0..9999]
Multi-puts: false
Columns per key: 1..4
Data size per column: 256..768
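The ranges printed by the tool follow from the -write argument. Judging by the output of all three runs in this post, the number of columns per key ranges from 1 to twice the average. The data size per column ranges from half to one and a half times the average, in bytes. The hypothetical helper below reproduces that mapping. It mirrors the printed output rather than LoadTestTool's own code, and the default of 20 threads is an assumption.

```python
from typing import NamedTuple


class WriteSpec(NamedTuple):
    min_cols: int
    max_cols: int
    min_data_size: int
    max_data_size: int
    threads: int


def parse_write_spec(spec: str, default_threads: int = 20) -> WriteSpec:
    """Hypothetical helper: turn '<avg_cols>:<avg_data_size>[:<threads>]'
    into the ranges LoadTestTool prints (default_threads is assumed)."""
    parts = spec.split(":")
    avg_cols = int(parts[0])
    avg_size = int(parts[1])
    threads = int(parts[2]) if len(parts) > 2 else default_threads
    return WriteSpec(1, 2 * avg_cols, avg_size // 2, avg_size * 3 // 2, threads)


if __name__ == "__main__":
    for spec in ("2:512:20", "5:200:20", "10:100:20"):
        print(spec, parse_write_spec(spec))
```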

The tool logs progress every 5 seconds. After 20 seconds, the overall write throughput is 405 keys/s:

Starting to write data...
14/08/19 11:47:39 INFO util.MultiThreadedAction: [W:20] Keys=1663, cols=5.6 K, time=00:00:05 Overall: [keys/s= 332, latency=58 ms] Current: [keys/s=332, latency=58 ms], wroteUpTo=-1
14/08/19 11:47:44 INFO util.MultiThreadedAction: [W:20] Keys=3641, cols=12.3 K, time=00:00:10 Overall: [keys/s= 361, latency=54 ms] Current: [keys/s=395, latency=51 ms], wroteUpTo=-1
14/08/19 11:47:49 INFO util.MultiThreadedAction: [W:20] Keys=5769, cols=19.5 K, time=00:00:15 Overall: [keys/s= 382, latency=51 ms] Current: [keys/s=425, latency=46 ms], wroteUpTo=-1
14/08/19 11:47:54 INFO util.MultiThreadedAction: [W:20] Keys=8128, cols=27.6 K, time=00:00:20 Overall: [keys/s= 405, latency=49 ms] Current: [keys/s=471, latency=42 ms], wroteUpTo=-1
Failed to write keys: 0
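The throughput figures in the summary are the Overall keys/s values from the last progress line of each run. The small sketch below extracts the key count, elapsed time, overall rate and overall latency with a regular expression. It assumes the MultiThreadedAction log format shown above, and `parse_progress` is a hypothetical name.

```python
import re
from typing import Optional, Tuple

# Matches the MultiThreadedAction progress lines shown above (assumed format).
PROGRESS_RE = re.compile(
    r"Keys=(?P<keys>\d+).*?time=(?P<time>\d{2}:\d{2}:\d{2})\s+"
    r"Overall: \[keys/s=\s*(?P<overall>\d+), latency=(?P<latency>\d+) ms\]"
)


def parse_progress(line: str) -> Optional[Tuple[int, int, int, int]]:
    """Return (keys, elapsed_seconds, overall_keys_per_s, overall_latency_ms),
    or None if the line is not a progress line."""
    match = PROGRESS_RE.search(line)
    if match is None:
        return None
    h, m, s = (int(x) for x in match.group("time").split(":"))
    return (int(match.group("keys")), h * 3600 + m * 60 + s,
            int(match.group("overall")), int(match.group("latency")))


if __name__ == "__main__":
    sample = ("14/08/19 11:47:54 INFO util.MultiThreadedAction: [W:20] Keys=8128, "
              "cols=27.6 K, time=00:00:20 Overall: [keys/s= 405, latency=49 ms] "
              "Current: [keys/s=471, latency=42 ms], wroteUpTo=-1")
    print(parse_progress(sample))
```

Applied to the final progress line of each run, it yields the 405, 252 and 175 keys/s figures in the summary.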

We repeat the test with 10k rows, an average of 5 columns per row and an average payload of 200 bytes per column:

$ hbase org.apache.hadoop.hbase.util.LoadTestTool -write 5:200:20 -num_keys 10000
14/08/19 14:38:20 INFO Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
Key range: [0..9999]
Multi-puts: false
Columns per key: 1..10
Data size per column: 100..300
.
.
Starting to write data...
14/08/19 14:38:31 INFO util.MultiThreadedAction: [W:20] Keys=901, cols=5.6 K, time=00:00:05 Overall: [keys/s= 180, latency=106 ms] Current: [keys/s=180, latency=106 ms], wroteUpTo=-1
14/08/19 14:38:36 INFO util.MultiThreadedAction: [W:20] Keys=1979, cols=12.4 K, time=00:00:10 Overall: [keys/s= 197, latency=99 ms] Current: [keys/s=215, latency=92 ms], wroteUpTo=-1
14/08/19 14:38:41 INFO util.MultiThreadedAction: [W:20] Keys=3070, cols=19.3 K, time=00:00:15 Overall: [keys/s= 204, latency=96 ms] Current: [keys/s=218, latency=91 ms], wroteUpTo=-1
14/08/19 14:38:46 INFO util.MultiThreadedAction: [W:20] Keys=4367, cols=27.7 K, time=00:00:20 Overall: [keys/s= 218, latency=90 ms] Current: [keys/s=259, latency=77 ms], wroteUpTo=-1
14/08/19 14:38:51 INFO util.MultiThreadedAction: [W:20] Keys=5857, cols=36.9 K, time=00:00:25 Overall: [keys/s= 234, latency=84 ms] Current: [keys/s=298, latency=66 ms], wroteUpTo=-1
14/08/19 14:38:56 INFO util.MultiThreadedAction: [W:20] Keys=7373, cols=46.4 K, time=00:00:30 Overall: [keys/s= 245, latency=80 ms] Current: [keys/s=303, latency=65 ms], wroteUpTo=-1
14/08/19 14:39:01 INFO util.MultiThreadedAction: [W:20] Keys=8843, cols=55.7 K, time=00:00:35 Overall: [keys/s= 252, latency=78 ms] Current: [keys/s=294, latency=67 ms], wroteUpTo=-1
Failed to write keys: 0

As seen above, the overall write throughput drops to 252 keys/s.

Increasing the number of columns further to 10, with an average payload of 100 bytes per column, reduces the write throughput to 175 keys/s:

$ hbase org.apache.hadoop.hbase.util.LoadTestTool -write 10:100:20 -num_keys 10000
14/08/19 14:34:54 INFO Configuration.deprecation: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
Key range: [0..9999]
Multi-puts: false
Columns per key: 1..20
Data size per column: 50..150

Starting to write data...
14/08/19 14:35:07 INFO util.MultiThreadedAction: [W:20] Keys=582, cols=6.4 K, time=00:00:05 Overall: [keys/s= 116, latency=168 ms] Current: [keys/s=116, latency=168 ms], wroteUpTo=-1
14/08/19 14:35:12 INFO util.MultiThreadedAction: [W:20] Keys=1157, cols=13.0 K, time=00:00:10 Overall: [keys/s= 115, latency=171 ms] Current: [keys/s=115, latency=173 ms], wroteUpTo=-1
14/08/19 14:35:17 INFO util.MultiThreadedAction: [W:20] Keys=1884, cols=21.0 K, time=00:00:15 Overall: [keys/s= 125, latency=158 ms] Current: [keys/s=145, latency=137 ms], wroteUpTo=-1
14/08/19 14:35:22 INFO util.MultiThreadedAction: [W:20] Keys=2687, cols=30.0 K, time=00:00:20 Overall: [keys/s= 134, latency=147 ms] Current: [keys/s=160, latency=123 ms], wroteUpTo=-1
14/08/19 14:35:27 INFO util.MultiThreadedAction: [W:20] Keys=3558, cols=39.8 K, time=00:00:25 Overall: [keys/s= 142, latency=139 ms] Current: [keys/s=174, latency=115 ms], wroteUpTo=-1
14/08/19 14:35:32 INFO util.MultiThreadedAction: [W:20] Keys=4513, cols=50.5 K, time=00:00:30 Overall: [keys/s= 150, latency=132 ms] Current: [keys/s=191, latency=104 ms], wroteUpTo=-1
14/08/19 14:35:37 INFO util.MultiThreadedAction: [W:20] Keys=5410, cols=60.5 K, time=00:00:35 Overall: [keys/s= 154, latency=128 ms] Current: [keys/s=179, latency=111 ms], wroteUpTo=-1
14/08/19 14:35:42 INFO util.MultiThreadedAction: [W:20] Keys=6322, cols=70.8 K, time=00:00:40 Overall: [keys/s= 157, latency=126 ms] Current: [keys/s=182, latency=109 ms], wroteUpTo=-1
14/08/19 14:35:47 INFO util.MultiThreadedAction: [W:20] Keys=7280, cols=81.8 K, time=00:00:45 Overall: [keys/s= 161, latency=123 ms] Current: [keys/s=191, latency=104 ms], wroteUpTo=-1
14/08/19 14:35:52 INFO util.MultiThreadedAction: [W:20] Keys=8496, cols=95.6 K, time=00:00:50 Overall: [keys/s= 169, latency=117 ms] Current: [keys/s=243, latency=82 ms], wroteUpTo=-1
14/08/19 14:35:57 INFO util.MultiThreadedAction: [W:20] Keys=9632, cols=108.1 K, time=00:00:55 Overall: [keys/s= 175, latency=113 ms] Current: [keys/s=227, latency=87 ms], wroteUpTo=-1
Failed to write keys: 0
