
ZOOKEEPER-3264: [YCSB-binding] Add a benchmark tool for zookeeper

### 1. Brief Introduction
At first, I wanted to implement this benchmark tool on top of JMH. However, it has the following drawbacks:

- Suppose I want to simulate 1,000,000 requests with 100 concurrent client threads: which JMH parameters/options (mode, fork, threads, measurementIterations, measurementBatchSize) should I set? The mapping is not obvious.

- Even with JMH, it still takes great effort to write a satisfactory script/CLI.

Then I considered implementing the tool from scratch. However, I have to admit that it is difficult to implement a tool as powerful as YCSB.

Finally, YCSB is our choice. Here are some introductions to it:
- [Home Page](https://github.com/brianfrankcooper/YCSB/blob/master/README.md)
- [YCSB Paper](https://www2.cs.duke.edu/courses/fall13/cps296.4/838-CloudPapers/ycsb.pdf), one of the best materials (a great read)

YCSB has the following advantages:

  - It supports a user-specified number of requests and number of concurrent threads
  - It supports various workloads with different read/write/insert/scan request ratios
  - It supports different request distributions (uniform, zipfian, latest)
  - It can output the benchmark results in different report formats (e.g. `hdrhistogram`)
  - It has abundant metrics: throughput, avg/min/max/percentile latency, etc.
  - It has detailed benchmark usage documentation
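As a rough illustration of how these options map onto a single command, the sketch below composes (but does not execute) a YCSB invocation for 1,000,000 requests from 100 client threads. The connect string, the choice of `workloadb`, and the `zipfian` distribution are assumptions for illustration; `operationcount`, `requestdistribution`, and `measurementtype` are YCSB core properties.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print a YCSB command line instead of running it.
set -euo pipefail

# Assumed values for illustration; adjust them to your own ensemble.
CONNECT_STRING="127.0.0.1:2181/benchmark"
OPERATION_COUNT=1000000   # total requests across all client threads
THREADS=100               # concurrent client threads

build_ycsb_cmd() {
  # printf repeats the format for each argument, so this prints the
  # arguments separated by single spaces.
  printf '%s ' ./bin/ycsb run zookeeper -s \
    -threads "$THREADS" \
    -P workloads/workloadb \
    -p "zookeeper.connectString=$CONNECT_STRING" \
    -p "operationcount=$OPERATION_COUNT" \
    -p requestdistribution=zipfian \
    -p measurementtype=hdrhistogram
}

build_ycsb_cmd
echo
```

To actually run the benchmark, paste the printed command into a shell inside the YCSB checkout.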

### 2. Benchmark Report
Here is a [WIP Benchmark Report](https://docs.google.com/document/d/1_kGwnx5y7SYLSLtEPbkGJkrWbnsSftwNTvi8RCC9L7Y/edit?usp=sharing),
tested on standard cloud machines with release-3.6.1 (still a very messy draft, with lots of work left to do). It shows two conspicuous performance issues:

#### 2.1. Tail latency
The tail latency is sometimes as high as 200 ms, and we can observe a long, fat tail; see [figure-1](https://issues.apache.org/jira/secure/attachment/13017005/bm-read-histogram.png) and [figure-2](https://issues.apache.org/jira/secure/attachment/13017004/bm-update-histogram.png). We want a smoother latency curve. The potential causes:

- Java GC
- The commit algorithm that schedules read/write requests in the `CommitProcessor` (the most likely place for a significant performance improvement)
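To keep an eye on the tail while investigating these causes, the percentile lines can be pulled out of a YCSB run summary. The sketch below assumes summary lines of the form `[OP], Metric, value`; the sample input uses made-up placeholder numbers, not measurements, and `extract_percentiles` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: list the percentile-latency lines from YCSB summary output.
set -euo pipefail

# Reads YCSB summary text on stdin and prints "<operation> <metric> <value>".
extract_percentiles() {
  awk -F', *' '/PercentileLatency/ {
    op = substr($1, 2, length($1) - 2)   # "[READ]" -> "READ"
    print op, $2, $3
  }'
}

# Placeholder input with made-up numbers (NOT real benchmark results).
sample='[OVERALL], Throughput(ops/sec), 1000.0
[READ], 95thPercentileLatency(us), 111
[READ], 99thPercentileLatency(us), 222
[UPDATE], 99thPercentileLatency(us), 333'

printf '%s\n' "$sample" | extract_percentiles
```

In practice the function would read a saved run output instead of the sample, for example `extract_percentiles < outputRun.txt`, where `outputRun.txt` is a hypothetical file name.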

#### 2.2. Connection numbers
When the number of connections is >= 1000, throughput and latency degrade severely; see [figure-3](https://issues.apache.org/jira/secure/attachment/13017006/bm-clients.png).
Using TCP connections guarantees message/packet order, but the number of TCP connections
that one Linux VM can hold is limited. Perhaps we can improve this further by building on Netty.
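One way to reproduce this kind of degradation curve is to sweep the YCSB `-threads` value. The sketch below only prints the commands (a dry run). It assumes a local server at `127.0.0.1:2181` and that each YCSB client thread opens its own ZooKeeper connection; the `run-<N>.txt` output file names are hypothetical.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print one YCSB command per client-thread count.
set -euo pipefail

CONNECT_STRING="127.0.0.1:2181/benchmark"   # assumed local server

sweep_cmds() {
  local threads
  for threads in "$@"; do
    echo "./bin/ycsb run zookeeper -threads $threads -P workloads/workloadb -p zookeeper.connectString=$CONNECT_STRING > run-$threads.txt"
  done
}

sweep_cmds 1 10 100 1000
```

For the larger thread counts, `maxClientCnxns` in `zoo.cfg` may need to be raised so the server accepts that many connections from one host.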

### 3. Related Work
3.1 [YCSB-PR-1327](https://github.com/brianfrankcooper/YCSB/pull/1327)

3.2 [ZooKeeper Watch Benchmark Tool](https://github.com/apache/zookeeper/pull/1406)

Author: maoling <maoling199210191@sina.com>

Reviewers: Mate Szalay-Beko <symat@apache.org>, Damien Diederen <ddiederen@apache.org>

Closes #1558 from maoling/ZOOKEEPER-3264
maoling, 4 years ago
Commit: 4e82a8be88
1 changed file with 97 additions and 0 deletions

+ 97 - 0
zookeeper-docs/src/main/resources/markdown/zookeeperTools.md

@@ -24,6 +24,10 @@ limitations under the License.
     * [zkTxnLogToolkit.sh](#zkTxnLogToolkit)
     * [zkSnapShotToolkit.sh](#zkSnapShotToolkit)
     * [zkSnapshotComparer.sh](#zkSnapshotComparer)
+
+* [Benchmark](#Benchmark)
+    * [YCSB](#YCSB)
+    * [zk-smoketest](#zk-smoketest)
     
 * [Testing](#Testing)
     * [Jepsen Test](#jepsen-test)
@@ -425,6 +429,99 @@ All layers compared.
 
 Or use `^c` to exit interactive mode anytime.
 
+
+<a name="Benchmark"></a>
+
+## Benchmark
+
+<a name="YCSB"></a>
+
+### YCSB
+
+#### Quick Start
+
+This section describes how to run YCSB on ZooKeeper.
+
+#### 1. Start ZooKeeper Server(s)
+
+#### 2. Install Java and Maven
+
+#### 3. Set Up YCSB
+
+Git clone YCSB and compile:
+
+    git clone http://github.com/brianfrankcooper/YCSB.git
+    # See the YCSB landing page for more details on downloading YCSB (https://github.com/brianfrankcooper/YCSB#getting-started).
+    cd YCSB
+    mvn -pl site.ycsb:zookeeper-binding -am clean package -DskipTests
+
+#### 4. Provide ZooKeeper Connection Parameters
+
+Set connectString, sessionTimeout, watchFlag in the workload you plan to run.
+
+- `zookeeper.connectString`
+- `zookeeper.sessionTimeout`
+- `zookeeper.watchFlag`
+  * Enables ZooKeeper watches. Valid values: true or false; the default is false.
+  * This parameter does not measure watch performance itself; it is for testing how enabling watches affects read/write requests.
+
+      ```bash
+      ./bin/ycsb run zookeeper -s -P workloads/workloadb -p zookeeper.connectString=127.0.0.1:2181/benchmark -p zookeeper.watchFlag=true
+      ```
+
+Alternatively, you can set the configs on the command line, e.g.:
+
+    # Create a /benchmark namespace first so the workspace is easy to clean up after the test,
+    # e.g. in the CLI: create /benchmark
+    ./bin/ycsb run zookeeper -s -P workloads/workloadb -p zookeeper.connectString=127.0.0.1:2181/benchmark -p zookeeper.sessionTimeout=30000
+
+#### 5. Load data and run tests
+
+Load the data:
+
+    # -p recordcount: the number of records/paths you want to insert
+    ./bin/ycsb load zookeeper -s -P workloads/workloadb -p zookeeper.connectString=127.0.0.1:2181/benchmark -p recordcount=10000 > outputLoad.txt
+
+Run the workload test:
+
+    # YCSB workloadb (read-heavy) is the workload that best matches real-world ZooKeeper usage.
+
+    # -p fieldlength: test how the length of the value/data content affects performance
+    ./bin/ycsb run zookeeper -s -P workloads/workloadb -p zookeeper.connectString=127.0.0.1:2181/benchmark -p fieldlength=1000
+
+    # -p fieldcount
+    ./bin/ycsb run zookeeper -s -P workloads/workloadb -p zookeeper.connectString=127.0.0.1:2181/benchmark -p fieldcount=20
+
+    # -p hdrhistogram.percentiles: show the hdrhistogram benchmark result
+    ./bin/ycsb run zookeeper -threads 1 -P workloads/workloadb -p zookeeper.connectString=127.0.0.1:2181/benchmark -p hdrhistogram.percentiles=10,25,50,75,90,95,99,99.9 -p histogram.buckets=500
+
+    # -threads: multi-client test; increase **maxClientCnxns** in zoo.cfg to handle more connections.
+    ./bin/ycsb run zookeeper -threads 10 -P workloads/workloadb -p zookeeper.connectString=127.0.0.1:2181/benchmark
+
+    # show the timeseries benchmark result
+    ./bin/ycsb run zookeeper -threads 1 -P workloads/workloadb -p zookeeper.connectString=127.0.0.1:2181/benchmark -p measurementtype=timeseries -p timeseries.granularity=50
+
+    # cluster test
+    ./bin/ycsb run zookeeper -P workloads/workloadb -p zookeeper.connectString=192.168.10.43:2181,192.168.10.45:2181,192.168.10.27:2181/benchmark
+
+    # test the leader's read/write performance by setting zookeeper.connectString to the leader's address (192.168.10.43:2181)
+    ./bin/ycsb run zookeeper -P workloads/workloadb -p zookeeper.connectString=192.168.10.43:2181/benchmark
+
+    # test large znodes (by default, jute.maxbuffer is 1048575 bytes, just under 1 MB). Note: jute.maxbuffer must be set to the same value on all the ZooKeeper servers.
+    ./bin/ycsb run zookeeper -jvm-args="-Djute.maxbuffer=4194304" -s -P workloads/workloadc -p zookeeper.connectString=127.0.0.1:2181/benchmark
+
+    # Clean up the workspace after finishing the benchmark,
+    # e.g. in the CLI: deleteall /benchmark
+
+
+<a name="zk-smoketest"></a>
+
+### zk-smoketest
+
+**zk-smoketest** provides a simple smoketest client for a ZooKeeper ensemble. It is useful for verifying new, updated,
+or existing installations. More details are [here](https://github.com/phunt/zk-smoketest).
+
+
 <a name="Testing"></a>
 
 ## Testing