
HDFS-14410. Make Dynamometer documentation properly compile onto the Hadoop site. Contributed by Erik Krogen.

Yiqun Lin, 5 years ago
Parent
Commit
5043840b1d

+ 1 - 0
hadoop-project/src/site/site.xml

@@ -215,6 +215,7 @@
       <item name="Resource Estimator Service" href="hadoop-resourceestimator/ResourceEstimator.html"/>
       <item name="Scheduler Load Simulator" href="hadoop-sls/SchedulerLoadSimulator.html"/>
       <item name="Hadoop Benchmarking" href="hadoop-project-dist/hadoop-common/Benchmarking.html"/>
+      <item name="Dynamometer" href="hadoop-dynamometer/Dynamometer.html"/>
     </menu>
 
     <menu name="Reference" inherit="top">

+ 299 - 0
hadoop-tools/hadoop-dynamometer/src/site/markdown/Dynamometer.md

@@ -0,0 +1,299 @@
+<!---
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# Dynamometer Guide
+
+<!-- MACRO{toc|fromDepth=0|toDepth=3} -->
+
+## Overview
+
+Dynamometer is a tool for performance testing Hadoop's HDFS NameNode. The intent is to provide a
+real-world environment by initializing the NameNode against a production file system image and replaying
+a production workload collected via e.g. the NameNode's audit logs. This allows for replaying a workload
+which is not only similar in character to that experienced in production, but actually identical to it.
+
+Dynamometer will launch a YARN application which starts a single NameNode and a configurable number of
+DataNodes, simulating an entire HDFS cluster as a single application. There is an additional `workload`
+job run as a MapReduce job which accepts audit logs as input and uses the information contained within to
+submit matching requests to the NameNode, inducing load on the service.
+
+Dynamometer can execute this same workload against different Hadoop versions or with different
+configurations, allowing for the testing of configuration tweaks and code changes at scale without the
+necessity of deploying to a real large-scale cluster.
+
+Throughout this documentation, we will use "Dyno-HDFS", "Dyno-NN", and "Dyno-DN" to refer to the HDFS
+cluster, NameNode, and DataNodes (respectively) which are started _inside of_ a Dynamometer application.
+Terms like HDFS, YARN, and NameNode used without qualification refer to the existing infrastructure on
+top of which Dynamometer is run.
+
+For more details on how Dynamometer works, as opposed to how to use it, see the Architecture section
+at the end of this page.
+
+## Requirements
+
+Dynamometer is based around YARN applications, so an existing YARN cluster will be required for execution.
+It also requires an accompanying HDFS instance to store some temporary files for communication.
+
+## Building
+
+Dynamometer consists of three main components, each one in its own module:
+
+* Infrastructure (`dynamometer-infra`): This is the YARN application which starts a Dyno-HDFS cluster.
+* Workload (`dynamometer-workload`): This is the MapReduce job which replays audit logs.
+* Block Generator (`dynamometer-blockgen`): This is a MapReduce job used to generate input files for each Dyno-DN; its
+  execution is a prerequisite step to running the infrastructure application.
+
+The compiled version of all of these components will be included in a standard Hadoop distribution.
+You can find them in the packaged distribution within `share/hadoop/tools/dynamometer`.
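+
+For example, after unpacking a release, the relevant pieces sit roughly as follows (an illustrative
+listing, based on the module layout described in the Setup Steps section below):
+```
+$ ls share/hadoop/tools/dynamometer
+dynamometer-blockgen  dynamometer-infra  dynamometer-workload
+```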
+
+## Setup Steps
+
+Before launching a Dynamometer application, there are a number of setup steps that must be completed,
+instructing Dynamometer what configurations to use, what version to use, what fsimage to use when
+loading, etc. These steps can be performed a single time to put everything in place, and then many
+Dynamometer executions can be performed against them with minor tweaks to measure variations.
+
+Scripts discussed below can be found in the `share/hadoop/tools/dynamometer/dynamometer-{infra,workload,blockgen}/bin`
+directories of the distribution. The corresponding Java JAR files can be found in the `share/hadoop/tools/lib/` directory.
+References to bin files below assume that the current working directory is `share/hadoop/tools/dynamometer`.
+
+### Step 1: Preparing Requisite Files
+
+A number of files must be prepared in advance of starting your first Dyno-HDFS cluster; Steps 2
+through 6 below describe how to put them in place.
+
+### Step 2: Prepare FsImage Files
+
+Collect an fsimage and related files from your NameNode. This will include the `fsimage_TXID` file
+which the NameNode creates as part of checkpointing, the `fsimage_TXID.md5` containing the md5 hash
+of the image, the `VERSION` file containing some metadata, and the `fsimage_TXID.xml` file which can
+be generated from the fsimage using the offline image viewer:
+```
+hdfs oiv -i fsimage_TXID -o fsimage_TXID.xml -p XML
+```
+It is recommended that you collect these files from your Secondary/Standby NameNode, if you have one,
+to avoid placing additional load on your Active NameNode.
+
+All of these files must be placed somewhere on HDFS where the various jobs will be able to access them.
+They should all be in the same folder, e.g. `hdfs:///dyno/fsimage`.
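+
+For illustration, a manual upload of these files might look like the following sketch, assuming an
+fsimage with (hypothetical) transaction ID 0001 has been fetched to the local working directory:
+```
+# Stage the fsimage, its md5, the OIV XML output, and the VERSION file together
+hdfs dfs -mkdir -p hdfs:///dyno/fsimage
+hdfs dfs -put fsimage_0001 fsimage_0001.md5 fsimage_0001.xml VERSION hdfs:///dyno/fsimage/
+```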
+
+All these steps can be automated with the `upload-fsimage.sh` script, e.g.:
+```
+./dynamometer-infra/bin/upload-fsimage.sh 0001 hdfs:///dyno/fsimage
+```
+Here, `0001` is the transaction ID of the desired fsimage. See the script's usage info for more detail.
+
+### Step 3: Prepare a Hadoop Binary
+
+Collect the Hadoop distribution tarball that will be used to start the Dyno-NN and -DNs. For example, if
+testing against Hadoop 3.0.2, use
+[hadoop-3.0.2.tar.gz](http://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-3.0.2/hadoop-3.0.2.tar.gz).
+This distribution contains several components unnecessary for Dynamometer (e.g. YARN), so to reduce
+its size, you can optionally use the `create-slim-hadoop-tar.sh` script:
+```
+./dynamometer-infra/bin/create-slim-hadoop-tar.sh hadoop-VERSION.tar.gz
+```
+The Hadoop tarball can be present on HDFS, or local to the machine from which the client will be run. Its
+path will be supplied to the client via the `-hadoop_binary_path` argument.
+
+Alternatively, if you use the `-hadoop_version` argument, you can simply specify which version you would
+like to run against (e.g. '3.0.2') and the client will attempt to download it automatically from an
+Apache mirror. See the usage information of the client for more details.
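+
+For example, the following sketch (with all other required arguments elided) would ask the client to
+fetch the distribution itself rather than being handed a tarball:
+```
+# -hadoop_version replaces -hadoop_binary_path; the client downloads from an Apache mirror
+./dynamometer-infra/bin/start-dynamometer-cluster.sh -hadoop_version 3.0.2 ...
+```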
+
+### Step 4: Prepare Configurations
+
+Prepare a configuration directory. You will need to specify a directory with the standard Hadoop
+configuration layout, i.e. one containing `etc/hadoop/*-site.xml`. This determines the configuration
+with which the Dyno-NN and -DNs will be launched. Configurations that must be modified for
+Dynamometer to work properly (e.g. `fs.defaultFS` or `dfs.namenode.name.dir`) will be overridden
+at execution time. This can be a plain directory if it is available locally; otherwise it can be an
+archive file on local or remote (HDFS) storage.
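+
+As a sketch, a minimal configuration directory (the file names beyond the required `*-site.xml` are
+illustrative) could be assembled, and optionally archived, like so:
+```
+# Dynamometer expects the standard etc/hadoop layout inside the directory
+mkdir -p my-hadoop-conf/etc/hadoop
+cp core-site.xml hdfs-site.xml log4j.properties my-hadoop-conf/etc/hadoop/
+# Optionally package it as an archive to place on local or HDFS storage
+tar czf my-hadoop-conf.tar.gz my-hadoop-conf
+```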
+
+### Step 5: Execute the Block Generation Job
+
+This will use the `fsimage_TXID.xml` file to generate the list of blocks that each Dyno-DN should
+advertise to the Dyno-NN. It runs as a MapReduce job.
+```
+./dynamometer-blockgen/bin/generate-block-lists.sh
+    -fsimage_input_path hdfs:///dyno/fsimage/fsimage_TXID.xml
+    -block_image_output_dir hdfs:///dyno/blocks
+    -num_reducers R
+    -num_datanodes D
+```
+In this example, the XML file uploaded above is used to generate block listings into `hdfs:///dyno/blocks`.
+`R` reducers are used for the job, and `D` block listings are generated; this determines how many
+Dyno-DNs are started in the Dyno-HDFS cluster.
+
+### Step 6: Prepare Audit Traces (Optional)
+
+This step is only necessary if you intend to use the audit trace replay capabilities of Dynamometer; if you
+just intend to start a Dyno-HDFS cluster you can skip to the next section.
+
+The audit trace replay accepts one input file per mapper, and currently supports two input formats, configurable
+via the `auditreplay.command-parser.class` configuration. One mapper will automatically be created for every
+audit log file within the audit log directory specified at launch time.
+
+The default is a direct format,
+`org.apache.hadoop.tools.dynamometer.workloadgenerator.audit.AuditLogDirectParser`. This accepts files in the
+format produced by a standard-configuration audit logger, e.g. lines like:
+```
+1970-01-01 00:00:42,000 INFO FSNamesystem.audit: allowed=true	ugi=hdfs	ip=/127.0.0.1	cmd=open	src=/tmp/foo	dst=null	perm=null	proto=rpc
+```
+When using this format you must also specify `auditreplay.log-start-time.ms`, which should be (in milliseconds since
+the Unix epoch) the start time of the audit traces. This is needed for all mappers to agree on a single start time. For
+example, if the above line was the first audit event, you would specify `auditreplay.log-start-time.ms=42000`. Within a
+file, the audit logs must be in order of ascending timestamp.
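+
+As an illustration, the start time for a trace beginning with the line above could be derived with
+standard shell tools (a sketch assuming GNU date and UTC timestamps; adjust for your log timezone):
+```
+# "1970-01-01 00:00:42,000" -> seconds since the epoch, then append the millis
+head -n 1 audit.log | awk '{print $1" "$2}' | sed 's/,[0-9]*$//' | \
+  xargs -I{} date -u -d "{}" +%s
+# => 42, so set auditreplay.log-start-time.ms=42000
+```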
+
+The other supported format is `org.apache.hadoop.tools.dynamometer.workloadgenerator.audit.AuditLogHiveTableParser`. This accepts
+files in the format produced by a Hive query with output fields, in order:
+
+* `relativeTimestamp`: event time offset, in milliseconds, from the start of the trace
+* `ugi`: user information of the submitting user
+* `command`: name of the command, e.g. 'open'
+* `source`: source path
+* `dest`: destination path
+* `sourceIP`: source IP of the event
+
+Assuming your audit logs are available in Hive, this can be produced via a Hive query looking like:
+```sql
+INSERT OVERWRITE DIRECTORY '${outputPath}'
+SELECT (timestamp - ${startTimestamp}) AS relativeTimestamp, ugi, command, source, dest, sourceIP
+FROM ${auditLogTableLocation}
+WHERE timestamp >= ${startTimestamp} AND timestamp < ${endTimestamp}
+DISTRIBUTE BY sourceIP
+SORT BY relativeTimestamp ASC;
+```
+
+#### Partitioning the Audit Logs
+
+You may notice that in the Hive query shown above, there is a `DISTRIBUTE BY sourceIP` clause which indicates
+that the output files should be partitioned by the source IP of the caller. This is done to try to maintain
+closer ordering of requests which originated from a single client. Dynamometer does not guarantee strict
+ordering of operations even within a partition, but ordering will typically be maintained more closely within
+a partition than across partitions.
+
+Whether you use Hive or raw audit logs, it will be necessary to partition the audit logs to match the number of
+simultaneous clients you require for your workload replay. Using the source IP as a partition key is one
+approach with the potential advantages discussed above, but any partition scheme should work reasonably well.
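+
+For raw audit logs, one minimal way to partition by source IP is sketched below (assuming the default
+audit log field layout shown earlier, with a tab-separated `ip=/x.x.x.x` field):
+```
+# Split audit.log into one file per caller IP; each file becomes one replay mapper's input
+awk -F'\t' '{
+  for (i = 1; i <= NF; i++) {
+    if ($i ~ /^ip=/) { ip = substr($i, 4); gsub("/", "", ip) }
+  }
+  print > ("audit_part_" ip ".log")
+}' audit.log
+```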
+
+## Running Dynamometer
+
+After the setup steps above have been completed, you're ready to start up a Dyno-HDFS cluster and replay
+some workload against it!
+
+The client which launches the Dyno-HDFS YARN application can optionally launch the workload replay
+job once the Dyno-HDFS cluster has fully started. This makes each replay into a single execution of the client,
+enabling easy testing of various configurations. You can also launch the two separately to have more control.
+Similarly, it is possible to launch Dyno-DNs for an external NameNode which is not controlled by Dynamometer/YARN.
+This can be useful for testing NameNode configurations which are not yet supported (e.g. HA NameNodes). You can do
+this by passing the `-namenode_servicerpc_addr` argument to the infrastructure application with a value that points
+to an external NameNode's service RPC address.
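+
+As a sketch, pointing Dyno-DNs at an externally-managed NameNode might look like the following (the
+host and port are illustrative, and other arguments may be required; check `-help` for specifics):
+```
+./dynamometer-infra/bin/start-dynamometer-cluster.sh
+    -hadoop_binary_path hadoop-3.0.2.tar.gz
+    -conf_path my-hadoop-conf
+    -block_list_path hdfs:///dyno/blocks
+    -namenode_servicerpc_addr external-nn-host:9022
+```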
+
+### Manual Workload Launch
+
+First launch the infrastructure application to begin the startup of the internal HDFS cluster, e.g.:
+```
+./dynamometer-infra/bin/start-dynamometer-cluster.sh
+    -hadoop_binary_path hadoop-3.0.2.tar.gz
+    -conf_path my-hadoop-conf
+    -fs_image_dir hdfs:///dyno/fsimage
+    -block_list_path hdfs:///dyno/blocks
+```
+This demonstrates the required arguments. You can run this with the `-help` flag to see further usage information.
+
+The client will track the Dyno-NN's startup progress and how many Dyno-DNs it considers live. It will notify
+via logging when the Dyno-NN has exited safemode and is ready for use.
+
+At this point, a workload job (map-only MapReduce job) can be launched, e.g.:
+```
+./dynamometer-workload/bin/start-workload.sh
+    -Dauditreplay.input-path=hdfs:///dyno/audit_logs/
+    -Dauditreplay.output-path=hdfs:///dyno/results/
+    -Dauditreplay.num-threads=50
+    -nn_uri hdfs://namenode_address:port/
+    -start_time_offset 5m
+    -mapper_class_name AuditReplayMapper
+```
+The type of workload generation is configurable; AuditReplayMapper replays an audit log trace as discussed previously.
+The AuditReplayMapper is configured via job configuration properties; `auditreplay.input-path`, `auditreplay.output-path` and
+`auditreplay.num-threads` are required to specify the input path for audit log files, the output path for the results,
+and the number of threads per map task. A number of map tasks equal to the number of files in `input-path` will be
+launched; each task will read in one of these input files and use `num-threads` threads to replay the events contained
+within that file. A best effort is made to faithfully replay the audit log events at the same pace at which they
+originally occurred (optionally, this can be adjusted by specifying `auditreplay.rate-factor` which is a multiplicative
+factor towards the rate of replay, e.g. use 2.0 to replay the events at twice the original speed).
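+
+For example, to replay the trace at twice its original pace, the invocation above could additionally
+pass (illustrative value):
+```
+    -Dauditreplay.rate-factor=2.0
+```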
+
+### Integrated Workload Launch
+
+To have the infrastructure application client launch the workload automatically, parameters for the workload job
+are passed to the infrastructure script. Only the AuditReplayMapper is supported in this fashion at this time. To
+launch an integrated application with the same parameters as were used above, the following can be used:
+```
+./dynamometer-infra/bin/start-dynamometer-cluster.sh
+    -hadoop_binary_path hadoop-3.0.2.tar.gz
+    -conf_path my-hadoop-conf
+    -fs_image_dir hdfs:///dyno/fsimage
+    -block_list_path hdfs:///dyno/blocks
+    -workload_replay_enable
+    -workload_input_path hdfs:///dyno/audit_logs/
+    -workload_output_path hdfs:///dyno/results/
+    -workload_threads_per_mapper 50
+    -workload_start_delay 5m
+```
+When run in this way, the client will automatically handle tearing down the Dyno-HDFS cluster once the
+workload has completed. To see the full list of supported parameters, run this with the `-help` flag.
+
+## Architecture
+
+Dynamometer is implemented as an application on top of YARN. There are three main actors in a Dynamometer application:
+
+* Infrastructure is the simulated HDFS cluster.
+* Workload simulates HDFS clients to generate load on the simulated NameNode.
+* The driver coordinates the two other components.
+
+The logic encapsulated in the driver enables a user to perform a full test execution of Dynamometer with a single command,
+making it possible to do things like sweeping over different parameters to find optimal configurations.
+
+![Dynamometer Infrastructure Application Architecture](./images/dynamometer-architecture-infra.png)
+
+The infrastructure application is written as a native YARN application in which a single NameNode and numerous DataNodes
+are launched and wired together to create a fully simulated HDFS cluster. For Dynamometer to provide an extremely realistic scenario,
+it is necessary to have a cluster which contains, from the NameNode’s perspective, the same information as a production cluster.
+This is why the setup steps described above involve first collecting the FsImage file from a production NameNode and placing it onto
+the host HDFS cluster. To avoid having to copy an entire cluster’s worth of blocks, Dynamometer leverages the fact that the actual
+data stored in blocks is irrelevant to the NameNode, which is only aware of the block metadata. Dynamometer's blockgen job first
+uses the Offline Image Viewer to turn the FsImage into XML, then parses this to extract the metadata for each block, then partitions this
+information before placing it on HDFS for the simulated DataNodes to consume. `SimulatedFSDataset` is used to bypass the DataNode storage
+layer and store only the block metadata, loaded from the information extracted in the previous step. This scheme allows Dynamometer to
+pack many simulated DataNodes onto each physical node, as the size of the metadata is many orders of magnitude smaller than the data itself.
+
+To create a stress test that matches a production environment, Dynamometer needs a way to collect the information about the production workload.
+For this the HDFS audit log is used, which contains a faithful record of all client-facing operations against the NameNode. By replaying this
+audit log to recreate the client load, and running simulated DataNodes to recreate the cluster management load, Dynamometer is able to provide
+a realistic simulation of the conditions of a production NameNode.
+
+![Dynamometer Replay Architecture](./images/dynamometer-architecture-replay.png)
+
+A heavily-loaded NameNode can service tens of thousands of operations per second; to induce such a load, Dynamometer needs numerous clients to submit
+requests. In an effort to ensure that each request has the same effect and performance implications as its original submission, Dynamometer
+attempts to make related requests (for example, a directory creation followed by a listing of that directory) in such a way as to preserve their original
+ordering. It is for this reason that audit log files are suggested to be partitioned by source IP address, using the assumption that requests which
+originated from the same host have more tightly coupled causal relationships than those which originated from different hosts. In the interest of
+simplicity, the stress testing job is written as a map-only MapReduce job, in which each mapper consumes a partitioned audit log file and replays
+the commands contained within against the simulated NameNode. During execution, statistics are collected
+about the replay, such as latency for different types of requests.
+
+## External Resources
+
+To see more information on Dynamometer, you can see the
+[blog post announcing its initial release](https://engineering.linkedin.com/blog/2018/02/dynamometer--scale-testing-hdfs-on-minimal-hardware-with-maximum)
+or [this presentation](https://www.slideshare.net/xkrogen/hadoop-meetup-jan-2019-dynamometer-and-a-case-study-in-namenode-gc).

+ 30 - 0
hadoop-tools/hadoop-dynamometer/src/site/resources/css/site.css

@@ -0,0 +1,30 @@
+/*
+* Licensed to the Apache Software Foundation (ASF) under one or more
+* contributor license agreements.  See the NOTICE file distributed with
+* this work for additional information regarding copyright ownership.
+* The ASF licenses this file to You under the Apache License, Version 2.0
+* (the "License"); you may not use this file except in compliance with
+* the License.  You may obtain a copy of the License at
+*
+*     http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing, software
+* distributed under the License is distributed on an "AS IS" BASIS,
+* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+* See the License for the specific language governing permissions and
+* limitations under the License.
+*/
+#banner {
+  height: 93px;
+  background: none;
+}
+
+#bannerLeft img {
+  margin-left: 30px;
+  margin-top: 10px;
+}
+
+#bannerRight img {
+  margin: 17px;
+}
+

Binary
hadoop-tools/hadoop-dynamometer/src/site/resources/images/dynamometer-architecture-infra.png


Binary
hadoop-tools/hadoop-dynamometer/src/site/resources/images/dynamometer-architecture-replay.png