HDFS-8284. Update documentation about how to use HTrace with HDFS (Masatake Iwasaki via Colin P. McCabe)

(cherry picked from commit 8f7c2364d7254a1d987b095ba442bf20727796f8)
Colin Patrick Mccabe 10 years ago
commit d83ae68bb0
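
For readers migrating an existing setup, the property renames in this commit can be sketched as a single `hdfs-site.xml` fragment. This is an illustrative sketch, not part of the commit: it only combines property names that appear in the diff below, and the output path and sampling fraction are example values.

```xml
<configuration>
  <!-- Server side (NameNode and DataNode): takes the place of
       hadoop.htrace.spanreceiver.classes, which is removed from core-default.xml -->
  <property>
    <name>dfs.htrace.spanreceiver.classes</name>
    <value>LocalFileSpanReceiver</value>
  </property>
  <property>
    <name>dfs.htrace.local-file-span-receiver.path</name>
    <value>/var/log/hadoop/htrace.out</value>
  </property>

  <!-- Client side: only needed when the client program is not already
       instrumented with the HTrace API -->
  <property>
    <name>dfs.client.htrace.spanreceiver.classes</name>
    <value>LocalFileSpanReceiver</value>
  </property>
  <property>
    <name>dfs.client.htrace.sampler</name>
    <value>ProbabilitySampler</value>
  </property>
  <property>
    <name>dfs.client.htrace.sampler.fraction</name>
    <value>0.5</value>
  </property>
</configuration>
```

The server and client keys are kept as separate groups because the updated documentation treats them as independent settings; whether both are needed depends on whether the client is already instrumented.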

+ 0 - 12
hadoop-common-project/hadoop-common/src/main/resources/core-default.xml

@@ -1758,18 +1758,6 @@ for ldap providers in the same way as above does.
   </description>
 </property>
 
-<property>
-  <name>hadoop.htrace.spanreceiver.classes</name>
-  <value></value>
-  <description>
-    A comma separated list of the fully-qualified class name of classes 
-    implementing SpanReceiver. The tracing system works by collecting 
-    information in structs called 'Spans'. It is up to you to choose 
-    how you want to receive this information by implementing the 
-    SpanReceiver interface.
-  </description>
-</property>
-
  <property>
   <name>ipc.server.max.connections</name>
   <value>0</value>

+ 66 - 62
hadoop-common-project/hadoop-common/src/site/markdown/Tracing.md

@@ -18,13 +18,13 @@ Enabling Dapper-like Tracing in Hadoop
 * [Enabling Dapper-like Tracing in Hadoop](#Enabling_Dapper-like_Tracing_in_Hadoop)
     * [Dapper-like Tracing in Hadoop](#Dapper-like_Tracing_in_Hadoop)
         * [HTrace](#HTrace)
-        * [Samplers](#Samplers)
         * [SpanReceivers](#SpanReceivers)
-        * [Setting up ZipkinSpanReceiver](#Setting_up_ZipkinSpanReceiver)
         * [Dynamic update of tracing configuration](#Dynamic_update_of_tracing_configuration)
         * [Starting tracing spans by HTrace API](#Starting_tracing_spans_by_HTrace_API)
         * [Sample code for tracing](#Sample_code_for_tracing)
-  
+        * [Starting tracing spans by configuration for HDFS client](#Starting_tracing_spans_by_configuration_for_HDFS_client)
+
+
 Dapper-like Tracing in Hadoop
 -----------------------------
 
@@ -32,83 +32,51 @@ Dapper-like Tracing in Hadoop
 
 [HDFS-5274](https://issues.apache.org/jira/browse/HDFS-5274) added support for tracing requests through HDFS,
 using the open source tracing library,
-[Apache HTrace](https://git-wip-us.apache.org/repos/asf/incubator-htrace.git). 
+[Apache HTrace](http://htrace.incubator.apache.org/).
 Setting up tracing is quite simple, however it requires some very minor changes to your client code.
 
-### Samplers
-
-Configure the samplers in `core-site.xml` property: `hadoop.htrace.sampler`.
-The value can be NeverSampler, AlwaysSampler or ProbabilitySampler.
-NeverSampler: HTrace is OFF for all spans;
-AlwaysSampler: HTrace is ON for all spans;
-ProbabilitySampler: HTrace is ON for some percentage% of top-level spans.
-
-      <property>
-        <name>hadoop.htrace.sampler</name>
-        <value>NeverSampler</value>
-      </property>
-
 ### SpanReceivers
 
 The tracing system works by collecting information in structs called 'Spans'.
 It is up to you to choose how you want to receive this information
-by implementing the SpanReceiver interface, which defines one method:
+by using implementation of [SpanReceiver](http://htrace.incubator.apache.org/#Span_Receivers)
+interface bundled with HTrace or implementing it by yourself.
 
-    public void receiveSpan(Span span);
+[HTrace](http://htrace.incubator.apache.org/) provides options such as
 
-Configure what SpanReceivers you'd like to use
+* FlumeSpanReceiver
+* HBaseSpanReceiver
+* HTracedRESTReceiver
+* ZipkinSpanReceiver
+
+In order to set up SpanReceivers for HDFS servers,
+configure what SpanReceivers you'd like to use
 by putting a comma separated list of the fully-qualified class name of classes implementing SpanReceiver
-in `core-site.xml` property: `hadoop.htrace.spanreceiver.classes`.
+in `hdfs-site.xml` property: `dfs.htrace.spanreceiver.classes`.
 
+```xml
       <property>
-        <name>hadoop.htrace.spanreceiver.classes</name>
+        <name>dfs.htrace.spanreceiver.classes</name>
         <value>org.apache.htrace.impl.LocalFileSpanReceiver</value>
       </property>
       <property>
-        <name>hadoop.htrace.local-file-span-receiver.path</name>
+        <name>dfs.htrace.local-file-span-receiver.path</name>
         <value>/var/log/hadoop/htrace.out</value>
       </property>
+```
 
 You can omit package name prefix if you use span receiver bundled with HTrace.
 
+```xml
       <property>
-        <name>hadoop.htrace.spanreceiver.classes</name>
+        <name>dfs.htrace.spanreceiver.classes</name>
         <value>LocalFileSpanReceiver</value>
       </property>
+```
 
-### Setting up ZipkinSpanReceiver
-
-Instead of implementing SpanReceiver by yourself,
-you can use `ZipkinSpanReceiver` which uses
-[Zipkin](https://github.com/twitter/zipkin) for collecting and displaying tracing data.
-
-In order to use `ZipkinSpanReceiver`,
-you need to download and setup [Zipkin](https://github.com/twitter/zipkin) first.
-
-you also need to add the jar of `htrace-zipkin` to the classpath of Hadoop on each node.
-Here is example setup procedure.
-
-      $ git clone https://github.com/cloudera/htrace
-      $ cd htrace/htrace-zipkin
-      $ mvn compile assembly:single
-      $ cp target/htrace-zipkin-*-jar-with-dependencies.jar $HADOOP_HOME/share/hadoop/common/lib/
-
-The sample configuration for `ZipkinSpanReceiver` is shown below.
-By adding these to `core-site.xml` of NameNode and DataNodes, `ZipkinSpanReceiver` is initialized on the startup.
-You also need this configuration on the client node in addition to the servers.
-
-      <property>
-        <name>hadoop.htrace.spanreceiver.classes</name>
-        <value>ZipkinSpanReceiver</value>
-      </property>
-      <property>
-        <name>hadoop.htrace.zipkin.collector-hostname</name>
-        <value>192.168.1.2</value>
-      </property>
-      <property>
-        <name>hadoop.htrace.zipkin.collector-port</name>
-        <value>9410</value>
-      </property>
+You also need to add the jar bundling SpanReceiver to the classpath of Hadoop
+on each node. (LocalFileSpanReceiver in the example above is included in the
+jar of htrace-core which is bundled with Hadoop.)
 
 ### Dynamic update of tracing configuration
 
@@ -136,8 +104,8 @@ You need to run the command against all servers if you want to update the config
 You need to specify the class name of span receiver as argument of `-class` option.
 You can specify the configuration associated with span receiver by `-Ckey=value` options.
 
-      $ hadoop trace -add -class LocalFileSpanReceiver -Chadoop.htrace.local-file-span-receiver.path=/tmp/htrace.out -host 192.168.56.2:9000
-      Added trace span receiver 2 with configuration hadoop.htrace.local-file-span-receiver.path = /tmp/htrace.out
+      $ hadoop trace -add -class LocalFileSpanReceiver -Cdfs.htrace.local-file-span-receiver.path=/tmp/htrace.out -host 192.168.56.2:9000
+      Added trace span receiver 2 with configuration dfs.htrace.local-file-span-receiver.path = /tmp/htrace.out
 
       $ hadoop trace -list -host 192.168.56.2:9000
       ID  CLASS
@@ -149,8 +117,9 @@ In order to trace, you will need to wrap the traced logic with **tracing span**
 When there is running tracing spans,
 the tracing information is propagated to servers along with RPC requests.
 
-In addition, you need to initialize `SpanReceiver` once per process.
+In addition, you need to initialize `SpanReceiverHost` once per process.
 
+```java
     import org.apache.hadoop.hdfs.HdfsConfiguration;
     import org.apache.hadoop.tracing.SpanReceiverHost;
     import org.apache.htrace.Sampler;
@@ -169,14 +138,17 @@ In addition, you need to initialize `SpanReceiver` once per process.
         } finally {
           if (ts != null) ts.close();
         }
+```
 
-### Sample code for tracing
+### Sample code for tracing by HTrace API
 
 The `TracingFsShell.java` shown below is the wrapper of FsShell
 which start tracing span before invoking HDFS shell command.
 
+```java
     import org.apache.hadoop.conf.Configuration;
     import org.apache.hadoop.fs.FsShell;
+    import org.apache.hadoop.hdfs.DFSConfigKeys;
     import org.apache.hadoop.tracing.SpanReceiverHost;
     import org.apache.hadoop.util.ToolRunner;
     import org.apache.htrace.Sampler;
@@ -189,7 +161,7 @@ which start tracing span before invoking HDFS shell command.
         FsShell shell = new FsShell();
         conf.setQuietMode(false);
         shell.setConf(conf);
-        SpanReceiverHost.getInstance(conf);
+        SpanReceiverHost.get(conf, DFSConfigKeys.DFS_SERVER_HTRACE_PREFIX);
         int res = 0;
         TraceScope ts = null;
         try {
@@ -202,8 +174,40 @@ which start tracing span before invoking HDFS shell command.
         System.exit(res);
       }
     }
+```
 
 You can compile and execute this code as shown below.
 
     $ javac -cp `hadoop classpath` TracingFsShell.java
     $ java -cp .:`hadoop classpath` TracingFsShell -ls /
+
+### Starting tracing spans by configuration for HDFS client
+
+The DFSClient can enable tracing internally. This allows you to use HTrace with
+your client without modifying the client source code.
+
+Configure the span receivers and samplers in `hdfs-site.xml`
+by properties `dfs.client.htrace.spanreceiver.classes` and `dfs.client.htrace.sampler`.
+The value of `dfs.client.htrace.sampler` can be NeverSampler, AlwaysSampler or ProbabilitySampler.
+
+* NeverSampler: HTrace is OFF for all requests to namenodes and datanodes;
+* AlwaysSampler: HTrace is ON for all requests to namenodes and datanodes;
+* ProbabilitySampler: HTrace is ON for some percentage of requests to namenodes and datanodes
+
+You do not need to enable this if your client program has been modified
+to use HTrace.
+
+```xml
+      <property>
+        <name>dfs.client.htrace.spanreceiver.classes</name>
+        <value>LocalFileSpanReceiver</value>
+      </property>
+      <property>
+        <name>dfs.client.htrace.sampler</name>
+        <value>ProbabilitySampler</value>
+      </property>
+      <property>
+        <name>dfs.client.htrace.sampler.fraction</name>
+        <value>0.5</value>
+      </property>
+```

+ 3 - 0
hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt

@@ -197,6 +197,9 @@ Release 2.8.0 - UNRELEASED
 
     HDFS-5640. Add snapshot methods to FileContext. (Rakesh R via cnauroth)
 
+    HDFS-8284. Update documentation about how to use HTrace with HDFS (Masatake
+    Iwasaki via Colin P. McCabe)
+
   OPTIMIZATIONS
 
     HDFS-8026. Trace FSOutputSummer#writeChecksumChunks rather than

+ 18 - 0
hadoop-hdfs-project/hadoop-hdfs/src/main/resources/hdfs-default.xml

@@ -2315,4 +2315,22 @@
     the delay time will increase exponentially(double) for each retry.
   </description>
 </property>
+
+<property>
+  <name>dfs.htrace.spanreceiver.classes</name>
+  <value></value>
+  <description>
+    The class name of the HTrace SpanReceiver for the NameNode and DataNode.
+  </description>
+</property>
+
+<property>
+  <name>dfs.client.htrace.spanreceiver.classes</name>
+  <value></value>
+  <description>
+    The class name of the HTrace SpanReceiver for the HDFS client. You do not
+    need to enable this if your client program has been modified to use HTrace.
+  </description>
+</property>
+
 </configuration>