
HADOOP-13107. clean up how rumen is executed

Allen Wittenauer 9 years ago
parent
commit
3cd7e4678e

+ 8 - 0
hadoop-assemblies/src/main/resources/assemblies/hadoop-tools.xml

@@ -132,6 +132,14 @@
         <include>*-sources.jar</include>
       </includes>
     </fileSet>
+    <fileSet>
+      <directory>../hadoop-rumen/src/main/shellprofile.d</directory>
+      <includes>
+        <include>*</include>
+      </includes>
+      <outputDirectory>/libexec/shellprofile.d</outputDirectory>
+      <fileMode>0755</fileMode>
+    </fileSet>
     <fileSet>
       <directory>../hadoop-streaming/target</directory>
       <outputDirectory>/share/hadoop/${hadoop.component}/sources</outputDirectory>

+ 58 - 0
hadoop-tools/hadoop-rumen/src/main/shellprofile.d/hadoop-rumen.sh

@@ -0,0 +1,58 @@
+#!/usr/bin/env bash
+
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+if ! declare -f hadoop_subcommand_rumenfolder >/dev/null 2>/dev/null; then
+
+  if [[ "${HADOOP_SHELL_EXECNAME}" = hadoop ]]; then
+    hadoop_add_subcommand "rumenfolder" "scale a rumen input trace"
+  fi
+
+## @description  rumenfolder command for hadoop
+## @audience     public
+## @stability    stable
+## @replaceable  yes
+function hadoop_subcommand_rumenfolder
+{
+  # shellcheck disable=SC2034
+  HADOOP_CLASSNAME=org.apache.hadoop.tools.rumen.Folder
+  hadoop_add_to_classpath_tools hadoop-rumen
+  hadoop_debug "Appending HADOOP_CLIENT_OPTS onto HADOOP_OPTS"
+  HADOOP_OPTS="${HADOOP_OPTS} ${HADOOP_CLIENT_OPTS}"
+}
+
+fi
+
+if ! declare -f hadoop_subcommand_rumentrace >/dev/null 2>/dev/null; then
+
+  if [[ "${HADOOP_SHELL_EXECNAME}" = hadoop ]]; then
+    hadoop_add_subcommand "rumentrace" "convert logs into a rumen trace"
+  fi
+
+## @description  rumentrace command for hadoop
+## @audience     public
+## @stability    stable
+## @replaceable  yes
+function hadoop_subcommand_rumentrace
+{
+  # shellcheck disable=SC2034
+  HADOOP_CLASSNAME=org.apache.hadoop.tools.rumen.TraceBuilder
+  hadoop_add_to_classpath_tools hadoop-rumen
+  hadoop_debug "Appending HADOOP_CLIENT_OPTS onto HADOOP_OPTS"
+  HADOOP_OPTS="${HADOOP_OPTS} ${HADOOP_CLIENT_OPTS}"
+}
+
+fi
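
Note on the `declare -f` guards in `hadoop-rumen.sh`: each handler is defined only if no function of that name exists yet, which is what lets a definition loaded earlier take precedence (consistent with the `@replaceable yes` annotations). A minimal sketch of that guard in isolation follows; the `demo_*` names and `com.example.*` classes are hypothetical and not part of Hadoop.

```shell
#!/usr/bin/env bash

# Hypothetical override, defined before the profile logic runs.
function demo_subcommand_greet
{
  DEMO_CLASSNAME=com.example.CustomGreeter
}

# Same guard as in hadoop-rumen.sh: install the default handler only
# when no function with this name has been defined yet.
if ! declare -f demo_subcommand_greet >/dev/null 2>&1; then
  function demo_subcommand_greet
  {
    DEMO_CLASSNAME=com.example.DefaultGreeter
  }
fi

# No override exists for this one, so the default is installed.
if ! declare -f demo_subcommand_other >/dev/null 2>&1; then
  function demo_subcommand_other
  {
    DEMO_CLASSNAME=com.example.DefaultOther
  }
fi

demo_subcommand_greet
echo "greet -> ${DEMO_CLASSNAME}"   # greet -> com.example.CustomGreeter
demo_subcommand_other
echo "other -> ${DEMO_CLASSNAME}"   # other -> com.example.DefaultOther
```

The override wins for `greet` because the guard skips the default definition; `other` falls through to the default.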

+ 9 - 31
hadoop-tools/hadoop-rumen/src/site/markdown/Rumen.md.vm

@@ -50,8 +50,8 @@ but a simulation of the scheduler elects to run that task on a remote
 rack, the simulator requires a runtime its input cannot provide. 
 To fill in these gaps, Rumen performs a statistical analysis of the 
 digest to estimate the variables the trace doesn't supply. Rumen traces 
-drive both Gridmix (a benchmark of Hadoop MapReduce clusters) and Mumak 
-(a simulator for the JobTracker).
+drive both Gridmix (a benchmark of Hadoop MapReduce clusters) and SLS
+(a simulator for the resource manager scheduler).
 
 
 $H3 Motivation
@@ -126,16 +126,13 @@ can use the `Folder` utility to fold the current trace to the
 desired length. The remaining part of this section explains these 
 utilities in detail.
     
-Examples in this section assumes that certain libraries are present 
-in the java CLASSPATH. See [Dependencies](#Dependencies) for more details.
-
 
 $H3 Trace Builder
       
 $H4 Command
 
 ```
-java org.apache.hadoop.tools.rumen.TraceBuilder [options] <jobtrace-output> <topology-output> <inputs>
+hadoop rumentrace [options] <jobtrace-output> <topology-output> <inputs>
 ```
   
 This command invokes the `TraceBuilder` utility of *Rumen*.
@@ -205,12 +202,8 @@ $H4 Options
 
 $H4 Example
 
-*Rumen* expects certain library *JARs* to be present in  the *CLASSPATH*.
-One simple way to run Rumen is to use
-`$HADOOP_HOME/bin/hadoop jar` command to run it as example below.
-
 ```
-java org.apache.hadoop.tools.rumen.TraceBuilder \
+hadoop rumentrace \
   file:///tmp/job-trace.json \
   file:///tmp/job-topology.json \
   hdfs:///tmp/hadoop-yarn/staging/history/done_intermediate/testuser
@@ -229,7 +222,7 @@ $H3 Folder
 $H4 Command
 
 ```
-java org.apache.hadoop.tools.rumen.Folder [options] [input] [output]
+hadoop rumenfolder [options] [input] [output]
 ```
       
 This command invokes the `Folder` utility of 
@@ -350,7 +343,7 @@ $H4 Examples
 $H5 Folding an input trace with 10 hours of total runtime to generate an output trace with 1 hour of total runtime
 
 ```
-java org.apache.hadoop.tools.rumen.Folder \
+hadoop rumenfolder \
   -output-duration 1h \
   -input-cycle 20m \
   file:///tmp/job-trace.json \
@@ -362,7 +355,7 @@ If the folded jobs are out of order then the command will bail out.
 $H5 Folding an input trace with 10 hours of total runtime to generate an output trace with 1 hour of total runtime and tolerate some skewness
 
 ```
-java org.apache.hadoop.tools.rumen.Folder \
+hadoop rumenfolder \
   -output-duration 1h \
   -input-cycle 20m \
   -allow-missorting \
@@ -378,7 +371,7 @@ If the folded jobs are out of order, then atmost
 $H5 Folding an input trace with 10 hours of total runtime to generate an output trace with 1 hour of total runtime in debug mode
 
 ```
-java org.apache.hadoop.tools.rumen.Folder \
+hadoop rumenfolder \
   -output-duration 1h \
   -input-cycle 20m \
   -debug -temp-directory file:///tmp/debug \
@@ -395,7 +388,7 @@ up.
 $H5 Folding an input trace with 10 hours of total runtime to generate an output trace with 1 hour of total runtime with custom concentration.
 
 ```
-java org.apache.hadoop.tools.rumen.Folder \
+hadoop rumenfolder \
   -output-duration 1h \
   -input-cycle 20m \
   -concentration 2 \
@@ -421,18 +414,3 @@ Look at the MapReduce
 <a href="https://issues.apache.org/jira/browse/MAPREDUCE/component/12313617">rumen-component</a>
 for further details.
 
-
-$H3 Dependencies
-
-*Rumen* expects certain library *JARs* to be present in  the *CLASSPATH*.
-One simple way to run Rumen is to use
-`hadoop jar` command to run it as example below.
-
-```
-$HADOOP_HOME/bin/hadoop jar \
-  $HADOOP_HOME/share/hadoop/tools/lib/hadoop-rumen-2.5.1.jar \
-  org.apache.hadoop.tools.rumen.TraceBuilder \
-  file:///tmp/job-trace.json \
-  file:///tmp/job-topology.json \
-  hdfs:///tmp/hadoop-yarn/staging/history/done_intermediate/testuser
-```
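
Note on why the Dependencies section can be dropped: the documented commands change from `java <class>` to `hadoop rumentrace` / `hadoop rumenfolder`, and the new handlers call `hadoop_add_to_classpath_tools hadoop-rumen` themselves, so users no longer assemble the CLASSPATH by hand. The sketch below illustrates the idea under stated assumptions: `demo_dispatch` is a hypothetical, simplified stand-in for the launcher's subcommand lookup, the `hadoop_add_to_classpath_tools` body is a stub rather than the real helper, and the jar path is invented for illustration.

```shell
#!/usr/bin/env bash

# Stub for the real helper (assumption: the real one puts the named
# tool's jars on the classpath). The path here is illustrative only.
function hadoop_add_to_classpath_tools
{
  CLASSPATH="${CLASSPATH:+${CLASSPATH}:}/opt/hadoop/share/hadoop/tools/lib/${1}.jar"
}

# Trimmed copy of the handler added by this commit.
function hadoop_subcommand_rumentrace
{
  HADOOP_CLASSNAME=org.apache.hadoop.tools.rumen.TraceBuilder
  hadoop_add_to_classpath_tools hadoop-rumen
}

# Hypothetical dispatcher: find hadoop_subcommand_<name> and call it,
# passing the remaining arguments through.
function demo_dispatch
{
  local subcmd=$1
  shift
  if declare -f "hadoop_subcommand_${subcmd}" >/dev/null 2>&1; then
    "hadoop_subcommand_${subcmd}" "$@"
  else
    echo "unknown subcommand: ${subcmd}" >&2
    return 1
  fi
}

CLASSPATH=""
demo_dispatch rumentrace file:///tmp/job-trace.json
echo "class:     ${HADOOP_CLASSNAME}"
echo "classpath: ${CLASSPATH}"
```

In this sketch the handler, not the caller, decides which class runs and which jars are needed, which matches the intent of the documentation change.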