@@ -29,9 +29,7 @@ Rumen
- [Components](#Components)
- [How to use Rumen?](#How_to_use_Rumen)
- [Trace Builder](#Trace_Builder)
- - [Example](#Example)
- [Folder](#Folder)
- - [Examples](#Examples)
- [Appendix](#Appendix)
- [Resources](#Resources)
- [Dependencies](#Dependencies)
@@ -128,18 +126,21 @@ can use the `Folder` utility to fold the current trace to the
desired length. The remaining part of this section explains these
utilities in detail.

-> Examples in this section assumes that certain libraries are present
-> in the java CLASSPATH. See <em>Section-3.2</em> for more details.
+Examples in this section assume that certain libraries are present
+in the java CLASSPATH. See [Dependencies](#Dependencies) for more details.


$H3 Trace Builder

-`Command:`
+$H4 Command

- java org.apache.hadoop.tools.rumen.TraceBuilder [options] <jobtrace-output> <topology-output> <inputs>
+```
+java org.apache.hadoop.tools.rumen.TraceBuilder [options] <jobtrace-output> <topology-output> <inputs>
+```

-This command invokes the `TraceBuilder` utility of
-*Rumen*. It converts the JobHistory files into a series of JSON
+This command invokes the `TraceBuilder` utility of *Rumen*.
+
+TraceBuilder converts the JobHistory files into a series of JSON
objects and writes them into the `<jobtrace-output>`
file. It also extracts the cluster layout (topology) and writes it in
the`<topology-output>` file.
@@ -169,7 +170,7 @@ Cluster topology is used as follows :
* To extrapolate splits information for tasks with missing splits
details or synthetically generated tasks.

-`Options :`
+$H4 Options

<table>
<tr>
@@ -204,33 +205,45 @@ Cluster topology is used as follows :

$H4 Example

- java org.apache.hadoop.tools.rumen.TraceBuilder file:///home/user/job-trace.json file:///home/user/topology.output file:///home/user/logs/history/done
+*Rumen* expects certain library *JARs* to be present in the *CLASSPATH*.
+One simple way to satisfy this is to run Rumen through the
+`$HADOOP_HOME/bin/hadoop jar` command, as shown in [Dependencies](#Dependencies).

-This will analyze all the jobs in
+```
+java org.apache.hadoop.tools.rumen.TraceBuilder \
+  file:///tmp/job-trace.json \
+  file:///tmp/job-topology.json \
+  hdfs:///tmp/hadoop-yarn/staging/history/done_intermediate/testuser
+```

-`/home/user/logs/history/done` stored on the
-`local` FileSystem and output the jobtraces in
-`/home/user/job-trace.json` along with topology
-information in `/home/user/topology.output`.
+This will analyze all the jobs in
+`/tmp/hadoop-yarn/staging/history/done_intermediate/testuser`
+stored on the `HDFS` FileSystem
+and output the job trace in `/tmp/job-trace.json`
+along with the topology information in `/tmp/job-topology.json`,
+both stored on the `local` FileSystem.
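For a quick look at what TraceBuilder produced, the outputs can be inspected directly; this is only a sketch and assumes the local output paths used in the example above:

```
head /tmp/job-trace.json
cat /tmp/job-topology.json
```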
$H3 Folder

-`Command`:
+$H4 Command

- java org.apache.hadoop.tools.rumen.Folder [options] [input] [output]
-
-> Input and output to `Folder` is expected to be a fully
-> qualified FileSystem path. So use file:// to specify
-> files on the `local` FileSystem and hdfs:// to
-> specify files on HDFS.
+```
+java org.apache.hadoop.tools.rumen.Folder [options] [input] [output]
+```

This command invokes the `Folder` utility of
*Rumen*. Folding essentially means that the output duration of
the resulting trace is fixed and job timelines are adjusted
to respect the final output duration.

-`Options :`
+> Input and output to `Folder` are expected to be fully
+> qualified FileSystem paths. So use `file://` to specify
+> files on the `local` FileSystem and `hdfs://` to
+> specify files on HDFS.
+
+
+$H4 Options

<table>
<tr>
@@ -335,14 +348,28 @@ to respect the final output duration.
$H4 Examples

$H5 Folding an input trace with 10 hours of total runtime to generate an output trace with 1 hour of total runtime
-
- java org.apache.hadoop.tools.rumen.Folder -output-duration 1h -input-cycle 20m file:///home/user/job-trace.json file:///home/user/job-trace-1hr.json
+
+```
+java org.apache.hadoop.tools.rumen.Folder \
+  -output-duration 1h \
+  -input-cycle 20m \
+  file:///tmp/job-trace.json \
+  file:///tmp/job-trace-1hr.json
+```

If the folded jobs are out of order then the command will bail out.

$H5 Folding an input trace with 10 hours of total runtime to generate an output trace with 1 hour of total runtime and tolerate some skewness

- java org.apache.hadoop.tools.rumen.Folder -output-duration 1h -input-cycle 20m -allow-missorting -skew-buffer-length 100 file:///home/user/job-trace.json file:///home/user/job-trace-1hr.json
+```
+java org.apache.hadoop.tools.rumen.Folder \
+  -output-duration 1h \
+  -input-cycle 20m \
+  -allow-missorting \
+  -skew-buffer-length 100 \
+  file:///tmp/job-trace.json \
+  file:///tmp/job-trace-1hr.json
+```

If the folded jobs are out of order, then atmost
100 jobs will be de-skewed. If the 101<sup>st</sup> job is
@@ -350,23 +377,37 @@ If the folded jobs are out of order, then atmost
$H5 Folding an input trace with 10 hours of total runtime to generate an output trace with 1 hour of total runtime in debug mode

- java org.apache.hadoop.tools.rumen.Folder -output-duration 1h -input-cycle 20m -debug -temp-directory file:///tmp/debug file:///home/user/job-trace.json file:///home/user/job-trace-1hr.json
+```
+java org.apache.hadoop.tools.rumen.Folder \
+  -output-duration 1h \
+  -input-cycle 20m \
+  -debug -temp-directory file:///tmp/debug \
+  file:///tmp/job-trace.json \
+  file:///tmp/job-trace-1hr.json
+```

This will fold the 10hr job-trace file
-`file:///home/user/job-trace.json` to finish within 1hr
+`file:///tmp/job-trace.json` to finish within 1hr
and use `file:///tmp/debug` as the temporary directory.
The intermediate files in the temporary directory will not be cleaned
up.
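Since the intermediate files are not cleaned up, the temporary directory can be inspected after the run finishes; this sketch assumes the `-temp-directory` value used in the example above:

```
ls /tmp/debug
```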
$H5 Folding an input trace with 10 hours of total runtime to generate an output trace with 1 hour of total runtime with custom concentration.

- java org.apache.hadoop.tools.rumen.Folder -output-duration 1h -input-cycle 20m -concentration 2 file:///home/user/job-trace.json file:///home/user/job-trace-1hr.json</source>
+```
+java org.apache.hadoop.tools.rumen.Folder \
+  -output-duration 1h \
+  -input-cycle 20m \
+  -concentration 2 \
+  file:///tmp/job-trace.json \
+  file:///tmp/job-trace-1hr.json
+```

This will fold the 10hr job-trace file
-`file:///home/user/job-trace.json` to finish within 1hr
-with concentration of 2. `Example-2.3.2` will retain 10%
-of the jobs. With *concentration* as 2, 20% of the total input
-jobs will be retained.
+`file:///tmp/job-trace.json` to finish within 1hr
+with a concentration of 2.
+If the 10h job-trace is folded to 1h, it retains 10% of the jobs by default.
+With *concentration* as 2, 20% of the total input jobs will be retained.
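To see the effect of *concentration* on a concrete trace, the number of jobs kept by the folding can be compared before and after. This is only a rough sketch: it assumes the trace is pretty-printed with one field per line and that each job record carries a single `jobID` field:

```
grep -c '"jobID"' /tmp/job-trace.json
grep -c '"jobID"' /tmp/job-trace-1hr.json
```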
Appendix
@@ -377,21 +418,21 @@ $H3 Resources
<a href="https://issues.apache.org/jira/browse/MAPREDUCE-751">MAPREDUCE-751</a>
is the main JIRA that introduced *Rumen* to *MapReduce*.
Look at the MapReduce
-<a href="https://issues.apache.org/jira/browse/MAPREDUCE/component/12313617">
-rumen-component</a>for further details.
+<a href="https://issues.apache.org/jira/browse/MAPREDUCE/component/12313617">rumen-component</a>
+for further details.


$H3 Dependencies

-*Rumen* expects certain library *JARs* to be present in
-the *CLASSPATH*. The required libraries are
-
-* `Hadoop MapReduce Tools` (`hadoop-mapred-tools-{hadoop-version}.jar`)
-* `Hadoop Common` (`hadoop-common-{hadoop-version}.jar`)
-* `Apache Commons Logging` (`commons-logging-1.1.1.jar`)
-* `Apache Commons CLI` (`commons-cli-1.2.jar`)
-* `Jackson Mapper` (`jackson-mapper-asl-1.4.2.jar`)
-* `Jackson Core` (`jackson-core-asl-1.4.2.jar`)
-
-> One simple way to run Rumen is to use '$HADOOP_HOME/bin/hadoop jar'
-> option to run it.
+*Rumen* expects certain library *JARs* to be present in the *CLASSPATH*.
+One simple way to run Rumen is to use the
+`hadoop jar` command, as in the example below.
+
+```
+$HADOOP_HOME/bin/hadoop jar \
+  $HADOOP_HOME/share/hadoop/tools/lib/hadoop-rumen-2.5.1.jar \
+  org.apache.hadoop.tools.rumen.TraceBuilder \
+  file:///tmp/job-trace.json \
+  file:///tmp/job-topology.json \
+  hdfs:///tmp/hadoop-yarn/staging/history/done_intermediate/testuser
+```
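The same `hadoop jar` wrapper can also drive the `Folder` utility. The sketch below reuses the jar path, version and trace paths from the example above, so treat them as placeholders for your installation:

```
$HADOOP_HOME/bin/hadoop jar \
  $HADOOP_HOME/share/hadoop/tools/lib/hadoop-rumen-2.5.1.jar \
  org.apache.hadoop.tools.rumen.Folder \
  -output-duration 1h \
  -input-cycle 20m \
  file:///tmp/job-trace.json \
  file:///tmp/job-trace-1hr.json
```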