
HADOOP-3691. Fix streaming and tutorial docs. Contributed by Jothi Padmanabhan.

git-svn-id: https://svn.apache.org/repos/asf/hadoop/core/trunk@674834 13f79535-47bb-0310-9956-ffa450edef68
Devaraj Das 17 years ago
parent
commit
cc34134077

+ 2 - 0
CHANGES.txt

@@ -777,6 +777,8 @@ Release 0.18.0 - Unreleased
     HADOOP-3692. Fix documentation for Cluster setup and Quick start guides. 
     (Amareshwari Sriramadasu via ddas)
 
+    HADOOP-3691. Fix streaming and tutorial docs. (Jothi Padmanabhan via ddas)
+
 Release 0.17.1 - Unreleased
 
   INCOMPATIBLE CHANGES

+ 3 - 1
docs/changes.html

@@ -378,7 +378,7 @@ InputFormat.validateInput.<br />(tomwhite via omalley)</li>
     </ol>
   </li>
   <li><a href="javascript:toggleList('release_0.18.0_-_unreleased_._bug_fixes_')">  BUG FIXES
-</a>&nbsp;&nbsp;&nbsp;(113)
+</a>&nbsp;&nbsp;&nbsp;(115)
     <ol id="release_0.18.0_-_unreleased_._bug_fixes_">
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-2905">HADOOP-2905</a>. 'fsck -move' triggers NPE in NameNode.<br />(Lohit Vjayarenu via rangadi)</li>
       <li>Increment ClientProtocol.versionID missed by <a href="http://issues.apache.org/jira/browse/HADOOP-2585">HADOOP-2585</a>.<br />(shv)</li>
@@ -603,6 +603,8 @@ input. Validation job still runs on default fs.<br />(Jothi Padmanabhan via cdou
 conform to style guidelines.<br />(Amareshwari Sriramadasu via cdouglas)</li>
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-3653">HADOOP-3653</a>. Fix test-patch target to properly account for Eclipse
 classpath jars.<br />(Brice Arnould via nigel)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3692">HADOOP-3692</a>. Fix documentation for Cluster setup and Quick start guides.<br />(Amareshwari Sriramadasu via ddas)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3691">HADOOP-3691</a>. Fix streaming and tutorial docs.<br />(Jothi Padmanabhan via ddas)</li>
     </ol>
   </li>
 </ul>

+ 40 - 40
docs/mapred_tutorial.html

@@ -5,7 +5,7 @@
 <meta content="Apache Forrest" name="Generator">
 <meta name="Forrest-version" content="0.8">
 <meta name="Forrest-skin-name" content="pelt">
-<title>Hadoop Map-Reduce Tutorial</title>
+<title>Hadoop Map/Reduce Tutorial</title>
 <link type="text/css" href="skin/basic.css" rel="stylesheet">
 <link media="screen" type="text/css" href="skin/screen.css" rel="stylesheet">
 <link media="print" type="text/css" href="skin/print.css" rel="stylesheet">
@@ -187,7 +187,7 @@ document.write("Last Published: " + document.lastModified);
 <a class="dida" href="mapred_tutorial.pdf"><img alt="PDF -icon" src="skin/images/pdfdoc.gif" class="skin"><br>
         PDF</a>
 </div>
-<h1>Hadoop Map-Reduce Tutorial</h1>
+<h1>Hadoop Map/Reduce Tutorial</h1>
 <div id="minitoc-area">
 <ul class="minitoc">
 <li>
@@ -217,7 +217,7 @@ document.write("Last Published: " + document.lastModified);
 </ul>
 </li>
 <li>
-<a href="#Map-Reduce+-+User+Interfaces">Map-Reduce - User Interfaces</a>
+<a href="#Map%2FReduce+-+User+Interfaces">Map/Reduce - User Interfaces</a>
 <ul class="minitoc">
 <li>
 <a href="#Payload">Payload</a>
@@ -328,7 +328,7 @@ document.write("Last Published: " + document.lastModified);
 <h2 class="h3">Purpose</h2>
 <div class="section">
 <p>This document comprehensively describes all user-facing facets of the 
-      Hadoop Map-Reduce framework and serves as a tutorial.
+      Hadoop Map/Reduce framework and serves as a tutorial.
       </p>
 </div>
     
@@ -356,11 +356,11 @@ document.write("Last Published: " + document.lastModified);
 <a name="N10032"></a><a name="Overview"></a>
 <h2 class="h3">Overview</h2>
 <div class="section">
-<p>Hadoop Map-Reduce is a software framework for easily writing 
+<p>Hadoop Map/Reduce is a software framework for easily writing 
       applications which process vast amounts of data (multi-terabyte data-sets) 
       in-parallel on large clusters (thousands of nodes) of commodity 
       hardware in a reliable, fault-tolerant manner.</p>
-<p>A Map-Reduce <em>job</em> usually splits the input data-set into 
+<p>A Map/Reduce <em>job</em> usually splits the input data-set into 
       independent chunks which are processed by the <em>map tasks</em> in a
       completely parallel manner. The framework sorts the outputs of the maps, 
       which are then input to the <em>reduce tasks</em>. Typically both the 
@@ -368,12 +368,12 @@ document.write("Last Published: " + document.lastModified);
       takes care of scheduling tasks, monitoring them and re-executes the failed
       tasks.</p>
 <p>Typically the compute nodes and the storage nodes are the same, that is, 
-      the Map-Reduce framework and the <a href="hdfs_design.html">Distributed 
+      the Map/Reduce framework and the <a href="hdfs_design.html">Distributed 
       FileSystem</a> are running on the same set of nodes. This configuration
       allows the framework to effectively schedule tasks on the nodes where data 
       is already present, resulting in very high aggregate bandwidth across the 
       cluster.</p>
-<p>The Map-Reduce framework consists of a single master 
+<p>The Map/Reduce framework consists of a single master 
       <span class="codefrag">JobTracker</span> and one slave <span class="codefrag">TaskTracker</span> per 
       cluster-node. The master is responsible for scheduling the jobs' component 
       tasks on the slaves, monitoring them and re-executing the failed tasks. The 
@@ -388,7 +388,7 @@ document.write("Last Published: " + document.lastModified);
       scheduling tasks and monitoring them, providing status and diagnostic 
       information to the job-client.</p>
 <p>Although the Hadoop framework is implemented in Java<sup>TM</sup>, 
-      Map-Reduce applications need not be written in Java.</p>
+      Map/Reduce applications need not be written in Java.</p>
 <ul>
         
 <li>
@@ -403,7 +403,7 @@ document.write("Last Published: " + document.lastModified);
           
 <a href="api/org/apache/hadoop/mapred/pipes/package-summary.html">
           Hadoop Pipes</a> is a <a href="http://www.swig.org/">SWIG</a>-
-          compatible <em>C++ API</em> to implement Map-Reduce applications (non 
+          compatible <em>C++ API</em> to implement Map/Reduce applications (non 
           JNI<sup>TM</sup> based).
         </li>
       
@@ -414,7 +414,7 @@ document.write("Last Published: " + document.lastModified);
 <a name="N1008B"></a><a name="Inputs+and+Outputs"></a>
 <h2 class="h3">Inputs and Outputs</h2>
 <div class="section">
-<p>The Map-Reduce framework operates exclusively on 
+<p>The Map/Reduce framework operates exclusively on 
       <span class="codefrag">&lt;key, value&gt;</span> pairs, that is, the framework views the 
       input to the job as a set of <span class="codefrag">&lt;key, value&gt;</span> pairs and 
       produces a set of <span class="codefrag">&lt;key, value&gt;</span> pairs as the output of 
@@ -426,7 +426,7 @@ document.write("Last Published: " + document.lastModified);
       <a href="api/org/apache/hadoop/io/WritableComparable.html">
       WritableComparable</a> interface to facilitate sorting by the framework.
       </p>
-<p>Input and Output types of a Map-Reduce job:</p>
+<p>Input and Output types of a Map/Reduce job:</p>
 <p>
         (input) <span class="codefrag">&lt;k1, v1&gt;</span> 
         -&gt; 
@@ -448,7 +448,7 @@ document.write("Last Published: " + document.lastModified);
 <a name="N100CD"></a><a name="Example%3A+WordCount+v1.0"></a>
 <h2 class="h3">Example: WordCount v1.0</h2>
 <div class="section">
-<p>Before we jump into the details, lets walk through an example Map-Reduce 
+<p>Before we jump into the details, lets walk through an example Map/Reduce 
       application to get a flavour for how they work.</p>
 <p>
 <span class="codefrag">WordCount</span> is a simple application that counts the number of
@@ -1226,11 +1226,11 @@ document.write("Last Published: " + document.lastModified);
 </div>
     
     
-<a name="N105A3"></a><a name="Map-Reduce+-+User+Interfaces"></a>
-<h2 class="h3">Map-Reduce - User Interfaces</h2>
+<a name="N105A3"></a><a name="Map%2FReduce+-+User+Interfaces"></a>
+<h2 class="h3">Map/Reduce - User Interfaces</h2>
 <div class="section">
 <p>This section provides a reasonable amount of detail on every user-facing 
-      aspect of the Map-Reduce framwork. This should help users implement, 
+      aspect of the Map/Reduce framwork. This should help users implement, 
       configure and tune their jobs in a fine-grained manner. However, please 
       note that the javadoc for each class/interface remains the most 
       comprehensive documentation available; this is only meant to be a tutorial.
@@ -1260,7 +1260,7 @@ document.write("Last Published: " + document.lastModified);
           intermediate records. The transformed intermediate records do not need
           to be of the same type as the input records. A given input pair may 
           map to zero or many output pairs.</p>
-<p>The Hadoop Map-Reduce framework spawns one map task for each 
+<p>The Hadoop Map/Reduce framework spawns one map task for each 
           <span class="codefrag">InputSplit</span> generated by the <span class="codefrag">InputFormat</span> for 
           the job.</p>
 <p>Overall, <span class="codefrag">Mapper</span> implementations are passed the 
@@ -1423,7 +1423,7 @@ document.write("Last Published: " + document.lastModified);
 <h4>Reporter</h4>
 <p>
 <a href="api/org/apache/hadoop/mapred/Reporter.html">
-          Reporter</a> is a facility for Map-Reduce applications to report 
+          Reporter</a> is a facility for Map/Reduce applications to report 
           progress, set application-level status messages and update 
           <span class="codefrag">Counters</span>.</p>
 <p>
@@ -1443,20 +1443,20 @@ document.write("Last Published: " + document.lastModified);
 <p>
 <a href="api/org/apache/hadoop/mapred/OutputCollector.html">
           OutputCollector</a> is a generalization of the facility provided by
-          the Map-Reduce framework to collect data output by the 
+          the Map/Reduce framework to collect data output by the 
           <span class="codefrag">Mapper</span> or the <span class="codefrag">Reducer</span> (either the 
           intermediate outputs or the output of the job).</p>
-<p>Hadoop Map-Reduce comes bundled with a 
+<p>Hadoop Map/Reduce comes bundled with a 
         <a href="api/org/apache/hadoop/mapred/lib/package-summary.html">
         library</a> of generally useful mappers, reducers, and partitioners.</p>
 <a name="N107B6"></a><a name="Job+Configuration"></a>
 <h3 class="h4">Job Configuration</h3>
 <p>
 <a href="api/org/apache/hadoop/mapred/JobConf.html">
-        JobConf</a> represents a Map-Reduce job configuration.</p>
+        JobConf</a> represents a Map/Reduce job configuration.</p>
 <p>
 <span class="codefrag">JobConf</span> is the primary interface for a user to describe
-        a map-reduce job to the Hadoop framework for execution. The framework 
+        a Map/Reduce job to the Hadoop framework for execution. The framework 
         tries to faithfully execute the job as described by <span class="codefrag">JobConf</span>, 
         however:</p>
 <ul>
@@ -1747,7 +1747,7 @@ document.write("Last Published: " + document.lastModified);
         with the <span class="codefrag">JobTracker</span>.</p>
 <p>
 <span class="codefrag">JobClient</span> provides facilities to submit jobs, track their 
-        progress, access component-tasks' reports/logs, get the Map-Reduce 
+        progress, access component-tasks' reports and logs, get the Map/Reduce 
         cluster's status information and so on.</p>
 <p>The job submission process involves:</p>
 <ol>
@@ -1762,7 +1762,7 @@ document.write("Last Published: " + document.lastModified);
           </li>
           
 <li>
-            Copying the job's jar and configuration to the map-reduce system 
+            Copying the job's jar and configuration to the Map/Reduce system 
             directory on the <span class="codefrag">FileSystem</span>.
           </li>
           
@@ -1802,8 +1802,8 @@ document.write("Last Published: " + document.lastModified);
         <span class="codefrag">JobClient</span> to submit the job and monitor its progress.</p>
 <a name="N10A48"></a><a name="Job+Control"></a>
 <h4>Job Control</h4>
-<p>Users may need to chain map-reduce jobs to accomplish complex
-          tasks which cannot be done via a single map-reduce job. This is fairly
+<p>Users may need to chain Map/Reduce jobs to accomplish complex
+          tasks which cannot be done via a single Map/Reduce job. This is fairly
           easy since the output of the job typically goes to distributed 
           file-system, and the output, in turn, can be used as the input for the 
           next job.</p>
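The "Job Control" paragraph above describes chaining jobs by pointing the next job's input at the previous job's output directory. A minimal sketch of that pattern with the old org.apache.hadoop.mapred API (the class name, job names and paths below are illustrative, not part of this patch; mapper/reducer setup is omitted):

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class ChainedJobs {
      public static void main(String[] args) throws Exception {
        // First job: reads the raw input, writes to an intermediate directory.
        JobConf first = new JobConf(ChainedJobs.class);
        first.setJobName("pass-1");
        FileInputFormat.setInputPaths(first, new Path(args[0]));
        FileOutputFormat.setOutputPath(first, new Path("intermediate"));
        JobClient.runJob(first);      // blocks until the first job completes

        // Second job: consumes the first job's output as its input.
        JobConf second = new JobConf(ChainedJobs.class);
        second.setJobName("pass-2");
        FileInputFormat.setInputPaths(second, new Path("intermediate"));
        FileOutputFormat.setOutputPath(second, new Path(args[1]));
        JobClient.runJob(second);
      }
    }

JobClient.runJob returns only after the submitted job finishes, which is what makes this simple sequential chaining work.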
@@ -1840,9 +1840,9 @@ document.write("Last Published: " + document.lastModified);
 <h3 class="h4">Job Input</h3>
 <p>
 <a href="api/org/apache/hadoop/mapred/InputFormat.html">
-        InputFormat</a> describes the input-specification for a Map-Reduce job.
+        InputFormat</a> describes the input-specification for a Map/Reduce job.
         </p>
-<p>The Map-Reduce framework relies on the <span class="codefrag">InputFormat</span> of 
+<p>The Map/Reduce framework relies on the <span class="codefrag">InputFormat</span> of 
         the job to:</p>
 <ol>
           
@@ -1914,9 +1914,9 @@ document.write("Last Published: " + document.lastModified);
 <h3 class="h4">Job Output</h3>
 <p>
 <a href="api/org/apache/hadoop/mapred/OutputFormat.html">
-        OutputFormat</a> describes the output-specification for a Map-Reduce 
+        OutputFormat</a> describes the output-specification for a Map/Reduce 
         job.</p>
-<p>The Map-Reduce framework relies on the <span class="codefrag">OutputFormat</span> of 
+<p>The Map/Reduce framework relies on the <span class="codefrag">OutputFormat</span> of 
         the job to:</p>
 <ol>
           
@@ -1946,7 +1946,7 @@ document.write("Last Published: " + document.lastModified);
           application-writer will have to pick unique names per task-attempt 
           (using the attemptid, say <span class="codefrag">attempt_200709221812_0001_m_000000_0</span>), 
           not just per task.</p>
-<p>To avoid these issues the Map-Reduce framework maintains a special 
+<p>To avoid these issues the Map/Reduce framework maintains a special 
           <span class="codefrag">${mapred.output.dir}/_temporary/_${taskid}</span> sub-directory
           accessible via <span class="codefrag">${mapred.work.output.dir}</span>
           for each task-attempt on the <span class="codefrag">FileSystem</span> where the output
@@ -1966,7 +1966,7 @@ document.write("Last Published: " + document.lastModified);
 <p>Note: The value of <span class="codefrag">${mapred.work.output.dir}</span> during 
           execution of a particular task-attempt is actually 
           <span class="codefrag">${mapred.output.dir}/_temporary/_{$taskid}</span>, and this value is 
-          set by the map-reduce framework. So, just create any side-files in the 
+          set by the Map/Reduce framework. So, just create any side-files in the 
           path  returned by
           <a href="api/org/apache/hadoop/mapred/FileOutputFormat.html#getWorkOutputPath(org.apache.hadoop.mapred.JobConf)">
           FileOutputFormat.getWorkOutputPath() </a>from map/reduce 
@@ -1988,7 +1988,7 @@ document.write("Last Published: " + document.lastModified);
 <h4>Counters</h4>
 <p>
 <span class="codefrag">Counters</span> represent global counters, defined either by 
-          the Map-Reduce framework or applications. Each <span class="codefrag">Counter</span> can 
+          the Map/Reduce framework or applications. Each <span class="codefrag">Counter</span> can 
           be of any <span class="codefrag">Enum</span> type. Counters of a particular 
           <span class="codefrag">Enum</span> are bunched into groups of type 
           <span class="codefrag">Counters.Group</span>.</p>
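The Counters paragraph above says that an application-defined Enum becomes a counter group. A minimal sketch of a mapper updating such counters through Reporter.incrCounter (the class and enum names are invented for illustration, not taken from this patch):

    import java.io.IOException;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    public class CountingMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, LongWritable> {

      // The enum type names the counter group; each constant is one counter.
      enum RecordCounters { GOOD, MALFORMED }

      public void map(LongWritable key, Text value,
                      OutputCollector<Text, LongWritable> output, Reporter reporter)
          throws IOException {
        if (value.toString().length() == 0) {
          reporter.incrCounter(RecordCounters.MALFORMED, 1);
          return;                      // skip empty records
        }
        reporter.incrCounter(RecordCounters.GOOD, 1);
        output.collect(value, new LongWritable(1));
      }
    }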
@@ -2009,7 +2009,7 @@ document.write("Last Published: " + document.lastModified);
           files efficiently.</p>
 <p>
 <span class="codefrag">DistributedCache</span> is a facility provided by the 
-          Map-Reduce framework to cache files (text, archives, jars and so on) 
+          Map/Reduce framework to cache files (text, archives, jars and so on) 
           needed by applications.</p>
 <p>Applications specify the files to be cached via urls (hdfs://)
           in the <span class="codefrag">JobConf</span>. The <span class="codefrag">DistributedCache</span> 
@@ -2078,7 +2078,7 @@ document.write("Last Published: " + document.lastModified);
           interface supports the handling of generic Hadoop command-line options.
           </p>
 <p>
-<span class="codefrag">Tool</span> is the standard for any Map-Reduce tool or 
+<span class="codefrag">Tool</span> is the standard for any Map/Reduce tool or 
           application. The application should delegate the handling of 
           standard command-line options to 
           <a href="api/org/apache/hadoop/util/GenericOptionsParser.html">
@@ -2116,7 +2116,7 @@ document.write("Last Published: " + document.lastModified);
 <h4>IsolationRunner</h4>
 <p>
 <a href="api/org/apache/hadoop/mapred/IsolationRunner.html">
-          IsolationRunner</a> is a utility to help debug Map-Reduce programs.</p>
+          IsolationRunner</a> is a utility to help debug Map/Reduce programs.</p>
 <p>To use the <span class="codefrag">IsolationRunner</span>, first set 
           <span class="codefrag">keep.failed.tasks.files</span> to <span class="codefrag">true</span> 
           (also see <span class="codefrag">keep.tasks.files.pattern</span>).</p>
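As the IsolationRunner paragraph above notes, keep.failed.tasks.files must be set to true before the job runs. A hedged sketch of setting it from a job driver via JobConf (DebuggableJob is a hypothetical class name, not from this patch):

    import org.apache.hadoop.mapred.JobConf;

    public class DebuggableJob {
      public static void main(String[] args) {
        JobConf conf = new JobConf(DebuggableJob.class);
        // Keep the files of failed task-attempts on the tasktracker nodes so
        // that IsolationRunner can later re-run a failed task in isolation.
        conf.setBoolean("keep.failed.tasks.files", true);
        // ... set mapper/reducer, input and output paths, then submit the job.
      }
    }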
@@ -2219,11 +2219,11 @@ document.write("Last Published: " + document.lastModified);
 <h4>JobControl</h4>
 <p>
 <a href="api/org/apache/hadoop/mapred/jobcontrol/package-summary.html">
-          JobControl</a> is a utility which encapsulates a set of Map-Reduce jobs
+          JobControl</a> is a utility which encapsulates a set of Map/Reduce jobs
           and their dependencies.</p>
 <a name="N10D57"></a><a name="Data+Compression"></a>
 <h4>Data Compression</h4>
-<p>Hadoop Map-Reduce provides facilities for the application-writer to
+<p>Hadoop Map/Reduce provides facilities for the application-writer to
           specify compression for both intermediate map-outputs and the
           job-outputs i.e. output of the reduces. It also comes bundled with
           <a href="api/org/apache/hadoop/io/compress/CompressionCodec.html">
@@ -2268,7 +2268,7 @@ document.write("Last Published: " + document.lastModified);
 <h2 class="h3">Example: WordCount v2.0</h2>
 <div class="section">
 <p>Here is a more complete <span class="codefrag">WordCount</span> which uses many of the
-      features provided by the Map-Reduce framework we discussed so far.</p>
+      features provided by the Map/Reduce framework we discussed so far.</p>
 <p>This needs the HDFS to be up and running, especially for the 
       <span class="codefrag">DistributedCache</span>-related features. Hence it only works with a 
       <a href="quickstart.html#SingleNodeSetup">pseudo-distributed</a> or
@@ -3655,7 +3655,7 @@ document.write("Last Published: " + document.lastModified);
 <a name="N1160B"></a><a name="Highlights"></a>
 <h3 class="h4">Highlights</h3>
 <p>The second version of <span class="codefrag">WordCount</span> improves upon the 
-        previous one by using some features offered by the Map-Reduce framework:
+        previous one by using some features offered by the Map/Reduce framework:
         </p>
 <ul>
           

File diff suppressed because it is too large
+ 4 - 4
docs/mapred_tutorial.pdf


+ 10 - 10
docs/streaming.html

@@ -287,7 +287,7 @@ document.write("Last Published: " + document.lastModified);
 <h2 class="h3">Hadoop Streaming</h2>
 <div class="section">
 <p>
-Hadoop streaming is a utility that comes with the Hadoop distribution. The utility allows you to create and run map/reduce jobs with any executable or script as the mapper and/or the reducer. For example:
+Hadoop streaming is a utility that comes with the Hadoop distribution. The utility allows you to create and run Map/Reduce jobs with any executable or script as the mapper and/or the reducer. For example:
 </p>
 <pre class="code">
 $HADOOP_HOME/bin/hadoop  jar $HADOOP_HOME/hadoop-streaming.jar \
@@ -303,7 +303,7 @@ $HADOOP_HOME/bin/hadoop  jar $HADOOP_HOME/hadoop-streaming.jar \
 <h2 class="h3">How Does Streaming Work </h2>
 <div class="section">
 <p>
-In the above example, both the mapper and the reducer are executables that read the input from stdin (line by line) and emit the output to stdout. The utility will create a map/reduce job, submit the job to an appropriate cluster, and monitor the progress of the job until it completes.
+In the above example, both the mapper and the reducer are executables that read the input from stdin (line by line) and emit the output to stdout. The utility will create a Map/Reduce job, submit the job to an appropriate cluster, and monitor the progress of the job until it completes.
 </p>
 <p>
   When an executable is specified for mappers, each mapper task will launch the executable as a separate process when the mapper is initialized. As the mapper task runs, it converts its inputs into lines and feed the lines to the stdin of the process. In the meantime, the mapper collects the line oriented outputs from the stdout of the process and converts each line into a key/value pair, which is collected as the output of the mapper. By default, the 
@@ -314,7 +314,7 @@ In the above example, both the mapper and the reducer are executables that read
 When an executable is specified for reducers, each reducer task will launch the executable as a separate process then the reducer is initialized. As the reducer task runs, it converts its input key/values pairs into lines and feeds the lines to the stdin of the process. In the meantime, the reducer collects the line oriented outputs from the stdout of the process, converts each line into a key/value pair, which is collected as the output of the reducer. By default, the prefix of a line up to the first tab character is the key and the the rest of the line (excluding the tab character) is the value. However, this can be customized, as discussed later.
 </p>
 <p>
-This is the basis for the communication protocol between the map/reduce framework and the streaming mapper/reducer.
+This is the basis for the communication protocol between the Map/Reduce framework and the streaming mapper/reducer.
 </p>
 <p>
 You can supply a Java class as the mapper and/or the reducer. The above example is equivalent to:
@@ -372,7 +372,7 @@ $HADOOP_HOME/bin/hadoop  jar $HADOOP_HOME/hadoop-streaming.jar \
 <a name="N10077"></a><a name="Mapper-Only+Jobs"></a>
 <h3 class="h4">Mapper-Only Jobs </h3>
 <p>
-Often, you may want to process input data using a map function only. To do this, simply set mapred.reduce.tasks to zero. The map/reduce framework will not create any reducer tasks. Rather, the outputs of the mapper tasks will be the final output of the job.
+Often, you may want to process input data using a map function only. To do this, simply set mapred.reduce.tasks to zero. The Map/Reduce framework will not create any reducer tasks. Rather, the outputs of the mapper tasks will be the final output of the job.
 </p>
 <p>
 To be backward compatible, Hadoop Streaming also supports the "-reduce NONE" option, which is equivalent to "-jobconf mapred.reduce.tasks=0".
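The same mapred.reduce.tasks=0 setting can also be made from the Java API when writing a mapper-only job directly. A minimal sketch under that assumption (MapOnlyJob is an illustrative class name, not from this patch; mapper setup is omitted):

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class MapOnlyJob {
      public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(MapOnlyJob.class);
        conf.setNumReduceTasks(0);    // same effect as -jobconf mapred.reduce.tasks=0
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));
        JobClient.runJob(conf);       // map outputs go straight to the output directory
      }
    }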
@@ -380,7 +380,7 @@ To be backward compatible, Hadoop Streaming also supports the "-reduce NONE" opt
 <a name="N10083"></a><a name="Specifying+Other+Plugins+for+Jobs"></a>
 <h3 class="h4">Specifying Other Plugins for Jobs </h3>
 <p>
-Just as with a normal map/reduce job, you can specify other plugins for a streaming job:
+Just as with a normal Map/Reduce job, you can specify other plugins for a streaming job:
 </p>
 <pre class="code">
    -inputformat JavaClassName
@@ -500,7 +500,7 @@ Other options you may specify for a streaming job are described here:
 
 
 <tr>
-<td colspan="1" rowspan="1"> -dfs  host:port or local </td><td colspan="1" rowspan="1"> Optional </td><td colspan="1" rowspan="1"> Override the DFS configuration for the job </td>
+<td colspan="1" rowspan="1"> -dfs  host:port or local </td><td colspan="1" rowspan="1"> Optional </td><td colspan="1" rowspan="1"> Override the HDFS configuration for the job </td>
 </tr>
 
 <tr>
@@ -571,7 +571,7 @@ To set an environment variable in a streaming command use:
 <a name="N10194"></a><a name="Customizing+the+Way+to+Split+Lines+into+Key%2FValue+Pairs"></a>
 <h3 class="h4">Customizing the Way to Split Lines into Key/Value Pairs </h3>
 <p>
-As noted earlier, when the map/reduce framework reads a line from the stdout of the mapper, it splits the line into a key/value pair. By default, the prefix of the line up to the first tab character is the key and the the rest of the line (excluding the tab character) is the value.
+As noted earlier, when the Map/Reduce framework reads a line from the stdout of the mapper, it splits the line into a key/value pair. By default, the prefix of the line up to the first tab character is the key and the the rest of the line (excluding the tab character) is the value.
 </p>
 <p>
 However, you can customize this default. You can specify a field separator other than the tab character (the default), and you can specify the nth (n &gt;= 1) character rather than the first character in a line (the default) as the separator between the key and value. For example:
@@ -594,7 +594,7 @@ Similarly, you can use "-jobconf stream.reduce.output.field.separator=SEP" and "
 <a name="N101AA"></a><a name="A+Useful+Partitioner+Class+%28secondary+sort%2C+the+-partitioner+org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner+option%29"></a>
 <h3 class="h4">A Useful Partitioner Class (secondary sort, the -partitioner org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner option) </h3>
 <p>
-Hadoop has a library class, org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner, that is useful for many applications. This class allows the map/reduce framework to partition the map outputs based on prefixes of keys, not the whole keys. For example:
+Hadoop has a library class, org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner, that is useful for many applications. This class allows the Map/Reduce framework to partition the map outputs based on prefixes of keys, not the whole keys. For example:
 </p>
 <pre class="code">
 $HADOOP_HOME/bin/hadoop  jar $HADOOP_HOME/hadoop-streaming.jar \
@@ -613,7 +613,7 @@ $HADOOP_HOME/bin/hadoop  jar $HADOOP_HOME/hadoop-streaming.jar \
 Here, <em>-jobconf stream.map.output.field.separator=.</em> and <em>-jobconf stream.num.map.output.key.fields=4</em> are as explained in previous example. The two variables are used by streaming to identify the key/value pair of mapper. 
 </p>
 <p>
-The map output keys of the above map/reduce job normally have four fields separated by ".". However, the map/reduce framework will partition the map outputs by the first two fields of the keys using the <em>-jobconf num.key.fields.for.partition=2</em> option. Here, <em>-jobconf map.output.key.field.separator=.</em> specifies the separator for the partition. This guarantees that all the key/value pairs with the same first two fields in the keys will be partitioned into the same reducer.
+The map output keys of the above Map/Reduce job normally have four fields separated by ".". However, the Map/Reduce framework will partition the map outputs by the first two fields of the keys using the <em>-jobconf num.key.fields.for.partition=2</em> option. Here, <em>-jobconf map.output.key.field.separator=.</em> specifies the separator for the partition. This guarantees that all the key/value pairs with the same first two fields in the keys will be partitioned into the same reducer.
 </p>
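The same prefix-based partitioning can be configured from the Java API as well. A hedged sketch that mirrors the streaming options described above, using only the property names quoted in the text (PrefixPartitionedJob is an illustrative class name, not from this patch):

    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner;

    public class PrefixPartitionedJob {
      public static void main(String[] args) {
        JobConf conf = new JobConf(PrefixPartitionedJob.class);
        // Partition on the first two "."-separated fields of the map output key.
        conf.setPartitionerClass(KeyFieldBasedPartitioner.class);
        conf.set("map.output.key.field.separator", ".");
        conf.setInt("num.key.fields.for.partition", 2);
        // ... set mapper/reducer, input and output paths, then submit the job.
      }
    }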
 <p>
 
@@ -746,7 +746,7 @@ As an example, consider the problem of zipping (compressing) a set of files acro
 
 <li> Hadoop Streaming and custom mapper script:<ul>
   
-<li> Generate a file containing the full DFS path of the input files. Each map task would get one file name as input.</li>
+<li> Generate a file containing the full HDFS path of the input files. Each map task would get one file name as input.</li>
   
 <li> Create a mapper script which, given a filename, will get the file to local disk, gzip the file and put it back in the desired output directory</li>
 

File diff suppressed because it is too large
+ 1 - 1
docs/streaming.pdf


+ 37 - 37
src/docs/src/documentation/content/xdocs/mapred_tutorial.xml

@@ -20,7 +20,7 @@
 <document>
   
   <header>
-    <title>Hadoop Map-Reduce Tutorial</title>
+    <title>Hadoop Map/Reduce Tutorial</title>
   </header>
   
   <body>
@@ -29,7 +29,7 @@
       <title>Purpose</title>
       
       <p>This document comprehensively describes all user-facing facets of the 
-      Hadoop Map-Reduce framework and serves as a tutorial.
+      Hadoop Map/Reduce framework and serves as a tutorial.
       </p>
     </section>
     
@@ -52,12 +52,12 @@
     <section>
       <title>Overview</title>
       
-      <p>Hadoop Map-Reduce is a software framework for easily writing 
+      <p>Hadoop Map/Reduce is a software framework for easily writing 
       applications which process vast amounts of data (multi-terabyte data-sets) 
       in-parallel on large clusters (thousands of nodes) of commodity 
       hardware in a reliable, fault-tolerant manner.</p>
       
-      <p>A Map-Reduce <em>job</em> usually splits the input data-set into 
+      <p>A Map/Reduce <em>job</em> usually splits the input data-set into 
       independent chunks which are processed by the <em>map tasks</em> in a
       completely parallel manner. The framework sorts the outputs of the maps, 
       which are then input to the <em>reduce tasks</em>. Typically both the 
@@ -66,13 +66,13 @@
       tasks.</p>
       
       <p>Typically the compute nodes and the storage nodes are the same, that is, 
-      the Map-Reduce framework and the <a href="hdfs_design.html">Distributed 
+      the Map/Reduce framework and the <a href="hdfs_design.html">Distributed 
       FileSystem</a> are running on the same set of nodes. This configuration
       allows the framework to effectively schedule tasks on the nodes where data 
       is already present, resulting in very high aggregate bandwidth across the 
       cluster.</p>
       
-      <p>The Map-Reduce framework consists of a single master 
+      <p>The Map/Reduce framework consists of a single master 
       <code>JobTracker</code> and one slave <code>TaskTracker</code> per 
       cluster-node. The master is responsible for scheduling the jobs' component 
       tasks on the slaves, monitoring them and re-executing the failed tasks. The 
@@ -89,7 +89,7 @@
       information to the job-client.</p>
       
       <p>Although the Hadoop framework is implemented in Java<sup>TM</sup>, 
-      Map-Reduce applications need not be written in Java.</p>
+      Map/Reduce applications need not be written in Java.</p>
       <ul>
         <li>
           <a href="ext:api/org/apache/hadoop/streaming/package-summary">
@@ -100,7 +100,7 @@
         <li>
           <a href="ext:api/org/apache/hadoop/mapred/pipes/package-summary">
           Hadoop Pipes</a> is a <a href="http://www.swig.org/">SWIG</a>-
-          compatible <em>C++ API</em> to implement Map-Reduce applications (non 
+          compatible <em>C++ API</em> to implement Map/Reduce applications (non 
           JNI<sup>TM</sup> based).
         </li>
       </ul>
@@ -109,7 +109,7 @@
     <section>
       <title>Inputs and Outputs</title>
 
-      <p>The Map-Reduce framework operates exclusively on 
+      <p>The Map/Reduce framework operates exclusively on 
       <code>&lt;key, value&gt;</code> pairs, that is, the framework views the 
       input to the job as a set of <code>&lt;key, value&gt;</code> pairs and 
       produces a set of <code>&lt;key, value&gt;</code> pairs as the output of 
@@ -123,7 +123,7 @@
       WritableComparable</a> interface to facilitate sorting by the framework.
       </p>
 
-      <p>Input and Output types of a Map-Reduce job:</p>
+      <p>Input and Output types of a Map/Reduce job:</p>
       <p>
         (input) <code>&lt;k1, v1&gt;</code> 
         -&gt; 
@@ -144,7 +144,7 @@
     <section>
       <title>Example: WordCount v1.0</title>
       
-      <p>Before we jump into the details, lets walk through an example Map-Reduce 
+      <p>Before we jump into the details, lets walk through an example Map/Reduce 
       application to get a flavour for how they work.</p>
       
       <p><code>WordCount</code> is a simple application that counts the number of
@@ -683,10 +683,10 @@
     </section>
     
     <section>
-      <title>Map-Reduce - User Interfaces</title>
+      <title>Map/Reduce - User Interfaces</title>
       
       <p>This section provides a reasonable amount of detail on every user-facing 
-      aspect of the Map-Reduce framwork. This should help users implement, 
+      aspect of the Map/Reduce framwork. This should help users implement, 
       configure and tune their jobs in a fine-grained manner. However, please 
       note that the javadoc for each class/interface remains the most 
       comprehensive documentation available; this is only meant to be a tutorial.
@@ -724,7 +724,7 @@
           to be of the same type as the input records. A given input pair may 
           map to zero or many output pairs.</p> 
  
-          <p>The Hadoop Map-Reduce framework spawns one map task for each 
+          <p>The Hadoop Map/Reduce framework spawns one map task for each 
           <code>InputSplit</code> generated by the <code>InputFormat</code> for 
           the job.</p>
           
@@ -935,7 +935,7 @@
           <title>Reporter</title>
         
           <p><a href="ext:api/org/apache/hadoop/mapred/reporter">
-          Reporter</a> is a facility for Map-Reduce applications to report 
+          Reporter</a> is a facility for Map/Reduce applications to report 
           progress, set application-level status messages and update 
           <code>Counters</code>.</p>
 
@@ -958,12 +958,12 @@
         
           <p><a href="ext:api/org/apache/hadoop/mapred/outputcollector">
           OutputCollector</a> is a generalization of the facility provided by
-          the Map-Reduce framework to collect data output by the 
+          the Map/Reduce framework to collect data output by the 
           <code>Mapper</code> or the <code>Reducer</code> (either the 
           intermediate outputs or the output of the job).</p>
         </section>
       
-        <p>Hadoop Map-Reduce comes bundled with a 
+        <p>Hadoop Map/Reduce comes bundled with a 
         <a href="ext:api/org/apache/hadoop/mapred/lib/package-summary">
         library</a> of generally useful mappers, reducers, and partitioners.</p>
       </section>
@@ -972,10 +972,10 @@
         <title>Job Configuration</title>
         
         <p><a href="ext:api/org/apache/hadoop/mapred/jobconf">
-        JobConf</a> represents a Map-Reduce job configuration.</p>
+        JobConf</a> represents a Map/Reduce job configuration.</p>
 
         <p><code>JobConf</code> is the primary interface for a user to describe
-        a map-reduce job to the Hadoop framework for execution. The framework 
+        a Map/Reduce job to the Hadoop framework for execution. The framework 
         tries to faithfully execute the job as described by <code>JobConf</code>, 
         however:</p> 
         <ul>
@@ -1204,7 +1204,7 @@
         with the <code>JobTracker</code>.</p>
 
         <p><code>JobClient</code> provides facilities to submit jobs, track their 
-        progress, access component-tasks' reports/logs, get the Map-Reduce 
+        progress, access component-tasks' reports and logs, get the Map/Reduce 
         cluster's status information and so on.</p>
 
         <p>The job submission process involves:</p>
@@ -1216,7 +1216,7 @@
             <code>DistributedCache</code> of the job, if necessary.
           </li>
           <li>
-            Copying the job's jar and configuration to the map-reduce system 
+            Copying the job's jar and configuration to the Map/Reduce system 
             directory on the <code>FileSystem</code>.
           </li>
           <li>
@@ -1253,8 +1253,8 @@
         <section>
           <title>Job Control</title>
 
-          <p>Users may need to chain map-reduce jobs to accomplish complex
-          tasks which cannot be done via a single map-reduce job. This is fairly
+          <p>Users may need to chain Map/Reduce jobs to accomplish complex
+          tasks which cannot be done via a single Map/Reduce job. This is fairly
           easy since the output of the job typically goes to distributed 
           file-system, and the output, in turn, can be used as the input for the 
           next job.</p>
@@ -1288,10 +1288,10 @@
         <title>Job Input</title>
         
         <p><a href="ext:api/org/apache/hadoop/mapred/inputformat">
-        InputFormat</a> describes the input-specification for a Map-Reduce job.
+        InputFormat</a> describes the input-specification for a Map/Reduce job.
         </p> 
 
-        <p>The Map-Reduce framework relies on the <code>InputFormat</code> of 
+        <p>The Map/Reduce framework relies on the <code>InputFormat</code> of 
         the job to:</p>
         <ol>
           <li>Validate the input-specification of the job.</li>
@@ -1370,10 +1370,10 @@
         <title>Job Output</title>
         <title>Job Output</title>
         
         
         <p><a href="ext:api/org/apache/hadoop/mapred/outputformat">
         <p><a href="ext:api/org/apache/hadoop/mapred/outputformat">
-        OutputFormat</a> describes the output-specification for a Map-Reduce 
+        OutputFormat</a> describes the output-specification for a Map/Reduce 
         job.</p>
         job.</p>
 
 
-        <p>The Map-Reduce framework relies on the <code>OutputFormat</code> of 
+        <p>The Map/Reduce framework relies on the <code>OutputFormat</code> of 
         the job to:</p>
         the job to:</p>
         <ol>
         <ol>
           <li>
           <li>
@@ -1404,7 +1404,7 @@
           (using the attemptid, say <code>attempt_200709221812_0001_m_000000_0</code>), 
           (using the attemptid, say <code>attempt_200709221812_0001_m_000000_0</code>), 
           not just per task.</p> 
           not just per task.</p> 
  
  
-          <p>To avoid these issues the Map-Reduce framework maintains a special 
+          <p>To avoid these issues the Map/Reduce framework maintains a special 
           <code>${mapred.output.dir}/_temporary/_${taskid}</code> sub-directory
           <code>${mapred.output.dir}/_temporary/_${taskid}</code> sub-directory
           accessible via <code>${mapred.work.output.dir}</code>
           accessible via <code>${mapred.work.output.dir}</code>
           for each task-attempt on the <code>FileSystem</code> where the output
           for each task-attempt on the <code>FileSystem</code> where the output
@@ -1426,7 +1426,7 @@
           <p>Note: The value of <code>${mapred.work.output.dir}</code> during 
           <p>Note: The value of <code>${mapred.work.output.dir}</code> during 
           execution of a particular task-attempt is actually 
           execution of a particular task-attempt is actually 
          <code>${mapred.output.dir}/_temporary/_${taskid}</code>, and this value is 
          <code>${mapred.output.dir}/_temporary/_${taskid}</code>, and this value is 
-          set by the map-reduce framework. So, just create any side-files in the 
+          set by the Map/Reduce framework. So, just create any side-files in the 
           path  returned by
           path  returned by
           <a href="ext:api/org/apache/hadoop/mapred/fileoutputformat/getworkoutputpath">
           <a href="ext:api/org/apache/hadoop/mapred/fileoutputformat/getworkoutputpath">
           FileOutputFormat.getWorkOutputPath() </a>from map/reduce 
           FileOutputFormat.getWorkOutputPath() </a>from map/reduce 
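           [Editorial sketch, not part of this patch: one way to write such a side-file from inside a task. The class name SideFileHelper and the file name side-data.txt are placeholders.]

           import java.io.IOException;
           import org.apache.hadoop.fs.FSDataOutputStream;
           import org.apache.hadoop.fs.FileSystem;
           import org.apache.hadoop.fs.Path;
           import org.apache.hadoop.mapred.FileOutputFormat;
           import org.apache.hadoop.mapred.JobConf;

           public class SideFileHelper {
             // Call from within a map or reduce task, passing the task's JobConf.
             public static void writeSideFile(JobConf job) throws IOException {
               // getWorkOutputPath resolves to ${mapred.output.dir}/_temporary/_${taskid}
               // while the task runs, so the file is promoted to the job output
               // directory only if this task-attempt succeeds.
               Path workDir = FileOutputFormat.getWorkOutputPath(job);
               FileSystem fs = workDir.getFileSystem(job);
               FSDataOutputStream out = fs.create(new Path(workDir, "side-data.txt"));
               out.writeBytes("extra output\n");
               out.close();
             }
           }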
@@ -1456,7 +1456,7 @@
           <title>Counters</title>
           <title>Counters</title>
           
           
           <p><code>Counters</code> represent global counters, defined either by 
           <p><code>Counters</code> represent global counters, defined either by 
-          the Map-Reduce framework or applications. Each <code>Counter</code> can 
+          the Map/Reduce framework or applications. Each <code>Counter</code> can 
           be of any <code>Enum</code> type. Counters of a particular 
           be of any <code>Enum</code> type. Counters of a particular 
           <code>Enum</code> are bunched into groups of type 
           <code>Enum</code> are bunched into groups of type 
           <code>Counters.Group</code>.</p>
           <code>Counters.Group</code>.</p>
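           [Editorial sketch, not part of this patch: a small mapper with an application-defined counter. The enum and class names are invented for illustration.]

           import java.io.IOException;
           import org.apache.hadoop.io.LongWritable;
           import org.apache.hadoop.io.Text;
           import org.apache.hadoop.mapred.MapReduceBase;
           import org.apache.hadoop.mapred.Mapper;
           import org.apache.hadoop.mapred.OutputCollector;
           import org.apache.hadoop.mapred.Reporter;

           public class CountingMapper extends MapReduceBase
               implements Mapper<LongWritable, Text, Text, LongWritable> {

             // Application-defined counter group; each enum value is one counter.
             enum Records { PROCESSED, MALFORMED }

             public void map(LongWritable key, Text value,
                             OutputCollector<Text, LongWritable> output, Reporter reporter)
                 throws IOException {
               if (value.toString().length() == 0) {
                 reporter.incrCounter(Records.MALFORMED, 1);   // reported under group "Records"
                 return;
               }
               reporter.incrCounter(Records.PROCESSED, 1);
               output.collect(value, new LongWritable(1));
             }
           }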
@@ -1480,7 +1480,7 @@
           files efficiently.</p>
           files efficiently.</p>
  
  
           <p><code>DistributedCache</code> is a facility provided by the 
           <p><code>DistributedCache</code> is a facility provided by the 
-          Map-Reduce framework to cache files (text, archives, jars and so on) 
+          Map/Reduce framework to cache files (text, archives, jars and so on) 
           needed by applications.</p>
           needed by applications.</p>
  
  
           <p>Applications specify the files to be cached via urls (hdfs://)
           <p>Applications specify the files to be cached via urls (hdfs://)
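           [Editorial sketch, not part of this patch: registering a cached file at submission time and locating its local copy from a task. The HDFS URI, file name and class name are placeholders.]

           import java.io.IOException;
           import java.net.URI;
           import java.net.URISyntaxException;
           import org.apache.hadoop.filecache.DistributedCache;
           import org.apache.hadoop.fs.Path;
           import org.apache.hadoop.mapred.JobConf;

           public class CacheSketch {
             // Job-submission side: register a file to be shipped to every task.
             public static void addLookupFile(JobConf conf) throws URISyntaxException {
               DistributedCache.addCacheFile(
                   new URI("hdfs://namenode:9000/lookup/terms.txt"), conf);
             }

             // Task side (e.g. from Mapper.configure): find the local copy.
             public static Path findLookupFile(JobConf job) throws IOException {
               Path[] localFiles = DistributedCache.getLocalCacheFiles(job);
               return localFiles[0];   // tasktracker-local path of terms.txt
             }
           }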
@@ -1558,7 +1558,7 @@
           interface supports the handling of generic Hadoop command-line options.
           interface supports the handling of generic Hadoop command-line options.
           </p>
           </p>
           
           
-          <p><code>Tool</code> is the standard for any Map-Reduce tool or 
+          <p><code>Tool</code> is the standard for any Map/Reduce tool or 
           application. The application should delegate the handling of 
           application. The application should delegate the handling of 
           standard command-line options to 
           standard command-line options to 
           <a href="ext:api/org/apache/hadoop/util/genericoptionsparser">
           <a href="ext:api/org/apache/hadoop/util/genericoptionsparser">
@@ -1591,7 +1591,7 @@
           <title>IsolationRunner</title>
           <title>IsolationRunner</title>
           
           
           <p><a href="ext:api/org/apache/hadoop/mapred/isolationrunner">
           <p><a href="ext:api/org/apache/hadoop/mapred/isolationrunner">
-          IsolationRunner</a> is a utility to help debug Map-Reduce programs.</p>
+          IsolationRunner</a> is a utility to help debug Map/Reduce programs.</p>
           
           
           <p>To use the <code>IsolationRunner</code>, first set 
           <p>To use the <code>IsolationRunner</code>, first set 
           <code>keep.failed.tasks.files</code> to <code>true</code> 
           <code>keep.failed.tasks.files</code> to <code>true</code> 
@@ -1703,14 +1703,14 @@
           <title>JobControl</title>
           <title>JobControl</title>
           
           
           <p><a href="ext:api/org/apache/hadoop/mapred/jobcontrol/package-summary">
           <p><a href="ext:api/org/apache/hadoop/mapred/jobcontrol/package-summary">
-          JobControl</a> is a utility which encapsulates a set of Map-Reduce jobs
+          JobControl</a> is a utility which encapsulates a set of Map/Reduce jobs
           and their dependencies.</p>
           and their dependencies.</p>
         </section>
         </section>
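           [Editorial sketch, not part of this patch: a rough illustration of two dependent jobs run through JobControl. The method and group names are placeholders, and the two JobConf arguments are assumed to be fully configured elsewhere.]

           import org.apache.hadoop.mapred.JobConf;
           import org.apache.hadoop.mapred.jobcontrol.Job;
           import org.apache.hadoop.mapred.jobcontrol.JobControl;

           public class PipelineRunner {
             public static void runPipeline(JobConf conf1, JobConf conf2) throws Exception {
               Job step1 = new Job(conf1);
               Job step2 = new Job(conf2);
               step2.addDependingJob(step1);          // step2 starts only after step1 succeeds

               JobControl control = new JobControl("pipeline");
               control.addJob(step1);
               control.addJob(step2);

               Thread runner = new Thread(control);   // JobControl implements Runnable
               runner.start();
               while (!control.allFinished()) {       // poll until every job succeeds or fails
                 Thread.sleep(5000);
               }
               control.stop();
             }
           }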
         
         
         <section>
         <section>
           <title>Data Compression</title>
           <title>Data Compression</title>
           
           
-          <p>Hadoop Map-Reduce provides facilities for the application-writer to
+          <p>Hadoop Map/Reduce provides facilities for the application-writer to
           specify compression for both intermediate map-outputs and the
           specify compression for both intermediate map-outputs and the
          job-outputs, i.e. the output of the reduces. It also comes bundled with
          job-outputs, i.e. the output of the reduces. It also comes bundled with
           <a href="ext:api/org/apache/hadoop/io/compress/compressioncodec">
           <a href="ext:api/org/apache/hadoop/io/compress/compressioncodec">
@@ -1765,7 +1765,7 @@
       <title>Example: WordCount v2.0</title>
       <title>Example: WordCount v2.0</title>
       
       
       <p>Here is a more complete <code>WordCount</code> which uses many of the
       <p>Here is a more complete <code>WordCount</code> which uses many of the
-      features provided by the Map-Reduce framework we discussed so far.</p>
+      features provided by the Map/Reduce framework we discussed so far.</p>
       
       
       <p>This needs the HDFS to be up and running, especially for the 
       <p>This needs the HDFS to be up and running, especially for the 
       <code>DistributedCache</code>-related features. Hence it only works with a 
       <code>DistributedCache</code>-related features. Hence it only works with a 
@@ -2717,7 +2717,7 @@
         <title>Highlights</title>
         <title>Highlights</title>
         
         
         <p>The second version of <code>WordCount</code> improves upon the 
         <p>The second version of <code>WordCount</code> improves upon the 
-        previous one by using some features offered by the Map-Reduce framework:
+        previous one by using some features offered by the Map/Reduce framework:
         </p>
         </p>
         <ul>
         <ul>
           <li>
           <li>

+ 10 - 10
src/docs/src/documentation/content/xdocs/streaming.xml

@@ -31,7 +31,7 @@
 <title>Hadoop Streaming</title>
 <title>Hadoop Streaming</title>
 
 
 <p>
 <p>
-Hadoop streaming is a utility that comes with the Hadoop distribution. The utility allows you to create and run map/reduce jobs with any executable or script as the mapper and/or the reducer. For example:
+Hadoop streaming is a utility that comes with the Hadoop distribution. The utility allows you to create and run Map/Reduce jobs with any executable or script as the mapper and/or the reducer. For example:
 </p>
 </p>
 <source>
 <source>
 $HADOOP_HOME/bin/hadoop  jar $HADOOP_HOME/hadoop-streaming.jar \
 $HADOOP_HOME/bin/hadoop  jar $HADOOP_HOME/hadoop-streaming.jar \
@@ -45,7 +45,7 @@ $HADOOP_HOME/bin/hadoop  jar $HADOOP_HOME/hadoop-streaming.jar \
 <section>
 <section>
 <title>How Does Streaming Work </title>
 <title>How Does Streaming Work </title>
 <p>
 <p>
-In the above example, both the mapper and the reducer are executables that read the input from stdin (line by line) and emit the output to stdout. The utility will create a map/reduce job, submit the job to an appropriate cluster, and monitor the progress of the job until it completes.
+In the above example, both the mapper and the reducer are executables that read the input from stdin (line by line) and emit the output to stdout. The utility will create a Map/Reduce job, submit the job to an appropriate cluster, and monitor the progress of the job until it completes.
 </p><p>
 </p><p>
  When an executable is specified for mappers, each mapper task will launch the executable as a separate process when the mapper is initialized. As the mapper task runs, it converts its inputs into lines and feeds the lines to the stdin of the process. In the meantime, the mapper collects the line-oriented outputs from the stdout of the process and converts each line into a key/value pair, which is collected as the output of the mapper. By default, the 
  When an executable is specified for mappers, each mapper task will launch the executable as a separate process when the mapper is initialized. As the mapper task runs, it converts its inputs into lines and feeds the lines to the stdin of the process. In the meantime, the mapper collects the line-oriented outputs from the stdout of the process and converts each line into a key/value pair, which is collected as the output of the mapper. By default, the 
  <em>prefix of a line up to the first tab character</em> is the <strong>key</strong> and the rest of the line (excluding the tab character) will be the <strong>value</strong>. 
  <em>prefix of a line up to the first tab character</em> is the <strong>key</strong> and the rest of the line (excluding the tab character) will be the <strong>value</strong>. 
@@ -54,7 +54,7 @@ In the above example, both the mapper and the reducer are executables that read
 <p>
 <p>
 When an executable is specified for reducers, each reducer task will launch the executable as a separate process when the reducer is initialized. As the reducer task runs, it converts its input key/value pairs into lines and feeds the lines to the stdin of the process. In the meantime, the reducer collects the line-oriented outputs from the stdout of the process, converts each line into a key/value pair, which is collected as the output of the reducer. By default, the prefix of a line up to the first tab character is the key and the rest of the line (excluding the tab character) is the value. However, this can be customized, as discussed later.
 When an executable is specified for reducers, each reducer task will launch the executable as a separate process when the reducer is initialized. As the reducer task runs, it converts its input key/value pairs into lines and feeds the lines to the stdin of the process. In the meantime, the reducer collects the line-oriented outputs from the stdout of the process, converts each line into a key/value pair, which is collected as the output of the reducer. By default, the prefix of a line up to the first tab character is the key and the rest of the line (excluding the tab character) is the value. However, this can be customized, as discussed later.
 </p><p>
 </p><p>
-This is the basis for the communication protocol between the map/reduce framework and the streaming mapper/reducer.
+This is the basis for the communication protocol between the Map/Reduce framework and the streaming mapper/reducer.
 </p><p>
 </p><p>
 You can supply a Java class as the mapper and/or the reducer. The above example is equivalent to:
 You can supply a Java class as the mapper and/or the reducer. The above example is equivalent to:
 </p>
 </p>
@@ -109,7 +109,7 @@ $HADOOP_HOME/bin/hadoop  jar $HADOOP_HOME/hadoop-streaming.jar \
 <section>
 <section>
 <title>Mapper-Only Jobs </title>
 <title>Mapper-Only Jobs </title>
 <p>
 <p>
-Often, you may want to process input data using a map function only. To do this, simply set mapred.reduce.tasks to zero. The map/reduce framework will not create any reducer tasks. Rather, the outputs of the mapper tasks will be the final output of the job.
+Often, you may want to process input data using a map function only. To do this, simply set mapred.reduce.tasks to zero. The Map/Reduce framework will not create any reducer tasks. Rather, the outputs of the mapper tasks will be the final output of the job.
 </p><p>
 </p><p>
 To be backward compatible, Hadoop Streaming also supports the "-reduce NONE" option, which is equivalent to "-jobconf mapred.reduce.tasks=0".
 To be backward compatible, Hadoop Streaming also supports the "-reduce NONE" option, which is equivalent to "-jobconf mapred.reduce.tasks=0".
 </p>
 </p>
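 [Editorial sketch, not part of this patch: the Java-API equivalent of "-jobconf mapred.reduce.tasks=0", assuming a JobConf named conf configured elsewhere.]

   // With zero reduces, the sort/shuffle phase is skipped and the map output
   // is written directly to the job's output directory.
   conf.setNumReduceTasks(0);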
@@ -118,7 +118,7 @@ To be backward compatible, Hadoop Streaming also supports the "-reduce NONE" opt
 <section>
 <section>
 <title>Specifying Other Plugins for Jobs </title>
 <title>Specifying Other Plugins for Jobs </title>
 <p>
 <p>
-Just as with a normal map/reduce job, you can specify other plugins for a streaming job:
+Just as with a normal Map/Reduce job, you can specify other plugins for a streaming job:
 </p>
 </p>
 <source>
 <source>
    -inputformat JavaClassName
    -inputformat JavaClassName
@@ -235,7 +235,7 @@ Other options you may specify for a streaming job are described here:
 <tr><th>Parameter</th><th>Optional/Required </th><th>Description </th></tr>
 <tr><th>Parameter</th><th>Optional/Required </th><th>Description </th></tr>
 <tr><td> -cluster name </td><td> Optional </td><td> Switch between local Hadoop and one or more remote clusters </td></tr>
 <tr><td> -cluster name </td><td> Optional </td><td> Switch between local Hadoop and one or more remote clusters </td></tr>
 
 
-<tr><td> -dfs  host:port or local </td><td> Optional </td><td> Override the DFS configuration for the job </td></tr>
+<tr><td> -dfs  host:port or local </td><td> Optional </td><td> Override the HDFS configuration for the job </td></tr>
 <tr><td> -jt host:port or local </td><td> Optional </td><td> Override the JobTracker configuration for the job </td></tr>
 <tr><td> -jt host:port or local </td><td> Optional </td><td> Override the JobTracker configuration for the job </td></tr>
 <tr><td> -additionalconfspec specfile </td><td> Optional </td><td> Specifies a set of configuration variables in an XML file like hadoop-site.xml, instead of using multiple  options of type "-jobconf name=value" </td></tr>
 <tr><td> -additionalconfspec specfile </td><td> Optional </td><td> Specifies a set of configuration variables in an XML file like hadoop-site.xml, instead of using multiple  options of type "-jobconf name=value" </td></tr>
 
 
@@ -282,7 +282,7 @@ To set an environment variable in a streaming command use:
 <section>
 <section>
 <title>Customizing the Way to Split Lines into Key/Value Pairs </title>
 <title>Customizing the Way to Split Lines into Key/Value Pairs </title>
 <p>
 <p>
-As noted earlier, when the map/reduce framework reads a line from the stdout of the mapper, it splits the line into a key/value pair. By default, the prefix of the line up to the first tab character is the key and the the rest of the line (excluding the tab character) is the value.
+As noted earlier, when the Map/Reduce framework reads a line from the stdout of the mapper, it splits the line into a key/value pair. By default, the prefix of the line up to the first tab character is the key and the rest of the line (excluding the tab character) is the value.
 </p>
 </p>
 <p>
 <p>
 However, you can customize this default. You can specify a field separator other than the tab character (the default), and you can specify the nth (n >= 1) character rather than the first character in a line (the default) as the separator between the key and value. For example:
 However, you can customize this default. You can specify a field separator other than the tab character (the default), and you can specify the nth (n >= 1) character rather than the first character in a line (the default) as the separator between the key and value. For example:
@@ -308,7 +308,7 @@ Similarly, you can use "-jobconf stream.reduce.output.field.separator=SEP" and "
 <section>
 <section>
 <title>A Useful Partitioner Class (secondary sort, the -partitioner org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner option) </title>
 <title>A Useful Partitioner Class (secondary sort, the -partitioner org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner option) </title>
 <p>
 <p>
-Hadoop has a library class, org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner, that is useful for many applications. This class allows the map/reduce framework to partition the map outputs based on prefixes of keys, not the whole keys. For example:
+Hadoop has a library class, org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner, that is useful for many applications. This class allows the Map/Reduce framework to partition the map outputs based on prefixes of keys, not the whole keys. For example:
 </p>
 </p>
 <source>
 <source>
 $HADOOP_HOME/bin/hadoop  jar $HADOOP_HOME/hadoop-streaming.jar \
 $HADOOP_HOME/bin/hadoop  jar $HADOOP_HOME/hadoop-streaming.jar \
@@ -326,7 +326,7 @@ $HADOOP_HOME/bin/hadoop  jar $HADOOP_HOME/hadoop-streaming.jar \
 <p>
 <p>
 Here, <em>-jobconf stream.map.output.field.separator=.</em> and <em>-jobconf stream.num.map.output.key.fields=4</em> are as explained in the previous example. The two variables are used by streaming to identify the key/value pair of the mapper. 
 Here, <em>-jobconf stream.map.output.field.separator=.</em> and <em>-jobconf stream.num.map.output.key.fields=4</em> are as explained in the previous example. The two variables are used by streaming to identify the key/value pair of the mapper. 
 </p><p>
 </p><p>
-The map output keys of the above map/reduce job normally have four fields separated by ".". However, the map/reduce framework will partition the map outputs by the first two fields of the keys using the <em>-jobconf num.key.fields.for.partition=2</em> option. Here, <em>-jobconf map.output.key.field.separator=.</em> specifies the separator for the partition. This guarantees that all the key/value pairs with the same first two fields in the keys will be partitioned into the same reducer.
+The map output keys of the above Map/Reduce job normally have four fields separated by ".". However, the Map/Reduce framework will partition the map outputs by the first two fields of the keys using the <em>-jobconf num.key.fields.for.partition=2</em> option. Here, <em>-jobconf map.output.key.field.separator=.</em> specifies the separator for the partition. This guarantees that all the key/value pairs with the same first two fields in the keys will be partitioned into the same reducer.
 </p><p>
 </p><p>
 <em>This is effectively equivalent to specifying the first two fields as the primary key and the next two fields as the secondary. The primary key is used for partitioning, and the combination of the primary and secondary keys is used for sorting.</em> A simple illustration is shown here:
 <em>This is effectively equivalent to specifying the first two fields as the primary key and the next two fields as the secondary. The primary key is used for partitioning, and the combination of the primary and secondary keys is used for sorting.</em> A simple illustration is shown here:
 </p>
 </p>
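 [Editorial sketch, not part of this patch: configuring the same partitioning through the Java API. The property names mirror the streaming options above; the class name PartitionSetup is a placeholder.]

   import org.apache.hadoop.mapred.JobConf;
   import org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner;

   public class PartitionSetup {
     public static void configure(JobConf conf) {
       conf.setPartitionerClass(KeyFieldBasedPartitioner.class);
       conf.set("map.output.key.field.separator", ".");  // separator between key fields
       conf.setInt("num.key.fields.for.partition", 2);   // partition on the first two fields
     }
   }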
@@ -456,7 +456,7 @@ Often you do not need the full power of Map Reduce, but only need to run multipl
 As an example, consider the problem of zipping (compressing) a set of files across the Hadoop cluster. You can achieve this using either of these methods:
 As an example, consider the problem of zipping (compressing) a set of files across the Hadoop cluster. You can achieve this using either of these methods:
 </p><ol>
 </p><ol>
 <li> Hadoop Streaming and custom mapper script:<ul>
 <li> Hadoop Streaming and custom mapper script:<ul>
-  <li> Generate a file containing the full DFS path of the input files. Each map task would get one file name as input.</li>
+  <li> Generate a file containing the full HDFS path of the input files. Each map task would get one file name as input.</li>
   <li> Create a mapper script which, given a filename, will copy the file to the local disk, gzip it, and put it back in the desired output directory</li>
   <li> Create a mapper script which, given a filename, will copy the file to the local disk, gzip it, and put it back in the desired output directory</li>
 </ul></li>
 </ul></li>
 <li>The existing Hadoop Framework:<ul>
 <li>The existing Hadoop Framework:<ul>

部分文件因文件數量過多而無法顯示