Browse Source

HADOOP-4726. Fix documentation typos "the the". (Edward J. Yoon via szetszwo)

git-svn-id: https://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.18@722587 13f79535-47bb-0310-9956-ffa450edef68
Tsz-wo Sze 16 years ago
parent
commit
4f51d458ff
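This commit removes repeated "the the" typos across the docs, config descriptions, and generated Javadoc. As an illustrative sketch only (not necessarily the tool the committer used), doubled words like these can be located mechanically:

```python
import re

def find_doubled_words(text):
    """Return (line_number, word) pairs for immediately repeated words,
    such as the "the the" typos fixed in this commit."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        # \b(\w+)\s+\1\b matches a word immediately followed by itself.
        for match in re.finditer(r"\b(\w+)\s+\1\b", line, re.IGNORECASE):
            hits.append((lineno, match.group(1)))
    return hits

sample = "The information is stored in the the user log directory."
print(find_doubled_words(sample))  # [(1, 'the')]
```

Running this over the `docs/` and `src/` trees would surface the same occurrences this commit touches.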

+ 3 - 0
CHANGES.txt

@@ -60,6 +60,9 @@ Release 0.18.3 - Unreleased
     HADOOP-4714. Report status between merges and make the number of records
     between progress reports configurable. (Jothi Padmanabhan via cdouglas)
 
+    HADOOP-4726. Fix documentation typos "the the". (Edward J. Yoon via
+    szetszwo)
+
 Release 0.18.2 - 2008-11-03
 
   BUG FIXES

+ 1 - 1
conf/hadoop-default.xml

@@ -1059,7 +1059,7 @@ creations/deletions), or "all".</description>
     <value>false</value>
     <description>To set whether the system should collect profiler
      information for some of the tasks in this job? The information is stored
-     in the the user log directory. The value is "true" if task profiling
+     in the user log directory. The value is "true" if task profiling
      is enabled.</description>
   </property>
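The corrected `mapred.task.profile` description above documents a boolean switch. A hypothetical site-level override enabling it might look like this (property name taken from the diff itself; placement in `hadoop-site.xml` is the usual convention, assumed here):

```xml
<!-- Illustrative hadoop-site.xml fragment: turn on task profiling.
     The profiler output lands in the user log directory, as the
     corrected description above states. -->
<property>
  <name>mapred.task.profile</name>
  <value>true</value>
</property>
```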
 

+ 52 - 5
docs/changes.html

@@ -36,7 +36,7 @@
     function collapse() {
       for (var i = 0; i < document.getElementsByTagName("ul").length; i++) {
         var list = document.getElementsByTagName("ul")[i];
-        if (list.id != 'release_0.18.2_-_2008-11-03_' && list.id != 'release_0.18.1_-_2008-09-17_') {
+        if (list.id != 'release_0.18.3_-_unreleased_' && list.id != 'release_0.18.2_-_2008-11-03_') {
           list.style.display = "none";
         }
       }
@@ -52,6 +52,53 @@
 <a href="http://hadoop.apache.org/core/"><img class="logoImage" alt="Hadoop" src="images/hadoop-logo.jpg" title="Scalable Computing Platform"></a>
 <h1>Hadoop Change Log</h1>
 
+<h2><a href="javascript:toggleList('release_0.18.3_-_unreleased_')">Release 0.18.3 - Unreleased
+</a></h2>
+<ul id="release_0.18.3_-_unreleased_">
+  <li><a href="javascript:toggleList('release_0.18.3_-_unreleased_._improvements_')">  IMPROVEMENTS
+</a>&nbsp;&nbsp;&nbsp;(1)
+    <ol id="release_0.18.3_-_unreleased_._improvements_">
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-4150">HADOOP-4150</a>. Include librecordio in hadoop releases.<br />(Giridharan Kesavan
+via acmurthy)</li>
+    </ol>
+  </li>
+  <li><a href="javascript:toggleList('release_0.18.3_-_unreleased_._bug_fixes_')">  BUG FIXES
+</a>&nbsp;&nbsp;&nbsp;(19)
+    <ol id="release_0.18.3_-_unreleased_._bug_fixes_">
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-4499">HADOOP-4499</a>. DFSClient should invoke checksumOk only once.<br />(Raghu Angadi)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-4597">HADOOP-4597</a>. Calculate mis-replicated blocks when safe-mode is turned
+off manually.<br />(shv)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3121">HADOOP-3121</a>. lsr should keep listing the remaining items but not
+terminate if there is any IOException.<br />(szetszwo)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-4610">HADOOP-4610</a>. Always calculate mis-replicated blocks when safe-mode is
+turned off.<br />(shv)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3883">HADOOP-3883</a>. Limit namenode to assign at most one generation stamp for
+a particular block within a short period.<br />(szetszwo)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-4556">HADOOP-4556</a>. Block went missing.<br />(hairong)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-4643">HADOOP-4643</a>. NameNode should exclude excessive replicas when counting
+live replicas for a block.<br />(hairong)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-4703">HADOOP-4703</a>. Should not wait for proxy forever in lease recovering.<br />(szetszwo)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-4647">HADOOP-4647</a>. NamenodeFsck should close the DFSClient it has created.<br />(szetszwo)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-4616">HADOOP-4616</a>. Fuse-dfs can handle bad values from FileSystem.read call.<br />(Pete Wyckoff via dhruba)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-4061">HADOOP-4061</a>. Throttle Datanode decommission monitoring in Namenode.<br />(szetszwo)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-4659">HADOOP-4659</a>. Root cause of connection failure is being ost to code that
+uses it for delaying startup.<br />(Steve Loughran and Hairong via hairong)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-4614">HADOOP-4614</a>. Lazily open segments when merging map spills to avoid using
+too many file descriptors.<br />(Yuri Pradkin via cdouglas)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-4542">HADOOP-4542</a>. TestDistributedUpgrade used succeed for wrong reasons.<br />(Raghu Angadi)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-4257">HADOOP-4257</a>. The DFS client should pick only one datanode as the candidate
+to initiate lease recovery.  (Tsz Wo (Nicholas), SZE via dhruba)
+</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-4713">HADOOP-4713</a>. Fix librecordio to handle records larger than 64k.<br />(Christian
+Kunz via cdouglas)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-4635">HADOOP-4635</a>. Fix a memory leak in fuse dfs.<br />(pete wyckoff via mahadev)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-4714">HADOOP-4714</a>. Report status between merges and make the number of records
+between progress reports configurable.<br />(Jothi Padmanabhan via cdouglas)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-4726">HADOOP-4726</a>. Fix documentation typos "the the".<br />(Edward J. Yoon via
+szetszwo)</li>
+    </ol>
+  </li>
+</ul>
 <h2><a href="javascript:toggleList('release_0.18.2_-_2008-11-03_')">Release 0.18.2 - 2008-11-03
 </a></h2>
 <ul id="release_0.18.2_-_2008-11-03_">
@@ -96,8 +143,10 @@ changes from the prior release.<br />(cutting)</li>
     </ol>
   </li>
 </ul>
-<h2><a href="javascript:toggleList('release_0.18.1_-_2008-09-17_')">Release 0.18.1 - 2008-09-17
-</a></h2>
+<h2><a href="javascript:toggleList('older')">Older Releases</a></h2>
+<ul id="older">
+<h3><a href="javascript:toggleList('release_0.18.1_-_2008-09-17_')">Release 0.18.1 - 2008-09-17
+</a></h3>
 <ul id="release_0.18.1_-_2008-09-17_">
   <li><a href="javascript:toggleList('release_0.18.1_-_2008-09-17_._improvements_')">  IMPROVEMENTS
 </a>&nbsp;&nbsp;&nbsp;(1)
@@ -122,8 +171,6 @@ outputs or when the final map outputs are being fetched without contention.<br /
     </ol>
   </li>
 </ul>
-<h2><a href="javascript:toggleList('older')">Older Releases</a></h2>
-<ul id="older">
 <h3><a href="javascript:toggleList('release_0.18.0_-_2008-08-19_')">Release 0.18.0 - 2008-08-19
 </a></h3>
 <ul id="release_0.18.0_-_2008-08-19_">

+ 1 - 1
docs/cluster_setup.html

@@ -317,7 +317,7 @@ document.write("Last Published: " + document.lastModified);
         values via the <span class="codefrag">conf/hadoop-env.sh</span>.</p>
 <a name="N10097"></a><a name="Site+Configuration"></a>
 <h3 class="h4">Site Configuration</h3>
-<p>To configure the the Hadoop cluster you will need to configure the
+<p>To configure the Hadoop cluster you will need to configure the
         <em>environment</em> in which the Hadoop daemons execute as well as
         the <em>configuration parameters</em> for the Hadoop daemons.</p>
 <p>The Hadoop daemons are <span class="codefrag">NameNode</span>/<span class="codefrag">DataNode</span> 

File diff suppressed because it is too large
+ 1 - 1
docs/cluster_setup.pdf


+ 11 - 2
docs/hadoop-default.html

@@ -328,7 +328,11 @@ creations/deletions), or "all".</td>
   </td>
 </tr>
 <tr>
-<td><a name="dfs.namenode.decommission.interval">dfs.namenode.decommission.interval</a></td><td>300</td><td>Namenode periodicity in seconds to check if decommission is complete.</td>
+<td><a name="dfs.namenode.decommission.interval">dfs.namenode.decommission.interval</a></td><td>30</td><td>Namenode periodicity in seconds to check if decommission is complete.</td>
+</tr>
+<tr>
+<td><a name="dfs.namenode.decommission.nodes.per.interval">dfs.namenode.decommission.nodes.per.interval</a></td><td>5</td><td>The number of nodes namenode checks if decommission is complete
+  in each dfs.namenode.decommission.interval.</td>
 </tr>
 <tr>
 <td><a name="dfs.replication.interval">dfs.replication.interval</a></td><td>3</td><td>The periodicity in seconds with which the namenode computes repliaction work for datanodes. </td>
@@ -649,7 +653,7 @@ creations/deletions), or "all".</td>
 <tr>
 <td><a name="mapred.task.profile">mapred.task.profile</a></td><td>false</td><td>To set whether the system should collect profiler
      information for some of the tasks in this job? The information is stored
-     in the the user log directory. The value is "true" if task profiling
+     in the user log directory. The value is "true" if task profiling
      is enabled.</td>
 </tr>
 <tr>
@@ -763,6 +767,11 @@ creations/deletions), or "all".</td>
     level.
   </td>
 </tr>
+<tr>
+<td><a name="mapred.merge.recordsBeforeProgress">mapred.merge.recordsBeforeProgress</a></td><td>10000</td><td> The number of records to process during merge before
+   sending a progress notification to the TaskTracker.
+  </td>
+</tr>
 </table>
 </body>
 </html>
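The table above documents two newly exposed knobs: the namenode's decommission check now runs every 30 seconds over 5 nodes per interval, and merges report progress every 10000 records. A hedged site-configuration sketch restating those defaults (values from the diff; only needed if you want to override them):

```xml
<!-- Defaults as documented above; override in hadoop-site.xml if needed. -->
<property>
  <name>dfs.namenode.decommission.interval</name>
  <value>30</value>
</property>
<property>
  <name>dfs.namenode.decommission.nodes.per.interval</name>
  <value>5</value>
</property>
<property>
  <name>mapred.merge.recordsBeforeProgress</name>
  <value>10000</value>
</property>
```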

+ 2 - 2
docs/hdfs_permissions_guide.html

@@ -247,12 +247,12 @@ document.write("Last Published: " + document.lastModified);
 		</li>
 		
 <li>
-		   Otherwise the the other permissions of <span class="codefrag">foo</span> are tested.
+		   Otherwise the other permissions of <span class="codefrag">foo</span> are tested.
 		</li>
 	
 </ul>
 <p>
-		If a permissions check fails, the the client operation fails.	
+		If a permissions check fails, the client operation fails.	
 </p>
 </div>
 

File diff suppressed because it is too large
+ 1 - 1
docs/hdfs_permissions_guide.pdf


+ 6 - 6
docs/jdiff/hadoop_0.17.0.xml

@@ -551,7 +551,7 @@
       <doc>
       <![CDATA[Set the quiteness-mode. 
  
- In the the quite-mode error and informational messages might not be logged.
+ In the quite-mode error and informational messages might not be logged.
  
  @param quietmode <code>true</code> to set quiet-mode on, <code>false</code>
               to turn it off.]]>
@@ -23538,7 +23538,7 @@ To add a new serialization framework write an implementation of
  helps to cut down the amount of data transferred from the {@link Mapper} to
  the {@link Reducer}, leading to better performance.</p>
   
- <p>Typically the combiner is same as the the <code>Reducer</code> for the  
+ <p>Typically the combiner is same as the <code>Reducer</code> for the  
  job i.e. {@link #setReducerClass(Class)}.</p>
  
  @param theClass the user-defined combiner class used to combine 
@@ -23958,7 +23958,7 @@ To add a new serialization framework write an implementation of
       <param name="newValue" type="boolean"/>
       <doc>
       <![CDATA[Set whether the system should collect profiler information for some of 
- the tasks in this job? The information is stored in the the user log 
+ the tasks in this job? The information is stored in the user log 
  directory.
  @param newValue true means it should be gathered]]>
       </doc>
@@ -29523,7 +29523,7 @@ Sun Microsystems, Inc. in the United States and other countries.</i></p>]]>
       <param name="aJob" type="org.apache.hadoop.mapred.jobcontrol.Job"/>
       <doc>
       <![CDATA[Add a new job.
- @param aJob the the new job]]>
+ @param aJob the new job]]>
       </doc>
     </method>
     <method name="addJobs"
@@ -33101,7 +33101,7 @@ public class WordCountAggregatorDescriptor extends ValueAggregatorBaseDescriptor
 </pre>
 </blockquote>
 In the above code, LONG_VALUE_SUM is a string denoting the aggregation type LongValueSum, which sums over long values.
-ONE denotes a string "1". Function generateEntry(LONG_VALUE_SUM, words[i], ONE) will inperpret the first argument as an aggregation type, the second as an aggregation ID, and the the third argumnent as the value to be aggregated. The output will look like: "LongValueSum:xxxx", where XXXX is the string value of words[i]. The value will be "1". The mapper will call generateKeyValPairs(Object key, Object val)  for each input key/value pair to generate the desired aggregation id/value pairs. 
+ONE denotes a string "1". Function generateEntry(LONG_VALUE_SUM, words[i], ONE) will inperpret the first argument as an aggregation type, the second as an aggregation ID, and the third argumnent as the value to be aggregated. The output will look like: "LongValueSum:xxxx", where XXXX is the string value of words[i]. The value will be "1". The mapper will call generateKeyValPairs(Object key, Object val)  for each input key/value pair to generate the desired aggregation id/value pairs. 
 The down stream combiner/reducer will interpret these pairs as adding one to the aggregator XXXX.
 <p />
 Class ValueAggregatorBaseDescriptor is a base class that user plugin classes can extend. Here is the XML fragment specifying the user plugin class:
@@ -41954,7 +41954,7 @@ is serialized as
     <doc>
     <![CDATA[A helper to load the native hadoop code i.e. libhadoop.so.
  This handles the fallback to either the bundled libhadoop-Linux-i386-32.so
- or the the default java implementations where appropriate.]]>
+ or the default java implementations where appropriate.]]>
     </doc>
   </class>
   <!-- end class org.apache.hadoop.util.NativeCodeLoader -->

+ 5 - 5
docs/jdiff/hadoop_0.18.1.xml

@@ -567,7 +567,7 @@
       <doc>
       <![CDATA[Set the quiteness-mode. 
  
- In the the quite-mode error and informational messages might not be logged.
+ In the quite-mode error and informational messages might not be logged.
  
  @param quietmode <code>true</code> to set quiet-mode on, <code>false</code>
               to turn it off.]]>
@@ -25379,7 +25379,7 @@
  helps to cut down the amount of data transferred from the {@link Mapper} to
  the {@link Reducer}, leading to better performance.</p>
   
- <p>Typically the combiner is same as the the <code>Reducer</code> for the  
+ <p>Typically the combiner is same as the <code>Reducer</code> for the  
  job i.e. {@link #setReducerClass(Class)}.</p>
  
  @param theClass the user-defined combiner class used to combine 
@@ -25814,7 +25814,7 @@
       <param name="newValue" type="boolean"/>
       <doc>
       <![CDATA[Set whether the system should collect profiler information for some of 
- the tasks in this job? The information is stored in the the user log 
+ the tasks in this job? The information is stored in the user log 
  directory.
  @param newValue true means it should be gathered]]>
       </doc>
@@ -31855,7 +31855,7 @@
       <param name="aJob" type="org.apache.hadoop.mapred.jobcontrol.Job"/>
       <doc>
       <![CDATA[Add a new job.
- @param aJob the the new job]]>
+ @param aJob the new job]]>
       </doc>
     </method>
     <method name="addJobs"
@@ -43418,7 +43418,7 @@
     <doc>
     <![CDATA[A helper to load the native hadoop code i.e. libhadoop.so.
  This handles the fallback to either the bundled libhadoop-Linux-i386-32.so
- or the the default java implementations where appropriate.]]>
+ or the default java implementations where appropriate.]]>
     </doc>
   </class>
   <!-- end class org.apache.hadoop.util.NativeCodeLoader -->

+ 5 - 5
docs/jdiff/hadoop_0.18.2.xml

@@ -567,7 +567,7 @@
       <doc>
       <![CDATA[Set the quiteness-mode. 
  
- In the the quite-mode error and informational messages might not be logged.
+ In the quite-mode error and informational messages might not be logged.
  
  @param quietmode <code>true</code> to set quiet-mode on, <code>false</code>
               to turn it off.]]>
@@ -19389,7 +19389,7 @@
  helps to cut down the amount of data transferred from the {@link Mapper} to
  the {@link Reducer}, leading to better performance.</p>
   
- <p>Typically the combiner is same as the the <code>Reducer</code> for the  
+ <p>Typically the combiner is same as the <code>Reducer</code> for the  
  job i.e. {@link #setReducerClass(Class)}.</p>
  
  @param theClass the user-defined combiner class used to combine 
@@ -19824,7 +19824,7 @@
       <param name="newValue" type="boolean"/>
       <doc>
       <![CDATA[Set whether the system should collect profiler information for some of 
- the tasks in this job? The information is stored in the the user log 
+ the tasks in this job? The information is stored in the user log 
  directory.
  @param newValue true means it should be gathered]]>
       </doc>
@@ -25865,7 +25865,7 @@
       <param name="aJob" type="org.apache.hadoop.mapred.jobcontrol.Job"/>
       <doc>
       <![CDATA[Add a new job.
- @param aJob the the new job]]>
+ @param aJob the new job]]>
       </doc>
     </method>
     <method name="addJobs"
@@ -37428,7 +37428,7 @@
     <doc>
     <![CDATA[A helper to load the native hadoop code i.e. libhadoop.so.
  This handles the fallback to either the bundled libhadoop-Linux-i386-32.so
- or the the default java implementations where appropriate.]]>
+ or the default java implementations where appropriate.]]>
     </doc>
   </class>
   <!-- end class org.apache.hadoop.util.NativeCodeLoader -->

+ 1 - 1
docs/mapred_tutorial.html

@@ -2167,7 +2167,7 @@ document.write("Last Published: " + document.lastModified);
           <a href="api/org/apache/hadoop/mapred/JobConf.html#setProfileEnabled(boolean)">
           JobConf.setProfileEnabled(boolean)</a>. If the value is set 
           <span class="codefrag">true</span>, the task profiling is enabled. The profiler
-          information is stored in the the user log directory. By default, 
+          information is stored in the user log directory. By default, 
           profiling is not enabled for the job.  </p>
 <p>Once user configures that profiling is needed, she/he can use
           the configuration property 

File diff suppressed because it is too large
+ 1 - 1
docs/mapred_tutorial.pdf


+ 3 - 3
docs/streaming.html

@@ -310,11 +310,11 @@ In the above example, both the mapper and the reducer are executables that read
 </p>
 <p>
   When an executable is specified for mappers, each mapper task will launch the executable as a separate process when the mapper is initialized. As the mapper task runs, it converts its inputs into lines and feed the lines to the stdin of the process. In the meantime, the mapper collects the line oriented outputs from the stdout of the process and converts each line into a key/value pair, which is collected as the output of the mapper. By default, the 
-  <em>prefix of a line up to the first tab character</em> is the <strong>key</strong> and the the rest of the line (excluding the tab character) will be the <strong>value</strong>. 
+  <em>prefix of a line up to the first tab character</em> is the <strong>key</strong> and the rest of the line (excluding the tab character) will be the <strong>value</strong>. 
   If there is no tab character in the line, then entire line is considered as key and the value is null. However, this can be customized, as discussed later.
 </p>
 <p>
-When an executable is specified for reducers, each reducer task will launch the executable as a separate process then the reducer is initialized. As the reducer task runs, it converts its input key/values pairs into lines and feeds the lines to the stdin of the process. In the meantime, the reducer collects the line oriented outputs from the stdout of the process, converts each line into a key/value pair, which is collected as the output of the reducer. By default, the prefix of a line up to the first tab character is the key and the the rest of the line (excluding the tab character) is the value. However, this can be customized, as discussed later.
+When an executable is specified for reducers, each reducer task will launch the executable as a separate process then the reducer is initialized. As the reducer task runs, it converts its input key/values pairs into lines and feeds the lines to the stdin of the process. In the meantime, the reducer collects the line oriented outputs from the stdout of the process, converts each line into a key/value pair, which is collected as the output of the reducer. By default, the prefix of a line up to the first tab character is the key and the rest of the line (excluding the tab character) is the value. However, this can be customized, as discussed later.
 </p>
 <p>
 This is the basis for the communication protocol between the Map/Reduce framework and the streaming mapper/reducer.
@@ -574,7 +574,7 @@ To set an environment variable in a streaming command use:
 <a name="N10194"></a><a name="Customizing+the+Way+to+Split+Lines+into+Key%2FValue+Pairs"></a>
 <h3 class="h4">Customizing the Way to Split Lines into Key/Value Pairs </h3>
 <p>
-As noted earlier, when the Map/Reduce framework reads a line from the stdout of the mapper, it splits the line into a key/value pair. By default, the prefix of the line up to the first tab character is the key and the the rest of the line (excluding the tab character) is the value.
+As noted earlier, when the Map/Reduce framework reads a line from the stdout of the mapper, it splits the line into a key/value pair. By default, the prefix of the line up to the first tab character is the key and the rest of the line (excluding the tab character) is the value.
 </p>
 <p>
 However, you can customize this default. You can specify a field separator other than the tab character (the default), and you can specify the nth (n &gt;= 1) character rather than the first character in a line (the default) as the separator between the key and value. For example:
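The streaming docs above describe the default key/value split: the prefix up to the first tab is the key, the rest of the line is the value, and a line with no tab becomes an all-key record with a null value. A minimal sketch of that rule (illustrative, not Hadoop's actual implementation):

```python
def split_streaming_line(line, separator="\t"):
    """Default Hadoop Streaming framing: key = prefix up to the first
    separator, value = remainder (excluding the separator); with no
    separator the whole line is the key and the value is None
    (the docs call it null)."""
    idx = line.find(separator)
    if idx == -1:
        return line, None
    return line[:idx], line[idx + len(separator):]

print(split_streaming_line("apple\t3"))  # ('apple', '3')
print(split_streaming_line("banana"))    # ('banana', None)
```

Only the first separator matters, so any further tabs stay inside the value, matching the "rest of the line" wording above.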

File diff suppressed because it is too large
+ 1 - 1
docs/streaming.pdf


+ 1 - 1
src/docs/src/documentation/content/xdocs/cluster_setup.xml

@@ -100,7 +100,7 @@
       <section>
         <title>Site Configuration</title>
         
-        <p>To configure the the Hadoop cluster you will need to configure the
+        <p>To configure the Hadoop cluster you will need to configure the
         <em>environment</em> in which the Hadoop daemons execute as well as
         the <em>configuration parameters</em> for the Hadoop daemons.</p>
         

+ 2 - 2
src/docs/src/documentation/content/xdocs/hdfs_permissions_guide.xml

@@ -43,12 +43,12 @@
 		   Else if the group of <code>foo</code> matches any of member of the groups list, then the group permissions are tested;
 		</li>
 		<li>
-		   Otherwise the the other permissions of <code>foo</code> are tested.
+		   Otherwise the other permissions of <code>foo</code> are tested.
 		</li>
 	</ul>
 
 <p>
-		If a permissions check fails, the the client operation fails.	
+		If a permissions check fails, the client operation fails.	
 </p>
      </section>
 

+ 1 - 1
src/docs/src/documentation/content/xdocs/mapred_tutorial.xml

@@ -1639,7 +1639,7 @@
           <a href="ext:api/org/apache/hadoop/mapred/jobconf/setprofileenabled">
           JobConf.setProfileEnabled(boolean)</a>. If the value is set 
           <code>true</code>, the task profiling is enabled. The profiler
-          information is stored in the the user log directory. By default, 
+          information is stored in the user log directory. By default, 
           profiling is not enabled for the job.  </p>
           
           <p>Once user configures that profiling is needed, she/he can use

+ 3 - 3
src/docs/src/documentation/content/xdocs/streaming.xml

@@ -48,11 +48,11 @@ $HADOOP_HOME/bin/hadoop  jar $HADOOP_HOME/hadoop-streaming.jar \
 In the above example, both the mapper and the reducer are executables that read the input from stdin (line by line) and emit the output to stdout. The utility will create a Map/Reduce job, submit the job to an appropriate cluster, and monitor the progress of the job until it completes.
 </p><p>
   When an executable is specified for mappers, each mapper task will launch the executable as a separate process when the mapper is initialized. As the mapper task runs, it converts its inputs into lines and feed the lines to the stdin of the process. In the meantime, the mapper collects the line oriented outputs from the stdout of the process and converts each line into a key/value pair, which is collected as the output of the mapper. By default, the 
-  <em>prefix of a line up to the first tab character</em> is the <strong>key</strong> and the the rest of the line (excluding the tab character) will be the <strong>value</strong>. 
+  <em>prefix of a line up to the first tab character</em> is the <strong>key</strong> and the rest of the line (excluding the tab character) will be the <strong>value</strong>. 
   If there is no tab character in the line, then entire line is considered as key and the value is null. However, this can be customized, as discussed later.
 </p>
 <p>
-When an executable is specified for reducers, each reducer task will launch the executable as a separate process then the reducer is initialized. As the reducer task runs, it converts its input key/values pairs into lines and feeds the lines to the stdin of the process. In the meantime, the reducer collects the line oriented outputs from the stdout of the process, converts each line into a key/value pair, which is collected as the output of the reducer. By default, the prefix of a line up to the first tab character is the key and the the rest of the line (excluding the tab character) is the value. However, this can be customized, as discussed later.
+When an executable is specified for reducers, each reducer task will launch the executable as a separate process then the reducer is initialized. As the reducer task runs, it converts its input key/values pairs into lines and feeds the lines to the stdin of the process. In the meantime, the reducer collects the line oriented outputs from the stdout of the process, converts each line into a key/value pair, which is collected as the output of the reducer. By default, the prefix of a line up to the first tab character is the key and the rest of the line (excluding the tab character) is the value. However, this can be customized, as discussed later.
 </p><p>
 This is the basis for the communication protocol between the Map/Reduce framework and the streaming mapper/reducer.
 </p><p>
@@ -282,7 +282,7 @@ To set an environment variable in a streaming command use:
 <section>
 <title>Customizing the Way to Split Lines into Key/Value Pairs </title>
 <p>
-As noted earlier, when the Map/Reduce framework reads a line from the stdout of the mapper, it splits the line into a key/value pair. By default, the prefix of the line up to the first tab character is the key and the the rest of the line (excluding the tab character) is the value.
+As noted earlier, when the Map/Reduce framework reads a line from the stdout of the mapper, it splits the line into a key/value pair. By default, the prefix of the line up to the first tab character is the key and the rest of the line (excluding the tab character) is the value.
 </p>
 <p>
 However, you can customize this default. You can specify a field separator other than the tab character (the default), and you can specify the nth (n >= 1) character rather than the first character in a line (the default) as the separator between the key and value. For example:

+ 1 - 1
src/mapred/org/apache/hadoop/mapred/ReduceTask.java

@@ -2145,7 +2145,7 @@ class ReduceTask extends Task {
           //earlier when we invoked cloneFileAttributes
           localFileSys.delete(outputPath, true);
           throw (IOException)new IOException
-                  ("Intermedate merge failed").initCause(e);
+                  ("Intermediate merge failed").initCause(e);
         }
 
         // Note the output of the merge

Some files were not shown because too many files changed in this diff