Preparing to branch for release 0.16.0

git-svn-id: https://svn.apache.org/repos/asf/hadoop/core/trunk@616662 13f79535-47bb-0310-9956-ffa450edef68
Nigel Daley 17 years ago
parent
commit
7496648a65

+ 10 - 0
CHANGES.txt

@@ -3,6 +3,16 @@ Hadoop Change Log
 
 Trunk (unreleased changes)
 
+  INCOMPATIBLE CHANGES
+
+  NEW FEATURES
+
+  OPTIMIZATIONS
+
+  BUG FIXES
+
+Release 0.16.0 - 2008-02-04
+
   INCOMPATIBLE CHANGES
 
     HADOOP-1245.  Use the mapred.tasktracker.tasks.maximum value

+ 1 - 1
build.xml

@@ -26,7 +26,7 @@
 
   <property name="Name" value="Hadoop"/>
   <property name="name" value="hadoop"/>
-  <property name="version" value="0.16.0-dev"/>
+  <property name="version" value="0.17.0-dev"/>
   <property name="final.name" value="${name}-${version}"/>
   <property name="year" value="2006"/>
   <property name="libhdfs.version" value="1"/>

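The one-line build.xml change above is enough to rename every build artifact, because Ant expands ${name} and ${version} when final.name is defined, yielding hadoop-0.17.0-dev. A minimal standalone sketch of that mechanism (the project and target names here are illustrative, not taken from build.xml):

  <project name="version-demo" default="echo-version">
    <property name="name" value="hadoop"/>
    <property name="version" value="0.17.0-dev"/>
    <!-- ${...} references are resolved when this property is defined -->
    <property name="final.name" value="${name}-${version}"/>
    <target name="echo-version">
      <!-- prints: Building hadoop-0.17.0-dev -->
      <echo message="Building ${final.name}"/>
    </target>
  </project>

Saved as build.xml and run with a plain ant invocation, this prints the composed name; in the real build the same expansion names the release tarball.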
+ 13 - 13
docs/cluster_setup.html

@@ -210,7 +210,7 @@ document.write("Last Published: " + document.lastModified);
 </div>
   
     
-<a name="N1000C"></a><a name="Purpose"></a>
+<a name="N1000D"></a><a name="Purpose"></a>
 <h2 class="h3">Purpose</h2>
 <div class="section">
 <p>This document describes how to install, configure and manage non-trivial
@@ -222,7 +222,7 @@ document.write("Last Published: " + document.lastModified);
 </div>
     
     
-<a name="N1001D"></a><a name="Pre-requisites"></a>
+<a name="N1001E"></a><a name="Pre-requisites"></a>
 <h2 class="h3">Pre-requisites</h2>
 <div class="section">
 <ol>
@@ -241,7 +241,7 @@ document.write("Last Published: " + document.lastModified);
 </div>
     
     
-<a name="N10035"></a><a name="Installation"></a>
+<a name="N10036"></a><a name="Installation"></a>
 <h2 class="h3">Installation</h2>
 <div class="section">
 <p>Installing a Hadoop cluster typically involves unpacking the software 
@@ -257,11 +257,11 @@ document.write("Last Published: " + document.lastModified);
 </div>
     
     
-<a name="N10060"></a><a name="Configuration"></a>
+<a name="N10061"></a><a name="Configuration"></a>
 <h2 class="h3">Configuration</h2>
 <div class="section">
 <p>The following sections describe how to configure a Hadoop cluster.</p>
-<a name="N10069"></a><a name="Configuration+Files"></a>
+<a name="N1006A"></a><a name="Configuration+Files"></a>
 <h3 class="h4">Configuration Files</h3>
 <p>Hadoop configuration is driven by two important configuration files
         found in the <span class="codefrag">conf/</span> directory of the distribution:</p>
@@ -285,14 +285,14 @@ document.write("Last Published: " + document.lastModified);
 <p>Additionally, you can control the Hadoop scripts found in the 
         <span class="codefrag">bin/</span> directory of the distribution, by setting site-specific 
         values via the <span class="codefrag">conf/hadoop-env.sh</span>.</p>
-<a name="N10096"></a><a name="Site+Configuration"></a>
+<a name="N10097"></a><a name="Site+Configuration"></a>
 <h3 class="h4">Site Configuration</h3>
 <p>To configure the the Hadoop cluster you will need to configure the
         <em>environment</em> in which the Hadoop daemons execute as well as
         the <em>configuration parameters</em> for the Hadoop daemons.</p>
 <p>The Hadoop daemons are <span class="codefrag">NameNode</span>/<span class="codefrag">DataNode</span> 
         and <span class="codefrag">JobTracker</span>/<span class="codefrag">TaskTracker</span>.</p>
-<a name="N100B4"></a><a name="Configuring+the+Environment+of+the+Hadoop+Daemons"></a>
+<a name="N100B5"></a><a name="Configuring+the+Environment+of+the+Hadoop+Daemons"></a>
 <h4>Configuring the Environment of the Hadoop Daemons</h4>
 <p>Administrators should use the <span class="codefrag">conf/hadoop-env.sh</span> script
           to do site-specific customization of the Hadoop daemons' process 
@@ -318,7 +318,7 @@ document.write("Last Published: " + document.lastModified);
             </li>
           
 </ul>
-<a name="N100DC"></a><a name="Configuring+the+Hadoop+Daemons"></a>
+<a name="N100DD"></a><a name="Configuring+the+Hadoop+Daemons"></a>
 <h4>Configuring the Hadoop Daemons</h4>
 <p>This section deals with important parameters to be specified in the
           <span class="codefrag">conf/hadoop-site.xml</span> for the Hadoop cluster.</p>
@@ -442,7 +442,7 @@ document.write("Last Published: " + document.lastModified);
           <a href="api/org/apache/hadoop/conf/Configuration.html#FinalParams">
           final</a> to ensure that they cannot be overriden by user-applications.
           </p>
-<a name="N101BC"></a><a name="Real-World+Cluster+Configurations"></a>
+<a name="N101BD"></a><a name="Real-World+Cluster+Configurations"></a>
 <h5>Real-World Cluster Configurations</h5>
 <p>This section lists some non-default configuration parameters which 
             have been used to run the <em>sort</em> benchmark on very large 
@@ -603,7 +603,7 @@ document.write("Last Published: " + document.lastModified);
 </li>
             
 </ul>
-<a name="N102D9"></a><a name="Slaves"></a>
+<a name="N102DA"></a><a name="Slaves"></a>
 <h4>Slaves</h4>
 <p>Typically you choose one machine in the cluster to act as the 
           <span class="codefrag">NameNode</span> and one machine as to act as the 
@@ -612,7 +612,7 @@ document.write("Last Published: " + document.lastModified);
           referred to as <em>slaves</em>.</p>
 <p>List all slave hostnames or IP addresses in your 
           <span class="codefrag">conf/slaves</span> file, one per line.</p>
-<a name="N102F8"></a><a name="Logging"></a>
+<a name="N102F9"></a><a name="Logging"></a>
 <h4>Logging</h4>
 <p>Hadoop uses the <a href="http://logging.apache.org/log4j/">Apache 
           log4j</a> via the <a href="http://commons.apache.org/logging/">Apache 
@@ -625,7 +625,7 @@ document.write("Last Published: " + document.lastModified);
 </div>
     
     
-<a name="N10318"></a><a name="Hadoop+Startup"></a>
+<a name="N10319"></a><a name="Hadoop+Startup"></a>
 <h2 class="h3">Hadoop Startup</h2>
 <div class="section">
 <p>To start a Hadoop cluster you will need to start both the HDFS and 
@@ -660,7 +660,7 @@ document.write("Last Published: " + document.lastModified);
 </div>
     
     
-<a name="N1035E"></a><a name="Hadoop+Shutdown"></a>
+<a name="N1035F"></a><a name="Hadoop+Shutdown"></a>
 <h2 class="h3">Hadoop Shutdown</h2>
 <div class="section">
 <p>

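The Configuration hunks above keep referring to conf/hadoop-site.xml without showing one. A minimal sketch of its shape for a cluster of this era (hostnames and ports are placeholders; fs.default.name and mapred.job.tracker are the period-correct parameter names, and <final>true</final> corresponds to the "final" parameters mentioned in the diff context):

  <?xml version="1.0"?>
  <configuration>
    <property>
      <name>fs.default.name</name>
      <!-- host:port the NameNode listens on -->
      <value>namenode.example.com:9000</value>
    </property>
    <property>
      <name>mapred.job.tracker</name>
      <!-- host:port the JobTracker listens on -->
      <value>jobtracker.example.com:9001</value>
      <!-- marked final so user jobs cannot override it -->
      <final>true</final>
    </property>
  </configuration>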
+ 34 - 34
docs/hdfs_design.html

@@ -287,7 +287,7 @@ document.write("Last Published: " + document.lastModified);
 </ul>
 </div>
     
-<a name="N10013"></a><a name="Introduction"></a>
+<a name="N10014"></a><a name="Introduction"></a>
 <h2 class="h3"> Introduction </h2>
 <div class="section">
 <p>
@@ -296,35 +296,35 @@ document.write("Last Published: " + document.lastModified);
 </div>
 
     
-<a name="N10025"></a><a name="Assumptions+and+Goals"></a>
+<a name="N10026"></a><a name="Assumptions+and+Goals"></a>
 <h2 class="h3"> Assumptions and Goals </h2>
 <div class="section">
-<a name="N1002B"></a><a name="Hardware+Failure"></a>
+<a name="N1002C"></a><a name="Hardware+Failure"></a>
 <h3 class="h4"> Hardware Failure </h3>
 <p>
         Hardware failure is the norm rather than the exception. An HDFS instance may consist of hundreds or thousands of server machines, each storing part of the file system&rsquo;s data. The fact that there are a huge number of components and that each component has a non-trivial probability of failure means that some component of HDFS is always non-functional. Therefore, detection of faults and quick, automatic recovery from them is a core architectural goal of HDFS.
        </p>
-<a name="N10035"></a><a name="Streaming+Data+Access"></a>
+<a name="N10036"></a><a name="Streaming+Data+Access"></a>
 <h3 class="h4"> Streaming Data Access </h3>
 <p>
         Applications that run on HDFS need streaming access to their data sets. They are not general purpose applications that typically run on general purpose file systems. HDFS is designed more for batch processing rather than interactive use by users. The emphasis is on high throughput of data access rather than low latency of data access. POSIX imposes many hard requirements that are not needed for applications that are targeted for HDFS. POSIX semantics in a few key areas has been traded to increase data throughput rates. 
         </p>
-<a name="N1003F"></a><a name="Large+Data+Sets"></a>
+<a name="N10040"></a><a name="Large+Data+Sets"></a>
 <h3 class="h4"> Large Data Sets </h3>
 <p>
         Applications that run on HDFS have large data sets. A typical file in HDFS is gigabytes to terabytes in size. Thus, HDFS is tuned to support large files. It should provide high aggregate data bandwidth and scale to hundreds of nodes in a single cluster. It should support tens of millions of files in a single instance.
         </p>
-<a name="N10049"></a><a name="Simple+Coherency+Model"></a>
+<a name="N1004A"></a><a name="Simple+Coherency+Model"></a>
 <h3 class="h4"> Simple Coherency Model </h3>
 <p>
         HDFS applications need a write-once-read-many access model for files. A file once created, written, and closed need not be changed. This assumption simplifies data coherency issues and enables high throughput data access. A MapReduce application or a web crawler application fits perfectly with this model. There is a plan to support appending-writes to files in the future. 
         </p>
-<a name="N10053"></a><a name="%E2%80%9CMoving+Computation+is+Cheaper+than+Moving+Data%E2%80%9D"></a>
+<a name="N10054"></a><a name="%E2%80%9CMoving+Computation+is+Cheaper+than+Moving+Data%E2%80%9D"></a>
 <h3 class="h4"> &ldquo;Moving Computation is Cheaper than Moving Data&rdquo; </h3>
 <p>
         A computation requested by an application is much more efficient if it is executed near the data it operates on. This is especially true when the size of the data set is huge. This minimizes network congestion and increases the overall throughput of the system. The assumption is that it is often better to migrate the computation closer to where the data is located rather than moving the data to where the application is running. HDFS provides interfaces for applications to move themselves closer to where the data is located. 
         </p>
-<a name="N1005D"></a><a name="Portability+Across+Heterogeneous+Hardware+and+Software+Platforms"></a>
+<a name="N1005E"></a><a name="Portability+Across+Heterogeneous+Hardware+and+Software+Platforms"></a>
 <h3 class="h4"> Portability Across Heterogeneous Hardware and Software Platforms </h3>
 <p>
         HDFS has been designed to be easily portable from one platform to another. This facilitates widespread adoption of HDFS as a platform of choice for a large set of applications. 
@@ -333,7 +333,7 @@ document.write("Last Published: " + document.lastModified);
 
  
     
-<a name="N10068"></a><a name="Namenode+and+Datanodes"></a>
+<a name="N10069"></a><a name="Namenode+and+Datanodes"></a>
 <h2 class="h3"> Namenode and Datanodes </h2>
 <div class="section">
 <p>
@@ -352,7 +352,7 @@ document.write("Last Published: " + document.lastModified);
  
 
     
-<a name="N10089"></a><a name="The+File+System+Namespace"></a>
+<a name="N1008A"></a><a name="The+File+System+Namespace"></a>
 <h2 class="h3"> The File System Namespace </h2>
 <div class="section">
 <p>
@@ -366,7 +366,7 @@ document.write("Last Published: " + document.lastModified);
  
 
     
-<a name="N10096"></a><a name="Data+Replication"></a>
+<a name="N10097"></a><a name="Data+Replication"></a>
 <h2 class="h3"> Data Replication </h2>
 <div class="section">
 <p>
@@ -377,7 +377,7 @@ document.write("Last Published: " + document.lastModified);
     </p>
 <div id="" style="text-align: center;">
 <img id="" class="figure" alt="HDFS Datanodes" src="images/hdfsdatanodes.gif"></div>
-<a name="N100AC"></a><a name="Replica+Placement%3A+The+First+Baby+Steps"></a>
+<a name="N100AD"></a><a name="Replica+Placement%3A+The+First+Baby+Steps"></a>
 <h3 class="h4"> Replica Placement: The First Baby Steps </h3>
 <p>
         The placement of replicas is critical to HDFS reliability and performance. Optimizing replica placement distinguishes HDFS from most other distributed file systems. This is a feature that needs lots of tuning and experience. The purpose of a rack-aware replica placement policy is to improve data reliability, availability, and network bandwidth utilization. The current implementation for the replica placement policy is a first effort in this direction. The short-term goals of implementing this policy are to validate it on production systems, learn more about its behavior, and build a foundation to test and research more sophisticated policies. 
@@ -394,12 +394,12 @@ document.write("Last Published: " + document.lastModified);
 <p>
         The current, default replica placement policy described here is a work in progress.
         </p>
-<a name="N100C6"></a><a name="Replica+Selection"></a>
+<a name="N100C7"></a><a name="Replica+Selection"></a>
 <h3 class="h4"> Replica Selection </h3>
 <p>
         To minimize global bandwidth consumption and read latency, HDFS tries to satisfy a read request from a replica that is closest to the reader. If there exists a replica on the same rack as the reader node, then that replica is preferred to satisfy the read request. If angg/ HDFS cluster spans multiple data centers, then a replica that is resident in the local data center is preferred over any remote replica.
         </p>
-<a name="N100D0"></a><a name="SafeMode"></a>
+<a name="N100D1"></a><a name="SafeMode"></a>
 <h3 class="h4"> SafeMode </h3>
 <p>
         On startup, the Namenode enters a special state called <em>Safemode</em>. Replication of data blocks does not occur when the Namenode is in the Safemode state. The Namenode receives Heartbeat and Blockreport messages from the Datanodes. A Blockreport contains the list of data blocks that a Datanode is hosting. Each block has a specified minimum number of replicas. A block is considered <em>safely replicated</em> when the minimum number of replicas of that data block has checked in with the Namenode. After a configurable percentage of safely replicated data blocks checks in with the Namenode (plus an additional 30 seconds), the Namenode exits the Safemode state. It then determines the list of data blocks (if any) that still have fewer than the specified number of replicas. The Namenode then replicates these blocks to other Datanodes.
@@ -407,7 +407,7 @@ document.write("Last Published: " + document.lastModified);
 </div>
 
     
-<a name="N100E1"></a><a name="The+Persistence+of+File+System+Metadata"></a>
+<a name="N100E2"></a><a name="The+Persistence+of+File+System+Metadata"></a>
 <h2 class="h3"> The Persistence of File System Metadata </h2>
 <div class="section">
 <p>
@@ -423,7 +423,7 @@ document.write("Last Published: " + document.lastModified);
 
 
     
-<a name="N10103"></a><a name="The+Communication+Protocols"></a>
+<a name="N10104"></a><a name="The+Communication+Protocols"></a>
 <h2 class="h3"> The Communication Protocols </h2>
 <div class="section">
 <p>
@@ -433,29 +433,29 @@ document.write("Last Published: " + document.lastModified);
  
 
     
-<a name="N1011B"></a><a name="Robustness"></a>
+<a name="N1011C"></a><a name="Robustness"></a>
 <h2 class="h3"> Robustness </h2>
 <div class="section">
 <p>
       The primary objective of HDFS is to store data reliably even in the presence of failures. The three common types of failures are Namenode failures, Datanode failures and network partitions.
       </p>
-<a name="N10124"></a><a name="Data+Disk+Failure%2C+Heartbeats+and+Re-Replication"></a>
+<a name="N10125"></a><a name="Data+Disk+Failure%2C+Heartbeats+and+Re-Replication"></a>
 <h3 class="h4"> Data Disk Failure, Heartbeats and Re-Replication </h3>
 <p>
         Each Datanode sends a Heartbeat message to the Namenode periodically. A network partition can cause a subset of Datanodes to lose connectivity with the Namenode. The Namenode detects this condition by the absence of a Heartbeat message. The Namenode marks Datanodes without recent Heartbeats as dead and does not forward any new <acronym title="Input/Output">IO</acronym> requests to them. Any data that was registered to a dead Datanode is not available to HDFS any more. Datanode death may cause the replication factor of some blocks to fall below their specified value. The Namenode constantly tracks which blocks need to be replicated and initiates replication whenever necessary. The necessity for re-replication may arise due to many reasons: a Datanode may become unavailable, a replica may become corrupted, a hard disk on a Datanode may fail, or the replication factor of a file may be increased. 
         </p>
-<a name="N10132"></a><a name="Cluster+Rebalancing"></a>
+<a name="N10133"></a><a name="Cluster+Rebalancing"></a>
 <h3 class="h4"> Cluster Rebalancing </h3>
 <p>
         The HDFS architecture is compatible with <em>data rebalancing schemes</em>. A scheme might automatically move data from one Datanode to another if the free space on a Datanode falls below a certain threshold. In the event of a sudden high demand for a particular file, a scheme might dynamically create additional replicas and rebalance other data in the cluster. These types of data rebalancing schemes are not yet implemented. 
         </p>
-<a name="N1013F"></a><a name="Data+Integrity"></a>
+<a name="N10140"></a><a name="Data+Integrity"></a>
 <h3 class="h4"> Data Integrity </h3>
 <p>
         <!-- XXX "checksum checking" sounds funny -->
         It is possible that a block of data fetched from a Datanode arrives corrupted. This corruption can occur because of faults in a storage device, network faults, or buggy software. The HDFS client software implements checksum checking on the contents of HDFS files. When a client creates an HDFS file, it computes a checksum of each block of the file and stores these checksums in a separate hidden file in the same HDFS namespace. When a client retrieves file contents it verifies that the data it received from each Datanode matches the checksum stored in the associated checksum file. If not, then the client can opt to retrieve that block from another Datanode that has a replica of that block.
         </p>
-<a name="N1014B"></a><a name="Metadata+Disk+Failure"></a>
+<a name="N1014C"></a><a name="Metadata+Disk+Failure"></a>
 <h3 class="h4"> Metadata Disk Failure </h3>
 <p>
         The FsImage and the EditLog are central data structures of HDFS. A corruption of these files can cause the HDFS instance to be non-functional. For this reason, the Namenode can be configured to support maintaining multiple copies of the FsImage and EditLog. Any update to either the FsImage or EditLog causes each of the FsImages and EditLogs to get updated synchronously. This synchronous updating of multiple copies of the FsImage and EditLog may degrade the rate of namespace transactions per second that a Namenode can support. However, this degradation is acceptable because even though HDFS applications are very <em>data</em> intensive in nature, they are not <em>metadata</em> intensive. When a Namenode restarts, it selects the latest consistent FsImage and EditLog to use.
@@ -463,7 +463,7 @@ document.write("Last Published: " + document.lastModified);
 <p> 
         The Namenode machine is a single point of failure for an HDFS cluster. If the Namenode machine fails, manual intervention is necessary. Currently, automatic restart and failover of the Namenode software to another machine is not supported.
         </p>
-<a name="N1015E"></a><a name="Snapshots"></a>
+<a name="N1015F"></a><a name="Snapshots"></a>
 <h3 class="h4"> Snapshots </h3>
 <p>
         Snapshots support storing a copy of data at a particular instant of time. One usage of the snapshot feature may be to roll back a corrupted HDFS instance to a previously known good point in time. HDFS does not currently support snapshots but will in a future release.
@@ -472,15 +472,15 @@ document.write("Last Published: " + document.lastModified);
  
 
     
-<a name="N10169"></a><a name="Data+Organization"></a>
+<a name="N1016A"></a><a name="Data+Organization"></a>
 <h2 class="h3"> Data Organization </h2>
 <div class="section">
-<a name="N10171"></a><a name="Data+Blocks"></a>
+<a name="N10172"></a><a name="Data+Blocks"></a>
 <h3 class="h4"> Data Blocks </h3>
 <p>
         HDFS is designed to support very large files. Applications that are compatible with HDFS are those that deal with large data sets. These applications write their data only once but they read it one or more times and require these reads to be satisfied at streaming speeds. HDFS supports write-once-read-many semantics on files. A typical block size used by HDFS is 64 MB. Thus, an HDFS file is chopped up into 64 MB chunks, and if possible, each chunk will reside on a different Datanode.
         </p>
-<a name="N1017B"></a><a name="Staging"></a>
+<a name="N1017C"></a><a name="Staging"></a>
 <h3 class="h4"> Staging </h3>
 <p>
         A client request to create a file does not reach the Namenode immediately. In fact, initially the HDFS client caches the file data into a temporary local file. Application writes are transparently redirected to this temporary local file. When the local file accumulates data worth over one HDFS block size, the client contacts the Namenode. The Namenode inserts the file name into the file system hierarchy and allocates a data block for it. The Namenode responds to the client request with the identity of the Datanode and the destination data block. Then the client flushes the block of data from the local temporary file to the specified Datanode. When a file is closed, the remaining un-flushed data in the temporary local file is transferred to the Datanode. The client then tells the Namenode that the file is closed. At this point, the Namenode commits the file creation operation into a persistent store. If the Namenode dies before the file is closed, the file is lost. 
@@ -488,7 +488,7 @@ document.write("Last Published: " + document.lastModified);
 <p>
         The above approach has been adopted after careful consideration of target applications that run on HDFS. These applications need streaming writes to files. If a client writes to a remote file directly without any client side buffering, the network speed and the congestion in the network impacts throughput considerably. This approach is not without precedent. Earlier distributed file systems, e.g. <acronym title="Andrew File System">AFS</acronym>, have used client side caching to improve performance. A POSIX requirement has been relaxed to achieve higher performance of data uploads. 
         </p>
-<a name="N1018E"></a><a name="Replication+Pipelining"></a>
+<a name="N1018F"></a><a name="Replication+Pipelining"></a>
 <h3 class="h4"> Replication Pipelining </h3>
 <p>
         When a client is writing data to an HDFS file, its data is first written to a local file as explained in the previous section. Suppose the HDFS file has a replication factor of three. When the local file accumulates a full block of user data, the client retrieves a list of Datanodes from the Namenode. This list contains the Datanodes that will host a replica of that block. The client then flushes the data block to the first Datanode. The first Datanode starts receiving the data in small portions (4 KB), writes each portion to its local repository and transfers that portion to the second Datanode in the list. The second Datanode, in turn starts receiving each portion of the data block, writes that portion to its repository and then flushes that portion to the third Datanode. Finally, the third Datanode writes the data to its local repository. Thus, a Datanode can be receiving data from the previous one in the pipeline and at the same time forwarding data to the next one in the pipeline. Thus, the data is pipelined from one Datanode to the next.
@@ -496,13 +496,13 @@ document.write("Last Published: " + document.lastModified);
 </div>
 
     
-<a name="N10199"></a><a name="Accessibility"></a>
+<a name="N1019A"></a><a name="Accessibility"></a>
 <h2 class="h3"> Accessibility </h2>
 <div class="section">
 <p>
       HDFS can be accessed from applications in many different ways. Natively, HDFS provides a <a href="http://hadoop.apache.org/core/docs/current/api/">Java API</a> for applications to use. A C language wrapper for this Java API is also available. In addition, an HTTP browser can also be used to browse the files of an HDFS instance. Work is in progress to expose HDFS through the <acronym title="Web-based Distributed Authoring and Versioning">WebDAV</acronym> protocol. 
       </p>
-<a name="N101AE"></a><a name="DFSShell"></a>
+<a name="N101AF"></a><a name="DFSShell"></a>
 <h3 class="h4"> DFSShell </h3>
 <p>
         HDFS allows user data to be organized in the form of files and directories. It provides a commandline interface called <em>DFSShell</em> that lets a user interact with the data in HDFS. The syntax of this command set is similar to other shells (e.g. bash, csh) that users are already familiar with. Here are some sample action/command pairs:
@@ -537,7 +537,7 @@ document.write("Last Published: " + document.lastModified);
 <p>
         DFSShell is targeted for applications that need a scripting language to interact with the stored data.
         </p>
-<a name="N10206"></a><a name="DFSAdmin"></a>
+<a name="N10207"></a><a name="DFSAdmin"></a>
 <h3 class="h4"> DFSAdmin </h3>
 <p>
         The <em>DFSAdmin</em> command set is used for administering an HDFS cluster. These are commands that are used only by an HDFS administrator. Here are some sample action/command pairs:
@@ -569,7 +569,7 @@ document.write("Last Published: " + document.lastModified);
 </tr>
         
 </table>
-<a name="N10254"></a><a name="Browser+Interface"></a>
+<a name="N10255"></a><a name="Browser+Interface"></a>
 <h3 class="h4"> Browser Interface </h3>
 <p>
         A typical HDFS install configures a web server to expose the HDFS namespace through a configurable TCP port. This allows a user to navigate the HDFS namespace and view the contents of its files using a web browser.
@@ -577,10 +577,10 @@ document.write("Last Published: " + document.lastModified);
 </div> 
 
     
-<a name="N1025F"></a><a name="Space+Reclamation"></a>
+<a name="N10260"></a><a name="Space+Reclamation"></a>
 <h2 class="h3"> Space Reclamation </h2>
 <div class="section">
-<a name="N10265"></a><a name="File+Deletes+and+Undeletes"></a>
+<a name="N10266"></a><a name="File+Deletes+and+Undeletes"></a>
 <h3 class="h4"> File Deletes and Undeletes </h3>
 <p>
         When a file is deleted by a user or an application, it is not immediately removed from HDFS.  Instead, HDFS first renames it to a file in the <span class="codefrag">/trash</span> directory. The file can be restored quickly as long as it remains in <span class="codefrag">/trash</span>. A file remains in <span class="codefrag">/trash</span> for a configurable amount of time. After the expiry of its life in <span class="codefrag">/trash</span>, the Namenode deletes the file from the HDFS namespace. The deletion of a file causes the blocks associated with the file to be freed. Note that there could be an appreciable time delay between the time a file is deleted by a user and the time of the corresponding increase in free space in HDFS.
@@ -588,7 +588,7 @@ document.write("Last Published: " + document.lastModified);
 <p>
         A user can Undelete a file after deleting it as long as it remains in the <span class="codefrag">/trash</span> directory. If a user wants to undelete a file that he/she has deleted, he/she can navigate the <span class="codefrag">/trash</span> directory and retrieve the file. The <span class="codefrag">/trash</span> directory contains only the latest copy of the file that was deleted. The <span class="codefrag">/trash</span> directory is just like any other directory with one special feature: HDFS applies specified policies to automatically delete files from this directory. The current default policy is to delete files from <span class="codefrag">/trash</span> that are more than 6 hours old. In the future, this policy will be configurable through a well defined interface.
         </p>
-<a name="N1028D"></a><a name="Decrease+Replication+Factor"></a>
+<a name="N1028E"></a><a name="Decrease+Replication+Factor"></a>
 <h3 class="h4"> Decrease Replication Factor </h3>
 <p>
         When the replication factor of a file is reduced, the Namenode selects excess replicas that can be deleted. The next Heartbeat transfers this information to the Datanode. The Datanode then removes the corresponding blocks and the corresponding free space appears in the cluster. Once again, there might be a time delay between the completion of the <span class="codefrag">setReplication</span> API call and the appearance of free space in the cluster.
@@ -597,7 +597,7 @@ document.write("Last Published: " + document.lastModified);
 
 
     
-<a name="N1029B"></a><a name="References"></a>
+<a name="N1029C"></a><a name="References"></a>
 <h2 class="h3"> References </h2>
 <div class="section">
 <p>

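The hdfs_design.html context above cites a typical 64 MB block size, a three-Datanode replication pipeline, and a /trash policy of roughly 6 hours; all three behaviors were driven by site configuration. A hedged sketch of the corresponding keys of that era (values mirror the behavior the document describes; fs.trash.interval is in minutes, with 0 disabling trash):

  <configuration>
    <property>
      <name>dfs.replication</name>
      <!-- replicas per block; the write pipeline forwards DN1 -> DN2 -> DN3 -->
      <value>3</value>
    </property>
    <property>
      <name>dfs.block.size</name>
      <!-- 64 MB in bytes, the typical block size the design doc cites -->
      <value>67108864</value>
    </property>
    <property>
      <name>fs.trash.interval</name>
      <!-- minutes a deleted file stays recoverable; 360 approximates the 6-hour policy -->
      <value>360</value>
    </property>
  </configuration>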
+ 15 - 15
docs/hdfs_user_guide.html

@@ -220,7 +220,7 @@ document.write("Last Published: " + document.lastModified);
 </ul>
 </div>
     
-<a name="N1000C"></a><a name="Purpose"></a>
+<a name="N1000D"></a><a name="Purpose"></a>
 <h2 class="h3">Purpose</h2>
 <div class="section">
 <p>
@@ -235,7 +235,7 @@ document.write("Last Published: " + document.lastModified);
 </div>
 
     
-<a name="N1001A"></a><a name="Overview"></a>
+<a name="N1001B"></a><a name="Overview"></a>
 <h2 class="h3"> Overview </h2>
 <div class="section">
 <p>
@@ -341,7 +341,7 @@ document.write("Last Published: " + document.lastModified);
     
 </ul>
 </div> 
-<a name="N10082"></a><a name="Pre-requisites"></a>
+<a name="N10083"></a><a name="Pre-requisites"></a>
 <h2 class="h3"> Pre-requisites </h2>
 <div class="section">
 <p>
@@ -370,7 +370,7 @@ document.write("Last Published: " + document.lastModified);
  	machine.	
     </p>
 </div> 
-<a name="N100A0"></a><a name="Web+Interface"></a>
+<a name="N100A1"></a><a name="Web+Interface"></a>
 <h2 class="h3"> Web Interface </h2>
 <div class="section">
 <p>
@@ -384,7 +384,7 @@ document.write("Last Published: " + document.lastModified);
  	page).
  </p>
 </div> 
-<a name="N100AD"></a><a name="Shell+Commands"></a>
+<a name="N100AE"></a><a name="Shell+Commands"></a>
 <h2 class="h3">Shell Commands</h2>
 <div class="section">
 <p>
@@ -400,7 +400,7 @@ document.write("Last Published: " + document.lastModified);
       changing file permissions, etc. It also supports a few HDFS
       specific operations like changing replication of files.
      </p>
-<a name="N100BC"></a><a name="DFSAdmin+Command"></a>
+<a name="N100BD"></a><a name="DFSAdmin+Command"></a>
 <h3 class="h4"> DFSAdmin Command </h3>
 <p>
    	
@@ -433,7 +433,7 @@ document.write("Last Published: " + document.lastModified);
    	
 </ul>
 </div> 
-<a name="N100E5"></a><a name="Secondary+Namenode"></a>
+<a name="N100E6"></a><a name="Secondary+Namenode"></a>
 <h2 class="h3"> Secondary Namenode </h2>
 <div class="section">
 <p>
@@ -458,7 +458,7 @@ document.write("Last Published: " + document.lastModified);
      specified in <span class="codefrag">conf/masters</span> file.
    </p>
 </div> 
-<a name="N1010A"></a><a name="Rebalancer"></a>
+<a name="N1010B"></a><a name="Rebalancer"></a>
 <h2 class="h3"> Rebalancer </h2>
 <div class="section">
 <p>
@@ -503,7 +503,7 @@ document.write("Last Published: " + document.lastModified);
       <a href="http://issues.apache.org/jira/browse/HADOOP-1652">HADOOP-1652</a>.
     </p>
 </div> 
-<a name="N10131"></a><a name="Rack+Awareness"></a>
+<a name="N10132"></a><a name="Rack+Awareness"></a>
 <h2 class="h3"> Rack Awareness </h2>
 <div class="section">
 <p>
@@ -522,7 +522,7 @@ document.write("Last Published: " + document.lastModified);
       <a href="http://issues.apache.org/jira/browse/HADOOP-692">HADOOP-692</a>.
     </p>
 </div> 
-<a name="N1014F"></a><a name="Safemode"></a>
+<a name="N10150"></a><a name="Safemode"></a>
 <h2 class="h3"> Safemode </h2>
 <div class="section">
 <p>
@@ -542,7 +542,7 @@ document.write("Last Published: " + document.lastModified);
       <a href="http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/dfs/NameNode.html#setSafeMode(org.apache.hadoop.dfs.FSConstants.SafeModeAction)"><span class="codefrag">setSafeMode()</span></a>.
     </p>
 </div> 
-<a name="N1016D"></a><a name="Fsck"></a>
+<a name="N1016E"></a><a name="Fsck"></a>
 <h2 class="h3"> Fsck </h2>
 <div class="section">
 <p>    
@@ -558,7 +558,7 @@ document.write("Last Published: " + document.lastModified);
       Fsck can be run on the whole filesystem or on a subset of files.
      </p>
 </div> 
-<a name="N1017D"></a><a name="Upgrade+and+Rollback"></a>
+<a name="N1017E"></a><a name="Upgrade+and+Rollback"></a>
 <h2 class="h3"> Upgrade and Rollback </h2>
 <div class="section">
 <p>
@@ -617,7 +617,7 @@ document.write("Last Published: " + document.lastModified);
       
 </ul>
 </div> 
-<a name="N101BE"></a><a name="File+Permissions+and+Security"></a>
+<a name="N101BF"></a><a name="File+Permissions+and+Security"></a>
 <h2 class="h3"> File Permissions and Security </h2>
 <div class="section">
 <p>           
@@ -629,7 +629,7 @@ document.write("Last Published: " + document.lastModified);
       authentication and encryption of data transfers.
      </p>
 </div> 
-<a name="N101CB"></a><a name="Scalability"></a>
+<a name="N101CC"></a><a name="Scalability"></a>
 <h2 class="h3"> Scalability </h2>
 <div class="section">
 <p>
@@ -647,7 +647,7 @@ document.write("Last Published: " + document.lastModified);
       suggested configuration improvements for large Hadoop clusters.
      </p>
 </div> 
-<a name="N101DD"></a><a name="Related+Documentation"></a>
+<a name="N101DE"></a><a name="Related+Documentation"></a>
 <h2 class="h3"> Related Documentation </h2>
 <div class="section">
 <p>

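The Safemode section in the hdfs_user_guide.html hunks above mentions a configurable percentage of safely replicated blocks. A sketch of the key that controlled it, assuming the 0.16-era name dfs.safemode.threshold.pct and its documented default:

  <configuration>
    <property>
      <name>dfs.safemode.threshold.pct</name>
      <!-- fraction of blocks that must meet their minimum replication
           before the Namenode leaves Safemode on its own -->
      <value>0.999</value>
    </property>
  </configuration>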
+ 35 - 35
docs/hod.html

@@ -294,7 +294,7 @@ document.write("Last Published: " + document.lastModified);
 </ul>
 </div>
     
-<a name="N1000C"></a><a name="Introduction"></a>
+<a name="N1000D"></a><a name="Introduction"></a>
 <h2 class="h3"> Introduction </h2>
 <h2 class="h3"> Introduction </h2>
 <div class="section">
 <div class="section">
 <p>
 <p>
@@ -303,30 +303,30 @@ document.write("Last Published: " + document.lastModified);
 </div>
 </div>
 
 
     
     
-<a name="N1001E"></a><a name="Feature+List"></a>
+<a name="N1001F"></a><a name="Feature+List"></a>
 <h2 class="h3"> Feature List </h2>
 <h2 class="h3"> Feature List </h2>
 <div class="section">
 <div class="section">
-<a name="N10024"></a><a name="Simplified+Interface+for+Provisioning+Hadoop+Clusters"></a>
+<a name="N10025"></a><a name="Simplified+Interface+for+Provisioning+Hadoop+Clusters"></a>
 <h3 class="h4"> Simplified Interface for Provisioning Hadoop Clusters </h3>
 <h3 class="h4"> Simplified Interface for Provisioning Hadoop Clusters </h3>
 <p>
 <p>
         By far, the biggest advantage of HOD is to quickly setup a Hadoop cluster. The user interacts with the cluster through a simple command line interface, the HOD client. HOD brings up a virtual MapReduce cluster with the required number of nodes, which the user can use for running Hadoop jobs. When done, HOD will automatically clean up the resources and make the nodes available again.
         By far, the biggest advantage of HOD is to quickly setup a Hadoop cluster. The user interacts with the cluster through a simple command line interface, the HOD client. HOD brings up a virtual MapReduce cluster with the required number of nodes, which the user can use for running Hadoop jobs. When done, HOD will automatically clean up the resources and make the nodes available again.
         </p>
         </p>
-<a name="N1002E"></a><a name="Automatic+installation+of+Hadoop"></a>
+<a name="N1002F"></a><a name="Automatic+installation+of+Hadoop"></a>
 <h3 class="h4"> Automatic installation of Hadoop </h3>
 <h3 class="h4"> Automatic installation of Hadoop </h3>
 <p>
 <p>
         With HOD, Hadoop does not need to be even installed on the cluster. The user can provide a Hadoop tarball that HOD will automatically distribute to all the nodes in the cluster.
         With HOD, Hadoop does not need to be even installed on the cluster. The user can provide a Hadoop tarball that HOD will automatically distribute to all the nodes in the cluster.
         </p>
         </p>
-<a name="N10038"></a><a name="Configuring+Hadoop"></a>
+<a name="N10039"></a><a name="Configuring+Hadoop"></a>
 <h3 class="h4"> Configuring Hadoop </h3>
 <h3 class="h4"> Configuring Hadoop </h3>
 <p>
 <p>
         Dynamic parameters of Hadoop configuration, such as the NameNode and JobTracker addresses and ports, and file system temporary directories are generated and distributed by HOD automatically to all nodes in the cluster. In addition, HOD allows the user to configure Hadoop parameters at both the server (for e.g. JobTracker) and client (for e.g. JobClient) level, including 'final' parameters, that were introduced with Hadoop 0.15.
         Dynamic parameters of Hadoop configuration, such as the NameNode and JobTracker addresses and ports, and file system temporary directories are generated and distributed by HOD automatically to all nodes in the cluster. In addition, HOD allows the user to configure Hadoop parameters at both the server (for e.g. JobTracker) and client (for e.g. JobClient) level, including 'final' parameters, that were introduced with Hadoop 0.15.
         </p>
         </p>
-<a name="N10042"></a><a name="Auto-cleanup+of+Unused+Clusters"></a>
+<a name="N10043"></a><a name="Auto-cleanup+of+Unused+Clusters"></a>
 <h3 class="h4"> Auto-cleanup of Unused Clusters </h3>
 <h3 class="h4"> Auto-cleanup of Unused Clusters </h3>
 <p>
 <p>
         HOD has an automatic timeout so that users cannot misuse resources they aren't using. The timeout applies only when there is no MapReduce job running. 
         HOD has an automatic timeout so that users cannot misuse resources they aren't using. The timeout applies only when there is no MapReduce job running. 
         </p>
         </p>
-<a name="N1004C"></a><a name="Log+Services"></a>
+<a name="N1004D"></a><a name="Log+Services"></a>
 <h3 class="h4"> Log Services </h3>
 <h3 class="h4"> Log Services </h3>
 <p>
 <p>
         HOD can be used to collect all MapReduce logs to a central location for archiving and inspection after the job is completed.
         HOD can be used to collect all MapReduce logs to a central location for archiving and inspection after the job is completed.
@@ -334,13 +334,13 @@ document.write("Last Published: " + document.lastModified);
 </div>
 
     
-<a name="N10057"></a><a name="HOD+Components"></a>
+<a name="N10058"></a><a name="HOD+Components"></a>
 <h2 class="h3"> HOD Components </h2>
 <div class="section">
 <p>
      This is a brief overview of the various components of HOD and how they interact to provision Hadoop.
      </p>
-<a name="N10060"></a><a name="HOD+Client"></a>
+<a name="N10061"></a><a name="HOD+Client"></a>
 <h3 class="h4"> HOD Client </h3>
 <p>
        The HOD client is a Unix command that users use to allocate Hadoop MapReduce clusters. The command provides other options to list allocated clusters and deallocate them. The HOD client generates the <em>hadoop-site.xml</em> in a user specified directory. The user can point to this configuration file while running Map/Reduce jobs on the allocated cluster.
@@ -348,7 +348,7 @@ document.write("Last Published: " + document.lastModified);
 <p>
        The nodes from where the HOD Client is run are called <em>submit nodes</em> because jobs are submitted to the resource manager system for allocating and running clusters from these nodes.
        </p>
-<a name="N10073"></a><a name="RingMaster"></a>
+<a name="N10074"></a><a name="RingMaster"></a>
 <h3 class="h4"> RingMaster </h3>
 <p>
        The RingMaster is a HOD process that is started on one node per allocated cluster. It is submitted as a 'job' to the resource manager by the HOD client. It controls which Hadoop daemons start on which nodes. It provides this information to other HOD processes, such as the HOD client, so users can also determine this information. The RingMaster is responsible for hosting and distributing the Hadoop tarball to all nodes in the cluster. It also automatically cleans up unused clusters.
@@ -356,17 +356,17 @@ document.write("Last Published: " + document.lastModified);
 <p>
         
 </p>
-<a name="N10080"></a><a name="HodRing"></a>
+<a name="N10081"></a><a name="HodRing"></a>
 <h3 class="h4"> HodRing </h3>
 <p>
        The HodRing is a HOD process that runs on every allocated node in the cluster. These processes are run by the RingMaster through the resource manager, using a facility of parallel execution. The HodRings are responsible for launching Hadoop commands on the nodes to bring up the Hadoop daemons. They get the command to launch from the RingMaster.
        </p>
-<a name="N1008A"></a><a name="Hodrc+%2F+HOD+configuration+file"></a>
+<a name="N1008B"></a><a name="Hodrc+%2F+HOD+configuration+file"></a>
 <h3 class="h4"> Hodrc / HOD configuration file </h3>
 <p>
        An INI style configuration file where the users configure various options for the HOD system, including install locations of different software, resource manager parameters, log and temp file directories, parameters for their MapReduce jobs, etc.
        </p>
-<a name="N10094"></a><a name="Submit+Nodes+and+Compute+Nodes"></a>
+<a name="N10095"></a><a name="Submit+Nodes+and+Compute+Nodes"></a>
 <h3 class="h4"> Submit Nodes and Compute Nodes </h3>
 <p>
        The nodes from where the <em>HOD Client</em> is run are referred to as <em>submit nodes</em> because jobs are submitted to the resource manager system for allocating and running clusters from these nodes.
@@ -377,17 +377,17 @@ document.write("Last Published: " + document.lastModified);
 </div>
 
     
-<a name="N100AE"></a><a name="Getting+Started+with+HOD"></a>
+<a name="N100AF"></a><a name="Getting+Started+with+HOD"></a>
 <h2 class="h3"> Getting Started with HOD </h2>
 <div class="section">
-<a name="N100B4"></a><a name="Pre-Requisites"></a>
+<a name="N100B5"></a><a name="Pre-Requisites"></a>
 <h3 class="h4"> Pre-Requisites </h3>
-<a name="N100BA"></a><a name="Hardware"></a>
+<a name="N100BB"></a><a name="Hardware"></a>
 <h4> Hardware </h4>
 <p>
          HOD requires a minimum of 3 nodes configured through a resource manager.
          </p>
-<a name="N100C4"></a><a name="Software"></a>
+<a name="N100C5"></a><a name="Software"></a>
 <h4> Software </h4>
 <p>
          The following components are assumed to be installed before using HOD:
@@ -424,7 +424,7 @@ document.write("Last Published: " + document.lastModified);
 <p>
          HOD configuration requires the location of installs of these components to be the same on all nodes in the cluster. Having the same location on the submit nodes as well makes configuration simpler.
          </p>
-<a name="N100FE"></a><a name="Resource+Manager+Configuration+Pre-requisites"></a>
+<a name="N100FF"></a><a name="Resource+Manager+Configuration+Pre-requisites"></a>
 <h4>Resource Manager Configuration Pre-requisites</h4>
 <p>
          For using HOD with Torque:
@@ -456,7 +456,7 @@ document.write("Last Published: " + document.lastModified);
          More information about setting up Torque can be found by referring to the documentation <a href="http://www.clusterresources.com/pages/products/torque-resource-manager.php">here.</a>
          
 </p>
-<a name="N10125"></a><a name="Setting+up+HOD"></a>
+<a name="N10126"></a><a name="Setting+up+HOD"></a>
 <h3 class="h4">Setting up HOD</h3>
 <ul>
          
@@ -550,15 +550,15 @@ document.write("Last Published: " + document.lastModified);
 </div>
 
     
-<a name="N101B3"></a><a name="Running+HOD"></a>
+<a name="N101B4"></a><a name="Running+HOD"></a>
 <h2 class="h3">Running HOD</h2>
 <div class="section">
-<a name="N101B9"></a><a name="Overview"></a>
+<a name="N101BA"></a><a name="Overview"></a>
 <h3 class="h4">Overview</h3>
 <p>
        A typical HOD session involves at least three steps: allocate, run Hadoop jobs, deallocate.
        </p>
-<a name="N101C2"></a><a name="Operation+allocate"></a>
+<a name="N101C3"></a><a name="Operation+allocate"></a>
 <h4>Operation allocate</h4>
 <p>
          The allocate operation is used to allocate a set of nodes and install and provision Hadoop on them. It has the following syntax:
@@ -605,7 +605,7 @@ document.write("Last Published: " + document.lastModified);
 </tr>
          
 </table>
-<a name="N10202"></a><a name="Running+Hadoop+jobs+using+the+allocated+cluster"></a>
+<a name="N10203"></a><a name="Running+Hadoop+jobs+using+the+allocated+cluster"></a>
 <h4>Running Hadoop jobs using the allocated cluster</h4>
 <p>
          Now, one can run Hadoop jobs using the allocated cluster in the usual manner:
@@ -631,7 +631,7 @@ document.write("Last Published: " + document.lastModified);
 </tr>
          
 </table>
-<a name="N10225"></a><a name="Operation+deallocate"></a>
+<a name="N10226"></a><a name="Operation+deallocate"></a>
 <h4>Operation deallocate</h4>
 <p>
          The deallocate operation is used to release an allocated cluster. When finished with a cluster, deallocate must be run so that the nodes become free for others to use. The deallocate operation has the following syntax:
@@ -657,7 +657,7 @@ document.write("Last Published: " + document.lastModified);
 </tr>
          
 </table>
-<a name="N10249"></a><a name="Command+Line+Options"></a>
+<a name="N1024A"></a><a name="Command+Line+Options"></a>
 <h3 class="h4">Command Line Options</h3>
 <p>
        This section covers the major command line options available via the hod command:
@@ -768,10 +768,10 @@ document.write("Last Published: " + document.lastModified);
 </p>
 </div>
     
-<a name="N102C9"></a><a name="HOD+Configuration"></a>
+<a name="N102CA"></a><a name="HOD+Configuration"></a>
 <h2 class="h3"> HOD Configuration </h2>
 <div class="section">
-<a name="N102CF"></a><a name="Introduction+to+HOD+Configuration"></a>
+<a name="N102D0"></a><a name="Introduction+to+HOD+Configuration"></a>
 <h3 class="h4"> Introduction to HOD Configuration </h3>
 <p>
        Configuration options for HOD are organized as sections and options within them. They can be specified in two ways: a configuration file in the INI format, and as command line options to the HOD shell, specified in the format --section.option[=value]. If the same option is specified in both places, the value specified on the command line overrides the value in the configuration file.
@@ -783,7 +783,7 @@ document.write("Last Published: " + document.lastModified);
 <p>
        This section explains some of the most important or commonly used configuration options in more detail.
        </p>
-<a name="N102E2"></a><a name="Categories+%2F+Sections+in+HOD+Configuration"></a>
+<a name="N102E3"></a><a name="Categories+%2F+Sections+in+HOD+Configuration"></a>
 <h3 class="h4"> Categories / Sections in HOD Configuration </h3>
 <p>
        The following are the various sections in the HOD configuration:
@@ -840,9 +840,9 @@ document.write("Last Published: " + document.lastModified);
 </tr>
        
 </table>
-<a name="N1034A"></a><a name="Important+and+Commonly+Used+Configuration+Options"></a>
+<a name="N1034B"></a><a name="Important+and+Commonly+Used+Configuration+Options"></a>
 <h3 class="h4"> Important and Commonly Used Configuration Options </h3>
-<a name="N10350"></a><a name="Common+configuration+options"></a>
+<a name="N10351"></a><a name="Common+configuration+options"></a>
 <h4> Common configuration options </h4>
 <p>
          Certain configuration options are defined in most of the sections of the HOD configuration. Options defined in a section are used by the process to which that section applies. These options have the same meaning, but can have different values in each section.
@@ -892,7 +892,7 @@ document.write("Last Published: " + document.lastModified);
 </tr>
          
 </table>
-<a name="N103AE"></a><a name="hod+options"></a>
+<a name="N103AF"></a><a name="hod+options"></a>
 <h4> hod options </h4>
 <table class="ForrestTable" cellspacing="1" cellpadding="4">
            
@@ -918,7 +918,7 @@ document.write("Last Published: " + document.lastModified);
 </tr>
          
 </table>
-<a name="N103DF"></a><a name="resource_manager+options"></a>
+<a name="N103E0"></a><a name="resource_manager+options"></a>
 <h4> resource_manager options </h4>
 <table class="ForrestTable" cellspacing="1" cellpadding="4">
            
@@ -951,7 +951,7 @@ document.write("Last Published: " + document.lastModified);
 </tr>
          
 </table>
-<a name="N1041D"></a><a name="ringmaster+options"></a>
+<a name="N1041E"></a><a name="ringmaster+options"></a>
 <h4> ringmaster options </h4>
 <table class="ForrestTable" cellspacing="1" cellpadding="4">
            
@@ -970,7 +970,7 @@ document.write("Last Published: " + document.lastModified);
 </tr>
          
 </table>
-<a name="N10441"></a><a name="gridservice-hdfs+options"></a>
+<a name="N10442"></a><a name="gridservice-hdfs+options"></a>
 <h4> gridservice-hdfs options </h4>
 <table class="ForrestTable" cellspacing="1" cellpadding="4">
            
@@ -1037,7 +1037,7 @@ document.write("Last Published: " + document.lastModified);
 </tr>
          
 </table>
-<a name="N104B9"></a><a name="gridservice-mapred+options"></a>
+<a name="N104BA"></a><a name="gridservice-mapred+options"></a>
 <h4> gridservice-mapred options </h4>
 <table class="ForrestTable" cellspacing="1" cellpadding="4">
            

+ 46 - 46
docs/mapred_tutorial.html

@@ -280,7 +280,7 @@ document.write("Last Published: " + document.lastModified);
 <a href="#Example%3A+WordCount+v2.0">Example: WordCount v2.0</a>
 <a href="#Example%3A+WordCount+v2.0">Example: WordCount v2.0</a>
 <ul class="minitoc">
 <ul class="minitoc">
 <li>
 <li>
-<a href="#Source+Code-N10BBD">Source Code</a>
+<a href="#Source+Code-N10BBE">Source Code</a>
 </li>
 </li>
 <li>
 <li>
 <a href="#Sample+Runs">Sample Runs</a>
 <a href="#Sample+Runs">Sample Runs</a>
@@ -294,7 +294,7 @@ document.write("Last Published: " + document.lastModified);
 </div>
  
    
-<a name="N1000C"></a><a name="Purpose"></a>
+<a name="N1000D"></a><a name="Purpose"></a>
 <h2 class="h3">Purpose</h2>
 <div class="section">
 <p>This document comprehensively describes all user-facing facets of the 
@@ -303,7 +303,7 @@ document.write("Last Published: " + document.lastModified);
 </div>
    
    
-<a name="N10016"></a><a name="Pre-requisites"></a>
+<a name="N10017"></a><a name="Pre-requisites"></a>
 <h2 class="h3">Pre-requisites</h2>
 <div class="section">
 <p>Ensure that Hadoop is installed, configured and running. More
@@ -323,7 +323,7 @@ document.write("Last Published: " + document.lastModified);
 </div>
    
    
-<a name="N10031"></a><a name="Overview"></a>
+<a name="N10032"></a><a name="Overview"></a>
 <h2 class="h3">Overview</h2>
 <div class="section">
 <p>Hadoop Map-Reduce is a software framework for easily writing 
@@ -381,7 +381,7 @@ document.write("Last Published: " + document.lastModified);
 </div>
    
    
-<a name="N1008A"></a><a name="Inputs+and+Outputs"></a>
+<a name="N1008B"></a><a name="Inputs+and+Outputs"></a>
 <h2 class="h3">Inputs and Outputs</h2>
 <div class="section">
 <p>The Map-Reduce framework operates exclusively on 
@@ -415,7 +415,7 @@ document.write("Last Published: " + document.lastModified);
 </div>
 
    
-<a name="N100CC"></a><a name="Example%3A+WordCount+v1.0"></a>
+<a name="N100CD"></a><a name="Example%3A+WordCount+v1.0"></a>
 <h2 class="h3">Example: WordCount v1.0</h2>
 <div class="section">
 <p>Before we jump into the details, let's walk through an example Map-Reduce 
@@ -428,7 +428,7 @@ document.write("Last Published: " + document.lastModified);
      <a href="quickstart.html#SingleNodeSetup">pseudo-distributed</a> or
      <a href="quickstart.html#Fully-Distributed+Operation">fully-distributed</a> 
      Hadoop installation.</p>
-<a name="N100E9"></a><a name="Source+Code"></a>
+<a name="N100EA"></a><a name="Source+Code"></a>
 <h3 class="h4">Source Code</h3>
 <table class="ForrestTable" cellspacing="1" cellpadding="4">
          
@@ -991,7 +991,7 @@ document.write("Last Published: " + document.lastModified);
 </tr>
        
 </table>
-<a name="N1046B"></a><a name="Usage"></a>
+<a name="N1046C"></a><a name="Usage"></a>
 <h3 class="h4">Usage</h3>
 <p>Assuming <span class="codefrag">HADOOP_HOME</span> is the root of the installation and 
        <span class="codefrag">HADOOP_VERSION</span> is the Hadoop version installed, compile 
@@ -1086,7 +1086,7 @@ document.write("Last Published: " + document.lastModified);
 <br>
        
 </p>
-<a name="N104EB"></a><a name="Walk-through"></a>
+<a name="N104EC"></a><a name="Walk-through"></a>
 <h3 class="h4">Walk-through</h3>
 <p>The <span class="codefrag">WordCount</span> application is quite straight-forward.</p>
 <p>The <span class="codefrag">Mapper</span> implementation (lines 14-26), via the 
@@ -1196,7 +1196,7 @@ document.write("Last Published: " + document.lastModified);
 </div>
    
    
-<a name="N105A2"></a><a name="Map-Reduce+-+User+Interfaces"></a>
+<a name="N105A3"></a><a name="Map-Reduce+-+User+Interfaces"></a>
 <h2 class="h3">Map-Reduce - User Interfaces</h2>
 <div class="section">
 <p>This section provides a reasonable amount of detail on every user-facing 
@@ -1215,12 +1215,12 @@ document.write("Last Published: " + document.lastModified);
 <p>Finally, we will wrap up by discussing some useful features of the
      framework such as the <span class="codefrag">DistributedCache</span>, 
      <span class="codefrag">IsolationRunner</span> etc.</p>
-<a name="N105DB"></a><a name="Payload"></a>
+<a name="N105DC"></a><a name="Payload"></a>
 <h3 class="h4">Payload</h3>
 <p>Applications typically implement the <span class="codefrag">Mapper</span> and 
        <span class="codefrag">Reducer</span> interfaces to provide the <span class="codefrag">map</span> and 
        <span class="codefrag">reduce</span> methods. These form the core of the job.</p>
-<a name="N105F0"></a><a name="Mapper"></a>
+<a name="N105F1"></a><a name="Mapper"></a>
 <h4>Mapper</h4>
 <h4>Mapper</h4>
 <p>
 <p>
 <a href="api/org/apache/hadoop/mapred/Mapper.html">
 <a href="api/org/apache/hadoop/mapred/Mapper.html">
@@ -1276,7 +1276,7 @@ document.write("Last Published: " + document.lastModified);
           <a href="api/org/apache/hadoop/io/compress/CompressionCodec.html">
           <a href="api/org/apache/hadoop/io/compress/CompressionCodec.html">
           CompressionCodec</a> to be used via the <span class="codefrag">JobConf</span>.
           CompressionCodec</a> to be used via the <span class="codefrag">JobConf</span>.
           </p>
           </p>
-<a name="N1066A"></a><a name="How+Many+Maps%3F"></a>
+<a name="N1066B"></a><a name="How+Many+Maps%3F"></a>
 <h5>How Many Maps?</h5>
 <h5>How Many Maps?</h5>
 <p>The number of maps is usually driven by the total size of the 
 <p>The number of maps is usually driven by the total size of the 
             inputs, that is, the total number of blocks of the input files.</p>
             inputs, that is, the total number of blocks of the input files.</p>
@@ -1289,7 +1289,7 @@ document.write("Last Published: " + document.lastModified);
             <a href="api/org/apache/hadoop/mapred/JobConf.html#setNumMapTasks(int)">
             <a href="api/org/apache/hadoop/mapred/JobConf.html#setNumMapTasks(int)">
             setNumMapTasks(int)</a> (which only provides a hint to the framework) 
             setNumMapTasks(int)</a> (which only provides a hint to the framework) 
             is used to set it even higher.</p>
             is used to set it even higher.</p>
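Worked through with illustrative numbers: 10TB of input at a 128MB block size is about 82,000 blocks, hence about 82,000 maps; a driver that wants a higher hint would do something like this fragment (continuing the JobConf examples in this section):

    // ~10TB / 128MB blocks => ~82,000 input blocks and therefore ~82,000 maps.
    // setNumMapTasks is only a hint; the InputFormat has the final say.
    conf.setNumMapTasks(90000);  // illustrative: nudge the hint above the block count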
-<a name="N10682"></a><a name="Reducer"></a>
+<a name="N10683"></a><a name="Reducer"></a>
 <h4>Reducer</h4>
 <h4>Reducer</h4>
 <p>
 <p>
 <a href="api/org/apache/hadoop/mapred/Reducer.html">
 <a href="api/org/apache/hadoop/mapred/Reducer.html">
@@ -1312,18 +1312,18 @@ document.write("Last Published: " + document.lastModified);
 <p>
 <p>
 <span class="codefrag">Reducer</span> has 3 primary phases: shuffle, sort and reduce.
 <span class="codefrag">Reducer</span> has 3 primary phases: shuffle, sort and reduce.
           </p>
           </p>
-<a name="N106B2"></a><a name="Shuffle"></a>
+<a name="N106B3"></a><a name="Shuffle"></a>
 <h5>Shuffle</h5>
 <h5>Shuffle</h5>
 <p>Input to the <span class="codefrag">Reducer</span> is the sorted output of the
 <p>Input to the <span class="codefrag">Reducer</span> is the sorted output of the
             mappers. In this phase the framework fetches the relevant partition 
             mappers. In this phase the framework fetches the relevant partition 
             of the output of all the mappers, via HTTP.</p>
             of the output of all the mappers, via HTTP.</p>
-<a name="N106BF"></a><a name="Sort"></a>
+<a name="N106C0"></a><a name="Sort"></a>
 <h5>Sort</h5>
 <h5>Sort</h5>
 <p>The framework groups <span class="codefrag">Reducer</span> inputs by keys (since 
 <p>The framework groups <span class="codefrag">Reducer</span> inputs by keys (since 
             different mappers may have output the same key) in this stage.</p>
             different mappers may have output the same key) in this stage.</p>
 <p>The shuffle and sort phases occur simultaneously; while 
 <p>The shuffle and sort phases occur simultaneously; while 
             map-outputs are being fetched they are merged.</p>
             map-outputs are being fetched they are merged.</p>
-<a name="N106CE"></a><a name="Secondary+Sort"></a>
+<a name="N106CF"></a><a name="Secondary+Sort"></a>
 <h5>Secondary Sort</h5>
 <h5>Secondary Sort</h5>
 <p>If equivalence rules for grouping the intermediate keys are 
 <p>If equivalence rules for grouping the intermediate keys are 
               required to be different from those for grouping keys before 
               required to be different from those for grouping keys before 
@@ -1334,7 +1334,7 @@ document.write("Last Published: " + document.lastModified);
               JobConf.setOutputKeyComparatorClass(Class)</a> can be used to 
               JobConf.setOutputKeyComparatorClass(Class)</a> can be used to 
               control how intermediate keys are grouped, these can be used in 
               control how intermediate keys are grouped, these can be used in 
               conjunction to simulate <em>secondary sort on values</em>.</p>
               conjunction to simulate <em>secondary sort on values</em>.</p>
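A sketch of that wiring; both comparator classes here are hypothetical application classes, not library types:

    // Order the reduce input by the full composite key...
    conf.setOutputKeyComparatorClass(FullKeyComparator.class);
    // ...but group values into a single reduce() call by the primary part only.
    conf.setOutputValueGroupingComparator(PrimaryKeyGroupingComparator.class);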
-<a name="N106E7"></a><a name="Reduce"></a>
+<a name="N106E8"></a><a name="Reduce"></a>
 <h5>Reduce</h5>
 <h5>Reduce</h5>
 <p>In this phase the 
 <p>In this phase the 
             <a href="api/org/apache/hadoop/mapred/Reducer.html#reduce(K2, java.util.Iterator, org.apache.hadoop.mapred.OutputCollector, org.apache.hadoop.mapred.Reporter)">
             <a href="api/org/apache/hadoop/mapred/Reducer.html#reduce(K2, java.util.Iterator, org.apache.hadoop.mapred.OutputCollector, org.apache.hadoop.mapred.Reporter)">
@@ -1350,7 +1350,7 @@ document.write("Last Published: " + document.lastModified);
             progress, set application-level status messages and update 
             progress, set application-level status messages and update 
             <span class="codefrag">Counters</span>, or just indicate that they are alive.</p>
             <span class="codefrag">Counters</span>, or just indicate that they are alive.</p>
 <p>The output of the <span class="codefrag">Reducer</span> is <em>not sorted</em>.</p>
 <p>The output of the <span class="codefrag">Reducer</span> is <em>not sorted</em>.</p>
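A matching Reducer sketch for the mapper shown under Payload, under the same assumptions and with an illustrative class name:

    import java.io.IOException;
    import java.util.Iterator;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reducer;
    import org.apache.hadoop.mapred.Reporter;

    // Sums the (token, 1) pairs emitted by the mapper into (token, count).
    public class TokenCountReducer extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, IntWritable> {
      public void reduce(Text key, Iterator<IntWritable> values,
                         OutputCollector<Text, IntWritable> output,
                         Reporter reporter) throws IOException {
        int sum = 0;
        while (values.hasNext()) {
          sum += values.next().get();
        }
        output.collect(key, new IntWritable(sum));
      }
    }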
-<a name="N10715"></a><a name="How+Many+Reduces%3F"></a>
+<a name="N10716"></a><a name="How+Many+Reduces%3F"></a>
 <h5>How Many Reduces?</h5>
 <h5>How Many Reduces?</h5>
 <p>The right number of reduces seems to be <span class="codefrag">0.95</span> or 
 <p>The right number of reduces seems to be <span class="codefrag">0.95</span> or 
             <span class="codefrag">1.75</span> multiplied by (&lt;<em>no. of nodes</em>&gt; * 
             <span class="codefrag">1.75</span> multiplied by (&lt;<em>no. of nodes</em>&gt; * 
@@ -1365,7 +1365,7 @@ document.write("Last Published: " + document.lastModified);
 <p>The scaling factors above are slightly less than whole numbers to 
 <p>The scaling factors above are slightly less than whole numbers to 
             reserve a few reduce slots in the framework for speculative-tasks and
             reserve a few reduce slots in the framework for speculative-tasks and
             failed tasks.</p>
             failed tasks.</p>
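Worked through for a hypothetical 10-node cluster with 2 task slots per node, the lower factor gives 0.95 * 10 * 2 = 19:

    conf.setNumReduceTasks(19);  // 0.95 * 10 nodes * 2 task slots per node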
-<a name="N1073A"></a><a name="Reducer+NONE"></a>
+<a name="N1073B"></a><a name="Reducer+NONE"></a>
 <h5>Reducer NONE</h5>
 <h5>Reducer NONE</h5>
 <p>It is legal to set the number of reduce-tasks to <em>zero</em> if 
 <p>It is legal to set the number of reduce-tasks to <em>zero</em> if 
             no reduction is desired.</p>
             no reduction is desired.</p>
@@ -1375,7 +1375,7 @@ document.write("Last Published: " + document.lastModified);
             setOutputPath(Path)</a>. The framework does not sort the 
             setOutputPath(Path)</a>. The framework does not sort the 
             map-outputs before writing them out to the <span class="codefrag">FileSystem</span>.
             map-outputs before writing them out to the <span class="codefrag">FileSystem</span>.
             </p>
             </p>
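A map-only job is then just this fragment (output path illustrative):

    conf.setNumReduceTasks(0);                        // Reducer NONE
    conf.setOutputPath(new Path("map-only-output"));  // maps write directly to the FileSystem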
-<a name="N10755"></a><a name="Partitioner"></a>
+<a name="N10756"></a><a name="Partitioner"></a>
 <h4>Partitioner</h4>
 <h4>Partitioner</h4>
 <p>
 <p>
 <a href="api/org/apache/hadoop/mapred/Partitioner.html">
 <a href="api/org/apache/hadoop/mapred/Partitioner.html">
@@ -1389,7 +1389,7 @@ document.write("Last Published: " + document.lastModified);
 <p>
 <p>
 <a href="api/org/apache/hadoop/mapred/lib/HashPartitioner.html">
 <a href="api/org/apache/hadoop/mapred/lib/HashPartitioner.html">
           HashPartitioner</a> is the default <span class="codefrag">Partitioner</span>.</p>
           HashPartitioner</a> is the default <span class="codefrag">Partitioner</span>.</p>
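A sketch of a custom Partitioner under the same API assumptions; the "site:page" key scheme is invented for illustration:

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.Partitioner;

    // Partitions on the "site" prefix of keys shaped like "site:page",
    // so all keys for one site go to the same reduce.
    public class SitePartitioner implements Partitioner<Text, IntWritable> {
      public int getPartition(Text key, IntWritable value, int numReduceTasks) {
        String site = key.toString().split(":", 2)[0];
        return (site.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
      }
      public void configure(JobConf job) {}  // Partitioner extends JobConfigurable
    }

It would be registered on the job via conf.setPartitionerClass(SitePartitioner.class).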
-<a name="N10774"></a><a name="Reporter"></a>
+<a name="N10775"></a><a name="Reporter"></a>
 <h4>Reporter</h4>
 <h4>Reporter</h4>
 <p>
 <p>
 <a href="api/org/apache/hadoop/mapred/Reporter.html">
 <a href="api/org/apache/hadoop/mapred/Reporter.html">
@@ -1408,7 +1408,7 @@ document.write("Last Published: " + document.lastModified);
           </p>
           </p>
 <p>Applications can also update <span class="codefrag">Counters</span> using the 
 <p>Applications can also update <span class="codefrag">Counters</span> using the 
           <span class="codefrag">Reporter</span>.</p>
           <span class="codefrag">Reporter</span>.</p>
-<a name="N1079E"></a><a name="OutputCollector"></a>
+<a name="N1079F"></a><a name="OutputCollector"></a>
 <h4>OutputCollector</h4>
 <h4>OutputCollector</h4>
 <p>
 <p>
 <a href="api/org/apache/hadoop/mapred/OutputCollector.html">
 <a href="api/org/apache/hadoop/mapred/OutputCollector.html">
@@ -1419,7 +1419,7 @@ document.write("Last Published: " + document.lastModified);
 <p>Hadoop Map-Reduce comes bundled with a 
 <p>Hadoop Map-Reduce comes bundled with a 
         <a href="api/org/apache/hadoop/mapred/lib/package-summary.html">
         <a href="api/org/apache/hadoop/mapred/lib/package-summary.html">
         library</a> of generally useful mappers, reducers, and partitioners.</p>
         library</a> of generally useful mappers, reducers, and partitioners.</p>
-<a name="N107B9"></a><a name="Job+Configuration"></a>
+<a name="N107BA"></a><a name="Job+Configuration"></a>
 <h3 class="h4">Job Configuration</h3>
 <h3 class="h4">Job Configuration</h3>
 <p>
 <p>
 <a href="api/org/apache/hadoop/mapred/JobConf.html">
 <a href="api/org/apache/hadoop/mapred/JobConf.html">
@@ -1474,7 +1474,7 @@ document.write("Last Published: " + document.lastModified);
         <a href="api/org/apache/hadoop/conf/Configuration.html#set(java.lang.String, java.lang.String)">set(String, String)</a>/<a href="api/org/apache/hadoop/conf/Configuration.html#get(java.lang.String, java.lang.String)">get(String, String)</a>
         <a href="api/org/apache/hadoop/conf/Configuration.html#set(java.lang.String, java.lang.String)">set(String, String)</a>/<a href="api/org/apache/hadoop/conf/Configuration.html#get(java.lang.String, java.lang.String)">get(String, String)</a>
         to set/get arbitrary parameters needed by applications. However, use the 
         to set/get arbitrary parameters needed by applications. However, use the 
         <span class="codefrag">DistributedCache</span> for large amounts of (read-only) data.</p>
         <span class="codefrag">DistributedCache</span> for large amounts of (read-only) data.</p>
-<a name="N10843"></a><a name="Task+Execution+%26+Environment"></a>
+<a name="N10844"></a><a name="Task+Execution+%26+Environment"></a>
 <h3 class="h4">Task Execution &amp; Environment</h3>
 <h3 class="h4">Task Execution &amp; Environment</h3>
 <p>The <span class="codefrag">TaskTracker</span> executes the <span class="codefrag">Mapper</span>/ 
 <p>The <span class="codefrag">TaskTracker</span> executes the <span class="codefrag">Mapper</span>/ 
         <span class="codefrag">Reducer</span>  <em>task</em> as a child process in a separate jvm.
         <span class="codefrag">Reducer</span>  <em>task</em> as a child process in a separate jvm.
@@ -1534,7 +1534,7 @@ document.write("Last Published: " + document.lastModified);
         loaded via <a href="http://java.sun.com/j2se/1.5.0/docs/api/java/lang/System.html#loadLibrary(java.lang.String)">
         loaded via <a href="http://java.sun.com/j2se/1.5.0/docs/api/java/lang/System.html#loadLibrary(java.lang.String)">
         System.loadLibrary</a> or <a href="http://java.sun.com/j2se/1.5.0/docs/api/java/lang/System.html#load(java.lang.String)">
         System.loadLibrary</a> or <a href="http://java.sun.com/j2se/1.5.0/docs/api/java/lang/System.html#load(java.lang.String)">
         System.load</a>.</p>
         System.load</a>.</p>
-<a name="N108B8"></a><a name="Job+Submission+and+Monitoring"></a>
+<a name="N108B9"></a><a name="Job+Submission+and+Monitoring"></a>
 <h3 class="h4">Job Submission and Monitoring</h3>
 <h3 class="h4">Job Submission and Monitoring</h3>
 <p>
 <p>
 <a href="api/org/apache/hadoop/mapred/JobClient.html">
 <a href="api/org/apache/hadoop/mapred/JobClient.html">
@@ -1570,7 +1570,7 @@ document.write("Last Published: " + document.lastModified);
 <p>Normally the user creates the application, describes various facets 
 <p>Normally the user creates the application, describes various facets 
         of the job via <span class="codefrag">JobConf</span>, and then uses the 
         of the job via <span class="codefrag">JobConf</span>, and then uses the 
         <span class="codefrag">JobClient</span> to submit the job and monitor its progress.</p>
         <span class="codefrag">JobClient</span> to submit the job and monitor its progress.</p>
-<a name="N108F6"></a><a name="Job+Control"></a>
+<a name="N108F7"></a><a name="Job+Control"></a>
 <h4>Job Control</h4>
 <h4>Job Control</h4>
 <p>Users may need to chain map-reduce jobs to accomplish complex
 <p>Users may need to chain map-reduce jobs to accomplish complex
           tasks which cannot be done via a single map-reduce job. This is fairly
           tasks which cannot be done via a single map-reduce job. This is fairly
@@ -1606,7 +1606,7 @@ document.write("Last Published: " + document.lastModified);
             </li>
             </li>
           
           
 </ul>
 </ul>
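A sketch of the first of those options, chaining blocking runJob calls so each stage reads the previous stage's output; driver classes and paths are illustrative:

    JobConf extract = new JobConf(ExtractJob.class);      // hypothetical stage-1 driver
    extract.setOutputPath(new Path("stage1"));
    JobClient.runJob(extract);                            // throws IOException if stage 1 fails

    JobConf aggregate = new JobConf(AggregateJob.class);  // hypothetical stage-2 driver
    aggregate.setInputPath(new Path("stage1"));           // consume stage-1 output
    aggregate.setOutputPath(new Path("stage2"));
    JobClient.runJob(aggregate);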
-<a name="N10920"></a><a name="Job+Input"></a>
+<a name="N10921"></a><a name="Job+Input"></a>
 <h3 class="h4">Job Input</h3>
 <h3 class="h4">Job Input</h3>
 <p>
 <p>
 <a href="api/org/apache/hadoop/mapred/InputFormat.html">
 <a href="api/org/apache/hadoop/mapred/InputFormat.html">
@@ -1654,7 +1654,7 @@ document.write("Last Published: " + document.lastModified);
         appropriate <span class="codefrag">CompressionCodec</span>. However, it must be noted that
         appropriate <span class="codefrag">CompressionCodec</span>. However, it must be noted that
         compressed files with the above extensions cannot be <em>split</em> and 
         compressed files with the above extensions cannot be <em>split</em> and 
         each compressed file is processed in its entirety by a single mapper.</p>
         each compressed file is processed in its entirety by a single mapper.</p>
-<a name="N1098A"></a><a name="InputSplit"></a>
+<a name="N1098B"></a><a name="InputSplit"></a>
 <h4>InputSplit</h4>
 <h4>InputSplit</h4>
 <p>
 <p>
 <a href="api/org/apache/hadoop/mapred/InputSplit.html">
 <a href="api/org/apache/hadoop/mapred/InputSplit.html">
@@ -1668,7 +1668,7 @@ document.write("Last Published: " + document.lastModified);
           FileSplit</a> is the default <span class="codefrag">InputSplit</span>. It sets 
           FileSplit</a> is the default <span class="codefrag">InputSplit</span>. It sets 
           <span class="codefrag">map.input.file</span> to the path of the input file for the
           <span class="codefrag">map.input.file</span> to the path of the input file for the
           logical split.</p>
           logical split.</p>
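A task can pick that value up in its configure hook; a sketch with an illustrative class name:

    import java.io.IOException;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    // Tags each record with the file it came from, via map.input.file.
    public class TaggingMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {
      private final Text tag = new Text();

      public void configure(JobConf job) {
        tag.set(job.get("map.input.file", "unknown"));  // set by the framework for FileSplits
      }

      public void map(LongWritable key, Text value,
                      OutputCollector<Text, Text> output,
                      Reporter reporter) throws IOException {
        output.collect(tag, value);  // (source file, record)
      }
    }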
-<a name="N109AF"></a><a name="RecordReader"></a>
+<a name="N109B0"></a><a name="RecordReader"></a>
 <h4>RecordReader</h4>
 <h4>RecordReader</h4>
 <p>
 <p>
 <a href="api/org/apache/hadoop/mapred/RecordReader.html">
 <a href="api/org/apache/hadoop/mapred/RecordReader.html">
@@ -1680,7 +1680,7 @@ document.write("Last Published: " + document.lastModified);
           for processing. <span class="codefrag">RecordReader</span> thus assumes the 
           for processing. <span class="codefrag">RecordReader</span> thus assumes the 
           responsibility of processing record boundaries and presents the tasks 
           responsibility of processing record boundaries and presents the tasks 
           with keys and values.</p>
           with keys and values.</p>
-<a name="N109D2"></a><a name="Job+Output"></a>
+<a name="N109D3"></a><a name="Job+Output"></a>
 <h3 class="h4">Job Output</h3>
 <h3 class="h4">Job Output</h3>
 <p>
 <p>
 <a href="api/org/apache/hadoop/mapred/OutputFormat.html">
 <a href="api/org/apache/hadoop/mapred/OutputFormat.html">
@@ -1705,7 +1705,7 @@ document.write("Last Published: " + document.lastModified);
 <p>
 <p>
 <span class="codefrag">TextOutputFormat</span> is the default 
 <span class="codefrag">TextOutputFormat</span> is the default 
         <span class="codefrag">OutputFormat</span>.</p>
         <span class="codefrag">OutputFormat</span>.</p>
-<a name="N109FB"></a><a name="Task+Side-Effect+Files"></a>
+<a name="N109FC"></a><a name="Task+Side-Effect+Files"></a>
 <h4>Task Side-Effect Files</h4>
 <h4>Task Side-Effect Files</h4>
 <p>In some applications, component tasks need to create and/or write to
 <p>In some applications, component tasks need to create and/or write to
           side-files, which differ from the actual job-output files.</p>
           side-files, which differ from the actual job-output files.</p>
@@ -1731,7 +1731,7 @@ document.write("Last Published: " + document.lastModified);
           JobConf.getOutputPath()</a>, and the framework will promote them 
           JobConf.getOutputPath()</a>, and the framework will promote them 
           similarly for succesful task-attempts, thus eliminating the need to 
           similarly for succesful task-attempts, thus eliminating the need to 
           pick unique paths per task-attempt.</p>
           pick unique paths per task-attempt.</p>
-<a name="N10A30"></a><a name="RecordWriter"></a>
+<a name="N10A31"></a><a name="RecordWriter"></a>
 <h4>RecordWriter</h4>
 <h4>RecordWriter</h4>
 <p>
 <p>
 <a href="api/org/apache/hadoop/mapred/RecordWriter.html">
 <a href="api/org/apache/hadoop/mapred/RecordWriter.html">
@@ -1739,9 +1739,9 @@ document.write("Last Published: " + document.lastModified);
           pairs to an output file.</p>
           pairs to an output file.</p>
 <p>RecordWriter implementations write the job outputs to the 
 <p>RecordWriter implementations write the job outputs to the 
           <span class="codefrag">FileSystem</span>.</p>
           <span class="codefrag">FileSystem</span>.</p>
-<a name="N10A47"></a><a name="Other+Useful+Features"></a>
+<a name="N10A48"></a><a name="Other+Useful+Features"></a>
 <h3 class="h4">Other Useful Features</h3>
 <h3 class="h4">Other Useful Features</h3>
-<a name="N10A4D"></a><a name="Counters"></a>
+<a name="N10A4E"></a><a name="Counters"></a>
 <h4>Counters</h4>
 <h4>Counters</h4>
 <p>
 <p>
 <span class="codefrag">Counters</span> represent global counters, defined either by 
 <span class="codefrag">Counters</span> represent global counters, defined either by 
@@ -1755,7 +1755,7 @@ document.write("Last Published: " + document.lastModified);
           Reporter.incrCounter(Enum, long)</a> in the <span class="codefrag">map</span> and/or 
           Reporter.incrCounter(Enum, long)</a> in the <span class="codefrag">map</span> and/or 
           <span class="codefrag">reduce</span> methods. These counters are then globally 
           <span class="codefrag">reduce</span> methods. These counters are then globally 
           aggregated by the framework.</p>
           aggregated by the framework.</p>
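A sketch of an application-defined counter; the enum and the condition are illustrative:

    // In the task implementation:
    static enum RecordCounters { MALFORMED }

    // Inside map() or reduce():
    if (malformed) {                                       // hypothetical validity check
      reporter.incrCounter(RecordCounters.MALFORMED, 1);   // aggregated across all tasks
    }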
-<a name="N10A78"></a><a name="DistributedCache"></a>
+<a name="N10A79"></a><a name="DistributedCache"></a>
 <h4>DistributedCache</h4>
 <h4>DistributedCache</h4>
 <p>
 <p>
 <a href="api/org/apache/hadoop/filecache/DistributedCache.html">
 <a href="api/org/apache/hadoop/filecache/DistributedCache.html">
@@ -1788,7 +1788,7 @@ document.write("Last Published: " + document.lastModified);
           <a href="api/org/apache/hadoop/filecache/DistributedCache.html#createSymlink(org.apache.hadoop.conf.Configuration)">
           <a href="api/org/apache/hadoop/filecache/DistributedCache.html#createSymlink(org.apache.hadoop.conf.Configuration)">
           DistributedCache.createSymlink(Path, Configuration)</a> api. Files 
           DistributedCache.createSymlink(Path, Configuration)</a> api. Files 
           have <em>execution permissions</em> set.</p>
           have <em>execution permissions</em> set.</p>
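A sketch of caching one DFS-resident file with a symlink into the task's working directory (the path is illustrative; URI exception handling elided):

    DistributedCache.createSymlink(conf);  // allow "#name" fragments to create symlinks
    DistributedCache.addCacheFile(new URI("/data/lexicon.txt#lexicon"), conf);
    // Tasks can then open the file as plain "lexicon" in their current directory.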
-<a name="N10AB6"></a><a name="Tool"></a>
+<a name="N10AB7"></a><a name="Tool"></a>
 <h4>Tool</h4>
 <h4>Tool</h4>
 <p>The <a href="api/org/apache/hadoop/util/Tool.html">Tool</a> 
 <p>The <a href="api/org/apache/hadoop/util/Tool.html">Tool</a> 
           interface supports the handling of generic Hadoop command-line options.
           interface supports the handling of generic Hadoop command-line options.
@@ -1828,7 +1828,7 @@ document.write("Last Published: " + document.lastModified);
             </span>
             </span>
           
           
 </p>
 </p>
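A skeleton of such a driver, assuming the ToolRunner helper in org.apache.hadoop.util (older releases expressed the same pattern through a ToolBase base class):

    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    // ToolRunner parses the generic Hadoop options out of args (the exact
    // option set varies by release) before run() sees the remainder.
    public class TokenCount extends Configured implements Tool {
      public int run(String[] args) throws Exception {
        // build a JobConf from getConf() and call JobClient.runJob(...) here
        return 0;
      }

      public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new TokenCount(), args));
      }
    }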
-<a name="N10AE8"></a><a name="IsolationRunner"></a>
+<a name="N10AE9"></a><a name="IsolationRunner"></a>
 <h4>IsolationRunner</h4>
 <h4>IsolationRunner</h4>
 <p>
 <p>
 <a href="api/org/apache/hadoop/mapred/IsolationRunner.html">
 <a href="api/org/apache/hadoop/mapred/IsolationRunner.html">
@@ -1852,13 +1852,13 @@ document.write("Last Published: " + document.lastModified);
 <p>
 <p>
 <span class="codefrag">IsolationRunner</span> will run the failed task in a single 
 <span class="codefrag">IsolationRunner</span> will run the failed task in a single 
           jvm, which can be in the debugger, over precisely the same input.</p>
           jvm, which can be in the debugger, over precisely the same input.</p>
-<a name="N10B1B"></a><a name="JobControl"></a>
+<a name="N10B1C"></a><a name="JobControl"></a>
 <h4>JobControl</h4>
 <h4>JobControl</h4>
 <p>
 <p>
 <a href="api/org/apache/hadoop/mapred/jobcontrol/package-summary.html">
 <a href="api/org/apache/hadoop/mapred/jobcontrol/package-summary.html">
           JobControl</a> is a utility which encapsulates a set of Map-Reduce jobs
           JobControl</a> is a utility which encapsulates a set of Map-Reduce jobs
           and their dependencies.</p>
           and their dependencies.</p>
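A sketch of a two-job dependency graph with the jobcontrol classes, assuming the Job(JobConf) constructor and the addDependingJob/allFinished methods shown in the package's API docs (exception handling elided):

    import org.apache.hadoop.mapred.jobcontrol.Job;
    import org.apache.hadoop.mapred.jobcontrol.JobControl;

    Job extract = new Job(extractConf);    // wrap each configured JobConf
    Job aggregate = new Job(aggregateConf);
    aggregate.addDependingJob(extract);    // aggregate runs only after extract succeeds

    JobControl chain = new JobControl("extract-then-aggregate");
    chain.addJob(extract);
    chain.addJob(aggregate);

    new Thread(chain).start();             // JobControl is a Runnable that drives the graph
    while (!chain.allFinished()) {
      Thread.sleep(5000);
    }
    chain.stop();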
-<a name="N10B28"></a><a name="Data+Compression"></a>
+<a name="N10B29"></a><a name="Data+Compression"></a>
 <h4>Data Compression</h4>
 <h4>Data Compression</h4>
 <p>Hadoop Map-Reduce provides facilities for the application-writer to
 <p>Hadoop Map-Reduce provides facilities for the application-writer to
           specify compression for both intermediate map-outputs and the
           specify compression for both intermediate map-outputs and the
@@ -1872,7 +1872,7 @@ document.write("Last Published: " + document.lastModified);
           codecs for reasons of both performance (zlib) and non-availability of
           codecs for reasons of both performance (zlib) and non-availability of
           Java libraries (lzo). More details on their usage and availability are
           Java libraries (lzo). More details on their usage and availability are
           available <a href="native_libraries.html">here</a>.</p>
           available <a href="native_libraries.html">here</a>.</p>
-<a name="N10B48"></a><a name="Intermediate+Outputs"></a>
+<a name="N10B49"></a><a name="Intermediate+Outputs"></a>
 <h5>Intermediate Outputs</h5>
 <h5>Intermediate Outputs</h5>
 <p>Applications can control compression of intermediate map-outputs
 <p>Applications can control compression of intermediate map-outputs
             via the 
             via the 
@@ -1893,7 +1893,7 @@ document.write("Last Published: " + document.lastModified);
             <a href="api/org/apache/hadoop/mapred/JobConf.html#setMapOutputCompressionType(org.apache.hadoop.io.SequenceFile.CompressionType)">
             <a href="api/org/apache/hadoop/mapred/JobConf.html#setMapOutputCompressionType(org.apache.hadoop.io.SequenceFile.CompressionType)">
             JobConf.setMapOutputCompressionType(SequenceFile.CompressionType)</a> 
             JobConf.setMapOutputCompressionType(SequenceFile.CompressionType)</a> 
             api.</p>
             api.</p>
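For example, as a fragment on the job's JobConf and assuming the companion setMapOutputCompressorClass setter for the codec (the codec choice is illustrative):

    conf.setCompressMapOutput(true);                                       // compress map-outputs
    conf.setMapOutputCompressionType(SequenceFile.CompressionType.BLOCK);  // compress in blocks
    conf.setMapOutputCompressorClass(DefaultCodec.class);                  // zlib-backed codec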
-<a name="N10B74"></a><a name="Job+Outputs"></a>
+<a name="N10B75"></a><a name="Job+Outputs"></a>
 <h5>Job Outputs</h5>
 <h5>Job Outputs</h5>
 <p>Applications can control compression of job-outputs via the
 <p>Applications can control compression of job-outputs via the
             <a href="api/org/apache/hadoop/mapred/OutputFormatBase.html#setCompressOutput(org.apache.hadoop.mapred.JobConf,%20boolean)">
             <a href="api/org/apache/hadoop/mapred/OutputFormatBase.html#setCompressOutput(org.apache.hadoop.mapred.JobConf,%20boolean)">
@@ -1913,7 +1913,7 @@ document.write("Last Published: " + document.lastModified);
 </div>
 </div>
 
 
     
     
-<a name="N10BA3"></a><a name="Example%3A+WordCount+v2.0"></a>
+<a name="N10BA4"></a><a name="Example%3A+WordCount+v2.0"></a>
 <h2 class="h3">Example: WordCount v2.0</h2>
 <h2 class="h3">Example: WordCount v2.0</h2>
 <div class="section">
 <div class="section">
 <p>Here is a more complete <span class="codefrag">WordCount</span> which uses many of the
 <p>Here is a more complete <span class="codefrag">WordCount</span> which uses many of the
@@ -1923,7 +1923,7 @@ document.write("Last Published: " + document.lastModified);
       <a href="quickstart.html#SingleNodeSetup">pseudo-distributed</a> or
       <a href="quickstart.html#SingleNodeSetup">pseudo-distributed</a> or
       <a href="quickstart.html#Fully-Distributed+Operation">fully-distributed</a> 
       <a href="quickstart.html#Fully-Distributed+Operation">fully-distributed</a> 
       Hadoop installation.</p>
       Hadoop installation.</p>
-<a name="N10BBD"></a><a name="Source+Code-N10BBD"></a>
+<a name="N10BBE"></a><a name="Source+Code-N10BBE"></a>
 <h3 class="h4">Source Code</h3>
 <h3 class="h4">Source Code</h3>
 <table class="ForrestTable" cellspacing="1" cellpadding="4">
 <table class="ForrestTable" cellspacing="1" cellpadding="4">
           
           
@@ -3133,7 +3133,7 @@ document.write("Last Published: " + document.lastModified);
 </tr>
 </tr>
         
         
 </table>
 </table>
-<a name="N1131F"></a><a name="Sample+Runs"></a>
+<a name="N11320"></a><a name="Sample+Runs"></a>
 <h3 class="h4">Sample Runs</h3>
 <h3 class="h4">Sample Runs</h3>
 <p>Sample text-files as input:</p>
 <p>Sample text-files as input:</p>
 <p>
 <p>
@@ -3301,7 +3301,7 @@ document.write("Last Published: " + document.lastModified);
 <br>
 <br>
         
         
 </p>
 </p>
-<a name="N113F3"></a><a name="Highlights"></a>
+<a name="N113F4"></a><a name="Highlights"></a>
 <h3 class="h4">Highlights</h3>
 <h3 class="h4">Highlights</h3>
 <p>The second version of <span class="codefrag">WordCount</span> improves upon the 
 <p>The second version of <span class="codefrag">WordCount</span> improves upon the 
         previous one by using some features offered by the Map-Reduce framework:
         previous one by using some features offered by the Map-Reduce framework:

+ 6 - 6
docs/native_libraries.html

@@ -190,7 +190,7 @@ document.write("Last Published: " + document.lastModified);
 </div>
 </div>
   
   
     
     
-<a name="N1000C"></a><a name="Purpose"></a>
+<a name="N1000D"></a><a name="Purpose"></a>
 <h2 class="h3">Purpose</h2>
 <h2 class="h3">Purpose</h2>
 <div class="section">
 <div class="section">
 <p>Hadoop has native implementations of certain components for reasons of 
 <p>Hadoop has native implementations of certain components for reasons of 
@@ -201,7 +201,7 @@ document.write("Last Published: " + document.lastModified);
 </div>
 </div>
     
     
     
     
-<a name="N10019"></a><a name="Components"></a>
+<a name="N1001A"></a><a name="Components"></a>
 <h2 class="h3">Components</h2>
 <h2 class="h3">Components</h2>
 <div class="section">
 <div class="section">
 <p>Hadoop currently has the following 
 <p>Hadoop currently has the following 
@@ -227,7 +227,7 @@ document.write("Last Published: " + document.lastModified);
 </div>
 </div>
 
 
     
     
-<a name="N1003C"></a><a name="Usage"></a>
+<a name="N1003D"></a><a name="Usage"></a>
 <h2 class="h3">Usage</h2>
 <h2 class="h3">Usage</h2>
 <div class="section">
 <div class="section">
 <p>It is fairly simple to use the native hadoop libraries:</p>
 <p>It is fairly simple to use the native hadoop libraries:</p>
@@ -281,7 +281,7 @@ document.write("Last Published: " + document.lastModified);
 </div>
 </div>
     
     
     
     
-<a name="N10086"></a><a name="Supported+Platforms"></a>
+<a name="N10087"></a><a name="Supported+Platforms"></a>
 <h2 class="h3">Supported Platforms</h2>
 <h2 class="h3">Supported Platforms</h2>
 <div class="section">
 <div class="section">
<p>The Hadoop native library is supported on *nix platforms only.
<p>The Hadoop native library is supported on *nix platforms only.
@@ -311,7 +311,7 @@ document.write("Last Published: " + document.lastModified);
 </div>
 </div>
     
     
     
     
-<a name="N100B6"></a><a name="Building+Native+Hadoop+Libraries"></a>
+<a name="N100B7"></a><a name="Building+Native+Hadoop+Libraries"></a>
 <h2 class="h3">Building Native Hadoop Libraries</h2>
 <h2 class="h3">Building Native Hadoop Libraries</h2>
 <div class="section">
 <div class="section">
 <p>Hadoop native library is written in 
 <p>Hadoop native library is written in 
@@ -360,7 +360,7 @@ document.write("Last Published: " + document.lastModified);
<p>where &lt;platform&gt; is a combination of the system properties: 
<p>where &lt;platform&gt; is a combination of the system properties: 
      <span class="codefrag">${os.name}-${os.arch}-${sun.arch.data.model}</span>; e.g. 
      <span class="codefrag">${os.name}-${os.arch}-${sun.arch.data.model}</span>; e.g. 
       Linux-i386-32.</p>
       Linux-i386-32.</p>
-<a name="N10109"></a><a name="Notes"></a>
+<a name="N1010A"></a><a name="Notes"></a>
 <h3 class="h4">Notes</h3>
 <h3 class="h4">Notes</h3>
 <ul>
 <ul>
           
           

+ 13 - 13
docs/quickstart.html

@@ -215,7 +215,7 @@ document.write("Last Published: " + document.lastModified);
 </div>
 </div>
   
   
     
     
-<a name="N1000C"></a><a name="Purpose"></a>
+<a name="N1000D"></a><a name="Purpose"></a>
 <h2 class="h3">Purpose</h2>
 <h2 class="h3">Purpose</h2>
 <div class="section">
 <div class="section">
 <p>The purpose of this document is to help users get a single-node Hadoop 
 <p>The purpose of this document is to help users get a single-node Hadoop 
@@ -227,10 +227,10 @@ document.write("Last Published: " + document.lastModified);
 </div>
 </div>
     
     
     
     
-<a name="N1001E"></a><a name="PreReqs"></a>
+<a name="N1001F"></a><a name="PreReqs"></a>
 <h2 class="h3">Pre-requisites</h2>
 <h2 class="h3">Pre-requisites</h2>
 <div class="section">
 <div class="section">
-<a name="N10024"></a><a name="Supported+Platforms"></a>
+<a name="N10025"></a><a name="Supported+Platforms"></a>
 <h3 class="h4">Supported Platforms</h3>
 <h3 class="h4">Supported Platforms</h3>
 <ul>
 <ul>
           
           
@@ -245,7 +245,7 @@ document.write("Last Published: " + document.lastModified);
           </li>
           </li>
         
         
 </ul>
 </ul>
-<a name="N1003A"></a><a name="Required+Software"></a>
+<a name="N1003B"></a><a name="Required+Software"></a>
 <h3 class="h4">Required Software</h3>
 <h3 class="h4">Required Software</h3>
 <ol>
 <ol>
           
           
@@ -262,7 +262,7 @@ document.write("Last Published: " + document.lastModified);
           </li>
           </li>
         
         
 </ol>
 </ol>
-<a name="N10055"></a><a name="Additional+requirements+for+Windows"></a>
+<a name="N10056"></a><a name="Additional+requirements+for+Windows"></a>
 <h4>Additional requirements for Windows</h4>
 <h4>Additional requirements for Windows</h4>
 <ol>
 <ol>
             
             
@@ -273,7 +273,7 @@ document.write("Last Published: " + document.lastModified);
             </li>
             </li>
           
           
 </ol>
 </ol>
-<a name="N10067"></a><a name="Installing+Software"></a>
+<a name="N10068"></a><a name="Installing+Software"></a>
 <h3 class="h4">Installing Software</h3>
 <h3 class="h4">Installing Software</h3>
<p>If your cluster doesn't have the requisite software, you will need to
<p>If your cluster doesn't have the requisite software, you will need to
         install it.</p>
         install it.</p>
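<p>On a Debian-style system, for instance, the required packages can be pulled in with the package manager. This is an illustrative sketch only; package names and the package manager itself vary by platform:</p>
<pre class="code">
# Debian/Ubuntu package names assumed
$ sudo apt-get install ssh
$ sudo apt-get install rsync
</pre>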
@@ -296,7 +296,7 @@ document.write("Last Published: " + document.lastModified);
 </div>
 </div>
     
     
     
     
-<a name="N1008B"></a><a name="Download"></a>
+<a name="N1008C"></a><a name="Download"></a>
 <h2 class="h3">Download</h2>
 <h2 class="h3">Download</h2>
 <div class="section">
 <div class="section">
 <p>
 <p>
@@ -318,7 +318,7 @@ document.write("Last Published: " + document.lastModified);
 </div>
 </div>
     
     
     
     
-<a name="N100AE"></a><a name="Standalone+Operation"></a>
+<a name="N100AF"></a><a name="Standalone+Operation"></a>
 <h2 class="h3">Standalone Operation</h2>
 <h2 class="h3">Standalone Operation</h2>
 <div class="section">
 <div class="section">
 <p>By default, Hadoop is configured to run things in a non-distributed 
 <p>By default, Hadoop is configured to run things in a non-distributed 
@@ -346,12 +346,12 @@ document.write("Last Published: " + document.lastModified);
 </div>
 </div>
     
     
     
     
-<a name="N100D2"></a><a name="SingleNodeSetup"></a>
+<a name="N100D3"></a><a name="SingleNodeSetup"></a>
 <h2 class="h3">Pseudo-Distributed Operation</h2>
 <h2 class="h3">Pseudo-Distributed Operation</h2>
 <div class="section">
 <div class="section">
<p>Hadoop can also be run on a single node in a pseudo-distributed mode 
<p>Hadoop can also be run on a single node in a pseudo-distributed mode 
 	  where each Hadoop daemon runs in a separate Java process.</p>
 	  where each Hadoop daemon runs in a separate Java process.</p>
-<a name="N100DB"></a><a name="Configuration"></a>
+<a name="N100DC"></a><a name="Configuration"></a>
 <h3 class="h4">Configuration</h3>
 <h3 class="h4">Configuration</h3>
 <p>Use the following <span class="codefrag">conf/hadoop-site.xml</span>:</p>
 <p>Use the following <span class="codefrag">conf/hadoop-site.xml</span>:</p>
 <table class="ForrestTable" cellspacing="1" cellpadding="4">
 <table class="ForrestTable" cellspacing="1" cellpadding="4">
@@ -417,7 +417,7 @@ document.write("Last Published: " + document.lastModified);
 </tr>
 </tr>
         
         
 </table>
 </table>
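<p>For orientation, a minimal sketch of such a <span class="codefrag">conf/hadoop-site.xml</span> follows. The localhost host/port values and the replication factor of 1 are the conventional single-node choices of this release line; treat them as assumptions to adjust, not mandated values:</p>
<pre class="code">
&lt;configuration&gt;
  &lt;!-- single-node defaults: HDFS NameNode on 9000, JobTracker on 9001 --&gt;
  &lt;property&gt;
    &lt;name&gt;fs.default.name&lt;/name&gt;
    &lt;value&gt;localhost:9000&lt;/value&gt;
  &lt;/property&gt;
  &lt;property&gt;
    &lt;name&gt;mapred.job.tracker&lt;/name&gt;
    &lt;value&gt;localhost:9001&lt;/value&gt;
  &lt;/property&gt;
  &lt;property&gt;
    &lt;name&gt;dfs.replication&lt;/name&gt;
    &lt;value&gt;1&lt;/value&gt;
  &lt;/property&gt;
&lt;/configuration&gt;
</pre>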
-<a name="N1013F"></a><a name="Setup+passphraseless"></a>
+<a name="N10140"></a><a name="Setup+passphraseless"></a>
 <h3 class="h4">Setup passphraseless ssh</h3>
 <h3 class="h4">Setup passphraseless ssh</h3>
 <p>
 <p>
           Now check that you can ssh to the localhost without a passphrase:<br>
           Now check that you can ssh to the localhost without a passphrase:<br>
@@ -435,7 +435,7 @@ document.write("Last Published: " + document.lastModified);
 <span class="codefrag">$ cat ~/.ssh/id_dsa.pub &gt;&gt; ~/.ssh/authorized_keys</span>
 <span class="codefrag">$ cat ~/.ssh/id_dsa.pub &gt;&gt; ~/.ssh/authorized_keys</span>
 		
 		
 </p>
 </p>
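<p>If the check above prompts for a passphrase, generate a key first. A sketch using the passphraseless DSA key these docs conventionally assume:</p>
<pre class="code">
# create a passphraseless DSA key, then append it as shown above
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
</pre>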
-<a name="N1015C"></a><a name="Execution"></a>
+<a name="N1015D"></a><a name="Execution"></a>
 <h3 class="h4">Execution</h3>
 <h3 class="h4">Execution</h3>
 <p>
 <p>
           Format a new distributed-filesystem:<br>
           Format a new distributed-filesystem:<br>
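<p>As a sketch of the steps this section walks through (the standard commands for this release line, run from the top of the Hadoop install):</p>
<pre class="code">
# initialize a fresh HDFS instance, then start all daemons
$ bin/hadoop namenode -format
$ bin/start-all.sh
</pre>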
@@ -512,7 +512,7 @@ document.write("Last Published: " + document.lastModified);
 </div>
 </div>
     
     
     
     
-<a name="N101C9"></a><a name="Fully-Distributed+Operation"></a>
+<a name="N101CA"></a><a name="Fully-Distributed+Operation"></a>
 <h2 class="h3">Fully-Distributed Operation</h2>
 <h2 class="h3">Fully-Distributed Operation</h2>
 <div class="section">
 <div class="section">
 <p>Information on setting up fully-distributed non-trivial clusters
 <p>Information on setting up fully-distributed non-trivial clusters

+ 25 - 25
docs/streaming.html

@@ -253,7 +253,7 @@ document.write("Last Published: " + document.lastModified);
 </ul>
 </ul>
 </div>
 </div>
 
 
-<a name="N10018"></a><a name="Hadoop+Streaming"></a>
+<a name="N10019"></a><a name="Hadoop+Streaming"></a>
 <h2 class="h3">Hadoop Streaming</h2>
 <h2 class="h3">Hadoop Streaming</h2>
 <div class="section">
 <div class="section">
 <p>
 <p>
@@ -269,7 +269,7 @@ $HADOOP_HOME/bin/hadoop  jar $HADOOP_HOME/hadoop-streaming.jar \
 </div>
 </div>
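<p>The command whose first line appears in the hunk header above is, in full, the doc's canonical streaming example (input/output paths are placeholders):</p>
<pre class="code">
$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming.jar \
    -input myInputDirs \
    -output myOutputDir \
    -mapper /bin/cat \
    -reducer /bin/wc
</pre>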
 
 
 
 
-<a name="N10026"></a><a name="How+Does+Streaming+Work"></a>
+<a name="N10027"></a><a name="How+Does+Streaming+Work"></a>
 <h2 class="h3">How Does Streaming Work </h2>
 <h2 class="h3">How Does Streaming Work </h2>
 <div class="section">
 <div class="section">
 <p>
 <p>
@@ -298,7 +298,7 @@ $HADOOP_HOME/bin/hadoop  jar $HADOOP_HOME/hadoop-streaming.jar \
 </div>
 </div>
 
 
 
 
-<a name="N1004E"></a><a name="Package+Files+With+Job+Submissions"></a>
+<a name="N1004F"></a><a name="Package+Files+With+Job+Submissions"></a>
 <h2 class="h3">Package Files With Job Submissions</h2>
 <h2 class="h3">Package Files With Job Submissions</h2>
 <div class="section">
 <div class="section">
 <p>
 <p>
@@ -330,10 +330,10 @@ $HADOOP_HOME/bin/hadoop  jar $HADOOP_HOME/hadoop-streaming.jar \
 </div>
 </div>
 
 
 
 
-<a name="N10066"></a><a name="Streaming+Options+and+Usage"></a>
+<a name="N10067"></a><a name="Streaming+Options+and+Usage"></a>
 <h2 class="h3">Streaming Options and Usage </h2>
 <h2 class="h3">Streaming Options and Usage </h2>
 <div class="section">
 <div class="section">
-<a name="N1006C"></a><a name="Mapper-Only+Jobs"></a>
+<a name="N1006D"></a><a name="Mapper-Only+Jobs"></a>
 <h3 class="h4">Mapper-Only Jobs </h3>
 <h3 class="h4">Mapper-Only Jobs </h3>
 <p>
 <p>
 Often, you may want to process input data using a map function only. To do this, simply set mapred.reduce.tasks to zero. The map/reduce framework will not create any reducer tasks. Rather, the outputs of the mapper tasks will be the final output of the job.
 Often, you may want to process input data using a map function only. To do this, simply set mapred.reduce.tasks to zero. The map/reduce framework will not create any reducer tasks. Rather, the outputs of the mapper tasks will be the final output of the job.
@@ -341,7 +341,7 @@ Often, you may want to process input data using a map function only. To do this,
 <p>
 <p>
 To be backward compatible, Hadoop Streaming also supports the "-reduce NONE" option, which is equivalent to "-jobconf mapred.reduce.tasks=0".
 To be backward compatible, Hadoop Streaming also supports the "-reduce NONE" option, which is equivalent to "-jobconf mapred.reduce.tasks=0".
 </p>
 </p>
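<p>A minimal sketch of a map-only streaming job (paths and mapper are placeholders):</p>
<pre class="code">
$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming.jar \
    -input myInputDirs \
    -output myOutputDir \
    -mapper /bin/cat \
    -jobconf mapred.reduce.tasks=0
</pre>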
-<a name="N10078"></a><a name="Specifying+Other+Plugins+for+Jobs"></a>
+<a name="N10079"></a><a name="Specifying+Other+Plugins+for+Jobs"></a>
 <h3 class="h4">Specifying Other Plugins for Jobs </h3>
 <h3 class="h4">Specifying Other Plugins for Jobs </h3>
 <p>
 <p>
 Just as with a normal map/reduce job, you can specify other plugins for a streaming job:
 Just as with a normal map/reduce job, you can specify other plugins for a streaming job:
@@ -358,7 +358,7 @@ The class you supply for the input format should return key/value pairs of Text
 <p>
 <p>
 The class you supply for the output format is expected to take key/value pairs of Text class. If you do not specify an output format class, the TextOutputFormat is used as the default.
 The class you supply for the output format is expected to take key/value pairs of Text class. If you do not specify an output format class, the TextOutputFormat is used as the default.
 </p>
 </p>
-<a name="N1008B"></a><a name="Large+files+and+archives+in+Hadoop+Streaming"></a>
+<a name="N1008C"></a><a name="Large+files+and+archives+in+Hadoop+Streaming"></a>
 <h3 class="h4">Large files and archives in Hadoop Streaming </h3>
 <h3 class="h4">Large files and archives in Hadoop Streaming </h3>
 <p>
 <p>
 The -cacheFile and -cacheArchive options allow you to make files and archives available to the tasks. The argument is a URI to the file or archive that you have already uploaded to HDFS. These files and archives are cached across jobs. You can retrieve the host and fs_port values from the fs.default.name config variable.
 The -cacheFile and -cacheArchive options allow you to make files and archives available to the tasks. The argument is a URI to the file or archive that you have already uploaded to HDFS. These files and archives are cached across jobs. You can retrieve the host and fs_port values from the fs.default.name config variable.
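<p>An illustrative option, with host, port, path, and symlink name as placeholders; the fragment after '#' names the symlink the framework creates in the task's working directory:</p>
<pre class="code">
-cacheFile hdfs://host:fs_port/user/testfile.txt#testlink
</pre>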
@@ -427,7 +427,7 @@ This is just the cache string
 This is just the second cache string
 This is just the second cache string
 
 
 </pre>
 </pre>
-<a name="N100B4"></a><a name="Specifying+Additional+Configuration+Variables+for+Jobs"></a>
+<a name="N100B5"></a><a name="Specifying+Additional+Configuration+Variables+for+Jobs"></a>
 <h3 class="h4">Specifying Additional Configuration Variables for Jobs </h3>
 <h3 class="h4">Specifying Additional Configuration Variables for Jobs </h3>
 <p>
 <p>
 You can specify additional configuration variables by using "-jobconf  &lt;n&gt;=&lt;v&gt;". For example: 
 You can specify additional configuration variables by using "-jobconf  &lt;n&gt;=&lt;v&gt;". For example: 
@@ -446,7 +446,7 @@ The -jobconf mapred.reduce.tasks=2 in the above example specifies to use two red
 <p>
 <p>
 For more details on the jobconf parameters see: <a href="http://wiki.apache.org/hadoop/JobConfFile">http://wiki.apache.org/hadoop/JobConfFile</a>
 For more details on the jobconf parameters see: <a href="http://wiki.apache.org/hadoop/JobConfFile">http://wiki.apache.org/hadoop/JobConfFile</a>
 </p>
 </p>
-<a name="N100CB"></a><a name="Other+Supported+Options"></a>
+<a name="N100CC"></a><a name="Other+Supported+Options"></a>
 <h3 class="h4">Other Supported Options </h3>
 <h3 class="h4">Other Supported Options </h3>
 <p>
 <p>
 Other options you may specify for a streaming job are described here:
 Other options you may specify for a streaming job are described here:
@@ -528,10 +528,10 @@ To set an environment variable in a streaming command use:
 </div>
 </div>
 
 
 
 
-<a name="N10183"></a><a name="More+usage+examples"></a>
+<a name="N10184"></a><a name="More+usage+examples"></a>
 <h2 class="h3">More usage examples </h2>
 <h2 class="h3">More usage examples </h2>
 <div class="section">
 <div class="section">
-<a name="N10189"></a><a name="Customizing+the+Way+to+Split+Lines+into+Key%2FValue+Pairs"></a>
+<a name="N1018A"></a><a name="Customizing+the+Way+to+Split+Lines+into+Key%2FValue+Pairs"></a>
 <h3 class="h4">Customizing the Way to Split Lines into Key/Value Pairs </h3>
 <h3 class="h4">Customizing the Way to Split Lines into Key/Value Pairs </h3>
 <p>
 <p>
As noted earlier, when the map/reduce framework reads a line from the stdout of the mapper, it splits the line into a key/value pair. By default, the prefix of the line up to the first tab character is the key and the rest of the line (excluding the tab character) is the value.
As noted earlier, when the map/reduce framework reads a line from the stdout of the mapper, it splits the line into a key/value pair. By default, the prefix of the line up to the first tab character is the key and the rest of the line (excluding the tab character) is the value.
@@ -554,7 +554,7 @@ In the above example, "-jobconf stream.map.output.field.separator=." specifies "
 <p>
 <p>
 Similarly, you can use "-jobconf stream.reduce.output.field.separator=SEP" and "-jobconf stream.num.reduce.output.fields=NUM" to specify the nth field separator in a line of the reduce outputs as the separator between the key and the value.
 Similarly, you can use "-jobconf stream.reduce.output.field.separator=SEP" and "-jobconf stream.num.reduce.output.fields=NUM" to specify the nth field separator in a line of the reduce outputs as the separator between the key and the value.
 </p>
 </p>
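<p>A sketch combining the two map-side options, splitting each map output line at the fourth '.' (paths are placeholders; the identity mapper/reducer classes are the stock Hadoop ones):</p>
<pre class="code">
$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming.jar \
    -input myInputDirs \
    -output myOutputDir \
    -mapper org.apache.hadoop.mapred.lib.IdentityMapper \
    -reducer org.apache.hadoop.mapred.lib.IdentityReducer \
    -jobconf stream.map.output.field.separator=. \
    -jobconf stream.num.map.output.fields=4
</pre>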
-<a name="N1019F"></a><a name="A+Useful+Partitioner+Class+%28secondary+sort%2C+the+-partitioner+org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner+option%29"></a>
+<a name="N101A0"></a><a name="A+Useful+Partitioner+Class+%28secondary+sort%2C+the+-partitioner+org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner+option%29"></a>
 <h3 class="h4">A Useful Partitioner Class (secondary sort, the -partitioner org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner option) </h3>
 <h3 class="h4">A Useful Partitioner Class (secondary sort, the -partitioner org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner option) </h3>
 <p>
 <p>
 Hadoop has a library class, org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner, that is useful for many applications. This class allows the map/reduce framework to partition the map outputs based on prefixes of keys, not the whole keys. For example:
 Hadoop has a library class, org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner, that is useful for many applications. This class allows the map/reduce framework to partition the map outputs based on prefixes of keys, not the whole keys. For example:
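<p>A sketch of a job using this partitioner, keyed for partitioning on the first two '.'-separated fields (paths are placeholders; the reducer count is arbitrary):</p>
<pre class="code">
$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming.jar \
    -input myInputDirs \
    -output myOutputDir \
    -mapper org.apache.hadoop.mapred.lib.IdentityMapper \
    -reducer org.apache.hadoop.mapred.lib.IdentityReducer \
    -partitioner org.apache.hadoop.mapred.lib.KeyFieldBasedPartitioner \
    -jobconf stream.map.output.field.separator=. \
    -jobconf stream.num.map.output.fields=4 \
    -jobconf map.output.key.field.separator=. \
    -jobconf num.key.fields.for.partition=2 \
    -jobconf mapred.reduce.tasks=12
</pre>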
@@ -614,7 +614,7 @@ Sorting within each partition for the reducer(all 4 fields used for sorting)</p>
 11.14.2.2
 11.14.2.2
 11.14.2.3
 11.14.2.3
 </pre>
 </pre>
-<a name="N101D5"></a><a name="Working+with+the+Hadoop+Aggregate+Package+%28the+-reduce+aggregate+option%29"></a>
+<a name="N101D6"></a><a name="Working+with+the+Hadoop+Aggregate+Package+%28the+-reduce+aggregate+option%29"></a>
 <h3 class="h4">Working with the Hadoop Aggregate Package (the -reduce aggregate option) </h3>
 <h3 class="h4">Working with the Hadoop Aggregate Package (the -reduce aggregate option) </h3>
 <p>
 <p>
Hadoop has a library package called "Aggregate" (<a href="https://svn.apache.org/repos/asf/hadoop/core/trunk/src/java/org/apache/hadoop/mapred/lib/aggregate">https://svn.apache.org/repos/asf/hadoop/core/trunk/src/java/org/apache/hadoop/mapred/lib/aggregate</a>). Aggregate provides a special reducer class and a special combiner class, and a list of simple aggregators that perform aggregations such as "sum", "max", "min" and so on over a sequence of values. Aggregate allows you to define a mapper plugin class that is expected to generate "aggregatable items" for each input key/value pair of the mappers. The combiner/reducer will aggregate those aggregatable items by invoking the appropriate aggregators.
Hadoop has a library package called "Aggregate" (<a href="https://svn.apache.org/repos/asf/hadoop/core/trunk/src/java/org/apache/hadoop/mapred/lib/aggregate">https://svn.apache.org/repos/asf/hadoop/core/trunk/src/java/org/apache/hadoop/mapred/lib/aggregate</a>). Aggregate provides a special reducer class and a special combiner class, and a list of simple aggregators that perform aggregations such as "sum", "max", "min" and so on over a sequence of values. Aggregate allows you to define a mapper plugin class that is expected to generate "aggregatable items" for each input key/value pair of the mappers. The combiner/reducer will aggregate those aggregatable items by invoking the appropriate aggregators.
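<p>To use Aggregate, pass the special reducer by name and ship the mapper plugin with the job. A sketch, where the Python mapper (the script name matches the example this section ends with) emits lines of the form "LongValueSum:&lt;word&gt;\t1":</p>
<pre class="code">
$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming.jar \
    -input myInputDirs \
    -output myOutputDir \
    -mapper myAggregatorForKeyCount.py \
    -reducer aggregate \
    -file myAggregatorForKeyCount.py \
    -jobconf mapred.reduce.tasks=12
</pre>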
@@ -655,7 +655,7 @@ def main(argv):
 if __name__ == "__main__":
 if __name__ == "__main__":
      main(sys.argv)
      main(sys.argv)
 </pre>
 </pre>
-<a name="N101F0"></a><a name="Field+Selection+%28+similar+to+unix+%27cut%27+command%29"></a>
+<a name="N101F1"></a><a name="Field+Selection+%28+similar+to+unix+%27cut%27+command%29"></a>
 <h3 class="h4">Field Selection ( similar to unix 'cut' command) </h3>
 <h3 class="h4">Field Selection ( similar to unix 'cut' command) </h3>
 <p>
 <p>
 Hadoop has a library class, org.apache.hadoop.mapred.lib.FieldSelectionMapReduce, that effectively allows you to process text data like the unix "cut" utility. The map function defined in the class treats each input key/value pair as a list of fields. You can specify the field separator (the default is the tab character). You can select an arbitrary list of fields as the map output key, and an arbitrary list of fields as the map output value. Similarly, the reduce function defined in the class treats each input key/value pair as a list of fields. You can select an arbitrary list of fields as the reduce output key, and an arbitrary list of fields as the reduce output value. For example:
 Hadoop has a library class, org.apache.hadoop.mapred.lib.FieldSelectionMapReduce, that effectively allows you to process text data like the unix "cut" utility. The map function defined in the class treats each input key/value pair as a list of fields. You can specify the field separator (the default is the tab character). You can select an arbitrary list of fields as the map output key, and an arbitrary list of fields as the map output value. Similarly, the reduce function defined in the class treats each input key/value pair as a list of fields. You can select an arbitrary list of fields as the reduce output key, and an arbitrary list of fields as the reduce output value. For example:
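<p>A sketch of such a job; the map-side fields.spec value is an illustrative assumption following the pattern described here, while the reduce-side spec matches the one quoted in the next hunk (paths are placeholders):</p>
<pre class="code">
$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming.jar \
    -input myInputDirs \
    -output myOutputDir \
    -mapper org.apache.hadoop.mapred.lib.FieldSelectionMapReduce \
    -reducer org.apache.hadoop.mapred.lib.FieldSelectionMapReduce \
    -jobconf map.output.key.value.fields.spec=6,5,1-3:0- \
    -jobconf reduce.output.key.value.fields.spec=0-2:0-
</pre>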
@@ -684,15 +684,15 @@ The option "-jobconf reduce.output.key.value.fields.spec=0-2:0-" specifies key/v
 </div>
 </div>
 
 
 
 
-<a name="N10204"></a><a name="Frequently+Asked+Questions"></a>
+<a name="N10205"></a><a name="Frequently+Asked+Questions"></a>
 <h2 class="h3">Frequently Asked Questions </h2>
 <h2 class="h3">Frequently Asked Questions </h2>
 <div class="section">
 <div class="section">
-<a name="N1020A"></a><a name="How+do+I+use+Hadoop+Streaming+to+run+an+arbitrary+set+of+%28semi-%29independent+tasks%3F"></a>
+<a name="N1020B"></a><a name="How+do+I+use+Hadoop+Streaming+to+run+an+arbitrary+set+of+%28semi-%29independent+tasks%3F"></a>
 <h3 class="h4">How do I use Hadoop Streaming to run an arbitrary set of (semi-)independent tasks? </h3>
 <h3 class="h4">How do I use Hadoop Streaming to run an arbitrary set of (semi-)independent tasks? </h3>
 <p>
 <p>
Often you do not need the full power of Map-Reduce, but only need to run multiple instances of the same program - either on different parts of the data, or on the same data, but with different parameters. You can use Hadoop Streaming to do this.
Often you do not need the full power of Map-Reduce, but only need to run multiple instances of the same program - either on different parts of the data, or on the same data, but with different parameters. You can use Hadoop Streaming to do this.
 </p>
 </p>
-<a name="N10214"></a><a name="How+do+I+process+files%2C+one+per+map%3F"></a>
+<a name="N10215"></a><a name="How+do+I+process+files%2C+one+per+map%3F"></a>
 <h3 class="h4">How do I process files, one per map? </h3>
 <h3 class="h4">How do I process files, one per map? </h3>
 <p>
 <p>
 As an example, consider the problem of zipping (compressing) a set of files across the hadoop cluster. You can achieve this using either of these methods:
 As an example, consider the problem of zipping (compressing) a set of files across the hadoop cluster. You can achieve this using either of these methods:
@@ -736,13 +736,13 @@ As an example, consider the problem of zipping (compressing) a set of files acro
 </li>
 </li>
 
 
 </ol>
 </ol>
-<a name="N1023F"></a><a name="How+many+reducers+should+I+use%3F"></a>
+<a name="N10240"></a><a name="How+many+reducers+should+I+use%3F"></a>
 <h3 class="h4">How many reducers should I use? </h3>
 <h3 class="h4">How many reducers should I use? </h3>
 <p>
 <p>
 See the Hadoop Wiki for details: <a href="http://wiki.apache.org/hadoop/HowManyMapsAndReduces">http://wiki.apache.org/hadoop/HowManyMapsAndReduces</a>
 See the Hadoop Wiki for details: <a href="http://wiki.apache.org/hadoop/HowManyMapsAndReduces">http://wiki.apache.org/hadoop/HowManyMapsAndReduces</a>
 
 
 </p>
 </p>
-<a name="N1024D"></a><a name="If+I+set+up+an+alias+in+my+shell+script%2C+will+that+work+after+-mapper%2C+i.e.+say+I+do%3A+alias+c1%3D%27cut+-f1%27.+Will+-mapper+%22c1%22+work%3F"></a>
+<a name="N1024E"></a><a name="If+I+set+up+an+alias+in+my+shell+script%2C+will+that+work+after+-mapper%2C+i.e.+say+I+do%3A+alias+c1%3D%27cut+-f1%27.+Will+-mapper+%22c1%22+work%3F"></a>
 <h3 class="h4">If I set up an alias in my shell script, will that work after -mapper, i.e. say I do: alias c1='cut -f1'. Will -mapper "c1" work? </h3>
 <h3 class="h4">If I set up an alias in my shell script, will that work after -mapper, i.e. say I do: alias c1='cut -f1'. Will -mapper "c1" work? </h3>
 <p>
 <p>
 Using an alias will not work, but variable substitution is allowed as shown in this example:
 Using an alias will not work, but variable substitution is allowed as shown in this example:
@@ -769,12 +769,12 @@ $ hadoop dfs -cat samples/student_out/part-00000
 75
 75
 80
 80
 </pre>
 </pre>
-<a name="N1025B"></a><a name="Can+I+use+UNIX+pipes%3F+For+example%2C+will+-mapper+%22cut+-f1+%7C+sed+s%2Ffoo%2Fbar%2Fg%22+work%3F"></a>
+<a name="N1025C"></a><a name="Can+I+use+UNIX+pipes%3F+For+example%2C+will+-mapper+%22cut+-f1+%7C+sed+s%2Ffoo%2Fbar%2Fg%22+work%3F"></a>
 <h3 class="h4">Can I use UNIX pipes? For example, will -mapper "cut -f1 | sed s/foo/bar/g" work?</h3>
 <h3 class="h4">Can I use UNIX pipes? For example, will -mapper "cut -f1 | sed s/foo/bar/g" work?</h3>
 <p>
 <p>
Currently this does not work and gives a "java.io.IOException: Broken pipe" error. This is probably a bug that needs to be investigated.
Currently this does not work and gives a "java.io.IOException: Broken pipe" error. This is probably a bug that needs to be investigated.
 </p>
 </p>
-<a name="N10265"></a><a name="When+I+run+a+streaming+job+by"></a>
+<a name="N10266"></a><a name="When+I+run+a+streaming+job+by"></a>
 <h3 class="h4">When I run a streaming job by distributing large executables (for example, 3.6G) through the -file option, I get a "No space left on device" error. What do I do? </h3>
 <h3 class="h4">When I run a streaming job by distributing large executables (for example, 3.6G) through the -file option, I get a "No space left on device" error. What do I do? </h3>
 <p>
 <p>
 The jar packaging happens in a directory pointed to by the configuration variable stream.tmpdir. The default value of stream.tmpdir is /tmp. Set the value to a directory with more space:
 The jar packaging happens in a directory pointed to by the configuration variable stream.tmpdir. The default value of stream.tmpdir is /tmp. Set the value to a directory with more space:
@@ -782,7 +782,7 @@ The jar packaging happens in a directory pointed to by the configuration variabl
 <pre class="code">
 <pre class="code">
 -jobconf stream.tmpdir=/export/bigspace/...
 -jobconf stream.tmpdir=/export/bigspace/...
 </pre>
 </pre>
-<a name="N10276"></a><a name="How+do+I+specify+multiple+input+directories%3F"></a>
+<a name="N10277"></a><a name="How+do+I+specify+multiple+input+directories%3F"></a>
 <h3 class="h4">How do I specify multiple input directories? </h3>
 <h3 class="h4">How do I specify multiple input directories? </h3>
 <p>
 <p>
 You can specify multiple input directories with multiple '-input' options:
 You can specify multiple input directories with multiple '-input' options:
@@ -790,17 +790,17 @@ You can specify multiple input directories with multiple '-input' options:
 <pre class="code">
 <pre class="code">
  hadoop jar hadoop-streaming.jar -input '/user/foo/dir1' -input '/user/foo/dir2' 
  hadoop jar hadoop-streaming.jar -input '/user/foo/dir1' -input '/user/foo/dir2' 
 </pre>
 </pre>
-<a name="N10283"></a><a name="How+do+I+generate+output+files+with+gzip+format%3F"></a>
+<a name="N10284"></a><a name="How+do+I+generate+output+files+with+gzip+format%3F"></a>
 <h3 class="h4">How do I generate output files with gzip format? </h3>
 <h3 class="h4">How do I generate output files with gzip format? </h3>
 <p>
 <p>
Instead of plain text files, you can generate gzip files as your output. Pass '-jobconf mapred.output.compress=true -jobconf mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec' as an option to your streaming job.
Instead of plain text files, you can generate gzip files as your output. Pass '-jobconf mapred.output.compress=true -jobconf mapred.output.compression.codec=org.apache.hadoop.io.compress.GzipCodec' as an option to your streaming job.
 </p>
 </p>
-<a name="N1028D"></a><a name="How+do+I+provide+my+own+input%2Foutput+format+with+streaming%3F"></a>
+<a name="N1028E"></a><a name="How+do+I+provide+my+own+input%2Foutput+format+with+streaming%3F"></a>
 <h3 class="h4">How do I provide my own input/output format with streaming? </h3>
 <h3 class="h4">How do I provide my own input/output format with streaming? </h3>
 <p>
 <p>
As of version 0.14, Hadoop does not support multiple jar files. So, when specifying your own custom classes, you will have to pack them along with the streaming jar and use that custom jar instead of the default hadoop streaming jar. 
As of version 0.14, Hadoop does not support multiple jar files. So, when specifying your own custom classes, you will have to pack them along with the streaming jar and use that custom jar instead of the default hadoop streaming jar. 
 </p>
 </p>
-<a name="N10297"></a><a name="How+do+I+parse+XML+documents+using+streaming%3F"></a>
+<a name="N10298"></a><a name="How+do+I+parse+XML+documents+using+streaming%3F"></a>
 <h3 class="h4">How do I parse XML documents using streaming? </h3>
 <h3 class="h4">How do I parse XML documents using streaming? </h3>
 <p>
 <p>
 You can use the record reader StreamXmlRecordReader to process XML documents. 
 You can use the record reader StreamXmlRecordReader to process XML documents. 
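<p>The usual invocation pattern, as a sketch; BEGIN_STRING and END_STRING are placeholders for the tags that delimit one XML record, and the mapper/reducer are arbitrary:</p>
<pre class="code">
hadoop jar hadoop-streaming.jar \
    -inputreader "StreamXmlRecord,begin=BEGIN_STRING,end=END_STRING" \
    -input myInputDirs -output myOutputDir \
    -mapper /bin/cat -reducer /bin/wc
</pre>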

+ 2 - 2
src/docs/src/documentation/content/xdocs/tabs.xml

@@ -18,8 +18,8 @@
 <!DOCTYPE tabs PUBLIC "-//APACHE//DTD Cocoon Documentation Tab V1.0//EN" 
 <!DOCTYPE tabs PUBLIC "-//APACHE//DTD Cocoon Documentation Tab V1.0//EN" 
           "http://forrest.apache.org/dtd/tab-cocoon-v10.dtd">
           "http://forrest.apache.org/dtd/tab-cocoon-v10.dtd">
 
 
-<tabs software="Nutch"
-      title="Nutch"
+<tabs software="Hadoop"
+      title="Hadoop"
       copyright="The Apache Software Foundation"
       copyright="The Apache Software Foundation"
       xmlns:xlink="http://www.w3.org/1999/xlink">
       xmlns:xlink="http://www.w3.org/1999/xlink">