
HADOOP-3692. Fixes documentation for Cluster setup and Quick start guides. Contributed by Amareshwari Sriramadasu.

git-svn-id: https://svn.apache.org/repos/asf/hadoop/core/trunk@674819 13f79535-47bb-0310-9956-ffa450edef68
Devaraj Das 17 years ago
parent
commit
d56fbc53b1

+ 5 - 1
docs/changes.html

@@ -378,7 +378,7 @@ InputFormat.validateInput.<br />(tomwhite via omalley)</li>
     </ol>
   </li>
   <li><a href="javascript:toggleList('release_0.18.0_-_unreleased_._bug_fixes_')">  BUG FIXES
-</a>&nbsp;&nbsp;&nbsp;(111)
+</a>&nbsp;&nbsp;&nbsp;(113)
     <ol id="release_0.18.0_-_unreleased_._bug_fixes_">
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-2905">HADOOP-2905</a>. 'fsck -move' triggers NPE in NameNode.<br />(Lohit Vjayarenu via rangadi)</li>
       <li>Increment ClientProtocol.versionID missed by <a href="http://issues.apache.org/jira/browse/HADOOP-2585">HADOOP-2585</a>.<br />(shv)</li>
@@ -599,6 +599,10 @@ read from DFS.<br />(rangadi)</li>
 listed.<br />(lohit vijayarenu via cdouglas)</li>
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-3597">HADOOP-3597</a>. Fix SortValidator to use filesystems other than the default as
 input. Validation job still runs on default fs.<br />(Jothi Padmanabhan via cdouglas)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3693">HADOOP-3693</a>. Fix archives, distcp and native library documentation to
+conform to style guidelines.<br />(Amareshwari Sriramadasu via cdouglas)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3653">HADOOP-3653</a>. Fix test-patch target to properly account for Eclipse
+classpath jars.<br />(Brice Arnould via nigel)</li>
     </ol>
   </li>
 </ul>

+ 8 - 8
docs/cluster_setup.html

@@ -448,7 +448,7 @@ document.write("Last Published: " + document.lastModified);
 		      
 <td colspan="1" rowspan="1">mapred.system.dir</td>
 		      <td colspan="1" rowspan="1">
-		        Path on the HDFS where where the Map-Reduce framework stores 
+		        Path on the HDFS where where the Map/Reduce framework stores 
 		        system files e.g. <span class="codefrag">/hadoop/mapred/system/</span>.
 		      </td>
 		      <td colspan="1" rowspan="1">
@@ -463,7 +463,7 @@ document.write("Last Published: " + document.lastModified);
 <td colspan="1" rowspan="1">mapred.local.dir</td>
 		      <td colspan="1" rowspan="1">
 		        Comma-separated list of paths on the local filesystem where 
-		        temporary Map-Reduce data is written.
+		        temporary Map/Reduce data is written.
 		      </td>
 		      <td colspan="1" rowspan="1">Multiple paths help spread disk i/o.</td>
 		    
@@ -473,7 +473,7 @@ document.write("Last Published: " + document.lastModified);
 		      
 <td colspan="1" rowspan="1">mapred.tasktracker.{map|reduce}.tasks.maximum</td>
 		      <td colspan="1" rowspan="1">
-		        The maximum number of map/reduce tasks, which are run 
+		        The maximum number of Map/Reduce tasks, which are run 
 		        simultaneously on a given <span class="codefrag">TaskTracker</span>, individually.
 		      </td>
 		      <td colspan="1" rowspan="1">
@@ -500,7 +500,7 @@ document.write("Last Published: " + document.lastModified);
 		      <td colspan="1" rowspan="1">List of permitted/excluded TaskTrackers.</td>
 		      <td colspan="1" rowspan="1">
 		        If necessary, use these files to control the list of allowable 
-		        tasktrackers.
+		        TaskTrackers.
 		      </td>
   		    
 </tr>
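
Annotation: the properties in the table rows patched above live in conf/hadoop-site.xml on the cluster nodes. A minimal sketch with illustrative paths and slot counts (not recommendations for any particular hardware); this writes a fresh file, so merge by hand if you already have one:

$ cat > conf/hadoop-site.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <!-- Shared directory where the Map/Reduce framework keeps system files -->
  <property>
    <name>mapred.system.dir</name>
    <value>/hadoop/mapred/system</value>
  </property>
  <!-- Local scratch space; listing several disks spreads i/o -->
  <property>
    <name>mapred.local.dir</name>
    <value>/disk1/mapred/local,/disk2/mapred/local</value>
  </property>
  <!-- Upper bound on concurrently running tasks, per TaskTracker -->
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>2</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
  </property>
</configuration>
EOF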
@@ -724,7 +724,7 @@ document.write("Last Published: " + document.lastModified);
 <a name="N10398"></a><a name="Hadoop+Rack+Awareness"></a>
 <h2 class="h3">Hadoop Rack Awareness</h2>
 <div class="section">
-<p>The HDFS and the Map-Reduce components are rack-aware.</p>
+<p>The HDFS and the Map/Reduce components are rack-aware.</p>
 <p>The <span class="codefrag">NameNode</span> and the <span class="codefrag">JobTracker</span> obtains the
       <span class="codefrag">rack id</span> of the slaves in the cluster by invoking an API 
       <a href="api/org/apache/hadoop/net/DNSToSwitchMapping.html#resolve(java.util.List)">resolve</a> in an administrator configured
@@ -734,7 +734,7 @@ document.write("Last Published: " + document.lastModified);
       implementation of the same runs a script/command configured using 
       <span class="codefrag">topology.script.file.name</span>. If topology.script.file.name is
       not set, the rack id <span class="codefrag">/default-rack</span> is returned for any 
-      passed IP address. The additional configuration in the Map-Reduce
+      passed IP address. The additional configuration in the Map/Reduce
       part is <span class="codefrag">mapred.cache.task.levels</span> which determines the number
       of levels (in the network topology) of caches. So, for example, if it is
       the default value of 2, two levels of caches will be constructed - 
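
Annotation: for a concrete sense of what topology.script.file.name points at — the script receives one or more IPs/hostnames as arguments and must print one rack id per argument. A minimal sketch, assuming a hypothetical two-column /etc/hadoop-topology.map of host-to-rack pairs:

#!/bin/sh
# Emit a rack id for every address the NameNode/JobTracker passes in.
# /etc/hadoop-topology.map is an assumed file: "<host-or-ip> <rack-id>" per line.
for host in "$@"; do
  rack=$(awk -v h="$host" '$1 == h { print $2; exit }' /etc/hadoop-topology.map)
  echo "${rack:-/default-rack}"   # mirror Hadoop's default when a host is unmapped
done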
@@ -748,7 +748,7 @@ document.write("Last Published: " + document.lastModified);
 <h2 class="h3">Hadoop Startup</h2>
 <div class="section">
 <p>To start a Hadoop cluster you will need to start both the HDFS and 
-      Map-Reduce cluster.</p>
+      Map/Reduce cluster.</p>
 <p>
         Format a new distributed filesystem:<br>
         
@@ -793,7 +793,7 @@ document.write("Last Published: " + document.lastModified);
       <span class="codefrag">${HADOOP_CONF_DIR}/slaves</span> file on the <span class="codefrag">NameNode</span> 
       and stops the <span class="codefrag">DataNode</span> daemon on all the listed slaves.</p>
 <p>
-        Stop Map-Reduce with the following command, run on the designated
+        Stop Map/Reduce with the following command, run on the designated
         the designated <span class="codefrag">JobTracker</span>:<br>
         
 <span class="codefrag">$ bin/stop-mapred.sh</span>
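
Annotation: the start half that pairs with the stop commands patched above, as the same guide gives it:

# On the designated NameNode: format once, then start HDFS
$ bin/hadoop namenode -format
$ bin/start-dfs.sh

# On the designated JobTracker: start the Map/Reduce cluster
$ bin/start-mapred.sh

# Shutdown mirrors startup
$ bin/stop-mapred.sh   # on the JobTracker
$ bin/stop-dfs.sh      # on the NameNode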

File diff suppressed because it is too large
+ 1 - 1
docs/cluster_setup.pdf


+ 29 - 33
docs/quickstart.html

@@ -201,11 +201,6 @@ document.write("Last Published: " + document.lastModified);
 </li>
 <li>
 <a href="#Required+Software">Required Software</a>
-<ul class="minitoc">
-<li>
-<a href="#Additional+requirements+for+Windows">Additional requirements for Windows</a>
-</li>
-</ul>
 </li>
 <li>
 <a href="#Installing+Software">Installing Software</a>
@@ -245,12 +240,12 @@ document.write("Last Published: " + document.lastModified);
 <a name="N1000D"></a><a name="Purpose"></a>
 <h2 class="h3">Purpose</h2>
 <div class="section">
-<p>The purpose of this document is to help users get a single-node Hadoop 
-      installation up and running very quickly so that users can get a flavour 
+<p>The purpose of this document is to help you get a single-node Hadoop 
+      installation up and running very quickly so that you can get a flavour 
       of the <a href="hdfs_design.html">Hadoop Distributed File System 
       (<acronym title="Hadoop Distributed File System">HDFS</acronym>)</a> and 
-      the Map-Reduce framework i.e. perform simple operations on HDFS, run 
-      example/simple jobs etc.</p>
+      the Map/Reduce framework; that is, perform simple operations on HDFS and 
+      run example jobs.</p>
 </div>
     
     
@@ -262,18 +257,20 @@ document.write("Last Published: " + document.lastModified);
 <ul>
           
 <li>
+            GNU/Linux is supported as a development and production platform. 
             Hadoop has been demonstrated on GNU/Linux clusters with 2000 nodes.
           </li>
           
 <li>
             Win32 is supported as a <em>development platform</em>. Distributed 
-            operation has not been well tested on Win32, so this is not a 
-            <em>production platform</em>.
+            operation has not been well tested on Win32, so it is not 
+            supported as a <em>production platform</em>.
           </li>
         
 </ul>
 <a name="N1003B"></a><a name="Required+Software"></a>
 <h3 class="h4">Required Software</h3>
+<p>Required software for Linux and Windows include:</p>
 <ol>
           
 <li>
@@ -288,18 +285,17 @@ document.write("Last Published: " + document.lastModified);
           </li>
         
 </ol>
-<a name="N10053"></a><a name="Additional+requirements+for+Windows"></a>
-<h4>Additional requirements for Windows</h4>
+<p>Additional requirements for Windows include:</p>
 <ol>
-            
+          
 <li>
-              
+            
 <a href="http://www.cygwin.com/">Cygwin</a> - Required for shell 
-              support in addition to the required software above. 
-            </li>
-          
+            support in addition to the required software above. 
+          </li>
+        
 </ol>
-<a name="N10065"></a><a name="Installing+Software"></a>
+<a name="N10064"></a><a name="Installing+Software"></a>
 <h3 class="h4">Installing Software</h3>
 <p>If your cluster doesn't have the requisite software you will need to
         install it.</p>
@@ -322,7 +318,7 @@ document.write("Last Published: " + document.lastModified);
 </div>
     
     
-<a name="N10089"></a><a name="Download"></a>
+<a name="N10088"></a><a name="Download"></a>
 <h2 class="h3">Download</h2>
 <div class="section">
 <p>
@@ -333,7 +329,7 @@ document.write("Last Published: " + document.lastModified);
 </div>
 
     
-<a name="N10097"></a><a name="Prepare+to+Start+the+Hadoop+Cluster"></a>
+<a name="N10096"></a><a name="Prepare+to+Start+the+Hadoop+Cluster"></a>
 <h2 class="h3">Prepare to Start the Hadoop Cluster</h2>
 <div class="section">
 <p>
@@ -364,10 +360,10 @@ document.write("Last Published: " + document.lastModified);
 </div>
     
     
-<a name="N100C2"></a><a name="Local"></a>
+<a name="N100C1"></a><a name="Local"></a>
 <h2 class="h3">Standalone Operation</h2>
 <div class="section">
-<p>By default, Hadoop is configured to run things in a non-distributed 
+<p>By default, Hadoop is configured to run in a non-distributed 
       mode, as a single Java process. This is useful for debugging.</p>
 <p>
         The following example copies the unpacked <span class="codefrag">conf</span> directory to 
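
Annotation: the example this paragraph leads into (cut off by the hunk boundary) is the standalone grep job; end to end it reads as below. The examples jar name varies by release, so the wildcard is illustrative:

# Standalone mode: one local Java process, no daemons
$ mkdir input
$ cp conf/*.xml input
$ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
$ cat output/*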
@@ -392,12 +388,12 @@ document.write("Last Published: " + document.lastModified);
 </div>
     
     
-<a name="N100E6"></a><a name="PseudoDistributed"></a>
+<a name="N100E5"></a><a name="PseudoDistributed"></a>
 <h2 class="h3">Pseudo-Distributed Operation</h2>
 <div class="section">
 <p>Hadoop can also be run on a single-node in a pseudo-distributed mode 
 	  where each Hadoop daemon runs in a separate Java process.</p>
-<a name="N100EF"></a><a name="Configuration"></a>
+<a name="N100EE"></a><a name="Configuration"></a>
 <h3 class="h4">Configuration</h3>
 <p>Use the following <span class="codefrag">conf/hadoop-site.xml</span>:</p>
 <table class="ForrestTable" cellspacing="1" cellpadding="4">
@@ -463,7 +459,7 @@ document.write("Last Published: " + document.lastModified);
 </tr>
         
 </table>
-<a name="N10153"></a><a name="Setup+passphraseless"></a>
+<a name="N10152"></a><a name="Setup+passphraseless"></a>
 <h3 class="h4">Setup passphraseless ssh</h3>
 <p>
           Now check that you can ssh to the localhost without a passphrase:<br>
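
Annotation: the rows of the configuration table above are elided by the diff; for the 0.18-era guide the pseudo-distributed conf/hadoop-site.xml amounted to roughly the following (hosts and ports from memory, treat as illustrative):

$ cat > conf/hadoop-site.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
EOF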
@@ -481,7 +477,7 @@ document.write("Last Published: " + document.lastModified);
 <span class="codefrag">$ cat ~/.ssh/id_dsa.pub &gt;&gt; ~/.ssh/authorized_keys</span>
 		
 </p>
-<a name="N10170"></a><a name="Execution"></a>
+<a name="N1016F"></a><a name="Execution"></a>
 <h3 class="h4">Execution</h3>
 <p>
           Format a new distributed-filesystem:<br>
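
Annotation: the passphraseless-ssh steps spanning the two hunks above, collected in one place as the guide gives them:

$ ssh localhost                                    # should log in without a passphrase
# If it prompts, generate and authorize a key:
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys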
@@ -490,7 +486,7 @@ document.write("Last Published: " + document.lastModified);
         
 </p>
 <p>
-		  Start The hadoop daemons:<br>
+		  Start the hadoop daemons:<br>
           
 <span class="codefrag">$ bin/start-all.sh</span>
         
@@ -498,7 +494,7 @@ document.write("Last Published: " + document.lastModified);
 <p>The hadoop daemon log output is written to the 
         <span class="codefrag">${HADOOP_LOG_DIR}</span> directory (defaults to 
         <span class="codefrag">${HADOOP_HOME}/logs</span>).</p>
-<p>Browse the web-interface for the NameNode and the JobTracker, by
+<p>Browse the web interface for the NameNode and the JobTracker; by
         default they are available at:</p>
 <ul>
           
@@ -520,7 +516,7 @@ document.write("Last Published: " + document.lastModified);
 <p>
           Copy the input files into the distributed filesystem:<br>
 		  
-<span class="codefrag">$ bin/hadoop dfs -put conf input</span>
+<span class="codefrag">$ bin/hadoop fs -put conf input</span>
 		
 </p>
 <p>
@@ -536,7 +532,7 @@ document.write("Last Published: " + document.lastModified);
           Copy the output files from the distributed filesystem to the local 
           filesytem and examine them:<br>
           
-<span class="codefrag">$ bin/hadoop dfs -get output output</span>
+<span class="codefrag">$ bin/hadoop fs -get output output</span>
 <br>
           
 <span class="codefrag">$ cat output/*</span>
@@ -546,7 +542,7 @@ document.write("Last Published: " + document.lastModified);
 <p>
           View the output files on the distributed filesystem:<br>
           
-<span class="codefrag">$ bin/hadoop dfs -cat output/*</span>
+<span class="codefrag">$ bin/hadoop fs -cat output/*</span>
         
 </p>
 <p>
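
Annotation: the dfs-to-fs change in these hunks is a mechanical rename to the generic FsShell subcommand; the full copy-in/run/copy-out loop in the new spelling (examples jar wildcard illustrative, as above):

$ bin/hadoop fs -put conf input        # stage inputs in the distributed filesystem
$ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
$ bin/hadoop fs -get output output     # pull results to the local filesystem
$ cat output/*
$ bin/hadoop fs -cat output/*          # or inspect them in place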
@@ -558,7 +554,7 @@ document.write("Last Published: " + document.lastModified);
 </div>
     
     
-<a name="N101DD"></a><a name="FullyDistributed"></a>
+<a name="N101DC"></a><a name="FullyDistributed"></a>
 <h2 class="h3">Fully-Distributed Operation</h2>
 <div class="section">
 <p>Information on setting up fully-distributed, non-trivial clusters

File diff suppressed because it is too large
+ 1 - 1
docs/quickstart.pdf


+ 8 - 8
src/docs/src/documentation/content/xdocs/cluster_setup.xml

@@ -201,7 +201,7 @@
 		    <tr>
 		      <td>mapred.system.dir</td>
 		      <td>
-		        Path on the HDFS where where the Map-Reduce framework stores 
+		        Path on the HDFS where where the Map/Reduce framework stores 
 		        system files e.g. <code>/hadoop/mapred/system/</code>.
 		      </td>
 		      <td>
@@ -213,14 +213,14 @@
 		      <td>mapred.local.dir</td>
 		      <td>
 		        Comma-separated list of paths on the local filesystem where 
-		        temporary Map-Reduce data is written.
+		        temporary Map/Reduce data is written.
 		      </td>
 		      <td>Multiple paths help spread disk i/o.</td>
 		    </tr>
 		    <tr>
 		      <td>mapred.tasktracker.{map|reduce}.tasks.maximum</td>
 		      <td>
-		        The maximum number of map/reduce tasks, which are run 
+		        The maximum number of Map/Reduce tasks, which are run 
 		        simultaneously on a given <code>TaskTracker</code>, individually.
 		      </td>
 		      <td>
@@ -241,7 +241,7 @@
 		      <td>List of permitted/excluded TaskTrackers.</td>
 		      <td>
 		        If necessary, use these files to control the list of allowable 
-		        tasktrackers.
+		        TaskTrackers.
 		      </td>
   		    </tr>
 		  </table>
@@ -423,7 +423,7 @@
     
     <section>
       <title>Hadoop Rack Awareness</title>
-      <p>The HDFS and the Map-Reduce components are rack-aware.</p>
+      <p>The HDFS and the Map/Reduce components are rack-aware.</p>
       <p>The <code>NameNode</code> and the <code>JobTracker</code> obtains the
       <code>rack id</code> of the slaves in the cluster by invoking an API 
       <a href="ext:api/org/apache/hadoop/net/dnstoswitchmapping/resolve
@@ -434,7 +434,7 @@
       implementation of the same runs a script/command configured using 
       <code>topology.script.file.name</code>. If topology.script.file.name is
       not set, the rack id <code>/default-rack</code> is returned for any 
-      passed IP address. The additional configuration in the Map-Reduce
+      passed IP address. The additional configuration in the Map/Reduce
       part is <code>mapred.cache.task.levels</code> which determines the number
       of levels (in the network topology) of caches. So, for example, if it is
       the default value of 2, two levels of caches will be constructed - 
@@ -447,7 +447,7 @@
       <title>Hadoop Startup</title>
       
       <p>To start a Hadoop cluster you will need to start both the HDFS and 
-      Map-Reduce cluster.</p>
+      Map/Reduce cluster.</p>
 
       <p>
         Format a new distributed filesystem:<br/>
@@ -487,7 +487,7 @@
       and stops the <code>DataNode</code> daemon on all the listed slaves.</p>
       
       <p>
-        Stop Map-Reduce with the following command, run on the designated
+        Stop Map/Reduce with the following command, run on the designated
         the designated <code>JobTracker</code>:<br/>
         <code>$ bin/stop-mapred.sh</code><br/>
       </p>

+ 21 - 25
src/docs/src/documentation/content/xdocs/quickstart.xml

@@ -28,12 +28,12 @@
     <section>
       <title>Purpose</title>
       
-      <p>The purpose of this document is to help users get a single-node Hadoop 
-      installation up and running very quickly so that users can get a flavour 
+      <p>The purpose of this document is to help you get a single-node Hadoop 
+      installation up and running very quickly so that you can get a flavour 
       of the <a href="hdfs_design.html">Hadoop Distributed File System 
       (<acronym title="Hadoop Distributed File System">HDFS</acronym>)</a> and 
-      the Map-Reduce framework i.e. perform simple operations on HDFS, run 
-      example/simple jobs etc.</p>
+      the Map/Reduce framework; that is, perform simple operations on HDFS and 
+      run example jobs.</p>
     </section>
     
     <section id="PreReqs">
@@ -44,19 +44,20 @@
         
         <ul>
           <li>
+            GNU/Linux is supported as a development and production platform. 
             Hadoop has been demonstrated on GNU/Linux clusters with 2000 nodes.
           </li>
           <li>
             Win32 is supported as a <em>development platform</em>. Distributed 
-            operation has not been well tested on Win32, so this is not a 
-            <em>production platform</em>.
+            operation has not been well tested on Win32, so it is not 
+            supported as a <em>production platform</em>.
           </li>
         </ul>        
       </section>
       
       <section>
         <title>Required Software</title>
-        
+        <p>Required software for Linux and Windows include:</p>
         <ol>
           <li>
             Java<sup>TM</sup> 1.6.x, preferably from Sun, must be installed.
@@ -67,18 +68,13 @@
             daemons.
           </li>
         </ol>
-        
-        <section>
-          <title>Additional requirements for Windows</title>
-          
-          <ol>
-            <li>
-              <a href="http://www.cygwin.com/">Cygwin</a> - Required for shell 
-              support in addition to the required software above. 
-            </li>
-          </ol>
-        </section>
-        
+        <p>Additional requirements for Windows include:</p>
+        <ol>
+          <li>
+            <a href="http://www.cygwin.com/">Cygwin</a> - Required for shell 
+            support in addition to the required software above. 
+          </li>
+        </ol>
       </section>
 
       <section>
@@ -140,7 +136,7 @@
     <section id="Local">
       <title>Standalone Operation</title>
       
-      <p>By default, Hadoop is configured to run things in a non-distributed 
+      <p>By default, Hadoop is configured to run in a non-distributed 
       mode, as a single Java process. This is useful for debugging.</p>
       
       <p>
@@ -213,7 +209,7 @@
         </p>
 
 		<p>
-		  Start The hadoop daemons:<br/>
+		  Start the hadoop daemons:<br/>
           <code>$ bin/start-all.sh</code>
         </p>
 
@@ -221,7 +217,7 @@
         <code>${HADOOP_LOG_DIR}</code> directory (defaults to 
         <code>${HADOOP_HOME}/logs</code>).</p>
 
-        <p>Browse the web-interface for the NameNode and the JobTracker, by
+        <p>Browse the web interface for the NameNode and the JobTracker; by
         default they are available at:</p>
         <ul>
           <li>
@@ -236,7 +232,7 @@
         
         <p>
           Copy the input files into the distributed filesystem:<br/>
-		  <code>$ bin/hadoop dfs -put conf input</code>
+		  <code>$ bin/hadoop fs -put conf input</code>
 		</p>
 		
         <p>
@@ -250,13 +246,13 @@
         <p>
           Copy the output files from the distributed filesystem to the local 
           filesytem and examine them:<br/>
-          <code>$ bin/hadoop dfs -get output output</code><br/>
+          <code>$ bin/hadoop fs -get output output</code><br/>
           <code>$ cat output/*</code>
         </p>
         <p> or </p>
         <p>
           View the output files on the distributed filesystem:<br/>
-          <code>$ bin/hadoop dfs -cat output/*</code>
+          <code>$ bin/hadoop fs -cat output/*</code>
         </p>
 
 		<p>

Some files were not shown because too many files have changed in this diff