@@ -1,3 +1,4 @@
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>Hadoop</title>
@@ -6,47 +7,110 @@
Hadoop is a distributed computing platform.
-<p>Hadoop primarily consists of a distributed filesystem (DFS, in <a
-href="org/apache/hadoop/dfs/package-summary.html">org.apache.hadoop.dfs</a>)
-and an implementation of a MapReduce distributed data processor (in <a
-href="org/apache/hadoop/mapred/package-summary.html">org.apache.hadoop.mapred
-</a>).</p>
+<p>Hadoop primarily consists of the <a
+href="org/apache/hadoop/dfs/package-summary.html">Hadoop Distributed FileSystem
+(HDFS)</a> and an
+implementation of the <a href="org/apache/hadoop/mapred/package-summary.html">
+Map-Reduce</a> programming paradigm.</p>
+
+
+<p>Hadoop is a software framework that lets one easily write and run applications
+that process vast amounts of data. Here's what makes Hadoop especially useful:</p>
+<ul>
+ <li>
+ <b>Scalable</b>: Hadoop can reliably store and process petabytes.
+ </li>
+ <li>
+ <b>Economical</b>: It distributes the data and processing across clusters
+ of commonly available computers. These clusters can number into the thousands
+ of nodes.
+ </li>
+ <li>
+    <b>Efficient</b>: By distributing the data, Hadoop can process it in
+    parallel on the nodes where the data is located, which makes processing
+    extremely fast.
+ </li>
+ <li>
+ <b>Reliable</b>: Hadoop automatically maintains multiple copies of data and
+ automatically redeploys computing tasks based on failures.
+ </li>
+</ul>
<h2>Requirements</h2>
-<ol>
-
-<li>Java 1.5.x, preferably from <a
- href="http://java.sun.com/j2se/downloads.html">Sun</a> Set
- <tt>JAVA_HOME</tt> to the root of your Java installation.</li>
-
-<li>ssh must be installed and sshd must be running to use Hadoop's
-scripts to manage remote Hadoop daemons. On Ubuntu, this may done
-with <br><tt>sudo apt-get install ssh</tt></li>
-
-<li>rsync must be installed to use Hadoop's scripts to manage remote
-Hadoop installations. On Ubuntu, this may done with <br><tt>sudo
-apt-get install rsync</tt>.</li>
-
-<li>On Win32, <a href="http://www.cygwin.com/">cygwin</a>, for shell
-support. To use Subversion on Win32, select the subversion package
-when you install, in the "Devel" category. Distributed operation has
-not been well tested on Win32, so this should primarily be considered
-a development platform at this point, not a production platform.</li>
+<h3>Platforms</h3>
+
+<ul>
+ <li>
+    Hadoop has been demonstrated on GNU/Linux clusters with 2000 nodes.
+ </li>
+ <li>
+ Win32 is supported as a <i>development</i> platform. Distributed operation
+ has not been well tested on Win32, so this is not a <i>production</i>
+ platform.
+ </li>
+</ul>
+<h3>Requisite Software</h3>
+
+<ol>
+ <li>
+ Java 1.5.x, preferably from
+ <a href="http://java.sun.com/j2se/downloads.html">Sun</a>.
+ Set <tt>JAVA_HOME</tt> to the root of your Java installation.
+ </li>
+ <li>
+    ssh must be installed and sshd must be running to use Hadoop's
+    scripts to manage remote Hadoop daemons (see the example after
+    this list).
+ </li>
+ <li>
+ rsync may be installed to use Hadoop's scripts to manage remote
+ Hadoop installations.
+ </li>
</ol>
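+
+<p>For example, one can verify that sshd is running and reachable by
+logging in to the local host (an illustrative check; the exact output
+depends on your system):</p>
+<blockquote><pre>
+$ ssh localhost
+</pre></blockquote>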
+<h4>Additional requirements for Windows</h4>
+
+<ol>
+ <li>
+ <a href="http://www.cygwin.com/">Cygwin</a> - Required for shell support in
+ addition to the required software above.
+ </li>
+ <li>
+    Subversion - Optional, for checking out code from the source repository.
+ </li>
+</ol>
+
+<h3>Installing Required Software</h3>
+
+<p>If your platform does not have the required software listed above, you
+will have to install it.</p>
+
+<p>For example on Ubuntu Linux:</p>
+<blockquote><pre>
+$ sudo apt-get install ssh
+$ sudo apt-get install rsync
+</pre></blockquote>
+
+<p>On Windows, if you did not install the required software when you
+installed Cygwin, start the Cygwin installer and select the packages:</p>
+<ul>
+ <li>openssh - the "Net" category</li>
+ <li>rsync - the "Net" category</li>
+ <li>subversion (optional) - the "Devel" category</li>
+</ul>
+
<h2>Getting Started</h2>
<p>First, you need to get a copy of the Hadoop code.</p>
<p>You can download a nightly build from <a
-href="http://cvs.apache.org/dist/lucene/hadoop/nightly/">http://cvs.apache.org/dist/lucene/hadoop/nightly/</a>.
-Unpack the release and connect to its top-level directory.</p>
+href="http://cvs.apache.org/dist/lucene/hadoop/nightly/">
+http://cvs.apache.org/dist/lucene/hadoop/nightly/</a>. Unpack the release and
+connect to its top-level directory.</p>
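+
+<p>For example, assuming the nightly tarball is named
+<tt>hadoop-nightly.tar.gz</tt> (a placeholder; use the name of the build you
+actually downloaded):</p>
+<blockquote><pre>
+$ tar xzf hadoop-nightly.tar.gz
+$ cd hadoop-nightly
+</pre></blockquote>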
<p>Or, check out the code from <a
href="http://lucene.apache.org/hadoop/version_control.html">subversion</a>
-and build it with <a href="http://ant.apache.org/">Ant</a>.</p>
+and build it with <a href="http://ant.apache.org/">ant</a>.</p>
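+
+<p>For example (the repository URL below is illustrative; consult the
+version control page linked above for the current location):</p>
+<blockquote><pre>
+$ svn checkout http://svn.apache.org/repos/asf/lucene/hadoop/trunk hadoop
+$ cd hadoop
+$ ant
+</pre></blockquote>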
<p>Edit the file <tt>conf/hadoop-env.sh</tt> to define at least
<tt>JAVA_HOME</tt>.</p>
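+
+<p>For example, assuming the JDK is installed under
+<tt>/usr/lib/jvm/java-1.5.0-sun</tt> (an illustrative path; substitute the
+root of your own Java installation), the relevant line in
+<tt>conf/hadoop-env.sh</tt> would be:</p>
+<blockquote><pre>
+export JAVA_HOME=/usr/lib/jvm/java-1.5.0-sun
+</pre></blockquote>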