@@ -18,210 +18,7 @@
Single Node Setup
 
-%{toc|section=1|fromDepth=0}
+   This page will be removed in the next major release.
 
-* Purpose
-
-   This document describes how to set up and configure a single-node
-   Hadoop installation so that you can quickly perform simple operations
-   using Hadoop MapReduce and the Hadoop Distributed File System (HDFS).
-
-* Prerequisites
-
-** Supported Platforms
-
-   * GNU/Linux is supported as a development and production platform.
-     Hadoop has been demonstrated on GNU/Linux clusters with 2000 nodes.
-
-   * Windows is also a supported platform.
-
-** Required Software
-
-   Required software for Linux and Windows includes the following; a quick
-   way to check for both is sketched after the list:
-
-   [[1]] Java^TM 1.6.x, preferably from Sun, must be installed.
-
-   [[2]] ssh must be installed and sshd must be running to use the Hadoop
-   scripts that manage remote Hadoop daemons.
-
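-   A minimal sanity check for both prerequisites (the exact version
-   strings printed will vary by system):
-
------
-  $ java -version
-  $ ssh -V
------
-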
-** Installing Software
-
-   If your cluster doesn't have the requisite software, you will need to
-   install it.
-
-   For example, on Ubuntu Linux:
-
------
-  $ sudo apt-get install ssh
-  $ sudo apt-get install rsync
------
-
-* Download
-
-   To get a Hadoop distribution, download a recent stable release from one
-   of the Apache Download Mirrors.
-
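-   For illustration only (hadoop-1.2.1 and the archive URL below are
-   assumptions; pick a current stable release from a mirror near you):
-
------
-  $ wget http://archive.apache.org/dist/hadoop/core/hadoop-1.2.1/hadoop-1.2.1.tar.gz
-  $ tar xzf hadoop-1.2.1.tar.gz
------
-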
-* Prepare to Start the Hadoop Cluster
-
-   Unpack the downloaded Hadoop distribution. In the distribution, edit
-   the file <<<conf/hadoop-env.sh>>> to define at least <<<JAVA_HOME>>> to
-   be the root of your Java installation.
-
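-   For example (the JDK path below is an assumption; point it at the root
-   of your own installation):
-
------
-  # in conf/hadoop-env.sh
-  export JAVA_HOME=/usr/lib/jvm/java-6-sun
------
-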
-   Try the following command:
-
------
-  $ bin/hadoop
------
-
-   This will display the usage documentation for the hadoop script.
-
-   Now you are ready to start your Hadoop cluster in one of the three
-   supported modes:
-
-   * Local (Standalone) Mode
-
-   * Pseudo-Distributed Mode
-
-   * Fully-Distributed Mode
-
-* Standalone Operation
-
-   By default, Hadoop is configured to run in a non-distributed mode, as a
-   single Java process. This is useful for debugging.
-
-   The following example copies the unpacked conf directory to use as
-   input and then finds and displays every match of the given regular
-   expression. Output is written to the given output directory.
-
------
-  $ mkdir input
-  $ cp conf/*.xml input
-  $ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
-  $ cat output/*
------
-
-* Pseudo-Distributed Operation
-
-   Hadoop can also be run on a single node in a pseudo-distributed mode
-   where each Hadoop daemon runs in a separate Java process.
-
-** Configuration
-
-   Use the following:
-
-   conf/core-site.xml:
-
------
-<configuration>
-  <property>
-    <name>fs.defaultFS</name>
-    <value>hdfs://localhost:9000</value>
-  </property>
-</configuration>
------
-
-   conf/hdfs-site.xml:
-
------
-<configuration>
-  <property>
-    <name>dfs.replication</name>
-    <value>1</value>
-  </property>
-</configuration>
------
-
-   conf/mapred-site.xml:
-
------
-<configuration>
-  <property>
-    <name>mapred.job.tracker</name>
-    <value>localhost:9001</value>
-  </property>
-</configuration>
------
-
-** Setup passphraseless ssh
-
-   Now check that you can ssh to the localhost without a passphrase:
-
------
-  $ ssh localhost
------
-
-   If you cannot ssh to localhost without a passphrase, execute the
-   following commands:
-
------
-  $ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
-  $ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
------
-
-** Execution
-
-   Format a new distributed filesystem:
-
------
-  $ bin/hadoop namenode -format
------
-
-   Start the Hadoop daemons:
-
------
-  $ bin/start-all.sh
------
-
-   The Hadoop daemon log output is written to the <<<${HADOOP_LOG_DIR}>>>
-   directory (defaults to <<<${HADOOP_PREFIX}/logs>>>).
-
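-   For example, to follow the NameNode log (the exact file name depends on
-   your user and host names, so the glob below is only a sketch):
-
------
-  $ tail -f logs/hadoop-*-namenode-*.log
------
-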
-   Browse the web interface for the NameNode and the JobTracker; by
-   default they are available at:
-
-   * NameNode - <<<http://localhost:50070/>>>
-
-   * JobTracker - <<<http://localhost:50030/>>>
-
-   Copy the input files into the distributed filesystem:
-
------
-  $ bin/hadoop fs -put conf input
------
-
-   Run some of the examples provided:
-
------
-  $ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
------
-
-   Examine the output files:
-
-   Copy the output files from the distributed filesystem to the local
-   filesystem and examine them:
-
------
-  $ bin/hadoop fs -get output output
-  $ cat output/*
------
-
-   or
-
-   View the output files on the distributed filesystem:
-
------
-  $ bin/hadoop fs -cat output/*
------
-
-   When you're done, stop the daemons with:
-
------
-  $ bin/stop-all.sh
------
-
-* Fully-Distributed Operation
-
-   For information on setting up fully-distributed, non-trivial clusters,
-   see {{{./ClusterSetup.html}Cluster Setup}}.
-
-   Java and JNI are trademarks or registered trademarks of Sun
-   Microsystems, Inc. in the United States and other countries.
+   See {{{./SingleCluster.html}Single Cluster Setup}} to set up and configure a
+   single-node Hadoop installation.