- ~~ Licensed under the Apache License, Version 2.0 (the "License");
- ~~ you may not use this file except in compliance with the License.
- ~~ You may obtain a copy of the License at
- ~~
- ~~ http://www.apache.org/licenses/LICENSE-2.0
- ~~
- ~~ Unless required by applicable law or agreed to in writing, software
- ~~ distributed under the License is distributed on an "AS IS" BASIS,
- ~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- ~~ See the License for the specific language governing permissions and
- ~~ limitations under the License. See accompanying LICENSE file.
- ---
- Single Node Setup
- ---
- ---
- ${maven.build.timestamp}
- Single Node Setup
- %{toc|section=1|fromDepth=0}
- * Purpose
- This document describes how to set up and configure a single-node
- Hadoop installation so that you can quickly perform simple operations
- using Hadoop MapReduce and the Hadoop Distributed File System (HDFS).
- * Prerequisites
- ** Supported Platforms
- * GNU/Linux is supported as a development and production platform.
- Hadoop has been demonstrated on GNU/Linux clusters with 2000 nodes.
- * Windows is also a supported platform.
- ** Required Software
- Required software for Linux and Windows includes:
- [[1]] Java^TM 1.6.x, preferably from Sun, must be installed.
- [[2]] ssh must be installed and sshd must be running to use the Hadoop
- scripts that manage remote Hadoop daemons.
- ** Installing Software
- If your cluster doesn't have the requisite software, you will need to
- install it.
- For example on Ubuntu Linux:
- ----
- $ sudo apt-get install ssh
- $ sudo apt-get install rsync
- ----
- * Download
- To get a Hadoop distribution, download a recent stable release from one
- of the Apache Download Mirrors.
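- For example, from the command line (the mirror host and the version
- number below are placeholders; substitute those of the release you
- chose):
- ----
- $ wget http://apache.example.com/hadoop/common/hadoop-x.y.z/hadoop-x.y.z.tar.gz
- ----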
- * Prepare to Start the Hadoop Cluster
- Unpack the downloaded Hadoop distribution. In the distribution, edit
- the file <<<conf/hadoop-env.sh>>> to define at least <<<JAVA_HOME>>> to be the root
- of your Java installation.
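- For example (the path below is only illustrative; point it at the root
- of your own JDK installation):
- ----
-   export JAVA_HOME=/usr/lib/jvm/java-6-sun
- ----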
- Try the following command:
- ----
- $ bin/hadoop
- ----
- This will display the usage documentation for the hadoop script.
- Now you are ready to start your Hadoop cluster in one of the three
- supported modes:
- * Local (Standalone) Mode
- * Pseudo-Distributed Mode
- * Fully-Distributed Mode
- * Standalone Operation
- By default, Hadoop is configured to run in a non-distributed mode, as a
- single Java process. This is useful for debugging.
- The following example copies the unpacked conf directory to use as
- input and then finds and displays every match of the given regular
- expression. Output is written to the given output directory.
- ----
- $ mkdir input
- $ cp conf/*.xml input
- $ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
- $ cat output/*
- ----
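- Note that Hadoop will refuse to run the job if the <<<output>>>
- directory already exists, so remove it before re-running the example:
- ----
- $ rm -rf output
- ----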
- * Pseudo-Distributed Operation
- Hadoop can also be run on a single node in pseudo-distributed mode
- where each Hadoop daemon runs in a separate Java process.
- ** Configuration
- Use the following:
- conf/core-site.xml:
- ----
- <configuration>
- <property>
- <name>fs.defaultFS</name>
- <value>hdfs://localhost:9000</value>
- </property>
- </configuration>
- ----
- conf/hdfs-site.xml:
- ----
- <configuration>
- <property>
- <name>dfs.replication</name>
- <value>1</value>
- </property>
- </configuration>
- ----
- conf/mapred-site.xml:
- ----
- <configuration>
- <property>
- <name>mapred.job.tracker</name>
- <value>localhost:9001</value>
- </property>
- </configuration>
- ----
- ** Set up passphraseless ssh
- Now check that you can ssh to localhost without a passphrase:
- ----
- $ ssh localhost
- ----
- If you cannot ssh to localhost without a passphrase, execute the
- following commands:
- ----
- $ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
- $ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
- ----
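- On some systems sshd also rejects keys when <<<~/.ssh/authorized_keys>>>
- is group- or world-writable; if passphraseless login still fails,
- tighten its permissions:
- ----
- $ chmod 0600 ~/.ssh/authorized_keys
- ----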
- ** Execution
- Format a new distributed filesystem (a one-time initialization step;
- reformatting an existing filesystem destroys its data):
- ----
- $ bin/hadoop namenode -format
- ----
- Start the Hadoop daemons:
- ----
- $ bin/start-all.sh
- ----
- The Hadoop daemon log output is written to the <<<${HADOOP_LOG_DIR}>>>
- directory (defaults to <<<${HADOOP_PREFIX}/logs>>>).
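- You can check that the expected daemons came up with the JDK's
- <<<jps>>> tool; in pseudo-distributed mode it should list the
- NameNode, SecondaryNameNode, DataNode, JobTracker and TaskTracker
- processes:
- ----
- $ jps
- ----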
- Browse the web interface for the NameNode and the JobTracker; by
- default they are available at:
- * NameNode - <<<http://localhost:50070/>>>
- * JobTracker - <<<http://localhost:50030/>>>
- Copy the input files into the distributed filesystem:
- ----
- $ bin/hadoop fs -put conf input
- ----
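- You can verify the copy with a listing of the new <<<input>>> directory:
- ----
- $ bin/hadoop fs -ls input
- ----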
- Run some of the examples provided:
- ----
- $ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
- ----
- Examine the output files:
- Copy the output files from the distributed filesystem to the local
- filesystem and examine them:
- ----
- $ bin/hadoop fs -get output output
- $ cat output/*
- ----
- or
- View the output files on the distributed filesystem:
- ----
- $ bin/hadoop fs -cat output/*
- ----
- When you're done, stop the daemons with:
- ----
- $ bin/stop-all.sh
- ----
- * Fully-Distributed Operation
- For information on setting up fully-distributed, non-trivial clusters,
- see {{{Cluster Setup}}}.
- Java and JNI are trademarks or registered trademarks of Sun
- Microsystems, Inc. in the United States and other countries.