- ~~ Licensed under the Apache License, Version 2.0 (the "License");
- ~~ you may not use this file except in compliance with the License.
- ~~ You may obtain a copy of the License at
- ~~
- ~~ http://www.apache.org/licenses/LICENSE-2.0
- ~~
- ~~ Unless required by applicable law or agreed to in writing, software
- ~~ distributed under the License is distributed on an "AS IS" BASIS,
- ~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- ~~ See the License for the specific language governing permissions and
- ~~ limitations under the License. See accompanying LICENSE file.
- ---
- Hadoop MapReduce Next Generation ${project.version} - Setting up a Single Node Cluster.
- ---
- ---
- ${maven.build.timestamp}
- Hadoop MapReduce Next Generation - Setting up a Single Node Cluster.
- \[ {{{./index.html}Go Back}} \]
- %{toc|section=1|fromDepth=0}
- * Mapreduce Tarball
- You should be able to obtain the MapReduce tarball from the release.
- If not, you should be able to create a tarball from the source.
- +---+
- $ mvn clean install -DskipTests
- $ cd hadoop-mapreduce-project
- $ mvn clean install assembly:assembly -Pnative
- +---+
- <<NOTE:>> You will need protoc 2.5.0 installed.
- To skip the native build in mapreduce, you can omit the <<<-Pnative>>> argument
- for maven. The tarball should be available in the <<<target/>>> directory.
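-
- A quick sanity check before building is to confirm that the expected protoc
- version is on your PATH; <<<protoc --version>>> should report the following:
- +---+
- $ protoc --version
- libprotoc 2.5.0
- +---+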
-
- * Setting up the environment.
- Assuming you have installed hadoop-common/hadoop-hdfs and exported
- <<$HADOOP_COMMON_HOME>>/<<$HADOOP_HDFS_HOME>>, untar the hadoop mapreduce
- tarball and set the environment variable <<$HADOOP_MAPRED_HOME>> to the
- untarred directory. Set <<$HADOOP_YARN_HOME>> the same as <<$HADOOP_MAPRED_HOME>>.
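-
- For example, assuming the tarballs were untarred under <<</opt/hadoop>>>
- (an illustrative location, not a requirement), the exports might look like:
- +---+
- # illustrative paths; point these at wherever you untarred each component
- $ export HADOOP_COMMON_HOME=/opt/hadoop/hadoop-common
- $ export HADOOP_HDFS_HOME=/opt/hadoop/hadoop-hdfs
- $ export HADOOP_MAPRED_HOME=/opt/hadoop/hadoop-mapreduce
- $ export HADOOP_YARN_HOME=$HADOOP_MAPRED_HOME
- +---+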
-
- <<NOTE:>> The following instructions assume you have hdfs running.
- * Setting up Configuration.
- To start the ResourceManager and NodeManager, you will have to update the configs.
- Assuming <<$HADOOP_CONF_DIR>> is the configuration directory and has the installed
- configs for HDFS and <<<core-site.xml>>>, there are two config files you will have
- to set up: <<<mapred-site.xml>>> and <<<yarn-site.xml>>>.
- ** Setting up <<<mapred-site.xml>>>
- Add the following configs to your <<<mapred-site.xml>>>.
- +---+
- <property>
- <name>mapreduce.cluster.temp.dir</name>
- <value></value>
- <description>No description</description>
- <final>true</final>
- </property>
- <property>
- <name>mapreduce.cluster.local.dir</name>
- <value></value>
- <description>No description</description>
- <final>true</final>
- </property>
- +---+
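-
- If you point these properties at local directories (the paths below are purely
- illustrative), it is a good idea to create the directories before starting any
- daemons:
- +---+
- # illustrative paths; use whatever you set for mapreduce.cluster.temp.dir
- # and mapreduce.cluster.local.dir
- $ mkdir -p /tmp/mapred/temp /tmp/mapred/local
- +---+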
- ** Setting up <<<yarn-site.xml>>>
- Add the following configs to your <<<yarn-site.xml>>>
- +---+
- <property>
- <name>yarn.resourcemanager.resource-tracker.address</name>
- <value>host:port</value>
- <description>host is the hostname of the resource manager and
- port is the port on which the NodeManagers contact the Resource Manager.
- </description>
- </property>
- <property>
- <name>yarn.resourcemanager.scheduler.address</name>
- <value>host:port</value>
- <description>host is the hostname of the resourcemanager and port is the port
- on which the Applications in the cluster talk to the Resource Manager.
- </description>
- </property>
- <property>
- <name>yarn.resourcemanager.scheduler.class</name>
- <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
- <description>In case you do not want to use the default scheduler</description>
- </property>
- <property>
- <name>yarn.resourcemanager.address</name>
- <value>host:port</value>
- <description>the host is the hostname of the ResourceManager and the port is the port on
- which the clients can talk to the Resource Manager. </description>
- </property>
- <property>
- <name>yarn.nodemanager.local-dirs</name>
- <value></value>
- <description>the local directories used by the nodemanager</description>
- </property>
- <property>
- <name>yarn.nodemanager.address</name>
- <value>0.0.0.0:port</value>
- <description>the nodemanagers bind to this port</description>
- </property>
- <property>
- <name>yarn.nodemanager.resource.memory-mb</name>
- <value>10240</value>
- <description>the amount of memory on the NodeManager in MB</description>
- </property>
-
- <property>
- <name>yarn.nodemanager.remote-app-log-dir</name>
- <value>/app-logs</value>
- <description>directory on hdfs where the application logs are moved to </description>
- </property>
- <property>
- <name>yarn.nodemanager.log-dirs</name>
- <value></value>
- <description>the directories used by Nodemanagers as log directories</description>
- </property>
- <property>
- <name>yarn.nodemanager.aux-services</name>
- <value>mapreduce.shuffle</value>
- <description>shuffle service that needs to be set for Map Reduce to run </description>
- </property>
- +---+
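-
- If you want to create the directories referenced above up front, the sketch
- below uses the <<</app-logs>>> path from <<<yarn.nodemanager.remote-app-log-dir>>>
- and illustrative local paths for the NodeManager directories:
- +---+
- # HDFS directory for aggregated application logs
- $ $HADOOP_COMMON_HOME/bin/hadoop fs -mkdir /app-logs
- # illustrative local paths; use whatever you set for yarn.nodemanager.local-dirs
- # and yarn.nodemanager.log-dirs
- $ mkdir -p /tmp/yarn/local /tmp/yarn/logs
- +---+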
- * Setting up <<<capacity-scheduler.xml>>>
- Make sure you populate the root queues in <<<capacity-scheduler.xml>>>; the
- capacities of the child queues of <<<root>>> (here <<<unfunded>>> and
- <<<default>>>) should add up to 100.
- +---+
- <property>
- <name>yarn.scheduler.capacity.root.queues</name>
- <value>unfunded,default</value>
- </property>
-
- <property>
- <name>yarn.scheduler.capacity.root.capacity</name>
- <value>100</value>
- </property>
-
- <property>
- <name>yarn.scheduler.capacity.root.unfunded.capacity</name>
- <value>50</value>
- </property>
-
- <property>
- <name>yarn.scheduler.capacity.root.default.capacity</name>
- <value>50</value>
- </property>
- +---+
- * Running daemons.
- Assuming that the environment variables <<$HADOOP_COMMON_HOME>>, <<$HADOOP_HDFS_HOME>>, <<$HADOOP_MAPRED_HOME>>,
- <<$HADOOP_YARN_HOME>>, <<$JAVA_HOME>> and <<$HADOOP_CONF_DIR>> have been set appropriately,
- set <<$YARN_CONF_DIR>> the same as <<$HADOOP_CONF_DIR>>.
-
- Run ResourceManager and NodeManager as:
-
- +---+
- $ cd $HADOOP_MAPRED_HOME
- $ sbin/yarn-daemon.sh start resourcemanager
- $ sbin/yarn-daemon.sh start nodemanager
- +---+
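-
- One quick way to confirm that both daemons came up is the JDK's <<<jps>>> tool;
- its output should include a ResourceManager and a NodeManager process:
- +---+
- $ jps
- +---+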
- You should be up and running. You can run randomwriter as:
- +---+
- $ $HADOOP_COMMON_HOME/bin/hadoop jar hadoop-examples.jar randomwriter out
- +---+
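-
- Once the job completes you can list its output directory on HDFS (the <<<out>>>
- path matches the argument passed to randomwriter above):
- +---+
- $ $HADOOP_COMMON_HOME/bin/hadoop fs -ls out
- +---+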
- Good luck.