
~~ Licensed under the Apache License, Version 2.0 (the "License");
~~ you may not use this file except in compliance with the License.
~~ You may obtain a copy of the License at
~~
~~   http://www.apache.org/licenses/LICENSE-2.0
~~
~~ Unless required by applicable law or agreed to in writing, software
~~ distributed under the License is distributed on an "AS IS" BASIS,
~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~~ See the License for the specific language governing permissions and
~~ limitations under the License. See accompanying LICENSE file.
  ---
  Hadoop MapReduce Next Generation ${project.version} - Setting up a Single Node Cluster.
  ---
  ---
  ${maven.build.timestamp}

Hadoop MapReduce Next Generation - Setting up a Single Node Cluster.

\[ {{{./index.html}Go Back}} \]

* Mapreduce Tarball

  You should be able to obtain the MapReduce tarball from the release.
  If not, you should be able to create a tarball from the source.

+---+
$ mvn clean install -DskipTests
$ cd hadoop-mapreduce-project
$ mvn clean install assembly:assembly
+---+

  <<NOTE:>> You will need protoc version 2.4.1 or greater installed. To skip
  the native builds in mapreduce you can pass the <<<-P-cbuild>>> argument to
  maven, as in the example below. The tarball should be available in the
  <<<target/>>> directory.
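
  For example, a build that skips the native code might look like the
  following (this simply combines the commands above with the
  <<<-P-cbuild>>> profile flag):

+---+
$ cd hadoop-mapreduce-project
$ mvn clean install assembly:assembly -P-cbuild
+---+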

* Setting up the environment.

  Assuming you have installed hadoop-common/hadoop-hdfs and exported
  <<$HADOOP_COMMON_HOME>>/<<$HADOOP_HDFS_HOME>>, untar the hadoop mapreduce
  tarball and set the environment variable <<$HADOOP_MAPRED_HOME>> to the
  untarred directory. Set <<$YARN_HOME>> the same as <<$HADOOP_MAPRED_HOME>>.

  <<NOTE:>> The following instructions assume you have hdfs running.
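
  As a sketch, the exports might look like the following (the install path is
  hypothetical; substitute the directory you untarred the tarball into):

+---+
$ export HADOOP_MAPRED_HOME=/opt/hadoop-mapreduce
$ export YARN_HOME=$HADOOP_MAPRED_HOME
+---+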

* Setting up Configuration.

  To start the ResourceManager and NodeManager, you will have to update the
  configs. Assuming your <<$HADOOP_CONF_DIR>> is the configuration directory
  and already holds the installed configs for HDFS and <<<core-site.xml>>>,
  there are two config files you will have to set up: <<<mapred-site.xml>>>
  and <<<yarn-site.xml>>>.

** Setting up <<<mapred-site.xml>>>

  Add the following configs to your <<<mapred-site.xml>>>.

+---+
  <property>
    <name>mapreduce.cluster.temp.dir</name>
    <value></value>
    <description>No description</description>
    <final>true</final>
  </property>

  <property>
    <name>mapreduce.cluster.local.dir</name>
    <value></value>
    <description>No description</description>
    <final>true</final>
  </property>
+---+
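
  As with the other Hadoop configuration files, the property snippets above
  are assumed to sit inside the <<<configuration>>> root element of the file.
  A minimal skeleton of <<<$HADOOP_CONF_DIR/mapred-site.xml>>> would be:

+---+
<?xml version="1.0"?>
<configuration>

  <!-- the <property> elements shown above go here -->

</configuration>
+---+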

** Setting up <<<yarn-site.xml>>>

  Add the following configs to your <<<yarn-site.xml>>>.

+---+
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>host:port</value>
    <description>host is the hostname of the resource manager and
    port is the port on which the NodeManagers contact the Resource Manager.
    </description>
  </property>

  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>host:port</value>
    <description>host is the hostname of the resourcemanager and port is the port
    on which the Applications in the cluster talk to the Resource Manager.
    </description>
  </property>

  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
    <description>In case you do not want to use the default scheduler</description>
  </property>

  <property>
    <name>yarn.resourcemanager.address</name>
    <value>host:port</value>
    <description>the host is the hostname of the ResourceManager and the port is the port on
    which the clients can talk to the Resource Manager.</description>
  </property>

  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value></value>
    <description>the local directories used by the nodemanager</description>
  </property>

  <property>
    <name>yarn.nodemanager.address</name>
    <value>0.0.0.0:port</value>
    <description>the nodemanagers bind to this port</description>
  </property>

  <property>
    <name>yarn.nodemanager.resource.memory-gb</name>
    <value>10</value>
    <description>the amount of memory on the NodeManager in GB</description>
  </property>

  <property>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>/app-logs</value>
    <description>directory on hdfs where the application logs are moved to</description>
  </property>

  <property>
    <name>yarn.nodemanager.log-dirs</name>
    <value></value>
    <description>the directories used by Nodemanagers as log directories</description>
  </property>

  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce.shuffle</value>
    <description>shuffle service that needs to be set for Map Reduce to run</description>
  </property>
+---+
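
  For a single-node setup, the <<<host:port>>> placeholders above might simply
  be filled in with <<<localhost>>> and free ports of your choosing. Purely as
  an illustration (the port numbers below are arbitrary examples, not
  defaults):

+---+
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>localhost:8990</value>
  </property>

  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>localhost:8991</value>
  </property>

  <property>
    <name>yarn.resourcemanager.address</name>
    <value>localhost:8992</value>
  </property>
+---+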

* Create Symlinks.

  You will have to create the following symlinks:

+---+
$ cd $HADOOP_COMMON_HOME/share/hadoop/common/lib/
$ ln -s $HADOOP_MAPRED_HOME/modules/hadoop-mapreduce-client-app-*-SNAPSHOT.jar .
$ ln -s $HADOOP_MAPRED_HOME/modules/hadoop-mapreduce-client-jobclient-*-SNAPSHOT.jar .
$ ln -s $HADOOP_MAPRED_HOME/modules/hadoop-mapreduce-client-common-*-SNAPSHOT.jar .
$ ln -s $HADOOP_MAPRED_HOME/modules/hadoop-mapreduce-client-shuffle-*-SNAPSHOT.jar .
$ ln -s $HADOOP_MAPRED_HOME/modules/hadoop-mapreduce-client-core-*-SNAPSHOT.jar .
$ ln -s $HADOOP_MAPRED_HOME/modules/hadoop-yarn-common-*-SNAPSHOT.jar .
$ ln -s $HADOOP_MAPRED_HOME/modules/hadoop-yarn-api-*-SNAPSHOT.jar .
+---+
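
  To sanity-check the links, you could list the lib directory and confirm
  that each symlink resolves (a simple verification sketch):

+---+
$ ls -l $HADOOP_COMMON_HOME/share/hadoop/common/lib/ | grep SNAPSHOT.jar
+---+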

* Running daemons.

  Assuming that the environment variables <<$HADOOP_COMMON_HOME>>,
  <<$HADOOP_HDFS_HOME>>, <<$HADOOP_MAPRED_HOME>>, <<$YARN_HOME>>,
  <<$JAVA_HOME>> and <<$HADOOP_CONF_DIR>> have been set appropriately,
  set <<$YARN_CONF_DIR>> the same as <<$HADOOP_CONF_DIR>> and run the
  ResourceManager and NodeManager as:

+---+
$ cd $HADOOP_MAPRED_HOME
$ bin/yarn-daemon.sh start resourcemanager
$ bin/yarn-daemon.sh start nodemanager
+---+
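
  To check that the daemons came up, one option is to look for the
  ResourceManager and NodeManager Java processes with <<<jps>>>; the same
  script accepts <<<stop>>> to shut them down again (a sketch, assuming
  <<<jps>>> from the JDK is on your path):

+---+
$ jps
$ bin/yarn-daemon.sh stop nodemanager
$ bin/yarn-daemon.sh stop resourcemanager
+---+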

  You should be up and running. You can run randomwriter as:

+---+
$ $HADOOP_COMMON_HOME/bin/hadoop jar hadoop-examples.jar randomwriter out
+---+
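
  Once the job finishes, you can list the generated output in HDFS (a small
  verification sketch; <<<out>>> is the output directory passed above):

+---+
$ $HADOOP_COMMON_HOME/bin/hadoop fs -ls out
+---+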

  Good luck.