<?xml version="1.0"?>
<!--
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "http://forrest.apache.org/dtd/document-v20.dtd">
<document>
<header>
<title>Cluster Setup</title>
</header>
<body>
<section>
<title>Purpose</title>
<p>This document describes how to install, configure and manage non-trivial
Hadoop clusters ranging from a few nodes to extremely large clusters with
thousands of nodes.</p>
<p>
To play with Hadoop, you may first want to install Hadoop on a single machine (see <a href="quickstart.html">Hadoop Quick Start</a>).
</p>
</section>
<section>
<title>Pre-requisites</title>
<ol>
<li>
Make sure all <a href="quickstart.html#PreReqs">requisite</a> software
is installed on all nodes in your cluster.
</li>
<li>
<a href="quickstart.html#Download">Get</a> the Hadoop software.
</li>
</ol>
</section>
<section>
<title>Installation</title>
<p>Installing a Hadoop cluster typically involves unpacking the software
on all the machines in the cluster.</p>
<p>Typically one machine in the cluster is designated as the
<code>NameNode</code> and another machine as the <code>JobTracker</code>,
exclusively. These are the <em>masters</em>. The rest of the machines in
the cluster act as both <code>DataNode</code> <em>and</em>
<code>TaskTracker</code>. These are the <em>slaves</em>.</p>
<p>The root of the distribution is referred to as
<code>HADOOP_HOME</code>. All machines in the cluster usually have the same
<code>HADOOP_HOME</code> path.</p>
</section>
<section>
<title>Configuration</title>
<p>The following sections describe how to configure a Hadoop cluster.</p>
<section>
<title>Configuration Files</title>
<p>Hadoop configuration is driven by two types of important
configuration files:</p>
<ol>
<li>
Read-only default configuration -
<a href="ext:core-default">src/core/core-default.xml</a>,
<a href="ext:hdfs-default">src/hdfs/hdfs-default.xml</a> and
<a href="ext:mapred-default">src/mapred/mapred-default.xml</a>.
</li>
<li>
Site-specific configuration -
<em>conf/core-site.xml</em>,
<em>conf/hdfs-site.xml</em> and
<em>conf/mapred-site.xml</em>.
</li>
</ol>
<p>To learn more about how the Hadoop framework is controlled by these
configuration files, look
<a href="ext:api/org/apache/hadoop/conf/configuration">here</a>.</p>
<p>Additionally, you can control the Hadoop scripts found in the
<code>bin/</code> directory of the distribution by setting site-specific
values via <code>conf/hadoop-env.sh</code>.</p>
</section>
<section>
<title>Site Configuration</title>
<p>To configure the Hadoop cluster you will need to configure the
<em>environment</em> in which the Hadoop daemons execute as well as
the <em>configuration parameters</em> for the Hadoop daemons.</p>
<p>The Hadoop daemons are <code>NameNode</code>/<code>DataNode</code>
and <code>JobTracker</code>/<code>TaskTracker</code>.</p>
<section>
<title>Configuring the Environment of the Hadoop Daemons</title>
<p>Administrators should use the <code>conf/hadoop-env.sh</code> script
to do site-specific customization of the Hadoop daemons' process
environment.</p>
<p>At the very least you should specify <code>JAVA_HOME</code> so that
it is correctly defined on each remote node.</p>
<p>Administrators can configure individual daemons using the
configuration options <code>HADOOP_*_OPTS</code>. The options available
for each daemon are shown in the table below.</p>
<table>
<tr><th>Daemon</th><th>Configure Options</th></tr>
<tr><td>NameNode</td><td>HADOOP_NAMENODE_OPTS</td></tr>
<tr><td>DataNode</td><td>HADOOP_DATANODE_OPTS</td></tr>
<tr><td>SecondaryNamenode</td>
<td>HADOOP_SECONDARYNAMENODE_OPTS</td></tr>
<tr><td>JobTracker</td><td>HADOOP_JOBTRACKER_OPTS</td></tr>
<tr><td>TaskTracker</td><td>HADOOP_TASKTRACKER_OPTS</td></tr>
</table>
<p>For example, to configure the NameNode to use parallel GC, the
following statement should be added in <code>hadoop-env.sh</code>:
<br/><code>
export HADOOP_NAMENODE_OPTS="-XX:+UseParallelGC ${HADOOP_NAMENODE_OPTS}"
</code><br/></p>
<p>Other useful configuration parameters that you can customize
include:</p>
<ul>
<li>
<code>HADOOP_LOG_DIR</code> - The directory where the daemons'
log files are stored. It is automatically created if it doesn't
exist.
</li>
<li>
<code>HADOOP_HEAPSIZE</code> - The maximum heap size to use,
in MB, e.g. <code>1000</code>. This is used to configure the heap
size for the Hadoop daemons. By default, the value is <code>1000</code>
(i.e. 1000 MB).
</li>
</ul>
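<p>Putting these together, a minimal site-specific <code>conf/hadoop-env.sh</code>
customization might contain entries like the following; the paths shown are
illustrative placeholders, not defaults:</p>
<source>
# Illustrative sketch of conf/hadoop-env.sh customizations.
# JAVA_HOME must point at the JDK installed on each node (path is a placeholder).
export JAVA_HOME=/usr/lib/jvm/java-6-sun

# Daemon-specific JVM options, e.g. parallel GC for the NameNode.
export HADOOP_NAMENODE_OPTS="-XX:+UseParallelGC ${HADOOP_NAMENODE_OPTS}"

# Where daemon logs are written, and the daemon heap size in MB.
export HADOOP_LOG_DIR=/var/log/hadoop
export HADOOP_HEAPSIZE=2000
</source>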
</section>
<section>
<title>Configuring the Hadoop Daemons</title>
<p>This section deals with important parameters to be specified in the
following:
<br/>
<code>conf/core-site.xml</code>:</p>
<table>
<tr>
<th>Parameter</th>
<th>Value</th>
<th>Notes</th>
</tr>
<tr>
<td>fs.default.name</td>
<td>URI of <code>NameNode</code>.</td>
<td><em>hdfs://hostname/</em></td>
</tr>
</table>
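<p>For example, a minimal <code>conf/core-site.xml</code> might look like
the following; the hostname <em>namenode.example.com</em> is a placeholder:</p>
<source>
&lt;configuration&gt;
  &lt;property&gt;
    &lt;name&gt;fs.default.name&lt;/name&gt;
    &lt;value&gt;hdfs://namenode.example.com/&lt;/value&gt;
  &lt;/property&gt;
&lt;/configuration&gt;
</source>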
<p><br/><code>conf/hdfs-site.xml</code>:</p>
<table>
<tr>
<th>Parameter</th>
<th>Value</th>
<th>Notes</th>
</tr>
<tr>
<td>dfs.name.dir</td>
<td>
Path on the local filesystem where the <code>NameNode</code>
stores the namespace and transaction logs persistently.</td>
<td>
If this is a comma-delimited list of directories then the name
table is replicated in all of the directories, for redundancy.
</td>
</tr>
<tr>
<td>dfs.data.dir</td>
<td>
Comma-separated list of paths on the local filesystem of a
<code>DataNode</code> where it should store its blocks.
</td>
<td>
If this is a comma-delimited list of directories, then data will
be stored in all named directories, typically on different
devices.
</td>
</tr>
</table>
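<p>A corresponding <code>conf/hdfs-site.xml</code> sketch, with illustrative
local paths on two separate devices:</p>
<source>
&lt;configuration&gt;
  &lt;property&gt;
    &lt;name&gt;dfs.name.dir&lt;/name&gt;
    &lt;value&gt;/data/1/dfs/name,/data/2/dfs/name&lt;/value&gt;
  &lt;/property&gt;
  &lt;property&gt;
    &lt;name&gt;dfs.data.dir&lt;/name&gt;
    &lt;value&gt;/data/1/dfs/data,/data/2/dfs/data&lt;/value&gt;
  &lt;/property&gt;
&lt;/configuration&gt;
</source>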
<p><br/><code>conf/mapred-site.xml</code>:</p>
<table>
<tr>
<th>Parameter</th>
<th>Value</th>
<th>Notes</th>
</tr>
<tr>
<td>mapred.job.tracker</td>
<td>Host or IP and port of <code>JobTracker</code>.</td>
<td><em>host:port</em> pair.</td>
</tr>
<tr>
<td>mapred.system.dir</td>
<td>
Path on the HDFS where the Map/Reduce framework stores
system files, e.g. <code>/hadoop/mapred/system/</code>.
</td>
<td>
This is in the default filesystem (HDFS) and must be accessible
from both the server and client machines.
</td>
</tr>
<tr>
<td>mapred.local.dir</td>
<td>
Comma-separated list of paths on the local filesystem where
temporary Map/Reduce data is written.
</td>
<td>Multiple paths help spread disk i/o.</td>
</tr>
<tr>
<td>mapred.tasktracker.{map|reduce}.tasks.maximum</td>
<td>
The maximum number of map/reduce tasks that are run
simultaneously on a given <code>TaskTracker</code>, configured
separately for maps and reduces.
</td>
<td>
Defaults to 2 (2 maps and 2 reduces), but vary it depending on
your hardware.
</td>
</tr>
<tr>
<td>dfs.hosts/dfs.hosts.exclude</td>
<td>List of permitted/excluded DataNodes.</td>
<td>
If necessary, use these files to control the list of allowable
DataNodes.
</td>
</tr>
<tr>
<td>mapred.hosts/mapred.hosts.exclude</td>
<td>List of permitted/excluded TaskTrackers.</td>
<td>
If necessary, use these files to control the list of allowable
TaskTrackers.
</td>
</tr>
<tr>
<td>mapred.queue.names</td>
<td>Comma-separated list of queues to which jobs can be submitted.</td>
<td>
The Map/Reduce system always supports at least one queue
named <em>default</em>. Hence, this parameter's
value should always contain the string <em>default</em>.
Some job schedulers supported in Hadoop, like the
<a href="capacity_scheduler.html">Capacity
Scheduler</a>, support multiple queues. If such a scheduler is
being used, the list of configured queue names must be
specified here. Once queues are defined, users can submit
jobs to a queue using the property name
<em>mapred.job.queue.name</em> in the job configuration.
There could be a separate
configuration file, managed by the scheduler, for configuring
the properties of these queues.
Refer to the scheduler's documentation for details.
</td>
</tr>
<tr>
<td>mapred.acls.enabled</td>
<td>Specifies whether ACLs are used for controlling job
submission and administration.</td>
<td>
If <em>true</em>, ACLs are checked while submitting
and administering jobs. ACLs can be specified using the
configuration parameters of the form
<em>mapred.queue.queue-name.acl-name</em>, defined below.
</td>
</tr>
</table>
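<p>As an illustration, a minimal <code>conf/mapred-site.xml</code> for a small
cluster might contain the following; the hostname, port and paths are
placeholders:</p>
<source>
&lt;configuration&gt;
  &lt;property&gt;
    &lt;name&gt;mapred.job.tracker&lt;/name&gt;
    &lt;value&gt;jobtracker.example.com:9001&lt;/value&gt;
  &lt;/property&gt;
  &lt;property&gt;
    &lt;name&gt;mapred.local.dir&lt;/name&gt;
    &lt;value&gt;/data/1/mapred/local,/data/2/mapred/local&lt;/value&gt;
  &lt;/property&gt;
  &lt;property&gt;
    &lt;name&gt;mapred.tasktracker.map.tasks.maximum&lt;/name&gt;
    &lt;value&gt;4&lt;/value&gt;
  &lt;/property&gt;
&lt;/configuration&gt;
</source>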
<p><br/><code>conf/mapred-queue-acls.xml</code></p>
<table>
<tr>
<th>Parameter</th>
<th>Value</th>
<th>Notes</th>
</tr>
<tr>
<td>mapred.queue.<em>queue-name</em>.acl-submit-job</td>
<td>List of users and groups that can submit jobs to the
specified <em>queue-name</em>.</td>
<td>
The lists of users and groups are both comma-separated
lists of names, and the two lists are separated by a blank.
Example: <em>user1,user2 group1,group2</em>.
If you wish to define only a list of groups, provide
a blank at the beginning of the value.
</td>
</tr>
<tr>
<td>mapred.queue.<em>queue-name</em>.acl-administer-job</td>
<td>List of users and groups that can change the priority
of, or kill, jobs that have been submitted to the
specified <em>queue-name</em>.</td>
<td>
The lists of users and groups are both comma-separated
lists of names, and the two lists are separated by a blank.
Example: <em>user1,user2 group1,group2</em>.
If you wish to define only a list of groups, provide
a blank at the beginning of the value. Note that the
owner of a job can always change the priority of or kill
his/her own job, irrespective of the ACLs.
</td>
</tr>
</table>
<p>Typically all the above parameters are marked as
<a href="ext:api/org/apache/hadoop/conf/configuration/final_parameters">
final</a> to ensure that they cannot be overridden by user applications.
</p>
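<p>For instance, an administrator might pin the queue list in
<code>conf/mapred-site.xml</code> and grant submit access in
<code>conf/mapred-queue-acls.xml</code> along these lines; the
<em>research</em> queue and the user and group names are purely illustrative:</p>
<source>
&lt;!-- conf/mapred-site.xml: the queue list is marked final so jobs cannot override it --&gt;
&lt;property&gt;
  &lt;name&gt;mapred.queue.names&lt;/name&gt;
  &lt;value&gt;default,research&lt;/value&gt;
  &lt;final&gt;true&lt;/final&gt;
&lt;/property&gt;

&lt;!-- conf/mapred-queue-acls.xml: users alice,bob and group analysts may submit --&gt;
&lt;property&gt;
  &lt;name&gt;mapred.queue.research.acl-submit-job&lt;/name&gt;
  &lt;value&gt;alice,bob analysts&lt;/value&gt;
&lt;/property&gt;
</source>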
<section>
<title>Real-World Cluster Configurations</title>
<p>This section lists some non-default configuration parameters which
have been used to run the <em>sort</em> benchmark on very large
clusters.</p>
<ul>
<li>
<p>Some non-default configuration values used to run sort900,
that is 9TB of data sorted on a cluster with 900 nodes:</p>
<table>
<tr>
<th>Configuration File</th>
<th>Parameter</th>
<th>Value</th>
<th>Notes</th>
</tr>
<tr>
<td>conf/hdfs-site.xml</td>
<td>dfs.block.size</td>
<td>134217728</td>
<td>HDFS blocksize of 128MB for large file-systems.</td>
</tr>
<tr>
<td>conf/hdfs-site.xml</td>
<td>dfs.namenode.handler.count</td>
<td>40</td>
<td>
More NameNode server threads to handle RPCs from large
number of DataNodes.
</td>
</tr>
<tr>
<td>conf/mapred-site.xml</td>
<td>mapred.reduce.parallel.copies</td>
<td>20</td>
<td>
Higher number of parallel copies run by reduces to fetch
outputs from very large number of maps.
</td>
</tr>
<tr>
<td>conf/mapred-site.xml</td>
<td>mapred.child.java.opts</td>
<td>-Xmx512M</td>
<td>
Larger heap-size for child jvms of maps/reduces.
</td>
</tr>
<tr>
<td>conf/core-site.xml</td>
<td>fs.inmemory.size.mb</td>
<td>200</td>
<td>
Larger amount of memory allocated for the in-memory
file-system used to merge map-outputs at the reduces.
</td>
</tr>
<tr>
<td>conf/core-site.xml</td>
<td>io.sort.factor</td>
<td>100</td>
<td>More streams merged at once while sorting files.</td>
</tr>
<tr>
<td>conf/core-site.xml</td>
<td>io.sort.mb</td>
<td>200</td>
<td>Higher memory-limit while sorting data.</td>
</tr>
<tr>
<td>conf/core-site.xml</td>
<td>io.file.buffer.size</td>
<td>131072</td>
<td>Size of read/write buffer used in SequenceFiles.</td>
</tr>
</table>
</li>
<li>
<p>Updates to some configuration values to run sort1400 and
sort2000, that is 14TB of data sorted on 1400 nodes and 20TB of
data sorted on 2000 nodes:</p>
<table>
<tr>
<th>Configuration File</th>
<th>Parameter</th>
<th>Value</th>
<th>Notes</th>
</tr>
<tr>
<td>conf/mapred-site.xml</td>
<td>mapred.job.tracker.handler.count</td>
<td>60</td>
<td>
More JobTracker server threads to handle RPCs from large
number of TaskTrackers.
</td>
</tr>
<tr>
<td>conf/mapred-site.xml</td>
<td>mapred.reduce.parallel.copies</td>
<td>50</td>
<td></td>
</tr>
<tr>
<td>conf/mapred-site.xml</td>
<td>tasktracker.http.threads</td>
<td>50</td>
<td>
More worker threads for the TaskTracker's http server. The
http server is used by reduces to fetch intermediate
map-outputs.
</td>
</tr>
<tr>
<td>conf/mapred-site.xml</td>
<td>mapred.child.java.opts</td>
<td>-Xmx1024M</td>
<td>Larger heap-size for child jvms of maps/reduces.</td>
</tr>
</table>
</li>
</ul>
</section>
<section>
<title>Task Controllers</title>
<p>Task controllers are classes in the Hadoop Map/Reduce
framework that define how users' map and reduce tasks
are launched and controlled. They can
be used in clusters that require some customization in
the process of launching or controlling the user tasks.
For example, in some
clusters, there may be a requirement to run tasks as
the user who submitted the job, instead of as the task
tracker user, which is how tasks are launched by default.
This section describes how to configure and use
task controllers.</p>
<p>The following task controllers are available in
Hadoop.
</p>
<table>
<tr><th>Name</th><th>Class Name</th><th>Description</th></tr>
<tr>
<td>DefaultTaskController</td>
<td>org.apache.hadoop.mapred.DefaultTaskController</td>
<td>The default task controller which Hadoop uses to manage task
execution. The tasks run as the task tracker user.</td>
</tr>
<tr>
<td>LinuxTaskController</td>
<td>org.apache.hadoop.mapred.LinuxTaskController</td>
<td>This task controller, which is supported only on Linux,
runs the tasks as the user who submitted the job. It requires
these user accounts to be created on the cluster nodes
where the tasks are launched. It
uses a setuid executable that is included in the Hadoop
distribution. The task tracker uses this executable to
launch and kill tasks. The setuid executable switches to
the user who has submitted the job and launches or kills
the tasks. Currently, this task controller
opens up permissions to local files and directories used
by the tasks such as the job jar files, distributed archive
files, intermediate files and task log files. In the future,
it is expected that stricter file permissions will be used.
</td>
</tr>
</table>
<section>
<title>Configuring Task Controllers</title>
<p>The task controller to be used can be configured by setting the
value of the following key in <code>mapred-site.xml</code>.</p>
<table>
<tr>
<th>Property</th><th>Value</th><th>Notes</th>
</tr>
<tr>
<td>mapred.task.tracker.task-controller</td>
<td>Fully qualified class name of the task controller class</td>
<td>Currently there are two implementations of task controller
in the Hadoop system, DefaultTaskController and LinuxTaskController.
Refer to the class names mentioned above to determine the value
to set for the class of choice.
</td>
</tr>
</table>
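<p>For example, to switch a cluster to the LinuxTaskController, one would add
something like the following to <code>mapred-site.xml</code>:</p>
<source>
&lt;property&gt;
  &lt;name&gt;mapred.task.tracker.task-controller&lt;/name&gt;
  &lt;value&gt;org.apache.hadoop.mapred.LinuxTaskController&lt;/value&gt;
&lt;/property&gt;
</source>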
</section>
<section>
<title>Using the LinuxTaskController</title>
<p>This section of the document describes the steps required to
use the LinuxTaskController.</p>
<p>In order to use the LinuxTaskController, a setuid executable
should be built and deployed on the compute nodes. The
executable is named task-controller. To build the executable,
execute
<em>ant task-controller -Dhadoop.conf.dir=/path/to/conf/dir</em>.
The path passed in <em>-Dhadoop.conf.dir</em> should be the path
on the cluster nodes where a configuration file for the setuid
executable would be located. The executable would be built to
<em>build.dir/dist.dir/bin</em> and should be installed to
<em>$HADOOP_HOME/bin</em>.
</p>
<p>
The executable must be deployed as a setuid executable by changing
the ownership to <em>root</em>, the group ownership to that of the tasktracker,
and giving it permissions <em>4510</em>. Please note that the
group which owns task-controller should contain only the tasktracker user
as its member, and not users who submit jobs.
</p>
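<p>As a sketch, the deployment steps might look like the following, assuming
the tasktracker daemon runs as user <em>mapred</em> in group <em>mapred</em>;
the user, group and conf path are illustrative:</p>
<source>
# Build the setuid executable against the cluster's conf directory (path is a placeholder)
ant task-controller -Dhadoop.conf.dir=/etc/hadoop/conf

# Copy the resulting task-controller binary from the build output into $HADOOP_HOME/bin,
# then set the ownership and permissions described above
chown root:mapred $HADOOP_HOME/bin/task-controller
chmod 4510 $HADOOP_HOME/bin/task-controller
</source>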
<p>The executable requires a configuration file called
<em>taskcontroller.cfg</em> to be
present in the configuration directory passed to the ant target
mentioned above. If the binary was not built with a specific
conf directory, the path defaults to <em>/path-to-binary/../conf</em>.
</p>
<p>The executable requires the following configuration items to be
present in the <em>taskcontroller.cfg</em> file. The items should
be specified as simple <em>key=value</em> pairs.
</p>
<table><tr><th>Name</th><th>Description</th></tr>
<tr>
<td>mapred.local.dir</td>
<td>Path to mapred local directories. Should be the same as the value
provided for this key in mapred-site.xml. This is required to
validate paths passed to the setuid executable in order to prevent
arbitrary paths being passed to it.</td>
</tr>
</table>
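<p>A <em>taskcontroller.cfg</em> might therefore contain a single line such as
the following; the paths are placeholders and must match mapred-site.xml:</p>
<source>
mapred.local.dir=/data/1/mapred/local,/data/2/mapred/local
</source>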
<p>
The LinuxTaskController requires that the paths leading up to
the directories specified in
<em>mapred.local.dir</em> and <em>hadoop.log.dir</em> have 755
permissions, and that the directories themselves have 777 permissions.
</p>
</section>
</section>
<section>
<title>Monitoring Health of TaskTracker Nodes</title>
<p>Hadoop Map/Reduce provides a mechanism by which administrators
can configure the TaskTracker to run an administrator-supplied
script periodically to determine if a node is healthy or not.
Administrators can determine if the node is in a healthy state
by performing any checks of their choice in the script. If the
script detects the node to be in an unhealthy state, it must print
a line to standard output beginning with the string <em>ERROR</em>.
The TaskTracker spawns the script periodically and checks its
output. If the script's output contains the string <em>ERROR</em>,
as described above, the node's status is reported as 'unhealthy'
and the node is black-listed on the JobTracker. No further tasks
will be assigned to this node. However, the
TaskTracker continues to run the script, so that if the node
becomes healthy again, it will be removed from the blacklisted
nodes on the JobTracker automatically. The node's health,
along with the output of the script if it is unhealthy, is
available to the administrator in the JobTracker's web interface.
The time since the node was healthy is also displayed on the
web interface.
</p>
<section>
<title>Configuring the Node Health Check Script</title>
<p>The following parameters can be used to control the node health
monitoring script in <em>mapred-site.xml</em>.</p>
<table>
<tr><th>Name</th><th>Description</th></tr>
<tr><td><code>mapred.healthChecker.script.path</code></td>
<td>Absolute path to the script which is periodically run by the
TaskTracker to determine if the node is
healthy or not. The file should be executable by the TaskTracker.
If the value of this key is empty or the file does
not exist or is not executable, node health monitoring
is not started.</td>
</tr>
<tr>
<td><code>mapred.healthChecker.interval</code></td>
<td>Frequency at which the node health script is run,
in milliseconds.</td>
</tr>
<tr>
<td><code>mapred.healthChecker.script.timeout</code></td>
<td>Time after which the node health script will be killed by
the TaskTracker if unresponsive.
The node is marked unhealthy if the node health script times out.</td>
</tr>
<tr>
<td><code>mapred.healthChecker.script.args</code></td>
<td>Extra arguments that can be passed to the node health script
when launched.
These should be a comma-separated list of arguments.</td>
</tr>
</table>
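<p>As an illustration only, a trivial health check script could verify that the
local Map/Reduce directories are writable and print an <em>ERROR</em> line when
they are not; the directories checked and the script's install location are
assumptions, not defaults:</p>
<source>
#!/bin/bash
# Hypothetical node health script, e.g. installed as /usr/local/bin/node_health.sh
# and referenced from mapred.healthChecker.script.path.
for dir in /data/1/mapred/local /data/2/mapred/local; do
  if ! touch "$dir/.health_check" 2>/dev/null; then
    echo "ERROR: cannot write to $dir"
    exit 0
  fi
  rm -f "$dir/.health_check"
done
echo "OK"
</source>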
</section>
</section>
</section>
<section>
<title>Memory monitoring</title>
<p>A <code>TaskTracker</code> (TT) can be configured to monitor memory
usage of tasks it spawns, so that badly-behaved jobs do not bring
down a machine due to excess memory consumption. With monitoring
enabled, every task is assigned a task-limit for virtual memory (VMEM).
In addition, every node is assigned a node-limit for VMEM usage.
A TT ensures that a task is killed if it, and
its descendants, use VMEM over the task's per-task limit. It also
ensures that one or more tasks are killed if the sum total of VMEM
usage by all tasks, and their descendants, crosses the node-limit.</p>
<p>Users can, optionally, specify the VMEM task-limit per job. If no
such limit is provided, a default limit is used. A node-limit can be
set per node.</p>
<p>Currently, memory monitoring and management is supported only
on the Linux platform.</p>
<p>To enable monitoring for a TT, the
following parameters all need to be set:</p>
<table>
<tr><th>Name</th><th>Type</th><th>Description</th></tr>
<tr><td>mapred.tasktracker.vmem.reserved</td><td>long</td>
<td>A number, in bytes, that represents an offset. The total VMEM on
the machine, minus this offset, is the VMEM node-limit for all
tasks, and their descendants, spawned by the TT.
</td></tr>
<tr><td>mapred.task.default.maxvmem</td><td>long</td>
<td>A number, in bytes, that represents the default VMEM task-limit
associated with a task. Unless overridden by a job's setting,
this number defines the VMEM task-limit.
</td></tr>
<tr><td>mapred.task.limit.maxvmem</td><td>long</td>
<td>A number, in bytes, that represents the upper VMEM task-limit
associated with a task. Users, when specifying a VMEM task-limit
for their tasks, should not specify a limit which exceeds this amount.
</td></tr>
</table>
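<p>For example, a TT might be configured along these lines in
<code>mapred-site.xml</code>; the byte values below are illustrative, not
recommendations:</p>
<source>
&lt;!-- Reserve roughly 8 GB of VMEM for the system; the rest is the node-limit --&gt;
&lt;property&gt;
  &lt;name&gt;mapred.tasktracker.vmem.reserved&lt;/name&gt;
  &lt;value&gt;8589934592&lt;/value&gt;
&lt;/property&gt;
&lt;!-- Default per-task VMEM limit of 2 GB, with an upper bound of 4 GB --&gt;
&lt;property&gt;
  &lt;name&gt;mapred.task.default.maxvmem&lt;/name&gt;
  &lt;value&gt;2147483648&lt;/value&gt;
&lt;/property&gt;
&lt;property&gt;
  &lt;name&gt;mapred.task.limit.maxvmem&lt;/name&gt;
  &lt;value&gt;4294967296&lt;/value&gt;
&lt;/property&gt;
</source>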
<p>In addition, the following parameters can also be configured.</p>
<table>
<tr><th>Name</th><th>Type</th><th>Description</th></tr>
<tr><td>mapred.tasktracker.taskmemorymanager.monitoring-interval</td>
<td>long</td>
<td>The time interval, in milliseconds, between which the TT
checks for any memory violation. The default value is 5000 msec
(5 seconds).
</td></tr>
</table>
<p>Here's how the memory monitoring works for a TT.</p>
<ol>
<li>If one or more of the configuration parameters described
above are missing or -1 is specified, memory monitoring is
disabled for the TT.
</li>
<li>In addition, monitoring is disabled if
<code>mapred.task.default.maxvmem</code> is greater than
<code>mapred.task.limit.maxvmem</code>.
</li>
<li>If a TT receives a task whose task-limit is set by the user
to a value larger than <code>mapred.task.limit.maxvmem</code>, it
logs a warning but executes the task.
</li>
<li>Periodically, the TT checks the following:
<ul>
<li>If any task's current VMEM usage is greater than that task's
VMEM task-limit, the task is killed and the reason for killing
the task is logged in the task diagnostics. Such a task is considered
failed, i.e., the killing counts towards the task's failure count.
</li>
<li>If the sum total of VMEM used by all tasks and descendants is
greater than the node-limit, the TT kills enough tasks, in the
order of least progress made, until the overall VMEM usage falls
below the node-limit. Such killed tasks are not considered failed
and their killing does not count towards the tasks' failure counts.
</li>
</ul>
</li>
</ol>
<p>Schedulers can choose to ease the monitoring pressure on the TT by
preventing too many tasks from running on a node and by scheduling
tasks only if the TT has enough VMEM free. In addition, Schedulers may
choose to consider the physical memory (RAM) available on the node
as well. To enable Scheduler support, TTs report their memory settings
to the JobTracker in every heartbeat. Before getting into details,
consider the following additional memory-related parameters that can be
configured to enable better scheduling:</p>
<table>
<tr><th>Name</th><th>Type</th><th>Description</th></tr>
<tr><td>mapred.tasktracker.pmem.reserved</td><td>int</td>
<td>A number, in bytes, that represents an offset. The total
physical memory (RAM) on the machine, minus this offset, is the
recommended RAM node-limit. The RAM node-limit is a hint to a
Scheduler to schedule tasks such that the sum
total of their RAM requirements does not exceed this limit.
RAM usage is not monitored by a TT.
</td></tr>
</table>
<p>A TT reports the following memory-related numbers in every
heartbeat:</p>
<ul>
<li>The total VMEM available on the node.</li>
<li>The value of <code>mapred.tasktracker.vmem.reserved</code>,
if set.</li>
<li>The total RAM available on the node.</li>
<li>The value of <code>mapred.tasktracker.pmem.reserved</code>,
if set.</li>
</ul>
</section>
<section>
<title>Slaves</title>
<p>Typically you choose one machine in the cluster to act as the
<code>NameNode</code> and one machine to act as the
<code>JobTracker</code>, exclusively. The rest of the machines act as
both a <code>DataNode</code> and <code>TaskTracker</code> and are
referred to as <em>slaves</em>.</p>
<p>List all slave hostnames or IP addresses in your
<code>conf/slaves</code> file, one per line.</p>
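<p>For example, a <code>conf/slaves</code> file for a four-slave cluster might
simply read as follows; the hostnames are placeholders:</p>
<source>
slave01.example.com
slave02.example.com
slave03.example.com
slave04.example.com
</source>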
</section>
<section>
<title>Logging</title>
<p>Hadoop uses <a href="http://logging.apache.org/log4j/">Apache
log4j</a> via the <a href="http://commons.apache.org/logging/">Apache
Commons Logging</a> framework for logging. Edit the
<code>conf/log4j.properties</code> file to customize the Hadoop
daemons' logging configuration (log-formats and so on).</p>
<section>
<title>History Logging</title>
<p>The job history files are stored in a central location,
<code>hadoop.job.history.location</code>, which can also be on DFS,
and whose default value is <code>${HADOOP_LOG_DIR}/history</code>.
The history web UI is accessible from the JobTracker web UI.</p>
<p>The history files are also logged to a user-specified directory,
<code>hadoop.job.history.user.location</code>,
which defaults to the job output directory. The files are stored in
"_logs/history/" in the specified directory. Hence, by default
they will be in "mapred.output.dir/_logs/history/". Users can stop
this logging by giving the value <code>none</code> for
<code>hadoop.job.history.user.location</code>.</p>
<p>Users can view a summary of the history logs in a specified directory
using the following command:<br/>
<code>$ bin/hadoop job -history output-dir</code><br/>
This command will print job details, and failed and killed tip
details.<br/>
More details about the job, such as successful tasks and
task attempts made for each task, can be viewed using the
following command:<br/>
<code>$ bin/hadoop job -history all output-dir</code><br/></p>
</section>
</section>
</section>
<p>Once all the necessary configuration is complete, distribute the files
to the <code>HADOOP_CONF_DIR</code> directory on all the machines,
typically <code>${HADOOP_HOME}/conf</code>.</p>
</section>
<section>
<title>Cluster Restartability</title>
<section>
<title>Map/Reduce</title>
<p>The JobTracker can recover running jobs after a restart if
<code>mapred.jobtracker.restart.recover</code> is set to true and
<a href="#Logging">JobHistory logging</a> is enabled. Also,
<code>mapred.jobtracker.job.history.block.size</code> should be
set to an optimal value to dump job history to disk as soon as
possible; a typical value is 3145728 (3 MB).</p>
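<p>A sketch of the corresponding <code>mapred-site.xml</code> entries, using the
typical block size mentioned above:</p>
<source>
&lt;property&gt;
  &lt;name&gt;mapred.jobtracker.restart.recover&lt;/name&gt;
  &lt;value&gt;true&lt;/value&gt;
&lt;/property&gt;
&lt;property&gt;
  &lt;name&gt;mapred.jobtracker.job.history.block.size&lt;/name&gt;
  &lt;value&gt;3145728&lt;/value&gt;
&lt;/property&gt;
</source>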
</section>
</section>
<section>
<title>Hadoop Rack Awareness</title>
<p>The HDFS and the Map/Reduce components are rack-aware.</p>
<p>The <code>NameNode</code> and the <code>JobTracker</code> obtain the
<code>rack id</code> of the slaves in the cluster by invoking an API
<a href="ext:api/org/apache/hadoop/net/dnstoswitchmapping/resolve">resolve</a>
in an administrator-configured
module. The API resolves the slave's DNS name (or IP address) to a
rack id. The module to use can be configured via the configuration
item <code>topology.node.switch.mapping.impl</code>. The default
implementation runs a script/command configured using
<code>topology.script.file.name</code>. If <code>topology.script.file.name</code> is
not set, the rack id <code>/default-rack</code> is returned for any
passed IP address. The additional configuration in the Map/Reduce
part is <code>mapred.cache.task.levels</code>, which determines the number
of levels (in the network topology) of caches. So, for example, if it is
the default value of 2, two levels of caches will be constructed -
one for hosts (host -> task mapping) and another for racks
(rack -> task mapping).
</p>
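<p>As a sketch, the default implementation could be pointed at a small script
that maps addresses to racks; the script path, the address ranges and the rack
names below are illustrative assumptions:</p>
<source>
&lt;property&gt;
  &lt;name&gt;topology.script.file.name&lt;/name&gt;
  &lt;value&gt;/etc/hadoop/topology.sh&lt;/value&gt;
&lt;/property&gt;
</source>
<p>where <code>/etc/hadoop/topology.sh</code> prints one rack id per argument,
for example:</p>
<source>
#!/bin/bash
# Prints a rack id for each host/IP passed as an argument,
# falling back to /default-rack for unknown addresses.
for host in "$@"; do
  case "$host" in
    10.1.1.*) echo /rack1 ;;
    10.1.2.*) echo /rack2 ;;
    *)        echo /default-rack ;;
  esac
done
</source>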
</section>
<section>
<title>Hadoop Startup</title>
<p>To start a Hadoop cluster you will need to start both the HDFS and
the Map/Reduce cluster.</p>
<p>
Format a new distributed filesystem:<br/>
<code>$ bin/hadoop namenode -format</code>
</p>
<p>
Start HDFS with the following command, run on the designated
<code>NameNode</code>:<br/>
<code>$ bin/start-dfs.sh</code>
</p>
<p>The <code>bin/start-dfs.sh</code> script also consults the
<code>${HADOOP_CONF_DIR}/slaves</code> file on the <code>NameNode</code>
and starts the <code>DataNode</code> daemon on all the listed slaves.</p>
<p>
Start Map/Reduce with the following command, run on the designated
<code>JobTracker</code>:<br/>
<code>$ bin/start-mapred.sh</code>
</p>
<p>The <code>bin/start-mapred.sh</code> script also consults the
<code>${HADOOP_CONF_DIR}/slaves</code> file on the <code>JobTracker</code>
and starts the <code>TaskTracker</code> daemon on all the listed slaves.
</p>
</section>
<section>
<title>Hadoop Shutdown</title>
<p>
Stop HDFS with the following command, run on the designated
<code>NameNode</code>:<br/>
<code>$ bin/stop-dfs.sh</code>
</p>
<p>The <code>bin/stop-dfs.sh</code> script also consults the
<code>${HADOOP_CONF_DIR}/slaves</code> file on the <code>NameNode</code>
and stops the <code>DataNode</code> daemon on all the listed slaves.</p>
<p>
Stop Map/Reduce with the following command, run on the designated
<code>JobTracker</code>:<br/>
<code>$ bin/stop-mapred.sh</code>
</p>
<p>The <code>bin/stop-mapred.sh</code> script also consults the
<code>${HADOOP_CONF_DIR}/slaves</code> file on the <code>JobTracker</code>
and stops the <code>TaskTracker</code> daemon on all the listed slaves.</p>
</section>
</body>
</document>