
~~ Licensed under the Apache License, Version 2.0 (the "License");
~~ you may not use this file except in compliance with the License.
~~ You may obtain a copy of the License at
~~
~~   http://www.apache.org/licenses/LICENSE-2.0
~~
~~ Unless required by applicable law or agreed to in writing, software
~~ distributed under the License is distributed on an "AS IS" BASIS,
~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~~ See the License for the specific language governing permissions and
~~ limitations under the License. See accompanying LICENSE file.

  ---
  HDFS Users Guide
  ---
  ---
  ${maven.build.timestamp}

HDFS Users Guide

%{toc|section=1|fromDepth=0}

* Purpose

  This document is a starting point for users working with Hadoop
  Distributed File System (HDFS) either as a part of a Hadoop cluster or
  as a stand-alone general purpose distributed file system. While HDFS is
  designed to "just work" in many environments, a working knowledge of
  HDFS helps greatly with configuration improvements and diagnostics on a
  specific cluster.

* Overview

  HDFS is the primary distributed storage used by Hadoop applications. An
  HDFS cluster primarily consists of a NameNode that manages the file
  system metadata and DataNodes that store the actual data. The HDFS
  Architecture Guide describes HDFS in detail. This user guide primarily
  deals with the interaction of users and administrators with HDFS
  clusters. The HDFS architecture diagram depicts basic interactions
  among the NameNode, the DataNodes, and the clients. Clients contact the
  NameNode for file metadata or file modifications and perform actual
  file I/O directly with the DataNodes.

  The following are some of the salient features that could be of
  interest to many users.

  * Hadoop, including HDFS, is well suited for distributed storage and
    distributed processing using commodity hardware. It is fault
    tolerant, scalable, and extremely simple to expand. MapReduce, well
    known for its simplicity and applicability to a large set of
    distributed applications, is an integral part of Hadoop.

  * HDFS is highly configurable with a default configuration well
    suited for many installations. Most of the time, configuration
    needs to be tuned only for very large clusters.

  * Hadoop is written in Java and is supported on all major platforms.

  * Hadoop supports shell-like commands to interact with HDFS directly.

  * The NameNode and DataNodes have built-in web servers that make it
    easy to check the current status of the cluster.

  * New features and improvements are regularly implemented in HDFS.
    The following is a subset of useful features in HDFS:

    * File permissions and authentication.

    * Rack awareness: to take a node's physical location into
      account while scheduling tasks and allocating storage.

    * Safemode: an administrative mode for maintenance.

    * <<<fsck>>>: a utility to diagnose health of the file system, to find
      missing files or blocks.

    * <<<fetchdt>>>: a utility to fetch DelegationToken and store it in a
      file on the local system.

    * Rebalancer: tool to balance the cluster when the data is
      unevenly distributed among DataNodes.

    * Upgrade and rollback: after a software upgrade, it is possible
      to roll back to HDFS' state before the upgrade in case of
      unexpected problems.

    * Secondary NameNode: performs periodic checkpoints of the
      namespace and helps keep the size of the file containing the log
      of HDFS modifications within certain limits at the NameNode.

    * Checkpoint node: performs periodic checkpoints of the
      namespace and helps minimize the size of the log stored at the
      NameNode containing changes to the HDFS. Replaces the role
      previously filled by the Secondary NameNode, though it is not yet
      battle hardened. The NameNode allows multiple Checkpoint nodes
      simultaneously, as long as there are no Backup nodes
      registered with the system.

    * Backup node: An extension to the Checkpoint node. In addition
      to checkpointing it also receives a stream of edits from the
      NameNode and maintains its own in-memory copy of the
      namespace, which is always in sync with the active NameNode
      namespace state. Only one Backup node may be registered with
      the NameNode at once.

* Prerequisites

  The following documents describe how to install and set up a Hadoop
  cluster:

  * {{{../hadoop-common/SingleCluster.html}Single Node Setup}}
    for first-time users.

  * {{{../hadoop-common/ClusterSetup.html}Cluster Setup}}
    for large, distributed clusters.

  The rest of this document assumes the user is able to set up and run
  HDFS with at least one DataNode. For the purpose of this document, both
  the NameNode and DataNode could be running on the same physical
  machine.

* Web Interface

  The NameNode and DataNode each run an internal web server in order to
  display basic information about the current status of the cluster. With
  the default configuration, the NameNode front page is at
  <<<http://namenode-name:50070/>>>. It lists the DataNodes in the cluster and
  basic statistics of the cluster. The web interface can also be used to
  browse the file system (using the "Browse the file system" link on the
  NameNode front page).

* Shell Commands

  Hadoop includes various shell-like commands that directly interact with
  HDFS and other file systems that Hadoop supports. The command <<<bin/hdfs dfs -help>>>
  lists the commands supported by the Hadoop shell. Furthermore,
  the command <<<bin/hdfs dfs -help command-name>>> displays more detailed help
  for a command. These commands support most of the normal file system
  operations like copying files, changing file permissions, etc. They also
  support a few HDFS-specific operations like changing the replication of
  files. For more information see the
  {{{../hadoop-common/FileSystemShell.html}File System Shell Guide}}.
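
  For example, a quick session with the shell might look like the
  following. The directory, file name, and replication factor below are
  only illustrative.

+---+
$ bin/hdfs dfs -mkdir -p /user/hadoop/input                   # create a directory
$ bin/hdfs dfs -put localfile.txt /user/hadoop/input          # copy a local file into HDFS
$ bin/hdfs dfs -ls /user/hadoop/input                         # list the directory
$ bin/hdfs dfs -setrep -w 3 /user/hadoop/input/localfile.txt  # change the replication factor
$ bin/hdfs dfs -cat /user/hadoop/input/localfile.txt          # print the file contents
+---+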

** DFSAdmin Command

  The <<<bin/hadoop dfsadmin>>> command supports a few HDFS administration
  related operations. The <<<bin/hadoop dfsadmin -help>>> command lists all the
  commands currently supported. For example:

  * <<<-report>>>: reports basic statistics of HDFS. Some of this
    information is also available on the NameNode front page.

  * <<<-safemode>>>: though usually not required, an administrator can
    manually enter or leave Safemode.

  * <<<-finalizeUpgrade>>>: removes the previous backup of the cluster made
    during the last upgrade.

  * <<<-refreshNodes>>>: updates the namenode with the set of datanodes
    allowed to connect to the namenode. Namenodes re-read datanode
    hostnames in the files defined by <<<dfs.hosts>>> and <<<dfs.hosts.exclude>>>.
    Hosts defined in <<<dfs.hosts>>> are the datanodes that are part of the
    cluster. If there are entries in <<<dfs.hosts>>>, only the hosts in it
    are allowed to register with the namenode. Entries in
    <<<dfs.hosts.exclude>>> are datanodes that need to be decommissioned.
    Datanodes complete decommissioning when all the replicas from them
    are replicated to other datanodes. Decommissioned nodes are not
    automatically shut down and are not chosen as targets for new
    replicas.

  * <<<-printTopology>>>: prints the topology of the cluster. Displays a tree
    of racks and the datanodes attached to the racks as viewed by the
    NameNode.
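
  For illustration, a typical administrative session might look like:

+---+
$ bin/hadoop dfsadmin -report            # summarize capacity, usage, and DataNode status
$ bin/hadoop dfsadmin -safemode get      # check whether Safemode is on
$ bin/hadoop dfsadmin -refreshNodes      # re-read dfs.hosts / dfs.hosts.exclude
$ bin/hadoop dfsadmin -printTopology     # show the rack and datanode tree
+---+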

  For command usage, see {{{../hadoop-common/CommandsManual.html#dfsadmin}dfsadmin}}.

* Secondary NameNode

  The NameNode stores modifications to the file system as a log appended
  to a native file system file, edits. When a NameNode starts up, it
  reads HDFS state from an image file, fsimage, and then applies edits
  from the edits log file. It then writes new HDFS state to the fsimage
  and starts normal operation with an empty edits file. Since the NameNode
  merges fsimage and edits files only during start up, the edits log file
  could get very large over time on a busy cluster. Another side effect
  of a larger edits file is that the next restart of the NameNode takes
  longer.

  The secondary NameNode merges the fsimage and the edits log files
  periodically and keeps the edits log size within a limit. It is usually
  run on a different machine than the primary NameNode since its memory
  requirements are on the same order as the primary NameNode.

  The start of the checkpoint process on the secondary NameNode is
  controlled by two configuration parameters.

  * <<<dfs.namenode.checkpoint.period>>>, set to 1 hour by default, specifies
    the maximum delay between two consecutive checkpoints, and

  * <<<dfs.namenode.checkpoint.txns>>>, set to 1 million by default, defines the
    number of uncheckpointed transactions on the NameNode which will
    force an urgent checkpoint, even if the checkpoint period has not
    been reached.
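
  For illustration, these parameters could be set in <<<hdfs-site.xml>>> as
  follows; the values shown simply restate the defaults and are not a
  tuning recommendation.

+---+
<!-- hdfs-site.xml: illustrative checkpoint settings -->
<property>
  <name>dfs.namenode.checkpoint.period</name>
  <value>3600</value>    <!-- seconds between two consecutive checkpoints -->
</property>
<property>
  <name>dfs.namenode.checkpoint.txns</name>
  <value>1000000</value> <!-- uncheckpointed transactions that force a checkpoint -->
</property>
+---+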

  The secondary NameNode stores the latest checkpoint in a directory
  which is structured the same way as the primary NameNode's directory,
  so that the checkpointed image is always ready to be read by the
  primary NameNode if necessary.

  For command usage,
  see {{{../hadoop-common/CommandsManual.html#secondarynamenode}secondarynamenode}}.

* Checkpoint Node

  The NameNode persists its namespace using two files: fsimage, which is the
  latest checkpoint of the namespace, and edits, a journal (log) of
  changes to the namespace since the checkpoint. When a NameNode starts
  up, it merges the fsimage and edits journal to provide an up-to-date
  view of the file system metadata. The NameNode then overwrites fsimage
  with the new HDFS state and begins a new edits journal.

  The Checkpoint node periodically creates checkpoints of the namespace.
  It downloads fsimage and edits from the active NameNode, merges them
  locally, and uploads the new image back to the active NameNode. The
  Checkpoint node usually runs on a different machine than the NameNode
  since its memory requirements are on the same order as the NameNode.
  The Checkpoint node is started by <<<bin/hdfs namenode -checkpoint>>> on the
  node specified in the configuration file.

  The location of the Checkpoint (or Backup) node and its accompanying
  web interface are configured via the <<<dfs.namenode.backup.address>>> and
  <<<dfs.namenode.backup.http-address>>> configuration variables.
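
  For illustration, a Checkpoint node could be configured in
  <<<hdfs-site.xml>>> as follows; the host name is a placeholder and the
  ports shown are the defaults.

+---+
<!-- hdfs-site.xml on the Checkpoint (or Backup) node -->
<property>
  <name>dfs.namenode.backup.address</name>
  <value>checkpoint.example.com:50100</value>       <!-- placeholder host -->
</property>
<property>
  <name>dfs.namenode.backup.http-address</name>
  <value>checkpoint.example.com:50105</value>       <!-- placeholder host -->
</property>
+---+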

  The start of the checkpoint process on the Checkpoint node is
  controlled by two configuration parameters.

  * <<<dfs.namenode.checkpoint.period>>>, set to 1 hour by default, specifies
    the maximum delay between two consecutive checkpoints

  * <<<dfs.namenode.checkpoint.txns>>>, set to 1 million by default, defines the
    number of uncheckpointed transactions on the NameNode which will
    force an urgent checkpoint, even if the checkpoint period has not
    been reached.

  The Checkpoint node stores the latest checkpoint in a directory that is
  structured the same as the NameNode's directory. This allows the
  checkpointed image to be always available for reading by the NameNode
  if necessary. See Import Checkpoint.

  Multiple checkpoint nodes may be specified in the cluster configuration
  file.

  For command usage, see {{{../hadoop-common/CommandsManual.html#namenode}namenode}}.

* Backup Node

  The Backup node provides the same checkpointing functionality as the
  Checkpoint node, as well as maintaining an in-memory, up-to-date copy
  of the file system namespace that is always synchronized with the
  active NameNode state. Along with accepting a journal stream of file
  system edits from the NameNode and persisting this to disk, the Backup
  node also applies those edits into its own copy of the namespace in
  memory, thus creating a backup of the namespace.

  The Backup node does not need to download fsimage and edits files from
  the active NameNode in order to create a checkpoint, as would be
  required with a Checkpoint node or Secondary NameNode, since it already
  has an up-to-date state of the namespace in memory. The Backup
  node checkpoint process is more efficient as it only needs to save the
  namespace into the local fsimage file and reset edits.

  As the Backup node maintains a copy of the namespace in memory, its RAM
  requirements are the same as those of the NameNode.

  The NameNode supports one Backup node at a time. No Checkpoint nodes
  may be registered if a Backup node is in use. Using multiple Backup
  nodes concurrently will be supported in the future.

  The Backup node is configured in the same manner as the Checkpoint
  node. It is started with <<<bin/hdfs namenode -backup>>>.

  The location of the Backup (or Checkpoint) node and its accompanying
  web interface are configured via the <<<dfs.namenode.backup.address>>> and
  <<<dfs.namenode.backup.http-address>>> configuration variables.

  Use of a Backup node provides the option of running the NameNode with
  no persistent storage, delegating all responsibility for persisting the
  state of the namespace to the Backup node. To do this, start the
  NameNode with the <<<-importCheckpoint>>> option, along with specifying no
  persistent storage directories of type edits <<<dfs.namenode.edits.dir>>> for
  the NameNode configuration.

  For a complete discussion of the motivation behind the creation of the
  Backup node and Checkpoint node, see {{{https://issues.apache.org/jira/browse/HADOOP-4539}HADOOP-4539}}.

  For command usage, see {{{../hadoop-common/CommandsManual.html#namenode}namenode}}.

* Import Checkpoint

  The latest checkpoint can be imported to the NameNode if all other
  copies of the image and the edits files are lost. In order to do that
  one should:

  * Create an empty directory specified in the <<<dfs.namenode.name.dir>>>
    configuration variable;

  * Specify the location of the checkpoint directory in the
    configuration variable <<<dfs.namenode.checkpoint.dir>>>;

  * and start the NameNode with the <<<-importCheckpoint>>> option.

  The NameNode will upload the checkpoint from the
  <<<dfs.namenode.checkpoint.dir>>> directory and then save it to the NameNode
  directory (or directories) set in <<<dfs.namenode.name.dir>>>. The NameNode will fail if a
  legal image is contained in <<<dfs.namenode.name.dir>>>. The NameNode
  verifies that the image in <<<dfs.namenode.checkpoint.dir>>> is consistent,
  but does not modify it in any way.
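
  As an illustrative sketch, assuming the surviving checkpoint was copied
  to <<</data/checkpoint>>> and the new, empty name directory is
  <<</data/name>>> (both paths are placeholders):

+---+
<!-- hdfs-site.xml (paths are placeholders) -->
<property>
  <name>dfs.namenode.name.dir</name>
  <value>/data/name</value>          <!-- must exist and be empty -->
</property>
<property>
  <name>dfs.namenode.checkpoint.dir</name>
  <value>/data/checkpoint</value>    <!-- contains the checkpoint to import -->
</property>
+---+

  Then start the NameNode:

+---+
$ bin/hdfs namenode -importCheckpoint
+---+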

  For command usage, see {{{../hadoop-common/CommandsManual.html#namenode}namenode}}.

* Rebalancer

  HDFS data might not always be placed uniformly across the DataNodes.
  One common reason is the addition of new DataNodes to an existing cluster.
  While placing new blocks (data for a file is stored as a series of
  blocks), the NameNode considers various parameters before choosing the
  DataNodes to receive these blocks. Some of the considerations are:

  * Policy to keep one of the replicas of a block on the same node as
    the node that is writing the block.

  * Need to spread different replicas of a block across the racks so
    that the cluster can survive the loss of a whole rack.

  * One of the replicas is usually placed on the same rack as the node
    writing to the file so that cross-rack network I/O is reduced.

  * Spread HDFS data uniformly across the DataNodes in the cluster.

  Due to multiple competing considerations, data might not be uniformly
  placed across the DataNodes. HDFS provides a tool for administrators
  that analyzes block placement and rebalances data across the DataNodes.
  A brief administrator's guide for the rebalancer is attached as a PDF to
  {{{https://issues.apache.org/jira/browse/HADOOP-1652}HADOOP-1652}}.
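
  For illustration, an administrator would typically run the rebalancer
  as follows; the threshold value is only an example.

+---+
$ bin/hdfs balancer -threshold 10   # consider the cluster balanced when each DataNode's
                                    # utilization is within 10% of the cluster average
+---+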

  For command usage, see {{{../hadoop-common/CommandsManual.html#balancer}balancer}}.

* Rack Awareness

  Typically large Hadoop clusters are arranged in racks, and network
  traffic between different nodes within the same rack is much more
  desirable than network traffic across the racks. In addition, the NameNode
  tries to place replicas of a block on multiple racks for improved fault
  tolerance. Hadoop lets the cluster administrators decide which rack a
  node belongs to through the configuration variable
  <<<net.topology.script.file.name>>>. When this script is configured, each
  node runs the script to determine its rack id. A default installation
  assumes all the nodes belong to the same rack. This feature and
  configuration is further described in the PDF attached to
  {{{https://issues.apache.org/jira/browse/HADOOP-692}HADOOP-692}}.
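
  As an illustrative sketch only (the script, its path, and the lookup
  file <<</etc/hadoop/topology.data>>> are hypothetical), a topology script
  receives one or more host names or IP addresses as arguments and prints
  one rack path per argument:

+---+
#!/bin/bash
# Hypothetical topology script: map each argument (host name or IP address)
# to a rack path, printing one line of output per argument.
while [ $# -gt 0 ] ; do
  rack=$(awk -v host="$1" '$1 == host {print $2}' /etc/hadoop/topology.data)
  echo "${rack:-/default-rack}"   # fall back to a default rack if the host is unknown
  shift
done
+---+

  The script would then be referenced from the Hadoop configuration
  (commonly <<<core-site.xml>>>); the path below is a placeholder:

+---+
<property>
  <name>net.topology.script.file.name</name>
  <value>/etc/hadoop/topology.sh</value>
</property>
+---+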

* Safemode

  During start up the NameNode loads the file system state from the
  fsimage and the edits log file. It then waits for DataNodes to report
  their blocks so that it does not prematurely start replicating the
  blocks even though enough replicas already exist in the cluster. During this
  time the NameNode stays in Safemode. Safemode for the NameNode is
  essentially a read-only mode for the HDFS cluster, where it does not
  allow any modifications to the file system or blocks. Normally the NameNode
  leaves Safemode automatically after the DataNodes have reported that
  most file system blocks are available. If required, HDFS can be
  placed in Safemode explicitly using the <<<bin/hadoop dfsadmin -safemode>>>
  command. The NameNode front page shows whether Safemode is on or off. A
  more detailed description and configuration is maintained as JavaDoc
  for <<<setSafeMode()>>>.
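
  For example, an administrator can query or toggle Safemode as follows:

+---+
$ bin/hadoop dfsadmin -safemode get     # report whether Safemode is on
$ bin/hadoop dfsadmin -safemode enter   # manually enter Safemode
$ bin/hadoop dfsadmin -safemode leave   # manually leave Safemode
+---+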

* fsck

  HDFS supports the <<<fsck>>> command to check for various inconsistencies.
  It is designed for reporting problems with various files, for example,
  missing blocks for a file or under-replicated blocks. Unlike a
  traditional fsck utility for native file systems, this command does not
  correct the errors it detects. Normally the NameNode automatically corrects
  most of the recoverable failures. By default fsck ignores open files
  but provides an option to select all files during reporting. The HDFS
  fsck command is not a Hadoop shell command. It can be run as
  <<<bin/hadoop fsck>>>. For command usage, see
  {{{../hadoop-common/CommandsManual.html#fsck}fsck}}. fsck can be run on
  the whole file system or on a subset of files.
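
  A few illustrative invocations (the path <<</user/hadoop>>> is a
  placeholder):

+---+
$ bin/hadoop fsck /                              # check the whole file system
$ bin/hadoop fsck /user/hadoop -files -blocks    # report per-file and per-block details
$ bin/hadoop fsck / -openforwrite                # include files currently open for write
+---+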

* fetchdt

  HDFS supports the <<<fetchdt>>> command to fetch a Delegation Token and store
  it in a file on the local system. This token can later be used to
  access a secure server (the NameNode, for example) from a non-secure client.
  The utility uses either RPC or HTTPS (over Kerberos) to get the token, and
  thus requires Kerberos tickets to be present before the run (run kinit
  to get the tickets). The HDFS fetchdt command is not a Hadoop shell
  command. It can be run as <<<bin/hadoop fetchdt DTfile>>>. After you get
  the token you can run an HDFS command without having Kerberos tickets,
  by pointing the <<<HADOOP_TOKEN_FILE_LOCATION>>> environment variable to the
  delegation token file. For command usage, see the
  {{{../hadoop-common/CommandsManual.html#fetchdt}fetchdt}} command.
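
  An illustrative sequence (the token file path is a placeholder):

+---+
$ kinit                                         # obtain Kerberos tickets first
$ bin/hadoop fetchdt /tmp/my.delegation.token   # fetch a token into a local file
$ HADOOP_TOKEN_FILE_LOCATION=/tmp/my.delegation.token bin/hdfs dfs -ls /
+---+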

* Recovery Mode

  Typically, you will configure multiple metadata storage locations.
  Then, if one storage location is corrupt, you can read the metadata
  from one of the other storage locations.

  However, what can you do if the only storage locations available are
  corrupt? In this case, there is a special NameNode startup mode called
  Recovery mode that may allow you to recover most of your data.

  You can start the NameNode in recovery mode like so: <<<namenode -recover>>>

  When in recovery mode, the NameNode will interactively prompt you at
  the command line about possible courses of action you can take to
  recover your data.

  If you don't want to be prompted, you can give the <<<-force>>> option. This
  option will force recovery mode to always select the first choice.
  Normally, this will be the most reasonable choice.

  Because Recovery mode can cause you to lose data, you should always
  back up your edit log and fsimage before using it.
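
  An illustrative sketch, assuming the NameNode is stopped and its
  metadata lives under <<</data/name>>> (a placeholder path):

+---+
$ cp -r /data/name /data/name.backup   # back up fsimage and edits before recovery
$ bin/hdfs namenode -recover           # interactive recovery
$ bin/hdfs namenode -recover -force    # non-interactive: always take the first choice
+---+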

* Upgrade and Rollback

  When Hadoop is upgraded on an existing cluster, as with any software
  upgrade, it is possible there are new bugs or incompatible changes that
  affect existing applications and were not discovered earlier. In any
  non-trivial HDFS installation, it is not an option to lose any data,
  let alone to restart HDFS from scratch. HDFS allows administrators to
  go back to the earlier version of Hadoop and roll back the cluster to the
  state it was in before the upgrade. HDFS upgrade is described in more
  detail in the {{{http://wiki.apache.org/hadoop/Hadoop_Upgrade}Hadoop Upgrade}}
  Wiki page. HDFS can have one such backup at a time. Before upgrading,
  administrators need to remove the existing backup using the
  <<<bin/hadoop dfsadmin -finalizeUpgrade>>> command. The following briefly
  describes the typical upgrade procedure:

  * Before upgrading Hadoop software, finalize if there is an existing
    backup. <<<dfsadmin -upgradeProgress status>>> can tell if the cluster
    needs to be finalized.

  * Stop the cluster and distribute the new version of Hadoop.

  * Run the new version with the <<<-upgrade>>> option (<<<bin/start-dfs.sh -upgrade>>>).

  * Most of the time, the cluster works just fine. Once the new HDFS is
    considered to be working well (maybe after a few days of operation),
    finalize the upgrade. Note that until the cluster is finalized,
    deleting the files that existed before the upgrade does not free up
    real disk space on the DataNodes.

  * If there is a need to move back to the old version,

    * stop the cluster and distribute the earlier version of Hadoop.

    * start the cluster with the rollback option (<<<bin/start-dfs.sh -rollback>>>).
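
  The procedure above, sketched as commands (script locations can vary
  between releases, and the finalize and rollback steps are alternatives,
  not a sequence):

+---+
$ bin/stop-dfs.sh                        # stop the old cluster, then install the new binaries
$ bin/start-dfs.sh -upgrade              # start the new version with an upgrade
$ bin/hadoop dfsadmin -finalizeUpgrade   # once satisfied, discard the pre-upgrade backup
$ bin/start-dfs.sh -rollback             # or, after reinstalling the old version, roll back
+---+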

  When upgrading to a new version of HDFS, it is necessary to rename or
  delete any paths that are reserved in the new version of HDFS. If the
  NameNode encounters a reserved path during upgrade, it will print an
  error like the following:

  <<< /.reserved is a reserved path and .snapshot is a
  reserved path component in this version of HDFS. Please rollback and delete
  or rename this path, or upgrade with the -renameReserved [key-value pairs]
  option to automatically rename these paths during upgrade.>>>

  Specifying <<<-upgrade -renameReserved [optional key-value pairs]>>> causes
  the NameNode to automatically rename any reserved paths found during
  startup. For example, to rename all paths named <<<.snapshot>>> to
  <<<.my-snapshot>>> and <<<.reserved>>> to <<<.my-reserved>>>, a user would
  specify <<<-upgrade -renameReserved
  .snapshot=.my-snapshot,.reserved=.my-reserved>>>.

  If no key-value pairs are specified with <<<-renameReserved>>>, the
  NameNode will then suffix reserved paths with
  <<<.<LAYOUT-VERSION>.UPGRADE_RENAMED>>>, e.g.
  <<<.snapshot.-51.UPGRADE_RENAMED>>>.

  There are some caveats to this renaming process. It's recommended,
  if possible, to first run <<<hdfs dfsadmin -saveNamespace>>> before upgrading.
  This is because data inconsistency can result if an edit log operation
  refers to the destination of an automatically renamed file.

* File Permissions and Security

  The file permissions are designed to be similar to file permissions on
  other familiar platforms like Linux. Currently, security is limited to
  simple file permissions. The user that starts the NameNode is treated as
  the superuser for HDFS. Future versions of HDFS will support network
  authentication protocols like Kerberos for user authentication and
  encryption of data transfers. The details are discussed in the
  Permissions Guide.

* Scalability

  Hadoop currently runs on clusters with thousands of nodes. The
  {{{http://wiki.apache.org/hadoop/PoweredBy}PoweredBy}} Wiki page lists
  some of the organizations that deploy Hadoop on large clusters.

  HDFS has one NameNode for each cluster. Currently the total memory
  available on the NameNode is the primary scalability limitation.

  On very large clusters, increasing the average size of files stored in
  HDFS helps with increasing cluster size without increasing memory
  requirements on the NameNode. The default configuration may not suit
  very large clusters. The {{{http://wiki.apache.org/hadoop/FAQ}FAQ}}
  Wiki page lists suggested configuration improvements for large Hadoop clusters.

* Related Documentation

  This user guide is a good starting point for working with HDFS. While
  the user guide continues to improve, there is a large wealth of
  documentation about Hadoop and HDFS. The following list is a starting
  point for further exploration:

  * {{{http://hadoop.apache.org}Hadoop Site}}: The home page for
    the Apache Hadoop site.

  * {{{http://wiki.apache.org/hadoop/FrontPage}Hadoop Wiki}}:
    The home page (FrontPage) for the Hadoop Wiki. Unlike
    the released documentation, which is part of the Hadoop source tree,
    the Hadoop Wiki is regularly edited by the Hadoop community.

  * {{{http://wiki.apache.org/hadoop/FAQ}FAQ}}: The FAQ Wiki page.

  * {{{../../api/index.html}Hadoop JavaDoc API}}.

  * Hadoop User Mailing List: user[at]hadoop.apache.org.

  * Explore {{{./hdfs-default.xml}hdfs-default.xml}}. It includes a
    brief description of most of the configuration variables available.

  * {{{../hadoop-common/CommandsManual.html}Hadoop Commands Guide}}:
    Hadoop commands usage.