123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384385386387388389390391392393394395396397398399400401402403404405406407408409410411412413414415416417418419420421422423424425426427428429430431432433434435436437438439440441442443444445446447448449450451452453454455456457458459460461462463464465466467468469470471472473474475476477478479480481482483484485486487488489490491492493494495496497498499500501502503504505506507508509510511512513514515516517518519520521522523524525526527528529530531532533534535536537 |
- ~~ Licensed under the Apache License, Version 2.0 (the "License");
- ~~ you may not use this file except in compliance with the License.
- ~~ You may obtain a copy of the License at
- ~~
- ~~ http://www.apache.org/licenses/LICENSE-2.0
- ~~
- ~~ Unless required by applicable law or agreed to in writing, software
- ~~ distributed under the License is distributed on an "AS IS" BASIS,
- ~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- ~~ See the License for the specific language governing permissions and
- ~~ limitations under the License. See accompanying LICENSE file.
- ---
- HDFS Users Guide
- ---
- ---
- ${maven.build.timestamp}
- HDFS Users Guide
- %{toc|section=1|fromDepth=0}
- * Purpose
- This document is a starting point for users working with Hadoop
- Distributed File System (HDFS) either as a part of a Hadoop cluster or
- as a stand-alone general purpose distributed file system. While HDFS is
- designed to "just work" in many environments, a working knowledge of
- HDFS helps greatly with configuration improvements and diagnostics on a
- specific cluster.
- * Overview
- HDFS is the primary distributed storage used by Hadoop applications. A
- HDFS cluster primarily consists of a NameNode that manages the file
- system metadata and DataNodes that store the actual data. The HDFS
- Architecture Guide describes HDFS in detail. This user guide primarily
- deals with the interaction of users and administrators with HDFS
- clusters. The HDFS architecture diagram depicts basic interactions
- among NameNode, the DataNodes, and the clients. Clients contact
- NameNode for file metadata or file modifications and perform actual
- file I/O directly with the DataNodes.
- The following are some of the salient features that could be of
- interest to many users.
- * Hadoop, including HDFS, is well suited for distributed storage and
- distributed processing using commodity hardware. It is fault
- tolerant, scalable, and extremely simple to expand. MapReduce, well
- known for its simplicity and applicability for large set of
- distributed applications, is an integral part of Hadoop.
- * HDFS is highly configurable with a default configuration well
- suited for many installations. Most of the time, configuration
- needs to be tuned only for very large clusters.
- * Hadoop is written in Java and is supported on all major platforms.
- * Hadoop supports shell-like commands to interact with HDFS directly.
- * The NameNode and Datanodes have built in web servers that makes it
- easy to check current status of the cluster.
- * New features and improvements are regularly implemented in HDFS.
- The following is a subset of useful features in HDFS:
- * File permissions and authentication.
- * Rack awareness: to take a node's physical location into
- account while scheduling tasks and allocating storage.
- * Safemode: an administrative mode for maintenance.
- * <<<fsck>>>: a utility to diagnose health of the file system, to find
- missing files or blocks.
- * <<<fetchdt>>>: a utility to fetch DelegationToken and store it in a
- file on the local system.
- * Rebalancer: tool to balance the cluster when the data is
- unevenly distributed among DataNodes.
- * Upgrade and rollback: after a software upgrade, it is possible
- to rollback to HDFS' state before the upgrade in case of
- unexpected problems.
- * Secondary NameNode: performs periodic checkpoints of the
- namespace and helps keep the size of file containing log of
- HDFS modifications within certain limits at the NameNode.
- * Checkpoint node: performs periodic checkpoints of the
- namespace and helps minimize the size of the log stored at the
- NameNode containing changes to the HDFS. Replaces the role
- previously filled by the Secondary NameNode, though is not yet
- battle hardened. The NameNode allows multiple Checkpoint nodes
- simultaneously, as long as there are no Backup nodes
- registered with the system.
- * Backup node: An extension to the Checkpoint node. In addition
- to checkpointing it also receives a stream of edits from the
- NameNode and maintains its own in-memory copy of the
- namespace, which is always in sync with the active NameNode
- namespace state. Only one Backup node may be registered with
- the NameNode at once.
- * Prerequisites
- The following documents describe how to install and set up a Hadoop
- cluster:
- * {{{../hadoop-common/SingleCluster.html}Single Node Setup}}
- for first-time users.
- * {{{../hadoop-common/ClusterSetup.html}Cluster Setup}}
- for large, distributed clusters.
- The rest of this document assumes the user is able to set up and run a
- HDFS with at least one DataNode. For the purpose of this document, both
- the NameNode and DataNode could be running on the same physical
- machine.
- * Web Interface
- NameNode and DataNode each run an internal web server in order to
- display basic information about the current status of the cluster. With
- the default configuration, the NameNode front page is at
- <<<http://namenode-name:50070/>>>. It lists the DataNodes in the cluster and
- basic statistics of the cluster. The web interface can also be used to
- browse the file system (using "Browse the file system" link on the
- NameNode front page).
- * Shell Commands
- Hadoop includes various shell-like commands that directly interact with
- HDFS and other file systems that Hadoop supports. The command <<<bin/hdfs dfs -help>>>
- lists the commands supported by Hadoop shell. Furthermore,
- the command <<<bin/hdfs dfs -help command-name>>> displays more detailed help
- for a command. These commands support most of the normal files system
- operations like copying files, changing file permissions, etc. It also
- supports a few HDFS specific operations like changing replication of
- files. For more information see {{{../hadoop-common/FileSystemShell.html}
- File System Shell Guide}}.
- ** DFSAdmin Command
- The <<<bin/hadoop dfsadmin>>> command supports a few HDFS administration
- related operations. The <<<bin/hadoop dfsadmin -help>>> command lists all the
- commands currently supported. For e.g.:
- * <<<-report>>>: reports basic statistics of HDFS. Some of this
- information is also available on the NameNode front page.
- * <<<-safemode>>>: though usually not required, an administrator can
- manually enter or leave Safemode.
- * <<<-finalizeUpgrade>>>: removes previous backup of the cluster made
- during last upgrade.
- * <<<-refreshNodes>>>: Updates the namenode with the set of datanodes
- allowed to connect to the namenode. Namenodes re-read datanode
- hostnames in the file defined by <<<dfs.hosts>>>, <<<dfs.hosts.exclude>>>.
- Hosts defined in <<<dfs.hosts>>> are the datanodes that are part of the
- cluster. If there are entries in <<<dfs.hosts>>>, only the hosts in it
- are allowed to register with the namenode. Entries in
- <<<dfs.hosts.exclude>>> are datanodes that need to be decommissioned.
- Datanodes complete decommissioning when all the replicas from them
- are replicated to other datanodes. Decommissioned nodes are not
- automatically shutdown and are not chosen for writing for new
- replicas.
- * <<<-printTopology>>> : Print the topology of the cluster. Display a tree
- of racks and datanodes attached to the tracks as viewed by the
- NameNode.
- For command usage, see {{{../hadoop-common/CommandsManual.html#dfsadmin}dfsadmin}}.
- * Secondary NameNode
- The NameNode stores modifications to the file system as a log appended
- to a native file system file, edits. When a NameNode starts up, it
- reads HDFS state from an image file, fsimage, and then applies edits
- from the edits log file. It then writes new HDFS state to the fsimage
- and starts normal operation with an empty edits file. Since NameNode
- merges fsimage and edits files only during start up, the edits log file
- could get very large over time on a busy cluster. Another side effect
- of a larger edits file is that next restart of NameNode takes longer.
- The secondary NameNode merges the fsimage and the edits log files
- periodically and keeps edits log size within a limit. It is usually run
- on a different machine than the primary NameNode since its memory
- requirements are on the same order as the primary NameNode.
- The start of the checkpoint process on the secondary NameNode is
- controlled by two configuration parameters.
- * <<<dfs.namenode.checkpoint.period>>>, set to 1 hour by default, specifies
- the maximum delay between two consecutive checkpoints, and
- * <<<dfs.namenode.checkpoint.txns>>>, set to 1 million by default, defines the
- number of uncheckpointed transactions on the NameNode which will
- force an urgent checkpoint, even if the checkpoint period has not
- been reached.
- The secondary NameNode stores the latest checkpoint in a directory
- which is structured the same way as the primary NameNode's directory.
- So that the check pointed image is always ready to be read by the
- primary NameNode if necessary.
- For command usage,
- see {{{../hadoop-common/CommandsManual.html#secondarynamenode}secondarynamenode}}.
- * Checkpoint Node
- NameNode persists its namespace using two files: fsimage, which is the
- latest checkpoint of the namespace and edits, a journal (log) of
- changes to the namespace since the checkpoint. When a NameNode starts
- up, it merges the fsimage and edits journal to provide an up-to-date
- view of the file system metadata. The NameNode then overwrites fsimage
- with the new HDFS state and begins a new edits journal.
- The Checkpoint node periodically creates checkpoints of the namespace.
- It downloads fsimage and edits from the active NameNode, merges them
- locally, and uploads the new image back to the active NameNode. The
- Checkpoint node usually runs on a different machine than the NameNode
- since its memory requirements are on the same order as the NameNode.
- The Checkpoint node is started by bin/hdfs namenode -checkpoint on the
- node specified in the configuration file.
- The location of the Checkpoint (or Backup) node and its accompanying
- web interface are configured via the <<<dfs.namenode.backup.address>>> and
- <<<dfs.namenode.backup.http-address>>> configuration variables.
- The start of the checkpoint process on the Checkpoint node is
- controlled by two configuration parameters.
- * <<<dfs.namenode.checkpoint.period>>>, set to 1 hour by default, specifies
- the maximum delay between two consecutive checkpoints
- * <<<dfs.namenode.checkpoint.txns>>>, set to 1 million by default, defines the
- number of uncheckpointed transactions on the NameNode which will
- force an urgent checkpoint, even if the checkpoint period has not
- been reached.
- The Checkpoint node stores the latest checkpoint in a directory that is
- structured the same as the NameNode's directory. This allows the
- checkpointed image to be always available for reading by the NameNode
- if necessary. See Import checkpoint.
- Multiple checkpoint nodes may be specified in the cluster configuration
- file.
- For command usage, see {{{../hadoop-common/CommandsManual.html#namenode}namenode}}.
- * Backup Node
- The Backup node provides the same checkpointing functionality as the
- Checkpoint node, as well as maintaining an in-memory, up-to-date copy
- of the file system namespace that is always synchronized with the
- active NameNode state. Along with accepting a journal stream of file
- system edits from the NameNode and persisting this to disk, the Backup
- node also applies those edits into its own copy of the namespace in
- memory, thus creating a backup of the namespace.
- The Backup node does not need to download fsimage and edits files from
- the active NameNode in order to create a checkpoint, as would be
- required with a Checkpoint node or Secondary NameNode, since it already
- has an up-to-date state of the namespace state in memory. The Backup
- node checkpoint process is more efficient as it only needs to save the
- namespace into the local fsimage file and reset edits.
- As the Backup node maintains a copy of the namespace in memory, its RAM
- requirements are the same as the NameNode.
- The NameNode supports one Backup node at a time. No Checkpoint nodes
- may be registered if a Backup node is in use. Using multiple Backup
- nodes concurrently will be supported in the future.
- The Backup node is configured in the same manner as the Checkpoint
- node. It is started with <<<bin/hdfs namenode -backup>>>.
- The location of the Backup (or Checkpoint) node and its accompanying
- web interface are configured via the <<<dfs.namenode.backup.address>>> and
- <<<dfs.namenode.backup.http-address>>> configuration variables.
- Use of a Backup node provides the option of running the NameNode with
- no persistent storage, delegating all responsibility for persisting the
- state of the namespace to the Backup node. To do this, start the
- NameNode with the <<<-importCheckpoint>>> option, along with specifying no
- persistent storage directories of type edits <<<dfs.namenode.edits.dir>>> for
- the NameNode configuration.
- For a complete discussion of the motivation behind the creation of the
- Backup node and Checkpoint node, see {{{https://issues.apache.org/jira/browse/HADOOP-4539}HADOOP-4539}}.
- For command usage, see {{{../hadoop-common/CommandsManual.html#namenode}namenode}}.
- * Import Checkpoint
- The latest checkpoint can be imported to the NameNode if all other
- copies of the image and the edits files are lost. In order to do that
- one should:
- * Create an empty directory specified in the <<<dfs.namenode.name.dir>>>
- configuration variable;
- * Specify the location of the checkpoint directory in the
- configuration variable <<<dfs.namenode.checkpoint.dir>>>;
- * and start the NameNode with <<<-importCheckpoint>>> option.
- The NameNode will upload the checkpoint from the
- <<<dfs.namenode.checkpoint.dir>>> directory and then save it to the NameNode
- directory(s) set in <<<dfs.namenode.name.dir>>>. The NameNode will fail if a
- legal image is contained in <<<dfs.namenode.name.dir>>>. The NameNode
- verifies that the image in <<<dfs.namenode.checkpoint.dir>>> is consistent,
- but does not modify it in any way.
- For command usage, see {{{../hadoop-common/CommandsManual.html#namenode}namenode}}.
- * Rebalancer
- HDFS data might not always be be placed uniformly across the DataNode.
- One common reason is addition of new DataNodes to an existing cluster.
- While placing new blocks (data for a file is stored as a series of
- blocks), NameNode considers various parameters before choosing the
- DataNodes to receive these blocks. Some of the considerations are:
- * Policy to keep one of the replicas of a block on the same node as
- the node that is writing the block.
- * Need to spread different replicas of a block across the racks so
- that cluster can survive loss of whole rack.
- * One of the replicas is usually placed on the same rack as the node
- writing to the file so that cross-rack network I/O is reduced.
- * Spread HDFS data uniformly across the DataNodes in the cluster.
- Due to multiple competing considerations, data might not be uniformly
- placed across the DataNodes. HDFS provides a tool for administrators
- that analyzes block placement and rebalanaces data across the DataNode.
- A brief administrator's guide for rebalancer as a PDF is attached to
- {{{https://issues.apache.org/jira/browse/HADOOP-1652}HADOOP-1652}}.
- For command usage, see {{{../hadoop-common/CommandsManual.html#balancer}balancer}}.
- * Rack Awareness
- Typically large Hadoop clusters are arranged in racks and network
- traffic between different nodes with in the same rack is much more
- desirable than network traffic across the racks. In addition NameNode
- tries to place replicas of block on multiple racks for improved fault
- tolerance. Hadoop lets the cluster administrators decide which rack a
- node belongs to through configuration variable
- <<<net.topology.script.file.name>>>. When this script is configured, each
- node runs the script to determine its rack id. A default installation
- assumes all the nodes belong to the same rack. This feature and
- configuration is further described in PDF attached to
- {{{https://issues.apache.org/jira/browse/HADOOP-692}HADOOP-692}}.
- * Safemode
- During start up the NameNode loads the file system state from the
- fsimage and the edits log file. It then waits for DataNodes to report
- their blocks so that it does not prematurely start replicating the
- blocks though enough replicas already exist in the cluster. During this
- time NameNode stays in Safemode. Safemode for the NameNode is
- essentially a read-only mode for the HDFS cluster, where it does not
- allow any modifications to file system or blocks. Normally the NameNode
- leaves Safemode automatically after the DataNodes have reported that
- most file system blocks are available. If required, HDFS could be
- placed in Safemode explicitly using <<<bin/hadoop dfsadmin -safemode>>>
- command. NameNode front page shows whether Safemode is on or off. A
- more detailed description and configuration is maintained as JavaDoc
- for <<<setSafeMode()>>>.
- * fsck
- HDFS supports the fsck command to check for various inconsistencies. It
- it is designed for reporting problems with various files, for example,
- missing blocks for a file or under-replicated blocks. Unlike a
- traditional fsck utility for native file systems, this command does not
- correct the errors it detects. Normally NameNode automatically corrects
- most of the recoverable failures. By default fsck ignores open files
- but provides an option to select all files during reporting. The HDFS
- fsck command is not a Hadoop shell command. It can be run as
- <<<bin/hadoop fsck>>>. For command usage, see
- {{{../hadoop-common/CommandsManual.html#fsck}fsck}}. fsck can be run on
- the whole file system or on a subset of files.
- * fetchdt
- HDFS supports the fetchdt command to fetch Delegation Token and store
- it in a file on the local system. This token can be later used to
- access secure server (NameNode for example) from a non secure client.
- Utility uses either RPC or HTTPS (over Kerberos) to get the token, and
- thus requires kerberos tickets to be present before the run (run kinit
- to get the tickets). The HDFS fetchdt command is not a Hadoop shell
- command. It can be run as <<<bin/hadoop fetchdt DTfile>>>. After you got
- the token you can run an HDFS command without having Kerberos tickets,
- by pointing <<<HADOOP_TOKEN_FILE_LOCATION>>> environmental variable to the
- delegation token file. For command usage, see
- {{{../hadoop-common/CommandsManual.html#fetchdt}fetchdt}} command.
- * Recovery Mode
- Typically, you will configure multiple metadata storage locations.
- Then, if one storage location is corrupt, you can read the metadata
- from one of the other storage locations.
- However, what can you do if the only storage locations available are
- corrupt? In this case, there is a special NameNode startup mode called
- Recovery mode that may allow you to recover most of your data.
- You can start the NameNode in recovery mode like so: <<<namenode -recover>>>
- When in recovery mode, the NameNode will interactively prompt you at
- the command line about possible courses of action you can take to
- recover your data.
- If you don't want to be prompted, you can give the <<<-force>>> option. This
- option will force recovery mode to always select the first choice.
- Normally, this will be the most reasonable choice.
- Because Recovery mode can cause you to lose data, you should always
- back up your edit log and fsimage before using it.
- * Upgrade and Rollback
- When Hadoop is upgraded on an existing cluster, as with any software
- upgrade, it is possible there are new bugs or incompatible changes that
- affect existing applications and were not discovered earlier. In any
- non-trivial HDFS installation, it is not an option to loose any data,
- let alone to restart HDFS from scratch. HDFS allows administrators to
- go back to earlier version of Hadoop and rollback the cluster to the
- state it was in before the upgrade. HDFS upgrade is described in more
- detail in {{{http://wiki.apache.org/hadoop/Hadoop_Upgrade}Hadoop Upgrade}}
- Wiki page. HDFS can have one such backup at a time. Before upgrading,
- administrators need to remove existing backup using bin/hadoop dfsadmin
- <<<-finalizeUpgrade>>> command. The following briefly describes the
- typical upgrade procedure:
- * Before upgrading Hadoop software, finalize if there an existing
- backup. <<<dfsadmin -upgradeProgress>>> status can tell if the cluster
- needs to be finalized.
- * Stop the cluster and distribute new version of Hadoop.
- * Run the new version with <<<-upgrade>>> option (<<<bin/start-dfs.sh -upgrade>>>).
- * Most of the time, cluster works just fine. Once the new HDFS is
- considered working well (may be after a few days of operation),
- finalize the upgrade. Note that until the cluster is finalized,
- deleting the files that existed before the upgrade does not free up
- real disk space on the DataNodes.
- * If there is a need to move back to the old version,
- * stop the cluster and distribute earlier version of Hadoop.
- * start the cluster with rollback option. (<<<bin/start-dfs.sh -rollback>>>).
- When upgrading to a new version of HDFS, it is necessary to rename or
- delete any paths that are reserved in the new version of HDFS. If the
- NameNode encounters a reserved path during upgrade, it will print an
- error like the following:
- <<< /.reserved is a reserved path and .snapshot is a
- reserved path component in this version of HDFS. Please rollback and delete
- or rename this path, or upgrade with the -renameReserved [key-value pairs]
- option to automatically rename these paths during upgrade.>>>
- Specifying <<<-upgrade -renameReserved [optional key-value pairs]>>> causes
- the NameNode to automatically rename any reserved paths found during
- startup. For example, to rename all paths named <<<.snapshot>>> to
- <<<.my-snapshot>>> and <<<.reserved>>> to <<<.my-reserved>>>, a user would
- specify <<<-upgrade -renameReserved
- .snapshot=.my-snapshot,.reserved=.my-reserved>>>.
- If no key-value pairs are specified with <<<-renameReserved>>>, the
- NameNode will then suffix reserved paths with
- <<<.<LAYOUT-VERSION>.UPGRADE_RENAMED>>>, e.g.
- <<<.snapshot.-51.UPGRADE_RENAMED>>>.
- There are some caveats to this renaming process. It's recommended,
- if possible, to first <<<hdfs dfsadmin -saveNamespace>>> before upgrading.
- This is because data inconsistency can result if an edit log operation
- refers to the destination of an automatically renamed file.
- * File Permissions and Security
- The file permissions are designed to be similar to file permissions on
- other familiar platforms like Linux. Currently, security is limited to
- simple file permissions. The user that starts NameNode is treated as
- the superuser for HDFS. Future versions of HDFS will support network
- authentication protocols like Kerberos for user authentication and
- encryption of data transfers. The details are discussed in the
- Permissions Guide.
- * Scalability
- Hadoop currently runs on clusters with thousands of nodes. The
- {{{http://wiki.apache.org/hadoop/PoweredBy}PoweredBy}} Wiki page lists
- some of the organizations that deploy Hadoop on large clusters.
- HDFS has one NameNode for each cluster. Currently the total memory
- available on NameNode is the primary scalability limitation.
- On very large clusters, increasing average size of files stored in
- HDFS helps with increasing cluster size without increasing memory
- requirements on NameNode. The default configuration may not suite
- very large clusters. The {{{http://wiki.apache.org/hadoop/FAQ}FAQ}}
- Wiki page lists suggested configuration improvements for large Hadoop clusters.
- * Related Documentation
- This user guide is a good starting point for working with HDFS. While
- the user guide continues to improve, there is a large wealth of
- documentation about Hadoop and HDFS. The following list is a starting
- point for further exploration:
- * {{{http://hadoop.apache.org}Hadoop Site}}: The home page for
- the Apache Hadoop site.
- * {{{http://wiki.apache.org/hadoop/FrontPage}Hadoop Wiki}}:
- The home page (FrontPage) for the Hadoop Wiki. Unlike
- the released documentation, which is part of Hadoop source tree,
- Hadoop Wiki is regularly edited by Hadoop Community.
- * {{{http://wiki.apache.org/hadoop/FAQ}FAQ}}: The FAQ Wiki page.
- * {{{../../api/index.html}Hadoop JavaDoc API}}.
- * Hadoop User Mailing List: user[at]hadoop.apache.org.
- * Explore {{{./hdfs-default.xml}hdfs-default.xml}}. It includes
- brief description of most of the configuration variables available.
- * {{{../hadoop-common/CommandsManual.html}Hadoop Commands Guide}}:
- Hadoop commands usage.
|