há 10 anos atrás · d411460e0d
--- a/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
+++ b/hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt
@@ -290,6 +290,9 @@ Trunk (Unreleased)
 
				     HADOOP-11484. hadoop-mapreduce-client-nativetask fails to build on ARM
			
 
				     AARCH64 due to x86 asm statements (Edward Nevill via Colin P. McCabe)
			
 
				 
			
 
				+    HDFS-7667. Various typos and improvements to HDFS Federation doc
			
 
				+    (Charles Lamb via aw)
			
 
				+
			
 
				 Release 2.7.0 - UNRELEASED
			
 
				 
			
 
				   INCOMPATIBLE CHANGES
			
--- a/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/Federation.apt.vm
+++ b/hadoop-hdfs-project/hadoop-hdfs/src/site/apt/Federation.apt.vm
@@ -32,16 +32,16 @@ HDFS Federation
 
				 
			
 
				   * <<Namespace>>
			
 
				 
			
 
				-    * Consists of directories, files and blocks
			
 
				+    * Consists of directories, files and blocks.
			
 
				 
			
 
				     * It supports all the namespace related file system operations such as
			
 
				       create, delete, modify and list files and directories.
			
 
				 
			
 
				-  * <<Block Storage Service>> has two parts
			
 
				+  * <<Block Storage Service>>, which has two parts:
			
 
				 
			
 
				-    * Block Management (which is done in Namenode)
			
 
				+    * Block Management (performed in the Namenode)
			
 
				 
			
 
				-      * Provides datanode cluster membership by handling registrations, and
			
 
				+      * Provides Datanode cluster membership by handling registrations, and
			
 
				         periodic heart beats.
			
 
				 
			
 
				       * Processes block reports and maintains location of blocks.
			
@@ -49,29 +49,29 @@ HDFS Federation
 
				       * Supports block related operations such as create, delete, modify and
			
 
				         get block location.
			
 
				 
			
 
				-      * Manages replica placement and replication of a block for under
			
 
				-        replicated blocks and deletes blocks that are over replicated.
			
 
				+      * Manages replica placement, block replication for under
			
 
				+        replicated blocks, and deletes blocks that are over replicated.
			
 
				 
			
 
				-    * Storage - is provided by datanodes by storing blocks on the local file
			
 
				-      system and allows read/write access.
			
 
				+    * Storage - is provided by Datanodes by storing blocks on the local file
			
 
				+      system and allowing read/write access.
			
 
				 
			
 
				   The prior HDFS architecture allows only a single namespace for the
			
 
				-  entire cluster. A single Namenode manages this namespace. HDFS
			
 
				-  Federation addresses limitation of the prior architecture by adding
			
 
				-  support multiple Namenodes/namespaces to HDFS file system.
			
 
				+  entire cluster. In that configuration, a single Namenode manages the
			
 
				+  namespace. HDFS Federation addresses this limitation by adding
			
 
				+  support for multiple Namenodes/namespaces to HDFS.
			
 
				 
			
 
				 * {Multiple Namenodes/Namespaces}
			
 
				 
			
 
				   In order to scale the name service horizontally, federation uses multiple
			
 
				-  independent Namenodes/namespaces. The Namenodes are federated, that is, the
			
 
				+  independent Namenodes/namespaces. The Namenodes are federated; the
			
 
				   Namenodes are independent and do not require coordination with each other.
			
 
				-  The datanodes are used as common storage for blocks by all the Namenodes.
			
 
				-  Each datanode registers with all the Namenodes in the cluster. Datanodes
			
 
				-  send periodic heartbeats and block reports and handles commands from the
			
 
				-  Namenodes.
			
 
				+  The Datanodes are used as common storage for blocks by all the Namenodes.
			
 
				+  Each Datanode registers with all the Namenodes in the cluster. Datanodes
			
 
				+  send periodic heartbeats and block reports. They also handle
			
 
				+  commands from the Namenodes.
			
 
				 
			
 
				-  Users may use {{{./ViewFs.html}ViewFs}} to create personalized namespace views,
			
 
				-  where ViewFs is analogous to client side mount tables in some Unix/Linux systems.
			
 
				+  Users may use {{{./ViewFs.html}ViewFs}} to create personalized namespace views.
			
 
				+  ViewFs is analogous to client side mount tables in some Unix/Linux systems.
			
 
				 
			
 
				 [./images/federation.gif] HDFS Federation Architecture
			
 
				 
			
@@ -79,66 +79,67 @@ HDFS Federation
 
				   <<Block Pool>>
			
 
				 
			
 
				   A Block Pool is a set of blocks that belong to a single namespace.
			
 
				-  Datanodes store blocks for all the block pools in the cluster.
			
 
				-  It is managed independently of other block pools. This allows a namespace
			
 
				-  to generate Block IDs for new blocks without the need for coordination
			
 
				-  with the other namespaces. The failure of a Namenode does not prevent
			
 
				-  the datanode from serving other Namenodes in the cluster.
			
 
				+  Datanodes store blocks for all the block pools in the cluster.  Each
			
 
				+  Block Pool is managed independently. This allows a namespace to
			
 
				+  generate Block IDs for new blocks without the need for coordination
			
 
				+  with the other namespaces. A Namenode failure does not prevent the
			
 
				+  Datanode from serving other Namenodes in the cluster.
			
 
				 
			
 
				   A Namespace and its block pool together are called Namespace Volume.
			
 
				   It is a self-contained unit of management. When a Namenode/namespace
			
 
				-  is deleted, the corresponding block pool at the datanodes is deleted.
			
 
				+  is deleted, the corresponding block pool at the Datanodes is deleted.
			
 
				   Each namespace volume is upgraded as a unit, during cluster upgrade.
			
 
				 
			
 
				   <<ClusterID>>
			
 
				 
			
 
				-  A new identifier <<ClusterID>> is added to identify all the nodes in
			
 
				-  the cluster.  When a Namenode is formatted, this identifier is provided
			
 
				-  or auto generated. This ID should be used for formatting the other
			
 
				-  Namenodes into the cluster.
			
 
				+  A <<ClusterID>> identifier is used to identify all the nodes in the
			
 
				+  cluster.  When a Namenode is formatted, this identifier is either
			
 
				+  provided or auto generated. This ID should be used for formatting
			
 
				+  the other Namenodes into the cluster.
			
 
				 
			
 
				 ** Key Benefits
			
 
				 
			
 
				-  * Namespace Scalability - HDFS cluster storage scales horizontally but
			
 
				-    the namespace does not. Large deployments or deployments using lot
			
 
				-    of small files benefit from scaling the namespace by adding more
			
 
				-    Namenodes to the cluster
			
 
				+  * Namespace Scalability - Federation adds namespace horizontal
			
 
				+    scaling. Large deployments or deployments using lot of small files
			
 
				+    benefit from namespace scaling by allowing more Namenodes to be
			
 
				+    added to the cluster.
			
 
				 
			
 
				-  * Performance - File system operation throughput is limited by a single
			
 
				-    Namenode in the prior architecture. Adding more Namenodes to the cluster
			
 
				-    scales the file system read/write operations throughput.
			
 
				+  * Performance - File system throughput is not limited by a single
			
 
				+    Namenode. Adding more Namenodes to the cluster scales the file
			
 
				+    system read/write throughput.
			
 
				 
			
 
				-  * Isolation - A single Namenode offers no isolation in multi user
			
 
				-    environment. An experimental application can overload the Namenode
			
 
				-    and slow down production critical applications. With multiple Namenodes,
			
 
				-    different categories of applications and users can be isolated to
			
 
				-    different namespaces.
			
 
				+  * Isolation - A single Namenode offers no isolation in a multi user
			
 
				+    environment. For example, an experimental application can overload
			
 
				+    the Namenode and slow down production critical applications. By using
			
 
				+    multiple Namenodes, different categories of applications and users
			
 
				+    can be isolated to different namespaces.
			
 
				 
			
 
				 * {Federation Configuration}
			
 
				 
			
 
				-  Federation configuration is <<backward compatible>> and allows existing
			
 
				-  single Namenode configuration to work without any change. The new
			
 
				-  configuration is designed such that all the nodes in the cluster have
			
 
				-  same configuration without the need for deploying different configuration
			
 
				-  based on the type of the node in the cluster.
			
 
				+  Federation configuration is <<backward compatible>> and allows
			
 
				+  existing single Namenode configurations to work without any
			
 
				+  change. The new configuration is designed such that all the nodes in
			
 
				+  the cluster have the same configuration without the need for
			
 
				+  deploying different configurations based on the type of the node in
			
 
				+  the cluster.
			
 
				 
			
 
				-  A new abstraction called <<<NameServiceID>>> is added with
			
 
				-  federation. The Namenode and its corresponding secondary/backup/checkpointer
			
 
				-  nodes belong to this. To support single configuration file, the Namenode and
			
 
				-  secondary/backup/checkpointer configuration parameters are suffixed with
			
 
				-  <<<NameServiceID>>> and are added to the same configuration file.
			
 
				+  Federation adds a new <<<NameServiceID>>> abstraction. A Namenode
			
 
				+  and its corresponding secondary/backup/checkpointer nodes all belong
			
 
				+  to a NameServiceId. In order to support a single configuration file,
			
 
				+  the Namenode and secondary/backup/checkpointer configuration
			
 
				+  parameters are suffixed with the <<<NameServiceID>>>.
			
 
				 
			
 
				 
			
 
				 ** Configuration:
			
 
				 
			
 
				-  <<Step 1>>: Add the following parameters to your configuration:
			
 
				-  <<<dfs.nameservices>>>: Configure with list of comma separated
			
 
				-  NameServiceIDs. This will be used by Datanodes to determine all the
			
 
				+  <<Step 1>>: Add the <<<dfs.nameservices>>> parameter to your
			
 
				+  configuration and configure it with a list of comma separated
			
 
				+  NameServiceIDs. This will be used by the Datanodes to determine the
			
 
				   Namenodes in the cluster.
			
 
				 
			
 
				   <<Step 2>>: For each Namenode and Secondary Namenode/BackupNode/Checkpointer
			
 
				-  add the following configuration suffixed with the corresponding
			
 
				-  <<<NameServiceID>>> into the common configuration file.
			
 
				+  add the following configuration parameters suffixed with the corresponding
			
 
				+  <<<NameServiceID>>> into the common configuration file:
			
 
				 
			
 
				 *---------------------+--------------------------------------------+
			
 
				 || Daemon             || Configuration Parameter                   |
			
@@ -160,7 +161,7 @@ HDFS Federation
 
				 |                     | <<<dfs.secondary.namenode.keytab.file>>>   |
			
 
				 *---------------------+--------------------------------------------+
			
 
				 
			
 
				-  Here is an example configuration with two namenodes:
			
 
				+  Here is an example configuration with two Namenodes:
			
 
				 
			
 
				 ----
			
 
				 <configuration>
			
@@ -199,16 +200,16 @@ HDFS Federation
 
				 
			
 
				 ** Formatting Namenodes
			
 
				 
			
 
				-  <<Step 1>>: Format a namenode using the following command:
			
 
				+  <<Step 1>>: Format a Namenode using the following command:
			
 
				 
			
 
				 ----
			
 
				 [hdfs]$ $HADOOP_PREFIX/bin/hdfs namenode -format [-clusterId <cluster_id>]
			
 
				 ----
			
 
				-  Choose a unique cluster_id, which will not conflict other clusters in
			
 
				-  your environment. If it is not provided, then a unique ClusterID is
			
 
				+  Choose a unique cluster_id which will not conflict other clusters in
			
 
				+  your environment. If a cluster_id is not provided, then a unique one is
			
 
				   auto generated.
			
 
				 
			
 
				-  <<Step 2>>: Format additional namenode using the following command:
			
 
				+  <<Step 2>>: Format additional Namenodes using the following command:
			
 
				 
			
 
				 ----
			
 
				 [hdfs]$ $HADOOP_PREFIX/bin/hdfs namenode -format -clusterId <cluster_id>
			
@@ -219,40 +220,38 @@ HDFS Federation
 
				 
			
 
				 ** Upgrading from an older release and configuring federation
			
 
				 
			
 
				-  Older releases supported a single Namenode.
			
 
				-  Upgrade the cluster to newer release to enable federation
			
 
				+  Older releases only support a single Namenode.
			
 
				+  Upgrade the cluster to newer release in order to enable federation
			
 
				   During upgrade you can provide a ClusterID as follows:
			
 
				 
			
 
				 ----
			
 
				-[hdfs]$ $HADOOP_PREFIX/bin/hdfs start namenode --config $HADOOP_CONF_DIR  -upgrade -clusterId <cluster_ID>
			
 
				+[hdfs]$ $HADOOP_PREFIX/bin/hdfs --daemon start namenode -upgrade -clusterId <cluster_ID>
			
 
				 ----
			
 
				-  If ClusterID is not provided, it is auto generated.
			
 
				+  If cluster_id is not provided, it is auto generated.
			
 
				 
			
 
				 ** Adding a new Namenode to an existing HDFS cluster
			
 
				 
			
 
				-  Follow the following steps:
			
 
				+  Perform the following steps:
			
 
				 
			
 
				-  * Add configuration parameter <<<dfs.nameservices>>> to the configuration.
			
 
				+  * Add <<<dfs.nameservices>>> to the configuration.
			
 
				 
			
 
				-  * Update the configuration with NameServiceID suffix. Configuration
			
 
				-    key names have changed post release 0.20. You must use new configuration
			
 
				-    parameter names, for federation.
			
 
				+  * Update the configuration with the NameServiceID suffix. Configuration
			
 
				+    key names changed post release 0.20. You must use the new configuration
			
 
				+    parameter names in order to use federation.
			
 
				 
			
 
				-  * Add new Namenode related config to the configuration files.
			
 
				+  * Add the new Namenode related config to the configuration file.
			
 
				 
			
 
				   * Propagate the configuration file to the all the nodes in the cluster.
			
 
				 
			
 
				-  * Start the new Namenode, Secondary/Backup.
			
 
				+  * Start the new Namenode and Secondary/Backup.
			
 
				 
			
 
				-  * Refresh the datanodes to pickup the newly added Namenode by running
			
 
				-    the following command:
			
 
				+  * Refresh the Datanodes to pickup the newly added Namenode by running
			
 
				+    the following command against all the Datanodes in the cluster:
			
 
				 
			
 
				 ----
			
 
				-[hdfs]$ $HADOOP_PREFIX/bin/hdfs dfadmin -refreshNameNode <datanode_host_name>:<datanode_rpc_port>
			
 
				+[hdfs]$ $HADOOP_PREFIX/bin/hdfs dfsadmin -refreshNameNode <datanode_host_name>:<datanode_rpc_port>
			
 
				 ----
			
 
				 
			
 
				-  * The above command must be run against all the datanodes in the cluster.
			
 
				-
			
 
				 * {Managing the cluster}
			
 
				 
			
 
				 **  Starting and stopping cluster
			
@@ -270,28 +269,28 @@ HDFS Federation
 
				 ----
			
 
				 
			
 
				   These commands can be run from any node where the HDFS configuration is
			
 
				-  available.  The command uses configuration to determine the Namenodes
			
 
				-  in the cluster and starts the Namenode process on those nodes. The
			
 
				-  datanodes are started on nodes specified in the <<<slaves>>> file. The
			
 
				-  script can be used as reference for building your own scripts for
			
 
				-  starting and stopping the cluster.
			
 
				+  available.  The command uses the configuration to determine the Namenodes
			
 
				+  in the cluster and then starts the Namenode process on those nodes. The
			
 
				+  Datanodes are started on the nodes specified in the <<<slaves>>> file. The
			
 
				+  script can be used as a reference for building your own scripts to
			
 
				+  start and stop the cluster.
			
 
				 
			
 
				 **  Balancer
			
 
				 
			
 
				-  Balancer has been changed to work with multiple Namenodes in the cluster to
			
 
				-  balance the cluster. Balancer can be run using the command:
			
 
				+  The Balancer has been changed to work with multiple
			
 
				+  Namenodes. The Balancer can be run using the command:
			
 
				 
			
 
				 ----
			
 
				 [hdfs]$ $HADOOP_PREFIX/bin/hdfs --daemon start balancer [-policy <policy>]
			
 
				 ----
			
 
				 
			
 
				-  Policy could be:
			
 
				+  The policy parameter can be any of the following:
			
 
				 
			
 
				   * <<<datanode>>> - this is the <default> policy. This balances the storage at
			
 
				-    the datanode level. This is similar to balancing policy from prior releases.
			
 
				+    the Datanode level. This is similar to balancing policy from prior releases.
			
 
				 
			
 
				-  * <<<blockpool>>> - this balances the storage at the block pool level.
			
 
				-    Balancing at block pool level balances storage at the datanode level also.
			
 
				+  * <<<blockpool>>> - this balances the storage at the block pool
			
 
				+    level which also balances at the Datanode level.
			
 
				 
			
 
				   Note that Balancer only balances the data and does not balance the namespace.
			
 
				   For the complete command usage, see {{{../hadoop-common/CommandsManual.html#balancer}balancer}}.
			
@@ -299,44 +298,42 @@ HDFS Federation
 
				 ** Decommissioning
			
 
				 
			
 
				   Decommissioning is similar to prior releases. The nodes that need to be
			
 
				-  decomissioned are added to the exclude file at all the Namenode. Each
			
 
				+  decomissioned are added to the exclude file at all of the Namenodes. Each
			
 
				   Namenode decommissions its Block Pool. When all the Namenodes finish
			
 
				-  decommissioning a datanode, the datanode is considered to be decommissioned.
			
 
				+  decommissioning a Datanode, the Datanode is considered decommissioned.
			
 
				 
			
 
				-  <<Step 1>>: To distributed an exclude file to all the Namenodes, use the
			
 
				+  <<Step 1>>: To distribute an exclude file to all the Namenodes, use the
			
 
				   following command:
			
 
				 
			
 
				 ----
			
 
				-[hdfs]$ $HADOOP_PREFIX/sbin/distributed-exclude.sh <exclude_file>
			
 
				+[hdfs]$ $HADOOP_PREFIX/sbin/distribute-exclude.sh <exclude_file>
			
 
				 ----
			
 
				 
			
 
				-  <<Step 2>>: Refresh all the Namenodes to pick up the new exclude file.
			
 
				+  <<Step 2>>: Refresh all the Namenodes to pick up the new exclude file:
			
 
				 
			
 
				 ----
			
 
				 [hdfs]$ $HADOOP_PREFIX/sbin/refresh-namenodes.sh
			
 
				 ----
			
 
				 
			
 
				-  The above command uses HDFS configuration to determine the Namenodes
			
 
				-  configured in the cluster and refreshes all the Namenodes to pick up
			
 
				+  The above command uses HDFS configuration to determine the
			
 
				+  configured Namenodes in the cluster and refreshes them to pick up
			
 
				   the new exclude file.
			
 
				 
			
 
				 ** Cluster Web Console
			
 
				 
			
 
				-  Similar to Namenode status web page, a Cluster Web Console is added in
			
 
				-  federation to monitor the federated cluster at
			
 
				+  Similar to the Namenode status web page, when using federation a
			
 
				+  Cluster Web Console is available to monitor the federated cluster at
			
 
				   <<<http://<any_nn_host:port>/dfsclusterhealth.jsp>>>.
			
 
				   Any Namenode in the cluster can be used to access this web page.
			
 
				 
			
 
				-  The web page provides the following information:
			
 
				+  The Cluster Web Console provides the following information:
			
 
				 
			
 
				-  * Cluster summary that shows number of files, number of blocks and
			
 
				-    total configured storage capacity, available and used storage information
			
 
				+  * A cluster summary that shows the number of files, number of blocks,
			
 
				+    total configured storage capacity, and the available and used storage
			
 
				     for the entire cluster.
			
 
				 
			
 
				-  * Provides list of Namenodes and summary that includes number of files,
			
 
				-    blocks, missing blocks, number of live and dead data nodes for each
			
 
				-    Namenode. It also provides a link to conveniently access Namenode web UI.
			
 
				-
			
 
				-  * It also provides decommissioning status of datanodes.
			
 
				-
			
 
				+  * A list of Namenodes and a summary that includes the number of files,
			
 
				+    blocks, missing blocks, and live and dead data nodes for each
			
 
				+    Namenode. It also provides a link to access each Namenode's web UI.
			
 
				 
			
 
				+  * The decommissioning status of Datanodes.