HDFS-7667. Various typos and improvements to HDFS Federation doc (Charles Lamb via aw)

Allen Wittenauer, 10 years ago
commit d411460e0d

2 files changed, 105 insertions and 105 deletions:

  1. hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt (+3 −0)
  2. hadoop-hdfs-project/hadoop-hdfs/src/site/apt/Federation.apt.vm (+102 −105)

+ 3 - 0
hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt

@@ -290,6 +290,9 @@ Trunk (Unreleased)
     HADOOP-11484. hadoop-mapreduce-client-nativetask fails to build on ARM
     AARCH64 due to x86 asm statements (Edward Nevill via Colin P. McCabe)
 
+    HDFS-7667. Various typos and improvements to HDFS Federation doc
+    (Charles Lamb via aw)
+
 Release 2.7.0 - UNRELEASED
 
   INCOMPATIBLE CHANGES

+ 102 - 105
hadoop-hdfs-project/hadoop-hdfs/src/site/apt/Federation.apt.vm

@@ -32,16 +32,16 @@ HDFS Federation
 
   * <<Namespace>>
 
-    * Consists of directories, files and blocks
+    * Consists of directories, files and blocks.
 
     * It supports all the namespace related file system operations such as
       create, delete, modify and list files and directories.
 
-  * <<Block Storage Service>> has two parts
+  * <<Block Storage Service>>, which has two parts:
 
-    * Block Management (which is done in Namenode)
+    * Block Management (performed in the Namenode)
 
-      * Provides datanode cluster membership by handling registrations, and
+      * Provides Datanode cluster membership by handling registrations, and
         periodic heart beats.
 
       * Processes block reports and maintains location of blocks.
@@ -49,29 +49,29 @@ HDFS Federation
       * Supports block related operations such as create, delete, modify and
         get block location.
 
-      * Manages replica placement and replication of a block for under
-        replicated blocks and deletes blocks that are over replicated.
+      * Manages replica placement, block replication for under
+        replicated blocks, and deletes blocks that are over replicated.
 
-    * Storage - is provided by datanodes by storing blocks on the local file
-      system and allows read/write access.
+    * Storage - is provided by Datanodes by storing blocks on the local file
+      system and allowing read/write access.
 
   The prior HDFS architecture allows only a single namespace for the
-  entire cluster. A single Namenode manages this namespace. HDFS
-  Federation addresses limitation of the prior architecture by adding
-  support multiple Namenodes/namespaces to HDFS file system.
+  entire cluster. In that configuration, a single Namenode manages the
+  namespace. HDFS Federation addresses this limitation by adding
+  support for multiple Namenodes/namespaces to HDFS.
 
 * {Multiple Namenodes/Namespaces}
 
   In order to scale the name service horizontally, federation uses multiple
-  independent Namenodes/namespaces. The Namenodes are federated, that is, the
+  independent Namenodes/namespaces. The Namenodes are federated; the
   Namenodes are independent and do not require coordination with each other.
-  The datanodes are used as common storage for blocks by all the Namenodes.
-  Each datanode registers with all the Namenodes in the cluster. Datanodes
-  send periodic heartbeats and block reports and handles commands from the
-  Namenodes.
+  The Datanodes are used as common storage for blocks by all the Namenodes.
+  Each Datanode registers with all the Namenodes in the cluster. Datanodes
+  send periodic heartbeats and block reports. They also handle
+  commands from the Namenodes.
 
-  Users may use {{{./ViewFs.html}ViewFs}} to create personalized namespace views,
-  where ViewFs is analogous to client side mount tables in some Unix/Linux systems.
+  Users may use {{{./ViewFs.html}ViewFs}} to create personalized namespace views.
+  ViewFs is analogous to client side mount tables in some Unix/Linux systems.
 
 [./images/federation.gif] HDFS Federation Architecture
 
@@ -79,66 +79,67 @@ HDFS Federation
   <<Block Pool>>
 
   A Block Pool is a set of blocks that belong to a single namespace.
-  Datanodes store blocks for all the block pools in the cluster.
-  It is managed independently of other block pools. This allows a namespace
-  to generate Block IDs for new blocks without the need for coordination
-  with the other namespaces. The failure of a Namenode does not prevent
-  the datanode from serving other Namenodes in the cluster.
+  Datanodes store blocks for all the block pools in the cluster.  Each
+  Block Pool is managed independently. This allows a namespace to
+  generate Block IDs for new blocks without the need for coordination
+  with the other namespaces. A Namenode failure does not prevent the
+  Datanode from serving other Namenodes in the cluster.
 
   A Namespace and its block pool together are called Namespace Volume.
   It is a self-contained unit of management. When a Namenode/namespace
-  is deleted, the corresponding block pool at the datanodes is deleted.
+  is deleted, the corresponding block pool at the Datanodes is deleted.
   Each namespace volume is upgraded as a unit, during cluster upgrade.
 
   <<ClusterID>>
 
-  A new identifier <<ClusterID>> is added to identify all the nodes in
-  the cluster.  When a Namenode is formatted, this identifier is provided
-  or auto generated. This ID should be used for formatting the other
-  Namenodes into the cluster.
+  A <<ClusterID>> identifier is used to identify all the nodes in the
+  cluster.  When a Namenode is formatted, this identifier is either
+  provided or auto generated. This ID should be used for formatting
+  the other Namenodes into the cluster.
 
 ** Key Benefits
 
-  * Namespace Scalability - HDFS cluster storage scales horizontally but
-    the namespace does not. Large deployments or deployments using lot
-    of small files benefit from scaling the namespace by adding more
-    Namenodes to the cluster
+  * Namespace Scalability - Federation adds namespace horizontal
+    scaling. Large deployments or deployments using a lot of small files
+    benefit from namespace scaling by allowing more Namenodes to be
+    added to the cluster.
 
-  * Performance - File system operation throughput is limited by a single
-    Namenode in the prior architecture. Adding more Namenodes to the cluster
-    scales the file system read/write operations throughput.
+  * Performance - File system throughput is not limited by a single
+    Namenode. Adding more Namenodes to the cluster scales the file
+    system read/write throughput.
 
-  * Isolation - A single Namenode offers no isolation in multi user
-    environment. An experimental application can overload the Namenode
-    and slow down production critical applications. With multiple Namenodes,
-    different categories of applications and users can be isolated to
-    different namespaces.
+  * Isolation - A single Namenode offers no isolation in a multi user
+    environment. For example, an experimental application can overload
+    the Namenode and slow down production critical applications. By using
+    multiple Namenodes, different categories of applications and users
+    can be isolated to different namespaces.
 
 * {Federation Configuration}
 
-  Federation configuration is <<backward compatible>> and allows existing
-  single Namenode configuration to work without any change. The new
-  configuration is designed such that all the nodes in the cluster have
-  same configuration without the need for deploying different configuration
-  based on the type of the node in the cluster.
+  Federation configuration is <<backward compatible>> and allows
+  existing single Namenode configurations to work without any
+  change. The new configuration is designed such that all the nodes in
+  the cluster have the same configuration without the need for
+  deploying different configurations based on the type of the node in
+  the cluster.
 
-  A new abstraction called <<<NameServiceID>>> is added with
-  federation. The Namenode and its corresponding secondary/backup/checkpointer
-  nodes belong to this. To support single configuration file, the Namenode and
-  secondary/backup/checkpointer configuration parameters are suffixed with
-  <<<NameServiceID>>> and are added to the same configuration file.
+  Federation adds a new <<<NameServiceID>>> abstraction. A Namenode
+  and its corresponding secondary/backup/checkpointer nodes all belong
+  to a NameServiceId. In order to support a single configuration file,
+  the Namenode and secondary/backup/checkpointer configuration
+  parameters are suffixed with the <<<NameServiceID>>>.
 
 
 ** Configuration:
 
-  <<Step 1>>: Add the following parameters to your configuration:
-  <<<dfs.nameservices>>>: Configure with list of comma separated
-  NameServiceIDs. This will be used by Datanodes to determine all the
+  <<Step 1>>: Add the <<<dfs.nameservices>>> parameter to your
+  configuration and configure it with a list of comma separated
+  NameServiceIDs. This will be used by the Datanodes to determine the
   Namenodes in the cluster.
 
   <<Step 2>>: For each Namenode and Secondary Namenode/BackupNode/Checkpointer
-  add the following configuration suffixed with the corresponding
-  <<<NameServiceID>>> into the common configuration file.
+  add the following configuration parameters suffixed with the corresponding
+  <<<NameServiceID>>> into the common configuration file:
 
 *---------------------+--------------------------------------------+
 || Daemon             || Configuration Parameter                   |
@@ -160,7 +161,7 @@ HDFS Federation
 |                     | <<<dfs.secondary.namenode.keytab.file>>>   |
 *---------------------+--------------------------------------------+
 
-  Here is an example configuration with two namenodes:
+  Here is an example configuration with two Namenodes:
 
 ----
 <configuration>
@@ -199,16 +200,16 @@ HDFS Federation
 
 ** Formatting Namenodes
 
-  <<Step 1>>: Format a namenode using the following command:
+  <<Step 1>>: Format a Namenode using the following command:
 
 ----
 [hdfs]$ $HADOOP_PREFIX/bin/hdfs namenode -format [-clusterId <cluster_id>]
 ----
-  Choose a unique cluster_id, which will not conflict other clusters in
-  your environment. If it is not provided, then a unique ClusterID is
+  Choose a unique cluster_id which will not conflict with other clusters in
+  your environment. If a cluster_id is not provided, then a unique one is
   auto generated.
 
-  <<Step 2>>: Format additional namenode using the following command:
+  <<Step 2>>: Format additional Namenodes using the following command:
 
 ----
 [hdfs]$ $HADOOP_PREFIX/bin/hdfs namenode -format -clusterId <cluster_id>
 ----
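The ClusterID rule stated in this hunk — use the operator-supplied cluster_id if given, otherwise auto generate a unique one — can be sketched outside the patch as a small, Hadoop-free shell fragment. The `CID-<epoch>-<pid>` format and the `CLUSTER_ID` variable below are purely illustrative assumptions, not Hadoop's actual generator:

```shell
# Illustrative only: mimic the "provided or auto generated" ClusterID selection.
# Hadoop's real generator lives inside the Namenode; this CID format is made up.
cluster_id="${CLUSTER_ID:-}"          # operator-supplied ID, if any
if [ -z "$cluster_id" ]; then
  cluster_id="CID-$(date +%s)-$$"     # fall back to a locally unique value
fi
echo "using cluster_id: $cluster_id"
```

Whatever value is chosen for the first Namenode must then be reused when formatting the remaining Namenodes, as the step above shows.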
@@ -219,40 +220,38 @@ HDFS Federation
 
 ** Upgrading from an older release and configuring federation
 
-  Older releases supported a single Namenode.
-  Upgrade the cluster to newer release to enable federation
+  Older releases only support a single Namenode.
+  Upgrade the cluster to a newer release in order to enable federation.
   During upgrade you can provide a ClusterID as follows:
 
 ----
-[hdfs]$ $HADOOP_PREFIX/bin/hdfs start namenode --config $HADOOP_CONF_DIR  -upgrade -clusterId <cluster_ID>
+[hdfs]$ $HADOOP_PREFIX/bin/hdfs --daemon start namenode -upgrade -clusterId <cluster_ID>
 ----
-  If ClusterID is not provided, it is auto generated.
+  If cluster_id is not provided, it is auto generated.
 
 ** Adding a new Namenode to an existing HDFS cluster
 
-  Follow the following steps:
+  Perform the following steps:
 
-  * Add configuration parameter <<<dfs.nameservices>>> to the configuration.
+  * Add <<<dfs.nameservices>>> to the configuration.
 
-  * Update the configuration with NameServiceID suffix. Configuration
-    key names have changed post release 0.20. You must use new configuration
-    parameter names, for federation.
+  * Update the configuration with the NameServiceID suffix. Configuration
+    key names changed post release 0.20. You must use the new configuration
+    parameter names in order to use federation.
 
-  * Add new Namenode related config to the configuration files.
+  * Add the new Namenode related config to the configuration file.
 
   * Propagate the configuration file to all the nodes in the cluster.
 
-  * Start the new Namenode, Secondary/Backup.
+  * Start the new Namenode and Secondary/Backup.
 
-  * Refresh the datanodes to pickup the newly added Namenode by running
-    the following command:
+  * Refresh the Datanodes to pick up the newly added Namenode by running
+    the following command against all the Datanodes in the cluster:
 
 ----
-[hdfs]$ $HADOOP_PREFIX/bin/hdfs dfadmin -refreshNameNode <datanode_host_name>:<datanode_rpc_port>
+[hdfs]$ $HADOOP_PREFIX/bin/hdfs dfsadmin -refreshNameNode <datanode_host_name>:<datanode_rpc_port>
 ----
 
-  * The above command must be run against all the datanodes in the cluster.
-
 * {Managing the cluster}
 
 **  Starting and stopping cluster
@@ -270,28 +269,28 @@ HDFS Federation
 ----
 
   These commands can be run from any node where the HDFS configuration is
-  available.  The command uses configuration to determine the Namenodes
-  in the cluster and starts the Namenode process on those nodes. The
-  datanodes are started on nodes specified in the <<<slaves>>> file. The
-  script can be used as reference for building your own scripts for
-  starting and stopping the cluster.
+  available.  The command uses the configuration to determine the Namenodes
+  in the cluster and then starts the Namenode process on those nodes. The
+  Datanodes are started on the nodes specified in the <<<slaves>>> file. The
+  script can be used as a reference for building your own scripts to
+  start and stop the cluster.
 
 **  Balancer
 
-  Balancer has been changed to work with multiple Namenodes in the cluster to
-  balance the cluster. Balancer can be run using the command:
+  The Balancer has been changed to work with multiple
+  Namenodes. The Balancer can be run using the command:
 
 ----
 [hdfs]$ $HADOOP_PREFIX/bin/hdfs --daemon start balancer [-policy <policy>]
 ----
 
-  Policy could be:
+  The policy parameter can be any of the following:
 
   * <<<datanode>>> - this is the <default> policy. This balances the storage at
-    the datanode level. This is similar to balancing policy from prior releases.
+    the Datanode level. This is similar to the balancing policy from prior releases.
 
-  * <<<blockpool>>> - this balances the storage at the block pool level.
-    Balancing at block pool level balances storage at the datanode level also.
+  * <<<blockpool>>> - this balances the storage at the block pool
+    level which also balances at the Datanode level.
 
   Note that Balancer only balances the data and does not balance the namespace.
   For the complete command usage, see {{{../hadoop-common/CommandsManual.html#balancer}balancer}}.
@@ -299,44 +298,42 @@ HDFS Federation
 ** Decommissioning
 
   Decommissioning is similar to prior releases. The nodes that need to be
-  decomissioned are added to the exclude file at all the Namenode. Each
+  decommissioned are added to the exclude file at all of the Namenodes. Each
   Namenode decommissions its Block Pool. When all the Namenodes finish
-  decommissioning a datanode, the datanode is considered to be decommissioned.
+  decommissioning a Datanode, the Datanode is considered decommissioned.
 
-  <<Step 1>>: To distributed an exclude file to all the Namenodes, use the
+  <<Step 1>>: To distribute an exclude file to all the Namenodes, use the
   following command:
 
 ----
-[hdfs]$ $HADOOP_PREFIX/sbin/distributed-exclude.sh <exclude_file>
+[hdfs]$ $HADOOP_PREFIX/sbin/distribute-exclude.sh <exclude_file>
 ----
 
-  <<Step 2>>: Refresh all the Namenodes to pick up the new exclude file.
+  <<Step 2>>: Refresh all the Namenodes to pick up the new exclude file:
 
 ----
 [hdfs]$ $HADOOP_PREFIX/sbin/refresh-namenodes.sh
 ----
 
-  The above command uses HDFS configuration to determine the Namenodes
-  configured in the cluster and refreshes all the Namenodes to pick up
+  The above command uses HDFS configuration to determine the
+  configured Namenodes in the cluster and refreshes them to pick up
   the new exclude file.
 
 ** Cluster Web Console
 
-  Similar to Namenode status web page, a Cluster Web Console is added in
-  federation to monitor the federated cluster at
+  Similar to the Namenode status web page, when using federation a
+  Cluster Web Console is available to monitor the federated cluster at
   <<<http://<any_nn_host:port>/dfsclusterhealth.jsp>>>.
   Any Namenode in the cluster can be used to access this web page.
 
-  The web page provides the following information:
+  The Cluster Web Console provides the following information:
 
-  * Cluster summary that shows number of files, number of blocks and
-    total configured storage capacity, available and used storage information
+  * A cluster summary that shows the number of files, number of blocks,
+    total configured storage capacity, and the available and used storage
     for the entire cluster.
 
-  * Provides list of Namenodes and summary that includes number of files,
-    blocks, missing blocks, number of live and dead data nodes for each
-    Namenode. It also provides a link to conveniently access Namenode web UI.
-
-  * It also provides decommissioning status of datanodes.
-
+  * A list of Namenodes and a summary that includes the number of files,
+    blocks, missing blocks, and live and dead data nodes for each
+    Namenode. It also provides a link to access each Namenode's web UI.
 
+  * The decommissioning status of Datanodes.
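For readers following these documentation changes, the NameServiceID-suffixing convention the patch describes (Steps 1 and 2 of the Federation Configuration section) can be sketched as a minimal hdfs-site.xml fragment. The nameservice IDs `ns1`/`ns2`, host names, and port are assumptions made for illustration; `dfs.nameservices` and `dfs.namenode.rpc-address` are real HDFS parameters:

```xml
<!-- Hypothetical sketch: two federated Namenodes with assumed IDs ns1 and ns2. -->
<configuration>
  <property>
    <!-- Step 1: comma separated list of NameServiceIDs -->
    <name>dfs.nameservices</name>
    <value>ns1,ns2</value>
  </property>
  <property>
    <!-- Step 2: Namenode parameter suffixed with its NameServiceID -->
    <name>dfs.namenode.rpc-address.ns1</name>
    <value>nn-host1:8020</value>
  </property>
  <property>
    <name>dfs.namenode.rpc-address.ns2</name>
    <value>nn-host2:8020</value>
  </property>
</configuration>
```

Because every parameter carries its NameServiceID suffix, this single file can be deployed unchanged to every node in the cluster, which is the backward-compatibility point the patch's rewritten paragraph makes.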