|
@@ -6,7 +6,7 @@
|
|
|
<meta name="Forrest-version" content="0.8">
|
|
|
<meta name="Forrest-skin-name" content="pelt">
|
|
|
<title>
|
|
|
- Hadoop DFS User Guide
|
|
|
+ HDFS User Guide
|
|
|
</title>
|
|
|
<link type="text/css" href="skin/basic.css" rel="stylesheet">
|
|
|
<link media="screen" type="text/css" href="skin/screen.css" rel="stylesheet">
|
|
@@ -190,7 +190,7 @@ document.write("Last Published: " + document.lastModified);
|
|
|
PDF</a>
|
|
|
</div>
|
|
|
<h1>
|
|
|
- Hadoop DFS User Guide
|
|
|
+ HDFS User Guide
|
|
|
</h1>
|
|
|
<div id="minitoc-area">
|
|
|
<ul class="minitoc">
|
|
@@ -215,7 +215,7 @@ document.write("Last Published: " + document.lastModified);
|
|
|
</ul>
|
|
|
</li>
|
|
|
<li>
|
|
|
-<a href="#Secondary+Namenode"> Secondary Namenode </a>
|
|
|
+<a href="#Secondary+NameNode"> Secondary NameNode </a>
|
|
|
</li>
|
|
|
<li>
|
|
|
<a href="#Rebalancer"> Rebalancer </a>
|
|
@@ -227,7 +227,7 @@ document.write("Last Published: " + document.lastModified);
|
|
|
<a href="#Safemode"> Safemode </a>
|
|
|
</li>
|
|
|
<li>
|
|
|
-<a href="#Fsck"> Fsck </a>
|
|
|
+<a href="#fsck"> fsck </a>
|
|
|
</li>
|
|
|
<li>
|
|
|
<a href="#Upgrade+and+Rollback"> Upgrade and Rollback </a>
|
|
@@ -248,11 +248,11 @@ document.write("Last Published: " + document.lastModified);
|
|
|
<h2 class="h3">Purpose</h2>
|
|
|
<div class="section">
|
|
|
<p>
|
|
|
- This document aims to be the starting point for users working with
|
|
|
+ This document is a starting point for users working with
|
|
|
Hadoop Distributed File System (HDFS) either as a part of a
|
|
|
<a href="http://hadoop.apache.org/">Hadoop</a>
|
|
|
cluster or as a stand-alone general purpose distributed file system.
|
|
|
- While HDFS is designed to "just-work" in many environments, a working
|
|
|
+ While HDFS is designed to "just work" in many environments, a working
|
|
|
knowledge of HDFS helps greatly with configuration improvements and
|
|
|
diagnostics on a specific cluster.
|
|
|
</p>
|
|
@@ -264,21 +264,20 @@ document.write("Last Published: " + document.lastModified);
|
|
|
<div class="section">
|
|
|
<p>
|
|
|
HDFS is the primary distributed storage used by Hadoop applications. A
|
|
|
- HDFS cluster primarily consists of a <em>NameNode</em> that manages the
|
|
|
- filesystem metadata and Datanodes that store the actual data. The
|
|
|
+ HDFS cluster primarily consists of a NameNode that manages the
|
|
|
+ file system metadata and DataNodes that store the actual data. The
|
|
|
architecture of HDFS is described in detail
|
|
|
<a href="hdfs_design.html">here</a>. This user guide primarily deals with
|
|
|
interaction of users and administrators with HDFS clusters.
|
|
|
The <a href="images/hdfsarchitecture.gif">diagram</a> from
|
|
|
<a href="hdfs_design.html">HDFS architecture</a> depicts
|
|
|
- basic interactions among Namenode, Datanodes, and the clients. Eseentially,
|
|
|
- clients contact Namenode for file metadata or file modifications and perform
|
|
|
- actual file I/O directly with the datanodes.
|
|
|
+ basic interactions among the NameNode, the DataNodes, and the clients.
|
|
|
+ Clients contact the NameNode for file metadata or file modifications and perform
|
|
|
+ actual file I/O directly with the DataNodes.
|
|
|
</p>
|
|
|
<p>
|
|
|
The following are some of the salient features that could be of
|
|
|
- interest to many users. The terms in <em>italics</em>
|
|
|
- are described in later sections.
|
|
|
+ interest to many users.
|
|
|
</p>
|
|
|
<ul>
|
|
|
|
|
@@ -298,15 +297,15 @@ document.write("Last Published: " + document.lastModified);
|
|
|
</li>
|
|
|
|
|
|
<li>
|
|
|
- It is written in Java and is supported on all major platforms.
|
|
|
+ Hadoop is written in Java and is supported on all major platforms.
|
|
|
</li>
|
|
|
|
|
|
<li>
|
|
|
- Supports <em>shell like commands</em> to interact with HDFS directly.
|
|
|
+ Hadoop supports shell-like commands to interact with HDFS directly.
|
|
|
</li>
|
|
|
|
|
|
<li>
|
|
|
- Namenode and Datanodes have built in web servers that makes it
|
|
|
+ The NameNode and DataNodes have built-in web servers that make it
|
|
|
easy to check current status of the cluster.
|
|
|
</li>
|
|
|
|
|
@@ -316,47 +315,41 @@ document.write("Last Published: " + document.lastModified);
|
|
|
<ul>
|
|
|
|
|
|
<li>
|
|
|
-
|
|
|
-<em>File permissions and authentication.</em>
|
|
|
-
|
|
|
-</li>
|
|
|
+ File permissions and authentication.
|
|
|
+ </li>
|
|
|
|
|
|
<li>
|
|
|
|
|
|
-<em>Rack awareness</em> : to take a node's physical location into
|
|
|
+<em>Rack awareness</em>: to take a node's physical location into
|
|
|
account while scheduling tasks and allocating storage.
|
|
|
</li>
|
|
|
|
|
|
<li>
|
|
|
-
|
|
|
-<em>Safemode</em> : an administrative mode for maintanance.
|
|
|
+ Safemode: an administrative mode for maintenance.
|
|
|
</li>
|
|
|
|
|
|
<li>
|
|
|
|
|
|
-<em>fsck</em> : an utility to diagnose health of the filesystem, to
|
|
|
+<span class="codefrag">fsck</span>: a utility to diagnose the health of the file system, to
|
|
|
find missing files or blocks.
|
|
|
</li>
|
|
|
|
|
|
<li>
|
|
|
-
|
|
|
-<em>Rebalancer</em> : tool to balance the cluster when the data is
|
|
|
- unevenly distributed among datanodes.
|
|
|
+ Rebalancer: a tool to balance the cluster when the data is
|
|
|
+ unevenly distributed among DataNodes.
|
|
|
</li>
|
|
|
|
|
|
<li>
|
|
|
-
|
|
|
-<em>Upgrade and Rollback</em> : after a software upgrade,
|
|
|
+ Upgrade and rollback: after a software upgrade,
|
|
|
it is possible to
|
|
|
rollback to HDFS' state before the upgrade in case of unexpected
|
|
|
problems.
|
|
|
</li>
|
|
|
|
|
|
<li>
|
|
|
-
|
|
|
-<em>Secondary Namenode</em> : performs periodic checkpoints of the
|
|
|
+ Secondary NameNode: performs periodic checkpoints of the
|
|
|
namespace and helps keep the size of the file containing the log of HDFS
|
|
|
- modifications within certain limits at the Namenode.
|
|
|
+ modifications within certain limits at the NameNode.
|
|
|
</li>
|
|
|
|
|
|
</ul>
|
|
@@ -365,7 +358,7 @@ document.write("Last Published: " + document.lastModified);
|
|
|
|
|
|
</ul>
|
|
|
</div>
|
|
|
-<a name="N10083"></a><a name="Pre-requisites"></a>
|
|
|
+<a name="N1006B"></a><a name="Pre-requisites"></a>
|
|
|
<h2 class="h3"> Pre-requisites </h2>
|
|
|
<div class="section">
|
|
|
<p>
|
|
@@ -376,7 +369,7 @@ document.write("Last Published: " + document.lastModified);
|
|
|
|
|
|
<li>
|
|
|
|
|
|
-<a href="quickstart.html">Hadoop Quickstart</a>
|
|
|
+<a href="quickstart.html">Hadoop Quick Start</a>
|
|
|
for first-time users.
|
|
|
</li>
|
|
|
|
|
@@ -388,49 +381,48 @@ document.write("Last Published: " + document.lastModified);
|
|
|
|
|
|
</ul>
|
|
|
<p>
|
|
|
- The rest of document assumes the user is able to set up and run a
|
|
|
- HDFS with at least one Datanode. For the purpose of this document,
|
|
|
- both Namenode and Datanode could be running on the same physical
|
|
|
+ The rest of this document assumes the user is able to set up and run
|
|
|
+ HDFS with at least one DataNode. For the purpose of this document,
|
|
|
+ both the NameNode and DataNode could be running on the same physical
|
|
|
machine.
|
|
|
</p>
|
|
|
</div>
|
|
|
-<a name="N100A1"></a><a name="Web+Interface"></a>
|
|
|
+<a name="N10089"></a><a name="Web+Interface"></a>
|
|
|
<h2 class="h3"> Web Interface </h2>
|
|
|
<div class="section">
|
|
|
<p>
|
|
|
- Namenode and Datanode each run an internal web server in order to
|
|
|
+ NameNode and DataNode each run an internal web server in order to
|
|
|
display basic information about the current status of the cluster.
|
|
|
- With the default configuration, namenode front page is at
|
|
|
- <span class="codefrag">http://namenode:50070/</span> .
|
|
|
- It lists the datanodes in the cluster and basic stats of the
|
|
|
+ With the default configuration, the NameNode front page is at
|
|
|
+ <span class="codefrag">http://namenode-name:50070/</span>.
|
|
|
+ It lists the DataNodes in the cluster and basic statistics of the
|
|
|
cluster. The web interface can also be used to browse the file
|
|
|
- system (using "Browse the file system" link on the Namenode front
|
|
|
+ system (using "Browse the file system" link on the NameNode front
|
|
|
page).
|
|
|
</p>
|
|
|
</div>
|
|
|
-<a name="N100AE"></a><a name="Shell+Commands"></a>
|
|
|
+<a name="N10096"></a><a name="Shell+Commands"></a>
|
|
|
<h2 class="h3">Shell Commands</h2>
|
|
|
<div class="section">
|
|
|
<p>
|
|
|
- Hadoop includes various "shell-like" commands that directly
|
|
|
+ Hadoop includes various shell-like commands that directly
|
|
|
interact with HDFS and other file systems that Hadoop supports.
|
|
|
The command
|
|
|
<span class="codefrag">bin/hadoop fs -help</span>
|
|
|
lists the commands supported by Hadoop
|
|
|
- shell. Further,
|
|
|
- <span class="codefrag">bin/hadoop fs -help command</span>
|
|
|
- displays more detailed help on a command. The commands support
|
|
|
- most of the normal filesystem operations like copying files,
|
|
|
+ shell. Furthermore, the command
|
|
|
+ <span class="codefrag">bin/hadoop fs -help command-name</span>
|
|
|
+ displays more detailed help for a command. These commands support
|
|
|
+ most of the normal file system operations like copying files,
|
|
|
changing file permissions, etc. It also supports a few HDFS
|
|
|
specific operations like changing replication of files.
|
|
|
</p>
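As a sketch of the shell interface described above (the paths below are made-up examples, and a running single-node HDFS is assumed):

```shell
# List the contents of an HDFS directory (path is an example).
bin/hadoop fs -ls /user/hadoop

# Copy a local file into HDFS, then read it back.
bin/hadoop fs -put localfile.txt /user/hadoop/localfile.txt
bin/hadoop fs -cat /user/hadoop/localfile.txt

# An HDFS-specific operation: set a file's replication factor to 2
# and wait (-w) for the change to take effect.
bin/hadoop fs -setrep -w 2 /user/hadoop/localfile.txt

# Detailed help for a single command, e.g. ls.
bin/hadoop fs -help ls
```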
|
|
|
-<a name="N100BD"></a><a name="DFSAdmin+Command"></a>
|
|
|
+<a name="N100A5"></a><a name="DFSAdmin+Command"></a>
|
|
|
<h3 class="h4"> DFSAdmin Command </h3>
|
|
|
<p>
|
|
|
-
|
|
|
-<span class="codefrag">'bin/hadoop dfsadmin'</span>
|
|
|
+ The <span class="codefrag">bin/hadoop dfsadmin</span>
|
|
|
command supports a few HDFS administration related operations.
|
|
|
- <span class="codefrag">bin/hadoop dfsadmin -help</span>
|
|
|
+ The <span class="codefrag">bin/hadoop dfsadmin -help</span> command
|
|
|
lists all the commands currently supported. For example:
|
|
|
</p>
|
|
|
<ul>
|
|
@@ -438,15 +430,15 @@ document.write("Last Published: " + document.lastModified);
|
|
|
<li>
|
|
|
|
|
|
<span class="codefrag">-report</span>
|
|
|
- : reports basic stats of HDFS. Some of this information is
|
|
|
- also available on the Namenode front page.
|
|
|
+ : reports basic statistics of HDFS. Some of this information is
|
|
|
+ also available on the NameNode front page.
|
|
|
</li>
|
|
|
|
|
|
<li>
|
|
|
|
|
|
<span class="codefrag">-safemode</span>
|
|
|
: though usually not required, an administrator can manually enter
|
|
|
- or leave <em>safemode</em>.
|
|
|
+ or leave Safemode.
|
|
|
</li>
|
|
|
|
|
|
<li>
|
|
@@ -460,32 +452,32 @@ document.write("Last Published: " + document.lastModified);
|
|
|
For command usage, see <a href="commands_manual.html#dfsadmin">dfsadmin command</a>.
|
|
|
</p>
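A minimal illustration of the dfsadmin operations mentioned above (exact output varies by release, so none is shown):

```shell
# List the administration operations dfsadmin supports.
bin/hadoop dfsadmin -help

# Report basic cluster statistics, similar to the NameNode front page.
bin/hadoop dfsadmin -report
```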
|
|
|
</div>
|
|
|
-<a name="N100ED"></a><a name="Secondary+Namenode"></a>
|
|
|
-<h2 class="h3"> Secondary Namenode </h2>
|
|
|
+<a name="N100D2"></a><a name="Secondary+NameNode"></a>
|
|
|
+<h2 class="h3"> Secondary NameNode </h2>
|
|
|
<div class="section">
|
|
|
<p>
|
|
|
- Namenode stores modifications to the file system as a log
|
|
|
+ The NameNode stores modifications to the file system as a log
|
|
|
appended to a native file system file (<span class="codefrag">edits</span>).
|
|
|
- When a Namenode starts up, it reads HDFS state from an image
|
|
|
- file (<span class="codefrag">fsimage</span>) and then applies <em>edits</em> from
|
|
|
- edits log file. It then writes new HDFS state to (<span class="codefrag">fsimage</span>)
|
|
|
+ When a NameNode starts up, it reads HDFS state from an image
|
|
|
+ file (<span class="codefrag">fsimage</span>) and then applies edits from the
|
|
|
+ edits log file. It then writes new HDFS state to the <span class="codefrag">fsimage</span>
|
|
|
and starts normal
|
|
|
- operation with an empty edits file. Since namenode merges
|
|
|
+ operation with an empty edits file. Since NameNode merges
|
|
|
<span class="codefrag">fsimage</span> and <span class="codefrag">edits</span> files only during start up,
|
|
|
- edits file could get very large over time on a large cluster.
|
|
|
- Another side effect of larger edits file is that next
|
|
|
- restart of Namenade takes longer.
|
|
|
+ the edits log file could get very large over time on a busy cluster.
|
|
|
+ Another side effect of a larger edits file is that next
|
|
|
+ restart of NameNode takes longer.
|
|
|
</p>
|
|
|
<p>
|
|
|
- The secondary namenode merges fsimage and edits log periodically
|
|
|
- and keeps edits log size with in a limit. It is usually run on a
|
|
|
- different machine than the primary Namenode since its memory requirements
|
|
|
- are on the same order as the primary namemode. The secondary
|
|
|
- namenode is started by <span class="codefrag">bin/start-dfs.sh</span> on the nodes
|
|
|
+ The secondary NameNode merges the fsimage and the edits log files periodically
|
|
|
+ and keeps the edits log size within a limit. It is usually run on a
|
|
|
+ different machine than the primary NameNode since its memory requirements
|
|
|
+ are on the same order as the primary NameNode. The secondary
|
|
|
+ NameNode is started by <span class="codefrag">bin/start-dfs.sh</span> on the nodes
|
|
|
specified in <span class="codefrag">conf/masters</span> file.
|
|
|
</p>
|
|
|
<p>
|
|
|
- The start of the checkpoint process on the secondary name-node is
|
|
|
+ The start of the checkpoint process on the secondary NameNode is
|
|
|
controlled by two configuration parameters.
|
|
|
</p>
|
|
|
<ul>
|
|
@@ -493,68 +485,68 @@ document.write("Last Published: " + document.lastModified);
|
|
|
<li>
|
|
|
|
|
|
<span class="codefrag">fs.checkpoint.period</span>, set to 1 hour by default, specifies
|
|
|
- the maximal delay between two consecutive checkpoints, and
|
|
|
+ the maximum delay between two consecutive checkpoints, and
|
|
|
</li>
|
|
|
|
|
|
<li>
|
|
|
|
|
|
<span class="codefrag">fs.checkpoint.size</span>, set to 64MB by default, defines the
|
|
|
size of the edits log file that forces an urgent checkpoint even if
|
|
|
- the maximal checkpoint delay is not reached.
|
|
|
+ the maximum checkpoint delay is not reached.
|
|
|
</li>
|
|
|
|
|
|
</ul>
|
|
|
<p>
|
|
|
- The secondary name-node stores the latest checkpoint in a storage
|
|
|
- directory, which is structured the same way as the primary name-node's
|
|
|
- storage directory. So that the checkpointed image is always ready to be
|
|
|
- read by the primary name-node if necessary.
|
|
|
+ The secondary NameNode stores the latest checkpoint in a
|
|
|
+ directory which is structured the same way as the primary NameNode's
|
|
|
+ directory, so that the checkpointed image is always ready to be
|
|
|
+ read by the primary NameNode if necessary.
|
|
|
</p>
|
|
|
<p>
|
|
|
- The latest checkpoint can be imported to the primary name-node if
|
|
|
+ The latest checkpoint can be imported to the primary NameNode if
|
|
|
all other copies of the image and the edits files are lost.
|
|
|
In order to do that one should:
|
|
|
</p>
|
|
|
<ul>
|
|
|
|
|
|
<li>
|
|
|
- create an empty storage directory specified in the
|
|
|
+ Create an empty directory specified in the
|
|
|
<span class="codefrag">dfs.name.dir</span> configuration variable;
|
|
|
</li>
|
|
|
|
|
|
<li>
|
|
|
- specify the location of the checkpoint storage directory in the
|
|
|
+ Specify the location of the checkpoint directory in the
|
|
|
configuration variable <span class="codefrag">fs.checkpoint.dir</span>;
|
|
|
</li>
|
|
|
|
|
|
<li>
|
|
|
- and start the name-node with <span class="codefrag">-importCheckpoint</span> option.
|
|
|
+ and start the NameNode with <span class="codefrag">-importCheckpoint</span> option.
|
|
|
</li>
|
|
|
|
|
|
</ul>
|
|
|
<p>
|
|
|
- The name-node will upload the checkpoint from the
|
|
|
- <span class="codefrag">fs.checkpoint.dir</span> directory and then save it to the name-node
|
|
|
- storage directory(s) set in <span class="codefrag">dfs.name.dir</span>.
|
|
|
- The name-node will fail if a legal image is contained in
|
|
|
+ The NameNode will upload the checkpoint from the
|
|
|
+ <span class="codefrag">fs.checkpoint.dir</span> directory and then save it to the NameNode
|
|
|
+ directory(s) set in <span class="codefrag">dfs.name.dir</span>.
|
|
|
+ The NameNode will fail if a legal image is contained in
|
|
|
<span class="codefrag">dfs.name.dir</span>.
|
|
|
- The name-node verifies that the image in <span class="codefrag">fs.checkpoint.dir</span> is
|
|
|
+ The NameNode verifies that the image in <span class="codefrag">fs.checkpoint.dir</span> is
|
|
|
consistent, but does not modify it in any way.
|
|
|
</p>
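The three recovery steps above can be sketched as follows (the directory path is an example, and the configuration edits are assumed to be made in the cluster's configuration file beforehand):

```shell
# 1. Create an empty directory matching the dfs.name.dir setting
#    (path is an example).
mkdir -p /hadoop/dfs/name

# 2. In the configuration, point fs.checkpoint.dir at the directory
#    holding the secondary NameNode's latest checkpoint.

# 3. Start the NameNode, importing the checkpoint.
bin/hadoop namenode -importCheckpoint
```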
|
|
|
<p>
|
|
|
- For command usage, see <a href="commands_manual.html#secondarynamenode">secondarynamenode command</a>.
|
|
|
+ For command usage, see <a href="commands_manual.html#secondarynamenode"><span class="codefrag">secondarynamenode</span> command</a>.
|
|
|
</p>
|
|
|
</div>
|
|
|
-<a name="N10155"></a><a name="Rebalancer"></a>
|
|
|
+<a name="N10139"></a><a name="Rebalancer"></a>
|
|
|
<h2 class="h3"> Rebalancer </h2>
|
|
|
<div class="section">
|
|
|
<p>
|
|
|
HDFS data might not always be placed uniformly across the
|
|
|
- datanode. One common reason is addition of new datanodes to an
|
|
|
- existing cluster. While placing new <em>blocks</em> (data for a file is
|
|
|
- stored as a series of blocks), Namenode considers various
|
|
|
- parameters before choosing the datanodes to receive these blocks.
|
|
|
- Some of the considerations are :
|
|
|
+ DataNodes. One common reason is the addition of new DataNodes to an
|
|
|
+ existing cluster. While placing new blocks (data for a file is
|
|
|
+ stored as a series of blocks), NameNode considers various
|
|
|
+ parameters before choosing the DataNodes to receive these blocks.
|
|
|
+ Some of the considerations are:
|
|
|
</p>
|
|
|
<ul>
|
|
|
|
|
@@ -575,16 +567,16 @@ document.write("Last Published: " + document.lastModified);
|
|
|
</li>
|
|
|
|
|
|
<li>
|
|
|
- Spread HDFS data uniformly across the datanodes in the cluster.
|
|
|
+ Spread HDFS data uniformly across the DataNodes in the cluster.
|
|
|
</li>
|
|
|
|
|
|
</ul>
|
|
|
<p>
|
|
|
Due to multiple competing considerations, data might not be
|
|
|
- uniformly placed across the datanodes.
|
|
|
+ uniformly placed across the DataNodes.
|
|
|
HDFS provides a tool for administrators that analyzes block
|
|
|
- placement and relanaces data across the datnodes. A brief
|
|
|
- adminstrator's guide for rebalancer as a
|
|
|
+ placement and rebalances data across the DataNodes. A brief
|
|
|
+ administrator's guide for rebalancer as a
|
|
|
<a href="http://issues.apache.org/jira/secure/attachment/12368261/RebalanceDesign6.pdf">PDF</a>
|
|
|
is attached to
|
|
|
<a href="http://issues.apache.org/jira/browse/HADOOP-1652">HADOOP-1652</a>.
|
|
@@ -593,64 +585,65 @@ document.write("Last Published: " + document.lastModified);
|
|
|
For command usage, see <a href="commands_manual.html#balancer">balancer command</a>.
|
|
|
</p>
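A sketch of invoking the rebalancer (the threshold value is an example; it bounds how far each DataNode's utilization may deviate, in percent, from the cluster average):

```shell
# Analyze block placement and move blocks until every DataNode is
# within 10 percentage points of the average utilization.
bin/hadoop balancer -threshold 10

# The rebalancer can be stopped at any time without harming data;
# a stop script is provided alongside the start scripts.
bin/stop-balancer.sh
```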
|
|
|
</div>
|
|
|
-<a name="N10183"></a><a name="Rack+Awareness"></a>
|
|
|
+<a name="N10164"></a><a name="Rack+Awareness"></a>
|
|
|
<h2 class="h3"> Rack Awareness </h2>
|
|
|
<div class="section">
|
|
|
<p>
|
|
|
- Typically large Hadoop clusters are arranged in <em>racks</em> and
|
|
|
+ Typically large Hadoop clusters are arranged in racks and
|
|
|
network traffic between different nodes within the same rack is
|
|
|
much more desirable than network traffic across the racks. In
|
|
|
- addition Namenode tries to place replicas of block on
|
|
|
+ addition, the NameNode tries to place replicas of a block on
|
|
|
multiple racks for improved fault tolerance. Hadoop lets the
|
|
|
- cluster administrators decide which <em>rack</em> a node belongs to
|
|
|
+ cluster administrators decide which rack a node belongs to
|
|
|
through configuration variable <span class="codefrag">dfs.network.script</span>. When this
|
|
|
script is configured, each node runs the script to determine its
|
|
|
- <em>rackid</em>. A default installation assumes all the nodes belong to
|
|
|
+ rack id. A default installation assumes all the nodes belong to
|
|
|
the same rack. This feature and configuration is further described
|
|
|
in <a href="http://issues.apache.org/jira/secure/attachment/12345251/Rack_aware_HDFS_proposal.pdf">PDF</a>
|
|
|
attached to
|
|
|
<a href="http://issues.apache.org/jira/browse/HADOOP-692">HADOOP-692</a>.
|
|
|
</p>
|
|
|
</div>
|
|
|
-<a name="N101A1"></a><a name="Safemode"></a>
|
|
|
+<a name="N10179"></a><a name="Safemode"></a>
|
|
|
<h2 class="h3"> Safemode </h2>
|
|
|
<div class="section">
|
|
|
<p>
|
|
|
- During start up Namenode loads the filesystem state from
|
|
|
- <em>fsimage</em> and <em>edits</em> log file. It then waits for datanodes
|
|
|
+ During start up the NameNode loads the file system state from the
|
|
|
+ fsimage and the edits log file. It then waits for DataNodes
|
|
|
to report their blocks so that it does not prematurely start
|
|
|
replicating the blocks even though enough replicas already exist in the
|
|
|
- cluster. During this time Namenode stays in <em>safemode</em>. A
|
|
|
- <em>Safemode</em>
|
|
|
- for Namenode is essentially a read-only mode for the HDFS cluster,
|
|
|
- where it does not allow any modifications to filesystem or blocks.
|
|
|
- Normally Namenode gets out of safemode automatically at
|
|
|
- the beginning. If required, HDFS could be placed in safemode explicitly
|
|
|
- using <span class="codefrag">'bin/hadoop dfsadmin -safemode'</span> command. Namenode front
|
|
|
- page shows whether safemode is on or off. A more detailed
|
|
|
+ cluster. During this time the NameNode stays in Safemode.
|
|
|
+ Safemode
|
|
|
+ for the NameNode is essentially a read-only mode for the HDFS cluster,
|
|
|
+ where it does not allow any modifications to the file system or blocks.
|
|
|
+ Normally the NameNode leaves Safemode automatically after the DataNodes
|
|
|
+ have reported that most file system blocks are available.
|
|
|
+ If required, HDFS could be placed in Safemode explicitly
|
|
|
+ using <span class="codefrag">'bin/hadoop dfsadmin -safemode'</span> command. NameNode front
|
|
|
+ page shows whether Safemode is on or off. A more detailed
|
|
|
description and configuration is maintained as JavaDoc for
|
|
|
<a href="http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/dfs/NameNode.html#setSafeMode(org.apache.hadoop.dfs.FSConstants.SafeModeAction)"><span class="codefrag">setSafeMode()</span></a>.
|
|
|
</p>
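The explicit Safemode manipulation described above looks like this (subcommand names follow the dfsadmin help text; `wait` is useful in admin scripts):

```shell
# Check whether the cluster is currently in Safemode.
bin/hadoop dfsadmin -safemode get

# Manually enter and leave Safemode (usually not required).
bin/hadoop dfsadmin -safemode enter
bin/hadoop dfsadmin -safemode leave

# In a script, block until the NameNode leaves Safemode before
# attempting any file system modifications.
bin/hadoop dfsadmin -safemode wait
```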
|
|
|
</div>
|
|
|
-<a name="N101BF"></a><a name="Fsck"></a>
|
|
|
-<h2 class="h3"> Fsck </h2>
|
|
|
+<a name="N1018B"></a><a name="fsck"></a>
|
|
|
+<h2 class="h3"> fsck </h2>
|
|
|
<div class="section">
|
|
|
<p>
|
|
|
- HDFS supports <span class="codefrag">fsck</span> command to check for various
|
|
|
+ HDFS supports the <span class="codefrag">fsck</span> command to check for various
|
|
|
inconsistencies.
|
|
|
It is designed for reporting problems with various
|
|
|
- files, for e.g. missing blocks for a file or under replicated
|
|
|
- blocks. Unlike a traditional fsck utility for native filesystems,
|
|
|
- this command does not correct the errors it detects. Normally Namenode
|
|
|
+ files, for example, missing blocks for a file or under-replicated
|
|
|
+ blocks. Unlike a traditional <span class="codefrag">fsck</span> utility for native file systems,
|
|
|
+ this command does not correct the errors it detects. Normally NameNode
|
|
|
automatically corrects most of the recoverable failures. By default
|
|
|
- fsck ignores open files but provides an option to select during reporting.
|
|
|
- HDFS' fsck is not a
|
|
|
+ <span class="codefrag">fsck</span> ignores open files but provides an option to select all files during reporting.
|
|
|
+ The HDFS <span class="codefrag">fsck</span> command is not a
|
|
|
Hadoop shell command. It can be run as '<span class="codefrag">bin/hadoop fsck</span>'.
|
|
|
- For command usage, see <a href="commands_manual.html#fsck">fsck command</a>.
|
|
|
- Fsck can be run on the whole filesystem or on a subset of files.
|
|
|
+ For command usage, see <a href="commands_manual.html#fsck"><span class="codefrag">fsck</span> command</a>.
|
|
|
+ <span class="codefrag">fsck</span> can be run on the whole file system or on a subset of files.
|
|
|
</p>
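A sketch of typical fsck invocations (the subtree path is an example; flags shown are the common reporting options):

```shell
# Check the whole file system, listing files, their blocks, and the
# DataNodes each block lives on.
bin/hadoop fsck / -files -blocks -locations

# Restrict the check to a subtree (path is an example).
bin/hadoop fsck /user/hadoop -files
```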
|
|
|
</div>
|
|
|
-<a name="N101D3"></a><a name="Upgrade+and+Rollback"></a>
|
|
|
+<a name="N101AD"></a><a name="Upgrade+and+Rollback"></a>
|
|
|
<h2 class="h3"> Upgrade and Rollback </h2>
|
|
|
<div class="section">
|
|
|
<p>
|
|
@@ -660,14 +653,14 @@ document.write("Last Published: " + document.lastModified);
|
|
|
not discovered earlier. In any non-trivial HDFS installation, it
|
|
|
is not an option to lose any data, let alone to restart HDFS from
|
|
|
scratch. HDFS allows administrators to go back to an earlier version
|
|
|
- of Hadoop and <em>roll back</em> the cluster to the state it was in
|
|
|
+ of Hadoop and roll back the cluster to the state it was in
|
|
|
before
|
|
|
the upgrade. HDFS upgrade is described in more detail in
|
|
|
<a href="http://wiki.apache.org/hadoop/Hadoop%20Upgrade">upgrade wiki</a>.
|
|
|
HDFS can have one such backup at a time. Before upgrading,
|
|
|
administrators need to remove existing backup using <span class="codefrag">bin/hadoop
|
|
|
dfsadmin -finalizeUpgrade</span> command. The following
|
|
|
- briefly describes typical upgrade procedure :
|
|
|
+ briefly describes the typical upgrade procedure:
|
|
|
</p>
|
|
|
<ul>
|
|
|
|
|
@@ -690,7 +683,7 @@ document.write("Last Published: " + document.lastModified);
|
|
|
considered working well (maybe after a few days of operation),
|
|
|
finalize the upgrade. Note that until the cluster is finalized,
|
|
|
deleting the files that existed before the upgrade does not free
|
|
|
- up real disk space on the datanodes.
|
|
|
+ up real disk space on the DataNodes.
|
|
|
</li>
|
|
|
|
|
|
<li>
|
|
@@ -709,52 +702,51 @@ document.write("Last Published: " + document.lastModified);
|
|
|
|
|
|
</ul>
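The upgrade procedure above can be summarized as a command sequence (a sketch; it assumes the new Hadoop version is already installed on the nodes):

```shell
# Before upgrading, finalize and remove the backup from any
# previous upgrade.
bin/hadoop dfsadmin -finalizeUpgrade

# Start the cluster with the new version, keeping the old state
# as a backup.
bin/start-dfs.sh -upgrade

# If problems appear, stop the cluster, reinstall the old version,
# and roll back to the pre-upgrade state.
bin/start-dfs.sh -rollback
```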
|
|
|
</div>
|
|
|
-<a name="N10214"></a><a name="File+Permissions+and+Security"></a>
|
|
|
+<a name="N101EB"></a><a name="File+Permissions+and+Security"></a>
|
|
|
<h2 class="h3"> File Permissions and Security </h2>
|
|
|
<div class="section">
|
|
|
<p>
|
|
|
The file permissions are designed to be similar to file permissions on
|
|
|
other familiar platforms like Linux. Currently, security is limited
|
|
|
- to simple file permissions. The user that starts Namenode is
|
|
|
- treated as the <em>super user</em> for HDFS. Future versions of HDFS will
|
|
|
+ to simple file permissions. The user that starts NameNode is
|
|
|
+ treated as the superuser for HDFS. Future versions of HDFS will
|
|
|
support network authentication protocols like Kerberos for user
|
|
|
authentication and encryption of data transfers. The details are discussed in the
|
|
|
- <a href="hdfs_permissions_guide.html"><em>Permissions User and Administrator Guide</em></a>.
|
|
|
+ <a href="hdfs_permissions_guide.html">Permissions User and Administrator Guide</a>.
|
|
|
</p>
|
|
|
</div>
|
|
|
-<a name="N10226"></a><a name="Scalability"></a>
|
|
|
+<a name="N101F9"></a><a name="Scalability"></a>
|
|
|
<h2 class="h3"> Scalability </h2>
|
|
|
<div class="section">
|
|
|
<p>
|
|
|
Hadoop currently runs on clusters with thousands of nodes.
|
|
|
- <a href="http://wiki.apache.org/hadoop/PoweredBy">PoweredBy Hadoop</a>
|
|
|
+ <a href="http://wiki.apache.org/hadoop/PoweredBy">Powered By Hadoop</a>
|
|
|
lists some of the organizations that deploy Hadoop on large
|
|
|
- clusters. HDFS has one Namenode for each cluster. Currently
|
|
|
- the total memory available on Namenode is the primary scalability
|
|
|
+ clusters. HDFS has one NameNode for each cluster. Currently
|
|
|
+ the total memory available on NameNode is the primary scalability
|
|
|
limitation. On very large clusters, increasing average size of
|
|
|
files stored in HDFS helps with increasing cluster size without
|
|
|
- increasing memory requirements on Namenode.
|
|
|
+ increasing memory requirements on NameNode.
|
|
|
|
|
|
The default configuration may not suit very large clusters.
|
|
|
<a href="http://wiki.apache.org/hadoop/FAQ">Hadoop FAQ</a> page lists
|
|
|
suggested configuration improvements for large Hadoop clusters.
|
|
|
</p>
|
|
|
</div>
|
|
|
-<a name="N10238"></a><a name="Related+Documentation"></a>
|
|
|
+<a name="N1020B"></a><a name="Related+Documentation"></a>
|
|
|
<h2 class="h3"> Related Documentation </h2>
|
|
|
<div class="section">
|
|
|
<p>
|
|
|
- This user guide is intended to be a good starting point for
|
|
|
- working with HDFS. While it continues to improve,
|
|
|
+ This user guide is a good starting point for
|
|
|
+ working with HDFS. While the user guide continues to improve,
|
|
|
there is a large wealth of documentation about Hadoop and HDFS.
|
|
|
- The following lists starting points for further exploration :
|
|
|
+ The following list is a starting point for further exploration:
|
|
|
</p>
|
|
|
<ul>
|
|
|
|
|
|
<li>
|
|
|
|
|
|
-<a href="http://hadoop.apache.org/">Hadoop Home Page</a>
|
|
|
- : the start page for everything Hadoop.
|
|
|
+<a href="http://hadoop.apache.org/">Hadoop Home Page</a>: The start page for everything Hadoop.
|
|
|
</li>
|
|
|
|
|
|
<li>
|