Procházet zdrojové kódy

HADOOP-10086. Merging r1561777 from branch-2 to branch-2.3

git-svn-id: https://svn.apache.org/repos/asf/hadoop/common/branches/branch-2.3@1561781 13f79535-47bb-0310-9956-ffa450edef68
Arpit Agarwal před 11 roky
rodič
revize
8f39b6676a

+ 0 - 434
hadoop-common-project/hadoop-common/src/site/apt/ClusterSetup.apt.vm

@@ -571,440 +571,6 @@ $ $HADOOP_YARN_HOME/sbin/yarn-daemon.sh stop proxyserver --config $HADOOP_CONF_D
 $ $HADOOP_PREFIX/sbin/mr-jobhistory-daemon.sh stop historyserver --config $HADOOP_CONF_DIR
 ----    	
 
-* {Running Hadoop in Secure Mode}
-
-  This section deals with important parameters to be specified in
-  to run Hadoop in <<secure mode>> with strong, Kerberos-based
-  authentication.
-
-  * <<<User Accounts for Hadoop Daemons>>>
-
-  Ensure that HDFS and YARN daemons run as different Unix users, for e.g.
-  <<<hdfs>>> and <<<yarn>>>. Also, ensure that the MapReduce JobHistory
-  server runs as user <<<mapred>>>.
-
-  It's recommended to have them share a Unix group, for e.g. <<<hadoop>>>.
-
-*---------------+----------------------------------------------------------------------+
-|| User:Group   || Daemons                                                             |
-*---------------+----------------------------------------------------------------------+
-| hdfs:hadoop   | NameNode, Secondary NameNode, Checkpoint Node, Backup Node, DataNode |
-*---------------+----------------------------------------------------------------------+
-| yarn:hadoop   | ResourceManager, NodeManager                                         |
-*---------------+----------------------------------------------------------------------+
-| mapred:hadoop | MapReduce JobHistory Server                                          |
-*---------------+----------------------------------------------------------------------+
-
-  * <<<Permissions for both HDFS and local fileSystem paths>>>
-
-  The following table lists various paths on HDFS and local filesystems (on
-  all nodes) and recommended permissions:
-
-*-------------------+-------------------+------------------+------------------+
-|| Filesystem       || Path             || User:Group      || Permissions     |
-*-------------------+-------------------+------------------+------------------+
-| local | <<<dfs.namenode.name.dir>>> | hdfs:hadoop | drwx------ |
-*-------------------+-------------------+------------------+------------------+
-| local | <<<dfs.datanode.data.dir>>> | hdfs:hadoop | drwx------ |
-*-------------------+-------------------+------------------+------------------+
-| local | $HADOOP_LOG_DIR | hdfs:hadoop | drwxrwxr-x |
-*-------------------+-------------------+------------------+------------------+
-| local | $YARN_LOG_DIR | yarn:hadoop | drwxrwxr-x |
-*-------------------+-------------------+------------------+------------------+
-| local | <<<yarn.nodemanager.local-dirs>>> | yarn:hadoop | drwxr-xr-x |
-*-------------------+-------------------+------------------+------------------+
-| local | <<<yarn.nodemanager.log-dirs>>> | yarn:hadoop | drwxr-xr-x |
-*-------------------+-------------------+------------------+------------------+
-| local | container-executor | root:hadoop | --Sr-s--- |
-*-------------------+-------------------+------------------+------------------+
-| local | <<<conf/container-executor.cfg>>> | root:hadoop | r-------- |
-*-------------------+-------------------+------------------+------------------+
-| hdfs | / | hdfs:hadoop | drwxr-xr-x |
-*-------------------+-------------------+------------------+------------------+
-| hdfs | /tmp | hdfs:hadoop | drwxrwxrwxt |
-*-------------------+-------------------+------------------+------------------+
-| hdfs | /user | hdfs:hadoop | drwxr-xr-x |
-*-------------------+-------------------+------------------+------------------+
-| hdfs | <<<yarn.nodemanager.remote-app-log-dir>>> | yarn:hadoop | drwxrwxrwxt |
-*-------------------+-------------------+------------------+------------------+
-| hdfs | <<<mapreduce.jobhistory.intermediate-done-dir>>> | mapred:hadoop | |
-| | | | drwxrwxrwxt |
-*-------------------+-------------------+------------------+------------------+
-| hdfs | <<<mapreduce.jobhistory.done-dir>>> | mapred:hadoop | |
-| | | | drwxr-x--- |
-*-------------------+-------------------+------------------+------------------+
-
-  * Kerberos Keytab files
-
-    * HDFS
-
-    The NameNode keytab file, on the NameNode host, should look like the
-    following:
-
-----
-$ /usr/kerberos/bin/klist -e -k -t /etc/security/keytab/nn.service.keytab
-Keytab name: FILE:/etc/security/keytab/nn.service.keytab
-KVNO Timestamp         Principal
-   4 07/18/11 21:08:09 nn/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
-   4 07/18/11 21:08:09 nn/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
-   4 07/18/11 21:08:09 nn/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
-   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
-   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
-   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
-----
-
-    The Secondary NameNode keytab file, on that host, should look like the
-    following:
-
-----
-$ /usr/kerberos/bin/klist -e -k -t /etc/security/keytab/sn.service.keytab
-Keytab name: FILE:/etc/security/keytab/sn.service.keytab
-KVNO Timestamp         Principal
-   4 07/18/11 21:08:09 sn/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
-   4 07/18/11 21:08:09 sn/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
-   4 07/18/11 21:08:09 sn/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
-   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
-   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
-   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
-----
-
-    The DataNode keytab file, on each host, should look like the following:
-
-----
-$ /usr/kerberos/bin/klist -e -k -t /etc/security/keytab/dn.service.keytab
-Keytab name: FILE:/etc/security/keytab/dn.service.keytab
-KVNO Timestamp         Principal
-   4 07/18/11 21:08:09 dn/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
-   4 07/18/11 21:08:09 dn/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
-   4 07/18/11 21:08:09 dn/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
-   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
-   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
-   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
-----
-
-    * YARN
-
-    The ResourceManager keytab file, on the ResourceManager host, should look
-    like the following:
-
-----
-$ /usr/kerberos/bin/klist -e -k -t /etc/security/keytab/rm.service.keytab
-Keytab name: FILE:/etc/security/keytab/rm.service.keytab
-KVNO Timestamp         Principal
-   4 07/18/11 21:08:09 rm/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
-   4 07/18/11 21:08:09 rm/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
-   4 07/18/11 21:08:09 rm/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
-   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
-   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
-   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
-----
-
-    The NodeManager keytab file, on each host, should look like the following:
-
-----
-$ /usr/kerberos/bin/klist -e -k -t /etc/security/keytab/nm.service.keytab
-Keytab name: FILE:/etc/security/keytab/nm.service.keytab
-KVNO Timestamp         Principal
-   4 07/18/11 21:08:09 nm/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
-   4 07/18/11 21:08:09 nm/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
-   4 07/18/11 21:08:09 nm/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
-   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
-   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
-   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
-----
-
-    * MapReduce JobHistory Server
-
-    The MapReduce JobHistory Server keytab file, on that host, should look
-    like the following:
-
-----
-$ /usr/kerberos/bin/klist -e -k -t /etc/security/keytab/jhs.service.keytab
-Keytab name: FILE:/etc/security/keytab/jhs.service.keytab
-KVNO Timestamp         Principal
-   4 07/18/11 21:08:09 jhs/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
-   4 07/18/11 21:08:09 jhs/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
-   4 07/18/11 21:08:09 jhs/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
-   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
-   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
-   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
-----
-
-** Configuration in Secure Mode
-
-  * <<<conf/core-site.xml>>>
-
-*-------------------------+-------------------------+------------------------+
-|| Parameter              || Value                  || Notes                 |
-*-------------------------+-------------------------+------------------------+
-| <<<hadoop.security.authentication>>> | <kerberos> | <simple> is non-secure. |
-*-------------------------+-------------------------+------------------------+
-| <<<hadoop.security.authorization>>> | <true> | |
-| | | Enable RPC service-level authorization. |
-*-------------------------+-------------------------+------------------------+
-
-  * <<<conf/hdfs-site.xml>>>
-
-    * Configurations for NameNode:
-
-*-------------------------+-------------------------+------------------------+
-|| Parameter              || Value                  || Notes                 |
-*-------------------------+-------------------------+------------------------+
-| <<<dfs.block.access.token.enable>>> | <true> |  |
-| | | Enable HDFS block access tokens for secure operations. |
-*-------------------------+-------------------------+------------------------+
-| <<<dfs.https.enable>>> | <true> | |
-| | | This value is deprecated. Use dfs.http.policy |
-*-------------------------+-------------------------+------------------------+
-| <<<dfs.http.policy>>> | <HTTP_ONLY> or <HTTPS_ONLY> or <HTTP_AND_HTTPS> | |
-| | | HTTPS_ONLY turns off http access |
-*-------------------------+-------------------------+------------------------+
-| <<<dfs.namenode.https-address>>> | <nn_host_fqdn:50470> | |
-*-------------------------+-------------------------+------------------------+
-| <<<dfs.https.port>>> | <50470> | |
-*-------------------------+-------------------------+------------------------+
-| <<<dfs.namenode.keytab.file>>> | </etc/security/keytab/nn.service.keytab> | |
-| | | Kerberos keytab file for the NameNode. |
-*-------------------------+-------------------------+------------------------+
-| <<<dfs.namenode.kerberos.principal>>> | nn/_HOST@REALM.TLD | |
-| | | Kerberos principal name for the NameNode. |
-*-------------------------+-------------------------+------------------------+
-| <<<dfs.namenode.kerberos.https.principal>>> | host/_HOST@REALM.TLD | |
-| | | HTTPS Kerberos principal name for the NameNode. |
-*-------------------------+-------------------------+------------------------+
-
-    * Configurations for Secondary NameNode:
-
-*-------------------------+-------------------------+------------------------+
-|| Parameter              || Value                  || Notes                 |
-*-------------------------+-------------------------+------------------------+
-| <<<dfs.namenode.secondary.http-address>>> | <c_nn_host_fqdn:50090> | |
-*-------------------------+-------------------------+------------------------+
-| <<<dfs.namenode.secondary.https-port>>> | <50470> | |
-*-------------------------+-------------------------+------------------------+
-| <<<dfs.namenode.secondary.keytab.file>>> | | |
-| | </etc/security/keytab/sn.service.keytab> | |
-| | | Kerberos keytab file for the NameNode. |
-*-------------------------+-------------------------+------------------------+
-| <<<dfs.namenode.secondary.kerberos.principal>>> | sn/_HOST@REALM.TLD | |
-| | | Kerberos principal name for the Secondary NameNode. |
-*-------------------------+-------------------------+------------------------+
-| <<<dfs.namenode.secondary.kerberos.https.principal>>> | | |
-| | host/_HOST@REALM.TLD | |
-| | | HTTPS Kerberos principal name for the Secondary NameNode. |
-*-------------------------+-------------------------+------------------------+
-
-    * Configurations for DataNode:
-
-*-------------------------+-------------------------+------------------------+
-|| Parameter              || Value                  || Notes                 |
-*-------------------------+-------------------------+------------------------+
-| <<<dfs.datanode.data.dir.perm>>> | 700 | |
-*-------------------------+-------------------------+------------------------+
-| <<<dfs.datanode.address>>> | <0.0.0.0:2003> | |
-*-------------------------+-------------------------+------------------------+
-| <<<dfs.datanode.https.address>>> | <0.0.0.0:2005> | |
-*-------------------------+-------------------------+------------------------+
-| <<<dfs.datanode.keytab.file>>> | </etc/security/keytab/dn.service.keytab> | |
-| | | Kerberos keytab file for the DataNode. |
-*-------------------------+-------------------------+------------------------+
-| <<<dfs.datanode.kerberos.principal>>> | dn/_HOST@REALM.TLD | |
-| | | Kerberos principal name for the DataNode. |
-*-------------------------+-------------------------+------------------------+
-| <<<dfs.datanode.kerberos.https.principal>>> | | |
-| | host/_HOST@REALM.TLD | |
-| | | HTTPS Kerberos principal name for the DataNode. |
-*-------------------------+-------------------------+------------------------+
-
-  * <<<conf/yarn-site.xml>>>
-
-    * WebAppProxy
-
-    The <<<WebAppProxy>>> provides a proxy between the web applications
-    exported by an application and an end user.  If security is enabled
-    it will warn users before accessing a potentially unsafe web application.
-    Authentication and authorization using the proxy is handled just like
-    any other privileged web application.
-
-*-------------------------+-------------------------+------------------------+
-|| Parameter              || Value                  || Notes                 |
-*-------------------------+-------------------------+------------------------+
-| <<<yarn.web-proxy.address>>> | | |
-| | <<<WebAppProxy>>> host:port for proxy to AM web apps. | |
-| | | <host:port> if this is the same as <<<yarn.resourcemanager.webapp.address>>>|
-| | | or it is not defined then the <<<ResourceManager>>> will run the proxy|
-| | | otherwise a standalone proxy server will need to be launched.|
-*-------------------------+-------------------------+------------------------+
-| <<<yarn.web-proxy.keytab>>> | | |
-| | </etc/security/keytab/web-app.service.keytab> | |
-| | | Kerberos keytab file for the WebAppProxy. |
-*-------------------------+-------------------------+------------------------+
-| <<<yarn.web-proxy.principal>>> | wap/_HOST@REALM.TLD | |
-| | | Kerberos principal name for the WebAppProxy. |
-*-------------------------+-------------------------+------------------------+
-
-    * LinuxContainerExecutor
-
-    A <<<ContainerExecutor>>> used by YARN framework which define how any
-    <container> launched and controlled.
-
-    The following are the available in Hadoop YARN:
-
-*--------------------------------------+--------------------------------------+
-|| ContainerExecutor                   || Description                         |
-*--------------------------------------+--------------------------------------+
-| <<<DefaultContainerExecutor>>>             | |
-| | The default executor which YARN uses to manage container execution. |
-| | The container process has the same Unix user as the NodeManager.  |
-*--------------------------------------+--------------------------------------+
-| <<<LinuxContainerExecutor>>>               | |
-| | Supported only on GNU/Linux, this executor runs the containers as either the |
-| | YARN user who submitted the application (when full security is enabled) or |
-| | as a dedicated user (defaults to nobody) when full security is not enabled. |
-| | When full security is enabled, this executor requires all user accounts to be |
-| | created on the cluster nodes where the containers are launched. It uses |
-| | a <setuid> executable that is included in the Hadoop distribution. |
-| | The NodeManager uses this executable to launch and kill containers. |
-| | The setuid executable switches to the user who has submitted the |
-| | application and launches or kills the containers. For maximum security, |
-| | this executor sets up restricted permissions and user/group ownership of |
-| | local files and directories used by the containers such as the shared |
-| | objects, jars, intermediate files, log files etc. Particularly note that, |
-| | because of this, except the application owner and NodeManager, no other |
-| | user can access any of the local files/directories including those |
-| | localized as part of the distributed cache. |
-*--------------------------------------+--------------------------------------+
-
-    To build the LinuxContainerExecutor executable run:
-
-----
- $ mvn package -Dcontainer-executor.conf.dir=/etc/hadoop/
-----
-
-    The path passed in <<<-Dcontainer-executor.conf.dir>>> should be the
-    path on the cluster nodes where a configuration file for the setuid
-    executable should be located. The executable should be installed in
-    $HADOOP_YARN_HOME/bin.
-
-    The executable must have specific permissions: 6050 or --Sr-s---
-    permissions user-owned by <root> (super-user) and group-owned by a
-    special group (e.g. <<<hadoop>>>) of which the NodeManager Unix user is
-    the group member and no ordinary application user is. If any application
-    user belongs to this special group, security will be compromised. This
-    special group name should be specified for the configuration property
-    <<<yarn.nodemanager.linux-container-executor.group>>> in both
-    <<<conf/yarn-site.xml>>> and <<<conf/container-executor.cfg>>>.
-
-    For example, let's say that the NodeManager is run as user <yarn> who is
-    part of the groups users and <hadoop>, any of them being the primary group.
-    Let also be that <users> has both <yarn> and another user
-    (application submitter) <alice> as its members, and <alice> does not
-    belong to <hadoop>. Going by the above description, the setuid/setgid
-    executable should be set 6050 or --Sr-s--- with user-owner as <yarn> and
-    group-owner as <hadoop> which has <yarn> as its member (and not <users>
-    which has <alice> also as its member besides <yarn>).
-
-    The LinuxTaskController requires that paths including and leading up to
-    the directories specified in <<<yarn.nodemanager.local-dirs>>> and
-    <<<yarn.nodemanager.log-dirs>>> to be set 755 permissions as described
-    above in the table on permissions on directories.
-
-      * <<<conf/container-executor.cfg>>>
-
-    The executable requires a configuration file called
-    <<<container-executor.cfg>>> to be present in the configuration
-    directory passed to the mvn target mentioned above.
-
-    The configuration file must be owned by the user running NodeManager
-    (user <<<yarn>>> in the above example), group-owned by anyone and
-    should have the permissions 0400 or r--------.
-
-    The executable requires following configuration items to be present
-    in the <<<conf/container-executor.cfg>>> file. The items should be
-    mentioned as simple key=value pairs, one per-line:
-
-*-------------------------+-------------------------+------------------------+
-|| Parameter              || Value                  || Notes                 |
-*-------------------------+-------------------------+------------------------+
-| <<<yarn.nodemanager.linux-container-executor.group>>> | <hadoop> | |
-| | | Unix group of the NodeManager. The group owner of the |
-| | |<container-executor> binary should be this group. Should be same as the |
-| | | value with which the NodeManager is configured. This configuration is |
-| | | required for validating the secure access of the <container-executor> |
-| | | binary. |
-*-------------------------+-------------------------+------------------------+
-| <<<banned.users>>> | hfds,yarn,mapred,bin | Banned users. |
-*-------------------------+-------------------------+------------------------+
-| <<<allowed.system.users>>> | foo,bar | Allowed system users. |
-*-------------------------+-------------------------+------------------------+
-| <<<min.user.id>>> | 1000 | Prevent other super-users. |
-*-------------------------+-------------------------+------------------------+
-
-      To re-cap, here are the local file-sysytem permissions required for the
-      various paths related to the <<<LinuxContainerExecutor>>>:
-
-*-------------------+-------------------+------------------+------------------+
-|| Filesystem       || Path             || User:Group      || Permissions     |
-*-------------------+-------------------+------------------+------------------+
-| local | container-executor | root:hadoop | --Sr-s--- |
-*-------------------+-------------------+------------------+------------------+
-| local | <<<conf/container-executor.cfg>>> | root:hadoop | r-------- |
-*-------------------+-------------------+------------------+------------------+
-| local | <<<yarn.nodemanager.local-dirs>>> | yarn:hadoop | drwxr-xr-x |
-*-------------------+-------------------+------------------+------------------+
-| local | <<<yarn.nodemanager.log-dirs>>> | yarn:hadoop | drwxr-xr-x |
-*-------------------+-------------------+------------------+------------------+
-
-      * Configurations for ResourceManager:
-
-*-------------------------+-------------------------+------------------------+
-|| Parameter              || Value                  || Notes                 |
-*-------------------------+-------------------------+------------------------+
-| <<<yarn.resourcemanager.keytab>>> | | |
-| | </etc/security/keytab/rm.service.keytab> | |
-| | | Kerberos keytab file for the ResourceManager. |
-*-------------------------+-------------------------+------------------------+
-| <<<yarn.resourcemanager.principal>>> | rm/_HOST@REALM.TLD | |
-| | | Kerberos principal name for the ResourceManager. |
-*-------------------------+-------------------------+------------------------+
-
-      * Configurations for NodeManager:
-
-*-------------------------+-------------------------+------------------------+
-|| Parameter              || Value                  || Notes                 |
-*-------------------------+-------------------------+------------------------+
-| <<<yarn.nodemanager.keytab>>> | </etc/security/keytab/nm.service.keytab> | |
-| | | Kerberos keytab file for the NodeManager. |
-*-------------------------+-------------------------+------------------------+
-| <<<yarn.nodemanager.principal>>> | nm/_HOST@REALM.TLD | |
-| | | Kerberos principal name for the NodeManager. |
-*-------------------------+-------------------------+------------------------+
-| <<<yarn.nodemanager.container-executor.class>>> | | |
-| | <<<org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor>>> |
-| | | Use LinuxContainerExecutor. |
-*-------------------------+-------------------------+------------------------+
-| <<<yarn.nodemanager.linux-container-executor.group>>> | <hadoop> | |
-| | | Unix group of the NodeManager. |
-*-------------------------+-------------------------+------------------------+
-
-    * <<<conf/mapred-site.xml>>>
-
-      * Configurations for MapReduce JobHistory Server:
-
-*-------------------------+-------------------------+------------------------+
-|| Parameter              || Value                  || Notes                 |
-*-------------------------+-------------------------+------------------------+
-| <<<mapreduce.jobhistory.address>>> | | |
-| | MapReduce JobHistory Server <host:port> | Default port is 10020. |
-*-------------------------+-------------------------+------------------------+
-| <<<mapreduce.jobhistory.keytab>>> | |
-| | </etc/security/keytab/jhs.service.keytab> | |
-| | | Kerberos keytab file for the MapReduce JobHistory Server. |
-*-------------------------+-------------------------+------------------------+
-| <<<mapreduce.jobhistory.principal>>> | jhs/_HOST@REALM.TLD | |
-| | | Kerberos principal name for the MapReduce JobHistory Server. |
-*-------------------------+-------------------------+------------------------+
-
 
 * {Operating the Hadoop Cluster}
 

+ 637 - 0
hadoop-common-project/hadoop-common/src/site/apt/SecureMode.apt.vm

@@ -0,0 +1,637 @@
+~~ Licensed under the Apache License, Version 2.0 (the "License");
+~~ you may not use this file except in compliance with the License.
+~~ You may obtain a copy of the License at
+~~
+~~   http://www.apache.org/licenses/LICENSE-2.0
+~~
+~~ Unless required by applicable law or agreed to in writing, software
+~~ distributed under the License is distributed on an "AS IS" BASIS,
+~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+~~ See the License for the specific language governing permissions and
+~~ limitations under the License. See accompanying LICENSE file.
+
+  ---
+  Hadoop in Secure Mode
+  ---
+  ---
+  ${maven.build.timestamp}
+
+%{toc|section=0|fromDepth=0|toDepth=3}
+
+Hadoop in Secure Mode
+
+* Introduction
+
+  This document describes how to configure authentication for Hadoop in
+  secure mode.
+
+  By default Hadoop runs in non-secure mode in which no actual
+  authentication is required.
+  By configuring Hadoop runs in secure mode,
+  each user and service needs to be authenticated by Kerberos
+  in order to use Hadoop services.
+
+  Security features of Hadoop consist of
+  {{{Authentication}authentication}},
+  {{{./ServiceLevelAuth.html}service level authorization}},
+  {{{./HttpAuthentication.html}authentication for Web consoles}}
+  and {{{Data confidentiality}data confidenciality}}.
+
+
+* Authentication
+
+** End User Accounts
+
+  When service level authentication is turned on,
+  end users using Hadoop in secure mode needs to be authenticated by Kerberos.
+  The simplest way to do authentication is using <<<kinit>>> command of Kerberos.
+
+** User Accounts for Hadoop Daemons
+
+  Ensure that HDFS and YARN daemons run as different Unix users,
+  e.g. <<<hdfs>>> and <<<yarn>>>.
+  Also, ensure that the MapReduce JobHistory server runs as
+  different user such as <<<mapred>>>.
+
+  It's recommended to have them share a Unix group, for e.g. <<<hadoop>>>.
+  See also "{{Mapping from user to group}}" for group management.
+
+*---------------+----------------------------------------------------------------------+
+|| User:Group   || Daemons                                                             |
+*---------------+----------------------------------------------------------------------+
+| hdfs:hadoop   | NameNode, Secondary NameNode, JournalNode, DataNode                  |
+*---------------+----------------------------------------------------------------------+
+| yarn:hadoop   | ResourceManager, NodeManager                                         |
+*---------------+----------------------------------------------------------------------+
+| mapred:hadoop | MapReduce JobHistory Server                                          |
+*---------------+----------------------------------------------------------------------+
+
+** Kerberos principals for Hadoop Daemons and Users
+
+  For running hadoop service daemons in  Hadoop in secure mode,
+  Kerberos principals are required.
+  Each service reads auhenticate information saved in keytab file with appropriate permission.
+
+  HTTP web-consoles should be served by principal different from RPC's one.
+
+  Subsections below shows the examples of credentials for Hadoop services.
+
+*** HDFS
+
+    The NameNode keytab file, on the NameNode host, should look like the
+    following:
+
+----
+$ klist -e -k -t /etc/security/keytab/nn.service.keytab
+Keytab name: FILE:/etc/security/keytab/nn.service.keytab
+KVNO Timestamp         Principal
+   4 07/18/11 21:08:09 nn/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
+   4 07/18/11 21:08:09 nn/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
+   4 07/18/11 21:08:09 nn/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
+   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
+   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
+   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
+----
+
+    The Secondary NameNode keytab file, on that host, should look like the
+    following:
+
+----
+$ klist -e -k -t /etc/security/keytab/sn.service.keytab
+Keytab name: FILE:/etc/security/keytab/sn.service.keytab
+KVNO Timestamp         Principal
+   4 07/18/11 21:08:09 sn/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
+   4 07/18/11 21:08:09 sn/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
+   4 07/18/11 21:08:09 sn/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
+   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
+   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
+   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
+----
+
+    The DataNode keytab file, on each host, should look like the following:
+
+----
+$ klist -e -k -t /etc/security/keytab/dn.service.keytab
+Keytab name: FILE:/etc/security/keytab/dn.service.keytab
+KVNO Timestamp         Principal
+   4 07/18/11 21:08:09 dn/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
+   4 07/18/11 21:08:09 dn/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
+   4 07/18/11 21:08:09 dn/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
+   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
+   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
+   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
+----
+
+*** YARN
+
+    The ResourceManager keytab file, on the ResourceManager host, should look
+    like the following:
+
+----
+$ klist -e -k -t /etc/security/keytab/rm.service.keytab
+Keytab name: FILE:/etc/security/keytab/rm.service.keytab
+KVNO Timestamp         Principal
+   4 07/18/11 21:08:09 rm/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
+   4 07/18/11 21:08:09 rm/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
+   4 07/18/11 21:08:09 rm/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
+   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
+   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
+   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
+----
+
+    The NodeManager keytab file, on each host, should look like the following:
+
+----
+$ klist -e -k -t /etc/security/keytab/nm.service.keytab
+Keytab name: FILE:/etc/security/keytab/nm.service.keytab
+KVNO Timestamp         Principal
+   4 07/18/11 21:08:09 nm/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
+   4 07/18/11 21:08:09 nm/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
+   4 07/18/11 21:08:09 nm/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
+   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
+   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
+   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
+----
+
+*** MapReduce JobHistory Server
+
+    The MapReduce JobHistory Server keytab file, on that host, should look
+    like the following:
+
+----
+$ klist -e -k -t /etc/security/keytab/jhs.service.keytab
+Keytab name: FILE:/etc/security/keytab/jhs.service.keytab
+KVNO Timestamp         Principal
+   4 07/18/11 21:08:09 jhs/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
+   4 07/18/11 21:08:09 jhs/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
+   4 07/18/11 21:08:09 jhs/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
+   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-256 CTS mode with 96-bit SHA-1 HMAC)
+   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (AES-128 CTS mode with 96-bit SHA-1 HMAC)
+   4 07/18/11 21:08:09 host/full.qualified.domain.name@REALM.TLD (ArcFour with HMAC/md5)
+----
+
+** Mapping from Kerberos principal to OS user account
+
+  Hadoop maps Kerberos principal to OS user account using
+  the rule specified by <<<hadoop.security.auth_to_local>>>
+  which works in the same way as the <<<auth_to_local>>> in
+  {{{http://web.mit.edu/Kerberos/krb5-latest/doc/admin/conf_files/krb5_conf.html}Kerberos configuration file (krb5.conf)}}.
+
+  By default, it picks the first component of principal name as a user name
+  if the realms matches to the <<<defalut_realm>>> (usually defined in /etc/krb5.conf).
+  For example, <<<host/full.qualified.domain.name@REALM.TLD>>> is mapped to <<<host>>>
+  by default rule.
+
+** Mapping from user to group
+
+  Though files on HDFS are associated to owner and group,
+  Hadoop does not have the definition of group by itself.
+  Mapping from user to group is done by OS or LDAP.
+
+  You can change a way of mapping by
+  specifying the name of mapping provider as a value of
+  <<<hadoop.security.group.mapping>>>
+  See {{{../hadoop-hdfs/HdfsPermissionsGuide.html}HDFS Permissions Guide}} for details.
+
+  Practically you need to manage SSO environment using Kerberos with LDAP
+  for Hadoop in secure mode.
+
+** Proxy user
+
+  Some products such as Apache Oozie which access the services of Hadoop
+  on behalf of end users need to be able to impersonate end users.
+  You can configure proxy user using properties
+  <<<hadoop.proxyuser.${superuser}.hosts>>> and <<<hadoop.proxyuser.${superuser}.groups>>>.
+
+  For example, by specifying as below in core-site.xml,
+  user named <<<oozie>>> accessing from any host
+  can impersonate any user belonging to any group.
+
+----
+  <property>
+    <name>hadoop.proxyuser.oozie.hosts</name>
+    <value>*</value>
+  </property>
+  <property>
+    <name>hadoop.proxyuser.oozie.groups</name>
+    <value>*</value>
+  </property>
+----
+
+** Secure DataNode
+
+  Because the data transfer protocol of DataNode
+  does not use the RPC framework of Hadoop,
+  DataNode must authenticate itself by
+  using privileged ports which are specified by
+  <<<dfs.datanode.address>>> and <<<dfs.datanode.http.address>>>.
+  This authentication is based on the assumption
+  that the attacker won't be able to get root privileges.
+
+  When you execute <<<hdfs datanode>>> command as root,
+  server process binds privileged port at first,
+  then drops privilege and runs as the user account specified by
+  <<<HADOOP_SECURE_DN_USER>>>.
+  This startup process uses jsvc installed to <<<JSVC_HOME>>>.
+  You must specify <<<HADOOP_SECURE_DN_USER>>> and <<<JSVC_HOME>>>
+  as environment variables on start up (in hadoop-env.sh).
+
+
+* Data confidentiality
+
+** Data Encryption on RPC
+
+  The data transfered between hadoop services and clients.
+  Setting <<<hadoop.rpc.protection>>> to <<<"privacy">>> in the core-site.xml
+  activate data encryption.
+
+** Data Encryption on Block data transfer.
+
+  You need to set <<<dfs.encrypt.data.transfer>>> to <<<"true">>> in the hdfs-site.xml
+  in order to activate data encryption for data transfer protocol of DataNode.
+
+** Data Encryption on HTTP
+
+  Data transfer between Web-console and clients are protected by using SSL(HTTPS).
+
+
+* Configuration
+
+** Permissions for both HDFS and local fileSystem paths
+
+  The following table lists various paths on HDFS and local filesystems (on
+  all nodes) and recommended permissions:
+
+*-------------------+-------------------+------------------+------------------+
+|| Filesystem       || Path             || User:Group      || Permissions     |
+*-------------------+-------------------+------------------+------------------+
+| local | <<<dfs.namenode.name.dir>>> | hdfs:hadoop | drwx------ |
+*-------------------+-------------------+------------------+------------------+
+| local | <<<dfs.datanode.data.dir>>> | hdfs:hadoop | drwx------ |
+*-------------------+-------------------+------------------+------------------+
+| local | $HADOOP_LOG_DIR | hdfs:hadoop | drwxrwxr-x |
+*-------------------+-------------------+------------------+------------------+
+| local | $YARN_LOG_DIR | yarn:hadoop | drwxrwxr-x |
+*-------------------+-------------------+------------------+------------------+
+| local | <<<yarn.nodemanager.local-dirs>>> | yarn:hadoop | drwxr-xr-x |
+*-------------------+-------------------+------------------+------------------+
+| local | <<<yarn.nodemanager.log-dirs>>> | yarn:hadoop | drwxr-xr-x |
+*-------------------+-------------------+------------------+------------------+
+| local | container-executor | root:hadoop | --Sr-s--- |
+*-------------------+-------------------+------------------+------------------+
+| local | <<<conf/container-executor.cfg>>> | root:hadoop | r-------- |
+*-------------------+-------------------+------------------+------------------+
+| hdfs | / | hdfs:hadoop | drwxr-xr-x |
+*-------------------+-------------------+------------------+------------------+
+| hdfs | /tmp | hdfs:hadoop | drwxrwxrwxt |
+*-------------------+-------------------+------------------+------------------+
+| hdfs | /user | hdfs:hadoop | drwxr-xr-x |
+*-------------------+-------------------+------------------+------------------+
+| hdfs | <<<yarn.nodemanager.remote-app-log-dir>>> | yarn:hadoop | drwxrwxrwxt |
+*-------------------+-------------------+------------------+------------------+
+| hdfs | <<<mapreduce.jobhistory.intermediate-done-dir>>> | mapred:hadoop | |
+| | | | drwxrwxrwxt |
+*-------------------+-------------------+------------------+------------------+
+| hdfs | <<<mapreduce.jobhistory.done-dir>>> | mapred:hadoop | |
+| | | | drwxr-x--- |
+*-------------------+-------------------+------------------+------------------+
+
+** Common Configurations
+
+  In order to turn on RPC authentication in hadoop,
+  set the value of <<<hadoop.security.authentication>>> property to
+  <<<"kerberos">>>, and set security related settings listed below appropriately.
+
+  The following properties should be in the <<<core-site.xml>>> of all the
+  nodes in the cluster.
+
+*-------------------------+-------------------------+------------------------+
+|| Parameter              || Value                  || Notes                 |
+*-------------------------+-------------------------+------------------------+
+| <<<hadoop.security.authentication>>> | <kerberos> | |
+| | | <<<simple>>> : No authentication. (default) \
+| | | <<<kerberos>>> : Enable authentication by Kerberos. |
+*-------------------------+-------------------------+------------------------+
+| <<<hadoop.security.authorization>>> | <true> | |
+| | | Enable {{{./ServiceLevelAuth.html}RPC service-level authorization}}. |
+*-------------------------+-------------------------+------------------------+
+| <<<hadoop.rpc.protection>>> | <authentication> |
+| | | <authentication> : authentication only (default) \
+| | | <integrity> : integrity check in addition to authentication \
+| | | <privacy> : data encryption in addition to integrity |
+*-------------------------+-------------------------+------------------------+
+| <<<hadoop.security.auth_to_local>>> | | |
+| |  <<<RULE:>>><exp1>\
+| |  <<<RULE:>>><exp2>\
+| |  <...>\
+| |  DEFAULT |
+| | | The value is string containing new line characters.
+| | | See
+| | | {{{http://web.mit.edu/Kerberos/krb5-latest/doc/admin/conf_files/krb5_conf.html}Kerberos documentation}}
+| | | for format for <exp>.
+*-------------------------+-------------------------+------------------------+
+| <<<hadoop.proxyuser.>>><superuser><<<.hosts>>> | | |
+| | | comma separated hosts from which <superuser> access are allowd to impersonation. |
+| | | <<<*>>> means wildcard. |
+*-------------------------+-------------------------+------------------------+
+| <<<hadoop.proxyuser.>>><superuser><<<.groups>>> | | |
+| | | comma separated groups to which users impersonated by <superuser> belongs. |
+| | | <<<*>>> means wildcard. |
+*-------------------------+-------------------------+------------------------+
+Configuration for <<<conf/core-site.xml>>>
+
+**  NameNode
+
+*-------------------------+-------------------------+------------------------+
+|| Parameter              || Value                  || Notes                 |
+*-------------------------+-------------------------+------------------------+
+| <<<dfs.block.access.token.enable>>> | <true> |  |
+| | | Enable HDFS block access tokens for secure operations. |
+*-------------------------+-------------------------+------------------------+
+| <<<dfs.https.enable>>> | <true> | |
+| | | This value is deprecated. Use dfs.http.policy |
+*-------------------------+-------------------------+------------------------+
+| <<<dfs.http.policy>>> | <HTTP_ONLY> or <HTTPS_ONLY> or <HTTP_AND_HTTPS> | |
+| | | HTTPS_ONLY turns off http access |
+*-------------------------+-------------------------+------------------------+
+| <<<dfs.namenode.https-address>>> | <nn_host_fqdn:50470> | |
+*-------------------------+-------------------------+------------------------+
+| <<<dfs.https.port>>> | <50470> | |
+*-------------------------+-------------------------+------------------------+
+| <<<dfs.namenode.keytab.file>>> | </etc/security/keytab/nn.service.keytab> | |
+| | | Kerberos keytab file for the NameNode. |
+*-------------------------+-------------------------+------------------------+
+| <<<dfs.namenode.kerberos.principal>>> | nn/_HOST@REALM.TLD | |
+| | | Kerberos principal name for the NameNode. |
+*-------------------------+-------------------------+------------------------+
+| <<<dfs.namenode.kerberos.https.principal>>> | host/_HOST@REALM.TLD | |
+| | | HTTPS Kerberos principal name for the NameNode. |
+*-------------------------+-------------------------+------------------------+
+Configuration for <<<conf/hdfs-site.xml>>>
+
+**  Secondary NameNode
+
+*-------------------------+-------------------------+------------------------+
+|| Parameter              || Value                  || Notes                 |
+*-------------------------+-------------------------+------------------------+
+| <<<dfs.namenode.secondary.http-address>>> | <c_nn_host_fqdn:50090> | |
+*-------------------------+-------------------------+------------------------+
+| <<<dfs.namenode.secondary.https-port>>> | <50470> | |
+*-------------------------+-------------------------+------------------------+
+| <<<dfs.namenode.secondary.keytab.file>>> | | |
+| | </etc/security/keytab/sn.service.keytab> | |
+| | | Kerberos keytab file for the NameNode. |
+*-------------------------+-------------------------+------------------------+
+| <<<dfs.namenode.secondary.kerberos.principal>>> | sn/_HOST@REALM.TLD | |
+| | | Kerberos principal name for the Secondary NameNode. |
+*-------------------------+-------------------------+------------------------+
+| <<<dfs.namenode.secondary.kerberos.https.principal>>> | | |
+| | host/_HOST@REALM.TLD | |
+| | | HTTPS Kerberos principal name for the Secondary NameNode. |
+*-------------------------+-------------------------+------------------------+
+Configuration for <<<conf/hdfs-site.xml>>>
+
+** DataNode
+
+*-------------------------+-------------------------+------------------------+
+|| Parameter              || Value                  || Notes                 |
+*-------------------------+-------------------------+------------------------+
+| <<<dfs.datanode.data.dir.perm>>> | 700 | |
+*-------------------------+-------------------------+------------------------+
+| <<<dfs.datanode.address>>> | <0.0.0.0:1004> | |
+| | | Secure DataNode must use privileged port |
+| | | in order to assure that the server was started securely. |
+| | | This means that the server must be started via jsvc. |
+*-------------------------+-------------------------+------------------------+
+| <<<dfs.datanode.http.address>>> | <0.0.0.0:1006> | |
+| | | Secure DataNode must use privileged port |
+| | | in order to assure that the server was started securely. |
+| | | This means that the server must be started via jsvc. |
+*-------------------------+-------------------------+------------------------+
+| <<<dfs.datanode.https.address>>> | <0.0.0.0:50470> | |
+*-------------------------+-------------------------+------------------------+
+| <<<dfs.datanode.keytab.file>>> | </etc/security/keytab/dn.service.keytab> | |
+| | | Kerberos keytab file for the DataNode. |
+*-------------------------+-------------------------+------------------------+
+| <<<dfs.datanode.kerberos.principal>>> | dn/_HOST@REALM.TLD | |
+| | | Kerberos principal name for the DataNode. |
+*-------------------------+-------------------------+------------------------+
+| <<<dfs.datanode.kerberos.https.principal>>> | | |
+| | host/_HOST@REALM.TLD | |
+| | | HTTPS Kerberos principal name for the DataNode. |
+*-------------------------+-------------------------+------------------------+
+| <<<dfs.encrypt.data.transfer>>> | <false> | |
+| | | set to <<<true>>> when using data encryption |
+*-------------------------+-------------------------+------------------------+
+Configuration for <<<conf/hdfs-site.xml>>>
+
+
+** WebHDFS
+
+*-------------------------+-------------------------+------------------------+
+|| Parameter              || Value                  || Notes                 |
+*-------------------------+-------------------------+------------------------+
+| <<<dfs.webhdfs.enabled>>> | http/_HOST@REALM.TLD | |
+| | | Enable security on WebHDFS. |
+*-------------------------+-------------------------+------------------------+
+| <<<dfs.web.authentication.kerberos.principal>>> | http/_HOST@REALM.TLD | |
+| | | Kerberos keytab file for the WebHDFS. |
+*-------------------------+-------------------------+------------------------+
+| <<<dfs.web.authentication.kerberos.keytab>>> | </etc/security/keytab/http.service.keytab> | |
+| | | Kerberos principal name for WebHDFS. |
+*-------------------------+-------------------------+------------------------+
+Configuration for <<<conf/hdfs-site.xml>>>
+
+
+** ResourceManager
+
+*-------------------------+-------------------------+------------------------+
+|| Parameter              || Value                  || Notes                 |
+*-------------------------+-------------------------+------------------------+
+| <<<yarn.resourcemanager.keytab>>> | | |
+| | </etc/security/keytab/rm.service.keytab> | |
+| | | Kerberos keytab file for the ResourceManager. |
+*-------------------------+-------------------------+------------------------+
+| <<<yarn.resourcemanager.principal>>> | rm/_HOST@REALM.TLD | |
+| | | Kerberos principal name for the ResourceManager. |
+*-------------------------+-------------------------+------------------------+
+Configuration for  <<<conf/yarn-site.xml>>>
+
+**  NodeManager
+
+*-------------------------+-------------------------+------------------------+
+|| Parameter              || Value                  || Notes                 |
+*-------------------------+-------------------------+------------------------+
+| <<<yarn.nodemanager.keytab>>> | </etc/security/keytab/nm.service.keytab> | |
+| | | Kerberos keytab file for the NodeManager. |
+*-------------------------+-------------------------+------------------------+
+| <<<yarn.nodemanager.principal>>> | nm/_HOST@REALM.TLD | |
+| | | Kerberos principal name for the NodeManager. |
+*-------------------------+-------------------------+------------------------+
+| <<<yarn.nodemanager.container-executor.class>>> | | |
+| | <<<org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor>>> |
+| | | Use LinuxContainerExecutor. |
+*-------------------------+-------------------------+------------------------+
+| <<<yarn.nodemanager.linux-container-executor.group>>> | <hadoop> | |
+| | | Unix group of the NodeManager. |
+*-------------------------+-------------------------+------------------------+
+| <<<yarn.nodemanager.linux-container-executor.path>>> | </path/to/bin/container-executor> | |
+| | | The path to the executable of Linux container executor. |
+*-------------------------+-------------------------+------------------------+
+Configuration for  <<<conf/yarn-site.xml>>>
+
+** Configuration for WebAppProxy
+
+    The <<<WebAppProxy>>> provides a proxy between the web applications
+    exported by an application and an end user.  If security is enabled
+    it will warn users before accessing a potentially unsafe web application.
+    Authentication and authorization using the proxy is handled just like
+    any other privileged web application.
+
+*-------------------------+-------------------------+------------------------+
+|| Parameter              || Value                  || Notes                 |
+*-------------------------+-------------------------+------------------------+
+| <<<yarn.web-proxy.address>>> | | |
+| | <<<WebAppProxy>>> host:port for proxy to AM web apps. | |
+| | | <host:port> if this is the same as <<<yarn.resourcemanager.webapp.address>>>|
+| | | or it is not defined then the <<<ResourceManager>>> will run the proxy|
+| | | otherwise a standalone proxy server will need to be launched.|
+*-------------------------+-------------------------+------------------------+
+| <<<yarn.web-proxy.keytab>>> | | |
+| | </etc/security/keytab/web-app.service.keytab> | |
+| | | Kerberos keytab file for the WebAppProxy. |
+*-------------------------+-------------------------+------------------------+
+| <<<yarn.web-proxy.principal>>> | wap/_HOST@REALM.TLD | |
+| | | Kerberos principal name for the WebAppProxy. |
+*-------------------------+-------------------------+------------------------+
+Configuration for  <<<conf/yarn-site.xml>>>
+
+** LinuxContainerExecutor
+
+    A <<<ContainerExecutor>>> used by YARN framework which define how any
+    <container> launched and controlled.
+
+    The following are the available in Hadoop YARN:
+
+*--------------------------------------+--------------------------------------+
+|| ContainerExecutor                   || Description                         |
+*--------------------------------------+--------------------------------------+
+| <<<DefaultContainerExecutor>>>             | |
+| | The default executor which YARN uses to manage container execution. |
+| | The container process has the same Unix user as the NodeManager.  |
+*--------------------------------------+--------------------------------------+
+| <<<LinuxContainerExecutor>>>               | |
+| | Supported only on GNU/Linux, this executor runs the containers as either the |
+| | YARN user who submitted the application (when full security is enabled) or |
+| | as a dedicated user (defaults to nobody) when full security is not enabled. |
+| | When full security is enabled, this executor requires all user accounts to be |
+| | created on the cluster nodes where the containers are launched. It uses |
+| | a <setuid> executable that is included in the Hadoop distribution. |
+| | The NodeManager uses this executable to launch and kill containers. |
+| | The setuid executable switches to the user who has submitted the |
+| | application and launches or kills the containers. For maximum security, |
+| | this executor sets up restricted permissions and user/group ownership of |
+| | local files and directories used by the containers such as the shared |
+| | objects, jars, intermediate files, log files etc. Particularly note that, |
+| | because of this, except the application owner and NodeManager, no other |
+| | user can access any of the local files/directories including those |
+| | localized as part of the distributed cache. |
+*--------------------------------------+--------------------------------------+
+
+    To build the LinuxContainerExecutor executable run:
+
+----
+ $ mvn package -Dcontainer-executor.conf.dir=/etc/hadoop/
+----
+
+    The path passed in <<<-Dcontainer-executor.conf.dir>>> should be the
+    path on the cluster nodes where a configuration file for the setuid
+    executable should be located. The executable should be installed in
+    $HADOOP_YARN_HOME/bin.
+
+    The executable must have specific permissions: 6050 or --Sr-s---
+    permissions user-owned by <root> (super-user) and group-owned by a
+    special group (e.g. <<<hadoop>>>) of which the NodeManager Unix user is
+    the group member and no ordinary application user is. If any application
+    user belongs to this special group, security will be compromised. This
+    special group name should be specified for the configuration property
+    <<<yarn.nodemanager.linux-container-executor.group>>> in both
+    <<<conf/yarn-site.xml>>> and <<<conf/container-executor.cfg>>>.
+
+    For example, let's say that the NodeManager is run as user <yarn> who is
+    part of the groups users and <hadoop>, any of them being the primary group.
+    Let also be that <users> has both <yarn> and another user
+    (application submitter) <alice> as its members, and <alice> does not
+    belong to <hadoop>. Going by the above description, the setuid/setgid
+    executable should be set 6050 or --Sr-s--- with user-owner as <yarn> and
+    group-owner as <hadoop> which has <yarn> as its member (and not <users>
+    which has <alice> also as its member besides <yarn>).
+
+    The LinuxTaskController requires that paths including and leading up to
+    the directories specified in <<<yarn.nodemanager.local-dirs>>> and
+    <<<yarn.nodemanager.log-dirs>>> to be set 755 permissions as described
+    above in the table on permissions on directories.
+
+      * <<<conf/container-executor.cfg>>>
+
+    The executable requires a configuration file called
+    <<<container-executor.cfg>>> to be present in the configuration
+    directory passed to the mvn target mentioned above.
+
+    The configuration file must be owned by the user running NodeManager
+    (user <<<yarn>>> in the above example), group-owned by anyone and
+    should have the permissions 0400 or r--------.
+
+    The executable requires following configuration items to be present
+    in the <<<conf/container-executor.cfg>>> file. The items should be
+    mentioned as simple key=value pairs, one per-line:
+
+*-------------------------+-------------------------+------------------------+
+|| Parameter              || Value                  || Notes                 |
+*-------------------------+-------------------------+------------------------+
+| <<<yarn.nodemanager.linux-container-executor.group>>> | <hadoop> | |
+| | | Unix group of the NodeManager. The group owner of the |
+| | |<container-executor> binary should be this group. Should be same as the |
+| | | value with which the NodeManager is configured. This configuration is |
+| | | required for validating the secure access of the <container-executor> |
+| | | binary. |
+*-------------------------+-------------------------+------------------------+
+| <<<banned.users>>> | hfds,yarn,mapred,bin | Banned users. |
+*-------------------------+-------------------------+------------------------+
+| <<<allowed.system.users>>> | foo,bar | Allowed system users. |
+*-------------------------+-------------------------+------------------------+
+| <<<min.user.id>>> | 1000 | Prevent other super-users. |
+*-------------------------+-------------------------+------------------------+
+Configuration for  <<<conf/yarn-site.xml>>>
+
+      To re-cap, here are the local file-sysytem permissions required for the
+      various paths related to the <<<LinuxContainerExecutor>>>:
+
+*-------------------+-------------------+------------------+------------------+
+|| Filesystem       || Path             || User:Group      || Permissions     |
+*-------------------+-------------------+------------------+------------------+
+| local | container-executor | root:hadoop | --Sr-s--- |
+*-------------------+-------------------+------------------+------------------+
+| local | <<<conf/container-executor.cfg>>> | root:hadoop | r-------- |
+*-------------------+-------------------+------------------+------------------+
+| local | <<<yarn.nodemanager.local-dirs>>> | yarn:hadoop | drwxr-xr-x |
+*-------------------+-------------------+------------------+------------------+
+| local | <<<yarn.nodemanager.log-dirs>>> | yarn:hadoop | drwxr-xr-x |
+*-------------------+-------------------+------------------+------------------+
+
+**  MapReduce JobHistory Server
+
+*-------------------------+-------------------------+------------------------+
+|| Parameter              || Value                  || Notes                 |
+*-------------------------+-------------------------+------------------------+
+| <<<mapreduce.jobhistory.address>>> | | |
+| | MapReduce JobHistory Server <host:port> | Default port is 10020. |
+*-------------------------+-------------------------+------------------------+
+| <<<mapreduce.jobhistory.keytab>>> | |
+| | </etc/security/keytab/jhs.service.keytab> | |
+| | | Kerberos keytab file for the MapReduce JobHistory Server. |
+*-------------------------+-------------------------+------------------------+
+| <<<mapreduce.jobhistory.principal>>> | jhs/_HOST@REALM.TLD | |
+| | | Kerberos principal name for the MapReduce JobHistory Server. |
+*-------------------------+-------------------------+------------------------+
+Configuration for  <<<conf/mapred-site.xml>>>

+ 3 - 0
hadoop-hdfs-project/hadoop-hdfs/CHANGES.txt

@@ -693,6 +693,9 @@ Release 2.3.0 - UNRELEASED
     HDFS-5677. Need error checking for HA cluster configuration.
     (Vincent Sheffer via cos)
 
+    HADOOP-10086. User document for authentication in secure cluster.
+    (Masatake Iwasaki via Arpit Agarwal)
+
   OPTIMIZATIONS
 
   BUG FIXES

+ 1 - 0
hadoop-project/src/site/site.xml

@@ -59,6 +59,7 @@
       <item name="CLI Mini Cluster" href="hadoop-project-dist/hadoop-common/CLIMiniCluster.html"/>
       <item name="Native Libraries" href="hadoop-project-dist/hadoop-common/NativeLibraries.html"/>
       <item name="Superusers" href="hadoop-project-dist/hadoop-common/Superusers.html"/>
+      <item name="Secure Mode" href="hadoop-project-dist/hadoop-common/SecureMode.html"/>
       <item name="Service Level Authorization" href="hadoop-project-dist/hadoop-common/ServiceLevelAuth.html"/>
       <item name="HTTP Authentication" href="hadoop-project-dist/hadoop-common/HttpAuthentication.html"/>
     </menu>