- ~~ Licensed under the Apache License, Version 2.0 (the "License");
- ~~ you may not use this file except in compliance with the License.
- ~~ You may obtain a copy of the License at
- ~~
- ~~ http://www.apache.org/licenses/LICENSE-2.0
- ~~
- ~~ Unless required by applicable law or agreed to in writing, software
- ~~ distributed under the License is distributed on an "AS IS" BASIS,
- ~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- ~~ See the License for the specific language governing permissions and
- ~~ limitations under the License. See accompanying LICENSE file.
- ---
- Hadoop Distributed File System-${project.version} - HDFS NFS Gateway
- ---
- ---
- ${maven.build.timestamp}
- HDFS NFS Gateway
- %{toc|section=1|fromDepth=0}
- * {Overview}
- The NFS Gateway supports NFSv3 and allows HDFS to be mounted as part of the client's local file system.
- Currently NFS Gateway supports and enables the following usage patterns:
- * Users can browse the HDFS file system through their local file system
- on NFSv3 client compatible operating systems.
- * Users can download files from the HDFS file system on to their
- local file system.
- * Users can upload files from their local file system directly to the
- HDFS file system.
- * Users can stream data directly to HDFS through the mount point. File
- append is supported but random write is not supported.
- The NFS gateway machine needs everything required to run an HDFS client, such as the Hadoop JAR files and a HADOOP_CONF directory.
- The NFS gateway can be on the same host as DataNode, NameNode, or any HDFS client.
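- As a rough sketch, the environment on the gateway host might look like the following (the installation paths are hypothetical; adjust them to your cluster):

```shell
# Hypothetical installation paths -- adjust to match your cluster.
export HADOOP_HOME=/usr/lib/hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop

# The gateway needs the Hadoop JAR files and the configuration
# directory, just like any other HDFS client.
echo "Hadoop home: $HADOOP_HOME"
echo "Configuration: $HADOOP_CONF_DIR"
```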
- * {Configuration}
- The NFS gateway uses a proxy user to proxy all users accessing the NFS mounts.
- In non-secure mode, the user running the gateway is the proxy user, while in secure mode the
- user in the Kerberos keytab is the proxy user. Suppose the proxy user is 'nfsserver'
- and users belonging to the groups 'users-group1'
- and 'users-group2' use the NFS mounts; then in core-site.xml of the NameNode, the following
- two properties must be set, and only the NameNode needs to be restarted after the configuration change
- (NOTE: replace the string 'nfsserver' with the proxy user name in your cluster):
- ----
- <property>
- <name>hadoop.proxyuser.nfsserver.groups</name>
- <value>root,users-group1,users-group2</value>
- <description>
- The 'nfsserver' user is allowed to proxy all members of the 'users-group1' and
- 'users-group2' groups. Note that in most cases you will need to include the
- group "root" because the user "root" (which usually belongs to the "root" group) will
- generally be the user that initially executes the mount on the NFS client system.
- Set this to '*' to allow the nfsserver user to proxy any group.
- </description>
- </property>
- ----
- ----
- <property>
- <name>hadoop.proxyuser.nfsserver.hosts</name>
- <value>nfs-client-host1.com</value>
- <description>
- This is the host where the NFS gateway is running. Set this to '*' to allow
- requests from any host to be proxied.
- </description>
- </property>
- ----
- The above are the only required configurations for the NFS gateway in non-secure mode. For Kerberized
- Hadoop clusters, the following configurations need to be added to hdfs-site.xml for the gateway (NOTE: replace the
- string "nfsserver" with the proxy user name and ensure the user contained in the keytab is
- also the same proxy user):
- ----
- <property>
- <name>nfs.keytab.file</name>
- <value>/etc/hadoop/conf/nfsserver.keytab</value> <!-- path to the nfs gateway keytab -->
- </property>
- ----
- ----
- <property>
- <name>nfs.kerberos.principal</name>
- <value>nfsserver/_HOST@YOUR-REALM.COM</value>
- </property>
- ----
-
- The rest of the NFS gateway configurations are optional for both secure and non-secure mode.
- The AIX NFS client has a {{{https://issues.apache.org/jira/browse/HDFS-6549}few known issues}}
- that prevent it from working correctly by default with the HDFS NFS
- Gateway. If you want to be able to access the HDFS NFS Gateway from AIX, you
- should set the following configuration setting to enable work-arounds for these
- issues:
- ----
- <property>
- <name>nfs.aix.compatibility.mode.enabled</name>
- <value>true</value>
- </property>
- ----
- Note that regular, non-AIX clients should NOT enable AIX compatibility mode.
- The work-arounds implemented by AIX compatibility mode effectively disable
- safeguards to ensure that listing of directory contents via NFS returns
- consistent results, and that all data sent to the NFS server can be assured to
- have been committed.
- Users are strongly encouraged to update a few configuration properties based on their use
- cases. All of the following configuration properties can be added or updated in hdfs-site.xml.
-
- * If the client mounts the export with access time updates allowed, make sure the following
- property is not disabled in the configuration file. Only the NameNode needs to be restarted after
- this property is changed. On some Unix systems, the user can disable access time updates
- by mounting the export with "noatime". If the export is mounted with "noatime", the user
- doesn't need to change the following property, and thus no NameNode restart is needed.
- ----
- <property>
- <name>dfs.namenode.accesstime.precision</name>
- <value>3600000</value>
- <description>The access time for an HDFS file is precise up to this value.
- The default value is 1 hour. Setting a value of 0 disables
- access times for HDFS.
- </description>
- </property>
- ----
- * Users are expected to update the file dump directory. The NFS client often
- reorders writes, so sequential writes can arrive at the NFS gateway in random
- order. This directory is used to temporarily save out-of-order writes
- before writing to HDFS. For each file, the out-of-order writes are dumped after
- they accumulate to exceed a certain threshold (e.g., 1MB) in memory.
- One needs to make sure the directory has enough
- space. For example, if the application uploads 10 files, each of
- 100MB, it is recommended for this directory to have roughly 1GB of space in case a
- worst-case write reorder happens to every file. Only the NFS gateway needs to be restarted after
- this property is updated.
- ----
- <property>
- <name>nfs.dump.dir</name>
- <value>/tmp/.hdfs-nfs</value>
- </property>
- ----
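- As a sketch (assuming the default dump directory shown above), the directory can be created ahead of time and its free space checked before starting the gateway:

```shell
# Dump directory from the nfs.dump.dir property above.
DUMP_DIR=/tmp/.hdfs-nfs

# Create it, owned by the user that runs the NFS gateway.
mkdir -p "$DUMP_DIR"

# Check free space on the file system holding the dump directory;
# it should comfortably exceed the worst-case reordered-write volume.
df -h "$DUMP_DIR"
```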
- * By default, the export can be mounted by any client. To better control access,
- users can update the following property. The value string contains a machine name and
- an access privilege, separated by whitespace
- characters. The machine name format can be a single host, a Java regular expression, or an IPv4 address. The
- access privilege uses rw or ro to specify read/write or read-only access of the machines to exports. If the access
- privilege is not provided, the default is read-only. Entries are separated by ";".
- For example: "192.168.0.0/22 rw ; host.*\.example\.com ; host1.test.org ro;". Only the NFS gateway needs to be restarted after
- this property is updated.
- ----
- <property>
- <name>nfs.exports.allowed.hosts</name>
- <value>* rw</value>
- </property>
- ----
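- For illustration (reusing the host names from the example string above), a stricter export list granting read/write access to one subnet and read-only access to a single host could look like:
- ----
- <property>
- <name>nfs.exports.allowed.hosts</name>
- <value>192.168.0.0/22 rw ; host1.test.org ro</value>
- </property>
- ----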
- * Customize log settings. To get an NFS debug trace, users can edit the log4j.properties file
- to add the following. Note that a debug trace, especially for ONCRPC, can be very verbose.
- To change the logging level:
- -----------------------------------------------
- log4j.logger.org.apache.hadoop.hdfs.nfs=DEBUG
- -----------------------------------------------
- To get more details of ONCRPC requests:
- -----------------------------------------------
- log4j.logger.org.apache.hadoop.oncrpc=DEBUG
- -----------------------------------------------
- * {Start and stop NFS gateway service}
- Three daemons are required to provide NFS service: rpcbind (or portmap), mountd and nfsd.
- The NFS gateway process includes both nfsd and mountd. It shares the HDFS root "/" as the
- only export. It is recommended to use the portmap included in the NFS gateway package. Even
- though the NFS gateway works with the portmap/rpcbind provided by most Linux distributions, the
- package-included portmap is needed on some Linux systems such as RHEL 6.2 due to an
- {{{https://bugzilla.redhat.com/show_bug.cgi?id=731542}rpcbind bug}}. More detailed discussions can
- be found in {{{https://issues.apache.org/jira/browse/HDFS-4763}HDFS-4763}}.
- [[1]] Stop nfs/rpcbind/portmap services provided by the platform (commands can be different on various Unix platforms):
-
- -------------------------
- service nfs stop
-
- service rpcbind stop
- -------------------------
- [[2]] Start the package-included portmap (needs root privileges):
- -------------------------
- hdfs portmap
-
- OR
- hadoop-daemon.sh start portmap
- -------------------------
- [[3]] Start mountd and nfsd.
-
- No root privileges are required for this command. In non-secure mode, the NFS gateway
- should be started by the proxy user mentioned at the beginning of this user guide,
- while in secure mode, any user can start the NFS gateway
- as long as the user has read access to the Kerberos keytab defined in "nfs.keytab.file".
- -------------------------
- hdfs nfs3
- OR
- hadoop-daemon.sh start nfs3
- -------------------------
- Note that if the hadoop-daemon.sh script is used to start the NFS gateway, its log can be found in the Hadoop log folder.
- [[4]] Stop NFS gateway services.
- -------------------------
- hadoop-daemon.sh stop nfs3
- hadoop-daemon.sh stop portmap
- -------------------------
- Optionally, you can forgo running the Hadoop-provided portmap daemon and
- instead use the system portmap daemon on all operating systems if you start the
- NFS Gateway as root. This will allow the HDFS NFS Gateway to work around the
- aforementioned bug and still register using the system portmap daemon. To do
- so, just start the NFS gateway daemon as you normally would, but make sure to
- do so as the "root" user, and also set the "HADOOP_PRIVILEGED_NFS_USER"
- environment variable to an unprivileged user. In this mode the NFS Gateway will
- start as root to perform its initial registration with the system portmap, and
- will then drop privileges back to the user specified by
- HADOOP_PRIVILEGED_NFS_USER for the remainder of the NFS Gateway process's
- lifetime. Note that if you choose this route, you
- should skip steps 1 and 2 above.
- * {Verify validity of NFS related services}
- [[1]] Execute the following command to verify if all the services are up and running:
- -------------------------
- rpcinfo -p $nfs_server_ip
- -------------------------
- You should see output similar to the following:
- -------------------------
- program vers proto port
- 100005 1 tcp 4242 mountd
- 100005 2 udp 4242 mountd
- 100005 2 tcp 4242 mountd
- 100000 2 tcp 111 portmapper
- 100000 2 udp 111 portmapper
- 100005 3 udp 4242 mountd
- 100005 1 udp 4242 mountd
- 100003 3 tcp 2049 nfs
- 100005 3 tcp 4242 mountd
- -------------------------
- [[2]] Verify if the HDFS namespace is exported and can be mounted.
- -------------------------
- showmount -e $nfs_server_ip
- -------------------------
- You should see output similar to the following:
-
- -------------------------
- Exports list on $nfs_server_ip :
- / (everyone)
- -------------------------
- * {Mount the export “/”}
- Currently NFSv3 only uses TCP as the transport protocol.
- NLM is not supported, so the mount option "nolock" is needed. It's recommended to use a
- hard mount. This is because, even after the client sends all data to the
- NFS gateway, it may take the NFS gateway some extra time to transfer data to HDFS
- when writes were reordered by the NFS client kernel.
-
- If a soft mount has to be used, the user should give it a relatively
- long timeout (at least no less than the default timeout on the host).
- The users can mount the HDFS namespace as shown below:
- -------------------------------------------------------------------
- mount -t nfs -o vers=3,proto=tcp,nolock,noacl $server:/ $mount_point
- -------------------------------------------------------------------
- Then the users can access HDFS as part of the local file system, except that
- hard link and random write are not supported yet. To optimize the performance
- of large file I/O, one can increase the NFS transfer size (rsize and wsize) during mount.
- By default, the NFS gateway supports 1MB as the maximum transfer size. For a larger data
- transfer size, one needs to update "nfs.rtmax" and "nfs.wtmax" in hdfs-site.xml.
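- As an illustrative sketch (the 4MB value is only an example, not a recommendation), the maximum transfer sizes could be raised like this in hdfs-site.xml on the gateway:
- ----
- <property>
- <name>nfs.rtmax</name>
- <value>4194304</value> <!-- maximum read transfer size, 4MB -->
- </property>
- <property>
- <name>nfs.wtmax</name>
- <value>4194304</value> <!-- maximum write transfer size, 4MB -->
- </property>
- ----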
- * {Allow mounts from unprivileged clients}
- In environments where root access on client machines is not generally
- available, some measure of security can be obtained by ensuring that only NFS
- clients originating from privileged ports can connect to the NFS server. This
- feature is referred to as "port monitoring." This feature is not enabled by default
- in the HDFS NFS Gateway, but can be optionally enabled by setting the
- following config in hdfs-site.xml on the NFS Gateway machine:
- -------------------------------------------------------------------
- <property>
- <name>nfs.port.monitoring.disabled</name>
- <value>false</value>
- </property>
- -------------------------------------------------------------------
- * {User authentication and mapping}
- The NFS gateway in this release uses AUTH_UNIX style authentication. When the user on the NFS client
- accesses the mount point, the NFS client passes the UID to the NFS gateway.
- The NFS gateway does a lookup to find the user name from the UID, and then passes the
- username to HDFS along with the HDFS requests.
- For example, if the current user on the NFS client is "admin", when the user accesses
- the mounted directory, the NFS gateway will access HDFS as user "admin". To access HDFS
- as the user "hdfs", one needs to switch the current user to "hdfs" on the client system
- when accessing the mounted directory.
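- The lookup the gateway performs is equivalent in spirit to the following sketch (the UID value is hypothetical):

```shell
# The NFS client sends only a numeric UID; the gateway resolves it to a
# user name on the gateway host before talking to HDFS.
uid=0                    # hypothetical UID received from the client
user=$(id -nu "$uid")    # resolve the UID to a local account name
echo "UID $uid maps to user $user"
```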
- The system administrator must ensure that the user on NFS client host has the same
- name and UID as that on the NFS gateway host. This is usually not a problem if
- the same user management system (e.g., LDAP/NIS) is used to create and deploy users on
- HDFS nodes and NFS client node. In case the user account is created manually on different hosts, one might need to
- modify UID (e.g., do "usermod -u 123 myusername") on either NFS client or NFS gateway host
- in order to make it the same on both sides. More technical details of RPC AUTH_UNIX can be found
- in {{{http://tools.ietf.org/html/rfc1057}RPC specification}}.
- Optionally, the system administrator can configure a custom static mapping
- file in the event one wishes to access the HDFS NFS Gateway from a system with
- a completely disparate set of UIDs/GIDs. By default this file is located at
- "/etc/nfs.map", but a custom location can be configured by setting the
- "nfs.static.mapping.file" property to the path of the static mapping file.
- The format of the static mapping file is similar to what is described in the
- exports(5) manual page, but roughly it is:
- -------------------------
- # Mapping for clients accessing the NFS gateway
- uid 10 100 # Map the remote UID 10 to the local UID 100
- gid 11 101 # Map the remote GID 11 to the local GID 101
- -------------------------
|