<!---
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

Hadoop HDFS over HTTP - Server Setup
====================================

This page explains how to quickly set up HttpFS with Pseudo authentication against a Hadoop cluster with Pseudo authentication.

Install HttpFS
--------------

```bash
~ $ tar xzf httpfs-${project.version}.tar.gz
```

Configure HttpFS
----------------

By default, HttpFS assumes that the Hadoop configuration files (`core-site.xml` and `hdfs-site.xml`) are in the HttpFS configuration directory.

If this is not the case, add the `httpfs.hadoop.config.dir` property to the `httpfs-site.xml` file, set to the location of the Hadoop configuration directory.

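For example, a minimal `httpfs-site.xml` entry might look like this (the path is illustrative; point it at your actual Hadoop configuration directory):

```xml
<property>
  <name>httpfs.hadoop.config.dir</name>
  <!-- Illustrative path to the Hadoop configuration directory -->
  <value>/etc/hadoop/conf</value>
</property>
```
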
Configure Hadoop
----------------

Edit Hadoop `core-site.xml` and define the Unix user that will run the HttpFS server as a proxyuser. For example:

```xml
<property>
  <name>hadoop.proxyuser.#HTTPFSUSER#.hosts</name>
  <value>httpfs-host.foo.com</value>
</property>
<property>
  <name>hadoop.proxyuser.#HTTPFSUSER#.groups</name>
  <value>*</value>
</property>
```

IMPORTANT: Replace `#HTTPFSUSER#` with the Unix user that will start the HttpFS server.

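For example, if the HttpFS server will run as the `httpfs` Unix user, the substituted properties would read (hostname is illustrative):

```xml
<property>
  <name>hadoop.proxyuser.httpfs.hosts</name>
  <value>httpfs-host.foo.com</value>
</property>
<property>
  <name>hadoop.proxyuser.httpfs.groups</name>
  <value>*</value>
</property>
```
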
Restart Hadoop
--------------

You need to restart Hadoop for the proxyuser configuration to become active.

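As a sketch, on a single-node cluster managed with the stock scripts the restart could look like this (run from the Hadoop installation directory; adjust for your deployment):

```bash
# Stop and start HDFS so the NameNode re-reads core-site.xml
$ sbin/stop-dfs.sh
$ sbin/start-dfs.sh
```
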
Start/Stop HttpFS
-----------------

To start/stop HttpFS, use HttpFS's `sbin/httpfs.sh` script. For example:

```bash
$ sbin/httpfs.sh start
```

NOTE: Invoking the script without any parameters lists all possible parameters (start, stop, run, etc.). The `httpfs.sh` script is a wrapper for Tomcat's `catalina.sh` script that sets the environment variables and Java System properties required to run the HttpFS server.

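Stopping the server uses the same script:

```bash
$ sbin/httpfs.sh stop
```
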
Test HttpFS is working
----------------------

```bash
$ curl -sS 'http://<HTTPFSHOSTNAME>:14000/webhdfs/v1?op=gethomedirectory&user.name=hdfs'
{"Path":"\/user\/hdfs"}
```

Embedded Tomcat Configuration
-----------------------------

To configure the embedded Tomcat, go to the `tomcat/conf` directory.

HttpFS preconfigures the HTTP and Admin ports in Tomcat's `server.xml` to 14000 and 14001.

Tomcat logs are also preconfigured to go to HttpFS's `logs/` directory.

The following environment variables (which can be set in HttpFS's `etc/hadoop/httpfs-env.sh` script) can be used to alter those values, as shown in the example after this list:

* HTTPFS\_HTTP\_PORT
* HTTPFS\_ADMIN\_PORT
* HADOOP\_LOG\_DIR

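A hypothetical `etc/hadoop/httpfs-env.sh` fragment overriding the defaults (values are illustrative):

```bash
# Illustrative overrides; the defaults are 14000, 14001, and HttpFS's logs/ directory
export HTTPFS_HTTP_PORT=14080
export HTTPFS_ADMIN_PORT=14081
export HADOOP_LOG_DIR=/var/log/httpfs
```
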
HttpFS Configuration
--------------------

HttpFS supports the following [configuration properties](./httpfs-default.html) in HttpFS's `etc/hadoop/httpfs-site.xml` configuration file.

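As a sketch, properties are overridden there using the standard Hadoop configuration format; for example, pinning the authentication type to Pseudo (assuming the `httpfs.authentication.type` property from `httpfs-default.xml`):

```xml
<configuration>
  <property>
    <name>httpfs.authentication.type</name>
    <value>simple</value>
  </property>
</configuration>
```
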
HttpFS over HTTPS (SSL)
-----------------------

To configure HttpFS to work over SSL, edit the `httpfs-env.sh` script in the configuration directory, setting `HTTPFS_SSL_ENABLED` to `true`.

In addition, the following 2 properties may be defined (shown with default values); see the fragment after this list:

* HTTPFS\_SSL\_KEYSTORE\_FILE=$HOME/.keystore
* HTTPFS\_SSL\_KEYSTORE\_PASS=password

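A hypothetical `httpfs-env.sh` fragment enabling SSL (the keystore path and password are illustrative and must match the keystore created below):

```bash
# Enable HTTPS and point at the keystore
export HTTPFS_SSL_ENABLED=true
export HTTPFS_SSL_KEYSTORE_FILE=${HOME}/.keystore
export HTTPFS_SSL_KEYSTORE_PASS=password
```
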
In the HttpFS `tomcat/conf` directory, replace the `server.xml` file with the `ssl-server.xml` file.

You need to create an SSL certificate for the HttpFS server. As the `httpfs` Unix user, use the Java `keytool` command to create the SSL certificate:

```bash
$ keytool -genkey -alias tomcat -keyalg RSA
```

You will be asked a series of questions in an interactive prompt. It will create the keystore file, which will be named **.keystore** and located in the `httpfs` user's home directory.

The password you enter for "keystore password" must match the value of the `HTTPFS_SSL_KEYSTORE_PASS` environment variable set in the `httpfs-env.sh` script in the configuration directory.

The answer to "What is your first and last name?" (i.e. "CN") must be the hostname of the machine where the HttpFS Server will be running.

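As a non-interactive sketch, the same answers can be supplied on the command line via `keytool`'s `-dname` and `-storepass` options (the hostname and password are illustrative and must match your configuration):

```bash
$ keytool -genkey -alias tomcat -keyalg RSA \
    -dname "CN=httpfs-host.foo.com, OU=Ops, O=Example, L=City, ST=State, C=US" \
    -storepass password -keypass password
```
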
Start HttpFS. It should work over HTTPS.

Using the Hadoop `FileSystem` API or the Hadoop FS shell, use the `swebhdfs://` scheme. Make sure the JVM picks up the truststore containing the public key of the SSL certificate if you are using a self-signed certificate.

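For example, with a self-signed certificate you might export the server's public key into a client truststore and point the client JVM at it through the standard JSSE system properties (aliases, paths, and the hostname are illustrative):

```bash
# Export the server certificate and import it into a client truststore
$ keytool -export -alias tomcat -keystore ~/.keystore -file httpfs.crt
$ keytool -import -alias httpfs -file httpfs.crt -keystore ~/truststore.jks

# Point the Hadoop FS shell's JVM at the truststore and list over HTTPS
$ HADOOP_OPTS="-Djavax.net.ssl.trustStore=$HOME/truststore.jks" \
    hadoop fs -ls swebhdfs://httpfs-host.foo.com:14000/user/hdfs
```
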
NOTE: Some old SSL clients may use weak ciphers that are not supported by the HttpFS server. It is recommended to upgrade the SSL client.