- ~~ Licensed under the Apache License, Version 2.0 (the "License");
- ~~ you may not use this file except in compliance with the License.
- ~~ You may obtain a copy of the License at
- ~~
- ~~ http://www.apache.org/licenses/LICENSE-2.0
- ~~
- ~~ Unless required by applicable law or agreed to in writing, software
- ~~ distributed under the License is distributed on an "AS IS" BASIS,
- ~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- ~~ See the License for the specific language governing permissions and
- ~~ limitations under the License. See accompanying LICENSE file.
- ---
- Hadoop Map Reduce Next Generation-${project.version} - Docker Container Executor
- ---
- ---
- ${maven.build.timestamp}
- Docker Container Executor
- %{toc|section=1|fromDepth=0}
- * {Overview}
- Docker (https://www.docker.io/) combines an easy-to-use interface to
- Linux containers with easy-to-construct image files for those
- containers. In short, Docker launches very lightweight virtual
- machines.
- The Docker Container Executor (DCE) allows the YARN NodeManager to
- launch YARN containers into Docker containers. Users can specify the
- Docker images they want for their YARN containers. These containers
- provide a custom software environment in which the user's code runs,
- isolated from the software environment of the NodeManager. These
- containers can include special libraries needed by the application,
- and they can have different versions of Perl, Python, and even Java
- than what is installed on the NodeManager. Indeed, these containers
- can run a different flavor of Linux than what is running on the
- NodeManager -- although the YARN container must then provide the complete
- software environment and all the libraries needed to run the job, since
- nothing is shared with the NodeManager.
- Docker for YARN provides both consistency (all YARN containers will
- have the same software environment) and isolation (no interference
- with whatever is installed on the physical machine).
-
- * {Cluster Configuration}
- Docker Container Executor only runs in non-secure mode of HDFS and
- YARN. It will not run in secure mode, and will exit if it detects
- secure mode.
- The DockerContainerExecutor requires the Docker daemon to be running on
- the NodeManagers, and the Docker client to be installed and able to start
- Docker containers. To prevent timeouts while starting jobs, the Docker
- images to be used by a job should already be downloaded on the
- NodeManagers. Here's an example of how this can be done:
- ----
- sudo docker pull sequenceiq/hadoop-docker:2.4.1
- ----
- This should be done as part of the NodeManager startup.
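- For example, a NodeManager start-up hook along the following lines could
- pre-pull every image the cluster's jobs are expected to use. This is only a
- sketch; the script and its image list are assumptions, not part of DCE itself:
- ----
- #!/bin/bash
- # Hypothetical pre-pull hook run during NodeManager startup.
- # The image list is illustrative; substitute the images your jobs use.
- IMAGES="sequenceiq/hadoop-docker:2.4.1"
- for image in $IMAGES; do
-   sudo docker pull "$image" || echo "WARN: failed to pre-pull $image" >&2
- done
- ----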
- The following properties must be set in yarn-site.xml:
- ----
- <property>
- <name>yarn.nodemanager.docker-container-executor.exec-name</name>
- <value>/usr/bin/docker</value>
- <description>
- Name or path to the Docker client. This is a required parameter. If this is empty,
- the user must pass an image name as part of the job invocation (see below).
- </description>
- </property>
- <property>
- <name>yarn.nodemanager.container-executor.class</name>
- <value>org.apache.hadoop.yarn.server.nodemanager.DockerContainerExecutor</value>
- <description>
- This is the container executor setting that ensures that all
- jobs are started with the DockerContainerExecutor.
- </description>
- </property>
- ----
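- A quick sanity check of this configuration is to confirm, on each NodeManager
- host, that the configured client exists and can reach the Docker daemon. A
- minimal sketch, assuming the /usr/bin/docker path from the exec-name value
- above:
- ----
- # Both commands should succeed and print client/daemon details; a failure
- # here means DCE will not be able to launch containers on this NodeManager.
- /usr/bin/docker version
- /usr/bin/docker info
- ----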
- Administrators should be aware that DCE doesn't currently provide
- user namespace isolation. This means, in particular, that software
- running as root in the YARN container will have root privileges on the
- underlying NodeManager. Put differently, DCE currently provides no
- better security guarantees than YARN's Default Container Executor. In
- fact, DockerContainerExecutor will exit if it detects secure YARN.
- * {Tips for connecting to a secure Docker repository}
- By default, Docker images are pulled from the Docker public repository. The
- format of a Docker image URL is: <username>/<image_name>. For example,
- sequenceiq/hadoop-docker:2.4.1 is an image in the Docker public repository that
- contains Java and Hadoop.
- If you want to use your own private repository, you provide the repository URL
- instead of your username. Therefore, the image URL becomes:
- <private_repo_url>/<image_name>. For example, if your repository is on
- localhost:8080, your images would look like: localhost:8080/hadoop-docker
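- If you build or mirror images for such a registry, they are usually tagged
- and pushed with the registry address as the prefix. A sketch, reusing the
- illustrative localhost:8080 registry and the image from the pull example
- above:
- ----
- # Tag an existing image with the private registry prefix, then push it.
- docker tag sequenceiq/hadoop-docker:2.4.1 localhost:8080/hadoop-docker
- docker push localhost:8080/hadoop-docker
- ----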
- To connect to a secure Docker repository, you can use the following invocation:
- ----
- docker login [OPTIONS] [SERVER]
- Register or log in to a Docker registry server, if no server is specified
- "https://index.docker.io/v1/" is the default.
- -e, --email="" Email
- -p, --password="" Password
- -u, --username="" Username
- ----
- If you want to log in to a self-hosted registry, you can do so by adding
- the server name:
- ----
- docker login <private_repo_url>
- ----
- This needs to be run as part of the NodeManager startup, or as a cron job if
- the login session expires periodically. You can log in to multiple Docker
- repositories from the same NodeManager, but all of your users will have access
- to all of your repositories, as the DockerContainerExecutor does not currently
- support per-job Docker login.
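- For instance, a small script such as the one below could be invoked from the
- NodeManager init script, or scheduled via cron if the registry session
- expires. The credentials and registry URL are placeholders that mirror the
- docker login options shown above:
- ----
- #!/bin/bash
- # Hypothetical login-refresh script; all values are placeholders.
- docker login -u "<username>" -p "<password>" <private_repo_url>
- ----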
- * {Job Configuration}
- Currently you cannot configure any of the Docker settings through the job
- configuration, other than the Docker image itself. You can override the image
- used by the Mappers, Reducers, and ApplicationMaster with the following three
- JVM properties, respectively (MapReduce jobs only); a combined invocation is
- sketched after this list:
- * mapreduce.map.env: You can override the mapper's image by passing
- yarn.nodemanager.docker-container-executor.image-name=<your_image_name>
- to this JVM property.
- * mapreduce.reduce.env: You can override the reducer's image by passing
- yarn.nodemanager.docker-container-executor.image-name=<your_image_name>
- to this JVM property.
- * yarn.app.mapreduce.am.env: You can override the ApplicationMaster's image
- by passing yarn.nodemanager.docker-container-executor.image-name=<your_image_name>
- to this JVM property.
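- A combined sketch of such an invocation is shown below; the pi example job,
- its arguments, and the image name are placeholders used only for illustration:
- ----
- # Hypothetical MapReduce invocation overriding the Docker image for the
- # map tasks, the reduce tasks, and the ApplicationMaster.
- hadoop jar $HADOOP_INSTALLATION_DIR/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
-   pi \
-   -Dmapreduce.map.env="yarn.nodemanager.docker-container-executor.image-name=<your_image_name>" \
-   -Dmapreduce.reduce.env="yarn.nodemanager.docker-container-executor.image-name=<your_image_name>" \
-   -Dyarn.app.mapreduce.am.env="yarn.nodemanager.docker-container-executor.image-name=<your_image_name>" \
-   10 100
- ----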
- * {Docker Image requirements}
- The Docker Images used for YARN containers must meet the following
- requirements:
- The distro and version of Linux in your Docker Image can be quite different
- from that of your NodeManager. (Docker does have a few limitations in this
- regard, but you're not likely to hit them.) However, if you're using the
- MapReduce framework, then your image will need to be configured for running
- Hadoop. Java must be installed in the container, and the following environment
- variables must be defined in the image: JAVA_HOME, HADOOP_COMMON_HOME,
- HADOOP_HDFS_HOME, HADOOP_MAPRED_HOME, HADOOP_YARN_HOME, and HADOOP_CONF_DIR.
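- One way to spot-check an image against these requirements is to run it
- directly and inspect its environment. A minimal sketch, with the image name
- as a placeholder:
- ----
- # Print a couple of the required variables and the Java version inside the
- # image; empty values mean the image is not ready for MapReduce jobs.
- # If the image defines an entrypoint, use --entrypoint /bin/bash instead.
- docker run --rm <your_image_name> /bin/bash -c \
-   'echo "JAVA_HOME=$JAVA_HOME HADOOP_CONF_DIR=$HADOOP_CONF_DIR"; java -version'
- ----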
- * {Working example of YARN-launched Docker containers}
- The following example shows how to run teragen using DockerContainerExecutor.
- * First ensure that YARN is properly configured with DockerContainerExecutor (see above).
- ----
- <property>
- <name>yarn.nodemanager.docker-container-executor.exec-name</name>
- <value>docker -H=tcp://0.0.0.0:4243</value>
- <description>
- Name or path to the Docker client. The TCP socket must be where the
- Docker daemon is listening.
- </description>
- </property>
- <property>
- <name>yarn.nodemanager.container-executor.class</name>
- <value>org.apache.hadoop.yarn.server.nodemanager.DockerContainerExecutor</value>
- <description>
- This is the container executor setting that ensures that all
- jobs are started with the DockerContainerExecutor.
- </description>
- </property>
- ----
- * Pick a custom Docker image if you want. In this example, we'll use sequenceiq/hadoop-docker:2.4.1 from the
- Docker Hub repository. It has a JDK, Hadoop, and all the previously mentioned environment variables configured.
- * Run:
- ----
- hadoop jar $HADOOP_INSTALLATION_DIR/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
- teragen \
- -Dmapreduce.map.env="yarn.nodemanager.docker-container-executor.image-name=sequenceiq/hadoop-docker:2.4.1" \
- -Dyarn.app.mapreduce.am.env="yarn.nodemanager.docker-container-executor.image-name=sequenceiq/hadoop-docker:2.4.1" \
- 1000 \
- teragen_out_dir
- ----
- Once it succeeds, you can check the YARN debug logs to verify that Docker has
- indeed launched the containers.
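- For example (a sketch; the application id is a placeholder), you can look at
- the Docker side of things on any NodeManager host while the job is running,
- and fetch the aggregated logs once it finishes:
- ----
- # On a NodeManager host while the job is running: containers based on
- # sequenceiq/hadoop-docker:2.4.1 should appear in the output.
- docker ps
- # After the job finishes (requires log aggregation to be enabled):
- yarn logs -applicationId <application_id>
- ----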