  1. ~~ Licensed under the Apache License, Version 2.0 (the "License");
  2. ~~ you may not use this file except in compliance with the License.
  3. ~~ You may obtain a copy of the License at
  4. ~~
  5. ~~ http://www.apache.org/licenses/LICENSE-2.0
  6. ~~
  7. ~~ Unless required by applicable law or agreed to in writing, software
  8. ~~ distributed under the License is distributed on an "AS IS" BASIS,
  9. ~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  10. ~~ See the License for the specific language governing permissions and
  11. ~~ limitations under the License. See accompanying LICENSE file.
  12. ---
  13. Hadoop Map Reduce Next Generation-${project.version} - Docker Container Executor
  14. ---
  15. ---
  16. ${maven.build.timestamp}
  17. Docker Container Executor
  18. %{toc|section=1|fromDepth=0}
  19. * {Overview}
  20. Docker (https://www.docker.io/) combines an easy-to-use interface to
  21. Linux containers with easy-to-construct image files for those
  22. containers. In short, Docker launches what are in effect very
  23. lightweight virtual machines.
  24. The Docker Container Executor (DCE) allows the YARN NodeManager to
  25. launch YARN containers into Docker containers. Users can specify the
  26. Docker images they want for their YARN containers. These containers
  27. provide a custom software environment in which the user's code runs,
  28. isolated from the software environment of the NodeManager. These
  29. containers can include special libraries needed by the application,
  30. and they can have different versions of Perl, Python, and even Java
  31. than what is installed on the NodeManager. Indeed, these containers
  32. can run a different flavor of Linux than what is running on the
  33. NodeManager. The image must, however, provide the complete environment and every
  34. library needed to run the job, because nothing is shared with the NodeManager.
  35. Docker for YARN provides both consistency (all YARN containers will
  36. have the same software environment) and isolation (no interference
  37. with whatever is installed on the physical machine).
  38. * {Cluster Configuration}
  39. The Docker Container Executor runs only when HDFS and YARN are in
  40. non-secure mode. It does not run in secure mode and exits if it
  41. detects secure mode.
  42. The DockerContainerExecutor requires the Docker daemon to be running on
  43. each NodeManager, with the Docker client installed and able to start
  44. containers. To prevent timeouts while starting jobs, the Docker
  45. images used by a job should already be downloaded on the
  46. NodeManagers. Here is an example of how this can be done:
  47. ----
  48. sudo docker pull sequenceiq/hadoop-docker:2.4.1
  49. ----
  50. This should be done as part of the NodeManager startup.
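As a minimal sketch of the pre-pull step, a NodeManager startup script could loop over the images that jobs are expected to use. The names DOCKER_BIN and pull_images are hypothetical and chosen only for this example. The sketch also assumes the calling user may run the Docker client directly; prefix the command with sudo, as in the example above, if that is not the case.

```shell
#!/usr/bin/env bash
# Sketch: pre-pull the Docker images that jobs will use, so that the first
# container launch does not have to wait for a download.
# DOCKER_BIN and pull_images are hypothetical names, not part of Hadoop or Docker.
DOCKER_BIN="${DOCKER_BIN:-docker}"

pull_images() {
  local image failed=0
  for image in "$@"; do
    # "docker pull <image>" is the same command used in the example above.
    if ! "$DOCKER_BIN" pull "$image"; then
      echo "failed to pull $image" >&2
      failed=1
    fi
  done
  # Non-zero if at least one pull failed, so the caller can react.
  return "$failed"
}

# Example call from a startup script (commented out here):
# pull_images sequenceiq/hadoop-docker:2.4.1
```

The function keeps going after a failed pull, so one unavailable image does not prevent the others from being cached.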
  51. The following properties must be set in yarn-site.xml:
  52. ----
  53. <property>
  54. <name>yarn.nodemanager.docker-container-executor.exec-name</name>
  55. <value>/usr/bin/docker</value>
  56. <description>
  57. Name or path to the Docker client. This is a required parameter. The Docker
  58. image name is not set here; it is passed as part of the job invocation (see below).
  59. </description>
  60. </property>
  61. <property>
  62. <name>yarn.nodemanager.container-executor.class</name>
  63. <value>org.apache.hadoop.yarn.server.nodemanager.DockerContainerExecutor</value>
  64. <description>
  65. This is the container executor setting that ensures that all
  66. jobs are started with the DockerContainerExecutor.
  67. </description>
  68. </property>
  69. ----
  70. Administrators should be aware that DCE does not currently provide
  71. user namespace isolation. This means, in particular, that software
  72. running as root in the YARN container will have root privileges on the
  73. underlying NodeManager. Put differently, DCE currently provides no
  74. better security guarantees than YARN's Default Container Executor. In
  75. fact, DockerContainerExecutor will exit if it detects that YARN is running in secure mode.
  76. * {Tips for connecting to a secure docker repository}
  77. By default, Docker images are pulled from the public Docker repository. The
  78. format of a Docker image URL is: <username>/<image_name>. For example,
  79. sequenceiq/hadoop-docker:2.4.1 is an image in the public Docker repository that contains Java and
  80. Hadoop.
  81. If you want to use your own private repository, provide the repository URL instead of
  82. your username. The image URL then becomes: <private_repo_url>/<image_name>.
  83. For example, if your repository is on localhost:8080, an image name would look like:
  84. localhost:8080/hadoop-docker
  85. To connect to a secure docker repository, you can use the following invocation:
  86. ----
  87. docker login [OPTIONS] [SERVER]
  88. Register or log in to a Docker registry server, if no server is specified
  89. "https://index.docker.io/v1/" is the default.
  90. -e, --email="" Email
  91. -p, --password="" Password
  92. -u, --username="" Username
  93. ----
  94. If you want to log in to a self-hosted registry, you can specify it by adding
  95. the server name.
  96. ----
  97. docker login <private_repo_url>
  98. ----
  99. This needs to be run as part of the NodeManager startup, or as a cron job if
  100. the login session expires periodically. You can log in to multiple Docker repositories
  101. from the same NodeManager, but all your users will then have access to all your repositories,
  102. because the DockerContainerExecutor does not currently support per-job docker login.
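A startup script or cron job that logs in to several registries might look like the following sketch. It uses only the -u and -p options listed in the usage text above. The names DOCKER_BIN, REGISTRY_USER, REGISTRY_PASSWORD, and login_registries are hypothetical, and using one set of credentials for every registry is a simplifying assumption.

```shell
#!/usr/bin/env bash
# Sketch: log in to several registries from a startup script or cron job.
# DOCKER_BIN, REGISTRY_USER, REGISTRY_PASSWORD and login_registries are
# hypothetical names chosen for this example.
DOCKER_BIN="${DOCKER_BIN:-docker}"

login_registries() {
  local server
  for server in "$@"; do
    # -u and -p are the username and password options from the usage text.
    # Stop at the first registry that rejects the login.
    "$DOCKER_BIN" login -u "$REGISTRY_USER" -p "$REGISTRY_PASSWORD" "$server" || return 1
  done
}

# Example call (commented out here):
# login_registries localhost:8080
```

Note that a password passed with -p can be visible to other users on the host in the process list, so this form is best kept to trusted NodeManager hosts.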
  103. * {Job Configuration}
  104. Currently you cannot configure Docker settings through the job configuration, other than the image to use.
  105. You can provide Mapper, Reducer, and ApplicationMaster environment overrides for the
  106. Docker images using the following three job properties, respectively (MR jobs only):
  107. * mapreduce.map.env: You can override the mapper's image by passing
  108. yarn.nodemanager.docker-container-executor.image-name=<your_image_name>
  109. to this property.
  110. * mapreduce.reduce.env: You can override the reducer's image by passing
  111. yarn.nodemanager.docker-container-executor.image-name=<your_image_name>
  112. to this property.
  113. * yarn.app.mapreduce.am.env: You can override the ApplicationMaster's image
  114. by passing yarn.nodemanager.docker-container-executor.image-name=<your_image_name>
  115. to this property.
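The three overrides listed above share the same form, so they can be generated from a single image name. The following is a minimal sketch; dce_image_opts is a hypothetical helper name, not something Hadoop provides.

```shell
#!/usr/bin/env bash
# Sketch: print the three -D options that select one Docker image for the
# mapper, the reducer and the ApplicationMaster of an MR job.
# dce_image_opts is a hypothetical helper name.
dce_image_opts() {
  local image="$1"
  local key="yarn.nodemanager.docker-container-executor.image-name"
  local prop
  for prop in mapreduce.map.env mapreduce.reduce.env yarn.app.mapreduce.am.env; do
    # "--" stops printf from treating the leading "-D" as an option.
    printf -- '-D%s=%s=%s\n' "$prop" "$key" "$image"
  done
}

# Example: show the options for one image.
# dce_image_opts sequenceiq/hadoop-docker:2.4.1
```

Because the generated options contain no spaces, the output can be placed on a hadoop jar command line with an unquoted command substitution, after the name of the example program.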
  116. * {Docker Image requirements}
  117. The Docker Images used for YARN containers must meet the following
  118. requirements:
  119. The distro and version of Linux in your Docker Image can be quite different
  120. from that of your NodeManager. (Docker does have a few limitations in this
  121. regard, but you're not likely to hit them.) However, if you're using the
  122. MapReduce framework, then your image will need to be configured for running
  123. Hadoop. Java must be installed in the container, and the following environment variables
  124. must be defined in the image: JAVA_HOME, HADOOP_COMMON_PATH, HADOOP_HDFS_HOME,
  125. HADOOP_MAPRED_HOME, HADOOP_YARN_HOME, and HADOOP_CONF_DIR.
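One way to check an image against the variable list above is to dump its environment and look for each name. The following is a sketch only: missing_dce_vars is a hypothetical helper, and it assumes that running env in the image prints the variables a YARN container would see, which may not hold for images with a custom entrypoint.

```shell
#!/usr/bin/env bash
# Sketch: report which of the required variables are missing from "env" output.
# missing_dce_vars is a hypothetical helper; it reads NAME=value lines on stdin.
missing_dce_vars() {
  local env_dump name
  env_dump="$(cat)"
  for name in JAVA_HOME HADOOP_COMMON_PATH HADOOP_HDFS_HOME \
              HADOOP_MAPRED_HOME HADOOP_YARN_HOME HADOOP_CONF_DIR; do
    # A variable counts as defined if some line starts with "NAME=".
    if ! printf '%s\n' "$env_dump" | grep -q "^${name}="; then
      echo "$name"
    fi
  done
}

# Example call (commented out; needs a running Docker daemon):
# docker run --rm sequenceiq/hadoop-docker:2.4.1 env | missing_dce_vars
```

Empty output means all six names were found. The check only tests that each variable is defined, not that its value points to a valid directory.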
  126. * {Working example of YARN-launched Docker containers}
  127. The following example shows how to run teragen using DockerContainerExecutor.
  128. * First ensure that YARN is properly configured with DockerContainerExecutor (see above).
  129. ----
  130. <property>
  131. <name>yarn.nodemanager.docker-container-executor.exec-name</name>
  132. <value>docker -H=tcp://0.0.0.0:4243</value>
  133. <description>
  134. Name or path to the Docker client. The TCP socket must be the one
  135. on which the Docker daemon is listening.
  136. </description>
  137. </property>
  138. <property>
  139. <name>yarn.nodemanager.container-executor.class</name>
  140. <value>org.apache.hadoop.yarn.server.nodemanager.DockerContainerExecutor</value>
  141. <description>
  142. This is the container executor setting that ensures that all
  143. jobs are started with the DockerContainerExecutor.
  144. </description>
  145. </property>
  146. ----
  147. * Pick a custom Docker image if you want. In this example, we'll use sequenceiq/hadoop-docker:2.4.1 from the
  148. Docker Hub repository. It has a JDK, Hadoop, and all the previously mentioned environment variables configured.
  149. * Run:
  150. ----
  151. hadoop jar $HADOOP_INSTALLATION_DIR/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
  152. teragen \
  153. -Dmapreduce.map.env="yarn.nodemanager.docker-container-executor.image-name=sequenceiq/hadoop-docker:2.4.1" \
  154. -Dyarn.app.mapreduce.am.env="yarn.nodemanager.docker-container-executor.image-name=sequenceiq/hadoop-docker:2.4.1" \
  155. 1000 \
  156. teragen_out_dir
  157. ----
  158. Once it succeeds, you can check the YARN debug logs to verify that Docker has indeed launched the containers.