
~~ Licensed under the Apache License, Version 2.0 (the "License");
~~ you may not use this file except in compliance with the License.
~~ You may obtain a copy of the License at
~~
~~   http://www.apache.org/licenses/LICENSE-2.0
~~
~~ Unless required by applicable law or agreed to in writing, software
~~ distributed under the License is distributed on an "AS IS" BASIS,
~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
~~ See the License for the specific language governing permissions and
~~ limitations under the License. See accompanying LICENSE file.

  ---
  Hadoop MapReduce Next Generation-${project.version} - Encrypted Shuffle
  ---
  ---
  ${maven.build.timestamp}

Hadoop MapReduce Next Generation - Encrypted Shuffle

* {Introduction}

  The Encrypted Shuffle capability allows encryption of the MapReduce shuffle
  using HTTPS, with optional client authentication (also known as
  bi-directional HTTPS, or HTTPS with client certificates). It comprises:

  * A Hadoop configuration setting for toggling the shuffle between HTTP and
    HTTPS.

  * Hadoop configuration settings for specifying the keystore and truststore
    properties (location, type, passwords) used by the shuffle service and the
    reducer tasks fetching shuffle data.

  * A way to reload truststores across the cluster (when a node is added or
    removed).

* {Configuration}

** <<<core-site.xml>>> Properties

  To enable encrypted shuffle, set the following properties in the
  <<<core-site.xml>>> file of all nodes in the cluster:

*--------------------------------------+---------------------+-----------------+
| <<Property>> | <<Default Value>> | <<Explanation>> |
*--------------------------------------+---------------------+-----------------+
| <<<hadoop.ssl.require.client.cert>>> | <<<false>>> | Whether client certificates are required |
*--------------------------------------+---------------------+-----------------+
| <<<hadoop.ssl.hostname.verifier>>> | <<<DEFAULT>>> | The hostname verifier to provide for HttpsURLConnections. Valid values are: <<DEFAULT>>, <<STRICT>>, <<STRICT_IE6>>, <<DEFAULT_AND_LOCALHOST>> and <<ALLOW_ALL>> |
*--------------------------------------+---------------------+-----------------+
| <<<hadoop.ssl.keystores.factory.class>>> | <<<org.apache.hadoop.security.ssl.FileBasedKeyStoresFactory>>> | The KeyStoresFactory implementation to use |
*--------------------------------------+---------------------+-----------------+
| <<<hadoop.ssl.server.conf>>> | <<<ssl-server.xml>>> | Resource file from which SSL server keystore information will be extracted. This file is looked up in the classpath; typically it should be in the Hadoop conf/ directory |
*--------------------------------------+---------------------+-----------------+
| <<<hadoop.ssl.client.conf>>> | <<<ssl-client.xml>>> | Resource file from which SSL client keystore information will be extracted. This file is looked up in the classpath; typically it should be in the Hadoop conf/ directory |
*--------------------------------------+---------------------+-----------------+
| <<<hadoop.ssl.enabled.protocols>>> | <<<TLSv1>>> | The supported SSL protocols (JDK6 can use <<TLSv1>>, JDK7+ can use <<TLSv1,TLSv1.1,TLSv1.2>>) |
*--------------------------------------+---------------------+-----------------+

  <<IMPORTANT:>> Currently, <<<hadoop.ssl.require.client.cert>>> should be set
  to <<<false>>>. Refer to the {{{ClientCertificates}Client Certificates}}
  section for details.

  <<IMPORTANT:>> All these properties should be marked as final in the cluster
  configuration files.

*** Example:

------
    ...
    <property>
      <name>hadoop.ssl.require.client.cert</name>
      <value>false</value>
      <final>true</final>
    </property>

    <property>
      <name>hadoop.ssl.hostname.verifier</name>
      <value>DEFAULT</value>
      <final>true</final>
    </property>

    <property>
      <name>hadoop.ssl.keystores.factory.class</name>
      <value>org.apache.hadoop.security.ssl.FileBasedKeyStoresFactory</value>
      <final>true</final>
    </property>

    <property>
      <name>hadoop.ssl.server.conf</name>
      <value>ssl-server.xml</value>
      <final>true</final>
    </property>

    <property>
      <name>hadoop.ssl.client.conf</name>
      <value>ssl-client.xml</value>
      <final>true</final>
    </property>
    ...
------

** <<<mapred-site.xml>>> Properties

  To enable encrypted shuffle, set the following property in the
  <<<mapred-site.xml>>> file of all nodes in the cluster:

*--------------------------------------+---------------------+-----------------+
| <<Property>> | <<Default Value>> | <<Explanation>> |
*--------------------------------------+---------------------+-----------------+
| <<<mapreduce.shuffle.ssl.enabled>>> | <<<false>>> | Whether encrypted shuffle is enabled |
*--------------------------------------+---------------------+-----------------+

  <<IMPORTANT:>> This property should be marked as final in the cluster
  configuration files.

*** Example:

------
    ...
    <property>
      <name>mapreduce.shuffle.ssl.enabled</name>
      <value>true</value>
      <final>true</final>
    </property>
    ...
------

  The Linux container executor should be set to prevent job tasks from
  reading the server keystore information and gaining access to the shuffle
  server certificates.

  Refer to Hadoop Kerberos configuration for details on how to do this.

* {Keystore and Truststore Settings}

  Currently <<<FileBasedKeyStoresFactory>>> is the only <<<KeyStoresFactory>>>
  implementation. The <<<FileBasedKeyStoresFactory>>> implementation uses the
  following properties, in the <<<ssl-server.xml>>> and <<<ssl-client.xml>>>
  files, to configure the keystores and truststores.

** <<<ssl-server.xml>>> (Shuffle server) Configuration:

  The mapred user should own the <<<ssl-server.xml>>> file and have exclusive
  read access to it.

*---------------------------------------------+---------------------+-----------------+
| <<Property>> | <<Default Value>> | <<Explanation>> |
*---------------------------------------------+---------------------+-----------------+
| <<<ssl.server.keystore.type>>> | <<<jks>>> | Keystore file type |
*---------------------------------------------+---------------------+-----------------+
| <<<ssl.server.keystore.location>>> | NONE | Keystore file location. The mapred user should own this file and have exclusive read access to it. |
*---------------------------------------------+---------------------+-----------------+
| <<<ssl.server.keystore.password>>> | NONE | Keystore file password |
*---------------------------------------------+---------------------+-----------------+
| <<<ssl.server.truststore.type>>> | <<<jks>>> | Truststore file type |
*---------------------------------------------+---------------------+-----------------+
| <<<ssl.server.truststore.location>>> | NONE | Truststore file location. The mapred user should own this file and have exclusive read access to it. |
*---------------------------------------------+---------------------+-----------------+
| <<<ssl.server.truststore.password>>> | NONE | Truststore file password |
*---------------------------------------------+---------------------+-----------------+
| <<<ssl.server.truststore.reload.interval>>> | 10000 | Truststore reload interval, in milliseconds |
*---------------------------------------------+---------------------+-----------------+

*** Example:

------
<configuration>

  <!-- Server Certificate Store -->
  <property>
    <name>ssl.server.keystore.type</name>
    <value>jks</value>
  </property>
  <property>
    <name>ssl.server.keystore.location</name>
    <value>${user.home}/keystores/server-keystore.jks</value>
  </property>
  <property>
    <name>ssl.server.keystore.password</name>
    <value>serverfoo</value>
  </property>

  <!-- Server Trust Store -->
  <property>
    <name>ssl.server.truststore.type</name>
    <value>jks</value>
  </property>
  <property>
    <name>ssl.server.truststore.location</name>
    <value>${user.home}/keystores/truststore.jks</value>
  </property>
  <property>
    <name>ssl.server.truststore.password</name>
    <value>clientserverbar</value>
  </property>
  <property>
    <name>ssl.server.truststore.reload.interval</name>
    <value>10000</value>
  </property>
</configuration>
------

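
  The ownership and permission requirement for the server-side files can be
  sketched as follows. This is only an illustration: <<<DEMO_DIR>>> is a
  hypothetical scratch directory with empty stand-in files, and reading
  "exclusive read access" as mode 400 (owner read-only) is an assumption. On a
  real node you would apply the same <<<chmod>>> to the actual
  <<<ssl-server.xml>>> and keystore files and, as root, give them to the mapred
  user.

```shell
# Sketch only: DEMO_DIR and the empty stand-in files are hypothetical.
DEMO_DIR="$(mktemp -d)"
: > "$DEMO_DIR/ssl-server.xml"        # stand-in for the real config file
: > "$DEMO_DIR/server-keystore.jks"   # stand-in for the real keystore

# Owner may read; nobody else (including job tasks run as other users) may.
chmod 400 "$DEMO_DIR/ssl-server.xml" "$DEMO_DIR/server-keystore.jks"

# On a real cluster node, also hand the files to the mapred user (needs root):
#   chown mapred "$DEMO_DIR/ssl-server.xml" "$DEMO_DIR/server-keystore.jks"

ls -l "$DEMO_DIR"
```

  The client-side files described in the next subsection keep default
  permissions, because tasks of every submitting user must be able to read
  them.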
** <<<ssl-client.xml>>> (Reducer/Fetcher) Configuration:

  The mapred user should own the <<<ssl-client.xml>>> file and it should have
  default permissions.

*---------------------------------------------+---------------------+-----------------+
| <<Property>> | <<Default Value>> | <<Explanation>> |
*---------------------------------------------+---------------------+-----------------+
| <<<ssl.client.keystore.type>>> | <<<jks>>> | Keystore file type |
*---------------------------------------------+---------------------+-----------------+
| <<<ssl.client.keystore.location>>> | NONE | Keystore file location. The mapred user should own this file and it should have default permissions. |
*---------------------------------------------+---------------------+-----------------+
| <<<ssl.client.keystore.password>>> | NONE | Keystore file password |
*---------------------------------------------+---------------------+-----------------+
| <<<ssl.client.truststore.type>>> | <<<jks>>> | Truststore file type |
*---------------------------------------------+---------------------+-----------------+
| <<<ssl.client.truststore.location>>> | NONE | Truststore file location. The mapred user should own this file and it should have default permissions. |
*---------------------------------------------+---------------------+-----------------+
| <<<ssl.client.truststore.password>>> | NONE | Truststore file password |
*---------------------------------------------+---------------------+-----------------+
| <<<ssl.client.truststore.reload.interval>>> | 10000 | Truststore reload interval, in milliseconds |
*---------------------------------------------+---------------------+-----------------+

*** Example:

------
<configuration>

  <!-- Client Certificate Store -->
  <property>
    <name>ssl.client.keystore.type</name>
    <value>jks</value>
  </property>
  <property>
    <name>ssl.client.keystore.location</name>
    <value>${user.home}/keystores/client-keystore.jks</value>
  </property>
  <property>
    <name>ssl.client.keystore.password</name>
    <value>clientfoo</value>
  </property>

  <!-- Client Trust Store -->
  <property>
    <name>ssl.client.truststore.type</name>
    <value>jks</value>
  </property>
  <property>
    <name>ssl.client.truststore.location</name>
    <value>${user.home}/keystores/truststore.jks</value>
  </property>
  <property>
    <name>ssl.client.truststore.password</name>
    <value>clientserverbar</value>
  </property>
  <property>
    <name>ssl.client.truststore.reload.interval</name>
    <value>10000</value>
  </property>
</configuration>
------

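
  The keystore and truststore files referenced above have to exist before the
  shuffle can use them. One possible way to produce them is the JDK
  <<<keytool>>> utility, sketched below with a self-signed certificate. The
  host name, alias, and scratch directory <<<WORK>>> are hypothetical, the
  passwords simply mirror the example values above, and a production cluster
  may well use CA-signed certificates instead.

```shell
# Sketch only: HOST and WORK are hypothetical; passwords mirror the examples above.
WORK="$(mktemp -d)"
HOST="node1.example.com"

if command -v keytool >/dev/null 2>&1; then
  # 1. Key pair plus self-signed certificate for the shuffle server.
  keytool -genkeypair -alias "$HOST" -keyalg RSA -keysize 2048 -validity 365 \
    -dname "CN=$HOST" -storetype JKS \
    -keystore "$WORK/server-keystore.jks" \
    -storepass serverfoo -keypass serverfoo

  # 2. Export only the public certificate.
  keytool -exportcert -rfc -alias "$HOST" \
    -keystore "$WORK/server-keystore.jks" -storepass serverfoo \
    -file "$WORK/$HOST.crt"

  # 3. Import the certificate into the truststore that fetchers will load.
  keytool -importcert -noprompt -alias "$HOST" -file "$WORK/$HOST.crt" \
    -storetype JKS -keystore "$WORK/truststore.jks" -storepass clientserverbar

  # Show what the truststore now contains.
  keytool -list -keystore "$WORK/truststore.jks" -storepass clientserverbar
else
  echo "keytool not found; a JDK is needed to run this sketch" >&2
fi
```

  Repeating steps 1 to 3 for every node and importing each certificate into
  the same truststore yields a single <<<truststore.jks>>> that can be copied
  to all nodes, which matches the shared truststore location used in both
  examples.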
* Activating Encrypted Shuffle

  When you have made the above configuration changes, activate Encrypted
  Shuffle by restarting all NodeManagers.

  <<IMPORTANT:>> Using encrypted shuffle incurs a significant performance
  impact. Users should profile this and potentially reserve one or more cores
  for encrypted shuffle.

* {ClientCertificates} Client Certificates

  Using client certificates does not fully ensure that the client is a
  reducer task for the job. Currently, the keystore files holding the client
  certificates (and their private keys) must be readable by all users
  submitting jobs to the cluster. This means that a rogue job could read those
  keystore files and use the client certificates in them to establish a secure
  connection with a Shuffle server. However, unless the rogue job has a proper
  JobToken, it won't be able to retrieve shuffle data from the Shuffle server.
  A job, using its own JobToken, can only retrieve shuffle data that belongs
  to itself.

* Reloading Truststores

  By default the truststores will reload their configuration every 10 seconds.
  If a new truststore file is copied over the old one, it will be re-read,
  and its certificates will replace the old ones. This mechanism is useful for
  adding or removing nodes from the cluster, or for adding or removing trusted
  clients. In these cases, the client or NodeManager certificate is added to
  (or removed from) all the truststore files in the system, and the new
  configuration will be picked up without you having to restart the NodeManager
  daemons.

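
  Adding a node's certificate to a truststore, as described above, might look
  like the following sketch. The paths, the host name <<<node2.example.com>>>,
  and the throwaway certificate are hypothetical. Editing a copy and then
  renaming it over the live file is a design choice made here, not something
  the shuffle requires, so that a reload never reads a half-written file.

```shell
# Sketch only: WORK, TRUSTSTORE, and NEW_NODE are hypothetical placeholders.
WORK="$(mktemp -d)"
TRUSTSTORE="$WORK/truststore.jks"   # stands in for the configured truststore location
NEW_NODE="node2.example.com"

if command -v keytool >/dev/null 2>&1; then
  # Demo setup: a throwaway self-signed certificate for the new node.
  keytool -genkeypair -alias "$NEW_NODE" -keyalg RSA -keysize 2048 \
    -dname "CN=$NEW_NODE" -storetype JKS -keystore "$WORK/new-node.jks" \
    -storepass changeit -keypass changeit
  keytool -exportcert -rfc -alias "$NEW_NODE" -keystore "$WORK/new-node.jks" \
    -storepass changeit -file "$WORK/new-node.crt"

  # Work on a copy of the current truststore (if one exists yet).
  if [ -f "$TRUSTSTORE" ]; then cp "$TRUSTSTORE" "$TRUSTSTORE.new"; fi
  keytool -importcert -noprompt -alias "$NEW_NODE" -file "$WORK/new-node.crt" \
    -storetype JKS -keystore "$TRUSTSTORE.new" -storepass clientserverbar

  # Swap the new file in; it is picked up at the next reload interval.
  mv "$TRUSTSTORE.new" "$TRUSTSTORE"
else
  echo "keytool not found; a JDK is needed to run this sketch" >&2
fi
```

  Removing a node works the same way with <<<keytool -delete -alias>>> applied
  to the copy. The updated truststore then has to be copied to every node by
  whatever distribution mechanism the cluster already uses.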
* Debugging

  <<NOTE:>> Enable debugging only for troubleshooting, and then only for jobs
  running on small amounts of data. It is very verbose and slows down jobs by
  several orders of magnitude. (You might need to increase
  <<<mapred.task.timeout>>> to prevent jobs from failing because tasks run so
  slowly.)

  To enable SSL debugging in the reducers, set <<<-Djavax.net.debug=all>>> in
  the <<<mapreduce.reduce.java.opts>>> property (the older name
  <<<mapred.reduce.child.java.opts>>> is a deprecated alias); for example:

------
  <property>
    <name>mapreduce.reduce.java.opts</name>
    <value>-Xmx200m -Djavax.net.debug=all</value>
  </property>
------

  You can do this on a per-job basis, or by means of a cluster-wide setting in
  the <<<mapred-site.xml>>> file.

  To set this property in the NodeManager, set it in the <<<yarn-env.sh>>> file:

------
  YARN_NODEMANAGER_OPTS="-Djavax.net.debug=all $YARN_NODEMANAGER_OPTS"
------