
  1. ~~ Licensed under the Apache License, Version 2.0 (the "License");
  2. ~~ you may not use this file except in compliance with the License.
  3. ~~ You may obtain a copy of the License at
  4. ~~
  5. ~~ http://www.apache.org/licenses/LICENSE-2.0
  6. ~~
  7. ~~ Unless required by applicable law or agreed to in writing, software
  8. ~~ distributed under the License is distributed on an "AS IS" BASIS,
  9. ~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  10. ~~ See the License for the specific language governing permissions and
  11. ~~ limitations under the License. See accompanying LICENSE file.
  12. ---
  13. Hadoop Distributed File System-${project.version} - Centralized Cache Management in HDFS
  14. ---
  15. ---
  16. ${maven.build.timestamp}
  17. Centralized Cache Management in HDFS
  18. %{toc|section=1|fromDepth=2|toDepth=4}
  19. * {Overview}
  20. <Centralized cache management> in HDFS is an explicit caching mechanism that
  21. allows users to specify <paths> to be cached by HDFS. The NameNode will
  22. communicate with DataNodes that have the desired blocks on disk, and instruct
  23. them to cache the blocks in off-heap caches.
  24. Centralized cache management in HDFS has many significant advantages.
  25. [[1]] Explicit pinning prevents frequently used data from being evicted from
  26. memory. This is particularly important when the size of the working set
  27. exceeds the size of main memory, which is common for many HDFS workloads.
  28. [[1]] Because DataNode caches are managed by the NameNode, applications can
  29. query the set of cached block locations when making task placement decisions.
  30. Co-locating a task with a cached block replica improves read performance.
  31. [[1]] When a block has been cached by a DataNode, clients can use a new,
  32. more-efficient, zero-copy read API. Since checksum verification of cached
  33. data is done once by the DataNode, clients can incur essentially zero
  34. overhead when using this new API.
  35. [[1]] Centralized caching can improve overall cluster memory utilization.
  36. When relying on the OS buffer cache at each DataNode, repeated reads of
  37. a block will result in all <n> replicas of the block being pulled into
  38. buffer cache. With centralized cache management, a user can explicitly pin
  39. only <m> of the <n> replicas, saving <n-m> replicas' worth of memory.
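As a rough illustration of the last point, the sketch below compares the memory used when all <n> replicas of one block end up in the buffer cache with the memory used when only <m> replicas are pinned. The block size, the replica counts, and the function name are illustrative assumptions, not values taken from HDFS.

```python
def cache_memory_bytes(block_size_bytes, replicas_in_memory):
    """Memory used cluster-wide when this many replicas of one block sit in RAM."""
    return block_size_bytes * replicas_in_memory

block_size = 128 * 1024 * 1024  # assumed block size: 128 MiB
n = 3                           # replicas that repeated reads pull into the buffer cache
m = 1                           # replicas explicitly pinned by centralized caching

buffer_cache_cost = cache_memory_bytes(block_size, n)
pinned_cost = cache_memory_bytes(block_size, m)
saved = buffer_cache_cost - pinned_cost  # (n - m) copies of the block

print(saved // (1024 * 1024), "MiB saved per block")  # 256 MiB saved per block
```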
  40. * {Use Cases}
  41. Centralized cache management is useful for files that are accessed repeatedly.
  42. For example, a small <fact table> in Hive which is often used for joins is a
  43. good candidate for caching. On the other hand, caching the input of a
  44. <one-year reporting query> is probably less useful, since the
  45. historical data might only be read once.
  46. Centralized cache management is also useful for mixed workloads with
  47. performance SLAs. Caching the working set of a high-priority workload
  48. ensures that it does not contend for disk I/O with a low-priority workload.
  49. * {Architecture}
  50. [images/caching.png] Caching Architecture
  51. In this architecture, the NameNode is responsible for coordinating all the
  52. DataNode off-heap caches in the cluster. The NameNode periodically receives
  53. a <cache report> from each DataNode which describes all the blocks cached
  54. on a given DN. The NameNode manages DataNode caches by piggybacking cache and
  55. uncache commands on the DataNode heartbeat.
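The reconciliation described above can be pictured as a set difference between what the NameNode wants cached on a DataNode and what that DataNode's latest cache report lists. This is only a conceptual sketch: the function name and the use of plain integer block IDs are assumptions, and it does not reproduce the NameNode's actual implementation.

```python
def plan_cache_commands(desired, reported):
    """Return (to_cache, to_uncache) for a single DataNode.

    desired:  block IDs the NameNode wants cached on this DataNode
    reported: block IDs listed in the DataNode's latest cache report
    """
    desired, reported = set(desired), set(reported)
    to_cache = sorted(desired - reported)    # wanted but not yet cached
    to_uncache = sorted(reported - desired)  # cached but no longer wanted
    return to_cache, to_uncache

to_cache, to_uncache = plan_cache_commands(desired=[101, 102, 103],
                                           reported=[102, 104])
print(to_cache)    # [101, 103]
print(to_uncache)  # [104]
```

In this picture, the two lists would be the cache and uncache commands piggybacked on the next heartbeat response.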
  56. The NameNode queries its set of <cache directives> to determine
  57. which paths should be cached. Cache directives are persistently stored in the
  58. fsimage and edit log, and can be added, removed, and modified via Java and
  59. command-line APIs. The NameNode also stores a set of <cache pools>,
  60. which are administrative entities used to group cache directives together for
  61. resource management and enforcing permissions.
  62. The NameNode periodically rescans the namespace and active cache directives
  63. to determine which blocks need to be cached or uncached and assigns caching
  64. work to DataNodes. Rescans can also be triggered by user actions like adding
  65. or removing a cache directive or removing a cache pool.
  66. We do not currently cache blocks which are under construction, corrupt, or
  67. otherwise incomplete. If a cache directive covers a symlink, the symlink
  68. target is not cached.
  69. Caching is currently done at the file or directory level. Block and sub-block
  70. caching is an item of future work.
  71. * {Concepts}
  72. ** {Cache directive}
  73. A <cache directive> defines a path that should be cached. Paths can be either
  74. directories or files. Directories are cached non-recursively, meaning only
  75. files in the first-level listing of the directory are cached.
  76. Directives also specify additional parameters, such as the cache replication
  77. factor and expiration time. The replication factor specifies the number of
  78. block replicas to cache. If multiple cache directives refer to the same file,
  79. the maximum cache replication factor is applied.
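The rule for overlapping directives can be sketched as taking a maximum. The dictionary layout and function name are hypothetical, and for brevity the sketch matches only directives whose path is exactly the file path; directives on a parent directory are not modeled.

```python
def effective_cache_replication(directives, file_path):
    """Largest cache replication factor among directives naming file_path.

    Returns 0 when no directive refers to the file (nothing is cached).
    """
    factors = [d["replication"] for d in directives if d["path"] == file_path]
    return max(factors, default=0)

# Hypothetical directives; two of them refer to the same file.
directives = [
    {"path": "/warehouse/dim_date", "replication": 1},
    {"path": "/warehouse/dim_date", "replication": 3},
    {"path": "/warehouse/dim_region", "replication": 2},
]
print(effective_cache_replication(directives, "/warehouse/dim_date"))  # 3
```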
  80. The expiration time is specified on the command line as a <time-to-live
  81. (TTL)>, a relative expiration time in the future. After a cache directive
  82. expires, it is no longer considered by the NameNode when making caching
  83. decisions.
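To make the relationship between a relative TTL and an absolute expiration time concrete, the sketch below converts a TTL string such as 4h into an expiry timestamp and checks whether it has passed. The helper names are hypothetical; only the unit letters (s, m, h, d) and the value "never" come from the command-line documentation.

```python
from datetime import datetime, timedelta

# Unit letters accepted on the command line, mapped to timedelta keywords.
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}

def expiry_from_ttl(ttl, now):
    """Convert a relative TTL such as '30m' into an absolute expiry time.

    Returns None for 'never', meaning the directive does not expire.
    """
    if ttl == "never":
        return None
    unit = ttl[-1]
    if unit not in _UNITS:
        raise ValueError("unknown TTL unit: " + repr(unit))
    return now + timedelta(**{_UNITS[unit]: int(ttl[:-1])})

def is_expired(expiry, now):
    """An expired directive is ignored when caching decisions are made."""
    return expiry is not None and now >= expiry

now = datetime(2024, 1, 1)
expiry = expiry_from_ttl("4h", now)
print(expiry)                                        # 2024-01-01 04:00:00
print(is_expired(expiry, now + timedelta(hours=5)))  # True
```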
  84. ** {Cache pool}
  85. A <cache pool> is an administrative entity used to manage groups of cache
  86. directives. Cache pools have UNIX-like <permissions>, which restrict which
  87. users and groups have access to the pool. Write permissions allow users to
  88. add cache directives to and remove them from the pool. Read permissions allow users to
  89. list the cache directives in a pool, as well as additional metadata. Execute
  90. permissions are unused.
  91. Cache pools are also used for resource management. Pools can enforce a
  92. maximum <limit>, which restricts the number of bytes that can be cached in
  93. aggregate by directives in the pool. Normally, the sum of the pool limits
  94. will approximately equal the amount of aggregate memory reserved for
  95. HDFS caching on the cluster. Cache pools also track a number of statistics
  96. to help cluster users determine what is and should be cached.
  97. Pools can also enforce a maximum time-to-live. This restricts the maximum
  98. expiration time of directives being added to the pool.
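The two pool constraints can be pictured as an admission check that runs when a directive is added. Everything in this sketch is an assumption made for illustration: the field names, the bookkeeping of bytes already needed by the pool, and the choice to let a force flag bypass only the byte limit (mirroring the documented -force option, which skips resource limit checks).

```python
def check_add_directive(pool, needed_bytes, ttl_seconds, force=False):
    """Return the names of pool constraints the new directive would violate.

    pool is a hypothetical dict with optional 'limit' and 'max_ttl_seconds'
    keys and a 'bytes_needed' counter for directives already in the pool.
    """
    problems = []
    limit = pool.get("limit")
    if limit is not None and not force:
        if pool["bytes_needed"] + needed_bytes > limit:
            problems.append("limit")
    max_ttl = pool.get("max_ttl_seconds")
    if max_ttl is not None and ttl_seconds > max_ttl:
        problems.append("maxTtl")
    return problems

pool = {"limit": 1000, "bytes_needed": 900, "max_ttl_seconds": 3600}
print(check_add_directive(pool, 200, 600))              # ['limit']
print(check_add_directive(pool, 200, 600, force=True))  # []
print(check_add_directive(pool, 50, 7200))              # ['maxTtl']
```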
  99. * {<<<cacheadmin>>> command-line interface}
  100. On the command-line, administrators and users can interact with cache pools
  101. and directives via the <<<hdfs cacheadmin>>> subcommand.
  102. Cache directives are identified by a unique, non-repeating 64-bit integer ID.
  103. IDs will not be reused even if a cache directive is later removed.
  104. Cache pools are identified by a unique string name.
  105. ** {Cache directive commands}
  106. *** {addDirective}
  107. Usage: <<<hdfs cacheadmin -addDirective -path <path> -pool <pool-name> [-force] [-replication <replication>] [-ttl <time-to-live>]>>>
  108. Add a new cache directive.
  109. *--+--+
  110. \<path\> | A path to cache. The path can be a directory or a file.
  111. *--+--+
  112. \<pool-name\> | The pool to which the directive will be added. You must have write permission on the cache pool in order to add new directives.
  113. *--+--+
  114. -force | Skips checking of cache pool resource limits.
  115. *--+--+
  116. \<replication\> | The cache replication factor to use. Defaults to 1.
  117. *--+--+
  118. \<time-to-live\> | How long the directive is valid. Can be specified in seconds, minutes, hours, and days, e.g. 120s, 30m, 4h, 2d. Valid units are [smhd]. "never" indicates a directive that never expires. If unspecified, the directive never expires.
  119. *--+--+
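For scripted use, the documented options can be assembled into an argument list before the command is run. The sketch below only builds the list and uses nothing beyond the flags in the usage line above; the function name, the path, and the pool name are hypothetical.

```python
def add_directive_argv(path, pool, force=False, replication=None, ttl=None):
    """Build the argument list for 'hdfs cacheadmin -addDirective'."""
    argv = ["hdfs", "cacheadmin", "-addDirective", "-path", path, "-pool", pool]
    if force:
        argv.append("-force")
    if replication is not None:
        argv += ["-replication", str(replication)]
    if ttl is not None:
        argv += ["-ttl", ttl]
    return argv

# Hypothetical path and pool name.
argv = add_directive_argv("/warehouse/dim_date", "reports", replication=2, ttl="4h")
print(" ".join(argv))
# hdfs cacheadmin -addDirective -path /warehouse/dim_date -pool reports -replication 2 -ttl 4h
```

On a host with the HDFS client installed, the resulting list could be passed to subprocess.run(argv, check=True).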
  120. *** {removeDirective}
  121. Usage: <<<hdfs cacheadmin -removeDirective <id> >>>
  122. Remove a cache directive.
  123. *--+--+
  124. \<id\> | The id of the cache directive to remove. You must have write permission on the pool of the directive in order to remove it. To see a list of cache directive IDs, use the -listDirectives command.
  125. *--+--+
  126. *** {removeDirectives}
  127. Usage: <<<hdfs cacheadmin -removeDirectives <path> >>>
  128. Remove every cache directive with the specified path.
  129. *--+--+
  130. \<path\> | The path of the cache directives to remove. You must have write permission on the pool of the directive in order to remove it. To see a list of cache directives, use the -listDirectives command.
  131. *--+--+
  132. *** {listDirectives}
  133. Usage: <<<hdfs cacheadmin -listDirectives [-stats] [-path <path>] [-pool <pool>]>>>
  134. List cache directives.
  135. *--+--+
  136. \<path\> | List only cache directives with this path. Note that if there is a cache directive for \<path\> in a cache pool for which you do not have read access, it will not be listed.
  137. *--+--+
  138. \<pool\> | List only cache directives in this pool.
  139. *--+--+
  140. -stats | List path-based cache directive statistics.
  141. *--+--+
  142. ** {Cache pool commands}
  143. *** {addPool}
  144. Usage: <<<hdfs cacheadmin -addPool <name> [-owner <owner>] [-group <group>] [-mode <mode>] [-limit <limit>] [-maxTtl <maxTtl>]>>>
  145. Add a new cache pool.
  146. *--+--+
  147. \<name\> | Name of the new pool.
  148. *--+--+
  149. \<owner\> | Username of the owner of the pool. Defaults to the current user.
  150. *--+--+
  151. \<group\> | Group of the pool. Defaults to the primary group name of the current user.
  152. *--+--+
  153. \<mode\> | UNIX-style permissions for the pool. Permissions are specified in octal, e.g. 0755. By default, this is set to 0755.
  154. *--+--+
  155. \<limit\> | The maximum number of bytes that can be cached by directives in this pool, in aggregate. By default, no limit is set.
  156. *--+--+
  157. \<maxTtl\> | The maximum allowed time-to-live for directives being added to the pool. This can be specified in seconds, minutes, hours, and days, e.g. 120s, 30m, 4h, 2d. Valid units are [smhd]. By default, no maximum is set. A value of \"never\" specifies that there is no limit.
  158. *--+--+
  159. *** {modifyPool}
  160. Usage: <<<hdfs cacheadmin -modifyPool <name> [-owner <owner>] [-group <group>] [-mode <mode>] [-limit <limit>] [-maxTtl <maxTtl>]>>>
  161. Modifies the metadata of an existing cache pool.
  162. *--+--+
  163. \<name\> | Name of the pool to modify.
  164. *--+--+
  165. \<owner\> | Username of the owner of the pool.
  166. *--+--+
  167. \<group\> | Groupname of the group of the pool.
  168. *--+--+
  169. \<mode\> | Unix-style permissions of the pool in octal.
  170. *--+--+
  171. \<limit\> | Maximum number of bytes that can be cached by this pool.
  172. *--+--+
  173. \<maxTtl\> | The maximum allowed time-to-live for directives being added to the pool.
  174. *--+--+
  175. *** {removePool}
  176. Usage: <<<hdfs cacheadmin -removePool <name> >>>
  177. Remove a cache pool. This also uncaches paths associated with the pool.
  178. *--+--+
  179. \<name\> | Name of the cache pool to remove.
  180. *--+--+
  181. *** {listPools}
  182. Usage: <<<hdfs cacheadmin -listPools [-stats] [<name>]>>>
  183. Display information about one or more cache pools, e.g. name, owner, group,
  184. permissions, etc.
  185. *--+--+
  186. -stats | Display additional cache pool statistics.
  187. *--+--+
  188. \<name\> | If specified, list only the named cache pool.
  189. *--+--+
  190. *** {help}
  191. Usage: <<<hdfs cacheadmin -help <command-name> >>>
  192. Get detailed help about a command.
  193. *--+--+
  194. \<command-name\> | The command for which to get detailed help. If no command is specified, print detailed help for all commands.
  195. *--+--+
  196. * {Configuration}
  197. ** {Native Libraries}
  198. In order to lock block files into memory, the DataNode relies on native JNI
  199. code found in <<<libhadoop.so>>> or <<<hadoop.dll>>> on Windows. Be sure to
  200. {{{../hadoop-common/NativeLibraries.html}enable JNI}} if you are using HDFS
  201. centralized cache management.
  202. ** {Configuration Properties}
  203. *** Required
  204. Be sure to configure the following:
  205. * dfs.datanode.max.locked.memory
  206. This determines the maximum amount of memory a DataNode will use for caching.
  207. On Unix-like systems, the "locked-in-memory size" ulimit (<<<ulimit -l>>>) of
  208. the DataNode user also needs to be increased to match this parameter (see
  209. the {{OS Limits}} section below). When setting this value, please remember
  210. that you will need space in memory for other things as well, such as the
  211. DataNode and application JVM heaps and the operating system page cache.
  212. *** Optional
  213. The following properties are not required, but may be specified for tuning:
  214. * dfs.namenode.path.based.cache.refresh.interval.ms
  215. The NameNode will use this as the number of milliseconds between subsequent
  216. path cache rescans. Each rescan calculates which blocks to cache and which
  217. DataNodes holding a replica of each block should cache it.
  218. By default, this parameter is set to 300000, which is five minutes.
  219. * dfs.datanode.fsdatasetcache.max.threads.per.volume
  220. The DataNode will use this as the maximum number of threads per volume to
  221. use for caching new data.
  222. By default, this parameter is set to 4.
  223. * dfs.cachereport.intervalMsec
  224. The DataNode will use this as the number of milliseconds between full
  225. reports of its cache state to the NameNode.
  226. By default, this parameter is set to 10000, which is 10 seconds.
  227. * dfs.namenode.path.based.cache.block.map.allocation.percent
  228. The percentage of the Java heap which we will allocate to the cached blocks
  229. map. The cached blocks map is a hash map which uses chained hashing.
  230. Smaller maps may be accessed more slowly if the number of cached blocks is
  231. large; larger maps will consume more memory. The default is 0.25 percent.
  232. ** {OS Limits}
  233. If you get the error "Cannot start datanode because the configured max
  234. locked memory size... is more than the datanode's available RLIMIT_MEMLOCK
  235. ulimit," that means that the operating system is imposing a lower limit
  236. on the amount of memory that you can lock than what you have configured. To
  237. fix this, you must adjust the ulimit -l value that the DataNode runs with.
  238. Usually, this value is configured in <<</etc/security/limits.conf>>>.
  239. However, it will vary depending on what operating system and distribution
  240. you are using.
  241. You will know that you have correctly configured this value when you can run
  242. <<<ulimit -l>>> from the shell and get back either a higher value than what
  243. you have configured with <<<dfs.datanode.max.locked.memory>>>, or the string
  244. "unlimited," indicating that there is no limit. Note that it's typical for
  245. <<<ulimit -l>>> to output the memory lock limit in KB, but
  246. <<<dfs.datanode.max.locked.memory>>> must be specified in bytes.
  247. This information does not apply to deployments on Windows. Windows has no
  248. direct equivalent of <<<ulimit -l>>>.
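On Unix-like systems, the unit mismatch described above is easy to get wrong, so the comparison can be scripted. The sketch below takes the text printed by ulimit -l and the configured byte value and reports whether the limit is sufficient. It assumes that ulimit -l reports KB, which is typical but not guaranteed on every shell; the function name is hypothetical.

```python
def memlock_limit_ok(ulimit_l_output, max_locked_memory_bytes):
    """Check the output of 'ulimit -l' against dfs.datanode.max.locked.memory.

    Assumes ulimit -l prints either 'unlimited' or a number in KB.
    """
    text = ulimit_l_output.strip()
    if text == "unlimited":
        return True
    limit_bytes = int(text) * 1024  # convert KB to bytes before comparing
    return limit_bytes >= max_locked_memory_bytes

one_gib = 1024 * 1024 * 1024
print(memlock_limit_ok("64", one_gib))         # False: 64 KB is far below 1 GiB
print(memlock_limit_ok("1048576", one_gib))    # True: 1048576 KB equals 1 GiB
print(memlock_limit_ok("unlimited", one_gib))  # True
```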