  1. <?xml version="1.0" encoding="UTF-8"?>
  2. <!--
  3. Licensed to the Apache Software Foundation (ASF) under one or more
  4. contributor license agreements. See the NOTICE file distributed with
  5. this work for additional information regarding copyright ownership.
  6. The ASF licenses this file to You under the Apache License, Version 2.0
  7. (the "License"); you may not use this file except in compliance with
  8. the License. You may obtain a copy of the License at
  9. http://www.apache.org/licenses/LICENSE-2.0
  10. Unless required by applicable law or agreed to in writing, software
  11. distributed under the License is distributed on an "AS IS" BASIS,
  12. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  13. See the License for the specific language governing permissions and
  14. limitations under the License.
  15. -->
  16. <document xmlns="http://maven.apache.org/XDOC/2.0"
  17. xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  18. xsi:schemaLocation="http://maven.apache.org/XDOC/2.0 http://maven.apache.org/xsd/xdoc-2.0.xsd">
  19. <properties>
  20. <title>HDFS Snapshots</title>
  21. </properties>
  22. <body>
  23. <h1>HDFS Snapshots</h1>
  24. <macro name="toc">
  25. <param name="section" value="0"/>
  26. <param name="fromDepth" value="0"/>
  27. <param name="toDepth" value="4"/>
  28. </macro>
  29. <section name="Overview" id="Overview">
  30. <p>
  31. HDFS Snapshots are read-only point-in-time copies of the file system.
  32. Snapshots can be taken on a subtree of the file system or the entire file system.
  33. Some common use cases of snapshots are data backup, protection against user errors
  34. and disaster recovery.
  35. </p>
  36. <p>
  37. The implementation of HDFS Snapshots is efficient:
  38. </p>
  39. <ul>
  40. <li>Snapshot creation is instantaneous:
  41. the cost is <em>O(1)</em> excluding the inode lookup time.</li>
  42. <li>Additional memory is used only when modifications are made relative to a snapshot:
  43. memory usage is <em>O(M)</em>,
  44. where <em>M</em> is the number of modified files/directories.</li>
  45. <li>Blocks in datanodes are not copied:
  46. the snapshot files record the block list and the file size.
  47. There is no data copying.</li>
  48. <li>Snapshots do not adversely affect regular HDFS operations:
  49. modifications are recorded in reverse chronological order
  50. so that the current data can be accessed directly.
  51. The snapshot data is computed by subtracting the modifications
  52. from the current data.</li>
  53. </ul>
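<p>
The last point can be illustrated with a toy model. The sketch below is not the
NameNode's actual data structure; the class <code>ReverseDiffModel</code> and its
methods are hypothetical names chosen for illustration. It keeps the current state
directly and, for each snapshot, records only the prior values of the entries
modified after that snapshot. A snapshot view is rebuilt by undoing the recorded
changes, newest first.
</p>
<source><![CDATA[
```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Toy model only (hypothetical names); it is not how the NameNode stores snapshots.
final class ReverseDiffModel {
    // Current state: path -> content. Reads of live data never consult the diffs.
    private final Map<String, String> current = new HashMap<>();

    // One diff per snapshot, oldest first. Each diff maps a path to the value it had
    // when that snapshot was taken (Optional.empty() means the path did not exist).
    private final List<Map<String, Optional<String>>> diffs = new ArrayList<>();

    // Taking a snapshot copies nothing: it only opens a new, empty diff.
    int createSnapshot() {
        diffs.add(new HashMap<>());
        return diffs.size() - 1;
    }

    void write(String path, String content) {
        recordOldValue(path);
        current.put(path, content);
    }

    void delete(String path) {
        recordOldValue(path);
        current.remove(path);
    }

    // Remember the pre-modification value once per path in the latest diff,
    // so memory grows with the number of modified entries only.
    private void recordOldValue(String path) {
        if (diffs.isEmpty()) {
            return;
        }
        Map<String, Optional<String>> latest = diffs.get(diffs.size() - 1);
        latest.putIfAbsent(path, Optional.ofNullable(current.get(path)));
    }

    // Rebuild a snapshot by starting from the current state and undoing the
    // recorded modifications from the newest diff back to the requested one.
    Map<String, String> view(int snapshotId) {
        Map<String, String> state = new HashMap<>(current);
        for (int i = diffs.size() - 1; i >= snapshotId; i--) {
            for (Map.Entry<String, Optional<String>> e : diffs.get(i).entrySet()) {
                if (e.getValue().isPresent()) {
                    state.put(e.getKey(), e.getValue().get());
                } else {
                    state.remove(e.getKey());
                }
            }
        }
        return state;
    }
}
```
]]></source>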
  54. <subsection name="Snapshottable Directories" id="SnapshottableDirectories">
  55. <p>
  56. Snapshots can be taken on any directory once the directory has been set as
  57. <em>snapshottable</em>.
  58. A snapshottable directory is able to accommodate 65,536 simultaneous snapshots.
  59. There is no limit on the number of snapshottable directories.
  60. Administrators may set any directory to be snapshottable.
  61. If there are snapshots in a snapshottable directory,
  62. the directory can be neither deleted nor renamed
  63. before all the snapshots are deleted.
  64. </p>
  65. <p>
  66. Nested snapshottable directories are currently not allowed.
  67. In other words, a directory cannot be set to snapshottable
  68. if one of its ancestors/descendants is a snapshottable directory.
  69. </p>
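<p>
The nesting rule can be sketched as a plain path check. The class
<code>NestedSnapshottableCheck</code> and its methods are hypothetical helpers,
not part of the HDFS API, and the sketch assumes absolute, normalized paths
without a trailing slash.
</p>
<source><![CDATA[
```java
import java.util.List;

// Hypothetical helper illustrating the "no nested snapshottable directories" rule.
final class NestedSnapshottableCheck {
    // True if 'ancestor' is a proper ancestor of 'path'.
    // Assumes absolute, normalized paths with no trailing slash (except the root "/").
    static boolean isAncestor(String ancestor, String path) {
        if (ancestor.equals(path)) {
            return false;
        }
        String prefix = ancestor.equals("/") ? "/" : ancestor + "/";
        return path.startsWith(prefix);
    }

    // True if 'candidate' has no snapshottable ancestor and no snapshottable descendant.
    static boolean canBeSnapshottable(String candidate, List<String> snapshottableDirs) {
        for (String dir : snapshottableDirs) {
            if (isAncestor(dir, candidate) || isAncestor(candidate, dir)) {
                return false;
            }
        }
        return true;
    }
}
```
]]></source>
<p>
With <code>/foo</code> already snapshottable, this check rejects both
<code>/foo/bar</code> (snapshottable ancestor) and <code>/</code>
(snapshottable descendant), but accepts the sibling <code>/foobar</code>.
</p>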
  70. </subsection>
  71. <subsection name="Snapshot Paths" id="SnapshotPaths">
  72. <p>
  73. For a snapshottable directory,
  74. the path component <em>".snapshot"</em> is used for accessing its snapshots.
  75. Suppose <code>/foo</code> is a snapshottable directory,
  76. <code>/foo/bar</code> is a file/directory in <code>/foo</code>,
  77. and <code>/foo</code> has a snapshot <code>s0</code>.
  78. Then, the path <source>/foo/.snapshot/s0/bar</source>
  79. refers to the snapshot copy of <code>/foo/bar</code>.
  80. The usual API and CLI can work with the ".snapshot" paths.
  81. The following are some examples.
  82. </p>
  83. <ul>
  84. <li>Listing all the snapshots under a snapshottable directory:
  85. <source>hdfs dfs -ls /foo/.snapshot</source></li>
  86. <li>Listing the files in snapshot <code>s0</code>:
  87. <source>hdfs dfs -ls /foo/.snapshot/s0</source></li>
  88. <li>Copying a file from snapshot <code>s0</code>:
  89. <source>hdfs dfs -cp /foo/.snapshot/s0/bar /tmp</source></li>
  90. </ul>
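<p>
When such paths are built programmatically, the layout described above reduces to
simple string composition. The helper below is a minimal sketch; the class
<code>SnapshotPaths</code> and its method are hypothetical and not part of the
HDFS API.
</p>
<source><![CDATA[
```java
// Hypothetical helper that composes ".snapshot" paths as described above.
final class SnapshotPaths {
    static final String DOT_SNAPSHOT = ".snapshot";

    // Returns <snapshottableDir>/.snapshot/<snapshotName>[/<relativePath>].
    // 'relativePath' is relative to the snapshottable directory; pass "" for the
    // snapshot root itself.
    static String snapshotPath(String snapshottableDir, String snapshotName,
                               String relativePath) {
        String base = snapshottableDir.endsWith("/")
            ? snapshottableDir
            : snapshottableDir + "/";
        String path = base + DOT_SNAPSHOT + "/" + snapshotName;
        return relativePath.isEmpty() ? path : path + "/" + relativePath;
    }
}
```
]]></source>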
  91. </subsection>
  92. </section>
  93. <section name="Upgrading to a version of HDFS with snapshots" id="Upgrade">
  94. <p>
  95. The HDFS snapshot feature introduces a new reserved path name used to
  96. interact with snapshots: <tt>.snapshot</tt>. When upgrading from an
  97. older version of HDFS, existing paths named <tt>.snapshot</tt> need
  98. to first be renamed or deleted to avoid conflicting with the reserved path.
  99. See the upgrade section in
  100. <a href="HdfsUserGuide.html#Upgrade_and_Rollback">the HDFS user guide</a>
  101. for more information. </p>
  102. </section>
  103. <section name="Snapshot Operations" id="SnapshotOperations">
  104. <subsection name="Administrator Operations" id="AdministratorOperations">
  105. <p>
  106. The operations described in this section require superuser privilege.
  107. </p>
  108. <h4>Allow Snapshots</h4>
  109. <p>
  110. Allow snapshots of a directory to be created.
  111. If the operation completes successfully, the directory becomes snapshottable.
  112. </p>
  113. <ul>
  114. <li>Command:
  115. <source>hdfs dfsadmin -allowSnapshot &lt;path&gt;</source></li>
  116. <li>Arguments:<table>
  117. <tr><td>path</td><td>The path of the snapshottable directory.</td></tr>
  118. </table></li>
  119. </ul>
  120. <p>
  121. See also the corresponding Java API
  122. <code>void allowSnapshot(Path path)</code> in <code>HdfsAdmin</code>.
  123. </p>
  124. <h4>Disallow Snapshots</h4>
  125. <p>
  126. Disallow snapshots of a directory from being created.
  127. All snapshots of the directory must be deleted before disallowing snapshots.
  128. </p>
  129. <ul>
  130. <li>Command:
  131. <source>hdfs dfsadmin -disallowSnapshot &lt;path&gt;</source></li>
  132. <li>Arguments:<table>
  133. <tr><td>path</td><td>The path of the snapshottable directory.</td></tr>
  134. </table></li>
  135. </ul>
  136. <p>
  137. See also the corresponding Java API
  138. <code>void disallowSnapshot(Path path)</code> in <code>HdfsAdmin</code>.
  139. </p>
  140. </subsection>
  141. <subsection name="User Operations" id="UserOperations">
  142. <p>
  143. This section describes user operations.
  144. Note that the HDFS superuser can perform all the operations
  145. without satisfying the permission requirements of the individual operations.
  146. </p>
  147. <h4>Create Snapshots</h4>
  148. <p>
  149. Create a snapshot of a snapshottable directory.
  150. This operation requires owner privilege of the snapshottable directory.
  151. </p>
  152. <ul>
  153. <li>Command:
  154. <source>hdfs dfs -createSnapshot &lt;path&gt; [&lt;snapshotName&gt;]</source></li>
  155. <li>Arguments:<table>
  156. <tr><td>path</td><td>The path of the snapshottable directory.</td></tr>
  157. <tr><td>snapshotName</td><td>
  158. The snapshot name, which is an optional argument.
  159. When it is omitted, a default name is generated using a timestamp with the format
  160. <code>"'s'yyyyMMdd-HHmmss.SSS"</code>, e.g. "s20130412-151029.033".
  161. </td></tr>
  162. </table></li>
  163. </ul>
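<p>
The default name format can be reproduced with the standard Java date formatter.
This is a minimal sketch: the class <code>DefaultSnapshotName</code> is
hypothetical, and the use of <code>java.time</code> is an assumption made for
illustration rather than a statement about how HDFS generates the name internally.
</p>
<source><![CDATA[
```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper; only the pattern string comes from the documentation above.
final class DefaultSnapshotName {
    // Literal 's', then date, a dash, time of day, a dot, and milliseconds.
    private static final DateTimeFormatter FORMAT =
        DateTimeFormatter.ofPattern("'s'yyyyMMdd-HHmmss.SSS");

    static String forTime(LocalDateTime time) {
        return FORMAT.format(time);
    }
}
```
]]></source>
<p>
For 12 April 2013 at 15:10:29.033, this yields the name
<code>s20130412-151029.033</code> given as the example above.
</p>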
  164. <p>
  165. See also the corresponding Java API
  166. <code>Path createSnapshot(Path path)</code> and
  167. <code>Path createSnapshot(Path path, String snapshotName)</code>
  168. in <a href="../../api/org/apache/hadoop/fs/FileSystem.html"><code>FileSystem</code></a>.
  169. The snapshot path is returned in these methods.
  170. </p>
  171. <h4>Delete Snapshots</h4>
  172. <p>
  173. Delete a snapshot from a snapshottable directory.
  174. This operation requires owner privilege of the snapshottable directory.
  175. </p>
  176. <ul>
  177. <li>Command:
  178. <source>hdfs dfs -deleteSnapshot &lt;path&gt; &lt;snapshotName&gt;</source></li>
  179. <li>Arguments:<table>
  180. <tr><td>path</td><td>The path of the snapshottable directory.</td></tr>
  181. <tr><td>snapshotName</td><td>The snapshot name.</td></tr>
  182. </table></li>
  183. </ul>
  184. <p>
  185. See also the corresponding Java API
  186. <code>void deleteSnapshot(Path path, String snapshotName)</code>
  187. in <a href="../../api/org/apache/hadoop/fs/FileSystem.html"><code>FileSystem</code></a>.
  188. </p>
  189. <h4>Rename Snapshots</h4>
  190. <p>
  191. Rename a snapshot.
  192. This operation requires owner privilege of the snapshottable directory.
  193. </p>
  194. <ul>
  195. <li>Command:
  196. <source>hdfs dfs -renameSnapshot &lt;path&gt; &lt;oldName&gt; &lt;newName&gt;</source></li>
  197. <li>Arguments:<table>
  198. <tr><td>path</td><td>The path of the snapshottable directory.</td></tr>
  199. <tr><td>oldName</td><td>The old snapshot name.</td></tr>
  200. <tr><td>newName</td><td>The new snapshot name.</td></tr>
  201. </table></li>
  202. </ul>
  203. <p>
  204. See also the corresponding Java API
  205. <code>void renameSnapshot(Path path, String oldName, String newName)</code>
  206. in <a href="../../api/org/apache/hadoop/fs/FileSystem.html"><code>FileSystem</code></a>.
  207. </p>
  208. <h4>Get Snapshottable Directory Listing</h4>
  209. <p>
  210. Get all the snapshottable directories where the current user has permission to take snapshots.
  211. </p>
  212. <ul>
  213. <li>Command:
  214. <source>hdfs lsSnapshottableDir</source></li>
  215. <li>Arguments: none</li>
  216. </ul>
  217. <p>
  218. See also the corresponding Java API
  219. <code>SnapshottableDirectoryStatus[] getSnapshottableDirectoryListing()</code>
  220. in <code>DistributedFileSystem</code>.
  221. </p>
  222. <h4>Get Snapshots Difference Report</h4>
  223. <p>
  224. Get the differences between two snapshots.
  225. This operation requires read access privilege for all files/directories in both snapshots.
  226. </p>
  227. <ul>
  228. <li>Command:
  229. <source>hdfs snapshotDiff &lt;path&gt; &lt;fromSnapshot&gt; &lt;toSnapshot&gt;</source></li>
  230. <li>Arguments:<table>
  231. <tr><td>path</td><td>The path of the snapshottable directory.</td></tr>
  232. <tr><td>fromSnapshot</td><td>The name of the starting snapshot.</td></tr>
  233. <tr><td>toSnapshot</td><td>The name of the ending snapshot.</td></tr>
  234. </table></li>
  235. <li>Results:
  236. <table>
  237. <tr><td>+</td><td>The file/directory has been created.</td></tr>
  238. <tr><td>-</td><td>The file/directory has been deleted.</td></tr>
  239. <tr><td>M</td><td>The file/directory has been modified.</td></tr>
  240. <tr><td>R</td><td>The file/directory has been renamed.</td></tr>
  241. </table>
  242. </li>
  243. </ul>
  244. <p>
  245. A <em>RENAME</em> entry indicates a file/directory has been renamed but
  246. is still under the same snapshottable directory. A file/directory is
  247. reported as deleted if it was renamed to outside of the snapshottable directory.
  248. A file/directory renamed from outside of the snapshottable directory is
  249. reported as newly created.
  250. </p>
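<p>
The three rename cases can be summarized as a small decision function. This is a
sketch of the rule as stated above, not the code HDFS uses to compute the report;
the class <code>RenameClassifier</code> is hypothetical, and the returned symbols
are the ones from the results table.
</p>
<source><![CDATA[
```java
import java.util.Optional;

// Hypothetical helper: how a rename shows up in a diff report for one snapshottable directory.
final class RenameClassifier {
    // True if 'path' is the directory itself or lies beneath it.
    static boolean isUnder(String dir, String path) {
        String prefix = dir.endsWith("/") ? dir : dir + "/";
        return path.equals(dir) || path.startsWith(prefix);
    }

    // Returns "R", "-" or "+" as in the results table, or empty if the rename
    // happened entirely outside the snapshottable directory.
    static Optional<String> classify(String snapshottableDir, String src, String dst) {
        boolean srcInside = isUnder(snapshottableDir, src);
        boolean dstInside = isUnder(snapshottableDir, dst);
        if (srcInside && dstInside) {
            return Optional.of("R");   // renamed within the directory
        }
        if (srcInside) {
            return Optional.of("-");   // moved out: reported as deleted
        }
        if (dstInside) {
            return Optional.of("+");   // moved in: reported as newly created
        }
        return Optional.empty();
    }
}
```
]]></source>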
  251. <p>
  252. The snapshot difference report does not necessarily reflect the order in which the operations occurred.
  253. For example, if we rename the directory <em>"/foo"</em> to <em>"/foo2"</em>, and
  254. then append new data to the file <em>"/foo2/bar"</em>, the difference report will
  255. be:
  256. <source>
  257. R. /foo -> /foo2
  258. M. /foo/bar
  259. </source>
  260. I.e., the changes on the files/directories under a renamed directory are
  261. reported using the original path before the rename (<em>"/foo/bar"</em> in
  262. the above example).
  263. </p>
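<p>
A consumer of the report that needs the current location of a modified file can
map the reported path through the rename entry. The helper below is a minimal
sketch under that assumption; the class <code>RenamedPathResolver</code> is
hypothetical and handles a single rename only.
</p>
<source><![CDATA[
```java
// Hypothetical helper: translate a path reported under its pre-rename name
// into its location after the rename.
final class RenamedPathResolver {
    static String currentPath(String reportedPath, String renamedFrom, String renamedTo) {
        if (reportedPath.equals(renamedFrom)) {
            return renamedTo;
        }
        // Only rewrite true descendants: "/foobar" is not under "/foo".
        if (reportedPath.startsWith(renamedFrom + "/")) {
            return renamedTo + reportedPath.substring(renamedFrom.length());
        }
        return reportedPath;
    }
}
```
]]></source>
<p>
For the report above, the entry for <code>/foo/bar</code> combined with the
rename of <code>/foo</code> to <code>/foo2</code> resolves to
<code>/foo2/bar</code>.
</p>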
  264. <p>
  265. See also the corresponding Java API
  266. <code>SnapshotDiffReport getSnapshotDiffReport(Path path, String fromSnapshot, String toSnapshot)</code>
  267. in <code>DistributedFileSystem</code>.
  268. </p>
  269. </subsection>
  270. </section>
  271. </body>
  272. </document>