- <?xml version="1.0" encoding="UTF-8"?>
- <!--
- Licensed to the Apache Software Foundation (ASF) under one or more
- contributor license agreements. See the NOTICE file distributed with
- this work for additional information regarding copyright ownership.
- The ASF licenses this file to You under the Apache License, Version 2.0
- (the "License"); you may not use this file except in compliance with
- the License. You may obtain a copy of the License at
- http://www.apache.org/licenses/LICENSE-2.0
- Unless required by applicable law or agreed to in writing, software
- distributed under the License is distributed on an "AS IS" BASIS,
- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- See the License for the specific language governing permissions and
- limitations under the License.
- -->
- <document xmlns="http://maven.apache.org/XDOC/2.0"
- xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
- xsi:schemaLocation="http://maven.apache.org/XDOC/2.0 http://maven.apache.org/xsd/xdoc-2.0.xsd">
- <properties>
- <title>HDFS Snapshots</title>
- </properties>
- <body>
- <h1>HDFS Snapshots</h1>
- <macro name="toc">
- <param name="section" value="0"/>
- <param name="fromDepth" value="0"/>
- <param name="toDepth" value="4"/>
- </macro>
- <section name="Overview" id="Overview">
- <p>
- HDFS Snapshots are read-only point-in-time copies of the file system.
- Snapshots can be taken on a subtree of the file system or the entire file system.
- Some common use cases of snapshots are data backup, protection against user errors
- and disaster recovery.
- </p>
- <p>
- The implementation of HDFS Snapshots is efficient:
- </p>
- <ul>
- <li>Snapshot creation is instantaneous:
- the cost is <em>O(1)</em> excluding the inode lookup time.</li>
- <li>Additional memory is used only when modifications are made relative to a snapshot:
- memory usage is <em>O(M)</em>,
- where <em>M</em> is the number of modified files/directories.</li>
- <li>Blocks in datanodes are not copied:
- the snapshot files record the block list and the file size, so no data is copied.</li>
- <li>Snapshots do not adversely affect regular HDFS operations:
- modifications are recorded in reverse chronological order
- so that the current data can be accessed directly.
- The snapshot data is computed by subtracting the modifications
- from the current data.</li>
- </ul>
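- <p>
- The last point can be illustrated with a toy model. The sketch below is not the
- NameNode implementation: the class <code>ReverseDiffSketch</code>, its methods, and the
- use of whole-file strings as content are all hypothetical. It only shows how keeping the
- pre-modification values per snapshot lets the current data be read directly while a
- snapshot view is recovered by undoing the recorded modifications.
- </p>

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: names and data structures are assumptions, not HDFS internals.
final class ReverseDiffSketch {
    // Current state: path -> content. Reads of live data never touch the diffs.
    private final Map<String, String> current = new HashMap<>();
    // diffs.get(i) holds the value each path had when snapshot i was taken,
    // recorded only for paths modified before the next snapshot was created.
    private final List<Map<String, String>> diffs = new ArrayList<>();

    // Creating a snapshot copies nothing: it only opens an empty diff.
    int createSnapshot() {
        diffs.add(new HashMap<>());
        return diffs.size() - 1;
    }

    void write(String path, String content) {
        if (!diffs.isEmpty()) {
            Map<String, String> latest = diffs.get(diffs.size() - 1);
            // Record the old value once; null means the path did not exist.
            if (!latest.containsKey(path)) {
                latest.put(path, current.get(path));
            }
        }
        current.put(path, content);
    }

    String read(String path) {
        return current.get(path);
    }

    // The earliest diff at or after the snapshot that mentions the path holds
    // the snapshot's value; otherwise the path is unchanged since the snapshot.
    String readSnapshot(int snapshotId, String path) {
        for (int i = snapshotId; i < diffs.size(); i++) {
            Map<String, String> diff = diffs.get(i);
            if (diff.containsKey(path)) {
                return diff.get(path);
            }
        }
        return current.get(path);
    }

    public static void main(String[] args) {
        ReverseDiffSketch fs = new ReverseDiffSketch();
        fs.write("/foo/bar", "v1");
        int s0 = fs.createSnapshot();
        fs.write("/foo/bar", "v2");
        System.out.println(fs.read("/foo/bar"));             // v2
        System.out.println(fs.readSnapshot(s0, "/foo/bar")); // v1
    }
}
```

- <p>
- In this model, memory grows only with the number of paths modified after a snapshot,
- which mirrors the <em>O(M)</em> bound stated above.
- </p>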
- <subsection name="Snapshottable Directories" id="SnapshottableDirectories">
- <p>
- Snapshots can be taken on any directory once the directory has been set as
- <em>snapshottable</em>.
- A snapshottable directory can accommodate up to 65,536 simultaneous snapshots.
- There is no limit on the number of snapshottable directories.
- Administrators may set any directory to be snapshottable.
- If there are snapshots in a snapshottable directory,
- the directory can be neither deleted nor renamed
- before all the snapshots are deleted.
- </p>
- <p>
- Nested snapshottable directories are currently not allowed.
- In other words, a directory cannot be set to snapshottable
- if one of its ancestors/descendants is a snapshottable directory.
- </p>
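- <p>
- The nesting rule amounts to an ancestor/descendant check on paths. The following is a
- minimal sketch of that check, assuming normalized absolute paths without trailing
- slashes; the class <code>NestedSnapshottableCheck</code> and its methods are
- hypothetical and are not part of the HDFS API.
- </p>

```java
// Illustrative only: HDFS performs this validation itself when allowing snapshots.
final class NestedSnapshottableCheck {
    /** Returns true if 'ancestor' is the same path as 'path' or one of its ancestors. */
    static boolean isSameOrAncestor(String ancestor, String path) {
        if (ancestor.equals("/")) {
            return true; // the root is an ancestor of every path
        }
        return path.equals(ancestor) || path.startsWith(ancestor + "/");
    }

    /** Returns true if 'candidate' has no snapshottable ancestor or descendant. */
    static boolean canAllowSnapshot(String candidate, String[] snapshottableDirs) {
        for (String dir : snapshottableDirs) {
            if (dir.equals(candidate)) {
                continue; // assumption of this sketch: the same directory is not a nesting conflict
            }
            if (isSameOrAncestor(dir, candidate) || isSameOrAncestor(candidate, dir)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String[] existing = {"/data/projects"};
        System.out.println(canAllowSnapshot("/data/projects/a", existing)); // false: ancestor is snapshottable
        System.out.println(canAllowSnapshot("/data", existing));            // false: descendant is snapshottable
        System.out.println(canAllowSnapshot("/data/other", existing));      // true
    }
}
```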
- </subsection>
- <subsection name="Snapshot Paths" id="SnapshotPaths">
- <p>
- For a snapshottable directory,
- the path component <em>".snapshot"</em> is used for accessing its snapshots.
- Suppose <code>/foo</code> is a snapshottable directory,
- <code>/foo/bar</code> is a file/directory in <code>/foo</code>,
- and <code>/foo</code> has a snapshot <code>s0</code>.
- Then, the path <source>/foo/.snapshot/s0/bar</source>
- refers to the snapshot copy of <code>/foo/bar</code>.
- The regular API and CLI work with ".snapshot" paths.
- The following are some examples.
- </p>
- <ul>
- <li>Listing all the snapshots under a snapshottable directory:
- <source>hdfs dfs -ls /foo/.snapshot</source></li>
- <li>Listing the files in snapshot <code>s0</code>:
- <source>hdfs dfs -ls /foo/.snapshot/s0</source></li>
- <li>Copying a file from snapshot <code>s0</code>:
- <source>hdfs dfs -cp /foo/.snapshot/s0/bar /tmp</source></li>
- </ul>
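- <p>
- When such paths are built programmatically, the layout described above can be captured
- in a small helper. This is a sketch only: <code>SnapshotPaths</code> and
- <code>snapshotPath</code> are hypothetical names, and the helper simply concatenates
- strings following the <code>/foo/.snapshot/s0/bar</code> pattern.
- </p>

```java
// Hypothetical helper: builds the path of a snapshot copy from its parts.
final class SnapshotPaths {
    static final String DOT_SNAPSHOT = ".snapshot";

    /**
     * @param snapshottableDir absolute path of the snapshottable directory, e.g. "/foo"
     * @param snapshotName     name of the snapshot, e.g. "s0"
     * @param relativePath     path inside the directory, e.g. "bar"; empty for the snapshot root
     */
    static String snapshotPath(String snapshottableDir, String snapshotName, String relativePath) {
        // Avoid a double slash when the snapshottable directory is "/".
        String base = snapshottableDir.endsWith("/") ? snapshottableDir : snapshottableDir + "/";
        String path = base + DOT_SNAPSHOT + "/" + snapshotName;
        return relativePath.isEmpty() ? path : path + "/" + relativePath;
    }

    public static void main(String[] args) {
        System.out.println(snapshotPath("/foo", "s0", "bar")); // /foo/.snapshot/s0/bar
        System.out.println(snapshotPath("/foo", "s0", ""));    // /foo/.snapshot/s0
    }
}
```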
- </subsection>
- </section>
- <section name="Upgrading to a version of HDFS with snapshots" id="Upgrade">
- <p>
- The HDFS snapshot feature introduces a new reserved path name used to
- interact with snapshots: <tt>.snapshot</tt>. When upgrading from an
- older version of HDFS, existing paths named <tt>.snapshot</tt> need
- to first be renamed or deleted to avoid conflicting with the reserved path.
- See the upgrade section in
- <a href="HdfsUserGuide.html#Upgrade_and_Rollback">the HDFS user guide</a>
- for more information. </p>
- </section>
- <section name="Snapshot Operations" id="SnapshotOperations">
- <subsection name="Administrator Operations" id="AdministratorOperations">
- <p>
- The operations described in this section require superuser privilege.
- </p>
- <h4>Allow Snapshots</h4>
- <p>
- Allow snapshots of a directory to be created.
- If the operation completes successfully, the directory becomes snapshottable.
- </p>
- <ul>
- <li>Command:
- <source>hdfs dfsadmin -allowSnapshot <path></source></li>
- <li>Arguments:<table>
- <tr><td>path</td><td>The path of the snapshottable directory.</td></tr>
- </table></li>
- </ul>
- <p>
- See also the corresponding Java API
- <code>void allowSnapshot(Path path)</code> in <code>HdfsAdmin</code>.
- </p>
- <h4>Disallow Snapshots</h4>
- <p>
- Disallow the creation of snapshots of a directory.
- All snapshots of the directory must be deleted before disallowing snapshots.
- </p>
- <ul>
- <li>Command:
- <source>hdfs dfsadmin -disallowSnapshot <path></source></li>
- <li>Arguments:<table>
- <tr><td>path</td><td>The path of the snapshottable directory.</td></tr>
- </table></li>
- </ul>
- <p>
- See also the corresponding Java API
- <code>void disallowSnapshot(Path path)</code> in <code>HdfsAdmin</code>.
- </p>
- </subsection>
- <subsection name="User Operations" id="UserOperations">
- <p>
- This section describes user operations.
- Note that the HDFS superuser can perform all of these operations
- without satisfying the permission requirements of the individual operations.
- </p>
- <h4>Create Snapshots</h4>
- <p>
- Create a snapshot of a snapshottable directory.
- This operation requires owner privilege of the snapshottable directory.
- </p>
- <ul>
- <li>Command:
- <source>hdfs dfs -createSnapshot <path> [<snapshotName>]</source></li>
- <li>Arguments:<table>
- <tr><td>path</td><td>The path of the snapshottable directory.</td></tr>
- <tr><td>snapshotName</td><td>
- The snapshot name, which is an optional argument.
- When it is omitted, a default name is generated using a timestamp with the format
- <code>"'s'yyyyMMdd-HHmmss.SSS"</code>, e.g. "s20130412-151029.033".
- </td></tr>
- </table></li>
- </ul>
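- <p>
- The default name pattern uses the syntax of <code>java.text.SimpleDateFormat</code>.
- The sketch below formats a fixed instant with that pattern to show how the example name
- is produced. The class <code>DefaultSnapshotName</code> is hypothetical, and the choice
- of UTC and <code>Locale.ROOT</code> is an assumption made so the output is reproducible;
- the time zone HDFS applies to generated names is not stated here.
- </p>

```java
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper that applies the documented default-name pattern.
final class DefaultSnapshotName {
    // The quoted 's' is a literal prefix; the rest is date, time, and milliseconds.
    static final String PATTERN = "'s'yyyyMMdd-HHmmss.SSS";

    static String format(Date when, TimeZone zone) {
        SimpleDateFormat fmt = new SimpleDateFormat(PATTERN, Locale.ROOT);
        fmt.setTimeZone(zone); // assumption: the caller picks the zone explicitly
        return fmt.format(when);
    }

    public static void main(String[] args) {
        TimeZone utc = TimeZone.getTimeZone("UTC");
        Calendar cal = new GregorianCalendar(utc);
        cal.clear();
        cal.set(2013, Calendar.APRIL, 12, 15, 10, 29);
        cal.set(Calendar.MILLISECOND, 33);
        System.out.println(format(cal.getTime(), utc)); // s20130412-151029.033
    }
}
```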
- <p>
- See also the corresponding Java API
- <code>Path createSnapshot(Path path)</code> and
- <code>Path createSnapshot(Path path, String snapshotName)</code>
- in <a href="../../api/org/apache/hadoop/fs/FileSystem.html"><code>FileSystem</code></a>.
- Both methods return the snapshot path.
- </p>
- <h4>Delete Snapshots</h4>
- <p>
- Delete a snapshot from a snapshottable directory.
- This operation requires owner privilege of the snapshottable directory.
- </p>
- <ul>
- <li>Command:
- <source>hdfs dfs -deleteSnapshot <path> <snapshotName></source></li>
- <li>Arguments:<table>
- <tr><td>path</td><td>The path of the snapshottable directory.</td></tr>
- <tr><td>snapshotName</td><td>The snapshot name.</td></tr>
- </table></li>
- </ul>
- <p>
- See also the corresponding Java API
- <code>void deleteSnapshot(Path path, String snapshotName)</code>
- in <a href="../../api/org/apache/hadoop/fs/FileSystem.html"><code>FileSystem</code></a>.
- </p>
- <h4>Rename Snapshots</h4>
- <p>
- Rename a snapshot.
- This operation requires owner privilege of the snapshottable directory.
- </p>
- <ul>
- <li>Command:
- <source>hdfs dfs -renameSnapshot <path> <oldName> <newName></source></li>
- <li>Arguments:<table>
- <tr><td>path</td><td>The path of the snapshottable directory.</td></tr>
- <tr><td>oldName</td><td>The old snapshot name.</td></tr>
- <tr><td>newName</td><td>The new snapshot name.</td></tr>
- </table></li>
- </ul>
- <p>
- See also the corresponding Java API
- <code>void renameSnapshot(Path path, String oldName, String newName)</code>
- in <a href="../../api/org/apache/hadoop/fs/FileSystem.html"><code>FileSystem</code></a>.
- </p>
- <h4>Get Snapshottable Directory Listing</h4>
- <p>
- Get all the snapshottable directories where the current user has permission to take snapshots.
- </p>
- <ul>
- <li>Command:
- <source>hdfs lsSnapshottableDir</source></li>
- <li>Arguments: none</li>
- </ul>
- <p>
- See also the corresponding Java API
- <code>SnapshottableDirectoryStatus[] getSnapshottableDirectoryListing()</code>
- in <code>DistributedFileSystem</code>.
- </p>
- <h4>Get Snapshots Difference Report</h4>
- <p>
- Get the differences between two snapshots.
- This operation requires read access privilege for all files/directories in both snapshots.
- </p>
- <ul>
- <li>Command:
- <source>hdfs snapshotDiff <path> <fromSnapshot> <toSnapshot></source></li>
- <li>Arguments:<table>
- <tr><td>path</td><td>The path of the snapshottable directory.</td></tr>
- <tr><td>fromSnapshot</td><td>The name of the starting snapshot.</td></tr>
- <tr><td>toSnapshot</td><td>The name of the ending snapshot.</td></tr>
- </table></li>
- <li>Results:
- <table>
- <tr><td>+</td><td>The file/directory has been created.</td></tr>
- <tr><td>-</td><td>The file/directory has been deleted.</td></tr>
- <tr><td>M</td><td>The file/directory has been modified.</td></tr>
- <tr><td>R</td><td>The file/directory has been renamed.</td></tr>
- </table>
- </li>
- </ul>
- <p>
- A <em>RENAME</em> entry indicates that a file/directory has been renamed but
- is still under the same snapshottable directory. A file/directory is
- reported as deleted if it was renamed to a location outside the snapshottable directory.
- A file/directory renamed from outside the snapshottable directory is
- reported as newly created.
- </p>
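- <p>
- The rule above can be summarized as a small decision function. This sketch assumes
- normalized absolute paths; <code>RenameDiffClassifier</code> and its methods are
- hypothetical names used for illustration and do not correspond to HDFS classes.
- </p>

```java
// Illustrative only: maps a rename to the entry type the diff report would show.
final class RenameDiffClassifier {
    /** Returns true if 'path' is 'root' itself or lies below it. */
    static boolean isUnder(String root, String path) {
        if (root.equals("/")) {
            return true;
        }
        return path.equals(root) || path.startsWith(root + "/");
    }

    /**
     * Returns "R" for a rename within the snapshottable directory, "-" when the
     * target is outside it, "+" when the source is outside it, and "" when the
     * rename does not touch the snapshottable directory at all.
     */
    static String classify(String snapshottableDir, String source, String target) {
        boolean sourceInside = isUnder(snapshottableDir, source);
        boolean targetInside = isUnder(snapshottableDir, target);
        if (sourceInside && targetInside) {
            return "R";
        }
        if (sourceInside) {
            return "-";
        }
        if (targetInside) {
            return "+";
        }
        return "";
    }

    public static void main(String[] args) {
        System.out.println(classify("/data", "/data/a", "/data/b")); // R
        System.out.println(classify("/data", "/data/a", "/tmp/a"));  // -
        System.out.println(classify("/data", "/tmp/a", "/data/a"));  // +
    }
}
```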
- <p>
- The snapshot difference report does not necessarily preserve the order in which the operations were performed.
- For example, if we rename the directory <em>"/foo"</em> to <em>"/foo2"</em>, and
- then append new data to the file <em>"/foo2/bar"</em>, the difference report will
- be:
- <source>
- R. /foo -> /foo2
- M. /foo/bar
- </source>
- That is, changes to files/directories under a renamed directory are
- reported using the original path before the rename (<em>"/foo/bar"</em> in
- the above example).
- </p>
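- <p>
- A consumer of the report that needs the location of a modified file after the rename
- can combine the two entries. The following sketch shows one way to do that with plain
- string handling; <code>DiffPathResolver</code> and <code>toCurrentPath</code> are
- hypothetical names, and the sketch handles a single rename entry only.
- </p>

```java
// Illustrative only: translates a path reported under the pre-rename name
// into the path it has after the rename.
final class DiffPathResolver {
    static String toCurrentPath(String reportedPath, String renamedFrom, String renamedTo) {
        if (reportedPath.equals(renamedFrom)) {
            return renamedTo; // the renamed file/directory itself
        }
        if (reportedPath.startsWith(renamedFrom + "/")) {
            // Keep the part below the renamed directory and swap the prefix.
            return renamedTo + reportedPath.substring(renamedFrom.length());
        }
        return reportedPath; // not affected by this rename
    }

    public static void main(String[] args) {
        // Entries from the example: "R. /foo -> /foo2" and "M. /foo/bar".
        System.out.println(toCurrentPath("/foo/bar", "/foo", "/foo2")); // /foo2/bar
    }
}
```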
- <p>
- See also the corresponding Java API
- <code>SnapshotDiffReport getSnapshotDiffReport(Path path, String fromSnapshot, String toSnapshot)</code>
- in <code>DistributedFileSystem</code>.
- </p>
- </subsection>
- </section>
- </body>
- </document>