- <?xml version="1.0" encoding="UTF-8"?>
- <!--
- Licensed to the Apache Software Foundation (ASF) under one or more
- contributor license agreements. See the NOTICE file distributed with
- this work for additional information regarding copyright ownership.
- The ASF licenses this file to You under the Apache License, Version 2.0
- (the "License"); you may not use this file except in compliance with
- the License. You may obtain a copy of the License at
- http://www.apache.org/licenses/LICENSE-2.0
- Unless required by applicable law or agreed to in writing, software
- distributed under the License is distributed on an "AS IS" BASIS,
- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- See the License for the specific language governing permissions and
- limitations under the License.
- -->
- <document xmlns="http://maven.apache.org/XDOC/2.0"
- xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
- xsi:schemaLocation="http://maven.apache.org/XDOC/2.0 http://maven.apache.org/xsd/xdoc-2.0.xsd">
- <properties>
- <title>HDFS Snapshots</title>
- </properties>
- <body>
- <h1>HDFS Snapshots</h1>
- <macro name="toc">
- <param name="section" value="0"/>
- <param name="fromDepth" value="0"/>
- <param name="toDepth" value="4"/>
- </macro>
- <section name="Overview" id="Overview">
- <p>
- HDFS Snapshots are read-only point-in-time copies of the file system.
- Snapshots can be taken on a subtree of the file system or the entire file system.
- Some common use cases of snapshots are data backup, protection against user errors
- and disaster recovery.
- </p>
- <p>
- The implementation of HDFS Snapshots is efficient:
- </p>
- <ul>
- <li>Snapshot creation is instantaneous:
- the cost is <em>O(1)</em> excluding the inode lookup time.</li>
- <li>Additional memory is used only when modifications are made relative to a snapshot:
- memory usage is <em>O(M)</em>,
- where <em>M</em> is the number of modified files/directories.</li>
- <li>Blocks in datanodes are not copied:
- the snapshot files record the block list and the file size,
- so no data is copied.</li>
- <li>Snapshots do not adversely affect regular HDFS operations:
- modifications are recorded in reverse chronological order
- so that the current data can be accessed directly.
- The snapshot data is computed by subtracting the modifications
- from the current data.</li>
- </ul>
- <subsection name="Snapshottable Directories" id="SnapshottableDirectories">
- <p>
- Snapshots can be taken on any directory once the directory has been set as
- <em>snapshottable</em>.
- A snapshottable directory is able to accommodate 65,536 simultaneous snapshots.
- There is no limit on the number of snapshottable directories.
- Administrators may set any directory to be snapshottable.
- If there are snapshots in a snapshottable directory,
- the directory can be neither deleted nor renamed
- before all the snapshots are deleted.
- </p>
- <!--
- <p>
- Nested snapshottable directories are currently not allowed.
- In other words, a directory cannot be set to snapshottable
- if one of its ancestors is a snapshottable directory.
- </p>
- -->
- </subsection>
- <subsection name="Snapshot Paths" id="SnapshotPaths">
- <p>
- For a snapshottable directory,
- the path component <em>".snapshot"</em> is used for accessing its snapshots.
- Suppose <code>/foo</code> is a snapshottable directory,
- <code>/foo/bar</code> is a file/directory in <code>/foo</code>,
- and <code>/foo</code> has a snapshot <code>s0</code>.
- Then, the path <source>/foo/.snapshot/s0/bar</source>
- refers to the snapshot copy of <code>/foo/bar</code>.
- The usual API and CLI work with ".snapshot" paths.
- The following are some examples.
- </p>
- <ul>
- <li>Listing all the snapshots under a snapshottable directory:
- <source>hdfs dfs -ls /foo/.snapshot</source></li>
- <li>Listing the files in snapshot <code>s0</code>:
- <source>hdfs dfs -ls /foo/.snapshot/s0</source></li>
- <li>Copying a file from snapshot <code>s0</code>:
- <source>hdfs dfs -cp /foo/.snapshot/s0/bar /tmp</source></li>
- </ul>
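- <p>
- The mapping from a live path to its snapshot copy is plain string composition.
- The following Java sketch illustrates it. The class <code>SnapshotPaths</code> and
- the method <code>snapshotPath</code> are hypothetical helpers written for this page;
- they are not part of the HDFS API.
- </p>

```java
public final class SnapshotPaths {
    /** The reserved path component described above. */
    public static final String DOT_SNAPSHOT = ".snapshot";

    private SnapshotPaths() {}

    /**
     * Builds the path of the snapshot copy of a file or directory.
     * This is a hypothetical helper for illustration only.
     *
     * snapshottableDir: the snapshottable directory, e.g. "/foo"
     * snapshotName:     the snapshot name, e.g. "s0"
     * relativePath:     path below the snapshottable directory, e.g. "bar";
     *                   an empty string refers to the snapshot root
     */
    public static String snapshotPath(String snapshottableDir,
                                      String snapshotName,
                                      String relativePath) {
        // Drop a trailing slash so that "/foo/" and "/foo" behave the same.
        String base = snapshottableDir.endsWith("/")
                ? snapshottableDir.substring(0, snapshottableDir.length() - 1)
                : snapshottableDir;
        // Drop a leading slash from the relative part.
        String rel = relativePath.startsWith("/")
                ? relativePath.substring(1)
                : relativePath;
        String snapshotRoot = base + "/" + DOT_SNAPSHOT + "/" + snapshotName;
        return rel.isEmpty() ? snapshotRoot : snapshotRoot + "/" + rel;
    }

    public static void main(String[] args) {
        // prints /foo/.snapshot/s0/bar
        System.out.println(snapshotPath("/foo", "s0", "bar"));
    }
}
```

- <p>
- The resulting string can be passed to the same API or CLI calls that accept a regular path.
- </p>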
- <p>
- The name ".snapshot" is a reserved file name in HDFS,
- so users cannot create a file or directory named ".snapshot".
- If ".snapshot" is used as a name in a previous version of HDFS, it must be renamed before upgrade;
- otherwise, the upgrade will fail.
- </p>
- </subsection>
- </section>
- <section name="Snapshot Operations" id="SnapshotOperations">
- <subsection name="Administrator Operations" id="AdministratorOperations">
- <p>
- The operations described in this section require superuser privilege.
- </p>
- <h4>Allow Snapshots</h4>
- <p>
- Allow snapshots of a directory to be created.
- If the operation completes successfully, the directory becomes snapshottable.
- </p>
- <ul>
- <li>Command:
- <source>hdfs dfsadmin -allowSnapshot &lt;path&gt;</source></li>
- <li>Arguments:<table>
- <tr><td>path</td><td>The path of the snapshottable directory.</td></tr>
- </table></li>
- </ul>
- <p>
- See also the corresponding Java API
- <code>void allowSnapshot(Path path)</code> in <code>HdfsAdmin</code>.
- </p>
- <h4>Disallow Snapshots</h4>
- <p>
- Disallow snapshots of a directory from being created.
- All snapshots of the directory must be deleted before disallowing snapshots.
- </p>
- <ul>
- <li>Command:
- <source>hdfs dfsadmin -disallowSnapshot &lt;path&gt;</source></li>
- <li>Arguments:<table>
- <tr><td>path</td><td>The path of the snapshottable directory.</td></tr>
- </table></li>
- </ul>
- <p>
- See also the corresponding Java API
- <code>void disallowSnapshot(Path path)</code> in <code>HdfsAdmin</code>.
- </p>
- </subsection>
- <subsection name="User Operations" id="UserOperations">
- <p>
- This section describes user operations.
- Note that the HDFS superuser can perform all of these operations
- without satisfying the permission requirements of the individual operations.
- </p>
- <h4>Create Snapshots</h4>
- <p>
- Create a snapshot of a snapshottable directory.
- This operation requires owner privilege of the snapshottable directory.
- </p>
- <ul>
- <li>Command:
- <source>hdfs dfs -createSnapshot &lt;path&gt; [&lt;snapshotName&gt;]</source></li>
- <li>Arguments:<table>
- <tr><td>path</td><td>The path of the snapshottable directory.</td></tr>
- <tr><td>snapshotName</td><td>
- The snapshot name, which is an optional argument.
- When it is omitted, a default name is generated using a timestamp with the format
- <code>"'s'yyyyMMdd-HHmmss.SSS"</code>, e.g. "s20130412-151029.033".
- </td></tr>
- </table></li>
- </ul>
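- <p>
- To show how the default name pattern above expands, the following Java sketch formats a
- timestamp with the same pattern string. The class <code>DefaultSnapshotName</code> is a
- hypothetical helper, not the HDFS implementation. This page does not state which time zone
- HDFS uses when generating the name; UTC is passed here only as an assumption that makes
- the output reproducible.
- </p>

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public final class DefaultSnapshotName {
    /** The pattern quoted in the argument table above. */
    public static final String PATTERN = "'s'yyyyMMdd-HHmmss.SSS";

    private DefaultSnapshotName() {}

    /**
     * Formats the given instant with the default-name pattern.
     * The time zone is a parameter because the zone used by HDFS is
     * not specified on this page (assumption for illustration).
     */
    public static String format(Date time, TimeZone zone) {
        // SimpleDateFormat is not thread-safe, so create one per call.
        SimpleDateFormat fmt = new SimpleDateFormat(PATTERN, Locale.ROOT);
        fmt.setTimeZone(zone);
        return fmt.format(time);
    }

    public static void main(String[] args) {
        // 2013-04-12 15:10:29.033 UTC expressed as epoch milliseconds.
        Date example = new Date(1365779429033L);
        // prints s20130412-151029.033
        System.out.println(format(example, TimeZone.getTimeZone("UTC")));
    }
}
```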
- <p>
- See also the corresponding Java API
- <code>Path createSnapshot(Path path)</code> and
- <code>Path createSnapshot(Path path, String snapshotName)</code>
- in <a href="../../api/org/apache/hadoop/fs/FileSystem.html"><code>FileSystem</code></a>.
- Both methods return the path of the new snapshot.
- </p>
- <h4>Delete Snapshots</h4>
- <p>
- Delete a snapshot from a snapshottable directory.
- This operation requires owner privilege of the snapshottable directory.
- </p>
- <ul>
- <li>Command:
- <source>hdfs dfs -deleteSnapshot &lt;path&gt; &lt;snapshotName&gt;</source></li>
- <li>Arguments:<table>
- <tr><td>path</td><td>The path of the snapshottable directory.</td></tr>
- <tr><td>snapshotName</td><td>The snapshot name.</td></tr>
- </table></li>
- </ul>
- <p>
- See also the corresponding Java API
- <code>void deleteSnapshot(Path path, String snapshotName)</code>
- in <a href="../../api/org/apache/hadoop/fs/FileSystem.html"><code>FileSystem</code></a>.
- </p>
- <h4>Rename Snapshots</h4>
- <p>
- Rename a snapshot.
- This operation requires owner privilege of the snapshottable directory.
- </p>
- <ul>
- <li>Command:
- <source>hdfs dfs -renameSnapshot &lt;path&gt; &lt;oldName&gt; &lt;newName&gt;</source></li>
- <li>Arguments:<table>
- <tr><td>path</td><td>The path of the snapshottable directory.</td></tr>
- <tr><td>oldName</td><td>The old snapshot name.</td></tr>
- <tr><td>newName</td><td>The new snapshot name.</td></tr>
- </table></li>
- </ul>
- <p>
- See also the corresponding Java API
- <code>void renameSnapshot(Path path, String oldName, String newName)</code>
- in <a href="../../api/org/apache/hadoop/fs/FileSystem.html"><code>FileSystem</code></a>.
- </p>
- <h4>Get Snapshottable Directory Listing</h4>
- <p>
- Get all the snapshottable directories where the current user has permission to take snapshots.
- </p>
- <ul>
- <li>Command:
- <source>hdfs lsSnapshottableDir</source></li>
- <li>Arguments: none</li>
- </ul>
- <p>
- See also the corresponding Java API
- <code>SnapshottableDirectoryStatus[] getSnapshottableDirectoryListing()</code>
- in <code>DistributedFileSystem</code>.
- </p>
- <h4>Get Snapshots Difference Report</h4>
- <p>
- Get the differences between two snapshots.
- This operation requires read access privilege for all files/directories in both snapshots.
- </p>
- <ul>
- <li>Command:
- <source>hdfs snapshotDiff &lt;path&gt; &lt;fromSnapshot&gt; &lt;toSnapshot&gt;</source></li>
- <li>Arguments:<table>
- <tr><td>path</td><td>The path of the snapshottable directory.</td></tr>
- <tr><td>fromSnapshot</td><td>The name of the starting snapshot.</td></tr>
- <tr><td>toSnapshot</td><td>The name of the ending snapshot.</td></tr>
- </table></li>
- </ul>
- <p>
- See also the corresponding Java API
- <code>SnapshotDiffReport getSnapshotDiffReport(Path path, String fromSnapshot, String toSnapshot)</code>
- in <code>DistributedFileSystem</code>.
- </p>
- </subsection>
- </section>
- </body>
- </document>