HDFS Snapshots

HDFS Snapshots are read-only point-in-time copies of the file system. Snapshots can be taken on a subtree of the file system or the entire file system. Some common use cases of snapshots are data backup, protection against user errors and disaster recovery.

The implementation of HDFS Snapshots is efficient:

Snapshots can be taken on any directory once the directory has been set as snapshottable. A snapshottable directory is able to accommodate 65,536 simultaneous snapshots. There is no limit on the number of snapshottable directories. Administrators may set any directory to be snapshottable. If there are snapshots in a snapshottable directory, the directory can be neither deleted nor renamed before all the snapshots are deleted.

For a snapshottable directory, the path component ".snapshot" is used for accessing its snapshots. Suppose /foo is a snapshottable directory, /foo/bar is a file/directory in /foo, and /foo has a snapshot s0. Then, the path /foo/.snapshot/s0/bar refers to the snapshot copy of /foo/bar. The usual API and CLI can work with the ".snapshot" paths. The following are some examples.

  • Listing all the snapshots under a snapshottable directory: hdfs dfs -ls /foo/.snapshot
  • Listing the files in snapshot s0: hdfs dfs -ls /foo/.snapshot/s0
  • Copying a file from snapshot s0: hdfs dfs -cp /foo/.snapshot/s0/bar /tmp
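The ".snapshot" path layout described above can be captured in a small shell helper. This is a minimal sketch: the function name snapshot_path is hypothetical (it is not part of HDFS), and it only builds the path string without contacting the file system.

```shell
# Hypothetical helper: build the path of a snapshot, or of a file inside it.
# Usage: snapshot_path <snapshottableDir> <snapshotName> [<relativePath>]
snapshot_path() {
    sp_dir=${1%/}        # drop one trailing slash, so "/foo/" behaves like "/foo"
    sp_name=$2
    sp_rel=${3:-}        # optional path relative to the snapshottable directory
    sp_rel=${sp_rel#/}   # drop one leading slash, if any
    if [ -n "$sp_rel" ]; then
        printf '%s/.snapshot/%s/%s\n' "$sp_dir" "$sp_name" "$sp_rel"
    else
        printf '%s/.snapshot/%s\n' "$sp_dir" "$sp_name"
    fi
}
```

For example, hdfs dfs -ls "$(snapshot_path /foo s0)" lists the files in snapshot s0 of /foo, and hdfs dfs -cp "$(snapshot_path /foo s0 bar)" /tmp copies bar out of that snapshot.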

The name ".snapshot" is now a reserved file name in HDFS so that users cannot create a file/directory with ".snapshot" as the name. If ".snapshot" is used in a previous version of HDFS, it must be renamed before upgrade; otherwise, upgrade will fail.

The administrator operations described next, allowing and disallowing snapshots, require superuser privilege.

Allow Snapshots

Allow snapshots of a directory to be created. If the operation completes successfully, the directory becomes snapshottable.

  • Command: hdfs dfsadmin -allowSnapshot <path>
  • Arguments:
    path: The path of the snapshottable directory.

See also the corresponding Java API void allowSnapshot(Path path) in HdfsAdmin.

Disallow Snapshots

Disallow the creation of snapshots of a directory. All snapshots of the directory must be deleted before disallowing snapshots.

  • Command: hdfs dfsadmin -disallowSnapshot <path>
  • Arguments:
    path: The path of the snapshottable directory.

See also the corresponding Java API void disallowSnapshot(Path path) in HdfsAdmin.
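Because every snapshot must be deleted before snapshots can be disallowed, the commands have to run in a fixed order. The sketch below only prints that sequence for review; it does not execute anything. The function name plan_disallow is hypothetical, and the caller is assumed to supply the snapshot names (for instance, taken from a listing of the directory's ".snapshot" path). The -deleteSnapshot command it prints is the user operation described under Delete Snapshots.

```shell
# Hypothetical helper: print (do not run) the commands needed to make a
# directory non-snapshottable again.
# Usage: plan_disallow <snapshottableDir> [<snapshotName>...]
# Note: arguments are printed as-is, so paths containing spaces would need
# quoting before the printed commands are executed.
plan_disallow() {
    pd_dir=$1
    shift
    # Every existing snapshot has to be deleted first.
    for pd_name in "$@"; do
        printf 'hdfs dfs -deleteSnapshot %s %s\n' "$pd_dir" "$pd_name"
    done
    # Only then can snapshots be disallowed on the directory.
    printf 'hdfs dfsadmin -disallowSnapshot %s\n' "$pd_dir"
}
```

Calling plan_disallow /foo s0 s1 prints two -deleteSnapshot commands followed by the -disallowSnapshot command for /foo.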

The following sections describe user operations. Note that the HDFS superuser can perform all of these operations without satisfying the permission requirements stated for the individual operations.

Create Snapshots

Create a snapshot of a snapshottable directory. This operation requires owner privilege of the snapshottable directory.

  • Command: hdfs dfs -createSnapshot <path> [<snapshotName>]
  • Arguments:
    path: The path of the snapshottable directory.
    snapshotName: The snapshot name, which is an optional argument. When it is omitted, a default name is generated using a timestamp with the format "'s'yyyyMMdd-HHmmss.SSS", e.g. "s20130412-151029.033".

See also the corresponding Java API Path createSnapshot(Path path) and Path createSnapshot(Path path, String snapshotName) in FileSystem. The snapshot path is returned in these methods.
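Because the generated default name follows a fixed timestamp pattern, a script can tell default-named snapshots apart from explicitly named ones. The sketch below checks only the shape of the name (the letter s, eight digits, a hyphen, six digits, a period, three digits), not whether the digits form a valid date. The function is_default_snapshot_name is a hypothetical helper, not an HDFS command.

```shell
# Hypothetical helper: succeed (exit status 0) if the given snapshot name has
# the shape of a default name, i.e. 's' followed by yyyyMMdd-HHmmss.SSS.
# Usage: is_default_snapshot_name <snapshotName>
is_default_snapshot_name() {
    printf '%s\n' "$1" | grep -Eq '^s[0-9]{8}-[0-9]{6}\.[0-9]{3}$'
}
```

With this helper, is_default_snapshot_name s20130412-151029.033 succeeds, while a user-chosen name such as s0 does not match.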

Delete Snapshots

Delete a snapshot from a snapshottable directory. This operation requires owner privilege of the snapshottable directory.

  • Command: hdfs dfs -deleteSnapshot <path> <snapshotName>
  • Arguments:
    path: The path of the snapshottable directory.
    snapshotName: The snapshot name.

See also the corresponding Java API void deleteSnapshot(Path path, String snapshotName) in FileSystem.

Rename Snapshots

Rename a snapshot. This operation requires owner privilege of the snapshottable directory.

  • Command: hdfs dfs -renameSnapshot <path> <oldName> <newName>
  • Arguments:
    path: The path of the snapshottable directory.
    oldName: The old snapshot name.
    newName: The new snapshot name.

See also the corresponding Java API void renameSnapshot(Path path, String oldName, String newName) in FileSystem.

Get Snapshottable Directory Listing

Get all the snapshottable directories where the current user has permission to take snapshots.

  • Command: hdfs lsSnapshottableDir
  • Arguments: none

See also the corresponding Java API SnapshottableDirectoryStatus[] getSnapshottableDirectoryListing() in DistributedFileSystem.

Get Snapshots Difference Report

Get the differences between two snapshots. This operation requires read access privilege for all files/directories in both snapshots.

  • Command: hdfs snapshotDiff <path> <fromSnapshot> <toSnapshot>
  • Arguments:
    path: The path of the snapshottable directory.
    fromSnapshot: The name of the starting snapshot.
    toSnapshot: The name of the ending snapshot.

See also the corresponding Java API SnapshotDiffReport getSnapshotDiffReport(Path path, String fromSnapshot, String toSnapshot) in DistributedFileSystem.