|
@@ -14,9 +14,21 @@
|
|
|
|
|
|
# A Model of a Hadoop Filesystem
|
|
|
|
|
|
-
|
|
|
-
|
|
|
-#### Paths and Path Elements
|
|
|
+* [Paths and Path Elements](#Paths_and_Path_Elements)
|
|
|
+ * [Predicates and Functions](#Predicates_and_Functions)
|
|
|
+ * [Notes for relative paths](#Notes_for_relative_paths)
|
|
|
+* [Defining the Filesystem](#Defining_the_Filesystem)
|
|
|
+ * [Directory references](#Directory_references)
|
|
|
+ * [File references](#File_references)
|
|
|
+ * [Symbolic references](#Symbolic_references)
|
|
|
+ * [File Length](#File_Length)
|
|
|
+ * [User home](#User_home)
|
|
|
+ * [Exclusivity](#Exclusivity)
|
|
|
+ * [Encryption Zone](#Encryption_Zone)
|
|
|
+* [Notes](#Notes)
|
|
|
+
|
|
|
+
|
|
|
+## Paths and Path Elements
|
|
|
|
|
|
A Path is a list of Path elements which represents a path to a file, directory of symbolic link
|
|
|
|
|
@@ -32,7 +44,9 @@ Filesystems MAY have other strings that are not permitted in a path element.
|
|
|
When validating path elements, the exception `InvalidPathException` SHOULD
|
|
|
be raised when a path is invalid [HDFS]
|
|
|
|
|
|
-Predicate: `valid-path-element:List[String];`
|
|
|
+### Predicates and Functions
|
|
|
+
|
|
|
+#### `valid-path-element(List[String]): bool`
|
|
|
|
|
|
A path element `pe` is invalid if any character in it is in the set of forbidden characters,
|
|
|
or the element as a whole is invalid
|
|
@@ -41,17 +55,20 @@ or the element as a whole is invalid
|
|
|
not pe in {"", ".", "..", "/"}
|
|
|
|
|
|
|
|
|
-Predicate: `valid-path:List<PathElement>`
|
|
|
+#### `valid-path(List[PathElement]): bool`
|
|
|
|
|
|
A Path `p` is *valid* if all path elements in it are valid
|
|
|
|
|
|
- def valid-path(pe): forall pe in Path: valid-path-element(pe)
|
|
|
+ def valid-path(path): forall pe in path: valid-path-element(pe)
|
|
|
|
|
|
|
|
|
The set of all possible paths is *Paths*; this is the infinite set of all lists of valid path elements.
|
|
|
|
|
|
The path represented by empty list, `[]` is the *root path*, and is denoted by the string `"/"`.
|
|
|
|
|
|
+
|
|
|
+#### `parent(path:Path): Path`
|
|
|
+
|
|
|
The partial function `parent(path:Path):Path` provides the parent path can be defined using
|
|
|
list slicing.
|
|
|
|
|
@@ -62,7 +79,7 @@ Preconditions:
|
|
|
path != []
|
|
|
|
|
|
|
|
|
-#### `filename:Path->PathElement`
|
|
|
+#### `filename(Path): PathElement`
|
|
|
|
|
|
The last Path Element in a Path is called the filename.
|
|
|
|
|
@@ -72,7 +89,7 @@ Preconditions:
|
|
|
|
|
|
p != []
|
|
|
|
|
|
-#### `childElements:(Path p, Path q):Path`
|
|
|
+#### `childElements(Path p, Path q): Path`
|
|
|
|
|
|
|
|
|
The partial function `childElements:(Path p, Path q):Path`
|
|
@@ -87,12 +104,12 @@ Preconditions:
|
|
|
q == p[:len(q)]
|
|
|
|
|
|
|
|
|
-#### ancestors(Path): List[Path]
|
|
|
+#### `ancestors(Path): List[Path]`
|
|
|
|
|
|
The list of all paths that are either the direct parent of a path p, or a parent of
|
|
|
ancestor of p.
|
|
|
|
|
|
-#### Notes
|
|
|
+### Notes for relative paths
|
|
|
|
|
|
This definition handles absolute paths but not relative ones; it needs to be reworked so the root element is explicit, presumably
|
|
|
by declaring that the root (and only the root) path element may be ['/'].
|
|
@@ -100,18 +117,18 @@ by declaring that the root (and only the root) path element may be ['/'].
|
|
|
Relative paths can then be distinguished from absolute paths as the input to any function and resolved when the second entry in a two-argument function
|
|
|
such as `rename`.
|
|
|
|
|
|
-### Defining the Filesystem
|
|
|
+## Defining the Filesystem
|
|
|
|
|
|
|
|
|
A filesystem `FS` contains a set of directories, a dictionary of paths and a dictionary of symbolic links
|
|
|
|
|
|
- (Directories:set[Path], Files:[Path:List[byte]], Symlinks:set[Path])
|
|
|
+ (Directories:Set[Path], Files:[Path:List[byte]], Symlinks:Set[Path])
|
|
|
|
|
|
|
|
|
Accessor functions return the specific element of a filesystem
|
|
|
|
|
|
def FS.Directories = FS.Directories
|
|
|
- def file(FS) = FS.Files
|
|
|
+ def files(FS) = FS.Files
|
|
|
def symlinks(FS) = FS.Symlinks
|
|
|
def filenames(FS) = keys(FS.Files)
|
|
|
|
|
@@ -131,7 +148,7 @@ The root path, "/", is a directory represented by the path ["/"], which must al
|
|
|
|
|
|
|
|
|
|
|
|
-#### Directory references
|
|
|
+### Directory references
|
|
|
|
|
|
A path MAY refer to a directory in a FileSystem:
|
|
|
|
|
@@ -172,21 +189,21 @@ path begins with the path P -that is their parent is P or an ancestor is P
|
|
|
def descendants(FS, D) = {p for p in paths(FS) where isDescendant(D, p)}
|
|
|
|
|
|
|
|
|
-#### File references
|
|
|
+### File references
|
|
|
|
|
|
A path MAY refer to a file; that it it has data in the filesystem; its path is a key in the data dictionary
|
|
|
|
|
|
def isFile(FS, p) = p in FS.Files
|
|
|
|
|
|
|
|
|
-#### Symbolic references
|
|
|
+### Symbolic references
|
|
|
|
|
|
A path MAY refer to a symbolic link:
|
|
|
|
|
|
def isSymlink(FS, p) = p in symlinks(FS)
|
|
|
|
|
|
|
|
|
-#### File Length
|
|
|
+### File Length
|
|
|
|
|
|
The length of a path p in a filesystem FS is the length of the data stored, or 0 if it is a directory:
|
|
|
|
|
@@ -203,7 +220,8 @@ The function `getHomeDirectory` returns the home directory for the Filesystem an
|
|
|
For some FileSystems, the path is `["/","users", System.getProperty("user-name")]`. However,
|
|
|
for HDFS,
|
|
|
|
|
|
-#### Exclusivity
|
|
|
+
|
|
|
+### Exclusivity
|
|
|
|
|
|
A path cannot refer to more than one of a file, a directory or a symbolic link
|
|
|
|
|
@@ -218,7 +236,33 @@ This implies that only files may have data.
|
|
|
This condition is invariant and is an implicit postcondition of all
|
|
|
operations that manipulate the state of a FileSystem `FS`.
|
|
|
|
|
|
-### Notes
|
|
|
+
|
|
|
+### Encryption Zone
|
|
|
+
|
|
|
+The data is encrypted if the file is in encryption zone.
|
|
|
+
|
|
|
+ def inEncryptionZone(FS, path): bool
|
|
|
+
|
|
|
+The nature of the encryption and the mechanism for creating an encryption zone
|
|
|
+are implementation details not covered in this specification.
|
|
|
+No guarantees are made about the quality of the encryption.
|
|
|
+The metadata is not encrypted.
|
|
|
+
|
|
|
+All files and directories under a directory in an encryption zone are also in an
|
|
|
+encryption zone.
|
|
|
+
|
|
|
+ forall d in directories(FS): inEncyptionZone(FS, d) implies
|
|
|
+ forall c in children(FS, d) where (isFile(FS, c) or isDir(FS, c)) :
|
|
|
+ inEncyptionZone(FS, c)
|
|
|
+
|
|
|
+For all files in an encrypted zone, the data is encrypted, but the encryption
|
|
|
+type and specification are not defined.
|
|
|
+
|
|
|
+ forall f in files(FS) where inEncyptionZone(FS, f):
|
|
|
+ isEncrypted(data(f))
|
|
|
+
|
|
|
+
|
|
|
+## Notes
|
|
|
|
|
|
Not covered: hard links in a FileSystem. If a FileSystem supports multiple
|
|
|
references in *paths(FS)* to point to the same data, the outcome of operations
|