@@ -36,7 +36,7 @@ higher performance.

The specifics of using these filesystems are documented below.

-### Warning #1: Object Stores are not filesystems.
+### Warning #1: Object Stores are not filesystems

Amazon S3 is an example of "an object store". In order to achieve scalability
and especially high availability, S3 has —as many other cloud object stores have
@@ -53,14 +53,38 @@ recursive file-by-file operations. They take time at least proportional to
the number of files, during which time partial updates may be visible. If
the operations are interrupted, the filesystem is left in an intermediate state.

-### Warning #2: Because Object stores don't track modification times of directories,
-features of Hadoop relying on this can have unexpected behaviour. E.g. the
+### Warning #2: Object stores don't track modification times of directories
+
+Features of Hadoop relying on this can have unexpected behaviour. E.g. the
AggregatedLogDeletionService of YARN will not remove the appropriate logfiles.

For further discussion on these topics, please consult
[The Hadoop FileSystem API Definition](../../../hadoop-project-dist/hadoop-common/filesystem/index.html).

-### Warning #3: your AWS credentials are valuable
+### Warning #3: Object stores have different authorization models
+
+The object authorization model of S3 is very different from the file
+authorization model of HDFS and traditional file systems. It is not feasible to
+persist file ownership and permissions in S3, so S3A reports stub information
+from APIs that would query this metadata:
+
+* File owner is reported as the current user.
+* File group is also reported as the current user. Prior to Apache Hadoop
+2.8.0, file group was reported as empty (no group associated), which is a
+potential incompatibility problem for scripts that perform positional parsing of
+shell output and other clients that expect to find a well-defined group.
+* Directory permissions are reported as 777.
+* File permissions are reported as 666.
+
+S3A does not enforce any authorization checks on these stub permissions.
+Users authenticate to an S3 bucket using AWS credentials. It's possible that
+object ACLs have been defined to enforce authorization at the S3 side, but this
+happens entirely within the S3 service, not within the S3A implementation.
+
+For further discussion on these topics, please consult
+[The Hadoop FileSystem API Definition](../../../hadoop-project-dist/hadoop-common/filesystem/index.html).
+
+### Warning #4: Your AWS credentials are valuable

Your AWS credentials not only pay for services, they offer read and write
access to the data. Anyone with the credentials can not only read your datasets
@@ -75,7 +99,7 @@ Do not inadvertently share these credentials through means such as

If you do any of these: change your credentials immediately!

-### Warning #4: the S3 client provided by Amazon EMR are not from the Apache
+### Warning #5: The S3 clients provided by Amazon EMR are not from the Apache
Software foundation, and are only supported by Amazon.

Specifically: on Amazon EMR, s3a is not supported, and amazon recommend