@@ -20,6 +20,27 @@ This document covers the architecture and implementation details of the S3A comm

For information on using the committers, see [the S3A Committers](./committer.html).

+### January 2021 Update
+
+Now that S3 is fully consistent, problems related to inconsistent
+directory listings have gone. However, the rename problem remains: committing
+work by renaming directories is unsafe as well as horribly slow.
+
+This architecture document, and the committers, were written at a time
+when S3 was inconsistent. The two committers addressed this problem differently:
+
+* Staging Committer: relied on a cluster HDFS filesystem for safely propagating
+  the lists of files to commit from workers to the job manager/driver.
+* Magic Committer: required S3Guard to offer consistent directory listings
+  on the object store.
+
+With consistent S3, the Magic Committer can be safely used with any S3 bucket.
+The choice of which to use, then, is a matter for experimentation.
+
+This architecture document was written in 2017, a time when S3 was only
+consistent when an extra consistency layer such as S3Guard was used.
+The document indicates where requirements/constraints which existed then
+are now obsolete.

## Problem: Efficient, reliable commits of work to consistent S3 buckets

@@ -49,10 +70,10 @@ can be executed server-side, but as it does not complete until the in-cluster
copy has completed, it takes time proportional to the amount of data.

The rename overhead is the most visible issue, but it is not the most dangerous.
-That is the fact that path listings have no consistency guarantees, and may
-lag the addition or deletion of files.
-If files are not listed, the commit operation will *not* copy them, and
-so they will not appear in the final output.
+That is the fact that until late 2020, path listings had no consistency guarantees,
+and may have lagged the addition or deletion of files.
+If files were not listed, the commit operation would *not* copy them, and
+so they would not appear in the final output.

The solution to this problem is closely coupled to the S3 protocol itself:
delayed completion of multi-part PUT operations
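The delayed-completion property of multipart PUT that the committers build on can be modelled with a short, self-contained sketch. This is purely illustrative (a hypothetical in-memory store, not the S3A client, the AWS SDK, or part of this patch): parts may be uploaded at any time, but nothing becomes visible until the upload is completed, and an abort discards everything.

```python
# Illustrative in-memory model of the S3 multipart-upload property the
# committers exploit: uploaded parts stay invisible until the upload is
# completed, and completion is a single, final operation.

class MockStore:
    def __init__(self):
        self.objects = {}   # visible objects: path -> bytes
        self.uploads = {}   # upload id -> (path, {part number: bytes})
        self._next_id = 0

    def initiate_multipart(self, path):
        self._next_id += 1
        self.uploads[self._next_id] = (path, {})
        return self._next_id

    def upload_part(self, upload_id, part_number, data):
        # Data is stored against the upload, not at the destination path.
        self.uploads[upload_id][1][part_number] = data

    def complete_multipart(self, upload_id):
        # Only now does the object appear: this is the "commit".
        path, parts = self.uploads.pop(upload_id)
        self.objects[path] = b"".join(parts[n] for n in sorted(parts))

    def abort_multipart(self, upload_id):
        # Discards all uploaded parts; nothing ever becomes visible.
        self.uploads.pop(upload_id)


store = MockStore()
uid = store.initiate_multipart("dest/part-0000")
store.upload_part(uid, 1, b"hello ")
store.upload_part(uid, 2, b"world")
assert "dest/part-0000" not in store.objects  # invisible until completed
store.complete_multipart(uid)
assert store.objects["dest/part-0000"] == b"hello world"
```

A committer can therefore upload all of a task's output eagerly, yet defer the completion calls until job commit.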
@@ -828,6 +849,8 @@ commit sequence in `Task.done()`, when `talkToAMTGetPermissionToCommit()`

# Requirements of an S3A Committer

+The design requirements of the S3A committer were:
+
1. Support an eventually consistent S3 object store as a reliable direct
destination of work through the S3A filesystem client.
1. Efficient: implies no rename, and a minimal amount of delay in the job driver's
@@ -841,6 +864,7 @@ the job, and any previous incomplete jobs.
1. Security: not to permit privilege escalation from other users with
write access to the same file system(s).

+

## Features of S3 and the S3A Client
@@ -852,8 +876,8 @@ MR committer algorithms have significant performance problems.

1. Single-object renames are implemented as a copy and delete sequence.
1. COPY is atomic, but overwrites cannot be prevented.
-1. Amazon S3 is eventually consistent on listings, deletes and updates.
-1. Amazon S3 has create consistency, however, the negative response of a HEAD/GET
+1. [Obsolete] Amazon S3 is eventually consistent on listings, deletes and updates.
+1. [Obsolete] Amazon S3 has create consistency; however, the negative response of a HEAD/GET
performed on a path before an object was created can be cached, unintentionally
creating a create inconsistency. The S3A client library does perform such a check,
on `create()` and `rename()` to check the state of the destination path, and
@@ -872,11 +896,12 @@ data, with the `S3ABlockOutputStream` of HADOOP-13560 uploading written data
as parts of a multipart PUT once the threshold set in the configuration
parameter `fs.s3a.multipart.size` (default: 100MB) is reached.

-[S3Guard](./s3guard.html) adds an option of consistent view of the filesystem
+[S3Guard](./s3guard.html) added an option of consistent view of the filesystem
to all processes using the shared DynamoDB table as the authoritative store of
-metadata. Some S3-compatible object stores are fully consistent; the
-proposed algorithm is designed to work with such object stores without the
-need for any DynamoDB tables.
+metadata.
+The proposed algorithm was designed to work with any fully consistent
+object store without the need for DynamoDB tables.
+Since AWS S3 became consistent in 2020, the committers
+now work directly with the store.

## Related work: Spark's `DirectOutputCommitter`

@@ -1246,8 +1271,8 @@ for parallel committing of work, including all the error handling based on
the Netflix experience.

It differs in that it directly streams data to S3 (there is no staging),
-and it also stores the lists of pending commits in S3 too. That mandates
-consistent metadata on S3, which S3Guard provides.
+and it also stores the lists of pending commits in S3. It
+requires a consistent S3 store.


### Core concept: A new/modified output stream for delayed PUT commits
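The core concept named in that heading — an output stream which uploads written data as multipart parts but never makes the object visible on `close()` — can be sketched as follows. This is a simplified, hypothetical model, not the real `S3ABlockOutputStream`: it buffers bytes into fixed-size blocks (standing in for `fs.s3a.multipart.size`) and, on close, returns a pending-commit record instead of completing the upload.

```python
# Simplified sketch of a delayed-PUT output stream: full blocks are
# "uploaded" as multipart parts as they fill, and close() yields the
# record a committer would later use to complete (commit) the upload.

class DelayedCommitStream:
    def __init__(self, path, block_size):
        self.path = path
        self.block_size = block_size  # stands in for fs.s3a.multipart.size
        self.buffer = b""
        self.parts = []               # parts already uploaded to the store

    def write(self, data):
        self.buffer += data
        # Upload a part each time a full block of data has been written.
        while len(self.buffer) >= self.block_size:
            self.parts.append(self.buffer[:self.block_size])
            self.buffer = self.buffer[self.block_size:]

    def close(self):
        if self.buffer:               # final, possibly short, part
            self.parts.append(self.buffer)
            self.buffer = b""
        # The upload is NOT completed here; the committer later issues the
        # final POST, which is what makes the object visible.
        return {"path": self.path, "parts": len(self.parts)}


stream = DelayedCommitStream("dest/part-0000", block_size=4)
stream.write(b"abcdefghij")
pending = stream.close()
assert pending == {"path": "dest/part-0000", "parts": 3}
```

In the real committers, that pending record is what gets persisted (to HDFS for the staging committer, to S3 under `__magic` for the magic committer) until job commit.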
@@ -1480,7 +1505,7 @@ The time to commit a job will be `O(files/threads)`
Every `.pendingset` file in the job attempt directory must be loaded, and a PUT
request issued for every incomplete upload listed in the files.

-Note that it is the bulk listing of all children which is where full consistency
+[Obsolete] Note that it is the bulk listing of all children which is where full consistency
is required. If instead, the list of files to commit could be returned from
tasks to the job committer, as the Spark commit protocol allows, it would be
possible to commit data to an inconsistent object store.
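The `O(files/threads)` cost model above can be illustrated with a small sketch (hypothetical helper names, not the committer's actual code): every pending upload listed across all `.pendingset` files gets one completion request, spread across a pool of threads.

```python
# Illustrative cost model of job commit: one completion request per
# incomplete upload, parallelized over a thread pool, so commit time
# is O(files / threads).

from concurrent.futures import ThreadPoolExecutor

def commit_job(pendingsets, complete_upload, threads=8):
    """Complete every pending upload listed across all .pendingset files."""
    uploads = [u for pendingset in pendingsets for u in pendingset]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # Issue the completion requests concurrently; block until all done.
        list(pool.map(complete_upload, uploads))
    return len(uploads)


completed = []
pendingsets = [["task1-upload1", "task1-upload2"], ["task2-upload1"]]
count = commit_job(pendingsets, completed.append, threads=2)
assert count == 3
assert sorted(completed) == ["task1-upload1", "task1-upload2", "task2-upload1"]
```

In practice `complete_upload` would be the POST that completes a multipart upload; here a list append stands in for it.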
@@ -1525,7 +1550,7 @@ commit algorithms.
1. It is possible to create more than one client writing to the
same destination file within the same S3A client/task, either sequentially or in parallel.

-1. Even with a consistent metadata store, if a job overwrites existing
+1. [Obsolete] Even with a consistent metadata store, if a job overwrites existing
files, then old data may still be visible to clients reading the data, until
the update has propagated to all replicas of the data.

@@ -1538,7 +1563,7 @@ all files in the destination directory which were not being overwritten.
for any purpose other than for the storage of pending commit data.

1. Unless extra code is added to every FS operation, it will still be possible
-to manipulate files under the `__magic` tree. That's not bad, it just potentially
+to manipulate files under the `__magic` tree. That's not bad, just potentially
confusing.

1. As written data is not materialized until the commit, it will not be possible
@@ -1693,14 +1718,6 @@ base for relative paths created underneath it.

The committers can only be tested against an S3-compatible object store.

-Although a consistent object store is a requirement for a production deployment
-of the magic committer an inconsistent one has appeared to work during testing, simply by
-adding some delays to the operations: a task commit does not succeed until
-all the objects which it has PUT are visible in the LIST operation. Assuming
-that further listings from the same process also show the objects, the job
-committer will be able to list and commit the uploads.
-
-
The committers have some unit tests, and integration tests based on
the protocol integration test lifted from `org.apache.hadoop.mapreduce.lib.output.TestFileOutputCommitter`
to test various state transitions of the commit mechanism has been extended
@@ -1766,7 +1783,8 @@ tree.
Alternatively, the fact that Spark tasks provide data to the job committer on their
completion means that a list of pending PUT commands could be built up, with the commit
operations being executed by an S3A-specific implementation of the `FileCommitProtocol`.
-As noted earlier, this may permit the requirement for a consistent list operation
+
+[Obsolete] As noted earlier, this may permit the requirement for a consistent list operation
to be bypassed. It would still be important to list what was being written, as
it is needed to aid aborting work in failed tasks, but the list of files
created by successful tasks could be passed directly from the task to committer,
@@ -1890,9 +1908,6 @@ bandwidth and the data upload bandwidth.

No use is made of the cluster filesystem; there are no risks there.

-A consistent store is required, which, for Amazon's infrastructure, means S3Guard.
-This is covered below.
-
A malicious user with write access to the `__magic` directory could manipulate
or delete the metadata of pending uploads, or potentially inject new work into
the commit. Having access to the `__magic` directory implies write access
@@ -1900,13 +1915,12 @@ to the parent destination directory: a malicious user could just as easily
manipulate the final output, without needing to attack the committer's intermediate
files.

-
### Security Risks of all committers


#### Visibility

-* If S3Guard is used for storing metadata, then the metadata is visible to
+[Obsolete] If S3Guard is used for storing metadata, then the metadata is visible to
all users with read access. A malicious user with write access could delete
entries of newly generated files, so they would not be visible.

@@ -1941,7 +1955,7 @@ any of the text fields, script which could then be executed in some XSS
attack. We may wish to consider sanitizing this data on load.

* Paths in tampered data could be modified in an attempt to commit an upload across
-an existing file, or the MPU ID alterated to prematurely commit a different upload.
+an existing file, or the MPU ID altered to prematurely commit a different upload.
These attempts will not succeed, because the destination
path of the upload is declared on the initial POST to initiate the MPU, and
operations associated with the MPU must also declare the path: if the path and