@@ -77,9 +77,34 @@ Do not inadvertently share these credentials through means such as

If you do any of these: change your credentials immediately!

+### Warning #4: the S3 clients provided by Amazon EMR are not from the Apache Software Foundation, and are only supported by Amazon.
+
+Specifically: on Amazon EMR, s3a is not supported, and Amazon recommends
+a different filesystem implementation. If you are using Amazon EMR, follow
+these instructions, and be aware that all issues related to S3 integration
+in EMR can only be addressed by Amazon themselves: please raise your issues
+with them.

## S3

+The `s3://` filesystem is the original S3 store in the Hadoop codebase.
+It implements an inode-style filesystem atop S3, and was written to
+provide scalability when S3 had significant limits on the size of blobs.
+It is incompatible with any other application's use of data in S3.
+
+It is now deprecated and will be removed in Hadoop 3. Please do not use it,
+and migrate any data off it.
+
+### Dependencies
+
+* `jets3t` jar
+* `commons-codec` jar
+* `commons-logging` jar
+* `httpclient` jar
+* `httpcore` jar
+* `java-xmlbuilder` jar
+
### Authentication properties

<property>
@@ -95,6 +120,42 @@ If you do any of these: change your credentials immediately!

## S3N

+S3N was the first S3 filesystem client to use "native" S3 objects, hence
+the scheme `s3n://`.
+
+### Features
+
+* Directly reads and writes S3 objects.
+* Compatible with standard S3 clients.
+* Supports partitioned uploads for many-GB objects.
+* Available across all Hadoop 2.x releases.
+
+The S3N filesystem client, while widely used, is no longer undergoing
+active maintenance except for emergency security issues. There are
+known bugs, especially: it reads to the end of a stream when closing a read;
+this can make `seek()` slow on large files. The reason there has been no
+attempt to fix this is that every upgrade of the Jets3t library, while
+fixing some problems, has unintentionally introduced new ones in either the changed
+Hadoop code, or somewhere in the Jets3t/Httpclient code base.
+The number of defects remained constant; they merely moved around.
+
+By freezing the Jets3t jar version and avoiding changes to the code,
+we reduce the risk of making things worse.
+
+The S3A filesystem client can read all files created by S3N. Accordingly
+it should be used wherever possible.
+
+
+### Dependencies
+
+* `jets3t` jar
+* `commons-codec` jar
+* `commons-logging` jar
+* `httpclient` jar
+* `httpcore` jar
+* `java-xmlbuilder` jar
+
+
### Authentication properties

<property>
@@ -176,6 +237,45 @@ If you do any of these: change your credentials immediately!
## S3A

+The S3A filesystem client, prefix `s3a://`, is the S3 client undergoing
+active development and maintenance.
+While this means that there is a bit of instability
+in configuration options and behavior, it also means
+that the code is getting better in terms of reliability, performance,
+monitoring and other features.
+
+### Features
+
+* Directly reads and writes S3 objects.
+* Compatible with standard S3 clients.
+* Can read data created with S3N.
+* Can write data back that is readable by S3N. (Note: excluding encryption.)
+* Supports partitioned uploads for many-GB objects.
+* Instrumented with Hadoop metrics.
+* Performance-optimized operations, including `seek()` and `readFully()`.
+* Uses Amazon's Java S3 SDK with support for the latest S3 features and
+authentication schemes.
+* Supports authentication via: environment variables, Hadoop configuration
+properties, the Hadoop key management store and IAM roles (see the sketch
+after this list).
+* Supports S3 "Server Side Encryption" for both reading and writing.
+* Supports proxies.
+* Test suites include distcp and suites in downstream projects.
+* Available since Hadoop 2.6; considered production ready in Hadoop 2.7.
+* Actively maintained.
+
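+One way to keep the AWS secrets out of configuration files entirely is a
+Hadoop credential store. A minimal sketch, assuming a JCEKS store on HDFS
+(the provider path is illustrative):
+
+```
+# store the S3A secrets in a credential store; the command prompts for values
+hadoop credential create fs.s3a.access.key -provider jceks://hdfs@namenode/s3.jceks
+hadoop credential create fs.s3a.secret.key -provider jceks://hdfs@namenode/s3.jceks
+```
+
+Clients then set `hadoop.security.credential.provider.path` to that
+`jceks://` URL so the S3A client can resolve the secrets at runtime.
+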
+S3A is now the recommended client for working with S3 objects. It is also the
+one where patches for functionality and performance are very welcome.
+
+### Dependencies
+
+* `hadoop-aws` jar.
+* `aws-java-sdk-s3` jar.
+* `aws-java-sdk-core` jar.
+* `aws-java-sdk-kms` jar.
+* `joda-time` jar; use version 2.8.1 or later.
+* `httpclient` jar.
+* Jackson `jackson-core`, `jackson-annotations`, `jackson-databind` jars.
+
### Authentication properties

<property>
@@ -333,6 +433,7 @@ this capability.

<property>
  <name>fs.s3a.path.style.access</name>
+  <value>false</value>
  <description>Enable S3 path style access ie disabling the default virtual hosting behaviour.
  Useful for S3A-compliant storage providers as it removes the need to set up DNS for virtual hosting.
  </description>
@@ -432,7 +533,7 @@ this capability.

<property>
  <name>fs.s3a.multiobjectdelete.enable</name>
-  <value>false</value>
+  <value>true</value>
  <description>When enabled, multiple single-object delete requests are replaced by
  a single 'delete multiple objects'-request, reducing the number of requests.
  Beware: legacy S3-compatible object stores might not support this request.
@@ -556,6 +657,211 @@ the available memory. These settings should be tuned to the envisioned
workflow (some large files, many small ones, ...) and the physical
limitations of the machine and cluster (memory, network bandwidth).

+## Troubleshooting S3A
+
+The most common problems working with S3A are:
+
+1. Classpath
+1. Authentication
+1. S3 Inconsistency side-effects
+
+Classpath is usually the first problem. For the S3x filesystem clients,
+you need the Hadoop-specific filesystem clients, third-party S3 client libraries
+compatible with the Hadoop code, and any dependent libraries compatible with
+Hadoop and the specific JVM.
+
+The classpath must be set up for the process talking to S3: if this is code
+running in the Hadoop cluster, the JARs must be on that classpath. That
+includes `distcp`.
+
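+As a minimal sketch of a client-side setup, assuming the standard distribution
+layout in which the `hadoop-aws` JAR and its AWS SDK dependencies ship under
+`share/hadoop/tools/lib`:
+
+```
+# add the hadoop-aws JAR and the AWS SDK JARs to the client classpath
+export HADOOP_CLASSPATH=$HADOOP_HOME/share/hadoop/tools/lib/*
+hadoop fs -ls s3a://example-bucket/
+```
+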
+### `ClassNotFoundException: org.apache.hadoop.fs.s3a.S3AFileSystem`
+
+(or `org.apache.hadoop.fs.s3native.NativeS3FileSystem`, `org.apache.hadoop.fs.s3.S3FileSystem`).
+
+These are the Hadoop classes, found in the `hadoop-aws` JAR. An exception
+reporting that one of these classes is missing means that this JAR is not on
+the classpath.
+
+### `ClassNotFoundException: com.amazonaws.services.s3.AmazonS3Client`
+
+(or other `com.amazonaws` class.)
+
+This means that one or more of the `aws-*-sdk` JARs are missing. Add them.
+
+### Missing method in AWS class
+
+This can be triggered by incompatibilities between the AWS SDK on the classpath
+and the version which Hadoop was compiled with.
+
+The AWS SDK JARs change their signature enough between releases that the only
+way to safely update the AWS SDK version is to recompile Hadoop against the later
+version.
+
+There's nothing the Hadoop team can do here: if you get this problem, then sorry,
+but you are on your own. The Hadoop developer team did look at using reflection
+to bind to the SDK, but there were too many changes between versions for this
+to work reliably. All it did was postpone version compatibility problems until
+the specific codepaths were executed at runtime; this was actually a backward
+step in terms of fast detection of compatibility problems.
+
+### Missing method in a Jackson class
+
+This is usually caused by version mismatches between Jackson JARs on the
+classpath. All Jackson JARs on the classpath *must* be of the same version.
+
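+As a quick diagnostic sketch, listing every Jackson JAR a Hadoop command will
+load (Unix `:`-separated classpath assumed); the versions must all match:
+
+```
+hadoop classpath | tr ':' '\n' | grep -i jackson
+```
+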
+### Authentication failure
+
+One authentication problem is caused by classpath mismatch; see the Joda Time
+issue below.
+
+Otherwise, the general cause is that you have the wrong credentials, or that
+the credentials were not readable on the host attempting to read or write
+the S3 bucket.
+
+There's not much that Hadoop can do or does for diagnostics here,
+though enabling debug logging for the package `org.apache.hadoop.fs.s3a`
+can help.
+
+There is also some logging in the AWS libraries which provides some extra details.
+In particular, setting the logger `com.amazonaws.auth.AWSCredentialsProviderChain`
+to DEBUG level will print the individual reasons why each of the (chained)
+authentication clients failed.
+
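+A sketch of the relevant entries in Hadoop's `log4j.properties` file:
+
+```
+# S3A client debugging
+log4j.logger.org.apache.hadoop.fs.s3a=DEBUG
+# print why each credential provider in the authentication chain was rejected
+log4j.logger.com.amazonaws.auth.AWSCredentialsProviderChain=DEBUG
+```
+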
+Otherwise, try to use the AWS command line tools with the same credentials.
+If you set the environment variables, you can take advantage of S3A's support
+for environment-variable authentication by attempting to use the `hadoop fs`
+command to read or write data on S3. That is: comment out the `fs.s3a` secrets
+and rely on the environment variables.
+
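+For example, a sketch using the SDK's standard environment variables (the
+bucket name is illustrative):
+
+```
+export AWS_ACCESS_KEY_ID=my-access-key
+export AWS_SECRET_ACCESS_KEY=my-secret-key
+hadoop fs -ls s3a://example-bucket/
+```
+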
+S3 Frankfurt is a special case. It uses the V4 authentication API.
+
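+For V4-only regions such as Frankfurt, requests made against the default
+(global) endpoint can fail authentication; the usual remedy is to point the
+client at the region's own endpoint. A sketch, assuming Frankfurt's standard
+endpoint name:
+
+<property>
+  <name>fs.s3a.endpoint</name>
+  <value>s3.eu-central-1.amazonaws.com</value>
+</property>
+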
+### Authentication failures running on Java 8u60+
+
+A change in the Java 8 JVM broke some of the `toString()` string generation
+of Joda Time 2.8.0, which stopped the Amazon S3 client from being able to
+generate authentication headers suitable for validation by S3.
+
+Fix: make sure that the version of Joda Time is 2.8.1 or later.
+
+## Visible S3 Inconsistency
+
+Amazon S3 is *an eventually consistent object store*. That is: not a filesystem.
+
+It offers read-after-create consistency: a newly created file is immediately
+visible. There is one quirk, however: a negative GET may be cached, such
+that even if an object is immediately created, the fact that there "wasn't"
+an object is still remembered.
+
+That means the following sequence on its own will be consistent:
+
+```
+touch(path) -> getFileStatus(path)
+```
+
+But this sequence *may* be inconsistent:
+
+```
+getFileStatus(path) -> touch(path) -> getFileStatus(path)
+```
+
+A common source of visible inconsistencies is that the S3 metadata
+database (the part of S3 which serves list requests) is updated asynchronously.
+Newly added or deleted files may not be visible in the index, even though direct
+operations on the object (`HEAD` and `GET`) succeed.
+
+In S3A, that means the `getFileStatus()` and `open()` operations are more likely
+to be consistent with the state of the object store than any directory list
+operations (`listStatus()`, `listFiles()`, `listLocatedStatus()`,
+`listStatusIterator()`).
+
+
+### `FileNotFoundException` even though the file was just written.
+
+This can be a sign of consistency problems. It may also surface if there is some
+asynchronous file write operation still in progress in the client: the operation
+has returned, but the write has not yet completed. While the S3A client code
+does block during the `close()` operation, we suspect that asynchronous writes
+may be taking place somewhere in the stack; this could explain why parallel tests
+fail more often than serialized tests.
+
+### File not found in a directory listing, even though `getFileStatus()` finds it
+
+(Similarly: deleted file found in listing, though `getFileStatus()` reports
+that it is not there.)
+
+This is a visible sign of updates to the metadata server lagging
+behind the state of the underlying object store.
+
+
+### File not visible/saved
+
+The files in an object store are not visible until the write has been completed.
+In-progress writes are simply saved to a local file or cached in RAM, and only
+uploaded at the end of a write operation. If a process terminates unexpectedly,
+or fails to call the `close()` method on an output stream, the pending data
+will have been lost.
+
+### File `flush()` and `hflush()` calls do not save data to S3A
+
+Again, this is due to the fact that the data is cached locally until the
+`close()` operation. The S3A filesystem cannot be used as a store of data
+if it is required that the data is persisted durably after every
+`flush()/hflush()` call. This includes resilient logging, HBase-style journalling
+and the like. The standard strategy here is to save to HDFS and then copy to S3.
+
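+For example, a sketch of the copy step with `distcp` (the paths and bucket
+name are illustrative):
+
+```
+# copy the completed output from HDFS to S3
+hadoop distcp hdfs://namenode:8020/output/logs s3a://example-bucket/logs
+```
+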
+### Other issues
+
+*Slow performance*
+
+S3 is slower to read data than HDFS, even on virtual clusters running on
+Amazon EC2, for several reasons:
+
+* HDFS replicates data for faster query performance.
+* HDFS stores the data on the local hard disks, avoiding network traffic
+  if the code can be executed on that host. As EC2 hosts often have their
+  network bandwidth throttled, this can make a tangible difference.
+* HDFS is significantly faster for many "metadata" operations: listing
+  the contents of a directory, calling `getFileStatus()` on a path,
+  creating or deleting directories.
+* On HDFS, directory renames and deletes are `O(1)` operations. On S3,
+  renaming is a very expensive `O(data)` operation which may fail partway
+  through, in which case the final state depends on where the copy + delete
+  sequence was when it failed. All the objects are copied, then the original
+  set of objects is deleted, so a failure should not lose data, though it may
+  result in duplicate datasets.
+* Because the upload only begins on the `close()` operation, it may take place
+  in the final phase of a process; this can take so long that some operations
+  can actually time out.
+
+The slow performance of `rename()` surfaces during the commit phase of work,
+including
+
+* The MapReduce FileOutputCommitter.
+* DistCp's rename-after-copy operation.
+
+Both these operations can be significantly slower when S3 is the destination
+compared to HDFS or another "real" filesystem.
+
+*Improving S3 load-balancing behavior*
+
+Amazon S3 uses a set of front-end servers to provide access to the underlying data.
+The choice of which front-end server to use is handled via a load-balancing DNS
+service: when the IP address of an S3 bucket is looked up, the choice of which
+IP address to return to the client is made based on the current load
+of the front-end servers.
+
+Over time, the load across the front-end servers changes, so those servers
+considered "lightly loaded" will change. If the DNS value is cached for any
+length of time, your application may end up talking to an overloaded server,
+or, in the case of failures, trying to talk to a server that is no longer there.
+
+And by default, for historical security reasons dating from the era of applets,
+the DNS TTL of a JVM is "infinity".
+
+To work better with AWS, set the DNS time-to-live of an application which
+works with S3 to something lower. See [AWS documentation](http://docs.aws.amazon.com/AWSSdkDocsJava/latest/DeveloperGuide/java-dg-jvm-ttl.html).
+
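+As a sketch, the JVM-wide TTL can be set in the JVM's `java.security` file
+(the 60-second value here is only an example):
+
+```
+# in $JAVA_HOME/jre/lib/security/java.security:
+# cache successful DNS lookups for 60 seconds rather than forever
+networkaddress.cache.ttl=60
+```
+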
## Testing the S3 filesystem clients

Due to eventual consistency, tests may fail without reason. Transient