|
@@ -0,0 +1,870 @@
|
|
|
+<!---
|
|
|
+ Licensed under the Apache License, Version 2.0 (the "License");
|
|
|
+ you may not use this file except in compliance with the License.
|
|
|
+ You may obtain a copy of the License at
|
|
|
+
|
|
|
+ http://www.apache.org/licenses/LICENSE-2.0
|
|
|
+
|
|
|
+ Unless required by applicable law or agreed to in writing, software
|
|
|
+ distributed under the License is distributed on an "AS IS" BASIS,
|
|
|
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
|
|
+ See the License for the specific language governing permissions and
|
|
|
+ limitations under the License. See accompanying LICENSE file.
|
|
|
+-->
|
|
|
+
|
|
|
+# Working with Delegation Tokens
|
|
|
+
|
|
|
+<!-- MACRO{toc|fromDepth=0|toDepth=2} -->
|
|
|
+
|
|
|
+## <a name="introduction"></a> Introducing S3A Delegation Tokens.
|
|
|
+
|
|
|
+The S3A filesystem client supports `Hadoop Delegation Tokens`.
|
|
|
+This allows YARN applications such as MapReduce, DistCp, Apache Flink and Apache Spark to
|
|
|
+obtain credentials to access S3 buckets and pass these credentials to
|
|
|
+jobs/queries, so granting them access to the service with the same access
|
|
|
+permissions as the user.
|
|
|
+
|
|
|
+Three different token types are offered.
|
|
|
+
|
|
|
+*Full Delegation Tokens:* These include the full login values of `fs.s3a.access.key`
|
|
|
+and `fs.s3a.secret.key` in the token, so the recipient has access to
|
|
|
+the data as the submitting user, with unlimited duration.
|
|
|
+These tokens do not involve communication with the AWS STS service, so
|
|
|
+can be used with other S3 installations.
|
|
|
+
|
|
|
+*Session Delegation Tokens:* These contain an "STS Session Token" requested by
|
|
|
+the S3A client from the AWS STS service. They have a limited duration
|
|
|
+so restrict how long an application can access AWS on behalf of a user.
|
|
|
+Clients with this token have the full permissions of the user.
|
|
|
+
|
|
|
+*Role Delegation Tokens:* These contain an "STS Session Token" requested through the
|
|
|
+STS "Assume Role" API, so allow the caller to interact with S3 as a specific AWS
|
|
|
+role, *with permissions restricted to purely accessing the S3 bucket
|
|
|
+and associated S3Guard data*.
|
|
|
+
|
|
|
+Role Delegation Tokens are the most powerful. By restricting the access rights
|
|
|
+of the granted STS token, no process receiving the token may perform
|
|
|
+any operations in the AWS infrastructure other than those for the S3 bucket,
|
|
|
+and even that access is restricted by the rights of the requested role ARN.
|
|
|
+
|
|
|
+All three tokens also marshall the encryption settings: the encryption mechanism
|
|
|
+to use and the KMS key ID or SSE-C client secret. This allows encryption
|
|
|
+policy and secrets to be uploaded from the client to the services.
|
|
|
+
|
|
|
+This document covers how to use these tokens. For details on the implementation
|
|
|
+see [S3A Delegation Token Architecture](delegation_token_architecture.html).
|
|
|
+
|
|
|
+## <a name="background"></a> Background: Hadoop Delegation Tokens.
|
|
|
+
|
|
|
+A Hadoop Delegation Token is a byte array of data which is submitted to
|
|
|
+a Hadoop service as proof that the caller has the permissions to perform
|
|
|
+the operation which it is requesting,
|
|
|
+and which can be passed between applications to *delegate* those permissions.
|
|
|
+
|
|
|
+Tokens are opaque to clients, who simply get a byte array
|
|
|
+of data which they must provide to a service when required.
|
|
|
+This normally contains encrypted data for use by the service.
|
|
|
+
|
|
|
+The service, which holds the password to encrypt/decrypt this data,
|
|
|
+can decrypt the byte array and read the contents,
|
|
|
+knowing that it has not been tampered with, then
|
|
|
+use the presence of a valid token as evidence the caller has
|
|
|
+at least temporary permissions to perform the requested operation.
|
|
|
+
|
|
|
+Tokens have a limited lifespan.
|
|
|
+They may be renewed, with the client making an IPC/HTTP request of a renewer service.
|
|
|
+This renewal service can also be executed on behalf of the caller by
|
|
|
+some other Hadoop cluster service, such as the YARN Resource Manager.
|
|
|
+
|
|
|
+After use, tokens may be revoked: this relies on services holding tables of
|
|
|
+valid tokens, either in memory or, for any HA service, in Apache Zookeeper or
|
|
|
+similar. Revoking tokens is used to clean up after jobs complete.
|
|
|
+
|
|
|
+Delegation support is tightly integrated with YARN: requests to launch
|
|
|
+containers and applications can include a list of delegation tokens to
|
|
|
+pass along. These tokens are serialized with the request, saved to a file
|
|
|
+on the node launching the container, and then loaded into the credentials
|
|
|
+of the active user. Normally a token for the HDFS cluster is one of those used here,
|
|
|
+added to the credentials through a call to `FileSystem.getDelegationToken()`
|
|
|
+(usually via `FileSystem.addDelegationTokens()`).
|
|
|
+
|
|
|
+Delegation Tokens are also supported with applications such as Hive: a query
|
|
|
+issued to a shared (long-lived) Hive cluster can include the delegation
|
|
|
+tokens required to access specific filesystems *with the rights of the user
|
|
|
+submitting the query*.
|
|
|
+
|
|
|
+All these applications normally only retrieve delegation tokens when security
|
|
|
+is enabled. This is why the cluster configuration needs to enable Kerberos.
|
|
|
+Production Hadoop clusters need Kerberos for security anyway.
|
|
|
+
|
|
|
+
|
|
|
+## <a name="s3a-delegation-tokens"></a> S3A Delegation Tokens.
|
|
|
+
|
|
|
+S3A now supports delegation tokens, so allowing a caller to acquire tokens
|
|
|
+from a local S3A Filesystem connector instance and pass them on to
|
|
|
+applications to grant them equivalent or restricted access.
|
|
|
+
|
|
|
+These S3A Delegation Tokens are special in that they do not contain
|
|
|
+password-protected data opaque to clients; they contain the secrets needed
|
|
|
+to access the relevant S3 buckets and associated services.
|
|
|
+
|
|
|
+They are obtained by requesting a delegation token from the S3A filesystem client.
|
|
|
+Issued tokens may be included in job submissions, passed to running applications,
|
|
|
+etc. A token is specific to an individual bucket; every bucket which a client
|
|
|
+wishes to work with must have a separate delegation token issued.
|
|
|
+
|
|
|
+S3A implements Delegation Tokens in its `org.apache.hadoop.fs.s3a.auth.delegation.S3ADelegationTokens`
|
|
|
+class, which then supports multiple "bindings" behind it, so supporting
|
|
|
+different variants of S3A Delegation Tokens.
|
|
|
+
|
|
|
+Because applications only collect Delegation Tokens in secure clusters,
|
|
|
+to be able to submit delegation tokens in transient
|
|
|
+cloud-hosted Hadoop clusters, _these clusters must also have Kerberos enabled_.
|
|
|
+
|
|
|
+
|
|
|
+### <a name="session-tokens"></a> S3A Session Delegation Tokens
|
|
|
+
|
|
|
+A Session Delegation Token is created by asking the AWS
|
|
|
+[Security Token Service](http://docs.aws.amazon.com/STS/latest/APIReference/Welcome.html)
|
|
|
+to issue an AWS session password and identifier for a limited duration.
|
|
|
+These AWS session credentials are valid until the end of that time period.
|
|
|
+They are marshalled into the S3A Delegation Token.
|
|
|
+
|
|
|
+Other S3A connectors can extract these credentials and use them to
|
|
|
+talk to S3 and related services.
|
|
|
+
|
|
|
+Issued tokens cannot be renewed or revoked.
|
|
|
+
|
|
|
+See [GetSessionToken](http://docs.aws.amazon.com/STS/latest/APIReference/API_GetSessionToken.html)
|
|
|
+for specific details on the (current) token lifespan.
|
|
|
+
|
|
|
+### <a name="role-tokens"></a> S3A Role Delegation Tokens
|
|
|
+
|
|
|
+A Role Delegation Token is created by asking the AWS
|
|
|
+[Security Token Service](http://docs.aws.amazon.com/STS/latest/APIReference/Welcome.html)
|
|
|
+for a set of "Assumed Role" credentials, with an AWS account-specific role, for a limited duration.
|
|
|
+This role is restricted to only grant access to the S3 bucket, the S3Guard table
|
|
|
+and all KMS keys.
|
|
|
+These credentials are marshalled into the S3A Delegation Token.
|
|
|
+
|
|
|
+Other S3A connectors can extract these credentials and use them to
|
|
|
+talk to S3 and related services.
|
|
|
+They may only work with the explicit AWS resources identified when the token was generated.
|
|
|
+
|
|
|
+Issued tokens cannot be renewed or revoked.
|
|
|
+
|
|
|
+
|
|
|
+### <a name="full-credentials"></a> S3A Full-Credential Delegation Tokens
|
|
|
+
|
|
|
+Full Credential Delegation Tokens contain the full AWS login details
|
|
|
+(access key and secret key) needed to access a bucket.
|
|
|
+
|
|
|
+They never expire, so are the equivalent of storing the AWS account credentials
|
|
|
+in a Hadoop, Hive, Spark configuration or similar.
|
|
|
+
|
|
|
+The differences are:
|
|
|
+
|
|
|
+1. They are automatically passed from the client/user to the application.
|
|
|
+A remote application can use them to access data on behalf of the user.
|
|
|
+1. When a remote application destroys the filesystem connector instances and
|
|
|
+tokens of a user, the secrets are destroyed too.
|
|
|
+1. Secrets in the `AWS_` environment variables on the client will be picked up
|
|
|
+and automatically propagated.
|
|
|
+1. They do not use the AWS STS service, so may work against third-party implementations
|
|
|
+of the S3 protocol.
|
|
|
+
|
|
|
+
|
|
|
+## <a name="enabling "></a> Using S3A Delegation Tokens
|
|
|
+
|
|
|
+A prerequisite to using S3A filesystem delegation tokens is to run with
|
|
|
+Hadoop security enabled, which inevitably means with Kerberos.
|
|
|
+Even though S3A delegation tokens do not use Kerberos, the code in
|
|
|
+applications which fetch DTs is normally only executed when the cluster is
|
|
|
+running in secure mode; somewhere where the `core-site.xml` configuration
|
|
|
+sets `hadoop.security.authentication` to `kerberos` or another valid
|
|
|
+authentication mechanism.
|
|
|
+
|
|
|
+*Without enabling security at this level, delegation tokens will not
|
|
|
+be collected.*
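+
+As a minimal sketch, the `core-site.xml` entry which switches Hadoop into
+secure mode is shown below. It is illustrative only: a working Kerberos
+deployment also needs principals, keytabs and per-service settings which
+are not covered here.
+
+```xml
+<!-- core-site.xml: enable Hadoop security so that applications collect tokens -->
+<property>
+ <name>hadoop.security.authentication</name>
+ <value>kerberos</value>
+</property>
+```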
|
|
|
+
|
|
|
+Once Kerberos is enabled, the process for acquiring tokens is as follows:
|
|
|
+
|
|
|
+1. Enable Delegation token support by setting `fs.s3a.delegation.token.binding`
|
|
|
+to the classname of the token binding to use.
|
|
|
+1. Add any other binding-specific settings (STS endpoint, IAM role, etc.)
|
|
|
+1. Make sure the settings are the same in the service as well as the client.
|
|
|
+1. In the client, switch to using a [Hadoop Credential Provider](hadoop-project-dist/hadoop-common/CredentialProviderAPI.html)
|
|
|
+for storing your local credentials, with a local filesystem store
|
|
|
+ (`localjceks:` or `jceks://file`), so as to keep the full secrets out of any
|
|
|
+ job configurations.
|
|
|
+1. Execute the client from a Kerberos-authenticated account
|
|
|
+configured with the login credentials for an AWS account able to issue session tokens.
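+
+The steps above might be sketched as the following client-side configuration.
+It assumes the Session binding, and the keystore path
+`/home/alice/secrets/s3.jceks` is a hypothetical example; substitute the
+location of your own credential file.
+
+```xml
+<!-- Step 1: choose the token binding (here, session tokens) -->
+<property>
+ <name>fs.s3a.delegation.token.binding</name>
+ <value>org.apache.hadoop.fs.s3a.auth.delegation.SessionTokenBinding</value>
+</property>
+<!-- Step 4: read the AWS secrets from a local keystore; the path is hypothetical -->
+<property>
+ <name>hadoop.security.credential.provider.path</name>
+ <value>localjceks://file/home/alice/secrets/s3.jceks</value>
+</property>
+```
+
+Because the keystore lives on the local filesystem, only the provider path
+travels with a job configuration, not the secrets themselves.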
|
|
|
+
|
|
|
+### Configuration Parameters
|
|
|
+
|
|
|
+
|
|
|
+| **Key** | **Meaning** | **Default** |
|
|
|
+| --- | --- | --- |
|
|
|
+| `fs.s3a.delegation.token.binding` | delegation token binding class | (undefined) |
|
|
|
+
|
|
|
+### Warnings
|
|
|
+
|
|
|
+##### Use Hadoop Credential Providers to keep secrets out of job configurations.
|
|
|
+
|
|
|
+Hadoop MapReduce jobs copy their client-side configurations with the job.
|
|
|
+If your AWS login secrets are set in an XML file then they are picked up
|
|
|
+and passed in with the job, _even if delegation tokens are used to propagate
|
|
|
+session or role secrets_.
|
|
|
+
|
|
|
+Spark-submit will take any credentials in the `spark-defaults.conf` file
|
|
|
+and again, spread them across the cluster.
|
|
|
+It will also pick up any `AWS_` environment variables and convert them into
|
|
|
+`fs.s3a.access.key`, `fs.s3a.secret.key` and `fs.s3a.session.token` configuration
|
|
|
+options.
|
|
|
+
|
|
|
+To guarantee that the secrets are not passed in, keep your secrets in
|
|
|
+a [hadoop credential provider file on the local filesystem](index.html#hadoop_credential_providers).
|
|
|
+Secrets stored here will not be propagated: the delegation tokens collected
|
|
|
+during job submission will be the sole AWS secrets passed in.
|
|
|
+
|
|
|
+
|
|
|
+##### Token Life
|
|
|
+
|
|
|
+* S3A Delegation tokens cannot be renewed.
|
|
|
+
|
|
|
+* S3A Delegation tokens cannot be revoked. It is possible for an administrator
|
|
|
+to terminate *all AWS sessions using a specific role*
|
|
|
+[from the AWS IAM console](https://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_temp_control-access_disable-perms.html),
|
|
|
+if desired.
|
|
|
+
|
|
|
+* The lifespan of Session Delegation Tokens is limited to that of AWS sessions,
|
|
|
+a maximum of 36 hours.
|
|
|
+
|
|
|
+* The lifespan of a Role Delegation Token is limited to 1 hour by default;
|
|
|
+a longer duration of up to 12 hours can be enabled in the AWS console for
|
|
|
+the specific role being used.
|
|
|
+
|
|
|
+* The lifespan of Full Delegation tokens is unlimited: the secret needs
|
|
|
+to be reset in the AWS Admin console to revoke it.
|
|
|
+
|
|
|
+##### Service Load on the AWS Secure Token Service
|
|
|
+
|
|
|
+All delegation tokens are issued on a bucket-by-bucket basis: clients
|
|
|
+must request a delegation token from every S3A filesystem to which they desire
|
|
|
+access.
|
|
|
+
|
|
|
+For Session and Role Delegation Tokens, this places load on the AWS STS service,
|
|
|
+which may trigger throttling amongst all users within the same AWS account using
|
|
|
+the same STS endpoint.
|
|
|
+
|
|
|
+* In experiments, a few hundred requests per second are needed to trigger throttling,
|
|
|
+so this is very unlikely to surface in production systems.
|
|
|
+* The S3A filesystem connector retries all throttled requests to AWS services, including STS.
|
|
|
+* Other S3 clients which use the AWS SDK will, if configured, also retry throttled requests.
|
|
|
+
|
|
|
+Overall, the risk of triggering STS throttling appears low, and most applications
|
|
|
+will recover from what is generally an intermittently used AWS service.
|
|
|
+
|
|
|
+### <a name="enabling-session-tokens"></a> Enabling Session Delegation Tokens
|
|
|
+
|
|
|
+For session tokens, set `fs.s3a.delegation.token.binding`
|
|
|
+to `org.apache.hadoop.fs.s3a.auth.delegation.SessionTokenBinding`.
|
|
|
+
|
|
|
+
|
|
|
+| **Key** | **Value** |
|
|
|
+| --- | --- |
|
|
|
+| `fs.s3a.delegation.token.binding` | `org.apache.hadoop.fs.s3a.auth.delegation.SessionTokenBinding` |
|
|
|
+
|
|
|
+There are some further configuration options:
|
|
|
+
|
|
|
+| **Key** | **Meaning** | **Default** |
|
|
|
+| --- | --- | --- |
|
|
|
+| `fs.s3a.assumed.role.session.duration` | Duration of delegation tokens | `1h` |
|
|
|
+| `fs.s3a.assumed.role.sts.endpoint` | URL to service issuing tokens | (undefined) |
|
|
|
+| `fs.s3a.assumed.role.sts.endpoint.region` | region for issued tokens | (undefined) |
|
|
|
+
|
|
|
+The XML settings needed to enable session tokens are:
|
|
|
+
|
|
|
+```xml
|
|
|
+<property>
|
|
|
+ <name>fs.s3a.delegation.token.binding</name>
|
|
|
+ <value>org.apache.hadoop.fs.s3a.auth.delegation.SessionTokenBinding</value>
|
|
|
+</property>
|
|
|
+<property>
|
|
|
+ <name>fs.s3a.assumed.role.session.duration</name>
|
|
|
+ <value>1h</value>
|
|
|
+</property>
|
|
|
+```
|
|
|
+
|
|
|
+1. If the application requesting a token has full AWS credentials for the
|
|
|
+relevant bucket, then a new session token will be issued.
|
|
|
+1. If the application requesting a token is itself authenticating with
|
|
|
+a session delegation token, then the existing token will be forwarded.
|
|
|
+The life of the token will not be extended.
|
|
|
+1. If the application requesting a token does not have either of these,
|
|
|
+then the tokens cannot be issued: the operation will fail with an error.
|
|
|
+
|
|
|
+
|
|
|
+The endpoint for STS requests is set by the same configuration
|
|
|
+property as for the `AssumedRole` credential provider and for Role Delegation
|
|
|
+tokens.
|
|
|
+
|
|
|
+```xml
|
|
|
+<!-- Optional -->
|
|
|
+<property>
|
|
|
+ <name>fs.s3a.assumed.role.sts.endpoint</name>
|
|
|
+ <value>sts.amazonaws.com</value>
|
|
|
+</property>
|
|
|
+<property>
|
|
|
+ <name>fs.s3a.assumed.role.sts.endpoint.region</name>
|
|
|
+ <value>us-west-1</value>
|
|
|
+</property>
|
|
|
+```
|
|
|
+
|
|
|
+If the `fs.s3a.assumed.role.sts.endpoint` option is set to something
|
|
|
+other than the central `sts.amazonaws.com` endpoint, then the region property
|
|
|
+*must* be set.
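+
+As an illustration, a client using a regional STS endpoint would pair the two
+options as in the sketch below. The `eu-west-1` region and its endpoint
+hostname are example values chosen here, not a recommendation; use the region
+your cluster actually runs in.
+
+```xml
+<!-- Regional STS endpoint: the region property must match the endpoint -->
+<property>
+ <name>fs.s3a.assumed.role.sts.endpoint</name>
+ <value>sts.eu-west-1.amazonaws.com</value>
+</property>
+<property>
+ <name>fs.s3a.assumed.role.sts.endpoint.region</name>
+ <value>eu-west-1</value>
+</property>
+```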
|
|
|
+
|
|
|
+
|
|
|
+Both the Session and the Role Delegation Token bindings use the option
|
|
|
+`fs.s3a.aws.credentials.provider` to define the credential providers
|
|
|
+to authenticate to the AWS STS with.
|
|
|
+
|
|
|
+Here is the effective list of providers if none are declared:
|
|
|
+
|
|
|
+```xml
|
|
|
+<property>
|
|
|
+ <name>fs.s3a.aws.credentials.provider</name>
|
|
|
+ <value>
|
|
|
+ org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider,
|
|
|
+ org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider,
|
|
|
+ com.amazonaws.auth.EnvironmentVariableCredentialsProvider,
|
|
|
+ org.apache.hadoop.fs.s3a.auth.IAMInstanceCredentialsProvider
|
|
|
+ </value>
|
|
|
+</property>
|
|
|
+```
|
|
|
+
|
|
|
+Not all these authentication mechanisms provide the full set of credentials
|
|
|
+STS needs. The session token provider will simply forward any session credentials
|
|
|
+it is authenticated with; the role token binding will fail.
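+
+One way to avoid this, sketched here as an assumption rather than a required
+setup, is to declare only a provider which supplies a full access key and
+secret key. The example below restricts the list to
+`SimpleAWSCredentialsProvider`; it assumes the keys are available to the
+client through the configuration or a credential provider file.
+
+```xml
+<!-- Only accept full (long-lived) credentials when talking to STS -->
+<property>
+ <name>fs.s3a.aws.credentials.provider</name>
+ <value>org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider</value>
+</property>
+```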
|
|
|
+
|
|
|
+#### Forwarding of existing AWS Session credentials.
|
|
|
+
|
|
|
+When the AWS credentials supplied to the Session Delegation Token binding
|
|
|
+through `fs.s3a.aws.credentials.provider` are themselves a set of
|
|
|
+session credentials, generated delegation tokens will simply contain these
|
|
|
+existing session credentials, rather than a new set of credentials obtained from STS.
|
|
|
+This is because the STS service does not let
|
|
|
+callers authenticated with session/role credentials request new sessions.
|
|
|
+
|
|
|
+This feature is useful when generating tokens from an EC2 VM instance in one IAM
|
|
|
+role and forwarding them over to VMs which are running in a different IAM role.
|
|
|
+The tokens will grant the permissions of the original VM's IAM role.
|
|
|
+
|
|
|
+The duration of the forwarded tokens will be exactly that of the current set of
|
|
|
+tokens, which may be very limited in lifespan. A warning will appear
|
|
|
+in the logs declaring this.
|
|
|
+
|
|
|
+Note: Role Delegation tokens do not support this forwarding of session credentials,
|
|
|
+because there's no way to explicitly change roles in the process.
|
|
|
+
|
|
|
+
|
|
|
+### <a name="enabling-role-tokens"></a> Enabling Role Delegation Tokens
|
|
|
+
|
|
|
+For role delegation tokens, set `fs.s3a.delegation.token.binding`
|
|
|
+to `org.apache.hadoop.fs.s3a.auth.delegation.RoleTokenBinding`.
|
|
|
+
|
|
|
+| **Key** | **Value** |
|
|
|
+| --- | --- |
|
|
|
+| `fs.s3a.delegation.token.binding` | `org.apache.hadoop.fs.s3a.auth.delegation.RoleTokenBinding` |
|
|
|
+
|
|
|
+
|
|
|
+There are some further configuration options:
|
|
|
+
|
|
|
+| **Key** | **Meaning** | **Default** |
|
|
|
+| --- | --- | --- |
|
|
|
+| `fs.s3a.assumed.role.session.duration` | Duration of delegation tokens | `1h` |
|
|
|
+| `fs.s3a.assumed.role.arn` | ARN for role to request | (undefined) |
|
|
|
+| `fs.s3a.assumed.role.sts.endpoint.region` | region for issued tokens | (undefined) |
|
|
|
+
|
|
|
+The option `fs.s3a.assumed.role.arn` must be set to a role which the
|
|
|
+user can assume. It must have permissions to access the bucket, any
|
|
|
+associated S3Guard table and any KMS encryption keys. The actual
|
|
|
+requested role will be this role, explicitly restricted to the specific
|
|
|
+bucket and S3Guard table.
|
|
|
+
|
|
|
+The XML settings needed to enable role tokens are:
|
|
|
+
|
|
|
+```xml
|
|
|
+<property>
|
|
|
+ <name>fs.s3a.delegation.token.binding</name>
|
|
|
+ <value>org.apache.hadoop.fs.s3a.auth.delegation.RoleTokenBinding</value>
|
|
|
+</property>
|
|
|
+<property>
|
|
|
+ <name>fs.s3a.assumed.role.arn</name>
|
|
|
+ <value>REQUIRED: ARN of role to request</value>
|
|
|
+</property>
|
|
|
+<property>
|
|
|
+ <name>fs.s3a.assumed.role.session.duration</name>
|
|
|
+ <value>1h</value>
|
|
|
+</property>
|
|
|
+```
|
|
|
+
|
|
|
+A JSON role policy for the role/session will automatically be generated which will
|
|
|
+consist of:
+
|
|
|
+1. Full access to the S3 bucket for all operations used by the S3A client
|
|
|
+(read, write, list, multipart operations, get bucket location, etc.).
|
|
|
+1. Full user access to any S3Guard DynamoDB table used by the bucket.
|
|
|
+1. Full user access to KMS keys. This is to be able to decrypt any data
|
|
|
+in the bucket encrypted with SSE-KMS, as well as encrypt new data if that
|
|
|
+is the encryption policy.
|
|
|
+
|
|
|
+If the client doesn't have S3Guard enabled, but the remote application does,
|
|
|
+the issued role tokens will not have permission to access the S3Guard table.
|
|
|
+
|
|
|
+### <a name="enabling-full-tokens"></a> Enabling Full Delegation Tokens
|
|
|
+
|
|
|
+This passes the full credentials in, falling back to any session credentials
|
|
|
+which were used to configure the S3A FileSystem instance.
|
|
|
+
|
|
|
+For Full Credential Delegation tokens, set `fs.s3a.delegation.token.binding`
|
|
|
+to `org.apache.hadoop.fs.s3a.auth.delegation.FullCredentialsTokenBinding`
|
|
|
+
|
|
|
+| **Key** | **Value** |
|
|
|
+| --- | --- |
|
|
|
+| `fs.s3a.delegation.token.binding` | `org.apache.hadoop.fs.s3a.auth.delegation.FullCredentialsTokenBinding` |
|
|
|
+
|
|
|
+There are no other configuration options.
|
|
|
+
|
|
|
+```xml
|
|
|
+<property>
|
|
|
+ <name>fs.s3a.delegation.token.binding</name>
|
|
|
+ <value>org.apache.hadoop.fs.s3a.auth.delegation.FullCredentialsTokenBinding</value>
|
|
|
+</property>
|
|
|
+```
|
|
|
+
|
|
|
+Key points:
|
|
|
+
|
|
|
+1. If the application requesting a token has full AWS credentials for the
|
|
|
+relevant bucket, then a full credential token will be issued.
|
|
|
+1. If the application requesting a token is itself authenticating with
|
|
|
+a session delegation token, then the existing token will be forwarded.
|
|
|
+The life of the token will not be extended.
|
|
|
+1. If the application requesting a token does not have either of these,
|
|
|
+then the tokens cannot be issued: the operation will fail with an error.
|
|
|
+
|
|
|
+## <a name="managing_token_duration"></a> Managing Delegation Token Duration
|
|
|
+
|
|
|
+Full Credentials have an unlimited lifespan.
|
|
|
+
|
|
|
+Session and role credentials have a lifespan defined by the duration
|
|
|
+property `fs.s3a.assumed.role.session.duration`.
|
|
|
+
|
|
|
+This can have a maximum value of "36h" for session delegation tokens.
|
|
|
+
|
|
|
+For Role Delegation Tokens, the maximum duration of a token is
|
|
|
+that of the role itself: 1h by default, though this can be changed to
|
|
|
+12h [in the IAM Console](https://console.aws.amazon.com/iam/home#/roles),
|
|
|
+or from the AWS CLI.
|
|
|
+
|
|
|
+Without increasing the duration of the role, one hour is the maximum value;
|
|
|
+the error message `The requested DurationSeconds exceeds the MaxSessionDuration set for this role`
|
|
|
+is returned if the requested duration of a Role Delegation Token is greater
|
|
|
+than that available for the role.
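+
+For example, once the maximum session duration of the role has been raised to
+12 hours in IAM, a matching client setting might look like the sketch below.
+The value `12h` is illustrative; any duration up to the limit configured for
+the role should be accepted.
+
+```xml
+<!-- Request 12 hour role tokens; the role itself must permit this duration -->
+<property>
+ <name>fs.s3a.assumed.role.session.duration</name>
+ <value>12h</value>
+</property>
+```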
|
|
|
+
|
|
|
+
|
|
|
+## <a name="testing"></a> Testing Delegation Token Support
|
|
|
+
|
|
|
+The easiest way to test that delegation support is configured is to use
|
|
|
+the `hdfs fetchdt` command, which can fetch tokens from S3A, Azure ABFS
|
|
|
+and any other filesystem which can issue tokens, as well as HDFS itself.
|
|
|
+
|
|
|
+This will fetch the token and save it to the named file (here, `tokens.bin`),
|
|
|
+even if Kerberos is disabled.
|
|
|
+
|
|
|
+```bash
|
|
|
+# Fetch a token for the AWS landsat-pds bucket and save it to tokens.bin
|
|
|
+$ hdfs fetchdt --webservice s3a://landsat-pds/ tokens.bin
|
|
|
+```
|
|
|
+
|
|
|
+If the command fails with `ERROR: Failed to fetch token` it means the
|
|
|
+filesystem does not have delegation tokens enabled.
|
|
|
+
|
|
|
+If it fails for other reasons, the likely causes are configuration and
|
|
|
+possibly connectivity to the AWS STS Server.
|
|
|
+
|
|
|
+Once collected, the token can be printed. This will show
|
|
|
+the type of token, details about encryption and expiry, and the
|
|
|
+host on which it was created.
|
|
|
+
|
|
|
+```bash
|
|
|
+$ bin/hdfs fetchdt --print tokens.bin
|
|
|
+
|
|
|
+Token (S3ATokenIdentifier{S3ADelegationToken/Session; uri=s3a://landsat-pds;
|
|
|
+timestamp=1541683947569; encryption=EncryptionSecrets{encryptionMethod=SSE_S3};
|
|
|
+Created on vm1.local/192.168.99.1 at time 2018-11-08T13:32:26.381Z.};
|
|
|
+Session credentials for user AAABWL expires Thu Nov 08 14:02:27 GMT 2018; (valid))
|
|
|
+for s3a://landsat-pds
|
|
|
+```
|
|
|
+The "(valid)" annotation means that the AWS credentials are considered "valid":
|
|
|
+there is both a username and a secret.
|
|
|
+
|
|
|
+You can use the `s3guard bucket-info` command to see what the delegation
|
|
|
+support for a specific bucket is.
|
|
|
+If delegation support is enabled, it also prints the current
|
|
|
+Hadoop security mode.
|
|
|
+
|
|
|
+```bash
|
|
|
+$ hadoop s3guard bucket-info s3a://landsat-pds/
|
|
|
+
|
|
|
+Filesystem s3a://landsat-pds
|
|
|
+Location: us-west-2
|
|
|
+Filesystem s3a://landsat-pds is not using S3Guard
|
|
|
+The "magic" committer is supported
|
|
|
+
|
|
|
+S3A Client
|
|
|
+ Endpoint: fs.s3a.endpoint=s3.amazonaws.com
|
|
|
+ Encryption: fs.s3a.server-side-encryption-algorithm=none
|
|
|
+ Input seek policy: fs.s3a.experimental.input.fadvise=normal
|
|
|
+Delegation Support enabled: token kind = S3ADelegationToken/Session
|
|
|
+Hadoop security mode: SIMPLE
|
|
|
+```
|
|
|
+
|
|
|
+Although the S3A delegation tokens do not depend upon Kerberos,
|
|
|
+MapReduce and other applications only request tokens from filesystems when
|
|
|
+security is enabled in Hadoop.
|
|
|
+
|
|
|
+
|
|
|
+## <a name="troubleshooting"></a> Troubleshooting S3A Delegation Tokens
|
|
|
+
|
|
|
+The `hadoop s3guard bucket-info` command will print information about
|
|
|
+the delegation state of a bucket.
|
|
|
+
|
|
|
+Consult [troubleshooting Assumed Roles](assumed_roles.html#troubleshooting)
|
|
|
+for details on AWS error messages related to AWS IAM roles.
|
|
|
+
|
|
|
+The [cloudstore](https://github.com/steveloughran/cloudstore) module's StoreDiag
|
|
|
+utility can also be used to explore delegation token support.
|
|
|
+
|
|
|
+
|
|
|
+### Submitted job cannot authenticate
|
|
|
+
|
|
|
+There are many causes for this; delegation tokens add some more.
|
|
|
+
|
|
|
+### Tokens are not issued
|
|
|
+
|
|
|
+
|
|
|
+* The user is not `kinit`-ed in to Kerberos. Use `klist` and
|
|
|
+`hadoop kdiag` to see the Kerberos authentication state of the logged in user.
|
|
|
+* The filesystem instance on the client has not had a token binding set in
|
|
|
+`fs.s3a.delegation.token.binding`, so does not attempt to issue any.
|
|
|
+* The job submission is not aware that access to the specific S3 buckets
|
|
|
+is required. Review the application's submission mechanism to determine
|
|
|
+how to list source and destination paths. For example, for MapReduce,
|
|
|
+tokens for the cluster filesystem (`fs.defaultFS`) and all filesystems
|
|
|
+referenced as input and output paths will be queried for
|
|
|
+delegation tokens.
|
|
|
+
|
|
|
+For Apache Spark, the cluster filesystem and any filesystems listed in the
|
|
|
+property `spark.yarn.access.hadoopFileSystems` are queried for delegation
|
|
|
+tokens in secure clusters.
|
|
|
+See [Running on YARN](https://spark.apache.org/docs/latest/running-on-yarn.html).
|
|
|
+
|
|
|
+
|
|
|
+### Error `No AWS login credentials`
|
|
|
+
|
|
|
+The client does not have any valid credentials to request a token
|
|
|
+from the Amazon STS service.
|
|
|
+
|
|
|
+### Tokens Expire before job completes
|
|
|
+
|
|
|
+The default duration of session and role tokens as set in
|
|
|
+`fs.s3a.assumed.role.session.duration` is one hour, "1h".
|
|
|
+
|
|
|
+For session tokens, this can be increased to any time up to 36 hours.
|
|
|
+
|
|
|
+For role tokens, it can be increased up to 12 hours, *but only if
|
|
|
+the role is configured in the AWS IAM Console to have a longer lifespan*.
|
|
|
+
|
|
|
+
|
|
|
+### Error `DelegationTokenIOException: Token mismatch`
|
|
|
+
|
|
|
+```
|
|
|
+org.apache.hadoop.fs.s3a.auth.delegation.DelegationTokenIOException:
|
|
|
+ Token mismatch: expected token for s3a://example-bucket
|
|
|
+ of type S3ADelegationToken/Session but got a token of type S3ADelegationToken/Full
|
|
|
+
|
|
|
+ at org.apache.hadoop.fs.s3a.auth.delegation.S3ADelegationTokens.lookupToken(S3ADelegationTokens.java:379)
|
|
|
+ at org.apache.hadoop.fs.s3a.auth.delegation.S3ADelegationTokens.selectTokenFromActiveUser(S3ADelegationTokens.java:300)
|
|
|
+ at org.apache.hadoop.fs.s3a.auth.delegation.S3ADelegationTokens.bindToExistingDT(S3ADelegationTokens.java:160)
|
|
|
+ at org.apache.hadoop.fs.s3a.S3AFileSystem.bindAWSClient(S3AFileSystem.java:423)
|
|
|
+ at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:265)
|
|
|
+```
|
|
|
+
|
|
|
+The value of `fs.s3a.delegation.token.binding` is different in the remote
|
|
|
+service than in the local client. As a result, the remote service
|
|
|
+cannot use the token supplied by the client to authenticate.
|
|
|
+
|
|
|
+Fix: reference the same token binding class at both ends.
|
|
|
+
|
|
|
+
|
|
|
+### Warning `Forwarding existing session credentials`
|
|
|
+
|
|
|
+This message is printed when an S3A filesystem instance has been asked
|
|
|
+for a Session Delegation Token, and it is itself only authenticated with
|
|
|
+a set of AWS session credentials (such as those issued by the IAM metadata
|
|
|
+service).
|
|
|
+
|
|
|
+The created token will contain these existing credentials, credentials which
|
|
|
+can be used until the existing session expires.
|
|
|
+
|
|
|
+The duration of this existing session is unknown: the message is warning
|
|
|
+you that it may expire without warning.
|
|
|
+
|
|
|
+### Error `Cannot issue S3A Role Delegation Tokens without full AWS credentials`
|
|
|
+
|
|
|
+An S3A filesystem instance has been asked for a Role Delegation Token,
|
|
|
+but the instance is only authenticated with session tokens.
|
|
|
+This means that a set of role tokens cannot be requested.
|
|
|
+
|
|
|
+Note: no attempt is made to convert the existing set of session tokens into
|
|
|
+a delegation token, unlike the Session Delegation Tokens. This is because
|
|
|
+the role of the current session (if any) is unknown.
|
|
|
+
|
|
|
+
|
|
|
+## <a name="implementation"></a> Implementation Details
|
|
|
+
|
|
|
+### <a name="architecture"></a> Architecture
|
|
|
+
|
|
|
+Concepts:
|
|
|
+
|
|
|
+1. The S3A FileSystem can create delegation tokens when requested.
|
|
|
+1. These can be marshalled as per other Hadoop Delegation Tokens.
|
|
|
+1. At the far end, they can be retrieved, unmarshalled and used to authenticate callers.
|
|
|
+1. DT binding plugins can then use these directly, or, somehow,
|
|
|
+manage authentication and token issue through other services
|
|
|
+(for example: Kerberos).
|
|
|
+1. Token Renewal and Revocation are not supported.
|
|
|
+
|
|
|
+
|
|
|
+There's support for different back-end token bindings through the
|
|
|
+`org.apache.hadoop.fs.s3a.auth.delegation.S3ADelegationTokenManager`
|
|
|
+
|
|
|
+Every implementation of this must return a subclass of
|
|
|
+`org.apache.hadoop.fs.s3a.auth.delegation.AbstractS3ATokenIdentifier`
|
|
|
+when asked to create a delegation token; this subclass must be registered
|
|
|
+in `META-INF/services/org.apache.hadoop.security.token.TokenIdentifier`
|
|
|
+for unmarshalling.
|
|
|
+
|
|
|
+This identifier must contain all information needed at the far end to
|
|
|
+authenticate the caller with AWS services used by the S3A client: AWS S3 and
|
|
|
+potentially AWS KMS (for SSE-KMS) and AWS DynamoDB (for S3Guard).
|
|
|
+
|
|
|
+It must have its own unique *Token Kind*, to ensure that it can be distinguished
|
|
|
+from the other token identifiers when tokens are being unmarshalled.
|
|
|
+
|
|
|
+| Kind | Token class |
|
|
|
+|------|--------------|
|
|
|
+| `S3ADelegationToken/Full` | `org.apache.hadoop.fs.s3a.auth.delegation.FullCredentialsTokenIdentifier` |
|
|
|
+| `S3ADelegationToken/Session` | `org.apache.hadoop.fs.s3a.auth.delegation.SessionTokenIdentifier` |
|
|
|
+| `S3ADelegationToken/Role` | `org.apache.hadoop.fs.s3a.auth.delegation.RoleTokenIdentifier` |
|
|
|
+
|
|
|
+If implementing an external binding:
|
|
|
+
|
|
|
+1. Follow the security requirements below.
|
|
|
+1. Define a new token identifier; there is no requirement for the `S3ADelegationToken/`
|
|
|
+prefix —but it is useful for debugging.
|
|
|
+1. Token Renewal and Revocation are not integrated with the binding mechanism;
|
|
|
+if the operations are supported, implementation is left as an exercise.
|
|
|
+1. Be aware of the stability guarantees of the module: "LimitedPrivate/Unstable".
|
|
|
+
|
|
|
+### <a name="security"></a> Security
|
|
|
+
|
|
|
+S3A DTs contain secrets valuable for a limited period (session secrets) or
|
|
|
+long-lived secrets with no explicit time limit.
|
|
|
+
|
|
|
+* The `toString()` operations on token identifiers MUST NOT print secrets; this
|
|
|
+is needed to keep them out of logs.
|
|
|
+* Secrets MUST NOT be logged, even at debug level.
|
|
|
+* Prefer short-lived session secrets over long-term secrets.
|
|
|
+* Try to restrict the permissions of a client holding the delegated token
|
|
|
+ to those needed to access data in the S3 bucket. This potentially
|
|
|
+ includes a DynamoDB table, KMS access, etc.
|
|
|
+* Implementations need to be resistant to attacks which pass in invalid data as
|
|
|
+their token identifier: validate the types of the unmarshalled data; set limits
|
|
|
+on the size of all strings and other arrays to read in, etc.
|
|
|
+
|
|
|
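The last two points can be sketched with plain JDK classes. This is a minimal illustration, not code from the S3A module: `BoundedSecretIO`, `SketchCredentials` and the 8192-byte limit are all hypothetical names and values chosen for the example.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper (not part of S3A): length-checked string marshalling. */
final class BoundedSecretIO {
  /** Illustrative upper bound on any marshalled field, in bytes. */
  static final int MAX_FIELD_BYTES = 8192;

  private BoundedSecretIO() {
  }

  /** Write a length-prefixed UTF-8 string, refusing oversized values. */
  static void writeString(DataOutput out, String value) throws IOException {
    byte[] bytes = value.getBytes(StandardCharsets.UTF_8);
    if (bytes.length > MAX_FIELD_BYTES) {
      throw new IOException("Field too long: " + bytes.length);
    }
    out.writeInt(bytes.length);
    out.write(bytes);
  }

  /** Read a length-prefixed string; the length is validated before allocating. */
  static String readString(DataInput in) throws IOException {
    int length = in.readInt();
    if (length < 0 || length > MAX_FIELD_BYTES) {
      throw new IOException("Invalid field length in token identifier: " + length);
    }
    byte[] bytes = new byte[length];
    in.readFully(bytes);
    return new String(bytes, StandardCharsets.UTF_8);
  }
}

/** Hypothetical credential holder whose toString() never reveals the secret. */
final class SketchCredentials {
  private final String accessKeyId;
  private final String secretKey;

  SketchCredentials(String accessKeyId, String secretKey) {
    this.accessKeyId = accessKeyId;
    this.secretKey = secretKey;
  }

  String getSecretKey() {
    return secretKey;
  }

  @Override
  public String toString() {
    // Report only whether values are present, never the values themselves.
    return "SketchCredentials{accessKeyId="
        + (accessKeyId.isEmpty() ? "<unset>" : "<set>")
        + ", secretKey="
        + (secretKey.isEmpty() ? "<unset>" : "<redacted>") + "}";
  }
}
```

Checking the length before allocating the buffer means a corrupt or hostile identifier cannot force a very large allocation.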
+### <a name="resilience"></a> Resilience
|
|
|
+
|
|
|
+Implementations need to handle transient failures of any remote authentication
|
|
|
+service, and the risk of a large-cluster startup overloading it.
|
|
|
+
|
|
|
+* All get/renew/cancel operations should be considered idempotent.
|
|
|
+* Clients should retry with backoff and jitter on recoverable connectivity failures.
|
|
|
+* Clients should fail fast on unrecoverable failures (DNS, authentication).
|
|
|
+
|
|
|
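One way to combine these rules is a retry loop with exponential backoff and full jitter. The sketch below uses only the JDK; `TokenRetry` is a hypothetical name, and treating `UnknownHostException` as the DNS failure and `SecurityException` as a stand-in for an authentication failure are assumptions made for the example. A real binding would classify the exceptions raised by its own client library.

```java
import java.io.IOException;
import java.net.UnknownHostException;
import java.util.concurrent.Callable;
import java.util.concurrent.ThreadLocalRandom;

/** Hypothetical retry helper (not part of S3A). */
final class TokenRetry {
  /** Illustrative cap on the backoff ceiling. */
  private static final long MAX_DELAY_MILLIS = 30_000L;

  private TokenRetry() {
  }

  /** Only I/O failures other than DNS lookups are considered worth retrying. */
  static boolean isRecoverable(Exception e) {
    if (e instanceof UnknownHostException) {
      return false; // DNS failure: fail fast
    }
    if (e instanceof SecurityException) {
      return false; // stand-in for an authentication failure: fail fast
    }
    return e instanceof IOException;
  }

  /**
   * Invoke an idempotent operation, retrying recoverable failures.
   * The sleep before each retry is random in [0, ceiling]; the ceiling doubles
   * after every failed attempt up to MAX_DELAY_MILLIS.
   */
  static <T> T retry(Callable<T> operation, int maxAttempts, long baseDelayMillis)
      throws Exception {
    if (maxAttempts < 1 || baseDelayMillis < 0) {
      throw new IllegalArgumentException("Invalid retry settings");
    }
    long ceiling = baseDelayMillis;
    for (int attempt = 1; ; attempt++) {
      try {
        return operation.call();
      } catch (Exception e) {
        if (!isRecoverable(e) || attempt >= maxAttempts) {
          throw e;
        }
        Thread.sleep(ThreadLocalRandom.current().nextLong(ceiling + 1));
        ceiling = Math.min(ceiling * 2, MAX_DELAY_MILLIS);
      }
    }
  }
}
```

The jitter matters for the large-cluster startup case: without it, every process that failed at the same moment would retry at the same moment.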
+### <a name="scalability"></a> Scalability limits of AWS STS service
|
|
|
+
|
|
|
+There is currently no documented rate limit for token requests against the AWS
|
|
|
+STS service.
|
|
|
+
|
|
|
+We have two tests which attempt to generate enough requests for
|
|
|
+delegation tokens that the AWS STS service will throttle requests for
|
|
|
+tokens by that AWS account for that specific STS endpoint
|
|
|
+(`ILoadTestRoleCredentials` and `ILoadTestSessionCredentials`).
|
|
|
+
|
|
|
+In the initial results of these tests:
|
|
|
+
|
|
|
+* A few hundred requests a second can be made before STS blocks the caller.
|
|
|
+* The throttling does not last very long (seconds).
|
|
|
+* It does not appear to affect any other STS endpoints.
|
|
|
+
|
|
|
+If developers wish to experiment with these tests and provide more detailed
|
|
|
+analysis, we would welcome this. Do bear in mind that all users of the
|
|
|
+same AWS account in that region will be throttled. Your colleagues may
|
|
|
+notice, especially if the applications they are running do not retry on
|
|
|
+throttle responses from STS (it's not a common occurrence after all...).
|
|
|
+
|
|
|
+## Implementing your own Delegation Token Binding
|
|
|
+
|
|
|
+The DT binding mechanism is designed to be extensible: if you have an alternate
|
|
|
+authentication mechanism, such as an S3-compatible object store with
|
|
|
+Kerberos support, S3A Delegation Tokens should be able to support it.
|
|
|
+
|
|
|
+*If it can't, that's a bug in the implementation which needs to be corrected.*
|
|
|
+
|
|
|
+### Steps
|
|
|
+
|
|
|
+1. Come up with a token "Kind"; a unique name for the delegation token identifier.
|
|
|
+1. Implement a subclass of `AbstractS3ATokenIdentifier` which adds all information which
|
|
|
+is marshalled from client to remote services. This must override the `Writable` methods to read
|
|
|
+and write the data to a data stream: these overrides must call the superclass methods first.
|
|
|
+1. Add a resource `META-INF/services/org.apache.hadoop.security.token.TokenIdentifier`
|
|
|
+1. List the classname of your new identifier in that resource.
|
|
|
+1. Implement a subclass of `AbstractDelegationTokenBinding`
|
|
|
+
|
|
|
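The marshalling rule in the second step can be sketched as follows. To keep the example self-contained, `SketchBaseIdentifier` is a stand-in written for this illustration and is not `AbstractS3ATokenIdentifier`; the `ExampleStoreTokenIdentifier` class, its fields and its token kind are likewise hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

/** Stand-in base class for this sketch only; NOT the real S3A class. */
class SketchBaseIdentifier {
  protected String kind = "";
  protected String origin = "";

  void write(DataOutput out) throws IOException {
    out.writeUTF(kind);
    out.writeUTF(origin);
  }

  void readFields(DataInput in) throws IOException {
    kind = in.readUTF();
    origin = in.readUTF();
  }
}

/** Hypothetical identifier carrying a session secret and an expiry time. */
class ExampleStoreTokenIdentifier extends SketchBaseIdentifier {
  static final String KIND = "ExampleStoreDelegationToken/Session";
  private String sessionSecret = "";
  private long expiryMillis;

  /** Empty instance, to be filled in by readFields(). */
  ExampleStoreTokenIdentifier() {
    kind = KIND;
  }

  ExampleStoreTokenIdentifier(String origin, String sessionSecret, long expiryMillis) {
    this();
    this.origin = origin;
    this.sessionSecret = sessionSecret;
    this.expiryMillis = expiryMillis;
  }

  @Override
  void write(DataOutput out) throws IOException {
    super.write(out); // superclass fields first
    out.writeUTF(sessionSecret);
    out.writeLong(expiryMillis);
  }

  @Override
  void readFields(DataInput in) throws IOException {
    super.readFields(in); // must mirror the order used in write()
    sessionSecret = in.readUTF();
    expiryMillis = in.readLong();
  }

  String getKind() { return kind; }
  String getOrigin() { return origin; }
  String getSessionSecret() { return sessionSecret; }
  long getExpiryMillis() { return expiryMillis; }

  @Override
  public String toString() {
    // Deliberately omits the secret.
    return "ExampleStoreTokenIdentifier{kind=" + kind + ", origin=" + origin
        + ", expiryMillis=" + expiryMillis + "}";
  }
}

/** Marshal an identifier to bytes and read it back into an empty instance. */
final class IdentifierRoundTrip {
  private IdentifierRoundTrip() {
  }

  static ExampleStoreTokenIdentifier roundTrip(ExampleStoreTokenIdentifier source)
      throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    source.write(new DataOutputStream(bytes));
    ExampleStoreTokenIdentifier copy = new ExampleStoreTokenIdentifier();
    copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
    return copy;
  }
}
```

A round trip of this shape is also a convenient basis for a unit test which checks that every field survives marshalling.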
+### Implementing `AbstractS3ATokenIdentifier`
|
|
|
+
|
|
|
+Look at the other examples to see what to do; `SessionTokenIdentifier` does
|
|
|
+most of the work.
|
|
|
+
|
|
|
+Having a `toString()` method which is informative is ideal for the `hdfs creds`
|
|
|
+command as well as debugging: *but do not print secrets*.
|
|
|
+
|
|
|
+*Important*: Add no references to any AWS SDK class, to
|
|
|
+ensure it can be safely deserialized whenever the relevant token
|
|
|
+identifier is examined. Best practice is to avoid any references to
|
|
|
+classes which may not be on the classpath of core Hadoop services,
|
|
|
+especially the YARN Resource Manager and Node Managers.
|
|
|
+
|
|
|
+### `AWSCredentialProviderList deployUnbonded()`
|
|
|
+
|
|
|
+1. Perform all initialization needed on an "unbonded" deployment to authenticate with the store.
|
|
|
+1. Return a list of AWS Credential providers which can be used to authenticate the caller.
|
|
|
+
|
|
|
+**Tip**: consider *not* doing all the checks to verify that DTs can be issued.
|
|
|
+That can be postponed until a DT is issued: in any deployment where a DT is not actually
|
|
|
+needed, failing at this point is overkill. As an example, `RoleTokenBinding` cannot issue
|
|
|
+DTs if it only has a set of session credentials, but it will deploy without them, so allowing
|
|
|
+`hadoop fs` commands to work on an EC2 VM with IAM role credentials.
|
|
|
+
|
|
|
+**Tip**: The class `org.apache.hadoop.fs.s3a.auth.MarshalledCredentials` holds a set of
|
|
|
+marshalled credentials and so can be used within your own Token Identifier if you want
|
|
|
+to include a set of full/session AWS credentials in your token identifier.
|
|
|
+
|
|
|
+### `AWSCredentialProviderList bindToTokenIdentifier(AbstractS3ATokenIdentifier id)`
|
|
|
+
|
|
|
+The identifier passed in will be the one for the current filesystem URI and of your token kind.
|
|
|
+
|
|
|
+1. Use `convertTokenIdentifier` to cast it to your DT type, or fail with a meaningful `IOException`.
|
|
|
+1. Extract the secrets needed to authenticate with the object store (or whatever service issues
|
|
|
+object store credentials).
|
|
|
+1. Return a list of AWS Credential providers which can be used to authenticate the caller with
|
|
|
+the extracted secrets.
|
|
|
+
|
|
|
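The first step is a checked cast with a helpful error message. The sketch below shows the kind of check involved; `IdentifierCasts.castOrFail` is a hypothetical helper written for this example and is not the implementation or signature of `convertTokenIdentifier`.

```java
import java.io.IOException;

/** Hypothetical helper (not part of S3A): cast or raise a descriptive error. */
final class IdentifierCasts {
  private IdentifierCasts() {
  }

  /**
   * Return the identifier as the expected type, or throw an IOException
   * naming both the actual and the expected class.
   */
  static <T> T castOrFail(Object identifier, Class<T> expected) throws IOException {
    if (identifier == null) {
      throw new IOException(
          "No token identifier supplied; expected " + expected.getName());
    }
    if (!expected.isInstance(identifier)) {
      throw new IOException("Token identifier is of class "
          + identifier.getClass().getName()
          + " but this binding expects " + expected.getName());
    }
    return expected.cast(identifier);
  }
}
```

Naming both classes in the message makes a mismatch between the token binding configured on the client and the one on the far end much easier to diagnose.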
+### `AbstractS3ATokenIdentifier createEmptyIdentifier()`
|
|
|
+
|
|
|
+Return an empty instance of your token identifier.
|
|
|
+
|
|
|
+### `AbstractS3ATokenIdentifier createTokenIdentifier(Optional<RoleModel.Policy> policy, EncryptionSecrets secrets)`
|
|
|
+
|
|
|
+Create the delegation token.
|
|
|
+
|
|
|
+If non-empty, the `policy` argument contains an AWS policy model to grant access to:
|
|
|
+
|
|
|
+* The target S3 bucket.
|
|
|
+* Any S3Guard DDB table it is bonded to.
|
|
|
+* KMS: the `kms:GenerateDataKey` and `kms:Decrypt` permissions for all KMS keys.
|
|
|
+
|
|
|
+This can be converted to a string and passed to the AWS `assumeRole` operation.
|
|
|
+
|
|
|
+The `secrets` argument contains encryption policy and secrets:
|
|
|
+this should be passed to the superclass constructor as is; it is retrieved and used
|
|
|
+to set the encryption policy on the newly created filesystem.
|
|
|
+
|
|
|
+
|
|
|
+*Tip*: Use `AbstractS3ATokenIdentifier.createDefaultOriginMessage()` to create an initial
|
|
|
+message for the origin of the token —this is useful for diagnostics.
|
|
|
+
|
|
|
+
|
|
|
+#### Token Renewal
|
|
|
+
|
|
|
+There's no support in the design for token renewal; it would be very complex
|
|
|
+to make it pluggable and, as none of the bundled mechanisms support renewal,
|
|
|
+it would be untestable and unjustifiable.
|
|
|
+
|
|
|
+Any token binding which wants to add renewal support will have to implement
|
|
|
+it directly.
|
|
|
+
|
|
|
+### Testing
|
|
|
+
|
|
|
+Use the tests in `org.apache.hadoop.fs.s3a.auth.delegation` as examples. You'll have to
|
|
|
+copy and paste some of the test base classes over; `hadoop-common`'s test JAR is published
|
|
|
+to Maven Central, but not the S3A one (a fear of leaking AWS credentials).
|
|
|
+
|
|
|
+
|
|
|
+#### Unit Test `TestS3ADelegationTokenSupport`
|
|
|
+
|
|
|
+This tests marshalling and unmarshalling of token identifiers.
|
|
|
+*Test that every field is preserved.*
|
|
|
+
|
|
|
+
|
|
|
+#### Integration Test `ITestSessionDelegationTokens`
|
|
|
+
|
|
|
+Tests the lifecycle of session tokens.
|
|
|
+
|
|
|
+#### Integration Test `ITestSessionDelegationInFileystem`
|
|
|
+
|
|
|
+This collects DTs from one filesystem, and uses that to create a new FS instance and
|
|
|
+then perform filesystem operations. A miniKDC is instantiated.
|
|
|
+
|
|
|
+* Take care to remove all login secrets from the environment, so as to make sure that
|
|
|
+the second instance is picking up the DT information.
|
|
|
+* `UserGroupInformation.reset()` can be used to reset user secrets after every test
|
|
|
+case (e.g. teardown), so that issued DTs from one test case do not contaminate the next.
|
|
|
+* Its subclass, `ITestRoleDelegationInFileystem`, adds a check that the current credentials
|
|
|
+in the DT cannot be used to access data on other buckets —that is, the active
|
|
|
+session really is restricted to the target bucket.
|
|
|
+
|
|
|
+
|
|
|
+#### Integration Test `ITestDelegatedMRJob`
|
|
|
+
|
|
|
+It's not easy to bring up a YARN cluster with a secure HDFS and miniKDC controller in
|
|
|
+test cases. This test, the closest there is to an end-to-end test,
|
|
|
+mocks the RPC calls to the YARN AM, and then verifies that the tokens
|
|
|
+have been collected in the job context.
|
|
|
+
|
|
|
+#### Load Test `ILoadTestSessionCredentials`
|
|
|
+
|
|
|
+This attempts to collect many, many delegation tokens simultaneously and sees
|
|
|
+what happens.
|
|
|
+
|
|
|
+This is worth doing if you have a new authentication service provider, or
|
|
|
+are implementing custom DT support.
|
|
|
+Consider also a load test of the conversion from DT to
|
|
|
+AWS credentials if this is also implemented by your own service.
|
|
|
+This is left as an exercise for the developer.
|
|
|
+
|
|
|
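For a custom service, the general shape of such a load test might look like the sketch below. It is not the code of `ILoadTestSessionCredentials`: `TokenLoadSketch` and its `Outcome` type are hypothetical, and the `issuer` argument stands for whatever call requests a token from your service.

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical load driver (not part of S3A). */
final class TokenLoadSketch {
  private TokenLoadSketch() {
  }

  /** Counts of requests which returned a token and requests which failed. */
  static final class Outcome {
    final int succeeded;
    final int failed;

    Outcome(int succeeded, int failed) {
      this.succeeded = succeeded;
      this.failed = failed;
    }
  }

  /**
   * Submit the same token request many times across a fixed pool of threads,
   * wait for all of them to finish, and count successes and failures.
   */
  static Outcome run(Callable<String> issuer, int requests, int threads)
      throws InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Callable<String>> tasks = Collections.nCopies(requests, issuer);
      // invokeAll() blocks until every task has completed or failed.
      List<Future<String>> futures = pool.invokeAll(tasks);
      int succeeded = 0;
      int failed = 0;
      for (Future<String> future : futures) {
        try {
          future.get();
          succeeded++;
        } catch (ExecutionException e) {
          failed++; // for example, a throttling response from the service
        }
      }
      return new Outcome(succeeded, failed);
    } finally {
      pool.shutdownNow();
    }
  }
}
```

Keeping the request count and thread count as parameters makes it easy to start small and scale up only as far as the target service can tolerate.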
+**Tip**: don't go overboard here, especially against AWS itself.
|