HADOOP-14305 S3A SSE tests won't run in parallel: Bad request in directory GetFileStatus.
Contributed by Steve Moist.

Steve Loughran, 8 years ago
parent
commit
d1da5ba3af

+ 2 - 2
hadoop-tools/hadoop-aws/pom.xml

@@ -185,7 +185,7 @@
                     <exclude>**/ITestS3NContractRootDir.java</exclude>
                     <exclude>**/ITestS3ContractRootDir.java</exclude>
                     <exclude>**/ITestS3AFileContextStatistics.java</exclude>
-                    <exclude>**/ITestS3AEncryptionSSE*.java</exclude>
+                    <exclude>**/ITestS3AEncryptionSSEC*.java</exclude>
                     <exclude>**/ITestS3AHuge*.java</exclude>
                   </excludes>
                 </configuration>
@@ -216,7 +216,7 @@
                     <include>**/ITestS3ContractRootDir.java</include>
                     <include>**/ITestS3AFileContextStatistics.java</include>
                     <include>**/ITestS3AHuge*.java</include>
-                    <include>**/ITestS3AEncryptionSSE*.java</include>
+                    <include>**/ITestS3AEncryptionSSEC*.java</include>
                   </includes>
                 </configuration>
               </execution>
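
The effect of narrowing the pattern from `ITestS3AEncryptionSSE*` to `ITestS3AEncryptionSSEC*` can be checked with a small sketch. It assumes that Java's NIO glob syntax behaves like Maven's Ant-style patterns for these simple cases, which is an approximation. The file names other than `ITestS3AEncryptionSSEC.java` are illustrative, and `SsePatternCheck` is a hypothetical helper, not part of the build.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

// Hypothetical helper: approximates the pom.xml include/exclude patterns
// with java.nio globs (an assumption; Maven uses Ant-style matching).
class SsePatternCheck {

  /** Returns true if the relative file path matches the glob pattern. */
  static boolean matches(String glob, String file) {
    PathMatcher matcher =
        FileSystems.getDefault().getPathMatcher("glob:" + glob);
    return matcher.matches(Paths.get(file));
  }

  public static void main(String[] args) {
    String oldPattern = "**/ITestS3AEncryptionSSE*.java";
    String newPattern = "**/ITestS3AEncryptionSSEC*.java";
    String dir = "org/apache/hadoop/fs/s3a/";
    // Illustrative test class file names.
    String[] files = {
        dir + "ITestS3AEncryptionSSEC.java",
        dir + "ITestS3AEncryptionSSES3.java",
    };
    for (String file : files) {
      System.out.println(file
          + " old=" + matches(oldPattern, file)
          + " new=" + matches(newPattern, file));
    }
  }
}
```

Under that assumption, only the SSE-C test classes remain pinned by the pattern; the other SSE test classes are no longer matched by either list.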

+ 81 - 0
hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md

@@ -1513,6 +1513,52 @@ basis.
 to set fadvise policies on input streams. Once implemented,
 this will become the supported mechanism used for configuring the input IO policy.
 
+
+### <a name="s3a_encryption"></a> Encrypting objects with S3A
+
+Currently, S3A only supports S3's Server Side Encryption for at-rest data encryption.
+It is *encouraged* to read up on the [AWS documentation](https://docs.aws.amazon.com/AmazonS3/latest/dev/serv-side-encryption.html)
+for S3 Server Side Encryption before using these options, as each behaves differently
+and the AWS documentation will be more up to date on that behavior.  When an
+encryption method is configured in `core-site.xml`, it applies cluster wide.  Any
+new files written will be encrypted with this encryption configuration.  Any
+existing files, when read, will be decrypted using their existing method (if possible)
+and will not be re-encrypted with the new method. When multiple keys are mixed, it is
+also possible that the user does not have access to decrypt an object. It is
+**NOT** advised to mix and match encryption types in a bucket, and it is *strongly*
+recommended to use just one type and key per bucket.
+
+SSE-S3 is where S3 will manage the encryption keys for each object. The parameter
+for `fs.s3a.server-side-encryption-algorithm` is `AES256`.
+
+SSE-KMS is where the user specifies a Customer Master Key (CMK) that is used to
+encrypt the objects. The user may specify a specific CMK or leave the
+`fs.s3a.server-side-encryption-key` empty to use the default auto-generated key
+in AWS IAM.  Each CMK configured in AWS IAM is region specific, and cannot be
+used in an S3 bucket in a different region.  There can also be policies
+assigned to the CMK that prohibit or restrict its use for some users, causing S3A
+requests to fail.
+
+SSE-C is where the user specifies an actual base64 encoded AES-256 key supplied
+and managed by the user.
+
+#### SSE-C Warning
+
+It is strongly recommended to fully understand how SSE-C works in the S3
+environment before using this encryption type.  Please refer to the Server Side
+Encryption documentation available from AWS.  SSE-C is only recommended for
+advanced users with advanced encryption use cases.  Failure to properly manage
+encryption keys can cause data loss.  Currently, the AWS S3 API (and thus S3A)
+only supports one encryption key and cannot support decrypting objects during
+moves under a previous key to a new destination.  It is **NOT** advised to use
+multiple encryption keys in a bucket; it is recommended to use one key per
+bucket and to not change this key.  This is because, when a request is made to S3,
+the actual encryption key must be provided to decrypt the object and access the
+metadata.  Since only one encryption key can be provided at a time, S3A cannot
+pass the correct key for data written under a different key. Please see the
+troubleshooting section for more information.
+
+
 ## Troubleshooting S3A
 
 Common problems working with S3A are
@@ -1976,6 +2022,41 @@ if it is required that the data is persisted durably after every
 `flush()/hflush()` call. This includes resilient logging, HBase-style journalling
 and the like. The standard strategy here is to save to HDFS and then copy to S3.
 
+
+### S3 Server Side Encryption
+
+#### Using SSE-KMS
+
+When performing file operations, the user may run into an issue where the KMS
+key ARN is reported as invalid:
+```
+com.amazonaws.services.s3.model.AmazonS3Exception:
+Invalid arn (Service: Amazon S3; Status Code: 400; Error Code: KMS.NotFoundException; Request ID: 708284CF60EE233F),
+S3 Extended Request ID: iHUUtXUSiNz4kv3Bdk/hf9F+wjPt8GIVvBHx/HEfCBYkn7W6zmpvbA3XT7Y5nTzcZtfuhcqDunw=:
+Invalid arn (Service: Amazon S3; Status Code: 400; Error Code: KMS.NotFoundException; Request ID: 708284CF60EE233F)
+```
+
+This is because either the KMS key id was entered incorrectly, or the KMS key
+is in a different region than the S3 bucket being used.
+
+#### Using SSE-C
+When performing file operations, the user may run into an unexpected 400/403
+error such as:
+```
+org.apache.hadoop.fs.s3a.AWSS3IOException: getFileStatus on fork-4/: com.amazonaws.services.s3.model.AmazonS3Exception:
+Bad Request (Service: Amazon S3; Status Code: 400;
+Error Code: 400 Bad Request; Request ID: 42F9A1987CB49A99),
+S3 Extended Request ID: jU2kcwaXnWj5APB14Cgb1IKkc449gu2+dhIsW/+7x9J4D+VUkKvu78mBo03oh9jnOT2eoTLdECU=:
+Bad Request (Service: Amazon S3; Status Code: 400; Error Code: 400 Bad Request; Request ID: 42F9A1987CB49A99)
+```
+
+This can happen when the correct SSE-C encryption key is not specified.
+Such cases include the following:
+1. An object is encrypted using SSE-C on S3 and either the wrong encryption type
+is used, no encryption is specified, or the SSE-C key specified is incorrect.
+2. A directory is encrypted with SSE-C keyA and the user is trying to move a
+file using configured SSE-C keyB into that structure.
+
 ### Other issues
 
 *Performance slow*
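
The documentation added above describes the SSE-C key as a base64-encoded AES-256 key, which implies 32 bytes once decoded. A minimal sketch of a sanity check that could be run before placing a key in `fs.s3a.server-side-encryption-key` follows. The class and method names (`SseCKeyCheck`, `isValidAes256Key`, `newRandomKey`) are hypothetical and not part of S3A; only JDK classes are used.

```java
import java.security.SecureRandom;
import java.util.Base64;

// Hypothetical helper, not part of S3A: checks the shape of an SSE-C key.
class SseCKeyCheck {

  /** AES-256 keys are 256 bits, i.e. 32 bytes, before base64 encoding. */
  static final int AES_256_KEY_BYTES = 32;

  /** Returns true if the string is valid base64 that decodes to 32 bytes. */
  static boolean isValidAes256Key(String encoded) {
    if (encoded == null) {
      return false;
    }
    try {
      return Base64.getDecoder().decode(encoded).length == AES_256_KEY_BYTES;
    } catch (IllegalArgumentException e) {
      // Thrown by the decoder when the input is not valid base64.
      return false;
    }
  }

  /** Generates a random 32-byte key and returns it base64 encoded. */
  static String newRandomKey() {
    byte[] raw = new byte[AES_256_KEY_BYTES];
    new SecureRandom().nextBytes(raw);
    return Base64.getEncoder().encodeToString(raw);
  }

  public static void main(String[] args) {
    // One of the test keys used in ITestS3AEncryptionSSEC in this commit.
    String testKey = "kX7SdwVc/1VXJr76kfKnkQ3ONYhxianyL2+C3rPVT9s=";
    System.out.println("test key has valid shape: " + isValidAes256Key(testKey));
    System.out.println("generated key has valid shape: "
        + isValidAes256Key(newRandomKey()));
  }
}
```

A check like this only validates the key's shape. It cannot tell whether the key is the one an existing object was encrypted with, which is the failure described in the troubleshooting text.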

+ 4 - 0
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/AbstractTestS3AEncryption.java

@@ -85,6 +85,10 @@ public abstract class AbstractTestS3AEncryption extends AbstractS3ATestBase {
     return String.format("%s-%04x", methodName.getMethodName(), len);
   }
 
+  protected String createFilename(String name) {
+    return String.format("%s-%s", methodName.getMethodName(), name);
+  }
+
   /**
    * Assert that at path references an encrypted blob.
    * @param path path
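
The two `createFilename` variants differ only in their format string. The standalone sketch below reproduces both so the resulting names can be inspected. `TestFileNames` and its method names are hypothetical, and the method name is passed in as a plain string because the JUnit `methodName` rule is not available outside a test.

```java
// Hypothetical stand-in for the two createFilename overloads in
// AbstractTestS3AEncryption; the JUnit TestName rule is replaced by a
// plain String parameter.
class TestFileNames {

  /** Existing variant: method name plus the length as 4-digit hex. */
  static String forLength(String method, int len) {
    return String.format("%s-%04x", method, len);
  }

  /** Variant added by this commit: method name plus a caller-chosen name. */
  static String forName(String method, String name) {
    return String.format("%s-%s", method, name);
  }

  public static void main(String[] args) {
    // 2048 is 0x800, so this prints testRenameFile-0800
    System.out.println(forLength("testRenameFile", 2048));
    // prints testRenameFile-original-path.txt
    System.out.println(forName("testRenameFile", "original-path.txt"));
    // prints testListEncryptedDir-/a/b/c/
    System.out.println(forName("testListEncryptedDir", "/a/b/c/"));
  }
}
```

Because every name starts with the test method name, each test method works under its own prefix. That likely helps keep test cases from touching each other's paths, though the diff itself does not state this as the motivation.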

+ 351 - 36
hadoop-tools/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3AEncryptionSSEC.java

@@ -18,19 +18,23 @@
 
 package org.apache.hadoop.fs.s3a;
 
-import static org.apache.hadoop.fs.contract.ContractTestUtils.dataset;
-import static org.apache.hadoop.fs.contract.ContractTestUtils.rm;
-import static org.apache.hadoop.fs.s3a.S3ATestUtils.skipIfEncryptionTestsDisabled;
+import java.io.IOException;
+import java.util.concurrent.Callable;
+
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.ExpectedException;
 
 import org.apache.hadoop.conf.Configuration;
 import org.apache.hadoop.fs.FileSystem;
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.fs.contract.ContractTestUtils;
 import org.apache.hadoop.fs.contract.s3a.S3AContract;
-import org.hamcrest.core.StringContains;
-import org.junit.Rule;
-import org.junit.Test;
-import org.junit.rules.ExpectedException;
+
+import static org.apache.hadoop.fs.contract.ContractTestUtils.dataset;
+import static org.apache.hadoop.fs.contract.ContractTestUtils.rm;
+import static org.apache.hadoop.fs.s3a.S3ATestUtils.skipIfEncryptionTestsDisabled;
+import static org.apache.hadoop.test.LambdaTestUtils.intercept;
 
 /**
  * Concrete class that extends {@link AbstractTestS3AEncryption}
@@ -58,39 +62,350 @@ public class ITestS3AEncryptionSSEC extends AbstractTestS3AEncryption {
    * This will create and write to a file using encryption key A, then attempt
    * to read from it again with encryption key B.  This will not work as it
    * cannot decrypt the file.
+   *
+   * This is expected AWS S3 SSE-C behavior.
+   *
    * @throws Exception
    */
   @Test
   public void testCreateFileAndReadWithDifferentEncryptionKey() throws
-    Exception {
-    expectedException.expect(java.nio.file.AccessDeniedException.class);
-    expectedException.expectMessage(StringContains
-        .containsString("Service: Amazon S3; Status Code: 403;"));
-
-    Path path = null;
-    try {
-      int len = 2048;
-      skipIfEncryptionTestsDisabled(getConfiguration());
-      describe("Create an encrypted file of size " + len);
-      String src = createFilename(len);
-      path = writeThenReadFile(src, len);
-
-      Configuration conf = this.createConfiguration();
-      conf.set(Constants.SERVER_SIDE_ENCRYPTION_KEY,
-          "kX7SdwVc/1VXJr76kfKnkQ3ONYhxianyL2+C3rPVT9s=");
-
-      S3AContract contract = (S3AContract) createContract(conf);
-      contract.init();
-      //skip tests if they aren't enabled
-      assumeEnabled();
-      //extract the test FS
-      FileSystem fileSystem = contract.getTestFileSystem();
-      byte[] data = dataset(len, 'a', 'z');
-      ContractTestUtils.verifyFileContents(fileSystem, path, data);
-    } catch(Exception e) {
-      rm(getFileSystem(), path, false, false);
-      throw e;
-    }
+      Exception {
+    assumeEnabled();
+    skipIfEncryptionTestsDisabled(getConfiguration());
+
+    final Path[] path = new Path[1];
+    intercept(java.nio.file.AccessDeniedException.class,
+        "Service: Amazon S3; Status Code: 403;",
+        new Callable<Void>() {
+          @Override
+          public Void call() throws Exception {
+            int len = 2048;
+            describe("Create an encrypted file of size " + len);
+            String src = createFilename(len);
+            path[0] = writeThenReadFile(src, len);
+
+            //extract the test FS
+            FileSystem fileSystem = createNewFileSystemWithSSECKey(
+                "kX7SdwVc/1VXJr76kfKnkQ3ONYhxianyL2+C3rPVT9s=");
+            byte[] data = dataset(len, 'a', 'z');
+            ContractTestUtils.verifyFileContents(fileSystem, path[0], data);
+            throw new Exception("Fail");
+          }
+        });
+  }
+
+  /**
+   * While each object has its own key and should be distinct, this verifies
+   * that hadoop treats object keys as a filesystem path.  So if a top level
+   * dir is encrypted with keyA, a sublevel dir cannot be accessed with a
+   * different keyB.
+   *
+   * This is expected AWS S3 SSE-C behavior.
+   *
+   * @throws Exception
+   */
+  @Test
+  public void testCreateSubdirWithDifferentKey() throws Exception {
+    assumeEnabled();
+    skipIfEncryptionTestsDisabled(getConfiguration());
+
+    final Path[] path = new Path[1];
+    intercept(java.nio.file.AccessDeniedException.class,
+        "Service: Amazon S3; Status Code: 403;",
+        new Callable<Void>() {
+          @Override
+          public Void call() throws Exception {
+
+            path[0] = S3ATestUtils.createTestPath(
+                new Path(createFilename("dir/"))
+            );
+            Path nestedDirectory = S3ATestUtils.createTestPath(
+                new Path(createFilename("dir/nestedDir/"))
+            );
+            FileSystem fsKeyB = createNewFileSystemWithSSECKey(
+                "G61nz31Q7+zpjJWbakxfTOZW4VS0UmQWAq2YXhcTXoo=");
+            getFileSystem().mkdirs(path[0]);
+            fsKeyB.mkdirs(nestedDirectory);
+
+            throw new Exception("Exception should be thrown.");
+          }
+        });
+    rm(getFileSystem(), path[0], true, false);
+  }
+
+  /**
+   * Ensures a file can't be created with keyA and then renamed with a different
+   * key.
+   *
+   * This is expected AWS S3 SSE-C behavior.
+   *
+   * @throws Exception
+   */
+  @Test
+  public void testCreateFileThenMoveWithDifferentSSECKey() throws Exception {
+    assumeEnabled();
+    skipIfEncryptionTestsDisabled(getConfiguration());
+
+    final Path[] path = new Path[1];
+    intercept(java.nio.file.AccessDeniedException.class,
+        "Service: Amazon S3; Status Code: 403;",
+        new Callable<Void>() {
+          @Override
+          public Void call() throws Exception {
+
+            int len = 2048;
+            String src = createFilename(len);
+            path[0] = writeThenReadFile(src, len);
+
+            FileSystem fsKeyB = createNewFileSystemWithSSECKey(
+                "NTx0dUPrxoo9+LbNiT/gqf3z9jILqL6ilismFmJO50U=");
+            fsKeyB.rename(path[0],
+                new Path(createFilename("different-path.txt")));
+
+            throw new Exception("Exception should be thrown.");
+          }
+        });
+  }
+
+  /**
+   * General test to make sure move works with SSE-C with the same key, unlike
+   * with multiple keys.
+   *
+   * @throws Exception
+   */
+  @Test
+  public void testRenameFile() throws Exception {
+    assumeEnabled();
+    skipIfEncryptionTestsDisabled(getConfiguration());
+
+    String src = createFilename("original-path.txt");
+    Path path = writeThenReadFile(src, 2048);
+    Path newPath = path(createFilename("different-path.txt"));
+    getFileSystem().rename(path, newPath);
+    byte[] data = dataset(2048, 'a', 'z');
+    ContractTestUtils.verifyFileContents(getFileSystem(), newPath, data);
+  }
+
+  /**
+   * It is possible to list the contents of a directory up to the actual
+   * end of the nested directories.  This is due to how S3A mocks the
+   * directories and how prefixes work in S3.
+   * @throws Exception
+   */
+  @Test
+  public void testListEncryptedDir() throws Exception {
+    assumeEnabled();
+    skipIfEncryptionTestsDisabled(getConfiguration());
+
+    Path nestedDirectory = S3ATestUtils.createTestPath(
+         path(createFilename("/a/b/c/"))
+    );
+    assertTrue(getFileSystem().mkdirs(nestedDirectory));
+
+    final FileSystem fsKeyB = createNewFileSystemWithSSECKey(
+        "msdo3VvvZznp66Gth58a91Hxe/UpExMkwU9BHkIjfW8=");
+
+    fsKeyB.listFiles(S3ATestUtils.createTestPath(
+        path(createFilename("/a/"))
+    ), true);
+    fsKeyB.listFiles(S3ATestUtils.createTestPath(
+        path(createFilename("/a/b/"))
+    ), true);
+
+    //Until this point, no exception is thrown about access
+    intercept(java.nio.file.AccessDeniedException.class,
+        "Service: Amazon S3; Status Code: 403;",
+        new Callable<Void>() {
+          @Override
+          public Void call() throws Exception {
+            fsKeyB.listFiles(S3ATestUtils.createTestPath(
+                path(createFilename("/a/b/c/"))
+            ), false);
+            throw new Exception("Exception should be thrown.");
+          }
+        });
+
+    Configuration conf = this.createConfiguration();
+    conf.unset(Constants.SERVER_SIDE_ENCRYPTION_ALGORITHM);
+    conf.unset(Constants.SERVER_SIDE_ENCRYPTION_KEY);
+
+    S3AContract contract = (S3AContract) createContract(conf);
+    contract.init();
+    final FileSystem unencryptedFileSystem = contract.getTestFileSystem();
+
+    //unencrypted can access until the final directory
+    unencryptedFileSystem.listFiles(S3ATestUtils.createTestPath(
+        path(createFilename("/a/"))
+    ), true);
+    unencryptedFileSystem.listFiles(S3ATestUtils.createTestPath(
+        path(createFilename("/a/b/"))
+    ), true);
+    intercept(org.apache.hadoop.fs.s3a.AWSS3IOException.class,
+        "Bad Request (Service: Amazon S3; Status Code: 400; Error" +
+            " Code: 400 Bad Request;",
+        new Callable<Void>() {
+          @Override
+          public Void call() throws Exception {
+            unencryptedFileSystem.listFiles(S3ATestUtils.createTestPath(
+                path(createFilename("/a/b/c/"))
+            ), false);
+            throw new Exception("Exception should be thrown.");
+          }
+        });
+    rm(getFileSystem(), path(createFilename("/")), true, false);
+  }
+
+  /**
+   * Much like the above list encrypted directory test, you cannot get the
+   * metadata of an object without the correct encryption key.
+   * @throws Exception
+   */
+  @Test
+  public void testListStatusEncryptedDir() throws Exception {
+    assumeEnabled();
+    skipIfEncryptionTestsDisabled(getConfiguration());
+
+    Path nestedDirectory = S3ATestUtils.createTestPath(
+         path(createFilename("/a/b/c/"))
+    );
+    assertTrue(getFileSystem().mkdirs(nestedDirectory));
+
+    final FileSystem fsKeyB = createNewFileSystemWithSSECKey(
+        "msdo3VvvZznp66Gth58a91Hxe/UpExMkwU9BHkIjfW8=");
+
+    fsKeyB.listStatus(S3ATestUtils.createTestPath(
+        path(createFilename("/a/"))));
+    fsKeyB.listStatus(S3ATestUtils.createTestPath(
+        path(createFilename("/a/b/"))));
+
+    //Until this point, no exception is thrown about access
+    intercept(java.nio.file.AccessDeniedException.class,
+        "Service: Amazon S3; Status Code: 403;",
+        new Callable<Void>() {
+          @Override
+          public Void call() throws Exception {
+            fsKeyB.listStatus(S3ATestUtils.createTestPath(
+                path(createFilename("/a/b/c/"))));
+
+            throw new Exception("Exception should be thrown.");
+          }
+        });
+
+    //Now try it with an unencrypted filesystem.
+    Configuration conf = this.createConfiguration();
+    conf.unset(Constants.SERVER_SIDE_ENCRYPTION_ALGORITHM);
+    conf.unset(Constants.SERVER_SIDE_ENCRYPTION_KEY);
+
+    S3AContract contract = (S3AContract) createContract(conf);
+    contract.init();
+    final FileSystem unencryptedFileSystem = contract.getTestFileSystem();
+
+    //unencrypted can access until the final directory
+    unencryptedFileSystem.listStatus(S3ATestUtils.createTestPath(
+        path(createFilename("/a/"))));
+    unencryptedFileSystem.listStatus(S3ATestUtils.createTestPath(
+        path(createFilename("/a/b/"))));
+    intercept(org.apache.hadoop.fs.s3a.AWSS3IOException.class,
+        "Bad Request (Service: Amazon S3; Status Code: 400; Error Code: 400" +
+            " Bad Request;", new Callable<Void>() {
+          @Override
+          public Void call() throws Exception {
+
+            unencryptedFileSystem.listStatus(S3ATestUtils.createTestPath(
+                path(createFilename("/a/b/c/"))));
+            throw new Exception("Exception should be thrown.");
+          }
+        });
+    rm(getFileSystem(), path(createFilename("/")), true, false);
+  }
+
+  /**
+   * Much like trying to access an encrypted directory, an encrypted file cannot
+   * have its metadata read, since both are technically an object.
+   * @throws Exception
+   */
+  @Test
+  public void testListStatusEncryptedFile() throws Exception {
+    assumeEnabled();
+    skipIfEncryptionTestsDisabled(getConfiguration());
+
+    Path nestedDirectory = S3ATestUtils.createTestPath(
+        path(createFilename("/a/b/c/"))
+    );
+    assertTrue(getFileSystem().mkdirs(nestedDirectory));
+
+    String src = createFilename("/a/b/c/fileToStat.txt");
+    final Path fileToStat =  writeThenReadFile(src, 2048);
+
+    final FileSystem fsKeyB = createNewFileSystemWithSSECKey(
+        "msdo3VvvZznp66Gth58a91Hxe/UpExMkwU9BHkIjfW8=");
+
+    //Until this point, no exception is thrown about access
+    intercept(java.nio.file.AccessDeniedException.class,
+        "Service: Amazon S3; Status Code: 403;",
+        new Callable<Void>() {
+          @Override
+          public Void call() throws Exception {
+            fsKeyB.listStatus(S3ATestUtils.createTestPath(fileToStat));
+
+            throw new Exception("Exception should be thrown.");
+          }});
+    rm(getFileSystem(), path(createFilename("/")), true, false);
+  }
+
+
+
+
+  /**
+   * It is possible to delete directories without the proper encryption key and
+   * the hierarchy above it.
+   *
+   * @throws Exception
+   */
+  @Test
+  public void testDeleteEncryptedObjectWithDifferentKey() throws Exception {
+    assumeEnabled();
+    skipIfEncryptionTestsDisabled(getConfiguration());
+
+    Path nestedDirectory = S3ATestUtils.createTestPath(
+        path(createFilename("/a/b/c/"))
+    );
+    assertTrue(getFileSystem().mkdirs(nestedDirectory));
+    String src = createFilename("/a/b/c/filetobedeleted.txt");
+    final Path fileToDelete =  writeThenReadFile(src, 2048);
+
+    final FileSystem fsKeyB = createNewFileSystemWithSSECKey(
+        "msdo3VvvZznp66Gth58a91Hxe/UpExMkwU9BHkIjfW8=");
+    intercept(java.nio.file.AccessDeniedException.class,
+        "Forbidden (Service: Amazon S3; Status Code: 403; Error Code: " +
+        "403 Forbidden",
+          new Callable<Void>() {
+            @Override
+            public Void call() throws Exception {
+
+              fsKeyB.delete(fileToDelete, false);
+              throw new Exception("Exception should be thrown.");
+            }
+          });
+
+    //This is possible
+    fsKeyB.delete(S3ATestUtils.createTestPath(
+        path(createFilename("/a/b/c/"))), true);
+    fsKeyB.delete(S3ATestUtils.createTestPath(
+        path(createFilename("/a/b/"))), true);
+    fsKeyB.delete(S3ATestUtils.createTestPath(
+        path(createFilename("/a/"))), true);
+  }
+
+  private FileSystem createNewFileSystemWithSSECKey(String sseCKey) throws
+      IOException {
+    Configuration conf = this.createConfiguration();
+    conf.set(Constants.SERVER_SIDE_ENCRYPTION_KEY, sseCKey);
+
+    S3AContract contract = (S3AContract) createContract(conf);
+    contract.init();
+    FileSystem fileSystem = contract.getTestFileSystem();
+    return fileSystem;
   }
 
   @Override