
HADOOP-16775. DistCp reuses the same temp file within the task for different files.

Contributed by Amir Shenavandeh.

This avoids overwrite consistency issues with S3 and other stores, though
given S3's copy operation is O(data), you are still best off using -direct
when distcp-ing to it.

Change-Id: I8dc9f048ad0cc57ff01543b849da1ce4eaadf8c3
Steve Loughran 5 years ago
parent
commit
5410732cff

+2 -1
hadoop-tools/hadoop-distcp/src/main/java/org/apache/hadoop/tools/mapred/RetriableFileCopyCommand.java

@@ -229,7 +229,8 @@ public class RetriableFileCopyCommand extends RetriableCommand {
     Path root = target.equals(targetWorkPath) ? targetWorkPath.getParent()
         : targetWorkPath;
     Path tempFile = new Path(root, ".distcp.tmp." +
-        context.getTaskAttemptID().toString());
+        context.getTaskAttemptID().toString() +
+        "." + String.valueOf(System.currentTimeMillis()));
     LOG.info("Creating temp file: {}", tempFile);
     return tempFile;
   }
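
For reviewers, the effect of the change on the temp file name can be sketched outside Hadoop. This is a minimal illustration, not the DistCp code itself: the class `TempFileNameSketch`, the helper `tempName`, and the sample task attempt ID are hypothetical, and the timestamp is passed in as a parameter so the behaviour is deterministic. It assumes, as the commit title suggests, that one task attempt copies several files in sequence.

```java
// Hypothetical, self-contained sketch of the naming scheme in the patch above.
// It does not use Hadoop classes; the task attempt ID is passed as a plain string.
final class TempFileNameSketch {

    // Old scheme: the name depends only on the task attempt ID, so every file
    // copied by the same task attempt shares one temp file name.
    static String oldTempName(String taskAttemptId) {
        return ".distcp.tmp." + taskAttemptId;
    }

    // New scheme: a millisecond timestamp suffix is appended, so successive
    // copies within one task attempt normally get distinct names.
    // (Two calls within the same millisecond would still produce the same name.)
    static String tempName(String taskAttemptId, long timestampMillis) {
        return ".distcp.tmp." + taskAttemptId + "." + timestampMillis;
    }

    public static void main(String[] args) {
        // Illustrative ID only; real IDs come from context.getTaskAttemptID().
        String attempt = "attempt_200707121733_0003_m_000005_0";

        // Before the patch: both files map to the same temp name.
        System.out.println(oldTempName(attempt));
        System.out.println(oldTempName(attempt));

        // After the patch: the suffix distinguishes the two copies.
        System.out.println(tempName(attempt, 1000L));
        System.out.println(tempName(attempt, 2000L));
    }
}
```

In the patch itself the suffix comes from `System.currentTimeMillis()` at the moment the temp path is built, so uniqueness relies on copies within a task attempt not starting in the same millisecond.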