
MAPREDUCE-6357. MultipleOutputs.write() API should document that output committing is not utilized when input path is absolute. Contributed by Dustin Cote.

(cherry picked from commit 2ba90c93d71aa2d30ee9ed431750c10c685e5599)
Akira Ajisaka 10 years ago
parent
commit
29819e7240

+ 4 - 0
hadoop-mapreduce-project/CHANGES.txt

@@ -286,6 +286,10 @@ Release 2.8.0 - UNRELEASED
     MAPREDUCE-5817. Mappers get rescheduled on node transition even after all
     reducers are completed. (Sangjin Lee via kasha)
 
+    MAPREDUCE-6357. MultipleOutputs.write() API should document that output
+    committing is not utilized when input path is absolute.
+    (Dustin Cote via aajisaka)
+
 Release 2.7.2 - UNRELEASED
 
   INCOMPATIBLE CHANGES

+ 13 - 1
hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/output/MultipleOutputs.java

@@ -120,7 +120,11 @@ import java.util.*;
  * 
  * <p>
  * Use <code>MultipleOutputs.write(KEYOUT key, VALUEOUT value, String baseOutputPath)</code> to write key and 
- * value to a path specified by <code>baseOutputPath</code>, with no need to specify a named output:
+ * value to a path specified by <code>baseOutputPath</code>, with no need to specify a named output.
+ * <b>Warning</b>: when the baseOutputPath passed to MultipleOutputs.write
+ * is a path that resolves outside of the final job output directory, the
+ * directory is created immediately and then persists through subsequent
+ * task retries, breaking the concept of output committing:
  * </p>
  * 
  * <pre>
@@ -418,6 +422,10 @@ public class MultipleOutputs<KEYOUT, VALUEOUT> {
    * @param value          the value
    * @param baseOutputPath base-output path to write the record to.
    * Note: Framework will generate unique filename for the baseOutputPath
+   * <b>Warning</b>: when the baseOutputPath is a path that resolves
+   * outside of the final job output directory, the directory is created
+   * immediately and then persists through subsequent task retries, breaking
+   * the concept of output committing.
    */
   @SuppressWarnings("unchecked")
   public <K, V> void write(String namedOutput, K key, V value,
@@ -442,6 +450,10 @@ public class MultipleOutputs<KEYOUT, VALUEOUT> {
    * @param value     the value
    * @param baseOutputPath base-output path to write the record to.
    * Note: Framework will generate unique filename for the baseOutputPath
+   * <b>Warning</b>: when the baseOutputPath is a path that resolves
+   * outside of the final job output directory, the directory is created
+   * immediately and then persists through subsequent task retries, breaking
+   * the concept of output committing.
    */
   @SuppressWarnings("unchecked")
   public void write(KEYOUT key, VALUEOUT value, String baseOutputPath)
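The warning depends on where `baseOutputPath` resolves. As far as we can tell from Hadoop 2.x, `FileOutputFormat` places a relative name under the output committer's task work path. An absolute path instead replaces that parent entirely, so the file never goes through the commit step. The sketch below is a hypothetical pre-flight check. The `BaseOutputPathCheck` class and `staysInsideOutputDir` method are not part of Hadoop. It approximates that resolution with `java.nio.file.Path` on POSIX-style paths and flags any `baseOutputPath` that would escape the job output directory.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Hypothetical helper (not part of Hadoop): approximates how a
 * baseOutputPath is resolved against the job output directory, using
 * java.nio.file.Path instead of org.apache.hadoop.fs.Path.
 */
class BaseOutputPathCheck {

    /**
     * Returns true if baseOutputPath, resolved against jobOutputDir,
     * stays inside jobOutputDir. Absolute paths, and relative paths whose
     * ".." segments climb out of the directory, return false.
     */
    static boolean staysInsideOutputDir(String jobOutputDir, String baseOutputPath) {
        Path root = Paths.get(jobOutputDir).normalize();
        // resolve() returns the argument unchanged when it is absolute,
        // mirroring how an absolute child replaces the parent path.
        Path resolved = root.resolve(baseOutputPath).normalize();
        // startsWith compares whole name elements, not string prefixes.
        return resolved.startsWith(root);
    }

    public static void main(String[] args) {
        String out = "/user/alice/job-output";
        // Relative path: lands under the output directory.
        System.out.println(staysInsideOutputDir(out, "2015/05/part"));
        // Absolute path: escapes the output directory.
        System.out.println(staysInsideOutputDir(out, "/tmp/side-output/part"));
        // ".." climbs out of the output directory.
        System.out.println(staysInsideOutputDir(out, "../elsewhere/part"));
    }
}
```

Running `main` prints `true`, `false`, `false`. The check compares path components with `Path.startsWith` rather than string prefixes, so a sibling directory such as `job-output-2` is not mistaken for a subdirectory of `job-output`.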