
HADOOP-4668. Improve documentation for setCombinerClass to clarify the
restrictions on combiners. (omalley)


git-svn-id: https://svn.apache.org/repos/asf/hadoop/core/trunk@719431 13f79535-47bb-0310-9956-ffa450edef68

Owen O'Malley 16 years ago
parent
commit
85ae3005e4
2 changed files with 18 additions and 6 deletions
  1. 3 0
      CHANGES.txt
  2. 15 6
      src/mapred/org/apache/hadoop/mapred/JobConf.java

+ 3 - 0
CHANGES.txt

@@ -121,6 +121,9 @@ Trunk (unreleased changes)
     it down by monitoring for cumulative memory usage across tasks.
     (Vinod Kumar Vavilapalli via yhemanth)
 
+    HADOOP-4668. Improve documentation for setCombinerClass to clarify the
+    restrictions on combiners. (omalley)
+
   OPTIMIZATIONS
 
     HADOOP-3293. Fixes FileInputFormat to do provide locations for splits

+ 15 - 6
src/mapred/org/apache/hadoop/mapred/JobConf.java

@@ -775,11 +775,20 @@ public class JobConf extends Configuration {
    * Set the user-defined <i>combiner</i> class used to combine map-outputs 
    * before being sent to the reducers. 
    * 
-   * <p>The combiner is a task-level aggregation operation which, in some cases,
-   * helps to cut down the amount of data transferred from the {@link Mapper} to
-   * the {@link Reducer}, leading to better performance.</p>
-   *  
-   * <p>Typically the combiner is same as the the <code>Reducer</code> for the  
+   * <p>The combiner is an application-specified aggregation operation, which
+   * can help cut down the amount of data transferred between the 
+   * {@link Mapper} and the {@link Reducer}, leading to better performance.</p>
+   * 
+   * <p>The framework may invoke the combiner 0, 1, or multiple times, in both
+   * the mapper and reducer tasks. In general, the combiner is called as the
+   * sort/merge result is written to disk. The combiner must:
+   * <ul>
+   *   <li> be side-effect free</li>
+   *   <li> have the same input and output key types and the same input and 
+   *        output value types</li>
+   * </ul></p>
+   * 
+   * <p>Typically the combiner is same as the <code>Reducer</code> for the  
    * job i.e. {@link #setReducerClass(Class)}.</p>
    * 
    * @param theClass the user-defined combiner class used to combine 
@@ -1155,7 +1164,7 @@ public class JobConf extends Configuration {
 
   /**
    * Set whether the system should collect profiler information for some of 
-   * the tasks in this job? The information is stored in the the user log 
+   * the tasks in this job? The information is stored in the user log 
    * directory.
    * @param newValue true means it should be gathered
    */
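
The restrictions added to the `setCombinerClass` Javadoc above can be illustrated with a small, Hadoop-free simulation. This is only a sketch under stated assumptions: `CombinerSketch`, `combine`, `reduce`, and `run` are hypothetical names that are not part of the Hadoop API, and the "spills" are plain lists standing in for sorted map output for a single key. It shows why a combiner that is side-effect free and has matching input and output types (here, a sum over `Integer` values) gives the same final result whether it is invoked 0, 1, or multiple times.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical simulation; none of these names come from the Hadoop API.
public class CombinerSketch {

    // A sum "combiner": consumes Integer values and emits a single Integer,
    // so input and output value types match. It touches no external state.
    static List<Integer> combine(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        List<Integer> out = new ArrayList<>();
        out.add(sum);
        return out;
    }

    // The final reduce step: sums whatever values reach it for the key.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Applies the combiner `times` times to each spill, then reduces the
    // merged result. times == 0 models the combiner never being invoked.
    static int run(List<List<Integer>> spills, int times) {
        List<Integer> merged = new ArrayList<>();
        for (List<Integer> spill : spills) {
            List<Integer> current = spill;
            for (int i = 0; i < times; i++) {
                current = combine(current);
            }
            merged.addAll(current);
        }
        return reduce(merged);
    }

    public static void main(String[] args) {
        List<List<Integer>> spills = Arrays.asList(
                Arrays.asList(1, 2, 3),
                Arrays.asList(4, 5));
        // The result does not depend on how often the combiner ran.
        System.out.println(run(spills, 0)); // prints 15
        System.out.println(run(spills, 1)); // prints 15
        System.out.println(run(spills, 3)); // prints 15
    }
}
```

If `combine` instead emitted a different type than it consumed (for example, a count as a `String`), feeding its output back into itself or into `reduce` would no longer be well defined, which is the situation the type restriction in the Javadoc rules out.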