
HADOOP-4668. Improve documentation for setCombinerClass to clarify the
restrictions on combiners. (omalley)


git-svn-id: https://svn.apache.org/repos/asf/hadoop/core/trunk@719431 13f79535-47bb-0310-9956-ffa450edef68

Owen O'Malley, 16 years ago
parent
commit 85ae3005e4
2 files changed with 18 additions and 6 deletions
  1. CHANGES.txt (+3 -0)
  2. src/mapred/org/apache/hadoop/mapred/JobConf.java (+15 -6)

+ 3 - 0
CHANGES.txt

@@ -121,6 +121,9 @@ Trunk (unreleased changes)
     it down by monitoring for cumulative memory usage across tasks.
     (Vinod Kumar Vavilapalli via yhemanth)
 
+    HADOOP-4668. Improve documentation for setCombinerClass to clarify the
+    restrictions on combiners. (omalley)
+
   OPTIMIZATIONS
 
     HADOOP-3293. Fixes FileInputFormat to do provide locations for splits

+ 15 - 6
src/mapred/org/apache/hadoop/mapred/JobConf.java

@@ -775,11 +775,20 @@ public class JobConf extends Configuration {
    * Set the user-defined <i>combiner</i> class used to combine map-outputs 
    * before being sent to the reducers. 
    * 
-   * <p>The combiner is a task-level aggregation operation which, in some cases,
-   * helps to cut down the amount of data transferred from the {@link Mapper} to
-   * the {@link Reducer}, leading to better performance.</p>
-   *  
-   * <p>Typically the combiner is same as the the <code>Reducer</code> for the  
+   * <p>The combiner is an application-specified aggregation operation, which
+   * can help cut down the amount of data transferred between the 
+   * {@link Mapper} and the {@link Reducer}, leading to better performance.</p>
+   * 
+   * <p>The framework may invoke the combiner 0, 1, or multiple times, in both
+   * the mapper and reducer tasks. In general, the combiner is called as the
+   * sort/merge result is written to disk. The combiner must:
+   * <ul>
+   *   <li> be side-effect free</li>
+   *   <li> have the same input and output key types and the same input and 
+   *        output value types</li>
+   * </ul></p>
+   * 
+   * <p>Typically the combiner is same as the <code>Reducer</code> for the  
    * job i.e. {@link #setReducerClass(Class)}.</p>
    * 
    * @param theClass the user-defined combiner class used to combine 
@@ -1155,7 +1164,7 @@ public class JobConf extends Configuration {
 
   /**
    * Set whether the system should collect profiler information for some of 
-   * the tasks in this job? The information is stored in the the user log 
+   * the tasks in this job? The information is stored in the user log 
    * directory.
    * @param newValue true means it should be gathered
    */
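The new javadoc says the framework may invoke the combiner 0, 1, or multiple times, so a job's final output must not depend on how often the combiner actually runs. The sketch below illustrates that property in plain Java with no Hadoop dependency. `CombinerSketch`, `combine`, `reduce`, `reduceAfter`, and the two sample spills are hypothetical stand-ins, not Hadoop APIs. In a real job, the combiner is a `Reducer` implementation that is registered with `JobConf.setCombinerClass(Class)`. Here, a word-count style sum combiner keeps the same key and value types on input and output (`String`, `Integer`) and has no side effects, which matches the restrictions the javadoc lists.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, Hadoop-free simulation: reduceAfter plays the role of the
// framework and decides how many times the combiner runs.
class CombinerSketch {

    // Map output already written as two spills (hypothetical sample data).
    static final List<Map.Entry<String, Integer>> SPILL_1 =
            List.of(Map.entry("a", 1), Map.entry("b", 1), Map.entry("a", 1));
    static final List<Map.Entry<String, Integer>> SPILL_2 =
            List.of(Map.entry("a", 1), Map.entry("c", 1));

    // Sum combiner: consumes and produces (String, Integer) pairs, so input and
    // output types match, and it reads or writes no state outside its argument.
    static List<Map.Entry<String, Integer>> combine(List<Map.Entry<String, Integer>> in) {
        Map<String, Integer> sums = new TreeMap<>();
        for (Map.Entry<String, Integer> e : in) {
            sums.merge(e.getKey(), e.getValue(), Integer::sum);
        }
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (Map.Entry<String, Integer> e : sums.entrySet()) {
            out.add(Map.entry(e.getKey(), e.getValue()));
        }
        return out;
    }

    // Reducer: the same summing logic, producing the final per-key totals.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> in) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> e : in) {
            totals.merge(e.getKey(), e.getValue(), Integer::sum);
        }
        return totals;
    }

    // runs == 0: no combining. runs >= 1: combine each spill once, then
    // combine the merged spills again for every additional run.
    static Map<String, Integer> reduceAfter(int runs) {
        List<Map.Entry<String, Integer>> s1 = SPILL_1;
        List<Map.Entry<String, Integer>> s2 = SPILL_2;
        if (runs >= 1) {
            s1 = combine(s1);
            s2 = combine(s2);
        }
        List<Map.Entry<String, Integer>> merged = new ArrayList<>(s1);
        merged.addAll(s2);
        for (int i = 1; i < runs; i++) {
            merged = combine(merged);
        }
        return reduce(merged);
    }

    public static void main(String[] args) {
        for (int runs = 0; runs <= 3; runs++) {
            System.out.println(runs + " combiner pass(es): " + reduceAfter(runs));
        }
    }
}
```

Every line of output shows `{a=3, b=1, c=1}`, because summing partial sums gives the same totals no matter how many times the pairs are combined. A combiner that instead emitted a per-key average would produce different results depending on the number of passes. That is why reusing the job's `Reducer` as the combiner is only safe when the reduce operation can be applied to partially aggregated values.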