
HADOOP-4439. Remove configuration variables that aren't usable yet, in
particular mapred.tasktracker.tasks.maxmemory and mapred.task.max.memory.
(Hemanth Yamijala via omalley)


git-svn-id: https://svn.apache.org/repos/asf/hadoop/core/trunk@706535 13f79535-47bb-0310-9956-ffa450edef68

Owen O'Malley 16 years ago
parent
commit
6434651fc5

+ 4 - 0
CHANGES.txt

@@ -1002,6 +1002,10 @@ Release 0.19.0 - Unreleased
     HADOOP-4296. Fix job client failures by not retiring a job as soon as it
     is finished. (dhruba)
 
+    HADOOP-4439. Remove configuration variables that aren't usable yet, in
+    particular mapred.tasktracker.tasks.maxmemory and mapred.task.max.memory.
+    (Hemanth Yamijala via omalley)
+
 Release 0.18.2 - Unreleased
 
   BUG FIXES

+ 0 - 33
conf/hadoop-default.xml

@@ -1455,39 +1455,6 @@ creations/deletions), or "all".</description>
   </description>
 </property>
 
-<property>
-  <name>mapred.tasktracker.tasks.maxmemory</name>
-  <value>-1</value>
-  <description> The maximum amount of virtual memory in kilobytes all tasks 
-  	running on a tasktracker, including sub-processes they launch, can use. 
-  	This value is used to compute the amount of free memory available for 
-  	tasks. Any task scheduled on this tasktracker is guaranteed and constrained
-  	 to use a share of this amount. Any task exceeding its share will be 
-  	killed. If set to -1, this functionality is disabled, and 
-  	mapred.task.maxmemory is ignored. Further, it will be enabled only on the
-  	systems where org.apache.hadoop.util.ProcfsBasedProcessTree is available,
-  	i.e at present only on Linux.
-  </description>
-</property>
-
-<property>
-  <name>mapred.task.maxmemory</name>
-  <value>-1</value>
-  <description> The maximum amount of memory in kilobytes any task of a job 
-    will use. A task of this job will be scheduled on a tasktracker, only if 
-    the amount of free memory on the tasktracker is greater than or 
-    equal to this value. If set to -1, tasks are assured a memory limit on
-    the tasktracker equal to 
-    mapred.tasktracker.tasks.maxmemory/number of slots. If the value of 
-    mapred.tasktracker.tasks.maxmemory is set to -1, this value is ignored.
-    
-    Note: If mapred.child.java.opts is specified with an Xmx value, or if 
-    mapred.child.ulimit is specified, then the value of mapred.task.maxmemory
-    must be set to a higher value than these. If not, the task might be 
-    killed even though these limits are not reached.
-  </description>  
-</property>
-
 <property>
   <name>mapred.queue.names</name>
   <value>default</value>
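
Although this hunk drops the property descriptions from hadoop-default.xml, the keys themselves are still read internally, as the JobConf.java and TaskTracker.java changes below show. A minimal sketch of that lookup, assuming hadoop-core on the classpath; the class name is illustrative:

    import org.apache.hadoop.conf.Configuration;

    public class MaxMemoryLookup {
      // Same sentinel value JobConf keeps (now package-private there).
      static final long DISABLED_VIRTUAL_MEMORY_LIMIT = -1L;

      public static void main(String[] args) {
        Configuration conf = new Configuration();
        // With the entry gone from hadoop-default.xml, this falls back to
        // the disabled sentinel unless a site file sets the key explicitly.
        long maxVmem = conf.getLong("mapred.tasktracker.tasks.maxmemory",
                                    DISABLED_VIRTUAL_MEMORY_LIMIT);
        System.out.println("mapred.tasktracker.tasks.maxmemory = " + maxVmem);
      }
    }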

+ 9 - 3
docs/changes.html

@@ -74,7 +74,7 @@ Following deprecated methods in RawLocalFileSystem are removed:
     </ol>
   </li>
   <li><a href="javascript:toggleList('trunk_(unreleased_changes)_._improvements_')">  IMPROVEMENTS
-</a>&nbsp;&nbsp;&nbsp;(8)
+</a>&nbsp;&nbsp;&nbsp;(9)
     <ol id="trunk_(unreleased_changes)_._improvements_">
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-4234">HADOOP-4234</a>. Fix KFS "glue" layer to allow applications to interface
 with multiple KFS metaservers.<br />(Sriram Rao via lohit)</li>
@@ -90,6 +90,8 @@ message.<br />(stevel via omalley)</li>
 understandable.<br />(Yuri Pradkin via cdouglas)</li>
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-4238">HADOOP-4238</a>. When listing jobs, if scheduling information isn't available
 print NA instead of empty output.<br />(Sreekanth Ramakrishnan via johan)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-4284">HADOOP-4284</a>. Support filters that apply to all requests, or global filters,
+to HttpServer.<br />(Kan Zhang via cdouglas)</li>
     </ol>
   </li>
   <li><a href="javascript:toggleList('trunk_(unreleased_changes)_._optimizations_')">  OPTIMIZATIONS
@@ -443,7 +445,7 @@ org.apache.hadoop.mapred  package private instead of public.<br />(omalley)</li>
     </ol>
   </li>
   <li><a href="javascript:toggleList('release_0.19.0_-_unreleased_._bug_fixes_')">  BUG FIXES
-</a>&nbsp;&nbsp;&nbsp;(140)
+</a>&nbsp;&nbsp;&nbsp;(141)
     <ol id="release_0.19.0_-_unreleased_._bug_fixes_">
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-3563">HADOOP-3563</a>.  Refactor the distributed upgrade code so that it is
 easier to identify datanode and namenode related code.<br />(dhruba)</li>
@@ -710,6 +712,8 @@ append.<br />(szetszwo)</li>
 </li>
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-4404">HADOOP-4404</a>. saveFSImage() removes files from a storage directory that do
 not correspond to its type.<br />(shv)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-4149">HADOOP-4149</a>. Fix handling of updates to the job priority, by changing the
+list of jobs to be keyed by the priority, submit time, and job tracker id.<br />(Amar Kamat via omalley)</li>
     </ol>
   </li>
 </ul>
@@ -719,7 +723,7 @@ not correspond to its type.<br />(shv)</li>
 </a></h3>
 <ul id="release_0.18.2_-_unreleased_">
   <li><a href="javascript:toggleList('release_0.18.2_-_unreleased_._bug_fixes_')">  BUG FIXES
-</a>&nbsp;&nbsp;&nbsp;(9)
+</a>&nbsp;&nbsp;&nbsp;(10)
     <ol id="release_0.18.2_-_unreleased_._bug_fixes_">
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-4116">HADOOP-4116</a>. Balancer should provide better resource management.<br />(hairong)</li>
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-3614">HADOOP-3614</a>. Fix a bug that Datanode may use an old GenerationStamp to get
@@ -735,6 +739,8 @@ ArrayIndexOutOfBoundsException.<br />(hairong)</li>
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-4398">HADOOP-4398</a>. No need to truncate access time in INode. Also fixes NPE
 in CreateEditsLog.<br />(Raghu Angadi)</li>
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-4399">HADOOP-4399</a>. Make fuse-dfs multi-thread access safe.<br />(Pete Wyckoff via dhruba)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-4369">HADOOP-4369</a>. Use setMetric(...) instead of incrMetric(...) for metrics
+averages.<br />(Brian Bockelman via szetszwo)</li>
     </ol>
   </li>
 </ul>

+ 0 - 31
docs/hadoop-default.html

@@ -224,10 +224,6 @@ creations/deletions), or "all".</td>
   </td>
 </tr>
 <tr>
-<td><a name="dfs.datanode.du.pct">dfs.datanode.du.pct</a></td><td>0.98f</td><td>When calculating remaining space, only use this percentage of the real available space
-  </td>
-</tr>
-<tr>
 <td><a name="dfs.name.dir">dfs.name.dir</a></td><td>${hadoop.tmp.dir}/dfs/name</td><td>Determines where on the local filesystem the DFS name node
       should store the name table(fsimage).  If this is a comma-delimited list
       of directories then the name table is replicated in all of the
@@ -911,33 +907,6 @@ creations/deletions), or "all".</td>
   </td>
 </tr>
 <tr>
-<td><a name="mapred.tasktracker.tasks.maxmemory">mapred.tasktracker.tasks.maxmemory</a></td><td>-1</td><td> The maximum amount of virtual memory in kilobytes all tasks 
-  	running on a tasktracker, including sub-processes they launch, can use. 
-  	This value is used to compute the amount of free memory available for 
-  	tasks. Any task scheduled on this tasktracker is guaranteed and constrained
-  	 to use a share of this amount. Any task exceeding its share will be 
-  	killed. If set to -1, this functionality is disabled, and 
-  	mapred.task.maxmemory is ignored. Further, it will be enabled only on the
-  	systems where org.apache.hadoop.util.ProcfsBasedProcessTree is available,
-  	i.e at present only on Linux.
-  </td>
-</tr>
-<tr>
-<td><a name="mapred.task.maxmemory">mapred.task.maxmemory</a></td><td>-1</td><td> The maximum amount of memory in kilobytes any task of a job 
-    will use. A task of this job will be scheduled on a tasktracker, only if 
-    the amount of free memory on the tasktracker is greater than or 
-    equal to this value. If set to -1, tasks are assured a memory limit on
-    the tasktracker equal to 
-    mapred.tasktracker.tasks.maxmemory/number of slots. If the value of 
-    mapred.tasktracker.tasks.maxmemory is set to -1, this value is ignored.
-    
-    Note: If mapred.child.java.opts is specified with an Xmx value, or if 
-    mapred.child.ulimit is specified, then the value of mapred.task.maxmemory
-    must be set to a higher value than these. If not, the task might be 
-    killed even though these limits are not reached.
-  </td>
-</tr>
-<tr>
 <td><a name="mapred.queue.names">mapred.queue.names</a></td><td>default</td><td> Comma separated list of queues configured for this jobtracker.
     Jobs are added to queues and schedulers can configure different 
     scheduling properties for the various queues. To configure a property 

+ 34 - 54
docs/mapred_tutorial.html

@@ -348,7 +348,7 @@ document.write("Last Published: " + document.lastModified);
 <a href="#Example%3A+WordCount+v2.0">Example: WordCount v2.0</a>
 <ul class="minitoc">
 <li>
-<a href="#Source+Code-N10FAA">Source Code</a>
+<a href="#Source+Code-N10F95">Source Code</a>
 </li>
 <li>
 <a href="#Sample+Runs">Sample Runs</a>
@@ -1621,26 +1621,6 @@ document.write("Last Published: " + document.lastModified);
         <a href="cluster_setup.html#Configuring+the+Environment+of+the+Hadoop+Daemons">
         cluster_setup.html </a>
 </p>
-<p>There are two additional parameters that influence virtual memory
-        limits for tasks run on a tasktracker. The parameter 
-        <span class="codefrag">mapred.tasktracker.maxmemory</span> is set by admins
-        to limit the total memory all tasks that it runs can use together. 
-        Setting this enables the parameter <span class="codefrag">mapred.task.maxmemory</span>
-        that can be used to specify the maximum virtual memory the entire 
-        process tree starting from the launched child-task requires. 
-        This is a cumulative limit of all processes in the process tree. 
-        By specifying this value, users can be assured that the system will 
-        run their tasks only on tasktrackers that have atleast this amount 
-        of free memory available. If at any time during task execution, this 
-        limit is exceeded, the task would be killed by the system. By default, 
-        any task would get a share of 
-        <span class="codefrag">mapred.tasktracker.maxmemory</span>, divided
-        equally among the number of slots. The user can thus verify if the
-        tasks need more memory than this, and specify it in 
-        <span class="codefrag">mapred.task.maxmemory</span>. Specifically, this value must be 
-        greater than any value specified for a maximum heap-size
-        of the child jvm via <span class="codefrag">mapred.child.java.opts</span>, or a ulimit
-        value in <span class="codefrag">mapred.child.ulimit</span>. </p>
 <p>The memory available to some parts of the framework is also
         configurable. In map and reduce tasks, performance may be influenced
         by adjusting parameters influencing the concurrency of operations and
@@ -1648,7 +1628,7 @@ document.write("Last Published: " + document.lastModified);
         counters for a job- particularly relative to byte counts from the map
         and into the reduce- is invaluable to the tuning of these
         parameters.</p>
-<a name="N108E8"></a><a name="Map+Parameters"></a>
+<a name="N108D3"></a><a name="Map+Parameters"></a>
 <h4>Map Parameters</h4>
 <p>A record emitted from a map will be serialized into a buffer and
           metadata will be stored into accounting buffers. As described in the
@@ -1722,7 +1702,7 @@ document.write("Last Published: " + document.lastModified);
             combiner.</li>
           
 </ul>
-<a name="N10954"></a><a name="Shuffle%2FReduce+Parameters"></a>
+<a name="N1093F"></a><a name="Shuffle%2FReduce+Parameters"></a>
 <h4>Shuffle/Reduce Parameters</h4>
 <p>As described previously, each reduce fetches the output assigned
           to it by the Partitioner via HTTP into memory and periodically
@@ -1818,7 +1798,7 @@ document.write("Last Published: " + document.lastModified);
             of the intermediate merge.</li>
           
 </ul>
-<a name="N109CF"></a><a name="Directory+Structure"></a>
+<a name="N109BA"></a><a name="Directory+Structure"></a>
 <h4> Directory Structure </h4>
 <p>The task tracker has local directory,
         <span class="codefrag"> ${mapred.local.dir}/taskTracker/</span> to create localized
@@ -1919,7 +1899,7 @@ document.write("Last Published: " + document.lastModified);
 </li>
         
 </ul>
-<a name="N10A3E"></a><a name="Task+JVM+Reuse"></a>
+<a name="N10A29"></a><a name="Task+JVM+Reuse"></a>
 <h4>Task JVM Reuse</h4>
 <p>Jobs can enable task JVMs to be reused by specifying the job 
         configuration <span class="codefrag">mapred.job.reuse.jvm.num.tasks</span>. If the
@@ -2011,7 +1991,7 @@ document.write("Last Published: " + document.lastModified);
         <a href="native_libraries.html#Loading+native+libraries+through+DistributedCache">
         native_libraries.html</a>
 </p>
-<a name="N10B27"></a><a name="Job+Submission+and+Monitoring"></a>
+<a name="N10B12"></a><a name="Job+Submission+and+Monitoring"></a>
 <h3 class="h4">Job Submission and Monitoring</h3>
 <p>
 <a href="api/org/apache/hadoop/mapred/JobClient.html">
@@ -2072,7 +2052,7 @@ document.write("Last Published: " + document.lastModified);
 <p>Normally the user creates the application, describes various facets 
         of the job via <span class="codefrag">JobConf</span>, and then uses the 
         <span class="codefrag">JobClient</span> to submit the job and monitor its progress.</p>
-<a name="N10B87"></a><a name="Job+Control"></a>
+<a name="N10B72"></a><a name="Job+Control"></a>
 <h4>Job Control</h4>
 <p>Users may need to chain Map/Reduce jobs to accomplish complex
           tasks which cannot be done via a single Map/Reduce job. This is fairly
@@ -2108,7 +2088,7 @@ document.write("Last Published: " + document.lastModified);
             </li>
           
 </ul>
-<a name="N10BB1"></a><a name="Job+Input"></a>
+<a name="N10B9C"></a><a name="Job+Input"></a>
 <h3 class="h4">Job Input</h3>
 <p>
 <a href="api/org/apache/hadoop/mapred/InputFormat.html">
@@ -2156,7 +2136,7 @@ document.write("Last Published: " + document.lastModified);
         appropriate <span class="codefrag">CompressionCodec</span>. However, it must be noted that
         compressed files with the above extensions cannot be <em>split</em> and 
         each compressed file is processed in its entirety by a single mapper.</p>
-<a name="N10C1B"></a><a name="InputSplit"></a>
+<a name="N10C06"></a><a name="InputSplit"></a>
 <h4>InputSplit</h4>
 <p>
 <a href="api/org/apache/hadoop/mapred/InputSplit.html">
@@ -2170,7 +2150,7 @@ document.write("Last Published: " + document.lastModified);
           FileSplit</a> is the default <span class="codefrag">InputSplit</span>. It sets 
           <span class="codefrag">map.input.file</span> to the path of the input file for the
           logical split.</p>
-<a name="N10C40"></a><a name="RecordReader"></a>
+<a name="N10C2B"></a><a name="RecordReader"></a>
 <h4>RecordReader</h4>
 <p>
 <a href="api/org/apache/hadoop/mapred/RecordReader.html">
@@ -2182,7 +2162,7 @@ document.write("Last Published: " + document.lastModified);
           for processing. <span class="codefrag">RecordReader</span> thus assumes the 
           responsibility of processing record boundaries and presents the tasks 
           with keys and values.</p>
-<a name="N10C63"></a><a name="Job+Output"></a>
+<a name="N10C4E"></a><a name="Job+Output"></a>
 <h3 class="h4">Job Output</h3>
 <p>
 <a href="api/org/apache/hadoop/mapred/OutputFormat.html">
@@ -2207,7 +2187,7 @@ document.write("Last Published: " + document.lastModified);
 <p>
 <span class="codefrag">TextOutputFormat</span> is the default 
         <span class="codefrag">OutputFormat</span>.</p>
-<a name="N10C8C"></a><a name="OutputCommitter"></a>
+<a name="N10C77"></a><a name="OutputCommitter"></a>
 <h4>OutputCommitter</h4>
 <p>
 <a href="api/org/apache/hadoop/mapred/OutputCommitter.html">
@@ -2249,7 +2229,7 @@ document.write("Last Published: " + document.lastModified);
 <p>
 <span class="codefrag">FileOutputCommitter</span> is the default 
         <span class="codefrag">OutputCommitter</span>.</p>
-<a name="N10CBC"></a><a name="Task+Side-Effect+Files"></a>
+<a name="N10CA7"></a><a name="Task+Side-Effect+Files"></a>
 <h4>Task Side-Effect Files</h4>
 <p>In some applications, component tasks need to create and/or write to
           side-files, which differ from the actual job-output files.</p>
@@ -2290,7 +2270,7 @@ document.write("Last Published: " + document.lastModified);
 <p>The entire discussion holds true for maps of jobs with 
            reducer=NONE (i.e. 0 reduces) since output of the map, in that case, 
            goes directly to HDFS.</p>
-<a name="N10D0A"></a><a name="RecordWriter"></a>
+<a name="N10CF5"></a><a name="RecordWriter"></a>
 <h4>RecordWriter</h4>
 <p>
 <a href="api/org/apache/hadoop/mapred/RecordWriter.html">
@@ -2298,9 +2278,9 @@ document.write("Last Published: " + document.lastModified);
           pairs to an output file.</p>
 <p>RecordWriter implementations write the job outputs to the 
           <span class="codefrag">FileSystem</span>.</p>
-<a name="N10D21"></a><a name="Other+Useful+Features"></a>
+<a name="N10D0C"></a><a name="Other+Useful+Features"></a>
 <h3 class="h4">Other Useful Features</h3>
-<a name="N10D27"></a><a name="Submitting+Jobs+to+a+Queue"></a>
+<a name="N10D12"></a><a name="Submitting+Jobs+to+a+Queue"></a>
 <h4>Submitting Jobs to a Queue</h4>
 <p>Some job schedulers supported in Hadoop, like the 
             <a href="capacity_scheduler.html">Capacity
@@ -2316,7 +2296,7 @@ document.write("Last Published: " + document.lastModified);
             given user. In that case, if the job is not submitted
             to one of the queues where the user has access,
             the job would be rejected.</p>
-<a name="N10D3F"></a><a name="Counters"></a>
+<a name="N10D2A"></a><a name="Counters"></a>
 <h4>Counters</h4>
 <p>
 <span class="codefrag">Counters</span> represent global counters, defined either by 
@@ -2333,7 +2313,7 @@ document.write("Last Published: " + document.lastModified);
           in the <span class="codefrag">map</span> and/or 
           <span class="codefrag">reduce</span> methods. These counters are then globally 
           aggregated by the framework.</p>
-<a name="N10D6E"></a><a name="DistributedCache"></a>
+<a name="N10D59"></a><a name="DistributedCache"></a>
 <h4>DistributedCache</h4>
 <p>
 <a href="api/org/apache/hadoop/filecache/DistributedCache.html">
@@ -2404,7 +2384,7 @@ document.write("Last Published: " + document.lastModified);
           <span class="codefrag">mapred.job.classpath.{files|archives}</span>. Similarly the
           cached files that are symlinked into the working directory of the
           task can be used to distribute native libraries and load them.</p>
-<a name="N10DF1"></a><a name="Tool"></a>
+<a name="N10DDC"></a><a name="Tool"></a>
 <h4>Tool</h4>
 <p>The <a href="api/org/apache/hadoop/util/Tool.html">Tool</a> 
           interface supports the handling of generic Hadoop command-line options.
@@ -2444,7 +2424,7 @@ document.write("Last Published: " + document.lastModified);
             </span>
           
 </p>
-<a name="N10E23"></a><a name="IsolationRunner"></a>
+<a name="N10E0E"></a><a name="IsolationRunner"></a>
 <h4>IsolationRunner</h4>
 <p>
 <a href="api/org/apache/hadoop/mapred/IsolationRunner.html">
@@ -2468,7 +2448,7 @@ document.write("Last Published: " + document.lastModified);
 <p>
 <span class="codefrag">IsolationRunner</span> will run the failed task in a single 
           jvm, which can be in the debugger, over precisely the same input.</p>
-<a name="N10E56"></a><a name="Profiling"></a>
+<a name="N10E41"></a><a name="Profiling"></a>
 <h4>Profiling</h4>
 <p>Profiling is a utility to get a representative (2 or 3) sample
           of built-in java profiler for a sample of maps and reduces. </p>
@@ -2501,7 +2481,7 @@ document.write("Last Published: " + document.lastModified);
           <span class="codefrag">-agentlib:hprof=cpu=samples,heap=sites,force=n,thread=y,verbose=n,file=%s</span>
           
 </p>
-<a name="N10E8A"></a><a name="Debugging"></a>
+<a name="N10E75"></a><a name="Debugging"></a>
 <h4>Debugging</h4>
 <p>Map/Reduce framework provides a facility to run user-provided 
           scripts for debugging. When map/reduce task fails, user can run 
@@ -2512,14 +2492,14 @@ document.write("Last Published: " + document.lastModified);
 <p> In the following sections we discuss how to submit a debug script
           along with the job. To submit a debug script, it first has to be
           distributed. Then the script has to be supplied in Configuration. </p>
-<a name="N10E96"></a><a name="How+to+distribute+script+file%3A"></a>
+<a name="N10E81"></a><a name="How+to+distribute+script+file%3A"></a>
 <h5> How to distribute script file: </h5>
 <p>
           The user has to use 
           <a href="mapred_tutorial.html#DistributedCache">DistributedCache</a>
           mechanism to <em>distribute</em> and <em>symlink</em> the
           debug script file.</p>
-<a name="N10EAA"></a><a name="How+to+submit+script%3A"></a>
+<a name="N10E95"></a><a name="How+to+submit+script%3A"></a>
 <h5> How to submit script: </h5>
 <p> A quick way to submit debug script is to set values for the 
           properties "mapred.map.task.debug.script" and 
@@ -2543,17 +2523,17 @@ document.write("Last Published: " + document.lastModified);
 <span class="codefrag">$script $stdout $stderr $syslog $jobconf $program </span>  
           
 </p>
-<a name="N10ECC"></a><a name="Default+Behavior%3A"></a>
+<a name="N10EB7"></a><a name="Default+Behavior%3A"></a>
 <h5> Default Behavior: </h5>
 <p> For pipes, a default script is run to process core dumps under
           gdb, prints stack trace and gives info about running threads. </p>
-<a name="N10ED7"></a><a name="JobControl"></a>
+<a name="N10EC2"></a><a name="JobControl"></a>
 <h4>JobControl</h4>
 <p>
 <a href="api/org/apache/hadoop/mapred/jobcontrol/package-summary.html">
           JobControl</a> is a utility which encapsulates a set of Map/Reduce jobs
           and their dependencies.</p>
-<a name="N10EE4"></a><a name="Data+Compression"></a>
+<a name="N10ECF"></a><a name="Data+Compression"></a>
 <h4>Data Compression</h4>
 <p>Hadoop Map/Reduce provides facilities for the application-writer to
           specify compression for both intermediate map-outputs and the
@@ -2567,7 +2547,7 @@ document.write("Last Published: " + document.lastModified);
           codecs for reasons of both performance (zlib) and non-availability of
           Java libraries (lzo). More details on their usage and availability are
           available <a href="native_libraries.html">here</a>.</p>
-<a name="N10F04"></a><a name="Intermediate+Outputs"></a>
+<a name="N10EEF"></a><a name="Intermediate+Outputs"></a>
 <h5>Intermediate Outputs</h5>
 <p>Applications can control compression of intermediate map-outputs
             via the 
@@ -2576,7 +2556,7 @@ document.write("Last Published: " + document.lastModified);
             <span class="codefrag">CompressionCodec</span> to be used via the
             <a href="api/org/apache/hadoop/mapred/JobConf.html#setMapOutputCompressorClass(java.lang.Class)">
             JobConf.setMapOutputCompressorClass(Class)</a> api.</p>
-<a name="N10F19"></a><a name="Job+Outputs"></a>
+<a name="N10F04"></a><a name="Job+Outputs"></a>
 <h5>Job Outputs</h5>
 <p>Applications can control compression of job-outputs via the
             <a href="api/org/apache/hadoop/mapred/FileOutputFormat.html#setCompressOutput(org.apache.hadoop.mapred.JobConf,%20boolean)">
@@ -2593,7 +2573,7 @@ document.write("Last Published: " + document.lastModified);
             <a href="api/org/apache/hadoop/mapred/SequenceFileOutputFormat.html#setOutputCompressionType(org.apache.hadoop.mapred.JobConf,%20org.apache.hadoop.io.SequenceFile.CompressionType)">
             SequenceFileOutputFormat.setOutputCompressionType(JobConf, 
             SequenceFile.CompressionType)</a> api.</p>
-<a name="N10F46"></a><a name="Skipping+Bad+Records"></a>
+<a name="N10F31"></a><a name="Skipping+Bad+Records"></a>
 <h4>Skipping Bad Records</h4>
 <p>Hadoop provides an optional mode of execution in which the bad 
           records are detected and skipped in further attempts. 
@@ -2667,7 +2647,7 @@ document.write("Last Published: " + document.lastModified);
 </div>
 
     
-<a name="N10F90"></a><a name="Example%3A+WordCount+v2.0"></a>
+<a name="N10F7B"></a><a name="Example%3A+WordCount+v2.0"></a>
 <h2 class="h3">Example: WordCount v2.0</h2>
 <div class="section">
 <p>Here is a more complete <span class="codefrag">WordCount</span> which uses many of the
@@ -2677,7 +2657,7 @@ document.write("Last Published: " + document.lastModified);
       <a href="quickstart.html#SingleNodeSetup">pseudo-distributed</a> or
       <a href="quickstart.html#Fully-Distributed+Operation">fully-distributed</a> 
       Hadoop installation.</p>
-<a name="N10FAA"></a><a name="Source+Code-N10FAA"></a>
+<a name="N10F95"></a><a name="Source+Code-N10F95"></a>
 <h3 class="h4">Source Code</h3>
 <table class="ForrestTable" cellspacing="1" cellpadding="4">
           
@@ -3887,7 +3867,7 @@ document.write("Last Published: " + document.lastModified);
 </tr>
         
 </table>
-<a name="N1170C"></a><a name="Sample+Runs"></a>
+<a name="N116F7"></a><a name="Sample+Runs"></a>
 <h3 class="h4">Sample Runs</h3>
 <p>Sample text-files as input:</p>
 <p>
@@ -4055,7 +4035,7 @@ document.write("Last Published: " + document.lastModified);
 <br>
         
 </p>
-<a name="N117E0"></a><a name="Highlights"></a>
+<a name="N117CB"></a><a name="Highlights"></a>
 <h3 class="h4">Highlights</h3>
 <p>The second version of <span class="codefrag">WordCount</span> improves upon the 
         previous one by using some features offered by the Map/Reduce framework:

+ 2 - 2
docs/mapred_tutorial.pdf

File diff suppressed because it is too large

+ 0 - 21
src/docs/src/documentation/content/xdocs/mapred_tutorial.xml

@@ -1097,27 +1097,6 @@
         <a href="cluster_setup.html#Configuring+the+Environment+of+the+Hadoop+Daemons">
         cluster_setup.html </a></p>
         
-        <p>There are two additional parameters that influence virtual memory
-        limits for tasks run on a tasktracker. The parameter 
-        <code>mapred.tasktracker.maxmemory</code> is set by admins
-        to limit the total memory all tasks that it runs can use together. 
-        Setting this enables the parameter <code>mapred.task.maxmemory</code>
-        that can be used to specify the maximum virtual memory the entire 
-        process tree starting from the launched child-task requires. 
-        This is a cumulative limit of all processes in the process tree. 
-        By specifying this value, users can be assured that the system will 
-        run their tasks only on tasktrackers that have atleast this amount 
-        of free memory available. If at any time during task execution, this 
-        limit is exceeded, the task would be killed by the system. By default, 
-        any task would get a share of 
-        <code>mapred.tasktracker.maxmemory</code>, divided
-        equally among the number of slots. The user can thus verify if the
-        tasks need more memory than this, and specify it in 
-        <code>mapred.task.maxmemory</code>. Specifically, this value must be 
-        greater than any value specified for a maximum heap-size
-        of the child jvm via <code>mapred.child.java.opts</code>, or a ulimit
-        value in <code>mapred.child.ulimit</code>. </p>
-
         <p>The memory available to some parts of the framework is also
         configurable. In map and reduce tasks, performance may be influenced
         by adjusting parameters influencing the concurrency of operations and
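
The paragraph removed above described each task's default share as the tasktracker-wide limit divided by the number of slots, the same computation this commit deletes from TaskTracker.java. A worked restatement of that old arithmetic, with illustrative values:

    public class OldDefaultShare {
      public static void main(String[] args) {
        long maxVmemOnTT = 1024L * 1024;   // tasktracker-wide limit in KB (1 GB)
        int mapSlots = 2, reduceSlots = 2; // slot counts assumed for the example
        // Pre-HADOOP-4439 behavior: default share = total / number of slots.
        long defaultSharePerTask = maxVmemOnTT / (mapSlots + reduceSlots);
        System.out.println(defaultSharePerTask + " KB per task"); // 262144 KB
      }
    }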

+ 4 - 36
src/mapred/org/apache/hadoop/mapred/JobConf.java

@@ -108,7 +108,7 @@ public class JobConf extends Configuration {
    * A value which if set for memory related configuration options,
    * indicates that the options are turned off.
    */
-  public static final long DISABLED_VIRTUAL_MEMORY_LIMIT = -1L;
+  static final long DISABLED_VIRTUAL_MEMORY_LIMIT = -1L;
   
   /**
    * Name of the queue to which jobs will be submitted, if no queue
@@ -1336,37 +1336,6 @@ public class JobConf extends Configuration {
     return get("job.local.dir");
   }
   
-  /**
-   * The maximum amount of virtual memory all tasks running on a
-   * tasktracker, including sub-processes they launch, can use.
-   *  
-   * This value is used to compute the amount of free memory 
-   * available for tasks. Any task scheduled on this tasktracker is 
-   * guaranteed and constrained to use a share of this amount. Any task 
-   * exceeding its share will be killed.
-   * 
-   * If set to {@link #DISABLED_VIRTUAL_MEMORY_LIMIT}, this functionality 
-   * is disabled.
-   * 
-   * @return maximum amount of virtual memory in kilobytes to divide among
-   * @see #getMaxVirtualMemoryForTask()
-   */
-  public long getMaxVirtualMemoryForTasks() {
-    return getLong("mapred.tasktracker.tasks.maxmemory", 
-                      DISABLED_VIRTUAL_MEMORY_LIMIT);
-  }
-  
-  /**
-   * Set the maximum amount of virtual memory all tasks running on a
-   * tasktracker, including sub-processes they launch, can use.
-   * 
-   * @param vmem maximum amount of virtual memory in kilobytes that can be used.
-   * @see #getMaxVirtualMemoryForTasks()
-   */
-  public void setMaxVirtualMemoryForTasks(long vmem) {
-    setLong("mapred.tasktracker.tasks.maxmemory", vmem);
-  }
-  
   /**
    * The maximum amount of memory any task of this job will use.
    * 
@@ -1375,15 +1344,14 @@ public class JobConf extends Configuration {
    * or equal to this value.
    * 
    * If set to {@link #DISABLED_VIRTUAL_MEMORY_LIMIT}, tasks are assured 
-   * a memory limit on the tasktracker equal to
-   * mapred.tasktracker.tasks.maxmemory/number of slots. If the value of
+   * a memory limit set to mapred.task.default.maxmemory. If the value of
    * mapred.tasktracker.tasks.maxmemory is set to -1, this value is 
    * ignored.
    * 
    * @return The maximum amount of memory any task of this job will use, in kilobytes.
    * @see #getMaxVirtualMemoryForTasks()
    */
-  public long getMaxVirtualMemoryForTask() {
+  long getMaxVirtualMemoryForTask() {
     return getLong("mapred.task.maxmemory", DISABLED_VIRTUAL_MEMORY_LIMIT);
   }
   
@@ -1394,7 +1362,7 @@ public class JobConf extends Configuration {
    * can use.
    * @see #getMaxVirtualMemoryForTask()
    */
-  public void setMaxVirtualMemoryForTask(long vmem) {
+  void setMaxVirtualMemoryForTask(long vmem) {
     setLong("mapred.task.maxmemory", vmem);
   }
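
With getMaxVirtualMemoryForTask and setMaxVirtualMemoryForTask narrowed to package-private here, code outside org.apache.hadoop.mapred would set the raw key instead, as the updated tests later in this diff do. A hedged sketch; the class name and value are illustrative:

    import org.apache.hadoop.mapred.JobConf;

    public class SetTaskMemory {
      public static void main(String[] args) {
        JobConf conf = new JobConf();
        // Mirrors the setLong(...) calls in the test changes below.
        conf.setLong("mapred.task.maxmemory", 512L * 1024); // 512 MB, in KB
      }
    }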
   

+ 7 - 16
src/mapred/org/apache/hadoop/mapred/TaskTracker.java

@@ -197,8 +197,6 @@ public class TaskTracker
   private boolean taskMemoryManagerEnabled = false;
   private long maxVirtualMemoryForTasks 
                                     = JobConf.DISABLED_VIRTUAL_MEMORY_LIMIT;
-  private long defaultMemoryPerTask = JobConf.DISABLED_VIRTUAL_MEMORY_LIMIT;
-  
   
   /**
    * the minimum interval between jobtracker polls
@@ -471,13 +469,9 @@ public class TaskTracker
                              "Map-events fetcher for all reduce tasks " + "on " + 
                              taskTrackerName);
     mapEventsFetcher.start();
-    maxVirtualMemoryForTasks = fConf.getMaxVirtualMemoryForTasks();
-    if (maxVirtualMemoryForTasks != 
-                JobConf.DISABLED_VIRTUAL_MEMORY_LIMIT) {
-      defaultMemoryPerTask = maxVirtualMemoryForTasks /
-                                    (maxCurrentMapTasks + 
-                                        maxCurrentReduceTasks);
-    }
+    maxVirtualMemoryForTasks = fConf.
+                                  getLong("mapred.tasktracker.tasks.maxmemory",
+                                          JobConf.DISABLED_VIRTUAL_MEMORY_LIMIT);
     this.indexCache = new IndexCache(this.fConf);
     // start the taskMemoryManager thread only if enabled
     setTaskMemoryManagerEnabledFlag();
@@ -785,10 +779,6 @@ public class TaskTracker
     launchTaskForJob(tip, new JobConf(rjob.jobFile)); 
   }
 
-  private long getDefaultMemoryPerTask() {
-    return defaultMemoryPerTask;
-  }
-
   private void launchTaskForJob(TaskInProgress tip, JobConf jobConf) throws IOException{
     synchronized (tip) {
       try {
@@ -1180,7 +1170,7 @@ public class TaskTracker
       LOG.debug("Setting amount of free virtual memory for the new task: " +
                     freeVirtualMem);
       status.getResourceStatus().setFreeVirtualMemory(freeVirtualMem);
-      status.getResourceStatus().setDefaultVirtualMemoryPerTask(getDefaultMemoryPerTask());      
+      status.getResourceStatus().setTotalMemory(maxVirtualMemoryForTasks);
     }
       
     //
@@ -1279,10 +1269,11 @@ public class TaskTracker
    * @param conf
    * @return the memory allocated for the TIP.
    */
-  public long getMemoryForTask(JobConf conf) {
+  long getMemoryForTask(JobConf conf) {
     long memForTask = conf.getMaxVirtualMemoryForTask();
     if (memForTask == JobConf.DISABLED_VIRTUAL_MEMORY_LIMIT) {
-      memForTask = this.getDefaultMemoryPerTask();
+      memForTask = fConf.getLong("mapred.task.default.maxmemory",
+                          512*1024*1024L);
     }
     return memForTask;
   }  
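
The new getMemoryForTask above is a two-level lookup: the per-job key if set, otherwise the tasktracker-wide default. A condensed, standalone restatement; the 512*1024*1024L default is copied from the hunk, and whether it is intended as bytes or kilobytes is not stated in this diff:

    import org.apache.hadoop.conf.Configuration;

    public class MemoryForTaskSketch {
      static final long DISABLED = -1L; // JobConf.DISABLED_VIRTUAL_MEMORY_LIMIT

      static long getMemoryForTask(Configuration jobConf, Configuration fConf) {
        long memForTask = jobConf.getLong("mapred.task.maxmemory", DISABLED);
        if (memForTask == DISABLED) {
          // Fall back to the cluster-wide default introduced by this commit.
          memForTask = fConf.getLong("mapred.task.default.maxmemory",
                                     512 * 1024 * 1024L);
        }
        return memForTask;
      }
    }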

+ 13 - 14
src/mapred/org/apache/hadoop/mapred/TaskTrackerStatus.java

@@ -55,12 +55,12 @@ class TaskTrackerStatus implements Writable {
   static class ResourceStatus implements Writable {
     
     private long freeVirtualMemory;
-    private long defaultVirtualMemoryPerTask;
+    private long totalMemory;
     private long availableSpace;
     
     ResourceStatus() {
       freeVirtualMemory = JobConf.DISABLED_VIRTUAL_MEMORY_LIMIT;
-      defaultVirtualMemoryPerTask = JobConf.DISABLED_VIRTUAL_MEMORY_LIMIT;
+      totalMemory = JobConf.DISABLED_VIRTUAL_MEMORY_LIMIT;
       availableSpace = Long.MAX_VALUE;
     }
     
@@ -87,25 +87,24 @@ class TaskTrackerStatus implements Writable {
     }
 
     /**
-     * Set the default amount of virtual memory per task.
-     * @param vmem amount of free virtual memory in kilobytes.
+     * Set the maximum amount of virtual memory on the tasktracker.
+     * @param vmem maximum amount of virtual memory on the tasktracker in kilobytes.
      */
-    void setDefaultVirtualMemoryPerTask(long defaultVmem) {
-      defaultVirtualMemoryPerTask = defaultVmem;
+    void setTotalMemory(long totalMem) {
+      totalMemory = totalMem;
     }
     
     /**
-     * Get the default amount of virtual memory per task.
+     * Get the maximum amount of virtual memory on the tasktracker.
      * 
-     * This amount will be returned if a task's job does not specify any
-     * virtual memory itself. If this is 
+     * If this is
      * {@link JobConf.DISABLED_VIRTUAL_MEMORY_LIMIT}, it should be ignored 
      * and not used in any computation.
      * 
-     * @return default amount of virtual memory per task in kilobytes. 
+     * @return maximum amount of virtual memory on the tasktracker in kilobytes. 
      */    
-    long getDefaultVirtualMemoryPerTask() {
-      return defaultVirtualMemoryPerTask;
+    long getTotalMemory() {
+      return totalMemory;
     }
     
     void setAvailableSpace(long availSpace) {
@@ -122,13 +121,13 @@ class TaskTrackerStatus implements Writable {
     
     public void write(DataOutput out) throws IOException {
       WritableUtils.writeVLong(out, freeVirtualMemory);
-      WritableUtils.writeVLong(out, defaultVirtualMemoryPerTask);
+      WritableUtils.writeVLong(out, totalMemory);
       WritableUtils.writeVLong(out, availableSpace);
     }
     
     public void readFields(DataInput in) throws IOException {
      freeVirtualMemory = WritableUtils.readVLong(in);
-      defaultVirtualMemoryPerTask = WritableUtils.readVLong(in);
+      totalMemory = WritableUtils.readVLong(in);
      availableSpace = WritableUtils.readVLong(in);
     }
   }
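
Because ResourceStatus serializes plain VLongs with no field tags, reader and writer must agree on order: freeVirtualMemory, then totalMemory, then availableSpace. A minimal round-trip sketch using the same WritableUtils calls; the values are placeholders:

    import java.io.*;
    import org.apache.hadoop.io.WritableUtils;

    public class ResourceStatusRoundTrip {
      public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        WritableUtils.writeVLong(out, -1L);              // freeVirtualMemory
        WritableUtils.writeVLong(out, 2L * 1024 * 1024); // totalMemory
        WritableUtils.writeVLong(out, Long.MAX_VALUE);   // availableSpace

        DataInput in = new DataInputStream(
            new ByteArrayInputStream(bytes.toByteArray()));
        System.out.println(WritableUtils.readVLong(in)); // freeVirtualMemory
        System.out.println(WritableUtils.readVLong(in)); // totalMemory
        System.out.println(WritableUtils.readVLong(in)); // availableSpace
      }
    }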

+ 29 - 48
src/test/org/apache/hadoop/mapred/TestHighRAMJobs.java

@@ -78,7 +78,7 @@ public class TestHighRAMJobs extends TestCase {
       TestHighRAMJobs.LOG.info("status = " + status.getResourceStatus().getFreeVirtualMemory());
 
       long initialFreeMemory = getConf().getLong("initialFreeMemory", 0L);
-      long memoryPerTaskOnTT = getConf().getLong("memoryPerTaskOnTT", 0L);
+      long totalMemoryOnTT = getConf().getLong("totalMemoryOnTT", 0L);
 
       if (isFirstTime) {
         isFirstTime = false;
@@ -87,19 +87,15 @@ public class TestHighRAMJobs extends TestCase {
           message = "Initial memory expected = " + initialFreeMemory
                       + " reported = " + status.getResourceStatus().getFreeVirtualMemory();
         }
-        if (memoryPerTaskOnTT != status.getResourceStatus().getDefaultVirtualMemoryPerTask()) {
+        if (totalMemoryOnTT != status.getResourceStatus().getTotalMemory()) {
           hasPassed = false;
-          message = "Memory per task on TT expected = " + memoryPerTaskOnTT
+          message = "Total memory on TT expected = " + totalMemoryOnTT
                       + " reported = " 
-                      + status.getResourceStatus().getDefaultVirtualMemoryPerTask();
+                      + status.getResourceStatus().getTotalMemory();
         }
       } else if (initialFreeMemory != DISABLED_VIRTUAL_MEMORY_LIMIT) {
         
-        long memoryPerTask = memoryPerTaskOnTT; // by default
-        if (getConf().getLong("memoryPerTask", 0L) != 
-                                            DISABLED_VIRTUAL_MEMORY_LIMIT) {
-          memoryPerTask = getConf().getLong("memoryPerTask", 0L);
-        }
+        long memoryPerTask = getConf().getLong("memoryPerTask", 0L);
           
         long expectedFreeMemory = 0;
         int runningTaskCount = status.countMapTasks() +
@@ -127,8 +123,7 @@ public class TestHighRAMJobs extends TestCase {
   public void testDefaultValuesForHighRAMJobs() throws Exception {
     long defaultMemoryLimit = DISABLED_VIRTUAL_MEMORY_LIMIT;
     try {
-      setUpCluster(defaultMemoryLimit, defaultMemoryLimit, 
-                    defaultMemoryLimit, null);
+      setUpCluster(defaultMemoryLimit, defaultMemoryLimit, null);
       runJob(defaultMemoryLimit, DEFAULT_MAP_SLEEP_TIME, 
           DEFAULT_REDUCE_SLEEP_TIME, DEFAULT_SLEEP_JOB_MAP_COUNT, 
           DEFAULT_SLEEP_JOB_REDUCE_COUNT);
@@ -142,35 +137,15 @@ public class TestHighRAMJobs extends TestCase {
    * when the number of slots is non-default.
    */
   public void testDefaultMemoryPerTask() throws Exception {
-    long maxVmem = 1024*1024*1024L;
+    long maxVmem = 2*1024*1024*1024L;
     JobConf conf = new JobConf();
-    conf.setInt("mapred.tasktracker.map.tasks.maximum", 1);
-    conf.setInt("mapred.tasktracker.reduce.tasks.maximum", 1);
-    // change number of slots to 2.
-    long defaultMemPerTaskOnTT = maxVmem / 2;
+    conf.setInt("mapred.tasktracker.map.tasks.maximum", 2);
+    conf.setInt("mapred.tasktracker.reduce.tasks.maximum", 2);
+    // set a different value for the default memory per task
+    long defaultMemPerTask = 256*1024*1024L; 
     try {
-      setUpCluster(maxVmem, defaultMemPerTaskOnTT, 
-                    DISABLED_VIRTUAL_MEMORY_LIMIT, conf);
-      runJob(DISABLED_VIRTUAL_MEMORY_LIMIT, DEFAULT_MAP_SLEEP_TIME,
-              DEFAULT_REDUCE_SLEEP_TIME, DEFAULT_SLEEP_JOB_MAP_COUNT,
-              DEFAULT_SLEEP_JOB_REDUCE_COUNT);
-      verifyTestResults();
-    } finally {
-      tearDownCluster();
-    }
-  }
-  
-  /* Test that verifies configured value for free memory is
-   * reported correctly. The test does NOT configure a value for
-   * memory per task. Hence, it also verifies that the default value
-   * per task on the TT is calculated correctly.
-   */
-  public void testConfiguredValueForFreeMemory() throws Exception {
-    long maxVmem = 1024*1024*1024L;
-    long defaultMemPerTaskOnTT = maxVmem/4; // 4 = default number of slots.
-    try {
-      setUpCluster(maxVmem, defaultMemPerTaskOnTT,
-                    DISABLED_VIRTUAL_MEMORY_LIMIT, null);
+      setUpCluster(maxVmem, defaultMemPerTask, 
+                    defaultMemPerTask, conf);
       runJob(DISABLED_VIRTUAL_MEMORY_LIMIT, "10000",
               DEFAULT_REDUCE_SLEEP_TIME, DEFAULT_SLEEP_JOB_MAP_COUNT,
               DEFAULT_SLEEP_JOB_REDUCE_COUNT);
@@ -182,15 +157,14 @@ public class TestHighRAMJobs extends TestCase {
   
   public void testHighRAMJob() throws Exception {
     long maxVmem = 1024*1024*1024L;
-    long defaultMemPerTaskOnTT = maxVmem/4; // 4 = default number of slots.
+    //long defaultMemPerTaskOnTT = maxVmem/4; // 4 = default number of slots.
     /* Set a HIGH RAM requirement for a job. As 4 is the
      * default number of slots, we set up the memory limit
      * per task to be more than 25%. 
      */
     long maxVmemPerTask = maxVmem/3;
     try {
-      setUpCluster(maxVmem, defaultMemPerTaskOnTT,
-                    maxVmemPerTask, null);
+      setUpCluster(maxVmem, maxVmemPerTask, null);
       /* set up sleep limits higher, so the scheduler will see varying
        * number of running tasks at a time. Also modify the number of
        * map tasks so we test the iteration over more than one task.
@@ -203,20 +177,27 @@ public class TestHighRAMJobs extends TestCase {
     }
   }
   
-  private void setUpCluster(long initialFreeMemory, long memoryPerTaskOnTT,
-                            long memoryPerTask, JobConf conf) 
-                              throws Exception {
+  private void setUpCluster(long totalMemoryOnTT, long memoryPerTask,
+                              JobConf conf) throws Exception {
+    this.setUpCluster(totalMemoryOnTT, 512*1024*1024L, 
+                          memoryPerTask, conf);
+  }
+  
+  private void setUpCluster(long totalMemoryOnTT, long defaultMemoryPerTask,
+                              long memoryPerTask, JobConf conf)
+                                throws Exception {
     if (conf == null) {
       conf = new JobConf();
     }
     conf.setClass("mapred.jobtracker.taskScheduler", 
         TestHighRAMJobs.FakeTaskScheduler.class,
         TaskScheduler.class);
-    if (initialFreeMemory != -1L) {
-      conf.setMaxVirtualMemoryForTasks(initialFreeMemory);  
+    if (totalMemoryOnTT != -1L) {
+      conf.setLong("mapred.tasktracker.tasks.maxmemory", totalMemoryOnTT);  
     }
-    conf.setLong("initialFreeMemory", initialFreeMemory);
-    conf.setLong("memoryPerTaskOnTT", memoryPerTaskOnTT);
+    conf.setLong("mapred.task.default.maxmemory", defaultMemoryPerTask);
+    conf.setLong("initialFreeMemory", totalMemoryOnTT);
+    conf.setLong("totalMemoryOnTT", totalMemoryOnTT);
     conf.setLong("memoryPerTask", memoryPerTask);
     miniDFSCluster = new MiniDFSCluster(conf, 1, true, null);
     FileSystem fileSys = miniDFSCluster.getFileSystem();

+ 3 - 2
src/test/org/apache/hadoop/mapred/TestTaskTrackerMemoryManager.java

@@ -104,7 +104,8 @@ public class TestTaskTrackerMemoryManager extends TestCase {
     // Start cluster with proper configuration.
     JobConf fConf = new JobConf();
 
-    fConf.setMaxVirtualMemoryForTasks(Long.valueOf(10000000000L)); // Fairly large value for WordCount to succeed
+    fConf.setLong("mapred.tasktracker.tasks.maxmemory", 
+                      Long.valueOf(10000000000L)); // Fairly large value for WordCount to succeed
     startCluster(fConf);
 
     // Set up job.
@@ -178,7 +179,7 @@ public class TestTaskTrackerMemoryManager extends TestCase {
 
     // Start cluster with proper configuration.
     JobConf fConf = new JobConf();
-    fConf.setMaxVirtualMemoryForTasks(Long.valueOf(100000));
+    fConf.setLong("mapred.tasktracker.tasks.maxmemory", Long.valueOf(100000));
     fConf.set("mapred.tasktracker.taskmemorymanager.monitoring-interval", String.valueOf(300));
             //very small value, so that no task escapes to successful completion.
     startCluster(fConf);

Some files were not shown because too many files changed in this diff