HADOOP-5736. Update the capacity scheduler documentation for features like memory based scheduling, job initialization and removal of pre-emption. Contributed by Sreekanth Ramakrishnan.

git-svn-id: https://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.20@771623 13f79535-47bb-0310-9956-ffa450edef68
Hemanth Yamijala, 16 years ago
parent commit 3045710ce4

+ 4 - 0
CHANGES.txt

@@ -13,6 +13,10 @@ Release 0.20.1 - Unreleased
 
     HADOOP-5711. Change Namenode file close log to info. (szetszwo)
 
+    HADOOP-5736. Update the capacity scheduler documentation for features
+    like memory based scheduling, job initialization and removal of pre-emption.
+    (Sreekanth Ramakrishnan via yhemanth)
+
  OPTIMIZATIONS
 
  BUG FIXES

+ 6 - 9
conf/capacity-scheduler.xml.template

@@ -60,16 +60,13 @@
  <property>
    <name>mapred.capacity-scheduler.task.default-pmem-percentage-in-vmem</name>
    <value>-1</value>
-    <description>If mapred.task.maxpmem is set to -1, this configuration will
-      be used to calculate job's physical memory requirements as a percentage of
-      the job's virtual memory requirements set via mapred.task.maxvmem. This
-      property thus provides default value of physical memory for job's that
-      don't explicitly specify physical memory requirements.
+    <description>A percentage (float) of the default VMEM limit for jobs
+      (mapred.task.default.maxvmem). This is the default RAM task-limit
+      associated with a task. Unless overridden by a job's setting, this
+      number defines the RAM task-limit.
 
-      If not explicitly set to a valid value, scheduler will not consider
-      physical memory for scheduling even if virtual memory based scheduling is
-      enabled(by setting valid values for both mapred.task.default.maxvmem and
-      mapred.task.limit.maxvmem).
+      If this property is missing, or set to an invalid value, scheduling
+      based on physical memory (RAM) is disabled.
    </description>
  </property>
 

+ 155 - 76
src/docs/src/documentation/content/xdocs/capacity_scheduler.xml

@@ -28,7 +28,9 @@
    <section>
      <title>Purpose</title>
      
-      <p>This document describes the Capacity Scheduler, a pluggable Map/Reduce scheduler for Hadoop which provides a way to share large clusters.</p>
+      <p>This document describes the Capacity Scheduler, a pluggable 
+      Map/Reduce scheduler for Hadoop which provides a way to share 
+      large clusters.</p>
    </section>
    
    <section>
@@ -40,19 +42,17 @@
          Support for multiple queues, where a job is submitted to a queue.
        </li>
        <li>
-          Queues are guaranteed a fraction of the capacity of the grid (their 
- 	      'guaranteed capacity') in the sense that a certain capacity of 
- 	      resources will be at their disposal. All jobs submitted to a 
- 	      queue will have access to the capacity guaranteed to the queue.
+          Queues are allocated a fraction of the capacity of the grid in the 
+          sense that a certain capacity of resources will be at their 
+          disposal. All jobs submitted to a queue will have access to the 
+          capacity allocated to the queue.
        </li>
        <li>
-          Free resources can be allocated to any queue beyond its guaranteed 
-          capacity. These excess allocated resources can be reclaimed and made 
-          available to another queue in order to meet its capacity guarantee.
-        </li>
-        <li>
-          The scheduler guarantees that excess resources taken from a queue 
-          will be restored to it within N minutes of its need for them.
+          Free resources can be allocated to any queue beyond its capacity. 
+          When there is demand for these resources from queues running below 
+          capacity at a future point in time, as tasks scheduled on these 
+          resources complete, they will be assigned to jobs on queues 
+          running below the capacity.
        </li>
        <li>
          Queues optionally support job priorities (disabled by default).
@@ -60,7 +60,9 @@
        <li>
          Within a queue, jobs with higher priority will have access to the 
          queue's resources before jobs with lower priority. However, once a 
-          job is running, it will not be preempted for a higher priority job.
+          job is running, it will not be preempted for a higher priority job,
+          though new tasks from the higher priority job will be 
+          preferentially scheduled.
        </li>
        <li>
          In order to prevent one or more users from monopolizing its 
@@ -83,59 +85,34 @@
      <p>Note that many of these steps can be, and will be, enhanced over time
      to provide better algorithms.</p>
      
-      <p>Whenever a TaskTracker is free, the Capacity Scheduler first picks a 
-      queue that needs to reclaim any resources the earliest (this is a queue
-      whose resources were temporarily being used by some other queue and now
-      needs access to those resources). If no such queue is found, it then picks
+      <p>Whenever a TaskTracker is free, the Capacity Scheduler picks 
      a queue which has most free space (whose ratio of # of running slots to 
-      guaranteed capacity is the lowest).</p>
+      capacity is the lowest).</p>
      
-      <p>Once a queue is selected, the scheduler picks a job in the queue. Jobs
+      <p>Once a queue is selected, the Scheduler picks a job in the queue. Jobs
      are sorted based on when they're submitted and their priorities (if the 
      queue supports priorities). Jobs are considered in order, and a job is 
      selected if its user is within the user-quota for the queue, i.e., the 
      user is not already using queue resources above his/her limit. The 
-      scheduler also makes sure that there is enough free memory in the 
+      Scheduler also makes sure that there is enough free memory in the 
      TaskTracker to run the job's task, in case the job has special memory
      requirements.</p>
      
-      <p>Once a job is selected, the scheduler picks a task to run. This logic 
+      <p>Once a job is selected, the Scheduler picks a task to run. This logic 
      to pick a task remains unchanged from earlier versions.</p> 
      
    </section>
    
-    <section>
-      <title>Reclaiming capacity</title>
-
-	  <p>Periodically, the scheduler determines:</p>
-	  <ul>
-	    <li>
-	      if a queue needs to reclaim capacity. This happens when a queue has
-	      at least one task pending and part of its guaranteed capacity is 
-	      being used by some other queue. If this happens, the scheduler notes
-	      the amount of resources it needs to reclaim for this queue within a 
-	      specified period of time (the reclaim time). 
-	    </li>
-	    <li>
-	      if a queue has not received all the resources it needed to reclaim,
-	      and its reclaim time is about to expire. In this case, the scheduler
-	      needs to kill tasks from queues running over capacity. This it does
-	      by killing the tasks that started the latest.
-	    </li>
-	  </ul>   
-
-    </section>
-
     <section>
      <title>Installation</title>
      
-        <p>The capacity scheduler is available as a JAR file in the Hadoop
+        <p>The Capacity Scheduler is available as a JAR file in the Hadoop
        tarball under the <em>contrib/capacity-scheduler</em> directory. The name of 
        the JAR file would be on the lines of hadoop-*-capacity-scheduler.jar.</p>
-        <p>You can also build the scheduler from source by executing
+        <p>You can also build the Scheduler from source by executing
        <em>ant package</em>, in which case it would be available under
        <em>build/contrib/capacity-scheduler</em>.</p>
-        <p>To run the capacity scheduler in your Hadoop installation, you need 
+        <p>To run the Capacity Scheduler in your Hadoop installation, you need 
        to put it on the <em>CLASSPATH</em>. The easiest way is to copy the 
        <code>hadoop-*-capacity-scheduler.jar</code> 
        to <code>HADOOP_HOME/lib</code>. Alternatively, you can modify 
@@ -147,9 +124,9 @@
      <title>Configuration</title>
 
      <section>
-        <title>Using the capacity scheduler</title>
+        <title>Using the Capacity Scheduler</title>
        <p>
-          To make the Hadoop framework use the capacity scheduler, set up
+          To make the Hadoop framework use the Capacity Scheduler, set up
          the following property in the site configuration:</p>
          <table>
            <tr>
@@ -167,7 +144,7 @@
        <title>Setting up queues</title>
        <p>
          You can define multiple queues to which users can submit jobs with
-          the capacity scheduler. To define multiple queues, you should edit
+          the Capacity Scheduler. To define multiple queues, you should edit
          the site configuration for Hadoop and modify the
          <em>mapred.queue.names</em> property.
        </p>
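As a hedged illustration of the hunk above, a site configuration defining two queues might look like the following (the queue names here are examples, not part of this commit):

```xml
<!-- Illustrative sketch for the Hadoop site configuration.
     Queue names "default" and "research" are examples only. -->
<property>
  <name>mapred.queue.names</name>
  <value>default,research</value>
</property>
```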
@@ -185,8 +162,8 @@
      <section>
        <title>Configuring properties for queues</title>
 
-        <p>The capacity scheduler can be configured with several properties
-        for each queue that control the behavior of the scheduler. This
+        <p>The Capacity Scheduler can be configured with several properties
+        for each queue that control the behavior of the Scheduler. This
        configuration is in the <em>conf/capacity-scheduler.xml</em>. By
        default, the configuration is set up for one queue, named 
        <em>default</em>.</p>
@@ -194,10 +171,10 @@
        configuration, you should use the property name as
        <em>mapred.capacity-scheduler.queue.&lt;queue-name&gt;.&lt;property-name&gt;</em>.
        </p>
-        <p>For example, to define the property <em>guaranteed-capacity</em>
+        <p>For example, to define the property <em>capacity</em>
        for queue named <em>research</em>, you should specify the property
        name as 
-        <em>mapred.capacity-scheduler.queue.research.guaranteed-capacity</em>.
+        <em>mapred.capacity-scheduler.queue.research.capacity</em>.
        </p>
 
        <p>The properties defined for queues and their descriptions are
@@ -205,15 +182,10 @@
 
 
        <table>
          <tr><th>Name</th><th>Description</th></tr>
-          <tr><td>mapred.capacity-scheduler.queue.&lt;queue-name&gt;.guaranteed-capacity</td>
-          	<td>Percentage of the number of slots in the cluster that are
-          	guaranteed to be available for jobs in this queue. 
-          	The sum of guaranteed capacities for all queues should be less 
-          	than or equal 100.</td>
-          </tr>
-          <tr><td>mapred.capacity-scheduler.queue.&lt;queue-name&gt;.reclaim-time-limit</td>
-          	<td>The amount of time, in seconds, before which resources 
-          	distributed to other queues will be reclaimed.</td>
+          <tr><td>mapred.capacity-scheduler.queue.&lt;queue-name&gt;.capacity</td>
+          	<td>Percentage of the number of slots in the cluster that are made 
+            available for jobs in this queue. The sum of capacities 
+            for all queues should be less than or equal to 100.</td>
          </tr>
          <tr><td>mapred.capacity-scheduler.queue.&lt;queue-name&gt;.supports-priority</td>
          	<td>If true, priorities of jobs will be taken into account in scheduling 
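Putting the renamed per-queue properties from this hunk together, a sketch of a queue entry in capacity-scheduler.xml might look as follows (the "research" queue name and the value 20 are illustrative):

```xml
<!-- Illustrative sketch for conf/capacity-scheduler.xml.
     Queue name and values are examples only. -->
<property>
  <name>mapred.capacity-scheduler.queue.research.capacity</name>
  <value>20</value>
</property>
<property>
  <name>mapred.capacity-scheduler.queue.research.supports-priority</name>
  <value>true</value>
</property>
```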
@@ -236,27 +208,133 @@
      </section>
      
      <section>
-        <title>Configuring the capacity scheduler</title>
-        <p>The capacity scheduler's behavior can be controlled through the 
-          following properties. 
+        <title>Memory management</title>
+      
+        <p>The Capacity Scheduler supports scheduling of tasks on a
+        <code>TaskTracker</code> (TT) based on a job's memory requirements
+        and the availability of RAM and Virtual Memory (VMEM) on the TT node.
+        See the <a href="mapred_tutorial.html#Memory+monitoring">Hadoop 
+        Map/Reduce tutorial</a> for details on how the TT monitors
+        memory usage.</p>
+        <p>Currently, memory-based scheduling is supported only on the
+        Linux platform.</p>
+        <p>Memory-based scheduling works as follows:</p>
+        <ol>
+          <li>If any one of the three config parameters 
+          <code>mapred.tasktracker.vmem.reserved</code>, 
+          <code>mapred.task.default.maxvmem</code>, or
+          <code>mapred.task.limit.maxvmem</code> is missing, or is set
+          to -1, memory-based scheduling is disabled, just as memory
+          monitoring is disabled for a TT. These
+          config parameters are described in the 
+          <a href="mapred_tutorial.html#Memory+monitoring">Hadoop Map/Reduce 
+          tutorial</a>. The value of  
+          <code>mapred.tasktracker.vmem.reserved</code> is 
+          obtained from the TT via its heartbeat. 
+          </li>
+          <li>If all three mandatory parameters are set, the Scheduler 
+          enables VMEM-based scheduling. First, the Scheduler computes the free
+          VMEM on the TT. This is the difference between the available VMEM on the
+          TT (the node's total VMEM minus the offset, both of which are sent by 
+          the TT on each heartbeat) and the sum of VMEM already allocated to 
+          running tasks (i.e., the sum of the VMEM task-limits). Next, the Scheduler
+          looks at the VMEM requirements for the job that's first in line to 
+          run. If the job's VMEM requirements are less than the available VMEM on 
+          the node, the job's task can be scheduled. If not, the Scheduler 
+          ensures that the TT does not get a task to run (provided the job 
+          has tasks to run). This way, the Scheduler ensures that jobs with 
+          high memory requirements are not starved, as eventually, the TT 
+          will have enough VMEM available. If the high-mem job does not have 
+          any task to run, the Scheduler moves on to the next job. 
+          </li>
+          <li>In addition to VMEM, the Capacity Scheduler can also consider 
+          RAM on the TT node. RAM is considered the same way as VMEM. TTs report
+          the total RAM available on their node, and an offset. If both are
+          set, the Scheduler computes the available RAM on the node. Next, 
+          the Scheduler figures out the RAM requirements of the job, if any. 
+          As with VMEM, users can optionally specify a RAM limit for their job
+          (<code>mapred.task.maxpmem</code>, described in the Map/Reduce 
+          tutorial). The Scheduler also maintains a limit for this value 
+          (<code>mapred.capacity-scheduler.task.default-pmem-percentage-in-vmem</code>, 
+          described below). All three of these values must be set for the 
+          Scheduler to schedule tasks based on RAM constraints.
+          </li>
+          <li>The Scheduler ensures that jobs cannot ask for RAM or VMEM higher
+          than the configured limits. If a job does, it is failed at
+          submission time. 
+          </li>
+        </ol>
+        
+        <p>As described above, the additional scheduler-based config 
+        parameters are as follows:</p>
+
+        <table>
+          <tr><th>Name</th><th>Description</th></tr>
+          <tr><td>mapred.capacity-scheduler.task.default-pmem-percentage-in-vmem</td>
+          	<td>A percentage of the default VMEM limit for jobs
+          	(<code>mapred.task.default.maxvmem</code>). This is the default 
+          	RAM task-limit associated with a task. Unless overridden by a 
+          	job's setting, this number defines the RAM task-limit.</td>
+          </tr>
+          <tr><td>mapred.capacity-scheduler.task.limit.maxpmem</td>
+          <td>An upper limit on the maximum physical memory that a job
+           can specify. If a job requires more physical memory than this
+           limit, it is rejected.</td>
+          </tr>
+        </table>
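A hedged sketch of these two scheduler parameters in capacity-scheduler.xml follows; the values (50% and 2 GB) are illustrative, not defaults from this commit:

```xml
<!-- Illustrative sketch for conf/capacity-scheduler.xml. Values are examples. -->
<property>
  <name>mapred.capacity-scheduler.task.default-pmem-percentage-in-vmem</name>
  <value>50.0</value>
</property>
<property>
  <name>mapred.capacity-scheduler.task.limit.maxpmem</name>
  <!-- 2 GB, in bytes -->
  <value>2147483648</value>
</property>
```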
+      </section>
+   <section>
+        <title>Job Initialization Parameters</title>
+        <p>The Capacity Scheduler lazily initializes jobs before they are
+        scheduled, to reduce the memory footprint on the JobTracker.
+        The following parameters control the laziness of job
+        initialization and can be configured in capacity-scheduler.xml.
        </p>
+        
        <table>
+          <tr><th>Name</th><th>Description</th></tr>
          <tr>
-          <th>Name</th><th>Description</th>
+            <td>
+              mapred.capacity-scheduler.queue.&lt;queue-name&gt;.maximum-initialized-jobs-per-user
+            </td>
+            <td>
+              The maximum number of jobs allowed to be pre-initialized for
+              a particular user in the queue. Once a job is scheduled, i.e.,
+              it starts running, it is no longer considered when the
+              Scheduler computes the maximum number of jobs a user is
+              allowed to initialize. 
+            </td>
          </tr>
          <tr>
-          <td>mapred.capacity-scheduler.reclaimCapacity.interval</td>
-          <td>The time interval, in seconds, between which the scheduler 
-          periodically determines whether capacity needs to be reclaimed for 
-          any queue. The default value is 5 seconds.
-          </td>
+            <td>
+              mapred.capacity-scheduler.init-poll-interval
+            </td>
+            <td>
+              The interval, in milliseconds, at which the Scheduler polls
+              its job queues for jobs to be initialized.
+            </td>
+          </tr>
+          <tr>
+            <td>
+              mapred.capacity-scheduler.init-worker-threads
+            </td>
+            <td>
+              The number of worker threads used by the initialization
+              poller to initialize jobs in a set of queues. If this number
+              is equal to the number of job queues, each thread is assigned
+              jobs from one queue. If it is less than the number of queues,
+              a thread can get jobs from more than one queue, which it
+              initializes in a round-robin fashion. If it is greater than
+              the number of queues, the number of threads spawned equals
+              the number of job queues.
+            </td>
          </tr>
        </table>
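Taken together, the job initialization parameters described in this hunk can be sketched in capacity-scheduler.xml as follows; all values are illustrative:

```xml
<!-- Illustrative sketch for conf/capacity-scheduler.xml. Values are examples. -->
<property>
  <name>mapred.capacity-scheduler.queue.default.maximum-initialized-jobs-per-user</name>
  <value>2</value>
</property>
<property>
  <!-- poll for uninitialized jobs every 5 seconds -->
  <name>mapred.capacity-scheduler.init-poll-interval</name>
  <value>5000</value>
</property>
<property>
  <name>mapred.capacity-scheduler.init-worker-threads</name>
  <value>5</value>
</property>
```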
-        
-      </section>
-
+      </section>   
      <section>
-        <title>Reviewing the configuration of the capacity scheduler</title>
+        <title>Reviewing the configuration of the Capacity Scheduler</title>
        <p>
          Once the installation and configuration are completed, you can review
          it after starting the Map/Reduce cluster from the admin UI.
@@ -270,7 +348,8 @@
              Information</em> column against each queue.</li>
        </ul>
      </section>
-    </section>
+      
+   </section>
  </body>
  
 </document>

+ 114 - 0
src/docs/src/documentation/content/xdocs/cluster_setup.xml

@@ -463,6 +463,120 @@
          </section>
          
        </section>
+        <section>
+        <title>Memory monitoring</title>
+        <p>A <code>TaskTracker</code> (TT) can be configured to monitor memory 
+        usage of tasks it spawns, so that badly-behaved jobs do not bring 
+        down a machine due to excess memory consumption. With monitoring 
+        enabled, every task is assigned a task-limit for virtual memory (VMEM). 
+        In addition, every node is assigned a node-limit for VMEM usage. 
+        A TT ensures that a task is killed if it, and 
+        its descendants, use VMEM over the task's per-task limit. It also 
+        ensures that one or more tasks are killed if the sum total of VMEM 
+        usage by all tasks, and their descendants, crosses the node-limit.</p>
+        
+        <p>Users can, optionally, specify the VMEM task-limit per job. If no
+        such limit is provided, a default limit is used. A node-limit can be 
+        set per node.</p>   
+        <p>Currently, memory monitoring and management are supported
+        only on the Linux platform.</p>
+        <p>To enable monitoring for a TT, the 
+        following parameters all need to be set:</p> 
+
+        <table>
+          <tr><th>Name</th><th>Type</th><th>Description</th></tr>
+          <tr><td>mapred.tasktracker.vmem.reserved</td><td>long</td>
+            <td>A number, in bytes, that represents an offset. The total VMEM on 
+            the machine, minus this offset, is the VMEM node-limit for all 
+            tasks, and their descendants, spawned by the TT. 
+          </td></tr>
+          <tr><td>mapred.task.default.maxvmem</td><td>long</td>
+            <td>A number, in bytes, that represents the default VMEM task-limit 
+            associated with a task. Unless overridden by a job's setting, 
+            this number defines the VMEM task-limit.   
+          </td></tr>
+          <tr><td>mapred.task.limit.maxvmem</td><td>long</td>
+            <td>A number, in bytes, that represents the upper VMEM task-limit 
+            associated with a task. Users, when specifying a VMEM task-limit 
+            for their tasks, should not specify a limit which exceeds this amount. 
+          </td></tr>
+        </table>
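A hedged sketch of enabling monitoring with the three mandatory parameters in the site configuration follows; the byte values (1 GB reserved offset, 512 MB default task-limit, 2 GB upper task-limit) are illustrative only:

```xml
<!-- Illustrative sketch for the Hadoop site configuration. Values are
     examples; note mapred.task.default.maxvmem must not exceed
     mapred.task.limit.maxvmem, or monitoring is disabled. -->
<property>
  <name>mapred.tasktracker.vmem.reserved</name>
  <value>1073741824</value>
</property>
<property>
  <name>mapred.task.default.maxvmem</name>
  <value>536870912</value>
</property>
<property>
  <name>mapred.task.limit.maxvmem</name>
  <value>2147483648</value>
</property>
```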
+        
+        <p>The following parameters can also be configured.</p>
+
+        <table>
+          <tr><th>Name</th><th>Type</th><th>Description</th></tr>
+          <tr><td>mapred.tasktracker.taskmemorymanager.monitoring-interval</td>
+            <td>long</td>
+            <td>The time interval, in milliseconds, between which the TT 
+            checks for any memory violation. The default value is 5000 msec
+            (5 seconds). 
+          </td></tr>
+        </table>
+        
+        <p>Here's how the memory monitoring works for a TT.</p>
+        <ol>
+          <li>If one or more of the configuration parameters described 
+          above are missing or -1 is specified, memory monitoring is 
+          disabled for the TT.
+          </li>
+          <li>In addition, monitoring is disabled if 
+          <code>mapred.task.default.maxvmem</code> is greater than 
+          <code>mapred.task.limit.maxvmem</code>. 
+          </li>
+          <li>If a TT receives a task whose task-limit is set by the user
+          to a value larger than <code>mapred.task.limit.maxvmem</code>, it 
+          logs a warning but executes the task.
+          </li> 
+          <li>Periodically, the TT checks the following: 
+          <ul>
+            <li>If any task's current VMEM usage is greater than that task's
+            VMEM task-limit, the task is killed and the reason for killing 
+            the task is logged in the task diagnostics. Such a task is considered 
+            failed, i.e., the killing counts towards the task's failure count.
+            </li> 
+            <li>If the sum total of VMEM used by all tasks and descendants is 
+            greater than the node-limit, the TT kills enough tasks, in the
+            order of least progress made, till the overall VMEM usage falls
+            below the node-limit. Such killed tasks are not considered failed
+            and their killing does not count towards the tasks' failure counts.
+            </li>
+          </ul>
+          </li>
+        </ol>
+        
+        <p>Schedulers can choose to ease the monitoring pressure on the TT by 
+        preventing too many tasks from running on a node and by scheduling 
+        tasks only if the TT has enough VMEM free. In addition, Schedulers may 
+        choose to consider the physical memory (RAM) available on the node
+        as well. To enable Scheduler support, TTs report their memory settings 
+        to the JobTracker in every heartbeat. Before getting into details, 
+        consider the following additional memory-related parameters that can be 
+        configured to enable better scheduling:</p> 
+
+        <table>
+          <tr><th>Name</th><th>Type</th><th>Description</th></tr>
+          <tr><td>mapred.tasktracker.pmem.reserved</td><td>int</td>
+            <td>A number, in bytes, that represents an offset. The total 
+            physical memory (RAM) on the machine, minus this offset, is the 
+            recommended RAM node-limit. The RAM node-limit is a hint to a
+            Scheduler to schedule only as many tasks such that the sum 
+            total of their RAM requirements does not exceed this limit. 
+            RAM usage is not monitored by a TT.   
+          </td></tr>
+        </table>
+        
+        <p>A TT reports the following memory-related numbers in every 
+        heartbeat:</p>
+        <ul>
+          <li>The total VMEM available on the node.</li>
+          <li>The value of <code>mapred.tasktracker.vmem.reserved</code>,
+           if set.</li>
+          <li>The total RAM available on the node.</li> 
+          <li>The value of <code>mapred.tasktracker.pmem.reserved</code>,
+           if set.</li>
+         </ul>
+        </section>
         
         
        <section>
          <title>Slaves</title>

+ 19 - 1
src/docs/src/documentation/content/xdocs/mapred_tutorial.xml

@@ -1104,8 +1104,26 @@
        counters for a job- particularly relative to byte counts from the map
        and into the reduce- is invaluable to the tuning of these
        parameters.</p>
+        
+        <p>Users can choose to override the default limits of Virtual Memory and RAM 
+          enforced by the TaskTracker, if memory management is enabled. 
+          Users can set the following parameters per job:</p>
+           
+          <table>
+          <tr><th>Name</th><th>Type</th><th>Description</th></tr>
+          <tr><td><code>mapred.task.maxvmem</code></td><td>int</td>
+            <td>A number, in bytes, that represents the maximum Virtual Memory
+            task-limit for each task of the job. A task will be killed if 
+            it consumes more Virtual Memory than this number. 
+          </td></tr>
+          <tr><td><code>mapred.task.maxpmem</code></td><td>int</td>
+            <td>A number, in bytes, that represents the maximum RAM task-limit
+            for each task of the job. This number can be optionally used by
+            Schedulers to prevent over-scheduling of tasks on a node based 
+            on RAM needs.  
+          </td></tr>
+        </table>       
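The per-job overrides above can be sketched in the job's configuration like so; the values (1 GB VMEM, 512 MB RAM) are illustrative:

```xml
<!-- Illustrative sketch for a job configuration. Values are examples. -->
<property>
  <name>mapred.task.maxvmem</name>
  <value>1073741824</value>
</property>
<property>
  <name>mapred.task.maxpmem</name>
  <value>536870912</value>
</property>
```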
        </section>
-
        <section>
          <title>Map Parameters</title>