
HADOOP-4804. Provide Forrest documentation for the Fair Scheduler. Contributed by Sreekanth Ramakrishnan.

git-svn-id: https://svn.apache.org/repos/asf/hadoop/core/trunk@744894 13f79535-47bb-0310-9956-ffa450edef68
Hemanth Yamijala 16 years ago
parent
commit
f90ab67d82

+ 3 - 0
CHANGES.txt

@@ -503,6 +503,9 @@ Release 0.20.0 - Unreleased
     HADOOP-4944. A configuration file can include other configuration
     files. (Rama Ramasamy via dhruba)
 
+    HADOOP-4804. Provide Forrest documentation for the Fair Scheduler.
+    (Sreekanth Ramakrishnan via yhemanth)
+
   OPTIMIZATIONS
 
     HADOOP-3293. Fixes FileInputFormat to do provide locations for splits

+ 7 - 238
src/contrib/fairscheduler/README

@@ -18,243 +18,12 @@
 This package implements a fair scheduler for MapReduce jobs with additional
 support for guaranteed shares and job limits.
 
-Fair scheduling is a method of assigning resources to jobs such that all jobs
-get, on average, an equal share of resources over time. When there is a single
-job running, that job uses the entire cluster. When other jobs are submitted,
-tasks slots that free up are assigned to the new jobs, so that each job gets
-roughly the same amount of CPU time. Unlike the default Hadoop scheduler, which
-forms a queue of jobs, this lets short jobs finish in reasonable time while not
-starving long jobs. It is also a reasonable way to share a cluster between a
-number of users. Finally, fair sharing can also work with job priorities - the
-priorities are used as weights to determine the fraction of total compute time
-that each job should get.
+The functionality of this scheduler is described in the Forrest
+documentation at http://hadoop.apache.org/core/ or, alternatively, in the
+Hadoop release under $HADOOP_HOME/docs. To build the documentation yourself
+from source, run the following command in the downloaded source folder:
 
-The scheduler actually organizes jobs further into "pools", and shares resources
-fairly between these pools. By default, there is a separate pool for each user,
-so that each user gets the same share of the cluster no matter how many jobs
-they submit. However, it is also possible to set a job's pool based on the
-user's Unix group or any other jobconf property, such as the queue name
-property used by the Capacity Scheduler (JIRA HADOOP-3445). Within each pool,
-fair sharing is used to share capacity between the running jobs. Pools can also
-be given weights to share the cluster non-proportionally in the config file.
+ant docs -Dforrest.home=<path to forrest> -Djava5.home=<path to jdk5>
 
-In addition to providing fair sharing, the Fair Scheduler allows assigning
-guaranteed minimum shares to pools, which is useful for ensuring that certain
-users, groups or production applications always get sufficient resources.
-When a pool contains jobs, it gets at least its minimum share, but when the pool
-does not need its full guaranteed share, the excess is split between other
-running jobs. This lets the scheduler guarantee capacity for pools while
-utilizing resources efficiently when these pools don't contain jobs.
-
-The Fair Scheduler lets all jobs run by default, but it is also possible to
-limit the number of running jobs per user and per pool through the config
-file. This can be useful when a user must submit hundreds of jobs at once,
-or in general to improve performance if running too many jobs at once would
-cause too much intermediate data to be created or too much context-switching.
-Limiting the jobs does not cause any subsequently submitted jobs to fail, only
-to wait in the sheduler's queue until some of the user's earlier jobs finish.
-Jobs to run from each user/pool are chosen in order of priority and then submit
-time, as in the default FIFO scheduler in Hadoop.
-
-Finally, the fair scheduler provides several extension points where the basic
-functionality can be extended. For example, the weight calculation can be
-modified to give a priority boost to new jobs, implementing a "shortest job
-first" policy which reduces response times for interactive jobs even further. 
-
---------------------------------------------------------------------------------
-
-BUILDING:
-
-In HADOOP_HOME, run ant package to build Hadoop and its contrib packages.
-
---------------------------------------------------------------------------------
-
-INSTALLING:
-
-To run the fair scheduler in your Hadoop installation, you need to put it on
-the CLASSPATH. The easiest way is to copy the hadoop-*-fairscheduler.jar
-from HADOOP_HOME/build/contrib/fairscheduler to HADOOP_HOME/lib. Alternatively
-you can modify HADOOP_CLASSPATH to include this jar, in conf/hadoop-env.sh.
-
-You will also need to set the following property in the Hadoop config file
-(conf/mapred-site.xml) to have Hadoop use the fair scheduler:
-
-<property>
-  <name>mapred.jobtracker.taskScheduler</name>
-  <value>org.apache.hadoop.mapred.FairScheduler</value>
-</property>
-
-Once you restart the cluster, you can check that the fair scheduler is running
-by going to http://<jobtracker URL>/scheduler on the JobTracker's web UI. A
-"job scheduler administration" page should be visible there. This page is
-described in the Administration section.
-
---------------------------------------------------------------------------------
-
-CONFIGURING:
-
-The following properties can be set in mapred-site.xml to configure the
-scheduler:
-
-mapred.fairscheduler.allocation.file:
-    Specifies an absolute path to an XML file which contains the allocations
-    for each pool, as well as the per-pool and per-user limits on number of
-    running jobs. If this property is not provided, allocations are not used.
-    This file must be in XML format, and can contain three types of elements:
-    - pool elements, which may contain elements for minMaps, minReduces,
-      maxRunningJobs (limit the number of jobs from the pool to run at once),
-      and weight (to share the cluster non-proportionally with other pools).
-    - user elements, which may contain a maxRunningJobs to limit jobs. Note
-      that by default, there is a separate pool for each user, so these may not
-      be necessary; they are useful, however, if you create a pool per user
-      group or manually assign jobs to pools.
-    - A userMaxJobsDefault element, which sets the default running job limit
-      for any users whose limit is not specified.
-    The following example file shows how to create each type of element:
-        <?xml version="1.0"?>
-        <allocations>
-          <pool name="sample_pool">
-            <minMaps>5</minMaps>
-            <minReduces>5</minReduces>
-            <weight>2.0</weight>
-          </pool>
-          <user name="sample_user">
-            <maxRunningJobs>6</maxRunningJobs>
-          </user>
-          <userMaxJobsDefault>3</userMaxJobsDefault>
-        </allocations>
-    This example creates a pool sample_pool with a guarantee of 5 map slots
-    and 5 reduce slots. The pool also has a weight of 2.0, meaning it has a 2x
-    higher share of the cluster than other pools (the default weight is 1).
-    Finally, the example limits the number of running jobs per user
-    to 3, except for sample_user, who can run 6 jobs concurrently.
-    Any pool not defined in the allocations file will have no guaranteed
-    capacity and a weight of 1.0. Also, any pool or user with no max running
-    jobs set in the file will be allowed to run an unlimited number of jobs.
-
-mapred.fairscheduler.assignmultiple:
-    Allows the scheduler to assign both a map task and a reduce task on each
-    heartbeat, which improves cluster throughput when there are many small
-    tasks to run. Boolean value, default: false.
-
-mapred.fairscheduler.sizebasedweight:
-    Take into account job sizes in calculating their weights for fair sharing.
-    By default, weights are only based on job priorities. Setting this flag to
-    true will make them based on the size of the job (number of tasks needed)
-    as well, though not linearly (the weight will be proportional to the log
-    of the number of tasks needed). This lets larger jobs get larger fair
-    shares while still providing enough of a share to small jobs to let them
-    finish fast. Boolean value, default: false.
-
-mapred.fairscheduler.poolnameproperty:
-    Specify which jobconf property is used to determine the pool that a job
-    belongs in. String, default: user.name (i.e. one pool for each user).
-    Some other useful values to set this to are:
-    - group.name (to create a pool per Unix group).
-    - mapred.job.queue.name (the same property as the queue name in the
-    Capacity Scheduler, JIRA HADOOP-3445).
-
-mapred.fairscheduler.weightadjuster:
-    An extensibility point that lets you specify a class to adjust the weights
-    of running jobs. This class should implement the WeightAdjuster interface.
-    There is currently one example implementation - NewJobWeightBooster, which
-    increases the weight of jobs for the first 5 minutes of their lifetime
-    to let short jobs finish faster. To use it, set the weightadjuster property
-    to the full class name, org.apache.hadoop.mapred.NewJobWeightBooster.
-    NewJobWeightBooster itself provides two parameters for setting the duration
-    and boost factor - mapred.newjobweightbooster.factor (default 3) and
-    mapred.newjobweightbooster.duration (in milliseconds, default 300000 for 5
-    minutes).
-
-mapred.fairscheduler.loadmanager:
-    An extensibility point that lets you specify a class that determines
-    how many maps and reduces can run on a given TaskTracker. This class should
-    implement the LoadManager interface. By default the task caps in the Hadoop
-    config file are used, but this option could be used to make the load based
-    on available memory and CPU utilization for example.
-
-mapred.fairscheduler.taskselector:
-    An extensibility point that lets you specify a class that determines
-    which task from within a job to launch on a given tracker. This can be
-    used to change either the locality policy (e.g. keep some jobs within
-    a particular rack) or the speculative execution algorithm (select when to
-    launch speculative tasks). The default implementation uses Hadoop's
-    default algorithms from JobInProgress. 
-
---------------------------------------------------------------------------------
-
-ADMINISTRATION:
-
-The fair scheduler provides support for administration at runtime through
-two mechanisms. First, it is possible to modify pools' allocations and user
-and pool running job limits at runtime by editing the allocation config file.
-The scheduler will reload this file 10-15 seconds after it sees that it was
-modified. Second, current jobs, pools, and fair shares can be examined through
-the JobTracker's web interface, at http://<jobtracker URL>/scheduler. On this
-interface, it is also possible to modify jobs' priorities or move jobs from
-one pool to another and see the effects on the fair shares (this requires
-JavaScript). The following fields can be seen for each job on the web interface:
-
-Submitted - Date and time job was submitted.
-JobID, User, Name - Job identifiers as on the standard web UI.
-Pool - Current pool of job. Select another value to move job to another pool.
-Priority - Current priority. Select another value to change the job's priority.
-Maps/Reduces Finished: Number of tasks finished / total tasks.
-Maps/Reduces Running: Tasks currently running.
-Map/Reduce Fair Share: The average number of task slots that this job should
-    have at any given time according to fair sharing. The actual number of
-    tasks will go up and down depending on how much compute time the job has
-    had, but on average it will get its fair share amount.
-
-In addition, it is possible to turn on an "advanced" view for the web UI, by
-going to http://<jobtracker URL>/scheduler?advanced. This view shows four more
-columns used for calculations internally:
-
-Maps/Reduce Weight: Weight of the job in the fair sharing calculations. This
-    depends on priority and potentially also on job size and job age if the
-    sizebasedweight and NewJobWeightBooster are enabled.
-Map/Reduce Deficit: The job's scheduling deficit in macine-seconds - the amount
-    of resources it should have gotten according to its fair share, minus how
-    many it actually got. Positive deficit means the job will be scheduled
-    again in the near future because it needs to catch up to its fair share.
-    The scheduler schedules jobs with higher deficit ahead of others. Please
-    see the Implementation section of this document for details.
-
-Finally, the web interface provides a button for switching to FIFO scheduling,
-at runtime, at the bottom of the page, in case this becomes necessary and it
-is inconvenient to restart the MapReduce cluster. 
-
---------------------------------------------------------------------------------
-
-IMPLEMENTATION:
-
-There are two aspects to implementing fair scheduling: Calculating each job's
-fair share, and choosing which job to run when a task slot becomes available.
-
-To select jobs to run, the scheduler then keeps track of a "deficit" for
-each job - the difference between the amount of compute time it should have
-gotten on an ideal scheduler, and the amount of compute time it actually got.
-This is a measure of how "unfair" we've been to the job. Every few hundred
-milliseconds, the scheduler updates the deficit of each job by looking at
-how many tasks each job had running during this interval vs. its fair share.
-Whenever a task slot becomes available, it is assigned to the job with the
-highest deficit. There is one exception - if there were one or more jobs who
-were not meeting their pool capacity guarantees, we only choose among these
-"needy" jobs (based again on their deficit), to ensure that the scheduler
-meets pool guarantees as soon as possible.
-
-The fair shares are calculated by dividing the capacity of the cluster among
-runnable jobs according to a "weight" for each job. By default the weight is
-based on priority, with each level of priority having 2x higher weight than the
-next (for example, VERY_HIGH has 4x the weight of NORMAL). However, weights can
-also be based on job sizes and ages, as described in the Configuring section.
-For jobs that are in a pool, fair shares also take into account the minimum
-guarantee for that pool. This capacity is divided among the jobs in that pool
-according again to their weights.
-
-Finally, when limits on a user's running jobs or a pool's running jobs are in
-place, we choose which jobs get to run by sorting all jobs in order of priority
-and then submit time, as in the standard Hadoop scheduler. Any jobs that fall
-after the user/pool's limit in this ordering are queued up and wait idle until
-they can be run. During this time, they are ignored from the fair sharing
-calculations and do not gain or lose deficit (their fair share is set to zero).
+The documentation built this way will be under $HADOOP_HOME/build/docs.
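For illustration only (the Forrest and JDK 5 install locations below are
hypothetical, not part of this patch), the build command might look like:

    ant docs -Dforrest.home=/opt/apache-forrest-0.8 -Djava5.home=/usr/lib/jvm/java-1.5.0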

+ 371 - 0
src/docs/src/documentation/content/xdocs/fair_scheduler.xml

@@ -0,0 +1,371 @@
+<?xml version="1.0"?>
+  <!--
+    Licensed to the Apache Software Foundation (ASF) under one or more
+    contributor license agreements. See the NOTICE file distributed with
+    this work for additional information regarding copyright ownership.
+    The ASF licenses this file to You under the Apache License, Version
+    2.0 (the "License"); you may not use this file except in compliance
+    with the License. You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0 Unless required by
+    applicable law or agreed to in writing, software distributed under
+    the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES
+    OR CONDITIONS OF ANY KIND, either express or implied. See the
+    License for the specific language governing permissions and
+    limitations under the License.
+  -->
+
+<!DOCTYPE document PUBLIC "-//APACHE//DTD Documentation V2.0//EN" "http://forrest.apache.org/dtd/document-v20.dtd">
+<document>
+  <header>
+    <title>Fair Scheduler</title>
+  </header>
+  <body>
+
+    <section>
+      <title>Purpose</title>
+
+      <p>This document describes the Fair Scheduler, a pluggable
+        Map/Reduce scheduler for Hadoop which provides a way to share
+        large clusters.</p>
+    </section>
+
+    <section>
+      <title>Introduction</title>
+      <p>Fair scheduling is a method of assigning resources to jobs
+        such that all jobs get, on average, an equal share of resources
+        over time. When there is a single job running, that job uses the
+        entire cluster. When other jobs are submitted, task slots that
+        free up are assigned to the new jobs, so that each job gets
+        roughly the same amount of CPU time. Unlike the default Hadoop
+        scheduler, which forms a queue of jobs, this lets short jobs finish
+        in reasonable time while not starving long jobs. It is also a 
+        reasonable way to share a cluster between a number of users. Finally, 
+        fair sharing can also work with job priorities - the priorities are
+        used as weights to determine the fraction of total compute time that
+        each job should get.
+      </p>
+      <p>
+        The scheduler actually organizes jobs further into "pools", and 
+        shares resources fairly between these pools. By default, there is a 
+        separate pool for each user, so that each user gets the same share 
+        of the cluster no matter how many jobs they submit. However, it is 
+        also possible to set a job's pool based on the user's Unix group or
+        any other jobconf property, such as the queue name property used by 
+        <a href="capacity_scheduler.html">Capacity Scheduler</a>. 
+        Within each pool, fair sharing is used to share capacity between 
+        the running jobs. Pools can also be given weights to share the 
+        cluster non-proportionally in the config file.
+      </p>
+      <p>
+        In addition to providing fair sharing, the Fair Scheduler allows
+        assigning guaranteed minimum shares to pools, which is useful for
+        ensuring that certain users, groups or production applications
+        always get sufficient resources. When a pool contains jobs, it gets
+        at least its minimum share, but when the pool does not need its full
+        guaranteed share, the excess is split between other running jobs.
+        This lets the scheduler guarantee capacity for pools while utilizing
+        resources efficiently when these pools don't contain jobs.       
+      </p>
+      <p>
+        The Fair Scheduler lets all jobs run by default, but it is also
+        possible to limit the number of running jobs per user and per pool
+        through the config file. This can be useful when a user must submit
+        hundreds of jobs at once, or in general to improve performance if
+        running too many jobs at once would cause too much intermediate data
+        to be created or too much context-switching. Limiting the jobs does
+        not cause any subsequently submitted jobs to fail, only to wait in the
+        scheduler's queue until some of the user's earlier jobs finish. Jobs to
+        run from each user/pool are chosen in order of priority and then
+        submit time, as in the default FIFO scheduler in Hadoop.
+      </p>
+      <p>
+        Finally, the fair scheduler provides several extension points where
+        the basic functionality can be extended. For example, the weight
+        calculation can be modified to give a priority boost to new jobs,
+        implementing a "shortest job first" policy which reduces response
+        times for interactive jobs even further.
+      </p>
+    </section>
+
+    <section>
+      <title>Installation</title>
+      <p>
+        To run the fair scheduler in your Hadoop installation, you need to put
+        it on the CLASSPATH. The easiest way is to copy the 
+        <em>hadoop-*-fairscheduler.jar</em> from
+        <em>HADOOP_HOME/contrib/fairscheduler</em> to <em>HADOOP_HOME/lib</em>.
+        Alternatively you can modify <em>HADOOP_CLASSPATH</em> to include this jar, in
+        <em>HADOOP_CONF_DIR/hadoop-env.sh</em>.
+      </p>
+      <p>
+        To compile the fair scheduler from sources, execute <em>ant
+        package</em> in the source folder and copy the
+        <em>build/contrib/fairscheduler/hadoop-*-fairscheduler.jar</em>
+        to <em>HADOOP_HOME/lib</em>.
+      </p>
+      <p>
+       You will also need to set the following property in the Hadoop config 
+       file  <em>HADOOP_CONF_DIR/mapred-site.xml</em> to have Hadoop use 
+       the fair scheduler: <br/>
+       <code>&lt;property&gt;</code><br/> 
+       <code>&nbsp;&nbsp;&lt;name&gt;mapred.jobtracker.taskScheduler&lt;/name&gt;</code><br/>
+       <code>&nbsp;&nbsp;&lt;value&gt;org.apache.hadoop.mapred.FairScheduler&lt;/value&gt;</code><br/>
+       <code>&lt;/property&gt;</code>
+      </p>
+      <p>
+        Once you restart the cluster, you can check that the fair scheduler 
+        is running by going to http://&lt;jobtracker URL&gt;/scheduler 
+        on the JobTracker's web UI. A &quot;job scheduler administration&quot; page should 
+        be visible there. This page is described in the Administration section.
+      </p>
+    </section>
+    
+    <section>
+      <title>Configuring the Fair Scheduler</title>
+      <p>
+      The following properties can be set in mapred-site.xml to configure 
+      the fair scheduler:
+      </p>
+      <table>
+        <tr>
+        <th>Name</th><th>Description</th>
+        </tr>
+        <tr>
+        <td>
+          mapred.fairscheduler.allocation.file
+        </td>
+        <td>
+          Specifies an absolute path to an XML file which contains the 
+          allocations for each pool, as well as the per-pool and per-user 
+          limits on number of running jobs. If this property is not 
+          provided, allocations are not used.<br/>
+          This file must be in XML format, and can contain three types of 
+          elements:
+          <ul>
+          <li>pool elements, which may contain elements for minMaps, 
+          minReduces, maxRunningJobs (limit the number of jobs from the 
+          pool to run at once), and weight (to share the cluster 
+          non-proportionally with other pools).
+          </li>
+          <li>user elements, which may contain a maxRunningJobs to limit 
+          jobs. Note that by default, there is a separate pool for each 
+          user, so these may not be necessary; they are useful, however, 
+          if you create a pool per user group or manually assign jobs 
+          to pools.</li>
+          <li>A userMaxJobsDefault element, which sets the default running 
+          job limit for any users whose limit is not specified.</li>
+          </ul>
+          <br/>
+          An example allocation file is listed below:<br/>
+          <code>&lt;?xml version="1.0"?&gt; </code> <br/>
+          <code>&lt;allocations&gt;</code> <br/> 
+          <code>&nbsp;&nbsp;&lt;pool name="sample_pool"&gt;</code><br/>
+          <code>&nbsp;&nbsp;&nbsp;&nbsp;&lt;minMaps&gt;5&lt;/minMaps&gt;</code><br/>
+          <code>&nbsp;&nbsp;&nbsp;&nbsp;&lt;minReduces&gt;5&lt;/minReduces&gt;</code><br/>
+          <code>&nbsp;&nbsp;&nbsp;&nbsp;&lt;weight&gt;2.0&lt;/weight&gt;</code><br/>
+          <code>&nbsp;&nbsp;&lt;/pool&gt;</code><br/>
+          <code>&nbsp;&nbsp;&lt;user name="sample_user"&gt;</code><br/>
+          <code>&nbsp;&nbsp;&nbsp;&nbsp;&lt;maxRunningJobs&gt;6&lt;/maxRunningJobs&gt;</code><br/>
+          <code>&nbsp;&nbsp;&lt;/user&gt;</code><br/>
+          <code>&nbsp;&nbsp;&lt;userMaxJobsDefault&gt;3&lt;/userMaxJobsDefault&gt;</code><br/>
+          <code>&lt;/allocations&gt;</code>
+          <br/>
+          This example creates a pool sample_pool with a guarantee of 5 map 
+          slots and 5 reduce slots. The pool also has a weight of 2.0, meaning 
+          it has a 2x higher share of the cluster than other pools (the default 
+          weight is 1). Finally, the example limits the number of running jobs 
+          per user to 3, except for sample_user, who can run 6 jobs concurrently. 
+          Any pool not defined in the allocations file will have no guaranteed 
+          capacity and a weight of 1.0. Also, any pool or user with no max 
+          running jobs set in the file will be allowed to run an unlimited 
+          number of jobs.
+        </td>
+        </tr>
+        <tr>
+        <td>
+          mapred.fairscheduler.assignmultiple
+        </td>
+        <td>
+          Allows the scheduler to assign both a map task and a reduce task 
+          on each heartbeat, which improves cluster throughput when there 
+          are many small tasks to run. Boolean value, default: false.
+        </td>
+        </tr>
+        <tr>
+        <td>
+          mapred.fairscheduler.sizebasedweight
+        </td>
+        <td>
+          Take into account job sizes in calculating their weights for fair 
+          sharing. By default, weights are only based on job priorities. 
+          Setting this flag to true will make them based on the size of the 
+          job (number of tasks needed) as well, though not linearly 
+          (the weight will be proportional to the log of the number of tasks 
+          needed). This lets larger jobs get larger fair shares while still 
+          providing enough of a share to small jobs to let them finish fast. 
+          Boolean value, default: false.
+        </td>
+        </tr>
+        <tr>
+        <td>
+          mapred.fairscheduler.poolnameproperty
+        </td>
+        <td>
+          Specify which jobconf property is used to determine the pool that a
+          job belongs in. String, default: user.name (i.e. one pool for each 
+          user). Some other useful values to set this to are: <br/>
+          <ul> 
+            <li> group.name (to create a pool per Unix group).</li>
+            <li>mapred.job.queue.name (the same property as the queue name in 
+            <a href="capacity_scheduler.html">Capacity Scheduler</a>).</li>
+          </ul>
+        </td>
+        </tr>
+        <tr>
+        <td>
+          mapred.fairscheduler.weightadjuster
+        </td>
+        <td>
+        An extensibility point that lets you specify a class to adjust the 
+        weights of running jobs. This class should implement the 
+        <em>WeightAdjuster</em> interface. There is currently one example 
+        implementation - <em>NewJobWeightBooster</em>, which increases the 
+        weight of jobs for the first 5 minutes of their lifetime to let 
+        short jobs finish faster. To use it, set the weightadjuster 
+        property to the full class name, 
+        <code>org.apache.hadoop.mapred.NewJobWeightBooster</code>. 
+        NewJobWeightBooster itself provides two parameters for setting the 
+        duration and boost factor. <br/>
+        <ol>
+        <li> <em>mapred.newjobweightbooster.factor</em>
+          Factor by which a new job's weight should be boosted. Default is 3.</li>
+        <li><em>mapred.newjobweightbooster.duration</em>
+          Duration of the boost in milliseconds; default 300000 (5 minutes).</li>
+        </ol>
+        </td>
+        </tr>
+        <tr>
+        <td>
+          mapred.fairscheduler.loadmanager
+        </td>
+        <td>
+          An extensibility point that lets you specify a class that determines 
+          how many maps and reduces can run on a given TaskTracker. This class 
+          should implement the LoadManager interface. By default the task caps 
+          in the Hadoop config file are used, but this option could be used to 
+          base the load on available memory and CPU utilization, for example.
+        </td>
+        </tr>
+        <tr>
+        <td>
+          mapred.fairscheduler.taskselector
+        </td>
+        <td>
+        An extensibility point that lets you specify a class that determines 
+        which task from within a job to launch on a given tracker. This can be 
+        used to change either the locality policy (e.g. keep some jobs within 
+        a particular rack) or the speculative execution algorithm (select 
+        when to launch speculative tasks). The default implementation uses 
+        Hadoop's default algorithms from JobInProgress.
+        </td>
+        </tr>
+      </table>      
+    </section>
+    <section>
+    <title>Administration</title>
+    <p>
+      The fair scheduler provides support for administration at runtime 
+      through two mechanisms:
+    </p> 
+    <ol>
+    <li>
+      It is possible to modify pools' allocations 
+      and user and pool running job limits at runtime by editing the allocation 
+      config file. The scheduler will reload this file 10-15 seconds after it 
+      sees that it was modified.
+     </li>
+     <li>
+     Current jobs, pools, and fair shares can be examined through the 
+     JobTracker's web interface, at http://&lt;jobtracker URL&gt;/scheduler. 
+     On this interface, it is also possible to modify jobs' priorities or 
+     move jobs from one pool to another and see the effects on the fair 
+     shares (this requires JavaScript).
+     </li>
+    </ol>
+    <p>
+      The following fields can be seen for each job on the web interface:
+     </p>
+     <ul>
+     <li><em>Submitted</em> - Date and time job was submitted.</li>
+     <li><em>JobID, User, Name</em> - Job identifiers as on the standard 
+     web UI.</li>
+     <li><em>Pool</em> - Current pool of job. Select another value to move job to 
+     another pool.</li>
+     <li><em>Priority</em> - Current priority. Select another value to change the 
+     job's priority.</li>
+     <li><em>Maps/Reduces Finished</em>: Number of tasks finished / total tasks.</li>
+     <li><em>Maps/Reduces Running</em>: Tasks currently running.</li>
+     <li><em>Map/Reduce Fair Share</em>: The average number of task slots that this 
+     job should have at any given time according to fair sharing. The actual
+     number of tasks will go up and down depending on how much compute time
+     the job has had, but on average it will get its fair share amount.</li>
+     </ul>
+     <p>
+     In addition, it is possible to turn on an "advanced" view for the web UI,
+     by going to http://&lt;jobtracker URL&gt;/scheduler?advanced. This view shows 
+     four more columns used for calculations internally:
+     </p>
+     <ul>
+     <li><em>Map/Reduce Weight</em>: Weight of the job in the fair sharing 
+     calculations. This depends on priority and potentially also on 
+     job size and job age if the <em>sizebasedweight</em> and 
+     <em>NewJobWeightBooster</em> are enabled.</li>
+     <li><em>Map/Reduce Deficit</em>: The job's scheduling deficit in 
+     machine-seconds - the amount of resources it should have gotten according to 
+     its fair share, minus how many it actually got. Positive deficit means
+      the job will be scheduled again in the near future because it needs to 
+      catch up to its fair share. The scheduler schedules jobs with higher 
+      deficit ahead of others. Please see the Implementation section of 
+      this document for details.</li>
+     </ul>
+    </section>
+    <section>
+    <title>Implementation</title>
+    <p>There are two aspects to implementing fair scheduling: Calculating 
+    each job's fair share, and choosing which job to run when a task slot 
+    becomes available.</p>
+    <p>To select jobs to run, the scheduler keeps track of a 
+    &quot;deficit&quot; for each job - the difference between the amount of
+     compute time it should have gotten on an ideal scheduler, and the amount 
+     of compute time it actually got. This is a measure of how 
+     &quot;unfair&quot; we've been to the job. Every few hundred 
+     milliseconds, the scheduler updates the deficit of each job by looking
+     at how many tasks each job had running during this interval vs. its 
+     fair share. Whenever a task slot becomes available, it is assigned to 
+     the job with the highest deficit. There is one exception - if there 
+     were one or more jobs that were not meeting their pool capacity 
+     guarantees, we only choose among these &quot;needy&quot; jobs (based 
+     again on their deficit), to ensure that the scheduler meets pool 
+     guarantees as soon as possible.</p>
+     <p>
+     The fair shares are calculated by dividing the capacity of the cluster 
+     among runnable jobs according to a &quot;weight&quot; for each job. By 
+     default the weight is based on priority, with each level of priority 
+     having 2x higher weight than the next (for example, VERY_HIGH has 4x the 
+     weight of NORMAL). However, weights can also be based on job sizes and ages, 
+     as described in the Configuring section. For jobs that are in a pool, 
+     fair shares also take into account the minimum guarantee for that pool. 
+     This capacity is divided among the jobs in that pool according again to 
+     their weights.
+     </p>
+     <p>Finally, when limits on a user's running jobs or a pool's running jobs 
+     are in place, we choose which jobs get to run by sorting all jobs in order 
+     of priority and then submit time, as in the standard Hadoop scheduler. Any 
+     jobs that fall after the user/pool's limit in this ordering are queued up 
+     and wait idle until they can be run. During this time, they are excluded 
+     from the fair sharing calculations and do not gain or lose deficit (their 
+     fair share is set to zero).</p>
+    </section>
+  </body>  
+</document>
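As a quick illustration of the configuration table documented above, a
mapred-site.xml fragment that enables the scheduler together with a few of the
optional settings might look like the sketch below (the allocation file path
and the chosen values are examples, not defaults shipped with this patch):

    <property>
      <name>mapred.jobtracker.taskScheduler</name>
      <value>org.apache.hadoop.mapred.FairScheduler</value>
    </property>
    <property>
      <!-- absolute path to the allocations file; location is illustrative -->
      <name>mapred.fairscheduler.allocation.file</name>
      <value>/etc/hadoop/fair-scheduler-allocations.xml</value>
    </property>
    <property>
      <!-- hand out a map and a reduce task per heartbeat -->
      <name>mapred.fairscheduler.assignmultiple</name>
      <value>true</value>
    </property>
    <property>
      <!-- one pool per Unix group instead of the default pool per user -->
      <name>mapred.fairscheduler.poolnameproperty</name>
      <value>group.name</value>
    </property>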

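To make the deficit bookkeeping in the Implementation section above concrete
(the numbers are invented for illustration): if a job's map fair share is 10
slots but only 4 of its map tasks are running during a 500 ms update interval,
its map deficit grows by roughly (10 - 4) * 0.5 = 3 machine-seconds over that
interval, while a job running more tasks than its fair share sees its deficit
shrink by the corresponding amount; the next free slot then goes to whichever
eligible job has accumulated the largest deficit.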
+ 1 - 0
src/docs/src/documentation/content/xdocs/site.xml

@@ -52,6 +52,7 @@ See http://forrest.apache.org/docs/linking.html for more info.
     <hod-user-guide label="HOD User Guide" href="hod_user_guide.html"/>
     <hod-admin-guide label="HOD Admin Guide" href="hod_admin_guide.html"/>
     <hod-config-guide label="HOD Config Guide" href="hod_config_guide.html"/>
+    <fair_scheduler label="Fair Scheduler" href="fair_scheduler.html"/>
     <capacity_scheduler label="Capacity Scheduler" href="capacity_scheduler.html"/>
     <service_level_auth label="Service Level Authorization" href="service_level_auth.html"/>
     <vaidya    label="Hadoop Vaidya" href="vaidya.html"/>