- ~~ Licensed under the Apache License, Version 2.0 (the "License");
- ~~ you may not use this file except in compliance with the License.
- ~~ You may obtain a copy of the License at
- ~~
- ~~ http://www.apache.org/licenses/LICENSE-2.0
- ~~
- ~~ Unless required by applicable law or agreed to in writing, software
- ~~ distributed under the License is distributed on an "AS IS" BASIS,
- ~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- ~~ See the License for the specific language governing permissions and
- ~~ limitations under the License. See accompanying LICENSE file.
- ---
- Hadoop MapReduce Next Generation-${project.version} - Capacity Scheduler
- ---
- ---
- ${maven.build.timestamp}
- Hadoop MapReduce Next Generation - Capacity Scheduler
- \[ {{{./index.html}Go Back}} \]
- %{toc|section=1|fromDepth=0}
- * {Purpose}
- This document describes the <<<CapacityScheduler>>>, a pluggable scheduler
- for Hadoop which allows multiple tenants to securely share a large cluster
- such that their applications are allocated resources in a timely manner under
- constraints of allocated capacities.
- * {Overview}
- The <<<CapacityScheduler>>> is designed to run Hadoop applications in a
- shared, multi-tenant cluster in an operator-friendly manner while maximizing
- the throughput and the utilization of the cluster.
-
- Traditionally each organization has its own private set of compute resources
- that have sufficient capacity to meet the organization's SLA under peak or
- near-peak conditions. This generally leads to poor average utilization and
- the overhead of managing multiple independent clusters, one per organization.
- Sharing clusters between organizations is a cost-effective way of running
- large Hadoop installations since it allows them to reap the benefits of
- economies of scale without creating private clusters. However, organizations
- are concerned about sharing a cluster because they are worried about others
- using the resources that are critical for their SLAs.
-
- The <<<CapacityScheduler>>> is designed to allow sharing a large cluster while
- giving each organization capacity guarantees. The central idea is
- that the available resources in the Hadoop cluster are shared among multiple
- organizations who collectively fund the cluster based on their computing
- needs. There is an added benefit that an organization can access
- any excess capacity not being used by others. This provides elasticity for
- the organizations in a cost-effective manner.
-
- Sharing clusters across organizations necessitates strong support for
- multi-tenancy since each organization must be guaranteed capacity and
- safe-guards to ensure the shared cluster is impervious to a single rogue
- application or user or sets thereof. The <<<CapacityScheduler>>> provides a
- stringent set of limits to ensure that a single application, user or queue
- cannot consume a disproportionate amount of resources in the cluster. Also, the
- <<<CapacityScheduler>>> provides limits on initialized/pending applications
- from a single user and queue to ensure fairness and stability of the cluster.
-
- The primary abstraction provided by the <<<CapacityScheduler>>> is the concept
- of <queues>. These queues are typically set up by administrators to reflect the
- economics of the shared cluster.
-
- To provide further control and predictability on sharing of resources, the
- <<<CapacityScheduler>>> supports <hierarchical queues> to ensure
- resources are shared among the sub-queues of an organization before other
- queues are allowed to use free resources, thereby providing <affinity>
- for sharing free resources among applications of a given organization.
-
- * {Features}
- The <<<CapacityScheduler>>> supports the following features:
-
- * Hierarchical Queues - A hierarchy of queues is supported to ensure resources
- are shared among the sub-queues of an organization before other
- queues are allowed to use free resources, thereby providing more control
- and predictability.
-
- * Capacity Guarantees - Queues are allocated a fraction of the capacity of the
- grid in the sense that a certain capacity of resources will be at their
- disposal. All applications submitted to a queue will have access to the
- capacity allocated to the queue. Administrators can configure soft limits and
- optional hard limits on the capacity allocated to each queue.
-
- * Security - Each queue has strict ACLs which control which users can submit
- applications to individual queues. Also, there are safe-guards to ensure
- that users cannot view and/or modify applications from other users.
- Per-queue and system administrator roles are also supported.
-
- * Elasticity - Free resources can be allocated to any queue beyond its
- capacity. When there is demand for these resources from queues running below
- capacity at a future point in time, they are assigned to applications on
- those queues as the tasks scheduled on the resources complete. This ensures
- that resources are available to queues in a predictable and elastic manner,
- thus preventing artificial silos of resources in the cluster, which helps
- utilization.
-
- * Multi-tenancy - A comprehensive set of limits is provided to prevent a
- single application, user or queue from monopolizing resources of the queue
- or the cluster as a whole, ensuring that the cluster isn't overwhelmed.
-
- * Operability
-
- * Runtime Configuration - The queue definitions and properties, such as
- capacity and ACLs, can be changed at runtime by administrators in a secure
- manner to minimize disruption to users. Also, a console is provided for
- users and administrators to view the current allocation of resources to
- various queues in the system. Administrators can also
- <add additional queues> at runtime.
-
- * Drain applications - Administrators can <stop> queues
- at runtime to ensure that while existing applications run to completion,
- no new applications can be submitted. If a queue is in <<<STOPPED>>>
- state, new applications cannot be submitted to <itself> or
- <any of its child queues>. Existing applications continue to completion,
- thus the queue can be <drained> gracefully. Administrators can also
- <start> the stopped queues.
-
- * Resource-based Scheduling - Support for resource-intensive applications,
- wherein an application can optionally specify higher resource requirements
- than the default, thereby accommodating applications with differing resource
- requirements. Currently, <memory> is the only resource requirement supported;
- see the sketch after this list.
-
- []
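- 
- The bounds on such per-application memory requests come from the scheduler
- allocation limits in <<conf/yarn-site.xml>>; a minimal sketch, assuming the
- standard <<<yarn.scheduler.minimum-allocation-mb>>> and
- <<<yarn.scheduler.maximum-allocation-mb>>> properties with illustrative
- values:
- 
- ----
- <property>
- <name>yarn.scheduler.minimum-allocation-mb</name>
- <value>1024</value>
- <description>Smallest container, in MB, the scheduler will allocate
- (illustrative value).</description>
- </property>
- <property>
- <name>yarn.scheduler.maximum-allocation-mb</name>
- <value>8192</value>
- <description>Largest container, in MB, an application may request
- (illustrative value).</description>
- </property>
- ----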
-
- * {Configuration}
- * Setting up <<<ResourceManager>>> to use <<<CapacityScheduler>>>
-
- To configure the <<<ResourceManager>>> to use the <<<CapacityScheduler>>>, set
- the following property in the <<conf/yarn-site.xml>>:
-
- *--------------------------------------+--------------------------------------+
- || Property || Value |
- *--------------------------------------+--------------------------------------+
- | <<<yarn.resourcemanager.scheduler.class>>> | |
- | | <<<org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler>>> |
- *--------------------------------------+--------------------------------------+
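- 
- In <<conf/yarn-site.xml>> this takes the usual property form; a minimal
- sketch using the name and value from the table above:
- 
- ----
- <property>
- <name>yarn.resourcemanager.scheduler.class</name>
- <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
- </property>
- ----
- 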
- * Setting up <queues>
-
- <<conf/capacity-scheduler.xml>> is the configuration file for the
- <<<CapacityScheduler>>>.
-
- The <<<CapacityScheduler>>> has a pre-defined queue called <root>. All
- queues in the system are children of the root queue.
- Further queues can be set up by configuring
- <<<yarn.scheduler.capacity.root.queues>>> with a list of comma-separated
- child queues.
-
- The configuration for <<<CapacityScheduler>>> uses a concept called
- <queue path> to configure the hierarchy of queues. The <queue path> is the
- full path of the queue's hierarchy, starting at <root>, with . (dot) as the
- delimiter. For example, the path of queue <<<a1>>> in the example below is
- <<<root.a.a1>>>.
-
- A given queue's children can be defined with the configuration knob:
- <<<yarn.scheduler.capacity.<queue-path>.queues>>>
-
- Here is an example with three top-level child-queues <<<a>>>, <<<b>>> and
- <<<c>>> and some sub-queues for <<<a>>> and <<<b>>>:
-
- ----
- <property>
- <name>yarn.scheduler.capacity.root.queues</name>
- <value>a,b,c</value>
- <description>The queues at this level (root is the root queue).
- </description>
- </property>
- <property>
- <name>yarn.scheduler.capacity.root.a.queues</name>
- <value>a1,a2</value>
- <description>The queues at this level (root is the root queue).
- </description>
- </property>
- <property>
- <name>yarn.scheduler.capacity.root.b.queues</name>
- <value>b1,b2,b3</value>
- <description>The queues at this level (root is the root queue).
- </description>
- </property>
- ----
- * Queue Properties
-
- * Resource Allocation
-
- *--------------------------------------+--------------------------------------+
- || Property || Description |
- *--------------------------------------+--------------------------------------+
- | <<<yarn.scheduler.capacity.<queue-path>.capacity>>> | |
- | | Queue <capacity> in percentage (%). |
- | | The sum of capacities for all queues, at each level, must be equal |
- | | to 100. |
- | | Applications in the queue may consume more resources than the queue's |
- | | capacity if there are free resources, providing elasticity. |
- *--------------------------------------+--------------------------------------+
- | <<<yarn.scheduler.capacity.<queue-path>.maximum-capacity>>> | |
- | | Maximum queue capacity in percentage (%). |
- | | This limits the <elasticity> for applications in the queue. |
- *--------------------------------------+--------------------------------------+
- | <<<yarn.scheduler.capacity.<queue-path>.minimum-user-limit-percent>>> | |
- | | Each queue enforces a limit on the percentage of resources allocated to a |
- | | user at any given time, if there is demand for resources. The user limit |
- | | can vary between a minimum and maximum value. The former depends on the |
- | | number of users who have submitted applications, and the latter is set to |
- | | this property value. For example, suppose the value of this property is 25. |
- | | If two users have submitted applications to a queue, no single user can |
- | | use more than 50% of the queue resources. If a third user submits an |
- | | application, no single user can use more than 33% of the queue resources. |
- | | With 4 or more users, no user can use more than 25% of the queue's |
- | | resources. A value of 100 implies no user limits are imposed. |
- *--------------------------------------+--------------------------------------+
- | <<<yarn.scheduler.capacity.<queue-path>.user-limit-factor>>> | |
- | | The multiple of the queue capacity which can be configured to allow a |
- | | single user to acquire more resources. By default this is set to 1 which |
- | | ensures that a single user can never take more than the queue's configured |
- | | capacity irrespective of how idle the cluster is. |
- *--------------------------------------+--------------------------------------+
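- 
- As a concrete illustration, here is a sketch assigning capacities to the
- top-level queues from the earlier hierarchy example (the percentage values
- are illustrative; at each level they sum to 100):
- 
- ----
- <property>
- <name>yarn.scheduler.capacity.root.a.capacity</name>
- <value>40</value>
- </property>
- <property>
- <name>yarn.scheduler.capacity.root.b.capacity</name>
- <value>40</value>
- </property>
- <property>
- <name>yarn.scheduler.capacity.root.c.capacity</name>
- <value>20</value>
- </property>
- <property>
- <name>yarn.scheduler.capacity.root.a.maximum-capacity</name>
- <value>60</value>
- </property>
- ----
- 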
- * Running and Pending Application Limits
-
-
- The <<<CapacityScheduler>>> supports the following parameters to control
- the running and pending applications:
-
- *--------------------------------------+--------------------------------------+
- || Property || Description |
- *--------------------------------------+--------------------------------------+
- | <<<yarn.scheduler.capacity.maximum-applications>>> | |
- | | Maximum number of applications in the system which can be concurrently |
- | | active, both running and pending. Limits on each queue are directly |
- | | proportional to their queue capacities. |
- *--------------------------------------+--------------------------------------+
- | <<<yarn.scheduler.capacity.maximum-am-resource-percent>>> | |
- | | Maximum percent of resources in the cluster which can be used to run |
- | | application masters - controls number of concurrent running applications. |
- *--------------------------------------+--------------------------------------+
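- 
- For example, a sketch of these two limits with illustrative values: 10000
- concurrently active applications, and ApplicationMasters capped at 10% of
- cluster resources (the percent is expressed as a float):
- 
- ----
- <property>
- <name>yarn.scheduler.capacity.maximum-applications</name>
- <value>10000</value>
- </property>
- <property>
- <name>yarn.scheduler.capacity.maximum-am-resource-percent</name>
- <value>0.1</value>
- </property>
- ----
- 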
- * Queue Administration & Permissions
-
- The <<<CapacityScheduler>>> supports the following parameters to
- administer the queues:
-
-
- *--------------------------------------+--------------------------------------+
- || Property || Description |
- *--------------------------------------+--------------------------------------+
- | <<<yarn.scheduler.capacity.<queue-path>.state>>> | |
- | | The <state> of the queue. Can be one of <<<RUNNING>>> or <<<STOPPED>>>. |
- | | If a queue is in <<<STOPPED>>> state, new applications cannot be |
- | | submitted to <itself> or <any of its child queues>. |
- | | Thus, if the <root> queue is <<<STOPPED>>> no applications can be |
- | | submitted to the entire cluster. |
- | | Existing applications continue to completion, thus the queue can be |
- | | <drained> gracefully. |
- *--------------------------------------+--------------------------------------+
- | <<<yarn.scheduler.capacity.root.<queue-path>.acl_submit_jobs>>> | |
- | | The <ACL> which controls who can <submit> jobs to the given queue. |
- | | If the given user/group has necessary ACLs on the given queue or |
- | | <one of the parent queues in the hierarchy> they can submit jobs. |
- *--------------------------------------+--------------------------------------+
- | <<<yarn.scheduler.capacity.root.<queue-path>.acl_administer_jobs>>> | |
- | | The <ACL> which controls who can <administer> jobs on the given queue. |
- | | If the given user/group has necessary ACLs on the given queue or |
- | | <one of the parent queues in the hierarchy> they can administer jobs. |
- *--------------------------------------+--------------------------------------+
-
- <Note:> An <ACL> is of the form <user1>,<user2><space><group1>,<group2>,
- i.e. a comma-separated list of users followed by a space and a
- comma-separated list of groups. The special value of <<*>> implies <anyone>.
- The special value of <space> implies <no one>.
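- 
- Putting these together, a sketch that stops queue <<<c>>> and restricts
- submissions on queue <<<a>>> (the user and group names are hypothetical):
- 
- ----
- <property>
- <name>yarn.scheduler.capacity.root.c.state</name>
- <value>STOPPED</value>
- </property>
- <property>
- <name>yarn.scheduler.capacity.root.a.acl_submit_jobs</name>
- <value>user1,user2 group1</value>
- </property>
- ----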
-
- * Reviewing the configuration of the CapacityScheduler
- 
- Once the installation and configuration is completed, you can review it
- after starting the YARN cluster from the web UI.
- 
- * Start the YARN cluster in the normal manner.
- 
- * Open the <<<ResourceManager>>> web UI.
- 
- * The </scheduler> web page should show the resource usage of individual
- queues.
-
- []
-
- * {Changing Queue Configuration}
- Changing queue properties and adding new queues is very simple. You need to
- edit <<conf/capacity-scheduler.xml>> and run <yarn rmadmin -refreshQueues>.
-
- ----
- $ vi $HADOOP_CONF_DIR/capacity-scheduler.xml
- $ $YARN_HOME/bin/yarn rmadmin -refreshQueues
- ----
- <Note:> Queues cannot be <deleted>, only the addition of new queues is
- supported - the updated queue configuration should be valid, i.e. the sum of
- queue capacities at each <level> should equal 100%.
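- 
- For example, adding a hypothetical queue <<<a3>>> under <<<a>>> requires
- updating the child list and re-balancing the capacities at that level so
- they still sum to 100 (queue names and values are illustrative):
- 
- ----
- <property>
- <name>yarn.scheduler.capacity.root.a.queues</name>
- <value>a1,a2,a3</value>
- </property>
- <property>
- <name>yarn.scheduler.capacity.root.a.a3.capacity</name>
- <value>20</value>
- </property>
- <!-- reduce the a1 and a2 capacities so that a1 + a2 + a3 = 100 -->
- ----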
-