|
@@ -0,0 +1,77 @@
|
|
|
+~~ Licensed under the Apache License, Version 2.0 (the "License");
|
|
|
+~~ you may not use this file except in compliance with the License.
|
|
|
+~~ You may obtain a copy of the License at
|
|
|
+~~
|
|
|
+~~ http://www.apache.org/licenses/LICENSE-2.0
|
|
|
+~~
|
|
|
+~~ Unless required by applicable law or agreed to in writing, software
|
|
|
+~~ distributed under the License is distributed on an "AS IS" BASIS,
|
|
|
+~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
|
|
+~~ See the License for the specific language governing permissions and
|
|
|
+~~ limitations under the License. See accompanying LICENSE file.
|
|
|
+
|
|
|
+ ---
|
|
|
+ YARN
|
|
|
+ ---
|
|
|
+ ---
|
|
|
+ ${maven.build.timestamp}
|
|
|
+
|
|
|
+Apache Hadoop NextGen MapReduce (YARN)
|
|
|
+
|
|
|
+ MapReduce has undergone a complete overhaul in hadoop-0.23 and we now have,
|
|
|
+ what we call, MapReduce 2.0 (MRv2) or YARN.
|
|
|
+
|
|
|
+ The fundamental idea of MRv2 is to split up the two major functionalities of
|
|
|
+ the JobTracker, resource management and job scheduling/monitoring, into
|
|
|
+ separate daemons. The idea is to have a global ResourceManager (<RM>) and
|
|
|
+ per-application ApplicationMaster (<AM>). An application is either a single
|
|
|
+ job in the classical sense of Map-Reduce jobs or a DAG of jobs.
|
|
|
+
|
|
|
+ The ResourceManager and per-node slave, the NodeManager (<NM>), form the
|
|
|
+ data-computation framework. The ResourceManager is the ultimate authority that
|
|
|
+ arbitrates resources among all the applications in the system.
|
|
|
+
|
|
|
+ The per-application ApplicationMaster is, in effect, a framework specific
|
|
|
+ library and is tasked with negotiating resources from the ResourceManager and
|
|
|
+ working with the NodeManager(s) to execute and monitor the tasks.
|
|
|
+
|
|
|
+[./yarn_architecture.gif] MapReduce NextGen Architecture
|
|
|
+
|
|
|
+ The ResourceManager has two main components: Scheduler and
|
|
|
+ ApplicationsManager.
|
|
|
+
|
|
|
+ The Scheduler is responsible for allocating resources to the various running
|
|
|
+ applications subject to familiar constraints of capacities, queues etc. The
|
|
|
+ Scheduler is pure scheduler in the sense that it performs no monitoring or
|
|
|
+ tracking of status for the application. Also, it offers no guarantees about
|
|
|
+ restarting failed tasks either due to application failure or hardware
|
|
|
+ failures. The Scheduler performs its scheduling function based the resource
|
|
|
+ requirements of the applications; it does so based on the abstract notion of
|
|
|
+ a resource <Container> which incorporates elements such as memory, cpu, disk,
|
|
|
+ network etc. In the first version, only <<<memory>>> is supported.
|
|
|
+
|
|
|
+ The Scheduler has a pluggable policy plug-in, which is responsible for
|
|
|
+ partitioning the cluster resources among the various queues, applications etc.
|
|
|
+ The current Map-Reduce schedulers such as the CapacityScheduler and the
|
|
|
+ FairScheduler would be some examples of the plug-in.
|
|
|
+
|
|
|
+ The CapacityScheduler supports <<<hierarchical queues>>> to allow for more
|
|
|
+ predictable sharing of cluster resources
|
|
|
+
|
|
|
+ The ApplicationsManager is responsible for accepting job-submissions,
|
|
|
+ negotiating the first container for executing the application specific
|
|
|
+ ApplicationMaster and provides the service for restarting the
|
|
|
+ ApplicationMaster container on failure.
|
|
|
+
|
|
|
+ The NodeManager is the per-machine framework agent who is responsible for
|
|
|
+ containers, monitoring their resource usage (cpu, memory, disk, network) and
|
|
|
+ reporting the same to the ResourceManager/Scheduler.
|
|
|
+
|
|
|
+ The per-application ApplicationMaster has the responsibility of negotiating
|
|
|
+ appropriate resource containers from the Scheduler, tracking their status and
|
|
|
+ monitoring for progress.
|
|
|
+
|
|
|
+ MRV2 maintains <<API compatibility>> with previous stable release
|
|
|
+ (hadoop-0.20.205). This means that all Map-Reduce jobs should still run
|
|
|
+ unchanged on top of MRv2 with just a recompile.
|
|
|
+
|