@@ -11,8 +11,8 @@
~~ limitations under the License. See accompanying LICENSE file.

---
- Hadoop Map Reduce Next Generation-${project.version} - Writing YARN
- Applications
+ Hadoop Map Reduce Next Generation-${project.version} - Writing YARN
+ Applications
---
---
${maven.build.timestamp}
@@ -21,772 +21,737 @@ Hadoop MapReduce Next Generation - Writing YARN Applications

%{toc|section=1|fromDepth=0}

-* Purpose
+* Purpose

- This document describes, at a high-level, the way to implement new
+ This document describes, at a high-level, the way to implement new
Applications for YARN.

* Concepts and Flow

- The general concept is that an 'Application Submission Client' submits an
- 'Application' to the YARN Resource Manager. The client communicates with the
- ResourceManager using the 'ApplicationClientProtocol' to first acquire a new
- 'ApplicationId' if needed via ApplicationClientProtocol#getNewApplication and then
- submit the 'Application' to be run via ApplicationClientProtocol#submitApplication. As
- part of the ApplicationClientProtocol#submitApplication call, the client needs to
- provide sufficient information to the ResourceManager to 'launch' the
- application's first container i.e. the ApplicationMaster.
- You need to provide information such as the details about the local
- files/jars that need to be available for your application to run, the actual
- command that needs to be executed (with the necessary command line arguments),
- any Unix environment settings (optional), etc. Effectively, you need to
- describe the Unix process(es) that needs to be launched for your
- ApplicationMaster.
-
- The YARN ResourceManager will then launch the ApplicationMaster (as specified)
- on an allocated container. The ApplicationMaster is then expected to
- communicate with the ResourceManager using the 'ApplicationMasterProtocol'. Firstly, the
- ApplicationMaster needs to register itself with the ResourceManager. To
- complete the task assigned to it, the ApplicationMaster can then request for
- and receive containers via ApplicationMasterProtocol#allocate. After a container is
- allocated to it, the ApplicationMaster communicates with the NodeManager using
- ContainerManager#startContainer to launch the container for its task. As part
- of launching this container, the ApplicationMaster has to specify the
- ContainerLaunchContext which, similar to the ApplicationSubmissionContext,
- has the launch information such as command line specification, environment,
- etc. Once the task is completed, the ApplicationMaster has to signal the
- ResourceManager of its completion via the ApplicationMasterProtocol#finishApplicationMaster.
-
- Meanwhile, the client can monitor the application's status by querying the
- ResourceManager or by directly querying the ApplicationMaster if it supports
- such a service. If needed, it can also kill the application via
- ApplicationClientProtocol#forceKillApplication.
-
-* Interfaces
+ The general concept is that an <application submission client> submits an
+ <application> to the YARN <ResourceManager> (RM). This can be done through
+ setting up a <<<YarnClient>>> object. After <<<YarnClient>>> is started, the
+ client can then set up the application context, prepare the very first
+ container of the application that contains the <ApplicationMaster> (AM), and
+ then submit the application. You need to provide information such as the
+ details about the local files/jars that need to be available for your
+ application to run, the actual command that needs to be executed (with the
+ necessary command line arguments), any OS environment settings (optional),
+ etc. Effectively, you need to describe the Unix process(es) that needs to be
+ launched for your ApplicationMaster.
+
+ The YARN ResourceManager will then launch the ApplicationMaster (as
+ specified) on an allocated container. The ApplicationMaster communicates with
+ the YARN cluster, and handles application execution. It performs operations
+ in an asynchronous fashion. During application launch time, the main tasks of
+ the ApplicationMaster are: a) communicating with the ResourceManager to
+ negotiate and allocate resources for future containers, and b) after
+ container allocation, communicating with YARN <NodeManager>s (NMs) to launch
+ application containers on them. Task a) can be performed asynchronously
+ through an <<<AMRMClientAsync>>> object, with event handling methods
+ specified in an <<<AMRMClientAsync.CallbackHandler>>> type of event handler.
+ The event handler needs to be set on the client explicitly. Task b) can be
+ performed by launching a runnable object that then launches containers when
+ there are containers allocated. As part of launching this container, the AM
+ has to specify the <<<ContainerLaunchContext>>> that has the launch
+ information such as command line specification, environment, etc.
+
+ During the execution of an application, the ApplicationMaster communicates
+ with NodeManagers through an <<<NMClientAsync>>> object. All container events
+ are handled by the <<<NMClientAsync.CallbackHandler>>> associated with
+ <<<NMClientAsync>>>. A typical callback handler handles container start,
+ stop, status update and error. The ApplicationMaster also reports execution
+ progress to the ResourceManager by implementing the <<<getProgress()>>>
+ method of <<<AMRMClientAsync.CallbackHandler>>>.
+
+ Other than the asynchronous clients, there are synchronous versions for
+ certain workflows (<<<AMRMClient>>> and <<<NMClient>>>). The asynchronous
+ clients are recommended because of their (subjectively) simpler usage, and
+ this article will mainly cover the asynchronous clients. Please refer to
+ <<<AMRMClient>>> and <<<NMClient>>> for more information on synchronous
+ clients.
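+
+ For reference, a minimal sketch of the synchronous flow (illustrative only;
+ <<<conf>>>, <<<containerAsk>>>, <<<done>>> and <<<progress>>> are assumed to
+ be defined by the surrounding application code) might look like:
+
++---+
+ AMRMClient<ContainerRequest> amClient = AMRMClient.createAMRMClient();
+ amClient.init(conf);
+ amClient.start();
+ amClient.registerApplicationMaster(NetUtils.getHostname(), -1, "");
+ // each allocate() call doubles as a heartbeat and carries current progress
+ amClient.addContainerRequest(containerAsk);
+ while (!done) {
+   AllocateResponse allocateResponse = amClient.allocate(progress);
+   for (Container c : allocateResponse.getAllocatedContainers()) {
+     // launch work on the container, e.g. through a (synchronous) NMClient
+   }
+   Thread.sleep(100);
+ }
+ amClient.unregisterApplicationMaster(FinalApplicationStatus.SUCCEEDED, "", "");
++---+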
+
+* Interfaces

The interfaces you'd most likely be concerned with are:

- * ApplicationClientProtocol - Client\<--\>ResourceManager\
- The protocol for a client that wishes to communicate with the
- ResourceManager to launch a new application (i.e. the ApplicationMaster),
- check on the status of the application or kill the application. For example,
- a job-client (a job launching program from the gateway) would use this
- protocol.
+ * <<Client>>\<--\><<ResourceManager>>\
+ By using <<<YarnClient>>> objects.
+
+ * <<ApplicationMaster>>\<--\><<ResourceManager>>\
+ By using <<<AMRMClientAsync>>> objects, handling events asynchronously by
+ <<<AMRMClientAsync.CallbackHandler>>>
+
+ * <<ApplicationMaster>>\<--\><<NodeManager>>\
+ Launch containers. Communicate with NodeManagers
+ by using <<<NMClientAsync>>> objects, handling container events by
+ <<<NMClientAsync.CallbackHandler>>>
+
+ []
+
+ <<Note>>

- * ApplicationMasterProtocol - ApplicationMaster\<--\>ResourceManager\
- The protocol used by the ApplicationMaster to register/unregister itself
- to/from the ResourceManager as well as to request for resources from the
- Scheduler to complete its tasks.
+ * The three main protocols for YARN applications (ApplicationClientProtocol,
+ ApplicationMasterProtocol and ContainerManagementProtocol) are still
+ preserved. The three clients wrap these three protocols to provide a
+ simpler programming model for YARN applications.

- * ContainerManager - ApplicationMaster\<--\>NodeManager\
- The protocol used by the ApplicationMaster to talk to the NodeManager to
- start/stop containers and get status updates on the containers if needed.
+ * Under very rare circumstances, a programmer may want to use the three
+ protocols directly to implement an application. However, note that <such
+ behaviors are no longer encouraged for general use cases>.
+
+ []

* Writing a Simple Yarn Application

** Writing a simple Client

- * The first step that a client needs to do is to connect to the
- ResourceManager or to be more specific, the ApplicationsManager (AsM)
- interface of the ResourceManager.
+ * The first step that a client needs to do is to initialize and start a
+ YarnClient.

+---+
- ApplicationClientProtocol applicationsManager;
- YarnConfiguration yarnConf = new YarnConfiguration(conf);
- InetSocketAddress rmAddress =
- NetUtils.createSocketAddr(yarnConf.get(
- YarnConfiguration.RM_ADDRESS,
- YarnConfiguration.DEFAULT_RM_ADDRESS));
- LOG.info("Connecting to ResourceManager at " + rmAddress);
- configuration appsManagerServerConf = new Configuration(conf);
- appsManagerServerConf.setClass(
- YarnConfiguration.YARN_SECURITY_INFO,
- ClientRMSecurityInfo.class, SecurityInfo.class);
- applicationsManager = ((ApplicationClientProtocol) rpc.getProxy(
- ApplicationClientProtocol.class, rmAddress, appsManagerServerConf));
+ YarnClient yarnClient = YarnClient.createYarnClient();
+ yarnClient.init(conf);
+ yarnClient.start();
+---+

- * Once a handle is obtained to the ASM, the client needs to request the
- ResourceManager for a new ApplicationId.
+ * Once a client is set up, the client needs to create an application, and get
+ its application id.

+---+
- GetNewApplicationRequest request =
- Records.newRecord(GetNewApplicationRequest.class);
- GetNewApplicationResponse response =
- applicationsManager.getNewApplication(request);
- LOG.info("Got new ApplicationId=" + response.getApplicationId());
+ YarnClientApplication app = yarnClient.createApplication();
+ GetNewApplicationResponse appResponse = app.getNewApplicationResponse();
+---+

- * The response from the ASM for a new application also contains information
- about the cluster such as the minimum/maximum resource capabilities of the
- cluster. This is required so that to ensure that you can correctly set the
- specifications of the container in which the ApplicationMaster would be
- launched. Please refer to GetNewApplicationResponse for more details.
+ * The response from the <<<YarnClientApplication>>> for a new application also
+ contains information about the cluster such as the minimum/maximum resource
+ capabilities of the cluster. This is required to ensure that you can
+ correctly set the specifications of the container in which the
+ ApplicationMaster would be launched. Please refer to
+ <<<GetNewApplicationResponse>>> for more details.
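+
+ For example, the maximum resource capability from the response can be used
+ to cap the AM container's ask (a sketch; <<<amMemory>>> is assumed from the
+ surrounding client code):
+
++---+
+ int maxMem = appResponse.getMaximumResourceCapability().getMemory();
+ if (amMemory > maxMem) {
+   // never ask for more memory than the cluster can offer
+   amMemory = maxMem;
+ }
++---+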
+
+ * The main crux of a client is to set up the <<<ApplicationSubmissionContext>>>
+ which defines all the information needed by the RM to launch the AM. A client
+ needs to set the following into the context:
+
+ * Application info: id, name
+
+ * Queue, priority info: Queue to which the application will be submitted,
+ the priority to be assigned for the application.
+
+ * User: The user submitting the application
+
+ * <<<ContainerLaunchContext>>>: The information defining the container in
+ which the AM will be launched and run. The <<<ContainerLaunchContext>>>, as
+ mentioned previously, defines all the required information needed to run
+ the application such as the local <<R>>esources (binaries, jars, files
+ etc.), <<E>>nvironment settings (CLASSPATH etc.), the <<C>>ommand to be
+ executed and security <<T>>okens (<RECT>).
+
+ []

- * The main crux of a client is to setup the ApplicationSubmissionContext
- which defines all the information needed by the ResourceManager to launch
- the ApplicationMaster. A client needs to set the following into the context:
-
- * Application Info: id, name
-
- * Queue, Priority info: Queue to which the application will be submitted,
- the priority to be assigned for the application.
-
- * User: The user submitting the application
-
- * ContainerLaunchContext: The information defining the container in which
- the ApplicationMaster will be launched and run. The
- ContainerLaunchContext, as mentioned previously, defines all the required
- information needed to run the ApplicationMaster such as the local
- resources (binaries, jars, files etc.), security tokens, environment
- settings (CLASSPATH etc.) and the command to be executed.
-
- []
-
-+---+
- // Create a new ApplicationSubmissionContext
- ApplicationSubmissionContext appContext =
- Records.newRecord(ApplicationSubmissionContext.class);
- // set the ApplicationId
- appContext.setApplicationId(appId);
- // set the application name
- appContext.setApplicationName(appName);
-
- // Create a new container launch context for the AM's container
- ContainerLaunchContext amContainer =
- Records.newRecord(ContainerLaunchContext.class);
-
- // Define the local resources required
- Map<String, LocalResource> localResources =
- new HashMap<String, LocalResource>();
- // Lets assume the jar we need for our ApplicationMaster is available in
- // HDFS at a certain known path to us and we want to make it available to
- // the ApplicationMaster in the launched container
- Path jarPath; // <- known path to jar file
- FileStatus jarStatus = fs.getFileStatus(jarPath);
- LocalResource amJarRsrc = Records.newRecord(LocalResource.class);
- // Set the type of resource - file or archive
- // archives are untarred at the destination by the framework
- amJarRsrc.setType(LocalResourceType.FILE);
- // Set visibility of the resource
- // Setting to most private option i.e. this file will only
- // be visible to this instance of the running application
- amJarRsrc.setVisibility(LocalResourceVisibility.APPLICATION);
- // Set the location of resource to be copied over into the
- // working directory
- amJarRsrc.setResource(ConverterUtils.getYarnUrlFromPath(jarPath));
- // Set timestamp and length of file so that the framework
- // can do basic sanity checks for the local resource
- // after it has been copied over to ensure it is the same
- // resource the client intended to use with the application
- amJarRsrc.setTimestamp(jarStatus.getModificationTime());
- amJarRsrc.setSize(jarStatus.getLen());
- // The framework will create a symlink called AppMaster.jar in the
- // working directory that will be linked back to the actual file.
- // The ApplicationMaster, if needs to reference the jar file, would
- // need to use the symlink filename.
- localResources.put("AppMaster.jar", amJarRsrc);
- // Set the local resources into the launch context
- amContainer.setLocalResources(localResources);
-
- // Set up the environment needed for the launch context
- Map<String, String> env = new HashMap<String, String>();
- // For example, we could setup the classpath needed.
- // Assuming our classes or jars are available as local resources in the
- // working directory from which the command will be run, we need to append
- // "." to the path.
- // By default, all the hadoop specific classpaths will already be available
- // in $CLASSPATH, so we should be careful not to overwrite it.
- String classPathEnv = "$CLASSPATH:./*:";
- env.put("CLASSPATH", classPathEnv);
- amContainer.setEnvironment(env);
-
- // Construct the command to be executed on the launched container
- String command =
- "${JAVA_HOME}" + /bin/java" +
- " MyAppMaster" +
- " arg1 arg2 arg3" +
- " 1>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stdout" +
- " 2>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stderr";
-
- List<String> commands = new ArrayList<String>();
- commands.add(command);
- // add additional commands if needed
-
- // Set the command array into the container spec
- amContainer.setCommands(commands);
-
- // Define the resource requirements for the container
- // For now, YARN only supports memory so we set the memory
- // requirements.
- // If the process takes more than its allocated memory, it will
- // be killed by the framework.
- // Memory being requested for should be less than max capability
- // of the cluster and all asks should be a multiple of the min capability.
- Resource capability = Records.newRecord(Resource.class);
- capability.setMemory(amMemory);
- amContainer.setResource(capability);
-
- // Set the container launch content into the ApplicationSubmissionContext
- appContext.setAMContainerSpec(amContainer);
-+---+
-
- * After the setup process is complete, the client is finally ready to submit
- the application to the ASM.
-
-+---+
- // Create the request to send to the ApplicationsManager
- SubmitApplicationRequest appRequest =
- Records.newRecord(SubmitApplicationRequest.class);
- appRequest.setApplicationSubmissionContext(appContext);
-
- // Submit the application to the ApplicationsManager
- // Ignore the response as either a valid response object is returned on
- // success or an exception thrown to denote the failure
- applicationsManager.submitApplication(appRequest);
-+---+
-
- * At this point, the ResourceManager will have accepted the application and
- in the background, will go through the process of allocating a container
- with the required specifications and then eventually setting up and
- launching the ApplicationMaster on the allocated container.
-
- * There are multiple ways a client can track progress of the actual task.
-
- * It can communicate with the ResourceManager and request for a report of
- the application via ApplicationClientProtocol#getApplicationReport.
-
-+-----+
- GetApplicationReportRequest reportRequest =
- Records.newRecord(GetApplicationReportRequest.class);
- reportRequest.setApplicationId(appId);
- GetApplicationReportResponse reportResponse =
- applicationsManager.getApplicationReport(reportRequest);
- ApplicationReport report = reportResponse.getApplicationReport();
-+-----+
-
- The ApplicationReport received from the ResourceManager consists of the following:
-
- * General application information: ApplicationId, queue to which the
- application was submitted, user who submitted the application and the
- start time for the application.
-
- * ApplicationMaster details: the host on which the ApplicationMaster is
- running, the rpc port (if any) on which it is listening for requests
- from clients and a token that the client needs to communicate with
- the ApplicationMaster.
-
- * Application tracking information: If the application supports some
- form of progress tracking, it can set a tracking url which is
- available via ApplicationReport#getTrackingUrl that a client can look
- at to monitor progress.
-
- * ApplicationStatus: The state of the application as seen by the
- ResourceManager is available via
- ApplicationReport#getYarnApplicationState. If the
- YarnApplicationState is set to FINISHED, the client should refer to
- ApplicationReport#getFinalApplicationStatus to check for the actual
- success/failure of the application task itself. In case of failures,
- ApplicationReport#getDiagnostics may be useful to shed some more
- light on the the failure.
-
- * If the ApplicationMaster supports it, a client can directly query the
- ApplicationMaster itself for progress updates via the host:rpcport
- information obtained from the ApplicationReport. It can also use the
- tracking url obtained from the report if available.
-
- * In certain situations, if the application is taking too long or due to
- other factors, the client may wish to kill the application. The
- ApplicationClientProtocol supports the forceKillApplication call that allows a
- client to send a kill signal to the ApplicationMaster via the
- ResourceManager. An ApplicationMaster if so designed may also support an
- abort call via its rpc layer that a client may be able to leverage.
-
-+---+
- KillApplicationRequest killRequest =
- Records.newRecord(KillApplicationRequest.class);
- killRequest.setApplicationId(appId);
- applicationsManager.forceKillApplication(killRequest);
-+---+
-
-** Writing an ApplicationMaster
-
- * The ApplicationMaster is the actual owner of the job. It will be launched
- by the ResourceManager and via the client will be provided all the necessary
- information and resources about the job that it has been tasked with to
- oversee and complete.
-
- * As the ApplicationMaster is launched within a container that may (likely
- will) be sharing a physical host with other containers, given the
- multi-tenancy nature, amongst other issues, it cannot make any assumptions
- of things like pre-configured ports that it can listen on.
-
- * When the ApplicationMaster starts up, several parameters are made available
- to it via the environment. These include the ContainerId for the
- ApplicationMaster container, the application submission time and details
- about the NodeManager host running the Application Master.
- Ref ApplicationConstants for parameter names.
-
- * All interactions with the ResourceManager require an ApplicationAttemptId
- (there can be multiple attempts per application in case of failures). The
- ApplicationAttemptId can be obtained from the ApplicationMaster
- containerId. There are helper apis to convert the value obtained from the
- environment into objects.
-
+---+
- Map<String, String> envs = System.getenv();
- String containerIdString =
- envs.get(ApplicationConstants.AM_CONTAINER_ID_ENV);
- if (containerIdString == null) {
- // container id should always be set in the env by the framework
- throw new IllegalArgumentException(
- "ContainerId not set in the environment");
+ // set the application submission context
+ ApplicationSubmissionContext appContext = app.getApplicationSubmissionContext();
+ ApplicationId appId = appContext.getApplicationId();
+
+ appContext.setKeepContainersAcrossApplicationAttempts(keepContainers);
+ appContext.setApplicationName(appName);
+
+ // set local resources for the application master
+ // local files or archives as needed
+ // In this scenario, the jar file for the application master is part of the local resources
+ Map<String, LocalResource> localResources = new HashMap<String, LocalResource>();
+
+ LOG.info("Copy App Master jar from local filesystem and add to local environment");
+ // Copy the application master jar to the filesystem
+ // Create a local resource to point to the destination jar path
+ FileSystem fs = FileSystem.get(conf);
+ addToLocalResources(fs, appMasterJar, appMasterJarPath, appId.toString(),
+ localResources, null);
+
+ // Set the log4j properties if needed
+ if (!log4jPropFile.isEmpty()) {
+ addToLocalResources(fs, log4jPropFile, log4jPath, appId.toString(),
+ localResources, null);
+ }
+
+ // The shell script has to be made available on the final container(s)
+ // where it will be executed.
+ // To do this, we need to first copy into the filesystem that is visible
+ // to the yarn framework.
+ // We do not need to set this as a local resource for the application
+ // master as the application master does not need it.
+ String hdfsShellScriptLocation = "";
+ long hdfsShellScriptLen = 0;
+ long hdfsShellScriptTimestamp = 0;
+ if (!shellScriptPath.isEmpty()) {
+ Path shellSrc = new Path(shellScriptPath);
+ String shellPathSuffix =
+ appName + "/" + appId.toString() + "/" + SCRIPT_PATH;
+ Path shellDst =
+ new Path(fs.getHomeDirectory(), shellPathSuffix);
+ fs.copyFromLocalFile(false, true, shellSrc, shellDst);
+ hdfsShellScriptLocation = shellDst.toUri().toString();
+ FileStatus shellFileStatus = fs.getFileStatus(shellDst);
+ hdfsShellScriptLen = shellFileStatus.getLen();
+ hdfsShellScriptTimestamp = shellFileStatus.getModificationTime();
+ }
+
+ if (!shellCommand.isEmpty()) {
+ addToLocalResources(fs, null, shellCommandPath, appId.toString(),
+ localResources, shellCommand);
+ }
+
+ if (shellArgs.length > 0) {
+ addToLocalResources(fs, null, shellArgsPath, appId.toString(),
+ localResources, StringUtils.join(shellArgs, " "));
+ }
+
+ // Set the env variables to be setup in the env where the application master will be run
+ LOG.info("Set the environment for the application master");
+ Map<String, String> env = new HashMap<String, String>();
+
+ // put location of shell script into env
+ // using the env info, the application master will create the correct local resource for the
+ // eventual containers that will be launched to execute the shell scripts
+ env.put(DSConstants.DISTRIBUTEDSHELLSCRIPTLOCATION, hdfsShellScriptLocation);
+ env.put(DSConstants.DISTRIBUTEDSHELLSCRIPTTIMESTAMP, Long.toString(hdfsShellScriptTimestamp));
+ env.put(DSConstants.DISTRIBUTEDSHELLSCRIPTLEN, Long.toString(hdfsShellScriptLen));
+
+ // Add AppMaster.jar location to classpath
+ // At some point we should not be required to add
+ // the hadoop specific classpaths to the env.
+ // It should be provided out of the box.
+ // For now setting all required classpaths including
+ // the classpath to "." for the application jar
+ StringBuilder classPathEnv = new StringBuilder(Environment.CLASSPATH.$$())
+ .append(ApplicationConstants.CLASS_PATH_SEPARATOR).append("./*");
+ for (String c : conf.getStrings(
+ YarnConfiguration.YARN_APPLICATION_CLASSPATH,
+ YarnConfiguration.DEFAULT_YARN_CROSS_PLATFORM_APPLICATION_CLASSPATH)) {
+ classPathEnv.append(ApplicationConstants.CLASS_PATH_SEPARATOR);
+ classPathEnv.append(c.trim());
+ }
+ classPathEnv.append(ApplicationConstants.CLASS_PATH_SEPARATOR).append(
+ "./log4j.properties");
+
+ // Set the necessary command to execute the application master
+ Vector<CharSequence> vargs = new Vector<CharSequence>(30);
+
+ // Set java executable command
+ LOG.info("Setting up app master command");
+ vargs.add(Environment.JAVA_HOME.$$() + "/bin/java");
+ // Set Xmx based on am memory size
+ vargs.add("-Xmx" + amMemory + "m");
+ // Set class name
+ vargs.add(appMasterMainClass);
+ // Set params for Application Master
+ vargs.add("--container_memory " + String.valueOf(containerMemory));
+ vargs.add("--container_vcores " + String.valueOf(containerVirtualCores));
+ vargs.add("--num_containers " + String.valueOf(numContainers));
+ vargs.add("--priority " + String.valueOf(shellCmdPriority));
+
+ for (Map.Entry<String, String> entry : shellEnv.entrySet()) {
+ vargs.add("--shell_env " + entry.getKey() + "=" + entry.getValue());
+ }
+ if (debugFlag) {
+ vargs.add("--debug");
+ }
+
+ vargs.add("1>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/AppMaster.stdout");
+ vargs.add("2>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/AppMaster.stderr");
+
+ // Get final command
+ StringBuilder command = new StringBuilder();
+ for (CharSequence str : vargs) {
+ command.append(str).append(" ");
+ }
+
+ LOG.info("Completed setting up app master command " + command.toString());
+ List<String> commands = new ArrayList<String>();
+ commands.add(command.toString());
+
+ // Set up the container launch context for the application master
+ ContainerLaunchContext amContainer = ContainerLaunchContext.newInstance(
+ localResources, env, commands, null, null, null);
+
+ // Set up resource type requirements
+ // For now, both memory and vcores are supported, so we set memory and
+ // vcores requirements
+ Resource capability = Resource.newInstance(amMemory, amVCores);
+ appContext.setResource(capability);
+
+ // Service data is a binary blob that can be passed to the application
+ // Not needed in this scenario
+ // amContainer.setServiceData(serviceData);
+
+ // Setup security tokens
+ if (UserGroupInformation.isSecurityEnabled()) {
+ // Note: Credentials class is marked as LimitedPrivate for HDFS and MapReduce
+ Credentials credentials = new Credentials();
+ String tokenRenewer = conf.get(YarnConfiguration.RM_PRINCIPAL);
+ if (tokenRenewer == null || tokenRenewer.length() == 0) {
+ throw new IOException(
+ "Can't get Master Kerberos principal for the RM to use as renewer");
}
- ContainerId containerId = ConverterUtils.toContainerId(containerIdString);
- ApplicationAttemptId appAttemptID = containerId.getApplicationAttemptId();
-+---+
-
- * After an ApplicationMaster has initialized itself completely, it needs to
- register with the ResourceManager via
- ApplicationMasterProtocol#registerApplicationMaster. The ApplicationMaster always
- communicate via the Scheduler interface of the ResourceManager.
-
+
+ // For now, only getting tokens for the default file-system.
+ final Token<?> tokens[] =
+ fs.addDelegationTokens(tokenRenewer, credentials);
+ if (tokens != null) {
+ for (Token<?> token : tokens) {
+ LOG.info("Got dt for " + fs.getUri() + "; " + token);
+ }
+ }
+ DataOutputBuffer dob = new DataOutputBuffer();
+ credentials.writeTokenStorageToStream(dob);
+ ByteBuffer fsTokens = ByteBuffer.wrap(dob.getData(), 0, dob.getLength());
+ amContainer.setTokens(fsTokens);
+ }
+
+ appContext.setAMContainerSpec(amContainer);
+---+
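+
+ The <<<addToLocalResources()>>> helper used above is application code from
+ the distributed shell example, not a YARN API. A simplified sketch of what
+ such a helper might do (copy a local file, or write literal string content,
+ to HDFS, and register the result as a <<<LocalResource>>>) is:
+
++---+
+ private void addToLocalResources(FileSystem fs, String fileSrcPath,
+     String fileDstPath, String appId, Map<String, LocalResource> localResources,
+     String resources) throws IOException {
+   String suffix = appName + "/" + appId + "/" + fileDstPath;
+   Path dst = new Path(fs.getHomeDirectory(), suffix);
+   if (fileSrcPath == null) {
+     // no source file given: write the string content directly to HDFS
+     FSDataOutputStream ostream = null;
+     try {
+       ostream = FileSystem.create(fs, dst, new FsPermission((short) 0710));
+       ostream.writeUTF(resources);
+     } finally {
+       IOUtils.closeStream(ostream);
+     }
+   } else {
+     fs.copyFromLocalFile(new Path(fileSrcPath), dst);
+   }
+   FileStatus scFileStatus = fs.getFileStatus(dst);
+   LocalResource scRsrc =
+       LocalResource.newInstance(
+           ConverterUtils.getYarnUrlFromURI(dst.toUri()),
+           LocalResourceType.FILE, LocalResourceVisibility.APPLICATION,
+           scFileStatus.getLen(), scFileStatus.getModificationTime());
+   localResources.put(fileDstPath, scRsrc);
+ }
++---+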
- // Connect to the Scheduler of the ResourceManager.
- YarnConfiguration yarnConf = new YarnConfiguration(conf);
- InetSocketAddress rmAddress =
- NetUtils.createSocketAddr(yarnConf.get(
- YarnConfiguration.RM_SCHEDULER_ADDRESS,
- YarnConfiguration.DEFAULT_RM_SCHEDULER_ADDRESS));
- LOG.info("Connecting to ResourceManager at " + rmAddress);
- ApplicationMasterProtocol resourceManager =
- (ApplicationMasterProtocol) rpc.getProxy(ApplicationMasterProtocol.class, rmAddress, conf);
-
- // Register the AM with the RM
- // Set the required info into the registration request:
- // ApplicationAttemptId,
- // host on which the app master is running
- // rpc port on which the app master accepts requests from the client
- // tracking url for the client to track app master progress
- RegisterApplicationMasterRequest appMasterRequest =
- Records.newRecord(RegisterApplicationMasterRequest.class);
- appMasterRequest.setApplicationAttemptId(appAttemptID);
- appMasterRequest.setHost(appMasterHostname);
- appMasterRequest.setRpcPort(appMasterRpcPort);
- appMasterRequest.setTrackingUrl(appMasterTrackingUrl);
-
- // The registration response is useful as it provides information about the
- // cluster.
- // Similar to the GetNewApplicationResponse in the client, it provides
- // information about the min/mx resource capabilities of the cluster that
- // would be needed by the ApplicationMaster when requesting for containers.
- RegisterApplicationMasterResponse response =
- resourceManager.registerApplicationMaster(appMasterRequest);
-+---+
-
- * The ApplicationMaster has to emit heartbeats to the ResourceManager to keep
- it informed that the ApplicationMaster is alive and still running. The
- timeout expiry interval at the ResourceManager is defined by a config
- setting accessible via YarnConfiguration.RM_AM_EXPIRY_INTERVAL_MS with the
- default being defined by YarnConfiguration.DEFAULT_RM_AM_EXPIRY_INTERVAL_MS.
- The ApplicationMasterProtocol#allocate calls to the ResourceManager count as heartbeats
- as it also supports sending progress update information. Therefore, an
- allocate call with no containers requested and progress information updated
- if any is a valid way for making heartbeat calls to the ResourceManager.
-
- * Based on the task requirements, the ApplicationMaster can ask for a set of
- containers to run its tasks on. The ApplicationMaster has to use the
- ResourceRequest class to define the following container specifications:
-
- * Hostname: If containers are required to be hosted on a particular rack or
- a specific host. '*' is a special value that implies any host will do.
-
- * Resource capability: Currently, YARN only supports memory based resource
- requirements so the request should define how much memory is needed. The
- value is defined in MB and has to less than the max capability of the
+
+ * After the setup process is complete, the client is ready to submit
+ the application with specified priority and queue.
+
++---+
+ // Set the priority for the application master
+ Priority pri = Priority.newInstance(amPriority);
+ appContext.setPriority(pri);
+
+ // Set the queue to which this application is to be submitted in the RM
+ appContext.setQueue(amQueue);
+
+ // Submit the application to the applications manager
+ // SubmitApplicationResponse submitResp = applicationsManager.submitApplication(appRequest);
+
+ yarnClient.submitApplication(appContext);
++---+
+
+ * At this point, the RM will have accepted the application and in the
+ background, will go through the process of allocating a container with the
+ required specifications and then eventually setting up and launching the AM
+ on the allocated container.
+
+ * There are multiple ways a client can track progress of the actual task.
+
+ * It can communicate with the RM and request for a report of the application
+ via the <<<getApplicationReport()>>> method of <<<YarnClient>>>.
+
++-----+
+ // Get application report for the appId we are interested in
+ ApplicationReport report = yarnClient.getApplicationReport(appId);
++-----+
+
+ The <<<ApplicationReport>>> received from the RM consists of the following:
+
+ * General application information: Application id, queue to which the
+ application was submitted, user who submitted the application and the
+ start time for the application.
+
+ * ApplicationMaster details: the host on which the AM is running, the
+ rpc port (if any) on which it is listening for requests from clients
+ and a token that the client needs to communicate with the AM.
+
+ * Application tracking information: If the application supports some form
+ of progress tracking, it can set a tracking url which is available via
+ <<<ApplicationReport>>>'s <<<getTrackingUrl()>>> method that a client
+ can look at to monitor progress.
+
+ * Application status: The state of the application as seen by the
+ ResourceManager is available via
+ <<<ApplicationReport#getYarnApplicationState>>>. If the
+ <<<YarnApplicationState>>> is set to <<<FINISHED>>>, the client should
+ refer to <<<ApplicationReport#getFinalApplicationStatus>>> to check for
+ the actual success/failure of the application task itself. In case of
+ failures, <<<ApplicationReport#getDiagnostics>>> may be useful to shed
+ some more light on the failure. (A sketch of polling this state until
+ the application terminates is shown at the end of this sub-section.)
+
+ * If the ApplicationMaster supports it, a client can directly query the AM
+ itself for progress updates via the host:rpcport information obtained from
+ the application report. It can also use the tracking url obtained from the
+ report if available.
+
+ * In certain situations, if the application is taking too long or due to other
+ factors, the client may wish to kill the application. <<<YarnClient>>>
+ supports the <<<killApplication>>> call that allows a client to send a kill
+ signal to the AM via the ResourceManager. An ApplicationMaster, if so
+ designed, may also support an abort call via its rpc layer that a client may
+ be able to leverage.
+
++---+
+ yarnClient.killApplication(appId);
++---+
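+
+ As referenced above, a client that wants to block until the application
+ terminates can simply poll the report in a loop; a minimal sketch:
+
++---+
+ ApplicationReport report = yarnClient.getApplicationReport(appId);
+ YarnApplicationState state = report.getYarnApplicationState();
+ while (state != YarnApplicationState.FINISHED
+     && state != YarnApplicationState.KILLED
+     && state != YarnApplicationState.FAILED) {
+   Thread.sleep(1000);
+   report = yarnClient.getApplicationReport(appId);
+   state = report.getYarnApplicationState();
+ }
+ LOG.info("Application finished with state " + state + " and final status "
+     + report.getFinalApplicationStatus());
++---+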
+
+** Writing an ApplicationMaster (AM)
+
+ * The AM is the actual owner of the job. It will be launched
+ by the RM and via the client will be provided all the
+ necessary information and resources about the job that it has been tasked
+ with to oversee and complete.
+
+ * As the AM is launched within a container that may (likely
+ will) be sharing a physical host with other containers, given the
+ multi-tenancy nature, amongst other issues, it cannot make any assumptions
+ of things like pre-configured ports that it can listen on.
+
+ * When the AM starts up, several parameters are made available
+ to it via the environment. These include the <<<ContainerId>>> for the
+ AM container, the application submission time and details
+ about the NM (NodeManager) host running the ApplicationMaster.
+ Ref <<<ApplicationConstants>>> for parameter names.
+
+ * All interactions with the RM require an <<<ApplicationAttemptId>>> (there can
+ be multiple attempts per application in case of failures). The
+ <<<ApplicationAttemptId>>> can be obtained from the AM's container id. There
+ are helper APIs to convert the value obtained from the environment into
+ objects.
+
++---+
+ Map<String, String> envs = System.getenv();
+ String containerIdString =
+ envs.get(ApplicationConstants.AM_CONTAINER_ID_ENV);
+ if (containerIdString == null) {
+ // container id should always be set in the env by the framework
+ throw new IllegalArgumentException(
+ "ContainerId not set in the environment");
+ }
+ ContainerId containerId = ConverterUtils.toContainerId(containerIdString);
+ ApplicationAttemptId appAttemptID = containerId.getApplicationAttemptId();
++---+
+
+ * After an AM has initialized itself completely, we can start the two clients:
+ one for the ResourceManager, and one for the NodeManagers. We set them up
+ with our customized event handlers; we will talk about those event handlers
+ in detail later in this article.
+
++---+
+ AMRMClientAsync.CallbackHandler allocListener = new RMCallbackHandler();
+ amRMClient = AMRMClientAsync.createAMRMClientAsync(1000, allocListener);
+ amRMClient.init(conf);
+ amRMClient.start();
+
+ containerListener = createNMCallbackHandler();
+ nmClientAsync = new NMClientAsyncImpl(containerListener);
+ nmClientAsync.init(conf);
+ nmClientAsync.start();
++---+
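+
+ The <<<createNMCallbackHandler()>>> factory above is application code, not
+ part of the YARN API. A minimal, logging-only sketch of such a handler (the
+ distributed shell version additionally tracks containers and failure counts)
+ might be:
+
++---+
+ private NMCallbackHandler createNMCallbackHandler() {
+   return new NMCallbackHandler();
+ }
+
+ private class NMCallbackHandler implements NMClientAsync.CallbackHandler {
+
+   @Override
+   public void onContainerStarted(ContainerId containerId,
+       Map<String, ByteBuffer> allServiceResponse) {
+     LOG.info("Container started: " + containerId);
+   }
+
+   @Override
+   public void onContainerStatusReceived(ContainerId containerId,
+       ContainerStatus containerStatus) {
+     LOG.info("Status for " + containerId + ": " + containerStatus.getState());
+   }
+
+   @Override
+   public void onContainerStopped(ContainerId containerId) {
+     LOG.info("Container stopped: " + containerId);
+   }
+
+   @Override
+   public void onStartContainerError(ContainerId containerId, Throwable t) {
+     LOG.error("Failed to start container " + containerId, t);
+   }
+
+   @Override
+   public void onGetContainerStatusError(ContainerId containerId, Throwable t) {
+     LOG.error("Failed to query status of container " + containerId, t);
+   }
+
+   @Override
+   public void onStopContainerError(ContainerId containerId, Throwable t) {
+     LOG.error("Failed to stop container " + containerId, t);
+   }
+ }
++---+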
+
+ * The AM has to emit heartbeats to the RM to keep it informed that the AM is
+ alive and still running. The timeout expiry interval at the RM is defined by
+ a config setting accessible via
+ <<<YarnConfiguration.RM_AM_EXPIRY_INTERVAL_MS>>> with the default being
+ defined by <<<YarnConfiguration.DEFAULT_RM_AM_EXPIRY_INTERVAL_MS>>>. The
+ ApplicationMaster needs to register itself with the ResourceManager to
+ start heartbeating.
+
++---+
+ // Register self with ResourceManager
+ // This will start heartbeating to the RM
+ appMasterHostname = NetUtils.getHostname();
+ RegisterApplicationMasterResponse response = amRMClient
+ .registerApplicationMaster(appMasterHostname, appMasterRpcPort,
+ appMasterTrackingUrl);
++---+
+
+ * The response of the registration includes the maximum resource capability
+ of the cluster. You may want to use this to check the application's request.
+
++---+
+ // Dump out information about cluster capability as seen by the
+ // resource manager
+ int maxMem = response.getMaximumResourceCapability().getMemory();
+ LOG.info("Max mem capability of resources in this cluster " + maxMem);
+
+ int maxVCores = response.getMaximumResourceCapability().getVirtualCores();
+ LOG.info("Max vcores capability of resources in this cluster " + maxVCores);
+
+ // A resource ask cannot exceed the max.
+ if (containerMemory > maxMem) {
+ LOG.info("Container memory specified above max threshold of cluster."
+ + " Using max value." + ", specified=" + containerMemory + ", max="
+ + maxMem);
+ containerMemory = maxMem;
+ }
+
+ if (containerVirtualCores > maxVCores) {
+ LOG.info("Container virtual cores specified above max threshold of cluster."
+ + " Using max value." + ", specified=" + containerVirtualCores + ", max="
+ + maxVCores);
+ containerVirtualCores = maxVCores;
+ }
++---+
+
+ * Based on the task requirements, the AM can ask for a set of containers to run
+ its tasks on. We can now calculate how many containers we need, and request
+ that many containers.
+
++---+
+ List<Container> previousAMRunningContainers =
+ response.getContainersFromPreviousAttempts();
+ LOG.info("Received " + previousAMRunningContainers.size()
+ + " previous AM's running containers on AM registration.");
+
+ int numTotalContainersToRequest =
+ numTotalContainers - previousAMRunningContainers.size();
+ // Setup ask for containers from RM
+ // Send request for containers to RM
+ // Until we get our fully allocated quota, we keep on polling RM for
+ // containers
+ // Keep looping until all the containers are launched and shell script
+ // executed on them (regardless of success/failure).
+ for (int i = 0; i < numTotalContainersToRequest; ++i) {
+ ContainerRequest containerAsk = setupContainerAskForRM();
+ amRMClient.addContainerRequest(containerAsk);
+ }
++---+
+
+ * In <<<setupContainerAskForRM()>>>, the following two things need to be set up:
+
+ * Resource capability: Currently, YARN supports memory based resource
+ requirements so the request should define how much memory is needed. The
+ value is defined in MB and has to be less than the max capability of the
cluster and an exact multiple of the min capability. Memory resources
- correspond to physical memory limits imposed on the task containers.
-
- * Priority: When asking for sets of containers, an ApplicationMaster may
- define different priorities to each set. For example, the Map-Reduce
- ApplicationMaster may assign a higher priority to containers needed
- for the Map tasks and a lower priority for the Reduce tasks' containers.
-
- []
-
-+----+
- // Resource Request
- ResourceRequest rsrcRequest = Records.newRecord(ResourceRequest.class);
-
- // setup requirements for hosts
- // whether a particular rack/host is needed
- // useful for applications that are sensitive
- // to data locality
- rsrcRequest.setHostName("*");
+ correspond to physical memory limits imposed on the task containers. YARN
+ also supports computation based resources (vCores), as shown in the code.
+
+ * Priority: When asking for sets of containers, an AM may define different
+ priorities to each set. For example, the Map-Reduce AM may assign a higher
+ priority to containers needed for the Map tasks and a lower priority for
+ the Reduce tasks' containers.
+
+ []
+
++---+
+ private ContainerRequest setupContainerAskForRM() {
+ // setup requirements for hosts
+ // using * as any host will do for the distributed shell app
// set the priority for the request
- Priority pri = Records.newRecord(Priority.class);
- pri.setPriority(requestPriority);
- rsrcRequest.setPriority(pri);
+ Priority pri = Priority.newInstance(requestPriority);

// Set up resource type requirements
- // For now, only memory is supported so we set memory requirements
- Resource capability = Records.newRecord(Resource.class);
- capability.setMemory(containerMemory);
- rsrcRequest.setCapability(capability);
-
- // set no. of containers needed
- // matching the specifications
- rsrcRequest.setNumContainers(numContainers);
-+---+
-
- * After defining the container requirements, the ApplicationMaster has to
- construct an AllocateRequest to send to the ResourceManager.
- The AllocateRequest consists of:
-
- * Requested containers: The container specifications and the no. of
- containers being requested for by the ApplicationMaster from the
- ResourceManager.
-
- * Released containers: There may be situations when the ApplicationMaster
- may have requested for more containers that it needs or due to failure
- issues, decide to use other containers allocated to it. In all such
- situations, it is beneficial to the cluster if the ApplicationMaster
- releases these containers back to the ResourceManager so that they can be
- re-allocated to other applications.
-
- * ResponseId: The response id that will be sent back in the response from
- the allocate call.
-
- * Progress update information: The ApplicationMaster can send its progress
- update (range between to 0 to 1) to the ResourceManager.
-
- []
-
+ // For now, memory and CPU are supported so we set memory and cpu requirements
+ Resource capability = Resource.newInstance(containerMemory,
+ containerVirtualCores);
+
+ ContainerRequest request = new ContainerRequest(capability, null, null,
+ pri);
+ LOG.info("Requested container ask: " + request.toString());
+ return request;
+ }
+---+
- List<ResourceRequest> requestedContainers;
- List<ContainerId> releasedContainers
- AllocateRequest req = Records.newRecord(AllocateRequest.class);

- // The response id set in the request will be sent back in
- // the response so that the ApplicationMaster can
- // match it to its original ask and act appropriately.
- req.setResponseId(rmRequestID);
-
- // Set ApplicationAttemptId
- req.setApplicationAttemptId(appAttemptID);
-
- // Add the list of containers being asked for
- req.addAllAsks(requestedContainers);
-
- // If the ApplicationMaster has no need for certain
- // containers due to over-allocation or for any other
- // reason, it can release them back to the ResourceManager
- req.addAllReleases(releasedContainers);
-
- // Assuming the ApplicationMaster can track its progress
- req.setProgress(currentProgress);
-
- AllocateResponse allocateResponse = resourceManager.allocate(req);
+ * After container allocation requests have been sent by the ApplicationMaster,
+ containers will be launched asynchronously, by the event handler of the
+ <<<AMRMClientAsync>>> client. The handler should implement the
+ <<<AMRMClientAsync.CallbackHandler>>> interface.
+
+ * When there are containers allocated, the handler sets up a thread that runs
+ the code to launch containers. Here we use the name
+ <<<LaunchContainerRunnable>>> to demonstrate. We will talk about the
+ <<<LaunchContainerRunnable>>> class in the following part of this article.
+
+---+
-
- * The AllocateResponse sent back from the ResourceManager provides the
- following information:
-
- * Reboot flag: For scenarios when the ApplicationMaster may get out of sync
- with the ResourceManager.
-
- * Allocated containers: The containers that have been allocated to the
- ApplicationMaster.
-
- * Headroom: Headroom for resources in the cluster. Based on this information
- and knowing its needs, an ApplicationMaster can make intelligent decisions
- such as re-prioritizing sub-tasks to take advantage of currently allocated
- containers, bailing out faster if resources are not becoming available
- etc.
-
- * Completed containers: Once an ApplicationMaster triggers a launch an
- allocated container, it will receive an update from the ResourceManager
- when the container completes. The ApplicationMaster can look into the
- status of the completed container and take appropriate actions such as
- re-trying a particular sub-task in case of a failure.
-
- * Number of cluster nodes: The number of hosts available on the cluster.
-
- []
-
- One thing to note is that containers will not be immediately allocated to
- the ApplicationMaster. This does not imply that the ApplicationMaster should
- keep on asking the pending count of required containers. Once an allocate
- request has been sent, the ApplicationMaster will eventually be allocated
- the containers based on cluster capacity, priorities and the scheduling
- policy in place. The ApplicationMaster should only request for containers
- again if and only if its original estimate changed and it needs additional
- containers.
-
-+---+
-
- // Retrieve list of allocated containers from the response
- // and on each allocated container, lets assume we are launching
- // the same job.
- List<Container> allocatedContainers = allocateResponse.getAllocatedContainers();
+ @Override
+ public void onContainersAllocated(List<Container> allocatedContainers) {
+ LOG.info("Got response from RM for container ask, allocatedCnt="
+ + allocatedContainers.size());
+ numAllocatedContainers.addAndGet(allocatedContainers.size());
for (Container allocatedContainer : allocatedContainers) {
- LOG.info("Launching shell command on a new container."
- + ", containerId=" + allocatedContainer.getId()
- + ", containerNode=" + allocatedContainer.getNodeId().getHost()
- + ":" + allocatedContainer.getNodeId().getPort()
- + ", containerNodeURI=" + allocatedContainer.getNodeHttpAddress()
- + ", containerState" + allocatedContainer.getState()
- + ", containerResourceMemory"
- + allocatedContainer.getResource().getMemory());
-
-
- // Launch and start the container on a separate thread to keep the main
- // thread unblocked as all containers may not be allocated at one go.
- LaunchContainerRunnable runnableLaunchContainer =
- new LaunchContainerRunnable(allocatedContainer);
- Thread launchThread = new Thread(runnableLaunchContainer);
+ LaunchContainerRunnable runnableLaunchContainer =
+ new LaunchContainerRunnable(allocatedContainer, containerListener);
+ Thread launchThread = new Thread(runnableLaunchContainer);
+
+ // launch and start the container on a separate thread to keep
+ // the main thread unblocked
+ // as all containers may not be allocated at one go.
launchThreads.add(launchThread);
launchThread.start();
}
+ }
++---+
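+
+ Besides allocation, a typical <<<AMRMClientAsync.CallbackHandler>>> also
+ reacts to completed containers. A sketch, assuming the same counters as in
+ the distributed shell example (this mirrors the logic of the removed
+ blocking example above):
+
++---+
+ @Override
+ public void onContainersCompleted(List<ContainerStatus> completedContainers) {
+   for (ContainerStatus containerStatus : completedContainers) {
+     int exitStatus = containerStatus.getExitStatus();
+     if (exitStatus == 0) {
+       // container finished its work successfully
+       numCompletedContainers.incrementAndGet();
+     } else if (exitStatus != ContainerExitStatus.ABORTED) {
+       // the task itself failed; count it as completed and failed
+       numCompletedContainers.incrementAndGet();
+       numFailedContainers.incrementAndGet();
+     } else {
+       // container was killed/preempted by the framework; ask for a new one
+       numRequestedContainers.decrementAndGet();
+     }
+   }
+ }
++---+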
|
|
|
|
|
|
- // Check what the current available resources in the cluster are
|
|
|
- Resource availableResources = allocateResponse.getAvailableResources();
|
|
|
- // Based on this information, an ApplicationMaster can make appropriate
|
|
|
- // decisions
|
|
|
-
|
|
|
- // Check the completed containers
|
|
|
- // Let's assume we are keeping a count of total completed containers,
|
|
|
- // containers that failed and ones that completed successfully.
|
|
|
- List<ContainerStatus> completedContainers =
|
|
|
- allocateResponse.getCompletedContainersStatuses();
|
|
|
- for (ContainerStatus containerStatus : completedContainers) {
|
|
|
- LOG.info("Got container status for containerID= "
|
|
|
- + containerStatus.getContainerId()
|
|
|
- + ", state=" + containerStatus.getState()
|
|
|
- + ", exitStatus=" + containerStatus.getExitStatus()
|
|
|
- + ", diagnostics=" + containerStatus.getDiagnostics());
|
|
|
-
|
|
|
- int exitStatus = containerStatus.getExitStatus();
|
|
|
- if (0 != exitStatus) {
|
|
|
- // container failed
|
|
|
- // -100 is a special case where the container
|
|
|
- // was aborted/pre-empted for some reason
|
|
|
- if (-100 != exitStatus) {
|
|
|
- // application job on container returned a non-zero exit code
|
|
|
- // counts as completed
|
|
|
- numCompletedContainers.incrementAndGet();
|
|
|
- numFailedContainers.incrementAndGet();
|
|
|
- }
|
|
|
- else {
|
|
|
- // something else bad happened
|
|
|
- // app job did not complete for some reason
|
|
|
- // we should re-try as the container was lost for some reason
|
|
|
- // decrementing the requested count so that we ask for an
|
|
|
- // additional one in the next allocate call.
|
|
|
- numRequestedContainers.decrementAndGet();
|
|
|
- // we do not need to release the container as that has already
|
|
|
- // been done by the ResourceManager/NodeManager.
|
|
|
- }
|
|
|
- }
|
|
|
- else {
|
|
|
- // nothing to do
|
|
|
- // container completed successfully
|
|
|
- numCompletedContainers.incrementAndGet();
|
|
|
- numSuccessfulContainers.incrementAndGet();
|
|
|
- }
|
|
|
- }
|
|
|
- }
|
|
|
-+---+
|
|
|
+ * On heart beat, the event handler reports the progress of the application.
|
|
|
|
|
|
-
-  * After a container has been allocated to the ApplicationMaster, it needs to
-    follow a similar process that the Client followed in setting up the
-    ContainerLaunchContext for the eventual task that is going to be running on
-    the allocated Container. Once the ContainerLaunchContext is defined, the
-    ApplicationMaster can then communicate with the ContainerManager to start
-    its allocated container.
-
-+---+
-
-    //Assuming an allocated Container obtained from AllocateResponse
-    Container container;
-    // Connect to ContainerManager on the allocated container
-    String cmIpPortStr = container.getNodeId().getHost() + ":"
-        + container.getNodeId().getPort();
-    InetSocketAddress cmAddress = NetUtils.createSocketAddr(cmIpPortStr);
-    ContainerManager cm =
-        (ContainerManager)rpc.getProxy(ContainerManager.class, cmAddress, conf);
-
-    // Now we setup a ContainerLaunchContext
-    ContainerLaunchContext ctx =
-        Records.newRecord(ContainerLaunchContext.class);
-
-    ctx.setContainerId(container.getId());
-    ctx.setResource(container.getResource());
-
-    try {
-      ctx.setUser(UserGroupInformation.getCurrentUser().getShortUserName());
-    } catch (IOException e) {
-      LOG.info(
-          "Getting current user failed when trying to launch the container",
-          + e.getMessage());
-    }
++---+
+    @Override
+    public float getProgress() {
+      // set progress to deliver to RM on next heartbeat
+      float progress = (float) numCompletedContainers.get()
+          / numTotalContainers;
+      return progress;
+    }
++---+

-    // Set the environment
-    Map<String, String> unixEnv;
-    // Setup the required env.
-    // Please note that the launched container does not inherit
-    // the environment of the ApplicationMaster so all the
-    // necessary environment settings will need to be re-setup
-    // for this allocated container.
-    ctx.setEnvironment(unixEnv);
-
-    // Set the local resources
-    Map<String, LocalResource> localResources =
-        new HashMap<String, LocalResource>();
-    // Again, the local resources from the ApplicationMaster is not copied over
-    // by default to the allocated container. Thus, it is the responsibility
-    // of the ApplicationMaster to setup all the necessary local resources
-    // needed by the job that will be executed on the allocated container.
-
-    // Assume that we are executing a shell script on the allocated container
-    // and the shell script's location in the filesystem is known to us.
-    Path shellScriptPath;
-    LocalResource shellRsrc = Records.newRecord(LocalResource.class);
-    shellRsrc.setType(LocalResourceType.FILE);
-    shellRsrc.setVisibility(LocalResourceVisibility.APPLICATION);
-    shellRsrc.setResource(
-        ConverterUtils.getYarnUrlFromURI(new URI(shellScriptPath)));
-    shellRsrc.setTimestamp(shellScriptPathTimestamp);
-    shellRsrc.setSize(shellScriptPathLen);
-    localResources.put("MyExecShell.sh", shellRsrc);
-
-    ctx.setLocalResources(localResources);
-
-    // Set the necessary command to execute on the allocated container
-    String command = "/bin/sh ./MyExecShell.sh"
-        + " 1>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stdout"
-        + " 2>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stderr";
-
-    List<String> commands = new ArrayList<String>();
-    commands.add(command);
-    ctx.setCommands(commands);
-
-    // Send the start request to the ContainerManager
-    StartContainerRequest startReq = Records.newRecord(StartContainerRequest.class);
-    startReq.setContainerLaunchContext(ctx);
-    cm.startContainer(startReq);
-+---+
-
-  * The ApplicationMaster, as mentioned previously, will get updates of
-    completed containers as part of the response from the ApplicationMasterProtocol#allocate
-    calls. It can also monitor its launched containers pro-actively by querying
-    the ContainerManager for the status.
-
+  []
+
+  * The container launch thread actually launches the containers on NMs. After
+    a container has been allocated to the AM, it needs to follow a process
+    similar to the one the client followed when setting up the
+    <<<ContainerLaunchContext>>> for the task that will run on the allocated
+    Container. Once the <<<ContainerLaunchContext>>> is defined, the AM can
+    start the container through the <<<NMClientAsync>>> object.
+
++---+
+  // Set the necessary command to execute on the allocated container
+  Vector<CharSequence> vargs = new Vector<CharSequence>(5);
+
+  // Set executable command
+  vargs.add(shellCommand);
+  // Set shell script path
+  if (!scriptPath.isEmpty()) {
+    vargs.add(Shell.WINDOWS ? ExecBatScripStringtPath
+        : ExecShellStringPath);
+  }
+
+  // Set args for the shell command if any
+  vargs.add(shellArgs);
+  // Add log redirect params
+  vargs.add("1>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stdout");
+  vargs.add("2>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stderr");
+
+  // Get final command
+  StringBuilder command = new StringBuilder();
+  for (CharSequence str : vargs) {
+    command.append(str).append(" ");
+  }
+
+  List<String> commands = new ArrayList<String>();
+  commands.add(command.toString());
+
+  // Set up ContainerLaunchContext, setting local resources, environment,
+  // commands and tokens for the constructor.
+
+  // Note for tokens: set up tokens for the container too. Today, for normal
+  // shell commands, the container in distributed shell doesn't need any
+  // tokens. We are populating them mainly for NodeManagers to be able to
+  // download any files in the distributed file-system. The tokens are
+  // otherwise also useful in cases, for e.g., when one is running a
+  // "hadoop dfs" command inside the distributed shell.
+  ContainerLaunchContext ctx = ContainerLaunchContext.newInstance(
+      localResources, shellEnv, commands, null, allTokens.duplicate(), null);
+  containerListener.addContainer(container.getId(), container);
+  nmClientAsync.startContainerAsync(container, ctx);
+---+

-    GetContainerStatusRequest statusReq =
-        Records.newRecord(GetContainerStatusRequest.class);
-    statusReq.setContainerId(container.getId());
-    GetContainerStatusResponse statusResp = cm.getContainerStatus(statusReq);
-    LOG.info("Container Status"
-        + ", id=" + container.getId()
-        + ", status=" + statusResp.getStatus());
-+---+
+  * The <<<NMClientAsync>>> object, together with its event handler, handles
+    container events, including container start, stop, status updates, and
+    errors; a sketch of such a handler is shown below.
+
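+    The following is a minimal sketch, not the complete distributed shell
+    handler; it assumes the <<<nmClientAsync>>>, <<<LOG>>> and container
+    counter fields used in the earlier snippets:
+
++---+
+  class NMCallbackHandler implements NMClientAsync.CallbackHandler {
+    // Containers we have asked the NMs to launch, keyed by container ID.
+    private final ConcurrentMap<ContainerId, Container> containers =
+        new ConcurrentHashMap<ContainerId, Container>();
+
+    public void addContainer(ContainerId containerId, Container container) {
+      containers.putIfAbsent(containerId, container);
+    }
+
+    @Override
+    public void onContainerStarted(ContainerId containerId,
+        Map<String, ByteBuffer> allServiceResponse) {
+      // The container is running; optionally ask the NM for its status.
+      Container container = containers.get(containerId);
+      if (container != null) {
+        nmClientAsync.getContainerStatusAsync(containerId,
+            container.getNodeId());
+      }
+    }
+
+    @Override
+    public void onContainerStatusReceived(ContainerId containerId,
+        ContainerStatus containerStatus) {
+      LOG.info("Container Status: id=" + containerId
+          + ", status=" + containerStatus);
+    }
+
+    @Override
+    public void onContainerStopped(ContainerId containerId) {
+      containers.remove(containerId);
+    }
+
+    @Override
+    public void onStartContainerError(ContainerId containerId, Throwable t) {
+      // The launch failed; stop tracking the container and count the failure.
+      containers.remove(containerId);
+      numCompletedContainers.incrementAndGet();
+      numFailedContainers.incrementAndGet();
+    }
+
+    @Override
+    public void onGetContainerStatusError(ContainerId containerId,
+        Throwable t) {
+      LOG.error("Failed to query the status of Container " + containerId, t);
+    }
+
+    @Override
+    public void onStopContainerError(ContainerId containerId, Throwable t) {
+      LOG.error("Failed to stop Container " + containerId, t);
+      containers.remove(containerId);
+    }
+  }
++---+
+
+    The handler is registered when the NM client is created, e.g. via
+    <<<NMClientAsync.createNMClientAsync(new NMCallbackHandler())>>>.
+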
+  * After the ApplicationMaster determines that the work is done, it needs to
+    unregister itself through the AM-RM client, and then stop the client.
+
++---+
+  try {
+    amRMClient.unregisterApplicationMaster(appStatus, appMessage, null);
+  } catch (YarnException ex) {
+    LOG.error("Failed to unregister application", ex);
+  } catch (IOException e) {
+    LOG.error("Failed to unregister application", e);
+  }
+
+  amRMClient.stop();
++---+

~~** Defining the context in which your code runs

-~~*** Container Resource Requests
+~~*** Container Resource Requests

-~~*** Local Resources
+~~*** Local Resources

-~~*** Environment
+~~*** Environment

-~~**** Managing the CLASSPATH
+~~**** Managing the CLASSPATH

-~~** Security
+~~** Security

-* FAQ
+* FAQ

-** How can I distribute my application's jars to all of the nodes in the YARN
+** How can I distribute my application's jars to all of the nodes in the YARN
   cluster that need it?

-  You can use the LocalResource to add resources to your application request.
-  This will cause YARN to distribute the resource to the ApplicationMaster node.
-  If the resource is a tgz, zip, or jar - you can have YARN unzip it. Then, all
-  you need to do is add the unzipped folder to your classpath.
-  For example, when creating your application request:
-
-+---+
-    File packageFile = new File(packagePath);
-    Url packageUrl = ConverterUtils.getYarnUrlFromPath(
-        FileContext.getFileContext.makeQualified(new Path(packagePath)));
-
-    packageResource.setResource(packageUrl);
-    packageResource.setSize(packageFile.length());
-    packageResource.setTimestamp(packageFile.lastModified());
-    packageResource.setType(LocalResourceType.ARCHIVE);
-    packageResource.setVisibility(LocalResourceVisibility.APPLICATION);
-
-    resource.setMemory(memory)
-    containerCtx.setResource(resource)
-    containerCtx.setCommands(ImmutableList.of(
-        "java -cp './package/*' some.class.to.Run "
-        + "1>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stdout "
-        + "2>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stderr"))
-    containerCtx.setLocalResources(
-        Collections.singletonMap("package", packageResource))
-    appCtx.setApplicationId(appId)
-    appCtx.setUser(user.getShortUserName)
-    appCtx.setAMContainerSpec(containerCtx)
-    request.setApplicationSubmissionContext(appCtx)
-    applicationsManager.submitApplication(request)
-+---+
-
-  As you can see, the setLocalResources command takes a map of names to
-  resources. The name becomes a sym link in your application's cwd, so you can
-  just refer to the artifacts inside by using ./package/*.
-
-  Note: Java's classpath (cp) argument is VERY sensitive.
-  Make sure you get the syntax EXACTLY correct.
+  * You can use the LocalResource to add resources to your application request.
+    This will cause YARN to distribute the resource to the ApplicationMaster
+    node. If the resource is a tgz, zip, or jar - you can have YARN unzip it.
+    Then, all you need to do is add the unzipped folder to your classpath. For
+    example, when creating your application request:

-  Once your package is distributed to your ApplicationMaster, you'll need to
-  follow the same process whenever your ApplicationMaster starts a new container
-  (assuming you want the resources to be sent to your container). The code for
-  this is the same. You just need to make sure that you give your
-  ApplicationMaster the package path (either HDFS, or local), so that it can
-  send the resource URL along with the container ctx.
++---+
+  File packageFile = new File(packagePath);
+  // getYarnUrlFromPath returns a YARN URL record for the qualified path
+  URL packageUrl = ConverterUtils.getYarnUrlFromPath(
+      FileContext.getFileContext().makeQualified(new Path(packagePath)));
+
+  // 'packageResource' is a LocalResource record created via
+  // Records.newRecord(LocalResource.class)
+  packageResource.setResource(packageUrl);
+  packageResource.setSize(packageFile.length());
+  packageResource.setTimestamp(packageFile.lastModified());
+  packageResource.setType(LocalResourceType.ARCHIVE);
+  packageResource.setVisibility(LocalResourceVisibility.APPLICATION);
+
+  // 'resource', 'containerCtx' and 'appCtx' are the Resource,
+  // ContainerLaunchContext and ApplicationSubmissionContext records
+  // prepared for the submission
+  resource.setMemory(memory);
+  containerCtx.setResource(resource);
+  containerCtx.setCommands(ImmutableList.of(
+      "java -cp './package/*' some.class.to.Run "
+      + "1>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stdout "
+      + "2>" + ApplicationConstants.LOG_DIR_EXPANSION_VAR + "/stderr"));
+  containerCtx.setLocalResources(
+      Collections.singletonMap("package", packageResource));
+  appCtx.setApplicationId(appId);
+  appCtx.setUser(user.getShortUserName());
+  appCtx.setAMContainerSpec(containerCtx);
+  yarnClient.submitApplication(appCtx);
++---+

-** How do I get the ApplicationMaster's ApplicationAttemptId?
+  As you can see, the <<<setLocalResources>>> command takes a map of names to
+  resources. The name becomes a symlink in your application's cwd, so you can
+  just refer to the artifacts inside by using ./package/*.
+
+  Note: Java's classpath (cp) argument is VERY sensitive.
+  Make sure you get the syntax EXACTLY correct.

-  The ApplicationAttemptId will be passed to the ApplicationMaster via the
-  environment and the value from the environment can be converted into an
-  ApplicationAttemptId object via the ConverterUtils helper function.
+  Once your package is distributed to your AM, you'll need to follow the same
+  process whenever your AM starts a new container (assuming you want the
+  resources to be sent to your container). The code for this is the same. You
+  just need to make sure that you give your AM the package path (either HDFS, or
+  local), so that it can send the resource URL along with the container ctx.

-** My container is being killed by the Node Manager
+** How do I get the ApplicationMaster's <<<ApplicationAttemptId>>>?

-  This is likely due to high memory usage exceeding your requested container
-  memory size. There are a number of reasons that can cause this. First, look
-  at the process tree that the node manager dumps when it kills your container.
-  The two things you're interested in are physical memory and virtual memory.
-  If you have exceeded physical memory limits your app is using too much physical
-  memory. If you're running a Java app, you can use -hprof to look at what is
-  taking up space in the heap. If you have exceeded virtual memory, you may
-  need to increase the value of the the cluster-wide configuration variable
-  <<<yarn.nodemanager.vmem-pmem-ratio>>>.
+  * The <<<ApplicationAttemptId>>> will be passed to the AM via the environment,
+    and the value from the environment can be converted into an
+    <<<ApplicationAttemptId>>> object via the <<<ConverterUtils>>> helper
+    function, as in the sketch below.
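+
+    For example (a minimal sketch; it assumes the AM runs in a container whose
+    ID is exported by YARN in the standard <<<CONTAINER_ID>>> environment
+    variable):
+
++---+
+  // Read the AM container's ID from the environment set up by YARN and
+  // derive the application attempt ID from it.
+  Map<String, String> envs = System.getenv();
+  String containerIdString =
+      envs.get(ApplicationConstants.Environment.CONTAINER_ID.name());
+  ContainerId containerId = ConverterUtils.toContainerId(containerIdString);
+  ApplicationAttemptId appAttemptId = containerId.getApplicationAttemptId();
++---+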

-** How do I include native libraries?
+** Why is my container killed by the NodeManager?

+  * This is likely due to high memory usage exceeding your requested container
+    memory size. There are a number of reasons that can cause this. First, look
+    at the process tree that the NodeManager dumps when it kills your container.
+    The two things you're interested in are physical memory and virtual memory.
+    If you have exceeded the physical memory limit, your app is using too much
+    physical memory. If you're running a Java app, you can use -hprof to look at
+    what is taking up space in the heap. If you have exceeded virtual memory, you
+    may need to increase the value of the cluster-wide configuration variable
+    <<<yarn.nodemanager.vmem-pmem-ratio>>> (see the example below).
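+
+    For example, the ratio can be raised in <<<yarn-site.xml>>>; its default
+    value is 2.1, and the value 5 below is only illustrative:
+
++---+
+  <property>
+    <name>yarn.nodemanager.vmem-pmem-ratio</name>
+    <value>5</value>
+  </property>
++---+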
+
+** How do I include native libraries?

-  Setting -Djava.library.path on the command line while launching a container
-  can cause native libraries used by Hadoop to not be loaded correctly and can
-  result in errors. It is cleaner to use LD_LIBRARY_PATH instead.
+  * Setting <<<-Djava.library.path>>> on the command line while launching a
+    container can cause native libraries used by Hadoop to not be loaded
+    correctly and can result in errors. It is cleaner to use
+    <<<LD_LIBRARY_PATH>>> instead.
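+
+    For example (a minimal sketch; <<<containerLaunchContext>>> stands for the
+    <<<ContainerLaunchContext>>> being prepared, and the library path is
+    hypothetical):
+
++---+
+  // Pass native library locations through the container's environment
+  // instead of -Djava.library.path on the java command line.
+  Map<String, String> env = new HashMap<String, String>();
+  env.put("LD_LIBRARY_PATH", "/path/to/native/libs");
+  containerLaunchContext.setEnvironment(env);
++---+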

* Useful Links

-  * {{{https://issues.apache.org/jira/secure/attachment/12486023/MapReduce_NextGen_Architecture.pdf}Map Reduce Next Generation Architecture}}
+  * {{{http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/YARN.html}YARN Architecture}}
+
+  * {{{http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html}YARN Capacity Scheduler}}
+
+  * {{{http://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/FairScheduler.html}YARN Fair Scheduler}}
+
+* Sample code

-  * {{{http://developer.yahoo.com/blogs/hadoop/posts/2011/03/mapreduce-nextgen-scheduler/}Map Reduce Next Generation Scheduler}}
+  * YARN distributed shell: in the <<<hadoop-yarn-applications-distributedshell>>>
+    project, after you set up your development environment.