
Merge -r 633697:633699 from trunk to branch-0.16 to fix HADOOP-2861.

git-svn-id: https://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.16@633701 13f79535-47bb-0310-9956-ffa450edef68
Nigel Daley, 17 years ago
Commit e0deba0f83

+ 5 - 0
CHANGES.txt

@@ -2,6 +2,11 @@ Hadoop Change Log
 
 Release 0.16.1 - Unreleased
 
+  INCOMPATIBLE CHANGES
+
+    HADOOP-2861. Improve the user interface for the HOD commands. 
+    Command line structure has changed. (Hemanth Yamijala via nigel)
+
   IMPROVEMENTS
 
     HADOOP-2371. User guide for file permissions in HDFS.

+ 63 - 17
docs/hadoop-default.html

@@ -62,10 +62,6 @@ creations/deletions), or "all".</td>
   determine the host, port, etc. for a filesystem.</td>
 </tr>
 <tr>
-<td><a name="fs.trash.root">fs.trash.root</a></td><td>${hadoop.tmp.dir}/Trash</td><td>The trash directory, used by FsShell's 'rm' command.
-  </td>
-</tr>
-<tr>
 <td><a name="fs.trash.interval">fs.trash.interval</a></td><td>0</td><td>Number of minutes between trash checkpoints.
   If zero, the trash feature is disabled.
   </td>
@@ -106,25 +102,25 @@ creations/deletions), or "all".</td>
   </td>
 </tr>
 <tr>
-<td><a name="dfs.secondary.http.bindAddress">dfs.secondary.http.bindAddress</a></td><td>0.0.0.0:50090</td><td>
-    The secondary namenode http server bind address and port.
+<td><a name="dfs.secondary.http.address">dfs.secondary.http.address</a></td><td>0.0.0.0:50090</td><td>
+    The secondary namenode http server address and port.
     If the port is 0 then the server will start on a free port.
   </td>
 </tr>
 <tr>
-<td><a name="dfs.datanode.bindAddress">dfs.datanode.bindAddress</a></td><td>0.0.0.0:50010</td><td>
-    The address where the datanode will listen to.
+<td><a name="dfs.datanode.address">dfs.datanode.address</a></td><td>0.0.0.0:50010</td><td>
+    The address where the datanode server will listen to.
     If the port is 0 then the server will start on a free port.
   </td>
 </tr>
 <tr>
-<td><a name="dfs.datanode.http.bindAddress">dfs.datanode.http.bindAddress</a></td><td>0.0.0.0:50075</td><td>
-    The datanode http server bind address and port.
+<td><a name="dfs.datanode.http.address">dfs.datanode.http.address</a></td><td>0.0.0.0:50075</td><td>
+    The datanode http server address and port.
     If the port is 0 then the server will start on a free port.
   </td>
 </tr>
 <tr>
-<td><a name="dfs.http.bindAddress">dfs.http.bindAddress</a></td><td>0.0.0.0:50070</td><td>
+<td><a name="dfs.http.address">dfs.http.address</a></td><td>0.0.0.0:50070</td><td>
     The address and the base port where the dfs namenode web ui will listen on.
     If the port is 0 then the server will start on a free port.
   </td>
@@ -163,6 +159,11 @@ creations/deletions), or "all".</td>
       directories, for redundancy. </td>
 </tr>
 <tr>
+<td><a name="dfs.web.ugi">dfs.web.ugi</a></td><td>webuser,webgroup</td><td>The user account used by the web interface.
+    Syntax: USERNAME,GROUP1,GROUP2, ...
+  </td>
+</tr>
+<tr>
 <td><a name="dfs.permissions">dfs.permissions</a></td><td>true</td><td>
     If "true", enable permission checking in HDFS.
     If "false", permission checking is turned off,
@@ -267,6 +268,12 @@ creations/deletions), or "all".</td>
   </td>
 </tr>
 <tr>
+<td><a name="dfs.namenode.decommission.interval">dfs.namenode.decommission.interval</a></td><td>300</td><td>The interval in seconds at which the namenode checks whether decommission is complete.</td>
+</tr>
+<tr>
+<td><a name="dfs.replication.interval">dfs.replication.interval</a></td><td>3</td><td>The periodicity in seconds with which the namenode computes replication work for datanodes.</td>
+</tr>
+<tr>
 <td><a name="fs.s3.block.size">fs.s3.block.size</a></td><td>67108864</td><td>Block size to use when writing files to S3.</td>
 </tr>
 <tr>
@@ -291,8 +298,8 @@ creations/deletions), or "all".</td>
   </td>
 </tr>
 <tr>
-<td><a name="mapred.job.tracker.http.bindAddress">mapred.job.tracker.http.bindAddress</a></td><td>0.0.0.0:50030</td><td>
-    The job tracker http server bind address and port.
+<td><a name="mapred.job.tracker.http.address">mapred.job.tracker.http.address</a></td><td>0.0.0.0:50030</td><td>
+    The address and port the job tracker http server will listen on.
     If the port is 0 then the server will start on a free port.
   </td>
 </tr>
@@ -303,8 +310,10 @@ creations/deletions), or "all".</td>
   </td>
 </tr>
 <tr>
-<td><a name="mapred.task.tracker.report.bindAddress">mapred.task.tracker.report.bindAddress</a></td><td>127.0.0.1:0</td><td>The interface that task processes use to communicate
-  with their parent tasktracker process.</td>
+<td><a name="mapred.task.tracker.report.address">mapred.task.tracker.report.address</a></td><td>127.0.0.1:0</td><td>The interface and port that the task tracker server listens on. 
+  Since it is only connected to by the tasks, it uses the local interface.
+  EXPERT ONLY. Should only be changed if your host does not have the loopback 
+  interface.</td>
 </tr>
 <tr>
 <td><a name="mapred.local.dir">mapred.local.dir</a></td><td>${hadoop.tmp.dir}/mapred/local</td><td>The local directory where MapReduce stores intermediate
@@ -410,6 +419,15 @@ creations/deletions), or "all".</td>
   </td>
 </tr>
 <tr>
+<td><a name="mapred.child.tmp">mapred.child.tmp</a></td><td>./tmp</td><td> The tmp directory for map and reduce tasks.
+  If the value is an absolute path, it is used directly. Otherwise, it is
+  prepended with the task's working directory. Java tasks are executed with
+  the option -Djava.io.tmpdir='the absolute path of the tmp dir'. Pipes and
+  streaming tasks are given the environment variable
+  TMPDIR='the absolute path of the tmp dir'.
+  </td>
+</tr>
+<tr>
 <td><a name="mapred.inmem.merge.threshold">mapred.inmem.merge.threshold</a></td><td>1000</td><td>The threshold, in terms of the number of files 
   for the in-memory merge process. When we accumulate threshold number of files
   we initiate the in-memory merge and spill to disk. A value of 0 or less than
@@ -452,8 +470,8 @@ creations/deletions), or "all".</td>
   </td>
 </tr>
 <tr>
-<td><a name="mapred.task.tracker.http.bindAddress">mapred.task.tracker.http.bindAddress</a></td><td>0.0.0.0:50060</td><td>
-    The task tracker http server bind address and port.
+<td><a name="mapred.task.tracker.http.address">mapred.task.tracker.http.address</a></td><td>0.0.0.0:50060</td><td>
+    The task tracker http server address and port.
     If the port is 0 then the server will start on a free port.
   </td>
 </tr>
@@ -564,6 +582,22 @@ creations/deletions), or "all".</td>
     </td>
 </tr>
 <tr>
+<td><a name="mapred.task.profile">mapred.task.profile</a></td><td>false</td><td>Whether the system should collect profiler
+     information for some of the tasks in this job. The information is stored
+     in the user log directory. The value is "true" if task profiling
+     is enabled.</td>
+</tr>
+<tr>
+<td><a name="mapred.task.profile.maps">mapred.task.profile.maps</a></td><td>0-2</td><td> The ranges of map tasks to profile.
+    mapred.task.profile has to be set to true for this value to take effect.
+    </td>
+</tr>
+<tr>
+<td><a name="mapred.task.profile.reduces">mapred.task.profile.reduces</a></td><td>0-2</td><td> The ranges of reduce tasks to profile.
+    mapred.task.profile has to be set to true for this value to take effect.
+    </td>
+</tr>
+<tr>
 <td><a name="ipc.client.timeout">ipc.client.timeout</a></td><td>60000</td><td>Defines the timeout for IPC calls in milliseconds.</td>
 </tr>
 <tr>
@@ -596,6 +630,18 @@ creations/deletions), or "all".</td>
   </td>
 </tr>
 <tr>
+<td><a name="ipc.server.tcpnodelay">ipc.server.tcpnodelay</a></td><td>false</td><td>Turn on/off Nagle's algorithm for the TCP socket connection on
+  the server. Setting to true disables the algorithm and may decrease latency
+  at the cost of more/smaller packets.
+  </td>
+</tr>
+<tr>
+<td><a name="ipc.client.tcpnodelay">ipc.client.tcpnodelay</a></td><td>false</td><td>Turn on/off Nagle's algorithm for the TCP socket connection on
+  the client. Setting to true disables the algorithm and may decrease latency
+  at the cost of more/smaller packets.
+  </td>
+</tr>
+<tr>
 <td><a name="job.end.retry.attempts">job.end.retry.attempts</a></td><td>0</td><td>Indicates how many times hadoop should attempt to contact the
                notification URL </td>
 </tr>
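The renames in this file (dfs.http.bindAddress to dfs.http.address, and so on) mean that site-specific overrides must now use the new property names. A minimal sketch of writing such an override into hadoop-site.xml; the values shown are just the defaults from the table above, used for illustration:

```shell
#!/bin/sh
# Sketch: overriding two of the renamed properties in hadoop-site.xml.
# The old *.bindAddress keys no longer exist after this change; overrides
# must use the new *.address names. Values are the documented defaults.
cat > hadoop-site.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.http.address</name>  <!-- was dfs.http.bindAddress -->
    <value>0.0.0.0:50070</value>
  </property>
  <property>
    <name>mapred.job.tracker.http.address</name>
    <value>0.0.0.0:50030</value>
  </property>
</configuration>
EOF
# Show the property names that were written.
grep '<name>' hadoop-site.xml
```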

+ 129 - 114
docs/hod_user_guide.html

@@ -6,7 +6,7 @@
 <meta name="Forrest-version" content="0.8">
 <meta name="Forrest-skin-name" content="pelt">
 <title>
-      Hadoop On Demand 0.4 User Guide
+      Hadoop On Demand User Guide
     </title>
 <link type="text/css" href="skin/basic.css" rel="stylesheet">
 <link media="screen" type="text/css" href="skin/screen.css" rel="stylesheet">
@@ -169,7 +169,7 @@ document.write("Last Published: " + document.lastModified);
         PDF</a>
 </div>
 <h1>
-      Hadoop On Demand 0.4 User Guide
+      Hadoop On Demand User Guide
     </h1>
 <div id="minitoc-area">
 <ul class="minitoc">
@@ -177,18 +177,18 @@ document.write("Last Published: " + document.lastModified);
 <a href="#Introduction-N1000C"> Introduction </a>
 </li>
 <li>
-<a href="#Getting+Started+Using+HOD+0.4"> Getting Started Using HOD 0.4 </a>
+<a href="#Getting+Started+Using+HOD"> Getting Started Using HOD </a>
 <ul class="minitoc">
 <li>
-<a href="#HOD"> HOD Operation Mode </a>
+<a href="#A+typical+HOD+session">A typical HOD session</a>
 </li>
 <li>
-<a href="#HOD-N1013B"> HOD Script Mode </a>
+<a href="#Running+hadoop+scripts+using+HOD">Running hadoop scripts using HOD</a>
 </li>
 </ul>
 </li>
 <li>
-<a href="#HOD+0.4+Features"> HOD 0.4 Features </a>
+<a href="#HOD+Features"> HOD Features </a>
 <ul class="minitoc">
 <li>
 <a href="#Provisioning+and+Managing+Hadoop+Clusters"> Provisioning and Managing Hadoop Clusters </a>
@@ -217,13 +217,8 @@ document.write("Last Published: " + document.lastModified);
 <li>
 <a href="#Capturing+HOD+exit+codes+in+Torque"> Capturing HOD exit codes in Torque </a>
 </li>
-</ul>
-</li>
 <li>
-<a href="#Command+Line+Options"> Command Line Options </a>
-<ul class="minitoc">
-<li>
-<a href="#Options+Defining+Operations"> Options Defining Operations </a>
+<a href="#Command+Line"> Command Line</a>
 </li>
 <li>
 <a href="#Options+Configuring+HOD"> Options Configuring HOD </a>
@@ -231,7 +226,7 @@ document.write("Last Published: " + document.lastModified);
 </ul>
 </li>
 <li>
-<a href="#Troubleshooting-N1055A"> Troubleshooting </a>
+<a href="#Troubleshooting-N10576"> Troubleshooting </a>
 <ul class="minitoc">
 <li>
 <a href="#Hangs+During+Allocation">hod Hangs During Allocation </a>
@@ -273,31 +268,29 @@ document.write("Last Published: " + document.lastModified);
 <div class="section">
 <a name="Introduction" id="Introduction"></a>
 <p>Hadoop On Demand (HOD) is a system for provisioning virtual Hadoop clusters over a large physical cluster. It uses the Torque resource manager to do node allocation. On the allocated nodes, it can start Hadoop Map/Reduce and HDFS daemons. It automatically generates the appropriate configuration files (hadoop-site.xml) for the Hadoop daemons and client. HOD also has the capability to distribute Hadoop to the nodes in the virtual cluster that it allocates. In short, HOD makes it easy for administrators and users to quickly setup and use Hadoop. It is also a very useful tool for Hadoop developers and testers who need to share a physical cluster for testing their own Hadoop versions.</p>
-<p>HOD 0.4 supports Hadoop from version 0.15 onwards.</p>
+<p>HOD supports Hadoop from version 0.15 onwards.</p>
 <p>The rest of the documentation comprises a quick-start guide that helps you get started quickly with using HOD, a more detailed guide of all HOD features, command line options, known issues and troubleshooting information.</p>
 </div>
   
-<a name="N1001E"></a><a name="Getting+Started+Using+HOD+0.4"></a>
-<h2 class="h3"> Getting Started Using HOD 0.4 </h2>
+<a name="N1001E"></a><a name="Getting+Started+Using+HOD"></a>
+<h2 class="h3"> Getting Started Using HOD </h2>
 <div class="section">
 <a name="Getting_Started_Using_HOD_0_4" id="Getting_Started_Using_HOD_0_4"></a>
-<p>In this section, we shall see a step-by-step introduction on how to use HOD for the most basic operations. Before following these steps, it is assumed that HOD 0.4 and its dependent hardware and software components are setup and configured correctly. This is a step that is generally performed by system administrators of the cluster.</p>
-<p>The HOD 0.4 user interface is a command line utility called <span class="codefrag">hod</span>. It is driven by a configuration file, that is typically setup for users by system administrators. Users can override this configuration when using the <span class="codefrag">hod</span>, which is described later in this documentation. The configuration file can be specified in two ways when using <span class="codefrag">hod</span>, as described below: </p>
+<p>In this section, we shall see a step-by-step introduction on how to use HOD for the most basic operations. Before following these steps, it is assumed that HOD and its dependent hardware and software components are setup and configured correctly. This is a step that is generally performed by system administrators of the cluster.</p>
+<p>The HOD user interface is a command line utility called <span class="codefrag">hod</span>. It is driven by a configuration file that is typically set up for users by system administrators. Users can override this configuration when using <span class="codefrag">hod</span>, as described later in this documentation. The configuration file can be specified in two ways when using <span class="codefrag">hod</span>, as described below: </p>
 <ul>
     
-<li> Specify it on command line, using the -c option. Such as <span class="codefrag">hod -c path-to-the-configuration-file other-options</span>
+<li> Specify it on the command line, using the -c option, as in <span class="codefrag">hod &lt;operation&gt; &lt;required-args&gt; -c path-to-the-configuration-file [other-options]</span>
 </li>
     
 <li> Set up an environment variable <em>HOD_CONF_DIR</em> where <span class="codefrag">hod</span> will be run. This should be pointed to a directory on the local file system, containing a file called <em>hodrc</em>. Note that this is analogous to the <em>HADOOP_CONF_DIR</em> and <em>hadoop-site.xml</em> file for Hadoop. If no configuration file is specified on the command line, <span class="codefrag">hod</span> shall look for the <em>HOD_CONF_DIR</em> environment variable and a <em>hodrc</em> file under that.</li>
     
 </ul>
 <p>In examples listed below, we shall not explicitly point to the configuration option, assuming it is correctly specified.</p>
-<p>
-<span class="codefrag">hod</span> can be used in two modes, the <em>operation</em> mode and the <em>script</em> mode. We shall describe the two modes in detail below.</p>
-<a name="N10066"></a><a name="HOD"></a>
-<h3 class="h4"> HOD Operation Mode </h3>
-<a name="HOD_Operation_Mode" id="HOD_Operation_Mode"></a>
-<p>A typical session of HOD using this option will involve at least three steps: allocate, run hadoop jobs, deallocate. In order to use this mode, perform the following steps.</p>
+<a name="N1005B"></a><a name="A+typical+HOD+session"></a>
+<h3 class="h4">A typical HOD session</h3>
+<a name="HOD_Session" id="HOD_Session"></a>
+<p>A typical session of HOD will involve at least three steps: allocate, run hadoop jobs, deallocate. In order to do this, perform the following steps.</p>
 <p>
 <strong> Create a Cluster Directory </strong>
 </p>
@@ -307,13 +300,13 @@ document.write("Last Published: " + document.lastModified);
 <strong> Operation <em>allocate</em></strong>
 </p>
 <a name="Operation_allocate" id="Operation_allocate"></a>
-<p>The <em>allocate</em> operation is used to allocate a set of nodes and install and provision Hadoop on them. It has the following syntax:</p>
+<p>The <em>allocate</em> operation is used to allocate a set of nodes and install and provision Hadoop on them. It has the following syntax. Note that it requires a cluster_dir (-d, --hod.clusterdir) and the number of nodes (-n, --hod.nodecount) to be allocated:</p>
 <table class="ForrestTable" cellspacing="1" cellpadding="4">
       
         
 <tr>
           
-<td colspan="1" rowspan="1"><span class="codefrag">$ hod -o "allocate cluster_dir number_of_nodes"</span></td>
+<td colspan="1" rowspan="1"><span class="codefrag">$ hod allocate -d cluster_dir -n number_of_nodes [OPTIONS]</span></td>
         
 </tr>
       
@@ -325,7 +318,7 @@ document.write("Last Published: " + document.lastModified);
     
 <tr>
       
-<td colspan="1" rowspan="1"><span class="codefrag">$ hod -o "allocate ~/hod-clusters/test 5"</span>
+<td colspan="1" rowspan="1"><span class="codefrag">$ hod allocate -d ~/hod-clusters/test -n 5</span>
 <br>
       
 <span class="codefrag">INFO - HDFS UI on http://foo1.bar.com:53422</span>
@@ -391,13 +384,13 @@ document.write("Last Published: " + document.lastModified);
 <strong> Operation <em>deallocate</em></strong>
 </p>
 <a name="Operation_deallocate" id="Operation_deallocate"></a>
-<p>The <em>deallocate</em> operation is used to release an allocated cluster. When finished with a cluster, deallocate must be run so that the nodes become free for others to use. The <em>deallocate</em> operation has the following syntax:</p>
+<p>The <em>deallocate</em> operation is used to release an allocated cluster. When finished with a cluster, deallocate must be run so that the nodes become free for others to use. The <em>deallocate</em> operation has the following syntax. Note that it requires the cluster_dir (-d, --hod.clusterdir) argument:</p>
 <table class="ForrestTable" cellspacing="1" cellpadding="4">
       
         
 <tr>
           
-<td colspan="1" rowspan="1"><span class="codefrag">$ hod -o "deallocate cluster_dir"</span></td>
+<td colspan="1" rowspan="1"><span class="codefrag">$ hod deallocate -d cluster_dir</span></td>
         
 </tr>
       
@@ -406,14 +399,14 @@ document.write("Last Published: " + document.lastModified);
 <p>Continuing our example, the following command will deallocate the cluster:</p>
 <table class="ForrestTable" cellspacing="1" cellpadding="4">
 <tr>
-<td colspan="1" rowspan="1"><span class="codefrag">$ hod -o "deallocate ~/hod-clusters/test"</span></td>
+<td colspan="1" rowspan="1"><span class="codefrag">$ hod deallocate -d ~/hod-clusters/test</span></td>
 </tr>
 </table>
-<p>As can be seen, when used in the <em>operation</em> mode, HOD allows the users to allocate a cluster, and use it flexibly for running Hadoop jobs. For example, users can run multiple jobs in parallel on the same cluster, by running hadoop from multiple shells pointing to the same configuration.</p>
-<a name="N1013B"></a><a name="HOD-N1013B"></a>
-<h3 class="h4"> HOD Script Mode </h3>
+<p>As can be seen, HOD allows the users to allocate a cluster, and use it flexibly for running Hadoop jobs. For example, users can run multiple jobs in parallel on the same cluster, by running hadoop from multiple shells pointing to the same configuration.</p>
+<a name="N1012A"></a><a name="Running+hadoop+scripts+using+HOD"></a>
+<h3 class="h4">Running hadoop scripts using HOD</h3>
 <a name="HOD_Script_Mode" id="HOD_Script_Mode"></a>
-<p>The HOD <em>script mode</em> combines the operations of allocating, using and deallocating a cluster into a single operation. This is very useful for users who want to run a script of hadoop jobs and let HOD handle the cleanup automatically once the script completes. In order to use <span class="codefrag">hod</span> in the script mode, do the following:</p>
+<p>The HOD <em>script operation</em> combines the operations of allocating, using and deallocating a cluster into a single operation. This is very useful for users who want to run a script of hadoop jobs and let HOD handle the cleanup automatically once the script completes. In order to run hadoop scripts using <span class="codefrag">hod</span>, do the following:</p>
 <p>
 <strong> Create a script file </strong>
 </p>
@@ -425,18 +418,18 @@ document.write("Last Published: " + document.lastModified);
   
 </tr>
 </table>
-<p>However, the user can add any valid commands as part of the script. HOD will execute this script setting <em>HADOOP_CONF_DIR</em> automatically to point to the allocated cluster. So users do not need to worry about this. They also do not need to create a cluster directory as in the <em>operation</em> mode.</p>
+<p>However, the user can add any valid commands as part of the script. HOD will execute this script, setting <em>HADOOP_CONF_DIR</em> automatically to point to the allocated cluster, so users do not need to worry about this. Users do, however, need to create a cluster directory, just as with the <em>allocate</em> operation.</p>
 <p>
 <strong> Running the script </strong>
 </p>
 <a name="Running_the_script" id="Running_the_script"></a>
-<p>The syntax for the <em>script mode</em> as is as follows:</p>
+<p>The syntax for the <em>script operation</em> is as follows. Note that it requires a cluster directory (-d, --hod.clusterdir), the number of nodes (-n, --hod.nodecount) and a script file (-s, --hod.script):</p>
 <table class="ForrestTable" cellspacing="1" cellpadding="4">
       
         
 <tr>
           
-<td colspan="1" rowspan="1"><span class="codefrag">$ hod -m number_of_nodes -z script_file</span></td>
+<td colspan="1" rowspan="1"><span class="codefrag">$ hod script -d cluster_directory -n number_of_nodes -s script_file</span></td>
         
 </tr>
       
@@ -445,10 +438,10 @@ document.write("Last Published: " + document.lastModified);
 <p>Note that HOD will deallocate the cluster as soon as the script completes, and this means that the script must not complete until the hadoop jobs themselves are completed. Users must take care of this while writing the script. </p>
 </div>
   
-<a name="N10186"></a><a name="HOD+0.4+Features"></a>
-<h2 class="h3"> HOD 0.4 Features </h2>
+<a name="N1016F"></a><a name="HOD+Features"></a>
+<h2 class="h3"> HOD Features </h2>
 <div class="section">
-<a name="HOD_0_4_Features" id="HOD_0_4_Features"></a><a name="N1018E"></a><a name="Provisioning+and+Managing+Hadoop+Clusters"></a>
+<a name="HOD_0_4_Features" id="HOD_0_4_Features"></a><a name="N10177"></a><a name="Provisioning+and+Managing+Hadoop+Clusters"></a>
 <h3 class="h4"> Provisioning and Managing Hadoop Clusters </h3>
 <a name="Provisioning_and_Managing_Hadoop" id="Provisioning_and_Managing_Hadoop"></a>
 <p>The primary feature of HOD is to provision Hadoop Map/Reduce and HDFS clusters. This is described above in the Getting Started section. Also, as long as nodes are available, and organizational policies allow, a user can use HOD to allocate multiple Map/Reduce clusters simultaneously. The user would need to specify different paths for the <span class="codefrag">cluster_dir</span> parameter mentioned above for each cluster he/she allocates. HOD provides the <em>list</em> and the <em>info</em> operations to enable managing multiple clusters.</p>
@@ -462,7 +455,7 @@ document.write("Last Published: " + document.lastModified);
         
 <tr>
           
-<td colspan="1" rowspan="1"><span class="codefrag">$ hod -o "list"</span></td>
+<td colspan="1" rowspan="1"><span class="codefrag">$ hod list</span></td>
         
 </tr>
       
@@ -472,31 +465,30 @@ document.write("Last Published: " + document.lastModified);
 <strong> Operation <em>info</em></strong>
 </p>
 <a name="Operation_info" id="Operation_info"></a>
-<p>The info operation shows information about a given cluster. The information shown includes the Torque job id, and locations of the important daemons like the HOD Ringmaster process, and the Hadoop JobTracker and NameNode daemons. The info operation has the following syntax:</p>
+<p>The info operation shows information about a given cluster. The information shown includes the Torque job id, and locations of the important daemons like the HOD Ringmaster process, and the Hadoop JobTracker and NameNode daemons. The info operation has the following syntax. Note that it requires a cluster directory (-d, --hod.clusterdir):</p>
 <table class="ForrestTable" cellspacing="1" cellpadding="4">
       
         
 <tr>
           
-<td colspan="1" rowspan="1"><span class="codefrag">$ hod -o "info cluster_dir"</span></td>
+<td colspan="1" rowspan="1"><span class="codefrag">$ hod info -d cluster_dir</span></td>
         
 </tr>
       
     
 </table>
 <p>The <span class="codefrag">cluster_dir</span> should be a valid cluster directory specified in an earlier <em>allocate</em> operation.</p>
-<a name="N101D9"></a><a name="Using+a+tarball+to+distribute+Hadoop"></a>
+<a name="N101C2"></a><a name="Using+a+tarball+to+distribute+Hadoop"></a>
 <h3 class="h4"> Using a tarball to distribute Hadoop </h3>
 <a name="Using_a_tarball_to_distribute_Ha" id="Using_a_tarball_to_distribute_Ha"></a>
 <p>When provisioning Hadoop, HOD can use either a pre-installed Hadoop on the cluster nodes or distribute and install a Hadoop tarball as part of the provisioning operation. If the tarball option is being used, there is no need to have a pre-installed Hadoop on the cluster nodes, nor a need to use a pre-installed one. This is especially useful in a development / QE environment where individual developers may have different versions of Hadoop to test on a shared cluster. </p>
 <p>In order to use a pre-installed Hadoop, you must specify, in the hodrc, the <span class="codefrag">pkgs</span> option in the <span class="codefrag">gridservice-hdfs</span> and <span class="codefrag">gridservice-mapred</span> sections. This must point to the path where Hadoop is installed on all nodes of the cluster.</p>
-<p>The tarball option can be used in both the <em>operation</em> and <em>script</em> options. </p>
-<p>In the operation option, the syntax is as follows:</p>
+<p>The syntax for specifying a tarball is as follows:</p>
 <table class="ForrestTable" cellspacing="1" cellpadding="4">
         
 <tr>
           
-<td colspan="1" rowspan="1"><span class="codefrag">$ hod -t hadoop_tarball_location -o "allocate cluster_dir number_of_nodes"</span></td>
+<td colspan="1" rowspan="1"><span class="codefrag">$ hod allocate -d cluster_dir -n number_of_nodes -t hadoop_tarball_location</span></td>
         
 </tr>
     
@@ -504,15 +496,15 @@ document.write("Last Published: " + document.lastModified);
 <p>For example, the following command allocates Hadoop provided by the tarball <span class="codefrag">~/share/hadoop.tar.gz</span>:</p>
 <table class="ForrestTable" cellspacing="1" cellpadding="4">
 <tr>
-<td colspan="1" rowspan="1"><span class="codefrag">$ hod -t ~/share/hadoop.tar.gz -o "allocate ~/hadoop-cluster 10"</span></td>
+<td colspan="1" rowspan="1"><span class="codefrag">$ hod allocate -d ~/hadoop-cluster -n 10 -t ~/share/hadoop.tar.gz</span></td>
 </tr>
 </table>
-<p>In the script option, the syntax is as follows:</p>
+<p>Similarly, when using hod script, the syntax is as follows:</p>
 <table class="ForrestTable" cellspacing="1" cellpadding="4">
         
 <tr>
           
-<td colspan="1" rowspan="1"><span class="codefrag">$ hod -t hadoop_tarball_location -m number_of_nodes -z script_file</span></td>
+<td colspan="1" rowspan="1"><span class="codefrag">$ hod script -d cluster_directory -s script_file -n number_of_nodes -t hadoop_tarball_location</span></td>
         
 </tr>
     
@@ -528,7 +520,7 @@ document.write("Last Published: " + document.lastModified);
 <li> When you want to run jobs against a cluster allocated using the tarball, you must use a compatible version of hadoop to submit your jobs. The best would be to untar and use the version that is present in the tarball itself.</li>
   
 </ul>
-<a name="N10235"></a><a name="Using+an+external+HDFS"></a>
+<a name="N10215"></a><a name="Using+an+external+HDFS"></a>
 <h3 class="h4"> Using an external HDFS </h3>
 <a name="Using_an_external_HDFS" id="Using_an_external_HDFS"></a>
 <p>In typical Hadoop clusters provisioned by HOD, HDFS is already set up statically (without using HOD). This allows data to persist in HDFS after the HOD provisioned clusters is deallocated. To use a statically configured HDFS, your hodrc must point to an external HDFS. Specifically, set the following options to the correct values in the section <span class="codefrag">gridservice-hdfs</span> of the hodrc:</p>
@@ -554,7 +546,7 @@ document.write("Last Published: " + document.lastModified);
         
 <tr>
           
-<td colspan="1" rowspan="1"><span class="codefrag">$ hod --gridservice-hdfs.external -o "allocate cluster_dir number_of_nodes"</span></td>
+<td colspan="1" rowspan="1"><span class="codefrag">$ hod allocate -d cluster_dir -n number_of_nodes --gridservice-hdfs.external</span></td>
         
 </tr>
     
@@ -565,7 +557,7 @@ document.write("Last Published: " + document.lastModified);
 <td colspan="1" rowspan="1">external = false</td>
 </tr>
 </table>
-<a name="N10279"></a><a name="Options+for+Configuring+Hadoop"></a>
+<a name="N10259"></a><a name="Options+for+Configuring+Hadoop"></a>
 <h3 class="h4"> Options for Configuring Hadoop </h3>
 <a name="Options_for_Configuring_Hadoop" id="Options_for_Configuring_Hadoop"></a>
 <p>HOD provides a very convenient mechanism to configure both the Hadoop daemons that it provisions and also the hadoop-site.xml that it generates on the client side. This is done by specifying Hadoop configuration parameters in either the HOD configuration file, or from the command line when allocating clusters.</p>
@@ -591,7 +583,7 @@ document.write("Last Published: " + document.lastModified);
         
 <tr>
           
-<td colspan="1" rowspan="1"><span class="codefrag">$ hod -Mmapred.reduce.parallel.copies=20 -Mio.sort.factor=100 -o "allocate cluster_dir number_of_nodes"</span></td>
+<td colspan="1" rowspan="1"><span class="codefrag">$ hod allocate -d cluster_dir -n number_of_nodes -Mmapred.reduce.parallel.copies=20 -Mio.sort.factor=100</span></td>
         
 </tr>
     
@@ -601,7 +593,7 @@ document.write("Last Published: " + document.lastModified);
         
 <tr>
           
-<td colspan="1" rowspan="1"><span class="codefrag">$ hod -Fmapred.reduce.parallel.copies=20 -Fio.sort.factor=100 -o "allocate cluster_dir number_of_nodes"</span></td>
+<td colspan="1" rowspan="1"><span class="codefrag">$ hod allocate -d cluster_dir -n number_of_nodes -Fmapred.reduce.parallel.copies=20 -Fio.sort.factor=100</span></td>
         
 </tr>
     
@@ -617,19 +609,19 @@ document.write("Last Published: " + document.lastModified);
         
 <tr>
           
-<td colspan="1" rowspan="1"><span class="codefrag">$ hod -Cmapred.userlog.limit.kb=200 -Cmapred.child.java.opts=-Xmx512m -o "allocate cluster_dir number_of_nodes"</span></td>
+<td colspan="1" rowspan="1"><span class="codefrag">$ hod allocate -d cluster_dir -n number_of_nodes -Cmapred.userlog.limit.kb=200 -Cmapred.child.java.opts=-Xmx512m</span></td>
         
 </tr>
     
 </table>
 <p>In this example, the <em>mapred.userlog.limit.kb</em> and <em>mapred.child.java.opts</em> options will be included into the hadoop-site.xml that is generated by HOD.</p>
-<a name="N1030B"></a><a name="Viewing+Hadoop+Web-UIs"></a>
+<a name="N102EB"></a><a name="Viewing+Hadoop+Web-UIs"></a>
 <h3 class="h4"> Viewing Hadoop Web-UIs </h3>
 <a name="Viewing_Hadoop_Web_UIs" id="Viewing_Hadoop_Web_UIs"></a>
 <p>The HOD allocation operation prints the JobTracker and NameNode web UI URLs. For example:</p>
 <table class="ForrestTable" cellspacing="1" cellpadding="4">
 <tr>
-<td colspan="1" rowspan="1"><span class="codefrag">$ hod -c ~/hod-conf-dir/hodrc -o "allocate ~/hadoop-cluster 10"</span>
+<td colspan="1" rowspan="1"><span class="codefrag">$ hod allocate -d ~/hadoop-cluster -n 10 -c ~/hod-conf-dir/hodrc</span>
 <br>
     
 <span class="codefrag">INFO - HDFS UI on http://host242.foo.com:55391</span>
@@ -640,7 +632,7 @@ document.write("Last Published: " + document.lastModified);
 </tr>
 </table>
 <p>The same information is also available via the <em>info</em> operation described above.</p>
-<a name="N1032D"></a><a name="Collecting+and+Viewing+Hadoop+Logs"></a>
+<a name="N1030D"></a><a name="Collecting+and+Viewing+Hadoop+Logs"></a>
 <h3 class="h4"> Collecting and Viewing Hadoop Logs </h3>
 <a name="Collecting_and_Viewing_Hadoop_Lo" id="Collecting_and_Viewing_Hadoop_Lo"></a>
 <p>To get the Hadoop logs of the daemons running on one of the allocated nodes: </p>
@@ -668,13 +660,13 @@ document.write("Last Published: " + document.lastModified);
 </table>
 <p>Under the root directory specified above in the path, HOD will create a path user_name/torque_jobid and store gzipped log files for each node that was part of the job.</p>
 <p>Note that to store the files to HDFS, you may need to configure the <span class="codefrag">hodring.pkgs</span> option with the Hadoop version that matches the HDFS mentioned. If not, HOD will try to use the Hadoop version that it is using to provision the Hadoop cluster itself.</p>
-<a name="N10376"></a><a name="Auto-deallocation+of+Idle+Clusters"></a>
+<a name="N10356"></a><a name="Auto-deallocation+of+Idle+Clusters"></a>
 <h3 class="h4"> Auto-deallocation of Idle Clusters </h3>
 <a name="Auto_deallocation_of_Idle_Cluste" id="Auto_deallocation_of_Idle_Cluste"></a>
 <p>HOD automatically deallocates clusters that are not running Hadoop jobs for a given period of time. Each HOD allocation includes a monitoring facility that constantly checks for running Hadoop jobs. If it detects no running Hadoop jobs for a given period, it will automatically deallocate its own cluster and thus free up nodes which are not being used effectively.</p>
 <p>
 <em>Note:</em> While the cluster is deallocated, the <em>cluster directory</em> is not cleaned up automatically. The user must deallocate this cluster through the regular <em>deallocate</em> operation to clean this up.</p>
-<a name="N1038C"></a><a name="Specifying+Additional+Job+Attributes"></a>
+<a name="N1036C"></a><a name="Specifying+Additional+Job+Attributes"></a>
 <h3 class="h4"> Specifying Additional Job Attributes </h3>
 <a name="Specifying_Additional_Job_Attrib" id="Specifying_Additional_Job_Attrib"></a>
 <p>HOD allows the user to specify a wallclock time and a name (or title) for a Torque job. </p>
@@ -684,7 +676,7 @@ document.write("Last Published: " + document.lastModified);
         
 <tr>
           
-<td colspan="1" rowspan="1"><span class="codefrag">$ hod -l time_in_seconds -o "allocate cluster_dir number_of_nodes"</span></td>
+<td colspan="1" rowspan="1"><span class="codefrag">$ hod allocate -d cluster_dir -n number_of_nodes -l time_in_seconds</span></td>
         
 </tr>
     
@@ -695,14 +687,14 @@ document.write("Last Published: " + document.lastModified);
         
 <tr>
           
-<td colspan="1" rowspan="1"><span class="codefrag">$ hod -N name_of_job -o "allocate cluster_dir number_of_nodes"</span></td>
+<td colspan="1" rowspan="1"><span class="codefrag">$ hod allocate -d cluster_dir -n number_of_nodes -N name_of_job</span></td>
         
 </tr>
     
 </table>
 <p>
 <em>Note:</em> Due to a restriction in the underlying Torque resource manager, names which do not start with an alphabetic character, or which contain a space, will cause the job to fail. The failure message points to the problem being in the specified job name.</p>
-<a name="N103C3"></a><a name="Capturing+HOD+exit+codes+in+Torque"></a>
+<a name="N103A3"></a><a name="Capturing+HOD+exit+codes+in+Torque"></a>
 <h3 class="h4"> Capturing HOD exit codes in Torque </h3>
 <a name="Capturing_HOD_exit_codes_in_Torq" id="Capturing_HOD_exit_codes_in_Torq"></a>
 <p>HOD exit codes are captured in the Torque exit_status field. This will help users and system administrators to distinguish successful runs from unsuccessful runs of HOD. The exit codes are 0 if allocation succeeded and all hadoop jobs ran on the allocated cluster correctly. They are non-zero if allocation failed or some of the hadoop jobs failed on the allocated cluster. The exit codes that are possible are mentioned in the table below. <em>Note: Hadoop job status is captured only if the version of Hadoop used is 0.16 or above.</em>
@@ -782,54 +774,69 @@ document.write("Last Published: " + document.lastModified);
     
   
 </table>
-</div>
-  
-<a name="N10456"></a><a name="Command+Line+Options"></a>
-<h2 class="h3"> Command Line Options </h2>
-<div class="section">
-<a name="Command_Line_Options" id="Command_Line_Options"></a>
-<p>Command line options for the <span class="codefrag">hod</span> command are used for two purposes: defining an operation that HOD must perform, and defining configuration options for customizing HOD that override options defined in the default configuration file. This section covers both types of options. </p>
-<a name="N10464"></a><a name="Options+Defining+Operations"></a>
-<h3 class="h4"> Options Defining Operations </h3>
-<a name="Options_Defining_Operations" id="Options_Defining_Operations"></a>
+<a name="N10435"></a><a name="Command+Line"></a>
+<h3 class="h4"> Command Line</h3>
+<a name="Command_Line" id="Command_Line"></a>
+<p>The HOD command line has the following general syntax:<br>
+      
+<em>hod &lt;operation&gt; [ARGS] [OPTIONS]<br>
+</em>
+      Allowed operations are 'allocate', 'deallocate', 'info', 'list', 'script' and 'help'. For help on a particular operation, run <span class="codefrag">hod help &lt;operation&gt;</span>. To see the available options, run <span class="codefrag">hod help options</span>.
+</p>
 <p>
-<em>--help</em>
+<em>allocate</em>
 <br>
-    Prints out the help message to see the basic options.</p>
-<p>
-<em>--verbose-help</em>
+      
+<em>Usage: hod allocate -d cluster_dir -n number_of_nodes [OPTIONS]</em>
 <br>
-    All configuration options provided in the hodrc file can be passed on the command line, using the syntax <span class="codefrag">--section_name.option_name[=value]</span>. When provided this way, the value provided on command line overrides the option provided in hodrc. The verbose-help command lists all the available options in the hodrc file. This is also a nice way to see the meaning of the configuration options.</p>
+        Allocates a cluster on the given number of cluster nodes, and stores the allocation information in cluster_dir for use with subsequent <span class="codefrag">hadoop</span> commands. Note that the <span class="codefrag">cluster_dir</span> must exist before running the command.</p>
 <p>
-<em>-o "operation_name options"</em>
+<em>list</em>
 <br>
-    This class of options are used to define the <em>operation</em> mode of HOD. <em>Note:</em> The operation_name and other options must be specified within double quotes.</p>
+      
+<em>Usage: hod list [OPTIONS]</em>
+<br>
+       Lists the clusters allocated by this user. Information provided includes the Torque job id corresponding to the cluster, the cluster directory where the allocation information is stored, and whether the Map/Reduce daemon is still active or not.</p>
 <p>
-<em>-o "help"</em>
+<em>info</em>
+<br>
+      
+<em>Usage: hod info -d cluster_dir [OPTIONS]</em>
 <br>
-    Lists the operations available in the <em>operation</em> mode.</p>
+        Lists information about the cluster whose allocation information is stored in the specified cluster directory.</p>
 <p>
-<em>-o "allocate cluster_dir number_of_nodes"</em>
+<em>deallocate</em>
 <br>
-    Allocates a cluster on the given number of cluster nodes, and store the allocation information in cluster_dir for use with subsequent <span class="codefrag">hadoop</span> commands. Note that the <span class="codefrag">cluster_dir</span> must exist before running the command.</p>
+      
+<em>Usage: hod deallocate -d cluster_dir [OPTIONS]</em>
+<br>
+        Deallocates the cluster whose allocation information is stored in the specified cluster directory.</p>
 <p>
-<em>-o "list"</em>
+<em>script</em>
+<br>
+      
+<em>Usage: hod script -s script_file -d cluster_dir -n number_of_nodes [OPTIONS]</em>
 <br>
-    Lists the clusters allocated by this user. Information provided includes the Torque job id corresponding to the cluster, the cluster directory where the allocation information is stored, and whether the Map/Reduce daemon is still active or not.</p>
+        Runs a Hadoop script using the HOD <em>script</em> operation. Provisions Hadoop on a given number of nodes, executes the given script from the submitting node, and deallocates the cluster when the script completes.</p>
 <p>
-<em>-o "info cluster_dir"</em>
+<em>help</em>
+<br>
+      
+<em>Usage: hod help [operation | 'options']</em>
 <br>
-    Lists information about the cluster whose allocation information is stored in the specified cluster directory.</p>
+       When no argument is specified, <span class="codefrag">hod help</span> gives the usage and basic options, and is equivalent to <span class="codefrag">hod --help</span> (See below). When 'options' is given as an argument, hod displays only the basic options that hod takes. When an operation is specified, it displays the usage and description corresponding to that particular operation. For example, to learn about the allocate operation, run <span class="codefrag">hod help allocate</span>
+</p>
+<p>Besides the operations, HOD can take the following command line options.</p>
 <p>
-<em>-o "deallocate cluster_dir"</em>
+<em>--help</em>
 <br>
-    Deallocates the cluster whose allocation information is stored in the specified cluster directory.</p>
+        Prints out the help message to see the usage and basic options.</p>
 <p>
-<em>-z script_file</em>
+<em>--verbose-help</em>
 <br>
-    Runs HOD in <em>script mode</em>. Provisions Hadoop on a given number of nodes, executes the given script from the submitting node, and deallocates the cluster when the script completes. Refer to option <em>-m</em>
-</p>
-<a name="N104B9"></a><a name="Options+Configuring+HOD"></a>
+        All configuration options provided in the hodrc file can be passed on the command line, using the syntax <span class="codefrag">--section_name.option_name[=value]</span>. When provided this way, the value provided on command line overrides the option provided in hodrc. The verbose-help command lists all the available options in the hodrc file. This is also a nice way to see the meaning of the configuration options.</p>
+<p>See the <a href="#Options_Configuring_HOD">next section</a> for a description of the most important HOD configuration options. For basic options, one can run <span class="codefrag">hod help options</span>; for all options possible in the hod configuration, see <span class="codefrag">hod --verbose-help</span>. See the <a href="hod_config_guide.html">config guide</a> for a description of all options.</p>
+<a name="N104BC"></a><a name="Options+Configuring+HOD"></a>
 <h3 class="h4"> Options Configuring HOD </h3>
 <a name="Options_Configuring_HOD" id="Options_Configuring_HOD"></a>
 <p>As described above, HOD is configured using a configuration file that is usually set up by system administrators. This is a INI style configuration file that is divided into sections, and options inside each section. Each section relates to one of the HOD processes: client, ringmaster, hodring, mapreduce or hdfs. The options inside a section comprise of an option name and value. </p>
@@ -847,6 +854,18 @@ document.write("Last Published: " + document.lastModified);
 <br>
     Provides the configuration file to use. Can be used with all other options of HOD. Alternatively, the <span class="codefrag">HOD_CONF_DIR</span> environment variable can be defined to specify a directory that contains a file named <span class="codefrag">hodrc</span>, alleviating the need to specify the configuration file in each HOD command.</p>
 <p>
+<em>-d cluster_dir</em>
+<br>
+        This is required for most of the hod operations. As described <a href="#Create_a_Cluster_Directory">here</a>, the <em>cluster directory</em> is a directory on the local file system where <span class="codefrag">hod</span> will generate the Hadoop configuration, <em>hadoop-site.xml</em>, corresponding to the cluster it allocates. Create this directory and pass it to the <span class="codefrag">hod</span> operations as an argument to -d or --hod.clusterdir. Once a cluster is allocated, a user can utilize it to run Hadoop jobs by specifying the cluster directory as the Hadoop --config option.</p>
+<p>
+<em>-n number_of_nodes</em>
+<br>
+  This is required for the hod 'allocate' and 'script' operations. It denotes the number of nodes to be allocated.</p>
+<p>
+<em>-s script-file</em>
+<br>
+   Required for the 'script' operation; specifies the script file to execute.</p>
+<p>
 <em>-b 1|2|3|4</em>
 <br>
     Enables the given debug level. Can be used with all other options of HOD. 4 is most verbose.</p>
@@ -855,10 +874,6 @@ document.write("Last Published: " + document.lastModified);
 <br>
     Provisions Hadoop from the given tar.gz file. This option is only applicable to the <em>allocate</em> operation. For better distribution performance it is strongly recommended that the Hadoop tarball is created <em>after</em> removing the source or documentation.</p>
 <p>
-<em>-m number_of_nodes</em>
-<br>
-    When used in the <em>script</em> mode, this specifies the number of nodes to allocate. Note that this option is useful only in the script mode.</p>
-<p>
 <em>-N job-name</em>
 <br>
     The Name to give to the resource manager job that HOD uses underneath. For e.g. in the case of Torque, this translates to the <span class="codefrag">qsub -N</span> option, and can be seen as the job name using the <span class="codefrag">qstat</span> command.</p>
@@ -869,7 +884,7 @@ document.write("Last Published: " + document.lastModified);
 <p>
 <em>-j java-home</em>
 <br>
-    Path to be set to the JAVA_HOME environment variable. This is used in the <em>script</em> mode. HOD sets the JAVA_HOME environment variable tot his value and launches the user script in that.</p>
+    Path to be set to the JAVA_HOME environment variable. This is used in the <em>script</em> operation. HOD sets the JAVA_HOME environment variable to this value and launches the user script in that environment.</p>
 <p>
 <em>-A account-string</em>
 <br>
@@ -903,12 +918,12 @@ document.write("Last Published: " + document.lastModified);
 </p>
 </div>
 	
-<a name="N1055A"></a><a name="Troubleshooting-N1055A"></a>
+<a name="N10576"></a><a name="Troubleshooting-N10576"></a>
 <h2 class="h3"> Troubleshooting </h2>
 <div class="section">
 <a name="Troubleshooting" id="Troubleshooting"></a>
 <p>The following section identifies some of the most likely error conditions users can run into when using HOD and ways to trouble-shoot them</p>
-<a name="N10565"></a><a name="Hangs+During+Allocation"></a>
+<a name="N10581"></a><a name="Hangs+During+Allocation"></a>
 <h3 class="h4">hod Hangs During Allocation </h3>
 <a name="_hod_Hangs_During_Allocation" id="_hod_Hangs_During_Allocation"></a><a name="hod_Hangs_During_Allocation" id="hod_Hangs_During_Allocation"></a>
 <p>
@@ -917,12 +932,12 @@ document.write("Last Published: " + document.lastModified);
 <em>Possible Cause:</em> A large allocation is fired with a tarball. Sometimes due to load in the network, or on the allocated nodes, the tarball distribution might be significantly slow and take a couple of minutes to come back. Wait for completion. Also check that the tarball does not have the Hadoop sources or documentation.</p>
 <p>
 <em>Possible Cause:</em> A Torque related problem. If the cause is Torque related, the <span class="codefrag">hod</span> command will not return for more than 5 minutes. Running <span class="codefrag">hod</span> in debug mode may show the <span class="codefrag">qstat</span> command being executed repeatedly. Executing the <span class="codefrag">qstat</span> command from a separate shell may show that the job is in the <span class="codefrag">Q</span> (Queued) state. This usually indicates a problem with Torque. Possible causes could include some nodes being down, or new nodes added that Torque is not aware of. Generally, system administrator help is needed to resolve this problem.</p>
-<a name="N10592"></a><a name="Hangs+During+Deallocation"></a>
+<a name="N105AE"></a><a name="Hangs+During+Deallocation"></a>
 <h3 class="h4">hod Hangs During Deallocation </h3>
 <a name="_hod_Hangs_During_Deallocation" id="_hod_Hangs_During_Deallocation"></a><a name="hod_Hangs_During_Deallocation" id="hod_Hangs_During_Deallocation"></a>
 <p>
 <em>Possible Cause:</em> A Torque related problem, usually load on the Torque server, or the allocation is very large. Generally, waiting for the command to complete is the only option.</p>
-<a name="N105A3"></a><a name="Fails+With+an+error+code+and+error+message"></a>
+<a name="N105BF"></a><a name="Fails+With+an+error+code+and+error+message"></a>
 <h3 class="h4">hod Fails With an error code and error message </h3>
 <a name="hod_Fails_With_an_error_code_and" id="hod_Fails_With_an_error_code_and"></a><a name="_hod_Fails_With_an_error_code_an" id="_hod_Fails_With_an_error_code_an"></a>
 <p>If the exit code of the <span class="codefrag">hod</span> command is not <span class="codefrag">0</span>, then refer to the following table of error exit codes to determine why the code may have occurred and how to debug the situation.</p>
@@ -953,7 +968,7 @@ document.write("Last Published: " + document.lastModified);
         
 <td colspan="1" rowspan="1"> 2 </td>
         <td colspan="1" rowspan="1"> Invalid operation </td>
-        <td colspan="1" rowspan="1"> Do <span class="codefrag">hod -o "help"</span> for the list of valid operations. </td>
+        <td colspan="1" rowspan="1"> Do <span class="codefrag">hod help</span> for the list of valid operations. </td>
       
 </tr>
       
@@ -961,7 +976,7 @@ document.write("Last Published: " + document.lastModified);
         
 <td colspan="1" rowspan="1"> 3 </td>
         <td colspan="1" rowspan="1"> Invalid operation arguments </td>
-        <td colspan="1" rowspan="1"> Do <span class="codefrag">hod -o "help"</span> for the list of valid operations. Note that for an <em>allocate</em> operation, the directory argument must specify an existing directory. </td>
+        <td colspan="1" rowspan="1"> Do <span class="codefrag">hod help &lt;operation&gt;</span> to see the usage of a particular operation.</td>
       
 </tr>
       
@@ -1062,7 +1077,7 @@ document.write("Last Published: " + document.lastModified);
     
   
 </table>
-<a name="N10715"></a><a name="Hadoop+Jobs+Not+Running+on+a+Successfully+Allocated+Cluster"></a>
+<a name="N1072E"></a><a name="Hadoop+Jobs+Not+Running+on+a+Successfully+Allocated+Cluster"></a>
 <h3 class="h4"> Hadoop Jobs Not Running on a Successfully Allocated Cluster </h3>
 <a name="Hadoop_Jobs_Not_Running_on_a_Suc" id="Hadoop_Jobs_Not_Running_on_a_Suc"></a>
 <p>This scenario generally occurs when a cluster is allocated, and is left inactive for sometime, and then hadoop jobs are attempted to be run on them. Then Hadoop jobs fail with the following exception:</p>
@@ -1081,31 +1096,31 @@ document.write("Last Published: " + document.lastModified);
 <em>Possible Cause:</em> There is a version mismatch between the version of the hadoop client being used to submit jobs and the hadoop used in provisioning (typically via the tarball option). Ensure compatible versions are being used.</p>
 <p>
 <em>Possible Cause:</em> You used one of the options for specifying Hadoop configuration <span class="codefrag">-M or -H</span>, which had special characters like space or comma that were not escaped correctly. Refer to the section <em>Options Configuring HOD</em> for checking how to specify such options correctly.</p>
-<a name="N10750"></a><a name="My+Hadoop+Job+Got+Killed"></a>
+<a name="N10769"></a><a name="My+Hadoop+Job+Got+Killed"></a>
 <h3 class="h4"> My Hadoop Job Got Killed </h3>
 <a name="My_Hadoop_Job_Got_Killed" id="My_Hadoop_Job_Got_Killed"></a>
 <p>
 <em>Possible Cause:</em> The wallclock limit specified by the Torque administrator or the <span class="codefrag">-l</span> option defined in the section <em>Specifying Additional Job Attributes</em> was exceeded since allocation time. Thus the cluster would have got released. Deallocate the cluster and allocate it again, this time with a larger wallclock time.</p>
 <p>
 <em>Possible Cause:</em> Problems with the JobTracker node. Refer to the section in <em>Collecting and Viewing Hadoop Logs</em> to get more information.</p>
-<a name="N1076B"></a><a name="Hadoop+Job+Fails+with+Message%3A+%27Job+tracker+still+initializing%27"></a>
+<a name="N10784"></a><a name="Hadoop+Job+Fails+with+Message%3A+%27Job+tracker+still+initializing%27"></a>
 <h3 class="h4"> Hadoop Job Fails with Message: 'Job tracker still initializing' </h3>
 <a name="Hadoop_Job_Fails_with_Message_Jo" id="Hadoop_Job_Fails_with_Message_Jo"></a>
 <p>
 <em>Possible Cause:</em> The hadoop job was being run as part of the HOD script command, and it started before the JobTracker could come up fully. Allocate the cluster using a large value for the configuration option <span class="codefrag">--hod.script-wait-time</span>. Typically a value of 120 should work, though it is typically unnecessary to be that large.</p>
-<a name="N1077B"></a><a name="The+Exit+Codes+For+HOD+Are+Not+Getting+Into+Torque"></a>
+<a name="N10794"></a><a name="The+Exit+Codes+For+HOD+Are+Not+Getting+Into+Torque"></a>
 <h3 class="h4"> The Exit Codes For HOD Are Not Getting Into Torque </h3>
 <a name="The_Exit_Codes_For_HOD_Are_Not_G" id="The_Exit_Codes_For_HOD_Are_Not_G"></a>
 <p>
 <em>Possible Cause:</em> Version 0.16 of hadoop is required for this functionality to work. The version of Hadoop used does not match. Use the required version of Hadoop.</p>
 <p>
 <em>Possible Cause:</em> The deallocation was done without using the <span class="codefrag">hod</span> command; for e.g. directly using <span class="codefrag">qdel</span>. When the cluster is deallocated in this manner, the HOD processes are terminated using signals. This results in the exit code to be based on the signal number, rather than the exit code of the program.</p>
-<a name="N10793"></a><a name="The+Hadoop+Logs+are+Not+Uploaded+to+DFS"></a>
+<a name="N107AC"></a><a name="The+Hadoop+Logs+are+Not+Uploaded+to+DFS"></a>
 <h3 class="h4"> The Hadoop Logs are Not Uploaded to DFS </h3>
 <a name="The_Hadoop_Logs_are_Not_Uploaded" id="The_Hadoop_Logs_are_Not_Uploaded"></a>
 <p>
 <em>Possible Cause:</em> There is a version mismatch between the version of the hadoop being used for uploading the logs and the external HDFS. Ensure that the correct version is specified in the <span class="codefrag">hodring.pkgs</span> option.</p>
-<a name="N107A3"></a><a name="Locating+Ringmaster+Logs"></a>
+<a name="N107BC"></a><a name="Locating+Ringmaster+Logs"></a>
 <h3 class="h4"> Locating Ringmaster Logs </h3>
 <a name="Locating_Ringmaster_Logs" id="Locating_Ringmaster_Logs"></a>
 <p>To locate the ringmaster logs, follow these steps: </p>
@@ -1122,7 +1137,7 @@ document.write("Last Published: " + document.lastModified);
 <li> If you don't get enough information, you may want to set the ringmaster debug level to 4. This can be done by passing <span class="codefrag">--ringmaster.debug 4</span> to the hod command line.</li>
   
 </ul>
-<a name="N107CF"></a><a name="Locating+Hodring+Logs"></a>
+<a name="N107E8"></a><a name="Locating+Hodring+Logs"></a>
 <h3 class="h4"> Locating Hodring Logs </h3>
 <a name="Locating_Hodring_Logs" id="Locating_Hodring_Logs"></a>
 <p>To locate hodring logs, follow the steps below: </p>

Diff content suppressed because it is too large to display
+ 24 - 29
docs/hod_user_guide.pdf
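The essence of the documentation changes above is that the old `-o "operation args"` form becomes a positional subcommand with named options. As a rough illustration (not HOD's actual code; option handling here is simplified), this sketch mirrors how the new `hod allocate -d DIR -n N` surface maps back onto the old internal operation string:

```python
# Sketch only: mimics the translation that the patched setup.py performs,
# rewriting "hod allocate -d DIR -n N" into the internal operation string
# "allocate DIR N". The parsing here is deliberately minimal.

def to_operation(argv):
    """Map the new subcommand syntax onto the old '-o' operation string."""
    if not argv:
        return None
    cmd, opts, rest = argv[0], {}, argv[1:]
    while rest:                      # collect "-d value" / "-n value" pairs
        opts[rest[0]] = rest[1]
        rest = rest[2:]
    if cmd == "allocate":
        return "%s %s %s" % (cmd, opts["-d"], opts["-n"])
    if cmd in ("deallocate", "info"):
        return "%s %s" % (cmd, opts["-d"])
    return cmd                       # 'list', 'help', ... take no -d/-n here

print(to_operation(["allocate", "-d", "~/hadoop-cluster", "-n", "10"]))
# prints: allocate ~/hadoop-cluster 10
```

This matches the worked examples in the user guide, e.g. the old `hod -c ~/hod-conf-dir/hodrc -o "allocate ~/hadoop-cluster 10"` becoming `hod allocate -d ~/hadoop-cluster -n 10 -c ~/hod-conf-dir/hodrc`.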


+ 24 - 29
src/contrib/hod/bin/hod

@@ -51,6 +51,7 @@ from hodlib.Common.util import local_fqdn, need_to_allocate, filter_warnings,\
     get_exception_error_string, hodInterrupt, \
     HOD_INTERRUPTED_MESG, HOD_INTERRUPTED_CODE
 from hodlib.Common.tcp import tcpError, tcpSocket
+from hodlib.Hod.hod import hodHelp
 
 filter_warnings()
 
@@ -80,7 +81,12 @@ if not os.path.exists(DEFAULT_CONFIG):
 #
 defList = { 'hod' : (      
              ('original-dir', 'directory', 'hod original start directory',
-              False, None, True, True, 'r'),        
+              False, None, True, True, 'r'),
+
+             ('clusterdir', 'directory', 
+             'Directory where cluster state information and hadoop-site.xml' +
+             ' will be stored.',
+              True, None, False, True, 'd'),
 
              ('syslog-address', 'address', 'Syslog address.',
               False, None, False, True, 'y'),
@@ -92,15 +98,14 @@ defList = { 'hod' : (
               True, 3, True, True, 'b'),
             
              ('stream', 'bool', 'Output to stderr.',
-              False, True, False, True, 's'),
+              False, True, False, True),
 
-             ('min-nodes', 'pos_int', 
-              'Minimum number of nodes to allocate at startup. ' + \
-              'Used with hod.script option',
-              True, None, False, True, 'm'),
+             ('nodecount', 'pos_int', 
+              'Number of nodes to allocate at startup. ',
+              True, None, False, True, 'n'),
 
              ('script', 'file', 'Hadoop script to execute.',
-              True, None, False, True, 'z'), 
+              True, None, False, False, 's'), 
 
              ('userid', 'user_account', 
               'User ID the hod shell is running under.',
@@ -109,11 +114,11 @@ defList = { 'hod' : (
              ('allocate-wait-time', 'pos_int', 
               'Time to wait for cluster allocation.',
               False, 300, True, True, 'e'),         
-             
-             ('operation', 'string', 
-              'Initiate a hod operation. (help, allocate, deallocate ...)',
-              True, None, False, True, 'o'),
               
+             ('operation', 'string',
+              'Initiate a hod operation. (help, allocate, deallocate ...)',
+              False, None, False, True, 'o'),
+             
              ('cluster-factor', 'pos_float',
               'The number of grid slots per machines', False, 1.9, False, True,
               'x'),
@@ -144,7 +149,7 @@ defList = { 'hod' : (
                True, "HOD", False, True, 'N'),
 
              ('walltime', 'pos_int', 'Walltime in seconds for the current HOD allocation',
-              True, None, False, True),
+              True, None, False, True, 'l'),
 
              ('script-wait-time', 'pos_int', 'Specifies the time to wait before running the script. Used with the hod.script option.',
               True, 10, False, True, 'W')),
@@ -361,9 +366,12 @@ if __name__ == '__main__':
   try:
     confDef = definition()
     confDef.add_defs(defList, defOrder)
-    hodOptions = options(confDef, "./%s -c <CONFIG_FILE> [OPTIONS]" % myName,
-                         VERSION, withConfig=True, defaultConfig=DEFAULT_CONFIG)
-  
+    hodhelp = hodHelp()
+    usage = hodhelp.help()
+            
+    hodOptions = options(confDef, usage,
+                      VERSION, withConfig=True, defaultConfig=DEFAULT_CONFIG,
+                      name=myName )
     # hodConfig is a dict like object, hodConfig[section][name]
     try:
       hodConfig = config(hodOptions['config'], configDef=confDef, 
@@ -384,22 +392,9 @@ if __name__ == '__main__':
       sys.exit(1)
   
     ## TODO : should move the dependency verification to hodConfig.verify
-    if hodConfig['hod'].has_key('script') \
-      and not hodConfig['hod'].has_key('min-nodes'):
-      printErrors(hodConfig.var_error('hod', 'min-nodes',
-          "hod.min-nodes must be specified when using hod.script option."))
-      sys.exit(1)
-  
-    if hodConfig['hod'].has_key('min-nodes'):
-      if hodConfig['hod']['min-nodes'] < 3:
-        printErrors(hodConfig.var_error('hod', 'min-nodes',
-          "hod.min-nodes must be >= 3 nodes: %s." % 
-          hodConfig['hod']['min-nodes']))
-        sys.exit(1)
-    
     if hodConfig['hod'].has_key('operation') and \
       hodConfig['hod'].has_key('script'):
-      print "Script execution and hod operations are mutually exclusive."
+      print "Script operation is mutually exclusive with other HOD operations"
       hodOptions.print_help(sys.stderr)
       sys.exit(1)
     

+ 93 - 8
src/contrib/hod/hodlib/Common/setup.py

@@ -27,6 +27,7 @@ from ConfigParser import SafeConfigParser
 from optparse import OptionParser, IndentedHelpFormatter, OptionGroup
 from util import get_perms, replace_escapes
 from types import typeValidator, is_valid_type, typeToString
+from hodlib.Hod.hod import hodHelp
 
 reEmailAddress = re.compile("^.*@.*$")
 reEmailDelimit = re.compile("@")
@@ -228,8 +229,8 @@ class baseConfig:
         errorStrings = []  
         if not self._dict[section].has_key(option):
           self._dict[section][option] = None
-        errorStrings.append("%s: invalid '%s' specified in section %s: %s" % (
-            errorPrefix, option, section, self._dict[section][option]))
+        errorStrings.append("%s: invalid '%s' specified in section %s (--%s.%s): %s" % (
+            errorPrefix, option, section, section, option, self._dict[section][option]))
 
         if addData:
             errorStrings.append("%s: additional info: %s\n" % (errorPrefix,
@@ -238,11 +239,8 @@ class baseConfig:
 
     def var_error_suggest(self, errorStrings):
         if self.configFile:
-            errorStrings.append("See configuration file: %s" % \
-                self.configFile)
-        
-        if self._options:
-            errorStrings.append("Configuration can be overridden by options, see -h")
+            errorStrings.append("Check your command line options and/or " + \
+                              "your configuration file %s" % self.configFile)
     
     def __get_args(self, section):
         def __dummyToString(type, value):
@@ -603,7 +601,8 @@ class formatter(IndentedHelpFormatter):
 
 class options(OptionParser, baseConfig):
     def __init__(self, optionDef, usage, version, originalDir=None, 
-                 withConfig=False, defaultConfig=None, defaultLocation=None):
+                 withConfig=False, defaultConfig=None, defaultLocation=None,
+                 name=None):
         """Constructs and options object.
          
            optionDef     - definition object
@@ -619,6 +618,7 @@ class options(OptionParser, baseConfig):
         self.formatter = formatter(4, max_help_position=100, width=180, 
                                    short_first=1)
         
+        self.__name = name
         self.__version = version
         self.__withConfig = withConfig
         self.__defaultConfig = defaultConfig
@@ -671,6 +671,85 @@ class options(OptionParser, baseConfig):
         
         (self.__parsedOptions, self.args) = self.parse_args()
 
+        # Now process the positional arguments only for the client side
+        if self.__name == 'hod':
+
+          hodhelp = hodHelp()
+
+          _operation = getattr(self.__parsedOptions,'hod.operation')
+          _script = getattr(self.__parsedOptions, 'hod.script')
+          nArgs = self.args.__len__()
+          if _operation:
+            # -o option is given
+            if nArgs != 0:
+              self.error('invalid syntax : command and operation(-o) cannot coexist')
+          elif nArgs == 0 and _script:
+            # for a script option, without subcommand: hod -s script ...
+            pass
+          elif nArgs == 0:
+            print "Usage: ",hodhelp.help()
+            sys.exit(0)
+          else:
+            # subcommand is given
+            cmdstr = self.args[0] # the subcommand itself
+            cmdlist = hodhelp.ops
+            if cmdstr not in cmdlist:
+              print "Usage: ", hodhelp.help()
+              sys.exit(2)
+
+            numNodes = None
+            clusterDir = None
+            # Check which subcommand. cmdstr  = subcommand itself now.
+            if cmdstr == "allocate":
+              clusterDir = getattr(self.__parsedOptions, 'hod.clusterdir')
+              numNodes = getattr(self.__parsedOptions, 'hod.nodecount')
+ 
+              if not clusterDir or not numNodes:
+                print getattr(hodhelp, "help_%s" % cmdstr)()
+                sys.exit(3)
+
+              cmdstr = cmdstr + ' ' + clusterDir + ' ' + numNodes
+
+              setattr(self.__parsedOptions,'hod.operation', cmdstr)
+ 
+            elif cmdstr == "deallocate" or cmdstr == "info":
+              clusterDir = getattr(self.__parsedOptions, 'hod.clusterdir')
+
+              if not clusterDir:
+                print getattr(hodhelp, "help_%s" % cmdstr)()
+                sys.exit(3)
+ 
+              cmdstr = cmdstr + ' ' + clusterDir
+              setattr(self.__parsedOptions,'hod.operation', cmdstr)
+
+            elif cmdstr == "list":
+              setattr(self.__parsedOptions,'hod.operation', cmdstr)
+ 
+            elif cmdstr == "script":
+              clusterDir = getattr(self.__parsedOptions, 'hod.clusterdir')
+              numNodes = getattr(self.__parsedOptions, 'hod.nodecount')
+
+              if not _script or not clusterDir or not numNodes:
+                print getattr(hodhelp, "help_%s" % cmdstr)()
+                sys.exit(3)
+
+            elif cmdstr == "help":
+              if nArgs == 1:
+                self.print_help()
+                sys.exit(0)
+              elif nArgs != 2:
+                self.print_help()
+                sys.exit(3)
+              elif self.args[1] == 'options':
+                self.print_options()
+                sys.exit(0)
+              cmdstr = cmdstr + ' ' + self.args[1]
+              setattr(self.__parsedOptions,'hod.operation', cmdstr)
+
+        # end of processing for arguments on the client side
+
         if self.__withConfig:
             self.config = self.__parsedOptions.config
             if not self.config:
@@ -925,6 +1004,12 @@ class options(OptionParser, baseConfig):
         self.__set_display_groups()
         OptionParser.print_help(self, file)
         self.__unset_display_groups()
+
+    def print_options(self):
+        _usage = self.usage
+        self.set_usage('')
+        self.print_help()
+        self.set_usage(_usage)
                         
     def verify(self):
         return baseConfig.verify(self)
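The positional-argument handling added above maps each subcommand plus its required options onto the single `hod.operation` string that the rest of the code already consumes. A minimal standalone sketch of that mapping (function and argument names here are illustrative, not HOD's actual API):

```python
# Sketch of the client-side subcommand-to-operation mapping. Helper and
# argument names are illustrative, not HOD's own.

def build_operation(subcommand, clusterdir=None, nodecount=None):
    """Return the legacy 'hod.operation' string for a parsed subcommand."""
    if subcommand == "allocate":
        if not clusterdir or not nodecount:
            raise ValueError("allocate needs -d <clusterdir> -n <nodecount>")
        return "%s %s %s" % (subcommand, clusterdir, nodecount)
    if subcommand in ("deallocate", "info"):
        if not clusterdir:
            raise ValueError("%s needs -d <clusterdir>" % subcommand)
        return "%s %s" % (subcommand, clusterdir)
    # 'list' (and a bare 'help') take no extra arguments
    return subcommand

print(build_operation("allocate", "~/hod-clusters/test", "5"))
# → allocate ~/hod-clusters/test 5
```

The actual code stores the result back via `setattr(self.__parsedOptions, 'hod.operation', cmdstr)`, so everything downstream of option parsing is unchanged.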

+ 182 - 95
src/contrib/hod/hodlib/Hod/hod.py

@@ -88,8 +88,8 @@ class hodState:
         
 class hodRunner:
   def __init__(self, cfg):
-    self.__ops = [ 'prepare', 'allocate', 'deallocate', 
-                   'list', 'info', 'help' ]           
+    self.__hodhelp = hodHelp()
+    self.__ops = self.__hodhelp.ops
     self.__cfg = cfg  
     self.__npd = self.__cfg['nodepooldesc']
     self.__opCode = 0
@@ -185,80 +185,94 @@ class hodRunner:
     argLength = len(args)
     min = 0
     max = 0
+    errorFlag = False
+    errorMsgs = []
+
     if argLength == 3:
       nodes = args[2]
       clusterDir = self.__norm_cluster_dir(args[1])
-      if os.path.isdir(clusterDir):
-        self.__setup_cluster_logger(clusterDir)
-        if re.match('\d+-\d+', nodes):
-          (min, max) = nodes.split("-")
-          min = int(min)
-          max = int(max)
-        else:
-          try:
-            nodes = int(nodes)
-            min = nodes
-            max = nodes
-          except ValueError:
-            self.__log.critical(
-            "%s operation requires a single argument. n nodes, or n-m nodes." % 
-            operation)
-            self.__opCode = 3
-          else:
-            self.__setup_cluster_state(clusterDir)
-            clusterInfo = self.__clusterState.read()
-            self.__opCode = self.__cluster.check_cluster(clusterInfo)
-            if self.__opCode == 0 or self.__opCode == 15:
-              self.__setup_service_registry()   
-              if hodInterrupt.isSet(): 
-                self.__cleanup()
-                raise HodInterruptException()
-              self.__log.info("Service Registry Started.")
-              try:
-                allocateStatus = self.__cluster.allocate(clusterDir, min, max)    
-              except HodInterruptException, h:
-                self.__cleanup()
-                raise h
-              # Allocation has gone through.
-              # Don't care about interrupts any more
-
-              if allocateStatus == 0:
-                self.__set_cluster_state_info(os.environ, 
-                                              self.__cluster.hdfsInfo, 
-                                              self.__cluster.mapredInfo, 
-                                              self.__cluster.ringmasterXRS,
-                                              self.__cluster.jobId,
-                                              min, max)
-                self.__setup_cluster_state(clusterDir)
-                self.__clusterState.write(self.__cluster.jobId, 
-                                          self.__clusterStateInfo)
-                #  Do we need to check for interrupts here ??
-
-                self.__set_user_state_info( 
-                  { clusterDir : self.__cluster.jobId, } )
-              self.__opCode = allocateStatus
-            elif self.__opCode == 12:
-              self.__log.critical("Cluster %s already allocated." % clusterDir)
-            elif self.__opCode == 10:
-              self.__log.critical("dead\t%s\t%s" % (clusterInfo['jobid'], 
-                                                    clusterDir))
-            elif self.__opCode == 13:
-              self.__log.warn("hdfs dead\t%s\t%s" % (clusterInfo['jobid'], 
-                                                         clusterDir))
-            elif self.__opCode == 14:
-              self.__log.warn("mapred dead\t%s\t%s" % (clusterInfo['jobid'], 
-                                                       clusterDir))   
-            
-            if self.__opCode > 0 and self.__opCode != 15:
-              self.__log.critical("Cannot allocate cluster %s" % clusterDir)
-            
-      else:
-        self.__log.critical("Invalid cluster directory '%s' specified." % 
-                          clusterDir)
+
+      if not os.path.isdir(clusterDir):
+        errorFlag = True
+        errorMsgs.append("Invalid cluster directory(--hod.clusterdir or -d) "+\
+                          "'%s' specified." % clusterDir)
+      if int(nodes) < 3 :
+        errorFlag = True
+        errorMsgs.append("hod.nodecount(--hod.nodecount or -n) must be >= 3."+\
+                          " Given nodes: %s" % nodes)
+      if errorFlag:
+        for msg in errorMsgs:
+          self.__log.critical(msg)
         self.__opCode = 3
+        return
+
+      self.__setup_cluster_logger(clusterDir)
+      if re.match('\d+-\d+', nodes):
+        (min, max) = nodes.split("-")
+        min = int(min)
+        max = int(max)
+      else:
+        try:
+          nodes = int(nodes)
+          min = nodes
+          max = nodes
+        except ValueError:
+          print self.__hodhelp.help_allocate()
+          self.__log.critical(
+          "%s operation requires a single argument. n nodes, or n-m nodes." % 
+          operation)
+          self.__opCode = 3
+        else:
+          self.__setup_cluster_state(clusterDir)
+          clusterInfo = self.__clusterState.read()
+          self.__opCode = self.__cluster.check_cluster(clusterInfo)
+          if self.__opCode == 0 or self.__opCode == 15:
+            self.__setup_service_registry()   
+            if hodInterrupt.isSet(): 
+              self.__cleanup()
+              raise HodInterruptException()
+            self.__log.info("Service Registry Started.")
+            try:
+              allocateStatus = self.__cluster.allocate(clusterDir, min, max)    
+            except HodInterruptException, h:
+              self.__cleanup()
+              raise h
+            # Allocation has gone through.
+            # Don't care about interrupts any more
+
+            if allocateStatus == 0:
+              self.__set_cluster_state_info(os.environ, 
+                                            self.__cluster.hdfsInfo, 
+                                            self.__cluster.mapredInfo, 
+                                            self.__cluster.ringmasterXRS,
+                                            self.__cluster.jobId,
+                                            min, max)
+              self.__setup_cluster_state(clusterDir)
+              self.__clusterState.write(self.__cluster.jobId, 
+                                        self.__clusterStateInfo)
+              #  Do we need to check for interrupts here ??
+
+              self.__set_user_state_info( 
+                { clusterDir : self.__cluster.jobId, } )
+            self.__opCode = allocateStatus
+          elif self.__opCode == 12:
+            self.__log.critical("Cluster %s already allocated." % clusterDir)
+          elif self.__opCode == 10:
+            self.__log.critical("dead\t%s\t%s" % (clusterInfo['jobid'], 
+                                                  clusterDir))
+          elif self.__opCode == 13:
+            self.__log.warn("hdfs dead\t%s\t%s" % (clusterInfo['jobid'], 
+                                                       clusterDir))
+          elif self.__opCode == 14:
+            self.__log.warn("mapred dead\t%s\t%s" % (clusterInfo['jobid'], 
+                                                     clusterDir))   
+          
+          if self.__opCode > 0 and self.__opCode != 15:
+            self.__log.critical("Cannot allocate cluster %s" % clusterDir)
     else:
+      print self.__hodhelp.help_allocate()
       self.__log.critical("%s operation requires two arguments. "  % operation
-                        + "A cluster path and n nodes, or min-max nodes.")
+                        + "A cluster directory and a nodecount.")
       self.__opCode = 3
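The nodecount argument of `allocate` accepts either a single count or a `min-max` range. A minimal sketch of the parsing the allocate path performs (the function name is illustrative):

```python
import re

def parse_node_spec(nodes):
    """Parse a nodecount of the form 'n' or 'n-m' into (min, max).

    Mirrors the range handling in _op_allocate; a ValueError from int()
    signals a malformed argument, which the caller reports as a usage error.
    """
    if re.match(r'\d+-\d+$', nodes):
        lo, hi = nodes.split("-")
        return int(lo), int(hi)
    n = int(nodes)  # raises ValueError for anything non-numeric
    return n, n

print(parse_node_spec("5"))     # → (5, 5)
print(parse_node_spec("3-10"))  # → (3, 10)
```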
  
   def _is_cluster_allocated(self, clusterDir):
@@ -292,6 +306,7 @@ class hodRunner:
                             clusterDir)
         self.__opCode = 3        
     else:
+      print self.__hodhelp.help_deallocate()
       self.__log.critical("%s operation requires one argument. "  % operation
                         + "A cluster path.")
       self.__opCode = 3
@@ -341,6 +356,7 @@ class hodRunner:
         self.__log.critical("'%s' does not exist." % clusterDir)
         self.__opCode = 3 
     else:
+      print self.__hodhelp.help_info()
       self.__log.critical("%s operation requires one argument. "  % operation
                         + "A cluster path.")
       self.__opCode = 3      
@@ -356,21 +372,18 @@ class hodRunner:
       for var in clusterInfo['env'].keys():
         self.__log.debug("%s = %s" % (var, clusterInfo['env'][var]))
 
- 
-  def _op_help(self, args):  
-    print "hod operations:\n"
-    print " allocate <directory> <nodes> - Allocates a cluster of n nodes using the specified cluster"
-    print "                                directory to store cluster state information.  The Hadoop site XML" 
-    print "                                is also stored in this location."
-    print ""
-    print " deallocate <directory>       - Deallocates a cluster using the pecified cluster directory.  This"
-    print "                                operation is also required to clean up a dead cluster."      
-    print ""
-    print " list                         - List all clusters currently allocated by a user, along with" 
-    print "                                limited status information and the cluster's job ID."
-    print ""
-    print " info <directory>             - Provide detailed information on an allocated cluster."
-  
+  def _op_help(self, arg):
+    if arg is None or len(arg) != 2:
+      print "hod commands:\n"
+      for op in self.__ops:
+        print getattr(self.__hodhelp, "help_%s" % op)()      
+    else:
+      if arg[1] not in self.__ops:
+        print self.__hodhelp.help_help()
+        self.__log.critical("Help requested for invalid operation : %s"%arg[1])
+        self.__opCode = 3
+      else: print getattr(self.__hodhelp, "help_%s" % arg[1])()
+
   def operation(self):  
     operation = self.__cfg['hod']['operation']
     try:
@@ -393,16 +406,37 @@ class hodRunner:
     return self.__opCode
   
   def script(self):
+    errorFlag = False
+    errorMsgs = []
+    
     script = self.__cfg['hod']['script']
-    nodes = self.__cfg['hod']['min-nodes']
-    isExecutable = os.access(script, os.X_OK)
-    if not isExecutable:
-      self.__log.critical('Script %s is not an executable.' % script)
-      return 1
-
-    clusterDir = "/tmp/%s.%s" % (self.__cfg['hod']['userid'], 
-                                 random.randint(0, 20000))
-    os.mkdir(clusterDir)
+    nodes = self.__cfg['hod']['nodecount']
+    clusterDir = self.__cfg['hod']['clusterdir']
+    
+    if not os.path.isfile(script):
+      errorFlag = True
+      errorMsgs.append("Invalid script file (--hod.script or -s) " + \
+                       "specified : %s" % script)
+    else:
+      isExecutable = os.access(script, os.X_OK)
+      if not isExecutable:
+        errorFlag = True
+        errorMsgs.append('Script %s is not an executable.' % \
+                                  self.__cfg['hod']['script'])
+    if not os.path.isdir(self.__cfg['hod']['clusterdir']):
+      errorFlag = True
+      errorMsgs.append("Invalid cluster directory (--hod.clusterdir or -d) " +\
+                        "'%s' specified." % self.__cfg['hod']['clusterdir'])
+    if int(self.__cfg['hod']['nodecount']) < 3 :
+      errorFlag = True
+      errorMsgs.append("nodecount(--hod.nodecount or -n) must be >= 3. " + \
+                       "Given nodes: %s" %   self.__cfg['hod']['nodecount'])
+
+    if errorFlag:
+      for msg in errorMsgs:
+        self.__log.critical(msg)
+      sys.exit(3)
+
     ret = 0
     try:
       self._op_allocate(('allocate', clusterDir, str(nodes)))
@@ -426,7 +460,6 @@ class hodRunner:
         hodInterrupt.setFlag(False)
       if self._is_cluster_allocated(clusterDir):
         self._op_deallocate(('deallocate', clusterDir))
-      shutil.rmtree(clusterDir, True)
     except HodInterruptException, h:
       self.__log.critical("Script failed because of an process interrupt.")
       self.__opCode = HOD_INTERRUPTED_CODE
@@ -442,3 +475,57 @@ class hodRunner:
       self.__opCode = ret
 
     return self.__opCode
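The rewritten `script()` now validates all of its arguments before exiting, collecting every problem rather than stopping at the first. A standalone sketch of that accumulate-then-report pattern (names are illustrative):

```python
import os

def validate_script_args(script, clusterdir, nodecount):
    """Collect every argument problem before failing, as script() now does."""
    errors = []
    if not os.path.isfile(script):
        errors.append("Invalid script file specified: %s" % script)
    elif not os.access(script, os.X_OK):
        errors.append("Script %s is not an executable." % script)
    if not os.path.isdir(clusterdir):
        errors.append("Invalid cluster directory '%s' specified." % clusterdir)
    if int(nodecount) < 3:
        errors.append("nodecount must be >= 3. Given nodes: %s" % nodecount)
    return errors  # an empty list means the arguments are usable
```

Reporting all errors in one pass spares the user several edit-retry cycles when more than one argument is wrong.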
+
+class hodHelp():
+  def __init__(self):
+    self.ops = ['allocate', 'deallocate', 'info', 'list','script',  'help']
+  
+  def help_allocate(self):
+    return \
+    "Usage       : hod allocate -d <clusterdir> -n <nodecount> [OPTIONS]\n" + \
+      "Description : Allocates a cluster of n nodes using the specified \n" + \
+      "              cluster directory to store cluster state \n" + \
+      "              information. The Hadoop site XML is also stored \n" + \
+      "              in this location.\n" + \
+      "For all options : hod help options.\n"
+
+  def help_deallocate(self):
+    return "Usage       : hod deallocate -d <clusterdir> [OPTIONS]\n" + \
+      "Description : Deallocates a cluster using the specified \n" + \
+      "             cluster directory.  This operation is also \n" + \
+      "             required to clean up a dead cluster.\n" + \
+      "For all options : hod help options.\n"
+
+  def help_list(self):
+    return "Usage       : hod list [OPTIONS]\n" + \
+      "Description : List all clusters currently allocated by a user, \n" + \
+      "              along with limited status information and the \n" + \
+      "              cluster ID.\n" + \
+      "For all options : hod help options.\n"
+
+  def help_info(self):
+    return "Usage       : hod info -d <clusterdir> [OPTIONS]\n" + \
+      "Description : Provide detailed information on an allocated cluster.\n" + \
+      "For all options : hod help options.\n"
+
+  def help_script(self):
+    return "Usage       : hod script -d <clusterdir> -n <nodecount> " + \
+                                        "-s <script> [OPTIONS]\n" + \
+           "Description : Allocates a cluster of n nodes with the given \n" +\
+           "              cluster directory, runs the specified script \n" + \
+           "              using the allocated cluster, and then \n" + \
+           "              deallocates the cluster.\n" + \
+           "For all options : hod help options.\n"
+ 
+  def help_help(self):
+    return "Usage       : hod help <OPERATION>\n" + \
+      "Description : Print help for the operation and exit.\n" + \
+      "Available operations : %s.\n" % self.ops + \
+      "For all options : hod help options.\n"
+
+  def help(self):
+    return  "hod <operation> [ARGS] [OPTIONS]\n" + \
+            "Available operations : %s\n" % self.ops + \
+            "For help on a particular operation : hod help <operation>.\n" + \
+            "For all options : hod help options."
+
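The new `hodHelp` class lets `hodRunner` look up per-operation help by naming convention (`help_<operation>`), replacing the old hard-coded text in `_op_help`. A minimal sketch of that `getattr` dispatch, with `MiniHelp` standing in for `hodHelp`:

```python
# Minimal sketch of the 'help_<operation>' naming-convention dispatch used
# by hodRunner; MiniHelp stands in for hodHelp and is illustrative only.

class MiniHelp(object):
    ops = ['allocate', 'deallocate']

    def help_allocate(self):
        return "Usage : hod allocate -d <clusterdir> -n <nodecount>"

    def help_deallocate(self):
        return "Usage : hod deallocate -d <clusterdir>"

help_obj = MiniHelp()
for op in help_obj.ops:
    # getattr resolves the method by name at runtime, so adding a new
    # operation only requires adding a matching help_<op> method
    print(getattr(help_obj, "help_%s" % op)())
```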

+ 75 - 66
src/docs/src/documentation/content/xdocs/hod_user_guide.xml

@@ -5,7 +5,7 @@
 <document>
   <header>
     <title>
-      Hadoop On Demand 0.4 User Guide
+      Hadoop On Demand User Guide
     </title>
   </header>
 
@@ -13,29 +13,28 @@
   <section>
     <title> Introduction </title><anchor id="Introduction"></anchor>
   <p>Hadoop On Demand (HOD) is a system for provisioning virtual Hadoop clusters over a large physical cluster. It uses the Torque resource manager to do node allocation. On the allocated nodes, it can start Hadoop Map/Reduce and HDFS daemons. It automatically generates the appropriate configuration files (hadoop-site.xml) for the Hadoop daemons and client. HOD also has the capability to distribute Hadoop to the nodes in the virtual cluster that it allocates. In short, HOD makes it easy for administrators and users to quickly setup and use Hadoop. It is also a very useful tool for Hadoop developers and testers who need to share a physical cluster for testing their own Hadoop versions.</p>
-  <p>HOD 0.4 supports Hadoop from version 0.15 onwards.</p>
+  <p>HOD supports Hadoop from version 0.15 onwards.</p>
  <p>The rest of the documentation comprises a quick-start guide that helps you get started quickly with HOD, a more detailed guide to all HOD features, command line options, known issues, and troubleshooting information.</p>
   </section>
   <section>
-		<title> Getting Started Using HOD 0.4 </title><anchor id="Getting_Started_Using_HOD_0_4"></anchor>
-  <p>In this section, we shall see a step-by-step introduction on how to use HOD for the most basic operations. Before following these steps, it is assumed that HOD 0.4 and its dependent hardware and software components are setup and configured correctly. This is a step that is generally performed by system administrators of the cluster.</p>
-  <p>The HOD 0.4 user interface is a command line utility called <code>hod</code>. It is driven by a configuration file, that is typically setup for users by system administrators. Users can override this configuration when using the <code>hod</code>, which is described later in this documentation. The configuration file can be specified in two ways when using <code>hod</code>, as described below: </p>
+		<title> Getting Started Using HOD </title><anchor id="Getting_Started_Using_HOD_0_4"></anchor>
+  <p>In this section, we shall see a step-by-step introduction to using HOD for the most basic operations. Before following these steps, it is assumed that HOD and its dependent hardware and software components are set up and configured correctly. This step is generally performed by the system administrators of the cluster.</p>
+  <p>The HOD user interface is a command line utility called <code>hod</code>. It is driven by a configuration file that is typically set up for users by system administrators. Users can override this configuration when using <code>hod</code>, as described later in this documentation. The configuration file can be specified in two ways when using <code>hod</code>, as described below: </p>
   <ul>
-    <li> Specify it on command line, using the -c option. Such as <code>hod -c path-to-the-configuration-file other-options</code></li>
+    <li> Specify it on the command line using the -c option. For example: <code>hod &lt;operation&gt; &lt;required-args&gt; -c path-to-the-configuration-file [other-options]</code></li>
     <li> Set up an environment variable <em>HOD_CONF_DIR</em> where <code>hod</code> will be run. This should be pointed to a directory on the local file system, containing a file called <em>hodrc</em>. Note that this is analogous to the <em>HADOOP_CONF_DIR</em> and <em>hadoop-site.xml</em> file for Hadoop. If no configuration file is specified on the command line, <code>hod</code> shall look for the <em>HOD_CONF_DIR</em> environment variable and a <em>hodrc</em> file under that.</li>
     </ul>
   <p>In examples listed below, we shall not explicitly point to the configuration option, assuming it is correctly specified.</p>
-  <p><code>hod</code> can be used in two modes, the <em>operation</em> mode and the <em>script</em> mode. We shall describe the two modes in detail below.</p>
-  <section><title> HOD <em>Operation</em> Mode </title><anchor id="HOD_Operation_Mode"></anchor>
-  <p>A typical session of HOD using this option will involve at least three steps: allocate, run hadoop jobs, deallocate. In order to use this mode, perform the following steps.</p>
+  <section><title>A typical HOD session</title><anchor id="HOD_Session"></anchor>
+  <p>A typical session of HOD will involve at least three steps: allocate, run hadoop jobs, deallocate. In order to do this, perform the following steps.</p>
   <p><strong> Create a Cluster Directory </strong></p><anchor id="Create_a_Cluster_Directory"></anchor>
   <p>The <em>cluster directory</em> is a directory on the local file system where <code>hod</code> will generate the Hadoop configuration, <em>hadoop-site.xml</em>, corresponding to the cluster it allocates. Create this directory and pass it to the <code>hod</code> operations as stated below. Once a cluster is allocated, a user can utilize it to run Hadoop jobs by specifying the cluster directory as the Hadoop --config option. </p>
   <p><strong> Operation <em>allocate</em></strong></p><anchor id="Operation_allocate"></anchor>
-  <p>The <em>allocate</em> operation is used to allocate a set of nodes and install and provision Hadoop on them. It has the following syntax:</p>
+  <p>The <em>allocate</em> operation is used to allocate a set of nodes and install and provision Hadoop on them. It has the following syntax. Note that it requires a cluster_dir (-d, --hod.clusterdir) and the number of nodes (-n, --hod.nodecount) to be allocated:</p>
     <table>
       
         <tr>
-          <td><code>$ hod -o "allocate cluster_dir number_of_nodes"</code></td>
+          <td><code>$ hod allocate -d cluster_dir -n number_of_nodes [OPTIONS]</code></td>
         </tr>
       
     </table>
@@ -43,7 +42,7 @@
   <p>An example run of this command produces the following output. Note in this example that <code>~/hod-clusters/test</code> is the cluster directory, and we are allocating 5 nodes:</p>
   <table>
     <tr>
-      <td><code>$ hod -o "allocate ~/hod-clusters/test 5"</code><br/>
+      <td><code>$ hod allocate -d ~/hod-clusters/test -n 5</code><br/>
       <code>INFO - HDFS UI on http://foo1.bar.com:53422</code><br/>
       <code>INFO - Mapred UI on http://foo2.bar.com:55380</code><br/></td>
       </tr>
@@ -75,31 +74,31 @@
     </tr>
   </table>
   <p><strong> Operation <em>deallocate</em></strong></p><anchor id="Operation_deallocate"></anchor>
-  <p>The <em>deallocate</em> operation is used to release an allocated cluster. When finished with a cluster, deallocate must be run so that the nodes become free for others to use. The <em>deallocate</em> operation has the following syntax:</p>
+  <p>The <em>deallocate</em> operation is used to release an allocated cluster. When finished with a cluster, deallocate must be run so that the nodes become free for others to use. The <em>deallocate</em> operation has the following syntax. Note that it requires the cluster_dir (-d, --hod.clusterdir) argument:</p>
     <table>
       
         <tr>
-          <td><code>$ hod -o "deallocate cluster_dir"</code></td>
+          <td><code>$ hod deallocate -d cluster_dir</code></td>
         </tr>
       
     </table>
   <p>Continuing our example, the following command will deallocate the cluster:</p>
-  <table><tr><td><code>$ hod -o "deallocate ~/hod-clusters/test"</code></td></tr></table>
-  <p>As can be seen, when used in the <em>operation</em> mode, HOD allows the users to allocate a cluster, and use it flexibly for running Hadoop jobs. For example, users can run multiple jobs in parallel on the same cluster, by running hadoop from multiple shells pointing to the same configuration.</p>
+  <table><tr><td><code>$ hod deallocate -d ~/hod-clusters/test</code></td></tr></table>
+  <p>As can be seen, HOD allows the users to allocate a cluster, and use it flexibly for running Hadoop jobs. For example, users can run multiple jobs in parallel on the same cluster, by running hadoop from multiple shells pointing to the same configuration.</p>
 	</section>
-  <section><title> HOD <em>Script</em> Mode </title><anchor id="HOD_Script_Mode"></anchor>
-  <p>The HOD <em>script mode</em> combines the operations of allocating, using and deallocating a cluster into a single operation. This is very useful for users who want to run a script of hadoop jobs and let HOD handle the cleanup automatically once the script completes. In order to use <code>hod</code> in the script mode, do the following:</p>
+  <section><title>Running hadoop scripts using HOD</title><anchor id="HOD_Script_Mode"></anchor>
+  <p>The HOD <em>script operation</em> combines the operations of allocating, using and deallocating a cluster into a single operation. This is very useful for users who want to run a script of hadoop jobs and let HOD handle the cleanup automatically once the script completes. In order to run hadoop scripts using <code>hod</code>, do the following:</p>
   <p><strong> Create a script file </strong></p><anchor id="Create_a_script_file"></anchor>
   <p>This will be a regular shell script that will typically contain hadoop commands, such as:</p>
   <table><tr><td><code>$ hadoop jar jar_file options</code></td>
   </tr></table>
-  <p>However, the user can add any valid commands as part of the script. HOD will execute this script setting <em>HADOOP_CONF_DIR</em> automatically to point to the allocated cluster. So users do not need to worry about this. They also do not need to create a cluster directory as in the <em>operation</em> mode.</p>
+  <p>However, the user can add any valid commands as part of the script. HOD executes this script with <em>HADOOP_CONF_DIR</em> automatically set to point to the allocated cluster, so users do not need to worry about this. Users do, however, need to create a cluster directory, just as with the <em>allocate</em> operation.</p>
   <p><strong> Running the script </strong></p><anchor id="Running_the_script"></anchor>
-  <p>The syntax for the <em>script mode</em> as is as follows:</p>
+  <p>The syntax for the <em>script</em> operation is as follows. Note that it requires a cluster directory (-d, --hod.clusterdir), a number of nodes (-n, --hod.nodecount) and a script file (-s, --hod.script):</p>
     <table>
       
         <tr>
-          <td><code>$ hod -m number_of_nodes -z script_file</code></td>
+          <td><code>$ hod script -d cluster_directory -n number_of_nodes -s script_file</code></td>
         </tr>
       
     </table>
@@ -107,7 +106,7 @@
    </section>
   </section>
   <section>
-		<title> HOD 0.4 Features </title><anchor id="HOD_0_4_Features"></anchor>
+		<title> HOD Features </title><anchor id="HOD_0_4_Features"></anchor>
   <section><title> Provisioning and Managing Hadoop Clusters </title><anchor id="Provisioning_and_Managing_Hadoop"></anchor>
   <p>The primary feature of HOD is to provision Hadoop Map/Reduce and HDFS clusters. This is described above in the Getting Started section. Also, as long as nodes are available, and organizational policies allow, a user can use HOD to allocate multiple Map/Reduce clusters simultaneously. The user would need to specify different paths for the <code>cluster_dir</code> parameter mentioned above for each cluster he/she allocates. HOD provides the <em>list</em> and the <em>info</em> operations to enable managing multiple clusters.</p>
   <p><strong> Operation <em>list</em></strong></p><anchor id="Operation_list"></anchor>
@@ -115,16 +114,16 @@
     <table>
       
         <tr>
-          <td><code>$ hod -o "list"</code></td>
+          <td><code>$ hod list</code></td>
         </tr>
       
     </table>
   <p><strong> Operation <em>info</em></strong></p><anchor id="Operation_info"></anchor>
-  <p>The info operation shows information about a given cluster. The information shown includes the Torque job id, and locations of the important daemons like the HOD Ringmaster process, and the Hadoop JobTracker and NameNode daemons. The info operation has the following syntax:</p>
+  <p>The info operation shows information about a given cluster. The information shown includes the Torque job id, and locations of the important daemons like the HOD Ringmaster process, and the Hadoop JobTracker and NameNode daemons. The info operation has the following syntax. Note that it requires a cluster directory (-d, --hod.clusterdir):</p>
     <table>
       
         <tr>
-          <td><code>$ hod -o "info cluster_dir"</code></td>
+          <td><code>$ hod info -d cluster_dir</code></td>
         </tr>
       
     </table>
@@ -133,19 +132,18 @@
   <section><title> Using a tarball to distribute Hadoop </title><anchor id="Using_a_tarball_to_distribute_Ha"></anchor>
   <p>When provisioning Hadoop, HOD can use either a pre-installed Hadoop on the cluster nodes or distribute and install a Hadoop tarball as part of the provisioning operation. If the tarball option is being used, there is no need to have a pre-installed Hadoop on the cluster nodes, nor a need to use a pre-installed one. This is especially useful in a development / QE environment where individual developers may have different versions of Hadoop to test on a shared cluster. </p>
   <p>In order to use a pre-installed Hadoop, you must specify, in the hodrc, the <code>pkgs</code> option in the <code>gridservice-hdfs</code> and <code>gridservice-mapred</code> sections. This must point to the path where Hadoop is installed on all nodes of the cluster.</p>
-  <p>The tarball option can be used in both the <em>operation</em> and <em>script</em> options. </p>
-  <p>In the operation option, the syntax is as follows:</p>
+  <p>The syntax for specifying a tarball is as follows:</p>
     <table>
         <tr>
-          <td><code>$ hod -t hadoop_tarball_location -o "allocate cluster_dir number_of_nodes"</code></td>
+          <td><code>$ hod allocate -d cluster_dir -n number_of_nodes -t hadoop_tarball_location</code></td>
         </tr>
     </table>
   <p>For example, the following command allocates Hadoop provided by the tarball <code>~/share/hadoop.tar.gz</code>:</p>
-  <table><tr><td><code>$ hod -t ~/share/hadoop.tar.gz -o "allocate ~/hadoop-cluster 10"</code></td></tr></table>
-  <p>In the script option, the syntax is as follows:</p>
+  <table><tr><td><code>$ hod allocate -d ~/hadoop-cluster -n 10 -t ~/share/hadoop.tar.gz</code></td></tr></table>
+  <p>Similarly, when using <code>hod script</code>, the syntax is as follows:</p>
     <table>
         <tr>
-          <td><code>$ hod -t hadoop_tarball_location -m number_of_nodes -z script_file</code></td>
+          <td><code>$ hod script -d cluster_directory -s script_file -n number_of_nodes -t hadoop_tarball_location</code></td>
         </tr>
     </table>
   <p>The hadoop_tarball specified in the syntax above should point to a path on a shared file system that is accessible from all the compute nodes. Currently, HOD only supports NFS mounted file systems.</p>
@@ -162,7 +160,7 @@
     </p>
     <table>
         <tr>
-          <td><code>$ hod --gridservice-hdfs.external -o "allocate cluster_dir number_of_nodes"</code></td>
+          <td><code>$ hod allocate -d cluster_dir -n number_of_nodes --gridservice-hdfs.external</code></td>
         </tr>
     </table>
   <p>HOD can be used to provision an HDFS cluster as well as a Map/Reduce cluster, if required. To do so, set the following option in the section <code>gridservice-hdfs</code> of the hodrc:</p>
@@ -180,13 +178,13 @@
   <p>For configuring the Map/Reduce daemons use:</p>
     <table>
         <tr>
-          <td><code>$ hod -Mmapred.reduce.parallel.copies=20 -Mio.sort.factor=100 -o "allocate cluster_dir number_of_nodes"</code></td>
+          <td><code>$ hod allocate -d cluster_dir -n number_of_nodes -Mmapred.reduce.parallel.copies=20 -Mio.sort.factor=100</code></td>
         </tr>
     </table>
  <p>In the example above, the <em>mapred.reduce.parallel.copies</em> parameter and the <em>io.sort.factor</em> parameter will be appended to the other <code>server-params</code> or, if they already exist in <code>server-params</code>, will override them. In order to specify that these are <em>final</em> parameters, you can use:</p>
     <table>
         <tr>
-          <td><code>$ hod -Fmapred.reduce.parallel.copies=20 -Fio.sort.factor=100 -o "allocate cluster_dir number_of_nodes"</code></td>
+          <td><code>$ hod allocate -d cluster_dir -n number_of_nodes -Fmapred.reduce.parallel.copies=20 -Fio.sort.factor=100</code></td>
         </tr>
     </table>
   <p>However, note that final parameters cannot be overwritten from command line. They can only be appended if not already specified.</p>
@@ -195,14 +193,14 @@
   <p>As mentioned above, if the allocation operation completes successfully then <code>cluster_dir/hadoop-site.xml</code> will be generated and will contain information about the allocated cluster's JobTracker and NameNode. This configuration is used when submitting jobs to the cluster. HOD provides an option to include additional Hadoop configuration parameters into this file. The syntax for doing so is as follows:</p>
     <table>
         <tr>
-          <td><code>$ hod -Cmapred.userlog.limit.kb=200 -Cmapred.child.java.opts=-Xmx512m -o "allocate cluster_dir number_of_nodes"</code></td>
+          <td><code>$ hod allocate -d cluster_dir -n number_of_nodes -Cmapred.userlog.limit.kb=200 -Cmapred.child.java.opts=-Xmx512m</code></td>
         </tr>
     </table>
   <p>In this example, the <em>mapred.userlog.limit.kb</em> and <em>mapred.child.java.opts</em> options will be included into the hadoop-site.xml that is generated by HOD.</p>
   </section>
   <section><title> Viewing Hadoop Web-UIs </title><anchor id="Viewing_Hadoop_Web_UIs"></anchor>
   <p>The HOD allocation operation prints the JobTracker and NameNode web UI URLs. For example:</p>
-   <table><tr><td><code>$ hod -c ~/hod-conf-dir/hodrc -o "allocate ~/hadoop-cluster 10"</code><br/>
+   <table><tr><td><code>$ hod allocate -d ~/hadoop-cluster -n 10 -c ~/hod-conf-dir/hodrc</code><br/>
     <code>INFO - HDFS UI on http://host242.foo.com:55391</code><br/>
     <code>INFO - Mapred UI on http://host521.foo.com:54874</code>
     </td></tr></table>
@@ -233,14 +231,14 @@
   <p>To specify the wallclock time, use the following syntax:</p>
     <table>
         <tr>
-          <td><code>$ hod -l time_in_seconds -o "allocate cluster_dir number_of_nodes"</code></td>
+          <td><code>$ hod allocate -d cluster_dir -n number_of_nodes -l time_in_seconds</code></td>
         </tr>
     </table>
   <p>The name or title of a Torque job helps in user friendly identification of the job. The string specified here will show up in all information where Torque job attributes are displayed, including the <code>qstat</code> command.</p>
   <p>To specify the name or title, use the following syntax:</p>
     <table>
         <tr>
-          <td><code>$ hod -N name_of_job -o "allocate cluster_dir number_of_nodes"</code></td>
+          <td><code>$ hod allocate -d cluster_dir -n number_of_nodes -N name_of_job</code></td>
         </tr>
     </table>
  <p><em>Note:</em> Due to a restriction in the underlying Torque resource manager, names which do not start with an alphabet or which contain a 'space' will cause the job to fail. The failure message points to the problem being in the specified job name.</p>
@@ -292,30 +290,37 @@
     
   </table>
   </section>
-	</section>
   <section>
-		<title> Command Line Options </title><anchor id="Command_Line_Options"></anchor>
-  <p>Command line options for the <code>hod</code> command are used for two purposes: defining an operation that HOD must perform, and defining configuration options for customizing HOD that override options defined in the default configuration file. This section covers both types of options. </p>
-  <section><title> Options Defining Operations </title><anchor id="Options_Defining_Operations"></anchor>
-  <p><em>--help</em><br />
-    Prints out the help message to see the basic options.</p>
-  <p><em>--verbose-help</em><br />
-    All configuration options provided in the hodrc file can be passed on the command line, using the syntax <code>--section_name.option_name[=value]</code>. When provided this way, the value provided on command line overrides the option provided in hodrc. The verbose-help command lists all the available options in the hodrc file. This is also a nice way to see the meaning of the configuration options.</p>
-  <p><em>-o "operation_name options"</em><br />
-    This class of options are used to define the <em>operation</em> mode of HOD. <em>Note:</em> The operation_name and other options must be specified within double quotes.</p>
-  <p><em>-o "help"</em><br />
-    Lists the operations available in the <em>operation</em> mode.</p>
-  <p><em>-o "allocate cluster_dir number_of_nodes"</em><br />
-    Allocates a cluster on the given number of cluster nodes, and store the allocation information in cluster_dir for use with subsequent <code>hadoop</code> commands. Note that the <code>cluster_dir</code> must exist before running the command.</p>
-  <p><em>-o "list"</em><br />
-    Lists the clusters allocated by this user. Information provided includes the Torque job id corresponding to the cluster, the cluster directory where the allocation information is stored, and whether the Map/Reduce daemon is still active or not.</p>
-  <p><em>-o "info cluster_dir"</em><br />
-    Lists information about the cluster whose allocation information is stored in the specified cluster directory.</p>
-  <p><em>-o "deallocate cluster_dir"</em><br />
-    Deallocates the cluster whose allocation information is stored in the specified cluster directory.</p>
-  <p><em>-z script_file</em><br />
-    Runs HOD in <em>script mode</em>. Provisions Hadoop on a given number of nodes, executes the given script from the submitting node, and deallocates the cluster when the script completes. Refer to option <em>-m</em></p>
+    <title> Command Line</title><anchor id="Command_Line"></anchor>
+    <p>HOD command line has the following general syntax:<br/>
+      <em>hod &lt;operation&gt; [ARGS] [OPTIONS]<br/></em>
+      Allowed operations are 'allocate', 'deallocate', 'info', 'list', 'script' and 'help'. For help on a particular operation, run <code>hod help &lt;operation&gt;</code>. To see the available options, run <code>hod help options</code>.</p>
+      <p><em>allocate</em><br />
+      <em>Usage : hod allocate -d cluster_dir -n number_of_nodes [OPTIONS]</em><br />
+        Allocates a cluster on the given number of cluster nodes, and stores the allocation information in cluster_dir for use with subsequent <code>hadoop</code> commands. Note that the <code>cluster_dir</code> must exist before running the command.</p>
+      <p><em>list</em><br/>
+      <em>Usage : hod list [OPTIONS]</em><br />
+       Lists the clusters allocated by this user. Information provided includes the Torque job id corresponding to the cluster, the cluster directory where the allocation information is stored, and whether the Map/Reduce daemon is still active or not.</p>
+      <p><em>info</em><br/>
+      <em>Usage : hod info -d cluster_dir [OPTIONS]</em><br />
+        Lists information about the cluster whose allocation information is stored in the specified cluster directory.</p>
+      <p><em>deallocate</em><br/>
+      <em>Usage : hod deallocate -d cluster_dir [OPTIONS]</em><br />
+        Deallocates the cluster whose allocation information is stored in the specified cluster directory.</p>
+      <p><em>script</em><br/>
+      <em>Usage : hod script -s script_file -d cluster_directory -n number_of_nodes [OPTIONS]</em><br />
+        Runs a Hadoop script using the HOD <em>script</em> operation. Provisions Hadoop on a given number of nodes, executes the given script from the submitting node, and deallocates the cluster when the script completes.</p>
+      <p><em>help</em><br/>
+      <em>Usage : hod help [operation | 'options']</em><br/>
+       When no argument is specified, <code>hod help</code> gives the usage and basic options, and is equivalent to <code>hod --help</code> (see below). When 'options' is given as the argument, hod displays only the basic options that hod takes. When an operation is specified, it displays the usage and description corresponding to that particular operation. For example, to learn about the allocate operation, run <code>hod help allocate</code>.</p>
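+      <p>For example, a typical session might allocate a cluster, inspect it, and release it (the cluster directory <code>~/hod-clusters/test</code> is a hypothetical path used for illustration):</p>
+      <table>
+          <tr><td><code>$ hod allocate -d ~/hod-clusters/test -n 5</code></td></tr>
+          <tr><td><code>$ hod info -d ~/hod-clusters/test</code></td></tr>
+          <tr><td><code>$ hod deallocate -d ~/hod-clusters/test</code></td></tr>
+      </table>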
+      <p>Besides the operations, HOD can take the following command line options.</p>
+      <p><em>--help</em><br />
+        Prints out the help message to see the usage and basic options.</p>
+      <p><em>--verbose-help</em><br />
+        All configuration options provided in the hodrc file can be passed on the command line, using the syntax <code>--section_name.option_name[=value]</code>. When provided this way, the value provided on command line overrides the option provided in hodrc. The verbose-help command lists all the available options in the hodrc file. This is also a nice way to see the meaning of the configuration options.</p>
+       <p>See the <a href="#Options_Configuring_HOD">next section</a> for a description of the most important hod configuration options. For basic options, run <code>hod help options</code>; for all options possible in hod configuration, run <code>hod --verbose-help</code>. See the <a href="hod_config_guide.html">config guide</a> for a description of all options.</p>
   </section>
+
   <section><title> Options Configuring HOD </title><anchor id="Options_Configuring_HOD"></anchor>
  <p>As described above, HOD is configured using a configuration file that is usually set up by system administrators. This is an INI-style configuration file that is divided into sections, with options inside each section. Each section relates to one of the HOD processes: client, ringmaster, hodring, mapreduce or hdfs. The options inside a section comprise an option name and value. </p>
   <p>Users can override the configuration defined in the default configuration in two ways: </p>
@@ -326,18 +331,22 @@
   <p>This section describes some of the most commonly used configuration options. These commonly used options are provided with a <em>short</em> option for convenience of specification. All other options can be specified using a <em>long</em> option that is also described below.</p>
   <p><em>-c config_file</em><br />
     Provides the configuration file to use. Can be used with all other options of HOD. Alternatively, the <code>HOD_CONF_DIR</code> environment variable can be defined to specify a directory that contains a file named <code>hodrc</code>, alleviating the need to specify the configuration file in each HOD command.</p>
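+  <p>For example, rather than passing <code>-c</code> with every command, one could set the environment variable once (the directory shown is illustrative and must contain a file named <code>hodrc</code>):</p>
+  <table><tr><td><code>$ export HOD_CONF_DIR=~/hod-conf-dir</code><br/>
+  <code>$ hod allocate -d ~/hadoop-cluster -n 10</code></td></tr></table>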
-  <p><em>-b 1|2|3|4</em><br />
+  <p><em>-d cluster_dir</em><br />
+        This is required for most of the hod operations. As described <a href="#Create_a_Cluster_Directory">here</a>, the <em>cluster directory</em> is a directory on the local file system where <code>hod</code> will generate the Hadoop configuration, <em>hadoop-site.xml</em>, corresponding to the cluster it allocates. Create this directory and pass it to the <code>hod</code> operations as an argument to -d or --hod.clusterdir. Once a cluster is allocated, a user can utilize it to run Hadoop jobs by specifying the cluster directory as the Hadoop --config option.</p>
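+  <p>For example, once allocation succeeds, a user might submit a job against the allocated cluster as follows (the jar path and job arguments are illustrative):</p>
+  <table><tr><td><code>$ hadoop --config ~/hadoop-cluster jar /path/to/hadoop-examples.jar wordcount input output</code></td></tr></table>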
+  <p><em>-n number_of_nodes</em><br />
+  This is required for the hod 'allocate' and 'script' operations. It denotes the number of nodes to be allocated.</p>
+  <p><em>-s script-file</em><br/>
+   Required when using the script operation; specifies the script file to execute.</p>
+ <p><em>-b 1|2|3|4</em><br />
     Enables the given debug level. Can be used with all other options of HOD. 4 is most verbose.</p>
   <p><em>-t hadoop_tarball</em><br />
     Provisions Hadoop from the given tar.gz file. This option is only applicable to the <em>allocate</em> operation. For better distribution performance it is strongly recommended that the Hadoop tarball is created <em>after</em> removing the source or documentation.</p>
-  <p><em>-m number_of_nodes</em><br />
-    When used in the <em>script</em> mode, this specifies the number of nodes to allocate. Note that this option is useful only in the script mode.</p>
   <p><em>-N job-name</em><br />
     The Name to give to the resource manager job that HOD uses underneath. For e.g. in the case of Torque, this translates to the <code>qsub -N</code> option, and can be seen as the job name using the <code>qstat</code> command.</p>
   <p><em>-l wall-clock-time</em><br />
     The amount of time for which the user expects to have work on the allocated cluster. This is passed to the resource manager underneath HOD, and can be used in more efficient scheduling and utilization of the cluster. Note that in the case of Torque, the cluster is automatically deallocated after this time expires.</p>
   <p><em>-j java-home</em><br />
-    Path to be set to the JAVA_HOME environment variable. This is used in the <em>script</em> mode. HOD sets the JAVA_HOME environment variable tot his value and launches the user script in that.</p>
+    Path to be set to the JAVA_HOME environment variable. This is used in the <em>script</em> operation. HOD sets the JAVA_HOME environment variable to this value and launches the user script in that environment.</p>
   <p><em>-A account-string</em><br />
     Accounting information to pass to underlying resource manager.</p>
   <p><em>-Q queue-name</em><br />
@@ -384,12 +393,12 @@
       <tr>
         <td> 2 </td>
         <td> Invalid operation </td>
-        <td> Do <code>hod -o "help"</code> for the list of valid operations. </td>
+        <td> Do <code>hod help</code> for the list of valid operations. </td>
       </tr>
       <tr>
         <td> 3 </td>
         <td> Invalid operation arguments </td>
-        <td> Do <code>hod -o "help"</code> for the list of valid operations. Note that for an <em>allocate</em> operation, the directory argument must specify an existing directory. </td>
+        <td> Do <code>hod help operation</code> for listing the usage of a particular operation.</td>
       </tr>
       <tr>
         <td> 4 </td>
