
Merged -r 694701:694702 from trunk to branch-0.18 to fix HADOOP-4145.

git-svn-id: https://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.18@694707 13f79535-47bb-0310-9956-ffa450edef68
Nigel Daley 16 years ago
parent
commit
6ca6570bff

+ 63 - 23
docs/changes.html

@@ -36,7 +36,7 @@
     function collapse() {
       for (var i = 0; i < document.getElementsByTagName("ul").length; i++) {
         var list = document.getElementsByTagName("ul")[i];
-        if (list.id != 'release_0.18.0_-_unreleased_' && list.id != 'release_0.17.2_-_unreleased_') {
+        if (list.id != 'release_0.18.1_-_unreleased_' && list.id != 'release_0.18.0_-_2008-08-19_') {
           list.style.display = "none";
         }
       }
@@ -52,12 +52,38 @@
 <a href="http://hadoop.apache.org/core/"><img class="logoImage" alt="Hadoop" src="images/hadoop-logo.jpg" title="Scalable Computing Platform"></a>
 <h1>Hadoop Change Log</h1>
 
-<h2><a href="javascript:toggleList('release_0.18.0_-_unreleased_')">Release 0.18.0 - Unreleased
+<h2><a href="javascript:toggleList('release_0.18.1_-_unreleased_')">Release 0.18.1 - Unreleased
 </a></h2>
-<ul id="release_0.18.0_-_unreleased_">
-  <li><a href="javascript:toggleList('release_0.18.0_-_unreleased_._incompatible_changes_')">  INCOMPATIBLE CHANGES
+<ul id="release_0.18.1_-_unreleased_">
+  <li><a href="javascript:toggleList('release_0.18.1_-_unreleased_._improvements_')">  IMPROVEMENTS
+</a>&nbsp;&nbsp;&nbsp;(1)
+    <ol id="release_0.18.1_-_unreleased_._improvements_">
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3934">HADOOP-3934</a>. Upgrade log4j to 1.2.15.<br />(omalley)</li>
+    </ol>
+  </li>
+  <li><a href="javascript:toggleList('release_0.18.1_-_unreleased_._bug_fixes_')">  BUG FIXES
+</a>&nbsp;&nbsp;&nbsp;(5)
+    <ol id="release_0.18.1_-_unreleased_._bug_fixes_">
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3995">HADOOP-3995</a>. In case of quota failure on HDFS, rename does not restore
+source filename.<br />(rangadi)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3821">HADOOP-3821</a>. Prevent SequenceFile and IFile from duplicating codecs in
+CodecPool when closed more than once.<br />(Arun Murthy via cdouglas)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-4040">HADOOP-4040</a>. Remove coded default of the IPC idle connection timeout
+from the TaskTracker, which was causing HDFS client connections to not be
+collected.<br />(ddas via omalley)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-4046">HADOOP-4046</a>. Made WritableComparable's constructor protected instead of
+private to re-enable class derivation.<br />(cdouglas via omalley)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3940">HADOOP-3940</a>. Fix in-memory merge condition to wait when there are no map
+outputs or when the final map outputs are being fetched without contention.<br />(cdouglas)</li>
+    </ol>
+  </li>
+</ul>
+<h2><a href="javascript:toggleList('release_0.18.0_-_2008-08-19_')">Release 0.18.0 - 2008-08-19
+</a></h2>
+<ul id="release_0.18.0_-_2008-08-19_">
+  <li><a href="javascript:toggleList('release_0.18.0_-_2008-08-19_._incompatible_changes_')">  INCOMPATIBLE CHANGES
 </a>&nbsp;&nbsp;&nbsp;(23)
-    <ol id="release_0.18.0_-_unreleased_._incompatible_changes_">
+    <ol id="release_0.18.0_-_2008-08-19_._incompatible_changes_">
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-2703">HADOOP-2703</a>.  The default options to fsck skips checking files
 that are being written to. The output of fsck is incompatible
 with previous release.<br />(lohit vijayarenu via dhruba)</li>
@@ -133,9 +159,9 @@ to SequenceFiles.<br />(cdouglas)</li>
 </li>
     </ol>
   </li>
-  <li><a href="javascript:toggleList('release_0.18.0_-_unreleased_._new_features_')">  NEW FEATURES
+  <li><a href="javascript:toggleList('release_0.18.0_-_2008-08-19_._new_features_')">  NEW FEATURES
 </a>&nbsp;&nbsp;&nbsp;(25)
-    <ol id="release_0.18.0_-_unreleased_._new_features_">
+    <ol id="release_0.18.0_-_2008-08-19_._new_features_">
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-3074">HADOOP-3074</a>. Provides a UrlStreamHandler for DFS and other FS,
 relying on FileSystem<br />(taton)</li>
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-2585">HADOOP-2585</a>. Name-node imports namespace data from a recent checkpoint
@@ -189,9 +215,9 @@ framework.<br />(tomwhite via omalley)</li>
 in hadoop user guide.<br />(shv)</li>
     </ol>
   </li>
-  <li><a href="javascript:toggleList('release_0.18.0_-_unreleased_._improvements_')">  IMPROVEMENTS
+  <li><a href="javascript:toggleList('release_0.18.0_-_2008-08-19_._improvements_')">  IMPROVEMENTS
 </a>&nbsp;&nbsp;&nbsp;(47)
-    <ol id="release_0.18.0_-_unreleased_._improvements_">
+    <ol id="release_0.18.0_-_2008-08-19_._improvements_">
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-3677">HADOOP-3677</a>. Simplify generation stamp upgrade by making is a
 local upgrade on datandodes. Deleted distributed upgrade.<br />(rangadi)</li>
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-2928">HADOOP-2928</a>. Remove deprecated FileSystem.getContentLength().<br />(Lohit Vjayarenu via rangadi)</li>
@@ -286,9 +312,9 @@ via the DistributedCache.<br />(Amareshwari Sriramadasu via ddas)</li>
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-3688">HADOOP-3688</a>. Fix up HDFS docs.<br />(Robert Chansler via hairong)</li>
     </ol>
   </li>
-  <li><a href="javascript:toggleList('release_0.18.0_-_unreleased_._optimizations_')">  OPTIMIZATIONS
+  <li><a href="javascript:toggleList('release_0.18.0_-_2008-08-19_._optimizations_')">  OPTIMIZATIONS
 </a>&nbsp;&nbsp;&nbsp;(10)
-    <ol id="release_0.18.0_-_unreleased_._optimizations_">
+    <ol id="release_0.18.0_-_2008-08-19_._optimizations_">
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-3274">HADOOP-3274</a>. The default constructor of BytesWritable creates empty
 byte array. (Tsz Wo (Nicholas), SZE via shv)
 </li>
@@ -309,9 +335,9 @@ InputFormat.validateInput.<br />(tomwhite via omalley)</li>
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-3552">HADOOP-3552</a>. Add forrest documentation for Hadoop commands.<br />(Sharad Agarwal via cdouglas)</li>
     </ol>
   </li>
-  <li><a href="javascript:toggleList('release_0.18.0_-_unreleased_._bug_fixes_')">  BUG FIXES
-</a>&nbsp;&nbsp;&nbsp;(138)
-    <ol id="release_0.18.0_-_unreleased_._bug_fixes_">
+  <li><a href="javascript:toggleList('release_0.18.0_-_2008-08-19_._bug_fixes_')">  BUG FIXES
+</a>&nbsp;&nbsp;&nbsp;(143)
+    <ol id="release_0.18.0_-_2008-08-19_._bug_fixes_">
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-2905">HADOOP-2905</a>. 'fsck -move' triggers NPE in NameNode.<br />(Lohit Vjayarenu via rangadi)</li>
       <li>Increment ClientProtocol.versionID missed by <a href="http://issues.apache.org/jira/browse/HADOOP-2585">HADOOP-2585</a>.<br />(shv)</li>
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-3254">HADOOP-3254</a>. Restructure internal namenode methods that process
@@ -525,7 +551,7 @@ current semantics.<br />(lohit vijayarenu via cdouglas)</li>
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-3635">HADOOP-3635</a>. Uncaught exception in DataBlockScanner.
 (Tsz Wo (Nicholas), SZE via hairong)
 </li>
-      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3539">HADOOP-3539</a>. Exception when closing DFSClient while multiple files are
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3639">HADOOP-3639</a>. Exception when closing DFSClient while multiple files are
 open.<br />(Benjamin Gufler via hairong)</li>
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-3572">HADOOP-3572</a>. SetQuotas usage interface has some minor bugs.<br />(hairong)</li>
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-3649">HADOOP-3649</a>. Fix bug in removing blocks from the corrupted block map.<br />(Lohit Vijayarenu via shv)</li>
@@ -582,15 +608,26 @@ in the fsimage.<br />(dhruba)</li>
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-3855">HADOOP-3855</a>. Fixes an import problem introduced by <a href="http://issues.apache.org/jira/browse/HADOOP-3827">HADOOP-3827</a>.<br />(Arun Murthy via ddas)</li>
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-3865">HADOOP-3865</a>. Remove reference to FSNamesystem from metrics preventing
 garbage collection.<br />(Lohit Vijayarenu via cdouglas)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3884">HADOOP-3884</a>.  Fix so that Eclipse plugin builds against recent
+Eclipse releases.<br />(cutting)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3837">HADOOP-3837</a>. Streaming jobs report progress status.<br />(dhruba)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3897">HADOOP-3897</a>. Fix a NPE in secondary namenode.<br />(Lohit Vijayarenu via
+cdouglas)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3901">HADOOP-3901</a>. Fix bin/hadoop to correctly set classpath under cygwin.
+(Tsz Wo (Nicholas) Sze via omalley)
+</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3947">HADOOP-3947</a>. Fix a problem in tasktracker reinitialization.<br />(Amareshwari Sriramadasu via ddas)</li>
     </ol>
   </li>
 </ul>
-<h2><a href="javascript:toggleList('release_0.17.2_-_unreleased_')">Release 0.17.2 - Unreleased
-</a></h2>
-<ul id="release_0.17.2_-_unreleased_">
-  <li><a href="javascript:toggleList('release_0.17.2_-_unreleased_._bug_fixes_')">  BUG FIXES
-</a>&nbsp;&nbsp;&nbsp;(10)
-    <ol id="release_0.17.2_-_unreleased_._bug_fixes_">
+<h2><a href="javascript:toggleList('older')">Older Releases</a></h2>
+<ul id="older">
+<h3><a href="javascript:toggleList('release_0.17.2_-_2008-08-11_')">Release 0.17.2 - 2008-08-11
+</a></h3>
+<ul id="release_0.17.2_-_2008-08-11_">
+  <li><a href="javascript:toggleList('release_0.17.2_-_2008-08-11_._bug_fixes_')">  BUG FIXES
+</a>&nbsp;&nbsp;&nbsp;(12)
+    <ol id="release_0.17.2_-_2008-08-11_._bug_fixes_">
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-3678">HADOOP-3678</a>. Avoid spurious exceptions logged at DataNode when clients
 read from DFS.<br />(rangadi)</li>
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-3760">HADOOP-3760</a>. Fix a bug with HDFS file close() mistakenly introduced
@@ -611,11 +648,14 @@ correctly cleaned-up on task completion.<br />(Zheng Shao via acmurthy)</li>
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-3813">HADOOP-3813</a>. Fix task-output clean-up on HDFS to use the recursive
 FileSystem.delete rather than the FileUtil.fullyDelete.<br />(Amareshwari
 Sri Ramadasu via acmurthy)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3859">HADOOP-3859</a>. Allow the maximum number of xceivers in the data node to
+be configurable.<br />(Johan Oskarsson via omalley)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3931">HADOOP-3931</a>. Fix corner case in the map-side sort that causes some values
+to be counted as too large and cause pre-mature spills to disk. Some values
+will also bypass the combiner incorrectly.<br />(cdouglas via omalley)</li>
     </ol>
   </li>
 </ul>
-<h2><a href="javascript:toggleList('older')">Older Releases</a></h2>
-<ul id="older">
 <h3><a href="javascript:toggleList('release_0.17.1_-_2008-06-23_')">Release 0.17.1 - 2008-06-23
 </a></h3>
 <ul id="release_0.17.1_-_2008-06-23_">

+ 35 - 0
docs/hod_admin_guide.html

@@ -242,6 +242,15 @@ document.write("Last Published: " + document.lastModified);
 </li>
 </ul>
 </li>
+<li>
+<a href="#verify-account+-+Script+to+verify+an+account+under+which+%0A+++++++++++++jobs+are+submitted">verify-account - Script to verify an account under which 
+             jobs are submitted</a>
+<ul class="minitoc">
+<li>
+<a href="#Integrating+the+verify-account+script+with+HOD">Integrating the verify-account script with HOD</a>
+</li>
+</ul>
+</li>
 </ul>
 </li>
 </ul>
@@ -643,8 +652,34 @@ in the HOD Configuration Guide.</p>
         constraints, for example via cron. Please note that the resource manager
         and scheduler commands used in this script can be expensive and so
         it is better not to run this inside a tight loop without sleeping.</p>
+<a name="N1022D"></a><a name="verify-account+-+Script+to+verify+an+account+under+which+%0A+++++++++++++jobs+are+submitted"></a>
+<h3 class="h4">verify-account - Script to verify an account under which 
+             jobs are submitted</h3>
+<p>Production systems use accounting packages to charge users for using
+      shared compute resources. HOD supports a parameter 
+      <em>resource_manager.pbs-account</em> to allow users to identify the
+      account under which they would like to submit jobs. It may be necessary
+      to verify that this account is a valid one configured in an accounting
+      system. The <em>hod-install-dir/bin/verify-account</em> script 
+      provides a mechanism to plug in a custom script that can do this
+      verification.</p>
+<a name="N1023C"></a><a name="Integrating+the+verify-account+script+with+HOD"></a>
+<h4>Integrating the verify-account script with HOD</h4>
+<p>HOD runs the <em>verify-account</em> script passing in the
+        <em>resource_manager.pbs-account</em> value as argument to the script,
+        before allocating a cluster. Sites can write a script that verifies this 
+        account against their accounting systems. Returning a non-zero exit 
+        code from this script will cause HOD to fail allocation. Also, in
+        case of an error, HOD will print the output of the script to the user.
+        Any descriptive error message can be passed to the user from the
+        script in this manner.</p>
+<p>The default script that comes with the HOD installation does not
+        do any validation, and returns a zero exit code.</p>
+<p>If the verify-account script is not found, then HOD will treat
+        verification as disabled, and continue allocation as is.</p>
 </div>
 
+
 </div>
 <!--+
     |end content

File diff suppressed because it is too large
+ 13 - 2
docs/hod_admin_guide.pdf


+ 9 - 8
docs/hod_user_guide.html

@@ -1018,7 +1018,8 @@ document.write("Last Published: " + document.lastModified);
 <td colspan="1" rowspan="1"> 5 </td>
         <td colspan="1" rowspan="1"> Job execution failure </td>
         <td colspan="1" rowspan="1"> 1. Torque Job was deleted from outside. Execute the Torque <span class="codefrag">qstat</span> command to see if you have any jobs in the <span class="codefrag">R</span> (Running) state. If none exist, try re-executing HOD. <br>
-          2. Torque problems such as the server momentarily going down, or becoming unresponsive. Contact system administrator. </td>
+          2. Torque problems such as the server momentarily going down, or becoming unresponsive. Contact system administrator. <br>
+          3. The system administrator might have configured account verification, and an invalid account is specified. Contact system administrator.</td>
       
 </tr>
       
@@ -1116,7 +1117,7 @@ document.write("Last Published: " + document.lastModified);
 </tr>
   
 </table>
-<a name="N10755"></a><a name="Hadoop+Jobs+Not+Running+on+a+Successfully+Allocated+Cluster"></a>
+<a name="N10757"></a><a name="Hadoop+Jobs+Not+Running+on+a+Successfully+Allocated+Cluster"></a>
 <h3 class="h4"> Hadoop Jobs Not Running on a Successfully Allocated Cluster </h3>
 <a name="Hadoop_Jobs_Not_Running_on_a_Suc" id="Hadoop_Jobs_Not_Running_on_a_Suc"></a>
 <p>This scenario generally occurs when a cluster is allocated, and is left inactive for sometime, and then hadoop jobs are attempted to be run on them. Then Hadoop jobs fail with the following exception:</p>
@@ -1135,31 +1136,31 @@ document.write("Last Published: " + document.lastModified);
 <em>Possible Cause:</em> There is a version mismatch between the version of the hadoop client being used to submit jobs and the hadoop used in provisioning (typically via the tarball option). Ensure compatible versions are being used.</p>
 <p>
 <em>Possible Cause:</em> You used one of the options for specifying Hadoop configuration <span class="codefrag">-M or -H</span>, which had special characters like space or comma that were not escaped correctly. Refer to the section <em>Options Configuring HOD</em> for checking how to specify such options correctly.</p>
-<a name="N10790"></a><a name="My+Hadoop+Job+Got+Killed"></a>
+<a name="N10792"></a><a name="My+Hadoop+Job+Got+Killed"></a>
 <h3 class="h4"> My Hadoop Job Got Killed </h3>
 <a name="My_Hadoop_Job_Got_Killed" id="My_Hadoop_Job_Got_Killed"></a>
 <p>
 <em>Possible Cause:</em> The wallclock limit specified by the Torque administrator or the <span class="codefrag">-l</span> option defined in the section <em>Specifying Additional Job Attributes</em> was exceeded since allocation time. Thus the cluster would have got released. Deallocate the cluster and allocate it again, this time with a larger wallclock time.</p>
 <p>
 <em>Possible Cause:</em> Problems with the JobTracker node. Refer to the section in <em>Collecting and Viewing Hadoop Logs</em> to get more information.</p>
-<a name="N107AB"></a><a name="Hadoop+Job+Fails+with+Message%3A+%27Job+tracker+still+initializing%27"></a>
+<a name="N107AD"></a><a name="Hadoop+Job+Fails+with+Message%3A+%27Job+tracker+still+initializing%27"></a>
 <h3 class="h4"> Hadoop Job Fails with Message: 'Job tracker still initializing' </h3>
 <a name="Hadoop_Job_Fails_with_Message_Jo" id="Hadoop_Job_Fails_with_Message_Jo"></a>
 <p>
 <em>Possible Cause:</em> The hadoop job was being run as part of the HOD script command, and it started before the JobTracker could come up fully. Allocate the cluster using a large value for the configuration option <span class="codefrag">--hod.script-wait-time</span>. Typically a value of 120 should work, though it is typically unnecessary to be that large.</p>
-<a name="N107BB"></a><a name="The+Exit+Codes+For+HOD+Are+Not+Getting+Into+Torque"></a>
+<a name="N107BD"></a><a name="The+Exit+Codes+For+HOD+Are+Not+Getting+Into+Torque"></a>
 <h3 class="h4"> The Exit Codes For HOD Are Not Getting Into Torque </h3>
 <a name="The_Exit_Codes_For_HOD_Are_Not_G" id="The_Exit_Codes_For_HOD_Are_Not_G"></a>
 <p>
 <em>Possible Cause:</em> Version 0.16 of hadoop is required for this functionality to work. The version of Hadoop used does not match. Use the required version of Hadoop.</p>
 <p>
 <em>Possible Cause:</em> The deallocation was done without using the <span class="codefrag">hod</span> command; for e.g. directly using <span class="codefrag">qdel</span>. When the cluster is deallocated in this manner, the HOD processes are terminated using signals. This results in the exit code to be based on the signal number, rather than the exit code of the program.</p>
-<a name="N107D3"></a><a name="The+Hadoop+Logs+are+Not+Uploaded+to+DFS"></a>
+<a name="N107D5"></a><a name="The+Hadoop+Logs+are+Not+Uploaded+to+DFS"></a>
 <h3 class="h4"> The Hadoop Logs are Not Uploaded to DFS </h3>
 <a name="The_Hadoop_Logs_are_Not_Uploaded" id="The_Hadoop_Logs_are_Not_Uploaded"></a>
 <p>
 <em>Possible Cause:</em> There is a version mismatch between the version of the hadoop being used for uploading the logs and the external HDFS. Ensure that the correct version is specified in the <span class="codefrag">hodring.pkgs</span> option.</p>
-<a name="N107E3"></a><a name="Locating+Ringmaster+Logs"></a>
+<a name="N107E5"></a><a name="Locating+Ringmaster+Logs"></a>
 <h3 class="h4"> Locating Ringmaster Logs </h3>
 <a name="Locating_Ringmaster_Logs" id="Locating_Ringmaster_Logs"></a>
 <p>To locate the ringmaster logs, follow these steps: </p>
@@ -1176,7 +1177,7 @@ document.write("Last Published: " + document.lastModified);
 <li> If you don't get enough information, you may want to set the ringmaster debug level to 4. This can be done by passing <span class="codefrag">--ringmaster.debug 4</span> to the hod command line.</li>
   
 </ul>
-<a name="N1080F"></a><a name="Locating+Hodring+Logs"></a>
+<a name="N10811"></a><a name="Locating+Hodring+Logs"></a>
 <h3 class="h4"> Locating Hodring Logs </h3>
 <a name="Locating_Hodring_Logs" id="Locating_Hodring_Logs"></a>
 <p>To locate hodring logs, follow the steps below: </p>

File diff suppressed because it is too large
+ 7 - 8
docs/hod_user_guide.pdf


+ 5 - 0
src/contrib/hod/CHANGES.txt

@@ -7,6 +7,11 @@ Release 0.18.1 - Unreleased
     HADOOP-4060. Modified HOD to rotate log files on the client side.
     (Vinod Kumar Vavilapalli via yhemanth)
 
+  IMPROVEMENTS
+
+    HADOOP-4145. Add an accounting plugin (script) for HOD.
+    (Hemanth Yamijala via nigel)
+
   BUG FIXES
 
     HADOOP-4161. Fixed bug in HOD cleanup that had the potential to

+ 36 - 1
src/contrib/hod/hodlib/Hod/hadoop.py

@@ -451,8 +451,43 @@ class hadoopCluster:
       raise Exception("Invalid state: Node pool is not initialized to delete the given job.")
     return ret
          
+  def is_valid_account(self):
+    """Verify if the account being used to submit the job is a valid account.
+       This code looks for a file <install-dir>/bin/verify-account. 
+       If the file is present, it executes the file, passing as argument 
+       the account name. It returns the exit code and output from the 
+       script on non-zero exit code."""
+
+    accountValidationScript = os.path.abspath('./verify-account')
+    if not os.path.exists(accountValidationScript):
+      return (0, None)
+
+    account = self.__nodePool.getAccountString()
+    exitCode = 0
+    errMsg = None
+    try:
+      accountValidationCmd = simpleCommand('Account Validation Command',\
+                                             '%s %s' % (accountValidationScript,
+                                                        account))
+      accountValidationCmd.start()
+      accountValidationCmd.wait()
+      accountValidationCmd.join()
+      exitCode = accountValidationCmd.exit_code()
+      self.__log.debug('account validation script is run %d' \
+                          % exitCode)
+      errMsg = None
+      if exitCode is not 0:
+        errMsg = accountValidationCmd.output()
+    except Exception, e:
+      exitCode = 0
+      self.__log.warn('Error executing account script: %s ' \
+                         'Accounting is disabled.' \
+                          % get_exception_error_string())
+      self.__log.debug(get_exception_string())
+    return (exitCode, errMsg)
+    
   def allocate(self, clusterDir, min, max=None):
-    status = 0  
+    status = 0
     self.__svcrgyClient = self.__get_svcrgy_client()
         
     self.__log.debug("allocate %s %s %s" % (clusterDir, min, max))
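
The control flow of `is_valid_account` in the hunk above can be sketched in isolation. This is a minimal illustration in Python 3, not the HOD code: it swaps HOD's `simpleCommand` helper for the standard `subprocess` module, takes the script path and account as plain arguments (both hypothetical parameters), and compares the exit code with `!=` rather than `is not`.

```python
import os
import subprocess


def is_valid_account(script_path, account):
    """Run a verification script and return (exit_code, output_lines).

    Output is only returned when the script exits with a non-zero code,
    mirroring the behaviour described in the docstring of the diff.
    """
    if not os.path.exists(script_path):
        # No script installed: verification is treated as disabled.
        return (0, None)

    try:
        result = subprocess.run(
            [script_path, account],
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
        )
    except OSError:
        # The script could not be executed at all; like the diff, fall
        # back to "accounting disabled" instead of failing allocation.
        return (0, None)

    if result.returncode != 0:
        return (result.returncode, result.stdout.splitlines())
    return (0, None)
```

A missing script and an unexecutable script both yield `(0, None)`, so under this sketch only an explicit non-zero exit from the script can block an allocation.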

+ 15 - 1
src/contrib/hod/hodlib/Hod/hod.py

@@ -252,7 +252,6 @@ class hodRunner:
     self.__cfg['ringmaster']['max-master-failures'] = \
                               min(maxFailures, maxFailedNodes)
 
-    
   def _op_allocate(self, args):
     operation = "allocate"
     argLength = len(args)
@@ -313,6 +312,21 @@ class hodRunner:
           return
  
       self.__setup_cluster_logger(clusterDir)
+
+      (status, message) = self.__cluster.is_valid_account()
+      if status is not 0:
+        if message:
+          for line in message:
+            self.__log.critical("verify-account output: %s" % line)
+        self.__log.critical("Cluster cannot be allocated because account verification failed. " \
+                              + "verify-account returned exit code: %s." % status)
+        self.__opCode = 4
+        return
+      else:
+        self.__log.debug("verify-account returned zero exit code.")
+        if message:
+          self.__log.debug("verify-account output: %s" % message)
+
       if re.match('\d+-\d+', nodes):
         (min, max) = nodes.split("-")
         min = int(min)
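
The caller-side handling added to `_op_allocate` can be sketched as a standalone function. This is an illustrative Python 3 version under stated assumptions: `handle_verification` and `ACCOUNT_VERIFICATION_FAILED` are hypothetical names, the logger is a standard `logging.Logger` rather than HOD's own logger, and the value 4 is taken from the `self.__opCode = 4` assignment in the hunk above.

```python
import logging

# Hypothetical constant name; the diff assigns the literal 4 to self.__opCode.
ACCOUNT_VERIFICATION_FAILED = 4


def handle_verification(status, message, log):
    """Log the verification result and return an operation code.

    Returns ACCOUNT_VERIFICATION_FAILED when the script exited non-zero,
    and 0 when allocation may proceed.
    """
    if status != 0:
        # Relay every line of script output so the user sees the reason.
        for line in message or []:
            log.critical("verify-account output: %s", line)
        log.critical(
            "Cluster cannot be allocated because account verification "
            "failed. verify-account returned exit code: %s.",
            status,
        )
        return ACCOUNT_VERIFICATION_FAILED

    log.debug("verify-account returned zero exit code.")
    return 0
```

Returning the code instead of setting an attribute keeps the sketch testable on its own; in the diff the same decision is expressed by setting `self.__opCode` and returning early.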

+ 4 - 0
src/contrib/hod/hodlib/Hod/nodePool.py

@@ -116,6 +116,10 @@ class NodePool:
     """Update information about the workers started by this NodePool."""
     raise NotImplementedError
 
+  def getAccountString(self):
+    """Return the account string for this job"""
+    raise NotImplementedError
+
   def getNextNodeSetId(self):
     id = self.nextNodeSetId
     self.nextNodeSetId += 1

+ 6 - 0
src/contrib/hod/hodlib/NodePools/torque.py

@@ -51,6 +51,12 @@ class TorquePool(NodePool):
     self.__torque = torqueInterface(
       self._cfg['resource_manager']['batch-home'], environ, self._log)
 
+  def getAccountString(self):
+    account = ''
+    if self._cfg['resource_manager'].has_key('pbs-account'):
+      account = self._cfg['resource_manager']['pbs-account']
+    return account
+
   def __gen_submit_params(self, nodeSet, walltime = None, qosLevel = None, 
                           account = None):
     argList = []
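
`dict.has_key`, used in `getAccountString` above, exists only in Python 2. As a minimal sketch, assuming the configuration is a plain nested dictionary (HOD's actual configuration object may differ), the same lookup in Python 3 could use `dict.get`; `get_account_string` is a hypothetical free function, not part of HOD.

```python
def get_account_string(cfg):
    """Return the configured PBS account, or '' when none is set.

    Unlike the method in the diff, this sketch also tolerates a missing
    'resource_manager' section.
    """
    return cfg.get("resource_manager", {}).get("pbs-account", "")
```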

+ 31 - 0
src/docs/src/documentation/content/xdocs/hod_admin_guide.xml

@@ -351,6 +351,37 @@ in the HOD Configuration Guide.</p>
         it is better not to run this inside a tight loop without sleeping.</p>
       </section>
     </section>
+
+    <section>
+      <title>verify-account - Script to verify an account under which 
+             jobs are submitted</title>
+      <p>Production systems use accounting packages to charge users for using
+      shared compute resources. HOD supports a parameter 
+      <em>resource_manager.pbs-account</em> to allow users to identify the
+      account under which they would like to submit jobs. It may be necessary
+      to verify that this account is a valid one configured in an accounting
+      system. The <em>hod-install-dir/bin/verify-account</em> script 
+      provides a mechanism to plug in a custom script that can do this
+      verification.</p>
+      
+      <section>
+        <title>Integrating the verify-account script with HOD</title>
+        <p>HOD runs the <em>verify-account</em> script passing in the
+        <em>resource_manager.pbs-account</em> value as argument to the script,
+        before allocating a cluster. Sites can write a script that verifies this 
+        account against their accounting systems. Returning a non-zero exit 
+        code from this script will cause HOD to fail allocation. Also, in
+        case of an error, HOD will print the output of the script to the user.
+        Any descriptive error message can be passed to the user from the
+        script in this manner.</p>
+        <p>The default script that comes with the HOD installation does not
+        do any validation, and returns a zero exit code.</p>
+        <p>If the verify-account script is not found, then HOD will treat
+        verification as disabled, and continue allocation as is.</p>
+      </section>
+    </section>
+
   </section>
+
 </body>
 </document>
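
The documentation above describes the contract for a site-specific `verify-account` script: it receives the account as its argument, signals failure with a non-zero exit code, and its output is shown to the user. A minimal sketch of such a script in Python 3 follows. The allow-list `VALID_ACCOUNTS`, the specific exit codes, and the choice to reject an empty account are all assumptions made for illustration; a real site would query its accounting system instead.

```python
import sys

# Hypothetical allow-list standing in for a real accounting system lookup.
VALID_ACCOUNTS = {"research", "production"}


def verify(account):
    """Return (exit_code, message) for the given account name."""
    if not account:
        # Policy assumption: an empty account is rejected.
        return (1, "No account specified.")
    if account not in VALID_ACCOUNTS:
        return (2, "Account '%s' is not configured in the accounting system."
                % account)
    return (0, "")


def main(argv):
    """Entry point: argv[1] is the account name passed in by HOD."""
    account = argv[1] if len(argv) > 1 else ""
    code, message = verify(account)
    if message:
        # Per the documentation above, HOD relays script output on failure.
        print(message)
    return code
```

To use this as an executable script, a site would end the file with a call such as `sys.exit(main(sys.argv))` so that the return value becomes the process exit code.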

+ 2 - 1
src/docs/src/documentation/content/xdocs/hod_user_guide.xml

@@ -412,7 +412,8 @@
         <td> 5 </td>
         <td> Job execution failure </td>
         <td> 1. Torque Job was deleted from outside. Execute the Torque <code>qstat</code> command to see if you have any jobs in the <code>R</code> (Running) state. If none exist, try re-executing HOD. <br />
-          2. Torque problems such as the server momentarily going down, or becoming unresponsive. Contact system administrator. </td>
+          2. Torque problems such as the server momentarily going down, or becoming unresponsive. Contact system administrator. <br/>
+          3. The system administrator might have configured account verification, and an invalid account is specified. Contact system administrator.</td>
       </tr>
       <tr>
         <td> 6 </td>

Some files were not shown because too many files have changed in this diff