Browse Source

HADOOP-1660. Add the cwd of the map/reduce task to the java.library.path of the child-jvm to support loading of native libraries distributed via the DistributedCache.

git-svn-id: https://svn.apache.org/repos/asf/lucene/hadoop/trunk@610135 13f79535-47bb-0310-9956-ffa450edef68
Arun Murthy 17 years ago
commit
54234bdf57

+ 4 - 0
CHANGES.txt

@@ -163,6 +163,10 @@ Trunk (unreleased changes)
     HADOOP-2390. Added documentation for user-controls for intermediate
     map-outputs & final job-outputs and native-hadoop libraries. (acmurthy) 
  
+    HADOOP-1660. Add the cwd of the map/reduce task to the java.library.path
+    of the child-jvm to support loading of native libraries distributed via
+    the DistributedCache. (acmurthy)
+ 
   OPTIMIZATIONS
 
     HADOOP-1898.  Release the lock protecting the last time of the last stack

+ 95 - 31
docs/mapred_tutorial.html

@@ -216,6 +216,9 @@ document.write("Last Published: " + document.lastModified);
 <a href="#Job+Configuration">Job Configuration</a>
 </li>
 <li>
+<a href="#Task+Execution+%26+Environment">Task Execution &amp; Environment</a>
+</li>
+<li>
 <a href="#Job+Submission+and+Monitoring">Job Submission and Monitoring</a>
 <ul class="minitoc">
 <li>
@@ -274,7 +277,7 @@ document.write("Last Published: " + document.lastModified);
 <a href="#Example%3A+WordCount+v2.0">Example: WordCount v2.0</a>
 <ul class="minitoc">
 <li>
-<a href="#Source+Code-N10B1F">Source Code</a>
+<a href="#Source+Code-N10B98">Source Code</a>
 </li>
 <li>
 <a href="#Sample+Runs">Sample Runs</a>
@@ -1460,7 +1463,67 @@ document.write("Last Published: " + document.lastModified);
         <a href="api/org/apache/hadoop/conf/Configuration.html#set(java.lang.String, java.lang.String)">set(String, String)</a>/<a href="api/org/apache/hadoop/conf/Configuration.html#get(java.lang.String, java.lang.String)">get(String, String)</a>
         to set/get arbitrary parameters needed by applications. However, use the 
         <span class="codefrag">DistributedCache</span> for large amounts of (read-only) data.</p>
-<a name="N1082C"></a><a name="Job+Submission+and+Monitoring"></a>
+<a name="N1082C"></a><a name="Task+Execution+%26+Environment"></a>
+<h3 class="h4">Task Execution &amp; Environment</h3>
+<p>The <span class="codefrag">TaskTracker</span> executes the <span class="codefrag">Mapper</span>/ 
+        <span class="codefrag">Reducer</span>  <em>task</em> as a child process in a separate jvm.
+        </p>
+<p>The child-task inherits the environment of the parent 
+        <span class="codefrag">TaskTracker</span>. The user can specify additional options to the
+        child-jvm via the <span class="codefrag">mapred.child.java.opts</span> configuration
+        parameter in the <span class="codefrag">JobConf</span> such as non-standard paths for the 
+        run-time linker to search shared libraries via 
+        <span class="codefrag">-Djava.library.path=&lt;&gt;</span> etc. If the 
+        <span class="codefrag">mapred.child.java.opts</span> contains the symbol <em>@taskid@</em> 
+        it is interpolated with value of <span class="codefrag">taskid</span> of the map/reduce
+        task.</p>
+<p>Here is an example with multiple arguments and substitutions, 
+        showing jvm GC logging, and start of a passwordless JVM JMX agent so that
+        it can connect with jconsole and the likes to watch child memory, 
+        threads and get thread dumps. It also sets the maximum heap-size of the 
+        child jvm to 512MB and adds an additional path to the 
+        <span class="codefrag">java.library.path</span> of the child-jvm.</p>
+<p>
+          
+<span class="codefrag">&lt;property&gt;</span>
+<br>
+          &nbsp;&nbsp;<span class="codefrag">&lt;name&gt;mapred.child.java.opts&lt;/name&gt;</span>
+<br>
+          &nbsp;&nbsp;<span class="codefrag">&lt;value&gt;</span>
+<br>
+          &nbsp;&nbsp;&nbsp;&nbsp;<span class="codefrag">
+                    -Xmx512M -Djava.library.path=/home/mycompany/lib
+                    -verbose:gc -Xloggc:/tmp/@taskid@.gc</span>
+<br>
+          &nbsp;&nbsp;&nbsp;&nbsp;<span class="codefrag">
+                    -Dcom.sun.management.jmxremote.authenticate=false 
+                    -Dcom.sun.management.jmxremote.ssl=false</span>
+<br>
+          &nbsp;&nbsp;<span class="codefrag">&lt;/value&gt;</span>
+<br>
+          
+<span class="codefrag">&lt;/property&gt;</span>
+        
+</p>
+<p>The <a href="#DistributedCache">DistributedCache</a> can also be used
+        as a rudimentary software distribution mechanism for use in the map 
+        and/or reduce tasks. It can be used to distribute both jars and 
+        native libraries. The 
+        <a href="api/org/apache/hadoop/filecache/DistributedCache.html#addArchiveToClassPath(org.apache.hadoop.fs.Path,%20org.apache.hadoop.conf.Configuration)">
+        DistributedCache.addArchiveToClassPath(Path, Configuration)</a> or 
+        <a href="api/org/apache/hadoop/filecache/DistributedCache.html#addFileToClassPath(org.apache.hadoop.fs.Path,%20org.apache.hadoop.conf.Configuration)">
+        DistributedCache.addFileToClassPath(Path, Configuration)</a> api can 
+        be used to cache files/jars and also add them to the <em>classpath</em> 
+        of child-jvm. Similarly the facility provided by the 
+        <span class="codefrag">DistributedCache</span> where-in it symlinks the cached files into
+        the working directory of the task can be used to distribute native 
+        libraries and load them. The underlying detail is that child-jvm always 
+        has its <em>current working directory</em> added to the
+        <span class="codefrag">java.library.path</span> and hence the cached libraries can be 
+        loaded via <a href="http://java.sun.com/j2se/1.5.0/docs/api/java/lang/System.html#loadLibrary(java.lang.String)">
+        System.loadLibrary</a> or <a href="http://java.sun.com/j2se/1.5.0/docs/api/java/lang/System.html#load(java.lang.String)">
+        System.load</a>.</p>
+<a name="N108A1"></a><a name="Job+Submission+and+Monitoring"></a>
 <h3 class="h4">Job Submission and Monitoring</h3>
 <p>
 <a href="api/org/apache/hadoop/mapred/JobClient.html">
@@ -1496,7 +1559,7 @@ document.write("Last Published: " + document.lastModified);
 <p>Normally the user creates the application, describes various facets 
         of the job via <span class="codefrag">JobConf</span>, and then uses the 
         <span class="codefrag">JobClient</span> to submit the job and monitor its progress.</p>
-<a name="N1086A"></a><a name="Job+Control"></a>
+<a name="N108DF"></a><a name="Job+Control"></a>
 <h4>Job Control</h4>
 <p>Users may need to chain map-reduce jobs to accomplish complex
           tasks which cannot be done via a single map-reduce job. This is fairly
@@ -1532,7 +1595,7 @@ document.write("Last Published: " + document.lastModified);
             </li>
           
 </ul>
-<a name="N10894"></a><a name="Job+Input"></a>
+<a name="N10909"></a><a name="Job+Input"></a>
 <h3 class="h4">Job Input</h3>
 <p>
 <a href="api/org/apache/hadoop/mapred/InputFormat.html">
@@ -1580,7 +1643,7 @@ document.write("Last Published: " + document.lastModified);
         appropriate <span class="codefrag">CompressionCodec</span>. However, it must be noted that
         compressed files with the above extensions cannot be <em>split</em> and 
         each compressed file is processed in its entirety by a single mapper.</p>
-<a name="N108FE"></a><a name="InputSplit"></a>
+<a name="N10973"></a><a name="InputSplit"></a>
 <h4>InputSplit</h4>
 <p>
 <a href="api/org/apache/hadoop/mapred/InputSplit.html">
@@ -1594,7 +1657,7 @@ document.write("Last Published: " + document.lastModified);
           FileSplit</a> is the default <span class="codefrag">InputSplit</span>. It sets 
           <span class="codefrag">map.input.file</span> to the path of the input file for the
           logical split.</p>
-<a name="N10923"></a><a name="RecordReader"></a>
+<a name="N10998"></a><a name="RecordReader"></a>
 <h4>RecordReader</h4>
 <p>
 <a href="api/org/apache/hadoop/mapred/RecordReader.html">
@@ -1606,7 +1669,7 @@ document.write("Last Published: " + document.lastModified);
           for processing. <span class="codefrag">RecordReader</span> thus assumes the 
           responsibility of processing record boundaries and presents the tasks 
           with keys and values.</p>
-<a name="N10946"></a><a name="Job+Output"></a>
+<a name="N109BB"></a><a name="Job+Output"></a>
 <h3 class="h4">Job Output</h3>
 <p>
 <a href="api/org/apache/hadoop/mapred/OutputFormat.html">
@@ -1631,7 +1694,7 @@ document.write("Last Published: " + document.lastModified);
 <p>
 <span class="codefrag">TextOutputFormat</span> is the default 
         <span class="codefrag">OutputFormat</span>.</p>
-<a name="N1096F"></a><a name="Task+Side-Effect+Files"></a>
+<a name="N109E4"></a><a name="Task+Side-Effect+Files"></a>
 <h4>Task Side-Effect Files</h4>
 <p>In some applications, component tasks need to create and/or write to
           side-files, which differ from the actual job-output files.</p>
@@ -1657,7 +1720,7 @@ document.write("Last Published: " + document.lastModified);
           JobConf.getOutputPath()</a>, and the framework will promote them 
           similarly for succesful task-attempts, thus eliminating the need to 
           pick unique paths per task-attempt.</p>
-<a name="N109A4"></a><a name="RecordWriter"></a>
+<a name="N10A19"></a><a name="RecordWriter"></a>
 <h4>RecordWriter</h4>
 <p>
 <a href="api/org/apache/hadoop/mapred/RecordWriter.html">
@@ -1665,9 +1728,9 @@ document.write("Last Published: " + document.lastModified);
           pairs to an output file.</p>
 <p>RecordWriter implementations write the job outputs to the 
           <span class="codefrag">FileSystem</span>.</p>
-<a name="N109BB"></a><a name="Other+Useful+Features"></a>
+<a name="N10A30"></a><a name="Other+Useful+Features"></a>
 <h3 class="h4">Other Useful Features</h3>
-<a name="N109C1"></a><a name="Counters"></a>
+<a name="N10A36"></a><a name="Counters"></a>
 <h4>Counters</h4>
 <p>
 <span class="codefrag">Counters</span> represent global counters, defined either by 
@@ -1681,7 +1744,7 @@ document.write("Last Published: " + document.lastModified);
           Reporter.incrCounter(Enum, long)</a> in the <span class="codefrag">map</span> and/or 
           <span class="codefrag">reduce</span> methods. These counters are then globally 
           aggregated by the framework.</p>
-<a name="N109EC"></a><a name="DistributedCache"></a>
+<a name="N10A61"></a><a name="DistributedCache"></a>
 <h4>DistributedCache</h4>
 <p>
 <a href="api/org/apache/hadoop/filecache/DistributedCache.html">
@@ -1701,19 +1764,20 @@ document.write("Last Published: " + document.lastModified);
           per job and the ability to cache archives which are un-archived on 
           the slaves.</p>
 <p>
-<span class="codefrag">DistributedCache</span> can be used to distribute simple, 
-          read-only data/text files and more complex types such as archives and
-          jars. Archives (zip files) are <em>un-archived</em> at the slave nodes.
-          Jars maybe be optionally added to the classpath of the tasks, a
-          rudimentary <em>software distribution</em> mechanism.  Files have 
-          <em>execution permissions</em> set. Optionally users can also direct the
-          <span class="codefrag">DistributedCache</span> to <em>symlink</em> the cached file(s) 
-          into the working directory of the task.</p>
-<p>
 <span class="codefrag">DistributedCache</span> tracks the modification timestamps of 
           the cached files. Clearly the cache files should not be modified by 
           the application or externally while the job is executing.</p>
-<a name="N10A26"></a><a name="Tool"></a>
+<p>
+<span class="codefrag">DistributedCache</span> can be used to distribute simple, 
+          read-only data/text files and more complex types such as archives and
+          jars. Archives (zip files) are <em>un-archived</em> at the slave nodes.
+          Optionally users can also direct the <span class="codefrag">DistributedCache</span> to 
+          <em>symlink</em> the cached file(s) into the <span class="codefrag">current working 
+          directory</span> of the task via the 
+          <a href="api/org/apache/hadoop/filecache/DistributedCache.html#createSymlink(org.apache.hadoop.conf.Configuration)">
+          DistributedCache.createSymlink(Path, Configuration)</a> api. Files 
+          have <em>execution permissions</em> set.</p>
+<a name="N10A9F"></a><a name="Tool"></a>
 <h4>Tool</h4>
 <p>The <a href="api/org/apache/hadoop/util/Tool.html">Tool</a> 
           interface supports the handling of generic Hadoop command-line options.
@@ -1753,7 +1817,7 @@ document.write("Last Published: " + document.lastModified);
             </span>
           
 </p>
-<a name="N10A58"></a><a name="IsolationRunner"></a>
+<a name="N10AD1"></a><a name="IsolationRunner"></a>
 <h4>IsolationRunner</h4>
 <p>
 <a href="api/org/apache/hadoop/mapred/IsolationRunner.html">
@@ -1777,13 +1841,13 @@ document.write("Last Published: " + document.lastModified);
 <p>
 <span class="codefrag">IsolationRunner</span> will run the failed task in a single 
           jvm, which can be in the debugger, over precisely the same input.</p>
-<a name="N10A8B"></a><a name="JobControl"></a>
+<a name="N10B04"></a><a name="JobControl"></a>
 <h4>JobControl</h4>
 <p>
 <a href="api/org/apache/hadoop/mapred/jobcontrol/package-summary.html">
           JobControl</a> is a utility which encapsulates a set of Map-Reduce jobs
           and their dependencies.</p>
-<a name="N10A98"></a><a name="Data+Compression"></a>
+<a name="N10B11"></a><a name="Data+Compression"></a>
 <h4>Data Compression</h4>
 <p>Hadoop Map-Reduce provides facilities for the application-writer to
           specify compression for both intermediate map-outputs and the
@@ -1797,7 +1861,7 @@ document.write("Last Published: " + document.lastModified);
           codecs for reasons of both performance (zlib) and non-availability of
           Java libraries (lzo). More details on their usage and availability are
           available <a href="native_libraries.html">here</a>.</p>
-<a name="N10AB8"></a><a name="Intermediate+Outputs"></a>
+<a name="N10B31"></a><a name="Intermediate+Outputs"></a>
 <h5>Intermediate Outputs</h5>
 <p>Applications can control compression of intermediate map-outputs
             via the 
@@ -1818,7 +1882,7 @@ document.write("Last Published: " + document.lastModified);
             <a href="api/org/apache/hadoop/mapred/JobConf.html#setMapOutputCompressionType(org.apache.hadoop.io.SequenceFile.CompressionType)">
             JobConf.setMapOutputCompressionType(SequenceFile.CompressionType)</a> 
             api.</p>
-<a name="N10AE4"></a><a name="Job+Outputs"></a>
+<a name="N10B5D"></a><a name="Job+Outputs"></a>
 <h5>Job Outputs</h5>
 <p>Applications can control compression of job-outputs via the
             <a href="api/org/apache/hadoop/mapred/OutputFormatBase.html#setCompressOutput(org.apache.hadoop.mapred.JobConf,%20boolean)">
@@ -1838,12 +1902,12 @@ document.write("Last Published: " + document.lastModified);
 </div>
 
     
-<a name="N10B13"></a><a name="Example%3A+WordCount+v2.0"></a>
+<a name="N10B8C"></a><a name="Example%3A+WordCount+v2.0"></a>
 <h2 class="h3">Example: WordCount v2.0</h2>
 <div class="section">
 <p>Here is a more complete <span class="codefrag">WordCount</span> which uses many of the
       features provided by the Map-Reduce framework we discussed so far:</p>
-<a name="N10B1F"></a><a name="Source+Code-N10B1F"></a>
+<a name="N10B98"></a><a name="Source+Code-N10B98"></a>
 <h3 class="h4">Source Code</h3>
 <table class="ForrestTable" cellspacing="1" cellpadding="4">
           
@@ -3021,7 +3085,7 @@ document.write("Last Published: " + document.lastModified);
 </tr>
         
 </table>
-<a name="N11251"></a><a name="Sample+Runs"></a>
+<a name="N112CA"></a><a name="Sample+Runs"></a>
 <h3 class="h4">Sample Runs</h3>
 <p>Sample text-files as input:</p>
 <p>
@@ -3186,7 +3250,7 @@ document.write("Last Published: " + document.lastModified);
 <br>
         
 </p>
-<a name="N11321"></a><a name="Salient+Points"></a>
+<a name="N1139A"></a><a name="Salient+Points"></a>
 <h3 class="h4">Salient Points</h3>
 <p>The second version of <span class="codefrag">WordCount</span> improves upon the 
         previous one by using some features offered by the Map-Reduce framework:

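The new "Task Execution & Environment" section above says that any `@taskid@` token in `mapred.child.java.opts` is interpolated with the task's id before the child jvm is launched. A minimal, self-contained sketch of that substitution (this is not the actual `TaskRunner` code, and the task id shown is hypothetical):

```java
public class TaskOptsInterpolation {
    // Sketch of the @taskid@ substitution the tutorial describes: every
    // occurrence of "@taskid@" in mapred.child.java.opts is replaced with
    // the attempt's task id before the child jvm command line is built.
    static String interpolate(String javaOpts, String taskid) {
        return javaOpts.replace("@taskid@", taskid);
    }

    public static void main(String[] args) {
        String opts = "-Xmx512M -verbose:gc -Xloggc:/tmp/@taskid@.gc";
        String taskid = "task_200712310000_0001_m_000000_0"; // hypothetical id
        System.out.println(interpolate(opts, taskid));
        // → -Xmx512M -verbose:gc -Xloggc:/tmp/task_200712310000_0001_m_000000_0.gc
    }
}
```

With the example configuration shown in the diff, the GC log of each task attempt thus lands in a file named after its task id.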
File diff suppressed because it is too large
+ 21 - 10
docs/mapred_tutorial.pdf


+ 68 - 9
src/docs/src/documentation/content/xdocs/mapred_tutorial.xml

@@ -1002,6 +1002,64 @@
         <code>DistributedCache</code> for large amounts of (read-only) data.</p>
       </section>
 
+      <section>
+        <title>Task Execution &amp; Environment</title>
+
+        <p>The <code>TaskTracker</code> executes the <code>Mapper</code>/ 
+        <code>Reducer</code>  <em>task</em> as a child process in a separate jvm.
+        </p>
+        
+        <p>The child-task inherits the environment of the parent 
+        <code>TaskTracker</code>. The user can specify additional options to the
+        child-jvm via the <code>mapred.child.java.opts</code> configuration
+        parameter in the <code>JobConf</code> such as non-standard paths for the 
+        run-time linker to search shared libraries via 
+        <code>-Djava.library.path=&lt;&gt;</code> etc. If the 
+        <code>mapred.child.java.opts</code> contains the symbol <em>@taskid@</em> 
+        it is interpolated with value of <code>taskid</code> of the map/reduce
+        task.</p>
+        
+        <p>Here is an example with multiple arguments and substitutions, 
+        showing jvm GC logging, and start of a passwordless JVM JMX agent so that
+        it can connect with jconsole and the likes to watch child memory, 
+        threads and get thread dumps. It also sets the maximum heap-size of the 
+        child jvm to 512MB and adds an additional path to the 
+        <code>java.library.path</code> of the child-jvm.</p>
+
+        <p>
+          <code>&lt;property&gt;</code><br/>
+          &nbsp;&nbsp;<code>&lt;name&gt;mapred.child.java.opts&lt;/name&gt;</code><br/>
+          &nbsp;&nbsp;<code>&lt;value&gt;</code><br/>
+          &nbsp;&nbsp;&nbsp;&nbsp;<code>
+                    -Xmx512M -Djava.library.path=/home/mycompany/lib
+                    -verbose:gc -Xloggc:/tmp/@taskid@.gc</code><br/>
+          &nbsp;&nbsp;&nbsp;&nbsp;<code>
+                    -Dcom.sun.management.jmxremote.authenticate=false 
+                    -Dcom.sun.management.jmxremote.ssl=false</code><br/>
+          &nbsp;&nbsp;<code>&lt;/value&gt;</code><br/>
+          <code>&lt;/property&gt;</code>
+        </p>
+        
+        <p>The <a href="#DistributedCache">DistributedCache</a> can also be used
+        as a rudimentary software distribution mechanism for use in the map 
+        and/or reduce tasks. It can be used to distribute both jars and 
+        native libraries. The 
+        <a href="ext:api/org/apache/hadoop/filecache/distributedcache/addarchivetoclasspath">
+        DistributedCache.addArchiveToClassPath(Path, Configuration)</a> or 
+        <a href="ext:api/org/apache/hadoop/filecache/distributedcache/addfiletoclasspath">
+        DistributedCache.addFileToClassPath(Path, Configuration)</a> api can 
+        be used to cache files/jars and also add them to the <em>classpath</em> 
+        of child-jvm. Similarly the facility provided by the 
+        <code>DistributedCache</code> where-in it symlinks the cached files into
+        the working directory of the task can be used to distribute native 
+        libraries and load them. The underlying detail is that child-jvm always 
+        has its <em>current working directory</em> added to the
+        <code>java.library.path</code> and hence the cached libraries can be 
+        loaded via <a href="http://java.sun.com/j2se/1.5.0/docs/api/java/lang/System.html#loadLibrary(java.lang.String)">
+        System.loadLibrary</a> or <a href="http://java.sun.com/j2se/1.5.0/docs/api/java/lang/System.html#load(java.lang.String)">
+        System.load</a>.</p>
+      </section>
+      
       <section>
         <title>Job Submission and Monitoring</title>
         
@@ -1260,19 +1318,20 @@
           efficiency stems from the fact that the files are only copied once 
           per job and the ability to cache archives which are un-archived on 
           the slaves.</p> 
+          
+          <p><code>DistributedCache</code> tracks the modification timestamps of 
+          the cached files. Clearly the cache files should not be modified by 
+          the application or externally while the job is executing.</p>
 
           <p><code>DistributedCache</code> can be used to distribute simple, 
           read-only data/text files and more complex types such as archives and
           jars. Archives (zip files) are <em>un-archived</em> at the slave nodes.
-          Jars maybe be optionally added to the classpath of the tasks, a
-          rudimentary <em>software distribution</em> mechanism.  Files have 
-          <em>execution permissions</em> set. Optionally users can also direct the
-          <code>DistributedCache</code> to <em>symlink</em> the cached file(s) 
-          into the working directory of the task.</p>
- 
-          <p><code>DistributedCache</code> tracks the modification timestamps of 
-          the cached files. Clearly the cache files should not be modified by 
-          the application or externally while the job is executing.</p>
+          Optionally users can also direct the <code>DistributedCache</code> to 
+          <em>symlink</em> the cached file(s) into the <code>current working 
+          directory</code> of the task via the 
+          <a href="ext:api/org/apache/hadoop/filecache/distributedcache/createsymlink">
+          DistributedCache.createSymlink(Path, Configuration)</a> api. Files 
+          have <em>execution permissions</em> set.</p>
         </section>
         
         <section>

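The documentation added above relies on the `DistributedCache` symlink facility: a cached file is linked into the task's current working directory, which this patch also adds to `java.library.path`, so the child jvm can find it by name. A minimal sketch of the symlink step using `java.nio.file` (an illustration only, not the `DistributedCache` implementation; the file name is hypothetical):

```java
import java.nio.file.Files;
import java.nio.file.Path;

public class SymlinkSketch {
    // Sketch of the facility the docs describe: link a cached file into the
    // task's working directory so the child jvm can load it by name, e.g.
    // via System.loadLibrary once the cwd is on java.library.path.
    static Path linkIntoWorkDir(Path cachedFile, Path workDir) throws Exception {
        Path link = workDir.resolve(cachedFile.getFileName());
        Files.createSymbolicLink(link, cachedFile);
        return link;
    }

    public static void main(String[] args) throws Exception {
        Path cacheDir = Files.createTempDirectory("cache"); // stand-in for the local cache
        Path workDir = Files.createTempDirectory("work");   // stand-in for the task cwd
        Path lib = Files.createFile(cacheDir.resolve("libfoo.so")); // hypothetical library
        Path link = linkIntoWorkDir(lib, workDir);
        System.out.println(Files.isSymbolicLink(link));
    }
}
```

Note that `Files.createSymbolicLink` requires a platform and filesystem that permit symlinks.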
+ 5 - 1
src/docs/src/documentation/content/xdocs/site.xml

@@ -61,7 +61,11 @@ See http://forrest.apache.org/docs/linking.html for more info.
               </configuration>
             </conf>
             <filecache href="filecache/">
-              <distributedcache href="DistributedCache.html" />
+              <distributedcache href="DistributedCache.html">
+                <addarchivetoclasspath href="#addArchiveToClassPath(org.apache.hadoop.fs.Path,%20org.apache.hadoop.conf.Configuration)" />
+                <addfiletoclasspath href="#addFileToClassPath(org.apache.hadoop.fs.Path,%20org.apache.hadoop.conf.Configuration)" />
+                <createsymlink href="#createSymlink(org.apache.hadoop.conf.Configuration)" />
+              </distributedcache>  
             </filecache>
             <fs href="fs/">
               <filesystem href="FileSystem.html" />

+ 23 - 11
src/java/org/apache/hadoop/mapred/TaskRunner.java

@@ -293,19 +293,31 @@ abstract class TaskRunner extends Thread {
       javaOpts = replaceAll(javaOpts, "@taskid@", taskid);
       String [] javaOptsSplit = javaOpts.split(" ");
       
-      //Add java.library.path; necessary for native-hadoop libraries
+      // Add java.library.path; necessary for loading native libraries.
+      //
+      // 1. To support native-hadoop library i.e. libhadoop.so, we add the 
+      //    parent processes' java.library.path to the child. 
+      // 2. We also add the 'cwd' of the task to it's java.library.path to help 
+      //    users distribute native libraries via the DistributedCache.
+      // 3. The user can also specify extra paths to be added to the 
+      //    java.library.path via mapred.child.java.opts.
+      //
       String libraryPath = System.getProperty("java.library.path");
-      if (libraryPath != null) {
-        boolean hasLibrary = false;
-        for(int i=0; i<javaOptsSplit.length ;i++) { 
-          if(javaOptsSplit[i].startsWith("-Djava.library.path=")) {
-            javaOptsSplit[i] += sep + libraryPath;
-            hasLibrary = true;
-            break;
-          }
+      if (libraryPath == null) {
+        libraryPath = workDir.getAbsolutePath();
+      } else {
+        libraryPath += sep + workDir;
+      }
+      boolean hasUserLDPath = false;
+      for(int i=0; i<javaOptsSplit.length ;i++) { 
+        if(javaOptsSplit[i].startsWith("-Djava.library.path=")) {
+          javaOptsSplit[i] += sep + libraryPath;
+          hasUserLDPath = true;
+          break;
         }
-        if(!hasLibrary)
-          vargs.add("-Djava.library.path=" + libraryPath);
+      }
+      if(!hasUserLDPath) {
+        vargs.add("-Djava.library.path=" + libraryPath);
       }
       for (int i = 0; i < javaOptsSplit.length; i++) {
         vargs.add(javaOptsSplit[i]);

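The TaskRunner.java hunk above merges the parent's `java.library.path` with the task's cwd, then either extends a user-supplied `-Djava.library.path=` option or appends a fresh one. A self-contained sketch of that merged logic (names mirror the patch, but this is a standalone illustration, not the actual `TaskRunner` class):

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class LibraryPathSketch {
    // Sketch of the logic this patch adds to TaskRunner:
    // 1. start from the parent process's java.library.path (may be null);
    // 2. append the task's working directory, so libraries symlinked there
    //    by the DistributedCache can be loaded;
    // 3. if the user already passed -Djava.library.path= in the child opts,
    //    extend it; otherwise add a fresh -Djava.library.path= argument.
    static List<String> buildOpts(String[] javaOptsSplit, String parentLibPath,
                                  File workDir) {
        String sep = File.pathSeparator;
        String libraryPath = (parentLibPath == null)
            ? workDir.getAbsolutePath()
            : parentLibPath + sep + workDir;
        List<String> vargs = new ArrayList<>();
        boolean hasUserLDPath = false;
        for (int i = 0; i < javaOptsSplit.length; i++) {
            if (javaOptsSplit[i].startsWith("-Djava.library.path=")) {
                javaOptsSplit[i] += sep + libraryPath;
                hasUserLDPath = true;
                break;
            }
        }
        if (!hasUserLDPath) {
            vargs.add("-Djava.library.path=" + libraryPath);
        }
        for (String opt : javaOptsSplit) {
            vargs.add(opt);
        }
        return vargs;
    }
}
```

Either way, the child jvm ends up with exactly one `-Djava.library.path=` argument that contains the task's cwd, which is what lets `System.loadLibrary` find cached native libraries.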
Some files were not shown because too many files changed in this diff