
Merge -r 669788:669789 and 669789:669790 from trunk onto 0.18 branch. Fixes HADOOP-3547.

git-svn-id: https://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.18@669793 13f79535-47bb-0310-9956-ffa450edef68
Devaraj Das, 17 years ago
commit 7bbce3f96b

+ 3 - 0
CHANGES.txt

@@ -298,6 +298,9 @@ Release 0.18.0 - Unreleased
 
     HADOOP-3593. Updates the mapred tutorial. (ddas)
 
+    HADOOP-3547. Documents the way in which native libraries can be distributed
+    via the DistributedCache. (Amareshwari Sriramadasu via ddas)
+
   OPTIMIZATIONS
 
     HADOOP-3274. The default constructor of BytesWritable creates empty 

+ 85 - 69
docs/mapred_tutorial.html

@@ -307,7 +307,7 @@ document.write("Last Published: " + document.lastModified);
 <a href="#Example%3A+WordCount+v2.0">Example: WordCount v2.0</a>
 <ul class="minitoc">
 <li>
-<a href="#Source+Code-N10DA0">Source Code</a>
+<a href="#Source+Code-N10DD5">Source Code</a>
 </li>
 <li>
 <a href="#Sample+Runs">Sample Runs</a>
@@ -1723,25 +1723,20 @@ document.write("Last Published: " + document.lastModified);
         <span class="codefrag">${HADOOP_LOG_DIR}/userlogs</span>
 </p>
 <p>The <a href="#DistributedCache">DistributedCache</a> can also be used
-        as a rudimentary software distribution mechanism for use in the map 
-        and/or reduce tasks. It can be used to distribute both jars and 
-        native libraries. The 
-        <a href="api/org/apache/hadoop/filecache/DistributedCache.html#addArchiveToClassPath(org.apache.hadoop.fs.Path,%20org.apache.hadoop.conf.Configuration)">
-        DistributedCache.addArchiveToClassPath(Path, Configuration)</a> or 
-        <a href="api/org/apache/hadoop/filecache/DistributedCache.html#addFileToClassPath(org.apache.hadoop.fs.Path,%20org.apache.hadoop.conf.Configuration)">
-        DistributedCache.addFileToClassPath(Path, Configuration)</a> api can 
-        be used to cache files/jars and also add them to the <em>classpath</em> 
-        of child-jvm. Similarly the facility provided by the 
-        <span class="codefrag">DistributedCache</span> where-in it symlinks the cached files into
-        the working directory of the task can be used to distribute native 
-        libraries and load them. The underlying detail is that child-jvm always 
-        has its <em>current working directory</em> added to the
+        to distribute both jars and native libraries for use in the map 
+        and/or reduce tasks. The child-jvm always has its 
+        <em>current working directory</em> added to the
         <span class="codefrag">java.library.path</span> and <span class="codefrag">LD_LIBRARY_PATH</span>. 
-        And hence the cached libraries can be 
-        loaded via <a href="http://java.sun.com/j2se/1.5.0/docs/api/java/lang/System.html#loadLibrary(java.lang.String)">
-        System.loadLibrary</a> or <a href="http://java.sun.com/j2se/1.5.0/docs/api/java/lang/System.html#load(java.lang.String)">
-        System.load</a>.</p>
-<a name="N109F3"></a><a name="Job+Submission+and+Monitoring"></a>
+        And hence the cached libraries can be loaded via 
+        <a href="http://java.sun.com/j2se/1.5.0/docs/api/java/lang/System.html#loadLibrary(java.lang.String)">
+        System.loadLibrary</a> or 
+        <a href="http://java.sun.com/j2se/1.5.0/docs/api/java/lang/System.html#load(java.lang.String)">
+        System.load</a>. More details on how to load shared libraries through 
+        distributed cache are documented at 
+        <a href="native_libraries.html#Loading+native+libraries+through+DistributedCache">
+        native_libraries.html</a>
+</p>
+<a name="N109E8"></a><a name="Job+Submission+and+Monitoring"></a>
 <h3 class="h4">Job Submission and Monitoring</h3>
 <p>
 <a href="api/org/apache/hadoop/mapred/JobClient.html">
@@ -1802,7 +1797,7 @@ document.write("Last Published: " + document.lastModified);
 <p>Normally the user creates the application, describes various facets 
         of the job via <span class="codefrag">JobConf</span>, and then uses the 
         <span class="codefrag">JobClient</span> to submit the job and monitor its progress.</p>
-<a name="N10A53"></a><a name="Job+Control"></a>
+<a name="N10A48"></a><a name="Job+Control"></a>
 <h4>Job Control</h4>
 <p>Users may need to chain map-reduce jobs to accomplish complex
           tasks which cannot be done via a single map-reduce job. This is fairly
@@ -1838,7 +1833,7 @@ document.write("Last Published: " + document.lastModified);
             </li>
           
 </ul>
-<a name="N10A7D"></a><a name="Job+Input"></a>
+<a name="N10A72"></a><a name="Job+Input"></a>
 <h3 class="h4">Job Input</h3>
 <p>
 <a href="api/org/apache/hadoop/mapred/InputFormat.html">
@@ -1886,7 +1881,7 @@ document.write("Last Published: " + document.lastModified);
         appropriate <span class="codefrag">CompressionCodec</span>. However, it must be noted that
         compressed files with the above extensions cannot be <em>split</em> and 
         each compressed file is processed in its entirety by a single mapper.</p>
-<a name="N10AE7"></a><a name="InputSplit"></a>
+<a name="N10ADC"></a><a name="InputSplit"></a>
 <h4>InputSplit</h4>
 <p>
 <a href="api/org/apache/hadoop/mapred/InputSplit.html">
@@ -1900,7 +1895,7 @@ document.write("Last Published: " + document.lastModified);
           FileSplit</a> is the default <span class="codefrag">InputSplit</span>. It sets 
           <span class="codefrag">map.input.file</span> to the path of the input file for the
           logical split.</p>
-<a name="N10B0C"></a><a name="RecordReader"></a>
+<a name="N10B01"></a><a name="RecordReader"></a>
 <h4>RecordReader</h4>
 <p>
 <a href="api/org/apache/hadoop/mapred/RecordReader.html">
@@ -1912,7 +1907,7 @@ document.write("Last Published: " + document.lastModified);
           for processing. <span class="codefrag">RecordReader</span> thus assumes the 
           responsibility of processing record boundaries and presents the tasks 
           with keys and values.</p>
-<a name="N10B2F"></a><a name="Job+Output"></a>
+<a name="N10B24"></a><a name="Job+Output"></a>
 <h3 class="h4">Job Output</h3>
 <p>
 <a href="api/org/apache/hadoop/mapred/OutputFormat.html">
@@ -1937,7 +1932,7 @@ document.write("Last Published: " + document.lastModified);
 <p>
 <span class="codefrag">TextOutputFormat</span> is the default 
         <span class="codefrag">OutputFormat</span>.</p>
-<a name="N10B58"></a><a name="Task+Side-Effect+Files"></a>
+<a name="N10B4D"></a><a name="Task+Side-Effect+Files"></a>
 <h4>Task Side-Effect Files</h4>
 <p>In some applications, component tasks need to create and/or write to
           side-files, which differ from the actual job-output files.</p>
@@ -1976,7 +1971,7 @@ document.write("Last Published: " + document.lastModified);
 <p>The entire discussion holds true for maps of jobs with 
            reducer=NONE (i.e. 0 reduces) since output of the map, in that case, 
            goes directly to HDFS.</p>
-<a name="N10BA0"></a><a name="RecordWriter"></a>
+<a name="N10B95"></a><a name="RecordWriter"></a>
 <h4>RecordWriter</h4>
 <p>
 <a href="api/org/apache/hadoop/mapred/RecordWriter.html">
@@ -1984,9 +1979,9 @@ document.write("Last Published: " + document.lastModified);
           pairs to an output file.</p>
 <p>RecordWriter implementations write the job outputs to the 
           <span class="codefrag">FileSystem</span>.</p>
-<a name="N10BB7"></a><a name="Other+Useful+Features"></a>
+<a name="N10BAC"></a><a name="Other+Useful+Features"></a>
 <h3 class="h4">Other Useful Features</h3>
-<a name="N10BBD"></a><a name="Counters"></a>
+<a name="N10BB2"></a><a name="Counters"></a>
 <h4>Counters</h4>
 <p>
 <span class="codefrag">Counters</span> represent global counters, defined either by 
@@ -2003,7 +1998,7 @@ document.write("Last Published: " + document.lastModified);
           in the <span class="codefrag">map</span> and/or 
           <span class="codefrag">reduce</span> methods. These counters are then globally 
           aggregated by the framework.</p>
-<a name="N10BEC"></a><a name="DistributedCache"></a>
+<a name="N10BE1"></a><a name="DistributedCache"></a>
 <h4>DistributedCache</h4>
 <p>
 <a href="api/org/apache/hadoop/filecache/DistributedCache.html">
@@ -2030,14 +2025,51 @@ document.write("Last Published: " + document.lastModified);
 <span class="codefrag">DistributedCache</span> can be used to distribute simple, 
           read-only data/text files and more complex types such as archives and
           jars. Archives (zip, tar, tgz and tar.gz files) are 
-          <em>un-archived</em> at the slave nodes.
-          Optionally users can also direct the <span class="codefrag">DistributedCache</span> to 
-          <em>symlink</em> the cached file(s) into the <span class="codefrag">current working 
+          <em>un-archived</em> at the slave nodes. Files 
+          have <em>execution permissions</em> set. </p>
+<p>The files/archives can be distributed by setting the property
+          <span class="codefrag">mapred.cache.{files|archives}</span>. If more than one 
+          file/archive has to be distributed, they can be added as comma
+          separated paths. The properties can also be set by APIs 
+          <a href="api/org/apache/hadoop/filecache/DistributedCache.html#addCacheFile(java.net.URI,%20org.apache.hadoop.conf.Configuration)">
+          DistributedCache.addCacheFile(URI,conf)</a>/ 
+          <a href="api/org/apache/hadoop/filecache/DistributedCache.html#addCacheArchive(java.net.URI,%20org.apache.hadoop.conf.Configuration)">
+          DistributedCache.addCacheArchive(URI,conf)</a> and
+          <a href="api/org/apache/hadoop/filecache/DistributedCache.html#setCacheFiles(java.net.URI[],%20org.apache.hadoop.conf.Configuration)">
+          DistributedCache.setCacheFiles(URIs,conf)</a>/
+          <a href="api/org/apache/hadoop/filecache/DistributedCache.html#setCacheArchives(java.net.URI[],%20org.apache.hadoop.conf.Configuration)">
+          DistributedCache.setCacheArchives(URIs,conf)</a> 
+          where URI is of the form
+          <span class="codefrag">hdfs://host:port/absolute-path#link-name</span>.
+          In Streaming, the files can be distributed through command line
+          option <span class="codefrag">-cacheFile/-cacheArchive</span>.</p>
+<p>Optionally users can also direct the <span class="codefrag">DistributedCache</span>
+          to <em>symlink</em> the cached file(s) into the <span class="codefrag">current working 
           directory</span> of the task via the 
           <a href="api/org/apache/hadoop/filecache/DistributedCache.html#createSymlink(org.apache.hadoop.conf.Configuration)">
-          DistributedCache.createSymlink(Configuration)</a> api. Files 
-          have <em>execution permissions</em> set.</p>
-<a name="N10C2A"></a><a name="Tool"></a>
+          DistributedCache.createSymlink(Configuration)</a> api. Or by setting
+          the configuration property <span class="codefrag">mapred.create.symlink</span>
+          as <span class="codefrag">yes</span>. The DistributedCache will use the 
+          <span class="codefrag">fragment</span> of the URI as the name of the symlink. 
+          For example, the URI 
+          <span class="codefrag">hdfs://namenode:port/lib.so.1#lib.so</span>
+          will have the symlink name as <span class="codefrag">lib.so</span> in task's cwd
+          for the file <span class="codefrag">lib.so.1</span> in distributed cache.</p>
+<p>The <span class="codefrag">DistributedCache</span> can also be used as a 
+          rudimentary software distribution mechanism for use in the
+          map and/or reduce tasks. It can be used to distribute both
+          jars and native libraries. The 
+          <a href="api/org/apache/hadoop/filecache/DistributedCache.html#addArchiveToClassPath(org.apache.hadoop.fs.Path,%20org.apache.hadoop.conf.Configuration)">
+          DistributedCache.addArchiveToClassPath(Path, Configuration)</a> or 
+          <a href="api/org/apache/hadoop/filecache/DistributedCache.html#addFileToClassPath(org.apache.hadoop.fs.Path,%20org.apache.hadoop.conf.Configuration)">
+          DistributedCache.addFileToClassPath(Path, Configuration)</a> api 
+          can be used to cache files/jars and also add them to the 
+          <em>classpath</em> of child-jvm. The same can be done by setting
+          the configuration properties 
+          <span class="codefrag">mapred.job.classpath.{files|archives}</span>. Similarly the
+          cached files that are symlinked into the working directory of the
+          task can be used to distribute native libraries and load them.</p>
+<a name="N10C64"></a><a name="Tool"></a>
 <h4>Tool</h4>
 <p>The <a href="api/org/apache/hadoop/util/Tool.html">Tool</a> 
           interface supports the handling of generic Hadoop command-line options.
@@ -2077,7 +2109,7 @@ document.write("Last Published: " + document.lastModified);
             </span>
           
 </p>
-<a name="N10C5C"></a><a name="IsolationRunner"></a>
+<a name="N10C96"></a><a name="IsolationRunner"></a>
 <h4>IsolationRunner</h4>
 <p>
 <a href="api/org/apache/hadoop/mapred/IsolationRunner.html">
@@ -2101,7 +2133,7 @@ document.write("Last Published: " + document.lastModified);
 <p>
 <span class="codefrag">IsolationRunner</span> will run the failed task in a single 
           jvm, which can be in the debugger, over precisely the same input.</p>
-<a name="N10C8F"></a><a name="Profiling"></a>
+<a name="N10CC9"></a><a name="Profiling"></a>
 <h4>Profiling</h4>
 <p>Profiling is a utility to get a representative (2 or 3) sample
           of built-in java profiler for a sample of maps and reduces. </p>
@@ -2134,7 +2166,7 @@ document.write("Last Published: " + document.lastModified);
           <span class="codefrag">-agentlib:hprof=cpu=samples,heap=sites,force=n,thread=y,verbose=n,file=%s</span>
           
 </p>
-<a name="N10CC3"></a><a name="Debugging"></a>
+<a name="N10CFD"></a><a name="Debugging"></a>
 <h4>Debugging</h4>
 <p>Map/Reduce framework provides a facility to run user-provided 
           scripts for debugging. When map/reduce task fails, user can run 
@@ -2145,30 +2177,14 @@ document.write("Last Published: " + document.lastModified);
 <p> In the following sections we discuss how to submit debug script
           along with the job. For submitting debug script, first it has to
           distributed. Then the script has to supplied in Configuration. </p>
-<a name="N10CCF"></a><a name="How+to+distribute+script+file%3A"></a>
+<a name="N10D09"></a><a name="How+to+distribute+script+file%3A"></a>
 <h5> How to distribute script file: </h5>
 <p>
-          To distribute  the debug script file, first copy the file to the dfs.
-          The file can be distributed by setting the property 
-          "mapred.cache.files" with value "path"#"script-name". 
-          If more than one file has to be distributed, the files can be added
-          as comma separated paths. This property can also be set by APIs
-          <a href="api/org/apache/hadoop/filecache/DistributedCache.html#addCacheFile(java.net.URI,%20org.apache.hadoop.conf.Configuration)">
-          DistributedCache.addCacheFile(URI,conf) </a> and
-          <a href="api/org/apache/hadoop/filecache/DistributedCache.html#setCacheFiles(java.net.URI[],%20org.apache.hadoop.conf.Configuration)">
-          DistributedCache.setCacheFiles(URIs,conf) </a> where URI is of 
-          the form "hdfs://host:port/'absolutepath'#'script-name'". 
-          For Streaming, the file can be added through 
-          command line option -cacheFile.
-          </p>
-<p>
-          The files has to be symlinked in the current working directory of 
-          of the task. To create symlink for the file, the property 
-          "mapred.create.symlink" is set to "yes". This can also be set by
-          <a href="api/org/apache/hadoop/filecache/DistributedCache.html#createSymlink(org.apache.hadoop.conf.Configuration)">
-          DistributedCache.createSymLink(Configuration) </a> api.
-          </p>
-<a name="N10CE8"></a><a name="How+to+submit+script%3A"></a>
+          The user has to use 
+          <a href="mapred_tutorial.html#DistributedCache">DistributedCache</a>
+          mechanism to <em>distribute</em> and <em>symlink</em> the
+          debug script file.</p>
+<a name="N10D1D"></a><a name="How+to+submit+script%3A"></a>
 <h5> How to submit script: </h5>
 <p> A quick way to submit debug script is to set values for the 
           properties "mapred.map.task.debug.script" and 
@@ -2192,17 +2208,17 @@ document.write("Last Published: " + document.lastModified);
 <span class="codefrag">$script $stdout $stderr $syslog $jobconf $program </span>  
           
 </p>
-<a name="N10D0A"></a><a name="Default+Behavior%3A"></a>
+<a name="N10D3F"></a><a name="Default+Behavior%3A"></a>
 <h5> Default Behavior: </h5>
 <p> For pipes, a default script is run to process core dumps under
           gdb, prints stack trace and gives info about running threads. </p>
-<a name="N10D15"></a><a name="JobControl"></a>
+<a name="N10D4A"></a><a name="JobControl"></a>
 <h4>JobControl</h4>
 <p>
 <a href="api/org/apache/hadoop/mapred/jobcontrol/package-summary.html">
           JobControl</a> is a utility which encapsulates a set of Map-Reduce jobs
           and their dependencies.</p>
-<a name="N10D22"></a><a name="Data+Compression"></a>
+<a name="N10D57"></a><a name="Data+Compression"></a>
 <h4>Data Compression</h4>
 <p>Hadoop Map-Reduce provides facilities for the application-writer to
           specify compression for both intermediate map-outputs and the
@@ -2216,7 +2232,7 @@ document.write("Last Published: " + document.lastModified);
           codecs for reasons of both performance (zlib) and non-availability of
           Java libraries (lzo). More details on their usage and availability are
           available <a href="native_libraries.html">here</a>.</p>
-<a name="N10D42"></a><a name="Intermediate+Outputs"></a>
+<a name="N10D77"></a><a name="Intermediate+Outputs"></a>
 <h5>Intermediate Outputs</h5>
 <p>Applications can control compression of intermediate map-outputs
             via the 
@@ -2225,7 +2241,7 @@ document.write("Last Published: " + document.lastModified);
             <span class="codefrag">CompressionCodec</span> to be used via the
             <a href="api/org/apache/hadoop/mapred/JobConf.html#setMapOutputCompressorClass(java.lang.Class)">
             JobConf.setMapOutputCompressorClass(Class)</a> api.</p>
-<a name="N10D57"></a><a name="Job+Outputs"></a>
+<a name="N10D8C"></a><a name="Job+Outputs"></a>
 <h5>Job Outputs</h5>
 <p>Applications can control compression of job-outputs via the
             <a href="api/org/apache/hadoop/mapred/FileOutputFormat.html#setCompressOutput(org.apache.hadoop.mapred.JobConf,%20boolean)">
@@ -2245,7 +2261,7 @@ document.write("Last Published: " + document.lastModified);
 </div>
 
     
-<a name="N10D86"></a><a name="Example%3A+WordCount+v2.0"></a>
+<a name="N10DBB"></a><a name="Example%3A+WordCount+v2.0"></a>
 <h2 class="h3">Example: WordCount v2.0</h2>
 <div class="section">
 <p>Here is a more complete <span class="codefrag">WordCount</span> which uses many of the
@@ -2255,7 +2271,7 @@ document.write("Last Published: " + document.lastModified);
       <a href="quickstart.html#SingleNodeSetup">pseudo-distributed</a> or
       <a href="quickstart.html#Fully-Distributed+Operation">fully-distributed</a> 
       Hadoop installation.</p>
-<a name="N10DA0"></a><a name="Source+Code-N10DA0"></a>
+<a name="N10DD5"></a><a name="Source+Code-N10DD5"></a>
 <h3 class="h4">Source Code</h3>
 <table class="ForrestTable" cellspacing="1" cellpadding="4">
           
@@ -3465,7 +3481,7 @@ document.write("Last Published: " + document.lastModified);
 </tr>
         
 </table>
-<a name="N11502"></a><a name="Sample+Runs"></a>
+<a name="N11537"></a><a name="Sample+Runs"></a>
 <h3 class="h4">Sample Runs</h3>
 <p>Sample text-files as input:</p>
 <p>
@@ -3633,7 +3649,7 @@ document.write("Last Published: " + document.lastModified);
 <br>
         
 </p>
-<a name="N115D6"></a><a name="Highlights"></a>
+<a name="N1160B"></a><a name="Highlights"></a>
 <h3 class="h4">Highlights</h3>
 <p>The second version of <span class="codefrag">WordCount</span> improves upon the 
         previous one by using some features offered by the Map-Reduce framework:

These changes were not shown because the diff is too large.
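The new tutorial text above pairs the mapred.cache.{files|archives} properties with their DistributedCache API equivalents. Below is a minimal driver-side sketch of those calls, assuming a hypothetical namenode:9000 cluster; the paths and names (lookup.dat, dictionary.zip, helper.jar) are invented for illustration, not taken from this commit:

import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.JobConf;

public class CacheSetup {
  public static void configure(JobConf conf)
      throws IOException, URISyntaxException {
    // Distribute a read-only file; the URI fragment (#lookup.dat)
    // becomes the symlink name in the task's working directory.
    DistributedCache.addCacheFile(
        new URI("hdfs://namenode:9000/data/lookup.dat#lookup.dat"), conf);

    // Archives (zip, tar, tgz, tar.gz) are un-archived on the slaves.
    DistributedCache.addCacheArchive(
        new URI("hdfs://namenode:9000/data/dictionary.zip"), conf);

    // Symlink cached files into each task's cwd; equivalent to setting
    // mapred.create.symlink to yes.
    DistributedCache.createSymlink(conf);

    // Add a jar to the child JVM's classpath; equivalent to setting
    // mapred.job.classpath.files.
    DistributedCache.addFileToClassPath(new Path("/lib/helper.jar"), conf);
  }
}

In Streaming, the same distribution is done with the -cacheFile/-cacheArchive command-line options mentioned in the hunk above.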
+ 2 - 2
docs/mapred_tutorial.pdf


+ 38 - 0
docs/native_libraries.html

@@ -207,6 +207,9 @@ document.write("Last Published: " + document.lastModified);
 </li>
 </ul>
 </li>
+<li>
+<a href="#Loading+native+libraries+through+DistributedCache"> Loading native libraries through DistributedCache </a>
+</li>
 </ul>
 </div>
   
@@ -400,6 +403,41 @@ document.write("Last Published: " + document.lastModified);
         
 </ul>
 </div>
+    
+<a name="N1011E"></a><a name="Loading+native+libraries+through+DistributedCache"></a>
+<h2 class="h3"> Loading native libraries through DistributedCache </h2>
+<div class="section">
+<p>User can load native shared libraries through  
+      <a href="mapred_tutorial.html#DistributedCache">DistributedCache</a>
+      for <em>distributing</em> and <em>symlinking</em> the library files</p>
+<p>Here is an example, describing how to distribute the library and
+      load it from map/reduce task. </p>
+<ol>
+      
+<li> First copy the library to the HDFS. <br>
+      
+<span class="codefrag">bin/hadoop fs -copyFromLocal mylib.so.1 /libraries/mylib.so.1</span>
+      
+</li>
+      
+<li> The job launching program should contain the following: <br>
+      
+<span class="codefrag"> DistributedCache.createSymlink(conf); </span> 
+<br>
+      
+<span class="codefrag"> DistributedCache.addCacheFile("hdfs://host:port/libraries/mylib.so.1#mylib.so", conf);
+      </span>
+      
+</li>
+      
+<li> The map/reduce task can contain: <br>
+      
+<span class="codefrag"> System.loadLibrary("mylib.so"); </span>
+      
+</li>
+      
+</ol>
+</div>
   
 </div>
 <!--+

These changes were not shown because the diff is too large.
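The three-step example added to native_libraries.html above distributes mylib.so.1 under the symlink name mylib.so and loads it from the task. A hedged task-side sketch of that pattern follows; the mapper class and key/value types are invented for illustration. It loads via System.load with the absolute path of the symlink, since System.loadLibrary expects a bare library name (loadLibrary("mylib") resolves to libmylib.so on Linux):

import java.io.File;
import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class NativeLibMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, Text> {

  static {
    // The symlink named by the URI fragment (#mylib.so) is created in
    // the task's cwd, which is already on java.library.path and
    // LD_LIBRARY_PATH when the child JVM starts.
    System.load(new File("mylib.so").getAbsolutePath());
  }

  public void map(LongWritable key, Text value,
      OutputCollector<Text, Text> output, Reporter reporter)
      throws IOException {
    // ... call into the native code via JNI ...
  }
}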
+ 13 - 2
docs/native_libraries.pdf


+ 61 - 43
src/docs/src/documentation/content/xdocs/mapred_tutorial.xml

@@ -1182,24 +1182,18 @@
         <code>${HADOOP_LOG_DIR}/userlogs</code></p>
         
         <p>The <a href="#DistributedCache">DistributedCache</a> can also be used
-        as a rudimentary software distribution mechanism for use in the map 
-        and/or reduce tasks. It can be used to distribute both jars and 
-        native libraries. The 
-        <a href="ext:api/org/apache/hadoop/filecache/distributedcache/addarchivetoclasspath">
-        DistributedCache.addArchiveToClassPath(Path, Configuration)</a> or 
-        <a href="ext:api/org/apache/hadoop/filecache/distributedcache/addfiletoclasspath">
-        DistributedCache.addFileToClassPath(Path, Configuration)</a> api can 
-        be used to cache files/jars and also add them to the <em>classpath</em> 
-        of child-jvm. Similarly the facility provided by the 
-        <code>DistributedCache</code> where-in it symlinks the cached files into
-        the working directory of the task can be used to distribute native 
-        libraries and load them. The underlying detail is that child-jvm always 
-        has its <em>current working directory</em> added to the
+        to distribute both jars and native libraries for use in the map 
+        and/or reduce tasks. The child-jvm always has its 
+        <em>current working directory</em> added to the
         <code>java.library.path</code> and <code>LD_LIBRARY_PATH</code>. 
-        And hence the cached libraries can be 
-        loaded via <a href="http://java.sun.com/j2se/1.5.0/docs/api/java/lang/System.html#loadLibrary(java.lang.String)">
-        System.loadLibrary</a> or <a href="http://java.sun.com/j2se/1.5.0/docs/api/java/lang/System.html#load(java.lang.String)">
-        System.load</a>.</p>
+        And hence the cached libraries can be loaded via 
+        <a href="http://java.sun.com/j2se/1.5.0/docs/api/java/lang/System.html#loadLibrary(java.lang.String)">
+        System.loadLibrary</a> or 
+        <a href="http://java.sun.com/j2se/1.5.0/docs/api/java/lang/System.html#load(java.lang.String)">
+        System.load</a>. More details on how to load shared libraries through 
+        distributed cache are documented at 
+        <a href="native_libraries.html#Loading+native+libraries+through+DistributedCache">
+        native_libraries.html</a></p>
       </section>
       
       <section>
@@ -1507,13 +1501,54 @@
           <p><code>DistributedCache</code> can be used to distribute simple, 
           read-only data/text files and more complex types such as archives and
           jars. Archives (zip, tar, tgz and tar.gz files) are 
-          <em>un-archived</em> at the slave nodes.
-          Optionally users can also direct the <code>DistributedCache</code> to 
-          <em>symlink</em> the cached file(s) into the <code>current working 
+          <em>un-archived</em> at the slave nodes. Files 
+          have <em>execution permissions</em> set. </p>
+          
+          <p>The files/archives can be distributed by setting the property
+          <code>mapred.cache.{files|archives}</code>. If more than one 
+          file/archive has to be distributed, they can be added as comma
+          separated paths. The properties can also be set by APIs 
+          <a href="ext:api/org/apache/hadoop/filecache/distributedcache/addcachefile">
+          DistributedCache.addCacheFile(URI,conf)</a>/ 
+          <a href="ext:api/org/apache/hadoop/filecache/distributedcache/addcachearchive">
+          DistributedCache.addCacheArchive(URI,conf)</a> and
+          <a href="ext:api/org/apache/hadoop/filecache/distributedcache/setcachefiles">
+          DistributedCache.setCacheFiles(URIs,conf)</a>/
+          <a href="ext:api/org/apache/hadoop/filecache/distributedcache/setcachearchives">
+          DistributedCache.setCacheArchives(URIs,conf)</a> 
+          where URI is of the form
+          <code>hdfs://host:port/absolute-path#link-name</code>.
+          In Streaming, the files can be distributed through command line
+          option <code>-cacheFile/-cacheArchive</code>.</p>
+          
+          <p>Optionally users can also direct the <code>DistributedCache</code>
+          to <em>symlink</em> the cached file(s) into the <code>current working 
           directory</code> of the task via the 
           <a href="ext:api/org/apache/hadoop/filecache/distributedcache/createsymlink">
-          DistributedCache.createSymlink(Configuration)</a> api. Files 
-          have <em>execution permissions</em> set.</p>
+          DistributedCache.createSymlink(Configuration)</a> api. Or by setting
+          the configuration property <code>mapred.create.symlink</code>
+          as <code>yes</code>. The DistributedCache will use the 
+          <code>fragment</code> of the URI as the name of the symlink. 
+          For example, the URI 
+          <code>hdfs://namenode:port/lib.so.1#lib.so</code>
+          will have the symlink name as <code>lib.so</code> in task's cwd
+          for the file <code>lib.so.1</code> in distributed cache.</p>
+         
+          <p>The <code>DistributedCache</code> can also be used as a 
+          rudimentary software distribution mechanism for use in the
+          map and/or reduce tasks. It can be used to distribute both
+          jars and native libraries. The 
+          <a href="ext:api/org/apache/hadoop/filecache/distributedcache/addarchivetoclasspath">
+          DistributedCache.addArchiveToClassPath(Path, Configuration)</a> or 
+          <a href="ext:api/org/apache/hadoop/filecache/distributedcache/addfiletoclasspath">
+          DistributedCache.addFileToClassPath(Path, Configuration)</a> api 
+          can be used to cache files/jars and also add them to the 
+          <em>classpath</em> of child-jvm. The same can be done by setting
+          the configuration properties 
+          <code>mapred.job.classpath.{files|archives}</code>. Similarly the
+          cached files that are symlinked into the working directory of the
+          task can be used to distribute native libraries and load them.</p>
+          
         </section>
         
         <section>
@@ -1628,27 +1663,10 @@
           <section>
           <title> How to distribute script file: </title>
           <p>
-          To distribute  the debug script file, first copy the file to the dfs.
-          The file can be distributed by setting the property 
-          "mapred.cache.files" with value "path"#"script-name". 
-          If more than one file has to be distributed, the files can be added
-          as comma separated paths. This property can also be set by APIs
-          <a href="ext:api/org/apache/hadoop/filecache/distributedcache/addcachefile">
-          DistributedCache.addCacheFile(URI,conf) </a> and
-          <a href="ext:api/org/apache/hadoop/filecache/distributedcache/setcachefiles">
-          DistributedCache.setCacheFiles(URIs,conf) </a> where URI is of 
-          the form "hdfs://host:port/'absolutepath'#'script-name'". 
-          For Streaming, the file can be added through 
-          command line option -cacheFile.
-          </p>
-          
-          <p>
-          The files has to be symlinked in the current working directory of 
-          of the task. To create symlink for the file, the property 
-          "mapred.create.symlink" is set to "yes". This can also be set by
-          <a href="ext:api/org/apache/hadoop/filecache/distributedcache/createsymlink">
-          DistributedCache.createSymLink(Configuration) </a> api.
-          </p>
+          The user has to use 
+          <a href="mapred_tutorial.html#DistributedCache">DistributedCache</a>
+          mechanism to <em>distribute</em> and <em>symlink</em> the
+          debug script file.</p>
           </section>
           <section>
           <title> How to submit script: </title>
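The debug-script hunk above drops the inline distribution steps in favor of a pointer to the DistributedCache section; the submission side still uses the mapred.map.task.debug.script and mapred.reduce.task.debug.script properties shown earlier in this diff. A hedged sketch of the combined setup, with an invented script name (debug.sh) and a placeholder HDFS location:

import org.apache.hadoop.mapred.JobConf;

public class DebugScriptSetup {
  public static void configure(JobConf conf) {
    // Ship the script (copied to HDFS beforehand) and symlink it into
    // the task's cwd under the fragment name debug.sh.
    conf.set("mapred.cache.files",
        "hdfs://namenode:9000/scripts/debug.sh#debug.sh");
    conf.set("mapred.create.symlink", "yes");

    // Run the script when a task fails; $stdout, $stderr, $syslog and
    // $jobconf are expanded by the framework as described above.
    conf.set("mapred.map.task.debug.script",
        "./debug.sh $stdout $stderr $syslog $jobconf");
    conf.set("mapred.reduce.task.debug.script",
        "./debug.sh $stdout $stderr $syslog $jobconf");
  }
}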

+ 22 - 0
src/docs/src/documentation/content/xdocs/native_libraries.xml

@@ -185,6 +185,28 @@
         </ul>
       </section>
     </section>
+    <section>
+      <title> Loading native libraries through DistributedCache </title>
+      <p>User can load native shared libraries through  
+      <a href="mapred_tutorial.html#DistributedCache">DistributedCache</a>
+      for <em>distributing</em> and <em>symlinking</em> the library files</p>
+      
+      <p>Here is an example, describing how to distribute the library and
+      load it from map/reduce task. </p>
+      <ol>
+      <li> First copy the library to the HDFS. <br/>
+      <code>bin/hadoop fs -copyFromLocal mylib.so.1 /libraries/mylib.so.1</code>
+      </li>
+      <li> The job launching program should contain the following: <br/>
+      <code> DistributedCache.createSymlink(conf); </code> <br/>
+      <code> DistributedCache.addCacheFile("hdfs://host:port/libraries/mylib.so.1#mylib.so", conf);
+      </code>
+      </li>
+      <li> The map/reduce task can contain: <br/>
+      <code> System.loadLibrary("mylib.so"); </code>
+      </li>
+      </ol>
+    </section>
   </body>
   
 </document>

+ 2 - 0
src/docs/src/documentation/content/xdocs/site.xml

@@ -104,7 +104,9 @@ See http://forrest.apache.org/docs/linking.html for more info.
                 <addarchivetoclasspath href="#addArchiveToClassPath(org.apache.hadoop.fs.Path,%20org.apache.hadoop.conf.Configuration)" />
                 <addfiletoclasspath href="#addFileToClassPath(org.apache.hadoop.fs.Path,%20org.apache.hadoop.conf.Configuration)" />
                 <addcachefile href="#addCacheFile(java.net.URI,%20org.apache.hadoop.conf.Configuration)" />
+                <addcachearchive href="#addCacheArchive(java.net.URI,%20org.apache.hadoop.conf.Configuration)" />
                 <setcachefiles href="#setCacheFiles(java.net.URI[],%20org.apache.hadoop.conf.Configuration)" />
+                <setcachearchives href="#setCacheArchives(java.net.URI[],%20org.apache.hadoop.conf.Configuration)" />
                 <createsymlink href="#createSymlink(org.apache.hadoop.conf.Configuration)" />
               </distributedcache>  
             </filecache>

Some files were not shown because too many files changed in this diff.