17 years ago · cf7a4cb470
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -282,6 +282,9 @@ Release 0.18.0 - Unreleased
 
				     HADOOP-3379. Documents stream.non.zero.exit.status.is.failure for Streaming.
			
 
				     (Amareshwari Sriramadasu via ddas)
			
 
				 
			
 
				+    HADOOP-3096. Improves documentation about the Task Execution Environment in 
			
 
				+    the Map-Reduce tutorial. (Amareshwari Sriramadasu via ddas)
			
 
				+
			
 
				   OPTIMIZATIONS
			
 
				 
			
 
				     HADOOP-3274. The default constructor of BytesWritable creates empty 
			
--- a/docs/mapred_tutorial.html
+++ b/docs/mapred_tutorial.html
@@ -301,7 +301,7 @@ document.write("Last Published: " + document.lastModified);
 
				 <a href="#Example%3A+WordCount+v2.0">Example: WordCount v2.0</a>
			
 
				 <ul class="minitoc">
			
 
				 <li>
			
 
				-<a href="#Source+Code-N10C87">Source Code</a>
			
 
				+<a href="#Source+Code-N10D77">Source Code</a>
			
 
				 </li>
			
 
				 <li>
			
 
				 <a href="#Sample+Runs">Sample Runs</a>
			
@@ -1542,42 +1542,170 @@ document.write("Last Published: " + document.lastModified);
 
				 </p>
			
 
				 <p>Users/admins can also specify the maximum virtual memory 
			
 
				         of the launched child-task using <span class="codefrag">mapred.child.ulimit</span>.</p>
			
 
				-<p>When the job starts, the localized job directory
			
 
				-        <span class="codefrag"> ${mapred.local.dir}/taskTracker/jobcache/$jobid/</span>
			
 
				-        has the following directories: </p>
			
 
				+<p>The task tracker has a local directory,
			
 
				+        <span class="codefrag"> ${mapred.local.dir}/taskTracker/</span> to create localized
			
 
				+        cache and localized job. It can define multiple local directories 
			
 
				+        (spanning multiple disks) and then each filename is assigned to a
			
 
				+        semi-random local directory. When the job starts, task tracker 
			
 
				+        creates a localized job directory relative to the local directory
			
 
				+        specified in the configuration. Thus the task tracker directory 
			
 
				+        structure looks like the following: </p>
			
 
				 <ul>
			
 
				         
			
 
				-<li> A job-specific shared directory, created at location
			
 
				-        <span class="codefrag">${mapred.local.dir}/taskTracker/jobcache/$jobid/work/ </span>.
			
 
				-        This directory is exposed to the users through 
			
 
				-        <span class="codefrag">job.local.dir </span>. The tasks can use this space as scratch
			
 
				-        space and share files among them. The directory can accessed through 
			
 
				+<li>
			
 
				+<span class="codefrag">${mapred.local.dir}/taskTracker/archive/</span> :
			
 
				+        The distributed cache. This directory holds the localized distributed
			
 
				+        cache. Thus the localized distributed cache is shared among all
			
 
				+        the tasks and jobs </li>
			
 
				+        
			
 
				+<li>
			
 
				+<span class="codefrag">${mapred.local.dir}/taskTracker/jobcache/$jobid/</span> :
			
 
				+        The localized job directory 
			
 
				+        <ul>
			
 
				+        
			
 
				+<li>
			
 
				+<span class="codefrag">${mapred.local.dir}/taskTracker/jobcache/$jobid/work/</span> 
			
 
				+        : The job-specific shared directory. The tasks can use this space as 
			
 
				+        scratch space and share files among them. This directory is exposed
			
 
				+        to the users through the configuration property  
			
 
				+        <span class="codefrag">job.local.dir</span>. The directory can be accessed through 
			
 
				         api <a href="api/org/apache/hadoop/mapred/JobConf.html#getJobLocalDir()">
			
 
				         JobConf.getJobLocalDir()</a>. It is available as System property also.
			
 
				-        So,users can call <span class="codefrag">System.getProperty("job.local.dir")</span>;
			
 
				-        </li>
			
 
				+        So, users (streaming etc.) can call 
			
 
				+        <span class="codefrag">System.getProperty("job.local.dir")</span> to access the 
			
 
				+        directory.</li>
			
 
				         
			
 
				-<li>A jars directory, which has the job jar file and expanded jar </li>
			
 
				+<li>
			
 
				+<span class="codefrag">${mapred.local.dir}/taskTracker/jobcache/$jobid/jars/</span>
			
 
				+        : The jars directory, which has the job jar file and expanded jar.
			
 
				+        The <span class="codefrag">job.jar</span> is the application's jar file that is
			
 
				+        automatically distributed to each machine. It is expanded in jars
			
 
				+        directory before the tasks for the job start. The job.jar location
			
 
				+        is accessible to the application through the api
			
 
				+        <a href="api/org/apache/hadoop/mapred/JobConf.html#getJar()"> 
			
 
				+        JobConf.getJar() </a>. To access the unjarred directory,
			
 
				+        JobConf.getJar().getParent() can be called.</li>
			
 
				         
			
 
				-<li>A job.xml file, the generic job configuration </li>
			
 
				+<li>
			
 
				+<span class="codefrag">${mapred.local.dir}/taskTracker/jobcache/$jobid/job.xml</span>
			
 
				+        : The job.xml file, the generic job configuration, localized for 
			
 
				+        the job. </li>
			
 
				         
			
 
				-<li>Each task has directory <span class="codefrag">task-id</span> which again has the 
			
 
				-        following structure
			
 
				+<li>
			
 
				+<span class="codefrag">${mapred.local.dir}/taskTracker/jobcache/$jobid/$taskid</span>
			
 
				+        : The task directory for each task attempt. Each task directory
			
 
				+        again has the following structure :
			
 
				         <ul>
			
 
				         
			
 
				-<li>A job.xml file, task localized job configuration </li>
			
 
				+<li>
			
 
				+<span class="codefrag">${mapred.local.dir}/taskTracker/jobcache/$jobid/$taskid/job.xml</span>
			
 
				+        : A job.xml file, task localized job configuration. Task localization
			
 
				+        means that properties have been set that are specific to
			
 
				+        this particular task within the job. The properties localized for 
			
 
				+        each task are described below.</li>
			
 
				+        
			
 
				+<li>
			
 
				+<span class="codefrag">${mapred.local.dir}/taskTracker/jobcache/$jobid/$taskid/output</span>
			
 
				+        : A directory for intermediate output files. This contains the
			
 
				+        temporary map reduce data generated by the framework
			
 
				+        such as map output files etc. </li>
			
 
				+        
			
 
				+<li>
			
 
				+<span class="codefrag">${mapred.local.dir}/taskTracker/jobcache/$jobid/$taskid/work</span>
			
 
				+        : The current working directory of the task. </li>
			
 
				+        
			
 
				+<li>
			
 
				+<span class="codefrag">${mapred.local.dir}/taskTracker/jobcache/$jobid/$taskid/work/tmp</span>
			
 
				+        : The temporary directory for the task. 
			
 
				+        (User can specify the property <span class="codefrag">mapred.child.tmp</span> to set
			
 
				+        the value of temporary directory for map and reduce tasks. This 
			
 
				+        defaults to <span class="codefrag">./tmp</span>. If the value is not an absolute path,
			
 
				+        it is prepended with task's working directory. Otherwise, it is
			
 
				+        directly assigned. The directory will be created if it doesn't exist.
			
 
				+        Then, the child java tasks are executed with option
			
 
				+        <span class="codefrag">-Djava.io.tmpdir='the absolute path of the tmp dir'</span>.
			
 
				+        And pipes and streaming are set with the environment variable,
			
 
				+        <span class="codefrag">TMPDIR='the absolute path of the tmp dir'</span>). This 
			
 
				+        directory is created, if <span class="codefrag">mapred.child.tmp</span> has the value
			
 
				+        <span class="codefrag">./tmp</span> 
			
 
				+</li>
			
 
				         
			
 
				-<li>A directory for intermediate output files</li>
			
 
				+</ul>
			
 
				         
			
 
				-<li>The working directory of the task. 
			
 
				-        And work directory has a temporary directory 
			
 
				-        to create temporary files</li>
			
 
				+</li>
			
 
				         
			
 
				 </ul>
			
 
				         
			
 
				 </li>
			
 
				         
			
 
				 </ul>
			
 
				+<p>The following properties are localized in the job configuration 
			
 
				+         for each task's execution: </p>
			
 
				+<table class="ForrestTable" cellspacing="1" cellpadding="4">
			
 
				+          
			
 
				+<tr>
			
 
				+<th colspan="1" rowspan="1">Name</th><th colspan="1" rowspan="1">Type</th><th colspan="1" rowspan="1">Description</th>
			
 
				+</tr>
			
 
				+          
			
 
				+<tr>
			
 
				+<td colspan="1" rowspan="1">mapred.job.id</td><td colspan="1" rowspan="1">String</td><td colspan="1" rowspan="1">The job id</td>
			
 
				+</tr>
			
 
				+          
			
 
				+<tr>
			
 
				+<td colspan="1" rowspan="1">mapred.jar</td><td colspan="1" rowspan="1">String</td>
			
 
				+              <td colspan="1" rowspan="1">job.jar location in job directory</td>
			
 
				+</tr>
			
 
				+          
			
 
				+<tr>
			
 
				+<td colspan="1" rowspan="1">job.local.dir</td><td colspan="1" rowspan="1"> String</td>
			
 
				+              <td colspan="1" rowspan="1"> The job specific shared scratch space</td>
			
 
				+</tr>
			
 
				+          
			
 
				+<tr>
			
 
				+<td colspan="1" rowspan="1">mapred.tip.id</td><td colspan="1" rowspan="1"> String</td>
			
 
				+              <td colspan="1" rowspan="1"> The task id</td>
			
 
				+</tr>
			
 
				+          
			
 
				+<tr>
			
 
				+<td colspan="1" rowspan="1">mapred.task.id</td><td colspan="1" rowspan="1"> String</td>
			
 
				+              <td colspan="1" rowspan="1"> The task attempt id</td>
			
 
				+</tr>
			
 
				+          
			
 
				+<tr>
			
 
				+<td colspan="1" rowspan="1">mapred.task.is.map</td><td colspan="1" rowspan="1"> boolean </td>
			
 
				+              <td colspan="1" rowspan="1">Is this a map task</td>
			
 
				+</tr>
			
 
				+          
			
 
				+<tr>
			
 
				+<td colspan="1" rowspan="1">mapred.task.partition</td><td colspan="1" rowspan="1"> int </td>
			
 
				+              <td colspan="1" rowspan="1">The id of the task within the job</td>
			
 
				+</tr>
			
 
				+          
			
 
				+<tr>
			
 
				+<td colspan="1" rowspan="1">map.input.file</td><td colspan="1" rowspan="1"> String</td>
			
 
				+              <td colspan="1" rowspan="1"> The filename that the map is reading from</td>
			
 
				+</tr>
			
 
				+          
			
 
				+<tr>
			
 
				+<td colspan="1" rowspan="1">map.input.start</td><td colspan="1" rowspan="1"> long</td>
			
 
				+              <td colspan="1" rowspan="1"> The offset of the start of the map input split</td>
			
 
				+</tr>
			
 
				+          
			
 
				+<tr>
			
 
				+<td colspan="1" rowspan="1">map.input.length </td><td colspan="1" rowspan="1">long </td>
			
 
				+              <td colspan="1" rowspan="1">The number of bytes in the map input split</td>
			
 
				+</tr>
			
 
				+          
			
 
				+<tr>
			
 
				+<td colspan="1" rowspan="1">mapred.work.output.dir</td><td colspan="1" rowspan="1"> String </td>
			
 
				+              <td colspan="1" rowspan="1">The task's temporary output directory</td>
			
 
				+</tr>
			
 
				+        
			
 
				+</table>
			
 
				+<p>The standard output (stdout) and error (stderr) streams of the task 
			
 
				+        are read by the TaskTracker and logged to 
			
 
				+        <span class="codefrag">${HADOOP_LOG_DIR}/userlogs</span>
			
 
				+</p>
			
 
				 <p>The <a href="#DistributedCache">DistributedCache</a> can also be used
			
 
				         as a rudimentary software distribution mechanism for use in the map 
			
 
				         and/or reduce tasks. It can be used to distribute both jars and 
			
@@ -1597,7 +1725,7 @@ document.write("Last Published: " + document.lastModified);
 
				         loaded via <a href="http://java.sun.com/j2se/1.5.0/docs/api/java/lang/System.html#loadLibrary(java.lang.String)">
			
 
				         System.loadLibrary</a> or <a href="http://java.sun.com/j2se/1.5.0/docs/api/java/lang/System.html#load(java.lang.String)">
			
 
				         System.load</a>.</p>
			
 
				-<a name="N108FB"></a><a name="Job+Submission+and+Monitoring"></a>
			
 
				+<a name="N109EB"></a><a name="Job+Submission+and+Monitoring"></a>
			
 
				 <h3 class="h4">Job Submission and Monitoring</h3>
			
 
				 <p>
			
 
				 <a href="api/org/apache/hadoop/mapred/JobClient.html">
			
@@ -1658,7 +1786,7 @@ document.write("Last Published: " + document.lastModified);
 
				 <p>Normally the user creates the application, describes various facets 
			
 
				         of the job via <span class="codefrag">JobConf</span>, and then uses the 
			
 
				         <span class="codefrag">JobClient</span> to submit the job and monitor its progress.</p>
			
 
				-<a name="N1095B"></a><a name="Job+Control"></a>
			
 
				+<a name="N10A4B"></a><a name="Job+Control"></a>
			
 
				 <h4>Job Control</h4>
			
 
				 <p>Users may need to chain map-reduce jobs to accomplish complex
			
 
				           tasks which cannot be done via a single map-reduce job. This is fairly
			
@@ -1694,7 +1822,7 @@ document.write("Last Published: " + document.lastModified);
 
				             </li>
			
 
				           
			
 
				 </ul>
			
 
				-<a name="N10985"></a><a name="Job+Input"></a>
			
 
				+<a name="N10A75"></a><a name="Job+Input"></a>
			
 
				 <h3 class="h4">Job Input</h3>
			
 
				 <p>
			
 
				 <a href="api/org/apache/hadoop/mapred/InputFormat.html">
			
@@ -1742,7 +1870,7 @@ document.write("Last Published: " + document.lastModified);
 
				         appropriate <span class="codefrag">CompressionCodec</span>. However, it must be noted that
			
 
				         compressed files with the above extensions cannot be <em>split</em> and 
			
 
				         each compressed file is processed in its entirety by a single mapper.</p>
			
 
				-<a name="N109EF"></a><a name="InputSplit"></a>
			
 
				+<a name="N10ADF"></a><a name="InputSplit"></a>
			
 
				 <h4>InputSplit</h4>
			
 
				 <p>
			
 
				 <a href="api/org/apache/hadoop/mapred/InputSplit.html">
			
@@ -1756,7 +1884,7 @@ document.write("Last Published: " + document.lastModified);
 
				           FileSplit</a> is the default <span class="codefrag">InputSplit</span>. It sets 
			
 
				           <span class="codefrag">map.input.file</span> to the path of the input file for the
			
 
				           logical split.</p>
			
 
				-<a name="N10A14"></a><a name="RecordReader"></a>
			
 
				+<a name="N10B04"></a><a name="RecordReader"></a>
			
 
				 <h4>RecordReader</h4>
			
 
				 <p>
			
 
				 <a href="api/org/apache/hadoop/mapred/RecordReader.html">
			
@@ -1768,7 +1896,7 @@ document.write("Last Published: " + document.lastModified);
 
				           for processing. <span class="codefrag">RecordReader</span> thus assumes the 
			
 
				           responsibility of processing record boundaries and presents the tasks 
			
 
				           with keys and values.</p>
			
 
				-<a name="N10A37"></a><a name="Job+Output"></a>
			
 
				+<a name="N10B27"></a><a name="Job+Output"></a>
			
 
				 <h3 class="h4">Job Output</h3>
			
 
				 <p>
			
 
				 <a href="api/org/apache/hadoop/mapred/OutputFormat.html">
			
@@ -1793,7 +1921,7 @@ document.write("Last Published: " + document.lastModified);
 
				 <p>
			
 
				 <span class="codefrag">TextOutputFormat</span> is the default 
			
 
				         <span class="codefrag">OutputFormat</span>.</p>
			
 
				-<a name="N10A60"></a><a name="Task+Side-Effect+Files"></a>
			
 
				+<a name="N10B50"></a><a name="Task+Side-Effect+Files"></a>
			
 
				 <h4>Task Side-Effect Files</h4>
			
 
				 <p>In some applications, component tasks need to create and/or write to
			
 
				           side-files, which differ from the actual job-output files.</p>
			
@@ -1832,7 +1960,7 @@ document.write("Last Published: " + document.lastModified);
 
				 <p>The entire discussion holds true for maps of jobs with 
			
 
				            reducer=NONE (i.e. 0 reduces) since output of the map, in that case, 
			
 
				            goes directly to HDFS.</p>
			
 
				-<a name="N10AA8"></a><a name="RecordWriter"></a>
			
 
				+<a name="N10B98"></a><a name="RecordWriter"></a>
			
 
				 <h4>RecordWriter</h4>
			
 
				 <p>
			
 
				 <a href="api/org/apache/hadoop/mapred/RecordWriter.html">
			
@@ -1840,9 +1968,9 @@ document.write("Last Published: " + document.lastModified);
 
				           pairs to an output file.</p>
			
 
				 <p>RecordWriter implementations write the job outputs to the 
			
 
				           <span class="codefrag">FileSystem</span>.</p>
			
 
				-<a name="N10ABF"></a><a name="Other+Useful+Features"></a>
			
 
				+<a name="N10BAF"></a><a name="Other+Useful+Features"></a>
			
 
				 <h3 class="h4">Other Useful Features</h3>
			
 
				-<a name="N10AC5"></a><a name="Counters"></a>
			
 
				+<a name="N10BB5"></a><a name="Counters"></a>
			
 
				 <h4>Counters</h4>
			
 
				 <p>
			
 
				 <span class="codefrag">Counters</span> represent global counters, defined either by 
			
@@ -1856,7 +1984,7 @@ document.write("Last Published: " + document.lastModified);
 
				           Reporter.incrCounter(Enum, long)</a> in the <span class="codefrag">map</span> and/or 
			
 
				           <span class="codefrag">reduce</span> methods. These counters are then globally 
			
 
				           aggregated by the framework.</p>
			
 
				-<a name="N10AF0"></a><a name="DistributedCache"></a>
			
 
				+<a name="N10BE0"></a><a name="DistributedCache"></a>
			
 
				 <h4>DistributedCache</h4>
			
 
				 <p>
			
 
				 <a href="api/org/apache/hadoop/filecache/DistributedCache.html">
			
@@ -1890,7 +2018,7 @@ document.write("Last Published: " + document.lastModified);
 
				           <a href="api/org/apache/hadoop/filecache/DistributedCache.html#createSymlink(org.apache.hadoop.conf.Configuration)">
			
 
				           DistributedCache.createSymlink(Configuration)</a> api. Files 
			
 
				           have <em>execution permissions</em> set.</p>
			
 
				-<a name="N10B2E"></a><a name="Tool"></a>
			
 
				+<a name="N10C1E"></a><a name="Tool"></a>
			
 
				 <h4>Tool</h4>
			
 
				 <p>The <a href="api/org/apache/hadoop/util/Tool.html">Tool</a> 
			
 
				           interface supports the handling of generic Hadoop command-line options.
			
@@ -1930,7 +2058,7 @@ document.write("Last Published: " + document.lastModified);
 
				             </span>
			
 
				           
			
 
				 </p>
			
 
				-<a name="N10B60"></a><a name="IsolationRunner"></a>
			
 
				+<a name="N10C50"></a><a name="IsolationRunner"></a>
			
 
				 <h4>IsolationRunner</h4>
			
 
				 <p>
			
 
				 <a href="api/org/apache/hadoop/mapred/IsolationRunner.html">
			
@@ -1954,7 +2082,7 @@ document.write("Last Published: " + document.lastModified);
 
				 <p>
			
 
				 <span class="codefrag">IsolationRunner</span> will run the failed task in a single 
			
 
				           jvm, which can be in the debugger, over precisely the same input.</p>
			
 
				-<a name="N10B93"></a><a name="Debugging"></a>
			
 
				+<a name="N10C83"></a><a name="Debugging"></a>
			
 
				 <h4>Debugging</h4>
			
 
				 <p>Map/Reduce framework provides a facility to run user-provided 
			
 
				           scripts for debugging. When map/reduce task fails, user can run 
			
@@ -1965,7 +2093,7 @@ document.write("Last Published: " + document.lastModified);
 
				 <p> In the following sections we discuss how to submit debug script
			
 
				           along with the job. For submitting debug script, first it has to
			
 
				           distributed. Then the script has to supplied in Configuration. </p>
			
 
				-<a name="N10B9F"></a><a name="How+to+distribute+script+file%3A"></a>
			
 
				+<a name="N10C8F"></a><a name="How+to+distribute+script+file%3A"></a>
			
 
				 <h5> How to distribute script file: </h5>
			
 
				 <p>
			
 
				           To distribute  the debug script file, first copy the file to the dfs.
			
@@ -1988,7 +2116,7 @@ document.write("Last Published: " + document.lastModified);
 
				           <a href="api/org/apache/hadoop/filecache/DistributedCache.html#createSymlink(org.apache.hadoop.conf.Configuration)">
			
 
				           DistributedCache.createSymLink(Configuration) </a> api.
			
 
				           </p>
			
 
				-<a name="N10BB8"></a><a name="How+to+submit+script%3A"></a>
			
 
				+<a name="N10CA8"></a><a name="How+to+submit+script%3A"></a>
			
 
				 <h5> How to submit script: </h5>
			
 
				 <p> A quick way to submit debug script is to set values for the 
			
 
				           properties "mapred.map.task.debug.script" and 
			
@@ -2012,17 +2140,17 @@ document.write("Last Published: " + document.lastModified);
 
				 <span class="codefrag">$script $stdout $stderr $syslog $jobconf $program </span>  
			
 
				           
			
 
				 </p>
			
 
				-<a name="N10BDA"></a><a name="Default+Behavior%3A"></a>
			
 
				+<a name="N10CCA"></a><a name="Default+Behavior%3A"></a>
			
 
				 <h5> Default Behavior: </h5>
			
 
				 <p> For pipes, a default script is run to process core dumps under
			
 
				           gdb, prints stack trace and gives info about running threads. </p>
			
 
				-<a name="N10BE5"></a><a name="JobControl"></a>
			
 
				+<a name="N10CD5"></a><a name="JobControl"></a>
			
 
				 <h4>JobControl</h4>
			
 
				 <p>
			
 
				 <a href="api/org/apache/hadoop/mapred/jobcontrol/package-summary.html">
			
 
				           JobControl</a> is a utility which encapsulates a set of Map-Reduce jobs
			
 
				           and their dependencies.</p>
			
 
				-<a name="N10BF2"></a><a name="Data+Compression"></a>
			
 
				+<a name="N10CE2"></a><a name="Data+Compression"></a>
			
 
				 <h4>Data Compression</h4>
			
 
				 <p>Hadoop Map-Reduce provides facilities for the application-writer to
			
 
				           specify compression for both intermediate map-outputs and the
			
@@ -2036,7 +2164,7 @@ document.write("Last Published: " + document.lastModified);
 
				           codecs for reasons of both performance (zlib) and non-availability of
			
 
				           Java libraries (lzo). More details on their usage and availability are
			
 
				           available <a href="native_libraries.html">here</a>.</p>
			
 
				-<a name="N10C12"></a><a name="Intermediate+Outputs"></a>
			
 
				+<a name="N10D02"></a><a name="Intermediate+Outputs"></a>
			
 
				 <h5>Intermediate Outputs</h5>
			
 
				 <p>Applications can control compression of intermediate map-outputs
			
 
				             via the 
			
@@ -2057,7 +2185,7 @@ document.write("Last Published: " + document.lastModified);
 
				             <a href="api/org/apache/hadoop/mapred/JobConf.html#setMapOutputCompressionType(org.apache.hadoop.io.SequenceFile.CompressionType)">
			
 
				             JobConf.setMapOutputCompressionType(SequenceFile.CompressionType)</a> 
			
 
				             api.</p>
			
 
				-<a name="N10C3E"></a><a name="Job+Outputs"></a>
			
 
				+<a name="N10D2E"></a><a name="Job+Outputs"></a>
			
 
				 <h5>Job Outputs</h5>
			
 
				 <p>Applications can control compression of job-outputs via the
			
 
				             <a href="api/org/apache/hadoop/mapred/OutputFormatBase.html#setCompressOutput(org.apache.hadoop.mapred.JobConf,%20boolean)">
			
@@ -2077,7 +2205,7 @@ document.write("Last Published: " + document.lastModified);
 
				 </div>
			
 
				 
			
 
				     
			
 
				-<a name="N10C6D"></a><a name="Example%3A+WordCount+v2.0"></a>
			
 
				+<a name="N10D5D"></a><a name="Example%3A+WordCount+v2.0"></a>
			
 
				 <h2 class="h3">Example: WordCount v2.0</h2>
			
 
				 <div class="section">
			
 
				 <p>Here is a more complete <span class="codefrag">WordCount</span> which uses many of the
			
@@ -2087,7 +2215,7 @@ document.write("Last Published: " + document.lastModified);
 
				       <a href="quickstart.html#SingleNodeSetup">pseudo-distributed</a> or
			
 
				       <a href="quickstart.html#Fully-Distributed+Operation">fully-distributed</a> 
			
 
				       Hadoop installation.</p>
			
 
				-<a name="N10C87"></a><a name="Source+Code-N10C87"></a>
			
 
				+<a name="N10D77"></a><a name="Source+Code-N10D77"></a>
			
 
				 <h3 class="h4">Source Code</h3>
			
 
				 <table class="ForrestTable" cellspacing="1" cellpadding="4">
			
 
				           
			
@@ -3297,7 +3425,7 @@ document.write("Last Published: " + document.lastModified);
 
				 </tr>
			
 
				         
			
 
				 </table>
			
 
				-<a name="N113E9"></a><a name="Sample+Runs"></a>
			
 
				+<a name="N114D9"></a><a name="Sample+Runs"></a>
			
 
				 <h3 class="h4">Sample Runs</h3>
			
 
				 <p>Sample text-files as input:</p>
			
 
				 <p>
			
@@ -3465,7 +3593,7 @@ document.write("Last Published: " + document.lastModified);
 
				 <br>
			
 
				         
			
 
				 </p>
			
 
				-<a name="N114BD"></a><a name="Highlights"></a>
			
 
				+<a name="N115AD"></a><a name="Highlights"></a>
			
 
				 <h3 class="h4">Highlights</h3>
			
 
				 <p>The second version of <span class="codefrag">WordCount</span> improves upon the 
			
 
				         previous one by using some features offered by the Map-Reduce framework:
			
--- a/docs/mapred_tutorial.pdf
+++ b/docs/mapred_tutorial.pdf
--- a/src/docs/src/documentation/content/xdocs/mapred_tutorial.xml
+++ b/src/docs/src/documentation/content/xdocs/mapred_tutorial.xml
@@ -1068,33 +1068,109 @@
 
				         <p>Users/admins can also specify the maximum virtual memory 
			
 
				         of the launched child-task using <code>mapred.child.ulimit</code>.</p>
			
 
				         
			
 
				-        <p>When the job starts, the localized job directory
			
 
				-        <code> ${mapred.local.dir}/taskTracker/jobcache/$jobid/</code>
			
 
				-        has the following directories: </p>
			
 
				+        <p>The task tracker has a local directory,
			
 
				+        <code> ${mapred.local.dir}/taskTracker/</code> to create localized
			
 
				+        cache and localized job. It can define multiple local directories 
			
 
				+        (spanning multiple disks) and then each filename is assigned to a
			
 
				+        semi-random local directory. When the job starts, task tracker 
			
 
				+        creates a localized job directory relative to the local directory
			
 
				+        specified in the configuration. Thus the task tracker directory 
			
 
				+        structure looks like the following: </p>         
			
 
				         <ul>
			
 
				-        <li> A job-specific shared directory, created at location
			
 
				-        <code>${mapred.local.dir}/taskTracker/jobcache/$jobid/work/ </code>.
			
 
				-        This directory is exposed to the users through 
			
 
				-        <code>job.local.dir </code>. The tasks can use this space as scratch
			
 
				-        space and share files among them. The directory can accessed through 
			
 
				+        <li><code>${mapred.local.dir}/taskTracker/archive/</code> :
			
 
				+        The distributed cache. This directory holds the localized distributed
			
 
				+        cache. Thus the localized distributed cache is shared among all
			
 
				+        the tasks and jobs </li>
			
 
				+        <li><code>${mapred.local.dir}/taskTracker/jobcache/$jobid/</code> :
			
 
				+        The localized job directory 
			
 
				+        <ul>
			
 
				+        <li><code>${mapred.local.dir}/taskTracker/jobcache/$jobid/work/</code> 
			
 
				+        : The job-specific shared directory. The tasks can use this space as 
			
 
				+        scratch space and share files among them. This directory is exposed
			
 
				+        to the users through the configuration property  
			
 
				+        <code>job.local.dir</code>. The directory can be accessed through 
			
 
				         api <a href="ext:api/org/apache/hadoop/mapred/jobconf/getjoblocaldir">
			
 
				         JobConf.getJobLocalDir()</a>. It is available as System property also.
			
 
				-        So,users can call <code>System.getProperty("job.local.dir")</code>;
			
 
				-        </li>
			
 
				-        <li>A jars directory, which has the job jar file and expanded jar </li>
			
 
				-        <li>A job.xml file, the generic job configuration </li>
			
 
				-        <li>Each task has directory <code>task-id</code> which again has the 
			
 
				-        following structure
			
 
				+        So, users (streaming etc.) can call 
			
 
				+        <code>System.getProperty("job.local.dir")</code> to access the 
			
 
				+        directory.</li>
			
 
				+        <li><code>${mapred.local.dir}/taskTracker/jobcache/$jobid/jars/</code>
			
 
				+        : The jars directory, which has the job jar file and expanded jar.
			
 
				+        The <code>job.jar</code> is the application's jar file that is
			
 
				+        automatically distributed to each machine. It is expanded in jars
			
 
				+        directory before the tasks for the job start. The job.jar location
			
 
				+        is accessible to the application through the api
			
 
				+        <a href="ext:api/org/apache/hadoop/mapred/jobconf/getjar"> 
			
 
				+        JobConf.getJar() </a>. To access the unjarred directory,
			
 
				+        JobConf.getJar().getParent() can be called.</li>
			
 
				+        <li><code>${mapred.local.dir}/taskTracker/jobcache/$jobid/job.xml</code>
			
 
				+        : The job.xml file, the generic job configuration, localized for 
			
 
				+        the job. </li>
			
 
				+        <li><code>${mapred.local.dir}/taskTracker/jobcache/$jobid/$taskid</code>
			
 
				+        : The task directory for each task attempt. Each task directory
			
 
				+        again has the following structure :
			
 
				         <ul>
			
 
				-        <li>A job.xml file, task localized job configuration </li>
			
 
				-        <li>A directory for intermediate output files</li>
			
 
				-        <li>The working directory of the task. 
			
 
				-        And work directory has a temporary directory 
			
 
				-        to create temporary files</li>
			
 
				+        <li><code>${mapred.local.dir}/taskTracker/jobcache/$jobid/$taskid/job.xml</code>
			
 
				+        : A job.xml file, task localized job configuration. Task localization
			
 
				+        means that properties have been set that are specific to
			
 
				+        this particular task within the job. The properties localized for 
			
 
				+        each task are described below.</li>
			
 
				+        <li><code>${mapred.local.dir}/taskTracker/jobcache/$jobid/$taskid/output</code>
			
 
				+        : A directory for intermediate output files. This contains the
			
 
				+        temporary map reduce data generated by the framework
			
 
				+        such as map output files etc. </li>
			
 
				+        <li><code>${mapred.local.dir}/taskTracker/jobcache/$jobid/$taskid/work</code>
			
 
				+        : The current working directory of the task. </li>
			
 
				+        <li><code>${mapred.local.dir}/taskTracker/jobcache/$jobid/$taskid/work/tmp</code>
			
 
				+        : The temporary directory for the task. 
			
 
				+        (User can specify the property <code>mapred.child.tmp</code> to set
			
 
				+        the value of temporary directory for map and reduce tasks. This 
			
 
				+        defaults to <code>./tmp</code>. If the value is not an absolute path,
			
 
				+        it is prepended with task's working directory. Otherwise, it is
			
 
				+        directly assigned. The directory will be created if it doesn't exist.
			
 
				+        Then, the child java tasks are executed with option
			
 
				+        <code>-Djava.io.tmpdir='the absolute path of the tmp dir'</code>.
			
 
				+        And pipes and streaming are set with the environment variable,
			
 
				+        <code>TMPDIR='the absolute path of the tmp dir'</code>). This 
			
 
				+        directory is created, if <code>mapred.child.tmp</code> has the value
			
 
				+        <code>./tmp</code> </li>
			
 
				         </ul>
			
 
				         </li>
			
 
				         </ul>
			
 
				- 
			
 
				+        </li>
			
 
				+        </ul>
			
 
				+
			
 
				+        <p>The following properties are localized in the job configuration 
			
 
				+         for each task's execution: </p>
			
 
				+        <table>
			
 
				+          <tr><th>Name</th><th>Type</th><th>Description</th></tr>
			
 
				+          <tr><td>mapred.job.id</td><td>String</td><td>The job id</td></tr>
			
 
				+          <tr><td>mapred.jar</td><td>String</td>
			
 
				+              <td>job.jar location in job directory</td></tr>
			
 
				+          <tr><td>job.local.dir</td><td> String</td>
			
 
				+              <td> The job specific shared scratch space</td></tr>
			
 
				+          <tr><td>mapred.tip.id</td><td> String</td>
			
 
				+              <td> The task id</td></tr>
			
 
				+          <tr><td>mapred.task.id</td><td> String</td>
			
 
				+              <td> The task attempt id</td></tr>
			
 
				+          <tr><td>mapred.task.is.map</td><td> boolean </td>
			
 
				+              <td>Is this a map task</td></tr>
			
 
				+          <tr><td>mapred.task.partition</td><td> int </td>
			
 
				+              <td>The id of the task within the job</td></tr>
			
 
				+          <tr><td>map.input.file</td><td> String</td>
			
 
				+              <td> The filename that the map is reading from</td></tr>
			
 
				+          <tr><td>map.input.start</td><td> long</td>
			
 
				+              <td> The offset of the start of the map input split</td></tr>
			
 
				+          <tr><td>map.input.length </td><td>long </td>
			
 
				+              <td>The number of bytes in the map input split</td></tr>
			
 
				+          <tr><td>mapred.work.output.dir</td><td> String </td>
			
 
				+              <td>The task's temporary output directory</td></tr>
			
 
				+        </table>
			
 
				+        
			
 
				+        <p>The standard output (stdout) and error (stderr) streams of the task 
			
 
				+        are read by the TaskTracker and logged to 
			
 
				+        <code>${HADOOP_LOG_DIR}/userlogs</code></p>
			
 
				+        
			
 
				         <p>The <a href="#DistributedCache">DistributedCache</a> can also be used
			
 
				         as a rudimentary software distribution mechanism for use in the map 
			
 
				         and/or reduce tasks. It can be used to distribute both jars and 
			
--- a/src/docs/src/documentation/content/xdocs/site.xml
+++ b/src/docs/src/documentation/content/xdocs/site.xml
@@ -167,6 +167,7 @@ See http://forrest.apache.org/docs/linking.html for more info.
 
				                 <setmapoutputcompressiontype href="#setMapOutputCompressionType(org.apache.hadoop.io.SequenceFile.CompressionType)" />
			
 
				                 <setmapoutputcompressorclass href="#setMapOutputCompressorClass(java.lang.Class)" />
			
 
				                 <getjoblocaldir href="#getJobLocalDir()" />
			
 
				+                <getjar href="#getJar()" />
			
 
				               </jobconf>
			
 
				               <jobconfigurable href="JobConfigurable.html">
			
 
				                 <configure href="#configure(org.apache.hadoop.mapred.JobConf)" />