
HADOOP-3106. Adds documentation in forrest for debugging. Contributed by Amareshwari Sriramadasu.

git-svn-id: https://svn.apache.org/repos/asf/hadoop/core/trunk@643793 13f79535-47bb-0310-9956-ffa450edef68
Devaraj Das, 17 years ago · parent commit 455b583dce

+ 3 - 0
CHANGES.txt

@@ -176,6 +176,9 @@ Trunk (unreleased changes)
     HADOOP-3093. Adds Configuration.getStrings(name, default-value) and
     the corresponding setStrings. (Amareshwari Sriramadasu via ddas)
 
+    HADOOP-3106. Adds documentation in forrest for debugging.
+    (Amareshwari Sriramadasu via ddas)
+
   OPTIMIZATIONS
 
     HADOOP-2790.  Fixed inefficient method hasSpeculativeTask by removing

+ 102 - 19
docs/changes.html

@@ -36,7 +36,7 @@
     function collapse() {
       for (var i = 0; i < document.getElementsByTagName("ul").length; i++) {
         var list = document.getElementsByTagName("ul")[i];
-        if (list.id != 'trunk_(unreleased_changes)_' && list.id != 'release_0.16.2_-_unreleased_') {
+        if (list.id != 'trunk_(unreleased_changes)_' && list.id != 'release_0.16.2_-_2008-04-02_') {
           list.style.display = "none";
         }
       }
@@ -56,7 +56,7 @@
 </a></h2>
 <ul id="trunk_(unreleased_changes)_">
   <li><a href="javascript:toggleList('trunk_(unreleased_changes)_._incompatible_changes_')">  INCOMPATIBLE CHANGES
-</a>&nbsp;&nbsp;&nbsp;(11)
+</a>&nbsp;&nbsp;&nbsp;(19)
     <ol id="trunk_(unreleased_changes)_._incompatible_changes_">
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-2786">HADOOP-2786</a>.  Move hbase out of hadoop core
 </li>
@@ -77,13 +77,28 @@ specifies whether a recursive delete is intended.<br />(Mahadev Konar via dhruba
 and isDir(String) from ClientProtocol. ClientProtocol version changed
 from 26 to 27. (Tsz Wo (Nicholas), SZE via cdouglas)
 </li>
-      <li><a href="http://issues.apache.org/jira/browse/HADOOP-2822">HADOOP-2822</a>. Remove depreceted code for classes InputFormatBase and
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-2822">HADOOP-2822</a>. Remove deprecated code for classes InputFormatBase and
 PhasedFileSystem.<br />(Amareshwari Sriramadasu via enis)</li>
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-2116">HADOOP-2116</a>. Changes the layout of the task execution directory.<br />(Amareshwari Sriramadasu via ddas)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-2828">HADOOP-2828</a>. The following deprecated methods in Configuration.java
+have been removed
+    getObject(String name)
+    setObject(String name, Object value)
+    get(String name, Object defaultValue)
+    set(String name, Object value)
+    Iterator entries()<br />(Amareshwari Sriramadasu via ddas)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-2824">HADOOP-2824</a>. Removes one deprecated constructor from MiniMRCluster.<br />(Amareshwari Sriramadasu via ddas)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-2823">HADOOP-2823</a>. Removes deprecated methods getColumn(), getLine() from
+org.apache.hadoop.record.compiler.generated.SimpleCharStream.<br />(Amareshwari Sriramadasu via ddas)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3060">HADOOP-3060</a>. Removes one unused constructor argument from MiniMRCluster.<br />(Amareshwari Sriramadasu via ddas)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-2854">HADOOP-2854</a>. Remove deprecated o.a.h.ipc.Server::getUserInfo().<br />(lohit vijayarenu via cdouglas)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-2563">HADOOP-2563</a>. Remove deprecated FileSystem::listPaths.<br />(lohit vijayarenu via cdouglas)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-2818">HADOOP-2818</a>.  Remove deprecated methods in Counters.<br />(Amareshwari Sriramadasu via tomwhite)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-2831">HADOOP-2831</a>. Remove deprecated o.a.h.dfs.INode::getAbsoluteName()<br />(lohit vijayarenu via cdouglas)</li>
     </ol>
   </li>
   <li><a href="javascript:toggleList('trunk_(unreleased_changes)_._new_features_')">  NEW FEATURES
-</a>&nbsp;&nbsp;&nbsp;(7)
+</a>&nbsp;&nbsp;&nbsp;(9)
     <ol id="trunk_(unreleased_changes)_._new_features_">
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-1398">HADOOP-1398</a>.  Add HBase in-memory block cache.<br />(tomwhite)</li>
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-2178">HADOOP-2178</a>.  Job History on DFS.<br />(Amareshwari Sri Ramadasu via ddas)</li>
@@ -99,10 +114,12 @@ config params to map records to different output files.<br />(Runping Qi via cdo
 DFSClient and DataNode sockets have 10min write timeout.<br />(rangadi)</li>
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-2951">HADOOP-2951</a>.  Add a contrib module that provides a utility to
 build or update Lucene indexes using Map/Reduce.<br />(Ning Li via cutting)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-1622">HADOOP-1622</a>.  Allow multiple jar files for map reduce.<br />(Mahadev Konar via dhruba)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-2055">HADOOP-2055</a>. Allows users to set PathFilter on the FileInputFormat.<br />(Alejandro Abdelnur via ddas)</li>
     </ol>
   </li>
   <li><a href="javascript:toggleList('trunk_(unreleased_changes)_._improvements_')">  IMPROVEMENTS
-</a>&nbsp;&nbsp;&nbsp;(22)
+</a>&nbsp;&nbsp;&nbsp;(26)
     <ol id="trunk_(unreleased_changes)_._improvements_">
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-2655">HADOOP-2655</a>. Copy on write for data and metadata files in the
 presence of snapshots. Needed for supporting appends to HDFS
@@ -114,9 +131,6 @@ methods for fs.default.name, and check for null authority in HDFS.<br />(cutting
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-2895">HADOOP-2895</a>. Let the profiling string be configurable.<br />(Martin Traverso via cdouglas)</li>
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-910">HADOOP-910</a>. Enables Reduces to do merges for the on-disk map output files
 in parallel with their copying.<br />(Amar Kamat via ddas)</li>
-      <li><a href="http://issues.apache.org/jira/browse/HADOOP-2833">HADOOP-2833</a>. Do not use "Dr. Who" as the default user in JobClient.
-A valid user name is required. (Tsz Wo (Nicholas), SZE via rangadi)
-</li>
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-730">HADOOP-730</a>. Use rename rather than copy for local renames.<br />(cdouglas)</li>
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-2810">HADOOP-2810</a>. Updated the Hadoop Core logo.<br />(nigel)</li>
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-2057">HADOOP-2057</a>.  Streaming should optionally treat a non-zero exit status
@@ -138,18 +152,24 @@ second.<br />(lohit vijayarenu via cdouglas)</li>
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-2939">HADOOP-2939</a>. Make the automated patch testing process an executable
 Ant target, test-patch.<br />(nigel)</li>
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-2239">HADOOP-2239</a>. Add HsftpFileSystem to permit transferring files over ssl.<br />(cdouglas)</li>
-      <li><a href="http://issues.apache.org/jira/browse/HADOOP-2910">HADOOP-2910</a>. Throttle IPC Client/Server during bursts of
-requests or server slowdown.<br />(Hairong Kuang via dhruba)</li>
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-2848">HADOOP-2848</a>. [HOD]hod -o list and deallocate works even after deleting
 the cluster directory.<br />(Hemanth Yamijala via ddas)</li>
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-2899">HADOOP-2899</a>. [HOD] Cleans up hdfs:///mapredsystem directory after
 deallocation.<br />(Hemanth Yamijala via ddas)</li>
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-2886">HADOOP-2886</a>.  Track individual RPC metrics.<br />(girish vaitheeswaran via dhruba)</li>
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-2373">HADOOP-2373</a>. Improvement in safe-mode reporting.<br />(shv)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-2796">HADOOP-2796</a>. Enables distinguishing exit codes from user code vis-a-vis
+HOD's exit code.<br />(Hemanth Yamijala via ddas)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3091">HADOOP-3091</a>. Modify FsShell command -put to accept multiple sources.<br />(Lohit Vijaya Renu via cdouglas)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3092">HADOOP-3092</a>. Show counter values from job -status command.<br />(Tom White via ddas)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-1228">HADOOP-1228</a>.  Ant task to generate Eclipse project files.<br />(tomwhite)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3093">HADOOP-3093</a>. Adds Configuration.getStrings(name, default-value) and
+the corresponding setStrings.<br />(Amareshwari Sriramadasu via ddas)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3106">HADOOP-3106</a>. Adds documentation in forrest for debugging.<br />(Amareshwari Sriramadasu via ddas)</li>
     </ol>
   </li>
   <li><a href="javascript:toggleList('trunk_(unreleased_changes)_._optimizations_')">  OPTIMIZATIONS
-</a>&nbsp;&nbsp;&nbsp;(7)
+</a>&nbsp;&nbsp;&nbsp;(10)
     <ol id="trunk_(unreleased_changes)_._optimizations_">
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-2790">HADOOP-2790</a>.  Fixed inefficient method hasSpeculativeTask by removing
 repetitive calls to get the current time and late checking to see if
@@ -168,10 +188,20 @@ each live data-node.<br />(shv)</li>
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-2148">HADOOP-2148</a>. Eliminate redundant data-node blockMap lookups.<br />(shv)</li>
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-2027">HADOOP-2027</a>. Return the number of bytes in each block in a file
 via a single rpc to the namenode to speed up job planning.<br />(Lohit Vijaya Renu via omalley)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-2902">HADOOP-2902</a>.  Replace uses of "fs.default.name" with calls to the
+accessor methods added in <a href="http://issues.apache.org/jira/browse/HADOOP-1967">HADOOP-1967</a>.<br />(cutting)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-2119">HADOOP-2119</a>.  Optimize scheduling of jobs with large numbers of
+tasks by replacing static arrays with lists of runnable tasks.<br />(Amar Kamat via omalley)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-2919">HADOOP-2919</a>.  Reduce the number of memory copies done during the
+map output sorting. Also adds two config variables:
+io.sort.spill.percent - the percentage of io.sort.mb that should
+                        cause a spill (default 80%)
+io.sort.record.percent - the percent of io.sort.mb that should
+                         hold key/value indexes (default 5%)<br />(cdouglas via omalley)</li>
     </ol>
   </li>
   <li><a href="javascript:toggleList('trunk_(unreleased_changes)_._bug_fixes_')">  BUG FIXES
-</a>&nbsp;&nbsp;&nbsp;(53)
+</a>&nbsp;&nbsp;&nbsp;(68)
     <ol id="trunk_(unreleased_changes)_._bug_fixes_">
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-2195">HADOOP-2195</a>. '-mkdir' behaviour is now closer to Linux shell in case of
 errors.<br />(Mahadev Konar via rangadi)</li>
@@ -275,15 +305,37 @@ client side configs.<br />(Vinod Kumar Vavilapalli via ddas)</li>
 the recursive flag.<br />(Mahadev Konar via dhruba)</li>
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-3012">HADOOP-3012</a>. dfs -mv file to user home directory throws exception if
 the user home directory does not exist.<br />(Mahadev Konar via dhruba)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3066">HADOOP-3066</a>. Should not require superuser privilege to query if hdfs is in
+safe mode<br />(jimk)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3040">HADOOP-3040</a>. If the input line starts with the separator char, the key
+is set as empty.<br />(Amareshwari Sriramadasu via ddas)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3080">HADOOP-3080</a>. Removes flush calls from JobHistory.<br />(Amareshwari Sriramadasu via ddas)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3086">HADOOP-3086</a>. Adds the testcase missed during commit of hadoop-3040.<br />(Amareshwari Sriramadasu via ddas)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-2983">HADOOP-2983</a>. [HOD] Fixes the problem - local_fqdn() returns None when
+gethostbyname_ex doesn't return any FQDNs.<br />(Craig Macdonald via ddas)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3046">HADOOP-3046</a>. Fix the raw comparators for Text and BytesWritables
+to use the provided length rather than recompute it.<br />(omalley)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3094">HADOOP-3094</a>. Fix BytesWritable.toString to avoid extending the sign bit<br />(Owen O'Malley via cdouglas)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3067">HADOOP-3067</a>. DFSInputStream's position read does not close the sockets.<br />(rangadi)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3073">HADOOP-3073</a>. close() on SocketInputStream or SocketOutputStream should
+close the underlying channel.<br />(rangadi)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3087">HADOOP-3087</a>. Fixes a problem to do with refreshing of loadHistory.jsp.<br />(Amareshwari Sriramadasu via ddas)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-2982">HADOOP-2982</a>. Fixes a problem in the way HOD looks for free nodes.<br />(Hemanth Yamijala via ddas)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3065">HADOOP-3065</a>. Better logging message if the rack location of a datanode
+cannot be determined.<br />(Devaraj Das via dhruba)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3064">HADOOP-3064</a>. Commas in a file path should not be treated as delimiters.<br />(Hairong Kuang via shv)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-2997">HADOOP-2997</a>. Adds test for non-writable serialier. Also fixes a problem
+introduced by <a href="http://issues.apache.org/jira/browse/HADOOP-2399">HADOOP-2399</a>.<br />(Tom White via ddas)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3114">HADOOP-3114</a>. Fix TestDFSShell on Windows.<br />(Lohit Vijaya Renu via cdouglas)</li>
     </ol>
   </li>
 </ul>
-<h2><a href="javascript:toggleList('release_0.16.2_-_unreleased_')">Release 0.16.2 - Unreleased
+<h2><a href="javascript:toggleList('release_0.16.2_-_2008-04-02_')">Release 0.16.2 - 2008-04-02
 </a></h2>
-<ul id="release_0.16.2_-_unreleased_">
-  <li><a href="javascript:toggleList('release_0.16.2_-_unreleased_._bug_fixes_')">  BUG FIXES
-</a>&nbsp;&nbsp;&nbsp;(6)
-    <ol id="release_0.16.2_-_unreleased_._bug_fixes_">
+<ul id="release_0.16.2_-_2008-04-02_">
+  <li><a href="javascript:toggleList('release_0.16.2_-_2008-04-02_._bug_fixes_')">  BUG FIXES
+</a>&nbsp;&nbsp;&nbsp;(19)
+    <ol id="release_0.16.2_-_2008-04-02_._bug_fixes_">
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-3011">HADOOP-3011</a>. Prohibit distcp from overwriting directories on the
 destination filesystem with files.<br />(cdouglas)</li>
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-3033">HADOOP-3033</a>. The BlockReceiver thread in the datanode writes data to
@@ -297,6 +349,34 @@ FileSystem object is created. (Tsz Wo (Nicholas), SZE via dhruba)
 </li>
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-3042">HADOOP-3042</a>. Updates the Javadoc in JobConf.getOutputPath to reflect
 the actual temporary path.<br />(Amareshwari Sriramadasu via ddas)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3007">HADOOP-3007</a>. Tolerate mirror failures while DataNode is replicating
+blocks as it used to before.<br />(rangadi)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-2944">HADOOP-2944</a>. Fixes a "Run on Hadoop" wizard NPE when creating a
+Location from the wizard.<br />(taton)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3049">HADOOP-3049</a>. Fixes a problem in MultiThreadedMapRunner to do with
+catching RuntimeExceptions.<br />(Alejandro Abdelnur via ddas)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3039">HADOOP-3039</a>. Fixes a problem to do with exceptions in tasks not
+killing jobs.<br />(Amareshwari Sriramadasu via ddas)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3027">HADOOP-3027</a>. Fixes a problem to do with adding a shutdown hook in
+FileSystem.<br />(Amareshwari Sriramadasu via ddas)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3056">HADOOP-3056</a>. Fix distcp when the target is an empty directory by
+making sure the directory is created first.<br />(cdouglas and acmurthy
+via omalley)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3070">HADOOP-3070</a>. Protect the trash emptier thread from null pointer
+exceptions.<br />(Koji Noguchi via omalley)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3084">HADOOP-3084</a>. Fix HftpFileSystem to work for zero-lenghth files.<br />(cdouglas)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3107">HADOOP-3107</a>. Fix NPE when fsck invokes getListings.<br />(dhruba)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3103">HADOOP-3103</a>. [HOD] Hadoop.tmp.dir should not be set to cluster
+directory. (Vinod Kumar Vavilapalli via ddas).
+</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3104">HADOOP-3104</a>. Limit MultithreadedMapRunner to have a fixed length queue
+between the RecordReader and the map threads.<br />(Alejandro Abdelnur via
+omalley)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-2833">HADOOP-2833</a>. Do not use "Dr. Who" as the default user in JobClient.
+A valid user name is required. (Tsz Wo (Nicholas), SZE via rangadi)
+</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3128">HADOOP-3128</a>. Throw RemoteException in setPermissions and setOwner of
+DistributedFileSystem.<br />(shv via nigel)</li>
     </ol>
   </li>
 </ul>
@@ -319,12 +399,14 @@ Configuration changes to hadoop-default.xml:
     </ol>
   </li>
   <li><a href="javascript:toggleList('release_0.16.1_-_2008-03-13_._improvements_')">  IMPROVEMENTS
-</a>&nbsp;&nbsp;&nbsp;(3)
+</a>&nbsp;&nbsp;&nbsp;(4)
     <ol id="release_0.16.1_-_2008-03-13_._improvements_">
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-2371">HADOOP-2371</a>. User guide for file permissions in HDFS.<br />(Robert Chansler via rangadi)</li>
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-2730">HADOOP-2730</a>. HOD documentation update.<br />(Vinod Kumar Vavilapalli via ddas)</li>
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-2911">HADOOP-2911</a>. Make the information printed by the HOD allocate and
 info commands less verbose and clearer.<br />(Vinod Kumar via nigel)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3098">HADOOP-3098</a>. Allow more characters in user and group names while
+using -chown and -chgrp commands.<br />(rangadi)</li>
     </ol>
   </li>
   <li><a href="javascript:toggleList('release_0.16.1_-_2008-03-13_._bug_fixes_')">  BUG FIXES
@@ -661,7 +743,7 @@ the map task.<br />(Amar Kamat via ddas)</li>
     </ol>
   </li>
   <li><a href="javascript:toggleList('release_0.16.0_-_2008-02-07_._bug_fixes_')">  BUG FIXES
-</a>&nbsp;&nbsp;&nbsp;(91)
+</a>&nbsp;&nbsp;&nbsp;(92)
     <ol id="release_0.16.0_-_2008-02-07_._bug_fixes_">
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-2583">HADOOP-2583</a>.  Fixes a bug in the Eclipse plug-in UI to edit locations.
 Plug-in version is now synchronized with Hadoop version.
@@ -863,6 +945,7 @@ files that was broken by <a href="http://issues.apache.org/jira/browse/HADOOP-21
 issue.  (Tsz Wo (Nicholas), SZE via dhruba)
 </li>
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-2768">HADOOP-2768</a>. Fix performance regression caused by <a href="http://issues.apache.org/jira/browse/HADOOP-1707">HADOOP-1707</a>.<br />(dhruba borthakur via nigel)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3108">HADOOP-3108</a>. Fix NPE in setPermission and setOwner.<br />(shv)</li>
     </ol>
   </li>
 </ul>

+ 75 - 10
docs/mapred_tutorial.html

@@ -276,6 +276,9 @@ document.write("Last Published: " + document.lastModified);
 <a href="#IsolationRunner">IsolationRunner</a>
 </li>
 <li>
+<a href="#Debugging">Debugging</a>
+</li>
+<li>
 <a href="#JobControl">JobControl</a>
 </li>
 <li>
@@ -289,7 +292,7 @@ document.write("Last Published: " + document.lastModified);
 <a href="#Example%3A+WordCount+v2.0">Example: WordCount v2.0</a>
 <ul class="minitoc">
 <li>
-<a href="#Source+Code-N10C11">Source Code</a>
+<a href="#Source+Code-N10C63">Source Code</a>
 </li>
 <li>
 <a href="#Sample+Runs">Sample Runs</a>
@@ -1857,7 +1860,7 @@ document.write("Last Published: " + document.lastModified);
           <em>symlink</em> the cached file(s) into the <span class="codefrag">current working 
           directory</span> of the task via the 
           <a href="api/org/apache/hadoop/filecache/DistributedCache.html#createSymlink(org.apache.hadoop.conf.Configuration)">
-          DistributedCache.createSymlink(Path, Configuration)</a> api. Files 
+          DistributedCache.createSymlink(Configuration)</a> api. Files 
           have <em>execution permissions</em> set.</p>
 <a name="N10B0A"></a><a name="Tool"></a>
 <h4>Tool</h4>
@@ -1923,13 +1926,75 @@ document.write("Last Published: " + document.lastModified);
 <p>
 <span class="codefrag">IsolationRunner</span> will run the failed task in a single 
           jvm, which can be in the debugger, over precisely the same input.</p>
-<a name="N10B6F"></a><a name="JobControl"></a>
+<a name="N10B6F"></a><a name="Debugging"></a>
+<h4>Debugging</h4>
+<p>The Map/Reduce framework provides a facility to run user-provided 
+          scripts for debugging. When a map/reduce task fails, the user can
+          run a script to do post-processing on the task logs, i.e. the
+          task's stdout, stderr, syslog and jobconf. The stdout and stderr
+          of the user-provided debug script are printed on the task
+          diagnostics. These outputs are also displayed on the job UI on demand. </p>
+<p> In the following sections we discuss how to submit a debug script
+          with the job. To submit a debug script, it first has to be
+          distributed, and then supplied in the Configuration. </p>
+<a name="N10B7B"></a><a name="How+to+distribute+script+file%3A"></a>
+<h5> How to distribute script file: </h5>
+<p>
+          To distribute the debug script file, first copy the file to the
+          DFS. The file can then be distributed by setting the property 
+          "mapred.cache.files" with the value "path#script-name". If more
+          than one file has to be distributed, the files can be added as
+          comma-separated paths. This property can also be set via the APIs
+          <a href="api/org/apache/hadoop/filecache/DistributedCache.html#addCacheFile(java.net.URI,%20org.apache.hadoop.conf.Configuration)">
+          DistributedCache.addCacheFile(URI,conf) </a> and
+          <a href="api/org/apache/hadoop/filecache/DistributedCache.html#setCacheFiles(java.net.URI[],%20org.apache.hadoop.conf.Configuration)">
+          DistributedCache.setCacheFiles(URIs,conf) </a> where the URI is
+          of the form "hdfs://host:port/absolutepath#script-name". 
+          For Streaming, the file can be added through the
+          command-line option -cacheFile.
+          </p>
+<p>
+          The file has to be symlinked into the current working directory 
+          of the task. To create a symlink for the file, set the property 
+          "mapred.create.symlink" to "yes". This can also be done via the
+          <a href="api/org/apache/hadoop/filecache/DistributedCache.html#createSymlink(org.apache.hadoop.conf.Configuration)">
+          DistributedCache.createSymlink(Configuration) </a> api.
+          </p>
+<a name="N10B94"></a><a name="How+to+submit+script%3A"></a>
+<h5> How to submit script: </h5>
+<p> A quick way to submit a debug script is to set values for the 
+          properties "mapred.map.task.debug.script" and 
+          "mapred.reduce.task.debug.script" for debugging the map task and
+          the reduce task respectively. These properties can also be set
+          using the APIs 
+          <a href="api/org/apache/hadoop/mapred/JobConf.html#setMapDebugScript(java.lang.String)">
+          JobConf.setMapDebugScript(String) </a> and
+          <a href="api/org/apache/hadoop/mapred/JobConf.html#setReduceDebugScript(java.lang.String)">
+          JobConf.setReduceDebugScript(String) </a>. For streaming, a debug 
+          script can be submitted with the command-line options -mapdebug
+          and -reducedebug for debugging the mapper and the reducer
+          respectively.</p>
+<p>The arguments of the script are the task's stdout, stderr, 
+          syslog and jobconf files. The debug command, run on the node where
+          the map/reduce task failed, is: <br>
+          
+<span class="codefrag"> $script $stdout $stderr $syslog $jobconf </span> 
+</p>
+<p> Pipes programs have the C++ program name as a fifth argument
+          to the command. Thus for pipes programs the command is <br> 
+          
+<span class="codefrag">$script $stdout $stderr $syslog $jobconf $program </span>  
+          
+</p>
+<a name="N10BB6"></a><a name="Default+Behavior%3A"></a>
+<h5> Default Behavior: </h5>
+<p> For pipes, a default script is run that processes core dumps under
+          gdb, prints the stack trace and gives info about the running
+          threads. </p>
+<a name="N10BC1"></a><a name="JobControl"></a>
 <h4>JobControl</h4>
 <p>
 <a href="api/org/apache/hadoop/mapred/jobcontrol/package-summary.html">
           JobControl</a> is a utility which encapsulates a set of Map-Reduce jobs
           and their dependencies.</p>
-<a name="N10B7C"></a><a name="Data+Compression"></a>
+<a name="N10BCE"></a><a name="Data+Compression"></a>
 <h4>Data Compression</h4>
 <p>Hadoop Map-Reduce provides facilities for the application-writer to
           specify compression for both intermediate map-outputs and the
@@ -1943,7 +2008,7 @@ document.write("Last Published: " + document.lastModified);
           codecs for reasons of both performance (zlib) and non-availability of
           Java libraries (lzo). More details on their usage and availability are
           available <a href="native_libraries.html">here</a>.</p>
-<a name="N10B9C"></a><a name="Intermediate+Outputs"></a>
+<a name="N10BEE"></a><a name="Intermediate+Outputs"></a>
 <h5>Intermediate Outputs</h5>
 <p>Applications can control compression of intermediate map-outputs
             via the 
@@ -1964,7 +2029,7 @@ document.write("Last Published: " + document.lastModified);
             <a href="api/org/apache/hadoop/mapred/JobConf.html#setMapOutputCompressionType(org.apache.hadoop.io.SequenceFile.CompressionType)">
             JobConf.setMapOutputCompressionType(SequenceFile.CompressionType)</a> 
             api.</p>
-<a name="N10BC8"></a><a name="Job+Outputs"></a>
+<a name="N10C1A"></a><a name="Job+Outputs"></a>
 <h5>Job Outputs</h5>
 <p>Applications can control compression of job-outputs via the
             <a href="api/org/apache/hadoop/mapred/OutputFormatBase.html#setCompressOutput(org.apache.hadoop.mapred.JobConf,%20boolean)">
@@ -1984,7 +2049,7 @@ document.write("Last Published: " + document.lastModified);
 </div>
 
     
-<a name="N10BF7"></a><a name="Example%3A+WordCount+v2.0"></a>
+<a name="N10C49"></a><a name="Example%3A+WordCount+v2.0"></a>
 <h2 class="h3">Example: WordCount v2.0</h2>
 <div class="section">
 <p>Here is a more complete <span class="codefrag">WordCount</span> which uses many of the
@@ -1994,7 +2059,7 @@ document.write("Last Published: " + document.lastModified);
       <a href="quickstart.html#SingleNodeSetup">pseudo-distributed</a> or
       <a href="quickstart.html#Fully-Distributed+Operation">fully-distributed</a> 
       Hadoop installation.</p>
-<a name="N10C11"></a><a name="Source+Code-N10C11"></a>
+<a name="N10C63"></a><a name="Source+Code-N10C63"></a>
 <h3 class="h4">Source Code</h3>
 <table class="ForrestTable" cellspacing="1" cellpadding="4">
           
@@ -3204,7 +3269,7 @@ document.write("Last Published: " + document.lastModified);
 </tr>
         
 </table>
-<a name="N11373"></a><a name="Sample+Runs"></a>
+<a name="N113C5"></a><a name="Sample+Runs"></a>
 <h3 class="h4">Sample Runs</h3>
 <p>Sample text-files as input:</p>
 <p>
@@ -3372,7 +3437,7 @@ document.write("Last Published: " + document.lastModified);
 <br>
         
 </p>
-<a name="N11447"></a><a name="Highlights"></a>
+<a name="N11499"></a><a name="Highlights"></a>
 <h3 class="h4">Highlights</h3>
 <p>The second version of <span class="codefrag">WordCount</span> improves upon the 
         previous one by using some features offered by the Map-Reduce framework:
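
The two debugging steps documented above — distributing the script through the DistributedCache and then pointing the framework at it — can be wired together from a job driver. Below is a minimal, illustrative sketch, not part of this patch: the class name, the HDFS path, and the link name "debug-script" are hypothetical, while the DistributedCache and JobConf calls are the ones named in the tutorial text.

```java
import java.net.URI;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.mapred.JobConf;

public class DebugScriptSetup {
  public static void configure(JobConf conf) throws Exception {
    // Distribute the script via the DistributedCache. The fragment after
    // '#' becomes the link name in the task's working directory.
    // (Hypothetical HDFS path, for illustration only.)
    DistributedCache.addCacheFile(
        new URI("hdfs://host:port/debug/debug-script.sh#debug-script"), conf);

    // Symlink cached files into the task's current working directory,
    // i.e. set mapred.create.symlink to "yes".
    DistributedCache.createSymlink(conf);

    // On task failure the framework runs:
    //   $script $stdout $stderr $syslog $jobconf
    conf.setMapDebugScript("./debug-script");
    conf.setReduceDebugScript("./debug-script");
  }
}
```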

File diff suppressed because it is too large
+ 3 - 3
docs/mapred_tutorial.pdf


+ 69 - 1
src/docs/src/documentation/content/xdocs/mapred_tutorial.xml

@@ -1401,7 +1401,7 @@
           <em>symlink</em> the cached file(s) into the <code>current working 
           directory</code> of the task via the 
           <a href="ext:api/org/apache/hadoop/filecache/distributedcache/createsymlink">
-          DistributedCache.createSymlink(Path, Configuration)</a> api. Files 
+          DistributedCache.createSymlink(Configuration)</a> api. Files 
           have <em>execution permissions</em> set.</p>
         </section>
         
@@ -1465,6 +1465,74 @@
           jvm, which can be in the debugger, over precisely the same input.</p>
         </section>
         
+        <section>
+          <title>Debugging</title>
+          <p>The Map/Reduce framework provides a facility to run 
+          user-provided scripts for debugging. When a map/reduce task fails,
+          the user can run a script to do post-processing on the task logs,
+          i.e. the task's stdout, stderr, syslog and jobconf. The stdout
+          and stderr of the user-provided debug script are printed on the
+          task diagnostics. These outputs are also displayed on the job UI
+          on demand. </p>
+
+          <p> In the following sections we discuss how to submit a debug
+          script with the job. To submit a debug script, it first has to be
+          distributed, and then supplied in the Configuration. </p>
+          <section>
+          <title> How to distribute script file: </title>
+          <p>
+          To distribute the debug script file, first copy the file to the
+          DFS. The file can then be distributed by setting the property 
+          "mapred.cache.files" with the value "path#script-name". If more
+          than one file has to be distributed, the files can be added as
+          comma-separated paths. This property can also be set via the APIs
+          <a href="ext:api/org/apache/hadoop/filecache/distributedcache/addcachefile">
+          DistributedCache.addCacheFile(URI,conf) </a> and
+          <a href="ext:api/org/apache/hadoop/filecache/distributedcache/setcachefiles">
+          DistributedCache.setCacheFiles(URIs,conf) </a> where the URI is
+          of the form "hdfs://host:port/absolutepath#script-name". 
+          For Streaming, the file can be added through the
+          command-line option -cacheFile.
+          </p>
+          
+          <p>
+          The file has to be symlinked into the current working directory 
+          of the task. To create a symlink for the file, set the property 
+          "mapred.create.symlink" to "yes". This can also be done via the
+          <a href="ext:api/org/apache/hadoop/filecache/distributedcache/createsymlink">
+          DistributedCache.createSymlink(Configuration) </a> api.
+          </p>
+          </section>
+          <section>
+          <title> How to submit script: </title>
+          <p> A quick way to submit a debug script is to set values for the 
+          properties "mapred.map.task.debug.script" and 
+          "mapred.reduce.task.debug.script" for debugging the map task and
+          the reduce task respectively. These properties can also be set
+          using the APIs 
+          <a href="ext:api/org/apache/hadoop/mapred/jobconf/setmapdebugscript">
+          JobConf.setMapDebugScript(String) </a> and
+          <a href="ext:api/org/apache/hadoop/mapred/jobconf/setreducedebugscript">
+          JobConf.setReduceDebugScript(String) </a>. For streaming, a debug 
+          script can be submitted with the command-line options -mapdebug
+          and -reducedebug for debugging the mapper and the reducer
+          respectively.</p>
+            
+          <p>The arguments of the script are the task's stdout, stderr, 
+          syslog and jobconf files. The debug command, run on the node where
+          the map/reduce task failed, is: <br/>
+          <code> $script $stdout $stderr $syslog $jobconf </code> </p> 
+
+          <p> Pipes programs have the C++ program name as a fifth argument
+          to the command. Thus for pipes programs the command is <br/> 
+          <code>$script $stdout $stderr $syslog $jobconf $program </code>  
+          </p>
+          </section>
+          
+          <section>
+          <title> Default Behavior: </title>
+          <p> For pipes, a default script is run that processes core dumps
+          under gdb, prints the stack trace and gives info about the running
+          threads. </p>
+          </section>
+        </section>
+        
         <section>
           <title>JobControl</title>
           

+ 2 - 0
src/docs/src/documentation/content/xdocs/site.xml

@@ -98,6 +98,8 @@ See http://forrest.apache.org/docs/linking.html for more info.
               <distributedcache href="DistributedCache.html">
                 <addarchivetoclasspath href="#addArchiveToClassPath(org.apache.hadoop.fs.Path,%20org.apache.hadoop.conf.Configuration)" />
                 <addfiletoclasspath href="#addFileToClassPath(org.apache.hadoop.fs.Path,%20org.apache.hadoop.conf.Configuration)" />
+                <addcachefile href="#addCacheFile(java.net.URI,%20org.apache.hadoop.conf.Configuration)" />
+                <setcachefiles href="#setCacheFiles(java.net.URI[],%20org.apache.hadoop.conf.Configuration)" />
                 <createsymlink href="#createSymlink(org.apache.hadoop.conf.Configuration)" />
               </distributedcache>  
             </filecache>
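
As the tutorial text in this commit notes, the same effect can also be had by setting the properties directly rather than calling the DistributedCache and JobConf helpers. A hedged sketch of that property-only route, using the same hypothetical HDFS path and class name convention as the example above:

```java
import org.apache.hadoop.mapred.JobConf;

public class DebugScriptProperties {
  public static void configure(JobConf conf) {
    // "path#script-name": the fragment after '#' is the symlink name
    // created in the task's working directory (hypothetical path).
    conf.set("mapred.cache.files",
        "hdfs://host:port/debug/debug-script.sh#debug-script");

    // Equivalent to calling DistributedCache.createSymlink(conf).
    conf.set("mapred.create.symlink", "yes");

    // Run the script when a map or reduce task fails; the framework
    // passes $stdout $stderr $syslog $jobconf as arguments.
    conf.set("mapred.map.task.debug.script", "./debug-script");
    conf.set("mapred.reduce.task.debug.script", "./debug-script");
  }
}
```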

Some files were not shown because too many files changed in this diff