
Preparing for 0.18.0 build.

git-svn-id: https://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.18@681496 13f79535-47bb-0310-9956-ffa450edef68
Doug Cutting 17 years ago
parent
commit
281bf8ab7e
6 changed files with 227 additions and 120 deletions
  1. build.xml (+2 -2)
  2. docs/changes.html (+117 -48)
  3. docs/commands_manual.html (+24 -20)
  4. docs/commands_manual.pdf (+1 -1)
  5. docs/mapred_tutorial.html (+61 -44)
  6. docs/mapred_tutorial.pdf (+22 -5)

+ 2 - 2
build.xml

@@ -26,9 +26,9 @@
  
   <property name="Name" value="Hadoop"/>
   <property name="name" value="hadoop"/>
-  <property name="version" value="0.18.0-dev"/>
+  <property name="version" value="0.18.1-dev"/>
   <property name="final.name" value="${name}-${version}"/>
-  <property name="year" value="2006"/>
+  <property name="year" value="2008"/>
   <property name="libhdfs.version" value="1"/>
 
   <property name="src.dir" value="${basedir}/src"/>  	

+ 117 - 48
docs/changes.html

@@ -36,7 +36,7 @@
     function collapse() {
       for (var i = 0; i < document.getElementsByTagName("ul").length; i++) {
         var list = document.getElementsByTagName("ul")[i];
-        if (list.id != 'release_0.18.0_-_unreleased_' && list.id != 'release_0.17.1_-_unreleased_') {
+        if (list.id != 'release_0.18.0_-_unreleased_' && list.id != 'release_0.17.2_-_unreleased_') {
           list.style.display = "none";
         }
       }
@@ -56,7 +56,7 @@
 </a></h2>
 <ul id="release_0.18.0_-_unreleased_">
   <li><a href="javascript:toggleList('release_0.18.0_-_unreleased_._incompatible_changes_')">  INCOMPATIBLE CHANGES
-</a>&nbsp;&nbsp;&nbsp;(22)
+</a>&nbsp;&nbsp;&nbsp;(23)
     <ol id="release_0.18.0_-_unreleased_._incompatible_changes_">
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-2703">HADOOP-2703</a>.  The default options to fsck skips checking files
 that are being written to. The output of fsck is incompatible
@@ -128,6 +128,9 @@ of the keytype if the type does not define a WritableComparator. Calling
 the superclass compare will throw a NullPointerException. Also define
 a RawComparator for NullWritable and permit it to be written as a key
 to SequenceFiles.<br />(cdouglas)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3673">HADOOP-3673</a>. Avoid deadlock caused by DataNode RPC receoverBlock().
+(Tsz Wo (Nicholas), SZE via rangadi)
+</li>
     </ol>
   </li>
   <li><a href="javascript:toggleList('release_0.18.0_-_unreleased_._new_features_')">  NEW FEATURES
@@ -187,8 +190,10 @@ in hadoop user guide.<br />(shv)</li>
     </ol>
   </li>
   <li><a href="javascript:toggleList('release_0.18.0_-_unreleased_._improvements_')">  IMPROVEMENTS
-</a>&nbsp;&nbsp;&nbsp;(46)
+</a>&nbsp;&nbsp;&nbsp;(47)
     <ol id="release_0.18.0_-_unreleased_._improvements_">
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3677">HADOOP-3677</a>. Simplify generation stamp upgrade by making is a
+local upgrade on datandodes. Deleted distributed upgrade.<br />(rangadi)</li>
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-2928">HADOOP-2928</a>. Remove deprecated FileSystem.getContentLength().<br />(Lohit Vjayarenu via rangadi)</li>
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-3130">HADOOP-3130</a>. Make the connect timeout smaller for getFile.<br />(Amar Ramesh Kamat via ddas)</li>
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-3160">HADOOP-3160</a>. Remove deprecated exists() from ClientProtocol and
@@ -305,7 +310,7 @@ InputFormat.validateInput.<br />(tomwhite via omalley)</li>
     </ol>
   </li>
   <li><a href="javascript:toggleList('release_0.18.0_-_unreleased_._bug_fixes_')">  BUG FIXES
-</a>&nbsp;&nbsp;&nbsp;(124)
+</a>&nbsp;&nbsp;&nbsp;(138)
     <ol id="release_0.18.0_-_unreleased_._bug_fixes_">
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-2905">HADOOP-2905</a>. 'fsck -move' triggers NPE in NameNode.<br />(Lohit Vjayarenu via rangadi)</li>
       <li>Increment ClientProtocol.versionID missed by <a href="http://issues.apache.org/jira/browse/HADOOP-2585">HADOOP-2585</a>.<br />(shv)</li>
@@ -382,8 +387,6 @@ write operation.<br />(rangadi)</li>
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-3296">HADOOP-3296</a>. Fix task cache to work for more than two levels in the cache
 hierarchy. This also adds a new counter to track cache hits at levels
 greater than two.<br />(Amar Kamat via cdouglas)</li>
-      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3370">HADOOP-3370</a>. Ensure that the TaskTracker.runningJobs data-structure is
-correctly cleaned-up on task completion.<br />(Zheng Shao via acmurthy)</li>
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-3375">HADOOP-3375</a>. Lease paths were sometimes not removed from
 LeaseManager.sortedLeasesByPath. (Tsz Wo (Nicholas), SZE via dhruba)
 </li>
@@ -409,11 +412,7 @@ security manager non-fatal.<br />(Edward Yoon via omalley)</li>
 instead of removed getFileCacheHints.<br />(lohit vijayarenu via cdouglas)</li>
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-3401">HADOOP-3401</a>. Update FileBench to set the new
 "mapred.work.output.dir" property to work post-3041.<br />(cdouglas via omalley)</li>
-      <li><a href="http://issues.apache.org/jira/browse/HADOOP-2159">HADOOP-2159</a> Namenode stuck in safemode. The counter blockSafe should
-not be decremented for invalid blocks.<br />(hairong)</li>
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-2669">HADOOP-2669</a>. DFSClient locks pendingCreates appropriately.<br />(dhruba)</li>
-      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3477">HADOOP-3477</a>. Fix build to not package contrib/*/bin twice in
-distributions.<br />(Adam Heath via cutting)</li>
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-3410">HADOOP-3410</a>. Fix KFS implemenation to return correct file
 modification time.<br />(Sriram Rao via cutting)</li>
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-3340">HADOOP-3340</a>. Fix DFS metrics for BlocksReplicated, HeartbeatsNum, and
@@ -422,18 +421,17 @@ BlockReportsAverageTime.<br />(lohit vijayarenu via cdouglas)</li>
 /bin/bash and fix the test patch to require bash instead of sh.<br />(Brice Arnould via omalley)</li>
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-3471">HADOOP-3471</a>. Fix spurious errors from TestIndexedSort and add additional
 logging to let failures be reproducible.<br />(cdouglas)</li>
-      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3475">HADOOP-3475</a>. Fix MapTask to correctly size the accounting allocation of
-io.sort.mb.<br />(cdouglas)</li>
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-3443">HADOOP-3443</a>. Avoid copying map output across partitions when renaming a
 single spill.<br />(omalley via cdouglas)</li>
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-3454">HADOOP-3454</a>. Fix Text::find to search only valid byte ranges.<br />(Chad Whipkey
 via cdouglas)</li>
-      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3417">HADOOP-3417</a>. Removes the static configuration variable, commandLineConfig from
-JobClient. Moves the cli parsing from JobShell to GenericOptionsParser.
-Thus removes the class org.apache.hadoop.mapred.JobShell.<br />(Amareshwari Sriramadasu via ddas)</li>
-      <li><a href="http://issues.apache.org/jira/browse/HADOOP-2132">HADOOP-2132</a>. Only RUNNING/PREP jobs can be killed.<br />(Jothi Padmanabhan via ddas)</li>
-      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3472">HADOOP-3472</a> MapFile.Reader getClosest() function returns incorrect results
-when before is true<br />(Todd Lipcon via Stack)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3417">HADOOP-3417</a>. Removes the static configuration variable,
+commandLineConfig from JobClient. Moves the cli parsing from
+JobShell to GenericOptionsParser.  Thus removes the class
+org.apache.hadoop.mapred.JobShell.<br />(Amareshwari Sriramadasu via
+ddas)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-2132">HADOOP-2132</a>. Only RUNNING/PREP jobs can be killed.<br />(Jothi Padmanabhan
+via ddas)</li>
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-3476">HADOOP-3476</a>. Code cleanup in fuse-dfs.<br />(Peter Wyckoff via dhruba)</li>
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-2427">HADOOP-2427</a>. Ensure that the cwd of completed tasks is cleaned-up
 correctly on task-completion.<br />(Amareshwari Sri Ramadasu via acmurthy)</li>
@@ -452,14 +450,16 @@ Instead the file is created in the test directory<br />(Mahadev Konar via ddas)<
 in <a href="http://issues.apache.org/jira/browse/HADOOP-3095">HADOOP-3095</a>.<br />(tomwhite)</li>
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-3135">HADOOP-3135</a>. Get the system directory from the JobTracker instead of from
 the conf.<br />(Subramaniam Krishnan via ddas)</li>
-      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3503">HADOOP-3503</a>. Fix a race condition when client and namenode start simultaneous
-recovery of the same block.  (dhruba &amp; Tsz Wo (Nicholas), SZE)
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3503">HADOOP-3503</a>. Fix a race condition when client and namenode start
+simultaneous recovery of the same block.  (dhruba &amp; Tsz Wo
+(Nicholas), SZE)
 </li>
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-3440">HADOOP-3440</a>. Fixes DistributedCache to not create symlinks for paths which
 don't have fragments even when createSymLink is true.<br />(Abhijit Bagri via ddas)</li>
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-3463">HADOOP-3463</a>. Hadoop-daemons script should cd to $HADOOP_HOME.<br />(omalley)</li>
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-3489">HADOOP-3489</a>. Fix NPE in SafeModeMonitor.<br />(Lohit Vijayarenu via shv)</li>
-      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3509">HADOOP-3509</a>. Fix NPE in FSNamesystem.close. (Tsz Wo (Nicholas), SZE via shv)
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3509">HADOOP-3509</a>. Fix NPE in FSNamesystem.close. (Tsz Wo (Nicholas), SZE via
+shv)
 </li>
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-3491">HADOOP-3491</a>. Name-node shutdown causes InterruptedException in
 ResolutionMonitor.<br />(Lohit Vijayarenu via shv)</li>
@@ -471,9 +471,6 @@ with a configuration.<br />(Subramaniam Krishnan via omalley)</li>
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-3519">HADOOP-3519</a>.  Fix NPE in DFS FileSystem rename.<br />(hairong via tomwhite)</li>
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-3528">HADOOP-3528</a>. Metrics FilesCreated and files_deleted metrics
 do not match.<br />(Lohit via Mahadev)</li>
-      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3442">HADOOP-3442</a>. Limit recursion depth on the stack for QuickSort to prevent
-StackOverflowErrors. To avoid O(n*n) cases, when partitioning depth exceeds
-a multiple of log(n), change to HeapSort.<br />(cdouglas)</li>
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-3418">HADOOP-3418</a>. When a directory is deleted, any leases that point to files
 in the subdirectory are removed. ((Tsz Wo (Nicholas), SZE via dhruba)
 </li>
@@ -493,10 +490,6 @@ cygwin. (Tsz Wo (Nicholas), Sze via omalley)
 </li>
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-3520">HADOOP-3520</a>.  TestDFSUpgradeFromImage triggers a race condition in the
 Upgrade Manager. Fixed.<br />(dhruba)</li>
-      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3550">HADOOP-3550</a>. Fix the serialization data structures in MapTask where the
-value lengths are incorrectly calculated.<br />(cdouglas)</li>
-      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3526">HADOOP-3526</a>. Fix contrib/data_join framework by cloning values retained
-in the reduce.<br />(Spyros Blanas via cdouglas)</li>
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-3586">HADOOP-3586</a>. Provide deprecated, backwards compatibile semantics for the
 combiner to be run once and only once on each record.<br />(cdouglas)</li>
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-3533">HADOOP-3533</a>. Add deprecated methods to provide API compatibility
@@ -539,8 +532,6 @@ open.<br />(Benjamin Gufler via hairong)</li>
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-3604">HADOOP-3604</a>. Work around a JVM synchronization problem observed while
 retrieving the address of direct buffers from compression code by obtaining
 a lock during this call.<br />(Arun C Murthy via cdouglas)</li>
-      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3678">HADOOP-3678</a>. Avoid spurious exceptions logged at DataNode when clients
-read from DFS.<br />(rangadi)</li>
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-3683">HADOOP-3683</a>. Fix dfs metrics to count file listings rather than files
 listed.<br />(lohit vijayarenu via cdouglas)</li>
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-3597">HADOOP-3597</a>. Fix SortValidator to use filesystems other than the default as
@@ -551,45 +542,123 @@ conform to style guidelines.<br />(Amareshwari Sriramadasu via cdouglas)</li>
 classpath jars.<br />(Brice Arnould via nigel)</li>
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-3692">HADOOP-3692</a>. Fix documentation for Cluster setup and Quick start guides.<br />(Amareshwari Sriramadasu via ddas)</li>
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-3691">HADOOP-3691</a>. Fix streaming and tutorial docs.<br />(Jothi Padmanabhan via ddas)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3630">HADOOP-3630</a>. Fix NullPointerException in CompositeRecordReader from empty
+sources<br />(cdouglas)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3706">HADOOP-3706</a>. Fix a ClassLoader issue in the mapred.join Parser that
+prevents it from loading user-specified InputFormats.<br />(Jingkei Ly via cdouglas)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3718">HADOOP-3718</a>. Fix KFSOutputStream::write(int) to output a byte instead of
+an int, per the OutputStream contract.<br />(Sriram Rao via cdouglas)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3647">HADOOP-3647</a>. Add debug logs to help track down a very occassional,
+hard-to-reproduce, bug in shuffle/merge on the reducer.<br />(acmurthy)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3716">HADOOP-3716</a>. Prevent listStatus in KosmosFileSystem from returning
+null for valid, empty directories.<br />(Sriram Rao via cdouglas)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3752">HADOOP-3752</a>. Fix audit logging to record rename events.<br />(cdouglas)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3737">HADOOP-3737</a>. Fix CompressedWritable to call Deflater::end to release
+compressor memory.<br />(Grant Glouser via cdouglas)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3670">HADOOP-3670</a>. Fixes JobTracker to clear out split bytes when no longer
+required.<br />(Amareshwari Sriramadasu via ddas)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3743">HADOOP-3743</a>. Fix -libjars, -files, -archives options to work even if
+user code does not implement tools.<br />(Amareshwari Sriramadasu via mahadev)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3774">HADOOP-3774</a>. Fix typos in shell output. (Tsz Wo (Nicholas), SZE via
+cdouglas)
+</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3762">HADOOP-3762</a>. Fixed FileSystem cache to work with the default port.<br />(cutting
+via omalley)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3798">HADOOP-3798</a>. Fix tests compilation.<br />(Mukund Madhugiri via omalley)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3794">HADOOP-3794</a>. Return modification time instead of zero for KosmosFileSystem.<br />(Sriram Rao via cdouglas)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3806">HADOOP-3806</a>. Remove debug statement to stdout from QuickSort.<br />(cdouglas)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3776">HADOOP-3776</a>. Fix NPE at NameNode when datanode reports a block after it is
+deleted at NameNode.<br />(rangadi)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3537">HADOOP-3537</a>. Disallow adding a datanode to a network topology when its
+network location is not resolved.<br />(hairong)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3571">HADOOP-3571</a>. Fix bug in block removal used in lease recovery.<br />(shv)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3645">HADOOP-3645</a>. MetricsTimeVaryingRate returns wrong value for
+metric_avg_time.<br />(Lohit Vijayarenu via hairong)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3521">HADOOP-3521</a>. Reverted the missing cast to float for sending Counters' values
+to Hadoop metrics which was removed by <a href="http://issues.apache.org/jira/browse/HADOOP-544">HADOOP-544</a>.<br />(acmurthy)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3724">HADOOP-3724</a>. Fixes two problems related to storing and recovering lease
+in the fsimage.<br />(dhruba)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3827">HADOOP-3827</a>.  Fixed compression of empty map-outputs.<br />(acmurthy)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3855">HADOOP-3855</a>. Fixes an import problem introduced by <a href="http://issues.apache.org/jira/browse/HADOOP-3827">HADOOP-3827</a>.<br />(Arun Murthy via ddas)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3865">HADOOP-3865</a>. Remove reference to FSNamesystem from metrics preventing
+garbage collection.<br />(Lohit Vijayarenu via cdouglas)</li>
     </ol>
   </li>
 </ul>
-<h2><a href="javascript:toggleList('release_0.17.1_-_unreleased_')">Release 0.17.1 - Unreleased
+<h2><a href="javascript:toggleList('release_0.17.2_-_unreleased_')">Release 0.17.2 - Unreleased
 </a></h2>
-<ul id="release_0.17.1_-_unreleased_">
-  <li><a href="javascript:toggleList('release_0.17.1_-_unreleased_._incompatible_changes_')">  INCOMPATIBLE CHANGES
+<ul id="release_0.17.2_-_unreleased_">
+  <li><a href="javascript:toggleList('release_0.17.2_-_unreleased_._bug_fixes_')">  BUG FIXES
+</a>&nbsp;&nbsp;&nbsp;(10)
+    <ol id="release_0.17.2_-_unreleased_._bug_fixes_">
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3678">HADOOP-3678</a>. Avoid spurious exceptions logged at DataNode when clients
+read from DFS.<br />(rangadi)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3760">HADOOP-3760</a>. Fix a bug with HDFS file close() mistakenly introduced
+by <a href="http://issues.apache.org/jira/browse/HADOOP-3681">HADOOP-3681</a>.<br />(Lohit Vijayarenu via rangadi)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3707">HADOOP-3707</a>. NameNode keeps a count of number of blocks scheduled
+to be written to a datanode and uses it to avoid allocating more
+blocks than a datanode can hold.<br />(rangadi)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3681">HADOOP-3681</a>. DFSClient can get into an infinite loop while closing
+a file if there are some errors.<br />(Lohit Vijayarenu via rangadi)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3002">HADOOP-3002</a>. Hold off block removal while in safe mode.<br />(shv)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3685">HADOOP-3685</a>. Unbalanced replication target.<br />(hairong)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3758">HADOOP-3758</a>. Shutdown datanode on version mismatch instead of retrying
+continuously, preventing excessive logging at the namenode.<br />(lohit vijayarenu via cdouglas)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3633">HADOOP-3633</a>. Correct exception handling in DataXceiveServer, and throttle
+the number of xceiver threads in a data-node.<br />(shv)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3370">HADOOP-3370</a>. Ensure that the TaskTracker.runningJobs data-structure is
+correctly cleaned-up on task completion.<br />(Zheng Shao via acmurthy)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3813">HADOOP-3813</a>. Fix task-output clean-up on HDFS to use the recursive
+FileSystem.delete rather than the FileUtil.fullyDelete.<br />(Amareshwari
+Sri Ramadasu via acmurthy)</li>
+    </ol>
+  </li>
+</ul>
+<h2><a href="javascript:toggleList('older')">Older Releases</a></h2>
+<ul id="older">
+<h3><a href="javascript:toggleList('release_0.17.1_-_2008-06-23_')">Release 0.17.1 - 2008-06-23
+</a></h3>
+<ul id="release_0.17.1_-_2008-06-23_">
+  <li><a href="javascript:toggleList('release_0.17.1_-_2008-06-23_._incompatible_changes_')">  INCOMPATIBLE CHANGES
 </a>&nbsp;&nbsp;&nbsp;(1)
-    <ol id="release_0.17.1_-_unreleased_._incompatible_changes_">
+    <ol id="release_0.17.1_-_2008-06-23_._incompatible_changes_">
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-3565">HADOOP-3565</a>. Fix the Java serialization, which is not enabled by
 default, to clear the state of the serializer between objects.<br />(tomwhite via omalley)</li>
     </ol>
   </li>
-  <li><a href="javascript:toggleList('release_0.17.1_-_unreleased_._improvements_')">  IMPROVEMENTS
+  <li><a href="javascript:toggleList('release_0.17.1_-_2008-06-23_._improvements_')">  IMPROVEMENTS
 </a>&nbsp;&nbsp;&nbsp;(2)
-    <ol id="release_0.17.1_-_unreleased_._improvements_">
+    <ol id="release_0.17.1_-_2008-06-23_._improvements_">
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-3522">HADOOP-3522</a>. Improve documentation on reduce pointing out that
 input keys and values will be reused.<br />(omalley)</li>
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-3487">HADOOP-3487</a>. Balancer uses thread pools for managing its threads;
 therefore provides better resource management.<br />(hairong)</li>
     </ol>
   </li>
-  <li><a href="javascript:toggleList('release_0.17.1_-_unreleased_._bug_fixes_')">  BUG FIXES
-</a>&nbsp;&nbsp;&nbsp;(5)
-    <ol id="release_0.17.1_-_unreleased_._bug_fixes_">
+  <li><a href="javascript:toggleList('release_0.17.1_-_2008-06-23_._bug_fixes_')">  BUG FIXES
+</a>&nbsp;&nbsp;&nbsp;(8)
+    <ol id="release_0.17.1_-_2008-06-23_._bug_fixes_">
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-2159">HADOOP-2159</a> Namenode stuck in safemode. The counter blockSafe should
+not be decremented for invalid blocks.<br />(hairong)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3472">HADOOP-3472</a> MapFile.Reader getClosest() function returns incorrect results
+when before is true<br />(Todd Lipcon via Stack)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3442">HADOOP-3442</a>. Limit recursion depth on the stack for QuickSort to prevent
+StackOverflowErrors. To avoid O(n*n) cases, when partitioning depth
+exceeds
+a multiple of log(n), change to HeapSort.<br />(cdouglas)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3477">HADOOP-3477</a>. Fix build to not package contrib/*/bin twice in
+distributions.<br />(Adam Heath via cutting)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3475">HADOOP-3475</a>. Fix MapTask to correctly size the accounting allocation of
+io.sort.mb.<br />(cdouglas)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3550">HADOOP-3550</a>. Fix the serialization data structures in MapTask where the
+value lengths are incorrectly calculated.<br />(cdouglas)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3526">HADOOP-3526</a>. Fix contrib/data_join framework by cloning values retained
+in the reduce.<br />(Spyros Blanas via cdouglas)</li>
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-1979">HADOOP-1979</a>. Speed up fsck by adding a buffered stream.<br />(Lohit
 Vijaya Renu via omalley)</li>
-      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3537">HADOOP-3537</a>. Disallow adding a datanode to a network topology when its
-network location is not resolved.<br />(hairong)</li>
-      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3571">HADOOP-3571</a>. Fix bug in block removal used in lease recovery.<br />(shv)</li>
-      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3645">HADOOP-3645</a>. MetricsTimeVaryingRate returns wrong value for
-metric_avg_time.<br />(Lohit Vijayarenu via hairong)</li>
-      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3633">HADOOP-3633</a>. Correct exception handling in DataXceiveServer, and throttle
-the number of xceiver threads in a data-node.<br />(shv)</li>
     </ol>
   </li>
 </ul>
-<h2><a href="javascript:toggleList('older')">Older Releases</a></h2>
-<ul id="older">
 <h3><a href="javascript:toggleList('release_0.17.0_-_2008-05-18_')">Release 0.17.0 - 2008-05-18
 </a></h3>
 <ul id="release_0.17.0_-_2008-05-18_">

+ 24 - 20
docs/commands_manual.html

@@ -315,7 +315,11 @@ document.write("Last Published: " + document.lastModified);
 <p>
 				  Following are supported by <a href="commands_manual.html#dfsadmin">dfsadmin</a>, 
 				  <a href="commands_manual.html#fs">fs</a>, <a href="commands_manual.html#fsck">fsck</a> and 
-				  <a href="commands_manual.html#job">job</a>.
+				  <a href="commands_manual.html#job">job</a>. 
+				  Applications should implement 
+				  <a href="api/org/apache/hadoop/util/Tool.html">Tool</a> to support
+				  <a href="api/org/apache/hadoop/util/GenericOptionsParser.html">
+				  GenericOptions</a>.
 				</p>
 <table class="ForrestTable" cellspacing="1" cellpadding="4">
 			          
@@ -380,11 +384,11 @@ document.write("Last Published: " + document.lastModified);
 </div>
 		
 		
-<a name="N100FB"></a><a name="User+Commands"></a>
+<a name="N10103"></a><a name="User+Commands"></a>
 <h2 class="h3"> User Commands </h2>
 <div class="section">
 <p>Commands useful for users of a hadoop cluster.</p>
-<a name="N10104"></a><a name="archive"></a>
+<a name="N1010C"></a><a name="archive"></a>
 <h3 class="h4"> archive </h3>
 <p>
 					Creates a hadoop archive. More information can be found at <a href="hadoop_archives.html">Hadoop Archives</a>.
@@ -422,7 +426,7 @@ document.write("Last Published: " + document.lastModified);
 </tr>
 			     
 </table>
-<a name="N1014F"></a><a name="distcp"></a>
+<a name="N10157"></a><a name="distcp"></a>
 <h3 class="h4"> distcp </h3>
 <p>
 					Copy file or directories recursively. More information can be found at <a href="distcp.html">DistCp Guide</a>.
@@ -454,7 +458,7 @@ document.write("Last Published: " + document.lastModified);
 </tr>
 			     
 </table>
-<a name="N1018C"></a><a name="fs"></a>
+<a name="N10194"></a><a name="fs"></a>
 <h3 class="h4"> fs </h3>
 <p>
 					
@@ -468,7 +472,7 @@ document.write("Last Published: " + document.lastModified);
 <p>
 					The various COMMAND_OPTIONS can be found at <a href="hdfs_shell.html">HDFS Shell Guide</a>.
 				</p>
-<a name="N101A8"></a><a name="fsck"></a>
+<a name="N101B0"></a><a name="fsck"></a>
 <h3 class="h4"> fsck </h3>
 <p>
 					Runs a HDFS filesystem checking utility. See <a href="hdfs_user_guide.html#Fsck">Fsck</a> for more info.
@@ -541,7 +545,7 @@ document.write("Last Published: " + document.lastModified);
 </tr>
 					
 </table>
-<a name="N1023C"></a><a name="jar"></a>
+<a name="N10244"></a><a name="jar"></a>
 <h3 class="h4"> jar </h3>
 <p>
 					Runs a jar file. Users can bundle their Map Reduce code in a jar file and execute it using this command.
@@ -561,7 +565,7 @@ document.write("Last Published: " + document.lastModified);
 					<a href="mapred_tutorial.html#Usage">Wordcount example</a>
 				
 </p>
-<a name="N1025A"></a><a name="job"></a>
+<a name="N10262"></a><a name="job"></a>
 <h3 class="h4"> job </h3>
 <p>
 					Command to interact with Map Reduce Jobs.
@@ -648,7 +652,7 @@ document.write("Last Published: " + document.lastModified);
 </tr>
 					
 </table>
-<a name="N102FA"></a><a name="pipes"></a>
+<a name="N10302"></a><a name="pipes"></a>
 <h3 class="h4"> pipes </h3>
 <p>
 					Runs a pipes job.
@@ -753,7 +757,7 @@ document.write("Last Published: " + document.lastModified);
 </tr>
 					
 </table>
-<a name="N103BF"></a><a name="version"></a>
+<a name="N103C7"></a><a name="version"></a>
 <h3 class="h4"> version </h3>
 <p>
 					Prints the version.
@@ -763,7 +767,7 @@ document.write("Last Published: " + document.lastModified);
 <span class="codefrag">Usage: hadoop version</span>
 				
 </p>
-<a name="N103CF"></a><a name="CLASSNAME"></a>
+<a name="N103D7"></a><a name="CLASSNAME"></a>
 <h3 class="h4"> CLASSNAME </h3>
 <p>
 					 hadoop script can be used to invoke any class.
@@ -779,11 +783,11 @@ document.write("Last Published: " + document.lastModified);
 </div>
 		
 		
-<a name="N103E3"></a><a name="Administration+Commands"></a>
+<a name="N103EB"></a><a name="Administration+Commands"></a>
 <h2 class="h3"> Administration Commands </h2>
 <div class="section">
 <p>Commands useful for administrators of a hadoop cluster.</p>
-<a name="N103EC"></a><a name="balancer"></a>
+<a name="N103F4"></a><a name="balancer"></a>
 <h3 class="h4"> balancer </h3>
 <p>
 					Runs a cluster balancing utility. An administrator can simply press Ctrl-C to stop the 
@@ -809,7 +813,7 @@ document.write("Last Published: " + document.lastModified);
 </tr>
 			     
 </table>
-<a name="N1041B"></a><a name="daemonlog"></a>
+<a name="N10423"></a><a name="daemonlog"></a>
 <h3 class="h4"> daemonlog </h3>
 <p>
 					 Get/Set the log level for each daemon.
@@ -846,7 +850,7 @@ document.write("Last Published: " + document.lastModified);
 </tr>
 			     
 </table>
-<a name="N10458"></a><a name="datanode"></a>
+<a name="N10460"></a><a name="datanode"></a>
 <h3 class="h4"> datanode</h3>
 <p>
 					Runs a HDFS datanode.
@@ -872,7 +876,7 @@ document.write("Last Published: " + document.lastModified);
 </tr>
 			     
 </table>
-<a name="N10483"></a><a name="dfsadmin"></a>
+<a name="N1048B"></a><a name="dfsadmin"></a>
 <h3 class="h4"> dfsadmin </h3>
 <p>
 					Runs a HDFS dfsadmin client.
@@ -987,7 +991,7 @@ document.write("Last Published: " + document.lastModified);
 </tr>
 			     
 </table>
-<a name="N10543"></a><a name="jobtracker"></a>
+<a name="N1054B"></a><a name="jobtracker"></a>
 <h3 class="h4"> jobtracker </h3>
 <p>
 					Runs the MapReduce job Tracker node.
@@ -997,7 +1001,7 @@ document.write("Last Published: " + document.lastModified);
 <span class="codefrag">Usage: hadoop jobtracker</span>
 				
 </p>
-<a name="N10553"></a><a name="namenode"></a>
+<a name="N1055B"></a><a name="namenode"></a>
 <h3 class="h4"> namenode </h3>
 <p>
 					Runs the namenode. More info about the upgrade, rollback and finalize is at 
@@ -1055,7 +1059,7 @@ document.write("Last Published: " + document.lastModified);
 </tr>
 			     
 </table>
-<a name="N105BA"></a><a name="secondarynamenode"></a>
+<a name="N105C2"></a><a name="secondarynamenode"></a>
 <h3 class="h4"> secondarynamenode </h3>
 <p>
 					Runs the HDFS secondary namenode. See <a href="hdfs_user_guide.html#Secondary+Namenode">Secondary Namenode</a> 
@@ -1089,7 +1093,7 @@ document.write("Last Published: " + document.lastModified);
 </tr>
 			     
 </table>
-<a name="N105F7"></a><a name="tasktracker"></a>
+<a name="N105FF"></a><a name="tasktracker"></a>
 <h3 class="h4"> tasktracker </h3>
 <p>
 					Runs a MapReduce task Tracker node.

File diff suppressed because it is too large
+ 1 - 1
docs/commands_manual.pdf


+ 61 - 44
docs/mapred_tutorial.html

@@ -310,7 +310,7 @@ document.write("Last Published: " + document.lastModified);
 <a href="#Example%3A+WordCount+v2.0">Example: WordCount v2.0</a>
 <ul class="minitoc">
 <li>
-<a href="#Source+Code-N10DD5">Source Code</a>
+<a href="#Source+Code-N10DF5">Source Code</a>
 </li>
 <li>
 <a href="#Sample+Runs">Sample Runs</a>
@@ -1116,7 +1116,24 @@ document.write("Last Published: " + document.lastModified);
 <br>
         
 </p>
-<a name="N104EC"></a><a name="Walk-through"></a>
+<p> Applications can specify a comma-separated list of paths which
+        would be present in the current working directory of the task 
+        using the option <span class="codefrag">-files</span>. The <span class="codefrag">-libjars</span>
+        option allows applications to add jars to the classpaths of the maps
+        and reduces. The <span class="codefrag">-archives</span> option allows them to pass
+        archives as arguments; these are unzipped/unjarred and a link with the
+        name of the jar/zip is created in the current working directory of tasks. More
+        details about the command line options are available at 
+        <a href="commands_manual.html">Commands manual</a>
+</p>
+<p>Running <span class="codefrag">wordcount</span> example with 
+        <span class="codefrag">-libjars</span> and <span class="codefrag">-files</span>:<br>
+        
+<span class="codefrag"> hadoop jar hadoop-examples.jar wordcount -files cachefile.txt 
+        -libjars mylib.jar input output </span> 
+        
+</p>
+<a name="N1050C"></a><a name="Walk-through"></a>
 <h3 class="h4">Walk-through</h3>
 <p>The <span class="codefrag">WordCount</span> application is quite straight-forward.</p>
 <p>The <span class="codefrag">Mapper</span> implementation (lines 14-26), via the 
@@ -1226,7 +1243,7 @@ document.write("Last Published: " + document.lastModified);
 </div>
     
     
-<a name="N105A3"></a><a name="Map%2FReduce+-+User+Interfaces"></a>
+<a name="N105C3"></a><a name="Map%2FReduce+-+User+Interfaces"></a>
 <h2 class="h3">Map/Reduce - User Interfaces</h2>
 <div class="section">
 <p>This section provides a reasonable amount of detail on every user-facing 
@@ -1245,12 +1262,12 @@ document.write("Last Published: " + document.lastModified);
 <p>Finally, we will wrap up by discussing some useful features of the
       framework such as the <span class="codefrag">DistributedCache</span>, 
       <span class="codefrag">IsolationRunner</span> etc.</p>
-<a name="N105DC"></a><a name="Payload"></a>
+<a name="N105FC"></a><a name="Payload"></a>
 <h3 class="h4">Payload</h3>
 <p>Applications typically implement the <span class="codefrag">Mapper</span> and 
         <span class="codefrag">Reducer</span> interfaces to provide the <span class="codefrag">map</span> and 
         <span class="codefrag">reduce</span> methods. These form the core of the job.</p>
-<a name="N105F1"></a><a name="Mapper"></a>
+<a name="N10611"></a><a name="Mapper"></a>
 <h4>Mapper</h4>
 <p>
 <a href="api/org/apache/hadoop/mapred/Mapper.html">
@@ -1306,7 +1323,7 @@ document.write("Last Published: " + document.lastModified);
           <a href="api/org/apache/hadoop/io/compress/CompressionCodec.html">
           CompressionCodec</a> to be used via the <span class="codefrag">JobConf</span>.
           </p>
-<a name="N10667"></a><a name="How+Many+Maps%3F"></a>
+<a name="N10687"></a><a name="How+Many+Maps%3F"></a>
 <h5>How Many Maps?</h5>
 <p>The number of maps is usually driven by the total size of the 
             inputs, that is, the total number of blocks of the input files.</p>
@@ -1319,7 +1336,7 @@ document.write("Last Published: " + document.lastModified);
             <a href="api/org/apache/hadoop/mapred/JobConf.html#setNumMapTasks(int)">
             setNumMapTasks(int)</a> (which only provides a hint to the framework) 
             is used to set it even higher.</p>
-<a name="N1067F"></a><a name="Reducer"></a>
+<a name="N1069F"></a><a name="Reducer"></a>
 <h4>Reducer</h4>
 <p>
 <a href="api/org/apache/hadoop/mapred/Reducer.html">
@@ -1342,18 +1359,18 @@ document.write("Last Published: " + document.lastModified);
 <p>
 <span class="codefrag">Reducer</span> has 3 primary phases: shuffle, sort and reduce.
           </p>
-<a name="N106AF"></a><a name="Shuffle"></a>
+<a name="N106CF"></a><a name="Shuffle"></a>
 <h5>Shuffle</h5>
 <p>Input to the <span class="codefrag">Reducer</span> is the sorted output of the
             mappers. In this phase the framework fetches the relevant partition 
             of the output of all the mappers, via HTTP.</p>
-<a name="N106BC"></a><a name="Sort"></a>
+<a name="N106DC"></a><a name="Sort"></a>
 <h5>Sort</h5>
 <p>The framework groups <span class="codefrag">Reducer</span> inputs by keys (since 
             different mappers may have output the same key) in this stage.</p>
 <p>The shuffle and sort phases occur simultaneously; while 
             map-outputs are being fetched they are merged.</p>
-<a name="N106CB"></a><a name="Secondary+Sort"></a>
+<a name="N106EB"></a><a name="Secondary+Sort"></a>
 <h5>Secondary Sort</h5>
 <p>If equivalence rules for grouping the intermediate keys are 
               required to be different from those for grouping keys before 
@@ -1364,7 +1381,7 @@ document.write("Last Published: " + document.lastModified);
               JobConf.setOutputKeyComparatorClass(Class)</a> can be used to 
               control how intermediate keys are grouped, these can be used in 
               conjunction to simulate <em>secondary sort on values</em>.</p>
-<a name="N106E4"></a><a name="Reduce"></a>
+<a name="N10704"></a><a name="Reduce"></a>
 <h5>Reduce</h5>
 <p>In this phase the 
             <a href="api/org/apache/hadoop/mapred/Reducer.html#reduce(K2, java.util.Iterator, org.apache.hadoop.mapred.OutputCollector, org.apache.hadoop.mapred.Reporter)">
@@ -1380,7 +1397,7 @@ document.write("Last Published: " + document.lastModified);
             progress, set application-level status messages and update 
             <span class="codefrag">Counters</span>, or just indicate that they are alive.</p>
 <p>The output of the <span class="codefrag">Reducer</span> is <em>not sorted</em>.</p>
-<a name="N10712"></a><a name="How+Many+Reduces%3F"></a>
+<a name="N10732"></a><a name="How+Many+Reduces%3F"></a>
 <h5>How Many Reduces?</h5>
 <p>The right number of reduces seems to be <span class="codefrag">0.95</span> or 
             <span class="codefrag">1.75</span> multiplied by (&lt;<em>no. of nodes</em>&gt; * 
@@ -1395,7 +1412,7 @@ document.write("Last Published: " + document.lastModified);
 <p>The scaling factors above are slightly less than whole numbers to 
             reserve a few reduce slots in the framework for speculative-tasks and
             failed tasks.</p>
-<a name="N10737"></a><a name="Reducer+NONE"></a>
+<a name="N10757"></a><a name="Reducer+NONE"></a>
 <h5>Reducer NONE</h5>
 <p>It is legal to set the number of reduce-tasks to <em>zero</em> if 
             no reduction is desired.</p>
@@ -1405,7 +1422,7 @@ document.write("Last Published: " + document.lastModified);
             setOutputPath(Path)</a>. The framework does not sort the 
             map-outputs before writing them out to the <span class="codefrag">FileSystem</span>.
             </p>
-<a name="N10752"></a><a name="Partitioner"></a>
+<a name="N10772"></a><a name="Partitioner"></a>
 <h4>Partitioner</h4>
 <p>
 <a href="api/org/apache/hadoop/mapred/Partitioner.html">
@@ -1419,7 +1436,7 @@ document.write("Last Published: " + document.lastModified);
 <p>
 <a href="api/org/apache/hadoop/mapred/lib/HashPartitioner.html">
           HashPartitioner</a> is the default <span class="codefrag">Partitioner</span>.</p>
-<a name="N10771"></a><a name="Reporter"></a>
+<a name="N10791"></a><a name="Reporter"></a>
 <h4>Reporter</h4>
 <p>
 <a href="api/org/apache/hadoop/mapred/Reporter.html">
@@ -1438,7 +1455,7 @@ document.write("Last Published: " + document.lastModified);
           </p>
 <p>Applications can also update <span class="codefrag">Counters</span> using the 
           <span class="codefrag">Reporter</span>.</p>
-<a name="N1079B"></a><a name="OutputCollector"></a>
+<a name="N107BB"></a><a name="OutputCollector"></a>
 <h4>OutputCollector</h4>
 <p>
 <a href="api/org/apache/hadoop/mapred/OutputCollector.html">
@@ -1449,7 +1466,7 @@ document.write("Last Published: " + document.lastModified);
 <p>Hadoop Map/Reduce comes bundled with a 
         <a href="api/org/apache/hadoop/mapred/lib/package-summary.html">
         library</a> of generally useful mappers, reducers, and partitioners.</p>
-<a name="N107B6"></a><a name="Job+Configuration"></a>
+<a name="N107D6"></a><a name="Job+Configuration"></a>
 <h3 class="h4">Job Configuration</h3>
 <p>
 <a href="api/org/apache/hadoop/mapred/JobConf.html">
@@ -1507,7 +1524,7 @@ document.write("Last Published: " + document.lastModified);
         <a href="api/org/apache/hadoop/conf/Configuration.html#set(java.lang.String, java.lang.String)">set(String, String)</a>/<a href="api/org/apache/hadoop/conf/Configuration.html#get(java.lang.String, java.lang.String)">get(String, String)</a>
         to set/get arbitrary parameters needed by applications. However, use the 
         <span class="codefrag">DistributedCache</span> for large amounts of (read-only) data.</p>
-<a name="N10848"></a><a name="Task+Execution+%26+Environment"></a>
+<a name="N10868"></a><a name="Task+Execution+%26+Environment"></a>
 <h3 class="h4">Task Execution &amp; Environment</h3>
 <p>The <span class="codefrag">TaskTracker</span> executes the <span class="codefrag">Mapper</span>/ 
         <span class="codefrag">Reducer</span>  <em>task</em> as a child process in a separate jvm.
@@ -1739,7 +1756,7 @@ document.write("Last Published: " + document.lastModified);
         <a href="native_libraries.html#Loading+native+libraries+through+DistributedCache">
         native_libraries.html</a>
 </p>
-<a name="N109E8"></a><a name="Job+Submission+and+Monitoring"></a>
+<a name="N10A08"></a><a name="Job+Submission+and+Monitoring"></a>
 <h3 class="h4">Job Submission and Monitoring</h3>
 <p>
 <a href="api/org/apache/hadoop/mapred/JobClient.html">
@@ -1800,7 +1817,7 @@ document.write("Last Published: " + document.lastModified);
 <p>Normally the user creates the application, describes various facets 
         of the job via <span class="codefrag">JobConf</span>, and then uses the 
         <span class="codefrag">JobClient</span> to submit the job and monitor its progress.</p>
-<a name="N10A48"></a><a name="Job+Control"></a>
+<a name="N10A68"></a><a name="Job+Control"></a>
 <h4>Job Control</h4>
 <p>Users may need to chain Map/Reduce jobs to accomplish complex
           tasks which cannot be done via a single Map/Reduce job. This is fairly
@@ -1836,7 +1853,7 @@ document.write("Last Published: " + document.lastModified);
             </li>
           
 </ul>
-<a name="N10A72"></a><a name="Job+Input"></a>
+<a name="N10A92"></a><a name="Job+Input"></a>
 <h3 class="h4">Job Input</h3>
 <p>
 <a href="api/org/apache/hadoop/mapred/InputFormat.html">
@@ -1884,7 +1901,7 @@ document.write("Last Published: " + document.lastModified);
         appropriate <span class="codefrag">CompressionCodec</span>. However, it must be noted that
         compressed files with the above extensions cannot be <em>split</em> and 
         each compressed file is processed in its entirety by a single mapper.</p>
-<a name="N10ADC"></a><a name="InputSplit"></a>
+<a name="N10AFC"></a><a name="InputSplit"></a>
 <h4>InputSplit</h4>
 <p>
 <a href="api/org/apache/hadoop/mapred/InputSplit.html">
@@ -1898,7 +1915,7 @@ document.write("Last Published: " + document.lastModified);
           FileSplit</a> is the default <span class="codefrag">InputSplit</span>. It sets 
           <span class="codefrag">map.input.file</span> to the path of the input file for the
           logical split.</p>
-<a name="N10B01"></a><a name="RecordReader"></a>
+<a name="N10B21"></a><a name="RecordReader"></a>
 <h4>RecordReader</h4>
 <p>
 <a href="api/org/apache/hadoop/mapred/RecordReader.html">
@@ -1910,7 +1927,7 @@ document.write("Last Published: " + document.lastModified);
           for processing. <span class="codefrag">RecordReader</span> thus assumes the 
           responsibility of processing record boundaries and presents the tasks 
           with keys and values.</p>
-<a name="N10B24"></a><a name="Job+Output"></a>
+<a name="N10B44"></a><a name="Job+Output"></a>
 <h3 class="h4">Job Output</h3>
 <p>
 <a href="api/org/apache/hadoop/mapred/OutputFormat.html">
@@ -1935,7 +1952,7 @@ document.write("Last Published: " + document.lastModified);
 <p>
 <span class="codefrag">TextOutputFormat</span> is the default 
         <span class="codefrag">OutputFormat</span>.</p>
-<a name="N10B4D"></a><a name="Task+Side-Effect+Files"></a>
+<a name="N10B6D"></a><a name="Task+Side-Effect+Files"></a>
 <h4>Task Side-Effect Files</h4>
 <p>In some applications, component tasks need to create and/or write to
           side-files, which differ from the actual job-output files.</p>
@@ -1974,7 +1991,7 @@ document.write("Last Published: " + document.lastModified);
 <p>The entire discussion holds true for maps of jobs with 
            reducer=NONE (i.e. 0 reduces) since output of the map, in that case, 
            goes directly to HDFS.</p>
-<a name="N10B95"></a><a name="RecordWriter"></a>
+<a name="N10BB5"></a><a name="RecordWriter"></a>
 <h4>RecordWriter</h4>
 <p>
 <a href="api/org/apache/hadoop/mapred/RecordWriter.html">
@@ -1982,9 +1999,9 @@ document.write("Last Published: " + document.lastModified);
           pairs to an output file.</p>
 <p>RecordWriter implementations write the job outputs to the 
           <span class="codefrag">FileSystem</span>.</p>
-<a name="N10BAC"></a><a name="Other+Useful+Features"></a>
+<a name="N10BCC"></a><a name="Other+Useful+Features"></a>
 <h3 class="h4">Other Useful Features</h3>
-<a name="N10BB2"></a><a name="Counters"></a>
+<a name="N10BD2"></a><a name="Counters"></a>
 <h4>Counters</h4>
 <p>
 <span class="codefrag">Counters</span> represent global counters, defined either by 
@@ -2001,7 +2018,7 @@ document.write("Last Published: " + document.lastModified);
           in the <span class="codefrag">map</span> and/or 
           <span class="codefrag">reduce</span> methods. These counters are then globally 
           aggregated by the framework.</p>
-<a name="N10BE1"></a><a name="DistributedCache"></a>
+<a name="N10C01"></a><a name="DistributedCache"></a>
 <h4>DistributedCache</h4>
 <p>
 <a href="api/org/apache/hadoop/filecache/DistributedCache.html">
@@ -2072,7 +2089,7 @@ document.write("Last Published: " + document.lastModified);
           <span class="codefrag">mapred.job.classpath.{files|archives}</span>. Similarly the
           cached files that are symlinked into the working directory of the
           task can be used to distribute native libraries and load them.</p>
-<a name="N10C64"></a><a name="Tool"></a>
+<a name="N10C84"></a><a name="Tool"></a>
 <h4>Tool</h4>
 <p>The <a href="api/org/apache/hadoop/util/Tool.html">Tool</a> 
           interface supports the handling of generic Hadoop command-line options.
@@ -2112,7 +2129,7 @@ document.write("Last Published: " + document.lastModified);
             </span>
           
 </p>
-<a name="N10C96"></a><a name="IsolationRunner"></a>
+<a name="N10CB6"></a><a name="IsolationRunner"></a>
 <h4>IsolationRunner</h4>
 <p>
 <a href="api/org/apache/hadoop/mapred/IsolationRunner.html">
@@ -2136,7 +2153,7 @@ document.write("Last Published: " + document.lastModified);
 <p>
 <span class="codefrag">IsolationRunner</span> will run the failed task in a single 
           jvm, which can be in the debugger, over precisely the same input.</p>
-<a name="N10CC9"></a><a name="Profiling"></a>
+<a name="N10CE9"></a><a name="Profiling"></a>
 <h4>Profiling</h4>
 <p>Profiling is a utility to get a representative (2 or 3) sample
           of built-in java profiler for a sample of maps and reduces. </p>
@@ -2169,7 +2186,7 @@ document.write("Last Published: " + document.lastModified);
           <span class="codefrag">-agentlib:hprof=cpu=samples,heap=sites,force=n,thread=y,verbose=n,file=%s</span>
           
 </p>
-<a name="N10CFD"></a><a name="Debugging"></a>
+<a name="N10D1D"></a><a name="Debugging"></a>
 <h4>Debugging</h4>
 <p>Map/Reduce framework provides a facility to run user-provided 
           scripts for debugging. When map/reduce task fails, user can run 
@@ -2180,14 +2197,14 @@ document.write("Last Published: " + document.lastModified);
 <p> In the following sections we discuss how to submit debug script
           along with the job. For submitting debug script, first it has to
           distributed. Then the script has to supplied in Configuration. </p>
-<a name="N10D09"></a><a name="How+to+distribute+script+file%3A"></a>
+<a name="N10D29"></a><a name="How+to+distribute+script+file%3A"></a>
 <h5> How to distribute script file: </h5>
 <p>
           The user has to use 
           <a href="mapred_tutorial.html#DistributedCache">DistributedCache</a>
           mechanism to <em>distribute</em> and <em>symlink</em> the
           debug script file.</p>
-<a name="N10D1D"></a><a name="How+to+submit+script%3A"></a>
+<a name="N10D3D"></a><a name="How+to+submit+script%3A"></a>
 <h5> How to submit script: </h5>
 <p> A quick way to submit debug script is to set values for the 
           properties "mapred.map.task.debug.script" and 
@@ -2211,17 +2228,17 @@ document.write("Last Published: " + document.lastModified);
 <span class="codefrag">$script $stdout $stderr $syslog $jobconf $program </span>  
           
 </p>
-<a name="N10D3F"></a><a name="Default+Behavior%3A"></a>
+<a name="N10D5F"></a><a name="Default+Behavior%3A"></a>
 <h5> Default Behavior: </h5>
 <p> For pipes, a default script is run to process core dumps under
           gdb, prints stack trace and gives info about running threads. </p>
-<a name="N10D4A"></a><a name="JobControl"></a>
+<a name="N10D6A"></a><a name="JobControl"></a>
 <h4>JobControl</h4>
 <p>
 <a href="api/org/apache/hadoop/mapred/jobcontrol/package-summary.html">
           JobControl</a> is a utility which encapsulates a set of Map/Reduce jobs
           and their dependencies.</p>
-<a name="N10D57"></a><a name="Data+Compression"></a>
+<a name="N10D77"></a><a name="Data+Compression"></a>
 <h4>Data Compression</h4>
 <p>Hadoop Map/Reduce provides facilities for the application-writer to
           specify compression for both intermediate map-outputs and the
@@ -2235,7 +2252,7 @@ document.write("Last Published: " + document.lastModified);
           codecs for reasons of both performance (zlib) and non-availability of
           Java libraries (lzo). More details on their usage and availability are
           available <a href="native_libraries.html">here</a>.</p>
-<a name="N10D77"></a><a name="Intermediate+Outputs"></a>
+<a name="N10D97"></a><a name="Intermediate+Outputs"></a>
 <h5>Intermediate Outputs</h5>
 <p>Applications can control compression of intermediate map-outputs
             via the 
@@ -2244,7 +2261,7 @@ document.write("Last Published: " + document.lastModified);
             <span class="codefrag">CompressionCodec</span> to be used via the
             <a href="api/org/apache/hadoop/mapred/JobConf.html#setMapOutputCompressorClass(java.lang.Class)">
             JobConf.setMapOutputCompressorClass(Class)</a> api.</p>
-<a name="N10D8C"></a><a name="Job+Outputs"></a>
+<a name="N10DAC"></a><a name="Job+Outputs"></a>
 <h5>Job Outputs</h5>
 <p>Applications can control compression of job-outputs via the
             <a href="api/org/apache/hadoop/mapred/FileOutputFormat.html#setCompressOutput(org.apache.hadoop.mapred.JobConf,%20boolean)">
@@ -2264,7 +2281,7 @@ document.write("Last Published: " + document.lastModified);
 </div>
 
     
-<a name="N10DBB"></a><a name="Example%3A+WordCount+v2.0"></a>
+<a name="N10DDB"></a><a name="Example%3A+WordCount+v2.0"></a>
 <h2 class="h3">Example: WordCount v2.0</h2>
 <div class="section">
 <p>Here is a more complete <span class="codefrag">WordCount</span> which uses many of the
@@ -2274,7 +2291,7 @@ document.write("Last Published: " + document.lastModified);
       <a href="quickstart.html#SingleNodeSetup">pseudo-distributed</a> or
       <a href="quickstart.html#Fully-Distributed+Operation">fully-distributed</a> 
       Hadoop installation.</p>
-<a name="N10DD5"></a><a name="Source+Code-N10DD5"></a>
+<a name="N10DF5"></a><a name="Source+Code-N10DF5"></a>
 <h3 class="h4">Source Code</h3>
 <table class="ForrestTable" cellspacing="1" cellpadding="4">
           
@@ -3484,7 +3501,7 @@ document.write("Last Published: " + document.lastModified);
 </tr>
         
 </table>
-<a name="N11537"></a><a name="Sample+Runs"></a>
+<a name="N11557"></a><a name="Sample+Runs"></a>
 <h3 class="h4">Sample Runs</h3>
 <p>Sample text-files as input:</p>
 <p>
@@ -3652,7 +3669,7 @@ document.write("Last Published: " + document.lastModified);
 <br>
         
 </p>
-<a name="N1160B"></a><a name="Highlights"></a>
+<a name="N1162B"></a><a name="Highlights"></a>
 <h3 class="h4">Highlights</h3>
 <p>The second version of <span class="codefrag">WordCount</span> improves upon the 
         previous one by using some features offered by the Map/Reduce framework:

File diff suppressed because it is too large
+ 22 - 5
docs/mapred_tutorial.pdf


Some files were not shown because too many files changed in this diff