
HADOOP-3406. Add forrest documentation for Profiling. Contributed by Amareshwari Sriramadasu.

git-svn-id: https://svn.apache.org/repos/asf/hadoop/core/trunk@668400 13f79535-47bb-0310-9956-ffa450edef68
Devaraj Das 17 years ago
parent
commit
0a36e37166

+ 3 - 0
CHANGES.txt

@@ -298,6 +298,9 @@ Release 0.18.0 - Unreleased
 
 
     HADOOP-2984. Add forrest documentation for DistCp. (cdouglas)
 
+    HADOOP-3406. Add forrest documentation for Profiling.
+    (Amareshwari Sriramadasu via ddas)
+
   OPTIMIZATIONS
 
     HADOOP-3274. The default constructor of BytesWritable creates empty 

+ 9 - 4
docs/changes.html

@@ -207,7 +207,7 @@ framework.<br />(tomwhite via omalley)</li>
     </ol>
   </li>
   <li><a href="javascript:toggleList('release_0.18.0_-_unreleased_._improvements_')">  IMPROVEMENTS
-</a>&nbsp;&nbsp;&nbsp;(36)
+</a>&nbsp;&nbsp;&nbsp;(38)
     <ol id="release_0.18.0_-_unreleased_._improvements_">
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-2928">HADOOP-2928</a>. Remove deprecated FileSystem.getContentLength().<br />(Lohit Vjayarenu via rangadi)</li>
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-3130">HADOOP-3130</a>. Make the connect timeout smaller for getFile.<br />(Amar Ramesh Kamat via ddas)</li>
@@ -286,6 +286,8 @@ a separate thread.<br />(Amareshwari Sriramadasu via ddas)</li>
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-3379">HADOOP-3379</a>. Documents stream.non.zero.exit.status.is.failure for Streaming.<br />(Amareshwari Sriramadasu via ddas)</li>
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-3096">HADOOP-3096</a>. Improves documentation about the Task Execution Environment in
 the Map-Reduce tutorial.<br />(Amareshwari Sriramadasu via ddas)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-2984">HADOOP-2984</a>. Add forrest documentation for DistCp.<br />(cdouglas)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3406">HADOOP-3406</a>. Add forrest documentation for Profiling.<br />(Amareshwari Sriramadasu via ddas)</li>
     </ol>
   </li>
   <li><a href="javascript:toggleList('release_0.18.0_-_unreleased_._optimizations_')">  OPTIMIZATIONS
@@ -313,7 +315,7 @@ InputFormat.validateInput.<br />(tomwhite via omalley)</li>
     </ol>
   </li>
   <li><a href="javascript:toggleList('release_0.18.0_-_unreleased_._bug_fixes_')">  BUG FIXES
-</a>&nbsp;&nbsp;&nbsp;(86)
+</a>&nbsp;&nbsp;&nbsp;(90)
     <ol id="release_0.18.0_-_unreleased_._bug_fixes_">
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-2905">HADOOP-2905</a>. 'fsck -move' triggers NPE in NameNode.<br />(Lohit Vjayarenu via rangadi)</li>
       <li>Increment ClientProtocol.versionID missed by <a href="http://issues.apache.org/jira/browse/HADOOP-2585">HADOOP-2585</a>.<br />(shv)</li>
@@ -489,8 +491,11 @@ in the subdirectory are removed. ((Tsz Wo (Nicholas), SZE via dhruba)
 directory.<br />(Mahadev Konar via ddas)</li>
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-3544">HADOOP-3544</a>. Fixes a documentation issue for hadoop archives.<br />(Mahadev Konar via ddas)</li>
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-3517">HADOOP-3517</a>. Fixes a problem in the reducer due to which the last InMemory
-merge may be missed.
-</li>
+merge may be missed.<br />(Arun Murthy via ddas)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3548">HADOOP-3548</a>. Fixes build.xml to copy all *.jar files to the dist.<br />(Owen O'Malley via ddas)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3363">HADOOP-3363</a>. Fix unformatted storage detection in FSImage.<br />(shv)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3560">HADOOP-3560</a>. Fixes a problem to do with split creation in archives.<br />(Mahadev Konar via ddas)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3545">HADOOP-3545</a>. Fixes an overflow problem in archives.<br />(Mahadev Konar via ddas)</li>
     </ol>
   </li>
 </ul>

+ 49 - 13
docs/mapred_tutorial.html

@@ -288,6 +288,9 @@ document.write("Last Published: " + document.lastModified);
 <a href="#IsolationRunner">IsolationRunner</a>
 </li>
 <li>
+<a href="#Profiling">Profiling</a>
+</li>
+<li>
 <a href="#Debugging">Debugging</a>
 </li>
 <li>
@@ -304,7 +307,7 @@ document.write("Last Published: " + document.lastModified);
 <a href="#Example%3A+WordCount+v2.0">Example: WordCount v2.0</a>
 <ul class="minitoc">
 <li>
-<a href="#Source+Code-N10D60">Source Code</a>
+<a href="#Source+Code-N10D94">Source Code</a>
 </li>
 <li>
 <a href="#Sample+Runs">Sample Runs</a>
@@ -2085,7 +2088,40 @@ document.write("Last Published: " + document.lastModified);
 <p>
 <span class="codefrag">IsolationRunner</span> will run the failed task in a single 
           jvm, which can be in the debugger, over precisely the same input.</p>
-<a name="N10C83"></a><a name="Debugging"></a>
+<a name="N10C83"></a><a name="Profiling"></a>
+<h4>Profiling</h4>
+<p>Profiling is a utility to get a representative sample (2 or 3 by
+          default) of built-in Java profiler output for a subset of the maps
+          and reduces. </p>
+<p>The user can specify whether the system should collect profiler
+          information for some of the tasks in the job by setting the
+          configuration property <span class="codefrag">mapred.task.profile</span>. The
+          value can be set using the api 
+          <a href="api/org/apache/hadoop/mapred/JobConf.html#setProfileEnabled(boolean)">
+          JobConf.setProfileEnabled(boolean)</a>. If the value is set to 
+          <span class="codefrag">true</span>, task profiling is enabled. The profiler
+          information is stored in the user log directory. By default, 
+          profiling is not enabled for the job.</p>
+<p>Once the user configures that profiling is needed, he or she can use
+          the configuration property 
+          <span class="codefrag">mapred.task.profile.{maps|reduces}</span> to set the ranges
+          of map/reduce tasks to profile. The value can be set using the api 
+          <a href="api/org/apache/hadoop/mapred/JobConf.html#setProfileTaskRange(boolean,%20java.lang.String)">
+          JobConf.setProfileTaskRange(boolean,String)</a>.
+          By default, the specified range is <span class="codefrag">0-2</span>.</p>
+<p>The user can also specify the profiler configuration arguments by 
+          setting the configuration property 
+          <span class="codefrag">mapred.task.profile.params</span>. The value can be specified 
+          using the api
+          <a href="api/org/apache/hadoop/mapred/JobConf.html#setProfileParams(java.lang.String)">
+          JobConf.setProfileParams(String)</a>. If the string contains a 
+          <span class="codefrag">%s</span>, it will be replaced with the name of the profiling
+          output file when the task runs. These parameters are passed to the
+          task child JVM on the command line. The default value of 
+          the profiling parameters is 
+          <span class="codefrag">-agentlib:hprof=cpu=samples,heap=sites,force=n,thread=y,verbose=n,file=%s</span>.
+</p>
+<a name="N10CB7"></a><a name="Debugging"></a>
 <h4>Debugging</h4>
 <p>Map/Reduce framework provides a facility to run user-provided 
           scripts for debugging. When map/reduce task fails, user can run 
@@ -2096,7 +2132,7 @@ document.write("Last Published: " + document.lastModified);
 <p> In the following sections we discuss how to submit the debug script
           along with the job. For submitting the debug script, first it has to
           be distributed. Then the script has to be supplied in Configuration. </p>
-<a name="N10C8F"></a><a name="How+to+distribute+script+file%3A"></a>
+<a name="N10CC3"></a><a name="How+to+distribute+script+file%3A"></a>
 <h5> How to distribute script file: </h5>
 <p>
           To distribute the debug script file, first copy the file to the dfs.
@@ -2119,7 +2155,7 @@ document.write("Last Published: " + document.lastModified);
           <a href="api/org/apache/hadoop/filecache/DistributedCache.html#createSymlink(org.apache.hadoop.conf.Configuration)">
           DistributedCache.createSymLink(Configuration) </a> api.
           </p>
-<a name="N10CA8"></a><a name="How+to+submit+script%3A"></a>
+<a name="N10CDC"></a><a name="How+to+submit+script%3A"></a>
 <h5> How to submit script: </h5>
 <p> A quick way to submit debug script is to set values for the 
           properties "mapred.map.task.debug.script" and 
@@ -2143,17 +2179,17 @@ document.write("Last Published: " + document.lastModified);
 <span class="codefrag">$script $stdout $stderr $syslog $jobconf $program </span>  
           
 </p>
-<a name="N10CCA"></a><a name="Default+Behavior%3A"></a>
+<a name="N10CFE"></a><a name="Default+Behavior%3A"></a>
 <h5> Default Behavior: </h5>
 <p> For pipes, a default script is run that processes core dumps under
           gdb, prints the stack trace and gives info about running threads. </p>
-<a name="N10CD5"></a><a name="JobControl"></a>
+<a name="N10D09"></a><a name="JobControl"></a>
 <h4>JobControl</h4>
 <p>
 <a href="api/org/apache/hadoop/mapred/jobcontrol/package-summary.html">
           JobControl</a> is a utility which encapsulates a set of Map-Reduce jobs
           and their dependencies.</p>
-<a name="N10CE2"></a><a name="Data+Compression"></a>
+<a name="N10D16"></a><a name="Data+Compression"></a>
 <h4>Data Compression</h4>
 <p>Hadoop Map-Reduce provides facilities for the application-writer to
           specify compression for both intermediate map-outputs and the
@@ -2167,7 +2203,7 @@ document.write("Last Published: " + document.lastModified);
           codecs for reasons of both performance (zlib) and non-availability of
           Java libraries (lzo). More details on their usage and availability are
           available <a href="native_libraries.html">here</a>.</p>
-<a name="N10D02"></a><a name="Intermediate+Outputs"></a>
+<a name="N10D36"></a><a name="Intermediate+Outputs"></a>
 <h5>Intermediate Outputs</h5>
 <p>Applications can control compression of intermediate map-outputs
             via the 
@@ -2176,7 +2212,7 @@ document.write("Last Published: " + document.lastModified);
             <span class="codefrag">CompressionCodec</span> to be used via the
             <a href="api/org/apache/hadoop/mapred/JobConf.html#setMapOutputCompressorClass(java.lang.Class)">
             JobConf.setMapOutputCompressorClass(Class)</a> api.</p>
-<a name="N10D17"></a><a name="Job+Outputs"></a>
+<a name="N10D4B"></a><a name="Job+Outputs"></a>
 <h5>Job Outputs</h5>
 <p>Applications can control compression of job-outputs via the
             <a href="api/org/apache/hadoop/mapred/OutputFormatBase.html#setCompressOutput(org.apache.hadoop.mapred.JobConf,%20boolean)">
@@ -2196,7 +2232,7 @@ document.write("Last Published: " + document.lastModified);
 </div>
 
     
-<a name="N10D46"></a><a name="Example%3A+WordCount+v2.0"></a>
+<a name="N10D7A"></a><a name="Example%3A+WordCount+v2.0"></a>
 <h2 class="h3">Example: WordCount v2.0</h2>
 <div class="section">
 <p>Here is a more complete <span class="codefrag">WordCount</span> which uses many of the
@@ -2206,7 +2242,7 @@ document.write("Last Published: " + document.lastModified);
       <a href="quickstart.html#SingleNodeSetup">pseudo-distributed</a> or
       <a href="quickstart.html#Fully-Distributed+Operation">fully-distributed</a> 
       Hadoop installation.</p>
-<a name="N10D60"></a><a name="Source+Code-N10D60"></a>
+<a name="N10D94"></a><a name="Source+Code-N10D94"></a>
 <h3 class="h4">Source Code</h3>
 <table class="ForrestTable" cellspacing="1" cellpadding="4">
           
@@ -3416,7 +3452,7 @@ document.write("Last Published: " + document.lastModified);
 </tr>
         
 </table>
-<a name="N114C2"></a><a name="Sample+Runs"></a>
+<a name="N114F6"></a><a name="Sample+Runs"></a>
 <h3 class="h4">Sample Runs</h3>
 <p>Sample text-files as input:</p>
 <p>
@@ -3584,7 +3620,7 @@ document.write("Last Published: " + document.lastModified);
 <br>
         
 </p>
-<a name="N11596"></a><a name="Highlights"></a>
+<a name="N115CA"></a><a name="Highlights"></a>
 <h3 class="h4">Highlights</h3>
 <p>The second version of <span class="codefrag">WordCount</span> improves upon the 
         previous one by using some features offered by the Map-Reduce framework:

File diff suppressed because it is too large
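The range syntax accepted by <code>mapred.task.profile.{maps|reduces}</code> in the tutorial above (default <code>0-2</code>) can be illustrated with a small self-contained helper. This is a hypothetical sketch of how such a spec selects task ids, not Hadoop's actual parser; the class and method names are invented for illustration.

```java
/**
 * Hypothetical helper illustrating how a range spec such as the default
 * "0-2" for mapred.task.profile.{maps|reduces} could select task ids.
 * Not Hadoop's actual implementation.
 */
public class ProfileRange {
    /**
     * Returns true if taskId falls inside the comma-separated spec,
     * where each part is either a single id "n" or an inclusive
     * range "lo-hi".
     */
    public static boolean inRange(String spec, int taskId) {
        for (String part : spec.split(",")) {
            part = part.trim();
            int dash = part.indexOf('-');
            if (dash < 0) {
                // single task id, e.g. "5"
                if (Integer.parseInt(part) == taskId) return true;
            } else {
                // inclusive range, e.g. "0-2"
                int lo = Integer.parseInt(part.substring(0, dash));
                int hi = Integer.parseInt(part.substring(dash + 1));
                if (taskId >= lo && taskId <= hi) return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // With the default spec "0-2", the first three tasks are profiled.
        System.out.println(inRange("0-2", 1));  // prints true
        System.out.println(inRange("0-2", 3));  // prints false
    }
}
```

Under this reading, the default <code>0-2</code> profiles task attempts 0, 1 and 2 of the chosen task type.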
+ 3 - 3
docs/mapred_tutorial.pdf


+ 37 - 0
src/docs/src/documentation/content/xdocs/mapred_tutorial.xml

@@ -1562,6 +1562,43 @@
           <p><code>IsolationRunner</code> will run the failed task in a single 
           jvm, which can be in the debugger, over precisely the same input.</p>
         </section>
+
+        <section>
+          <title>Profiling</title>
+          <p>Profiling is a utility to get a representative sample (2 or 3 by
+          default) of built-in Java profiler output for a subset of the maps
+          and reduces. </p>
+          
+          <p>The user can specify whether the system should collect profiler
+          information for some of the tasks in the job by setting the
+          configuration property <code>mapred.task.profile</code>. The
+          value can be set using the api 
+          <a href="ext:api/org/apache/hadoop/mapred/jobconf/setprofileenabled">
+          JobConf.setProfileEnabled(boolean)</a>. If the value is set to 
+          <code>true</code>, task profiling is enabled. The profiler
+          information is stored in the user log directory. By default, 
+          profiling is not enabled for the job.</p>
+          
+          <p>Once the user configures that profiling is needed, he or she can use
+          the configuration property 
+          <code>mapred.task.profile.{maps|reduces}</code> to set the ranges
+          of map/reduce tasks to profile. The value can be set using the api 
+          <a href="ext:api/org/apache/hadoop/mapred/jobconf/setprofiletaskrange">
+          JobConf.setProfileTaskRange(boolean,String)</a>.
+          By default, the specified range is <code>0-2</code>.</p>
+          
+          <p>The user can also specify the profiler configuration arguments by 
+          setting the configuration property 
+          <code>mapred.task.profile.params</code>. The value can be specified 
+          using the api
+          <a href="ext:api/org/apache/hadoop/mapred/jobconf/setprofileparams">
+          JobConf.setProfileParams(String)</a>. If the string contains a 
+          <code>%s</code>, it will be replaced with the name of the profiling
+          output file when the task runs. These parameters are passed to the
+          task child JVM on the command line. The default value of 
+          the profiling parameters is 
+          <code>-agentlib:hprof=cpu=samples,heap=sites,force=n,thread=y,verbose=n,file=%s</code>.
+          </p>
+        </section>
         
         
         <section>
           <title>Debugging</title>
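The three profiling properties documented in the section added above can also be expressed directly in a job configuration file instead of through the JobConf api. A sketch of such a fragment, with the property names taken from the section above and the values purely illustrative:

```xml
<!-- Illustrative fragment: enable profiling for the first three
     map and reduce tasks with the default hprof parameters. -->
<property>
  <name>mapred.task.profile</name>
  <value>true</value>
</property>
<property>
  <name>mapred.task.profile.maps</name>
  <value>0-2</value>
</property>
<property>
  <name>mapred.task.profile.reduces</name>
  <value>0-2</value>
</property>
<property>
  <name>mapred.task.profile.params</name>
  <value>-agentlib:hprof=cpu=samples,heap=sites,force=n,thread=y,verbose=n,file=%s</value>
</property>
```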

+ 3 - 0
src/docs/src/documentation/content/xdocs/site.xml

@@ -166,6 +166,9 @@ See http://forrest.apache.org/docs/linking.html for more info.
                 <setjobendnotificationuri href="#setJobEndNotificationURI(java.lang.String)" />
                 <setcompressmapoutput href="#setCompressMapOutput(boolean)" />
                 <setmapoutputcompressorclass href="#setMapOutputCompressorClass(java.lang.Class)" />
+                <setprofileenabled href="#setProfileEnabled(boolean)" />
+                <setprofiletaskrange href="#setProfileTaskRange(boolean,%20java.lang.String)" />
+                <setprofileparams href="#setProfileParams(java.lang.String)" />
                 <getjoblocaldir href="#getJobLocalDir()" />
                 <getjar href="#getJar()" />
               </jobconf>

Some files were not shown because too many files changed in this diff