
HADOOP-3041. Deprecates getOutputPath and defines two new APIs getCurrentOutputPath and getFinalOutputPath. Contributed by Amareshwari Sriramadasu.

git-svn-id: https://svn.apache.org/repos/asf/hadoop/core/branches/branch-0.16@639818 13f79535-47bb-0310-9956-ffa450edef68
Devaraj Das 17 years ago
parent
Commit
e6e55e4746
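
For orientation before the file-by-file diff: a minimal, hypothetical sketch of how a task could use the two accessors this patch introduces. The class and file names here are illustrative and not part of the commit; the behavior follows the updated JobConf javadoc further down.

    import java.io.IOException;

    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.JobConf;

    // Hypothetical helper, for illustration only.
    public class SideFileExample {
      public static void createSideFile(JobConf conf) throws IOException {
        // Inside a running task, getCurrentOutputPath() points at the
        // task-attempt's temporary directory
        // (${mapred.output.dir}/_temporary/_${taskid}) ...
        Path current = conf.getCurrentOutputPath();
        // ... while getFinalOutputPath() returns the directory originally
        // configured via setOutputPath(Path) at job-submission time.
        Path finalDir = conf.getFinalOutputPath();
        FileSystem fs = current.getFileSystem(conf);
        // Side-files written under the current path are promoted by the
        // framework for successful task-attempts.
        fs.create(new Path(current, "side-file.txt")).close();
        System.out.println("Promoted on success to: " + finalDir);
      }
    }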

+ 4 - 0
CHANGES.txt

@@ -27,6 +27,10 @@ Release 0.16.2 - Unreleased
     HADOOP-3007. Tolerate mirror failures while DataNode is replicating
     blocks as it used to before. (rangadi)
 
+    HADOOP-3041. Deprecates getOutputPath and defines two new APIs
+    getCurrentOutputPath and getFinalOutputPath.
+    (Amareshwari Sriramadasu via ddas)
+
 Release 0.16.1 - 2008-03-13
 
   INCOMPATIBLE CHANGES

+ 22 - 18
docs/mapred_tutorial.html

@@ -283,7 +283,7 @@ document.write("Last Published: " + document.lastModified);
 <a href="#Example%3A+WordCount+v2.0">Example: WordCount v2.0</a>
 <ul class="minitoc">
 <li>
-<a href="#Source+Code-N10BBE">Source Code</a>
+<a href="#Source+Code-N10BC1">Source Code</a>
 </li>
 <li>
 <a href="#Sample+Runs">Sample Runs</a>
@@ -1731,11 +1731,15 @@ document.write("Last Published: " + document.lastModified);
 <p>The application-writer can take advantage of this feature by 
           creating any side-files required in <span class="codefrag">${mapred.output.dir}</span> 
           during execution of a task via 
-          <a href="api/org/apache/hadoop/mapred/JobConf.html#getOutputPath()">
-          JobConf.getOutputPath()</a>, and the framework will promote them 
+          <a href="api/org/apache/hadoop/mapred/JobConf.html#getCurrentOutputPath()">
+          JobConf.getCurrentOutputPath()</a>, and the framework will promote them 
           similarly for succesful task-attempts, thus eliminating the need to 
-          pick unique paths per task-attempt.</p>
-<a name="N10A31"></a><a name="RecordWriter"></a>
+          pick unique paths per task-attempt. She can get the actual configured 
+          path (final output path) via 
+          <a href="api/org/apache/hadoop/mapred/JobConf.html#getFinalOutputPath()">
+          JobConf.getFinalOutputPath()</a>
+</p>
+<a name="N10A34"></a><a name="RecordWriter"></a>
 <h4>RecordWriter</h4>
 <p>
 <a href="api/org/apache/hadoop/mapred/RecordWriter.html">
@@ -1743,9 +1747,9 @@ document.write("Last Published: " + document.lastModified);
           pairs to an output file.</p>
 <p>RecordWriter implementations write the job outputs to the 
           <span class="codefrag">FileSystem</span>.</p>
-<a name="N10A48"></a><a name="Other+Useful+Features"></a>
+<a name="N10A4B"></a><a name="Other+Useful+Features"></a>
 <h3 class="h4">Other Useful Features</h3>
-<a name="N10A4E"></a><a name="Counters"></a>
+<a name="N10A51"></a><a name="Counters"></a>
 <h4>Counters</h4>
 <p>
 <span class="codefrag">Counters</span> represent global counters, defined either by 
@@ -1759,7 +1763,7 @@ document.write("Last Published: " + document.lastModified);
           Reporter.incrCounter(Enum, long)</a> in the <span class="codefrag">map</span> and/or 
           <span class="codefrag">reduce</span> methods. These counters are then globally 
           aggregated by the framework.</p>
-<a name="N10A79"></a><a name="DistributedCache"></a>
+<a name="N10A7C"></a><a name="DistributedCache"></a>
 <h4>DistributedCache</h4>
 <p>
 <a href="api/org/apache/hadoop/filecache/DistributedCache.html">
@@ -1792,7 +1796,7 @@ document.write("Last Published: " + document.lastModified);
           <a href="api/org/apache/hadoop/filecache/DistributedCache.html#createSymlink(org.apache.hadoop.conf.Configuration)">
           DistributedCache.createSymlink(Path, Configuration)</a> api. Files 
           have <em>execution permissions</em> set.</p>
-<a name="N10AB7"></a><a name="Tool"></a>
+<a name="N10ABA"></a><a name="Tool"></a>
 <h4>Tool</h4>
 <p>The <a href="api/org/apache/hadoop/util/Tool.html">Tool</a> 
           interface supports the handling of generic Hadoop command-line options.
@@ -1832,7 +1836,7 @@ document.write("Last Published: " + document.lastModified);
             </span>
           
 </p>
-<a name="N10AE9"></a><a name="IsolationRunner"></a>
+<a name="N10AEC"></a><a name="IsolationRunner"></a>
 <h4>IsolationRunner</h4>
 <p>
 <a href="api/org/apache/hadoop/mapred/IsolationRunner.html">
@@ -1856,13 +1860,13 @@ document.write("Last Published: " + document.lastModified);
 <p>
 <span class="codefrag">IsolationRunner</span> will run the failed task in a single 
           jvm, which can be in the debugger, over precisely the same input.</p>
-<a name="N10B1C"></a><a name="JobControl"></a>
+<a name="N10B1F"></a><a name="JobControl"></a>
 <h4>JobControl</h4>
 <p>
 <a href="api/org/apache/hadoop/mapred/jobcontrol/package-summary.html">
           JobControl</a> is a utility which encapsulates a set of Map-Reduce jobs
           and their dependencies.</p>
-<a name="N10B29"></a><a name="Data+Compression"></a>
+<a name="N10B2C"></a><a name="Data+Compression"></a>
 <h4>Data Compression</h4>
 <p>Hadoop Map-Reduce provides facilities for the application-writer to
           specify compression for both intermediate map-outputs and the
@@ -1876,7 +1880,7 @@ document.write("Last Published: " + document.lastModified);
           codecs for reasons of both performance (zlib) and non-availability of
           Java libraries (lzo). More details on their usage and availability are
           available <a href="native_libraries.html">here</a>.</p>
-<a name="N10B49"></a><a name="Intermediate+Outputs"></a>
+<a name="N10B4C"></a><a name="Intermediate+Outputs"></a>
 <h5>Intermediate Outputs</h5>
 <p>Applications can control compression of intermediate map-outputs
             via the 
@@ -1897,7 +1901,7 @@ document.write("Last Published: " + document.lastModified);
             <a href="api/org/apache/hadoop/mapred/JobConf.html#setMapOutputCompressionType(org.apache.hadoop.io.SequenceFile.CompressionType)">
             JobConf.setMapOutputCompressionType(SequenceFile.CompressionType)</a> 
             api.</p>
-<a name="N10B75"></a><a name="Job+Outputs"></a>
+<a name="N10B78"></a><a name="Job+Outputs"></a>
 <h5>Job Outputs</h5>
 <p>Applications can control compression of job-outputs via the
             <a href="api/org/apache/hadoop/mapred/OutputFormatBase.html#setCompressOutput(org.apache.hadoop.mapred.JobConf,%20boolean)">
@@ -1917,7 +1921,7 @@ document.write("Last Published: " + document.lastModified);
 </div>
 
     
-<a name="N10BA4"></a><a name="Example%3A+WordCount+v2.0"></a>
+<a name="N10BA7"></a><a name="Example%3A+WordCount+v2.0"></a>
 <h2 class="h3">Example: WordCount v2.0</h2>
 <div class="section">
 <p>Here is a more complete <span class="codefrag">WordCount</span> which uses many of the
@@ -1927,7 +1931,7 @@ document.write("Last Published: " + document.lastModified);
       <a href="quickstart.html#SingleNodeSetup">pseudo-distributed</a> or
       <a href="quickstart.html#Fully-Distributed+Operation">fully-distributed</a> 
       Hadoop installation.</p>
-<a name="N10BBE"></a><a name="Source+Code-N10BBE"></a>
+<a name="N10BC1"></a><a name="Source+Code-N10BC1"></a>
 <h3 class="h4">Source Code</h3>
 <table class="ForrestTable" cellspacing="1" cellpadding="4">
           
@@ -3137,7 +3141,7 @@ document.write("Last Published: " + document.lastModified);
 </tr>
         
 </table>
-<a name="N11320"></a><a name="Sample+Runs"></a>
+<a name="N11323"></a><a name="Sample+Runs"></a>
 <h3 class="h4">Sample Runs</h3>
 <p>Sample text-files as input:</p>
 <p>
@@ -3305,7 +3309,7 @@ document.write("Last Published: " + document.lastModified);
 <br>
         
 </p>
-<a name="N113F4"></a><a name="Highlights"></a>
+<a name="N113F7"></a><a name="Highlights"></a>
 <h3 class="h4">Highlights</h3>
 <p>The second version of <span class="codefrag">WordCount</span> improves upon the 
         previous one by using some features offered by the Map-Reduce framework:

File diff suppressed because it is too large
+ 1 - 1
docs/mapred_tutorial.pdf


+ 6 - 3
src/docs/src/documentation/content/xdocs/mapred_tutorial.xml

@@ -1282,10 +1282,13 @@
           <p>The application-writer can take advantage of this feature by 
           creating any side-files required in <code>${mapred.output.dir}</code> 
           during execution of a task via 
-          <a href="ext:api/org/apache/hadoop/mapred/jobconf/getoutputpath">
-          JobConf.getOutputPath()</a>, and the framework will promote them 
+          <a href="ext:api/org/apache/hadoop/mapred/jobconf/getcurrentoutputpath">
+          JobConf.getCurrentOutputPath()</a>, and the framework will promote them 
           similarly for succesful task-attempts, thus eliminating the need to 
-          pick unique paths per task-attempt.</p>
+          pick unique paths per task-attempt. She can get the actual configured 
+          path (final output path) via 
+          <a href="ext:api/org/apache/hadoop/mapred/jobconf/getfinaloutputpath">
+          JobConf.getFinalOutputPath()</a></p>
         </section>
         
         <section>

+ 2 - 1
src/docs/src/documentation/content/xdocs/site.xml

@@ -136,7 +136,8 @@ See http://forrest.apache.org/docs/linking.html for more info.
                 <setoutputvaluegroupingcomparator href="#setOutputValueGroupingComparator(java.lang.Class)" />
                 <setinputpath href="#setInputPath(org.apache.hadoop.fs.Path)" />
                 <addinputpath href="#addInputPath(org.apache.hadoop.fs.Path)" />
-                <getoutputpath href="#getOutputPath()" />
+                <getcurrentoutputpath href="#getCurrentOutputPath()" />
+                <getfinaloutputpath href="#getFinalOutputPath()" />
                 <setoutputpath href="#setOutputPath(org.apache.hadoop.fs.Path)" />
                 <setcombinerclass href="#setCombinerClass(java.lang.Class)" />
                 <setmapdebugscript href="#setMapDebugScript(java.lang.String)" />

+ 1 - 1
src/examples/org/apache/hadoop/examples/RandomWriter.java

@@ -105,7 +105,7 @@ public class RandomWriter extends Configured implements Tool {
     public InputSplit[] getSplits(JobConf job, 
                                   int numSplits) throws IOException {
       InputSplit[] result = new InputSplit[numSplits];
-      Path outDir = job.getOutputPath();
+      Path outDir = job.getCurrentOutputPath();
       for(int i=0; i < result.length; ++i) {
         result[i] = new FileSplit(new Path(outDir, "dummy-split-" + i), 0, 1, job);
       }

+ 1 - 1
src/examples/org/apache/hadoop/examples/Sort.java

@@ -140,7 +140,7 @@ public class Sort extends Configured implements Tool {
         cluster.getTaskTrackers() +
         " nodes to sort from " + 
         jobConf.getInputPaths()[0] + " into " +
-        jobConf.getOutputPath() + " with " + num_reduces + " reduces.");
+        jobConf.getCurrentOutputPath() + " with " + num_reduces + " reduces.");
     Date startTime = new Date();
     System.out.println("Job started: " + startTime);
     JobClient.runJob(jobConf);

+ 37 - 6
src/java/org/apache/hadoop/mapred/JobConf.java

@@ -353,7 +353,20 @@ public class JobConf extends Configuration {
   }
   
   /**
-   * Get the {@link Path} to the output directory for the map-reduce job.
+   * @deprecated Please use {@link #getCurrentOutputPath()} 
+   *             or {@link #getFinalOutputPath()} 
+   *             
+   * @return the {@link Path} to the output directory for the map-reduce job.
+   */
+  @Deprecated
+  public Path getOutputPath() {
+    return getCurrentOutputPath();
+  }
+
+  /**
+   * Get the {@link Path} to the output directory for the map-reduce job
+   * (This is sensitive to the task execution. While executing task, this 
+   * value points to the task's temporary output directory)
    * 
    * <h4 id="SideEffectFiles">Tasks' Side-Effect Files</h4>
    * 
@@ -378,28 +391,44 @@ public class JobConf extends Configuration {
    * 
    * <p>The application-writer can take advantage of this by creating any 
    * side-files required in <tt>${mapred.output.dir}</tt> during execution of his 
-   * reduce-task i.e. via {@link #getOutputPath()}, and the framework will move 
-   * them out similarly - thus she doesn't have to pick unique paths per 
-   * task-attempt.</p>
+   * reduce-task i.e. via {@link #getCurrentOutputPath()}, 
+   * and the framework will move them out similarly 
+   * - thus she doesn't have to pick unique paths per task-attempt.</p>
    * 
    * <p><i>Note</i>: the value of <tt>${mapred.output.dir}</tt> during execution 
    * of a particular task-attempt is actually 
    * <tt>${mapred.output.dir}/_temporary/_{$taskid}</tt>, not the value set by 
    * {@link #setOutputPath(Path)}. So, just create any side-files in the path 
-   * returned by {@link #getOutputPath()} from map/reduce task to take 
+   * returned by {@link #getCurrentOutputPath()} from map/reduce task to take 
    * advantage of this feature.</p>
    * 
    * <p>The entire discussion holds true for maps of jobs with 
    * reducer=NONE (i.e. 0 reduces) since output of the map, in that case, 
    * goes directly to HDFS.</p> 
    * 
+   * @see #getFinalOutputPath()
+   * 
    * @return the {@link Path} to the output directory for the map-reduce job.
    */
-  public Path getOutputPath() { 
+  public Path getCurrentOutputPath() { 
     String name = get("mapred.output.dir");
     return name == null ? null: new Path(name);
   }
 
+  /**
+   * Get the {@link Path} to the output directory for the map-reduce job
+   * 
+   * This is the actual configured output path set 
+   * using {@link #setOutputPath(Path)} for job submission.
+   * 
+   * @see #getCurrentOutputPath()
+   * @return the {@link Path} to the output directory for the map-reduce job.
+   */
+  public Path getFinalOutputPath() { 
+    String name = get("mapred.final.output.dir");
+    return name == null ? null: new Path(name);
+  }
+
   /**
    * Set the {@link Path} of the output directory for the map-reduce job.
    * 
@@ -410,6 +439,8 @@ public class JobConf extends Configuration {
   public void setOutputPath(Path dir) {
     dir = new Path(getWorkingDirectory(), dir);
     set("mapred.output.dir", dir.toString());
+    if (get("mapred.final.output.dir") == null)
+      set("mapred.final.output.dir", dir.toString());
   }
 
   /**
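
To make the new mapred.final.output.dir bookkeeping above concrete, a small hedged sketch (the paths are made up; the behavior follows the patched setOutputPath, which records the final directory only on the first call):

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.JobConf;

    // Illustrative only; mirrors what the framework does per task-attempt.
    public class FinalOutputPathDemo {
      public static void main(String[] args) {
        JobConf conf = new JobConf();
        // First call sets both mapred.output.dir and mapred.final.output.dir.
        conf.setOutputPath(new Path("/user/out"));
        // The framework later re-points mapred.output.dir at the task's
        // temporary directory; mapred.final.output.dir stays untouched.
        conf.setOutputPath(new Path("/user/out/_temporary/_task_0001"));
        System.out.println(conf.getCurrentOutputPath()); // .../_temporary/_task_0001
        System.out.println(conf.getFinalOutputPath());   // /user/out
      }
    }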

+ 2 - 2
src/java/org/apache/hadoop/mapred/JobInProgress.java

@@ -277,7 +277,7 @@ class JobInProgress {
     }
 
     // create job specific temporary directory in output path
-    Path outputPath = conf.getOutputPath();
+    Path outputPath = conf.getCurrentOutputPath();
     if (outputPath != null) {
       Path tmpDir = new Path(outputPath, MRConstants.TEMP_DIR_NAME);
       FileSystem fileSys = tmpDir.getFileSystem(conf);
@@ -1141,7 +1141,7 @@ class JobInProgress {
       fs.delete(tempDir); 
 
       // delete the temporary directory in output directory
-      Path outputPath = conf.getOutputPath();
+      Path outputPath = conf.getCurrentOutputPath();
       if (outputPath != null) {
         Path tmpDir = new Path(outputPath, MRConstants.TEMP_DIR_NAME);
         FileSystem fileSys = tmpDir.getFileSystem(conf);

+ 1 - 1
src/java/org/apache/hadoop/mapred/LocalJobRunner.java

@@ -114,7 +114,7 @@ class LocalJobRunner implements JobSubmissionProtocol {
           job.setNumReduceTasks(1);
         }
         // create job specific temp directory in output path
-        Path outputPath = job.getOutputPath();
+        Path outputPath = job.getCurrentOutputPath();
         FileSystem outputFs = null;
         Path tmpDir = null;
         if (outputPath != null) {

+ 1 - 1
src/java/org/apache/hadoop/mapred/MapFileOutputFormat.java

@@ -42,7 +42,7 @@ public class MapFileOutputFormat extends OutputFormatBase {
                                       String name, Progressable progress)
     throws IOException {
 
-    Path outputPath = job.getOutputPath();
+    Path outputPath = job.getCurrentOutputPath();
     FileSystem fs = outputPath.getFileSystem(job);
     if (!fs.exists(outputPath)) {
       throw new IOException("Output directory doesnt exist");

+ 1 - 1
src/java/org/apache/hadoop/mapred/OutputFormatBase.java

@@ -100,7 +100,7 @@ public abstract class OutputFormatBase<K extends WritableComparable,
     throws FileAlreadyExistsException, 
            InvalidJobConfException, IOException {
     // Ensure that the output directory is set and not already there
-    Path outDir = job.getOutputPath();
+    Path outDir = job.getCurrentOutputPath();
     if (outDir == null && job.getNumReduceTasks() != 0) {
       throw new InvalidJobConfException("Output directory not set in JobConf.");
     }

+ 1 - 1
src/java/org/apache/hadoop/mapred/SequenceFileOutputFormat.java

@@ -40,7 +40,7 @@ public class SequenceFileOutputFormat extends OutputFormatBase {
                                       String name, Progressable progress)
     throws IOException {
 
-    Path outputPath = job.getOutputPath();
+    Path outputPath = job.getCurrentOutputPath();
     FileSystem fs = outputPath.getFileSystem(job);
     if (!fs.exists(outputPath)) {
       throw new IOException("Output directory doesnt exist");

+ 3 - 3
src/java/org/apache/hadoop/mapred/Task.java

@@ -190,7 +190,7 @@ abstract class Task implements Writable, Configurable {
   public String toString() { return taskId; }
 
   private Path getTaskOutputPath(JobConf conf) {
-    Path p = new Path(conf.getOutputPath(), 
+    Path p = new Path(conf.getCurrentOutputPath(), 
       (MRConstants.TEMP_DIR_NAME + Path.SEPARATOR + "_" + taskId));
     try {
       FileSystem fs = p.getFileSystem(conf);
@@ -212,7 +212,7 @@ abstract class Task implements Writable, Configurable {
     conf.set("mapred.job.id", jobId);
     
     // The task-specific output path
-    if (conf.getOutputPath() != null) {
+    if (conf.getCurrentOutputPath() != null) {
       taskOutputPath = getTaskOutputPath(conf);
       conf.setOutputPath(taskOutputPath);
     }
@@ -397,7 +397,7 @@ abstract class Task implements Writable, Configurable {
       this.conf = (JobConf) conf;
 
       if (taskId != null && taskOutputPath == null && 
-              this.conf.getOutputPath() != null) {
+              this.conf.getCurrentOutputPath() != null) {
         taskOutputPath = getTaskOutputPath(this.conf);
       }
     } else {

+ 1 - 1
src/java/org/apache/hadoop/mapred/TaskTracker.java

@@ -1420,7 +1420,7 @@ public class TaskTracker
       keepFailedTaskFiles = localJobConf.getKeepFailedTaskFiles();
 
       // create _taskid directory in output path temporary directory.
-      Path outputPath = localJobConf.getOutputPath();
+      Path outputPath = localJobConf.getCurrentOutputPath();
       if (outputPath != null) {
         Path jobTmpDir = new Path(outputPath, MRConstants.TEMP_DIR_NAME);
         FileSystem fs = jobTmpDir.getFileSystem(localJobConf);

+ 1 - 1
src/java/org/apache/hadoop/mapred/TextOutputFormat.java

@@ -106,7 +106,7 @@ public class TextOutputFormat<K extends WritableComparable,
                                                   Progressable progress)
     throws IOException {
 
-    Path dir = job.getOutputPath();
+    Path dir = job.getCurrentOutputPath();
     FileSystem fs = dir.getFileSystem(job);
     if (!fs.exists(dir)) {
       throw new IOException("Output directory doesnt exist");

+ 1 - 1
src/test/org/apache/hadoop/io/FileBench.java

@@ -112,7 +112,7 @@ public class FileBench extends Configured implements Tool {
     Text val = new Text();
 
     final String fn = conf.get("test.filebench.name", "");
-    final Path outd = conf.getOutputPath();
+    final Path outd = conf.getCurrentOutputPath();
     OutputFormat outf = conf.getOutputFormat();
     RecordWriter<Text,Text> rw =
       outf.getRecordWriter(outd.getFileSystem(conf), conf, fn,

+ 1 - 1
src/test/org/apache/hadoop/mapred/GenericMRLoadGenerator.java

@@ -140,7 +140,7 @@ public class GenericMRLoadGenerator extends Configured implements Tool {
       return -1;
     }
 
-    if (null == job.getOutputPath()) {
+    if (null == job.getCurrentOutputPath()) {
       // No output dir? No writes
       job.setOutputFormat(NullOutputFormat.class);
     }

+ 1 - 1
src/test/org/apache/hadoop/mapred/MRBench.java

@@ -184,7 +184,7 @@ public class MRBench {
 
       LOG.info("Running job " + i + ":" +
                " input=" + jobConf.getInputPaths()[0] + 
-               " output=" + jobConf.getOutputPath());
+               " output=" + jobConf.getCurrentOutputPath());
       
       // run the mapred task now 
       long curTime = System.currentTimeMillis();

+ 3 - 2
src/test/org/apache/hadoop/mapred/SortValidator.java

@@ -351,7 +351,7 @@ public class SortValidator extends Configured implements Tool {
                          "from " + jobConf.getInputPaths()[0] + " (" + 
                          noSortInputpaths + " files), " + 
                          jobConf.getInputPaths()[1] + " (" + noSortReduceTasks + 
-                         " files) into " + jobConf.getOutputPath() + 
+                         " files) into " + jobConf.getCurrentOutputPath() + 
                          " with 1 reducer.");
       Date startTime = new Date();
       System.out.println("Job started: " + startTime);
@@ -492,7 +492,8 @@ public class SortValidator extends Configured implements Tool {
       System.out.println("\nSortValidator.RecordChecker: Running on " +
                          cluster.getTaskTrackers() +
                          " nodes to validate sort from " + jobConf.getInputPaths()[0] + ", " + 
-                         jobConf.getInputPaths()[1] + " into " + jobConf.getOutputPath() + 
+                         jobConf.getInputPaths()[1] + " into " +
+                         jobConf.getCurrentOutputPath() + 
                          " with " + noReduces + " reduces.");
       Date startTime = new Date();
       System.out.println("Job started: " + startTime);

+ 1 - 1
src/test/org/apache/hadoop/mapred/ThreadedMapBenchmark.java

@@ -78,7 +78,7 @@ public class ThreadedMapBenchmark extends Configured implements Tool {
     public InputSplit[] getSplits(JobConf job, 
                                   int numSplits) throws IOException {
       InputSplit[] result = new InputSplit[numSplits];
-      Path outDir = job.getOutputPath();
+      Path outDir = job.getCurrentOutputPath();
       for(int i=0; i < result.length; ++i) {
         result[i] = new FileSplit(new Path(outDir, "dummy-split-" + i), 0, 1, 
                                   job);

Some files were not shown because too many files changed in this diff.