
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html><head>
<title>Hadoop 0.17.2 Release Notes</title></head>
<body>
<font face="sans-serif">
<h1>Hadoop 0.17.2 Release Notes</h1>
The bug fixes are listed below.
<ul><a name="changes"></a>
<h2>Changes Since Hadoop 0.17.1</h2>
<ul>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3678'>HADOOP-3678</a>] - Avoid spurious exceptions logged at DataNode when clients
read from DFS.</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3760'>HADOOP-3760</a>] - Fix a bug with HDFS file close()</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3707'>HADOOP-3707</a>] - NameNode keeps a count of number of blocks scheduled
to be written to a datanode and uses it to avoid allocating more
blocks than a datanode can hold.</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3681'>HADOOP-3681</a>] - DFSClient can get into an infinite loop while closing
a file if there are some errors.</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3002'>HADOOP-3002</a>] - Hold off block removal while in safe mode.</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3685'>HADOOP-3685</a>] - Unbalanced replication target.</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3758'>HADOOP-3758</a>] - Shutdown datanode on version mismatch instead of retrying
continuously, preventing excessive logging at the namenode.</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3633'>HADOOP-3633</a>] - Correct exception handling in DataXceiveServer, and throttle
the number of xceiver threads in a data-node.</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3370'>HADOOP-3370</a>] - Ensure that the TaskTracker.runningJobs data-structure is
correctly cleaned-up on task completion.</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3813'>HADOOP-3813</a>] - Fix task-output clean-up on HDFS to use the recursive
FileSystem.delete rather than the FileUtil.fullyDelete.</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3859'>HADOOP-3859</a>] - Allow the maximum number of xceivers in the data node to
be configurable.</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3931'>HADOOP-3931</a>] - Fix a corner case in the map-side sort that causes some values
to be counted as too large, causing premature spills to disk. Some values
would also incorrectly bypass the combiner.</li>
</ul>
</ul>
<h1>Hadoop 0.17.1 Release Notes</h1>
The bug fixes are listed below.
<ul><a name="changes"></a>
<h2>Changes Since Hadoop 0.17.0</h2>
<ul>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-2159'>HADOOP-2159</a>] - Namenode stuck in safemode
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3442'>HADOOP-3442</a>] - QuickSort may get into unbounded recursion
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3472'>HADOOP-3472</a>] - MapFile.Reader getClosest() function returns incorrect results when before is true
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3475'>HADOOP-3475</a>] - MapOutputBuffer allocates 4x as much space to record capacity as intended
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3477'>HADOOP-3477</a>] - release tar.gz contains duplicate files
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3522'>HADOOP-3522</a>] - ValuesIterator.next() doesn't return a new object, thus failing many equals() tests.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3526'>HADOOP-3526</a>] - contrib/data_join doesn't work
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3550'>HADOOP-3550</a>] - Reduce tasks failing with OOM
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3565'>HADOOP-3565</a>] - JavaSerialization can throw java.io.StreamCorruptedException
</li>
</ul>
</ul>
<h1>Hadoop 0.17.0 Release Notes</h1>
These release notes include new developer- and user-facing incompatibilities, features, and major improvements. The table below is sorted by Component.
<ul><a name="changes"></a>
<h2>Changes Since Hadoop 0.16.4</h2>
<table border="1" width="100%" cellpadding="4">
<tbody><tr>
<td><b>Issue</b></td>
<td><b>Component</b></td>
<td><b>Notes</b></td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2828">HADOOP-2828</a>
</td>
<td>
conf
</td>
<td>
Remove these deprecated methods in
<tt>org.apache.hadoop.conf.Configuration</tt>:<br><tt><ul><li>
public Object getObject(String name) </li><li>
public void setObject(String name, Object value) </li><li>
public Object get(String name, Object defaultValue) </li><li>
public void set(String name, Object value)</li><li>public Iterator entries()
</li></ul></tt></td>
</tr>
<tr>
<td nowrap>
<a href="https://issues.apache.org/jira/browse/HADOOP-2410">HADOOP-2410</a>
</td>
<td>
contrib/ec2
</td>
<td>
The command <tt>hadoop-ec2
run</tt> has been replaced by <tt>hadoop-ec2 launch-cluster
&lt;group&gt; &lt;number of instances&gt;</tt>, and <tt>hadoop-ec2
start-hadoop</tt> has been removed since Hadoop is started on instance
start up. See <a href="http://wiki.apache.org/hadoop/AmazonEC2">http://wiki.apache.org/hadoop/AmazonEC2</a>
for details.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2796">HADOOP-2796</a>
</td>
<td>
contrib/hod
</td>
<td>
Added a provision to reliably detect a
failing script's exit code. When the HOD script option
returns a non-zero exit code, look for a <tt>script.exitcode</tt>
file written to the HOD cluster directory. If this file is present, it
means the script failed with the exit code given in the file.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2775">HADOOP-2775</a>
</td>
<td>
contrib/hod
</td>
<td>
Added a unit testing framework based on
pyunit to HOD. Developers contributing patches to HOD should now
contribute unit tests along with the patches when possible.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-3137">HADOOP-3137</a>
</td>
<td>
contrib/hod
</td>
<td>
The HOD version is now the same as the Hadoop version.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2855">HADOOP-2855</a>
</td>
<td>
contrib/hod
</td>
<td>
HOD now handles relative
paths correctly for important HOD options such as the cluster directory,
tarball option, and script file.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2899">HADOOP-2899</a>
</td>
<td>
contrib/hod
</td>
<td>
HOD now cleans up the HOD generated mapred system directory
at cluster deallocation time.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2982">HADOOP-2982</a>
</td>
<td>
contrib/hod
</td>
<td>
The number of free nodes in the cluster
is computed using a better algorithm that filters out inconsistencies in
node status as reported by Torque.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2947">HADOOP-2947</a>
</td>
<td>
contrib/hod
</td>
<td>
The stdout and stderr streams of
daemons are redirected to files that are created under the hadoop log
directory. Users can now send a <tt>kill -3</tt> signal to the daemons to get stack traces
and thread dumps for debugging.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-3168">HADOOP-3168</a>
</td>
<td>
contrib/streaming
</td>
<td>
Decreased the frequency of logging
in Hadoop streaming (from every 100 records to every 10,000 records).
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-3040">HADOOP-3040</a>
</td>
<td>
contrib/streaming
</td>
<td>
Fixed a critical bug to restore important functionality in Hadoop streaming. If the first character on a line is
the separator, then an empty key is assumed and the whole line is the value.
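<p>
As an illustration of this rule, a minimal stand-alone sketch of the split (not Hadoop streaming's actual implementation; the class name and the tab separator here are assumptions):
</p>

```java
public class StreamingSplit {
    // Split a streaming line into {key, value} using the given separator.
    public static String[] split(String line, char sep) {
        int i = line.indexOf(sep);
        if (i == 0) {
            // Separator is the first character: empty key, whole line is the value.
            return new String[] {"", line};
        } else if (i > 0) {
            return new String[] {line.substring(0, i), line.substring(i + 1)};
        }
        // No separator at all: the whole line is the key.
        return new String[] {line, ""};
    }

    public static void main(String[] args) {
        String[] kv = split("\tvalue only", '\t');
        System.out.println("key=[" + kv[0] + "] value=[" + kv[1] + "]");
    }
}
```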
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2820">HADOOP-2820</a>
</td>
<td>
contrib/streaming
</td>
<td>
Removed these deprecated classes: <br><tt><ul><li>org.apache.hadoop.streaming.StreamLineRecordReader</li><li>org.apache.hadoop.streaming.StreamOutputFormat</li><li>org.apache.hadoop.streaming.StreamSequenceRecordReader</li></ul></tt></td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-3280">HADOOP-3280</a>
</td>
<td>
contrib/streaming
</td>
<td>
Added the
<tt>mapred.child.ulimit</tt> configuration variable to limit the maximum virtual memory allocated to processes launched by the
Map-Reduce framework. This can be used to control both the Mapper/Reducer
tasks and applications using Hadoop pipes, Hadoop streaming etc.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2657">HADOOP-2657</a>
</td>
<td>
dfs
</td>
<td>Added the new API <tt>DFSOutputStream.flush()</tt> to
flush all outstanding data to DataNodes.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2219">HADOOP-2219</a>
</td>
<td>
dfs
</td>
<td>
Added a new <tt>fs -count</tt> command for
counting the number of bytes, files, and directories under a given path. <br>
<br>
Added a new RPC <tt>getContentSummary(String path)</tt> to ClientProtocol.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2559">HADOOP-2559</a>
</td>
<td>
dfs
</td>
<td>
Changed DFS block placement to
allocate the first replica locally, the second off-rack, and the third
intra-rack from the second.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2758">HADOOP-2758</a>
</td>
<td>
dfs
</td>
<td>
Improved DataNode CPU usage by 50% while serving data to clients.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2634">HADOOP-2634</a>
</td>
<td>
dfs
</td>
<td>
Deprecated ClientProtocol's <tt>exists()</tt> method. Use <tt>getFileInfo(String)</tt> instead.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2423">HADOOP-2423</a>
</td>
<td>
dfs
</td>
<td>
Improved <tt>FSDirectory.mkdirs(...)</tt> performance by about 50% as measured by the NNThroughputBenchmark.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-3124">HADOOP-3124</a>
</td>
<td>
dfs
</td>
<td>
Made the DataNode socket write timeout configurable; however, the configuration variable is undocumented.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2470">HADOOP-2470</a>
</td>
<td>
dfs
</td>
<td>
Removed the <tt>open()</tt> and <tt>isDir()</tt> methods from ClientProtocol without first deprecating them. <br>
<br>
Removed the deprecated <tt>getContentLength()</tt> from ClientProtocol.<br>
<br>
Deprecated <tt>isDirectory</tt> in DFSClient. Use <tt>getFileStatus()</tt> instead.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2854">HADOOP-2854</a>
</td>
<td>
dfs
</td>
<td>
Removed deprecated method <tt>org.apache.hadoop.ipc.Server.getUserInfo()</tt>.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2239">HADOOP-2239</a>
</td>
<td>
dfs
</td>
<td>
Added a new FileSystem, HftpsFileSystem, that allows access to HDFS data over HTTPS.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-771">HADOOP-771</a>
</td>
<td>
dfs
</td>
<td>
Added a new method to <tt>FileSystem</tt> API, <tt>delete(path, boolean)</tt>,
and deprecated the previous <tt>delete(path)</tt> method.
The new method recursively deletes files only if boolean is set to true.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-3239">HADOOP-3239</a>
</td>
<td>
dfs
</td>
<td>
Modified <tt>org.apache.hadoop.dfs.FSDirectory.getFileInfo(String)</tt> to return null when a file is not
found instead of throwing FileNotFoundException.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-3091">HADOOP-3091</a>
</td>
<td>
dfs
</td>
<td>
Enhanced <tt>hadoop dfs -put</tt> command to accept multiple
sources when destination is a directory.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2192">HADOOP-2192</a>
</td>
<td>
dfs
</td>
<td>
Modified <tt>hadoop dfs -mv</tt> to be closer in functionality to
the Linux <tt>mv</tt> command by removing unnecessary output and returning
an error message when moving non-existent files/directories.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-1985">HADOOP-1985</a>
</td>
<td>
dfs <br>
mapred
</td>
<td>
Added rack awareness for map tasks and moved the rack resolution logic to the
NameNode and JobTracker. <p> The administrator can specify a
loadable class via <tt>topology.node.switch.mapping.impl</tt> that
implements the logic for rack resolution. The class must implement
a method <tt>resolve(List&lt;String&gt; names)</tt>, where names is the list of
DNS names/IP addresses that we want resolved. The return value is a list of
resolved network paths of the form /foo/rack, where rack is the rack ID
to which the node belongs and foo is the switch where multiple racks are
connected, and so on. The default implementation of this class is packaged
along with Hadoop and points to org.apache.hadoop.net.ScriptBasedMapping,
which loads a script that can be used for rack resolution. The
script location is configurable: it is specified by
<tt>topology.script.file.name</tt> and defaults to an empty script. When
the script name is empty, /default-rack is returned for all
DNS names/IP addresses. The loadable <tt>topology.node.switch.mapping.impl</tt> gives
administrators the flexibility to define how their site's node resolution
should happen. <br>
For mapred, one can also specify the cache level with respect to the number of
levels in the resolved network path; it defaults to two, meaning the
JobTracker will cache tasks at the host level and at the rack level. <br>
Known issue: task caching will not work with levels greater than 2
(beyond racks). This bug is tracked in <a href="https://issues.apache.org/jira/browse/HADOOP-3296">HADOOP-3296</a>.
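<p>
To make the resolve contract above concrete, a minimal stand-alone sketch (illustrative only: the class name RackTableMapping and its host table are hypothetical; a real implementation would be the class named by topology.node.switch.mapping.impl):
</p>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RackTableMapping {
    // Maps each DNS name/IP address to a network path of the form /switch/rack.
    public List<String> resolve(List<String> names) {
        List<String> paths = new ArrayList<String>();
        for (String name : names) {
            if (name.equals("host1.example.com")) {
                paths.add("/switch1/rack1");
            } else if (name.equals("host2.example.com")) {
                paths.add("/switch1/rack2");
            } else {
                // Mirrors the default behavior with an empty topology script.
                paths.add("/default-rack");
            }
        }
        return paths;
    }

    public static void main(String[] args) {
        RackTableMapping m = new RackTableMapping();
        System.out.println(m.resolve(Arrays.asList("host1.example.com", "unknown-host")));
    }
}
```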
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2063">HADOOP-2063</a>
</td>
<td>
fs
</td>
<td>
Added a new option <tt>-ignoreCrc</tt> to <tt>fs -get</tt> and <tt>fs -copyToLocal</tt>. The option causes CRC checksums to be
ignored for this command so that corrupt files may be downloaded.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-3001">HADOOP-3001</a>
</td>
<td>
fs
</td>
<td>
Added new Map/Reduce framework
counters that track the number of bytes read and written to HDFS, local,
KFS, and S3 file systems.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2027">HADOOP-2027</a>
</td>
<td>
fs
</td>
<td>
Added a new FileSystem method <tt>getFileBlockLocations</tt> to return the number of bytes in each block in a file
via a single rpc to the NameNode. Deprecated <tt>getFileCacheHints</tt>.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2839">HADOOP-2839</a>
</td>
<td>
fs
</td>
<td>
Removed deprecated method <tt>org.apache.hadoop.fs.FileSystem.globPaths()</tt>.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2563">HADOOP-2563</a>
</td>
<td>
fs
</td>
<td>
Removed deprecated method <tt>org.apache.hadoop.fs.FileSystem.listPaths()</tt>.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-1593">HADOOP-1593</a>
</td>
<td>
fs
</td>
<td>
Modified FSShell commands to accept non-default paths. Now you can run commands like <tt>hadoop dfs -ls hdfs://remotehost1:port/path</tt>
and <tt>hadoop dfs -ls hdfs://remotehost2:port/path</tt> without changing your Hadoop configuration.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-3048">HADOOP-3048</a>
</td>
<td>
io
</td>
<td>
Added a new API and a default
implementation to convert and restore serializations of objects to strings.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-3152">HADOOP-3152</a>
</td>
<td>
io
</td>
<td>
Added a static method
<tt>MapFile.setIndexInterval(Configuration, int interval)</tt> so that Map/Reduce
jobs using <tt>MapFileOutputFormat</tt> can set the index interval.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-3073">HADOOP-3073</a>
</td>
<td>
ipc
</td>
<td>
<tt>SocketOutputStream.close()</tt> now closes the
underlying channel. This increases compatibility with
<tt>java.net.Socket.getOutputStream</tt>.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-3041">HADOOP-3041</a>
</td>
<td>
mapred
</td>
<td>
Deprecated <tt>JobConf.setOutputPath</tt> and <tt>JobConf.getOutputPath</tt>.<p>
Deprecated <tt>OutputFormatBase</tt>. Added <tt>FileOutputFormat</tt>. Existing output
formats extending <tt>OutputFormatBase</tt> now extend <tt>FileOutputFormat</tt>. <p>
Added the following methods to <tt>FileOutputFormat</tt>:
<tt><ul>
<li>public static void setOutputPath(JobConf conf, Path outputDir)
<li>public static Path getOutputPath(JobConf conf)
<li>public static Path getWorkOutputPath(JobConf conf)
<li>static void setWorkOutputPath(JobConf conf, Path outputDir)
</ul></tt>
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-3204">HADOOP-3204</a>
</td>
<td>
mapred
</td>
<td>
Fixed <tt>ReduceTask.LocalFSMerger</tt> to handle errors and exceptions better. Previously, all
exceptions except IOException were silently ignored.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-1986">HADOOP-1986</a>
</td>
<td>
mapred
</td>
<td>
Programs that implement the raw
<tt>Mapper</tt> or <tt>Reducer</tt> interfaces will need modification to compile with this
release. For example, <p>
<pre>
class MyMapper implements Mapper {
  public void map(WritableComparable key, Writable val,
                  OutputCollector out, Reporter reporter) throws IOException {
    // ...
  }
  // ...
}
</pre>
will need to be changed to refer to the parameterized type. For example: <p>
<pre>
class MyMapper implements Mapper&lt;WritableComparable, Writable, WritableComparable, Writable&gt; {
  public void map(WritableComparable key, Writable val,
                  OutputCollector&lt;WritableComparable, Writable&gt; out,
                  Reporter reporter) throws IOException {
    // ...
  }
  // ...
}
</pre>
Similarly implementations of the following raw interfaces will need
modification:
<tt><ul>
<li>InputFormat
<li>OutputCollector
<li>OutputFormat
<li>Partitioner
<li>RecordReader
<li>RecordWriter
</ul></tt>
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-910">HADOOP-910</a>
</td>
<td>
mapred
</td>
<td>
Reducers now perform merges of
shuffle data (both in-memory and on disk) while fetching map outputs.
Previously, they merged only the in-memory outputs during the shuffle.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2822">HADOOP-2822</a>
</td>
<td>
mapred
</td>
<td>
Removed the deprecated classes <tt>org.apache.hadoop.mapred.InputFormatBase</tt>
and <tt>org.apache.hadoop.mapred.PhasedFileSystem</tt>.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2817">HADOOP-2817</a>
</td>
<td>
mapred
</td>
<td>
Removed the deprecated method
<tt>org.apache.hadoop.mapred.ClusterStatus.getMaxTasks()</tt>
and the deprecated configuration property <tt>mapred.tasktracker.tasks.maximum</tt>.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2825">HADOOP-2825</a>
</td>
<td>
mapred
</td>
<td>
Removed the deprecated method
<tt>org.apache.hadoop.mapred.MapOutputLocation.getFile(FileSystem fileSys, Path
localFilename, int reduce, Progressable pingee, int timeout)</tt>.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2818">HADOOP-2818</a>
</td>
<td>
mapred
</td>
<td>
Removed the deprecated methods
<tt>org.apache.hadoop.mapred.Counters.getDisplayName(String counter)</tt> and
<tt>org.apache.hadoop.mapred.Counters.getCounterNames()</tt>.
Undeprecated the method
<tt>org.apache.hadoop.mapred.Counters.getCounter(String counterName)</tt>.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2826">HADOOP-2826</a>
</td>
<td>
mapred
</td>
<td>
Changed the signature of the method
<tt>public org.apache.hadoop.streaming.UTF8ByteArrayUtils.readLine(InputStream)</tt> to
<tt>UTF8ByteArrayUtils.readLine(LineReader, Text)</tt>. Since the old
signature was changed rather than deprecated, any code using the old method must be updated
to use the new method.
<p>
Removed the deprecated methods <tt>org.apache.hadoop.mapred.FileSplit.getFile()</tt>
and <tt>org.apache.hadoop.mapred.LineRecordReader.readLine(InputStream in,
OutputStream out)</tt>.
<p>
Made the constructor <tt>org.apache.hadoop.mapred.LineRecordReader.LineReader(InputStream in, Configuration
conf)</tt> public.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2819">HADOOP-2819</a>
</td>
<td>
mapred
</td>
<td>
Removed these deprecated methods from <tt>org.apache.hadoop.JobConf</tt>:
<tt><ul>
<li>public Class getInputKeyClass()
<li>public void setInputKeyClass(Class theClass)
<li>public Class getInputValueClass()
<li>public void setInputValueClass(Class theClass)
</ul></tt>
and undeprecated these methods:
<tt><ul>
<li>getSpeculativeExecution()
<li>public void setSpeculativeExecution(boolean speculativeExecution)
</ul></tt>
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-3093">HADOOP-3093</a>
</td>
<td>
mapred
</td>
<td>
Added the following public methods to <tt>org.apache.hadoop.conf.Configuration</tt>:
<tt><ul>
<li>String[] Configuration.getStrings(String name, String... defaultValue)
<li>void Configuration.setStrings(String name, String... values)
</ul></tt>
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2399">HADOOP-2399</a>
</td>
<td>
mapred
</td>
<td>
The key and value objects given
to the Combiner and Reducer are now reused between calls. This is much more
efficient, but the user cannot assume the objects are constant.
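<p>
A stand-alone sketch of the resulting pitfall (illustrative only; the Holder class simulates a reused mutable object and is not a Hadoop type):
</p>

```java
import java.util.ArrayList;
import java.util.List;

public class ReuseDemo {
    static class Holder { int value; }

    // BUG: stores the same reused object each time, so every stored
    // element ends up reflecting the last value seen.
    static List<Holder> collectWrong(int[] data) {
        Holder reused = new Holder();
        List<Holder> out = new ArrayList<Holder>();
        for (int d : data) {
            reused.value = d;
            out.add(reused);
        }
        return out;
    }

    // Correct: copy the reused object before keeping a reference to it.
    static List<Holder> collectRight(int[] data) {
        Holder reused = new Holder();
        List<Holder> out = new ArrayList<Holder>();
        for (int d : data) {
            reused.value = d;
            Holder copy = new Holder();
            copy.value = reused.value;
            out.add(copy);
        }
        return out;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3};
        System.out.println(collectWrong(data).get(0).value); // 3, not 1
        System.out.println(collectRight(data).get(0).value); // 1
    }
}
```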
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-3162">HADOOP-3162</a>
</td>
<td>
mapred
</td>
<td>
Deprecated the public methods <tt>org.apache.hadoop.mapred.JobConf.setInputPath(Path)</tt> and
<tt>org.apache.hadoop.mapred.JobConf.addInputPath(Path)</tt>.
<p>
Added the following public methods to <tt>org.apache.hadoop.mapred.FileInputFormat</tt>:
<tt><ul>
<li>public static void setInputPaths(JobConf job, Path... paths); <br>
<li>public static void setInputPaths(JobConf job, String commaSeparatedPaths); <br>
<li>public static void addInputPath(JobConf job, Path path); <br>
<li>public static void addInputPaths(JobConf job, String commaSeparatedPaths); <br>
</ul></tt>
Earlier code calling <tt>JobConf.setInputPath(Path)</tt> and <tt>JobConf.addInputPath(Path)</tt>
should now call <tt>FileInputFormat.setInputPaths(JobConf, Path...)</tt> and
<tt>FileInputFormat.addInputPath(JobConf, Path)</tt> respectively.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2178">HADOOP-2178</a>
</td>
<td>
mapred
</td>
<td>
Provided a new facility to
store job history on DFS. The cluster administrator can now provide either a local FS
location or a DFS location using the configuration property
<tt>mapred.job.history.location</tt> to store job history. History will also
be logged in a user-specified location if the configuration property
<tt>mapred.job.history.user.location</tt> is specified.
  803. <p>
  804. Removed these classes and method:
  805. <tt><ul>
  806. <li>org.apache.hadoop.mapred.DefaultJobHistoryParser.MasterIndex
  807. <li>org.apache.hadoop.mapred.DefaultJobHistoryParser.MasterIndexParseListener
  808. <li>org.apache.hadoop.mapred.DefaultJobHistoryParser.parseMasterIndex
  809. </ul></tt>
  810. <p>
  811. Changed the signature of the public method
  812. <tt>org.apache.hadoop.mapred.DefaultJobHistoryParser.parseJobTasks(File
  813. jobHistoryFile, JobHistory.JobInfo job)</tt> to
  814. <tt>DefaultJobHistoryParser.parseJobTasks(String jobHistoryFile,
  815. JobHistory.JobInfo job, FileSystem fs)</tt>. <p>
  816. Changed the signature of the public method
  817. <tt>org.apache.hadoop.mapred.JobHistory.parseHistory(File path, Listener l)</tt>
  818. to <tt>JobHistory.parseHistoryFromFS(String path, Listener l, FileSystem fs)</tt>.
  819. </td>
  820. </tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2055">HADOOP-2055</a>
</td>
<td>
mapred
</td>
<td>
Users can now specify additional paths to ignore when processing the job input directory
(in addition to filenames starting with "_" or "."),
using two new methods:
<tt><ul>
<li>FileInputFormat.setInputPathFilter(JobConf, PathFilter)
<li>FileInputFormat.getInputPathFilter(JobConf)
</ul></tt>
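<p>
The filtering behavior described above can be sketched as a predicate on file names: the default rule skipping names that start with "_" or "." composes with a user-supplied filter. This is an illustrative, self-contained model, not the Hadoop <tt>PathFilter</tt> interface itself.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Sketch of input-path filtering: the default filter that skips names
// starting with "_" or "." is modeled as a Predicate on the file name,
// and a user-supplied filter composes with it. Illustrative only; not
// the actual Hadoop PathFilter API.
class InputFilterSketch {
    static final Predicate<String> HIDDEN =
        name -> !name.startsWith("_") && !name.startsWith(".");

    static List<String> listInputs(List<String> names, Predicate<String> userFilter) {
        return names.stream()
                    .filter(HIDDEN.and(userFilter))
                    .collect(Collectors.toList());
    }
}
```

With a user filter that rejects <tt>*.tmp</tt> files, a directory containing <tt>_logs</tt>, <tt>.hidden</tt>, <tt>part-00000</tt>, and <tt>skip.tmp</tt> yields only <tt>part-00000</tt>.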
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2116">HADOOP-2116</a>
</td>
<td>
mapred
</td>
<td>
Restructured the local job directory on the tasktracker. Users are provided with a job-specific shared directory
(<tt>mapred-local/taskTracker/jobcache/$jobid/work</tt>) for use as scratch
space, exposed through the configuration property and system property
<tt>job.local.dir</tt>. The directory <tt>../work</tt> is no longer available from the task's current working directory.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-1622">HADOOP-1622</a>
</td>
<td>
mapred
</td>
<td>
Added new command line options for the <tt>hadoop jar</tt> command:
<p>
<tt>hadoop jar -files &lt;comma separated list of files&gt; -libjars &lt;comma
separated list of jars&gt; -archives &lt;comma separated list of
archives&gt; </tt>
<p>
where the options have these meanings:
<p>
<ul>
<li>The <tt>-files</tt> option lets you specify a comma separated list of files that
will be placed in the current working directory of each task. <br>
<li>The <tt>-libjars</tt> option lets you add jars to the classpaths of the maps and
reduces. <br>
<li>The <tt>-archives</tt> option lets you pass archives as arguments; each archive is
unzipped/unjarred, and a link with the name of the jar/zip is created in the
current working directory of each task.
</ul>
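<p>
The shape of these options can be sketched as follows: each flag is followed by a single argument that is split on commas. This is an illustrative parser, not the option handling that ships with Hadoop.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of how -files/-libjars/-archives carry comma-separated lists:
// each flag is followed by one argument that is split on commas.
// Illustrative only; not Hadoop's actual option parser.
class JarOptionsSketch {
    static Map<String, List<String>> parse(String[] args) {
        Map<String, List<String>> opts = new HashMap<>();
        for (int i = 0; i + 1 < args.length; i++) {
            String a = args[i];
            if (a.equals("-files") || a.equals("-libjars") || a.equals("-archives")) {
                opts.put(a.substring(1), Arrays.asList(args[++i].split(",")));
            }
        }
        return opts;
    }
}
```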
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2823">HADOOP-2823</a>
</td>
<td>
record
</td>
<td>
Removed the deprecated methods in
<tt>org.apache.hadoop.record.compiler.generated.SimpleCharStream</tt>:
<tt><ul>
<li>public int getColumn()
<li>public int getLine()
</ul></tt>
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2551">HADOOP-2551</a>
</td>
<td>
scripts
</td>
<td>
Introduced new environment variables to allow finer grained control of Java options passed to server and
client JVMs. See the new <tt>*_OPTS</tt> variables in <tt>conf/hadoop-env.sh</tt>.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-3099">HADOOP-3099</a>
</td>
<td>
util
</td>
<td>
Added a new <tt>-p</tt> option to <tt>distcp</tt> for preserving file and directory status:
<pre>
-p[rbugp]  Preserve status
           r: replication number
           b: block size
           u: user
           g: group
           p: permission
</pre>
The <tt>-p</tt> option alone is equivalent to <tt>-prbugp</tt>.
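<p>
The flag's semantics can be sketched as follows: each letter selects an attribute to preserve, and a bare <tt>-p</tt> selects all of them. This is an illustrative model of the behavior described above, not distcp's actual argument parser.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Sketch of the -p[rbugp] flag: each letter selects an attribute to
// preserve, and a bare "-p" means preserve everything (equivalent to
// -prbugp). Illustrative only; not distcp's actual parser.
class PreserveFlagSketch {
    static final String ALL = "rbugp";

    static Set<Character> parse(String flag) {
        String letters = flag.equals("-p") ? ALL : flag.substring(2);
        Set<Character> attrs = new LinkedHashSet<>();
        for (char c : letters.toCharArray()) {
            if (ALL.indexOf(c) < 0)
                throw new IllegalArgumentException("unknown attribute: " + c);
            attrs.add(c);
        }
        return attrs;
    }
}
```

For example, <tt>-pug</tt> selects only user and group, while <tt>-p</tt> selects all five attributes.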
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2821">HADOOP-2821</a>
</td>
<td>
util
</td>
<td>
Removed the deprecated classes <tt>org.apache.hadoop.util.ShellUtil</tt> and <tt>org.apache.hadoop.util.ToolBase</tt>.
</td>
</tr>
</tbody></table>
</ul>
</body></html>