- <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
- <html><head>
- <title>Hadoop 0.17.2 Release Notes</title></head>
- <body>
- <font face="sans-serif">
- <h1>Hadoop 0.17.2 Release Notes</h1>
- The bug fixes are listed below.
- <ul><a name="changes">
- <h2>Changes Since Hadoop 0.17.1</h2>
- <ul>
- <li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3678'>HADOOP-3678</a>] - Avoid spurious exceptions logged at DataNode when clients
- read from DFS.</li>
- <li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3760'>HADOOP-3760</a>] - Fix a bug with HDFS file close()</li>
- <li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3707'>HADOOP-3707</a>] - NameNode keeps a count of the number of blocks scheduled
- to be written to a datanode and uses it to avoid allocating more
- blocks than a datanode can hold.</li>
- <li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3681'>HADOOP-3681</a>] - DFSClient can get into an infinite loop while closing
- a file if there are some errors.</li>
- <li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3002'>HADOOP-3002</a>] - Hold off block removal while in safe mode.</li>
- <li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3685'>HADOOP-3685</a>] - Unbalanced replication target.</li>
- <li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3758'>HADOOP-3758</a>] - Shutdown datanode on version mismatch instead of retrying
- continuously, preventing excessive logging at the namenode.</li>
- <li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3633'>HADOOP-3633</a>] - Correct exception handling in DataXceiveServer, and throttle
- the number of xceiver threads in a data-node.</li>
- <li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3370'>HADOOP-3370</a>] - Ensure that the TaskTracker.runningJobs data-structure is
- correctly cleaned-up on task completion.</li>
- <li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3813'>HADOOP-3813</a>] - Fix task-output clean-up on HDFS to use the recursive
- FileSystem.delete rather than the FileUtil.fullyDelete.</li>
- <li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3859'>HADOOP-3859</a>] - Allow the maximum number of xceivers in the data node to
- be configurable.</li>
- <li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3931'>HADOOP-3931</a>] - Fix a corner case in the map-side sort that causes some values
- to be counted as too large and causes premature spills to disk. Some values
- also bypass the combiner incorrectly.</li>
- </ul>
- </ul>
- <h1>Hadoop 0.17.1 Release Notes</h1>
- The bug fixes are listed below.
- <ul><a name="changes">
- <h2>Changes Since Hadoop 0.17.0</h2>
- <ul>
- <li>[<a href='https://issues.apache.org/jira/browse/HADOOP-2159'>HADOOP-2159</a>] - Namenode stuck in safemode
- </li>
- <li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3442'>HADOOP-3442</a>] - QuickSort may get into unbounded recursion
- </li>
- <li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3472'>HADOOP-3472</a>] - MapFile.Reader getClosest() function returns incorrect results when before is true
- </li>
- <li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3475'>HADOOP-3475</a>] - MapOutputBuffer allocates 4x as much space to record capacity as intended
- </li>
- <li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3477'>HADOOP-3477</a>] - release tar.gz contains duplicate files
- </li>
- <li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3522'>HADOOP-3522</a>] - ValuesIterator.next() doesn't return a new object, thus failing many equals() tests.
- </li>
- <li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3526'>HADOOP-3526</a>] - contrib/data_join doesn't work
- </li>
- <li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3550'>HADOOP-3550</a>] - Reduce tasks failing with OOM
- </li>
- <li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3565'>HADOOP-3565</a>] - JavaSerialization can throw java.io.StreamCorruptedException
- </li>
- </ul>
- </ul>
- <h1>Hadoop 0.17.0 Release Notes</h1>
- These release notes include new developer- and user-facing incompatibilities, features, and major improvements. The table below is sorted by component.
- <ul><a name="changes">
- <h2>Changes Since Hadoop 0.16.4</h2>
- <table border="1" width="100%" cellpadding="4">
- <tbody><tr>
- <td><b>Issue</b></td>
- <td><b>Component</b></td>
- <td><b>Notes</b></td>
- </tr>
- <tr>
- <td>
- <a href="https://issues.apache.org/jira/browse/HADOOP-2828">HADOOP-2828</a>
- </td>
- <td>
- conf
- </td>
- <td>
- Removed these deprecated methods in
- <tt>org.apache.hadoop.conf.Configuration</tt>:<br><tt><ul><li>
- public Object getObject(String name) </li><li>
- public void setObject(String name, Object value) </li><li>
- public Object get(String name, Object defaultValue) </li><li>
- public void set(String name, Object value)</li><li>public Iterator entries()
- </li></ul></tt></td>
- </tr>
- <tr>
- <td nowrap>
- <a href="https://issues.apache.org/jira/browse/HADOOP-2410">HADOOP-2410</a>
- </td>
- <td>
- contrib/ec2
- </td>
- <td>
- The command <tt>hadoop-ec2
- run</tt> has been replaced by <tt>hadoop-ec2 launch-cluster
- &lt;group&gt; &lt;number of instances&gt;</tt>, and <tt>hadoop-ec2
- start-hadoop</tt> has been removed since Hadoop is started on instance
- start up. See <a href="http://wiki.apache.org/hadoop/AmazonEC2">http://wiki.apache.org/hadoop/AmazonEC2</a>
- for details.
- </td>
- </tr>
- <tr>
- <td>
- <a href="https://issues.apache.org/jira/browse/HADOOP-2796">HADOOP-2796</a>
- </td>
- <td>
- contrib/hod
- </td>
- <td>
- Added a provision to reliably detect a
- failing script's exit code. When the HOD script option
- returns a non-zero exit code, look for a <tt>script.exitcode</tt>
- file written to the HOD cluster directory. If this file is present, it
- means the script failed with the exit code given in the file.
- </td>
- </tr>
- <tr>
- <td>
- <a href="https://issues.apache.org/jira/browse/HADOOP-2775">HADOOP-2775</a>
- </td>
- <td>
- contrib/hod
- </td>
- <td>
- Added a unit testing framework based on
- pyunit to HOD. Developers contributing patches to HOD should now
- contribute unit tests along with the patches when possible.
- </td>
- </tr>
- <tr>
- <td>
- <a href="https://issues.apache.org/jira/browse/HADOOP-3137">HADOOP-3137</a>
- </td>
- <td>
- contrib/hod
- </td>
- <td>
- The HOD version is now the same as the Hadoop version.
- </td>
- </tr>
- <tr>
- <td>
- <a href="https://issues.apache.org/jira/browse/HADOOP-2855">HADOOP-2855</a>
- </td>
- <td>
- contrib/hod
- </td>
- <td>
- HOD now handles relative
- paths correctly for important HOD options such as the cluster directory,
- tarball option, and script file.
- </td>
- </tr>
- <tr>
- <td>
- <a href="https://issues.apache.org/jira/browse/HADOOP-2899">HADOOP-2899</a>
- </td>
- <td>
- contrib/hod
- </td>
- <td>
- HOD now cleans up the HOD generated mapred system directory
- at cluster deallocation time.
- </td>
- </tr>
- <tr>
- <td>
- <a href="https://issues.apache.org/jira/browse/HADOOP-2982">HADOOP-2982</a>
- </td>
- <td>
- contrib/hod
- </td>
- <td>
- The number of free nodes in the cluster
- is computed using a better algorithm that filters out inconsistencies in
- node status as reported by Torque.
- </td>
- </tr>
- <tr>
- <td>
- <a href="https://issues.apache.org/jira/browse/HADOOP-2947">HADOOP-2947</a>
- </td>
- <td>
- contrib/hod
- </td>
- <td>
- The stdout and stderr streams of
- daemons are redirected to files that are created under the hadoop log
- directory. Users can now send a <tt>kill -3</tt> signal (SIGQUIT) to the daemons to get stack traces
- and thread dumps for debugging.
- </td>
- </tr>
- <tr>
- <td>
- <a href="https://issues.apache.org/jira/browse/HADOOP-3168">HADOOP-3168</a>
- </td>
- <td>
- contrib/streaming
- </td>
- <td>
- Decreased the frequency of logging
- in Hadoop streaming (from every 100 records to every 10,000 records).
- </td>
- </tr>
- <tr>
- <td>
- <a href="https://issues.apache.org/jira/browse/HADOOP-3040">HADOOP-3040</a>
- </td>
- <td>
- contrib/streaming
- </td>
- <td>
- Fixed a critical bug to restore important functionality in Hadoop streaming. If the first character on a line is
- the separator, then an empty key is assumed and the whole line is the value.
- </td>
- </tr>
- <tr>
- <td>
- <a href="https://issues.apache.org/jira/browse/HADOOP-2820">HADOOP-2820</a>
- </td>
- <td>
- contrib/streaming
- </td>
- <td>
- Removed these deprecated classes: <br><tt><ul><li>org.apache.hadoop.streaming.StreamLineRecordReader</li><li>org.apache.hadoop.streaming.StreamOutputFormat</li><li>org.apache.hadoop.streaming.StreamSequenceRecordReader</li></ul></tt></td>
- </tr>
- <tr>
- <td>
- <a href="https://issues.apache.org/jira/browse/HADOOP-3280">HADOOP-3280</a>
- </td>
- <td>
- contrib/streaming
- </td>
- <td>
- Added the
- <tt>mapred.child.ulimit</tt> configuration variable to limit the maximum virtual memory allocated to processes launched by the
- Map-Reduce framework. This can be used to control both the Mapper/Reducer
- tasks and applications using Hadoop pipes, Hadoop streaming etc.
- </td>
- </tr>
- <tr>
- <td>
- <a href="https://issues.apache.org/jira/browse/HADOOP-2657">HADOOP-2657</a>
- </td>
- <td>
- dfs
- </td>
- <td>Added the new API <tt>DFSOutputStream.flush()</tt> to
- flush all outstanding data to DataNodes.
- </td>
- </tr>
- <tr>
- <td>
- <a href="https://issues.apache.org/jira/browse/HADOOP-2219">HADOOP-2219</a>
- </td>
- <td>
- dfs
- </td>
- <td>
- Added a new <tt>fs -count</tt> command for
- counting the number of bytes, files, and directories under a given path. <br>
- <br>
- Added a new RPC <tt>getContentSummary(String path)</tt> to ClientProtocol.
- </td>
- </tr>
- <tr>
- <td>
- <a href="https://issues.apache.org/jira/browse/HADOOP-2559">HADOOP-2559</a>
- </td>
- <td>
- dfs
- </td>
- <td>
- Changed DFS block placement to
- allocate the first replica locally, the second off-rack, and the third
- intra-rack from the second.
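<p>
The placement order described above can be illustrated with a small standalone sketch. It does not use any Hadoop classes: <tt>PlacementSketch</tt>, <tt>chooseTargets</tt>, and the node-to-rack map are hypothetical names introduced here, and the sketch simply takes the first matching node in iteration order, whereas a real NameNode would also weigh load and free space.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative only: models "first replica local, second off-rack,
// third on the same rack as the second". Not Hadoop's actual code.
class PlacementSketch {
    // nodeToRack should iterate in a stable order (for example a LinkedHashMap)
    // and is assumed to contain the writer node.
    static List<String> chooseTargets(String writer, Map<String, String> nodeToRack) {
        List<String> targets = new ArrayList<>();
        targets.add(writer);                       // first replica: the local node
        String localRack = nodeToRack.get(writer);

        // Second replica: first node found on a different rack.
        String second = null;
        for (Map.Entry<String, String> e : nodeToRack.entrySet()) {
            if (!e.getValue().equals(localRack)) {
                second = e.getKey();
                break;
            }
        }
        if (second == null) {
            return targets;                        // single-rack cluster: nothing off-rack
        }
        targets.add(second);

        // Third replica: another node on the same rack as the second replica.
        String remoteRack = nodeToRack.get(second);
        for (Map.Entry<String, String> e : nodeToRack.entrySet()) {
            if (e.getValue().equals(remoteRack) && !e.getKey().equals(second)) {
                targets.add(e.getKey());
                break;
            }
        }
        return targets;
    }
}
```

With nodes <tt>n1</tt> and <tt>n2</tt> on one rack and <tt>n3</tt> and <tt>n4</tt> on another, a writer on <tt>n1</tt> gets the targets <tt>n1</tt>, <tt>n3</tt>, <tt>n4</tt> in this sketch.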
- </td>
- </tr>
- <tr>
- <td>
- <a href="https://issues.apache.org/jira/browse/HADOOP-2758">HADOOP-2758</a>
- </td>
- <td>
- dfs
- </td>
- <td>
- Improved DataNode CPU usage by 50% while serving data to clients.
- </td>
- </tr>
- <tr>
- <td>
- <a href="https://issues.apache.org/jira/browse/HADOOP-2634">HADOOP-2634</a>
- </td>
- <td>
- dfs
- </td>
- <td>
- Deprecated ClientProtocol's <tt>exists()</tt> method. Use <tt>getFileInfo(String)</tt> instead.
- </td>
- </tr>
- <tr>
- <td>
- <a href="https://issues.apache.org/jira/browse/HADOOP-2423">HADOOP-2423</a>
- </td>
- <td>
- dfs
- </td>
- <td>
- Improved <tt>FSDirectory.mkdirs(...)</tt> performance by about 50% as measured by the NNThroughputBenchmark.
- </td>
- </tr>
- <tr>
- <td>
- <a href="https://issues.apache.org/jira/browse/HADOOP-3124">HADOOP-3124</a>
- </td>
- <td>
- dfs
- </td>
- <td>
- Made the DataNode socket write timeout configurable; however, the configuration variable is undocumented.
- </td>
- </tr>
- <tr>
- <td>
- <a href="https://issues.apache.org/jira/browse/HADOOP-2470">HADOOP-2470</a>
- </td>
- <td>
- dfs
- </td>
- <td>
- Removed <tt>open()</tt> and <tt>isDir()</tt> methods from ClientProtocol without first deprecating. <br>
- <br>
- Removed deprecated <tt>getContentLength()</tt> from ClientProtocol.<br>
- <br>
- Deprecated <tt>isDirectory</tt> in DFSClient. Use <tt>getFileStatus()</tt> instead.
- </td>
- </tr>
- <tr>
- <td>
- <a href="https://issues.apache.org/jira/browse/HADOOP-2854">HADOOP-2854</a>
- </td>
- <td>
- dfs
- </td>
- <td>
- Removed deprecated method <tt>org.apache.hadoop.ipc.Server.getUserInfo()</tt>.
- </td>
- </tr>
- <tr>
- <td>
- <a href="https://issues.apache.org/jira/browse/HADOOP-2239">HADOOP-2239</a>
- </td>
- <td>
- dfs
- </td>
- <td>
- Added a new FileSystem, HftpsFileSystem, that allows access to HDFS data over HTTPS.
- </td>
- </tr>
- <tr>
- <td>
- <a href="https://issues.apache.org/jira/browse/HADOOP-771">HADOOP-771</a>
- </td>
- <td>
- dfs
- </td>
- <td>
- Added a new method to <tt>FileSystem</tt> API, <tt>delete(path, boolean)</tt>,
- and deprecated the previous <tt>delete(path)</tt> method.
- The new method recursively deletes files only if boolean is set to true.
- </td>
- </tr>
- <tr>
- <td>
- <a href="https://issues.apache.org/jira/browse/HADOOP-3239">HADOOP-3239</a>
- </td>
- <td>
- dfs
- </td>
- <td>
- Modified <tt>org.apache.hadoop.dfs.FSDirectory.getFileInfo(String)</tt> to return null when a file is not
- found instead of throwing FileNotFoundException.
- </td>
- </tr>
- <tr>
- <td>
- <a href="https://issues.apache.org/jira/browse/HADOOP-3091">HADOOP-3091</a>
- </td>
- <td>
- dfs
- </td>
- <td>
- Enhanced <tt>hadoop dfs -put</tt> command to accept multiple
- sources when destination is a directory.
- </td>
- </tr>
- <tr>
- <td>
- <a href="https://issues.apache.org/jira/browse/HADOOP-2192">HADOOP-2192</a>
- </td>
- <td>
- dfs
- </td>
- <td>
- Modified <tt>hadoop dfs -mv</tt> to be closer in functionality to
- the Linux <tt>mv</tt> command by removing unnecessary output and returning
- an error message when moving nonexistent files or directories.
- </td>
- </tr>
- <tr>
- <td>
- <a href="https://issues.apache.org/jira/browse/HADOOP-1985">HADOOP-1985</a>
- </td>
- <td>
- dfs <br>
- mapred
- </td>
- <td>
- Added rack awareness for map tasks and moved the rack resolution logic to the
- NameNode and JobTracker. <p> The administrator can specify a
- loadable class, named by <tt>topology.node.switch.mapping.impl</tt>, that
- implements the logic for rack resolution. The class must implement
- a method <tt>resolve(List&lt;String&gt; names)</tt>, where <tt>names</tt> is the list of
- DNS names/IP addresses to be resolved. The return value is a list of
- resolved network paths of the form <tt>/foo/rack</tt>, where <tt>rack</tt> is the ID of the rack
- to which the node belongs and <tt>foo</tt> is the switch to which multiple racks are
- connected, and so on. The default implementation, packaged
- along with Hadoop, is <tt>org.apache.hadoop.net.ScriptBasedMapping</tt>,
- which loads a script that can be used for rack resolution. The
- script location is configurable: it is specified by
- <tt>topology.script.file.name</tt> and defaults to an empty script. When
- the script name is empty, <tt>/default-rack</tt> is returned for all
- DNS names/IP addresses. The loadable <tt>topology.node.switch.mapping.impl</tt> gives
- administrators the flexibility to define how their site's node resolution
- should happen. <br>
- For mapred, one can also specify the level of the cache with respect to the number of
- levels in the resolved network path; the default is two. This means that the
- JobTracker will cache tasks at the host level and at the rack level. <br>
- Known issue: task caching does not work with levels greater than 2
- (beyond racks). This bug is tracked in <a href="https://issues.apache.org/jira/browse/HADOOP-3296">HADOOP-3296</a>.
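<p>
As a rough illustration of the <tt>resolve</tt> contract described above, here is a standalone sketch that does not depend on Hadoop. The interface <tt>RackMappingSketch</tt> and the class <tt>StaticRackMapping</tt> are hypothetical stand-ins, not Hadoop types, and falling back to <tt>/default-rack</tt> for unknown hosts is an assumption modelled on the empty-script behaviour.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the interface a class named by
// topology.node.switch.mapping.impl would implement.
interface RackMappingSketch {
    // Returns one network path per input name, in the same order.
    List<String> resolve(List<String> names);
}

// Table-driven mapping: known hosts resolve to a configured network path,
// everything else falls back to /default-rack (an assumption of this sketch).
class StaticRackMapping implements RackMappingSketch {
    static final String DEFAULT_RACK = "/default-rack";
    private final Map<String, String> table = new HashMap<>();

    StaticRackMapping(Map<String, String> hostToPath) {
        table.putAll(hostToPath);
    }

    @Override
    public List<String> resolve(List<String> names) {
        List<String> paths = new ArrayList<>(names.size());
        for (String name : names) {
            paths.add(table.getOrDefault(name, DEFAULT_RACK));
        }
        return paths;
    }
}
```

A real deployment would more likely keep the default script-based mapping and supply a site-specific script through <tt>topology.script.file.name</tt>; a custom class is only needed when a script is not a good fit.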
- </td>
- </tr>
- <tr>
- <td>
- <a href="https://issues.apache.org/jira/browse/HADOOP-2063">HADOOP-2063</a>
- </td>
- <td>
- fs
- </td>
- <td>
- Added a new option <tt>-ignoreCrc</tt> to <tt>fs -get</tt> and <tt>fs -copyToLocal</tt>. The option causes CRC checksums to be
- ignored for this command so that corrupt files may be downloaded.
- </td>
- </tr>
- <tr>
- <td>
- <a href="https://issues.apache.org/jira/browse/HADOOP-3001">HADOOP-3001</a>
- </td>
- <td>
- fs
- </td>
- <td>
- Added new Map/Reduce framework
- counters that track the number of bytes read and written to HDFS, local,
- KFS, and S3 file systems.
- </td>
- </tr>
- <tr>
- <td>
- <a href="https://issues.apache.org/jira/browse/HADOOP-2027">HADOOP-2027</a>
- </td>
- <td>
- fs
- </td>
- <td>
- Added a new FileSystem method <tt>getFileBlockLocations</tt> to return the number of bytes in each block in a file
- via a single RPC to the NameNode. Deprecated <tt>getFileCacheHints</tt>.
- </td>
- </tr>
- <tr>
- <td>
- <a href="https://issues.apache.org/jira/browse/HADOOP-2839">HADOOP-2839</a>
- </td>
- <td>
- fs
- </td>
- <td>
- Removed deprecated method <tt>org.apache.hadoop.fs.FileSystem.globPaths()</tt>.
- </td>
- </tr>
- <tr>
- <td>
- <a href="https://issues.apache.org/jira/browse/HADOOP-2563">HADOOP-2563</a>
- </td>
- <td>
- fs
- </td>
- <td>
- Removed deprecated method <tt>org.apache.hadoop.fs.FileSystem.listPaths()</tt>.
- </td>
- </tr>
- <tr>
- <td>
- <a href="https://issues.apache.org/jira/browse/HADOOP-1593">HADOOP-1593</a>
- </td>
- <td>
- fs
- </td>
- <td>
- Modified FSShell commands to accept non-default paths. Now you can run commands like <tt>hadoop dfs -ls hdfs://remotehost1:port/path</tt>
- and <tt>hadoop dfs -ls hdfs://remotehost2:port/path</tt> without changing your Hadoop config.
- </td>
- </tr>
- <tr>
- <td>
- <a href="https://issues.apache.org/jira/browse/HADOOP-3048">HADOOP-3048</a>
- </td>
- <td>
- io
- </td>
- <td>
- Added a new API and a default
- implementation to convert and restore serializations of objects to strings.
- </td>
- </tr>
- <tr>
- <td>
- <a href="https://issues.apache.org/jira/browse/HADOOP-3152">HADOOP-3152</a>
- </td>
- <td>
- io
- </td>
- <td>
- Added a static method
- <tt>MapFile.setIndexInterval(Configuration, int interval)</tt> so that Map/Reduce
- jobs using <tt>MapFileOutputFormat</tt> can set the index interval.
- </td>
- </tr>
- <tr>
- <td>
- <a href="https://issues.apache.org/jira/browse/HADOOP-3073">HADOOP-3073</a>
- </td>
- <td>
- ipc
- </td>
- <td>
- <tt>SocketOutputStream.close()</tt> now closes the
- underlying channel. This increases compatibility with
- <tt>java.net.Socket.getOutputStream</tt>.
- </td>
- </tr>
- <tr>
- <td>
- <a href="https://issues.apache.org/jira/browse/HADOOP-3041">HADOOP-3041</a>
- </td>
- <td>
- mapred
- </td>
- <td>
- Deprecated <tt>JobConf.setOutputPath</tt> and <tt>JobConf.getOutputPath</tt>.<p>
- Deprecated <tt>OutputFormatBase</tt>. Added <tt>FileOutputFormat</tt>. Existing output
- formats extending <tt>OutputFormatBase</tt> now extend <tt>FileOutputFormat</tt>. <p>
- Added the following methods to <tt>FileOutputFormat</tt>:
- <tt><ul>
- <li>public static void setOutputPath(JobConf conf, Path outputDir)
- <li>public static Path getOutputPath(JobConf conf)
- <li>public static Path getWorkOutputPath(JobConf conf)
- <li>static void setWorkOutputPath(JobConf conf, Path outputDir)
- </ul></tt>
- </td>
- </tr>
- <tr>
- <td>
- <a href="https://issues.apache.org/jira/browse/HADOOP-3204">HADOOP-3204</a>
- </td>
- <td>
- mapred
- </td>
- <td>
- Fixed <tt>ReduceTask.LocalFSMerger</tt> to handle errors and exceptions better. Prior to this all
- exceptions except IOException would be silently ignored.
- </td>
- </tr>
- <tr>
- <td>
- <a href="https://issues.apache.org/jira/browse/HADOOP-1986">HADOOP-1986</a>
- </td>
- <td>
- mapred
- </td>
- <td>
- Programs that implement the raw
- <tt>Mapper</tt> or <tt>Reducer</tt> interfaces will need modification to compile with this
- release. For example, <p>
- <pre>
- class MyMapper implements Mapper {
- public void map(WritableComparable key, Writable val,
- OutputCollector out, Reporter reporter) throws IOException {
- // ...
- }
- // ...
- }
- </pre>
- will need to be changed to refer to the parameterized type. For example: <p>
- <pre>
- class MyMapper implements Mapper&lt;WritableComparable, Writable, WritableComparable, Writable&gt; {
- public void map(WritableComparable key, Writable val,
- OutputCollector&lt;WritableComparable, Writable&gt;
- out, Reporter reporter) throws IOException {
- // ...
- }
- // ...
- }
- </pre>
- Similarly implementations of the following raw interfaces will need
- modification:
- <tt><ul>
- <li>InputFormat
- <li>OutputCollector
- <li>OutputFormat
- <li>Partitioner
- <li>RecordReader
- <li>RecordWriter
- </ul></tt>
- </td>
- </tr>
- <tr>
- <td>
- <a href="https://issues.apache.org/jira/browse/HADOOP-910">HADOOP-910</a>
- </td>
- <td>
- mapred
- </td>
- <td>
- Reducers now perform merges of
- shuffle data (both in-memory and on disk) while fetching map outputs.
- Earlier, during shuffle they used to merge only the in-memory outputs.
- </td>
- </tr>
- <tr>
- <td>
- <a href="https://issues.apache.org/jira/browse/HADOOP-2822">HADOOP-2822</a>
- </td>
- <td>
- mapred
- </td>
- <td>
- Removed the deprecated classes <tt>org.apache.hadoop.mapred.InputFormatBase</tt>
- and <tt>org.apache.hadoop.mapred.PhasedFileSystem</tt>.
- </td>
- </tr>
- <tr>
- <td>
- <a href="https://issues.apache.org/jira/browse/HADOOP-2817">HADOOP-2817</a>
- </td>
- <td>
- mapred
- </td>
- <td>
- Removed the deprecated method
- <tt>org.apache.hadoop.mapred.ClusterStatus.getMaxTasks()</tt>
- and the deprecated configuration property <tt>mapred.tasktracker.tasks.maximum</tt>.
- </td>
- </tr>
- <tr>
- <td>
- <a href="https://issues.apache.org/jira/browse/HADOOP-2825">HADOOP-2825</a>
- </td>
- <td>
- mapred
- </td>
- <td>
- Removed the deprecated method
- <tt>org.apache.hadoop.mapred.MapOutputLocation.getFile(FileSystem fileSys, Path
- localFilename, int reduce, Progressable pingee, int timeout)</tt>.
- </td>
- </tr>
- <tr>
- <td>
- <a href="https://issues.apache.org/jira/browse/HADOOP-2818">HADOOP-2818</a>
- </td>
- <td>
- mapred
- </td>
- <td>
- Removed the deprecated methods
- <tt>org.apache.hadoop.mapred.Counters.getDisplayName(String counter)</tt> and
- <tt>org.apache.hadoop.mapred.Counters.getCounterNames()</tt>.
- Undeprecated the method
- <tt>org.apache.hadoop.mapred.Counters.getCounter(String counterName)</tt>.
- </td>
- </tr>
- <tr>
- <td>
- <a href="https://issues.apache.org/jira/browse/HADOOP-2826">HADOOP-2826</a>
- </td>
- <td>
- mapred
- </td>
- <td>
- Changed the signature of the method
- <tt>public org.apache.hadoop.streaming.UTF8ByteArrayUtils.readLine(InputStream)</tt> to
- <tt>UTF8ByteArrayUtils.readLine(LineReader, Text)</tt>. Since the old
- signature is not deprecated, any code using the old method must be changed
- to use the new method.
- <p>
- Removed the deprecated methods <tt>org.apache.hadoop.mapred.FileSplit.getFile()</tt>
- and <tt>org.apache.hadoop.mapred.LineRecordReader.readLine(InputStream in,
- OutputStream out)</tt>.
- <p>
- Made the constructor <tt>org.apache.hadoop.mapred.LineRecordReader.LineReader(InputStream in, Configuration
- conf)</tt> public.
- </td>
- </tr>
- <tr>
- <td>
- <a href="https://issues.apache.org/jira/browse/HADOOP-2819">HADOOP-2819</a>
- </td>
- <td>
- mapred
- </td>
- <td>
- Removed these deprecated methods from <tt>org.apache.hadoop.mapred.JobConf</tt>:
- <tt><ul>
- <li>public Class getInputKeyClass()
- <li>public void setInputKeyClass(Class theClass)
- <li>public Class getInputValueClass()
- <li>public void setInputValueClass(Class theClass)
- </ul></tt>
- and undeprecated these methods:
- <tt><ul>
- <li>getSpeculativeExecution()
- <li>public void setSpeculativeExecution(boolean speculativeExecution)
- </ul></tt>
- </td>
- </tr>
- <tr>
- <td>
- <a href="https://issues.apache.org/jira/browse/HADOOP-3093">HADOOP-3093</a>
- </td>
- <td>
- mapred
- </td>
- <td>
- Added the following public methods to <tt>org.apache.hadoop.conf.Configuration</tt>:
- <tt><ul>
- <li>String[] Configuration.getStrings(String name, String... defaultValue)
- <li>void Configuration.setStrings(String name, String... values)
- </ul></tt>
- </td>
- </tr>
- <tr>
- <td>
- <a href="https://issues.apache.org/jira/browse/HADOOP-2399">HADOOP-2399</a>
- </td>
- <td>
- mapred
- </td>
- <td>
- The key and value objects that are given
- to the Combiner and Reducer are now reused between calls. This is much more
- efficient, but users cannot assume that the objects remain unchanged between calls.
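<p>
The practical consequence of object reuse is that code which stores a reference to a value, rather than a copy, will later see only the last value. The standalone sketch below illustrates the pitfall with a hypothetical <tt>MutableValue</tt> class standing in for a Writable; none of the names come from Hadoop.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical mutable value type standing in for a Writable.
class MutableValue {
    int v;
}

class ReuseSketch {
    // Iterator that returns the same instance on every call and
    // overwrites its contents, mimicking a framework that reuses objects.
    static Iterator<MutableValue> reusingIterator(final int[] data) {
        final MutableValue shared = new MutableValue();
        return new Iterator<MutableValue>() {
            private int i = 0;
            @Override public boolean hasNext() { return i < data.length; }
            @Override public MutableValue next() {
                if (!hasNext()) throw new NoSuchElementException();
                shared.v = data[i++];
                return shared;
            }
        };
    }

    // Pitfall: every stored reference points at the one shared instance.
    static int[] keepReferences(int[] data) {
        List<MutableValue> kept = new ArrayList<>();
        Iterator<MutableValue> it = reusingIterator(data);
        while (it.hasNext()) kept.add(it.next());
        return values(kept);
    }

    // Safe: copy the contents before the next call overwrites them.
    static int[] keepCopies(int[] data) {
        List<MutableValue> kept = new ArrayList<>();
        Iterator<MutableValue> it = reusingIterator(data);
        while (it.hasNext()) {
            MutableValue copy = new MutableValue();
            copy.v = it.next().v;
            kept.add(copy);
        }
        return values(kept);
    }

    private static int[] values(List<MutableValue> list) {
        int[] out = new int[list.size()];
        for (int k = 0; k < out.length; k++) out[k] = list.get(k).v;
        return out;
    }
}
```

For the input 1, 2, 3, <tt>keepReferences</tt> yields 3, 3, 3 while <tt>keepCopies</tt> yields 1, 2, 3. Reducers that buffer keys or values across calls need the second pattern.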
- </td>
- </tr>
- <tr>
- <td>
- <a href="https://issues.apache.org/jira/browse/HADOOP-3162">HADOOP-3162</a>
- </td>
- <td>
- mapred
- </td>
- <td>
- Deprecated the public methods <tt>org.apache.hadoop.mapred.JobConf.setInputPath(Path)</tt> and
- <tt>org.apache.hadoop.mapred.JobConf.addInputPath(Path)</tt>.
- <p>
- Added the following public methods to <tt>org.apache.hadoop.mapred.FileInputFormat</tt>:
- <tt><ul>
- <li>public static void setInputPaths(JobConf job, Path... paths); <br>
- <li>public static void setInputPaths(JobConf job, String commaSeparatedPaths); <br>
- <li>public static void addInputPath(JobConf job, Path path); <br>
- <li>public static void addInputPaths(JobConf job, String commaSeparatedPaths); <br>
- </ul></tt>
- Earlier code calling <tt>JobConf.setInputPath(Path)</tt> and <tt>JobConf.addInputPath(Path)</tt>
- should now call <tt>FileInputFormat.setInputPaths(JobConf, Path...)</tt> and
- <tt>FileInputFormat.addInputPath(JobConf, Path)</tt> respectively.
- </td>
- </tr>
- <tr>
- <td>
- <a href="https://issues.apache.org/jira/browse/HADOOP-2178">HADOOP-2178</a>
- </td>
- <td>
- mapred
- </td>
- <td>
- Provided a new facility to
- store job history on DFS. The cluster administrator can now provide either a local file system
- location or a DFS location, using the configuration property
- <tt>mapred.job.history.location</tt>, to store job history. History will also
- be logged in a user-specified location if the configuration property
- <tt>mapred.job.history.user.location</tt> is specified.
- <p>
- Removed these classes and method:
- <tt><ul>
- <li>org.apache.hadoop.mapred.DefaultJobHistoryParser.MasterIndex
- <li>org.apache.hadoop.mapred.DefaultJobHistoryParser.MasterIndexParseListener
- <li>org.apache.hadoop.mapred.DefaultJobHistoryParser.parseMasterIndex
- </ul></tt>
- <p>
- Changed the signature of the public method
- <tt>org.apache.hadoop.mapred.DefaultJobHistoryParser.parseJobTasks(File
- jobHistoryFile, JobHistory.JobInfo job)</tt> to
- <tt>DefaultJobHistoryParser.parseJobTasks(String jobHistoryFile,
- JobHistory.JobInfo job, FileSystem fs)</tt>. <p>
- Changed the signature of the public method
- <tt>org.apache.hadoop.mapred.JobHistory.parseHistory(File path, Listener l)</tt>
- to <tt>JobHistory.parseHistoryFromFS(String path, Listener l, FileSystem fs)</tt>.
- </td>
- </tr>
- <tr>
- <td>
- <a href="https://issues.apache.org/jira/browse/HADOOP-2055">HADOOP-2055</a>
- </td>
- <td>
- mapred
- </td>
- <td>
- Users are now provided the ability to specify what paths to ignore when processing the job input directory
- (apart from the filenames that start with "_" and ".").
- To do this, two new methods were defined:
- <tt><ul>
- <li>FileInputFormat.setInputPathFilter(JobConf, PathFilter)
- <li>FileInputFormat.getInputPathFilter(JobConf)
- </ul></tt>
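<p>
The filtering rule can be pictured as the built-in check for names starting with "_" or "." combined with the user's own filter. The sketch below is standalone and uses <tt>java.util.function.Predicate</tt> as a hypothetical stand-in for <tt>PathFilter</tt>; <tt>InputFilterSketch</tt> and <tt>select</tt> are illustrative names, not Hadoop APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Illustrative only: combines the default hidden-file rule with a user filter.
class InputFilterSketch {
    // Default rule from the note above: names starting with "_" or "." are skipped.
    static final Predicate<String> HIDDEN =
        name -> name.startsWith("_") || name.startsWith(".");

    // Keep a file only if it is not hidden and the user's filter accepts it.
    static List<String> select(List<String> fileNames, Predicate<String> userFilter) {
        List<String> kept = new ArrayList<>();
        for (String name : fileNames) {
            if (!HIDDEN.test(name) && userFilter.test(name)) {
                kept.add(name);
            }
        }
        return kept;
    }
}
```

For example, a user filter that rejects names ending in <tt>.tmp</tt> would leave only <tt>part-00000</tt> from the list <tt>part-00000</tt>, <tt>_logs</tt>, <tt>.crc</tt>, <tt>part-00001.tmp</tt>.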
- </td>
- </tr>
- <tr>
- <td>
- <a href="https://issues.apache.org/jira/browse/HADOOP-2116">HADOOP-2116</a>
- </td>
- <td>
- mapred
- </td>
- <td>
- Restructured the local job directory on the tasktracker. Users are provided with a job-specific shared directory
- (<tt>mapred-local/taskTracker/jobcache/$jobid/work</tt>) for use as scratch
- space, through configuration property and system property
- <tt>job.local.dir</tt>. The directory <tt>../work</tt> is no longer available from the task's current working directory.
- </td>
- </tr>
- <tr>
- <td>
- <a href="https://issues.apache.org/jira/browse/HADOOP-1622">HADOOP-1622</a>
- </td>
- <td>
- mapred
- </td>
- <td>
- Added new command line options for the <tt>hadoop jar</tt> command:
- <p>
- <tt>hadoop jar -files &lt;comma separated list of files&gt; -libjars &lt;comma
- separated list of jars&gt; -archives &lt;comma separated list of
- archives&gt; </tt>
- <p>
- where the options have these meanings:
- <p>
- <ul>
- <li><tt>-files</tt> specifies a comma separated list of paths that
- will be present in the current working directory of your task. <br>
- <li><tt>-libjars</tt> adds jars to the classpaths of the maps and
- reduces. <br>
- <li><tt>-archives</tt> passes archives that are
- unzipped/unjarred; a link with the name of the jar/zip is created in the
- current working directory of tasks.
- </ul>
- </td>
- </tr>
- <tr>
- <td>
- <a href="https://issues.apache.org/jira/browse/HADOOP-2823">HADOOP-2823</a>
- </td>
- <td>
- record
- </td>
- <td>
- Removed the deprecated methods in
- <tt>org.apache.hadoop.record.compiler.generated.SimpleCharStream</tt>:
- <tt><ul>
- <li>public int getColumn()
- <li>public int getLine()
- </ul></tt>
- </td>
- </tr>
- <tr>
- <td>
- <a href="https://issues.apache.org/jira/browse/HADOOP-2551">HADOOP-2551</a>
- </td>
- <td>
- scripts
- </td>
- <td>
- Introduced new environment variables to allow finer grained control of Java options passed to server and
- client JVMs. See the new <tt>*_OPTS</tt> variables in <tt>conf/hadoop-env.sh</tt>.
- </td>
- </tr>
- <tr>
- <td>
- <a href="https://issues.apache.org/jira/browse/HADOOP-3099">HADOOP-3099</a>
- </td>
- <td>
- util
- </td>
- <td>
- Added a new <tt>-p</tt> option to <tt>distcp</tt> for preserving file and directory status:
- <pre>
- -p[rbugp] Preserve status
- r: replication number
- b: block size
- u: user
- g: group
- p: permission
- </pre>
- The <tt>-p</tt> option alone is equivalent to <tt>-prbugp</tt>.
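<p>
To make the flag semantics concrete, the following standalone sketch parses a <tt>-p[rbugp]</tt> argument into a set of attributes, treating a bare <tt>-p</tt> as all five. It is not distcp's parser: <tt>PreserveFlagSketch</tt>, <tt>Attr</tt>, and <tt>parse</tt> are hypothetical names, and rejecting unknown letters is an assumption of this sketch.

```java
import java.util.EnumSet;

// Illustrative only: models the -p[rbugp] option described above.
class PreserveFlagSketch {
    enum Attr { REPLICATION, BLOCK_SIZE, USER, GROUP, PERMISSION }

    static EnumSet<Attr> parse(String option) {
        if (!option.startsWith("-p")) {
            throw new IllegalArgumentException("Not a preserve option: " + option);
        }
        String letters = option.substring(2);
        if (letters.isEmpty()) {
            return EnumSet.allOf(Attr.class);      // bare -p means -prbugp
        }
        EnumSet<Attr> out = EnumSet.noneOf(Attr.class);
        for (char c : letters.toCharArray()) {
            switch (c) {
                case 'r': out.add(Attr.REPLICATION); break;
                case 'b': out.add(Attr.BLOCK_SIZE); break;
                case 'u': out.add(Attr.USER); break;
                case 'g': out.add(Attr.GROUP); break;
                case 'p': out.add(Attr.PERMISSION); break;
                default:
                    throw new IllegalArgumentException("Unknown preserve flag: " + c);
            }
        }
        return out;
    }
}
```

In this model <tt>-pug</tt> selects only user and group, while <tt>-p</tt> and <tt>-prbugp</tt> select the same full set.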
- </td>
- </tr>
- <tr>
- <td>
- <a href="https://issues.apache.org/jira/browse/HADOOP-2821">HADOOP-2821</a>
- </td>
- <td>
- util
- </td>
- <td>
- Removed the deprecated classes <tt>org.apache.hadoop.util.ShellUtil</tt> and <tt>org.apache.hadoop.util.ToolBase</tt>.
- </td>
- </tr>
- </tbody></table>
- </ul>
- </body></html>