
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html><head>
<title>Hadoop 0.17.1 Release Notes</title></head>
<body>
<font face="sans-serif">
<h1>Hadoop 0.17.1 Release Notes</h1>
The bug fixes are listed below.
<ul><a name="changes">
<h2>Changes Since Hadoop 0.17.0</h2>
<ul>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-2159'>HADOOP-2159</a>] - Namenode stuck in safemode
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3442'>HADOOP-3442</a>] - QuickSort may get into unbounded recursion
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3472'>HADOOP-3472</a>] - MapFile.Reader getClosest() function returns incorrect results when before is true
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3475'>HADOOP-3475</a>] - MapOutputBuffer allocates 4x as much space to record capacity as intended
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3477'>HADOOP-3477</a>] - release tar.gz contains duplicate files
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3522'>HADOOP-3522</a>] - ValuesIterator.next() doesn't return a new object, thus failing many equals() tests.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3526'>HADOOP-3526</a>] - contrib/data_join doesn't work
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3550'>HADOOP-3550</a>] - Reduce tasks failing with OOM
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3565'>HADOOP-3565</a>] - JavaSerialization can throw java.io.StreamCorruptedException
</li>
</ul>
</ul>
<h1>Hadoop 0.17.0 Release Notes</h1>
These release notes include new developer and user-facing incompatibilities, features, and major improvements. The table below is sorted by component.
<ul><a name="changes">
<h2>Changes Since Hadoop 0.16.4</h2>
<table border="1" width="100%" cellpadding="4">
<tbody><tr>
<td><b>Issue</b></td>
<td><b>Component</b></td>
<td><b>Notes</b></td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2828">HADOOP-2828</a>
</td>
<td>
conf
</td>
<td>
Removed these deprecated methods in
<tt>org.apache.hadoop.conf.Configuration</tt>:<br><tt><ul><li>
public Object getObject(String name) </li><li>
public void setObject(String name, Object value) </li><li>
public Object get(String name, Object defaultValue) </li><li>
public void set(String name, Object value)</li><li>public Iterator entries()
</li></ul></tt></td>
</tr>
<tr>
<td nowrap>
<a href="https://issues.apache.org/jira/browse/HADOOP-2410">HADOOP-2410</a>
</td>
<td>
contrib/ec2
</td>
<td>
The command <tt>hadoop-ec2
run</tt> has been replaced by <tt>hadoop-ec2 launch-cluster
&lt;group&gt; &lt;number of instances&gt;</tt>, and <tt>hadoop-ec2
start-hadoop</tt> has been removed since Hadoop is started on instance
startup. See <a href="http://wiki.apache.org/hadoop/AmazonEC2">http://wiki.apache.org/hadoop/AmazonEC2</a>
for details.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2796">HADOOP-2796</a>
</td>
<td>
contrib/hod
</td>
<td>
Added a provision to reliably detect a
failing script's exit code. When the script run via the HOD script option
fails with a non-zero exit code, look for a <tt>script.exitcode</tt>
file written to the HOD cluster directory. If this file is present, it
means the script failed with the exit code given in the file.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2775">HADOOP-2775</a>
</td>
<td>
contrib/hod
</td>
<td>
Added a unit testing framework based on
pyunit to HOD. Developers contributing patches to HOD should now
contribute unit tests along with the patches when possible.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-3137">HADOOP-3137</a>
</td>
<td>
contrib/hod
</td>
<td>
The HOD version is now the same as the Hadoop version.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2855">HADOOP-2855</a>
</td>
<td>
contrib/hod
</td>
<td>
HOD now handles relative
paths correctly for important HOD options such as the cluster directory,
tarball option, and script file.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2899">HADOOP-2899</a>
</td>
<td>
contrib/hod
</td>
<td>
HOD now cleans up the HOD-generated mapred system directory
at cluster deallocation time.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2982">HADOOP-2982</a>
</td>
<td>
contrib/hod
</td>
<td>
The number of free nodes in the cluster
is computed using a better algorithm that filters out inconsistencies in
node status as reported by Torque.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2947">HADOOP-2947</a>
</td>
<td>
contrib/hod
</td>
<td>
The stdout and stderr streams of
daemons are redirected to files that are created under the hadoop log
directory. Users can now send a <tt>kill -3</tt> (SIGQUIT) signal to the daemons to get stack traces
and thread dumps for debugging.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-3168">HADOOP-3168</a>
</td>
<td>
contrib/streaming
</td>
<td>
Decreased the frequency of logging
in Hadoop streaming (from every 100 records to every 10,000 records).
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-3040">HADOOP-3040</a>
</td>
<td>
contrib/streaming
</td>
<td>
Fixed a critical bug to restore important functionality in Hadoop streaming. If the first character on a line is
the separator, then an empty key is assumed and the whole line is the value.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2820">HADOOP-2820</a>
</td>
<td>
contrib/streaming
</td>
<td>
Removed these deprecated classes: <br><tt><ul><li>org.apache.hadoop.streaming.StreamLineRecordReader</li><li>org.apache.hadoop.streaming.StreamOutputFormat</li><li>org.apache.hadoop.streaming.StreamSequenceRecordReader</li></ul></tt></td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-3280">HADOOP-3280</a>
</td>
<td>
contrib/streaming
</td>
<td>
Added the
<tt>mapred.child.ulimit</tt> configuration variable to limit the maximum virtual memory allocated to processes launched by the
Map-Reduce framework. This can be used to control both the Mapper/Reducer
tasks and applications using Hadoop Pipes, Hadoop Streaming, etc.
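<p>
A hypothetical way to set the limit per job from Java; the property value and its interpretation are cluster-dependent, so consult the 0.17 configuration documentation rather than relying on this figure:
<pre>
JobConf conf = new JobConf(MyJob.class);   // MyJob is illustrative
// Illustrative value only; check how mapred.child.ulimit is interpreted
// (e.g. which unit) in the 0.17 docs before using it on a real cluster.
conf.set("mapred.child.ulimit", "1048576");
</pre>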
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2657">HADOOP-2657</a>
</td>
<td>
dfs
</td>
<td>Added the new API <tt>DFSOutputStream.flush()</tt> to
flush all outstanding data to DataNodes.
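<p>
A minimal sketch, assuming the new flush behaviour is reachable through the stream returned by <tt>FileSystem.create()</tt> (the path and data are illustrative):
<pre>
FileSystem fs = FileSystem.get(conf);
FSDataOutputStream out = fs.create(new Path("/logs/events"));
out.writeBytes("event-1\n");
out.flush();   // intended to push outstanding data to the DataNodes (HADOOP-2657)
out.close();
</pre>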
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2219">HADOOP-2219</a>
</td>
<td>
dfs
</td>
<td>
Added a new <tt>fs -count</tt> command for
counting the number of bytes, files, and directories under a given path. <br>
<br>
Added a new RPC <tt>getContentSummary(String path)</tt> to ClientProtocol.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2559">HADOOP-2559</a>
</td>
<td>
dfs
</td>
<td>
Changed DFS block placement to
allocate the first replica locally, the second off-rack, and the third
on the same rack as the second.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2758">HADOOP-2758</a>
</td>
<td>
dfs
</td>
<td>
Reduced DataNode CPU usage by 50% while serving data to clients.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2634">HADOOP-2634</a>
</td>
<td>
dfs
</td>
<td>
Deprecated ClientProtocol's <tt>exists()</tt> method. Use <tt>getFileInfo(String)</tt> instead.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2423">HADOOP-2423</a>
</td>
<td>
dfs
</td>
<td>
Improved <tt>FSDirectory.mkdirs(...)</tt> performance by about 50% as measured by the NNThroughputBenchmark.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-3124">HADOOP-3124</a>
</td>
<td>
dfs
</td>
<td>
Made the DataNode socket write timeout configurable; however, the configuration variable is undocumented.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2470">HADOOP-2470</a>
</td>
<td>
dfs
</td>
<td>
Removed the <tt>open()</tt> and <tt>isDir()</tt> methods from ClientProtocol without first deprecating them. <br>
<br>
Removed the deprecated <tt>getContentLength()</tt> from ClientProtocol.<br>
<br>
Deprecated <tt>isDirectory</tt> in DFSClient. Use <tt>getFileStatus()</tt> instead.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2854">HADOOP-2854</a>
</td>
<td>
dfs
</td>
<td>
Removed the deprecated method <tt>org.apache.hadoop.ipc.Server.getUserInfo()</tt>.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2239">HADOOP-2239</a>
</td>
<td>
dfs
</td>
<td>
Added a new FileSystem, HftpsFileSystem, that allows access to HDFS data over HTTPS.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-771">HADOOP-771</a>
</td>
<td>
dfs
</td>
<td>
Added a new method to the <tt>FileSystem</tt> API, <tt>delete(path, boolean)</tt>,
and deprecated the previous <tt>delete(path)</tt> method.
The new method deletes directories recursively only if the boolean argument is set to true.
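<p>
A minimal migration sketch (paths are illustrative):
<pre>
FileSystem fs = FileSystem.get(conf);
// Old, now deprecated:
//   fs.delete(new Path("/tmp/staging"));
fs.delete(new Path("/tmp/staging"), true);    // recursively delete a directory tree
fs.delete(new Path("/tmp/one-file"), false);  // non-recursive delete
</pre>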
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-3239">HADOOP-3239</a>
</td>
<td>
dfs
</td>
<td>
Modified <tt>org.apache.hadoop.dfs.FSDirectory.getFileInfo(String)</tt> to return null when a file is not
found instead of throwing FileNotFoundException.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-3091">HADOOP-3091</a>
</td>
<td>
dfs
</td>
<td>
Enhanced the <tt>hadoop dfs -put</tt> command to accept multiple
sources when the destination is a directory.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2192">HADOOP-2192</a>
</td>
<td>
dfs
</td>
<td>
Modified <tt>hadoop dfs -mv</tt> to be closer in functionality to
the Linux <tt>mv</tt> command by removing unnecessary output and returning
an error message when moving non-existent files/directories.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-1985">HADOOP-1985</a>
</td>
<td>
dfs <br>
mapred
</td>
<td>
Added rack awareness for map tasks and moved the rack resolution logic to the
NameNode and JobTracker. <p> The administrator can specify a
loadable class, given by <tt>topology.node.switch.mapping.impl</tt>, that implements the
logic for rack resolution. The class must implement a method
<tt>resolve(List&lt;String&gt; names)</tt>, where names is the list of
DNS names/IP addresses to be resolved. The return value is a list of
resolved network paths of the form /foo/rack, where rack is the rack ID
to which the node belongs and foo is the switch through which multiple racks are
connected, and so on. The default implementation of this class is packaged
with Hadoop and points to <tt>org.apache.hadoop.net.ScriptBasedMapping</tt>,
which loads a script that can be used for rack resolution. The
script location is configurable: it is specified by
<tt>topology.script.file.name</tt> and defaults to an empty script. When
the script name is empty, /default-rack is returned for all
DNS names/IP addresses. The loadable <tt>topology.node.switch.mapping.impl</tt> gives
administrators the flexibility to define how their site's node resolution
should happen. <br>
For mapred, one can also specify the level of the cache with respect to the number of
levels in the resolved network path; it defaults to two. This means that the
JobTracker will cache tasks at the host level and at the rack level. <br>
Known issue: task caching will not work with levels greater than 2
(beyond racks). This bug is tracked in <a href="https://issues.apache.org/jira/browse/HADOOP-3296">HADOOP-3296</a>.
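<p>
A minimal sketch of a custom rack-resolution class, assuming the <tt>org.apache.hadoop.net.DNSToSwitchMapping</tt>-style contract described above (the subnet check and rack names are purely illustrative):
<pre>
public class StaticRackMapping implements DNSToSwitchMapping {
  public List&lt;String&gt; resolve(List&lt;String&gt; names) {
    List&lt;String&gt; paths = new ArrayList&lt;String&gt;(names.size());
    for (String name : names) {
      // Map one illustrative subnet to a rack; everything else to the default rack.
      paths.add(name.startsWith("10.1.1.") ? "/dc1/rack1" : "/default-rack");
    }
    return paths;
  }
}
</pre>
The class would then be named in <tt>topology.node.switch.mapping.impl</tt> so that the NameNode and JobTracker load it at startup.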
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2063">HADOOP-2063</a>
</td>
<td>
fs
</td>
<td>
Added a new option <tt>-ignoreCrc</tt> to <tt>fs -get</tt> and <tt>fs -copyToLocal</tt>. The option causes CRC checksums to be
ignored for this command so that corrupt files may be downloaded.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-3001">HADOOP-3001</a>
</td>
<td>
fs
</td>
<td>
Added new Map/Reduce framework
counters that track the number of bytes read and written to HDFS, local,
KFS, and S3 file systems.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2027">HADOOP-2027</a>
</td>
<td>
fs
</td>
<td>
Added a new FileSystem method <tt>getFileBlockLocations</tt> to return the number of bytes in each block of a file
via a single RPC to the NameNode. Deprecated <tt>getFileCacheHints</tt>.
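<p>
A minimal sketch of querying block locations for a file; it is written against the later <tt>FileSystem</tt> API, so the exact 0.17 signature may differ slightly:
<pre>
FileStatus stat = fs.getFileStatus(new Path("/data/part-00000"));
BlockLocation[] blocks = fs.getFileBlockLocations(stat, 0, stat.getLen());
for (BlockLocation block : blocks) {
  // Offset and length of each block in the file.
  System.out.println(block.getOffset() + " " + block.getLength());
}
</pre>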
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2839">HADOOP-2839</a>
</td>
<td>
fs
</td>
<td>
Removed the deprecated method <tt>org.apache.hadoop.fs.FileSystem.globPaths()</tt>.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2563">HADOOP-2563</a>
</td>
<td>
fs
</td>
<td>
Removed the deprecated method <tt>org.apache.hadoop.fs.FileSystem.listPaths()</tt>.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-1593">HADOOP-1593</a>
</td>
<td>
fs
</td>
<td>
Modified FSShell commands to accept non-default paths. Now you can run commands like <tt>hadoop dfs -ls hdfs://remotehost1:port/path</tt>
and <tt>hadoop dfs -ls hdfs://remotehost2:port/path</tt> without changing your Hadoop config.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-3048">HADOOP-3048</a>
</td>
<td>
io
</td>
<td>
Added a new API, and a default
implementation, for converting object serializations to strings and restoring them.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-3152">HADOOP-3152</a>
</td>
<td>
io
</td>
<td>
Added a static method
<tt>MapFile.setIndexInterval(Configuration, int interval)</tt> so that Map/Reduce
jobs using <tt>MapFileOutputFormat</tt> can set the index interval.
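<p>
A minimal job-setup sketch using the method exactly as named above (<tt>MyJob</tt> and the interval value are illustrative):
<pre>
JobConf conf = new JobConf(MyJob.class);          // MyJob is illustrative
MapFile.setIndexInterval(conf, 128);              // index every 128th key
conf.setOutputFormat(MapFileOutputFormat.class);  // job output written as MapFiles
</pre>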
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-3073">HADOOP-3073</a>
</td>
<td>
ipc
</td>
<td>
<tt>SocketOutputStream.close()</tt> now closes the
underlying channel. This increases compatibility with
<tt>java.net.Socket.getOutputStream()</tt>.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-3041">HADOOP-3041</a>
</td>
<td>
mapred
</td>
<td>
Deprecated <tt>JobConf.setOutputPath</tt> and <tt>JobConf.getOutputPath</tt>.<p>
Deprecated <tt>OutputFormatBase</tt>. Added <tt>FileOutputFormat</tt>. Existing output
formats extending <tt>OutputFormatBase</tt> now extend <tt>FileOutputFormat</tt>. <p>
Added the following methods to <tt>FileOutputFormat</tt>:
<tt><ul>
<li>public static void setOutputPath(JobConf conf, Path outputDir)
<li>public static Path getOutputPath(JobConf conf)
<li>public static Path getWorkOutputPath(JobConf conf)
<li>static void setWorkOutputPath(JobConf conf, Path outputDir)
</ul></tt>
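<p>
A minimal migration sketch for job setup code (<tt>MyJob</tt> and the output path are illustrative):
<pre>
JobConf conf = new JobConf(MyJob.class);
// Old, now deprecated:
//   conf.setOutputPath(new Path("/results"));
FileOutputFormat.setOutputPath(conf, new Path("/results"));
</pre>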
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-3204">HADOOP-3204</a>
</td>
<td>
mapred
</td>
<td>
Fixed <tt>ReduceTask.LocalFSMerger</tt> to handle errors and exceptions better. Prior to this, all
exceptions except IOException would be silently ignored.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-1986">HADOOP-1986</a>
</td>
<td>
mapred
</td>
<td>
Programs that implement the raw
<tt>Mapper</tt> or <tt>Reducer</tt> interfaces will need modification to compile with this
release. For example, <p>
<pre>
class MyMapper implements Mapper {
  public void map(WritableComparable key, Writable val,
                  OutputCollector out, Reporter reporter) throws IOException {
    // ...
  }
  // ...
}
</pre>
will need to be changed to refer to the parameterized type. For example: <p>
<pre>
class MyMapper implements Mapper&lt;WritableComparable, Writable, WritableComparable, Writable&gt; {
  public void map(WritableComparable key, Writable val,
                  OutputCollector&lt;WritableComparable, Writable&gt; out,
                  Reporter reporter) throws IOException {
    // ...
  }
  // ...
}
</pre>
Similarly, implementations of the following raw interfaces will need
modification:
<tt><ul>
<li>InputFormat
<li>OutputCollector
<li>OutputFormat
<li>Partitioner
<li>RecordReader
<li>RecordWriter
</ul></tt>
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-910">HADOOP-910</a>
</td>
<td>
mapred
</td>
<td>
Reducers now perform merges of
shuffle data (both in-memory and on disk) while fetching map outputs.
Earlier, during shuffle they used to merge only the in-memory outputs.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2822">HADOOP-2822</a>
</td>
<td>
mapred
</td>
<td>
Removed the deprecated classes <tt>org.apache.hadoop.mapred.InputFormatBase</tt>
and <tt>org.apache.hadoop.mapred.PhasedFileSystem</tt>.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2817">HADOOP-2817</a>
</td>
<td>
mapred
</td>
<td>
Removed the deprecated method
<tt>org.apache.hadoop.mapred.ClusterStatus.getMaxTasks()</tt>
and the deprecated configuration property <tt>mapred.tasktracker.tasks.maximum</tt>.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2825">HADOOP-2825</a>
</td>
<td>
mapred
</td>
<td>
Removed the deprecated method
<tt>org.apache.hadoop.mapred.MapOutputLocation.getFile(FileSystem fileSys, Path
localFilename, int reduce, Progressable pingee, int timeout)</tt>.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2818">HADOOP-2818</a>
</td>
<td>
mapred
</td>
<td>
Removed the deprecated methods
<tt>org.apache.hadoop.mapred.Counters.getDisplayName(String counter)</tt> and
<tt>org.apache.hadoop.mapred.Counters.getCounterNames()</tt>.
Undeprecated the method
<tt>org.apache.hadoop.mapred.Counters.getCounter(String counterName)</tt>.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2826">HADOOP-2826</a>
</td>
<td>
mapred
</td>
<td>
Changed the signature of the method
<tt>public org.apache.hadoop.streaming.UTF8ByteArrayUtils.readLine(InputStream)</tt> to
<tt>UTF8ByteArrayUtils.readLine(LineReader, Text)</tt>. Because the old
signature was not kept as a deprecated method, any code using the old method must be changed
to use the new one.
<p>
Removed the deprecated methods <tt>org.apache.hadoop.mapred.FileSplit.getFile()</tt>
and <tt>org.apache.hadoop.mapred.LineRecordReader.readLine(InputStream in,
OutputStream out)</tt>.
<p>
Made the constructor <tt>org.apache.hadoop.mapred.LineRecordReader.LineReader(InputStream in, Configuration
conf)</tt> public.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2819">HADOOP-2819</a>
</td>
<td>
mapred
</td>
<td>
Removed these deprecated methods from <tt>org.apache.hadoop.mapred.JobConf</tt>:
<tt><ul>
<li>public Class getInputKeyClass()
<li>public void setInputKeyClass(Class theClass)
<li>public Class getInputValueClass()
<li>public void setInputValueClass(Class theClass)
</ul></tt>
and undeprecated these methods:
<tt><ul>
<li>public boolean getSpeculativeExecution()
<li>public void setSpeculativeExecution(boolean speculativeExecution)
</ul></tt>
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-3093">HADOOP-3093</a>
</td>
<td>
mapred
</td>
<td>
Added the following public methods to <tt>org.apache.hadoop.conf.Configuration</tt>:
<tt><ul>
<li>String[] Configuration.getStrings(String name, String... defaultValue)
<li>void Configuration.setStrings(String name, String... values)
</ul></tt>
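<p>
A minimal usage sketch (the property name and values are illustrative):
<pre>
conf.setStrings("myjob.input.hosts", "host1", "host2");
// Falls back to the varargs default when the property is unset.
String[] hosts = conf.getStrings("myjob.input.hosts", "localhost");
</pre>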
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2399">HADOOP-2399</a>
</td>
<td>
mapred
</td>
<td>
The key and value objects that are given
to the Combiner and Reducer are now reused between calls. This is much more
efficient, but the user cannot assume the objects are constant.
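<p>
A minimal sketch of a reducer that copies values before holding on to them (the key/value types are illustrative):
<pre>
public void reduce(Text key, Iterator&lt;IntWritable&gt; values,
                   OutputCollector&lt;Text, IntWritable&gt; output, Reporter reporter)
    throws IOException {
  List&lt;IntWritable&gt; kept = new ArrayList&lt;IntWritable&gt;();
  while (values.hasNext()) {
    // The framework may reuse the object returned by next(), so copy it
    // before keeping a reference across iterations.
    kept.add(new IntWritable(values.next().get()));
  }
  // ... use kept ...
}
</pre>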
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-3162">HADOOP-3162</a>
</td>
<td>
mapred
</td>
<td>
Deprecated the public methods <tt>org.apache.hadoop.mapred.JobConf.setInputPath(Path)</tt> and
<tt>org.apache.hadoop.mapred.JobConf.addInputPath(Path)</tt>.
<p>
Added the following public methods to <tt>org.apache.hadoop.mapred.FileInputFormat</tt>:
<tt><ul>
<li>public static void setInputPaths(JobConf job, Path... paths); <br>
<li>public static void setInputPaths(JobConf job, String commaSeparatedPaths); <br>
<li>public static void addInputPath(JobConf job, Path path); <br>
<li>public static void addInputPaths(JobConf job, String commaSeparatedPaths); <br>
</ul></tt>
Earlier code calling <tt>JobConf.setInputPath(Path)</tt> and <tt>JobConf.addInputPath(Path)</tt>
should now call <tt>FileInputFormat.setInputPaths(JobConf, Path...)</tt> and
<tt>FileInputFormat.addInputPath(JobConf, Path)</tt> respectively.
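<p>
A minimal migration sketch (paths are illustrative):
<pre>
// Old, now deprecated:
//   conf.setInputPath(new Path("/data/in"));
FileInputFormat.setInputPaths(conf, new Path("/data/in"));
FileInputFormat.addInputPath(conf, new Path("/data/more"));   // append a second input
</pre>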
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2178">HADOOP-2178</a>
</td>
<td>
mapred
</td>
<td>
Provided a new facility to
store job history on DFS. Cluster administrators can now provide either a local FS
location or a DFS location using the configuration property
<tt>mapred.job.history.location</tt> to store job history. History will also
be logged in a user-specified location if the configuration property
<tt>mapred.job.history.user.location</tt> is specified.
<p>
Removed these classes and method:
<tt><ul>
<li>org.apache.hadoop.mapred.DefaultJobHistoryParser.MasterIndex
<li>org.apache.hadoop.mapred.DefaultJobHistoryParser.MasterIndexParseListener
<li>org.apache.hadoop.mapred.DefaultJobHistoryParser.parseMasterIndex
</ul></tt>
<p>
Changed the signature of the public method
<tt>org.apache.hadoop.mapred.DefaultJobHistoryParser.parseJobTasks(File
jobHistoryFile, JobHistory.JobInfo job)</tt> to
<tt>DefaultJobHistoryParser.parseJobTasks(String jobHistoryFile,
JobHistory.JobInfo job, FileSystem fs)</tt>. <p>
Changed the signature of the public method
<tt>org.apache.hadoop.mapred.JobHistory.parseHistory(File path, Listener l)</tt>
to <tt>JobHistory.parseHistoryFromFS(String path, Listener l, FileSystem fs)</tt>.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2055">HADOOP-2055</a>
</td>
<td>
mapred
</td>
<td>
Users can now specify which paths to ignore when processing the job input directory
(in addition to the default filtering of filenames that start with "_" and ".").
To do this, two new methods were defined:
<tt><ul>
<li>FileInputFormat.setInputPathFilter(JobConf, PathFilter)
<li>FileInputFormat.getInputPathFilter(JobConf)
</ul></tt>
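<p>
A minimal sketch of an input path filter; it assumes the filter is registered by class (as in later Hadoop releases), so double-check the exact 0.17 signature:
<pre>
public static class SkipTmpFilter implements PathFilter {
  public boolean accept(Path path) {
    // Ignore temporary files left behind by upstream jobs (illustrative rule).
    return !path.getName().endsWith(".tmp");
  }
}

// In the job setup:
FileInputFormat.setInputPathFilter(conf, SkipTmpFilter.class);
</pre>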
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2116">HADOOP-2116</a>
</td>
<td>
mapred
</td>
<td>
Restructured the local job directory on the tasktracker. Users are provided with a job-specific shared directory
(<tt>mapred-local/taskTracker/jobcache/$jobid/work</tt>) for use as scratch
space, exposed through the configuration property and system property
<tt>job.local.dir</tt>. The directory <tt>../work</tt> is no longer available from the task's current working directory.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-1622">HADOOP-1622</a>
</td>
<td>
mapred
</td>
<td>
Added new command line options for the <tt>hadoop jar</tt> command:
<p>
<tt>hadoop jar -files &lt;comma-separated list of files&gt; -libjars &lt;comma-separated
list of jars&gt; -archives &lt;comma-separated list of
archives&gt; </tt>
<p>
where the options have these meanings:
<p>
<ul>
<li>The <tt>-files</tt> option lets you specify a comma-separated list of files that
will be made available in the current working directory of your task. <br>
<li>The <tt>-libjars</tt> option lets you add jars to the classpaths of the maps and
reduces. <br>
<li>The <tt>-archives</tt> option lets you pass archives as arguments; they are
unzipped/unjarred, and a link with the name of the jar/zip is created in the
task's current working directory.
</ul>
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2823">HADOOP-2823</a>
</td>
<td>
record
</td>
<td>
Removed the deprecated methods in
<tt>org.apache.hadoop.record.compiler.generated.SimpleCharStream</tt>:
<tt><ul>
<li>public int getColumn()
<li>public int getLine()
</ul></tt>
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2551">HADOOP-2551</a>
</td>
<td>
scripts
</td>
<td>
Introduced new environment variables to allow finer-grained control of the Java options passed to server and
client JVMs. See the new <tt>*_OPTS</tt> variables in <tt>conf/hadoop-env.sh</tt>.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-3099">HADOOP-3099</a>
</td>
<td>
util
</td>
<td>
Added a new <tt>-p</tt> option to <tt>distcp</tt> for preserving file and directory status:
<pre>
-p[rbugp]  Preserve status
           r: replication number
           b: block size
           u: user
           g: group
           p: permission
</pre>
The <tt>-p</tt> option alone is equivalent to <tt>-prbugp</tt>.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2821">HADOOP-2821</a>
</td>
<td>
util
</td>
<td>
Removed the deprecated classes <tt>org.apache.hadoop.util.ShellUtil</tt> and <tt>org.apache.hadoop.util.ToolBase</tt>.
</td>
</tr>
</tbody></table>
</ul>
</body></html>