
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html><head>
<title>Hadoop 0.17.2 Release Notes</title></head>
<body>
<font face="sans-serif">
<h1>Hadoop 0.17.2 Release Notes</h1>
The bug fixes are listed below.
<ul><a name="changes"></a>
<h2>Changes Since Hadoop 0.17.1</h2>
<ul>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3678'>HADOOP-3678</a>] - Avoid spurious exceptions logged at DataNode when clients
read from DFS.</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3760'>HADOOP-3760</a>] - Fix a bug with HDFS file close()</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3707'>HADOOP-3707</a>] - NameNode keeps a count of number of blocks scheduled
to be written to a datanode and uses it to avoid allocating more
blocks than a datanode can hold.</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3681'>HADOOP-3681</a>] - DFSClient can get into an infinite loop while closing
a file if there are some errors.</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3002'>HADOOP-3002</a>] - Hold off block removal while in safe mode.</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3685'>HADOOP-3685</a>] - Unbalanced replication target.</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3758'>HADOOP-3758</a>] - Shutdown datanode on version mismatch instead of retrying
continuously, preventing excessive logging at the namenode.</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3633'>HADOOP-3633</a>] - Correct exception handling in DataXceiveServer, and throttle
the number of xceiver threads in a data-node.</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3370'>HADOOP-3370</a>] - Ensure that the TaskTracker.runningJobs data-structure is
correctly cleaned-up on task completion.</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3813'>HADOOP-3813</a>] - Fix task-output clean-up on HDFS to use the recursive
FileSystem.delete rather than the FileUtil.fullyDelete.</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3859'>HADOOP-3859</a>] - Allow the maximum number of xceivers in the data node to
be configurable.</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3931'>HADOOP-3931</a>] - Fix a corner case in the map-side sort that causes some values
to be counted as too large, causing premature spills to disk. Some values
would also incorrectly bypass the combiner.</li>
</ul>
</ul>
<h1>Hadoop 0.17.1 Release Notes</h1>
The bug fixes are listed below.
<ul><a name="changes"></a>
<h2>Changes Since Hadoop 0.17.0</h2>
<ul>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-2159'>HADOOP-2159</a>] - Namenode stuck in safemode
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3442'>HADOOP-3442</a>] - QuickSort may get into unbounded recursion
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3472'>HADOOP-3472</a>] - MapFile.Reader getClosest() function returns incorrect results when before is true
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3475'>HADOOP-3475</a>] - MapOutputBuffer allocates 4x as much space to record capacity as intended
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3477'>HADOOP-3477</a>] - release tar.gz contains duplicate files
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3522'>HADOOP-3522</a>] - ValuesIterator.next() doesn't return a new object, thus failing many equals() tests.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3526'>HADOOP-3526</a>] - contrib/data_join doesn't work
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3550'>HADOOP-3550</a>] - Reduce tasks failing with OOM
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3565'>HADOOP-3565</a>] - JavaSerialization can throw java.io.StreamCorruptedException
</li>
</ul>
</ul>
<h1>Hadoop 0.17.0 Release Notes</h1>
These release notes include new developer- and user-facing incompatibilities, features, and major improvements. The table below is sorted by Component.
<ul><a name="changes"></a>
<h2>Changes Since Hadoop 0.16.4</h2>
<table border="1" width="100%" cellpadding="4">
<tbody><tr>
<td><b>Issue</b></td>
<td><b>Component</b></td>
<td><b>Notes</b></td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2828">HADOOP-2828</a>
</td>
<td>
conf
</td>
<td>
Remove these deprecated methods in
<tt>org.apache.hadoop.conf.Configuration</tt>:<br><tt><ul><li>
public Object getObject(String name) </li><li>
public void setObject(String name, Object value) </li><li>
public Object get(String name, Object defaultValue) </li><li>
public void set(String name, Object value)</li><li>public Iterator entries()
</li></ul></tt></td>
</tr>
<tr>
<td nowrap>
<a href="https://issues.apache.org/jira/browse/HADOOP-2410">HADOOP-2410</a>
</td>
<td>
contrib/ec2
</td>
<td>
The command <tt>hadoop-ec2
run</tt> has been replaced by <tt>hadoop-ec2 launch-cluster
&lt;group&gt; &lt;number of instances&gt;</tt>, and <tt>hadoop-ec2
start-hadoop</tt> has been removed since Hadoop is started on instance
start up. See <a href="http://wiki.apache.org/hadoop/AmazonEC2">http://wiki.apache.org/hadoop/AmazonEC2</a>
for details.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2796">HADOOP-2796</a>
</td>
<td>
contrib/hod
</td>
<td>
Added a provision to reliably detect a
failing script's exit code. When the HOD script option
returns a non-zero exit code, look for a <tt>script.exitcode</tt>
file written to the HOD cluster directory. If this file is present, it
means the script failed with the exit code given in the file.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2775">HADOOP-2775</a>
</td>
<td>
contrib/hod
</td>
<td>
Added a unit testing framework based on
pyunit to HOD. Developers contributing patches to HOD should now
contribute unit tests along with the patches when possible.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-3137">HADOOP-3137</a>
</td>
<td>
contrib/hod
</td>
<td>
The HOD version is now the same as the Hadoop version.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2855">HADOOP-2855</a>
</td>
<td>
contrib/hod
</td>
<td>
HOD now handles relative
paths correctly for important HOD options such as the cluster directory,
tarball option, and script file.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2899">HADOOP-2899</a>
</td>
<td>
contrib/hod
</td>
<td>
HOD now cleans up the HOD generated mapred system directory
at cluster deallocation time.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2982">HADOOP-2982</a>
</td>
<td>
contrib/hod
</td>
<td>
The number of free nodes in the cluster
is computed using a better algorithm that filters out inconsistencies in
node status as reported by Torque.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2947">HADOOP-2947</a>
</td>
<td>
contrib/hod
</td>
<td>
The stdout and stderr streams of
daemons are redirected to files that are created under the hadoop log
directory. Users can now send a <tt>kill -3</tt> signal to the daemons to get stack traces
and thread dumps for debugging.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-3168">HADOOP-3168</a>
</td>
<td>
contrib/streaming
</td>
<td>
Decreased the frequency of logging
in Hadoop streaming (from every 100 records to every 10,000 records).
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-3040">HADOOP-3040</a>
</td>
<td>
contrib/streaming
</td>
<td>
Fixed a critical bug to restore important functionality in Hadoop streaming. If the first character on a line is
the separator, then an empty key is assumed and the whole line is the value.
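<p>
As an illustration of this rule, a minimal stand-alone sketch of the split (not Hadoop streaming's actual implementation; the class name and the tab separator here are assumptions):
</p>

```java
public class StreamingSplit {
    // Split a streaming line into {key, value} using the given separator.
    public static String[] split(String line, char sep) {
        int i = line.indexOf(sep);
        if (i == 0) {
            // Separator is the first character: empty key, whole line is the value.
            return new String[] {"", line};
        } else if (i > 0) {
            return new String[] {line.substring(0, i), line.substring(i + 1)};
        }
        // No separator at all: the whole line is the key.
        return new String[] {line, ""};
    }

    public static void main(String[] args) {
        String[] kv = split("\tvalue only", '\t');
        System.out.println("key=[" + kv[0] + "] value=[" + kv[1] + "]");
    }
}
```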
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2820">HADOOP-2820</a>
</td>
<td>
contrib/streaming
</td>
<td>
Removed these deprecated classes: <br><tt><ul><li>org.apache.hadoop.streaming.StreamLineRecordReader</li><li>org.apache.hadoop.streaming.StreamOutputFormat</li><li>org.apache.hadoop.streaming.StreamSequenceRecordReader</li></ul></tt></td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-3280">HADOOP-3280</a>
</td>
<td>
contrib/streaming
</td>
<td>
Added the
<tt>mapred.child.ulimit</tt> configuration variable to limit the maximum virtual memory allocated to processes launched by the
Map-Reduce framework. This can be used to control both the Mapper/Reducer
tasks and applications using Hadoop pipes, Hadoop streaming etc.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2657">HADOOP-2657</a>
</td>
<td>
dfs
</td>
<td>Added the new API <tt>DFSOutputStream.flush()</tt> to
flush all outstanding data to DataNodes.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2219">HADOOP-2219</a>
</td>
<td>
dfs
</td>
<td>
Added a new <tt>fs -count</tt> command for
counting the number of bytes, files, and directories under a given path. <br>
<br>
Added a new RPC <tt>getContentSummary(String path)</tt> to ClientProtocol.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2559">HADOOP-2559</a>
</td>
<td>
dfs
</td>
<td>
Changed DFS block placement to
allocate the first replica locally, the second off-rack, and the third
intra-rack from the second.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2758">HADOOP-2758</a>
</td>
<td>
dfs
</td>
<td>
Improved DataNode CPU usage by 50% while serving data to clients.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2634">HADOOP-2634</a>
</td>
<td>
dfs
</td>
<td>
Deprecated ClientProtocol's <tt>exists()</tt> method. Use <tt>getFileInfo(String)</tt> instead.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2423">HADOOP-2423</a>
</td>
<td>
dfs
</td>
<td>
Improved <tt>FSDirectory.mkdirs(...)</tt> performance by about 50% as measured by the NNThroughputBenchmark.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-3124">HADOOP-3124</a>
</td>
<td>
dfs
</td>
<td>
Made the DataNode socket write timeout configurable; however, the configuration variable is undocumented.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2470">HADOOP-2470</a>
</td>
<td>
dfs
</td>
<td>
Removed the <tt>open()</tt> and <tt>isDir()</tt> methods from ClientProtocol without first deprecating them. <br>
<br>
Removed the deprecated <tt>getContentLength()</tt> from ClientProtocol.<br>
<br>
Deprecated <tt>isDirectory</tt> in DFSClient. Use <tt>getFileStatus()</tt> instead.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2854">HADOOP-2854</a>
</td>
<td>
dfs
</td>
<td>
Removed deprecated method <tt>org.apache.hadoop.ipc.Server.getUserInfo()</tt>.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2239">HADOOP-2239</a>
</td>
<td>
dfs
</td>
<td>
Added a new FileSystem, HftpsFileSystem, that allows access to HDFS data over HTTPS.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-771">HADOOP-771</a>
</td>
<td>
dfs
</td>
<td>
Added a new method to <tt>FileSystem</tt> API, <tt>delete(path, boolean)</tt>,
and deprecated the previous <tt>delete(path)</tt> method.
The new method recursively deletes files only if boolean is set to true.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-3239">HADOOP-3239</a>
</td>
<td>
dfs
</td>
<td>
Modified <tt>org.apache.hadoop.dfs.FSDirectory.getFileInfo(String)</tt> to return null when a file is not
found instead of throwing FileNotFoundException.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-3091">HADOOP-3091</a>
</td>
<td>
dfs
</td>
<td>
Enhanced <tt>hadoop dfs -put</tt> command to accept multiple
sources when destination is a directory.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2192">HADOOP-2192</a>
</td>
<td>
dfs
</td>
<td>
Modified <tt>hadoop dfs -mv</tt> to be closer in functionality to
the Linux <tt>mv</tt> command by removing unnecessary output and returning
an error message when moving non-existent files/directories.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-1985">HADOOP-1985</a>
</td>
<td>
dfs <br>
mapred
</td>
<td>
Added rack awareness for map tasks and moved the rack resolution logic to the
NameNode and JobTracker. <p> The administrator can specify a
loadable class via <tt>topology.node.switch.mapping.impl</tt> that
implements the logic for rack resolution. The class must implement
a method <tt>resolve(List&lt;String&gt; names)</tt>, where names is the list of
DNS names/IP addresses that we want resolved. The return value is a list of
resolved network paths of the form /foo/rack, where rack is the rack ID
to which the node belongs and foo is the switch where multiple racks are
connected, and so on. The default implementation of this class is packaged
along with Hadoop and points to org.apache.hadoop.net.ScriptBasedMapping,
which loads a script that can be used for rack resolution. The
script location is configurable: it is specified by
<tt>topology.script.file.name</tt> and defaults to an empty script. When
the script name is empty, /default-rack is returned for all
DNS names/IP addresses. The loadable <tt>topology.node.switch.mapping.impl</tt> gives
administrators the flexibility to define how their site's node resolution
should happen. <br>
For mapred, one can also specify the cache level with respect to the number of
levels in the resolved network path; it defaults to two, meaning the
JobTracker will cache tasks at the host level and at the rack level. <br>
Known issue: task caching will not work with levels greater than 2
(beyond racks). This bug is tracked in <a href="https://issues.apache.org/jira/browse/HADOOP-3296">HADOOP-3296</a>.
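<p>
To make the resolve contract above concrete, a minimal stand-alone sketch (illustrative only: the class name RackTableMapping and its host table are hypothetical; a real implementation would be the class named by topology.node.switch.mapping.impl):
</p>

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RackTableMapping {
    // Maps each DNS name/IP address to a network path of the form /switch/rack.
    public List<String> resolve(List<String> names) {
        List<String> paths = new ArrayList<String>();
        for (String name : names) {
            if (name.equals("host1.example.com")) {
                paths.add("/switch1/rack1");
            } else if (name.equals("host2.example.com")) {
                paths.add("/switch1/rack2");
            } else {
                // Mirrors the default behavior with an empty topology script.
                paths.add("/default-rack");
            }
        }
        return paths;
    }

    public static void main(String[] args) {
        RackTableMapping m = new RackTableMapping();
        System.out.println(m.resolve(Arrays.asList("host1.example.com", "unknown-host")));
    }
}
```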
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2063">HADOOP-2063</a>
</td>
<td>
fs
</td>
<td>
Added a new option <tt>-ignoreCrc</tt> to <tt>fs -get</tt> and <tt>fs -copyToLocal</tt>. The option causes CRC checksums to be
ignored for this command so that corrupt files may be downloaded.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-3001">HADOOP-3001</a>
</td>
<td>
fs
</td>
<td>
Added new Map/Reduce framework
counters that track the number of bytes read and written to HDFS, local,
KFS, and S3 file systems.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2027">HADOOP-2027</a>
</td>
<td>
fs
</td>
<td>
Added a new FileSystem method <tt>getFileBlockLocations</tt> to return the number of bytes in each block in a file
via a single rpc to the NameNode. Deprecated <tt>getFileCacheHints</tt>.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2839">HADOOP-2839</a>
</td>
<td>
fs
</td>
<td>
Removed deprecated method <tt>org.apache.hadoop.fs.FileSystem.globPaths()</tt>.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2563">HADOOP-2563</a>
</td>
<td>
fs
</td>
<td>
Removed deprecated method <tt>org.apache.hadoop.fs.FileSystem.listPaths()</tt>.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-1593">HADOOP-1593</a>
</td>
<td>
fs
</td>
<td>
Modified FSShell commands to accept non-default paths. Now you can run commands like <tt>hadoop dfs -ls hdfs://remotehost1:port/path</tt>
and <tt>hadoop dfs -ls hdfs://remotehost2:port/path</tt> without changing your Hadoop configuration.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-3048">HADOOP-3048</a>
</td>
<td>
io
</td>
<td>
Added a new API and a default
implementation to convert and restore serializations of objects to strings.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-3152">HADOOP-3152</a>
</td>
<td>
io
</td>
<td>
Added a static method
<tt>MapFile.setIndexInterval(Configuration, int interval)</tt> so that Map/Reduce
jobs using <tt>MapFileOutputFormat</tt> can set the index interval.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-3073">HADOOP-3073</a>
</td>
<td>
ipc
</td>
<td>
<tt>SocketOutputStream.close()</tt> now closes the
underlying channel. This increases compatibility with
<tt>java.net.Socket.getOutputStream</tt>.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-3041">HADOOP-3041</a>
</td>
<td>
mapred
</td>
<td>
Deprecated <tt>JobConf.setOutputPath</tt> and <tt>JobConf.getOutputPath</tt>.<p>
Deprecated <tt>OutputFormatBase</tt>. Added <tt>FileOutputFormat</tt>. Existing output
formats extending <tt>OutputFormatBase</tt> now extend <tt>FileOutputFormat</tt>. <p>
Added the following methods to <tt>FileOutputFormat</tt>:
<tt><ul>
<li>public static void setOutputPath(JobConf conf, Path outputDir)
<li>public static Path getOutputPath(JobConf conf)
<li>public static Path getWorkOutputPath(JobConf conf)
<li>static void setWorkOutputPath(JobConf conf, Path outputDir)
</ul></tt>
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-3204">HADOOP-3204</a>
</td>
<td>
mapred
</td>
<td>
Fixed <tt>ReduceTask.LocalFSMerger</tt> to handle errors and exceptions better. Previously, all
exceptions except IOException were silently ignored.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-1986">HADOOP-1986</a>
</td>
<td>
mapred
</td>
<td>
Programs that implement the raw
<tt>Mapper</tt> or <tt>Reducer</tt> interfaces will need modification to compile with this
release. For example, <p>
<pre>
class MyMapper implements Mapper {
  public void map(WritableComparable key, Writable val,
                  OutputCollector out, Reporter reporter) throws IOException {
    // ...
  }
  // ...
}
</pre>
will need to be changed to refer to the parameterized type. For example: <p>
<pre>
class MyMapper implements Mapper&lt;WritableComparable, Writable, WritableComparable, Writable&gt; {
  public void map(WritableComparable key, Writable val,
                  OutputCollector&lt;WritableComparable, Writable&gt; out,
                  Reporter reporter) throws IOException {
    // ...
  }
  // ...
}
</pre>
Similarly implementations of the following raw interfaces will need
modification:
<tt><ul>
<li>InputFormat
<li>OutputCollector
<li>OutputFormat
<li>Partitioner
<li>RecordReader
<li>RecordWriter
</ul></tt>
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-910">HADOOP-910</a>
</td>
<td>
mapred
</td>
<td>
Reducers now perform merges of
shuffle data (both in-memory and on disk) while fetching map outputs.
Previously, they merged only the in-memory outputs during the shuffle.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2822">HADOOP-2822</a>
</td>
<td>
mapred
</td>
<td>
Removed the deprecated classes <tt>org.apache.hadoop.mapred.InputFormatBase</tt>
and <tt>org.apache.hadoop.mapred.PhasedFileSystem</tt>.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2817">HADOOP-2817</a>
</td>
<td>
mapred
</td>
<td>
Removed the deprecated method
<tt>org.apache.hadoop.mapred.ClusterStatus.getMaxTasks()</tt>
and the deprecated configuration property <tt>mapred.tasktracker.tasks.maximum</tt>.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2825">HADOOP-2825</a>
</td>
<td>
mapred
</td>
<td>
Removed the deprecated method
<tt>org.apache.hadoop.mapred.MapOutputLocation.getFile(FileSystem fileSys, Path
localFilename, int reduce, Progressable pingee, int timeout)</tt>.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2818">HADOOP-2818</a>
</td>
<td>
mapred
</td>
<td>
Removed the deprecated methods
<tt>org.apache.hadoop.mapred.Counters.getDisplayName(String counter)</tt> and
<tt>org.apache.hadoop.mapred.Counters.getCounterNames()</tt>.
Undeprecated the method
<tt>org.apache.hadoop.mapred.Counters.getCounter(String counterName)</tt>.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2826">HADOOP-2826</a>
</td>
<td>
mapred
</td>
<td>
Changed the signature of the method
<tt>public org.apache.hadoop.streaming.UTF8ByteArrayUtils.readLine(InputStream)</tt> to
<tt>UTF8ByteArrayUtils.readLine(LineReader, Text)</tt>. Since the old
signature was changed rather than deprecated, any code using the old method must be updated
to use the new method.
<p>
Removed the deprecated methods <tt>org.apache.hadoop.mapred.FileSplit.getFile()</tt>
and <tt>org.apache.hadoop.mapred.LineRecordReader.readLine(InputStream in,
OutputStream out)</tt>.
<p>
Made the constructor <tt>org.apache.hadoop.mapred.LineRecordReader.LineReader(InputStream in, Configuration
conf)</tt> public.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2819">HADOOP-2819</a>
</td>
<td>
mapred
</td>
<td>
Removed these deprecated methods from <tt>org.apache.hadoop.JobConf</tt>:
<tt><ul>
<li>public Class getInputKeyClass()
<li>public void setInputKeyClass(Class theClass)
<li>public Class getInputValueClass()
<li>public void setInputValueClass(Class theClass)
</ul></tt>
and undeprecated these methods:
<tt><ul>
<li>getSpeculativeExecution()
<li>public void setSpeculativeExecution(boolean speculativeExecution)
</ul></tt>
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-3093">HADOOP-3093</a>
</td>
<td>
mapred
</td>
<td>
Added the following public methods to <tt>org.apache.hadoop.conf.Configuration</tt>:
<tt><ul>
<li>String[] Configuration.getStrings(String name, String... defaultValue)
<li>void Configuration.setStrings(String name, String... values)
</ul></tt>
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2399">HADOOP-2399</a>
</td>
<td>
mapred
</td>
<td>
The key and value objects given
to the Combiner and Reducer are now reused between calls. This is much more
efficient, but the user cannot assume the objects are constant.
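<p>
A stand-alone sketch of the resulting pitfall (illustrative only; the Holder class simulates a reused mutable object and is not a Hadoop type):
</p>

```java
import java.util.ArrayList;
import java.util.List;

public class ReuseDemo {
    static class Holder { int value; }

    // BUG: stores the same reused object each time, so every stored
    // element ends up reflecting the last value seen.
    static List<Holder> collectWrong(int[] data) {
        Holder reused = new Holder();
        List<Holder> out = new ArrayList<Holder>();
        for (int d : data) {
            reused.value = d;
            out.add(reused);
        }
        return out;
    }

    // Correct: copy the reused object before keeping a reference to it.
    static List<Holder> collectRight(int[] data) {
        Holder reused = new Holder();
        List<Holder> out = new ArrayList<Holder>();
        for (int d : data) {
            reused.value = d;
            Holder copy = new Holder();
            copy.value = reused.value;
            out.add(copy);
        }
        return out;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3};
        System.out.println(collectWrong(data).get(0).value); // 3, not 1
        System.out.println(collectRight(data).get(0).value); // 1
    }
}
```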
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-3162">HADOOP-3162</a>
</td>
<td>
mapred
</td>
<td>
Deprecated the public methods <tt>org.apache.hadoop.mapred.JobConf.setInputPath(Path)</tt> and
<tt>org.apache.hadoop.mapred.JobConf.addInputPath(Path)</tt>.
<p>
Added the following public methods to <tt>org.apache.hadoop.mapred.FileInputFormat</tt>:
<tt><ul>
<li>public static void setInputPaths(JobConf job, Path... paths); <br>
<li>public static void setInputPaths(JobConf job, String commaSeparatedPaths); <br>
<li>public static void addInputPath(JobConf job, Path path); <br>
<li>public static void addInputPaths(JobConf job, String commaSeparatedPaths); <br>
</ul></tt>
Earlier code calling <tt>JobConf.setInputPath(Path)</tt> and <tt>JobConf.addInputPath(Path)</tt>
should now call <tt>FileInputFormat.setInputPaths(JobConf, Path...)</tt> and
<tt>FileInputFormat.addInputPath(JobConf, Path)</tt> respectively.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2178">HADOOP-2178</a>
</td>
<td>
mapred
</td>
<td>
Provided a new facility to
store job history on DFS. The cluster administrator can now provide either a local FS
location or a DFS location using the configuration property
<tt>mapred.job.history.location</tt> to store job history. History will also
be logged in a user-specified location if the configuration property
<tt>mapred.job.history.user.location</tt> is specified.
  803. <p>
  804. Removed these classes and method:
  805. <tt><ul>
  806. <li>org.apache.hadoop.mapred.DefaultJobHistoryParser.MasterIndex
  807. <li>org.apache.hadoop.mapred.DefaultJobHistoryParser.MasterIndexParseListener
  808. <li>org.apache.hadoop.mapred.DefaultJobHistoryParser.parseMasterIndex
  809. </ul></tt>
  810. <p>
  811. Changed the signature of the public method
  812. <tt>org.apache.hadoop.mapred.DefaultJobHistoryParser.parseJobTasks(File
  813. jobHistoryFile, JobHistory.JobInfo job)</tt> to
  814. <tt>DefaultJobHistoryParser.parseJobTasks(String jobHistoryFile,
  815. JobHistory.JobInfo job, FileSystem fs)</tt>. <p>
  816. Changed the signature of the public method
  817. <tt>org.apache.hadoop.mapred.JobHistory.parseHistory(File path, Listener l)</tt>
  818. to <tt>JobHistory.parseHistoryFromFS(String path, Listener l, FileSystem fs)</tt>.
  819. </td>
  820. </tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2055">HADOOP-2055</a>
</td>
<td>
mapred
</td>
<td>
Users can now specify additional paths to ignore when processing the job input directory
(in addition to filenames starting with "_" or "."),
using two new methods:
<tt><ul>
<li>FileInputFormat.setInputPathFilter(JobConf, PathFilter)
<li>FileInputFormat.getInputPathFilter(JobConf)
</ul></tt>
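<p>
The filtering behavior described above can be sketched as a predicate on file names: the default rule skipping names that start with "_" or "." composes with a user-supplied filter. This is an illustrative, self-contained model, not the Hadoop <tt>PathFilter</tt> interface itself.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Sketch of input-path filtering: the default filter that skips names
// starting with "_" or "." is modeled as a Predicate on the file name,
// and a user-supplied filter composes with it. Illustrative only; not
// the actual Hadoop PathFilter API.
class InputFilterSketch {
    static final Predicate<String> HIDDEN =
        name -> !name.startsWith("_") && !name.startsWith(".");

    static List<String> listInputs(List<String> names, Predicate<String> userFilter) {
        return names.stream()
                    .filter(HIDDEN.and(userFilter))
                    .collect(Collectors.toList());
    }
}
```

With a user filter that rejects <tt>*.tmp</tt> files, a directory containing <tt>_logs</tt>, <tt>.hidden</tt>, <tt>part-00000</tt>, and <tt>skip.tmp</tt> yields only <tt>part-00000</tt>.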
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2116">HADOOP-2116</a>
</td>
<td>
mapred
</td>
<td>
Restructured the local job directory on the tasktracker. Users are provided with a job-specific shared directory
(<tt>mapred-local/taskTracker/jobcache/$jobid/work</tt>) for use as scratch
space, exposed through the configuration property and system property
<tt>job.local.dir</tt>. The directory <tt>../work</tt> is no longer available from the task's current working directory.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-1622">HADOOP-1622</a>
</td>
<td>
mapred
</td>
<td>
Added new command line options for the <tt>hadoop jar</tt> command:
<p>
<tt>hadoop jar -files &lt;comma separated list of files&gt; -libjars &lt;comma
separated list of jars&gt; -archives &lt;comma separated list of
archives&gt; </tt>
<p>
where the options have these meanings:
<p>
<ul>
<li>The <tt>-files</tt> option lets you specify a comma separated list of files that
will be placed in the current working directory of each task. <br>
<li>The <tt>-libjars</tt> option lets you add jars to the classpaths of the maps and
reduces. <br>
<li>The <tt>-archives</tt> option lets you pass archives as arguments; each archive is
unzipped/unjarred, and a link with the name of the jar/zip is created in the
current working directory of each task.
</ul>
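<p>
The shape of these options can be sketched as follows: each flag is followed by a single argument that is split on commas. This is an illustrative parser, not the option handling that ships with Hadoop.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of how -files/-libjars/-archives carry comma-separated lists:
// each flag is followed by one argument that is split on commas.
// Illustrative only; not Hadoop's actual option parser.
class JarOptionsSketch {
    static Map<String, List<String>> parse(String[] args) {
        Map<String, List<String>> opts = new HashMap<>();
        for (int i = 0; i + 1 < args.length; i++) {
            String a = args[i];
            if (a.equals("-files") || a.equals("-libjars") || a.equals("-archives")) {
                opts.put(a.substring(1), Arrays.asList(args[++i].split(",")));
            }
        }
        return opts;
    }
}
```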
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2823">HADOOP-2823</a>
</td>
<td>
record
</td>
<td>
Removed the deprecated methods in
<tt>org.apache.hadoop.record.compiler.generated.SimpleCharStream</tt>:
<tt><ul>
<li>public int getColumn()
<li>public int getLine()
</ul></tt>
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2551">HADOOP-2551</a>
</td>
<td>
scripts
</td>
<td>
Introduced new environment variables to allow finer grained control of Java options passed to server and
client JVMs. See the new <tt>*_OPTS</tt> variables in <tt>conf/hadoop-env.sh</tt>.
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-3099">HADOOP-3099</a>
</td>
<td>
util
</td>
<td>
Added a new <tt>-p</tt> option to <tt>distcp</tt> for preserving file and directory status:
<pre>
-p[rbugp]  Preserve status
           r: replication number
           b: block size
           u: user
           g: group
           p: permission
</pre>
The <tt>-p</tt> option alone is equivalent to <tt>-prbugp</tt>.
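<p>
The flag's semantics can be sketched as follows: each letter selects an attribute to preserve, and a bare <tt>-p</tt> selects all of them. This is an illustrative model of the behavior described above, not distcp's actual argument parser.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Sketch of the -p[rbugp] flag: each letter selects an attribute to
// preserve, and a bare "-p" means preserve everything (equivalent to
// -prbugp). Illustrative only; not distcp's actual parser.
class PreserveFlagSketch {
    static final String ALL = "rbugp";

    static Set<Character> parse(String flag) {
        String letters = flag.equals("-p") ? ALL : flag.substring(2);
        Set<Character> attrs = new LinkedHashSet<>();
        for (char c : letters.toCharArray()) {
            if (ALL.indexOf(c) < 0)
                throw new IllegalArgumentException("unknown attribute: " + c);
            attrs.add(c);
        }
        return attrs;
    }
}
```

For example, <tt>-pug</tt> selects only user and group, while <tt>-p</tt> selects all five attributes.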
</td>
</tr>
<tr>
<td>
<a href="https://issues.apache.org/jira/browse/HADOOP-2821">HADOOP-2821</a>
</td>
<td>
util
</td>
<td>
Removed the deprecated classes <tt>org.apache.hadoop.util.ShellUtil</tt> and <tt>org.apache.hadoop.util.ToolBase</tt>.
</td>
</tr>
</tbody></table>
</ul>
</body></html>