<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html><head>
<title>Hadoop 0.19.2</title></head>
<body>
<font face="sans-serif">
<h1>Hadoop 0.19.2 Release Notes</h1>
The bug fixes and improvements are listed below.
<ul>
<h2>Changes Since Hadoop 0.19.1</h2>
<h3> Bug
</h3>
<ul>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3998'>HADOOP-3998</a>] - Got an exception from ClientFinalizer when the JT is terminated
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-4619'>HADOOP-4619</a>] - hdfs_write infinite loop when dfs fails and cannot write files &gt; 2 GB
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-4638'>HADOOP-4638</a>] - Exception thrown in/from RecoveryManager.recover() should be caught and handled
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-4719'>HADOOP-4719</a>] - The ls shell command documentation is outdated
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-4780'>HADOOP-4780</a>] - Task Tracker burns a lot of CPU in calling getLocalCache
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-5146'>HADOOP-5146</a>] - LocalDirAllocator misses files on the local filesystem
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-5154'>HADOOP-5154</a>] - 4-way deadlock in FairShare scheduler
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-5213'>HADOOP-5213</a>] - BZip2CompressionOutputStream NullPointerException
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-5231'>HADOOP-5231</a>] - Negative number of maps in cluster summary
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-5233'>HADOOP-5233</a>] - Reducer not Succeeded after 100%
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-5241'>HADOOP-5241</a>] - Reduce tasks get stuck because of over-estimated task size (regression from 0.18)
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-5247'>HADOOP-5247</a>] - NPEs in JobTracker and JobClient when mapred.jobtracker.completeuserjobs.maximum is set to zero.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-5269'>HADOOP-5269</a>] - TaskTracker.runningTasks holding FAILED_UNCLEAN and KILLED_UNCLEAN taskStatuses forever in some cases.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-5280'>HADOOP-5280</a>] - When expiring a lost launched task, JT doesn't remove the attempt from the taskidToTIPMap.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-5285'>HADOOP-5285</a>] - JobTracker hangs for long periods of time
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-5326'>HADOOP-5326</a>] - bzip2 codec (CBZip2OutputStream) creates corrupted output file for some inputs
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-5333'>HADOOP-5333</a>] - The libhdfs append API is not coded correctly
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-5374'>HADOOP-5374</a>] - NPE in JobTracker.getTasksToSave() method
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-5376'>HADOOP-5376</a>] - JobInProgress.obtainTaskCleanupTask() throws an ArrayIndexOutOfBoundsException
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-5384'>HADOOP-5384</a>] - DataNodeCluster should not create blocks with generationStamp == 1
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-5392'>HADOOP-5392</a>] - JobTracker crashes during recovery if job files are garbled
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-5421'>HADOOP-5421</a>] - HADOOP-4638 has broken 0.19 compilation
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-5440'>HADOOP-5440</a>] - Successful taskids are not removed from TaskMemoryManager
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-5446'>HADOOP-5446</a>] - TaskTracker metrics are disabled
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-5449'>HADOOP-5449</a>] - Verify if JobHistory.HistoryCleaner works as expected
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-5454'>HADOOP-5454</a>] - SortedMapWritable: readFields() will not clear values before deserialization
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-5465'>HADOOP-5465</a>] - Blocks remain under-replicated
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-5479'>HADOOP-5479</a>] - NameNode should not send empty block replication request to DataNode
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-5522'>HADOOP-5522</a>] - Document job setup/cleanup tasks and task cleanup tasks in mapred tutorial
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-5549'>HADOOP-5549</a>] - ReplicationMonitor should schedule both replication and deletion work in one iteration
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-5551'>HADOOP-5551</a>] - Namenode permits directory destruction on overwrite
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-5554'>HADOOP-5554</a>] - DataNodeCluster should create blocks with the same generation stamp as the blocks created in CreateEditsLog
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-5557'>HADOOP-5557</a>] - Two minor problems in TestOverReplicatedBlocks
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-5579'>HADOOP-5579</a>] - libhdfs does not set errno correctly
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-5644'>HADOOP-5644</a>] - Namenode is stuck in safe mode
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-5671'>HADOOP-5671</a>] - DistCp.sameFile(..) should return true if src fs does not support checksum
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-5728'>HADOOP-5728</a>] - FSEditLog.printStatistics may cause IndexOutOfBoundsException
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-5816'>HADOOP-5816</a>] - ArrayIndexOutOfBoundsException when using KeyFieldBasedComparator
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-5951'>HADOOP-5951</a>] - StorageInfo needs Apache license header.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-6017'>HADOOP-6017</a>] - NameNode and SecondaryNameNode fail to restart because of abnormal filenames.
</li>
</ul>
<h3> Improvement
</h3>
<ul>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-5332'>HADOOP-5332</a>] - Make support for the file append API configurable (see the configuration sketch after this list)
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-5379'>HADOOP-5379</a>] - Throw an exception instead of writing to System.err when there is a CRC error on CBZip2InputStream
</li>
</ul>
</ul>
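The HADOOP-5332 change above turns append support into a cluster configuration switch. A minimal, hedged sketch of a client guarding its appends, assuming the switch is the boolean property dfs.support.append introduced by that JIRA (it defaults to off because append can lose data), and using the FileSystem.append API from HADOOP-1700; the path is illustrative:
<pre>
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class AppendIfEnabled {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Assumption: HADOOP-5332 gates append behind this boolean property,
    // which defaults to false because append can lose data in 0.19.
    if (!conf.getBoolean("dfs.support.append", false)) {
      System.err.println("Append is disabled on this cluster; skipping.");
      return;
    }
    FileSystem fs = FileSystem.get(conf);
    FSDataOutputStream out = fs.append(new Path("/logs/events.log"));
    out.writeBytes("one more record\n");
    out.close();
  }
}
</pre>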
<h1>Hadoop 0.19.1 Release Notes</h1>
Hadoop 0.19.1 fixes several problems that may lead to data loss
from the file system and makes some incompatible changes relative to Hadoop 0.19.0. For instance, the file append API has been disabled in this release due to implementation issues that can lead to data loss.
The bug fixes are listed below.
<ul>
<h2>Changes Since Hadoop 0.19.0</h2>
<h3> Bug
</h3>
<ul>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3874'>HADOOP-3874</a>] - TestFileAppend2.testComplexAppend sometimes fails
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-4061'>HADOOP-4061</a>] - Large number of decommissions freezes the Namenode
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-4420'>HADOOP-4420</a>] - JobTracker.killJob() doesn't check for the JobID being valid
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-4494'>HADOOP-4494</a>] - libhdfs does not call FileSystem.append when O_APPEND is passed to hdfsOpenFile
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-4508'>HADOOP-4508</a>] - FSDataOutputStream.getPos() == 0 when appending to an existing file, but should be the file length
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-4616'>HADOOP-4616</a>] - assertion makes fuse-dfs exit when reading incomplete data
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-4619'>HADOOP-4619</a>] - hdfs_write infinite loop when dfs fails and cannot write files &gt; 2 GB
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-4632'>HADOOP-4632</a>] - TestJobHistoryVersion should not create a directory in the current dir.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-4635'>HADOOP-4635</a>] - Memory leak?
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-4680'>HADOOP-4680</a>] - fuse-dfs - df -kh on an hdfs mount shows much less %used than the dfs UI
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-4681'>HADOOP-4681</a>] - DFSClient block read failures cause open DFSInputStream to become unusable
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-4697'>HADOOP-4697</a>] - KFS::getBlockLocations() fails with files having multiple blocks
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-4715'>HADOOP-4715</a>] - Fix quickstart.html to reflect that Hadoop works with Java 1.6.x now
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-4720'>HADOOP-4720</a>] - docs/api does not contain the hdfs directory after building
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-4727'>HADOOP-4727</a>] - Groups do not work for fuse-dfs out of the box on 0.19.0
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-4731'>HADOOP-4731</a>] - Job is not removed from the waiting jobs queue upon completion.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-4759'>HADOOP-4759</a>] - HADOOP-4654 to be fixed for branches &gt;= 0.19
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-4760'>HADOOP-4760</a>] - HDFS streams should not throw exceptions when closed twice
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-4797'>HADOOP-4797</a>] - RPC Server can leave a lot of direct buffers
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-4821'>HADOOP-4821</a>] - Usage descriptions in the Quotas guide documentation are incorrect
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-4836'>HADOOP-4836</a>] - Minor typos in documentation and comments
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-4847'>HADOOP-4847</a>] - OutputCommitter is loaded in the TaskTracker in localizeConfiguration
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-4862'>HADOOP-4862</a>] - A spurious IOException log on DataNode is not completely removed
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-4873'>HADOOP-4873</a>] - Display minMaps/Reduces on the advanced scheduler page
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-4906'>HADOOP-4906</a>] - TaskTracker running out of memory after running several tasks
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-4918'>HADOOP-4918</a>] - Fix bzip2 to work with SequenceFile
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-4924'>HADOOP-4924</a>] - Race condition in re-init of TaskTracker
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-4943'>HADOOP-4943</a>] - Fair share scheduler does not utilize all slots if the task trackers are configured heterogeneously
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-4955'>HADOOP-4955</a>] - Make DBOutputFormat use column names from setOutput(...)
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-4965'>HADOOP-4965</a>] - DFSClient should log instead of printing to stderr.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-4966'>HADOOP-4966</a>] - Setup tasks are not removed from JobTracker's taskIdToTIPMap even after the job completes
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-4967'>HADOOP-4967</a>] - Inconsistent state in JVM manager
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-4976'>HADOOP-4976</a>] - Mapper runs out of memory
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-4982'>HADOOP-4982</a>] - TestFsck does not run in Eclipse.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-4983'>HADOOP-4983</a>] - Job counters sometimes go down as tasks run without task failures
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-4992'>HADOOP-4992</a>] - TestCustomOutputCommitter fails on hadoop-0.19
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-5002'>HADOOP-5002</a>] - 2 core tests TestFileOutputFormat and TestHarFileSystem are failing in branch 19
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-5008'>HADOOP-5008</a>] - TestReplication#testPendingReplicationRetry leaves an opened fd unclosed
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-5009'>HADOOP-5009</a>] - DataNode#shutdown sometimes leaves data block scanner verification log unclosed
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-5067'>HADOOP-5067</a>] - Failed/Killed attempts column in jobdetails.jsp does not show the number of failed/killed attempts correctly
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-5134'>HADOOP-5134</a>] - FSNamesystem#commitBlockSynchronization adds under-construction block locations to blocksMap
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-5156'>HADOOP-5156</a>] - TestHeartbeatHandling uses MiniDFSCluster.getNamesystem() which does not exist in branch 0.20
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-5161'>HADOOP-5161</a>] - Accepted sockets do not get placed in DataXceiverServer#childSockets
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-5166'>HADOOP-5166</a>] - JobTracker fails to restart if recovery and ACLs are enabled
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-5192'>HADOOP-5192</a>] - Block receiver should not remove a finalized block when block replication fails
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-5193'>HADOOP-5193</a>] - SecondaryNameNode does not rollImage because of incorrect calculation of edits modification time.
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-5213'>HADOOP-5213</a>] - BZip2CompressionOutputStream NullPointerException
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-5224'>HADOOP-5224</a>] - Disable append
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-5225'>HADOOP-5225</a>] - workaround for tmp file handling on DataNodes in 0.19.1 (HADOOP-4663)
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-5233'>HADOOP-5233</a>] - Reducer not Succeeded after 100%
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-5235'>HADOOP-5235</a>] - Possible NPE in tip.kill()
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-5241'>HADOOP-5241</a>] - Reduce tasks get stuck because of over-estimated task size (regression from 0.18)
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-5269'>HADOOP-5269</a>] - TaskTracker.runningTasks holding FAILED_UNCLEAN and KILLED_UNCLEAN taskStatuses forever in some cases.
</li>
</ul>
<h3> Improvement
</h3>
<ul>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-3894'>HADOOP-3894</a>] - DFSClient should log errors better and provide better diagnostics
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-4482'>HADOOP-4482</a>] - Hadoop JMX usage makes Nagios monitoring impossible
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-4675'>HADOOP-4675</a>] - Current Ganglia metrics implementation is incompatible with Ganglia 3.1
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-4739'>HADOOP-4739</a>] - Minor enhancements to some sections of the Map/Reduce tutorial
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-4751'>HADOOP-4751</a>] - FileSystem.listStatus should report the current length of a file if it has been sync'd
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-5086'>HADOOP-5086</a>] - Trash URI semantics can be relaxed
</li>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-5127'>HADOOP-5127</a>] - FSDirectory should not have public methods.
</li>
</ul>
<h3> New Feature
</h3>
<ul>
<li>[<a href='https://issues.apache.org/jira/browse/HADOOP-5034'>HADOOP-5034</a>] - NameNode should send both replication and deletion requests to DataNode in one reply to a heartbeat
</li>
</ul>
</ul>
<h1>Hadoop 0.19.0 Release Notes</h1>
These release notes include new developer- and user-facing incompatibilities, features, and major improvements.
The table below is sorted by Component.
<ul><a name="changes">
<h2>Changes Since Hadoop 0.18.2</h2>
<table border="1">
<tr bgcolor="#DDDDDD">
<th align="left">Issue</th><th align="left">Component</th><th align="left">Notes</th>
</tr>
<tr>
<td><a href="https://issues.apache.org:443/jira/browse/HADOOP-2325">HADOOP-2325</a></td><td>build</td><td>Hadoop now requires Java 6.</td>
</tr>
<tr>
<td><a href="https://issues.apache.org:443/jira/browse/HADOOP-3730">HADOOP-3730</a></td><td>conf</td><td>Added a JobConf constructor that disables loading default configurations so as to take all default values from the JobTracker's configuration.</td>
</tr>
<tr>
<td><a href="https://issues.apache.org:443/jira/browse/HADOOP-3722">HADOOP-3722</a></td><td>conf</td><td>Changed streaming StreamJob and Submitter to implement Tool and Configurable, and to use GenericOptionsParser arguments -fs, -jt, -conf, -D, -libjars, -files, and -archives. Deprecated -jobconf, -cacheArchive, -dfs, and -additionalconfspec in streaming and pipes in favor of the generic options. Removed from streaming -config, -mapred.job.tracker, and -cluster.</td>
</tr>
<tr>
<td><a href="https://issues.apache.org:443/jira/browse/HADOOP-3646">HADOOP-3646</a></td><td>conf</td><td>Introduced support for bzip2 compressed files.</td>
</tr>
<tr>
<td><a href="https://issues.apache.org:443/jira/browse/HADOOP-3445">HADOOP-3445</a></td><td>contrib/capacity-sched</td><td>Introduced Capacity Task Scheduler.
<br/>
</td>
</tr>
<tr>
<td><a href="https://issues.apache.org:443/jira/browse/HADOOP-3719">HADOOP-3719</a></td><td>contrib/chukwa</td><td>Introduced Chukwa data collection and analysis framework.</td>
</tr>
<tr>
<td><a href="https://issues.apache.org:443/jira/browse/HADOOP-2411">HADOOP-2411</a></td><td>contrib/ec2</td><td>Added support for c1.* instance types and associated kernels for EC2.</td>
</tr>
<tr>
<td><a href="https://issues.apache.org:443/jira/browse/HADOOP-4117">HADOOP-4117</a></td><td>contrib/ec2</td><td>Changed scripts to pass the initialization script for EC2 instances at boot time (as EC2 user data) rather than embedding initialization information in the EC2 image. This change makes it easy to customize the hadoop-site.xml file for your cluster before launch, by editing the hadoop-ec2-init-remote.sh script, or by setting the environment variable USER_DATA_FILE in hadoop-ec2-env.sh to run a script of your choice.</td>
</tr>
<tr>
<td><a href="https://issues.apache.org:443/jira/browse/HADOOP-3485">HADOOP-3485</a></td><td>contrib/fuse-dfs</td><td>Introduced write support for Fuse; requires Linux kernel 2.6.15 or newer.</td>
</tr>
<tr>
<td><a href="https://issues.apache.org:443/jira/browse/HADOOP-3796">HADOOP-3796</a></td><td>contrib/fuse-dfs</td><td>Changed Fuse configuration to use mount options.</td>
</tr>
<tr>
<td><a href="https://issues.apache.org:443/jira/browse/HADOOP-4106">HADOOP-4106</a></td><td>contrib/fuse-dfs</td><td>Added time, permission and user attribute support to libhdfs.</td>
</tr>
<tr>
<td><a href="https://issues.apache.org:443/jira/browse/HADOOP-3908">HADOOP-3908</a></td><td>contrib/fuse-dfs</td><td>Improved fuse-dfs to give a better error message if libhdfs.so doesn't exist.</td>
</tr>
<tr>
<td><a href="https://issues.apache.org:443/jira/browse/HADOOP-4086">HADOOP-4086</a></td><td>contrib/hive</td><td>Added LIMIT to the Hive query language.</td>
</tr>
<tr>
<td><a href="https://issues.apache.org:443/jira/browse/HADOOP-4138">HADOOP-4138</a></td><td>contrib/hive</td><td>Introduced new SerDe library for src/contrib/hive.</td>
</tr>
<tr>
<td><a href="https://issues.apache.org:443/jira/browse/HADOOP-3601">HADOOP-3601</a></td><td>contrib/hive</td><td>Introduced Hive Data Warehouse, built on top of Hadoop, that enables structuring Hadoop files as tables and partitions and allows users to query this data through a SQL-like language using a command line interface.
<br/>
</td>
</tr>
<tr>
<td><a href="https://issues.apache.org:443/jira/browse/HADOOP-4205">HADOOP-4205</a></td><td>contrib/hive</td><td>Improved Hive metastore and ql to use the refactored SerDe library.</td>
</tr>
<tr>
<td><a href="https://issues.apache.org:443/jira/browse/HADOOP-4084">HADOOP-4084</a></td><td>contrib/hive</td><td>Introduced &quot;EXPLAIN&quot; plan for Hive.</td>
</tr>
<tr>
<td><a href="https://issues.apache.org:443/jira/browse/HADOOP-3695">HADOOP-3695</a></td><td>contrib/hod</td><td>Added an ability in HOD to start multiple workers (TaskTrackers and/or DataNodes) per node to assist testing and simulation of scale. A configuration variable ringmaster.workers_per_ring was added to specify the number of workers to start.</td>
</tr>
<tr>
<td><a href="https://issues.apache.org:443/jira/browse/HADOOP-3837">HADOOP-3837</a></td><td>contrib/streaming</td><td>Changed streaming tasks to adhere to the task timeout value specified in the job configuration.</td>
</tr>
<tr>
<td><a href="https://issues.apache.org:443/jira/browse/HADOOP-2302">HADOOP-2302</a></td><td>contrib/streaming</td><td>Introduced numerical key comparison for streaming.</td>
</tr>
<tr>
<td><a href="https://issues.apache.org:443/jira/browse/HADOOP-4430">HADOOP-4430</a></td><td>dfs</td><td>Changed reporting in the NameNode Web UI to more closely reflect the behavior of the re-balancer. Removed the no-longer-used config parameter dfs.datanode.du.pct from hadoop-default.xml.
<br/>
</td>
</tr>
<tr>
<td><a href="https://issues.apache.org:443/jira/browse/HADOOP-4281">HADOOP-4281</a></td><td>dfs</td><td>Changed the command &quot;hadoop dfsadmin -report&quot; to be consistent with the Web UI for both Namenode and Datanode reports. &quot;Total raw bytes&quot; is changed to &quot;Configured Capacity&quot;. &quot;Present Capacity&quot; is newly added to indicate the present capacity of the DFS. &quot;Remaining raw bytes&quot; is changed to &quot;DFS Remaining&quot;. &quot;Used raw bytes&quot; is changed to &quot;DFS Used&quot;. &quot;% used&quot; is changed to &quot;DFS Used%&quot;. Applications that parse command output should be reviewed.</td>
</tr>
<tr>
<td><a href="https://issues.apache.org:443/jira/browse/HADOOP-2885">HADOOP-2885</a></td><td>dfs</td><td>Restructured the package hadoop.dfs.</td>
</tr>
<tr>
<td><a href="https://issues.apache.org:443/jira/browse/HADOOP-4007">HADOOP-4007</a></td><td>dfs</td><td>Changed ClientProtocol getStatus and getListing to use the type FileStatus. Removed the type DFSFileInfo.</td>
</tr>
<tr>
<td><a href="https://issues.apache.org:443/jira/browse/HADOOP-2816">HADOOP-2816</a></td><td>dfs</td><td>Improved space reporting for the NameNode Web UI. Applications that parse the Web UI output should be reviewed.</td>
</tr>
<tr>
<td><a href="https://issues.apache.org:443/jira/browse/HADOOP-1700">HADOOP-1700</a></td><td>dfs</td><td>Introduced append operation for HDFS files.
<br/>
</td>
</tr>
<tr>
<td><a href="https://issues.apache.org:443/jira/browse/HADOOP-3938">HADOOP-3938</a></td><td>dfs</td><td>Introduced byte-space quotas for directories. The count shell command was modified to report both name and byte quotas.</td>
</tr>
<tr>
<td><a href="https://issues.apache.org:443/jira/browse/HADOOP-4176">HADOOP-4176</a></td><td>dfs</td><td>Implemented getFileChecksum(Path) in HftpFileSystem for distcp support.</td>
</tr>
<tr>
<td><a href="https://issues.apache.org:443/jira/browse/HADOOP-1869">HADOOP-1869</a></td><td>dfs</td><td>Added HDFS file access times. By default, access times will be precise to the most recent hour boundary. A configuration parameter dfs.access.time.precision (milliseconds) is used to control this precision. Setting a value of 0 will disable persisting access times for HDFS files. (See the access-time sketch after the table.)</td>
</tr>
<tr>
<td><a href="https://issues.apache.org:443/jira/browse/HADOOP-3981">HADOOP-3981</a></td><td>dfs</td><td>Implemented MD5-of-xxxMD5-of-yyyCRC32, which is a distributed file checksum algorithm for HDFS, where xxx is the number of CRCs per block and yyy is the number of bytes per CRC.
<br/>
<br/>
Changed DistCp to use the file checksum for comparing files if both the source and destination FileSystem(s) support getFileChecksum(...).</td>
</tr>
<tr>
<td><a href="https://issues.apache.org:443/jira/browse/HADOOP-3992">HADOOP-3992</a></td><td>dfs</td><td>Added a synthetic load generation facility to the test directory.</td>
</tr>
<tr>
<td><a href="https://issues.apache.org:443/jira/browse/HADOOP-3792">HADOOP-3792</a></td><td>fs</td><td>Changed the exit code from hadoop.fs.FsShell -test to match the usual Unix convention.</td>
</tr>
<tr>
<td><a href="https://issues.apache.org:443/jira/browse/HADOOP-4227">HADOOP-4227</a></td><td>fs</td><td>Removed the deprecated class org.apache.hadoop.fs.ShellCommand.</td>
</tr>
<tr>
<td><a href="https://issues.apache.org:443/jira/browse/HADOOP-3498">HADOOP-3498</a></td><td>fs</td><td>Extended file globbing alternation to cross path components. For example, {/a/b,/c/d} expands to a path that matches the files /a/b and /c/d. (See the globbing sketch after the table.)</td>
</tr>
<tr>
<td><a href="https://issues.apache.org:443/jira/browse/HADOOP-3911">HADOOP-3911</a></td><td>fs</td><td>Added a check to fsck options to make sure -files is not the first option, so as to resolve conflicts with GenericOptionsParser.</td>
</tr>
<tr>
<td><a href="https://issues.apache.org:443/jira/browse/HADOOP-3941">HADOOP-3941</a></td><td>fs</td><td>Added new FileSystem APIs: FileChecksum and FileSystem.getFileChecksum(Path). (See the checksum sketch after the table.)</td>
</tr>
<tr>
<td><a href="https://issues.apache.org:443/jira/browse/HADOOP-4466">HADOOP-4466</a></td><td>io</td><td>Ensured that SequenceFileOutputFormat isn't tied to Writables and can be used with other serialization frameworks.</td>
</tr>
<tr>
<td><a href="https://issues.apache.org:443/jira/browse/HADOOP-2664">HADOOP-2664</a></td><td>io</td><td>Introduced LZOP codec.</td>
</tr>
<tr>
<td><a href="https://issues.apache.org:443/jira/browse/HADOOP-3549">HADOOP-3549</a></td><td>libhdfs</td><td>Improved error reporting for libhdfs so permission problems now return EACCES.</td>
</tr>
<tr>
<td><a href="https://issues.apache.org:443/jira/browse/HADOOP-3963">HADOOP-3963</a></td><td>libhdfs</td><td>Modified libhdfs to return NULL or an error code when an unrecoverable error occurs rather than exiting the process.
<br/>
</td>
</tr>
<tr>
<td><a href="https://issues.apache.org:443/jira/browse/HADOOP-4190">HADOOP-4190</a></td><td>mapred</td><td>Changed the job history format to add a dot at the end of each line.</td>
</tr>
<tr>
<td><a href="https://issues.apache.org:443/jira/browse/HADOOP-3970">HADOOP-3970</a></td><td>mapred</td><td>Added getEscapedCompactString() and fromEscapedCompactString() to Counters.java to represent counters as Strings and to reconstruct the counters from the Strings.</td>
</tr>
<tr>
<td><a href="https://issues.apache.org:443/jira/browse/HADOOP-3245">HADOOP-3245</a></td><td>mapred</td><td>Introduced recovery of jobs when JobTracker restarts. This facility is off by default. Introduced config parameters mapred.jobtracker.restart.recover, mapred.jobtracker.job.history.block.size, and mapred.jobtracker.job.history.buffer.size.</td>
</tr>
<tr>
<td><a href="https://issues.apache.org:443/jira/browse/HADOOP-3829">HADOOP-3829</a></td><td>mapred</td><td>Introduced a new API, org.apache.hadoop.mapred.SkipBadRecords.setMapperMaxSkipRecords, to set the range of records to be skipped in the neighborhood of a failed record.</td>
</tr>
<tr>
<td><a href="https://issues.apache.org:443/jira/browse/HADOOP-4018">HADOOP-4018</a></td><td>mapred</td><td>Introduced a new configuration parameter, mapred.max.tasks.per.job, to specify the maximum number of tasks per job.
<br/>
<br/>
</td>
</tr>
<tr>
<td><a href="https://issues.apache.org:443/jira/browse/HADOOP-249">HADOOP-249</a></td><td>mapred</td><td>Enabled task JVMs to be reused via the job config mapred.job.reuse.jvm.num.tasks. (See the sketch after the table.)</td>
</tr>
<tr>
<td><a href="https://issues.apache.org:443/jira/browse/HADOOP-3746">HADOOP-3746</a></td><td>mapred</td><td>Introduced Fair Scheduler.</td>
</tr>
<tr>
<td><a href="https://issues.apache.org:443/jira/browse/HADOOP-153">HADOOP-153</a></td><td>mapred</td><td>Introduced record skipping where tasks fail on certain records. (org.apache.hadoop.mapred.SkipBadRecords)</td>
</tr>
<tr>
<td><a href="https://issues.apache.org:443/jira/browse/HADOOP-3652">HADOOP-3652</a></td><td>mapred</td><td>Removed deprecated org.apache.hadoop.mapred.OutputFormatBase.</td>
</tr>
<tr>
<td><a href="https://issues.apache.org:443/jira/browse/HADOOP-3702">HADOOP-3702</a></td><td>mapred</td><td>Introduced the ChainMapper and ChainReducer classes to allow composing chains of Maps and Reduces in a single Map/Reduce job, something like MAP+ REDUCE MAP*.
<br/>
</td>
</tr>
<tr>
<td><a href="https://issues.apache.org:443/jira/browse/HADOOP-3595">HADOOP-3595</a></td><td>mapred</td><td>Removed deprecated methods for mapred.combine.once functionality.</td>
</tr>
<tr>
<td><a href="https://issues.apache.org:443/jira/browse/HADOOP-3149">HADOOP-3149</a></td><td>mapred</td><td>Introduced the MultipleOutputs class so Map/Reduce jobs can write data to different output files. Each output can use a different OutputFormat. Output files are created within the job output directory. FileOutputFormat.getPathForCustomFile() creates a filename under the output directory that is named with the task ID and task type (e.g. myfile-r-00001). (See the reducer sketch after the table.)</td>
</tr>
<tr>
<td><a href="https://issues.apache.org:443/jira/browse/HADOOP-3684">HADOOP-3684</a></td><td>mapred</td><td>Allowed users to override the clone function in a subclass of the TaggedMapOutput class.</td>
</tr>
<tr>
<td><a href="https://issues.apache.org:443/jira/browse/HADOOP-3930">HADOOP-3930</a></td><td>mapred</td><td>Changed TaskScheduler to expose an API for the Web UI and Command Line Tool.</td>
</tr>
<tr>
<td><a href="https://issues.apache.org:443/jira/browse/HADOOP-3478">HADOOP-3478</a></td><td>mapred</td><td>Changed reducers to fetch maps in the same order for a given host to speed up identification of the faulty maps; reducers still randomize the host selection to distribute load.</td>
</tr>
<tr>
<td><a href="https://issues.apache.org:443/jira/browse/HADOOP-3150">HADOOP-3150</a></td><td>mapred</td><td>Moved task file promotion to the Task. When a task has finished, it will do a commit and is declared SUCCEEDED. Job cleanup is done by a separate task. The job is declared SUCCEEDED/FAILED after the cleanup task has finished. Added public classes org.apache.hadoop.mapred.JobContext, TaskAttemptContext, OutputCommitter and FileOutputCommitter. Added public APIs: public OutputCommitter getOutputCommitter() and
<br/>
public void setOutputCommitter(Class&lt;? extends OutputCommitter&gt; theClass) in org.apache.hadoop.mapred.JobConf.</td>
</tr>
<tr>
<td><a href="https://issues.apache.org:443/jira/browse/HADOOP-3019">HADOOP-3019</a></td><td>mapred</td><td>Added a partitioner that effects a total order of output data, and an input sampler for generating the partition keyset for TotalOrderPartitioner, for use when the map's input key type and distribution approximate its output.</td>
</tr>
<tr>
<td><a href="https://issues.apache.org:443/jira/browse/HADOOP-3581">HADOOP-3581</a></td><td>mapred</td><td>Added the ability to kill process trees transgressing memory limits. TaskTracker uses the configuration parameters introduced in <a href="https://issues.apache.org:443/jira/browse/HADOOP-3759" title="Provide ability to run memory intensive jobs without affecting other running tasks on the nodes"><strike>HADOOP-3759</strike></a>. In addition, mapred.tasktracker.taskmemorymanager.monitoring-interval specifies the interval for which the TT waits between cycles of monitoring tasks' memory usage, and mapred.tasktracker.procfsbasedprocesstree.sleeptime-before-sigkill specifies the time the TT waits before sending a SIGKILL to a process-tree that has overrun memory limits, after it has been sent a SIGTERM.</td>
</tr>
<tr>
<td><a href="https://issues.apache.org:443/jira/browse/HADOOP-3828">HADOOP-3828</a></td><td>mapred</td><td>Skipped records can optionally be written to HDFS. Refer to org.apache.hadoop.mapred.SkipBadRecords.setSkipOutputPath for setting the output path.</td>
</tr>
<tr>
<td><a href="https://issues.apache.org:443/jira/browse/HADOOP-4293">HADOOP-4293</a></td><td>mapred</td><td>Made Configuration Writable and renamed the old write method to writeXml. (See the sketch after the table.)</td>
</tr>
<tr>
<td><a href="https://issues.apache.org:443/jira/browse/HADOOP-3412">HADOOP-3412</a></td><td>mapred</td><td>Added the ability to choose between many schedulers, and to limit the number of running tasks per job.</td>
</tr>
<tr>
<td><a href="https://issues.apache.org:443/jira/browse/HADOOP-3667">HADOOP-3667</a></td><td>mapred</td><td>Removed the following deprecated methods from JobConf:
<br/>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;addInputPath(Path)
<br/>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;getInputPaths()
<br/>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;getMapOutputCompressionType()
<br/>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;getOutputPath()
<br/>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;getSystemDir()
<br/>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;setInputPath(Path)
<br/>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;setMapOutputCompressionType(CompressionType style)
<br/>
&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;setOutputPath(Path)</td>
</tr>
<tr>
<td><a href="https://issues.apache.org:443/jira/browse/HADOOP-3585">HADOOP-3585</a></td><td>metrics</td><td>Added FailMon as a contrib project for hardware failure monitoring and analysis, under /src/contrib/failmon. Created a User Manual and Quick Start Guide.</td>
</tr>
<tr>
<td><a href="https://issues.apache.org:443/jira/browse/HADOOP-3062">HADOOP-3062</a></td><td>metrics</td><td>Introduced additional log records for data transfers.</td>
</tr>
<tr>
<td><a href="https://issues.apache.org:443/jira/browse/HADOOP-3714">HADOOP-3714</a></td><td>scripts</td><td>Added a new contrib, bash-tab-completion, which enables bash tab completion for the bin/hadoop script. See the README file in the contrib directory for installation instructions.</td>
</tr>
<tr>
<td><a href="https://issues.apache.org:443/jira/browse/HADOOP-3889">HADOOP-3889</a></td><td>tools/distcp</td><td>Changed DistCp error messages when there is a RemoteException. Changed the corresponding return value from -999 to -3.</td>
</tr>
<tr>
<td><a href="https://issues.apache.org:443/jira/browse/HADOOP-3939">HADOOP-3939</a></td><td>tools/distcp</td><td>Added a new option, -delete, to DistCp so that files/directories that exist in dst but not in src will be deleted. It uses FsShell to do the deletion, so the trash will be used if it is enabled.</td>
</tr>
<tr>
<td><a href="https://issues.apache.org:443/jira/browse/HADOOP-3873">HADOOP-3873</a></td><td>tools/distcp</td><td>Added two new options, -filelimit &lt;n&gt; and -sizelimit &lt;n&gt;, to DistCp for limiting the total number of files and the total size in bytes, respectively.</td>
</tr>
<tr>
<td><a href="https://issues.apache.org:443/jira/browse/HADOOP-3854">HADOOP-3854</a></td><td>util</td><td>Added a configuration property hadoop.http.filter.initializers and a class org.apache.hadoop.http.FilterInitializer for supporting servlet filters. Cluster administrators can configure customized filters for their web UI.</td>
</tr>
</table>
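The HADOOP-1869 entry above records access times with configurable precision. A minimal sketch of reading one back, assuming the FileStatus.getAccessTime accessor that change added; the path argument is illustrative:
<pre>
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ShowAccessTime {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    FileStatus st = fs.getFileStatus(new Path(args[0]));
    // Milliseconds since the epoch, rounded to dfs.access.time.precision
    // (one hour by default; a precision of 0 disables access times).
    System.out.println(args[0] + " last accessed at " + st.getAccessTime());
  }
}
</pre>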
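The HADOOP-3498 cross-component alternation can be exercised through the FileSystem globbing API; a sketch, assuming the globStatus entry point available in this line of releases:
<pre>
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class GlobAcrossComponents {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    // The alternation spans whole paths, not just a single component:
    // this one pattern matches both /a/b and /c/d.
    FileStatus[] matches = fs.globStatus(new Path("{/a/b,/c/d}"));
    if (matches != null) {
      for (FileStatus st : matches) {
        System.out.println(st.getPath());
      }
    }
  }
}
</pre>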
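HADOOP-3941 and HADOOP-3981 together let tools compare files by checksum, the way the updated DistCp does. A hedged sketch of that comparison; the class name is illustrative:
<pre>
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileChecksum;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SameFileCheck {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path src = new Path(args[0]);
    Path dst = new Path(args[1]);
    // getFileChecksum may return null when a filesystem does not
    // support checksums; DistCp treats that as "cannot compare".
    FileChecksum a = src.getFileSystem(conf).getFileChecksum(src);
    FileChecksum b = dst.getFileSystem(conf).getFileChecksum(dst);
    if (a == null || b == null) {
      System.out.println("checksum unavailable; cannot compare");
    } else {
      System.out.println(a.equals(b) ? "checksums match" : "checksums differ");
    }
  }
}
</pre>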
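The HADOOP-249 JVM reuse is driven by the job property named in the table row; a short sketch (the typed JobConf setter in the comment is an assumption, not confirmed by the note):
<pre>
import org.apache.hadoop.mapred.JobConf;

public class ReuseJvms {
  public static void main(String[] args) {
    JobConf job = new JobConf();
    // 1 restores one JVM per task (the old behavior);
    // n &gt; 1 lets up to n tasks share a JVM; -1 removes the limit.
    job.setInt("mapred.job.reuse.jvm.num.tasks", -1);
    // Assumed typed equivalent: job.setNumTasksToExecutePerJvm(-1);
  }
}
</pre>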
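The HADOOP-3149 MultipleOutputs class is easiest to see in a reducer. A hedged sketch against the org.apache.hadoop.mapred.lib.MultipleOutputs API of this era; the class name, the &quot;audit&quot; stream, and the driver line in the comment are illustrative:
<pre>
import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.lib.MultipleOutputs;

// Driver side (illustrative): MultipleOutputs.addNamedOutput(job, "audit",
//     TextOutputFormat.class, Text.class, LongWritable.class);
public class AuditReducer extends MapReduceBase
    implements Reducer&lt;Text, LongWritable, Text, LongWritable&gt; {

  private MultipleOutputs mos;

  public void configure(JobConf job) {
    mos = new MultipleOutputs(job);
  }

  public void reduce(Text key, Iterator&lt;LongWritable&gt; values,
      OutputCollector&lt;Text, LongWritable&gt; output, Reporter reporter)
      throws IOException {
    long sum = 0;
    while (values.hasNext()) {
      sum += values.next().get();
    }
    if (sum &lt; 0) {
      // Lands in the job output directory as audit-r-NNNNN.
      mos.getCollector("audit", reporter).collect(key, new LongWritable(sum));
    } else {
      output.collect(key, new LongWritable(sum));
    }
  }

  public void close() throws IOException {
    mos.close();
  }
}
</pre>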
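Finally, the HADOOP-4293 rename is easy to trip over when upgrading: the XML dump that used to be write(OutputStream) is now writeXml, while the Writable write(DataOutput) carries the binary form. A minimal sketch; the key is illustrative:
<pre>
import org.apache.hadoop.conf.Configuration;

public class DumpConf {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("example.key", "example.value"); // illustrative key
    // Before HADOOP-4293 this XML dump was write(OutputStream);
    // write(...) is now the Writable binary serialization instead.
    conf.writeXml(System.out);
  }
}
</pre>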
</ul>
</body>
</html>