Hadoop Change Log


Trunk (unreleased)
 1. HADOOP-208. Enhance MapReduce web interface, adding new pages
    for failed tasks and tasktrackers. (omalley via cutting)

 2. HADOOP-204. Tweaks to metrics package. (David Bowen via cutting)

 3. HADOOP-209. Add a MapReduce-based file copier. This will
    copy files within or between file systems in parallel.
    (Milind Bhandarkar via cutting)

 4. HADOOP-146. Fix DFS to check, when randomly generating a new block
    id, that no existing blocks already have that id.
    (Milind Bhandarkar via cutting)

 5. HADOOP-180. Make a daemon thread that does the actual task clean-ups,
    so that the main offerService thread in the TaskTracker doesn't get
    stuck and miss its heartbeat window. This was killing many task
    trackers as big jobs finished (300+ tasks per node). (omalley via cutting)

 6. HADOOP-200. Avoid transmitting the entire list of map task names to
    reduce tasks. Instead, transmit just the number of map tasks and
    henceforth refer to them by number when collecting map output.
    (omalley via cutting)
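
To make the HADOOP-200 scheme concrete, here is a minimal sketch of the
idea with invented class and method names (the real Hadoop classes
differ): a reduce task needs only the map count, since outputs can be
addressed by index rather than by name.

    // Illustrative sketch only; names are hypothetical, not Hadoop's own.
    class MapOutputLocator {
      private final int numMapTasks; // all a reduce task needs to know

      MapOutputLocator(int numMapTasks) {
        this.numMapTasks = numMapTasks;
      }

      // Map outputs are addressed by index rather than by task name.
      String outputFor(int mapIndex, int reduceIndex) {
        return "map_" + mapIndex + "/part-" + reduceIndex;
      }

      void collectAll(int reduceIndex) {
        for (int m = 0; m < numMapTasks; m++) {
          fetch(outputFor(m, reduceIndex)); // fetch() assumed: copies one file
        }
      }

      private void fetch(String path) {
        System.out.println("fetching " + path); // stand-in for the real copy
      }

      public static void main(String[] args) {
        new MapOutputLocator(3).collectAll(0);
      }
    }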

Release 0.2.1 - 2006-05-12

 1. HADOOP-199. Fix reduce progress (broken by HADOOP-182).
    (omalley via cutting)

 2. HADOOP-201. Fix 'bin/hadoop dfs -report'. (cutting)

 3. HADOOP-207. Fix JDK 1.4 incompatibility introduced by HADOOP-96.
    System.getenv() does not work in JDK 1.4. (Hairong Kuang via cutting)

Release 0.2.0 - 2006-05-05

 1. Fix HADOOP-126. 'bin/hadoop dfs -cp' now correctly copies .crc
    files. (Konstantin Shvachko via cutting)

 2. Fix HADOOP-51. Change DFS to support per-file replication counts.
    (Konstantin Shvachko via cutting)

 3. Fix HADOOP-131. Add scripts to start/stop dfs and mapred daemons.
    Use these in start/stop-all scripts. (Chris Mattmann via cutting)

 4. Stop using ssh options by default that are not yet in widely used
    versions of ssh. Folks can still enable their use by uncommenting
    a line in conf/hadoop-env.sh. (cutting)

 5. Fix HADOOP-92. Show information about all attempts to run each
    task in the web ui. (Mahadev Konar via cutting)

 6. Fix HADOOP-128. Improved DFS error handling. (Owen O'Malley via cutting)
 7. Fix HADOOP-129. Replace uses of java.io.File with new class named
    Path. This fixes bugs where java.io.File methods were called
    directly when FileSystem methods were desired, and reduces the
    likelihood of such bugs in the future. It also makes the handling
    of pathnames more consistent between local and dfs FileSystems and
    between Windows and Unix. java.io.File-based methods are still
    available for back-compatibility, but are deprecated and will be
    removed once 0.2 is released. (cutting)
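
As an illustration of the Path-based style this change introduced, a
minimal sketch follows; the method names here match later Hadoop
releases, so treat the exact signatures as assumptions rather than the
0.2 API.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class PathExample {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf); // local or DFS, per config

        // Path behaves the same on Windows and Unix and is tied to
        // FileSystem operations, unlike java.io.File.
        Path dir = new Path("/user/demo");
        Path file = new Path(dir, "data.txt");

        if (!fs.exists(dir)) {
          fs.mkdirs(dir);
        }
        System.out.println(fs.exists(file));
      }
    }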
 8. Change dfs.data.dir and mapred.local.dir to be comma-separated
    lists of directories rather than space-separated ones. This fixes
    several bugs on Windows. (cutting)
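
For example, the new list syntax can be set programmatically as below;
the directory paths are made up, and these properties would normally be
set in a site configuration file rather than in code.

    import org.apache.hadoop.conf.Configuration;

    public class DirListExample {
      public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Commas, not spaces, now separate the entries.
        conf.set("dfs.data.dir", "/disk1/dfs/data,/disk2/dfs/data");
        conf.set("mapred.local.dir", "/disk1/mapred,/disk2/mapred");
        System.out.println(conf.get("dfs.data.dir"));
      }
    }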
 9. Fix HADOOP-144. Use mapred task id for dfs client id, to
    facilitate debugging. (omalley via cutting)

10. Fix HADOOP-143. Do not line-wrap stack-traces in web ui.
    (omalley via cutting)

11. Fix HADOOP-118. In DFS, improve clean up of abandoned file
    creations. (omalley via cutting)

12. Fix HADOOP-138. Stop multiple tasks in a single heartbeat, rather
    than one per heartbeat. (Stefan via cutting)

13. Fix HADOOP-139. Remove a potential deadlock in
    LocalFileSystem.lock(). (Igor Bolotin via cutting)

14. Fix HADOOP-134. Don't hang jobs when the tasktracker is
    misconfigured to use an un-writable local directory. (omalley via cutting)

15. Fix HADOOP-115. Correct an error message. (Stack via cutting)
16. Fix HADOOP-133. Retry pings from child to parent, in case of
    (local) communication problems. Also log exit status, so that one
    can distinguish patricide from other deaths. (omalley via cutting)
17. Fix HADOOP-142. Avoid re-running a task on a host where it has
    previously failed. (omalley via cutting)

18. Fix HADOOP-148. Maintain a task failure count for each
    tasktracker and display it in the web ui. (omalley via cutting)

19. Fix HADOOP-151. Close a potential socket leak, where new IPC
    connection pools were created per configuration instance that RPCs
    use. Now a global RPC connection pool is used again, as
    originally intended. (cutting)

20. Fix HADOOP-69. Don't throw a NullPointerException when getting
    hints for a non-existing file split. (Bryan Pendelton via cutting)
21. Fix HADOOP-157. When a task that writes dfs files (e.g., a reduce
    task) failed and was retried, it would fail again and again,
    eventually failing the job. The problem was that dfs did not yet
    know that the failed task had abandoned the files, and so would not
    yet let another task create files with the same names. Dfs now
    retries file creation long enough for locks on abandoned files to
    expire. (omalley via cutting)
22. Fix HADOOP-150. Improved task names that include job
    names. (omalley via cutting)

23. Fix HADOOP-162. Fix ConcurrentModificationException when
    releasing file locks. (omalley via cutting)

24. Fix HADOOP-132. Initial check-in of new Metrics API, including
    implementations for writing metric data to a file and for sending
    it to Ganglia. (David Bowen via cutting)
25. Fix HADOOP-160. Remove some unneeded synchronization around
    time-consuming operations in the TaskTracker. (omalley via cutting)
26. Fix HADOOP-166. RPCs failed when passed subclasses of a declared
    parameter type. This is fixed by changing ObjectWritable to store
    both the declared type and the instance type for Writables. Note
    that this incompatibly changes the format of ObjectWritable and
    will render unreadable any ObjectWritables stored in files.
    Nutch only uses ObjectWritable in intermediate files, so this
    should not be a problem for Nutch. (Stefan & cutting)
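
A simplified sketch of the HADOOP-166 idea follows; this is not
Hadoop's actual ObjectWritable code, just an illustration of recording
both the declared and the runtime type so that a subclass instance
survives the round trip.

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;

    // Illustration only; Hadoop's real ObjectWritable differs.
    class TypedValueIO {
      static void write(DataOutput out, Object instance, Class<?> declared)
          throws IOException {
        out.writeUTF(declared.getName());            // declared (static) type
        out.writeUTF(instance.getClass().getName()); // actual (runtime) type
        // ... the instance's own fields would be written here ...
      }

      static Object read(DataInput in) throws Exception {
        String declared = in.readUTF(); // available for type checking
        String actual = in.readUTF();   // used to instantiate the value
        Object value =
            Class.forName(actual).getDeclaredConstructor().newInstance();
        // ... the instance's own fields would be read here ...
        return value;
      }
    }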
27. Fix HADOOP-168. MapReduce RPC protocol methods should all declare
    IOException, so that timeouts are handled appropriately.
    (omalley via cutting)

28. Fix HADOOP-169. Don't fail a reduce task if a call to the
    jobtracker to locate map outputs fails. (omalley via cutting)

29. Fix HADOOP-170. Permit FileSystem clients to examine and modify
    the replication count of individual files. Also fix a few
    replication-related bugs. (Konstantin Shvachko via cutting)
30. Permit specification of higher replication levels for job
    submission files (job.xml and job.jar). This helps with large
    clusters, since these files are read by every node. (cutting)
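
To illustrate items 29 and 30, here is a minimal sketch of adjusting a
file's replication from a client; the method names follow later Hadoop
releases and the path is hypothetical, so treat both as assumptions.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ReplicationExample {
      public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        Path jobJar = new Path("/system/job_0001/job.jar"); // hypothetical

        // Raise the replication of a widely read file so that most
        // nodes have a nearby copy.
        fs.setReplication(jobJar, (short) 10);
        System.out.println(fs.getReplication(jobJar));
      }
    }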
31. HADOOP-173. Optimize allocation of tasks with local data. (cutting)

32. HADOOP-167. Reduce the number of Configurations and JobConfs
    created. (omalley via cutting)

33. NUTCH-256. Change FileSystem#createNewFile() to create a .crc
    file. The lack of a .crc file was causing warnings. (cutting)

34. HADOOP-174. Change JobClient to not abort a job until it has
    failed to contact the job tracker five times, not just once as
    before. (omalley via cutting)

35. HADOOP-177. Change MapReduce web interface to page through tasks.
    Previously, when jobs had more than a few thousand tasks they
    could crash web browsers. (Mahadev Konar via cutting)

36. HADOOP-178. In DFS, piggyback blockwork requests from datanodes
    on heartbeat responses from the namenode. This reduces the volume
    of RPC traffic. Also move the startup delay in blockwork from the
    datanode to the namenode. This fixes a problem where restarting
    the namenode triggered a lot of unneeded replication. (Hairong
    Kuang via cutting)
37. HADOOP-183. If the DFS namenode is restarted with different
    minimum and/or maximum replication counts, existing files'
    replication counts are now automatically adjusted to be within the
    newly configured bounds. (Hairong Kuang via cutting)

38. HADOOP-186. Better error handling in TaskTracker's top-level
    loop. Also improve calculation of time to send next heartbeat.
    (omalley via cutting)

39. HADOOP-187. Add two MapReduce examples/benchmarks. One creates
    files containing random data. The second sorts the output of the
    first. (omalley via cutting)

40. HADOOP-185. Fix so that, when a task tracker times out making the
    RPC asking for a new task to run, the job tracker does not think
    that it is actually running the task returned. (omalley via cutting)

41. HADOOP-190. If a child process hangs after it has reported
    completion, its output should not be lost. (Stack via cutting)

42. HADOOP-184. Re-structure some test code to better support testing
    on a cluster. (Mahadev Konar via cutting)

43. HADOOP-191. Add streaming package, Hadoop's first contrib module.
    This permits folks to easily submit MapReduce jobs whose map and
    reduce functions are implemented by shell commands. Use
    'bin/hadoop jar build/hadoop-streaming.jar' to get details.
    (Michel Tourn via cutting)
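
As a usage illustration for the streaming package in item 43, an
invocation might look like the following; the flag names here match
later streaming releases, so treat them as assumptions for this
version:

    bin/hadoop jar build/hadoop-streaming.jar \
        -input in-dir -output out-dir \
        -mapper /bin/cat -reducer /usr/bin/wc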
44. HADOOP-189. Fix MapReduce in standalone configuration to
    correctly handle job jar files that contain a lib directory with
    nested jar files. (cutting)

45. HADOOP-65. Initial version of record I/O framework that enables
    the specification of record types and generates marshalling code
    in both Java and C++. Generated Java code implements
    WritableComparable, but is not yet otherwise used by
    Hadoop. (Milind Bhandarkar via cutting)

46. HADOOP-193. Add a MapReduce-based FileSystem benchmark.
    (Konstantin Shvachko via cutting)

47. HADOOP-194. Add a MapReduce-based FileSystem checker. This reads
    every block in every file in the filesystem. (Konstantin Shvachko
    via cutting)
48. HADOOP-182. Fix so that lost task trackers do not change the
    status of reduce tasks or completed jobs. Also fix the progress
    meter so that failed tasks are subtracted. (omalley via cutting)

49. HADOOP-96. Logging improvements. Log files are now separate from
    standard output and standard error files. Logs are now rolled.
    Logging of all DFS state changes can be enabled, to facilitate
    debugging. (Hairong Kuang via cutting)

Release 0.1.1 - 2006-04-08

 1. Added CHANGES.txt, logging all significant changes to Hadoop. (cutting)

 2. Fix MapReduceBase.close() to throw IOException, as declared in the
    Closeable interface. This permits subclasses which override this
    method to throw that exception. (cutting)

 3. Fix HADOOP-117. Pathnames were mistakenly transposed in
    JobConf.getLocalFile(), causing many mapred temporary files not to
    be removed. (Raghavendra Prabhu via cutting)

 4. Fix HADOOP-116. Clean up job submission files when jobs complete.
    (cutting)

 5. Fix HADOOP-125. Fix handling of absolute paths on Windows. (cutting)

Release 0.1.0 - 2006-04-01

 1. The first release of Hadoop.