- Hadoop Change Log
- Trunk (unreleased changes)
- 1. HADOOP-427. Replace some uses of DatanodeDescriptor in the DFS
- web UI code with DatanodeInfo, the preferred public class.
- (Devaraj Das via cutting)
- 2. HADOOP-426. Fix streaming contrib module to work correctly on
- Solaris. This was causing nightly builds to fail.
- (Michel Tourn via cutting)
- 3. HADOOP-400. Improvements to task assignment. Tasks are no longer
- re-run on nodes where they have failed (unless no other node is
- available). Also, tasks are better load-balanced among nodes.
- (omalley via cutting)
- 4. HADOOP-324. Fix datanode to not exit when a disk is full, but
- rather simply to fail writes. (Wendy Chien via cutting)
- 5. HADOOP-434. Change smallJobsBenchmark to use standard Hadoop
- scripts. (Sanjay Dahiya via cutting)
- 6. HADOOP-453. Fix a bug in Text.setCapacity(). (siren via cutting)
- 7. HADOOP-450. Change so that input types are determined by the
- RecordReader rather than specified directly in the JobConf. This
- facilitates jobs with a variety of input types.
- WARNING: This contains incompatible API changes! The RecordReader
- interface has two new methods that all user-defined InputFormats
- must now define. Also, the values returned by TextInputFormat are
- no longer of class UTF8, but now of class Text.
- 8. HADOOP-436. Fix an error-handling bug in the web ui.
- (Devaraj Das via cutting)
- 9. HADOOP-455. Fix a bug in Text, where DEL was not permitted.
- (Hairong Kuang via cutting)
- 10. HADOOP-456. Change the DFS namenode to keep a persistent record
- of the set of known datanodes. This will be used to implement a
- "safe mode" where filesystem changes are prohibited when a
- critical percentage of the datanodes are unavailable.
- (Konstantin Shvachko via cutting)
- 11. HADOOP-322. Add a job control utility. This permits one to
- specify job interdependencies. Each job is submitted only after
- the jobs it depends on have successfully completed.
- (Runping Qi via cutting)
- 12. HADOOP-176. Fix a bug in IntWritable.Comparator.
- (Dick King via cutting)
- 13. HADOOP-421. Replace uses of String in recordio package with Text
- class, for improved handling of UTF-8 data.
- (Milind Bhandarkar via cutting)
- 14. HADOOP-464. Improved error message when job jar not found.
- (Michel Tourn via cutting)
- 15. HADOOP-469. Fix /bin/bash specifics that have crept into our
- /bin/sh scripts since HADOOP-352.
- (Jean-Baptiste Quenot via cutting)
- 16. HADOOP-468. Add HADOOP_NICENESS environment variable to set
- scheduling priority for daemons. (Vetle Roeim via cutting)
- 17. HADOOP-473. Fix TextInputFormat to correctly handle more EOL
- formats. Things now work correctly with CR, LF or CRLF.
- (Dennis Kubes & James White via cutting)
- 18. HADOOP-461. Make Java 1.5 an explicit requirement. (cutting)
- 19. HADOOP-54. Add block compression to SequenceFile. One may now
- specify that blocks of keys and values are compressed together,
- improving compression for small keys and values.
- SequenceFile.Writer's constructor is now deprecated and replaced
- with a factory method. (Arun C Murthy via cutting)
- 20. HADOOP-281. Prohibit DFS files that are also directories.
- (Wendy Chien via cutting)
- 21. HADOOP-486. Add the job username to JobStatus instances returned
- by JobClient. (Mahadev Konar via cutting)
- 22. HADOOP-437. contrib/streaming: Add support for gzipped inputs.
- (Michel Tourn via cutting)
- 23. HADOOP-463. Add variable expansion to config files.
- Configuration property values may now contain variable
- expressions. A variable is referenced with the syntax
- '${variable}'. Variable values are found first in the
- configuration, and then in Java system properties. The default
- configuration is modified so that temporary directories are now
- under ${hadoop.tmp.dir}, which is, by default,
- /tmp/hadoop-${user.name}. (Michel Tourn via cutting)
- 24. HADOOP-419. Fix a NullPointerException finding the ClassLoader
- when using libhdfs. (omalley via cutting)
- 25. HADOOP-460. Fix contrib/smallJobsBenchmark to use Text instead of
- UTF8. (Sanjay Dahiya via cutting)
- 26. HADOOP-196. Fix Configuration(Configuration) constructor to work
- correctly. (Sami Siren via cutting)
- 27. HADOOP-501. Fix Configuration.toString() to handle URL resources.
- (Thomas Friol via cutting)
- 28. HADOOP-499. Reduce the use of Strings in contrib/streaming,
- replacing them with Text for better performance.
- (Hairong Kuang via cutting)
- 29. HADOOP-64. Manage multiple volumes with a single DataNode.
- Previously DataNode would create a separate daemon per configured
- volume, each with its own connection to the NameNode. Now all
- volumes are handled by a single DataNode daemon, reducing the load
- on the NameNode. (Milind Bhandarkar via cutting)
- 30. HADOOP-424. Fix MapReduce so that jobs which generate zero splits
- do not fail. (Frédéric Bertin via cutting)
- 31. HADOOP-408. Adjust some timeouts and remove some others so that
- unit tests run faster. (cutting)
- 32. HADOOP-507. Fix an IllegalAccessException in DFS.
- (omalley via cutting)
- 33. HADOOP-320. Fix so that checksum files are correctly copied when
- the destination of a file copy is a directory.
- (Hairong Kuang via cutting)
- 34. HADOOP-286. In DFSClient, avoid pinging the NameNode with
- renewLease() calls when no files are being written.
- (Konstantin Shvachko via cutting)
- 35. HADOOP-312. Close idle IPC connections. All IPC connections were
- cached forever. Now, after a connection has been idle for more
- than a configurable amount of time (one second by default), the
- connection is closed, conserving resources on both client and
- server. (Devaraj Das via cutting)
- 36. HADOOP-497. Permit the specification of the network interface and
- nameserver to be used when determining the local hostname
- advertised by datanodes and tasktrackers.
- (Lorenzo Thione via cutting)
- 37. HADOOP-441. Add a compression codec API and extend SequenceFile
- to use it. This will permit the use of alternate compression
- codecs in SequenceFile. (Arun C Murthy via cutting)
- 38. HADOOP-483. Improvements to libhdfs build and documentation.
- (Arun C Murthy via cutting)
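The lookup order described for HADOOP-463 above (configuration first, then Java system properties) can be illustrated with a small Python sketch. This is not Hadoop's implementation: the function name `expand`, the substitution cap, and the choice to stop at the first unknown variable are all assumptions made for illustration, and the system properties are modeled as a plain dictionary.

```python
import re

# Matches ${name}; this sketch assumes names contain no '}', '$' or whitespace.
_VARIABLE = re.compile(r"\$\{([^}$\s]+)\}")

# Hypothetical cap on substitutions, so a self-referencing value cannot loop forever.
MAX_SUBSTITUTIONS = 20


def expand(value, conf, system_props):
    """Expand ${variable} references in value.

    Each variable is looked up in conf first and in system_props second,
    following the lookup order given in the changelog entry.
    """
    substitutions = 0
    while True:
        match = _VARIABLE.search(value)
        if match is None:
            return value  # nothing left to expand
        if substitutions == MAX_SUBSTITUTIONS:
            raise ValueError("too many substitutions while expanding: " + value)
        name = match.group(1)
        if name in conf:
            replacement = conf[name]
        elif name in system_props:
            replacement = system_props[name]
        else:
            return value  # unknown variable: stop and leave the text as written
        # Splice the replacement in; it is rescanned on the next pass,
        # so nested references such as ${hadoop.tmp.dir} are resolved too.
        value = value[:match.start()] + replacement + value[match.end():]
        substitutions += 1
```

With a configuration that maps hadoop.tmp.dir to /tmp/hadoop-${user.name} and a system property user.name set to alice, expanding "${hadoop.tmp.dir}/dfs" in this sketch gives /tmp/hadoop-alice/dfs.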
- Release 0.5.0 - 2006-08-04
- 1. HADOOP-352. Fix shell scripts to use /bin/sh instead of
- /bin/bash, for better portability.
- (Jean-Baptiste Quenot via cutting)
- 2. HADOOP-313. Permit task state to be saved so that single tasks
- may be manually re-executed when debugging. (omalley via cutting)
- 3. HADOOP-339. Add method to JobClient API listing jobs that are
- not yet complete, i.e., that are queued or running.
- (Mahadev Konar via cutting)
- 4. HADOOP-355. Updates to the streaming contrib module, including
- API fixes, making reduce optional, and adding an input type for
- StreamSequenceRecordReader. (Michel Tourn via cutting)
- 5. HADOOP-358. Fix a NPE bug in Path.equals().
- (Frédéric Bertin via cutting)
- 6. HADOOP-327. Fix ToolBase to not call System.exit() when
- exceptions are thrown. (Hairong Kuang via cutting)
- 7. HADOOP-359. Permit map output to be compressed.
- (omalley via cutting)
- 8. HADOOP-341. Permit input URI to CopyFiles to use the HTTP
- protocol. This lets one, e.g., more easily copy log files into
- DFS. (Arun C Murthy via cutting)
- 9. HADOOP-361. Remove unix dependencies from streaming contrib
- module tests, making them pure java. (Michel Tourn via cutting)
- 10. HADOOP-354. Make public methods to stop DFS daemons.
- (Barry Kaplan via cutting)
- 11. HADOOP-252. Add versioning to RPC protocols.
- (Milind Bhandarkar via cutting)
- 12. HADOOP-356. Add contrib to "compile" and "test" build targets, so
- that this code is better maintained. (Michel Tourn via cutting)
- 13. HADOOP-307. Add smallJobsBenchmark contrib module. This runs
- lots of small jobs, in order to determine per-task overheads.
- (Sanjay Dahiya via cutting)
- 14. HADOOP-342. Add a tool for log analysis: Logalyzer.
- (Arun C Murthy via cutting)
- 15. HADOOP-347. Add web-based browsing of DFS content. The namenode
- redirects browsing requests to datanodes. Content requests are
- redirected to datanodes where the data is local when possible.
- (Devaraj Das via cutting)
- 16. HADOOP-351. Make Hadoop IPC kernel independent of Jetty.
- (Devaraj Das via cutting)
- 17. HADOOP-237. Add metric reporting to DFS and MapReduce. With only
- minor configuration changes, one can now monitor many Hadoop
- system statistics using Ganglia or other monitoring systems.
- (Milind Bhandarkar via cutting)
- 18. HADOOP-376. Fix datanode's HTTP server to scan for a free port.
- (omalley via cutting)
- 19. HADOOP-260. Add --config option to shell scripts, specifying an
- alternate configuration directory. (Milind Bhandarkar via cutting)
- 20. HADOOP-381. Permit developers to save the temporary files for
- tasks whose names match a regular expression, to facilitate
- debugging. (omalley via cutting)
- 21. HADOOP-344. Fix some Windows-related problems with DF.
- (Konstantin Shvachko via cutting)
- 22. HADOOP-380. Fix reduce tasks to poll less frequently for map
- outputs. (Mahadev Konar via cutting)
- 23. HADOOP-321. Refactor DatanodeInfo, in preparation for
- HADOOP-306. (Konstantin Shvachko & omalley via cutting)
- 24. HADOOP-385. Fix some bugs in record io code generation.
- (Milind Bhandarkar via cutting)
- 25. HADOOP-302. Add new Text class to replace UTF8, removing
- limitations of that class. Also refactor utility methods for
- writing zero-compressed integers (VInts and VLongs).
- (Hairong Kuang via cutting)
- 26. HADOOP-335. Refactor DFS namespace/transaction logging in
- namenode. (Konstantin Shvachko via cutting)
- 27. HADOOP-375. Fix handling of the datanode HTTP daemon's port so
- that multiple datanodes can be run on a single host.
- (Devaraj Das via cutting)
- 28. HADOOP-386. When removing excess DFS block replicas, remove those
- on nodes with the least free space first.
- (Johan Oskarson via cutting)
- 29. HADOOP-389. Fix intermittent failures of mapreduce unit tests.
- Also fix some build dependencies.
- (Mahadev & Konstantin via cutting)
- 30. HADOOP-362. Fix a problem where jobs hang when status messages
- are received out-of-order. (omalley via cutting)
- 31. HADOOP-394. Change order of DFS shutdown in unit tests to
- minimize errors logged. (Konstantin Shvachko via cutting)
- 32. HADOOP-396. Make DatanodeID implement Writable.
- (Konstantin Shvachko via cutting)
- 33. HADOOP-377. Permit one to add URL resources to a Configuration.
- (Jean-Baptiste Quenot via cutting)
- 34. HADOOP-345. Permit iteration over Configuration key/value pairs.
- (Michel Tourn via cutting)
- 35. HADOOP-409. Streaming contrib module: make configuration
- properties available to commands as environment variables.
- (Michel Tourn via cutting)
- 36. HADOOP-369. Add -getmerge option to dfs command that appends all
- files in a directory into a single local file.
- (Johan Oskarson via cutting)
- 37. HADOOP-410. Replace some TreeMaps with HashMaps in DFS, for
- a 17% performance improvement. (Milind Bhandarkar via cutting)
- 38. HADOOP-411. Add unit tests for command line parser.
- (Hairong Kuang via cutting)
- 39. HADOOP-412. Add MapReduce input formats that support filtering
- of SequenceFile data, including sampling and regex matching.
- Also, move JobConf.newInstance() to a new utility class.
- (Hairong Kuang via cutting)
- 40. HADOOP-226. Fix fsck command to properly consider replication
- counts, now that these can vary per file. (Bryan Pendleton via cutting)
- 41. HADOOP-425. Add a Python MapReduce example, using Jython.
- (omalley via cutting)
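The selection rule in HADOOP-386 above (remove excess replicas from the nodes with the least free space first) can be sketched as follows. This is only an illustration under stated assumptions: the function name, the dictionary of free space per node, and the tie-break by node name are hypothetical and are not taken from the DFS code.

```python
def choose_replicas_to_remove(free_space_by_node, target_replication):
    """Return the nodes whose replica should be dropped, fullest nodes first.

    free_space_by_node maps each node holding a replica to its free bytes.
    """
    excess = len(free_space_by_node) - target_replication
    if excess <= 0:
        return []  # the block is not over-replicated
    # Sort by free space ascending; ties are broken by node name so the
    # result is deterministic (an assumption of this sketch).
    fullest_first = sorted(
        free_space_by_node, key=lambda node: (free_space_by_node[node], node)
    )
    return fullest_first[:excess]
```

Dropping replicas from the fullest nodes frees space where it is scarcest, which is the effect the entry describes.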
- Release 0.4.0 - 2006-06-28
- 1. HADOOP-298. Improved progress reports for CopyFiles utility, the
- distributed file copier. (omalley via cutting)
- 2. HADOOP-299. Fix the task tracker, permitting multiple jobs to
- more easily execute at the same time. (omalley via cutting)
- 3. HADOOP-250. Add an HTTP user interface to the namenode, running
- on port 50070. (Devaraj Das via cutting)
- 4. HADOOP-123. Add MapReduce unit tests that run a jobtracker and
- tasktracker, greatly increasing code coverage.
- (Milind Bhandarkar via cutting)
- 5. HADOOP-271. Add links from jobtracker's web ui to tasktracker's
- web ui. Also attempt to log a thread dump of child processes
- before they're killed. (omalley via cutting)
- 6. HADOOP-210. Change RPC server to use a selector instead of a
- thread per connection. This should make it easier to scale to
- larger clusters. Note that this incompatibly changes the RPC
- protocol: clients and servers must both be upgraded to the new
- version to ensure correct operation. (Devaraj Das via cutting)
- 7. HADOOP-311. Change DFS client to retry failed reads, so that a
- single read failure will not alone cause failure of a task.
- (omalley via cutting)
- 8. HADOOP-314. Remove the "append" phase when reducing. Map output
- files are now directly passed to the sorter, without first
- appending them into a single file. Now, the first third of reduce
- progress is "copy" (transferring map output to reduce nodes), the
- middle third is "sort" (sorting map output) and the last third is
- "reduce" (generating output). Long-term, the "sort" phase will
- also be removed. (omalley via cutting)
- 9. HADOOP-316. Fix a potential deadlock in the jobtracker.
- (omalley via cutting)
- 10. HADOOP-319. Fix FileSystem.close() to remove the FileSystem
- instance from the cache. (Hairong Kuang via cutting)
- 11. HADOOP-135. Fix potential deadlock in JobTracker by acquiring
- locks in a consistent order. (omalley via cutting)
- 12. HADOOP-278. Check for existence of input directories before
- starting MapReduce jobs, making it easier to debug this common
- error. (omalley via cutting)
- 13. HADOOP-304. Improve error message for
- UnregisteredDatanodeException to include expected node name.
- (Konstantin Shvachko via cutting)
- 14. HADOOP-305. Fix TaskTracker to ask for new tasks as soon as a
- task is finished, rather than waiting for the next heartbeat.
- This improves performance when tasks are short.
- (Mahadev Konar via cutting)
- 15. HADOOP-59. Add support for generic command line options. One may
- now specify the filesystem (-fs), the MapReduce jobtracker (-jt),
- a config file (-conf) or any configuration property (-D). The
- "dfs", "fsck", "job", and "distcp" commands currently support
- this, with more to be added. (Hairong Kuang via cutting)
- 16. HADOOP-296. Permit specification of the amount of reserved space
- on a DFS datanode. One may specify both the percentage free and
- the number of bytes. (Johan Oskarson via cutting)
- 17. HADOOP-325. Fix a problem initializing RPC parameter classes, and
- remove the workaround used to initialize classes.
- (omalley via cutting)
- 18. HADOOP-328. Add an option to the "distcp" command to ignore read
- errors while copying. (omalley via cutting)
- 19. HADOOP-27. Don't allocate tasks to trackers whose local free
- space is too low. (Johan Oskarson via cutting)
- 20. HADOOP-318. Keep slow DFS output from causing task timeouts.
- This incompatibly changes some public interfaces, adding a
- parameter to OutputFormat.getRecordWriter() and the new method
- Reporter.progress(), but it makes lots of tasks succeed that were
- previously failing. (Milind Bhandarkar via cutting)
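The progress split described in HADOOP-314 above (first third "copy", middle third "sort", last third "reduce") amounts to a small piece of arithmetic. The sketch below is illustrative only: the function name and the idea of passing a per-phase fraction are assumptions, not the MapReduce API.

```python
# Phase order as listed in the changelog entry; each phase is one third of the total.
PHASES = ("copy", "sort", "reduce")


def reduce_progress(phase, fraction_done):
    """Map progress within one phase (0.0 to 1.0) to overall reduce progress."""
    if not 0.0 <= fraction_done <= 1.0:
        raise ValueError("fraction_done must be between 0.0 and 1.0")
    index = PHASES.index(phase)  # raises ValueError for an unknown phase
    # Completed phases contribute a full share; the current one a partial share.
    return (index + fraction_done) / len(PHASES)
```

For example, a reduce task halfway through its "sort" phase reports (1 + 0.5) / 3 = 0.5 overall.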
- Release 0.3.2 - 2006-06-09
- 1. HADOOP-275. Update the streaming contrib module to use log4j for
- its logging. (Michel Tourn via cutting)
- 2. HADOOP-279. Provide defaults for log4j logging parameters, so
- that things still work reasonably when Hadoop-specific system
- properties are not provided. (omalley via cutting)
- 3. HADOOP-280. Fix a typo in AllTestDriver which caused the wrong
- test to be run when "DistributedFSCheck" was specified.
- (Konstantin Shvachko via cutting)
- 4. HADOOP-240. DFS's mkdirs() implementation no longer logs a warning
- when the directory already exists. (Hairong Kuang via cutting)
- 5. HADOOP-285. Fix DFS datanodes to be able to re-join the cluster
- after the connection to the namenode is lost. (omalley via cutting)
- 6. HADOOP-277. Fix a race condition when creating directories.
- (Sameer Paranjpye via cutting)
- 7. HADOOP-289. Improved exception handling in DFS datanode.
- (Konstantin Shvachko via cutting)
- 8. HADOOP-292. Fix client-side logging to go to standard error
- rather than standard output, so that it can be distinguished from
- application output. (omalley via cutting)
- 9. HADOOP-294. Fixed bug where conditions for retrying after errors
- in the DFS client were reversed. (omalley via cutting)
- Release 0.3.1 - 2006-06-05
- 1. HADOOP-272. Fix a bug in bin/hadoop setting log
- parameters. (omalley & cutting)
- 2. HADOOP-274. Change applications to log to standard output rather
- than to a rolling log file like daemons. (omalley via cutting)
- 3. HADOOP-262. Fix reduce tasks to report progress while they're
- waiting for map outputs, so that they do not time out.
- (Mahadev Konar via cutting)
- 4. HADOOP-245 and HADOOP-246. Improvements to record io package.
- (Mahadev Konar via cutting)
- 5. HADOOP-276. Add logging config files to jar file so that they're
- always found. (omalley via cutting)
- Release 0.3.0 - 2006-06-02
- 1. HADOOP-208. Enhance MapReduce web interface, adding new pages
- for failed tasks, and tasktrackers. (omalley via cutting)
- 2. HADOOP-204. Tweaks to metrics package. (David Bowen via cutting)
- 3. HADOOP-209. Add a MapReduce-based file copier. This will
- copy files within or between file systems in parallel.
- (Milind Bhandarkar via cutting)
- 4. HADOOP-146. Fix DFS to check when randomly generating a new block
- id that no existing blocks already have that id.
- (Milind Bhandarkar via cutting)
- 5. HADOOP-180. Make a daemon thread that does the actual task cleanups, so
- that the main offerService thread in the taskTracker doesn't get stuck
- and miss its heartbeat window. This was killing many task trackers as
- big jobs finished (300+ tasks / node). (omalley via cutting)
- 6. HADOOP-200. Avoid transmitting entire list of map task names to
- reduce tasks. Instead just transmit the number of map tasks and
- henceforth refer to them by number when collecting map output.
- (omalley via cutting)
- 7. HADOOP-219. Fix a NullPointerException when handling a checksum
- exception under SequenceFile.Sorter.sort(). (cutting & stack)
- 8. HADOOP-212. Permit alteration of the file block size in DFS. The
- default block size for new files may now be specified in the
- configuration with the dfs.block.size property. The block size
- may also be specified when files are opened.
- (omalley via cutting)
- 9. HADOOP-218. Avoid accessing configuration while looping through
- tasks in JobTracker. (Mahadev Konar via cutting)
- 10. HADOOP-161. Add hashCode() method to DFS's Block.
- (Milind Bhandarkar via cutting)
- 11. HADOOP-115. Map output types may now be specified. These are also
- used as reduce input types, thus permitting reduce input types to
- differ from reduce output types. (Runping Qi via cutting)
- 12. HADOOP-216. Add task progress to task status page.
- (Bryan Pendleton via cutting)
- 13. HADOOP-233. Add web server to task tracker that shows running
- tasks and logs. Also add log access to job tracker web interface.
- (omalley via cutting)
- 14. HADOOP-205. Incorporate pending tasks into tasktracker load
- calculations. (Mahadev Konar via cutting)
- 15. HADOOP-247. Fix sort progress to better handle exceptions.
- (Mahadev Konar via cutting)
- 16. HADOOP-195. Improve performance of the transfer of map outputs to
- reduce nodes by performing multiple transfers in parallel, each on
- a separate socket. (Sameer Paranjpye via cutting)
- 17. HADOOP-251. Fix task processes to be tolerant of failed progress
- reports to their parent process. (omalley via cutting)
- 18. HADOOP-325. Improve the FileNotFound exceptions thrown by
- LocalFileSystem to include the name of the file.
- (Benjamin Reed via cutting)
- 19. HADOOP-254. Use HTTP to transfer map output data to reduce
- nodes. This, together with HADOOP-195, greatly improves the
- performance of these transfers. (omalley via cutting)
- 20. HADOOP-163. Cause datanodes that are unable to either read or
- write data to exit, so that the namenode will no longer target
- them for new blocks and will replicate their data on other nodes.
- (Hairong Kuang via cutting)
- 21. HADOOP-222. Add a -setrep option to the dfs commands that alters
- file replication levels. (Johan Oskarson via cutting)
- 22. HADOOP-75. In DFS, only check for a complete file when the file
- is closed, rather than as each block is written.
- (Milind Bhandarkar via cutting)
- 23. HADOOP-124. Change DFS so that datanodes are identified by a
- persistent ID rather than by host and port. This solves a number
- of filesystem integrity problems, when, e.g., datanodes are
- restarted. (Konstantin Shvachko via cutting)
- 24. HADOOP-256. Add a C API for DFS. (Arun C Murthy via cutting)
- 25. HADOOP-211. Switch to use the Jakarta Commons logging internally,
- configured to use log4j by default. (Arun C Murthy and cutting)
- 26. HADOOP-265. Tasktracker now fails to start if it does not have a
- writable local directory for temporary files. In this case, it
- logs a message to the JobTracker and exits. (Hairong Kuang via cutting)
- 27. HADOOP-270. Fix potential deadlock in datanode shutdown.
- (Hairong Kuang via cutting)
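The check described in HADOOP-146 above (make sure a randomly generated block id is not already in use) can be sketched as a retry loop. The names below and the 64-bit id width are assumptions for illustration; this is not the DFS code.

```python
import random


def new_block_id(existing_ids, rng=random):
    """Draw random ids until one is found that no existing block uses.

    existing_ids is any container supporting `in`; rng only needs a
    getrandbits(n) method, so a seeded random.Random can be passed in tests.
    """
    while True:
        candidate = rng.getrandbits(64)  # assumed id width for this sketch
        if candidate not in existing_ids:
            return candidate
```

Collisions are rare with a large id space, so the loop almost always finishes on the first draw; the check only matters in the unlikely case of a repeat.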
- Release 0.2.1 - 2006-05-12
- 1. HADOOP-199. Fix reduce progress (broken by HADOOP-182).
- (omalley via cutting)
- 2. HADOOP-201. Fix 'bin/hadoop dfs -report'. (cutting)
- 3. HADOOP-207. Fix JDK 1.4 incompatibility introduced by HADOOP-96.
- System.getenv() does not work in JDK 1.4. (Hairong Kuang via cutting)
- Release 0.2.0 - 2006-05-05
- 1. Fix HADOOP-126. 'bin/hadoop dfs -cp' now correctly copies .crc
- files. (Konstantin Shvachko via cutting)
- 2. Fix HADOOP-51. Change DFS to support per-file replication counts.
- (Konstantin Shvachko via cutting)
- 3. Fix HADOOP-131. Add scripts to start/stop dfs and mapred daemons.
- Use these in start/stop-all scripts. (Chris Mattmann via cutting)
- 4. Stop using ssh options by default that are not yet in widely used
- versions of ssh. Folks can still enable their use by uncommenting
- a line in conf/hadoop-env.sh. (cutting)
- 5. Fix HADOOP-92. Show information about all attempts to run each
- task in the web ui. (Mahadev Konar via cutting)
- 6. Fix HADOOP-128. Improved DFS error handling. (Owen O'Malley via cutting)
- 7. Fix HADOOP-129. Replace uses of java.io.File with new class named
- Path. This fixes bugs where java.io.File methods were called
- directly when FileSystem methods were desired, and reduces the
- likelihood of such bugs in the future. It also makes the handling
- of pathnames more consistent between local and dfs FileSystems and
- between Windows and Unix. java.io.File-based methods are still
- available for back-compatibility, but are deprecated and will be
- removed once 0.2 is released. (cutting)
- 8. Change dfs.data.dir and mapred.local.dir to be comma-separated
- lists of directories, rather than space-separated. This fixes
- several bugs on Windows. (cutting)
- 9. Fix HADOOP-144. Use mapred task id for dfs client id, to
- facilitate debugging. (omalley via cutting)
- 10. Fix HADOOP-143. Do not line-wrap stack-traces in web ui.
- (omalley via cutting)
- 11. Fix HADOOP-118. In DFS, improve clean up of abandoned file
- creations. (omalley via cutting)
- 12. Fix HADOOP-138. Stop multiple tasks in a single heartbeat, rather
- than one per heartbeat. (Stefan via cutting)
- 13. Fix HADOOP-139. Remove a potential deadlock in
- LocalFileSystem.lock(). (Igor Bolotin via cutting)
- 14. Fix HADOOP-134. Don't hang jobs when the tasktracker is
- misconfigured to use an un-writable local directory. (omalley via cutting)
- 15. Fix HADOOP-115. Correct an error message. (Stack via cutting)
- 16. Fix HADOOP-133. Retry pings from child to parent, in case of
- (local) communication problems. Also log exit status, so that one
- can distinguish patricide from other deaths. (omalley via cutting)
- 17. Fix HADOOP-142. Avoid re-running a task on a host where it has
- previously failed. (omalley via cutting)
- 18. Fix HADOOP-148. Maintain a task failure count for each
- tasktracker and display it in the web ui. (omalley via cutting)
- 19. Fix HADOOP-151. Close a potential socket leak, where new IPC
- connection pools were created per configuration instance that RPCs
- use. Now a global RPC connection pool is used again, as
- originally intended. (cutting)
- 20. Fix HADOOP-69. Don't throw a NullPointerException when getting
- hints for non-existing file split. (Bryan Pendleton via cutting)
- 21. Fix HADOOP-157. When a task that writes dfs files (e.g., a reduce
- task) failed and was retried, it would fail again and again,
- eventually failing the job. The problem was that dfs did not yet
- know that the failed task had abandoned the files, and would not
- yet let another task create files with the same names. Dfs now
- retries when creating a file long enough for locks on abandoned
- files to expire. (omalley via cutting)
- 22. Fix HADOOP-150. Improved task names that include job
- names. (omalley via cutting)
- 23. Fix HADOOP-162. Fix ConcurrentModificationException when
- releasing file locks. (omalley via cutting)
- 24. Fix HADOOP-132. Initial check-in of new Metrics API, including
- implementations for writing metric data to a file and for sending
- it to Ganglia. (David Bowen via cutting)
- 25. Fix HADOOP-160. Remove some unneeded synchronization around
- time-consuming operations in the TaskTracker. (omalley via cutting)
- 26. Fix HADOOP-166. RPCs failed when passed subclasses of a declared
- parameter type. This is fixed by changing ObjectWritable to store
- both the declared type and the instance type for Writables. Note
- that this incompatibly changes the format of ObjectWritable and
- will render unreadable any ObjectWritables stored in files.
- Nutch only uses ObjectWritable in intermediate files, so this
- should not be a problem for Nutch. (Stefan & cutting)
- 27. Fix HADOOP-168. MapReduce RPC protocol methods should all declare
- IOException, so that timeouts are handled appropriately.
- (omalley via cutting)
- 28. Fix HADOOP-169. Don't fail a reduce task if a call to the
- jobtracker to locate map outputs fails. (omalley via cutting)
- 29. Fix HADOOP-170. Permit FileSystem clients to examine and modify
- the replication count of individual files. Also fix a few
- replication-related bugs. (Konstantin Shvachko via cutting)
- 30. Permit specification of higher replication levels for job
- submission files (job.xml and job.jar). This helps with large
- clusters, since these files are read by every node. (cutting)
- 31. HADOOP-173. Optimize allocation of tasks with local data. (cutting)
- 32. HADOOP-167. Reduce number of Configurations and JobConf's
- created. (omalley via cutting)
- 33. NUTCH-256. Change FileSystem#createNewFile() to create a .crc
- file. The lack of a .crc file was causing warnings. (cutting)
- 34. HADOOP-174. Change JobClient to not abort job until it has failed
- to contact the job tracker for five attempts, not just one as
- before. (omalley via cutting)
- 35. HADOOP-177. Change MapReduce web interface to page through tasks.
- Previously, when jobs had more than a few thousand tasks they
- could crash web browsers. (Mahadev Konar via cutting)
- 36. HADOOP-178. In DFS, piggyback blockwork requests from datanodes
- on heartbeat responses from namenode. This reduces the volume of
- RPC traffic. Also move startup delay in blockwork from datanode
- to namenode. This fixes a problem where restarting the namenode
- triggered a lot of unneeded replication. (Hairong Kuang via cutting)
- 37. HADOOP-183. If the DFS namenode is restarted with different
- minimum and/or maximum replication counts, existing files'
- replication counts are now automatically adjusted to be within the
- newly configured bounds. (Hairong Kuang via cutting)
- 38. HADOOP-186. Better error handling in TaskTracker's top-level
- loop. Also improve calculation of time to send next heartbeat.
- (omalley via cutting)
- 39. HADOOP-187. Add two MapReduce examples/benchmarks. One creates
- files containing random data. The second sorts the output of the
- first. (omalley via cutting)
- 40. HADOOP-185. Fix so that, when a task tracker times out making the
- RPC asking for a new task to run, the job tracker does not think
- that it is actually running the task returned. (omalley via cutting)
- 41. HADOOP-190. If a child process hangs after it has reported
- completion, its output should not be lost. (Stack via cutting)
- 42. HADOOP-184. Re-structure some test code to better support testing
- on a cluster. (Mahadev Konar via cutting)
- 43. HADOOP-191. Add streaming package, Hadoop's first contrib module.
- This permits folks to easily submit MapReduce jobs whose map and
- reduce functions are implemented by shell commands. Use
- 'bin/hadoop jar build/hadoop-streaming.jar' to get details.
- (Michel Tourn via cutting)
- 44. HADOOP-189. Fix MapReduce in standalone configuration to
- correctly handle job jar files that contain a lib directory with
- nested jar files. (cutting)
- 45. HADOOP-65. Initial version of record I/O framework that enables
- the specification of record types and generates marshalling code
- in both Java and C++. Generated Java code implements
- WritableComparable, but is not yet otherwise used by
- Hadoop. (Milind Bhandarkar via cutting)
- 46. HADOOP-193. Add a MapReduce-based FileSystem benchmark.
- (Konstantin Shvachko via cutting)
- 47. HADOOP-194. Add a MapReduce-based FileSystem checker. This reads
- every block in every file in the filesystem. (Konstantin Shvachko
- via cutting)
- 48. HADOOP-182. Fix so that lost task trackers do not change the
- status of reduce tasks or completed jobs. Also fixes the progress
- meter so that failed tasks are subtracted. (omalley via cutting)
- 49. HADOOP-96. Logging improvements. Log files are now separate from
- standard output and standard error files. Logs are now rolled.
- Logging of all DFS state changes can be enabled, to facilitate
- debugging. (Hairong Kuang via cutting)
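The adjustment described in HADOOP-183 above (bring existing files' replication counts within newly configured minimum and maximum bounds) is a clamp. The sketch below uses a hypothetical function name and is not the namenode's code.

```python
def adjust_replication(current, minimum, maximum):
    """Clamp a file's replication count into the range [minimum, maximum]."""
    if minimum > maximum:
        raise ValueError("minimum replication exceeds maximum replication")
    # Counts already inside the bounds are returned unchanged.
    return max(minimum, min(current, maximum))
```

For example, with bounds of 2 and 5, a file stored with replication 1 is raised to 2, one stored with replication 8 is lowered to 5, and one stored with replication 3 is left alone.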
- Release 0.1.1 - 2006-04-08
- 1. Added CHANGES.txt, logging all significant changes to Hadoop. (cutting)
- 2. Fix MapReduceBase.close() to throw IOException, as declared in the
- Closeable interface. This permits subclasses which override this
- method to throw that exception. (cutting)
- 3. Fix HADOOP-117. Pathnames were mistakenly transposed in
- JobConf.getLocalFile() causing many mapred temporary files to not
- be removed. (Raghavendra Prabhu via cutting)
-
- 4. Fix HADOOP-116. Clean up job submission files when jobs complete.
- (cutting)
- 5. Fix HADOOP-125. Fix handling of absolute paths on Windows. (cutting)
- Release 0.1.0 - 2006-04-01
- 1. The first release of Hadoop.