Hadoop 0.23.4 Release Notes
These release notes include new developer and user-facing incompatibilities, features, and major improvements.
Changes since Hadoop 0.23.3
- YARN-137.
Major improvement reported by Siddharth Seth and fixed by Siddharth Seth (scheduler)
Change the default scheduler to the CapacityScheduler
There are some bugs in the FifoScheduler at the moment: it doesn't distribute tasks across nodes, and it has some headroom (available resource) issues.
That's not the best experience for users trying out the 2.0 branch. The CapacityScheduler (CS), with its default configuration of a single queue, behaves the same as the FifoScheduler and doesn't have these issues.
- YARN-108.
Critical bug reported by Jason Lowe and fixed by Jason Lowe (nodemanager)
FSDownload can create cache directories with the wrong permissions
When the cluster is configured with a restrictive umask, e.g.: {{fs.permissions.umask-mode=0077}}, the nodemanager can end up creating directory entries in the public cache with the wrong permissions. Only the nodemanager user may then be able to access files in the public cache, which prevents jobs from running properly.
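To see why a {{0077}} umask locks other users out, note that the mode applied when a directory is created is the requested mode with the umask bits cleared. The minimal Java sketch below does that octal arithmetic. The class and method names are hypothetical, and {{0755}} is an assumed requested mode for a world-readable cache directory, not a value taken from the nodemanager code.

```java
// Hypothetical helper showing how a umask narrows a requested permission mode.
final class UmaskDemo {
    // Keep only the requested permission bits that the umask does not clear.
    static int effectiveMode(int requested, int umask) {
        return requested & ~umask & 0777;
    }

    // Render a mode as three octal digits, e.g. 0755 -> "755".
    static String octal(int mode) {
        return String.format("%03o", mode);
    }

    public static void main(String[] args) {
        int requested = 0755; // assumed mode for a world-readable cache directory
        System.out.println(octal(effectiveMode(requested, 0022))); // prints 755
        System.out.println(octal(effectiveMode(requested, 0077))); // prints 700
    }
}
```

With a typical {{0022}} umask the requested {{755}} survives intact; with {{0077}} the group and other bits are cleared, leaving {{700}}.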
- YARN-106.
Major bug reported by Jason Lowe and fixed by Jason Lowe (nodemanager)
Nodemanager needs to set permissions of local directories
If the nodemanager process is running with a restrictive default umask (e.g.: 0077) then it will create its local directories with permissions that are too restrictive to allow containers from other users to run.
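One way to make local directories independent of the process umask is to create them first and then set the mode explicitly, because the umask only applies at creation time and not to a later chmod. The sketch below illustrates the idea with {{java.nio.file}}. It is not the nodemanager's actual code: {{LocalDirSetup}} is a hypothetical name, the {{rwxr-xr-x}} mode is an assumption, and it only works on file systems that support POSIX permissions.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

// Hypothetical sketch: create a directory, then set its mode explicitly.
final class LocalDirSetup {
    static Path createWithMode(Path dir, String mode) throws IOException {
        Set<PosixFilePermission> perms = PosixFilePermissions.fromString(mode);
        Files.createDirectories(dir);              // initial mode is narrowed by the umask
        Files.setPosixFilePermissions(dir, perms); // an explicit chmod ignores the umask
        return dir;
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("nm-local-dir");
        Path usercache = createWithMode(base.resolve("usercache"), "rwxr-xr-x");
        // prints rwxr-xr-x regardless of the process umask
        System.out.println(PosixFilePermissions.toString(Files.getPosixFilePermissions(usercache)));
    }
}
```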
- YARN-93.
Major bug reported by Jason Lowe and fixed by Jason Lowe (resourcemanager)
Diagnostics missing from applications that have finished but failed
If an application finishes in the YARN sense but fails in the app framework sense (e.g.: a failed MapReduce job) then diagnostics are missing from the RM web page for the application. The RM should report diagnostic messages even for applications that succeed in the YARN sense.
- YARN-88.
Major bug reported by Jason Lowe and fixed by Jason Lowe (nodemanager)
DefaultContainerExecutor can fail to set proper permissions
{{DefaultContainerExecutor}} can fail to set the proper permissions on its local directories if the cluster has been configured with a restrictive umask, e.g.: {{fs.permissions.umask-mode=0077}}. The configured umask ends up defeating the permissions requested by {{DefaultContainerExecutor}} when it creates directories.
- YARN-75.
Major bug reported by Siddharth Seth and fixed by Siddharth Seth
RMContainer should handle a RELEASE event while RUNNING
An AppMaster can send a container release at any point. Currently, doing so while the RM considers the container to be RUNNING results in an exception.
Because the event is not processed correctly, these containers also do not show up in the Completed Container List seen by the AM (AMRMProtocol). MR-3902 depends on this set being complete.
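The fix amounts to making RELEASE a legal event in the RUNNING state and reporting the released container as completed. The following is a deliberately simplified, hypothetical Java model of such a state machine; the states, events, and the {{ContainerModel}} class are illustrative and do not mirror the real {{RMContainer}} implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of a container's lifecycle on the RM side.
final class ContainerModel {
    enum State { ALLOCATED, ACQUIRED, RUNNING, RELEASED, COMPLETED }
    enum Event { ACQUIRE, LAUNCH, RELEASE, FINISH }

    private final String id;
    private final List<String> completedForAm; // containers reported back to the AM
    private State state = State.ALLOCATED;

    ContainerModel(String id, List<String> completedForAm) {
        this.id = id;
        this.completedForAm = completedForAm;
    }

    State state() {
        return state;
    }

    void handle(Event event) {
        if (state == State.ALLOCATED && event == Event.ACQUIRE) {
            state = State.ACQUIRED;
        } else if (state == State.ACQUIRED && event == Event.LAUNCH) {
            state = State.RUNNING;
        } else if (state == State.RUNNING && event == Event.FINISH) {
            complete(State.COMPLETED);
        } else if (event == Event.RELEASE
                && (state == State.ALLOCATED || state == State.ACQUIRED || state == State.RUNNING)) {
            // Accepting RELEASE while RUNNING is the case this model adds.
            complete(State.RELEASED);
        } else {
            throw new IllegalStateException("Invalid event " + event + " in state " + state);
        }
    }

    // Move to a terminal state and record the container as completed for the AM.
    private void complete(State terminal) {
        state = terminal;
        completedForAm.add(id);
    }

    public static void main(String[] args) {
        List<String> completed = new ArrayList<>();
        ContainerModel c = new ContainerModel("container_1", completed); // hypothetical id
        c.handle(Event.ACQUIRE);
        c.handle(Event.LAUNCH);
        c.handle(Event.RELEASE);
        System.out.println(c.state() + " " + completed); // prints RELEASED [container_1]
    }
}
```

Without the RUNNING/RELEASE case, the third {{handle}} call would fall through to the exception, and the container would never be added to the list the AM sees.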
- YARN-57.
Major improvement reported by Radim Kolar and fixed by Radim Kolar (nodemanager)
Pluggable process tree
Trunk version of the pluggable process tree, based on the work in MAPREDUCE-4204.
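A pluggable component like this is typically selected by class name from configuration and instantiated reflectively. The sketch below shows that pattern under assumptions: {{ProcessTreeSketch}}, {{FixedProcessTree}}, and {{ProcessTreeLoader}} are hypothetical names, and the real YARN abstraction, its methods, and the configuration key that selects the implementation are not reproduced here.

```java
// Hypothetical plugin contract; the real YARN abstraction has a different shape.
interface ProcessTreeSketch {
    void refresh();            // re-read the process tree from the OS
    long cumulativeRssBytes(); // resident memory of all processes in the tree
}

// Trivial implementation used only to demonstrate loading.
final class FixedProcessTree implements ProcessTreeSketch {
    @Override public void refresh() { }
    @Override public long cumulativeRssBytes() { return 1024L; }
}

final class ProcessTreeLoader {
    // Instantiate the configured implementation by class name, or use the fallback.
    static ProcessTreeSketch load(String className, ProcessTreeSketch fallback) {
        if (className == null || className.isEmpty()) {
            return fallback;
        }
        try {
            return Class.forName(className)
                .asSubclass(ProcessTreeSketch.class)
                .getDeclaredConstructor()
                .newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalArgumentException("Cannot load process tree " + className, e);
        }
    }

    public static void main(String[] args) {
        ProcessTreeSketch tree = load(FixedProcessTree.class.getName(), null);
        tree.refresh();
        System.out.println(tree.cumulativeRssBytes()); // prints 1024
    }
}
```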
- YARN-42.
Major bug reported by Devaraj K and fixed by Devaraj K (nodemanager)
Node Manager throws NPE on startup
The NM throws an NPE on startup if it doesn't have permissions on the NM local directories:
{code:xml}
2012-05-14 16:32:13,468 FATAL org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting NodeManager
org.apache.hadoop.yarn.YarnException: Failed to initialize LocalizationService
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.init(ResourceLocalizationService.java:202)
at org.apache.hadoop.yarn.service.CompositeService.init(CompositeService.java:58)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.init(ContainerManagerImpl.java:183)
at org.apache.hadoop.yarn.service.CompositeService.init(CompositeService.java:58)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.init(NodeManager.java:166)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:268)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:284)
Caused by: java.io.IOException: mkdir of /mrv2/tmp/nm-local-dir/usercache failed
at org.apache.hadoop.fs.FileSystem.primitiveMkdir(FileSystem.java:907)
at org.apache.hadoop.fs.DelegateToFileSystem.mkdir(DelegateToFileSystem.java:143)
at org.apache.hadoop.fs.FilterFs.mkdir(FilterFs.java:189)
at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:706)
at org.apache.hadoop.fs.FileContext$4.next(FileContext.java:703)
at org.apache.hadoop.fs.FileContext$FSLinkResolver.resolve(FileContext.java:2325)
at org.apache.hadoop.fs.FileContext.mkdir(FileContext.java:703)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService.init(ResourceLocalizationService.java:188)
... 6 more
2012-05-14 16:32:13,472 INFO org.apache.hadoop.yarn.service.CompositeService: Error stopping org.apache.hadoop.yarn.server.nodemanager.containermanager.loghandler.NonAggregatingLogHandler
java.lang.NullPointerException
at org.apache.hadoop.yarn.server.nodemanager.containermanager.loghandler.NonAggregatingLogHandler.stop(NonAggregatingLogHandler.java:82)
at org.apache.hadoop.yarn.service.CompositeService.stop(CompositeService.java:99)
at org.apache.hadoop.yarn.service.CompositeService.stop(CompositeService.java:89)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.ContainerManagerImpl.stop(ContainerManagerImpl.java:266)
at org.apache.hadoop.yarn.service.CompositeService.stop(CompositeService.java:99)
at org.apache.hadoop.yarn.service.CompositeService.stop(CompositeService.java:89)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.stop(NodeManager.java:182)
at org.apache.hadoop.yarn.service.CompositeService$CompositeServiceShutdownHook.run(CompositeService.java:122)
at org.apache.hadoop.util.ShutdownHookManager$1.run(ShutdownHookManager.java:54)
{code}
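In the trace above, {{ResourceLocalizationService}} fails during init, but the shutdown hook still calls {{stop()}} on {{NonAggregatingLogHandler}}, which apparently dereferences state that was never set up. A minimal sketch of the defensive pattern, a null check in {{stop()}}, follows. {{LogHandlerSketch}} is a hypothetical class, the executor field is an assumed stand-in for whatever the real handler initializes, and this is not necessarily the committed patch.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical service whose stop() tolerates an init() that never ran.
final class LogHandlerSketch {
    private ExecutorService scheduler; // assigned only by init()

    void init() {
        scheduler = Executors.newSingleThreadExecutor();
    }

    boolean isInitialized() {
        return scheduler != null;
    }

    void stop() {
        // Without this check, stop() after a failed startup throws NullPointerException.
        if (scheduler != null) {
            scheduler.shutdown();
        }
    }

    public static void main(String[] args) {
        LogHandlerSketch handler = new LogHandlerSketch();
        // A sibling service failed first, so init() never ran, yet the
        // shutdown hook still calls stop().
        handler.stop();
        System.out.println("stopped without NPE");
    }
}
```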
- MAPREDUCE-4691.
Critical bug reported by Jason Lowe and fixed by Robert Joseph Evans (jobhistoryserver , mrv2)
Historyserver can report "Unknown job" after RM says job has completed
- MAPREDUCE-4689.
Major bug reported by Jason Lowe and fixed by Jason Lowe (client)
JobClient.getMapTaskReports on failed job results in NPE
- MAPREDUCE-4651.
Major new feature reported by Konstantin Shvachko and fixed by Konstantin Shvachko (benchmarks , test)
Benchmarking random reads with DFSIO
- MAPREDUCE-4647.
Major bug reported by Robert Joseph Evans and fixed by Robert Joseph Evans (mrv2)
We should only unjar the job jar if there is a lib directory in it.
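The decision described in the title can be sketched with {{java.util.jar.JarFile}}: scan the entries and unpack only if something lives under {{lib/}}. This is an illustrative sketch; {{JobJarCheck}} is a hypothetical name, and treating any entry whose name starts with {{lib/}} as a lib directory is an assumption about how the check is done.

```java
import java.io.IOException;
import java.util.jar.JarFile;

// Hypothetical check: should the job jar be unpacked?
final class JobJarCheck {
    // True if any entry sits under a top-level "lib/" directory.
    static boolean hasLibDirectory(String jarPath) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            return jar.stream().anyMatch(entry -> entry.getName().startsWith("lib/"));
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: JobJarCheck <job.jar>");
            return;
        }
        if (hasLibDirectory(args[0])) {
            System.out.println("lib/ found: unjar so bundled dependencies can reach the classpath");
        } else {
            System.out.println("no lib/: use the jar as-is and skip unpacking");
        }
    }
}
```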
- MAPREDUCE-4646.
Major bug reported by Jason Lowe and fixed by Jason Lowe (mrv2)
client does not receive job diagnostics for failed jobs
- MAPREDUCE-4645.
Major improvement reported by Ravi Prakash and fixed by Ravi Prakash (performance , test)
Providing a random seed to Slive should make the sequence of filenames completely deterministic
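Determinism here means every random choice is drawn from a single {{java.util.Random}} created from the user's seed, so the same seed replays the same sequence. The sketch below is hypothetical: the {{SeededNames}} class and the {{/slive/...}} naming scheme are invented for illustration and are not Slive's actual layout.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical generator: all randomness comes from one seeded Random.
final class SeededNames {
    static List<String> names(long seed, int n) {
        Random rng = new Random(seed);
        List<String> out = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            out.add("/slive/dir" + rng.nextInt(10) + "/file" + rng.nextInt(1000));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(names(42L, 3).equals(names(42L, 3))); // prints true
    }
}
```

Any additional unseeded source of randomness, such as a second {{new Random()}} or a timestamp in the name, would break the guarantee.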
- MAPREDUCE-4408.
Major improvement reported by Alejandro Abdelnur and fixed by Robert Kanter (mrv1 , mrv2)
Allow jobs to set a JAR that is in the distributed cache
- MAPREDUCE-4193.
Major bug reported by Patrick Hunt and fixed by Patrick Hunt (documentation)
broken doc link for yarn-default.xml in site.xml
- MAPREDUCE-2786.
Minor improvement reported by Plamen Jeliazkov and fixed by Plamen Jeliazkov (benchmarks)
TestDFSIO should also test compression reading/writing from the command line.
- HDFS-3922.
Critical bug reported by Kihwal Lee and fixed by Kihwal Lee (name-node)
0.22 and 0.23 namenode throws away blocks under construction on restart
- HDFS-3860.
Major bug reported by Jing Zhao and fixed by Jing Zhao
HeartbeatManager#Monitor may wrongly hold the write lock of the namesystem
- HDFS-3831.
Critical bug reported by Jason Lowe and fixed by Jason Lowe (security)
Failure to renew tokens due to test-sources left in classpath
- HDFS-3731.
Blocker bug reported by Suresh Srinivas and fixed by Kihwal Lee (data-node)
2.0 release upgrade must handle blocks being written from 1.0
- HDFS-3373.
Major bug reported by Todd Lipcon and fixed by John George (hdfs client)
FileContext HDFS implementation can leak socket caches
- HADOOP-8843.
Critical bug reported by Robert Joseph Evans and fixed by Jason Lowe
Old trash directories are never deleted on upgrade from 1.x
- HADOOP-8822.
Major bug reported by Robert Joseph Evans and fixed by Robert Joseph Evans
relnotes.py was deleted after Mavenization
- HADOOP-8684.
Minor bug reported by Hiroshi Ikeda and fixed by Jing Zhao (io)
Deadlock between WritableComparator and WritableComparable
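The usual shape of this kind of deadlock: if comparator lookup and registration both synchronize on the comparator class, one thread can hold that lock while forcing a key class to initialize, while another thread sits inside that key class's static initializer waiting for the same lock. The Java sketch below shows a registry built on {{ConcurrentHashMap}}, so lookup and registration share no monitor. {{ComparatorRegistry}} and {{DemoKey}} are hypothetical, strings stand in for comparator objects, and this is not presented as the committed fix.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical registry: no method synchronizes on the registry class.
final class ComparatorRegistry {
    private static final ConcurrentHashMap<Class<?>, String> COMPARATORS =
        new ConcurrentHashMap<>();

    // Called from key classes' static initializers.
    static void define(Class<?> keyClass, String comparatorName) {
        COMPARATORS.put(keyClass, comparatorName);
    }

    static String get(Class<?> keyClass) {
        String comparator = COMPARATORS.get(keyClass);
        if (comparator == null) {
            forceInit(keyClass); // runs the key class's static initializer, which may call define()
            comparator = COMPARATORS.get(keyClass);
        }
        return comparator;
    }

    private static void forceInit(Class<?> cls) {
        try {
            Class.forName(cls.getName(), true, cls.getClassLoader());
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException("Can't initialize " + cls, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(get(DemoKey.class)); // prints DemoKey.Comparator
    }
}

// Demo key class that registers its comparator when initialized.
final class DemoKey {
    static {
        ComparatorRegistry.define(DemoKey.class, "DemoKey.Comparator");
    }
}
```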
- HADOOP-8623.
Minor improvement reported by Steven Willis and fixed by Steven Willis (scripts)
hadoop jar command should respect HADOOP_OPTS
- HADOOP-8310.
Major bug reported by Aaron T. Myers and fixed by Aaron T. Myers (fs)
FileContext#checkPath should handle URIs with no port
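{{java.net.URI#getPort}} returns {{-1}} when a URI has no port, so a naive comparison treats {{hdfs://nn/}} and {{hdfs://nn:8020/}} as different file systems. The sketch below substitutes a default port before comparing. {{PathCheck}}, the host name {{nn}}, and the {{8020}} default are assumptions for illustration, not the {{FileContext}} implementation.

```java
import java.net.URI;

// Hypothetical same-file-system check that tolerates URIs without a port.
final class PathCheck {
    static final int DEFAULT_PORT = 8020; // assumed default NameNode RPC port

    // URI#getPort returns -1 when the authority has no port.
    static int effectivePort(URI uri) {
        return uri.getPort() == -1 ? DEFAULT_PORT : uri.getPort();
    }

    static boolean sameFileSystem(URI a, URI b) {
        return a.getScheme() != null && a.getScheme().equalsIgnoreCase(b.getScheme())
            && a.getHost() != null && a.getHost().equalsIgnoreCase(b.getHost())
            && effectivePort(a) == effectivePort(b);
    }

    public static void main(String[] args) {
        URI noPort = URI.create("hdfs://nn/user/alice");
        URI withPort = URI.create("hdfs://nn:8020/user/alice");
        System.out.println(sameFileSystem(noPort, withPort)); // prints true
    }
}
```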
- HADOOP-8183.
Minor improvement reported by Harsh J and fixed by Harsh J (util)
Stop using "mapred.used.genericoptionsparser" to avoid unnecessary warnings