<!---
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
Apache Hadoop ${project.version}
================================

Apache Hadoop ${project.version} is an update to the Hadoop 3.3.x release branch.

Overview of Changes
===================

Users are encouraged to read the full set of release notes.
This page provides an overview of the major changes.
Vectored IO API
---------------

The `PositionedReadable` interface now adds an operation for
vectored IO (also known as scatter/gather IO):

```java
void readVectored(List<? extends FileRange> ranges, IntFunction<ByteBuffer> allocate)
```
All the requested ranges will be retrieved into the supplied byte buffers, possibly asynchronously,
possibly in parallel, with results potentially arriving out of order.

1. The default implementation uses a series of `readFully()` calls, so delivers
   equivalent performance.
2. The local filesystem uses Java native IO calls for higher-performance reads than `readFully()`.
3. The S3A filesystem issues parallel HTTP GET requests in different threads.
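To illustrate the semantics described above (ranges read asynchronously, possibly in parallel, with out-of-order completion), here is a plain-JDK sketch built on `FileChannel` and `CompletableFuture`. The `Range` record and this `readVectored` helper are hypothetical stand-ins for illustration only, not the Hadoop `FileRange` API:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.function.IntFunction;

public class VectoredDemo {
    // Hypothetical stand-in for Hadoop's FileRange: an (offset, length) pair.
    record Range(long offset, int length) {}

    // Read each range asynchronously into a buffer from the supplied allocator.
    // Futures may complete in any order; callers wait on the ones they need.
    static Map<Range, CompletableFuture<ByteBuffer>> readVectored(
            FileChannel ch, List<Range> ranges, IntFunction<ByteBuffer> allocate) {
        Map<Range, CompletableFuture<ByteBuffer>> results = new LinkedHashMap<>();
        for (Range r : ranges) {
            results.put(r, CompletableFuture.supplyAsync(() -> {
                try {
                    ByteBuffer buf = allocate.apply(r.length());
                    int pos = 0;
                    while (pos < r.length()) {
                        // Positioned read: no shared stream cursor, so ranges
                        // can be fetched concurrently.
                        int n = ch.read(buf, r.offset() + pos);
                        if (n < 0) throw new IOException("EOF inside " + r);
                        pos += n;
                    }
                    buf.flip();
                    return buf;
                } catch (IOException e) {
                    throw new RuntimeException(e);
                }
            }));
        }
        return results;
    }

    public static void main(String[] args) throws Exception {
        Path tmp = Files.createTempFile("vectored", ".bin");
        Files.writeString(tmp, "0123456789abcdef");
        try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ)) {
            var futures = readVectored(ch,
                    List.of(new Range(0, 4), new Range(10, 6)),
                    ByteBuffer::allocate);
            for (var e : futures.entrySet()) {
                System.out.println(StandardCharsets.UTF_8.decode(e.getValue().join()));
            }
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

The real API follows the same shape: the caller supplies the ranges and an allocator, and the filesystem decides how aggressively to parallelize the reads.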
Benchmarking of (modified) ORC and Parquet clients through `file://` and `s3a://`
shows tangible improvements in query times.

Further reading: [FsDataInputStream](./hadoop-project-dist/hadoop-common/filesystem/fsdatainputstream.html).
Manifest Committer for Azure ABFS and Google GCS performance
------------------------------------------------------------

A new "intermediate manifest committer" uses a manifest file
to commit the work of successful task attempts, rather than
renaming directories.
Job commit is a matter of reading all the manifests, creating the
destination directories (parallelized) and renaming the files,
again in parallel.

This is fast and correct on Azure Storage and Google GCS,
and should be used there instead of the classic v1/v2 file
output committers.

It is also safe to use on HDFS, where it should be faster
than the v1 committer. It is, however, optimized for
cloud storage, where list and rename operations are significantly
slower; the benefits may be less on HDFS.

More details are available in the
[manifest committer](./hadoop-mapreduce-client/hadoop-mapreduce-client-core/manifest_committer.html)
documentation.
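As a sketch of how a job writing to GCS might be pointed at the new committer, assuming the `mapreduce.outputcommitter.factory.scheme.<scheme>` binding mechanism and the `ManifestCommitterFactory` class name; verify the exact keys and classes for your store against the manifest committer documentation:

```xml
<!-- Hypothetical configuration sketch: bind the manifest committer to gs:// URLs.
     Property key and factory class assumed from the committer documentation;
     consult it for the store-specific settings before relying on this. -->
<property>
  <name>mapreduce.outputcommitter.factory.scheme.gs</name>
  <value>org.apache.hadoop.mapreduce.lib.output.committer.manifest.ManifestCommitterFactory</value>
</property>
```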
Transitive CVE fixes
--------------------

A lot of dependencies have been upgraded to address recent CVEs.
Many of the CVEs were not actually exploitable through Hadoop,
so much of this work is just due diligence.
However, applications which have these libraries on their classpath may
be vulnerable, and the upgrades should also reduce the number of false
positives reported by security scanners.

We have not been able to upgrade every single dependency to its latest
version; some of those upgrades would simply be incompatible.
If you have concerns about the state of a specific library, consult the Apache JIRA
issue tracker to see what discussions have taken place about the library in question.

As an open source project, contributions in this area are always welcome,
especially in testing the active branches, testing applications downstream of
those branches, and checking whether updated dependencies trigger regressions.
HDFS: Router Based Federation
-----------------------------

A lot of effort has been invested into stabilizing and improving the HDFS Router Based Federation feature.

1. HDFS-13522, HDFS-16767 and related JIRAs: allow Observer reads in HDFS Router Based Federation.
2. HDFS-13248: RBF supports client locality.
HDFS: Dynamic Datanode Reconfiguration
--------------------------------------

HDFS-16400, HDFS-16399, HDFS-16396, HDFS-16397, HDFS-16413, HDFS-16457.

A number of Datanode configuration options can be changed without having to restart
the datanode. This makes it possible to tune deployment configurations without
cluster-wide Datanode restarts.

See [DataNode.java](https://github.com/apache/hadoop/blob/branch-3.3.5/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java#L346-L361)
for the list of dynamically reconfigurable attributes.
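Reconfiguration is driven through the `hdfs dfsadmin -reconfig` subcommands against a live datanode; a typical sequence looks like the sketch below, where the hostname and IPC port are placeholders for your own deployment:

```bash
# Edit hdfs-site.xml on the datanode, then ask it to reload its configuration.
# dn1.example.com:9867 is a placeholder host and datanode IPC port.
hdfs dfsadmin -reconfig datanode dn1.example.com:9867 start

# Poll the outcome of the in-progress reconfiguration task.
hdfs dfsadmin -reconfig datanode dn1.example.com:9867 status

# List which properties the datanode allows to be reconfigured at runtime.
hdfs dfsadmin -reconfig datanode dn1.example.com:9867 properties
```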
Getting Started
===============

The Hadoop documentation includes the information you need to get started using
Hadoop. Begin with the
[Single Node Setup](./hadoop-project-dist/hadoop-common/SingleCluster.html),
which shows you how to set up a single-node Hadoop installation.
Then move on to the
[Cluster Setup](./hadoop-project-dist/hadoop-common/ClusterSetup.html)
to learn how to set up a multi-node Hadoop installation.