123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121 |
- <!---
- Licensed under the Apache License, Version 2.0 (the "License");
- you may not use this file except in compliance with the License.
- You may obtain a copy of the License at
- http://www.apache.org/licenses/LICENSE-2.0
- Unless required by applicable law or agreed to in writing, software
- distributed under the License is distributed on an "AS IS" BASIS,
- WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- See the License for the specific language governing permissions and
- limitations under the License. See accompanying LICENSE file.
- -->
- Apache Hadoop ${project.version}
- ================================
- Apache Hadoop ${project.version} is an update to the Hadoop 3.3.x release branch.
- Overview of Changes
- ===================
- Users are encouraged to read the full set of release notes.
- This page provides an overview of the major changes.
- Vectored IO API
- ---------------
- The `PositionedReadable` interface has now added an operation for
- Vectored (also known as Scatter/Gather IO):
- ```java
- void readVectored(List<? extends FileRange> ranges, IntFunction<ByteBuffer> allocate)
- ```
- All the requested ranges will be retrieved into the supplied byte buffers -possibly asynchronously,
- possibly in parallel, with results potentially coming in out-of-order.
- 1. The default implementation uses a series of `readFully()` calls, so delivers
- equivalent performance.
- 2. The local filesystem uses java native IO calls for higher performance reads than `readFully()`
- 3. The S3A filesystem issues parallel HTTP GET requests in different threads.
- Benchmarking of (modified) ORC and Parquet clients through `file://` and `s3a://`
- show tangible improvements in query times.
- Further Reading: [FsDataInputStream](./hadoop-project-dist/hadoop-common/filesystem/fsdatainputstream.html).
- Manifest Committer for Azure ABFS and google GCS performance
- ------------------------------------------------------------
- A new "intermediate manifest committer" uses a manifest file
- to commit the work of successful task attempts, rather than
- renaming directories.
- Job commit is matter of reading all the manifests, creating the
- destination directories (parallelized) and renaming the files,
- again in parallel.
- This is fast and correct on Azure Storage and Google GCS,
- and should be used there instead of the classic v1/v2 file
- output committers.
- It is also safe to use on HDFS, where it should be faster
- than the v1 committer. It is however optimized for
- cloud storage where list and rename operations are significantly
- slower; the benefits may be less.
- More details are available in the
- [manifest committer](./hadoop-mapreduce-client/hadoop-mapreduce-client-core/manifest_committer.html).
- documentation.
- Transitive CVE fixes
- --------------------
- A lot of dependencies have been upgraded to address recent CVEs.
- Many of the CVEs were not actually exploitable through the Hadoop
- so much of this work is just due diligence.
- However applications which have all the library is on a class path may
- be vulnerable, and the ugprades should also reduce the number of false
- positives security scanners report.
- We have not been able to upgrade every single dependency to the latest
- version there is. Some of those changes are just going to be incompatible.
- If you have concerns about the state of a specific library, consult the apache JIRA
- issue tracker to see what discussions have taken place about the library in question.
- As an open source project, contributions in this area are always welcome,
- especially in testing the active branches, testing applications downstream of
- those branches and of whether updated dependencies trigger regressions.
- HDFS: Router Based Federation
- -----------------------------
- A lot of effort has been invested into stabilizing/improving the HDFS Router Based Federation feature.
- 1. HDFS-13522, HDFS-16767 & Related Jiras: Allow Observer Reads in HDFS Router Based Federation.
- 2. HDFS-13248: RBF supports Client Locality
- HDFS: Dynamic Datanode Reconfiguration
- --------------------------------------
- HDFS-16400, HDFS-16399, HDFS-16396, HDFS-16397, HDFS-16413, HDFS-16457.
- A number of Datanode configuration options can be changed without having to restart
- the datanode. This makes it possible to tune deployment configurations without
- cluster-wide Datanode Restarts.
- See [DataNode.java](https://github.com/apache/hadoop/blob/branch-3.3.5/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java#L346-L361)
- for the list of dynamically reconfigurable attributes.
- Getting Started
- ===============
- The Hadoop documentation includes the information you need to get started using
- Hadoop. Begin with the
- [Single Node Setup](./hadoop-project-dist/hadoop-common/SingleCluster.html)
- which shows you how to set up a single-node Hadoop installation.
- Then move on to the
- [Cluster Setup](./hadoop-project-dist/hadoop-common/ClusterSetup.html)
- to learn how to set up a multi-node Hadoop installation.
|