Update: highlight big features and improvements.

Shilun Fan, 1 year ago
commit 8918c7156d

1 changed file with 127 additions and 72 deletions:
hadoop-project/src/site/markdown/index.md.vm

@@ -23,109 +23,164 @@ Overview of Changes
 Users are encouraged to read the full set of release notes.
 This page provides an overview of the major changes.

-DataNode FsDatasetImpl Fine-Grained Locking via BlockPool
+S3A: Upgrade AWS SDK to V2
 ----------------------------------------
-[HDFS-15180](https://issues.apache.org/jira/browse/HDFS-15180) Split FsDatasetImpl datasetLock via blockpool to solve the issue of heavy FsDatasetImpl datasetLock
-When there are many namespaces in a large cluster.
+
+[HADOOP-18073](https://issues.apache.org/jira/browse/HADOOP-18073) S3A: Upgrade AWS SDK to V2
+
+The S3A connector now uses the V2 AWS SDK.  This is a significant change at the source code level.
+
+Any applications using the internal extension/override points in the filesystem connector are likely to break.
+
+Consult the document aws\_sdk\_upgrade for the full details.
+
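+As a hedged illustration of the package-level change (a sketch of the SDK migration in general, not code from the S3A connector itself), the V1 client type `com.amazonaws.services.s3.AmazonS3` is replaced in the V2 SDK by `software.amazon.awssdk.services.s3.S3Client`:
+
+```java
+// V2 AWS SDK: software.amazon.awssdk.* replaces the V1 com.amazonaws.* classes.
+import software.amazon.awssdk.regions.Region;
+import software.amazon.awssdk.services.s3.S3Client;
+
+public class S3SdkV2Sketch {
+  public static void main(String[] args) {
+    // V1 (old): AmazonS3 s3 = AmazonS3ClientBuilder.standard().build();
+    // V2 (new): builder-based, closeable client.
+    try (S3Client s3 = S3Client.builder().region(Region.US_EAST_1).build()) {
+      s3.listBuckets().buckets().forEach(b -> System.out.println(b.name()));
+    }
+  }
+}
+```
+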
+HDFS DataNode: Split One FsDatasetImpl Lock into Volume-Grained Locks
+----------------------------------------
+
+[HDFS-15382](https://issues.apache.org/jira/browse/HDFS-15382) Split one FsDatasetImpl lock to volume grain locks.
+
+Throughput is one of the core performance metrics of a DataNode instance. However, despite various improvements, it has not always reached its best, especially in Federation deployments, because of the global coarse-grained lock. This series of issues (including [HDFS-16534](https://issues.apache.org/jira/browse/HDFS-16534), [HDFS-16511](https://issues.apache.org/jira/browse/HDFS-16511), [HDFS-15382](https://issues.apache.org/jira/browse/HDFS-15382) and [HDFS-16429](https://issues.apache.org/jira/browse/HDFS-16429)) splits the global coarse-grained lock into fine-grained, two-level locks for block pools and volumes, improving throughput and avoiding lock contention between block pools and volumes.
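+
+As an illustrative sketch only (the class and field names here are hypothetical, not the actual FsDatasetImpl code), the two-level scheme can be pictured as one lock per block pool plus one lock per (block pool, volume) pair:
+
+```java
+import java.util.Map;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.locks.ReadWriteLock;
+import java.util.concurrent.locks.ReentrantReadWriteLock;
+
+public class TwoLevelLockSketch {
+  // One lock per block pool, and one per (block pool, volume) pair,
+  // instead of a single dataset-wide lock.
+  private final Map<String, ReadWriteLock> poolLocks = new ConcurrentHashMap<>();
+  private final Map<String, ReadWriteLock> volumeLocks = new ConcurrentHashMap<>();
+
+  private ReadWriteLock poolLock(String bpid) {
+    return poolLocks.computeIfAbsent(bpid, k -> new ReentrantReadWriteLock());
+  }
+
+  private ReadWriteLock volumeLock(String bpid, String volume) {
+    return volumeLocks.computeIfAbsent(bpid + "/" + volume,
+        k -> new ReentrantReadWriteLock());
+  }
+
+  // A write on one volume holds the pool lock in read mode and only that
+  // volume's lock in write mode, so it no longer blocks other volumes.
+  public void withVolumeWriteLock(String bpid, String volume, Runnable op) {
+    ReadWriteLock pool = poolLock(bpid);
+    ReadWriteLock vol = volumeLock(bpid, volume);
+    pool.readLock().lock();
+    vol.writeLock().lock();
+    try {
+      op.run();
+    } finally {
+      vol.writeLock().unlock();
+      pool.readLock().unlock();
+    }
+  }
+}
+```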

 YARN Federation improvements
 ----------------------------------------
-[YARN-5597](https://issues.apache.org/jira/browse/YARN-5597) brings many improvements, including the following:

-1. YARN Router now boasts a full implementation of all relevant interfaces including the ApplicationClientProtocol, ResourceManagerAdministrationProtocol, and RMWebServiceProtocol.
-2. Enhanced support for Application cleanup and automatic offline mechanisms for SubCluster are now facilitated by the YARN Router.
-3. Code optimization for Router and AMRMProxy was undertaken, coupled with improvements to previously pending functionalities.
+[YARN-5597](https://issues.apache.org/jira/browse/YARN-5597) YARN Federation improvements.
+
+We have enhanced the YARN Federation functionality for improved usability. The main features are as follows:
+
+1. YARN Router now provides a full implementation of all relevant interfaces, including ApplicationClientProtocol, ResourceManagerAdministrationProtocol, and RMWebServiceProtocol.
+2. YARN Router now supports application cleanup and automatic offline mechanisms for SubClusters.
+3. Code improvements were undertaken for the Router and AMRMProxy, along with enhancements to previously pending functionalities.
 4. Audit logs and Metrics for Router received upgrades.
 5. A boost in cluster security features was achieved, with the inclusion of Kerberos support.
 6. The page function of the router has been enhanced.
+7. A set of commands has been added to the Router side for operating on SubClusters and Policies.

-Upgrade AWS SDK to V2
+HDFS RBF: Code Enhancements, New Features, and Bug Fixes
 ----------------------------------------
-[HADOOP-18073](https://issues.apache.org/jira/browse/HADOOP-18073)
-The S3A connector now uses the V2 AWS SDK.  This is a significant change at the source code level.
-Any applications using the internal extension/override points in the filesystem connector are likely to break.
-Consult the document aws\_sdk\_upgrade for the full details.

-Azure ABFS: Critical Stream Prefetch Fix

+The HDFS RBF functionality has undergone significant enhancements, encompassing over 200 commits for feature improvements, new functionalities, and bug fixes.
+
+Important features and improvements are as follows:
+
+**Feature**
+
+[HDFS-15294](https://issues.apache.org/jira/browse/HDFS-15294) introduces a new HDFS federation balance tool to balance data across different federation namespaces. It uses DistCp to copy data from the source path to the target path.
+
+**Improvement**
+
+[HDFS-17128](https://issues.apache.org/jira/browse/HDFS-17128) RBF: SQLDelegationTokenSecretManager should use version of tokens updated by other routers.
+
+The SQLDelegationTokenSecretManager enhances performance by maintaining processed tokens in memory. However, there is a potential issue of router cache inconsistency due to token loading and renewal. This issue has been addressed by the resolution of HDFS-17128.
+
+[HDFS-17148](https://issues.apache.org/jira/browse/HDFS-17148) RBF: SQLDelegationTokenSecretManager must cleanup expired tokens in SQL.
+
+SQLDelegationTokenSecretManager, while fetching and temporarily storing tokens from SQL in a memory cache with a short TTL, faces an issue where expired tokens are not efficiently cleaned up, leading to a buildup of expired tokens in the SQL database. This issue has been addressed by the resolution of HDFS-17148.
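+
+A minimal sketch of the idea behind the fix, assuming a periodic server-side cleanup task (the table and column names below are invented for illustration, not the real schema):
+
+```java
+import java.sql.Connection;
+import java.sql.PreparedStatement;
+import java.sql.SQLException;
+import java.util.concurrent.Executors;
+import java.util.concurrent.ScheduledExecutorService;
+import java.util.concurrent.TimeUnit;
+
+public class TokenCleanupSketch {
+  private final ScheduledExecutorService scheduler =
+      Executors.newSingleThreadScheduledExecutor();
+
+  // Periodically delete tokens whose renew date has passed, so expired
+  // tokens do not accumulate in the SQL database.
+  public void start(Connection conn) {
+    scheduler.scheduleAtFixedRate(() -> {
+      String sql = "DELETE FROM Tokens WHERE renewDate < ?";
+      try (PreparedStatement stmt = conn.prepareStatement(sql)) {
+        stmt.setLong(1, System.currentTimeMillis());
+        int removed = stmt.executeUpdate();
+        System.out.println("Removed " + removed + " expired tokens");
+      } catch (SQLException e) {
+        e.printStackTrace();
+      }
+    }, 1, 1, TimeUnit.HOURS);
+  }
+}
+```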
+
+**Others**
+
+Other changes to HDFS RBF include WebUI, command-line, and miscellaneous improvements. Please refer to the release document.
+
+HDFS EC: Code Enhancements and Bug Fixes
 ----------------------------------------

-The abfs has a critical bug fix
-[HADOOP-18546](https://issues.apache.org/jira/browse/HADOOP-18546).
-*ABFS. Disable purging list of in-progress reads in abfs stream close().*

+HDFS EC has made code improvements and fixed some bugs.
+
+Important improvements and bug fixes are as follows:
+
+**Improvement**
+
+[HDFS-16613](https://issues.apache.org/jira/browse/HDFS-16613) EC: Improve performance of decommissioning dn with many ec blocks.
+
+In an HDFS cluster with many EC blocks, decommissioning a DataNode is very slow. Unlike replicated blocks, which can be copied from any DataNode holding a replica, EC blocks have to be replicated from the decommissioning DataNode itself. The configurations `dfs.namenode.replication.max-streams` and `dfs.namenode.replication.max-streams-hard-limit` limit the replication speed, but increasing them would put the whole cluster's network at risk, so a separate configuration should be added to limit streams on the decommissioning DataNode, distinct from the cluster-wide max-streams limit. A sketch of this idea follows.
+
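+A hypothetical sketch of the idea (invented names, not the actual NameNode code): decommissioning nodes get their own stream limit instead of the cluster-wide one.
+
+```java
+public final class StreamLimitSketch {
+  private final int clusterMaxStreams;       // dfs.namenode.replication.max-streams
+  private final int decommissionMaxStreams;  // hypothetical separate limit
+
+  public StreamLimitSketch(int clusterMaxStreams, int decommissionMaxStreams) {
+    this.clusterMaxStreams = clusterMaxStreams;
+    this.decommissionMaxStreams = decommissionMaxStreams;
+  }
+
+  // Pick the per-node replication stream limit by decommission state.
+  public int maxStreamsFor(boolean decommissioning) {
+    return decommissioning ? decommissionMaxStreams : clusterMaxStreams;
+  }
+}
+```
+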
+[HDFS-16663](https://issues.apache.org/jira/browse/HDFS-16663) Allow block reconstruction pending timeout refreshable to increase decommission performance.
+
+In [HDFS-16613](https://issues.apache.org/jira/browse/HDFS-16613), increasing the value of `dfs.namenode.replication.max-streams-hard-limit` maximizes the IO performance of a decommissioning DN that holds many EC blocks. Besides this, we also need to decrease `dfs.namenode.reconstruction.pending.timeout-sec` (default: 5 minutes) to shorten the interval at which pendingReconstructions are checked; otherwise the decommissioning node would sit idle waiting for copy tasks for most of that 5-minute window. During decommissioning, these two parameters may need to be reconfigured several times. Since [HDFS-14560](https://issues.apache.org/jira/browse/HDFS-14560), `dfs.namenode.replication.max-streams-hard-limit` can already be reconfigured dynamically without a NameNode restart, and with this change the `dfs.namenode.reconstruction.pending.timeout-sec` parameter can now be reconfigured dynamically as well.

-All users of the abfs connector in hadoop releases 3.3.2+ MUST either upgrade
-or disable prefetching by setting `fs.azure.readaheadqueue.depth` to `0`

-Consult the parent JIRA [HADOOP-18521](https://issues.apache.org/jira/browse/HADOOP-18521)
-*ABFS ReadBufferManager buffer sharing across concurrent HTTP requests*
-for root cause analysis, details on what is affected, and mitigations.
+**Bug**

-Vectored IO API
----------------
-
-[HADOOP-18103](https://issues.apache.org/jira/browse/HADOOP-18103).
-*High performance vectored read API in Hadoop*
-
-The `PositionedReadable` interface has now added an operation for
-Vectored IO (also known as Scatter/Gather IO):
-
-```java
-void readVectored(List<? extends FileRange> ranges, IntFunction<ByteBuffer> allocate)
-```
-
-All the requested ranges will be retrieved into the supplied byte buffers -possibly asynchronously,
-possibly in parallel, with results potentially coming in out-of-order.

+[HDFS-16456](https://issues.apache.org/jira/browse/HDFS-16456) EC: Decommission a rack with only one dn will fail when the rack number is equal with replication.
+
+In the scenario below, decommission will fail with the `TOO_MANY_NODES_ON_RACK` reason:
+
+- Enable an EC policy, such as RS-6-3-1024k.
+- The number of racks in the cluster is equal to or less than the replication number (9).
+- A rack has only one DN, and that DN is decommissioned.
+
+With nine internal blocks spread across at most nine racks, every rack already holds a block, so the block on the decommissioning DN has no valid rack to move to. This issue has been addressed by the resolution of HDFS-16456.

-1. The default implementation uses a series of `readFully()` calls, so delivers
-   equivalent performance.
-2. The local filesystem uses java native IO calls for higher performance reads than `readFully()`.
-3. The S3A filesystem issues parallel HTTP GET requests in different threads.
-
-Benchmarking of enhanced Apache ORC and Apache Parquet clients through `file://` and `s3a://`
-show significant improvements in query performance.
-
-Further Reading:
-* [FsDataInputStream](./hadoop-project-dist/hadoop-common/filesystem/fsdatainputstream.html).
-* [Hadoop Vectored IO: Your Data Just Got Faster!](https://apachecon.com/acasia2022/sessions/bigdata-1148.html)
-  Apachecon 2022 talk.
-
-Mapreduce: Manifest Committer for Azure ABFS and google GCS
------------------------------------------------------------
-
-The new _Intermediate Manifest Committer_ uses a manifest file
-to commit the work of successful task attempts, rather than
-renaming directories.
-Job commit is matter of reading all the manifests, creating the
-destination directories (parallelized) and renaming the files,
-again in parallel.
-
-This is both fast and correct on Azure Storage and Google GCS,
-and should be used there instead of the classic v1/v2 file
-output committers.
-
-It is also safe to use on HDFS, where it should be faster
-than the v1 committer. It is however optimized for
-cloud storage where list and rename operations are significantly
-slower; the benefits may be less.

+[HDFS-17094](https://issues.apache.org/jira/browse/HDFS-17094) EC: Fix bug in block recovery when there are stale datanodes.
+
+During block recovery, the `RecoveryTaskStriped` in the datanode expects a one-to-one correspondence between `rBlock.getLocations()` and `rBlock.getBlockIndices()`. However, if there are stale locations during a NameNode heartbeat, this correspondence may be disrupted. Specifically, although there are no stale locations in `recoveryLocations`, the block indices array remains complete. This discrepancy causes `BlockRecoveryWorker.RecoveryTaskStriped#recover` to generate an incorrect internal block ID, leading to a failure in the recovery process as the corresponding datanode cannot locate the replica.
+
+This issue has been addressed by the resolution of HDFS-17094.
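+
+A hypothetical illustration of the parallel-array pitfall (invented names, not the actual `BlockRecoveryWorker` code): if stale entries are filtered out of the locations but the indices array is left intact, position `i` in one array no longer describes the same replica as position `i` in the other.
+
+```java
+import java.util.ArrayList;
+import java.util.List;
+
+public class ParallelArraySketch {
+  public static void main(String[] args) {
+    // Parallel arrays: locations[i] stores the datanode holding the
+    // internal block whose index is blockIndices[i].
+    String[] locations = {"dn1", "dn2", "dn3"};  // suppose dn2 is stale
+    byte[] blockIndices = {0, 1, 2};
+
+    // Buggy filtering: stale locations are dropped, indices are not.
+    List<String> recoveryLocations = new ArrayList<>();
+    for (String dn : locations) {
+      if (!"dn2".equals(dn)) {
+        recoveryLocations.add(dn);
+      }
+    }
+
+    // recoveryLocations.get(1) is now "dn3", but blockIndices[1] is still 1,
+    // the index that belonged to the stale "dn2": the correspondence is broken.
+    System.out.println(recoveryLocations.get(1) + " paired with index " + blockIndices[1]);
+  }
+}
+```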

-More details are available in the
-[manifest committer](./hadoop-mapreduce-client/hadoop-mapreduce-client-core/manifest_committer.html).
-documentation.
-
-HDFS: Dynamic Datanode Reconfiguration
---------------------------------------
-
-HDFS-16400, HDFS-16399, HDFS-16396, HDFS-16397, HDFS-16413, HDFS-16457.
-
-A number of Datanode configuration options can be changed without having to restart
-the datanode. This makes it possible to tune deployment configurations without
-cluster-wide Datanode Restarts.
-
-See [DataNode.java](https://github.com/apache/hadoop/blob/branch-3.3.5/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java#L346-L361)
-for the list of dynamically reconfigurable attributes.

+[HDFS-17284](https://issues.apache.org/jira/browse/HDFS-17284) EC: Fix int overflow in calculating numEcReplicatedTasks and numReplicationTasks during block recovery.
+
+Due to an integer overflow in the calculation of numReplicationTasks or numEcReplicatedTasks, the NameNode's configuration parameter `dfs.namenode.replication.max-streams-hard-limit` failed to take effect. This led to an excessive number of tasks being sent to the DataNodes, consequently occupying too much of their memory.
+
+This issue has been addressed by the resolution of HDFS-17284.
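+
+As a hedged illustration of this class of bug (a hypothetical sketch, not the actual NameNode code), a product of two `int` values wraps around before it is ever widened, so a large limit can silently turn negative:
+
+```java
+public class IntOverflowSketch {
+  public static void main(String[] args) {
+    int maxStreamsHardLimit = 100_000;  // hypothetical, very large limit
+    int numNodes = 30_000;
+
+    // Buggy: the multiplication is evaluated in 32-bit arithmetic and wraps.
+    long buggyTotal = maxStreamsHardLimit * numNodes;
+
+    // Fixed: widen one operand first so the product is computed in 64 bits.
+    long fixedTotal = (long) maxStreamsHardLimit * numNodes;
+
+    System.out.println("buggy = " + buggyTotal);  // prints a negative value
+    System.out.println("fixed = " + fixedTotal);  // prints 3000000000
+  }
+}
+```
+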
+**Others**

+For other improvements and fixes to HDFS EC, please refer to the release document.

 Transitive CVE fixes
 --------------------
@@ -133,8 +188,8 @@ Transitive CVE fixes
 A lot of dependencies have been upgraded to address recent CVEs.
 Many of the CVEs were not actually exploitable through Hadoop,
 so much of this work is just due diligence.
-However applications which have all the library is on a class path may
-be vulnerable, and the ugprades should also reduce the number of false
+However, applications which have these libraries on their class path may
+be vulnerable, and the upgrades should also reduce the number of false
 positives security scanners report.

 We have not been able to upgrade every single dependency to the latest
@@ -170,12 +225,12 @@ can, with care, keep data and computing resources private.
 1. Physical cluster: *configure Hadoop security*, usually bonded to the
    enterprise Kerberos/Active Directory systems.
    Good.
-1. Cloud: transient or persistent single or multiple user/tenant cluster
+2. Cloud: transient or persistent single or multiple user/tenant cluster
    with private VLAN *and security*.
    Good.
    Consider [Apache Knox](https://knox.apache.org/) for managing remote
    access to the cluster.
-1. Cloud: transient single user/tenant cluster with private VLAN
+3. Cloud: transient single user/tenant cluster with private VLAN
    *and no security at all*.
    Requires careful network configuration as this is the sole
    means of securing the cluster.