Update: highlight big features and improvements.

Shilun Fan, 1 year ago
commit 8918c7156d

1 changed file with 127 additions and 72 deletions:
hadoop-project/src/site/markdown/index.md.vm

@@ -23,109 +23,164 @@ Overview of Changes
 Users are encouraged to read the full set of release notes.
 This page provides an overview of the major changes.

-DataNode FsDatasetImpl Fine-Grained Locking via BlockPool
+S3A: Upgrade AWS SDK to V2
 ----------------------------------------
-[HDFS-15180](https://issues.apache.org/jira/browse/HDFS-15180) Split FsDatasetImpl datasetLock via blockpool to solve the issue of heavy FsDatasetImpl datasetLock
-When there are many namespaces in a large cluster.
+
+[HADOOP-18073](https://issues.apache.org/jira/browse/HADOOP-18073) S3A: Upgrade AWS SDK to V2
+
+The S3A connector now uses the V2 AWS SDK.  This is a significant change at the source code level.
+
+Any applications using the internal extension/override points in the filesystem connector are likely to break.
+
+Consult the document aws\_sdk\_upgrade for the full details.
+
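+As a hedged illustration of the package-level change (a sketch of the SDK migration in general, not code from the S3A connector itself), the V1 client type `com.amazonaws.services.s3.AmazonS3` is replaced in the V2 SDK by `software.amazon.awssdk.services.s3.S3Client`:
+
+```java
+// V2 AWS SDK: software.amazon.awssdk.* replaces the V1 com.amazonaws.* classes.
+import software.amazon.awssdk.regions.Region;
+import software.amazon.awssdk.services.s3.S3Client;
+
+public class S3SdkV2Sketch {
+  public static void main(String[] args) {
+    // V1 (old): AmazonS3 s3 = AmazonS3ClientBuilder.standard().build();
+    // V2 (new): builder-based, closeable client.
+    try (S3Client s3 = S3Client.builder().region(Region.US_EAST_1).build()) {
+      s3.listBuckets().buckets().forEach(b -> System.out.println(b.name()));
+    }
+  }
+}
+```
+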
+HDFS DataNode: Split One FsDatasetImpl Lock into Volume-Grained Locks
+----------------------------------------
+
+[HDFS-15382](https://issues.apache.org/jira/browse/HDFS-15382) Split one FsDatasetImpl lock to volume grain locks.
+
+Throughput is one of the core performance metrics of a DataNode instance. However, despite various improvements, it has not always reached its best, especially in Federation deployments, because of the global coarse-grained lock. This series of issues (including [HDFS-16534](https://issues.apache.org/jira/browse/HDFS-16534), [HDFS-16511](https://issues.apache.org/jira/browse/HDFS-16511), [HDFS-15382](https://issues.apache.org/jira/browse/HDFS-15382) and [HDFS-16429](https://issues.apache.org/jira/browse/HDFS-16429)) splits the global coarse-grained lock into fine-grained, two-level locks for block pools and volumes, improving throughput and avoiding lock contention between block pools and volumes.
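+
+As an illustrative sketch only (the class and field names here are hypothetical, not the actual FsDatasetImpl code), the two-level scheme can be pictured as one lock per block pool plus one lock per (block pool, volume) pair:
+
+```java
+import java.util.Map;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.locks.ReadWriteLock;
+import java.util.concurrent.locks.ReentrantReadWriteLock;
+
+public class TwoLevelLockSketch {
+  // One lock per block pool, and one per (block pool, volume) pair,
+  // instead of a single dataset-wide lock.
+  private final Map<String, ReadWriteLock> poolLocks = new ConcurrentHashMap<>();
+  private final Map<String, ReadWriteLock> volumeLocks = new ConcurrentHashMap<>();
+
+  private ReadWriteLock poolLock(String bpid) {
+    return poolLocks.computeIfAbsent(bpid, k -> new ReentrantReadWriteLock());
+  }
+
+  private ReadWriteLock volumeLock(String bpid, String volume) {
+    return volumeLocks.computeIfAbsent(bpid + "/" + volume,
+        k -> new ReentrantReadWriteLock());
+  }
+
+  // A write on one volume holds the pool lock in read mode and only that
+  // volume's lock in write mode, so it no longer blocks other volumes.
+  public void withVolumeWriteLock(String bpid, String volume, Runnable op) {
+    ReadWriteLock pool = poolLock(bpid);
+    ReadWriteLock vol = volumeLock(bpid, volume);
+    pool.readLock().lock();
+    vol.writeLock().lock();
+    try {
+      op.run();
+    } finally {
+      vol.writeLock().unlock();
+      pool.readLock().unlock();
+    }
+  }
+}
+```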

 YARN Federation improvements
 ----------------------------------------
-[YARN-5597](https://issues.apache.org/jira/browse/YARN-5597) brings many improvements, including the following:

-1. YARN Router now boasts a full implementation of all relevant interfaces including the ApplicationClientProtocol, ResourceManagerAdministrationProtocol, and RMWebServiceProtocol.
-2. Enhanced support for Application cleanup and automatic offline mechanisms for SubCluster are now facilitated by the YARN Router.
-3. Code optimization for Router and AMRMProxy was undertaken, coupled with improvements to previously pending functionalities.
+[YARN-5597](https://issues.apache.org/jira/browse/YARN-5597) YARN Federation improvements.
+
+We have enhanced the YARN Federation functionality for improved usability. The main features are as follows:
+
+1. YARN Router now provides a full implementation of all relevant interfaces, including ApplicationClientProtocol, ResourceManagerAdministrationProtocol, and RMWebServiceProtocol.
+2. YARN Router now supports application cleanup and automatic offline mechanisms for SubClusters.
+3. Code improvements were undertaken for the Router and AMRMProxy, along with enhancements to previously pending functionalities.
 4. Audit logs and Metrics for Router received upgrades.
 5. A boost in cluster security features was achieved, with the inclusion of Kerberos support.
 6. The page function of the router has been enhanced.
+7. A set of commands has been added to the Router side for operating on SubClusters and Policies.

-Upgrade AWS SDK to V2
+HDFS RBF: Code Enhancements, New Features, and Bug Fixes
 ----------------------------------------
-[HADOOP-18073](https://issues.apache.org/jira/browse/HADOOP-18073)
-The S3A connector now uses the V2 AWS SDK.  This is a significant change at the source code level.
-Any applications using the internal extension/override points in the filesystem connector are likely to break.
-Consult the document aws\_sdk\_upgrade for the full details.

-Azure ABFS: Critical Stream Prefetch Fix

+The HDFS RBF functionality has undergone significant enhancements, encompassing over 200 commits for feature improvements, new functionalities, and bug fixes.
+
+Important features and improvements are as follows:
+
+**Feature**
+
+[HDFS-15294](https://issues.apache.org/jira/browse/HDFS-15294) introduces a new HDFS federation balance tool to balance data across different federation namespaces. It uses DistCp to copy data from the source path to the target path.
+
+**Improvement**
+
+[HDFS-17128](https://issues.apache.org/jira/browse/HDFS-17128) RBF: SQLDelegationTokenSecretManager should use version of tokens updated by other routers.
+
+The SQLDelegationTokenSecretManager enhances performance by maintaining processed tokens in memory. However, there is a potential issue of router cache inconsistency due to token loading and renewal. This issue has been addressed by the resolution of HDFS-17128.
+
+[HDFS-17148](https://issues.apache.org/jira/browse/HDFS-17148) RBF: SQLDelegationTokenSecretManager must cleanup expired tokens in SQL.
+
+SQLDelegationTokenSecretManager, while fetching and temporarily storing tokens from SQL in a memory cache with a short TTL, faces an issue where expired tokens are not efficiently cleaned up, leading to a buildup of expired tokens in the SQL database. This issue has been addressed by the resolution of HDFS-17148.
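+
+A minimal sketch of the idea behind the fix, assuming a periodic server-side cleanup task (the table and column names below are invented for illustration, not the real schema):
+
+```java
+import java.sql.Connection;
+import java.sql.PreparedStatement;
+import java.sql.SQLException;
+import java.util.concurrent.Executors;
+import java.util.concurrent.ScheduledExecutorService;
+import java.util.concurrent.TimeUnit;
+
+public class TokenCleanupSketch {
+  private final ScheduledExecutorService scheduler =
+      Executors.newSingleThreadScheduledExecutor();
+
+  // Periodically delete tokens whose renew date has passed, so expired
+  // tokens do not accumulate in the SQL database.
+  public void start(Connection conn) {
+    scheduler.scheduleAtFixedRate(() -> {
+      String sql = "DELETE FROM Tokens WHERE renewDate < ?";
+      try (PreparedStatement stmt = conn.prepareStatement(sql)) {
+        stmt.setLong(1, System.currentTimeMillis());
+        int removed = stmt.executeUpdate();
+        System.out.println("Removed " + removed + " expired tokens");
+      } catch (SQLException e) {
+        e.printStackTrace();
+      }
+    }, 1, 1, TimeUnit.HOURS);
+  }
+}
+```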
+
+**Others**
+
+Other changes to HDFS RBF include WebUI, command-line, and miscellaneous improvements. Please refer to the release document.
+
+HDFS EC: Code Enhancements and Bug Fixes
 ----------------------------------------

-The abfs has a critical bug fix
-[HADOOP-18546](https://issues.apache.org/jira/browse/HADOOP-18546).
-*ABFS. Disable purging list of in-progress reads in abfs stream close().*

+HDFS EC has made code improvements and fixed some bugs.
+
+Important improvements and bug fixes are as follows:
+
+**Improvement**
+
+[HDFS-16613](https://issues.apache.org/jira/browse/HDFS-16613) EC: Improve performance of decommissioning dn with many ec blocks.
+
+In an HDFS cluster with many EC blocks, decommissioning a DataNode is very slow. Unlike replicated blocks, which can be copied from any DataNode holding a replica, EC blocks have to be replicated from the decommissioning DataNode itself. The configurations `dfs.namenode.replication.max-streams` and `dfs.namenode.replication.max-streams-hard-limit` limit the replication speed, but increasing them would put the whole cluster's network at risk, so a separate configuration should be added to limit streams on the decommissioning DataNode, distinct from the cluster-wide max-streams limit. A sketch of this idea follows.
+
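+A hypothetical sketch of the idea (invented names, not the actual NameNode code): decommissioning nodes get their own stream limit instead of the cluster-wide one.
+
+```java
+public final class StreamLimitSketch {
+  private final int clusterMaxStreams;       // dfs.namenode.replication.max-streams
+  private final int decommissionMaxStreams;  // hypothetical separate limit
+
+  public StreamLimitSketch(int clusterMaxStreams, int decommissionMaxStreams) {
+    this.clusterMaxStreams = clusterMaxStreams;
+    this.decommissionMaxStreams = decommissionMaxStreams;
+  }
+
+  // Pick the per-node replication stream limit by decommission state.
+  public int maxStreamsFor(boolean decommissioning) {
+    return decommissioning ? decommissionMaxStreams : clusterMaxStreams;
+  }
+}
+```
+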
+[HDFS-16663](https://issues.apache.org/jira/browse/HDFS-16663) Allow block reconstruction pending timeout refreshable to increase decommission performance.
+
+In [HDFS-16613](https://issues.apache.org/jira/browse/HDFS-16613), increasing the value of `dfs.namenode.replication.max-streams-hard-limit` maximizes the IO performance of a decommissioning DN that holds many EC blocks. Besides this, we also need to decrease `dfs.namenode.reconstruction.pending.timeout-sec` (default: 5 minutes) to shorten the interval at which pendingReconstructions are checked; otherwise the decommissioning node would sit idle waiting for copy tasks for most of that 5-minute window. During decommissioning, these two parameters may need to be reconfigured several times. Since [HDFS-14560](https://issues.apache.org/jira/browse/HDFS-14560), `dfs.namenode.replication.max-streams-hard-limit` can already be reconfigured dynamically without a NameNode restart, and with this change the `dfs.namenode.reconstruction.pending.timeout-sec` parameter can now be reconfigured dynamically as well.

-All users of the abfs connector in hadoop releases 3.3.2+ MUST either upgrade
-or disable prefetching by setting `fs.azure.readaheadqueue.depth` to `0`

-Consult the parent JIRA [HADOOP-18521](https://issues.apache.org/jira/browse/HADOOP-18521)
-*ABFS ReadBufferManager buffer sharing across concurrent HTTP requests*
-for root cause analysis, details on what is affected, and mitigations.
+**Bug**

-Vectored IO API
----------------
-
-[HADOOP-18103](https://issues.apache.org/jira/browse/HADOOP-18103).
-*High performance vectored read API in Hadoop*
-
-The `PositionedReadable` interface has now added an operation for
-Vectored IO (also known as Scatter/Gather IO):
-
-```java
-void readVectored(List<? extends FileRange> ranges, IntFunction<ByteBuffer> allocate)
-```
-
-All the requested ranges will be retrieved into the supplied byte buffers -possibly asynchronously,
-possibly in parallel, with results potentially coming in out-of-order.

+[HDFS-16456](https://issues.apache.org/jira/browse/HDFS-16456) EC: Decommission a rack with only one dn will fail when the rack number is equal with replication.
+
+In the scenario below, decommission will fail with the `TOO_MANY_NODES_ON_RACK` reason:
+
+- Enable an EC policy, such as RS-6-3-1024k.
+- The number of racks in the cluster is equal to or less than the replication number (9).
+- A rack has only one DN, and that DN is decommissioned.
+
+With nine internal blocks spread across at most nine racks, every rack already holds a block, so the block on the decommissioning DN has no valid rack to move to. This issue has been addressed by the resolution of HDFS-16456.

-1. The default implementation uses a series of `readFully()` calls, so delivers
-   equivalent performance.
-2. The local filesystem uses java native IO calls for higher performance reads than `readFully()`.
-3. The S3A filesystem issues parallel HTTP GET requests in different threads.
-
-Benchmarking of enhanced Apache ORC and Apache Parquet clients through `file://` and `s3a://`
-show significant improvements in query performance.
-
-Further Reading:
-* [FsDataInputStream](./hadoop-project-dist/hadoop-common/filesystem/fsdatainputstream.html).
-* [Hadoop Vectored IO: Your Data Just Got Faster!](https://apachecon.com/acasia2022/sessions/bigdata-1148.html)
-  Apachecon 2022 talk.
-
-Mapreduce: Manifest Committer for Azure ABFS and google GCS
------------------------------------------------------------
-
-The new _Intermediate Manifest Committer_ uses a manifest file
-to commit the work of successful task attempts, rather than
-renaming directories.
-Job commit is matter of reading all the manifests, creating the
-destination directories (parallelized) and renaming the files,
-again in parallel.
-
-This is both fast and correct on Azure Storage and Google GCS,
-and should be used there instead of the classic v1/v2 file
-output committers.
-
-It is also safe to use on HDFS, where it should be faster
-than the v1 committer. It is however optimized for
-cloud storage where list and rename operations are significantly
-slower; the benefits may be less.

+[HDFS-17094](https://issues.apache.org/jira/browse/HDFS-17094) EC: Fix bug in block recovery when there are stale datanodes.
+
+During block recovery, the `RecoveryTaskStriped` in the datanode expects a one-to-one correspondence between `rBlock.getLocations()` and `rBlock.getBlockIndices()`. However, if there are stale locations during a NameNode heartbeat, this correspondence may be disrupted. Specifically, although there are no stale locations in `recoveryLocations`, the block indices array remains complete. This discrepancy causes `BlockRecoveryWorker.RecoveryTaskStriped#recover` to generate an incorrect internal block ID, leading to a failure in the recovery process as the corresponding datanode cannot locate the replica.
+
+This issue has been addressed by the resolution of HDFS-17094.
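+
+A hypothetical illustration of the parallel-array pitfall (invented names, not the actual `BlockRecoveryWorker` code): if stale entries are filtered out of the locations but the indices array is left intact, position `i` in one array no longer describes the same replica as position `i` in the other.
+
+```java
+import java.util.ArrayList;
+import java.util.List;
+
+public class ParallelArraySketch {
+  public static void main(String[] args) {
+    // Parallel arrays: locations[i] stores the datanode holding the
+    // internal block whose index is blockIndices[i].
+    String[] locations = {"dn1", "dn2", "dn3"};  // suppose dn2 is stale
+    byte[] blockIndices = {0, 1, 2};
+
+    // Buggy filtering: stale locations are dropped, indices are not.
+    List<String> recoveryLocations = new ArrayList<>();
+    for (String dn : locations) {
+      if (!"dn2".equals(dn)) {
+        recoveryLocations.add(dn);
+      }
+    }
+
+    // recoveryLocations.get(1) is now "dn3", but blockIndices[1] is still 1,
+    // the index that belonged to the stale "dn2": the correspondence is broken.
+    System.out.println(recoveryLocations.get(1) + " paired with index " + blockIndices[1]);
+  }
+}
+```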

-More details are available in the
-[manifest committer](./hadoop-mapreduce-client/hadoop-mapreduce-client-core/manifest_committer.html).
-documentation.
-
-HDFS: Dynamic Datanode Reconfiguration
---------------------------------------
-
-HDFS-16400, HDFS-16399, HDFS-16396, HDFS-16397, HDFS-16413, HDFS-16457.
-
-A number of Datanode configuration options can be changed without having to restart
-the datanode. This makes it possible to tune deployment configurations without
-cluster-wide Datanode Restarts.
-
-See [DataNode.java](https://github.com/apache/hadoop/blob/branch-3.3.5/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java#L346-L361)
-for the list of dynamically reconfigurable attributes.

+[HDFS-17284](https://issues.apache.org/jira/browse/HDFS-17284) EC: Fix int overflow in calculating numEcReplicatedTasks and numReplicationTasks during block recovery.
+
+Due to an integer overflow in the calculation of numReplicationTasks or numEcReplicatedTasks, the NameNode's configuration parameter `dfs.namenode.replication.max-streams-hard-limit` failed to take effect. This led to an excessive number of tasks being sent to the DataNodes, consequently occupying too much of their memory.
+
+This issue has been addressed by the resolution of HDFS-17284.
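+
+As a hedged illustration of this class of bug (a hypothetical sketch, not the actual NameNode code), a product of two `int` values wraps around before it is ever widened, so a large limit can silently turn negative:
+
+```java
+public class IntOverflowSketch {
+  public static void main(String[] args) {
+    int maxStreamsHardLimit = 100_000;  // hypothetical, very large limit
+    int numNodes = 30_000;
+
+    // Buggy: the multiplication is evaluated in 32-bit arithmetic and wraps.
+    long buggyTotal = maxStreamsHardLimit * numNodes;
+
+    // Fixed: widen one operand first so the product is computed in 64 bits.
+    long fixedTotal = (long) maxStreamsHardLimit * numNodes;
+
+    System.out.println("buggy = " + buggyTotal);  // prints a negative value
+    System.out.println("fixed = " + fixedTotal);  // prints 3000000000
+  }
+}
+```
+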
+**Others**

+For other improvements and fixes to HDFS EC, please refer to the release document.

 Transitive CVE fixes
 --------------------
@@ -133,8 +188,8 @@ Transitive CVE fixes
 A lot of dependencies have been upgraded to address recent CVEs.
 Many of the CVEs were not actually exploitable through Hadoop,
 so much of this work is just due diligence.
-However applications which have all the library is on a class path may
-be vulnerable, and the ugprades should also reduce the number of false
+However, applications which have these libraries on their class path may
+be vulnerable, and the upgrades should also reduce the number of false
 positives security scanners report.

 We have not been able to upgrade every single dependency to the latest
@@ -170,12 +225,12 @@ can, with care, keep data and computing resources private.
 1. Physical cluster: *configure Hadoop security*, usually bonded to the
    enterprise Kerberos/Active Directory systems.
    Good.
-1. Cloud: transient or persistent single or multiple user/tenant cluster
+2. Cloud: transient or persistent single or multiple user/tenant cluster
    with private VLAN *and security*.
    Good.
    Consider [Apache Knox](https://knox.apache.org/) for managing remote
    access to the cluster.
-1. Cloud: transient single user/tenant cluster with private VLAN
+3. Cloud: transient single user/tenant cluster with private VLAN
    *and no security at all*.
    Requires careful network configuration as this is the sole
    means of securing the cluster.