@@ -16,10 +16,7 @@ Apache Hadoop ${project.version}
================================

Apache Hadoop ${project.version} incorporates a number of significant
-enhancements over the previous major release line (hadoop-2.x).
-
-This release is generally available (GA), meaning that it represents a point of
-API stability and quality that we consider production-ready.
+enhancements over the previous major release line (hadoop-3.2).

Overview
========
@@ -27,224 +24,83 @@ Overview
Users are encouraged to read the full set of release notes.
This page provides an overview of the major changes.

-Minimum required Java version increased from Java 7 to Java 8
-------------------
-
-All Hadoop JARs are now compiled targeting a runtime version of Java 8.
-Users still using Java 7 or below must upgrade to Java 8.
-
-Support for erasure coding in HDFS
-------------------
-
-Erasure coding is a method for durably storing data with significant space
-savings compared to replication. Standard encodings like Reed-Solomon (10,4)
-have a 1.4x space overhead, compared to the 3x overhead of standard HDFS
-replication.
-
-Since erasure coding imposes additional overhead during reconstruction
-and performs mostly remote reads, it has traditionally been used for
-storing colder, less frequently accessed data. Users should consider
-the network and CPU overheads of erasure coding when deploying this
-feature.
-
-More details are available in the
-[HDFS Erasure Coding](./hadoop-project-dist/hadoop-hdfs/HDFSErasureCoding.html)
-documentation.
-
-YARN Timeline Service v.2
--------------------
-
-We are introducing an early preview (alpha 2) of a major revision of YARN
-Timeline Service: v.2. YARN Timeline Service v.2 addresses two major
-challenges: improving scalability and reliability of Timeline Service, and
-enhancing usability by introducing flows and aggregation.
-
-YARN Timeline Service v.2 alpha 2 is provided so that users and developers
-can test it and provide feedback and suggestions for making it a ready
-replacement for Timeline Service v.1.x. It should be used only in a test
-capacity.
-
-More details are available in the
-[YARN Timeline Service v.2](./hadoop-yarn/hadoop-yarn-site/TimelineServiceV2.html)
-documentation.
-
-Shell script rewrite
--------------------
-
-The Hadoop shell scripts have been rewritten to fix many long-standing
-bugs and include some new features. While an eye has been kept towards
-compatibility, some changes may break existing installations.
-
-Incompatible changes are documented in the release notes, with related
-discussion on [HADOOP-9902](https://issues.apache.org/jira/browse/HADOOP-9902).
-
-More details are available in the
-[Unix Shell Guide](./hadoop-project-dist/hadoop-common/UnixShellGuide.html)
-documentation. Power users will also be pleased by the
-[Unix Shell API](./hadoop-project-dist/hadoop-common/UnixShellAPI.html)
-documentation, which describes much of the new functionality, particularly
-related to extensibility.
-
-Shaded client jars
-------------------
-
-The `hadoop-client` Maven artifact available in 2.x releases pulls
-Hadoop's transitive dependencies onto a Hadoop application's classpath.
-This can be problematic if the versions of these transitive dependencies
-conflict with the versions used by the application.
-
-[HADOOP-11804](https://issues.apache.org/jira/browse/HADOOP-11804) adds
-new `hadoop-client-api` and `hadoop-client-runtime` artifacts that
-shade Hadoop's dependencies into a single jar. This avoids leaking
-Hadoop's dependencies onto the application's classpath.
-
-Support for Opportunistic Containers and Distributed Scheduling.
---------------------
-
-A notion of `ExecutionType` has been introduced, whereby Applications can
-now request for containers with an execution type of `Opportunistic`.
-Containers of this type can be dispatched for execution at an NM even if
-there are no resources available at the moment of scheduling. In such a
-case, these containers will be queued at the NM, waiting for resources to
-be available for it to start. Opportunistic containers are of lower priority
-than the default `Guaranteed` containers and are therefore preempted,
-if needed, to make room for Guaranteed containers. This should
-improve cluster utilization.
-
-Opportunistic containers are by default allocated by the central RM, but
-support has also been added to allow opportunistic containers to be
-allocated by a distributed scheduler which is implemented as an
-AMRMProtocol interceptor.
-
-Please see [documentation](./hadoop-yarn/hadoop-yarn-site/OpportunisticContainers.html)
-for more details.
-
-MapReduce task-level native optimization
---------------------
-
-MapReduce has added support for a native implementation of the map output
-collector. For shuffle-intensive jobs, this can lead to a performance
-improvement of 30% or more.
-
-See the release notes for
-[MAPREDUCE-2841](https://issues.apache.org/jira/browse/MAPREDUCE-2841)
-for more detail.
-
-Support for more than 2 NameNodes.
---------------------
-
-The initial implementation of HDFS NameNode high-availability provided
-for a single active NameNode and a single Standby NameNode. By replicating
-edits to a quorum of three JournalNodes, this architecture is able to
-tolerate the failure of any one node in the system.
-
-However, some deployments require higher degrees of fault-tolerance.
-This is enabled by this new feature, which allows users to run multiple
-standby NameNodes. For instance, by configuring three NameNodes and
-five JournalNodes, the cluster is able to tolerate the failure of two
-nodes rather than just one.
-
-The [HDFS high-availability documentation](./hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html)
-has been updated with instructions on how to configure more than two
-NameNodes.
-
-Default ports of multiple services have been changed.
-------------------------
-
-Previously, the default ports of multiple Hadoop services were in the
-Linux ephemeral port range (32768-61000). This meant that at startup,
-services would sometimes fail to bind to the port due to a conflict
-with another application.
-
-These conflicting ports have been moved out of the ephemeral range,
-affecting the NameNode, Secondary NameNode, DataNode, and KMS. Our
-documentation has been updated appropriately, but see the release
-notes for [HDFS-9427](https://issues.apache.org/jira/browse/HDFS-9427) and
-[HADOOP-12811](https://issues.apache.org/jira/browse/HADOOP-12811)
-for a list of port changes.
-
-Support for Microsoft Azure Data Lake and Aliyun Object Storage System filesystem connectors
----------------------
-
-Hadoop now supports integration with Microsoft Azure Data Lake and
-Aliyun Object Storage System as alternative Hadoop-compatible filesystems.
-
-Intra-datanode balancer
--------------------
-
-A single DataNode manages multiple disks. During normal write operation,
-disks will be filled up evenly. However, adding or replacing disks can
-lead to significant skew within a DataNode. This situation is not handled
-by the existing HDFS balancer, which concerns itself with inter-, not intra-,
-DN skew.
-
-This situation is handled by the new intra-DataNode balancing
-functionality, which is invoked via the `hdfs diskbalancer` CLI.
-See the disk balancer section in the
-[HDFS Commands Guide](./hadoop-project-dist/hadoop-hdfs/HDFSCommands.html)
-for more information.
-
-Reworked daemon and task heap management
----------------------
-
-A series of changes have been made to heap management for Hadoop daemons
-as well as MapReduce tasks.
-
-[HADOOP-10950](https://issues.apache.org/jira/browse/HADOOP-10950) introduces
-new methods for configuring daemon heap sizes.
-Notably, auto-tuning is now possible based on the memory size of the host,
-and the `HADOOP_HEAPSIZE` variable has been deprecated.
-See the full release notes of HADOOP-10950 for more detail.
-
-[MAPREDUCE-5785](https://issues.apache.org/jira/browse/MAPREDUCE-5785)
-simplifies the configuration of map and reduce task
-heap sizes, so the desired heap size no longer needs to be specified
-in both the task configuration and as a Java option.
-Existing configs that already specify both are not affected by this change.
-See the full release notes of MAPREDUCE-5785 for more details.
-
-S3Guard: Consistency and Metadata Caching for the S3A filesystem client
----------------------
-
-[HADOOP-13345](https://issues.apache.org/jira/browse/HADOOP-13345) adds an
-optional feature to the S3A client of Amazon S3 storage: the ability to use
-a DynamoDB table as a fast and consistent store of file and directory
-metadata.
-
-See [S3Guard](./hadoop-aws/tools/hadoop-aws/s3guard.html) for more details.
-
-HDFS Router-Based Federation
----------------------
-HDFS Router-Based Federation adds a RPC routing layer that provides a federated
-view of multiple HDFS namespaces. This is similar to the existing
-[ViewFs](./hadoop-project-dist/hadoop-hdfs/ViewFs.html)) and
-[HDFS Federation](./hadoop-project-dist/hadoop-hdfs/Federation.html)
-functionality, except the mount table is managed on the server-side by the
-routing layer rather than on the client. This simplifies access to a federated
-cluster for existing HDFS clients.
-
-See [HDFS-10467](https://issues.apache.org/jira/browse/HDFS-10467) and the
-HDFS Router-based Federation
-[documentation](./hadoop-project-dist/hadoop-hdfs-rbf/HDFSRouterFederation.html) for
-more details.
-
-API-based configuration of Capacity Scheduler queue configuration
-----------------------
-
-The OrgQueue extension to the capacity scheduler provides a programmatic way to
-change configurations by providing a REST API that users can call to modify
-queue configurations. This enables automation of queue configuration management
-by administrators in the queue's `administer_queue` ACL.
-
-See [YARN-5734](https://issues.apache.org/jira/browse/YARN-5734) and the
-[Capacity Scheduler documentation](./hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html) for more information.
-
-YARN Resource Types
----------------
-
-The YARN resource model has been generalized to support user-defined countable resource types beyond CPU and memory. For instance, the cluster administrator could define resources like GPUs, software licenses, or locally-attached storage. YARN tasks can then be scheduled based on the availability of these resources.
+ABFS: fix for Server Name Indication (SNI)
+----------
+ABFS: bug fix to support Server Name Indication (SNI).
+
+
+Setting permissions on name directory fails on non-POSIX-compliant filesystems
+----------
+Fixed NameNode/journal startup on Windows.
+
+
+Backport HDFS persistent memory read cache support to branch-3.2
+----------
+Non-volatile storage class memory (SCM, also known as persistent memory) is
+now supported as an HDFS cache medium. To enable the SCM cache, users only
+need to configure an SCM volume for the property
+\`dfs.datanode.cache.pmem.dirs\` in hdfs-site.xml; all existing HDFS cache
+directives remain unchanged. There are two implementations of the HDFS SCM
+cache: a pure Java implementation and a native implementation based on PMDK.
+The latter offers better cache write and cache read performance. If the PMDK
+native libraries can be loaded, the PMDK-based implementation is used;
+otherwise HDFS falls back to the Java implementation. To enable the
+PMDK-based implementation, install the PMDK library by referring to the
+official site http://pmem.io/, then build Hadoop with PMDK support by
+referring to the "PMDK library build options" section in \`BUILDING.txt\` in
+the source code. If multiple SCM volumes are configured, a round-robin policy
+is used to select an available volume for caching a block. Consistent with
+the DRAM cache, the SCM cache has no cache eviction mechanism. When a
+DataNode receives a data read request from a client and the corresponding
+block is cached on SCM, the DataNode instantiates an InputStream with either
+the block location path on SCM (pure Java implementation) or the cache
+address on SCM (PMDK-based implementation). Once the InputStream is created,
+the DataNode sends the cached data to the client. Please refer to the
+"Centralized Cache Management" guide for more details.
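As a sketch of the configuration described above (the mount points are illustrative), enabling the SCM read cache only requires pointing the property at the persistent memory volumes:

```xml
<!-- hdfs-site.xml: enable the HDFS SCM read cache.
     /mnt/pmem0 and /mnt/pmem1 are hypothetical pmem mount points. -->
<property>
  <name>dfs.datanode.cache.pmem.dirs</name>
  <value>/mnt/pmem0,/mnt/pmem1</value>
</property>
```

Existing cache directives (for example, those added with `hdfs cacheadmin -addDirective`) continue to work unchanged on top of this.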
+
+
+Consistent Reads from Standby Node
+----------
+The Observer is a new type of NameNode, in addition to the Active and Standby
+NameNodes in an HA setting. An Observer Node maintains a replica of the
+namespace, just like a Standby Node, and additionally allows the execution of
+clients' read requests. To ensure read-after-write consistency within a
+single client, a state ID is introduced in RPC headers. The Observer responds
+to a client request only after its own state has caught up with the client's
+state ID, which the client previously received from the Active NameNode.
+
+Clients can explicitly invoke a new client protocol call, msync(), which
+ensures that subsequent reads by this client from an Observer are consistent.
+
+A new client-side ObserverReadProxyProvider is introduced, which
+automatically switches between the Active and Observer NameNodes, submitting
+write requests to the former and read requests to the latter.
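A minimal client-side sketch of enabling this, assuming a nameservice named `mycluster` (the name is illustrative); the proxy provider class is the one introduced by this feature:

```xml
<!-- hdfs-site.xml (client side): use the Observer-aware proxy provider
     for the hypothetical nameservice "mycluster" -->
<property>
  <name>dfs.client.failover.proxy.provider.mycluster</name>
  <value>org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider</value>
</property>
```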
+
+
+Update checkstyle to 8.26 and maven-checkstyle-plugin to 3.1.0
+----------
+Updated checkstyle to 8.26 and maven-checkstyle-plugin to 3.1.0.
+
+
+ZKFC ignores dfs.namenode.rpc-bind-host and uses dfs.namenode.rpc-address to bind to host address
+----------
+ZKFC binds its host address to "dfs.namenode.servicerpc-bind-host", if
+configured. Otherwise, it binds to "dfs.namenode.rpc-bind-host". If neither
+of those is configured, ZKFC binds itself to the NameNode RPC server address
+(effectively "dfs.namenode.rpc-address").
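For example, to make ZKFC (and the NameNode service RPC server) listen on all local interfaces, the bind-host property described above can be set to a wildcard address; this is a sketch, not a recommendation for secured clusters:

```xml
<!-- hdfs-site.xml: 0.0.0.0 binds the service RPC server, and hence ZKFC,
     to all local interfaces -->
<property>
  <name>dfs.namenode.servicerpc-bind-host</name>
  <value>0.0.0.0</value>
</property>
```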
+
+
+ListStatus on ViewFS root (ls "/") should list the linkFallback root (configured target root)
+----------
+ViewFS#listStatus on the root ("/") now considers listing from the fallback
+link, if one is configured. If the same directory name is present both in a
+configured mount path and in the fallback link, only the configured mount
+path is listed in the returned result.
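As an illustrative sketch, a fallback link is configured per mount table with the `linkFallback` entry; the mount-table name `cluster` and the target URIs here are hypothetical:

```xml
<!-- core-site.xml: ViewFS mount table "cluster" with one mount point
     and a fallback target covering everything else, including ls "/" -->
<property>
  <name>fs.viewfs.mounttable.cluster.link./data</name>
  <value>hdfs://nn1/data</value>
</property>
<property>
  <name>fs.viewfs.mounttable.cluster.linkFallback</name>
  <value>hdfs://nn1/</value>
</property>
```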
+
+
+NMs should supply a health status when registering with RM
+----------
+Improved NodeManager registration with the ResourceManager to include the
+node's health status.

-See [YARN-3926](https://issues.apache.org/jira/browse/YARN-3926) and the [YARN resource model documentation](./hadoop-yarn/hadoop-yarn-site/ResourceModel.html) for more information.

Getting Started
===============