Update index.md.vm for 3.2.2 release.

He Xiaoqiao 4 years ago
Parent
Current commit
67c274bc88
1 changed file with 77 additions and 221 deletions
+ 77 - 221
hadoop-project/src/site/markdown/index.md.vm

@@ -16,10 +16,7 @@ Apache Hadoop ${project.version}
 ================================

 Apache Hadoop ${project.version} incorporates a number of significant
-enhancements over the previous major release line (hadoop-2.x).
-
-This release is generally available (GA), meaning that it represents a point of
-API stability and quality that we consider production-ready.
+enhancements over the previous major release line (hadoop-3.2).

 Overview
 ========
@@ -27,224 +24,83 @@ Overview
 Users are encouraged to read the full set of release notes.
 This page provides an overview of the major changes.

-Minimum required Java version increased from Java 7 to Java 8
-------------------
-
-All Hadoop JARs are now compiled targeting a runtime version of Java 8.
-Users still using Java 7 or below must upgrade to Java 8.
-
-Support for erasure coding in HDFS
-------------------
-
-Erasure coding is a method for durably storing data with significant space
-savings compared to replication. Standard encodings like Reed-Solomon (10,4)
-have a 1.4x space overhead, compared to the 3x overhead of standard HDFS
-replication.
-
-Since erasure coding imposes additional overhead during reconstruction
-and performs mostly remote reads, it has traditionally been used for
-storing colder, less frequently accessed data. Users should consider
-the network and CPU overheads of erasure coding when deploying this
-feature.
-
-More details are available in the
-[HDFS Erasure Coding](./hadoop-project-dist/hadoop-hdfs/HDFSErasureCoding.html)
-documentation.
-
-YARN Timeline Service v.2
--------------------
-
-We are introducing an early preview (alpha 2) of a major revision of YARN
-Timeline Service: v.2. YARN Timeline Service v.2 addresses two major
-challenges: improving scalability and reliability of Timeline Service, and
-enhancing usability by introducing flows and aggregation.
-
-YARN Timeline Service v.2 alpha 2 is provided so that users and developers
-can test it and provide feedback and suggestions for making it a ready
-replacement for Timeline Service v.1.x. It should be used only in a test
-capacity.
-
-More details are available in the
-[YARN Timeline Service v.2](./hadoop-yarn/hadoop-yarn-site/TimelineServiceV2.html)
-documentation.
-
-Shell script rewrite
--------------------
-
-The Hadoop shell scripts have been rewritten to fix many long-standing
-bugs and include some new features.  While an eye has been kept towards
-compatibility, some changes may break existing installations.
-
-Incompatible changes are documented in the release notes, with related
-discussion on [HADOOP-9902](https://issues.apache.org/jira/browse/HADOOP-9902).
-
-More details are available in the
-[Unix Shell Guide](./hadoop-project-dist/hadoop-common/UnixShellGuide.html)
-documentation. Power users will also be pleased by the
-[Unix Shell API](./hadoop-project-dist/hadoop-common/UnixShellAPI.html)
-documentation, which describes much of the new functionality, particularly
-related to extensibility.
-
-Shaded client jars
-------------------
-
-The `hadoop-client` Maven artifact available in 2.x releases pulls
-Hadoop's transitive dependencies onto a Hadoop application's classpath.
-This can be problematic if the versions of these transitive dependencies
-conflict with the versions used by the application.
-
-[HADOOP-11804](https://issues.apache.org/jira/browse/HADOOP-11804) adds
-new `hadoop-client-api` and `hadoop-client-runtime` artifacts that
-shade Hadoop's dependencies into a single jar. This avoids leaking
-Hadoop's dependencies onto the application's classpath.
-
-Support for Opportunistic Containers and Distributed Scheduling.
---------------------
-
-A notion of `ExecutionType` has been introduced, whereby Applications can
-now request for containers with an execution type of `Opportunistic`.
-Containers of this type can be dispatched for execution at an NM even if
-there are no resources available at the moment of scheduling. In such a
-case, these containers will be queued at the NM, waiting for resources to
-be available for it to start. Opportunistic containers are of lower priority
-than the default `Guaranteed` containers and are therefore preempted,
-if needed, to make room for Guaranteed containers. This should
-improve cluster utilization.
-
-Opportunistic containers are by default allocated by the central RM, but
-support has also been added to allow opportunistic containers to be
-allocated by a distributed scheduler which is implemented as an
-AMRMProtocol interceptor.
-
-Please see [documentation](./hadoop-yarn/hadoop-yarn-site/OpportunisticContainers.html)
-for more details.
-
-MapReduce task-level native optimization
---------------------
-
-MapReduce has added support for a native implementation of the map output
-collector. For shuffle-intensive jobs, this can lead to a performance
-improvement of 30% or more.
-
-See the release notes for
-[MAPREDUCE-2841](https://issues.apache.org/jira/browse/MAPREDUCE-2841)
-for more detail.
-
-Support for more than 2 NameNodes.
---------------------
-
-The initial implementation of HDFS NameNode high-availability provided
-for a single active NameNode and a single Standby NameNode. By replicating
-edits to a quorum of three JournalNodes, this architecture is able to
-tolerate the failure of any one node in the system.
-
-However, some deployments require higher degrees of fault-tolerance.
-This is enabled by this new feature, which allows users to run multiple
-standby NameNodes. For instance, by configuring three NameNodes and
-five JournalNodes, the cluster is able to tolerate the failure of two
-nodes rather than just one.
-
-The [HDFS high-availability documentation](./hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html)
-has been updated with instructions on how to configure more than two
-NameNodes.
-
-Default ports of multiple services have been changed.
-------------------------
-
-Previously, the default ports of multiple Hadoop services were in the
-Linux ephemeral port range (32768-61000). This meant that at startup,
-services would sometimes fail to bind to the port due to a conflict
-with another application.
-
-These conflicting ports have been moved out of the ephemeral range,
-affecting the NameNode, Secondary NameNode, DataNode, and KMS. Our
-documentation has been updated appropriately, but see the release
-notes for [HDFS-9427](https://issues.apache.org/jira/browse/HDFS-9427) and
-[HADOOP-12811](https://issues.apache.org/jira/browse/HADOOP-12811)
-for a list of port changes.
-
-Support for Microsoft Azure Data Lake and Aliyun Object Storage System filesystem connectors
----------------------
-
-Hadoop now supports integration with Microsoft Azure Data Lake and
-Aliyun Object Storage System as alternative Hadoop-compatible filesystems.
-
-Intra-datanode balancer
--------------------
-
-A single DataNode manages multiple disks. During normal write operation,
-disks will be filled up evenly. However, adding or replacing disks can
-lead to significant skew within a DataNode. This situation is not handled
-by the existing HDFS balancer, which concerns itself with inter-, not intra-,
-DN skew.
-
-This situation is handled by the new intra-DataNode balancing
-functionality, which is invoked via the `hdfs diskbalancer` CLI.
-See the disk balancer section in the
-[HDFS Commands Guide](./hadoop-project-dist/hadoop-hdfs/HDFSCommands.html)
-for more information.
-
-Reworked daemon and task heap management
----------------------
-
-A series of changes have been made to heap management for Hadoop daemons
-as well as MapReduce tasks.
-
-[HADOOP-10950](https://issues.apache.org/jira/browse/HADOOP-10950) introduces
-new methods for configuring daemon heap sizes.
-Notably, auto-tuning is now possible based on the memory size of the host,
-and the `HADOOP_HEAPSIZE` variable has been deprecated.
-See the full release notes of HADOOP-10950 for more detail.
-
-[MAPREDUCE-5785](https://issues.apache.org/jira/browse/MAPREDUCE-5785)
-simplifies the configuration of map and reduce task
-heap sizes, so the desired heap size no longer needs to be specified
-in both the task configuration and as a Java option.
-Existing configs that already specify both are not affected by this change.
-See the full release notes of MAPREDUCE-5785 for more details.
-
-S3Guard: Consistency and Metadata Caching for the S3A filesystem client
----------------------
-
-[HADOOP-13345](https://issues.apache.org/jira/browse/HADOOP-13345) adds an
-optional feature to the S3A client of Amazon S3 storage: the ability to use
-a DynamoDB table as a fast and consistent store of file and directory
-metadata.
-
-See [S3Guard](./hadoop-aws/tools/hadoop-aws/s3guard.html) for more details.
-
-HDFS Router-Based Federation
----------------------
-HDFS Router-Based Federation adds a RPC routing layer that provides a federated
-view of multiple HDFS namespaces. This is similar to the existing
-[ViewFs](./hadoop-project-dist/hadoop-hdfs/ViewFs.html) and
-[HDFS Federation](./hadoop-project-dist/hadoop-hdfs/Federation.html)
-functionality, except the mount table is managed on the server-side by the
-routing layer rather than on the client. This simplifies access to a federated
-cluster for existing HDFS clients.
-
-See [HDFS-10467](https://issues.apache.org/jira/browse/HDFS-10467) and the
-HDFS Router-based Federation
-[documentation](./hadoop-project-dist/hadoop-hdfs-rbf/HDFSRouterFederation.html) for
-more details.
-
-API-based configuration of Capacity Scheduler queue configuration
-----------------------
-
-The OrgQueue extension to the capacity scheduler provides a programmatic way to
-change configurations by providing a REST API that users can call to modify
-queue configurations. This enables automation of queue configuration management
-by administrators in the queue's `administer_queue` ACL.
-
-See [YARN-5734](https://issues.apache.org/jira/browse/YARN-5734) and the
-[Capacity Scheduler documentation](./hadoop-yarn/hadoop-yarn-site/CapacityScheduler.html) for more information.
-
-YARN Resource Types
----------------
-
-The YARN resource model has been generalized to support user-defined countable resource types beyond CPU and memory. For instance, the cluster administrator could define resources like GPUs, software licenses, or locally-attached storage. YARN tasks can then be scheduled based on the availability of these resources.
+ABFS: fix for Server Name Indication (SNI)
+----------
+ABFS: Bug fix to support Server Name Indication (SNI).
+
+
+Setting permissions on name directory fails on non-POSIX-compliant filesystems
+----------
+Fixed NameNode and JournalNode startup on Windows.
+
+
+Backport HDFS persistent memory read cache support to branch-3.2
+----------
+Non-volatile storage class memory (SCM, also known as persistent memory) is
+now supported in the HDFS cache. To enable the SCM cache, users only need to
+configure SCM volumes for the property "dfs.datanode.cache.pmem.dirs" in
+hdfs-site.xml; all existing HDFS cache directives remain unchanged.
+
+There are two implementations of the HDFS SCM cache: a pure Java
+implementation and a native implementation based on PMDK. The latter offers
+better performance for both cache writes and cache reads. If the PMDK native
+libraries can be loaded, the PMDK-based implementation is used; otherwise
+HDFS falls back to the pure Java implementation. To enable the PMDK-based
+implementation, install the PMDK library by referring to the official site
+http://pmem.io/, then build Hadoop with PMDK support as described in the
+"PMDK library build options" section of `BUILDING.txt` in the source code.
+
+If multiple SCM volumes are configured, a round-robin policy is used to
+select an available volume for caching a block. Consistent with the DRAM
+cache, the SCM cache has no cache eviction mechanism. When a DataNode
+receives a data read request from a client and the corresponding block is
+cached in SCM, the DataNode instantiates an InputStream with the block
+location path on SCM (pure Java implementation) or the cache address on SCM
+(PMDK-based implementation). Once the InputStream is created, the DataNode
+sends the cached data to the client. Please refer to the "Centralized Cache
+Management" guide for more details.
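+
+As an illustrative sketch (the /mnt/pmem* paths are hypothetical), enabling
+the SCM cache comes down to a single property in hdfs-site.xml; existing
+cache directives then use the configured volumes transparently:
+
+```xml
+<!-- hdfs-site.xml: sketch; the /mnt/pmem* mount points are hypothetical. -->
+<property>
+  <!-- Comma-separated SCM volumes; with more than one, blocks are cached
+       across them using a round-robin policy. -->
+  <name>dfs.datanode.cache.pmem.dirs</name>
+  <value>/mnt/pmem0,/mnt/pmem1</value>
+</property>
+```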
+
+
+Consistent Reads from Standby Node
+----------
+Observer is a new type of NameNode, in addition to the Active and Standby
+NameNodes in HA settings. An Observer Node maintains a replica of the
+namespace, just like a Standby Node, but additionally allows execution of
+clients' read requests. To ensure read-after-write consistency within a
+single client, a state ID is introduced in RPC headers. The Observer responds
+to a client request only after its own state has caught up with the client's
+state ID, which the client previously received from the Active NameNode.
+
+Clients can explicitly invoke a new client protocol call, msync(), which
+ensures that subsequent reads by this client from an Observer are consistent.
+
+A new client-side ObserverReadProxyProvider is introduced to automatically
+switch between Active and Observer NameNodes, submitting write requests to
+the former and read requests to the latter.
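+
+As a client-side sketch (the nameservice ID "mycluster" is hypothetical),
+observer reads are enabled by switching the failover proxy provider in
+hdfs-site.xml:
+
+```xml
+<!-- hdfs-site.xml (client): "mycluster" is a hypothetical nameservice ID. -->
+<property>
+  <!-- Routes reads to Observer NameNodes and writes to the Active NameNode. -->
+  <name>dfs.client.failover.proxy.provider.mycluster</name>
+  <value>org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider</value>
+</property>
+```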
+
+
+Update checkstyle to 8.26 and maven-checkstyle-plugin to 3.1.0
+----------
+Updated checkstyle to 8.26 and updated maven-checkstyle-plugin to 3.1.0.
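+
+A minimal pom.xml sketch of how such an upgrade is typically expressed;
+pinning the checkstyle version through a plugin dependency is one common
+maven-checkstyle-plugin pattern (configuration details omitted):
+
+```xml
+<!-- pom.xml sketch: upgrade the plugin and pin the checkstyle version it runs. -->
+<plugin>
+  <groupId>org.apache.maven.plugins</groupId>
+  <artifactId>maven-checkstyle-plugin</artifactId>
+  <version>3.1.0</version>
+  <dependencies>
+    <dependency>
+      <groupId>com.puppycrawl.tools</groupId>
+      <artifactId>checkstyle</artifactId>
+      <version>8.26</version>
+    </dependency>
+  </dependencies>
+</plugin>
+```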
+
+
+ZKFC ignores dfs.namenode.rpc-bind-host and uses dfs.namenode.rpc-address to bind to host address
+----------
+ZKFC binds its host address to "dfs.namenode.servicerpc-bind-host", if
+configured. Otherwise, it binds to "dfs.namenode.rpc-bind-host". If neither
+of those is configured, ZKFC binds to the NameNode RPC server address
+(effectively "dfs.namenode.rpc-address").
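+
+For example, a sketch that makes ZKFC (along with the NameNode service RPC
+server) bind to all interfaces; the wildcard value shown is a typical use of
+this setting:
+
+```xml
+<!-- hdfs-site.xml: sketch; ZKFC picks this address up first. -->
+<property>
+  <name>dfs.namenode.servicerpc-bind-host</name>
+  <value>0.0.0.0</value>
+</property>
+```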
+
+
+ListStatus on ViewFS root (ls "/") should list the linkFallBack root (configured target root).
+----------
+ViewFS#listStatus on the root ("/") now also lists entries from the fallback
+link (linkFallback), if one is configured. If the same directory name is
+present both in a configured mount path and in the fallback link, only the
+configured mount path is listed in the returned result.
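+
+An illustrative sketch (the mount-table name "ClusterX" and the hdfs:// URIs
+are hypothetical): a fallback link configured alongside a regular mount
+point, so that listing "/" shows entries from both:
+
+```xml
+<!-- core-site.xml: sketch; "ClusterX" and the target URIs are hypothetical. -->
+<property>
+  <!-- Regular mount point: /data resolves to this target. -->
+  <name>fs.viewfs.mounttable.ClusterX.link./data</name>
+  <value>hdfs://nn1/data</value>
+</property>
+<property>
+  <!-- Fallback root: paths not matching any mount point resolve here. -->
+  <name>fs.viewfs.mounttable.ClusterX.linkFallback</name>
+  <value>hdfs://nn1/</value>
+</property>
+```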
+
+
+NMs should supply a health status when registering with RM
+----------
+Improved node registration with node health status.

-See [YARN-3926](https://issues.apache.org/jira/browse/YARN-3926) and the [YARN resource model documentation](./hadoop-yarn/hadoop-yarn-site/ResourceModel.html) for more information.

 Getting Started
 ===============