|
@@ -0,0 +1,96 @@
|
|
|
+~~ Licensed under the Apache License, Version 2.0 (the "License");
|
|
|
+~~ you may not use this file except in compliance with the License.
|
|
|
+~~ You may obtain a copy of the License at
|
|
|
+~~
|
|
|
+~~ http://www.apache.org/licenses/LICENSE-2.0
|
|
|
+~~
|
|
|
+~~ Unless required by applicable law or agreed to in writing, software
|
|
|
+~~ distributed under the License is distributed on an "AS IS" BASIS,
|
|
|
+~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
|
|
+~~ See the License for the specific language governing permissions and
|
|
|
+~~ limitations under the License. See accompanying LICENSE file.
|
|
|
+
|
|
|
+ ---
|
|
|
+ Hadoop Map Reduce Next Generation-${project.version} - Pluggable Shuffle and Pluggable Sort
|
|
|
+ ---
|
|
|
+ ---
|
|
|
+ ${maven.build.timestamp}
|
|
|
+
|
|
|
+Hadoop MapReduce Next Generation - Pluggable Shuffle and Pluggable Sort
|
|
|
+
|
|
|
+ \[ {{{./index.html}Go Back}} \]
|
|
|
+
|
|
|
+* Introduction
|
|
|
+
|
|
|
+ The pluggable shuffle and pluggable sort capabilities allow replacing the
|
|
|
+ built in shuffle and sort logic with alternate implementations. Example use
|
|
|
+ cases for this are: using a different application protocol other than HTTP
|
|
|
+ such as RDMA for shuffling data from the Map nodes to the Reducer nodes; or
|
|
|
+ replacing the sort logic with custom algorithms that enable Hash aggregation
|
|
|
+ and Limit-N query.
|
|
|
+
|
|
|
+ <<IMPORTANT:>> The pluggable shuffle and pluggable sort capabilities are
|
|
|
+ experimental and unstable. This means the provided APIs may change and break
|
|
|
+ compatibility in future versions of Hadoop.
|
|
|
+
|
|
|
+* Implementing a Custom Shuffle and a Custom Sort
|
|
|
+
|
|
|
+ A custom shuffle implementation requires a
|
|
|
+ <<<org.apache.hadoop.yarn.server.nodemanager.containermanager.AuxServices.AuxiliaryService>>>
|
|
|
+ implementation class running in the NodeManagers and a
|
|
|
+ <<<org.apache.hadoop.mapred.ShuffleConsumerPlugin>>> implementation class
|
|
|
+ running in the Reducer tasks.
|
|
|
+
|
|
|
+ The default implementations provided by Hadoop can be used as references:
|
|
|
+
|
|
|
+ * <<<org.apache.hadoop.mapred.ShuffleHandler>>>
|
|
|
+
|
|
|
+ * <<<org.apache.hadoop.mapreduce.task.reduce.Shuffle>>>
|
|
|
+
|
|
|
+ A custom sort implementation requires a <<<org.apache.hadoop.mapred.MapOutputCollector>>>
|
|
|
+ implementation class running in the Mapper tasks and (optionally, depending
|
|
|
+ on the sort implementation) a <<<org.apache.hadoop.mapred.ShuffleConsumerPlugin>>>
|
|
|
+ implementation class running in the Reducer tasks.
|
|
|
+
|
|
|
+ The default implementations provided by Hadoop can be used as references:
|
|
|
+
|
|
|
+ * <<<org.apache.hadoop.mapred.MapTask$MapOutputBuffer>>>
|
|
|
+
|
|
|
+ * <<<org.apache.hadoop.mapreduce.task.reduce.Shuffle>>>
|
|
|
+
|
|
|
+* Configuration
|
|
|
+
|
|
|
+ Except for the auxiliary service running in the NodeManagers serving the
|
|
|
+ shuffle (by default the <<<ShuffleHandler>>>), all the pluggable components
|
|
|
+ run in the job tasks. This means, they can be configured on per job basis.
|
|
|
+ The auxiliary service servicing the Shuffle must be configured in the
|
|
|
+ NodeManagers configuration.
|
|
|
+
|
|
|
+** Job Configuration Properties (on per job basis):
|
|
|
+
|
|
|
+*--------------------------------------+---------------------+-----------------+
|
|
|
+| <<Property>> | <<Default Value>> | <<Explanation>> |
|
|
|
+*--------------------------------------+---------------------+-----------------+
|
|
|
+| <<<mapreduce.job.reduce.shuffle.consumer.plugin.class>>> | <<<org.apache.hadoop.mapreduce.task.reduce.Shuffle>>> | The <<<ShuffleConsumerPlugin>>> implementation to use |
|
|
|
+*--------------------------------------+---------------------+-----------------+
|
|
|
+| <<<mapreduce.job.map.output.collector.class>>> | <<<org.apache.hadoop.mapred.MapTask$MapOutputBuffer>>> | The <<<MapOutputCollector>>> implementation to use |
|
|
|
+*--------------------------------------+---------------------+-----------------+
|
|
|
+
|
|
|
+ These properties can also be set in the <<<mapred-site.xml>>> to change the default values for all jobs.
|
|
|
+
|
|
|
+** NodeManager Configuration properties, <<<yarn-site.xml>>> in all nodes:
|
|
|
+
|
|
|
+*--------------------------------------+---------------------+-----------------+
|
|
|
+| <<Property>> | <<Default Value>> | <<Explanation>> |
|
|
|
+*--------------------------------------+---------------------+-----------------+
|
|
|
+| <<<yarn.nodemanager.aux-services>>> | <<<...,mapreduce.shuffle>>> | The auxiliary service name |
|
|
|
+*--------------------------------------+---------------------+-----------------+
|
|
|
+| <<<yarn.nodemanager.aux-services.mapreduce.shuffle.class>>> | <<<org.apache.hadoop.mapred.ShuffleHandler>>> | The auxiliary service class to use |
|
|
|
+*--------------------------------------+---------------------+-----------------+
|
|
|
+
|
|
|
+ <<IMPORTANT:>> If setting an auxiliary service in addition the default
|
|
|
+ <<<mapreduce.shuffle>>> service, then a new service key should be added to the
|
|
|
+ <<<yarn.nodemanager.aux-services>>> property, for example <<<mapred.shufflex>>>.
|
|
|
+ Then the property defining the corresponding class must be
|
|
|
+ <<<yarn.nodemanager.aux-services.mapreduce.shufflex.class>>>.
|
|
|
+
|