
HADOOP-3606. Updates the Streaming doc. Contributed by Amareshwari Sriramadasu.

git-svn-id: https://svn.apache.org/repos/asf/hadoop/core/trunk@669897 13f79535-47bb-0310-9956-ffa450edef68
Devaraj Das 17 years ago
parent
commit
e304f72fc0
5 changed files with 100 additions and 17 deletions
  1. CHANGES.txt (+2 -0)
  2. docs/changes.html (+2 -1)
  3. docs/streaming.html (+36 -7)
  4. docs/streaming.pdf (+24 -2)
  5. src/docs/src/documentation/content/xdocs/streaming.xml (+36 -7)

+ 2 - 0
CHANGES.txt

@@ -315,6 +315,8 @@ Release 0.18.0 - Unreleased
     HADOOP-3547. Documents the way in which native libraries can be distributed
     via the DistributedCache. (Amareshwari Sriramadasu via ddas)
 
+    HADOOP-3606. Updates the Streaming doc. (Amareshwari Sriramadasu via ddas) 
+
   OPTIMIZATIONS
 
     HADOOP-3274. The default constructor of BytesWritable creates empty 

+ 2 - 1
docs/changes.html

@@ -209,7 +209,7 @@ framework.<br />(tomwhite via omalley)</li>
     </ol>
   </li>
   <li><a href="javascript:toggleList('release_0.18.0_-_unreleased_._improvements_')">  IMPROVEMENTS
-</a>&nbsp;&nbsp;&nbsp;(42)
+</a>&nbsp;&nbsp;&nbsp;(43)
     <ol id="release_0.18.0_-_unreleased_._improvements_">
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-2928">HADOOP-2928</a>. Remove deprecated FileSystem.getContentLength().<br />(Lohit Vjayarenu via rangadi)</li>
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-3130">HADOOP-3130</a>. Make the connect timeout smaller for getFile.<br />(Amar Ramesh Kamat via ddas)</li>
@@ -297,6 +297,7 @@ reflect that it should only be used in cleanup contexts.<br />(omalley)</li>
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-3593">HADOOP-3593</a>. Updates the mapred tutorial.<br />(ddas)</li>
       <li><a href="http://issues.apache.org/jira/browse/HADOOP-3547">HADOOP-3547</a>. Documents the way in which native libraries can be distributed
 via the DistributedCache.<br />(Amareshwari Sriramadasu via ddas)</li>
+      <li><a href="http://issues.apache.org/jira/browse/HADOOP-3606">HADOOP-3606</a>. Updates the Streaming doc.<br />(Amareshwari Sriramadasu via ddas)</li>
     </ol>
   </li>
   <li><a href="javascript:toggleList('release_0.18.0_-_unreleased_._optimizations_')">  OPTIMIZATIONS

+ 36 - 7
docs/streaming.html

@@ -269,6 +269,12 @@ document.write("Last Published: " + document.lastModified);
 <li>
 <a href="#How+do+I+parse+XML+documents+using+streaming%3F">How do I parse XML documents using streaming? </a>
 </li>
+<li>
+<a href="#How+do+I+update+counters+in+streaming+applications%3F">How do I update counters in streaming applications? </a>
+</li>
+<li>
+<a href="#How+do+I+update+status+in+streaming+applications%3F">How do I update status in streaming applications? </a>
+</li>
 </ul>
 </li>
 </ul>
@@ -471,7 +477,8 @@ $HADOOP_HOME/bin/hadoop  jar $HADOOP_HOME/hadoop-streaming.jar \
 The -jobconf mapred.reduce.tasks=2 in the above example specifies to use two reducers for the job.
 </p>
 <p>
-For more details on the jobconf parameters see: <a href="http://wiki.apache.org/hadoop/JobConfFile">http://wiki.apache.org/hadoop/JobConfFile</a>
+For more details on the jobconf parameters see:
+<a href="http://hadoop.apache.org/core/docs/current/hadoop-default.html">hadoop-default.html</a>
 </p>
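For reference, a complete command of this shape (a minimal sketch; the input/output paths and the /bin/wc reducer are illustrative placeholders):

    $HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming.jar \
        -input myInputDirs \
        -output myOutputDir \
        -mapper org.apache.hadoop.mapred.lib.IdentityMapper \
        -reducer /bin/wc \
        -jobconf mapred.reduce.tasks=2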
 <a name="N100D6"></a><a name="Other+Supported+Options"></a>
 <h3 class="h4">Other Supported Options </h3>
@@ -543,8 +550,8 @@ To specify additional local temp directories use:
    -jobconf mapred.temp.dir=/tmp/temp
 </pre>
 <p>
-For more details on jobconf parameters see: <a href="http://wiki.apache.org/hadoop/JobConfFile">http://wiki.apache.org/hadoop/JobConfFile</a>
-
+For more details on jobconf parameters see:
+<a href="http://hadoop.apache.org/core/docs/current/hadoop-default.html">hadoop-default.html</a>
 </p>
 <p>
 To set an environment variable in a streaming command use:
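The hunk ends before the example itself; in Hadoop streaming this is done with the -cmdenv option (a sketch, with an illustrative variable name and path):

    -cmdenv EXAMPLE_DIR=/home/example/dictionaries/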
@@ -644,7 +651,15 @@ Sorting within each partition for the reducer(all 4 fields used for sorting)</p>
 <a name="N101E0"></a><a name="Working+with+the+Hadoop+Aggregate+Package+%28the+-reduce+aggregate+option%29"></a>
 <h3 class="h4">Working with the Hadoop Aggregate Package (the -reduce aggregate option) </h3>
 <p>
-Hadoop has a library package called "Aggregate" (<a href="https://svn.apache.org/repos/asf/hadoop/core/trunk/src/java/org/apache/hadoop/mapred/lib/aggregate">https://svn.apache.org/repos/asf/hadoop/core/trunk/src/java/org/apache/hadoop/mapred/lib/aggregate</a>).  Aggregate provides a special reducer class and a special combiner class, and a list of simple aggregators that perform aggregations such as "sum", "max", "min" and so on  over a sequence of values. Aggregate allows you to define a mapper plugin class that is expected to generate "aggregatable items" for each input key/value pair of the mappers. The combiner/reducer will aggregate those aggregatable items by invoking the appropriate aggregators.
+Hadoop has a library package called "Aggregate" (
+<a href="https://svn.apache.org/repos/asf/hadoop/core/trunk/src/mapred/org/apache/hadoop/mapred/lib/aggregate">
+https://svn.apache.org/repos/asf/hadoop/core/trunk/src/mapred/org/apache/hadoop/mapred/lib/aggregate</a>).
+Aggregate provides a special reducer class and a special combiner class, and
+a list of simple aggregators that perform aggregations such as "sum", "max",
+"min" and so on  over a sequence of values. Aggregate allows you to define a
+mapper plugin class that is expected to generate "aggregatable items" for each
+input key/value pair of the mappers. The combiner/reducer will aggregate those
+aggregatable items by invoking the appropriate aggregators.
 </p>
 <p>
 To use Aggregate, simply specify "-reducer aggregate":
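A minimal sketch of such a job, assuming a hypothetical mapper script myAggregatorMapper.sh whose output lines name an aggregator, a key, and a value (the LongValueSum prefix asks the aggregate reducer to sum the values seen for each key):

    $HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming.jar \
        -input myInputDirs \
        -output myOutputDir \
        -mapper myAggregatorMapper.sh \
        -file myAggregatorMapper.sh \
        -reducer aggregate

    #!/bin/bash
    # myAggregatorMapper.sh (hypothetical): emit one aggregatable item per word.
    while read line; do
      for word in $line; do
        printf 'LongValueSum:%s\t1\n' "$word"   # sum a count of 1 for each word
      done
    done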
@@ -739,8 +754,8 @@ As an example, consider the problem of zipping (compressing) a set of files acro
    
 <li>Add these commands to your main function:
 <pre class="code">
-       OutputFormatBase.setCompressOutput(conf, true);
-       OutputFormatBase.setOutputCompressorClass(conf, org.apache.hadoop.io.compress.GzipCodec.class);
+       FileOutputFormat.setCompressOutput(conf, true);
+       FileOutputFormat.setOutputCompressorClass(conf, org.apache.hadoop.io.compress.GzipCodec.class);
        conf.setOutputFormat(NonSplitableTextInputFormat.class);
        conf.setNumReduceTasks(0);
 </pre>
@@ -766,7 +781,7 @@ As an example, consider the problem of zipping (compressing) a set of files acro
 <a name="N1024A"></a><a name="How+many+reducers+should+I+use%3F"></a>
 <h3 class="h4">How many reducers should I use? </h3>
 <p>
-See the Hadoop Wiki for details: <a href="http://wiki.apache.org/hadoop/HowManyMapsAndReduces">http://wiki.apache.org/hadoop/HowManyMapsAndReduces</a>
+See the Hadoop Wiki for details: <a href="mapred_tutorial.html#Reducer">Reducer</a>
 
 </p>
 <a name="N10258"></a><a name="If+I+set+up+an+alias+in+my+shell+script%2C+will+that+work+after+-mapper%2C+i.e.+say+I+do%3A+alias+c1%3D%27cut+-f1%27.+Will+-mapper+%22c1%22+work%3F"></a>
@@ -838,6 +853,20 @@ hadoop jar hadoop-streaming.jar -inputreader "StreamXmlRecord,begin=BEGIN_STRING
 <p>
 Anything found between BEGIN_STRING and END_STRING would be treated as one record for map tasks.
 </p>
+<a name="N102B3"></a><a name="How+do+I+update+counters+in+streaming+applications%3F"></a>
+<h3 class="h4">How do I update counters in streaming applications? </h3>
+<p>
+A streaming process can use the stderr to emit counter information.
+<span class="codefrag">reporter:counter:&lt;group&gt;,&lt;counter&gt;,&lt;amount&gt;</span> 
+should be sent to stderr to update the counter.
+</p>
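For example, a streaming mapper written as a shell script might bump a counter once per input record (a sketch; the group and counter names are illustrative):

    #!/bin/bash
    # Hypothetical mapper: map output goes to stdout, counter updates to stderr.
    while read line; do
      echo "reporter:counter:MyGroup,RecordsSeen,1" >&2
      echo "$line"
    done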
+<a name="N102C0"></a><a name="How+do+I+update+status+in+streaming+applications%3F"></a>
+<h3 class="h4">How do I update status in streaming applications? </h3>
+<p>
+A streaming process can use the stderr to emit status information.
+To set a status, <span class="codefrag">reporter:status:&lt;message&gt;</span> should be sent 
+to stderr.
+</p>
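The same mechanism, sketched for status (the message text is arbitrary):

    # Inside a streaming mapper or reducer, report progress via stderr:
    echo "reporter:status:still processing, 10000 records seen" >&2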
 </div>
 
 </div>

Diff content is too large to display
+ 24 - 2
docs/streaming.pdf


+ 36 - 7
src/docs/src/documentation/content/xdocs/streaming.xml

@@ -222,7 +222,8 @@ $HADOOP_HOME/bin/hadoop  jar $HADOOP_HOME/hadoop-streaming.jar \
 The -jobconf mapred.reduce.tasks=2 in the above example specifies to use two reducers for the job.
 </p>
 <p>
-For more details on the jobconf parameters see: <a href="http://wiki.apache.org/hadoop/JobConfFile">http://wiki.apache.org/hadoop/JobConfFile</a></p>
+For more details on the jobconf parameters see:
+<a href="ext:hadoop-default">hadoop-default.html</a></p>
 </section>
 
 <section>
@@ -264,8 +265,9 @@ To specify additional local temp directories use:
    -jobconf mapred.temp.dir=/tmp/temp
 </source>
 <p>
-For more details on jobconf parameters see: <a href="http://wiki.apache.org/hadoop/JobConfFile">http://wiki.apache.org/hadoop/JobConfFile</a>
-</p><p>
+For more details on jobconf parameters see:
+<a href="ext:hadoop-default">hadoop-default.html</a></p>
+<p>
 To set an environment variable in a streaming command use:
 </p>
 <source>
@@ -362,7 +364,15 @@ Sorting within each partition for the reducer(all 4 fields used for sorting)</p>
 <section>
 <title>Working with the Hadoop Aggregate Package (the -reduce aggregate option) </title>
 <p>
-Hadoop has a library package called "Aggregate" (<a href="https://svn.apache.org/repos/asf/hadoop/core/trunk/src/java/org/apache/hadoop/mapred/lib/aggregate">https://svn.apache.org/repos/asf/hadoop/core/trunk/src/java/org/apache/hadoop/mapred/lib/aggregate</a>).  Aggregate provides a special reducer class and a special combiner class, and a list of simple aggregators that perform aggregations such as "sum", "max", "min" and so on  over a sequence of values. Aggregate allows you to define a mapper plugin class that is expected to generate "aggregatable items" for each input key/value pair of the mappers. The combiner/reducer will aggregate those aggregatable items by invoking the appropriate aggregators.
+Hadoop has a library package called "Aggregate" (
+<a href="https://svn.apache.org/repos/asf/hadoop/core/trunk/src/mapred/org/apache/hadoop/mapred/lib/aggregate">
+https://svn.apache.org/repos/asf/hadoop/core/trunk/src/mapred/org/apache/hadoop/mapred/lib/aggregate</a>).
+Aggregate provides a special reducer class and a special combiner class, and
+a list of simple aggregators that perform aggregations such as "sum", "max",
+"min" and so on  over a sequence of values. Aggregate allows you to define a
+mapper plugin class that is expected to generate "aggregatable items" for each
+input key/value pair of the mappers. The combiner/reducer will aggregate those
+aggregatable items by invoking the appropriate aggregators.
 </p><p>
 To use Aggregate, simply specify "-reducer aggregate":
 </p>
@@ -452,8 +462,8 @@ As an example, consider the problem of zipping (compressing) a set of files acro
 <li>The existing Hadoop Framework:<ul>
    <li>Add these commands to your main function:
 <source>
-       OutputFormatBase.setCompressOutput(conf, true);
-       OutputFormatBase.setOutputCompressorClass(conf, org.apache.hadoop.io.compress.GzipCodec.class);
+       FileOutputFormat.setCompressOutput(conf, true);
+       FileOutputFormat.setOutputCompressorClass(conf, org.apache.hadoop.io.compress.GzipCodec.class);
        conf.setOutputFormat(NonSplitableTextInputFormat.class);
        conf.setNumReduceTasks(0);
 </source></li>
@@ -474,7 +484,7 @@ As an example, consider the problem of zipping (compressing) a set of files acro
 <section>
 <title>How many reducers should I use? </title>
 <p>
-See the Hadoop Wiki for details: <a href="http://wiki.apache.org/hadoop/HowManyMapsAndReduces">http://wiki.apache.org/hadoop/HowManyMapsAndReduces</a>
+See the Hadoop Wiki for details: <a href="mapred_tutorial.html#Reducer">Reducer</a>
 </p>
 </section>
 
@@ -559,6 +569,25 @@ hadoop jar hadoop-streaming.jar -inputreader "StreamXmlRecord,begin=BEGIN_STRING
 Anything found between BEGIN_STRING and END_STRING would be treated as one record for map tasks.
 </p>
 </section>
+
+<section>
+<title>How do I update counters in streaming applications? </title>
+<p>
+A streaming process can use the stderr to emit counter information.
+<code>reporter:counter:&lt;group&gt;,&lt;counter&gt;,&lt;amount&gt;</code> 
+should be sent to stderr to update the counter.
+</p>
+</section>
+
+<section>
+<title>How do I update status in streaming applications? </title>
+<p>
+A streaming process can use the stderr to emit status information.
+To set a status, <code>reporter:status:&lt;message&gt;</code> should be sent 
+to stderr.
+</p>
+</section>
+
 </section>
 </body>
 </document>

Some files were not shown because too many files changed in this diff