14 years ago · 7cd46a11fe
--- a/src/docs/src/documentation/content/xdocs/mapred_tutorial.xml
+++ b/src/docs/src/documentation/content/xdocs/mapred_tutorial.xml
@@ -1592,6 +1592,83 @@
 
				             </li>
			
 
				           </ul>
			
 
				         </section>
			
 
				+        <section>
			
 
				+          <title>Job Credentials</title>
			
 
				+          <p>In a secure cluster, the user is authenticated via Kerberos'
			
 
				+             kinit command. Because of scalability concerns, we don't push
			
 
				+             the client's Kerberos' tickets in MapReduce jobs. Instead, we
			
 
				+             acquire delegation tokens from each HDFS NameNode that the job
			
 
				+             will use and store them in the job as part of job submission.
			
 
				+             The delegation tokens are automatically obtained
			
 
				+             for the HDFS that holds the staging directories, where the job
			
 
				+             job files are written, and any HDFS systems referenced by
			
 
				+             FileInputFormats, FileOutputFormats, DistCp, and the
			
 
				+             distributed cache.
			
 
				+             Other applications require to set the configuration
			
 
				+             "mapreduce.job.hdfs-servers" for all NameNodes that tasks might 
			
 
				+             need to talk during the job execution. This is a comma separated
			
 
				+             list of file system names, such as "hdfs://nn1/,hdfs://nn2/".
			
 
				+             These tokens are passed to the JobTracker
			
 
				+             as part of the job submission as <a href="ext:api/org/apache/hadoop/
			
 
				+             security/credentials">Credentials</a>. </p> 
			
 
				+
			
 
				+          <p>Similar to HDFS delegation tokens, we also have MapReduce delegation tokens. The
			
 
				+             MapReduce tokens are provided so that tasks can spawn jobs if they wish to. The tasks authenticate
			
 
				+             to the JobTracker via the MapReduce delegation tokens. The delegation token can
			
 
				+             be obtained via the API in <a href="api/org/apache/hadoop/mapred/jobclient/getdelegationtoken">
			
 
				+             JobClient.getDelegationToken</a>. The obtained token must then be pushed onto the
			
 
				+             credentials that is there in the JobConf used for job submission. The API  
			
 
				+             <a href="ext:api/org/apache/hadoop/security/credentials/addtoken">Credentials.addToken</a>
			
 
				+             can be used for this. </p>
			
 
				+
			
 
				+          <p>The credentials are sent to the JobTracker as part of the job submission process.
			
 
				+             The JobTracker persists the tokens and secrets in its filesystem (typically HDFS) 
			
 
				+             in a file within mapred.system.dir/JOBID. The TaskTracker localizes the file as part
			
 
				+             job localization. Tasks see an environment variable called
			
 
				+             HADOOP_TOKEN_FILE_LOCATION and the framework sets this to point to the
			
 
				+             localized file. In order to launch jobs from tasks or for doing any HDFS operation,
			
 
				+             tasks must set the configuration "mapreduce.job.credentials.binary" to point to
			
 
				+             this token file.</p> 
			
 
				+
			
 
				+          <p>The HDFS delegation tokens passed to the JobTracker during job submission are
			
 
				+             are cancelled by the JobTracker when the job completes. This is the default behavior
			
 
				+             unless mapreduce.job.complete.cancel.delegation.tokens is set to false in the 
			
 
				+             JobConf. For jobs whose tasks in turn spawns jobs, this should be set to false.
			
 
				+             Applications sharing JobConf objects between multiple jobs on the JobClient side 
			
 
				+             should look at setting mapreduce.job.complete.cancel.delegation.tokens to false. 
			
 
				+             This is because the Credentials object within the JobConf will then be shared. 
			
 
				+             All jobs will end up sharing the same tokens, and hence the tokens should not be 
			
 
				+             canceled when the jobs in the sequence finish.</p>
			
 
				+
			
 
				+          <p>Apart from the HDFS delegation tokens, arbitrary secrets can also be 
			
 
				+             passed during the job submission for tasks to access other third party services.
			
 
				+             The APIs 
			
 
				+             <a href="ext:api/org/apache/hadoop/mapred/jobconf/getcredentials">
			
 
				+             JobConf.getCredentials</a> or <a href="ext:api/org/apache/
			
 
				+              hadoop/mapreduce/jobcontext/getcredentials">JobContext.getCredentials()</a>
			
 
				+             should be used to get the credentials object and then
			
 
				+             <a href="ext:api/org/apache/hadoop/security/credentials/addsecretkey">
			
 
				+             Credentials.addSecretKey</a> should be used to add secrets.</p>
			
 
				+
			
 
				+          <p>For applications written using the old MapReduce API, the Mapper/Reducer classes 
			
 
				+             need to implement <a href="api/org/apache/hadoop/mapred/jobconfigurable">
			
 
				+             JobConfigurable</a> in order to get access to the credentials in the tasks.
			
 
				+             A reference to the JobConf passed in the 
			
 
				+             <a href="api/org/apache/hadoop/mapred/jobconfigurable/configure">
			
 
				+             JobConfigurable.configure</a> should be stored. In the new MapReduce API, 
			
 
				+             a similar thing can be done in the 
			
 
				+             <a href="api/org/apache/hadoop/mapreduce/mapper/setup">Mapper.setup</a>
			
 
				+             method.
			
 
				+             The api <a href="ext:api/org/apache/hadoop/mapred/jobconf/getcredentials">
			
 
				+              JobConf.getCredentials()</a> or the api <a href="ext:api/org/apache/
			
 
				+              hadoop/mapreduce/jobcontext/getcredentials">JobContext.getCredentials()</a>
			
 
				+              should be used to get the credentials reference (depending
			
 
				+              on whether the new MapReduce API or the old MapReduce API is used). 
			
 
				+              Tasks can access the secrets using the APIs in <a href="ext:api/
			
 
				+              org/apache/hadoop/security/credentials">Credentials</a> </p>
			
 
				+
			
 
				+             
			
 
				+        </section>
			
 
				       </section>
			
 
				 
			
 
				       <section>
			
--- a/src/docs/src/documentation/content/xdocs/site.xml
+++ b/src/docs/src/documentation/content/xdocs/site.xml
@@ -155,6 +155,20 @@ See http://forrest.apache.org/docs/linking.html for more info.
 
				                 <compressioncodec href="CompressionCodec.html" />
			
 
				               </compress>
			
 
				             </io>
			
 
				+            <security href="security/">
			
 
				+              <credentials href="Credentials.html">
			
 
				+                <addtoken href="#addToken(org.apache.hadoop.io.Text,org.apache.hadoop.security.token.Token)" />
			
 
				+                <addsecretkey href="#addSecretKey(org.apache.hadoop.io.Text,byte[])" />
			
 
				+              </credentials> 
			
 
				+            </security>
			
 
				+            <mapreduce href="mapreduce/">
			
 
				+              <mapper href="Mapper.html">
			
 
				+                <setup href="#setup(org.apache.hadoop.mapreduce.Mapper.Context)" />
			
 
				+              </mapper>
			
 
				+              <jobcontext href="JobContext.html">
			
 
				+                <getcredentials href="#getcredentials" />
			
 
				+              </jobcontext>
			
 
				+            </mapreduce>
			
 
				             <mapred href="mapred/">
			
 
				               <clusterstatus href="ClusterStatus.html" />
			
 
				               <counters href="Counters.html" />
			
@@ -178,6 +192,7 @@ See http://forrest.apache.org/docs/linking.html for more info.
 
				               <jobclient href="JobClient.html">
			
 
				                 <runjob href="#runJob(org.apache.hadoop.mapred.JobConf)" />
			
 
				                 <submitjob href="#submitJob(org.apache.hadoop.mapred.JobConf)" />
			
 
				+                <getdelegationtoken href="#getDelegationToken(org.apache.hadoop.io.Text)" />
			
 
				               </jobclient>
			
 
				               <jobconf href="JobConf.html">
			
 
				                 <setnummaptasks href="#setNumMapTasks(int)" />
			
@@ -203,6 +218,7 @@ See http://forrest.apache.org/docs/linking.html for more info.
 
				                 <setqueuename href="#setQueueName(java.lang.String)" />
			
 
				                 <getjoblocaldir href="#getJobLocalDir()" />
			
 
				                 <getjar href="#getJar()" />
			
 
				+                <getcredentials href="#getCredentials()" />
			
 
				               </jobconf>
			
 
				               <jobconfigurable href="JobConfigurable.html">
			
 
				                 <configure href="#configure(org.apache.hadoop.mapred.JobConf)" />