|
@@ -1592,6 +1592,83 @@
|
|
|
</li>
|
|
|
</ul>
|
|
|
</section>
|
|
|
+ <section>
|
|
|
+ <title>Job Credentials</title>
|
|
|
+ <p>In a secure cluster, the user is authenticated via the Kerberos
|
|
|
+ kinit command. Because of scalability concerns, we don't push
|
|
|
+ the client's Kerberos tickets in MapReduce jobs. Instead, we
|
|
|
+ acquire delegation tokens from each HDFS NameNode that the job
|
|
|
+ will use and store them in the job as part of job submission.
|
|
|
+ The delegation tokens are automatically obtained
|
|
|
+ for the HDFS that holds the staging directories, where the job
|
|
|
+ files are written, and any HDFS file systems referenced by
|
|
|
+ FileInputFormats, FileOutputFormats, DistCp, and the
|
|
|
+ distributed cache.
|
|
|
+ Other applications must set the configuration property
|
|
|
+ "mapreduce.job.hdfs-servers" for all NameNodes that tasks might
|
|
|
+ need to talk to during job execution. This is a comma-separated
|
|
|
+ list of file system names, such as "hdfs://nn1/,hdfs://nn2/".
|
|
|
+ These tokens are passed to the JobTracker
|
|
|
+ as part of the job submission as <a href="ext:api/org/apache/hadoop/
|
|
|
+ security/credentials">Credentials</a>. </p>
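As a rough illustration of the "mapreduce.job.hdfs-servers" value described above, the following minimal sketch builds the comma-separated list from individual NameNode URIs. It assumes java.util.Properties as a stand-in for the job configuration; the class HdfsServersExample and the helper joinNameNodes are hypothetical names introduced only for this example and are not part of Hadoop.

```java
import java.util.Properties;

// Hypothetical example class; not part of Hadoop.
class HdfsServersExample {
    // Property name as given in the text above.
    static final String HDFS_SERVERS_KEY = "mapreduce.job.hdfs-servers";

    // Joins NameNode URIs into the comma-separated form the property expects.
    static String joinNameNodes(String... nameNodeUris) {
        return String.join(",", nameNodeUris);
    }

    public static void main(String[] args) {
        // Assumption: Properties stands in for the real job configuration object.
        Properties conf = new Properties();
        conf.setProperty(HDFS_SERVERS_KEY,
            joinNameNodes("hdfs://nn1/", "hdfs://nn2/"));
        // Prints: hdfs://nn1/,hdfs://nn2/
        System.out.println(conf.getProperty(HDFS_SERVERS_KEY));
    }
}
```

In a real job the same string would presumably be set on the job's configuration before submission.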
|
|
|
+
|
|
|
+ <p>Similar to HDFS delegation tokens, we also have MapReduce delegation tokens. The
|
|
|
+ MapReduce tokens are provided so that tasks can spawn jobs if they wish to. The tasks authenticate
|
|
|
+ to the JobTracker via the MapReduce delegation tokens. The delegation token can
|
|
|
+ be obtained via the API in <a href="api/org/apache/hadoop/mapred/jobclient/getdelegationtoken">
|
|
|
+ JobClient.getDelegationToken</a>. The obtained token must then be pushed onto the
|
|
|
+ credentials held in the JobConf used for job submission. The API
|
|
|
+ <a href="ext:api/org/apache/hadoop/security/credentials/addtoken">Credentials.addToken</a>
|
|
|
+ can be used for this. </p>
|
|
|
+
|
|
|
+ <p>The credentials are sent to the JobTracker as part of the job submission process.
|
|
|
+ The JobTracker persists the tokens and secrets in its filesystem (typically HDFS)
|
|
|
+ in a file within mapred.system.dir/JOBID. The TaskTracker localizes the file as part
|
|
|
+ of job localization. Tasks see an environment variable called
|
|
|
+ HADOOP_TOKEN_FILE_LOCATION and the framework sets this to point to the
|
|
|
+ localized file. In order to launch jobs from tasks or for doing any HDFS operation,
|
|
|
+ tasks must set the configuration "mapreduce.job.credentials.binary" to point to
|
|
|
+ this token file.</p>
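The step described above, where a task points "mapreduce.job.credentials.binary" at the localized token file, might be sketched as follows. This is only an illustration under stated assumptions: java.util.Properties stands in for the task's job configuration, and the class TokenFileExample and the helper propagateTokenFile are hypothetical names, not Hadoop APIs.

```java
import java.util.Properties;

// Hypothetical example class; not part of Hadoop.
class TokenFileExample {
    // Environment variable and property names as given in the text above.
    static final String TOKEN_FILE_ENV = "HADOOP_TOKEN_FILE_LOCATION";
    static final String CREDENTIALS_KEY = "mapreduce.job.credentials.binary";

    // Copies the token file path into the configuration.
    // Returns false and leaves conf untouched when no path is available.
    static boolean propagateTokenFile(String tokenFile, Properties conf) {
        if (tokenFile == null || tokenFile.isEmpty()) {
            return false;
        }
        conf.setProperty(CREDENTIALS_KEY, tokenFile);
        return true;
    }

    public static void main(String[] args) {
        // Assumption: Properties stands in for the real job configuration object.
        Properties conf = new Properties();
        boolean set = propagateTokenFile(System.getenv(TOKEN_FILE_ENV), conf);
        System.out.println(set
            ? "credentials file: " + conf.getProperty(CREDENTIALS_KEY)
            : "no token file in environment");
    }
}
```

Passing the path in as an argument, rather than reading the environment inside the helper, keeps the logic easy to exercise outside a running task.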
|
|
|
+
|
|
|
+ <p>The HDFS delegation tokens passed to the JobTracker during job submission are
|
|
|
+ cancelled by the JobTracker when the job completes. This is the default behavior
|
|
|
+ unless mapreduce.job.complete.cancel.delegation.tokens is set to false in the
|
|
|
+ JobConf. For jobs whose tasks in turn spawn jobs, this should be set to false.
|
|
|
+ Applications sharing JobConf objects between multiple jobs on the JobClient side
|
|
|
+ should consider setting mapreduce.job.complete.cancel.delegation.tokens to false.
|
|
|
+ This is because the Credentials object within the JobConf will then be shared.
|
|
|
+ All jobs will end up sharing the same tokens, and hence the tokens should not be
|
|
|
+ cancelled when the jobs in the sequence finish.</p>
|
|
|
+
|
|
|
+ <p>Apart from the HDFS delegation tokens, arbitrary secrets can also be
|
|
|
+ passed during the job submission for tasks to access other third-party services.
|
|
|
+ The APIs
|
|
|
+ <a href="ext:api/org/apache/hadoop/mapred/jobconf/getcredentials">
|
|
|
+ JobConf.getCredentials</a> or <a href="ext:api/org/apache/
|
|
|
+ hadoop/mapreduce/jobcontext/getcredentials">JobContext.getCredentials()</a>
|
|
|
+ should be used to get the credentials object and then
|
|
|
+ <a href="ext:api/org/apache/hadoop/security/credentials/addsecretkey">
|
|
|
+ Credentials.addSecretKey</a> should be used to add secrets.</p>
|
|
|
+
|
|
|
+ <p>For applications written using the old MapReduce API, the Mapper/Reducer classes
|
|
|
+ need to implement <a href="api/org/apache/hadoop/mapred/jobconfigurable">
|
|
|
+ JobConfigurable</a> in order to get access to the credentials in the tasks.
|
|
|
+ A reference to the JobConf passed in the
|
|
|
+ <a href="api/org/apache/hadoop/mapred/jobconfigurable/configure">
|
|
|
+ JobConfigurable.configure</a> should be stored. In the new MapReduce API,
|
|
|
+ a similar thing can be done in the
|
|
|
+ <a href="api/org/apache/hadoop/mapreduce/mapper/setup">Mapper.setup</a>
|
|
|
+ method.
|
|
|
+ The API <a href="ext:api/org/apache/hadoop/mapred/jobconf/getcredentials">
|
|
|
+ JobConf.getCredentials()</a> or the API <a href="ext:api/org/apache/
|
|
|
+ hadoop/mapreduce/jobcontext/getcredentials">JobContext.getCredentials()</a>
|
|
|
+ should be used to get the credentials reference (depending
|
|
|
+ on whether the old or the new MapReduce API is used, respectively).
|
|
|
+ Tasks can access the secrets using the APIs in <a href="ext:api/
|
|
|
+ org/apache/hadoop/security/credentials">Credentials</a>.</p>
|
|
|
+
|
|
|
+
|
|
|
+ </section>
|
|
|
</section>
|
|
|
|
|
|
<section>
|