- ~~ Licensed to the Apache Software Foundation (ASF) under one or more
- ~~ contributor license agreements. See the NOTICE file distributed with
- ~~ this work for additional information regarding copyright ownership.
- ~~ The ASF licenses this file to You under the Apache License, Version 2.0
- ~~ (the "License"); you may not use this file except in compliance with
- ~~ the License. You may obtain a copy of the License at
- ~~
- ~~ http://www.apache.org/licenses/LICENSE-2.0
- ~~
- ~~ Unless required by applicable law or agreed to in writing, software
- ~~ distributed under the License is distributed on an "AS IS" BASIS,
- ~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- ~~ See the License for the specific language governing permissions and
- ~~ limitations under the License.
- ---
- Hadoop Commands Guide
- ---
- ---
- ${maven.build.timestamp}
- %{toc}
- Overview
- All hadoop commands are invoked by the <<<bin/hadoop>>> script. Running the
- hadoop script without any arguments prints the description for all
- commands.
- Usage: <<<hadoop [--config confdir] [COMMAND] [GENERIC_OPTIONS] [COMMAND_OPTIONS]>>>
- Hadoop has an option parsing framework that handles generic options as
- well as running application classes.
- *-----------------------+---------------+
- || COMMAND_OPTION || Description
- *-----------------------+---------------+
- | <<<--config confdir>>>| Overrides the default configuration directory. Default is <<<${HADOOP_HOME}/conf>>>.
- *-----------------------+---------------+
- | GENERIC_OPTIONS | The common set of options supported by multiple commands.
- | COMMAND_OPTIONS | Various commands with their options are described in the following sections. The commands have been grouped into User Commands and Administration Commands.
- *-----------------------+---------------+
- Generic Options
- The following options are supported by {{dfsadmin}}, {{fs}}, {{fsck}},
- {{job}} and {{fetchdt}}. Applications should implement
- {{{../../api/org/apache/hadoop/util/Tool.html}Tool}} to support
- GenericOptions.
- *------------------------------------------------+-----------------------------+
- || GENERIC_OPTION || Description
- *------------------------------------------------+-----------------------------+
- |<<<-conf \<configuration file\> >>> | Specify an application
- | configuration file.
- *------------------------------------------------+-----------------------------+
- |<<<-D \<property\>=\<value\> >>> | Use value for given property.
- *------------------------------------------------+-----------------------------+
- |<<<-jt \<local\> or \<jobtracker:port\> >>> | Specify a job tracker.
- | Applies only to job.
- *------------------------------------------------+-----------------------------+
- |<<<-files \<comma separated list of files\> >>> | Specify comma separated files
- | to be copied to the map
- | reduce cluster. Applies only
- | to job.
- *------------------------------------------------+-----------------------------+
- |<<<-libjars \<comma separated list of jars\> >>>| Specify comma separated jar
- | files to include in the
- | classpath. Applies only to
- | job.
- *------------------------------------------------+-----------------------------+
- |<<<-archives \<comma separated list of archives\> >>> | Specify comma separated
- | archives to be unarchived on
- | the compute machines. Applies
- | only to job.
- *------------------------------------------------+-----------------------------+
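- For example, assuming a hypothetical NameNode at
- <<<nn.example.com:8020>>>, the generic <<<-D>>> option can override a
- configuration property for a single invocation:
- <<<hadoop fs -D fs.default.name=hdfs://nn.example.com:8020 -ls /user>>>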
- User Commands
- Commands useful for users of a hadoop cluster.
- * <<<archive>>>
- Creates a hadoop archive. More information can be found at Hadoop
- Archives.
- Usage: <<<hadoop archive -archiveName NAME <src>* <dest> >>>
- *-------------------+-------------------------------------------------------+
- ||COMMAND_OPTION || Description
- *-------------------+-------------------------------------------------------+
- | -archiveName NAME | Name of the archive to be created.
- *-------------------+-------------------------------------------------------+
- | src | Filesystem pathnames which work as usual with regular
- | expressions.
- *-------------------+-------------------------------------------------------+
- | dest | Destination directory which would contain the archive.
- *-------------------+-------------------------------------------------------+
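- For example, the following (with hypothetical paths) bundles two
- directories into a single archive under <<</user/zoo>>>:
- <<<hadoop archive -archiveName foo.har /user/hadoop/dir1 /user/hadoop/dir2 /user/zoo>>>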
- * <<<distcp>>>
- Copies files or directories recursively. More information can be found at
- Hadoop DistCp Guide.
- Usage: <<<hadoop distcp <srcurl> <desturl> >>>
- *-------------------+--------------------------------------------+
- ||COMMAND_OPTION || Description
- *-------------------+--------------------------------------------+
- | srcurl | Source URL
- *-------------------+--------------------------------------------+
- | desturl | Destination URL
- *-------------------+--------------------------------------------+
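- For example, assuming two hypothetical clusters whose NameNodes are
- <<<nn1.example.com>>> and <<<nn2.example.com>>>, the following copies a
- directory between them:
- <<<hadoop distcp hdfs://nn1.example.com:8020/foo/bar hdfs://nn2.example.com:8020/bar/foo>>>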
- * <<<fs>>>
- Usage: <<<hadoop fs [GENERIC_OPTIONS] [COMMAND_OPTIONS]>>>
- Deprecated, use <<<hdfs dfs>>> instead.
- Runs a generic filesystem user client.
- The various COMMAND_OPTIONS can be found at File System Shell Guide.
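- For example, a simple listing of a hypothetical home directory:
- <<<hadoop fs -ls /user/hadoop>>>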
- * <<<fsck>>>
- Runs an HDFS filesystem checking utility.
- See {{{../hadoop-hdfs/HdfsUserGuide.html#fsck}fsck}} for more info.
- Usage: <<<hadoop fsck [GENERIC_OPTIONS] <path> [-move | -delete | -openforwrite] [-files [-blocks [-locations | -racks]]]>>>
- *------------------+---------------------------------------------+
- || COMMAND_OPTION || Description
- *------------------+---------------------------------------------+
- | <path> | Start checking from this path.
- *------------------+---------------------------------------------+
- | -move | Move corrupted files to /lost+found
- *------------------+---------------------------------------------+
- | -delete | Delete corrupted files.
- *------------------+---------------------------------------------+
- | -openforwrite | Print out files opened for write.
- *------------------+---------------------------------------------+
- | -files | Print out files being checked.
- *------------------+---------------------------------------------+
- | -blocks | Print out block report.
- *------------------+---------------------------------------------+
- | -locations | Print out locations for every block.
- *------------------+---------------------------------------------+
- | -racks | Print out network topology for data-node locations.
- *------------------+---------------------------------------------+
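- For example, the following checks the whole namespace starting at the
- root and prints the files being checked along with their block reports:
- <<<hadoop fsck / -files -blocks>>>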
- * <<<fetchdt>>>
- Gets a delegation token from a NameNode.
- See {{{../hadoop-hdfs/HdfsUserGuide.html#fetchdt}fetchdt}} for more info.
- Usage: <<<hadoop fetchdt [GENERIC_OPTIONS] [--webservice <namenode_http_addr>] <path> >>>
- *------------------------------+---------------------------------------------+
- || COMMAND_OPTION || Description
- *------------------------------+---------------------------------------------+
- | <path> | File name to store the token into.
- *------------------------------+---------------------------------------------+
- | --webservice <namenode_http_addr> | Use HTTP protocol instead of RPC.
- *------------------------------+---------------------------------------------+
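- For example, assuming a hypothetical NameNode HTTP address, the following
- fetches a delegation token over HTTP and stores it in a local file:
- <<<hadoop fetchdt --webservice http://nn.example.com:50070 /tmp/my.token>>>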
- * <<<jar>>>
- Runs a jar file. Users can bundle their Map Reduce code in a jar file and
- execute it using this command.
- Usage: <<<hadoop jar <jar> [mainClass] args...>>>
- Streaming jobs are run via this command. Examples can be found in the
- Streaming examples.
- The word count example is also run using the jar command. It can be found
- in the Wordcount example.
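- A minimal invocation (with a hypothetical examples jar name and paths)
- that runs the bundled wordcount driver might look like:
- <<<hadoop jar hadoop-examples.jar wordcount /user/hadoop/input /user/hadoop/output>>>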
- * <<<job>>>
- Command to interact with Map Reduce Jobs.
- Usage: <<<hadoop job [GENERIC_OPTIONS] [-submit <job-file>] | [-status <job-id>] | [-counter <job-id> <group-name> <counter-name>] | [-kill <job-id>] | [-events <job-id> <from-event-#> <#-of-events>] | [-history [all] <jobOutputDir>] | [-list [all]] | [-kill-task <task-id>] | [-fail-task <task-id>] | [-set-priority <job-id> <priority>]>>>
- *------------------------------+---------------------------------------------+
- || COMMAND_OPTION || Description
- *------------------------------+---------------------------------------------+
- | -submit <job-file> | Submits the job.
- *------------------------------+---------------------------------------------+
- | -status <job-id> | Prints the map and reduce completion
- | percentage and all job counters.
- *------------------------------+---------------------------------------------+
- | -counter <job-id> <group-name> <counter-name> | Prints the counter value.
- *------------------------------+---------------------------------------------+
- | -kill <job-id> | Kills the job.
- *------------------------------+---------------------------------------------+
- | -events <job-id> <from-event-#> <#-of-events> | Prints the events' details
- | received by jobtracker for the given range.
- *------------------------------+---------------------------------------------+
- | -history [all] <jobOutputDir> | Prints job details, failed and killed tip
- | details. More details about the job such as
- | successful tasks and task attempts made for
- | each task can be viewed by specifying the [all]
- | option.
- *------------------------------+---------------------------------------------+
- | -list [all] | Displays jobs which are yet to complete.
- | <<<-list all>>> displays all jobs.
- *------------------------------+---------------------------------------------+
- | -kill-task <task-id> | Kills the task. Killed tasks are NOT counted
- | against failed attempts.
- *------------------------------+---------------------------------------------+
- | -fail-task <task-id> | Fails the task. Failed tasks are counted
- | against failed attempts.
- *------------------------------+---------------------------------------------+
- | -set-priority <job-id> <priority> | Changes the priority of the job. Allowed
- | priority values are VERY_HIGH, HIGH, NORMAL,
- | LOW, VERY_LOW
- *------------------------------+---------------------------------------------+
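- For example, with a hypothetical job id, the following prints a job's
- status and then kills it:
- <<<hadoop job -status job_201203270621_0001>>>
- <<<hadoop job -kill job_201203270621_0001>>>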
- * <<<pipes>>>
- Runs a pipes job.
- Usage: <<<hadoop pipes [-conf <path>] [-jobconf <key=value>, <key=value>,
- ...] [-input <path>] [-output <path>] [-jar <jar file>] [-inputformat
- <class>] [-map <class>] [-partitioner <class>] [-reduce <class>] [-writer
- <class>] [-program <executable>] [-reduces <num>]>>>
-
- *----------------------------------------+------------------------------------+
- || COMMAND_OPTION || Description
- *----------------------------------------+------------------------------------+
- | -conf <path> | Configuration for job
- *----------------------------------------+------------------------------------+
- | -jobconf <key=value>, <key=value>, ... | Add/override configuration for job
- *----------------------------------------+------------------------------------+
- | -input <path> | Input directory
- *----------------------------------------+------------------------------------+
- | -output <path> | Output directory
- *----------------------------------------+------------------------------------+
- | -jar <jar file> | Jar filename
- *----------------------------------------+------------------------------------+
- | -inputformat <class> | InputFormat class
- *----------------------------------------+------------------------------------+
- | -map <class> | Java Map class
- *----------------------------------------+------------------------------------+
- | -partitioner <class> | Java Partitioner
- *----------------------------------------+------------------------------------+
- | -reduce <class> | Java Reduce class
- *----------------------------------------+------------------------------------+
- | -writer <class> | Java RecordWriter
- *----------------------------------------+------------------------------------+
- | -program <executable> | Executable URI
- *----------------------------------------+------------------------------------+
- | -reduces <num> | Number of reduces
- *----------------------------------------+------------------------------------+
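- For example, the following (with hypothetical paths and a hypothetical
- C++ executable already uploaded to HDFS) submits a pipes job:
- <<<hadoop pipes -conf job.xml -input /user/hadoop/in -output /user/hadoop/out -program bin/cpp-wordcount>>>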
- * <<<queue>>>
- Command to interact with and view Job Queue information.
- Usage: <<<hadoop queue [-list] | [-info <job-queue-name> [-showJobs]] | [-showacls]>>>
- *-----------------+-----------------------------------------------------------+
- || COMMAND_OPTION || Description
- *-----------------+-----------------------------------------------------------+
- | -list | Gets the list of Job Queues configured in the system,
- | along with the scheduling information associated with them.
- *-----------------+-----------------------------------------------------------+
- | -info <job-queue-name> [-showJobs] | Displays the job queue information and
- | associated scheduling information of the particular job queue.
- | If the <<<-showJobs>>> option is present, a list of jobs
- | submitted to the particular job queue is displayed.
- *-----------------+-----------------------------------------------------------+
- | -showacls | Displays the queue name and associated queue operations
- | allowed for the current user. The list consists of only
- | those queues to which the user has access.
- *-----------------+-----------------------------------------------------------+
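- For example, assuming a queue named <<<default>>> is configured, the
- following displays its scheduling information and submitted jobs:
- <<<hadoop queue -info default -showJobs>>>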
- * <<<version>>>
- Prints the version.
- Usage: <<<hadoop version>>>
- * <<<CLASSNAME>>>
- hadoop script can be used to invoke any class.
- Usage: <<<hadoop CLASSNAME>>>
- Runs the class named <<<CLASSNAME>>>.
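- For example, a hypothetical user class on the classpath can be run with
- its arguments:
- <<<hadoop com.example.MyTool /user/hadoop/input>>>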
- * <<<classpath>>>
- Prints the class path needed to get the Hadoop jar and the required
- libraries.
- Usage: <<<hadoop classpath>>>
- Administration Commands
- Commands useful for administrators of a hadoop cluster.
- * <<<balancer>>>
- Runs a cluster balancing utility. An administrator can simply press Ctrl-C
- to stop the rebalancing process. See
- {{{../hadoop-hdfs/HdfsUserGuide.html#Rebalancer}Rebalancer}} for more details.
- Usage: <<<hadoop balancer [-threshold <threshold>]>>>
- *------------------------+-----------------------------------------------------------+
- || COMMAND_OPTION || Description
- *------------------------+-----------------------------------------------------------+
- | -threshold <threshold> | Percentage of disk capacity. This overwrites the
- | default threshold.
- *------------------------+-----------------------------------------------------------+
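- For example, the following runs the balancer with a hypothetical
- threshold of 5 percent instead of the default:
- <<<hadoop balancer -threshold 5>>>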
- * <<<daemonlog>>>
- Get/Set the log level for each daemon.
- Usage: <<<hadoop daemonlog -getlevel <host:port> <name> >>>
- Usage: <<<hadoop daemonlog -setlevel <host:port> <name> <level> >>>
- *------------------------------+-----------------------------------------------------------+
- || COMMAND_OPTION || Description
- *------------------------------+-----------------------------------------------------------+
- | -getlevel <host:port> <name> | Prints the log level of the daemon running at
- | <host:port>. This command internally connects
- | to http://<host:port>/logLevel?log=<name>
- *------------------------------+-----------------------------------------------------------+
- | -setlevel <host:port> <name> <level> | Sets the log level of the daemon
- | running at <host:port>. This command internally
- | connects to http://<host:port>/logLevel?log=<name>
- *------------------------------+-----------------------------------------------------------+
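- For example, assuming a hypothetical NameNode HTTP address, the following
- raises the NameNode logger to DEBUG and then reads the level back:
- <<<hadoop daemonlog -setlevel nn.example.com:50070 org.apache.hadoop.hdfs.server.namenode.NameNode DEBUG>>>
- <<<hadoop daemonlog -getlevel nn.example.com:50070 org.apache.hadoop.hdfs.server.namenode.NameNode>>>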
- * <<<datanode>>>
- Runs an HDFS datanode.
- Usage: <<<hadoop datanode [-rollback]>>>
- *-----------------+-----------------------------------------------------------+
- || COMMAND_OPTION || Description
- *-----------------+-----------------------------------------------------------+
- | -rollback | Rolls back the datanode to the previous version. This should
- | be used after stopping the datanode and distributing the old
- | hadoop version.
- *-----------------+-----------------------------------------------------------+
- * <<<dfsadmin>>>
- Runs an HDFS dfsadmin client.
- Usage: <<<hadoop dfsadmin [GENERIC_OPTIONS] [-report] [-safemode enter | leave | get | wait] [-refreshNodes] [-finalizeUpgrade] [-upgradeProgress status | details | force] [-metasave filename] [-setQuota <quota> <dirname>...<dirname>] [-clrQuota <dirname>...<dirname>] [-restoreFailedStorage true|false|check] [-help [cmd]]>>>
- *-----------------+-----------------------------------------------------------+
- || COMMAND_OPTION || Description
- *-----------------+-----------------------------------------------------------+
- | -report | Reports basic filesystem information and statistics.
- *-----------------+-----------------------------------------------------------+
- | -safemode enter / leave / get / wait | Safe mode maintenance command. Safe
- | mode is a Namenode state in which it \
- | 1. does not accept changes to the name space (read-only) \
- | 2. does not replicate or delete blocks. \
- | Safe mode is entered automatically at Namenode startup, and the
- | Namenode leaves safe mode automatically when the configured minimum
- | percentage of blocks satisfies the minimum replication
- | condition. Safe mode can also be entered manually, but then
- | it can only be turned off manually as well.
- *-----------------+-----------------------------------------------------------+
- | -refreshNodes | Re-read the hosts and exclude files to update the set of
- | Datanodes that are allowed to connect to the Namenode and
- | those that should be decommissioned or recommissioned.
- *-----------------+-----------------------------------------------------------+
- | -finalizeUpgrade| Finalize upgrade of HDFS. Datanodes delete their previous
- | version working directories, followed by Namenode doing the
- | same. This completes the upgrade process.
- *-----------------+-----------------------------------------------------------+
- | -upgradeProgress status / details / force | Request current distributed
- | upgrade status, a detailed status or force the upgrade to
- | proceed.
- *-----------------+-----------------------------------------------------------+
- | -metasave filename | Save Namenode's primary data structures to <filename> in
- | the directory specified by hadoop.log.dir property.
- | <filename> is overwritten if it exists.
- | <filename> will contain one line for each of the following\
- | 1. Datanodes heart beating with Namenode\
- | 2. Blocks waiting to be replicated\
- | 3. Blocks currently being replicated\
- | 4. Blocks waiting to be deleted\
- *-----------------+-----------------------------------------------------------+
- | -setQuota <quota> <dirname>...<dirname> | Set the quota <quota> for each
- | directory <dirname>. The directory quota is a long integer
- | that puts a hard limit on the number of names in the
- | directory tree. Best effort for the directory, with faults
- | reported if \
- | 1. the quota is not a positive integer, or \
- | 2. user is not an administrator, or \
- | 3. the directory does not exist or is a file, or \
- | 4. the directory would immediately exceed the new quota. \
- *-----------------+-----------------------------------------------------------+
- | -clrQuota <dirname>...<dirname> | Clear the quota for each directory
- | <dirname>. Best effort for the directory, with faults
- | reported if \
- | 1. the directory does not exist or is a file, or \
- | 2. user is not an administrator. It does not fault if the
- | directory has no quota.
- *-----------------+-----------------------------------------------------------+
- | -restoreFailedStorage true / false / check | This option will turn on/off automatic attempts to restore failed storage replicas.
- | If a failed storage becomes available again, the system will attempt to restore
- | edits and/or fsimage during checkpoint. The 'check' option will return the current setting.
- *-----------------+-----------------------------------------------------------+
- | -help [cmd] | Displays help for the given command or all commands if none
- | is specified.
- *-----------------+-----------------------------------------------------------+
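- For example, the following prints a basic cluster report and then places
- the Namenode in safe mode:
- <<<hadoop dfsadmin -report>>>
- <<<hadoop dfsadmin -safemode enter>>>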
- * <<<mradmin>>>
- Runs the MR admin client.
- Usage: <<<hadoop mradmin [ GENERIC_OPTIONS ] [-refreshQueueAcls]>>>
- *-------------------+-----------------------------------------------------------+
- || COMMAND_OPTION || Description
- *-------------------+-----------------------------------------------------------+
- | -refreshQueueAcls | Refresh the queue acls used by hadoop, to check access
- | during submissions and administration of the job by the
- | user. The properties present in mapred-queue-acls.xml are
- | reloaded by the queue manager.
- *-------------------+-----------------------------------------------------------+
- * <<<jobtracker>>>
- Runs the MapReduce JobTracker node.
- Usage: <<<hadoop jobtracker [-dumpConfiguration]>>>
- *--------------------+-----------------------------------------------------------+
- || COMMAND_OPTION || Description
- *--------------------+-----------------------------------------------------------+
- | -dumpConfiguration | Dumps the configuration used by the JobTracker, along
- | with the queue configuration, in JSON format to standard
- | output, and exits.
- *--------------------+-----------------------------------------------------------+
- * <<<namenode>>>
- Runs the namenode. More info about the upgrade, rollback and finalize is
- at {{{../hadoop-hdfs/HdfsUserGuide.html#Upgrade_and_Rollback}Upgrade Rollback}}.
- Usage: <<<hadoop namenode [-format] | [-upgrade] | [-rollback] | [-finalize] | [-importCheckpoint]>>>
- *--------------------+-----------------------------------------------------------+
- || COMMAND_OPTION || Description
- *--------------------+-----------------------------------------------------------+
- | -format | Formats the namenode. It starts the namenode, formats
- | it and then shuts it down.
- *--------------------+-----------------------------------------------------------+
- | -upgrade | The namenode should be started with the upgrade option
- | after a new hadoop version has been distributed.
- *--------------------+-----------------------------------------------------------+
- | -rollback | Rolls back the namenode to the previous version. This
- | should be used after stopping the cluster and
- | distributing the old hadoop version.
- *--------------------+-----------------------------------------------------------+
- | -finalize | Finalize will remove the previous state of the file
- | system. The most recent upgrade will become permanent,
- | and the rollback option will no longer be available.
- | After finalization it shuts the namenode down.
- *--------------------+-----------------------------------------------------------+
- | -importCheckpoint | Loads the image from a checkpoint directory and saves
- | it into the current one. The checkpoint directory is
- | read from the property fs.checkpoint.dir.
- *--------------------+-----------------------------------------------------------+
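- For example, after distributing a new hadoop version, the namenode can be
- started in upgrade mode with:
- <<<hadoop namenode -upgrade>>>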
- * <<<secondarynamenode>>>
- Runs the HDFS secondary namenode.
- See {{{../hadoop-hdfs/HdfsUserGuide.html#Secondary_NameNode}Secondary Namenode}}
- for more info.
- Usage: <<<hadoop secondarynamenode [-checkpoint [force]] | [-geteditsize]>>>
- *----------------------+-----------------------------------------------------------+
- || COMMAND_OPTION || Description
- *----------------------+-----------------------------------------------------------+
- | -checkpoint [force] | Checkpoints the Secondary namenode if EditLog size
- | >= fs.checkpoint.size. If <<<force>>> is used,
- | checkpoint irrespective of EditLog size.
- *----------------------+-----------------------------------------------------------+
- | -geteditsize | Prints the EditLog size.
- *----------------------+-----------------------------------------------------------+
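- For example, the following forces a checkpoint regardless of the current
- EditLog size:
- <<<hadoop secondarynamenode -checkpoint force>>>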
- * <<<tasktracker>>>
- Runs a MapReduce TaskTracker node.
- Usage: <<<hadoop tasktracker>>>
|