
HADOOP-6350. Document Hadoop Metrics. (Contributed by Akira Ajisaka)

git-svn-id: https://svn.apache.org/repos/asf/hadoop/common/trunk@1602324 13f79535-47bb-0310-9956-ffa450edef68
Arpit Agarwal, 11 years ago
commit ab54276440

+ 2 - 0
hadoop-common-project/hadoop-common/CHANGES.txt

@@ -420,6 +420,8 @@ Release 2.5.0 - UNRELEASED
     HADOOP-10376. Refactor refresh*Protocols into a single generic
     refreshConfigProtocol. (Chris Li via Arpit Agarwal)
 
+    HADOOP-6350. Documenting Hadoop metrics. (Akira Ajisaka via Arpit Agarwal)
+
   OPTIMIZATIONS
 
   BUG FIXES 

+ 732 - 0
hadoop-common-project/hadoop-common/src/site/apt/Metrics.apt.vm

@@ -0,0 +1,732 @@
+~~ Licensed to the Apache Software Foundation (ASF) under one or more
+~~ contributor license agreements.  See the NOTICE file distributed with
+~~ this work for additional information regarding copyright ownership.
+~~ The ASF licenses this file to You under the Apache License, Version 2.0
+~~ (the "License"); you may not use this file except in compliance with
+~~ the License.  You may obtain a copy of the License at
+~~
+~~     http://www.apache.org/licenses/LICENSE-2.0
+~~
+~~ Unless required by applicable law or agreed to in writing, software
+~~ distributed under the License is distributed on an "AS IS" BASIS,
+~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+~~ See the License for the specific language governing permissions and
+~~ limitations under the License.
+
+  ---
+  Metrics Guide
+  ---
+  ---
+  ${maven.build.timestamp}
+
+%{toc}
+
+Overview
+
+  Metrics are statistical information exposed by Hadoop daemons,
+  used for monitoring, performance tuning, and debugging.
+  Many metrics are available by default,
+  and they are very useful for troubleshooting.
+  This page describes the available metrics in detail.
+
+  Each section below covers one context, the group under which
+  related metrics are published.
+
+  The Metrics 2.0 framework itself is documented
+  {{{../../api/org/apache/hadoop/metrics2/package-summary.html}here}}.
+
+jvm context
+
+* JvmMetrics
+
+  Each metrics record contains tags such as ProcessName, SessionId
+  and Hostname as additional information along with metrics.
+
+*-------------------------------------+--------------------------------------+
+|| Name                               || Description
+*-------------------------------------+--------------------------------------+
+|<<<MemNonHeapUsedM>>> | Current non-heap memory used in MB
+*-------------------------------------+--------------------------------------+
+|<<<MemNonHeapCommittedM>>> | Current non-heap memory committed in MB
+*-------------------------------------+--------------------------------------+
+|<<<MemNonHeapMaxM>>> | Max non-heap memory size in MB
+*-------------------------------------+--------------------------------------+
+|<<<MemHeapUsedM>>> | Current heap memory used in MB
+*-------------------------------------+--------------------------------------+
+|<<<MemHeapCommittedM>>> | Current heap memory committed in MB
+*-------------------------------------+--------------------------------------+
+|<<<MemHeapMaxM>>> | Max heap memory size in MB
+*-------------------------------------+--------------------------------------+
+|<<<MemMaxM>>> | Max memory size in MB
+*-------------------------------------+--------------------------------------+
+|<<<ThreadsNew>>> | Current number of NEW threads
+*-------------------------------------+--------------------------------------+
+|<<<ThreadsRunnable>>> | Current number of RUNNABLE threads
+*-------------------------------------+--------------------------------------+
+|<<<ThreadsBlocked>>> | Current number of BLOCKED threads
+*-------------------------------------+--------------------------------------+
+|<<<ThreadsWaiting>>> | Current number of WAITING threads
+*-------------------------------------+--------------------------------------+
+|<<<ThreadsTimedWaiting>>> | Current number of TIMED_WAITING threads
+*-------------------------------------+--------------------------------------+
+|<<<ThreadsTerminated>>> | Current number of TERMINATED threads
+*-------------------------------------+--------------------------------------+
+|<<<GcInfo>>>  | Total GC count and GC time in msec, grouped by the kind of GC. \
+               | e.g. GcCountPS Scavenge=6, GcTimeMillisPS Scavenge=40,
+               | GcCountPS MarkSweep=0, GcTimeMillisPS MarkSweep=0
+*-------------------------------------+--------------------------------------+
+|<<<GcCount>>> | Total GC count
+*-------------------------------------+--------------------------------------+
+|<<<GcTimeMillis>>> | Total GC time in msec
+*-------------------------------------+--------------------------------------+
+|<<<LogFatal>>> | Total number of FATAL logs
+*-------------------------------------+--------------------------------------+
+|<<<LogError>>> | Total number of ERROR logs
+*-------------------------------------+--------------------------------------+
+|<<<LogWarn>>> | Total number of WARN logs
+*-------------------------------------+--------------------------------------+
+|<<<LogInfo>>> | Total number of INFO logs
+*-------------------------------------+--------------------------------------+
+
+rpc context
+
+* rpc
+
+  Each metrics record contains tags such as Hostname
+  and port (number to which server is bound)
+  as additional information along with metrics.
+
+*-------------------------------------+--------------------------------------+
+|| Name                               || Description
+*-------------------------------------+--------------------------------------+
+|<<<ReceivedBytes>>> | Total number of received bytes
+*-------------------------------------+--------------------------------------+
+|<<<SentBytes>>> | Total number of sent bytes
+*-------------------------------------+--------------------------------------+
+|<<<RpcQueueTimeNumOps>>> | Total number of RPC calls
+*-------------------------------------+--------------------------------------+
+|<<<RpcQueueTimeAvgTime>>> | Average queue time in milliseconds
+*-------------------------------------+--------------------------------------+
+|<<<RpcProcessingTimeNumOps>>> | Total number of RPC calls (same as
+                               | RpcQueueTimeNumOps)
+*-------------------------------------+--------------------------------------+
+|<<<RpcProcessingTimeAvgTime>>> | Average processing time in milliseconds
+*-------------------------------------+--------------------------------------+
+|<<<RpcAuthenticationFailures>>> | Total number of authentication failures
+*-------------------------------------+--------------------------------------+
+|<<<RpcAuthenticationSuccesses>>> | Total number of authentication successes
+*-------------------------------------+--------------------------------------+
+|<<<RpcAuthorizationFailures>>> | Total number of authorization failures
+*-------------------------------------+--------------------------------------+
+|<<<RpcAuthorizationSuccesses>>> | Total number of authorization successes
+*-------------------------------------+--------------------------------------+
+|<<<NumOpenConnections>>> | Current number of open connections
+*-------------------------------------+--------------------------------------+
+|<<<CallQueueLength>>> | Current length of the call queue
+*-------------------------------------+--------------------------------------+
+|<<<rpcQueueTime>>><num><<<sNumOps>>> | Shows total number of RPC calls
+| | (<num> seconds granularity) if <<<rpc.metrics.quantile.enable>>> is set to
+| | true. <num> is specified by <<<rpc.metrics.percentiles.intervals>>>.
+*-------------------------------------+--------------------------------------+
+|<<<rpcQueueTime>>><num><<<s50thPercentileLatency>>> |
+| | Shows the 50th percentile of RPC queue time in milliseconds
+| | (<num> seconds granularity) if <<<rpc.metrics.quantile.enable>>> is set to
+| | true. <num> is specified by <<<rpc.metrics.percentiles.intervals>>>.
+*-------------------------------------+--------------------------------------+
+|<<<rpcQueueTime>>><num><<<s75thPercentileLatency>>> |
+| | Shows the 75th percentile of RPC queue time in milliseconds
+| | (<num> seconds granularity) if <<<rpc.metrics.quantile.enable>>> is set to
+| | true. <num> is specified by <<<rpc.metrics.percentiles.intervals>>>.
+*-------------------------------------+--------------------------------------+
+|<<<rpcQueueTime>>><num><<<s90thPercentileLatency>>> |
+| | Shows the 90th percentile of RPC queue time in milliseconds
+| | (<num> seconds granularity) if <<<rpc.metrics.quantile.enable>>> is set to
+| | true. <num> is specified by <<<rpc.metrics.percentiles.intervals>>>.
+*-------------------------------------+--------------------------------------+
+|<<<rpcQueueTime>>><num><<<s95thPercentileLatency>>> |
+| | Shows the 95th percentile of RPC queue time in milliseconds
+| | (<num> seconds granularity) if <<<rpc.metrics.quantile.enable>>> is set to
+| | true. <num> is specified by <<<rpc.metrics.percentiles.intervals>>>.
+*-------------------------------------+--------------------------------------+
+|<<<rpcQueueTime>>><num><<<s99thPercentileLatency>>> |
+| | Shows the 99th percentile of RPC queue time in milliseconds
+| | (<num> seconds granularity) if <<<rpc.metrics.quantile.enable>>> is set to
+| | true. <num> is specified by <<<rpc.metrics.percentiles.intervals>>>.
+*-------------------------------------+--------------------------------------+
+|<<<rpcProcessingTime>>><num><<<sNumOps>>> | Shows total number of RPC calls
+| | (<num> seconds granularity) if <<<rpc.metrics.quantile.enable>>> is set to
+| | true. <num> is specified by <<<rpc.metrics.percentiles.intervals>>>.
+*-------------------------------------+--------------------------------------+
+|<<<rpcProcessingTime>>><num><<<s50thPercentileLatency>>> |
+| | Shows the 50th percentile of RPC processing time in milliseconds
+| | (<num> seconds granularity) if <<<rpc.metrics.quantile.enable>>> is set to
+| | true. <num> is specified by <<<rpc.metrics.percentiles.intervals>>>.
+*-------------------------------------+--------------------------------------+
+|<<<rpcProcessingTime>>><num><<<s75thPercentileLatency>>> |
+| | Shows the 75th percentile of RPC processing time in milliseconds
+| | (<num> seconds granularity) if <<<rpc.metrics.quantile.enable>>> is set to
+| | true. <num> is specified by <<<rpc.metrics.percentiles.intervals>>>.
+*-------------------------------------+--------------------------------------+
+|<<<rpcProcessingTime>>><num><<<s90thPercentileLatency>>> |
+| | Shows the 90th percentile of RPC processing time in milliseconds
+| | (<num> seconds granularity) if <<<rpc.metrics.quantile.enable>>> is set to
+| | true. <num> is specified by <<<rpc.metrics.percentiles.intervals>>>.
+*-------------------------------------+--------------------------------------+
+|<<<rpcProcessingTime>>><num><<<s95thPercentileLatency>>> |
+| | Shows the 95th percentile of RPC processing time in milliseconds
+| | (<num> seconds granularity) if <<<rpc.metrics.quantile.enable>>> is set to
+| | true. <num> is specified by <<<rpc.metrics.percentiles.intervals>>>.
+*-------------------------------------+--------------------------------------+
+|<<<rpcProcessingTime>>><num><<<s99thPercentileLatency>>> |
+| | Shows the 99th percentile of RPC processing time in milliseconds
+| | (<num> seconds granularity) if <<<rpc.metrics.quantile.enable>>> is set to
+| | true. <num> is specified by <<<rpc.metrics.percentiles.intervals>>>.
+*-------------------------------------+--------------------------------------+
+
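The percentile metric names in the table above follow a regular pattern, so the names a server would publish can be enumerated from the configuration. The sketch below is illustrative only: it assumes that `rpc.metrics.percentiles.intervals` holds a comma-separated list of whole seconds (the table only says that `<num>` comes from that property), and `rpc_quantile_metric_names` is a hypothetical helper, not part of Hadoop.

```python
from typing import List

# Percentiles listed in the rpc table above.
PERCENTILES = (50, 75, 90, 95, 99)


def rpc_quantile_metric_names(intervals: str) -> List[str]:
    """Expand an rpc.metrics.percentiles.intervals value into metric names.

    Assumption: the value is a comma-separated list of integers (seconds).
    """
    names: List[str] = []
    for raw in intervals.split(","):
        raw = raw.strip()
        if not raw:
            continue  # tolerate empty entries such as a trailing comma
        seconds = int(raw)
        for prefix in ("rpcQueueTime", "rpcProcessingTime"):
            base = f"{prefix}{seconds}s"
            names.append(f"{base}NumOps")
            names.extend(f"{base}{p}thPercentileLatency" for p in PERCENTILES)
    return names


if __name__ == "__main__":
    for name in rpc_quantile_metric_names("60,300"):
        print(name)
```

Each interval yields twelve names: one NumOps counter and five percentile gauges for queue time, and the same six for processing time. These only appear when `rpc.metrics.quantile.enable` is set to true.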
+* RetryCache/NameNodeRetryCache
+
+  RetryCache metrics are useful for monitoring NameNode failover.
+  Each metrics record contains a Hostname tag.
+
+*-------------------------------------+--------------------------------------+
+|| Name                               || Description
+*-------------------------------------+--------------------------------------+
+|<<<CacheHit>>> | Total number of RetryCache hits
+*-------------------------------------+--------------------------------------+
+|<<<CacheCleared>>> | Total number of times the RetryCache was cleared
+*-------------------------------------+--------------------------------------+
+|<<<CacheUpdated>>> | Total number of RetryCache updates
+*-------------------------------------+--------------------------------------+
+
+rpcdetailed context
+
+  Metrics in the rpcdetailed context are exposed in a unified manner by the
+  RPC layer. Two metrics are exposed for each RPC, based on its name.
+  The metric named "(RPC method name)NumOps" indicates the total number of
+  method calls, and the metric named "(RPC method name)AvgTime" shows the
+  average turnaround time for method calls in milliseconds.
+
+* rpcdetailed
+
+  Each metrics record contains tags such as Hostname
+  and port (number to which server is bound)
+  as additional information along with metrics.
+
+  Metrics for RPCs that have not been called are not included
+  in the metrics record.
+
+*-------------------------------------+--------------------------------------+
+|| Name                               || Description
+*-------------------------------------+--------------------------------------+
+|<methodname><<<NumOps>>> | Total number of times the method is called
+*-------------------------------------+--------------------------------------+
+|<methodname><<<AvgTime>>> | Average turnaround time of the method in
+                           | milliseconds
+*-------------------------------------+--------------------------------------+
+
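Because RPCs that have not been called are absent from the record, a consumer has to treat a missing key as "no data" rather than as zero. A minimal sketch of that lookup follows. It assumes the record has already been read into a plain dictionary; `rpc_detailed_stats` and the sample values are hypothetical, and the exact casing of the method-name part should be checked against a real record.

```python
from typing import Dict, Optional


def rpc_detailed_stats(record: Dict[str, float], method: str) -> Optional[Dict[str, float]]:
    """Look up the two rpcdetailed metrics for one RPC method.

    Returns None when the method does not appear in the record, which is
    the case for RPCs that have not been called.
    """
    num_ops = record.get(f"{method}NumOps")
    avg_time = record.get(f"{method}AvgTime")
    if num_ops is None or avg_time is None:
        return None
    return {"calls": num_ops, "avg_ms": avg_time}


if __name__ == "__main__":
    # Hypothetical sample record; the values are made up for illustration.
    sample = {"GetFileInfoNumOps": 12, "GetFileInfoAvgTime": 1.5}
    print(rpc_detailed_stats(sample, "GetFileInfo"))
    print(rpc_detailed_stats(sample, "Rename"))
```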
+dfs context
+
+* namenode
+
+  Each metrics record contains tags such as ProcessName, SessionId,
+  and Hostname as additional information along with metrics.
+
+*-------------------------------------+--------------------------------------+
+|| Name                               || Description
+*-------------------------------------+--------------------------------------+
+|<<<CreateFileOps>>> | Total number of files created
+*-------------------------------------+--------------------------------------+
+|<<<FilesCreated>>> | Total number of files and directories created by create
+                    | or mkdir operations
+*-------------------------------------+--------------------------------------+
+|<<<FilesAppended>>> | Total number of files appended
+*-------------------------------------+--------------------------------------+
+|<<<GetBlockLocations>>> | Total number of getBlockLocations operations
+*-------------------------------------+--------------------------------------+
+|<<<FilesRenamed>>> | Total number of rename <<operations>> (NOT number of
+                    | files/dirs renamed)
+*-------------------------------------+--------------------------------------+
+|<<<GetListingOps>>> | Total number of directory listing operations
+*-------------------------------------+--------------------------------------+
+|<<<DeleteFileOps>>> | Total number of delete operations
+*-------------------------------------+--------------------------------------+
+|<<<FilesDeleted>>> | Total number of files and directories deleted by delete
+                    | or rename operations
+*-------------------------------------+--------------------------------------+
+|<<<FileInfoOps>>> | Total number of getFileInfo and getLinkFileInfo
+                   | operations
+*-------------------------------------+--------------------------------------+
+|<<<AddBlockOps>>> | Total number of successful addBlock operations
+*-------------------------------------+--------------------------------------+
+|<<<GetAdditionalDatanodeOps>>> | Total number of getAdditionalDatanode
+                                | operations
+*-------------------------------------+--------------------------------------+
+|<<<CreateSymlinkOps>>> | Total number of createSymlink operations
+*-------------------------------------+--------------------------------------+
+|<<<GetLinkTargetOps>>> | Total number of getLinkTarget operations
+*-------------------------------------+--------------------------------------+
+|<<<FilesInGetListingOps>>> | Total number of files and directories listed by
+                            | directory listing operations
+*-------------------------------------+--------------------------------------+
+|<<<AllowSnapshotOps>>> | Total number of allowSnapshot operations
+*-------------------------------------+--------------------------------------+
+|<<<DisallowSnapshotOps>>> | Total number of disallowSnapshot operations
+*-------------------------------------+--------------------------------------+
+|<<<CreateSnapshotOps>>> | Total number of createSnapshot operations
+*-------------------------------------+--------------------------------------+
+|<<<DeleteSnapshotOps>>> | Total number of deleteSnapshot operations
+*-------------------------------------+--------------------------------------+
+|<<<RenameSnapshotOps>>> | Total number of renameSnapshot operations
+*-------------------------------------+--------------------------------------+
+|<<<ListSnapshottableDirOps>>> | Total number of snapshottableDirectoryStatus
+                               | operations
+*-------------------------------------+--------------------------------------+
+|<<<SnapshotDiffReportOps>>> | Total number of getSnapshotDiffReport
+                             | operations
+*-------------------------------------+--------------------------------------+
+|<<<TransactionsNumOps>>> | Total number of Journal transactions
+*-------------------------------------+--------------------------------------+
+|<<<TransactionsAvgTime>>> | Average time of Journal transactions in
+                           | milliseconds
+*-------------------------------------+--------------------------------------+
+|<<<SyncsNumOps>>> | Total number of Journal syncs
+*-------------------------------------+--------------------------------------+
+|<<<SyncsAvgTime>>> | Average time of Journal syncs in milliseconds
+*-------------------------------------+--------------------------------------+
+|<<<TransactionsBatchedInSync>>> | Total number of Journal transactions batched
+                                 | in sync
+*-------------------------------------+--------------------------------------+
+|<<<BlockReportNumOps>>> | Total number of block reports processed from
+                         | DataNodes
+*-------------------------------------+--------------------------------------+
+|<<<BlockReportAvgTime>>> | Average time of processing block reports in
+                          | milliseconds
+*-------------------------------------+--------------------------------------+
+|<<<CacheReportNumOps>>> | Total number of cache reports processed from
+                         | DataNodes
+*-------------------------------------+--------------------------------------+
+|<<<CacheReportAvgTime>>> | Average time of processing cache reports in
+                          | milliseconds
+*-------------------------------------+--------------------------------------+
+|<<<SafeModeTime>>> | Time in milliseconds between FSNamesystem startup and
+                    | the last exit from safemode. \
+                    | (sometimes not equal to the time spent in safemode,
+                    | see {{{https://issues.apache.org/jira/browse/HDFS-5156}HDFS-5156}})
+*-------------------------------------+--------------------------------------+
+|<<<FsImageLoadTime>>> | Time loading FS Image at startup in milliseconds
+*-------------------------------------+--------------------------------------+
+|<<<GetEditNumOps>>> | Total number of edits downloads from SecondaryNameNode
+*-------------------------------------+--------------------------------------+
+|<<<GetEditAvgTime>>> | Average edits download time in milliseconds
+*-------------------------------------+--------------------------------------+
+|<<<GetImageNumOps>>> | Total number of fsimage downloads from SecondaryNameNode
+*-------------------------------------+--------------------------------------+
+|<<<GetImageAvgTime>>> | Average fsimage download time in milliseconds
+*-------------------------------------+--------------------------------------+
+|<<<PutImageNumOps>>> | Total number of fsimage uploads to SecondaryNameNode
+*-------------------------------------+--------------------------------------+
+|<<<PutImageAvgTime>>> | Average fsimage upload time in milliseconds
+*-------------------------------------+--------------------------------------+
+
+* FSNamesystem
+
+  Each metrics record contains tags such as HAState and Hostname
+  as additional information along with metrics.
+
+*-------------------------------------+--------------------------------------+
+|| Name                               || Description
+*-------------------------------------+--------------------------------------+
+|<<<MissingBlocks>>> | Current number of missing blocks
+*-------------------------------------+--------------------------------------+
+|<<<ExpiredHeartbeats>>> | Total number of expired heartbeats
+*-------------------------------------+--------------------------------------+
+|<<<TransactionsSinceLastCheckpoint>>> | Total number of transactions since
+                                       | last checkpoint
+*-------------------------------------+--------------------------------------+
+|<<<TransactionsSinceLastLogRoll>>> | Total number of transactions since last
+                                    | edit log roll
+*-------------------------------------+--------------------------------------+
+|<<<LastWrittenTransactionId>>> | Last transaction ID written to the edit log
+*-------------------------------------+--------------------------------------+
+|<<<LastCheckpointTime>>> | Time in milliseconds since epoch of last checkpoint
+*-------------------------------------+--------------------------------------+
+|<<<CapacityTotal>>> | Current raw capacity of DataNodes in bytes
+*-------------------------------------+--------------------------------------+
+|<<<CapacityTotalGB>>> | Current raw capacity of DataNodes in GB
+*-------------------------------------+--------------------------------------+
+|<<<CapacityUsed>>> | Current used capacity across all DataNodes in bytes
+*-------------------------------------+--------------------------------------+
+|<<<CapacityUsedGB>>> | Current used capacity across all DataNodes in GB
+*-------------------------------------+--------------------------------------+
+|<<<CapacityRemaining>>> | Current remaining capacity in bytes
+*-------------------------------------+--------------------------------------+
+|<<<CapacityRemainingGB>>> | Current remaining capacity in GB
+*-------------------------------------+--------------------------------------+
+|<<<CapacityUsedNonDFS>>> | Current space used by DataNodes for non DFS
+                          | purposes in bytes
+*-------------------------------------+--------------------------------------+
+|<<<TotalLoad>>> | Current number of connections
+*-------------------------------------+--------------------------------------+
+|<<<SnapshottableDirectories>>> | Current number of snapshottable directories
+*-------------------------------------+--------------------------------------+
+|<<<Snapshots>>> | Current number of snapshots
+*-------------------------------------+--------------------------------------+
+|<<<BlocksTotal>>> | Current number of allocated blocks in the system
+*-------------------------------------+--------------------------------------+
+|<<<FilesTotal>>> | Current number of files and directories
+*-------------------------------------+--------------------------------------+
+|<<<PendingReplicationBlocks>>> | Current number of blocks pending to be
+                                | replicated
+*-------------------------------------+--------------------------------------+
+|<<<UnderReplicatedBlocks>>> | Current number of under-replicated blocks
+*-------------------------------------+--------------------------------------+
+|<<<CorruptBlocks>>> | Current number of blocks with corrupt replicas
+*-------------------------------------+--------------------------------------+
+|<<<ScheduledReplicationBlocks>>> | Current number of blocks scheduled for
+                                  | replication
+*-------------------------------------+--------------------------------------+
+|<<<PendingDeletionBlocks>>> | Current number of blocks pending deletion
+*-------------------------------------+--------------------------------------+
+|<<<ExcessBlocks>>> | Current number of excess blocks
+*-------------------------------------+--------------------------------------+
+|<<<PostponedMisreplicatedBlocks>>> | (HA-only) Current number of blocks
+                                    | whose replication is postponed
+*-------------------------------------+--------------------------------------+
+|<<<PendingDataNodeMessageCount>>> | (HA-only) Current number of pending
+                                   | block-related messages for later
+                                   | processing in the standby NameNode
+*-------------------------------------+--------------------------------------+
+|<<<MillisSinceLastLoadedEdits>>> | (HA-only) Time in milliseconds since the
+                                  | standby NameNode last loaded the edit log.
+                                  | In the active NameNode, set to 0
+*-------------------------------------+--------------------------------------+
+|<<<BlockCapacity>>> | Current block capacity
+*-------------------------------------+--------------------------------------+
+|<<<StaleDataNodes>>> | Current number of DataNodes marked stale due to delayed
+                      | heartbeat
+*-------------------------------------+--------------------------------------+
+|<<<TotalFiles>>> | Current number of files and directories (same as FilesTotal)
+*-------------------------------------+--------------------------------------+
+
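Several of the FSNamesystem gauges are most useful in combination, for example the used capacity as a share of the raw capacity, together with any block counters that are non-zero. The following is a minimal sketch under stated assumptions: the record is already available as a dictionary keyed by the metric names above, `summarize_fsnamesystem` is a hypothetical helper, and the choice of counters to watch is made for this example only.

```python
from typing import Dict, List, Optional, Tuple

# Block counters from the FSNamesystem table to flag when non-zero.
# Which counters to watch is a choice made for this sketch, not a Hadoop rule.
WATCHED_BLOCK_COUNTERS = ("MissingBlocks", "CorruptBlocks", "UnderReplicatedBlocks")


def summarize_fsnamesystem(record: Dict[str, float]) -> Tuple[Optional[float], List[str]]:
    """Return (used capacity fraction, names of watched counters above zero).

    The fraction is CapacityUsed / CapacityTotal, both in bytes. It is None
    when CapacityTotal is missing or zero, to avoid dividing by zero.
    """
    total = record.get("CapacityTotal", 0)
    used = record.get("CapacityUsed", 0)
    used_fraction = used / total if total > 0 else None
    flagged = [name for name in WATCHED_BLOCK_COUNTERS if record.get(name, 0) > 0]
    return used_fraction, flagged


if __name__ == "__main__":
    # Hypothetical sample values, for illustration only.
    sample = {"CapacityTotal": 1000, "CapacityUsed": 250,
              "MissingBlocks": 0, "CorruptBlocks": 2}
    print(summarize_fsnamesystem(sample))
```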
+* JournalNode
+
+  The server-side metrics for a journal from the JournalNode's perspective.
+  Each metrics record contains a Hostname tag as additional information
+  along with metrics.
+
+*-------------------------------------+--------------------------------------+
+|| Name                               || Description
+*-------------------------------------+--------------------------------------+
+|<<<Syncs60sNumOps>>> | Number of sync operations (1 minute granularity)
+*-------------------------------------+--------------------------------------+
+|<<<Syncs60s50thPercentileLatencyMicros>>> | The 50th percentile of sync
+| | latency in microseconds (1 minute granularity)
+*-------------------------------------+--------------------------------------+
+|<<<Syncs60s75thPercentileLatencyMicros>>> | The 75th percentile of sync
+| | latency in microseconds (1 minute granularity)
+*-------------------------------------+--------------------------------------+
+|<<<Syncs60s90thPercentileLatencyMicros>>> | The 90th percentile of sync
+| | latency in microseconds (1 minute granularity)
+*-------------------------------------+--------------------------------------+
+|<<<Syncs60s95thPercentileLatencyMicros>>> | The 95th percentile of sync
+| | latency in microseconds (1 minute granularity)
+*-------------------------------------+--------------------------------------+
+|<<<Syncs60s99thPercentileLatencyMicros>>> | The 99th percentile of sync
+| | latency in microseconds (1 minute granularity)
+*-------------------------------------+--------------------------------------+
+|<<<Syncs300sNumOps>>> | Number of sync operations (5 minutes granularity)
+*-------------------------------------+--------------------------------------+
+|<<<Syncs300s50thPercentileLatencyMicros>>> | The 50th percentile of sync
+| | latency in microseconds (5 minutes granularity)
+*-------------------------------------+--------------------------------------+
+|<<<Syncs300s75thPercentileLatencyMicros>>> | The 75th percentile of sync
+| | latency in microseconds (5 minutes granularity)
+*-------------------------------------+--------------------------------------+
+|<<<Syncs300s90thPercentileLatencyMicros>>> | The 90th percentile of sync
+| | latency in microseconds (5 minutes granularity)
+*-------------------------------------+--------------------------------------+
+|<<<Syncs300s95thPercentileLatencyMicros>>> | The 95th percentile of sync
+| | latency in microseconds (5 minutes granularity)
+*-------------------------------------+--------------------------------------+
+|<<<Syncs300s99thPercentileLatencyMicros>>> | The 99th percentile of sync
+| | latency in microseconds (5 minutes granularity)
+*-------------------------------------+--------------------------------------+
+|<<<Syncs3600sNumOps>>> | Number of sync operations (1 hour granularity)
+*-------------------------------------+--------------------------------------+
+|<<<Syncs3600s50thPercentileLatencyMicros>>> | The 50th percentile of sync
+| | latency in microseconds (1 hour granularity)
+*-------------------------------------+--------------------------------------+
+|<<<Syncs3600s75thPercentileLatencyMicros>>> | The 75th percentile of sync
+| | latency in microseconds (1 hour granularity)
+*-------------------------------------+--------------------------------------+
+|<<<Syncs3600s90thPercentileLatencyMicros>>> | The 90th percentile of sync
+| | latency in microseconds (1 hour granularity)
+*-------------------------------------+--------------------------------------+
+|<<<Syncs3600s95thPercentileLatencyMicros>>> | The 95th percentile of sync
+| | latency in microseconds (1 hour granularity)
+*-------------------------------------+--------------------------------------+
+|<<<Syncs3600s99thPercentileLatencyMicros>>> | The 99th percentile of sync
+| | latency in microseconds (1 hour granularity)
+*-------------------------------------+--------------------------------------+
+|<<<BatchesWritten>>> | Total number of batches written since startup
+*-------------------------------------+--------------------------------------+
+|<<<TxnsWritten>>> | Total number of transactions written since startup
+*-------------------------------------+--------------------------------------+
+|<<<BytesWritten>>> | Total number of bytes written since startup
+*-------------------------------------+--------------------------------------+
+|<<<BatchesWrittenWhileLagging>>> | Total number of batches written where this
+| | node was lagging
+*-------------------------------------+--------------------------------------+
+|<<<LastWriterEpoch>>> | Current writer's epoch number
+*-------------------------------------+--------------------------------------+
+|<<<CurrentLagTxns>>> | The number of transactions by which this JournalNode
+| | is lagging
+*-------------------------------------+--------------------------------------+
+|<<<LastWrittenTxId>>> | The highest transaction id stored on this JournalNode
+*-------------------------------------+--------------------------------------+
+|<<<LastPromisedEpoch>>> | The last epoch number below which this node has
+| | promised not to accept any epoch, or 0 if no promises have been made
+*-------------------------------------+--------------------------------------+
+
+* datanode
+
+  Each metrics record contains tags such as SessionId and Hostname
+  as additional information along with metrics.
+
+*-------------------------------------+--------------------------------------+
+|| Name                               || Description
+*-------------------------------------+--------------------------------------+
+|<<<BytesWritten>>> | Total number of bytes written to DataNode
+*-------------------------------------+--------------------------------------+
+|<<<BytesRead>>> | Total number of bytes read from DataNode
+*-------------------------------------+--------------------------------------+
+|<<<BlocksWritten>>> | Total number of blocks written to DataNode
+*-------------------------------------+--------------------------------------+
+|<<<BlocksRead>>> | Total number of blocks read from DataNode
+*-------------------------------------+--------------------------------------+
+|<<<BlocksReplicated>>> | Total number of blocks replicated
+*-------------------------------------+--------------------------------------+
+|<<<BlocksRemoved>>> | Total number of blocks removed
+*-------------------------------------+--------------------------------------+
+|<<<BlocksVerified>>> | Total number of blocks verified
+*-------------------------------------+--------------------------------------+
+|<<<BlockVerificationFailures>>> | Total number of verification failures
+*-------------------------------------+--------------------------------------+
+|<<<BlocksCached>>> | Total number of blocks cached
+*-------------------------------------+--------------------------------------+
+|<<<BlocksUncached>>> | Total number of blocks uncached
+*-------------------------------------+--------------------------------------+
+|<<<ReadsFromLocalClient>>> | Total number of read operations from local client
+*-------------------------------------+--------------------------------------+
+|<<<ReadsFromRemoteClient>>> | Total number of read operations from remote
+                             | client
+*-------------------------------------+--------------------------------------+
+|<<<WritesFromLocalClient>>> | Total number of write operations from local
+                             | client
+*-------------------------------------+--------------------------------------+
+|<<<WritesFromRemoteClient>>> | Total number of write operations from remote
+                              | client
+*-------------------------------------+--------------------------------------+
+|<<<BlocksGetLocalPathInfo>>> | Total number of operations to get local path
+                              | names of blocks
+*-------------------------------------+--------------------------------------+
+|<<<FsyncCount>>> | Total number of fsync operations
+*-------------------------------------+--------------------------------------+
+|<<<VolumeFailures>>> | Total number of volume failures that have occurred
+*-------------------------------------+--------------------------------------+
+|<<<ReadBlockOpNumOps>>> | Total number of read operations
+*-------------------------------------+--------------------------------------+
+|<<<ReadBlockOpAvgTime>>> | Average time of read operations in milliseconds
+*-------------------------------------+--------------------------------------+
+|<<<WriteBlockOpNumOps>>> | Total number of write operations
+*-------------------------------------+--------------------------------------+
+|<<<WriteBlockOpAvgTime>>> | Average time of write operations in milliseconds
+*-------------------------------------+--------------------------------------+
+|<<<BlockChecksumOpNumOps>>> | Total number of blockChecksum operations
+*-------------------------------------+--------------------------------------+
+|<<<BlockChecksumOpAvgTime>>> | Average time of blockChecksum operations in
+                              | milliseconds
+*-------------------------------------+--------------------------------------+
+|<<<CopyBlockOpNumOps>>> | Total number of block copy operations
+*-------------------------------------+--------------------------------------+
+|<<<CopyBlockOpAvgTime>>> | Average time of block copy operations in
+                          | milliseconds
+*-------------------------------------+--------------------------------------+
+|<<<ReplaceBlockOpNumOps>>> | Total number of block replace operations
+*-------------------------------------+--------------------------------------+
+|<<<ReplaceBlockOpAvgTime>>> | Average time of block replace operations in
+                             | milliseconds
+*-------------------------------------+--------------------------------------+
+|<<<HeartbeatsNumOps>>> | Total number of heartbeats
+*-------------------------------------+--------------------------------------+
+|<<<HeartbeatsAvgTime>>> | Average heartbeat time in milliseconds
+*-------------------------------------+--------------------------------------+
+|<<<BlockReportsNumOps>>> | Total number of block report operations
+*-------------------------------------+--------------------------------------+
+|<<<BlockReportsAvgTime>>> | Average time of block report operations in
+                           | milliseconds
+*-------------------------------------+--------------------------------------+
+|<<<CacheReportsNumOps>>> | Total number of cache report operations
+*-------------------------------------+--------------------------------------+
+|<<<CacheReportsAvgTime>>> | Average time of cache report operations in
+                           | milliseconds
+*-------------------------------------+--------------------------------------+
+|<<<PacketAckRoundTripTimeNanosNumOps>>> | Total number of ack round trips
+*-------------------------------------+--------------------------------------+
+|<<<PacketAckRoundTripTimeNanosAvgTime>>> | Average time from ack send to
+| | receive minus the downstream ack time in nanoseconds
+*-------------------------------------+--------------------------------------+
+|<<<FlushNanosNumOps>>> | Total number of flushes
+*-------------------------------------+--------------------------------------+
+|<<<FlushNanosAvgTime>>> | Average flush time in nanoseconds
+*-------------------------------------+--------------------------------------+
+|<<<FsyncNanosNumOps>>> | Total number of fsync operations
+*-------------------------------------+--------------------------------------+
+|<<<FsyncNanosAvgTime>>> | Average fsync time in nanoseconds
+*-------------------------------------+--------------------------------------+
+|<<<SendDataPacketBlockedOnNetworkNanosNumOps>>> | Total number of packets
+                                                 | sent
+*-------------------------------------+--------------------------------------+
+|<<<SendDataPacketBlockedOnNetworkNanosAvgTime>>> | Average waiting time for
+| | sending packets in nanoseconds
+*-------------------------------------+--------------------------------------+
+|<<<SendDataPacketTransferNanosNumOps>>> | Total number of packets sent
+*-------------------------------------+--------------------------------------+
+|<<<SendDataPacketTransferNanosAvgTime>>> | Average transfer time for sent
+                                          | packets in nanoseconds
+*-------------------------------------+--------------------------------------+
+
+ugi context
+
+* UgiMetrics
+
+  UgiMetrics is related to user and group information.
+  Each metrics record contains the Hostname tag as additional information
+  along with metrics.
+
+*-------------------------------------+--------------------------------------+
+|| Name                               || Description
+*-------------------------------------+--------------------------------------+
+|<<<LoginSuccessNumOps>>> | Total number of successful Kerberos logins
+*-------------------------------------+--------------------------------------+
+|<<<LoginSuccessAvgTime>>> | Average time for successful Kerberos logins in
+                           | milliseconds
+*-------------------------------------+--------------------------------------+
+|<<<LoginFailureNumOps>>> | Total number of failed Kerberos logins
+*-------------------------------------+--------------------------------------+
+|<<<LoginFailureAvgTime>>> | Average time for failed Kerberos logins in
+                           | milliseconds
+*-------------------------------------+--------------------------------------+
+|<<<getGroupsNumOps>>> | Total number of group resolutions
+*-------------------------------------+--------------------------------------+
+|<<<getGroupsAvgTime>>> | Average time for group resolution in milliseconds
+*-------------------------------------+--------------------------------------+
+|<<<getGroups>>><num><<<sNumOps>>> |
+| | Total number of group resolutions (<num> seconds granularity). <num> is
+| | specified by <<<hadoop.user.group.metrics.percentiles.intervals>>>.
+*-------------------------------------+--------------------------------------+
+|<<<getGroups>>><num><<<s50thPercentileLatency>>> |
+| | Shows the 50th percentile of group resolution time in milliseconds
+| | (<num> seconds granularity). <num> is specified by
+| | <<<hadoop.user.group.metrics.percentiles.intervals>>>.
+*-------------------------------------+--------------------------------------+
+|<<<getGroups>>><num><<<s75thPercentileLatency>>> |
+| | Shows the 75th percentile of group resolution time in milliseconds
+| | (<num> seconds granularity). <num> is specified by
+| | <<<hadoop.user.group.metrics.percentiles.intervals>>>.
+*-------------------------------------+--------------------------------------+
+|<<<getGroups>>><num><<<s90thPercentileLatency>>> |
+| | Shows the 90th percentile of group resolution time in milliseconds
+| | (<num> seconds granularity). <num> is specified by
+| | <<<hadoop.user.group.metrics.percentiles.intervals>>>.
+*-------------------------------------+--------------------------------------+
+|<<<getGroups>>><num><<<s95thPercentileLatency>>> |
+| | Shows the 95th percentile of group resolution time in milliseconds
+| | (<num> seconds granularity). <num> is specified by
+| | <<<hadoop.user.group.metrics.percentiles.intervals>>>.
+*-------------------------------------+--------------------------------------+
+|<<<getGroups>>><num><<<s99thPercentileLatency>>> |
+| | Shows the 99th percentile of group resolution time in milliseconds
+| | (<num> seconds granularity). <num> is specified by
+| | <<<hadoop.user.group.metrics.percentiles.intervals>>>.
+*-------------------------------------+--------------------------------------+
+
+metricssystem context
+
+* MetricsSystem
+
+  MetricsSystem shows statistics for snapshotting and publishing metrics.
+  Each metrics record contains the Hostname tag as additional information
+  along with metrics.
+
+*-------------------------------------+--------------------------------------+
+|| Name                               || Description
+*-------------------------------------+--------------------------------------+
+|<<<NumActiveSources>>> | Current number of active metrics sources
+*-------------------------------------+--------------------------------------+
+|<<<NumAllSources>>> | Total number of metrics sources
+*-------------------------------------+--------------------------------------+
+|<<<NumActiveSinks>>> | Current number of active sinks
+*-------------------------------------+--------------------------------------+
+|<<<NumAllSinks>>> | Total number of sinks \
+                   | (but usually less than <<<NumActiveSinks>>>,
+                   | see {{{https://issues.apache.org/jira/browse/HADOOP-9946}HADOOP-9946}})
+*-------------------------------------+--------------------------------------+
+|<<<SnapshotNumOps>>> | Total number of operations to snapshot statistics from
+                      | a metrics source
+*-------------------------------------+--------------------------------------+
+|<<<SnapshotAvgTime>>> | Average time in milliseconds to snapshot statistics
+                       | from a metrics source
+*-------------------------------------+--------------------------------------+
+|<<<PublishNumOps>>> | Total number of operations to publish statistics to a
+                     | sink
+*-------------------------------------+--------------------------------------+
+|<<<PublishAvgTime>>> | Average time in milliseconds to publish statistics to
+                      | a sink
+*-------------------------------------+--------------------------------------+
+|<<<DroppedPubAll>>> | Total number of dropped publishes
+*-------------------------------------+--------------------------------------+
+|<<<Sink_>>><instance><<<NumOps>>> | Total number of sink operations for the
+                                   | <instance>
+*-------------------------------------+--------------------------------------+
+|<<<Sink_>>><instance><<<AvgTime>>> | Average time in milliseconds of sink
+                                    | operations for the <instance>
+*-------------------------------------+--------------------------------------+
+|<<<Sink_>>><instance><<<Dropped>>> | Total number of dropped sink operations
+                                    | for the <instance>
+*-------------------------------------+--------------------------------------+
+|<<<Sink_>>><instance><<<Qsize>>> | Current queue length of sink operations \
+                                  | (but always set to 0 because nothing
+                                  | increments this metric, see
+                                  | {{{https://issues.apache.org/jira/browse/HADOOP-9941}HADOOP-9941}})
+*-------------------------------------+--------------------------------------+
+
+default context
+
+* StartupProgress
+
+  StartupProgress metrics show the statistics of NameNode startup.
+  Four metrics are exposed for each startup phase based on its name.
+  The startup <phase>s are <<<LoadingFsImage>>>, <<<LoadingEdits>>>,
+  <<<SavingCheckpoint>>>, and <<<SafeMode>>>.
+  Each metrics record contains the Hostname tag as additional information
+  along with metrics.
+
+*-------------------------------------+--------------------------------------+
+|| Name                               || Description
+*-------------------------------------+--------------------------------------+
+|<<<ElapsedTime>>> | Total elapsed time in milliseconds
+*-------------------------------------+--------------------------------------+
+|<<<PercentComplete>>> | Fraction of NameNode startup completed so far \
+                       | (the maximum value is 1.0, not 100)
+*-------------------------------------+--------------------------------------+
+|<phase><<<Count>>> | Total number of steps completed in the phase
+*-------------------------------------+--------------------------------------+
+|<phase><<<ElapsedTime>>> | Total elapsed time in the phase in milliseconds
+*-------------------------------------+--------------------------------------+
+|<phase><<<Total>>> | Total number of steps in the phase
+*-------------------------------------+--------------------------------------+
+|<phase><<<PercentComplete>>> | Fraction of the phase completed so far \
+                              | (the maximum value is 1.0, not 100)
+*-------------------------------------+--------------------------------------+

+ 1 - 0
hadoop-project/src/site/site.xml

@@ -137,6 +137,7 @@
       <item name="Common CHANGES.txt" href="hadoop-project-dist/hadoop-common/CHANGES.txt"/>
       <item name="HDFS CHANGES.txt" href="hadoop-project-dist/hadoop-hdfs/CHANGES.txt"/>
       <item name="MapReduce CHANGES.txt" href="hadoop-project-dist/hadoop-mapreduce/CHANGES.txt"/>
+      <item name="Metrics" href="hadoop-project-dist/hadoop-common/Metrics.html"/>
     </menu>
     
     <menu name="Configuration" inherit="top">