|
@@ -0,0 +1,732 @@
|
|
|
+~~ Licensed to the Apache Software Foundation (ASF) under one or more
|
|
|
+~~ contributor license agreements. See the NOTICE file distributed with
|
|
|
+~~ this work for additional information regarding copyright ownership.
|
|
|
+~~ The ASF licenses this file to You under the Apache License, Version 2.0
|
|
|
+~~ (the "License"); you may not use this file except in compliance with
|
|
|
+~~ the License. You may obtain a copy of the License at
|
|
|
+~~
|
|
|
+~~ http://www.apache.org/licenses/LICENSE-2.0
|
|
|
+~~
|
|
|
+~~ Unless required by applicable law or agreed to in writing, software
|
|
|
+~~ distributed under the License is distributed on an "AS IS" BASIS,
|
|
|
+~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
|
|
+~~ See the License for the specific language governing permissions and
|
|
|
+~~ limitations under the License.
|
|
|
+
|
|
|
+ ---
|
|
|
+ Metrics Guide
|
|
|
+ ---
|
|
|
+ ---
|
|
|
+ ${maven.build.timestamp}
|
|
|
+
|
|
|
+%{toc}
|
|
|
+
|
|
|
+Overview
|
|
|
+
|
|
|
+ Metrics are statistical information exposed by Hadoop daemons,
|
|
|
+ used for monitoring, performance tuning, and debugging.
|
|
|
+ There are many metrics available by default
|
|
|
+ and they are very useful for troubleshooting.
|
|
|
+ This page shows the details of the available metrics.
|
|
|
+
|
|
|
+ Each section below describes one of the contexts into which metrics are grouped.
|
|
|
+
|
|
|
+ The documentation of the Metrics 2.0 framework is
|
|
|
+ {{{../../api/org/apache/hadoop/metrics2/package-summary.html}here}}.
|
|
|
+
|
|
|
+jvm context
|
|
|
+
|
|
|
+* JvmMetrics
|
|
|
+
|
|
|
+ Each metrics record contains tags such as ProcessName, SessionId,
|
|
|
+ and Hostname as additional information along with metrics.
|
|
|
+
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|| Name || Description
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<MemNonHeapUsedM>>> | Current non-heap memory used in MB
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<MemNonHeapCommittedM>>> | Current non-heap memory committed in MB
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<MemNonHeapMaxM>>> | Max non-heap memory size in MB
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<MemHeapUsedM>>> | Current heap memory used in MB
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<MemHeapCommittedM>>> | Current heap memory committed in MB
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<MemHeapMaxM>>> | Max heap memory size in MB
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<MemMaxM>>> | Max memory size in MB
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<ThreadsNew>>> | Current number of NEW threads
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<ThreadsRunnable>>> | Current number of RUNNABLE threads
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<ThreadsBlocked>>> | Current number of BLOCKED threads
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<ThreadsWaiting>>> | Current number of WAITING threads
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<ThreadsTimedWaiting>>> | Current number of TIMED_WAITING threads
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<ThreadsTerminated>>> | Current number of TERMINATED threads
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<GcInfo>>> | Total GC count and GC time in msec, grouped by the kind of GC. \
|
|
|
+ | e.g. GcCountPS Scavenge=6, GcTimeMillisPS Scavenge=40,
|
|
|
+ | GcCountPS MarkSweep=0, GcTimeMillisPS MarkSweep=0
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<GcCount>>> | Total GC count
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<GcTimeMillis>>> | Total GC time in msec
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<LogFatal>>> | Total number of FATAL logs
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<LogError>>> | Total number of ERROR logs
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<LogWarn>>> | Total number of WARN logs
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<LogInfo>>> | Total number of INFO logs
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+
|
|
|
+rpc context
|
|
|
+
|
|
|
+* rpc
|
|
|
+
|
|
|
+ Each metrics record contains tags such as Hostname
|
|
|
+ and port (the port number to which the server is bound)
|
|
|
+ as additional information along with metrics.
|
|
|
+
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|| Name || Description
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<ReceivedBytes>>> | Total number of received bytes
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<SentBytes>>> | Total number of sent bytes
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<RpcQueueTimeNumOps>>> | Total number of RPC calls
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<RpcQueueTimeAvgTime>>> | Average queue time in milliseconds
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<RpcProcessingTimeNumOps>>> | Total number of RPC calls (same as
|
|
|
+ | RpcQueueTimeNumOps)
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<RpcProcessingTimeAvgTime>>> | Average processing time in milliseconds
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<RpcAuthenticationFailures>>> | Total number of authentication failures
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<RpcAuthenticationSuccesses>>> | Total number of authentication successes
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<RpcAuthorizationFailures>>> | Total number of authorization failures
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<RpcAuthorizationSuccesses>>> | Total number of authorization successes
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<NumOpenConnections>>> | Current number of open connections
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<CallQueueLength>>> | Current length of the call queue
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<rpcQueueTime>>><num><<<sNumOps>>> | Shows total number of RPC calls
|
|
|
+| | (<num> seconds granularity) if <<<rpc.metrics.quantile.enable>>> is set to
|
|
|
+| | true. <num> is specified by <<<rpc.metrics.percentiles.intervals>>>.
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<rpcQueueTime>>><num><<<s50thPercentileLatency>>> |
|
|
|
+| | Shows the 50th percentile of RPC queue time in milliseconds
|
|
|
+| | (<num> seconds granularity) if <<<rpc.metrics.quantile.enable>>> is set to
|
|
|
+| | true. <num> is specified by <<<rpc.metrics.percentiles.intervals>>>.
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<rpcQueueTime>>><num><<<s75thPercentileLatency>>> |
|
|
|
+| | Shows the 75th percentile of RPC queue time in milliseconds
|
|
|
+| | (<num> seconds granularity) if <<<rpc.metrics.quantile.enable>>> is set to
|
|
|
+| | true. <num> is specified by <<<rpc.metrics.percentiles.intervals>>>.
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<rpcQueueTime>>><num><<<s90thPercentileLatency>>> |
|
|
|
+| | Shows the 90th percentile of RPC queue time in milliseconds
|
|
|
+| | (<num> seconds granularity) if <<<rpc.metrics.quantile.enable>>> is set to
|
|
|
+| | true. <num> is specified by <<<rpc.metrics.percentiles.intervals>>>.
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<rpcQueueTime>>><num><<<s95thPercentileLatency>>> |
|
|
|
+| | Shows the 95th percentile of RPC queue time in milliseconds
|
|
|
+| | (<num> seconds granularity) if <<<rpc.metrics.quantile.enable>>> is set to
|
|
|
+| | true. <num> is specified by <<<rpc.metrics.percentiles.intervals>>>.
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<rpcQueueTime>>><num><<<s99thPercentileLatency>>> |
|
|
|
+| | Shows the 99th percentile of RPC queue time in milliseconds
|
|
|
+| | (<num> seconds granularity) if <<<rpc.metrics.quantile.enable>>> is set to
|
|
|
+| | true. <num> is specified by <<<rpc.metrics.percentiles.intervals>>>.
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<rpcProcessingTime>>><num><<<sNumOps>>> | Shows total number of RPC calls
|
|
|
+| | (<num> seconds granularity) if <<<rpc.metrics.quantile.enable>>> is set to
|
|
|
+| | true. <num> is specified by <<<rpc.metrics.percentiles.intervals>>>.
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<rpcProcessingTime>>><num><<<s50thPercentileLatency>>> |
|
|
|
+| | Shows the 50th percentile of RPC processing time in milliseconds
|
|
|
+| | (<num> seconds granularity) if <<<rpc.metrics.quantile.enable>>> is set to
|
|
|
+| | true. <num> is specified by <<<rpc.metrics.percentiles.intervals>>>.
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<rpcProcessingTime>>><num><<<s75thPercentileLatency>>> |
|
|
|
+| | Shows the 75th percentile of RPC processing time in milliseconds
|
|
|
+| | (<num> seconds granularity) if <<<rpc.metrics.quantile.enable>>> is set to
|
|
|
+| | true. <num> is specified by <<<rpc.metrics.percentiles.intervals>>>.
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<rpcProcessingTime>>><num><<<s90thPercentileLatency>>> |
|
|
|
+| | Shows the 90th percentile of RPC processing time in milliseconds
|
|
|
+| | (<num> seconds granularity) if <<<rpc.metrics.quantile.enable>>> is set to
|
|
|
+| | true. <num> is specified by <<<rpc.metrics.percentiles.intervals>>>.
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<rpcProcessingTime>>><num><<<s95thPercentileLatency>>> |
|
|
|
+| | Shows the 95th percentile of RPC processing time in milliseconds
|
|
|
+| | (<num> seconds granularity) if <<<rpc.metrics.quantile.enable>>> is set to
|
|
|
+| | true. <num> is specified by <<<rpc.metrics.percentiles.intervals>>>.
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<rpcProcessingTime>>><num><<<s99thPercentileLatency>>> |
|
|
|
+| | Shows the 99th percentile of RPC processing time in milliseconds
|
|
|
+| | (<num> seconds granularity) if <<<rpc.metrics.quantile.enable>>> is set to
|
|
|
+| | true. <num> is specified by <<<rpc.metrics.percentiles.intervals>>>.
|
|
|
+*-------------------------------------+--------------------------------------+
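The quantile metric names in the table above follow a regular pattern, so the full set for a given configuration can be enumerated. The following is a minimal sketch, assuming that <<<rpc.metrics.percentiles.intervals>>> holds a comma-separated list of interval lengths in seconds; the function name <<<quantile_metric_names>>> and the sample value "60,300" are hypothetical and only illustrate the naming scheme.

```python
# Percentiles listed in the rpc table above.
PERCENTILES = (50, 75, 90, 95, 99)


def quantile_metric_names(intervals_property,
                          prefixes=("rpcQueueTime", "rpcProcessingTime")):
    """Return the metric names expected for the given interval setting.

    intervals_property is assumed to be a comma-separated list of
    interval lengths in seconds, for example "60,300".
    """
    intervals = [int(part) for part in intervals_property.split(",")
                 if part.strip()]
    names = []
    for prefix in prefixes:
        for seconds in intervals:
            # One counter per interval ...
            names.append(f"{prefix}{seconds}sNumOps")
            # ... plus one latency metric per percentile.
            names.extend(
                f"{prefix}{seconds}s{p}thPercentileLatency"
                for p in PERCENTILES
            )
    return names


names = quantile_metric_names("60,300")
for name in names:
    print(name)
```

Each configured interval contributes one NumOps metric and five percentile metrics for both the queue time and the processing time.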
|
|
|
+
|
|
|
+* RetryCache/NameNodeRetryCache
|
|
|
+
|
|
|
+ RetryCache metrics are useful for monitoring NameNode failover.
|
|
|
+ Each metrics record contains the Hostname tag.
|
|
|
+
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|| Name || Description
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<CacheHit>>> | Total number of RetryCache hits
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<CacheCleared>>> | Total number of times the RetryCache was cleared
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<CacheUpdated>>> | Total number of times the RetryCache was updated
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+
|
|
|
+rpcdetailed context
|
|
|
+
|
|
|
+ Metrics of the rpcdetailed context are exposed in a unified manner by the RPC
|
|
|
+ layer. Two metrics are exposed for each RPC method, based on its name.
|
|
|
+ The metric named "(RPC method name)NumOps" indicates the total number of
|
|
|
+ method calls, and the metric named "(RPC method name)AvgTime" shows the
|
|
|
+ average turnaround time for method calls in milliseconds.
|
|
|
+
|
|
|
+* rpcdetailed
|
|
|
+
|
|
|
+ Each metrics record contains tags such as Hostname
|
|
|
+ and port (the port number to which the server is bound)
|
|
|
+ as additional information along with metrics.
|
|
|
+
|
|
|
+ Metrics for RPC methods that have not been called are not included
|
|
|
+ in the metrics record.
|
|
|
+
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|| Name || Description
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<methodname><<<NumOps>>> | Total number of times the method is called
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<methodname><<<AvgTime>>> | Average turnaround time of the method in
|
|
|
+ | milliseconds
|
|
|
+*-------------------------------------+--------------------------------------+
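Because every rpcdetailed metric name is a method name followed by NumOps or AvgTime, a flat metrics record can be regrouped per method by stripping those suffixes. The sketch below assumes the record is already available as a plain dictionary; the helper <<<group_by_method>>>, the method names, and the sample values are hypothetical.

```python
def group_by_method(record):
    """Group '<method>NumOps' and '<method>AvgTime' entries by method name.

    Entries that do not end in one of the two suffixes (tags such as
    Hostname, for instance) are ignored.
    """
    suffixes = (("NumOps", "num_ops"), ("AvgTime", "avg_time_ms"))
    grouped = {}
    for name, value in record.items():
        for suffix, field in suffixes:
            if name.endswith(suffix) and len(name) > len(suffix):
                method = name[: -len(suffix)]
                grouped.setdefault(method, {})[field] = value
                break
    return grouped


# Hypothetical sample record; real records contain whichever methods were called.
record = {
    "Hostname": "host1",
    "GetFileInfoNumOps": 120,
    "GetFileInfoAvgTime": 0.5,
    "MkdirsNumOps": 3,
    "MkdirsAvgTime": 2.0,
}
per_method = group_by_method(record)
print(per_method)
```

Methods that were never called simply do not appear in the result, which matches the note above about uncalled RPC methods.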
|
|
|
+
|
|
|
+dfs context
|
|
|
+
|
|
|
+* namenode
|
|
|
+
|
|
|
+ Each metrics record contains tags such as ProcessName, SessionId,
|
|
|
+ and Hostname as additional information along with metrics.
|
|
|
+
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|| Name || Description
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<CreateFileOps>>> | Total number of files created
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<FilesCreated>>> | Total number of files and directories created by create
|
|
|
+ | or mkdir operations
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<FilesAppended>>> | Total number of files appended
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<GetBlockLocations>>> | Total number of getBlockLocations operations
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<FilesRenamed>>> | Total number of rename <<operations>> (NOT number of
|
|
|
+ | files/dirs renamed)
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<GetListingOps>>> | Total number of directory listing operations
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<DeleteFileOps>>> | Total number of delete operations
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<FilesDeleted>>> | Total number of files and directories deleted by delete
|
|
|
+ | or rename operations
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<FileInfoOps>>> | Total number of getFileInfo and getLinkFileInfo
|
|
|
+ | operations
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<AddBlockOps>>> | Total number of successful addBlock operations
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<GetAdditionalDatanodeOps>>> | Total number of getAdditionalDatanode
|
|
|
+ | operations
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<CreateSymlinkOps>>> | Total number of createSymlink operations
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<GetLinkTargetOps>>> | Total number of getLinkTarget operations
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<FilesInGetListingOps>>> | Total number of files and directories listed by
|
|
|
+ | directory listing operations
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<AllowSnapshotOps>>> | Total number of allowSnapshot operations
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<DisallowSnapshotOps>>> | Total number of disallowSnapshot operations
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<CreateSnapshotOps>>> | Total number of createSnapshot operations
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<DeleteSnapshotOps>>> | Total number of deleteSnapshot operations
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<RenameSnapshotOps>>> | Total number of renameSnapshot operations
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<ListSnapshottableDirOps>>> | Total number of snapshottableDirectoryStatus
|
|
|
+ | operations
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<SnapshotDiffReportOps>>> | Total number of getSnapshotDiffReport
|
|
|
+ | operations
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<TransactionsNumOps>>> | Total number of Journal transactions
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<TransactionsAvgTime>>> | Average time of Journal transactions in
|
|
|
+ | milliseconds
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<SyncsNumOps>>> | Total number of Journal syncs
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<SyncsAvgTime>>> | Average time of Journal syncs in milliseconds
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<TransactionsBatchedInSync>>> | Total number of Journal transactions batched
|
|
|
+ | in sync
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<BlockReportNumOps>>> | Total number of block reports processed from
|
|
|
+ | DataNode
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<BlockReportAvgTime>>> | Average time of processing block reports in
|
|
|
+ | milliseconds
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<CacheReportNumOps>>> | Total number of cache reports processed from
|
|
|
+ | DataNode
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<CacheReportAvgTime>>> | Average time of processing cache reports in
|
|
|
+ | milliseconds
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<SafeModeTime>>> | Time in milliseconds between FSNamesystem startup and the
|
|
|
+ | last time the NameNode left safemode. \
|
|
|
+ | (sometimes not equal to the time spent in safemode,
|
|
|
+ | see {{{https://issues.apache.org/jira/browse/HDFS-5156}HDFS-5156}})
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<FsImageLoadTime>>> | Time loading FS Image at startup in milliseconds
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<GetEditNumOps>>> | Total number of edits downloads from SecondaryNameNode
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<GetEditAvgTime>>> | Average edits download time in milliseconds
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<GetImageNumOps>>> | Total number of fsimage downloads from SecondaryNameNode
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<GetImageAvgTime>>> | Average fsimage download time in milliseconds
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<PutImageNumOps>>> | Total number of fsimage uploads to SecondaryNameNode
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<PutImageAvgTime>>> | Average fsimage upload time in milliseconds
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+
|
|
|
+* FSNamesystem
|
|
|
+
|
|
|
+ Each metrics record contains tags such as HAState and Hostname
|
|
|
+ as additional information along with metrics.
|
|
|
+
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|| Name || Description
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<MissingBlocks>>> | Current number of missing blocks
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<ExpiredHeartbeats>>> | Total number of expired heartbeats
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<TransactionsSinceLastCheckpoint>>> | Total number of transactions since
|
|
|
+ | last checkpoint
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<TransactionsSinceLastLogRoll>>> | Total number of transactions since last
|
|
|
+ | edit log roll
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<LastWrittenTransactionId>>> | Last transaction ID written to the edit log
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<LastCheckpointTime>>> | Time in milliseconds since epoch of last checkpoint
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<CapacityTotal>>> | Current raw capacity of DataNodes in bytes
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<CapacityTotalGB>>> | Current raw capacity of DataNodes in GB
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<CapacityUsed>>> | Current used capacity across all DataNodes in bytes
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<CapacityUsedGB>>> | Current used capacity across all DataNodes in GB
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<CapacityRemaining>>> | Current remaining capacity in bytes
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<CapacityRemainingGB>>> | Current remaining capacity in GB
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<CapacityUsedNonDFS>>> | Current space used by DataNodes for non DFS
|
|
|
+ | purposes in bytes
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<TotalLoad>>> | Current number of connections
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<SnapshottableDirectories>>> | Current number of snapshottable directories
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<Snapshots>>> | Current number of snapshots
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<BlocksTotal>>> | Current number of allocated blocks in the system
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<FilesTotal>>> | Current number of files and directories
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<PendingReplicationBlocks>>> | Current number of blocks pending to be
|
|
|
+ | replicated
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<UnderReplicatedBlocks>>> | Current number of under-replicated blocks
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<CorruptBlocks>>> | Current number of blocks with corrupt replicas
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<ScheduledReplicationBlocks>>> | Current number of blocks scheduled for
|
|
|
+ | replication
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<PendingDeletionBlocks>>> | Current number of blocks pending deletion
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<ExcessBlocks>>> | Current number of excess blocks
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<PostponedMisreplicatedBlocks>>> | (HA-only) Current number of blocks
|
|
|
+ | postponed to replicate
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<PendingDataNodeMessageCount>>> | (HA-only) Current number of pending
|
|
|
+ | block-related messages for later
|
|
|
+ | processing in the standby NameNode
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<MillisSinceLastLoadedEdits>>> | (HA-only) Time in milliseconds since the
|
|
|
+ | last time the standby NameNode loaded the edit log.
|
|
|
+ | On the active NameNode, set to 0
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<BlockCapacity>>> | Current block capacity
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<StaleDataNodes>>> | Current number of DataNodes marked stale due to delayed
|
|
|
+ | heartbeat
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<TotalFiles>>> | Current number of files and directories (same as FilesTotal)
|
|
|
+*-------------------------------------+--------------------------------------+
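The byte-valued capacity metrics above can be combined into usage percentages for dashboards or alerts. The following is a minimal sketch, assuming the FSNamesystem record has already been read into a dictionary keyed by metric name; the helper <<<capacity_percentages>>> and the sample numbers are hypothetical.

```python
def capacity_percentages(metrics):
    """Express used, remaining and non-DFS space as a share of CapacityTotal."""
    total = metrics["CapacityTotal"]
    if total <= 0:
        # No DataNodes have reported capacity yet; avoid dividing by zero.
        return {"used_pct": 0.0, "remaining_pct": 0.0, "non_dfs_pct": 0.0}
    return {
        "used_pct": 100.0 * metrics["CapacityUsed"] / total,
        "remaining_pct": 100.0 * metrics["CapacityRemaining"] / total,
        "non_dfs_pct": 100.0 * metrics["CapacityUsedNonDFS"] / total,
    }


# Hypothetical values in bytes, chosen only to keep the arithmetic readable.
sample = {
    "CapacityTotal": 1000,
    "CapacityUsed": 250,
    "CapacityRemaining": 600,
    "CapacityUsedNonDFS": 150,
}
summary = capacity_percentages(sample)
print(summary)
```

The byte-valued metrics are used here rather than the GB variants so that the percentages are not affected by rounding.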
|
|
|
+
|
|
|
+* JournalNode
|
|
|
+
|
|
|
+ The server-side metrics for a journal from the JournalNode's perspective.
|
|
|
+ Each metrics record contains the Hostname tag as additional information
|
|
|
+ along with metrics.
|
|
|
+
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|| Name || Description
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<Syncs60sNumOps>>> | Number of sync operations (1 minute granularity)
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<Syncs60s50thPercentileLatencyMicros>>> | The 50th percentile of sync
|
|
|
+| | latency in microseconds (1 minute granularity)
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<Syncs60s75thPercentileLatencyMicros>>> | The 75th percentile of sync
|
|
|
+| | latency in microseconds (1 minute granularity)
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<Syncs60s90thPercentileLatencyMicros>>> | The 90th percentile of sync
|
|
|
+| | latency in microseconds (1 minute granularity)
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<Syncs60s95thPercentileLatencyMicros>>> | The 95th percentile of sync
|
|
|
+| | latency in microseconds (1 minute granularity)
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<Syncs60s99thPercentileLatencyMicros>>> | The 99th percentile of sync
|
|
|
+| | latency in microseconds (1 minute granularity)
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<Syncs300sNumOps>>> | Number of sync operations (5 minutes granularity)
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<Syncs300s50thPercentileLatencyMicros>>> | The 50th percentile of sync
|
|
|
+| | latency in microseconds (5 minutes granularity)
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<Syncs300s75thPercentileLatencyMicros>>> | The 75th percentile of sync
|
|
|
+| | latency in microseconds (5 minutes granularity)
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<Syncs300s90thPercentileLatencyMicros>>> | The 90th percentile of sync
|
|
|
+| | latency in microseconds (5 minutes granularity)
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<Syncs300s95thPercentileLatencyMicros>>> | The 95th percentile of sync
|
|
|
+| | latency in microseconds (5 minutes granularity)
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<Syncs300s99thPercentileLatencyMicros>>> | The 99th percentile of sync
|
|
|
+| | latency in microseconds (5 minutes granularity)
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<Syncs3600sNumOps>>> | Number of sync operations (1 hour granularity)
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<Syncs3600s50thPercentileLatencyMicros>>> | The 50th percentile of sync
|
|
|
+| | latency in microseconds (1 hour granularity)
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<Syncs3600s75thPercentileLatencyMicros>>> | The 75th percentile of sync
|
|
|
+| | latency in microseconds (1 hour granularity)
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<Syncs3600s90thPercentileLatencyMicros>>> | The 90th percentile of sync
|
|
|
+| | latency in microseconds (1 hour granularity)
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<Syncs3600s95thPercentileLatencyMicros>>> | The 95th percentile of sync
|
|
|
+| | latency in microseconds (1 hour granularity)
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<Syncs3600s99thPercentileLatencyMicros>>> | The 99th percentile of sync
|
|
|
+| | latency in microseconds (1 hour granularity)
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<BatchesWritten>>> | Total number of batches written since startup
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<TxnsWritten>>> | Total number of transactions written since startup
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<BytesWritten>>> | Total number of bytes written since startup
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<BatchesWrittenWhileLagging>>> | Total number of batches written where this
|
|
|
+| | node was lagging
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<LastWriterEpoch>>> | Current writer's epoch number
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<CurrentLagTxns>>> | The number of transactions by which this JournalNode
|
|
|
+| | is lagging
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<LastWrittenTxId>>> | The highest transaction id stored on this JournalNode
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<LastPromisedEpoch>>> | The last epoch number for which this node has
|
|
|
+| | promised not to accept any lower epoch, or 0 if no promises have been made
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+
|
|
|
+* datanode
|
|
|
+
|
|
|
+ Each metrics record contains tags such as SessionId and Hostname
|
|
|
+ as additional information along with metrics.
|
|
|
+
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|| Name || Description
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<BytesWritten>>> | Total number of bytes written to DataNode
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<BytesRead>>> | Total number of bytes read from DataNode
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<BlocksWritten>>> | Total number of blocks written to DataNode
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<BlocksRead>>> | Total number of blocks read from DataNode
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<BlocksReplicated>>> | Total number of blocks replicated
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<BlocksRemoved>>> | Total number of blocks removed
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<BlocksVerified>>> | Total number of blocks verified
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<BlockVerificationFailures>>> | Total number of verification failures
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<BlocksCached>>> | Total number of blocks cached
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<BlocksUncached>>> | Total number of blocks uncached
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<ReadsFromLocalClient>>> | Total number of read operations from local client
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<ReadsFromRemoteClient>>> | Total number of read operations from remote
|
|
|
+ | client
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<WritesFromLocalClient>>> | Total number of write operations from local
|
|
|
+ | client
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<WritesFromRemoteClient>>> | Total number of write operations from remote
|
|
|
+ | client
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<BlocksGetLocalPathInfo>>> | Total number of operations to get local path
|
|
|
+ | names of blocks
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<FsyncCount>>> | Total number of fsync operations
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<VolumeFailures>>> | Total number of volume failures that have occurred
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<ReadBlockOpNumOps>>> | Total number of read operations
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<ReadBlockOpAvgTime>>> | Average time of read operations in milliseconds
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<WriteBlockOpNumOps>>> | Total number of write operations
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<WriteBlockOpAvgTime>>> | Average time of write operations in milliseconds
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<BlockChecksumOpNumOps>>> | Total number of blockChecksum operations
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<BlockChecksumOpAvgTime>>> | Average time of blockChecksum operations in
|
|
|
+ | milliseconds
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<CopyBlockOpNumOps>>> | Total number of block copy operations
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<CopyBlockOpAvgTime>>> | Average time of block copy operations in
|
|
|
+ | milliseconds
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<ReplaceBlockOpNumOps>>> | Total number of block replace operations
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<ReplaceBlockOpAvgTime>>> | Average time of block replace operations in
|
|
|
+ | milliseconds
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<HeartbeatsNumOps>>> | Total number of heartbeats
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<HeartbeatsAvgTime>>> | Average heartbeat time in milliseconds
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<BlockReportsNumOps>>> | Total number of block report operations
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<BlockReportsAvgTime>>> | Average time of block report operations in
|
|
|
+ | milliseconds
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<CacheReportsNumOps>>> | Total number of cache report operations
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<CacheReportsAvgTime>>> | Average time of cache report operations in
|
|
|
+ | milliseconds
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<PacketAckRoundTripTimeNanosNumOps>>> | Total number of ack round trips
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<PacketAckRoundTripTimeNanosAvgTime>>> | Average time from ack send to
|
|
|
+| | receive minus the downstream ack time in nanoseconds
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<FlushNanosNumOps>>> | Total number of flushes
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<FlushNanosAvgTime>>> | Average flush time in nanoseconds
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<FsyncNanosNumOps>>> | Total number of fsync operations
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<FsyncNanosAvgTime>>> | Average fsync time in nanoseconds
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<SendDataPacketBlockedOnNetworkNanosNumOps>>> | Total number of packets
|
|
|
+ | sent
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<SendDataPacketBlockedOnNetworkNanosAvgTime>>> | Average waiting time of
|
|
|
+| | sending packets in nanoseconds
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<SendDataPacketTransferNanosNumOps>>> | Total number of packets sent
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<SendDataPacketTransferNanosAvgTime>>> | Average transfer time of sending
|
|
|
+ | packets in nanoseconds
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+
|
|
|
+ugi context
|
|
|
+
|
|
|
+* UgiMetrics
|
|
|
+
|
|
|
+ UgiMetrics is related to user and group information.
|
|
|
+ Each metrics record contains the Hostname tag as additional information
|
|
|
+ along with metrics.
|
|
|
+
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|| Name || Description
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<LoginSuccessNumOps>>> | Total number of successful Kerberos logins
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<LoginSuccessAvgTime>>> | Average time for successful Kerberos logins in
|
|
|
+ | milliseconds
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<LoginFailureNumOps>>> | Total number of failed Kerberos logins
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<LoginFailureAvgTime>>> | Average time for failed Kerberos logins in
|
|
|
+ | milliseconds
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<getGroupsNumOps>>> | Total number of group resolutions
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<getGroupsAvgTime>>> | Average time for group resolution in milliseconds
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<getGroups>>><num><<<sNumOps>>> |
|
|
|
+| | Total number of group resolutions (<num> seconds granularity). <num> is
|
|
|
+| | specified by <<<hadoop.user.group.metrics.percentiles.intervals>>>.
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<getGroups>>><num><<<s50thPercentileLatency>>> |
|
|
|
+| | Shows the 50th percentile of group resolution time in milliseconds
|
|
|
+| | (<num> seconds granularity). <num> is specified by
|
|
|
+| | <<<hadoop.user.group.metrics.percentiles.intervals>>>.
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<getGroups>>><num><<<s75thPercentileLatency>>> |
|
|
|
+| | Shows the 75th percentile of group resolution time in milliseconds
|
|
|
+| | (<num> seconds granularity). <num> is specified by
|
|
|
+| | <<<hadoop.user.group.metrics.percentiles.intervals>>>.
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<getGroups>>><num><<<s90thPercentileLatency>>> |
|
|
|
+| | Shows the 90th percentile of group resolution time in milliseconds
|
|
|
+| | (<num> seconds granularity). <num> is specified by
|
|
|
+| | <<<hadoop.user.group.metrics.percentiles.intervals>>>.
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<getGroups>>><num><<<s95thPercentileLatency>>> |
|
|
|
+| | Shows the 95th percentile of group resolution time in milliseconds
|
|
|
+| | (<num> seconds granularity). <num> is specified by
|
|
|
+| | <<<hadoop.user.group.metrics.percentiles.intervals>>>.
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<getGroups>>><num><<<s99thPercentileLatency>>> |
|
|
|
+| | Shows the 99th percentile of group resolution time in milliseconds
|
|
|
+| | (<num> seconds granularity). <num> is specified by
|
|
|
+| | <<<hadoop.user.group.metrics.percentiles.intervals>>>.
|
|
|
+*-------------------------------------+--------------------------------------+
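 The naming pattern in the table above can be sketched in a few lines of
 Python. This is only an illustration, not part of Hadoop: the helper name
 <<<get_groups_metric_names>>> is hypothetical, and the sketch assumes that
 <<<hadoop.user.group.metrics.percentiles.intervals>>> holds a comma-separated
 list of intervals in whole seconds.

```python
# Percentiles listed in the UgiMetrics table above.
PERCENTILES = (50, 75, 90, 95, 99)


def get_groups_metric_names(intervals_property):
    """Expand a comma-separated list of intervals (seconds) into metric names.

    Assumption: the property value looks like "60,300". Each interval yields
    one NumOps metric plus one latency metric per percentile.
    """
    names = []
    for raw in intervals_property.split(","):
        raw = raw.strip()
        if not raw:
            continue  # tolerate empty entries such as a trailing comma
        num = int(raw)
        names.append(f"getGroups{num}sNumOps")
        for p in PERCENTILES:
            names.append(f"getGroups{num}s{p}thPercentileLatency")
    return names


if __name__ == "__main__":
    for name in get_groups_metric_names("60,300"):
        print(name)
```

 Under these assumptions, each configured interval contributes six metric
 names, so two intervals produce twelve.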
|
|
|
+
|
|
|
+metricssystem context
|
|
|
+
|
|
|
+* MetricsSystem
|
|
|
+
|
|
|
+ MetricsSystem shows the statistics for metrics snapshots and publishes.
|
|
|
+ Each metrics record contains the Hostname tag as additional information
|
|
|
+ along with metrics.
|
|
|
+
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|| Name || Description
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<NumActiveSources>>> | Current number of active metrics sources
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<NumAllSources>>> | Total number of metrics sources
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<NumActiveSinks>>> | Current number of active sinks
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<NumAllSinks>>> | Total number of sinks \
|
|
|
+ | (but usually less than <<<NumActiveSinks>>>,
|
|
|
+ | see {{{https://issues.apache.org/jira/browse/HADOOP-9946}HADOOP-9946}})
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<SnapshotNumOps>>> | Total number of operations to snapshot statistics from
|
|
|
+ | a metrics source
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<SnapshotAvgTime>>> | Average time in milliseconds to snapshot statistics
|
|
|
+ | from a metrics source
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<PublishNumOps>>> | Total number of operations to publish statistics to a
|
|
|
+ | sink
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<PublishAvgTime>>> | Average time in milliseconds to publish statistics to
|
|
|
+ | a sink
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<DroppedPubAll>>> | Total number of dropped publishes
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<Sink_>>><instance><<<NumOps>>> | Total number of sink operations for the
|
|
|
+ | <instance>
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<Sink_>>><instance><<<AvgTime>>> | Average time in milliseconds of sink
|
|
|
+ | operations for the <instance>
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<Sink_>>><instance><<<Dropped>>> | Total number of dropped sink operations
|
|
|
+ | for the <instance>
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<Sink_>>><instance><<<Qsize>>> | Current queue length of sink operations \
|
|
|
+ | (but always 0 because nothing
|
|
|
+ | increments this metric; see
|
|
|
+ | {{{https://issues.apache.org/jira/browse/HADOOP-9941}HADOOP-9941}})
|
|
|
+*-------------------------------------+--------------------------------------+
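 Because the per-sink metrics embed the sink instance name, a consumer of
 these records usually has to split each name back into instance and metric
 kind. The following Python sketch shows one way to do that. The function
 name <<<group_sink_metrics>>> and the sample instance name <<<file>>> are
 hypothetical, and the sketch assumes the metrics have already been collected
 into a plain dictionary.

```python
import re
from collections import defaultdict

# Matches Sink_<instance><kind>, where <kind> is one of the four suffixes
# listed in the MetricsSystem table above.
SINK_METRIC = re.compile(
    r"^Sink_(?P<instance>.+)(?P<kind>NumOps|AvgTime|Dropped|Qsize)$"
)


def group_sink_metrics(metrics):
    """Group Sink_<instance>* entries of a metrics dict by sink instance."""
    grouped = defaultdict(dict)
    for name, value in metrics.items():
        match = SINK_METRIC.match(name)
        if match:
            grouped[match.group("instance")][match.group("kind")] = value
    return dict(grouped)


if __name__ == "__main__":
    # Hypothetical sample record; the values are made up for illustration.
    sample = {
        "NumActiveSinks": 1,
        "Sink_fileNumOps": 10,
        "Sink_fileAvgTime": 2.5,
        "Sink_fileDropped": 0,
        "Sink_fileQsize": 0,
    }
    print(group_sink_metrics(sample))
```

 Metrics that do not start with <<<Sink_>>>, such as <<<NumActiveSinks>>>,
 are ignored by the pattern.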
|
|
|
+
|
|
|
+default context
|
|
|
+
|
|
|
+* StartupProgress
|
|
|
+
|
|
|
+ StartupProgress metrics show the statistics of NameNode startup.
|
|
|
+ Four metrics are exposed for each startup phase based on its name.
|
|
|
+ The startup <phase>s are <<<LoadingFsImage>>>, <<<LoadingEdits>>>,
|
|
|
+ <<<SavingCheckpoint>>>, and <<<SafeMode>>>.
|
|
|
+ Each metrics record contains the Hostname tag as additional information
|
|
|
+ along with metrics.
|
|
|
+
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|| Name || Description
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<ElapsedTime>>> | Total elapsed time in milliseconds
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<<<PercentComplete>>> | Fraction of NameNode startup completed so far \
|
|
|
+ | (the maximum value is 1.0, not 100)
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<phase><<<Count>>> | Total number of steps completed in the phase
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<phase><<<ElapsedTime>>> | Total elapsed time in the phase in milliseconds
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<phase><<<Total>>> | Total number of steps in the phase
|
|
|
+*-------------------------------------+--------------------------------------+
|
|
|
+|<phase><<<PercentComplete>>> | Fraction of the phase completed so far \
|
|
|
+ | (the maximum value is 1.0, not 100)
|
|
|
+*-------------------------------------+--------------------------------------+
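 As a rough illustration of the naming scheme above, the following Python
 sketch lists the per-phase metric names and converts a
 <<<PercentComplete>>> value to a conventional percentage. The helper names
 <<<phase_metric_names>>> and <<<as_percent>>> are hypothetical and are not
 part of Hadoop.

```python
# Startup phases and per-phase suffixes as listed in the StartupProgress table.
PHASES = ("LoadingFsImage", "LoadingEdits", "SavingCheckpoint", "SafeMode")
SUFFIXES = ("Count", "ElapsedTime", "Total", "PercentComplete")


def phase_metric_names():
    """Return the four metric names exposed for each startup phase."""
    return [phase + suffix for phase in PHASES for suffix in SUFFIXES]


def as_percent(fraction):
    """Convert a PercentComplete value (0.0 to 1.0) into a 0-100 percentage."""
    return round(fraction * 100.0, 1)


if __name__ == "__main__":
    for name in phase_metric_names():
        print(name)
    print(as_percent(0.25))
```

 The conversion matters mainly for dashboards or alerts that expect a 0-100
 scale, since the reported value tops out at 1.0.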
|