@@ -1,879 +0,0 @@
-~~ Licensed to the Apache Software Foundation (ASF) under one or more
-~~ contributor license agreements. See the NOTICE file distributed with
-~~ this work for additional information regarding copyright ownership.
-~~ The ASF licenses this file to You under the Apache License, Version 2.0
-~~ (the "License"); you may not use this file except in compliance with
-~~ the License. You may obtain a copy of the License at
-~~
-~~ http://www.apache.org/licenses/LICENSE-2.0
-~~
-~~ Unless required by applicable law or agreed to in writing, software
-~~ distributed under the License is distributed on an "AS IS" BASIS,
-~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-~~ See the License for the specific language governing permissions and
-~~ limitations under the License.
-
- ---
- Metrics Guide
- ---
- ---
- ${maven.build.timestamp}
-
-%{toc}
-
-Overview
-
- Metrics are statistical information exposed by Hadoop daemons,
- used for monitoring, performance tuning and debugging.
- Many metrics are available by default,
- and they are very useful for troubleshooting.
- This page describes the available metrics in detail.
-
- Each section describes one context into which metrics are grouped.
-
- The documentation of the Metrics 2.0 framework is
- {{{../../api/org/apache/hadoop/metrics2/package-summary.html}here}}.
-
-jvm context
-
-* JvmMetrics
-
- Each metrics record contains tags such as ProcessName, SessionId
- and Hostname as additional information along with metrics.
-
-*-------------------------------------+--------------------------------------+
-|| Name || Description
-*-------------------------------------+--------------------------------------+
-|<<<MemNonHeapUsedM>>> | Current non-heap memory used in MB
-*-------------------------------------+--------------------------------------+
-|<<<MemNonHeapCommittedM>>> | Current non-heap memory committed in MB
-*-------------------------------------+--------------------------------------+
-|<<<MemNonHeapMaxM>>> | Max non-heap memory size in MB
-*-------------------------------------+--------------------------------------+
-|<<<MemHeapUsedM>>> | Current heap memory used in MB
-*-------------------------------------+--------------------------------------+
-|<<<MemHeapCommittedM>>> | Current heap memory committed in MB
-*-------------------------------------+--------------------------------------+
-|<<<MemHeapMaxM>>> | Max heap memory size in MB
-*-------------------------------------+--------------------------------------+
-|<<<MemMaxM>>> | Max memory size in MB
-*-------------------------------------+--------------------------------------+
-|<<<ThreadsNew>>> | Current number of NEW threads
-*-------------------------------------+--------------------------------------+
-|<<<ThreadsRunnable>>> | Current number of RUNNABLE threads
-*-------------------------------------+--------------------------------------+
-|<<<ThreadsBlocked>>> | Current number of BLOCKED threads
-*-------------------------------------+--------------------------------------+
-|<<<ThreadsWaiting>>> | Current number of WAITING threads
-*-------------------------------------+--------------------------------------+
-|<<<ThreadsTimedWaiting>>> | Current number of TIMED_WAITING threads
-*-------------------------------------+--------------------------------------+
-|<<<ThreadsTerminated>>> | Current number of TERMINATED threads
-*-------------------------------------+--------------------------------------+
-|<<<GcInfo>>> | Total GC count and GC time in msec, grouped by the kind of GC. \
- | e.g. GcCountPS Scavenge=6, GcTimeMillisPS Scavenge=40,
- | GcCountPS MarkSweep=0, GcTimeMillisPS MarkSweep=0
-*-------------------------------------+--------------------------------------+
-|<<<GcCount>>> | Total GC count
-*-------------------------------------+--------------------------------------+
-|<<<GcTimeMillis>>> | Total GC time in msec
-*-------------------------------------+--------------------------------------+
-|<<<LogFatal>>> | Total number of FATAL logs
-*-------------------------------------+--------------------------------------+
-|<<<LogError>>> | Total number of ERROR logs
-*-------------------------------------+--------------------------------------+
-|<<<LogWarn>>> | Total number of WARN logs
-*-------------------------------------+--------------------------------------+
-|<<<LogInfo>>> | Total number of INFO logs
-*-------------------------------------+--------------------------------------+
-|<<<GcNumWarnThresholdExceeded>>> | Number of times that the GC warn
- | threshold is exceeded
-*-------------------------------------+--------------------------------------+
-|<<<GcNumInfoThresholdExceeded>>> | Number of times that the GC info
- | threshold is exceeded
-*-------------------------------------+--------------------------------------+
-|<<<GcTotalExtraSleepTime>>> | Total GC extra sleep time in msec
-*-------------------------------------+--------------------------------------+
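The GcInfo row above reports one count/time pair per garbage collector, named by prefixing the collector name. A minimal sketch of grouping those values back by collector, assuming the `GcCount<collector>` / `GcTimeMillis<collector>` naming from the example row and a flat `{metric name: value}` snapshot collected by whatever means you already use; the helper `per_collector_gc` and the sample values are hypothetical, not part of Hadoop.

```python
# Hypothetical helper (not part of Hadoop): group per-collector GcInfo values
# from a flat {metric name: value} snapshot, assuming the naming
# GcCount<collector> / GcTimeMillis<collector> shown in the GcInfo example.
def per_collector_gc(snapshot: dict[str, int]) -> dict[str, tuple[int, int]]:
    counts: dict[str, int] = {}
    times: dict[str, int] = {}
    for name, value in snapshot.items():
        # Skip the aggregate GcCount / GcTimeMillis totals themselves.
        if name.startswith("GcCount") and name != "GcCount":
            counts[name[len("GcCount"):]] = value
        elif name.startswith("GcTimeMillis") and name != "GcTimeMillis":
            times[name[len("GcTimeMillis"):]] = value
    # Pair (count, time in msec) per collector; a missing value defaults to 0.
    return {gc: (counts.get(gc, 0), times.get(gc, 0))
            for gc in sorted(counts.keys() | times.keys())}


snapshot = {
    "GcCountPS Scavenge": 6, "GcTimeMillisPS Scavenge": 40,
    "GcCountPS MarkSweep": 0, "GcTimeMillisPS MarkSweep": 0,
    "GcCount": 6, "GcTimeMillis": 40,
}
print(per_collector_gc(snapshot))  # {'PS MarkSweep': (0, 0), 'PS Scavenge': (6, 40)}
```

Skipping the exact names `GcCount` and `GcTimeMillis` keeps the aggregate totals out of the per-collector map, so they can be compared against the sum of the per-collector values if desired.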
-
-rpc context
-
-* rpc
-
- Each metrics record contains tags such as Hostname
- and port (the number to which the server is bound)
- as additional information along with metrics.
-
-*-------------------------------------+--------------------------------------+
-|| Name || Description
-*-------------------------------------+--------------------------------------+
-|<<<ReceivedBytes>>> | Total number of received bytes
-*-------------------------------------+--------------------------------------+
-|<<<SentBytes>>> | Total number of sent bytes
-*-------------------------------------+--------------------------------------+
-|<<<RpcQueueTimeNumOps>>> | Total number of RPC calls
-*-------------------------------------+--------------------------------------+
-|<<<RpcQueueTimeAvgTime>>> | Average queue time in milliseconds
-*-------------------------------------+--------------------------------------+
-|<<<RpcProcessingTimeNumOps>>> | Total number of RPC calls (same as
- | RpcQueueTimeNumOps)
-*-------------------------------------+--------------------------------------+
-|<<<RpcProcessingTimeAvgTime>>> | Average processing time in milliseconds
-*-------------------------------------+--------------------------------------+
-|<<<RpcAuthenticationFailures>>> | Total number of authentication failures
-*-------------------------------------+--------------------------------------+
-|<<<RpcAuthenticationSuccesses>>> | Total number of authentication successes
-*-------------------------------------+--------------------------------------+
-|<<<RpcAuthorizationFailures>>> | Total number of authorization failures
-*-------------------------------------+--------------------------------------+
-|<<<RpcAuthorizationSuccesses>>> | Total number of authorization successes
-*-------------------------------------+--------------------------------------+
-|<<<NumOpenConnections>>> | Current number of open connections
-*-------------------------------------+--------------------------------------+
-|<<<CallQueueLength>>> | Current length of the call queue
-*-------------------------------------+--------------------------------------+
-|<<<rpcQueueTime>>><num><<<sNumOps>>> | Shows the total number of RPC calls
-| | (<num> seconds granularity) if <<<rpc.metrics.quantile.enable>>> is set to
-| | true. <num> is specified by <<<rpc.metrics.percentiles.intervals>>>.
-*-------------------------------------+--------------------------------------+
-|<<<rpcQueueTime>>><num><<<s50thPercentileLatency>>> |
-| | Shows the 50th percentile of RPC queue time in milliseconds
-| | (<num> seconds granularity) if <<<rpc.metrics.quantile.enable>>> is set to
-| | true. <num> is specified by <<<rpc.metrics.percentiles.intervals>>>.
-*-------------------------------------+--------------------------------------+
-|<<<rpcQueueTime>>><num><<<s75thPercentileLatency>>> |
-| | Shows the 75th percentile of RPC queue time in milliseconds
-| | (<num> seconds granularity) if <<<rpc.metrics.quantile.enable>>> is set to
-| | true. <num> is specified by <<<rpc.metrics.percentiles.intervals>>>.
-*-------------------------------------+--------------------------------------+
-|<<<rpcQueueTime>>><num><<<s90thPercentileLatency>>> |
-| | Shows the 90th percentile of RPC queue time in milliseconds
-| | (<num> seconds granularity) if <<<rpc.metrics.quantile.enable>>> is set to
-| | true. <num> is specified by <<<rpc.metrics.percentiles.intervals>>>.
-*-------------------------------------+--------------------------------------+
-|<<<rpcQueueTime>>><num><<<s95thPercentileLatency>>> |
-| | Shows the 95th percentile of RPC queue time in milliseconds
-| | (<num> seconds granularity) if <<<rpc.metrics.quantile.enable>>> is set to
-| | true. <num> is specified by <<<rpc.metrics.percentiles.intervals>>>.
-*-------------------------------------+--------------------------------------+
-|<<<rpcQueueTime>>><num><<<s99thPercentileLatency>>> |
-| | Shows the 99th percentile of RPC queue time in milliseconds
-| | (<num> seconds granularity) if <<<rpc.metrics.quantile.enable>>> is set to
-| | true. <num> is specified by <<<rpc.metrics.percentiles.intervals>>>.
-*-------------------------------------+--------------------------------------+
-|<<<rpcProcessingTime>>><num><<<sNumOps>>> | Shows the total number of RPC calls
-| | (<num> seconds granularity) if <<<rpc.metrics.quantile.enable>>> is set to
-| | true. <num> is specified by <<<rpc.metrics.percentiles.intervals>>>.
-*-------------------------------------+--------------------------------------+
-|<<<rpcProcessingTime>>><num><<<s50thPercentileLatency>>> |
-| | Shows the 50th percentile of RPC processing time in milliseconds
-| | (<num> seconds granularity) if <<<rpc.metrics.quantile.enable>>> is set to
-| | true. <num> is specified by <<<rpc.metrics.percentiles.intervals>>>.
-*-------------------------------------+--------------------------------------+
-|<<<rpcProcessingTime>>><num><<<s75thPercentileLatency>>> |
-| | Shows the 75th percentile of RPC processing time in milliseconds
-| | (<num> seconds granularity) if <<<rpc.metrics.quantile.enable>>> is set to
-| | true. <num> is specified by <<<rpc.metrics.percentiles.intervals>>>.
-*-------------------------------------+--------------------------------------+
-|<<<rpcProcessingTime>>><num><<<s90thPercentileLatency>>> |
-| | Shows the 90th percentile of RPC processing time in milliseconds
-| | (<num> seconds granularity) if <<<rpc.metrics.quantile.enable>>> is set to
-| | true. <num> is specified by <<<rpc.metrics.percentiles.intervals>>>.
-*-------------------------------------+--------------------------------------+
-|<<<rpcProcessingTime>>><num><<<s95thPercentileLatency>>> |
-| | Shows the 95th percentile of RPC processing time in milliseconds
-| | (<num> seconds granularity) if <<<rpc.metrics.quantile.enable>>> is set to
-| | true. <num> is specified by <<<rpc.metrics.percentiles.intervals>>>.
-*-------------------------------------+--------------------------------------+
-|<<<rpcProcessingTime>>><num><<<s99thPercentileLatency>>> |
-| | Shows the 99th percentile of RPC processing time in milliseconds
-| | (<num> seconds granularity) if <<<rpc.metrics.quantile.enable>>> is set to
-| | true. <num> is specified by <<<rpc.metrics.percentiles.intervals>>>.
-*-------------------------------------+--------------------------------------+
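The quantile rows above share one naming pattern: a prefix, the interval <num> in seconds, then either `sNumOps` or `s<p>thPercentileLatency`. A minimal sketch that expands that pattern for a given interval setting, assuming `rpc.metrics.percentiles.intervals` holds a comma-separated list of seconds; the helpers `parse_intervals` and `quantile_metric_names` are hypothetical and simply follow the spelling used in this table (the exact capitalization reported by a running daemon may differ, so check it against your own output).

```python
# Hypothetical helpers (not a Hadoop API): list the quantile metric names an
# RPC server is expected to expose when rpc.metrics.quantile.enable is true,
# following the <prefix><num>s... pattern from the table above.
PERCENTILES = (50, 75, 90, 95, 99)


def parse_intervals(value: str) -> list[int]:
    """Parse an rpc.metrics.percentiles.intervals value, assumed comma-separated."""
    return [int(part) for part in value.split(",") if part.strip()]


def quantile_metric_names(prefix: str, intervals: list[int]) -> list[str]:
    names = []
    for num in intervals:
        names.append(f"{prefix}{num}sNumOps")
        for p in PERCENTILES:
            names.append(f"{prefix}{num}s{p}thPercentileLatency")
    return names


intervals = parse_intervals("60,300")
for prefix in ("rpcQueueTime", "rpcProcessingTime"):
    for name in quantile_metric_names(prefix, intervals):
        print(name)
```

Each configured interval yields one `NumOps` metric plus five percentile metrics per prefix, so two intervals produce 24 names across the queue-time and processing-time families.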
-
-* RetryCache/NameNodeRetryCache
-
- RetryCache metrics are useful for monitoring NameNode fail-over.
- Each metrics record contains the Hostname tag.
-
-*-------------------------------------+--------------------------------------+
-|| Name || Description
-*-------------------------------------+--------------------------------------+
-|<<<CacheHit>>> | Total number of RetryCache hits
-*-------------------------------------+--------------------------------------+
-|<<<CacheCleared>>> | Total number of RetryCache clears
-*-------------------------------------+--------------------------------------+
-|<<<CacheUpdated>>> | Total number of RetryCache updates
-*-------------------------------------+--------------------------------------+
-
-rpcdetailed context
-
- Metrics of the rpcdetailed context are exposed in a unified manner by the RPC
- layer. Two metrics are exposed for each RPC, named after the RPC method.
- Metrics named "(RPC method name)NumOps" indicate the total number of
- method calls, and metrics named "(RPC method name)AvgTime" show the
- average turnaround time for method calls in milliseconds.
-
-* rpcdetailed
-
- Each metrics record contains tags such as Hostname
- and port (the number to which the server is bound)
- as additional information along with metrics.
-
- Metrics for RPCs that have not been called are not included
- in the metrics record.
-
-*-------------------------------------+--------------------------------------+
-|| Name || Description
-*-------------------------------------+--------------------------------------+
-|<methodname><<<NumOps>>> | Total number of times the method is called
-*-------------------------------------+--------------------------------------+
-|<methodname><<<AvgTime>>> | Average turnaround time of the method in
- | milliseconds
-*-------------------------------------+--------------------------------------+
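Because every rpcdetailed metric is just the RPC method name with a `NumOps` or `AvgTime` suffix, a flat record can be regrouped per method. A minimal sketch, assuming a `{metric name: value}` snapshot; the helper `group_by_method` and the sample method names and values are illustrative only, not a Hadoop API.

```python
import re

# Hypothetical helper (not a Hadoop API): split rpcdetailed metric names of
# the form <methodname>NumOps / <methodname>AvgTime into per-method records.
_NAME = re.compile(r"^(?P<method>.+?)(?P<kind>NumOps|AvgTime)$")


def group_by_method(snapshot: dict[str, float]) -> dict[str, dict[str, float]]:
    methods: dict[str, dict[str, float]] = {}
    for name, value in snapshot.items():
        match = _NAME.match(name)
        if match is None:
            continue  # not a per-method rpcdetailed metric
        methods.setdefault(match["method"], {})[match["kind"]] = value
    return methods


# Methods that were never called simply have no entries in the snapshot.
snapshot = {"getBlockLocationsNumOps": 120, "getBlockLocationsAvgTime": 0.5,
            "renewLeaseNumOps": 30, "renewLeaseAvgTime": 0.1}
print(group_by_method(snapshot))
```

The suffix match is anchored at the end of the name, so a method whose name happens to contain `NumOps` in the middle is still split correctly at the final suffix.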
-
-dfs context
-
-* namenode
-
- Each metrics record contains tags such as ProcessName, SessionId,
- and Hostname as additional information along with metrics.
-
-*-------------------------------------+--------------------------------------+
-|| Name || Description
-*-------------------------------------+--------------------------------------+
-|<<<CreateFileOps>>> | Total number of files created
-*-------------------------------------+--------------------------------------+
-|<<<FilesCreated>>> | Total number of files and directories created by create
- | or mkdir operations
-*-------------------------------------+--------------------------------------+
-|<<<FilesAppended>>> | Total number of files appended
-*-------------------------------------+--------------------------------------+
-|<<<GetBlockLocations>>> | Total number of getBlockLocations operations
-*-------------------------------------+--------------------------------------+
-|<<<FilesRenamed>>> | Total number of rename <<operations>> (NOT number of
- | files/dirs renamed)
-*-------------------------------------+--------------------------------------+
-|<<<GetListingOps>>> | Total number of directory listing operations
-*-------------------------------------+--------------------------------------+
-|<<<DeleteFileOps>>> | Total number of delete operations
-*-------------------------------------+--------------------------------------+
-|<<<FilesDeleted>>> | Total number of files and directories deleted by delete
- | or rename operations
-*-------------------------------------+--------------------------------------+
-|<<<FileInfoOps>>> | Total number of getFileInfo and getLinkFileInfo
- | operations
-*-------------------------------------+--------------------------------------+
-|<<<AddBlockOps>>> | Total number of successful addBlock operations
-*-------------------------------------+--------------------------------------+
-|<<<GetAdditionalDatanodeOps>>> | Total number of getAdditionalDatanode
- | operations
-*-------------------------------------+--------------------------------------+
-|<<<CreateSymlinkOps>>> | Total number of createSymlink operations
-*-------------------------------------+--------------------------------------+
-|<<<GetLinkTargetOps>>> | Total number of getLinkTarget operations
-*-------------------------------------+--------------------------------------+
-|<<<FilesInGetListingOps>>> | Total number of files and directories listed by
- | directory listing operations
-*-------------------------------------+--------------------------------------+
-|<<<AllowSnapshotOps>>> | Total number of allowSnapshot operations
-*-------------------------------------+--------------------------------------+
-|<<<DisallowSnapshotOps>>> | Total number of disallowSnapshot operations
-*-------------------------------------+--------------------------------------+
-|<<<CreateSnapshotOps>>> | Total number of createSnapshot operations
-*-------------------------------------+--------------------------------------+
-|<<<DeleteSnapshotOps>>> | Total number of deleteSnapshot operations
-*-------------------------------------+--------------------------------------+
-|<<<RenameSnapshotOps>>> | Total number of renameSnapshot operations
-*-------------------------------------+--------------------------------------+
-|<<<ListSnapshottableDirOps>>> | Total number of snapshottableDirectoryStatus
- | operations
-*-------------------------------------+--------------------------------------+
-|<<<SnapshotDiffReportOps>>> | Total number of getSnapshotDiffReport
- | operations
-*-------------------------------------+--------------------------------------+
-|<<<TransactionsNumOps>>> | Total number of Journal transactions
-*-------------------------------------+--------------------------------------+
-|<<<TransactionsAvgTime>>> | Average time of Journal transactions in
- | milliseconds
-*-------------------------------------+--------------------------------------+
-|<<<SyncsNumOps>>> | Total number of Journal syncs
-*-------------------------------------+--------------------------------------+
-|<<<SyncsAvgTime>>> | Average time of Journal syncs in milliseconds
-*-------------------------------------+--------------------------------------+
-|<<<TransactionsBatchedInSync>>> | Total number of Journal transactions batched
- | in sync
-*-------------------------------------+--------------------------------------+
-|<<<BlockReportNumOps>>> | Total number of block reports processed from
- | DataNodes
-*-------------------------------------+--------------------------------------+
-|<<<BlockReportAvgTime>>> | Average time of processing block reports in
- | milliseconds
-*-------------------------------------+--------------------------------------+
-|<<<CacheReportNumOps>>> | Total number of cache reports processed from
- | DataNodes
-*-------------------------------------+--------------------------------------+
-|<<<CacheReportAvgTime>>> | Average time of processing cache reports in
- | milliseconds
-*-------------------------------------+--------------------------------------+
-|<<<SafeModeTime>>> | The interval in milliseconds between the FSNamesystem
- | start and the last time safemode was left. \
- | (Sometimes not equal to the time spent in safemode,
- | see {{{https://issues.apache.org/jira/browse/HDFS-5156}HDFS-5156}})
-*-------------------------------------+--------------------------------------+
-|<<<FsImageLoadTime>>> | Time spent loading the FsImage at startup in milliseconds
-*-------------------------------------+--------------------------------------+
-|<<<GetEditNumOps>>> | Total number of edit downloads from SecondaryNameNode
-*-------------------------------------+--------------------------------------+
-|<<<GetEditAvgTime>>> | Average edit download time in milliseconds
-*-------------------------------------+--------------------------------------+
-|<<<GetImageNumOps>>> | Total number of fsimage downloads from SecondaryNameNode
-*-------------------------------------+--------------------------------------+
-|<<<GetImageAvgTime>>> | Average fsimage download time in milliseconds
-*-------------------------------------+--------------------------------------+
-|<<<PutImageNumOps>>> | Total number of fsimage uploads to SecondaryNameNode
-*-------------------------------------+--------------------------------------+
-|<<<PutImageAvgTime>>> | Average fsimage upload time in milliseconds
-*-------------------------------------+--------------------------------------+
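Most rows above are cumulative "Total number of ..." counters, so throughput is usually read as the difference between two samples divided by the time between them. A minimal sketch, assuming the counters only grow while the NameNode runs and restart from 0 when it restarts; `rate_per_second` is a hypothetical helper, and it should not be applied to the `AvgTime` rows, which are already averages rather than counters.

```python
# Hypothetical helper (not a Hadoop API): turn two samples of a cumulative
# "Total number of ..." counter into a per-second rate.
def rate_per_second(prev_value: float, prev_ts: float,
                    curr_value: float, curr_ts: float) -> float:
    elapsed = curr_ts - prev_ts
    if elapsed <= 0:
        raise ValueError("samples must be taken in increasing time order")
    delta = curr_value - prev_value
    if delta < 0:
        # Assumption: a counter that went backwards means the daemon restarted
        # and has counted up from 0 since then.
        delta = curr_value
    return delta / elapsed


# e.g. CreateFileOps sampled 60 seconds apart
print(rate_per_second(1000, 0.0, 1600, 60.0))  # 10.0
```

The restart branch underestimates the true rate when the restart happened mid-interval, which is usually acceptable for dashboards but worth keeping in mind when alerting on it.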
-
-* FSNamesystem
-
- Each metrics record contains tags such as HAState and Hostname
- as additional information along with metrics.
-
-*-------------------------------------+--------------------------------------+
-|| Name || Description
-*-------------------------------------+--------------------------------------+
-|<<<MissingBlocks>>> | Current number of missing blocks
-*-------------------------------------+--------------------------------------+
-|<<<ExpiredHeartbeats>>> | Total number of expired heartbeats
-*-------------------------------------+--------------------------------------+
-|<<<TransactionsSinceLastCheckpoint>>> | Total number of transactions since
- | last checkpoint
-*-------------------------------------+--------------------------------------+
-|<<<TransactionsSinceLastLogRoll>>> | Total number of transactions since last
- | edit log roll
-*-------------------------------------+--------------------------------------+
-|<<<LastWrittenTransactionId>>> | Last transaction ID written to the edit log
-*-------------------------------------+--------------------------------------+
-|<<<LastCheckpointTime>>> | Time of the last checkpoint in milliseconds since epoch
-*-------------------------------------+--------------------------------------+
-|<<<CapacityTotal>>> | Current raw capacity of DataNodes in bytes
-*-------------------------------------+--------------------------------------+
-|<<<CapacityTotalGB>>> | Current raw capacity of DataNodes in GB
-*-------------------------------------+--------------------------------------+
-|<<<CapacityUsed>>> | Current used capacity across all DataNodes in bytes
-*-------------------------------------+--------------------------------------+
-|<<<CapacityUsedGB>>> | Current used capacity across all DataNodes in GB
-*-------------------------------------+--------------------------------------+
-|<<<CapacityRemaining>>> | Current remaining capacity in bytes
-*-------------------------------------+--------------------------------------+
-|<<<CapacityRemainingGB>>> | Current remaining capacity in GB
-*-------------------------------------+--------------------------------------+
-|<<<CapacityUsedNonDFS>>> | Current space used by DataNodes for non-DFS
- | purposes in bytes
-*-------------------------------------+--------------------------------------+
-|<<<TotalLoad>>> | Current number of connections
-*-------------------------------------+--------------------------------------+
-|<<<SnapshottableDirectories>>> | Current number of snapshottable directories
-*-------------------------------------+--------------------------------------+
-|<<<Snapshots>>> | Current number of snapshots
-*-------------------------------------+--------------------------------------+
-|<<<BlocksTotal>>> | Current number of allocated blocks in the system
-*-------------------------------------+--------------------------------------+
-|<<<FilesTotal>>> | Current number of files and directories
-*-------------------------------------+--------------------------------------+
-|<<<PendingReplicationBlocks>>> | Current number of blocks pending
- | replication
-*-------------------------------------+--------------------------------------+
-|<<<UnderReplicatedBlocks>>> | Current number of under-replicated blocks
-*-------------------------------------+--------------------------------------+
-|<<<CorruptBlocks>>> | Current number of blocks with corrupt replicas
-*-------------------------------------+--------------------------------------+
-|<<<ScheduledReplicationBlocks>>> | Current number of blocks scheduled for
- | replication
-*-------------------------------------+--------------------------------------+
-|<<<PendingDeletionBlocks>>> | Current number of blocks pending deletion
-*-------------------------------------+--------------------------------------+
-|<<<ExcessBlocks>>> | Current number of excess blocks
-*-------------------------------------+--------------------------------------+
-|<<<PostponedMisreplicatedBlocks>>> | (HA-only) Current number of blocks
- | whose replication has been postponed
-*-------------------------------------+--------------------------------------+
-|<<<PendingDataNodeMessageCount>>> | (HA-only) Current number of pending
- | block-related messages for later
- | processing in the standby NameNode
-*-------------------------------------+--------------------------------------+
-|<<<MillisSinceLastLoadedEdits>>> | (HA-only) Time in milliseconds since the
- | standby NameNode last loaded the edit log.
- | Set to 0 on the active NameNode
-*-------------------------------------+--------------------------------------+
-|<<<BlockCapacity>>> | Current block capacity
-*-------------------------------------+--------------------------------------+
-|<<<StaleDataNodes>>> | Current number of DataNodes marked stale due to delayed
- | heartbeat
-*-------------------------------------+--------------------------------------+
-|<<<TotalFiles>>> | Current number of files and directories (same as FilesTotal)
-*-------------------------------------+--------------------------------------+
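The capacity rows come in byte and GB pairs, and a used-capacity percentage is a common derived figure. A minimal sketch, assuming "GB" means 2^30 bytes and that the GB variants are rounded to the nearest whole GB (half up); both are assumptions to verify against the values your NameNode actually reports. `capacity_summary`, `to_gb`, and the `UsedPercent` key are hypothetical names.

```python
GIB = 1024 ** 3  # assumption: "GB" in the *GB metrics means 2**30 bytes


def to_gb(num_bytes: int) -> int:
    # Assumed rounding rule: round half up to the nearest whole GB.
    return int(num_bytes / GIB + 0.5)


def capacity_summary(total: int, used: int, remaining: int) -> dict[str, float]:
    """Summarize CapacityTotal / CapacityUsed / CapacityRemaining (bytes)."""
    used_pct = 100.0 * used / total if total else 0.0  # avoid dividing by 0
    return {
        "CapacityTotalGB": to_gb(total),
        "CapacityUsedGB": to_gb(used),
        "CapacityRemainingGB": to_gb(remaining),
        "UsedPercent": used_pct,
    }


print(capacity_summary(10 * GIB, 3 * GIB, 6 * GIB))
```

Deriving the percentage from the byte values rather than the GB values avoids compounding the rounding of the GB variants.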
|
|
|
-
|
|
|
-* JournalNode
|
|
|
-
|
|
|
- The server-side metrics for a journal from the JournalNode's perspective.
|
|
|
- Each metrics record contains Hostname tag as additional information
|
|
|
- along with metrics.
|
|
|
-
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|| Name || Description
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<Syncs60sNumOps>>> | Number of sync operations (1 minute granularity)
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<Syncs60s50thPercentileLatencyMicros>>> | The 50th percentile of sync
|
|
|
-| | latency in microseconds (1 minute granularity)
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<Syncs60s75thPercentileLatencyMicros>>> | The 75th percentile of sync
|
|
|
-| | latency in microseconds (1 minute granularity)
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<Syncs60s90thPercentileLatencyMicros>>> | The 90th percentile of sync
|
|
|
-| | latency in microseconds (1 minute granularity)
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<Syncs60s95thPercentileLatencyMicros>>> | The 95th percentile of sync
|
|
|
-| | latency in microseconds (1 minute granularity)
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<Syncs60s99thPercentileLatencyMicros>>> | The 99th percentile of sync
|
|
|
-| | latency in microseconds (1 minute granularity)
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<Syncs300sNumOps>>> | Number of sync operations (5 minutes granularity)
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<Syncs300s50thPercentileLatencyMicros>>> | The 50th percentile of sync
|
|
|
-| | latency in microseconds (5 minutes granularity)
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<Syncs300s75thPercentileLatencyMicros>>> | The 75th percentile of sync
|
|
|
-| | latency in microseconds (5 minutes granularity)
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<Syncs300s90thPercentileLatencyMicros>>> | The 90th percentile of sync
|
|
|
-| | latency in microseconds (5 minutes granularity)
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<Syncs300s95thPercentileLatencyMicros>>> | The 95th percentile of sync
|
|
|
-| | latency in microseconds (5 minutes granularity)
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<Syncs300s99thPercentileLatencyMicros>>> | The 99th percentile of sync
|
|
|
-| | latency in microseconds (5 minutes granularity)
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<Syncs3600sNumOps>>> | Number of sync operations (1 hour granularity)
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<Syncs3600s50thPercentileLatencyMicros>>> | The 50th percentile of sync
|
|
|
-| | latency in microseconds (1 hour granularity)
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<Syncs3600s75thPercentileLatencyMicros>>> | The 75th percentile of sync
|
|
|
-| | latency in microseconds (1 hour granularity)
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<Syncs3600s90thPercentileLatencyMicros>>> | The 90th percentile of sync
|
|
|
-| | latency in microseconds (1 hour granularity)
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<Syncs3600s95thPercentileLatencyMicros>>> | The 95th percentile of sync
|
|
|
-| | latency in microseconds (1 hour granularity)
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<Syncs3600s99thPercentileLatencyMicros>>> | The 99th percentile of sync
|
|
|
-| | latency in microseconds (1 hour granularity)
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<BatchesWritten>>> | Total number of batches written since startup
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<TxnsWritten>>> | Total number of transactions written since startup
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<BytesWritten>>> | Total number of bytes written since startup
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<BatchesWrittenWhileLagging>>> | Total number of batches written where this
|
|
|
-| | node was lagging
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<LastWriterEpoch>>> | Current writer's epoch number
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<CurrentLagTxns>>> | The number of transactions that this JournalNode is
|
|
|
-| | lagging behind
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<LastWrittenTxId>>> | The highest transaction id stored on this JournalNode
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<LastPromisedEpoch>>> | The last epoch number this node has promised; it will
|
|
|
-| | not accept any lower epoch. 0 if no promises have been made
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-
|
|
|
-* datanode
|
|
|
-
|
|
|
- Each metrics record contains tags such as SessionId and Hostname
|
|
|
- as additional information along with metrics.
|
|
|
-
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|| Name || Description
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<BytesWritten>>> | Total number of bytes written to the DataNode
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<BytesRead>>> | Total number of bytes read from the DataNode
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<BlocksWritten>>> | Total number of blocks written to the DataNode
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<BlocksRead>>> | Total number of blocks read from the DataNode
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<BlocksReplicated>>> | Total number of blocks replicated
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<BlocksRemoved>>> | Total number of blocks removed
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<BlocksVerified>>> | Total number of blocks verified
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<BlockVerificationFailures>>> | Total number of verification failures
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<BlocksCached>>> | Total number of blocks cached
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<BlocksUncached>>> | Total number of blocks uncached
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<ReadsFromLocalClient>>> | Total number of read operations from local clients
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<ReadsFromRemoteClient>>> | Total number of read operations from remote
|
|
|
- | clients
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<WritesFromLocalClient>>> | Total number of write operations from local
|
|
|
- | clients
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<WritesFromRemoteClient>>> | Total number of write operations from remote
|
|
|
- | clients
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<BlocksGetLocalPathInfo>>> | Total number of operations to get local path
|
|
|
- | names of blocks
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<FsyncCount>>> | Total number of fsync operations
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<VolumeFailures>>> | Total number of volume failures that have occurred
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<ReadBlockOpNumOps>>> | Total number of read operations
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<ReadBlockOpAvgTime>>> | Average time of read operations in milliseconds
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<WriteBlockOpNumOps>>> | Total number of write operations
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<WriteBlockOpAvgTime>>> | Average time of write operations in milliseconds
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<BlockChecksumOpNumOps>>> | Total number of blockChecksum operations
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<BlockChecksumOpAvgTime>>> | Average time of blockChecksum operations in
|
|
|
- | milliseconds
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<CopyBlockOpNumOps>>> | Total number of block copy operations
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<CopyBlockOpAvgTime>>> | Average time of block copy operations in
|
|
|
- | milliseconds
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<ReplaceBlockOpNumOps>>> | Total number of block replace operations
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<ReplaceBlockOpAvgTime>>> | Average time of block replace operations in
|
|
|
- | milliseconds
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<HeartbeatsNumOps>>> | Total number of heartbeats
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<HeartbeatsAvgTime>>> | Average heartbeat time in milliseconds
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<BlockReportsNumOps>>> | Total number of block report operations
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<BlockReportsAvgTime>>> | Average time of block report operations in
|
|
|
- | milliseconds
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<CacheReportsNumOps>>> | Total number of cache report operations
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<CacheReportsAvgTime>>> | Average time of cache report operations in
|
|
|
- | milliseconds
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<PacketAckRoundTripTimeNanosNumOps>>> | Total number of ack round trips
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<PacketAckRoundTripTimeNanosAvgTime>>> | Average time from ack send to
|
|
|
-| | receive minus the downstream ack time in nanoseconds
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<FlushNanosNumOps>>> | Total number of flushes
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<FlushNanosAvgTime>>> | Average flush time in nanoseconds
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<FsyncNanosNumOps>>> | Total number of fsync operations
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<FsyncNanosAvgTime>>> | Average fsync time in nanoseconds
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<SendDataPacketBlockedOnNetworkNanosNumOps>>> | Total number of packets
|
|
|
- | sent
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<SendDataPacketBlockedOnNetworkNanosAvgTime>>> | Average time spent blocked on
|
|
|
-| | the network while sending packets, in nanoseconds
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<SendDataPacketTransferNanosNumOps>>> | Total number of packets sent
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<SendDataPacketTransferNanosAvgTime>>> | Average transfer time of sent
|
|
|
- | packets in nanoseconds
|
|
|
-*-------------------------------------+--------------------------------------+
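The DataNode counters above, such as BytesWritten and BlocksRead, appear to grow monotonically (the JournalNode table describes its own counters as "since startup"). Under that assumption, you need two samples to turn one of them into a rate. The sketch below is a minimal, hypothetical helper and not part of Hadoop. If the counter decreases between samples, it assumes the daemon restarted and the counter began again from zero.

```python
def counter_rate(prev_value: int, curr_value: int, interval_seconds: float) -> float:
    """Per-second rate of a cumulative counter between two samples.

    Hypothetical helper: assumes the counter is cumulative since daemon
    startup and resets to zero when the daemon restarts.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    delta = curr_value - prev_value
    if delta < 0:
        # Counter went backwards: assume a restart, so everything counted
        # since then is the current value itself.
        delta = curr_value
    return delta / interval_seconds


# Hypothetical BytesWritten samples taken 60 seconds apart.
print(counter_rate(1_000_000, 7_000_000, 60))  # 100000.0
```

The restart heuristic can undercount when a restart happens mid-interval. Treat the result as an estimate.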
|
|
|
-
|
|
|
-yarn context
|
|
|
-
|
|
|
-* ClusterMetrics
|
|
|
-
|
|
|
- ClusterMetrics shows the metrics of the YARN cluster from the
|
|
|
- ResourceManager's perspective. Each metrics record contains
|
|
|
- a Hostname tag as additional information along with metrics.
|
|
|
-
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|| Name || Description
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<NumActiveNMs>>> | Current number of active NodeManagers
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<NumDecommissionedNMs>>> | Current number of decommissioned NodeManagers
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<NumLostNMs>>> | Current number of lost NodeManagers for not sending
|
|
|
- | heartbeats
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<NumUnhealthyNMs>>> | Current number of unhealthy NodeManagers
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<NumRebootedNMs>>> | Current number of rebooted NodeManagers
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-
|
|
|
-* QueueMetrics
|
|
|
-
|
|
|
- QueueMetrics shows an application queue from the
|
|
|
- ResourceManager's perspective. Each metrics record shows
|
|
|
- the statistics of each queue, and contains tags such as
|
|
|
- queue name and Hostname as additional information along with metrics.
|
|
|
-
|
|
|
- In <<<running_>>><num> metrics such as <<<running_0>>>, you can set the
|
|
|
- property <<<yarn.resourcemanager.metrics.runtime.buckets>>> in yarn-site.xml
|
|
|
- to change the buckets. The default value is <<<60,300,1440>>>.
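To illustrate how a bucket list maps an application's elapsed time to a <<<running_>>><num> metric, here is a minimal sketch. The function name is hypothetical. It assumes each bucket's lower bound is inclusive and its upper bound exclusive, which the table below does not state explicitly.

```python
def running_bucket(elapsed_minutes: float, buckets: str = "60,300,1440") -> str:
    """Return the running_<num> metric name an application falls into.

    `buckets` mimics the comma-separated format of
    yarn.resourcemanager.metrics.runtime.buckets (minutes).
    Assumption: lower bounds are inclusive, upper bounds exclusive.
    """
    cutoffs = [int(b) for b in buckets.split(",")]
    lower = 0
    for cutoff in cutoffs:
        if elapsed_minutes < cutoff:
            return f"running_{lower}"
        lower = cutoff
    # Elapsed time is at or beyond the last cutoff.
    return f"running_{lower}"


print(running_bucket(90))  # running_60
```

With the default buckets there are four metrics, one more than the number of cutoffs.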
|
|
|
-
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|| Name || Description
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<running_0>>> | Current number of running applications whose elapsed time is
|
|
|
- | less than 60 minutes
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<running_60>>> | Current number of running applications whose elapsed time is
|
|
|
- | between 60 and 300 minutes
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<running_300>>> | Current number of running applications whose elapsed time is
|
|
|
- | between 300 and 1440 minutes
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<running_1440>>> | Current number of running applications whose elapsed time is
|
|
|
- | more than 1440 minutes
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<AppsSubmitted>>> | Total number of submitted applications
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<AppsRunning>>> | Current number of running applications
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<AppsPending>>> | Current number of applications that have not yet been
|
|
|
- | assigned any containers
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<AppsCompleted>>> | Total number of completed applications
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<AppsKilled>>> | Total number of killed applications
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<AppsFailed>>> | Total number of failed applications
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<AllocatedMB>>> | Current allocated memory in MB
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<AllocatedVCores>>> | Current allocated CPU in virtual cores
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<AllocatedContainers>>> | Current number of allocated containers
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<AggregateContainersAllocated>>> | Total number of allocated containers
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<AggregateContainersReleased>>> | Total number of released containers
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<AvailableMB>>> | Current available memory in MB
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<AvailableVCores>>> | Current available CPU in virtual cores
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<PendingMB>>> | Current pending memory resource requests in MB that are
|
|
|
- | not yet fulfilled by the scheduler
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<PendingVCores>>> | Current pending CPU allocation requests in virtual
|
|
|
- | cores that are not yet fulfilled by the scheduler
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<PendingContainers>>> | Current number of pending container requests that are not
|
|
|
- | yet fulfilled by the scheduler
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<ReservedMB>>> | Current reserved memory in MB
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<ReservedVCores>>> | Current reserved CPU in virtual cores
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<ReservedContainers>>> | Current number of reserved containers
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<ActiveUsers>>> | Current number of active users
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<ActiveApplications>>> | Current number of active applications
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<FairShareMB>>> | (FairScheduler only) Current fair share of memory in MB
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<FairShareVCores>>> | (FairScheduler only) Current fair share of CPU in
|
|
|
- | virtual cores
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<MinShareMB>>> | (FairScheduler only) Minimum share of memory in MB
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<MinShareVCores>>> | (FairScheduler only) Minimum share of CPU in virtual
|
|
|
- | cores
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<MaxShareMB>>> | (FairScheduler only) Maximum share of memory in MB
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<MaxShareVCores>>> | (FairScheduler only) Maximum share of CPU in virtual
|
|
|
- | cores
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-
|
|
|
-* NodeManagerMetrics
|
|
|
-
|
|
|
- NodeManagerMetrics shows the statistics of the containers in the node.
|
|
|
- Each metrics record contains a Hostname tag as additional information
|
|
|
- along with metrics.
|
|
|
-
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|| Name || Description
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<containersLaunched>>> | Total number of launched containers
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<containersCompleted>>> | Total number of successfully completed containers
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<containersFailed>>> | Total number of failed containers
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<containersKilled>>> | Total number of killed containers
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<containersIniting>>> | Current number of initializing containers
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<containersRunning>>> | Current number of running containers
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<allocatedContainers>>> | Current number of allocated containers
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<allocatedGB>>> | Current allocated memory in GB
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<availableGB>>> | Current available memory in GB
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-
|
|
|
-ugi context
|
|
|
-
|
|
|
-* UgiMetrics
|
|
|
-
|
|
|
- UgiMetrics is related to user and group information.
|
|
|
- Each metrics record contains a Hostname tag as additional information
|
|
|
- along with metrics.
|
|
|
-
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|| Name || Description
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<LoginSuccessNumOps>>> | Total number of successful Kerberos logins
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<LoginSuccessAvgTime>>> | Average time for successful Kerberos logins in
|
|
|
- | milliseconds
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<LoginFailureNumOps>>> | Total number of failed Kerberos logins
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<LoginFailureAvgTime>>> | Average time for failed Kerberos logins in
|
|
|
- | milliseconds
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<getGroupsNumOps>>> | Total number of group resolutions
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<getGroupsAvgTime>>> | Average time for group resolution in milliseconds
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<getGroups>>><num><<<sNumOps>>> |
|
|
|
-| | Total number of group resolutions (<num> seconds granularity). <num> is
|
|
|
-| | specified by <<<hadoop.user.group.metrics.percentiles.intervals>>>.
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<getGroups>>><num><<<s50thPercentileLatency>>> |
|
|
|
-| | Shows the 50th percentile of group resolution time in milliseconds
|
|
|
-| | (<num> seconds granularity). <num> is specified by
|
|
|
-| | <<<hadoop.user.group.metrics.percentiles.intervals>>>.
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<getGroups>>><num><<<s75thPercentileLatency>>> |
|
|
|
-| | Shows the 75th percentile of group resolution time in milliseconds
|
|
|
-| | (<num> seconds granularity). <num> is specified by
|
|
|
-| | <<<hadoop.user.group.metrics.percentiles.intervals>>>.
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<getGroups>>><num><<<s90thPercentileLatency>>> |
|
|
|
-| | Shows the 90th percentile of group resolution time in milliseconds
|
|
|
-| | (<num> seconds granularity). <num> is specified by
|
|
|
-| | <<<hadoop.user.group.metrics.percentiles.intervals>>>.
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<getGroups>>><num><<<s95thPercentileLatency>>> |
|
|
|
-| | Shows the 95th percentile of group resolution time in milliseconds
|
|
|
-| | (<num> seconds granularity). <num> is specified by
|
|
|
-| | <<<hadoop.user.group.metrics.percentiles.intervals>>>.
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<getGroups>>><num><<<s99thPercentileLatency>>> |
|
|
|
-| | Shows the 99th percentile of group resolution time in milliseconds
|
|
|
-| | (<num> seconds granularity). <num> is specified by
|
|
|
-| | <<<hadoop.user.group.metrics.percentiles.intervals>>>.
|
|
|
-*-------------------------------------+--------------------------------------+
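The table above implies that each interval listed in <<<hadoop.user.group.metrics.percentiles.intervals>>> produces one NumOps metric and five percentile-latency metrics. The sketch below expands such a value into the metric names following the naming pattern in the table. The helper is hypothetical and only string-builds names; it does not query a running daemon.

```python
PERCENTILES = (50, 75, 90, 95, 99)  # percentiles listed in the table above


def get_groups_metric_names(intervals: str) -> list[str]:
    """Expand a comma-separated interval list (seconds) into metric names.

    `intervals` mimics a value of hadoop.user.group.metrics.percentiles.intervals.
    """
    names = []
    for raw in intervals.split(","):
        raw = raw.strip()
        if not raw:
            continue  # tolerate stray commas or an empty value
        num = int(raw)
        names.append(f"getGroups{num}sNumOps")
        for p in PERCENTILES:
            names.append(f"getGroups{num}s{p}thPercentileLatency")
    return names


print(get_groups_metric_names("60")[:2])
# ['getGroups60sNumOps', 'getGroups60s50thPercentileLatency']
```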
|
|
|
-
|
|
|
-metricssystem context
|
|
|
-
|
|
|
-* MetricsSystem
|
|
|
-
|
|
|
- MetricsSystem shows the statistics for metrics snapshots and publishes.
|
|
|
- Each metrics record contains a Hostname tag as additional information
|
|
|
- along with metrics.
|
|
|
-
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|| Name || Description
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<NumActiveSources>>> | Current number of active metrics sources
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<NumAllSources>>> | Total number of metrics sources
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<NumActiveSinks>>> | Current number of active sinks
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<NumAllSinks>>> | Total number of sinks \
|
|
|
- | (BUT usually fewer than <<<NumActiveSinks>>>,
|
|
|
- | see {{{https://issues.apache.org/jira/browse/HADOOP-9946}HADOOP-9946}})
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<SnapshotNumOps>>> | Total number of operations to snapshot statistics from
|
|
|
- | a metrics source
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<SnapshotAvgTime>>> | Average time in milliseconds to snapshot statistics
|
|
|
- | from a metrics source
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<PublishNumOps>>> | Total number of operations to publish statistics to a
|
|
|
- | sink
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<PublishAvgTime>>> | Average time in milliseconds to publish statistics to
|
|
|
- | a sink
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<DroppedPubAll>>> | Total number of dropped publishes
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<Sink_>>><instance><<<NumOps>>> | Total number of sink operations for the
|
|
|
- | <instance>
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<Sink_>>><instance><<<AvgTime>>> | Average time in milliseconds of sink
|
|
|
- | operations for the <instance>
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<Sink_>>><instance><<<Dropped>>> | Total number of dropped sink operations
|
|
|
- | for the <instance>
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<Sink_>>><instance><<<Qsize>>> | Current queue length of sink operations \
|
|
|
- | (BUT always set to 0 because nothing
|
|
|
- | increments this metric, see
|
|
|
- | {{{https://issues.apache.org/jira/browse/HADOOP-9941}HADOOP-9941}})
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-
|
|
|
-default context
|
|
|
-
|
|
|
-* StartupProgress
|
|
|
-
|
|
|
- StartupProgress metrics show the statistics of NameNode startup.
|
|
|
- Four metrics are exposed for each startup phase based on its name.
|
|
|
- The startup <phase>s are <<<LoadingFsImage>>>, <<<LoadingEdits>>>,
|
|
|
- <<<SavingCheckpoint>>>, and <<<SafeMode>>>.
|
|
|
- Each metrics record contains a Hostname tag as additional information
|
|
|
- along with metrics.
|
|
|
-
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|| Name || Description
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<ElapsedTime>>> | Total elapsed time in milliseconds
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<<<PercentComplete>>> | Current fraction of NameNode startup completed \
|
|
|
- | (The max value is not 100 but 1.0)
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<phase><<<Count>>> | Total number of steps completed in the phase
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<phase><<<ElapsedTime>>> | Total elapsed time in the phase in milliseconds
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<phase><<<Total>>> | Total number of steps in the phase
|
|
|
-*-------------------------------------+--------------------------------------+
|
|
|
-|<phase><<<PercentComplete>>> | Current fraction of the phase completed \
|
|
|
- | (The max value is not 100 but 1.0)
|
|
|
-*-------------------------------------+--------------------------------------+
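Because each of the four startup phases yields four per-phase metrics, and the PercentComplete values range from 0.0 to 1.0 rather than 0 to 100, a monitoring script typically needs two steps: expand the names and rescale the fraction. The sketch below shows both. The helper names are hypothetical, and the phase list is copied from the paragraph above.

```python
PHASES = ("LoadingFsImage", "LoadingEdits", "SavingCheckpoint", "SafeMode")
SUFFIXES = ("Count", "ElapsedTime", "Total", "PercentComplete")


def phase_metric_names() -> list[str]:
    """All <phase><suffix> metric names described in the table above."""
    return [phase + suffix for phase in PHASES for suffix in SUFFIXES]


def as_percent(fraction: float) -> float:
    """Convert a PercentComplete value (0.0 to 1.0) to a 0-100 percentage."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("PercentComplete is expected to be in [0.0, 1.0]")
    return fraction * 100.0


print(len(phase_metric_names()))  # 16
print(as_percent(0.25))           # 25.0
```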
|