  1. ~~ Licensed under the Apache License, Version 2.0 (the "License");
  2. ~~ you may not use this file except in compliance with the License.
  3. ~~ You may obtain a copy of the License at
  4. ~~
  5. ~~ http://www.apache.org/licenses/LICENSE-2.0
  6. ~~
  7. ~~ Unless required by applicable law or agreed to in writing, software
  8. ~~ distributed under the License is distributed on an "AS IS" BASIS,
  9. ~~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  10. ~~ See the License for the specific language governing permissions and
  11. ~~ limitations under the License. See accompanying LICENSE file.
  12. ---
  13. Offline Image Viewer Guide
  14. ---
  15. ---
  16. ${maven.build.timestamp}
  17. Offline Image Viewer Guide
  18. %{toc|section=1|fromDepth=0}
  19. * Overview
  20. The Offline Image Viewer is a tool that dumps the contents of HDFS fsimage
  21. files to a human-readable format and provides a read-only WebHDFS API
  22. to allow offline analysis and examination of a Hadoop cluster's
  23. namespace. The tool is able to process very large image files relatively
  24. quickly. The tool handles the layout formats that were included with Hadoop
  25. versions 2.4 and up. If you want to handle older layout formats, you can
  26. use the Offline Image Viewer of Hadoop 2.3 or {{oiv_legacy Command}}.
  27. If the tool is not able to process an image file, it will exit cleanly.
  28. The Offline Image Viewer does not require a Hadoop cluster to be running;
  29. it is entirely offline in its operation.
  30. The Offline Image Viewer provides several output processors:
  31. [[1]] Web is the default output processor. It launches an HTTP server
  32. that exposes a read-only WebHDFS API. Users can investigate the namespace
  33. interactively through the HTTP REST API.
  34. [[2]] XML creates an XML document of the fsimage and includes all of the
  35. information within the fsimage, similar to the lsr processor. The
  36. output of this processor is amenable to automated processing and
  37. analysis with XML tools. Due to the verbosity of the XML syntax,
  38. this processor will also generate the largest amount of output.
  39. [[3]] FileDistribution is a tool for analyzing file sizes in the
  40. namespace image. To run it, define a range
  41. of integers [0, maxSize] by specifying maxSize and a step. The
  42. range of integers is divided into segments of size step: [0, s[1],
  43. ..., s[n-1], maxSize], and the processor calculates how many files
  44. in the system fall into each segment [s[i-1], s[i]). Note that
  45. files larger than maxSize always fall into the very last segment.
  46. The output file is formatted as a tab-separated two-column table:
  47. Size and NumFiles. Size is the start of the segment,
  48. and NumFiles is the number of files from the image whose size falls
  49. in this segment.
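
The segment assignment described for FileDistribution can be sketched in Python. This only illustrates the description in this guide and is not the tool's implementation. The names <<<bucket_file_sizes>>> and <<<format_table>>>, the sample sizes, and the header row are hypothetical. Placing a file of exactly maxSize bytes in the last segment is also an assumption.

```python
from collections import Counter


def bucket_file_sizes(sizes, max_size, step):
    """Count files per segment, keyed by the segment's start offset.

    Follows the description in this guide: segments are [s[i-1], s[i]),
    and anything at or beyond max_size lands in the last segment.
    (Treating size == max_size as "last segment" is an assumption.)
    """
    if step <= 0 or max_size <= 0:
        raise ValueError("max_size and step must be positive")
    # Start offset of the last segment, e.g. 10 for max_size=20, step=10.
    last_start = ((max_size - 1) // step) * step
    counts = Counter()
    for size in sizes:
        start = min((size // step) * step, last_start)
        counts[start] += 1
    return dict(sorted(counts.items()))


def format_table(counts):
    """Render a two-column, tab-separated Size/NumFiles table."""
    lines = ["Size\tNumFiles"]
    lines += [f"{start}\t{num}" for start, num in counts.items()]
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical file sizes in bytes, for illustration only.
    sample = [0, 1, 5, 10, 25, 100]
    print(format_table(bucket_file_sizes(sample, max_size=20, step=10)))
```

With maxSize 20 and step 10 there are two segments, starting at 0 and 10. The files of 25 and 100 bytes exceed maxSize and are counted in the segment starting at 10.
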
  50. * Usage
  51. ** Web Processor
  52. The Web processor launches an HTTP server that exposes a read-only WebHDFS API.
  53. Users can specify the address to listen on with the -addr option
  54. (localhost:5978 by default).
  55. ----
  56. bash$ bin/hdfs oiv -i fsimage
  57. 14/04/07 13:25:14 INFO offlineImageViewer.WebImageViewer: WebImageViewer
  58. started. Listening on /127.0.0.1:5978. Press Ctrl+C to stop the viewer.
  59. ----
  60. Users can access the viewer and get information about the fsimage with
  61. the following shell command:
  62. ----
  63. bash$ bin/hdfs dfs -ls webhdfs://127.0.0.1:5978/
  64. Found 2 items
  65. drwxrwx--- - root supergroup 0 2014-03-26 20:16 webhdfs://127.0.0.1:5978/tmp
  66. drwxr-xr-x - root supergroup 0 2014-03-31 14:08 webhdfs://127.0.0.1:5978/user
  67. ----
  68. To get information about all the files and directories, use
  69. the following command:
  70. ----
  71. bash$ bin/hdfs dfs -ls -R webhdfs://127.0.0.1:5978/
  72. ----
  73. Users can also get JSON-formatted FileStatuses via the HTTP REST API.
  74. ----
  75. bash$ curl -i http://127.0.0.1:5978/webhdfs/v1/?op=liststatus
  76. HTTP/1.1 200 OK
  77. Content-Type: application/json
  78. Content-Length: 252
  79. {"FileStatuses":{"FileStatus":[
  80. {"fileId":16386,"accessTime":0,"replication":0,"owner":"theuser","length":0,"permission":"755","blockSize":0,"modificationTime":1392772497282,"type":"DIRECTORY","group":"supergroup","childrenNum":1,"pathSuffix":"user"}
  81. ]}}
  82. ----
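
A response like the one above can be consumed programmatically. The following Python sketch builds the LISTSTATUS URL for a viewer address and summarizes the JSON body. It makes no network request and reuses the response shown in the curl example. The function names <<<liststatus_url>>> and <<<summarize>>> are hypothetical helpers, not part of Hadoop.

```python
import json
from urllib.parse import quote, urlencode


def liststatus_url(addr, path="/"):
    """Build the WebHDFS LISTSTATUS URL for a viewer listening on addr."""
    query = urlencode({"op": "LISTSTATUS"})
    return f"http://{addr}/webhdfs/v1{quote(path)}?{query}"


def summarize(body):
    """Return (pathSuffix, type, owner, permission) for each FileStatus."""
    statuses = json.loads(body)["FileStatuses"]["FileStatus"]
    return [(s["pathSuffix"], s["type"], s["owner"], s["permission"])
            for s in statuses]


# Response body copied from the curl example in this guide.
SAMPLE_BODY = """{"FileStatuses":{"FileStatus":[
{"fileId":16386,"accessTime":0,"replication":0,"owner":"theuser","length":0,"permission":"755","blockSize":0,"modificationTime":1392772497282,"type":"DIRECTORY","group":"supergroup","childrenNum":1,"pathSuffix":"user"}
]}}"""

if __name__ == "__main__":
    print(liststatus_url("127.0.0.1:5978"))
    for entry in summarize(SAMPLE_BODY):
        print(entry)
```

Against a running viewer, the URL returned by <<<liststatus_url>>> could be fetched with any HTTP client and the body passed to <<<summarize>>>.
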
  83. The Web processor now supports the following operations:
  84. * {{{./WebHDFS.html#List_a_Directory}LISTSTATUS}}
  85. * {{{./WebHDFS.html#Status_of_a_FileDirectory}GETFILESTATUS}}
  86. * {{{./WebHDFS.html#Get_ACL_Status}GETACLSTATUS}}
  87. ** XML Processor
  88. The XML processor dumps all the contents of the fsimage. Users can
  89. specify the input and output files via the -i and -o command-line options.
  90. ----
  91. bash$ bin/hdfs oiv -p XML -i fsimage -o fsimage.xml
  92. ----
  93. This will create a file named fsimage.xml that contains all the information in
  94. the fsimage. For very large image files, this process may take several
  95. minutes.
  96. Applying the Offline Image Viewer with the XML processor results in output
  97. such as the following:
  98. ----
  99. <?xml version="1.0"?>
  100. <fsimage>
  101. <NameSection>
  102. <genstampV1>1000</genstampV1>
  103. <genstampV2>1002</genstampV2>
  104. <genstampV1Limit>0</genstampV1Limit>
  105. <lastAllocatedBlockId>1073741826</lastAllocatedBlockId>
  106. <txid>37</txid>
  107. </NameSection>
  108. <INodeSection>
  109. <lastInodeId>16400</lastInodeId>
  110. <inode>
  111. <id>16385</id>
  112. <type>DIRECTORY</type>
  113. <name></name>
  114. <mtime>1392772497282</mtime>
  115. <permission>theuser:supergroup:rwxr-xr-x</permission>
  116. <nsquota>9223372036854775807</nsquota>
  117. <dsquota>-1</dsquota>
  118. </inode>
  119. ...remaining output omitted...
  120. ----
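
Because the XML output is meant for automated processing, a short Python sketch of reading it may help. It streams <<<inode>>> elements with the standard library rather than loading the whole document, which matters for large images. The helper name <<<iter_inodes>>> is hypothetical. The sample is the excerpt above with the omitted remainder dropped and the open elements closed, an assumption made only so the excerpt is well-formed.

```python
import io
import xml.etree.ElementTree as ET

# Excerpt of the sample output in this guide. The elided remainder is dropped
# and the open elements are closed so the excerpt parses (an assumption for
# this sketch; real output contains more inodes and sections).
SAMPLE_XML = b"""<?xml version="1.0"?>
<fsimage>
<NameSection>
<genstampV1>1000</genstampV1>
<genstampV2>1002</genstampV2>
<genstampV1Limit>0</genstampV1Limit>
<lastAllocatedBlockId>1073741826</lastAllocatedBlockId>
<txid>37</txid>
</NameSection>
<INodeSection>
<lastInodeId>16400</lastInodeId>
<inode>
<id>16385</id>
<type>DIRECTORY</type>
<name></name>
<mtime>1392772497282</mtime>
<permission>theuser:supergroup:rwxr-xr-x</permission>
<nsquota>9223372036854775807</nsquota>
<dsquota>-1</dsquota>
</inode>
</INodeSection>
</fsimage>
"""


def iter_inodes(source):
    """Yield one dict per <inode>, streaming so large files stay cheap."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "inode":
            yield {child.tag: (child.text or "") for child in elem}
            elem.clear()  # release the children we have already read


if __name__ == "__main__":
    for inode in iter_inodes(io.BytesIO(SAMPLE_XML)):
        owner, group, mode = inode["permission"].split(":")
        print(inode["id"], inode["type"], owner, group, mode)
```

For a real dump, a path such as fsimage.xml can be passed to <<<iter_inodes>>> in place of the in-memory sample.
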
  121. * Options
  122. *-----------------------:-----------------------------------+
  123. | <<Flag>> | <<Description>> |
  124. *-----------------------:-----------------------------------+
  125. | <<<-i>>>\|<<<--inputFile>>> <input file> | Specify the input fsimage file
  126. | | to process. Required.
  127. *-----------------------:-----------------------------------+
  128. | <<<-o>>>\|<<<--outputFile>>> <output file> | Specify the output filename,
  129. | | if the specified output processor generates one. If
  130. | | the specified file already exists, it is silently
  131. | | overwritten. (output to stdout by default)
  132. *-----------------------:-----------------------------------+
  133. | <<<-p>>>\|<<<--processor>>> <processor> | Specify the image processor to
  134. | | apply against the image file. Currently valid options
  135. | | are Web (default), XML and FileDistribution.
  136. *-----------------------:-----------------------------------+
  137. | <<<-addr>>> <address> | Specify the address (host:port) to listen on.
  138. | | (localhost:5978 by default). This option is used with
  139. | | Web processor.
  140. *-----------------------:-----------------------------------+
  141. | <<<-maxSize>>> <size> | Specify the range [0, maxSize] of file sizes to be
  142. | | analyzed in bytes (128GB by default). This option is
  143. | | used with FileDistribution processor.
  144. *-----------------------:-----------------------------------+
  145. | <<<-step>>> <size> | Specify the granularity of the distribution in bytes
  146. | | (2MB by default). This option is used with
  147. | | FileDistribution processor.
  148. *-----------------------:-----------------------------------+
  149. | <<<-h>>>\|<<<--help>>>| Display the tool usage and help information and
  150. | | exit.
  151. *-----------------------:-----------------------------------+
  152. * Analyzing Results
  153. The Offline Image Viewer makes it easy to gather large amounts of data
  154. about the HDFS namespace. This information can then be used to explore
  155. file system usage patterns or find specific files that match arbitrary
  156. criteria, along with other types of namespace analysis.
  157. * oiv_legacy Command
  158. Due to the internal layout changes introduced by the ProtocolBuffer-based
  159. fsimage ({{{https://issues.apache.org/jira/browse/HDFS-5698}HDFS-5698}}),
  160. OfflineImageViewer consumes an excessive amount of memory and loses some
  161. functions, such as the Indented and Delimited processors. If you want to process
  162. an image without a large amount of memory, or use these processors, you can use
  163. the <<<oiv_legacy>>> command (the same as <<<oiv>>> in Hadoop 2.3).
  164. ** Usage
  165. 1. Set <<<dfs.namenode.legacy-oiv-image.dir>>> to an appropriate directory
  166. so that the standby NameNode or SecondaryNameNode saves its namespace in the
  167. old fsimage format during checkpointing.
  168. 2. Run the <<<oiv_legacy>>> command against the old-format fsimage.
  169. ----
  170. bash$ bin/hdfs oiv_legacy -i fsimage_old -o output
  171. ----
  172. ** Options
  173. *-----------------------:-----------------------------------+
  174. | <<Flag>> | <<Description>> |
  175. *-----------------------:-----------------------------------+
  176. | <<<-i>>>\|<<<--inputFile>>> <input file> | Specify the input fsimage file to
  177. | | process. Required.
  178. *-----------------------:-----------------------------------+
  179. | <<<-o>>>\|<<<--outputFile>>> <output file> | Specify the output filename, if
  180. | | the specified output processor generates one. If the
  181. | | specified file already exists, it is silently
  182. | | overwritten. Required.
  183. *-----------------------:-----------------------------------+
  184. | <<<-p>>>\|<<<--processor>>> <processor> | Specify the image processor to
  185. | | apply against the image file. Valid options are
  186. | | Ls (default), XML, Delimited, Indented, and
  187. | | FileDistribution.
  188. *-----------------------:-----------------------------------+
  189. | <<<-skipBlocks>>> | Do not enumerate individual blocks within files. This
  190. | | may save processing time and output file space on
  191. | | namespaces with very large files. The Ls processor
  192. | | reads the blocks to correctly determine file sizes
  193. | | and ignores this option.
  194. *-----------------------:-----------------------------------+
  195. | <<<-printToScreen>>> | Pipe the output of the processor to the console as well as
  196. | | the specified file. On extremely large namespaces, this
  197. | | may increase processing time by an order of
  198. | | magnitude.
  199. *-----------------------:-----------------------------------+
  200. | <<<-delimiter>>> <arg>| When used in conjunction with the Delimited
  201. | | processor, replaces the default tab delimiter with
  202. | | the string specified by <arg>.
  203. *-----------------------:-----------------------------------+
  204. | <<<-h>>>\|<<<--help>>>| Display the tool usage and help information and exit.
  205. *-----------------------:-----------------------------------+