<!---
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

Ozone - Object store for Hadoop
===============================

Introduction
------------

Ozone is an object store for Hadoop. It is a redundant, distributed object
store built by leveraging primitives present in HDFS. Ozone supports a REST
API for accessing the store.

Getting Started
---------------

Ozone is a work in progress and currently lives in its own branch. To
use it, you have to build the package yourself and deploy a cluster.

### Building Ozone

To build Ozone, please check out the Hadoop sources from GitHub, switch to
the Ozone branch, HDFS-7240, and build it.

- `git checkout HDFS-7240`
- `mvn clean package -DskipTests=true -Dmaven.javadoc.skip=true -Pdist -Dtar -DskipShade`

skipShade is just to make compilation faster and is not strictly required.

This will give you a tarball in your distribution directory. This is the
tarball that can be used for deploying your Hadoop cluster. Here is an
example of the tarball that will be generated.

* `~/apache/hadoop/hadoop-dist/target/${project.version}.tar.gz`

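As a minimal sketch, the whole build-and-unpack flow might look like the
following. The clone URL and the extraction directory are assumptions; adapt
them to your environment.

```
# A sketch only: build the Ozone branch and unpack the distribution tarball.
git clone https://github.com/apache/hadoop.git
cd hadoop
git checkout HDFS-7240
mvn clean package -DskipTests=true -Dmaven.javadoc.skip=true -Pdist -Dtar -DskipShade

# The tarball lands under hadoop-dist/target; its exact name depends on the
# build version. Extract it on the node(s) where you plan to run the cluster
# (the target directory here is just an example).
tar -xzf hadoop-dist/target/*.tar.gz -C /opt
```
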
At this point we have the option to set up a physical cluster or run Ozone via
Docker.

Running Ozone via Docker
------------------------

This assumes that you have a working Docker setup on the machine. Please run
the following commands to see Ozone in action.

Go to the directory where the docker compose files exist.

- `cd dev-support/compose/ozone`

Tell Docker to start Ozone. This will start a KSM, an SCM and a single
datanode in the background.

- `docker-compose up -d`

Now let us run some workload against Ozone; to do that we will run corona.
First log into the datanode container and start a shell.

- `docker-compose exec datanode bash`
- `cd hadoop/bin`

Now you can run the oz command shell or corona, the Ozone load generator.
This is the command to run corona; an example oz invocation is sketched
right after it.

- `./hdfs corona -mode offline -validateWrites -numOfVolumes 1 -numOfBuckets 10 -numOfKeys 100`

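As a quick, hedged illustration of the oz shell from inside the same datanode
container, the command below creates a volume through the REST interface. The
host, port, user name and volume name are placeholders; see the Ozone
[command shell](./OzoneCommandShell.html) documentation for the authoritative
flags.

```
# A sketch only: create a volume named "vol1" owned by user "bilbo".
# Adjust the REST host/port for your setup.
./hdfs oz -createVolume http://localhost:9864/vol1 -user bilbo -root
```
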
You can check out the KSM UI to see request information.

- `http://localhost:9874/`

If you need more datanodes, you can scale up:

- `docker-compose scale datanode=3`

Running Ozone using a real cluster
----------------------------------

Please proceed to set up a Hadoop cluster by creating the `hdfs-site.xml` and
other configuration files that are needed for your cluster.

### Ozone Configuration

Ozone relies on its own configuration file called `ozone-site.xml`. It is
just for convenience and ease of management -- you can add these settings
to `hdfs-site.xml` if you don't want to keep Ozone settings separate.
This document refers to `ozone-site.xml` so that Ozone settings are in one
place and not mingled with HDFS settings.

* _*ozone.enabled*_ This is the most important setting for ozone.
Currently, Ozone is an opt-in subsystem of HDFS. By default, Ozone is
disabled. Setting this flag to `true` enables ozone in the HDFS cluster.
Here is an example,

```
<property>
  <name>ozone.enabled</name>
  <value>True</value>
</property>
```

* _*ozone.metadata.dirs*_ Ozone is designed with modern hardware
in mind. It tries to use SSDs effectively, so users can specify where the
metadata should reside. Usually you pick your fastest disk (SSD if
you have them on your nodes). KSM, SCM and datanodes will write the metadata
to these disks. This is a required setting; if it is missing, Ozone will
fail to come up. Here is an example,

```
<property>
  <name>ozone.metadata.dirs</name>
  <value>/data/disk1/meta</value>
</property>
```

* _*ozone.scm.names*_ Ozone is built on top of the container framework (see
Ozone Architecture TODO). The Storage Container Manager (SCM) is a distributed
block service which is used by Ozone and other storage services.
This property allows datanodes to discover where SCM is, so that
datanodes can send heartbeats to SCM. SCM is designed to be highly available,
and datanodes assume there are multiple instances of SCM which form a highly
available ring. The HA feature of SCM is a work in progress, so we
configure ozone.scm.names to be a single machine. Here is an example,

```
<property>
  <name>ozone.scm.names</name>
  <value>scm.hadoop.apache.org</value>
</property>
```

* _*ozone.scm.datanode.id*_ Each datanode that speaks to SCM generates an ID,
just as in HDFS. This is an optional setting. Please note:
this path will be created by datanodes if it doesn't exist already. Here is an
example,

```
<property>
  <name>ozone.scm.datanode.id</name>
  <value>/data/disk1/scm/meta/node/datanode.id</value>
</property>
```

* _*ozone.scm.block.client.address*_ The Storage Container Manager (SCM)
offers a set of services that can be used to build a distributed storage
system. One of the services offered is the block service, which KSM and HDFS
use. This property describes where KSM can discover SCM's block service
endpoint. There are corresponding port settings as well, but assuming that we
are using the default ports, the server address is the only required field.
Here is an example,

```
<property>
  <name>ozone.scm.block.client.address</name>
  <value>scm.hadoop.apache.org</value>
</property>
```

* _*ozone.ksm.address*_ KSM server address. This is used by the Ozone handler
and the Ozone File System.

```
<property>
  <name>ozone.ksm.address</name>
  <value>ksm.hadoop.apache.org</value>
</property>
```

Here is a quick summary of the settings needed by Ozone.

| Setting                        | Value                    | Comment                                              |
|--------------------------------|--------------------------|------------------------------------------------------|
| ozone.enabled                  | True                     | This enables SCM and containers in the HDFS cluster. |
| ozone.metadata.dirs            | file path                | The metadata will be stored here.                    |
| ozone.scm.names                | SCM server name          | Hostname:port or IP:port address of SCM.             |
| ozone.scm.block.client.address | SCM server name and port | Used by services like KSM.                           |
| ozone.scm.client.address       | SCM server name and port | Used by the client side.                             |
| ozone.scm.datanode.address     | SCM server name and port | Used by datanodes to talk to SCM.                    |
| ozone.ksm.address              | KSM server name          | Used by the Ozone handler and Ozone file system.     |

Here is a working example of `ozone-site.xml`.

```
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>ozone.enabled</name>
    <value>True</value>
  </property>

  <property>
    <name>ozone.metadata.dirs</name>
    <value>/data/disk1/ozone/meta</value>
  </property>

  <property>
    <name>ozone.scm.names</name>
    <value>127.0.0.1</value>
  </property>

  <property>
    <name>ozone.scm.client.address</name>
    <value>127.0.0.1:9860</value>
  </property>

  <property>
    <name>ozone.scm.block.client.address</name>
    <value>127.0.0.1:9863</value>
  </property>

  <property>
    <name>ozone.scm.datanode.address</name>
    <value>127.0.0.1:9861</value>
  </property>

  <property>
    <name>ozone.ksm.address</name>
    <value>127.0.0.1:9874</value>
  </property>
</configuration>
```

### Starting Ozone

Ozone is designed to run concurrently with HDFS. The simplest way to [start
HDFS](../hadoop-common/ClusterSetup.html) is to run `start-dfs.sh` from
`$HADOOP/sbin`. Once HDFS is running, please verify that it is fully
functional by running some commands like

- *./hdfs dfs -mkdir /usr*
- *./hdfs dfs -ls /*

Once you are sure that HDFS is running, start Ozone. To start Ozone, you
need to start SCM and KSM. Currently we assume that both KSM and SCM
run on the same node; this will change in the future. A quick check that both
daemons are up is sketched after the commands below.

- `./hdfs --daemon start scm`
- `./hdfs --daemon start ksm`

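As a hedged way to confirm the two daemons are up, you can look for their
JVMs with `jps`. The class names below are assumptions based on the daemon
names (Storage Container Manager and Key Space Manager) and may differ in
your build; the daemon log files are the authoritative place to check.

```
# A sketch only: list the Ozone daemon JVMs (class names are assumptions).
jps | grep -E 'StorageContainerManager|KeySpaceManager'
```
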
If you would like to start HDFS and Ozone together, you can do that by running
a single command.

- `$HADOOP/sbin/start-ozone.sh`

This command will start HDFS and then start the Ozone components.

Once you have Ozone running, you can use the Ozone [shell](./OzoneCommandShell.html)
commands to create volumes, buckets and keys, for example as sketched below.

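Here is a hedged sketch of that flow. The host, port, user and names are
placeholders, and the exact flags are documented in the command shell guide
linked above.

```
# A sketch only -- adjust the REST host/port, user and names for your cluster.
./hdfs oz -createVolume http://localhost:9864/vol1 -user bilbo -root
./hdfs oz -createBucket http://localhost:9864/vol1/bucket1
./hdfs oz -putKey http://localhost:9864/vol1/bucket1/key1 -file README.txt
```
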
### Diagnosing issues

Ozone tries not to pollute the existing HDFS configuration and logging
streams. So Ozone logs are by default configured to be written to a file
called `ozone.log`. This is controlled by the settings in the
`log4j.properties` file in the Hadoop configuration directory.

Here are the log4j properties that are added by Ozone.

```
#
# Add a logger for ozone that is separate from the Datanode.
#
#log4j.debug=true
log4j.logger.org.apache.hadoop.ozone=DEBUG,OZONE,FILE

# Do not log into datanode logs. Remove this line to have single log.
log4j.additivity.org.apache.hadoop.ozone=false

# For development purposes, log both to console and log file.
log4j.appender.OZONE=org.apache.log4j.ConsoleAppender
log4j.appender.OZONE.Threshold=info
log4j.appender.OZONE.layout=org.apache.log4j.PatternLayout
log4j.appender.OZONE.layout.ConversionPattern=%d{ISO8601} [%t] %-5p \
%X{component} %X{function} %X{resource} %X{user} %X{request} - %m%n

# Real ozone logger that writes to ozone.log
log4j.appender.FILE=org.apache.log4j.DailyRollingFileAppender
log4j.appender.FILE.File=${hadoop.log.dir}/ozone.log
log4j.appender.FILE.Threshold=debug
log4j.appender.FILE.layout=org.apache.log4j.PatternLayout
log4j.appender.FILE.layout.ConversionPattern=%d{ISO8601} [%t] %-5p \
(%F:%L) %X{function} %X{resource} %X{user} %X{request} - \
%m%n
```

If you would like to have a single datanode log instead of Ozone messages
getting written to `ozone.log`, please remove this line or set additivity to true.

`log4j.additivity.org.apache.hadoop.ozone=false`

On the SCM/KSM side, you will be able to see

- `hadoop-hdfs-ksm-hostname.log`
- `hadoop-hdfs-scm-hostname.log`

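As a small, hedged sketch of where to look when diagnosing problems -- the
log directory below assumes the default `hadoop.log.dir` / `HADOOP_LOG_DIR`
location:

```
# A sketch only: follow the Ozone debug log and skim the daemon logs.
tail -f $HADOOP_LOG_DIR/ozone.log
grep -i error $HADOOP_LOG_DIR/hadoop-hdfs-ksm-*.log $HADOOP_LOG_DIR/hadoop-hdfs-scm-*.log
```
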
Please file any issues you see under [Object store in HDFS (HDFS-7240)](https://issues.apache.org/jira/browse/HDFS-7240)
as this is still a work in progress.