<?xml version="1.0" encoding="UTF-8"?>
<!--
Copyright 2002-2004 The Apache Software Foundation
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
"http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
<book id="bk_Admin">
<title>ZooKeeper Administrator's Guide</title>
<subtitle>A Guide to Deployment and Administration</subtitle>
<bookinfo>
<legalnotice>
<para>Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License. You may
obtain a copy of the License at <ulink
url="http://www.apache.org/licenses/LICENSE-2.0">http://www.apache.org/licenses/LICENSE-2.0</ulink>.</para>
<para>Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an "AS IS"
BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied. See the License for the specific language governing permissions
and limitations under the License.</para>
</legalnotice>
<abstract>
<para>This document contains information about deploying, administering,
and maintaining ZooKeeper. It also discusses best practices and common
problems.</para>
<para>$Revision: 1.7 $ $Date: 2008/09/19 05:29:31 $</para>
</abstract>
</bookinfo>
<chapter id="ch_deployment">
<title>Deployment</title>
<para>This chapter contains information about deploying ZooKeeper and
covers these topics:</para>
<itemizedlist>
<listitem>
<para><xref linkend="sc_systemReq"/></para>
</listitem>
<listitem>
<para><xref linkend="sc_zkMulitServerSetup"/></para>
</listitem>
<listitem>
<para><xref linkend="sc_singleAndDevSetup"/></para>
</listitem>
</itemizedlist>
<para>The first two sections assume you are interested in installing
ZooKeeper in a production environment such as a datacenter. The final
section covers situations in which you are setting up ZooKeeper on a
limited basis - for evaluation, testing, or development - but not in a
production environment.</para>
<section id="sc_systemReq">
<title>System Requirements</title>
<para>ZooKeeper runs in Java, release 1.5 or greater, as a group of hosts
called a quorum. Three ZooKeeper hosts per quorum is the minimum
recommended quorum size. At Yahoo!, ZooKeeper is usually deployed on
dedicated RHEL boxes, with dual-core processors, 2GB of RAM, and 80GB
IDE hard drives.</para>
</section>
<section id="sc_zkMulitServerSetup">
<title>Clustered (Multi-Server) Setup</title>
<para>For reliable ZooKeeper service, you should deploy ZooKeeper in a
cluster known as a <firstterm>quorum</firstterm>. As long as a majority
of the quorum are up, the service will be available. Because ZooKeeper
requires a majority, it is best to use an
odd number of machines. For example, with four machines ZooKeeper can
only handle the failure of a single machine; if two machines fail, the
remaining two machines do not constitute a majority. However, with five
machines ZooKeeper can handle the failure of two machines.</para>
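<para>As a rough illustration of the arithmetic above (a sketch derived
from the majority rule, not a table taken from the ZooKeeper sources): a
quorum of N servers stays available while floor(N/2) + 1 of them are up,
so it tolerates N - (floor(N/2) + 1) failures.</para>
<screen>
servers (N)   majority needed   failures tolerated
3             2                 1
4             3                 1
5             3                 2
6             4                 2</screen>
<para>Going from an odd size to the next even size adds a machine without
adding any tolerated failures, which is why odd sizes are preferred.</para>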
<para>Here are the steps for setting up a server that will be part of a
quorum. These steps should be performed on every host in the
quorum:</para>
<orderedlist>
<listitem>
<para>Install the Java JDK:</para>
<screen>$ yinst -i jdk-1.6.0.00_3 -br test</screen>
</listitem>
<listitem>
<para>Set the Java heap size. This is very important, to avoid
swapping, which will seriously degrade ZooKeeper performance. To
determine the correct value, run load tests and make sure you are well
below the usage limit that would cause you to swap. Be conservative
- use a maximum heap size of 3GB for a 4GB machine.</para>
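<para>As a sketch, assuming a 4GB machine and the launch command shown in
the start-up step of this procedure, the maximum heap could be capped
with the standard JVM <emphasis role="bold">-Xmx</emphasis> option. The
3g value is only the conservative starting point mentioned above;
confirm it with your own load tests:</para>
<screen># assumption: 3g suits a 4GB host; adjust after load testing
$ java -Xmx3g -cp zookeeper-dev.jar:java/lib/log4j-1.2.15.jar:conf \
org.apache.zookeeper.server.quorum.QuorumPeerMain zoo.cfg</screen>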
</listitem>
<listitem>
<para>Install the ZooKeeper Server Package:</para>
<screen>$ yinst install -nostart zookeeper_server</screen>
</listitem>
<listitem>
<para>Create a configuration file. This file can be called anything.
Use the following settings as a starting point:</para>
<screen>
tickTime=2000
dataDir=/var/zookeeper/
clientPort=2181
initLimit=5
syncLimit=2
server.1=zoo1:2888
server.2=zoo2:2888
server.3=zoo3:2888</screen>
<para>You can find the meanings of these and other configuration
settings in the section <xref linkend="sc_configuration" />. A word
about a few of them here, though:</para>
<para>Every machine that is part of the ZooKeeper quorum should know
about every other machine in the quorum. You accomplish this with
the series of lines of the form <emphasis
role="bold">server.id=host:port</emphasis>. The parameters <emphasis
role="bold">host</emphasis> and <emphasis
role="bold">port</emphasis> are straightforward. You attribute the
server id to each machine by creating a file named
<filename>myid</filename>, one for each server, which resides in
that server's data directory, as specified by the configuration file
parameter <emphasis role="bold">dataDir</emphasis>. The myid file
consists of a single line containing only the text of that machine's
id. So <filename>myid</filename> of server 1 would contain the text
"1" and nothing else. The id must be unique within the
quorum.</para>
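<para>For example, assuming the sample configuration above
(dataDir=/var/zookeeper/) and that the host zoo1 is the one listed as
server.1, the file could be created on zoo1 like this:</para>
<screen># run on zoo1 only; use 2 on zoo2 and 3 on zoo3
$ echo 1 &gt; /var/zookeeper/myid</screen>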
</listitem>
<listitem>
<para>Once your configuration file is set up, you can start
ZooKeeper:</para>
<screen>$ java -cp zookeeper-dev.jar:java/lib/log4j-1.2.15.jar:conf \
org.apache.zookeeper.server.quorum.QuorumPeerMain zoo.cfg</screen>
</listitem>
<listitem>
<para>Test your deployment by connecting to the hosts:</para>
<itemizedlist>
<listitem>
<para>In Java, you can run the following command to execute
simple operations:</para>
<screen>$ java -cp zookeeper.jar:java/lib/log4j-1.2.15.jar:conf \
org.apache.zookeeper.ZooKeeperMain 127.0.0.1:2181</screen>
</listitem>
<listitem>
<para>In C, you can compile either the single-threaded client or
the multithreaded client in the c subdirectory of the
ZooKeeper sources. This compiles the single-threaded
client:</para>
<screen>$ make cli_st</screen>
<para>And this compiles the multithreaded client:</para>
<screen>$ make cli_mt</screen>
</listitem>
</itemizedlist>
<para>Running either program gives you a shell in which to execute
simple file-system-like operations. To connect to ZooKeeper with the multithreaded
client, for example, you would run:</para>
<screen>$ cli_mt 127.0.0.1:2181</screen>
</listitem>
</orderedlist>
</section>
<section id="sc_singleAndDevSetup">
<title>Single Server and Developer Setup</title>
<para>If you want to set up ZooKeeper for development purposes, you will
probably want to set up a single server instance of ZooKeeper, and then
install either the Java or C client-side libraries and bindings on your
development machine.</para>
<para>The steps for setting up a single server instance are similar
to the above, except the configuration file is simpler. You can find the
complete instructions in the <ulink
url="zookeeperStarted.html#sc_InstallingSingleMode">Installing
and Running ZooKeeper in Single Server Mode</ulink> section of the
<ulink url="zookeeperStarted.html">ZooKeeper
Getting Started Guide</ulink>.</para>
<para>For information on installing the client-side libraries, refer to
the <ulink
url="zookeeperProgrammers.html#Bindings">Bindings</ulink>
section of the <ulink
url="zookeeperProgrammers.html">ZooKeeper
Programmer's Guide</ulink>.</para>
</section>
</chapter>
<chapter id="ch_administration">
<title>Administration</title>
<para>This chapter contains information about running and maintaining
ZooKeeper and covers these topics: <itemizedlist>
<listitem>
<para><xref linkend="sc_configuration"/></para>
</listitem>
<listitem>
<para><xref linkend="sc_zkCommands"/></para>
</listitem>
<listitem>
<para><xref linkend="sc_dataFileManagement"/></para>
</listitem>
<listitem>
<para><xref linkend="sc_commonProblems"/></para>
</listitem>
<listitem>
<para><xref linkend="sc_bestPractices"/></para>
</listitem>
</itemizedlist></para>
<section id="sc_configuration">
<title>Configuration Parameters</title>
<para>ZooKeeper's behavior is governed by the ZooKeeper configuration
file. This file is designed so that the exact same file can be used by
all the servers that make up a ZooKeeper service, assuming the disk
layouts are the same. If servers use different configuration files,
care must be taken to ensure that the lists of servers in all of the
different configuration files match.</para>
<section id="sc_minimumConfiguration">
<title>Minimum Configuration</title>
<para>Here are the minimum configuration keywords that must be
defined in the configuration file:</para>
<variablelist>
<varlistentry>
<term>clientPort</term>
<listitem>
<para>the port to listen on for client connections; that is, the
port that clients attempt to connect to.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>dataDir</term>
<listitem>
<para>the location where ZooKeeper will store the in-memory
database snapshots and, unless specified otherwise, the
transaction log of updates to the database.</para>
<note>
<para>Be careful where you put the transaction log. A
dedicated transaction log device is key to consistent good
performance. Putting the log on a busy device will adversely
affect performance.</para>
</note>
</listitem>
</varlistentry>
<varlistentry id="id_tickTime">
<term>tickTime</term>
<listitem>
<para>the length of a single tick, which is the basic time
unit used by ZooKeeper, as measured in milliseconds. It is
used to regulate heartbeats and timeouts. For example, the
minimum session timeout will be two ticks.</para>
</listitem>
</varlistentry>
</variablelist>
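<para>Put together, a minimal configuration file might look like the
following sketch. The values are the ones used in the sample
configuration in the deployment chapter, not required defaults. With
tickTime=2000, the minimum session timeout of two ticks works out to
2 x 2000 ms = 4000 ms.</para>
<screen>
# illustrative values only; adjust the path and port for your hosts
tickTime=2000
dataDir=/var/zookeeper/
clientPort=2181</screen>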
</section>
<section id="sc_advancedConfiguration">
<title>Advanced Configuration</title>
<para>The configuration settings in this section are optional. You
can use them to further fine-tune the behavior of your ZooKeeper
servers. Some can also be set using Java system properties,
generally of the form <emphasis>zookeeper.keyword</emphasis>. The
exact system property, when available, is noted below.</para>
<variablelist>
<varlistentry>
<term>dataLogDir</term>
<listitem>
<para>(No Java system property)</para>
<para>This option will direct the machine to write the
transaction log to the <emphasis
role="bold">dataLogDir</emphasis> rather than the <emphasis
role="bold">dataDir</emphasis>. This allows a dedicated log
device to be used, and helps avoid competition between logging
and snapshots.</para>
<note>
<para>Having a dedicated log device has a large impact on
throughput and stable latencies. It is highly recommended to
dedicate a log device and set <emphasis
role="bold">dataLogDir</emphasis> to point to a directory on
that device, and then make sure to point <emphasis
role="bold">dataDir</emphasis> to a directory
<emphasis>not</emphasis> residing on that device.</para>
</note>
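<para>A sketch of such a layout follows. Both paths are hypothetical;
the only assumption is that /zklog is the mount point of a device used
for nothing but the transaction log:</para>
<screen>
# hypothetical paths: snapshots and transaction log on different devices
dataDir=/var/zookeeper/
dataLogDir=/zklog/</screen>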
</listitem>
</varlistentry>
<varlistentry>
<term>globalOutstandingLimit</term>
<listitem>
<para>(Java system property: <emphasis
role="bold">zookeeper.globalOutstandingLimit</emphasis>)</para>
<para>Clients can submit requests faster than ZooKeeper can
process them, especially if there are a lot of clients. To
prevent ZooKeeper from running out of memory due to queued
requests, ZooKeeper will throttle clients so that there are no
more than globalOutstandingLimit outstanding requests in the
system. The default limit is 1,000.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>preAllocSize</term>
<listitem>
<para>(Java system property: <emphasis
role="bold">zookeeper.preAllocSize</emphasis>)</para>
<para>To avoid seeks, ZooKeeper allocates space in the
transaction log file in blocks of preAllocSize kilobytes. The
default block size is 64M. One reason for changing the size of
the blocks is to reduce the block size if snapshots are taken
more often. (Also, see <emphasis
role="bold">snapCount</emphasis>).</para>
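<para>Because the value is given in kilobytes, the 64M default
corresponds to 64 x 1024 = 65536. As an illustration only (16M is an
arbitrary example, not a recommendation), a smaller block size could be
requested through the system property noted above, passed on the Java
command line:</para>
<screen># illustrative value: 16 x 1024 = 16384 kilobytes
-Dzookeeper.preAllocSize=16384</screen>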
</listitem>
</varlistentry>
<varlistentry>
<term>snapCount</term>
<listitem>
<para>(Java system property: <emphasis
role="bold">zookeeper.snapCount</emphasis>)</para>
<para>ZooKeeper logs transactions
to a transaction log. After snapCount transactions are written
to a log file, a snapshot is started and a new transaction log
file is started. The default snapCount is 10,000.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>traceFile</term>
<listitem>
<para>(Java system property: <emphasis
role="bold">requestTraceFile</emphasis>)</para>
<para>If this option is defined, requests will be logged
to a trace file named traceFile.year.month.day. Use of this
option provides useful debugging information, but will impact
performance. (Note: The system property has no zookeeper
prefix, and the configuration variable name is different from
the system property. Yes - it's not consistent, and it's
annoying.)</para>
</listitem>
</varlistentry>
</variablelist>
</section>
<section id="sc_clusterOptions">
<title>Cluster Options</title>
<para>The options in this section are designed for use in quorums,
that is, when deploying clusters of servers.</para>
<variablelist>
<varlistentry>
<term>electionAlg</term>
<listitem>
<para>(No Java system property)</para>
<para>Election implementation to use. A value of "0"
corresponds to the original UDP-based version, "1" corresponds
to the non-authenticated UDP-based version of fast leader
election, "2" corresponds to the authenticated UDP-based
version of fast leader election, and "3" corresponds to the
TCP-based version of fast leader election.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>electionPort</term>
<listitem>
<para>(No Java system property)</para>
<para>Port used for leader election. It is only used when the
election algorithm is not "0". When the election algorithm is
"0", a UDP port with the same port number as the port listed in
the <emphasis role="bold">server.num</emphasis> option will be
used.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>initLimit</term>
<listitem>
<para>(No Java system property)</para>
<para>Amount of time, in ticks (see <ulink
url="#id_tickTime">tickTime</ulink>), to allow followers to
connect and sync to a leader. Increase this value as needed
if the amount of data managed by ZooKeeper is large.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>leaderServes</term>
<listitem>
<para>(Java system property: <emphasis
role="bold">zookeeper.leaderServes</emphasis>)</para>
<para>Whether the leader accepts client connections. The default
value is "yes", which means that a leader will accept client
connections. The leader machine coordinates updates. For higher
update throughput at the slight expense of read throughput,
the leader can be configured to not accept clients and focus
on coordination.</para>
<note>
<para>Turning on leader selection is highly recommended when
you have more than three ZooKeeper servers in a
quorum.</para>
</note>
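<para>As a sketch, a server could be told not to accept client
connections while it is leader by passing the system property noted
above on the Java command line. The value "no" is assumed here to be
the counterpart of the documented default "yes":</para>
<screen># assumption: "no" is the opposite of the documented default "yes"
-Dzookeeper.leaderServes=no</screen>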
</listitem>
</varlistentry>
<varlistentry>
<term>server.x=[hostname]:nnnn, etc</term>
<listitem>
<para>(No Java system property)</para>
<para>The servers making up the ZooKeeper quorum. When the server
starts up, it determines which server it is by looking for the
file <filename>myid</filename> in the data directory. That file contains the
server number, in ASCII, and it should match <emphasis
role="bold">x</emphasis> in <emphasis
role="bold">server.x</emphasis> on the left-hand side of this
setting.</para>
<para>The list of ZooKeeper servers used by the clients must
match the list of ZooKeeper
servers that each ZooKeeper server has.</para>
<para>The port numbers <emphasis role="bold">nnnn</emphasis>
in this setting are the <emphasis>electionPort</emphasis>
numbers of the servers (as opposed to clientPorts).
If you want to test multiple servers on a single
machine, the individual choices of electionPort for each
server can be defined in each server's config file using the
line electionPort=xxxx to avoid clashes.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>syncLimit</term>
<listitem>
<para>(No Java system property)</para>
<para>Amount of time, in ticks (see <ulink
url="#id_tickTime">tickTime</ulink>), to allow followers to
sync with ZooKeeper. If followers fall too far behind a
leader, they will be dropped.</para>
</listitem>
</varlistentry>
</variablelist>
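<para>Because initLimit and syncLimit are counted in ticks, their
wall-clock values depend on tickTime. Using the sample values from the
deployment chapter purely as an illustration:</para>
<screen>
# initLimit: 5 ticks x 2000 ms = 10000 ms for followers to connect and sync
# syncLimit: 2 ticks x 2000 ms = 4000 ms for followers to stay in sync
tickTime=2000
initLimit=5
syncLimit=2</screen>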
</section>
<section>
<title>Unsafe Options</title>
<para>The following options can be useful, but be careful when you
use them. The risk of each is explained along with the explanation
of what the variable does.</para>
<variablelist>
<varlistentry>
<term>forceSync</term>
<listitem>
<para>(Java system property: <emphasis
role="bold">zookeeper.forceSync</emphasis>)</para>
<para>Requires updates to be synced to the media of the
transaction log before finishing processing the update. If
this option is set to no, ZooKeeper will not require updates
to be synced to the media.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>jute.maxbuffer</term>
<listitem>
<para>(Java system property: <emphasis
role="bold">jute.maxbuffer</emphasis>)</para>
<para>This option can only be set as a Java system property.
There is no zookeeper prefix on it. It specifies the maximum
size of the data that can be stored in a znode. The default is
0xfffff, or just under 1M. If this option is changed, the
system property must be set on all servers and clients;
otherwise problems will arise. This is really a sanity check.
ZooKeeper is designed to store data on the order of kilobytes
in size.</para>
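<para>For reference, 0xfffff is 1048575 bytes, one byte short of 1M
(1048576 bytes). The following sketch sets the property explicitly to
that default, so behavior is unchanged; it assumes the property accepts
a decimal value, and the same option would have to be passed to every
server and client JVM:</para>
<screen># assumption: decimal form of the default 0xfffff
-Djute.maxbuffer=1048575</screen>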
</listitem>
</varlistentry>
<varlistentry>
<term>skipACL</term>
<listitem>
<para>(Java system property: <emphasis
role="bold">zookeeper.skipACL</emphasis>)</para>
<para>Skips ACL checks.
This results in a boost in throughput, but opens up full
access to the data tree to everyone.</para>
</listitem>
</varlistentry>
</variablelist>
</section>
</section>
<section id="sc_zkCommands">
<title>ZooKeeper Commands: The Four Letter Words</title>
<para>ZooKeeper responds to a small set of commands. Each command is composed of
four letters. You issue the commands to ZooKeeper via telnet or nc, at
the client port.</para>
<variablelist>
<varlistentry>
<term>dump</term>
<listitem>
<para>Lists the outstanding sessions and ephemeral nodes. This
only works on the leader.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>kill</term>
<listitem>
<para>Shuts down the server. This must be issued from the
machine the ZooKeeper server is running on.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>ruok</term>
<listitem>
<para>Tests if the server is running in a non-error state. The
server will respond with imok if it is running. Otherwise it
will not respond at all.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>stat</term>
<listitem>
<para>Lists statistics about performance and connected
clients.</para>
</listitem>
</varlistentry>
</variablelist>
<para>Here's an example of the <emphasis role="bold">ruok</emphasis>
command:</para>
<screen>$ echo ruok | nc 127.0.0.1 5111
imok
</screen>
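<para>The other commands are issued the same way. For example, assuming
a server on the local machine listening on client port 2181, as in the
sample configuration in the deployment chapter (the output is omitted
here because it varies from server to server):</para>
<screen># assumption: server on localhost, clientPort=2181
$ echo stat | nc 127.0.0.1 2181</screen>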
</section>
<section id="sc_monitoring">
<title>Monitoring</title>
<remark>[tbd]</remark>
</section>
<section id="sc_dataFileManagement">
<title>Data File Management</title>
<para>ZooKeeper stores its data in a data directory and its transaction
log in a transaction log directory. By default these two directories are
the same. The server can (and should) be configured to store the
transaction log files in a separate directory from the data files.
Throughput increases and latency decreases when transaction logs reside
on a dedicated log device.</para>
<section>
<title>The Data Directory</title>
<para>This directory has two files in it:</para>
<itemizedlist>
<listitem>
<para><filename>myid</filename> - contains a single integer in
human-readable ASCII text that represents the server id.</para>
</listitem>
<listitem>
<para><filename>snapshot.&lt;zxid&gt;</filename> - holds the fuzzy
snapshot of a data tree.</para>
</listitem>
</itemizedlist>
<para>Each ZooKeeper server has a unique id. This id is used in two
places: the <filename>myid</filename> file and the configuration file.
The <filename>myid</filename> file identifies the server that
corresponds to the given data directory. The configuration file lists
the contact information for each server identified by its server id.
When a ZooKeeper server instance starts, it reads its id from the
<filename>myid</filename> file and then, using that id, reads from the
configuration file, looking up the port on which it should
listen.</para>
<para>The <filename>snapshot</filename> files stored in the data
directory are fuzzy snapshots in the sense that during the time the
ZooKeeper server is taking the snapshot, updates are occurring to the
data tree. The suffix of the <filename>snapshot</filename> file names
is the <emphasis>zxid</emphasis>, the ZooKeeper transaction id, of the
last committed transaction at the start of the snapshot. Thus, the
snapshot includes a subset of the updates to the data tree that
occurred while the snapshot was in process. The snapshot, then, may
not correspond to any data tree that actually existed, and for this
reason we refer to it as a fuzzy snapshot. Still, ZooKeeper can
recover using this snapshot because it takes advantage of the
idempotent nature of its updates. By replaying the transaction log
against fuzzy snapshots, ZooKeeper gets the state of the system at the
end of the log.</para>
</section>
<section>
<title>The Log Directory</title>
<para>The Log Directory contains the ZooKeeper transaction logs.
Before any update takes place, ZooKeeper ensures that the transaction
that represents the update is written to non-volatile storage. A new
log file is started each time a snapshot is begun. The log file's
suffix is the first zxid written to that log.</para>
</section>
<section>
<title>File Management</title>
<para>The format of snapshot and log files does not change between
standalone ZooKeeper servers and different configurations of
replicated ZooKeeper servers. Therefore, you can pull these files from
a running replicated ZooKeeper server to a development machine with a
standalone ZooKeeper server for troubleshooting.</para>
<para>Using older log and snapshot files, you can look at the previous
state of ZooKeeper servers and even restore that state. The
LogFormatter class allows an administrator to look at the transactions
in a log.</para>
<para>The ZooKeeper server creates snapshot and log files, but never
deletes them. The retention policy of the data and log files is
implemented outside of the ZooKeeper server. The server itself only
needs the latest complete fuzzy snapshot and the log files from the
start of that snapshot. The PurgeTxnLog utility implements a simple
retention policy that administrators can use.</para>
</section>
</section>
<section id="sc_commonProblems">
<title>Things to Avoid</title>
<para>Here are some common problems you can avoid by configuring
ZooKeeper correctly:</para>
<variablelist>
<varlistentry>
<term>inconsistent lists of servers</term>
<listitem>
<para>The list of ZooKeeper servers used by the clients must match
the list of ZooKeeper servers that each ZooKeeper server has.
Things work okay if the client list is a subset of the real list,
but things will behave strangely if clients have a list of
ZooKeeper servers that are in different ZooKeeper clusters. Also,
the server lists in each ZooKeeper server configuration file
should be consistent with one another.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>incorrect placement of transaction log</term>
<listitem>
<para>The most performance-critical part of ZooKeeper is the
transaction log. ZooKeeper syncs transactions to media before it
returns a response. A dedicated transaction log device is key to
consistent good performance. Putting the log on a busy device will
adversely affect performance. If you only have one storage device,
put trace files on NFS and increase the snapCount; it doesn't
eliminate the problem, but it should mitigate it.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>incorrect Java heap size</term>
<listitem>
<para>You should take special care to set your Java max heap size
correctly. In particular, you should not create a situation in
which ZooKeeper swaps to disk. The disk is death to ZooKeeper.
Everything is ordered, so if processing one request swaps to
disk, all other queued requests will probably do the same.
DON'T SWAP.</para>
<para>Be conservative in your estimates: if you have 4G of RAM, do
not set the Java max heap size to 6G or even 4G. For example, it
is more likely you would use a 3G heap for a 4G machine, as the
operating system and the cache also need memory. The best and only
recommended practice for estimating the heap size your system needs
is to run load tests, and then make sure you are well below the
usage limit that would cause the system to swap.</para>
</listitem>
</varlistentry>
</variablelist>
</section>
<section id="sc_bestPractices">
<title>Best Practices</title>
<para>For best results, take note of the following list of good
ZooKeeper practices. <remark>[tbd...]</remark></para>
</section>
</chapter>
</book>