|
@@ -231,6 +231,17 @@ document.write("Last Published: " + document.lastModified);
|
|
<a href="#sc_administering">Administering</a>
|
|
<a href="#sc_administering">Administering</a>
|
|
</li>
|
|
</li>
|
|
<li>
|
|
<li>
|
|
|
|
+<a href="#sc_maintenance">Maintenance</a>
|
|
|
|
+<ul class="minitoc">
|
|
|
|
+<li>
|
|
|
|
+<a href="#Ongoing+Data+Directory+Cleanup">Ongoing Data Directory Cleanup</a>
|
|
|
|
+</li>
|
|
|
|
+<li>
|
|
|
|
+<a href="#Debug+Log+Cleanup+%28log4j%29">Debug Log Cleanup (log4j)</a>
|
|
|
|
+</li>
|
|
|
|
+</ul>
|
|
|
|
+</li>
|
|
|
|
+<li>
|
|
<a href="#sc_monitoring">Monitoring</a>
|
|
<a href="#sc_monitoring">Monitoring</a>
|
|
</li>
|
|
</li>
|
|
<li>
|
|
<li>
|
|
@@ -269,7 +280,7 @@ document.write("Last Published: " + document.lastModified);
|
|
<a href="#The+Log+Directory">The Log Directory</a>
|
|
<a href="#The+Log+Directory">The Log Directory</a>
|
|
</li>
|
|
</li>
|
|
<li>
|
|
<li>
|
|
-<a href="#File+Management">File Management</a>
|
|
|
|
|
|
+<a href="#sc_filemanagement">File Management</a>
|
|
</li>
|
|
</li>
|
|
</ul>
|
|
</ul>
|
|
</li>
|
|
</li>
|
|
@@ -472,7 +483,7 @@ server.3=zoo3:2888:3888</span>
|
|
consists of a single line containing only the text of that machine's
|
|
consists of a single line containing only the text of that machine's
|
|
id. So <span class="codefrag filename">myid</span> of server 1 would contain the text
|
|
id. So <span class="codefrag filename">myid</span> of server 1 would contain the text
|
|
"1" and nothing else. The id must be unique within the
|
|
"1" and nothing else. The id must be unique within the
|
|
- ensemble.</p>
|
|
|
|
|
|
+ ensemble and should have a value between 1 and 255.</p>
|
|
|
|
|
|
</li>
|
|
</li>
|
|
|
|
|
|
@@ -626,6 +637,15 @@ server.3=zoo3:2888:3888</span>
|
|
</li>
|
|
</li>
|
|
|
|
|
|
|
|
|
|
|
|
+<li>
|
|
|
|
+
|
|
|
|
+<p>
|
|
|
|
+<a href="#sc_maintenance">Maintenance</a>
|
|
|
|
+</p>
|
|
|
|
+
|
|
|
|
+</li>
|
|
|
|
+
|
|
|
|
+
|
|
<li>
|
|
<li>
|
|
|
|
|
|
<p>
|
|
<p>
|
|
@@ -698,7 +718,7 @@ server.3=zoo3:2888:3888</span>
|
|
</li>
|
|
</li>
|
|
|
|
|
|
</ul>
|
|
</ul>
|
|
-<a name="N101A6"></a><a name="sc_designing"></a>
|
|
|
|
|
|
+<a name="N101AE"></a><a name="sc_designing"></a>
|
|
<h3 class="h4">Designing a ZooKeeper Deployment</h3>
|
|
<h3 class="h4">Designing a ZooKeeper Deployment</h3>
|
|
<p>The reliablity of ZooKeeper rests on two basic assumptions.</p>
|
|
<p>The reliablity of ZooKeeper rests on two basic assumptions.</p>
|
|
<ol>
|
|
<ol>
|
|
@@ -725,7 +745,7 @@ server.3=zoo3:2888:3888</span>
|
|
to hold true. Some of these are cross-machines considerations,
|
|
to hold true. Some of these are cross-machines considerations,
|
|
and others are things you should consider for each and every
|
|
and others are things you should consider for each and every
|
|
machine in your deployment.</p>
|
|
machine in your deployment.</p>
|
|
-<a name="N101C2"></a><a name="sc_CrossMachineRequirements"></a>
|
|
|
|
|
|
+<a name="N101CA"></a><a name="sc_CrossMachineRequirements"></a>
|
|
<h4>Cross Machine Requirements</h4>
|
|
<h4>Cross Machine Requirements</h4>
|
|
<p>For the ZooKeeper service to be active, there must be a
|
|
<p>For the ZooKeeper service to be active, there must be a
|
|
majority of non-failing machines that can communicate with
|
|
majority of non-failing machines that can communicate with
|
|
@@ -743,7 +763,7 @@ server.3=zoo3:2888:3888</span>
|
|
failure of that switch could cause a correlated failure and
|
|
failure of that switch could cause a correlated failure and
|
|
bring down the service. The same holds true of shared power
|
|
bring down the service. The same holds true of shared power
|
|
circuits, cooling systems, etc.</p>
|
|
circuits, cooling systems, etc.</p>
|
|
-<a name="N101CF"></a><a name="Single+Machine+Requirements"></a>
|
|
|
|
|
|
+<a name="N101D7"></a><a name="Single+Machine+Requirements"></a>
|
|
<h4>Single Machine Requirements</h4>
|
|
<h4>Single Machine Requirements</h4>
|
|
<p>If ZooKeeper has to contend with other applications for
|
|
<p>If ZooKeeper has to contend with other applications for
|
|
access to resourses like storage media, CPU, network, or
|
|
access to resourses like storage media, CPU, network, or
|
|
@@ -784,19 +804,61 @@ server.3=zoo3:2888:3888</span>
|
|
</li>
|
|
</li>
|
|
|
|
|
|
</ul>
|
|
</ul>
|
|
-<a name="N101ED"></a><a name="sc_provisioning"></a>
|
|
|
|
|
|
+<a name="N101F5"></a><a name="sc_provisioning"></a>
|
|
<h3 class="h4">Provisioning</h3>
|
|
<h3 class="h4">Provisioning</h3>
|
|
<p></p>
|
|
<p></p>
|
|
-<a name="N101F6"></a><a name="sc_strengthsAndLimitations"></a>
|
|
|
|
|
|
+<a name="N101FE"></a><a name="sc_strengthsAndLimitations"></a>
|
|
<h3 class="h4">Things to Consider: ZooKeeper Strengths and Limitations</h3>
|
|
<h3 class="h4">Things to Consider: ZooKeeper Strengths and Limitations</h3>
|
|
<p></p>
|
|
<p></p>
|
|
-<a name="N101FF"></a><a name="sc_administering"></a>
|
|
|
|
|
|
+<a name="N10207"></a><a name="sc_administering"></a>
|
|
<h3 class="h4">Administering</h3>
|
|
<h3 class="h4">Administering</h3>
|
|
<p></p>
|
|
<p></p>
|
|
-<a name="N10208"></a><a name="sc_monitoring"></a>
|
|
|
|
|
|
+<a name="N10210"></a><a name="sc_maintenance"></a>
|
|
|
|
+<h3 class="h4">Maintenance</h3>
|
|
|
|
+<p>Little long-term maintenance is required for a ZooKeeper
|
|
|
|
+ cluster; however, you must be aware of the following:</p>
|
|
|
|
+<a name="N10219"></a><a name="Ongoing+Data+Directory+Cleanup"></a>
|
|
|
|
+<h4>Ongoing Data Directory Cleanup</h4>
|
|
|
|
+<p>The ZooKeeper <a href="#var_datadir">Data
|
|
|
|
+ Directory</a> contains files which are a persistent copy
|
|
|
|
+ of the znodes stored by a particular serving ensemble. These
|
|
|
|
+ are the snapshot and transactional log files. As changes are
|
|
|
|
+ made to the znodes these changes are appended to a
|
|
|
|
+ transaction log. Occasionally, when a log grows large, a
|
|
|
|
+ snapshot of the current state of all znodes will be written
|
|
|
|
+ to the filesystem. This snapshot supersedes all previous
|
|
|
|
+ logs.
|
|
|
|
+ </p>
|
|
|
|
+<p>A ZooKeeper server <strong>will not remove
|
|
|
|
+ old snapshots and log files</strong>; this is the
|
|
|
|
+ responsibility of the operator. Every serving environment is
|
|
|
|
+ different and therefore the requirements of managing these
|
|
|
|
+ files may differ from install to install (backup for example).
|
|
|
|
+ </p>
|
|
|
|
+<p>The PurgeTxnLog utility implements a simple retention
|
|
|
|
+ policy that administrators can use. The <a href="api/index.html">API docs</a> contain details on
|
|
|
|
+ calling conventions (arguments, etc.).
|
|
|
|
+ </p>
|
|
|
|
+<p>In the following example, the last &lt;count&gt; snapshots and
|
|
|
|
+ their corresponding logs are retained and the others are
|
|
|
|
+ deleted. The value of &lt;count&gt; should typically be
|
|
|
|
+ greater than 3 (although not required, this provides 3 backups
|
|
|
|
+ in the unlikely event a recent log has become corrupted). This
|
|
|
|
+ can be run as a cron job on the ZooKeeper server machines to
|
|
|
|
+ clean up the logs daily.</p>
|
|
|
|
+<pre class="code"> java -cp zookeeper.jar:log4j.jar:conf org.apache.zookeeper.server.PurgeTxnLog &lt;dataDir&gt; &lt;snapDir&gt; -n &lt;count&gt;</pre>
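The cron job mentioned above could be wired up along the following lines. This is a minimal sketch rather than anything shipped with ZooKeeper: the function names `purge_cmd` and `run_purge`, the `ZK_CLASSPATH` and `PURGE_RUN` variables, and the example paths are all hypothetical, and the sketch assumes paths without spaces. Only the `PurgeTxnLog` command line itself comes from the text above.

```shell
#!/bin/sh
# Hypothetical wrapper around the PurgeTxnLog command line shown above.
# The classpath default mirrors the example; adjust it to your install.
ZK_CLASSPATH="${ZK_CLASSPATH:-zookeeper.jar:log4j.jar:conf}"

# Print the purge command for the given dataDir, snapDir, and count.
purge_cmd() {
  printf 'java -cp %s org.apache.zookeeper.server.PurgeTxnLog %s %s -n %s\n' \
    "$ZK_CLASSPATH" "$1" "$2" "$3"
}

# Dry run by default: echo the command without deleting anything.
# Set PURGE_RUN=1 (an assumed switch) to actually execute it.
run_purge() {
  cmd=$(purge_cmd "$1" "$2" "$3")
  if [ "${PURGE_RUN:-0}" = "1" ]; then
    sh -c "$cmd"
  else
    echo "$cmd"
  fi
}
```

If the functions are saved in a script that ends by calling `run_purge` with your directories and count, a crontab entry such as `0 3 * * * PURGE_RUN=1 /path/to/zk-purge.sh` (path hypothetical) would run the cleanup daily at 03:00. Running it once without `PURGE_RUN=1` first lets you inspect the exact command before any files are removed.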
|
|
|
|
+<a name="N1023A"></a><a name="Debug+Log+Cleanup+%28log4j%29"></a>
|
|
|
|
+<h4>Debug Log Cleanup (log4j)</h4>
|
|
|
|
+<p>See the section on <a href="#sc_logging">logging</a> in this document. It is
|
|
|
|
+ expected that you will set up a rolling file appender using the
|
|
|
|
+ built-in log4j feature. The sample configuration file in the
|
|
|
|
+ release tar's conf/log4j.properties provides an example of
|
|
|
|
+ this.
|
|
|
|
+ </p>
|
|
|
|
+<a name="N10249"></a><a name="sc_monitoring"></a>
|
|
<h3 class="h4">Monitoring</h3>
|
|
<h3 class="h4">Monitoring</h3>
|
|
<p></p>
|
|
<p></p>
|
|
-<a name="N10211"></a><a name="sc_logging"></a>
|
|
|
|
|
|
+<a name="N10252"></a><a name="sc_logging"></a>
|
|
<h3 class="h4">Logging</h3>
|
|
<h3 class="h4">Logging</h3>
|
|
<p>ZooKeeper uses <strong>log4j</strong> version 1.2 as
|
|
<p>ZooKeeper uses <strong>log4j</strong> version 1.2 as
|
|
its logging infrastructure. The ZooKeeper default <span class="codefrag filename">log4j.properties</span>
|
|
its logging infrastructure. The ZooKeeper default <span class="codefrag filename">log4j.properties</span>
|
|
@@ -806,10 +868,10 @@ server.3=zoo3:2888:3888</span>
|
|
<p>For more information, see
|
|
<p>For more information, see
|
|
<a href="http://logging.apache.org/log4j/1.2/manual.html#defaultInit">Log4j Default Initialization Procedure</a>
|
|
<a href="http://logging.apache.org/log4j/1.2/manual.html#defaultInit">Log4j Default Initialization Procedure</a>
|
|
of the log4j manual.</p>
|
|
of the log4j manual.</p>
|
|
-<a name="N10231"></a><a name="sc_troubleshooting"></a>
|
|
|
|
|
|
+<a name="N10272"></a><a name="sc_troubleshooting"></a>
|
|
<h3 class="h4">Troubleshooting</h3>
|
|
<h3 class="h4">Troubleshooting</h3>
|
|
<p></p>
|
|
<p></p>
|
|
-<a name="N1023A"></a><a name="sc_configuration"></a>
|
|
|
|
|
|
+<a name="N1027B"></a><a name="sc_configuration"></a>
|
|
<h3 class="h4">Configuration Parameters</h3>
|
|
<h3 class="h4">Configuration Parameters</h3>
|
|
<p>ZooKeeper's behavior is governed by the ZooKeeper configuration
|
|
<p>ZooKeeper's behavior is governed by the ZooKeeper configuration
|
|
file. This file is designed so that the exact same file can be used by
|
|
file. This file is designed so that the exact same file can be used by
|
|
@@ -817,7 +879,7 @@ server.3=zoo3:2888:3888</span>
|
|
layouts are the same. If servers use different configuration files, care
|
|
layouts are the same. If servers use different configuration files, care
|
|
must be taken to ensure that the list of servers in all of the different
|
|
must be taken to ensure that the list of servers in all of the different
|
|
configuration files match.</p>
|
|
configuration files match.</p>
|
|
-<a name="N10243"></a><a name="sc_minimumConfiguration"></a>
|
|
|
|
|
|
+<a name="N10284"></a><a name="sc_minimumConfiguration"></a>
|
|
<h4>Minimum Configuration</h4>
|
|
<h4>Minimum Configuration</h4>
|
|
<p>Here are the minimum configuration keywords that must be defined
|
|
<p>Here are the minimum configuration keywords that must be defined
|
|
in the configuration file:</p>
|
|
in the configuration file:</p>
|
|
@@ -864,7 +926,7 @@ server.3=zoo3:2888:3888</span>
|
|
</dd>
|
|
</dd>
|
|
|
|
|
|
</dl>
|
|
</dl>
|
|
-<a name="N1026A"></a><a name="sc_advancedConfiguration"></a>
|
|
|
|
|
|
+<a name="N102AB"></a><a name="sc_advancedConfiguration"></a>
|
|
<h4>Advanced Configuration</h4>
|
|
<h4>Advanced Configuration</h4>
|
|
<p>The configuration settings in the section are optional. You can
|
|
<p>The configuration settings in the section are optional. You can
|
|
use them to further fine tune the behaviour of your ZooKeeper servers.
|
|
use them to further fine tune the behaviour of your ZooKeeper servers.
|
|
@@ -955,7 +1017,7 @@ server.3=zoo3:2888:3888</span>
|
|
</dd>
|
|
</dd>
|
|
|
|
|
|
</dl>
|
|
</dl>
|
|
-<a name="N102CA"></a><a name="sc_clusterOptions"></a>
|
|
|
|
|
|
+<a name="N1030B"></a><a name="sc_clusterOptions"></a>
|
|
<h4>Cluster Options</h4>
|
|
<h4>Cluster Options</h4>
|
|
<p>The options in this section are designed for use with an ensemble
|
|
<p>The options in this section are designed for use with an ensemble
|
|
of servers -- that is, when deploying clusters of servers.</p>
|
|
of servers -- that is, when deploying clusters of servers.</p>
|
|
@@ -1045,7 +1107,7 @@ server.3=zoo3:2888:3888</span>
|
|
|
|
|
|
</dl>
|
|
</dl>
|
|
<p></p>
|
|
<p></p>
|
|
-<a name="N10327"></a><a name="Unsafe+Options"></a>
|
|
|
|
|
|
+<a name="N10368"></a><a name="Unsafe+Options"></a>
|
|
<h4>Unsafe Options</h4>
|
|
<h4>Unsafe Options</h4>
|
|
<p>The following options can be useful, but be careful when you use
|
|
<p>The following options can be useful, but be careful when you use
|
|
them. The risk of each is explained along with the explanation of what
|
|
them. The risk of each is explained along with the explanation of what
|
|
@@ -1090,7 +1152,7 @@ server.3=zoo3:2888:3888</span>
|
|
</dd>
|
|
</dd>
|
|
|
|
|
|
</dl>
|
|
</dl>
|
|
-<a name="N10359"></a><a name="sc_zkCommands"></a>
|
|
|
|
|
|
+<a name="N1039A"></a><a name="sc_zkCommands"></a>
|
|
<h3 class="h4">ZooKeeper Commands: The Four Letter Words</h3>
|
|
<h3 class="h4">ZooKeeper Commands: The Four Letter Words</h3>
|
|
<p>ZooKeeper responds to a small set of commands. Each command is
|
|
<p>ZooKeeper responds to a small set of commands. Each command is
|
|
composed of four letters. You issue the commands to ZooKeeper via telnet
|
|
composed of four letters. You issue the commands to ZooKeeper via telnet
|
|
@@ -1163,7 +1225,7 @@ server.3=zoo3:2888:3888</span>
|
|
<pre class="code">$ echo ruok | nc 127.0.0.1 5111
|
|
<pre class="code">$ echo ruok | nc 127.0.0.1 5111
|
|
imok
|
|
imok
|
|
</pre>
|
|
</pre>
|
|
-<a name="N103A0"></a><a name="sc_dataFileManagement"></a>
|
|
|
|
|
|
+<a name="N103E1"></a><a name="sc_dataFileManagement"></a>
|
|
<h3 class="h4">Data File Management</h3>
|
|
<h3 class="h4">Data File Management</h3>
|
|
<p>ZooKeeper stores its data in a data directory and its transaction
|
|
<p>ZooKeeper stores its data in a data directory and its transaction
|
|
log in a transaction log directory. By default these two directories are
|
|
log in a transaction log directory. By default these two directories are
|
|
@@ -1171,7 +1233,7 @@ imok
|
|
transaction log files in a separate directory than the data files.
|
|
transaction log files in a separate directory than the data files.
|
|
Throughput increases and latency decreases when transaction logs reside
|
|
Throughput increases and latency decreases when transaction logs reside
|
|
on a dedicated log devices.</p>
|
|
on a dedicated log devices.</p>
|
|
-<a name="N103A9"></a><a name="The+Data+Directory"></a>
|
|
|
|
|
|
+<a name="N103EA"></a><a name="The+Data+Directory"></a>
|
|
<h4>The Data Directory</h4>
|
|
<h4>The Data Directory</h4>
|
|
<p>This directory has two files in it:</p>
|
|
<p>This directory has two files in it:</p>
|
|
<ul>
|
|
<ul>
|
|
@@ -1217,14 +1279,14 @@ imok
|
|
idempotent nature of its updates. By replaying the transaction log
|
|
idempotent nature of its updates. By replaying the transaction log
|
|
against fuzzy snapshots ZooKeeper gets the state of the system at the
|
|
against fuzzy snapshots ZooKeeper gets the state of the system at the
|
|
end of the log.</p>
|
|
end of the log.</p>
|
|
-<a name="N103E5"></a><a name="The+Log+Directory"></a>
|
|
|
|
|
|
+<a name="N10426"></a><a name="The+Log+Directory"></a>
|
|
<h4>The Log Directory</h4>
|
|
<h4>The Log Directory</h4>
|
|
<p>The Log Directory contains the ZooKeeper transaction logs.
|
|
<p>The Log Directory contains the ZooKeeper transaction logs.
|
|
Before any update takes place, ZooKeeper ensures that the transaction
|
|
Before any update takes place, ZooKeeper ensures that the transaction
|
|
that represents the update is written to non-volatile storage. A new
|
|
that represents the update is written to non-volatile storage. A new
|
|
log file is started each time a snapshot is begun. The log file's
|
|
log file is started each time a snapshot is begun. The log file's
|
|
suffix is the first zxid written to that log.</p>
|
|
suffix is the first zxid written to that log.</p>
|
|
-<a name="N103EF"></a><a name="File+Management"></a>
|
|
|
|
|
|
+<a name="N10430"></a><a name="sc_filemanagement"></a>
|
|
<h4>File Management</h4>
|
|
<h4>File Management</h4>
|
|
<p>The format of snapshot and log files does not change between
|
|
<p>The format of snapshot and log files does not change between
|
|
standalone ZooKeeper servers and different configurations of
|
|
standalone ZooKeeper servers and different configurations of
|
|
@@ -1235,13 +1297,16 @@ imok
|
|
state of ZooKeeper servers and even restore that state. The
|
|
state of ZooKeeper servers and even restore that state. The
|
|
LogFormatter class allows an administrator to look at the transactions
|
|
LogFormatter class allows an administrator to look at the transactions
|
|
in a log.</p>
|
|
in a log.</p>
|
|
-<p>The ZooKeeper server creates snapshot and log files, but never
|
|
|
|
- deletes them. The retention policy of the data and log files is
|
|
|
|
- implemented outside of the ZooKeeper server. The server itself only
|
|
|
|
- needs the latest complete fuzzy snapshot and the log files from the
|
|
|
|
- start of that snapshot. The PurgeTxnLog utility implements a simple
|
|
|
|
- retention policy that administrators can use.</p>
|
|
|
|
-<a name="N10400"></a><a name="sc_commonProblems"></a>
|
|
|
|
|
|
+<p>The ZooKeeper server creates snapshot and log files, but
|
|
|
|
+ never deletes them. The retention policy of the data and log
|
|
|
|
+ files is implemented outside of the ZooKeeper server. The
|
|
|
|
+ server itself only needs the latest complete fuzzy snapshot
|
|
|
|
+ and the log files from the start of that snapshot. See the
|
|
|
|
+ <a href="#sc_maintenance">maintenance</a> section in
|
|
|
|
+ this document for more details on setting a retention policy
|
|
|
|
+ and maintenance of ZooKeeper storage.
|
|
|
|
+ </p>
|
|
|
|
+<a name="N10445"></a><a name="sc_commonProblems"></a>
|
|
<h3 class="h4">Things to Avoid</h3>
|
|
<h3 class="h4">Things to Avoid</h3>
|
|
<p>Here are some common problems you can avoid by configuring
|
|
<p>Here are some common problems you can avoid by configuring
|
|
ZooKeeper correctly:</p>
|
|
ZooKeeper correctly:</p>
|
|
@@ -1295,7 +1360,7 @@ imok
|
|
</dd>
|
|
</dd>
|
|
|
|
|
|
</dl>
|
|
</dl>
|
|
-<a name="N10424"></a><a name="sc_bestPractices"></a>
|
|
|
|
|
|
+<a name="N10469"></a><a name="sc_bestPractices"></a>
|
|
<h3 class="h4">Best Practices</h3>
|
|
<h3 class="h4">Best Practices</h3>
|
|
<p>For best results, take note of the following list of good
|
|
<p>For best results, take note of the following list of good
|
|
Zookeeper practices. <em>[tbd...]</em>
|
|
Zookeeper practices. <em>[tbd...]</em>
|