
ZOOKEEPER-485. Need ops documentation that details supervision of ZK server processes. (phunt via mahadev)

git-svn-id: https://svn.apache.org/repos/asf/hadoop/zookeeper/trunk@915956 13f79535-47bb-0310-9956-ffa450edef68
Mahadev Konar 15 years ago
parent
commit
7c30839a67

+ 3 - 0
CHANGES.txt

@@ -293,6 +293,9 @@ IMPROVEMENTS:
 
   ZOOKEEPER-607. improve bookkeeper overview (flavio via mahadev)
 
+  ZOOKEEPER-485. Need ops documentation that details supervision of ZK server
+  processes. (phunt via mahadev)
+
 NEW FEATURES:
   ZOOKEEPER-539. generate eclipse project via ant target. (phunt via mahadev)
 

+ 59 - 26
docs/zookeeperAdmin.html

@@ -263,6 +263,9 @@ document.write("Last Published: " + document.lastModified);
 </ul>
 </li>
 <li>
+<a href="#sc_supervision">Supervision</a>
+</li>
+<li>
 <a href="#sc_monitoring">Monitoring</a>
 </li>
 <li>
@@ -670,6 +673,15 @@ server.3=zoo3:2888:3888</span>
 </li>
 
         
+<li>
+          
+<p>
+<a href="#sc_supervision">Supervision</a>
+</p>
+        
+</li>
+
+        
 <li>
           
 <p>
@@ -742,7 +754,7 @@ server.3=zoo3:2888:3888</span>
 </li>
       
 </ul>
-<a name="N101AE"></a><a name="sc_designing"></a>
+<a name="N101B6"></a><a name="sc_designing"></a>
 <h3 class="h4">Designing a ZooKeeper Deployment</h3>
 <p>The reliablity of ZooKeeper rests on two basic assumptions.</p>
 <ol>
@@ -769,7 +781,7 @@ server.3=zoo3:2888:3888</span>
       to hold true. Some of these are cross-machines considerations,
       and others are things you should consider for each and every
       machine in your deployment.</p>
-<a name="N101CA"></a><a name="sc_CrossMachineRequirements"></a>
+<a name="N101D2"></a><a name="sc_CrossMachineRequirements"></a>
 <h4>Cross Machine Requirements</h4>
 <p>For the ZooKeeper service to be active, there must be a
         majority of non-failing machines that can communicate with
@@ -787,7 +799,7 @@ server.3=zoo3:2888:3888</span>
         failure of that switch could cause a correlated failure and
         bring down the service. The same holds true of shared power
         circuits, cooling systems, etc.</p>
-<a name="N101D7"></a><a name="Single+Machine+Requirements"></a>
+<a name="N101DF"></a><a name="Single+Machine+Requirements"></a>
 <h4>Single Machine Requirements</h4>
 <p>If ZooKeeper has to contend with other applications for
         access to resourses like storage media, CPU, network, or
@@ -828,20 +840,20 @@ server.3=zoo3:2888:3888</span>
 </li>
       
 </ul>
-<a name="N101F5"></a><a name="sc_provisioning"></a>
+<a name="N101FD"></a><a name="sc_provisioning"></a>
 <h3 class="h4">Provisioning</h3>
 <p></p>
-<a name="N101FE"></a><a name="sc_strengthsAndLimitations"></a>
+<a name="N10206"></a><a name="sc_strengthsAndLimitations"></a>
 <h3 class="h4">Things to Consider: ZooKeeper Strengths and Limitations</h3>
 <p></p>
-<a name="N10207"></a><a name="sc_administering"></a>
+<a name="N1020F"></a><a name="sc_administering"></a>
 <h3 class="h4">Administering</h3>
 <p></p>
-<a name="N10210"></a><a name="sc_maintenance"></a>
+<a name="N10218"></a><a name="sc_maintenance"></a>
 <h3 class="h4">Maintenance</h3>
 <p>Little long term maintenance is required for a ZooKeeper
         cluster however you must be aware of the following:</p>
-<a name="N10219"></a><a name="Ongoing+Data+Directory+Cleanup"></a>
+<a name="N10221"></a><a name="Ongoing+Data+Directory+Cleanup"></a>
 <h4>Ongoing Data Directory Cleanup</h4>
 <p>The ZooKeeper <a href="#var_datadir">Data
           Directory</a> contains files which are a persistent copy
@@ -871,7 +883,7 @@ server.3=zoo3:2888:3888</span>
         can be run as a cron job on the ZooKeeper server machines to
         clean up the logs daily.</p>
 <pre class="code"> java -cp zookeeper.jar:log4j.jar:conf org.apache.zookeeper.server.PurgeTxnLog &lt;dataDir&gt; &lt;snapDir&gt; -n &lt;count&gt;</pre>
-<a name="N1023A"></a><a name="Debug+Log+Cleanup+%28log4j%29"></a>
+<a name="N10242"></a><a name="Debug+Log+Cleanup+%28log4j%29"></a>
 <h4>Debug Log Cleanup (log4j)</h4>
 <p>See the section on <a href="#sc_logging">logging</a> in this document. It is
         expected that you will setup a rolling file appender using the
@@ -879,10 +891,31 @@ server.3=zoo3:2888:3888</span>
         release tar's conf/log4j.properties provides an example of
         this.
         </p>
-<a name="N10249"></a><a name="sc_monitoring"></a>
+<a name="N10251"></a><a name="sc_supervision"></a>
+<h3 class="h4">Supervision</h3>
+<p>You will want to have a supervisory process that manages
+      each of your ZooKeeper server processes (JVM). The ZK server is
+      designed to be "fail fast" meaning that it will shutdown
+      (process exit) if an error occurs that it cannot recover
+      from. As a ZooKeeper serving cluster is highly reliable, this
+      means that while the server may go down the cluster as a whole
+      is still active and serving requests. Additionally, as the
+      cluster is "self healing" the failed server once restarted will
+      automatically rejoin the ensemble w/o any manual
+      interaction.</p>
+<p>Having a supervisory process such as <a href="http://cr.yp.to/daemontools.html">daemontools</a> or
+      <a href="http://en.wikipedia.org/wiki/Service_Management_Facility">SMF</a>
+      (other options for supervisory process are also available, it's
+      up to you which one you would like to use, these are just two
+      examples) managing your ZooKeeper server ensures that if the
+      process does exit abnormally it will automatically be restarted
+      and will quickly rejoin the cluster.</p>
+<a name="N10266"></a><a name="sc_monitoring"></a>
 <h3 class="h4">Monitoring</h3>
-<p></p>
-<a name="N10252"></a><a name="sc_logging"></a>
+<p>The ZooKeeper service can be monitored in one of two
+      primary ways; 1) the command port through the use of <a href="#sc_zkCommands">4 letter words</a> and 2) <a href="zookeeperJMX.html">JMX</a>. See the appropriate section for
+      your environment/requirements.</p>
+<a name="N10278"></a><a name="sc_logging"></a>
 <h3 class="h4">Logging</h3>
 <p>ZooKeeper uses <strong>log4j</strong> version 1.2 as 
       its logging infrastructure. The  ZooKeeper default <span class="codefrag filename">log4j.properties</span> 
@@ -892,10 +925,10 @@ server.3=zoo3:2888:3888</span>
 <p>For more information, see 
       <a href="http://logging.apache.org/log4j/1.2/manual.html#defaultInit">Log4j Default Initialization Procedure</a> 
       of the log4j manual.</p>
-<a name="N10272"></a><a name="sc_troubleshooting"></a>
+<a name="N10298"></a><a name="sc_troubleshooting"></a>
 <h3 class="h4">Troubleshooting</h3>
 <p></p>
-<a name="N1027B"></a><a name="sc_configuration"></a>
+<a name="N102A1"></a><a name="sc_configuration"></a>
 <h3 class="h4">Configuration Parameters</h3>
 <p>ZooKeeper's behavior is governed by the ZooKeeper configuration
       file. This file is designed so that the exact same file can be used by
@@ -903,7 +936,7 @@ server.3=zoo3:2888:3888</span>
       layouts are the same. If servers use different configuration files, care
       must be taken to ensure that the list of servers in all of the different
       configuration files match.</p>
-<a name="N10284"></a><a name="sc_minimumConfiguration"></a>
+<a name="N102AA"></a><a name="sc_minimumConfiguration"></a>
 <h4>Minimum Configuration</h4>
 <p>Here are the minimum configuration keywords that must be defined
         in the configuration file:</p>
@@ -950,7 +983,7 @@ server.3=zoo3:2888:3888</span>
 </dd>
         
 </dl>
-<a name="N102AB"></a><a name="sc_advancedConfiguration"></a>
+<a name="N102D1"></a><a name="sc_advancedConfiguration"></a>
 <h4>Advanced Configuration</h4>
 <p>The configuration settings in the section are optional. You can
         use them to further fine tune the behaviour of your ZooKeeper servers.
@@ -1050,7 +1083,7 @@ server.3=zoo3:2888:3888</span>
 </dd>
         
 </dl>
-<a name="N10314"></a><a name="sc_clusterOptions"></a>
+<a name="N1033A"></a><a name="sc_clusterOptions"></a>
 <h4>Cluster Options</h4>
 <p>The options in this section are designed for use with an ensemble
         of servers -- that is, when deploying clusters of servers.</p>
@@ -1174,7 +1207,7 @@ server.3=zoo3:2888:3888</span>
         
 </dl>
 <p></p>
-<a name="N1038F"></a><a name="sc_authOptions"></a>
+<a name="N103B5"></a><a name="sc_authOptions"></a>
 <h4>Authentication &amp; Authorization Options</h4>
 <p>The options in this section allow control over
         authentication/authorization performed by the service.</p>
@@ -1208,7 +1241,7 @@ server.3=zoo3:2888:3888</span>
 </dd>
         
 </dl>
-<a name="N103B2"></a><a name="Unsafe+Options"></a>
+<a name="N103D8"></a><a name="Unsafe+Options"></a>
 <h4>Unsafe Options</h4>
 <p>The following options can be useful, but be careful when you use
         them. The risk of each is explained along with the explanation of what
@@ -1253,7 +1286,7 @@ server.3=zoo3:2888:3888</span>
 </dd>
         
 </dl>
-<a name="N103E4"></a><a name="sc_zkCommands"></a>
+<a name="N1040A"></a><a name="sc_zkCommands"></a>
 <h3 class="h4">ZooKeeper Commands: The Four Letter Words</h3>
 <p>ZooKeeper responds to a small set of commands. Each command is
       composed of four letters. You issue the commands to ZooKeeper via telnet
@@ -1374,7 +1407,7 @@ server.3=zoo3:2888:3888</span>
 <pre class="code">$ echo ruok | nc 127.0.0.1 5111
imok
 </pre>
-<a name="N1044C"></a><a name="sc_dataFileManagement"></a>
+<a name="N10472"></a><a name="sc_dataFileManagement"></a>
 <h3 class="h4">Data File Management</h3>
 <p>ZooKeeper stores its data in a data directory and its transaction
       log in a transaction log directory. By default these two directories are
@@ -1382,7 +1415,7 @@ imok
       transaction log files in a separate directory than the data files.
       Throughput increases and latency decreases when transaction logs reside
       on a dedicated log devices.</p>
-<a name="N10455"></a><a name="The+Data+Directory"></a>
+<a name="N1047B"></a><a name="The+Data+Directory"></a>
 <h4>The Data Directory</h4>
 <p>This directory has two files in it:</p>
 <ul>
@@ -1428,14 +1461,14 @@ imok
         idempotent nature of its updates. By replaying the transaction log
         against fuzzy snapshots ZooKeeper gets the state of the system at the
         end of the log.</p>
-<a name="N10491"></a><a name="The+Log+Directory"></a>
+<a name="N104B7"></a><a name="The+Log+Directory"></a>
 <h4>The Log Directory</h4>
 <p>The Log Directory contains the ZooKeeper transaction logs.
         Before any update takes place, ZooKeeper ensures that the transaction
         that represents the update is written to non-volatile storage. A new
         log file is started each time a snapshot is begun. The log file's
         suffix is the first zxid written to that log.</p>
-<a name="N1049B"></a><a name="sc_filemanagement"></a>
+<a name="N104C1"></a><a name="sc_filemanagement"></a>
 <h4>File Management</h4>
 <p>The format of snapshot and log files does not change between
         standalone ZooKeeper servers and different configurations of
@@ -1455,7 +1488,7 @@ imok
         this document for more details on setting a retention policy
         and maintenance of ZooKeeper storage.
         </p>
-<a name="N104B0"></a><a name="sc_commonProblems"></a>
+<a name="N104D6"></a><a name="sc_commonProblems"></a>
 <h3 class="h4">Things to Avoid</h3>
 <p>Here are some common problems you can avoid by configuring
       ZooKeeper correctly:</p>
@@ -1509,7 +1542,7 @@ imok
 </dd>
       
 </dl>
-<a name="N104D4"></a><a name="sc_bestPractices"></a>
+<a name="N104FA"></a><a name="sc_bestPractices"></a>
 <h3 class="h4">Best Practices</h3>
 <p>For best results, take note of the following list of good
       Zookeeper practices:</p>

File diff suppressed because it is too large
+ 21 - 10
docs/zookeeperAdmin.pdf


+ 34 - 1
src/docs/src/documentation/content/xdocs/zookeeperAdmin.xml

@@ -298,6 +298,10 @@ server.3=zoo3:2888:3888</computeroutput></para>
           <para><xref linkend="sc_maintenance" /></para>
         </listitem>
 
+        <listitem>
+          <para><xref linkend="sc_supervision" /></para>
+        </listitem>
+
         <listitem>
           <para><xref linkend="sc_monitoring" /></para>
         </listitem>
@@ -492,10 +496,39 @@ server.3=zoo3:2888:3888</computeroutput></para>
 
     </section>
 
+    <section id="sc_supervision">
+      <title>Supervision</title>
+
+      <para>You will want to have a supervisory process that manages
+      each of your ZooKeeper server processes (JVM). The ZK server is
+      designed to be "fail fast" meaning that it will shutdown
+      (process exit) if an error occurs that it cannot recover
+      from. As a ZooKeeper serving cluster is highly reliable, this
+      means that while the server may go down the cluster as a whole
+      is still active and serving requests. Additionally, as the
+      cluster is "self healing" the failed server once restarted will
+      automatically rejoin the ensemble w/o any manual
+      interaction.</para>
+
+      <para>Having a supervisory process such as <ulink
+      url="http://cr.yp.to/daemontools.html">daemontools</ulink> or
+      <ulink
+      url="http://en.wikipedia.org/wiki/Service_Management_Facility">SMF</ulink>
+      (other options for supervisory process are also available, it's
+      up to you which one you would like to use, these are just two
+      examples) managing your ZooKeeper server ensures that if the
+      process does exit abnormally it will automatically be restarted
+      and will quickly rejoin the cluster.</para>
+    </section>
+
     <section id="sc_monitoring">
       <title>Monitoring</title>
 
-      <para></para>
+      <para>The ZooKeeper service can be monitored in one of two
+      primary ways; 1) the command port through the use of <ulink
+      url="#sc_zkCommands">4 letter words</ulink> and 2) <ulink
+      url="zookeeperJMX.html">JMX</ulink>. See the appropriate section for
+      your environment/requirements.</para>
     </section>
 
     <section id="sc_logging">
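
Reviewer note: the Monitoring text added in this commit points to the four letter words on the command port. As a rough illustration only, the `ruok` exchange shown in the HTML context above (`echo ruok | nc 127.0.0.1 5111` answering `imok`) could be scripted as below. This is a sketch and not part of the commit; the function names `four_letter_word` and `is_healthy` are hypothetical, and it assumes the server closes the connection after replying.

```python
import socket


def four_letter_word(host: str, port: int, command: str = "ruok",
                     timeout: float = 5.0) -> str:
    """Send a four letter command and return the server's reply as text.

    Assumes the server closes the connection once it has replied, so the
    reply is read until end of stream.
    """
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(command.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("ascii", errors="replace")


def is_healthy(host: str, port: int) -> bool:
    """Return True only if the server answers 'ruok' with 'imok'."""
    try:
        return four_letter_word(host, port, "ruok").strip() == "imok"
    except OSError:
        # Connection refused, timeout, or reset all count as unhealthy.
        return False
```

For example, `is_healthy("127.0.0.1", 5111)` would use the host and port from the `nc` example above; substitute your own client port. A probe like this only reports state. Restarting a server that has exited remains the job of the supervisory process described in the new Supervision section.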

Some files were not shown because the diff is too large