
ZOOKEEPER-161. Content needed: "Designing a ZooKeeper Deployment"

git-svn-id: https://svn.apache.org/repos/asf/hadoop/zookeeper/trunk@724936 13f79535-47bb-0310-9956-ffa450edef68
Patrick D. Hunt 16 years ago
Parent
Current commit
679a561d00
4 files changed, with 227 insertions and 54 deletions
  1. 36 32
      CHANGES.txt
  2. 110 19
      docs/zookeeperAdmin.html
  3. 2 2
      docs/zookeeperAdmin.pdf
  4. 79 1
      src/docs/src/documentation/content/xdocs/zookeeperAdmin.xml

+ 36 - 32
CHANGES.txt

@@ -5,58 +5,62 @@ Non-backward compatible changes:
 Backward compatibile changes:
 
 BUGFIXES: 
-   ZOOKEEPER-211 Not all Mock tests are working (ben via phunt)
+  ZOOKEEPER-211. Not all Mock tests are working (ben via phunt)
 
-   ZOOKEEPER-223. change default level in root logger to INFO.
-   (pat via mahadev) 
+  ZOOKEEPER-223. change default level in root logger to INFO.
+  (pat via mahadev) 
    
-   ZOOKEEPER-212. fix the snapshot to be asynchronous. (mahadev and ben)
+  ZOOKEEPER-212. fix the snapshot to be asynchronous. (mahadev and ben)
 
-   ZOOKEEPER-213. fix programmer guide C api docs to be  in sync with latest
-   zookeeper.h (pat via mahadev)
+  ZOOKEEPER-213. fix programmer guide C api docs to be  in sync with latest
+  zookeeper.h (pat via mahadev)
 
-   ZOOKEEPER-219. fix events.poll timeout in watcher test to be longer.
-   (pat via mahadev)
+  ZOOKEEPER-219. fix events.poll timeout in watcher test to be longer.
+  (pat via mahadev)
    
-   ZOOKEEPER-217. Fix errors in config to be thrown as Exceptions. (mahadev)
+  ZOOKEEPER-217. Fix errors in config to be thrown as Exceptions. (mahadev)
 
-   ZOOKEEPER-228. fix apache header missing in DBTest. (mahadev)
+  ZOOKEEPER-228. fix apache header missing in DBTest. (mahadev)
 
-   ZOOKEEPER-218. fix the error in the barrier example code. (pat via mahadev)
+  ZOOKEEPER-218. fix the error in the barrier example code. (pat via mahadev)
 
-   ZOOKEEPER-206. documentation tab should contain the version number and 
-   other small site changes. (pat via mahadev) 
+  ZOOKEEPER-206. documentation tab should contain the version number and 
+  other small site changes. (pat via mahadev) 
 
-   ZOOKEEPER-226. fix exists calls that fail on server if node has null data.
-   (mahadev) 
+  ZOOKEEPER-226. fix exists calls that fail on server if node has null data.
+  (mahadev) 
 
-   ZOOKEEPER-204. SetWatches needs to be the first message after auth messages
-to the server (ben via mahadev)
+  ZOOKEEPER-204. SetWatches needs to be the first message after auth
+  messages to the server (ben via mahadev)
   
-   ZOOKEEPER-208. Zookeeper C client uses API that are not thread safe,
-causing crashes when multiple instances are active. (austin shoemaker, chris
-daroch and ben reed via mahadev) 
+  ZOOKEEPER-208. Zookeeper C client uses API that are not thread safe,
+  causing crashes when multiple instances are active.
+  (austin shoemaker, chris daroch and ben reed via mahadev) 
 
-   ZOOKEEPER-227. gcc warning from recordio.h (chris darroch via mahadev)
+  ZOOKEEPER-227. gcc warning from recordio.h (chris darroch via mahadev)
 
-   ZOOKEEPER-232. fix apache licence header in TestableZookeeper (mahadev)
+  ZOOKEEPER-232. fix apache licence header in TestableZookeeper (mahadev)
 
-   ZOOKEEPER-249. QuorumPeer.getClientPort() always returns -1. (nitay 
-joffe via mahadev)
+  ZOOKEEPER-249. QuorumPeer.getClientPort() always returns -1.
+  (nitay joffe via mahadev)
 
-  ZOOKEEPER-248.  QuorumPeer should use Map interface instead of 
-HashMap implementation. (nitay joffe via mahadev)
+  ZOOKEEPER-248.  QuorumPeer should use Map interface instead of HashMap
+  implementation. (nitay joffe via mahadev)
 
-  ZOOKEEPER-241. Build of a distro fails after clean target is run. (patrick
-hunt via mahadev)
+  ZOOKEEPER-241. Build of a distro fails after clean target is run.
+  (patrick hunt via mahadev)
 
 IMPROVEMENTS:
    
-   ZOOKEEPER-64. Log system env information when initializing server and
-client (pat via mahadev)
+  ZOOKEEPER-161. Content needed: "Designing a ZooKeeper Deployment"
+  (breed via phunt)
+
+  ZOOKEEPER-64. Log system env information when initializing server and
+  client (pat via mahadev)
+
+  ZOOKEEPER-243. add SEQUENCE flag documentation to the programming guide.
+  (patrick hunt via mahadev)
 
-   ZOOKEEPER-243. add SEQUENCE flag documentation to the programming guide.
-(patrick hunt via mahadev)
 
 Release 3.0.0 - 2008-10-21
 

+ 110 - 19
docs/zookeeperAdmin.html

@@ -201,6 +201,14 @@ document.write("Last Published: " + document.lastModified);
 <ul class="minitoc">
 <li>
 <a href="#sc_designing">Designing a ZooKeeper Deployment</a>
+<ul class="minitoc">
+<li>
+<a href="#sc_CrossMachineRequirements">Cross Machine Requirements</a>
+</li>
+<li>
+<a href="#Single+Machine+Requirements">Single Machine Requirements</a>
+</li>
+</ul>
 </li>
 <li>
 <a href="#sc_provisioning">Provisioning</a>
@@ -621,20 +629,103 @@ server.3=zoo3:2888:3888</span>
 </ul>
 <a name="N10160"></a><a name="sc_designing"></a>
 <h3 class="h4">Designing a ZooKeeper Deployment</h3>
-<p></p>
-<a name="N10169"></a><a name="sc_provisioning"></a>
+<p>The reliablity of ZooKeeper rests on two basic assumptions.</p>
+<ol>
+        
+<li>
+<p> Only a minority of servers in a deployment
+            will fail. <em>Failure</em> in this context
+            means a machine crash, or some error in the network that
+            partitions a server off from the majority.</p>
+        
+</li>
+        
+<li>
+<p> Deployed machines operate correctly. To
+            operate correctly means to execute code correctly, to have
+            clocks that work properly, and to have storage and network
+            components that perform consistently.</p>
+        
+</li>
+      
+</ol>
+<p>The sections below contain considerations for ZooKeeper
+      administrators to maximize the probability for these assumptions
+      to hold true. Some of these are cross-machines considerations,
+      and others are things you should consider for each and every
+      machine in your deployment.</p>
+<a name="N1017C"></a><a name="sc_CrossMachineRequirements"></a>
+<h4>Cross Machine Requirements</h4>
+<p>For the ZooKeeper service to be active, there must be a
+        majority of non-failing machines that can communicate with
+        each other. To create a deployment that can tolerate the
+        failure of F machines, you should count on deploying 2xF+1
+        machines.  Thus, a deployment that consists of three machines
+        can handle one failure, and a deployment of five machines can
+        handle two failures. Note that a deployment of six machines
+        can only handle two failures since three machines is not a
+        majority.  For this reason, ZooKeeper deployments are usually
+        made up of an odd number of machines.</p>
+<p>To achieve the highest probability of tolerating a failure
+        you should try to make machine failures independent. For
+        example, if most of the machines share the same switch,
+        failure of that switch could cause a correlated failure and
+        bring down the service. The same holds true of shared power
+        circuits, cooling systems, etc.</p>
+<a name="N10189"></a><a name="Single+Machine+Requirements"></a>
+<h4>Single Machine Requirements</h4>
+<p>If ZooKeeper has to contend with other applications for
+        access to resourses like storage media, CPU, network, or
+        memory, its performance will suffer markedly.  ZooKeeper has
+        strong durability guarantees, which means it uses storage
+        media to log changes before the operation responsible for the
+        change is allowed to complete. You should be aware of this
+        dependency then, and take great care if you want to ensure
+        that ZooKeeper operations aren&rsquo;t held up by your media. Here
+        are some things you can do to minimize that sort of
+        degradation:
+      </p>
+<ul>
+        
+<li>
+          
+<p>ZooKeeper's transaction log must be on a dedicated
+            device. (A dedicated partition is not enough.) ZooKeeper
+            writes the log sequentially, without seeking Sharing your
+            log device with other processes can cause seeks and
+            contention, which in turn can cause multi-second
+            delays.</p>
+        
+</li>
+
+        
+<li>
+          
+<p>Do not put ZooKeeper in a situation that can cause a
+            swap. In order for ZooKeeper to function with any sort of
+            timeliness, it simply cannot be allowed to swap.
+            Therefore, make certain that the maximum heap size given
+            to ZooKeeper is not bigger than the amount of real memory
+            available to ZooKeeper.  For more on this, see
+            <a href="#sc_commonProblems">Things to Avoid</a>
+            below. </p>
+        
+</li>
+      
+</ul>
+<a name="N101A7"></a><a name="sc_provisioning"></a>
 <h3 class="h4">Provisioning</h3>
 <p></p>
-<a name="N10172"></a><a name="sc_strengthsAndLimitations"></a>
+<a name="N101B0"></a><a name="sc_strengthsAndLimitations"></a>
 <h3 class="h4">Things to Consider: ZooKeeper Strengths and Limitations</h3>
 <p></p>
-<a name="N1017B"></a><a name="sc_administering"></a>
+<a name="N101B9"></a><a name="sc_administering"></a>
 <h3 class="h4">Administering</h3>
 <p></p>
-<a name="N10184"></a><a name="sc_monitoring"></a>
+<a name="N101C2"></a><a name="sc_monitoring"></a>
 <h3 class="h4">Monitoring</h3>
 <p></p>
-<a name="N1018D"></a><a name="sc_logging"></a>
+<a name="N101CB"></a><a name="sc_logging"></a>
 <h3 class="h4">Logging</h3>
 <p>ZooKeeper uses <strong>log4j</strong> version 1.2 as 
       its logging infrastructure. The  ZooKeeper default <span class="codefrag filename">log4j.properties</span> 
@@ -644,10 +735,10 @@ server.3=zoo3:2888:3888</span>
 <p>For more information, see 
       <a href="http://logging.apache.org/log4j/1.2/manual.html#defaultInit">Log4j Default Initialization Procedure</a> 
       of the log4j manual.</p>
-<a name="N101AD"></a><a name="sc_troubleshooting"></a>
+<a name="N101EB"></a><a name="sc_troubleshooting"></a>
 <h3 class="h4">Troubleshooting</h3>
 <p></p>
-<a name="N101B6"></a><a name="sc_configuration"></a>
+<a name="N101F4"></a><a name="sc_configuration"></a>
 <h3 class="h4">Configuration Parameters</h3>
 <p>ZooKeeper's behavior is governed by the ZooKeeper configuration
       file. This file is designed so that the exact same file can be used by
@@ -655,7 +746,7 @@ server.3=zoo3:2888:3888</span>
       layouts are the same. If servers use different configuration files, care
       must be taken to ensure that the list of servers in all of the different
       configuration files match.</p>
-<a name="N101BF"></a><a name="sc_minimumConfiguration"></a>
+<a name="N101FD"></a><a name="sc_minimumConfiguration"></a>
 <h4>Minimum Configuration</h4>
 <p>Here are the minimum configuration keywords that must be defined
         in the configuration file:</p>
@@ -702,7 +793,7 @@ server.3=zoo3:2888:3888</span>
 </dd>
         
 </dl>
-<a name="N101E6"></a><a name="sc_advancedConfiguration"></a>
+<a name="N10224"></a><a name="sc_advancedConfiguration"></a>
 <h4>Advanced Configuration</h4>
 <p>The configuration settings in the section are optional. You can
         use them to further fine tune the behaviour of your ZooKeeper servers.
@@ -793,7 +884,7 @@ server.3=zoo3:2888:3888</span>
 </dd>
         
 </dl>
-<a name="N10246"></a><a name="sc_clusterOptions"></a>
+<a name="N10284"></a><a name="sc_clusterOptions"></a>
 <h4>Cluster Options</h4>
 <p>The options in this section are designed for use with an ensemble
         of servers -- that is, when deploying clusters of servers.</p>
@@ -883,7 +974,7 @@ server.3=zoo3:2888:3888</span>
         
 </dl>
 <p></p>
-<a name="N102A3"></a><a name="Unsafe+Options"></a>
+<a name="N102E1"></a><a name="Unsafe+Options"></a>
 <h4>Unsafe Options</h4>
 <p>The following options can be useful, but be careful when you use
         them. The risk of each is explained along with the explanation of what
@@ -928,7 +1019,7 @@ server.3=zoo3:2888:3888</span>
 </dd>
         
 </dl>
-<a name="N102D5"></a><a name="sc_zkCommands"></a>
+<a name="N10313"></a><a name="sc_zkCommands"></a>
 <h3 class="h4">ZooKeeper Commands: The Four Letter Words</h3>
 <p>ZooKeeper responds to a small set of commands. Each command is
       composed of four letters. You issue the commands to ZooKeeper via telnet
@@ -993,7 +1084,7 @@ server.3=zoo3:2888:3888</span>
 <pre class="code">$ echo ruok | nc 127.0.0.1 5111
 imok
 </pre>
-<a name="N10315"></a><a name="sc_dataFileManagement"></a>
+<a name="N10353"></a><a name="sc_dataFileManagement"></a>
 <h3 class="h4">Data File Management</h3>
 <p>ZooKeeper stores its data in a data directory and its transaction
       log in a transaction log directory. By default these two directories are
@@ -1001,7 +1092,7 @@ imok
       transaction log files in a separate directory than the data files.
       Throughput increases and latency decreases when transaction logs reside
       on a dedicated log devices.</p>
-<a name="N1031E"></a><a name="The+Data+Directory"></a>
+<a name="N1035C"></a><a name="The+Data+Directory"></a>
 <h4>The Data Directory</h4>
 <p>This directory has two files in it:</p>
 <ul>
@@ -1047,14 +1138,14 @@ imok
         idempotent nature of its updates. By replaying the transaction log
         against fuzzy snapshots ZooKeeper gets the state of the system at the
         end of the log.</p>
-<a name="N1035A"></a><a name="The+Log+Directory"></a>
+<a name="N10398"></a><a name="The+Log+Directory"></a>
 <h4>The Log Directory</h4>
 <p>The Log Directory contains the ZooKeeper transaction logs.
         Before any update takes place, ZooKeeper ensures that the transaction
         that represents the update is written to non-volatile storage. A new
         log file is started each time a snapshot is begun. The log file's
         suffix is the first zxid written to that log.</p>
-<a name="N10364"></a><a name="File+Management"></a>
+<a name="N103A2"></a><a name="File+Management"></a>
 <h4>File Management</h4>
 <p>The format of snapshot and log files does not change between
         standalone ZooKeeper servers and different configurations of
@@ -1071,7 +1162,7 @@ imok
         needs the latest complete fuzzy snapshot and the log files from the
         start of that snapshot. The PurgeTxnLog utility implements a simple
         retention policy that administrators can use.</p>
-<a name="N10375"></a><a name="sc_commonProblems"></a>
+<a name="N103B3"></a><a name="sc_commonProblems"></a>
 <h3 class="h4">Things to Avoid</h3>
 <p>Here are some common problems you can avoid by configuring
       ZooKeeper correctly:</p>
@@ -1125,7 +1216,7 @@ imok
 </dd>
       
 </dl>
-<a name="N10399"></a><a name="sc_bestPractices"></a>
+<a name="N103D7"></a><a name="sc_bestPractices"></a>
 <h3 class="h4">Best Practices</h3>
 <p>For best results, take note of the following list of good
       Zookeeper practices. <em>[tbd...]</em>
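
The sizing rule this commit adds under "Cross Machine Requirements" (deploy 2xF+1 machines to tolerate F failures, because a majority must stay up and connected) can be checked with a few lines of arithmetic. The following is a minimal Python sketch, not part of the commit or of ZooKeeper itself; the function names `quorum_size`, `tolerated_failures`, and `servers_needed` are hypothetical and chosen only for illustration.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    if ensemble_size < 1:
        raise ValueError("an ensemble needs at least one server")
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers may fail while a majority is still available."""
    return ensemble_size - quorum_size(ensemble_size)


def servers_needed(failures: int) -> int:
    """Ensemble size needed to tolerate the given number of failures (2F+1)."""
    if failures < 0:
        raise ValueError("failures must be non-negative")
    return 2 * failures + 1


if __name__ == "__main__":
    # Walk through the ensemble sizes discussed in the added text.
    for n in (3, 5, 6):
        print(f"{n} servers: quorum {quorum_size(n)}, "
              f"tolerates {tolerated_failures(n)} failure(s)")
```

Under these definitions, six servers tolerate the same two failures as five, which matches the added text's reasoning for preferring an odd number of machines.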

File diff suppressed because it is too large
+ 2 - 2
docs/zookeeperAdmin.pdf


+ 79 - 1
src/docs/src/documentation/content/xdocs/zookeeperAdmin.xml

@@ -282,7 +282,85 @@ server.3=zoo3:2888:3888</computeroutput></para>
     <section id="sc_designing">
       <title>Designing a ZooKeeper Deployment</title>
 
-      <para></para>
+      <para>The reliablity of ZooKeeper rests on two basic assumptions.</para>
+      <orderedlist>
+        <listitem><para> Only a minority of servers in a deployment
+            will fail. <emphasis>Failure</emphasis> in this context
+            means a machine crash, or some error in the network that
+            partitions a server off from the majority.</para>
+        </listitem>
+        <listitem><para> Deployed machines operate correctly. To
+            operate correctly means to execute code correctly, to have
+            clocks that work properly, and to have storage and network
+            components that perform consistently.</para>
+        </listitem>
+      </orderedlist>
+    
+    <para>The sections below contain considerations for ZooKeeper
+      administrators to maximize the probability for these assumptions
+      to hold true. Some of these are cross-machines considerations,
+      and others are things you should consider for each and every
+      machine in your deployment.</para>
+
+    <section id="sc_CrossMachineRequirements">
+      <title>Cross Machine Requirements</title>
+    
+      <para>For the ZooKeeper service to be active, there must be a
+        majority of non-failing machines that can communicate with
+        each other. To create a deployment that can tolerate the
+        failure of F machines, you should count on deploying 2xF+1
+        machines.  Thus, a deployment that consists of three machines
+        can handle one failure, and a deployment of five machines can
+        handle two failures. Note that a deployment of six machines
+        can only handle two failures since three machines is not a
+        majority.  For this reason, ZooKeeper deployments are usually
+        made up of an odd number of machines.</para>
+
+      <para>To achieve the highest probability of tolerating a failure
+        you should try to make machine failures independent. For
+        example, if most of the machines share the same switch,
+        failure of that switch could cause a correlated failure and
+        bring down the service. The same holds true of shared power
+        circuits, cooling systems, etc.</para>
+    </section>
+
+    <section>
+      <title>Single Machine Requirements</title>
+
+      <para>If ZooKeeper has to contend with other applications for
+        access to resourses like storage media, CPU, network, or
+        memory, its performance will suffer markedly.  ZooKeeper has
+        strong durability guarantees, which means it uses storage
+        media to log changes before the operation responsible for the
+        change is allowed to complete. You should be aware of this
+        dependency then, and take great care if you want to ensure
+        that ZooKeeper operations aren’t held up by your media. Here
+        are some things you can do to minimize that sort of
+        degradation:
+      </para>
+
+      <itemizedlist>
+        <listitem>
+          <para>ZooKeeper's transaction log must be on a dedicated
+            device. (A dedicated partition is not enough.) ZooKeeper
+            writes the log sequentially, without seeking Sharing your
+            log device with other processes can cause seeks and
+            contention, which in turn can cause multi-second
+            delays.</para>
+        </listitem>
+
+        <listitem>
+          <para>Do not put ZooKeeper in a situation that can cause a
+            swap. In order for ZooKeeper to function with any sort of
+            timeliness, it simply cannot be allowed to swap.
+            Therefore, make certain that the maximum heap size given
+            to ZooKeeper is not bigger than the amount of real memory
+            available to ZooKeeper.  For more on this, see
+            <xref linkend="sc_commonProblems"/>
+            below. </para>
+        </listitem>
+      </itemizedlist>
+    </section>
     </section>
 
     <section id="sc_provisioning">
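
The "Single Machine Requirements" text added above asks administrators to keep ZooKeeper's maximum heap size no larger than the real memory available to it, so that the process never swaps. One way to sanity-check a planned setting is sketched below in Python. This is an illustrative assumption, not something the commit provides: the helper names `parse_xmx` and `heap_fits` are hypothetical, the physical memory figure is passed in by the caller, and `reserved_bytes` is an assumed allowance for the operating system and for JVM memory outside the heap.

```python
import re

# Multipliers for the size suffixes accepted by the JVM -Xmx flag.
_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_xmx(flag: str) -> int:
    """Return the heap limit in bytes for a JVM flag such as '-Xmx4g'."""
    match = re.fullmatch(r"-Xmx(\d+)([kKmMgG]?)", flag)
    if match is None:
        raise ValueError(f"not a recognised -Xmx flag: {flag!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit.lower()]


def heap_fits(flag: str, physical_bytes: int, reserved_bytes: int = 0) -> bool:
    """True if the maximum heap plus the reserve fits in physical memory."""
    return parse_xmx(flag) + reserved_bytes <= physical_bytes


if __name__ == "__main__":
    gib = 1024 ** 3
    # Hypothetical 8 GiB machine with 1 GiB set aside for everything else.
    for flag in ("-Xmx6g", "-Xmx8g"):
        print(flag, "fits" if heap_fits(flag, 8 * gib, gib) else "risks swapping")
```

A check like this only compares configured numbers. It does not measure what other processes on the machine actually use, so the reserve should be chosen conservatively.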

Some files were not shown because too many files have changed