
updated with latest docs from robbie, ZOOKEEPER-93

git-svn-id: https://svn.apache.org/repos/asf/hadoop/zookeeper/trunk@698787 13f79535-47bb-0310-9956-ffa450edef68
Patrick D. Hunt 16 years ago
parent
commit
f813b6394a
37 changed files with 8983 additions and 77 deletions
  1. BIN
      docs/images/zkcomponents.jpg
  2. BIN
      docs/images/zknamespace.jpg
  3. BIN
      docs/images/zkperfRW.jpg
  4. BIN
      docs/images/zkperfreliability.jpg
  5. BIN
      docs/images/zkservice.jpg
  6. 43 15
      docs/index.html
  7. 84 32
      docs/index.pdf
  8. 49 4
      docs/linkmap.html
  9. 12 12
      docs/linkmap.pdf
  10. 911 0
      docs/recipes.html
  11. 107 0
      docs/recipes.pdf
  12. 1056 0
      docs/zookeeperAdmin.html
  13. 151 0
      docs/zookeeperAdmin.pdf
  14. 206 0
      docs/zookeeperOtherInfo.html
  15. 151 0
      docs/zookeeperOtherInfo.pdf
  16. 629 0
      docs/zookeeperOver.html
  17. BIN
      docs/zookeeperOver.pdf
  18. 1540 0
      docs/zookeeperProgrammers.html
  19. 195 0
      docs/zookeeperProgrammers.pdf
  20. 446 0
      docs/zookeeperStarted.html
  21. 96 0
      docs/zookeeperStarted.pdf
  22. 2 0
      src/docs/forrest.properties
  23. 16 8
      src/docs/src/documentation/content/xdocs/index.xml
  24. 623 0
      src/docs/src/documentation/content/xdocs/recipes.xml
  25. 11 6
      src/docs/src/documentation/content/xdocs/site.xml
  26. 827 0
      src/docs/src/documentation/content/xdocs/zookeeperAdmin.xml
  27. 46 0
      src/docs/src/documentation/content/xdocs/zookeeperOtherInfo.xml
  28. 437 0
      src/docs/src/documentation/content/xdocs/zookeeperOver.xml
  29. 1077 0
      src/docs/src/documentation/content/xdocs/zookeeperProgrammers.xml
  30. 268 0
      src/docs/src/documentation/content/xdocs/zookeeperStarted.xml
  31. BIN
      src/docs/src/documentation/resources/images/architecture.gif
  32. BIN
      src/docs/src/documentation/resources/images/zkarch.jpg
  33. BIN
      src/docs/src/documentation/resources/images/zkcomponents.jpg
  34. BIN
      src/docs/src/documentation/resources/images/zknamespace.jpg
  35. BIN
      src/docs/src/documentation/resources/images/zkperfRW.jpg
  36. BIN
      src/docs/src/documentation/resources/images/zkperfreliability.jpg
  37. BIN
      src/docs/src/documentation/resources/images/zkservice.jpg

BIN
docs/images/zkcomponents.jpg


BIN
docs/images/zknamespace.jpg


BIN
docs/images/zkperfRW.jpg


BIN
docs/images/zkperfreliability.jpg


BIN
docs/images/zkservice.jpg


+ 43 - 15
docs/index.html

@@ -5,7 +5,7 @@
 <meta content="Apache Forrest" name="Generator">
 <meta name="Forrest-version" content="0.8">
 <meta name="Forrest-skin-name" content="pelt">
-<title>ZooKeeper Documentation</title>
+<title>ZooKeeper: Because Coordinating Distributed Systems is a Zoo</title>
 <link type="text/css" href="skin/basic.css" rel="stylesheet">
 <link media="screen" type="text/css" href="skin/screen.css" rel="stylesheet">
 <link media="print" type="text/css" href="skin/print.css" rel="stylesheet">
@@ -105,10 +105,22 @@ document.write("Last Published: " + document.lastModified);
 <div onclick="SwitchMenu('menu_selected_1.1', 'skin/')" id="menu_selected_1.1Title" class="menutitle" style="background-image: url('skin/images/chapter_open.gif');">Documentation</div>
 <div id="menu_selected_1.1" class="selectedmenuitemgroup" style="display: block;">
 <div class="menupage">
-<div class="menupagetitle">Overview</div>
+<div class="menupagetitle">Welcome</div>
 </div>
 <div class="menuitem">
-<a href="api/overview-summary.html#overview_description">Getting Started</a>
+<a href="zookeeperOver.html">ZooKeeper Overview</a>
+</div>
+<div class="menuitem">
+<a href="zookeeperStarted.html">Getting Started</a>
+</div>
+<div class="menuitem">
+<a href="zookeeperProgrammers.html">Programmer's Guide</a>
+</div>
+<div class="menuitem">
+<a href="recipes.html">Recipes</a>
+</div>
+<div class="menuitem">
+<a href="zookeeperAdmin.html">Administrator's Guide</a>
 </div>
 <div class="menuitem">
 <a href="api/index.html">API Docs</a>
@@ -122,6 +134,9 @@ document.write("Last Published: " + document.lastModified);
 <div class="menuitem">
 <a href="http://hadoop.apache.org/zookeeper/mailing_lists.html">Mailing Lists</a>
 </div>
+<div class="menuitem">
+<a href="zookeeperOtherInfo.html">Other Info</a>
+</div>
 </div>
 <div id="credit">
 <hr>
@@ -145,31 +160,44 @@ document.write("Last Published: " + document.lastModified);
 <a class="dida" href="index.pdf"><img alt="PDF -icon" src="skin/images/pdfdoc.gif" class="skin"><br>
         PDF</a>
 </div>
-<h1>ZooKeeper Documentation</h1>
+<h1>ZooKeeper: Because Coordinating Distributed Systems is a Zoo</h1>
     
 <p>
-    The following documents provide concepts and procedures that will help you 
-    get started using ZooKeeper. If you have more questions, you can ask the 
-    <a href="http://hadoop.apache.org/zookeeper/mailing_lists.html">mailing list</a> or browse the archives.
+ZooKeeper is a high-performance coordination service for distributed applications. It exposes common services - such as naming, configuration management, synchronization, and group services - in a simple interface so you don't have to write them from scratch. You can use it off-the-shelf to implement consensus, group management, leader election, and presence protocols. And you can build on it for your own, specific needs.
+</p>
+
+
+<p>
+The following documents provide concepts and procedures to get you started using ZooKeeper. If you have more questions, please ask the <a href="http://hadoop.apache.org/zookeeper/mailing_lists.html">mailing list</a> or browse the archives.
     </p>
     
 <ul>
+
       
 <li>
-<a href="api/overview-summary.html#overview_description">Getting Started</a>
-</li>
+<a href="zookeeperOver.html">Overview</a> - a bird's eye view of ZooKeeper, including design concepts and architecture</li>
       
 <li>
-<a href="api/index.html">API Docs</a>
-</li>
+<a href="zookeeperStarted.html">Getting Started</a> - a tutorial-style guide for developers to install, run, and program to ZooKeeper</li>
       
 <li>
-<a href="http://wiki.apache.org/hadoop/ZooKeeper">Wiki</a>
-</li>
+<a href="zookeeperProgrammers.html">Programmer's Guide</a> - an application developer's guide to ZooKeeper</li>
       
 <li>
-<a href="http://wiki.apache.org/hadoop/ZooKeeper/FAQ">FAQ</a>
-</li>
+<a href="recipes.html">ZooKeeper Recipes</a> - a set of common, higher level solutions using ZooKeeper</li>
+      
+<li>
+<a href="zookeeperAdmin.html">Administrator's Guide</a> - a guide for system administrators and anyone else who might deploy ZooKeeper</li>
+      
+<li>
+<a href="api/index.html">API Docs</a> - the technical reference to ZooKeeper APIs</li>
+      
+<li>
+<a href="http://wiki.apache.org/hadoop/ZooKeeper">Wiki</a> - miscellaneous, informal ZooKeeper documentation, in Wiki format</li>
+      
+<li>
+<a href="http://wiki.apache.org/hadoop/ZooKeeper/FAQ">FAQ</a> - frequently asked questions</li>    
+
     
 </ul>
   

+ 84 - 32
docs/index.pdf

@@ -5,10 +5,10 @@
 /Producer (FOP 0.20.5) >>
 endobj
 5 0 obj
-<< /Length 820 /Filter [ /ASCII85Decode /FlateDecode ]
+<< /Length 1907 /Filter [ /ASCII85Decode /FlateDecode ]
  >>
 stream
-Gat=*bDt:1']%q&]UY*ff66LCQ:tLPdr%2m;PeQJ8.u*QKU?`\B`>`kdl0N'67gKsnTin'D"A2uTB7b8*"s#<*E%A5%RDV#,%<_MKlt`%"M7ujU(-s?pKOWc;Jr>F7j"-@R)aDX<3k9"U5H!F(&-oFMS9s7JLG;Z7qBL2Yt'%3(9M.B,O,MX99n]*O;0;#8iu1`[^KB,ZP:1f<?uSmQc)<R>+`lXV^5GAXlIQ)LFgbg[?gX><psd9Mj4FDZANcNdr*t2L`XUL+)tAB-5tm][eI/!&7!8&0^(u?:qc7,=m'(>QiJLR!?rt"iV9^brAAWPSCT^sq1p"LA2rX^!#IFI^]nco@ZRggP9YY]iM-KY37#l;e:[E$%2fR"+GEO[j\64tAWKiMY0/H`SkpC-$jHA[$\E)MhQMV7_g,9`SQiTG\4NnN"`'C3g&BIc:s*i^O+/@*2PqC?XHRe8QR*^H%\;66Y)>`qTjAQL'7!Ig1ohK(ROBTITKWY`maX,^ZY)'6^#MUc<R\-m/6C$lQfLs$rYQ[f*/<<,_-jVP!i(\$/qq-5S8s4L;ssL$C"OD8`ZF`Vla!4*A%Y8okZuM`4ACYSM&CT*>a@UJ3!'p(_+h]]-jH_8'UgnWIsG&.`5hNHI#6HkY^/P=l5S1ui6qFd]k5k5k+R^NWt![]))0jt[i+<O3+(gSkHX[%BadlqPqSR^a^?h:85$>V<LA'ghs1=7\H)4u+`QWOA3:cGNQ>b?IFm.e?.jT'KaDKE7S$pfmcJDFfeuT5]hlo;&8TX\X'LiGrspf6[J+QC1tj7SAL[:YVV$q.+1i<LGl~>
+Gat=,8TWWE'Y`m7n<RT3@RuS'WSX/@cD3VQM)EarRiZ5MOtZnC"sBb@^V8W`#B#hb5$.e,^:,c2qsE\)]k8MGG-(([j])8CQ"(h8Y2uTHf@j2NJa_5t&6[c;qWIP/SGu>T:;ef(X>L>0p&4Q1P/O1BHK,+j@Rmc7T\8:`orke%D1Lhr)Y#&;KqM4Q6$=1*nKJ@(9"dO,mo]Thij9-Nh.+VsN@"icN#b!!r.4VX"ElVBo/kMfO7[Br/&FFcBq-J;In=tRGc#[2&LRq-e3Ue#R7apCnsn+'`f[^dr`n[EjCcaaO]+`f?>05,?3sEV$#+p)C[eT&N^*q:Pf9VBfitCj[\nmY%$Xa-M%:jX=TN3t,tj8MBCrs-)LRAtT_1V?17qN:"X<4G>#^#aP:i%Ubo)(CL%8RZSKbWDIGlJk&C4Y`]-$&H3)tK-B0cq#kt"_1F2>[b-H^5J&\.Nfa>/U')K#J\JesXEEF%>`m)/Sp[Gnlc8&FuKP1J3_Ht9`cP"_24mDpB-Y%8m8_42[-,YL>H"]9Xs_iN@;2k_77C?0s_4Q-^gruFM=bZaA4U^RgOn>&<\,7PA+=Re1H5is>cH^]J$+CnO[2PEJTkaJe?)HkB'5M($RLK7;Rp6<ofr'Y`u9i9Ka+j%Nk4<_=7^jL[85*06nA?q#F;05Dj;j@6re+KN?PHZ\,B6]fcP&^S(5>!-c!XC9Y_D6l)*m="A6FG8?\m>8:2$-rg5?TKI#<#Ic_[fRL-W8.B+FE/C)pE?h#`%!aH<bl$^#+!g`PLh>eK@-31o<rNA]33];R+3!CrZP."Ro^IP2'FdTpN2L$%uliQ*c^k<pcrZg0atc)#I*\T-Y?l\p&]rNq>a9BRb6p:3TXoa*&<-nD=eo1IcGVW24ZPQ!K>cY7e%p)6"&ApOqEb]Kir^5>G4>+Qj&@\2\66!IE*X5gstP@eU5*'tF=N2"5q,!"%""#`*8b-*5mi\PFJ6A67r>m=msXel%TB4lIP'?ge<IXfJVqK^XQdNIL@Zh24CYU6s7JP?l-k>[fo]*h(+oa51WaH%95Jp)jq=Z)`T=G_s'K'p)5392QI0d$UqHEIl$ITem+f3W5'R@9`uZ9_kQR.as?-:_MkVDrh39V^@R(c<DC$)r5/Ch"scE3J$R(+&rfRMiI:#khAt"EUD&2ETM_bqTpXZ'(JT>8MT.,aUp/K0nor=&BG/p't*WE^!-FP+SDWZ%RqUr<6]>X!AP!;37I/N_*ejNC,Lf.(fXSK]^B]>]=akU\sOGHp*L4ES_IE;D5P=53=uHH.j=h%+iY)m!!5/QIl.k_6M(?R7tJ`ge^Tu\2`/G4.Xi:@5X&IQ,b3b"Rq_;5TG1AW;E0N>.OsBTL6_GA2&c6r*Z!_s"eEB%DEE9"Cm>^j`f3ihb'`*Vmrmts>$[e2<I3hDJ]YA6">:*;#/DPcEdc]'U3Qmt5+7[3kkR7N2P?.V3\$G!MQ@mj^bKMtg$p8un-gEP"p`$:&QFfj`S'_rg:AU:FcY=?SFMLL?/Gt!,.OjiHe/K=7WlsT5"214H)TW20tJ(n^,=b1I%Fh3@&DS'c8,8/`8Em<#cf/T=b,^)m>h<W5Or?be@7JS5liPC.V%%2oAq]*m0/^`62!\Xj-=DWQ1UF8LX5No-i;0sAdQf<F?SDH@""f.IIQOr-iHf<XQJIbjVh`]9F,-J7_%tp6F6M[O=cp!]CQP?_ViU;,8J;f1bP@W.10D0X&76Oai:.DqF6YmnDJP@pLW3UO:YN90?B2P)23WnU$hpLFeX@<0_IK.C@,:aR>';aCKa_%YpG46mo2R;H`M`=U!UhLJlKE#>H/Dg]kbGJnGFK!.bD?\Im_l=/*&[P?F?t!c/L@:'=`MPqr/S'Y]J.6Cfbu@#0.S0MjcSLeZPLf`/3hsXnr'Xrb#L~>
 endstream
 endobj
 6 0 obj
@@ -27,12 +27,16 @@ endobj
 10 0 R
 11 0 R
 12 0 R
+13 0 R
+14 0 R
+15 0 R
+16 0 R
 ]
 endobj
 8 0 obj
 << /Type /Annot
 /Subtype /Link
-/Rect [ 392.94 572.6 447.288 560.6 ]
+/Rect [ 356.268 471.6 410.616 459.6 ]
 /C [ 0 0 0 ]
 /Border [ 0 0 0 ]
 /A << /URI (http://hadoop.apache.org/zookeeper/mailing_lists.html)
@@ -43,10 +47,10 @@ endobj
 9 0 obj
 << /Type /Annot
 /Subtype /Link
-/Rect [ 108.0 542.2 180.996 530.2 ]
+/Rect [ 108.0 454.4 155.316 442.4 ]
 /C [ 0 0 0 ]
 /Border [ 0 0 0 ]
-/A << /URI (api/overview-summary.html#overview_description)
+/A << /URI (zookeeperOver.html)
 /S /URI >>
 /H /I
 >>
@@ -54,10 +58,10 @@ endobj
 10 0 obj
 << /Type /Annot
 /Subtype /Link
-/Rect [ 108.0 529.0 154.992 517.0 ]
+/Rect [ 108.0 441.2 180.996 429.2 ]
 /C [ 0 0 0 ]
 /Border [ 0 0 0 ]
-/A << /URI (api/index.html)
+/A << /URI (zookeeperStarted.html)
 /S /URI >>
 /H /I
 >>
@@ -65,10 +69,10 @@ endobj
 11 0 obj
 << /Type /Annot
 /Subtype /Link
-/Rect [ 108.0 515.8 132.0 503.8 ]
+/Rect [ 108.0 414.8 207.144 402.8 ]
 /C [ 0 0 0 ]
 /Border [ 0 0 0 ]
-/A << /URI (http://wiki.apache.org/hadoop/ZooKeeper)
+/A << /URI (zookeeperProgrammers.html)
 /S /URI >>
 /H /I
 >>
@@ -76,36 +80,80 @@ endobj
 12 0 obj
 << /Type /Annot
 /Subtype /Link
-/Rect [ 108.0 502.6 132.0 490.6 ]
+/Rect [ 108.0 401.6 202.968 389.6 ]
 /C [ 0 0 0 ]
 /Border [ 0 0 0 ]
-/A << /URI (http://wiki.apache.org/hadoop/ZooKeeper/FAQ)
+/A << /URI (recipes.html)
 /S /URI >>
 /H /I
 >>
 endobj
 13 0 obj
+<< /Type /Annot
+/Subtype /Link
+/Rect [ 108.0 388.4 214.488 376.4 ]
+/C [ 0 0 0 ]
+/Border [ 0 0 0 ]
+/A << /URI (zookeeperAdmin.html)
+/S /URI >>
+/H /I
+>>
+endobj
+14 0 obj
+<< /Type /Annot
+/Subtype /Link
+/Rect [ 108.0 362.0 154.992 350.0 ]
+/C [ 0 0 0 ]
+/Border [ 0 0 0 ]
+/A << /URI (api/index.html)
+/S /URI >>
+/H /I
+>>
+endobj
+15 0 obj
+<< /Type /Annot
+/Subtype /Link
+/Rect [ 108.0 348.8 132.0 336.8 ]
+/C [ 0 0 0 ]
+/Border [ 0 0 0 ]
+/A << /URI (http://wiki.apache.org/hadoop/ZooKeeper)
+/S /URI >>
+/H /I
+>>
+endobj
+16 0 obj
+<< /Type /Annot
+/Subtype /Link
+/Rect [ 108.0 335.6 132.0 323.6 ]
+/C [ 0 0 0 ]
+/Border [ 0 0 0 ]
+/A << /URI (http://wiki.apache.org/hadoop/ZooKeeper/FAQ)
+/S /URI >>
+/H /I
+>>
+endobj
+17 0 obj
 << /Type /Font
 /Subtype /Type1
 /Name /F3
 /BaseFont /Helvetica-Bold
 /Encoding /WinAnsiEncoding >>
 endobj
-14 0 obj
+18 0 obj
 << /Type /Font
 /Subtype /Type1
 /Name /F5
 /BaseFont /Times-Roman
 /Encoding /WinAnsiEncoding >>
 endobj
-15 0 obj
+19 0 obj
 << /Type /Font
 /Subtype /Type1
 /Name /F1
 /BaseFont /Helvetica
 /Encoding /WinAnsiEncoding >>
 endobj
-16 0 obj
+20 0 obj
 << /Type /Font
 /Subtype /Type1
 /Name /F2
@@ -124,34 +172,38 @@ endobj
 endobj
 3 0 obj
 << 
-/Font << /F3 13 0 R /F5 14 0 R /F1 15 0 R /F2 16 0 R >> 
+/Font << /F3 17 0 R /F5 18 0 R /F1 19 0 R /F2 20 0 R >> 
 /ProcSet [ /PDF /ImageC /Text ] >> 
 endobj
 xref
-0 17
+0 21
 0000000000 65535 f 
-0000002531 00000 n 
-0000002589 00000 n 
-0000002639 00000 n 
+0000004289 00000 n 
+0000004347 00000 n 
+0000004397 00000 n 
 0000000015 00000 n 
 0000000071 00000 n 
-0000000982 00000 n 
-0000001102 00000 n 
-0000001154 00000 n 
-0000001355 00000 n 
-0000001548 00000 n 
-0000001710 00000 n 
-0000001895 00000 n 
-0000002084 00000 n 
-0000002197 00000 n 
-0000002307 00000 n 
-0000002415 00000 n 
+0000002070 00000 n 
+0000002190 00000 n 
+0000002270 00000 n 
+0000002472 00000 n 
+0000002637 00000 n 
+0000002806 00000 n 
+0000002979 00000 n 
+0000003139 00000 n 
+0000003306 00000 n 
+0000003468 00000 n 
+0000003653 00000 n 
+0000003842 00000 n 
+0000003955 00000 n 
+0000004065 00000 n 
+0000004173 00000 n 
 trailer
 <<
-/Size 17
+/Size 21
 /Root 2 0 R
 /Info 4 0 R
 >>
 startxref
-2751
+4509
 %%EOF

+ 49 - 4
docs/linkmap.html

@@ -105,10 +105,22 @@ document.write("Last Published: " + document.lastModified);
 <div onclick="SwitchMenu('menu_1.1', 'skin/')" id="menu_1.1Title" class="menutitle">Documentation</div>
 <div id="menu_1.1" class="menuitemgroup">
 <div class="menuitem">
-<a href="index.html">Overview</a>
+<a href="index.html">Welcome</a>
 </div>
 <div class="menuitem">
-<a href="api/overview-summary.html#overview_description">Getting Started</a>
+<a href="zookeeperOver.html">ZooKeeper Overview</a>
+</div>
+<div class="menuitem">
+<a href="zookeeperStarted.html">Getting Started</a>
+</div>
+<div class="menuitem">
+<a href="zookeeperProgrammers.html">Programmer's Guide</a>
+</div>
+<div class="menuitem">
+<a href="recipes.html">Recipes</a>
+</div>
+<div class="menuitem">
+<a href="zookeeperAdmin.html">Administrator's Guide</a>
 </div>
 <div class="menuitem">
 <a href="api/index.html">API Docs</a>
@@ -122,6 +134,9 @@ document.write("Last Published: " + document.lastModified);
 <div class="menuitem">
 <a href="http://hadoop.apache.org/zookeeper/mailing_lists.html">Mailing Lists</a>
 </div>
+<div class="menuitem">
+<a href="zookeeperOtherInfo.html">Other Info</a>
+</div>
 </div>
 <div id="credit"></div>
 <div id="roundbottom">
@@ -161,13 +176,37 @@ document.write("Last Published: " + document.lastModified);
     
 <ul>
 <li>
-<a href="index.html">Overview</a>&nbsp;&nbsp;___________________&nbsp;&nbsp;<em>overview</em>
+<a href="index.html">Welcome</a>&nbsp;&nbsp;___________________&nbsp;&nbsp;<em>welcome</em>
 </li>
 </ul>
     
 <ul>
 <li>
-<a href="api/overview-summary.html#overview_description">Getting Started</a>&nbsp;&nbsp;___________________&nbsp;&nbsp;<em>started</em>
+<a href="zookeeperOver.html">ZooKeeper Overview</a>&nbsp;&nbsp;___________________&nbsp;&nbsp;<em>overview</em>
+</li>
+</ul>
+    
+<ul>
+<li>
+<a href="zookeeperStarted.html">Getting Started</a>&nbsp;&nbsp;___________________&nbsp;&nbsp;<em>started</em>
+</li>
+</ul>
+    
+<ul>
+<li>
+<a href="zookeeperProgrammers.html">Programmer's Guide</a>&nbsp;&nbsp;___________________&nbsp;&nbsp;<em>program</em>
+</li>
+</ul>
+    
+<ul>
+<li>
+<a href="recipes.html">Recipes</a>&nbsp;&nbsp;___________________&nbsp;&nbsp;<em>recipes</em>
+</li>
+</ul>
+    
+<ul>
+<li>
+<a href="zookeeperAdmin.html">Administrator's Guide</a>&nbsp;&nbsp;___________________&nbsp;&nbsp;<em>admin</em>
 </li>
 </ul>
     
@@ -194,6 +233,12 @@ document.write("Last Published: " + document.lastModified);
 <a href="http://hadoop.apache.org/zookeeper/mailing_lists.html">Mailing Lists</a>&nbsp;&nbsp;___________________&nbsp;&nbsp;<em>lists</em>
 </li>
 </ul>
+    
+<ul>
+<li>
+<a href="zookeeperOtherInfo.html">Other Info</a>&nbsp;&nbsp;___________________&nbsp;&nbsp;<em>other</em>
+</li>
+</ul>
   
 </ul>
 </ul>

+ 12 - 12
docs/linkmap.pdf

@@ -5,10 +5,10 @@
 /Producer (FOP 0.20.5) >>
 endobj
 5 0 obj
-<< /Length 666 /Filter [ /ASCII85Decode /FlateDecode ]
+<< /Length 840 /Filter [ /ASCII85Decode /FlateDecode ]
  >>
 stream
-GatUq?#Q2d'Sc)P'u!?plIj08E1+/X51F;IK%,^$W]Vtl<IuYZNV;h]7h_@k'T"/7ic?[-pGr$/>fGQtPn+.S#fN-dg:tQn@,k[GRP+kBS)Y2KT8+QGJ_uBn1@bKOOs"RQ(:U;gg546)@duC$k&k:Vf=EZA0_GVE"6/d/1BX$:c3$-D"HXk^gVmAs[h>d5=h%sMLT^i/cs-3RZ^!F>oWHE7EAFqB,>k7sPB![=;sb7)Y9Z68it6GM_'j\BIisEX;_1E*CLJ2KHE?h@8@<>N@Aku2I/>o1ni).G^bCYe:pR9E*.mYqJmq9t!;dDh^A&Y<m!$^c$);Ss(X8k,O6.@W3dGX"@,<4IBmrS!I_Nng7ENG`.cs)?o:^q'M,'0V!lGVU0@C\;D!Vp@)c7K@c6dCg:_`GnB78:"!2ZB-&(DK\o%,f.H.+@dgil?$'cra!;GGV[C3St]h7\,4Z[mLGY(#TqVKaL.4U\=Ph$7q)nOu&@e$S:ZK]\%dq-]EBWnP[(^nA7".31J3]jg`_kLb.LRR;;J@2-M-!STOY8/S9NaLjV'Wbj5]R6NTpO9Ce.4U!VG,L/R86Pf;Pf=Ll]WQj#HDXft/[LkeRFILG)Bede/,]:9GR5p)GafP-K8A"0q3fiC5%6Et3M.PLJ!]#J<o`*a8X@3~>
+Gatn%;,>q#&BE]".Ju)K^p>'98KJ6J`l&"=84X]Kj<\\ZOi/D_WVH1.'&#=<&1O]G>2NJns6jcrF2tQ[q90(s63Q;ELRPur6X]"aJF"))W:=kbT[,t:kMXt.XYTGY*0DAN1C'8HGfW>,4(@_=F6</$+.qc6aMQG^i&?J,=_.dkl')En/-m@g[$-F"Wbt/-B'-lLMgkFfcl&Bb2`>I))i;k`9H(rM/(YD&l6mN?W]&BS0q1eP4Za]nB+^*>(\W2BfK72ae1;!f#h!"e?U^u;P7'a[n=R1^I6dcY+:0`\+u0S[jZ!`9GQU5d,\V^VKQ3@VWD5@^h)Ac,0'8o(eF1ahTi<E.*<s_,#:&qd"RSrrG#RL=.#-L$)G"[g0rMMmJ`-md:TY_7/7m?RTrTPD4Xmq4b=[mK$5*h[ofhRB/.MQlUgCm/94Lr+l;;JZW/OD_YH"_u.,mV=H5CW0PgfWq^&l!NoVrOL/uqcCD4YZF3r9%V`#tq!-sLjUe#\dn(<mAXW)D1g/XM#V[>R&aeWf\3f-8I>U$R1SN;em]BH8o<LWZ?RRYF7Y[#>dP(Hp\jHWeQEFODb=.jdkG8`tlFB\hIC#;mIA8$A\+gMg(R4<,r6Zc1/oESTJFnMj61UX9/</lXa=<A*[<S%jN*m6GKY%MZ6%iY5pegBcl*m15^KqP%*Wq+$B2+7l1rmc(\dI>PYX[^BlI"Dk:5J-"@2,=iZM,Zo@OhJq#-&NdaCa[-V:k&b.$0c4m7+s4+`P<%%PF/;oQe+[&?K]1c3#_c$3[WWRr74Al$^=>l_'p[f>e!hOu?iJDt/B`@7eZKosf(Qd&;qS#I%k`l_df~>
 endstream
 endobj
 6 0 obj
@@ -72,17 +72,17 @@ endobj
 xref
 0 12
 0000000000 65535 f 
-0000001489 00000 n 
-0000001547 00000 n 
-0000001597 00000 n 
+0000001663 00000 n 
+0000001721 00000 n 
+0000001771 00000 n 
 0000000015 00000 n 
 0000000071 00000 n 
-0000000828 00000 n 
-0000000934 00000 n 
-0000001046 00000 n 
-0000001155 00000 n 
-0000001265 00000 n 
-0000001373 00000 n 
+0000001002 00000 n 
+0000001108 00000 n 
+0000001220 00000 n 
+0000001329 00000 n 
+0000001439 00000 n 
+0000001547 00000 n 
 trailer
 <<
 /Size 12
@@ -90,5 +90,5 @@ trailer
 /Info 4 0 R
 >>
 startxref
-1717
+1891
 %%EOF

+ 911 - 0
docs/recipes.html

@@ -0,0 +1,911 @@
+<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
+<html>
+<head>
+<META http-equiv="Content-Type" content="text/html; charset=UTF-8">
+<meta content="Apache Forrest" name="Generator">
+<meta name="Forrest-version" content="0.8">
+<meta name="Forrest-skin-name" content="pelt">
+<title></title>
+<link type="text/css" href="skin/basic.css" rel="stylesheet">
+<link media="screen" type="text/css" href="skin/screen.css" rel="stylesheet">
+<link media="print" type="text/css" href="skin/print.css" rel="stylesheet">
+<link type="text/css" href="skin/profile.css" rel="stylesheet">
+<script src="skin/getBlank.js" language="javascript" type="text/javascript"></script><script src="skin/getMenu.js" language="javascript" type="text/javascript"></script><script src="skin/fontsize.js" language="javascript" type="text/javascript"></script>
+<link rel="shortcut icon" href="images/favicon.ico">
+</head>
+<body onload="init()">
+<script type="text/javascript">ndeSetTextSize();</script>
+<div id="top">
+<!--+
+    |breadtrail
+    +-->
+<div class="breadtrail">
+<a href="http://www.apache.org/">Apache</a> &gt; <a href="http://hadoop.apache.org/">Hadoop</a> &gt; <a href="http://hadoop.apache.org/zookeeper/">ZooKeeper</a><script src="skin/breadcrumbs.js" language="JavaScript" type="text/javascript"></script>
+</div>
+<!--+
+    |header
+    +-->
+<div class="header">
+<!--+
+    |start group logo
+    +-->
+<div class="grouplogo">
+<a href="http://hadoop.apache.org/"><img class="logoImage" alt="Hadoop" src="images/hadoop-logo.jpg" title="Apache Hadoop"></a>
+</div>
+<!--+
+    |end group logo
+    +-->
+<!--+
+    |start Project Logo
+    +-->
+<div class="projectlogo">
+<a href="http://hadoop.apache.org/zookeeper/"><img class="logoImage" alt="ZooKeeper" src="images/zookeeper_small.gif" title="The Hadoop database"></a>
+</div>
+<!--+
+    |end Project Logo
+    +-->
+<!--+
+    |start Search
+    +-->
+<div class="searchbox">
+<form action="http://www.google.com/search" method="get" class="roundtopsmall">
+<input value="hadoop.apache.org" name="sitesearch" type="hidden"><input onFocus="getBlank (this, 'Search the site with google');" size="25" name="q" id="query" type="text" value="Search the site with google">&nbsp; 
+                    <input name="Search" value="Search" type="submit">
+</form>
+</div>
+<!--+
+    |end search
+    +-->
+<!--+
+    |start Tabs
+    +-->
+<ul id="tabs">
+<li>
+<a class="unselected" href="http://hadoop.apache.org/zookeeper/">Project</a>
+</li>
+<li>
+<a class="unselected" href="http://wiki.apache.org/hadoop/ZooKeeper">Wiki</a>
+</li>
+<li class="current">
+<a class="selected" href="index.html">ZooKeeper Documentation</a>
+</li>
+</ul>
+<!--+
+    |end Tabs
+    +-->
+</div>
+</div>
+<div id="main">
+<div id="publishedStrip">
+<!--+
+    |start Subtabs
+    +-->
+<div id="level2tabs"></div>
+<!--+
+    |end Endtabs
+    +-->
+<script type="text/javascript"><!--
+document.write("Last Published: " + document.lastModified);
+//  --></script>
+</div>
+<!--+
+    |breadtrail
+    +-->
+<div class="breadtrail">
+
+             &nbsp;
+           </div>
+<!--+
+    |start Menu, mainarea
+    +-->
+<!--+
+    |start Menu
+    +-->
+<div id="menu">
+<div onclick="SwitchMenu('menu_selected_1.1', 'skin/')" id="menu_selected_1.1Title" class="menutitle" style="background-image: url('skin/images/chapter_open.gif');">Documentation</div>
+<div id="menu_selected_1.1" class="selectedmenuitemgroup" style="display: block;">
+<div class="menuitem">
+<a href="index.html">Welcome</a>
+</div>
+<div class="menuitem">
+<a href="zookeeperOver.html">ZooKeeper Overview</a>
+</div>
+<div class="menuitem">
+<a href="zookeeperStarted.html">Getting Started</a>
+</div>
+<div class="menuitem">
+<a href="zookeeperProgrammers.html">Programmer's Guide</a>
+</div>
+<div class="menupage">
+<div class="menupagetitle">Recipes</div>
+</div>
+<div class="menuitem">
+<a href="zookeeperAdmin.html">Administrator's Guide</a>
+</div>
+<div class="menuitem">
+<a href="api/index.html">API Docs</a>
+</div>
+<div class="menuitem">
+<a href="http://wiki.apache.org/hadoop/ZooKeeper">Wiki</a>
+</div>
+<div class="menuitem">
+<a href="http://wiki.apache.org/hadoop/ZooKeeper/FAQ">FAQ</a>
+</div>
+<div class="menuitem">
+<a href="http://hadoop.apache.org/zookeeper/mailing_lists.html">Mailing Lists</a>
+</div>
+<div class="menuitem">
+<a href="zookeeperOtherInfo.html">Other Info</a>
+</div>
+</div>
+<div id="credit"></div>
+<div id="roundbottom">
+<img style="display: none" class="corner" height="15" width="15" alt="" src="skin/images/rc-b-l-15-1body-2menu-3menu.png"></div>
+<!--+
+  |alternative credits
+  +-->
+<div id="credit2"></div>
+</div>
+<!--+
+    |end Menu
+    +-->
+<!--+
+    |start content
+    +-->
+<div id="content">
+<div title="Portable Document Format" class="pdflink">
+<a class="dida" href="recipes.pdf"><img alt="PDF -icon" src="skin/images/pdfdoc.gif" class="skin"><br>
+        PDF</a>
+</div>
+<div id="minitoc-area">
+<ul class="minitoc">
+<li>
+<a href="#A+Guide+to+Creating+Higher-level+Constructs+with+ZooKeeper">A Guide to Creating Higher-level Constructs with ZooKeeper</a>
+<ul class="minitoc">
+<li>
+<a href="#sc_outOfTheBox">Out of the Box Applications: Name Service, Configuration, Group
+    Membership</a>
+</li>
+<li>
+<a href="#sc_recipes_eventHandles">Barriers</a>
+<ul class="minitoc">
+<li>
+<a href="#sc_doubleBarriers">Double Barriers</a>
+</li>
+</ul>
+</li>
+<li>
+<a href="#sc_recipes_Queues">Queues</a>
+<ul class="minitoc">
+<li>
+<a href="#sc_recipes_priorityQueues">Priority Queues</a>
+</li>
+</ul>
+</li>
+<li>
+<a href="#sc_recipes_Locks">Locks</a>
+<ul class="minitoc">
+<li>
+<a href="#Shared+Locks">Shared Locks</a>
+</li>
+<li>
+<a href="#sc_recoverableSharedLocks">Recoverable Shared Locks</a>
+</li>
+</ul>
+</li>
+<li>
+<a href="#sc_recipes_twoPhasedCommit">Two-phased Commit</a>
+</li>
+<li>
+<a href="#sc_leaderElection">Leader Election</a>
+</li>
+</ul>
+</li>
+</ul>
+</div>
+  
+<title>ZooKeeper Recipes and Solutions</title>
+
+  
+
+  
+<a name="N1000A"></a><a name="A+Guide+to+Creating+Higher-level+Constructs+with+ZooKeeper"></a>
+<h2 class="h3">A Guide to Creating Higher-level Constructs with ZooKeeper</h2>
+<div class="section">
+<p>In this article, you'll find guidelines for using
+    ZooKeeper to implement higher order functions. All of them are conventions
+    implemented at the client and do not require special support from
+    ZooKeeper. Hopefully the community will capture these conventions in client-side libraries 
+    to ease their use and to encourage standardization.</p>
+<p>One of the most interesting things about ZooKeeper is that even
+    though ZooKeeper uses <em>asynchronous</em> notifications, you
+    can use it to build <em>synchronous</em> consistency
+    primitives, such as queues and locks. As you will see, this is possible
+    because ZooKeeper imposes an overall order on updates, and has mechanisms
+    to expose this ordering.</p>
+<p>Note that the recipes below attempt to employ best practices. In
+    particular, they avoid polling, timers or anything else that would result
+    in a "herd effect", causing bursts of traffic and limiting
+    scalability.</p>
+<p>There are many useful functions that can be imagined that aren't
+    included here - revocable read-write priority locks, as just one example.
+    And some of the constructs mentioned here - locks, in particular -
+    illustrate certain points, even though you may find other constructs, such
+    as event handles or queues, a more practical means of performing the same
+    function. In general, the examples in this section are designed to
+    stimulate thought.</p>
+<a name="N10022"></a><a name="sc_outOfTheBox"></a>
+<h3 class="h4">Out of the Box Applications: Name Service, Configuration, Group
+    Membership</h3>
+<p>Name service and configuration are two of the primary applications
+    of ZooKeeper. These two functions are provided directly by the ZooKeeper
+    API.</p>
+<p>Another function directly provided by ZooKeeper is <em>group
+    membership</em>. The group is represented by a node. Members of the
+    group create ephemeral nodes under the group node. Nodes of the members
+    that fail abnormally will be removed automatically when ZooKeeper detects
+    the failure.</p>
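The group membership convention above can be sketched with a small in-memory model. This is only an illustration, assuming a toy store in which each ephemeral node remembers the session that created it; the class `InMemoryGroupStore`, its methods, and the paths and session ids are hypothetical and are not the ZooKeeper client API.

```python
class InMemoryGroupStore:
    """Toy model of ephemeral nodes (hypothetical, not a real ZooKeeper client)."""

    def __init__(self):
        # path -> owning session id for ephemeral nodes, None for regular nodes
        self._owner = {}

    def create(self, path, session=None):
        # A node created with a session is treated as ephemeral.
        self._owner[path] = session

    def expire_session(self, session):
        # Models the service detecting a failed member: every ephemeral
        # node owned by that session is removed automatically.
        if session is None:
            return
        for path in [p for p, s in self._owner.items() if s == session]:
            del self._owner[path]

    def get_children(self, parent):
        prefix = parent + "/"
        return sorted(
            p[len(prefix):]
            for p in self._owner
            if p.startswith(prefix) and "/" not in p[len(prefix):]
        )


store = InMemoryGroupStore()
store.create("/workers")                       # the group node
store.create("/workers/alice", session="s1")   # members join with ephemeral nodes
store.create("/workers/bob", session="s2")
store.expire_session("s1")                     # alice's session fails
members = store.get_children("/workers")       # only bob remains
```

Listing the children of the group node is then the membership query; no member has to deregister explicitly when it fails.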
+<a name="N10032"></a><a name="sc_recipes_eventHandles"></a>
+<h3 class="h4">Barriers</h3>
+<p>Distributed systems use <em>barriers</em> to block
+    processing of a set of nodes until a condition is met at which time all
+    the nodes are allowed to proceed. Barriers are implemented in ZooKeeper by
+    designating a barrier node. The barrier is in place if the barrier node
+    exists. Here's the pseudo code:</p>
+<ol>
+      
+<li>
+        
+<p>Client calls the ZooKeeper API's <strong>exists()</strong> function on the barrier node, with
+        <em>watch</em> set to true.</p>
+      
+</li>
+
+      
+<li>
+        
+<p>If <strong>exists()</strong> returns false, the
+        barrier is gone and the client proceeds.</p>
+      
+</li>
+
+      
+<li>
+        
+<p>Else, if <strong>exists()</strong> returns true,
+        the client waits for a watch event from ZooKeeper for the barrier
+        node.</p>
+      
+</li>
+
+      
+<li>
+        
+<p>When the watch event is triggered, the client reissues the
+        <strong>exists()</strong> call, again waiting until
+        the barrier node is removed.</p>
+      
+</li>
+    
+</ol>
+<p>
+<remark>[tbd: maybe an illustration would be nice for each of the
+    recipes?]</remark>
+</p>
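The four barrier steps above can be sketched as a loop around a one-shot watch. This is a minimal sketch against an in-memory stand-in, not the ZooKeeper client API: `InMemoryStore`, its `exists(path, watcher)` signature, and `wait_for_barrier` are hypothetical names chosen for illustration.

```python
import threading


class InMemoryStore:
    """Toy stand-in for the service (hypothetical, not a real ZooKeeper client)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._nodes = set()
        self._watches = {}  # path -> list of one-shot callbacks

    def exists(self, path, watcher=None):
        # Register the watch and read the state atomically, so a change
        # between the check and the wait cannot be missed.
        with self._lock:
            if watcher is not None:
                self._watches.setdefault(path, []).append(watcher)
            return path in self._nodes

    def create(self, path):
        self._change(path, present=True)

    def delete(self, path):
        self._change(path, present=False)

    def _change(self, path, present):
        with self._lock:
            if (path in self._nodes) == present:
                return  # nothing changed, so no watch fires
            if present:
                self._nodes.add(path)
            else:
                self._nodes.discard(path)
            callbacks = self._watches.pop(path, [])
        for callback in callbacks:  # watches are one-shot
            callback()


def wait_for_barrier(store, barrier_path):
    """Block until the barrier node is gone (steps 1 to 4 above)."""
    while True:
        fired = threading.Event()
        # Step 1: exists() with a watch set.
        if not store.exists(barrier_path, watcher=fired.set):
            return  # Step 2: barrier is gone, proceed.
        fired.wait()  # Step 3: wait for the watch event.
        # Step 4: loop and reissue exists().
```

The client never polls: it sleeps until the watch fires and only then checks the node again.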
+<a name="N1006C"></a><a name="sc_doubleBarriers"></a>
+<h4>Double Barriers</h4>
+<p>Double barriers enable clients to synchronize the beginning and
+      the end of a computation. When enough processes have joined the barrier,
+      processes start their computation and leave the barrier once they have
+      finished. This recipe shows how to use a ZooKeeper node as a
+      barrier.</p>
+<p>The pseudo code in this recipe represents the barrier node as
+      <em>b</em>. Every client process <em>p</em>
+      registers with the barrier node on entry and unregisters when it is
+      ready to leave. A node registers with the barrier node via the <strong>Enter</strong> procedure below; it then waits until
+      <em>x</em> client processes register before proceeding with
+      the computation. (The <em>x</em> here is up to you to
+      determine for your system.)</p>
+<p>
+<table class="ForrestTable" cellspacing="1" cellpadding="4">
+          
+              
+<tr>
+                
+<td><strong>Enter</strong></td>
+
+                <td><strong>Leave</strong></td>
+              
+</tr>
+
+              
+<tr>
+                
+<td>
+<ol>
+                    
+<li>
+                      
+<p>Create a name <em><em>n</em> =
+                      <em>b</em>+&ldquo;/&rdquo;+<em>p</em></em>
+</p>
+                    
+</li>
+
+                    
+<li>
+                      
+<p>Set watch: <strong>exists(<em>b</em> + &lsquo;&lsquo;/ready&rsquo;&rsquo;,
+                      true)</strong>
+</p>
+                    
+</li>
+
+                    
+<li>
+                      
+<p>Create child: <strong>create(
+                      <em>n</em>, EPHEMERAL)</strong>
+</p>
+                    
+</li>
+
+                    
+<li>
+                      
+<p>
+<strong>L = getChildren(b,
+                      false)</strong>
+</p>
+                    
+</li>
+
+                    
+<li>
+                      
+<p>if fewer children in L than<em>
+                      x</em>, wait for watch event <remark>[tbd: how do
+                      you wait?]</remark>
+</p>
+                    
+</li>
+
+                    
+<li>
+                      
+<p>else <strong>create(b + &lsquo;&lsquo;/ready&rsquo;&rsquo;,
+                      REGULAR)</strong>
+</p>
+                    
+</li>
+                  
+</ol>
+</td>
+
+                <td>
+<ol>
+                    
+<li>
+                      
+<p>
+<strong>L = getChildren(b,
+                      false)</strong>
+</p>
+                    
+</li>
+
+                    
+<li>
+                      
+<p>if no children, exit</p>
+                    
+</li>
+
+                    
+<li>
+                      
+<p>if <em>p</em> is only process node in
+                      L, delete(n) and exit</p>
+                    
+</li>
+
+                    
+<li>
+                      
+<p>if <em>p</em> is the lowest process
+                      node in L, wait on highest process node in L</p>
+                    
+</li>
+
+                    
+<li>
+                      
+<p>else <strong>delete(<em>n</em>) </strong>if
+                      still exists and wait on lowest process node in L</p>
+                    
+</li>
+
+                    
+<li>
+                      
+<p>goto 1</p>
+                    
+</li>
+                  
+</ol>
+</td>
+              
+</tr>
+            
+        
+</table>On entering, all processes watch on a ready node and
+      create an ephemeral node as a child of the barrier node. Each process
+      but the last enters the barrier and waits for the ready node to appear
+      at line 5. The process that creates the xth node, the last process, will
+      see x nodes in the list of children and create the ready node, waking up
+      the other processes. Note that waiting processes wake up only when it is
+      time to exit, so waiting is efficient.</p>
+<p>On exit, you can't use a flag such as <em>ready</em>
+      because you are watching for process nodes to go away. By using
+      ephemeral nodes, processes that fail after the barrier has been entered
+      do not prevent correct processes from finishing. When processes are
+      ready to leave, they need to delete their process nodes and wait for all
+      other processes to do the same.</p>
+<p>Processes exit when there are no process nodes left as children of
+      <em>b</em>. However, as an efficiency, you can use the
+      lowest process node as the ready flag. All other processes that are
+      ready to exit watch for the lowest existing process node to go away, and
+      the owner of the lowest process watches for any other process node
+      (picking the highest for simplicity) to go away. This means that only a
+      single process wakes up on each node deletion except for the last node,
+      which wakes up everyone when it is removed.</p>
+<a name="N10120"></a><a name="sc_recipes_Queues"></a>
+<h3 class="h4">Queues</h3>
+<p>Distributed queues are a common data structure. To implement a
+    distributed queue in ZooKeeper, first designate a znode to hold the queue,
+    the queue node. The distributed clients put something into the queue by
+    calling create() with a pathname ending in "queue-", with the
+    <em>sequence</em> and <em>ephemeral</em> flags in
+    the create() call set to true. Because the <em>sequence</em>
+    flag is set, the new pathnames will have the form
+    _path-to-queue-node_/queue-X, where X is a monotonically increasing number. A
+    client that wants to remove an item from the queue calls ZooKeeper's <strong>getChildren( )</strong> function, with
+    <em>watch</em> set to true on the queue node, and begins
+    processing nodes with the lowest number. The client does not need to issue
+    another <strong>getChildren( )</strong> until it exhausts
+    the list obtained from the first <strong>getChildren(
+    )</strong> call. If there are no children in the queue node, the
+    reader waits for a watch notification to check the queue again.</p>
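+<p>The consumer side of this recipe can be sketched in Java as follows. This is
+    only an illustration, not part of ZooKeeper: the class <em>QueueOrder</em> and
+    its methods are hypothetical names, and the code assumes that each child name
+    ends in "-" followed by the decimal sequence number. It shows how a client might
+    order the list returned by <strong>getChildren( )</strong> so that the
+    lowest-numbered node is processed first.</p>

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of the ZooKeeper API.
class QueueOrder {
    // Parse the numeric suffix appended by the sequence flag,
    // e.g. "queue-0000000007" -> 7 (assumed naming scheme).
    static long sequenceOf(String child) {
        return Long.parseLong(child.substring(child.lastIndexOf('-') + 1));
    }

    // Return a sorted copy of the children so the head of the queue comes first.
    static List<String> inQueueOrder(List<String> children) {
        List<String> sorted = new ArrayList<>(children);
        sorted.sort(Comparator.comparingLong(QueueOrder::sequenceOf));
        return sorted;
    }
}
```

+<p>Comparing the parsed numbers rather than the raw strings keeps the order
+    correct even if the suffixes are not all the same width.</p>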
+<a name="N1013E"></a><a name="sc_recipes_priorityQueues"></a>
+<h4>Priority Queues</h4>
+<p>To implement a priority queue, you need only make two simple
+      changes to the generic <a href="#sc_recipes_Queues">queue
+      recipe</a>. First, to add to a queue, the pathname ends with
+      "queue-YY" where YY is the priority of the element, with lower numbers
+      representing higher priority (just like UNIX). Second, when removing
+      from the queue, a client uses an up-to-date children list, meaning that
+      the client will invalidate previously obtained children lists if a watch
+      notification triggers for the queue node.</p>
+<a name="N1014D"></a><a name="sc_recipes_Locks"></a>
+<h3 class="h4">Locks</h3>
+<p>ZooKeeper can be used to implement fully distributed locks that are
+    globally synchronous, meaning that at any snapshot in time no two clients
+    think they hold the same lock. As with priority queues, first define
+    a lock node.</p>
+<p>Clients wishing to obtain a lock do the following:</p>
+<ol>
+      
+<li>
+        
+<p>Call <strong>create( )</strong> with a pathname
+        of "_locknode_/lock-" and the <em>sequence</em> and
+        <em>ephemeral</em> flags set.</p>
+      
+</li>
+
+      
+<li>
+        
+<p>Call <strong>getChildren( )</strong> on the lock
+        node <em>without</em> setting the watch flag (this is
+        important to avoid the herd effect).</p>
+      
+</li>
+
+      
+<li>
+        
+<p>If the pathname created in step <strong>1</strong> has the lowest sequence number suffix, the
+        client has the lock and the client exits the protocol.</p>
+      
+</li>
+
+      
+<li>
+        
+<p>The client calls <strong>exists( )</strong> with
+        the watch flag set on the path in the lock directory with the next
+        lowest sequence number.</p>
+      
+</li>
+
+      
+<li>
+        
+<p>If <strong>exists( )</strong> returns false, go
+        to step <strong>2</strong>. Otherwise, wait for a
+        notification for the pathname from the previous step before going to
+        step <strong>2</strong>.</p>
+      
+</li>
+    
+</ol>
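+<p>Steps 3 and 4 above amount to a small decision over the children list. The
+    following Java sketch is one way to express it. <em>LockStep</em> and
+    <em>nodeToWatch</em> are hypothetical names rather than ZooKeeper API, and the
+    code assumes child names end in "-" plus the sequence number. It reads "next
+    lowest" as the child whose sequence number is immediately below the client's
+    own.</p>

```java
import java.util.List;

// Hypothetical helper, not part of the ZooKeeper API.
class LockStep {
    // Parse the numeric suffix appended by the sequence flag (assumed naming scheme).
    static long sequenceOf(String child) {
        return Long.parseLong(child.substring(child.lastIndexOf('-') + 1));
    }

    // Returns null when `mine` has the lowest sequence number (the lock is held).
    // Otherwise returns the child with the highest sequence number below `mine`,
    // which is the single node this client should watch.
    static String nodeToWatch(List<String> children, String mine) {
        long mySeq = sequenceOf(mine);
        String predecessor = null;
        for (String child : children) {
            long seq = sequenceOf(child);
            if (seq < mySeq && (predecessor == null || seq > sequenceOf(predecessor))) {
                predecessor = child;
            }
        }
        return predecessor;
    }
}
```

+<p>A null result corresponds to step 3. Any other result is the path to pass to
+    <strong>exists( )</strong> with the watch flag set, as in step 4.</p>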
+<p>The unlock protocol is very simple: clients wishing to release a
+    lock simply delete the node they created in step 1.</p>
+<p>Here are a few things to notice:</p>
+<ul>
+      
+<li>
+        
+<p>The removal of a node will only cause one client to wake up
+        since each node is watched by exactly one client. In this way, you
+        avoid the herd effect.</p>
+      
+</li>
+
+      
+<li>
+        
+<p>There is no polling and there are no timeouts.</p>
+      
+</li>
+
+      
+<li>
+        
+<p>Because of the way you implement locking, it is easy to see the
+        amount of lock contention, break locks, debug locking problems,
+        etc.</p>
+      
+</li>
+    
+</ul>
+<a name="N101B9"></a><a name="Shared+Locks"></a>
+<h4>Shared Locks</h4>
+<p>You can implement shared locks by making a few changes to the lock
+      protocol:</p>
+<table class="ForrestTable" cellspacing="1" cellpadding="4">
+        
+            
+<tr>
+              
+<td><strong>Obtaining a read
+              lock:</strong></td>
+
+              <td><strong>Obtaining a write
+              lock:</strong></td>
+            
+</tr>
+
+            
+<tr>
+              
+<td>
+<ol>
+                  
+<li>
+                    
+<p>Call <strong>create( )</strong> to
+                    create a node with pathname
+                    "<span class="codefrag parameter">_locknode_/read-</span>". This is the
+                    lock node used later in the protocol. Make sure to set both
+                    the <em>sequence</em> and
+                    <em>ephemeral</em> flags.</p>
+                  
+</li>
+
+                  
+<li>
+                    
+<p>Call <strong>getChildren( )</strong>
+                    on the lock node <em>without</em> setting the
+                    <em>watch</em> flag - this is important, as it
+                    avoids the herd effect.</p>
+                  
+</li>
+
+                  
+<li>
+                    
+<p>If there are no children with a pathname starting
+                    with "<span class="codefrag parameter">write-</span>" and having a lower
+                    sequence number than the node created in step <strong>1</strong>, the client has the lock and can
+                    exit the protocol. </p>
+                  
+</li>
+
+                  
+<li>
+                    
+<p>Otherwise, call <strong>exists(
+                    )</strong> with the <em>watch</em> flag set on
+                    the node in the lock directory with a pathname starting with
+                    "<span class="codefrag parameter">write-</span>" and having the next lowest
+                    sequence number.</p>
+                  
+</li>
+
+                  
+<li>
+                    
+<p>If <strong>exists( )</strong>
+                    returns <em>false</em>, go to step <strong>2</strong>.</p>
+                  
+</li>
+
+                  
+<li>
+                    
+<p>Otherwise, wait for a notification for the pathname
+                    from the previous step before going to step <strong>2</strong>.
+</p>
+                  
+</li>
+                
+</ol>
+</td>
+
+              <td>
+<ol>
+                  
+<li>
+                    
+<p>Call <strong>create( )</strong> to
+                    create a node with pathname
+                    "<span class="codefrag parameter">_locknode_/write-</span>". This is the
+                    lock node spoken of later in the protocol. Make sure to
+                    set both <em>sequence</em> and
+                    <em>ephemeral</em> flags.</p>
+                  
+</li>
+
+                  
+<li>
+                    
+<p>Call <strong>getChildren( )
+                    </strong> on the lock node <em>without</em>
+                    setting the <em>watch</em> flag - this is
+                    important, as it avoids the herd effect.</p>
+                  
+</li>
+
+                  
+<li>
+                    
+<p>If there are no children with a lower sequence
+                    number than the node created in step <strong>1</strong>, the client has the lock and the
+                    client exits the protocol.</p>
+                  
+</li>
+
+                  
+<li>
+                    
+<p>Call <strong>exists( ),</strong>
+                    with <em>watch</em> flag set, on the node with
+                    the pathname that has the next lowest sequence
+                    number.</p>
+                  
+</li>
+
+                  
+<li>
+                    
+<p>If <strong>exists( )</strong>
+                    returns <em>false</em>, go to step <strong>2</strong>. Otherwise, wait for a
+                    notification for the pathname from the previous step
+                    before going to step <strong>2</strong>.</p>
+                  
+</li>
+                
+</ol>
+</td>
+            
+</tr>
+          
+      
+</table>
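+<p>The read-lock rule in the left column differs from the plain lock only in
+      which children count as blockers. A minimal Java sketch of that rule follows.
+      <em>SharedLockStep</em> and <em>writerBlockingRead</em> are hypothetical names,
+      and the code assumes children are named "read-" or "write-" followed by the
+      sequence number, as in the table above.</p>

```java
import java.util.List;

// Hypothetical helper, not part of the ZooKeeper API.
class SharedLockStep {
    // Parse the numeric suffix appended by the sequence flag (assumed naming scheme).
    static long sequenceOf(String child) {
        return Long.parseLong(child.substring(child.lastIndexOf('-') + 1));
    }

    // For a reader that created `mine`: returns null if no "write-" child has a
    // lower sequence number (the read lock is held). Otherwise returns the
    // "write-" child with the highest sequence number below `mine`, which is
    // the node the reader should watch.
    static String writerBlockingRead(List<String> children, String mine) {
        long mySeq = sequenceOf(mine);
        String blocker = null;
        for (String child : children) {
            if (!child.startsWith("write-")) {
                continue; // other readers never block a reader
            }
            long seq = sequenceOf(child);
            if (seq < mySeq && (blocker == null || seq > sequenceOf(blocker))) {
                blocker = child;
            }
        }
        return blocker;
    }
}
```

+<p>A writer would instead consider every child with a lower sequence number,
+      exactly as in the exclusive lock protocol.</p>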
+<p>
+<div class="note">
+<div class="label">Note</div>
+<div class="content">
+          
+<p>It might appear that this recipe creates a herd effect: when
+          there is a large group of clients waiting for a read lock, they are all
+          notified more or less simultaneously when the
+          "<span class="codefrag parameter">write-</span>" node with the lowest sequence number
+          is deleted. In fact, that's valid behavior: all those waiting
+          reader clients should be released since they have the lock. The herd
+          effect refers to releasing a "herd" when in fact only a single or a
+          small number of machines can proceed. <remark>[tbd: maybe helpful to
+          indicate which step this refers to?]</remark>
+</p>
+        
+</div>
+</div>
+</p>
+<a name="N10288"></a><a name="sc_recoverableSharedLocks"></a>
+<h4>Revocable Shared Locks</h4>
+<p>With minor modifications to the Shared Lock protocol, you can make
+      shared locks revocable:</p>
+<p>In step <strong>1</strong> of both the read-lock
+      and write-lock protocols, call <strong>getData(
+      )</strong> with <em>watch</em> set, immediately after the
+      call to <strong>create( )</strong>. If the client
+      subsequently receives notification for the node it created in step
+      <strong>1</strong>, it does another <strong>getData( )</strong> on that node, with
+      <em>watch</em> set, and looks for the string "unlock", which
+      signals to the client that it must release the lock. This is because,
+      according to this shared lock protocol, you can request that the client with
+      the lock give up the lock by calling <strong>setData()
+      </strong> on the lock node, writing "unlock" to that node.</p>
+<p>Note that this protocol requires the lock holder to consent to
+      releasing the lock. Such consent is important, especially if the lock
+      holder needs to do some processing before releasing the lock. Of course
+      you can always implement <em>Revocable Shared Locks with Freaking
+      Laser Beams</em> by stipulating in your protocol that the revoker
+      is allowed to delete the lock node if after some length of time the lock
+      isn't deleted by the lock holder.</p>
+<a name="N102B4"></a><a name="sc_recipes_twoPhasedCommit"></a>
+<h3 class="h4">Two-phased Commit</h3>
+<p>A two-phase commit protocol is an algorithm that lets all clients in
+    a distributed system agree either to commit a transaction or abort.</p>
+<p>In ZooKeeper, you can implement a two-phased commit by having a
+    coordinator create a transaction node, say "/app/Tx", and one child node
+    per participating site, say "/app/Tx/s_i". When the coordinator creates the
+    child node, it leaves the content undefined. Once each site involved in
+    the transaction receives the transaction from the coordinator, the site
+    reads each child node and sets a watch. Each site then processes the query
+    and votes "commit" or "abort" by writing to its respective node. Once the
+    write completes, the other sites are notified, and as soon as all sites
+    have all votes, they can decide either "abort" or "commit". Note that a
+    site can decide "abort" earlier if some site votes for "abort".</p>
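+<p>The decision rule each site applies to the votes it has read can be sketched
+    in Java as below. This is an illustration under assumptions: the class
+    <em>TwoPhaseDecision</em> is a hypothetical name, votes are modeled as a map
+    from site name to the string found in that site's node, and a null value
+    stands for a node whose content is still undefined.</p>

```java
import java.util.Map;

// Hypothetical helper, not part of the ZooKeeper API.
class TwoPhaseDecision {
    enum Outcome { COMMIT, ABORT, UNDECIDED }

    // `votes` maps each site to "commit", "abort", or null (no vote written yet).
    static Outcome decide(Map<String, String> votes) {
        boolean allCommitted = true;
        for (String vote : votes.values()) {
            if ("abort".equals(vote)) {
                return Outcome.ABORT; // a single abort vote decides early
            }
            if (!"commit".equals(vote)) {
                allCommitted = false; // still waiting on this site
            }
        }
        return allCommitted ? Outcome.COMMIT : Outcome.UNDECIDED;
    }
}
```

+<p>A site would re-run this check each time a watch on one of the child nodes
+    fires, and stop once the result is no longer UNDECIDED.</p>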
+<p>An interesting aspect of this implementation is that the only role
+    of the coordinator is to decide upon the group of sites, to create the
+    ZooKeeper nodes, and to propagate the transaction to the corresponding
+    sites. In fact, even propagating the transaction can be done through
+    ZooKeeper by writing it in the transaction node.</p>
+<p>There are two important drawbacks of the approach described above.
+    One is the message complexity, which is O(n&sup2;). The second is the
+    impossibility of detecting failures of sites through ephemeral nodes. To
+    detect the failure of a site using ephemeral nodes, it is necessary that
+    the site create the node.</p>
+<p>To solve the first problem, you can have only the coordinator
+    notified of changes to the transaction nodes, and then notify the sites
+    once the coordinator reaches a decision. Note that this approach is scalable,
+    but it is slower too, as it requires all communication to go through the
+    coordinator.</p>
+<p>To address the second problem, you can have the coordinator
+    propagate the transaction to the sites, and have each site create its
+    own ephemeral node.</p>
+<a name="N102CD"></a><a name="sc_leaderElection"></a>
+<h3 class="h4">Leader Election</h3>
+<p>A simple way of doing leader election with ZooKeeper is to use the
+    <strong>SEQUENCE|EPHEMERAL</strong> flags when creating
+    znodes that represent "proposals" of clients. The idea is to have a znode,
+    say "/election", such that each client creates a child znode "/election/n_"
+    with both flags SEQUENCE|EPHEMERAL. With the sequence flag, ZooKeeper
+    automatically appends a sequence number that is greater than any one
+    previously appended to a child of "/election". The process that created
+    the znode with the smallest appended sequence number is the leader.
+    </p>
+<p>That's not all, though. It is important to watch for failures of the
+    leader, so that a new client arises as the new leader in the case the
+    current leader fails. A trivial solution is to have all application
+    processes watch the current smallest znode, and check whether they
+    are the new leader when the smallest znode goes away (note that the
+    smallest znode will go away if the leader fails because the node is
+    ephemeral). But this causes a herd effect: upon failure of the current
+    leader, all other processes receive a notification, and execute
+    getChildren on "/election" to obtain the current list of children of
+    "/election". If the number of clients is large, it causes a spike in the
+    number of operations that ZooKeeper servers have to process. To avoid the
+    herd effect, it is sufficient to watch for the next znode down on the
+    sequence of znodes. If a client receives a notification that the znode it
+    is watching is gone, then it becomes the new leader in the case that there
+    is no smaller znode. Note that this avoids the herd effect by not having
+    all clients watching the same znode. </p>
+<p>Here's the pseudo code:</p>
+<p>Let ELECTION be a path of choice of the application. To volunteer to
+    be a leader: </p>
+<ol>
+      
+<li>
+        
+<p>Create znode z with path "ELECTION/n_" with both SEQUENCE and
+        EPHEMERAL flags;</p>
+      
+</li>
+
+      
+<li>
+        
+<p>Let C be the children of "ELECTION", and i be the sequence
+        number of z;</p>
+      
+</li>
+
+      
+<li>
+        
+<p>Watch for changes on "ELECTION/n_j", where j is the largest
+        sequence number such that j &lt; i and n_j is a znode in C;</p>
+      
+</li>
+    
+</ol>
+<p>Upon receiving a notification of znode deletion: </p>
+<ol>
+      
+<li>
+        
+<p>Let C be the new set of children of ELECTION; </p>
+      
+</li>
+
+      
+<li>
+        
+<p>If z is the smallest node in C, then execute leader
+        procedure;</p>
+      
+</li>
+
+      
+<li>
+        
+<p>Otherwise, watch for changes on "ELECTION/n_j", where j is the
+        largest sequence number such that j &lt; i and n_j is a znode in C;
+        </p>
+      
+</li>
+    
+</ol>
+<p>Note that the znode having no preceding znode on the list of
+    children does not imply that the creator of this znode is aware that it is
+    the current leader. Applications may consider creating a separate znode
+    to acknowledge that the leader has executed the leader procedure. </p>
+</div>
+
+<p align="right">
+<font size="-2"></font>
+</p>
+</div>
+<!--+
+    |end content
+    +-->
+<div class="clearboth">&nbsp;</div>
+</div>
+<div id="footer">
+<!--+
+    |start bottomstrip
+    +-->
+<div class="lastmodified">
+<script type="text/javascript"><!--
+document.write("Last Published: " + document.lastModified);
+//  --></script>
+</div>
+<div class="copyright">
+        Copyright &copy;
+         2008 <a href="http://www.apache.org/licenses/">The Apache Software Foundation.</a>
+</div>
+<!--+
+    |end bottomstrip
+    +-->
+</div>
+</body>
+</html>

File diff suppressed because it is too large
+ 107 - 0
docs/recipes.pdf


+ 1056 - 0
docs/zookeeperAdmin.html

@@ -0,0 +1,1056 @@
+<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
+<html>
+<head>
+<META http-equiv="Content-Type" content="text/html; charset=UTF-8">
+<meta content="Apache Forrest" name="Generator">
+<meta name="Forrest-version" content="0.8">
+<meta name="Forrest-skin-name" content="pelt">
+<title></title>
+<link type="text/css" href="skin/basic.css" rel="stylesheet">
+<link media="screen" type="text/css" href="skin/screen.css" rel="stylesheet">
+<link media="print" type="text/css" href="skin/print.css" rel="stylesheet">
+<link type="text/css" href="skin/profile.css" rel="stylesheet">
+<script src="skin/getBlank.js" language="javascript" type="text/javascript"></script><script src="skin/getMenu.js" language="javascript" type="text/javascript"></script><script src="skin/fontsize.js" language="javascript" type="text/javascript"></script>
+<link rel="shortcut icon" href="images/favicon.ico">
+</head>
+<body onload="init()">
+<script type="text/javascript">ndeSetTextSize();</script>
+<div id="top">
+<!--+
+    |breadtrail
+    +-->
+<div class="breadtrail">
+<a href="http://www.apache.org/">Apache</a> &gt; <a href="http://hadoop.apache.org/">Hadoop</a> &gt; <a href="http://hadoop.apache.org/zookeeper/">ZooKeeper</a><script src="skin/breadcrumbs.js" language="JavaScript" type="text/javascript"></script>
+</div>
+<!--+
+    |header
+    +-->
+<div class="header">
+<!--+
+    |start group logo
+    +-->
+<div class="grouplogo">
+<a href="http://hadoop.apache.org/"><img class="logoImage" alt="Hadoop" src="images/hadoop-logo.jpg" title="Apache Hadoop"></a>
+</div>
+<!--+
+    |end group logo
+    +-->
+<!--+
+    |start Project Logo
+    +-->
+<div class="projectlogo">
+<a href="http://hadoop.apache.org/zookeeper/"><img class="logoImage" alt="ZooKeeper" src="images/zookeeper_small.gif" title="The Hadoop database"></a>
+</div>
+<!--+
+    |end Project Logo
+    +-->
+<!--+
+    |start Search
+    +-->
+<div class="searchbox">
+<form action="http://www.google.com/search" method="get" class="roundtopsmall">
+<input value="hadoop.apache.org" name="sitesearch" type="hidden"><input onFocus="getBlank (this, 'Search the site with google');" size="25" name="q" id="query" type="text" value="Search the site with google">&nbsp; 
+                    <input name="Search" value="Search" type="submit">
+</form>
+</div>
+<!--+
+    |end search
+    +-->
+<!--+
+    |start Tabs
+    +-->
+<ul id="tabs">
+<li>
+<a class="unselected" href="http://hadoop.apache.org/zookeeper/">Project</a>
+</li>
+<li>
+<a class="unselected" href="http://wiki.apache.org/hadoop/ZooKeeper">Wiki</a>
+</li>
+<li class="current">
+<a class="selected" href="index.html">ZooKeeper Documentation</a>
+</li>
+</ul>
+<!--+
+    |end Tabs
+    +-->
+</div>
+</div>
+<div id="main">
+<div id="publishedStrip">
+<!--+
+    |start Subtabs
+    +-->
+<div id="level2tabs"></div>
+<!--+
+    |end Endtabs
+    +-->
+<script type="text/javascript"><!--
+document.write("Last Published: " + document.lastModified);
+//  --></script>
+</div>
+<!--+
+    |breadtrail
+    +-->
+<div class="breadtrail">
+
+             &nbsp;
+           </div>
+<!--+
+    |start Menu, mainarea
+    +-->
+<!--+
+    |start Menu
+    +-->
+<div id="menu">
+<div onclick="SwitchMenu('menu_selected_1.1', 'skin/')" id="menu_selected_1.1Title" class="menutitle" style="background-image: url('skin/images/chapter_open.gif');">Documentation</div>
+<div id="menu_selected_1.1" class="selectedmenuitemgroup" style="display: block;">
+<div class="menuitem">
+<a href="index.html">Welcome</a>
+</div>
+<div class="menuitem">
+<a href="zookeeperOver.html">Zookeeper Overview</a>
+</div>
+<div class="menuitem">
+<a href="zookeeperStarted.html">Getting Started</a>
+</div>
+<div class="menuitem">
+<a href="zookeeperProgrammers.html">Programmer's Guide</a>
+</div>
+<div class="menuitem">
+<a href="recipes.html">Recipes</a>
+</div>
+<div class="menupage">
+<div class="menupagetitle">Administrator's Guide</div>
+</div>
+<div class="menuitem">
+<a href="api/index.html">API Docs</a>
+</div>
+<div class="menuitem">
+<a href="http://wiki.apache.org/hadoop/ZooKeeper">Wiki</a>
+</div>
+<div class="menuitem">
+<a href="http://wiki.apache.org/hadoop/ZooKeeper/FAQ">FAQ</a>
+</div>
+<div class="menuitem">
+<a href="http://hadoop.apache.org/zookeeper/mailing_lists.html">Mailing Lists</a>
+</div>
+<div class="menuitem">
+<a href="zookeeperOtherInfo.html">Other Info</a>
+</div>
+</div>
+<div id="credit"></div>
+<div id="roundbottom">
+<img style="display: none" class="corner" height="15" width="15" alt="" src="skin/images/rc-b-l-15-1body-2menu-3menu.png"></div>
+<!--+
+  |alternative credits
+  +-->
+<div id="credit2"></div>
+</div>
+<!--+
+    |end Menu
+    +-->
+<!--+
+    |start content
+    +-->
+<div id="content">
+<div title="Portable Document Format" class="pdflink">
+<a class="dida" href="zookeeperAdmin.pdf"><img alt="PDF -icon" src="skin/images/pdfdoc.gif" class="skin"><br>
+        PDF</a>
+</div>
+<div id="minitoc-area">
+<ul class="minitoc">
+<li>
+<a href="#Deployment">Deployment</a>
+<ul class="minitoc">
+<li>
+<a href="#sc_systemReq">System Requirements</a>
+</li>
+<li>
+<a href="#sc_zkMulitServerSetup">Clustered (Multi-Server) Setup</a>
+</li>
+<li>
+<a href="#sc_singleAndDevSetup">Single Server and Developer Setup</a>
+</li>
+</ul>
+</li>
+<li>
+<a href="#Administration">Administration</a>
+<ul class="minitoc">
+<li>
+<a href="#sc_configuration">Configuration Parameters</a>
+<ul class="minitoc">
+<li>
+<a href="#sc_minimumConfiguration">Minimum Configuration</a>
+</li>
+<li>
+<a href="#sc_advancedConfiguration">Advanced Configuration</a>
+</li>
+<li>
+<a href="#sc_clusterOptions">Cluster Options</a>
+</li>
+<li>
+<a href="#Unsafe+Options">Unsafe Options</a>
+</li>
+</ul>
+</li>
+<li>
+<a href="#sc_zkCommands">Zookeeper Commands: The Four Letter Words</a>
+</li>
+<li>
+<a href="#sc_monitoring">Monitoring</a>
+</li>
+<li>
+<a href="#sc_dataFileManagement">Data File Management</a>
+<ul class="minitoc">
+<li>
+<a href="#The+Data+Directory">The Data Directory</a>
+</li>
+<li>
+<a href="#The+Log+Directory">The Log Directory</a>
+</li>
+<li>
+<a href="#File+Management">File Management</a>
+</li>
+</ul>
+</li>
+<li>
+<a href="#sc_commonProblems">Things to Avoid</a>
+</li>
+<li>
+<a href="#sc_bestPractices">Best Practices</a>
+</li>
+</ul>
+</li>
+</ul>
+</div>
+  
+<title>ZooKeeper Administrator's Guide</title>
+
+  
+<subtitle>A Guide to Deployment and Administration</subtitle>
+
+  
+
+  
+<a name="N1000D"></a><a name="Deployment"></a>
+<h2 class="h3">Deployment</h2>
+<div class="section">
+<p>This chapter contains information about deploying Zookeeper and
+    covers these topics:</p>
+<ul>
+      
+<li>
+        
+<p>
+<a href="#sc_systemReq">System Requirements</a>
+</p>
+      
+</li>
+
+      
+<li>
+        
+<p>
+<a href="#sc_zkMulitServerSetup">Clustered (Multi-Server) Setup</a>
+</p>
+      
+</li>
+
+      
+<li>
+        
+<p>
+<a href="#sc_singleAndDevSetup">Single Server and Developer Setup</a>
+</p>
+      
+</li>
+    
+</ul>
+<p>The first two sections assume you are interested in installing
+    Zookeeper in a production environment such as a datacenter. The final
+    section covers situations in which you are setting up Zookeeper on a
+    limited basis - for evaluation, testing, or development - but not in a
+    production environment.</p>
+<a name="N10034"></a><a name="sc_systemReq"></a>
+<h3 class="h4">System Requirements</h3>
+<p>Zookeeper runs in Java, release 1.6 or greater, as a group of hosts
+      called a quorum. Three Zookeeper hosts per quorum is the minimum
+      recommended quorum size. At Yahoo!, Zookeeper is usually deployed on
+      dedicated RHEL boxes, with dual-core processors, 2GB of RAM, and 80GB
+      IDE hard drives.</p>
+<a name="N1003E"></a><a name="sc_zkMulitServerSetup"></a>
+<h3 class="h4">Clustered (Multi-Server) Setup</h3>
+<p>For reliable ZooKeeper service, you should deploy ZooKeeper in a
+      cluster known as a <em>quorum</em>. As long as a majority
+      of the quorum are up, the service will be available. Because Zookeeper
+      requires a majority <remark>[tbd: why?]</remark>, it is best to use an
+      odd number of machines. For example, with four machines ZooKeeper can
+      only handle the failure of a single machine; if two machines fail, the
+      remaining two machines do not constitute a majority. However, with five
+      machines ZooKeeper can handle the failure of two machines. </p>
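+<p>The arithmetic behind the odd-number recommendation can be checked with a
+      short Java sketch. <em>QuorumMath</em> is a hypothetical name used only for
+      this illustration, and "majority" is taken to mean strictly more than half
+      of the configured machines.</p>

```java
// Hypothetical helper illustrating the majority rule described above.
class QuorumMath {
    // Smallest number of machines that is strictly more than half.
    static int majority(int machines) {
        return machines / 2 + 1;
    }

    // How many machines can fail while a majority is still up.
    static int toleratedFailures(int machines) {
        return machines - majority(machines);
    }
}
```

+<p>With these definitions, three and four machines both tolerate one failure,
+      while five machines tolerate two, so adding a fourth machine buys no extra
+      fault tolerance.</p>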
+<p>Here are the steps to setting up a server that will be part of a
+      quorum. These steps should be performed on every host in the
+      quorum:</p>
+<ol>
+        
+<li>
+          
+<p>Install the Java JDK:</p>
+
+          
+<pre class="code">$yinst -i jdk-1.6.0.00_3 -br test  <remark>[y! prop - replace with open equiv]</remark>
+</pre>
+        
+</li>
+
+        
+<li>
+          
+<p>Set the Java heap size. This is very important, to avoid
+          swapping, which will seriously degrade Zookeeper performance. To
+          determine the correct value, use load tests, and make sure you are well
+          below the usage limit that would cause you to swap. Be conservative
+          - use a maximum heap size of 3GB for a 4GB machine. <remark>[tbd:
+          where would they do this? Environment variable,
+          etc?]</remark>
+</p>
+        
+</li>
+
+        
+<li>
+          
+<p>Install the Zookeeper Server Package:</p>
+
+          
+<pre class="code">$ yinst install -nostart zookeeper_server <remark>[Y! prop - replace with open eq]</remark>
+</pre>
+        
+</li>
+
+        
+<li>
+          
+<p>Create a configuration file. This file can be called anything.
+          Use the following settings as a starting point:</p>
+
+          
+<pre class="code">
+tickTime=2000
+dataDir=/var/zookeeper/
+clientPort=2181
+initLimit=5
+syncLimit=2
+server.1=zoo1:2888
+server.2=zoo2:2888
+server.3=zoo3:2888</pre>
+
+          
+<p>You can find the meanings of these and other configuration
+          settings in the section <a href="#sc_configuration">Configuration Parameters</a>. A word
+          though about a few here:</p>
+
+          
+<p>Every machine that is part of the ZooKeeper quorum should know
+          about every other machine in the quorum. You accomplish this with
+          the series of lines of the form <strong>server.id=host:port</strong>. The parameters <strong>host</strong> and <strong>port</strong> are straightforward. You attribute the
+          server id to each machine by creating a file named
+          <span class="codefrag filename">myid</span>, one for each server, which resides in
+          that server's data directory, as specified by the configuration file
+          parameter <strong>dataDir</strong>. The myid file
+          consists of a single line containing only the text of that machine's
+          id. So <span class="codefrag filename">myid</span> of server 1 would contain the text
+          "1" and nothing else. The id must be unique within the
+          quorum.</p>
+        
+</li>
+
+        
+<li>
+          
+<p>If your configuration file is set up, you can start
+          Zookeeper:</p>
+
+          
+<pre class="code">$ java -cp zookeeper-dev.jar:java/lib/log4j-1.2.15.jar:conf \
+        org.apache.zookeeper.server.quorum.QuorumPeerMain zoo.cfg</pre>
+        
+</li>
+
+        
+<li>
+          
+<p>Test your deployment by connecting to the hosts:</p>
+
+          
+<ul>
+            
+<li>
+              
+<p>In Java, you can run the following command to execute
+              simple operations:<remark> [tbd: also, maybe give some of those
+              simple operations?]</remark>
+</p>
+
+              
+<pre class="code">$ java -cp zookeeper.jar:java/lib/log4j-1.2.15.jar:conf \
+      org.apache.zookeeper.ZooKeeperMain 127.0.0.1:2181</pre>
+            
+</li>
+
+            
+<li>
+              
+<p>In C, you can compile either the single threaded client or
+              the multithreaded client in the c subdirectory of the
+              Zookeeper sources. This compiles the single threaded
+              client:</p>
+
+              
+<pre class="code">$ make cli_st</pre>
+
+              
+<p>And this compiles the multithreaded client:</p>
+
+              
+<pre class="code">$ make cli_mt</pre>
+            
+</li>
+          
+</ul>
+
+          
+<p>Running either program gives you a shell in which to execute
+          simple file-system-like operations. <remark>[tbd: again, sample
+          operations?]</remark> To connect to Zookeeper with the multithreaded
+          client, for example, you would run:</p>
+
+          
+<pre class="code">$ cli_mt 127.0.0.1:2181</pre>
+        
+</li>
+      
+</ol>
+<a name="N100CE"></a><a name="sc_singleAndDevSetup"></a>
+<h3 class="h4">Single Server and Developer Setup</h3>
+<p>If you want to set up Zookeeper for development purposes, you will
+      probably want to set up a single server instance of Zookeeper, and then
+      install either the Java or C client-side libraries and bindings on your
+      development machine.</p>
+<p>The steps to setting up a single server instance are similar
+      to the above, except the configuration file is simpler. You can find the
+      complete instructions in the <a href="zookeeperStarted.html#sc_InstallingSingleMode">Installing
+      and Running Zookeeper in Single Server Mode</a> section of the
+      <a href="zookeeperStarted.html">Zookeeper
+      Getting Started Guide</a>.</p>
+<p>For information on installing the client side libraries, refer to
+      the <a href="zookeeperProgrammers.html#Bindings">Bindings</a>
+      section of the <a href="zookeeperProgrammers.html">Zookeeper
+      Programmer's Guide</a>.</p>
+</div>
+
+  
+<a name="N100EF"></a><a name="Administration"></a>
+<h2 class="h3">Administration</h2>
+<div class="section">
+<p>This chapter contains information about running and maintaining
+    ZooKeeper and covers these topics: <ul>
+        
+<li>
+          
+<p>
+<a href="#sc_configuration">Configuration Parameters</a>
+</p>
+        
+</li>
+
+        
+<li>
+          
+<p>
+<a href="#sc_zkCommands">Zookeeper Commands: The Four Letter Words</a>
+</p>
+        
+</li>
+
+        
+<li>
+          
+<p>
+<a href="#sc_dataFileManagement">Data File Management</a>
+</p>
+        
+</li>
+
+        
+<li>
+          
+<p>
+<a href="#sc_commonProblems">Things to Avoid</a>
+</p>
+        
+</li>
+
+        
+<li>
+          
+<p>
+<a href="#sc_bestPractices">Best Practices</a>
+</p>
+        
+</li>
+      
+</ul>
+</p>
+<a name="N10122"></a><a name="sc_configuration"></a>
+<h3 class="h4">Configuration Parameters</h3>
+<p>ZooKeeper's behavior is governed by the ZooKeeper configuration
+        file. This file is designed so that the exact same file can be used by
+        all the servers that make up a ZooKeeper service, assuming the disk
+        layouts are the same. If servers use different configuration files,
+        care must be taken to ensure that the list of servers in all of the
+        different configuration files matches.<remark> [tbd: reformat in
+        standard form, with legal values, etc]</remark>
+</p>
+<a name="N1012D"></a><a name="sc_minimumConfiguration"></a>
+<h4>Minimum Configuration</h4>
+<p>Here are the minimum configuration keywords that must be
+          defined in the configuration file:</p>
+<dl>
+
+	    
+<dt>
+<term>clientPort</term>
+</dt>
+<dd>
+<p>the port to listen for client connections; that is, the
+                port that clients attempt to connect to.</p>
+</dd>
+
+            
+<dt>
+<term>dataDir</term>
+</dt>
+<dd>
+<p>the location where Zookeeper will store the in-memory
+                database snapshots and, unless specified otherwise, the
+                transaction log of updates to the database.</p>
+<div class="note">
+<div class="label">Note</div>
+<div class="content">
+                  
+<p>Be careful where you put the transaction log. A
+                  dedicated transaction log device is key to consistent good
+                  performance. Putting the log on a busy device will adversely
+                  affect performance.</p>
+                
+</div>
+</div>
+</dd>
+	    
+	    
+<dt>
+<term>tickTime</term>
+</dt>
+<dd>
+<p>the length of a single tick, which is the basic time
+                unit used by ZooKeeper, as measured in milliseconds. It is
+                used to regulate heartbeats and timeouts. For example, the
+                minimum session timeout will be two ticks.</p>
+</dd>
+	    
+          
+</dl>
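+<p>As a minimal sketch, a configuration file containing only these
+          three keywords might look like the following. The values shown
+          (a 2000 millisecond tick, <span class="codefrag filename">/var/zookeeper</span>
+          as the data directory, and client port 2181) are illustrative
+          assumptions, not required settings:</p>
+<pre class="code"># illustrative values only; adjust for your installation
+tickTime=2000
+dataDir=/var/zookeeper
+clientPort=2181
+</pre>
+<p>With these values one tick lasts 2000 milliseconds, so the minimum
+          session timeout of two ticks would be 4000 milliseconds.</p>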
+<a name="N10154"></a><a name="sc_advancedConfiguration"></a>
+<h4>Advanced Configuration</h4>
+<p>The configuration settings in this section are optional. You
+          can use them to further fine tune the behaviour of your Zookeeper
+          servers. Some can also be set using Java system properties,
+          generally of the form <em>zookeeper.keyword</em>. The
+          exact system property, when available, is noted below.</p>
+<dl>
+	  
+            
+<dt>
+<term>dataLogDir</term>
+</dt>
+<dd>
+<p>(No Java system property)</p>
+<p>This option will direct the machine to write the
+                transaction log to the <strong>dataLogDir</strong> rather than the <strong>dataDir</strong>. This allows a dedicated log
+                device to be used, and helps avoid competition between logging
+                and snapshots.</p>
+<div class="note">
+<div class="label">Note</div>
+<div class="content">
+                  
+<p>Having a dedicated log device has a large impact on
+                  throughput and stable latencies. It is highly recommended to
+                  dedicate a log device and set <strong>dataLogDir</strong> to point to a directory on
+                  that device, and then make sure to point <strong>dataDir</strong> to a directory
+                  <em>not</em> residing on that device.</p>
+                
+</div>
+</div>
+</dd>
+	    
+	     
+<dt>
+<term>globalOutstandingLimit</term>
+</dt>
+<dd>
+<p>(Java system property: <strong>zookeeper.globalOutstandingLimit</strong>)</p>
+<p>Clients can submit requests faster than ZooKeeper can
+                process them, especially if there are a lot of clients. To
+                prevent ZooKeeper from running out of memory due to queued
+                requests, ZooKeeper will throttle clients so that there are no
+                more than globalOutstandingLimit outstanding requests in the
+                system. The default limit is 1,000.</p>
+</dd>
+	    
+            
+<dt>
+<term>preAllocSize</term>
+</dt>
+<dd>
+<p>(Java system property: <strong>zookeeper.preAllocSize</strong>)</p>
+<p>To avoid seeks, ZooKeeper allocates space in the
+                transaction log file in blocks of preAllocSize kilobytes. The
+                default block size is 64M. One reason for changing the size of
+                the blocks is to reduce the block size if snapshots are taken
+                more often. (Also, see <strong>snapCount</strong>).</p>
+</dd>
+
+            
+<dt>
+<term>snapCount</term>
+</dt>
+<dd>
+<p>(Java system property: <strong>zookeeper.snapCount</strong>)</p>
+<p>ZooKeeper logs transactions
+                to a transaction log. After snapCount transactions are written
+                to a log file, a snapshot is started and a new transaction log
+                file is started. The default snapCount is 10,000.</p>
+</dd>
+
+            
+<dt>
+<term>traceFile</term>
+</dt>
+<dd>
+<p>(Java system property: <strong>requestTraceFile</strong>)</p>
+<p>If this option is defined, requests will be logged
+                to a trace file named traceFile.year.month.day. Use of this
+                option provides useful debugging information, but will impact
+                performance. (Note: The system property has no zookeeper
+                prefix, and the configuration variable name is different from
+                the system property. Yes - it's not consistent, and it's
+                annoying.<remark> [tbd: is there any explanation for
+                this?]</remark>)</p>
+</dd>
+
+          
+</dl>
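+<p>The following fragment is a sketch of how some of these optional
+          settings might appear in the configuration file, assuming the
+          keywords are accepted there as described above. The log directory
+          path is hypothetical, and the numbers are chosen only to illustrate
+          the units: preAllocSize is given in kilobytes, so 16384 asks for
+          16M blocks instead of the 64M default, and snapCount is a number of
+          transactions:</p>
+<pre class="code"># hypothetical directory on a dedicated log device
+dataLogDir=/disk2/zookeeper/txlog
+# same as the documented default of 1,000
+globalOutstandingLimit=1000
+# 16384 kilobytes = 16M blocks (the default is 64M)
+preAllocSize=16384
+# snapshot after every 5,000 transactions (the default is 10,000)
+snapCount=5000
+</pre>
+<p>Lowering snapCount and preAllocSize together follows the reasoning
+          given under preAllocSize: if snapshots are taken more often, each
+          log file holds fewer transactions and needs less preallocated
+          space.</p>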
+<a name="N101B7"></a><a name="sc_clusterOptions"></a>
+<h4>Cluster Options</h4>
+<p>The options in this section are designed for use in quorums --
+          that is, when deploying clusters of servers.</p>
+<dl>
+            
+<dt>
+<term>electionAlg:</term>
+</dt>
+<dd>
+<p>(No Java system property)</p>
+<p>Election implementation to use. A value of "0"
+                corresponds to the original UDP-based version, "1" corresponds
+                to the non-authenticated UDP-based version of fast leader
+                election, "2" corresponds to the authenticated UDP-based
+                version of fast leader election, and "3" corresponds to the
+                TCP-based version of fast leader election.</p>
+</dd>
+
+            
+<dt>
+<term>electionPort</term>
+</dt>
+<dd>
+<p>(No Java system property)</p>
+<p>Port used for leader election. It is only used when the
+                election algorithm is not "0". When the election algorithm is
+                "0" a UDP port with the same port number as the port listed in
+                the <strong>server.num</strong> option will be
+                used. <remark>[tbd: should that be <strong>server.id</strong>? Also, why isn't server.id
+                documented anywhere?]</remark>
+</p>
+</dd>
+
+            
+<dt>
+<term>initLimit</term>
+</dt>
+<dd>
+<p>(No Java system property)</p>
+<p>Amount of time, in ticks (see <a href="#id_tickTime">tickTime</a>), to allow followers to
+                connect and sync to a leader. Increase this value as needed,
+                if the amount of data managed by ZooKeeper is large.</p>
+</dd>
+
+            
+<dt>
+<term>leaderServes</term>
+</dt>
+<dd>
+<p>(Java system property: zookeeper.<strong>leaderServes</strong>)</p>
+<p>Leader accepts client connections. The leader machine
+                coordinates updates. For higher
+                update throughput at the slight expense of read throughput,
+                the leader can be configured to not accept clients and focus
+                on coordination. The default for this option is yes, which
+                means that a leader will accept client connections.
+                <remark>[tbd: how do you specify which server is the
+                leader?]</remark>
+</p>
+<div class="note">
+<div class="label">Note</div>
+<div class="content">
+                  
+<p>Turning on leader selection is highly recommended when
+                  you have more than three Zookeeper servers in a
+                  quorum.</p>
+                
+</div>
+</div>
+</dd>
+
+            
+<dt>
+<term>server.x=[hostname]:nnnn, etc</term>
+</dt>
+<dd>
+<p>(No Java system property)</p>
+<p>servers making up the Zookeeper quorum. When the server
+                starts up, it determines which server it is by looking for the
+                file <span class="codefrag filename">myid</span> in the data directory.<remark>
+                [tbd: should we mention somewhere about creating this file,
+                myid, in the setup procedure?]</remark> That file contains the
+                server number, in ASCII, and it should match <strong>x</strong> in <strong>server.x</strong> on the left hand side of this
+                setting.</p>
+<p>The list of ZooKeeper servers used by the clients must
+                match the list of ZooKeeper servers that each ZooKeeper
+                server has.</p>
+<p>The port numbers <strong>nnnn</strong>
+                in this setting are the <em>electionPort</em>
+                numbers of the servers (as opposed to clientPorts).
+                <remark>[tbd: is the next sentence an explanation of what the
+                election port is, or is it a description of a special case?]
+                </remark>If you want to test multiple servers on a single
+                machine, the individual choices of electionPort for each
+                server can be defined in each server's config files using the
+                line electionPort=xxxx to avoid clashes.</p>
+</dd>
+
+            
+<dt>
+<term>syncLimit</term>
+</dt>
+<dd>
+<p>(No Java system property)</p>
+<p>Amount of time, in ticks (see <a href="#id_tickTime">tickTime</a>), to allow followers to
+                sync with ZooKeeper. If followers fall too far behind a
+                leader, they will be dropped. <remark>[tbd: is this a correct
+                rewording: if followers fall beyond this limit, they are
+                dropped?]</remark>
+</p>
+</dd>
+          
+</dl>
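+<p>To make the cluster options concrete, here is a sketch of a
+          configuration file that three servers could share. The host names
+          zoo1, zoo2 and zoo3, the port 2888, the data directory, and the
+          limit values are all assumptions made for illustration:</p>
+<pre class="code"># illustrative three-server configuration; names and ports are hypothetical
+tickTime=2000
+dataDir=/var/zookeeper
+clientPort=2181
+initLimit=5
+syncLimit=2
+server.1=zoo1:2888
+server.2=zoo2:2888
+server.3=zoo3:2888
+</pre>
+<p>Because initLimit and syncLimit are measured in ticks, these values
+          would give followers 5 x 2000 = 10000 milliseconds to connect and
+          sync to a leader, and 2 x 2000 = 4000 milliseconds to stay in sync.
+          Each server's <span class="codefrag filename">myid</span> file would
+          then contain 1, 2 or 3 to match its <strong>server.x</strong>
+          line.</p>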
+<p></p>
+<a name="N10232"></a><a name="Unsafe+Options"></a>
+<h4>Unsafe Options</h4>
+<p>The following options can be useful, but be careful when you
+          use them. The risk of each is explained along with the explanation
+          of what the variable does.</p>
+<dl>
+	  
+	  
+<dt>
+<term>forceSync</term>
+</dt>
+<dd>
+<p>(Java system property: <strong>zookeeper.forceSync</strong>)</p>
+<p>Requires updates to be synced to media of the
+                transaction log before finishing processing the update. If
+                this option is set to no, ZooKeeper will not require updates
+                to be synced to the media. <remark>[tbd: useful because...,
+                dangerous because...]</remark>
+</p>
+</dd>
+
+            
+<dt>
+<term>jute.maxbuffer:</term>
+</dt>
+<dd>
+<p>(Java system property: <strong>jute.maxbuffer</strong>)</p>
+<p>This option can only be set as a Java system property.
+                There is no zookeeper prefix on it. It specifies the maximum
+                size of the data that can be stored in a znode. The default is
+                0xfffff, or just under 1M. If this option is changed, the
+                system property must be set on all servers and clients;
+                otherwise problems will arise. This is really a sanity check.
+                ZooKeeper is designed to store data on the order of kilobytes
+                in size.</p>
+</dd>
+	    
+            
+<dt>
+<term>skipACL</term>
+</dt>
+<dd>
+<p>(Java system property: <strong>zookeeper.skipACL</strong>)</p>
+<p>Skips ACL checks. <remark>[tbd: when? where?]</remark>
+                This results in a boost in throughput, but opens up full
+                access to the data tree to everyone.</p>
+</dd>
+
+            
+          
+</dl>
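+<p>Java system properties such as these are normally passed to the JVM
+          with the standard <strong>-D</strong> option. As a small worked
+          example, the default jute.maxbuffer value 0xfffff is 16^5 - 1 =
+          1048575 in decimal, so stating the default explicitly would look
+          like the fragment below. The rest of the java command line depends
+          on how you start your servers and clients and is not shown
+          here:</p>
+<pre class="code">-Djute.maxbuffer=1048575
+</pre>
+<p>Whatever value you choose, the same option would have to appear on
+          the command line of every server and every client.</p>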
+<a name="N10269"></a><a name="sc_zkCommands"></a>
+<h3 class="h4">Zookeeper Commands: The Four Letter Words</h3>
+<p>Zookeeper responds to a small set of commands. Each command is composed of
+        four letters. You issue the commands to Zookeeper via telnet or nc, at
+        the client port.</p>
+<dl>
+	
+	    
+<dt>
+<term>dump</term>
+</dt>
+<dd>
+<p>Lists the outstanding sessions and ephemeral nodes. This
+              only works on the leader.</p>
+</dd>
+	  
+	    
+<dt>
+<term>kill</term>
+</dt>
+<dd>
+<p>Shuts down the server. This must be issued from the
+              machine the Zookeeper server is running on.</p>
+</dd>
+	  
+          
+<dt>
+<term>ruok</term>
+</dt>
+<dd>
+<p>Tests if server is running in a non-error state. The
+              server will respond with imok if it is running. Otherwise it
+              will not respond at all.</p>
+</dd>
+
+          
+<dt>
+<term>stat</term>
+</dt>
+<dd>
+<p>Lists statistics about performance and connected
+              clients.</p>
+</dd>
+        
+</dl>
+<p>Here's an example of the <strong>ruok</strong>
+        command:</p>
+<pre class="code">$ echo ruok | nc 127.0.0.1 5111
+
+imok
+</pre>
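+<p>The other commands can be issued the same way. For instance,
+        assuming the same server and client port as in the example above,
+        the <strong>stat</strong> command could be sent as follows. The
+        output depends on the state of the server, so none is shown:</p>
+<pre class="code">$ echo stat | nc 127.0.0.1 5111
+</pre>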
+<a name="N1029B"></a><a name="sc_monitoring"></a>
+<h3 class="h4">Monitoring</h3>
+<p>
+<remark>[tbd: Patrick, Ben, et al: I believe the Message Broker
+        team does perform routine monitoring of Zookeeper. But I might be
+        wrong. To your knowledge, is there any monitoring of a Zookeeper
+        deployment that a Zookeeper sys admin will want to do, outside of
+        Yahoo?]</remark>
+</p>
+<a name="N102A6"></a><a name="sc_dataFileManagement"></a>
+<h3 class="h4">Data File Management</h3>
+<p>ZooKeeper stores its data in a data directory and its transaction
+      log in a transaction log directory. By default these two directories are
+      the same. The server can (and should) be configured to store the
+      transaction log files in a separate directory from the data files.
+      Throughput increases and latency decreases when transaction logs reside
+      on a dedicated log device.</p>
+<a name="N102AF"></a><a name="The+Data+Directory"></a>
+<h4>The Data Directory</h4>
+<p>This directory has two files in it:</p>
+<ul>
+          
+<li>
+            
+<p>
+<span class="codefrag filename">myid</span> - contains a single integer in
+            human readable ASCII text that represents the server id.</p>
+          
+</li>
+
+          
+<li>
+            
+<p>
+<span class="codefrag filename">snapshot.&lt;zxid&gt;</span> - holds the fuzzy
+            snapshot of a data tree.</p>
+          
+</li>
+        
+</ul>
+<p>Each ZooKeeper server has a unique id. This id is used in two
+        places: the <span class="codefrag filename">myid</span> file and the configuration file.
+        The <span class="codefrag filename">myid</span> file identifies the server that
+        corresponds to the given data directory. The configuration file lists
+        the contact information for each server identified by its server id.
+        When a ZooKeeper server instance starts, it reads its id from the
+        <span class="codefrag filename">myid</span> file and then, using that id, reads from the
+        configuration file, looking up the port on which it should
+        listen.</p>
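+<p>As a sketch of how the <span class="codefrag filename">myid</span>
+        file might be created, assume a data directory of
+        <span class="codefrag filename">/var/zookeeper</span> and a server
+        whose id is 2 (both values are hypothetical). The file holds nothing
+        but that number in ASCII text:</p>
+<pre class="code">$ echo 2 &gt; /var/zookeeper/myid
+$ cat /var/zookeeper/myid
+2
+</pre>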
+<p>The <span class="codefrag filename">snapshot</span> files stored in the data
+        directory are fuzzy snapshots in the sense that during the time the
+        ZooKeeper server is taking the snapshot, updates are occurring to the
+        data tree. The suffix of the <span class="codefrag filename">snapshot</span> file names
+        is the <em>zxid</em>, the ZooKeeper transaction id, of the
+        last committed transaction at the start of the snapshot. Thus, the
+        snapshot includes a subset of the updates to the data tree that
+        occurred while the snapshot was in progress. The snapshot, then, may
+        not correspond to any data tree that actually existed, and for this
+        reason we refer to it as a fuzzy snapshot. Still, ZooKeeper can
+        recover using this snapshot because it takes advantage of the
+        idempotent nature of its updates. By replaying the transaction log
+        against fuzzy snapshots, ZooKeeper gets the state of the system at the
+        end of the log.</p>
+<a name="N102EB"></a><a name="The+Log+Directory"></a>
+<h4>The Log Directory</h4>
+<p>The Log Directory contains the ZooKeeper transaction logs.
+        Before any update takes place, ZooKeeper ensures that the transaction
+        that represents the update is written to non-volatile storage. A new
+        log file is started each time a snapshot is begun. The log file's
+        suffix is the first zxid written to that log.</p>
+<a name="N102F5"></a><a name="File+Management"></a>
+<h4>File Management</h4>
+<p>The format of snapshot and log files does not change between
+        standalone ZooKeeper servers and different configurations of
+        replicated ZooKeeper servers. Therefore, you can pull these files from
+        a running replicated ZooKeeper server to a development machine with a
+        standalone ZooKeeper server for troubleshooting.</p>
+<p>Using older log and snapshot files, you can look at the previous
+        state of ZooKeeper servers and even restore that state. The
+        LogFormatter class allows an administrator to look at the transactions
+        in a log.</p>
+<p>The ZooKeeper server creates snapshot and log files, but never
+        deletes them. The retention policy of the data and log files is
+        implemented outside of the ZooKeeper server. The server itself only
+        needs the latest complete fuzzy snapshot and the log files from the
+        start of that snapshot. The PurgeTxnLog utility implements a simple
+        retention policy that administrators can use.</p>
+<a name="N10306"></a><a name="sc_commonProblems"></a>
+<h3 class="h4">Things to Avoid</h3>
+<p>Here are some common problems you can avoid by configuring
+      ZooKeeper correctly:</p>
+<dl>
+        
+<dt>
+<term>inconsistent lists of servers</term>
+</dt>
+<dd>
+<p>The list of Zookeeper servers used by the clients must match
+            the list of ZooKeeper servers that each ZooKeeper server has.
+            Things work okay if the client list is a subset of the real list,
+            but clients will behave strangely if their list contains
+            ZooKeeper servers that are in different ZooKeeper clusters. Also,
+            the server lists in each Zookeeper server configuration file
+            should be consistent with one another. <remark>[tbd: I'm assuming
+            this last part is true. Is it?]</remark>
+</p>
+</dd>
+
+        
+<dt>
+<term>incorrect placement of transaction log</term>
+</dt>
+<dd>
+<p>The most performance critical part of ZooKeeper is the
+            transaction log. Zookeeper syncs transactions to media before it
+            returns a response. A dedicated transaction log device is key to
+            consistent good performance. Putting the log on a busy device will
+            adversely affect performance. If you only have one storage device,
+            put trace files on NFS and increase the snapCount; it doesn't
+            eliminate the problem, but it should mitigate it.</p>
+</dd>
+
+        
+<dt>
+<term>incorrect Java heap size</term>
+</dt>
+<dd>
+<p>You should take special care to set your Java max heap size
+            correctly. In particular, you should not create a situation in
+            which Zookeeper swaps to disk. The disk is death to ZooKeeper.
+            Everything is ordered, so if processing one request swaps to
+            disk, all other queued requests will probably do the same.
+            DON'T SWAP.</p>
+<p>Be conservative in your estimates: if you have 4G of RAM, do
+            not set the Java max heap size to 6G or even 4G. For example, it
+            is more likely you would use a 3G heap for a 4G machine, as the
+            operating system and the cache also need memory. The best and only
+            recommended practice for estimating the heap size your system needs
+            is to run load tests, and then make sure you are well below the
+            usage limit that would cause the system to swap.</p>
+</dd>
+      
+</dl>
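+<p>Following the 4G example above, a 3G maximum heap corresponds to
+      the standard JVM option shown below. How the option reaches the JVM
+      depends on how you launch the server, so only the option itself is
+      given, and the figure is a starting point to check with load tests
+      rather than a recommendation:</p>
+<pre class="code">-Xmx3g
+</pre>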
+<a name="N1032C"></a><a name="sc_bestPractices"></a>
+<h3 class="h4">Best Practices</h3>
+<p>For best results, take note of the following list of good
+      Zookeeper practices. <remark>[tbd: I just threw this section in. Do we
+      have a list that is different from the "things to avoid"? If not, I can
+      easily remove this section.]</remark>
+</p>
+</div>
+
+<p align="right">
+<font size="-2"></font>
+</p>
+</div>
+<!--+
+    |end content
+    +-->
+<div class="clearboth">&nbsp;</div>
+</div>
+<div id="footer">
+<!--+
+    |start bottomstrip
+    +-->
+<div class="lastmodified">
+<script type="text/javascript"><!--
+document.write("Last Published: " + document.lastModified);
+//  --></script>
+</div>
+<div class="copyright">
+        Copyright &copy;
+         2008 <a href="http://www.apache.org/licenses/">The Apache Software Foundation.</a>
+</div>
+<!--+
+    |end bottomstrip
+    +-->
+</div>
+</body>
+</html>

File diff suppressed because it is too large
+ 151 - 0
docs/zookeeperAdmin.pdf


+ 206 - 0
docs/zookeeperOtherInfo.html

@@ -0,0 +1,206 @@
+<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
+<html>
+<head>
+<META http-equiv="Content-Type" content="text/html; charset=UTF-8">
+<meta content="Apache Forrest" name="Generator">
+<meta name="Forrest-version" content="0.8">
+<meta name="Forrest-skin-name" content="pelt">
+<title></title>
+<link type="text/css" href="skin/basic.css" rel="stylesheet">
+<link media="screen" type="text/css" href="skin/screen.css" rel="stylesheet">
+<link media="print" type="text/css" href="skin/print.css" rel="stylesheet">
+<link type="text/css" href="skin/profile.css" rel="stylesheet">
+<script src="skin/getBlank.js" language="javascript" type="text/javascript"></script><script src="skin/getMenu.js" language="javascript" type="text/javascript"></script><script src="skin/fontsize.js" language="javascript" type="text/javascript"></script>
+<link rel="shortcut icon" href="images/favicon.ico">
+</head>
+<body onload="init()">
+<script type="text/javascript">ndeSetTextSize();</script>
+<div id="top">
+<!--+
+    |breadtrail
+    +-->
+<div class="breadtrail">
+<a href="http://www.apache.org/">Apache</a> &gt; <a href="http://hadoop.apache.org/">Hadoop</a> &gt; <a href="http://hadoop.apache.org/zookeeper/">ZooKeeper</a><script src="skin/breadcrumbs.js" language="JavaScript" type="text/javascript"></script>
+</div>
+<!--+
+    |header
+    +-->
+<div class="header">
+<!--+
+    |start group logo
+    +-->
+<div class="grouplogo">
+<a href="http://hadoop.apache.org/"><img class="logoImage" alt="Hadoop" src="images/hadoop-logo.jpg" title="Apache Hadoop"></a>
+</div>
+<!--+
+    |end group logo
+    +-->
+<!--+
+    |start Project Logo
+    +-->
+<div class="projectlogo">
+<a href="http://hadoop.apache.org/zookeeper/"><img class="logoImage" alt="ZooKeeper" src="images/zookeeper_small.gif" title="The Hadoop database"></a>
+</div>
+<!--+
+    |end Project Logo
+    +-->
+<!--+
+    |start Search
+    +-->
+<div class="searchbox">
+<form action="http://www.google.com/search" method="get" class="roundtopsmall">
+<input value="hadoop.apache.org" name="sitesearch" type="hidden"><input onFocus="getBlank (this, 'Search the site with google');" size="25" name="q" id="query" type="text" value="Search the site with google">&nbsp; 
+                    <input name="Search" value="Search" type="submit">
+</form>
+</div>
+<!--+
+    |end search
+    +-->
+<!--+
+    |start Tabs
+    +-->
+<ul id="tabs">
+<li>
+<a class="unselected" href="http://hadoop.apache.org/zookeeper/">Project</a>
+</li>
+<li>
+<a class="unselected" href="http://wiki.apache.org/hadoop/ZooKeeper">Wiki</a>
+</li>
+<li class="current">
+<a class="selected" href="index.html">ZooKeeper Documentation</a>
+</li>
+</ul>
+<!--+
+    |end Tabs
+    +-->
+</div>
+</div>
+<div id="main">
+<div id="publishedStrip">
+<!--+
+    |start Subtabs
+    +-->
+<div id="level2tabs"></div>
+<!--+
+    |end Endtabs
+    +-->
+<script type="text/javascript"><!--
+document.write("Last Published: " + document.lastModified);
+//  --></script>
+</div>
+<!--+
+    |breadtrail
+    +-->
+<div class="breadtrail">
+
+             &nbsp;
+           </div>
+<!--+
+    |start Menu, mainarea
+    +-->
+<!--+
+    |start Menu
+    +-->
+<div id="menu">
+<div onclick="SwitchMenu('menu_selected_1.1', 'skin/')" id="menu_selected_1.1Title" class="menutitle" style="background-image: url('skin/images/chapter_open.gif');">Documentation</div>
+<div id="menu_selected_1.1" class="selectedmenuitemgroup" style="display: block;">
+<div class="menuitem">
+<a href="index.html">Welcome</a>
+</div>
+<div class="menuitem">
+<a href="zookeeperOver.html">Zookeeper Overview</a>
+</div>
+<div class="menuitem">
+<a href="zookeeperStarted.html">Getting Started</a>
+</div>
+<div class="menuitem">
+<a href="zookeeperProgrammers.html">Programmer's Guide</a>
+</div>
+<div class="menuitem">
+<a href="recipes.html">Recipes</a>
+</div>
+<div class="menuitem">
+<a href="zookeeperAdmin.html">Administrator's Guide</a>
+</div>
+<div class="menuitem">
+<a href="api/index.html">API Docs</a>
+</div>
+<div class="menuitem">
+<a href="http://wiki.apache.org/hadoop/ZooKeeper">Wiki</a>
+</div>
+<div class="menuitem">
+<a href="http://wiki.apache.org/hadoop/ZooKeeper/FAQ">FAQ</a>
+</div>
+<div class="menuitem">
+<a href="http://hadoop.apache.org/zookeeper/mailing_lists.html">Mailing Lists</a>
+</div>
+<div class="menupage">
+<div class="menupagetitle">Other Info</div>
+</div>
+</div>
+<div id="credit"></div>
+<div id="roundbottom">
+<img style="display: none" class="corner" height="15" width="15" alt="" src="skin/images/rc-b-l-15-1body-2menu-3menu.png"></div>
+<!--+
+  |alternative credits
+  +-->
+<div id="credit2"></div>
+</div>
+<!--+
+    |end Menu
+    +-->
+<!--+
+    |start content
+    +-->
+<div id="content">
+<div title="Portable Document Format" class="pdflink">
+<a class="dida" href="zookeeperOtherInfo.pdf"><img alt="PDF -icon" src="skin/images/pdfdoc.gif" class="skin"><br>
+        PDF</a>
+</div>
+<div id="minitoc-area">
+<ul class="minitoc">
+<li>
+<a href="#Other+Info">Other Info</a>
+</li>
+</ul>
+</div>
+  
+<title>ZooKeeper</title>
+
+  
+
+  
+<a name="N1000A"></a><a name="Other+Info"></a>
+<h2 class="h3">Other Info</h2>
+<div class="section">
+<p> currently empty </p>
+</div>
+
+<p align="right">
+<font size="-2"></font>
+</p>
+</div>
+<!--+
+    |end content
+    +-->
+<div class="clearboth">&nbsp;</div>
+</div>
+<div id="footer">
+<!--+
+    |start bottomstrip
+    +-->
+<div class="lastmodified">
+<script type="text/javascript"><!--
+document.write("Last Published: " + document.lastModified);
+//  --></script>
+</div>
+<div class="copyright">
+        Copyright &copy;
+         2008 <a href="http://www.apache.org/licenses/">The Apache Software Foundation.</a>
+</div>
+<!--+
+    |end bottomstrip
+    +-->
+</div>
+</body>
+</html>

+ 151 - 0
docs/zookeeperOtherInfo.pdf

@@ -0,0 +1,151 @@
+%PDF-1.3
+%ª«¬­
+4 0 obj
+<< /Type /Info
+/Producer (FOP 0.20.5) >>
+endobj
+5 0 obj
+<< /Length 351 /Filter [ /ASCII85Decode /FlateDecode ]
+ >>
+stream
+Gau`M_+qok&A@ZMF)2AI""amD]#IJ:@Z9*l6ao^tS(+aP<O70nTk<Qc"%!(\i4X\eZSq/rP<ja&VInnuM@o1m#FZup_Onqa?"Fqu41lin;[RGN*[[Q^T$PPP\e7Q(cZE*'>FWKpOT$_LFc7j&NOBd]F$9o#0(X&q5%1TKbN57U0BqI,4m4Q_O(2\spu[!+]+dSfs#6hTkSuBC+;)?lh&5nu]FQ?0;Q(l%iS0#P6>rQ0:MfE&-I,?uC8%M6A';BH=KEb^)+lcUE9iO^rig<1DhQQQqH"hc7"Rs"A/ZE'c*=@9pt=3XGM+WE2X[p+Q/f]'\<++8"QA`VJaHkB0>#,(q@<^0MsU~>
+endstream
+endobj
+6 0 obj
+<< /Type /Page
+/Parent 1 0 R
+/MediaBox [ 0 0 612 792 ]
+/Resources 3 0 R
+/Contents 5 0 R
+/Annots 7 0 R
+>>
+endobj
+7 0 obj
+[
+8 0 R
+]
+endobj
+8 0 obj
+<< /Type /Annot
+/Subtype /Link
+/Rect [ 102.0 556.541 160.316 544.541 ]
+/C [ 0 0 0 ]
+/Border [ 0 0 0 ]
+/A 9 0 R
+/H /I
+>>
+endobj
+10 0 obj
+<< /Length 355 /Filter [ /ASCII85Decode /FlateDecode ]
+ >>
+stream
+Gaqcq_+qm%%#44r$6Qmt]JrWSoZ.S2Dui@F^nGRY4)&J#eq*Pg1i6*AI/5Dks"(>][b>C;M&_D1EMs`U+,V3(6SFWP;OTmkKXZVR!DB,65C)N5lN08W<O[Jh-m%&I_7kX8+*>i+OAqsr16CP_;@j1S(-^CZ>C4*d&sN@1HPhun;K"56fsX+MK@1f:rAAcpj2XUJ)RZ#V*/3sllH_q-:rY7Z.AeGGI(3);POH&`X]K4r\]t='@Q(>H@`Y1VKgoaTU7[QGQtHbb6kV=Fh0,]sgXf.ajr&[F]2024E,G^lN=BiJ.^q.IG#eQ@\u;t"0(#h.pK`^)g>NTNFIsedBC!;!%`@aa!J'NYYQ~>
+endstream
+endobj
+11 0 obj
+<< /Type /Page
+/Parent 1 0 R
+/MediaBox [ 0 0 612 792 ]
+/Resources 3 0 R
+/Contents 10 0 R
+>>
+endobj
+13 0 obj
+<<
+ /Title (\376\377\0\61\0\40\0\117\0\164\0\150\0\145\0\162\0\40\0\111\0\156\0\146\0\157)
+ /Parent 12 0 R
+ /A 9 0 R
+>> endobj
+14 0 obj
+<< /Type /Font
+/Subtype /Type1
+/Name /F3
+/BaseFont /Helvetica-Bold
+/Encoding /WinAnsiEncoding >>
+endobj
+15 0 obj
+<< /Type /Font
+/Subtype /Type1
+/Name /F5
+/BaseFont /Times-Roman
+/Encoding /WinAnsiEncoding >>
+endobj
+16 0 obj
+<< /Type /Font
+/Subtype /Type1
+/Name /F1
+/BaseFont /Helvetica
+/Encoding /WinAnsiEncoding >>
+endobj
+17 0 obj
+<< /Type /Font
+/Subtype /Type1
+/Name /F2
+/BaseFont /Helvetica-Oblique
+/Encoding /WinAnsiEncoding >>
+endobj
+18 0 obj
+<< /Type /Font
+/Subtype /Type1
+/Name /F7
+/BaseFont /Times-Bold
+/Encoding /WinAnsiEncoding >>
+endobj
+1 0 obj
+<< /Type /Pages
+/Count 2
+/Kids [6 0 R 11 0 R ] >>
+endobj
+2 0 obj
+<< /Type /Catalog
+/Pages 1 0 R
+ /Outlines 12 0 R
+ /PageMode /UseOutlines
+ >>
+endobj
+3 0 obj
+<< 
+/Font << /F3 14 0 R /F5 15 0 R /F1 16 0 R /F2 17 0 R /F7 18 0 R >> 
+/ProcSet [ /PDF /ImageC /Text ] >> 
+endobj
+9 0 obj
+<<
+/S /GoTo
+/D [11 0 R /XYZ 85.0 659.0 null]
+>>
+endobj
+12 0 obj
+<<
+ /First 13 0 R
+ /Last 13 0 R
+>> endobj
+xref
+0 19
+0000000000 65535 f 
+0000002040 00000 n 
+0000002105 00000 n 
+0000002197 00000 n 
+0000000015 00000 n 
+0000000071 00000 n 
+0000000513 00000 n 
+0000000633 00000 n 
+0000000658 00000 n 
+0000002320 00000 n 
+0000000793 00000 n 
+0000001240 00000 n 
+0000002383 00000 n 
+0000001348 00000 n 
+0000001484 00000 n 
+0000001597 00000 n 
+0000001707 00000 n 
+0000001815 00000 n 
+0000001931 00000 n 
+trailer
+<<
+/Size 19
+/Root 2 0 R
+/Info 4 0 R
+>>
+startxref
+2434
+%%EOF

+ 629 - 0
docs/zookeeperOver.html

@@ -0,0 +1,629 @@
+<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
+<html>
+<head>
+<META http-equiv="Content-Type" content="text/html; charset=UTF-8">
+<meta content="Apache Forrest" name="Generator">
+<meta name="Forrest-version" content="0.8">
+<meta name="Forrest-skin-name" content="pelt">
+<title></title>
+<link type="text/css" href="skin/basic.css" rel="stylesheet">
+<link media="screen" type="text/css" href="skin/screen.css" rel="stylesheet">
+<link media="print" type="text/css" href="skin/print.css" rel="stylesheet">
+<link type="text/css" href="skin/profile.css" rel="stylesheet">
+<script src="skin/getBlank.js" language="javascript" type="text/javascript"></script><script src="skin/getMenu.js" language="javascript" type="text/javascript"></script><script src="skin/fontsize.js" language="javascript" type="text/javascript"></script>
+<link rel="shortcut icon" href="images/favicon.ico">
+</head>
+<body onload="init()">
+<script type="text/javascript">ndeSetTextSize();</script>
+<div id="top">
+<!--+
+    |breadtrail
+    +-->
+<div class="breadtrail">
+<a href="http://www.apache.org/">Apache</a> &gt; <a href="http://hadoop.apache.org/">Hadoop</a> &gt; <a href="http://hadoop.apache.org/zookeeper/">ZooKeeper</a><script src="skin/breadcrumbs.js" language="JavaScript" type="text/javascript"></script>
+</div>
+<!--+
+    |header
+    +-->
+<div class="header">
+<!--+
+    |start group logo
+    +-->
+<div class="grouplogo">
+<a href="http://hadoop.apache.org/"><img class="logoImage" alt="Hadoop" src="images/hadoop-logo.jpg" title="Apache Hadoop"></a>
+</div>
+<!--+
+    |end group logo
+    +-->
+<!--+
+    |start Project Logo
+    +-->
+<div class="projectlogo">
+<a href="http://hadoop.apache.org/zookeeper/"><img class="logoImage" alt="ZooKeeper" src="images/zookeeper_small.gif" title="The Hadoop database"></a>
+</div>
+<!--+
+    |end Project Logo
+    +-->
+<!--+
+    |start Search
+    +-->
+<div class="searchbox">
+<form action="http://www.google.com/search" method="get" class="roundtopsmall">
+<input value="hadoop.apache.org" name="sitesearch" type="hidden"><input onFocus="getBlank (this, 'Search the site with google');" size="25" name="q" id="query" type="text" value="Search the site with google">&nbsp; 
+                    <input name="Search" value="Search" type="submit">
+</form>
+</div>
+<!--+
+    |end search
+    +-->
+<!--+
+    |start Tabs
+    +-->
+<ul id="tabs">
+<li>
+<a class="unselected" href="http://hadoop.apache.org/zookeeper/">Project</a>
+</li>
+<li>
+<a class="unselected" href="http://wiki.apache.org/hadoop/ZooKeeper">Wiki</a>
+</li>
+<li class="current">
+<a class="selected" href="index.html">ZooKeeper Documentation</a>
+</li>
+</ul>
+<!--+
+    |end Tabs
+    +-->
+</div>
+</div>
+<div id="main">
+<div id="publishedStrip">
+<!--+
+    |start Subtabs
+    +-->
+<div id="level2tabs"></div>
+<!--+
+    |end Endtabs
+    +-->
+<script type="text/javascript"><!--
+document.write("Last Published: " + document.lastModified);
+//  --></script>
+</div>
+<!--+
+    |breadtrail
+    +-->
+<div class="breadtrail">
+
+             &nbsp;
+           </div>
+<!--+
+    |start Menu, mainarea
+    +-->
+<!--+
+    |start Menu
+    +-->
+<div id="menu">
+<div onclick="SwitchMenu('menu_selected_1.1', 'skin/')" id="menu_selected_1.1Title" class="menutitle" style="background-image: url('skin/images/chapter_open.gif');">Documentation</div>
+<div id="menu_selected_1.1" class="selectedmenuitemgroup" style="display: block;">
+<div class="menuitem">
+<a href="index.html">Welcome</a>
+</div>
+<div class="menupage">
+<div class="menupagetitle">ZooKeeper Overview</div>
+</div>
+<div class="menuitem">
+<a href="zookeeperStarted.html">Getting Started</a>
+</div>
+<div class="menuitem">
+<a href="zookeeperProgrammers.html">Programmer's Guide</a>
+</div>
+<div class="menuitem">
+<a href="recipes.html">Recipes</a>
+</div>
+<div class="menuitem">
+<a href="zookeeperAdmin.html">Administrator's Guide</a>
+</div>
+<div class="menuitem">
+<a href="api/index.html">API Docs</a>
+</div>
+<div class="menuitem">
+<a href="http://wiki.apache.org/hadoop/ZooKeeper">Wiki</a>
+</div>
+<div class="menuitem">
+<a href="http://wiki.apache.org/hadoop/ZooKeeper/FAQ">FAQ</a>
+</div>
+<div class="menuitem">
+<a href="http://hadoop.apache.org/zookeeper/mailing_lists.html">Mailing Lists</a>
+</div>
+<div class="menuitem">
+<a href="zookeeperOtherInfo.html">Other Info</a>
+</div>
+</div>
+<div id="credit"></div>
+<div id="roundbottom">
+<img style="display: none" class="corner" height="15" width="15" alt="" src="skin/images/rc-b-l-15-1body-2menu-3menu.png"></div>
+<!--+
+  |alternative credits
+  +-->
+<div id="credit2"></div>
+</div>
+<!--+
+    |end Menu
+    +-->
+<!--+
+    |start content
+    +-->
+<div id="content">
+<div title="Portable Document Format" class="pdflink">
+<a class="dida" href="zookeeperOver.pdf"><img alt="PDF -icon" src="skin/images/pdfdoc.gif" class="skin"><br>
+        PDF</a>
+</div>
+<div id="minitoc-area">
+<ul class="minitoc">
+<li>
+<a href="#ZooKeeper%3A+A+Distributed+Coordination+Service+for+Distributed%0A++++Applications">ZooKeeper: A Distributed Coordination Service for Distributed
+    Applications</a>
+<ul class="minitoc">
+<li>
+<a href="#sc_designGoals">Design Goals</a>
+</li>
+<li>
+<a href="#sc_dataModelNameSpace">Data model and the hierarchical namespace</a>
+</li>
+<li>
+<a href="#Nodes+and+ephemeral+nodes">Nodes and ephemeral nodes</a>
+</li>
+<li>
+<a href="#Conditional+updates+and+watches">Conditional updates and watches</a>
+</li>
+<li>
+<a href="#Guarantees">Guarantees</a>
+</li>
+<li>
+<a href="#Simple+API">Simple API</a>
+</li>
+<li>
+<a href="#Implementation">Implementation</a>
+</li>
+<li>
+<a href="#Uses">Uses</a>
+</li>
+<li>
+<a href="#Performance">Performance</a>
+</li>
+<li>
+<a href="#The+ZooKeeper+Project">The ZooKeeper Project</a>
+</li>
+</ul>
+</li>
+</ul>
+</div>
+  
+<title>ZooKeeper</title>
+
+  
+
+  
+<a name="N1000A"></a><a name="ZooKeeper%3A+A+Distributed+Coordination+Service+for+Distributed%0A++++Applications"></a>
+<h2 class="h3">ZooKeeper: A Distributed Coordination Service for Distributed
+    Applications</h2>
+<div class="section">
+<p>ZooKeeper is a distributed, open-source coordination service for
+    distributed applications. It exposes a simple set of primitives that
+    distributed applications can build upon to implement higher level services
+    for synchronization, configuration maintenance, and groups and naming. It
+    is designed to be easy to program to, and uses a data model styled after
+    the familiar directory tree structure of file systems. It runs in Java and
+    has bindings for both Java and C.</p>
+<p>Coordination services are notoriously hard to get right. They are
+    especially prone to errors such as race conditions and deadlock. The
+    motivation behind ZooKeeper is to relieve distributed applications of the
+    responsibility of implementing coordination services from scratch.</p>
+<a name="N10016"></a><a name="sc_designGoals"></a>
+<h3 class="h4">Design Goals</h3>
+<p>
+<strong>ZooKeeper is simple.</strong> ZooKeeper
+      allows distributed processes to coordinate with each other through a
+      shared hierarchical namespace which is organized similarly to a standard
+      file system. The name space consists of data registers - called znodes,
+      in ZooKeeper parlance - and these are similar to files and directories.
+      Unlike a typical file system, which is designed for storage, ZooKeeper
+      data is kept in-memory, which means ZooKeeper can achieve high
+      throughput and low latency numbers.</p>
+<p>The ZooKeeper implementation puts a premium on high performance,
+      highly available, strictly ordered access. The performance aspects of
+      ZooKeeper mean it can be used in large, distributed systems. The
+      reliability aspects keep it from being a single point of failure. The
+      strict ordering means that sophisticated synchronization primitives can
+      be implemented at the client.</p>
+<p>
+<strong>ZooKeeper is replicated.</strong> Like the
+      distributed processes it coordinates, ZooKeeper itself is intended to be
+      replicated over a set of machines called a quorum.</p>
+<table class="ForrestTable" cellspacing="1" cellpadding="4">
+<tr>
+<td>ZooKeeper Service</td>
+</tr>
+<tr>
+<td>
+          
+            <img alt="" src="images/zkservice.jpg">
+          
+        </td>
+</tr>
+</table>
+<p>The servers that make up the ZooKeeper service must all know about
+      each other. They maintain an in-memory image of state, along with
+      transaction logs and snapshots in a persistent store. As long as a
+      majority of the servers are available, the ZooKeeper service will be
+      available.</p>
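+<p>The majority rule above comes down to simple arithmetic. The following
+      is a minimal sketch of that arithmetic only; the class and method names
+      (<code>QuorumMath</code>, <code>majority</code>,
+      <code>tolerableFailures</code>, <code>isAvailable</code>) are
+      hypothetical and are not part of ZooKeeper.</p>

```java
// Illustrative arithmetic only; none of these names exist in ZooKeeper.
final class QuorumMath {
    private QuorumMath() {}

    /** Smallest number of servers that forms a strict majority. */
    static int majority(int serverCount) {
        if (serverCount <= 0) {
            throw new IllegalArgumentException("serverCount must be positive");
        }
        return serverCount / 2 + 1;
    }

    /** Largest number of simultaneous failures that still leaves a majority. */
    static int tolerableFailures(int serverCount) {
        return serverCount - majority(serverCount);
    }

    /** True when enough servers are up for the service to be available. */
    static boolean isAvailable(int serverCount, int liveServers) {
        return liveServers >= majority(serverCount);
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 7; n++) {
            System.out.println(n + " server(s): majority=" + majority(n)
                    + ", tolerable failures=" + tolerableFailures(n));
        }
    }
}
```

+<p>Under this rule three servers tolerate one failure, and so do four,
+      which is why an odd number of servers is a common choice.</p>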
+<p>Clients connect to a single ZooKeeper server. The client maintains
+      a TCP connection through which it sends requests, gets responses, gets
+      watch events, and sends heart beats. If the TCP connection to the server
+      breaks, the client will connect to a different server.</p>
+<p>
+<strong>ZooKeeper is ordered.</strong> ZooKeeper
+      stamps each update with a number that reflects the order of all
+      ZooKeeper transactions. Subsequent operations can use the order to
+      implement higher-level abstractions, such as synchronization
+      primitives.</p>
+<p>
+<strong>ZooKeeper is fast.</strong> It is
+      especially fast in "read-dominant" workloads. ZooKeeper applications run
+      on thousands of machines, and it performs best where reads are more
+      common than writes, at ratios of around 10:1.</p>
+<a name="N10046"></a><a name="sc_dataModelNameSpace"></a>
+<h3 class="h4">Data model and the hierarchical namespace</h3>
+<p>The name space provided by ZooKeeper is much like that of a
+      standard file system. A name is a sequence of path elements separated by
+      a slash (/). Every node in ZooKeeper's name space is identified by a
+      path.</p>
+<table class="ForrestTable" cellspacing="1" cellpadding="4">
+<tr>
+<td>ZooKeeper's Hierarchical Namespace</td>
+</tr>
+<tr>
+<td>
+          
+            <img alt="" src="images/zknamespace.jpg">
+          
+        </td>
+</tr>
+</table>
+<a name="N1005C"></a><a name="Nodes+and+ephemeral+nodes"></a>
+<h3 class="h4">Nodes and ephemeral nodes</h3>
+<p>Unlike in standard file systems, each node in a ZooKeeper
+      namespace can have data associated with it as well as children. It is
+      like having a file system that allows a file to also be a directory.
+      (ZooKeeper was designed to store coordination data: status information,
+      configuration, location information, etc., so the data stored at each
+      node is usually small, in the byte to kilobyte range.) We use the term
+      <em>znode</em> to make it clear that we are talking about
+      ZooKeeper data nodes.</p>
+<p>Znodes maintain a stat structure that includes version numbers for
+      data changes, ACL changes, and timestamps, to allow cache validations
+      and coordinated updates. Each time a znode's data changes, the version
+      number increases. For instance, whenever a client retrieves data it also
+      receives the version of the data.</p>
+<p>The data stored at each znode in a namespace is read and written
+      atomically. Reads get all the data bytes associated with a znode and a
+      write replaces all the data. Each node has an Access Control List (ACL)
+      that restricts who can do what.</p>
+<p>ZooKeeper also has the notion of ephemeral nodes. These znodes
+      exist as long as the session that created the znode is active. When the
+      session ends the znode is deleted. Ephemeral nodes are useful when you
+      want to implement <remark>[tbd]</remark>.</p>
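+<p>The session-bound lifetime described above can be pictured with a toy
+      in-memory model. This is only a sketch under stated assumptions: the
+      class <code>EphemeralDemo</code> and its methods are hypothetical, it
+      runs in a single process, and it does not use the ZooKeeper client
+      API.</p>

```java
import java.util.HashMap;
import java.util.Map;

// Toy model: each ephemeral path remembers the session that created it.
final class EphemeralDemo {
    // path -> id of the owning session
    private final Map<String, Long> ownerByPath = new HashMap<>();

    /** Creates an ephemeral node owned by the given session. */
    void createEphemeral(long sessionId, String path) {
        if (ownerByPath.putIfAbsent(path, sessionId) != null) {
            throw new IllegalStateException("node already exists: " + path);
        }
    }

    boolean exists(String path) {
        return ownerByPath.containsKey(path);
    }

    /** Ending a session deletes every node that session created. */
    void closeSession(long sessionId) {
        ownerByPath.values().removeIf(owner -> owner == sessionId);
    }

    public static void main(String[] args) {
        EphemeralDemo tree = new EphemeralDemo();
        tree.createEphemeral(1L, "/workers/w1");
        tree.createEphemeral(2L, "/workers/w2");
        tree.closeSession(1L);
        System.out.println("w1 exists: " + tree.exists("/workers/w1"));
        System.out.println("w2 exists: " + tree.exists("/workers/w2"));
    }
}
```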
+<a name="N10075"></a><a name="Conditional+updates+and+watches"></a>
+<h3 class="h4">Conditional updates and watches</h3>
+<p>ZooKeeper supports the concept of <em>watches</em>.
+      Clients can set a watch on a znode. A watch will be triggered and
+      removed when the znode changes. When a watch is triggered the client
+      receives a packet saying that the znode has changed. And if the
+      connection between the client and one of the ZooKeeper servers is
+      broken, the client will receive a local notification. These can be used
+      to <remark>[tbd]</remark>.</p>
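+<p>The "triggered and removed" behavior of a watch can be sketched with a
+      small in-memory model. The names here (<code>OneShotWatchDemo</code>,
+      and the callback-style <code>getData</code>) are assumptions made for
+      illustration; the real client API differs, and this sketch has no
+      network, sessions, or servers.</p>

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Toy model of one-time watches; not the ZooKeeper client API.
final class OneShotWatchDemo {
    private final Map<String, String> data = new HashMap<>();
    private final Map<String, List<Consumer<String>>> watches = new HashMap<>();

    /** Reads a node's data and, if a watcher is given, leaves a watch on it. */
    String getData(String path, Consumer<String> watcher) {
        if (watcher != null) {
            watches.computeIfAbsent(path, p -> new ArrayList<>()).add(watcher);
        }
        return data.get(path);
    }

    /** Writes data, then fires and removes every watch set on the path. */
    void setData(String path, String value) {
        data.put(path, value);
        List<Consumer<String>> fired = watches.remove(path);
        if (fired != null) {
            for (Consumer<String> watcher : fired) {
                watcher.accept(path);
            }
        }
    }

    public static void main(String[] args) {
        OneShotWatchDemo tree = new OneShotWatchDemo();
        List<String> events = new ArrayList<>();
        tree.setData("/config", "v1");
        tree.getData("/config", events::add); // sets the watch
        tree.setData("/config", "v2");        // watch fires and is removed
        tree.setData("/config", "v3");        // no watch left, nothing fires
        System.out.println("events received: " + events.size());
    }
}
```

+<p>In this model a client that wants to keep observing a znode has to set
+      a new watch each time it is notified.</p>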
+<a name="N10085"></a><a name="Guarantees"></a>
+<h3 class="h4">Guarantees</h3>
+<p>ZooKeeper is very fast and very simple. Since its goal, though, is
+      to be a basis for the construction of more complicated services, such as
+      synchronization, it provides a set of guarantees. These are:</p>
+<ul>
+        
+<li>
+          
+<p>Sequential Consistency - Updates from a client will be applied
+          in the order that they were sent.</p>
+        
+</li>
+
+        
+<li>
+          
+<p>Atomicity - Updates either succeed or fail. No partial
+          results.</p>
+        
+</li>
+
+        
+<li>
+          
+<p>Single System Image - A client will see the same view of the
+          service regardless of the server that it connects to.</p>
+        
+</li>
+      
+</ul>
+<ul>
+        
+<li>
+          
+<p>Reliability - Once an update has been applied, it will persist
+          from that time forward until a client overwrites the update.</p>
+        
+</li>
+      
+</ul>
+<ul>
+        
+<li>
+          
+<p>Timeliness - The client's view of the system is guaranteed to
+          be up-to-date within a certain time bound.</p>
+        
+</li>
+      
+</ul>
+<p>For more information on these, and how they can be used, see
+      <remark>[tbd]</remark>
+</p>
+<a name="N100BB"></a><a name="Simple+API"></a>
+<h3 class="h4">Simple API</h3>
+<p>One of the design goals of ZooKeeper is to provide a very simple
+      programming interface. As a result, it supports only these
+      operations:</p>
+<dl>
+        
+<dt>
+<term>create</term>
+</dt>
+<dd>
+<p>creates a node at a location in the tree</p>
+</dd>
+
+        
+<dt>
+<term>delete</term>
+</dt>
+<dd>
+<p>deletes a node</p>
+</dd>
+
+        
+<dt>
+<term>exists</term>
+</dt>
+<dd>
+<p>tests if a node exists at a location</p>
+</dd>
+
+        
+<dt>
+<term>get data</term>
+</dt>
+<dd>
+<p>reads the data from a node</p>
+</dd>
+
+        
+<dt>
+<term>set data</term>
+</dt>
+<dd>
+<p>writes data to a node</p>
+</dd>
+
+        
+<dt>
+<term>get children</term>
+</dt>
+<dd>
+<p>retrieves a list of children of a node</p>
+</dd>
+
+        
+<dt>
+<term>sync</term>
+</dt>
+<dd>
+<p>waits for data to be propagated</p>
+</dd>
+      
+</dl>
+<p>For a more in-depth discussion on these, and how they can be used
+      to implement higher level operations, please refer to
+      <remark>[tbd]</remark>
+</p>
+<a name="N100FE"></a><a name="Implementation"></a>
+<h3 class="h4">Implementation</h3>
+<p>
+<a href="#fg_zkComponents">ZooKeeper Components</a> shows the high-level components
+      of the ZooKeeper service. With the exception of the request processor,
+      <remark>[tbd: where does the request processor live?]</remark> each of
+      the servers that make up the ZooKeeper service replicates its own copy
+      of each of the components. <remark>[tbd: I changed the wording in this
+      sentence from the white paper. Can someone please make sure it is still
+      correct?]</remark>
+</p>
+<p>
+<table class="ForrestTable" cellspacing="1" cellpadding="4">
+<tr>
+<td>ZooKeeper Components</td>
+</tr>
+<tr>
+<td>
+            
+              <img alt="" src="images/zkcomponents.jpg">
+            
+          </td>
+</tr>
+</table>
+</p>
+<p>The replicated database is an in-memory database containing the
+      entire data tree. Updates are logged to disk for recoverability, and
+      writes are serialized to disk before they are applied to the in-memory
+      database.</p>
+<p>Every ZooKeeper server services clients. Clients connect to
+      exactly one server to submit requests. Read requests are serviced from
+      the local replica of each server database. Requests that change the
+      state of the service, write requests, are processed by an agreement
+      protocol.</p>
+<p>As part of the agreement protocol all write requests from clients
+      are forwarded to a single server, called the
+      <em>leader</em>. The rest of the ZooKeeper servers, called
+      <em>followers</em>, receive message proposals from the
+      leader and agree upon message delivery. The messaging layer takes care
+      of replacing leaders on failures and syncing followers with
+      leaders.</p>
+<p>ZooKeeper uses a custom atomic messaging protocol. Since the
+      messaging layer is atomic, ZooKeeper can guarantee that the local
+      replicas never diverge. When the leader receives a write request, it
+      calculates what the state of the system is when the write is to be
+      applied and transforms this into a transaction that captures this new
+      state.</p>
+<a name="N1012F"></a><a name="Uses"></a>
+<h3 class="h4">Uses</h3>
+<p>The programming interface to ZooKeeper is deliberately simple.
+      With it, however, you can implement higher order operations, such as
+      synchronization primitives, group membership, ownership, etc. Some
+      distributed applications have used it to: <remark>[tbd: add uses from
+      white paper and video presentation.]</remark> For more information, see
+      <remark>[tbd]</remark>
+</p>
+<a name="N1013E"></a><a name="Performance"></a>
+<h3 class="h4">Performance</h3>
+<p>ZooKeeper is designed to be highly performant. But is it? The
+      results of the ZooKeeper development team at Yahoo! Research indicate
+      that it is. (See <a href="#fg_zkPerfRW">ZooKeeper Throughput as the Read-Write Ratio Varies</a>.) It is especially high
+      performance in applications where reads outnumber writes, since writes
+      involve synchronizing the state of all servers. (Reads outnumbering
+      writes is typically the case for a coordination service.)</p>
+<p>
+<table class="ForrestTable" cellspacing="1" cellpadding="4">
+<tr>
+<td>ZooKeeper Throughput as the Read-Write Ratio Varies</td>
+</tr>
+<tr>
+<td>
+            
+              <img alt="" src="images/zkperfRW.jpg">
+            
+          </td>
+</tr>
+</table>Benchmarks also indicate that it is reliable. <a href="#fg_zkPerfReliability">Reliability in the Presence of Errors</a> shows how a deployment responds to
+      various failures. The events marked in the figure are the
+      following:</p>
+<ol>
+        
+<li>
+          
+<p>Failure and recovery of a follower</p>
+        
+</li>
+
+        
+<li>
+          
+<p>Failure and recovery of a different follower</p>
+        
+</li>
+
+        
+<li>
+          
+<p>Failure of the leader</p>
+        
+</li>
+
+        
+<li>
+          
+<p>Failure and recovery of two followers</p>
+        
+</li>
+
+        
+<li>
+          
+<p>Failure of another leader</p>
+        
+</li>
+      
+</ol>
+<p>
+<table class="ForrestTable" cellspacing="1" cellpadding="4">
+<tr>
+<td>Reliability in the Presence of Errors</td>
+</tr>
+<tr>
+<td>
+            
+              <img alt="" src="images/zkperfreliability.jpg">
+            
+          </td>
+</tr>
+</table>
+</p>
+<p>There are a few important observations from this graph. First, if
+      followers fail and recover quickly, then ZooKeeper is able to sustain a
+      high throughput despite the failure. Second, and maybe more importantly, the
+      leader election algorithm allows for the system to recover fast enough
+      to prevent throughput from dropping substantially. In our observations,
+      ZooKeeper takes less than 200ms to elect a new leader. Third, as
+      followers recover, ZooKeeper is able to raise throughput again once they
+      start processing requests.</p>
+<a name="N1018F"></a><a name="The+ZooKeeper+Project"></a>
+<h3 class="h4">The ZooKeeper Project</h3>
+<p>ZooKeeper has been successfully used in industrial applications.
+      It is used at Yahoo! as the coordination and failure recovery service
+      for Yahoo! Message Broker, which is a highly scalable publish-subscribe
+      system managing thousands of topics for replication and data delivery.
+      It is used by the Fetching Service for Yahoo! crawler, where it also
+      manages failure recovery. And it is used by Hadoop On Demand (HOD),
+      which is an open source implementation of the map-reduce model of
+      computation. HOD uses ZooKeeper as a communications and control channel
+      between slave and master processes. (For more information, see the <a href="http://hadoop.apache.org/core/">Hadoop</a> and <a href="http://hadoop.apache.org/core/docs/current/hod.html">Hadoop on
+      Demand</a> open source projects on Apache.)</p>
+<p>ZooKeeper itself is an open source project, under the Apache
+      Software Foundation. It is a subproject of Hadoop. All users and
+      developers are encouraged to join the community and contribute their
+      expertise. See the <a href="http://hadoop.apache.org/zookeeper/">ZooKeeper Project on
+      Apache</a> for more information.</p>
+</div>
+
+<p align="right">
+<font size="-2"></font>
+</p>
+</div>
+<!--+
+    |end content
+    +-->
+<div class="clearboth">&nbsp;</div>
+</div>
+<div id="footer">
+<!--+
+    |start bottomstrip
+    +-->
+<div class="lastmodified">
+<script type="text/javascript"><!--
+document.write("Last Published: " + document.lastModified);
+//  --></script>
+</div>
+<div class="copyright">
+        Copyright &copy;
+         2008 <a href="http://www.apache.org/licenses/">The Apache Software Foundation.</a>
+</div>
+<!--+
+    |end bottomstrip
+    +-->
+</div>
+</body>
+</html>

BIN
docs/zookeeperOver.pdf


+ 1540 - 0
docs/zookeeperProgrammers.html

@@ -0,0 +1,1540 @@
+<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
+<html>
+<head>
+<META http-equiv="Content-Type" content="text/html; charset=UTF-8">
+<meta content="Apache Forrest" name="Generator">
+<meta name="Forrest-version" content="0.8">
+<meta name="Forrest-skin-name" content="pelt">
+<title></title>
+<link type="text/css" href="skin/basic.css" rel="stylesheet">
+<link media="screen" type="text/css" href="skin/screen.css" rel="stylesheet">
+<link media="print" type="text/css" href="skin/print.css" rel="stylesheet">
+<link type="text/css" href="skin/profile.css" rel="stylesheet">
+<script src="skin/getBlank.js" language="javascript" type="text/javascript"></script><script src="skin/getMenu.js" language="javascript" type="text/javascript"></script><script src="skin/fontsize.js" language="javascript" type="text/javascript"></script>
+<link rel="shortcut icon" href="images/favicon.ico">
+</head>
+<body onload="init()">
+<script type="text/javascript">ndeSetTextSize();</script>
+<div id="top">
+<!--+
+    |breadtrail
+    +-->
+<div class="breadtrail">
+<a href="http://www.apache.org/">Apache</a> &gt; <a href="http://hadoop.apache.org/">Hadoop</a> &gt; <a href="http://hadoop.apache.org/zookeeper/">ZooKeeper</a><script src="skin/breadcrumbs.js" language="JavaScript" type="text/javascript"></script>
+</div>
+<!--+
+    |header
+    +-->
+<div class="header">
+<!--+
+    |start group logo
+    +-->
+<div class="grouplogo">
+<a href="http://hadoop.apache.org/"><img class="logoImage" alt="Hadoop" src="images/hadoop-logo.jpg" title="Apache Hadoop"></a>
+</div>
+<!--+
+    |end group logo
+    +-->
+<!--+
+    |start Project Logo
+    +-->
+<div class="projectlogo">
+<a href="http://hadoop.apache.org/zookeeper/"><img class="logoImage" alt="ZooKeeper" src="images/zookeeper_small.gif" title="The Hadoop database"></a>
+</div>
+<!--+
+    |end Project Logo
+    +-->
+<!--+
+    |start Search
+    +-->
+<div class="searchbox">
+<form action="http://www.google.com/search" method="get" class="roundtopsmall">
+<input value="hadoop.apache.org" name="sitesearch" type="hidden"><input onFocus="getBlank (this, 'Search the site with google');" size="25" name="q" id="query" type="text" value="Search the site with google">&nbsp; 
+                    <input name="Search" value="Search" type="submit">
+</form>
+</div>
+<!--+
+    |end search
+    +-->
+<!--+
+    |start Tabs
+    +-->
+<ul id="tabs">
+<li>
+<a class="unselected" href="http://hadoop.apache.org/zookeeper/">Project</a>
+</li>
+<li>
+<a class="unselected" href="http://wiki.apache.org/hadoop/ZooKeeper">Wiki</a>
+</li>
+<li class="current">
+<a class="selected" href="index.html">ZooKeeper Documentation</a>
+</li>
+</ul>
+<!--+
+    |end Tabs
+    +-->
+</div>
+</div>
+<div id="main">
+<div id="publishedStrip">
+<!--+
+    |start Subtabs
+    +-->
+<div id="level2tabs"></div>
+<!--+
+    |end Endtabs
+    +-->
+<script type="text/javascript"><!--
+document.write("Last Published: " + document.lastModified);
+//  --></script>
+</div>
+<!--+
+    |breadtrail
+    +-->
+<div class="breadtrail">
+
+             &nbsp;
+           </div>
+<!--+
+    |start Menu, mainarea
+    +-->
+<!--+
+    |start Menu
+    +-->
+<div id="menu">
+<div onclick="SwitchMenu('menu_selected_1.1', 'skin/')" id="menu_selected_1.1Title" class="menutitle" style="background-image: url('skin/images/chapter_open.gif');">Documentation</div>
+<div id="menu_selected_1.1" class="selectedmenuitemgroup" style="display: block;">
+<div class="menuitem">
+<a href="index.html">Welcome</a>
+</div>
+<div class="menuitem">
+<a href="zookeeperOver.html">ZooKeeper Overview</a>
+</div>
+<div class="menuitem">
+<a href="zookeeperStarted.html">Getting Started</a>
+</div>
+<div class="menupage">
+<div class="menupagetitle">Programmer's Guide</div>
+</div>
+<div class="menuitem">
+<a href="recipes.html">Recipes</a>
+</div>
+<div class="menuitem">
+<a href="zookeeperAdmin.html">Administrator's Guide</a>
+</div>
+<div class="menuitem">
+<a href="api/index.html">API Docs</a>
+</div>
+<div class="menuitem">
+<a href="http://wiki.apache.org/hadoop/ZooKeeper">Wiki</a>
+</div>
+<div class="menuitem">
+<a href="http://wiki.apache.org/hadoop/ZooKeeper/FAQ">FAQ</a>
+</div>
+<div class="menuitem">
+<a href="http://hadoop.apache.org/zookeeper/mailing_lists.html">Mailing Lists</a>
+</div>
+<div class="menuitem">
+<a href="zookeeperOtherInfo.html">Other Info</a>
+</div>
+</div>
+<div id="credit"></div>
+<div id="roundbottom">
+<img style="display: none" class="corner" height="15" width="15" alt="" src="skin/images/rc-b-l-15-1body-2menu-3menu.png"></div>
+<!--+
+  |alternative credits
+  +-->
+<div id="credit2"></div>
+</div>
+<!--+
+    |end Menu
+    +-->
+<!--+
+    |start content
+    +-->
+<div id="content">
+<div title="Portable Document Format" class="pdflink">
+<a class="dida" href="zookeeperProgrammers.pdf"><img alt="PDF -icon" src="skin/images/pdfdoc.gif" class="skin"><br>
+        PDF</a>
+</div>
+<div id="minitoc-area">
+<ul class="minitoc">
+<li>
+<a href="#The+ZooKeeper+Data+Model">The ZooKeeper Data Model</a>
+<ul class="minitoc">
+<li>
+<a href="#sc_zkDataModel_znodes">ZNodes</a>
+<ul class="minitoc">
+<li>
+<a href="#sc_zkDataMode_watches">Watches</a>
+</li>
+<li>
+<a href="#Data+Access">Data Access</a>
+</li>
+<li>
+<a href="#Ephemeral+Nodes">Ephemeral Nodes</a>
+</li>
+<li>
+<a href="#Unique+Naming">Unique Naming</a>
+</li>
+</ul>
+</li>
+<li>
+<a href="#sc_timeInZk">Time in ZooKeeper</a>
+</li>
+<li>
+<a href="#sc_zkStatStructure">ZooKeeper Stat Structure</a>
+</li>
+</ul>
+</li>
+<li>
+<a href="#ZooKeeper+Sessions">ZooKeeper Sessions</a>
+</li>
+<li>
+<a href="#ZooKeeper+Watches">ZooKeeper Watches</a>
+<ul class="minitoc">
+<li>
+<a href="#sc_WatchGuarantees">What ZooKeeper Guarantees about Watches</a>
+</li>
+<li>
+<a href="#sc_WatchRememberThese">Things to Remember about Watches</a>
+</li>
+</ul>
+</li>
+<li>
+<a href="#Consistency+Guarantees">Consistency Guarantees</a>
+</li>
+<li>
+<a href="#Bindings">Bindings</a>
+<ul class="minitoc">
+<li>
+<a href="#Java+Binding">Java Binding</a>
+</li>
+<li>
+<a href="#C+Binding">C Binding</a>
+<ul class="minitoc">
+<li>
+<a href="#Installation">Installation</a>
+</li>
+<li>
+<a href="#Using+the+Client">Using the Client</a>
+</li>
+</ul>
+</li>
+</ul>
+</li>
+<li>
+<a href="#Building+Blocks%3A+A+Guide+to+ZooKeeper+Operations">Building Blocks: A Guide to ZooKeeper Operations</a>
+</li>
+<li>
+<a href="#Program+Structure%2C+with+Simple+Example">Program Structure, with Simple Example</a>
+</li>
+<li>
+<a href="#Gotchas%3A+Common+Problems+and+Troubleshooting">Gotchas: Common Problems and Troubleshooting</a>
+</li>
+</ul>
+</div>
+  
+<title>ZooKeeper Programmer's Guide</title>
+
+  
+<subtitle>Developing Distributed Applications that use ZooKeeper</subtitle>
+
+  
+
+  
+<a name="_introduction"></a>
+<preface id="_introduction">
+    
+<title>Introduction</title>
+
+    
+<p>This document is a guide for developers wishing to create
+    distributed applications that take advantage of ZooKeeper's coordination
+    services. It contains conceptual and practical information.</p>
+
+    
+<p>The first four chapters of this guide present higher level
+    discussions of various ZooKeeper concepts. These are necessary both for an
+    understanding of how ZooKeeper works and of how to work with it. They do
+    not contain source code, but they do assume a familiarity with the
+    problems associated with distributed computing. The chapters in this first
+    group are:</p>
+
+    
+<ul>
+      
+<li>
+        
+<p>
+<a href="#ch_zkDataModel">The ZooKeeper Data Model</a>
+</p>
+      
+</li>
+
+      
+<li>
+        
+<p>
+<a href="#ch_zkSessions">ZooKeeper Sessions</a>
+</p>
+      
+</li>
+
+      
+<li>
+        
+<p>
+<a href="#ch_zkWatches">ZooKeeper Watches</a>
+</p>
+      
+</li>
+
+      
+<li>
+        
+<p>
+<a href="#ch_zkGuarantees">Consistency Guarantees</a>
+</p>
+      
+</li>
+    
+</ul>
+
+    
+<p>The next four chapters of this guide provide practical programming
+    information. These are:</p>
+
+    
+<ul>
+      
+<li>
+        
+<p>
+<a href="#ch_guideToZkOperations">Building Blocks: A Guide to ZooKeeper Operations</a>
+</p>
+      
+</li>
+
+      
+<li>
+        
+<p>
+<a href="#ch_bindings">Bindings</a>
+</p>
+      
+</li>
+
+      
+<li>
+        
+<p>
+<a href="#ch_programStructureWithExample">Program Structure, with Simple Example</a>
+        
+<remark>[tbd]</remark>
+</p>
+      
+</li>
+
+      
+<li>
+        
+<p>
+<a href="#ch_gotchas">Gotchas: Common Problems and Troubleshooting</a>
+</p>
+      
+</li>
+    
+</ul>
+
+    
+<p>The book concludes with an <a href="#apx_linksToOtherInfo">appendix</a> containing links to other
+    useful, ZooKeeper-related information.</p>
+
+    
+<p>Most of the information in this document is written to be accessible as
+    stand-alone reference material. However, before starting your first
+    ZooKeeper application, you should probably at least read the chapters on
+    the <a href="#ch_zkDataModel">ZooKeeper Data Model</a> and <a href="#ch_guideToZkOperations">ZooKeeper Basic Operations</a>. Also,
+    the <a href="#ch_programStructureWithExample">Simple Programming
+    Example</a> 
+<remark>[tbd]</remark> is helpful for understanding the basic
+    structure of a ZooKeeper client application.</p>
+  
+</preface>
+
+  
+<a name="N1007F"></a><a name="The+ZooKeeper+Data+Model"></a>
+<h2 class="h3">The ZooKeeper Data Model</h2>
+<div class="section">
+<p>ZooKeeper has a hierarchical name space, much like a distributed file
+    system. The only difference is that each node in the namespace can have
+    data associated with it as well as children. It is like having a file
+    system that allows a file to also be a directory. Paths to nodes are
+    always expressed as canonical, absolute, slash-separated paths; there are
+    no relative references. Any Unicode character can be used in a path subject
+    to the following constraints:</p>
+<ul>
+      
+<li>
+        
+<p>The null character (\u0000) cannot be part of a path name. (This
+        causes problems with the C binding.)</p>
+      
+</li>
+
+      
+<li>
+        
+<p>The following characters can't be used because they don't
+        display well, or render in confusing ways: \u0001 - \u0019 and \u007F
+        - \u009F.</p>
+      
+</li>
+
+      
+<li>
+        
+<p>The following characters are not allowed because <remark>[tbd:
+        do we need reasons?]</remark>: \ud800 - \uF8FF, \uFFF0 - \uFFFF, \uXFFFE -
+        \uXFFFF (where X is a digit 1 - E), \uF0000 - \uFFFFF.</p>
+      
+</li>
+
+      
+<li>
+        
+<p>The "." character can be used as part of another name, but "."
+        and ".." cannot alone make up the whole name of a path location,
+        because ZooKeeper doesn't use relative paths. The following would be
+        invalid: "/a/b/./c" or "/a/b/../c".</p>
+      
+</li>
+
+      
+<li>
+        
+<p>The token "zookeeper" is reserved.</p>
+      
+</li>
+    
+</ul>
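+<p>The constraints above can be sketched as a small validator. This is an illustrative sketch only, not ZooKeeper's own validation code: the class and method names are hypothetical, the ranges from the third bullet are left out, and treating "zookeeper" as reserved for a whole element name, and rejecting empty elements, are assumptions about what "reserved" and "canonical" mean here.</p>

```java
// Sketch of the path rules listed above; not ZooKeeper's own validation code.
// The class and method names are hypothetical.
final class ZnodePathCheck {

    // Returns null when the path passes every check, otherwise a short reason.
    static String firstProblem(String path) {
        if (path == null || path.isEmpty() || path.charAt(0) != '/') {
            return "path must be absolute";
        }
        if (path.equals("/")) {
            return null; // the root has no name elements to check
        }
        // Limit -1 keeps trailing empty strings, so "/a/" and "/a//b" are caught.
        for (String name : path.substring(1).split("/", -1)) {
            if (name.isEmpty()) {
                return "empty path element"; // assumption: canonical paths have none
            }
            if (name.equals(".") || name.equals("..")) {
                return "relative path element";
            }
            if (name.equals("zookeeper")) {
                return "reserved token"; // assumption: reserved as a whole element name
            }
            for (int i = 0; i < name.length(); i++) {
                char c = name.charAt(i);
                if (c == 0x0000) {
                    return "null character";
                }
                // U+0001 to U+0019 and U+007F to U+009F, as listed in the second bullet.
                if ((c >= 0x0001 && c <= 0x0019) || (c >= 0x007F && c <= 0x009F)) {
                    return "character that does not display well";
                }
            }
        }
        return null;
    }
}
```

+<p>For example, <code>ZnodePathCheck.firstProblem("/a/b/./c")</code> reports a relative path element, while "/a.b/c" passes because "." is only part of a longer name.</p>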
+<a name="N100AC"></a><a name="sc_zkDataModel_znodes"></a>
+<h3 class="h4">ZNodes</h3>
+<p>Every node in a ZooKeeper tree is referred to as a
+      <em>znode</em>. Znodes maintain a stat structure that
+      includes version numbers for data changes and ACL changes. The stat
+      structure also has timestamps. The version number, together with the
+      timestamp, allows ZooKeeper to validate the cache and to coordinate
+      updates. Each time a znode's data changes, the version number increases.
+      For instance, whenever a client retrieves data, it also receives the
+      version of the data. When a client performs an update or a delete,
+      it must supply the version of the data of the znode it is changing. If
+      the version it supplies doesn't match the actual version of the data,
+      the update will fail. (This behavior can be overridden. For more
+      information see... <remark>[tbd... reference here to the section
+      describing the special version number -1]</remark>)
+</p>
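+<p>The version check described above behaves like a compare-and-set. The following is a minimal in-memory sketch of that idea, not the ZooKeeper client API: the class <code>VersionedZnode</code> and its methods are hypothetical, and treating -1 as "match any version" is an assumption based on the remark about the special version number.</p>

```java
// In-memory model of one znode's data and data version, for illustration only.
// All names are hypothetical; this is not the ZooKeeper client API.
final class VersionedZnode {
    private byte[] data;
    private int version = 0;

    VersionedZnode(byte[] initial) {
        this.data = initial.clone();
    }

    synchronized int getVersion() {
        return version;
    }

    synchronized byte[] getData() {
        return data.clone();
    }

    // Applies the write only if expectedVersion matches the current version.
    // Assumption: -1 skips the check. Returns the new version number.
    synchronized int setData(byte[] newData, int expectedVersion) {
        if (expectedVersion != -1 && expectedVersion != version) {
            throw new IllegalStateException(
                "bad version: expected " + expectedVersion + ", actual " + version);
        }
        data = newData.clone();
        version = version + 1; // every data change increases the version
        return version;
    }
}
```

+<p>A client that read version 0 and writes with version 0 succeeds; a second writer still holding version 0 then fails and has to re-read before retrying.</p>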
+<div class="note">
+<div class="label">Note</div>
+<div class="content">
+        
+<p>In distributed application engineering, the word
+        <em>node</em> can refer to a generic host machine, a
+        server, a member of a quorum, a client process, etc. In the ZooKeeper
+        documentation, <em>znodes</em> refer to the data nodes;
+        <em>servers</em> refer to machines that make up the
+        ZooKeeper service; <em>quorum peers</em> refer to the
+        servers that make up a quorum; and <em>client</em> refers to any host or process
+        which uses a ZooKeeper service.</p>
+      
+</div>
+</div>
+<p>Znodes are the main entity that a programmer accesses. They have
+      several characteristics that are worth mentioning here.</p>
+<a name="N100CF"></a><a name="sc_zkDataMode_watches"></a>
+<h4>Watches</h4>
+<p>Clients can set watches on znodes. Changes to that znode trigger
+        the watch and then clear the watch. When a watch triggers, ZooKeeper
+        sends the client a notification. More information about watches can be
+        found in the section 
+        <a href="recipes.html#sc_recipes_Locks">
+        ZooKeeper Watches</a>.
+        <remark>[tbd: fix this link] [tbd: Ben there is note from to emphasize
+        that "it is queued". What is "it" and is what we have here
+        sufficient?]</remark>
+</p>
+<a name="N100DF"></a><a name="Data+Access"></a>
+<h4>Data Access</h4>
+<p>The data stored at each znode in a namespace is read and written
+        atomically. Reads get all the data bytes associated with a znode and a
+        write replaces all the data. Each node has an Access Control List
+        (ACL) that restricts who can do what.</p>
+<a name="N100E9"></a><a name="Ephemeral+Nodes"></a>
+<h4>Ephemeral Nodes</h4>
+<p>ZooKeeper also has the notion of ephemeral nodes. These znodes
+        exist as long as the session that created the znode is active. When
+        the session ends, the znode is deleted. Because of this behavior,
+        ephemeral znodes are not allowed to have children.</p>
+<a name="N100F3"></a><a name="Unique+Naming"></a>
+<h4>Unique Naming</h4>
+<p>Finally, when you create a znode, you can request that ZooKeeper
+        append a monotonically increasing counter to the path name
+        of the znode being created. This counter is unique to the parent
+        znode.</p>
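+<p>As a rough illustration of a counter that is unique to the parent znode, the sketch below keeps one counter per parent path and appends its next value to the requested name. The class <code>SequenceNamer</code> is hypothetical, and the ten-digit zero-padded suffix is an assumption made for the example rather than something this section specifies.</p>

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper that mimics per-parent sequence numbering in memory.
final class SequenceNamer {
    // Next counter value for each parent path.
    private final Map<String, Integer> nextByParent = new HashMap<>();

    // Returns requestedPath with the parent's next counter value appended.
    synchronized String nextName(String requestedPath) {
        int slash = requestedPath.lastIndexOf('/');
        String parent = slash <= 0 ? "/" : requestedPath.substring(0, slash);
        // merge() stores 1 on first use and adds 1 afterwards; subtract 1 to start at 0.
        int counter = nextByParent.merge(parent, 1, Integer::sum) - 1;
        // Assumption: zero-padded to ten digits so names sort in creation order.
        return requestedPath + String.format(Locale.ROOT, "%010d", counter);
    }
}
```

+<p>Because the counter belongs to the parent, two different name prefixes under the same parent still draw from one sequence.</p>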
+<a name="N100FE"></a><a name="sc_timeInZk"></a>
+<h3 class="h4">Time in ZooKeeper</h3>
+<p>ZooKeeper tracks time in multiple ways:</p>
+<ul>
+        
+<li>
+          
+<p>
+<strong>Zxid</strong>
+</p>
+
+          
+<p>Every change to the ZooKeeper state receives a stamp in the
+          form of a <em>zxid</em> (ZooKeeper Transaction Id).
+          This exposes the total ordering of all changes to ZooKeeper. Each
+          change will have a unique zxid and if zxid1 is smaller than zxid2
+          then zxid1 happened before zxid2.</p>
+        
+</li>
+
+        
+<li>
+          
+<p>
+<strong>Version numbers</strong>
+</p>
+
+          
+<p>Every change to a node will cause an increase to one of the
+          version numbers of that node. The three version numbers are version
+          (number of changes to the data of a znode), cversion (number of
+          changes to the children of a znode), and aversion (number of changes
+          to the ACL of a znode).</p>
+        
+</li>
+
+        
+<li>
+          
+<p>
+<strong>Ticks</strong>
+</p>
+
+          
+<p>When using multi-server ZooKeeper, servers use ticks to define
+          timing of events such as status uploads, session timeouts,
+          connection timeouts between peers, etc. The tick time is only
+          indirectly exposed through the minimum session timeout (2 times the
+          tick time); if a client requests a session timeout less than the
+          minimum session timeout, the server will tell the client that the
+          session timeout is actually the minimum session timeout.</p>
+        
+</li>
+
+        
+<li>
+          
+<p>
+<strong>Real time</strong>
+</p>
+
+          
+<p>ZooKeeper doesn't use real time, or clock time, at all except
+          to put timestamps into the stat structure on znode creation and
+          znode modification.</p>
+        
+</li>
+      
+</ul>
+<a name="N10136"></a><a name="sc_zkStatStructure"></a>
+<h3 class="h4">ZooKeeper Stat Structure</h3>
+<p>The Stat structure for each znode in ZooKeeper is made up of the
+      following fields:</p>
+<ul>
+        
+<li>
+          
+<p>
+<strong>czxid</strong>
+</p>
+
+          
+<p>The zxid of the change that caused this znode to be
+          created.</p>
+        
+</li>
+
+        
+<li>
+          
+<p>
+<strong>mzxid</strong>
+</p>
+
+          
+<p>The zxid of the change that last modified this znode.</p>
+        
+</li>
+
+        
+<li>
+          
+<p>
+<strong>ctime</strong>
+</p>
+
+          
+<p>The time in milliseconds from epoch when this znode was
+          created.</p>
+        
+</li>
+
+        
+<li>
+          
+<p>
+<strong>mtime</strong>
+</p>
+
+          
+<p>The time in milliseconds from epoch when this znode was last
+          modified.</p>
+        
+</li>
+
+        
+<li>
+          
+<p>
+<strong>version</strong>
+</p>
+
+          
+<p>The number of changes to the data of this znode.</p>
+        
+</li>
+
+        
+<li>
+          
+<p>
+<strong>cversion</strong>
+</p>
+
+          
+<p>The number of changes to the children of this znode.</p>
+        
+</li>
+
+        
+<li>
+          
+<p>
+<strong>aversion</strong>
+</p>
+
+          
+<p>The number of changes to the ACL of this znode.</p>
+        
+</li>
+
+        
+<li>
+          
+<p>
+<strong>ephemeralOwner</strong>
+</p>
+
+          
+<p>The session id of the owner of this znode if the znode is an
+          ephemeral node. If it is not an ephemeral node, it will be
+          zero.</p>
+        
+</li>
+      
+</ul>
+</div>
+
+  
+<a name="N10194"></a><a name="ZooKeeper+Sessions"></a>
+<h2 class="h3">ZooKeeper Sessions</h2>
+<div class="section">
+<p>When a client gets a handle to the ZooKeeper service, ZooKeeper
+    creates a ZooKeeper session, represented as a 64-bit number, that it
+    assigns to the client. If the client connects to a different ZooKeeper
+    server, it will send the session id as a part of the connection handshake.
+    As a security measure, the server creates a password for the session id
+    that any ZooKeeper server can validate. <remark>[tbd: note from Ben:
+    "perhaps capability is a better word." need clarification on that.]
+    </remark>The password is sent to the client with the session id when the
+    client establishes the session. The client sends this password with the
+    session id whenever it reestablishes the session with a new server.</p>
+<p>One of the parameters to the ZooKeeper client library call to create
+    a ZooKeeper session is the session timeout in milliseconds. The client
+    sends a requested timeout, and the server responds with the timeout that it
+    can give the client. The current implementation requires that the timeout
+    be between 2 times the tickTime (as set in the server configuration) and
+    60 seconds.</p>
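+<p>The timeout negotiation can be pictured as clamping the requested value into the allowed range. This is a sketch under stated assumptions: <code>SessionTimeouts.negotiate</code> is a hypothetical helper, and clamping a too-large request down to 60 seconds is assumed by symmetry with the documented minimum.</p>

```java
// Hypothetical helper showing how a requested session timeout could be bounded.
final class SessionTimeouts {
    // Upper bound stated in the text: 60 seconds.
    static final int MAX_TIMEOUT_MS = 60_000;

    // Returns the timeout the server would grant for a given request.
    static int negotiate(int requestedMs, int tickTimeMs) {
        int minTimeoutMs = 2 * tickTimeMs; // lower bound: 2 times the tick time
        int capped = Math.min(requestedMs, MAX_TIMEOUT_MS); // assumption: large requests are capped
        return Math.max(minTimeoutMs, capped);
    }
}
```

+<p>With a tickTime of 2000 ms, a request for 1000 ms is raised to 4000 ms, and a request for 10000 ms is granted unchanged.</p>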
+<p>The session is kept alive by requests sent by the client. If the
+    session is idle for a period of time that would time out the session, the
+    client will send a PING request to keep the session alive. This PING
+    request not only allows the ZooKeeper server to know that the client is
+    still active, but it also allows the client to verify that its connection
+    to the ZooKeeper server is still active. The timing of the PING is
+    conservative enough to ensure reasonable time to detect a dead connection
+    and reconnect to a new server.</p>
+</div>
+
+  
+<a name="N101A7"></a><a name="ZooKeeper+Watches"></a>
+<h2 class="h3">ZooKeeper Watches</h2>
+<div class="section">
+<p>All of the read operations in ZooKeeper - <strong>getData()</strong>, <strong>getChildren()</strong>, and <strong>exists()</strong> - have the option of setting a watch as a
+    side effect. Here is ZooKeeper's definition of a watch: a watch event is
+    a one-time trigger, sent to the client that set the watch, which occurs when
+    the data for which the watch was set changes. There are three key points
+    to consider in this definition of a watch:</p>
+<ul>
+      
+<li>
+        
+<p>
+<strong>One-time trigger</strong>
+</p>
+
+        
+<p>One watch event will be sent to the client when the data has changed.
+        For example, if a client does a getData("/znode1", true) and later the
+        data for /znode1 is changed or deleted, the client will get a watch
+        event for /znode1. If /znode1 changes again, no watch event will be
+        sent unless the client has done another read that sets a new
+        watch.</p>
+      
+</li>
+
+      
+<li>
+        
+<p>
+<strong>Sent to the client</strong>
+</p>
+
+        
+<p>This implies that an event is on the way to the client, but may
+        not reach the client before the successful return code to the change
+        operation reaches the client that initiated the change. Watches are
+        sent asynchronously to watchers. ZooKeeper provides an ordering
+        guarantee: a client will never see a change for which it has set a
+        watch until it first sees the watch event. Network delays or other
+        factors may cause different clients to see watches and return codes
+        from updates at different times. The key point is that everything seen
+        by the different clients will have a consistent order.</p>
+      
+</li>
+
+      
+<li>
+        
+<p>
+<strong>The data for which the watch was
+        set</strong>
+</p>
+
+        
+<p>This refers to the different ways a node can change. ZooKeeper
+        maintains two lists of watches: data watches and child watches.
+        getData() and exists() set data watches. getChildren() sets child
+        watches. Thus, setData() will trigger data watches for the znode being
+        set (assuming the set is successful). A successful create() will
+        trigger a data watch for the znode being created and a child watch for
+        the parent znode. A successful delete() will trigger both a data watch
+        and a child watch (since there can be no more children) for a znode
+        being deleted as well as a child watch for the parent znode.</p>
+      
+</li>
+    
+</ul>
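+<p>The one-time nature of watches and the split between data watches and child watches can be modelled in a few lines. The sketch below is an in-memory illustration only, with hypothetical names throughout; it is not how the server stores watches, only a way to see which operations fire which list.</p>

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical in-memory registry of one-shot data and child watches.
final class OneShotWatches {
    private final Map<String, List<Consumer<String>>> dataWatches = new HashMap<>();
    private final Map<String, List<Consumer<String>>> childWatches = new HashMap<>();

    void watchData(String path, Consumer<String> watcher) {
        add(dataWatches, path, watcher);
    }

    void watchChildren(String path, Consumer<String> watcher) {
        add(childWatches, path, watcher);
    }

    // setData fires the data watches of the znode being set.
    void onSetData(String path) {
        fire(dataWatches, path);
    }

    // create fires the new znode's data watches and the parent's child watches.
    void onCreate(String path) {
        fire(dataWatches, path);
        fire(childWatches, parentOf(path));
    }

    // delete fires the znode's data and child watches and the parent's child watches.
    void onDelete(String path) {
        fire(dataWatches, path);
        fire(childWatches, path);
        fire(childWatches, parentOf(path));
    }

    private static void add(Map<String, List<Consumer<String>>> watches,
                            String path, Consumer<String> watcher) {
        watches.computeIfAbsent(path, p -> new ArrayList<>()).add(watcher);
    }

    // Removing the list before calling the watchers is what makes each watch one-shot.
    private static void fire(Map<String, List<Consumer<String>>> watches, String path) {
        List<Consumer<String>> fired = watches.remove(path);
        if (fired != null) {
            for (Consumer<String> watcher : fired) {
                watcher.accept(path);
            }
        }
    }

    private static String parentOf(String path) {
        int slash = path.lastIndexOf('/');
        return slash <= 0 ? "/" : path.substring(0, slash);
    }
}
```

+<p>Calling <code>onSetData</code> twice on a watched path delivers a single event; the second change goes unnoticed until the watch is set again.</p>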
+<p>Watches are maintained locally at the ZooKeeper server to which the
+    client is connected. This allows watches to be lightweight to set,
+    maintain, and dispatch. It also means if a client connects to a different
+    server, the new server is not going to know about its watches. So, when a
+    client gets a disconnect event, it must consider that an implicit trigger
+    of all watches. When a client reconnects to a new server, the client
+    should re-set any watches that it is still interested in.</p>
+<a name="N101DD"></a><a name="sc_WatchGuarantees"></a>
+<h3 class="h4">What ZooKeeper Guarantees about Watches</h3>
+<p>With regard to watches, ZooKeeper maintains these
+      guarantees:</p>
+<ul>
+        
+<li>
+          
+<p>Watches are ordered with respect to other events, other
+          watches, and asynchronous replies. The ZooKeeper client libraries
+          ensure that everything is dispatched in order.</p>
+        
+</li>
+      
+</ul>
+<ul>
+        
+<li>
+          
+<p>A client will see a watch event for a znode it is watching
+          before seeing the new data that corresponds to that znode.</p>
+        
+</li>
+      
+</ul>
+<ul>
+        
+<li>
+          
+<p>The order of watch events from ZooKeeper corresponds to the
+          order of the updates as seen by the ZooKeeper service.</p>
+        
+</li>
+      
+</ul>
+<a name="N10202"></a><a name="sc_WatchRememberThese"></a>
+<h3 class="h4">Things to Remember about Watches</h3>
+<ul>
+        
+<li>
+          
+<p>Watches are one-time triggers; if you get a watch event and
+          you want to get notified of future changes, you must set another
+          watch.</p>
+        
+</li>
+      
+</ul>
+<ul>
+        
+<li>
+          
+<p>Because watches are one-time triggers and there is latency
+          between getting the event and sending a new request to get a watch,
+          you cannot reliably see every change that happens to a node in
+          ZooKeeper. Be prepared to handle the case where the znode changes
+          multiple times between getting the event and setting the watch
+          again. (You may not care, but at least realize it may
+          happen.)</p>
+        
+</li>
+      
+</ul>
+<ul>
+        
+<li>
+          
+<p>When you disconnect from a server (for example, when the
+          server fails), all of the watches you have registered are lost, so
+          you should treat this case as if all your watches were
+          triggered.</p>
+        
+</li>
+      
+</ul>
+</div>
+
+  
+<a name="N10225"></a><a name="Consistency+Guarantees"></a>
+<h2 class="h3">Consistency Guarantees</h2>
+<div class="section">
+<p>ZooKeeper is a high-performance, scalable service. Both read and
+    write operations are designed to be fast, though reads are faster than
+    writes. The reason for this is that in the case of reads, ZooKeeper can
+    serve older data, which in turn is due to ZooKeeper's consistency
+    guarantees:</p>
+<dl>
+      
+<dt>
+<term>Sequential Consistency</term>
+</dt>
+<dd>
+<p>Updates from a client will be applied in the order that they
+          were sent.</p>
+</dd>
+
+      
+<dt>
+<term>Atomicity</term>
+</dt>
+<dd>
+<p>Updates either succeed or fail -- there are no partial
+          results.</p>
+</dd>
+
+      
+<dt>
+<term>Single System Image</term>
+</dt>
+<dd>
+<p>A client will see the same view of the service regardless of
+          the server that it connects to.</p>
+</dd>
+
+      
+<dt>
+<term>Reliability</term>
+</dt>
+<dd>
+<p>Once an update has been applied, it will persist from that
+          time forward until a client overwrites the update. This guarantee
+          has two corollaries:</p>
+<ol>
+            
+<li>
+              
+<p>If a client gets a successful return code, the update will
+              have been applied. On some failures (communication errors,
+              timeouts, etc.) the client will not know whether the update has
+              been applied. We take steps to minimize the failures, but the
+              guarantee is only present with successful return codes.
+              (This is called the <em>monotonicity condition</em> in Paxos.)</p>
+            
+</li>
+
+            
+<li>
+              
+<p>Any updates that are seen by the client, through a read
+              request or successful update, will never be rolled back when
+              recovering from server failures.</p>
+            
+</li>
+          
+</ol>
+</dd>
+
+      
+<dt>
+<term>Timeliness</term>
+</dt>
+<dd>
+<p>The client's view of the system is guaranteed to be up-to-date
+          within a certain time bound. (On the order of tens of seconds.)
+          Either system changes will be seen by a client within this bound, or
+          the client will detect a service outage.</p>
+</dd>
+    
+</dl>
+<p>Using these consistency guarantees it is easy to build higher level
+    functions such as leader election, barriers, queues, and read/write
+    revocable locks solely at the ZooKeeper client (no additions needed to
+    ZooKeeper). See <a href="recipes.html">Recipes and Solutions</a>
+    for more details.</p>
+<p>
+<div class="note">
+<div class="label">Note</div>
+<div class="content">
+        
+<p>Sometimes developers mistakenly assume one other guarantee that
+        ZooKeeper does <em>not</em> in fact make. This is:</p>
+
+        
+<dl>
+          
+<dt>
+<term>Simultaneously Consistent Cross-Client Views</term>
+</dt>
+<dd>
+<p>ZooKeeper does not guarantee that at every instant in
+              time, two different clients will have identical views of
+              ZooKeeper data. Due to factors like network delays, one client
+              may perform an update before another client gets notified of the
+              change. Consider the scenario of two clients, A and B. If client
+              A sets the value of a znode /a from 0 to 1, then tells client B
+              to read /a, client B may read the old value of 0, depending on
+              which server in the ZooKeeper quorum it is connected to. If it
+              is important that Client A and Client B read the same value,
+              Client B should call the <strong>sync()</strong> method from the ZooKeeper API
+              before it performs its read.</p>
+<p>So, ZooKeeper by itself doesn't guarantee instantaneous,
+              atomic synchronization across its quorum, but ZooKeeper
+              primitives can be used to construct higher level functions that
+              provide complete client synchronization. (For more information,
+              see <a href="recipes.html#sc_recipes_Locks">Locks</a>
+              
+<remark>[tbd: fix final link target]</remark> in <a href="recipes.html">ZooKeeper Recipes</a>.
+              <remark>[tbd: fix final link target]</remark>).</p>
+</dd>
+        
+</dl>
+      
+</div>
+</div>
+</p>
+</div>
+
+  
+<a name="N10291"></a><a name="Bindings"></a>
+<h2 class="h3">Bindings</h2>
+<div class="section">
+<p>The ZooKeeper client libraries come in two languages: Java and C.
+    The following sections describe these.</p>
+<a name="N1029A"></a><a name="Java+Binding"></a>
+<h3 class="h4">Java Binding</h3>
+<p>There are two packages that make up the ZooKeeper Java binding:
+      <strong>org.apache.zookeeper</strong> and <strong>org.apache.zookeeper.data</strong>. The rest of the
+      packages that make up ZooKeeper are used internally or are part of the
+      server implementation. The <strong>org.apache.zookeeper.data</strong> package is made up of
+      generated classes that are used simply as containers.</p>
+<p>The main class used by a ZooKeeper Java client is the <strong>ZooKeeper</strong> class. Its two constructors differ only
+      by an optional session id and password. ZooKeeper supports session
+      recovery across instances of a process. A Java program may save its
+      session id and password to stable storage, restart, and recover the
+      session that was used by the earlier instance of the program.</p>
+<p>When a ZooKeeper object is created, two threads are created as
+      well: an IO thread and an event thread. All IO happens on the IO thread
+      (using Java NIO). All event callbacks happen on the event thread.
+      Session maintenance such as reconnecting to ZooKeeper servers and
+      maintaining heartbeat is done on the IO thread. Responses for
+      synchronous methods are also processed in the IO thread. All responses
+      to asynchronous methods and watch events are processed on the event
+      thread. There are a few things to notice that result from this
+      design:</p>
+<ul>
+        
+<li>
+          
+<p>All completions for asynchronous calls and watcher callbacks
+          will be made in order, one at a time. The caller can do any
+          processing they wish, but no other callbacks will be processed
+          during that time.</p>
+        
+</li>
+
+        
+<li>
+          
+<p>Callbacks do not block the processing of the IO thread or the
+          processing of the synchronous calls.</p>
+        
+</li>
+
+        
+<li>
+          
+<p>Synchronous calls may not return in the correct order. For
+          example, assume a client does the following processing: issues an
+          asynchronous read of node <strong>/a</strong> with
+          <em>watch</em> set to true, and then in the completion
+          callback of the read it does a synchronous read of <strong>/a</strong>. (Maybe not good practice, but not illegal
+          either, and it makes for a simple example.)</p>
+
+          
+<p>Note that if there is a change to <strong>/a</strong> between the asynchronous read and the
+          synchronous read, the client library will receive the watch event
+          saying <strong>/a</strong> changed before the
+          response for the synchronous read, but because the completion
+          callback is blocking the event queue, the synchronous read will
+          return with the new value of <strong>/a</strong>
+          before the watch event is processed.</p>
+        
+</li>
+      
+</ul>
+<p>Finally, the rules associated with shutdown are straightforward:
+      once a ZooKeeper object is closed or receives a fatal event
+      (SESSION_EXPIRED and AUTH_FAILED), the ZooKeeper object becomes invalid,
+      the two threads shut down, and any further ZooKeeper calls throw
+      errors.</p>
+<a name="N102E3"></a><a name="C+Binding"></a>
+<h3 class="h4">C Binding</h3>
+<p>The C binding has a single-threaded and a multi-threaded library.
+      The multi-threaded library is easiest to use and is most similar to the
+      Java API. This library will create an IO thread and an event dispatch
+      thread for handling connection maintenance and callbacks. The
+      single-threaded library allows ZooKeeper to be used in event driven
+      applications by exposing the event loop used in the multi-threaded
+      library.</p>
+<p>The package includes two shared libraries: zookeeper_st and
+      zookeeper_mt. The former only provides the asynchronous APIs and
+      callbacks for integrating into the application's event loop. The only
+      reason this library exists is to support platforms where a
+      <em>pthread</em> library is not available or is unstable
+      (e.g. FreeBSD 4.x). In all other cases, application developers should
+      link with zookeeper_mt, as it includes support for both the Sync and Async
+      APIs.</p>
+<a name="N102F2"></a><a name="Installation"></a>
+<h4>Installation</h4>
+<p>If you're building the client from a check-out from the Apache
+        repository, follow the steps outlined below. If you're building from a
+        project source package downloaded from Apache, skip to step <strong>3</strong>.</p>
+<ol>
+          
+<li>
+            
+<p>Run <span class="codefrag command">ant compile_just</span> from the zookeeper
+            top level directory (<span class="codefrag filename">.../trunk/zookeeper</span>).
+            This will create a directory named "generated" under
+            <span class="codefrag filename">zookeeper/c</span>.</p>
+          
+</li>
+
+          
+<li>
+            
+<p>Change directory to <span class="codefrag filename">zookeeper/c</span> and
+            run <span class="codefrag command">autoreconf -i</span> to bootstrap <strong>autoconf</strong>, <strong>automake</strong> and <strong>libtool</strong>. Make sure you have <strong>autoconf version 2.59</strong> or greater installed.
+            Skip to step <strong>4</strong>.</p>
+          
+</li>
+
+          
+<li>
+            
+<p>If you are building from a project source package,
+            unzip/untar the source tarball and cd to the <span class="codefrag filename">zookeeper-x.x.x/</span> directory.</p>
+          
+</li>
+
+          
+<li>
+            
+<p>Run <span class="codefrag command">./configure &lt;your-options&gt;</span> to
+            generate the makefile. Here are some of the options the <strong>configure</strong> utility supports that can be
+            useful in this step:</p>
+
+            
+<ul>
+              
+<li>
+                
+<p>
+<span class="codefrag command">--enable-debug</span>
+</p>
+
+                
+<p>Enables optimization and enables debug info compiler
+                options. (Disabled by default.)</p>
+              
+</li>
+
+              
+<li>
+                
+<p>
+<span class="codefrag command">--without-syncapi </span>
+</p>
+
+                
+<p>Disables Sync API support; zookeeper_mt library won't be
+                built. (Enabled by default.)</p>
+              
+</li>
+
+              
+<li>
+                
+<p>
+<span class="codefrag command">--disable-static </span>
+</p>
+
+                
+<p>Do not build static libraries. (Enabled by
+                default.)</p>
+              
+</li>
+
+              
+<li>
+                
+<p>
+<span class="codefrag command">--disable-shared</span>
+</p>
+
+                
+<p>Do not build shared libraries. (Enabled by
+                default.)</p>
+              
+</li>
+            
+</ul>
+
+            
+<div class="note">
+<div class="label">Note</div>
+<div class="content">
+              
+<p>See INSTALL for general information about running
+              <strong>configure</strong>. <remark>[tbd: what
+              is INSTALL? a directory? a file?]</remark>
+</p>
+            
+</div>
+</div>
+          
+</li>
+
+          
+<li>
+            
+<p>Run <span class="codefrag command">make</span> or <span class="codefrag command">make
+            install</span> to build the libraries and install them.</p>
+          
+</li>
+
+          
+<li>
+            
+<p>To generate doxygen documentation for the ZooKeeper API, run
+            <span class="codefrag command">make doxygen-doc</span>. All documentation will be
+            placed in a new subfolder named docs. By default, this command
+            only generates HTML. For information on other document formats,
+            run <span class="codefrag command">./configure --help</span>
+</p>
+          
+</li>
+        
+</ol>
+<a name="N1039D"></a><a name="Using+the+Client"></a>
+<h4>Using the Client</h4>
+<p>You can test your client by running a zookeeper server (see
+        instructions on the project wiki page on how to run it) and connecting
+        to it using one of the cli applications that were built as part of the
+        installation procedure. cli_mt (multithreaded, built against
+        zookeeper_mt library) is shown in this example, but you could also use
+        cli_st (singlethreaded, built against zookeeper_st library):</p>
+<p>
+<pre class="code">$ cli_mt zookeeper_host:9876</pre>This
+        is a client application that gives you a shell for executing simple
+        zookeeper commands. Once successfully started and connected to the
+        server it displays a shell prompt. You can now enter zookeeper
+        commands. For example, to create a node:</p>
+<pre class="code">&gt; create /my_new_node</pre>
+<p>To verify that the node's been created:</p>
+<p>You should see a list of nodes that are children of the root node
+        "/". <remark>[tbd: document all the cli commands (I think this is
+        Ben's tbd? It's from sourceforge)]</remark>
+</p>
+<p>In order to be able to use the ZooKeeper API in your application
+        you have to remember to</p>
+<ol>
+          
+<li>
+            
+<p>Include the ZooKeeper header: #include
+            &lt;zookeeper/zookeeper.h&gt;</p>
+          
+</li>
+
+          
+<li>
+            
+<p>If you are building a multithreaded client, compile with the
+            -DTHREADED compiler flag to enable the multi-threaded version of
+            the library, and then link against the
+            <span class="codefrag varname">zookeeper_mt</span> library. If you are building a
+            single-threaded client, do not compile with -DTHREADED, and be
+            sure to link against the <span class="codefrag varname">zookeeper_st</span> library.</p>
+          
+</li>
+        
+</ol>
+<p>Refer to <a href="#ch_programStructureWithExample">Program Structure, with Simple Example</a> for examples of usage in Java and C.
+        <remark>[tbd: some kind of short tutorial would be helpful, I guess
+        (ben's tbd?) ][tbd: whatever the case, make sure that link points to something.]</remark>
+</p>
+</div>
+
+   
+<a name="N103DC"></a><a name="Building+Blocks%3A+A+Guide+to+ZooKeeper+Operations"></a>
+<h2 class="h3">Building Blocks: A Guide to ZooKeeper Operations</h2>
+<div class="section">
+<p>
+<remark>[Engineering input needed. This is a new section. The below
+    is just placeholder, and was actually copied from the overview book. There
+    should probably be a subsection on each of those operations, with a little
+    bit of illustrative code for each op.] </remark>
+</p>
+<p>One of the design goals of ZooKeeper is to provide a very simple
+    programming interface. As a result, it supports only these
+    operations:</p>
+<dl>
+      
+<dt>
+<term>create</term>
+</dt>
+<dd>
+<p>creates a node at a location in the tree</p>
+</dd>
+
+      
+<dt>
+<term>delete</term>
+</dt>
+<dd>
+<p>deletes a node</p>
+</dd>
+
+      
+<dt>
+<term>exists</term>
+</dt>
+<dd>
+<p>tests if a node exists at a location</p>
+</dd>
+
+      
+<dt>
+<term>get data</term>
+</dt>
+<dd>
+<p>reads the data from a node</p>
+</dd>
+
+      
+<dt>
+<term>set data</term>
+</dt>
+<dd>
+<p>writes data to a node</p>
+</dd>
+
+      
+<dt>
+<term>get children</term>
+</dt>
+<dd>
+<p>retrieves a list of children of a node</p>
+</dd>
+
+      
+<dt>
+<term>sync</term>
+</dt>
+<dd>
+<p>waits for data to be propagated.</p>
+</dd>
+    
+</dl>
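+<p>To make the shape of these operations concrete, here is a toy, single-process tree with one method per operation. It is a sketch for orientation only and not the ZooKeeper client API: the class <code>ToyTree</code> is hypothetical, there is no sync because nothing is replicated, and the rules that a parent must exist before a child is created and that only childless nodes can be deleted are assumptions of this model.</p>

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy single-process tree that mimics the operations listed above.
// Every name here is hypothetical; this is not the ZooKeeper client API.
final class ToyTree {
    // Maps each absolute path to its data; TreeMap keeps children in sorted order.
    private final TreeMap<String, byte[]> nodes = new TreeMap<>();

    ToyTree() {
        nodes.put("/", new byte[0]); // the root always exists
    }

    void create(String path, byte[] data) {
        if (nodes.containsKey(path)) {
            throw new IllegalStateException("node exists: " + path);
        }
        if (!nodes.containsKey(parentOf(path))) {
            throw new IllegalStateException("no parent for: " + path); // assumption of this model
        }
        nodes.put(path, data.clone());
    }

    void delete(String path) {
        if (path.equals("/")) {
            throw new IllegalStateException("cannot delete the root");
        }
        if (!getChildren(path).isEmpty()) {
            throw new IllegalStateException("node has children: " + path); // assumption of this model
        }
        nodes.remove(path);
    }

    boolean exists(String path) {
        return nodes.containsKey(path);
    }

    byte[] getData(String path) {
        return require(path).clone();
    }

    // A write replaces all of the node's data.
    void setData(String path, byte[] data) {
        require(path);
        nodes.put(path, data.clone());
    }

    // Returns the names (not full paths) of the direct children of path.
    List<String> getChildren(String path) {
        require(path);
        List<String> children = new ArrayList<>();
        for (String candidate : nodes.keySet()) {
            if (!candidate.equals(path) && parentOf(candidate).equals(path)) {
                children.add(candidate.substring(candidate.lastIndexOf('/') + 1));
            }
        }
        return children;
    }

    private byte[] require(String path) {
        byte[] data = nodes.get(path);
        if (data == null) {
            throw new IllegalStateException("no node: " + path);
        }
        return data;
    }

    private static String parentOf(String path) {
        int slash = path.lastIndexOf('/');
        return slash <= 0 ? "/" : path.substring(0, slash);
    }
}
```

+<p>A typical sequence is create, then setData and getData, then getChildren on the parent, and finally delete once the children are gone.</p>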
+</div>
+  
+  
+<a name="N1041E"></a><a name="Program+Structure%2C+with+Simple+Example"></a>
+<h2 class="h3">Program Structure, with Simple Example</h2>
+<div class="section">
+<p>
+<remark>[tbd]</remark>
+</p>
+</div>
+
+  
+<a name="N10429"></a><a name="Gotchas%3A+Common+Problems+and+Troubleshooting"></a>
+<h2 class="h3">Gotchas: Common Problems and Troubleshooting</h2>
+<div class="section">
+<p>So now you know ZooKeeper. It's fast, simple, your application
+    works, but wait ... something's wrong. Here are some pitfalls that
+    ZooKeeper users fall into:</p>
+<ol>
+      
+<li>
+        
+<p>If you are using watches, you must look for the connected watch
+        event. When a ZooKeeper client disconnects from a server, all the
+        watches are removed, so a client must treat the disconnect event as an
+        implicit trigger of watches. The easiest way to deal with this is to
+        act like the connected watch event is a watch trigger for all your
+        watches. The connected event makes a better trigger than the
+        disconnected event because you can access ZooKeeper and reestablish
+        watches when you are connected.</p>
+      
+</li>
+
+      
+<li>
+        
+<p>You must test ZooKeeper server failures. The ZooKeeper service
+        can survive failures as long as a majority of servers are active. The
+        question to ask is: can your application handle it? In the real world
+        a client's connection to ZooKeeper can break. (ZooKeeper server
+        failures and network partitions are common reasons for connection
+        loss.) The ZooKeeper client library takes care of recovering your
+        connection and letting you know what happened, but you must make sure
+        that you recover your state and any outstanding requests that failed.
+        Find out if you got it right in the test lab, not in production - test
+        with a ZooKeeper service made up of several servers, and subject
+        them to reboots.</p>
+      
+</li>
+
+      
+<li>
+        
+<p>The list of ZooKeeper servers used by the client must match the
+        list of ZooKeeper servers that each ZooKeeper server has. Things can
+        work, although not optimally, if the client list is a subset of the
+        real list of ZooKeeper servers, but not if the client lists ZooKeeper
+        servers not in the ZooKeeper cluster.</p>
+      
+</li>
+
+      
+<li>
+        
+<p>Be careful where you put that transaction log. The most
+        performance-critical part of ZooKeeper is the transaction log.
+        ZooKeeper must sync transactions to media before it returns a
+        response. A dedicated transaction log device is key to consistent good
+        performance. Putting the log on a busy device will adversely affect
+        performance. If you only have one storage device, put trace files on
+        NFS and increase the snapshotCount; it doesn't eliminate the problem,
+        but it can mitigate it.</p>
+      
+</li>
+
+      
+<li>
+        
+<p>Set your Java max heap size correctly. It is very important to
+        <em>avoid swapping.</em> Going to disk unnecessarily will
+        almost certainly degrade your performance unacceptably. Remember, in
+        ZooKeeper, everything is ordered, so if one request hits the disk, all
+        other queued requests hit the disk.</p>
+
+        
+<p>To avoid swapping, try to set the heap size to the amount of
+        physical memory you have, minus the amount needed by the OS and cache.
+        The best way to determine an optimal heap size for your configuration
+        is to <em>run load tests</em>. If for some reason you
+        can't, be conservative in your estimates and choose a number well
+        below the limit that would cause your machine to swap. For example, on
+        a 4G machine, a 3G heap is a conservative estimate to start
+        with.</p>
+      
+</li>
+    
+</ol>
+</div>
+
+  
+<a name="apx_linksToOtherInfo"></a>
+<appendix id="apx_linksToOtherInfo">
+    
+<title>Links to Other Information</title>
+
+    
+<p>Outside the formal documentation, there are several other sources of
+    information for ZooKeeper developers.</p>
+
+    
+<dl>
+      
+<dt>
+<term>ZooKeeper Whitepaper <remark>[tbd: find url]</remark>
+</term>
+</dt>
+<dd>
+<p>The definitive discussion of ZooKeeper design and performance,
+          by Yahoo! Research</p>
+</dd>
+
+      
+<dt>
+<term>API Reference <remark>[tbd: find url]</remark>
+</term>
+</dt>
+<dd>
+<p>The complete reference to the ZooKeeper API</p>
+</dd>
+
+      
+<dt>
+<term>
+<a href="http://us.dl1.yimg.com/download.yahoo.com/dl/ydn/zookeeper.m4v">ZooKeeper
+        Talk at the Hadoop Summit 2008</a>
+</term>
+</dt>
+<dd>
+<p>A video introduction to ZooKeeper, by Benjamin Reed of Yahoo!
+          Research</p>
+</dd>
+
+      
+<dt>
+<term>
+<a href="http://wiki.apache.org/hadoop/ZooKeeper/Tutorial">Barrier and
+        Queue Tutorial</a>
+</term>
+</dt>
+<dd>
+<p>The excellent Java tutorial by Flavio Junqueira, implementing
+          simple barriers and producer-consumer queues using ZooKeeper.</p>
+</dd>
+
+      
+<dt>
+<term>
+<a href="http://wiki.apache.org/hadoop/ZooKeeper/ZooKeeperArticles">ZooKeeper
+        - A Reliable, Scalable Distributed Coordination System</a>
+</term>
+</dt>
+<dd>
+<p>An article by Todd Hoff (07/15/2008)</p>
+</dd>
+
+      
+<dt>
+<term>
+<a href="recipes.html">ZooKeeper Recipes [tbd: fix
+        linkend for apache site]</a>
+</term>
+</dt>
+<dd>
+<p>Pseudocode-level discussion of the implementation of various
+          synchronization solutions with ZooKeeper: Event Handles, Queues,
+          Locks, and Two-phase Commits.</p>
+</dd>
+
+      
+<dt>
+<term>
+<remark>[tbd]</remark>
+</term>
+</dt>
+<dd>
+<p>Whatever good sources anyone can think of...</p>
+</dd>
+    
+</dl>
+  
+</appendix>
+
+<p align="right">
+<font size="-2"></font>
+</p>
+</div>
+<!--+
+    |end content
+    +-->
+<div class="clearboth">&nbsp;</div>
+</div>
+<div id="footer">
+<!--+
+    |start bottomstrip
+    +-->
+<div class="lastmodified">
+<script type="text/javascript"><!--
+document.write("Last Published: " + document.lastModified);
+//  --></script>
+</div>
+<div class="copyright">
+        Copyright &copy;
+         2008 <a href="http://www.apache.org/licenses/">The Apache Software Foundation.</a>
+</div>
+<!--+
+    |end bottomstrip
+    +-->
+</div>
+</body>
+</html>

File diff suppressed because it is too large
+ 195 - 0
docs/zookeeperProgrammers.pdf


+ 446 - 0
docs/zookeeperStarted.html

@@ -0,0 +1,446 @@
+<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
+<html>
+<head>
+<META http-equiv="Content-Type" content="text/html; charset=UTF-8">
+<meta content="Apache Forrest" name="Generator">
+<meta name="Forrest-version" content="0.8">
+<meta name="Forrest-skin-name" content="pelt">
+<title></title>
+<link type="text/css" href="skin/basic.css" rel="stylesheet">
+<link media="screen" type="text/css" href="skin/screen.css" rel="stylesheet">
+<link media="print" type="text/css" href="skin/print.css" rel="stylesheet">
+<link type="text/css" href="skin/profile.css" rel="stylesheet">
+<script src="skin/getBlank.js" language="javascript" type="text/javascript"></script><script src="skin/getMenu.js" language="javascript" type="text/javascript"></script><script src="skin/fontsize.js" language="javascript" type="text/javascript"></script>
+<link rel="shortcut icon" href="images/favicon.ico">
+</head>
+<body onload="init()">
+<script type="text/javascript">ndeSetTextSize();</script>
+<div id="top">
+<!--+
+    |breadtrail
+    +-->
+<div class="breadtrail">
+<a href="http://www.apache.org/">Apache</a> &gt; <a href="http://hadoop.apache.org/">Hadoop</a> &gt; <a href="http://hadoop.apache.org/zookeeper/">ZooKeeper</a><script src="skin/breadcrumbs.js" language="JavaScript" type="text/javascript"></script>
+</div>
+<!--+
+    |header
+    +-->
+<div class="header">
+<!--+
+    |start group logo
+    +-->
+<div class="grouplogo">
+<a href="http://hadoop.apache.org/"><img class="logoImage" alt="Hadoop" src="images/hadoop-logo.jpg" title="Apache Hadoop"></a>
+</div>
+<!--+
+    |end group logo
+    +-->
+<!--+
+    |start Project Logo
+    +-->
+<div class="projectlogo">
+<a href="http://hadoop.apache.org/zookeeper/"><img class="logoImage" alt="ZooKeeper" src="images/zookeeper_small.gif" title="The Hadoop database"></a>
+</div>
+<!--+
+    |end Project Logo
+    +-->
+<!--+
+    |start Search
+    +-->
+<div class="searchbox">
+<form action="http://www.google.com/search" method="get" class="roundtopsmall">
+<input value="hadoop.apache.org" name="sitesearch" type="hidden"><input onFocus="getBlank (this, 'Search the site with google');" size="25" name="q" id="query" type="text" value="Search the site with google">&nbsp; 
+                    <input name="Search" value="Search" type="submit">
+</form>
+</div>
+<!--+
+    |end search
+    +-->
+<!--+
+    |start Tabs
+    +-->
+<ul id="tabs">
+<li>
+<a class="unselected" href="http://hadoop.apache.org/zookeeper/">Project</a>
+</li>
+<li>
+<a class="unselected" href="http://wiki.apache.org/hadoop/ZooKeeper">Wiki</a>
+</li>
+<li class="current">
+<a class="selected" href="index.html">ZooKeeper Documentation</a>
+</li>
+</ul>
+<!--+
+    |end Tabs
+    +-->
+</div>
+</div>
+<div id="main">
+<div id="publishedStrip">
+<!--+
+    |start Subtabs
+    +-->
+<div id="level2tabs"></div>
+<!--+
+    |end Endtabs
+    +-->
+<script type="text/javascript"><!--
+document.write("Last Published: " + document.lastModified);
+//  --></script>
+</div>
+<!--+
+    |breadtrail
+    +-->
+<div class="breadtrail">
+
+             &nbsp;
+           </div>
+<!--+
+    |start Menu, mainarea
+    +-->
+<!--+
+    |start Menu
+    +-->
+<div id="menu">
+<div onclick="SwitchMenu('menu_selected_1.1', 'skin/')" id="menu_selected_1.1Title" class="menutitle" style="background-image: url('skin/images/chapter_open.gif');">Documentation</div>
+<div id="menu_selected_1.1" class="selectedmenuitemgroup" style="display: block;">
+<div class="menuitem">
+<a href="index.html">Welcome</a>
+</div>
+<div class="menuitem">
+<a href="zookeeperOver.html">Zookeeper Overview</a>
+</div>
+<div class="menupage">
+<div class="menupagetitle">Getting Started</div>
+</div>
+<div class="menuitem">
+<a href="zookeeperProgrammers.html">Programmer's Guide</a>
+</div>
+<div class="menuitem">
+<a href="recipes.html">Recipes</a>
+</div>
+<div class="menuitem">
+<a href="zookeeperAdmin.html">Administrator's Guide</a>
+</div>
+<div class="menuitem">
+<a href="api/index.html">API Docs</a>
+</div>
+<div class="menuitem">
+<a href="http://wiki.apache.org/hadoop/ZooKeeper">Wiki</a>
+</div>
+<div class="menuitem">
+<a href="http://wiki.apache.org/hadoop/ZooKeeper/FAQ">FAQ</a>
+</div>
+<div class="menuitem">
+<a href="http://hadoop.apache.org/zookeeper/mailing_lists.html">Mailing Lists</a>
+</div>
+<div class="menuitem">
+<a href="zookeeperOtherInfo.html">Other Info</a>
+</div>
+</div>
+<div id="credit"></div>
+<div id="roundbottom">
+<img style="display: none" class="corner" height="15" width="15" alt="" src="skin/images/rc-b-l-15-1body-2menu-3menu.png"></div>
+<!--+
+  |alternative credits
+  +-->
+<div id="credit2"></div>
+</div>
+<!--+
+    |end Menu
+    +-->
+<!--+
+    |start content
+    +-->
+<div id="content">
+<div title="Portable Document Format" class="pdflink">
+<a class="dida" href="zookeeperStarted.pdf"><img alt="PDF -icon" src="skin/images/pdfdoc.gif" class="skin"><br>
+        PDF</a>
+</div>
+<div id="minitoc-area">
+<ul class="minitoc">
+<li>
+<a href="#Getting+Started%3A+Coordinating+Distributed+Applications+with%0A++++++ZooKeeper">Getting Started: Coordinating Distributed Applications with
+      ZooKeeper</a>
+<ul class="minitoc">
+<li>
+<a href="#sc_InstallingSingleMode">Installing and Running ZooKeeper in Single Server Mode</a>
+</li>
+<li>
+<a href="#sc_ConnectingToZooKeeper">Connecting to ZooKeeper</a>
+</li>
+<li>
+<a href="#sc_ProgrammingToZooKeeper">Programming to ZooKeeper</a>
+</li>
+<li>
+<a href="#sc_RunningReplicatedZooKeeper">Running Replicated ZooKeeper</a>
+</li>
+<li>
+<a href="#Other+Optimizations">Other Optimizations</a>
+</li>
+</ul>
+</li>
+</ul>
+</div>
+  
+<title>ZooKeeper Getting Started Guide</title>
+
+  
+
+  
+<a name="N1000A"></a><a name="Getting+Started%3A+Coordinating+Distributed+Applications+with%0A++++++ZooKeeper"></a>
+<h2 class="h3">Getting Started: Coordinating Distributed Applications with
+      ZooKeeper</h2>
+<div class="section">
+<p>This document contains information to get you started quickly with
+    ZooKeeper. It is aimed primarily at developers hoping to try it out, and
+    contains simple installation instructions for a single ZooKeeper server, a
+    few commands to verify that it is running, and a simple programming
+    example. Finally, as a convenience, there are a few sections regarding
+    more complicated installations, for example running replicated
+    deployments, and optimizing the transaction log. However, for complete
+    instructions for commercial deployments, please refer to the <a href="zookeeperAdmin.html">ZooKeeper
+    Administrator's Guide</a>.</p>
+<a name="N10017"></a><a name="sc_InstallingSingleMode"></a>
+<h3 class="h4">Installing and Running ZooKeeper in Single Server Mode</h3>
+<p>Setting up a ZooKeeper server in standalone mode is
+      straightforward. The server is contained in a single JAR file, so
+      installation consists of copying a JAR file and creating a
+      configuration.</p>
+<div class="note">
+<div class="label">Note</div>
+<div class="content">
+        
+<p>ZooKeeper requires Java 1.5 or later.</p>
+      
+</div>
+</div>
+<p>[tbd: should we start w/ a word here about where to get the source,
+      exactly what to download, how to unpack it, and where to put it? Also,
+      does the user need to be in sudo, or can they be under their regular
+      login?]</p>
+<p>Once you have downloaded the ZooKeeper source, cd to the root of
+      your ZooKeeper source, and run "ant jar". For example:<pre class="code">$ cd ~/dev/zookeeper
+$ ant jar</pre>
+</p>
+<p>This should generate a JAR file called zookeeper.jar. To start
+      ZooKeeper, compile and run zookeeper.jar. <em>[tbd, some more
+      instruction here. Perhaps a command line? Are these two steps or
+      one?]</em>
+</p>
+<p>To start ZooKeeper you need a configuration file. Here is a sample
+      file:</p>
+<p>
+<pre class="code">tickTime=2000
+dataDir=/var/zookeeper/ 
+clientPort=2181
+</pre>
+</p>
+<p>This file can be called anything, but for the sake of this
+      discussion, call it <strong>zoo.cfg</strong>. Here are
+      the meanings for each of the fields:</p>
+<dl>
+        
+<dt>
+<term>
+<strong>tickTime</strong>
+</term>
+</dt>
+<dd>
+<p>the basic time unit, in milliseconds, used by ZooKeeper. It is
+            used for heartbeats, and the minimum session timeout is
+            twice the tickTime.</p>
+</dd>
+      
+</dl>
+<dl>
+        
+<dt>
+<term>
+<strong>dataDir</strong>
+</term>
+</dt>
+<dd>
+<p>the location to store the in-memory database snapshots and,
+            unless specified otherwise, the transaction log of updates to the
+            database.</p>
+</dd>
+
+        
+<dt>
+<term>
+<strong>clientPort</strong>
+</term>
+</dt>
+<dd>
+<p>the port on which to listen for client connections</p>
+</dd>
+      
+</dl>
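The three fields above can be read with a few lines of Java. This is a minimal sketch, assuming the file is compatible with Java properties syntax (key=value lines), as the sample is. The class name ZooCfgSketch and its helper methods are hypothetical and are not part of ZooKeeper; the doubling of tickTime mirrors the minimum session timeout described in the tickTime entry.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical helper for illustration; not part of the ZooKeeper code base.
class ZooCfgSketch {
    // Loads key=value lines (Java properties syntax) into a Properties object.
    static Properties parse(String text) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(text));
        return props;
    }

    // Reads an integer field, trimming stray whitespace around the value.
    static int intField(Properties props, String key) {
        return Integer.parseInt(props.getProperty(key).trim());
    }

    // The minimum session timeout is described above as twice the tickTime.
    static int minSessionTimeoutMs(Properties props) {
        return 2 * intField(props, "tickTime");
    }

    public static void main(String[] args) throws IOException {
        // Same three fields as the sample zoo.cfg above.
        String sample = "tickTime=2000\ndataDir=/var/zookeeper/\nclientPort=2181\n";
        Properties cfg = parse(sample);
        System.out.println("dataDir = " + cfg.getProperty("dataDir").trim());
        System.out.println("clientPort = " + intField(cfg, "clientPort"));
        System.out.println("min session timeout (ms) = " + minSessionTimeoutMs(cfg));
    }
}
```

Values are trimmed because a properties loader keeps trailing whitespace, and the sample file has a trailing space after the dataDir path.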
+<p>Now that you have created the configuration file, you can start
+      ZooKeeper:</p>
+<p>
+<pre class="code">java -cp zookeeper-dev.jar:java/lib/log4j-1.2.15.jar:conf org.apache.zookeeper.server.quorum.QuorumPeerMain zoo.cfg</pre>
+</p>
+<p>ZooKeeper logs messages using log4j -- more detail available in
+      the <a href="zookeeperProgrammers.html#Logging">Logging</a>
+      section of the Programmer's Guide.<remark revision="include_tbd">[tbd:
+      real reference needed]</remark> You will see log messages coming to the
+      console and/or a log file depending on the log4j configuration.</p>
+<p>The steps outlined here run ZooKeeper in standalone mode. There is
+      no replication, so if the ZooKeeper process fails, the service will go down.
+      This is fine for most development situations, but to run ZooKeeper in
+      replicated mode, please see <a href="#sc_RunningReplicatedZooKeeper">Running Replicated
+      ZooKeeper</a>.</p>
+<a name="N1007A"></a><a name="sc_ConnectingToZooKeeper"></a>
+<h3 class="h4">Connecting to ZooKeeper</h3>
+<p>Once ZooKeeper is running, you have several options for connecting
+      to it:</p>
+<ul>
+        
+<li>
+          
+<p>
+<strong>Java</strong>: Use java -cp
+          zookeeper.jar:java/lib/log4j-1.2.15.jar:conf
+          org.apache.zookeeper.ZooKeeperMain 127.0.0.1:2181</p>
+
+          
+<p>This lets you perform simple, file-like operations.</p>
+        
+</li>
+
+        
+<li>
+          
+<p>
+<strong>C</strong>: compile cli_mt
+          (multi-threaded) or cli_st (single-threaded) by running
+          <span class="codefrag command">make cli_mt</span> or <span class="codefrag command">make cli_st</span>
+          in the c subdirectory in the ZooKeeper sources.</p>
+
+          
+<p>You can run the program using <em>LD_LIBRARY_PATH=.
+          cli_mt 127.0.0.1:2181</em> or <em>LD_LIBRARY_PATH=.
+          cli_st 127.0.0.1:2181</em>. This will give you a simple shell
+          to execute file-system-like operations on ZooKeeper.</p>
+        
+</li>
+      
+</ul>
+<a name="N100AB"></a><a name="sc_ProgrammingToZooKeeper"></a>
+<h3 class="h4">Programming to ZooKeeper</h3>
+<p>ZooKeeper has Java bindings and C bindings. They are
+      functionally equivalent. The C bindings exist in two variants:
+      single-threaded and multi-threaded. These differ only in how the messaging loop
+      is done. <remark>[tbd: what is the messaging loop? Do we talk about it
+      anywhere? is this too much info for a getting started guide?]</remark>
+      For more information, see the <a href="zookeeperProgrammers.html#ch_programStructureWithExample.html">Programming
+      Examples in the ZooKeeper Programmer's Guide</a> for
+      sample code using the different APIs.</p>
+<a name="N100BC"></a><a name="sc_RunningReplicatedZooKeeper"></a>
+<h3 class="h4">Running Replicated ZooKeeper</h3>
+<p>Running ZooKeeper in standalone mode is convenient for evaluation,
+      some development, and testing. But in production, you should run
+      ZooKeeper in replicated mode. A replicated group of servers in the same
+      application is called a <em>quorum</em>, and in replicated
+      mode, all servers in the quorum have copies of the same configuration
+      file. The file is similar to the one used in standalone mode, but with a
+      few differences. Here is an example:</p>
+<p>
+<pre class="code">tickTime=2000
+dataDir=/var/zookeeper/
+clientPort=2181
+initLimit=5
+syncLimit=2
+server.1=zoo1:2888
+server.2=zoo2:2888
+server.3=zoo3:2888</pre>
+</p>
+<p>The new entry, <strong>initLimit</strong>, is a
+      timeout that ZooKeeper uses to limit the length of time the ZooKeeper
+      servers in the quorum have to connect to a leader. The entry <strong>syncLimit</strong> limits how far out of date a server can
+      be from a leader. [TBD: someone please verify that the previous is
+      true.]</p>
+<p>With both of these timeouts, you specify the unit of time using
+      <strong>tickTime</strong>. In this example, the timeout
+      for initLimit is 5 ticks at 2000 milliseconds a tick, or 10
+      seconds.</p>
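The tick arithmetic can be checked with a short Java sketch. It assumes only what the paragraph above states, that both limits are counted in ticks of tickTime milliseconds; the class and method names are hypothetical and chosen for illustration.

```java
// Hypothetical helper for illustration; not part of the ZooKeeper code base.
class TickTimeouts {
    // Converts a limit expressed in ticks into milliseconds.
    static long toMillis(int tickTimeMs, int ticks) {
        return (long) tickTimeMs * ticks;
    }

    public static void main(String[] args) {
        int tickTime = 2000; // milliseconds per tick, from the sample config
        int initLimit = 5;   // ticks
        int syncLimit = 2;   // ticks
        // prints "initLimit timeout: 10000 ms"
        System.out.println("initLimit timeout: " + toMillis(tickTime, initLimit) + " ms");
        // prints "syncLimit timeout: 4000 ms"
        System.out.println("syncLimit timeout: " + toMillis(tickTime, syncLimit) + " ms");
    }
}
```

With the sample values, syncLimit works out to 4 seconds by the same rule.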
+<p>The entries of the form <em>server.X</em> list the
+      servers that make up the ZooKeeper service. When the server starts up,
+      it knows which server it is by looking for the file <em>myid</em> in the data
+      directory. That file contains the server number, in
+      ASCII.</p>
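As an illustration of the myid convention, the following Java sketch writes a myid file into a temporary directory and reads the server number back. It is not ZooKeeper's own startup code: the class name MyIdSketch and the temporary directory are assumptions made for the example, and it uses java.nio.file, which requires Java 7 or later.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper for illustration; not part of the ZooKeeper code base.
class MyIdSketch {
    // Reads the ASCII server number stored in the myid file under dataDir.
    static int readServerId(Path dataDir) throws IOException {
        byte[] raw = Files.readAllBytes(dataDir.resolve("myid"));
        return Integer.parseInt(new String(raw, StandardCharsets.US_ASCII).trim());
    }

    public static void main(String[] args) throws IOException {
        // A temporary directory stands in for the configured dataDir.
        Path dataDir = Files.createTempDirectory("zk-data");
        // For the server listed as server.1, the file holds the single number 1.
        Files.write(dataDir.resolve("myid"), "1\n".getBytes(StandardCharsets.US_ASCII));
        System.out.println("server id: " + readServerId(dataDir));
    }
}
```

In a real deployment the file would be created once per server in that server's dataDir, with the number matching its server.X entry.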
+<p>Finally, note the "2888" port numbers after each server name.
+      These are the "electionPort" numbers of the servers (as opposed to
+      clientPorts), that is ports for <remark>[tbd: feedback need: what are
+      these ports, exactly?]</remark>.</p>
+<div class="note">
+<div class="label">Note</div>
+<div class="content">
+        
+<p>If you want to test multiple servers on a single machine, define
+        the electionPort for each server in that server's config file, using
+        the line <span class="codefrag command">electionPort=xxxx</span> as a means of avoiding
+        clashes.</p>
+      
+</div>
+</div>
+<a name="N100F2"></a><a name="Other+Optimizations"></a>
+<h3 class="h4">Other Optimizations</h3>
+<p>There are a couple of other configuration parameters that can
+      greatly increase performance:</p>
+<ul>
+        
+<li>
+          
+<p>To get low latencies on updates, it is important to have a
+          dedicated transaction log directory. By default, transaction logs are
+          put in the same directory as the data snapshots and <em>myid</em> file. The
+          dataLogDir parameter indicates a different directory to use for the
+          transaction logs.</p>
+        
+</li>
+
+        
+<li>
+          
+<p>
+<remark>[tbd: feedback need: what is the other config param?
+          (I believe two are mentioned above.)]</remark>
+</p>
+        
+</li>
+      
+</ul>
+</div>
+
+<p align="right">
+<font size="-2"></font>
+</p>
+</div>
+<!--+
+    |end content
+    +-->
+<div class="clearboth">&nbsp;</div>
+</div>
+<div id="footer">
+<!--+
+    |start bottomstrip
+    +-->
+<div class="lastmodified">
+<script type="text/javascript"><!--
+document.write("Last Published: " + document.lastModified);
+//  --></script>
+</div>
+<div class="copyright">
+        Copyright &copy;
+         2008 <a href="http://www.apache.org/licenses/">The Apache Software Foundation.</a>
+</div>
+<!--+
+    |end bottomstrip
+    +-->
+</div>
+</body>
+</html>

File diff suppressed because it is too large
+ 96 - 0
docs/zookeeperStarted.pdf


+ 2 - 0
src/docs/forrest.properties

@@ -102,3 +102,5 @@
 #project.issues-rss-url=
 #I18n Property only works for the "forrest run" target.
 #project.i18n=true
+
+project.required.plugins=org.apache.forrest.plugin.output.pdf,org.apache.forrest.plugin.input.simplifiedDocbook

+ 16 - 8
src/docs/src/documentation/content/xdocs/index.xml

@@ -20,20 +20,28 @@
 <document>
   
   <header>
-    <title>ZooKeeper Documentation</title>
+    <title>ZooKeeper: Because Coordinating Distributed Systems is a Zoo</title>
   </header>
   
   <body>
     <p>
-    The following documents provide concepts and procedures that will help you 
-    get started using ZooKeeper. If you have more questions, you can ask the 
-    <a href="ext:lists">mailing list</a> or browse the archives.
+ZooKeeper is a high-performance coordination service for distributed applications. It exposes common services - such as naming, configuration management, synchronization, and group services - in a simple interface so you don't have to write them from scratch. You can use it off-the-shelf to implement consensus, group management, leader election, and presence protocols. And you can build on it for your own, specific needs.
+</p>
+
+<p>
+The following documents provide concepts and procedures to get you started using ZooKeeper. If you have more questions, please ask the <a href="ext:lists">mailing list</a> or browse the archives.
     </p>
     <ul>
-      <li><a href="ext:api/started">Getting Started</a></li>
-      <li><a href="ext:api/index">API Docs</a></li>
-      <li><a href="ext:wiki">Wiki</a></li>
-      <li><a href="ext:faq">FAQ</a></li>
+
+      <li><a href="zookeeperOver.html">Overview</a> - a bird's eye view of ZooKeeper, including design concepts and architecture</li>
+      <li><a href="zookeeperStarted.html">Getting Started</a> - a tutorial-style guide for developers to install, run, and program to ZooKeeper</li>
+      <li><a href="zookeeperProgrammers.html">Programmer's Guide</a> - an application developer's guide to ZooKeeper</li>
+      <li><a href="recipes.html">ZooKeeper Recipes</a> - a set of common, higher level solutions using ZooKeeper</li>
+      <li><a href="zookeeperAdmin.html">Administrator's Guide</a> - a guide for system administrators and anyone else who might deploy ZooKeeper</li>
+      <li><a href="ext:api/index">API Docs</a> - the technical reference to ZooKeeper APIs</li>
+      <li><a href="ext:wiki">Wiki</a> - miscellaneous, informal ZooKeeper documentation, in Wiki format</li>
+      <li><a href="ext:faq">FAQ</a> - frequently asked questions</li>    
+
     </ul>
   </body>
   

+ 623 - 0
src/docs/src/documentation/content/xdocs/recipes.xml

@@ -0,0 +1,623 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+  Copyright 2002-2004 The Apache Software Foundation
+
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+
+<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
+"http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
+<book id="ar_Recipes">
+  <title>ZooKeeper Recipes and Solutions</title>
+
+  <bookinfo>
+    <legalnotice>
+      <para>Licensed under the Apache License, Version 2.0 (the "License");
+      you may not use this file except in compliance with the License. You may
+      obtain a copy of the License at <ulink
+      url="http://www.apache.org/licenses/LICENSE-2.0">http://www.apache.org/licenses/LICENSE-2.0</ulink>.</para>
+
+      <para>Unless required by applicable law or agreed to in writing,
+      software distributed under the License is distributed on an "AS IS"
+      BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+      implied. See the License for the specific language governing permissions
+      and limitations under the License.</para>
+    </legalnotice>
+
+    <abstract>
+      <para>This guide contains pseudocode and guidelines for using ZooKeeper to
+      solve common problems in Distributed Application Coordination. It
+      discusses such problems as event handlers, queues, and locks.</para>
+
+      <para>$Revision: 1.6 $ $Date: 2008/09/19 03:46:18 $</para>
+    </abstract>
+  </bookinfo>
+
+  <chapter id="ch_recipes">
+    <title>A Guide to Creating Higher-level Constructs with ZooKeeper</title>
+
+    <para>In this article, you'll find guidelines for using
+    ZooKeeper to implement higher order functions. All of them are conventions
+    implemented at the client and do not require special support from
+    ZooKeeper. Hopefully the community will capture these conventions in client-side libraries
+    to ease their use and to encourage standardization.</para>
+
+    <para>One of the most interesting things about ZooKeeper is that even
+    though ZooKeeper uses <emphasis>asynchronous</emphasis> notifications, you
+    can use it to build <emphasis>synchronous</emphasis> consistency
+    primitives, such as queues and locks. As you will see, this is possible
+    because ZooKeeper imposes an overall order on updates, and has mechanisms
+    to expose this ordering.</para>
+
+    <para>Note that the recipes below attempt to employ best practices. In
+    particular, they avoid polling, timers or anything else that would result
+    in a "herd effect", causing bursts of traffic and limiting
+    scalability.</para>
+
+    <para>There are many useful functions that can be imagined that aren't
+    included here - revocable read-write priority locks, as just one example.
+    And some of the constructs mentioned here - locks, in particular -
+    illustrate certain points, even though you may find other constructs, such
+    as event handles or queues, a more practical means of performing the same
+    function. In general, the examples in this section are designed to
+    stimulate thought.</para>
+
+
+  <section id="sc_outOfTheBox">
+    <title>Out of the Box Applications: Name Service, Configuration, Group
+    Membership</title>
+
+    <para>Name service and configuration are two of the primary applications
+    of ZooKeeper. These two functions are provided directly by the ZooKeeper
+    API.</para>
+
+    <para>Another function directly provided by ZooKeeper is <emphasis>group
+    membership</emphasis>. The group is represented by a node. Members of the
+    group create ephemeral nodes under the group node. Nodes of the members
+    that fail abnormally will be removed automatically when ZooKeeper detects
+    the failure.</para>
+  </section>
+
+  <section id="sc_recipes_eventHandles">
+    <title>Barriers</title>
+
+    <para>Distributed systems use <firstterm>barriers</firstterm> to block
+    processing of a set of nodes until a condition is met, at which time all
+    the nodes are allowed to proceed. Barriers are implemented in ZooKeeper by
+    designating a barrier node. The barrier is in place if the barrier node
+    exists. Here's the pseudocode:</para>
+
+    <orderedlist>
+      <listitem>
+        <para>Client calls the ZooKeeper API's <emphasis
+        role="bold">exists()</emphasis> function on the barrier node, with
+        <emphasis>watch</emphasis> set to true.</para>
+      </listitem>
+
+      <listitem>
+        <para>If <emphasis role="bold">exists()</emphasis> returns false, the
+        barrier is gone and the client proceeds</para>
+      </listitem>
+
+      <listitem>
+        <para>Else, if <emphasis role="bold">exists()</emphasis> returns true,
+        the client waits for a watch event from ZooKeeper for the barrier
+        node.</para>
+      </listitem>
+
+      <listitem>
+        <para>When the watch event is triggered, the client reissues the
+        <emphasis role="bold">exists()</emphasis> call, again waiting until
+        the barrier node is removed.</para>
+      </listitem>
+    </orderedlist>
+
+    <para><remark>[tbd: maybe an illustration would be nice for each of the
+    recipes?]</remark></para>
+
+    <section id="sc_doubleBarriers">
+      <title>Double Barriers</title>
+
+      <para>Double barriers enable clients to synchronize the beginning and
+      the end of a computation. When enough processes have joined the barrier,
+      processes start their computation and leave the barrier once they have
+      finished. This recipe shows how to use a ZooKeeper node as a
+      barrier.</para>
+
+      <para>The pseudocode in this recipe represents the barrier node as
+      <emphasis>b</emphasis>. Every client process <emphasis>p</emphasis>
+      registers with the barrier node on entry and unregisters when it is
+      ready to leave. A node registers with the barrier node via the <emphasis
+      role="bold">Enter</emphasis> procedure below; it waits until
+      <emphasis>x</emphasis> client processes register before proceeding with
+      the computation. (The <emphasis>x</emphasis> here is up to you to
+      determine for your system.)</para>
+
+      <para><informaltable colsep="0" frame="none" rowsep="0">
+          <tgroup cols="2">
+            <tbody>
+              <row>
+                <entry align="center"><emphasis
+                role="bold">Enter</emphasis></entry>
+
+                <entry align="center"><emphasis
+                role="bold">Leave</emphasis></entry>
+              </row>
+
+              <row>
+                <entry align="left"><orderedlist>
+                    <listitem>
+                      <para>Create a name <emphasis><emphasis>n</emphasis> =
+                      <emphasis>b</emphasis>+“/”+<emphasis>p</emphasis></emphasis></para>
+                    </listitem>
+
+                    <listitem>
+                      <para>Set watch: <emphasis
+                      role="bold">exists(<emphasis>b</emphasis> + ‘‘/ready’’,
+                      true)</emphasis></para>
+                    </listitem>
+
+                    <listitem>
+                      <para>Create child: <emphasis role="bold">create(
+                      <emphasis>n</emphasis>, EPHEMERAL)</emphasis></para>
+                    </listitem>
+
+                    <listitem>
+                      <para><emphasis role="bold">L = getChildren(b,
+                      false)</emphasis></para>
+                    </listitem>
+
+                    <listitem>
+                      <para>if fewer children in L than<emphasis>
+                      x</emphasis>, wait for watch event <remark>[tbd: how do
+                      you wait?]</remark></para>
+                    </listitem>
+
+                    <listitem>
+                      <para>else <emphasis role="bold">create(b + ‘‘/ready’’,
+                      REGULAR)</emphasis></para>
+                    </listitem>
+                  </orderedlist></entry>
+
+                <entry><orderedlist>
+                    <listitem>
+                      <para><emphasis role="bold">L = getChildren(b,
+                      false)</emphasis></para>
+                    </listitem>
+
+                    <listitem>
+                      <para>if no children, exit</para>
+                    </listitem>
+
+                    <listitem>
+                      <para>if <emphasis>p</emphasis> is only process node in
+                      L, delete(n) and exit</para>
+                    </listitem>
+
+                    <listitem>
+                      <para>if <emphasis>p</emphasis> is the lowest process
+                      node in L, wait on highest process node in L</para>
+                    </listitem>
+
+                    <listitem>
+                      <para>else <emphasis
+                      role="bold">delete(<emphasis>n</emphasis>) </emphasis>if
+                      still exists and wait on lowest process node in L</para>
+                    </listitem>
+
+                    <listitem>
+                      <para>goto 1</para>
+                    </listitem>
+                  </orderedlist></entry>
+              </row>
+            </tbody>
+          </tgroup>
+        </informaltable>On entering, all processes watch on a ready node and
+      create an ephemeral node as a child of the barrier node. Each process
+      but the last enters the barrier and waits for the ready node to appear
+      at line 5. The process that creates the xth node, the last process, will
+      see x nodes in the list of children and create the ready node, waking up
+      the other processes. Note that waiting processes wake up only when it is
+      time to exit, so waiting is efficient.</para>
+
+      <para>On exit, you can't use a flag such as <emphasis>ready</emphasis>
+      because you are watching for process nodes to go away. By using
+      ephemeral nodes, processes that fail after the barrier has been entered
+      do not prevent correct processes from finishing. When processes are
+      ready to leave, they need to delete their process nodes and wait for all
+      other processes to do the same.</para>
+
+      <para>Processes exit when there are no process nodes left as children of
+      <emphasis>b</emphasis>. However, as an efficiency, you can use the
+      lowest process node as the ready flag. All other processes that are
+      ready to exit watch for the lowest existing process node to go away, and
+      the owner of the lowest process watches for any other process node
+      (picking the highest for simplicity) to go away. This means that only a
+      single process wakes up on each node deletion except for the last node,
+      which wakes up everyone when it is removed.</para>
+    </section>
+  </section>
+
+  <section id="sc_recipes_Queues">
+    <title>Queues</title>
+
+    <para>Distributed queues are a common data structure. To implement a
+    distributed queue in ZooKeeper, first designate a znode to hold the queue,
+    the queue node. The distributed clients put something into the queue by
+    calling create() with a pathname ending in "queue-", with the
+    <emphasis>sequence</emphasis> and <emphasis>ephemeral</emphasis> flags in
+    the create() call set to true. Because the <emphasis>sequence</emphasis>
+    flag is set, the new pathnames will have the form
+    _path-to-queue-node_/queue-X, where X is a monotonically increasing
+    number. A client that wants to remove an item from the queue calls
+    ZooKeeper's <emphasis role="bold">getChildren( )</emphasis> function, with
+    <emphasis>watch</emphasis> set to true on the queue node, and begins
+    processing nodes with the lowest number. The client does not need to issue
+    another <emphasis role="bold">getChildren( )</emphasis> until it exhausts
+    the list obtained from the first <emphasis role="bold">getChildren(
+    )</emphasis> call. If there are no children in the queue node, the
+    reader waits for a watch notification to check the queue again.</para>
+
+    <section id="sc_recipes_priorityQueues">
+      <title>Priority Queues</title>
+
+      <para>To implement a priority queue, you need only make two simple
+      changes to the generic <ulink url="#sc_recipes_Queues">queue
+      recipe</ulink>. First, to add to a queue, the pathname ends with
+      "queue-YY" where YY is the priority of the element, with lower numbers
+      representing higher priority (just like UNIX). Second, when removing
+      from the queue, a client uses an up-to-date children list, meaning that
+      the client will invalidate previously obtained children lists if a watch
+      notification triggers for the queue node.</para>
+    </section>
+  </section>
+
+  <section id="sc_recipes_Locks">
+    <title>Locks</title>
+
+    <para>Fully distributed locks that are globally synchronous, meaning that
+    at any snapshot in time no two clients think they hold the same lock, can
+    be implemented using ZooKeeper. As with priority queues, first define
+    a lock node.</para>
+
+    <para>Clients wishing to obtain a lock do the following:</para>
+
+    <orderedlist>
+      <listitem>
+        <para>Call <emphasis role="bold">create( )</emphasis> with a pathname
+        of "_locknode_/lock-" and the <emphasis>sequence</emphasis> and
+        <emphasis>ephemeral</emphasis> flags set.</para>
+      </listitem>
+
+      <listitem>
+        <para>Call <emphasis role="bold">getChildren( )</emphasis> on the lock
+        node <emphasis>without</emphasis> setting the watch flag (this is
+        important to avoid the herd effect).</para>
+      </listitem>
+
+      <listitem>
+        <para>If the pathname created in step <emphasis
+        role="bold">1</emphasis> has the lowest sequence number suffix, the
+        client has the lock and the client exits the protocol.</para>
+      </listitem>
+
+      <listitem>
+        <para>The client calls <emphasis role="bold">exists( )</emphasis> with
+        the watch flag set on the path in the lock directory with the next
+        lowest sequence number, that is, the highest sequence number below
+        the client's own.</para>
+      </listitem>
+
+      <listitem>
+        <para>If <emphasis role="bold">exists( )</emphasis> returns false, go
+        to step <emphasis role="bold">2</emphasis>. Otherwise, wait for a
+        notification for the pathname from the previous step before going to
+        step <emphasis role="bold">2</emphasis>.</para>
+      </listitem>
+    </orderedlist>
+
+    <para>The unlock protocol is very simple: clients wishing to release a
+    lock simply delete the node they created in step 1.</para>
+
+    <para>Here are a few things to notice:</para>
+
+    <itemizedlist>
+      <listitem>
+        <para>The removal of a node will only cause one client to wake up
+        since each node is watched by exactly one client. In this way, you
+        avoid the herd effect.</para>
+      </listitem>
+
+      <listitem>
+        <para>There is no polling or timeouts.</para>
+      </listitem>
+
+      <listitem>
+        <para>Because of the way you implement locking, it is easy to see the
+        amount of lock contention, break locks, debug locking problems,
+        etc.</para>
+      </listitem>
+    </itemizedlist>
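+
+    <para>Steps 3 and 4 of the lock protocol come down to one decision made
+    on the child list returned by getChildren( ). The following is a minimal
+    sketch of that decision only, not a ZooKeeper client: the helper names
+    sequence_of and node_to_watch are hypothetical, and the child names are
+    assumed to end in a numeric sequence suffix after the last "-".</para>

```python
def sequence_of(name):
    """Return the numeric sequence suffix of a child name such as 'lock-0000000003'."""
    return int(name.rsplit("-", 1)[1])


def node_to_watch(children, mine):
    """Return None if `mine` holds the lock, otherwise the child to watch.

    The child to watch is the one with the highest sequence number that is
    still lower than the sequence number of `mine`.
    """
    my_seq = sequence_of(mine)
    lower = [c for c in children if my_seq > sequence_of(c)]
    if not lower:
        return None  # lowest sequence number: this client has the lock
    return max(lower, key=sequence_of)


# Hypothetical child list, in the arbitrary order getChildren( ) may return.
children = ["lock-0000000007", "lock-0000000005", "lock-0000000009"]
for name in sorted(children, key=sequence_of):
    print(name, "watches", node_to_watch(children, name))
```

+    <para>Because every waiting client watches a different node, deleting a
+    lock node wakes up at most one client.</para>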
+
+    <section>
+      <title>Shared Locks</title>
+
+      <para>You can implement shared locks with a few changes to the lock
+      protocol:</para>
+
+      <informaltable colsep="0" frame="none" rowsep="0">
+        <tgroup cols="2">
+          <tbody>
+            <row>
+              <entry align="center"><emphasis role="bold">Obtaining a read
+              lock:</emphasis></entry>
+
+              <entry align="center"><emphasis role="bold">Obtaining a write
+              lock:</emphasis></entry>
+            </row>
+
+            <row>
+              <entry align="left"><orderedlist>
+                  <listitem>
+                    <para>Call <emphasis role="bold">create( )</emphasis> to
+                    create a node with pathname
+                    "<parameter>_locknode_/read-</parameter>". This is the
+                    lock node used later in the protocol. Make sure to set both
+                    the <emphasis>sequence</emphasis> and
+                    <emphasis>ephemeral</emphasis> flags.</para>
+                  </listitem>
+
+                  <listitem>
+                    <para>Call <emphasis role="bold">getChildren( )</emphasis>
+                    on the lock node <emphasis>without</emphasis> setting the
+                    <emphasis>watch</emphasis> flag - this is important, as it
+                    avoids the herd effect.</para>
+                  </listitem>
+
+                  <listitem>
+                    <para>If there are no children with a pathname starting
+                    with "<parameter>write-</parameter>" and having a lower
+                    sequence number than the node created in step <emphasis
+                    role="bold">1</emphasis>, the client has the lock and can
+                    exit the protocol. </para>
+                  </listitem>
+
+                  <listitem>
+                    <para>Otherwise, call <emphasis role="bold">exists(
+                    )</emphasis>, with the <emphasis>watch</emphasis> flag set, on
+                    the node in the lock directory with a pathname starting with
+                    "<parameter>write-</parameter>" and having the next lowest
+                    sequence number.</para>
+                  </listitem>
+
+                  <listitem>
+                    <para>If <emphasis role="bold">exists( )</emphasis>
+                    returns <emphasis>false</emphasis>, go to step <emphasis
+                    role="bold">2</emphasis>.</para>
+                  </listitem>
+
+                  <listitem>
+                    <para>Otherwise, wait for a notification for the pathname
+                    from the previous step before going to step <emphasis
+                    role="bold">2</emphasis>.</para>
+                  </listitem>
+                </orderedlist></entry>
+
+              <entry><orderedlist>
+                  <listitem>
+                    <para>Call <emphasis role="bold">create( )</emphasis> to
+                    create a node with pathname
+                    "<parameter>_locknode_/write-</parameter>". This is the
+                    lock node referred to later in the protocol. Make sure to
+                    set both the <emphasis>sequence</emphasis> and
+                    <emphasis>ephemeral</emphasis> flags.</para>
+                  </listitem>
+
+                  <listitem>
+                    <para>Call <emphasis role="bold">getChildren( )
+                    </emphasis> on the lock node <emphasis>without</emphasis>
+                    setting the <emphasis>watch</emphasis> flag - this is
+                    important, as it avoids the herd effect.</para>
+                  </listitem>
+
+                  <listitem>
+                    <para>If there are no children with a lower sequence
+                    number than the node created in step <emphasis
+                    role="bold">1</emphasis>, the client has the lock and the
+                    client exits the protocol.</para>
+                  </listitem>
+
+                  <listitem>
+                    <para>Call <emphasis role="bold">exists( )</emphasis>,
+                    with the <emphasis>watch</emphasis> flag set, on the node with
+                    the pathname that has the next lowest sequence
+                    number.</para>
+                  </listitem>
+
+                  <listitem>
+                    <para>If <emphasis role="bold">exists( )</emphasis>
+                    returns <emphasis>false</emphasis>, go to step <emphasis
+                    role="bold">2</emphasis>. Otherwise, wait for a
+                    notification for the pathname from the previous step
+                    before going to step <emphasis
+                    role="bold">2</emphasis>.</para>
+                  </listitem>
+                </orderedlist></entry>
+            </row>
+          </tbody>
+        </tgroup>
+      </informaltable>
+
+      <para><note>
+          <para>It might appear that this recipe creates a herd effect: when
+          a large group of clients is waiting for a read lock, they all
+          get notified more or less simultaneously when the
+          "<parameter>write-</parameter>" node with the lowest sequence number
+          is deleted. In fact, that is valid behavior: all those waiting
+          reader clients should be released, since they now have the lock. The
+          herd effect refers to releasing a "herd" when in fact only a single or a
+          small number of machines can proceed. <remark>[tbd: maybe helpful to
+          indicate which step this refers to?]</remark></para>
+        </note></para>
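+
+      <para>The difference between the two columns above is only in which
+      lower-numbered nodes block a client. The following sketch, under the
+      assumption that child names are "read-" or "write-" followed by a numeric
+      sequence suffix, illustrates that rule; seq and shared_lock_watch are
+      hypothetical helper names, and no ZooKeeper calls are made.</para>

```python
def seq(name):
    """Return the numeric sequence suffix of a name such as 'write-0000000002'."""
    return int(name.rsplit("-", 1)[1])


def shared_lock_watch(children, mine):
    """Return None if `mine` holds its lock, otherwise the child to watch.

    A writer is blocked by every node with a lower sequence number.
    A reader is blocked only by lower-numbered "write-" nodes.
    """
    my_seq = seq(mine)
    blockers = [c for c in children if my_seq > seq(c)]
    if mine.startswith("read-"):
        blockers = [c for c in blockers if c.startswith("write-")]
    if not blockers:
        return None
    return max(blockers, key=seq)


# Hypothetical contents of the lock directory.
children = [
    "read-0000000001",
    "write-0000000002",
    "read-0000000003",
    "read-0000000004",
]
for name in children:
    print(name, "watches", shared_lock_watch(children, name))
```

+      <para>In this example both later readers watch the same "write-" node,
+      which is the situation the note above describes: they are all released
+      together because they can all proceed.</para>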
+    </section>
+
+    <section id="sc_recoverableSharedLocks">
+      <title>Revocable Shared Locks</title>
+
+      <para>With minor modifications to the shared lock protocol, you can make
+      shared locks revocable:</para>
+
+      <para>In step <emphasis role="bold">1</emphasis> of both the reader
+      and writer lock protocols, call <emphasis role="bold">getData(
+      )</emphasis> with <emphasis>watch</emphasis> set, immediately after the
+      call to <emphasis role="bold">create( )</emphasis>. If the client
+      subsequently receives notification for the node it created in step
+      <emphasis role="bold">1</emphasis>, it does another <emphasis
+      role="bold">getData( )</emphasis> on that node, with
+      <emphasis>watch</emphasis> set, and looks for the string "unlock", which
+      signals to the client that it must release the lock. This is because,
+      according to this shared lock protocol, you can request that the client
+      holding the lock give it up by calling <emphasis role="bold">setData()
+      </emphasis> on the lock node, writing "unlock" to that node.</para>
+
+      <para>Note that this protocol requires the lock holder to consent to
+      releasing the lock. Such consent is important, especially if the lock
+      holder needs to do some processing before releasing the lock. Of course
+      you can always implement <emphasis>Revocable Shared Locks with Freaking
+      Laser Beams</emphasis> by stipulating in your protocol that the revoker
+      is allowed to delete the lock node if after some length of time the lock
+      isn't deleted by the lock holder.</para>
+    </section>
+  </section>
+
+  <section id="sc_recipes_twoPhasedCommit">
+    <title>Two-phased Commit</title>
+
+    <para>A two-phase commit protocol is an algorithm that lets all clients in
+    a distributed system agree either to commit a transaction or abort.</para>
+
+    <para>In ZooKeeper, you can implement a two-phased commit by having a
+    coordinator create a transaction node, say "/app/Tx", and one child node
+    per participating site, say "/app/Tx/s_i". When the coordinator creates
+    the child node, it leaves the content undefined. Once each site involved in
+    the transaction receives the transaction from the coordinator, the site
+    reads each child node and sets a watch. Each site then processes the query
+    and votes "commit" or "abort" by writing to its respective node. Once the
+    write completes, the other sites are notified, and as soon as all sites
+    have all votes, they can decide either "abort" or "commit". Note that a
+    node can decide "abort" earlier if some site votes for "abort".</para>
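+
+    <para>The decision each site makes from the votes it has read can be
+    sketched as follows. This is an illustration only: the function name decide
+    and the dictionary of votes are hypothetical stand-ins for the contents of
+    the child nodes, with None representing a node whose content is still
+    undefined.</para>

```python
def decide(votes):
    """Return "abort", "commit", or None if no decision is possible yet.

    `votes` maps a site name to "commit", "abort", or None (no vote written).
    """
    values = list(votes.values())
    if "abort" in values:
        return "abort"  # a single abort vote is enough to decide early
    if all(v == "commit" for v in values):
        return "commit"  # every site has voted to commit
    return None  # some votes are still missing


print(decide({"s_1": "commit", "s_2": None, "s_3": None}))
print(decide({"s_1": "commit", "s_2": "abort", "s_3": None}))
print(decide({"s_1": "commit", "s_2": "commit", "s_3": "commit"}))
```
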
+
+    <para>An interesting aspect of this implementation is that the only role
+    of the coordinator is to decide upon the group of sites, to create the
+    ZooKeeper nodes, and to propagate the transaction to the corresponding
+    sites. In fact, even propagating the transaction can be done through
+    ZooKeeper by writing it in the transaction node.</para>
+
+    <para>There are two important drawbacks of the approach described above.
+    One is the message complexity, which is O(n²). The second is the
+    impossibility of detecting failures of sites through ephemeral nodes. To
+    detect the failure of a site using ephemeral nodes, it is necessary that
+    the site create the node.</para>
+
+    <para>To solve the first problem, you can have only the coordinator
+    notified of changes to the transaction nodes, and then notify the sites
+    once the coordinator reaches a decision. Note that this approach is
+    scalable, but it is slower too, as it requires all communication to go
+    through the coordinator.</para>
+
+    <para>To address the second problem, you can have the coordinator
+    propagate the transaction to the sites, and have each site create its
+    own ephemeral node.</para>
+  </section>
+
+  <section id="sc_leaderElection">
+    <title>Leader Election</title>
+
+    <para>A simple way of doing leader election with ZooKeeper is to use the
+    <emphasis role="bold">SEQUENCE|EPHEMERAL</emphasis> flags when creating
+    znodes that represent "proposals" of clients. The idea is to have a znode,
+    say "/election", such that each client creates a child znode "/election/n_"
+    with both flags SEQUENCE|EPHEMERAL. With the sequence flag, ZooKeeper
+    automatically appends a sequence number that is greater than any one
+    previously appended to a child of "/election". The process that created
+    the znode with the smallest appended sequence number is the leader.
+    </para>
+
+    <para>That's not all, though. It is important to watch for failures of the
+    leader, so that a new client arises as the new leader in the case the
+    current leader fails. A trivial solution is to have all application
+    processes watching upon the current smallest znode, and checking if they
+    are the new leader when the smallest znode goes away (note that the
+    smallest znode will go away if the leader fails because the node is
+    ephemeral). But this causes a herd effect: upon failure of the current
+    leader, all other processes receive a notification, and execute
+    getChildren on "/election" to obtain the current list of children of
+    "/election". If the number of clients is large, it causes a spike in the
+    number of operations that ZooKeeper servers have to process. To avoid the
+    herd effect, it is sufficient to watch for the next znode down on the
+    sequence of znodes. If a client receives a notification that the znode it
+    is watching is gone, then it becomes the new leader in the case that there
+    is no smaller znode. Note that this avoids the herd effect by not having
+    all clients watching the same znode. </para>
+
+    <para>Here's the pseudo code:</para>
+
+    <para>Let ELECTION be a path of choice of the application. To volunteer to
+    be a leader: </para>
+
+    <orderedlist>
+      <listitem>
+        <para>Create znode z with path "ELECTION/n_" with both SEQUENCE and
+        EPHEMERAL flags;</para>
+      </listitem>
+
+      <listitem>
+        <para>Let C be the children of "ELECTION", and i be the sequence
+        number of z;</para>
+      </listitem>
+
+      <listitem>
+        <para>Watch for changes on "ELECTION/n_j", where j is the largest
+        sequence number such that j &lt; i and n_j is a znode in C;</para>
+      </listitem>
+    </orderedlist>
+
+    <para>Upon receiving a notification of znode deletion: </para>
+
+    <orderedlist>
+      <listitem>
+        <para>Let C be the new set of children of ELECTION; </para>
+      </listitem>
+
+      <listitem>
+        <para>If z is the smallest node in C, then execute leader
+        procedure;</para>
+      </listitem>
+
+      <listitem>
+        <para>Otherwise, watch for changes on "ELECTION/n_j", where j is the
+        largest sequence number such that j &lt; i and n_j is a znode in C;
+        </para>
+      </listitem>
+    </orderedlist>
+
+    <para>Note that the znode having no preceding znode on the list of
+    children does not imply that the creator of this znode is aware that it is
+    the current leader. Applications may consider creating a separate znode
+    to acknowledge that the leader has executed the leader procedure. </para>
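+
+    <para>To see why watching only the next znode down avoids the herd
+    effect, it can help to compute who watches whom for a given set of
+    children. The sketch below is illustrative only: seq and watch_map are
+    hypothetical names, the child names are assumed to end in a numeric
+    sequence suffix after "n_", and no ZooKeeper calls are made.</para>

```python
def seq(name):
    """Return the numeric sequence suffix of a name such as 'n_0000000010'."""
    return int(name.rsplit("_", 1)[1])


def watch_map(children):
    """Map each znode to the znode its creator watches (None for the leader)."""
    ordered = sorted(children, key=seq)
    watches = {}
    if ordered:
        watches[ordered[0]] = None  # smallest sequence number: the leader
    for previous, current in zip(ordered, ordered[1:]):
        watches[current] = previous  # watch the next znode down
    return watches


# Hypothetical children of ELECTION.
children = ["n_0000000012", "n_0000000010", "n_0000000011"]
for node, watched in watch_map(children).items():
    print(node, "watches", watched)

# The leader fails and its ephemeral znode goes away.
survivors = [c for c in children if c != "n_0000000010"]
for node, watched in watch_map(survivors).items():
    print(node, "watches", watched)
```

+    <para>Each znode is watched by at most one client, so the failure of the
+    leader notifies only the client that is next in line.</para>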
+  </section>
+  </chapter>
+</book>

+ 11 - 6
src/docs/src/documentation/content/xdocs/site.xml

@@ -32,12 +32,17 @@ See http://forrest.apache.org/docs/linking.html for more info.
 <site label="Hadoop" href="" xmlns="http://apache.org/forrest/linkmap/1.0">
 
   <docs label="Documentation"> 
-    <overview  label="Overview"           href="index.html" />
-    <started   label="Getting Started"    href="ext:api/started" />
-    <api       label="API Docs"           href="ext:api/index" />
-    <wiki      label="Wiki"               href="ext:wiki" />
-    <faq       label="FAQ"                href="ext:faq" />
-    <lists     label="Mailing Lists"      href="ext:lists" />
+    <welcome   label="Welcome"                href="index.html" />
+    <overview  label="Zookeeper Overview"     href="zookeeperOver.html" />
+    <started   label="Getting Started"        href="zookeeperStarted.html" />
+    <program   label="Programmer's Guide"     href="zookeeperProgrammers.html" />
+    <recipes   label="Recipes"		      href="recipes.html" />
+    <admin     label="Administrator's Guide"  href="zookeeperAdmin.html" />
+    <api       label="API Docs"               href="ext:api/index" />
+    <wiki      label="Wiki"                   href="ext:wiki" />
+    <faq       label="FAQ"                    href="ext:faq" />
+    <lists     label="Mailing Lists"          href="ext:lists" />
+    <other     label="Other Info"	      href="zookeeperOtherInfo.html" />
   </docs>
 
   <external-refs>

+ 827 - 0
src/docs/src/documentation/content/xdocs/zookeeperAdmin.xml

@@ -0,0 +1,827 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+  Copyright 2002-2004 The Apache Software Foundation
+
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+
+<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
+"http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
+<book id="bk_Admin">
+  <title>ZooKeeper Administrator's Guide</title>
+
+  <subtitle>A Guide to Deployment and Administration</subtitle>
+
+  <bookinfo>
+    <legalnotice>
+      <para>Licensed under the Apache License, Version 2.0 (the "License");
+      you may not use this file except in compliance with the License. You may
+      obtain a copy of the License at <ulink
+      url="http://www.apache.org/licenses/LICENSE-2.0">http://www.apache.org/licenses/LICENSE-2.0</ulink>.</para>
+
+      <para>Unless required by applicable law or agreed to in writing,
+      software distributed under the License is distributed on an "AS IS"
+      BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+      implied. See the License for the specific language governing permissions
+      and limitations under the License.</para>
+    </legalnotice>
+
+    <abstract>
+      <para>This document contains information about deploying, administering
+      and maintaining ZooKeeper. It also discusses best practices and common
+      problems.</para>
+
+      <para>$Revision: 1.7 $ $Date: 2008/09/19 05:29:31 $</para>
+    </abstract>
+  </bookinfo>
+
+  <chapter id="ch_deployment">
+    <title>Deployment</title>
+
+    <para>This chapter contains information about deploying Zookeeper and
+    covers these topics:</para>
+
+    <itemizedlist>
+      <listitem>
+        <para><xref linkend="sc_systemReq"/></para>
+      </listitem>
+
+      <listitem>
+        <para><xref linkend="sc_zkMulitServerSetup"/></para>
+      </listitem>
+
+      <listitem>
+        <para><xref linkend="sc_singleAndDevSetup"/></para>
+      </listitem>
+    </itemizedlist>
+
+    <para>The first two sections assume you are interested in installing
+    Zookeeper in a production environment such as a datacenter. The final
+    section covers situations in which you are setting up Zookeeper on a
+    limited basis - for evaluation, testing, or development - but not in a
+    production environment.</para>
+
+    <section id="sc_systemReq">
+      <title>System Requirements</title>
+
+      <para>ZooKeeper runs in Java, release 1.6 or greater, as a group of hosts
+      called a quorum. Three ZooKeeper hosts per quorum is the minimum
+      recommended quorum size. At Yahoo!, ZooKeeper is usually deployed on
+      dedicated RHEL boxes, with dual-core processors, 2GB of RAM, and 80GB
+      IDE hard drives.</para>
+    </section>
+
+    <section id="sc_zkMulitServerSetup">
+      <title>Clustered (Multi-Server) Setup</title>
+
+      <para>For reliable ZooKeeper service, you should deploy ZooKeeper in a
+      cluster known as a <firstterm>quorum</firstterm>. As long as a majority
+      of the quorum are up, the service will be available. Because Zookeeper
+      requires a majority <remark>[tbd: why?]</remark>, it is best to use an
+      odd number of machines. For example, with four machines ZooKeeper can
+      only handle the failure of a single machine; if two machines fail, the
+      remaining two machines do not constitute a majority. However, with five
+      machines ZooKeeper can handle the failure of two machines. </para>
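+
+      <para>The arithmetic behind the four-machine and five-machine examples
+      can be checked with a few lines of code. This is only a worked
+      calculation under the majority rule stated above; the function name
+      tolerated_failures is hypothetical.</para>

```python
def tolerated_failures(machines):
    """Return how many machines can fail while a majority remains up."""
    majority = machines // 2 + 1  # smallest group that is more than half
    return machines - majority


for machines in (3, 4, 5, 6):
    print(machines, "machines tolerate", tolerated_failures(machines), "failures")
```

+      <para>Adding a fourth machine to a three-machine quorum does not
+      increase the number of failures that can be tolerated, which is why an
+      odd number of machines is preferred.</para>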
+
+      <para>Here are the steps to set up a server that will be part of a
+      quorum. These steps should be performed on every host in the
+      quorum:</para>
+
+      <orderedlist>
+        <listitem>
+          <para>Install the Java JDK:</para>
+
+          <screen>$yinst -i jdk-1.6.0.00_3 -br test  <remark>[y! prop - replace with open equiv]</remark></screen>
+        </listitem>
+
+        <listitem>
+          <para>Set the Java heap size. This is very important to avoid
+          swapping, which will seriously degrade ZooKeeper performance. To
+          determine the correct value, run load tests and make sure you are well
+          below the usage limit that would cause you to swap. Be conservative
+          - use a maximum heap size of 3GB for a 4GB machine. <remark>[tbd:
+          where would they do this? Environment variable,
+          etc?]</remark></para>
+        </listitem>
+
+        <listitem>
+          <para>Install the Zookeeper Server Package:</para>
+
+          <screen>$ yinst install -nostart zookeeper_server <remark>[Y! prop - replace with open eq]</remark></screen>
+        </listitem>
+
+        <listitem>
+          <para>Create a configuration file. This file can be called anything.
+          Use the following settings as a starting point:</para>
+
+          <screen>
+tickTime=2000
+dataDir=/var/zookeeper/
+clientPort=2181
+initLimit=5
+syncLimit=2
+server.1=zoo1:2888
+server.2=zoo2:2888
+server.3=zoo3:2888</screen>
+
+          <para>You can find the meanings of these and other configuration
+          settings in the section <xref linkend="sc_configuration" />. A word
+          though about a few here:</para>
+
+          <para>Every machine that is part of the ZooKeeper quorum should know
+          about every other machine in the quorum. You accomplish this with
+          the series of lines of the form <emphasis
+          role="bold">server.id=host:port</emphasis>. The parameters <emphasis
+          role="bold">host</emphasis> and <emphasis
+          role="bold">port</emphasis> are straightforward. You attribute the
+          server id to each machine by creating a file named
+          <filename>myid</filename>, one for each server, which resides in
+          that server's data directory, as specified by the configuration file
+          parameter <emphasis role="bold">dataDir</emphasis>. The myid file
+          consists of a single line containing only the text of that machine's
+          id. So <filename>myid</filename> of server 1 would contain the text
+          "1" and nothing else. The id must be unique within the
+          quorum.</para>
+        </listitem>
+
+        <listitem>
+          <para>If your configuration file is set up, you can start
+          Zookeeper:</para>
+
+          <screen>$ java -cp zookeeper-dev.jar:java/lib/log4j-1.2.15.jar:conf \
+        org.apache.zookeeper.server.quorum.QuorumPeerMain zoo.cfg</screen>
+        </listitem>
+
+        <listitem>
+          <para>Test your deployment by connecting to the hosts:</para>
+
+          <itemizedlist>
+            <listitem>
+              <para>In Java, you can run the following command to execute
+              simple operations:<remark> [tbd: also, maybe give some of those
+              simple operations?]</remark></para>
+
+              <screen>$ java -cp zookeeper.jar:java/lib/log4j-1.2.15.jar:conf \
+      org.apache.zookeeper.ZooKeeperMain 127.0.0.1:2181</screen>
+            </listitem>
+
+            <listitem>
+              <para>In C, you can compile either the single threaded client or
+              the multithreaded client in the c subdirectory of the
+              ZooKeeper sources. This compiles the single threaded
+              client:</para>
+
+              <screen>$ make cli_st</screen>
+
+              <para>And this compiles the multithreaded client:</para>
+
+              <screen>$ make cli_mt</screen>
+            </listitem>
+          </itemizedlist>
+
+          <para>Running either program gives you a shell in which to execute
+          simple file-system-like operations. <remark>[tbd: again, sample
+          operations?]</remark> To connect to Zookeeper with the multithreaded
+          client, for example, you would run:</para>
+
+          <screen>$ cli_mt 127.0.0.1:2181</screen>
+        </listitem>
+      </orderedlist>
+    </section>
+
+    <section id="sc_singleAndDevSetup">
+      <title>Single Server and Developer Setup</title>
+
+      <para>If you want to set up ZooKeeper for development purposes, you will
+      probably want to set up a single server instance of ZooKeeper, and then
+      install either the Java or C client-side libraries and bindings on your
+      development machine.</para>
+
+      <para>The steps to setting up a single server instance are similar
+      to the above, except the configuration file is simpler. You can find the
+      complete instructions in the <ulink
+      url="zookeeperStarted.html#sc_InstallingSingleMode">Installing
+      and Running ZooKeeper in Single Server Mode</ulink> section of the
+      <ulink url="zookeeperStarted.html">ZooKeeper
+      Getting Started Guide</ulink>.</para>
+
+      <para>For information on installing the client side libraries, refer to
+      the <ulink
+      url="zookeeperProgrammers.html#Bindings">Bindings</ulink>
+      section of the <ulink
+      url="zookeeperProgrammers.html">Zookeeper
+      Programmer's Guide</ulink>.</para>
+    </section>
+  </chapter>
+
+  <chapter id="ch_administration">
+    <title>Administration</title>
+
+    <para>This chapter contains information about running and maintaining
+    ZooKeeper and covers these topics: <itemizedlist>
+        <listitem>
+          <para><xref linkend="sc_configuration"/></para>
+        </listitem>
+
+        <listitem>
+          <para><xref linkend="sc_zkCommands"/></para>
+        </listitem>
+
+        <listitem>
+          <para><xref linkend="sc_dataFileManagement"/></para>
+        </listitem>
+
+        <listitem>
+          <para><xref linkend="sc_commonProblems"/></para>
+        </listitem>
+
+        <listitem>
+          <para><xref linkend="sc_bestPractices"/></para>
+        </listitem>
+      </itemizedlist></para>
+
+
+      <section id="sc_configuration">
+        <title>Configuration Parameters</title>
+
+        <para>ZooKeeper's behavior is governed by the ZooKeeper configuration
+        file. This file is designed so that the exact same file can be used by
+        all the servers that make up a ZooKeeper service, assuming the disk
+        layouts are the same. If servers use different configuration files,
+        care must be taken to ensure that the list of servers in all of the
+        different configuration files matches.<remark> [tbd: reformat in
+        standard form, with legal values, etc]</remark></para>
+
+        <section id="sc_minimumConfiguration">
+          <title>Minimum Configuration</title>
+
+          <para>Here are the minimum configuration keywords that must be
+          defined in the configuration file:</para>
+
+          <variablelist>
+
+	    <varlistentry>
+              <term>clientPort</term>
+
+              <listitem>
+                <para>the port to listen for client connections; that is, the
+                port that clients attempt to connect to.</para>
+              </listitem>
+            </varlistentry>
+
+            <varlistentry>
+              <term>dataDir</term>
+
+              <listitem>
+                <para>the location where Zookeeper will store the in-memory
+                database snapshots and, unless specified otherwise, the
+                transaction log of updates to the database.</para>
+
+                <note>
+                  <para>Be careful where you put the transaction log. A
+                  dedicated transaction log device is key to consistent good
+                  performance. Putting the log on a busy device will adversely
+                  affect performance.</para>
+                </note>
+              </listitem>
+            </varlistentry>
+	    
+	    <varlistentry id="id_tickTime">
+              <term>tickTime</term>
+
+              <listitem>
+                <para>the length of a single tick, which is the basic time
+                unit used by ZooKeeper, as measured in milliseconds. It is
+                used to regulate heartbeats and timeouts. For example, the
+                minimum session timeout will be two ticks.</para>
+              </listitem>
+            </varlistentry>
+	    
+          </variablelist>
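+
+          <para>As an illustration only, a minimal configuration file that
+          sets these three keywords might look like the following. The values
+          shown (a 2000 millisecond tick, a data directory of
+          <filename>/var/zookeeper</filename>, and client port 2181) are
+          assumptions chosen for the example, not required settings; adjust
+          them for your environment.</para>
+
+          <programlisting>tickTime=2000
+dataDir=/var/zookeeper
+clientPort=2181</programlisting>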
+        </section>
+
+        <section id="sc_advancedConfiguration">
+          <title>Advanced Configuration</title>
+
+          <para>The configuration settings in this section are optional. You
+          can use them to further fine-tune the behavior of your ZooKeeper
+          servers. Some can also be set using Java system properties,
+          generally of the form <emphasis>zookeeper.keyword</emphasis>. The
+          exact system property, when available, is noted below.</para>
+
+          <variablelist>
+	  
+            <varlistentry>
+              <term>dataLogDir</term>
+
+              <listitem>
+                <para>(No Java system property)</para>
+
+                <para>This option will direct the machine to write the
+                transaction log to the <emphasis
+                role="bold">dataLogDir</emphasis> rather than the <emphasis
+                role="bold">dataDir</emphasis>. This allows a dedicated log
+                device to be used, and helps avoid competition between logging
+                and snapshots.</para>
+
+                <note>
+                  <para>Having a dedicated log device has a large impact on
+                  throughput and stable latencies. It is highly recommended to
+                  dedicate a log device and set <emphasis
+                  role="bold">dataLogDir</emphasis> to point to a directory on
+                  that device, and then make sure to point <emphasis
+                  role="bold">dataDir</emphasis> to a directory
+                  <emphasis>not</emphasis> residing on that device.</para>
+                </note>
+              </listitem>
+            </varlistentry>
+	    
+	     <varlistentry>
+              <term>globalOutstandingLimit</term>
+
+              <listitem>
+                <para>(Java system property: <emphasis
+                role="bold">zookeeper.globalOutstandingLimit</emphasis>)</para>
+
+                <para>Clients can submit requests faster than ZooKeeper can
+                process them, especially if there are a lot of clients. To
+                prevent ZooKeeper from running out of memory due to queued
+                requests, ZooKeeper will throttle clients so that there are no
+                more than globalOutstandingLimit outstanding requests in the
+                system. The default limit is 1,000.</para>
+              </listitem>
+            </varlistentry>
+	    
+            <varlistentry>
+              <term>preAllocSize</term>
+
+              <listitem>
+                <para>(Java system property: <emphasis
+                role="bold">zookeeper.preAllocSize</emphasis>)</para>
+
+                <para>To avoid seeks, ZooKeeper allocates space in the
+                transaction log file in blocks of preAllocSize kilobytes. The
+                default block size is 64M. One reason for changing the size of
+                the blocks is to reduce the block size if snapshots are taken
+                more often. (Also, see <emphasis
+                role="bold">snapCount</emphasis>).</para>
+              </listitem>
+            </varlistentry>
+
+            <varlistentry>
+              <term>snapCount</term>
+
+              <listitem>
+                <para>(Java system property: <emphasis
+                role="bold">zookeeper.snapCount</emphasis>)</para>
+
+                <para>ZooKeeper logs transactions to a transaction log. After
+                snapCount transactions are written to a log file, a snapshot
+                is started and a new transaction log file is started. The
+                default snapCount is 10,000.</para>
+              </listitem>
+            </varlistentry>
+
+            <varlistentry>
+              <term>traceFile</term>
+
+              <listitem>
+                <para>(Java system property: <emphasis
+                role="bold">requestTraceFile</emphasis>)</para>
+
+                <para>If this option is defined, requests will be logged
+                to a trace file named traceFile.year.month.day. Use of this
+                option provides useful debugging information, but will impact
+                performance. (Note: The system property has no zookeeper
+                prefix, and the configuration variable name is different from
+                the system property. Yes - it's not consistent, and it's
+                annoying.<remark> [tbd: is there any explanation for
+                this?]</remark>)</para>
+              </listitem>
+            </varlistentry>
+
+          </variablelist>
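+
+          <para>For example, a configuration that keeps snapshots and the
+          transaction log on separate devices might add the lines below. This
+          is a sketch: both paths are hypothetical, and it assumes
+          <filename>/txlog</filename> is mounted on a device dedicated to the
+          transaction log.</para>
+
+          <programlisting>dataDir=/var/zookeeper/data
+dataLogDir=/txlog/zookeeper</programlisting>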
+        </section>
+
+        <section id="sc_clusterOptions">
+          <title>Cluster Options</title>
+
+          <para>The options in this section are designed for use in quorums --
+          that is, when deploying clusters of servers.</para>
+
+          <variablelist>
+            <varlistentry>
+              <term>electionAlg</term>
+
+              <listitem>
+                <para>(No Java system property)</para>
+
+                <para>Election implementation to use. A value of "0"
+                corresponds to the original UDP-based version, "1" corresponds
+                to the non-authenticated UDP-based version of fast leader
+                election, "2" corresponds to the authenticated UDP-based
+                version of fast leader election, and "3" corresponds to the
+                TCP-based version of fast leader election.</para>
+              </listitem>
+            </varlistentry>
+
+            <varlistentry>
+              <term>electionPort</term>
+
+              <listitem>
+                <para>(No Java system property)</para>
+
+                <para>Port used for leader election. It is only used when the
+                election algorithm is not "0". When the election algorithm is
+                "0" a UDP port with the same port number as the port listed in
+                the <emphasis role="bold">server.num</emphasis> option will be
+                used. <remark>[tbd: should that be <emphasis
+                role="bold">server.id</emphasis>? Also, why isn't server.id
+                documented anywhere?]</remark></para>
+              </listitem>
+            </varlistentry>
+
+            <varlistentry>
+              <term>initLimit</term>
+
+              <listitem>
+                <para>(No Java system property)</para>
+
+                <para>Amount of time, in ticks (see <ulink
+                url="#id_tickTime">tickTime</ulink>), to allow followers to
+                connect and sync to a leader. Increase this value as needed
+                if the amount of data managed by ZooKeeper is large.</para>
+              </listitem>
+            </varlistentry>
+
+            <varlistentry>
+              <term>leaderServes</term>
+
+              <listitem>
+                <para>(Java system property: <emphasis
+                role="bold">zookeeper.leaderServes</emphasis>)</para>
+
+                <para>Determines whether the leader accepts client
+                connections. The leader machine coordinates updates. For
+                higher update throughput at the slight expense of read
+                throughput, the leader can be configured to not accept clients
+                and focus on coordination. The default value is "yes", which
+                means that a leader will accept client connections.
+                <remark>[tbd: how do you specify which server is the
+                leader?]</remark></para>
+
+                <note>
+                  <para>Turning on leader selection is highly recommended when
+                  you have more than three Zookeeper servers in a
+                  quorum.</para>
+                </note>
+              </listitem>
+            </varlistentry>
+
+            <varlistentry>
+              <term>server.x=[hostname]:nnnn, etc</term>
+
+              <listitem>
+                <para>(No Java system property)</para>
+
+                <para>The servers making up the ZooKeeper quorum. When the server
+                starts up, it determines which server it is by looking for the
+                file <filename>myid</filename> in the data directory.<remark>
+                [tbd: should we mention somewhere about creating this file,
+                myid, in the setup procedure?]</remark> That file contains the
+                server number, in ASCII, and it should match <emphasis
+                role="bold">x</emphasis> in <emphasis
+                role="bold">server.x</emphasis> on the left-hand side of this
+                setting.</para>
+
+                <para>The list of ZooKeeper servers used by the clients must
+                match the list of ZooKeeper servers that each ZooKeeper
+                server has.</para>
+
+                <para>The port numbers <emphasis role="bold">nnnn</emphasis>
+                in this setting are the <emphasis>electionPort</emphasis>
+                numbers of the servers (as opposed to clientPorts).
+                <remark>[tbd: is the next sentence an explanation of what the
+                election port is, or is it a description of a special case?]
+                </remark>If you want to test multiple servers on a single
+                machine, the individual choices of electionPort for each
+                server can be defined in each server's config files using the
+                line electionPort=xxxx to avoid clashes.</para>
+              </listitem>
+            </varlistentry>
+
+            <varlistentry>
+              <term>syncLimit</term>
+
+              <listitem>
+                <para>(No Java system property)</para>
+
+                <para>Amount of time, in ticks (see <ulink
+                url="#id_tickTime">tickTime</ulink>), to allow followers to
+                sync with ZooKeeper. If followers fall too far behind a
+                leader, they will be dropped. <remark>[tbd: is this a correct
+                rewording: if followers fall beyond this limit, they are
+                dropped?]</remark></para>
+              </listitem>
+            </varlistentry>
+          </variablelist>
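+
+          <para>Putting these options together, a configuration file for a
+          three-server quorum might look like the sketch below. The host names
+          (zoo1, zoo2, zoo3), the port 2888, and the limit values are
+          assumptions made for illustration. With this file, the
+          <filename>myid</filename> file in the data directory of zoo1 would
+          contain the single line 1, matching
+          <emphasis role="bold">server.1</emphasis>.</para>
+
+          <programlisting>tickTime=2000
+dataDir=/var/zookeeper
+clientPort=2181
+initLimit=5
+syncLimit=2
+server.1=zoo1:2888
+server.2=zoo2:2888
+server.3=zoo3:2888</programlisting>
+
+          <para>Because the limits are expressed in ticks, a 2000 millisecond
+          tick with initLimit=5 gives followers 10 seconds to connect and
+          sync to a leader, and syncLimit=2 gives them 4 seconds to stay in
+          sync.</para>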
+
+          <para></para>
+        </section>
+
+        <section>
+          <title>Unsafe Options</title>
+
+          <para>The following options can be useful, but be careful when you
+          use them. The risk of each is explained along with the explanation
+          of what the variable does.</para>
+
+          <variablelist>
+	  
+	  <varlistentry>
+              <term>forceSync</term>
+
+              <listitem>
+                <para>(Java system property: <emphasis
+                role="bold">zookeeper.forceSync</emphasis>)</para>
+
+                <para>Requires updates to be synced to the media of the
+                transaction log before finishing processing the update. If
+                this option is set to no, ZooKeeper will not require updates
+                to be synced to the media. <remark>[tbd: useful because...,
+                dangerous because...]</remark></para>
+              </listitem>
+            </varlistentry>
+
+            <varlistentry>
+              <term>jute.maxbuffer</term>
+
+              <listitem>
+                <para>(Java system property: <emphasis
+                role="bold">jute.maxbuffer</emphasis>)</para>
+
+                <para>This option can only be set as a Java system property.
+                There is no zookeeper prefix on it. It specifies the maximum
+                size of the data that can be stored in a znode. The default is
+                0xfffff, or just under 1M. If this option is changed, the
+                system property must be set on all servers and clients;
+                otherwise problems will arise. This is really a sanity check.
+                ZooKeeper is designed to store data on the order of kilobytes
+                in size.</para>
+              </listitem>
+            </varlistentry>
+	    
+            <varlistentry>
+              <term>skipACL</term>
+
+              <listitem>
+                <para>(Java system property: <emphasis
+                role="bold">zookeeper.skipACL</emphasis>)</para>
+
+                <para>Skips ACL checks. <remark>[tbd: when? where?]</remark>
+                This results in a boost in throughput, but opens up full
+                access to the data tree to everyone.</para>
+              </listitem>
+            </varlistentry>
+
+            
+          </variablelist>
+        </section>
+      </section>
+
+      <section id="sc_zkCommands">
+        <title>Zookeeper Commands: The Four Letter Words</title>
+
+        <para>Zookeeper responds to a small set of commands. Each command is composed of
+        four letters. You issue the commands to Zookeeper via telnet or nc, at
+        the client port.</para>
+
+        <variablelist>
+	
+	    <varlistentry>
+            <term>dump</term>
+
+            <listitem>
+              <para>Lists the outstanding sessions and ephemeral nodes. This
+              only works on the leader.</para>
+            </listitem>
+          </varlistentry>
+	  
+	    <varlistentry>
+            <term>kill</term>
+
+            <listitem>
+              <para>Shuts down the server. This must be issued from the
+              machine the Zookeeper server is running on.</para>
+            </listitem>
+          </varlistentry>
+	  
+          <varlistentry>
+            <term>ruok</term>
+
+            <listitem>
+              <para>Tests if server is running in a non-error state. The
+              server will respond with imok if it is running. Otherwise it
+              will not respond at all.</para>
+            </listitem>
+          </varlistentry>
+
+          <varlistentry>
+            <term>stat</term>
+
+            <listitem>
+              <para>Lists statistics about performance and connected
+              clients.</para>
+            </listitem>
+          </varlistentry>
+        </variablelist>
+
+        <para>Here's an example of the <emphasis role="bold">ruok</emphasis>
+        command:</para>
+
+        <screen>$ echo ruok | nc 127.0.0.1 5111
+
+imok
+</screen>
+      </section>
+
+      <section id="sc_monitoring">
+        <title>Monitoring</title>
+
+        <para><remark>[tbd: Patrick, Ben, et al: I believe the Message Broker
+        team does perform routine monitoring of Zookeeper. But I might be
+        wrong. To your knowledge, is there any monitoring of a Zookeeper
+        deployment that a Zookeeper sys admin will want to do, outside of
+        Yahoo?]</remark></para>
+      </section>
+
+    <section id="sc_dataFileManagement">
+      <title>Data File Management</title>
+
+      <para>ZooKeeper stores its data in a data directory and its transaction
+      log in a transaction log directory. By default these two directories are
+      the same. The server can (and should) be configured to store the
+      transaction log files in a directory separate from the data files.
+      Throughput increases and latency decreases when transaction logs reside
+      on a dedicated log device.</para>
+
+      <section>
+        <title>The Data Directory</title>
+
+        <para>This directory has two files in it:</para>
+
+        <itemizedlist>
+          <listitem>
+            <para><filename>myid</filename> - contains a single integer in
+            human readable ASCII text that represents the server id.</para>
+          </listitem>
+
+          <listitem>
+            <para><filename>snapshot.&lt;zxid&gt;</filename> - holds the fuzzy
+            snapshot of a data tree.</para>
+          </listitem>
+        </itemizedlist>
+
+        <para>Each ZooKeeper server has a unique id. This id is used in two
+        places: the <filename>myid</filename> file and the configuration file.
+        The <filename>myid</filename> file identifies the server that
+        corresponds to the given data directory. The configuration file lists
+        the contact information for each server identified by its server id.
+        When a ZooKeeper server instance starts, it reads its id from the
+        <filename>myid</filename> file and then, using that id, reads from the
+        configuration file, looking up the port on which it should
+        listen.</para>
+
+        <para>The <filename>snapshot</filename> files stored in the data
+        directory are fuzzy snapshots in the sense that during the time the
+        ZooKeeper server is taking the snapshot, updates are occurring to the
+        data tree. The suffix of the <filename>snapshot</filename> file names
+        is the <emphasis>zxid</emphasis>, the ZooKeeper transaction id, of the
+        last committed transaction at the start of the snapshot. Thus, the
+        snapshot includes a subset of the updates to the data tree that
+        occurred while the snapshot was in process. The snapshot, then, may
+        not correspond to any data tree that actually existed, and for this
+        reason we refer to it as a fuzzy snapshot. Still, ZooKeeper can
+        recover using this snapshot because it takes advantage of the
+        idempotent nature of its updates. By replaying the transaction log
+        against fuzzy snapshots, ZooKeeper gets the state of the system at the
+        end of the log.</para>
+      </section>
+
+      <section>
+        <title>The Log Directory</title>
+
+        <para>The Log Directory contains the ZooKeeper transaction logs.
+        Before any update takes place, ZooKeeper ensures that the transaction
+        that represents the update is written to non-volatile storage. A new
+        log file is started each time a snapshot is begun. The log file's
+        suffix is the first zxid written to that log.</para>
+      </section>
+
+      <section>
+        <title>File Management</title>
+
+        <para>The format of snapshot and log files does not change between
+        standalone ZooKeeper servers and different configurations of
+        replicated ZooKeeper servers. Therefore, you can pull these files from
+        a running replicated ZooKeeper server to a development machine with a
+        standalone ZooKeeper server for troubleshooting.</para>
+
+        <para>Using older log and snapshot files, you can look at the previous
+        state of ZooKeeper servers and even restore that state. The
+        LogFormatter class allows an administrator to look at the transactions
+        in a log.</para>
+
+        <para>The ZooKeeper server creates snapshot and log files, but never
+        deletes them. The retention policy of the data and log files is
+        implemented outside of the ZooKeeper server. The server itself only
+        needs the latest complete fuzzy snapshot and the log files from the
+        start of that snapshot. The PurgeTxnLog utility implements a simple
+        retention policy that administrators can use.</para>
+      </section>
+    </section>
+
+    <section id="sc_commonProblems">
+      <title>Things to Avoid</title>
+
+      <para>Here are some common problems you can avoid by configuring
+      ZooKeeper correctly:</para>
+
+      <variablelist>
+        <varlistentry>
+          <term>inconsistent lists of servers</term>
+
+          <listitem>
+            <para>The list of Zookeeper servers used by the clients must match
+            the list of ZooKeeper servers that each ZooKeeper server has.
+            Things work okay if the client list is a subset of the real list,
+            but things will really act strange if clients have a list of
+            ZooKeeper servers that are in different ZooKeeper clusters. Also,
+            the server lists in each Zookeeper server configuration file
+            should be consistent with one another. <remark>[tbd: I'm assuming
+            this last part is true. Is it?]</remark></para>
+          </listitem>
+        </varlistentry>
+
+        <varlistentry>
+          <term>incorrect placement of transaction log</term>
+
+          <listitem>
+            <para>The most performance critical part of ZooKeeper is the
+            transaction log. ZooKeeper syncs transactions to media before it
+            returns a response. A dedicated transaction log device is key to
+            consistent good performance. Putting the log on a busy device will
+            adversely affect performance. If you only have one storage device,
+            put trace files on NFS and increase the snapCount; it doesn't
+            eliminate the problem, but it should mitigate it.</para>
+          </listitem>
+        </varlistentry>
+
+        <varlistentry>
+          <term>incorrect Java heap size</term>
+
+          <listitem>
+            <para>You should take special care to set your Java max heap size
+            correctly. In particular, you should not create a situation in
+            which ZooKeeper swaps to disk. The disk is death to ZooKeeper.
+            Everything is ordered, so if processing one request swaps to
+            disk, all other queued requests will probably do the same.
+            DON'T SWAP.</para>
+
+            <para>Be conservative in your estimates: if you have 4G of RAM, do
+            not set the Java max heap size to 6G or even 4G. For example, it
+            is more likely you would use a 3G heap for a 4G machine, as the
+            operating system and the cache also need memory. The best and only
+            recommended practice for estimating the heap size your system needs
+            is to run load tests, and then make sure you are well below the
+            usage limit that would cause the system to swap.</para>
+          </listitem>
+        </varlistentry>
+      </variablelist>
+    </section>
+
+    <section id="sc_bestPractices">
+      <title>Best Practices</title>
+
+      <para>For best results, take note of the following list of good
+      Zookeeper practices. <remark>[tbd: I just threw this section in. Do we
+      have a list that is different from the "things to avoid"? If not, I can
+      easily remove this section.]</remark></para>
+    </section>
+  </chapter>
+</book>

+ 46 - 0
src/docs/src/documentation/content/xdocs/zookeeperOtherInfo.xml

@@ -0,0 +1,46 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+  Copyright 2002-2004 The Apache Software Foundation
+
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+
+<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
+"http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
+<book id="bk_OtherInfo">
+  <title>ZooKeeper</title>
+
+  <bookinfo>
+    <legalnotice>
+      <para>Licensed under the Apache License, Version 2.0 (the "License");
+      you may not use this file except in compliance with the License. You may
+      obtain a copy of the License at <ulink
+      url="http://www.apache.org/licenses/LICENSE-2.0">http://www.apache.org/licenses/LICENSE-2.0</ulink>.</para>
+
+      <para>Unless required by applicable law or agreed to in writing,
+      software distributed under the License is distributed on an "AS IS"
+      BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+      implied. See the License for the specific language governing permissions
+      and limitations under the License.</para>
+    </legalnotice>
+
+    <abstract>
+      <para> currently empty </para>
+    </abstract>
+  </bookinfo>
+
+  <chapter id="ch_placeholder">
+    <title>Other Info</title>
+    <para> currently empty </para>
+  </chapter>
+</book>

+ 437 - 0
src/docs/src/documentation/content/xdocs/zookeeperOver.xml

@@ -0,0 +1,437 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+  Copyright 2002-2004 The Apache Software Foundation
+
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+
+<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
+"http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
+<book id="bk_Overview">
+  <title>ZooKeeper</title>
+
+  <bookinfo>
+    <legalnotice>
+      <para>Licensed under the Apache License, Version 2.0 (the "License");
+      you may not use this file except in compliance with the License. You may
+      obtain a copy of the License at <ulink
+      url="http://www.apache.org/licenses/LICENSE-2.0">http://www.apache.org/licenses/LICENSE-2.0</ulink>.</para>
+
+      <para>Unless required by applicable law or agreed to in writing,
+      software distributed under the License is distributed on an "AS IS"
+      BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+      implied. See the License for the specific language governing permissions
+      and limitations under the License.</para>
+    </legalnotice>
+
+    <abstract>
+      <para>This document contains overview information about ZooKeeper. It
+      discusses design goals, key concepts, implementation, and
+      performance.</para>
+    </abstract>
+  </bookinfo>
+
+  <chapter id="ch_DesignOverview">
+    <title>ZooKeeper: A Distributed Coordination Service for Distributed
+    Applications</title>
+
+    <para>ZooKeeper is a distributed, open-source coordination service for
+    distributed applications. It exposes a simple set of primitives that
+    distributed applications can build upon to implement higher level services
+    for synchronization, configuration maintenance, and groups and naming. It
+    is designed to be easy to program to, and uses a data model styled after
+    the familiar directory tree structure of file systems. It runs in Java and
+    has bindings for both Java and C.</para>
+
+    <para>Coordination services are notoriously hard to get right. They are
+    especially prone to errors such as race conditions and deadlock. The
+    motivation behind ZooKeeper is to relieve distributed applications of the
+    responsibility of implementing coordination services from scratch.</para>
+
+    <section id="sc_designGoals">
+      <title>Design Goals</title>
+
+      <para><emphasis role="bold">ZooKeeper is simple.</emphasis> ZooKeeper
+      allows distributed processes to coordinate with each other through a
+      shared hierarchical namespace which is organized similarly to a standard
+      file system. The name space consists of data registers - called znodes,
+      in ZooKeeper parlance - and these are similar to files and directories.
+      Unlike a typical file system, which is designed for storage, ZooKeeper
+      data is kept in-memory, which means ZooKeeper can achieve high
+      throughput and low latency numbers.</para>
+
+      <para>The ZooKeeper implementation puts a premium on high performance,
+      highly available, strictly ordered access. The performance aspects of
+      ZooKeeper mean it can be used in large, distributed systems. The
+      reliability aspects keep it from being a single point of failure. The
+      strict ordering means that sophisticated synchronization primitives can
+      be implemented at the client.</para>
+
+      <para><emphasis role="bold">ZooKeeper is replicated.</emphasis> Like the
+      distributed processes it coordinates, ZooKeeper itself is intended to be
+      replicated over a set of machines called a quorum.</para>
+
+      <figure>
+        <title>ZooKeeper Service</title>
+
+        <mediaobject>
+          <imageobject>
+            <imagedata fileref="images/zkservice.jpg" />
+          </imageobject>
+        </mediaobject>
+      </figure>
+
+      <para>The servers that make up the ZooKeeper service must all know about
+      each other. They maintain an in-memory image of state, along with
+      transaction logs and snapshots in a persistent store. As long as a
+      majority of the servers are available, the ZooKeeper service will be
+      available.</para>
+
+      <para>Clients connect to a single ZooKeeper server. The client maintains
+      a TCP connection through which it sends requests, gets responses, gets
+      watch events, and sends heart beats. If the TCP connection to the server
+      breaks, the client will connect to a different server.</para>
+
+      <para><emphasis role="bold">ZooKeeper is ordered.</emphasis> ZooKeeper
+      stamps each update with a number that reflects the order of all
+      ZooKeeper transactions. Subsequent operations can use the order to
+      implement higher-level abstractions, such as synchronization
+      primitives.</para>
+
+      <para><emphasis role="bold">ZooKeeper is fast.</emphasis> It is
+      especially fast in "read-dominant" workloads. ZooKeeper applications run
+      on thousands of machines, and it performs best where reads are more
+      common than writes, at ratios of around 10:1.</para>
+    </section>
+
+    <section id="sc_dataModelNameSpace">
+      <title>Data model and the hierarchical namespace</title>
+
+      <para>The name space provided by ZooKeeper is much like that of a
+      standard file system. A name is a sequence of path elements separated by
+      a slash (/). Every node in ZooKeeper's name space is identified by a
+      path.</para>
+
+      <figure>
+        <title>ZooKeeper's Hierarchical Namespace</title>
+
+        <mediaobject>
+          <imageobject>
+            <imagedata fileref="images/zknamespace.jpg" />
+          </imageobject>
+        </mediaobject>
+      </figure>
+    </section>
+
+    <section>
+      <title>Nodes and ephemeral nodes</title>
+
+      <para>Unlike in standard file systems, each node in a ZooKeeper
+      namespace can have data associated with it as well as children. It is
+      like having a file system that allows a file to also be a directory.
+      (ZooKeeper was designed to store coordination data: status information,
+      configuration, location information, etc., so the data stored at each
+      node is usually small, in the byte to kilobyte range.) We use the term
+      <firstterm>znode</firstterm> to make it clear that we are talking about
+      ZooKeeper data nodes.</para>
+
+      <para>Znodes maintain a stat structure that includes version numbers for
+      data changes, ACL changes, and timestamps, to allow cache validations
+      and coordinated updates. Each time a znode's data changes, the version
+      number increases. For instance, whenever a client retrieves data it also
+      receives the version of the data.</para>
+
+      <para>The data stored at each znode in a namespace is read and written
+      atomically. Reads get all the data bytes associated with a znode and a
+      write replaces all the data. Each node has an Access Control List (ACL)
+      that restricts who can do what.</para>
+
+      <para>ZooKeeper also has the notion of ephemeral nodes. These znodes
+      exist as long as the session that created the znode is active. When the
+      session ends the znode is deleted. Ephemeral nodes are useful when you
+      want to implement <remark>[tbd]</remark>.</para>
+    </section>
+
+    <section>
+      <title>Conditional updates and watches</title>
+
+      <para>ZooKeeper supports the concept of <firstterm>watches</firstterm>.
+      Clients can set a watch on a znode. A watch will be triggered and
+      removed when the znode changes. When a watch is triggered the client
+      receives a packet saying that the znode has changed. And if the
+      connection between the client and one of the ZooKeeper servers is
+      broken, the client will receive a local notification. These can be used
+      to <remark>[tbd]</remark>.</para>
+    </section>
+
+    <section>
+      <title>Guarantees</title>
+
+      <para>ZooKeeper is very fast and very simple. Since its goal, though, is
+      to be a basis for the construction of more complicated services, such as
+      synchronization, it provides a set of guarantees. These are:</para>
+
+      <itemizedlist>
+        <listitem>
+          <para>Sequential Consistency - Updates from a client will be applied
+          in the order that they were sent.</para>
+        </listitem>
+
+        <listitem>
+          <para>Atomicity - Updates either succeed or fail. No partial
+          results.</para>
+        </listitem>
+
+        <listitem>
+          <para>Single System Image - A client will see the same view of the
+          service regardless of the server that it connects to.</para>
+        </listitem>
+      </itemizedlist>
+
+      <itemizedlist>
+        <listitem>
+          <para>Reliability - Once an update has been applied, it will persist
+          from that time forward until a client overwrites the update.</para>
+        </listitem>
+      </itemizedlist>
+
+      <itemizedlist>
+        <listitem>
+          <para>Timeliness - The client's view of the system is guaranteed to
+          be up-to-date within a certain time bound.</para>
+        </listitem>
+      </itemizedlist>
+
+      <para>For more information on these, and how they can be used, see
+      <remark>[tbd]</remark></para>
+    </section>
+
+    <section>
+      <title>Simple API</title>
+
+      <para>One of the design goals of ZooKeeper is to provide a very simple
+      programming interface. As a result, it supports only these
+      operations:</para>
+
+      <variablelist>
+        <varlistentry>
+          <term>create</term>
+
+          <listitem>
+            <para>creates a node at a location in the tree</para>
+          </listitem>
+        </varlistentry>
+
+        <varlistentry>
+          <term>delete</term>
+
+          <listitem>
+            <para>deletes a node</para>
+          </listitem>
+        </varlistentry>
+
+        <varlistentry>
+          <term>exists</term>
+
+          <listitem>
+            <para>tests if a node exists at a location</para>
+          </listitem>
+        </varlistentry>
+
+        <varlistentry>
+          <term>get data</term>
+
+          <listitem>
+            <para>reads the data from a node</para>
+          </listitem>
+        </varlistentry>
+
+        <varlistentry>
+          <term>set data</term>
+
+          <listitem>
+            <para>writes data to a node</para>
+          </listitem>
+        </varlistentry>
+
+        <varlistentry>
+          <term>get children</term>
+
+          <listitem>
+            <para>retrieves a list of children of a node</para>
+          </listitem>
+        </varlistentry>
+
+        <varlistentry>
+          <term>sync</term>
+
+          <listitem>
+            <para>waits for data to be propagated</para>
+          </listitem>
+        </varlistentry>
+      </variablelist>
+
+      <para>For a more in-depth discussion on these, and how they can be used
+      to implement higher level operations, please refer to
+      <remark>[tbd]</remark></para>
+    </section>
+
+    <section>
+      <title>Implementation</title>
+
+      <para><xref linkend="fg_zkComponents" /> shows the high-level components
+      of the ZooKeeper service. With the exception of the request processor,
+      <remark>[tbd: where does the request processor live?]</remark> each of
+      the servers that make up the ZooKeeper service replicates its own copy
+      of each of the components. <remark>[tbd: I changed the wording in this
+      sentence from the white paper. Can someone please make sure it is still
+      correct?]</remark></para>
+
+      <para><figure id="fg_zkComponents">
+          <title>ZooKeeper Components</title>
+
+          <mediaobject>
+            <imageobject>
+              <imagedata fileref="images/zkcomponents.jpg" />
+            </imageobject>
+          </mediaobject>
+        </figure></para>
+
+      <para>The replicated database is an in-memory database containing the
+      entire data tree. Updates are logged to disk for recoverability, and
+      writes are serialized to disk before they are applied to the in-memory
+      database.</para>
+
+      <para>Every ZooKeeper server services clients. Clients connect to
+      exactly one server to submit requests. Read requests are serviced from
+      the local replica of each server database. Requests that change the
+      state of the service, write requests, are processed by an agreement
+      protocol.</para>
+
+      <para>As part of the agreement protocol all write requests from clients
+      are forwarded to a single server, called the
+      <firstterm>leader</firstterm>. The rest of the ZooKeeper servers, called
+      <firstterm>followers</firstterm>, receive message proposals from the
+      leader and agree upon message delivery. The messaging layer takes care
+      of replacing leaders on failures and syncing followers with
+      leaders.</para>
+
+      <para>ZooKeeper uses a custom atomic messaging protocol. Since the
+      messaging layer is atomic, ZooKeeper can guarantee that the local
+      replicas never diverge. When the leader receives a write request, it
+      calculates what the state of the system is when the write is to be
+      applied and transforms this into a transaction that captures this new
+      state.</para>
+    </section>
+
+    <section>
+      <title>Uses</title>
+
+      <para>The programming interface to ZooKeeper is deliberately simple.
+      With it, however, you can implement higher order operations, such as
+      synchronization primitives, group membership, ownership, etc. Some
+      distributed applications have used it to: <remark>[tbd: add uses from
+      white paper and video presentation.]</remark> For more information, see
+      <remark>[tbd]</remark></para>
+    </section>
+
+    <section>
+      <title>Performance</title>
+
+      <para>ZooKeeper is designed to be highly performant. But is it? The
+      results of the ZooKeeper development team at Yahoo! Research indicate
+      that it is. (See <xref linkend="fg_zkPerfRW" />.) It is especially high
+      performance in applications where reads outnumber writes, since writes
+      involve synchronizing the state of all servers. (Reads outnumbering
+      writes is typically the case for a coordination service.)</para>
+
+      <para><figure id="fg_zkPerfRW">
+          <title>ZooKeeper Throughput as the Read-Write Ratio Varies</title>
+
+          <mediaobject>
+            <imageobject>
+              <imagedata fileref="images/zkperfRW.jpg" />
+            </imageobject>
+          </mediaobject>
+        </figure>Benchmarks also indicate that it is reliable. <xref
+      linkend="fg_zkPerfReliability" /> shows how a deployment responds to
+      various failures. The events marked in the figure are the
+      following:</para>
+
+      <orderedlist>
+        <listitem>
+          <para>Failure and recovery of a follower</para>
+        </listitem>
+
+        <listitem>
+          <para>Failure and recovery of a different follower</para>
+        </listitem>
+
+        <listitem>
+          <para>Failure of the leader</para>
+        </listitem>
+
+        <listitem>
+          <para>Failure and recovery of two followers</para>
+        </listitem>
+
+        <listitem>
+          <para>Failure of another leader</para>
+        </listitem>
+      </orderedlist>
+
+      <para><figure id="fg_zkPerfReliability">
+          <title>Reliability in the Presence of Errors</title>
+
+          <mediaobject>
+            <imageobject>
+              <imagedata fileref="images/zkperfreliability.jpg" />
+            </imageobject>
+          </mediaobject>
+        </figure></para>
+
+      <para>There are a few important observations from this graph. First, if
+      followers fail and recover quickly, then ZooKeeper is able to sustain a
+      high throughput despite the failure. Second, and maybe more importantly,
+      the leader election algorithm allows the system to recover fast enough
+      to prevent throughput from dropping substantially. In our observations,
+      ZooKeeper takes less than 200ms to elect a new leader. Third, as
+      followers recover, ZooKeeper is able to raise throughput again once they
+      start processing requests.</para>
+    </section>
+
+    <section>
+      <title>The ZooKeeper Project</title>
+
+      <para>ZooKeeper has been successfully used in industrial applications.
+      It is used at Yahoo! as the coordination and failure recovery service
+      for Yahoo! Message Broker, which is a highly scalable publish-subscribe
+      system managing thousands of topics for replication and data delivery.
+      It is used by the Fetching Service for Yahoo! crawler, where it also
+      manages failure recovery. And it is used by Hadoop On Demand (HOD),
+      which is an open source implementation of the map-reduce model of
+      computation. HOD uses ZooKeeper as a communications and control channel
+      between slave and master processes. (For more information, see the <ulink
+      url="http://hadoop.apache.org/core/">Hadoop</ulink> and <ulink
+      url="http://hadoop.apache.org/core/docs/current/hod.html">Hadoop on
+      Demand</ulink> open source projects on Apache.)</para>
+
+      <para>ZooKeeper itself is an open source project, under the Apache
+      Software Foundation. It is a subproject of Hadoop. All users and
+      developers are encouraged to join the community and contribute their
+      expertise. See the <ulink
+      url="http://hadoop.apache.org/zookeeper/">ZooKeeper Project on
+      Apache</ulink> for more information.</para>
+    </section>
+  </chapter>
+</book>

+ 1077 - 0
src/docs/src/documentation/content/xdocs/zookeeperProgrammers.xml

@@ -0,0 +1,1077 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+  Copyright 2002-2004 The Apache Software Foundation
+
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+
+<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
+"http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
+<book id="bk_programmersGuide">
+  <title>ZooKeeper Programmer's Guide</title>
+
+  <subtitle>Developing Distributed Applications that use ZooKeeper</subtitle>
+
+  <bookinfo>
+    <legalnotice>
+      <para>Licensed under the Apache License, Version 2.0 (the "License");
+      you may not use this file except in compliance with the License. You may
+      obtain a copy of the License at <ulink
+      url="http://www.apache.org/licenses/LICENSE-2.0">http://www.apache.org/licenses/LICENSE-2.0</ulink>.</para>
+
+      <para>Unless required by applicable law or agreed to in writing,
+      software distributed under the License is distributed on an "AS IS"
+      BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+      implied. See the License for the specific language governing permissions
+      and limitations under the License.</para>
+    </legalnotice>
+
+    <abstract>
+      <para>This guide contains detailed information about creating
+      distributed applications that use ZooKeeper. It discusses the basic
+      operations ZooKeeper supports, and how these can be used to build
+      higher-level abstractions. It contains solutions to common tasks, a
+      troubleshooting guide, and links to other information.</para>
+
+      <para>$Revision: 1.14 $ $Date: 2008/09/19 05:31:45 $</para>
+    </abstract>
+  </bookinfo>
+
+  <preface id="_introduction">
+    <title>Introduction</title>
+
+    <para>This document is a guide for developers wishing to create
+    distributed applications that take advantage of ZooKeeper's coordination
+    services. It contains conceptual and practical information.</para>
+
+    <para>The first four chapters of this guide present higher level
+    discussions of various ZooKeeper concepts. These are necessary both for an
+    understanding of how ZooKeeper works and for knowing how to work with it.
+    They do not contain source code, but they do assume a familiarity with the
+    problems associated with distributed computing. The chapters in this first
+    group are:</para>
+
+    <itemizedlist>
+      <listitem>
+        <para><xref linkend="ch_zkDataModel" /></para>
+      </listitem>
+
+      <listitem>
+        <para><xref linkend="ch_zkSessions" /></para>
+      </listitem>
+
+      <listitem>
+        <para><xref linkend="ch_zkWatches" /></para>
+      </listitem>
+
+      <listitem>
+        <para><xref linkend="ch_zkGuarantees" /></para>
+      </listitem>
+    </itemizedlist>
+
+    <para>The next four chapters of this guide provide practical programming
+    information. These are:</para>
+
+    <itemizedlist>
+      <listitem>
+        <para><xref linkend="ch_guideToZkOperations" /></para>
+      </listitem>
+
+      <listitem>
+        <para><xref linkend="ch_bindings" /></para>
+      </listitem>
+
+      <listitem>
+        <para><xref linkend="ch_programStructureWithExample" />
+        <remark>[tbd]</remark></para>
+      </listitem>
+
+      <listitem>
+        <para><xref linkend="ch_gotchas" /></para>
+      </listitem>
+    </itemizedlist>
+
+    <para>The book concludes with an <ulink
+    url="#apx_linksToOtherInfo">appendix</ulink> containing links to other
+    useful, ZooKeeper-related information.</para>
+
+    <para>Most of the information in this document is written to be accessible
+    as stand-alone reference material. However, before starting your first
+    ZooKeeper application, you should probably at least read the chapters on
+    the <ulink url="#ch_zkDataModel">ZooKeeper Data Model</ulink> and <ulink
+    url="#ch_guideToZkOperations">ZooKeeper Basic Operations</ulink>. Also,
+    the <ulink url="#ch_programStructureWithExample">Simple Programming
+    Example</ulink> <remark>[tbd]</remark> is helpful for understanding the
+    basic structure of a ZooKeeper client application.</para>
+  </preface>
+
+  <chapter id="ch_zkDataModel">
+    <title>The ZooKeeper Data Model</title>
+
+    <para>ZooKeeper has a hierarchical name space, much like a distributed file
+    system. The only difference is that each node in the namespace can have
+    data associated with it as well as children. It is like having a file
+    system that allows a file to also be a directory. Paths to nodes are
+    always expressed as canonical, absolute, slash-separated paths; there are
+    no relative references. Any Unicode character can be used in a path subject
+    to the following constraints:</para>
+
+    <itemizedlist>
+      <listitem>
+        <para>The null character (\u0000) cannot be part of a path name. (This
+        causes problems with the C binding.)</para>
+      </listitem>
+
+      <listitem>
+        <para>The following characters can't be used because they don't
+        display well, or render in confusing ways: \u0001 - \u0019 and \u007F
+        - \u009F.</para>
+      </listitem>
+
+      <listitem>
+        <para>The following characters are not allowed because <remark>[tbd:
+        do we need reasons?]</remark>: \ud800 - \uF8FFF, \uFFF0 - \uFFFF,
+        \uXFFFE - \uXFFFF (where X is a digit 1 - E), \uF0000 - \uFFFFF.</para>
+      </listitem>
+
+      <listitem>
+        <para>The "." character can be used as part of another name, but "."
+        and ".." cannot alone make up the whole name of a path location,
+        because ZooKeeper doesn't use relative paths. The following would be
+        invalid: "/a/b/./c" or "/a/b/../c".</para>
+      </listitem>
+
+      <listitem>
+        <para>The token "zookeeper" is reserved.</para>
+      </listitem>
+    </itemizedlist>
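+    <para>The constraints above can be sketched as a small path check. This
+    is an illustrative, self-contained sketch and not ZooKeeper's own
+    validation code: the class and method names are hypothetical, it covers
+    only the null character, the two control-character ranges, the "." and
+    ".." rule, and the reserved token, and it treats "zookeeper" as
+    off-limits for any path element, which is only one possible reading of
+    "reserved".</para>

```java
// Hypothetical helper; a sketch of the listed path rules, not ZooKeeper's validator.
final class ZnodePathCheck {
    private ZnodePathCheck() {}

    static boolean isValid(String path) {
        // Paths are absolute, so they must start with a slash.
        if (path == null || path.isEmpty() || path.charAt(0) != '/') {
            return false;
        }
        if (path.equals("/")) {
            return true; // the root itself
        }
        // A limit of -1 keeps trailing empty strings, so "/a/" and "/a//b" are rejected.
        for (String element : path.substring(1).split("/", -1)) {
            if (element.isEmpty() || element.equals(".") || element.equals("..")) {
                return false; // no empty elements and no relative references
            }
            if (element.equals("zookeeper")) {
                return false; // assumption: the reserved token may not be used at all
            }
            for (int i = 0; i < element.length(); i++) {
                char c = element.charAt(i);
                if (c == 0x0000) {
                    return false; // null character
                }
                if (c >= 0x0001 && c <= 0x0019) {
                    return false; // first range listed as displaying poorly
                }
                if (c >= 0x007F && c <= 0x009F) {
                    return false; // second range listed as displaying poorly
                }
            }
        }
        return true;
    }
}
```

+    <para>With this sketch, "/a/b/c" and "/a/.hidden" pass, while "/a/b/./c",
+    "/a/b/../c", and the relative path "a/b" are rejected.</para>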
+
+    <section id="sc_zkDataModel_znodes">
+      <title>ZNodes</title>
+
+      <para>Every node in a ZooKeeper tree is referred to as a
+      <firstterm>znode</firstterm>. Znodes maintain a stat structure that
+      includes version numbers for data changes and ACL changes. The stat
+      structure also has timestamps. The version number, together with the
+      timestamp, allows ZooKeeper to validate the cache and to coordinate
+      updates. Each time a znode's data changes, the version number increases.
+      For instance, whenever a client retrieves data, it also receives the
+      version of the data. And when a client performs an update or a delete,
+      it must supply the version of the data of the znode it is changing. If
+      the version it supplies doesn't match the actual version of the data,
+      the update will fail. (This behavior can be overridden. For more
+      information see... <remark>[tbd... reference here to the section
+      describing the special version number -1]</remark>)</para>
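+      <para>The version check described above can be modeled in memory as
+      follows. This is a minimal sketch of the idea, not the ZooKeeper client
+      API: the class, its methods, and the exception type are hypothetical,
+      and the value -1 is used as the "skip the check" version mentioned
+      above.</para>

```java
// In-memory model of a versioned update; not the ZooKeeper client API.
final class VersionedZnode {
    // Passing this value skips the version comparison.
    static final int ANY_VERSION = -1;

    private byte[] data;
    private int version = 0;

    VersionedZnode(byte[] initialData) {
        this.data = initialData.clone();
    }

    synchronized int getVersion() {
        return version;
    }

    synchronized byte[] getData() {
        return data.clone();
    }

    // Replaces the data only if expectedVersion matches; returns the new version.
    synchronized int setData(byte[] newData, int expectedVersion) {
        if (expectedVersion != ANY_VERSION && expectedVersion != version) {
            throw new IllegalStateException(
                "version mismatch: expected " + expectedVersion + " but znode is at " + version);
        }
        data = newData.clone();
        version++; // every successful data change increases the version
        return version;
    }
}
```

+      <para>A client that read version 0 can update once; a second update that
+      still supplies version 0 is rejected, because the first update already
+      moved the znode to version 1.</para>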
+
+      <note>
+        <para>In distributed application engineering, the word
+        <emphasis>node</emphasis> can refer to a generic host machine, a
+        server, a member of a quorum, a client process, etc. In the ZooKeeper
+        documentation, <emphasis>znodes</emphasis> refer to the data nodes.
+        <firstterm>Servers</firstterm> refer to machines that make up the
+        ZooKeeper service; <emphasis>quorum peers</emphasis> refer to the
+        servers that make up a quorum; client refers to any host or process
+        which uses a ZooKeeper service.</para>
+      </note>
+
+      <para>Znodes are the main entity that a programmer accesses. They have
+      several characteristics that are worth mentioning here.</para>
+
+      <section id="sc_zkDataMode_watches">
+        <title>Watches</title>
+
+        <para>Clients can set watches on znodes. Changes to that znode trigger
+        the watch and then clear the watch. When a watch triggers, ZooKeeper
+        sends the client a notification. More information about watches can be
+        found in the section
+        <ulink url="recipes.html#sc_recipes_Locks">
+        ZooKeeper Watches</ulink>.
+        <remark>[tbd: fix this link] [tbd: Ben there is note from to emphasize
+        that "it is queued". What is "it" and is what we have here
+        sufficient?]</remark></para>
+      </section>
+
+      <section>
+        <title>Data Access</title>
+
+        <para>The data stored at each znode in a namespace is read and written
+        atomically. Reads get all the data bytes associated with a znode and a
+        write replaces all the data. Each node has an Access Control List
+        (ACL) that restricts who can do what.</para>
+      </section>
+
+      <section>
+        <title>Ephemeral Nodes</title>
+
+        <para>ZooKeeper also has the notion of ephemeral nodes. These znodes
+        exist as long as the session that created the znode is active. When
+        the session ends the znode is deleted. Because of this behavior
+        ephemeral znodes are not allowed to have children.</para>
+      </section>
+
+      <section>
+        <title>Unique Naming</title>
+
+        <para>Finally, when you create a znode, you can request that ZooKeeper
+        append a monotonically increasing counter to the end of the path name
+        of the znode being created. This counter is unique to the parent
+        znode.</para>
+      </section>
+    </section>
+
+    <section id="sc_timeInZk">
+      <title>Time in ZooKeeper</title>
+
+      <para>ZooKeeper tracks time in multiple ways:</para>
+
+      <itemizedlist>
+        <listitem>
+          <para><emphasis role="bold">Zxid</emphasis></para>
+
+          <para>Every change to the ZooKeeper state receives a stamp in the
+          form of a <firstterm>zxid</firstterm> (ZooKeeper Transaction Id).
+          This exposes the total ordering of all changes to ZooKeeper. Each
+          change will have a unique zxid and if zxid1 is smaller than zxid2
+          then zxid1 happened before zxid2.</para>
+        </listitem>
+
+        <listitem>
+          <para><emphasis role="bold">Version numbers</emphasis></para>
+
+          <para>Every change to a node will cause an increase to one of the
+          version numbers of that node. The three version numbers are version
+          (number of changes to the data of a znode), cversion (number of
+          changes to the children of a znode), and aversion (number of changes
+          to the ACL of a znode).</para>
+        </listitem>
+
+        <listitem>
+          <para><emphasis role="bold">Ticks</emphasis></para>
+
+          <para>When using multi-server ZooKeeper, servers use ticks to define
+          timing of events such as status uploads, session timeouts,
+          connection timeouts between peers, etc. The tick time is only
+          indirectly exposed through the minimum session timeout (2 times the
+          tick time); if a client requests a session timeout less than the
+          minimum session timeout, the server will tell the client that the
+          session timeout is actually the minimum session timeout.</para>
+        </listitem>
+
+        <listitem>
+          <para><emphasis role="bold">Real time</emphasis></para>
+
+          <para>ZooKeeper doesn't use real time, or clock time, at all except
+          to put timestamps into the stat structure on znode creation and
+          znode modification.</para>
+        </listitem>
+      </itemizedlist>
+    </section>
+
+    <section id="sc_zkStatStructure">
+      <title>ZooKeeper Stat Structure</title>
+
+      <para>The Stat structure for each znode in ZooKeeper is made up of the
+      following fields:</para>
+
+      <itemizedlist>
+        <listitem>
+          <para><emphasis role="bold">czxid</emphasis></para>
+
+          <para>The zxid of the change that caused this znode to be
+          created.</para>
+        </listitem>
+
+        <listitem>
+          <para><emphasis role="bold">mzxid</emphasis></para>
+
+          <para>The zxid of the change that last modified this znode.</para>
+        </listitem>
+
+        <listitem>
+          <para><emphasis role="bold">ctime</emphasis></para>
+
+          <para>The time in milliseconds from epoch when this znode was
+          created.</para>
+        </listitem>
+
+        <listitem>
+          <para><emphasis role="bold">mtime</emphasis></para>
+
+          <para>The time in milliseconds from epoch when this znode was last
+          modified.</para>
+        </listitem>
+
+        <listitem>
+          <para><emphasis role="bold">version</emphasis></para>
+
+          <para>The number of changes to the data of this znode.</para>
+        </listitem>
+
+        <listitem>
+          <para><emphasis role="bold">cversion</emphasis></para>
+
+          <para>The number of changes to the children of this znode.</para>
+        </listitem>
+
+        <listitem>
+          <para><emphasis role="bold">aversion</emphasis></para>
+
+          <para>The number of changes to the ACL of this znode.</para>
+        </listitem>
+
+        <listitem>
+          <para><emphasis role="bold">ephemeralOwner</emphasis></para>
+
+          <para>The session id of the owner of this znode if the znode is an
+          ephemeral node. If it is not an ephemeral node, it will be
+          zero.</para>
+        </listitem>
+      </itemizedlist>
+    </section>
+  </chapter>
+
+  <chapter id="ch_zkSessions">
+    <title>ZooKeeper Sessions</title>
+
+    <para>When a client gets a handle to the ZooKeeper service, ZooKeeper
+    creates a ZooKeeper session, represented as a 64-bit number, that it
+    assigns to the client. If the client connects to a different ZooKeeper
+    server, it will send the session id as a part of the connection handshake.
+    As a security measure, the server creates a password for the session id
+    that any ZooKeeper server can validate. <remark>[tbd: note from Ben:
+    "perhaps capability is a better word." need clarification on that.]
+    </remark>The password is sent to the client with the session id when the
+    client establishes the session. The client sends this password with the
+    session id whenever it reestablishes the session with a new server.</para>
+
+    <para>One of the parameters to the ZooKeeper client library call to create
+    a ZooKeeper session is the session timeout in milliseconds. The client
+    sends a requested timeout, and the server responds with the timeout that it
+    can give the client. The current implementation requires that the timeout
+    be between 2 times the tickTime (as set in the server configuration) and
+    60 seconds.</para>
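+    <para>The timeout bounds above can be sketched as a simple clamp. This is
+    an illustration only: the class and method names are hypothetical, and
+    clamping an over-large request down to 60 seconds is an assumption, since
+    the text above only states the allowed range and the behavior for
+    requests below the minimum.</para>

```java
// Hypothetical helper modelling the stated bounds; not ZooKeeper server code.
final class SessionTimeouts {
    // Upper bound stated in the text: 60 seconds.
    static final int MAX_SESSION_TIMEOUT_MS = 60_000;

    private SessionTimeouts() {}

    // Returns the timeout the server would grant under the assumed clamping rule.
    static int negotiate(int requestedMs, int tickTimeMs) {
        if (tickTimeMs <= 0) {
            throw new IllegalArgumentException("tickTime must be positive");
        }
        int minMs = 2 * tickTimeMs; // lower bound: two ticks
        int capped = Math.min(requestedMs, MAX_SESSION_TIMEOUT_MS);
        // If two ticks exceed the upper bound, this sketch lets the minimum win.
        return Math.max(minMs, capped);
    }
}
```

+    <para>With a tickTime of 2000 ms, a request for 1000 ms is raised to 4000
+    ms, a request for 10000 ms is granted as asked, and a request for 120000
+    ms is reduced to 60000 ms under the assumption stated above.</para>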
+
+    <para>The session is kept alive by requests sent by the client. If the
+    session is idle for a period of time that would time out the session, the
+    client will send a PING request to keep the session alive. This PING
+    request not only allows the ZooKeeper server to know that the client is
+    still active, but it also allows the client to verify that its connection
+    to the ZooKeeper server is still active. The timing of the PING is
+    conservative enough to ensure reasonable time to detect a dead connection
+    and reconnect to a new server.</para>
+  </chapter>
+
+  <chapter id="ch_zkWatches">
+    <title>ZooKeeper Watches</title>
+
+    <para>All of the read operations in ZooKeeper - <emphasis
+    role="bold">getData()</emphasis>, <emphasis
+    role="bold">getChildren()</emphasis>, and <emphasis
+    role="bold">exists()</emphasis> - have the option of setting a watch as a
+    side effect. Here is ZooKeeper's definition of a watch: a watch event is a
+    one-time trigger, sent to the client that set the watch, which occurs when
+    the data for which the watch was set changes. There are three key points
+    to consider in this definition of a watch:</para>
+
+    <itemizedlist>
+      <listitem>
+        <para><emphasis role="bold">One-time trigger</emphasis></para>
+
+        <para>One watch event will be sent to the client when the data has changed.
+        For example, if a client does a getData("/znode1", true) and later the
+        data for /znode1 is changed or deleted, the client will get a watch
+        event for /znode1. If /znode1 changes again, no watch event will be
+        sent unless the client has done another read that sets a new
+        watch.</para>
+      </listitem>
+
+      <listitem>
+        <para><emphasis role="bold">Sent to the client</emphasis></para>
+
+        <para>This implies that an event is on the way to the client, but may
+        not reach the client before the successful return code to the change
+        operation reaches the client that initiated the change. Watches are
+        sent asynchronously to watchers. ZooKeeper provides an ordering
+        guarantee: a client will never see a change for which it has set a
+        watch until it first sees the watch event. Network delays or other
+        factors may cause different clients to see watches and return codes
+        from updates at different times. The key point is that everything seen
+        by the different clients will have a consistent order.</para>
+      </listitem>
+
+      <listitem>
+        <para><emphasis role="bold">The data for which the watch was
+        set</emphasis></para>
+
+        <para>This refers to the different ways a node can change. ZooKeeper
+        maintains two lists of watches: data watches and child watches.
+        getData() and exists() set data watches. getChildren() sets child
+        watches. Thus, setData() will trigger data watches for the znode being
+        set (assuming the set is successful). A successful create() will
+        trigger a data watch for the znode being created and a child watch for
+        the parent znode. A successful delete() will trigger both a data watch
+        and a child watch (since there can be no more children) for a znode
+        being deleted as well as a child watch for the parent znode.</para>
+      </listitem>
+    </itemizedlist>
+
+    <para>Watches are maintained locally at the ZooKeeper server to which the
+    client is connected. This allows watches to be lightweight to set,
+    maintain, and dispatch. It also means if a client connects to a different
+    server, the new server is not going to know about its watches. So, when a
+    client gets a disconnect event, it must consider that an implicit trigger
+    of all watches. When a client reconnects to a new server, the client
+    should re-set any watches that it is still interested in.</para>
+
+    <section id="sc_WatchGuarantees">
+      <title>What ZooKeeper Guarantees about Watches</title>
+
+      <para>With regard to watches, ZooKeeper maintains these
+      guarantees:</para>
+
+      <itemizedlist>
+        <listitem>
+          <para>Watches are ordered with respect to other events, other
+          watches, and asynchronous replies. The ZooKeeper client libraries
+          ensure that everything is dispatched in order.</para>
+        </listitem>
+      </itemizedlist>
+
+      <itemizedlist>
+        <listitem>
+          <para>A client will see a watch event for a znode it is watching
+          before seeing the new data that corresponds to that znode.</para>
+        </listitem>
+      </itemizedlist>
+
+      <itemizedlist>
+        <listitem>
+          <para>The order of watch events from ZooKeeper corresponds to the
+          order of the updates as seen by the ZooKeeper service.</para>
+        </listitem>
+      </itemizedlist>
+    </section>
+
+    <section id="sc_WatchRememberThese">
+      <title>Things to Remember about Watches</title>
+
+      <itemizedlist>
+        <listitem>
+          <para>Watches are one-time triggers; if you get a watch event and
+          you want to get notified of future changes, you must set another
+          watch.</para>
+        </listitem>
+      </itemizedlist>
+
+      <itemizedlist>
+        <listitem>
+          <para>Because watches are one-time triggers and there is latency
+          between getting the event and sending a new request to get a watch,
+          you cannot reliably see every change that happens to a node in
+          ZooKeeper. Be prepared to handle the case where the znode changes
+          multiple times between getting the event and setting the watch
+          again. (You may not care, but at least realize it may
+          happen.)</para>
+        </listitem>
+      </itemizedlist>
+
+      <itemizedlist>
+        <listitem>
+          <para>When you disconnect from a server (for example, when the
+          server fails), all of the watches you have registered are lost, so
+          you should treat this case as if all your watches were
+          triggered.</para>
+        </listitem>
+      </itemizedlist>
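+
+      <para>The points above can be illustrated with a small, self-contained
+      sketch. It is a toy in-memory model, not the ZooKeeper client API:
+      the names ToyZnode, Watcher.changed() and OneShotWatchDemo are
+      hypothetical and exist only for this illustration. It assumes a single
+      client and no network, and shows only that a one-time watch delivers
+      one event even when the value changes several times before the watch
+      is set again.</para>
+
+      <programlisting><![CDATA[
```java
import java.util.ArrayList;
import java.util.List;

// Toy in-memory model of a znode with one-time watches. This is NOT the
// ZooKeeper client API; every name here is hypothetical.
class ToyZnode {
    interface Watcher {
        void changed();
    }

    private int value;
    private final List<Watcher> watchers = new ArrayList<>();

    // Read the value and optionally leave a one-time watch behind.
    int getData(Watcher watcher) {
        if (watcher != null) {
            watchers.add(watcher);
        }
        return value;
    }

    // Update the value; each registered watch fires once and is then dropped.
    void setData(int newValue) {
        value = newValue;
        List<Watcher> toFire = new ArrayList<>(watchers);
        watchers.clear();
        for (Watcher w : toFire) {
            w.changed();
        }
    }
}

class OneShotWatchDemo {
    // Returns {number of watch events received, value seen when re-reading}.
    static int[] run() {
        ToyZnode node = new ToyZnode();
        int[] events = {0};
        ToyZnode.Watcher watcher = () -> events[0]++;

        node.getData(watcher); // initial read sets the watch
        node.setData(1);       // fires the watch: one event
        node.setData(2);       // no watch is set, so no event
        node.setData(3);       // still no event

        // Re-read and re-set the watch; intermediate values 1 and 2 were never observed.
        int latest = node.getData(watcher);
        return new int[] {events[0], latest};
    }

    public static void main(String[] args) {
        int[] result = run();
        System.out.println("events=" + result[0] + " latest=" + result[1]);
    }
}
```
+]]></programlisting>
+
+      <para>In this model the client receives a single event for three
+      updates and only learns the latest value when it reads again, which is
+      why a watch event is best treated as "something changed, go and look"
+      rather than as a change log.</para>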
+    </section>
+  </chapter>
+
+  <chapter id="ch_zkGuarantees">
+    <title>Consistency Guarantees</title>
+
+    <para>ZooKeeper is a high-performance, scalable service. Both read and
+    write operations are designed to be fast, though reads are faster than
+    writes. The reason for this is that in the case of reads, ZooKeeper can
+    serve older data, which in turn is due to ZooKeeper's consistency
+    guarantees:</para>
+
+    <variablelist>
+      <varlistentry>
+        <term>Sequential Consistency</term>
+
+        <listitem>
+          <para>Updates from a client will be applied in the order that they
+          were sent.</para>
+        </listitem>
+      </varlistentry>
+
+      <varlistentry>
+        <term>Atomicity</term>
+
+        <listitem>
+          <para>Updates either succeed or fail -- there are no partial
+          results.</para>
+        </listitem>
+      </varlistentry>
+
+      <varlistentry>
+        <term>Single System Image</term>
+
+        <listitem>
+          <para>A client will see the same view of the service regardless of
+          the server that it connects to.</para>
+        </listitem>
+      </varlistentry>
+
+      <varlistentry>
+        <term>Reliability</term>
+
+        <listitem>
+          <para>Once an update has been applied, it will persist from that
+          time forward until a client overwrites the update. This guarantee
+          has two corollaries:</para>
+
+          <orderedlist>
+            <listitem>
+              <para>If a client gets a successful return code, the update will
+              have been applied. On some failures (communication errors,
+              timeouts, etc.) the client will not know whether the update has
+              been applied. We take steps to minimize the failures, but the
+              guarantee holds only for successful return codes. (This is
+              called the <emphasis>monotonicity condition</emphasis> in Paxos.)</para>
+            </listitem>
+
+            <listitem>
+              <para>Any updates that are seen by the client, through a read
+              request or successful update, will never be rolled back when
+              recovering from server failures.</para>
+            </listitem>
+          </orderedlist>
+        </listitem>
+      </varlistentry>
+
+      <varlistentry>
+        <term>Timeliness</term>
+
+        <listitem>
+          <para>The client's view of the system is guaranteed to be up-to-date
+          within a certain time bound. (On the order of tens of seconds.)
+          Either system changes will be seen by a client within this bound, or
+          the client will detect a service outage.</para>
+        </listitem>
+      </varlistentry>
+    </variablelist>
+
+    <para>Using these consistency guarantees it is easy to build higher level
+    functions such as leader election, barriers, queues, and read/write
+    revocable locks solely at the ZooKeeper client (no additions needed to
+    ZooKeeper). See <ulink url="recipes.html">Recipes and Solutions</ulink>
+    for more details.</para>
+
+    <para><note>
+        <para>Sometimes developers mistakenly assume one other guarantee that
+        ZooKeeper does <emphasis>not</emphasis> in fact make. This is:</para>
+
+        <variablelist>
+          <varlistentry>
+            <term>Simultaneously Consistent Cross-Client Views</term>
+
+            <listitem>
+              <para>ZooKeeper does not guarantee that at every instant in
+              time, two different clients will have identical views of
+              ZooKeeper data. Due to factors like network delays, one client
+              may perform an update before another client gets notified of the
+              change. Consider the scenario of two clients, A and B. If client
+              A sets the value of a znode /a from 0 to 1, then tells client B
+              to read /a, client B may read the old value of 0, depending on
+              which server in the ZooKeeper quorum it is connected to. If it
+              is important that Client A and Client B read the same value,
+              Client B should call the <emphasis
+              role="bold">sync()</emphasis> method from the ZooKeeper API
+              before it performs its read.</para>
+
+              <para>So, ZooKeeper by itself doesn't guarantee instantaneous,
+              atomic synchronization across its quorum, but ZooKeeper
+              primitives can be used to construct higher level functions that
+              provide complete client synchronization. (For more information,
+              see the <ulink
+              url="recipes.html#sc_recipes_Locks">Locks</ulink>
+              <remark>[tbd: fix final link target]</remark> in <ulink
+              url="recipes.html">ZooKeeper Recipes</ulink>.
+              <remark>[tbd: fix final link target]</remark>).</para>
+            </listitem>
+          </varlistentry>
+        </variablelist>
+      </note></para>
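+
+    <para>The stale-read scenario in the note can be sketched with a toy
+    model. This is an illustration under simplifying assumptions, not the
+    ZooKeeper API or its replication protocol: ToyFollower,
+    enqueueFromLeader() and StaleReadDemo are hypothetical names, and the
+    "pending" queue merely stands in for updates that have been committed
+    elsewhere in the ensemble but not yet applied on the server that client B
+    is connected to.</para>
+
+    <programlisting><![CDATA[
```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of one server that lags behind the rest of the ensemble.
// NOT the ZooKeeper API; all names are hypothetical.
class ToyFollower {
    private int value = 0;
    private final Deque<Integer> pending = new ArrayDeque<>();

    // An update was committed elsewhere but has not been applied here yet.
    void enqueueFromLeader(int newValue) {
        pending.add(newValue);
    }

    // A plain read is served from local state, which may be behind.
    int read() {
        return value;
    }

    // Apply everything that was committed before this call, then return.
    void sync() {
        while (!pending.isEmpty()) {
            value = pending.poll();
        }
    }
}

class StaleReadDemo {
    // Returns {value read without sync, value read after sync}.
    static int[] run() {
        ToyFollower serverOfB = new ToyFollower();
        serverOfB.enqueueFromLeader(1);  // client A changed /a from 0 to 1

        int before = serverOfB.read();   // client B reads immediately: old value
        serverOfB.sync();                // client B asks its server to catch up
        int after = serverOfB.read();    // now the read reflects A's update
        return new int[] {before, after};
    }

    public static void main(String[] args) {
        int[] result = run();
        System.out.println("before sync=" + result[0] + " after sync=" + result[1]);
    }
}
```
+]]></programlisting>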
+  </chapter>
+
+  <chapter id="ch_bindings">
+    <title>Bindings</title>
+
+    <para>The ZooKeeper client libraries come in two languages: Java and C.
+    The following sections describe these.</para>
+
+    <section>
+      <title>Java Binding</title>
+
+      <para>There are two packages that make up the ZooKeeper Java binding:
+      <emphasis role="bold">org.apache.zookeeper</emphasis> and <emphasis
+      role="bold">org.apache.zookeeper.data</emphasis>. The rest of the
+      packages that make up ZooKeeper are used internally or are part of the
+      server implementation. The <emphasis
+      role="bold">org.apache.zookeeper.data</emphasis> package is made up of
+      generated classes that are used simply as containers.</para>
+
+      <para>The main class used by a ZooKeeper Java client is the <emphasis
+      role="bold">ZooKeeper</emphasis> class. Its two constructors differ only
+      by an optional session id and password. ZooKeeper supports session
+      recovery across instances of a process. A Java program may save its
+      session id and password to stable storage, restart, and recover the
+      session that was used by the earlier instance of the program.</para>
+
+      <para>When a ZooKeeper object is created, two threads are created as
+      well: an IO thread and an event thread. All IO happens on the IO thread
+      (using Java NIO). All event callbacks happen on the event thread.
+      Session maintenance such as reconnecting to ZooKeeper servers and
+      maintaining heartbeat is done on the IO thread. Responses for
+      synchronous methods are also processed in the IO thread. All responses
+      to asynchronous methods and watch events are processed on the event
+      thread. There are a few things to notice that result from this
+      design:</para>
+
+      <itemizedlist>
+        <listitem>
+          <para>All completions for asynchronous calls and watcher callbacks
+          will be made in order, one at a time. The caller can do any
+          processing they wish, but no other callbacks will be processed
+          during that time.</para>
+        </listitem>
+
+        <listitem>
+          <para>Callbacks do not block the processing of the IO thread or the
+          processing of the synchronous calls.</para>
+        </listitem>
+
+        <listitem>
+          <para>Synchronous calls may not return in the correct order. For
+          example, assume a client does the following processing: issues an
+          asynchronous read of node <emphasis role="bold">/a</emphasis> with
+          <emphasis>watch</emphasis> set to true, and then in the completion
+          callback of the read it does a synchronous read of <emphasis
+          role="bold">/a</emphasis>. (Maybe not good practice, but not illegal
+          either, and it makes for a simple example.)</para>
+
+          <para>Note that if there is a change to <emphasis
+          role="bold">/a</emphasis> between the asynchronous read and the
+          synchronous read, the client library will receive the watch event
+          saying <emphasis role="bold">/a</emphasis> changed before the
+          response for the synchronous read, but because the completion
+          callback is blocking the event queue, the synchronous read will
+          return with the new value of <emphasis role="bold">/a</emphasis>
+          before the watch event is processed.</para>
+        </listitem>
+      </itemizedlist>
+
+      <para>Finally, the rules associated with shutdown are straightforward:
+      once a ZooKeeper object is closed or receives a fatal event
+      (SESSION_EXPIRED and AUTH_FAILED), the ZooKeeper object becomes invalid,
+      the two threads shut down, and any further ZooKeeper calls throw
+      errors.</para>
+    </section>
+
+    <section>
+      <title>C Binding</title>
+
+      <para>The C binding has a single-threaded and a multi-threaded library.
+      The multi-threaded library is easiest to use and is most similar to the
+      Java API. This library will create an IO thread and an event dispatch
+      thread for handling connection maintenance and callbacks. The
+      single-threaded library allows ZooKeeper to be used in event driven
+      applications by exposing the event loop used in the multi-threaded
+      library.</para>
+
+      <para>The package includes two shared libraries: zookeeper_st and
+      zookeeper_mt. The former only provides the asynchronous APIs and
+      callbacks for integrating into the application's event loop. The only
+      reason this library exists is to support platforms where a
+      <emphasis>pthread</emphasis> library is not available or is unstable
+      (e.g., FreeBSD 4.x). In all other cases, application developers should
+      link with zookeeper_mt, as it includes support for both Sync and Async
+      API.</para>
+
+      <section>
+        <title>Installation</title>
+
+        <para>If you're building the client from a check-out from the Apache
+        repository, follow the steps outlined below. If you're building from a
+        project source package downloaded from apache, skip to step <emphasis
+        role="bold">3</emphasis>.</para>
+
+        <orderedlist>
+          <listitem>
+            <para>Run <command>ant compile_just</command> from the zookeeper
+            top level directory (<filename>.../trunk/zookeeper</filename>).
+            This will create a directory named "generated" under
+            <filename>zookeeper/c</filename>.</para>
+          </listitem>
+
+          <listitem>
+            <para>Change directory to <filename>zookeeper/c</filename> and
+            run <command>autoreconf -i</command> to bootstrap <emphasis
+            role="bold">autoconf</emphasis>, <emphasis
+            role="bold">automake</emphasis> and <emphasis
+            role="bold">libtool</emphasis>. Make sure you have <emphasis
+            role="bold">autoconf version 2.59</emphasis> or greater installed.
+            Skip to step <emphasis role="bold">4</emphasis>.</para>
+          </listitem>
+
+          <listitem>
+            <para>If you are building from a project source package,
+            unzip/untar the source tarball and cd to the
+            <filename>zookeeper-x.x.x/</filename> directory.</para>
+          </listitem>
+
+          <listitem>
+            <para>Run <command>./configure &lt;your-options&gt;</command> to
+            generate the makefile. Here are some of the options the <emphasis
+            role="bold">configure</emphasis> utility supports that can be
+            useful in this step:</para>
+
+            <itemizedlist>
+              <listitem>
+                <para><command>--enable-debug</command></para>
+
+                <para>Enables optimization and enables debug info compiler
+                options. (Disabled by default.)</para>
+              </listitem>
+
+              <listitem>
+                <para><command>--without-syncapi </command></para>
+
+                <para>Disables Sync API support; zookeeper_mt library won't be
+                built. (Enabled by default.)</para>
+              </listitem>
+
+              <listitem>
+                <para><command>--disable-static </command></para>
+
+                <para>Do not build static libraries. (Enabled by
+                default.)</para>
+              </listitem>
+
+              <listitem>
+                <para><command>--disable-shared</command></para>
+
+                <para>Do not build shared libraries. (Enabled by
+                default.)</para>
+              </listitem>
+            </itemizedlist>
+
+            <note>
+              <para>See INSTALL for general information about running
+              <emphasis role="bold">configure</emphasis>. <remark>[tbd: what
+              is INSTALL? a directory? a file?]</remark></para>
+            </note>
+          </listitem>
+
+          <listitem>
+            <para>Run <command>make</command> or <command>make
+            install</command> to build the libraries and install them.</para>
+          </listitem>
+
+          <listitem>
+            <para>To generate doxygen documentation for the ZooKeeper API, run
+            <command>make doxygen-doc</command>. All documentation will be
+            placed in a new subfolder named docs. By default, this command
+            only generates HTML. For information on other document formats,
+            run <command>./configure --help</command>.</para>
+          </listitem>
+        </orderedlist>
+      </section>
+
+      <section>
+        <title>Using the Client</title>
+
+        <para>You can test your client by running a zookeeper server (see
+        instructions on the project wiki page on how to run it) and connecting
+        to it using one of the cli applications that were built as part of the
+        installation procedure. cli_mt (multithreaded, built against
+        zookeeper_mt library) is shown in this example, but you could also use
+        cli_st (singlethreaded, built against zookeeper_st library):</para>
+
+        <para><programlisting>$ cli_mt zookeeper_host:9876</programlisting>
+        This is a client application that gives you a shell for executing
+        simple ZooKeeper commands. Once successfully started and connected to
+        the server it displays a shell prompt. You can now enter ZooKeeper
+        commands. For example, to create a node:</para>
+
+        <programlisting>&gt; create /my_new_node</programlisting>
+
+        <para>To verify that the node has been created, list the children of
+        the root node with <command>ls /</command>.</para>
+
+        <para>You should see a list of nodes that are children of the root node
+        "/". <remark>[tbd: document all the cli commands (I think this is
+        Ben's tbd? It's from sourceforge)]</remark></para>
+
+        <para>In order to be able to use the ZooKeeper API in your application
+        you have to remember to</para>
+
+        <orderedlist>
+          <listitem>
+            <para>Include the zookeeper header: #include
+            &lt;zookeeper/zookeeper.h&gt;</para>
+          </listitem>
+
+          <listitem>
+            <para>If you are building a multithreaded client, compile with
+            the -DTHREADED compiler flag to enable the multi-threaded version
+            of the library, and then link against the
+            <varname>zookeeper_mt</varname> library. If you are building a
+            single-threaded client, do not compile with -DTHREADED, and be
+            sure to link against the <varname>zookeeper_st</varname>
+            library.</para>
+          </listitem>
+        </orderedlist>
+
+        <para>Refer to <xref linkend="ch_programStructureWithExample"/> for examples of usage in Java and C.
+        <remark>[tbd: some kind of short tutorial would be helpful, I guess
+        (ben's tbd?) ][tbd: whatever the case, make sure that link points to something.]</remark></para>
+      </section>
+    </section>
+  </chapter>
+
+   <chapter id="ch_guideToZkOperations">
+    <title>Building Blocks: A Guide to ZooKeeper Operations</title>
+
+    <para><remark>[Engineering input needed. This is a new section. The below
+    is just placeholder, and was actually copied from the overview book. There
+    should probably be a subsection on each of those operations, with a little
+    bit of illustrative code for each op.] </remark></para>
+
+    <para>One of the design goals of ZooKeeper is to provide a very simple
+    programming interface. As a result, it supports only these
+    operations:</para>
+
+    <variablelist>
+      <varlistentry>
+        <term>create</term>
+
+        <listitem>
+          <para>creates a node at a location in the tree</para>
+        </listitem>
+      </varlistentry>
+
+      <varlistentry>
+        <term>delete</term>
+
+        <listitem>
+          <para>deletes a node</para>
+        </listitem>
+      </varlistentry>
+
+      <varlistentry>
+        <term>exists</term>
+
+        <listitem>
+          <para>tests if a node exists at a location</para>
+        </listitem>
+      </varlistentry>
+
+      <varlistentry>
+        <term>get data</term>
+
+        <listitem>
+          <para>reads the data from a node</para>
+        </listitem>
+      </varlistentry>
+
+      <varlistentry>
+        <term>set data</term>
+
+        <listitem>
+          <para>writes data to a node</para>
+        </listitem>
+      </varlistentry>
+
+      <varlistentry>
+        <term>get children</term>
+
+        <listitem>
+          <para>retrieves a list of children of a node</para>
+        </listitem>
+      </varlistentry>
+
+      <varlistentry>
+        <term>sync</term>
+
+        <listitem>
+          <para>waits for data to be propagated.</para>
+        </listitem>
+      </varlistentry>
+    </variablelist>
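+
+    <para>As a rough illustration of how these operations fit together, the
+    sketch below models the namespace as an in-memory map. It is not the
+    ZooKeeper client API: ToyTree and its method signatures are hypothetical,
+    paths are assumed to be absolute, and sessions, ACLs, versions and
+    watches are left out. The sync operation has no counterpart here because
+    the model keeps only a single copy of the data. The error checks (a
+    parent must exist before a child is created, and a node with children
+    cannot be deleted) mirror the behaviour of the real service.</para>
+
+    <programlisting><![CDATA[
```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy in-memory namespace. NOT the ZooKeeper API; all names are hypothetical.
class ToyTree {
    private final TreeMap<String, String> nodes = new TreeMap<>();

    ToyTree() {
        nodes.put("/", ""); // the root always exists
    }

    private static String parentOf(String path) {
        int slash = path.lastIndexOf('/');
        return slash == 0 ? "/" : path.substring(0, slash);
    }

    private void requireNode(String path) {
        if (!nodes.containsKey(path)) {
            throw new IllegalStateException("no node: " + path);
        }
    }

    // create: the node must not exist yet, and its parent must exist.
    void create(String path, String data) {
        if (nodes.containsKey(path)) {
            throw new IllegalStateException("node exists: " + path);
        }
        requireNode(parentOf(path));
        nodes.put(path, data);
    }

    // delete: refuse to remove the root or a node that still has children.
    void delete(String path) {
        requireNode(path);
        if (path.equals("/") || !getChildren(path).isEmpty()) {
            throw new IllegalStateException("cannot delete: " + path);
        }
        nodes.remove(path);
    }

    boolean exists(String path) {
        return nodes.containsKey(path);
    }

    String getData(String path) {
        requireNode(path);
        return nodes.get(path);
    }

    void setData(String path, String data) {
        requireNode(path);
        nodes.put(path, data);
    }

    // getChildren: names (not full paths) of the direct children.
    List<String> getChildren(String path) {
        requireNode(path);
        List<String> children = new ArrayList<>();
        for (String candidate : nodes.keySet()) {
            if (!candidate.equals("/") && parentOf(candidate).equals(path)) {
                children.add(candidate.substring(candidate.lastIndexOf('/') + 1));
            }
        }
        return children;
    }

    public static void main(String[] args) {
        ToyTree tree = new ToyTree();
        tree.create("/app", "config-v1");
        tree.create("/app/worker", "");
        tree.setData("/app", "config-v2");
        System.out.println(tree.getData("/app") + " " + tree.getChildren("/app"));
    }
}
```
+]]></programlisting>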
+  </chapter>
+  
+  <chapter id="ch_programStructureWithExample">
+    <title>Program Structure, with Simple Example</title>
+
+    <para><remark>[tbd]</remark></para>
+  </chapter>
+
+  <chapter id="ch_gotchas">
+    <title>Gotchas: Common Problems and Troubleshooting</title>
+
+    <para>So now you know ZooKeeper. It's fast, simple, your application
+    works, but wait ... something's wrong. Here are some pitfalls that
+    ZooKeeper users fall into:</para>
+
+    <orderedlist>
+      <listitem>
+        <para>If you are using watches, you must look for the connected watch
+        event. When a ZooKeeper client disconnects from a server, all the
+        watches are removed, so a client must treat the disconnect event as an
+        implicit trigger of watches. The easiest way to deal with this is to
+        act like the connected watch event is a watch trigger for all your
+        watches. The connected event makes a better trigger than the
+        disconnected event because you can access ZooKeeper and reestablish
+        watches when you are connected.</para>
+      </listitem>
+
+      <listitem>
+        <para>You must test ZooKeeper server failures. The ZooKeeper service
+        can survive failures as long as a majority of servers are active. The
+        question to ask is: can your application handle it? In the real world
+        a client's connection to ZooKeeper can break. (ZooKeeper server
+        failures and network partitions are common reasons for connection
+        loss.) The ZooKeeper client library takes care of recovering your
+        connection and letting you know what happened, but you must make sure
+        that you recover your state and any outstanding requests that failed.
+        Find out if you got it right in the test lab, not in production - test
+        with a ZooKeeper service made up of several servers and subject
+        them to reboots.</para>
+      </listitem>
+
+      <listitem>
+        <para>The list of ZooKeeper servers used by the client must match the
+        list of ZooKeeper servers that each ZooKeeper server has. Things can
+        work, although not optimally, if the client list is a subset of the
+        real list of ZooKeeper servers, but not if the client lists ZooKeeper
+        servers not in the ZooKeeper cluster.</para>
+      </listitem>
+
+      <listitem>
+        <para>Be careful where you put that transaction log. The most
+        performance-critical part of ZooKeeper is the transaction log.
+        ZooKeeper must sync transactions to media before it returns a
+        response. A dedicated transaction log device is key to consistent good
+        performance. Putting the log on a busy device will adversely affect
+        performance. If you only have one storage device, put trace files on
+        NFS and increase the snapshotCount; it doesn't eliminate the problem,
+        but it can mitigate it.</para>
+      </listitem>
+
+      <listitem>
+        <para>Set your Java max heap size correctly. It is very important to
+        <emphasis>avoid swapping.</emphasis> Going to disk unnecessarily will
+        almost certainly degrade your performance unacceptably. Remember, in
+        ZooKeeper, everything is ordered, so if one request hits the disk, all
+        other queued requests hit the disk.</para>
+
+        <para>To avoid swapping, try to set the heapsize to the amount of
+        physical memory you have, minus the amount needed by the OS and cache.
+        The best way to determine an optimal heap size for your configurations
+        is to <emphasis>run load tests</emphasis>. If for some reason you
+        can't, be conservative in your estimates and choose a number well
+        below the limit that would cause your machine to swap. For example, on
+        a 4G machine, a 3G heap is a conservative estimate to start
+        with.</para>
+      </listitem>
+    </orderedlist>
+  </chapter>
+
+  <appendix id="apx_linksToOtherInfo">
+    <title>Links to Other Information</title>
+
+    <para>Outside the formal documentation, there are several other sources of
+    information for ZooKeeper developers.</para>
+
+    <variablelist>
+      <varlistentry>
+        <term>ZooKeeper Whitepaper <remark>[tbd: find url]</remark></term>
+
+        <listitem>
+          <para>The definitive discussion of ZooKeeper design and performance,
+          by Yahoo! Research</para>
+        </listitem>
+      </varlistentry>
+
+      <varlistentry>
+        <term>API Reference <remark>[tbd: find url]</remark></term>
+
+        <listitem>
+          <para>The complete reference to the ZooKeeper API</para>
+        </listitem>
+      </varlistentry>
+
+      <varlistentry>
+        <term><ulink
+        url="http://us.dl1.yimg.com/download.yahoo.com/dl/ydn/zookeeper.m4v">ZooKeeper
+        Talk at the Hadoop Summit 2008</ulink></term>
+
+        <listitem>
+          <para>A video introduction to ZooKeeper, by Benjamin Reed of Yahoo!
+          Research</para>
+        </listitem>
+      </varlistentry>
+
+      <varlistentry>
+        <term><ulink
+        url="http://wiki.apache.org/hadoop/ZooKeeper/Tutorial">Barrier and
+        Queue Tutorial</ulink></term>
+
+        <listitem>
+          <para>The excellent Java tutorial by Flavio Junqueira, implementing
+          simple barriers and producer-consumer queues using ZooKeeper.</para>
+        </listitem>
+      </varlistentry>
+
+      <varlistentry>
+        <term><ulink
+        url="http://wiki.apache.org/hadoop/ZooKeeper/ZooKeeperArticles">ZooKeeper
+        - A Reliable, Scalable Distributed Coordination System</ulink></term>
+
+        <listitem>
+          <para>An article by Todd Hoff (07/15/2008)</para>
+        </listitem>
+      </varlistentry>
+
+      <varlistentry>
+        <term><ulink url="recipes.html">ZooKeeper Recipes [tbd: fix
+        linkend for apache site]</ulink></term>
+
+        <listitem>
+          <para>Pseudocode-level discussion of the implementation of various
+          synchronization solutions with ZooKeeper: Event Handles, Queues,
+          Locks, and Two-phase Commits.</para>
+        </listitem>
+      </varlistentry>
+
+      <varlistentry>
+        <term><remark>[tbd]</remark></term>
+
+        <listitem>
+          <para>Whatever good sources anyone can think of...</para>
+        </listitem>
+      </varlistentry>
+    </variablelist>
+  </appendix>
+</book>

+ 268 - 0
src/docs/src/documentation/content/xdocs/zookeeperStarted.xml

@@ -0,0 +1,268 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+  Copyright 2002-2004 The Apache Software Foundation
+
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+
+<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
+"http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd">
+<book id="bk_GettStartedGuide">
+  <title>ZooKeeper Getting Started Guide</title>
+
+  <bookinfo>
+    <legalnotice>
+      <para>Licensed under the Apache License, Version 2.0 (the "License");
+      you may not use this file except in compliance with the License. You may
+      obtain a copy of the License at <ulink
+      url="http://www.apache.org/licenses/LICENSE-2.0">http://www.apache.org/licenses/LICENSE-2.0</ulink>.</para>
+
+      <para>Unless required by applicable law or agreed to in writing,
+      software distributed under the License is distributed on an "AS IS"
+      BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+      implied. See the License for the specific language governing permissions
+      and limitations under the License.</para>
+    </legalnotice>
+
+    <abstract>
+      <para>This guide contains detailed information about creating
+      distributed applications that use ZooKeeper. It discusses the basic
+      operations ZooKeeper supports, and how these can be used to build
+      higher-level abstractions. It contains solutions to common tasks, a
+      troubleshooting guide, and links to other information.</para>
+    </abstract>
+  </bookinfo>
+
+  <chapter id="ch_GettingStarted">
+    <title>Getting Started: Coordinating Distributed Applications with
+      ZooKeeper</title>
+
+    <para>This document contains information to get you started quickly with
+    ZooKeeper. It is aimed primarily at developers hoping to try it out, and
+    contains simple installation instructions for a single ZooKeeper server, a
+    few commands to verify that it is running, and a simple programming
+    example. Finally, as a convenience, there are a few sections regarding
+    more complicated installations, for example running replicated
+    deployments, and optimizing the transaction log. However, for the complete
+    instructions for commercial deployments, please refer to the <ulink
+    url="zookeeperAdmin.html">ZooKeeper
+    Administrator's Guide</ulink>.</para>
+
+    <section id="sc_InstallingSingleMode">
+      <title>Installing and Running ZooKeeper in Single Server Mode</title>
+
+      <para>Setting up a ZooKeeper server in standalone mode is
+      straightforward. The server is contained in a single JAR file, so
+      installation consists of copying a JAR file and creating a
+      configuration.</para>
+
+      <note>
+        <para>ZooKeeper requires Java 1.5 or later.</para>
+      </note>
+
+      <para>[tbd: should we start w/ a word here about where to get the source,
+      exactly what to download, how to unpack it, and where to put it? Also,
+      does the user need to be in sudo, or can they be under their regular
+      login?]</para>
+
+      <para>Once you have downloaded the ZooKeeper source, cd to the root of
+      your ZooKeeper source, and run "ant jar". For example:<screen>$ cd ~/dev/zookeeper
+
+$ ~/dev/zookeeper/: ant jar</screen></para>
+
+      <para>This should generate a JAR file called zookeeper.jar. To start
+      ZooKeeper, compile and run zookeeper.jar. <emphasis>[tbd, some more
+      instruction here. Perhaps a command line? Are these two steps or
+      one?]</emphasis></para>
+
+      <para>To start ZooKeeper you need a configuration file. Here is a sample
+      file:</para>
+
+      <para><programlisting>tickTime=2000
+dataDir=/var/zookeeper/ 
+clientPort=2181
+</programlisting></para>
+
+      <para>This file can be called anything, but for the sake of this
+      discussion, call it <emphasis role="bold">zoo.cfg</emphasis>. Here are
+      the meanings for each of the fields:</para>
+
+      <variablelist>
+        <varlistentry>
+          <term><emphasis role="bold">tickTime</emphasis></term>
+
+          <listitem>
+            <para>the basic time unit in milliseconds used by ZooKeeper. It is
+            used to do heartbeats and the minimum session timeout will be
+            twice the tickTime.</para>
+          </listitem>
+        </varlistentry>
+      </variablelist>
+
+      <variablelist>
+        <varlistentry>
+          <term><emphasis role="bold">dataDir</emphasis></term>
+
+          <listitem>
+            <para>The location in which to store the in-memory database
+            snapshots and, unless specified otherwise, the transaction log of
+            updates to the database.</para>
+          </listitem>
+        </varlistentry>
+
+        <varlistentry>
+          <term><emphasis role="bold">clientPort</emphasis></term>
+
+          <listitem>
+            <para>The port on which to listen for client connections.</para>
+          </listitem>
+        </varlistentry>
+      </variablelist>
+
+      <para>Now that you have created the configuration file, you can start
+      ZooKeeper:</para>
+
+      <para><screen>java -cp zookeeper-dev.jar:java/lib/log4j-1.2.15.jar:conf org.apache.zookeeper.server.quorum.QuorumPeerMain zoo.cfg</screen></para>
+
+      <para>ZooKeeper logs messages using log4j -- more detail is available in
+      the <ulink url="zookeeperProgrammers.html#Logging">Logging</ulink>
+      section of the Programmer's Guide.<remark revision="include_tbd">[tbd:
+      real reference needed]</remark> You will see log messages coming to the
+      console and/or a log file depending on the log4j configuration.</para>
+
+      <para>The steps outlined here run ZooKeeper in standalone mode. There is
+      no replication, so if the ZooKeeper process fails, the service will go
+      down. This is fine for most development situations, but to run ZooKeeper
+      in replicated mode, please see <ulink
+      url="#sc_RunningReplicatedZooKeeper">Running Replicated
+      ZooKeeper</ulink>.</para>
+
+    </section>
+
+    <section id="sc_ConnectingToZooKeeper">
+      <title>Connecting to ZooKeeper</title>
+
+      <para>Once ZooKeeper is running, you have several options for connecting
+      to it:</para>
+
+      <itemizedlist>
+        <listitem>
+          <para><emphasis role="bold">Java</emphasis>: Use <command>java -cp
+          zookeeper.jar:java/lib/log4j-1.2.15.jar:conf
+          org.apache.zookeeper.ZooKeeperMain 127.0.0.1:2181</command></para>
+
+          <para>This lets you perform simple, file-like operations.</para>
+        </listitem>
+
+        <listitem>
+          <para><emphasis role="bold">C</emphasis>: Compile cli_mt
+          (multi-threaded) or cli_st (single-threaded) by running
+          <command>make cli_mt</command> or <command>make cli_st</command>
+          in the c subdirectory in the ZooKeeper sources.</para>
+
+          <para>You can run the program using <emphasis>LD_LIBRARY_PATH=.
+          cli_mt 127.0.0.1:2181</emphasis> or <emphasis>LD_LIBRARY_PATH=.
+          cli_st 127.0.0.1:2181</emphasis>. This will give you a simple shell
+          to execute file-system-like operations on ZooKeeper.</para>
+        </listitem>
+      </itemizedlist>
+    </section>
+
+    <section id="sc_ProgrammingToZooKeeper">
+      <title>Programming to ZooKeeper</title>
+
+      <para>ZooKeeper has Java bindings and C bindings. They are
+      functionally equivalent. The C bindings exist in two variants: single
+      threaded and multi-threaded. These differ only in how the messaging loop
+      is done. <remark>[tbd: what is the messaging loop? Do we talk about it
+      anywhere? is this too much info for a getting started guide?]</remark>
+      For more information, see the <ulink
+      url="zookeeperProgrammers.html#ch_programStructureWithExample.html">Programming
+      Examples in the ZooKeeper Programmer's Guide</ulink> for
+      sample code using the different APIs.</para>
+    </section>
+
+    <section id="sc_RunningReplicatedZooKeeper">
+      <title>Running Replicated ZooKeeper</title>
+
+      <para>Running ZooKeeper in standalone mode is convenient for evaluation,
+      some development, and testing. But in production, you should run
+      ZooKeeper in replicated mode. A replicated group of servers in the same
+      application is called a <emphasis>quorum</emphasis>, and in replicated
+      mode, all servers in the quorum have copies of the same configuration
+      file. The file is similar to the one used in standalone mode, but with a
+      few differences. Here is an example:</para>
+
+      <para><screen>tickTime=2000
+dataDir=/var/zookeeper/
+clientPort=2181
+initLimit=5
+syncLimit=2
+server.1=zoo1:2888
+server.2=zoo2:2888
+server.3=zoo3:2888</screen></para>
+
+      <para>The new entry, <emphasis role="bold">initLimit</emphasis>, is a
+      timeout ZooKeeper uses to limit the length of time the ZooKeeper
+      servers in the quorum have to connect to a leader. The entry <emphasis
+      role="bold">syncLimit</emphasis> limits how far out of date a server can
+      be from a leader. [TBD: someone please verify that the previous is
+      true.]</para>
+
+      <para>With both of these timeouts, you specify the unit of time using
+      <emphasis role="bold">tickTime</emphasis>. In this example, the timeout
+      for initLimit is 5 ticks at 2000 milliseconds a tick, or 10
+      seconds.</para>
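+
+      <para>The same arithmetic applies to syncLimit. The fragment below is a
+      sketch that annotates the example values with the resulting timeouts. It
+      assumes the configuration file is read as a standard Java properties
+      file, in which lines beginning with # are comments; if that does not
+      hold for your build, drop the comment lines.</para>
+
+      <para><programlisting>tickTime=2000
+# initLimit: 5 ticks x 2000 ms per tick = 10000 ms (10 seconds)
+initLimit=5
+# syncLimit: 2 ticks x 2000 ms per tick = 4000 ms (4 seconds)
+syncLimit=2
+</programlisting></para>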
+
+      <para>The entries of the form <emphasis>server.X</emphasis> list the
+      servers that make up the ZooKeeper service. When a server starts up,
+      it knows which server it is by looking for the file
+      <emphasis>myid</emphasis> in the data directory. That file contains the
+      server number, in ASCII.</para>
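+
+      <para>As an illustration only, assume each of the three hosts uses the
+      dataDir from the example above (/var/zookeeper/) and that the number in
+      each <emphasis>server.X</emphasis> entry is that server's number. Under
+      those assumptions, the <emphasis>myid</emphasis> files would look
+      roughly like this:</para>
+
+      <para><programlisting>host   file                  contents
+zoo1   /var/zookeeper/myid   1
+zoo2   /var/zookeeper/myid   2
+zoo3   /var/zookeeper/myid   3
+</programlisting></para>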
+
+      <para>Finally, note the "2888" port numbers after each server name.
+      These are the "electionPort" numbers of the servers (as opposed to
+      clientPorts), that is, ports for <remark>[tbd: feedback need: what are
+      these ports, exactly?]</remark>.</para>
+
+      <note>
+        <para>If you want to test multiple servers on a single machine, define
+        the electionPort for each server in that server's config file, using
+        the line <command>electionPort=xxxx</command> as a means of avoiding
+        clashes.</para>
+      </note>
+    </section>
+
+    <section>
+      <title>Other Optimizations</title>
+
+      <para>There are a couple of other configuration parameters that can
+      greatly increase performance:</para>
+
+      <itemizedlist>
+        <listitem>
+          <para>To get low latencies on updates, it is important to have a
+          dedicated transaction log directory. By default, transaction logs
+          are put in the same directory as the data snapshots and the
+          <emphasis>myid</emphasis> file. The dataLogDir parameter indicates a
+          different directory to use for the transaction logs.</para>
+        </listitem>
+
+        <listitem>
+          <para><remark>[tbd: feedback need: what is the other config param?
+          (I believe two are mentioned above.)]</remark></para>
+        </listitem>
+      </itemizedlist>
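+
+      <para>As a sketch of the dataLogDir setting described above, the
+      fragment below extends the sample configuration so that transaction
+      logs go to a separate directory. The path /var/zookeeper-txlog/ is a
+      hypothetical placeholder chosen for this example, not a default;
+      substitute whatever dedicated location suits your system.</para>
+
+      <para><programlisting>tickTime=2000
+dataDir=/var/zookeeper/
+dataLogDir=/var/zookeeper-txlog/
+clientPort=2181
+</programlisting></para>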
+    </section>
+  </chapter>
+</book>

BIN
src/docs/src/documentation/resources/images/architecture.gif


BIN
src/docs/src/documentation/resources/images/zkarch.jpg


BIN
src/docs/src/documentation/resources/images/zkcomponents.jpg


BIN
src/docs/src/documentation/resources/images/zknamespace.jpg


BIN
src/docs/src/documentation/resources/images/zkperfRW.jpg


BIN
src/docs/src/documentation/resources/images/zkperfreliability.jpg


BIN
src/docs/src/documentation/resources/images/zkservice.jpg

