浏览代码

ZOOKEEPER-3615: Provide formal specification and verification using TLA+ for Zab (#1690)

* push zab-tla into project

* update experiment

* modify experiment/README

* update data

* update experiment data

* fix bugs, miss broadcast info of proposalMsgsLog in NEWLEADER, modify LocalPrimaryOrder; update pdf and README

* add apache license to pass building, and updata experiment data

* add testVars in ZabWithQTest.tla, update README

* + handling of cepochSent in receiving NEWEPOCH to decrease meaningles… …

* add Apache License in tla file

* Update line number of actions from tla file in README

* project migration, update README, add test tla file by extending Zab.tla

* modify .cfg by adding license header, delete .aux and .tex

* upload zab-1.0-tla

* update ZabWithFLE, update results

* update zab-1.0

* fix bug, modify README

* modify test module

* Update spec: implement phase of recovery-sync in ZabWithFLEAndSYNC.tla

* update ZabWithFLEAndSYNC: Replace Value with recorder.nClientRequest, and modify actions related to TxnEqual and nClientRequest

* update ZabWithFLEAndSYNC: Replace Value with recorder.nClientRequest, and modify actions related to TxnEqual and nClientRequest

* update zabWithFLEAndSYNC: fix bug only occured in multiple servers(>=3), when follower may receive LEADERINFO twice.

* update zab pre-1.0 in paper DSN'2011

* update zookeeper-specifications: update framework and tla code
Binyu Huang 1 年之前
父节点
当前提交
df8a53a974

+ 1 - 1
zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/LearnerHandler.java

@@ -1031,7 +1031,7 @@ public class LearnerHandler extends ZooKeeperThread {
             // is the catch when our history older than learner and there is
             // no new txn since then. So we need an empty diff
             LOG.info(
-                "Sending TRUNC zxid=0x{}  for peer sid: {}",
+                "Sending DIFF zxid=0x{}  for peer sid: {}",
                 Long.toHexString(lastCommittedZxid),
                 getSid());
             queueOpPacket(Leader.DIFF, lastCommittedZxid);

+ 1233 - 0
zookeeper-specifications/protocol-spec/Zab.tla

@@ -0,0 +1,1233 @@
+-------------------------------- MODULE Zab ---------------------------------
+(* This is the formal specification for the Zab consensus algorithm,
+   in DSN'2011, which represents protocol specification in our work.*)
+EXTENDS Integers, FiniteSets, Sequences, Naturals, TLC
+-----------------------------------------------------------------------------
+\* The set of servers
+CONSTANT Server
+\* States of server
+CONSTANTS LOOKING, FOLLOWING, LEADING
+\* Zab states of server
+CONSTANTS ELECTION, DISCOVERY, SYNCHRONIZATION, BROADCAST
+\* Message types
+CONSTANTS CEPOCH, NEWEPOCH, ACKEPOCH, NEWLEADER, ACKLD, COMMITLD, PROPOSE, ACK, COMMIT
+\* [MaxTimeoutFailures, MaxTransactionNum, MaxEpoch, MaxRestarts]
+CONSTANT Parameters
+
+MAXEPOCH == 10
+NullPoint == CHOOSE p: p \notin Server
+Quorums == {Q \in SUBSET Server: Cardinality(Q)*2 > Cardinality(Server)}
+-----------------------------------------------------------------------------
+\* Variables that all servers use.
+VARIABLES state,          \* State of server, in {LOOKING, FOLLOWING, LEADING}.
+          zabState,       \* Current phase of server, in
+                          \* {ELECTION, DISCOVERY, SYNCHRONIZATION, BROADCAST}.
+          acceptedEpoch,  \* Epoch of the last LEADERINFO packet accepted,
+                          \* namely f.p in paper.
+          currentEpoch,   \* Epoch of the last NEWLEADER packet accepted,
+                          \* namely f.a in paper.
+          history,        \* History of servers: sequence of transactions,
+                          \* containing: [zxid, value, ackSid, epoch].
+          lastCommitted   \* Maximum index and zxid known to be committed,
+                          \* namely 'lastCommitted' in Leader. Starts from 0,
+                          \* and increases monotonically before restarting.
+
+\* Variables only used for leader.
+VARIABLES learners,       \* Set of servers leader connects.
+          cepochRecv,     \* Set of learners leader has received CEPOCH from.
+                          \* Set of record [sid, connected, epoch],
+                          \* where epoch means f.p from followers.
+          ackeRecv,       \* Set of learners leader has received ACKEPOCH from.
+                          \* Set of record 
+                          \* [sid, connected, peerLastEpoch, peerHistory],
+                          \* to record f.a and h(f) from followers.
+          ackldRecv,      \* Set of learners leader has received ACKLD from.
+                          \* Set of record [sid, connected].
+          sendCounter     \* Count of txns leader has broadcast.
+
+\* Variables only used for follower.
+VARIABLES connectInfo \* If follower has connected with leader.
+                      \* If follower lost connection, then null.
+
+\* Variable representing oracle of leader.
+VARIABLE  leaderOracle  \* Current oracle.
+
+\* Variables about network channel.
+VARIABLE  msgs       \* Simulates network channel.
+                     \* msgs[i][j] means the input buffer of server j 
+                     \* from server i.
+
+\* Variables only used in verifying properties.
+VARIABLES epochLeader,       \* Set of leaders in every epoch.
+          proposalMsgsLog,   \* Set of all broadcast messages.
+          violatedInvariants \* Check whether there are conditions 
+                             \* contrary to the facts.
+
+\* Variable used for recording critical data,
+\* to constrain state space or update values.
+VARIABLE  recorder \* Consists: members of Parameters and pc, values.
+                   \* Form is record: 
+                   \* [pc, nTransaction, maxEpoch, nTimeout, nRestart, nClientRequest]
+
+serverVars == <<state, zabState, acceptedEpoch, currentEpoch, 
+                history, lastCommitted>>
+
+leaderVars == <<learners, cepochRecv, ackeRecv, ackldRecv, 
+                sendCounter>>
+
+followerVars == connectInfo
+
+electionVars == leaderOracle
+
+msgVars == msgs
+
+verifyVars == <<proposalMsgsLog, epochLeader, violatedInvariants>>
+
+vars == <<serverVars, leaderVars, followerVars, electionVars, 
+          msgVars, verifyVars, recorder>>
+-----------------------------------------------------------------------------
+\* Return the maximum value from the set S
+Maximum(S) == IF S = {} THEN -1
+                        ELSE CHOOSE n \in S: \A m \in S: n >= m
+\* Return the minimum value from the set S
+Minimum(S) == IF S = {} THEN -1
+                        ELSE CHOOSE n \in S: \A m \in S: n <= m
+
+\* Check server state                       
+IsLeader(s)   == state[s] = LEADING
+IsFollower(s) == state[s] = FOLLOWING
+IsLooking(s)  == state[s] = LOOKING
+
+\* Check if s is a quorum
+IsQuorum(s) == s \in Quorums
+
+IsMyLearner(i, j) == j \in learners[i]
+IsMyLeader(i, j)  == connectInfo[i] = j
+HasNoLeader(i)    == connectInfo[i] = NullPoint
+HasLeader(i)      == connectInfo[i] /= NullPoint
+-----------------------------------------------------------------------------
+\* FALSE: zxid1 <= zxid2; TRUE: zxid1 > zxid2
+ZxidCompare(zxid1, zxid2) == \/ zxid1[1] > zxid2[1]
+                             \/ /\ zxid1[1] = zxid2[1]
+                                /\ zxid1[2] > zxid2[2]
+
+ZxidEqual(zxid1, zxid2) == zxid1[1] = zxid2[1] /\ zxid1[2] = zxid2[2]
+
+TxnZxidEqual(txn, z) == txn.zxid[1] = z[1] /\ txn.zxid[2] = z[2]
+
+TxnEqual(txn1, txn2) == /\ ZxidEqual(txn1.zxid, txn2.zxid)
+                        /\ txn1.value = txn2.value
+
+EpochPrecedeInTxn(txn1, txn2) == txn1.zxid[1] < txn2.zxid[1]
+-----------------------------------------------------------------------------
+\* Actions about recorder
+GetParameter(p) == IF p \in DOMAIN Parameters THEN Parameters[p] ELSE 0
+GetRecorder(p)  == IF p \in DOMAIN recorder   THEN recorder[p]   ELSE 0
+
+RecorderGetHelper(m) == (m :> recorder[m])
+RecorderIncHelper(m) == (m :> recorder[m] + 1)
+
+RecorderIncTimeout == RecorderIncHelper("nTimeout")
+RecorderGetTimeout == RecorderGetHelper("nTimeout")
+RecorderIncRestart == RecorderIncHelper("nRestart")
+RecorderGetRestart == RecorderGetHelper("nRestart")
+RecorderSetTransactionNum(pc) == ("nTransaction" :> 
+                                IF pc[1] = "LeaderProcessRequest" THEN
+                                    LET s == CHOOSE i \in Server: 
+                                        \A j \in Server: Len(history'[i]) >= Len(history'[j])                       
+                                    IN Len(history'[s])
+                                ELSE recorder["nTransaction"])
+RecorderSetMaxEpoch(pc)       == ("maxEpoch" :> 
+                                IF pc[1] = "LeaderProcessCEPOCH" THEN
+                                    LET s == CHOOSE i \in Server:
+                                        \A j \in Server: acceptedEpoch'[i] >= acceptedEpoch'[j]
+                                    IN acceptedEpoch'[s]
+                                ELSE recorder["maxEpoch"])
+RecorderSetRequests(pc)       == ("nClientRequest" :>
+                                IF pc[1] = "LeaderProcessRequest" THEN
+                                    recorder["nClientRequest"] + 1
+                                ELSE recorder["nClientRequest"] )
+RecorderSetPc(pc)      == ("pc" :> pc)
+RecorderSetFailure(pc) == CASE pc[1] = "Timeout"         -> RecorderIncTimeout @@ RecorderGetRestart
+                          []   pc[1] = "LeaderTimeout"   -> RecorderIncTimeout @@ RecorderGetRestart
+                          []   pc[1] = "FollowerTimeout" -> RecorderIncTimeout @@ RecorderGetRestart
+                          []   pc[1] = "Restart"         -> RecorderIncTimeout @@ RecorderIncRestart
+                          []   OTHER                     -> RecorderGetTimeout @@ RecorderGetRestart
+
+UpdateRecorder(pc) == recorder' = RecorderSetFailure(pc)      @@ RecorderSetTransactionNum(pc)
+                                  @@ RecorderSetMaxEpoch(pc)  @@ RecorderSetPc(pc) 
+                                  @@ RecorderSetRequests(pc)  @@ recorder
+UnchangeRecorder   == UNCHANGED recorder
+
+CheckParameterHelper(n, p, Comp(_,_)) == IF p \in DOMAIN Parameters 
+                                         THEN Comp(n, Parameters[p])
+                                         ELSE TRUE
+CheckParameterLimit(n, p) == CheckParameterHelper(n, p, LAMBDA i, j: i < j)
+
+CheckTimeout        == CheckParameterLimit(recorder.nTimeout,     "MaxTimeoutFailures")
+CheckTransactionNum == CheckParameterLimit(recorder.nTransaction, "MaxTransactionNum")
+CheckEpoch          == CheckParameterLimit(recorder.maxEpoch,     "MaxEpoch")
+CheckRestart        == /\ CheckTimeout 
+                       /\ CheckParameterLimit(recorder.nRestart,  "MaxRestarts")
+
+CheckStateConstraints == CheckTimeout /\ CheckTransactionNum /\ CheckEpoch /\ CheckRestart
+-----------------------------------------------------------------------------
+\* Actions about network
+PendingCEPOCH(i, j)    == /\ msgs[j][i] /= << >>
+                          /\ msgs[j][i][1].mtype = CEPOCH
+PendingNEWEPOCH(i, j)  == /\ msgs[j][i] /= << >>
+                          /\ msgs[j][i][1].mtype = NEWEPOCH
+PendingACKEPOCH(i, j)  == /\ msgs[j][i] /= << >>
+                          /\ msgs[j][i][1].mtype = ACKEPOCH
+PendingNEWLEADER(i, j) == /\ msgs[j][i] /= << >>
+                          /\ msgs[j][i][1].mtype = NEWLEADER
+PendingACKLD(i, j)     == /\ msgs[j][i] /= << >>
+                          /\ msgs[j][i][1].mtype = ACKLD
+PendingCOMMITLD(i, j)  == /\ msgs[j][i] /= << >>
+                          /\ msgs[j][i][1].mtype = COMMITLD
+PendingPROPOSE(i, j)   == /\ msgs[j][i] /= << >>
+                          /\ msgs[j][i][1].mtype = PROPOSE
+PendingACK(i, j)       == /\ msgs[j][i] /= << >>
+                          /\ msgs[j][i][1].mtype = ACK
+PendingCOMMIT(i, j)    == /\ msgs[j][i] /= << >>
+                          /\ msgs[j][i][1].mtype = COMMIT
+\* Add a message to msgs - add a message m to msgs.
+Send(i, j, m) == msgs' = [msgs EXCEPT ![i][j] = Append(msgs[i][j], m)]
+\* Remove a message from msgs - discard head of msgs.
+Discard(i, j) == msgs' = IF msgs[i][j] /= << >> THEN [msgs EXCEPT ![i][j] = Tail(msgs[i][j])]
+                                                ELSE msgs
+\* Combination of Send and Discard - discard head of msgs[j][i] and add m into msgs.
+Reply(i, j, m) == msgs' = [msgs EXCEPT ![j][i] = Tail(msgs[j][i]),
+                                       ![i][j] = Append(msgs[i][j], m)]
+\* Shuffle input buffer.
+Clean(i, j) == msgs' = [msgs EXCEPT ![j][i] = << >>, ![i][j] = << >>]   
+CleanInputBuffer(S) == msgs' = [s \in Server |-> 
+                                    [v \in Server |-> IF v \in S THEN << >>
+                                                      ELSE msgs[s][v] ] ]
+\* Leader broadcasts a message PROPOSE to all other servers in Q.
+\* Note: In paper, Q is fuzzy. We think servers who leader broadcasts NEWLEADER to
+\*       should receive every PROPOSE. So we consider ackeRecv as Q.
+\* Since we let ackeRecv = Q, there may exist some follower receiving COMMIT before
+\* COMMITLD, and zxid in COMMIT later than zxid in COMMITLD. To avoid this situation,
+\* if f \in ackeRecv but \notin ackldRecv, f should not receive COMMIT until 
+\* f \in ackldRecv and receives COMMITLD.
+Broadcast(i, m) ==
+        LET ackeRecv_quorum == {a \in ackeRecv[i]: a.connected = TRUE }
+            sid_ackeRecv == { a.sid: a \in ackeRecv_quorum }
+        IN msgs' = [msgs EXCEPT ![i] = [v \in Server |-> IF /\ v \in sid_ackeRecv
+                                                            /\ v \in learners[i] 
+                                                            /\ v /= i
+                                                         THEN Append(msgs[i][v], m)
+                                                         ELSE msgs[i][v] ] ]  
+\* Since leader decides to broadcasts message COMMIT when processing ACK, so
+\* we need to discard ACK and broadcast COMMIT.
+\* Here Q is ackldRecv, because we assume that f should not receive COMMIT until
+\* f receives COMMITLD.
+DiscardAndBroadcast(i, j, m) ==
+        LET ackldRecv_quorum == {a \in ackldRecv[i]: a.connected = TRUE }
+            sid_ackldRecv == { a.sid: a \in ackldRecv_quorum }
+        IN msgs' = [msgs EXCEPT ![j][i] = Tail(msgs[j][i]),
+                                ![i] = [v \in Server |-> IF /\ v \in sid_ackldRecv
+                                                            /\ v \in learners[i] 
+                                                            /\ v /= i
+                                                         THEN Append(msgs[i][v], m)
+                                                         ELSE msgs[i][v] ] ]  
+\* Leader broadcasts LEADERINFO to all other servers in cepochRecv.
+DiscardAndBroadcastNEWEPOCH(i, j, m) ==
+        LET new_cepochRecv_quorum == {c \in cepochRecv'[i]: c.connected = TRUE }
+            new_sid_cepochRecv == { c.sid: c \in new_cepochRecv_quorum }
+        IN msgs' = [msgs EXCEPT ![j][i] = Tail(msgs[j][i]),
+                                ![i] = [v \in Server |-> IF /\ v \in new_sid_cepochRecv
+                                                            /\ v \in learners[i] 
+                                                            /\ v /= i
+                                                         THEN Append(msgs[i][v], m)
+                                                         ELSE msgs[i][v] ] ]
+\* Leader broadcasts NEWLEADER to all other servers in ackeRecv.
+DiscardAndBroadcastNEWLEADER(i, j, m) ==
+        LET new_ackeRecv_quorum == {a \in ackeRecv'[i]: a.connected = TRUE }
+            new_sid_ackeRecv == { a.sid: a \in new_ackeRecv_quorum }
+        IN msgs' = [msgs EXCEPT ![j][i] = Tail(msgs[j][i]),
+                                ![i] = [v \in Server |-> IF /\ v \in new_sid_ackeRecv
+                                                            /\ v \in learners[i] 
+                                                            /\ v /= i
+                                                         THEN Append(msgs[i][v], m)
+                                                         ELSE msgs[i][v] ] ]
+\* Leader broadcasts COMMITLD to all other servers in ackldRecv.
+DiscardAndBroadcastCOMMITLD(i, j, m) ==
+        LET new_ackldRecv_quorum == {a \in ackldRecv'[i]: a.connected = TRUE }
+            new_sid_ackldRecv == { a.sid: a \in new_ackldRecv_quorum }
+        IN msgs' = [msgs EXCEPT ![j][i] = Tail(msgs[j][i]),
+                                ![i] = [v \in Server |-> IF /\ v \in new_sid_ackldRecv
+                                                            /\ v \in learners[i] 
+                                                            /\ v /= i
+                                                         THEN Append(msgs[i][v], m)
+                                                         ELSE msgs[i][v] ] ]
+-----------------------------------------------------------------------------
+\* Define initial values for all variables 
+InitServerVars == /\ state         = [s \in Server |-> LOOKING]
+                  /\ zabState      = [s \in Server |-> ELECTION]
+                  /\ acceptedEpoch = [s \in Server |-> 0]
+                  /\ currentEpoch  = [s \in Server |-> 0]
+                  /\ history       = [s \in Server |-> << >>]
+                  /\ lastCommitted = [s \in Server |-> [ index |-> 0,
+                                                         zxid  |-> <<0, 0>> ] ]
+
+InitLeaderVars == /\ learners       = [s \in Server |-> {}]
+                  /\ cepochRecv     = [s \in Server |-> {}]
+                  /\ ackeRecv       = [s \in Server |-> {}]
+                  /\ ackldRecv      = [s \in Server |-> {}]
+                  /\ sendCounter    = [s \in Server |-> 0]
+
+InitFollowerVars == connectInfo = [s \in Server |-> NullPoint]
+
+InitElectionVars == leaderOracle = NullPoint
+
+InitMsgVars == msgs = [s \in Server |-> [v \in Server |-> << >>] ]
+
+InitVerifyVars == /\ proposalMsgsLog    = {}
+                  /\ epochLeader        = [i \in 1..MAXEPOCH |-> {} ]
+                  /\ violatedInvariants = [stateInconsistent    |-> FALSE,
+                                           proposalInconsistent |-> FALSE,
+                                           commitInconsistent   |-> FALSE,
+                                           ackInconsistent      |-> FALSE,
+                                           messageIllegal       |-> FALSE ]
+
+InitRecorder == recorder = [nTimeout       |-> 0,
+                            nTransaction   |-> 0,
+                            maxEpoch       |-> 0,
+                            nRestart       |-> 0,
+                            pc             |-> <<"Init">>,
+                            nClientRequest |-> 0]
+
+Init == /\ InitServerVars
+        /\ InitLeaderVars
+        /\ InitFollowerVars
+        /\ InitElectionVars
+        /\ InitVerifyVars
+        /\ InitMsgVars
+        /\ InitRecorder
+-----------------------------------------------------------------------------
+\* Utils in state switching
+FollowerShutdown(i) == 
+        /\ state'    = [state      EXCEPT ![i] = LOOKING]
+        /\ zabState' = [zabState   EXCEPT ![i] = ELECTION]
+        /\ connectInfo' = [connectInfo EXCEPT ![i] = NullPoint]
+
+LeaderShutdown(i) ==
+        /\ LET S == learners[i]
+           IN /\ state' = [s \in Server |-> IF s \in S THEN LOOKING ELSE state[s] ]
+              /\ zabState' = [s \in Server |-> IF s \in S THEN ELECTION ELSE zabState[s] ]
+              /\ connectInfo' = [s \in Server |-> IF s \in S THEN NullPoint ELSE connectInfo[s] ]
+              /\ CleanInputBuffer(S)
+        /\ learners'   = [learners   EXCEPT ![i] = {}]
+
+SwitchToFollower(i) ==
+        /\ state' = [state EXCEPT ![i] = FOLLOWING]
+        /\ zabState' = [zabState EXCEPT ![i] = DISCOVERY]
+
+SwitchToLeader(i) ==
+        /\ state' = [state EXCEPT ![i] = LEADING]
+        /\ zabState' = [zabState EXCEPT ![i] = DISCOVERY]
+        /\ learners' = [learners EXCEPT ![i] = {i}]
+        /\ cepochRecv' = [cepochRecv EXCEPT ![i] = { [ sid       |-> i,
+                                                       connected |-> TRUE,
+                                                       epoch     |-> acceptedEpoch[i] ] }]
+        /\ ackeRecv' = [ackeRecv EXCEPT ![i] = { [ sid           |-> i,
+                                                   connected     |-> TRUE,
+                                                   peerLastEpoch |-> currentEpoch[i],
+                                                   peerHistory   |-> history[i] ] }]
+        /\ ackldRecv' = [ackldRecv EXCEPT ![i] = { [ sid       |-> i,
+                                                     connected |-> TRUE ] }]
+        /\ sendCounter' = [sendCounter EXCEPT ![i] = 0]
+
+RemoveCepochRecv(set, sid) ==
+        LET sid_cepochRecv == {s.sid: s \in set}
+        IN IF sid \notin sid_cepochRecv THEN set
+           ELSE LET info == CHOOSE s \in set: s.sid = sid
+                    new_info == [ sid       |-> sid,
+                                  connected |-> FALSE,
+                                  epoch     |-> info.epoch ]
+                IN (set \ {info}) \union {new_info}
+
+RemoveAckeRecv(set, sid) ==
+        LET sid_ackeRecv == {s.sid: s \in set}
+        IN IF sid \notin sid_ackeRecv THEN set
+           ELSE LET info == CHOOSE s \in set: s.sid = sid
+                    new_info == [ sid |-> sid,
+                                  connected |-> FALSE,
+                                  peerLastEpoch |-> info.peerLastEpoch,
+                                  peerHistory   |-> info.peerHistory ]
+                IN (set \ {info}) \union {new_info}
+
+RemoveAckldRecv(set, sid) ==
+        LET sid_ackldRecv == {s.sid: s \in set}
+        IN IF sid \notin sid_ackldRecv THEN set
+           ELSE LET info == CHOOSE s \in set: s.sid = sid
+                    new_info == [ sid |-> sid,
+                                  connected |-> FALSE ]
+                IN (set \ {info}) \union {new_info}
+
+RemoveLearner(i, j) ==
+        /\ learners'   = [learners   EXCEPT ![i] = @ \ {j}] 
+        /\ cepochRecv' = [cepochRecv EXCEPT ![i] = RemoveCepochRecv(@, j) ]
+        /\ ackeRecv'   = [ackeRecv   EXCEPT ![i] = RemoveAckeRecv(@, j) ]
+        /\ ackldRecv'  = [ackldRecv  EXCEPT ![i] = RemoveAckldRecv(@, j) ]
+-----------------------------------------------------------------------------
+\* Actions of election
+UpdateLeader(i) ==
+        /\ IsLooking(i)
+        /\ leaderOracle /= i
+        /\ leaderOracle' = i
+        /\ SwitchToLeader(i)
+        /\ UNCHANGED <<acceptedEpoch, currentEpoch, history, lastCommitted, 
+                followerVars, verifyVars, msgVars>>
+        /\ UpdateRecorder(<<"UpdateLeader", i>>)
+
+FollowLeader(i) ==
+        /\ IsLooking(i)
+        /\ leaderOracle /= NullPoint
+        /\ \/ /\ leaderOracle = i
+              /\ SwitchToLeader(i)
+           \/ /\ leaderOracle /= i
+              /\ SwitchToFollower(i)
+              /\ UNCHANGED leaderVars
+        /\ UNCHANGED <<acceptedEpoch, currentEpoch, history, lastCommitted, 
+                electionVars, followerVars, verifyVars, msgVars>>
+        /\ UpdateRecorder(<<"FollowLeader", i>>)
+
+-----------------------------------------------------------------------------
+(* Actions of situation error. Situation error in protocol spec is different
+   from latter specs. This is for compressing state space, we focus on results
+   from external events (e.g. network partition, node failure, etc.), so we do
+   not need to add variables and actions about network conditions and node 
+   conditions. It is reasonable that we have action 'Restart' but no 'Crash',
+   because when a node does not execute any internal events after restarting, 
+   this is equivalent to executing a crash.
+*)
+
+\* Timeout between leader and follower.   
+Timeout(i, j) ==
+        /\ CheckTimeout \* test restrictions of timeout
+        /\ IsLeader(i)   /\ IsMyLearner(i, j)
+        /\ IsFollower(j) /\ IsMyLeader(j, i)
+        /\ LET newLearners == learners[i] \ {j}
+           IN \/ /\ IsQuorum(newLearners)  \* just remove this learner
+                 /\ RemoveLearner(i, j)
+                 /\ FollowerShutdown(j)
+                 /\ Clean(i, j)
+              \/ /\ ~IsQuorum(newLearners) \* leader switches to looking
+                 /\ LeaderShutdown(i)
+                 /\ UNCHANGED <<cepochRecv, ackeRecv, ackldRecv>>
+        /\ UNCHANGED <<acceptedEpoch, currentEpoch, history, lastCommitted,
+                       sendCounter, electionVars, verifyVars>>
+        /\ UpdateRecorder(<<"Timeout", i, j>>)
+
+Restart(i) ==
+        /\ CheckRestart \* test restrictions of restart
+        /\ \/ /\ IsLooking(i)
+              /\ UNCHANGED <<state, zabState, learners, followerVars, msgVars,
+                    cepochRecv, ackeRecv, ackldRecv>>
+           \/ /\ IsFollower(i)
+              /\ LET connectedWithLeader == HasLeader(i)
+                 IN \/ /\ connectedWithLeader
+                       /\ LET leader == connectInfo[i]
+                              newLearners == learners[leader] \ {i}
+                          IN 
+                          \/ /\ IsQuorum(newLearners)  \* leader remove learner i
+                             /\ RemoveLearner(leader, i)
+                             /\ FollowerShutdown(i)
+                             /\ Clean(leader, i)
+                          \/ /\ ~IsQuorum(newLearners) \* leader switches to looking
+                             /\ LeaderShutdown(leader)
+                             /\ UNCHANGED <<cepochRecv, ackeRecv, ackldRecv>>
+                    \/ /\ ~connectedWithLeader
+                       /\ FollowerShutdown(i)
+                       /\ CleanInputBuffer({i})
+                       /\ UNCHANGED <<learners, cepochRecv, ackeRecv, ackldRecv>>
+           \/ /\ IsLeader(i)
+              /\ LeaderShutdown(i)
+              /\ UNCHANGED <<cepochRecv, ackeRecv, ackldRecv>>
+        /\ lastCommitted' = [lastCommitted EXCEPT ![i] = [ index |-> 0,
+                                                           zxid  |-> <<0, 0>> ] ]
+        /\ UNCHANGED <<acceptedEpoch, currentEpoch, history,
+                       sendCounter, leaderOracle, verifyVars>>
+        /\ UpdateRecorder(<<"Restart", i>>)
+-----------------------------------------------------------------------------
+(* Establish connection between leader and follower. *)
+ConnectAndFollowerSendCEPOCH(i, j) ==
+        /\ IsLeader(i) /\ \lnot IsMyLearner(i, j)
+        /\ IsFollower(j) /\ HasNoLeader(j) /\ leaderOracle = i
+        /\ learners'   = [learners   EXCEPT ![i] = @ \union {j}]
+        /\ connectInfo' = [connectInfo EXCEPT ![j] = i]
+        /\ Send(j, i, [ mtype  |-> CEPOCH,
+                        mepoch |-> acceptedEpoch[j] ]) \* contains f.p
+        /\ UNCHANGED <<serverVars, electionVars, verifyVars, cepochRecv,
+                       ackeRecv, ackldRecv, sendCounter>>
+        /\ UpdateRecorder(<<"ConnectAndFollowerSendCEPOCH", i, j>>)
+
+CepochRecvQuorumFormed(i) == LET sid_cepochRecv == {c.sid: c \in cepochRecv[i]}
+                             IN IsQuorum(sid_cepochRecv)
+CepochRecvBecomeQuorum(i) == LET sid_cepochRecv == {c.sid: c \in cepochRecv'[i]}
+                             IN IsQuorum(sid_cepochRecv)
+
+UpdateCepochRecv(oldSet, sid, peerEpoch) ==
+        LET sid_set == {s.sid: s \in oldSet}
+        IN IF sid \in sid_set
+           THEN LET old_info == CHOOSE info \in oldSet: info.sid = sid
+                    new_info == [ sid       |-> sid,
+                                  connected |-> TRUE,
+                                  epoch     |-> peerEpoch ]
+                IN ( oldSet \ {old_info} ) \union {new_info}
+           ELSE LET follower_info == [ sid       |-> sid,
+                                       connected |-> TRUE,
+                                       epoch     |-> peerEpoch ]
+                IN oldSet \union {follower_info}
+
+\* Determine new e' in this round from a quorum of CEPOCH.
+DetermineNewEpoch(i) ==
+        LET epoch_cepochRecv == {c.epoch: c \in cepochRecv'[i]}
+        IN Maximum(epoch_cepochRecv) + 1
+
+(* Leader waits for receiving FOLLOWERINFO from a quorum including itself,
+   and chooses a new epoch e' as its own epoch and broadcasts NEWEPOCH. *)
+LeaderProcessCEPOCH(i, j) ==
+        /\ CheckEpoch  \* test restrictions of max epoch
+        /\ IsLeader(i)
+        /\ PendingCEPOCH(i, j)
+        /\ LET msg == msgs[j][i][1]
+               infoOk == IsMyLearner(i, j)
+           IN /\ infoOk
+              /\ \/ \* 1. has not broadcast NEWEPOCH
+                    /\ ~CepochRecvQuorumFormed(i)
+                    /\ \/ /\ zabState[i] = DISCOVERY
+                          /\ UNCHANGED violatedInvariants
+                       \/ /\ zabState[i] /= DISCOVERY
+                          /\ PrintT("Exception: CepochRecvQuorumFormed false," \o
+                               " while zabState not DISCOVERY.")
+                          /\ violatedInvariants' = [violatedInvariants 
+                                    EXCEPT !.stateInconsistent = TRUE]
+                    /\ cepochRecv' = [cepochRecv EXCEPT ![i] = UpdateCepochRecv(@, j, msg.mepoch) ]
+                    /\ \/ \* 1.1. cepochRecv becomes quorum, 
+                          \* then determine e' and broadcasts NEWEPOCH in Q. 
+                          /\ CepochRecvBecomeQuorum(i)
+                          /\ acceptedEpoch' = [acceptedEpoch EXCEPT ![i] = DetermineNewEpoch(i)]
+                          /\ LET m == [ mtype  |-> NEWEPOCH,
+                                        mepoch |-> acceptedEpoch'[i] ]
+                             IN DiscardAndBroadcastNEWEPOCH(i, j, m)
+                       \/ \* 1.2. cepochRecv still not quorum.
+                          /\ ~CepochRecvBecomeQuorum(i)
+                          /\ Discard(j, i)
+                          /\ UNCHANGED acceptedEpoch
+                 \/ \* 2. has broadcast NEWEPOCH
+                    /\ CepochRecvQuorumFormed(i)
+                    /\ cepochRecv' = [cepochRecv EXCEPT ![i] = UpdateCepochRecv(@, j, msg.mepoch) ]
+                    /\ Reply(i, j, [ mtype  |-> NEWEPOCH,
+                                     mepoch |-> acceptedEpoch[i] ])
+                    /\ UNCHANGED <<violatedInvariants, acceptedEpoch>>
+        /\ UNCHANGED <<state, zabState, currentEpoch, history, lastCommitted, learners, 
+                       ackeRecv, ackldRecv, sendCounter, followerVars,
+                       electionVars, proposalMsgsLog, epochLeader>>
+        /\ UpdateRecorder(<<"LeaderProcessCEPOCH", i, j>>)
+
+(* Follower receives LEADERINFO. If newEpoch >= acceptedEpoch, then follower 
+   updates acceptedEpoch and sends ACKEPOCH back, containing currentEpoch and
+   history. After this, zabState turns to SYNC. *)
+FollowerProcessNEWEPOCH(i, j) ==
+        /\ IsFollower(i)
+        /\ PendingNEWEPOCH(i, j)
+        /\ LET msg     == msgs[j][i][1]
+               infoOk  == IsMyLeader(i, j)
+               stateOk == zabState[i] = DISCOVERY
+               epochOk == msg.mepoch >= acceptedEpoch[i]
+           IN /\ infoOk
+              /\ \/ \* 1. Normal case
+                    /\ epochOk
+                    /\ \/ /\ stateOk
+                          /\ acceptedEpoch' = [acceptedEpoch EXCEPT ![i] = msg.mepoch]
+                          /\ LET m == [ mtype    |-> ACKEPOCH,
+                                        mepoch   |-> currentEpoch[i],
+                                        mhistory |-> history[i] ]
+                             IN Reply(i, j, m)
+                          /\ zabState' = [zabState EXCEPT ![i] = SYNCHRONIZATION]
+                          /\ UNCHANGED violatedInvariants
+                       \/ /\ ~stateOk
+                          /\ PrintT("Exception: Follower receives NEWEPOCH," \o
+                             " whileZabState not DISCOVERY.")
+                          /\ violatedInvariants' = [violatedInvariants 
+                                        EXCEPT !.stateInconsistent = TRUE]
+                          /\ Discard(j, i)
+                          /\ UNCHANGED <<acceptedEpoch, zabState>>
+                    /\ UNCHANGED <<followerVars, learners, cepochRecv, ackeRecv,
+                            ackldRecv, state>>
+                 \/ \* 2. Abnormal case - go back to election
+                    /\ ~epochOk
+                    /\ FollowerShutdown(i)
+                    /\ LET leader == connectInfo[i]
+                       IN /\ Clean(i, leader)
+                          /\ RemoveLearner(leader, i)
+                    /\ UNCHANGED <<acceptedEpoch, violatedInvariants>>
+        /\ UNCHANGED <<currentEpoch, history, lastCommitted, sendCounter,
+                    electionVars, proposalMsgsLog, epochLeader>>
+        /\ UpdateRecorder(<<"FollowerProcessNEWEPOCH", i, j>>)
+
+AckeRecvQuorumFormed(i) == LET sid_ackeRecv == {a.sid: a \in ackeRecv[i]}
+                           IN IsQuorum(sid_ackeRecv)
+AckeRecvBecomeQuorum(i) == LET sid_ackeRecv == {a.sid: a \in ackeRecv'[i]}
+                           IN IsQuorum(sid_ackeRecv)
+
+UpdateAckeRecv(oldSet, sid, peerEpoch, peerHistory) ==
+        LET sid_set == {s.sid: s \in oldSet}
+            follower_info == [ sid           |-> sid,
+                               connected     |-> TRUE,
+                               peerLastEpoch |-> peerEpoch,
+                               peerHistory   |-> peerHistory ]
+        IN IF sid \in sid_set 
+           THEN LET old_info == CHOOSE info \in oldSet: info.sid = sid
+                IN (oldSet \ {old_info}) \union {follower_info}
+           ELSE oldSet \union {follower_info}
+
+\* for checking invariants
+RECURSIVE SetPacketsForChecking(_,_,_,_,_,_)
+SetPacketsForChecking(set, src, ep, his, cur, end) ==
+        IF cur > end THEN set
+        ELSE LET m_proposal == [ source |-> src,
+                                 epoch  |-> ep,
+                                 zxid   |-> his[cur].zxid,
+                                 data   |-> his[cur].value ]
+             IN SetPacketsForChecking((set \union {m_proposal}), src, ep, his, cur + 1, end)
+    
+
+LastZxidOfHistory(his) == IF Len(his) = 0 THEN <<0, 0>>
+                          ELSE his[Len(his)].zxid
+
+\* TRUE: f1.a > f2.a or (f1.a = fa.a and f1.zxid >= f2.zxid)
+MoreResentOrEqual(ss1, ss2) == \/ ss1.currentEpoch > ss2.currentEpoch
+                               \/ /\ ss1.currentEpoch = ss2.currentEpoch
+                                  /\ ~ZxidCompare(ss2.lastZxid, ss1.lastZxid)
+
+\* Determine initial history Ie' in this round from a quorum of ACKEPOCH.
+DetermineInitialHistory(i) ==
+        LET set == ackeRecv'[i]
+            ss_set == { [ sid          |-> a.sid,
+                          currentEpoch |-> a.peerLastEpoch,
+                          lastZxid     |-> LastZxidOfHistory(a.peerHistory) ]
+                        : a \in set }
+            selected == CHOOSE ss \in ss_set: 
+                            \A ss1 \in (ss_set \ {ss}): MoreResentOrEqual(ss, ss1)
+            info == CHOOSE f \in set: f.sid = selected.sid
+        IN info.peerHistory
+
+RECURSIVE InitAcksidHelper(_,_)
+InitAcksidHelper(txns, src) == IF Len(txns) = 0 THEN << >>
+                               ELSE LET oldTxn == txns[1]
+                                        newTxn == [ zxid   |-> oldTxn.zxid,
+                                                    value  |-> oldTxn.value,
+                                                    ackSid |-> {src},
+                                                    epoch  |-> oldTxn.epoch ]
+                                    IN <<newTxn>> \o InitAcksidHelper( Tail(txns), src)
+
+\* Atomically let all txns in initial history contain self's acks.
+InitAcksid(i, his) == InitAcksidHelper(his, i)
+
+(* Leader waits for receiving ACKEPOPCH from a quorum, and determines initialHistory
+   according to history of whom has most recent state summary from them. After this,
+   leader's zabState turns to SYNCHRONIZATION. *)
+LeaderProcessACKEPOCH(i, j) ==
+        /\ IsLeader(i)
+        /\ PendingACKEPOCH(i, j)
+        /\ LET msg == msgs[j][i][1]
+               infoOk == IsMyLearner(i, j)
+           IN /\ infoOk
+              /\ \/ \* 1. has broadcast NEWLEADER 
+                    /\ AckeRecvQuorumFormed(i)
+                    /\ ackeRecv' = [ackeRecv EXCEPT ![i] = UpdateAckeRecv(@, j, 
+                                            msg.mepoch, msg.mhistory) ]
+                    /\ LET toSend == history[i] \* contains (Ie', Be')
+                           m == [ mtype    |-> NEWLEADER,
+                                  mepoch   |-> acceptedEpoch[i],
+                                  mhistory |-> toSend ]
+                           set_forChecking == SetPacketsForChecking({ }, i, 
+                                        acceptedEpoch[i], toSend, 1, Len(toSend))
+                       IN 
+                       /\ Reply(i, j, m) 
+                       /\ proposalMsgsLog' = proposalMsgsLog \union set_forChecking
+                    /\ UNCHANGED <<violatedInvariants, currentEpoch, history, 
+                                   zabState, epochLeader>>
+                 \/ \* 2. has not broadcast NEWLEADER
+                    /\ ~AckeRecvQuorumFormed(i)
+                    /\ \/ /\ zabState[i] = DISCOVERY
+                          /\ UNCHANGED violatedInvariants
+                       \/ /\ zabState[i] /= DISCOVERY
+                          /\ PrintT("Exception: AckeRecvQuorumFormed false," \o
+                             " while zabState not DISCOVERY.")
+                          /\ violatedInvariants' = [violatedInvariants EXCEPT 
+                                                    !.stateInconsistent = TRUE]
+                    /\ ackeRecv' = [ackeRecv EXCEPT ![i] = UpdateAckeRecv(@, j, 
+                                            msg.mepoch, msg.mhistory) ]
+                    /\ \/ \* 2.1. ackeRecv becomes quorum, determine Ie'
+                          \* and broacasts NEWLEADER in Q. (l.1.2 + l.2.1)
+                          /\ AckeRecvBecomeQuorum(i)
+                          /\ \* Update f.a
+                             LET newLeaderEpoch == acceptedEpoch[i] IN 
+                             /\ currentEpoch' = [currentEpoch EXCEPT ![i] = newLeaderEpoch]
+                             /\ epochLeader' = [epochLeader EXCEPT ![newLeaderEpoch] 
+                                                = @ \union {i} ] \* for checking invariants
+                          /\ \* Determine initial history Ie'
+                             LET initialHistory == DetermineInitialHistory(i) IN 
+                             history' = [history EXCEPT ![i] = InitAcksid(i, initialHistory) ]
+                          /\ \* Update zabState
+                             zabState' = [zabState EXCEPT ![i] = SYNCHRONIZATION]
+                          /\ \* Broadcast NEWLEADER with (e', Ie')
+                             LET toSend == history'[i] 
+                                 m == [ mtype    |-> NEWLEADER,
+                                        mepoch   |-> acceptedEpoch[i],
+                                        mhistory |-> toSend ]
+                                 set_forChecking == SetPacketsForChecking({ }, i, 
+                                            acceptedEpoch[i], toSend, 1, Len(toSend))
+                             IN 
+                             /\ DiscardAndBroadcastNEWLEADER(i, j, m)
+                             /\ proposalMsgsLog' = proposalMsgsLog \union set_forChecking
+                       \/ \* 2.2. ackeRecv still not quorum.
+                          /\ ~AckeRecvBecomeQuorum(i)
+                          /\ Discard(j, i)
+                          /\ UNCHANGED <<currentEpoch, history, zabState, 
+                                     proposalMsgsLog, epochLeader>>
+        /\ UNCHANGED <<state, acceptedEpoch, lastCommitted, learners, cepochRecv, ackldRecv, 
+                sendCounter, followerVars, electionVars>>
+        /\ UpdateRecorder(<<"LeaderProcessACKEPOCH", i, j>>)
+-----------------------------------------------------------------------------    
+(* Follower receives NEWLEADER. Update f.a and history. *)
+FollowerProcessNEWLEADER(i, j) ==
+        /\ IsFollower(i)
+        /\ PendingNEWLEADER(i, j)
+        /\ LET msg == msgs[j][i][1]
+               infoOk == IsMyLeader(i, j)
+               epochOk == acceptedEpoch[i] = msg.mepoch
+               stateOk == zabState[i] = SYNCHRONIZATION
+           IN /\ infoOk
+              /\ \/ \* 1. f.p not equals e', starts a new iteration.
+                    /\ ~epochOk
+                    /\ FollowerShutdown(i)
+                    /\ LET leader == connectInfo[i]
+                       IN /\ Clean(i, leader)
+                          /\ RemoveLearner(leader, i)
+                    /\ UNCHANGED <<violatedInvariants, currentEpoch, history>>
+                 \/ \* 2. f.p equals e'.
+                    /\ epochOk
+                    /\ \/ /\ stateOk
+                          /\ UNCHANGED violatedInvariants
+                       \/ /\ ~stateOk
+                          /\ PrintT("Exception: Follower receives NEWLEADER," \o
+                             " whileZabState not SYNCHRONIZATION.")
+                          /\ violatedInvariants' = [violatedInvariants 
+                                    EXCEPT !.stateInconsistent = TRUE]
+                    /\ currentEpoch' = [currentEpoch EXCEPT ![i] = acceptedEpoch[i]]
+                    /\ history' = [history EXCEPT ![i] = msg.mhistory] \* no need to care ackSid
+                    /\ LET m == [ mtype |-> ACKLD,
+                                  mzxid |-> LastZxidOfHistory(history'[i]) ]
+                       IN Reply(i, j, m)
+                    /\ UNCHANGED <<followerVars, state, zabState, learners, cepochRecv,
+                                    ackeRecv, ackldRecv>>
+        /\ UNCHANGED <<acceptedEpoch, lastCommitted, sendCounter, electionVars, 
+                proposalMsgsLog, epochLeader>>
+        /\ UpdateRecorder(<<"FollowerProcessNEWLEADER", i, j>>)
+
+AckldRecvQuorumFormed(i) == LET sid_ackldRecv == {a.sid: a \in ackldRecv[i]}
+                            IN IsQuorum(sid_ackldRecv)
+AckldRecvBecomeQuorum(i) == LET sid_ackldRecv == {a.sid: a \in ackldRecv'[i]}
+                            IN IsQuorum(sid_ackldRecv)
+
+UpdateAckldRecv(oldSet, sid) ==
+        LET sid_set == {s.sid: s \in oldSet}
+            follower_info == [ sid       |-> sid,
+                               connected |-> TRUE ]
+        IN IF sid \in sid_set
+           THEN LET old_info == CHOOSE info \in oldSet: info.sid = sid
+                IN (oldSet \ {old_info}) \union {follower_info}
+           ELSE oldSet \union {follower_info}
+
+LastZxid(i) == LastZxidOfHistory(history[i])
+
+RECURSIVE UpdateAcksidHelper(_,_,_)
+UpdateAcksidHelper(txns, target, endZxid) ==
+        IF Len(txns) = 0 THEN << >>
+        ELSE LET oldTxn == txns[1]
+             IN IF ZxidCompare(oldTxn.zxid, endZxid) THEN txns
+                ELSE LET newTxn == [ zxid   |-> oldTxn.zxid,
+                                     value  |-> oldTxn.value,
+                                     ackSid |-> IF target \in oldTxn.ackSid
+                                                THEN oldTxn.ackSid
+                                                ELSE oldTxn.ackSid \union {target},
+                                     epoch  |-> oldTxn.epoch ]
+                     IN <<newTxn>> \o UpdateAcksidHelper( Tail(txns), target, endZxid)
+    
+\* Atomically add ackSid of one learner according to zxid in ACKLD.
+UpdateAcksid(his, target, endZxid) == UpdateAcksidHelper(his, target, endZxid)
+
+(* Leader waits for receiving ACKLD from a quorum including itself,
+   and broadcasts COMMITLD and turns to BROADCAST. *)
+LeaderProcessACKLD(i, j) ==
+        /\ IsLeader(i)
+        /\ PendingACKLD(i, j)
+        /\ LET msg == msgs[j][i][1]
+               infoOk == IsMyLearner(i, j)
+           IN /\ infoOk
+              /\ \/ \* 1. has not broadcast COMMITLD
+                    /\ ~AckldRecvQuorumFormed(i)
+                    /\ \/ /\ zabState[i] = SYNCHRONIZATION
+                          /\ UNCHANGED violatedInvariants
+                       \/ /\ zabState[i] /= SYNCHRONIZATION
+                          /\ PrintT("Exception: AckldRecvQuorumFormed false," \o
+                                " while zabState not SYNCHRONIZATION.")
+                          /\ violatedInvariants' = [violatedInvariants 
+                                    EXCEPT !.stateInconsistent = TRUE]
+                    /\ ackldRecv' = [ackldRecv EXCEPT ![i] = UpdateAckldRecv(@, j) ]
+                    /\ history' = [history EXCEPT ![i] = UpdateAcksid(@, j, msg.mzxid)]
+                    /\ \/ \* 1.1. ackldRecv becomes quorum,
+                          \* then broadcasts COMMITLD and turns to BROADCAST.
+                          /\ AckldRecvBecomeQuorum(i)
+                          /\ lastCommitted' = [lastCommitted EXCEPT 
+                                                    ![i] = [ index |-> Len(history[i]),
+                                                             zxid  |-> LastZxid(i) ] ]
+                          /\ zabState' = [zabState EXCEPT ![i] = BROADCAST]
+                          /\ LET m == [ mtype |-> COMMITLD,
+                                        mzxid |-> LastZxid(i) ]
+                             IN DiscardAndBroadcastCOMMITLD(i, j, m)
+                       \/ \* 1.2. ackldRecv still not quorum.
+                          /\ ~AckldRecvBecomeQuorum(i)
+                          /\ Discard(j, i)
+                          /\ UNCHANGED <<zabState, lastCommitted>>
+                 \/ \* 2. has broadcast COMMITLD
+                    /\ AckldRecvQuorumFormed(i)
+                    /\ \/ /\ zabState[i] = BROADCAST
+                          /\ UNCHANGED violatedInvariants
+                       \/ /\ zabState[i] /= BROADCAST
+                          /\ PrintT("Exception: AckldRecvQuorumFormed true," \o
+                                " while zabState not BROADCAST.")
+                          /\ violatedInvariants' = [violatedInvariants 
+                                    EXCEPT !.stateInconsistent = TRUE]
+                    /\ ackldRecv' = [ackldRecv EXCEPT ![i] = UpdateAckldRecv(@, j) ]
+                    /\ history' = [history EXCEPT ![i] = UpdateAcksid(@, j, msg.mzxid)]
+                    /\ Reply(i, j, [ mtype |-> COMMITLD,
+                                     mzxid |-> lastCommitted[i].zxid ])
+                    /\ UNCHANGED <<zabState, lastCommitted>>
+        /\ UNCHANGED <<state, acceptedEpoch, currentEpoch, learners, cepochRecv, ackeRecv, 
+                    sendCounter, followerVars, electionVars, proposalMsgsLog, epochLeader>>
+        /\ UpdateRecorder(<<"LeaderProcessACKLD", i, j>>)
+
+RECURSIVE ZxidToIndexHepler(_,_,_,_)
+ZxidToIndexHepler(his, zxid, cur, appeared) == 
+        IF cur > Len(his) THEN cur  
+        ELSE IF TxnZxidEqual(his[cur], zxid) 
+             THEN CASE appeared = TRUE -> -1
+                  []   OTHER           -> Minimum( { cur, 
+                            ZxidToIndexHepler(his, zxid, cur + 1, TRUE) } ) 
+             ELSE ZxidToIndexHepler(his, zxid, cur + 1, appeared)
+
+\* return -1: this zxid appears at least twice. Len(his) + 1: does not exist.
+\* 1 - Len(his): exists and appears just once.
+ZxidToIndex(his, zxid) == IF ZxidEqual( zxid, <<0, 0>> ) THEN 0
+                          ELSE IF Len(his) = 0 THEN 1
+                               ELSE LET len == Len(his) IN
+                                    IF \E idx \in 1..len: TxnZxidEqual(his[idx], zxid)
+                                    THEN ZxidToIndexHepler(his, zxid, 1, FALSE)
+                                    ELSE len + 1
+
+(* Follower receives COMMITLD. Commit all txns. *)
+FollowerProcessCOMMITLD(i, j) ==
+        /\ IsFollower(i)
+        /\ PendingCOMMITLD(i, j)
+        /\ LET msg == msgs[j][i][1]
+               infoOk == IsMyLeader(i, j)
+               index == IF ZxidEqual(msg.mzxid, <<0, 0>>) THEN 0
+                        ELSE ZxidToIndex(history[i], msg.mzxid)
+               logOk == index >= 0 /\ index <= Len(history[i])
+           IN /\ infoOk
+              /\ \/ /\ logOk
+                    /\ UNCHANGED violatedInvariants
+                 \/ /\ ~logOk
+                    /\ PrintT("Exception: zxid in COMMITLD not exists in history.")
+                    /\ violatedInvariants' = [violatedInvariants
+                                 EXCEPT !.proposalInconsistent = TRUE]
+              /\ lastCommitted' = [lastCommitted EXCEPT ![i] = [ index |-> index,
+                                                                 zxid  |-> msg.mzxid ] ]
+              /\ zabState' = [zabState EXCEPT ![i] = BROADCAST]
+              /\ Discard(j, i)
+        /\ UNCHANGED <<state, acceptedEpoch, currentEpoch, history, leaderVars, 
+                    followerVars, electionVars, proposalMsgsLog, epochLeader>>
+        /\ UpdateRecorder(<<"FollowerProcessCOMMITLD", i, j>>)
+----------------------------------------------------------------------------
+IncZxid(s, zxid) == IF currentEpoch[s] = zxid[1] THEN <<zxid[1], zxid[2] + 1>>
+                    ELSE <<currentEpoch[s], 1>>
+
+(* Leader receives client request.
+   Note: In production, any server in traffic can receive requests and 
+         forward it to leader if necessary. We choose to let leader be
+         the sole one who can receive write requests, to simplify spec 
+         and keep correctness at the same time. *)
+LeaderProcessRequest(i) ==
+        /\ CheckTransactionNum \* test restrictions of transaction num
+        /\ IsLeader(i)
+        /\ zabState[i] = BROADCAST
+        /\ LET request_value == GetRecorder("nClientRequest") \* unique value
+               newTxn == [ zxid   |-> IncZxid(i, LastZxid(i)),
+                           value  |-> request_value,
+                           ackSid |-> {i},
+                           epoch  |-> currentEpoch[i] ]
+           IN history' = [history EXCEPT ![i] = Append(@, newTxn) ]
+        /\ UNCHANGED <<state, zabState, acceptedEpoch, currentEpoch, lastCommitted,
+                    leaderVars, followerVars, electionVars, msgVars, verifyVars>>
+        /\ UpdateRecorder(<<"LeaderProcessRequest", i>>)
+
+\* Latest counter existing in history.
+CurrentCounter(i) == IF LastZxid(i)[1] = currentEpoch[i] THEN LastZxid(i)[2]
+                     ELSE 0
+
+(* Leader broadcasts PROPOSE when sendCounter < currentCounter. *)
+LeaderBroadcastPROPOSE(i) == 
+        /\ IsLeader(i)
+        /\ zabState[i] = BROADCAST
+        /\ sendCounter[i] < CurrentCounter(i) \* there exists proposal to be sent
+        /\ LET toSendCounter == sendCounter[i] + 1
+               toSendZxid == <<currentEpoch[i], toSendCounter>>
+               toSendIndex == ZxidToIndex(history[i], toSendZxid)
+               toSendTxn == history[i][toSendIndex]
+               m_proposal == [ mtype |-> PROPOSE,
+                               mzxid |-> toSendTxn.zxid,
+                               mdata |-> toSendTxn.value ]
+               m_proposal_forChecking == [ source |-> i,
+                                           epoch  |-> currentEpoch[i],
+                                           zxid   |-> toSendTxn.zxid,
+                                           data   |-> toSendTxn.value ]
+           IN /\ sendCounter' = [sendCounter EXCEPT ![i] = toSendCounter]
+              /\ Broadcast(i, m_proposal)
+              /\ proposalMsgsLog' = proposalMsgsLog \union {m_proposal_forChecking}
+        /\ UNCHANGED <<serverVars, learners, cepochRecv, ackeRecv, ackldRecv, 
+                followerVars, electionVars, epochLeader, violatedInvariants>>
+        /\ UpdateRecorder(<<"LeaderBroadcastPROPOSE", i>>)
+
+IsNextZxid(curZxid, nextZxid) ==
+            \/ \* first PROPOSAL in this epoch
+               /\ nextZxid[2] = 1
+               /\ curZxid[1] < nextZxid[1]
+            \/ \* not first PROPOSAL in this epoch
+               /\ nextZxid[2] > 1
+               /\ curZxid[1] = nextZxid[1]
+               /\ curZxid[2] + 1 = nextZxid[2]
+
+(* Follower processes PROPOSE, saves it in history and replies ACK. *)
+FollowerProcessPROPOSE(i, j) ==
+        /\ IsFollower(i)
+        /\ PendingPROPOSE(i, j)
+        /\ LET msg == msgs[j][i][1]
+               infoOk == IsMyLeader(i, j)
+               isNext == IsNextZxid(LastZxid(i), msg.mzxid)
+               newTxn == [ zxid   |-> msg.mzxid,
+                           value  |-> msg.mdata,
+                           ackSid |-> {},
+                           epoch  |-> currentEpoch[i] ]
+               m_ack == [ mtype |-> ACK,
+                          mzxid |-> msg.mzxid ]
+           IN /\ infoOk
+              /\ \/ /\ isNext
+                    /\ history' = [history EXCEPT ![i] = Append(@, newTxn)]
+                    /\ Reply(i, j, m_ack)
+                    /\ UNCHANGED violatedInvariants
+                 \/ /\ ~isNext
+                    /\ LET index == ZxidToIndex(history[i], msg.mzxid)
+                           exist == index > 0 /\ index <= Len(history[i])
+                       IN \/ /\ exist
+                             /\ UNCHANGED violatedInvariants
+                          \/ /\ ~exist
+                             /\ PrintT("Exception: Follower receives PROPOSE, while" \o 
+                                    " txn is neither the next nor exists in history.")
+                             /\ violatedInvariants' = [violatedInvariants EXCEPT 
+                                         !.proposalInconsistent = TRUE]
+                    /\ Discard(j, i)
+                    /\ UNCHANGED history
+        /\ UNCHANGED <<state, zabState, acceptedEpoch, currentEpoch, lastCommitted,
+                    leaderVars, followerVars, electionVars, proposalMsgsLog, epochLeader>>
+        /\ UpdateRecorder(<<"FollowerProcessPROPOSE", i, j>>)
+
+LeaderTryToCommit(s, index, zxid, newTxn, follower) ==
+        LET allTxnsBeforeCommitted == lastCommitted[s].index >= index - 1
+                    \* Only when all proposals before zxid has been committed,
+                    \* this proposal can be permitted to be committed.
+            hasAllQuorums == IsQuorum(newTxn.ackSid)
+                    \* In order to be committed, a proposal must be accepted
+                    \* by a quorum.
+            ordered == lastCommitted[s].index + 1 = index
+                    \* Commit proposals in order.
+        IN \/ /\ \* Current conditions do not satisfy committing the proposal.
+                 \/ ~allTxnsBeforeCommitted
+                 \/ ~hasAllQuorums
+              /\ Discard(follower, s)
+              /\ UNCHANGED <<violatedInvariants, lastCommitted>>
+           \/ /\ allTxnsBeforeCommitted
+              /\ hasAllQuorums
+              /\ \/ /\ ~ordered
+                    /\ PrintT("Warn: Committing zxid " \o ToString(zxid) \o " not first.")
+                    /\ violatedInvariants' = [violatedInvariants EXCEPT 
+                            !.commitInconsistent = TRUE]
+                 \/ /\ ordered
+                    /\ UNCHANGED violatedInvariants
+              /\ lastCommitted' = [lastCommitted EXCEPT ![s] = [ index |-> index,
+                                                                 zxid  |-> zxid ] ]
+              /\ LET m_commit == [ mtype |-> COMMIT,
+                                   mzxid |-> zxid ]
+                 IN DiscardAndBroadcast(s, follower, m_commit)
+
+
+LastAckIndexFromFollower(i, j) == 
+        LET set_index == {idx \in 1..Len(history[i]): j \in history[i][idx].ackSid }
+        IN Maximum(set_index)
+
+(* Leader Keeps a count of acks for a particular proposal, and try to
+   commit the proposal. If committed, COMMIT of proposal will be broadcast. *)
+LeaderProcessACK(i, j) ==
+        /\ IsLeader(i)
+        /\ PendingACK(i, j)
+        /\ LET msg == msgs[j][i][1]
+               infoOk == IsMyLearner(i, j)
+               index == ZxidToIndex(history[i], msg.mzxid)
+               exist == index >= 1 /\ index <= Len(history[i]) \* proposal exists in history
+               outstanding == lastCommitted[i].index < Len(history[i]) \* outstanding not null
+               hasCommitted == ~ZxidCompare(msg.mzxid, lastCommitted[i].zxid)
+               ackIndex == LastAckIndexFromFollower(i, j)
+               monotonicallyInc == \/ ackIndex = -1
+                                   \/ ackIndex + 1 = index
+           IN /\ infoOk
+              /\ \/ /\ exist
+                    /\ monotonicallyInc
+                    /\ LET txn == history[i][index]
+                           txnAfterAddAck == [ zxid   |-> txn.zxid,
+                                               value  |-> txn.value,
+                                               ackSid |-> txn.ackSid \union {j} ,
+                                               epoch  |-> txn.epoch ]   
+                       IN
+                       /\ history' = [history EXCEPT ![i][index] = txnAfterAddAck ]
+                       /\ \/ /\ \* Note: outstanding is 0. 
+                                \* / proposal has already been committed.
+                                \/ ~outstanding
+                                \/ hasCommitted
+                             /\ Discard(j, i)
+                             /\ UNCHANGED <<violatedInvariants, lastCommitted>>
+                          \/ /\ outstanding
+                             /\ ~hasCommitted
+                             /\ LeaderTryToCommit(i, index, msg.mzxid, txnAfterAddAck, j)
+                 \/ /\ \/ ~exist
+                       \/ ~monotonicallyInc
+                    /\ PrintT("Exception: No such zxid. " \o 
+                           " / ackIndex doesn't inc monotonically.")
+                    /\ violatedInvariants' = [violatedInvariants 
+                            EXCEPT !.ackInconsistent = TRUE]
+                    /\ Discard(j, i)
+                    /\ UNCHANGED <<history, lastCommitted>>
+        /\ UNCHANGED <<state, zabState, acceptedEpoch, currentEpoch, leaderVars,
+                    followerVars, electionVars, proposalMsgsLog, epochLeader>>
+        /\ UpdateRecorder(<<"LeaderProcessACK", i, j>>)
+
+(* Follower processes COMMIT. *)
+FollowerProcessCOMMIT(i, j) ==
+        /\ IsFollower(i)
+        /\ PendingCOMMIT(i, j)
+        /\ LET msg == msgs[j][i][1]
+               infoOk == IsMyLeader(i, j)
+               pending == lastCommitted[i].index < Len(history[i])
+           IN /\ infoOk
+              /\ \/ /\ ~pending
+                    /\ PrintT("Warn: Committing zxid without seeing txn.")
+                    /\ UNCHANGED <<lastCommitted, violatedInvariants>>
+                 \/ /\ pending
+                    /\ LET firstElement == history[i][lastCommitted[i].index + 1]
+                           match == ZxidEqual(firstElement.zxid, msg.mzxid)
+                       IN
+                       \/ /\ ~match
+                          /\ PrintT("Exception: Committing zxid not equals" \o
+                                     " next pending txn zxid.")
+                          /\ violatedInvariants' = [violatedInvariants EXCEPT 
+                                    !.commitInconsistent = TRUE]
+                          /\ UNCHANGED lastCommitted
+                       \/ /\ match
+                          /\ lastCommitted' = [lastCommitted EXCEPT ![i] = 
+                                            [ index |-> lastCommitted[i].index + 1,
+                                              zxid  |-> firstElement.zxid ] ]
+                          /\ UNCHANGED violatedInvariants
+        /\ Discard(j, i)
+        /\ UNCHANGED <<state, zabState, acceptedEpoch, currentEpoch, history,
+                    leaderVars, followerVars, electionVars, proposalMsgsLog, epochLeader>>
+        /\ UpdateRecorder(<<"FollowerProcessCOMMIT", i, j>>)
+----------------------------------------------------------------------------     
+(* Used to discard some messages which should not exist in network channel.
+   This action should not be triggered. *)
+FilterNonexistentMessage(i) ==
+        /\ \E j \in Server \ {i}: /\ msgs[j][i] /= << >>
+                                  /\ LET msg == msgs[j][i][1]
+                                     IN 
+                                        \/ /\ IsLeader(i)
+                                           /\ LET infoOk == IsMyLearner(i, j)
+                                              IN
+                                              \/ msg.mtype = NEWEPOCH
+                                              \/ msg.mtype = NEWLEADER
+                                              \/ msg.mtype = COMMITLD
+                                              \/ msg.mtype = PROPOSE
+                                              \/ msg.mtype = COMMIT
+                                              \/ /\ ~infoOk
+                                                 /\ \/ msg.mtype = CEPOCH
+                                                    \/ msg.mtype = ACKEPOCH
+                                                    \/ msg.mtype = ACKLD
+                                                    \/ msg.mtype = ACK
+                                        \/ /\ IsFollower(i)
+                                           /\ LET infoOk == IsMyLeader(i, j) 
+                                              IN
+                                              \/ msg.mtype = CEPOCH
+                                              \/ msg.mtype = ACKEPOCH
+                                              \/ msg.mtype = ACKLD
+                                              \/ msg.mtype = ACK
+                                              \/ /\ ~infoOk
+                                                 /\ \/ msg.mtype = NEWEPOCH
+                                                    \/ msg.mtype = NEWLEADER
+                                                    \/ msg.mtype = COMMITLD
+                                                    \/ msg.mtype = PROPOSE
+                                                    \/ msg.mtype = COMMIT   
+                                        \/ IsLooking(i)
+                                  /\ Discard(j, i)
+        /\ violatedInvariants' = [violatedInvariants EXCEPT !.messageIllegal = TRUE]
+        /\ UNCHANGED <<serverVars, leaderVars, followerVars, electionVars,
+                    proposalMsgsLog, epochLeader>>
+        /\ UnchangeRecorder
+----------------------------------------------------------------------------
+\* Defines how the variables may transition.
+Next ==
+        (* Election *)
+        \/ \E i \in Server:    UpdateLeader(i)
+        \/ \E i \in Server:    FollowLeader(i)
+        (* Abnormal situations like failure, network disconnection *)
+        \/ \E i, j \in Server: Timeout(i, j)
+        \/ \E i \in Server:    Restart(i)
+        (* Zab module - Discovery and Synchronization part *)
+        \/ \E i, j \in Server: ConnectAndFollowerSendCEPOCH(i, j)
+        \/ \E i, j \in Server: LeaderProcessCEPOCH(i, j)
+        \/ \E i, j \in Server: FollowerProcessNEWEPOCH(i, j)
+        \/ \E i, j \in Server: LeaderProcessACKEPOCH(i, j)
+        \/ \E i, j \in Server: FollowerProcessNEWLEADER(i, j)
+        \/ \E i, j \in Server: LeaderProcessACKLD(i, j)
+        \/ \E i, j \in Server: FollowerProcessCOMMITLD(i, j)
+        (* Zab module - Broadcast part *)
+        \/ \E i \in Server:    LeaderProcessRequest(i)
+        \/ \E i \in Server:    LeaderBroadcastPROPOSE(i)
+        \/ \E i, j \in Server: FollowerProcessPROPOSE(i, j)
+        \/ \E i, j \in Server: LeaderProcessACK(i, j)
+        \/ \E i, j \in Server: FollowerProcessCOMMIT(i, j)
+        (* An action used to judge whether there are redundant messages in network *)
+        \/ \E i \in Server:    FilterNonexistentMessage(i)
+
+Spec == Init /\ [][Next]_vars
+----------------------------------------------------------------------------
+\* Define safety properties of Zab.
+
+ShouldNotBeTriggered == \A p \in DOMAIN violatedInvariants: violatedInvariants[p] = FALSE
+
+\* There is most one established leader for a certain epoch.
+Leadership1 == \A i, j \in Server:
+                   /\ IsLeader(i) /\ zabState[i] \in {SYNCHRONIZATION, BROADCAST}
+                   /\ IsLeader(j) /\ zabState[j] \in {SYNCHRONIZATION, BROADCAST}
+                   /\ currentEpoch[i] = currentEpoch[j]
+                  => i = j
+
+Leadership2 == \A epoch \in 1..MAXEPOCH: Cardinality(epochLeader[epoch]) <= 1
+
+\* PrefixConsistency: The prefix that have been committed 
+\* in history in any process is the same.
+PrefixConsistency == \A i, j \in Server:
+                        LET smaller == Minimum({lastCommitted[i].index, lastCommitted[j].index})
+                        IN \/ smaller = 0
+                           \/ /\ smaller > 0
+                              /\ \A index \in 1..smaller:
+                                   TxnEqual(history[i][index], history[j][index])
+
+\* Integrity: If some follower delivers one transaction, then some primary has broadcast it.
+Integrity == \A i \in Server:
+                /\ IsFollower(i)
+                /\ lastCommitted[i].index > 0
+                => \A idx \in 1..lastCommitted[i].index: \E proposal \in proposalMsgsLog:
+                    LET txn_proposal == [ zxid  |-> proposal.zxid,
+                                          value |-> proposal.data ]
+                    IN  TxnEqual(history[i][idx], txn_proposal)
+
+\* Agreement: If some follower f delivers transaction a and some follower f' delivers transaction b,
+\*            then f' delivers a or f delivers b.
+Agreement == \A i, j \in Server:
+                /\ IsFollower(i) /\ lastCommitted[i].index > 0
+                /\ IsFollower(j) /\ lastCommitted[j].index > 0
+                =>
+                \A idx1 \in 1..lastCommitted[i].index, idx2 \in 1..lastCommitted[j].index :
+                    \/ \E idx_j \in 1..lastCommitted[j].index:
+                        TxnEqual(history[j][idx_j], history[i][idx1])
+                    \/ \E idx_i \in 1..lastCommitted[i].index:
+                        TxnEqual(history[i][idx_i], history[j][idx2])
+
+\* Total order: If some follower delivers a before b, then any process that delivers b
+\*              must also deliver a and deliver a before b.
+TotalOrder == \A i, j \in Server: 
+                LET committed1 == lastCommitted[i].index 
+                    committed2 == lastCommitted[j].index  
+                IN committed1 >= 2 /\ committed2 >= 2
+                    => \A idx_i1 \in 1..(committed1 - 1) : \A idx_i2 \in (idx_i1 + 1)..committed1 :
+                    LET logOk == \E idx \in 1..committed2 :
+                                     TxnEqual(history[i][idx_i2], history[j][idx])
+                    IN \/ ~logOk 
+                       \/ /\ logOk 
+                          /\ \E idx_j2 \in 1..committed2 : 
+                                /\ TxnEqual(history[i][idx_i2], history[j][idx_j2])
+                                /\ \E idx_j1 \in 1..(idx_j2 - 1):
+                                       TxnEqual(history[i][idx_i1], history[j][idx_j1])
+
+\* Local primary order: If a primary broadcasts a before it broadcasts b, then a follower that
+\*                      delivers b must also deliver a before b.
+LocalPrimaryOrder == LET p_set(i, e) == {p \in proposalMsgsLog: /\ p.source = i 
+                                                                /\ p.epoch  = e }
+                         txn_set(i, e) == { [ zxid  |-> p.zxid, 
+                                              value |-> p.data ] : p \in p_set(i, e) }
+                     IN \A i \in Server: \A e \in 1..currentEpoch[i]:
+                         \/ Cardinality(txn_set(i, e)) < 2
+                         \/ /\ Cardinality(txn_set(i, e)) >= 2
+                            /\ \E txn1, txn2 \in txn_set(i, e):
+                             \/ TxnEqual(txn1, txn2)
+                             \/ /\ ~TxnEqual(txn1, txn2)
+                                /\ LET TxnPre  == IF ZxidCompare(txn1.zxid, txn2.zxid) THEN txn2 ELSE txn1
+                                       TxnNext == IF ZxidCompare(txn1.zxid, txn2.zxid) THEN txn1 ELSE txn2
+                                   IN \A j \in Server: /\ lastCommitted[j].index >= 2
+                                                       /\ \E idx \in 1..lastCommitted[j].index: 
+                                                            TxnEqual(history[j][idx], TxnNext)
+                                        => \E idx2 \in 1..lastCommitted[j].index: 
+                                            /\ TxnEqual(history[j][idx2], TxnNext)
+                                            /\ idx2 > 1
+                                            /\ \E idx1 \in 1..(idx2 - 1): 
+                                                TxnEqual(history[j][idx1], TxnPre)
+
+\* Global primary order: A follower f delivers both a with epoch e and b with epoch e', and e < e',
+\*                       then f must deliver a before b.
+GlobalPrimaryOrder == \A i \in Server: lastCommitted[i].index >= 2
+                         => \A idx1, idx2 \in 1..lastCommitted[i].index:
+                                \/ ~EpochPrecedeInTxn(history[i][idx1], history[i][idx2])
+                                \/ /\ EpochPrecedeInTxn(history[i][idx1], history[i][idx2])
+                                   /\ idx1 < idx2
+
+\* Primary integrity: If primary p broadcasts a and some follower f delivers b such that b has epoch
+\*                    smaller than epoch of p, then p must deliver b before it broadcasts a.
+PrimaryIntegrity == \A i, j \in Server: /\ IsLeader(i)   /\ IsMyLearner(i, j)
+                                        /\ IsFollower(j) /\ IsMyLeader(j, i)
+                                        /\ zabState[i] = BROADCAST
+                                        /\ zabState[j] = BROADCAST
+                                        /\ lastCommitted[j].index >= 1
+                        => \A idx_j \in 1..lastCommitted[j].index:
+                                \/ history[j][idx_j].zxid[1] >= currentEpoch[i]
+                                \/ /\ history[j][idx_j].zxid[1] < currentEpoch[i]
+                                   /\ \E idx_i \in 1..lastCommitted[i].index: 
+                                        TxnEqual(history[i][idx_i], history[j][idx_j])
+=============================================================================
+\* Modification History
+\* Last modified Tue Jan 31 20:40:11 CST 2023 by huangbinyu
+\* Last modified Sat Dec 11 22:31:08 CST 2021 by Dell
+\* Created Thu Dec 02 20:49:23 CST 2021 by Dell

+ 102 - 0
zookeeper-specifications/protocol-spec/doc.md

@@ -0,0 +1,102 @@
+# ZooKeeper's Protocol Specification of TLA+
+
+## Overview
+ZooKeeper's Protocol Specification focuses on the Zookeeper Atomic Broadcast (Zab) consensus protocol proposed in *Junqueira F P, Reed B C, Serafini M. Zab: High-performance broadcast for primary-backup systems[C]//2011 IEEE/IFIP 41st International Conference on Dependable Systems & Networks (DSN). IEEE, 2011: 245-256*.  
+
+We have completed the the [protocol specification](Zab.tla) for Zab using TLA+ toolbox, and we have done a certain scale of model checking to verify the correctness of Zab. From the experience, we have found some subtle [issues](issues.md) related to the protocol specification and the Zab informal description. 
+
+To handle the ambiguities and omissions of the informal description in the paper, we supplement the specification with some further details. If you have any question, please let us know.
+
+
+
+## Specification Development
+
+
+### Requirements
+TLA+ toolbox version 1.7.0
+
+### Run
+Create specification [Zab.tla](Zab.tla) and run models in the following way.  
+We can clearly divide spec into five modules, which are:  
+
+- Phase0. Leader Election  
+- Phase1. Discovery  
+- Phase2. Synchronization  
+- Phase3. Broadcast  
+- Abnormal situations like failure, network disconnection
+
+#### Assign constants
+After creating this new model and choosing *Temporal formula* with value *Spec*, we first assign most of constants.  
+We should set CONSTANTS about server states as model value, including *FOLLOWING*, *LEADING*, and *LOOKING*.  
+We should set CONSTANTS about server zabStates as model value, including *ELECTION*, *DISCOVERY*, *SYNCHRONIZATION*, and *BROADCAST*.  
+We should set CONSTANTS about message types as model value, including *CEPOCH*, *NEWEPOCH*, *ACKE*, *NEWLEADER*, *ACKLD*, *COMMITLD*, *PROPOSE*, *ACK*, and *COMMIT*.  
+
+#### Assign left constants affecting state space
+Then we should assign CONSTANTS *Server* as a symmetrical model value(such as <symmetrical\>{s1, s2, s3}).  
+To compress state space, we need to assign CONSTANT *Parameters* as a record, whose domain contains *MaxTimeoutFailures*, *MaxTransactionNum*, *MaxEpoch*, and *MaxRestarts*. For example, we can assign it to format like [MaxTimeoutFailures |-> 3, MaxTransactionNum |-> 5, MaxEpoch |-> 3, MaxRestarts |-> 2].
+
+#### Assign invariants
+We remove *'Deadlock'* option.  
+We add invariants defined in spec into *'Invariants'* to check whether the model will reach an illogical state, including *ShouldNotBeTriggered*, *Leadership1*, *Leadership2*, *PrefixConsistency*, *Integrity*, *Agreement*, *TotalOrder*, *LocalPrimaryOrder*, *GlobalPriamryOrder*, and *PrimaryIntegrity*.  
+Here the meanings of these invariants are described in the following. Except for the first four, all invariants are defined in paper.   
+	-	**ShouldNotBeTriggered**: Some conditions should not be triggered when we are running the model. For example, follower should not receive NEWLEADER when its zabState is not SYNCHRONIZATION.  
+	-	**Lerdership**: There is most one established leader in a certain epoch.(Established means leader has updated its f.a and switched its zabState to SYNCHRONIZATION.)  
+	-	**PrefixConsistency**: Transactions that have been committed in history are the same in any server.  
+	-	**Integrity**: If some follower delivers one transaction, some primary must have broadcast it.  
+	-	**Agreement**: If some follower *f<sub>1</sub>* delivers transaction *a* and some follower *f<sub>2</sub>* delivers transaction *b*, then *f<sub>2</sub>* delivers *a* or *f<sub>1</sub>* delivers *b*.  
+	-	**TotalOrder**: If some server delivers *a* before *b*, then any server that delivers *b* must also deliver *a* and deliver *a* before *b*.  
+	-	**LocalPrimaryOrder**: If a primary broadcasts *a* before it broadcasts *b*, then a follower that delivers *b* must also deliver *a* before *b*.  
+	-	**GlobalPrimaryOrder**: A server *f* delivers both *a* with epoch *e* and *b* with epoch *e'*, and *e* < *e'*, then *f* must deliver *a* before *b*.  
+	-	**PrimaryIntegrity**: If primary *p* broadcasts *a* and some follower *f* delivers *b* such that *b* has epoch smaller than epoch of *p*, then *p* must deliver *b* before it broadcasts *a*.  
+
+#### Assign additional TLC options
+We set number of worker threads as 10(if unavailable on your system, just decrease it).  
+We can choose checking mode from *Model-checking mode* and *simulation mode*.  
+
+-	Model-checking mode: It is a traverse method like BFS. Diameter in results represent the maximum depth when traversing. All intermediate results will be saved as binary files locally and occupy a large space if running time is long.  
+-	Simulation mode: Everytime TLC randomly chooses a path and run through it until reaching termination or reaching maximum length of the trace, and randomly chooses another path. Currently we set *Maximum length of the trace* as 100.  
+Here we mainly use simulation mode to discover if there exists deep bugs, which is hard to be found in model-checking mode.
+
+
+
+### Results
+
+You can find our [result](verification-statistics.md) of verification using TLC model checking.
+
+## Adjustments in protocol spec from paper
+>Because the pursuit of the paper is to describe the Zab protocol to others in a concise manner, which will lead to insufficient description of the protocol, there are missing or vague places in the paper. As a mathematical language, no ambiguity is allowed in the TLA+ specification, and this is why we need adjustment.
+
+Overall, we categorize the flaws of the original paper into two classes: abstraction and vagueness.
+
+### Abstraction
+
+There is a missing part in the paper, in which the pseudocode uses the **Discovery** stage as the initial stage of each round, and omits the **Election** stage.  
+
+On the one hand, in spec, **Election** helps advance the state of the system, and is also related to the liveness and strong consistency of the system, so we cannot omit it. On the other hand, our focus is on Zab, so the **Election** module should be expressed with a small number of variables and actions to reduce the state space of the model.
+
+We use one variable *leaderOracle* and two actions *UpdateLeader* and *FollowerLeader* to express the **election** module streamlined.
+
+### Vagueness
+
+We categorize vagueness in the paper into two classes: vagueness in variables and vagueness in actions.
+
+#### Vagueness in variables
+First, the character **Q** is used everywhere in the pseudocode to represent the set of Followers perceived by the Leader in the current term. We divide the set **Q** specifically into variables *learners*, *cepochRecv*, *ackeRecv* and *ackldRecv*. We use *cepochRecv* to let Leader broadcast *NEWEPOCH*, *ackeRecv* to let Leader broadcast *NEWLEADER* and *PROPOSE*, *ackldRecv* to let Leader broadcast *COMMIT-LD* and *COMMIT*. We will explain the reason why we use these sets when Leader broadcasts *PROPOSE* and *COMMIT* in the [issues](issues.md).
+
+Second, *zxid* in *COMMIT-LD* is omitted in hte paper. We will explain the value of the *zxid* in the [issues](issues.md).
+
+#### Vagueness in actions
+Totally, adjustment on vagueness in actions can be divided into two classes: Completing the missing description and Adjusting the protocol structure.
+
+*	For completing the missing description, we categorize four classes:
+
+	1.	Complete the branch of action where after the Leader node processes the message *m* of type *t*, the receiving set of messages of type *t* still does not satisfy the quorum.
+	2.	Complete the branch of action where before the Leader node processes the message *m* of type *t*, the receiving set of messages of type *t* has already satisfied the quorum.
+	3.	Supplement the logical action that the Leader receives the request from the client, encapsulates the request as a log entry, and persists the log entry.
+	4.	Supplement the logical action that the Leader tries to commit log entries based on confirmation information from followers.
+
+*	For adjusting the protocol structure, in order to improve the readability of the spec, we impose standardized and unified restrictions on the spec. That is, the division unit is to one node receiving and processing one message. Each action, except actions in election module and environment error module, makes a node receiving a certain message a trigger condition, and then produces a subsequent state change.
+
+See example when Leader processes message *CEPOCH* from one follower:
+
+![case_leader_process_cepoch](pic/case_leader_process_cepoch.png)

+ 40 - 0
zookeeper-specifications/protocol-spec/issues.md

@@ -0,0 +1,40 @@
+# Issues
+
+>This document describes issues related to the ambiguous description of the Zab protocol. 
+
+## (Issue 1)Issue introduced by vague explanation of broadcast object set Q
+Because the Leader will only accept the client request when it reaches the *Broadcast* stage, thus when Leader broadcasting messages from *Broadcast* stage, we naturally use the set *ackldRecv* as the object set when Leader broadcasting *Propose* and *Commit* messages in the initial protocol. 
+
+On this basis, the model checking found that there was an error in the protocol. After analyzing the execution path and the violated invariant, we located the error in the action *FollowerProcessCommit*. The Follower received an illegal *COMMIT* message, as shown in the figure below. In the *COMMIT* message, the log entry corresponding to the *zxid* does not exist in the local log.  
+
+![protocol-spec-mck-bug1](pic/protocol-spec-mck-bug1.png)
+
+A quorum of nodes including node *A* and node *B* completed the log recovery, and entered the *Broadcast* stage. Leader node *B* sent *NEWLEADER*(1, <[1, 1, v1]>) to node *C*, which joined the cluster. Then Leader *B* received client request and broadcast *PROPOSE*([1, 2, v2]). But node *C* did not receive the message because *C* is not in *B*'s *ackldRecv*. Later, the Leader firstly processes *ACK-LD* from *C* and then *ACK* from *A*. So, the object of Leader *B*'s broadcast *COMMIT*([1, 2]) contains *C*, and there is no log entry with *zxid* [1, 2] locally in *C*.
+
+We found that because the set of objects when the Leader broadcast *PROPOSE* is *ackldRecv*, the Follower will not receive messages with type *PROPOSE* until receiving the *COMMIT-LD*. If a log entry is committed between the Leader sending *NEWLEADER* and *COMMIT-LD*, the corresponding receiver will permanently lose the log entry.
+
+Therefore, we have made amendments in the spec. When the Leader broadcasts *PROPOSE*, the set of sending objects has been changed from *ackldRecv* to *ackeRecv*. We successfully discovered errors introduced by the vague description of **Q** in the protocol through model checking, and fixed them in the specification.
+
+## (Issue 2) Issue introduced by lack of client request logic
+Since there is not discriptions about client request, we need to add a process that describes the Leader receives a client request and encapsulates the request as a new log entry and appends it to the local log. This process are represented as action *LeaderProcessRequest* in the spec.
+
+In our original specification, when Follower receives the message *PROPOSE*, the *zxid* of the proposed log entry should be the successor *zxid* of the latest local log entry‘s *zxid*, otherwise it will be regarded as an illegal *PROPOSE*.
+
+On this basis, model checking told us that there was an error in the model, as shown in the figure below, where Follower receives redundant *PROPOSE* from the same log entry. 
+
+![protocol-spec-mck-bug2](pic/protocol-spec-mck-bug2.png)
+
+A quorum of nodes including node *A* and *B* completed log recovery first and entered the *Broadcast* stage. When receiving a client request, Leader B encapsulated the request as a log entry [1, 1, v1]. Before B broadcast the proposal of the log entry, node *C* that joined the cluster later sent *ACKEPOCH* to *B*, and *B* replied with *NEWLEADER*(1, <[1, 1, v1]>), and added *C* to the set *ackeRecv*. So the *PROPOSE*([1, 1, v1]) broadcast by *B* was also sent to *C*, so *C* received the redundant proposal for the log entry [1, 1, v1].
+
+We have analyzed two repair schemes. The first scheme is to atomize the actions *LeaderProcessRequest* and *LeaderBroadcastPROPOSE* and merge them into one action. The second scheme is to relax the constraints, where if the Follower receives redundant *PROPOSE*, it will be release this message directly. We chose the second scheme, because we think the situation is not serious and will not affect the correctness of Zab.
+
+## (Issue 3) Issue introduced by ambiguity of zxid in the message COMMIT-LD
+When the Leader sends *COMMIT-LD*, the paper does not express the specific value of the *zxid* carried in the message. Therefore, we naturally set the *zxid* carried in *COMMIT-LD* to the *zxid* in *ACK-LD*.
+
+Therefore, the model checking found an invariant violation, as shown in the figure below, the Follower received an illegal message *COMMIT*.
+
+![protocol-spec-mck-bug3](pic/protocol-spec-mck-bug3.png)
+
+A quorum of nodes including node *A* and *B* completed log recovery first and entered the *Broadcast* stage. When the node *C* that joined the cluster later sent *ACKEPOCH* to Leader *B*, *B* sent *NEWLEADER*(1, <[1, 1, v1]>) to *C*, and then received the client request to broadcast *PROPOSE*([1, 2, v2]). Then, *B* received the message *ACK*([1, 2]) from *A* before receiving *ACK-LD* from *C*, so Leader *B* committed the log entry [1, 2, v2] and broadcasts *COMMIT*([1, 2]). Because the object set when broadcasting *COMMIT* is *ackldRecv*, *C* did not receive the commit message. Afterwards, *B* processed the *ACK-LD*([1, 1]) from *C*, and replied with *COMMIT-LD*([1, 1]). Therefore, when Leader *B* reached a consensus on the new log entry [1, 3, v3], it sendt *COMMIT*([1, 3]) to *C*, and *C* found that entry [1, 2, v2] in the local log has not been committed yet, which was treated as an illegal *COMMIT*.
+
+The repair solution we analyzed is that in the action *LeaderProcessACKLD*, the *zxid* carried by *COMMIT-LD* is changed from the *zxid* in *ACK-LD* to the latest *zxid* committed locally by Leader, thus making up for the missing commit information.

二进制
zookeeper-specifications/protocol-spec/pic/case_leader_process_cepoch.png


二进制
zookeeper-specifications/protocol-spec/pic/protocol-spec-mck-bug1.png


二进制
zookeeper-specifications/protocol-spec/pic/protocol-spec-mck-bug2.png


二进制
zookeeper-specifications/protocol-spec/pic/protocol-spec-mck-bug3.png


+ 44 - 0
zookeeper-specifications/protocol-spec/verification-statistics.md

@@ -0,0 +1,44 @@
+# Verification Statistics 
+##### Experiment configuration
+
+Our statistical results include: diameter of the system states that have been traversed, the number of states that have been traversed, the number of different states that have been discovered, and the time spent in the experiment.
+
+The machine configuration used in the experiment is 2.40 GHz, 10-core CPU, 64GB memory. The TLC version number is 1.7.0.
+
+
+
+##### State space constraints in model checking
+
+Due to the state space explosion in model checking and the complex actions of Zab protocol, as well as unlimited number of rounds and unlimited length of history, it is impossible to traverse all states.  
+We try to let models can tap into larger state space. See CONSTANT *Parameters* for details.
+
+
+
+##### Verification statistics of model checking  
+|  Mode  |     TLC model         |    Diameter   |     num of states  | time of checking(hh:mm:ss) |
+| ----- | ---------------------- | ------------- | ------------------ | ------------------ |
+| BFS   | (2 servers,3 rounds,2 transactions)    |     59   |  7758091583 |  17:28:17|
+| Simulation | (2 servers,3 rounds,2 transactions)   |   -|  6412825222| 17:07:20  |
+| BFS   | (3 servers,2 rounds,2 transactions)    |     19   |  4275801206 |  09:40:08|
+| Simulation | (3 servers,2 rounds,2 transactions)   |   -|  10899460942| 20:15:11  |
+| BFS   | (3 servers,2 rounds,3 transactions)   |    22    |  8740566213  | 17:49:09 |
+| Simulation | (3 servers,2 rounds,3 transactions)  |  -    | 9639135842  |   20:10:00 |
+| BFS    |  (3 servers,3 rounds,2 transactions)    |    21    | 7079744342    |14:17:45 |
+| Simulation | (3 servers,3 rounds,2 transactions)    |  -  |  6254964039   | 15:08:42 |
+| BFS    |  (4 servers,3 rounds,2 transactions)    |    16    | 5634112480  |15:42:09 |
+| Simulation | (4 servers,3 rounds,2 transactions)    |  -  |  3883461291   | 15:52:03 |
+
+
+
+##### Verification statistics with parameters (count of servers, MaxTotalRestartNum, MaxElectionNum, MaxTransactionNum)
+
+|  Mode  |     TLC model         |    Diameter   |     num of states  | time of checking(hh:mm:ss) |
+| ----- | ---------------------- | ------------- | ------------------ | ------------------ |
+| BFS   | (2,2,3,2,termination) |     55   |  10772649   |  00:02:21|
+| BFS   | (3,1,1,2)             |     45   |  9602018536 |  31:01:57|
+
+
+
+##### Issues
+
+Besides, we have found several issues related to the ambiguous description of the Zab protocol. Details can be found at [issues.md](issues.md).

+ 17 - 0
zookeeper-specifications/system-spec/doc.md

@@ -0,0 +1,17 @@
+# ZooKeeper's System Specification of TLA+
+
+## Overview
+
+ZooKeeper's system specification of TLA+ focuses on the implementation of the Zookeeper Atomic Broadcast(ZAB) consensus protocol (or, [ZAB1.0](https://cwiki.apache.org/confluence/display/ZOOKEEPER/Zab1.0)). 
+
+As is shown by the informal description of [ZAB1.0](https://cwiki.apache.org/confluence/display/ZOOKEEPER/Zab1.0), the implementation of ZAB that used in ZooKeeper production differs a lot from its original design.
+
+We make this system specification to grasp the core logic of the ZAB's implementation especially. 
+
+The [sysetm specification](zk-3.7/ZkV3_7_0.tla) written in TLA+ is precise without ambiguity, and testable with existed tools like the TLC model checker. (We have done a certain scale of model checking to verify its correctness!) 
+
+The system specification can serve as the super-doc supplementing detailed system documentation for the ZooKeeper developers. It can evolve incrementally without high cost as the system develops over time, continually ensuring correctness for both the design and the implementation.
+
+We have also made a formal [specification](zk-3.7/FastLeaderElection.tla) for Fast Leader Election in Zab since ZAB 1.0 depends on FLE to complete the election phase.
+
+If you have any question or find any problem of the specification, please contact us.

+ 535 - 0
zookeeper-specifications/system-spec/zk-3.7/FastLeaderElection.tla

@@ -0,0 +1,535 @@
+------------------------- MODULE FastLeaderElection -------------------------
+\* This is the formal specification for Fast Leader Election in Zab protocol.
+(* Reference:
+   FastLeaderElection.java, Vote.java, QuorumPeer.java in https://github.com/apache/zookeeper.
+   Medeiros A. ZooKeeper's atomic broadcast protocol: Theory and practice[J]. Aalto University School of Science, 2012.
+*)
+EXTENDS Integers, FiniteSets, Sequences, Naturals, TLC
+
+-----------------------------------------------------------------------------
+\* The set of server identifiers
+CONSTANT Server
+
+\* Server states
+CONSTANTS LOOKING, FOLLOWING, LEADING
+(* NOTE: In spec, we do not discuss servers whose ServerState is OBSERVING.
+*)
+
+\* Message types
+CONSTANTS NOTIFICATION
+
+\* Timeout signal
+CONSTANT NONE
+-----------------------------------------------------------------------------
+Quorums == {Q \in SUBSET Server: Cardinality(Q)*2 > Cardinality(Server)}
+
+NullPoint == CHOOSE p: p \notin Server
+
+-----------------------------------------------------------------------------
+\* Server's state(LOOKING, FOLLOWING, LEADING).
+VARIABLE state
+
+VARIABLE history
+
+\* The epoch number of the last NEWLEADER packet accepted, used for comparing.
+VARIABLE currentEpoch
+
+\* The index and zxid of the last processed transaction in history.
+VARIABLE lastProcessed
+
+\* currentVote[i]: The server who i thinks is the current leader(id,zxid,peerEpoch,...).
+VARIABLE currentVote
+
+\* Election instance.(logicalClock in code)
+VARIABLE logicalClock
+
+\* The votes from the current leader election are stored in ReceiveVotes.
+VARIABLE receiveVotes
+
+(* The votes from previous leader elections, as well as the votes from the current leader election are
+   stored in outofelection. Note that notifications in a LOOKING state are not stored in outofelection.
+   Only FOLLOWING or LEADING notifications are stored in outofelection.  *)
+VARIABLE outOfElection
+
+\* recvQueue[i]: The queue of received notifications or timeout signals in server i.
+VARIABLE recvQueue
+
+\* A veriable to wait for new notifications, corresponding to line 1050 in FastLeaderElection.java.
+VARIABLE waitNotmsg
+
+\* leadingVoteSet[i]: The set of voters that follow i.
+VARIABLE leadingVoteSet
+
+(* The messages about election sent from one server to another.
+   electionMsgs[i][j] means the input buffer of server j from server i. *)
+VARIABLE electionMsgs
+
+\* Set used for mapping Server to Integers, to compare ids from different servers.
+\* VARIABLE idTable
+
+serverVarsL == <<state, currentEpoch, lastProcessed, history>>
+
+electionVarsL == <<currentVote, logicalClock, receiveVotes, outOfElection, recvQueue, waitNotmsg>>
+
+leaderVarsL == <<leadingVoteSet>>
+
+varsL == <<serverVarsL, electionVarsL, leaderVarsL, electionMsgs>>
+-----------------------------------------------------------------------------
+\* Processing of electionMsgs
+BroadcastNotmsg(i, m) == electionMsgs' = [electionMsgs EXCEPT ![i] = [v \in Server |-> IF v /= i
+                                                                                       THEN Append(electionMsgs[i][v], m)
+                                                                                       ELSE electionMsgs[i][v]]]
+
+DiscardNotmsg(i, j) == electionMsgs' = [electionMsgs EXCEPT ![i][j] = IF electionMsgs[i][j] /= << >>
+                                                                      THEN Tail(electionMsgs[i][j])
+                                                                      ELSE << >>]
+
+ReplyNotmsg(i, j, m) == electionMsgs' = [electionMsgs EXCEPT ![i][j] = Append(electionMsgs[i][j], m),
+                                                             ![j][i] = Tail(electionMsgs[j][i])]
+                                               
+-----------------------------------------------------------------------------
+\* Processing of recvQueue
+RECURSIVE RemoveNone(_)
+RemoveNone(seq) == CASE seq =  << >> -> << >>
+                   []   seq /= << >> -> IF Head(seq).mtype = NONE THEN RemoveNone(Tail(seq))
+                                                                  ELSE <<Head(seq)>> \o RemoveNone(Tail(seq)) 
+
+\* Processing of idTable and order comparing
+RECURSIVE InitializeIdTable(_)
+InitializeIdTable(Remaining) == IF Remaining = {} THEN {}
+                                ELSE LET chosen == CHOOSE i \in Remaining: TRUE
+                                         re     == Remaining \ {chosen}
+                                     IN {<<chosen, Cardinality(Remaining)>>} \union InitializeIdTable(re)
+
+IdTable == InitializeIdTable(Server) 
+
+\* FALSE: id1 < id2; TRUE: id1 > id2
+IdCompare(id1,id2) == LET item1 == CHOOSE item \in IdTable: item[1] = id1
+                          item2 == CHOOSE item \in IdTable: item[1] = id2
+                      IN item1[2] > item2[2]
+
+\* FALSE: zxid1 <= zxid2; TRUE: zxid1 > zxid2
+ZxidCompare(zxid1, zxid2) == \/ zxid1[1] > zxid2[1]
+                             \/ /\ zxid1[1] = zxid2[1]
+                                /\ zxid1[2] > zxid2[2]
+
+ZxidEqual(zxid1, zxid2) == zxid1[1] = zxid2[1] /\ zxid1[2] = zxid2[2]
+
+\* FALSE: vote1 <= vote2; TRUE: vote1 > vote2
+TotalOrderPredicate(vote1, vote2) == \/ vote1.proposedEpoch > vote2.proposedEpoch
+                                     \/ /\ vote1.proposedEpoch = vote2.proposedEpoch
+                                        /\ \/ ZxidCompare(vote1.proposedZxid, vote2.proposedZxid)
+                                           \/ /\ ZxidEqual(vote1.proposedZxid, vote2.proposedZxid)
+                                              /\ IdCompare(vote1.proposedLeader, vote2.proposedLeader)
+
+VoteEqual(vote1, round1, vote2, round2) == /\ vote1.proposedLeader = vote2.proposedLeader
+                                           /\ ZxidEqual(vote1.proposedZxid, vote2.proposedZxid)
+                                           /\ vote1.proposedEpoch  = vote2.proposedEpoch
+                                           /\ round1 = round2
+
+InitLastProcessed(i) == IF Len(history[i]) = 0 THEN [ index |-> 0, 
+                                                 zxid |-> <<0, 0>> ]
+                        ELSE
+                        LET lastIndex == Len(history[i])
+                            entry     == history[i][lastIndex]
+                        IN [ index |-> lastIndex,
+                             zxid  |-> entry.zxid ]
+
+RECURSIVE InitAcksidInTxns(_,_)
+InitAcksidInTxns(txns, src) == IF Len(txns) = 0 THEN << >>
+                               ELSE LET newTxn == [ zxid   |-> txns[1].zxid,
+                                                    value  |-> txns[1].value,
+                                                    ackSid |-> {src},
+                                                    epoch  |-> txns[1].epoch ]
+                                    IN <<newTxn>> \o InitAcksidInTxns( Tail(txns), src)
+
+InitHistory(i) == LET newState == state'[i] IN 
+                    IF newState = LEADING THEN InitAcksidInTxns(history[i], i)
+                    ELSE history[i]
+-----------------------------------------------------------------------------
+\* Processing of currentVote
+InitialVote == [proposedLeader |-> NullPoint,
+                proposedZxid   |-> <<0, 0>>,
+                proposedEpoch  |-> 0]
+
+SelfVote(i) == [proposedLeader |-> i,
+                proposedZxid   |-> lastProcessed[i].zxid,
+                proposedEpoch  |-> currentEpoch[i]]
+
+UpdateProposal(i, nid, nzxid, nepoch) == currentVote' = [currentVote EXCEPT ![i].proposedLeader = nid, \* no need to record state in LOOKING
+                                                                            ![i].proposedZxid   = nzxid,
+                                                                            ![i].proposedEpoch  = nepoch]  
+                                                                            
+-----------------------------------------------------------------------------
+\* Processing of receiveVotes and outOfElection
+RvClear(i) == receiveVotes'  = [receiveVotes  EXCEPT ![i] = [v \in Server |-> [vote    |-> InitialVote,
+                                                                               round   |-> 0,
+                                                                               state   |-> LOOKING,
+                                                                               version |-> 0]]]
+
+RvPut(i, id, mvote, mround, mstate) == receiveVotes' = CASE receiveVotes[i][id].round < mround -> [receiveVotes EXCEPT ![i][id].vote    = mvote,
+                                                                                                                       ![i][id].round   = mround,
+                                                                                                                       ![i][id].state   = mstate,
+                                                                                                                       ![i][id].version = 1]
+                                                       []   receiveVotes[i][id].round = mround -> [receiveVotes EXCEPT ![i][id].vote    = mvote,
+                                                                                                                       ![i][id].state   = mstate,
+                                                                                                                       ![i][id].version = @ + 1]
+                                                       []   receiveVotes[i][id].round > mround -> receiveVotes
+
+Put(i, id, rcvset, mvote, mround, mstate) == CASE rcvset[id].round < mround -> [rcvset EXCEPT ![id].vote    = mvote,
+                                                                                              ![id].round   = mround,
+                                                                                              ![id].state   = mstate,
+                                                                                              ![id].version = 1]
+                                             []   rcvset[id].round = mround -> [rcvset EXCEPT ![id].vote    = mvote,
+                                                                                              ![id].state   = mstate,
+                                                                                              ![id].version = @ + 1]
+                                             []   rcvset[id].round > mround -> rcvset
+
+RvClearAndPut(i, id, vote, round) == receiveVotes' = LET oneVote == [vote    |-> vote, 
+                                                                     round   |-> round, 
+                                                                     state   |-> LOOKING,
+                                                                     version |-> 1]
+                                                     IN [receiveVotes EXCEPT ![i] = [v \in Server |-> IF v = id THEN oneVote
+                                                                                                                ELSE [vote    |-> InitialVote,
+                                                                                                                      round   |-> 0,
+                                                                                                                      state   |-> LOOKING,
+                                                                                                                      version |-> 0]]]                     
+
+VoteSet(i, msource, rcvset, thisvote, thisround) == {msource} \union {s \in (Server \ {msource}): VoteEqual(rcvset[s].vote, 
+                                                                                                            rcvset[s].round,
+                                                                                                            thisvote,
+                                                                                                            thisround)}
+
+HasQuorums(i, msource, rcvset, thisvote, thisround) == LET Q == VoteSet(i, msource, rcvset, thisvote, thisround)
+                                                       IN IF Q \in Quorums THEN TRUE ELSE FALSE
+
+CheckLeader(i, votes, thisleader, thisround) == IF thisleader = i THEN (IF thisround = logicalClock[i] THEN TRUE ELSE FALSE)
+                                                ELSE (IF votes[thisleader].vote.proposedLeader = NullPoint THEN FALSE
+                                                      ELSE (IF votes[thisleader].state = LEADING THEN TRUE 
+                                                                                                 ELSE FALSE))
+
+OoeClear(i) == outOfElection' = [outOfElection EXCEPT ![i] = [v \in Server |-> [vote    |-> InitialVote,
+                                                                                round   |-> 0,
+                                                                                state   |-> LOOKING,
+                                                                                version |-> 0]]]  
+
+OoePut(i, id, mvote, mround, mstate) == outOfElection' = CASE outOfElection[i][id].round < mround -> [outOfElection EXCEPT ![i][id].vote    = mvote,
+                                                                                                                           ![i][id].round   = mround,
+                                                                                                                           ![i][id].state   = mstate,
+                                                                                                                           ![i][id].version = 1]
+                                                         []   outOfElection[i][id].round = mround -> [outOfElection EXCEPT ![i][id].vote    = mvote,
+                                                                                                                           ![i][id].state   = mstate,
+                                                                                                                           ![i][id].version = @ + 1]
+                                                         []   outOfElection[i][id].round > mround -> outOfElection
+                                                                                                                             
+-----------------------------------------------------------------------------    
+InitServerVarsL == /\ state         = [s \in Server |-> LOOKING]
+                   /\ currentEpoch  = [s \in Server |-> 0]
+                   /\ lastProcessed = [s \in Server |-> [index |-> 0,
+                                                         zxid  |-> <<0, 0>>] ]
+                   /\ history       = [s \in Server |-> << >>]
+
+InitElectionVarsL == /\ currentVote   = [s \in Server |-> SelfVote(s)]
+                     /\ logicalClock  = [s \in Server |-> 0]
+                     /\ receiveVotes  = [s \in Server |-> [v \in Server |-> [vote    |-> InitialVote,
+                                                                             round   |-> 0,
+                                                                             state   |-> LOOKING,
+                                                                             version |-> 0]]]
+                     /\ outOfElection = [s \in Server |-> [v \in Server |-> [vote    |-> InitialVote,
+                                                                             round   |-> 0,
+                                                                             state   |-> LOOKING,
+                                                                             version |-> 0]]]
+                     /\ recvQueue     = [s \in Server |-> << >>]
+                     /\ waitNotmsg    = [s \in Server |-> FALSE]
+
+InitLeaderVarsL == leadingVoteSet = [s \in Server |-> {}]
+
+InitL == /\ InitServerVarsL
+        /\ InitElectionVarsL
+        /\ InitLeaderVarsL
+        /\ electionMsgs = [s \in Server |-> [v \in Server |-> << >>]]
+        \* /\ idTable = InitializeIdTable(Server)
+        
+-----------------------------------------------------------------------------
+(* The beginning part of FLE's main function lookForLeader() *)
+ZabTimeout(i) ==
+        /\ state[i] \in {LEADING, FOLLOWING}
+        /\ state'          = [state          EXCEPT ![i] = LOOKING]
+        /\ lastProcessed'  = [lastProcessed  EXCEPT ![i] = InitLastProcessed(i)]
+        /\ logicalClock'   = [logicalClock   EXCEPT ![i] = logicalClock[i] + 1]
+        /\ currentVote'    = [currentVote    EXCEPT ![i] = [proposedLeader |-> i,
+                                                            proposedZxid   |-> lastProcessed'[i].zxid,
+                                                            proposedEpoch  |-> currentEpoch[i]]]
+        /\ receiveVotes'   = [receiveVotes   EXCEPT ![i] = [v \in Server |-> [vote    |-> InitialVote,
+                                                                              round   |-> 0,
+                                                                              state   |-> LOOKING,
+                                                                              version |-> 0]]]
+        /\ outOfElection'  = [outOfElection  EXCEPT ![i] = [v \in Server |-> [vote    |-> InitialVote,
+                                                                              round   |-> 0,
+                                                                              state   |-> LOOKING,
+                                                                              version |-> 0]]]
+        /\ recvQueue'      = [recvQueue      EXCEPT ![i] = << >>]  
+        /\ waitNotmsg'     = [waitNotmsg     EXCEPT ![i] = FALSE]
+        /\ leadingVoteSet' = [leadingVoteSet EXCEPT ![i] = {}]
+        /\ BroadcastNotmsg(i, [mtype   |-> NOTIFICATION,
+                               msource |-> i,
+                               mstate  |-> LOOKING,
+                               mround  |-> logicalClock'[i],
+                               mvote   |-> currentVote'[i]])
+        /\ UNCHANGED <<currentEpoch, history>>
+        
+(* Abstraction of WorkerReceiver.run() *)
+ReceiveNotmsg(i, j) ==
+        /\ electionMsgs[j][i] /= << >>
+        /\ LET notmsg == electionMsgs[j][i][1]
+               toSend == [mtype   |-> NOTIFICATION,
+                          msource |-> i,
+                          mstate  |-> state[i],
+                          mround  |-> logicalClock[i],
+                          mvote   |-> currentVote[i]]
+           IN \/ /\ state[i] = LOOKING
+                 /\ recvQueue' = [recvQueue EXCEPT ![i] = Append(RemoveNone(recvQueue[i]), notmsg)]
+                 /\ LET replyOk == /\ notmsg.mstate = LOOKING
+                                   /\ notmsg.mround < logicalClock[i]
+                    IN 
+                    \/ /\ replyOk
+                       /\ ReplyNotmsg(i, j, toSend)
+                    \/ /\ ~replyOk
+                       /\ DiscardNotmsg(j, i)
+              \/ /\ state[i] \in {LEADING, FOLLOWING}
+                 /\ \/ \* Only reply when sender's state is LOOKING
+                       /\ notmsg.mstate = LOOKING
+                       /\ ReplyNotmsg(i, j, toSend)
+                    \/ \* sender's state and mine are both not LOOKING, just discard
+                       /\ notmsg.mstate /= LOOKING
+                       /\ DiscardNotmsg(j, i)
+                 /\ UNCHANGED recvQueue
+        /\ UNCHANGED <<serverVarsL, currentVote, logicalClock, receiveVotes, outOfElection, waitNotmsg, leaderVarsL>>
+        
+NotmsgTimeout(i) == 
+        /\ state[i] = LOOKING
+        /\ \A j \in Server: electionMsgs[j][i] = << >>
+        /\ recvQueue[i] = << >>
+        /\ recvQueue' = [recvQueue EXCEPT ![i] = Append(recvQueue[i], [mtype |-> NONE])]
+        /\ UNCHANGED <<serverVarsL, currentVote, logicalClock, receiveVotes, outOfElection, waitNotmsg, leaderVarsL, electionMsgs>>
+
+-----------------------------------------------------------------------------
+\* Sub-action in HandleNotmsg
+ReceivedFollowingAndLeadingNotification(i, n) ==
+        LET newVotes    == Put(i, n.msource, receiveVotes[i], n.mvote, n.mround, n.mstate)
+            voteSet1    == VoteSet(i, n.msource, newVotes, n.mvote, n.mround)
+            hasQuorums1 == voteSet1 \in Quorums
+            check1      == CheckLeader(i, newVotes, n.mvote.proposedLeader, n.mround)
+            leaveOk1    == /\ n.mround = logicalClock[i]
+                           /\ hasQuorums1
+                           /\ check1    \* state and leadingVoteSet cannot be changed twice in the first '/\' and second '/\'.
+        IN
+        /\ \/ /\ n.mround = logicalClock[i]
+              /\ receiveVotes' = [receiveVotes EXCEPT ![i] = newVotes]
+           \/ /\ n.mround /= logicalClock[i]
+              /\ UNCHANGED receiveVotes
+        /\ \/ /\ leaveOk1
+              \* /\ PrintT("leave with condition 1")
+              /\ state' = [state EXCEPT ![i] = IF n.mvote.proposedLeader = i THEN LEADING ELSE FOLLOWING]
+              /\ leadingVoteSet' = [leadingVoteSet EXCEPT ![i] = IF n.mvote.proposedLeader = i THEN voteSet1 ELSE @]
+              /\ UpdateProposal(i, n.mvote.proposedLeader, n.mvote.proposedZxid, n.mvote.proposedEpoch)
+              /\ UNCHANGED <<logicalClock, outOfElection>>
+           \/ /\ ~leaveOk1
+              /\ outOfElection' = [outOfElection EXCEPT ![i] = Put(i, n.msource, outOfElection[i], n.mvote,n.mround, n.mstate)]
+              /\ LET voteSet2    == VoteSet(i, n.msource, outOfElection'[i], n.mvote, n.mround)
+                     hasQuorums2 == voteSet2 \in Quorums
+                     check2      == CheckLeader(i, outOfElection'[i], n.mvote.proposedLeader, n.mround)
+                     leaveOk2    == /\ hasQuorums2
+                                    /\ check2
+                 IN
+                 \/ /\ leaveOk2
+                    \* /\ PrintT("leave with condition 2")
+                    /\ logicalClock' = [logicalClock EXCEPT ![i] = n.mround]
+                    /\ state' = [state EXCEPT ![i] = IF n.mvote.proposedLeader = i THEN LEADING ELSE FOLLOWING]
+                    /\ leadingVoteSet' = [leadingVoteSet EXCEPT ![i] = IF n.mvote.proposedLeader = i THEN voteSet2 ELSE @]
+                    /\ UpdateProposal(i, n.mvote.proposedLeader, n.mvote.proposedZxid, n.mvote.proposedEpoch)
+                 \/ /\ ~leaveOk2
+                    /\ LET leaveOk3 == /\ n.mstate = LEADING
+                                       /\ n.mround = logicalClock[i]
+                       IN
+                       \/ /\ leaveOk3
+                          \* /\ PrintT("leave with condition 3")
+                          /\ state' = [state EXCEPT ![i] = IF n.mvote.proposedLeader = i THEN LEADING ELSE FOLLOWING]
+                          /\ UpdateProposal(i, n.mvote.proposedLeader, n.mvote.proposedZxid, n.mvote.proposedEpoch)
+                       \/ /\ ~leaveOk3
+                          /\ UNCHANGED <<state, currentVote>>
+                    /\ UNCHANGED <<logicalClock, leadingVoteSet>>
+
+(* Main part of lookForLeader() *)
+HandleNotmsg(i) ==
+        /\ state[i] = LOOKING
+        /\ \lnot waitNotmsg[i]
+        /\ recvQueue[i] /= << >>
+        /\ LET n         == recvQueue[i][1]
+               rawToSend == [mtype   |-> NOTIFICATION,
+                             msource |-> i,
+                             mstate  |-> LOOKING,
+                             mround  |-> logicalClock[i],
+                             mvote   |-> currentVote[i]]
+           IN \/ /\ n.mtype = NONE
+                 /\ BroadcastNotmsg(i, rawToSend)
+                 /\ UNCHANGED <<history, logicalClock, currentVote, receiveVotes, waitNotmsg, outOfElection, state, leadingVoteSet>>
+              \/ /\ n.mtype = NOTIFICATION
+                 /\ \/ /\ n.mstate = LOOKING
+                       /\ \/ \* n.round >= my round, then update data and receiveVotes.
+                             /\ n.mround >= logicalClock[i]
+                             /\ \/ \* n.round > my round, update round and decide new proposed leader.
+                                   /\ n.mround > logicalClock[i]
+                                   /\ logicalClock' = [logicalClock EXCEPT ![i] = n.mround] \* There should be RvClear, we will handle it in the following.
+                                   /\ LET selfinfo == [proposedLeader |-> i,
+                                                       proposedZxid   |-> lastProcessed[i].zxid,
+                                                       proposedEpoch  |-> currentEpoch[i]]
+                                          peerOk   == TotalOrderPredicate(n.mvote, selfinfo)
+                                      IN \/ /\ peerOk
+                                            /\ UpdateProposal(i, n.mvote.proposedLeader, n.mvote.proposedZxid, n.mvote.proposedEpoch)
+                                         \/ /\ ~peerOk
+                                            /\ UpdateProposal(i, i, lastProcessed[i].zxid, currentEpoch[i])
+                                   /\ BroadcastNotmsg(i, [mtype   |-> NOTIFICATION,
+                                                          msource |-> i,
+                                                          mstate  |-> LOOKING,
+                                                          mround  |-> n.mround,
+                                                          mvote   |-> currentVote'[i]])
+                                \/ \* n.round = my round & n.vote > my vote
+                                   /\ n.mround = logicalClock[i]
+                                   /\ LET peerOk == TotalOrderPredicate(n.mvote, currentVote[i])
+                                      IN \/ /\ peerOk
+                                            /\ UpdateProposal(i, n.mvote.proposedLeader, n.mvote.proposedZxid, n.mvote.proposedEpoch)
+                                            /\ BroadcastNotmsg(i, [mtype   |-> NOTIFICATION,
+                                                                   msource |-> i,
+                                                                   mstate  |-> LOOKING,
+                                                                   mround  |-> logicalClock[i],
+                                                                   mvote   |-> n.mvote])
+                                         \/ /\ ~peerOk
+                                            /\ UNCHANGED <<currentVote, electionMsgs>>
+                                   /\ UNCHANGED logicalClock
+                             /\ LET rcvsetModifiedTwice == n.mround > logicalClock[i]
+                                IN \/ /\ rcvsetModifiedTwice   \* Since a variable cannot be changed more than once in one action, we handle receiveVotes here.
+                                      /\ RvClearAndPut(i, n.msource, n.mvote, n.mround)  \* clear + put
+                                   \/ /\ ~rcvsetModifiedTwice
+                                      /\ RvPut(i, n.msource, n.mvote, n.mround, n.mstate)          \* put
+                             /\ LET hasQuorums == HasQuorums(i, i, receiveVotes'[i], currentVote'[i], n.mround)
+                                IN \/ /\ hasQuorums \* If hasQuorums, see action WaitNewNotmsg and WaitNewNotmsgEnd.
+                                      /\ waitNotmsg' = [waitNotmsg EXCEPT ![i] = TRUE] 
+                                   \/ /\ ~hasQuorums                            
+                                      /\ UNCHANGED waitNotmsg
+                          \/ \* n.round < my round, just discard it.
+                             /\ n.mround < logicalClock[i]
+                             /\ UNCHANGED <<logicalClock, currentVote, electionMsgs, receiveVotes, waitNotmsg>>
+                       /\ UNCHANGED <<state, history, outOfElection, leadingVoteSet>>
+                    \/ \* mainly contains receivedFollowingNotification(line 1146), receivedLeadingNotification(line 1185).
+                       /\ n.mstate \in {LEADING, FOLLOWING}
+                       /\ ReceivedFollowingAndLeadingNotification(i, n)
+                       /\ history' = [history EXCEPT ![i] = InitHistory(i) ]
+                       /\ UNCHANGED <<electionMsgs, waitNotmsg>>
+        /\ recvQueue' = [recvQueue EXCEPT ![i] = Tail(recvQueue[i])]
+        /\ UNCHANGED <<currentEpoch, lastProcessed>>
+
+\* On the premise that ReceiveVotes.HasQuorums = TRUE, corresponding to logic in LFE.java.
+WaitNewNotmsg(i) ==
+        /\ state[i] = LOOKING
+        /\ waitNotmsg[i] = TRUE
+        /\ \/ /\ recvQueue[i] /= << >>
+              /\ recvQueue[i][1].mtype = NOTIFICATION
+              /\ LET n == recvQueue[i][1]
+                     peerOk == TotalOrderPredicate(n.mvote, currentVote[i])
+                 IN \/ /\ peerOk
+                       /\ waitNotmsg' = [waitNotmsg EXCEPT ![i] = FALSE]
+                       /\ recvQueue'  = [recvQueue  EXCEPT ![i] = Append(Tail(@), n)]
+                    \/ /\ ~peerOk
+                       /\ recvQueue' = [recvQueue EXCEPT ![i] = Tail(@)]
+                       /\ UNCHANGED waitNotmsg
+              /\ UNCHANGED <<serverVarsL, currentVote, logicalClock, receiveVotes, outOfElection, 
+                             leaderVarsL, electionMsgs>>
+           \/ /\ \/ recvQueue[i] = << >>
+                 \/ /\ recvQueue[i] /= << >>
+                    /\ recvQueue[i][1].mtype = NONE
+              /\ state' = [state EXCEPT ![i] = IF currentVote[i].proposedLeader = i THEN LEADING
+                                               ELSE FOLLOWING ]
+              /\ leadingVoteSet' = [leadingVoteSet EXCEPT ![i] = 
+                                                           IF currentVote[i].proposedLeader = i 
+                                                           THEN VoteSet(i, i, receiveVotes[i], currentVote[i],
+                                                                        logicalClock[i])
+                                                           ELSE @]
+              /\ history' = [history EXCEPT ![i] = InitHistory(i)]
+              /\ UNCHANGED <<currentEpoch, lastProcessed, electionVarsL, electionMsgs>>
+-----------------------------------------------------------------------------
+(*Test - simulate modifying currentEpoch and lastProcessed.
+  We want to reach violations to achieve some traces and see whether the whole state of system is advancing.
+  The actions below are completely not equal to implementation in real, 
+  just simulate a process of leader updates state and followers get it. *)
+
+LeaderAdvanceEpoch(i) ==
+        /\ state[i] = LEADING
+        /\ currentEpoch' = [currentEpoch EXCEPT ![i] = @ + 1]
+        /\ UNCHANGED <<state, lastProcessed, history, electionVarsL, leaderVarsL, electionMsgs>>
+
+FollowerUpdateEpoch(i, j) ==
+        /\ state[i] = FOLLOWING
+        /\ currentVote[i].proposedLeader = j
+        /\ state[j] = LEADING
+        /\ currentEpoch[i] < currentEpoch[j]
+        /\ currentEpoch' = [currentEpoch EXCEPT ![i] = currentEpoch[j]]
+        /\ UNCHANGED <<state, lastProcessed, history, electionVarsL, leaderVarsL, electionMsgs>>
+
+LeaderAdvanceZxid(i) ==
+        /\ state[i] = LEADING
+        /\ lastProcessed' = [lastProcessed EXCEPT ![i] = IF lastProcessed[i].zxid[1] = currentEpoch[i] 
+                                               THEN [  index |-> lastProcessed[i].index + 1,
+                                                       zxid  |-> <<currentEpoch[i], lastProcessed[i].zxid[2] + 1>> ]
+                                               ELSE [  index |-> lastProcessed[i].index + 1,
+                                                       zxid  |-> <<currentEpoch[i], 1>> ] ]
+        /\ history' = [history EXCEPT ![i] = Append(@, [zxid   |-> lastProcessed'[i].zxid,
+                                                        value  |-> NONE,
+                                                        ackSid |-> {},
+                                                        epoch  |-> 0])]
+        /\ UNCHANGED <<state, currentEpoch, electionVarsL, leaderVarsL, electionMsgs>>
+
+FollowerUpdateZxid(i, j) ==
+        /\ state[i] = FOLLOWING
+        /\ currentVote[i].proposedLeader = j
+        /\ state[j] = LEADING
+        /\ LET precede == \/ lastProcessed[i].zxid[1] < lastProcessed[j].zxid[1]
+                          \/ /\ lastProcessed[i].zxid[1] = lastProcessed[j].zxid[1]
+                             /\ lastProcessed[i].zxid[2] < lastProcessed[j].zxid[2]
+           IN /\ precede
+              /\ lastProcessed' = [lastProcessed EXCEPT ![i] = lastProcessed[j]]
+              /\ history' = [history EXCEPT ![i] = history[j]]
+        /\ UNCHANGED <<state, currentEpoch, electionVarsL, leaderVarsL, electionMsgs>>
+
+NextL == 
+        \/ \E i \in Server:     ZabTimeout(i)
+        \/ \E i, j \in Server:  ReceiveNotmsg(i, j)
+        \/ \E i \in Server:     NotmsgTimeout(i)
+        \/ \E i \in Server:     HandleNotmsg(i)
+        \/ \E i \in Server:     WaitNewNotmsg(i)
+       
+        \/ \E i \in Server:     LeaderAdvanceEpoch(i)
+        \/ \E i, j \in Server:  FollowerUpdateEpoch(i, j)
+        \/ \E i \in Server:     LeaderAdvanceZxid(i)
+        \/ \E i, j \in Server:  FollowerUpdateZxid(i, j)
+
+SpecL == InitL /\ [][NextL]_varsL
+       
+\* These invariants should be violated after running for minutes.
+
+ShouldBeTriggered1 == ~\E Q \in Quorums: /\ \A i \in Q: /\ state[i] \in {FOLLOWING, LEADING}
+                                                        /\ currentEpoch[i] > 3
+                                                        /\ logicalClock[i] > 2
+                                                        /\ currentVote[i].proposedLeader \in Q
+                                         /\ \A i, j \in Q: currentVote[i].proposedLeader = currentVote[j].proposedLeader
+
+(*
+ShouldBeTriggered2 == ~\E Q \in Quorums: /\ \A i \in Q: /\ state[i] \in {FOLLOWING, LEADING}
+                                                        /\ currentEpoch[i] > 3
+                                                        /\ currentVote[i].proposedLeader \in Q
+                                         /\ \A i, j \in Q: currentVote[i].proposedLeader = currentVote[j].proposedLeader*)
+=============================================================================
+\* Modification History
+\* Last modified Sat Jan 14 15:19:45 CST 2023 by huangbinyu
+\* Last modified Sun Nov 14 15:18:32 CST 2021 by Dell
+\* Created Fri Jun 18 20:23:47 CST 2021 by Dell

+ 2032 - 0
zookeeper-specifications/system-spec/zk-3.7/ZkV3_7_0.tla

@@ -0,0 +1,2032 @@
+------------------------------ MODULE ZkV3_7_0 ------------------------------
+(* This is the system specification for Zab in apache/zookeeper with version 3.7.0 *)
+
+(* Reference:
+   FLE: FastLeaderElection.java, Vote.java, QuorumPeer.java, e.g. in 
+        https://github.com/apache/zookeeper.
+   ZAB: QuorumPeer.java, Learner.java, Follower.java, LearnerHandler.java,
+        Leader.java, e.g. in https://github.com/apache/zookeeper.
+        https://cwiki.apache.org/confluence/display/ZOOKEEPER/Zab1.0.
+ *)
+EXTENDS FastLeaderElection
+-----------------------------------------------------------------------------
+\* The set of requests that can go into history
+\* CONSTANT Value \* Replaced by recorder.nClientRequest
+Value == Nat
+ 
+\* Zab states
+CONSTANTS ELECTION, DISCOVERY, SYNCHRONIZATION, BROADCAST
+
+\* Sync modes & message types
+CONSTANTS DIFF, TRUNC, SNAP
+
+\* Message types
+CONSTANTS FOLLOWERINFO, LEADERINFO, ACKEPOCH, NEWLEADER, ACKLD, 
+          UPTODATE, PROPOSAL, ACK, COMMIT
+(* NOTE: In production, there is no message type ACKLD. Server judges if counter 
+         of ACK is 0 to distinguish one ACK represents ACKLD or not. Here we
+         divide ACK into ACKLD and ACK, to enhance readability of spec.*)
+
+\* Node status
+CONSTANTS ONLINE, OFFLINE
+
+\* [MaxTimeoutFailures, MaxTransactionNum, MaxEpoch, MaxCrashes, MaxPartitions]
+CONSTANT Parameters
+
+MAXEPOCH == 10
+-----------------------------------------------------------------------------
+\* Variables in annotations mean variables defined in FastLeaderElection.
+\* Variables that all servers use.
+VARIABLES zabState,      \* Current phase of server, in
+                         \* {ELECTION, DISCOVERY, SYNCHRONIZATION, BROADCAST}.
+          acceptedEpoch, \* Epoch of the last LEADERINFO packet accepted,
+                         \* namely f.p in paper.
+          lastCommitted, \* Maximum index and zxid known to be committed,
+                         \* namely 'lastCommitted' in Leader. Starts from 0,
+                         \* and increases monotonically before restarting.
+          lastSnapshot,  \* Index and zxid corresponding to latest snapshot
+                         \* from data tree.
+          initialHistory \* history that server initially has before election.
+          \* state,        \* State of server, in {LOOKING, FOLLOWING, LEADING}.
+          \* currentEpoch, \* Epoch of the last NEWLEADER packet accepted,
+                           \* namely f.a in paper.
+          \* lastProcessed,\* Index and zxid of the last processed txn.
+          \* history       \* History of servers: sequence of transactions,
+                           \* containing: zxid, value, ackSid, epoch.
+           \* leader  : [committedRequests + toBeApplied] [outstandingProposals]
+           \* follower: [committedRequests] [pendingTxns] 
+
+\* Variables only used for leader.
+VARIABLES learners,       \* Set of servers leader connects, 
+                          \* namely 'learners' in Leader.
+          connecting,     \* Set of learners leader has received 
+                          \* FOLLOWERINFO from, namely  
+                          \* 'connectingFollowers' in Leader.
+                          \* Set of record [sid, connected].
+          electing,       \* Set of learners leader has received
+                          \* ACKEPOCH from, namely 'electingFollowers'
+                          \* in Leader. Set of record 
+                          \* [sid, peerLastZxid, inQuorum].
+                          \* And peerLastZxid = <<-1,-1>> means has done
+                          \* syncFollower with this sid.
+                          \* inQuorum = TRUE means in code it is one
+                          \* element in 'electingFollowers'.
+          ackldRecv,      \* Set of learners leader has received
+                          \* ACK of NEWLEADER from, namely
+                          \* 'newLeaderProposal' in Leader.
+                          \* Set of record [sid, connected].         
+          forwarding,     \* Set of learners that are synced with
+                          \* leader, namely 'forwardingFollowers'
+                          \* in Leader.
+          tempMaxEpoch    \* ({Maximum epoch in FOLLOWEINFO} + 1) that 
+                          \* leader has received from learners,
+                          \* namely 'epoch' in Leader.
+          \* leadingVoteSet \* Set of voters that follow leader.
+
+\* Variables only used for follower.
+VARIABLES connectInfo, \* Record [sid, syncMode, nlRcv].
+                       \* sid: Leader id that follower has connected with.
+                       \* syncMode: Sync mode according to reveiced Sync Message.
+                       \* nlRcv: If follower has received NEWLEADER.
+          packetsSync  \* packets of PROPOSAL and COMMIT from leader,
+                       \* namely 'packetsNotCommitted' and
+                       \* 'packetsCommitted' in SyncWithLeader
+                       \* in Learner.
+
+\* Variables about network channel.
+VARIABLE msgs       \* Simulates network channel.
+                    \* msgs[i][j] means the input buffer of server j 
+                    \* from server i.
+         \* electionMsgs \* Network channel in FLE module.
+
+\* Variables about status of cluster network and node presence.
+VARIABLES status,    \* Whether the server is online or offline.
+          partition  \* network partion.
+
+\* Variables only used in verifying properties.
+VARIABLES epochLeader,       \* Set of leaders in every epoch.
+          proposalMsgsLog,   \* Set of all broadcast messages.
+          violatedInvariants \* Check whether there are conditions 
+                             \* contrary to the facts.
+\* Variables only used for looking.                             
+\*VARIABLE currentVote,   \* Info of current vote, namely 'currentVote'
+\*                        \* in QuorumPeer.
+\*         logicalClock,  \* Election instance, namely 'logicalClock'
+\*                        \* in FastLeaderElection.
+\*         receiveVotes,  \* Votes from current FLE round, namely
+\*                        \* 'recvset' in FastLeaderElection.
+\*         outOfElection, \* Votes from previous and current FLE round,
+\*                        \* namely 'outofelection' in FastLeaderElection.
+\*         recvQueue,     \* Queue of received notifications or timeout
+\*                        \* signals.
+\*         waitNotmsg     \* Whether waiting for new not.See line 1050
+\*                        \* in FastLeaderElection for details.
+\*VARIABLE idTable \* For mapping Server to Integers,
+                   \* to compare ids between servers.
+    \* Update: we have transformed idTable from variable to function.
+
+\*VARIABLE clientReuqest \* Start from 0, and increases monotonically
+                         \* when LeaderProcessRequest performed. To
+                         \* avoid existing two requests with same value. 
+    \* Update: Remove it to recorder.nClientRequest.
+
+\* Variable used for recording critical data,
+\* to constrain state space or update values.
+VARIABLE recorder \* Consists: members of Parameters and pc, values.
+                  \* Form is record: 
+                  \* [pc, nTransaction, maxEpoch, nTimeout, nClientRequest,
+                  \*  nPartition, nCrash]
+
+serverVars == <<state, currentEpoch, lastProcessed, zabState, acceptedEpoch,
+                history, lastCommitted, lastSnapshot, initialHistory>>       
+electionVars == electionVarsL  
+leaderVars == <<leadingVoteSet, learners, connecting, electing, 
+                 ackldRecv, forwarding, tempMaxEpoch>>                                           
+followerVars == <<connectInfo, packetsSync>>                
+verifyVars == <<proposalMsgsLog, epochLeader, violatedInvariants>>   
+msgVars == <<msgs, electionMsgs>>
+envVars == <<status, partition>>                        
+
+vars == <<serverVars, electionVars, leaderVars, followerVars,
+          verifyVars, msgVars, envVars, recorder>> 
+-----------------------------------------------------------------------------
+ServersIncNullPoint == Server \union {NullPoint} 
+
+Zxid ==
+    Seq(Nat \union {-1}) 
+    
+HistoryItem ==
+     [ zxid: Zxid,
+       value: Value,
+       ackSid: SUBSET Server,
+       epoch: Nat ]    
+    
+Proposal ==
+    [ source: Server, 
+      epoch: Nat,
+      zxid: Zxid,
+      data: Value ]   
+
+LastItem ==
+    [ index: Nat, zxid: Zxid ]
+
+SyncPackets == 
+    [ notCommitted: Seq(HistoryItem),
+      committed: Seq(Zxid) ]
+
+Message ==
+    [ mtype: {FOLLOWERINFO}, mzxid: Zxid ] \union
+    [ mtype: {LEADERINFO}, mzxid: Zxid ] \union
+    [ mtype: {ACKEPOCH}, mzxid: Zxid, mepoch: Nat \union {-1} ] \union
+    [ mtype: {DIFF}, mzxid: Zxid ] \union 
+    [ mtype: {TRUNC}, mtruncZxid: Zxid ] \union 
+    [ mtype: {SNAP}, msnapZxid: Zxid, msnapshot: Seq(HistoryItem)] \union
+    [ mtype: {PROPOSAL}, mzxid: Zxid, mdata: Value ] \union 
+    [ mtype: {COMMIT}, mzxid: Zxid ] \union 
+    [ mtype: {NEWLEADER}, mzxid: Zxid ] \union 
+    [ mtype: {ACKLD}, mzxid: Zxid ] \union 
+    [ mtype: {ACK}, mzxid: Zxid ] \union 
+    [ mtype: {UPTODATE}, mzxid: Zxid ]
+ 
+ElectionState == {LOOKING, FOLLOWING, LEADING}
+
+ZabState == {ELECTION, DISCOVERY, SYNCHRONIZATION, BROADCAST}
+
+ViolationSet == {"stateInconsistent", "proposalInconsistent", 
+                 "commitInconsistent", "ackInconsistent", 
+                 "messageIllegal" }
+
+SyncMode == {DIFF, TRUNC, SNAP, NONE}
+
+Connecting == [ sid : Server,
+                connected: BOOLEAN ]
+
+AckldRecv  == Connecting
+
+Electing == [ sid: Server,
+              peerLastZxid: Zxid,
+              inQuorum: BOOLEAN  ]
+
+ConnectInfo == [ sid : ServersIncNullPoint,
+                 syncMode: SyncMode,
+                 nlRcv: BOOLEAN ]
+
+Vote ==
+    [proposedLeader: ServersIncNullPoint,
+     proposedZxid: Zxid,
+     proposedEpoch: Nat ]
+     
+ElectionVote ==
+    [vote: Vote, round: Nat, state: ElectionState, version: Nat]
+
+ElectionMsg ==
+    [ mtype: {NOTIFICATION}, 
+      msource: Server, 
+      mstate: ElectionState, 
+      mround: Nat, 
+      mvote: Vote ] \union
+    [ mtype: {NONE} ]
+            
+TypeOK ==
+    /\ zabState \in [Server -> ZabState]
+    /\ acceptedEpoch \in [Server -> Nat]
+    /\ lastCommitted \in [Server -> LastItem]
+    /\ lastSnapshot \in [Server -> LastItem]
+    /\ learners \in [Server -> SUBSET Server]
+    /\ connecting \in [Server -> SUBSET Connecting]
+    /\ electing \in [Server -> SUBSET Electing]
+    /\ ackldRecv \in [Server -> SUBSET AckldRecv]
+    /\ forwarding \in [Server -> SUBSET Server]
+    /\ initialHistory \in [Server -> Seq(HistoryItem)] 
+    /\ tempMaxEpoch \in [Server -> Nat]
+    /\ connectInfo \in [Server -> ConnectInfo]
+    /\ packetsSync \in [Server -> SyncPackets]
+    /\ status \in [Server -> {ONLINE, OFFLINE}]
+    /\ partition \in [Server -> [Server -> BOOLEAN ]]
+    /\ proposalMsgsLog \in SUBSET Proposal
+    /\ epochLeader \in [1..MAXEPOCH -> SUBSET Server]
+    /\ violatedInvariants \in [ViolationSet -> BOOLEAN]
+    /\ msgs \in [Server -> [Server -> Seq(Message)]]
+    \* Fast Leader Election
+    /\ electionMsgs \in [Server -> [Server -> Seq(ElectionMsg)]]
+    /\ recvQueue \in [Server -> Seq(ElectionMsg)]
+    /\ leadingVoteSet \in [Server -> SUBSET Server]
+    /\ receiveVotes \in [Server -> [Server -> ElectionVote]]
+    /\ currentVote \in [Server -> Vote]
+    /\ outOfElection \in [Server -> [Server -> ElectionVote]]
+    /\ lastProcessed \in [Server -> LastItem]
+    /\ history \in [Server -> Seq(HistoryItem)]
+    /\ state \in [Server -> ElectionState]
+    /\ waitNotmsg \in [Server -> BOOLEAN]
+    /\ currentEpoch \in [Server -> Nat]
+    /\ logicalClock \in [Server -> Nat]
+-----------------------------------------------------------------------------
+\* Return the maximum value from the set S
+Maximum(S) == IF S = {} THEN -1
+                        ELSE CHOOSE n \in S: \A m \in S: n >= m
+
+\* Return the minimum value from the set S
+Minimum(S) == IF S = {} THEN -1
+                        ELSE CHOOSE n \in S: \A m \in S: n <= m
+
+\* Check server state
+IsON(s)  == status[s] = ONLINE 
+IsOFF(s) == status[s] = OFFLINE
+
+IsLeader(s)   == state[s] = LEADING
+IsFollower(s) == state[s] = FOLLOWING
+IsLooking(s)  == state[s] = LOOKING
+
+IsMyLearner(i, j) == j \in learners[i]
+IsMyLeader(i, j)  == connectInfo[i].sid = j
+HasNoLeader(i)    == connectInfo[i].sid = NullPoint
+HasLeader(i)      == connectInfo[i].sid /= NullPoint
+MyVote(i)         == currentVote[i].proposedLeader 
+
+\* Check if s is a quorum
+IsQuorum(s) == s \in Quorums
+
+HasPartitioned(i, j) == /\ partition[i][j] = TRUE 
+                        /\ partition[j][i] = TRUE
+-----------------------------------------------------------------------------
+\* Check zxid state
+ToZxid(z) == [epoch |-> z[1], counter |-> z[2]]
+
+TxnZxidEqual(txn, z) == txn.zxid[1] = z[1] /\ txn.zxid[2] = z[2]
+
+TxnEqual(txn1, txn2) == /\ ZxidEqual(txn1.zxid, txn2.zxid)
+                        /\ txn1.value = txn2.value
+
+EpochPrecedeInTxn(txn1, txn2) == txn1.zxid[1] < txn2.zxid[1]
+-----------------------------------------------------------------------------
+\* Actions about recorder
+GetParameter(p) == IF p \in DOMAIN Parameters THEN Parameters[p] ELSE 0
+GetRecorder(p)  == IF p \in DOMAIN recorder   THEN recorder[p]   ELSE 0
+
+RecorderGetHelper(m) == (m :> recorder[m])
+RecorderIncHelper(m) == (m :> recorder[m] + 1)
+
+RecorderIncTimeout == RecorderIncHelper("nTimeout")
+RecorderGetTimeout == RecorderGetHelper("nTimeout")
+RecorderIncCrash   == RecorderIncHelper("nCrash")
+RecorderGetCrash   == RecorderGetHelper("nCrash")
+
+RecorderSetTransactionNum(pc) == ("nTransaction" :> 
+                                IF pc[1] = "LeaderProcessRequest" THEN
+                                    LET s == CHOOSE i \in Server: 
+                                        \A j \in Server: Len(history'[i]) >= Len(history'[j])                       
+                                    IN Len(history'[s])
+                                ELSE recorder["nTransaction"])
+RecorderSetMaxEpoch(pc)       == ("maxEpoch" :> 
+                                IF pc[1] = "LeaderProcessFOLLOWERINFO" THEN
+                                    LET s == CHOOSE i \in Server:
+                                        \A j \in Server: acceptedEpoch'[i] >= acceptedEpoch'[j]
+                                    IN acceptedEpoch'[s]
+                                ELSE recorder["maxEpoch"])
+RecorderSetRequests(pc)       == ("nClientRequest" :>
+                                IF pc[1] = "LeaderProcessRequest" THEN
+                                    recorder["nClientRequest"] + 1
+                                ELSE recorder["nClientRequest"] )
+RecorderSetPartition(pc)      == ("nPartition" :> 
+                                IF pc[1] = "PartitionStart" THEN recorder["nPartition"] + 1
+                                                            ELSE recorder["nPartition"] )  
+RecorderSetPc(pc)      == ("pc" :> pc)
+RecorderSetFailure(pc) == CASE pc[1] = "Timeout"         -> RecorderIncTimeout @@ RecorderGetCrash
+                          []   pc[1] = "LeaderTimeout"   -> RecorderIncTimeout @@ RecorderGetCrash 
+                          []   pc[1] = "FollowerTimeout" -> RecorderIncTimeout @@ RecorderGetCrash 
+                          []   pc[1] = "PartitionStart"  -> IF IsLooking(pc[2]) 
+                                                            THEN RecorderGetTimeout @@ RecorderGetCrash
+                                                            ELSE RecorderIncTimeout @@ RecorderGetCrash
+                          []   pc[1] = "NodeCrash"       -> IF IsLooking(pc[2]) 
+                                                            THEN RecorderGetTimeout @@ RecorderIncCrash 
+                                                            ELSE RecorderIncTimeout @@ RecorderIncCrash 
+                          []   OTHER                     -> RecorderGetTimeout @@ RecorderGetCrash
+
+UpdateRecorder(pc) == recorder' = RecorderSetPartition(pc)
+                                  @@  RecorderSetFailure(pc)  @@ RecorderSetTransactionNum(pc)
+                                  @@ RecorderSetMaxEpoch(pc)  @@ RecorderSetPc(pc) 
+                                  @@ RecorderSetRequests(pc)  @@ recorder
+UnchangeRecorder   == UNCHANGED recorder
+
+CheckParameterHelper(n, p, Comp(_,_)) == IF p \in DOMAIN Parameters 
+                                         THEN Comp(n, Parameters[p])
+                                         ELSE TRUE
+CheckParameterLimit(n, p) == CheckParameterHelper(n, p, LAMBDA i, j: i < j)
+
+CheckTimeout        == CheckParameterLimit(recorder.nTimeout,     "MaxTimeoutFailures")
+CheckTransactionNum == CheckParameterLimit(recorder.nTransaction, "MaxTransactionNum")
+CheckEpoch          == CheckParameterLimit(recorder.maxEpoch,     "MaxEpoch")
+CheckPartition      == /\ CheckTimeout
+                       /\ CheckParameterLimit(recorder.nPartition,   "MaxPartitions")
+CheckCrash(i)       == /\ \/ IsLooking(i)
+                          \/ /\ ~IsLooking(i)
+                             /\ CheckTimeout
+                       /\ CheckParameterLimit(recorder.nCrash,    "MaxCrashes")
+
+CheckStateConstraints == CheckTimeout /\ CheckTransactionNum /\ CheckEpoch
+-----------------------------------------------------------------------------
+\* Actions about network
+PendingFOLLOWERINFO(i, j) == /\ msgs[j][i] /= << >>
+                             /\ msgs[j][i][1].mtype = FOLLOWERINFO
+PendingLEADERINFO(i, j)   == /\ msgs[j][i] /= << >>
+                             /\ msgs[j][i][1].mtype = LEADERINFO
+PendingACKEPOCH(i, j)     == /\ msgs[j][i] /= << >>
+                             /\ msgs[j][i][1].mtype = ACKEPOCH
+PendingNEWLEADER(i, j)    == /\ msgs[j][i] /= << >>
+                             /\ msgs[j][i][1].mtype = NEWLEADER
+PendingACKLD(i, j)        == /\ msgs[j][i] /= << >>
+                             /\ msgs[j][i][1].mtype = ACKLD
+PendingUPTODATE(i, j)     == /\ msgs[j][i] /= << >>
+                             /\ msgs[j][i][1].mtype = UPTODATE
+PendingPROPOSAL(i, j)     == /\ msgs[j][i] /= << >>
+                             /\ msgs[j][i][1].mtype = PROPOSAL
+PendingACK(i, j)          == /\ msgs[j][i] /= << >>
+                             /\ msgs[j][i][1].mtype = ACK
+PendingCOMMIT(i, j)       == /\ msgs[j][i] /= << >>
+                             /\ msgs[j][i][1].mtype = COMMIT
+\* Add a message to msgs - add a message m to msgs.
+Send(i, j, m) == msgs' = [msgs EXCEPT ![i][j] = Append(msgs[i][j], m)]
+SendPackets(i, j, ms) == msgs' = [msgs EXCEPT ![i][j] = msgs[i][j] \o ms ]
+DiscardAndSendPackets(i, j, ms) == msgs' = [msgs EXCEPT ![j][i] = Tail(msgs[j][i]), 
+                                                ![i][j] = msgs[i][j] \o ms ]
+\* Remove a message from msgs - discard head of msgs.
+Discard(i, j) == msgs' = IF msgs[i][j] /= << >> THEN [msgs EXCEPT ![i][j] = Tail(msgs[i][j])]
+                                                ELSE msgs
+\* Leader broadcasts a message(PROPOSAL/COMMIT) to all other servers in forwardingFollowers.
+Broadcast(i, m) == msgs' = [msgs EXCEPT ![i] = [v \in Server |-> IF /\ v \in forwarding[i]
+                                                                    /\ v /= i
+                                                                 THEN Append(msgs[i][v], m)
+                                                                 ELSE msgs[i][v]]]                                                           
+DiscardAndBroadcast(i, j, m) ==
+        msgs' = [msgs EXCEPT ![j][i] = Tail(msgs[j][i]),
+                             ![i] = [v \in Server |-> IF /\ v \in forwarding[i]
+                                                         /\ v /= i
+                                                      THEN Append(msgs[i][v], m)
+                                                      ELSE msgs[i][v]]]            
+\* Leader broadcasts LEADERINFO to all other servers in connectingFollowers.
+DiscardAndBroadcastLEADERINFO(i, j, m) ==
+        LET new_connecting_quorum == {c \in connecting'[i]: c.connected = TRUE }
+            new_sid_connecting == {c.sid: c \in new_connecting_quorum }
+        IN 
+        msgs' = [msgs EXCEPT ![j][i] = Tail(msgs[j][i]),
+                             ![i] = [v \in Server |-> IF /\ v \in new_sid_connecting
+                                                         /\ v \in learners[i] 
+                                                         /\ v /= i
+                                                      THEN Append(msgs[i][v], m)
+                                                      ELSE msgs[i][v] ] ]
+\* Leader broadcasts UPTODATE to all other servers in newLeaderProposal.
+DiscardAndBroadcastUPTODATE(i, j, m) ==
+        LET new_ackldRecv_quorum == {a \in ackldRecv'[i]: a.connected = TRUE }
+            new_sid_ackldRecv == {a.sid: a \in new_ackldRecv_quorum}
+        IN
+        msgs' = [msgs EXCEPT ![j][i] = Tail(msgs[j][i]),
+                             ![i] = [v \in Server |-> IF /\ v \in new_sid_ackldRecv
+                                                         /\ v \in learners[i] 
+                                                         /\ v /= i
+                                                      THEN Append(msgs[i][v], m)
+                                                      ELSE msgs[i][v] ] ]
+\* Combination of Send and Discard - discard head of msgs[j][i] and add m into msgs.
+Reply(i, j, m) == msgs' = [msgs EXCEPT ![j][i] = Tail(msgs[j][i]),
+                                       ![i][j] = Append(msgs[i][j], m)]
+
+\* Shuffle input buffer.
+Clean(i, j) == msgs' = [msgs EXCEPT ![j][i] = << >>, ![i][j] = << >>]     
+CleanInputBuffer(i) == msgs' = [s \in Server |-> [v \in Server |-> IF v = i THEN << >>
+                                                                   ELSE msgs[s][v]]]  
+CleanInputBufferInCluster(S) == msgs' = [s \in Server |-> 
+                                            [v \in Server |-> IF v \in S THEN << >>
+                                                              ELSE msgs[s][v] ] ]                      
+-----------------------------------------------------------------------------
+\* Define initial values for all variables 
+InitServerVars == /\ InitServerVarsL
+                  /\ zabState      = [s \in Server |-> ELECTION]
+                  /\ acceptedEpoch = [s \in Server |-> 0]
+                  /\ lastCommitted = [s \in Server |-> [ index |-> 0,
+                                                         zxid  |-> <<0, 0>> ] ]
+                  /\ lastSnapshot  = [s \in Server |-> [ index |-> 0,
+                                                         zxid  |-> <<0, 0>> ] ]
+                  /\ initialHistory = [s \in Server |-> << >>]
+
+InitLeaderVars == /\ InitLeaderVarsL
+                  /\ learners         = [s \in Server |-> {}]
+                  /\ connecting       = [s \in Server |-> {}]
+                  /\ electing         = [s \in Server |-> {}]
+                  /\ ackldRecv        = [s \in Server |-> {}]
+                  /\ forwarding       = [s \in Server |-> {}]
+                  /\ tempMaxEpoch     = [s \in Server |-> 0]
+
+InitElectionVars == InitElectionVarsL
+
+InitFollowerVars == /\ connectInfo = [s \in Server |-> [sid |-> NullPoint,
+                                                        syncMode |-> NONE,
+                                                        nlRcv |-> FALSE ] ]
+                    /\ packetsSync = [s \in Server |->
+                                        [ notCommitted |-> << >>,
+                                          committed    |-> << >> ] ]
+
+InitVerifyVars == /\ proposalMsgsLog    = {}
+                  /\ epochLeader        = [e \in 1..MAXEPOCH |-> {} ]
+                  /\ violatedInvariants = [stateInconsistent    |-> FALSE,
+                                           proposalInconsistent |-> FALSE,
+                                           commitInconsistent   |-> FALSE,
+                                           ackInconsistent      |-> FALSE,
+                                           messageIllegal       |-> FALSE ]
+                   
+InitMsgVars == /\ msgs         = [s \in Server |-> [v \in Server |-> << >>] ]
+               /\ electionMsgs = [s \in Server |-> [v \in Server |-> << >>] ]
+
+InitEnvVars == /\ status    = [s \in Server |-> ONLINE ]
+               /\ partition = [s \in Server |-> [v \in Server |-> FALSE] ]
+                
+InitRecorder == recorder = [nTimeout       |-> 0,
+                            nTransaction   |-> 0,
+                            nPartition     |-> 0,
+                            maxEpoch       |-> 0,
+                            nCrash         |-> 0,
+                            pc             |-> <<"Init">>,
+                            nClientRequest |-> 0]
+
+Init == /\ InitServerVars
+        /\ InitLeaderVars
+        /\ InitElectionVars
+        /\ InitFollowerVars
+        /\ InitVerifyVars
+        /\ InitMsgVars
+        /\ InitEnvVars
+        /\ InitRecorder
+-----------------------------------------------------------------------------
+ZabTurnToLeading(i) ==
+        /\ zabState'       = [zabState   EXCEPT ![i] = DISCOVERY]
+        /\ learners'       = [learners   EXCEPT ![i] = {i}]
+        /\ connecting'     = [connecting EXCEPT ![i] = { [ sid       |-> i,
+                                                           connected |-> TRUE ] }]
+        /\ electing'       = [electing   EXCEPT ![i] = { [ sid          |-> i,
+                                                           peerLastZxid |-> <<-1,-1>>,
+                                                           inQuorum     |-> TRUE ] }]
+        /\ ackldRecv'      = [ackldRecv  EXCEPT ![i] = { [ sid       |-> i,
+                                                           connected |-> TRUE ] }]
+        /\ forwarding'     = [forwarding EXCEPT ![i] = {}]
+        /\ initialHistory' = [initialHistory EXCEPT ![i] = history'[i]]
+        /\ tempMaxEpoch'   = [tempMaxEpoch   EXCEPT ![i] = acceptedEpoch[i] + 1]
+
+ZabTurnToFollowing(i) ==
+        /\ zabState' = [zabState EXCEPT ![i] = DISCOVERY]
+        /\ initialHistory' = [initialHistory EXCEPT ![i] = history'[i]]
+        /\ packetsSync' = [packetsSync EXCEPT ![i].notCommitted = << >>, 
+                                              ![i].committed = << >> ]
+          
+(* Fast Leader Election *)
+FLEReceiveNotmsg(i, j) ==
+        /\ IsON(i)
+        /\ ReceiveNotmsg(i, j)
+        /\ UNCHANGED <<zabState, acceptedEpoch, lastCommitted, learners, connecting, 
+                      initialHistory, electing, ackldRecv, forwarding, tempMaxEpoch,
+                      lastSnapshot, followerVars, verifyVars, envVars, msgs>>
+        /\ UpdateRecorder(<<"FLEReceiveNotmsg", i, j>>)
+
+FLENotmsgTimeout(i) ==
+        /\ IsON(i)
+        /\ NotmsgTimeout(i)
+        /\ UNCHANGED <<zabState, acceptedEpoch, lastCommitted, learners, connecting, 
+                       initialHistory, electing, ackldRecv, forwarding, tempMaxEpoch, 
+                       lastSnapshot, followerVars, verifyVars, envVars, msgs>>
+        /\ UpdateRecorder(<<"FLENotmsgTimeout", i>>)
+
+FLEHandleNotmsg(i) ==
+        /\ IsON(i)
+        /\ HandleNotmsg(i)
+        /\ LET newState == state'[i]
+           IN
+           \/ /\ newState = LEADING
+              /\ ZabTurnToLeading(i)
+              /\ UNCHANGED packetsSync
+           \/ /\ newState = FOLLOWING
+              /\ ZabTurnToFollowing(i)
+              /\ UNCHANGED <<learners, connecting, electing, ackldRecv, 
+                            forwarding, tempMaxEpoch>>
+           \/ /\ newState = LOOKING
+              /\ UNCHANGED <<zabState, learners, connecting, electing, ackldRecv,
+                             forwarding, tempMaxEpoch, packetsSync, initialHistory>>
+        /\ UNCHANGED <<lastCommitted, lastSnapshot, acceptedEpoch,
+                       connectInfo, verifyVars, envVars, msgs>>
+        /\ UpdateRecorder(<<"FLEHandleNotmsg", i>>)
+
+\* On the premise that ReceiveVotes.HasQuorums = TRUE, 
+\* corresponding to logic in FastLeaderElection.
+FLEWaitNewNotmsg(i) ==
+        /\ IsON(i)
+        /\ WaitNewNotmsg(i)
+        /\ LET newState == state'[i]
+           IN
+           \/ /\ newState = LEADING
+              /\ ZabTurnToLeading(i)
+              /\ UNCHANGED packetsSync
+           \/ /\ newState = FOLLOWING
+              /\ ZabTurnToFollowing(i)
+              /\ UNCHANGED <<learners, connecting, electing, ackldRecv, forwarding,
+                             tempMaxEpoch>>
+           \/ /\ newState = LOOKING
+              /\ PrintT("Note: New state is LOOKING in FLEWaitNewNotmsgEnd," \o 
+                    " which should not happen.")
+              /\ UNCHANGED <<zabState, learners, connecting, electing, ackldRecv,
+                             forwarding, tempMaxEpoch, initialHistory, packetsSync>>
+        /\ UNCHANGED <<lastCommitted, lastSnapshot, acceptedEpoch,
+                       connectInfo, verifyVars, envVars, msgs>>
+        /\ UpdateRecorder(<<"FLEWaitNewNotmsg", i>>)
+-----------------------------------------------------------------------------
+InitialVotes == [ vote    |-> InitialVote,
+                  round   |-> 0,
+                  state   |-> LOOKING,
+                  version |-> 0 ]
+
+InitialConnectInfo == [sid        |-> NullPoint,
+                       syncMode   |-> NONE,
+                       nlRcv      |-> FALSE ]
+
+\* Equals to for every server in S, performing action ZabTimeout.
+ZabTimeoutInCluster(S) ==
+        /\ state' = [s \in Server |-> IF s \in S THEN LOOKING ELSE state[s] ]
+        /\ lastProcessed' = [s \in Server |-> IF s \in S THEN InitLastProcessed(s)
+                                                         ELSE lastProcessed[s] ]
+        /\ logicalClock' = [s \in Server |-> IF s \in S THEN logicalClock[s] + 1 
+                                                        ELSE logicalClock[s] ]
+        /\ currentVote' = [s \in Server |-> IF s \in S THEN
+                                                       [proposedLeader |-> s,
+                                                        proposedZxid   |-> lastProcessed'[s].zxid,
+                                                        proposedEpoch  |-> currentEpoch[s] ]
+                                                       ELSE currentVote[s] ]
+        /\ receiveVotes' = [s \in Server |-> IF s \in S THEN [v \in Server |-> InitialVotes]
+                                                        ELSE receiveVotes[s] ]
+        /\ outOfElection' = [s \in Server |-> IF s \in S THEN [v \in Server |-> InitialVotes]
+                                                         ELSE outOfElection[s] ]
+        /\ recvQueue' = [s \in Server |-> IF s \in S THEN << [mtype |-> NONE] >> 
+                                                     ELSE recvQueue[s] ]
+        /\ waitNotmsg' = [s \in Server |-> IF s \in S THEN FALSE ELSE waitNotmsg[s] ]
+        /\ leadingVoteSet' = [s \in Server |-> IF s \in S THEN {} ELSE leadingVoteSet[s] ]
+        /\ UNCHANGED <<electionMsgs, currentEpoch, history>>
+        /\ zabState' = [s \in Server |-> IF s \in S THEN ELECTION ELSE zabState[s] ]
+        /\ connectInfo' = [s \in Server |-> IF s \in S THEN InitialConnectInfo
+                                                       ELSE connectInfo[s] ]
+        /\ CleanInputBufferInCluster(S)
+
+(* Describe how a server transitions from LEADING/FOLLOWING to LOOKING.*)
+FollowerShutdown(i) ==
+        /\ ZabTimeout(i)
+        /\ zabState'    = [zabState    EXCEPT ![i] = ELECTION]
+        /\ connectInfo' = [connectInfo EXCEPT ![i] = InitialConnectInfo]
+
+LeaderShutdown(i) ==
+        /\ LET cluster == {i} \union learners[i]
+           IN ZabTimeoutInCluster(cluster)
+        /\ learners'   = [learners   EXCEPT ![i] = {}]
+        /\ forwarding' = [forwarding EXCEPT ![i] = {}]
+
+RemoveElecting(set, sid) ==
+        LET sid_electing == {s.sid: s \in set }
+        IN IF sid \notin sid_electing THEN set
+           ELSE LET info == CHOOSE s \in set: s.sid = sid
+                    new_info == [ sid          |-> sid,
+                                  peerLastZxid |-> <<-1, -1>>,
+                                  inQuorum     |-> info.inQuorum ]
+                IN (set \ {info}) \union {new_info}
+RemoveConnectingOrAckldRecv(set, sid) ==
+        LET sid_set == {s.sid: s \in set}
+        IN IF sid \notin sid_set THEN set
+           ELSE LET info == CHOOSE s \in set: s.sid = sid
+                    new_info == [ sid       |-> sid,
+                                  connected |-> FALSE ]
+                IN (set \ {info}) \union {new_info}
+
+\* See removeLearnerHandler for details.
+RemoveLearner(i, j) ==
+        /\ learners'   = [learners   EXCEPT ![i] = @ \ {j}] 
+        /\ forwarding' = [forwarding EXCEPT ![i] = IF j \in forwarding[i] 
+                                                   THEN @ \ {j} ELSE @ ]
+        /\ electing'   = [electing   EXCEPT ![i] = RemoveElecting(@, j) ]
+        /\ connecting' = [connecting EXCEPT ![i] = RemoveConnectingOrAckldRecv(@, j) ]
+        /\ ackldRecv'  = [ackldRecv  EXCEPT ![i] = RemoveConnectingOrAckldRecv(@, j) ]
+-----------------------------------------------------------------------------
+\* Actions of situation error. 
+PartitionStart(i, j) ==
+        /\ CheckPartition \* test restrictions of partition
+        /\ i /= j
+        /\ IsON(i)
+        /\ IsON(j)
+        /\ \lnot HasPartitioned(i, j)
+        /\ \/ /\ IsLeader(i)   /\ IsMyLearner(i, j)
+              /\ IsFollower(j) /\ IsMyLeader(j, i)
+              /\ LET newLearners == learners[i] \ {j}
+                 IN \/ /\ IsQuorum(newLearners)   \* just remove this learner
+                       /\ RemoveLearner(i, j)
+                       /\ FollowerShutdown(j)
+                       /\ Clean(i ,j)
+                    \/ /\ ~IsQuorum(newLearners)  \* leader switches to looking
+                       /\ LeaderShutdown(i)
+                       /\ UNCHANGED <<connecting, electing, ackldRecv>>
+           \/ /\ IsLooking(i)
+              /\ IsLooking(j)
+              /\ IdCompare(i, j)
+              /\ UNCHANGED <<varsL, zabState, connectInfo, msgs, learners,
+                             forwarding, connecting, electing, ackldRecv>>
+        /\ partition' = [partition EXCEPT ![i][j] = TRUE, ![j][i] = TRUE ]
+        /\ UNCHANGED <<acceptedEpoch, lastCommitted, lastSnapshot, tempMaxEpoch,
+                       initialHistory, verifyVars, packetsSync, status>>
+        /\ UpdateRecorder(<<"PartitionStart", i, j>>)
+
+PartitionRecover(i, j) ==
+        /\ IsON(i)
+        /\ IsON(j)
+        /\ IdCompare(i, j)
+        /\ HasPartitioned(i, j)
+        /\ partition' = [partition EXCEPT ![i][j] = FALSE, ![j][i] = FALSE ]
+        /\ UNCHANGED <<serverVars, leaderVars, electionVars, followerVars,
+                       verifyVars, msgVars, status>>
+        /\ UpdateRecorder(<<"PartitionRecover", i, j>>)
+
+NodeCrash(i) ==
+        /\ CheckCrash(i)
+        /\ IsON(i)
+        /\ status' = [status EXCEPT ![i] = OFFLINE ]
+        /\ \/ /\ IsLooking(i)
+              /\ UNCHANGED <<varsL, zabState, connectInfo, msgs, learners,
+                             forwarding, connecting, electing, ackldRecv>>
+           \/ /\ IsFollower(i)
+              /\ LET connectedWithLeader == HasLeader(i)
+                 IN \/ /\ connectedWithLeader
+                       /\ LET leader == connectInfo[i].sid
+                              newCluster == learners[leader] \ {i}
+                          IN 
+                          \/ /\ IsQuorum(newCluster)
+                             /\ RemoveLearner(leader, i) 
+                             /\ FollowerShutdown(i)
+                             /\ Clean(leader, i)
+                          \/ /\ ~IsQuorum(newCluster)
+                             /\ LeaderShutdown(leader)
+                             /\ UNCHANGED <<electing, connecting, ackldRecv>>
+                    \/ /\ ~connectedWithLeader
+                       /\ FollowerShutdown(i)
+                       /\ CleanInputBuffer({i})
+                       /\ UNCHANGED <<learners, forwarding, connecting, electing, ackldRecv>>
+           \/ /\ IsLeader(i)
+              /\ LeaderShutdown(i)
+              /\ UNCHANGED <<electing, connecting, ackldRecv>>
+        /\ UNCHANGED <<acceptedEpoch, lastCommitted, lastSnapshot, tempMaxEpoch,
+                       initialHistory, verifyVars, packetsSync, partition>>
+        /\ UpdateRecorder(<<"NodeCrash", i>>)
+
+NodeStart(i) ==
+        /\ IsOFF(i)
+        /\ status' = [status EXCEPT ![i] = ONLINE ]
+        /\ lastProcessed' = [lastProcessed  EXCEPT ![i] = InitLastProcessed(i)]
+        /\ lastCommitted' = [lastCommitted  EXCEPT ![i] = lastSnapshot[i]]
+        /\ UNCHANGED <<state, currentEpoch, zabState, acceptedEpoch, history, 
+                       lastSnapshot, initialHistory, leaderVars, electionVars, 
+                       followerVars, verifyVars, msgVars, partition>>
+        /\ UpdateRecorder(<<"NodeStart", i>>)
+-----------------------------------------------------------------------------
+(* Establish connection between leader and follower, containing actions like 
+   addLearnerHandler, findLeader, connectToLeader.*)
+ConnectAndFollowerSendFOLLOWERINFO(i, j) ==
+        /\ IsON(i)     /\ IsON(j)
+        /\ IsLeader(i) /\ \lnot IsMyLearner(i, j)
+        /\ IsFollower(j) /\ HasNoLeader(j) /\ MyVote(j) = i
+        /\ learners'   = [learners   EXCEPT ![i] = learners[i] \union {j}] 
+        /\ connectInfo' = [connectInfo EXCEPT ![j].sid = i]
+        /\ Send(j, i, [ mtype |-> FOLLOWERINFO,
+                        mzxid |-> <<acceptedEpoch[j], 0>> ])  
+        /\ UNCHANGED <<serverVars, electionVars, leadingVoteSet, connecting, 
+                       electing, ackldRecv, forwarding, tempMaxEpoch,
+                       verifyVars, envVars, electionMsgs, packetsSync>>
+        /\ UpdateRecorder(<<"ConnectAndFollowerSendFOLLOWERINFO", i, j>>)
+
+\* waitingForNewEpoch in Leader
+WaitingForNewEpoch(i, set) == (i \in set /\ IsQuorum(set)) = FALSE
+WaitingForNewEpochTurnToFalse(i, set) == /\ i \in set
+                                         /\ IsQuorum(set) 
+
+\* There may exists some follower in connecting but shuts down and
+\* connects again. So when leader broadcasts LEADERINFO, the
+\* follower will receive LEADERINFO, and receive it again after
+\* sending FOLLOWERINFO. So connected makes sure each follower
+\* will only receive LEADERINFO at most once before timeout. 
+UpdateConnectingOrAckldRecv(oldSet, sid) ==
+        LET sid_set == {s.sid: s \in oldSet}
+        IN IF sid \in sid_set
+           THEN LET old_info == CHOOSE info \in oldSet: info.sid = sid
+                    follower_info == [ sid       |-> sid,
+                                       connected |-> TRUE ]
+                IN (oldSet \ {old_info} ) \union {follower_info}
+           ELSE LET follower_info == [ sid       |-> sid,
+                                       connected |-> TRUE ]
+                IN oldSet \union {follower_info}
+
+(* Leader waits for receiving FOLLOWERINFO from a quorum including itself,
+   and chooses a new epoch e' as its own epoch and broadcasts LEADERINFO.
+   See getEpochToPropose in Leader for details. *)
+LeaderProcessFOLLOWERINFO(i, j) ==
+        /\ CheckEpoch  \* test restrictions of max epoch
+        /\ IsON(i)
+        /\ IsLeader(i)
+        /\ PendingFOLLOWERINFO(i, j)
+        /\ LET msg == msgs[j][i][1]
+               infoOk == IsMyLearner(i, j)
+               lastAcceptedEpoch == msg.mzxid[1]
+               sid_connecting == {c.sid: c \in connecting[i]}
+           IN 
+           /\ infoOk
+           /\ \/ \* 1. has not broadcast LEADERINFO 
+                 /\ WaitingForNewEpoch(i, sid_connecting)
+                 /\ \/ /\ zabState[i] = DISCOVERY
+                       /\ UNCHANGED violatedInvariants
+                    \/ /\ zabState[i] /= DISCOVERY
+                       /\ PrintT("Exception: waitingFotNewEpoch true," \o
+                          " while zabState not DISCOVERY.")
+                       /\ violatedInvariants' = [violatedInvariants EXCEPT !.stateInconsistent = TRUE]
+                 /\ tempMaxEpoch' = [tempMaxEpoch EXCEPT ![i] = IF lastAcceptedEpoch >= tempMaxEpoch[i] 
+                                                                THEN lastAcceptedEpoch + 1
+                                                                ELSE @]
+                 /\ connecting'   = [connecting   EXCEPT ![i] = UpdateConnectingOrAckldRecv(@, j) ]
+                 /\ LET new_sid_connecting == {c.sid: c \in connecting'[i]}
+                    IN
+                    \/ /\ WaitingForNewEpochTurnToFalse(i, new_sid_connecting)
+                       /\ acceptedEpoch' = [acceptedEpoch EXCEPT ![i] = tempMaxEpoch'[i]]
+                       /\ LET newLeaderZxid == <<acceptedEpoch'[i], 0>>
+                              m == [ mtype |-> LEADERINFO,
+                                     mzxid |-> newLeaderZxid ]
+                          IN DiscardAndBroadcastLEADERINFO(i, j, m)
+                    \/ /\ ~WaitingForNewEpochTurnToFalse(i, new_sid_connecting)
+                       /\ Discard(j, i)
+                       /\ UNCHANGED acceptedEpoch
+              \/  \* 2. has broadcast LEADERINFO 
+                 /\ ~WaitingForNewEpoch(i, sid_connecting)
+                 /\ Reply(i, j, [ mtype |-> LEADERINFO,
+                                  mzxid |-> <<acceptedEpoch[i], 0>> ] )
+                 /\ UNCHANGED <<tempMaxEpoch, connecting, acceptedEpoch, violatedInvariants>>
+        /\ UNCHANGED <<state, currentEpoch, lastProcessed, zabState, history, lastCommitted, 
+                       followerVars, electionVars, initialHistory, leadingVoteSet, learners, 
+                       electing, ackldRecv, forwarding, proposalMsgsLog, epochLeader, 
+                       lastSnapshot, electionMsgs, envVars>>
+        /\ UpdateRecorder(<<"LeaderProcessFOLLOWERINFO", i, j>>)
+        
+(* Follower receives LEADERINFO. If newEpoch >= acceptedEpoch, then follower 
+   updates acceptedEpoch and sends ACKEPOCH back, containing currentEpoch and
+   lastProcessedZxid. After this, zabState turns to SYNC. 
+   See registerWithLeader in Learner for details.*)
+FollowerProcessLEADERINFO(i, j) ==
+        /\ IsON(i)
+        /\ IsFollower(i)
+        /\ PendingLEADERINFO(i, j)
+        /\ LET msg      == msgs[j][i][1]
+               newEpoch == msg.mzxid[1]
+               infoOk   == IsMyLeader(i, j)
+               epochOk  == newEpoch >= acceptedEpoch[i]
+               stateOk  == zabState[i] = DISCOVERY
+           IN /\ infoOk
+              /\ \/ \* 1. Normal case
+                    /\ epochOk   
+                    /\ \/ /\ stateOk
+                          /\ \/ /\ newEpoch > acceptedEpoch[i]
+                                /\ acceptedEpoch' = [acceptedEpoch EXCEPT ![i] = newEpoch]
+                                /\ LET epochBytes == currentEpoch[i]
+                                       m == [ mtype  |-> ACKEPOCH,
+                                              mzxid  |-> lastProcessed[i].zxid, 
+                                              mepoch |-> epochBytes ] 
+                                   IN Reply(i, j, m)
+                             \/ /\ newEpoch = acceptedEpoch[i]
+                                /\ LET m == [ mtype  |-> ACKEPOCH,
+                                              mzxid  |-> lastProcessed[i].zxid,
+                                              mepoch |-> -1 ]
+                                   IN Reply(i, j, m)
+                                /\ UNCHANGED acceptedEpoch
+                          /\ zabState' = [zabState EXCEPT ![i] = SYNCHRONIZATION]
+                          /\ UNCHANGED violatedInvariants
+                       \/ /\ ~stateOk
+                          /\ PrintT("Exception: Follower receives LEADERINFO," \o
+                             " whileZabState not DISCOVERY.")
+                          /\ violatedInvariants' = [violatedInvariants EXCEPT !.stateInconsistent = TRUE]
+                          /\ Discard(j, i)
+                          /\ UNCHANGED <<acceptedEpoch, zabState>>
+                    /\ UNCHANGED <<varsL, connectInfo, learners, forwarding, electing,
+                                   connecting, ackldRecv>>
+                 \/ \* 2. Abnormal case - go back to election
+                    /\ ~epochOk 
+                    /\ FollowerShutdown(i)
+                    /\ Clean(i, connectInfo[i].sid)
+                    /\ RemoveLearner(connectInfo[i].sid, i)
+                    /\ UNCHANGED <<acceptedEpoch, violatedInvariants>>
+        /\ UNCHANGED <<history, lastCommitted, tempMaxEpoch, initialHistory, lastSnapshot,
+                       proposalMsgsLog, epochLeader, packetsSync, envVars>>
+        /\ UpdateRecorder(<<"FollowerProcessLEADERINFO", i, j>>)
+-----------------------------------------------------------------------------    
+RECURSIVE UpdateAckSidHelper(_,_,_,_)
+UpdateAckSidHelper(his, cur, end, target) ==
+        IF cur > end THEN his
+        ELSE LET curTxn == [ zxid   |-> his[1].zxid,
+                             value  |-> his[1].value,
+                             ackSid |-> IF target \in his[1].ackSid THEN his[1].ackSid
+                                        ELSE his[1].ackSid \union {target},
+                             epoch  |-> his[1].epoch ]
+             IN <<curTxn>> \o UpdateAckSidHelper(Tail(his), cur + 1, end, target)
+
+\* There originally existed one bug in LeaderProcessACK when 
+\* monotonicallyInc = FALSE, and it is we did not add ackSid of 
+\* history in SYNC. So we update ackSid in syncFollower.
+UpdateAckSid(his, lastSeenIndex, target) ==
+        IF Len(his) = 0 \/ lastSeenIndex = 0 THEN his
+        ELSE UpdateAckSidHelper(his, 1, Minimum( { Len(his), lastSeenIndex} ), target)
+
+\* return -1: this zxid appears at least twice; Len(his) + 1: does not exist;
+\* 1 ~ Len(his): exists and appears just once.
+RECURSIVE ZxidToIndexHepler(_,_,_,_)
+ZxidToIndexHepler(his, zxid, cur, appeared) == 
+        IF cur > Len(his) THEN cur  
+        ELSE IF TxnZxidEqual(his[cur], zxid) 
+             THEN CASE appeared = TRUE -> -1
+                  []   OTHER           -> Minimum( { cur, 
+                            ZxidToIndexHepler(his, zxid, cur + 1, TRUE) } ) 
+             ELSE ZxidToIndexHepler(his, zxid, cur + 1, appeared)
+
+ZxidToIndex(his, zxid) == IF ZxidEqual( zxid, <<0, 0>> ) THEN 0
+                          ELSE IF Len(his) = 0 THEN 1
+                               ELSE LET len == Len(his) IN
+                                    IF \E idx \in 1..len: TxnZxidEqual(his[idx], zxid)
+                                    THEN ZxidToIndexHepler(his, zxid, 1, FALSE)
+                                    ELSE len + 1
+
+\* Find index idx which meets: 
+\* history[idx].zxid <= zxid < history[idx + 1].zxid
+RECURSIVE IndexOfZxidHelper(_,_,_,_)
+IndexOfZxidHelper(his, zxid, cur, end) ==
+        IF cur > end THEN end
+        ELSE IF ZxidCompare(his[cur].zxid, zxid) THEN cur - 1
+             ELSE IndexOfZxidHelper(his, zxid, cur + 1, end)
+
+IndexOfZxid(his, zxid) == IF Len(his) = 0 THEN 0
+                          ELSE LET idx == ZxidToIndex(his, zxid)
+                                   len == Len(his)
+                               IN 
+                               IF idx <= len THEN idx
+                               ELSE IndexOfZxidHelper(his, zxid, 1, len)
+
+RECURSIVE queuePackets(_,_,_,_,_)
+queuePackets(queue, his, cur, committed, end) == 
+        IF cur > end THEN queue
+        ELSE CASE cur > committed ->
+                LET m_proposal == [ mtype |-> PROPOSAL, 
+                                    mzxid |-> his[cur].zxid,
+                                    mdata |-> his[cur].value ]
+                IN queuePackets(Append(queue, m_proposal), his, cur + 1, committed, end)
+             []   cur <= committed ->
+                LET m_proposal == [ mtype |-> PROPOSAL, 
+                                    mzxid |-> his[cur].zxid,
+                                    mdata |-> his[cur].value ]
+                    m_commit   == [ mtype |-> COMMIT,
+                                    mzxid |-> his[cur].zxid ]
+                    newQueue   == queue \o <<m_proposal, m_commit>>
+                IN queuePackets(newQueue, his, cur + 1, committed, end)
+
+RECURSIVE setPacketsForChecking(_,_,_,_,_,_)
+setPacketsForChecking(set, src, ep, his, cur, end) ==
+        IF cur > end THEN set
+        ELSE LET m_proposal == [ source |-> src,
+                                 epoch  |-> ep,
+                                 zxid   |-> his[cur].zxid,
+                                 data   |-> his[cur].value ]
+             IN setPacketsForChecking((set \union {m_proposal}), src, ep, his, cur + 1, end)
+
+\* Func lead() calls zk.loadData(), which will call takeSnapshot().
+LastSnapshot(i) == IF zabState[i] = BROADCAST THEN lastSnapshot[i]
+                   ELSE CASE IsLeader(i) -> 
+                            LET lastIndex == Len(history[i])
+                            IN IF lastIndex = 0 THEN [ index |-> 0,
+                                                       zxid  |-> <<0, 0>> ]
+                               ELSE [ index |-> lastIndex,
+                                      zxid  |-> history[i][lastIndex].zxid ]
+                        []   OTHER -> lastSnapshot[i]
+
+\* To compress state space, 
+\* 1. we merge sending SNAP and outputing snapshot buffer into sending SNAP, and
+\* 2. substitute sub sequence of history for snapshot of data tree.
+SerializeSnapshot(i, idx) == IF idx <= 0 THEN << >>
+                             ELSE SubSeq(history[i], 1, idx)
+
+(* See queueCommittedProposals in LearnerHandler and startForwarding in Leader
+   for details. For proposals in committedLog and toBeApplied, send <PROPOSAL,
+   COMMIT>. For proposals in outstandingProposals, send PROPOSAL only. *)
+SendSyncMsgs(i, j, lastSeenZxid, lastSeenIndex, mode, needRemoveHead) ==
+        /\ LET lastCommittedIndex == IF zabState[i] = BROADCAST 
+                                     THEN lastCommitted[i].index
+                                     ELSE Len(history[i])
+               lastProposedIndex  == Len(history[i])
+               queue_origin == IF lastSeenIndex >= lastProposedIndex 
+                               THEN << >>
+                               ELSE queuePackets(<< >>, history[i], 
+                                    lastSeenIndex + 1, lastCommittedIndex,
+                                    lastProposedIndex)
+               set_forChecking == IF lastSeenIndex >= lastProposedIndex 
+                                  THEN {}
+                                  ELSE setPacketsForChecking( { }, i, 
+                                        acceptedEpoch[i], history[i],
+                                        lastSeenIndex + 1, lastProposedIndex)
+               m_trunc == [ mtype |-> TRUNC, mtruncZxid |-> lastSeenZxid ]
+               m_diff  == [ mtype |-> DIFF,  mzxid |-> lastSeenZxid ]
+               m_snap  == [ mtype |-> SNAP,  msnapZxid |-> lastSeenZxid,
+                                             msnapshot |-> SerializeSnapshot(i, lastSeenIndex) ]
+               newLeaderZxid == <<acceptedEpoch[i], 0>>
+               m_newleader == [ mtype |-> NEWLEADER,
+                                mzxid |-> newLeaderZxid ]
+               queue_toSend == CASE mode = TRUNC -> (<<m_trunc>> \o queue_origin) \o <<m_newleader>>
+                               []   mode = DIFF  -> (<<m_diff>>  \o queue_origin) \o <<m_newleader>>
+                               []   mode = SNAP  -> (<<m_snap>>  \o queue_origin) \o <<m_newleader>>
+           IN /\ \/ /\ needRemoveHead
+                    /\ DiscardAndSendPackets(i, j, queue_toSend)
+                 \/ /\ ~needRemoveHead
+                    /\ SendPackets(i, j, queue_toSend)
+              /\ proposalMsgsLog' = proposalMsgsLog \union set_forChecking
+        /\ forwarding' = [forwarding EXCEPT ![i] = @ \union {j} ]
+        /\ \/ /\ mode = TRUNC \/ mode = DIFF 
+              /\ history' = [history EXCEPT ![i] = UpdateAckSid(@, lastSeenIndex, j) ]
+           \/ /\ mode = SNAP
+              /\ UNCHANGED history \* txns before minCommitted don't need to be committed again
+
+(* Leader syncs with follower by sending DIFF/TRUNC/SNAP/PROPOSAL/COMMIT/NEWLEADER.
+   See syncFollower in LearnerHandler for details. *)
+SyncFollower(i, j, peerLastZxid, needRemoveHead) ==
+        LET \* IsPeerNewEpochZxid == peerLastZxid[2] = 0
+            lastProcessedZxid == lastProcessed[i].zxid
+            minCommittedIdx   == lastSnapshot[i].index + 1
+            maxCommittedIdx   == IF zabState[i] = BROADCAST THEN lastCommitted[i].index
+                                 ELSE Len(history[i])
+            committedLogEmpty == minCommittedIdx > maxCommittedIdx
+            minCommittedLog   == IF committedLogEmpty THEN lastProcessedZxid
+                                 ELSE history[i][minCommittedIdx].zxid
+            maxCommittedLog   == IF committedLogEmpty THEN lastProcessedZxid
+                                 ELSE IF maxCommittedIdx = 0 THEN << 0, 0>>
+                                      ELSE history[i][maxCommittedIdx].zxid
+
+            \* Hypothesis: 1. minCommittedLog : txn with index of lastSnapshot + 1
+            \*             2. maxCommittedLog : LastCommitted, to compress state space.
+            \*             3. merge queueCommittedProposals,startForwarding and 
+            \*                sending NEWLEADER into SendSyncMsgs.
+
+        IN \/ \* case1. peerLastZxid = lastProcessedZxid,
+              \*        sned DIFF & StartForwarding(lastProcessedZxid)
+              /\ ZxidEqual(peerLastZxid, lastProcessedZxid)
+              /\ SendSyncMsgs(i, j, peerLastZxid, lastProcessed[i].index, 
+                                    DIFF, needRemoveHead)
+           \/ /\ ~ZxidEqual(peerLastZxid, lastProcessedZxid)
+              /\ \/ \* case2. peerLastZxid > maxCommittedLog,
+                    \*        send TRUNC(maxCommittedLog) & StartForwarding
+                    /\ ZxidCompare(peerLastZxid, maxCommittedLog)
+                    /\ SendSyncMsgs(i, j, maxCommittedLog, maxCommittedIdx, 
+                                          TRUNC, needRemoveHead)
+                 \/ \* case3. minCommittedLog <= peerLastZxid <= maxCommittedLog
+                    /\ ~ZxidCompare(peerLastZxid, maxCommittedLog)
+                    /\ ~ZxidCompare(minCommittedLog, peerLastZxid)
+                    /\ LET lastSeenIndex == ZxidToIndex(history[i], peerLastZxid)
+                           exist == /\ lastSeenIndex >= minCommittedIdx
+                                    /\ lastSeenIndex <= Len(history[i])
+                           lastIndex == IF exist THEN lastSeenIndex
+                                        ELSE IndexOfZxid(history[i], peerLastZxid)
+                           \* Maximum zxid that < peerLastZxid
+                           lastZxid  == IF exist THEN peerLastZxid
+                                        ELSE IF lastIndex = 0 THEN <<0, 0>>
+                                             ELSE history[i][lastIndex].zxid
+                       IN 
+                       \/ \* case 3.1. peerLastZxid exists in committedLog,
+                          \*           DIFF + queueCommittedProposals(peerLastZxid + 1)
+                          \*                + StartForwarding
+                          /\ exist
+                          /\ SendSyncMsgs(i, j, peerLastZxid, lastSeenIndex, 
+                                                DIFF, needRemoveHead)
+                       \/ \* case 3.2. peerLastZxid does not exist in committedLog,
+                          \*           TRUNC(lastZxid) + queueCommittedProposals(lastZxid + 1)
+                          \*                           + StartForwarding
+                          /\ ~exist
+                          /\ SendSyncMsgs(i, j, lastZxid, lastIndex, 
+                                                TRUNC, needRemoveHead)
+                 \/ \* case4. peerLastZxid < minCommittedLog,
+                    \*        send SNAP(lastProcessed) + StartForwarding
+                    /\ ZxidCompare(minCommittedLog, peerLastZxid)
+                    /\ SendSyncMsgs(i, j, lastProcessedZxid, maxCommittedIdx,
+                                          SNAP, needRemoveHead)
+
+\* compare state summary of two servers
+IsMoreRecentThan(ss1, ss2) == \/ ss1.currentEpoch > ss2.currentEpoch
+                              \/ /\ ss1.currentEpoch = ss2.currentEpoch
+                                 /\ ZxidCompare(ss1.lastZxid, ss2.lastZxid)
+
+\* electionFinished in Leader
+ElectionFinished(i, set) == /\ i \in set
+                            /\ IsQuorum(set)
+
+\* There may exist some follower shuts down and connects again, while
+\* it has sent ACKEPOCH or updated currentEpoch last time. This means
+\* sid of this follower has existed in elecingFollower but its info 
+\* is old. So we need to make sure each sid in electingFollower is 
+\* unique and latest(newest).
+UpdateElecting(oldSet, sid, peerLastZxid, inQuorum) ==
+        LET sid_electing == {s.sid: s \in oldSet }
+        IN IF sid \in sid_electing 
+           THEN LET old_info == CHOOSE info \in oldSet : info.sid = sid
+                    follower_info == 
+                             [ sid          |-> sid,
+                               peerLastZxid |-> peerLastZxid,
+                               inQuorum     |-> (inQuorum \/ old_info.inQuorum) ]
+                IN (oldSet \ {old_info} ) \union {follower_info}
+           ELSE LET follower_info == 
+                             [ sid          |-> sid,
+                               peerLastZxid |-> peerLastZxid,
+                               inQuorum     |-> inQuorum ]
+                IN oldSet \union {follower_info}
+
+LeaderTurnToSynchronization(i) ==
+        /\ currentEpoch' = [currentEpoch EXCEPT ![i] = acceptedEpoch[i]]
+        /\ zabState'     = [zabState     EXCEPT ![i] = SYNCHRONIZATION]
+
+(* Leader waits for receiving ACKEPOPCH from a quorum, and check whether it has most recent
+   state summary from them. After this, leader's zabState turns to SYNCHRONIZATION.
+   See waitForEpochAck in Leader for details. *)
+LeaderProcessACKEPOCH(i, j) ==
+        /\ IsON(i)
+        /\ IsLeader(i)
+        /\ PendingACKEPOCH(i, j)
+        /\ LET msg == msgs[j][i][1]
+               infoOk == IsMyLearner(i, j)           
+               leaderStateSummary   == [ currentEpoch |-> currentEpoch[i], 
+                                         lastZxid     |-> lastProcessed[i].zxid ]
+               followerStateSummary == [ currentEpoch |-> msg.mepoch,  
+                                         lastZxid     |-> msg.mzxid ]
+               logOk == \* whether follower is no more up-to-date than leader
+                        ~IsMoreRecentThan(followerStateSummary, leaderStateSummary)
+               electing_quorum == {e \in electing[i]: e.inQuorum = TRUE }
+               sid_electing == {s.sid: s \in electing_quorum }
+           IN /\ infoOk
+              /\ \/ \* electionFinished = true, jump ouf of waitForEpochAck. 
+                    \* Different from code, here we still need to record info
+                    \* into electing, to help us perform syncFollower afterwards.
+                    \* Since electing already meets quorum, it does not break
+                    \* consistency between code and spec.
+                    /\ ElectionFinished(i, sid_electing)
+                    /\ electing' = [electing EXCEPT ![i] = UpdateElecting(@, j, msg.mzxid, FALSE) ]
+                    /\ Discard(j, i)
+                    /\ UNCHANGED <<varsL, zabState, forwarding, connectInfo, 
+                                   learners, epochLeader, violatedInvariants>>
+                 \/ /\ ~ElectionFinished(i, sid_electing)
+                    /\ \/ /\ zabState[i] = DISCOVERY
+                          /\ UNCHANGED violatedInvariants
+                       \/ /\ zabState[i] /= DISCOVERY
+                          /\ PrintT("Exception: electionFinished false," \o
+                             " while zabState not DISCOVERY.")
+                          /\ violatedInvariants' = [violatedInvariants EXCEPT 
+                                                    !.stateInconsistent = TRUE]
+                    /\ \/ /\ followerStateSummary.currentEpoch = -1
+                          /\ electing' = [electing EXCEPT ![i] = UpdateElecting(@, j, 
+                                                                msg.mzxid, FALSE) ]
+                          /\ Discard(j, i)
+                          /\ UNCHANGED <<varsL, zabState, forwarding, connectInfo, 
+                                         learners, epochLeader>>
+                       \/ /\ followerStateSummary.currentEpoch > -1
+                          /\ \/ \* normal follower 
+                                /\ logOk
+                                /\ electing' = [electing EXCEPT ![i] = 
+                                            UpdateElecting(@, j, msg.mzxid, TRUE) ]
+                                /\ LET new_electing_quorum == {e \in electing'[i]: e.inQuorum = TRUE }
+                                       new_sid_electing == {s.sid: s \in new_electing_quorum }
+                                   IN 
+                                   \/ \* electionFinished = true, jump out of waitForEpochAck,
+                                      \* update currentEpoch and zabState.
+                                      /\ ElectionFinished(i, new_sid_electing) 
+                                      /\ LeaderTurnToSynchronization(i)
+                                      /\ LET newLeaderEpoch == acceptedEpoch[i]
+                                         IN epochLeader' = [epochLeader EXCEPT ![newLeaderEpoch]
+                                                = @ \union {i} ] \* for checking invariants
+                                   \/ \* there still exists electionFinished = false.
+                                      /\ ~ElectionFinished(i, new_sid_electing)
+                                      /\ UNCHANGED <<currentEpoch, zabState, epochLeader>>
+                                /\ Discard(j, i)
+                                /\ UNCHANGED <<state, lastProcessed, electionVars, leadingVoteSet,
+                                               electionMsgs, connectInfo, learners, history, forwarding>>
+                             \/ \* Exists follower more recent than leader
+                                /\ ~logOk 
+                                /\ LeaderShutdown(i)
+                                /\ UNCHANGED <<electing, epochLeader>>
+        /\ UNCHANGED <<acceptedEpoch, lastCommitted, lastSnapshot, connecting, ackldRecv,
+                       tempMaxEpoch, initialHistory, packetsSync, proposalMsgsLog, envVars>>
+        /\ UpdateRecorder(<<"LeaderProcessACKEPOCH", i, j>>)
+
+\* Strip syncFollower from LeaderProcessACKEPOCH.
+\* Only when electionFinished = true and there exists some
+\* learnerHandler has not perform syncFollower, this 
+\* action will be called.
+LeaderSyncFollower(i, j) ==
+        /\ IsON(i)
+        /\ IsLeader(i)
+        /\ LET electing_quorum == {e \in electing[i]: e.inQuorum = TRUE }
+               electionFinished == ElectionFinished(i, {s.sid: s \in electing_quorum } )
+               toSync == {s \in electing[i] : /\ ~ZxidEqual( s.peerLastZxid, <<-1, -1>>)
+                                              /\ s.sid \in learners[i] }
+               canSync == toSync /= {}
+           IN
+           /\ electionFinished
+           /\ canSync
+           /\ \E s \in toSync: s.sid = j
+           /\ LET chosen == CHOOSE s \in toSync: s.sid = j
+                  newChosen == [ sid          |-> chosen.sid,
+                                 peerLastZxid |-> <<-1, -1>>, \* <<-1,-1>> means has handled.
+                                 inQuorum     |-> chosen.inQuorum ] 
+              IN /\ SyncFollower(i, chosen.sid, chosen.peerLastZxid, FALSE)
+                 /\ electing' = [electing EXCEPT ![i] = (@ \ {chosen}) \union {newChosen} ]
+        /\ UNCHANGED <<state, currentEpoch, lastProcessed, zabState, acceptedEpoch, 
+                    lastCommitted, initialHistory, electionVars, leadingVoteSet,
+                    learners, connecting, ackldRecv, tempMaxEpoch, followerVars, 
+                    lastSnapshot, epochLeader, violatedInvariants, electionMsgs, envVars>>
+        /\ UpdateRecorder(<<"LeaderSyncFollower", i, j>>)
+
+TruncateLog(his, index) == IF index <= 0 THEN << >>
+                           ELSE SubSeq(his, 1, index)
+
+(* Follower receives DIFF/TRUNC, and then may receives PROPOSAL,COMMIT,NEWLEADER,
+   and UPTODATE. See syncWithLeader in Learner for details. *)
+FollowerProcessSyncMessage(i, j) ==
+        /\ IsON(i)
+        /\ IsFollower(i)
+        /\ msgs[j][i] /= << >>
+        /\ \/ msgs[j][i][1].mtype = DIFF 
+           \/ msgs[j][i][1].mtype = TRUNC
+           \/ msgs[j][i][1].mtype = SNAP
+        /\ LET msg == msgs[j][i][1]
+               infoOk == IsMyLeader(i, j)
+               stateOk == zabState[i] = SYNCHRONIZATION
+           IN /\ infoOk
+              /\ \/ \* Follower should receive packets in SYNC.
+                    /\ ~stateOk
+                    /\ PrintT("Exception: Follower receives DIFF/TRUNC/SNAP," \o
+                             " whileZabState not SYNCHRONIZATION.")
+                    /\ violatedInvariants' = [violatedInvariants EXCEPT !.stateInconsistent = TRUE]
+                    /\ UNCHANGED <<history, initialHistory, lastProcessed, lastCommitted, connectInfo>>
+                 \/ /\ stateOk
+                    /\ \/ /\ msg.mtype = DIFF
+                          /\ connectInfo' = [connectInfo EXCEPT ![i].syncMode = DIFF]         
+                          /\ UNCHANGED <<history, initialHistory, lastProcessed, lastCommitted,
+                                    violatedInvariants>>
+                       \/ /\ msg.mtype = TRUNC
+                          /\ connectInfo' = [connectInfo EXCEPT ![i].syncMode = TRUNC]
+                          /\ LET truncZxid  == msg.mtruncZxid
+                                 truncIndex == ZxidToIndex(history[i], truncZxid)
+                                 truncOk    == /\ truncIndex >= lastCommitted[i].index
+                                               /\ truncIndex <= Len(history[i])
+                             IN
+                             \/ /\ ~truncOk
+                                /\ PrintT("Exception: TRUNC error.")
+                                /\ violatedInvariants' = [violatedInvariants EXCEPT 
+                                                !.proposalInconsistent = TRUE]
+                                /\ UNCHANGED <<history, initialHistory, lastProcessed, lastCommitted>>
+                             \/ /\ truncOk
+                                /\ history' = [history EXCEPT 
+                                                    ![i] = TruncateLog(history[i], truncIndex)]
+                                /\ initialHistory' = [initialHistory EXCEPT ![i] = history'[i]]
+                                /\ lastProcessed' = [lastProcessed EXCEPT 
+                                                    ![i] = [ index |-> truncIndex,
+                                                             zxid  |-> truncZxid] ]
+                                /\ lastCommitted' = [lastCommitted EXCEPT 
+                                                    ![i] = [ index |-> truncIndex,
+                                                             zxid  |-> truncZxid] ]
+                                /\ UNCHANGED violatedInvariants
+                       \/ /\ msg.mtype = SNAP
+                          /\ connectInfo' = [connectInfo EXCEPT ![i].syncMode = SNAP]
+                          /\ history' = [history EXCEPT ![i] = msg.msnapshot]
+                          /\ initialHistory' = [initialHistory EXCEPT ![i] = history'[i]]
+                          /\ lastProcessed' = [lastProcessed EXCEPT 
+                                                    ![i] = [ index |-> Len(history'[i]),
+                                                             zxid  |-> msg.msnapZxid] ]
+                          /\ lastCommitted' = [lastCommitted EXCEPT 
+                                                    ![i] = [ index |-> Len(history'[i]),
+                                                             zxid  |-> msg.msnapZxid] ]
+                          /\ UNCHANGED violatedInvariants
+        /\ Discard(j, i)
+        /\ UNCHANGED <<state, currentEpoch, zabState, acceptedEpoch, electionVars,
+                       leaderVars, tempMaxEpoch, packetsSync, lastSnapshot,
+                       proposalMsgsLog, epochLeader, electionMsgs, envVars>>
+        /\ UpdateRecorder(<<"FollowerProcessSyncMessage", i, j>>)
+
+\* See variable snapshotNeeded in Learner for details.
+SnapshotNeeded(i) == \/ connectInfo[i].syncMode = TRUNC
+                     \/ connectInfo[i].syncMode = SNAP
+
+\* See variable writeToTxnLog in Learner for details.
+WriteToTxnLog(i) == IF \/ connectInfo[i].syncMode = DIFF
+                       \/ connectInfo[i].nlRcv = TRUE
+                    THEN TRUE ELSE FALSE
+
+\* See lastProposed in Leader for details.
+LastProposed(i) == IF Len(history[i]) = 0 THEN [ index |-> 0, 
+                                                 zxid  |-> <<0, 0>> ]
+                   ELSE
+                   LET lastIndex == Len(history[i])
+                       entry     == history[i][lastIndex]
+                   IN [ index |-> lastIndex,
+                        zxid  |-> entry.zxid ]
+
+\* See lastQueued in Learner for details.
+LastQueued(i) == IF ~IsFollower(i) \/ zabState[i] /= SYNCHRONIZATION 
+                 THEN LastProposed(i)
+                 ELSE \* condition: IsFollower(i) /\ zabState = SYNCHRONIZATION
+                      LET packetsInSync == packetsSync[i].notCommitted
+                          lenSync  == Len(packetsInSync)
+                          totalLen == Len(history[i]) + lenSync
+                      IN IF lenSync = 0 THEN LastProposed(i)
+                         ELSE [ index |-> totalLen,
+                                zxid  |-> packetsInSync[lenSync].zxid ]
+
+IsNextZxid(curZxid, nextZxid) ==
+            \/ \* first PROPOSAL in this epoch
+               /\ nextZxid[2] = 1
+               /\ curZxid[1] < nextZxid[1]
+            \/ \* not first PROPOSAL in this epoch
+               /\ nextZxid[2] > 1
+               /\ curZxid[1] = nextZxid[1]
+               /\ curZxid[2] + 1 = nextZxid[2]
+
+FollowerProcessPROPOSALInSync(i, j) ==
+        /\ IsON(i)
+        /\ IsFollower(i)
+        /\ PendingPROPOSAL(i, j)
+        /\ zabState[i] = SYNCHRONIZATION
+        /\ LET msg == msgs[j][i][1]
+               infoOk == IsMyLeader(i, j)
+               isNext == IsNextZxid(LastQueued(i).zxid, msg.mzxid)
+               newTxn == [ zxid   |-> msg.mzxid,
+                           value  |-> msg.mdata,
+                           ackSid |-> {},    \* follower do not consider ackSid
+                           epoch  |-> acceptedEpoch[i] ] \* epoch of this round
+           IN /\ infoOk
+              /\ \/ /\ isNext
+                    /\ packetsSync' = [packetsSync EXCEPT ![i].notCommitted 
+                                = Append(packetsSync[i].notCommitted, newTxn) ]
+                 \/ /\ ~isNext
+                    /\ PrintT("Warn: Follower receives PROPOSAL," \o
+                        " while zxid != lastQueued + 1.")
+                    /\ UNCHANGED packetsSync
+        \* logRequest -> SyncRequestProcessor -> SendAckRequestProcessor -> reply ack
+        \* So here we do not need to send ack to leader.
+        /\ Discard(j, i)
+        /\ UNCHANGED <<serverVars, electionVars, leaderVars, connectInfo,
+                       verifyVars, electionMsgs, envVars>>
+        /\ UpdateRecorder(<<"FollowerProcessPROPOSALInSync", i, j>>)
+
+RECURSIVE IndexOfFirstTxnWithEpoch(_,_,_,_)
+IndexOfFirstTxnWithEpoch(his, epoch, cur, end) == 
+            IF cur > end THEN cur 
+            ELSE IF his[cur].epoch = epoch THEN cur
+                 ELSE IndexOfFirstTxnWithEpoch(his, epoch, cur + 1, end)
+
+LastCommitted(i) == IF zabState[i] = BROADCAST THEN lastCommitted[i]
+                    ELSE CASE IsLeader(i)   -> 
+                            LET lastInitialIndex == Len(initialHistory[i])
+                            IN IF lastInitialIndex = 0 THEN [ index |-> 0,
+                                                              zxid  |-> <<0, 0>> ]
+                               ELSE [ index |-> lastInitialIndex,
+                                      zxid  |-> history[i][lastInitialIndex].zxid ]
+                         []   IsFollower(i) ->
+                            LET completeHis == history[i] \o packetsSync[i].notCommitted
+                                packetsCommitted == packetsSync[i].committed
+                                lenCommitted == Len(packetsCommitted)
+                            IN IF lenCommitted = 0 \* return last one in history
+                               THEN LET lastIndex == Len(history[i])
+                                        lastInitialIndex == Len(initialHistory[i])
+                                    IN IF lastIndex = lastInitialIndex
+                                       THEN IF lastIndex = 0
+                                            THEN [ index |-> 0,
+                                                   zxid  |-> <<0, 0>> ]
+                                            ELSE [ index |-> lastIndex ,
+                                                   zxid  |-> history[lastIndex].zxid ]
+                                       ELSE IF lastInitialIndex < lastCommitted[i].index
+                                            THEN lastCommitted[i]
+                                            ELSE IF lastInitialIndex = 0
+                                                 THEN [ index |-> 0,
+                                                        zxid  |-> <<0, 0>> ]
+                                                 ELSE [ index |-> lastInitialIndex,
+                                                        zxid  |-> history[lastInitialIndex].zxid ]
+                               ELSE                \* return tail of packetsCommitted
+                                    LET committedIndex == ZxidToIndex(completeHis, 
+                                                     packetsCommitted[lenCommitted] )
+                                    IN [ index |-> committedIndex, 
+                                         zxid  |-> packetsCommitted[lenCommitted] ]
+                         []   OTHER -> lastCommitted[i]
+
+TxnWithIndex(i, idx) == IF ~IsFollower(i) \/ zabState[i] /= SYNCHRONIZATION 
+                        THEN history[i][idx]
+                        ELSE LET completeHis == history[i] \o packetsSync[i].notCommitted
+                             IN completeHis[idx]
+
+(* To simplify specification, we assume snapshotNeeded = false and 
+   writeToTxnLog = true. So here we just call packetsCommitted.add. *)
+FollowerProcessCOMMITInSync(i, j) ==
+        /\ IsON(i)
+        /\ IsFollower(i)
+        /\ PendingCOMMIT(i, j)
+        /\ zabState[i] = SYNCHRONIZATION
+        /\ LET msg == msgs[j][i][1]
+               infoOk == IsMyLeader(i, j)
+               committedIndex == LastCommitted(i).index + 1
+               exist == /\ committedIndex <= LastQueued(i).index
+                        /\ IsNextZxid(LastCommitted(i).zxid, msg.mzxid)
+               match == ZxidEqual(msg.mzxid, TxnWithIndex(i, committedIndex).zxid )
+           IN /\ infoOk 
+              /\ \/ /\ exist
+                    /\ \/ /\ match
+                          /\ LET writeToTxnLog == WriteToTxnLog(i)
+                             IN
+                             \/ /\ ~writeToTxnLog \* zk.processTxn() & packetsNotCommitted.remove()
+                                /\ LET committedTxn == packetsSync[i].notCommitted[1]
+                                   IN 
+                                   /\ history' = [ history EXCEPT ![i] 
+                                               = Append(@, committedTxn)]
+                                   /\ lastCommitted' = [ lastCommitted EXCEPT ![i]
+                                                     = [index |-> Len(history'[i]),
+                                                        zxid  |-> committedTxn.zxid ] ]
+                                   /\ lastProcessed' = [ lastProcessed EXCEPT ![i]
+                                                     = lastCommitted'[i] ]
+                                   /\ packetsSync' = [ packetsSync EXCEPT ![i].notCommitted
+                                                   = Tail(@) ]
+                             \/ /\ writeToTxnLog  \* packetsCommitted.add()
+                                /\ packetsSync' = [ packetsSync EXCEPT ![i].committed
+                                                = Append(packetsSync[i].committed, msg.mzxid) ]
+                                /\ UNCHANGED <<history, lastCommitted, lastProcessed>>
+                          /\ UNCHANGED violatedInvariants
+                       \/ /\ ~match
+                          /\ PrintT("Warn: Follower receives COMMIT," \o
+                               " but zxid not the next committed zxid in COMMIT.")
+                          /\ violatedInvariants' = [violatedInvariants EXCEPT 
+                                    !.commitInconsistent = TRUE ]
+                          /\ UNCHANGED <<history, lastCommitted, lastProcessed, packetsSync>>
+                 \/ /\ ~exist
+                    /\ PrintT("Warn: Follower receives COMMIT," \o
+                         " but no packets with its zxid exists.")
+                    /\ violatedInvariants' = [violatedInvariants EXCEPT 
+                                !.commitInconsistent = TRUE ]
+                    /\ UNCHANGED <<history, lastCommitted, lastProcessed, packetsSync>>
+        /\ Discard(j, i)
+        /\ UNCHANGED <<state, currentEpoch, zabState, acceptedEpoch,
+                       lastSnapshot, initialHistory, electionVars, leaderVars,
+                       connectInfo, epochLeader, proposalMsgsLog, electionMsgs, envVars>>
+        /\ UpdateRecorder(<<"FollowerProcessCOMMITInSync", i, j>>)
+
+\* Assuming that everytime committing two txns, node takes snapshot.
+ShouldSnapshot(i) == lastCommitted[i].index - lastSnapshot[i].index >= 2
+
+(* There are mainly three places where calling takeSnapshot():
+   1. zk.loadData() in lead() when node becomes leader;
+   2. syncRequestProcessor.run() tells when to snapshot;
+   3. node processing NEWLEADER in learner.syncWithLeader();
+ *)
+ TakeSnapshot(i) == LET snapOk == lastSnapshot[i].index <= lastCommitted[i].index
+                    IN \/ /\ snapOk
+                          /\ lastSnapshot' = [ lastSnapshot EXCEPT ![i] = lastCommitted[i] ]
+                          /\ UNCHANGED violatedInvariants
+                       \/ /\ ~snapOk
+                          /\ PrintT("Exception: index of snapshot greater than" \o
+                                    "index of committed.")
+                          /\ violatedInvariants' = [violatedInvariants EXCEPT 
+                                                    !.commitInconsistent = TRUE ]
+                          /\ UNCHANGED lastSnapshot
+
+RECURSIVE ACKInBatches(_,_)
+ACKInBatches(queue, packets) ==
+        IF packets = << >> THEN queue
+        ELSE LET head == packets[1]
+                 newPackets == Tail(packets)
+                 m_ack == [ mtype |-> ACK,
+                            mzxid |-> head.zxid ]
+             IN ACKInBatches( Append(queue, m_ack), newPackets )
+
+(* Update currentEpoch, and logRequest every packets in
+   packetsNotCommitted and clear it. As syncProcessor will 
+   be called in logRequest, we have to reply acks here. *)
+FollowerProcessNEWLEADER(i, j) ==
+        /\ IsON(i)
+        /\ IsFollower(i)
+        /\ PendingNEWLEADER(i, j)
+        /\ LET msg == msgs[j][i][1]
+               infoOk == IsMyLeader(i, j)
+               packetsInSync == packetsSync[i].notCommitted
+               m_ackld == [ mtype |-> ACKLD,
+                            mzxid |-> msg.mzxid ]
+               ms_ack  == ACKInBatches( << >>, packetsInSync )
+               queue_toSend == <<m_ackld>> \o ms_ack \* send ACK-NEWLEADER first.
+           IN /\ infoOk
+              /\ currentEpoch' = [currentEpoch EXCEPT ![i] = acceptedEpoch[i] ]
+              /\ history'      = [history      EXCEPT ![i] = @ \o packetsInSync ]
+              /\ packetsSync'  = [packetsSync  EXCEPT ![i].notCommitted = << >> ]
+              /\ connectInfo'  = [connectInfo  EXCEPT ![i].nlRcv = TRUE,
+                                                      ![i].syncMode = NONE ]
+              /\ \/ /\ SnapshotNeeded(i)
+                    /\ TakeSnapshot(i)
+                 \/ /\ ~SnapshotNeeded(i)
+                    /\ UNCHANGED <<lastSnapshot, violatedInvariants>>
+              /\ DiscardAndSendPackets(i, j, queue_toSend)
+        /\ UNCHANGED <<state, lastProcessed, zabState, acceptedEpoch, lastCommitted, 
+                       electionVars, leaderVars, initialHistory,
+                       proposalMsgsLog, epochLeader, electionMsgs, envVars>>
+        /\ UpdateRecorder(<<"FollowerProcessNEWLEADER", i, j>>)
+
+\* quorumFormed in Leader
+QuorumFormed(i, set) == i \in set /\ IsQuorum(set)
+
+UpdateElectionVote(i, epoch) == UpdateProposal(i, currentVote[i].proposedLeader,
+                                    currentVote[i].proposedZxid, epoch)
+
+\* See startZkServer in Leader for details.
+StartZkServer(i) ==
+        LET latest == LastProposed(i)
+        IN /\ lastCommitted' = [lastCommitted EXCEPT ![i] = latest]
+           /\ lastProcessed' = [lastProcessed EXCEPT ![i] = latest]
+           /\ lastSnapshot'  = [lastSnapshot  EXCEPT ![i] = latest]
+           /\ UpdateElectionVote(i, acceptedEpoch[i])
+
+LeaderTurnToBroadcast(i) ==
+        /\ StartZkServer(i)
+        /\ zabState' = [zabState EXCEPT ![i] = BROADCAST]
+
+(* Leader waits for receiving quorum of ACK whose lower bits of zxid is 0, and
+   broadcasts UPTODATE. See waitForNewLeaderAck for details.  *)
+LeaderProcessACKLD(i, j) ==
+        /\ IsON(i)
+        /\ IsLeader(i)
+        /\ PendingACKLD(i, j)
+        /\ LET msg    == msgs[j][i][1]
+               infoOk == IsMyLearner(i, j)
+               match  == ZxidEqual(msg.mzxid, <<acceptedEpoch[i], 0>>)
+               currentZxid == <<acceptedEpoch[i], 0>>
+               m_uptodate == [ mtype |-> UPTODATE,
+                               mzxid |-> currentZxid ] \* not important
+               sid_ackldRecv == {a.sid: a \in ackldRecv[i]}
+           IN /\ infoOk
+              /\ \/ \* just reply UPTODATE.
+                    /\ QuorumFormed(i, sid_ackldRecv)
+                    /\ Reply(i, j, m_uptodate)
+                    /\ UNCHANGED <<ackldRecv, zabState, lastCommitted, lastProcessed,
+                                   lastSnapshot, currentVote, violatedInvariants>>
+                 \/ /\ ~QuorumFormed(i, sid_ackldRecv)
+                    /\ \/ /\ match
+                          /\ ackldRecv' = [ackldRecv EXCEPT ![i] = UpdateConnectingOrAckldRecv(@, j) ]
+                          /\ LET new_sid_ackldRecv == {a.sid: a \in ackldRecv'[i]}
+                             IN
+                             \/ \* jump out of waitForNewLeaderAck, and do startZkServer,
+                                \* setZabState, and reply UPTODATE.
+                                /\ QuorumFormed(i, new_sid_ackldRecv)
+                                /\ LeaderTurnToBroadcast(i)
+                                /\ DiscardAndBroadcastUPTODATE(i, j, m_uptodate)
+                             \/ \* still wait in waitForNewLeaderAck.
+                                /\ ~QuorumFormed(i, new_sid_ackldRecv)
+                                /\ Discard(j, i)
+                                /\ UNCHANGED <<zabState, lastCommitted, lastProcessed,
+                                               lastSnapshot, currentVote>>
+                          /\ UNCHANGED violatedInvariants
+                       \/ /\ ~match
+                          /\ PrintT("Exception: NEWLEADER ACK is from a different epoch. ")
+                          /\ violatedInvariants' = [violatedInvariants EXCEPT 
+                                        !.ackInconsistent = TRUE]
+                          /\ Discard(j, i)
+                          /\ UNCHANGED <<ackldRecv, zabState, lastCommitted, 
+                                         lastSnapshot, lastProcessed, currentVote>>
+        /\ UNCHANGED <<state, currentEpoch, acceptedEpoch, history, logicalClock, receiveVotes, 
+                    outOfElection, recvQueue, waitNotmsg, leadingVoteSet, learners, connecting, 
+                    electing, forwarding, tempMaxEpoch, initialHistory, followerVars, 
+                    proposalMsgsLog, epochLeader, electionMsgs ,envVars>>
+        /\ UpdateRecorder(<<"LeaderProcessACKLD", i, j>>)
+
+TxnsWithPreviousEpoch(i) ==
+            LET completeHis == IF ~IsFollower(i) \/ zabState[i] /= SYNCHRONIZATION 
+                               THEN history[i] 
+                               ELSE history[i] \o packetsSync[i].notCommitted
+                end   == Len(completeHis)
+                first == IndexOfFirstTxnWithEpoch(completeHis, acceptedEpoch[i], 1, end)
+            IN IF first > end THEN completeHis
+               ELSE SubSeq(completeHis, 1, first - 1)
+
+TxnsRcvWithCurEpoch(i) ==
+            LET completeHis == IF ~IsFollower(i) \/ zabState[i] /= SYNCHRONIZATION 
+                               THEN history[i] 
+                               ELSE history[i] \o packetsSync[i].notCommitted
+                end   == Len(completeHis)
+                first == IndexOfFirstTxnWithEpoch(completeHis, acceptedEpoch[i], 1, end)
+            IN IF first > end THEN << >>
+               ELSE SubSeq(completeHis, first, end) \* completeHis[first : end]
+
+\* Txns received in current epoch but not committed.
+\* See pendingTxns in FollowerZooKeeper for details.
+PendingTxns(i) == IF ~IsFollower(i) \/ zabState[i] /= SYNCHRONIZATION 
+                  THEN SubSeq(history[i], lastCommitted[i].index + 1, Len(history[i]))
+                  ELSE LET packetsCommitted == packetsSync[i].committed
+                           completeHis == history[i] \o packetsSync[i].notCommitted
+                       IN IF Len(packetsCommitted) = 0 
+                          THEN SubSeq(completeHis, Len(history[i]) + 1, Len(completeHis))
+                          ELSE SubSeq(completeHis, LastCommitted(i).index + 1, Len(completeHis))
+
+CommittedTxns(i) == IF ~IsFollower(i) \/ zabState[i] /= SYNCHRONIZATION 
+                    THEN SubSeq(history[i], 1, lastCommitted[i].index)
+                    ELSE LET packetsCommitted == packetsSync[i].committed
+                             completeHis == history[i] \o packetsSync[i].notCommitted
+                         IN IF Len(packetsCommitted) = 0 THEN history[i]
+                            ELSE SubSeq( completeHis, 1, LastCommitted(i).index )
+
+\* Each zxid of packetsCommitted equals to zxid of 
+\* corresponding txn in txns.
+RECURSIVE TxnsAndCommittedMatch(_,_)
+TxnsAndCommittedMatch(txns, packetsCommitted) ==
+        LET len1 == Len(txns)
+            len2 == Len(packetsCommitted)
+        IN IF len2 = 0 THEN TRUE 
+           ELSE IF len1 < len2 THEN FALSE 
+                ELSE /\ ZxidEqual(txns[len1].zxid, packetsCommitted[len2])
+                     /\ TxnsAndCommittedMatch( SubSeq(txns, 1, len1 - 1), 
+                                               SubSeq(packetsCommitted, 1, len2 - 1) )
+
+FollowerLogRequestInBatches(i, leader, ms_ack, packetsNotCommitted) ==
+        /\ history' = [history EXCEPT ![i] = @ \o packetsNotCommitted ]
+        /\ DiscardAndSendPackets(i, leader, ms_ack)
+
+\* Since commit will call commitProcessor.commit, which will finally 
+\* update lastProcessed, we update it here atomically.
+FollowerCommitInBatches(i) ==
+        LET committedTxns == CommittedTxns(i)
+            packetsCommitted == packetsSync[i].committed
+            match == TxnsAndCommittedMatch(committedTxns, packetsCommitted)
+        IN 
+        \/ /\ match 
+           /\ lastCommitted' = [lastCommitted EXCEPT ![i] = LastCommitted(i)]
+           /\ lastProcessed' = [lastProcessed EXCEPT ![i] = lastCommitted'[i]]
+           /\ UNCHANGED violatedInvariants
+        \/ /\ ~match
+           /\ PrintT("Warn: Committing zxid withou see txn. /" \o 
+                "Committing zxid != pending txn zxid.")
+           /\ violatedInvariants' = [violatedInvariants EXCEPT 
+                        !.commitInconsistent = TRUE ]
+           /\ UNCHANGED <<lastCommitted, lastProcessed>>
+
+(* Follower jump out of outerLoop here, and log the stuff that came in
+   between snapshot and uptodate, which means calling logRequest and 
+   commit to clear packetsNotCommitted and packetsCommitted. *)
+FollowerProcessUPTODATE(i, j) ==
+        /\ IsON(i)
+        /\ IsFollower(i)
+        /\ PendingUPTODATE(i, j)
+        /\ LET msg == msgs[j][i][1]
+               infoOk == IsMyLeader(i, j)
+               packetsNotCommitted == packetsSync[i].notCommitted
+               ms_ack == ACKInBatches(<< >>, packetsNotCommitted)
+           IN /\ infoOk
+              \* Here we ignore ack of UPTODATE.
+              /\ UpdateElectionVote(i, acceptedEpoch[i])
+              /\ FollowerLogRequestInBatches(i, j, ms_ack, packetsNotCommitted)
+              /\ FollowerCommitInBatches(i)
+              /\ packetsSync' = [packetsSync EXCEPT ![i].notCommitted = << >>,
+                                                    ![i].committed = << >> ]
+              /\ zabState' = [zabState EXCEPT ![i] = BROADCAST ]
+        /\ UNCHANGED <<state, currentEpoch, acceptedEpoch, logicalClock, lastSnapshot,
+                receiveVotes, outOfElection, recvQueue, waitNotmsg, leaderVars, envVars,
+                initialHistory, connectInfo, epochLeader, proposalMsgsLog, electionMsgs>>
+        /\ UpdateRecorder(<<"FollowerProcessUPTODATE", i, j>>)
+-----------------------------------------------------------------------------
+IncZxid(s, zxid) == IF currentEpoch[s] = zxid[1] THEN <<zxid[1], zxid[2] + 1>>
+                    ELSE <<currentEpoch[s], 1>>
+
+(* Leader receives client propose and broadcasts PROPOSAL. See processRequest
+   in ProposalRequestProcessor and propose in Leader for details. Since 
+   prosalProcessor.processRequest -> syncProcessor.processRequest ->
+   ackProcessor.processRequest -> leader.processAck, we initially set 
+   txn.ackSid = {i}, assuming we have done leader.processAck.
+   Note: In production, any server in traffic can receive requests and 
+         forward it to leader if necessary. We choose to let leader be
+         the sole one who can receive write requests, to simplify spec 
+         and keep correctness at the same time.
+*)
+LeaderProcessRequest(i) ==
+        /\ CheckTransactionNum \* test restrictions of transaction num
+        /\ IsON(i)
+        /\ IsLeader(i)
+        /\ zabState[i] = BROADCAST
+        /\ LET inBroadcast == {s \in forwarding[i]: zabState[s] = BROADCAST}
+           IN IsQuorum(inBroadcast)
+        /\ LET request_value == GetRecorder("nClientRequest") \* unique value
+               newTxn == [ zxid   |-> IncZxid(i, LastProposed(i).zxid),
+                           value  |-> request_value, 
+                           ackSid |-> {i}, \* assume we have done leader.processAck
+                           epoch  |-> acceptedEpoch[i] ]
+               m_proposal == [ mtype |-> PROPOSAL,
+                               mzxid |-> newTxn.zxid,
+                               mdata |-> request_value ]
+               m_proposal_for_checking == [ source |-> i,
+                                            epoch  |-> acceptedEpoch[i],
+                                            zxid   |-> newTxn.zxid,
+                                            data   |-> request_value ]
+           IN /\ history' = [history EXCEPT ![i] = Append(@, newTxn) ]
+              /\ \/ /\ ShouldSnapshot(i)
+                    /\ TakeSnapshot(i)
+                 \/ /\ ~ShouldSnapshot(i)
+                    /\ UNCHANGED <<lastSnapshot, violatedInvariants>>
+              /\ Broadcast(i, m_proposal)
+              /\ proposalMsgsLog' = proposalMsgsLog \union {m_proposal_for_checking}
+        /\ UNCHANGED <<state, currentEpoch, lastProcessed, zabState, acceptedEpoch,
+                       lastCommitted, electionVars, leaderVars, followerVars,
+                       initialHistory, epochLeader, electionMsgs, envVars>>
+        /\ UpdateRecorder(<<"LeaderProcessRequest", i>>)
+
+(* Follower processes PROPOSAL in BROADCAST. See processPacket
+   in Follower for details. *)
+FollowerProcessPROPOSAL(i, j) ==
+        /\ IsON(i)
+        /\ IsFollower(i)
+        /\ PendingPROPOSAL(i, j)
+        /\ zabState[i] = BROADCAST
+        /\ LET msg == msgs[j][i][1]
+               infoOk == IsMyLeader(i, j)
+               isNext == IsNextZxid( LastQueued(i).zxid, msg.mzxid)
+               newTxn == [ zxid   |-> msg.mzxid,
+                           value  |-> msg.mdata,
+                           ackSid |-> {},
+                           epoch  |-> acceptedEpoch[i] ]
+               m_ack  == [ mtype |-> ACK,
+                           mzxid |-> msg.mzxid ]
+          IN /\ infoOk
+             /\ \/ /\ isNext
+                   /\ history' = [history EXCEPT ![i] = Append(@, newTxn)]
+                   /\ \/ /\ ShouldSnapshot(i)
+                         /\ TakeSnapshot(i)
+                      \/ /\ ~ShouldSnapshot(i)
+                         /\ UNCHANGED <<lastSnapshot, violatedInvariants>>
+                   /\ Reply(i, j, m_ack)
+                \/ /\ ~isNext
+                   /\ PrintT("Exception: Follower receives PROPOSAL, while" \o 
+                        " the transaction is not the next.")
+                   /\ violatedInvariants' = [violatedInvariants EXCEPT 
+                                !.proposalInconsistent = TRUE]
+                   /\ UNCHANGED <<history, lastSnapshot, msgs>>
+        /\ UNCHANGED <<state, currentEpoch, lastProcessed, zabState, acceptedEpoch,
+                 lastCommitted, electionVars, leaderVars, followerVars, initialHistory,
+                 epochLeader, proposalMsgsLog, electionMsgs, envVars>>
+        /\ UpdateRecorder(<<"FollowerProcessPROPOSAL", i, j>>)
+
+\* See outstandingProposals in Leader
+OutstandingProposals(i) == IF zabState[i] /= BROADCAST THEN << >>
+                           ELSE SubSeq( history[i], lastCommitted[i].index + 1,
+                                 Len(history[i]) )
+
+LastAckIndexFromFollower(i, j) == 
+        LET set_index == {idx \in 1..Len(history[i]): j \in history[i][idx].ackSid }
+        IN Maximum(set_index)
+
+\* See commit in Leader for details.
+LeaderCommit(s, follower, index, zxid) ==
+        /\ lastCommitted' = [lastCommitted EXCEPT ![s] = [ index |-> index,
+                                                           zxid  |-> zxid ] ]
+        /\ LET m_commit == [ mtype |-> COMMIT,
+                             mzxid |-> zxid ]
+           IN DiscardAndBroadcast(s, follower, m_commit)
+
+\* Try to commit one operation, called by LeaderProcessAck.
+\* See tryToCommit in Leader for details.
+\* commitProcessor.commit -> processWrite -> toBeApplied.processRequest
+\* -> finalProcessor.processRequest, finally processTxn will be implemented
+\* and lastProcessed will be updated. So we update it here.
+LeaderTryToCommit(s, index, zxid, newTxn, follower) ==
+        LET allTxnsBeforeCommitted == lastCommitted[s].index >= index - 1
+                    \* Only when all proposals before zxid has been committed,
+                    \* this proposal can be permitted to be committed.
+            hasAllQuorums == IsQuorum(newTxn.ackSid)
+                    \* In order to be committed, a proposal must be accepted
+                    \* by a quorum.
+            ordered == lastCommitted[s].index + 1 = index
+                    \* Commit proposals in order.
+        IN \/ /\ \* Current conditions do not satisfy committing the proposal.
+                 \/ ~allTxnsBeforeCommitted
+                 \/ ~hasAllQuorums
+              /\ Discard(follower, s)
+              /\ UNCHANGED <<violatedInvariants, lastCommitted, lastProcessed>>
+           \/ /\ allTxnsBeforeCommitted
+              /\ hasAllQuorums
+              /\ \/ /\ ~ordered
+                    /\ PrintT("Warn: Committing zxid " \o ToString(zxid) \o " not first.")
+                    /\ violatedInvariants' = [violatedInvariants EXCEPT 
+                            !.commitInconsistent = TRUE]
+                 \/ /\ ordered
+                    /\ UNCHANGED violatedInvariants
+              /\ LeaderCommit(s, follower, index, zxid)
+              /\ lastProcessed' = [lastProcessed EXCEPT ![s] = [ index |-> index,
+                                                                 zxid  |-> zxid ] ]
+
+(* Leader Keeps a count of acks for a particular proposal, and try to
+   commit the proposal. See case Leader.ACK in LearnerHandler,
+   processRequest in AckRequestProcessor, and processAck in Leader for
+   details. *)
+LeaderProcessACK(i, j) ==
+        /\ IsON(i)
+        /\ IsLeader(i)
+        /\ PendingACK(i, j)
+        /\ LET msg == msgs[j][i][1]
+               infoOk == IsMyLearner(i, j)
+               outstanding == LastCommitted(i).index < LastProposed(i).index
+                        \* outstandingProposals not null
+               hasCommitted == ~ZxidCompare( msg.mzxid, LastCommitted(i).zxid)
+                        \* namely, lastCommitted >= zxid
+               index == ZxidToIndex(history[i], msg.mzxid)
+               exist == index >= 1 /\ index <= LastProposed(i).index
+                        \* the proposal exists in history
+               ackIndex == LastAckIndexFromFollower(i, j)
+               monotonicallyInc == \/ ackIndex = -1
+                                   \/ ackIndex + 1 = index
+                        \* TCP makes everytime ackIndex should just increase by 1
+           IN /\ infoOk 
+              /\ \/ /\ exist
+                    /\ monotonicallyInc
+                    /\ LET txn == history[i][index]
+                           txnAfterAddAck == [ zxid   |-> txn.zxid,
+                                               value  |-> txn.value,
+                                               ackSid |-> txn.ackSid \union {j} ,
+                                               epoch  |-> txn.epoch ]   
+                       IN \* p.addAck(sid)
+                       /\ history' = [history EXCEPT ![i][index] = txnAfterAddAck ] 
+                       /\ \/ /\ \* Note: outstanding is 0. 
+                                \* / proposal has already been committed.
+                                \/ ~outstanding
+                                \/ hasCommitted
+                             /\ Discard(j, i)
+                             /\ UNCHANGED <<violatedInvariants, lastCommitted, lastProcessed>>
+                          \/ /\ outstanding
+                             /\ ~hasCommitted
+                             /\ LeaderTryToCommit(i, index, msg.mzxid, txnAfterAddAck, j)
+                 \/ /\ \/ ~exist
+                       \/ ~monotonicallyInc
+                    /\ PrintT("Exception: No such zxid. " \o 
+                           " / ackIndex doesn't inc monotonically.")
+                    /\ violatedInvariants' = [violatedInvariants 
+                            EXCEPT !.ackInconsistent = TRUE]
+                    /\ Discard(j, i)
+                    /\ UNCHANGED <<history, lastCommitted, lastProcessed>>
+        /\ UNCHANGED <<state, currentEpoch, zabState, acceptedEpoch, electionVars,
+                    leaderVars, initialHistory, followerVars, proposalMsgsLog, epochLeader, 
+                    lastSnapshot, electionMsgs, envVars>>
+        /\ UpdateRecorder(<<"LeaderProcessACK", i, j>>)
+
+(* Follower processes COMMIT in BROADCAST. See processPacket
+   in Follower for details. *)
+FollowerProcessCOMMIT(i, j) ==
+        /\ IsON(i)
+        /\ IsFollower(i)
+        /\ PendingCOMMIT(i, j)
+        /\ zabState[i] = BROADCAST
+        /\ LET msg == msgs[j][i][1]
+               infoOk == IsMyLeader(i, j)
+               pendingTxns == PendingTxns(i)
+               noPending == Len(pendingTxns) = 0
+           IN
+           /\ infoOk 
+           /\ \/ /\ noPending
+                 /\ PrintT("Warn: Committing zxid without seeing txn.")
+                 /\ UNCHANGED <<lastCommitted, lastProcessed, violatedInvariants>>
+              \/ /\ ~noPending
+                 /\ LET firstElementZxid == pendingTxns[1].zxid
+                        match == ZxidEqual(firstElementZxid, msg.mzxid)
+                    IN 
+                    \/ /\ ~match
+                       /\ PrintT("Exception: Committing zxid not equals" \o
+                                " next pending txn zxid.")
+                       /\ violatedInvariants' = [violatedInvariants EXCEPT 
+                               !.commitInconsistent = TRUE]
+                       /\ UNCHANGED <<lastCommitted, lastProcessed>>
+                    \/ /\ match
+                       /\ lastCommitted' = [lastCommitted EXCEPT 
+                                ![i] = [ index |-> lastCommitted[i].index + 1,
+                                         zxid  |-> firstElementZxid ] ]
+                       /\ lastProcessed' = [lastProcessed EXCEPT 
+                                ![i] = [ index |-> lastCommitted[i].index + 1,
+                                         zxid  |-> firstElementZxid ] ]
+                       /\ UNCHANGED violatedInvariants
+        /\ Discard(j, i)
+        /\ UNCHANGED <<state, currentEpoch, zabState, acceptedEpoch, history,
+                    electionVars, leaderVars, initialHistory, followerVars,
+                    lastSnapshot, proposalMsgsLog, epochLeader, electionMsgs, envVars>>
+        /\ UpdateRecorder(<<"FollowerProcessCOMMIT", i, j>>)
+-----------------------------------------------------------------------------
+(* Used to discard some messages which should not exist in network channel.
+   This action should not be triggered. *)
+FilterNonexistentMessage(i) ==
+        /\ \E j \in Server \ {i}: /\ msgs[j][i] /= << >>
+                                  /\ LET msg == msgs[j][i][1]
+                                     IN 
+                                        \/ /\ IsLeader(i)
+                                           /\ LET infoOk == IsMyLearner(i, j)
+                                              IN
+                                              \/ msg.mtype = LEADERINFO
+                                              \/ msg.mtype = NEWLEADER
+                                              \/ msg.mtype = UPTODATE
+                                              \/ msg.mtype = PROPOSAL
+                                              \/ msg.mtype = COMMIT
+                                              \/ /\ ~infoOk
+                                                 /\ \/ msg.mtype = FOLLOWERINFO
+                                                    \/ msg.mtype = ACKEPOCH
+                                                    \/ msg.mtype = ACKLD
+                                                    \/ msg.mtype = ACK
+                                        \/ /\ IsFollower(i)
+                                           /\ LET infoOk == IsMyLeader(i, j) 
+                                              IN
+                                              \/ msg.mtype = FOLLOWERINFO
+                                              \/ msg.mtype = ACKEPOCH
+                                              \/ msg.mtype = ACKLD
+                                              \/ msg.mtype = ACK
+                                              \/ /\ ~infoOk
+                                                 /\ \/ msg.mtype = LEADERINFO
+                                                    \/ msg.mtype = NEWLEADER
+                                                    \/ msg.mtype = UPTODATE
+                                                    \/ msg.mtype = PROPOSAL
+                                                    \/ msg.mtype = COMMIT   
+                                        \/ IsLooking(i)
+                                  /\ Discard(j, i)
+        /\ violatedInvariants' = [violatedInvariants EXCEPT !.messageIllegal = TRUE]
+        /\ UNCHANGED <<serverVars, electionVars, leaderVars, envVars,
+                       followerVars, proposalMsgsLog, epochLeader, electionMsgs>>
+        /\ UnchangeRecorder        
+-----------------------------------------------------------------------------
+\* Defines how the variables may transition.
+Next == 
+        (* FLE module *)
+            \/ \E i, j \in Server: FLEReceiveNotmsg(i, j)
+            \/ \E i \in Server:    FLENotmsgTimeout(i)
+            \/ \E i \in Server:    FLEHandleNotmsg(i)
+            \/ \E i \in Server:    FLEWaitNewNotmsg(i)
+        (* situation errors like failure, network partition *)
+            \/ \E i, j \in Server: PartitionStart(i, j)
+            \/ \E i, j \in Server: PartitionRecover(i, j)
+            \/ \E i \in Server:    NodeCrash(i)
+            \/ \E i \in Server:    NodeStart(i)
+        (* Zab module - Discovery and Synchronization part *)
+            \/ \E i, j \in Server: ConnectAndFollowerSendFOLLOWERINFO(i, j)
+            \/ \E i, j \in Server: LeaderProcessFOLLOWERINFO(i, j)
+            \/ \E i, j \in Server: FollowerProcessLEADERINFO(i, j)
+            \/ \E i, j \in Server: LeaderProcessACKEPOCH(i, j)
+            \/ \E i, j \in Server: LeaderSyncFollower(i, j)
+            \/ \E i, j \in Server: FollowerProcessSyncMessage(i, j)
+            \/ \E i, j \in Server: FollowerProcessPROPOSALInSync(i, j)
+            \/ \E i, j \in Server: FollowerProcessCOMMITInSync(i, j)
+            \/ \E i, j \in Server: FollowerProcessNEWLEADER(i, j)
+            \/ \E i, j \in Server: LeaderProcessACKLD(i, j)
+            \/ \E i, j \in Server: FollowerProcessUPTODATE(i, j)
+        (* Zab module - Broadcast part *)
+            \/ \E i \in Server:    LeaderProcessRequest(i)
+            \/ \E i, j \in Server: FollowerProcessPROPOSAL(i, j)
+            \/ \E i, j \in Server: LeaderProcessACK(i, j)
+            \/ \E i, j \in Server: FollowerProcessCOMMIT(i, j)
+        (* An action used to judge whether there are redundant messages in network *)
+            \/ \E i \in Server:    FilterNonexistentMessage(i)
+
+Spec == Init /\ [][Next]_vars
+-----------------------------------------------------------------------------
+\* Define safety properties of Zab 1.0 protocol.
+
+CheckDuringAction == \A p \in DOMAIN violatedInvariants: violatedInvariants[p] = FALSE
+
+\* There is most one established leader for a certain epoch.
+Leadership1 == \A i, j \in Server:
+                   /\ IsLeader(i) /\ zabState[i] \in {SYNCHRONIZATION, BROADCAST}
+                   /\ IsLeader(j) /\ zabState[j] \in {SYNCHRONIZATION, BROADCAST}
+                   /\ acceptedEpoch[i] = acceptedEpoch[j]
+                  => i = j
+
+Leadership2 == \A epoch \in 1..MAXEPOCH: Cardinality(epochLeader[epoch]) <= 1
+
+\* PrefixConsistency: The prefix that have been committed 
+\* in history in any process is the same.
+PrefixConsistency == \A i, j \in Server:
+                        LET smaller == Minimum({lastCommitted[i].index, lastCommitted[j].index})
+                        IN \/ smaller = 0
+                           \/ /\ smaller > 0
+                              /\ \A index \in 1..smaller:
+                                   TxnEqual(history[i][index], history[j][index])
+
+\* Integrity: If some follower delivers one transaction, then some primary has broadcast it.
+Integrity == \A i \in Server:
+                /\ IsFollower(i)
+                /\ lastCommitted[i].index > 0
+                => \A idx \in 1..lastCommitted[i].index: \E proposal \in proposalMsgsLog:
+                    LET txn_proposal == [ zxid  |-> proposal.zxid,
+                                          value |-> proposal.data ]
+                    IN  TxnEqual(history[i][idx], txn_proposal)
+
+\* Agreement: If some follower f delivers transaction a and some follower f' delivers transaction b,
+\*            then f' delivers a or f delivers b.
+Agreement == \A i, j \in Server:
+                /\ IsFollower(i) /\ lastCommitted[i].index > 0
+                /\ IsFollower(j) /\ lastCommitted[j].index > 0
+                =>
+                \A idx1 \in 1..lastCommitted[i].index, idx2 \in 1..lastCommitted[j].index :
+                    \/ \E idx_j \in 1..lastCommitted[j].index:
+                        TxnEqual(history[j][idx_j], history[i][idx1])
+                    \/ \E idx_i \in 1..lastCommitted[i].index:
+                        TxnEqual(history[i][idx_i], history[j][idx2])
+
+\* Total order: If some follower delivers a before b, then any process that delivers b
+\*              must also deliver a and deliver a before b.
+TotalOrder == \A i, j \in Server: 
+                LET committed1 == lastCommitted[i].index 
+                    committed2 == lastCommitted[j].index  
+                IN committed1 >= 2 /\ committed2 >= 2
+                    => \A idx_i1 \in 1..(committed1 - 1) : \A idx_i2 \in (idx_i1 + 1)..committed1 :
+                    LET logOk == \E idx \in 1..committed2 :
+                                     TxnEqual(history[i][idx_i2], history[j][idx])
+                    IN \/ ~logOk 
+                       \/ /\ logOk 
+                          /\ \E idx_j2 \in 1..committed2 : 
+                                /\ TxnEqual(history[i][idx_i2], history[j][idx_j2])
+                                /\ \E idx_j1 \in 1..(idx_j2 - 1):
+                                       TxnEqual(history[i][idx_i1], history[j][idx_j1])
+
+\* Local primary order: If a primary broadcasts a before it broadcasts b, then a follower that
+\*                      delivers b must also deliver a before b.
+LocalPrimaryOrder == LET p_set(i, e) == {p \in proposalMsgsLog: /\ p.source = i 
+                                                                /\ p.epoch  = e }
+                         txn_set(i, e) == { [ zxid  |-> p.zxid, 
+                                              value |-> p.data ] : p \in p_set(i, e) }
+                     IN \A i \in Server: \A e \in 1..currentEpoch[i]:
+                         \/ Cardinality(txn_set(i, e)) < 2
+                         \/ /\ Cardinality(txn_set(i, e)) >= 2
+                            /\ \E txn1, txn2 \in txn_set(i, e):
+                             \/ TxnEqual(txn1, txn2)
+                             \/ /\ ~TxnEqual(txn1, txn2)
+                                /\ LET TxnPre  == IF ZxidCompare(txn1.zxid, txn2.zxid) THEN txn2 ELSE txn1
+                                       TxnNext == IF ZxidCompare(txn1.zxid, txn2.zxid) THEN txn1 ELSE txn2
+                                   IN \A j \in Server: /\ lastCommitted[j].index >= 2
+                                                       /\ \E idx \in 1..lastCommitted[j].index: 
+                                                            TxnEqual(history[j][idx], TxnNext)
+                                        => \E idx2 \in 1..lastCommitted[j].index: 
+                                            /\ TxnEqual(history[j][idx2], TxnNext)
+                                            /\ idx2 > 1
+                                            /\ \E idx1 \in 1..(idx2 - 1): 
+                                                TxnEqual(history[j][idx1], TxnPre)
+
+\* Global primary order: A follower f delivers both a with epoch e and b with epoch e', and e < e',
+\*                       then f must deliver a before b.
+GlobalPrimaryOrder == \A i \in Server: lastCommitted[i].index >= 2
+                         => \A idx1, idx2 \in 1..lastCommitted[i].index:
+                                \/ ~EpochPrecedeInTxn(history[i][idx1], history[i][idx2])
+                                \/ /\ EpochPrecedeInTxn(history[i][idx1], history[i][idx2])
+                                   /\ idx1 < idx2
+
+\* Primary integrity: If primary p broadcasts a and some follower f delivers b such that b has epoch
+\*                    smaller than epoch of p, then p must deliver b before it broadcasts a.
+PrimaryIntegrity == \A i, j \in Server: /\ IsLeader(i)   /\ IsMyLearner(i, j)
+                                        /\ IsFollower(j) /\ IsMyLeader(j, i)
+                                        /\ zabState[i] = BROADCAST
+                                        /\ zabState[j] = BROADCAST
+                                        /\ lastCommitted[j].index >= 1
+                        => \A idx_j \in 1..lastCommitted[j].index:
+                                \/ history[j][idx_j].zxid[1] >= currentEpoch[i]
+                                \/ /\ history[j][idx_j].zxid[1] < currentEpoch[i]
+                                   /\ \E idx_i \in 1..lastCommitted[i].index: 
+                                        TxnEqual(history[i][idx_i], history[j][idx_j])
+=============================================================================
+\* Modification History
+\* Last modified Tue Jan 17 21:21:28 CST 2023 by huangbinyu
+\* Last modified Mon Nov 22 22:25:23 CST 2021 by Dell
+\* Created Sat Oct 23 16:05:04 CST 2021 by Dell