Ver código fonte

ZOOKEEPER-3479: Logging false leader election times

See https://issues.apache.org/jira/browse/ZOOKEEPER-3479 here for the issue.

The problem is that the logged leader election times are much smaller than in reality. For example, we have log files that contain the following lines:
```
2019-07-31 00:57:31,852 [myid:1] - WARN [QuorumPeer[myid=1](plain=/0.0.0.0:2791)(secure=disabled):QuorumPeer1318] - PeerState set to LOOKING
2019-07-31 00:57:31,853 [myid:1] - INFO [QuorumPeer[myid=1](plain=/0.0.0.0:2791)(secure=disabled):QuorumPeer1193] - LOOKING
2019-07-31 00:57:31,853 [myid:1] - INFO [QuorumPeer[myid=1](plain=/0.0.0.0:2791)(secure=disabled):FastLeaderElection885] - New election. My id = 1, proposed zxid=0x100000001
[...]
2019-07-31 00:57:32,272 [myid:1] - INFO [QuorumPeer[myid=1](plain=/0.0.0.0:2791)(secure=disabled):Follower69] - FOLLOWING - LEADER ELECTION TOOK - 1 MS
```
Leader election took more than 400ms, not 1ms!

The reason this logging bug exists has to do with this [line](https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/QuorumPeer.java#L1402) in `QuorumPeer`. The computation of the leader election time starts inside [`FastLeaderElection.lookForLeader` ](https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/FastLeaderElection.java#L915) where `start_fle` is set. After a server has found a leader, it repeats the `QuorumPeer` [loop](https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/QuorumPeer.java#L1288) and resets the `start_fle` value. Hence, when, for example, a follower [logs](https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/Follower.java#L75) how much time it took for the leader election, it logs a much smaller time than it actually took for the leader election to complete. To be precise, it just logs the time it took to loop inside [`QuorumPeer`](https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/QuorumPeer.java#L1288).

Furthermore, if we remove the problematic [line](https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/QuorumPeer.java#L1402), then there is no reason to check whether `start_fle == 0` in [`FastLeaderElection`](https://github.com/apache/zookeeper/blob/master/zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/FastLeaderElection.java#L914), since `start_fle` is always `0` at that point. I'm not sure if this `if` statement is needed in the first place.

Author: foo bar <karolos@gmail.com>

Reviewers: Michael Han <hanm@apache.org>, Allan Lyu <fangmin@apache.org>

Closes #1031 from insumity/ZOOKEEPER-3479
Karolos Antoniadis 5 anos atrás
pai
commit
3882a0171f

+ 2 - 3
zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/FastLeaderElection.java

@@ -911,9 +911,8 @@ public class FastLeaderElection implements Election {
             LOG.warn("Failed to register with JMX", e);
             self.jmxLeaderElectionBean = null;
         }
-        if (self.start_fle == 0) {
-           self.start_fle = Time.currentElapsedTime();
-        }
+
+        self.start_fle = Time.currentElapsedTime();
         try {
             Map<Long, Vote> recvset = new HashMap<Long, Vote>();
 

+ 0 - 1
zookeeper-server/src/main/java/org/apache/zookeeper/server/quorum/QuorumPeer.java

@@ -1399,7 +1399,6 @@ public class QuorumPeer extends ZooKeeperThread implements QuorumStats.Provider
                     }
                     break;
                 }
-                start_fle = Time.currentElapsedTime();
             }
         } finally {
             LOG.warn("QuorumPeer main thread exited");