diff mbox series

[ovs-dev,2/2] ovsdb raft: Avoid unnecessary reconnecting during leader election.

Message ID 1555701467-26745-2-git-send-email-hzhou8@ebay.com
State Accepted
Commit 4d9b28cbb797ad112d2aefccab93c7d37493941a
Headers show
Series [ovs-dev,1/2] ovsdb-cluster-testsuite.at: Restores "clustered transactions" tests back. | expand

Commit Message

Han Zhou April 19, 2019, 7:17 p.m. UTC
From: Han Zhou <hzhou8@ebay.com>

If a server claims itself as "disconnected", all clients connected
to that server will try to reconnect to a new server in the cluster.

However, currently a server would claim itself as disconnected even
when itself is the candidate and try to become the new leader (most
likely it will be), and all its clients will reconnect to another
node.

During a leader fail-over (e.g. due to a leader failure), it is
expected that all clients of the old leader will have to reconnect
to other nodes in the cluster, but it is unnecessary for all the
clients of a healthy node to reconnect, which could cause more
disturbance in a large scale environment.

This patch fixes the problem by slightly change the condition that
a server regards itself as disconnected: if its role is candidate,
it is regarded as disconnected only if the election didn't succeed
at the first attempt. Related failure test cases are also unskipped
and all passed with this patch.

Signed-off-by: Han Zhou <hzhou8@ebay.com>
---
 ovsdb/raft.c           | 13 +++++++++++--
 tests/ovsdb-cluster.at |  6 ------
 2 files changed, 11 insertions(+), 8 deletions(-)

Comments

Ben Pfaff April 22, 2019, 8:15 p.m. UTC | #1
Thanks for the patches.  I applied them to master.
diff mbox series

Patch

diff --git a/ovsdb/raft.c b/ovsdb/raft.c
index 77ad365..c60ef41 100644
--- a/ovsdb/raft.c
+++ b/ovsdb/raft.c
@@ -286,6 +286,8 @@  struct raft {
 
     /* Candidates only.  Reinitialized at start of election. */
     int n_votes;                /* Number of votes for me. */
+    bool candidate_retrying;    /* The first round of election timed-out and it
+                                   is now retrying. */
 };
 
 /* All Raft structures. */
@@ -985,11 +987,17 @@  raft_get_sid(const struct raft *raft)
 /* Returns true if 'raft' has completed joining its cluster, has not left or
  * initiated leaving the cluster, does not have failed disk storage, and is
  * apparently connected to the leader in a healthy way (or is itself the
- * leader).*/
+ * leader).
+ *
+ * If 'raft' is candidate:
+ * a) if it is the first round of election, consider it as connected, hoping
+ *    it will successfully elect a new leader soon.
+ * b) if it is already retrying, consider it as disconnected (so that clients
+ *    may decide to reconnect to other members). */
 bool
 raft_is_connected(const struct raft *raft)
 {
-    return (raft->role != RAFT_CANDIDATE
+    return (!(raft->role == RAFT_CANDIDATE && raft->candidate_retrying)
             && !raft->joining
             && !raft->leaving
             && !raft->left
@@ -1608,6 +1616,7 @@  raft_start_election(struct raft *raft, bool leadership_transfer)
     }
 
     ovs_assert(raft->role != RAFT_LEADER);
+    raft->candidate_retrying = (raft->role == RAFT_CANDIDATE);
     raft->role = RAFT_CANDIDATE;
 
     raft->n_votes = 0;
diff --git a/tests/ovsdb-cluster.at b/tests/ovsdb-cluster.at
index 931da6e..4701272 100644
--- a/tests/ovsdb-cluster.at
+++ b/tests/ovsdb-cluster.at
@@ -148,8 +148,6 @@  AT_CLEANUP
 
 AT_SETUP([OVSDB cluster - txn on follower-2, leader crash before sending execRep, follower-2 becomes leader])
 AT_KEYWORDS([ovsdb server negative unix cluster pending-txn])
-# XXX: fix bug before enabling this test
-AT_CHECK([exit 77])
 ovsdb_cluster_failure_test 2 3 1 crash-before-sending-execute-command-reply 2
 AT_CLEANUP
 
@@ -160,8 +158,6 @@  AT_CLEANUP
 
 AT_SETUP([OVSDB cluster - txn on follower-2, leader crash after sending execRep, follower-2 becomes leader])
 AT_KEYWORDS([ovsdb server negative unix cluster pending-txn])
-# XXX: fix bug before enabling this test
-AT_CHECK([exit 77])
 ovsdb_cluster_failure_test 2 3 1 crash-after-sending-execute-command-reply 2
 AT_CLEANUP
 
@@ -172,8 +168,6 @@  AT_CLEANUP
 
 AT_SETUP([OVSDB cluster - txn on leader, leader crash before sending appendReq, follower-2 becomes leader])
 AT_KEYWORDS([ovsdb server negative unix cluster pending-txn])
-# XXX: fix bug before enabling this test
-AT_CHECK([exit 77])
 ovsdb_cluster_failure_test 1 2 1 crash-before-sending-append-request 2
 AT_CLEANUP