
[ovs-dev,1/2] ovsdb: raft: Fix inability to join a cluster with a large database.

Message ID 20240411234504.2913083-2-i.maximets@ovn.org
State Accepted
Commit d7f2150ea83a02336adbf0bf48f160664f11ce8f
Delegated to: Ilya Maximets
Series ovsdb: raft: Fixes for probe interval updates.

Checks

Context                                 Check     Description
ovsrobot/apply-robot                    success   apply and check: success
ovsrobot/intel-ovs-compilation          fail      test: fail
ovsrobot/github-robot-_Build_and_Test   success   github build: passed

Commit Message

Ilya Maximets April 11, 2024, 11:45 p.m. UTC
The inactivity probe interval on RAFT connections depends on the value of
the election timer.  However, the actual value is not known until the
database snapshot with the RAFT information is received by the joining
server.  A newly joining server uses the default of 1 second until then.
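
For reference, the pre-patch computation in raft_conn_update_probe_interval()
(visible in the diff context below) derives the probe interval directly from
the locally known election timer:

    int probe_interval = raft->election_timer + ELECTION_RANGE_MSEC;

    jsonrpc_session_set_probe_interval(r_conn->js, probe_interval);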

When a new server is trying to join an existing cluster with a large
database, it may take more than a second to generate and send the initial
database snapshot.  This makes it impossible to actually join the cluster:
the joining server sends an ADD_SERVER request, waits 1 second, sends a
probe, doesn't get a reply within another second because the leader is
busy preparing and sending the initial snapshot to it, disconnects, and
the cycle repeats.
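
With the 1-second default, the failure sequence is roughly the following
(a simplified timeline of the behavior described above):

    t=0s   joining server connects and sends the ADD_SERVER request
    t=1s   no activity on the connection, so it sends an inactivity probe
    t=2s   still no reply, because the leader is busy with the snapshot,
           so the joining server disconnects
    ...    it reconnects and the cycle starts over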

This is not an issue for servers that have already joined, since their
probe intervals are larger than the election timeout.
Cooperative multitasking doesn't fully solve this issue either, since it
depends on the election timer, which is likely higher in an existing
cluster with a very large database.

Fix that by using the maximum election timer value for inactivity probes
until the actual value is known.  We still shouldn't disable the probes
completely, because in the rare event that the connection is established
but the other side silently goes away, we still want to disconnect and
eventually try to re-establish the connection.

Since probe intervals now also depend on the joining state, update them
when the server joins the cluster.
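
In code terms, the fix boils down to the following (a condensed sketch of
the patch below):

    /* While joining, the real election timeout is unknown, so fall back
     * to the maximum allowed value instead of the locally known one. */
    int probe_interval = (raft->joining ? ELECTION_MAX_MSEC
                                        : raft->election_timer)
                         + ELECTION_RANGE_MSEC;

    /* Once the join completes, recompute probe intervals so that they
     * start tracking the actual election timer. */
    static void
    raft_join_complete(struct raft *raft)
    {
        raft->joining = false;
        raft_update_probe_intervals(raft);
    }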

Fixes: 14b2b0aad7ae ("raft: Reintroduce jsonrpc inactivity probes.")
Reported-by: Terry Wilson <twilson@redhat.com>
Reported-at: https://issues.redhat.com/browse/FDP-144
Signed-off-by: Ilya Maximets <i.maximets@ovn.org>
---
 ovsdb/raft.c | 22 +++++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)

Comments

Mike Pattrick April 17, 2024, 6:37 p.m. UTC | #1
On Thu, Apr 11, 2024 at 7:44 PM Ilya Maximets <i.maximets@ovn.org> wrote:
> [...]

Acked-by: Mike Pattrick <mkp@redhat.com>

Patch

diff --git a/ovsdb/raft.c b/ovsdb/raft.c
index d81a1758a..083ebf66a 100644
--- a/ovsdb/raft.c
+++ b/ovsdb/raft.c
@@ -1018,8 +1018,13 @@  raft_conn_update_probe_interval(struct raft *raft, struct raft_conn *r_conn)
      * inactivity probe follower will just try to initiate election
      * indefinitely staying in 'candidate' role.  And the leader will continue
      * to send heartbeats to the dead connection thinking that remote server
-     * is still part of the cluster. */
-    int probe_interval = raft->election_timer + ELECTION_RANGE_MSEC;
+     * is still part of the cluster.
+     *
+     * While joining, the real value of the election timeout is not known to
+     * this server, so using the maximum. */
+    int probe_interval = (raft->joining ? ELECTION_MAX_MSEC
+                                        : raft->election_timer)
+                         + ELECTION_RANGE_MSEC;
 
     jsonrpc_session_set_probe_interval(r_conn->js, probe_interval);
 }
@@ -2820,6 +2825,13 @@  raft_send_heartbeats(struct raft *raft)
     raft_reset_ping_timer(raft);
 }
 
+static void
+raft_join_complete(struct raft *raft)
+{
+    raft->joining = false;
+    raft_update_probe_intervals(raft);
+}
+
 /* Initializes the fields in 's' that represent the leader's view of the
  * server. */
 static void
@@ -2866,7 +2878,7 @@  raft_become_leader(struct raft *raft)
          * we're becoming a cluster leader without receiving reply for a
          * join request and will commit addition of this server ourselves. */
         VLOG_INFO_RL(&rl, "elected as leader while joining");
-        raft->joining = false;
+        raft_join_complete(raft);
     }
 
     struct raft_server *s;
@@ -3101,7 +3113,7 @@  raft_update_commit_index(struct raft *raft, uint64_t new_commit_index)
                         "added to configuration without reply "
                         "(eid: "UUID_FMT", commit index: %"PRIu64")",
                         UUID_ARGS(&e->eid), index);
-                    raft->joining = false;
+                    raft_join_complete(raft);
                 }
             }
             raft_servers_destroy(&servers);
@@ -4049,7 +4061,7 @@  raft_handle_add_server_reply(struct raft *raft,
     }
 
     if (rpy->success) {
-        raft->joining = false;
+        raft_join_complete(raft);
 
         /* It is tempting, at this point, to check that this server is part of
          * the current configuration.  However, this is not necessarily the