
[ovs-dev,v2] raft: Only allow followers to snapshot.

Message ID 20211213194603.32487-1-dceara@redhat.com
State Accepted
Series [ovs-dev,v2] raft: Only allow followers to snapshot.

Checks

Context                                Check    Description
ovsrobot/apply-robot                   success  apply and check: success
ovsrobot/github-robot-_Build_and_Test  success  github build: passed

Commit Message

Dumitru Ceara Dec. 13, 2021, 7:46 p.m. UTC
Commit 3c2d6274bcee ("raft: Transfer leadership before creating
snapshots.") made raft leaders transfer leadership before
snapshotting.  However, there is still the case where the next
leader-to-be is itself in the process of snapshotting.  To avoid delays
in that case too, we now explicitly allow snapshots only on followers.
Cluster members will have to wait until the current election is settled
before snapshotting.

Consider the following logs, taken from an OVN_Southbound 3-server
cluster during a scale test:

S1 (old leader):
  2021-12-10T19:07:51.226Z|00823|raft|INFO|Transferring leadership to write a snapshot.
  2021-12-10T19:08:03.830Z|00824|ovsdb|INFO|OVN_Southbound: Database compaction took 12601ms
  2021-12-10T19:08:03.833Z|00825|timeval|WARN|Unreasonably long 12604ms poll interval (10632ms user, 1924ms system)
  2021-12-10T19:08:03.940Z|00838|raft|INFO|server 8b8d is leader for term 43

S2 (follower):
  2021-12-10T19:08:00.870Z|00481|raft|INFO|server 8b8d is leader for term 43

S3 (new leader):
  2021-12-10T19:07:51.242Z|01083|raft|INFO|received leadership transfer from f5c9 in term 42
  2021-12-10T19:07:51.244Z|01084|raft|INFO|term 43: starting election
  2021-12-10T19:08:00.805Z|01085|ovsdb|INFO|OVN_Southbound: Database compaction took 9559ms
  2021-12-10T19:08:00.869Z|01100|raft|INFO|term 43: elected leader by 2+ of 3 servers

The leader-to-be (S3) receives the leadership transfer, initiates the
election and, immediately afterwards, begins a snapshot that takes
~9.5 seconds (9559ms).  During this time, S2 votes for S3, electing it
cluster leader, but S3 does not actually become leader until it
finishes snapshotting, effectively leaving the cluster without a leader
for up to ~9.5 seconds.

With this change, S3 delays compaction and snapshotting until the
election is finished.

The only exception is single-node clusters, in which the node is
allowed to snapshot regardless of its role (there, the sole server is
the leader, so a followers-only rule would keep it from ever
snapshotting).

Acked-by: Han Zhou <hzhou@ovn.org>
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
---
V2:
- Added Han's ack and his suggestion to allow single-node clusters to
  snapshot.
---
 ovsdb/raft.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Ilya Maximets Dec. 15, 2021, 12:52 p.m. UTC
On 12/13/21 20:46, Dumitru Ceara wrote:

Thanks, Han and Dumitru!  Applied.

Best regards, Ilya Maximets.

Patch

diff --git a/ovsdb/raft.c b/ovsdb/raft.c
index ce40c5bc075c..1a3447a8dd4f 100644
--- a/ovsdb/raft.c
+++ b/ovsdb/raft.c
@@ -4226,7 +4226,7 @@  raft_may_snapshot(const struct raft *raft)
             && !raft->leaving
             && !raft->left
             && !raft->failed
-            && raft->role != RAFT_LEADER
+            && (raft->role == RAFT_FOLLOWER || hmap_count(&raft->servers) == 1)
             && raft->last_applied >= raft->log_start);
 }