Message ID | 20240426165448.42125-1-ihrachys@redhat.com |
---|---|
State | Accepted |
Commit | 01a0fff36104790640e274f1d457084aeb5b968d |
Delegated to: | Ilya Maximets |
Series | [ovs-dev,v3] docs: Document manual cluster recovery procedure. |
Context | Check | Description |
---|---|---|
ovsrobot/apply-robot | success | apply and check: success |
ovsrobot/intel-ovs-compilation | success | test: success |
On Fri, Apr 26, 2024 at 04:54:48PM +0000, Ihar Hrachyshka wrote:
> Remove the notion of cluster/leave --force since it was never
> implemented. Instead of these instructions, document how a broken
> cluster can be re-initialized with the old database contents.
>
> Signed-off-by: Ihar Hrachyshka <ihrachys@redhat.com>
>
> ---
>
> v1: initial version.
> v2: remove --force mentioned in ovsdb-server(1).
> v3: multiple language and markup changes suggested by Ilya.

Thanks for the updates Ihar, this version looks good to me.

Acked-by: Simon Horman <horms@ovn.org>

...
On 4/26/24 18:54, Ihar Hrachyshka wrote:
> Remove the notion of cluster/leave --force since it was never
> implemented. Instead of these instructions, document how a broken
> cluster can be re-initialized with the old database contents.
>
> Signed-off-by: Ihar Hrachyshka <ihrachys@redhat.com>
>
> ---
>
> v1: initial version.
> v2: remove --force mentioned in ovsdb-server(1).
> v3: multiple language and markup changes suggested by Ilya.

Thanks, Ihar! This version looks good to me in general.
I have a couple of minor nits below. If you agree, I can
fold those in while applying the change.

Let me know what you think.

Best regards, Ilya Maximets.

>
> ---
>  Documentation/ref/ovsdb.7.rst | 44 ++++++++++++++++++++++++++++-------
>  ovsdb/ovsdb-server.1.in       |  3 +--
>  2 files changed, 37 insertions(+), 10 deletions(-)
>
> diff --git a/Documentation/ref/ovsdb.7.rst b/Documentation/ref/ovsdb.7.rst
> index 46ed13e61..5766e64b9 100644
> --- a/Documentation/ref/ovsdb.7.rst
> +++ b/Documentation/ref/ovsdb.7.rst
> @@ -315,16 +315,11 @@ The above methods for adding and removing servers only work for healthy
>  clusters, that is, for clusters with no more failures than their maximum
>  tolerance. For example, in a 3-server cluster, the failure of 2 servers
>  prevents servers joining or leaving the cluster (as well as database access).
> +
>  To prevent data loss or inconsistency, the preferred solution to this problem
>  is to bring up enough of the failed servers to make the cluster healthy again,
> -then if necessary remove any remaining failed servers and add new ones. If
> -this cannot be done, though, use ``ovs-appctl`` to invoke ``cluster/leave
> ---force`` on a running server. This command forces the server to which it is
> -directed to leave its cluster and form a new single-node cluster that contains
> -only itself. The data in the new cluster may be inconsistent with the former
> -cluster: transactions not yet replicated to the server will be lost, and
> -transactions not yet applied to the cluster may be committed. Afterward, any
> -servers in its former cluster will regard the server to have failed.
> +then if necessary remove any remaining failed servers and add new ones. If this

Nit: 2 spaces between sentences.

> +is not an option, see the next section for `Manual cluster recovery`_.
>
>  Once a server leaves a cluster, it may never rejoin it. Instead, create a new
>  server and join it to the cluster.
> @@ -362,6 +357,39 @@ Clustered OVSDB does not support the OVSDB "ephemeral columns" feature.
>  ones when they work with schemas for clustered databases. Future versions of
>  OVSDB might add support for this feature.
>
> +Manual cluster recovery
> +~~~~~~~~~~~~~~~~~~~~~~~
> +
> +.. important::

Nit: An empty line here would be nice to be consistent at least
within this document.

> +   The procedure below will result in ``cid`` and ``sid`` change. A *new*

Nit: 2 spaces between sentences.

> +   cluster will be initialized.
> +
> +To recover a clustered database after a failure:
> +
> +1. Stop *all* old cluster ``ovsdb-server`` instances before proceeding.
> +
> +2. Pick one of the old members which will serve as a bootstrap member of the
> +   to-be-recovered cluster.
> +
> +3. Convert its database file to the standalone format using ``ovsdb-tool
> +   cluster-to-standalone``.
> +
> +4. Backup the standalone database file.
> +
> +5. Create a new single-node cluster with ``ovsdb-tool create-cluster``
> +   using the previously saved standalone database file, then start
> +   ``ovsdb-server``.
> +
> +Once the single-node cluster is up and running and serves the restored data,
> +new members should be created and join the new cluster, as usual (``ovsdb-tool
> +join-cluster``).

I'm having a hard time reading 'new members should be created and join' as
my brain wants to relate 'should be' to both 'created' and 'join' and
'should be join' is not a correct construct.

How about: "new members should be created and added to the cluster, as usual,
with ``ovsdb-tool join-cluster``." ?

Also, should it be a step 6 ?

> +
> +.. note::
> +
> +   The data in the new cluster may be inconsistent with the former cluster:
> +   transactions not yet replicated to the server chosen in step 2 will be lost,
> +   and transactions not yet applied to the cluster may be committed.
> +
>  Upgrading from version 2.14 and earlier to 2.15 and later
>  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>
> diff --git a/ovsdb/ovsdb-server.1.in b/ovsdb/ovsdb-server.1.in
> index 9fabf2d67..23b8e6e9c 100644
> --- a/ovsdb/ovsdb-server.1.in
> +++ b/ovsdb/ovsdb-server.1.in
> @@ -461,8 +461,7 @@ This does not result in a three server cluster that lacks quorum.
>  .
>  .IP "\fBcluster/kick \fIdb server\fR"
>  Start graceful removal of \fIserver\fR from \fIdb\fR's cluster, like
> -\fBcluster/leave\fR (without \fB\-\-force\fR) except that it can
> -remove any server, not just this one.
> +\fBcluster/leave\fR, except that it can remove any server, not just this one.
>  .IP
>  \fIserver\fR may be a server ID, as printed by \fBcluster/sid\fR, or
>  the server's local network address as passed to \fBovsdb-tool\fR's
On Thu, May 2, 2024 at 5:52 PM Ilya Maximets <i.maximets@ovn.org> wrote:
> On 4/26/24 18:54, Ihar Hrachyshka wrote:
> > Remove the notion of cluster/leave --force since it was never
> > implemented. Instead of these instructions, document how a broken
> > cluster can be re-initialized with the old database contents.
> >
> > Signed-off-by: Ihar Hrachyshka <ihrachys@redhat.com>
> >
> > ---
> >
> > v1: initial version.
> > v2: remove --force mentioned in ovsdb-server(1).
> > v3: multiple language and markup changes suggested by Ilya.
>
> Thanks, Ihar! This version looks good to me in general.
> I have a couple of minor nits below. If you agree, I can
> fold those in while applying the change.

Feel free to. And thanks for your patience.

> Let me know what you think.
>
> Best regards, Ilya Maximets.
>
> >
> > ---
> >  Documentation/ref/ovsdb.7.rst | 44 ++++++++++++++++++++++++++++-------
> >  ovsdb/ovsdb-server.1.in       |  3 +--
> >  2 files changed, 37 insertions(+), 10 deletions(-)
> >
> > diff --git a/Documentation/ref/ovsdb.7.rst b/Documentation/ref/ovsdb.7.rst
> > index 46ed13e61..5766e64b9 100644
> > --- a/Documentation/ref/ovsdb.7.rst
> > +++ b/Documentation/ref/ovsdb.7.rst
> > @@ -315,16 +315,11 @@ The above methods for adding and removing servers only work for healthy
> >  clusters, that is, for clusters with no more failures than their maximum
> >  tolerance. For example, in a 3-server cluster, the failure of 2 servers
> >  prevents servers joining or leaving the cluster (as well as database access).
> > +
> >  To prevent data loss or inconsistency, the preferred solution to this problem
> >  is to bring up enough of the failed servers to make the cluster healthy again,
> > -then if necessary remove any remaining failed servers and add new ones. If
> > -this cannot be done, though, use ``ovs-appctl`` to invoke ``cluster/leave
> > ---force`` on a running server. This command forces the server to which it is
> > -directed to leave its cluster and form a new single-node cluster that contains
> > -only itself. The data in the new cluster may be inconsistent with the former
> > -cluster: transactions not yet replicated to the server will be lost, and
> > -transactions not yet applied to the cluster may be committed. Afterward, any
> > -servers in its former cluster will regard the server to have failed.
> > +then if necessary remove any remaining failed servers and add new ones. If this
>
> Nit: 2 spaces between sentences.
>
> > +is not an option, see the next section for `Manual cluster recovery`_.
> >
> >  Once a server leaves a cluster, it may never rejoin it. Instead, create a new
> >  server and join it to the cluster.
> > @@ -362,6 +357,39 @@ Clustered OVSDB does not support the OVSDB "ephemeral columns" feature.
> >  ones when they work with schemas for clustered databases. Future versions of
> >  OVSDB might add support for this feature.
> >
> > +Manual cluster recovery
> > +~~~~~~~~~~~~~~~~~~~~~~~
> > +
> > +.. important::
>
> Nit: An empty line here would be nice to be consistent at least
> within this document.
>
> > +   The procedure below will result in ``cid`` and ``sid`` change. A *new*
>
> Nit: 2 spaces between sentences.
>
> > +   cluster will be initialized.
> > +
> > +To recover a clustered database after a failure:
> > +
> > +1. Stop *all* old cluster ``ovsdb-server`` instances before proceeding.
> > +
> > +2. Pick one of the old members which will serve as a bootstrap member of the
> > +   to-be-recovered cluster.
> > +
> > +3. Convert its database file to the standalone format using ``ovsdb-tool
> > +   cluster-to-standalone``.
> > +
> > +4. Backup the standalone database file.
> > +
> > +5. Create a new single-node cluster with ``ovsdb-tool create-cluster``
> > +   using the previously saved standalone database file, then start
> > +   ``ovsdb-server``.
> > +
> > +Once the single-node cluster is up and running and serves the restored data,
> > +new members should be created and join the new cluster, as usual (``ovsdb-tool
> > +join-cluster``).
>
> I'm having a hard time reading 'new members should be created and join' as
> my brain wants to relate 'should be' to both 'created' and 'join' and
> 'should be join' is not a correct construct.
>
> How about: "new members should be created and added to the cluster, as usual,
> with ``ovsdb-tool join-cluster``." ?

Though it doesn't confuse me, I am not a native speaker, and I find your
version at least as good as mine, so feel free to change.

> Also, should it be a step 6 ?

It won't hurt to fold it into the list.

> > +
> > +.. note::
> > +
> > +   The data in the new cluster may be inconsistent with the former cluster:
> > +   transactions not yet replicated to the server chosen in step 2 will be lost,
> > +   and transactions not yet applied to the cluster may be committed.
> > +
> >  Upgrading from version 2.14 and earlier to 2.15 and later
> >  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> >
> > diff --git a/ovsdb/ovsdb-server.1.in b/ovsdb/ovsdb-server.1.in
> > index 9fabf2d67..23b8e6e9c 100644
> > --- a/ovsdb/ovsdb-server.1.in
> > +++ b/ovsdb/ovsdb-server.1.in
> > @@ -461,8 +461,7 @@ This does not result in a three server cluster that lacks quorum.
> >  .
> >  .IP "\fBcluster/kick \fIdb server\fR"
> >  Start graceful removal of \fIserver\fR from \fIdb\fR's cluster, like
> > -\fBcluster/leave\fR (without \fB\-\-force\fR) except that it can
> > -remove any server, not just this one.
> > +\fBcluster/leave\fR, except that it can remove any server, not just this one.
> >  .IP
> >  \fIserver\fR may be a server ID, as printed by \fBcluster/sid\fR, or
> >  the server's local network address as passed to \fBovsdb-tool\fR's
On 5/3/24 00:42, Ihar Hrachyshka wrote:
> On Thu, May 2, 2024 at 5:52 PM Ilya Maximets <i.maximets@ovn.org> wrote:
>
> > On 4/26/24 18:54, Ihar Hrachyshka wrote:
> > > Remove the notion of cluster/leave --force since it was never
> > > implemented. Instead of these instructions, document how a broken
> > > cluster can be re-initialized with the old database contents.
> > >
> > > Signed-off-by: Ihar Hrachyshka <ihrachys@redhat.com>
> > >
> > > ---
> > >
> > > v1: initial version.
> > > v2: remove --force mentioned in ovsdb-server(1).
> > > v3: multiple language and markup changes suggested by Ilya.
> >
> > Thanks, Ihar! This version looks good to me in general.
> > I have a couple of minor nits below. If you agree, I can
> > fold those in while applying the change.
>
> Feel free to. And thanks for your patience.

Thanks, Ihar and Simon! I made the discussed changes and applied the patch.

Best regards, Ilya Maximets.
diff --git a/Documentation/ref/ovsdb.7.rst b/Documentation/ref/ovsdb.7.rst
index 46ed13e61..5766e64b9 100644
--- a/Documentation/ref/ovsdb.7.rst
+++ b/Documentation/ref/ovsdb.7.rst
@@ -315,16 +315,11 @@ The above methods for adding and removing servers only work for healthy
 clusters, that is, for clusters with no more failures than their maximum
 tolerance. For example, in a 3-server cluster, the failure of 2 servers
 prevents servers joining or leaving the cluster (as well as database access).
+
 To prevent data loss or inconsistency, the preferred solution to this problem
 is to bring up enough of the failed servers to make the cluster healthy again,
-then if necessary remove any remaining failed servers and add new ones. If
-this cannot be done, though, use ``ovs-appctl`` to invoke ``cluster/leave
---force`` on a running server. This command forces the server to which it is
-directed to leave its cluster and form a new single-node cluster that contains
-only itself. The data in the new cluster may be inconsistent with the former
-cluster: transactions not yet replicated to the server will be lost, and
-transactions not yet applied to the cluster may be committed. Afterward, any
-servers in its former cluster will regard the server to have failed.
+then if necessary remove any remaining failed servers and add new ones. If this
+is not an option, see the next section for `Manual cluster recovery`_.
 
 Once a server leaves a cluster, it may never rejoin it. Instead, create a new
 server and join it to the cluster.
@@ -362,6 +357,39 @@ Clustered OVSDB does not support the OVSDB "ephemeral columns" feature.
 ones when they work with schemas for clustered databases. Future versions of
 OVSDB might add support for this feature.
 
+Manual cluster recovery
+~~~~~~~~~~~~~~~~~~~~~~~
+
+.. important::
+   The procedure below will result in ``cid`` and ``sid`` change. A *new*
+   cluster will be initialized.
+
+To recover a clustered database after a failure:
+
+1. Stop *all* old cluster ``ovsdb-server`` instances before proceeding.
+
+2. Pick one of the old members which will serve as a bootstrap member of the
+   to-be-recovered cluster.
+
+3. Convert its database file to the standalone format using ``ovsdb-tool
+   cluster-to-standalone``.
+
+4. Backup the standalone database file.
+
+5. Create a new single-node cluster with ``ovsdb-tool create-cluster``
+   using the previously saved standalone database file, then start
+   ``ovsdb-server``.
+
+Once the single-node cluster is up and running and serves the restored data,
+new members should be created and join the new cluster, as usual (``ovsdb-tool
+join-cluster``).
+
+.. note::
+
+   The data in the new cluster may be inconsistent with the former cluster:
+   transactions not yet replicated to the server chosen in step 2 will be lost,
+   and transactions not yet applied to the cluster may be committed.
+
 Upgrading from version 2.14 and earlier to 2.15 and later
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
diff --git a/ovsdb/ovsdb-server.1.in b/ovsdb/ovsdb-server.1.in
index 9fabf2d67..23b8e6e9c 100644
--- a/ovsdb/ovsdb-server.1.in
+++ b/ovsdb/ovsdb-server.1.in
@@ -461,8 +461,7 @@ This does not result in a three server cluster that lacks quorum.
 .
 .IP "\fBcluster/kick \fIdb server\fR"
 Start graceful removal of \fIserver\fR from \fIdb\fR's cluster, like
-\fBcluster/leave\fR (without \fB\-\-force\fR) except that it can
-remove any server, not just this one.
+\fBcluster/leave\fR, except that it can remove any server, not just this one.
 .IP
 \fIserver\fR may be a server ID, as printed by \fBcluster/sid\fR, or
 the server's local network address as passed to \fBovsdb-tool\fR's
Remove the notion of cluster/leave --force since it was never
implemented. Instead of these instructions, document how a broken
cluster can be re-initialized with the old database contents.

Signed-off-by: Ihar Hrachyshka <ihrachys@redhat.com>

---

v1: initial version.
v2: remove --force mentioned in ovsdb-server(1).
v3: multiple language and markup changes suggested by Ilya.

---
 Documentation/ref/ovsdb.7.rst | 44 ++++++++++++++++++++++++++++-------
 ovsdb/ovsdb-server.1.in       |  3 +--
 2 files changed, 37 insertions(+), 10 deletions(-)
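
For illustration only, steps 3 to 5 of the documented recovery procedure, plus the follow-up join of a new member, might be scripted roughly as below. This is a sketch and not part of the patch. The helper names (`run`, `recover_cluster`, `join_member`), all file names, the `OVN_Northbound` database name, and the `tcp:` addresses are hypothetical examples. The script only prints the commands unless `DRY_RUN=0` is set, and it assumes all old `ovsdb-server` instances are already stopped and a bootstrap member has been chosen (steps 1 and 2).

```shell
#!/bin/sh
# Dry-run sketch of the manual cluster recovery steps from the patch.
# File names, the database name, and addresses are hypothetical examples.

# Print commands by default; set DRY_RUN=0 to actually execute them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Steps 3-5: convert to standalone, back up, create a single-node cluster.
# $1 is the clustered database file of the chosen bootstrap member.
recover_cluster() {
    old_db="$1"; standalone="$2"; new_db="$3"; local_addr="$4"
    run ovsdb-tool cluster-to-standalone "$standalone" "$old_db" &&
    run cp "$standalone" "$standalone.bak" &&
    run ovsdb-tool create-cluster "$new_db" "$standalone" "$local_addr"
}

# Follow-up: prepare a database file for a new member joining the cluster.
join_member() {
    member_db="$1"; db_name="$2"; local_addr="$3"; remote_addr="$4"
    run ovsdb-tool join-cluster "$member_db" "$db_name" "$local_addr" "$remote_addr"
}

recover_cluster old_cluster.db standalone.db new_cluster.db tcp:10.0.0.1:6644
join_member member2.db OVN_Northbound tcp:10.0.0.2:6644 tcp:10.0.0.1:6644
```

Starting `ovsdb-server` on the newly created database file is left out because how it is launched depends on the deployment. The dry-run default is a deliberate choice: the procedure creates a new cluster with a new `cid` and `sid`, so it seems safer to review the printed commands before executing them.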