diff mbox series

[ovs-dev,ovn,2/2] northd: Improve handling of pause and resume

Message ID 62c2025dae33227fa2e19306e840f10d0d726e96.1574411554.git.frode.nordahl@canonical.com
State Superseded
Headers show
Series northd: Improve pause, resume and status | expand

Commit Message

Frode Nordahl Nov. 21, 2019, 4:32 p.m. UTC
Move paused state to ``struct northd_context`` and pass the
context to paused and status command handlers.

On pause release the OVSDB lock on SB DB.

Re-instante the lock on resume.

Status command will now provide accurate information for 'paused',
'active' and 'standby' states.

Merge separate status command test into the pause and resume tests.

Signed-off-by: Frode Nordahl <frode.nordahl@canonical.com>
---
 northd/ovn-northd.8.xml |  9 +++--
 northd/ovn-northd.c     | 87 +++++++++++++++++++++++++++--------------
 tests/ovn-northd.at     | 24 +++++++-----
 3 files changed, 79 insertions(+), 41 deletions(-)

Comments

0-day Robot Nov. 22, 2019, 9:57 a.m. UTC | #1
Bleep bloop.  Greetings Frode Nordahl, I am a robot and I have tried out your patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


git-am:
fatal: sha1 information is lacking or useless (tests/ovn-northd.at).
Repository lacks necessary blobs to fall back on 3-way merge.
Cannot fall back to three-way merge.
Patch failed at 0001 northd: Improve handling of pause and resume
The copy of the patch that failed is found in:
   /var/lib/jenkins/jobs/upstream_build_from_pw/workspace/OVN/.git/rebase-apply/patch
When you have resolved this problem, run "git am --resolved".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".


Please check this out.  If you feel there has been an error, please email aconole@redhat.com

Thanks,
0-day Robot
Frode Nordahl Nov. 22, 2019, 10:12 a.m. UTC | #2
This is weird, the patches apply on an up-to-date ovn repo with git am
on blobs from the list.

I noticed the mailing list thread order is in reverse with patch
number two preceding patch number one, could that be the issue?
Frode Nordahl Nov. 22, 2019, 10:46 a.m. UTC | #3
On Fri, Nov 22, 2019 at 11:12 AM Frode Nordahl
<frode.nordahl@canonical.com> wrote:
>
> This is weird, the patches apply on an up-to-date ovn repo with git am
> on blobs from the list.
>
> I noticed the mailing list thread order is in reverse with patch
> number two preceding patch number one, could that be the issue?

It may also be me editing the output from ``git format-patch`` prior to
submitting, but cannot reproduce the error reported by the bot locally
with blobs gleaned from list, so would love to hear your views on
what's going on here.
Frode Nordahl Nov. 22, 2019, 11:17 a.m. UTC | #4
On Fri, Nov 22, 2019 at 11:46 AM Frode Nordahl
<frode.nordahl@canonical.com> wrote:
>
> On Fri, Nov 22, 2019 at 11:12 AM Frode Nordahl
> <frode.nordahl@canonical.com> wrote:
> >
> > This is weird, the patches apply on an up-to-date ovn repo with git am
> > on blobs from the list.
> >
> > I noticed the mailing list thread order is in reverse with patch
> > number two preceding patch number one, could that be the issue?
>
> It may also be me editing the output from ``git format-patch`` prior to
> submitting, but cannot reproduce the error reported by the bot locally
> with blobs gleaned from list, so would love to hear your views on
> what's going on here.

Ah yes that must be it, the commit msg is part of the commit sha1
and apparently I have tampered with it, so this is my bad.

Sorry about the noise, I'll submit an untampered v2.
Aaron Conole Nov. 22, 2019, 2:50 p.m. UTC | #5
Frode Nordahl <frode.nordahl@canonical.com> writes:

> On Fri, Nov 22, 2019 at 11:46 AM Frode Nordahl
> <frode.nordahl@canonical.com> wrote:
>>
>> On Fri, Nov 22, 2019 at 11:12 AM Frode Nordahl
>> <frode.nordahl@canonical.com> wrote:
>> >
>> > This is weird, the patches apply on an up-to-date ovn repo with git am
>> > on blobs from the list.
>> >
>> > I noticed the mailing list thread order is in reverse with patch
>> > number two preceding patch number one, could that be the issue?
>>
>> It may also be me editing the output from ``git format-patch`` prior to
>> submitting, but cannot reproduce the error reported by the bot locally
>> with blobs gleaned from list, so would love to hear your views on
>> what's going on here.
>
> Ah yes that must be it, the commit msg is part of the commit sha1
> and apparently I have tampered with it, so this is my bad.
>
> Sorry about the noise, I'll submit an untampered v2.

Glad it worked out :)

In the future, you can try and grab the patch from the patchwork
instance (https://patchwork.ozlabs.org/project/openvswitch/) and check
that it matches and applies.

I'll try to improve the output of the bot as well to include base commit
information (so at least we can try to reproduce the errors).
diff mbox series

Patch

diff --git a/northd/ovn-northd.8.xml b/northd/ovn-northd.8.xml
index 9734e79e6..c6d5d96b9 100644
--- a/northd/ovn-northd.8.xml
+++ b/northd/ovn-northd.8.xml
@@ -74,13 +74,15 @@ 
       <dt><code>pause</code></dt>
       <dd>
         Pauses the ovn-northd operation from processing any Northbound and
-        Southbound database changes.
+        Southbound database changes.  This will also instruct ovn-northd to
+        drop any lock on SB DB.
       </dd>
 
       <dt><code>resume</code></dt>
       <dd>
         Resumes the ovn-northd operation to process Northbound and
-        Southbound database contents and generate logical flows.
+        Southbound database contents and generate logical flows.  This will
+        also instruct ovn-northd to aspire for the lock on SB DB.
       </dd>
 
       <dt><code>is-paused</code></dt>
@@ -91,7 +93,8 @@ 
       <dt><code>status</code></dt>
       <dd>
         Prints this server's status.  Status will be "active" if ovn-northd has
-        acquired OVSDB lock on NB DB, "standby" otherwise.
+        acquired OVSDB lock on SB DB, "standby" if it has not or "paused" if
+        this instance is paused.
       </dd>
       </dl>
     </p>
diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
index a943e1037..e19515d14 100644
--- a/northd/ovn-northd.c
+++ b/northd/ovn-northd.c
@@ -65,6 +65,7 @@  struct northd_context {
     struct ovsdb_idl_index *sbrec_ha_chassis_grp_by_name;
     struct ovsdb_idl_index *sbrec_mcast_group_by_name_dp;
     struct ovsdb_idl_index *sbrec_ip_mcast_by_dp;
+    bool paused;
 };
 
 /* An IPv4 or IPv6 address */
@@ -10836,6 +10837,8 @@  add_column_noalert(struct ovsdb_idl *idl,
     ovsdb_idl_omit_alert(idl, column);
 }
 
+#define OVN_NORTHD_SB_DB_LOCK_NAME "ovn_northd"
+
 int
 main(int argc, char *argv[])
 {
@@ -10843,8 +10846,10 @@  main(int argc, char *argv[])
     struct unixctl_server *unixctl;
     int retval;
     bool exiting;
-    bool paused;
     bool had_lock;
+    struct northd_context ctx;
+
+    memset(&ctx, 0, sizeof(ctx));
 
     fatal_ignore_sigpipe();
     ovs_cmdl_proctitle_init(argc, argv);
@@ -10866,11 +10871,11 @@  main(int argc, char *argv[])
         exit(EXIT_FAILURE);
     }
     unixctl_command_register("exit", "", 0, 0, ovn_northd_exit, &exiting);
-    unixctl_command_register("pause", "", 0, 0, ovn_northd_pause, &paused);
-    unixctl_command_register("resume", "", 0, 0, ovn_northd_resume, &paused);
+    unixctl_command_register("pause", "", 0, 0, ovn_northd_pause, &ctx);
+    unixctl_command_register("resume", "", 0, 0, ovn_northd_resume, &ctx);
     unixctl_command_register("is-paused", "", 0, 0, ovn_northd_is_paused,
-                             &paused);
-    unixctl_command_register("status", "", 0, 0, ovn_northd_status, &had_lock);
+                             &ctx);
+    unixctl_command_register("status", "", 0, 0, ovn_northd_status, &ctx);
 
     daemonize_complete();
 
@@ -11075,23 +11080,21 @@  main(int argc, char *argv[])
     /* Ensure that only a single ovn-northd is active in the deployment by
      * acquiring a lock called "ovn_northd" on the southbound database
      * and then only performing DB transactions if the lock is held. */
-    ovsdb_idl_set_lock(ovnsb_idl_loop.idl, "ovn_northd");
+    ovsdb_idl_set_lock(ovnsb_idl_loop.idl, OVN_NORTHD_SB_DB_LOCK_NAME);
 
     /* Main loop. */
     exiting = false;
-    paused = false;
+    ctx.paused = false;
     had_lock = false;
     while (!exiting) {
-        if (!paused) {
-            struct northd_context ctx = {
-                .ovnnb_idl = ovnnb_idl_loop.idl,
-                .ovnnb_txn = ovsdb_idl_loop_run(&ovnnb_idl_loop),
-                .ovnsb_idl = ovnsb_idl_loop.idl,
-                .ovnsb_txn = ovsdb_idl_loop_run(&ovnsb_idl_loop),
-                .sbrec_ha_chassis_grp_by_name = sbrec_ha_chassis_grp_by_name,
-                .sbrec_mcast_group_by_name_dp = sbrec_mcast_group_by_name_dp,
-                .sbrec_ip_mcast_by_dp = sbrec_ip_mcast_by_dp,
-            };
+        if (!ctx.paused) {
+            ctx.ovnnb_idl = ovnnb_idl_loop.idl;
+            ctx.ovnnb_txn = ovsdb_idl_loop_run(&ovnnb_idl_loop);
+            ctx.ovnsb_idl = ovnsb_idl_loop.idl;
+            ctx.ovnsb_txn = ovsdb_idl_loop_run(&ovnsb_idl_loop);
+            ctx.sbrec_ha_chassis_grp_by_name = sbrec_ha_chassis_grp_by_name;
+            ctx.sbrec_mcast_group_by_name_dp = sbrec_mcast_group_by_name_dp;
+            ctx.sbrec_ip_mcast_by_dp = sbrec_ip_mcast_by_dp;
 
             if (!had_lock && ovsdb_idl_has_lock(ovnsb_idl_loop.idl)) {
                 VLOG_INFO("ovn-northd lock acquired. "
@@ -11125,6 +11128,11 @@  main(int argc, char *argv[])
             ovsdb_idl_run(ovnsb_idl_loop.idl);
             ovsdb_idl_wait(ovnnb_idl_loop.idl);
             ovsdb_idl_wait(ovnsb_idl_loop.idl);
+            /*
+             * the lock is dropped on pause, avoid incorrect message logged
+             * about lock lost when resumed.
+             */
+            had_lock = false;
         }
 
         unixctl_server_run(unixctl);
@@ -11159,30 +11167,41 @@  ovn_northd_exit(struct unixctl_conn *conn, int argc OVS_UNUSED,
 
 static void
 ovn_northd_pause(struct unixctl_conn *conn, int argc OVS_UNUSED,
-                const char *argv[] OVS_UNUSED, void *pause_)
+                const char *argv[] OVS_UNUSED, void *ctx_)
 {
-    bool *pause = pause_;
-    *pause = true;
+    struct northd_context  *ctx = ctx_;
+
+    if (!ctx->paused && ctx->ovnsb_idl != NULL) {
+        /* Drop our aspiration for the lock while paused */
+        ovsdb_idl_set_lock(ctx->ovnsb_idl, NULL);
+        ctx->paused = true;
+        VLOG_INFO("This ovn-northd instance is now paused.");
+    }
 
     unixctl_command_reply(conn, NULL);
 }
 
 static void
 ovn_northd_resume(struct unixctl_conn *conn, int argc OVS_UNUSED,
-                  const char *argv[] OVS_UNUSED, void *pause_)
+                  const char *argv[] OVS_UNUSED, void *ctx_)
 {
-    bool *pause = pause_;
-    *pause = false;
+    struct northd_context *ctx = ctx_;
+
+    if (ctx->paused && ctx->ovnsb_idl != NULL) {
+        /* Reinstate our aspiration for lock */
+        ovsdb_idl_set_lock(ctx->ovnsb_idl, OVN_NORTHD_SB_DB_LOCK_NAME);
+        ctx->paused = false;
+    }
 
     unixctl_command_reply(conn, NULL);
 }
 
 static void
 ovn_northd_is_paused(struct unixctl_conn *conn, int argc OVS_UNUSED,
-                     const char *argv[] OVS_UNUSED, void *paused_)
+                     const char *argv[] OVS_UNUSED, void *ctx_)
 {
-    bool *paused = paused_;
-    if (*paused) {
+    struct northd_context *ctx = ctx_;
+    if (ctx->paused) {
         unixctl_command_reply(conn, "true");
     } else {
         unixctl_command_reply(conn, "false");
@@ -11191,15 +11210,25 @@  ovn_northd_is_paused(struct unixctl_conn *conn, int argc OVS_UNUSED,
 
 static void
 ovn_northd_status(struct unixctl_conn *conn, int argc OVS_UNUSED,
-                  const char *argv[] OVS_UNUSED, void *had_lock_)
+                  const char *argv[] OVS_UNUSED, void *ctx_)
 {
-    bool *had_lock = had_lock_;
+    struct northd_context *ctx = ctx_;
+    char *status;
+
+    if (ctx->paused) {
+        status = "paused";
+    } else if (ctx->ovnsb_idl != NULL) {
+        status = ovsdb_idl_has_lock(ctx->ovnsb_idl) ? "active" : "standby";
+    } else {
+        status = "unknown";
+    }
+
     /*
      * Use a labelled formatted output so we can add more to the status command
      * later without breaking any consuming scripts
      */
     struct ds s = DS_EMPTY_INITIALIZER;
-    ds_put_format(&s, "Status: %s\n", *had_lock ? "active" : "standby");
+    ds_put_format(&s, "Status: %s\n", status);
     unixctl_command_reply(conn, ds_cstr(&s));
     ds_destroy(&s);
 }
diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
index 1c94f2f65..d73a22f68 100644
--- a/tests/ovn-northd.at
+++ b/tests/ovn-northd.at
@@ -899,22 +899,18 @@  OVS_APP_EXIT_AND_WAIT([ovn-northd])
 
 AT_CLEANUP
 
-AT_SETUP([ovn -- ovn-northd status])
-AT_SKIP_IF([test $HAVE_PYTHON = no])
-ovn_start
-
-AT_CHECK([as northd ovn-appctl -t ovn-northd status], [0], [Status: active
-])
-
-AT_CLEANUP
-
 AT_SETUP([ovn -- ovn-northd pause and resume])
 AT_SKIP_IF([test $HAVE_PYTHON = no])
 ovn_start
 
 AT_CHECK([test xfalse = x`as northd ovn-appctl -t ovn-northd is-paused`])
+AT_CHECK([as northd ovn-appctl -t ovn-northd status], [0], [Status: active
+])
 AT_CHECK([test xfalse = x`as northd-backup ovn-appctl -t ovn-northd \
 is-paused`])
+AT_CHECK([as northd-backup ovn-appctl -t ovn-northd status], [0],
+[Status: standby
+])
 
 ovn-nbctl ls-add sw0
 
@@ -931,7 +927,12 @@  OVS_WAIT_UNTIL([
 as northd ovs-appctl -t ovn-northd pause
 as northd-backup ovs-appctl -t ovn-northd pause
 AT_CHECK([test xtrue = x`as northd ovn-appctl -t ovn-northd is-paused`])
+AT_CHECK([as northd ovn-appctl -t ovn-northd status], [0], [Status: paused
+])
 AT_CHECK([test xtrue = x`as northd-backup ovn-appctl -t ovn-northd is-paused`])
+AT_CHECK([as northd-backup ovn-appctl -t ovn-northd status], [0],
+[Status: paused
+])
 
 ovn-nbctl ls-add sw0
 
@@ -944,8 +945,13 @@  OVS_WAIT_UNTIL([
 as northd ovs-appctl -t ovn-northd resume
 as northd-backup ovs-appctl -t ovn-northd resume
 AT_CHECK([test xfalse = x`as northd ovn-appctl -t ovn-northd is-paused`])
+AT_CHECK([as northd ovn-appctl -t ovn-northd status], [0], [Status: active
+])
 AT_CHECK([test xfalse = x`as northd-backup ovn-appctl -t ovn-northd \
 is-paused`])
+AT_CHECK([as northd-backup ovn-appctl -t ovn-northd status], [0],
+[Status: standby
+])
 
 OVS_WAIT_UNTIL([
     ovn-sbctl lflow-list sw0