diff mbox series

[ovs-dev] ovn-northd: Add the option to pause and resume

Message ID 20190617054004.5597-1-nusiddiq@redhat.com
State Changes Requested
Headers show
Series [ovs-dev] ovn-northd: Add the option to pause and resume | expand

Commit Message

Numan Siddique June 17, 2019, 5:40 a.m. UTC
From: Numan Siddique <nusiddiq@redhat.com>

This patch adds 3 unixctl socket comments - pause, resume and is-paused.

Usage: ovs-appctl -t ovn-northd pause/resume/is-paused

This feature will be useful if the CMS wants to
  - deploy OVN DB servers in active/passive mode and
  - run ovn-northd on all these nodes and use unix ctl sockets to
    connect to the local OVN DB servers.

On the nodes where OVN Db ovsdb-servers are in passive mode, the local ovn-northds
will process the DB changes and calculate logical flows to be throw out later
because write txns are not allowed by these ovsdb-servers. It results in
unncessary CPU usage.

With these commands, CMS can pause ovn-northd on these node. A node
which becomes master, can resume the ovn-northd.

This feature will be useful in ovn-kubernetes if the above deployment model
is chosen.

Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
---
 ovn/northd/ovn-northd.8.xml |  17 ++++++
 ovn/northd/ovn-northd.c     | 103 +++++++++++++++++++++++++++---------
 tests/ovn-northd.at         |  38 +++++++++++++
 3 files changed, 132 insertions(+), 26 deletions(-)

Comments

Ben Pfaff July 5, 2019, 9:29 p.m. UTC | #1
On Mon, Jun 17, 2019 at 11:10:04AM +0530, nusiddiq@redhat.com wrote:
> From: Numan Siddique <nusiddiq@redhat.com>
> 
> This patch adds 3 unixctl socket comments - pause, resume and is-paused.
> 
> Usage: ovs-appctl -t ovn-northd pause/resume/is-paused
> 
> This feature will be useful if the CMS wants to
>   - deploy OVN DB servers in active/passive mode and
>   - run ovn-northd on all these nodes and use unix ctl sockets to
>     connect to the local OVN DB servers.
> 
> On the nodes where OVN Db ovsdb-servers are in passive mode, the local ovn-northds
> will process the DB changes and calculate logical flows to be throw out later
> because write txns are not allowed by these ovsdb-servers. It results in
> unncessary CPU usage.
> 
> With these commands, CMS can pause ovn-northd on these node. A node
> which becomes master, can resume the ovn-northd.
> 
> This feature will be useful in ovn-kubernetes if the above deployment model
> is chosen.
> 
> Signed-off-by: Numan Siddique <nusiddiq@redhat.com>

Thank you.  This seems like a helpful feature.

The commit message above explains the idea well at a high level, but the
documentation that it adds does not give the high level idea but only
the low level feature description.  This could make it hard for a system
administrator to understand why the low level feature is there.  I would
suggest sending a v2 that explains the high level idea in the
documentation as well as in the commit message.

Thanks,

Ben.
Numan Siddique July 6, 2019, 6:24 a.m. UTC | #2
On Sat, Jul 6, 2019 at 3:00 AM Ben Pfaff <blp@ovn.org> wrote:

> On Mon, Jun 17, 2019 at 11:10:04AM +0530, nusiddiq@redhat.com wrote:
> > From: Numan Siddique <nusiddiq@redhat.com>
> >
> > This patch adds 3 unixctl socket comments - pause, resume and is-paused.
> >
> > Usage: ovs-appctl -t ovn-northd pause/resume/is-paused
> >
> > This feature will be useful if the CMS wants to
> >   - deploy OVN DB servers in active/passive mode and
> >   - run ovn-northd on all these nodes and use unix ctl sockets to
> >     connect to the local OVN DB servers.
> >
> > On the nodes where OVN Db ovsdb-servers are in passive mode, the local
> ovn-northds
> > will process the DB changes and calculate logical flows to be throw out
> later
> > because write txns are not allowed by these ovsdb-servers. It results in
> > unncessary CPU usage.
> >
> > With these commands, CMS can pause ovn-northd on these node. A node
> > which becomes master, can resume the ovn-northd.
> >
> > This feature will be useful in ovn-kubernetes if the above deployment
> model
> > is chosen.
> >
> > Signed-off-by: Numan Siddique <nusiddiq@redhat.com>
>
> Thank you.  This seems like a helpful feature.
>
> The commit message above explains the idea well at a high level, but the
> documentation that it adds does not give the high level idea but only
> the low level feature description.  This could make it hard for a system
> administrator to understand why the low level feature is there.  I would
> suggest sending a v2 that explains the high level idea in the
> documentation as well as in the commit message.
>

Thanks for the review. Sure I will do that and submit v2.

Numan


>
> Thanks,
>
> Ben.
>
diff mbox series

Patch

diff --git a/ovn/northd/ovn-northd.8.xml b/ovn/northd/ovn-northd.8.xml
index e6417220f..ffec67079 100644
--- a/ovn/northd/ovn-northd.8.xml
+++ b/ovn/northd/ovn-northd.8.xml
@@ -70,6 +70,23 @@ 
       <dd>
         Causes <code>ovn-northd</code> to gracefully terminate.
       </dd>
+
+      <dt><code>pause</code></dt>
+      <dd>
+        Pauses the ovn-northd operation from processing any Northbound and
+        Southbound database changes.
+      </dd>
+
+      <dt><code>resume</code></dt>
+      <dd>
+        Resumes the ovn-northd operation to process Northbound and
+        Southbound database contents and generate logical flows.
+      </dd>
+
+      <dt><code>is-paused</code></dt>
+      <dd>
+        Returns "true" if ovn-northd is currently paused, "false" otherwise.
+      </dd>
       </dl>
     </p>
 
diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
index 0b0a96a3a..d3606dc30 100644
--- a/ovn/northd/ovn-northd.c
+++ b/ovn/northd/ovn-northd.c
@@ -50,6 +50,9 @@ 
 VLOG_DEFINE_THIS_MODULE(ovn_northd);
 
 static unixctl_cb_func ovn_northd_exit;
+static unixctl_cb_func ovn_northd_pause;
+static unixctl_cb_func ovn_northd_resume;
+static unixctl_cb_func ovn_northd_is_paused;
 
 struct northd_context {
     struct ovsdb_idl *ovnnb_idl;
@@ -8639,6 +8642,7 @@  main(int argc, char *argv[])
     struct unixctl_server *unixctl;
     int retval;
     bool exiting;
+    bool paused;
 
     fatal_ignore_sigpipe();
     ovs_cmdl_proctitle_init(argc, argv);
@@ -8653,6 +8657,10 @@  main(int argc, char *argv[])
         exit(EXIT_FAILURE);
     }
     unixctl_command_register("exit", "", 0, 0, ovn_northd_exit, &exiting);
+    unixctl_command_register("pause", "", 0, 0, ovn_northd_pause, &paused);
+    unixctl_command_register("resume", "", 0, 0, ovn_northd_resume, &paused);
+    unixctl_command_register("is-paused", "", 0, 0, ovn_northd_is_paused,
+                             &paused);
 
     daemonize_complete();
 
@@ -8809,31 +8817,39 @@  main(int argc, char *argv[])
 
     /* Main loop. */
     exiting = false;
+    paused = false;
     while (!exiting) {
-        struct northd_context ctx = {
-            .ovnnb_idl = ovnnb_idl_loop.idl,
-            .ovnnb_txn = ovsdb_idl_loop_run(&ovnnb_idl_loop),
-            .ovnsb_idl = ovnsb_idl_loop.idl,
-            .ovnsb_txn = ovsdb_idl_loop_run(&ovnsb_idl_loop),
-            .sbrec_ha_chassis_grp_by_name = sbrec_ha_chassis_grp_by_name,
-        };
-
-        if (!had_lock && ovsdb_idl_has_lock(ovnsb_idl_loop.idl)) {
-            VLOG_INFO("ovn-northd lock acquired. "
-                      "This ovn-northd instance is now active.");
-            had_lock = true;
-        } else if (had_lock && !ovsdb_idl_has_lock(ovnsb_idl_loop.idl)) {
-            VLOG_INFO("ovn-northd lock lost. "
-                      "This ovn-northd instance is now on standby.");
-            had_lock = false;
-        }
-
-        if (ovsdb_idl_has_lock(ovnsb_idl_loop.idl)) {
-            ovn_db_run(&ctx, sbrec_chassis_by_name, &ovnsb_idl_loop);
-            if (ctx.ovnsb_txn) {
-                check_and_add_supported_dhcp_opts_to_sb_db(&ctx);
-                check_and_add_supported_dhcpv6_opts_to_sb_db(&ctx);
-                check_and_update_rbac(&ctx);
+        /* unixctl_server_run could modify the value of 'paused'.
+         * So store the value in local 'paused_' so that we run
+         * 'ovsdb_idl_loop_commit_and_wait() at the end of the loop. */
+        bool paused_ = paused;
+
+        if (!paused_) {
+            struct northd_context ctx = {
+                .ovnnb_idl = ovnnb_idl_loop.idl,
+                .ovnnb_txn = ovsdb_idl_loop_run(&ovnnb_idl_loop),
+                .ovnsb_idl = ovnsb_idl_loop.idl,
+                .ovnsb_txn = ovsdb_idl_loop_run(&ovnsb_idl_loop),
+                .sbrec_ha_chassis_grp_by_name = sbrec_ha_chassis_grp_by_name,
+            };
+
+            if (!had_lock && ovsdb_idl_has_lock(ovnsb_idl_loop.idl)) {
+                VLOG_INFO("ovn-northd lock acquired. "
+                        "This ovn-northd instance is now active.");
+                had_lock = true;
+            } else if (had_lock && !ovsdb_idl_has_lock(ovnsb_idl_loop.idl)) {
+                VLOG_INFO("ovn-northd lock lost. "
+                        "This ovn-northd instance is now on standby.");
+                had_lock = false;
+            }
+
+            if (ovsdb_idl_has_lock(ovnsb_idl_loop.idl)) {
+                ovn_db_run(&ctx, sbrec_chassis_by_name, &ovnsb_idl_loop);
+                if (ctx.ovnsb_txn) {
+                    check_and_add_supported_dhcp_opts_to_sb_db(&ctx);
+                    check_and_add_supported_dhcpv6_opts_to_sb_db(&ctx);
+                    check_and_update_rbac(&ctx);
+                }
             }
         }
 
@@ -8842,8 +8858,11 @@  main(int argc, char *argv[])
         if (exiting) {
             poll_immediate_wake();
         }
-        ovsdb_idl_loop_commit_and_wait(&ovnnb_idl_loop);
-        ovsdb_idl_loop_commit_and_wait(&ovnsb_idl_loop);
+
+        if (!paused_) {
+            ovsdb_idl_loop_commit_and_wait(&ovnnb_idl_loop);
+            ovsdb_idl_loop_commit_and_wait(&ovnsb_idl_loop);
+        }
 
         poll_block();
         if (should_service_stop()) {
@@ -8868,3 +8887,35 @@  ovn_northd_exit(struct unixctl_conn *conn, int argc OVS_UNUSED,
 
     unixctl_command_reply(conn, NULL);
 }
+
+static void
+ovn_northd_pause(struct unixctl_conn *conn, int argc OVS_UNUSED,
+                const char *argv[] OVS_UNUSED, void *pause_)
+{
+    bool *pause = pause_;
+    *pause = true;
+
+    unixctl_command_reply(conn, NULL);
+}
+
+static void
+ovn_northd_resume(struct unixctl_conn *conn, int argc OVS_UNUSED,
+                  const char *argv[] OVS_UNUSED, void *pause_)
+{
+    bool *pause = pause_;
+    *pause = false;
+
+    unixctl_command_reply(conn, NULL);
+}
+
+static void
+ovn_northd_is_paused(struct unixctl_conn *conn, int argc OVS_UNUSED,
+                     const char *argv[] OVS_UNUSED, void *paused_)
+{
+    bool *paused = paused_;
+    if (*paused) {
+        unixctl_command_reply(conn, "true");
+    } else {
+        unixctl_command_reply(conn, "false");
+    }
+}
diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
index 62e58fd0e..0dea04edc 100644
--- a/tests/ovn-northd.at
+++ b/tests/ovn-northd.at
@@ -898,3 +898,41 @@  as northd
 OVS_APP_EXIT_AND_WAIT([ovn-northd])
 
 AT_CLEANUP
+
+AT_SETUP([ovn -- ovn-northd pause and resume])
+AT_SKIP_IF([test $HAVE_PYTHON = no])
+ovn_start
+
+AT_CHECK([test xfalse = x`as northd ovs-appctl -t ovn-northd is-paused`])
+
+ovn-nbctl ls-add sw0
+
+OVS_WAIT_UNTIL([
+    ovn-sbctl lflow-list sw0
+    test 0 = $?])
+
+ovn-nbctl ls-del sw0
+OVS_WAIT_UNTIL([
+    ovn-sbctl lflow-list sw0
+    test 1 = $?])
+
+# Now pause the ovn-northd
+as northd ovs-appctl -t ovn-northd pause
+AT_CHECK([test xtrue = x`as northd ovs-appctl -t ovn-northd is-paused`])
+
+ovn-nbctl ls-add sw0
+
+# There should be no logical flows for sw0 datapath.
+OVS_WAIT_UNTIL([
+    ovn-sbctl lflow-list sw0
+    test 1 = $?])
+
+# Now resume ovn-northd
+as northd ovs-appctl -t ovn-northd resume
+AT_CHECK([test xfalse = x`as northd ovs-appctl -t ovn-northd is-paused`])
+
+OVS_WAIT_UNTIL([
+    ovn-sbctl lflow-list sw0
+    test 0 = $?])
+
+AT_CLEANUP