diff mbox series

[ovs-dev] ovn-controller: Fix busy loop when ofctrl is disconnected.

Message ID 20240320054127.979228-1-hzhou@ovn.org
State Accepted
Headers show
Series [ovs-dev] ovn-controller: Fix busy loop when ofctrl is disconnected. | expand

Checks

Context Check Description
ovsrobot/apply-robot success apply and check: success
ovsrobot/github-robot-_Build_and_Test success github build: passed
ovsrobot/github-robot-_ovn-kubernetes success github build: passed

Commit Message

Han Zhou March 20, 2024, 5:41 a.m. UTC
ovn-controller runs at 100% cpu when OVS exits. This is because the
ofctrl_run is not called while ofctrl_wait is always called in the main
loop. Because of the missing ofctrl_run, it doesn't even detect that the
ofctrl connection is disconnected.

This patch fixes the issue by always giving a chance to run ofctrl_run
as long as ofctrl_wait is called.

Fixes: 1d6d953bf883 ("controller: Don't artificially limit group and meter IDs to 16bit.")
Fixes: 94cbc59dc0f1 ("ovn-controller: Fix use of dangling pointers in I-P runtime_data.")
Signed-off-by: Han Zhou <hzhou@ovn.org>
---
 controller/ofctrl.c         | 2 +-
 controller/ovn-controller.c | 9 +++++----
 2 files changed, 6 insertions(+), 5 deletions(-)

Comments

Dumitru Ceara March 20, 2024, 9:56 a.m. UTC | #1
On 3/20/24 06:41, Han Zhou wrote:
> ovn-controller runs at 100% cpu when OVS exits. This is because the
> ofctrl_run is not called while ofctrl_wait is always called in the main
> loop. Because of the missing ofctrl_run, it doesn't even detect that the
> ofctrl connection is disconnected.
> 
> This patch fixes the issue by always giving a chance to run ofctrl_run
> as long as ofctrl_wait is called.
> 
> Fixes: 1d6d953bf883 ("controller: Don't artificially limit group and meter IDs to 16bit.")
> Fixes: 94cbc59dc0f1 ("ovn-controller: Fix use of dangling pointers in I-P runtime_data.")
> Signed-off-by: Han Zhou <hzhou@ovn.org>
> ---

Thanks for the patch, Han!  I tested it in a sandbox and it fixes the
issue.  Looks good to me:

Acked-by: Dumitru Ceara <dceara@redhat.com>

Regards,
Dumitru
Han Zhou March 20, 2024, 5:56 p.m. UTC | #2
On Wed, Mar 20, 2024 at 2:56 AM Dumitru Ceara <dceara@redhat.com> wrote:
>
> On 3/20/24 06:41, Han Zhou wrote:
> > ovn-controller runs at 100% cpu when OVS exits. This is because the
> > ofctrl_run is not called while ofctrl_wait is always called in the main
> > loop. Because of the missing ofctrl_run, it doesn't even detect that the
> > ofctrl connection is disconnected.
> >
> > This patch fixes the issue by always giving a chance to run ofctrl_run
> > as long as ofctrl_wait is called.
> >
> > Fixes: 1d6d953bf883 ("controller: Don't artificially limit group and
meter IDs to 16bit.")
> > Fixes: 94cbc59dc0f1 ("ovn-controller: Fix use of dangling pointers in
I-P runtime_data.")
> > Signed-off-by: Han Zhou <hzhou@ovn.org>
> > ---
>
> Thanks for the patch, Han!  I tested it in a sandbox and it fixes the
> issue.  Looks good to me:
>
> Acked-by: Dumitru Ceara <dceara@redhat.com>
>
Thanks Dumitru. I applied to main and backported down to branch-23.06.

> Regards,
> Dumitru
>
diff mbox series

Patch

diff --git a/controller/ofctrl.c b/controller/ofctrl.c
index f14cd79a8dbb..1a2d2be791eb 100644
--- a/controller/ofctrl.c
+++ b/controller/ofctrl.c
@@ -787,7 +787,7 @@  ofctrl_run(const struct ovsrec_bridge *br_int,
 
     rconn_run(swconn);
 
-    if (!rconn_is_connected(swconn)) {
+    if (!rconn_is_connected(swconn) || !pending_ct_zones) {
         return reconnected;
     }
 
diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
index 1c9960c708bf..c9ff5967a2af 100644
--- a/controller/ovn-controller.c
+++ b/controller/ovn-controller.c
@@ -5736,10 +5736,11 @@  main(int argc, char *argv[])
                 }
             }
 
-            if (br_int && ovs_feature_set_discovered()) {
+            if (br_int) {
                 ct_zones_data = engine_get_data(&en_ct_zones);
-                if (ct_zones_data && ofctrl_run(br_int, ovs_table,
-                                                &ct_zones_data->pending)) {
+                if (ofctrl_run(br_int, ovs_table,
+                               ct_zones_data ? &ct_zones_data->pending
+                                             : NULL)) {
                     static struct vlog_rate_limit rl
                             = VLOG_RATE_LIMIT_INIT(1, 1);
 
@@ -5748,7 +5749,7 @@  main(int argc, char *argv[])
                     engine_set_force_recompute(true);
                 }
 
-                if (chassis) {
+                if (chassis && ovs_feature_set_discovered()) {
                     encaps_run(ovs_idl_txn, br_int,
                                sbrec_chassis_table_get(ovnsb_idl_loop.idl),
                                chassis,