[ovs-dev,ovn,v2] Fix ha chassis failover issues for stale ha chassis entries
diff mbox series

Message ID 20191107093643.2434377-1-numans@ovn.org
State Accepted
Headers show
Series
  • [ovs-dev,ovn,v2] Fix ha chassis failover issues for stale ha chassis entries
Related show

Commit Message

Numan Siddique Nov. 7, 2019, 9:36 a.m. UTC
From: Numan Siddique <numans@ovn.org>

If ha chassis rows of an HA chassis group become stale i.e the HA_Chassis.chassis
column is empty (because ovn-controller is not running in that chassis)
except one row and when ha_chassis_group_is_active()
is called on that ovn-controller, then it returns false. Ideally it should
become active since its the only active chassis. This patch fixes this issue.

Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1762777
Reported-by: Daniel Alvarez <dalvarez@redhat.com>

Signed-off-by: Numan Siddique <numans@ovn.org>
---

v1 -> v2
------
  * Addresses Dumitru's comments.

 controller/ha-chassis.c | 25 +++++++++++++++++++++++++
 tests/ovn.at            | 20 +++++++++++++++++++-
 2 files changed, 44 insertions(+), 1 deletion(-)

Comments

Dumitru Ceara Nov. 7, 2019, 12:26 p.m. UTC | #1
On Thu, Nov 7, 2019 at 10:37 AM <numans@ovn.org> wrote:
>
> From: Numan Siddique <numans@ovn.org>
>
> If ha chassis rows of an HA chassis group become stale i.e the HA_Chassis.chassis
> column is empty (because ovn-controller is not running in that chassis)
> except one row and when ha_chassis_group_is_active()
> is called on that ovn-controller, then it returns false. Ideally it should
> become active since its the only active chassis. This patch fixes this issue.
>
> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1762777
> Reported-by: Daniel Alvarez <dalvarez@redhat.com>
>
> Signed-off-by: Numan Siddique <numans@ovn.org>

Hi Numan,

Looks good to me.

Thanks,
Dumitru

Acked-by: Dumitru Ceara <dceara@redhat.com>

> ---
>
> v1 -> v2
> ------
>   * Addresses Dumitru's comments.
>
>  controller/ha-chassis.c | 25 +++++++++++++++++++++++++
>  tests/ovn.at            | 20 +++++++++++++++++++-
>  2 files changed, 44 insertions(+), 1 deletion(-)
>
> diff --git a/controller/ha-chassis.c b/controller/ha-chassis.c
> index 6d9426a5c..d6ec7b658 100644
> --- a/controller/ha-chassis.c
> +++ b/controller/ha-chassis.c
> @@ -142,6 +142,27 @@ ha_chassis_destroy_ordered(struct ha_chassis_ordered *ordered_ha_ch)
>      }
>  }
>
> +/* Returns true if there is only one active ha chassis in the chassis group
> + * (i.e HA_Chassis.chassis column is set) and that active ha chassis is
> + * local chassis.
> + * Returns false otherwise. */
> +static bool
> +is_local_chassis_only_candidate(const struct sbrec_ha_chassis_group *ha_ch_grp,
> +                                const struct sbrec_chassis *local_chassis)
> +{
> +    size_t n_active_ha_chassis = 0;
> +    bool local_chassis_present = false;
> +    for (size_t i = 0; i < ha_ch_grp->n_ha_chassis; i++) {
> +        if (ha_ch_grp->ha_chassis[i]->chassis) {
> +            n_active_ha_chassis++;
> +            if (ha_ch_grp->ha_chassis[i]->chassis == local_chassis) {
> +                local_chassis_present = true;
> +            }
> +        }
> +    }
> +
> +    return (local_chassis_present && n_active_ha_chassis == 1);
> +}
>
>  /* Returns true if the local_chassis is the master of
>   * the HA chassis group, false otherwise. */
> @@ -159,6 +180,10 @@ ha_chassis_group_is_active(
>          return (ha_ch_grp->ha_chassis[0]->chassis == local_chassis);
>      }
>
> +    if (is_local_chassis_only_candidate(ha_ch_grp, local_chassis)) {
> +        return true;
> +    }
> +
>      if (sset_is_empty(active_tunnels)) {
>          /* If active tunnel sset is empty, it means it has lost
>           * connectivity with other chassis. */
> diff --git a/tests/ovn.at b/tests/ovn.at
> index 410f4b514..cb7903db8 100644
> --- a/tests/ovn.at
> +++ b/tests/ovn.at
> @@ -13413,7 +13413,25 @@ OVS_WAIT_UNTIL(
>  logical_port=ls1-lp_ext1`
>      test "$chassis" = "$hv1_uuid"])
>
> -OVN_CLEANUP([hv1],[hv2],[hv3])
> +# Stop ovn-controllers on hv1 and hv3.
> +as hv1 ovn-appctl -t ovn-controller exit
> +as hv3 ovn-appctl -t ovn-controller exit
> +
> +# hv2 should be master and claim ls1-lp_ext1
> +OVS_WAIT_UNTIL(
> +    [chassis=`ovn-sbctl --bare --columns chassis find port_binding \
> +logical_port=ls1-lp_ext1`
> +    test "$chassis" = "$hv2_uuid"])
> +
> +as hv1
> +OVS_APP_EXIT_AND_WAIT([ovs-vswitchd])
> +OVS_APP_EXIT_AND_WAIT([ovsdb-server])
> +
> +as hv3
> +OVS_APP_EXIT_AND_WAIT([ovs-vswitchd])
> +OVS_APP_EXIT_AND_WAIT([ovsdb-server])
> +
> +OVN_CLEANUP([hv2])
>  AT_CLEANUP
>
>  AT_SETUP([ovn -- Address Set Incremental Processing])
> --
> 2.23.0
>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Numan Siddique Nov. 7, 2019, 1:35 p.m. UTC | #2
On Thu, Nov 7, 2019 at 5:57 PM Dumitru Ceara <dceara@redhat.com> wrote:
>
> On Thu, Nov 7, 2019 at 10:37 AM <numans@ovn.org> wrote:
> >
> > From: Numan Siddique <numans@ovn.org>
> >
> > If ha chassis rows of an HA chassis group become stale i.e the HA_Chassis.chassis
> > column is empty (because ovn-controller is not running in that chassis)
> > except one row and when ha_chassis_group_is_active()
> > is called on that ovn-controller, then it returns false. Ideally it should
> > become active since its the only active chassis. This patch fixes this issue.
> >
> > Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1762777
> > Reported-by: Daniel Alvarez <dalvarez@redhat.com>
> >
> > Signed-off-by: Numan Siddique <numans@ovn.org>
>
> Hi Numan,
>
> Looks good to me.
>
> Thanks,
> Dumitru
>
> Acked-by: Dumitru Ceara <dceara@redhat.com>

Thanks. I applied this to master.


Numan

>
> > ---
> >
> > v1 -> v2
> > ------
> >   * Addresses Dumitru's comments.
> >
> >  controller/ha-chassis.c | 25 +++++++++++++++++++++++++
> >  tests/ovn.at            | 20 +++++++++++++++++++-
> >  2 files changed, 44 insertions(+), 1 deletion(-)
> >
> > diff --git a/controller/ha-chassis.c b/controller/ha-chassis.c
> > index 6d9426a5c..d6ec7b658 100644
> > --- a/controller/ha-chassis.c
> > +++ b/controller/ha-chassis.c
> > @@ -142,6 +142,27 @@ ha_chassis_destroy_ordered(struct ha_chassis_ordered *ordered_ha_ch)
> >      }
> >  }
> >
> > +/* Returns true if there is only one active ha chassis in the chassis group
> > + * (i.e HA_Chassis.chassis column is set) and that active ha chassis is
> > + * local chassis.
> > + * Returns false otherwise. */
> > +static bool
> > +is_local_chassis_only_candidate(const struct sbrec_ha_chassis_group *ha_ch_grp,
> > +                                const struct sbrec_chassis *local_chassis)
> > +{
> > +    size_t n_active_ha_chassis = 0;
> > +    bool local_chassis_present = false;
> > +    for (size_t i = 0; i < ha_ch_grp->n_ha_chassis; i++) {
> > +        if (ha_ch_grp->ha_chassis[i]->chassis) {
> > +            n_active_ha_chassis++;
> > +            if (ha_ch_grp->ha_chassis[i]->chassis == local_chassis) {
> > +                local_chassis_present = true;
> > +            }
> > +        }
> > +    }
> > +
> > +    return (local_chassis_present && n_active_ha_chassis == 1);
> > +}
> >
> >  /* Returns true if the local_chassis is the master of
> >   * the HA chassis group, false otherwise. */
> > @@ -159,6 +180,10 @@ ha_chassis_group_is_active(
> >          return (ha_ch_grp->ha_chassis[0]->chassis == local_chassis);
> >      }
> >
> > +    if (is_local_chassis_only_candidate(ha_ch_grp, local_chassis)) {
> > +        return true;
> > +    }
> > +
> >      if (sset_is_empty(active_tunnels)) {
> >          /* If active tunnel sset is empty, it means it has lost
> >           * connectivity with other chassis. */
> > diff --git a/tests/ovn.at b/tests/ovn.at
> > index 410f4b514..cb7903db8 100644
> > --- a/tests/ovn.at
> > +++ b/tests/ovn.at
> > @@ -13413,7 +13413,25 @@ OVS_WAIT_UNTIL(
> >  logical_port=ls1-lp_ext1`
> >      test "$chassis" = "$hv1_uuid"])
> >
> > -OVN_CLEANUP([hv1],[hv2],[hv3])
> > +# Stop ovn-controllers on hv1 and hv3.
> > +as hv1 ovn-appctl -t ovn-controller exit
> > +as hv3 ovn-appctl -t ovn-controller exit
> > +
> > +# hv2 should be master and claim ls1-lp_ext1
> > +OVS_WAIT_UNTIL(
> > +    [chassis=`ovn-sbctl --bare --columns chassis find port_binding \
> > +logical_port=ls1-lp_ext1`
> > +    test "$chassis" = "$hv2_uuid"])
> > +
> > +as hv1
> > +OVS_APP_EXIT_AND_WAIT([ovs-vswitchd])
> > +OVS_APP_EXIT_AND_WAIT([ovsdb-server])
> > +
> > +as hv3
> > +OVS_APP_EXIT_AND_WAIT([ovs-vswitchd])
> > +OVS_APP_EXIT_AND_WAIT([ovsdb-server])
> > +
> > +OVN_CLEANUP([hv2])
> >  AT_CLEANUP
> >
> >  AT_SETUP([ovn -- Address Set Incremental Processing])
> > --
> > 2.23.0
> >
> > _______________________________________________
> > dev mailing list
> > dev@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev

Patch
diff mbox series

diff --git a/controller/ha-chassis.c b/controller/ha-chassis.c
index 6d9426a5c..d6ec7b658 100644
--- a/controller/ha-chassis.c
+++ b/controller/ha-chassis.c
@@ -142,6 +142,27 @@  ha_chassis_destroy_ordered(struct ha_chassis_ordered *ordered_ha_ch)
     }
 }
 
+/* Returns true if there is only one active ha chassis in the chassis group
+ * (i.e HA_Chassis.chassis column is set) and that active ha chassis is
+ * local chassis.
+ * Returns false otherwise. */
+static bool
+is_local_chassis_only_candidate(const struct sbrec_ha_chassis_group *ha_ch_grp,
+                                const struct sbrec_chassis *local_chassis)
+{
+    size_t n_active_ha_chassis = 0;
+    bool local_chassis_present = false;
+    for (size_t i = 0; i < ha_ch_grp->n_ha_chassis; i++) {
+        if (ha_ch_grp->ha_chassis[i]->chassis) {
+            n_active_ha_chassis++;
+            if (ha_ch_grp->ha_chassis[i]->chassis == local_chassis) {
+                local_chassis_present = true;
+            }
+        }
+    }
+
+    return (local_chassis_present && n_active_ha_chassis == 1);
+}
 
 /* Returns true if the local_chassis is the master of
  * the HA chassis group, false otherwise. */
@@ -159,6 +180,10 @@  ha_chassis_group_is_active(
         return (ha_ch_grp->ha_chassis[0]->chassis == local_chassis);
     }
 
+    if (is_local_chassis_only_candidate(ha_ch_grp, local_chassis)) {
+        return true;
+    }
+
     if (sset_is_empty(active_tunnels)) {
         /* If active tunnel sset is empty, it means it has lost
          * connectivity with other chassis. */
diff --git a/tests/ovn.at b/tests/ovn.at
index 410f4b514..cb7903db8 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -13413,7 +13413,25 @@  OVS_WAIT_UNTIL(
 logical_port=ls1-lp_ext1`
     test "$chassis" = "$hv1_uuid"])
 
-OVN_CLEANUP([hv1],[hv2],[hv3])
+# Stop ovn-controllers on hv1 and hv3.
+as hv1 ovn-appctl -t ovn-controller exit
+as hv3 ovn-appctl -t ovn-controller exit
+
+# hv2 should be master and claim ls1-lp_ext1
+OVS_WAIT_UNTIL(
+    [chassis=`ovn-sbctl --bare --columns chassis find port_binding \
+logical_port=ls1-lp_ext1`
+    test "$chassis" = "$hv2_uuid"])
+
+as hv1
+OVS_APP_EXIT_AND_WAIT([ovs-vswitchd])
+OVS_APP_EXIT_AND_WAIT([ovsdb-server])
+
+as hv3
+OVS_APP_EXIT_AND_WAIT([ovs-vswitchd])
+OVS_APP_EXIT_AND_WAIT([ovsdb-server])
+
+OVN_CLEANUP([hv2])
 AT_CLEANUP
 
 AT_SETUP([ovn -- Address Set Incremental Processing])