Message ID | 20191107093643.2434377-1-numans@ovn.org |
---|---|
State | Accepted |
Headers | show |
Series | [ovs-dev,ovn,v2] Fix ha chassis failover issues for stale ha chassis entries | expand |
On Thu, Nov 7, 2019 at 10:37 AM <numans@ovn.org> wrote: > > From: Numan Siddique <numans@ovn.org> > > If ha chassis rows of an HA chassis group become stale i.e the HA_Chassis.chassis > column is empty (because ovn-controller is not running in that chassis) > except one row and when ha_chassis_group_is_active() > is called on that ovn-controller, then it returns false. Ideally it should > become active since its the only active chassis. This patch fixes this issue. > > Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1762777 > Reported-by: Daniel Alvarez <dalvarez@redhat.com> > > Signed-off-by: Numan Siddique <numans@ovn.org> Hi Numan, Looks good to me. Thanks, Dumitru Acked-by: Dumitru Ceara <dceara@redhat.com> > --- > > v1 -> v2 > ------ > * Addresses Dumitru's comments. > > controller/ha-chassis.c | 25 +++++++++++++++++++++++++ > tests/ovn.at | 20 +++++++++++++++++++- > 2 files changed, 44 insertions(+), 1 deletion(-) > > diff --git a/controller/ha-chassis.c b/controller/ha-chassis.c > index 6d9426a5c..d6ec7b658 100644 > --- a/controller/ha-chassis.c > +++ b/controller/ha-chassis.c > @@ -142,6 +142,27 @@ ha_chassis_destroy_ordered(struct ha_chassis_ordered *ordered_ha_ch) > } > } > > +/* Returns true if there is only one active ha chassis in the chassis group > + * (i.e HA_Chassis.chassis column is set) and that active ha chassis is > + * local chassis. > + * Returns false otherwise. */ > +static bool > +is_local_chassis_only_candidate(const struct sbrec_ha_chassis_group *ha_ch_grp, > + const struct sbrec_chassis *local_chassis) > +{ > + size_t n_active_ha_chassis = 0; > + bool local_chassis_present = false; > + for (size_t i = 0; i < ha_ch_grp->n_ha_chassis; i++) { > + if (ha_ch_grp->ha_chassis[i]->chassis) { > + n_active_ha_chassis++; > + if (ha_ch_grp->ha_chassis[i]->chassis == local_chassis) { > + local_chassis_present = true; > + } > + } > + } > + > + return (local_chassis_present && n_active_ha_chassis == 1); > +} > > /* Returns true if the local_chassis is the master of > * the HA chassis group, false otherwise. */ > @@ -159,6 +180,10 @@ ha_chassis_group_is_active( > return (ha_ch_grp->ha_chassis[0]->chassis == local_chassis); > } > > + if (is_local_chassis_only_candidate(ha_ch_grp, local_chassis)) { > + return true; > + } > + > if (sset_is_empty(active_tunnels)) { > /* If active tunnel sset is empty, it means it has lost > * connectivity with other chassis. */ > diff --git a/tests/ovn.at b/tests/ovn.at > index 410f4b514..cb7903db8 100644 > --- a/tests/ovn.at > +++ b/tests/ovn.at > @@ -13413,7 +13413,25 @@ OVS_WAIT_UNTIL( > logical_port=ls1-lp_ext1` > test "$chassis" = "$hv1_uuid"]) > > -OVN_CLEANUP([hv1],[hv2],[hv3]) > +# Stop ovn-controllers on hv1 and hv3. > +as hv1 ovn-appctl -t ovn-controller exit > +as hv3 ovn-appctl -t ovn-controller exit > + > +# hv2 should be master and claim ls1-lp_ext1 > +OVS_WAIT_UNTIL( > + [chassis=`ovn-sbctl --bare --columns chassis find port_binding \ > +logical_port=ls1-lp_ext1` > + test "$chassis" = "$hv2_uuid"]) > + > +as hv1 > +OVS_APP_EXIT_AND_WAIT([ovs-vswitchd]) > +OVS_APP_EXIT_AND_WAIT([ovsdb-server]) > + > +as hv3 > +OVS_APP_EXIT_AND_WAIT([ovs-vswitchd]) > +OVS_APP_EXIT_AND_WAIT([ovsdb-server]) > + > +OVN_CLEANUP([hv2]) > AT_CLEANUP > > AT_SETUP([ovn -- Address Set Incremental Processing]) > -- > 2.23.0 > > _______________________________________________ > dev mailing list > dev@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
On Thu, Nov 7, 2019 at 5:57 PM Dumitru Ceara <dceara@redhat.com> wrote: > > On Thu, Nov 7, 2019 at 10:37 AM <numans@ovn.org> wrote: > > > > From: Numan Siddique <numans@ovn.org> > > > > If ha chassis rows of an HA chassis group become stale i.e the HA_Chassis.chassis > > column is empty (because ovn-controller is not running in that chassis) > > except one row and when ha_chassis_group_is_active() > > is called on that ovn-controller, then it returns false. Ideally it should > > become active since its the only active chassis. This patch fixes this issue. > > > > Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1762777 > > Reported-by: Daniel Alvarez <dalvarez@redhat.com> > > > > Signed-off-by: Numan Siddique <numans@ovn.org> > > Hi Numan, > > Looks good to me. > > Thanks, > Dumitru > > Acked-by: Dumitru Ceara <dceara@redhat.com> Thanks. I applied this to master. Numan > > > --- > > > > v1 -> v2 > > ------ > > * Addresses Dumitru's comments. > > > > controller/ha-chassis.c | 25 +++++++++++++++++++++++++ > > tests/ovn.at | 20 +++++++++++++++++++- > > 2 files changed, 44 insertions(+), 1 deletion(-) > > > > diff --git a/controller/ha-chassis.c b/controller/ha-chassis.c > > index 6d9426a5c..d6ec7b658 100644 > > --- a/controller/ha-chassis.c > > +++ b/controller/ha-chassis.c > > @@ -142,6 +142,27 @@ ha_chassis_destroy_ordered(struct ha_chassis_ordered *ordered_ha_ch) > > } > > } > > > > +/* Returns true if there is only one active ha chassis in the chassis group > > + * (i.e HA_Chassis.chassis column is set) and that active ha chassis is > > + * local chassis. > > + * Returns false otherwise. */ > > +static bool > > +is_local_chassis_only_candidate(const struct sbrec_ha_chassis_group *ha_ch_grp, > > + const struct sbrec_chassis *local_chassis) > > +{ > > + size_t n_active_ha_chassis = 0; > > + bool local_chassis_present = false; > > + for (size_t i = 0; i < ha_ch_grp->n_ha_chassis; i++) { > > + if (ha_ch_grp->ha_chassis[i]->chassis) { > > + n_active_ha_chassis++; > > + if (ha_ch_grp->ha_chassis[i]->chassis == local_chassis) { > > + local_chassis_present = true; > > + } > > + } > > + } > > + > > + return (local_chassis_present && n_active_ha_chassis == 1); > > +} > > > > /* Returns true if the local_chassis is the master of > > * the HA chassis group, false otherwise. */ > > @@ -159,6 +180,10 @@ ha_chassis_group_is_active( > > return (ha_ch_grp->ha_chassis[0]->chassis == local_chassis); > > } > > > > + if (is_local_chassis_only_candidate(ha_ch_grp, local_chassis)) { > > + return true; > > + } > > + > > if (sset_is_empty(active_tunnels)) { > > /* If active tunnel sset is empty, it means it has lost > > * connectivity with other chassis. */ > > diff --git a/tests/ovn.at b/tests/ovn.at > > index 410f4b514..cb7903db8 100644 > > --- a/tests/ovn.at > > +++ b/tests/ovn.at > > @@ -13413,7 +13413,25 @@ OVS_WAIT_UNTIL( > > logical_port=ls1-lp_ext1` > > test "$chassis" = "$hv1_uuid"]) > > > > -OVN_CLEANUP([hv1],[hv2],[hv3]) > > +# Stop ovn-controllers on hv1 and hv3. > > +as hv1 ovn-appctl -t ovn-controller exit > > +as hv3 ovn-appctl -t ovn-controller exit > > + > > +# hv2 should be master and claim ls1-lp_ext1 > > +OVS_WAIT_UNTIL( > > + [chassis=`ovn-sbctl --bare --columns chassis find port_binding \ > > +logical_port=ls1-lp_ext1` > > + test "$chassis" = "$hv2_uuid"]) > > + > > +as hv1 > > +OVS_APP_EXIT_AND_WAIT([ovs-vswitchd]) > > +OVS_APP_EXIT_AND_WAIT([ovsdb-server]) > > + > > +as hv3 > > +OVS_APP_EXIT_AND_WAIT([ovs-vswitchd]) > > +OVS_APP_EXIT_AND_WAIT([ovsdb-server]) > > + > > +OVN_CLEANUP([hv2]) > > AT_CLEANUP > > > > AT_SETUP([ovn -- Address Set Incremental Processing]) > > -- > > 2.23.0 > > > > _______________________________________________ > > dev mailing list > > dev@openvswitch.org > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev > > _______________________________________________ > dev mailing list > dev@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
diff --git a/controller/ha-chassis.c b/controller/ha-chassis.c index 6d9426a5c..d6ec7b658 100644 --- a/controller/ha-chassis.c +++ b/controller/ha-chassis.c @@ -142,6 +142,27 @@ ha_chassis_destroy_ordered(struct ha_chassis_ordered *ordered_ha_ch) } } +/* Returns true if there is only one active ha chassis in the chassis group + * (i.e HA_Chassis.chassis column is set) and that active ha chassis is + * local chassis. + * Returns false otherwise. */ +static bool +is_local_chassis_only_candidate(const struct sbrec_ha_chassis_group *ha_ch_grp, + const struct sbrec_chassis *local_chassis) +{ + size_t n_active_ha_chassis = 0; + bool local_chassis_present = false; + for (size_t i = 0; i < ha_ch_grp->n_ha_chassis; i++) { + if (ha_ch_grp->ha_chassis[i]->chassis) { + n_active_ha_chassis++; + if (ha_ch_grp->ha_chassis[i]->chassis == local_chassis) { + local_chassis_present = true; + } + } + } + + return (local_chassis_present && n_active_ha_chassis == 1); +} /* Returns true if the local_chassis is the master of * the HA chassis group, false otherwise. */ @@ -159,6 +180,10 @@ ha_chassis_group_is_active( return (ha_ch_grp->ha_chassis[0]->chassis == local_chassis); } + if (is_local_chassis_only_candidate(ha_ch_grp, local_chassis)) { + return true; + } + if (sset_is_empty(active_tunnels)) { /* If active tunnel sset is empty, it means it has lost * connectivity with other chassis. */ diff --git a/tests/ovn.at b/tests/ovn.at index 410f4b514..cb7903db8 100644 --- a/tests/ovn.at +++ b/tests/ovn.at @@ -13413,7 +13413,25 @@ OVS_WAIT_UNTIL( logical_port=ls1-lp_ext1` test "$chassis" = "$hv1_uuid"]) -OVN_CLEANUP([hv1],[hv2],[hv3]) +# Stop ovn-controllers on hv1 and hv3. +as hv1 ovn-appctl -t ovn-controller exit +as hv3 ovn-appctl -t ovn-controller exit + +# hv2 should be master and claim ls1-lp_ext1 +OVS_WAIT_UNTIL( + [chassis=`ovn-sbctl --bare --columns chassis find port_binding \ +logical_port=ls1-lp_ext1` + test "$chassis" = "$hv2_uuid"]) + +as hv1 +OVS_APP_EXIT_AND_WAIT([ovs-vswitchd]) +OVS_APP_EXIT_AND_WAIT([ovsdb-server]) + +as hv3 +OVS_APP_EXIT_AND_WAIT([ovs-vswitchd]) +OVS_APP_EXIT_AND_WAIT([ovsdb-server]) + +OVN_CLEANUP([hv2]) AT_CLEANUP AT_SETUP([ovn -- Address Set Incremental Processing])