| Message ID | 20260309193614.61980-1-xsimonar@redhat.com |
|---|---|
| State | Superseded |
| Delegated to: | Dumitru Ceara |
| Series | [ovs-dev,v2] controller: Fix bfd up too early after unexpected reboot. |
| Context | Check | Description |
|---|---|---|
| ovsrobot/apply-robot | success | apply and check: success |
| ovsrobot/github-robot-_Build_and_Test | success | github build: passed |
| ovsrobot/github-robot-_ovn-kubernetes | success | github build: passed |
Hi Xavier,

It took a bit for me to look through all of this, but I believe this looks good to me. Thanks for adding the notes about the offline discussions, as I had not recalled all the details of those discussions between then and now.

Acked-by: Mark Michelson <mmichels@redhat.com>

On Mon, Mar 9, 2026 at 3:36 PM Xavier Simonart <xsimonar@redhat.com> wrote:
>
> If a server unexpectedly rebooted, OVS, when restarted, sets BFD
> UP on bfd-enabled geneve tunnels.
> However, if it takes time to restart OVN, an HA gw chassis
> would attract the traffic while being unable to handle it
> (as it has no flows yet), resulting in traffic loss.
>
> This is fixed by re-using the OVS flow-restore-wait option.
> If set, OVS waits (prevents upcalls, ignores bfd, ...) until it is reset.
> Once OVS receives the notification of flow-restore-wait being false,
> it restarts handling upcalls, bfd, ... and ignores any new change to
> flow-restore-wait.
>
> Hence, on chassis hosting HA gateways, OVN toggles flow-restore-wait:
> it sets it to false, waits for the ack from OVS and then sets it back to
> true. If the server reboots, OVS will see flow-restore-wait being true.
>
> OVN also sets external_ids->ovn-managed-flow-restore-wait when setting
> flow-restore-wait. When set, it indicates that OVN has at some point set
> flow-restore-wait.
>
> "ovs-ctl restart" also uses flow-restore-wait: when called, it saves the
> flows, stops "ovs-vswitchd", sets "flow-restore-wait" to true, restarts
> "ovs-vswitchd", restores the flows and finally removes "flow-restore-wait".
> So OVS will wait either for "ovs-ctl restart" to remove "flow-restore-wait"
> or for OVN to set "flow-restore-wait" to false.
>
> Reported-at: https://issues.redhat.com/browse/FDP-3075
> Signed-off-by: Xavier Simonart <xsimonar@redhat.com>
>
> ---
> -v2 : - Updated based on Mark's feedback (commit message, comments).
>       - Avoid setting flow-restore-wait for computes.
>       - Add external_ids->ovn-managed-flow-restore-wait.
> - Updated test: add test for compute update + nits (variable name changes) > --- > controller/bfd.c | 5 +- > controller/bfd.h | 4 +- > controller/ovn-controller.8.xml | 11 + > controller/ovn-controller.c | 171 +++++++++- > tests/multinode-macros.at | 22 ++ > tests/multinode.at | 546 ++++++++++++++++++++++---------- > 6 files changed, 584 insertions(+), 175 deletions(-) > > diff --git a/controller/bfd.c b/controller/bfd.c > index 3b0c3f6da..56bfa4936 100644 > --- a/controller/bfd.c > +++ b/controller/bfd.c > @@ -117,13 +117,14 @@ bfd_calculate_active_tunnels(const struct ovsrec_bridge *br_int, > * > * If 'our_chassis' is C5 then this function returns empty bfd set. > */ > -void > +bool > bfd_calculate_chassis( > const struct sbrec_chassis *our_chassis, > const struct sbrec_ha_chassis_group_table *ha_chassis_grp_table, > struct sset *bfd_chassis) > { > const struct sbrec_ha_chassis_group *ha_chassis_grp; > + bool chassis_is_ha_gw = false; > SBREC_HA_CHASSIS_GROUP_TABLE_FOR_EACH (ha_chassis_grp, > ha_chassis_grp_table) { > bool is_ha_chassis = false; > @@ -143,6 +144,7 @@ bfd_calculate_chassis( > sset_add(&grp_chassis, ha_ch->chassis->name); > if (our_chassis == ha_ch->chassis) { > is_ha_chassis = true; > + chassis_is_ha_gw = true; > bfd_setup_required = true; > } > } > @@ -178,6 +180,7 @@ bfd_calculate_chassis( > } > sset_destroy(&grp_chassis); > } > + return chassis_is_ha_gw; > } > > void > diff --git a/controller/bfd.h b/controller/bfd.h > index f8fece5a5..3e3384891 100644 > --- a/controller/bfd.h > +++ b/controller/bfd.h > @@ -16,6 +16,8 @@ > #ifndef OVN_BFD_H > #define OVN_BFD_H 1 > > +#include <stdbool.h> > + > struct hmap; > struct ovsdb_idl; > struct ovsdb_idl_index; > @@ -36,7 +38,7 @@ void bfd_run(const struct ovsrec_interface_table *, > const struct sbrec_sb_global_table *, > const struct ovsrec_open_vswitch_table *); > > -void bfd_calculate_chassis( > +bool bfd_calculate_chassis( > const struct sbrec_chassis *, > const struct sbrec_ha_chassis_group_table 
*, > struct sset *); > diff --git a/controller/ovn-controller.8.xml b/controller/ovn-controller.8.xml > index 57e7cf5dd..33281a4d6 100644 > --- a/controller/ovn-controller.8.xml > +++ b/controller/ovn-controller.8.xml > @@ -531,6 +531,17 @@ > 65535. > </dd> > > + <dt> > + <code>external_ids:ovn-managed-flow-restore-wait</code> in the > + <code>Open_vSwitch</code> table > + </dt> > + <dd> > + When set to true, this key indicates that <code>ovn-controller</code> > + has set the <code>other_config:flow-restore-wait</code> option. > + The key is set when <code>ovn-controller</code> enables > + flow-restore-wait and removed when it clears it. > + </dd> > + > <dt> > <code>external_ids:ct-zone-*</code> in the <code>Bridge</code> table > </dt> > diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c > index 5815f1b92..0611b1767 100644 > --- a/controller/ovn-controller.c > +++ b/controller/ovn-controller.c > @@ -211,6 +211,150 @@ static char *get_file_system_id(void) > free(filename); > return ret; > } > + > +/* Set/unset flow-restore-wait, and inc ovs next_cfg if false > + * When set to true, also sets ovn-managed-flow-restore-wait to true to > + * indicate ownership */ > +static void set_flow_restore_wait(struct ovsdb_idl_txn *ovs_idl_txn, > + const struct ovsrec_open_vswitch *cfg, > + const struct smap *other_config, > + const char *val, bool ovn_managed) > +{ > + struct smap new_config; > + smap_clone(&new_config, other_config); > + smap_replace(&new_config, "flow-restore-wait", val); > + ovsrec_open_vswitch_set_other_config(cfg, &new_config); > + if (!strcmp(val, "true")) { > + ovsrec_open_vswitch_update_external_ids_setkey( > + cfg, "ovn-managed-flow-restore-wait", "true"); > + } else if (ovn_managed) { > + ovsrec_open_vswitch_update_external_ids_delkey( > + cfg, "ovn-managed-flow-restore-wait"); > + } > + ovsdb_idl_txn_increment(ovs_idl_txn, &cfg->header_, > + &ovsrec_open_vswitch_col_next_cfg, true); > + smap_destroy(&new_config); > +} > + > +static 
void > +manage_flow_restore_wait(struct ovsdb_idl_txn *ovs_idl_txn, > + const struct ovsrec_open_vswitch *cfg, > + uint64_t ofctrl_cur_cfg, uint64_t ovs_next_cfg, > + int ovs_txn_status, bool is_ha_gw) > +{ > + enum flow_restore_wait_state { > + FRW_INIT, /* Initial state */ > + FRW_WAIT_TXN_COMPLETE, /* Sent false, waiting txn to complete */ > + FRW_TXN_SUCCESS, /* Txn completed. Waiting for OVS Ack. */ > + FRW_DONE /* Everything completed */ > + }; > + > + static int64_t frw_next_cfg; > + static enum flow_restore_wait_state frw_state; > + static bool ofctrl_was_connected = false; > + > + bool ofctrl_connected = ofctrl_is_connected(); > + > + if (!ovs_idl_txn || !cfg) { > + return; > + } > + > + /* If OVS is stopped/started, make sure flow-restore-wait is toggled */ > + if (ofctrl_connected && !ofctrl_was_connected) { > + frw_state = FRW_INIT; > + } > + ofctrl_was_connected = ofctrl_connected; > + > + if (!ofctrl_connected) { > + return; > + } > + > + bool frw = smap_get_bool(&cfg->other_config, "flow-restore-wait", false); > + bool ovn_managed_once = smap_get_bool(&cfg->external_ids, > + "ovn-managed-flow-restore-wait", false); > + > + if (frw && !ovn_managed_once) { > + /* frw has been set by ovs-ctl. Do not touch. */ > + return; > + } > + > + if (!is_ha_gw) { > + if (frw) { > + /* frw has once been set by OVN. We are now not an HA chassis > + * anymore, unset it. */ > + set_flow_restore_wait(ovs_idl_txn, cfg, &cfg->other_config, > + "false", ovn_managed_once); > + } > + /* else we are not an HA chassis and frw is false. Ignore it. 
*/ > + return; > + } > + > + switch (frw_state) { > + case FRW_INIT: > + if (ofctrl_cur_cfg > 0) { > + set_flow_restore_wait(ovs_idl_txn, cfg, &cfg->other_config, > + "false", ovn_managed_once); > + frw_state = FRW_WAIT_TXN_COMPLETE; > + VLOG_INFO("Setting flow-restore-wait=false " > + "(cur_cfg=%"PRIu64")", ofctrl_cur_cfg); > + } > + break; > + > + case FRW_WAIT_TXN_COMPLETE: > + /* if (ovs_idl_txn != NULL), the transaction completed. > + * When the transaction completed, it either failed > + * (ovs_txn_status == 0) or succeeded (ovs_txn_status != 0) */ > + if (ovs_txn_status == 0) { > + /* Previous transaction failed. */ > + set_flow_restore_wait(ovs_idl_txn, cfg, &cfg->other_config, > + "false", ovn_managed_once); > + break; > + } > + /* txn succeeded, get next_cfg */ > + frw_next_cfg = ovs_next_cfg; > + frw_state = FRW_TXN_SUCCESS; > + /* fall through */ > + > + case FRW_TXN_SUCCESS: > + if (ovs_next_cfg < frw_next_cfg) { > + /* DB was reset, next_cfg went backwards */ > + VLOG_INFO("OVS DB reset (next_cfg %"PRId64" -> %"PRIu64"), " > + "resetting state", > + frw_next_cfg, ovs_next_cfg); > + set_flow_restore_wait(ovs_idl_txn, cfg, &cfg->other_config, > + "false", ovn_managed_once); > + frw_state = FRW_WAIT_TXN_COMPLETE; > + break; > + } > + > + if (!frw) { > + if (cfg->cur_cfg >= frw_next_cfg) { > + set_flow_restore_wait(ovs_idl_txn, cfg, &cfg->other_config, > + "true", ovn_managed_once); > + frw_state = FRW_DONE; > + VLOG_INFO("Setting flow-restore-wait=true"); > + } > + } else { > + /* The transaction to false succeeded but frw is true. > + * So, another task already set it to true */ > + frw_state = FRW_DONE; > + VLOG_INFO("flow-restore-wait was already true"); > + } > + break; > + case FRW_DONE: > + if (!frw) { > + /* frw has been removed (e.g. by ovs-ctl restart) or is false > + * (e.g. txn failed.) 
*/ > + set_flow_restore_wait(ovs_idl_txn, cfg, &cfg->other_config, > + "false", ovn_managed_once); > + frw_state = FRW_WAIT_TXN_COMPLETE; > + VLOG_INFO("OVS frw cleared, restarting flow-restore-wait sequence " > + "(cur_cfg=%"PRIu64")", ofctrl_cur_cfg); > + } > + break; > + } > +} > + > /* Only set monitor conditions on tables that are available in the > * server schema. > */ > @@ -3381,6 +3525,7 @@ en_mac_cache_cleanup(void *data) > > struct ed_type_bfd_chassis { > struct sset bfd_chassis; > + bool is_ha_gw; > }; > > static void * > @@ -3409,8 +3554,9 @@ en_bfd_chassis_run(struct engine_node *node, void *data OVS_UNUSED) > = chassis_lookup_by_name(sbrec_chassis_by_name, chassis_id); > > sset_clear(&bfd_chassis->bfd_chassis); > - bfd_calculate_chassis(chassis, ha_chassis_grp_table, > - &bfd_chassis->bfd_chassis); > + bfd_chassis->is_ha_gw = bfd_calculate_chassis(chassis, > + ha_chassis_grp_table, > + &bfd_chassis->bfd_chassis); > return EN_UPDATED; > } > > @@ -7117,6 +7263,7 @@ main(int argc, char *argv[]) > struct unixctl_server *unixctl; > struct ovn_exit_args exit_args = {0}; > struct br_int_remote br_int_remote = {0}; > + static uint64_t next_cfg = 0; > int retval; > > /* Read from system-id-override file once on startup. */ > @@ -7444,6 +7591,7 @@ main(int argc, char *argv[]) > > /* Main loop. */ > int ovnsb_txn_status = 1; > + int ovs_txn_status = 1; > bool sb_monitor_all = false; > struct tracked_acl_ids *tracked_acl_ids = NULL; > while (!exit_args.exiting) { > @@ -7545,6 +7693,11 @@ main(int argc, char *argv[]) > pinctrl_update_swconn(br_int_remote.target, > br_int_remote.probe_interval); > > + if (cfg && ovs_idl_txn && ovs_txn_status == -1) { > + /* txn was in progress and is now completed */ > + next_cfg = cfg->next_cfg; > + } > + > /* Enable ACL matching for double tagged traffic. 
*/ > if (ovs_idl_txn && cfg) { > int vlan_limit = smap_get_int( > @@ -7894,6 +8047,13 @@ main(int argc, char *argv[]) > stopwatch_start(OFCTRL_SEQNO_RUN_STOPWATCH_NAME, > time_msec()); > ofctrl_seqno_run(ofctrl_get_cur_cfg()); > + if (ovs_idl_txn && bfd_chassis_data) { > + manage_flow_restore_wait(ovs_idl_txn, cfg, > + ofctrl_get_cur_cfg(), > + next_cfg, ovs_txn_status, > + bfd_chassis_data->is_ha_gw); > + } > + > stopwatch_stop(OFCTRL_SEQNO_RUN_STOPWATCH_NAME, > time_msec()); > stopwatch_start(IF_STATUS_MGR_RUN_STOPWATCH_NAME, > @@ -7993,7 +8153,7 @@ main(int argc, char *argv[]) > OVS_NOT_REACHED(); > } > > - int ovs_txn_status = ovsdb_idl_loop_commit_and_wait(&ovs_idl_loop); > + ovs_txn_status = ovsdb_idl_loop_commit_and_wait(&ovs_idl_loop); > if (!ovs_txn_status) { > /* The transaction failed. */ > vif_plug_clear_deleted( > @@ -8012,6 +8172,9 @@ main(int argc, char *argv[]) > &vif_plug_deleted_iface_ids); > vif_plug_finish_changed( > &vif_plug_changed_iface_ids); > + if (cfg) { > + next_cfg = cfg->next_cfg; > + } > } else if (ovs_txn_status == -1) { > /* The commit is still in progress */ > } else { > @@ -8085,7 +8248,7 @@ loop_done: > } > > ovsdb_idl_loop_commit_and_wait(&ovnsb_idl_loop); > - int ovs_txn_status = ovsdb_idl_loop_commit_and_wait(&ovs_idl_loop); > + ovs_txn_status = ovsdb_idl_loop_commit_and_wait(&ovs_idl_loop); > if (!ovs_txn_status) { > /* The transaction failed. 
*/ > vif_plug_clear_deleted( > diff --git a/tests/multinode-macros.at b/tests/multinode-macros.at > index 4a74d5127..646ee2d79 100644 > --- a/tests/multinode-macros.at > +++ b/tests/multinode-macros.at > @@ -41,6 +41,28 @@ m4_define([M_START_TCPDUMP], > ] > ) > > +m4_define([_M_START_TCPDUMPS_RECURSIVE], [ > + m4_if(m4_eval($# > 3), [1], [dnl > + names="$names $3" > + echo "Running podman exec $1 tcpdump -l $2 >$3.tcpdump 2>$3.stderr" > + podman exec $1 tcpdump -l $2 >$3.tcpdump 2>$3.stderr & > + echo "podman exec $1 ps -ef | grep -v grep | grep tcpdump && podman exec $1 killall tcpdump" >> cleanup > + _M_START_TCPDUMPS_RECURSIVE(m4_shift(m4_shift(m4_shift($@)))) > + ]) > + ] > +) > + > +# Start Multiple tcpdump. Useful to speed up when many tcpdump > +# must be started as waiting for "listening" takes usually 1 second. > +m4_define([M_START_TCPDUMPS], > + [ > + names="" > + _M_START_TCPDUMPS_RECURSIVE($@) > + for name in $names; do > + OVS_WAIT_UNTIL([grep -q "listening" ${name}.stderr]) > + done > + ] > +) > > # M_FORMAT_CT([ip-addr]) > # > diff --git a/tests/multinode.at b/tests/multinode.at > index 6b9614126..24d7ca27c 100644 > --- a/tests/multinode.at > +++ b/tests/multinode.at > @@ -2986,42 +2986,42 @@ AT_CLEANUP > > AT_SETUP([HA: Check for missing garp on leader when BFD goes back up]) > # Network topology > -# ┌────────────────────────────────────────────────────────────────────────────────────────────────────────┐ > -# │ │ > -# │ ┌───────────────────┐ ┌───────────────────┐ ┌───────────────────┐ ┌───────────────────┐ │ > -# │ │ ovn-chassis-1 │ │ ovn-gw-1 │ │ ovn-gw-2 │ │ ovn-chassis-2 │ │ > -# │ └─────────┬─────────┘ └───────────────────┘ └───────────────────┘ └───────────────────┘ │ > -# │ ┌─────────┴─────────┐ │ > -# │ │ inside1 │ │ > -# │ │ 192.168.1.1/24 │ │ > -# │ └─────────┬─────────┘ │ > -# │ ┌─────────┴─────────┐ │ > -# │ │ inside │ │ > -# │ └─────────┬─────────┘ │ > -# │ ┌─────────┴─────────┐ │ > -# │ │ 192.168.1.254 │ │ > -# │ │ R1 │ │ > -# │ │ 
192.168.0.254 │ │ > -# │ └─────────┬─────────┘ │ > -# │ └------eth1---------------┬--------eth1-----------┐ │ > -# │ ┌──────────┴────────┐ ┌─────────┴─────────┐ │ > -# │ │ 192.168.1.254 │ │ 192.168.1.254 │ │ > -# │ │ R1 │ │ R1 │ │ > -# │ │ 192.168.0.254 │ │ 192.168.0.254 │ │ > -# │ └─────────┬─────────┘ └─────────┬─────────┘ │ > -# │ │ │ ┌───────────────────┐ │ > -# │ ┌─────────┴─────────┐ ┌─────────┴─────────┐ │ 192.168.0.1 │ │ > -# │ │ outside │ │ outside │ │ ext1 │ │ > -# │ └─────────┬─────────┘ └─────────┬─────────┘ └─────────┬─────────┘ │ > -# │ ┌─────────┴─────────┐ ┌─────────┴─────────┐ ┌─────────┴─────────┐ │ > -# │ │ ln-outside │ │ ln-outside │ │ ln-ext1 │ │ > -# │ └─────────┬─────────┘ └─────────┬─────────┘ └─────────┬─────────┘ │ > -# │ ┌─────────┴─────────┐ ┌─────────┴─────────┐ ┌─────────┴─────────┐ │ > -# │ │ br-ex │ │ br-ex │ │ br-ex │ │ > -# │ └─────────┬─────────┘ └─────────┬─────────┘ └─────────┬─────────┘ │ > -# │ └---------eth2-----------┴-------eth2-------------┘ │ > -# │ │ > -# └────────────────────────────────────────────────────────────────────────────────────────────────────────┘ > +# ┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ > +# │ │ > +# │ ┌───────────────────┐ ┌───────────────────┐ ┌───────────────────┐ ┌───────────────────┐ ┌───────────────────┐ │ > +# │ │ ovn-chassis-1 │ │ ovn-chassis-2 │ │ ovn-gw-1 │ │ ovn-gw-2 │ │ ovn-chassis-3 │ │ > +# │ └─────────┬─────────┘ └─────────┬─────────┘ └───────────────────┘ └───────────────────┘ └───────────────────┘ │ > +# │ ┌─────────┴─────────┐ ┌─────────┴─────────┐ │ > +# │ │ inside1 │ │ inside2 │ │ > +# │ │ 192.168.1.1/24 │ │ 192.168.1.2/24 │ │ > +# │ └─────────┬─────────┘ └─────────┬─────────┘ │ > +# │ ┌─┴────────────────────────┴─┐ │ > +# │ │ inside │ │ > +# │ └──────────────┬─────────────┘ │ > +# │ ┌─────────┴─────────┐ │ > +# │ │ 192.168.1.254 │ │ > +# │ │ R1 │ │ > +# │ │ 192.168.0.254 │ │ > +# │ 
└─────────┬─────────┘ │ > +# │ └------eth1---------------------------┬--------eth1-----------┐ │ > +# │ ┌──────────┴────────┐ ┌─────────┴─────────┐ │ > +# │ │ 192.168.1.254 │ │ 192.168.1.254 │ │ > +# │ │ R1 │ │ R1 │ │ > +# │ │ 192.168.0.254 │ │ 192.168.0.254 │ │ > +# │ └─────────┬─────────┘ └─────────┬─────────┘ │ > +# │ │ │ ┌───────────────────┐ │ > +# │ ┌─────────┴─────────┐ ┌─────────┴─────────┐ │ 192.168.0.1 │ │ > +# │ │ outside │ │ outside │ │ ext1 │ │ > +# │ └─────────┬─────────┘ └─────────┬─────────┘ └─────────┬─────────┘ │ > +# │ ┌─────────┴─────────┐ ┌─────────┴─────────┐ ┌─────────┴─────────┐ │ > +# │ │ ln-outside │ │ ln-outside │ │ ln-ext1 │ │ > +# │ └─────────┬─────────┘ └─────────┬─────────┘ └─────────┬─────────┘ │ > +# │ ┌─────────┴─────────┐ ┌─────────┴─────────┐ ┌─────────┴─────────┐ │ > +# │ │ br-ex │ │ br-ex │ │ br-ex │ │ > +# │ └─────────┬─────────┘ └─────────┬─────────┘ └─────────┬─────────┘ │ > +# │ └---------eth2-----------┴-------eth2-------------┘ │ > +# │ │ > +# └────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ > > # The goal of this test is the check that GARP are properly generated by higest priority traffic when > # BFD goes down, and back up, and this whether the BFD event is due either to some bfd packet lost > @@ -3030,6 +3030,12 @@ AT_SETUP([HA: Check for missing garp on leader when BFD goes back up]) > # So gw3 should in this test neither send garp or receive packets. > # > # Enable vconn so we can check the GARP from a log perspective. 
> +on_exit "podman exec ovn-gw-1 ovn-appctl vlog/set info" > +on_exit "podman exec ovn-gw-1 ovn-appctl vlog/enable-rate-limit" > +on_exit "podman exec ovn-gw-2 ovn-appctl vlog/set info" > +on_exit "podman exec ovn-gw-2 ovn-appctl vlog/enable-rate-limit" > +on_exit "podman exec ovn-gw-3 ovn-appctl vlog/set info" > +on_exit "podman exec ovn-gw-3 ovn-appctl vlog/enable-rate-limit" > m_as ovn-gw-1 ovn-appctl vlog/set vconn:dbg > m_as ovn-gw-2 ovn-appctl vlog/set vconn:dbg > m_as ovn-gw-3 ovn-appctl vlog/set vconn:dbg > @@ -3037,12 +3043,17 @@ m_as ovn-gw-1 ovn-appctl vlog/disable-rate-limit > m_as ovn-gw-2 ovn-appctl vlog/disable-rate-limit > m_as ovn-gw-3 ovn-appctl vlog/disable-rate-limit > > +# Decrease revalidation time on ovs switch simulating ToR. > +on_exit "OVS_RUNDIR= ovs-vsctl set Open_vSwitch . other_config:max-revalidator=500" > +OVS_RUNDIR= ovs-vsctl set Open_vSwitch . other_config:max-revalidator=100 > + > check_fake_multinode_setup > > # Delete the multinode NB and OVS resources before starting the test. 
> cleanup_multinode_resources > > ip_ch1=$(m_as ovn-chassis-1 ip a show dev eth1 | grep "inet " | awk '{print $2}'| cut -d '/' -f1) > +ip_ch2=$(m_as ovn-chassis-2 ip a show dev eth1 | grep "inet " | awk '{print $2}'| cut -d '/' -f1) > ip_gw1=$(m_as ovn-gw-1 ip a show dev eth1 | grep "inet " | awk '{print $2}'| cut -d '/' -f1) > ip_gw2=$(m_as ovn-gw-2 ip a show dev eth1 | grep "inet " | awk '{print $2}'| cut -d '/' -f1) > ip_gw3=$(m_as ovn-gw-3 ip a show dev eth1 | grep "inet " | awk '{print $2}'| cut -d '/' -f1) > @@ -3050,25 +3061,35 @@ ip_gw3=$(m_as ovn-gw-3 ip a show dev eth1 | grep "inet " | awk '{print $2}'| cut > from_gw1_to_gw2=$(m_as ovn-gw-1 ovs-vsctl --bare --columns=name find interface options:remote_ip=$ip_gw2) > from_gw1_to_gw3=$(m_as ovn-gw-1 ovs-vsctl --bare --columns=name find interface options:remote_ip=$ip_gw3) > from_gw1_to_ch1=$(m_as ovn-gw-1 ovs-vsctl --bare --columns=name find interface options:remote_ip=$ip_ch1) > +from_gw1_to_ch2=$(m_as ovn-gw-1 ovs-vsctl --bare --columns=name find interface options:remote_ip=$ip_ch2) > from_gw2_to_gw1=$(m_as ovn-gw-2 ovs-vsctl --bare --columns=name find interface options:remote_ip=$ip_gw1) > from_gw2_to_gw3=$(m_as ovn-gw-2 ovs-vsctl --bare --columns=name find interface options:remote_ip=$ip_gw3) > from_gw2_to_ch1=$(m_as ovn-gw-2 ovs-vsctl --bare --columns=name find interface options:remote_ip=$ip_ch1) > +from_gw2_to_ch2=$(m_as ovn-gw-2 ovs-vsctl --bare --columns=name find interface options:remote_ip=$ip_ch2) > from_ch1_to_gw1=$(m_as ovn-chassis-1 ovs-vsctl --bare --columns=name find interface options:remote_ip=$ip_gw1) > from_ch1_to_gw2=$(m_as ovn-chassis-1 ovs-vsctl --bare --columns=name find interface options:remote_ip=$ip_gw2) > +from_ch2_to_gw1=$(m_as ovn-chassis-2 ovs-vsctl --bare --columns=name find interface options:remote_ip=$ip_gw1) > +from_ch2_to_gw2=$(m_as ovn-chassis-2 ovs-vsctl --bare --columns=name find interface options:remote_ip=$ip_gw2) > > m_as ovn-chassis-1 ip link del hv1-vif1-p > -m_as 
ovn-chassis-2 ip link del ext1-p > +m_as ovn-chassis-2 ip link del hv2-vif1-p > +m_as ovn-chassis-3 ip link del ext1-p > > OVS_WAIT_UNTIL([m_as ovn-chassis-1 ip link show | grep -q genev_sys]) > OVS_WAIT_UNTIL([m_as ovn-chassis-2 ip link show | grep -q genev_sys]) > +OVS_WAIT_UNTIL([m_as ovn-chassis-3 ip link show | grep -q genev_sys]) > OVS_WAIT_UNTIL([m_as ovn-gw-1 ip link show | grep -q genev_sys]) > OVS_WAIT_UNTIL([m_as ovn-gw-2 ip link show | grep -q genev_sys]) > OVS_WAIT_UNTIL([m_as ovn-gw-3 ip link show | grep -q genev_sys]) > > +# Use "aggressive" bfd parameters > +check multinode_nbctl set NB_Global . options:"bfd-min-rx"=500 > +check multinode_nbctl set NB_Global . options:"bfd-min-tx"=100 > check multinode_nbctl ls-add inside > check multinode_nbctl ls-add outside > check multinode_nbctl ls-add ext > check multinode_nbctl lsp-add inside inside1 -- lsp-set-addresses inside1 "f0:00:c0:a8:01:01 192.168.1.1" > +check multinode_nbctl lsp-add inside inside2 -- lsp-set-addresses inside2 "f0:00:c0:a8:01:02 192.168.1.2" > check multinode_nbctl lsp-add ext ext1 -- lsp-set-addresses ext1 "00:00:c0:a8:00:01 192.168.0.1" > > multinode_nbctl create Logical_Router name=R1 > @@ -3100,12 +3121,14 @@ m_as ovn-gw-3 ovs-vsctl remove open . external_ids garp-max-timeout-sec > > m_as ovn-chassis-1 ovs-vsctl set open . external-ids:ovn-bridge-mappings=public:br-ex > m_as ovn-chassis-2 ovs-vsctl set open . external-ids:ovn-bridge-mappings=public:br-ex > +m_as ovn-chassis-3 ovs-vsctl set open . external-ids:ovn-bridge-mappings=public:br-ex > m_as ovn-gw-1 ovs-vsctl set open . external-ids:ovn-bridge-mappings=public:br-ex > m_as ovn-gw-2 ovs-vsctl set open . external-ids:ovn-bridge-mappings=public:br-ex > m_as ovn-gw-3 ovs-vsctl set open . 
external-ids:ovn-bridge-mappings=public:br-ex > > m_as ovn-chassis-1 /data/create_fake_vm.sh inside1 hv1-vif1 f0:00:c0:a8:01:01 1500 192.168.1.1 24 192.168.1.254 2000::1/64 2000::a > -m_as ovn-chassis-2 /data/create_fake_vm.sh ext1 ext1 00:00:c0:a8:00:01 1500 192.168.0.1 24 192.168.0.254 1000::3/64 1000::a > +m_as ovn-chassis-2 /data/create_fake_vm.sh inside2 hv2-vif1 f0:00:c0:a8:01:02 1500 192.168.1.2 24 192.168.1.254 2000::2/64 2000::a > +m_as ovn-chassis-3 /data/create_fake_vm.sh ext1 ext1 00:00:c0:a8:00:01 1500 192.168.0.1 24 192.168.0.254 1000::3/64 1000::a > > # There should be one ha_chassis_group with the name "R1_outside" > m_check_row_count HA_Chassis_Group 1 name=R1_outside > @@ -3160,53 +3183,67 @@ for chassis in $from_ch1_to_gw1 $from_ch1_to_gw2; do > wait_bfd_enabled ovn-chassis-1 $chassis > done > > +# check BFD enablement on tunnel ports from ovn-chassis-2 ########### > +for chassis in $from_ch2_to_gw1 $from_ch2_to_gw2; do > + echo "checking ovn-chassis-2 -> $chassis" > + wait_bfd_enabled ovn-chassis-2 $chassis > +done > + > # Make sure there is no nft table left. Do not use nft directly as might not be installed in container. 
> gw1_pid=$(podman inspect -f '{{.State.Pid}}' ovn-gw-1) > nsenter --net=/proc/$gw1_pid/ns/net nft list tables | grep ovn-test && nsenter --net=/proc/$gw1_pid/ns/net nft delete table ip ovn-test > -on_exit "nsenter --net=/proc/$gw1_pid/ns/net nft list tables | grep ovn-test && nsenter --net=/proc/$gw1_pid/ns/net nft delete table ip ovn-test" > +on_exit "if [[ -d "/proc/$gw1_pid" ]]; then nsenter --net=/proc/$gw1_pid/ns/net nft list tables | grep ovn-test && nsenter --net=/proc/$gw1_pid/ns/net nft delete table ip ovn-test; fi" > > -for chassis in $from_gw1_to_gw2 $from_gw1_to_gw3 $from_gw1_to_ch1; do > +for chassis in $from_gw1_to_gw2 $from_gw1_to_gw3 $from_gw1_to_ch1 $from_gw1_to_ch2; do > wait_bfd_up ovn-gw-1 $chassis > done > -for chassis in $from_gw2_to_gw1 $from_gw2_to_gw3 $from_gw2_to_ch1; do > +for chassis in $from_gw2_to_gw1 $from_gw2_to_gw3 $from_gw2_to_ch1 $from_gw2_to_ch2; do > wait_bfd_up ovn-gw-2 $chassis > done > for chassis in $from_ch1_to_gw1 $from_ch1_to_gw2; do > wait_bfd_up ovn-chassis-1 $chassis > done > +for chassis in $from_ch2_to_gw1 $from_ch2_to_gw2; do > + wait_bfd_up ovn-chassis-2 $chassis > +done > > m_wait_row_count Port_Binding 1 logical_port=cr-R1_outside chassis=$gw1_chassis > check multinode_nbctl --wait=hv sync > > start_tcpdump() { > echo "$(date +%H:%M:%S.%03N) Starting tcpdump" > - M_START_TCPDUMP([ovn-chassis-1], [-neei hv1-vif1-p], [ch1]) > - M_START_TCPDUMP([ovn-chassis-2], [-neei eth2], [ch2]) > - M_START_TCPDUMP([ovn-gw-1], [-neei eth2], [gw1]) > - M_START_TCPDUMP([ovn-gw-1], [-neei eth2 -Q out], [gw1_out]) > - M_START_TCPDUMP([ovn-gw-2], [-neei eth2], [gw2]) > - M_START_TCPDUMP([ovn-gw-2], [-neei eth2 -Q out], [gw2_out]) > - M_START_TCPDUMP([ovn-gw-3], [-neei eth2], [gw3]) > - M_START_TCPDUMP([ovn-gw-3], [-neei eth2 -Q out], [gw3_out]) > + M_START_TCPDUMPS([ovn-chassis-1], [-neei hv1-vif1-p], [ch1], > + [ovn-chassis-2], [-neei hv2-vif1-p], [ch2], > + [ovn-chassis-3], [-neei eth2], [ch3], > + [ovn-gw-1], [-neei eth2], [gw1], 
> + [ovn-gw-1], [-neei eth2 -Q out], [gw1_out], > + [ovn-gw-2], [-neei eth2], [gw2], > + [ovn-gw-2], [-neei eth2 -Q out], [gw2_out], > + [ovn-gw-3], [-neei eth2], [gw3], > + [ovn-gw-3], [-neei eth2 -Q out], [gw3_out], > + [ovn-gw-1], [-neei eth1], [gw1_eth1], > + [ovn-gw-2], [-neei eth1], [gw2_eth1], > + [ovn-chassis-1], [-neei eth1], [ch1_eth1], > + [ovn-chassis-2], [-neei eth1], [ch2_eth1]) > } > > stop_tcpdump() { > echo "$(date +%H:%M:%S.%03N) Stopping tcpdump" > - m_kill 'ovn-gw-1 ovn-gw-2 ovn-gw-3 ovn-chassis-1 ovn-chassis-2' tcpdump > + m_kill 'ovn-gw-1 ovn-gw-2 ovn-gw-3 ovn-chassis-1 ovn-chassis-2 ovn-chassis-3' tcpdump > } > > -# Send packets from chassis2 (ext1) to chassis1 > +# Send packets from ovn-chassis-3 (ext1) to ovn-chassis-1 > send_background_packets() { > echo "$(date +%H:%M:%S.%03N) Sending packets in Background" > start_tcpdump > - M_NS_DAEMONIZE([ovn-chassis-2], [ext1], [ping -f -i 0.1 192.168.1.1], [ping.pid]) > + M_NS_DAEMONIZE([ovn-chassis-3], [ext1], [ping -f -i 0.1 192.168.1.1], [ping.pid]) > } > > stop_sending_background_packets() { > echo "$(date +%H:%M:%S.%03N) Stopping Background process" > m_as ovn-chassis-1 ps -ef | grep -v grep | grep -q ping && \ > m_as ovn-chassis-1 echo "Stopping ping on ovn-chassis-1" && killall ping > - m_as ovn-chassis-2 ps -ef | grep -v grep | grep -q ping && \ > + m_as ovn-chassis-3 ps -ef | grep -v grep | grep -q ping && \ > m_as ovn-chassis-2 echo "Stopping ping on ovn-chassis-2" && killall ping > stop_tcpdump > } > @@ -3216,8 +3253,8 @@ check_for_new_garps() { > expecting_garp=$2 > n_new_garps=$(cat ${hv}_out.tcpdump | grep -c "f0:00:c0:a8:00:fe > Broadcast, ethertype ARP (0x0806), length 42: Request who-has 192.168.0.254 tell 192.168.0.254, length 28") > > - if [ "$expecting_garp" == "true" ]; then > - AS_BOX([$(date +%H:%M:%S.%03N) Waiting/checking for garp from $hv - Starting with $n_new_garps]) > + if [[ "$expecting_garp" == "true" ]]; then > + echo "$(date +%H:%M:%S.%03N) Waiting/checking for garp 
from $hv - Starting with $n_new_garps" > OVS_WAIT_UNTIL([ > n_garps=$n_new_garps > n_new_garps=$(cat ${hv}_out.tcpdump | grep -c "f0:00:c0:a8:00:fe > Broadcast, ethertype ARP (0x0806), length 42: Request who-has 192.168.0.254 tell 192.168.0.254, length 28") > @@ -3225,7 +3262,7 @@ check_for_new_garps() { > test "$n_garps" -ne "$n_new_garps" > ]) > else > - AS_BOX([$(date +%H:%M:%S.%03N) Checking no garp from ${hv}]) > + echo "$(date +%H:%M:%S.%03N) Checking no garp from ${hv}" > # Waiting a few seconds to get a chance to see unexpected garps. > sleep 3 > n_garps=$(cat ${hv}_out.tcpdump | grep -c "f0:00:c0:a8:00:fe > Broadcast, ethertype ARP (0x0806), length 42: Request who-has 192.168.0.254 tell 192.168.0.254, length 28") > @@ -3241,8 +3278,8 @@ check_for_new_echo_pkts() { > n_new_echo_req=$(cat ${hv}.tcpdump | grep -c "$mac_src > $mac_dst, ethertype IPv4 (0x0800), length 98: 192.168.0.1 > 192.168.1.1: ICMP echo request") > n_new_echo_rep=$(cat ${hv}.tcpdump | grep -c "$mac_dst > $mac_src, ethertype IPv4 (0x0800), length 98: 192.168.1.1 > 192.168.0.1: ICMP echo reply") > > - if [ "$expecting_pkts" == "true" ]; then > - AS_BOX([$(date +%H:%M:%S.%03N) Waiting/checking for echo pkts through ${hv}]) > + if [[ "$expecting_pkts" == "true" ]]; then > + echo "$(date +%H:%M:%S.%03N) Waiting/checking for echo pkts through ${hv}" > echo "Starting with $n_new_echo_req requests and $n_new_echo_rep replies so far on ${hv}." > OVS_WAIT_UNTIL([ > n_echo_req=$n_new_echo_req > @@ -3253,7 +3290,7 @@ check_for_new_echo_pkts() { > test "$n_echo_req" -ne "$n_new_echo_req" && test "$n_echo_rep" -ne "$n_new_echo_rep" > ]) > else > - AS_BOX([$(date +%H:%M:%S.%03N) Checking no pkts from ${hv}]) > + echo "$(date +%H:%M:%S.%03N) Checking no pkts from ${hv}" > # Waiting a few seconds to get a chance to see unexpected pkts. 
> sleep 3 > n_echo_req=$(cat ${hv}.tcpdump | grep -c "$mac_src > $mac_dst, ethertype IPv4 (0x0800), length 98: 192.168.0.1 > 192.168.1.1: ICMP echo request") > @@ -3271,22 +3308,44 @@ dump_statistics() { > ch1_rep=$(grep -c "ICMP echo reply" ch1.tcpdump) > ch2_req=$(grep -c "ICMP echo request" ch2.tcpdump) > ch2_rep=$(grep -c "ICMP echo reply" ch2.tcpdump) > + ch3_req=$(grep -c "ICMP echo request" ch3.tcpdump) > + ch3_rep=$(grep -c "ICMP echo reply" ch3.tcpdump) > gw1_req=$(grep -c "ICMP echo request" gw1.tcpdump) > gw1_rep=$(grep -c "ICMP echo reply" gw1.tcpdump) > gw2_req=$(grep -c "ICMP echo request" gw2.tcpdump) > gw2_rep=$(grep -c "ICMP echo reply" gw2.tcpdump) > gw3_req=$(grep -c "ICMP echo request" gw3.tcpdump) > gw3_rep=$(grep -c "ICMP echo reply" gw3.tcpdump) > - echo "$n1 claims in gw1, $n2 in gw2 and $n3 on gw3" > - echo "ch2_request=$ch2_req gw1_request=$gw1_req gw2_request=$gw2_req gw3_request=$gw3_req ch1_request=$ch1_req ch1_reply=$ch1_rep gw1_reply=$gw1_rep gw2_reply=$gw2_rep gw3_reply=$gw3_rep ch2_reply=$ch2_rep" > + echo "$n1 claims in gw1, $n2 in gw2 and $n3 on gw3" >&2 > + echo "ch3_req=$ch3_req gw_req=($gw1_req + $gw2_req +$gw3_req) ch1_req=$ch1_req ch1_rep=$ch1_rep gw_rep=($gw1_rep + $gw2_rep + $gw3_rep) ch3_rep=$ch3_rep ch2=($ch2_req+$ch2_rep)" >&2 > + echo "$((ch3_req - ch3_rep))" > } > > -check_migration_between_gw1_and_gw2() { > - action=$1 > - send_background_packets > +add_port() { > + bridge=$1 > + interface=$2 > + address=$3 > + echo "Adding $bridge $interface $address" > + > + pid=$(podman inspect -f '{{.State.Pid}}' ovn-gw-1) > + ln -sf /proc/$pid/ns/net /var/run/netns/$pid > + port=$(OVS_RUNDIR= ovs-vsctl --data=bare --no-heading --columns=name find interface \ > + external_ids:container_id=ovn-gw-1 external_ids:container_iface="$interface") > + port="${port:0:13}" > + ip link add "${port}_l" type veth peer name "${port}_c" > + ip link set "${port}_l" up > + ip link set "${port}_c" netns $pid > + ip netns exec $pid ip link set dev 
"${port}_c" name "$interface" > + ip netns exec $pid ip link set "$interface" up > + if [[ -n "$address" ]]; then > + ip netns exec $pid ip addr add "$address" dev "$interface" > + fi > +} > > +prepare() { > + send_background_packets > # We make sure gw1 is leader since enough time that it generated all its garps. > - AS_BOX([$(date +%H:%M:%S.%03N) Waiting all garps sent by gw1]) > + echo $(date +%H:%M:%S.%03N) Waiting all garps sent by gw1 > n_new_garps=$(cat gw1_out.tcpdump | grep -c "f0:00:c0:a8:00:fe > Broadcast, ethertype ARP (0x0806), length 42: Request who-has 192.168.0.254 tell 192.168.0.254, length 28") > OVS_WAIT_UNTIL([ > n_garps=$n_new_garps > @@ -3302,130 +3361,267 @@ check_migration_between_gw1_and_gw2() { > check_for_new_echo_pkts gw2 "00:00:c0:a8:00:01" "f0:00:c0:a8:00:fe" "false" > check_for_new_echo_pkts gw3 "00:00:c0:a8:00:01" "f0:00:c0:a8:00:fe" "false" > > + # All packets should go through gw1, and none through gw2 or gw3. > + check_packets "true" "false" "false" "true" > flap_count_gw_1=$(m_as ovn-gw-1 ovs-vsctl get interface $from_gw1_to_gw2 bfd_status | sed 's/.*flap_count=\"\([[0-9]]*\).*/\1/g') > flap_count_gw_2=$(m_as ovn-gw-2 ovs-vsctl get interface $from_gw2_to_gw1 bfd_status | sed 's/.*flap_count=\"\([[0-9]]*\).*/\1/g') > +} > > - if [ test "$action" == "stop_bfd" ]; then > - AS_BOX([$(date +%H:%M:%S.%03N) Blocking bfd on gw1 (from $ip_gw1 to $ip_gw2)]) > - nsenter --net=/proc/$gw1_pid/ns/net nft add table ip ovn-test > - nsenter --net=/proc/$gw1_pid/ns/net nft 'add chain ip ovn-test INPUT { type filter hook input priority 0; policy accept; }' > - # Drop BFD from gw-1 to gw-2: geneve port (6081), inner port 3784 (0xec8), Session state Up, Init, Down. 
> - nsenter --net=/proc/$gw1_pid/ns/net nft add rule ip ovn-test INPUT ip daddr $ip_gw1 ip saddr $ip_gw2 udp dport 6081 '@th,416,16 == 0x0ec8 @th,472,8 == 0xc0 counter drop' > - nsenter --net=/proc/$gw1_pid/ns/net nft add rule ip ovn-test INPUT ip daddr $ip_gw1 ip saddr $ip_gw2 udp dport 6081 '@th,416,16 == 0x0ec8 @th,472,8 == 0x80 counter drop' > - nsenter --net=/proc/$gw1_pid/ns/net nft add rule ip ovn-test INPUT ip daddr $ip_gw1 ip saddr $ip_gw2 udp dport 6081 '@th,416,16 == 0x0ec8 @th,472,8 == 0x40 counter drop' > - > - # We do not check that packets go through gw2 as BFD between chassis-2 and gw1 is still up > - fi > - > - if [ test "$action" == "kill_gw2" ]; then > - AS_BOX([$(date +%H:%M:%S.%03N) Killing gw2 ovn-controller]) > - on_exit 'm_as ovn-gw-2 /usr/share/openvswitch/scripts/ovs-ctl status || > - m_as ovn-gw-2 /usr/share/openvswitch/scripts/ovs-ctl start --system-id=ovn-gw-2' > - on_exit 'm_as ovn-gw-2 /usr/share/ovn/scripts/ovn-ctl status_controller || > - m_as ovn-gw-2 /usr/share/ovn/scripts/ovn-ctl start_controller ${CONTROLLER_SSL_ARGS}' > - > - m_as ovn-gw-2 kill -9 $(m_as ovn-gw-2 cat /run/ovn/ovn-controller.pid) > - m_as ovn-gw-2 kill -9 $(m_as ovn-gw-2 cat /run/openvswitch/ovs-vswitchd.pid) > - m_as ovn-gw-2 kill -9 $(m_as ovn-gw-2 cat /run/openvswitch/ovsdb-server.pid) > - # Also delete datapath (flows) > - m_as ovn-gw-2 ovs-dpctl del-dp system@ovs-system > - fi > - > - if [ test "$action" == "kill_gw1" ]; then > - AS_BOX([$(date +%H:%M:%S.%03N) Killing gw1 ovn-controller]) > - on_exit 'm_as ovn-gw-1 /usr/share/openvswitch/scripts/ovs-ctl status || > - m_as ovn-gw-1 /usr/share/openvswitch/scripts/ovs-ctl start --system-id=ovn-gw-1' > - on_exit 'm_as ovn-gw-1 /usr/share/ovn/scripts/ovn-ctl status_controller || > - m_as ovn-gw-1 /usr/share/ovn/scripts/ovn-ctl start_controller ${CONTROLLER_SSL_ARGS}' > - > - m_as ovn-gw-1 kill -9 $(m_as ovn-gw-1 cat /run/ovn/ovn-controller.pid) > - m_as ovn-gw-1 kill -9 $(m_as ovn-gw-1 cat 
/run/openvswitch/ovs-vswitchd.pid) > - m_as ovn-gw-1 kill -9 $(m_as ovn-gw-1 cat /run/openvswitch/ovsdb-server.pid) > - # Also delete datapath (flows) > - m_as ovn-gw-1 ovs-dpctl del-dp system@ovs-system > - fi > +check_loss_after_flap() > +{ > + dead=$1 > + max_expected_loss=$2 > > - if [ test "$action" == "kill_gw2" ]; then > - AS_BOX([$(date +%H:%M:%S.%03N) Waiting for flap count between gw1 and gw2 to increase]) > + if [[ "$dead" == "gw2" ]]; then > + echo "$(date +%H:%M:%S.%03N) Waiting for flap count between gw1 and gw2 to increase" > OVS_WAIT_UNTIL([ > new_flap_count=$(m_as ovn-gw-1 ovs-vsctl get interfac $from_gw1_to_gw2 bfd_status | sed 's/.*flap_count=\"\([[0-9]]*\).*/\1/g') > echo "Comparing $new_flap_count versus $flap_count_gw_1" > test "$new_flap_count" -gt "$((flap_count_gw_1))" > ]) > else > - AS_BOX([$(date +%H:%M:%S.%03N) Waiting for flap count between gw2 and gw1 to increase]) > + echo "$(date +%H:%M:%S.%03N) Waiting for flap count between gw2 and gw1 to increase" > OVS_WAIT_UNTIL([ > new_flap_count=$(m_as ovn-gw-2 ovs-vsctl get interfac $from_gw2_to_gw1 bfd_status | sed 's/.*flap_count=\"\([[0-9]]*\).*/\1/g') > echo "Comparing $new_flap_count versus $flap_count_gw_2" > test "$new_flap_count" -gt "$((flap_count_gw_2))" > ]) > - > fi > - AS_BOX([$(date +%H:%M:%S.%03N) Flapped!]) > > + echo "$(date +%H:%M:%S.%03N) Flapped!" > # Wait a few more second for the fight.
> + sleep 4 > + > + echo "$(date +%H:%M:%S.%03N) Statistics after flapping" > + lost=$(dump_statistics) > + echo "===> $lost packets lost while handling migration" > + AT_CHECK([test "$lost" -le "$max_expected_loss"]) > +} > + > +final_check() > +{ > + action=$1 > + lost=$2 > + max_expected_loss_after_restoration=$3 > + > + # Wait a little more to get packets while the network is restored. > sleep 2 > - AS_BOX([$(date +%H:%M:%S.%03N) Statistics after flapping]) > - dump_statistics > - > - if [ test "$action" == "stop_bfd" ]; then > - # gw1 still alive and gw2 tried to claim => gw1 should restart generating garps. > - check_for_new_garps gw1 "true" > - check_for_new_garps gw2 "false" > - check_for_new_garps gw3 "false" > - check_for_new_echo_pkts gw1 00:00:c0:a8:00:01 f0:00:c0:a8:00:fe "true" > - check_for_new_echo_pkts gw2 00:00:c0:a8:00:01 f0:00:c0:a8:00:fe "false" > - check_for_new_echo_pkts gw3 00:00:c0:a8:00:01 f0:00:c0:a8:00:fe "false" > - check_for_new_echo_pkts ch1 f0:00:c0:a8:01:fe f0:00:c0:a8:01:01 "true" > - AS_BOX([$(date +%H:%M:%S.%03N) Unblocking bfd on gw1]) > - nsenter --net=/proc/$gw1_pid/ns/net nft -a list ruleset > - nsenter --net=/proc/$gw1_pid/ns/net nft delete table ip ovn-test > - fi > + echo "$(date +%H:%M:%S.%03N) Statistics after network restored (after $action)" > + new_lost=$(dump_statistics) > + echo "===> $((new_lost - lost)) packets lost during network restoration" > + AT_CHECK([test "$((new_lost - lost))" -le "$max_expected_loss_after_restoration"]) > + stop_sending_background_packets > +} > > - if [ test "$action" == "kill_gw2" ]; then > - # gw1 still alive, but gw2 did not try to claim => gw1 should not generate new garps.
> - check_for_new_garps gw1 "false" > - check_for_new_garps gw2 "false" > - check_for_new_garps gw3 "false" > - check_for_new_echo_pkts gw1 00:00:c0:a8:00:01 f0:00:c0:a8:00:fe "true" > - check_for_new_echo_pkts gw2 00:00:c0:a8:00:01 f0:00:c0:a8:00:fe "false" > - check_for_new_echo_pkts gw3 00:00:c0:a8:00:01 f0:00:c0:a8:00:fe "false" > - check_for_new_echo_pkts ch1 f0:00:c0:a8:01:fe f0:00:c0:a8:01:01 "true" > - AS_BOX([$(date +%H:%M:%S.%03N) Restarting gw2 ovn-vswitchd]) > - m_as ovn-gw-2 /usr/share/openvswitch/scripts/ovs-ctl start --system-id=ovn-gw-2 > - > - AS_BOX([$(date +%H:%M:%S.%03N) Restarting gw2 ovn-controller]) > - m_as ovn-gw-2 /usr/share/ovn/scripts/ovn-ctl start_controller ${CONTROLLER_SSL_ARGS} > - fi > +check_garps() > +{ > + check_for_new_garps gw1 "$1" > + check_for_new_garps gw2 "$2" > + check_for_new_garps gw3 "$3" > +} > > - if [ test "$action" == "kill_gw1" ]; then > - # gw1 died => gw2 should generate garps. > - check_for_new_garps gw1 "false" > - check_for_new_garps gw2 "true" > - check_for_new_garps gw3 "false" > - check_for_new_echo_pkts gw1 00:00:c0:a8:00:01 f0:00:c0:a8:00:fe "false" > - check_for_new_echo_pkts gw2 00:00:c0:a8:00:01 f0:00:c0:a8:00:fe "true" > - check_for_new_echo_pkts gw3 00:00:c0:a8:00:01 f0:00:c0:a8:00:fe "false" > - check_for_new_echo_pkts ch1 f0:00:c0:a8:01:fe f0:00:c0:a8:01:01 "true" > - AS_BOX([$(date +%H:%M:%S.%03N) Restarting gw1 ovn-vswitchd]) > - m_as ovn-gw-1 /usr/share/openvswitch/scripts/ovs-ctl start --system-id=ovn-gw-1 > - > - AS_BOX([$(date +%H:%M:%S.%03N) Restarting gw1 ovn-controller]) > - m_as ovn-gw-1 /usr/share/ovn/scripts/ovn-ctl start_controller ${CONTROLLER_SSL_ARGS} > - fi > +check_packets() > +{ > + check_for_new_echo_pkts gw1 00:00:c0:a8:00:01 f0:00:c0:a8:00:fe "$1" > + check_for_new_echo_pkts gw2 00:00:c0:a8:00:01 f0:00:c0:a8:00:fe "$2" > + check_for_new_echo_pkts gw3 00:00:c0:a8:00:01 f0:00:c0:a8:00:fe "$3" > + check_for_new_echo_pkts ch1 f0:00:c0:a8:01:fe f0:00:c0:a8:01:01 "$4" > +} > + > 
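The check_garps/check_packets wrappers above, together with check_loss_after_flap, all lean on the same idiom: record a baseline counter, then poll until it grows past it. A minimal stand-alone sketch of that idiom in plain shell (get_flap_count and the temp file are illustrative stand-ins for the real ovs-vsctl bfd_status query, not part of the patch):

```shell
# Poll a counter until it exceeds a recorded baseline, as the flap-count
# checks do. get_flap_count stands in for the real ovs-vsctl query.
counter_file=$(mktemp)
echo 3 > "$counter_file"
get_flap_count() { cat "$counter_file"; }

baseline=$(get_flap_count)
echo 5 > "$counter_file"          # simulate BFD flapping and bumping the count

for attempt in 1 2 3; do          # bounded retry loop, like OVS_WAIT_UNTIL
    new=$(get_flap_count)
    if [ "$new" -gt "$baseline" ]; then
        echo "flapped ($baseline -> $new)"
        break
    fi
    sleep 1
done
rm -f "$counter_file"
```

In the test itself, OVS_WAIT_UNTIL supplies the retry/timeout machinery, so the helper only has to express the "new count greater than baseline" condition.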
+check_migration_between_gw1_and_gw2_bfd_stop() > +{ > + AS_BOX([$(date +%H:%M:%S.%03N) Testing migration after bfd_stop]) > + max_expected_loss1=$1 > + max_expected_loss2=$2 > + prepare > + > + echo "$(date +%H:%M:%S.%03N) Blocking bfd on gw1 (from $ip_gw1 to $ip_gw2)" > + nsenter --net=/proc/$gw1_pid/ns/net nft add table ip ovn-test > + nsenter --net=/proc/$gw1_pid/ns/net nft 'add chain ip ovn-test INPUT { type filter hook input priority 0; policy accept; }' > + # Drop BFD from gw-1 to gw-2: geneve port (6081), inner port 3784 (0xec8), Session state Up, Init, Down. > + nsenter --net=/proc/$gw1_pid/ns/net nft add rule ip ovn-test INPUT ip daddr $ip_gw1 ip saddr $ip_gw2 udp dport 6081 '@th,416,16 == 0x0ec8 @th,472,8 == 0xc0 counter drop' > + nsenter --net=/proc/$gw1_pid/ns/net nft add rule ip ovn-test INPUT ip daddr $ip_gw1 ip saddr $ip_gw2 udp dport 6081 '@th,416,16 == 0x0ec8 @th,472,8 == 0x80 counter drop' > + nsenter --net=/proc/$gw1_pid/ns/net nft add rule ip ovn-test INPUT ip daddr $ip_gw1 ip saddr $ip_gw2 udp dport 6081 '@th,416,16 == 0x0ec8 @th,472,8 == 0x40 counter drop' > + > + check_loss_after_flap "gw1" $max_expected_loss1 > + > + # gw1 still alive and gw2 tried to claim => gw1 should restart generating garps. > + check_garps "true" "false" "false" > + check_packets "true" "false" "false" "true" > + > + echo "$(date +%H:%M:%S.%03N) Unblocking bfd on gw1" > + nsenter --net=/proc/$gw1_pid/ns/net nft -a list ruleset > + nsenter --net=/proc/$gw1_pid/ns/net nft delete table ip ovn-test > > # The network is now restored => packets should go through gw1 and reach chassis-1.
> - check_for_new_echo_pkts gw1 00:00:c0:a8:00:01 f0:00:c0:a8:00:fe "true" > - check_for_new_echo_pkts gw2 00:00:c0:a8:00:01 f0:00:c0:a8:00:fe "false" > - check_for_new_echo_pkts gw3 00:00:c0:a8:00:01 f0:00:c0:a8:00:fe "false" > - check_for_new_echo_pkts ch1 f0:00:c0:a8:01:fe f0:00:c0:a8:01:01 "true" > - AS_BOX([$(date +%H:%M:%S.%03N) Statistics after network restored]) > - dump_statistics > - stop_sending_background_packets > + check_packets "true" "false" "false" "true" > + final_check "bfd_stop" $lost $max_expected_loss2 > +} > + > +check_migration_between_gw1_and_gw2_kill_gw2() { > + AS_BOX([$(date +%H:%M:%S.%03N) Check migration after killing gw2 ovn-controller & vswitchd]) > + max_expected_loss1=$1 > + max_expected_loss2=$2 > + prepare > + > + on_exit 'm_as ovn-gw-2 /usr/share/openvswitch/scripts/ovs-ctl status || > + m_as ovn-gw-2 /usr/share/openvswitch/scripts/ovs-ctl start --system-id=ovn-gw-2' > + on_exit 'm_as ovn-gw-2 /usr/share/ovn/scripts/ovn-ctl status_controller || > + m_as ovn-gw-2 /usr/share/ovn/scripts/ovn-ctl start_controller ${CONTROLLER_SSL_ARGS}' > + > + m_as ovn-gw-2 kill -9 $(m_as ovn-gw-2 cat /run/ovn/ovn-controller.pid) > + m_as ovn-gw-2 kill -9 $(m_as ovn-gw-2 cat /run/openvswitch/ovs-vswitchd.pid) > + m_as ovn-gw-2 kill -9 $(m_as ovn-gw-2 cat /run/openvswitch/ovsdb-server.pid) > + m_as ovn-gw-2 ovs-dpctl del-dp system@ovs-system > + > + check_loss_after_flap "gw2" $max_expected_loss1 > + > + # gw1 still alive, but gw2 did not try to claim => gw1 should not generate new garps.
> + check_garps "false" "false" "false" > + check_packets "true" "false" "false" "true" > + > + echo "$(date +%H:%M:%S.%03N) Restarting gw2 ovn-vswitchd" > + m_as ovn-gw-2 /usr/share/openvswitch/scripts/ovs-ctl start --system-id=ovn-gw-2 > + > + echo "$(date +%H:%M:%S.%03N) Restarting gw2 ovn-controller" > + m_as ovn-gw-2 /usr/share/ovn/scripts/ovn-ctl start_controller ${CONTROLLER_SSL_ARGS} > + > + # The network is now restored => packets should go through gw1 and reach chassis-1. > + check_packets "true" "false" "false" "true" > + final_check "kill_gw2" $lost $max_expected_loss2 > +} > + > +check_migration_between_gw1_and_gw2_update_ovs() { > + AS_BOX([$(date +%H:%M:%S.%03N) Check migration after restarting gw1 ovs-vswitchd ("update")]) > + max_expected_loss1=$1 > + max_expected_loss2=$2 > + prepare > + > + m_as ovn-gw-1 /usr/share/openvswitch/scripts/ovs-ctl restart --system-id=ovn-gw-1 > + > + check_loss_after_flap "gw1" $max_expected_loss1 > + > + # The network is now restored => packets should go through gw1 and reach chassis-1.
> + check_packets "true" "false" "false" "true" > + final_check "ovs_update" $lost $max_expected_loss2 > +} > + > +check_migration_between_gw1_and_gw2_kill_gw1() { > + AS_BOX([$(date +%H:%M:%S.%03N) Killing gw1 ovn-controller and ovs-vswitchd]) > + max_expected_loss1=$1 > + max_expected_loss2=$2 > + prepare > + > + on_exit 'm_as ovn-gw-1 /usr/share/openvswitch/scripts/ovs-ctl status || > + m_as ovn-gw-1 /usr/share/openvswitch/scripts/ovs-ctl start --system-id=ovn-gw-1' > + on_exit 'm_as ovn-gw-1 /usr/share/ovn/scripts/ovn-ctl status_controller || > + m_as ovn-gw-1 /usr/share/ovn/scripts/ovn-ctl start_controller ${CONTROLLER_SSL_ARGS}' > + > + m_as ovn-gw-1 kill -9 $(m_as ovn-gw-1 cat /run/ovn/ovn-controller.pid) > + m_as ovn-gw-1 kill -9 $(m_as ovn-gw-1 cat /run/openvswitch/ovs-vswitchd.pid) > + m_as ovn-gw-1 kill -9 $(m_as ovn-gw-1 cat /run/openvswitch/ovsdb-server.pid) > + # Also delete datapath (flows) > + m_as ovn-gw-1 ovs-dpctl del-dp system@ovs-system > + > + check_loss_after_flap "gw1" $max_expected_loss1 > + > + # gw1 died => gw2 should generate garps. > + check_garps "false" "true" "false" > + check_packets "false" "true" "false" "true" > + echo "$(date +%H:%M:%S.%03N) Restarting gw1 ovn-vswitchd after killing gw1" > + m_as ovn-gw-1 /usr/share/openvswitch/scripts/ovs-ctl start --system-id=ovn-gw-1 > + > + # Wait some long time before restarting ovn-controller > + sleep 10 > + > + # gw2 should still be handling packets as OVN not restarted on gw1 > + check_packets "false" "true" "false" "true" > + > + echo "$(date +%H:%M:%S.%03N) Restarting gw1 ovn-controller after killing gw1" > + m_as ovn-gw-1 /usr/share/ovn/scripts/ovn-ctl start_controller ${CONTROLLER_SSL_ARGS} > + > + # The network is now restored => packets should go through gw1 and reach chassis-1. 
> + check_packets "true" "false" "false" "true" > + final_check "kill_gw1" $lost $max_expected_loss2 > +} > + > +check_migration_between_gw1_and_gw2_reboot_gw1() { > + ip_gw1_eth1=$(podman exec ovn-gw-1 ip -brief address show eth1 | awk '{print $3}' | cut -d/ -f1) > + cidr=$(podman exec ovn-gw-1 ip -brief address show eth1 | awk '{print $3}' | cut -d/ -f2) > + AS_BOX([$(date +%H:%M:%S.%03N) Rebooting ovn-gw-1 with $ip_gw1_eth1/$cidr]) > + max_expected_loss1=$1 > + max_expected_loss2=$2 > + prepare > + > + podman stop -t 0 ovn-gw-1 > + (exec 3>&- 4>&- 5>&- 6>&-; podman start ovn-gw-1) > + > + add_port br-ovn eth1 $ip_gw1_eth1/$cidr > + add_port br-ovn-ext eth2 > + M_START_TCPDUMPS([ovn-gw-1], [-neei eth2], [gw1], [ovn-gw-1], [-neei eth1], [gw1_eth1], [ovn-gw-1], [-neei eth2 -Q out], [gw1_out]) > + check_loss_after_flap "gw1" $max_expected_loss1 > + > + # gw1 died => gw2 should generate garps. > + check_garps "false" "true" "false" > + check_packets "false" "true" "false" "true" > + > + echo "$(date +%H:%M:%S.%03N) Restarting gw1 ovn-vswitchd after rebooting gw1" > + m_as ovn-gw-1 /usr/share/openvswitch/scripts/ovs-ctl start --system-id=ovn-gw-1 > + > + # Wait some long time before restarting ovn-controller > + sleep 10 > + > + # gw2 should still be handling packets as OVN not restarted on gw1 > + check_packets "false" "true" "false" "true" > + > + echo "$(date +%H:%M:%S.%03N) Restarting gw1 ovn-controller after rebooting gw1" > + m_as ovn-gw-1 /usr/share/ovn/scripts/ovn-ctl start_controller ${CONTROLLER_SSL_ARGS} > + > + # The network is now restored => packets should go through gw1 and reach chassis-1. 
> + check_packets "true" "false" "false" "true" > + final_check "reboot_gw1" $lost $max_expected_loss2 > +} > + > +check_compute_restart() { > + AS_BOX([$(date +%H:%M:%S.%03N) Killing ovn-chassis-1 ovn-controller and ovs-vswitchd]) > + max_expected_loss=$1 > + prepare > + > + # Kill ovn-chassis-1 > + echo "$(date +%H:%M:%S.%03N) Killing chassis-1" > + on_exit 'm_as ovn-chassis-1 /usr/share/openvswitch/scripts/ovs-ctl status || > + m_as ovn-chassis-1 /usr/share/openvswitch/scripts/ovs-ctl start --system-id=ovn-chassis-1' > + on_exit 'm_as ovn-chassis-1 /usr/share/ovn/scripts/ovn-ctl status_controller || > + m_as ovn-chassis-1 /usr/share/ovn/scripts/ovn-ctl start_controller ${CONTROLLER_SSL_ARGS}' > + > + m_as ovn-chassis-1 kill -9 $(m_as ovn-chassis-1 cat /run/ovn/ovn-controller.pid) > + m_as ovn-chassis-1 kill -9 $(m_as ovn-chassis-1 cat /run/openvswitch/ovs-vswitchd.pid) > + m_as ovn-chassis-1 kill -9 $(m_as ovn-chassis-1 cat /run/openvswitch/ovsdb-server.pid) > + > + # Now restart chassis-1 > + flap_count=$(m_as ovn-gw-2 ovs-vsctl get interface $from_gw2_to_ch1 bfd_status | sed 's/.*flap_count=\"\([[0-9]]*\).*/\1/g') > + echo "$(date +%H:%M:%S.%03N) Restarting ovn-chassis-1 ovn-vswitchd." > + m_as ovn-chassis-1 /usr/share/openvswitch/scripts/ovs-ctl start --system-id=ovn-chassis-1 > + > + echo "$(date +%H:%M:%S.%03N) Waiting for flap count between gw-1 and chassis-1 to increase" > + OVS_WAIT_UNTIL([ > + new_flap_count=$(m_as ovn-gw-1 ovs-vsctl get interface $from_gw1_to_ch1 bfd_status | sed 's/.*flap_count=\"\([[0-9]]*\).*/\1/g') > + echo "Comparing $new_flap_count versus $flap_count" > + test "$new_flap_count" -gt "$((flap_count))" > + ]) > + > + wait_bfd_up ovn-chassis-1 $from_ch1_to_gw1 > + > + echo "$(date +%H:%M:%S.%03N) Restarting ovn-chassis-1 ovn-controller."
> + m_as ovn-chassis-1 /usr/share/ovn/scripts/ovn-ctl start_controller ${CONTROLLER_SSL_ARGS} > + > + # Wait a long time to catch losses > + sleep 5 > + final_check "compute" 0 $max_expected_loss > } > > start_tcpdump > -AS_BOX([$(date +%H:%M:%S.%03N) Sending packet from hv1-vif1(inside1) to ext1]) > +echo "$(date +%H:%M:%S.%03N) Sending packet from hv1-vif1(inside1) to ext1" > M_NS_CHECK_EXEC([ovn-chassis-1], [hv1-vif1], [ping -c3 -q -i 0.1 192.168.0.1 | FORMAT_PING], > [0], [dnl > 3 packets transmitted, 3 received, 0% packet loss, time 0ms > @@ -3433,7 +3629,7 @@ M_NS_CHECK_EXEC([ovn-chassis-1], [hv1-vif1], [ping -c3 -q -i 0.1 192.168.0.1 | F > stop_tcpdump > > # It should have gone through gw1 and not gw2 > -AS_BOX([$(date +%H:%M:%S.%03N) Checking it went through gw1 and not gw2]) > +echo "$(date +%H:%M:%S.%03N) Checking it went through gw1 and not gw2" > AT_CHECK([cat gw2.tcpdump | grep "ICMP echo"], [1], [dnl > ]) > > @@ -3446,17 +3642,29 @@ f0:00:c0:a8:00:fe > 00:00:c0:a8:00:01, ethertype IPv4 (0x0800), length 98: 192.1 > 00:00:c0:a8:00:01 > f0:00:c0:a8:00:fe, ethertype IPv4 (0x0800), length 98: 192.168.0.1 > 192.168.1.1: ICMP echo reply, > ]) > > -# We stop bfd between gw1 & gw2, but keep gw1 & gw2 running. > +# We stop bfd between gw1 & gw2, but keep gw1 & gw2 running. We should not lose packets. > +check_migration_between_gw1_and_gw2_bfd_stop 1 1 > > # We simulate death of gw2. It should not have any effect. > -check_migration_between_gw1_and_gw2 "kill_gw2" > +check_migration_between_gw1_and_gw2_kill_gw2 1 1 > + > +# We simulate ovs update on gw1. When ovs is stopped, flows should still be handled by the kernel datapath. > +# When OVS is restarted, BFD should go down immediately, and gw2 will start handling packets. > +# There will be packet losses as gw2 will usually see BFD from gw1 up (and hence release port) before gw1 sees > +# BFD up (and claim port).
> +check_migration_between_gw1_and_gw2_update_ovs 20 1 > + > +# We simulate restart of both OVS & OVN on gw1. gw2 should take over. > +check_migration_between_gw1_and_gw2_kill_gw1 40 20 > > # We simulate death of gw1. gw2 should take over. > -check_migration_between_gw1_and_gw2 "kill_gw1" > +check_migration_between_gw1_and_gw2_reboot_gw1 40 20 > + > +# We simulate restart of ovn-chassis-1. We expect ~2 sec of losses as we wait for bfd up before starting > +# ovn-controller. > +check_compute_restart 30 > > AT_CLEANUP > -]) > > AT_SETUP([ovn multinode bgp L2 EVPN]) > check_fake_multinode_setup > -- > 2.47.1 >
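Before the review discussion below, it may help to visualize the flow-restore-wait handshake the commit message describes: set it to false, wait for OVS to acknowledge, then re-arm it to true so an unexpected reboot comes up with it already set. The following is a hedged, self-contained shell simulation of that sequence using plain variables and invented state names; it is not the patch's actual C state machine in ovn-controller.c:

```shell
# Simulate the flow-restore-wait toggle: OVN writes false, waits for the
# OVS ack, then re-arms it to true so a surprise reboot starts with it set.
frw="true"                 # value a freshly rebooted OVS would observe
state="INIT"
step() {
    case "$state" in
        INIT)      frw="false"; state="WAIT_ACK" ;;   # let running OVS proceed
        WAIT_ACK)  state="ACKED" ;;                   # OVS caught up (ack seen)
        ACKED)     frw="true";  state="DONE" ;;       # re-armed for next boot
    esac
}
step; step; step
echo "state=$state flow-restore-wait=$frw"
```

On a real chassis the write and the acknowledgement go through the Open_vSwitch table (the patch compares cur_cfg against next_cfg); the simulation collapses that round trip into a single step.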
On 3/12/26 8:35 PM, Mark Michelson via dev wrote: > Hi Xavier, > Hi Xavier, Mark, > It took a bit for me to look through all of this, but I believe this > looks good to me. Thanks for adding the notes about the offline > discussions, as I had not recalled all the details about those > discussions between then and now. > > Acked-by: Mark Michelson <mmichels@redhat.com> > I think this change makes sense too. I do have some very small comments below but I might be able to address them myself and just squash that into this version of the patch before applying it to main. Please let me know what you think. > On Mon, Mar 9, 2026 at 3:36 PM Xavier Simonart <xsimonar@redhat.com> wrote: >> >> If a server unexpectedly rebooted, OVS, when restarted, sets BFD >> UP on bfd-enabled geneve tunnels. >> However, if it takes time to restart OVN, an HA gw chassis >> would attract the traffic while being unable to handle it >> (as no flows), resulting in traffic loss. >> >> This is fixed by re-using ovs flow-restore-wait. >> If set, OVS waits (prevents upcalls, ignores bfd, ...) until reset. >> Once OVS receives the notification of flow-restore-wait being false, >> it restarts handling upcalls, bfd... and ignores any new change to >> flow-restore-wait. >> >> Hence, on chassis hosting ha gateways, OVN toggles flow-restore-wait: >> it set it to false, waits for ack from OVS and then sets it back to true. >> If server reboots, OVS will see flow-restore-wait being true. >> >> OVN also sets external_ids->ovn-managed-flow-restore-wait when setting >> flow-restore-wait. When set, it tells that OVN once set flow-restore-wait. >> >> "ovs-ctl restart" also uses flow-restore-wait: when called, it saves the >> flows, stops "ovs-vswitchd", sets "flow-restore-wait" to true, restarts >> "ovs-vswitchd", restores the flows and finally removes "flow-restore-wait". >> So OVS will wait either for "ovs-ctl restart" to remove "flow-restore-wait" >> or for OVN to set "flow-restore-wait" to false. 
>> >> Reported-at: https://issues.redhat.com/browse/FDP-3075 >> Signed-off-by: Xavier Simonart <xsimonar@redhat.com> >> >> --- >> -v2 : - Updated based on Mark's feedback (commit message, comments). >> - Avoid setting flow-restore-wait for computes. >> - Add external_ids->ovn-managed-flow-restore-wait. >> - Updated test: add test for compute update + nits (variable name changes) >> --- >> controller/bfd.c | 5 +- >> controller/bfd.h | 4 +- >> controller/ovn-controller.8.xml | 11 + >> controller/ovn-controller.c | 171 +++++++++- >> tests/multinode-macros.at | 22 ++ >> tests/multinode.at | 546 ++++++++++++++++++++++---------- >> 6 files changed, 584 insertions(+), 175 deletions(-) >> >> diff --git a/controller/bfd.c b/controller/bfd.c >> index 3b0c3f6da..56bfa4936 100644 >> --- a/controller/bfd.c >> +++ b/controller/bfd.c >> @@ -117,13 +117,14 @@ bfd_calculate_active_tunnels(const struct ovsrec_bridge *br_int, >> * >> * If 'our_chassis' is C5 then this function returns empty bfd set. >> */ >> -void >> +bool >> bfd_calculate_chassis( >> const struct sbrec_chassis *our_chassis, >> const struct sbrec_ha_chassis_group_table *ha_chassis_grp_table, >> struct sset *bfd_chassis) >> { >> const struct sbrec_ha_chassis_group *ha_chassis_grp; >> + bool chassis_is_ha_gw = false; >> SBREC_HA_CHASSIS_GROUP_TABLE_FOR_EACH (ha_chassis_grp, >> ha_chassis_grp_table) { >> bool is_ha_chassis = false; >> @@ -143,6 +144,7 @@ bfd_calculate_chassis( >> sset_add(&grp_chassis, ha_ch->chassis->name); >> if (our_chassis == ha_ch->chassis) { >> is_ha_chassis = true; >> + chassis_is_ha_gw = true; >> bfd_setup_required = true; >> } >> } >> @@ -178,6 +180,7 @@ bfd_calculate_chassis( >> } >> sset_destroy(&grp_chassis); >> } >> + return chassis_is_ha_gw; >> } >> >> void >> diff --git a/controller/bfd.h b/controller/bfd.h >> index f8fece5a5..3e3384891 100644 >> --- a/controller/bfd.h >> +++ b/controller/bfd.h >> @@ -16,6 +16,8 @@ >> #ifndef OVN_BFD_H >> #define OVN_BFD_H 1 >> >> +#include <stdbool.h> >> 
+ >> struct hmap; >> struct ovsdb_idl; >> struct ovsdb_idl_index; >> @@ -36,7 +38,7 @@ void bfd_run(const struct ovsrec_interface_table *, >> const struct sbrec_sb_global_table *, >> const struct ovsrec_open_vswitch_table *); >> >> -void bfd_calculate_chassis( >> +bool bfd_calculate_chassis( >> const struct sbrec_chassis *, >> const struct sbrec_ha_chassis_group_table *, >> struct sset *); >> diff --git a/controller/ovn-controller.8.xml b/controller/ovn-controller.8.xml >> index 57e7cf5dd..33281a4d6 100644 >> --- a/controller/ovn-controller.8.xml >> +++ b/controller/ovn-controller.8.xml >> @@ -531,6 +531,17 @@ >> 65535. >> </dd> >> >> + <dt> >> + <code>external_ids:ovn-managed-flow-restore-wait</code> in the >> + <code>Open_vSwitch</code> table >> + </dt> >> + <dd> >> + When set to true, this key indicates that <code>ovn-controller</code> >> + has set the <code>other_config:flow-restore-wait</code> option. >> + The key is set when <code>ovn-controller</code> enables >> + flow-restore-wait and removed when it clears it. >> + </dd> >> + >> <dt> >> <code>external_ids:ct-zone-*</code> in the <code>Bridge</code> table >> </dt> >> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c >> index 5815f1b92..0611b1767 100644 >> --- a/controller/ovn-controller.c >> +++ b/controller/ovn-controller.c >> @@ -211,6 +211,150 @@ static char *get_file_system_id(void) >> free(filename); >> return ret; >> } >> + >> +/* Set/unset flow-restore-wait, and inc ovs next_cfg if false >> + * When set to true, also sets ovn-managed-flow-restore-wait to true to >> + * indicate ownership */ >> +static void set_flow_restore_wait(struct ovsdb_idl_txn *ovs_idl_txn, >> + const struct ovsrec_open_vswitch *cfg, >> + const struct smap *other_config, >> + const char *val, bool ovn_managed) Nit: "static void" should be on a different line. 
>> +{ >> + struct smap new_config; >> + smap_clone(&new_config, other_config); >> + smap_replace(&new_config, "flow-restore-wait", val); >> + ovsrec_open_vswitch_set_other_config(cfg, &new_config); >> + if (!strcmp(val, "true")) { I'd prefer we use an actual boolean here as 'val'. We'd only have to change: smap_replace(&new_config, "flow-restore-wait", val ? "true" : "false"); >> + ovsrec_open_vswitch_update_external_ids_setkey( >> + cfg, "ovn-managed-flow-restore-wait", "true"); >> + } else if (ovn_managed) { >> + ovsrec_open_vswitch_update_external_ids_delkey( >> + cfg, "ovn-managed-flow-restore-wait"); >> + } >> + ovsdb_idl_txn_increment(ovs_idl_txn, &cfg->header_, >> + &ovsrec_open_vswitch_col_next_cfg, true); >> + smap_destroy(&new_config); >> +} >> + >> +static void >> +manage_flow_restore_wait(struct ovsdb_idl_txn *ovs_idl_txn, >> + const struct ovsrec_open_vswitch *cfg, >> + uint64_t ofctrl_cur_cfg, uint64_t ovs_next_cfg, >> + int ovs_txn_status, bool is_ha_gw) >> +{ >> + enum flow_restore_wait_state { >> + FRW_INIT, /* Initial state */ >> + FRW_WAIT_TXN_COMPLETE, /* Sent false, waiting txn to complete */ >> + FRW_TXN_SUCCESS, /* Txn completed. Waiting for OVS Ack. */ >> + FRW_DONE /* Everything completed */ >> + }; >> + >> + static int64_t frw_next_cfg; >> + static enum flow_restore_wait_state frw_state; >> + static bool ofctrl_was_connected = false; >> + >> + bool ofctrl_connected = ofctrl_is_connected(); >> + >> + if (!ovs_idl_txn || !cfg) { >> + return; >> + } >> + >> + /* If OVS is stopped/started, make sure flow-restore-wait is toggled */ Nit: comments should be sentences and end with a '.'.
>> + if (ofctrl_connected && !ofctrl_was_connected) { >> + frw_state = FRW_INIT; >> + } >> + ofctrl_was_connected = ofctrl_connected; >> + >> + if (!ofctrl_connected) { >> + return; >> + } >> + >> + bool frw = smap_get_bool(&cfg->other_config, "flow-restore-wait", false); >> + bool ovn_managed_once = smap_get_bool(&cfg->external_ids, >> + "ovn-managed-flow-restore-wait", false); Nit: indentation. >> + >> + if (frw && !ovn_managed_once) { >> + /* frw has been set by ovs-ctl. Do not touch. */ >> + return; >> + } >> + >> + if (!is_ha_gw) { >> + if (frw) { >> + /* frw has once been set by OVN. We are now not an HA chassis >> + * anymore, unset it. */ >> + set_flow_restore_wait(ovs_idl_txn, cfg, &cfg->other_config, >> + "false", ovn_managed_once); >> + } >> + /* else we are not an HA chassis and frw is false. Ignore it. */ >> + return; >> + } >> + >> + switch (frw_state) { >> + case FRW_INIT: >> + if (ofctrl_cur_cfg > 0) { >> + set_flow_restore_wait(ovs_idl_txn, cfg, &cfg->other_config, >> + "false", ovn_managed_once); >> + frw_state = FRW_WAIT_TXN_COMPLETE; >> + VLOG_INFO("Setting flow-restore-wait=false " >> + "(cur_cfg=%"PRIu64")", ofctrl_cur_cfg); >> + } >> + break; >> + >> + case FRW_WAIT_TXN_COMPLETE: >> + /* if (ovs_idl_txn != NULL), the transaction completed. >> + * When the transaction completed, it either failed >> + * (ovs_txn_status == 0) or succeeded (ovs_txn_status != 0) */ >> + if (ovs_txn_status == 0) { >> + /* Previous transaction failed. 
*/ >> + set_flow_restore_wait(ovs_idl_txn, cfg, &cfg->other_config, >> + "false", ovn_managed_once); >> + break; >> + } >> + /* txn succeeded, get next_cfg */ >> + frw_next_cfg = ovs_next_cfg; >> + frw_state = FRW_TXN_SUCCESS; >> + /* fall through */ >> + >> + case FRW_TXN_SUCCESS: >> + if (ovs_next_cfg < frw_next_cfg) { >> + /* DB was reset, next_cfg went backwards */ >> + VLOG_INFO("OVS DB reset (next_cfg %"PRId64" -> %"PRIu64"), " >> + "resetting state", >> + frw_next_cfg, ovs_next_cfg); >> + set_flow_restore_wait(ovs_idl_txn, cfg, &cfg->other_config, >> + "false", ovn_managed_once); >> + frw_state = FRW_WAIT_TXN_COMPLETE; >> + break; >> + } >> + >> + if (!frw) { >> + if (cfg->cur_cfg >= frw_next_cfg) { >> + set_flow_restore_wait(ovs_idl_txn, cfg, &cfg->other_config, >> + "true", ovn_managed_once); >> + frw_state = FRW_DONE; >> + VLOG_INFO("Setting flow-restore-wait=true"); >> + } >> + } else { >> + /* The transaction to false succeeded but frw is true. >> + * So, another task already set it to true */ >> + frw_state = FRW_DONE; >> + VLOG_INFO("flow-restore-wait was already true"); >> + } >> + break; >> + case FRW_DONE: >> + if (!frw) { >> + /* frw has been removed (e.g. by ovs-ctl restart) or is false >> + * (e.g. txn failed.) */ Nit: extra whitespace at the beginning of the line. >> + set_flow_restore_wait(ovs_idl_txn, cfg, &cfg->other_config, >> + "false", ovn_managed_once); >> + frw_state = FRW_WAIT_TXN_COMPLETE; >> + VLOG_INFO("OVS frw cleared, restarting flow-restore-wait sequence " >> + "(cur_cfg=%"PRIu64")", ofctrl_cur_cfg); >> + } >> + break; >> + } >> +} >> + >> /* Only set monitor conditions on tables that are available in the >> * server schema. 
>> */ >> @@ -3381,6 +3525,7 @@ en_mac_cache_cleanup(void *data) >> >> struct ed_type_bfd_chassis { >> struct sset bfd_chassis; >> + bool is_ha_gw; >> }; >> >> static void * >> @@ -3409,8 +3554,9 @@ en_bfd_chassis_run(struct engine_node *node, void *data OVS_UNUSED) >> = chassis_lookup_by_name(sbrec_chassis_by_name, chassis_id); >> >> sset_clear(&bfd_chassis->bfd_chassis); >> - bfd_calculate_chassis(chassis, ha_chassis_grp_table, >> - &bfd_chassis->bfd_chassis); >> + bfd_chassis->is_ha_gw = bfd_calculate_chassis(chassis, >> + ha_chassis_grp_table, >> + &bfd_chassis->bfd_chassis); >> return EN_UPDATED; >> } >> >> @@ -7117,6 +7263,7 @@ main(int argc, char *argv[]) >> struct unixctl_server *unixctl; >> struct ovn_exit_args exit_args = {0}; >> struct br_int_remote br_int_remote = {0}; >> + static uint64_t next_cfg = 0; >> int retval; >> >> /* Read from system-id-override file once on startup. */ >> @@ -7444,6 +7591,7 @@ main(int argc, char *argv[]) >> >> /* Main loop. */ >> int ovnsb_txn_status = 1; >> + int ovs_txn_status = 1; >> bool sb_monitor_all = false; >> struct tracked_acl_ids *tracked_acl_ids = NULL; >> while (!exit_args.exiting) { >> @@ -7545,6 +7693,11 @@ main(int argc, char *argv[]) >> pinctrl_update_swconn(br_int_remote.target, >> br_int_remote.probe_interval); >> >> + if (cfg && ovs_idl_txn && ovs_txn_status == -1) { >> + /* txn was in progress and is now completed */ >> + next_cfg = cfg->next_cfg; >> + } >> + >> /* Enable ACL matching for double tagged traffic. 
*/ >> if (ovs_idl_txn && cfg) { >> int vlan_limit = smap_get_int( >> @@ -7894,6 +8047,13 @@ main(int argc, char *argv[]) >> stopwatch_start(OFCTRL_SEQNO_RUN_STOPWATCH_NAME, >> time_msec()); >> ofctrl_seqno_run(ofctrl_get_cur_cfg()); >> + if (ovs_idl_txn && bfd_chassis_data) { >> + manage_flow_restore_wait(ovs_idl_txn, cfg, >> + ofctrl_get_cur_cfg(), >> + next_cfg, ovs_txn_status, >> + bfd_chassis_data->is_ha_gw); >> + } >> + >> stopwatch_stop(OFCTRL_SEQNO_RUN_STOPWATCH_NAME, >> time_msec()); >> stopwatch_start(IF_STATUS_MGR_RUN_STOPWATCH_NAME, >> @@ -7993,7 +8153,7 @@ main(int argc, char *argv[]) >> OVS_NOT_REACHED(); >> } >> >> - int ovs_txn_status = ovsdb_idl_loop_commit_and_wait(&ovs_idl_loop); >> + ovs_txn_status = ovsdb_idl_loop_commit_and_wait(&ovs_idl_loop); >> if (!ovs_txn_status) { >> /* The transaction failed. */ >> vif_plug_clear_deleted( >> @@ -8012,6 +8172,9 @@ main(int argc, char *argv[]) >> &vif_plug_deleted_iface_ids); >> vif_plug_finish_changed( >> &vif_plug_changed_iface_ids); >> + if (cfg) { >> + next_cfg = cfg->next_cfg; >> + } >> } else if (ovs_txn_status == -1) { >> /* The commit is still in progress */ >> } else { >> @@ -8085,7 +8248,7 @@ loop_done: >> } >> >> ovsdb_idl_loop_commit_and_wait(&ovnsb_idl_loop); >> - int ovs_txn_status = ovsdb_idl_loop_commit_and_wait(&ovs_idl_loop); >> + ovs_txn_status = ovsdb_idl_loop_commit_and_wait(&ovs_idl_loop); >> if (!ovs_txn_status) { >> /* The transaction failed. 
*/ >> vif_plug_clear_deleted( >> diff --git a/tests/multinode-macros.at b/tests/multinode-macros.at >> index 4a74d5127..646ee2d79 100644 >> --- a/tests/multinode-macros.at >> +++ b/tests/multinode-macros.at >> @@ -41,6 +41,28 @@ m4_define([M_START_TCPDUMP], >> ] >> ) >> >> +m4_define([_M_START_TCPDUMPS_RECURSIVE], [ I think we normally put the _ at the end, i.e., M_START_TCPDUMPS_RECURSIVE_ >> + m4_if(m4_eval($# > 3), [1], [dnl >> + names="$names $3" >> + echo "Running podman exec $1 tcpdump -l $2 >$3.tcpdump 2>$3.stderr" >> + podman exec $1 tcpdump -l $2 >$3.tcpdump 2>$3.stderr & >> + echo "podman exec $1 ps -ef | grep -v grep | grep tcpdump && podman exec $1 killall tcpdump" >> cleanup >> + _M_START_TCPDUMPS_RECURSIVE(m4_shift(m4_shift(m4_shift($@)))) >> + ]) >> + ] >> +) >> + >> +# Start Multiple tcpdump. Useful to speed up when many tcpdump >> +# must be started as waiting for "listening" takes usually 1 second. >> +m4_define([M_START_TCPDUMPS], >> + [ >> + names="" >> + _M_START_TCPDUMPS_RECURSIVE($@) >> + for name in $names; do >> + OVS_WAIT_UNTIL([grep -q "listening" ${name}.stderr]) >> + done >> + ] >> +) >> >> # M_FORMAT_CT([ip-addr]) >> # >> diff --git a/tests/multinode.at b/tests/multinode.at >> index 6b9614126..24d7ca27c 100644 >> --- a/tests/multinode.at >> +++ b/tests/multinode.at >> @@ -2986,42 +2986,42 @@ AT_CLEANUP >> >> AT_SETUP([HA: Check for missing garp on leader when BFD goes back up]) >> # Network topology >> -# ┌────────────────────────────────────────────────────────────────────────────────────────────────────────┐ >> -# │ │ >> -# │ ┌───────────────────┐ ┌───────────────────┐ ┌───────────────────┐ ┌───────────────────┐ │ >> -# │ │ ovn-chassis-1 │ │ ovn-gw-1 │ │ ovn-gw-2 │ │ ovn-chassis-2 │ │ >> -# │ └─────────┬─────────┘ └───────────────────┘ └───────────────────┘ └───────────────────┘ │ >> -# │ ┌─────────┴─────────┐ │ >> -# │ │ inside1 │ │ >> -# │ │ 192.168.1.1/24 │ │ >> -# │ └─────────┬─────────┘ │ >> -# │ ┌─────────┴─────────┐ │ >> -# │ 
│ inside │ │ >> -# │ └─────────┬─────────┘ │ >> -# │ ┌─────────┴─────────┐ │ >> -# │ │ 192.168.1.254 │ │ >> -# │ │ R1 │ │ >> -# │ │ 192.168.0.254 │ │ >> -# │ └─────────┬─────────┘ │ >> -# │ └------eth1---------------┬--------eth1-----------┐ │ >> -# │ ┌──────────┴────────┐ ┌─────────┴─────────┐ │ >> -# │ │ 192.168.1.254 │ │ 192.168.1.254 │ │ >> -# │ │ R1 │ │ R1 │ │ >> -# │ │ 192.168.0.254 │ │ 192.168.0.254 │ │ >> -# │ └─────────┬─────────┘ └─────────┬─────────┘ │ >> -# │ │ │ ┌───────────────────┐ │ >> -# │ ┌─────────┴─────────┐ ┌─────────┴─────────┐ │ 192.168.0.1 │ │ >> -# │ │ outside │ │ outside │ │ ext1 │ │ >> -# │ └─────────┬─────────┘ └─────────┬─────────┘ └─────────┬─────────┘ │ >> -# │ ┌─────────┴─────────┐ ┌─────────┴─────────┐ ┌─────────┴─────────┐ │ >> -# │ │ ln-outside │ │ ln-outside │ │ ln-ext1 │ │ >> -# │ └─────────┬─────────┘ └─────────┬─────────┘ └─────────┬─────────┘ │ >> -# │ ┌─────────┴─────────┐ ┌─────────┴─────────┐ ┌─────────┴─────────┐ │ >> -# │ │ br-ex │ │ br-ex │ │ br-ex │ │ >> -# │ └─────────┬─────────┘ └─────────┬─────────┘ └─────────┬─────────┘ │ >> -# │ └---------eth2-----------┴-------eth2-------------┘ │ >> -# │ │ >> -# └────────────────────────────────────────────────────────────────────────────────────────────────────────┘ >> +# ┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ >> +# │ │ >> +# │ ┌───────────────────┐ ┌───────────────────┐ ┌───────────────────┐ ┌───────────────────┐ ┌───────────────────┐ │ >> +# │ │ ovn-chassis-1 │ │ ovn-chassis-2 │ │ ovn-gw-1 │ │ ovn-gw-2 │ │ ovn-chassis-3 │ │ >> +# │ └─────────┬─────────┘ └─────────┬─────────┘ └───────────────────┘ └───────────────────┘ └───────────────────┘ │ >> +# │ ┌─────────┴─────────┐ ┌─────────┴─────────┐ │ >> +# │ │ inside1 │ │ inside2 │ │ >> +# │ │ 192.168.1.1/24 │ │ 192.168.1.2/24 │ │ >> +# │ └─────────┬─────────┘ └─────────┬─────────┘ │ >> +# │ ┌─┴────────────────────────┴─┐ │ >> +# │ │ inside 
│ │ >> +# │ └──────────────┬─────────────┘ │ >> +# │ ┌─────────┴─────────┐ │ >> +# │ │ 192.168.1.254 │ │ >> +# │ │ R1 │ │ >> +# │ │ 192.168.0.254 │ │ >> +# │ └─────────┬─────────┘ │ >> +# │ └------eth1---------------------------┬--------eth1-----------┐ │ >> +# │ ┌──────────┴────────┐ ┌─────────┴─────────┐ │ >> +# │ │ 192.168.1.254 │ │ 192.168.1.254 │ │ >> +# │ │ R1 │ │ R1 │ │ >> +# │ │ 192.168.0.254 │ │ 192.168.0.254 │ │ >> +# │ └─────────┬─────────┘ └─────────┬─────────┘ │ >> +# │ │ │ ┌───────────────────┐ │ >> +# │ ┌─────────┴─────────┐ ┌─────────┴─────────┐ │ 192.168.0.1 │ │ >> +# │ │ outside │ │ outside │ │ ext1 │ │ >> +# │ └─────────┬─────────┘ └─────────┬─────────┘ └─────────┬─────────┘ │ >> +# │ ┌─────────┴─────────┐ ┌─────────┴─────────┐ ┌─────────┴─────────┐ │ >> +# │ │ ln-outside │ │ ln-outside │ │ ln-ext1 │ │ >> +# │ └─────────┬─────────┘ └─────────┬─────────┘ └─────────┬─────────┘ │ >> +# │ ┌─────────┴─────────┐ ┌─────────┴─────────┐ ┌─────────┴─────────┐ │ >> +# │ │ br-ex │ │ br-ex │ │ br-ex │ │ >> +# │ └─────────┬─────────┘ └─────────┬─────────┘ └─────────┬─────────┘ │ >> +# │ └---------eth2-----------┴-------eth2-------------┘ │ >> +# │ │ >> +# └────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ >> >> # The goal of this test is the check that GARP are properly generated by higest priority traffic when >> # BFD goes down, and back up, and this whether the BFD event is due either to some bfd packet lost >> @@ -3030,6 +3030,12 @@ AT_SETUP([HA: Check for missing garp on leader when BFD goes back up]) >> # So gw3 should in this test neither send garp or receive packets. >> # >> # Enable vconn so we can check the GARP from a log perspective. 
>> +on_exit "podman exec ovn-gw-1 ovn-appctl vlog/set info" >> +on_exit "podman exec ovn-gw-1 ovn-appctl vlog/enable-rate-limit" >> +on_exit "podman exec ovn-gw-2 ovn-appctl vlog/set info" >> +on_exit "podman exec ovn-gw-2 ovn-appctl vlog/enable-rate-limit" >> +on_exit "podman exec ovn-gw-3 ovn-appctl vlog/set info" >> +on_exit "podman exec ovn-gw-3 ovn-appctl vlog/enable-rate-limit" >> m_as ovn-gw-1 ovn-appctl vlog/set vconn:dbg >> m_as ovn-gw-2 ovn-appctl vlog/set vconn:dbg >> m_as ovn-gw-3 ovn-appctl vlog/set vconn:dbg >> @@ -3037,12 +3043,17 @@ m_as ovn-gw-1 ovn-appctl vlog/disable-rate-limit >> m_as ovn-gw-2 ovn-appctl vlog/disable-rate-limit >> m_as ovn-gw-3 ovn-appctl vlog/disable-rate-limit >> >> +# Decrease revalidation time on ovs switch simulating ToR. >> +on_exit "OVS_RUNDIR= ovs-vsctl set Open_vSwitch . other_config:max-revalidator=500" Isn't it better to just remove the key on_exit, just in case the default ever changes from 500 to something else? I.e.: on_exit "OVS_RUNDIR= ovs-vsctl remove Open_vSwitch . other_config max-revalidator" >> +OVS_RUNDIR= ovs-vsctl set Open_vSwitch . other_config:max-revalidator=100 >> + >> check_fake_multinode_setup >> >> # Delete the multinode NB and OVS resources before starting the test. 
>> cleanup_multinode_resources >> >> ip_ch1=$(m_as ovn-chassis-1 ip a show dev eth1 | grep "inet " | awk '{print $2}'| cut -d '/' -f1) >> +ip_ch2=$(m_as ovn-chassis-2 ip a show dev eth1 | grep "inet " | awk '{print $2}'| cut -d '/' -f1) >> ip_gw1=$(m_as ovn-gw-1 ip a show dev eth1 | grep "inet " | awk '{print $2}'| cut -d '/' -f1) >> ip_gw2=$(m_as ovn-gw-2 ip a show dev eth1 | grep "inet " | awk '{print $2}'| cut -d '/' -f1) >> ip_gw3=$(m_as ovn-gw-3 ip a show dev eth1 | grep "inet " | awk '{print $2}'| cut -d '/' -f1) >> @@ -3050,25 +3061,35 @@ ip_gw3=$(m_as ovn-gw-3 ip a show dev eth1 | grep "inet " | awk '{print $2}'| cut >> from_gw1_to_gw2=$(m_as ovn-gw-1 ovs-vsctl --bare --columns=name find interface options:remote_ip=$ip_gw2) >> from_gw1_to_gw3=$(m_as ovn-gw-1 ovs-vsctl --bare --columns=name find interface options:remote_ip=$ip_gw3) >> from_gw1_to_ch1=$(m_as ovn-gw-1 ovs-vsctl --bare --columns=name find interface options:remote_ip=$ip_ch1) >> +from_gw1_to_ch2=$(m_as ovn-gw-1 ovs-vsctl --bare --columns=name find interface options:remote_ip=$ip_ch2) >> from_gw2_to_gw1=$(m_as ovn-gw-2 ovs-vsctl --bare --columns=name find interface options:remote_ip=$ip_gw1) >> from_gw2_to_gw3=$(m_as ovn-gw-2 ovs-vsctl --bare --columns=name find interface options:remote_ip=$ip_gw3) >> from_gw2_to_ch1=$(m_as ovn-gw-2 ovs-vsctl --bare --columns=name find interface options:remote_ip=$ip_ch1) >> +from_gw2_to_ch2=$(m_as ovn-gw-2 ovs-vsctl --bare --columns=name find interface options:remote_ip=$ip_ch2) >> from_ch1_to_gw1=$(m_as ovn-chassis-1 ovs-vsctl --bare --columns=name find interface options:remote_ip=$ip_gw1) >> from_ch1_to_gw2=$(m_as ovn-chassis-1 ovs-vsctl --bare --columns=name find interface options:remote_ip=$ip_gw2) >> +from_ch2_to_gw1=$(m_as ovn-chassis-2 ovs-vsctl --bare --columns=name find interface options:remote_ip=$ip_gw1) >> +from_ch2_to_gw2=$(m_as ovn-chassis-2 ovs-vsctl --bare --columns=name find interface options:remote_ip=$ip_gw2) >> >> m_as ovn-chassis-1 ip link 
del hv1-vif1-p >> -m_as ovn-chassis-2 ip link del ext1-p >> +m_as ovn-chassis-2 ip link del hv2-vif1-p >> +m_as ovn-chassis-3 ip link del ext1-p >> >> OVS_WAIT_UNTIL([m_as ovn-chassis-1 ip link show | grep -q genev_sys]) >> OVS_WAIT_UNTIL([m_as ovn-chassis-2 ip link show | grep -q genev_sys]) >> +OVS_WAIT_UNTIL([m_as ovn-chassis-3 ip link show | grep -q genev_sys]) >> OVS_WAIT_UNTIL([m_as ovn-gw-1 ip link show | grep -q genev_sys]) >> OVS_WAIT_UNTIL([m_as ovn-gw-2 ip link show | grep -q genev_sys]) >> OVS_WAIT_UNTIL([m_as ovn-gw-3 ip link show | grep -q genev_sys]) >> >> +# Use "aggressive" bfd parameters >> +check multinode_nbctl set NB_Global . options:"bfd-min-rx"=500 >> +check multinode_nbctl set NB_Global . options:"bfd-min-tx"=100 >> check multinode_nbctl ls-add inside >> check multinode_nbctl ls-add outside >> check multinode_nbctl ls-add ext >> check multinode_nbctl lsp-add inside inside1 -- lsp-set-addresses inside1 "f0:00:c0:a8:01:01 192.168.1.1" >> +check multinode_nbctl lsp-add inside inside2 -- lsp-set-addresses inside2 "f0:00:c0:a8:01:02 192.168.1.2" >> check multinode_nbctl lsp-add ext ext1 -- lsp-set-addresses ext1 "00:00:c0:a8:00:01 192.168.0.1" >> >> multinode_nbctl create Logical_Router name=R1 >> @@ -3100,12 +3121,14 @@ m_as ovn-gw-3 ovs-vsctl remove open . external_ids garp-max-timeout-sec >> >> m_as ovn-chassis-1 ovs-vsctl set open . external-ids:ovn-bridge-mappings=public:br-ex >> m_as ovn-chassis-2 ovs-vsctl set open . external-ids:ovn-bridge-mappings=public:br-ex >> +m_as ovn-chassis-3 ovs-vsctl set open . external-ids:ovn-bridge-mappings=public:br-ex >> m_as ovn-gw-1 ovs-vsctl set open . external-ids:ovn-bridge-mappings=public:br-ex >> m_as ovn-gw-2 ovs-vsctl set open . external-ids:ovn-bridge-mappings=public:br-ex >> m_as ovn-gw-3 ovs-vsctl set open . 
external-ids:ovn-bridge-mappings=public:br-ex >> >> m_as ovn-chassis-1 /data/create_fake_vm.sh inside1 hv1-vif1 f0:00:c0:a8:01:01 1500 192.168.1.1 24 192.168.1.254 2000::1/64 2000::a >> -m_as ovn-chassis-2 /data/create_fake_vm.sh ext1 ext1 00:00:c0:a8:00:01 1500 192.168.0.1 24 192.168.0.254 1000::3/64 1000::a >> +m_as ovn-chassis-2 /data/create_fake_vm.sh inside2 hv2-vif1 f0:00:c0:a8:01:02 1500 192.168.1.2 24 192.168.1.254 2000::2/64 2000::a >> +m_as ovn-chassis-3 /data/create_fake_vm.sh ext1 ext1 00:00:c0:a8:00:01 1500 192.168.0.1 24 192.168.0.254 1000::3/64 1000::a >> >> # There should be one ha_chassis_group with the name "R1_outside" >> m_check_row_count HA_Chassis_Group 1 name=R1_outside >> @@ -3160,53 +3183,67 @@ for chassis in $from_ch1_to_gw1 $from_ch1_to_gw2; do >> wait_bfd_enabled ovn-chassis-1 $chassis >> done >> >> +# check BFD enablement on tunnel ports from ovn-chassis-2 ########### >> +for chassis in $from_ch2_to_gw1 $from_ch2_to_gw2; do >> + echo "checking ovn-chassis-2 -> $chassis" >> + wait_bfd_enabled ovn-chassis-2 $chassis >> +done >> + >> # Make sure there is no nft table left. Do not use nft directly as might not be installed in container. 
>> gw1_pid=$(podman inspect -f '{{.State.Pid}}' ovn-gw-1) >> nsenter --net=/proc/$gw1_pid/ns/net nft list tables | grep ovn-test && nsenter --net=/proc/$gw1_pid/ns/net nft delete table ip ovn-test >> -on_exit "nsenter --net=/proc/$gw1_pid/ns/net nft list tables | grep ovn-test && nsenter --net=/proc/$gw1_pid/ns/net nft delete table ip ovn-test" >> +on_exit "if [[ -d "/proc/$gw1_pid" ]]; then nsenter --net=/proc/$gw1_pid/ns/net nft list tables | grep ovn-test && nsenter --net=/proc/$gw1_pid/ns/net nft delete table ip ovn-test; fi" >> >> -for chassis in $from_gw1_to_gw2 $from_gw1_to_gw3 $from_gw1_to_ch1; do >> +for chassis in $from_gw1_to_gw2 $from_gw1_to_gw3 $from_gw1_to_ch1 $from_gw1_to_ch2; do >> wait_bfd_up ovn-gw-1 $chassis >> done >> -for chassis in $from_gw2_to_gw1 $from_gw2_to_gw3 $from_gw2_to_ch1; do >> +for chassis in $from_gw2_to_gw1 $from_gw2_to_gw3 $from_gw2_to_ch1 $from_gw2_to_ch2; do >> wait_bfd_up ovn-gw-2 $chassis >> done >> for chassis in $from_ch1_to_gw1 $from_ch1_to_gw2; do >> wait_bfd_up ovn-chassis-1 $chassis >> done >> +for chassis in $from_ch2_to_gw1 $from_ch2_to_gw2; do >> + wait_bfd_up ovn-chassis-2 $chassis >> +done >> >> m_wait_row_count Port_Binding 1 logical_port=cr-R1_outside chassis=$gw1_chassis >> check multinode_nbctl --wait=hv sync >> >> start_tcpdump() { >> echo "$(date +%H:%M:%S.%03N) Starting tcpdump" >> - M_START_TCPDUMP([ovn-chassis-1], [-neei hv1-vif1-p], [ch1]) >> - M_START_TCPDUMP([ovn-chassis-2], [-neei eth2], [ch2]) >> - M_START_TCPDUMP([ovn-gw-1], [-neei eth2], [gw1]) >> - M_START_TCPDUMP([ovn-gw-1], [-neei eth2 -Q out], [gw1_out]) >> - M_START_TCPDUMP([ovn-gw-2], [-neei eth2], [gw2]) >> - M_START_TCPDUMP([ovn-gw-2], [-neei eth2 -Q out], [gw2_out]) >> - M_START_TCPDUMP([ovn-gw-3], [-neei eth2], [gw3]) >> - M_START_TCPDUMP([ovn-gw-3], [-neei eth2 -Q out], [gw3_out]) >> + M_START_TCPDUMPS([ovn-chassis-1], [-neei hv1-vif1-p], [ch1], >> + [ovn-chassis-2], [-neei hv2-vif1-p], [ch2], >> + [ovn-chassis-3], [-neei eth2], [ch3], 
>> + [ovn-gw-1], [-neei eth2], [gw1], >> + [ovn-gw-1], [-neei eth2 -Q out], [gw1_out], >> + [ovn-gw-2], [-neei eth2], [gw2], >> + [ovn-gw-2], [-neei eth2 -Q out], [gw2_out], >> + [ovn-gw-3], [-neei eth2], [gw3], >> + [ovn-gw-3], [-neei eth2 -Q out], [gw3_out], >> + [ovn-gw-1], [-neei eth1], [gw1_eth1], >> + [ovn-gw-2], [-neei eth1], [gw2_eth1], >> + [ovn-chassis-1], [-neei eth1], [ch1_eth1], >> + [ovn-chassis-2], [-neei eth1], [ch2_eth1]) >> } >> >> stop_tcpdump() { >> echo "$(date +%H:%M:%S.%03N) Stopping tcpdump" >> - m_kill 'ovn-gw-1 ovn-gw-2 ovn-gw-3 ovn-chassis-1 ovn-chassis-2' tcpdump >> + m_kill 'ovn-gw-1 ovn-gw-2 ovn-gw-3 ovn-chassis-1 ovn-chassis-2 ovn-chassis-3' tcpdump >> } >> >> -# Send packets from chassis2 (ext1) to chassis1 >> +# Send packets from ovn-chassis-3 (ext1) to ovn-chassis-1 >> send_background_packets() { >> echo "$(date +%H:%M:%S.%03N) Sending packets in Background" >> start_tcpdump >> - M_NS_DAEMONIZE([ovn-chassis-2], [ext1], [ping -f -i 0.1 192.168.1.1], [ping.pid]) >> + M_NS_DAEMONIZE([ovn-chassis-3], [ext1], [ping -f -i 0.1 192.168.1.1], [ping.pid]) >> } >> >> stop_sending_background_packets() { >> echo "$(date +%H:%M:%S.%03N) Stopping Background process" >> m_as ovn-chassis-1 ps -ef | grep -v grep | grep -q ping && \ >> m_as ovn-chassis-1 echo "Stopping ping on ovn-chassis-1" && killall ping >> - m_as ovn-chassis-2 ps -ef | grep -v grep | grep -q ping && \ >> + m_as ovn-chassis-3 ps -ef | grep -v grep | grep -q ping && \ >> m_as ovn-chassis-2 echo "Stopping ping on ovn-chassis-2" && killall ping >> stop_tcpdump >> } >> @@ -3216,8 +3253,8 @@ check_for_new_garps() { >> expecting_garp=$2 >> n_new_garps=$(cat ${hv}_out.tcpdump | grep -c "f0:00:c0:a8:00:fe > Broadcast, ethertype ARP (0x0806), length 42: Request who-has 192.168.0.254 tell 192.168.0.254, length 28") >> >> - if [ "$expecting_garp" == "true" ]; then >> - AS_BOX([$(date +%H:%M:%S.%03N) Waiting/checking for garp from $hv - Starting with $n_new_garps]) >> + if [[ 
"$expecting_garp" == "true" ]]; then >> + echo "$(date +%H:%M:%S.%03N) Waiting/checking for garp from $hv - Starting with $n_new_garps" >> OVS_WAIT_UNTIL([ >> n_garps=$n_new_garps >> n_new_garps=$(cat ${hv}_out.tcpdump | grep -c "f0:00:c0:a8:00:fe > Broadcast, ethertype ARP (0x0806), length 42: Request who-has 192.168.0.254 tell 192.168.0.254, length 28") >> @@ -3225,7 +3262,7 @@ check_for_new_garps() { >> test "$n_garps" -ne "$n_new_garps" >> ]) >> else >> - AS_BOX([$(date +%H:%M:%S.%03N) Checking no garp from ${hv}]) >> + echo "$(date +%H:%M:%S.%03N) Checking no garp from ${hv}" >> # Waiting a few seconds to get a chance to see unexpected garps. >> sleep 3 >> n_garps=$(cat ${hv}_out.tcpdump | grep -c "f0:00:c0:a8:00:fe > Broadcast, ethertype ARP (0x0806), length 42: Request who-has 192.168.0.254 tell 192.168.0.254, length 28") >> @@ -3241,8 +3278,8 @@ check_for_new_echo_pkts() { >> n_new_echo_req=$(cat ${hv}.tcpdump | grep -c "$mac_src > $mac_dst, ethertype IPv4 (0x0800), length 98: 192.168.0.1 > 192.168.1.1: ICMP echo request") >> n_new_echo_rep=$(cat ${hv}.tcpdump | grep -c "$mac_dst > $mac_src, ethertype IPv4 (0x0800), length 98: 192.168.1.1 > 192.168.0.1: ICMP echo reply") >> >> - if [ "$expecting_pkts" == "true" ]; then >> - AS_BOX([$(date +%H:%M:%S.%03N) Waiting/checking for echo pkts through ${hv}]) >> + if [[ "$expecting_pkts" == "true" ]]; then >> + echo "$(date +%H:%M:%S.%03N) Waiting/checking for echo pkts through ${hv}" >> echo "Starting with $n_new_echo_req requests and $n_new_echo_rep replies so far on ${hv}." >> OVS_WAIT_UNTIL([ >> n_echo_req=$n_new_echo_req >> @@ -3253,7 +3290,7 @@ check_for_new_echo_pkts() { >> test "$n_echo_req" -ne "$n_new_echo_req" && test "$n_echo_rep" -ne "$n_new_echo_rep" >> ]) >> else >> - AS_BOX([$(date +%H:%M:%S.%03N) Checking no pkts from ${hv}]) >> + echo "$(date +%H:%M:%S.%03N) Checking no pkts from ${hv}" >> # Waiting a few seconds to get a chance to see unexpected pkts. 
>> sleep 3 >> n_echo_req=$(cat ${hv}.tcpdump | grep -c "$mac_src > $mac_dst, ethertype IPv4 (0x0800), length 98: 192.168.0.1 > 192.168.1.1: ICMP echo request") >> @@ -3271,22 +3308,44 @@ dump_statistics() { >> ch1_rep=$(grep -c "ICMP echo reply" ch1.tcpdump) >> ch2_req=$(grep -c "ICMP echo request" ch2.tcpdump) >> ch2_rep=$(grep -c "ICMP echo reply" ch2.tcpdump) >> + ch3_req=$(grep -c "ICMP echo request" ch3.tcpdump) >> + ch3_rep=$(grep -c "ICMP echo reply" ch3.tcpdump) >> gw1_req=$(grep -c "ICMP echo request" gw1.tcpdump) >> gw1_rep=$(grep -c "ICMP echo reply" gw1.tcpdump) >> gw2_req=$(grep -c "ICMP echo request" gw2.tcpdump) >> gw2_rep=$(grep -c "ICMP echo reply" gw2.tcpdump) >> gw3_req=$(grep -c "ICMP echo request" gw3.tcpdump) >> gw3_rep=$(grep -c "ICMP echo reply" gw3.tcpdump) >> - echo "$n1 claims in gw1, $n2 in gw2 and $n3 on gw3" >> - echo "ch2_request=$ch2_req gw1_request=$gw1_req gw2_request=$gw2_req gw3_request=$gw3_req ch1_request=$ch1_req ch1_reply=$ch1_rep gw1_reply=$gw1_rep gw2_reply=$gw2_rep gw3_reply=$gw3_rep ch2_reply=$ch2_rep" >> + echo "$n1 claims in gw1, $n2 in gw2 and $n3 on gw3" >&2 >> + echo "ch3_req=$ch3_req gw_req=($gw1_req + $gw2_req +$gw3_req) ch1_req=$ch1_req ch1_rep=$ch1_rep gw_rep=($gw1_rep + $gw2_rep + $gw3_rep) ch3_rep=$ch3_rep ch2=($ch2_req+$ch2_rep)" >&2 >> + echo "$((ch3_req - ch3_rep))" >> } >> >> -check_migration_between_gw1_and_gw2() { >> - action=$1 >> - send_background_packets >> +add_port() { >> + bridge=$1 >> + interface=$2 >> + address=$3 >> + echo "Adding $bridge $interface $address" >> + >> + pid=$(podman inspect -f '{{.State.Pid}}' ovn-gw-1) >> + ln -sf /proc/$pid/ns/net /var/run/netns/$pid >> + port=$(OVS_RUNDIR= ovs-vsctl --data=bare --no-heading --columns=name find interface \ >> + external_ids:container_id=ovn-gw-1 external_ids:container_iface="$interface") >> + port="${port:0:13}" >> + ip link add "${port}_l" type veth peer name "${port}_c" >> + ip link set "${port}_l" up >> + ip link set "${port}_c" netns $pid >> 
+ ip netns exec $pid ip link set dev "${port}_c" name "$interface" >> + ip netns exec $pid ip link set "$interface" up >> + if [[ -n "$address" ]]; then >> + ip netns exec $pid ip addr add "$address" dev "$interface" >> + fi I might be wrong but I think nobody cleans up any of these ports created in the ovn-gw-1 container. Do we need some on_exit() calls here? >> +} >> >> +prepare() { >> + send_background_packets >> # We make sure gw1 is leader since enough time that it generated all its garps. >> - AS_BOX([$(date +%H:%M:%S.%03N) Waiting all garps sent by gw1]) >> + echo $(date +%H:%M:%S.%03N) Waiting all garps sent by gw1 >> n_new_garps=$(cat gw1_out.tcpdump | grep -c "f0:00:c0:a8:00:fe > Broadcast, ethertype ARP (0x0806), length 42: Request who-has 192.168.0.254 tell 192.168.0.254, length 28") >> OVS_WAIT_UNTIL([ >> n_garps=$n_new_garps >> @@ -3302,130 +3361,267 @@ check_migration_between_gw1_and_gw2() { >> check_for_new_echo_pkts gw2 "00:00:c0:a8:00:01" "f0:00:c0:a8:00:fe" "false" >> check_for_new_echo_pkts gw3 "00:00:c0:a8:00:01" "f0:00:c0:a8:00:fe" "false" >> >> + # All packets should go through gw1, and none through gw2 or gw3. >> + check_packets "true" "false" "false" "true" >> flap_count_gw_1=$(m_as ovn-gw-1 ovs-vsctl get interface $from_gw1_to_gw2 bfd_status | sed 's/.*flap_count=\"\([[0-9]]*\).*/\1/g') >> flap_count_gw_2=$(m_as ovn-gw-2 ovs-vsctl get interface $from_gw2_to_gw1 bfd_status | sed 's/.*flap_count=\"\([[0-9]]*\).*/\1/g') >> +} >> >> - if [ test "$action" == "stop_bfd" ]; then >> - AS_BOX([$(date +%H:%M:%S.%03N) Blocking bfd on gw1 (from $ip_gw1 to $ip_gw2)]) >> - nsenter --net=/proc/$gw1_pid/ns/net nft add table ip ovn-test >> - nsenter --net=/proc/$gw1_pid/ns/net nft 'add chain ip ovn-test INPUT { type filter hook input priority 0; policy accept; }' >> - # Drop BFD from gw-1 to gw-2: geneve port (6081), inner port 3784 (0xec8), Session state Up, Init, Down. 
>> - nsenter --net=/proc/$gw1_pid/ns/net nft add rule ip ovn-test INPUT ip daddr $ip_gw1 ip saddr $ip_gw2 udp dport 6081 '@th,416,16 == 0x0ec8 @th,472,8 == 0xc0 counter drop' >> - nsenter --net=/proc/$gw1_pid/ns/net nft add rule ip ovn-test INPUT ip daddr $ip_gw1 ip saddr $ip_gw2 udp dport 6081 '@th,416,16 == 0x0ec8 @th,472,8 == 0x80 counter drop' >> - nsenter --net=/proc/$gw1_pid/ns/net nft add rule ip ovn-test INPUT ip daddr $ip_gw1 ip saddr $ip_gw2 udp dport 6081 '@th,416,16 == 0x0ec8 @th,472,8 == 0x40 counter drop' >> - >> - # We do not check that packets go through gw2 as BFD between chassis-2 and gw1 is still up >> - fi >> - >> - if [ test "$action" == "kill_gw2" ]; then >> - AS_BOX([$(date +%H:%M:%S.%03N) Killing gw2 ovn-controller]) >> - on_exit 'm_as ovn-gw-2 /usr/share/openvswitch/scripts/ovs-ctl status || >> - m_as ovn-gw-2 /usr/share/openvswitch/scripts/ovs-ctl start --system-id=ovn-gw-2' >> - on_exit 'm_as ovn-gw-2 /usr/share/ovn/scripts/ovn-ctl status_controller || >> - m_as ovn-gw-2 /usr/share/ovn/scripts/ovn-ctl start_controller ${CONTROLLER_SSL_ARGS}' >> - >> - m_as ovn-gw-2 kill -9 $(m_as ovn-gw-2 cat /run/ovn/ovn-controller.pid) >> - m_as ovn-gw-2 kill -9 $(m_as ovn-gw-2 cat /run/openvswitch/ovs-vswitchd.pid) >> - m_as ovn-gw-2 kill -9 $(m_as ovn-gw-2 cat /run/openvswitch/ovsdb-server.pid) >> - # Also delete datapath (flows) >> - m_as ovn-gw-2 ovs-dpctl del-dp system@ovs-system >> - fi >> - >> - if [ test "$action" == "kill_gw1" ]; then >> - AS_BOX([$(date +%H:%M:%S.%03N) Killing gw1 ovn-controller]) >> - on_exit 'm_as ovn-gw-1 /usr/share/openvswitch/scripts/ovs-ctl status || >> - m_as ovn-gw-1 /usr/share/openvswitch/scripts/ovs-ctl start --system-id=ovn-gw-1' >> - on_exit 'm_as ovn-gw-1 /usr/share/ovn/scripts/ovn-ctl status_controller || >> - m_as ovn-gw-1 /usr/share/ovn/scripts/ovn-ctl start_controller ${CONTROLLER_SSL_ARGS}' >> - >> - m_as ovn-gw-1 kill -9 $(m_as ovn-gw-1 cat /run/ovn/ovn-controller.pid) >> - m_as ovn-gw-1 kill -9 $(m_as 
ovn-gw-1 cat /run/openvswitch/ovs-vswitchd.pid) >> - m_as ovn-gw-1 kill -9 $(m_as ovn-gw-1 cat /run/openvswitch/ovsdb-server.pid) >> - # Also delete datapath (flows) >> - m_as ovn-gw-1 ovs-dpctl del-dp system@ovs-system >> - fi >> +check_loss_after_flap() >> +{ >> + dead=$1 >> + max_expected_loss=$2 >> >> - if [ test "$action" == "kill_gw2" ]; then >> - AS_BOX([$(date +%H:%M:%S.%03N) Waiting for flap count between gw1 and gw2 to increase]) >> + if [[ "$dead" == "gw2" ]]; then >> + echo "$(date +%H:%M:%S.%03N) Waiting for flap count between gw1 and gw2 to increase" >> OVS_WAIT_UNTIL([ >> new_flap_count=$(m_as ovn-gw-1 ovs-vsctl get interfac $from_gw1_to_gw2 bfd_status | sed 's/.*flap_count=\"\([[0-9]]*\).*/\1/g') >> echo "Comparing $new_flap_count versus $flap_count_gw_1" >> test "$new_flap_count" -gt "$((flap_count_gw_1))" >> ]) >> else >> - AS_BOX([$(date +%H:%M:%S.%03N) Waiting for flap count between gw2 and gw1 to increase]) >> + echo "$(date +%H:%M:%S.%03N) Waiting for flap count between gw2 and gw1 to increase" >> OVS_WAIT_UNTIL([ >> new_flap_count=$(m_as ovn-gw-2 ovs-vsctl get interfac $from_gw2_to_gw1 bfd_status | sed 's/.*flap_count=\"\([[0-9]]*\).*/\1/g') >> echo "Comparing $new_flap_count versus $flap_count_gw_2" >> test "$new_flap_count" -gt "$((flap_count_gw_2))" >> ]) >> - >> fi >> - AS_BOX([$(date +%H:%M:%S.%03N) Flapped!]) >> >> + echo "$(date +%H:%M:%S.%03N) Flapped!" >> # Wait a few more second for the fight. 
>> + sleep 4 >> + >> + echo "$(date +%H:%M:%S.%03N) Statistics after flapping" >> + lost=$(dump_statistics) >> + echo "===> $lost packet lost while handling migration" >> + AT_CHECK([test "$lost" -le "$max_expected_loss"]) >> +} >> + >> +final_check() >> +{ >> + action=$1 >> + lost=$2 >> + max_expected_loss_after_restoration=$3 >> + >> + # Wait a little more to get packets while network is restored >> sleep 2 >> - AS_BOX([$(date +%H:%M:%S.%03N) Statistics after flapping]) >> - dump_statistics >> - >> - if [ test "$action" == "stop_bfd" ]; then >> - # gw1 still alive and gw2 tried to claim => gw1 should restart generating garps. >> - check_for_new_garps gw1 "true" >> - check_for_new_garps gw2 "false" >> - check_for_new_garps gw3 "false" >> - check_for_new_echo_pkts gw1 00:00:c0:a8:00:01 f0:00:c0:a8:00:fe "true" >> - check_for_new_echo_pkts gw2 00:00:c0:a8:00:01 f0:00:c0:a8:00:fe "false" >> - check_for_new_echo_pkts gw3 00:00:c0:a8:00:01 f0:00:c0:a8:00:fe "false" >> - check_for_new_echo_pkts ch1 f0:00:c0:a8:01:fe f0:00:c0:a8:01:01 "true" >> - AS_BOX([$(date +%H:%M:%S.%03N) Unblocking bfd on gw1]) >> - nsenter --net=/proc/$gw1_pid/ns/net nft -a list ruleset >> - nsenter --net=/proc/$gw1_pid/ns/net nft delete table ip ovn-test >> - fi >> + echo "$(date +%H:%M:%S.%03N) Statistics after network restored (after $action)" >> + new_lost=$(dump_statistics) >> + echo "===> $((new_lost - lost)) packets lost during network restoration" >> + AT_CHECK([test "$((new_lost - lost))" -le "$max_expected_loss_after_restoration"]) >> + stop_sending_background_packets >> +} >> >> - if [ test "$action" == "kill_gw2" ]; then >> - # gw1 still alive, but gw2 did not try to claim => gw1 should not generate new garps. 
>> - check_for_new_garps gw1 "false" >> - check_for_new_garps gw2 "false" >> - check_for_new_garps gw3 "false" >> - check_for_new_echo_pkts gw1 00:00:c0:a8:00:01 f0:00:c0:a8:00:fe "true" >> - check_for_new_echo_pkts gw2 00:00:c0:a8:00:01 f0:00:c0:a8:00:fe "false" >> - check_for_new_echo_pkts gw3 00:00:c0:a8:00:01 f0:00:c0:a8:00:fe "false" >> - check_for_new_echo_pkts ch1 f0:00:c0:a8:01:fe f0:00:c0:a8:01:01 "true" >> - AS_BOX([$(date +%H:%M:%S.%03N) Restarting gw2 ovn-vswitchd]) >> - m_as ovn-gw-2 /usr/share/openvswitch/scripts/ovs-ctl start --system-id=ovn-gw-2 >> - >> - AS_BOX([$(date +%H:%M:%S.%03N) Restarting gw2 ovn-controller]) >> - m_as ovn-gw-2 /usr/share/ovn/scripts/ovn-ctl start_controller ${CONTROLLER_SSL_ARGS} >> - fi >> +check_garps() >> +{ >> + check_for_new_garps gw1 "$1" >> + check_for_new_garps gw2 "$2" >> + check_for_new_garps gw3 "$3" >> +} >> >> - if [ test "$action" == "kill_gw1" ]; then >> - # gw1 died => gw2 should generate garps. >> - check_for_new_garps gw1 "false" >> - check_for_new_garps gw2 "true" >> - check_for_new_garps gw3 "false" >> - check_for_new_echo_pkts gw1 00:00:c0:a8:00:01 f0:00:c0:a8:00:fe "false" >> - check_for_new_echo_pkts gw2 00:00:c0:a8:00:01 f0:00:c0:a8:00:fe "true" >> - check_for_new_echo_pkts gw3 00:00:c0:a8:00:01 f0:00:c0:a8:00:fe "false" >> - check_for_new_echo_pkts ch1 f0:00:c0:a8:01:fe f0:00:c0:a8:01:01 "true" >> - AS_BOX([$(date +%H:%M:%S.%03N) Restarting gw1 ovn-vswitchd]) >> - m_as ovn-gw-1 /usr/share/openvswitch/scripts/ovs-ctl start --system-id=ovn-gw-1 >> - >> - AS_BOX([$(date +%H:%M:%S.%03N) Restarting gw1 ovn-controller]) >> - m_as ovn-gw-1 /usr/share/ovn/scripts/ovn-ctl start_controller ${CONTROLLER_SSL_ARGS} >> - fi >> +check_packets() >> +{ >> + check_for_new_echo_pkts gw1 00:00:c0:a8:00:01 f0:00:c0:a8:00:fe "$1" >> + check_for_new_echo_pkts gw2 00:00:c0:a8:00:01 f0:00:c0:a8:00:fe "$2" >> + check_for_new_echo_pkts gw3 00:00:c0:a8:00:01 f0:00:c0:a8:00:fe "$3" >> + check_for_new_echo_pkts ch1 
f0:00:c0:a8:01:fe f0:00:c0:a8:01:01 "$4" >> +} >> + >> +check_migration_between_gw1_and_gw2_bfd_stop() >> +{ >> + AS_BOX([$(date +%H:%M:%S.%03N) Testing migration after bfd_stop]) >> + max_expected_loss1=$1 >> + max_expected_loss2=$2 >> + prepare >> + >> + echo "$(date +%H:%M:%S.%03N) Blocking bfd on gw1 (from $ip_gw1 to $ip_gw2)" >> + nsenter --net=/proc/$gw1_pid/ns/net nft add table ip ovn-test >> + nsenter --net=/proc/$gw1_pid/ns/net nft 'add chain ip ovn-test INPUT { type filter hook input priority 0; policy accept; }' >> + # Drop BFD from gw-1 to gw-2: geneve port (6081), inner port 3784 (0xec8), Session state Up, Init, Down. >> + nsenter --net=/proc/$gw1_pid/ns/net nft add rule ip ovn-test INPUT ip daddr $ip_gw1 ip saddr $ip_gw2 udp dport 6081 '@th,416,16 == 0x0ec8 @th,472,8 == 0xc0 counter drop' >> + nsenter --net=/proc/$gw1_pid/ns/net nft add rule ip ovn-test INPUT ip daddr $ip_gw1 ip saddr $ip_gw2 udp dport 6081 '@th,416,16 == 0x0ec8 @th,472,8 == 0x80 counter drop' >> + nsenter --net=/proc/$gw1_pid/ns/net nft add rule ip ovn-test INPUT ip daddr $ip_gw1 ip saddr $ip_gw2 udp dport 6081 '@th,416,16 == 0x0ec8 @th,472,8 == 0x40 counter drop' >> + >> + check_loss_after_flap "gw1" $max_expected_loss1 >> + >> + # gw1 still alive and gw2 tried to claim => gw1 should restart generating garps. >> + check_garps "true" "false" "false" >> + check_packets "true" "false" "false" "true" >> + >> + echo "$(date +%H:%M:%S.%03N) Unblocking bfd on gw1" >> + nsenter --net=/proc/$gw1_pid/ns/net nft -a list ruleset >> + nsenter --net=/proc/$gw1_pid/ns/net nft delete table ip ovn-test >> >> # The network is now restored => packets should go through gw1 and reach chassis-1. 
- check_for_new_echo_pkts gw1 00:00:c0:a8:00:01 f0:00:c0:a8:00:fe "true" >> - check_for_new_echo_pkts gw2 00:00:c0:a8:00:01 f0:00:c0:a8:00:fe "false" >> - check_for_new_echo_pkts gw3 00:00:c0:a8:00:01 f0:00:c0:a8:00:fe "false" >> - check_for_new_echo_pkts ch1 f0:00:c0:a8:01:fe f0:00:c0:a8:01:01 "true" >> - AS_BOX([$(date +%H:%M:%S.%03N) Statistics after network restored]) >> - dump_statistics >> - stop_sending_background_packets >> + check_packets "true" "false" "false" "true" >> + final_check "bfd_stop" $lost $max_expected_loss2 >> +} >> + >> +check_migration_between_gw1_and_gw2_kill_gw2() { >> + AS_BOX([$(date +%H:%M:%S.%03N) Check migration after killing gw2 ovn-controller & vswitchd]) >> + max_expected_loss1=$1 >> + max_expected_loss2=$2 >> + prepare >> + >> + on_exit 'm_as ovn-gw-2 /usr/share/openvswitch/scripts/ovs-ctl status || >> + m_as ovn-gw-2 /usr/share/openvswitch/scripts/ovs-ctl start --system-id=ovn-gw-2' >> + on_exit 'm_as ovn-gw-2 /usr/share/ovn/scripts/ovn-ctl status_controller || >> + m_as ovn-gw-2 /usr/share/ovn/scripts/ovn-ctl start_controller ${CONTROLLER_SSL_ARGS}' >> + >> + m_as ovn-gw-2 kill -9 $(m_as ovn-gw-2 cat /run/ovn/ovn-controller.pid) >> + m_as ovn-gw-2 kill -9 $(m_as ovn-gw-2 cat /run/openvswitch/ovs-vswitchd.pid) >> + m_as ovn-gw-2 kill -9 $(m_as ovn-gw-2 cat /run/openvswitch/ovsdb-server.pid) >> + m_as ovn-gw-2 ovs-dpctl del-dp system@ovs-system >> + >> + check_loss_after_flap "gw2" $max_expected_loss1 >> + >> + # gw1 still alive, but gw2 did not try to claim => gw1 should not generate new garps. 
>> + check_garps "false" "false" "false" >> + check_packets "true" "false" "false" "true" >> + >> + echo "$(date +%H:%M:%S.%03N) Restarting gw2 ovn-vswitchd" >> + m_as ovn-gw-2 /usr/share/openvswitch/scripts/ovs-ctl start --system-id=ovn-gw-2 >> + >> + echo "$(date +%H:%M:%S.%03N) Restarting gw2 ovn-controller" >> + m_as ovn-gw-2 /usr/share/ovn/scripts/ovn-ctl start_controller ${CONTROLLER_SSL_ARGS} >> + >> + # The network is now restored => packets should go through gw1 and reach chassis-1. >> + check_packets "true" "false" "false" "true" >> + final_check "kill_gw2" $lost $max_expected_loss2 >> +} >> + >> +check_migration_between_gw1_and_gw2_update_ovs() { >> + AS_BOX([$(date +%H:%M:%S.%03N) Check migration after restarting gw1 ovs-vswitchd ("update")]) >> + max_expected_loss1=$1 >> + max_expected_loss2=$2 >> + prepare >> + >> + m_as ovn-gw-1 /usr/share/openvswitch/scripts/ovs-ctl restart --system-id=ovn-gw-1 >> + >> + check_loss_after_flap "gw1" $max_expected_loss1 >> + >> + # The network is now restored => packets should go through gw1 and reach chassis-1. 
>> + check_packets "true" "false" "false" "true" >> + final_check "ovs_update" $lost $max_expected_loss2 >> +} >> + >> +check_migration_between_gw1_and_gw2_kill_gw1() { >> + AS_BOX([$(date +%H:%M:%S.%03N) Killing gw1 ovn-controller and ovs-vswitchd]) >> + max_expected_loss1=$1 >> + max_expected_loss2=$2 >> + prepare >> + >> + on_exit 'm_as ovn-gw-1 /usr/share/openvswitch/scripts/ovs-ctl status || >> + m_as ovn-gw-1 /usr/share/openvswitch/scripts/ovs-ctl start --system-id=ovn-gw-1' >> + on_exit 'm_as ovn-gw-1 /usr/share/ovn/scripts/ovn-ctl status_controller || >> + m_as ovn-gw-1 /usr/share/ovn/scripts/ovn-ctl start_controller ${CONTROLLER_SSL_ARGS}' >> + >> + m_as ovn-gw-1 kill -9 $(m_as ovn-gw-1 cat /run/ovn/ovn-controller.pid) >> + m_as ovn-gw-1 kill -9 $(m_as ovn-gw-1 cat /run/openvswitch/ovs-vswitchd.pid) >> + m_as ovn-gw-1 kill -9 $(m_as ovn-gw-1 cat /run/openvswitch/ovsdb-server.pid) >> + # Also delete datapath (flows) >> + m_as ovn-gw-1 ovs-dpctl del-dp system@ovs-system >> + >> + check_loss_after_flap "gw1" $max_expected_loss1 >> + >> + # gw1 died => gw2 should generate garps. >> + check_garps "false" "true" "false" >> + check_packets "false" "true" "false" "true" >> + echo "$(date +%H:%M:%S.%03N) Restarting gw1 ovn-vswitchd after killing gw1" >> + m_as ovn-gw-1 /usr/share/openvswitch/scripts/ovs-ctl start --system-id=ovn-gw-1 >> + >> + # Wait some long time before restarting ovn-controller >> + sleep 10 >> + >> + # gw2 should still be handling packets as OVN not restarted on gw1 >> + check_packets "false" "true" "false" "true" >> + >> + echo "$(date +%H:%M:%S.%03N) Restarting gw1 ovn-controller after killing gw1" >> + m_as ovn-gw-1 /usr/share/ovn/scripts/ovn-ctl start_controller ${CONTROLLER_SSL_ARGS} >> + >> + # The network is now restored => packets should go through gw1 and reach chassis-1. 
>> + check_packets "true" "false" "false" "true" >> + final_check "kill_gw1" $lost $max_expected_loss2 >> +} >> + >> +check_migration_between_gw1_and_gw2_reboot_gw1() { >> + ip_gw1_eth1=$(podman exec ovn-gw-1 ip -brief address show eth1 | awk '{print $3}' | cut -d/ -f1) >> + cidr=$(podman exec ovn-gw-1 ip -brief address show eth1 | awk '{print $3}' | cut -d/ -f2) >> + AS_BOX([$(date +%H:%M:%S.%03N) Rebooting ovn-gw-1 with $ip_gw1_eth1/$cidr]) >> + max_expected_loss1=$1 >> + max_expected_loss2=$2 >> + prepare >> + >> + podman stop -t 0 ovn-gw-1 >> + (exec 3>&- 4>&- 5>&- 6>&-; podman start ovn-gw-1) >> + >> + add_port br-ovn eth1 $ip_gw1_eth1/$cidr >> + add_port br-ovn-ext eth2 >> + M_START_TCPDUMPS([ovn-gw-1], [-neei eth2], [gw1], [ovn-gw-1], [-neei eth1], [gw1_eth1], [ovn-gw-1], [-neei eth2 -Q out], [gw1_out]) >> + check_loss_after_flap "gw1" $max_expected_loss1 >> + >> + # gw1 died => gw2 should generate garps. >> + check_garps "false" "true" "false" >> + check_packets "false" "true" "false" "true" >> + >> + echo "$(date +%H:%M:%S.%03N) Restarting gw1 ovn-vswitchd after rebooting gw1" >> + m_as ovn-gw-1 /usr/share/openvswitch/scripts/ovs-ctl start --system-id=ovn-gw-1 >> + >> + # Wait some long time before restarting ovn-controller >> + sleep 10 >> + >> + # gw2 should still be handling packets as OVN not restarted on gw1 >> + check_packets "false" "true" "false" "true" >> + >> + echo "$(date +%H:%M:%S.%03N) Restarting gw1 ovn-controller after rebooting gw1" >> + m_as ovn-gw-1 /usr/share/ovn/scripts/ovn-ctl start_controller ${CONTROLLER_SSL_ARGS} >> + >> + # The network is now restored => packets should go through gw1 and reach chassis-1. 
>> + check_packets "true" "false" "false" "true" >> + final_check "reboot_gw1" $lost $max_expected_loss2 >> +} >> + >> +check_compute_restart() { >> + AS_BOX([$(date +%H:%M:%S.%03N) Killing ovn-chassis-1 ovn-controller and ovs-vswitchd]) >> + max_expected_loss=$1 >> + prepare >> + >> + # Kill ovn-chassis-1 >> + echo "$(date +%H:%M:%S.%03N) Killing chassis-1" >> + on_exit 'm_as ovn-chassis-1 /usr/share/openvswitch/scripts/ovs-ctl status || >> + m_as ovn-chassis-1 /usr/share/openvswitch/scripts/ovs-ctl start --system-id=ovn-chassis-1' >> + on_exit 'm_as ovn-chassis-1 /usr/share/ovn/scripts/ovn-ctl status_controller || >> + m_as ovn-chassis-1 /usr/share/ovn/scripts/ovn-ctl start_controller ${CONTROLLER_SSL_ARGS}' >> + >> + m_as ovn-chassis-1 kill -9 $(m_as ovn-chassis-1 cat /run/ovn/ovn-controller.pid) >> + m_as ovn-chassis-1 kill -9 $(m_as ovn-chassis-1 cat /run/openvswitch/ovs-vswitchd.pid) >> + m_as ovn-chassis-1 kill -9 $(m_as ovn-chassis-1 cat /run/openvswitch/ovsdb-server.pid) >> + >> + # Now restart chassis-1 >> + flap_count=$(m_as ovn-gw-1 ovs-vsctl get interface $from_gw1_to_ch1 bfd_status | sed 's/.*flap_count=\"\([[0-9]]*\).*/\1/g') >> + echo "$(date +%H:%M:%S.%03N) Restarting ovn-chassis-1 ovn-vswitchd." >> + m_as ovn-chassis-1 /usr/share/openvswitch/scripts/ovs-ctl start --system-id=ovn-chassis-1 >> + >> + echo "$(date +%H:%M:%S.%03N) Waiting for flap count between gw-1 and chassis-1 to increase" >> + OVS_WAIT_UNTIL([ >> + new_flap_count=$(m_as ovn-gw-1 ovs-vsctl get interface $from_gw1_to_ch1 bfd_status | sed 's/.*flap_count=\"\([[0-9]]*\).*/\1/g') >> + echo "Comparing $new_flap_count versus $flap_count" >> + test "$new_flap_count" -gt "$((flap_count))" >> + ]) >> + >> + wait_bfd_up ovn-chassis-1 $from_ch1_to_gw1 >> + >> + echo "$(date +%H:%M:%S.%03N) Restarting ovn-chassis-1 ovn-controller." 
>> + m_as ovn-chassis-1 /usr/share/ovn/scripts/ovn-ctl start_controller ${CONTROLLER_SSL_ARGS} >> + >> + # Wait a long time to catch losses >> + sleep 5 >> + final_check "compute" 0 $max_expected_loss >> } >> >> start_tcpdump >> -AS_BOX([$(date +%H:%M:%S.%03N) Sending packet from hv1-vif1(inside1) to ext1]) >> +echo "$(date +%H:%M:%S.%03N) Sending packet from hv1-vif1(inside1) to ext1" >> M_NS_CHECK_EXEC([ovn-chassis-1], [hv1-vif1], [ping -c3 -q -i 0.1 192.168.0.1 | FORMAT_PING], >> [0], [dnl >> 3 packets transmitted, 3 received, 0% packet loss, time 0ms >> @@ -3433,7 +3629,7 @@ M_NS_CHECK_EXEC([ovn-chassis-1], [hv1-vif1], [ping -c3 -q -i 0.1 192.168.0.1 | F >> stop_tcpdump >> >> # It should have gone through gw1 and not gw2 >> -AS_BOX([$(date +%H:%M:%S.%03N) Checking it went through gw1 and not gw2]) >> +echo "$(date +%H:%M:%S.%03N) Checking it went through gw1 and not gw2" >> AT_CHECK([cat gw2.tcpdump | grep "ICMP echo"], [1], [dnl >> ]) >> >> @@ -3446,17 +3642,29 @@ f0:00:c0:a8:00:fe > 00:00:c0:a8:00:01, ethertype IPv4 (0x0800), length 98: 192.1 >> 00:00:c0:a8:00:01 > f0:00:c0:a8:00:fe, ethertype IPv4 (0x0800), length 98: 192.168.0.1 > 192.168.1.1: ICMP echo reply, >> ]) >> >> -# We stop bfd between gw1 & gw2, but keep gw1 & gw2 running. >> -check_migration_between_gw1_and_gw2 "stop_bfd" >> +# We stop bfd between gw1 & gw2, but keep gw1 & gw2 running. We should not lose packets. >> +check_migration_between_gw1_and_gw2_bfd_stop 1 1 >> >> # We simulate death of gw2. It should not have any effect. >> -check_migration_between_gw1_and_gw2 "kill_gw2" >> +check_migration_between_gw1_and_gw2_kill_gw2 1 1 >> + >> +# We simulate ovs update on gw1. When ovs is stopped, flows should still be handled by kernel datapath. >> +# When OVS is restarted, BFD should go down immediately, and gw2 will start handling packets. >> +# There will be packet losses as gw2 will usually see BFD from gw1 up (and hence release port) before gw1 sees >> +# BFD up (and claim port). 
>> +check_migration_between_gw1_and_gw2_update_ovs 20 1 >> + >> +# We simulate restart of both OVS & OVN gw1. gw2 should take over. >> +check_migration_between_gw1_and_gw2_kill_gw1 40 20 >> >> # We simulate death of gw1. gw2 should take over. >> -check_migration_between_gw1_and_gw2 "kill_gw1" >> +check_migration_between_gw1_and_gw2_reboot_gw1 40 20 >> + >> +# We simulate restart of ovn-chassis-1. We expect ~2 sec of losses as we wait for bfd up before starting >> +# ovn-controller. >> +check_compute_restart 30 >> >> AT_CLEANUP >> -]) >> >> AT_SETUP([ovn multinode bgp L2 EVPN]) >> check_fake_multinode_setup >> -- >> 2.47.1 >> > Regards, Dumitru
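A note on the raw payload matches in the bfd_stop test above: nft's `@th` offsets are counted in bits from the start of the outer transport (UDP) header, so `@th,416,16` selects the Geneve-encapsulated inner UDP destination port and `@th,472,8` the BFD state/flags byte. The derivation below is a sketch; it assumes a Geneve base header without options and an inner IPv4 header without options, which is what these tunnels carry:

```shell
# Layout assumed: outer UDP (8 B) + Geneve base header (8 B, no options)
#                 + inner Ethernet (14 B) + inner IPv4 (20 B, no options).
outer_udp=8; geneve=8; inner_eth=14; inner_ip=20

# The inner UDP destination port (3784 == 0x0ec8 for BFD control, RFC 5881)
# sits 2 bytes into the inner UDP header.
dport_bits=$(( (outer_udp + geneve + inner_eth + inner_ip + 2) * 8 ))

# The BFD state/flags byte is the 2nd byte of the BFD control packet, which
# starts right after the 8-byte inner UDP header.
state_bits=$(( (outer_udp + geneve + inner_eth + inner_ip + 8 + 1) * 8 ))

echo "@th,$dport_bits,16 and @th,$state_bits,8"   # prints: @th,416,16 and @th,472,8
```

The state values 0xc0, 0x80 and 0x40 matched by the three rules are the Up, Init and Down session states carried in the top two bits of that byte (RFC 5880).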
On 3/25/26 4:31 PM, Dumitru Ceara wrote: > On 3/12/26 8:35 PM, Mark Michelson via dev wrote: >> Hi Xavier, >> > > Hi Xavier, Mark, > >> It took a bit for me to look through all of this, but I believe this >> looks good to me. Thanks for adding the notes about the offline >> discussions, as I had not recalled all the details about those >> discussions between then and now. >> >> Acked-by: Mark Michelson <mmichels@redhat.com> >> > > I think this change makes sense too. I do have some very small comments > below but I might be able to address them myself and just squash that > into this version of the patch before applying it to main. > > Please let me know what you think. > Hmm, this actually failed in CI: https://github.com/dceara/ovn/actions/runs/23547684046/job/68553636598 Xavier, would you happen to have some time to look into it by any chance? Thanks, Dumitru >> On Mon, Mar 9, 2026 at 3:36 PM Xavier Simonart <xsimonar@redhat.com> wrote: >>> >>> If a server unexpectedly rebooted, OVS, when restarted, sets BFD >>> UP on bfd-enabled geneve tunnels. >>> However, if it takes time to restart OVN, an HA gw chassis >>> would attract the traffic while being unable to handle it >>> (as no flows), resulting in traffic loss. >>> >>> This is fixed by re-using ovs flow-restore-wait. >>> If set, OVS waits (prevents upcalls, ignores bfd, ...) until reset. >>> Once OVS receives the notification of flow-restore-wait being false, >>> it restarts handling upcalls, bfd... and ignores any new change to >>> flow-restore-wait. >>> >>> Hence, on chassis hosting ha gateways, OVN toggles flow-restore-wait: >>> it set it to false, waits for ack from OVS and then sets it back to true. >>> If server reboots, OVS will see flow-restore-wait being true. >>> >>> OVN also sets external_ids->ovn-managed-flow-restore-wait when setting >>> flow-restore-wait. When set, it tells that OVN once set flow-restore-wait. 
>>> >>> "ovs-ctl restart" also uses flow-restore-wait: when called, it saves the >>> flows, stops "ovs-vswitchd", sets "flow-restore-wait" to true, restarts >>> "ovs-vswitchd", restores the flows and finally removes "flow-restore-wait". >>> So OVS will wait either for "ovs-ctl restart" to remove "flow-restore-wait" >>> or for OVN to set "flow-restore-wait" to false. >>> >>> Reported-at: https://issues.redhat.com/browse/FDP-3075 >>> Signed-off-by: Xavier Simonart <xsimonar@redhat.com> >>> >>> --- >>> -v2 : - Updated based on Mark's feedback (commit message, comments). >>> - Avoid setting flow-restore-wait for computes. >>> - Add external_ids->ovn-managed-flow-restore-wait. >>> - Updated test: add test for compute update + nits (variable name changes) >>> --- >>> controller/bfd.c | 5 +- >>> controller/bfd.h | 4 +- >>> controller/ovn-controller.8.xml | 11 + >>> controller/ovn-controller.c | 171 +++++++++- >>> tests/multinode-macros.at | 22 ++ >>> tests/multinode.at | 546 ++++++++++++++++++++++---------- >>> 6 files changed, 584 insertions(+), 175 deletions(-) >>> >>> diff --git a/controller/bfd.c b/controller/bfd.c >>> index 3b0c3f6da..56bfa4936 100644 >>> --- a/controller/bfd.c >>> +++ b/controller/bfd.c >>> @@ -117,13 +117,14 @@ bfd_calculate_active_tunnels(const struct ovsrec_bridge *br_int, >>> * >>> * If 'our_chassis' is C5 then this function returns empty bfd set. 
>>> */ >>> -void >>> +bool >>> bfd_calculate_chassis( >>> const struct sbrec_chassis *our_chassis, >>> const struct sbrec_ha_chassis_group_table *ha_chassis_grp_table, >>> struct sset *bfd_chassis) >>> { >>> const struct sbrec_ha_chassis_group *ha_chassis_grp; >>> + bool chassis_is_ha_gw = false; >>> SBREC_HA_CHASSIS_GROUP_TABLE_FOR_EACH (ha_chassis_grp, >>> ha_chassis_grp_table) { >>> bool is_ha_chassis = false; >>> @@ -143,6 +144,7 @@ bfd_calculate_chassis( >>> sset_add(&grp_chassis, ha_ch->chassis->name); >>> if (our_chassis == ha_ch->chassis) { >>> is_ha_chassis = true; >>> + chassis_is_ha_gw = true; >>> bfd_setup_required = true; >>> } >>> } >>> @@ -178,6 +180,7 @@ bfd_calculate_chassis( >>> } >>> sset_destroy(&grp_chassis); >>> } >>> + return chassis_is_ha_gw; >>> } >>> >>> void >>> diff --git a/controller/bfd.h b/controller/bfd.h >>> index f8fece5a5..3e3384891 100644 >>> --- a/controller/bfd.h >>> +++ b/controller/bfd.h >>> @@ -16,6 +16,8 @@ >>> #ifndef OVN_BFD_H >>> #define OVN_BFD_H 1 >>> >>> +#include <stdbool.h> >>> + >>> struct hmap; >>> struct ovsdb_idl; >>> struct ovsdb_idl_index; >>> @@ -36,7 +38,7 @@ void bfd_run(const struct ovsrec_interface_table *, >>> const struct sbrec_sb_global_table *, >>> const struct ovsrec_open_vswitch_table *); >>> >>> -void bfd_calculate_chassis( >>> +bool bfd_calculate_chassis( >>> const struct sbrec_chassis *, >>> const struct sbrec_ha_chassis_group_table *, >>> struct sset *); >>> diff --git a/controller/ovn-controller.8.xml b/controller/ovn-controller.8.xml >>> index 57e7cf5dd..33281a4d6 100644 >>> --- a/controller/ovn-controller.8.xml >>> +++ b/controller/ovn-controller.8.xml >>> @@ -531,6 +531,17 @@ >>> 65535. >>> </dd> >>> >>> + <dt> >>> + <code>external_ids:ovn-managed-flow-restore-wait</code> in the >>> + <code>Open_vSwitch</code> table >>> + </dt> >>> + <dd> >>> + When set to true, this key indicates that <code>ovn-controller</code> >>> + has set the <code>other_config:flow-restore-wait</code> option. 
>>> + The key is set when <code>ovn-controller</code> enables >>> + flow-restore-wait and removed when it clears it. >>> + </dd> >>> + >>> <dt> >>> <code>external_ids:ct-zone-*</code> in the <code>Bridge</code> table >>> </dt> >>> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c >>> index 5815f1b92..0611b1767 100644 >>> --- a/controller/ovn-controller.c >>> +++ b/controller/ovn-controller.c >>> @@ -211,6 +211,150 @@ static char *get_file_system_id(void) >>> free(filename); >>> return ret; >>> } >>> + >>> +/* Set/unset flow-restore-wait, and inc ovs next_cfg if false >>> + * When set to true, also sets ovn-managed-flow-restore-wait to true to >>> + * indicate ownership */ >>> +static void set_flow_restore_wait(struct ovsdb_idl_txn *ovs_idl_txn, >>> + const struct ovsrec_open_vswitch *cfg, >>> + const struct smap *other_config, >>> + const char *val, bool ovn_managed) > > Nit: "static void" should be on a different line. > >>> +{ >>> + struct smap new_config; >>> + smap_clone(&new_config, other_config); >>> + smap_replace(&new_config, "flow-restore-wait", val); >>> + ovsrec_open_vswitch_set_other_config(cfg, &new_config); >>> + if (!strcmp(val, "true")) { > > I'd prefer we use an actual boolean here as 'val'. > > We'd only have to change: > > smap_replace(&new_config, "flow-restore-wait", val ? 
"true" : "false"); > >>> + ovsrec_open_vswitch_update_external_ids_setkey( >>> + cfg, "ovn-managed-flow-restore-wait", "true"); >>> + } else if (ovn_managed) { >>> + ovsrec_open_vswitch_update_external_ids_delkey( >>> + cfg, "ovn-managed-flow-restore-wait"); >>> + } >>> + ovsdb_idl_txn_increment(ovs_idl_txn, &cfg->header_, >>> + &ovsrec_open_vswitch_col_next_cfg, true); >>> + smap_destroy(&new_config); >>> +} >>> + >>> +static void >>> +manage_flow_restore_wait(struct ovsdb_idl_txn *ovs_idl_txn, >>> + const struct ovsrec_open_vswitch *cfg, >>> + uint64_t ofctrl_cur_cfg, uint64_t ovs_next_cfg, >>> + int ovs_txn_status, bool is_ha_gw) >>> +{ >>> + enum flow_restore_wait_state { >>> + FRW_INIT, /* Initial state */ >>> + FRW_WAIT_TXN_COMPLETE, /* Sent false, waiting txn to complete */ >>> + FRW_TXN_SUCCESS, /* Txn completed. Waiting for OVS Ack. */ >>> + FRW_DONE /* Everything completed */ >>> + }; >>> + >>> + static int64_t frw_next_cfg; >>> + static enum flow_restore_wait_state frw_state; >>> + static bool ofctrl_was_connected = false; >>> + >>> + bool ofctrl_connected = ofctrl_is_connected(); >>> + >>> + if (!ovs_idl_txn || !cfg) { >>> + return; >>> + } >>> + >>> + /* If OVS is stopped/started, make sure flow-restore-wait is toggled */ > > Nit: comments should be sentences and end with a '.'. > >>> + if (ofctrl_connected && !ofctrl_was_connected) { >>> + frw_state = FRW_INIT; >>> + } >>> + ofctrl_was_connected = ofctrl_connected; >>> + >>> + if (!ofctrl_connected) { >>> + return; >>> + } >>> + >>> + bool frw = smap_get_bool(&cfg->other_config, "flow-restore-wait", false); >>> + bool ovn_managed_once = smap_get_bool(&cfg->external_ids, >>> + "ovn-managed-flow-restore-wait", false); > > Nit: indentation. > >>> + >>> + if (frw && !ovn_managed_once) { >>> + /* frw has been set by ovs-ctl. Do not touch. */ >>> + return; >>> + } >>> + >>> + if (!is_ha_gw) { >>> + if (frw) { >>> + /* frw has once been set by OVN. We are now not an HA chassis >>> + * anymore, unset it. 
*/ >>> + set_flow_restore_wait(ovs_idl_txn, cfg, &cfg->other_config, >>> + "false", ovn_managed_once); >>> + } >>> + /* else we are not an HA chassis and frw is false. Ignore it. */ >>> + return; >>> + } >>> + >>> + switch (frw_state) { >>> + case FRW_INIT: >>> + if (ofctrl_cur_cfg > 0) { >>> + set_flow_restore_wait(ovs_idl_txn, cfg, &cfg->other_config, >>> + "false", ovn_managed_once); >>> + frw_state = FRW_WAIT_TXN_COMPLETE; >>> + VLOG_INFO("Setting flow-restore-wait=false " >>> + "(cur_cfg=%"PRIu64")", ofctrl_cur_cfg); >>> + } >>> + break; >>> + >>> + case FRW_WAIT_TXN_COMPLETE: >>> + /* if (ovs_idl_txn != NULL), the transaction completed. >>> + * When the transaction completed, it either failed >>> + * (ovs_txn_status == 0) or succeeded (ovs_txn_status != 0) */ >>> + if (ovs_txn_status == 0) { >>> + /* Previous transaction failed. */ >>> + set_flow_restore_wait(ovs_idl_txn, cfg, &cfg->other_config, >>> + "false", ovn_managed_once); >>> + break; >>> + } >>> + /* txn succeeded, get next_cfg */ >>> + frw_next_cfg = ovs_next_cfg; >>> + frw_state = FRW_TXN_SUCCESS; >>> + /* fall through */ >>> + >>> + case FRW_TXN_SUCCESS: >>> + if (ovs_next_cfg < frw_next_cfg) { >>> + /* DB was reset, next_cfg went backwards */ >>> + VLOG_INFO("OVS DB reset (next_cfg %"PRId64" -> %"PRIu64"), " >>> + "resetting state", >>> + frw_next_cfg, ovs_next_cfg); >>> + set_flow_restore_wait(ovs_idl_txn, cfg, &cfg->other_config, >>> + "false", ovn_managed_once); >>> + frw_state = FRW_WAIT_TXN_COMPLETE; >>> + break; >>> + } >>> + >>> + if (!frw) { >>> + if (cfg->cur_cfg >= frw_next_cfg) { >>> + set_flow_restore_wait(ovs_idl_txn, cfg, &cfg->other_config, >>> + "true", ovn_managed_once); >>> + frw_state = FRW_DONE; >>> + VLOG_INFO("Setting flow-restore-wait=true"); >>> + } >>> + } else { >>> + /* The transaction to false succeeded but frw is true. 
>>> + * So, another task already set it to true */ >>> + frw_state = FRW_DONE; >>> + VLOG_INFO("flow-restore-wait was already true"); >>> + } >>> + break; >>> + case FRW_DONE: >>> + if (!frw) { >>> + /* frw has been removed (e.g. by ovs-ctl restart) or is false >>> + * (e.g. txn failed.) */ > > Nit: extra whitespace at the beginning of the line. > >>> + set_flow_restore_wait(ovs_idl_txn, cfg, &cfg->other_config, >>> + "false", ovn_managed_once); >>> + frw_state = FRW_WAIT_TXN_COMPLETE; >>> + VLOG_INFO("OVS frw cleared, restarting flow-restore-wait sequence " >>> + "(cur_cfg=%"PRIu64")", ofctrl_cur_cfg); >>> + } >>> + break; >>> + } >>> +} >>> + >>> /* Only set monitor conditions on tables that are available in the >>> * server schema. >>> */ >>> @@ -3381,6 +3525,7 @@ en_mac_cache_cleanup(void *data) >>> >>> struct ed_type_bfd_chassis { >>> struct sset bfd_chassis; >>> + bool is_ha_gw; >>> }; >>> >>> static void * >>> @@ -3409,8 +3554,9 @@ en_bfd_chassis_run(struct engine_node *node, void *data OVS_UNUSED) >>> = chassis_lookup_by_name(sbrec_chassis_by_name, chassis_id); >>> >>> sset_clear(&bfd_chassis->bfd_chassis); >>> - bfd_calculate_chassis(chassis, ha_chassis_grp_table, >>> - &bfd_chassis->bfd_chassis); >>> + bfd_chassis->is_ha_gw = bfd_calculate_chassis(chassis, >>> + ha_chassis_grp_table, >>> + &bfd_chassis->bfd_chassis); >>> return EN_UPDATED; >>> } >>> >>> @@ -7117,6 +7263,7 @@ main(int argc, char *argv[]) >>> struct unixctl_server *unixctl; >>> struct ovn_exit_args exit_args = {0}; >>> struct br_int_remote br_int_remote = {0}; >>> + static uint64_t next_cfg = 0; >>> int retval; >>> >>> /* Read from system-id-override file once on startup. */ >>> @@ -7444,6 +7591,7 @@ main(int argc, char *argv[]) >>> >>> /* Main loop. 
*/ >>> int ovnsb_txn_status = 1; >>> + int ovs_txn_status = 1; >>> bool sb_monitor_all = false; >>> struct tracked_acl_ids *tracked_acl_ids = NULL; >>> while (!exit_args.exiting) { >>> @@ -7545,6 +7693,11 @@ main(int argc, char *argv[]) >>> pinctrl_update_swconn(br_int_remote.target, >>> br_int_remote.probe_interval); >>> >>> + if (cfg && ovs_idl_txn && ovs_txn_status == -1) { >>> + /* txn was in progress and is now completed */ >>> + next_cfg = cfg->next_cfg; >>> + } >>> + >>> /* Enable ACL matching for double tagged traffic. */ >>> if (ovs_idl_txn && cfg) { >>> int vlan_limit = smap_get_int( >>> @@ -7894,6 +8047,13 @@ main(int argc, char *argv[]) >>> stopwatch_start(OFCTRL_SEQNO_RUN_STOPWATCH_NAME, >>> time_msec()); >>> ofctrl_seqno_run(ofctrl_get_cur_cfg()); >>> + if (ovs_idl_txn && bfd_chassis_data) { >>> + manage_flow_restore_wait(ovs_idl_txn, cfg, >>> + ofctrl_get_cur_cfg(), >>> + next_cfg, ovs_txn_status, >>> + bfd_chassis_data->is_ha_gw); >>> + } >>> + >>> stopwatch_stop(OFCTRL_SEQNO_RUN_STOPWATCH_NAME, >>> time_msec()); >>> stopwatch_start(IF_STATUS_MGR_RUN_STOPWATCH_NAME, >>> @@ -7993,7 +8153,7 @@ main(int argc, char *argv[]) >>> OVS_NOT_REACHED(); >>> } >>> >>> - int ovs_txn_status = ovsdb_idl_loop_commit_and_wait(&ovs_idl_loop); >>> + ovs_txn_status = ovsdb_idl_loop_commit_and_wait(&ovs_idl_loop); >>> if (!ovs_txn_status) { >>> /* The transaction failed. 
*/ >>> vif_plug_clear_deleted( >>> @@ -8012,6 +8172,9 @@ main(int argc, char *argv[]) >>> &vif_plug_deleted_iface_ids); >>> vif_plug_finish_changed( >>> &vif_plug_changed_iface_ids); >>> + if (cfg) { >>> + next_cfg = cfg->next_cfg; >>> + } >>> } else if (ovs_txn_status == -1) { >>> /* The commit is still in progress */ >>> } else { >>> @@ -8085,7 +8248,7 @@ loop_done: >>> } >>> >>> ovsdb_idl_loop_commit_and_wait(&ovnsb_idl_loop); >>> - int ovs_txn_status = ovsdb_idl_loop_commit_and_wait(&ovs_idl_loop); >>> + ovs_txn_status = ovsdb_idl_loop_commit_and_wait(&ovs_idl_loop); >>> if (!ovs_txn_status) { >>> /* The transaction failed. */ >>> vif_plug_clear_deleted( >>> diff --git a/tests/multinode-macros.at b/tests/multinode-macros.at >>> index 4a74d5127..646ee2d79 100644 >>> --- a/tests/multinode-macros.at >>> +++ b/tests/multinode-macros.at >>> @@ -41,6 +41,28 @@ m4_define([M_START_TCPDUMP], >>> ] >>> ) >>> >>> +m4_define([_M_START_TCPDUMPS_RECURSIVE], [ > > I think we normally put the _ at the end, i.e., M_START_TCPDUMPS_RECURSIVE_ > >>> + m4_if(m4_eval($# > 3), [1], [dnl >>> + names="$names $3" >>> + echo "Running podman exec $1 tcpdump -l $2 >$3.tcpdump 2>$3.stderr" >>> + podman exec $1 tcpdump -l $2 >$3.tcpdump 2>$3.stderr & >>> + echo "podman exec $1 ps -ef | grep -v grep | grep tcpdump && podman exec $1 killall tcpdump" >> cleanup >>> + _M_START_TCPDUMPS_RECURSIVE(m4_shift(m4_shift(m4_shift($@)))) >>> + ]) >>> + ] >>> +) >>> + >>> +# Start Multiple tcpdump. Useful to speed up when many tcpdump >>> +# must be started as waiting for "listening" takes usually 1 second. 
>>> +m4_define([M_START_TCPDUMPS], >>> + [ >>> + names="" >>> + _M_START_TCPDUMPS_RECURSIVE($@) >>> + for name in $names; do >>> + OVS_WAIT_UNTIL([grep -q "listening" ${name}.stderr]) >>> + done >>> + ] >>> +) >>> >>> # M_FORMAT_CT([ip-addr]) >>> # >>> diff --git a/tests/multinode.at b/tests/multinode.at >>> index 6b9614126..24d7ca27c 100644 >>> --- a/tests/multinode.at >>> +++ b/tests/multinode.at >>> @@ -2986,42 +2986,42 @@ AT_CLEANUP >>> >>> AT_SETUP([HA: Check for missing garp on leader when BFD goes back up]) >>> # Network topology >>> -# ┌────────────────────────────────────────────────────────────────────────────────────────────────────────┐ >>> -# │ │ >>> -# │ ┌───────────────────┐ ┌───────────────────┐ ┌───────────────────┐ ┌───────────────────┐ │ >>> -# │ │ ovn-chassis-1 │ │ ovn-gw-1 │ │ ovn-gw-2 │ │ ovn-chassis-2 │ │ >>> -# │ └─────────┬─────────┘ └───────────────────┘ └───────────────────┘ └───────────────────┘ │ >>> -# │ ┌─────────┴─────────┐ │ >>> -# │ │ inside1 │ │ >>> -# │ │ 192.168.1.1/24 │ │ >>> -# │ └─────────┬─────────┘ │ >>> -# │ ┌─────────┴─────────┐ │ >>> -# │ │ inside │ │ >>> -# │ └─────────┬─────────┘ │ >>> -# │ ┌─────────┴─────────┐ │ >>> -# │ │ 192.168.1.254 │ │ >>> -# │ │ R1 │ │ >>> -# │ │ 192.168.0.254 │ │ >>> -# │ └─────────┬─────────┘ │ >>> -# │ └------eth1---------------┬--------eth1-----------┐ │ >>> -# │ ┌──────────┴────────┐ ┌─────────┴─────────┐ │ >>> -# │ │ 192.168.1.254 │ │ 192.168.1.254 │ │ >>> -# │ │ R1 │ │ R1 │ │ >>> -# │ │ 192.168.0.254 │ │ 192.168.0.254 │ │ >>> -# │ └─────────┬─────────┘ └─────────┬─────────┘ │ >>> -# │ │ │ ┌───────────────────┐ │ >>> -# │ ┌─────────┴─────────┐ ┌─────────┴─────────┐ │ 192.168.0.1 │ │ >>> -# │ │ outside │ │ outside │ │ ext1 │ │ >>> -# │ └─────────┬─────────┘ └─────────┬─────────┘ └─────────┬─────────┘ │ >>> -# │ ┌─────────┴─────────┐ ┌─────────┴─────────┐ ┌─────────┴─────────┐ │ >>> -# │ │ ln-outside │ │ ln-outside │ │ ln-ext1 │ │ >>> -# │ └─────────┬─────────┘ └─────────┬─────────┘ 
└─────────┬─────────┘ │ >>> -# │ ┌─────────┴─────────┐ ┌─────────┴─────────┐ ┌─────────┴─────────┐ │ >>> -# │ │ br-ex │ │ br-ex │ │ br-ex │ │ >>> -# │ └─────────┬─────────┘ └─────────┬─────────┘ └─────────┬─────────┘ │ >>> -# │ └---------eth2-----------┴-------eth2-------------┘ │ >>> -# │ │ >>> -# └────────────────────────────────────────────────────────────────────────────────────────────────────────┘ >>> +# ┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ >>> +# │ │ >>> +# │ ┌───────────────────┐ ┌───────────────────┐ ┌───────────────────┐ ┌───────────────────┐ ┌───────────────────┐ │ >>> +# │ │ ovn-chassis-1 │ │ ovn-chassis-2 │ │ ovn-gw-1 │ │ ovn-gw-2 │ │ ovn-chassis-3 │ │ >>> +# │ └─────────┬─────────┘ └─────────┬─────────┘ └───────────────────┘ └───────────────────┘ └───────────────────┘ │ >>> +# │ ┌─────────┴─────────┐ ┌─────────┴─────────┐ │ >>> +# │ │ inside1 │ │ inside2 │ │ >>> +# │ │ 192.168.1.1/24 │ │ 192.168.1.2/24 │ │ >>> +# │ └─────────┬─────────┘ └─────────┬─────────┘ │ >>> +# │ ┌─┴────────────────────────┴─┐ │ >>> +# │ │ inside │ │ >>> +# │ └──────────────┬─────────────┘ │ >>> +# │ ┌─────────┴─────────┐ │ >>> +# │ │ 192.168.1.254 │ │ >>> +# │ │ R1 │ │ >>> +# │ │ 192.168.0.254 │ │ >>> +# │ └─────────┬─────────┘ │ >>> +# │ └------eth1---------------------------┬--------eth1-----------┐ │ >>> +# │ ┌──────────┴────────┐ ┌─────────┴─────────┐ │ >>> +# │ │ 192.168.1.254 │ │ 192.168.1.254 │ │ >>> +# │ │ R1 │ │ R1 │ │ >>> +# │ │ 192.168.0.254 │ │ 192.168.0.254 │ │ >>> +# │ └─────────┬─────────┘ └─────────┬─────────┘ │ >>> +# │ │ │ ┌───────────────────┐ │ >>> +# │ ┌─────────┴─────────┐ ┌─────────┴─────────┐ │ 192.168.0.1 │ │ >>> +# │ │ outside │ │ outside │ │ ext1 │ │ >>> +# │ └─────────┬─────────┘ └─────────┬─────────┘ └─────────┬─────────┘ │ >>> +# │ ┌─────────┴─────────┐ ┌─────────┴─────────┐ ┌─────────┴─────────┐ │ >>> +# │ │ ln-outside │ │ ln-outside │ │ ln-ext1 │ │ >>> 
+# │ └─────────┬─────────┘ └─────────┬─────────┘ └─────────┬─────────┘ │ >>> +# │ ┌─────────┴─────────┐ ┌─────────┴─────────┐ ┌─────────┴─────────┐ │ >>> +# │ │ br-ex │ │ br-ex │ │ br-ex │ │ >>> +# │ └─────────┬─────────┘ └─────────┬─────────┘ └─────────┬─────────┘ │ >>> +# │ └---------eth2-----------┴-------eth2-------------┘ │ >>> +# │ │ >>> +# └────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ >>> >>> # The goal of this test is the check that GARP are properly generated by higest priority traffic when >>> # BFD goes down, and back up, and this whether the BFD event is due either to some bfd packet lost >>> @@ -3030,6 +3030,12 @@ AT_SETUP([HA: Check for missing garp on leader when BFD goes back up]) >>> # So gw3 should in this test neither send garp or receive packets. >>> # >>> # Enable vconn so we can check the GARP from a log perspective. >>> +on_exit "podman exec ovn-gw-1 ovn-appctl vlog/set info" >>> +on_exit "podman exec ovn-gw-1 ovn-appctl vlog/enable-rate-limit" >>> +on_exit "podman exec ovn-gw-2 ovn-appctl vlog/set info" >>> +on_exit "podman exec ovn-gw-2 ovn-appctl vlog/enable-rate-limit" >>> +on_exit "podman exec ovn-gw-3 ovn-appctl vlog/set info" >>> +on_exit "podman exec ovn-gw-3 ovn-appctl vlog/enable-rate-limit" >>> m_as ovn-gw-1 ovn-appctl vlog/set vconn:dbg >>> m_as ovn-gw-2 ovn-appctl vlog/set vconn:dbg >>> m_as ovn-gw-3 ovn-appctl vlog/set vconn:dbg >>> @@ -3037,12 +3043,17 @@ m_as ovn-gw-1 ovn-appctl vlog/disable-rate-limit >>> m_as ovn-gw-2 ovn-appctl vlog/disable-rate-limit >>> m_as ovn-gw-3 ovn-appctl vlog/disable-rate-limit >>> >>> +# Decrease revalidation time on ovs switch simulating ToR. >>> +on_exit "OVS_RUNDIR= ovs-vsctl set Open_vSwitch . other_config:max-revalidator=500" > > Isn't it better to just remove the key on_exit, just in case the default > ever changes from 500 to something else? I.e.: > > on_exit "OVS_RUNDIR= ovs-vsctl remove Open_vSwitch . 
other_config > max-revalidator" > >>> +OVS_RUNDIR= ovs-vsctl set Open_vSwitch . other_config:max-revalidator=100 >>> + >>> check_fake_multinode_setup >>> >>> # Delete the multinode NB and OVS resources before starting the test. >>> cleanup_multinode_resources >>> >>> ip_ch1=$(m_as ovn-chassis-1 ip a show dev eth1 | grep "inet " | awk '{print $2}'| cut -d '/' -f1) >>> +ip_ch2=$(m_as ovn-chassis-2 ip a show dev eth1 | grep "inet " | awk '{print $2}'| cut -d '/' -f1) >>> ip_gw1=$(m_as ovn-gw-1 ip a show dev eth1 | grep "inet " | awk '{print $2}'| cut -d '/' -f1) >>> ip_gw2=$(m_as ovn-gw-2 ip a show dev eth1 | grep "inet " | awk '{print $2}'| cut -d '/' -f1) >>> ip_gw3=$(m_as ovn-gw-3 ip a show dev eth1 | grep "inet " | awk '{print $2}'| cut -d '/' -f1) >>> @@ -3050,25 +3061,35 @@ ip_gw3=$(m_as ovn-gw-3 ip a show dev eth1 | grep "inet " | awk '{print $2}'| cut >>> from_gw1_to_gw2=$(m_as ovn-gw-1 ovs-vsctl --bare --columns=name find interface options:remote_ip=$ip_gw2) >>> from_gw1_to_gw3=$(m_as ovn-gw-1 ovs-vsctl --bare --columns=name find interface options:remote_ip=$ip_gw3) >>> from_gw1_to_ch1=$(m_as ovn-gw-1 ovs-vsctl --bare --columns=name find interface options:remote_ip=$ip_ch1) >>> +from_gw1_to_ch2=$(m_as ovn-gw-1 ovs-vsctl --bare --columns=name find interface options:remote_ip=$ip_ch2) >>> from_gw2_to_gw1=$(m_as ovn-gw-2 ovs-vsctl --bare --columns=name find interface options:remote_ip=$ip_gw1) >>> from_gw2_to_gw3=$(m_as ovn-gw-2 ovs-vsctl --bare --columns=name find interface options:remote_ip=$ip_gw3) >>> from_gw2_to_ch1=$(m_as ovn-gw-2 ovs-vsctl --bare --columns=name find interface options:remote_ip=$ip_ch1) >>> +from_gw2_to_ch2=$(m_as ovn-gw-2 ovs-vsctl --bare --columns=name find interface options:remote_ip=$ip_ch2) >>> from_ch1_to_gw1=$(m_as ovn-chassis-1 ovs-vsctl --bare --columns=name find interface options:remote_ip=$ip_gw1) >>> from_ch1_to_gw2=$(m_as ovn-chassis-1 ovs-vsctl --bare --columns=name find interface options:remote_ip=$ip_gw2) >>> 
+from_ch2_to_gw1=$(m_as ovn-chassis-2 ovs-vsctl --bare --columns=name find interface options:remote_ip=$ip_gw1) >>> +from_ch2_to_gw2=$(m_as ovn-chassis-2 ovs-vsctl --bare --columns=name find interface options:remote_ip=$ip_gw2) >>> >>> m_as ovn-chassis-1 ip link del hv1-vif1-p >>> -m_as ovn-chassis-2 ip link del ext1-p >>> +m_as ovn-chassis-2 ip link del hv2-vif1-p >>> +m_as ovn-chassis-3 ip link del ext1-p >>> >>> OVS_WAIT_UNTIL([m_as ovn-chassis-1 ip link show | grep -q genev_sys]) >>> OVS_WAIT_UNTIL([m_as ovn-chassis-2 ip link show | grep -q genev_sys]) >>> +OVS_WAIT_UNTIL([m_as ovn-chassis-3 ip link show | grep -q genev_sys]) >>> OVS_WAIT_UNTIL([m_as ovn-gw-1 ip link show | grep -q genev_sys]) >>> OVS_WAIT_UNTIL([m_as ovn-gw-2 ip link show | grep -q genev_sys]) >>> OVS_WAIT_UNTIL([m_as ovn-gw-3 ip link show | grep -q genev_sys]) >>> >>> +# Use "aggressive" bfd parameters >>> +check multinode_nbctl set NB_Global . options:"bfd-min-rx"=500 >>> +check multinode_nbctl set NB_Global . options:"bfd-min-tx"=100 >>> check multinode_nbctl ls-add inside >>> check multinode_nbctl ls-add outside >>> check multinode_nbctl ls-add ext >>> check multinode_nbctl lsp-add inside inside1 -- lsp-set-addresses inside1 "f0:00:c0:a8:01:01 192.168.1.1" >>> +check multinode_nbctl lsp-add inside inside2 -- lsp-set-addresses inside2 "f0:00:c0:a8:01:02 192.168.1.2" >>> check multinode_nbctl lsp-add ext ext1 -- lsp-set-addresses ext1 "00:00:c0:a8:00:01 192.168.0.1" >>> >>> multinode_nbctl create Logical_Router name=R1 >>> @@ -3100,12 +3121,14 @@ m_as ovn-gw-3 ovs-vsctl remove open . external_ids garp-max-timeout-sec >>> >>> m_as ovn-chassis-1 ovs-vsctl set open . external-ids:ovn-bridge-mappings=public:br-ex >>> m_as ovn-chassis-2 ovs-vsctl set open . external-ids:ovn-bridge-mappings=public:br-ex >>> +m_as ovn-chassis-3 ovs-vsctl set open . external-ids:ovn-bridge-mappings=public:br-ex >>> m_as ovn-gw-1 ovs-vsctl set open . 
external-ids:ovn-bridge-mappings=public:br-ex >>> m_as ovn-gw-2 ovs-vsctl set open . external-ids:ovn-bridge-mappings=public:br-ex >>> m_as ovn-gw-3 ovs-vsctl set open . external-ids:ovn-bridge-mappings=public:br-ex >>> >>> m_as ovn-chassis-1 /data/create_fake_vm.sh inside1 hv1-vif1 f0:00:c0:a8:01:01 1500 192.168.1.1 24 192.168.1.254 2000::1/64 2000::a >>> -m_as ovn-chassis-2 /data/create_fake_vm.sh ext1 ext1 00:00:c0:a8:00:01 1500 192.168.0.1 24 192.168.0.254 1000::3/64 1000::a >>> +m_as ovn-chassis-2 /data/create_fake_vm.sh inside2 hv2-vif1 f0:00:c0:a8:01:02 1500 192.168.1.2 24 192.168.1.254 2000::2/64 2000::a >>> +m_as ovn-chassis-3 /data/create_fake_vm.sh ext1 ext1 00:00:c0:a8:00:01 1500 192.168.0.1 24 192.168.0.254 1000::3/64 1000::a >>> >>> # There should be one ha_chassis_group with the name "R1_outside" >>> m_check_row_count HA_Chassis_Group 1 name=R1_outside >>> @@ -3160,53 +3183,67 @@ for chassis in $from_ch1_to_gw1 $from_ch1_to_gw2; do >>> wait_bfd_enabled ovn-chassis-1 $chassis >>> done >>> >>> +# check BFD enablement on tunnel ports from ovn-chassis-2 ########### >>> +for chassis in $from_ch2_to_gw1 $from_ch2_to_gw2; do >>> + echo "checking ovn-chassis-2 -> $chassis" >>> + wait_bfd_enabled ovn-chassis-2 $chassis >>> +done >>> + >>> # Make sure there is no nft table left. Do not use nft directly as might not be installed in container. 
>>> gw1_pid=$(podman inspect -f '{{.State.Pid}}' ovn-gw-1) >>> nsenter --net=/proc/$gw1_pid/ns/net nft list tables | grep ovn-test && nsenter --net=/proc/$gw1_pid/ns/net nft delete table ip ovn-test >>> -on_exit "nsenter --net=/proc/$gw1_pid/ns/net nft list tables | grep ovn-test && nsenter --net=/proc/$gw1_pid/ns/net nft delete table ip ovn-test" >>> +on_exit "if [[ -d "/proc/$gw1_pid" ]]; then nsenter --net=/proc/$gw1_pid/ns/net nft list tables | grep ovn-test && nsenter --net=/proc/$gw1_pid/ns/net nft delete table ip ovn-test; fi" >>> >>> -for chassis in $from_gw1_to_gw2 $from_gw1_to_gw3 $from_gw1_to_ch1; do >>> +for chassis in $from_gw1_to_gw2 $from_gw1_to_gw3 $from_gw1_to_ch1 $from_gw1_to_ch2; do >>> wait_bfd_up ovn-gw-1 $chassis >>> done >>> -for chassis in $from_gw2_to_gw1 $from_gw2_to_gw3 $from_gw2_to_ch1; do >>> +for chassis in $from_gw2_to_gw1 $from_gw2_to_gw3 $from_gw2_to_ch1 $from_gw2_to_ch2; do >>> wait_bfd_up ovn-gw-2 $chassis >>> done >>> for chassis in $from_ch1_to_gw1 $from_ch1_to_gw2; do >>> wait_bfd_up ovn-chassis-1 $chassis >>> done >>> +for chassis in $from_ch2_to_gw1 $from_ch2_to_gw2; do >>> + wait_bfd_up ovn-chassis-2 $chassis >>> +done >>> >>> m_wait_row_count Port_Binding 1 logical_port=cr-R1_outside chassis=$gw1_chassis >>> check multinode_nbctl --wait=hv sync >>> >>> start_tcpdump() { >>> echo "$(date +%H:%M:%S.%03N) Starting tcpdump" >>> - M_START_TCPDUMP([ovn-chassis-1], [-neei hv1-vif1-p], [ch1]) >>> - M_START_TCPDUMP([ovn-chassis-2], [-neei eth2], [ch2]) >>> - M_START_TCPDUMP([ovn-gw-1], [-neei eth2], [gw1]) >>> - M_START_TCPDUMP([ovn-gw-1], [-neei eth2 -Q out], [gw1_out]) >>> - M_START_TCPDUMP([ovn-gw-2], [-neei eth2], [gw2]) >>> - M_START_TCPDUMP([ovn-gw-2], [-neei eth2 -Q out], [gw2_out]) >>> - M_START_TCPDUMP([ovn-gw-3], [-neei eth2], [gw3]) >>> - M_START_TCPDUMP([ovn-gw-3], [-neei eth2 -Q out], [gw3_out]) >>> + M_START_TCPDUMPS([ovn-chassis-1], [-neei hv1-vif1-p], [ch1], >>> + [ovn-chassis-2], [-neei hv2-vif1-p], [ch2], >>> + 
[ovn-chassis-3], [-neei eth2], [ch3], >>> + [ovn-gw-1], [-neei eth2], [gw1], >>> + [ovn-gw-1], [-neei eth2 -Q out], [gw1_out], >>> + [ovn-gw-2], [-neei eth2], [gw2], >>> + [ovn-gw-2], [-neei eth2 -Q out], [gw2_out], >>> + [ovn-gw-3], [-neei eth2], [gw3], >>> + [ovn-gw-3], [-neei eth2 -Q out], [gw3_out], >>> + [ovn-gw-1], [-neei eth1], [gw1_eth1], >>> + [ovn-gw-2], [-neei eth1], [gw2_eth1], >>> + [ovn-chassis-1], [-neei eth1], [ch1_eth1], >>> + [ovn-chassis-2], [-neei eth1], [ch2_eth1]) >>> } >>> >>> stop_tcpdump() { >>> echo "$(date +%H:%M:%S.%03N) Stopping tcpdump" >>> - m_kill 'ovn-gw-1 ovn-gw-2 ovn-gw-3 ovn-chassis-1 ovn-chassis-2' tcpdump >>> + m_kill 'ovn-gw-1 ovn-gw-2 ovn-gw-3 ovn-chassis-1 ovn-chassis-2 ovn-chassis-3' tcpdump >>> } >>> >>> -# Send packets from chassis2 (ext1) to chassis1 >>> +# Send packets from ovn-chassis-3 (ext1) to ovn-chassis-1 >>> send_background_packets() { >>> echo "$(date +%H:%M:%S.%03N) Sending packets in Background" >>> start_tcpdump >>> - M_NS_DAEMONIZE([ovn-chassis-2], [ext1], [ping -f -i 0.1 192.168.1.1], [ping.pid]) >>> + M_NS_DAEMONIZE([ovn-chassis-3], [ext1], [ping -f -i 0.1 192.168.1.1], [ping.pid]) >>> } >>> >>> stop_sending_background_packets() { >>> echo "$(date +%H:%M:%S.%03N) Stopping Background process" >>> m_as ovn-chassis-1 ps -ef | grep -v grep | grep -q ping && \ >>> m_as ovn-chassis-1 echo "Stopping ping on ovn-chassis-1" && killall ping >>> - m_as ovn-chassis-2 ps -ef | grep -v grep | grep -q ping && \ >>> + m_as ovn-chassis-3 ps -ef | grep -v grep | grep -q ping && \ >>> m_as ovn-chassis-2 echo "Stopping ping on ovn-chassis-2" && killall ping >>> stop_tcpdump >>> } >>> @@ -3216,8 +3253,8 @@ check_for_new_garps() { >>> expecting_garp=$2 >>> n_new_garps=$(cat ${hv}_out.tcpdump | grep -c "f0:00:c0:a8:00:fe > Broadcast, ethertype ARP (0x0806), length 42: Request who-has 192.168.0.254 tell 192.168.0.254, length 28") >>> >>> - if [ "$expecting_garp" == "true" ]; then >>> - AS_BOX([$(date +%H:%M:%S.%03N) 
Waiting/checking for garp from $hv - Starting with $n_new_garps]) >>> + if [[ "$expecting_garp" == "true" ]]; then >>> + echo "$(date +%H:%M:%S.%03N) Waiting/checking for garp from $hv - Starting with $n_new_garps" >>> OVS_WAIT_UNTIL([ >>> n_garps=$n_new_garps >>> n_new_garps=$(cat ${hv}_out.tcpdump | grep -c "f0:00:c0:a8:00:fe > Broadcast, ethertype ARP (0x0806), length 42: Request who-has 192.168.0.254 tell 192.168.0.254, length 28") >>> @@ -3225,7 +3262,7 @@ check_for_new_garps() { >>> test "$n_garps" -ne "$n_new_garps" >>> ]) >>> else >>> - AS_BOX([$(date +%H:%M:%S.%03N) Checking no garp from ${hv}]) >>> + echo "$(date +%H:%M:%S.%03N) Checking no garp from ${hv}" >>> # Waiting a few seconds to get a chance to see unexpected garps. >>> sleep 3 >>> n_garps=$(cat ${hv}_out.tcpdump | grep -c "f0:00:c0:a8:00:fe > Broadcast, ethertype ARP (0x0806), length 42: Request who-has 192.168.0.254 tell 192.168.0.254, length 28") >>> @@ -3241,8 +3278,8 @@ check_for_new_echo_pkts() { >>> n_new_echo_req=$(cat ${hv}.tcpdump | grep -c "$mac_src > $mac_dst, ethertype IPv4 (0x0800), length 98: 192.168.0.1 > 192.168.1.1: ICMP echo request") >>> n_new_echo_rep=$(cat ${hv}.tcpdump | grep -c "$mac_dst > $mac_src, ethertype IPv4 (0x0800), length 98: 192.168.1.1 > 192.168.0.1: ICMP echo reply") >>> >>> - if [ "$expecting_pkts" == "true" ]; then >>> - AS_BOX([$(date +%H:%M:%S.%03N) Waiting/checking for echo pkts through ${hv}]) >>> + if [[ "$expecting_pkts" == "true" ]]; then >>> + echo "$(date +%H:%M:%S.%03N) Waiting/checking for echo pkts through ${hv}" >>> echo "Starting with $n_new_echo_req requests and $n_new_echo_rep replies so far on ${hv}." 
>>> OVS_WAIT_UNTIL([ >>> n_echo_req=$n_new_echo_req >>> @@ -3253,7 +3290,7 @@ check_for_new_echo_pkts() { >>> test "$n_echo_req" -ne "$n_new_echo_req" && test "$n_echo_rep" -ne "$n_new_echo_rep" >>> ]) >>> else >>> - AS_BOX([$(date +%H:%M:%S.%03N) Checking no pkts from ${hv}]) >>> + echo "$(date +%H:%M:%S.%03N) Checking no pkts from ${hv}" >>> # Waiting a few seconds to get a chance to see unexpected pkts. >>> sleep 3 >>> n_echo_req=$(cat ${hv}.tcpdump | grep -c "$mac_src > $mac_dst, ethertype IPv4 (0x0800), length 98: 192.168.0.1 > 192.168.1.1: ICMP echo request") >>> @@ -3271,22 +3308,44 @@ dump_statistics() { >>> ch1_rep=$(grep -c "ICMP echo reply" ch1.tcpdump) >>> ch2_req=$(grep -c "ICMP echo request" ch2.tcpdump) >>> ch2_rep=$(grep -c "ICMP echo reply" ch2.tcpdump) >>> + ch3_req=$(grep -c "ICMP echo request" ch3.tcpdump) >>> + ch3_rep=$(grep -c "ICMP echo reply" ch3.tcpdump) >>> gw1_req=$(grep -c "ICMP echo request" gw1.tcpdump) >>> gw1_rep=$(grep -c "ICMP echo reply" gw1.tcpdump) >>> gw2_req=$(grep -c "ICMP echo request" gw2.tcpdump) >>> gw2_rep=$(grep -c "ICMP echo reply" gw2.tcpdump) >>> gw3_req=$(grep -c "ICMP echo request" gw3.tcpdump) >>> gw3_rep=$(grep -c "ICMP echo reply" gw3.tcpdump) >>> - echo "$n1 claims in gw1, $n2 in gw2 and $n3 on gw3" >>> - echo "ch2_request=$ch2_req gw1_request=$gw1_req gw2_request=$gw2_req gw3_request=$gw3_req ch1_request=$ch1_req ch1_reply=$ch1_rep gw1_reply=$gw1_rep gw2_reply=$gw2_rep gw3_reply=$gw3_rep ch2_reply=$ch2_rep" >>> + echo "$n1 claims in gw1, $n2 in gw2 and $n3 on gw3" >&2 >>> + echo "ch3_req=$ch3_req gw_req=($gw1_req + $gw2_req +$gw3_req) ch1_req=$ch1_req ch1_rep=$ch1_rep gw_rep=($gw1_rep + $gw2_rep + $gw3_rep) ch3_rep=$ch3_rep ch2=($ch2_req+$ch2_rep)" >&2 >>> + echo "$((ch3_req - ch3_rep))" >>> } >>> >>> -check_migration_between_gw1_and_gw2() { >>> - action=$1 >>> - send_background_packets >>> +add_port() { >>> + bridge=$1 >>> + interface=$2 >>> + address=$3 >>> + echo "Adding $bridge $interface $address" >>> + 
>>> + pid=$(podman inspect -f '{{.State.Pid}}' ovn-gw-1) >>> + ln -sf /proc/$pid/ns/net /var/run/netns/$pid >>> + port=$(OVS_RUNDIR= ovs-vsctl --data=bare --no-heading --columns=name find interface \ >>> + external_ids:container_id=ovn-gw-1 external_ids:container_iface="$interface") >>> + port="${port:0:13}" >>> + ip link add "${port}_l" type veth peer name "${port}_c" >>> + ip link set "${port}_l" up >>> + ip link set "${port}_c" netns $pid >>> + ip netns exec $pid ip link set dev "${port}_c" name "$interface" >>> + ip netns exec $pid ip link set "$interface" up >>> + if [[ -n "$address" ]]; then >>> + ip netns exec $pid ip addr add "$address" dev "$interface" >>> + fi > > I might be wrong but I think nobody cleans up any of these ports created > in the ovn-gw-1 container. Do we need some on_exit() calls here? > >>> +} >>> >>> +prepare() { >>> + send_background_packets >>> # We make sure gw1 is leader since enough time that it generated all its garps. >>> - AS_BOX([$(date +%H:%M:%S.%03N) Waiting all garps sent by gw1]) >>> + echo $(date +%H:%M:%S.%03N) Waiting all garps sent by gw1 >>> n_new_garps=$(cat gw1_out.tcpdump | grep -c "f0:00:c0:a8:00:fe > Broadcast, ethertype ARP (0x0806), length 42: Request who-has 192.168.0.254 tell 192.168.0.254, length 28") >>> OVS_WAIT_UNTIL([ >>> n_garps=$n_new_garps >>> @@ -3302,130 +3361,267 @@ check_migration_between_gw1_and_gw2() { >>> check_for_new_echo_pkts gw2 "00:00:c0:a8:00:01" "f0:00:c0:a8:00:fe" "false" >>> check_for_new_echo_pkts gw3 "00:00:c0:a8:00:01" "f0:00:c0:a8:00:fe" "false" >>> >>> + # All packets should go through gw1, and none through gw2 or gw3. 
>>> + check_packets "true" "false" "false" "true" >>> flap_count_gw_1=$(m_as ovn-gw-1 ovs-vsctl get interface $from_gw1_to_gw2 bfd_status | sed 's/.*flap_count=\"\([[0-9]]*\).*/\1/g') >>> flap_count_gw_2=$(m_as ovn-gw-2 ovs-vsctl get interface $from_gw2_to_gw1 bfd_status | sed 's/.*flap_count=\"\([[0-9]]*\).*/\1/g') >>> +} >>> >>> - if [ test "$action" == "stop_bfd" ]; then >>> - AS_BOX([$(date +%H:%M:%S.%03N) Blocking bfd on gw1 (from $ip_gw1 to $ip_gw2)]) >>> - nsenter --net=/proc/$gw1_pid/ns/net nft add table ip ovn-test >>> - nsenter --net=/proc/$gw1_pid/ns/net nft 'add chain ip ovn-test INPUT { type filter hook input priority 0; policy accept; }' >>> - # Drop BFD from gw-1 to gw-2: geneve port (6081), inner port 3784 (0xec8), Session state Up, Init, Down. >>> - nsenter --net=/proc/$gw1_pid/ns/net nft add rule ip ovn-test INPUT ip daddr $ip_gw1 ip saddr $ip_gw2 udp dport 6081 '@th,416,16 == 0x0ec8 @th,472,8 == 0xc0 counter drop' >>> - nsenter --net=/proc/$gw1_pid/ns/net nft add rule ip ovn-test INPUT ip daddr $ip_gw1 ip saddr $ip_gw2 udp dport 6081 '@th,416,16 == 0x0ec8 @th,472,8 == 0x80 counter drop' >>> - nsenter --net=/proc/$gw1_pid/ns/net nft add rule ip ovn-test INPUT ip daddr $ip_gw1 ip saddr $ip_gw2 udp dport 6081 '@th,416,16 == 0x0ec8 @th,472,8 == 0x40 counter drop' >>> - >>> - # We do not check that packets go through gw2 as BFD between chassis-2 and gw1 is still up >>> - fi >>> - >>> - if [ test "$action" == "kill_gw2" ]; then >>> - AS_BOX([$(date +%H:%M:%S.%03N) Killing gw2 ovn-controller]) >>> - on_exit 'm_as ovn-gw-2 /usr/share/openvswitch/scripts/ovs-ctl status || >>> - m_as ovn-gw-2 /usr/share/openvswitch/scripts/ovs-ctl start --system-id=ovn-gw-2' >>> - on_exit 'm_as ovn-gw-2 /usr/share/ovn/scripts/ovn-ctl status_controller || >>> - m_as ovn-gw-2 /usr/share/ovn/scripts/ovn-ctl start_controller ${CONTROLLER_SSL_ARGS}' >>> - >>> - m_as ovn-gw-2 kill -9 $(m_as ovn-gw-2 cat /run/ovn/ovn-controller.pid) >>> - m_as ovn-gw-2 kill -9 $(m_as ovn-gw-2 cat 
/run/openvswitch/ovs-vswitchd.pid) >>> - m_as ovn-gw-2 kill -9 $(m_as ovn-gw-2 cat /run/openvswitch/ovsdb-server.pid) >>> - # Also delete datapath (flows) >>> - m_as ovn-gw-2 ovs-dpctl del-dp system@ovs-system >>> - fi >>> - >>> - if [ test "$action" == "kill_gw1" ]; then >>> - AS_BOX([$(date +%H:%M:%S.%03N) Killing gw1 ovn-controller]) >>> - on_exit 'm_as ovn-gw-1 /usr/share/openvswitch/scripts/ovs-ctl status || >>> - m_as ovn-gw-1 /usr/share/openvswitch/scripts/ovs-ctl start --system-id=ovn-gw-1' >>> - on_exit 'm_as ovn-gw-1 /usr/share/ovn/scripts/ovn-ctl status_controller || >>> - m_as ovn-gw-1 /usr/share/ovn/scripts/ovn-ctl start_controller ${CONTROLLER_SSL_ARGS}' >>> - >>> - m_as ovn-gw-1 kill -9 $(m_as ovn-gw-1 cat /run/ovn/ovn-controller.pid) >>> - m_as ovn-gw-1 kill -9 $(m_as ovn-gw-1 cat /run/openvswitch/ovs-vswitchd.pid) >>> - m_as ovn-gw-1 kill -9 $(m_as ovn-gw-1 cat /run/openvswitch/ovsdb-server.pid) >>> - # Also delete datapath (flows) >>> - m_as ovn-gw-1 ovs-dpctl del-dp system@ovs-system >>> - fi >>> +check_loss_after_flap() >>> +{ >>> + dead=$1 >>> + max_expected_loss=$2 >>> >>> - if [ test "$action" == "kill_gw2" ]; then >>> - AS_BOX([$(date +%H:%M:%S.%03N) Waiting for flap count between gw1 and gw2 to increase]) >>> + if [[ "$dead" == "gw2" ]]; then >>> + echo "$(date +%H:%M:%S.%03N) Waiting for flap count between gw1 and gw2 to increase" >>> OVS_WAIT_UNTIL([ >>> new_flap_count=$(m_as ovn-gw-1 ovs-vsctl get interfac $from_gw1_to_gw2 bfd_status | sed 's/.*flap_count=\"\([[0-9]]*\).*/\1/g') >>> echo "Comparing $new_flap_count versus $flap_count_gw_1" >>> test "$new_flap_count" -gt "$((flap_count_gw_1))" >>> ]) >>> else >>> - AS_BOX([$(date +%H:%M:%S.%03N) Waiting for flap count between gw2 and gw1 to increase]) >>> + echo "$(date +%H:%M:%S.%03N) Waiting for flap count between gw2 and gw1 to increase" >>> OVS_WAIT_UNTIL([ >>> new_flap_count=$(m_as ovn-gw-2 ovs-vsctl get interfac $from_gw2_to_gw1 bfd_status | sed
's/.*flap_count=\"\([[0-9]]*\).*/\1/g') >>> echo "Comparing $new_flap_count versus $flap_count_gw_2" >>> test "$new_flap_count" -gt "$((flap_count_gw_2))" >>> ]) >>> - >>> fi >>> - AS_BOX([$(date +%H:%M:%S.%03N) Flapped!]) >>> >>> + echo "$(date +%H:%M:%S.%03N) Flapped!" >>> # Wait a few more second for the fight. >>> + sleep 4 >>> + >>> + echo "$(date +%H:%M:%S.%03N) Statistics after flapping" >>> + lost=$(dump_statistics) >>> + echo "===> $lost packets lost while handling migration" >>> + AT_CHECK([test "$lost" -le "$max_expected_loss"]) >>> +} >>> + >>> +final_check() >>> +{ >>> + action=$1 >>> + lost=$2 >>> + max_expected_loss_after_restoration=$3 >>> + >>> + # Wait a little more to get packets while network is restored >>> sleep 2 >>> - AS_BOX([$(date +%H:%M:%S.%03N) Statistics after flapping]) >>> - dump_statistics >>> - >>> - if [ test "$action" == "stop_bfd" ]; then >>> - # gw1 still alive and gw2 tried to claim => gw1 should restart generating garps. >>> - check_for_new_garps gw1 "true" >>> - check_for_new_garps gw2 "false" >>> - check_for_new_garps gw3 "false" >>> - check_for_new_echo_pkts gw1 00:00:c0:a8:00:01 f0:00:c0:a8:00:fe "true" >>> - check_for_new_echo_pkts gw2 00:00:c0:a8:00:01 f0:00:c0:a8:00:fe "false" >>> - check_for_new_echo_pkts gw3 00:00:c0:a8:00:01 f0:00:c0:a8:00:fe "false" >>> - check_for_new_echo_pkts ch1 f0:00:c0:a8:01:fe f0:00:c0:a8:01:01 "true" >>> - AS_BOX([$(date +%H:%M:%S.%03N) Unblocking bfd on gw1]) >>> - nsenter --net=/proc/$gw1_pid/ns/net nft -a list ruleset >>> - nsenter --net=/proc/$gw1_pid/ns/net nft delete table ip ovn-test >>> - fi >>> + echo "$(date +%H:%M:%S.%03N) Statistics after network restored (after $action)" >>> + new_lost=$(dump_statistics) >>> + echo "===> $((new_lost - lost)) packets lost during network restoration" >>> + AT_CHECK([test "$((new_lost - lost))" -le "$max_expected_loss_after_restoration"]) >>> + stop_sending_background_packets >>> +} >>> >>> - if [ test "$action" == "kill_gw2" ]; then >>> - # gw1
still alive, but gw2 did not try to claim => gw1 should not generate new garps. >>> - check_for_new_garps gw1 "false" >>> - check_for_new_garps gw2 "false" >>> - check_for_new_garps gw3 "false" >>> - check_for_new_echo_pkts gw1 00:00:c0:a8:00:01 f0:00:c0:a8:00:fe "true" >>> - check_for_new_echo_pkts gw2 00:00:c0:a8:00:01 f0:00:c0:a8:00:fe "false" >>> - check_for_new_echo_pkts gw3 00:00:c0:a8:00:01 f0:00:c0:a8:00:fe "false" >>> - check_for_new_echo_pkts ch1 f0:00:c0:a8:01:fe f0:00:c0:a8:01:01 "true" >>> - AS_BOX([$(date +%H:%M:%S.%03N) Restarting gw2 ovn-vswitchd]) >>> - m_as ovn-gw-2 /usr/share/openvswitch/scripts/ovs-ctl start --system-id=ovn-gw-2 >>> - >>> - AS_BOX([$(date +%H:%M:%S.%03N) Restarting gw2 ovn-controller]) >>> - m_as ovn-gw-2 /usr/share/ovn/scripts/ovn-ctl start_controller ${CONTROLLER_SSL_ARGS} >>> - fi >>> +check_garps() >>> +{ >>> + check_for_new_garps gw1 "$1" >>> + check_for_new_garps gw2 "$2" >>> + check_for_new_garps gw3 "$3" >>> +} >>> >>> - if [ test "$action" == "kill_gw1" ]; then >>> - # gw1 died => gw2 should generate garps. 
>>> - check_for_new_garps gw1 "false" >>> - check_for_new_garps gw2 "true" >>> - check_for_new_garps gw3 "false" >>> - check_for_new_echo_pkts gw1 00:00:c0:a8:00:01 f0:00:c0:a8:00:fe "false" >>> - check_for_new_echo_pkts gw2 00:00:c0:a8:00:01 f0:00:c0:a8:00:fe "true" >>> - check_for_new_echo_pkts gw3 00:00:c0:a8:00:01 f0:00:c0:a8:00:fe "false" >>> - check_for_new_echo_pkts ch1 f0:00:c0:a8:01:fe f0:00:c0:a8:01:01 "true" >>> - AS_BOX([$(date +%H:%M:%S.%03N) Restarting gw1 ovn-vswitchd]) >>> - m_as ovn-gw-1 /usr/share/openvswitch/scripts/ovs-ctl start --system-id=ovn-gw-1 >>> - >>> - AS_BOX([$(date +%H:%M:%S.%03N) Restarting gw1 ovn-controller]) >>> - m_as ovn-gw-1 /usr/share/ovn/scripts/ovn-ctl start_controller ${CONTROLLER_SSL_ARGS} >>> - fi >>> +check_packets() >>> +{ >>> + check_for_new_echo_pkts gw1 00:00:c0:a8:00:01 f0:00:c0:a8:00:fe "$1" >>> + check_for_new_echo_pkts gw2 00:00:c0:a8:00:01 f0:00:c0:a8:00:fe "$2" >>> + check_for_new_echo_pkts gw3 00:00:c0:a8:00:01 f0:00:c0:a8:00:fe "$3" >>> + check_for_new_echo_pkts ch1 f0:00:c0:a8:01:fe f0:00:c0:a8:01:01 "$4" >>> +} >>> + >>> +check_migration_between_gw1_and_gw2_bfd_stop() >>> +{ >>> + AS_BOX([$(date +%H:%M:%S.%03N) Testing migration after bfd_stop]) >>> + max_expected_max_expected_loss1=$1 >>> + max_expected_max_expected_loss2=$2 >>> + prepare >>> + >>> + echo "$(date +%H:%M:%S.%03N) Blocking bfd on gw1 (from $ip_gw1 to $ip_gw2)" >>> + nsenter --net=/proc/$gw1_pid/ns/net nft add table ip ovn-test >>> + nsenter --net=/proc/$gw1_pid/ns/net nft 'add chain ip ovn-test INPUT { type filter hook input priority 0; policy accept; }' >>> + # Drop BFD from gw-1 to gw-2: geneve port (6081), inner port 3784 (0xec8), Session state Up, Init, Down. 
>>> + nsenter --net=/proc/$gw1_pid/ns/net nft add rule ip ovn-test INPUT ip daddr $ip_gw1 ip saddr $ip_gw2 udp dport 6081 '@th,416,16 == 0x0ec8 @th,472,8 == 0xc0 counter drop' >>> + nsenter --net=/proc/$gw1_pid/ns/net nft add rule ip ovn-test INPUT ip daddr $ip_gw1 ip saddr $ip_gw2 udp dport 6081 '@th,416,16 == 0x0ec8 @th,472,8 == 0x80 counter drop' >>> + nsenter --net=/proc/$gw1_pid/ns/net nft add rule ip ovn-test INPUT ip daddr $ip_gw1 ip saddr $ip_gw2 udp dport 6081 '@th,416,16 == 0x0ec8 @th,472,8 == 0x40 counter drop' >>> + >>> + check_loss_after_flap "gw1" $max_expected_max_expected_loss1 >>> + >>> + # gw1 still alive and gw2 tried to claim => gw1 should restart generating garps. >>> + check_garps "true" "false" "false" >>> + check_packets "true" "false" "false" "true" >>> + >>> + echo "$(date +%H:%M:%S.%03N) Unblocking bfd on gw1" >>> + nsenter --net=/proc/$gw1_pid/ns/net nft -a list ruleset >>> + nsenter --net=/proc/$gw1_pid/ns/net nft delete table ip ovn-test >>> >>> # The network is now restored => packets should go through gw1 and reach chassis-1. 
>>> - check_for_new_echo_pkts gw1 00:00:c0:a8:00:01 f0:00:c0:a8:00:fe "true" >>> - check_for_new_echo_pkts gw2 00:00:c0:a8:00:01 f0:00:c0:a8:00:fe "false" >>> - check_for_new_echo_pkts gw3 00:00:c0:a8:00:01 f0:00:c0:a8:00:fe "false" >>> - check_for_new_echo_pkts ch1 f0:00:c0:a8:01:fe f0:00:c0:a8:01:01 "true" >>> - AS_BOX([$(date +%H:%M:%S.%03N) Statistics after network restored]) >>> - dump_statistics >>> - stop_sending_background_packets >>> + check_packets "true" "false" "false" "true" >>> + final_check "bfd_stop" $lost $max_expected_max_expected_loss2 >>> +} >>> + >>> +check_migration_between_gw1_and_gw2_kill_gw2() { >>> + AS_BOX([$(date +%H:%M:%S.%03N) Check migration after killing gw2 ovn-controller & vswitchd]) >>> + max_expected_loss1=$1 >>> + max_expected_loss2=$2 >>> + prepare >>> + >>> + on_exit 'm_as ovn-gw-2 /usr/share/openvswitch/scripts/ovs-ctl status || >>> + m_as ovn-gw-2 /usr/share/openvswitch/scripts/ovs-ctl start --system-id=ovn-gw-2' >>> + on_exit 'm_as ovn-gw-2 /usr/share/ovn/scripts/ovn-ctl status_controller || >>> + m_as ovn-gw-2 /usr/share/ovn/scripts/ovn-ctl start_controller ${CONTROLLER_SSL_ARGS}' >>> + >>> + m_as ovn-gw-2 kill -9 $(m_as ovn-gw-2 cat /run/ovn/ovn-controller.pid) >>> + m_as ovn-gw-2 kill -9 $(m_as ovn-gw-2 cat /run/openvswitch/ovs-vswitchd.pid) >>> + m_as ovn-gw-2 kill -9 $(m_as ovn-gw-2 cat /run/openvswitch/ovsdb-server.pid) >>> + m_as ovn-gw-2 ovs-dpctl del-dp system@ovs-system >>> + >>> + check_loss_after_flap "gw2" $max_expected_loss1 >>> + >>> + # gw1 still alive, but gw2 did not try to claim => gw1 should not generate new garps. 
>>> + check_garps "false" "false" "false" >>> + check_packets "true" "false" "false" "true" >>> + >>> + echo "$(date +%H:%M:%S.%03N) Restarting gw2 ovs-vswitchd" >>> + m_as ovn-gw-2 /usr/share/openvswitch/scripts/ovs-ctl start --system-id=ovn-gw-2 >>> + >>> + echo "$(date +%H:%M:%S.%03N) Restarting gw2 ovn-controller" >>> + m_as ovn-gw-2 /usr/share/ovn/scripts/ovn-ctl start_controller ${CONTROLLER_SSL_ARGS} >>> + >>> + # The network is now restored => packets should go through gw1 and reach chassis-1. >>> + check_packets "true" "false" "false" "true" >>> + final_check "kill_gw2" $lost $max_expected_loss2 >>> +} >>> + >>> +check_migration_between_gw1_and_gw2_update_ovs() { >>> + AS_BOX([$(date +%H:%M:%S.%03N) Check migration after restarting gw1 ovs-vswitchd ("update")]) >>> + max_expected_loss1=$1 >>> + max_expected_loss2=$2 >>> + prepare >>> + >>> + m_as ovn-gw-1 /usr/share/openvswitch/scripts/ovs-ctl restart --system-id=ovn-gw-1 >>> + >>> + check_loss_after_flap "gw1" $max_expected_loss1 >>> + >>> + # The network is now restored => packets should go through gw1 and reach chassis-1.
>>> + check_packets "true" "false" "false" "true" >>> + final_check "ovs_update" $lost $max_expected_loss2 >>> +} >>> + >>> +check_migration_between_gw1_and_gw2_kill_gw1() { >>> + AS_BOX([$(date +%H:%M:%S.%03N) Killing gw1 ovn-controller and ovs-vswitchd]) >>> + max_expected_loss1=$1 >>> + max_expected_loss2=$2 >>> + prepare >>> + >>> + on_exit 'm_as ovn-gw-1 /usr/share/openvswitch/scripts/ovs-ctl status || >>> + m_as ovn-gw-1 /usr/share/openvswitch/scripts/ovs-ctl start --system-id=ovn-gw-1' >>> + on_exit 'm_as ovn-gw-1 /usr/share/ovn/scripts/ovn-ctl status_controller || >>> + m_as ovn-gw-1 /usr/share/ovn/scripts/ovn-ctl start_controller ${CONTROLLER_SSL_ARGS}' >>> + >>> + m_as ovn-gw-1 kill -9 $(m_as ovn-gw-1 cat /run/ovn/ovn-controller.pid) >>> + m_as ovn-gw-1 kill -9 $(m_as ovn-gw-1 cat /run/openvswitch/ovs-vswitchd.pid) >>> + m_as ovn-gw-1 kill -9 $(m_as ovn-gw-1 cat /run/openvswitch/ovsdb-server.pid) >>> + # Also delete datapath (flows) >>> + m_as ovn-gw-1 ovs-dpctl del-dp system@ovs-system >>> + >>> + check_loss_after_flap "gw1" $max_expected_loss1 >>> + >>> + # gw1 died => gw2 should generate garps. >>> + check_garps "false" "true" "false" >>> + check_packets "false" "true" "false" "true" >>> + echo "$(date +%H:%M:%S.%03N) Restarting gw1 ovs-vswitchd after killing gw1" >>> + m_as ovn-gw-1 /usr/share/openvswitch/scripts/ovs-ctl start --system-id=ovn-gw-1 >>> + >>> + # Wait a while before restarting ovn-controller >>> + sleep 10 >>> + >>> + # gw2 should still be handling packets as OVN not restarted on gw1 >>> + check_packets "false" "true" "false" "true" >>> + >>> + echo "$(date +%H:%M:%S.%03N) Restarting gw1 ovn-controller after killing gw1" >>> + m_as ovn-gw-1 /usr/share/ovn/scripts/ovn-ctl start_controller ${CONTROLLER_SSL_ARGS} >>> + >>> + # The network is now restored => packets should go through gw1 and reach chassis-1.
>>> + check_packets "true" "false" "false" "true" >>> + final_check "kill_gw1" $lost $max_expected_loss2 >>> +} >>> + >>> +check_migration_between_gw1_and_gw2_reboot_gw1() { >>> + ip_gw1_eth1=$(podman exec ovn-gw-1 ip -brief address show eth1 | awk '{print $3}' | cut -d/ -f1) >>> + cidr=$(podman exec ovn-gw-1 ip -brief address show eth1 | awk '{print $3}' | cut -d/ -f2) >>> + AS_BOX([$(date +%H:%M:%S.%03N) Rebooting ovn-gw-1 with $ip_gw1_eth1/$cidr]) >>> + max_expected_loss1=$1 >>> + max_expected_loss2=$2 >>> + prepare >>> + >>> + podman stop -t 0 ovn-gw-1 >>> + (exec 3>&- 4>&- 5>&- 6>&-; podman start ovn-gw-1) >>> + >>> + add_port br-ovn eth1 $ip_gw1_eth1/$cidr >>> + add_port br-ovn-ext eth2 >>> + M_START_TCPDUMPS([ovn-gw-1], [-neei eth2], [gw1], [ovn-gw-1], [-neei eth1], [gw1_eth1], [ovn-gw-1], [-neei eth2 -Q out], [gw1_out]) >>> + check_loss_after_flap "gw1" $max_expected_loss1 >>> + >>> + # gw1 died => gw2 should generate garps. >>> + check_garps "false" "true" "false" >>> + check_packets "false" "true" "false" "true" >>> + >>> + echo "$(date +%H:%M:%S.%03N) Restarting gw1 ovs-vswitchd after rebooting gw1" >>> + m_as ovn-gw-1 /usr/share/openvswitch/scripts/ovs-ctl start --system-id=ovn-gw-1 >>> + >>> + # Wait a while before restarting ovn-controller >>> + sleep 10 >>> + >>> + # gw2 should still be handling packets as OVN not restarted on gw1 >>> + check_packets "false" "true" "false" "true" >>> + >>> + echo "$(date +%H:%M:%S.%03N) Restarting gw1 ovn-controller after rebooting gw1" >>> + m_as ovn-gw-1 /usr/share/ovn/scripts/ovn-ctl start_controller ${CONTROLLER_SSL_ARGS} >>> + >>> + # The network is now restored => packets should go through gw1 and reach chassis-1.
>>> + check_packets "true" "false" "false" "true" >>> + final_check "reboot_gw1" $lost $max_expected_loss2 >>> +} >>> + >>> +check_compute_restart() { >>> + AS_BOX([$(date +%H:%M:%S.%03N) Killing ovn-chassis-1 ovn-controller and ovs-vswitchd]) >>> + max_expected_loss=$1 >>> + prepare >>> + >>> + # Kill ovn-chassis-1 >>> + echo "$(date +%H:%M:%S.%03N) Killing chassis-1" >>> + on_exit 'm_as ovn-chassis-1 /usr/share/openvswitch/scripts/ovs-ctl status || >>> + m_as ovn-chassis-1 /usr/share/openvswitch/scripts/ovs-ctl start --system-id=ovn-chassis-1' >>> + on_exit 'm_as ovn-chassis-1 /usr/share/ovn/scripts/ovn-ctl status_controller || >>> + m_as ovn-chassis-1 /usr/share/ovn/scripts/ovn-ctl start_controller ${CONTROLLER_SSL_ARGS}' >>> + >>> + m_as ovn-chassis-1 kill -9 $(m_as ovn-chassis-1 cat /run/ovn/ovn-controller.pid) >>> + m_as ovn-chassis-1 kill -9 $(m_as ovn-chassis-1 cat /run/openvswitch/ovs-vswitchd.pid) >>> + m_as ovn-chassis-1 kill -9 $(m_as ovn-chassis-1 cat /run/openvswitch/ovsdb-server.pid) >>> + >>> + # Now restart chassis-1 >>> + flap_count=$(m_as ovn-gw-1 ovs-vsctl get interface $from_gw1_to_ch1 bfd_status | sed 's/.*flap_count=\"\([[0-9]]*\).*/\1/g') >>> + echo "$(date +%H:%M:%S.%03N) Restarting ovn-chassis-1 ovs-vswitchd." >>> + m_as ovn-chassis-1 /usr/share/openvswitch/scripts/ovs-ctl start --system-id=ovn-chassis-1 >>> + >>> + echo "$(date +%H:%M:%S.%03N) Waiting for flap count between gw-1 and chassis-1 to increase" >>> + OVS_WAIT_UNTIL([ >>> + new_flap_count=$(m_as ovn-gw-1 ovs-vsctl get interface $from_gw1_to_ch1 bfd_status | sed 's/.*flap_count=\"\([[0-9]]*\).*/\1/g') >>> + echo "Comparing $new_flap_count versus $flap_count" >>> + test "$new_flap_count" -gt "$((flap_count))" >>> + ]) >>> + >>> + wait_bfd_up ovn-chassis-1 $from_ch1_to_gw1 >>> + >>> + echo "$(date +%H:%M:%S.%03N) Restarting ovn-chassis-1 ovn-controller."
>>> + m_as ovn-chassis-1 /usr/share/ovn/scripts/ovn-ctl start_controller ${CONTROLLER_SSL_ARGS} >>> + >>> + # Wait a long time to catch losses >>> + sleep 5 >>> + final_check "compute" 0 $max_expected_loss >>> } >>> >>> start_tcpdump >>> -AS_BOX([$(date +%H:%M:%S.%03N) Sending packet from hv1-vif1(inside1) to ext1]) >>> +echo "$(date +%H:%M:%S.%03N) Sending packet from hv1-vif1(inside1) to ext1" >>> M_NS_CHECK_EXEC([ovn-chassis-1], [hv1-vif1], [ping -c3 -q -i 0.1 192.168.0.1 | FORMAT_PING], >>> [0], [dnl >>> 3 packets transmitted, 3 received, 0% packet loss, time 0ms >>> @@ -3433,7 +3629,7 @@ M_NS_CHECK_EXEC([ovn-chassis-1], [hv1-vif1], [ping -c3 -q -i 0.1 192.168.0.1 | F >>> stop_tcpdump >>> >>> # It should have gone through gw1 and not gw2 >>> -AS_BOX([$(date +%H:%M:%S.%03N) Checking it went through gw1 and not gw2]) >>> +echo "$(date +%H:%M:%S.%03N) Checking it went through gw1 and not gw2" >>> AT_CHECK([cat gw2.tcpdump | grep "ICMP echo"], [1], [dnl >>> ]) >>> >>> @@ -3446,17 +3642,29 @@ f0:00:c0:a8:00:fe > 00:00:c0:a8:00:01, ethertype IPv4 (0x0800), length 98: 192.1 >>> 00:00:c0:a8:00:01 > f0:00:c0:a8:00:fe, ethertype IPv4 (0x0800), length 98: 192.168.0.1 > 192.168.1.1: ICMP echo reply, >>> ]) >>> >>> -# We stop bfd between gw1 & gw2, but keep gw1 & gw2 running. >>> -check_migration_between_gw1_and_gw2 "stop_bfd" >>> +# We stop bfd between gw1 & gw2, but keep gw1 & gw2 running. We should not lose packets. >>> +check_migration_between_gw1_and_gw2_bfd_stop 1 1 >>> >>> # We simulate death of gw2. It should not have any effect. >>> -check_migration_between_gw1_and_gw2 "kill_gw2" >>> +check_migration_between_gw1_and_gw2_kill_gw2 1 1 >>> + >>> +# We simulate ovs update on gw1. When ovs is stopped, flows should still be handled by Kernel datapath. >>> +# When OVS is restarted, BFD should go down immediately, and gw2 will start handling packets. 
>>> +# There will be packet losses as gw2 will usually see BFD from gw1 up (and hence release port) before gw1 sees >>> +# BFD up (and claim port). >>> +check_migration_between_gw1_and_gw2_update_ovs 20 1 >>> + >>> +# We simulate restart of both OVS & OVN gw1. gw2 should take over. >>> +check_migration_between_gw1_and_gw2_kill_gw1 40 20 >>> >>> # We simulate death of gw1. gw2 should take over. >>> -check_migration_between_gw1_and_gw2 "kill_gw1" >>> +check_migration_between_gw1_and_gw2_reboot_gw1 40 20 >>> + >>> +# We simulate restart of ovn-chassis-1. We expect ~2 sec of losses as we wait for bfd up before starting >>> +# ovn-controller. >>> +check_compute_restart 30 >>> >>> AT_CLEANUP >>> -]) >>> >>> AT_SETUP([ovn multinode bgp L2 EVPN]) >>> check_fake_multinode_setup >>> -- >>> 2.47.1 >>> >> > > Regards, > Dumitru >
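[Editorial note] The commit message describes a handshake around OVS's flow-restore-wait: ovn-controller writes the key to false, waits for OVS to acknowledge the change (cur_cfg catching up to next_cfg), then writes it back to true so that an unexpected reboot leaves the key set and OVS holds BFD down until OVN is ready again. The sketch below is a minimal, hypothetical Python model of the FRW_* state machine from the patch's manage_flow_restore_wait() — the FrwToggler class, its step() arguments, and the simplified transitions are illustrative assumptions, not OVN code:

```python
# Hypothetical simplification of manage_flow_restore_wait() for an HA gw
# chassis; not OVN code. Each step() call models one main-loop iteration
# and returns the value written to flow-restore-wait, or None.
from enum import Enum, auto

class Frw(Enum):
    INIT = auto()
    WAIT_TXN_COMPLETE = auto()
    TXN_SUCCESS = auto()
    DONE = auto()

class FrwToggler:
    def __init__(self):
        self.state = Frw.INIT
        self.frw_next_cfg = 0

    def step(self, frw, ofctrl_cur_cfg, ovs_next_cfg, ovs_cur_cfg, txn_ok):
        if self.state is Frw.INIT:
            if ofctrl_cur_cfg > 0:        # ofctrl connection established
                self.state = Frw.WAIT_TXN_COMPLETE
                return "false"
            return None
        if self.state is Frw.WAIT_TXN_COMPLETE:
            if not txn_ok:
                return "false"            # previous transaction failed: retry
            self.frw_next_cfg = ovs_next_cfg
            self.state = Frw.TXN_SUCCESS  # fall through, like the C switch
        if self.state is Frw.TXN_SUCCESS:
            if ovs_next_cfg < self.frw_next_cfg:
                # DB reset: next_cfg went backwards, start the sequence over.
                self.state = Frw.WAIT_TXN_COMPLETE
                return "false"
            if not frw:
                if ovs_cur_cfg >= self.frw_next_cfg:
                    self.state = Frw.DONE
                    return "true"         # OVS acked: arm for the next reboot
                return None               # still waiting for the ack
            self.state = Frw.DONE         # someone else already set it
            return None
        if self.state is Frw.DONE:
            if not frw:
                # e.g. "ovs-ctl restart" cleared the key: redo the handshake.
                self.state = Frw.WAIT_TXN_COMPLETE
                return "false"
            return None

t = FrwToggler()
writes = [
    t.step(frw=True, ofctrl_cur_cfg=1, ovs_next_cfg=5, ovs_cur_cfg=4, txn_ok=True),
    t.step(frw=False, ofctrl_cur_cfg=1, ovs_next_cfg=6, ovs_cur_cfg=5, txn_ok=True),
    t.step(frw=False, ofctrl_cur_cfg=1, ovs_next_cfg=6, ovs_cur_cfg=6, txn_ok=True),
]
print(writes, t.state.name)  # ['false', None, 'true'] DONE
```

The toggle (false, wait for ack, then true) rather than simply writing true once matters because OVS latches flow-restore-wait: it only acts on the key at startup or when it observes the transition to false, so OVN must prove OVS saw the false before re-arming it.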
Hi Dumitru Thanks for looking into this. On Wed, Mar 25, 2026 at 4:35 PM Dumitru Ceara <dceara@redhat.com> wrote: > On 3/25/26 4:31 PM, Dumitru Ceara wrote: > > On 3/12/26 8:35 PM, Mark Michelson via dev wrote: > >> Hi Xavier, > >> > > > > Hi Xavier, Mark, > > > >> It took a bit for me to look through all of this, but I believe this > >> looks good to me. Thanks for adding the notes about the offline > >> discussions, as I had not recalled all the details about those > >> discussions between then and now. > >> > >> Acked-by: Mark Michelson <mmichels@redhat.com> > >> > > > > I think this change makes sense too. I do have some very small comments > > below but I might be able to address them myself and just squash that > > into this version of the patch before applying it to main. > > > > Please let me know what you think. > > > > Hmm, this actually failed in CI: > > https://github.com/dceara/ovn/actions/runs/23547684046/job/68553636598 > > Xavier, would you happen to have some time to look into it by any chance? > I think that it is a bug in the test added by the patch: it uses an incorrect socket path (which depends on the way the host OVS was installed/started). I'll provide a v2 and, while at it, will go through the 'nits'. > Thanks, > Dumitru > Thanks Xavier > > >> On Mon, Mar 9, 2026 at 3:36 PM Xavier Simonart <xsimonar@redhat.com> > wrote: > >>> > >>> If a server unexpectedly rebooted, OVS, when restarted, sets BFD > >>> UP on bfd-enabled geneve tunnels. > >>> However, if it takes time to restart OVN, an HA gw chassis > >>> would attract the traffic while being unable to handle it > >>> (as no flows), resulting in traffic loss. > >>> > >>> This is fixed by re-using ovs flow-restore-wait. > >>> If set, OVS waits (prevents upcalls, ignores bfd, ...) until reset. > >>> Once OVS receives the notification of flow-restore-wait being false, > >>> it restarts handling upcalls, bfd... and ignores any new change to > >>> flow-restore-wait. 
> >>> > >>> Hence, on chassis hosting ha gateways, OVN toggles flow-restore-wait: > >>> it set it to false, waits for ack from OVS and then sets it back to > true. > >>> If server reboots, OVS will see flow-restore-wait being true. > >>> > >>> OVN also sets external_ids->ovn-managed-flow-restore-wait when setting > >>> flow-restore-wait. When set, it tells that OVN once set > flow-restore-wait. > >>> > >>> "ovs-ctl restart" also uses flow-restore-wait: when called, it saves > the > >>> flows, stops "ovs-vswitchd", sets "flow-restore-wait" to true, restarts > >>> "ovs-vswitchd", restores the flows and finally removes > "flow-restore-wait". > >>> So OVS will wait either for "ovs-ctl restart" to remove > "flow-restore-wait" > >>> or for OVN to set "flow-restore-wait" to false. > >>> > >>> Reported-at: https://issues.redhat.com/browse/FDP-3075 > >>> Signed-off-by: Xavier Simonart <xsimonar@redhat.com> > >>> > >>> --- > >>> -v2 : - Updated based on Mark's feedback (commit message, comments). > >>> - Avoid setting flow-restore-wait for computes. > >>> - Add external_ids->ovn-managed-flow-restore-wait. > >>> - Updated test: add test for compute update + nits (variable > name changes) > >>> --- > >>> controller/bfd.c | 5 +- > >>> controller/bfd.h | 4 +- > >>> controller/ovn-controller.8.xml | 11 + > >>> controller/ovn-controller.c | 171 +++++++++- > >>> tests/multinode-macros.at | 22 ++ > >>> tests/multinode.at | 546 > ++++++++++++++++++++++---------- > >>> 6 files changed, 584 insertions(+), 175 deletions(-) > >>> > >>> diff --git a/controller/bfd.c b/controller/bfd.c > >>> index 3b0c3f6da..56bfa4936 100644 > >>> --- a/controller/bfd.c > >>> +++ b/controller/bfd.c > >>> @@ -117,13 +117,14 @@ bfd_calculate_active_tunnels(const struct > ovsrec_bridge *br_int, > >>> * > >>> * If 'our_chassis' is C5 then this function returns empty bfd set. 
> >>> */ > >>> -void > >>> +bool > >>> bfd_calculate_chassis( > >>> const struct sbrec_chassis *our_chassis, > >>> const struct sbrec_ha_chassis_group_table *ha_chassis_grp_table, > >>> struct sset *bfd_chassis) > >>> { > >>> const struct sbrec_ha_chassis_group *ha_chassis_grp; > >>> + bool chassis_is_ha_gw = false; > >>> SBREC_HA_CHASSIS_GROUP_TABLE_FOR_EACH (ha_chassis_grp, > >>> ha_chassis_grp_table) { > >>> bool is_ha_chassis = false; > >>> @@ -143,6 +144,7 @@ bfd_calculate_chassis( > >>> sset_add(&grp_chassis, ha_ch->chassis->name); > >>> if (our_chassis == ha_ch->chassis) { > >>> is_ha_chassis = true; > >>> + chassis_is_ha_gw = true; > >>> bfd_setup_required = true; > >>> } > >>> } > >>> @@ -178,6 +180,7 @@ bfd_calculate_chassis( > >>> } > >>> sset_destroy(&grp_chassis); > >>> } > >>> + return chassis_is_ha_gw; > >>> } > >>> > >>> void > >>> diff --git a/controller/bfd.h b/controller/bfd.h > >>> index f8fece5a5..3e3384891 100644 > >>> --- a/controller/bfd.h > >>> +++ b/controller/bfd.h > >>> @@ -16,6 +16,8 @@ > >>> #ifndef OVN_BFD_H > >>> #define OVN_BFD_H 1 > >>> > >>> +#include <stdbool.h> > >>> + > >>> struct hmap; > >>> struct ovsdb_idl; > >>> struct ovsdb_idl_index; > >>> @@ -36,7 +38,7 @@ void bfd_run(const struct ovsrec_interface_table *, > >>> const struct sbrec_sb_global_table *, > >>> const struct ovsrec_open_vswitch_table *); > >>> > >>> -void bfd_calculate_chassis( > >>> +bool bfd_calculate_chassis( > >>> const struct sbrec_chassis *, > >>> const struct sbrec_ha_chassis_group_table *, > >>> struct sset *); > >>> diff --git a/controller/ovn-controller.8.xml > b/controller/ovn-controller.8.xml > >>> index 57e7cf5dd..33281a4d6 100644 > >>> --- a/controller/ovn-controller.8.xml > >>> +++ b/controller/ovn-controller.8.xml > >>> @@ -531,6 +531,17 @@ > >>> 65535. 
> >>> </dd> > >>> > >>> + <dt> > >>> + <code>external_ids:ovn-managed-flow-restore-wait</code> in the > >>> + <code>Open_vSwitch</code> table > >>> + </dt> > >>> + <dd> > >>> + When set to true, this key indicates that > <code>ovn-controller</code> > >>> + has set the <code>other_config:flow-restore-wait</code> > option. > >>> + The key is set when <code>ovn-controller</code> enables > >>> + flow-restore-wait and removed when it clears it. > >>> + </dd> > >>> + > >>> <dt> > >>> <code>external_ids:ct-zone-*</code> in the > <code>Bridge</code> table > >>> </dt> > >>> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c > >>> index 5815f1b92..0611b1767 100644 > >>> --- a/controller/ovn-controller.c > >>> +++ b/controller/ovn-controller.c > >>> @@ -211,6 +211,150 @@ static char *get_file_system_id(void) > >>> free(filename); > >>> return ret; > >>> } > >>> + > >>> +/* Set/unset flow-restore-wait, and inc ovs next_cfg if false > >>> + * When set to true, also sets ovn-managed-flow-restore-wait to true > to > >>> + * indicate ownership */ > >>> +static void set_flow_restore_wait(struct ovsdb_idl_txn *ovs_idl_txn, > >>> + const struct ovsrec_open_vswitch > *cfg, > >>> + const struct smap *other_config, > >>> + const char *val, bool ovn_managed) > > > > Nit: "static void" should be on a different line. > > > >>> +{ > >>> + struct smap new_config; > >>> + smap_clone(&new_config, other_config); > >>> + smap_replace(&new_config, "flow-restore-wait", val); > >>> + ovsrec_open_vswitch_set_other_config(cfg, &new_config); > >>> + if (!strcmp(val, "true")) { > > > > I'd prefer we use an actual boolean here as 'val'. > > > > We'd only have to change: > > > > smap_replace(&new_config, "flow-restore-wait", val ? 
"true", "false"); > > > >>> + ovsrec_open_vswitch_update_external_ids_setkey( > >>> + cfg, "ovn-managed-flow-restore-wait", "true"); > >>> + } else if (ovn_managed) { > >>> + ovsrec_open_vswitch_update_external_ids_delkey( > >>> + cfg, "ovn-managed-flow-restore-wait"); > >>> + } > >>> + ovsdb_idl_txn_increment(ovs_idl_txn, &cfg->header_, > >>> + &ovsrec_open_vswitch_col_next_cfg, true); > >>> + smap_destroy(&new_config); > >>> +} > >>> + > >>> +static void > >>> +manage_flow_restore_wait(struct ovsdb_idl_txn *ovs_idl_txn, > >>> + const struct ovsrec_open_vswitch *cfg, > >>> + uint64_t ofctrl_cur_cfg, uint64_t > ovs_next_cfg, > >>> + int ovs_txn_status, bool is_ha_gw) > >>> +{ > >>> + enum flow_restore_wait_state { > >>> + FRW_INIT, /* Initial state */ > >>> + FRW_WAIT_TXN_COMPLETE, /* Sent false, waiting txn to complete > */ > >>> + FRW_TXN_SUCCESS, /* Txn completed. Waiting for OVS Ack. > */ > >>> + FRW_DONE /* Everything completed */ > >>> + }; > >>> + > >>> + static int64_t frw_next_cfg; > >>> + static enum flow_restore_wait_state frw_state; > >>> + static bool ofctrl_was_connected = false; > >>> + > >>> + bool ofctrl_connected = ofctrl_is_connected(); > >>> + > >>> + if (!ovs_idl_txn || !cfg) { > >>> + return; > >>> + } > >>> + > >>> + /* If OVS is stopped/started, make sure flow-restore-wait is > toggled */ > > > > Nit: comments should be sentences and end with a '.'. > > > >>> + if (ofctrl_connected && !ofctrl_was_connected) { > >>> + frw_state = FRW_INIT; > >>> + } > >>> + ofctrl_was_connected = ofctrl_connected; > >>> + > >>> + if (!ofctrl_connected) { > >>> + return; > >>> + } > >>> + > >>> + bool frw = smap_get_bool(&cfg->other_config, "flow-restore-wait", > false); > >>> + bool ovn_managed_once = smap_get_bool(&cfg->external_ids, > >>> + "ovn-managed-flow-restore-wait", > false); > > > > Nit: indentation. > > > >>> + > >>> + if (frw && !ovn_managed_once) { > >>> + /* frw has been set by ovs-ctl. Do not touch. 
*/ > >>> + return; > >>> + } > >>> + > >>> + if (!is_ha_gw) { > >>> + if (frw) { > >>> + /* frw has once been set by OVN. We are now not an HA > chassis > >>> + * anymore, unset it. */ > >>> + set_flow_restore_wait(ovs_idl_txn, cfg, > &cfg->other_config, > >>> + "false", ovn_managed_once); > >>> + } > >>> + /* else we are not an HA chassis and frw is false. Ignore it. > */ > >>> + return; > >>> + } > >>> + > >>> + switch (frw_state) { > >>> + case FRW_INIT: > >>> + if (ofctrl_cur_cfg > 0) { > >>> + set_flow_restore_wait(ovs_idl_txn, cfg, > &cfg->other_config, > >>> + "false", ovn_managed_once); > >>> + frw_state = FRW_WAIT_TXN_COMPLETE; > >>> + VLOG_INFO("Setting flow-restore-wait=false " > >>> + "(cur_cfg=%"PRIu64")", ofctrl_cur_cfg); > >>> + } > >>> + break; > >>> + > >>> + case FRW_WAIT_TXN_COMPLETE: > >>> + /* if (ovs_idl_txn != NULL), the transaction completed. > >>> + * When the transaction completed, it either failed > >>> + * (ovs_txn_status == 0) or succeeded (ovs_txn_status != 0) */ > >>> + if (ovs_txn_status == 0) { > >>> + /* Previous transaction failed. 
*/ > >>> + set_flow_restore_wait(ovs_idl_txn, cfg, > &cfg->other_config, > >>> + "false", ovn_managed_once); > >>> + break; > >>> + } > >>> + /* txn succeeded, get next_cfg */ > >>> + frw_next_cfg = ovs_next_cfg; > >>> + frw_state = FRW_TXN_SUCCESS; > >>> + /* fall through */ > >>> + > >>> + case FRW_TXN_SUCCESS: > >>> + if (ovs_next_cfg < frw_next_cfg) { > >>> + /* DB was reset, next_cfg went backwards */ > >>> + VLOG_INFO("OVS DB reset (next_cfg %"PRId64" -> > %"PRIu64"), " > >>> + "resetting state", > >>> + frw_next_cfg, ovs_next_cfg); > >>> + set_flow_restore_wait(ovs_idl_txn, cfg, > &cfg->other_config, > >>> + "false", ovn_managed_once); > >>> + frw_state = FRW_WAIT_TXN_COMPLETE; > >>> + break; > >>> + } > >>> + > >>> + if (!frw) { > >>> + if (cfg->cur_cfg >= frw_next_cfg) { > >>> + set_flow_restore_wait(ovs_idl_txn, cfg, > &cfg->other_config, > >>> + "true", ovn_managed_once); > >>> + frw_state = FRW_DONE; > >>> + VLOG_INFO("Setting flow-restore-wait=true"); > >>> + } > >>> + } else { > >>> + /* The transaction to false succeeded but frw is true. > >>> + * So, another task already set it to true */ > >>> + frw_state = FRW_DONE; > >>> + VLOG_INFO("flow-restore-wait was already true"); > >>> + } > >>> + break; > >>> + case FRW_DONE: > >>> + if (!frw) { > >>> + /* frw has been removed (e.g. by ovs-ctl restart) or is > false > >>> + * (e.g. txn failed.) */ > > > > Nit: extra whitespace at the beginning of the line. > > > >>> + set_flow_restore_wait(ovs_idl_txn, cfg, > &cfg->other_config, > >>> + "false", ovn_managed_once); > >>> + frw_state = FRW_WAIT_TXN_COMPLETE; > >>> + VLOG_INFO("OVS frw cleared, restarting flow-restore-wait > sequence " > >>> + "(cur_cfg=%"PRIu64")", ofctrl_cur_cfg); > >>> + } > >>> + break; > >>> + } > >>> +} > >>> + > >>> /* Only set monitor conditions on tables that are available in the > >>> * server schema. 
> >>> */ > >>> @@ -3381,6 +3525,7 @@ en_mac_cache_cleanup(void *data) > >>> > >>> struct ed_type_bfd_chassis { > >>> struct sset bfd_chassis; > >>> + bool is_ha_gw; > >>> }; > >>> > >>> static void * > >>> @@ -3409,8 +3554,9 @@ en_bfd_chassis_run(struct engine_node *node, > void *data OVS_UNUSED) > >>> = chassis_lookup_by_name(sbrec_chassis_by_name, chassis_id); > >>> > >>> sset_clear(&bfd_chassis->bfd_chassis); > >>> - bfd_calculate_chassis(chassis, ha_chassis_grp_table, > >>> - &bfd_chassis->bfd_chassis); > >>> + bfd_chassis->is_ha_gw = bfd_calculate_chassis(chassis, > >>> + > ha_chassis_grp_table, > >>> + > &bfd_chassis->bfd_chassis); > >>> return EN_UPDATED; > >>> } > >>> > >>> @@ -7117,6 +7263,7 @@ main(int argc, char *argv[]) > >>> struct unixctl_server *unixctl; > >>> struct ovn_exit_args exit_args = {0}; > >>> struct br_int_remote br_int_remote = {0}; > >>> + static uint64_t next_cfg = 0; > >>> int retval; > >>> > >>> /* Read from system-id-override file once on startup. */ > >>> @@ -7444,6 +7591,7 @@ main(int argc, char *argv[]) > >>> > >>> /* Main loop. */ > >>> int ovnsb_txn_status = 1; > >>> + int ovs_txn_status = 1; > >>> bool sb_monitor_all = false; > >>> struct tracked_acl_ids *tracked_acl_ids = NULL; > >>> while (!exit_args.exiting) { > >>> @@ -7545,6 +7693,11 @@ main(int argc, char *argv[]) > >>> pinctrl_update_swconn(br_int_remote.target, > >>> br_int_remote.probe_interval); > >>> > >>> + if (cfg && ovs_idl_txn && ovs_txn_status == -1) { > >>> + /* txn was in progress and is now completed */ > >>> + next_cfg = cfg->next_cfg; > >>> + } > >>> + > >>> /* Enable ACL matching for double tagged traffic. 
*/ > >>> if (ovs_idl_txn && cfg) { > >>> int vlan_limit = smap_get_int( > >>> @@ -7894,6 +8047,13 @@ main(int argc, char *argv[]) > >>> stopwatch_start(OFCTRL_SEQNO_RUN_STOPWATCH_NAME, > >>> time_msec()); > >>> ofctrl_seqno_run(ofctrl_get_cur_cfg()); > >>> + if (ovs_idl_txn && bfd_chassis_data) { > >>> + manage_flow_restore_wait(ovs_idl_txn, cfg, > >>> + ofctrl_get_cur_cfg(), > >>> + next_cfg, > ovs_txn_status, > >>> + > bfd_chassis_data->is_ha_gw); > >>> + } > >>> + > >>> stopwatch_stop(OFCTRL_SEQNO_RUN_STOPWATCH_NAME, > >>> time_msec()); > >>> stopwatch_start(IF_STATUS_MGR_RUN_STOPWATCH_NAME, > >>> @@ -7993,7 +8153,7 @@ main(int argc, char *argv[]) > >>> OVS_NOT_REACHED(); > >>> } > >>> > >>> - int ovs_txn_status = > ovsdb_idl_loop_commit_and_wait(&ovs_idl_loop); > >>> + ovs_txn_status = > ovsdb_idl_loop_commit_and_wait(&ovs_idl_loop); > >>> if (!ovs_txn_status) { > >>> /* The transaction failed. */ > >>> vif_plug_clear_deleted( > >>> @@ -8012,6 +8172,9 @@ main(int argc, char *argv[]) > >>> &vif_plug_deleted_iface_ids); > >>> vif_plug_finish_changed( > >>> &vif_plug_changed_iface_ids); > >>> + if (cfg) { > >>> + next_cfg = cfg->next_cfg; > >>> + } > >>> } else if (ovs_txn_status == -1) { > >>> /* The commit is still in progress */ > >>> } else { > >>> @@ -8085,7 +8248,7 @@ loop_done: > >>> } > >>> > >>> ovsdb_idl_loop_commit_and_wait(&ovnsb_idl_loop); > >>> - int ovs_txn_status = > ovsdb_idl_loop_commit_and_wait(&ovs_idl_loop); > >>> + ovs_txn_status = > ovsdb_idl_loop_commit_and_wait(&ovs_idl_loop); > >>> if (!ovs_txn_status) { > >>> /* The transaction failed. 
*/ > >>> vif_plug_clear_deleted( > >>> diff --git a/tests/multinode-macros.at b/tests/multinode-macros.at > >>> index 4a74d5127..646ee2d79 100644 > >>> --- a/tests/multinode-macros.at > >>> +++ b/tests/multinode-macros.at > >>> @@ -41,6 +41,28 @@ m4_define([M_START_TCPDUMP], > >>> ] > >>> ) > >>> > >>> +m4_define([_M_START_TCPDUMPS_RECURSIVE], [ > > > > I think we normally put the _ at the end, i.e., > M_START_TCPDUMPS_RECURSIVE_ > > > >>> + m4_if(m4_eval($# > 3), [1], [dnl > >>> + names="$names $3" > >>> + echo "Running podman exec $1 tcpdump -l $2 >$3.tcpdump > 2>$3.stderr" > >>> + podman exec $1 tcpdump -l $2 >$3.tcpdump 2>$3.stderr & > >>> + echo "podman exec $1 ps -ef | grep -v grep | grep tcpdump && > podman exec $1 killall tcpdump" >> cleanup > >>> + _M_START_TCPDUMPS_RECURSIVE(m4_shift(m4_shift(m4_shift($@)))) > >>> + ]) > >>> + ] > >>> +) > >>> + > >>> +# Start Multiple tcpdump. Useful to speed up when many tcpdump > >>> +# must be started as waiting for "listening" takes usually 1 second. 
> >>> +m4_define([M_START_TCPDUMPS], > >>> + [ > >>> + names="" > >>> + _M_START_TCPDUMPS_RECURSIVE($@) > >>> + for name in $names; do > >>> + OVS_WAIT_UNTIL([grep -q "listening" ${name}.stderr]) > >>> + done > >>> + ] > >>> +) > >>> > >>> # M_FORMAT_CT([ip-addr]) > >>> # > >>> diff --git a/tests/multinode.at b/tests/multinode.at > >>> index 6b9614126..24d7ca27c 100644 > >>> --- a/tests/multinode.at > >>> +++ b/tests/multinode.at > >>> @@ -2986,42 +2986,42 @@ AT_CLEANUP > >>> > >>> AT_SETUP([HA: Check for missing garp on leader when BFD goes back up]) > >>> # Network topology > >>> -# > ┌────────────────────────────────────────────────────────────────────────────────────────────────────────┐ > >>> -# │ > │ > >>> -# │ ┌───────────────────┐ ┌───────────────────┐ > ┌───────────────────┐ ┌───────────────────┐ │ > >>> -# │ │ ovn-chassis-1 │ │ ovn-gw-1 │ │ > ovn-gw-2 │ │ ovn-chassis-2 │ │ > >>> -# │ └─────────┬─────────┘ └───────────────────┘ > └───────────────────┘ └───────────────────┘ │ > >>> -# │ ┌─────────┴─────────┐ > │ > >>> -# │ │ inside1 │ > │ > >>> -# │ │ 192.168.1.1/24 │ > │ > >>> -# │ └─────────┬─────────┘ > │ > >>> -# │ ┌─────────┴─────────┐ > │ > >>> -# │ │ inside │ > │ > >>> -# │ └─────────┬─────────┘ > │ > >>> -# │ ┌─────────┴─────────┐ > │ > >>> -# │ │ 192.168.1.254 │ > │ > >>> -# │ │ R1 │ > │ > >>> -# │ │ 192.168.0.254 │ > │ > >>> -# │ └─────────┬─────────┘ > │ > >>> -# │ > └------eth1---------------┬--------eth1-----------┐ > │ > >>> -# │ ┌──────────┴────────┐ > ┌─────────┴─────────┐ │ > >>> -# │ │ 192.168.1.254 │ │ > 192.168.1.254 │ │ > >>> -# │ │ R1 │ │ > R1 │ │ > >>> -# │ │ 192.168.0.254 │ │ > 192.168.0.254 │ │ > >>> -# │ └─────────┬─────────┘ > └─────────┬─────────┘ │ > >>> -# │ │ > │ ┌───────────────────┐ │ > >>> -# │ ┌─────────┴─────────┐ > ┌─────────┴─────────┐ │ 192.168.0.1 │ │ > >>> -# │ │ outside │ │ > outside │ │ ext1 │ │ > >>> -# │ └─────────┬─────────┘ > └─────────┬─────────┘ └─────────┬─────────┘ │ > >>> -# │ ┌─────────┴─────────┐ > 
┌─────────┴─────────┐ ┌─────────┴─────────┐ │ > >>> -# │ │ ln-outside │ │ > ln-outside │ │ ln-ext1 │ │ > >>> -# │ └─────────┬─────────┘ > └─────────┬─────────┘ └─────────┬─────────┘ │ > >>> -# │ ┌─────────┴─────────┐ > ┌─────────┴─────────┐ ┌─────────┴─────────┐ │ > >>> -# │ │ br-ex │ │ > br-ex │ │ br-ex │ │ > >>> -# │ └─────────┬─────────┘ > └─────────┬─────────┘ └─────────┬─────────┘ │ > >>> -# │ > └---------eth2-----------┴-------eth2-------------┘ │ > >>> -# │ > │ > >>> -# > └────────────────────────────────────────────────────────────────────────────────────────────────────────┘ > >>> +# > ┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ > >>> +# │ > │ > >>> +# │ ┌───────────────────┐ ┌───────────────────┐ > ┌───────────────────┐ ┌───────────────────┐ ┌───────────────────┐ │ > >>> +# │ │ ovn-chassis-1 │ │ ovn-chassis-2 │ │ > ovn-gw-1 │ │ ovn-gw-2 │ │ ovn-chassis-3 │ │ > >>> +# │ └─────────┬─────────┘ └─────────┬─────────┘ > └───────────────────┘ └───────────────────┘ └───────────────────┘ │ > >>> +# │ ┌─────────┴─────────┐ ┌─────────┴─────────┐ > │ > >>> +# │ │ inside1 │ │ inside2 │ > │ > >>> +# │ │ 192.168.1.1/24 │ │ 192.168.1.2/24 │ > │ > >>> +# │ └─────────┬─────────┘ └─────────┬─────────┘ > │ > >>> +# │ ┌─┴────────────────────────┴─┐ > │ > >>> +# │ │ inside │ > │ > >>> +# │ └──────────────┬─────────────┘ > │ > >>> +# │ ┌─────────┴─────────┐ > │ > >>> +# │ │ 192.168.1.254 │ > │ > >>> +# │ │ R1 │ > │ > >>> +# │ │ 192.168.0.254 │ > │ > >>> +# │ └─────────┬─────────┘ > │ > >>> +# │ > └------eth1---------------------------┬--------eth1-----------┐ > │ > >>> +# │ > ┌──────────┴────────┐ ┌─────────┴─────────┐ > │ > >>> +# │ │ > 192.168.1.254 │ │ 192.168.1.254 │ │ > >>> +# │ │ > R1 │ │ R1 │ │ > >>> +# │ │ > 192.168.0.254 │ │ 192.168.0.254 │ │ > >>> +# │ > └─────────┬─────────┘ └─────────┬─────────┘ > │ > >>> +# │ > │ │ ┌───────────────────┐ │ > >>> +# │ > ┌─────────┴─────────┐ 
┌─────────┴─────────┐ │ 192.168.0.1 │ > │ > >>> +# │ │ > outside │ │ outside │ │ ext1 │ │ > >>> +# │ > └─────────┬─────────┘ └─────────┬─────────┘ └─────────┬─────────┘ > │ > >>> +# │ > ┌─────────┴─────────┐ ┌─────────┴─────────┐ ┌─────────┴─────────┐ > │ > >>> +# │ │ > ln-outside │ │ ln-outside │ │ ln-ext1 │ │ > >>> +# │ > └─────────┬─────────┘ └─────────┬─────────┘ └─────────┬─────────┘ > │ > >>> +# │ > ┌─────────┴─────────┐ ┌─────────┴─────────┐ ┌─────────┴─────────┐ > │ > >>> +# │ │ > br-ex │ │ br-ex │ │ br-ex │ │ > >>> +# │ > └─────────┬─────────┘ └─────────┬─────────┘ └─────────┬─────────┘ > │ > >>> +# │ > └---------eth2-----------┴-------eth2-------------┘ │ > >>> +# │ > │ > >>> +# > └────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ > >>> > >>> # The goal of this test is the check that GARP are properly generated > by higest priority traffic when > >>> # BFD goes down, and back up, and this whether the BFD event is due > either to some bfd packet lost > >>> @@ -3030,6 +3030,12 @@ AT_SETUP([HA: Check for missing garp on leader > when BFD goes back up]) > >>> # So gw3 should in this test neither send garp or receive packets. > >>> # > >>> # Enable vconn so we can check the GARP from a log perspective. 
> >>> +on_exit "podman exec ovn-gw-1 ovn-appctl vlog/set info" > >>> +on_exit "podman exec ovn-gw-1 ovn-appctl vlog/enable-rate-limit" > >>> +on_exit "podman exec ovn-gw-2 ovn-appctl vlog/set info" > >>> +on_exit "podman exec ovn-gw-2 ovn-appctl vlog/enable-rate-limit" > >>> +on_exit "podman exec ovn-gw-3 ovn-appctl vlog/set info" > >>> +on_exit "podman exec ovn-gw-3 ovn-appctl vlog/enable-rate-limit" > >>> m_as ovn-gw-1 ovn-appctl vlog/set vconn:dbg > >>> m_as ovn-gw-2 ovn-appctl vlog/set vconn:dbg > >>> m_as ovn-gw-3 ovn-appctl vlog/set vconn:dbg > >>> @@ -3037,12 +3043,17 @@ m_as ovn-gw-1 ovn-appctl > vlog/disable-rate-limit > >>> m_as ovn-gw-2 ovn-appctl vlog/disable-rate-limit > >>> m_as ovn-gw-3 ovn-appctl vlog/disable-rate-limit > >>> > >>> +# Decrease revalidation time on ovs switch simulating ToR. > >>> +on_exit "OVS_RUNDIR= ovs-vsctl set Open_vSwitch . > other_config:max-revalidator=500" > > > > Isn't it better to just remove the key on_exit, just in case the default > > ever changes from 500 to something else? I.e.: > > > > on_exit "OVS_RUNDIR= ovs-vsctl remove Open_vSwitch . other_config > > max-revalidator" > > > >>> +OVS_RUNDIR= ovs-vsctl set Open_vSwitch . > other_config:max-revalidator=100 > >>> + > >>> check_fake_multinode_setup > >>> > >>> # Delete the multinode NB and OVS resources before starting the test. 
> >>> cleanup_multinode_resources > >>> > >>> ip_ch1=$(m_as ovn-chassis-1 ip a show dev eth1 | grep "inet " | awk > '{print $2}'| cut -d '/' -f1) > >>> +ip_ch2=$(m_as ovn-chassis-2 ip a show dev eth1 | grep "inet " | awk > '{print $2}'| cut -d '/' -f1) > >>> ip_gw1=$(m_as ovn-gw-1 ip a show dev eth1 | grep "inet " | awk > '{print $2}'| cut -d '/' -f1) > >>> ip_gw2=$(m_as ovn-gw-2 ip a show dev eth1 | grep "inet " | awk > '{print $2}'| cut -d '/' -f1) > >>> ip_gw3=$(m_as ovn-gw-3 ip a show dev eth1 | grep "inet " | awk > '{print $2}'| cut -d '/' -f1) > >>> @@ -3050,25 +3061,35 @@ ip_gw3=$(m_as ovn-gw-3 ip a show dev eth1 | > grep "inet " | awk '{print $2}'| cut > >>> from_gw1_to_gw2=$(m_as ovn-gw-1 ovs-vsctl --bare --columns=name find > interface options:remote_ip=$ip_gw2) > >>> from_gw1_to_gw3=$(m_as ovn-gw-1 ovs-vsctl --bare --columns=name find > interface options:remote_ip=$ip_gw3) > >>> from_gw1_to_ch1=$(m_as ovn-gw-1 ovs-vsctl --bare --columns=name find > interface options:remote_ip=$ip_ch1) > >>> +from_gw1_to_ch2=$(m_as ovn-gw-1 ovs-vsctl --bare --columns=name find > interface options:remote_ip=$ip_ch2) > >>> from_gw2_to_gw1=$(m_as ovn-gw-2 ovs-vsctl --bare --columns=name find > interface options:remote_ip=$ip_gw1) > >>> from_gw2_to_gw3=$(m_as ovn-gw-2 ovs-vsctl --bare --columns=name find > interface options:remote_ip=$ip_gw3) > >>> from_gw2_to_ch1=$(m_as ovn-gw-2 ovs-vsctl --bare --columns=name find > interface options:remote_ip=$ip_ch1) > >>> +from_gw2_to_ch2=$(m_as ovn-gw-2 ovs-vsctl --bare --columns=name find > interface options:remote_ip=$ip_ch2) > >>> from_ch1_to_gw1=$(m_as ovn-chassis-1 ovs-vsctl --bare --columns=name > find interface options:remote_ip=$ip_gw1) > >>> from_ch1_to_gw2=$(m_as ovn-chassis-1 ovs-vsctl --bare --columns=name > find interface options:remote_ip=$ip_gw2) > >>> +from_ch2_to_gw1=$(m_as ovn-chassis-2 ovs-vsctl --bare --columns=name > find interface options:remote_ip=$ip_gw1) > >>> +from_ch2_to_gw2=$(m_as ovn-chassis-2 ovs-vsctl 
--bare --columns=name > find interface options:remote_ip=$ip_gw2) > >>> > >>> m_as ovn-chassis-1 ip link del hv1-vif1-p > >>> -m_as ovn-chassis-2 ip link del ext1-p > >>> +m_as ovn-chassis-2 ip link del hv2-vif1-p > >>> +m_as ovn-chassis-3 ip link del ext1-p > >>> > >>> OVS_WAIT_UNTIL([m_as ovn-chassis-1 ip link show | grep -q genev_sys]) > >>> OVS_WAIT_UNTIL([m_as ovn-chassis-2 ip link show | grep -q genev_sys]) > >>> +OVS_WAIT_UNTIL([m_as ovn-chassis-3 ip link show | grep -q genev_sys]) > >>> OVS_WAIT_UNTIL([m_as ovn-gw-1 ip link show | grep -q genev_sys]) > >>> OVS_WAIT_UNTIL([m_as ovn-gw-2 ip link show | grep -q genev_sys]) > >>> OVS_WAIT_UNTIL([m_as ovn-gw-3 ip link show | grep -q genev_sys]) > >>> > >>> +# Use "aggressive" bfd parameters > >>> +check multinode_nbctl set NB_Global . options:"bfd-min-rx"=500 > >>> +check multinode_nbctl set NB_Global . options:"bfd-min-tx"=100 > >>> check multinode_nbctl ls-add inside > >>> check multinode_nbctl ls-add outside > >>> check multinode_nbctl ls-add ext > >>> check multinode_nbctl lsp-add inside inside1 -- lsp-set-addresses > inside1 "f0:00:c0:a8:01:01 192.168.1.1" > >>> +check multinode_nbctl lsp-add inside inside2 -- lsp-set-addresses > inside2 "f0:00:c0:a8:01:02 192.168.1.2" > >>> check multinode_nbctl lsp-add ext ext1 -- lsp-set-addresses ext1 > "00:00:c0:a8:00:01 192.168.0.1" > >>> > >>> multinode_nbctl create Logical_Router name=R1 > >>> @@ -3100,12 +3121,14 @@ m_as ovn-gw-3 ovs-vsctl remove open . > external_ids garp-max-timeout-sec > >>> > >>> m_as ovn-chassis-1 ovs-vsctl set open . > external-ids:ovn-bridge-mappings=public:br-ex > >>> m_as ovn-chassis-2 ovs-vsctl set open . > external-ids:ovn-bridge-mappings=public:br-ex > >>> +m_as ovn-chassis-3 ovs-vsctl set open . > external-ids:ovn-bridge-mappings=public:br-ex > >>> m_as ovn-gw-1 ovs-vsctl set open . > external-ids:ovn-bridge-mappings=public:br-ex > >>> m_as ovn-gw-2 ovs-vsctl set open . 
> external-ids:ovn-bridge-mappings=public:br-ex > >>> m_as ovn-gw-3 ovs-vsctl set open . > external-ids:ovn-bridge-mappings=public:br-ex > >>> > >>> m_as ovn-chassis-1 /data/create_fake_vm.sh inside1 hv1-vif1 > f0:00:c0:a8:01:01 1500 192.168.1.1 24 192.168.1.254 2000::1/64 2000::a > >>> -m_as ovn-chassis-2 /data/create_fake_vm.sh ext1 ext1 > 00:00:c0:a8:00:01 1500 192.168.0.1 24 192.168.0.254 1000::3/64 1000::a > >>> +m_as ovn-chassis-2 /data/create_fake_vm.sh inside2 hv2-vif1 > f0:00:c0:a8:01:02 1500 192.168.1.2 24 192.168.1.254 2000::2/64 2000::a > >>> +m_as ovn-chassis-3 /data/create_fake_vm.sh ext1 ext1 > 00:00:c0:a8:00:01 1500 192.168.0.1 24 192.168.0.254 1000::3/64 1000::a > >>> > >>> # There should be one ha_chassis_group with the name "R1_outside" > >>> m_check_row_count HA_Chassis_Group 1 name=R1_outside > >>> @@ -3160,53 +3183,67 @@ for chassis in $from_ch1_to_gw1 > $from_ch1_to_gw2; do > >>> wait_bfd_enabled ovn-chassis-1 $chassis > >>> done > >>> > >>> +# check BFD enablement on tunnel ports from ovn-chassis-2 ########### > >>> +for chassis in $from_ch2_to_gw1 $from_ch2_to_gw2; do > >>> + echo "checking ovn-chassis-2 -> $chassis" > >>> + wait_bfd_enabled ovn-chassis-2 $chassis > >>> +done > >>> + > >>> # Make sure there is no nft table left. Do not use nft directly as > might not be installed in container. 
> >>> gw1_pid=$(podman inspect -f '{{.State.Pid}}' ovn-gw-1) > >>> nsenter --net=/proc/$gw1_pid/ns/net nft list tables | grep ovn-test > && nsenter --net=/proc/$gw1_pid/ns/net nft delete table ip ovn-test > >>> -on_exit "nsenter --net=/proc/$gw1_pid/ns/net nft list tables | grep > ovn-test && nsenter --net=/proc/$gw1_pid/ns/net nft delete table ip > ovn-test" > >>> +on_exit "if [[ -d "/proc/$gw1_pid" ]]; then nsenter > --net=/proc/$gw1_pid/ns/net nft list tables | grep ovn-test && nsenter > --net=/proc/$gw1_pid/ns/net nft delete table ip ovn-test; fi" > >>> > >>> -for chassis in $from_gw1_to_gw2 $from_gw1_to_gw3 $from_gw1_to_ch1; do > >>> +for chassis in $from_gw1_to_gw2 $from_gw1_to_gw3 $from_gw1_to_ch1 > $from_gw1_to_ch2; do > >>> wait_bfd_up ovn-gw-1 $chassis > >>> done > >>> -for chassis in $from_gw2_to_gw1 $from_gw2_to_gw3 $from_gw2_to_ch1; do > >>> +for chassis in $from_gw2_to_gw1 $from_gw2_to_gw3 $from_gw2_to_ch1 > $from_gw2_to_ch2; do > >>> wait_bfd_up ovn-gw-2 $chassis > >>> done > >>> for chassis in $from_ch1_to_gw1 $from_ch1_to_gw2; do > >>> wait_bfd_up ovn-chassis-1 $chassis > >>> done > >>> +for chassis in $from_ch2_to_gw1 $from_ch2_to_gw2; do > >>> + wait_bfd_up ovn-chassis-2 $chassis > >>> +done > >>> > >>> m_wait_row_count Port_Binding 1 logical_port=cr-R1_outside > chassis=$gw1_chassis > >>> check multinode_nbctl --wait=hv sync > >>> > >>> start_tcpdump() { > >>> echo "$(date +%H:%M:%S.%03N) Starting tcpdump" > >>> - M_START_TCPDUMP([ovn-chassis-1], [-neei hv1-vif1-p], [ch1]) > >>> - M_START_TCPDUMP([ovn-chassis-2], [-neei eth2], [ch2]) > >>> - M_START_TCPDUMP([ovn-gw-1], [-neei eth2], [gw1]) > >>> - M_START_TCPDUMP([ovn-gw-1], [-neei eth2 -Q out], [gw1_out]) > >>> - M_START_TCPDUMP([ovn-gw-2], [-neei eth2], [gw2]) > >>> - M_START_TCPDUMP([ovn-gw-2], [-neei eth2 -Q out], [gw2_out]) > >>> - M_START_TCPDUMP([ovn-gw-3], [-neei eth2], [gw3]) > >>> - M_START_TCPDUMP([ovn-gw-3], [-neei eth2 -Q out], [gw3_out]) > >>> + M_START_TCPDUMPS([ovn-chassis-1], 
[-neei hv1-vif1-p], [ch1], > >>> + [ovn-chassis-2], [-neei hv2-vif1-p], [ch2], > >>> + [ovn-chassis-3], [-neei eth2], [ch3], > >>> + [ovn-gw-1], [-neei eth2], [gw1], > >>> + [ovn-gw-1], [-neei eth2 -Q out], [gw1_out], > >>> + [ovn-gw-2], [-neei eth2], [gw2], > >>> + [ovn-gw-2], [-neei eth2 -Q out], [gw2_out], > >>> + [ovn-gw-3], [-neei eth2], [gw3], > >>> + [ovn-gw-3], [-neei eth2 -Q out], [gw3_out], > >>> + [ovn-gw-1], [-neei eth1], [gw1_eth1], > >>> + [ovn-gw-2], [-neei eth1], [gw2_eth1], > >>> + [ovn-chassis-1], [-neei eth1], [ch1_eth1], > >>> + [ovn-chassis-2], [-neei eth1], [ch2_eth1]) > >>> } > >>> > >>> stop_tcpdump() { > >>> echo "$(date +%H:%M:%S.%03N) Stopping tcpdump" > >>> - m_kill 'ovn-gw-1 ovn-gw-2 ovn-gw-3 ovn-chassis-1 ovn-chassis-2' > tcpdump > >>> + m_kill 'ovn-gw-1 ovn-gw-2 ovn-gw-3 ovn-chassis-1 ovn-chassis-2 > ovn-chassis-3' tcpdump > >>> } > >>> > >>> -# Send packets from chassis2 (ext1) to chassis1 > >>> +# Send packets from ovn-chassis-3 (ext1) to ovn-chassis-1 > >>> send_background_packets() { > >>> echo "$(date +%H:%M:%S.%03N) Sending packets in Background" > >>> start_tcpdump > >>> - M_NS_DAEMONIZE([ovn-chassis-2], [ext1], [ping -f -i 0.1 > 192.168.1.1], [ping.pid]) > >>> + M_NS_DAEMONIZE([ovn-chassis-3], [ext1], [ping -f -i 0.1 > 192.168.1.1], [ping.pid]) > >>> } > >>> > >>> stop_sending_background_packets() { > >>> echo "$(date +%H:%M:%S.%03N) Stopping Background process" > >>> m_as ovn-chassis-1 ps -ef | grep -v grep | grep -q ping && \ > >>> m_as ovn-chassis-1 echo "Stopping ping on ovn-chassis-1" && > killall ping > >>> - m_as ovn-chassis-2 ps -ef | grep -v grep | grep -q ping && \ > >>> + m_as ovn-chassis-3 ps -ef | grep -v grep | grep -q ping && \ > >>> m_as ovn-chassis-2 echo "Stopping ping on ovn-chassis-2" && > killall ping > >>> stop_tcpdump > >>> } > >>> @@ -3216,8 +3253,8 @@ check_for_new_garps() { > >>> expecting_garp=$2 > >>> n_new_garps=$(cat ${hv}_out.tcpdump | grep -c "f0:00:c0:a8:00:fe > > Broadcast, ethertype ARP 
(0x0806), length 42: Request who-has > 192.168.0.254 tell 192.168.0.254, length 28") > >>> > >>> - if [ "$expecting_garp" == "true" ]; then > >>> - AS_BOX([$(date +%H:%M:%S.%03N) Waiting/checking for garp from > $hv - Starting with $n_new_garps]) > >>> + if [[ "$expecting_garp" == "true" ]]; then > >>> + echo "$(date +%H:%M:%S.%03N) Waiting/checking for garp from > $hv - Starting with $n_new_garps" > >>> OVS_WAIT_UNTIL([ > >>> n_garps=$n_new_garps > >>> n_new_garps=$(cat ${hv}_out.tcpdump | grep -c > "f0:00:c0:a8:00:fe > Broadcast, ethertype ARP (0x0806), length 42: Request > who-has 192.168.0.254 tell 192.168.0.254, length 28") > >>> @@ -3225,7 +3262,7 @@ check_for_new_garps() { > >>> test "$n_garps" -ne "$n_new_garps" > >>> ]) > >>> else > >>> - AS_BOX([$(date +%H:%M:%S.%03N) Checking no garp from ${hv}]) > >>> + echo "$(date +%H:%M:%S.%03N) Checking no garp from ${hv}" > >>> # Waiting a few seconds to get a chance to see unexpected > garps. > >>> sleep 3 > >>> n_garps=$(cat ${hv}_out.tcpdump | grep -c "f0:00:c0:a8:00:fe > > Broadcast, ethertype ARP (0x0806), length 42: Request who-has > 192.168.0.254 tell 192.168.0.254, length 28") > >>> @@ -3241,8 +3278,8 @@ check_for_new_echo_pkts() { > >>> n_new_echo_req=$(cat ${hv}.tcpdump | grep -c "$mac_src > > $mac_dst, ethertype IPv4 (0x0800), length 98: 192.168.0.1 > 192.168.1.1: > ICMP echo request") > >>> n_new_echo_rep=$(cat ${hv}.tcpdump | grep -c "$mac_dst > > $mac_src, ethertype IPv4 (0x0800), length 98: 192.168.1.1 > 192.168.0.1: > ICMP echo reply") > >>> > >>> - if [ "$expecting_pkts" == "true" ]; then > >>> - AS_BOX([$(date +%H:%M:%S.%03N) Waiting/checking for echo pkts > through ${hv}]) > >>> + if [[ "$expecting_pkts" == "true" ]]; then > >>> + echo "$(date +%H:%M:%S.%03N) Waiting/checking for echo pkts > through ${hv}" > >>> echo "Starting with $n_new_echo_req requests and > $n_new_echo_rep replies so far on ${hv}." 
> >>> OVS_WAIT_UNTIL([ > >>> n_echo_req=$n_new_echo_req > >>> @@ -3253,7 +3290,7 @@ check_for_new_echo_pkts() { > >>> test "$n_echo_req" -ne "$n_new_echo_req" && test > "$n_echo_rep" -ne "$n_new_echo_rep" > >>> ]) > >>> else > >>> - AS_BOX([$(date +%H:%M:%S.%03N) Checking no pkts from ${hv}]) > >>> + echo "$(date +%H:%M:%S.%03N) Checking no pkts from ${hv}" > >>> # Waiting a few seconds to get a chance to see unexpected > pkts. > >>> sleep 3 > >>> n_echo_req=$(cat ${hv}.tcpdump | grep -c "$mac_src > > $mac_dst, ethertype IPv4 (0x0800), length 98: 192.168.0.1 > 192.168.1.1: > ICMP echo request") > >>> @@ -3271,22 +3308,44 @@ dump_statistics() { > >>> ch1_rep=$(grep -c "ICMP echo reply" ch1.tcpdump) > >>> ch2_req=$(grep -c "ICMP echo request" ch2.tcpdump) > >>> ch2_rep=$(grep -c "ICMP echo reply" ch2.tcpdump) > >>> + ch3_req=$(grep -c "ICMP echo request" ch3.tcpdump) > >>> + ch3_rep=$(grep -c "ICMP echo reply" ch3.tcpdump) > >>> gw1_req=$(grep -c "ICMP echo request" gw1.tcpdump) > >>> gw1_rep=$(grep -c "ICMP echo reply" gw1.tcpdump) > >>> gw2_req=$(grep -c "ICMP echo request" gw2.tcpdump) > >>> gw2_rep=$(grep -c "ICMP echo reply" gw2.tcpdump) > >>> gw3_req=$(grep -c "ICMP echo request" gw3.tcpdump) > >>> gw3_rep=$(grep -c "ICMP echo reply" gw3.tcpdump) > >>> - echo "$n1 claims in gw1, $n2 in gw2 and $n3 on gw3" > >>> - echo "ch2_request=$ch2_req gw1_request=$gw1_req > gw2_request=$gw2_req gw3_request=$gw3_req ch1_request=$ch1_req > ch1_reply=$ch1_rep gw1_reply=$gw1_rep gw2_reply=$gw2_rep gw3_reply=$gw3_rep > ch2_reply=$ch2_rep" > >>> + echo "$n1 claims in gw1, $n2 in gw2 and $n3 on gw3" >&2 > >>> + echo "ch3_req=$ch3_req gw_req=($gw1_req + $gw2_req +$gw3_req) > ch1_req=$ch1_req ch1_rep=$ch1_rep gw_rep=($gw1_rep + $gw2_rep + $gw3_rep) > ch3_rep=$ch3_rep ch2=($ch2_req+$ch2_rep)" >&2 > >>> + echo "$((ch3_req - ch3_rep))" > >>> } > >>> > >>> -check_migration_between_gw1_and_gw2() { > >>> - action=$1 > >>> - send_background_packets > >>> +add_port() { > >>> + bridge=$1 > 
>>> + interface=$2 > >>> + address=$3 > >>> + echo "Adding $bridge $interface $address" > >>> + > >>> + pid=$(podman inspect -f '{{.State.Pid}}' ovn-gw-1) > >>> + ln -sf /proc/$pid/ns/net /var/run/netns/$pid > >>> + port=$(OVS_RUNDIR= ovs-vsctl --data=bare --no-heading > --columns=name find interface \ > >>> + external_ids:container_id=ovn-gw-1 > external_ids:container_iface="$interface") > >>> + port="${port:0:13}" > >>> + ip link add "${port}_l" type veth peer name "${port}_c" > >>> + ip link set "${port}_l" up > >>> + ip link set "${port}_c" netns $pid > >>> + ip netns exec $pid ip link set dev "${port}_c" name "$interface" > >>> + ip netns exec $pid ip link set "$interface" up > >>> + if [[ -n "$address" ]]; then > >>> + ip netns exec $pid ip addr add "$address" dev "$interface" > >>> + fi > > > > I might be wrong but I think nobody cleans up any of these ports created > > in the ovn-gw-1 container. Do we need some on_exit() calls here? > > > >>> +} > >>> > >>> +prepare() { > >>> + send_background_packets > >>> # We make sure gw1 is leader since enough time that it generated > all its garps. > >>> - AS_BOX([$(date +%H:%M:%S.%03N) Waiting all garps sent by gw1]) > >>> + echo $(date +%H:%M:%S.%03N) Waiting all garps sent by gw1 > >>> n_new_garps=$(cat gw1_out.tcpdump | grep -c "f0:00:c0:a8:00:fe > > Broadcast, ethertype ARP (0x0806), length 42: Request who-has 192.168.0.254 > tell 192.168.0.254, length 28") > >>> OVS_WAIT_UNTIL([ > >>> n_garps=$n_new_garps > >>> @@ -3302,130 +3361,267 @@ check_migration_between_gw1_and_gw2() { > >>> check_for_new_echo_pkts gw2 "00:00:c0:a8:00:01" > "f0:00:c0:a8:00:fe" "false" > >>> check_for_new_echo_pkts gw3 "00:00:c0:a8:00:01" > "f0:00:c0:a8:00:fe" "false" > >>> > >>> + # All packets should go through gw1, and none through gw2 or gw3. 
> >>> + check_packets "true" "false" "false" "true" > >>> flap_count_gw_1=$(m_as ovn-gw-1 ovs-vsctl get interface > $from_gw1_to_gw2 bfd_status | sed 's/.*flap_count=\"\([[0-9]]*\).*/\1/g') > >>> flap_count_gw_2=$(m_as ovn-gw-2 ovs-vsctl get interface > $from_gw2_to_gw1 bfd_status | sed 's/.*flap_count=\"\([[0-9]]*\).*/\1/g') > >>> +} > >>> > >>> - if [ test "$action" == "stop_bfd" ]; then > >>> - AS_BOX([$(date +%H:%M:%S.%03N) Blocking bfd on gw1 (from > $ip_gw1 to $ip_gw2)]) > >>> - nsenter --net=/proc/$gw1_pid/ns/net nft add table ip ovn-test > >>> - nsenter --net=/proc/$gw1_pid/ns/net nft 'add chain ip > ovn-test INPUT { type filter hook input priority 0; policy accept; }' > >>> - # Drop BFD from gw-1 to gw-2: geneve port (6081), inner port > 3784 (0xec8), Session state Up, Init, Down. > >>> - nsenter --net=/proc/$gw1_pid/ns/net nft add rule ip ovn-test > INPUT ip daddr $ip_gw1 ip saddr $ip_gw2 udp dport 6081 '@th,416,16 == > 0x0ec8 @th,472,8 == 0xc0 counter drop' > >>> - nsenter --net=/proc/$gw1_pid/ns/net nft add rule ip ovn-test > INPUT ip daddr $ip_gw1 ip saddr $ip_gw2 udp dport 6081 '@th,416,16 == > 0x0ec8 @th,472,8 == 0x80 counter drop' > >>> - nsenter --net=/proc/$gw1_pid/ns/net nft add rule ip ovn-test > INPUT ip daddr $ip_gw1 ip saddr $ip_gw2 udp dport 6081 '@th,416,16 == > 0x0ec8 @th,472,8 == 0x40 counter drop' > >>> - > >>> - # We do not check that packets go through gw2 as BFD between > chassis-2 and gw1 is still up > >>> - fi > >>> - > >>> - if [ test "$action" == "kill_gw2" ]; then > >>> - AS_BOX([$(date +%H:%M:%S.%03N) Killing gw2 ovn-controller]) > >>> - on_exit 'm_as ovn-gw-2 /usr/share/openvswitch/scripts/ovs-ctl > status || > >>> - m_as ovn-gw-2 /usr/share/openvswitch/scripts/ovs-ctl > start --system-id=ovn-gw-2' > >>> - on_exit 'm_as ovn-gw-2 /usr/share/ovn/scripts/ovn-ctl > status_controller || > >>> - m_as ovn-gw-2 /usr/share/ovn/scripts/ovn-ctl > start_controller ${CONTROLLER_SSL_ARGS}' > >>> - > >>> - m_as ovn-gw-2 kill -9 $(m_as 
ovn-gw-2 cat > /run/ovn/ovn-controller.pid) > >>> - m_as ovn-gw-2 kill -9 $(m_as ovn-gw-2 cat > /run/openvswitch/ovs-vswitchd.pid) > >>> - m_as ovn-gw-2 kill -9 $(m_as ovn-gw-2 cat > /run/openvswitch/ovsdb-server.pid) > >>> - # Also delete datapath (flows) > >>> - m_as ovn-gw-2 ovs-dpctl del-dp system@ovs-system > >>> - fi > >>> - > >>> - if [ test "$action" == "kill_gw1" ]; then > >>> - AS_BOX([$(date +%H:%M:%S.%03N) Killing gw1 ovn-controller]) > >>> - on_exit 'm_as ovn-gw-1 /usr/share/openvswitch/scripts/ovs-ctl > status || > >>> - m_as ovn-gw-1 /usr/share/openvswitch/scripts/ovs-ctl > start --system-id=ovn-gw-1' > >>> - on_exit 'm_as ovn-gw-1 /usr/share/ovn/scripts/ovn-ctl > status_controller || > >>> - m_as ovn-gw-1 /usr/share/ovn/scripts/ovn-ctl > start_controller ${CONTROLLER_SSL_ARGS}' > >>> - > >>> - m_as ovn-gw-1 kill -9 $(m_as ovn-gw-1 cat > /run/ovn/ovn-controller.pid) > >>> - m_as ovn-gw-1 kill -9 $(m_as ovn-gw-1 cat > /run/openvswitch/ovs-vswitchd.pid) > >>> - m_as ovn-gw-1 kill -9 $(m_as ovn-gw-1 cat > /run/openvswitch/ovsdb-server.pid) > >>> - # Also delete datapath (flows) > >>> - m_as ovn-gw-1 ovs-dpctl del-dp system@ovs-system > >>> - fi > >>> +check_loss_after_flap() > >>> +{ > >>> + dead=$1 > >>> + max_expected_loss=$2 > >>> > >>> - if [ test "$action" == "kill_gw2" ]; then > >>> - AS_BOX([$(date +%H:%M:%S.%03N) Waiting for flap count between > gw1 and gw2 to increase]) > >>> + if [[ "$dead" == "gw2" ]]; then > >>> + echo "$(date +%H:%M:%S.%03N) Waiting for flap count between > gw1 and gw2 to increase" > >>> OVS_WAIT_UNTIL([ > >>> new_flap_count=$(m_as ovn-gw-1 ovs-vsctl get interfac > $from_gw1_to_gw2 bfd_status | sed 's/.*flap_count=\"\([[0-9]]*\).*/\1/g') > >>> echo "Comparing $new_flap_count versus $flap_count_gw_1" > >>> test "$new_flap_count" -gt "$((flap_count_gw_1))" > >>> ]) > >>> else > >>> - AS_BOX([$(date +%H:%M:%S.%03N) Waiting for flap count between > gw2 and gw1 to increase]) > >>> + echo "$(date +%H:%M:%S.%03N) Waiting for flap 
count between > gw2 and gw1 to increase" > >>> OVS_WAIT_UNTIL([ > >>> new_flap_count=$(m_as ovn-gw-2 ovs-vsctl get interfac > $from_gw2_to_gw1 bfd_status | sed 's/.*flap_count=\"\([[0-9]]*\).*/\1/g') > >>> echo "Comparing $new_flap_count versus $flap_count_gw_2" > >>> test "$new_flap_count" -gt "$((flap_count_gw_2))" > >>> ]) > >>> - > >>> fi > >>> - AS_BOX([$(date +%H:%M:%S.%03N) Flapped!]) > >>> > >>> + echo "$(date +%H:%M:%S.%03N) Flapped!" > >>> # Wait a few more second for the fight. > >>> + sleep 4 > >>> + > >>> + echo "$(date +%H:%M:%S.%03N) Statistics after flapping" > >>> + lost=$(dump_statistics) > >>> + echo "===> $lost packets lost while handling migration" > >>> + AT_CHECK([test "$lost" -le "$max_expected_loss"]) > >>> +} > >>> + > >>> +final_check() > >>> +{ > >>> + action=$1 > >>> + lost=$2 > >>> + max_expected_loss_after_restoration=$3 > >>> + > >>> + # Wait a little more to get packets while network is restored > >>> sleep 2 > >>> - AS_BOX([$(date +%H:%M:%S.%03N) Statistics after flapping]) > >>> - dump_statistics > >>> - > >>> - if [ test "$action" == "stop_bfd" ]; then > >>> - # gw1 still alive and gw2 tried to claim => gw1 should > restart generating garps. 
> >>> - check_for_new_garps gw1 "true" > >>> - check_for_new_garps gw2 "false" > >>> - check_for_new_garps gw3 "false" > >>> - check_for_new_echo_pkts gw1 00:00:c0:a8:00:01 > f0:00:c0:a8:00:fe "true" > >>> - check_for_new_echo_pkts gw2 00:00:c0:a8:00:01 > f0:00:c0:a8:00:fe "false" > >>> - check_for_new_echo_pkts gw3 00:00:c0:a8:00:01 > f0:00:c0:a8:00:fe "false" > >>> - check_for_new_echo_pkts ch1 f0:00:c0:a8:01:fe > f0:00:c0:a8:01:01 "true" > >>> - AS_BOX([$(date +%H:%M:%S.%03N) Unblocking bfd on gw1]) > >>> - nsenter --net=/proc/$gw1_pid/ns/net nft -a list ruleset > >>> - nsenter --net=/proc/$gw1_pid/ns/net nft delete table ip > ovn-test > >>> - fi > >>> + echo "$(date +%H:%M:%S.%03N) Statistics after network restored > (after $action)" > >>> + new_lost=$(dump_statistics) > >>> + echo "===> $((new_lost - lost)) packets lost during network > restoration" > >>> + AT_CHECK([test "$((new_lost - lost))" -le > "$max_expected_loss_after_restoration"]) > >>> + stop_sending_background_packets > >>> +} > >>> > >>> - if [ test "$action" == "kill_gw2" ]; then > >>> - # gw1 still alive, but gw2 did not try to claim => gw1 should > not generate new garps. 
> >>> - check_for_new_garps gw1 "false" > >>> - check_for_new_garps gw2 "false" > >>> - check_for_new_garps gw3 "false" > >>> - check_for_new_echo_pkts gw1 00:00:c0:a8:00:01 > f0:00:c0:a8:00:fe "true" > >>> - check_for_new_echo_pkts gw2 00:00:c0:a8:00:01 > f0:00:c0:a8:00:fe "false" > >>> - check_for_new_echo_pkts gw3 00:00:c0:a8:00:01 > f0:00:c0:a8:00:fe "false" > >>> - check_for_new_echo_pkts ch1 f0:00:c0:a8:01:fe > f0:00:c0:a8:01:01 "true" > >>> - AS_BOX([$(date +%H:%M:%S.%03N) Restarting gw2 ovn-vswitchd]) > >>> - m_as ovn-gw-2 /usr/share/openvswitch/scripts/ovs-ctl start > --system-id=ovn-gw-2 > >>> - > >>> - AS_BOX([$(date +%H:%M:%S.%03N) Restarting gw2 ovn-controller]) > >>> - m_as ovn-gw-2 /usr/share/ovn/scripts/ovn-ctl start_controller > ${CONTROLLER_SSL_ARGS} > >>> - fi > >>> +check_garps() > >>> +{ > >>> + check_for_new_garps gw1 "$1" > >>> + check_for_new_garps gw2 "$2" > >>> + check_for_new_garps gw3 "$3" > >>> +} > >>> > >>> - if [ test "$action" == "kill_gw1" ]; then > >>> - # gw1 died => gw2 should generate garps. 
> >>> - check_for_new_garps gw1 "false" > >>> - check_for_new_garps gw2 "true" > >>> - check_for_new_garps gw3 "false" > >>> - check_for_new_echo_pkts gw1 00:00:c0:a8:00:01 > f0:00:c0:a8:00:fe "false" > >>> - check_for_new_echo_pkts gw2 00:00:c0:a8:00:01 > f0:00:c0:a8:00:fe "true" > >>> - check_for_new_echo_pkts gw3 00:00:c0:a8:00:01 > f0:00:c0:a8:00:fe "false" > >>> - check_for_new_echo_pkts ch1 f0:00:c0:a8:01:fe > f0:00:c0:a8:01:01 "true" > >>> - AS_BOX([$(date +%H:%M:%S.%03N) Restarting gw1 ovn-vswitchd]) > >>> - m_as ovn-gw-1 /usr/share/openvswitch/scripts/ovs-ctl start > --system-id=ovn-gw-1 > >>> - > >>> - AS_BOX([$(date +%H:%M:%S.%03N) Restarting gw1 ovn-controller]) > >>> - m_as ovn-gw-1 /usr/share/ovn/scripts/ovn-ctl start_controller > ${CONTROLLER_SSL_ARGS} > >>> - fi > >>> +check_packets() > >>> +{ > >>> + check_for_new_echo_pkts gw1 00:00:c0:a8:00:01 f0:00:c0:a8:00:fe > "$1" > >>> + check_for_new_echo_pkts gw2 00:00:c0:a8:00:01 f0:00:c0:a8:00:fe > "$2" > >>> + check_for_new_echo_pkts gw3 00:00:c0:a8:00:01 f0:00:c0:a8:00:fe > "$3" > >>> + check_for_new_echo_pkts ch1 f0:00:c0:a8:01:fe f0:00:c0:a8:01:01 > "$4" > >>> +} > >>> + > >>> +check_migration_between_gw1_and_gw2_bfd_stop() > >>> +{ > >>> + AS_BOX([$(date +%H:%M:%S.%03N) Testing migration after bfd_stop]) > >>> + max_expected_max_expected_loss1=$1 > >>> + max_expected_max_expected_loss2=$2 > >>> + prepare > >>> + > >>> + echo "$(date +%H:%M:%S.%03N) Blocking bfd on gw1 (from $ip_gw1 to > $ip_gw2)" > >>> + nsenter --net=/proc/$gw1_pid/ns/net nft add table ip ovn-test > >>> + nsenter --net=/proc/$gw1_pid/ns/net nft 'add chain ip ovn-test > INPUT { type filter hook input priority 0; policy accept; }' > >>> + # Drop BFD from gw-1 to gw-2: geneve port (6081), inner port 3784 > (0xec8), Session state Up, Init, Down. 
> >>> + nsenter --net=/proc/$gw1_pid/ns/net nft add rule ip ovn-test > INPUT ip daddr $ip_gw1 ip saddr $ip_gw2 udp dport 6081 '@th,416,16 == > 0x0ec8 @th,472,8 == 0xc0 counter drop' > >>> + nsenter --net=/proc/$gw1_pid/ns/net nft add rule ip ovn-test > INPUT ip daddr $ip_gw1 ip saddr $ip_gw2 udp dport 6081 '@th,416,16 == > 0x0ec8 @th,472,8 == 0x80 counter drop' > >>> + nsenter --net=/proc/$gw1_pid/ns/net nft add rule ip ovn-test > INPUT ip daddr $ip_gw1 ip saddr $ip_gw2 udp dport 6081 '@th,416,16 == > 0x0ec8 @th,472,8 == 0x40 counter drop' > >>> + > >>> + check_loss_after_flap "gw1" $max_expected_max_expected_loss1 > >>> + > >>> + # gw1 still alive and gw2 tried to claim => gw1 should restart > generating garps. > >>> + check_garps "true" "false" "false" > >>> + check_packets "true" "false" "false" "true" > >>> + > >>> + echo "$(date +%H:%M:%S.%03N) Unblocking bfd on gw1" > >>> + nsenter --net=/proc/$gw1_pid/ns/net nft -a list ruleset > >>> + nsenter --net=/proc/$gw1_pid/ns/net nft delete table ip ovn-test > >>> > >>> # The network is now restored => packets should go through gw1 > and reach chassis-1. 
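[Editor's aside for readers parsing the raw-payload matches in the nft rules quoted above: the `@th` bit offsets follow from the geneve encapsulation layout. A hedged sketch of the arithmetic, assuming a base geneve header with no options and an inner IPv4 header with no options (all names below are invented for the illustration):]

```python
# Derive the '@th' bit offsets used by the nft rules to match BFD
# control packets carried inside geneve. In nftables, '@th' counts
# from the start of the outer transport (UDP) header.
OUTER_UDP = 8    # outer UDP header
GENEVE = 8       # base geneve header (assumes no variable-length options)
INNER_ETH = 14   # inner ethernet header
INNER_IP = 20    # inner IPv4 header (assumes no IP options)
INNER_UDP = 8    # inner UDP header

# Inner UDP destination port sits 2 bytes into the inner UDP header.
inner_dport_bits = (OUTER_UDP + GENEVE + INNER_ETH + INNER_IP + 2) * 8
# BFD state/flags live in the second byte of the BFD control packet
# (RFC 5880: top two bits of that byte are the session state).
bfd_state_bits = (OUTER_UDP + GENEVE + INNER_ETH + INNER_IP + INNER_UDP + 1) * 8

print(inner_dport_bits)  # 416 -> '@th,416,16 == 0x0ec8' (BFD port 3784)
print(bfd_state_bits)    # 472 -> '@th,472,8' (Up=0xc0, Init=0x80, Down=0x40)
```

This is why the rules can drop only Up/Init/Down BFD state transitions from gw2 without disturbing other geneve traffic on port 6081.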
> >>> - check_for_new_echo_pkts gw1 00:00:c0:a8:00:01 f0:00:c0:a8:00:fe > "true" > >>> - check_for_new_echo_pkts gw2 00:00:c0:a8:00:01 f0:00:c0:a8:00:fe > "false" > >>> - check_for_new_echo_pkts gw3 00:00:c0:a8:00:01 f0:00:c0:a8:00:fe > "false" > >>> - check_for_new_echo_pkts ch1 f0:00:c0:a8:01:fe f0:00:c0:a8:01:01 > "true" > >>> - AS_BOX([$(date +%H:%M:%S.%03N) Statistics after network restored]) > >>> - dump_statistics > >>> - stop_sending_background_packets > >>> + check_packets "true" "false" "false" "true" > >>> + final_check "bfd_stop" $lost $max_expected_max_expected_loss2 > >>> +} > >>> + > >>> +check_migration_between_gw1_and_gw2_kill_gw2() { > >>> + AS_BOX([$(date +%H:%M:%S.%03N) Check migration after killing gw2 > ovn-controller & vswitchd]) > >>> + max_expected_loss1=$1 > >>> + max_expected_loss2=$2 > >>> + prepare > >>> + > >>> + on_exit 'm_as ovn-gw-2 /usr/share/openvswitch/scripts/ovs-ctl > status || > >>> + m_as ovn-gw-2 /usr/share/openvswitch/scripts/ovs-ctl > start --system-id=ovn-gw-2' > >>> + on_exit 'm_as ovn-gw-2 /usr/share/ovn/scripts/ovn-ctl > status_controller || > >>> + m_as ovn-gw-2 /usr/share/ovn/scripts/ovn-ctl > start_controller ${CONTROLLER_SSL_ARGS}' > >>> + > >>> + m_as ovn-gw-2 kill -9 $(m_as ovn-gw-2 cat > /run/ovn/ovn-controller.pid) > >>> + m_as ovn-gw-2 kill -9 $(m_as ovn-gw-2 cat > /run/openvswitch/ovs-vswitchd.pid) > >>> + m_as ovn-gw-2 kill -9 $(m_as ovn-gw-2 cat > /run/openvswitch/ovsdb-server.pid) > >>> + m_as ovn-gw-2 ovs-dpctl del-dp system@ovs-system > >>> + > >>> + check_loss_after_flap "gw2" $max_expected_loss1 > >>> + > >>> + # gw1 still alive, but gw2 did not try to claim => gw1 should not > generate new garps. 
> >>> + check_garps "false" "false" "false" > >>> + check_packets "true" "false" "false" "true" > >>> + > >>> + echo "$(date +%H:%M:%S.%03N) Restarting gw2 ovs-vswitchd" > >>> + m_as ovn-gw-2 /usr/share/openvswitch/scripts/ovs-ctl start > --system-id=ovn-gw-2 > >>> + > >>> + echo "$(date +%H:%M:%S.%03N) Restarting gw2 ovn-controller" > >>> + m_as ovn-gw-2 /usr/share/ovn/scripts/ovn-ctl start_controller > ${CONTROLLER_SSL_ARGS} > >>> + > >>> + # The network is now restored => packets should go through gw1 > and reach chassis-1. > >>> + check_packets "true" "false" "false" "true" > >>> + final_check "kill_gw2" $lost $max_expected_loss2 > >>> +} > >>> + > >>> +check_migration_between_gw1_and_gw2_update_ovs() { > >>> + AS_BOX([$(date +%H:%M:%S.%03N) Check migration after restarting > gw1 ovs-vswitchd ("update")]) > >>> + max_expected_loss1=$1 > >>> + max_expected_loss2=$2 > >>> + prepare > >>> + > >>> + m_as ovn-gw-1 /usr/share/openvswitch/scripts/ovs-ctl restart > --system-id=ovn-gw-1 > >>> + > >>> + check_loss_after_flap "gw1" $max_expected_loss1 > >>> + > >>> + # The network is now restored => packets should go through gw1 > and reach chassis-1. 
> >>> + check_packets "true" "false" "false" "true" > >>> + final_check "ovs_update" $lost $max_expected_loss2 > >>> +} > >>> + > >>> +check_migration_between_gw1_and_gw2_kill_gw1() { > >>> + AS_BOX([$(date +%H:%M:%S.%03N) Killing gw1 ovn-controller and > ovs-vswitchd]) > >>> + max_expected_loss1=$1 > >>> + max_expected_loss2=$2 > >>> + prepare > >>> + > >>> + on_exit 'm_as ovn-gw-1 /usr/share/openvswitch/scripts/ovs-ctl > status || > >>> + m_as ovn-gw-1 /usr/share/openvswitch/scripts/ovs-ctl > start --system-id=ovn-gw-1' > >>> + on_exit 'm_as ovn-gw-1 /usr/share/ovn/scripts/ovn-ctl > status_controller || > >>> + m_as ovn-gw-1 /usr/share/ovn/scripts/ovn-ctl > start_controller ${CONTROLLER_SSL_ARGS}' > >>> + > >>> + m_as ovn-gw-1 kill -9 $(m_as ovn-gw-1 cat > /run/ovn/ovn-controller.pid) > >>> + m_as ovn-gw-1 kill -9 $(m_as ovn-gw-1 cat > /run/openvswitch/ovs-vswitchd.pid) > >>> + m_as ovn-gw-1 kill -9 $(m_as ovn-gw-1 cat > /run/openvswitch/ovsdb-server.pid) > >>> + # Also delete datapath (flows) > >>> + m_as ovn-gw-1 ovs-dpctl del-dp system@ovs-system > >>> + > >>> + check_loss_after_flap "gw1" $max_expected_loss1 > >>> + > >>> + # gw1 died => gw2 should generate garps. > >>> + check_garps "false" "true" "false" > >>> + check_packets "false" "true" "false" "true" > >>> + echo "$(date +%H:%M:%S.%03N) Restarting gw1 ovs-vswitchd after > killing gw1" > >>> + m_as ovn-gw-1 /usr/share/openvswitch/scripts/ovs-ctl start > --system-id=ovn-gw-1 > >>> + > >>> + # Wait some long time before restarting ovn-controller > >>> + sleep 10 > >>> + > >>> + # gw2 should still be handling packets as OVN not restarted on gw1 > >>> + check_packets "false" "true" "false" "true" > >>> + > >>> + echo "$(date +%H:%M:%S.%03N) Restarting gw1 ovn-controller after > killing gw1" > >>> + m_as ovn-gw-1 /usr/share/ovn/scripts/ovn-ctl start_controller > ${CONTROLLER_SSL_ARGS} > >>> + > >>> + # The network is now restored => packets should go through gw1 > and reach chassis-1. 
> >>> + check_packets "true" "false" "false" "true" > >>> + final_check "kill_gw1" $lost $max_expected_loss2 > >>> +} > >>> + > >>> +check_migration_between_gw1_and_gw2_reboot_gw1() { > >>> + ip_gw1_eth1=$(podman exec ovn-gw-1 ip -brief address show eth1 | > awk '{print $3}' | cut -d/ -f1) > >>> + cidr=$(podman exec ovn-gw-1 ip -brief address show eth1 | awk > '{print $3}' | cut -d/ -f2) > >>> + AS_BOX([$(date +%H:%M:%S.%03N) Rebooting ovn-gw-1 with > $ip_gw1_eth1/$cidr]) > >>> + max_expected_loss1=$1 > >>> + max_expected_loss2=$2 > >>> + prepare > >>> + > >>> + podman stop -t 0 ovn-gw-1 > >>> + (exec 3>&- 4>&- 5>&- 6>&-; podman start ovn-gw-1) > >>> + > >>> + add_port br-ovn eth1 $ip_gw1_eth1/$cidr > >>> + add_port br-ovn-ext eth2 > >>> + M_START_TCPDUMPS([ovn-gw-1], [-neei eth2], [gw1], [ovn-gw-1], > [-neei eth1], [gw1_eth1], [ovn-gw-1], [-neei eth2 -Q out], [gw1_out]) > >>> + check_loss_after_flap "gw1" $max_expected_loss1 > >>> + > >>> + # gw1 died => gw2 should generate garps. > >>> + check_garps "false" "true" "false" > >>> + check_packets "false" "true" "false" "true" > >>> + > >>> + echo "$(date +%H:%M:%S.%03N) Restarting gw1 ovs-vswitchd after > rebooting gw1" > >>> + m_as ovn-gw-1 /usr/share/openvswitch/scripts/ovs-ctl start > --system-id=ovn-gw-1 > >>> + > >>> + # Wait some long time before restarting ovn-controller > >>> + sleep 10 > >>> + > >>> + # gw2 should still be handling packets as OVN not restarted on gw1 > >>> + check_packets "false" "true" "false" "true" > >>> + > >>> + echo "$(date +%H:%M:%S.%03N) Restarting gw1 ovn-controller after > rebooting gw1" > >>> + m_as ovn-gw-1 /usr/share/ovn/scripts/ovn-ctl start_controller > ${CONTROLLER_SSL_ARGS} > >>> + > >>> + # The network is now restored => packets should go through gw1 > and reach chassis-1. 
> >>> + check_packets "true" "false" "false" "true" > >>> + final_check "reboot_gw1" $lost $max_expected_loss2 > >>> +} > >>> + > >>> +check_compute_restart() { > >>> + AS_BOX([$(date +%H:%M:%S.%03N) Killing ovn-chassis-1 > ovn-controller and ovs-vswitchd]) > >>> + max_expected_loss=$1 > >>> + prepare > >>> + > >>> + # Kill ovn-chassis-1 > >>> + echo "$(date +%H:%M:%S.%03N) Killing chassis-1" > >>> + on_exit 'm_as ovn-chassis-1 > /usr/share/openvswitch/scripts/ovs-ctl status || > >>> + m_as ovn-chassis-1 > /usr/share/openvswitch/scripts/ovs-ctl start --system-id=ovn-chassis-1' > >>> + on_exit 'm_as ovn-chassis-1 /usr/share/ovn/scripts/ovn-ctl > status_controller || > >>> + m_as ovn-chassis-1 /usr/share/ovn/scripts/ovn-ctl > start_controller ${CONTROLLER_SSL_ARGS}' > >>> + > >>> + m_as ovn-chassis-1 kill -9 $(m_as ovn-chassis-1 cat > /run/ovn/ovn-controller.pid) > >>> + m_as ovn-chassis-1 kill -9 $(m_as ovn-chassis-1 cat > /run/openvswitch/ovs-vswitchd.pid) > >>> + m_as ovn-chassis-1 kill -9 $(m_as ovn-chassis-1 cat > /run/openvswitch/ovsdb-server.pid) > >>> + > >>> + # Now restart chassis-1 > >>> + flap_count=$(m_as ovn-gw-2 ovs-vsctl get interface > $from_gw2_to_ch1 bfd_status | sed 's/.*flap_count=\"\([[0-9]]*\).*/\1/g') > >>> + echo "$(date +%H:%M:%S.%03N) Restarting ovn-chassis-1 > ovs-vswitchd." > >>> + m_as ovn-chassis-1 /usr/share/openvswitch/scripts/ovs-ctl start > --system-id=ovn-chassis-1 > >>> + > >>> + echo "$(date +%H:%M:%S.%03N) Waiting for flap count between gw-1 > and chassis-1 to increase" > >>> + OVS_WAIT_UNTIL([ > >>> + new_flap_count=$(m_as ovn-gw-1 ovs-vsctl get interface > $from_gw1_to_ch1 bfd_status | sed 's/.*flap_count=\"\([[0-9]]*\).*/\1/g') > >>> + echo "Comparing $new_flap_count versus $flap_count" > >>> + test "$new_flap_count" -gt "$((flap_count))" > >>> + ]) > >>> + > >>> + wait_bfd_up ovn-chassis-1 $from_ch1_to_gw1 > >>> + > >>> + echo "$(date +%H:%M:%S.%03N) Restarting ovn-chassis-1 > ovn-controller."
> >>> + m_as ovn-chassis-1 /usr/share/ovn/scripts/ovn-ctl > start_controller ${CONTROLLER_SSL_ARGS} > >>> + > >>> + # Wait a long time to catch losses > >>> + sleep 5 > >>> + final_check "compute" 0 $max_expected_loss > >>> } > >>> > >>> start_tcpdump > >>> -AS_BOX([$(date +%H:%M:%S.%03N) Sending packet from hv1-vif1(inside1) > to ext1]) > >>> +echo "$(date +%H:%M:%S.%03N) Sending packet from hv1-vif1(inside1) to > ext1" > >>> M_NS_CHECK_EXEC([ovn-chassis-1], [hv1-vif1], [ping -c3 -q -i 0.1 > 192.168.0.1 | FORMAT_PING], > >>> [0], [dnl > >>> 3 packets transmitted, 3 received, 0% packet loss, time 0ms > >>> @@ -3433,7 +3629,7 @@ M_NS_CHECK_EXEC([ovn-chassis-1], [hv1-vif1], > [ping -c3 -q -i 0.1 192.168.0.1 | F > >>> stop_tcpdump > >>> > >>> # It should have gone through gw1 and not gw2 > >>> -AS_BOX([$(date +%H:%M:%S.%03N) Checking it went through gw1 and not > gw2]) > >>> +echo "$(date +%H:%M:%S.%03N) Checking it went through gw1 and not gw2" > >>> AT_CHECK([cat gw2.tcpdump | grep "ICMP echo"], [1], [dnl > >>> ]) > >>> > >>> @@ -3446,17 +3642,29 @@ f0:00:c0:a8:00:fe > 00:00:c0:a8:00:01, > ethertype IPv4 (0x0800), length 98: 192.1 > >>> 00:00:c0:a8:00:01 > f0:00:c0:a8:00:fe, ethertype IPv4 (0x0800), > length 98: 192.168.0.1 > 192.168.1.1: ICMP echo reply, > >>> ]) > >>> > >>> -# We stop bfd between gw1 & gw2, but keep gw1 & gw2 running. > >>> -check_migration_between_gw1_and_gw2 "stop_bfd" > >>> +# We stop bfd between gw1 & gw2, but keep gw1 & gw2 running. We > should not lose packets. > >>> +check_migration_between_gw1_and_gw2_bfd_stop 1 1 > >>> > >>> # We simulate death of gw2. It should not have any effect. > >>> -check_migration_between_gw1_and_gw2 "kill_gw2" > >>> +check_migration_between_gw1_and_gw2_kill_gw2 1 1 > >>> + > >>> +# We simulate ovs update on gw1. When ovs is stopped, flows should > still be handled by Kernel datapath. > >>> +# When OVS is restarted, BFD should go down immediately, and gw2 will > start handling packets. 
> >>> +# There will be packet losses as gw2 will usually see BFD from gw1 up > (and hence release port) before gw1 sees > >>> +# BFD up (and claim port). > >>> +check_migration_between_gw1_and_gw2_update_ovs 20 1 > >>> + > >>> +# We simulate restart of both OVS & OVN gw1. gw2 should take over. > >>> +check_migration_between_gw1_and_gw2_kill_gw1 40 20 > >>> > >>> # We simulate death of gw1. gw2 should take over. > >>> -check_migration_between_gw1_and_gw2 "kill_gw1" > >>> +check_migration_between_gw1_and_gw2_reboot_gw1 40 20 > >>> + > >>> +# We simulate restart of ovn-chassis-1. We expect ~2 sec of losses as > we wait for bfd up before starting > >>> +# ovn-controller. > >>> +check_compute_restart 30 > >>> > >>> AT_CLEANUP > >>> -]) > >>> > >>> AT_SETUP([ovn multinode bgp L2 EVPN]) > >>> check_fake_multinode_setup > >>> -- > >>> 2.47.1 > >>> > >> > > > > Regards, > > Dumitru > > > >
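[Editor's aside: the flow-restore-wait toggle that the commit message describes can be sketched as a small state machine. This is an illustration only — the state names mirror the patch's `FRW_*` enum, but the `step` helper and the `cfg` dict standing in for the Open_vSwitch table are invented for the sketch:]

```python
# Hedged sketch of the flow-restore-wait toggle: on (re)connect, OVN
# clears the flag so OVS starts handling upcalls/bfd, waits for the
# transaction and the OVS ack, then re-arms the flag so an unexpected
# reboot leaves OVS waiting until ovn-controller comes back.
from enum import Enum, auto

class Frw(Enum):
    INIT = auto()               # just (re)connected to OVS
    WAIT_TXN_COMPLETE = auto()  # wrote false, txn in flight
    TXN_SUCCESS = auto()        # txn done, waiting for OVS ack (cur_cfg)
    DONE = auto()               # set back to true; nothing left to do

def step(state, txn_complete, ovs_acked, cfg):
    """One iteration; cfg is a dict standing in for other_config/external_ids."""
    if state is Frw.INIT:
        cfg["flow-restore-wait"] = "false"   # let OVS start handling bfd
        return Frw.WAIT_TXN_COMPLETE
    if state is Frw.WAIT_TXN_COMPLETE and txn_complete:
        return Frw.TXN_SUCCESS
    if state is Frw.TXN_SUCCESS and ovs_acked:
        # Re-arm for an unexpected reboot, and record ownership.
        cfg["flow-restore-wait"] = "true"
        cfg["ovn-managed-flow-restore-wait"] = "true"
        return Frw.DONE
    return state

cfg = {}
s = Frw.INIT
s = step(s, False, False, cfg)   # writes false
s = step(s, True, False, cfg)    # txn completed
s = step(s, True, True, cfg)     # OVS acked; set back to true
print(s.name, cfg["flow-restore-wait"])  # DONE true
```

The ownership marker is what lets "ovs-ctl restart" and OVN coexist: if flow-restore-wait is true but the marker is absent, ovn-controller knows ovs-ctl set it and leaves it alone.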
Hi Dumitru, I also have one comment on one of your (not so) nits below. Thanks Xavier On Wed, Mar 25, 2026 at 4:31 PM Dumitru Ceara <dceara@redhat.com> wrote: > On 3/12/26 8:35 PM, Mark Michelson via dev wrote: > > Hi Xavier, > > > > Hi Xavier, Mark, > > > It took a bit for me to look through all of this, but I believe this > > looks good to me. Thanks for adding the notes about the offline > > discussions, as I had not recalled all the details about those > > discussions between then and now. > > > > Acked-by: Mark Michelson <mmichels@redhat.com> > > > > I think this change makes sense too. I do have some very small comments > below but I might be able to address them myself and just squash that > into this version of the patch before applying it to main. > > Please let me know what you think. > > > On Mon, Mar 9, 2026 at 3:36 PM Xavier Simonart <xsimonar@redhat.com> > wrote: > >> > >> If a server unexpectedly rebooted, OVS, when restarted, sets BFD > >> UP on bfd-enabled geneve tunnels. > >> However, if it takes time to restart OVN, an HA gw chassis > >> would attract the traffic while being unable to handle it > >> (as no flows), resulting in traffic loss. > >> > >> This is fixed by re-using ovs flow-restore-wait. > >> If set, OVS waits (prevents upcalls, ignores bfd, ...) until reset. > >> Once OVS receives the notification of flow-restore-wait being false, > >> it restarts handling upcalls, bfd... and ignores any new change to > >> flow-restore-wait. > >> > >> Hence, on chassis hosting ha gateways, OVN toggles flow-restore-wait: > >> it set it to false, waits for ack from OVS and then sets it back to > true. > >> If server reboots, OVS will see flow-restore-wait being true. > >> > >> OVN also sets external_ids->ovn-managed-flow-restore-wait when setting > >> flow-restore-wait. When set, it tells that OVN once set > flow-restore-wait.
> >> > >> "ovs-ctl restart" also uses flow-restore-wait: when called, it saves the > >> flows, stops "ovs-vswitchd", sets "flow-restore-wait" to true, restarts > >> "ovs-vswitchd", restores the flows and finally removes > "flow-restore-wait". > >> So OVS will wait either for "ovs-ctl restart" to remove > "flow-restore-wait" > >> or for OVN to set "flow-restore-wait" to false. > >> > >> Reported-at: https://issues.redhat.com/browse/FDP-3075 > >> Signed-off-by: Xavier Simonart <xsimonar@redhat.com> > >> > >> --- > >> -v2 : - Updated based on Mark's feedback (commit message, comments). > >> - Avoid setting flow-restore-wait for computes. > >> - Add external_ids->ovn-managed-flow-restore-wait. > >> - Updated test: add test for compute update + nits (variable name > changes) > >> --- > >> controller/bfd.c | 5 +- > >> controller/bfd.h | 4 +- > >> controller/ovn-controller.8.xml | 11 + > >> controller/ovn-controller.c | 171 +++++++++- > >> tests/multinode-macros.at | 22 ++ > >> tests/multinode.at | 546 ++++++++++++++++++++++---------- > >> 6 files changed, 584 insertions(+), 175 deletions(-) > >> > >> diff --git a/controller/bfd.c b/controller/bfd.c > >> index 3b0c3f6da..56bfa4936 100644 > >> --- a/controller/bfd.c > >> +++ b/controller/bfd.c > >> @@ -117,13 +117,14 @@ bfd_calculate_active_tunnels(const struct > ovsrec_bridge *br_int, > >> * > >> * If 'our_chassis' is C5 then this function returns empty bfd set. 
> >> */ > >> -void > >> +bool > >> bfd_calculate_chassis( > >> const struct sbrec_chassis *our_chassis, > >> const struct sbrec_ha_chassis_group_table *ha_chassis_grp_table, > >> struct sset *bfd_chassis) > >> { > >> const struct sbrec_ha_chassis_group *ha_chassis_grp; > >> + bool chassis_is_ha_gw = false; > >> SBREC_HA_CHASSIS_GROUP_TABLE_FOR_EACH (ha_chassis_grp, > >> ha_chassis_grp_table) { > >> bool is_ha_chassis = false; > >> @@ -143,6 +144,7 @@ bfd_calculate_chassis( > >> sset_add(&grp_chassis, ha_ch->chassis->name); > >> if (our_chassis == ha_ch->chassis) { > >> is_ha_chassis = true; > >> + chassis_is_ha_gw = true; > >> bfd_setup_required = true; > >> } > >> } > >> @@ -178,6 +180,7 @@ bfd_calculate_chassis( > >> } > >> sset_destroy(&grp_chassis); > >> } > >> + return chassis_is_ha_gw; > >> } > >> > >> void > >> diff --git a/controller/bfd.h b/controller/bfd.h > >> index f8fece5a5..3e3384891 100644 > >> --- a/controller/bfd.h > >> +++ b/controller/bfd.h > >> @@ -16,6 +16,8 @@ > >> #ifndef OVN_BFD_H > >> #define OVN_BFD_H 1 > >> > >> +#include <stdbool.h> > >> + > >> struct hmap; > >> struct ovsdb_idl; > >> struct ovsdb_idl_index; > >> @@ -36,7 +38,7 @@ void bfd_run(const struct ovsrec_interface_table *, > >> const struct sbrec_sb_global_table *, > >> const struct ovsrec_open_vswitch_table *); > >> > >> -void bfd_calculate_chassis( > >> +bool bfd_calculate_chassis( > >> const struct sbrec_chassis *, > >> const struct sbrec_ha_chassis_group_table *, > >> struct sset *); > >> diff --git a/controller/ovn-controller.8.xml > b/controller/ovn-controller.8.xml > >> index 57e7cf5dd..33281a4d6 100644 > >> --- a/controller/ovn-controller.8.xml > >> +++ b/controller/ovn-controller.8.xml > >> @@ -531,6 +531,17 @@ > >> 65535. 
> >> </dd> > >> > >> + <dt> > >> + <code>external_ids:ovn-managed-flow-restore-wait</code> in the > >> + <code>Open_vSwitch</code> table > >> + </dt> > >> + <dd> > >> + When set to true, this key indicates that > <code>ovn-controller</code> > >> + has set the <code>other_config:flow-restore-wait</code> option. > >> + The key is set when <code>ovn-controller</code> enables > >> + flow-restore-wait and removed when it clears it. > >> + </dd> > >> + > >> <dt> > >> <code>external_ids:ct-zone-*</code> in the <code>Bridge</code> > table > >> </dt> > >> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c > >> index 5815f1b92..0611b1767 100644 > >> --- a/controller/ovn-controller.c > >> +++ b/controller/ovn-controller.c > >> @@ -211,6 +211,150 @@ static char *get_file_system_id(void) > >> free(filename); > >> return ret; > >> } > >> + > >> +/* Set/unset flow-restore-wait, and inc ovs next_cfg if false > >> + * When set to true, also sets ovn-managed-flow-restore-wait to true to > >> + * indicate ownership */ > >> +static void set_flow_restore_wait(struct ovsdb_idl_txn *ovs_idl_txn, > >> + const struct ovsrec_open_vswitch > *cfg, > >> + const struct smap *other_config, > >> + const char *val, bool ovn_managed) > > Nit: "static void" should be on a different line. > > >> +{ > >> + struct smap new_config; > >> + smap_clone(&new_config, other_config); > >> + smap_replace(&new_config, "flow-restore-wait", val); > >> + ovsrec_open_vswitch_set_other_config(cfg, &new_config); > >> + if (!strcmp(val, "true")) { > > I'd prefer we use an actual boolean here as 'val'. > > We'd only have to change: > > smap_replace(&new_config, "flow-restore-wait", val ? 
"true" : "false"); > > >> + ovsrec_open_vswitch_update_external_ids_setkey( > >> + cfg, "ovn-managed-flow-restore-wait", "true"); > >> + } else if (ovn_managed) { > >> + ovsrec_open_vswitch_update_external_ids_delkey( > >> + cfg, "ovn-managed-flow-restore-wait"); > >> + } > >> + ovsdb_idl_txn_increment(ovs_idl_txn, &cfg->header_, > >> + &ovsrec_open_vswitch_col_next_cfg, true); > >> + smap_destroy(&new_config); > >> +} > >> + > >> +static void > >> +manage_flow_restore_wait(struct ovsdb_idl_txn *ovs_idl_txn, > >> + const struct ovsrec_open_vswitch *cfg, > >> + uint64_t ofctrl_cur_cfg, uint64_t > ovs_next_cfg, > >> + int ovs_txn_status, bool is_ha_gw) > >> +{ > >> + enum flow_restore_wait_state { > >> + FRW_INIT, /* Initial state */ > >> + FRW_WAIT_TXN_COMPLETE, /* Sent false, waiting txn to complete > */ > >> + FRW_TXN_SUCCESS, /* Txn completed. Waiting for OVS Ack. > */ > >> + FRW_DONE /* Everything completed */ > >> + }; > >> + > >> + static int64_t frw_next_cfg; > >> + static enum flow_restore_wait_state frw_state; > >> + static bool ofctrl_was_connected = false; > >> + > >> + bool ofctrl_connected = ofctrl_is_connected(); > >> + > >> + if (!ovs_idl_txn || !cfg) { > >> + return; > >> + } > >> + > >> + /* If OVS is stopped/started, make sure flow-restore-wait is > toggled */ > > Nit: comments should be sentences and end with a '.'. > > >> + if (ofctrl_connected && !ofctrl_was_connected) { > >> + frw_state = FRW_INIT; > >> + } > >> + ofctrl_was_connected = ofctrl_connected; > >> + > >> + if (!ofctrl_connected) { > >> + return; > >> + } > >> + > >> + bool frw = smap_get_bool(&cfg->other_config, "flow-restore-wait", > false); > >> + bool ovn_managed_once = smap_get_bool(&cfg->external_ids, > >> + "ovn-managed-flow-restore-wait", > false); > > Nit: indentation. > > >> + > >> + if (frw && !ovn_managed_once) { > >> + /* frw has been set by ovs-ctl. Do not touch. 
*/ > >> + return; > >> + } > >> + > >> + if (!is_ha_gw) { > >> + if (frw) { > >> + /* frw has once been set by OVN. We are now not an HA > chassis > >> + * anymore, unset it. */ > >> + set_flow_restore_wait(ovs_idl_txn, cfg, &cfg->other_config, > >> + "false", ovn_managed_once); > >> + } > >> + /* else we are not an HA chassis and frw is false. Ignore it. > */ > >> + return; > >> + } > >> + > >> + switch (frw_state) { > >> + case FRW_INIT: > >> + if (ofctrl_cur_cfg > 0) { > >> + set_flow_restore_wait(ovs_idl_txn, cfg, &cfg->other_config, > >> + "false", ovn_managed_once); > >> + frw_state = FRW_WAIT_TXN_COMPLETE; > >> + VLOG_INFO("Setting flow-restore-wait=false " > >> + "(cur_cfg=%"PRIu64")", ofctrl_cur_cfg); > >> + } > >> + break; > >> + > >> + case FRW_WAIT_TXN_COMPLETE: > >> + /* if (ovs_idl_txn != NULL), the transaction completed. > >> + * When the transaction completed, it either failed > >> + * (ovs_txn_status == 0) or succeeded (ovs_txn_status != 0) */ > >> + if (ovs_txn_status == 0) { > >> + /* Previous transaction failed. 
*/ > >> + set_flow_restore_wait(ovs_idl_txn, cfg, &cfg->other_config, > >> + "false", ovn_managed_once); > >> + break; > >> + } > >> + /* txn succeeded, get next_cfg */ > >> + frw_next_cfg = ovs_next_cfg; > >> + frw_state = FRW_TXN_SUCCESS; > >> + /* fall through */ > >> + > >> + case FRW_TXN_SUCCESS: > >> + if (ovs_next_cfg < frw_next_cfg) { > >> + /* DB was reset, next_cfg went backwards */ > >> + VLOG_INFO("OVS DB reset (next_cfg %"PRId64" -> %"PRIu64"), > " > >> + "resetting state", > >> + frw_next_cfg, ovs_next_cfg); > >> + set_flow_restore_wait(ovs_idl_txn, cfg, &cfg->other_config, > >> + "false", ovn_managed_once); > >> + frw_state = FRW_WAIT_TXN_COMPLETE; > >> + break; > >> + } > >> + > >> + if (!frw) { > >> + if (cfg->cur_cfg >= frw_next_cfg) { > >> + set_flow_restore_wait(ovs_idl_txn, cfg, > &cfg->other_config, > >> + "true", ovn_managed_once); > >> + frw_state = FRW_DONE; > >> + VLOG_INFO("Setting flow-restore-wait=true"); > >> + } > >> + } else { > >> + /* The transaction to false succeeded but frw is true. > >> + * So, another task already set it to true */ > >> + frw_state = FRW_DONE; > >> + VLOG_INFO("flow-restore-wait was already true"); > >> + } > >> + break; > >> + case FRW_DONE: > >> + if (!frw) { > >> + /* frw has been removed (e.g. by ovs-ctl restart) or is > false > >> + * (e.g. txn failed.) */ > > Nit: extra whitespace at the beginning of the line. > > >> + set_flow_restore_wait(ovs_idl_txn, cfg, &cfg->other_config, > >> + "false", ovn_managed_once); > >> + frw_state = FRW_WAIT_TXN_COMPLETE; > >> + VLOG_INFO("OVS frw cleared, restarting flow-restore-wait > sequence " > >> + "(cur_cfg=%"PRIu64")", ofctrl_cur_cfg); > >> + } > >> + break; > >> + } > >> +} > >> + > >> /* Only set monitor conditions on tables that are available in the > >> * server schema. 
> >> */ > >> @@ -3381,6 +3525,7 @@ en_mac_cache_cleanup(void *data) > >> > >> struct ed_type_bfd_chassis { > >> struct sset bfd_chassis; > >> + bool is_ha_gw; > >> }; > >> > >> static void * > >> @@ -3409,8 +3554,9 @@ en_bfd_chassis_run(struct engine_node *node, void > *data OVS_UNUSED) > >> = chassis_lookup_by_name(sbrec_chassis_by_name, chassis_id); > >> > >> sset_clear(&bfd_chassis->bfd_chassis); > >> - bfd_calculate_chassis(chassis, ha_chassis_grp_table, > >> - &bfd_chassis->bfd_chassis); > >> + bfd_chassis->is_ha_gw = bfd_calculate_chassis(chassis, > >> + ha_chassis_grp_table, > >> + > &bfd_chassis->bfd_chassis); > >> return EN_UPDATED; > >> } > >> > >> @@ -7117,6 +7263,7 @@ main(int argc, char *argv[]) > >> struct unixctl_server *unixctl; > >> struct ovn_exit_args exit_args = {0}; > >> struct br_int_remote br_int_remote = {0}; > >> + static uint64_t next_cfg = 0; > >> int retval; > >> > >> /* Read from system-id-override file once on startup. */ > >> @@ -7444,6 +7591,7 @@ main(int argc, char *argv[]) > >> > >> /* Main loop. */ > >> int ovnsb_txn_status = 1; > >> + int ovs_txn_status = 1; > >> bool sb_monitor_all = false; > >> struct tracked_acl_ids *tracked_acl_ids = NULL; > >> while (!exit_args.exiting) { > >> @@ -7545,6 +7693,11 @@ main(int argc, char *argv[]) > >> pinctrl_update_swconn(br_int_remote.target, > >> br_int_remote.probe_interval); > >> > >> + if (cfg && ovs_idl_txn && ovs_txn_status == -1) { > >> + /* txn was in progress and is now completed */ > >> + next_cfg = cfg->next_cfg; > >> + } > >> + > >> /* Enable ACL matching for double tagged traffic. 
*/ > >> if (ovs_idl_txn && cfg) { > >> int vlan_limit = smap_get_int( > >> @@ -7894,6 +8047,13 @@ main(int argc, char *argv[]) > >> stopwatch_start(OFCTRL_SEQNO_RUN_STOPWATCH_NAME, > >> time_msec()); > >> ofctrl_seqno_run(ofctrl_get_cur_cfg()); > >> + if (ovs_idl_txn && bfd_chassis_data) { > >> + manage_flow_restore_wait(ovs_idl_txn, cfg, > >> + ofctrl_get_cur_cfg(), > >> + next_cfg, > ovs_txn_status, > >> + > bfd_chassis_data->is_ha_gw); > >> + } > >> + > >> stopwatch_stop(OFCTRL_SEQNO_RUN_STOPWATCH_NAME, > >> time_msec()); > >> stopwatch_start(IF_STATUS_MGR_RUN_STOPWATCH_NAME, > >> @@ -7993,7 +8153,7 @@ main(int argc, char *argv[]) > >> OVS_NOT_REACHED(); > >> } > >> > >> - int ovs_txn_status = > ovsdb_idl_loop_commit_and_wait(&ovs_idl_loop); > >> + ovs_txn_status = ovsdb_idl_loop_commit_and_wait(&ovs_idl_loop); > >> if (!ovs_txn_status) { > >> /* The transaction failed. */ > >> vif_plug_clear_deleted( > >> @@ -8012,6 +8172,9 @@ main(int argc, char *argv[]) > >> &vif_plug_deleted_iface_ids); > >> vif_plug_finish_changed( > >> &vif_plug_changed_iface_ids); > >> + if (cfg) { > >> + next_cfg = cfg->next_cfg; > >> + } > >> } else if (ovs_txn_status == -1) { > >> /* The commit is still in progress */ > >> } else { > >> @@ -8085,7 +8248,7 @@ loop_done: > >> } > >> > >> ovsdb_idl_loop_commit_and_wait(&ovnsb_idl_loop); > >> - int ovs_txn_status = > ovsdb_idl_loop_commit_and_wait(&ovs_idl_loop); > >> + ovs_txn_status = > ovsdb_idl_loop_commit_and_wait(&ovs_idl_loop); > >> if (!ovs_txn_status) { > >> /* The transaction failed. 
*/ > >> vif_plug_clear_deleted( > >> diff --git a/tests/multinode-macros.at b/tests/multinode-macros.at > >> index 4a74d5127..646ee2d79 100644 > >> --- a/tests/multinode-macros.at > >> +++ b/tests/multinode-macros.at > >> @@ -41,6 +41,28 @@ m4_define([M_START_TCPDUMP], > >> ] > >> ) > >> > >> +m4_define([_M_START_TCPDUMPS_RECURSIVE], [ > > I think we normally put the _ at the end, i.e., M_START_TCPDUMPS_RECURSIVE_ > > >> + m4_if(m4_eval($# > 3), [1], [dnl > >> + names="$names $3" > >> + echo "Running podman exec $1 tcpdump -l $2 >$3.tcpdump > 2>$3.stderr" > >> + podman exec $1 tcpdump -l $2 >$3.tcpdump 2>$3.stderr & > >> + echo "podman exec $1 ps -ef | grep -v grep | grep tcpdump && > podman exec $1 killall tcpdump" >> cleanup > >> + _M_START_TCPDUMPS_RECURSIVE(m4_shift(m4_shift(m4_shift($@)))) > >> + ]) > >> + ] > >> +) > >> + > >> +# Start Multiple tcpdump. Useful to speed up when many tcpdump > >> +# must be started as waiting for "listening" takes usually 1 second. > >> +m4_define([M_START_TCPDUMPS], > >> + [ > >> + names="" > >> + _M_START_TCPDUMPS_RECURSIVE($@) > >> + for name in $names; do > >> + OVS_WAIT_UNTIL([grep -q "listening" ${name}.stderr]) > >> + done > >> + ] > >> +) > >> > >> # M_FORMAT_CT([ip-addr]) > >> # > >> diff --git a/tests/multinode.at b/tests/multinode.at > >> index 6b9614126..24d7ca27c 100644 > >> --- a/tests/multinode.at > >> +++ b/tests/multinode.at > >> @@ -2986,42 +2986,42 @@ AT_CLEANUP > >> > >> AT_SETUP([HA: Check for missing garp on leader when BFD goes back up]) > >> # Network topology > >> -# > ┌────────────────────────────────────────────────────────────────────────────────────────────────────────┐ > >> -# │ > │ > >> -# │ ┌───────────────────┐ ┌───────────────────┐ > ┌───────────────────┐ ┌───────────────────┐ │ > >> -# │ │ ovn-chassis-1 │ │ ovn-gw-1 │ │ > ovn-gw-2 │ │ ovn-chassis-2 │ │ > >> -# │ └─────────┬─────────┘ └───────────────────┘ > └───────────────────┘ └───────────────────┘ │ > >> -# │ ┌─────────┴─────────┐ > │ > >> 
-# │ │ inside1 │ > │ > >> -# │ │ 192.168.1.1/24 │ > │ > >> -# │ └─────────┬─────────┘ > │ > >> -# │ ┌─────────┴─────────┐ > │ > >> -# │ │ inside │ > │ > >> -# │ └─────────┬─────────┘ > │ > >> -# │ ┌─────────┴─────────┐ > │ > >> -# │ │ 192.168.1.254 │ > │ > >> -# │ │ R1 │ > │ > >> -# │ │ 192.168.0.254 │ > │ > >> -# │ └─────────┬─────────┘ > │ > >> -# │ > └------eth1---------------┬--------eth1-----------┐ > │ > >> -# │ ┌──────────┴────────┐ > ┌─────────┴─────────┐ │ > >> -# │ │ 192.168.1.254 │ │ > 192.168.1.254 │ │ > >> -# │ │ R1 │ │ > R1 │ │ > >> -# │ │ 192.168.0.254 │ │ > 192.168.0.254 │ │ > >> -# │ └─────────┬─────────┘ > └─────────┬─────────┘ │ > >> -# │ │ > │ ┌───────────────────┐ │ > >> -# │ ┌─────────┴─────────┐ > ┌─────────┴─────────┐ │ 192.168.0.1 │ │ > >> -# │ │ outside │ │ > outside │ │ ext1 │ │ > >> -# │ └─────────┬─────────┘ > └─────────┬─────────┘ └─────────┬─────────┘ │ > >> -# │ ┌─────────┴─────────┐ > ┌─────────┴─────────┐ ┌─────────┴─────────┐ │ > >> -# │ │ ln-outside │ │ > ln-outside │ │ ln-ext1 │ │ > >> -# │ └─────────┬─────────┘ > └─────────┬─────────┘ └─────────┬─────────┘ │ > >> -# │ ┌─────────┴─────────┐ > ┌─────────┴─────────┐ ┌─────────┴─────────┐ │ > >> -# │ │ br-ex │ │ > br-ex │ │ br-ex │ │ > >> -# │ └─────────┬─────────┘ > └─────────┬─────────┘ └─────────┬─────────┘ │ > >> -# │ > └---------eth2-----------┴-------eth2-------------┘ │ > >> -# │ > │ > >> -# > └────────────────────────────────────────────────────────────────────────────────────────────────────────┘ > >> +# > ┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ > >> +# │ > │ > >> +# │ ┌───────────────────┐ ┌───────────────────┐ > ┌───────────────────┐ ┌───────────────────┐ ┌───────────────────┐ │ > >> +# │ │ ovn-chassis-1 │ │ ovn-chassis-2 │ │ > ovn-gw-1 │ │ ovn-gw-2 │ │ ovn-chassis-3 │ │ > >> +# │ └─────────┬─────────┘ └─────────┬─────────┘ > └───────────────────┘ └───────────────────┘ 
└───────────────────┘ │ > >> +# │ ┌─────────┴─────────┐ ┌─────────┴─────────┐ > │ > >> +# │ │ inside1 │ │ inside2 │ > │ > >> +# │ │ 192.168.1.1/24 │ │ 192.168.1.2/24 │ > │ > >> +# │ └─────────┬─────────┘ └─────────┬─────────┘ > │ > >> +# │ ┌─┴────────────────────────┴─┐ > │ > >> +# │ │ inside │ > │ > >> +# │ └──────────────┬─────────────┘ > │ > >> +# │ ┌─────────┴─────────┐ > │ > >> +# │ │ 192.168.1.254 │ > │ > >> +# │ │ R1 │ > │ > >> +# │ │ 192.168.0.254 │ > │ > >> +# │ └─────────┬─────────┘ > │ > >> +# │ > └------eth1---------------------------┬--------eth1-----------┐ > │ > >> +# │ > ┌──────────┴────────┐ ┌─────────┴─────────┐ > │ > >> +# │ │ > 192.168.1.254 │ │ 192.168.1.254 │ │ > >> +# │ │ > R1 │ │ R1 │ │ > >> +# │ │ > 192.168.0.254 │ │ 192.168.0.254 │ │ > >> +# │ > └─────────┬─────────┘ └─────────┬─────────┘ > │ > >> +# │ > │ │ ┌───────────────────┐ │ > >> +# │ > ┌─────────┴─────────┐ ┌─────────┴─────────┐ │ 192.168.0.1 │ > │ > >> +# │ │ > outside │ │ outside │ │ ext1 │ │ > >> +# │ > └─────────┬─────────┘ └─────────┬─────────┘ └─────────┬─────────┘ > │ > >> +# │ > ┌─────────┴─────────┐ ┌─────────┴─────────┐ ┌─────────┴─────────┐ > │ > >> +# │ │ > ln-outside │ │ ln-outside │ │ ln-ext1 │ │ > >> +# │ > └─────────┬─────────┘ └─────────┬─────────┘ └─────────┬─────────┘ > │ > >> +# │ > ┌─────────┴─────────┐ ┌─────────┴─────────┐ ┌─────────┴─────────┐ > │ > >> +# │ │ > br-ex │ │ br-ex │ │ br-ex │ │ > >> +# │ > └─────────┬─────────┘ └─────────┬─────────┘ └─────────┬─────────┘ > │ > >> +# │ > └---------eth2-----------┴-------eth2-------------┘ │ > >> +# │ > │ > >> +# > └────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ > >> > >> # The goal of this test is the check that GARP are properly generated > by higest priority traffic when > >> # BFD goes down, and back up, and this whether the BFD event is due > either to some bfd packet lost > >> @@ -3030,6 +3030,12 @@ AT_SETUP([HA: Check for 
missing garp on leader > when BFD goes back up]) > >> # So gw3 should in this test neither send garp or receive packets. > >> # > >> # Enable vconn so we can check the GARP from a log perspective. > >> +on_exit "podman exec ovn-gw-1 ovn-appctl vlog/set info" > >> +on_exit "podman exec ovn-gw-1 ovn-appctl vlog/enable-rate-limit" > >> +on_exit "podman exec ovn-gw-2 ovn-appctl vlog/set info" > >> +on_exit "podman exec ovn-gw-2 ovn-appctl vlog/enable-rate-limit" > >> +on_exit "podman exec ovn-gw-3 ovn-appctl vlog/set info" > >> +on_exit "podman exec ovn-gw-3 ovn-appctl vlog/enable-rate-limit" > >> m_as ovn-gw-1 ovn-appctl vlog/set vconn:dbg > >> m_as ovn-gw-2 ovn-appctl vlog/set vconn:dbg > >> m_as ovn-gw-3 ovn-appctl vlog/set vconn:dbg > >> @@ -3037,12 +3043,17 @@ m_as ovn-gw-1 ovn-appctl vlog/disable-rate-limit > >> m_as ovn-gw-2 ovn-appctl vlog/disable-rate-limit > >> m_as ovn-gw-3 ovn-appctl vlog/disable-rate-limit > >> > >> +# Decrease revalidation time on ovs switch simulating ToR. > >> +on_exit "OVS_RUNDIR= ovs-vsctl set Open_vSwitch . > other_config:max-revalidator=500" > > Isn't it better to just remove the key on_exit, just in case the default > ever changes from 500 to something else? I.e.: > > on_exit "OVS_RUNDIR= ovs-vsctl remove Open_vSwitch . other_config > max-revalidator" > > >> +OVS_RUNDIR= ovs-vsctl set Open_vSwitch . > other_config:max-revalidator=100 > >> + > >> check_fake_multinode_setup > >> > >> # Delete the multinode NB and OVS resources before starting the test. 
> >> cleanup_multinode_resources > >> > >> ip_ch1=$(m_as ovn-chassis-1 ip a show dev eth1 | grep "inet " | awk > '{print $2}'| cut -d '/' -f1) > >> +ip_ch2=$(m_as ovn-chassis-2 ip a show dev eth1 | grep "inet " | awk > '{print $2}'| cut -d '/' -f1) > >> ip_gw1=$(m_as ovn-gw-1 ip a show dev eth1 | grep "inet " | awk '{print > $2}'| cut -d '/' -f1) > >> ip_gw2=$(m_as ovn-gw-2 ip a show dev eth1 | grep "inet " | awk '{print > $2}'| cut -d '/' -f1) > >> ip_gw3=$(m_as ovn-gw-3 ip a show dev eth1 | grep "inet " | awk '{print > $2}'| cut -d '/' -f1) > >> @@ -3050,25 +3061,35 @@ ip_gw3=$(m_as ovn-gw-3 ip a show dev eth1 | > grep "inet " | awk '{print $2}'| cut > >> from_gw1_to_gw2=$(m_as ovn-gw-1 ovs-vsctl --bare --columns=name find > interface options:remote_ip=$ip_gw2) > >> from_gw1_to_gw3=$(m_as ovn-gw-1 ovs-vsctl --bare --columns=name find > interface options:remote_ip=$ip_gw3) > >> from_gw1_to_ch1=$(m_as ovn-gw-1 ovs-vsctl --bare --columns=name find > interface options:remote_ip=$ip_ch1) > >> +from_gw1_to_ch2=$(m_as ovn-gw-1 ovs-vsctl --bare --columns=name find > interface options:remote_ip=$ip_ch2) > >> from_gw2_to_gw1=$(m_as ovn-gw-2 ovs-vsctl --bare --columns=name find > interface options:remote_ip=$ip_gw1) > >> from_gw2_to_gw3=$(m_as ovn-gw-2 ovs-vsctl --bare --columns=name find > interface options:remote_ip=$ip_gw3) > >> from_gw2_to_ch1=$(m_as ovn-gw-2 ovs-vsctl --bare --columns=name find > interface options:remote_ip=$ip_ch1) > >> +from_gw2_to_ch2=$(m_as ovn-gw-2 ovs-vsctl --bare --columns=name find > interface options:remote_ip=$ip_ch2) > >> from_ch1_to_gw1=$(m_as ovn-chassis-1 ovs-vsctl --bare --columns=name > find interface options:remote_ip=$ip_gw1) > >> from_ch1_to_gw2=$(m_as ovn-chassis-1 ovs-vsctl --bare --columns=name > find interface options:remote_ip=$ip_gw2) > >> +from_ch2_to_gw1=$(m_as ovn-chassis-2 ovs-vsctl --bare --columns=name > find interface options:remote_ip=$ip_gw1) > >> +from_ch2_to_gw2=$(m_as ovn-chassis-2 ovs-vsctl --bare --columns=name > 
find interface options:remote_ip=$ip_gw2) > >> > >> m_as ovn-chassis-1 ip link del hv1-vif1-p > >> -m_as ovn-chassis-2 ip link del ext1-p > >> +m_as ovn-chassis-2 ip link del hv2-vif1-p > >> +m_as ovn-chassis-3 ip link del ext1-p > >> > >> OVS_WAIT_UNTIL([m_as ovn-chassis-1 ip link show | grep -q genev_sys]) > >> OVS_WAIT_UNTIL([m_as ovn-chassis-2 ip link show | grep -q genev_sys]) > >> +OVS_WAIT_UNTIL([m_as ovn-chassis-3 ip link show | grep -q genev_sys]) > >> OVS_WAIT_UNTIL([m_as ovn-gw-1 ip link show | grep -q genev_sys]) > >> OVS_WAIT_UNTIL([m_as ovn-gw-2 ip link show | grep -q genev_sys]) > >> OVS_WAIT_UNTIL([m_as ovn-gw-3 ip link show | grep -q genev_sys]) > >> > >> +# Use "aggressive" bfd parameters > >> +check multinode_nbctl set NB_Global . options:"bfd-min-rx"=500 > >> +check multinode_nbctl set NB_Global . options:"bfd-min-tx"=100 > >> check multinode_nbctl ls-add inside > >> check multinode_nbctl ls-add outside > >> check multinode_nbctl ls-add ext > >> check multinode_nbctl lsp-add inside inside1 -- lsp-set-addresses > inside1 "f0:00:c0:a8:01:01 192.168.1.1" > >> +check multinode_nbctl lsp-add inside inside2 -- lsp-set-addresses > inside2 "f0:00:c0:a8:01:02 192.168.1.2" > >> check multinode_nbctl lsp-add ext ext1 -- lsp-set-addresses ext1 > "00:00:c0:a8:00:01 192.168.0.1" > >> > >> multinode_nbctl create Logical_Router name=R1 > >> @@ -3100,12 +3121,14 @@ m_as ovn-gw-3 ovs-vsctl remove open . > external_ids garp-max-timeout-sec > >> > >> m_as ovn-chassis-1 ovs-vsctl set open . > external-ids:ovn-bridge-mappings=public:br-ex > >> m_as ovn-chassis-2 ovs-vsctl set open . > external-ids:ovn-bridge-mappings=public:br-ex > >> +m_as ovn-chassis-3 ovs-vsctl set open . > external-ids:ovn-bridge-mappings=public:br-ex > >> m_as ovn-gw-1 ovs-vsctl set open . > external-ids:ovn-bridge-mappings=public:br-ex > >> m_as ovn-gw-2 ovs-vsctl set open . > external-ids:ovn-bridge-mappings=public:br-ex > >> m_as ovn-gw-3 ovs-vsctl set open . 
> external-ids:ovn-bridge-mappings=public:br-ex > >> > >> m_as ovn-chassis-1 /data/create_fake_vm.sh inside1 hv1-vif1 > f0:00:c0:a8:01:01 1500 192.168.1.1 24 192.168.1.254 2000::1/64 2000::a > >> -m_as ovn-chassis-2 /data/create_fake_vm.sh ext1 ext1 00:00:c0:a8:00:01 > 1500 192.168.0.1 24 192.168.0.254 1000::3/64 1000::a > >> +m_as ovn-chassis-2 /data/create_fake_vm.sh inside2 hv2-vif1 > f0:00:c0:a8:01:02 1500 192.168.1.2 24 192.168.1.254 2000::2/64 2000::a > >> +m_as ovn-chassis-3 /data/create_fake_vm.sh ext1 ext1 00:00:c0:a8:00:01 > 1500 192.168.0.1 24 192.168.0.254 1000::3/64 1000::a > >> > >> # There should be one ha_chassis_group with the name "R1_outside" > >> m_check_row_count HA_Chassis_Group 1 name=R1_outside > >> @@ -3160,53 +3183,67 @@ for chassis in $from_ch1_to_gw1 > $from_ch1_to_gw2; do > >> wait_bfd_enabled ovn-chassis-1 $chassis > >> done > >> > >> +# check BFD enablement on tunnel ports from ovn-chassis-2 ########### > >> +for chassis in $from_ch2_to_gw1 $from_ch2_to_gw2; do > >> + echo "checking ovn-chassis-2 -> $chassis" > >> + wait_bfd_enabled ovn-chassis-2 $chassis > >> +done > >> + > >> # Make sure there is no nft table left. Do not use nft directly as > might not be installed in container. 
> >> gw1_pid=$(podman inspect -f '{{.State.Pid}}' ovn-gw-1) > >> nsenter --net=/proc/$gw1_pid/ns/net nft list tables | grep ovn-test && > nsenter --net=/proc/$gw1_pid/ns/net nft delete table ip ovn-test > >> -on_exit "nsenter --net=/proc/$gw1_pid/ns/net nft list tables | grep > ovn-test && nsenter --net=/proc/$gw1_pid/ns/net nft delete table ip > ovn-test" > >> +on_exit "if [[ -d "/proc/$gw1_pid" ]]; then nsenter > --net=/proc/$gw1_pid/ns/net nft list tables | grep ovn-test && nsenter > --net=/proc/$gw1_pid/ns/net nft delete table ip ovn-test; fi" > >> > >> -for chassis in $from_gw1_to_gw2 $from_gw1_to_gw3 $from_gw1_to_ch1; do > >> +for chassis in $from_gw1_to_gw2 $from_gw1_to_gw3 $from_gw1_to_ch1 > $from_gw1_to_ch2; do > >> wait_bfd_up ovn-gw-1 $chassis > >> done > >> -for chassis in $from_gw2_to_gw1 $from_gw2_to_gw3 $from_gw2_to_ch1; do > >> +for chassis in $from_gw2_to_gw1 $from_gw2_to_gw3 $from_gw2_to_ch1 > $from_gw2_to_ch2; do > >> wait_bfd_up ovn-gw-2 $chassis > >> done > >> for chassis in $from_ch1_to_gw1 $from_ch1_to_gw2; do > >> wait_bfd_up ovn-chassis-1 $chassis > >> done > >> +for chassis in $from_ch2_to_gw1 $from_ch2_to_gw2; do > >> + wait_bfd_up ovn-chassis-2 $chassis > >> +done > >> > >> m_wait_row_count Port_Binding 1 logical_port=cr-R1_outside > chassis=$gw1_chassis > >> check multinode_nbctl --wait=hv sync > >> > >> start_tcpdump() { > >> echo "$(date +%H:%M:%S.%03N) Starting tcpdump" > >> - M_START_TCPDUMP([ovn-chassis-1], [-neei hv1-vif1-p], [ch1]) > >> - M_START_TCPDUMP([ovn-chassis-2], [-neei eth2], [ch2]) > >> - M_START_TCPDUMP([ovn-gw-1], [-neei eth2], [gw1]) > >> - M_START_TCPDUMP([ovn-gw-1], [-neei eth2 -Q out], [gw1_out]) > >> - M_START_TCPDUMP([ovn-gw-2], [-neei eth2], [gw2]) > >> - M_START_TCPDUMP([ovn-gw-2], [-neei eth2 -Q out], [gw2_out]) > >> - M_START_TCPDUMP([ovn-gw-3], [-neei eth2], [gw3]) > >> - M_START_TCPDUMP([ovn-gw-3], [-neei eth2 -Q out], [gw3_out]) > >> + M_START_TCPDUMPS([ovn-chassis-1], [-neei hv1-vif1-p], [ch1], > >> + 
[ovn-chassis-2], [-neei hv2-vif1-p], [ch2], > >> + [ovn-chassis-3], [-neei eth2], [ch3], > >> + [ovn-gw-1], [-neei eth2], [gw1], > >> + [ovn-gw-1], [-neei eth2 -Q out], [gw1_out], > >> + [ovn-gw-2], [-neei eth2], [gw2], > >> + [ovn-gw-2], [-neei eth2 -Q out], [gw2_out], > >> + [ovn-gw-3], [-neei eth2], [gw3], > >> + [ovn-gw-3], [-neei eth2 -Q out], [gw3_out], > >> + [ovn-gw-1], [-neei eth1], [gw1_eth1], > >> + [ovn-gw-2], [-neei eth1], [gw2_eth1], > >> + [ovn-chassis-1], [-neei eth1], [ch1_eth1], > >> + [ovn-chassis-2], [-neei eth1], [ch2_eth1]) > >> } > >> > >> stop_tcpdump() { > >> echo "$(date +%H:%M:%S.%03N) Stopping tcpdump" > >> - m_kill 'ovn-gw-1 ovn-gw-2 ovn-gw-3 ovn-chassis-1 ovn-chassis-2' > tcpdump > >> + m_kill 'ovn-gw-1 ovn-gw-2 ovn-gw-3 ovn-chassis-1 ovn-chassis-2 > ovn-chassis-3' tcpdump > >> } > >> > >> -# Send packets from chassis2 (ext1) to chassis1 > >> +# Send packets from ovn-chassis-3 (ext1) to ovn-chassis-1 > >> send_background_packets() { > >> echo "$(date +%H:%M:%S.%03N) Sending packets in Background" > >> start_tcpdump > >> - M_NS_DAEMONIZE([ovn-chassis-2], [ext1], [ping -f -i 0.1 > 192.168.1.1], [ping.pid]) > >> + M_NS_DAEMONIZE([ovn-chassis-3], [ext1], [ping -f -i 0.1 > 192.168.1.1], [ping.pid]) > >> } > >> > >> stop_sending_background_packets() { > >> echo "$(date +%H:%M:%S.%03N) Stopping Background process" > >> m_as ovn-chassis-1 ps -ef | grep -v grep | grep -q ping && \ > >> m_as ovn-chassis-1 echo "Stopping ping on ovn-chassis-1" && > killall ping > >> - m_as ovn-chassis-2 ps -ef | grep -v grep | grep -q ping && \ > >> + m_as ovn-chassis-3 ps -ef | grep -v grep | grep -q ping && \ > >> m_as ovn-chassis-2 echo "Stopping ping on ovn-chassis-2" && > killall ping > >> stop_tcpdump > >> } > >> @@ -3216,8 +3253,8 @@ check_for_new_garps() { > >> expecting_garp=$2 > >> n_new_garps=$(cat ${hv}_out.tcpdump | grep -c "f0:00:c0:a8:00:fe > > Broadcast, ethertype ARP (0x0806), length 42: Request who-has 192.168.0.254 > tell 192.168.0.254, length 
28") > >> > >> - if [ "$expecting_garp" == "true" ]; then > >> - AS_BOX([$(date +%H:%M:%S.%03N) Waiting/checking for garp from > $hv - Starting with $n_new_garps]) > >> + if [[ "$expecting_garp" == "true" ]]; then > >> + echo "$(date +%H:%M:%S.%03N) Waiting/checking for garp from > $hv - Starting with $n_new_garps" > >> OVS_WAIT_UNTIL([ > >> n_garps=$n_new_garps > >> n_new_garps=$(cat ${hv}_out.tcpdump | grep -c > "f0:00:c0:a8:00:fe > Broadcast, ethertype ARP (0x0806), length 42: Request > who-has 192.168.0.254 tell 192.168.0.254, length 28") > >> @@ -3225,7 +3262,7 @@ check_for_new_garps() { > >> test "$n_garps" -ne "$n_new_garps" > >> ]) > >> else > >> - AS_BOX([$(date +%H:%M:%S.%03N) Checking no garp from ${hv}]) > >> + echo "$(date +%H:%M:%S.%03N) Checking no garp from ${hv}" > >> # Waiting a few seconds to get a chance to see unexpected > garps. > >> sleep 3 > >> n_garps=$(cat ${hv}_out.tcpdump | grep -c "f0:00:c0:a8:00:fe > > Broadcast, ethertype ARP (0x0806), length 42: Request who-has 192.168.0.254 > tell 192.168.0.254, length 28") > >> @@ -3241,8 +3278,8 @@ check_for_new_echo_pkts() { > >> n_new_echo_req=$(cat ${hv}.tcpdump | grep -c "$mac_src > $mac_dst, > ethertype IPv4 (0x0800), length 98: 192.168.0.1 > 192.168.1.1: ICMP echo > request") > >> n_new_echo_rep=$(cat ${hv}.tcpdump | grep -c "$mac_dst > $mac_src, > ethertype IPv4 (0x0800), length 98: 192.168.1.1 > 192.168.0.1: ICMP echo > reply") > >> > >> - if [ "$expecting_pkts" == "true" ]; then > >> - AS_BOX([$(date +%H:%M:%S.%03N) Waiting/checking for echo pkts > through ${hv}]) > >> + if [[ "$expecting_pkts" == "true" ]]; then > >> + echo "$(date +%H:%M:%S.%03N) Waiting/checking for echo pkts > through ${hv}" > >> echo "Starting with $n_new_echo_req requests and > $n_new_echo_rep replies so far on ${hv}." 
> >> OVS_WAIT_UNTIL([ > >> n_echo_req=$n_new_echo_req > >> @@ -3253,7 +3290,7 @@ check_for_new_echo_pkts() { > >> test "$n_echo_req" -ne "$n_new_echo_req" && test > "$n_echo_rep" -ne "$n_new_echo_rep" > >> ]) > >> else > >> - AS_BOX([$(date +%H:%M:%S.%03N) Checking no pkts from ${hv}]) > >> + echo "$(date +%H:%M:%S.%03N) Checking no pkts from ${hv}" > >> # Waiting a few seconds to get a chance to see unexpected pkts. > >> sleep 3 > >> n_echo_req=$(cat ${hv}.tcpdump | grep -c "$mac_src > $mac_dst, > ethertype IPv4 (0x0800), length 98: 192.168.0.1 > 192.168.1.1: ICMP echo > request") > >> @@ -3271,22 +3308,44 @@ dump_statistics() { > >> ch1_rep=$(grep -c "ICMP echo reply" ch1.tcpdump) > >> ch2_req=$(grep -c "ICMP echo request" ch2.tcpdump) > >> ch2_rep=$(grep -c "ICMP echo reply" ch2.tcpdump) > >> + ch3_req=$(grep -c "ICMP echo request" ch3.tcpdump) > >> + ch3_rep=$(grep -c "ICMP echo reply" ch3.tcpdump) > >> gw1_req=$(grep -c "ICMP echo request" gw1.tcpdump) > >> gw1_rep=$(grep -c "ICMP echo reply" gw1.tcpdump) > >> gw2_req=$(grep -c "ICMP echo request" gw2.tcpdump) > >> gw2_rep=$(grep -c "ICMP echo reply" gw2.tcpdump) > >> gw3_req=$(grep -c "ICMP echo request" gw3.tcpdump) > >> gw3_rep=$(grep -c "ICMP echo reply" gw3.tcpdump) > >> - echo "$n1 claims in gw1, $n2 in gw2 and $n3 on gw3" > >> - echo "ch2_request=$ch2_req gw1_request=$gw1_req > gw2_request=$gw2_req gw3_request=$gw3_req ch1_request=$ch1_req > ch1_reply=$ch1_rep gw1_reply=$gw1_rep gw2_reply=$gw2_rep gw3_reply=$gw3_rep > ch2_reply=$ch2_rep" > >> + echo "$n1 claims in gw1, $n2 in gw2 and $n3 on gw3" >&2 > >> + echo "ch3_req=$ch3_req gw_req=($gw1_req + $gw2_req +$gw3_req) > ch1_req=$ch1_req ch1_rep=$ch1_rep gw_rep=($gw1_rep + $gw2_rep + $gw3_rep) > ch3_rep=$ch3_rep ch2=($ch2_req+$ch2_rep)" >&2 > >> + echo "$((ch3_req - ch3_rep))" > >> } > >> > >> -check_migration_between_gw1_and_gw2() { > >> - action=$1 > >> - send_background_packets > >> +add_port() { > >> + bridge=$1 > >> + interface=$2 > >> + address=$3 > 
>> + echo "Adding $bridge $interface $address" > >> + > >> + pid=$(podman inspect -f '{{.State.Pid}}' ovn-gw-1) > >> + ln -sf /proc/$pid/ns/net /var/run/netns/$pid > >> + port=$(OVS_RUNDIR= ovs-vsctl --data=bare --no-heading > --columns=name find interface \ > >> + external_ids:container_id=ovn-gw-1 > external_ids:container_iface="$interface") > >> + port="${port:0:13}" > >> + ip link add "${port}_l" type veth peer name "${port}_c" > >> + ip link set "${port}_l" up > >> + ip link set "${port}_c" netns $pid > >> + ip netns exec $pid ip link set dev "${port}_c" name "$interface" > >> + ip netns exec $pid ip link set "$interface" up > >> + if [[ -n "$address" ]]; then > >> + ip netns exec $pid ip addr add "$address" dev "$interface" > >> + fi > > I might be wrong but I think nobody cleans up any of these ports created > in the ovn-gw-1 container. Do we need some on_exit() calls here? > I do not think so. We stop a container, and ports on host were deleted as a consequence. add_port just "repairs" the container, after restarting it, adding back the ports which ovn-fake-multinode initially added. We could debate whether add_port should itself be added in on_exit, right after podman stop the container. But it looked overkill to me (we do add_port right after podman stop/start, so on_exit would only be useful if podman stop/start failed (in which case cluster is anyhow in bad state) or if e.g. we ctrl-c just between podman stop and add_port WDYT? > > >> +} > >> > >> +prepare() { > >> + send_background_packets > >> # We make sure gw1 is leader since enough time that it generated > all its garps. 
> >> - AS_BOX([$(date +%H:%M:%S.%03N) Waiting all garps sent by gw1]) > >> + echo $(date +%H:%M:%S.%03N) Waiting all garps sent by gw1 > >> n_new_garps=$(cat gw1_out.tcpdump | grep -c "f0:00:c0:a8:00:fe > > Broadcast, ethertype ARP (0x0806), length 42: Request who-has 192.168.0.254 > tell 192.168.0.254, length 28") > >> OVS_WAIT_UNTIL([ > >> n_garps=$n_new_garps > >> @@ -3302,130 +3361,267 @@ check_migration_between_gw1_and_gw2() { > >> check_for_new_echo_pkts gw2 "00:00:c0:a8:00:01" > "f0:00:c0:a8:00:fe" "false" > >> check_for_new_echo_pkts gw3 "00:00:c0:a8:00:01" > "f0:00:c0:a8:00:fe" "false" > >> > >> + # All packets should go through gw1, and none through gw2 or gw3. > >> + check_packets "true" "false" "false" "true" > >> flap_count_gw_1=$(m_as ovn-gw-1 ovs-vsctl get interface > $from_gw1_to_gw2 bfd_status | sed 's/.*flap_count=\"\([[0-9]]*\).*/\1/g') > >> flap_count_gw_2=$(m_as ovn-gw-2 ovs-vsctl get interface > $from_gw2_to_gw1 bfd_status | sed 's/.*flap_count=\"\([[0-9]]*\).*/\1/g') > >> +} > >> > >> - if [ test "$action" == "stop_bfd" ]; then > >> - AS_BOX([$(date +%H:%M:%S.%03N) Blocking bfd on gw1 (from > $ip_gw1 to $ip_gw2)]) > >> - nsenter --net=/proc/$gw1_pid/ns/net nft add table ip ovn-test > >> - nsenter --net=/proc/$gw1_pid/ns/net nft 'add chain ip ovn-test > INPUT { type filter hook input priority 0; policy accept; }' > >> - # Drop BFD from gw-1 to gw-2: geneve port (6081), inner port > 3784 (0xec8), Session state Up, Init, Down. 
> >> - nsenter --net=/proc/$gw1_pid/ns/net nft add rule ip ovn-test > INPUT ip daddr $ip_gw1 ip saddr $ip_gw2 udp dport 6081 '@th,416,16 == > 0x0ec8 @th,472,8 == 0xc0 counter drop' > >> - nsenter --net=/proc/$gw1_pid/ns/net nft add rule ip ovn-test > INPUT ip daddr $ip_gw1 ip saddr $ip_gw2 udp dport 6081 '@th,416,16 == > 0x0ec8 @th,472,8 == 0x80 counter drop' > >> - nsenter --net=/proc/$gw1_pid/ns/net nft add rule ip ovn-test > INPUT ip daddr $ip_gw1 ip saddr $ip_gw2 udp dport 6081 '@th,416,16 == > 0x0ec8 @th,472,8 == 0x40 counter drop' > >> - > >> - # We do not check that packets go through gw2 as BFD between > chassis-2 and gw1 is still up > >> - fi > >> - > >> - if [ test "$action" == "kill_gw2" ]; then > >> - AS_BOX([$(date +%H:%M:%S.%03N) Killing gw2 ovn-controller]) > >> - on_exit 'm_as ovn-gw-2 /usr/share/openvswitch/scripts/ovs-ctl > status || > >> - m_as ovn-gw-2 /usr/share/openvswitch/scripts/ovs-ctl > start --system-id=ovn-gw-2' > >> - on_exit 'm_as ovn-gw-2 /usr/share/ovn/scripts/ovn-ctl > status_controller || > >> - m_as ovn-gw-2 /usr/share/ovn/scripts/ovn-ctl > start_controller ${CONTROLLER_SSL_ARGS}' > >> - > >> - m_as ovn-gw-2 kill -9 $(m_as ovn-gw-2 cat > /run/ovn/ovn-controller.pid) > >> - m_as ovn-gw-2 kill -9 $(m_as ovn-gw-2 cat > /run/openvswitch/ovs-vswitchd.pid) > >> - m_as ovn-gw-2 kill -9 $(m_as ovn-gw-2 cat > /run/openvswitch/ovsdb-server.pid) > >> - # Also delete datapath (flows) > >> - m_as ovn-gw-2 ovs-dpctl del-dp system@ovs-system > >> - fi > >> - > >> - if [ test "$action" == "kill_gw1" ]; then > >> - AS_BOX([$(date +%H:%M:%S.%03N) Killing gw1 ovn-controller]) > >> - on_exit 'm_as ovn-gw-1 /usr/share/openvswitch/scripts/ovs-ctl > status || > >> - m_as ovn-gw-1 /usr/share/openvswitch/scripts/ovs-ctl > start --system-id=ovn-gw-1' > >> - on_exit 'm_as ovn-gw-1 /usr/share/ovn/scripts/ovn-ctl > status_controller || > >> - m_as ovn-gw-1 /usr/share/ovn/scripts/ovn-ctl > start_controller ${CONTROLLER_SSL_ARGS}' > >> - > >> - m_as ovn-gw-1 
kill -9 $(m_as ovn-gw-1 cat
> /run/ovn/ovn-controller.pid)
> >> - m_as ovn-gw-1 kill -9 $(m_as ovn-gw-1 cat
> /run/openvswitch/ovs-vswitchd.pid)
> >> - m_as ovn-gw-1 kill -9 $(m_as ovn-gw-1 cat
> /run/openvswitch/ovsdb-server.pid)
> >> - # Also delete datapath (flows)
> >> - m_as ovn-gw-1 ovs-dpctl del-dp system@ovs-system
> >> - fi
> >> +check_loss_after_flap()
> >> +{
> >> + dead=$1
> >> + max_expected_loss=$2
> >>
> >> - if [ test "$action" == "kill_gw2" ]; then
> >> - AS_BOX([$(date +%H:%M:%S.%03N) Waiting for flap count between
> gw1 and gw2 to increase])
> >> + if [[ "$dead" == "gw2" ]]; then
> >> + echo "$(date +%H:%M:%S.%03N) Waiting for flap count between
> gw1 and gw2 to increase"
> >> OVS_WAIT_UNTIL([
> >> new_flap_count=$(m_as ovn-gw-1 ovs-vsctl get interface
> $from_gw1_to_gw2 bfd_status | sed 's/.*flap_count=\"\([[0-9]]*\).*/\1/g')
> >> echo "Comparing $new_flap_count versus $flap_count_gw_1"
> >> test "$new_flap_count" -gt "$((flap_count_gw_1))"
> >> ])
> >> else
> >> - AS_BOX([$(date +%H:%M:%S.%03N) Waiting for flap count between
> gw2 and gw1 to increase])
> >> + echo "$(date +%H:%M:%S.%03N) Waiting for flap count between
> gw2 and gw1 to increase"
> >> OVS_WAIT_UNTIL([
> >> new_flap_count=$(m_as ovn-gw-2 ovs-vsctl get interface
> $from_gw2_to_gw1 bfd_status | sed 's/.*flap_count=\"\([[0-9]]*\).*/\1/g')
> >> echo "Comparing $new_flap_count versus $flap_count_gw_2"
> >> test "$new_flap_count" -gt "$((flap_count_gw_2))"
> >> ])
> >> -
> >> fi
> >> - AS_BOX([$(date +%H:%M:%S.%03N) Flapped!])
> >>
> >> + echo "$(date +%H:%M:%S.%03N) Flapped!"
> >> # Wait a few more seconds for the fight.
> >> + sleep 4 > >> + > >> + echo "$(date +%H:%M:%S.%03N) Statistics after flapping" > >> + lost=$(dump_statistics) > >> + echo "===> $lost packet lost while handling migration" > >> + AT_CHECK([test "$lost" -le "$max_expected_loss"]) > >> +} > >> + > >> +final_check() > >> +{ > >> + action=$1 > >> + lost=$2 > >> + max_expected_loss_after_restoration=$3 > >> + > >> + # Wait a little more to get packets while network is restored > >> sleep 2 > >> - AS_BOX([$(date +%H:%M:%S.%03N) Statistics after flapping]) > >> - dump_statistics > >> - > >> - if [ test "$action" == "stop_bfd" ]; then > >> - # gw1 still alive and gw2 tried to claim => gw1 should restart > generating garps. > >> - check_for_new_garps gw1 "true" > >> - check_for_new_garps gw2 "false" > >> - check_for_new_garps gw3 "false" > >> - check_for_new_echo_pkts gw1 00:00:c0:a8:00:01 > f0:00:c0:a8:00:fe "true" > >> - check_for_new_echo_pkts gw2 00:00:c0:a8:00:01 > f0:00:c0:a8:00:fe "false" > >> - check_for_new_echo_pkts gw3 00:00:c0:a8:00:01 > f0:00:c0:a8:00:fe "false" > >> - check_for_new_echo_pkts ch1 f0:00:c0:a8:01:fe > f0:00:c0:a8:01:01 "true" > >> - AS_BOX([$(date +%H:%M:%S.%03N) Unblocking bfd on gw1]) > >> - nsenter --net=/proc/$gw1_pid/ns/net nft -a list ruleset > >> - nsenter --net=/proc/$gw1_pid/ns/net nft delete table ip > ovn-test > >> - fi > >> + echo "$(date +%H:%M:%S.%03N) Statistics after network restored > (after $action)" > >> + new_lost=$(dump_statistics) > >> + echo "===> $((new_lost - lost)) packets lost during network > restoration" > >> + AT_CHECK([test "$((new_lost - lost))" -le > "$max_expected_loss_after_restoration"]) > >> + stop_sending_background_packets > >> +} > >> > >> - if [ test "$action" == "kill_gw2" ]; then > >> - # gw1 still alive, but gw2 did not try to claim => gw1 should > not generate new garps. 
> >> - check_for_new_garps gw1 "false" > >> - check_for_new_garps gw2 "false" > >> - check_for_new_garps gw3 "false" > >> - check_for_new_echo_pkts gw1 00:00:c0:a8:00:01 > f0:00:c0:a8:00:fe "true" > >> - check_for_new_echo_pkts gw2 00:00:c0:a8:00:01 > f0:00:c0:a8:00:fe "false" > >> - check_for_new_echo_pkts gw3 00:00:c0:a8:00:01 > f0:00:c0:a8:00:fe "false" > >> - check_for_new_echo_pkts ch1 f0:00:c0:a8:01:fe > f0:00:c0:a8:01:01 "true" > >> - AS_BOX([$(date +%H:%M:%S.%03N) Restarting gw2 ovn-vswitchd]) > >> - m_as ovn-gw-2 /usr/share/openvswitch/scripts/ovs-ctl start > --system-id=ovn-gw-2 > >> - > >> - AS_BOX([$(date +%H:%M:%S.%03N) Restarting gw2 ovn-controller]) > >> - m_as ovn-gw-2 /usr/share/ovn/scripts/ovn-ctl start_controller > ${CONTROLLER_SSL_ARGS} > >> - fi > >> +check_garps() > >> +{ > >> + check_for_new_garps gw1 "$1" > >> + check_for_new_garps gw2 "$2" > >> + check_for_new_garps gw3 "$3" > >> +} > >> > >> - if [ test "$action" == "kill_gw1" ]; then > >> - # gw1 died => gw2 should generate garps. 
> >> - check_for_new_garps gw1 "false" > >> - check_for_new_garps gw2 "true" > >> - check_for_new_garps gw3 "false" > >> - check_for_new_echo_pkts gw1 00:00:c0:a8:00:01 > f0:00:c0:a8:00:fe "false" > >> - check_for_new_echo_pkts gw2 00:00:c0:a8:00:01 > f0:00:c0:a8:00:fe "true" > >> - check_for_new_echo_pkts gw3 00:00:c0:a8:00:01 > f0:00:c0:a8:00:fe "false" > >> - check_for_new_echo_pkts ch1 f0:00:c0:a8:01:fe > f0:00:c0:a8:01:01 "true" > >> - AS_BOX([$(date +%H:%M:%S.%03N) Restarting gw1 ovn-vswitchd]) > >> - m_as ovn-gw-1 /usr/share/openvswitch/scripts/ovs-ctl start > --system-id=ovn-gw-1 > >> - > >> - AS_BOX([$(date +%H:%M:%S.%03N) Restarting gw1 ovn-controller]) > >> - m_as ovn-gw-1 /usr/share/ovn/scripts/ovn-ctl start_controller > ${CONTROLLER_SSL_ARGS} > >> - fi > >> +check_packets() > >> +{ > >> + check_for_new_echo_pkts gw1 00:00:c0:a8:00:01 f0:00:c0:a8:00:fe > "$1" > >> + check_for_new_echo_pkts gw2 00:00:c0:a8:00:01 f0:00:c0:a8:00:fe > "$2" > >> + check_for_new_echo_pkts gw3 00:00:c0:a8:00:01 f0:00:c0:a8:00:fe > "$3" > >> + check_for_new_echo_pkts ch1 f0:00:c0:a8:01:fe f0:00:c0:a8:01:01 > "$4" > >> +} > >> + > >> +check_migration_between_gw1_and_gw2_bfd_stop() > >> +{ > >> + AS_BOX([$(date +%H:%M:%S.%03N) Testing migration after bfd_stop]) > >> + max_expected_max_expected_loss1=$1 > >> + max_expected_max_expected_loss2=$2 > >> + prepare > >> + > >> + echo "$(date +%H:%M:%S.%03N) Blocking bfd on gw1 (from $ip_gw1 to > $ip_gw2)" > >> + nsenter --net=/proc/$gw1_pid/ns/net nft add table ip ovn-test > >> + nsenter --net=/proc/$gw1_pid/ns/net nft 'add chain ip ovn-test > INPUT { type filter hook input priority 0; policy accept; }' > >> + # Drop BFD from gw-1 to gw-2: geneve port (6081), inner port 3784 > (0xec8), Session state Up, Init, Down. 
> >> + nsenter --net=/proc/$gw1_pid/ns/net nft add rule ip ovn-test INPUT > ip daddr $ip_gw1 ip saddr $ip_gw2 udp dport 6081 '@th,416,16 == 0x0ec8 > @th,472,8 == 0xc0 counter drop' > >> + nsenter --net=/proc/$gw1_pid/ns/net nft add rule ip ovn-test INPUT > ip daddr $ip_gw1 ip saddr $ip_gw2 udp dport 6081 '@th,416,16 == 0x0ec8 > @th,472,8 == 0x80 counter drop' > >> + nsenter --net=/proc/$gw1_pid/ns/net nft add rule ip ovn-test INPUT > ip daddr $ip_gw1 ip saddr $ip_gw2 udp dport 6081 '@th,416,16 == 0x0ec8 > @th,472,8 == 0x40 counter drop' > >> + > >> + check_loss_after_flap "gw1" $max_expected_max_expected_loss1 > >> + > >> + # gw1 still alive and gw2 tried to claim => gw1 should restart > generating garps. > >> + check_garps "true" "false" "false" > >> + check_packets "true" "false" "false" "true" > >> + > >> + echo "$(date +%H:%M:%S.%03N) Unblocking bfd on gw1" > >> + nsenter --net=/proc/$gw1_pid/ns/net nft -a list ruleset > >> + nsenter --net=/proc/$gw1_pid/ns/net nft delete table ip ovn-test > >> > >> # The network is now restored => packets should go through gw1 and > reach chassis-1. 
> >> - check_for_new_echo_pkts gw1 00:00:c0:a8:00:01 f0:00:c0:a8:00:fe > "true" > >> - check_for_new_echo_pkts gw2 00:00:c0:a8:00:01 f0:00:c0:a8:00:fe > "false" > >> - check_for_new_echo_pkts gw3 00:00:c0:a8:00:01 f0:00:c0:a8:00:fe > "false" > >> - check_for_new_echo_pkts ch1 f0:00:c0:a8:01:fe f0:00:c0:a8:01:01 > "true" > >> - AS_BOX([$(date +%H:%M:%S.%03N) Statistics after network restored]) > >> - dump_statistics > >> - stop_sending_background_packets > >> + check_packets "true" "false" "false" "true" > >> + final_check "bfd_stop" $lost $max_expected_max_expected_loss2 > >> +} > >> + > >> +check_migration_between_gw1_and_gw2_kill_gw2() { > >> + AS_BOX([$(date +%H:%M:%S.%03N) Check migration after killing gw2 > ovn-controller & vswitchd]) > >> + max_expected_loss1=$1 > >> + max_expected_loss2=$2 > >> + prepare > >> + > >> + on_exit 'm_as ovn-gw-2 /usr/share/openvswitch/scripts/ovs-ctl > status || > >> + m_as ovn-gw-2 /usr/share/openvswitch/scripts/ovs-ctl > start --system-id=ovn-gw-2' > >> + on_exit 'm_as ovn-gw-2 /usr/share/ovn/scripts/ovn-ctl > status_controller || > >> + m_as ovn-gw-2 /usr/share/ovn/scripts/ovn-ctl > start_controller ${CONTROLLER_SSL_ARGS}' > >> + > >> + m_as ovn-gw-2 kill -9 $(m_as ovn-gw-2 cat > /run/ovn/ovn-controller.pid) > >> + m_as ovn-gw-2 kill -9 $(m_as ovn-gw-2 cat > /run/openvswitch/ovs-vswitchd.pid) > >> + m_as ovn-gw-2 kill -9 $(m_as ovn-gw-2 cat > /run/openvswitch/ovsdb-server.pid) > >> + m_as ovn-gw-2 ovs-dpctl del-dp system@ovs-system > >> + > >> + check_loss_after_flap "gw2" $max_expected_loss1 > >> + > >> + # gw1 still alive, but gw2 did not try to claim => gw1 should not > generate new garps. 
> >> + check_garps "false" "false" "false"
> >> + check_packets "true" "false" "false" "true"
> >> +
> >> + echo "$(date +%H:%M:%S.%03N) Restarting gw2 ovn-vswitchd"
> >> + m_as ovn-gw-2 /usr/share/openvswitch/scripts/ovs-ctl start
> --system-id=ovn-gw-2
> >> +
> >> + echo "$(date +%H:%M:%S.%03N) Restarting gw2 ovn-controller"
> >> + m_as ovn-gw-2 /usr/share/ovn/scripts/ovn-ctl start_controller
> ${CONTROLLER_SSL_ARGS}
> >> +
> >> + # The network is now restored => packets should go through gw1 and
> reach chassis-1.
> >> + check_packets "true" "false" "false" "true"
> >> + final_check "kill_gw2" $lost $max_expected_loss2
> >> +}
> >> +
> >> +check_migration_between_gw1_and_gw2_update_ovs() {
> >> + AS_BOX([$(date +%H:%M:%S.%03N) Check migration after restarting
> gw1 ovs-vswitchd ("update")])
> >> + max_expected_loss1=$1
> >> + max_expected_loss2=$2
> >> + prepare
> >> +
> >> + m_as ovn-gw-1 /usr/share/openvswitch/scripts/ovs-ctl restart
> --system-id=ovn-gw-1
> >> +
> >> + check_loss_after_flap "gw1" $max_expected_loss1
> >> +
> >> + # The network is now restored => packets should go through gw1 and
> reach chassis-1.
> >> + check_packets "true" "false" "false" "true" > >> + final_check "ovs_update" $lost $max_expected_loss2 > >> +} > >> + > >> +check_migration_between_gw1_and_gw2_kill_gw1() { > >> + AS_BOX([$(date +%H:%M:%S.%03N) Killing gw1 ovn-controller and > ovs-vswitchd]) > >> + max_expected_loss1=$1 > >> + max_expected_loss2=$2 > >> + prepare > >> + > >> + on_exit 'm_as ovn-gw-1 /usr/share/openvswitch/scripts/ovs-ctl > status || > >> + m_as ovn-gw-1 /usr/share/openvswitch/scripts/ovs-ctl > start --system-id=ovn-gw-1' > >> + on_exit 'm_as ovn-gw-1 /usr/share/ovn/scripts/ovn-ctl > status_controller || > >> + m_as ovn-gw-1 /usr/share/ovn/scripts/ovn-ctl > start_controller ${CONTROLLER_SSL_ARGS}' > >> + > >> + m_as ovn-gw-1 kill -9 $(m_as ovn-gw-1 cat > /run/ovn/ovn-controller.pid) > >> + m_as ovn-gw-1 kill -9 $(m_as ovn-gw-1 cat > /run/openvswitch/ovs-vswitchd.pid) > >> + m_as ovn-gw-1 kill -9 $(m_as ovn-gw-1 cat > /run/openvswitch/ovsdb-server.pid) > >> + # Also delete datapath (flows) > >> + m_as ovn-gw-1 ovs-dpctl del-dp system@ovs-system > >> + > >> + check_loss_after_flap "gw1" $max_expected_loss1 > >> + > >> + # gw1 died => gw2 should generate garps. > >> + check_garps "false" "true" "false" > >> + check_packets "false" "true" "false" "true" > >> + echo "$(date +%H:%M:%S.%03N) Restarting gw1 ovn-vswitchd after > killing gw1" > >> + m_as ovn-gw-1 /usr/share/openvswitch/scripts/ovs-ctl start > --system-id=ovn-gw-1 > >> + > >> + # Wait some long time before restarting ovn-controller > >> + sleep 10 > >> + > >> + # gw2 should still be handling packets as OVN not restarted on gw1 > >> + check_packets "false" "true" "false" "true" > >> + > >> + echo "$(date +%H:%M:%S.%03N) Restarting gw1 ovn-controller after > killing gw1" > >> + m_as ovn-gw-1 /usr/share/ovn/scripts/ovn-ctl start_controller > ${CONTROLLER_SSL_ARGS} > >> + > >> + # The network is now restored => packets should go through gw1 and > reach chassis-1. 
> >> + check_packets "true" "false" "false" "true" > >> + final_check "kill_gw1" $lost $max_expected_loss2 > >> +} > >> + > >> +check_migration_between_gw1_and_gw2_reboot_gw1() { > >> + ip_gw1_eth1=$(podman exec ovn-gw-1 ip -brief address show eth1 | > awk '{print $3}' | cut -d/ -f1) > >> + cidr=$(podman exec ovn-gw-1 ip -brief address show eth1 | awk > '{print $3}' | cut -d/ -f2) > >> + AS_BOX([$(date +%H:%M:%S.%03N) Rebooting ovn-gw-1 with > $ip_gw1_eth1/$cidr]) > >> + max_expected_loss1=$1 > >> + max_expected_loss2=$2 > >> + prepare > >> + > >> + podman stop -t 0 ovn-gw-1 > >> + (exec 3>&- 4>&- 5>&- 6>&-; podman start ovn-gw-1) > >> + > >> + add_port br-ovn eth1 $ip_gw1_eth1/$cidr > >> + add_port br-ovn-ext eth2 > >> + M_START_TCPDUMPS([ovn-gw-1], [-neei eth2], [gw1], [ovn-gw-1], > [-neei eth1], [gw1_eth1], [ovn-gw-1], [-neei eth2 -Q out], [gw1_out]) > >> + check_loss_after_flap "gw1" $max_expected_loss1 > >> + > >> + # gw1 died => gw2 should generate garps. > >> + check_garps "false" "true" "false" > >> + check_packets "false" "true" "false" "true" > >> + > >> + echo "$(date +%H:%M:%S.%03N) Restarting gw1 ovn-vswitchd after > rebooting gw1" > >> + m_as ovn-gw-1 /usr/share/openvswitch/scripts/ovs-ctl start > --system-id=ovn-gw-1 > >> + > >> + # Wait some long time before restarting ovn-controller > >> + sleep 10 > >> + > >> + # gw2 should still be handling packets as OVN not restarted on gw1 > >> + check_packets "false" "true" "false" "true" > >> + > >> + echo "$(date +%H:%M:%S.%03N) Restarting gw1 ovn-controller after > rebooting gw1" > >> + m_as ovn-gw-1 /usr/share/ovn/scripts/ovn-ctl start_controller > ${CONTROLLER_SSL_ARGS} > >> + > >> + # The network is now restored => packets should go through gw1 and > reach chassis-1. 
> >> + check_packets "true" "false" "false" "true"
> >> + final_check "reboot_gw1" $lost $max_expected_loss2
> >> +}
> >> +
> >> +check_compute_restart() {
> >> + AS_BOX([$(date +%H:%M:%S.%03N) Killing ovn-chassis-1
> ovn-controller and ovs-vswitchd])
> >> + max_expected_loss=$1
> >> + prepare
> >> +
> >> + # Kill ovn-chassis-1
> >> + echo "$(date +%H:%M:%S.%03N) Killing chassis-1"
> >> + on_exit 'm_as ovn-chassis-1 /usr/share/openvswitch/scripts/ovs-ctl
> status ||
> >> + m_as ovn-chassis-1 /usr/share/openvswitch/scripts/ovs-ctl
> start --system-id=ovn-chassis-1'
> >> + on_exit 'm_as ovn-chassis-1 /usr/share/ovn/scripts/ovn-ctl
> status_controller ||
> >> + m_as ovn-chassis-1 /usr/share/ovn/scripts/ovn-ctl
> start_controller ${CONTROLLER_SSL_ARGS}'
> >> +
> >> + m_as ovn-chassis-1 kill -9 $(m_as ovn-chassis-1 cat
> /run/ovn/ovn-controller.pid)
> >> + m_as ovn-chassis-1 kill -9 $(m_as ovn-chassis-1 cat
> /run/openvswitch/ovs-vswitchd.pid)
> >> + m_as ovn-chassis-1 kill -9 $(m_as ovn-chassis-1 cat
> /run/openvswitch/ovsdb-server.pid)
> >> +
> >> + # Now restart chassis-1
> >> + flap_count=$(m_as ovn-gw-2 ovs-vsctl get interface $from_gw2_to_ch1
> bfd_status | sed 's/.*flap_count=\"\([[0-9]]*\).*/\1/g')
> >> + echo "$(date +%H:%M:%S.%03N) Restarting ovn-chassis-1
> ovn-vswitchd."
> >> + m_as ovn-chassis-1 /usr/share/openvswitch/scripts/ovs-ctl start
> --system-id=ovn-chassis-1
> >> +
> >> + echo "$(date +%H:%M:%S.%03N) Waiting for flap count between gw-1
> and chassis-1 to increase"
> >> + OVS_WAIT_UNTIL([
> >> + new_flap_count=$(m_as ovn-gw-1 ovs-vsctl get interface
> $from_gw1_to_ch1 bfd_status | sed 's/.*flap_count=\"\([[0-9]]*\).*/\1/g')
> >> + echo "Comparing $new_flap_count versus $flap_count"
> >> + test "$new_flap_count" -gt "$((flap_count))"
> >> + ])
> >> +
> >> + wait_bfd_up ovn-chassis-1 $from_ch1_to_gw1
> >> +
> >> + echo "$(date +%H:%M:%S.%03N) Restarting ovn-chassis-1
> ovn-controller."
> >> + m_as ovn-chassis-1 /usr/share/ovn/scripts/ovn-ctl start_controller > ${CONTROLLER_SSL_ARGS} > >> + > >> + # Wait a long time to catch losses > >> + sleep 5 > >> + final_check "compute" 0 $max_expected_loss > >> } > >> > >> start_tcpdump > >> -AS_BOX([$(date +%H:%M:%S.%03N) Sending packet from hv1-vif1(inside1) > to ext1]) > >> +echo "$(date +%H:%M:%S.%03N) Sending packet from hv1-vif1(inside1) to > ext1" > >> M_NS_CHECK_EXEC([ovn-chassis-1], [hv1-vif1], [ping -c3 -q -i 0.1 > 192.168.0.1 | FORMAT_PING], > >> [0], [dnl > >> 3 packets transmitted, 3 received, 0% packet loss, time 0ms > >> @@ -3433,7 +3629,7 @@ M_NS_CHECK_EXEC([ovn-chassis-1], [hv1-vif1], > [ping -c3 -q -i 0.1 192.168.0.1 | F > >> stop_tcpdump > >> > >> # It should have gone through gw1 and not gw2 > >> -AS_BOX([$(date +%H:%M:%S.%03N) Checking it went through gw1 and not > gw2]) > >> +echo "$(date +%H:%M:%S.%03N) Checking it went through gw1 and not gw2" > >> AT_CHECK([cat gw2.tcpdump | grep "ICMP echo"], [1], [dnl > >> ]) > >> > >> @@ -3446,17 +3642,29 @@ f0:00:c0:a8:00:fe > 00:00:c0:a8:00:01, > ethertype IPv4 (0x0800), length 98: 192.1 > >> 00:00:c0:a8:00:01 > f0:00:c0:a8:00:fe, ethertype IPv4 (0x0800), length > 98: 192.168.0.1 > 192.168.1.1: ICMP echo reply, > >> ]) > >> > >> -# We stop bfd between gw1 & gw2, but keep gw1 & gw2 running. > >> -check_migration_between_gw1_and_gw2 "stop_bfd" > >> +# We stop bfd between gw1 & gw2, but keep gw1 & gw2 running. We should > not lose packets. > >> +check_migration_between_gw1_and_gw2_bfd_stop 1 1 > >> > >> # We simulate death of gw2. It should not have any effect. > >> -check_migration_between_gw1_and_gw2 "kill_gw2" > >> +check_migration_between_gw1_and_gw2_kill_gw2 1 1 > >> + > >> +# We simulate ovs update on gw1. When ovs is stopped, flows should > still be handled by Kernel datapath. > >> +# When OVS is restarted, BFD should go down immediately, and gw2 will > start handling packets. 
>> +# There will be packet losses as gw2 will usually see BFD from gw1 up
> (and hence release port) before gw1 sees
> >> +# BFD up (and claim port).
> >> +check_migration_between_gw1_and_gw2_update_ovs 20 1
> >> +
> >> +# We simulate restart of both OVS & OVN gw1. gw2 should take over.
> >> +check_migration_between_gw1_and_gw2_kill_gw1 40 20
> >>
> >> # We simulate death of gw1. gw2 should take over.
> >> -check_migration_between_gw1_and_gw2 "kill_gw1"
> >> +check_migration_between_gw1_and_gw2_reboot_gw1 40 20
> >> +
> >> +# We simulate restart of ovn-chassis-1. We expect ~2 sec of losses as
> we wait for bfd up before starting
> >> +# ovn-controller.
> >> +check_compute_restart 30
> >>
> >> AT_CLEANUP
> >> -])
> >>
> >> AT_SETUP([ovn multinode bgp L2 EVPN])
> >> check_fake_multinode_setup
> >> --
> >> 2.47.1
> >>
> >
>
> Regards,
> Dumitru
>
>
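[Editorial note: the flap-count extraction repeated in the helpers above can be tried in isolation. The bfd_status string below is made up, and the sed pattern is simplified from the autotest version, which escapes the quotes and doubles the brackets for m4 quoting.]

```shell
# Extract flap_count from an OVS bfd_status map string with sed, the
# same way the test helpers do.  The sample status string is
# illustrative only, not real ovs-vsctl output.
bfd_status='{diagnostic="No Diagnostic", flap_count="3", forwarding=true, state=up}'
flap_count=$(printf '%s' "$bfd_status" | sed 's/.*flap_count="\([0-9]*\)".*/\1/')
echo "flap_count=$flap_count"
```

Comparing successive values of this counter, as check_loss_after_flap does, detects a BFD session flap without parsing the rest of the status map.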
On 3/26/26 5:26 PM, Xavier Simonart wrote: > Hi Dumitru > Hi Xavier, > I also have one comment on one of your (no so) nits below. > Thanks > Xavier > [...] >>>> @@ -3271,22 +3308,44 @@ dump_statistics() { >>>> ch1_rep=$(grep -c "ICMP echo reply" ch1.tcpdump) >>>> ch2_req=$(grep -c "ICMP echo request" ch2.tcpdump) >>>> ch2_rep=$(grep -c "ICMP echo reply" ch2.tcpdump) >>>> + ch3_req=$(grep -c "ICMP echo request" ch3.tcpdump) >>>> + ch3_rep=$(grep -c "ICMP echo reply" ch3.tcpdump) >>>> gw1_req=$(grep -c "ICMP echo request" gw1.tcpdump) >>>> gw1_rep=$(grep -c "ICMP echo reply" gw1.tcpdump) >>>> gw2_req=$(grep -c "ICMP echo request" gw2.tcpdump) >>>> gw2_rep=$(grep -c "ICMP echo reply" gw2.tcpdump) >>>> gw3_req=$(grep -c "ICMP echo request" gw3.tcpdump) >>>> gw3_rep=$(grep -c "ICMP echo reply" gw3.tcpdump) >>>> - echo "$n1 claims in gw1, $n2 in gw2 and $n3 on gw3" >>>> - echo "ch2_request=$ch2_req gw1_request=$gw1_req >> gw2_request=$gw2_req gw3_request=$gw3_req ch1_request=$ch1_req >> ch1_reply=$ch1_rep gw1_reply=$gw1_rep gw2_reply=$gw2_rep gw3_reply=$gw3_rep >> ch2_reply=$ch2_rep" >>>> + echo "$n1 claims in gw1, $n2 in gw2 and $n3 on gw3" >&2 >>>> + echo "ch3_req=$ch3_req gw_req=($gw1_req + $gw2_req +$gw3_req) >> ch1_req=$ch1_req ch1_rep=$ch1_rep gw_rep=($gw1_rep + $gw2_rep + $gw3_rep) >> ch3_rep=$ch3_rep ch2=($ch2_req+$ch2_rep)" >&2 >>>> + echo "$((ch3_req - ch3_rep))" >>>> } >>>> >>>> -check_migration_between_gw1_and_gw2() { >>>> - action=$1 >>>> - send_background_packets >>>> +add_port() { >>>> + bridge=$1 >>>> + interface=$2 >>>> + address=$3 >>>> + echo "Adding $bridge $interface $address" >>>> + >>>> + pid=$(podman inspect -f '{{.State.Pid}}' ovn-gw-1) >>>> + ln -sf /proc/$pid/ns/net /var/run/netns/$pid >>>> + port=$(OVS_RUNDIR= ovs-vsctl --data=bare --no-heading >> --columns=name find interface \ >>>> + external_ids:container_id=ovn-gw-1 >> external_ids:container_iface="$interface") >>>> + port="${port:0:13}" >>>> + ip link add "${port}_l" type veth peer name 
"${port}_c" >>>> + ip link set "${port}_l" up >>>> + ip link set "${port}_c" netns $pid >>>> + ip netns exec $pid ip link set dev "${port}_c" name "$interface" >>>> + ip netns exec $pid ip link set "$interface" up >>>> + if [[ -n "$address" ]]; then >>>> + ip netns exec $pid ip addr add "$address" dev "$interface" >>>> + fi >> >> I might be wrong but I think nobody cleans up any of these ports created >> in the ovn-gw-1 container. Do we need some on_exit() calls here? >> > I do not think so. We stop a container, and ports on host were deleted as a > consequence. > add_port just "repairs" the container, after restarting it, adding back the > ports which > ovn-fake-multinode initially added. > We could debate whether add_port should itself be added in on_exit, right > after > podman stop the container. But it looked overkill to me (we do add_port > right after > podman stop/start, so on_exit would only be useful if podman stop/start > failed (in which case cluster > is anyhow in bad state) or if e.g. we ctrl-c just between podman stop and > add_port > WDYT? > Maybe I'm misunderstanding stuff but we don't stop/start containers during one execution of the multinode.at testsuite. At least not in GitHub CI. What we do is: - start the fake-multinode cluster (this does podman run under the hood): CHASSIS_COUNT=4 GW_COUNT=4 ./ovn_cluster.sh start - run "make check-multinode": this only runs "podman exec" commands inside the containers started above - collect logs - stop cluster: ./ovn_cluster.sh stop So what I meant was, if you run the full test suite, when this specific test that called "add_ports()" ends we still have the veth pair configured on the system, at least until the full test suite ends. That may cause issues as tests that happen to run after this specific one will have to take into account that some veths might exist. 
In general, we (should) try to cleanup everything that was created by a test (an instance of AT_SETUP) when that test ends successfully or with a failure. Regards, Dumitru
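[Editorial note: the cleanup convention described above can be sketched with a small trap-based helper. This `on_exit` is a toy re-implementation for illustration only, not the test suite's actual helper; the subshell exists just to make the EXIT trap fire so its output can be captured.]

```shell
# Toy on_exit: stack cleanup commands and run them in reverse (LIFO)
# order when the (sub)shell exits, mirroring the "undo what the test
# created" convention.
demo_output=$(
    CLEANUP_CMDS=""
    on_exit() {
        # Prepend so the most recently registered command runs first.
        CLEANUP_CMDS="$1
$CLEANUP_CMDS"
        trap 'eval "$CLEANUP_CMDS"' EXIT
    }
    on_exit 'echo "deleting host veth end"'
    on_exit 'echo "stopping background pings"'
)
printf '%s\n' "$demo_output"
```

The last command registered runs first, so teardown naturally reverses the order of setup.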
Hi Dumitru On Thu, Mar 26, 2026 at 5:50 PM Dumitru Ceara <dceara@redhat.com> wrote: > On 3/26/26 5:26 PM, Xavier Simonart wrote: > > Hi Dumitru > > > > Hi Xavier, > > > > I also have one comment on one of your (no so) nits below. > > Thanks > > Xavier > > > > [...] > > >>>> @@ -3271,22 +3308,44 @@ dump_statistics() { > >>>> ch1_rep=$(grep -c "ICMP echo reply" ch1.tcpdump) > >>>> ch2_req=$(grep -c "ICMP echo request" ch2.tcpdump) > >>>> ch2_rep=$(grep -c "ICMP echo reply" ch2.tcpdump) > >>>> + ch3_req=$(grep -c "ICMP echo request" ch3.tcpdump) > >>>> + ch3_rep=$(grep -c "ICMP echo reply" ch3.tcpdump) > >>>> gw1_req=$(grep -c "ICMP echo request" gw1.tcpdump) > >>>> gw1_rep=$(grep -c "ICMP echo reply" gw1.tcpdump) > >>>> gw2_req=$(grep -c "ICMP echo request" gw2.tcpdump) > >>>> gw2_rep=$(grep -c "ICMP echo reply" gw2.tcpdump) > >>>> gw3_req=$(grep -c "ICMP echo request" gw3.tcpdump) > >>>> gw3_rep=$(grep -c "ICMP echo reply" gw3.tcpdump) > >>>> - echo "$n1 claims in gw1, $n2 in gw2 and $n3 on gw3" > >>>> - echo "ch2_request=$ch2_req gw1_request=$gw1_req > >> gw2_request=$gw2_req gw3_request=$gw3_req ch1_request=$ch1_req > >> ch1_reply=$ch1_rep gw1_reply=$gw1_rep gw2_reply=$gw2_rep > gw3_reply=$gw3_rep > >> ch2_reply=$ch2_rep" > >>>> + echo "$n1 claims in gw1, $n2 in gw2 and $n3 on gw3" >&2 > >>>> + echo "ch3_req=$ch3_req gw_req=($gw1_req + $gw2_req +$gw3_req) > >> ch1_req=$ch1_req ch1_rep=$ch1_rep gw_rep=($gw1_rep + $gw2_rep + > $gw3_rep) > >> ch3_rep=$ch3_rep ch2=($ch2_req+$ch2_rep)" >&2 > >>>> + echo "$((ch3_req - ch3_rep))" > >>>> } > >>>> > >>>> -check_migration_between_gw1_and_gw2() { > >>>> - action=$1 > >>>> - send_background_packets > >>>> +add_port() { > >>>> + bridge=$1 > >>>> + interface=$2 > >>>> + address=$3 > >>>> + echo "Adding $bridge $interface $address" > >>>> + > >>>> + pid=$(podman inspect -f '{{.State.Pid}}' ovn-gw-1) > >>>> + ln -sf /proc/$pid/ns/net /var/run/netns/$pid > >>>> + port=$(OVS_RUNDIR= ovs-vsctl --data=bare --no-heading > >> 
--columns=name find interface \ > >>>> + external_ids:container_id=ovn-gw-1 > >> external_ids:container_iface="$interface") > >>>> + port="${port:0:13}" > >>>> + ip link add "${port}_l" type veth peer name "${port}_c" > >>>> + ip link set "${port}_l" up > >>>> + ip link set "${port}_c" netns $pid > >>>> + ip netns exec $pid ip link set dev "${port}_c" name "$interface" > >>>> + ip netns exec $pid ip link set "$interface" up > >>>> + if [[ -n "$address" ]]; then > >>>> + ip netns exec $pid ip addr add "$address" dev "$interface" > >>>> + fi > >> > >> I might be wrong but I think nobody cleans up any of these ports created > >> in the ovn-gw-1 container. Do we need some on_exit() calls here? > >> > > I do not think so. We stop a container, and ports on host were deleted > as a > > consequence. > > add_port just "repairs" the container, after restarting it, adding back > the > > ports which > > ovn-fake-multinode initially added. > > We could debate whether add_port should itself be added in on_exit, right > > after > > podman stop the container. But it looked overkill to me (we do add_port > > right after > > podman stop/start, so on_exit would only be useful if podman stop/start > > failed (in which case cluster > > is anyhow in bad state) or if e.g. we ctrl-c just between podman stop and > > add_port > > WDYT? > > > > Maybe I'm misunderstanding stuff but we don't stop/start containers > during one execution of the multinode.at testsuite. At least not in > GitHub CI. What we do is: > I think there is a misunderstanding: the multinode test now stops and starts the container within the test. multinode.at calls podman stop and podman start. When the ovn-gw-1 container is stopped by the multinode test (test 18 so far), the (host related) ports are deleted. So the container (ovn-gw-1) lacks those ports when restarted (as those ports were initially added by ovn-fake-multinode). So, we add them back here to ensure the cluster is in the same state as before the test run. 
I've added a small comment in the test of patch v3 regarding this. > > - start the fake-multinode cluster (this does podman run under the hood): > CHASSIS_COUNT=4 GW_COUNT=4 ./ovn_cluster.sh start > > - run "make check-multinode": this only runs "podman exec" commands > inside the containers started above > > - collect logs > > - stop cluster: > ./ovn_cluster.sh stop > > So what I meant was, if you run the full test suite, when this specific > test that called "add_ports()" ends we still have the veth pair > configured on the system, at least until the full test suite ends. > > That may cause issues as tests that happen to run after this specific > one will have to take into account that some veths might exist. > > In general, we (should) try to cleanup everything that was created by a > test (an instance of AT_SETUP) when that test ends successfully or with > a failure. > Agree. Here we add back what "we" deleted. Oh, and by the way, after test 18 failed, the following tests also failed in CI, which added to the confusion. They were failing because the "add_port" (through ovs-vsctl) failed, preventing us from restoring the cluster in its right state. > > Regards, > Dumitru > Thanks Xavier
On 3/26/26 6:29 PM, Xavier Simonart wrote:
> Hi Dumitru

Hi Xavier,

> On Thu, Mar 26, 2026 at 5:50 PM Dumitru Ceara <dceara@redhat.com> wrote:
>
>> On 3/26/26 5:26 PM, Xavier Simonart wrote:
>>> Hi Dumitru
>>
>> Hi Xavier,
>>
>>> I also have one comment on one of your (not so) nits below.
>>> Thanks
>>> Xavier
>>
>> [...]
>>
>>>>>> @@ -3271,22 +3308,44 @@ dump_statistics() {
>>>>>>      ch1_rep=$(grep -c "ICMP echo reply" ch1.tcpdump)
>>>>>>      ch2_req=$(grep -c "ICMP echo request" ch2.tcpdump)
>>>>>>      ch2_rep=$(grep -c "ICMP echo reply" ch2.tcpdump)
>>>>>> +    ch3_req=$(grep -c "ICMP echo request" ch3.tcpdump)
>>>>>> +    ch3_rep=$(grep -c "ICMP echo reply" ch3.tcpdump)
>>>>>>      gw1_req=$(grep -c "ICMP echo request" gw1.tcpdump)
>>>>>>      gw1_rep=$(grep -c "ICMP echo reply" gw1.tcpdump)
>>>>>>      gw2_req=$(grep -c "ICMP echo request" gw2.tcpdump)
>>>>>>      gw2_rep=$(grep -c "ICMP echo reply" gw2.tcpdump)
>>>>>>      gw3_req=$(grep -c "ICMP echo request" gw3.tcpdump)
>>>>>>      gw3_rep=$(grep -c "ICMP echo reply" gw3.tcpdump)
>>>>>> -    echo "$n1 claims in gw1, $n2 in gw2 and $n3 on gw3"
>>>>>> -    echo "ch2_request=$ch2_req gw1_request=$gw1_req gw2_request=$gw2_req gw3_request=$gw3_req ch1_request=$ch1_req ch1_reply=$ch1_rep gw1_reply=$gw1_rep gw2_reply=$gw2_rep gw3_reply=$gw3_rep ch2_reply=$ch2_rep"
>>>>>> +    echo "$n1 claims in gw1, $n2 in gw2 and $n3 on gw3" >&2
>>>>>> +    echo "ch3_req=$ch3_req gw_req=($gw1_req + $gw2_req +$gw3_req) ch1_req=$ch1_req ch1_rep=$ch1_rep gw_rep=($gw1_rep + $gw2_rep + $gw3_rep) ch3_rep=$ch3_rep ch2=($ch2_req+$ch2_rep)" >&2
>>>>>> +    echo "$((ch3_req - ch3_rep))"
>>>>>> }
>>>>>>
>>>>>> -check_migration_between_gw1_and_gw2() {
>>>>>> -    action=$1
>>>>>> -    send_background_packets
>>>>>> +add_port() {
>>>>>> +    bridge=$1
>>>>>> +    interface=$2
>>>>>> +    address=$3
>>>>>> +    echo "Adding $bridge $interface $address"
>>>>>> +
>>>>>> +    pid=$(podman inspect -f '{{.State.Pid}}' ovn-gw-1)
>>>>>> +    ln -sf /proc/$pid/ns/net /var/run/netns/$pid
>>>>>> +    port=$(OVS_RUNDIR= ovs-vsctl --data=bare --no-heading --columns=name find interface \
>>>>>> +        external_ids:container_id=ovn-gw-1 external_ids:container_iface="$interface")
>>>>>> +    port="${port:0:13}"
>>>>>> +    ip link add "${port}_l" type veth peer name "${port}_c"
>>>>>> +    ip link set "${port}_l" up
>>>>>> +    ip link set "${port}_c" netns $pid
>>>>>> +    ip netns exec $pid ip link set dev "${port}_c" name "$interface"
>>>>>> +    ip netns exec $pid ip link set "$interface" up
>>>>>> +    if [[ -n "$address" ]]; then
>>>>>> +        ip netns exec $pid ip addr add "$address" dev "$interface"
>>>>>> +    fi
>>>>
>>>> I might be wrong but I think nobody cleans up any of these ports created
>>>> in the ovn-gw-1 container. Do we need some on_exit() calls here?
>>>>
>>> I do not think so. We stop a container, and the ports on the host were
>>> deleted as a consequence. add_port just "repairs" the container after
>>> restarting it, adding back the ports which ovn-fake-multinode initially
>>> added.
>>> We could debate whether add_port should itself be added in on_exit,
>>> right after podman stops the container. But it looked overkill to me:
>>> we call add_port right after podman stop/start, so on_exit would only
>>> be useful if podman stop/start failed (in which case the cluster is
>>> anyhow in a bad state) or if e.g. we ctrl-c just between podman stop
>>> and add_port.
>>> WDYT?
>>>
>>
>> Maybe I'm misunderstanding stuff but we don't stop/start containers
>> during one execution of the multinode.at testsuite. At least not in
>> GitHub CI. What we do is:
>>
> I think there is a misunderstanding: the multinode test now stops and
> starts the container within the test. multinode.at calls podman stop
> and podman start.
> When the ovn-gw-1 container is stopped by the multinode test (test 18
> so far), the (host related) ports are deleted. So the container
> (ovn-gw-1) lacks those ports when restarted (as those ports were
> initially added by ovn-fake-multinode).
> So, we add them back here to ensure the cluster is in the same state as
> before the test run.
> I've added a small comment in the test of patch v3 regarding this.
>

Ah, I completely misunderstood that part, thanks for the explanation.
I'll have a closer look at v3.

Regards,
Dumitru

>>
>> - start the fake-multinode cluster (this does podman run under the hood):
>>   CHASSIS_COUNT=4 GW_COUNT=4 ./ovn_cluster.sh start
>>
>> - run "make check-multinode": this only runs "podman exec" commands
>>   inside the containers started above
>>
>> - collect logs
>>
>> - stop cluster:
>>   ./ovn_cluster.sh stop
>>
>> So what I meant was, if you run the full test suite, when this specific
>> test that called "add_ports()" ends we still have the veth pair
>> configured on the system, at least until the full test suite ends.
>>
>> That may cause issues as tests that happen to run after this specific
>> one will have to take into account that some veths might exist.
>>
>> In general, we (should) try to clean up everything that was created by
>> a test (an instance of AT_SETUP) when that test ends, successfully or
>> with a failure.
>>
> Agree. Here we add back what "we" deleted.
> Oh, and by the way, after test 18 failed, the following tests also
> failed in CI, which added to the confusion. They were failing because
> the "add_port" (through ovs-vsctl) failed, preventing us from restoring
> the cluster to its right state.
>
>> Regards,
>> Dumitru
>>
> Thanks
> Xavier
>
diff --git a/controller/bfd.c b/controller/bfd.c
index 3b0c3f6da..56bfa4936 100644
--- a/controller/bfd.c
+++ b/controller/bfd.c
@@ -117,13 +117,14 @@ bfd_calculate_active_tunnels(const struct ovsrec_bridge *br_int,
  *
  * If 'our_chassis' is C5 then this function returns empty bfd set. */
-void
+bool
 bfd_calculate_chassis(
     const struct sbrec_chassis *our_chassis,
     const struct sbrec_ha_chassis_group_table *ha_chassis_grp_table,
     struct sset *bfd_chassis)
 {
     const struct sbrec_ha_chassis_group *ha_chassis_grp;
+    bool chassis_is_ha_gw = false;
     SBREC_HA_CHASSIS_GROUP_TABLE_FOR_EACH (ha_chassis_grp,
                                            ha_chassis_grp_table) {
         bool is_ha_chassis = false;
@@ -143,6 +144,7 @@ bfd_calculate_chassis(
                 sset_add(&grp_chassis, ha_ch->chassis->name);
                 if (our_chassis == ha_ch->chassis) {
                     is_ha_chassis = true;
+                    chassis_is_ha_gw = true;
                     bfd_setup_required = true;
                 }
             }
@@ -178,6 +180,7 @@ bfd_calculate_chassis(
         }
         sset_destroy(&grp_chassis);
     }
+    return chassis_is_ha_gw;
 }

 void
diff --git a/controller/bfd.h b/controller/bfd.h
index f8fece5a5..3e3384891 100644
--- a/controller/bfd.h
+++ b/controller/bfd.h
@@ -16,6 +16,8 @@
 #ifndef OVN_BFD_H
 #define OVN_BFD_H 1

+#include <stdbool.h>
+
 struct hmap;
 struct ovsdb_idl;
 struct ovsdb_idl_index;
@@ -36,7 +38,7 @@ void bfd_run(const struct ovsrec_interface_table *,
              const struct sbrec_sb_global_table *,
              const struct ovsrec_open_vswitch_table *);
-void bfd_calculate_chassis(
+bool bfd_calculate_chassis(
     const struct sbrec_chassis *,
     const struct sbrec_ha_chassis_group_table *,
     struct sset *);
diff --git a/controller/ovn-controller.8.xml b/controller/ovn-controller.8.xml
index 57e7cf5dd..33281a4d6 100644
--- a/controller/ovn-controller.8.xml
+++ b/controller/ovn-controller.8.xml
@@ -531,6 +531,17 @@
         65535.
       </dd>

+      <dt>
+        <code>external_ids:ovn-managed-flow-restore-wait</code> in the
+        <code>Open_vSwitch</code> table
+      </dt>
+      <dd>
+        When set to true, this key indicates that <code>ovn-controller</code>
+        has set the <code>other_config:flow-restore-wait</code> option.
+        The key is set when <code>ovn-controller</code> enables
+        flow-restore-wait and removed when it clears it.
+      </dd>
+
       <dt>
         <code>external_ids:ct-zone-*</code> in the <code>Bridge</code> table
       </dt>
diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
index 5815f1b92..0611b1767 100644
--- a/controller/ovn-controller.c
+++ b/controller/ovn-controller.c
@@ -211,6 +211,150 @@ static char *get_file_system_id(void)
     free(filename);
     return ret;
 }
+
+/* Set/unset flow-restore-wait, and inc ovs next_cfg if false.
+ * When set to true, also sets ovn-managed-flow-restore-wait to true to
+ * indicate ownership. */
+static void set_flow_restore_wait(struct ovsdb_idl_txn *ovs_idl_txn,
+                                  const struct ovsrec_open_vswitch *cfg,
+                                  const struct smap *other_config,
+                                  const char *val, bool ovn_managed)
+{
+    struct smap new_config;
+    smap_clone(&new_config, other_config);
+    smap_replace(&new_config, "flow-restore-wait", val);
+    ovsrec_open_vswitch_set_other_config(cfg, &new_config);
+    if (!strcmp(val, "true")) {
+        ovsrec_open_vswitch_update_external_ids_setkey(
+            cfg, "ovn-managed-flow-restore-wait", "true");
+    } else if (ovn_managed) {
+        ovsrec_open_vswitch_update_external_ids_delkey(
+            cfg, "ovn-managed-flow-restore-wait");
+    }
+    ovsdb_idl_txn_increment(ovs_idl_txn, &cfg->header_,
+                            &ovsrec_open_vswitch_col_next_cfg, true);
+    smap_destroy(&new_config);
+}
+
+static void
+manage_flow_restore_wait(struct ovsdb_idl_txn *ovs_idl_txn,
+                         const struct ovsrec_open_vswitch *cfg,
+                         uint64_t ofctrl_cur_cfg, uint64_t ovs_next_cfg,
+                         int ovs_txn_status, bool is_ha_gw)
+{
+    enum flow_restore_wait_state {
+        FRW_INIT,              /* Initial state */
+        FRW_WAIT_TXN_COMPLETE, /* Sent false, waiting txn to complete */
+        FRW_TXN_SUCCESS,       /* Txn completed. Waiting for OVS Ack. */
+        FRW_DONE               /* Everything completed */
+    };
+
+    static int64_t frw_next_cfg;
+    static enum flow_restore_wait_state frw_state;
+    static bool ofctrl_was_connected = false;
+
+    bool ofctrl_connected = ofctrl_is_connected();
+
+    if (!ovs_idl_txn || !cfg) {
+        return;
+    }
+
+    /* If OVS is stopped/started, make sure flow-restore-wait is toggled */
+    if (ofctrl_connected && !ofctrl_was_connected) {
+        frw_state = FRW_INIT;
+    }
+    ofctrl_was_connected = ofctrl_connected;
+
+    if (!ofctrl_connected) {
+        return;
+    }
+
+    bool frw = smap_get_bool(&cfg->other_config, "flow-restore-wait", false);
+    bool ovn_managed_once = smap_get_bool(&cfg->external_ids,
+                                          "ovn-managed-flow-restore-wait",
+                                          false);
+
+    if (frw && !ovn_managed_once) {
+        /* frw has been set by ovs-ctl. Do not touch. */
+        return;
+    }
+
+    if (!is_ha_gw) {
+        if (frw) {
+            /* frw has once been set by OVN. We are now not an HA chassis
+             * anymore, unset it. */
+            set_flow_restore_wait(ovs_idl_txn, cfg, &cfg->other_config,
+                                  "false", ovn_managed_once);
+        }
+        /* else we are not an HA chassis and frw is false. Ignore it. */
+        return;
+    }
+
+    switch (frw_state) {
+    case FRW_INIT:
+        if (ofctrl_cur_cfg > 0) {
+            set_flow_restore_wait(ovs_idl_txn, cfg, &cfg->other_config,
+                                  "false", ovn_managed_once);
+            frw_state = FRW_WAIT_TXN_COMPLETE;
+            VLOG_INFO("Setting flow-restore-wait=false "
                      "(cur_cfg=%"PRIu64")", ofctrl_cur_cfg);
+        }
+        break;
+
+    case FRW_WAIT_TXN_COMPLETE:
+        /* if (ovs_idl_txn != NULL), the transaction completed.
+         * When the transaction completed, it either failed
+         * (ovs_txn_status == 0) or succeeded (ovs_txn_status != 0) */
+        if (ovs_txn_status == 0) {
+            /* Previous transaction failed. */
+            set_flow_restore_wait(ovs_idl_txn, cfg, &cfg->other_config,
+                                  "false", ovn_managed_once);
+            break;
+        }
+        /* txn succeeded, get next_cfg */
+        frw_next_cfg = ovs_next_cfg;
+        frw_state = FRW_TXN_SUCCESS;
+        /* fall through */
+
+    case FRW_TXN_SUCCESS:
+        if (ovs_next_cfg < frw_next_cfg) {
+            /* DB was reset, next_cfg went backwards */
+            VLOG_INFO("OVS DB reset (next_cfg %"PRId64" -> %"PRIu64"), "
+                      "resetting state",
+                      frw_next_cfg, ovs_next_cfg);
+            set_flow_restore_wait(ovs_idl_txn, cfg, &cfg->other_config,
+                                  "false", ovn_managed_once);
+            frw_state = FRW_WAIT_TXN_COMPLETE;
+            break;
+        }
+
+        if (!frw) {
+            if (cfg->cur_cfg >= frw_next_cfg) {
+                set_flow_restore_wait(ovs_idl_txn, cfg, &cfg->other_config,
+                                      "true", ovn_managed_once);
+                frw_state = FRW_DONE;
+                VLOG_INFO("Setting flow-restore-wait=true");
+            }
+        } else {
+            /* The transaction to false succeeded but frw is true.
+             * So, another task already set it to true */
+            frw_state = FRW_DONE;
+            VLOG_INFO("flow-restore-wait was already true");
+        }
+        break;
+    case FRW_DONE:
+        if (!frw) {
+            /* frw has been removed (e.g. by ovs-ctl restart) or is false
+             * (e.g. txn failed.) */
+            set_flow_restore_wait(ovs_idl_txn, cfg, &cfg->other_config,
+                                  "false", ovn_managed_once);
+            frw_state = FRW_WAIT_TXN_COMPLETE;
+            VLOG_INFO("OVS frw cleared, restarting flow-restore-wait "
+                      "sequence (cur_cfg=%"PRIu64")", ofctrl_cur_cfg);
+        }
+        break;
+    }
+}
+
 /* Only set monitor conditions on tables that are available in the
  * server schema. */
@@ -3381,6 +3525,7 @@ en_mac_cache_cleanup(void *data)

 struct ed_type_bfd_chassis {
     struct sset bfd_chassis;
+    bool is_ha_gw;
 };

 static void *
@@ -3409,8 +3554,9 @@ en_bfd_chassis_run(struct engine_node *node, void *data OVS_UNUSED)
         = chassis_lookup_by_name(sbrec_chassis_by_name, chassis_id);

     sset_clear(&bfd_chassis->bfd_chassis);
-    bfd_calculate_chassis(chassis, ha_chassis_grp_table,
-                          &bfd_chassis->bfd_chassis);
+    bfd_chassis->is_ha_gw = bfd_calculate_chassis(chassis,
+                                                  ha_chassis_grp_table,
+                                                  &bfd_chassis->bfd_chassis);
     return EN_UPDATED;
 }

@@ -7117,6 +7263,7 @@ main(int argc, char *argv[])
     struct unixctl_server *unixctl;
     struct ovn_exit_args exit_args = {0};
     struct br_int_remote br_int_remote = {0};
+    static uint64_t next_cfg = 0;
     int retval;

     /* Read from system-id-override file once on startup. */
@@ -7444,6 +7591,7 @@ main(int argc, char *argv[])

     /* Main loop. */
     int ovnsb_txn_status = 1;
+    int ovs_txn_status = 1;
     bool sb_monitor_all = false;
     struct tracked_acl_ids *tracked_acl_ids = NULL;
     while (!exit_args.exiting) {
@@ -7545,6 +7693,11 @@ main(int argc, char *argv[])
         pinctrl_update_swconn(br_int_remote.target,
                               br_int_remote.probe_interval);

+        if (cfg && ovs_idl_txn && ovs_txn_status == -1) {
+            /* txn was in progress and is now completed */
+            next_cfg = cfg->next_cfg;
+        }
+
         /* Enable ACL matching for double tagged traffic. */
         if (ovs_idl_txn && cfg) {
             int vlan_limit = smap_get_int(
@@ -7894,6 +8047,13 @@ main(int argc, char *argv[])
             stopwatch_start(OFCTRL_SEQNO_RUN_STOPWATCH_NAME, time_msec());
             ofctrl_seqno_run(ofctrl_get_cur_cfg());

+            if (ovs_idl_txn && bfd_chassis_data) {
+                manage_flow_restore_wait(ovs_idl_txn, cfg,
+                                         ofctrl_get_cur_cfg(),
+                                         next_cfg, ovs_txn_status,
+                                         bfd_chassis_data->is_ha_gw);
+            }
+
             stopwatch_stop(OFCTRL_SEQNO_RUN_STOPWATCH_NAME, time_msec());

             stopwatch_start(IF_STATUS_MGR_RUN_STOPWATCH_NAME,
@@ -7993,7 +8153,7 @@ main(int argc, char *argv[])
             OVS_NOT_REACHED();
         }

-        int ovs_txn_status = ovsdb_idl_loop_commit_and_wait(&ovs_idl_loop);
+        ovs_txn_status = ovsdb_idl_loop_commit_and_wait(&ovs_idl_loop);
         if (!ovs_txn_status) {
             /* The transaction failed. */
             vif_plug_clear_deleted(
@@ -8012,6 +8172,9 @@ main(int argc, char *argv[])
                 &vif_plug_deleted_iface_ids);
             vif_plug_finish_changed(
                 &vif_plug_changed_iface_ids);
+            if (cfg) {
+                next_cfg = cfg->next_cfg;
+            }
         } else if (ovs_txn_status == -1) {
             /* The commit is still in progress */
         } else {
@@ -8085,7 +8248,7 @@ loop_done:
     }

     ovsdb_idl_loop_commit_and_wait(&ovnsb_idl_loop);
-    int ovs_txn_status = ovsdb_idl_loop_commit_and_wait(&ovs_idl_loop);
+    ovs_txn_status = ovsdb_idl_loop_commit_and_wait(&ovs_idl_loop);
     if (!ovs_txn_status) {
         /* The transaction failed.
*/ vif_plug_clear_deleted( diff --git a/tests/multinode-macros.at b/tests/multinode-macros.at index 4a74d5127..646ee2d79 100644 --- a/tests/multinode-macros.at +++ b/tests/multinode-macros.at @@ -41,6 +41,28 @@ m4_define([M_START_TCPDUMP], ] ) +m4_define([_M_START_TCPDUMPS_RECURSIVE], [ + m4_if(m4_eval($# > 3), [1], [dnl + names="$names $3" + echo "Running podman exec $1 tcpdump -l $2 >$3.tcpdump 2>$3.stderr" + podman exec $1 tcpdump -l $2 >$3.tcpdump 2>$3.stderr & + echo "podman exec $1 ps -ef | grep -v grep | grep tcpdump && podman exec $1 killall tcpdump" >> cleanup + _M_START_TCPDUMPS_RECURSIVE(m4_shift(m4_shift(m4_shift($@)))) + ]) + ] +) + +# Start Multiple tcpdump. Useful to speed up when many tcpdump +# must be started as waiting for "listening" takes usually 1 second. +m4_define([M_START_TCPDUMPS], + [ + names="" + _M_START_TCPDUMPS_RECURSIVE($@) + for name in $names; do + OVS_WAIT_UNTIL([grep -q "listening" ${name}.stderr]) + done + ] +) # M_FORMAT_CT([ip-addr]) # diff --git a/tests/multinode.at b/tests/multinode.at index 6b9614126..24d7ca27c 100644 --- a/tests/multinode.at +++ b/tests/multinode.at @@ -2986,42 +2986,42 @@ AT_CLEANUP AT_SETUP([HA: Check for missing garp on leader when BFD goes back up]) # Network topology -# ┌────────────────────────────────────────────────────────────────────────────────────────────────────────┐ -# │ │ -# │ ┌───────────────────┐ ┌───────────────────┐ ┌───────────────────┐ ┌───────────────────┐ │ -# │ │ ovn-chassis-1 │ │ ovn-gw-1 │ │ ovn-gw-2 │ │ ovn-chassis-2 │ │ -# │ └─────────┬─────────┘ └───────────────────┘ └───────────────────┘ └───────────────────┘ │ -# │ ┌─────────┴─────────┐ │ -# │ │ inside1 │ │ -# │ │ 192.168.1.1/24 │ │ -# │ └─────────┬─────────┘ │ -# │ ┌─────────┴─────────┐ │ -# │ │ inside │ │ -# │ └─────────┬─────────┘ │ -# │ ┌─────────┴─────────┐ │ -# │ │ 192.168.1.254 │ │ -# │ │ R1 │ │ -# │ │ 192.168.0.254 │ │ -# │ └─────────┬─────────┘ │ -# │ └------eth1---------------┬--------eth1-----------┐ │ -# │ 
┌──────────┴────────┐ ┌─────────┴─────────┐ │ -# │ │ 192.168.1.254 │ │ 192.168.1.254 │ │ -# │ │ R1 │ │ R1 │ │ -# │ │ 192.168.0.254 │ │ 192.168.0.254 │ │ -# │ └─────────┬─────────┘ └─────────┬─────────┘ │ -# │ │ │ ┌───────────────────┐ │ -# │ ┌─────────┴─────────┐ ┌─────────┴─────────┐ │ 192.168.0.1 │ │ -# │ │ outside │ │ outside │ │ ext1 │ │ -# │ └─────────┬─────────┘ └─────────┬─────────┘ └─────────┬─────────┘ │ -# │ ┌─────────┴─────────┐ ┌─────────┴─────────┐ ┌─────────┴─────────┐ │ -# │ │ ln-outside │ │ ln-outside │ │ ln-ext1 │ │ -# │ └─────────┬─────────┘ └─────────┬─────────┘ └─────────┬─────────┘ │ -# │ ┌─────────┴─────────┐ ┌─────────┴─────────┐ ┌─────────┴─────────┐ │ -# │ │ br-ex │ │ br-ex │ │ br-ex │ │ -# │ └─────────┬─────────┘ └─────────┬─────────┘ └─────────┬─────────┘ │ -# │ └---------eth2-----------┴-------eth2-------------┘ │ -# │ │ -# └────────────────────────────────────────────────────────────────────────────────────────────────────────┘ +# ┌────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐ +# │ │ +# │ ┌───────────────────┐ ┌───────────────────┐ ┌───────────────────┐ ┌───────────────────┐ ┌───────────────────┐ │ +# │ │ ovn-chassis-1 │ │ ovn-chassis-2 │ │ ovn-gw-1 │ │ ovn-gw-2 │ │ ovn-chassis-3 │ │ +# │ └─────────┬─────────┘ └─────────┬─────────┘ └───────────────────┘ └───────────────────┘ └───────────────────┘ │ +# │ ┌─────────┴─────────┐ ┌─────────┴─────────┐ │ +# │ │ inside1 │ │ inside2 │ │ +# │ │ 192.168.1.1/24 │ │ 192.168.1.2/24 │ │ +# │ └─────────┬─────────┘ └─────────┬─────────┘ │ +# │ ┌─┴────────────────────────┴─┐ │ +# │ │ inside │ │ +# │ └──────────────┬─────────────┘ │ +# │ ┌─────────┴─────────┐ │ +# │ │ 192.168.1.254 │ │ +# │ │ R1 │ │ +# │ │ 192.168.0.254 │ │ +# │ └─────────┬─────────┘ │ +# │ └------eth1---------------------------┬--------eth1-----------┐ │ +# │ ┌──────────┴────────┐ ┌─────────┴─────────┐ │ +# │ │ 192.168.1.254 │ │ 192.168.1.254 │ │ +# │ │ 
R1 │ │ R1 │ │ +# │ │ 192.168.0.254 │ │ 192.168.0.254 │ │ +# │ └─────────┬─────────┘ └─────────┬─────────┘ │ +# │ │ │ ┌───────────────────┐ │ +# │ ┌─────────┴─────────┐ ┌─────────┴─────────┐ │ 192.168.0.1 │ │ +# │ │ outside │ │ outside │ │ ext1 │ │ +# │ └─────────┬─────────┘ └─────────┬─────────┘ └─────────┬─────────┘ │ +# │ ┌─────────┴─────────┐ ┌─────────┴─────────┐ ┌─────────┴─────────┐ │ +# │ │ ln-outside │ │ ln-outside │ │ ln-ext1 │ │ +# │ └─────────┬─────────┘ └─────────┬─────────┘ └─────────┬─────────┘ │ +# │ ┌─────────┴─────────┐ ┌─────────┴─────────┐ ┌─────────┴─────────┐ │ +# │ │ br-ex │ │ br-ex │ │ br-ex │ │ +# │ └─────────┬─────────┘ └─────────┬─────────┘ └─────────┬─────────┘ │ +# │ └---------eth2-----------┴-------eth2-------------┘ │ +# │ │ +# └────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘ # The goal of this test is the check that GARP are properly generated by higest priority traffic when # BFD goes down, and back up, and this whether the BFD event is due either to some bfd packet lost @@ -3030,6 +3030,12 @@ AT_SETUP([HA: Check for missing garp on leader when BFD goes back up]) # So gw3 should in this test neither send garp or receive packets. # # Enable vconn so we can check the GARP from a log perspective. 
+on_exit "podman exec ovn-gw-1 ovn-appctl vlog/set info" +on_exit "podman exec ovn-gw-1 ovn-appctl vlog/enable-rate-limit" +on_exit "podman exec ovn-gw-2 ovn-appctl vlog/set info" +on_exit "podman exec ovn-gw-2 ovn-appctl vlog/enable-rate-limit" +on_exit "podman exec ovn-gw-3 ovn-appctl vlog/set info" +on_exit "podman exec ovn-gw-3 ovn-appctl vlog/enable-rate-limit" m_as ovn-gw-1 ovn-appctl vlog/set vconn:dbg m_as ovn-gw-2 ovn-appctl vlog/set vconn:dbg m_as ovn-gw-3 ovn-appctl vlog/set vconn:dbg @@ -3037,12 +3043,17 @@ m_as ovn-gw-1 ovn-appctl vlog/disable-rate-limit m_as ovn-gw-2 ovn-appctl vlog/disable-rate-limit m_as ovn-gw-3 ovn-appctl vlog/disable-rate-limit +# Decrease revalidation time on ovs switch simulating ToR. +on_exit "OVS_RUNDIR= ovs-vsctl set Open_vSwitch . other_config:max-revalidator=500" +OVS_RUNDIR= ovs-vsctl set Open_vSwitch . other_config:max-revalidator=100 + check_fake_multinode_setup # Delete the multinode NB and OVS resources before starting the test. cleanup_multinode_resources ip_ch1=$(m_as ovn-chassis-1 ip a show dev eth1 | grep "inet " | awk '{print $2}'| cut -d '/' -f1) +ip_ch2=$(m_as ovn-chassis-2 ip a show dev eth1 | grep "inet " | awk '{print $2}'| cut -d '/' -f1) ip_gw1=$(m_as ovn-gw-1 ip a show dev eth1 | grep "inet " | awk '{print $2}'| cut -d '/' -f1) ip_gw2=$(m_as ovn-gw-2 ip a show dev eth1 | grep "inet " | awk '{print $2}'| cut -d '/' -f1) ip_gw3=$(m_as ovn-gw-3 ip a show dev eth1 | grep "inet " | awk '{print $2}'| cut -d '/' -f1) @@ -3050,25 +3061,35 @@ ip_gw3=$(m_as ovn-gw-3 ip a show dev eth1 | grep "inet " | awk '{print $2}'| cut from_gw1_to_gw2=$(m_as ovn-gw-1 ovs-vsctl --bare --columns=name find interface options:remote_ip=$ip_gw2) from_gw1_to_gw3=$(m_as ovn-gw-1 ovs-vsctl --bare --columns=name find interface options:remote_ip=$ip_gw3) from_gw1_to_ch1=$(m_as ovn-gw-1 ovs-vsctl --bare --columns=name find interface options:remote_ip=$ip_ch1) +from_gw1_to_ch2=$(m_as ovn-gw-1 ovs-vsctl --bare --columns=name find interface 
options:remote_ip=$ip_ch2) from_gw2_to_gw1=$(m_as ovn-gw-2 ovs-vsctl --bare --columns=name find interface options:remote_ip=$ip_gw1) from_gw2_to_gw3=$(m_as ovn-gw-2 ovs-vsctl --bare --columns=name find interface options:remote_ip=$ip_gw3) from_gw2_to_ch1=$(m_as ovn-gw-2 ovs-vsctl --bare --columns=name find interface options:remote_ip=$ip_ch1) +from_gw2_to_ch2=$(m_as ovn-gw-2 ovs-vsctl --bare --columns=name find interface options:remote_ip=$ip_ch2) from_ch1_to_gw1=$(m_as ovn-chassis-1 ovs-vsctl --bare --columns=name find interface options:remote_ip=$ip_gw1) from_ch1_to_gw2=$(m_as ovn-chassis-1 ovs-vsctl --bare --columns=name find interface options:remote_ip=$ip_gw2) +from_ch2_to_gw1=$(m_as ovn-chassis-2 ovs-vsctl --bare --columns=name find interface options:remote_ip=$ip_gw1) +from_ch2_to_gw2=$(m_as ovn-chassis-2 ovs-vsctl --bare --columns=name find interface options:remote_ip=$ip_gw2) m_as ovn-chassis-1 ip link del hv1-vif1-p -m_as ovn-chassis-2 ip link del ext1-p +m_as ovn-chassis-2 ip link del hv2-vif1-p +m_as ovn-chassis-3 ip link del ext1-p OVS_WAIT_UNTIL([m_as ovn-chassis-1 ip link show | grep -q genev_sys]) OVS_WAIT_UNTIL([m_as ovn-chassis-2 ip link show | grep -q genev_sys]) +OVS_WAIT_UNTIL([m_as ovn-chassis-3 ip link show | grep -q genev_sys]) OVS_WAIT_UNTIL([m_as ovn-gw-1 ip link show | grep -q genev_sys]) OVS_WAIT_UNTIL([m_as ovn-gw-2 ip link show | grep -q genev_sys]) OVS_WAIT_UNTIL([m_as ovn-gw-3 ip link show | grep -q genev_sys]) +# Use "aggressive" bfd parameters +check multinode_nbctl set NB_Global . options:"bfd-min-rx"=500 +check multinode_nbctl set NB_Global . 
options:"bfd-min-tx"=100 check multinode_nbctl ls-add inside check multinode_nbctl ls-add outside check multinode_nbctl ls-add ext check multinode_nbctl lsp-add inside inside1 -- lsp-set-addresses inside1 "f0:00:c0:a8:01:01 192.168.1.1" +check multinode_nbctl lsp-add inside inside2 -- lsp-set-addresses inside2 "f0:00:c0:a8:01:02 192.168.1.2" check multinode_nbctl lsp-add ext ext1 -- lsp-set-addresses ext1 "00:00:c0:a8:00:01 192.168.0.1" multinode_nbctl create Logical_Router name=R1 @@ -3100,12 +3121,14 @@ m_as ovn-gw-3 ovs-vsctl remove open . external_ids garp-max-timeout-sec m_as ovn-chassis-1 ovs-vsctl set open . external-ids:ovn-bridge-mappings=public:br-ex m_as ovn-chassis-2 ovs-vsctl set open . external-ids:ovn-bridge-mappings=public:br-ex +m_as ovn-chassis-3 ovs-vsctl set open . external-ids:ovn-bridge-mappings=public:br-ex m_as ovn-gw-1 ovs-vsctl set open . external-ids:ovn-bridge-mappings=public:br-ex m_as ovn-gw-2 ovs-vsctl set open . external-ids:ovn-bridge-mappings=public:br-ex m_as ovn-gw-3 ovs-vsctl set open . 
external-ids:ovn-bridge-mappings=public:br-ex m_as ovn-chassis-1 /data/create_fake_vm.sh inside1 hv1-vif1 f0:00:c0:a8:01:01 1500 192.168.1.1 24 192.168.1.254 2000::1/64 2000::a -m_as ovn-chassis-2 /data/create_fake_vm.sh ext1 ext1 00:00:c0:a8:00:01 1500 192.168.0.1 24 192.168.0.254 1000::3/64 1000::a +m_as ovn-chassis-2 /data/create_fake_vm.sh inside2 hv2-vif1 f0:00:c0:a8:01:02 1500 192.168.1.2 24 192.168.1.254 2000::2/64 2000::a +m_as ovn-chassis-3 /data/create_fake_vm.sh ext1 ext1 00:00:c0:a8:00:01 1500 192.168.0.1 24 192.168.0.254 1000::3/64 1000::a # There should be one ha_chassis_group with the name "R1_outside" m_check_row_count HA_Chassis_Group 1 name=R1_outside @@ -3160,53 +3183,67 @@ for chassis in $from_ch1_to_gw1 $from_ch1_to_gw2; do wait_bfd_enabled ovn-chassis-1 $chassis done +# check BFD enablement on tunnel ports from ovn-chassis-2 ########### +for chassis in $from_ch2_to_gw1 $from_ch2_to_gw2; do + echo "checking ovn-chassis-2 -> $chassis" + wait_bfd_enabled ovn-chassis-2 $chassis +done + # Make sure there is no nft table left. Do not use nft directly as might not be installed in container. 
gw1_pid=$(podman inspect -f '{{.State.Pid}}' ovn-gw-1) nsenter --net=/proc/$gw1_pid/ns/net nft list tables | grep ovn-test && nsenter --net=/proc/$gw1_pid/ns/net nft delete table ip ovn-test -on_exit "nsenter --net=/proc/$gw1_pid/ns/net nft list tables | grep ovn-test && nsenter --net=/proc/$gw1_pid/ns/net nft delete table ip ovn-test" +on_exit "if [[ -d "/proc/$gw1_pid" ]]; then nsenter --net=/proc/$gw1_pid/ns/net nft list tables | grep ovn-test && nsenter --net=/proc/$gw1_pid/ns/net nft delete table ip ovn-test; fi" -for chassis in $from_gw1_to_gw2 $from_gw1_to_gw3 $from_gw1_to_ch1; do +for chassis in $from_gw1_to_gw2 $from_gw1_to_gw3 $from_gw1_to_ch1 $from_gw1_to_ch2; do wait_bfd_up ovn-gw-1 $chassis done -for chassis in $from_gw2_to_gw1 $from_gw2_to_gw3 $from_gw2_to_ch1; do +for chassis in $from_gw2_to_gw1 $from_gw2_to_gw3 $from_gw2_to_ch1 $from_gw2_to_ch2; do wait_bfd_up ovn-gw-2 $chassis done for chassis in $from_ch1_to_gw1 $from_ch1_to_gw2; do wait_bfd_up ovn-chassis-1 $chassis done +for chassis in $from_ch2_to_gw1 $from_ch2_to_gw2; do + wait_bfd_up ovn-chassis-2 $chassis +done m_wait_row_count Port_Binding 1 logical_port=cr-R1_outside chassis=$gw1_chassis check multinode_nbctl --wait=hv sync start_tcpdump() { echo "$(date +%H:%M:%S.%03N) Starting tcpdump" - M_START_TCPDUMP([ovn-chassis-1], [-neei hv1-vif1-p], [ch1]) - M_START_TCPDUMP([ovn-chassis-2], [-neei eth2], [ch2]) - M_START_TCPDUMP([ovn-gw-1], [-neei eth2], [gw1]) - M_START_TCPDUMP([ovn-gw-1], [-neei eth2 -Q out], [gw1_out]) - M_START_TCPDUMP([ovn-gw-2], [-neei eth2], [gw2]) - M_START_TCPDUMP([ovn-gw-2], [-neei eth2 -Q out], [gw2_out]) - M_START_TCPDUMP([ovn-gw-3], [-neei eth2], [gw3]) - M_START_TCPDUMP([ovn-gw-3], [-neei eth2 -Q out], [gw3_out]) + M_START_TCPDUMPS([ovn-chassis-1], [-neei hv1-vif1-p], [ch1], + [ovn-chassis-2], [-neei hv2-vif1-p], [ch2], + [ovn-chassis-3], [-neei eth2], [ch3], + [ovn-gw-1], [-neei eth2], [gw1], + [ovn-gw-1], [-neei eth2 -Q out], [gw1_out], + [ovn-gw-2], [-neei eth2], 
[gw2], + [ovn-gw-2], [-neei eth2 -Q out], [gw2_out], + [ovn-gw-3], [-neei eth2], [gw3], + [ovn-gw-3], [-neei eth2 -Q out], [gw3_out], + [ovn-gw-1], [-neei eth1], [gw1_eth1], + [ovn-gw-2], [-neei eth1], [gw2_eth1], + [ovn-chassis-1], [-neei eth1], [ch1_eth1], + [ovn-chassis-2], [-neei eth1], [ch2_eth1]) } stop_tcpdump() { echo "$(date +%H:%M:%S.%03N) Stopping tcpdump" - m_kill 'ovn-gw-1 ovn-gw-2 ovn-gw-3 ovn-chassis-1 ovn-chassis-2' tcpdump + m_kill 'ovn-gw-1 ovn-gw-2 ovn-gw-3 ovn-chassis-1 ovn-chassis-2 ovn-chassis-3' tcpdump } -# Send packets from chassis2 (ext1) to chassis1 +# Send packets from ovn-chassis-3 (ext1) to ovn-chassis-1 send_background_packets() { echo "$(date +%H:%M:%S.%03N) Sending packets in Background" start_tcpdump - M_NS_DAEMONIZE([ovn-chassis-2], [ext1], [ping -f -i 0.1 192.168.1.1], [ping.pid]) + M_NS_DAEMONIZE([ovn-chassis-3], [ext1], [ping -f -i 0.1 192.168.1.1], [ping.pid]) } stop_sending_background_packets() { echo "$(date +%H:%M:%S.%03N) Stopping Background process" m_as ovn-chassis-1 ps -ef | grep -v grep | grep -q ping && \ m_as ovn-chassis-1 echo "Stopping ping on ovn-chassis-1" && killall ping - m_as ovn-chassis-2 ps -ef | grep -v grep | grep -q ping && \ + m_as ovn-chassis-3 ps -ef | grep -v grep | grep -q ping && \ m_as ovn-chassis-2 echo "Stopping ping on ovn-chassis-2" && killall ping stop_tcpdump } @@ -3216,8 +3253,8 @@ check_for_new_garps() { expecting_garp=$2 n_new_garps=$(cat ${hv}_out.tcpdump | grep -c "f0:00:c0:a8:00:fe > Broadcast, ethertype ARP (0x0806), length 42: Request who-has 192.168.0.254 tell 192.168.0.254, length 28") - if [ "$expecting_garp" == "true" ]; then - AS_BOX([$(date +%H:%M:%S.%03N) Waiting/checking for garp from $hv - Starting with $n_new_garps]) + if [[ "$expecting_garp" == "true" ]]; then + echo "$(date +%H:%M:%S.%03N) Waiting/checking for garp from $hv - Starting with $n_new_garps" OVS_WAIT_UNTIL([ n_garps=$n_new_garps n_new_garps=$(cat ${hv}_out.tcpdump | grep -c "f0:00:c0:a8:00:fe > Broadcast, 
ethertype ARP (0x0806), length 42: Request who-has 192.168.0.254 tell 192.168.0.254, length 28") @@ -3225,7 +3262,7 @@ check_for_new_garps() { test "$n_garps" -ne "$n_new_garps" ]) else - AS_BOX([$(date +%H:%M:%S.%03N) Checking no garp from ${hv}]) + echo "$(date +%H:%M:%S.%03N) Checking no garp from ${hv}" # Waiting a few seconds to get a chance to see unexpected garps. sleep 3 n_garps=$(cat ${hv}_out.tcpdump | grep -c "f0:00:c0:a8:00:fe > Broadcast, ethertype ARP (0x0806), length 42: Request who-has 192.168.0.254 tell 192.168.0.254, length 28") @@ -3241,8 +3278,8 @@ check_for_new_echo_pkts() { n_new_echo_req=$(cat ${hv}.tcpdump | grep -c "$mac_src > $mac_dst, ethertype IPv4 (0x0800), length 98: 192.168.0.1 > 192.168.1.1: ICMP echo request") n_new_echo_rep=$(cat ${hv}.tcpdump | grep -c "$mac_dst > $mac_src, ethertype IPv4 (0x0800), length 98: 192.168.1.1 > 192.168.0.1: ICMP echo reply") - if [ "$expecting_pkts" == "true" ]; then - AS_BOX([$(date +%H:%M:%S.%03N) Waiting/checking for echo pkts through ${hv}]) + if [[ "$expecting_pkts" == "true" ]]; then + echo "$(date +%H:%M:%S.%03N) Waiting/checking for echo pkts through ${hv}" echo "Starting with $n_new_echo_req requests and $n_new_echo_rep replies so far on ${hv}." OVS_WAIT_UNTIL([ n_echo_req=$n_new_echo_req @@ -3253,7 +3290,7 @@ check_for_new_echo_pkts() { test "$n_echo_req" -ne "$n_new_echo_req" && test "$n_echo_rep" -ne "$n_new_echo_rep" ]) else - AS_BOX([$(date +%H:%M:%S.%03N) Checking no pkts from ${hv}]) + echo "$(date +%H:%M:%S.%03N) Checking no pkts from ${hv}" # Waiting a few seconds to get a chance to see unexpected pkts. 
sleep 3 n_echo_req=$(cat ${hv}.tcpdump | grep -c "$mac_src > $mac_dst, ethertype IPv4 (0x0800), length 98: 192.168.0.1 > 192.168.1.1: ICMP echo request") @@ -3271,22 +3308,44 @@ dump_statistics() { ch1_rep=$(grep -c "ICMP echo reply" ch1.tcpdump) ch2_req=$(grep -c "ICMP echo request" ch2.tcpdump) ch2_rep=$(grep -c "ICMP echo reply" ch2.tcpdump) + ch3_req=$(grep -c "ICMP echo request" ch3.tcpdump) + ch3_rep=$(grep -c "ICMP echo reply" ch3.tcpdump) gw1_req=$(grep -c "ICMP echo request" gw1.tcpdump) gw1_rep=$(grep -c "ICMP echo reply" gw1.tcpdump) gw2_req=$(grep -c "ICMP echo request" gw2.tcpdump) gw2_rep=$(grep -c "ICMP echo reply" gw2.tcpdump) gw3_req=$(grep -c "ICMP echo request" gw3.tcpdump) gw3_rep=$(grep -c "ICMP echo reply" gw3.tcpdump) - echo "$n1 claims in gw1, $n2 in gw2 and $n3 on gw3" - echo "ch2_request=$ch2_req gw1_request=$gw1_req gw2_request=$gw2_req gw3_request=$gw3_req ch1_request=$ch1_req ch1_reply=$ch1_rep gw1_reply=$gw1_rep gw2_reply=$gw2_rep gw3_reply=$gw3_rep ch2_reply=$ch2_rep" + echo "$n1 claims in gw1, $n2 in gw2 and $n3 on gw3" >&2 + echo "ch3_req=$ch3_req gw_req=($gw1_req + $gw2_req +$gw3_req) ch1_req=$ch1_req ch1_rep=$ch1_rep gw_rep=($gw1_rep + $gw2_rep + $gw3_rep) ch3_rep=$ch3_rep ch2=($ch2_req+$ch2_rep)" >&2 + echo "$((ch3_req - ch3_rep))" } -check_migration_between_gw1_and_gw2() { - action=$1 - send_background_packets +add_port() { + bridge=$1 + interface=$2 + address=$3 + echo "Adding $bridge $interface $address" + + pid=$(podman inspect -f '{{.State.Pid}}' ovn-gw-1) + ln -sf /proc/$pid/ns/net /var/run/netns/$pid + port=$(OVS_RUNDIR= ovs-vsctl --data=bare --no-heading --columns=name find interface \ + external_ids:container_id=ovn-gw-1 external_ids:container_iface="$interface") + port="${port:0:13}" + ip link add "${port}_l" type veth peer name "${port}_c" + ip link set "${port}_l" up + ip link set "${port}_c" netns $pid + ip netns exec $pid ip link set dev "${port}_c" name "$interface" + ip netns exec $pid ip link set "$interface" up 
+ if [[ -n "$address" ]]; then + ip netns exec $pid ip addr add "$address" dev "$interface" + fi +} +prepare() { + send_background_packets # We make sure gw1 is leader since enough time that it generated all its garps. - AS_BOX([$(date +%H:%M:%S.%03N) Waiting all garps sent by gw1]) + echo $(date +%H:%M:%S.%03N) Waiting all garps sent by gw1 n_new_garps=$(cat gw1_out.tcpdump | grep -c "f0:00:c0:a8:00:fe > Broadcast, ethertype ARP (0x0806), length 42: Request who-has 192.168.0.254 tell 192.168.0.254, length 28") OVS_WAIT_UNTIL([ n_garps=$n_new_garps @@ -3302,130 +3361,267 @@ check_migration_between_gw1_and_gw2() { check_for_new_echo_pkts gw2 "00:00:c0:a8:00:01" "f0:00:c0:a8:00:fe" "false" check_for_new_echo_pkts gw3 "00:00:c0:a8:00:01" "f0:00:c0:a8:00:fe" "false" + # All packets should go through gw1, and none through gw2 or gw3. + check_packets "true" "false" "false" "true" flap_count_gw_1=$(m_as ovn-gw-1 ovs-vsctl get interface $from_gw1_to_gw2 bfd_status | sed 's/.*flap_count=\"\([[0-9]]*\).*/\1/g') flap_count_gw_2=$(m_as ovn-gw-2 ovs-vsctl get interface $from_gw2_to_gw1 bfd_status | sed 's/.*flap_count=\"\([[0-9]]*\).*/\1/g') +} - if [ test "$action" == "stop_bfd" ]; then - AS_BOX([$(date +%H:%M:%S.%03N) Blocking bfd on gw1 (from $ip_gw1 to $ip_gw2)]) - nsenter --net=/proc/$gw1_pid/ns/net nft add table ip ovn-test - nsenter --net=/proc/$gw1_pid/ns/net nft 'add chain ip ovn-test INPUT { type filter hook input priority 0; policy accept; }' - # Drop BFD from gw-1 to gw-2: geneve port (6081), inner port 3784 (0xec8), Session state Up, Init, Down. 
-        nsenter --net=/proc/$gw1_pid/ns/net nft add rule ip ovn-test INPUT ip daddr $ip_gw1 ip saddr $ip_gw2 udp dport 6081 '@th,416,16 == 0x0ec8 @th,472,8 == 0xc0 counter drop'
-        nsenter --net=/proc/$gw1_pid/ns/net nft add rule ip ovn-test INPUT ip daddr $ip_gw1 ip saddr $ip_gw2 udp dport 6081 '@th,416,16 == 0x0ec8 @th,472,8 == 0x80 counter drop'
-        nsenter --net=/proc/$gw1_pid/ns/net nft add rule ip ovn-test INPUT ip daddr $ip_gw1 ip saddr $ip_gw2 udp dport 6081 '@th,416,16 == 0x0ec8 @th,472,8 == 0x40 counter drop'
-
-        # We do not check that packets go through gw2 as BFD between chassis-2 and gw1 is still up
-    fi
-
-    if [ test "$action" == "kill_gw2" ]; then
-        AS_BOX([$(date +%H:%M:%S.%03N) Killing gw2 ovn-controller])
-        on_exit 'm_as ovn-gw-2 /usr/share/openvswitch/scripts/ovs-ctl status ||
-                 m_as ovn-gw-2 /usr/share/openvswitch/scripts/ovs-ctl start --system-id=ovn-gw-2'
-        on_exit 'm_as ovn-gw-2 /usr/share/ovn/scripts/ovn-ctl status_controller ||
-                 m_as ovn-gw-2 /usr/share/ovn/scripts/ovn-ctl start_controller ${CONTROLLER_SSL_ARGS}'
-
-        m_as ovn-gw-2 kill -9 $(m_as ovn-gw-2 cat /run/ovn/ovn-controller.pid)
-        m_as ovn-gw-2 kill -9 $(m_as ovn-gw-2 cat /run/openvswitch/ovs-vswitchd.pid)
-        m_as ovn-gw-2 kill -9 $(m_as ovn-gw-2 cat /run/openvswitch/ovsdb-server.pid)
-        # Also delete datapath (flows)
-        m_as ovn-gw-2 ovs-dpctl del-dp system@ovs-system
-    fi
-
-    if [ test "$action" == "kill_gw1" ]; then
-        AS_BOX([$(date +%H:%M:%S.%03N) Killing gw1 ovn-controller])
-        on_exit 'm_as ovn-gw-1 /usr/share/openvswitch/scripts/ovs-ctl status ||
-                 m_as ovn-gw-1 /usr/share/openvswitch/scripts/ovs-ctl start --system-id=ovn-gw-1'
-        on_exit 'm_as ovn-gw-1 /usr/share/ovn/scripts/ovn-ctl status_controller ||
-                 m_as ovn-gw-1 /usr/share/ovn/scripts/ovn-ctl start_controller ${CONTROLLER_SSL_ARGS}'
-
-        m_as ovn-gw-1 kill -9 $(m_as ovn-gw-1 cat /run/ovn/ovn-controller.pid)
-        m_as ovn-gw-1 kill -9 $(m_as ovn-gw-1 cat /run/openvswitch/ovs-vswitchd.pid)
-        m_as ovn-gw-1 kill -9 $(m_as ovn-gw-1 cat /run/openvswitch/ovsdb-server.pid)
-        # Also delete datapath (flows)
-        m_as ovn-gw-1 ovs-dpctl del-dp system@ovs-system
-    fi
+
+check_loss_after_flap()
+{
+    dead=$1
+    max_expected_loss=$2

-    if [ test "$action" == "kill_gw2" ]; then
-        AS_BOX([$(date +%H:%M:%S.%03N) Waiting for flap count between gw1 and gw2 to increase])
+    if [[ "$dead" == "gw2" ]]; then
+        echo "$(date +%H:%M:%S.%03N) Waiting for flap count between gw1 and gw2 to increase"
         OVS_WAIT_UNTIL([
             new_flap_count=$(m_as ovn-gw-1 ovs-vsctl get interfac $from_gw1_to_gw2 bfd_status | sed 's/.*flap_count=\"\([[0-9]]*\).*/\1/g')
             echo "Comparing $new_flap_count versus $flap_count_gw_1"
             test "$new_flap_count" -gt "$((flap_count_gw_1))"
         ])
     else
-        AS_BOX([$(date +%H:%M:%S.%03N) Waiting for flap count between gw2 and gw1 to increase])
+        echo "$(date +%H:%M:%S.%03N) Waiting for flap count between gw2 and gw1 to increase"
         OVS_WAIT_UNTIL([
             new_flap_count=$(m_as ovn-gw-2 ovs-vsctl get interfac $from_gw2_to_gw1 bfd_status | sed 's/.*flap_count=\"\([[0-9]]*\).*/\1/g')
             echo "Comparing $new_flap_count versus $flap_count_gw_2"
             test "$new_flap_count" -gt "$((flap_count_gw_2))"
         ])
-    fi
-    AS_BOX([$(date +%H:%M:%S.%03N) Flapped!])
+    fi
+    echo "$(date +%H:%M:%S.%03N) Flapped!"
     # Wait a few more second for the fight.
+    sleep 4
+
+    echo "$(date +%H:%M:%S.%03N) Statistics after flapping"
+    lost=$(dump_statistics)
+    echo "===> $lost packets lost while handling migration"
+    AT_CHECK([test "$lost" -le "$max_expected_loss"])
+}
+
+final_check()
+{
+    action=$1
+    lost=$2
+    max_expected_loss_after_restoration=$3
+
+    # Wait a little more to get packets while network is restored
     sleep 2
-    AS_BOX([$(date +%H:%M:%S.%03N) Statistics after flapping])
-    dump_statistics
-
-    if [ test "$action" == "stop_bfd" ]; then
-        # gw1 still alive and gw2 tried to claim => gw1 should restart generating garps.
-        check_for_new_garps gw1 "true"
-        check_for_new_garps gw2 "false"
-        check_for_new_garps gw3 "false"
-        check_for_new_echo_pkts gw1 00:00:c0:a8:00:01 f0:00:c0:a8:00:fe "true"
-        check_for_new_echo_pkts gw2 00:00:c0:a8:00:01 f0:00:c0:a8:00:fe "false"
-        check_for_new_echo_pkts gw3 00:00:c0:a8:00:01 f0:00:c0:a8:00:fe "false"
-        check_for_new_echo_pkts ch1 f0:00:c0:a8:01:fe f0:00:c0:a8:01:01 "true"
-        AS_BOX([$(date +%H:%M:%S.%03N) Unblocking bfd on gw1])
-        nsenter --net=/proc/$gw1_pid/ns/net nft -a list ruleset
-        nsenter --net=/proc/$gw1_pid/ns/net nft delete table ip ovn-test
-    fi
+    echo "$(date +%H:%M:%S.%03N) Statistics after network restored (after $action)"
+    new_lost=$(dump_statistics)
+    echo "===> $((new_lost - lost)) packets lost during network restoration"
+    AT_CHECK([test "$((new_lost - lost))" -le "$max_expected_loss_after_restoration"])
+    stop_sending_background_packets
+}

-    if [ test "$action" == "kill_gw2" ]; then
-        # gw1 still alive, but gw2 did not try to claim => gw1 should not generate new garps.
-        check_for_new_garps gw1 "false"
-        check_for_new_garps gw2 "false"
-        check_for_new_garps gw3 "false"
-        check_for_new_echo_pkts gw1 00:00:c0:a8:00:01 f0:00:c0:a8:00:fe "true"
-        check_for_new_echo_pkts gw2 00:00:c0:a8:00:01 f0:00:c0:a8:00:fe "false"
-        check_for_new_echo_pkts gw3 00:00:c0:a8:00:01 f0:00:c0:a8:00:fe "false"
-        check_for_new_echo_pkts ch1 f0:00:c0:a8:01:fe f0:00:c0:a8:01:01 "true"
-        AS_BOX([$(date +%H:%M:%S.%03N) Restarting gw2 ovn-vswitchd])
-        m_as ovn-gw-2 /usr/share/openvswitch/scripts/ovs-ctl start --system-id=ovn-gw-2
-
-        AS_BOX([$(date +%H:%M:%S.%03N) Restarting gw2 ovn-controller])
-        m_as ovn-gw-2 /usr/share/ovn/scripts/ovn-ctl start_controller ${CONTROLLER_SSL_ARGS}
-    fi
+check_garps()
+{
+    check_for_new_garps gw1 "$1"
+    check_for_new_garps gw2 "$2"
+    check_for_new_garps gw3 "$3"
+}

-    if [ test "$action" == "kill_gw1" ]; then
-        # gw1 died => gw2 should generate garps.
-        check_for_new_garps gw1 "false"
-        check_for_new_garps gw2 "true"
-        check_for_new_garps gw3 "false"
-        check_for_new_echo_pkts gw1 00:00:c0:a8:00:01 f0:00:c0:a8:00:fe "false"
-        check_for_new_echo_pkts gw2 00:00:c0:a8:00:01 f0:00:c0:a8:00:fe "true"
-        check_for_new_echo_pkts gw3 00:00:c0:a8:00:01 f0:00:c0:a8:00:fe "false"
-        check_for_new_echo_pkts ch1 f0:00:c0:a8:01:fe f0:00:c0:a8:01:01 "true"
-        AS_BOX([$(date +%H:%M:%S.%03N) Restarting gw1 ovn-vswitchd])
-        m_as ovn-gw-1 /usr/share/openvswitch/scripts/ovs-ctl start --system-id=ovn-gw-1
-
-        AS_BOX([$(date +%H:%M:%S.%03N) Restarting gw1 ovn-controller])
-        m_as ovn-gw-1 /usr/share/ovn/scripts/ovn-ctl start_controller ${CONTROLLER_SSL_ARGS}
-    fi
+check_packets()
+{
+    check_for_new_echo_pkts gw1 00:00:c0:a8:00:01 f0:00:c0:a8:00:fe "$1"
+    check_for_new_echo_pkts gw2 00:00:c0:a8:00:01 f0:00:c0:a8:00:fe "$2"
+    check_for_new_echo_pkts gw3 00:00:c0:a8:00:01 f0:00:c0:a8:00:fe "$3"
+    check_for_new_echo_pkts ch1 f0:00:c0:a8:01:fe f0:00:c0:a8:01:01 "$4"
+}
+
+check_migration_between_gw1_and_gw2_bfd_stop()
+{
+    AS_BOX([$(date +%H:%M:%S.%03N) Testing migration after bfd_stop])
+    max_expected_loss1=$1
+    max_expected_loss2=$2
+    prepare
+
+    echo "$(date +%H:%M:%S.%03N) Blocking bfd on gw1 (from $ip_gw1 to $ip_gw2)"
+    nsenter --net=/proc/$gw1_pid/ns/net nft add table ip ovn-test
+    nsenter --net=/proc/$gw1_pid/ns/net nft 'add chain ip ovn-test INPUT { type filter hook input priority 0; policy accept; }'
+    # Drop BFD from gw-1 to gw-2: geneve port (6081), inner port 3784 (0xec8), Session state Up, Init, Down.
+    nsenter --net=/proc/$gw1_pid/ns/net nft add rule ip ovn-test INPUT ip daddr $ip_gw1 ip saddr $ip_gw2 udp dport 6081 '@th,416,16 == 0x0ec8 @th,472,8 == 0xc0 counter drop'
+    nsenter --net=/proc/$gw1_pid/ns/net nft add rule ip ovn-test INPUT ip daddr $ip_gw1 ip saddr $ip_gw2 udp dport 6081 '@th,416,16 == 0x0ec8 @th,472,8 == 0x80 counter drop'
+    nsenter --net=/proc/$gw1_pid/ns/net nft add rule ip ovn-test INPUT ip daddr $ip_gw1 ip saddr $ip_gw2 udp dport 6081 '@th,416,16 == 0x0ec8 @th,472,8 == 0x40 counter drop'
+
+    check_loss_after_flap "gw1" $max_expected_loss1
+
+    # gw1 still alive and gw2 tried to claim => gw1 should restart generating garps.
+    check_garps "true" "false" "false"
+    check_packets "true" "false" "false" "true"
+
+    echo "$(date +%H:%M:%S.%03N) Unblocking bfd on gw1"
+    nsenter --net=/proc/$gw1_pid/ns/net nft -a list ruleset
+    nsenter --net=/proc/$gw1_pid/ns/net nft delete table ip ovn-test

     # The network is now restored => packets should go through gw1 and reach chassis-1.
-    check_for_new_echo_pkts gw1 00:00:c0:a8:00:01 f0:00:c0:a8:00:fe "true"
-    check_for_new_echo_pkts gw2 00:00:c0:a8:00:01 f0:00:c0:a8:00:fe "false"
-    check_for_new_echo_pkts gw3 00:00:c0:a8:00:01 f0:00:c0:a8:00:fe "false"
-    check_for_new_echo_pkts ch1 f0:00:c0:a8:01:fe f0:00:c0:a8:01:01 "true"
-    AS_BOX([$(date +%H:%M:%S.%03N) Statistics after network restored])
-    dump_statistics
-    stop_sending_background_packets
+    check_packets "true" "false" "false" "true"
+    final_check "bfd_stop" $lost $max_expected_loss2
+}
+
+check_migration_between_gw1_and_gw2_kill_gw2() {
+    AS_BOX([$(date +%H:%M:%S.%03N) Check migration after killing gw2 ovn-controller & vswitchd])
+    max_expected_loss1=$1
+    max_expected_loss2=$2
+    prepare
+
+    on_exit 'm_as ovn-gw-2 /usr/share/openvswitch/scripts/ovs-ctl status ||
+             m_as ovn-gw-2 /usr/share/openvswitch/scripts/ovs-ctl start --system-id=ovn-gw-2'
+    on_exit 'm_as ovn-gw-2 /usr/share/ovn/scripts/ovn-ctl status_controller ||
+             m_as ovn-gw-2 /usr/share/ovn/scripts/ovn-ctl start_controller ${CONTROLLER_SSL_ARGS}'
+
+    m_as ovn-gw-2 kill -9 $(m_as ovn-gw-2 cat /run/ovn/ovn-controller.pid)
+    m_as ovn-gw-2 kill -9 $(m_as ovn-gw-2 cat /run/openvswitch/ovs-vswitchd.pid)
+    m_as ovn-gw-2 kill -9 $(m_as ovn-gw-2 cat /run/openvswitch/ovsdb-server.pid)
+    m_as ovn-gw-2 ovs-dpctl del-dp system@ovs-system
+
+    check_loss_after_flap "gw2" $max_expected_loss1
+
+    # gw1 still alive, but gw2 did not try to claim => gw1 should not generate new garps.
+    check_garps "false" "false" "false"
+    check_packets "true" "false" "false" "true"
+
+    echo "$(date +%H:%M:%S.%03N) Restarting gw2 ovs-vswitchd"
+    m_as ovn-gw-2 /usr/share/openvswitch/scripts/ovs-ctl start --system-id=ovn-gw-2
+
+    echo "$(date +%H:%M:%S.%03N) Restarting gw2 ovn-controller"
+    m_as ovn-gw-2 /usr/share/ovn/scripts/ovn-ctl start_controller ${CONTROLLER_SSL_ARGS}
+
+    # The network is now restored => packets should go through gw1 and reach chassis-1.
+    check_packets "true" "false" "false" "true"
+    final_check "kill_gw2" $lost $max_expected_loss2
+}
+
+check_migration_between_gw1_and_gw2_update_ovs() {
+    AS_BOX([$(date +%H:%M:%S.%03N) Check migration after restarting gw1 ovs-vswitchd ("update")])
+    max_expected_loss1=$1
+    max_expected_loss2=$2
+    prepare
+
+    m_as ovn-gw-1 /usr/share/openvswitch/scripts/ovs-ctl restart --system-id=ovn-gw-1
+
+    check_loss_after_flap "gw1" $max_expected_loss1
+
+    # The network is now restored => packets should go through gw1 and reach chassis-1.
+    check_packets "true" "false" "false" "true"
+    final_check "ovs_update" $lost $max_expected_loss2
+}
+
+check_migration_between_gw1_and_gw2_kill_gw1() {
+    AS_BOX([$(date +%H:%M:%S.%03N) Killing gw1 ovn-controller and ovs-vswitchd])
+    max_expected_loss1=$1
+    max_expected_loss2=$2
+    prepare
+
+    on_exit 'm_as ovn-gw-1 /usr/share/openvswitch/scripts/ovs-ctl status ||
+             m_as ovn-gw-1 /usr/share/openvswitch/scripts/ovs-ctl start --system-id=ovn-gw-1'
+    on_exit 'm_as ovn-gw-1 /usr/share/ovn/scripts/ovn-ctl status_controller ||
+             m_as ovn-gw-1 /usr/share/ovn/scripts/ovn-ctl start_controller ${CONTROLLER_SSL_ARGS}'
+
+    m_as ovn-gw-1 kill -9 $(m_as ovn-gw-1 cat /run/ovn/ovn-controller.pid)
+    m_as ovn-gw-1 kill -9 $(m_as ovn-gw-1 cat /run/openvswitch/ovs-vswitchd.pid)
+    m_as ovn-gw-1 kill -9 $(m_as ovn-gw-1 cat /run/openvswitch/ovsdb-server.pid)
+    # Also delete datapath (flows)
+    m_as ovn-gw-1 ovs-dpctl del-dp system@ovs-system
+
+    check_loss_after_flap "gw1" $max_expected_loss1
+
+    # gw1 died => gw2 should generate garps.
+    check_garps "false" "true" "false"
+    check_packets "false" "true" "false" "true"
+    echo "$(date +%H:%M:%S.%03N) Restarting gw1 ovs-vswitchd after killing gw1"
+    m_as ovn-gw-1 /usr/share/openvswitch/scripts/ovs-ctl start --system-id=ovn-gw-1
+
+    # Wait a long time before restarting ovn-controller
+    sleep 10
+
+    # gw2 should still be handling packets as OVN not restarted on gw1
+    check_packets "false" "true" "false" "true"
+
+    echo "$(date +%H:%M:%S.%03N) Restarting gw1 ovn-controller after killing gw1"
+    m_as ovn-gw-1 /usr/share/ovn/scripts/ovn-ctl start_controller ${CONTROLLER_SSL_ARGS}
+
+    # The network is now restored => packets should go through gw1 and reach chassis-1.
+    check_packets "true" "false" "false" "true"
+    final_check "kill_gw1" $lost $max_expected_loss2
+}
+
+check_migration_between_gw1_and_gw2_reboot_gw1() {
+    ip_gw1_eth1=$(podman exec ovn-gw-1 ip -brief address show eth1 | awk '{print $3}' | cut -d/ -f1)
+    cidr=$(podman exec ovn-gw-1 ip -brief address show eth1 | awk '{print $3}' | cut -d/ -f2)
+    AS_BOX([$(date +%H:%M:%S.%03N) Rebooting ovn-gw-1 with $ip_gw1_eth1/$cidr])
+    max_expected_loss1=$1
+    max_expected_loss2=$2
+    prepare
+
+    podman stop -t 0 ovn-gw-1
+    (exec 3>&- 4>&- 5>&- 6>&-; podman start ovn-gw-1)
+
+    add_port br-ovn eth1 $ip_gw1_eth1/$cidr
+    add_port br-ovn-ext eth2
+    M_START_TCPDUMPS([ovn-gw-1], [-neei eth2], [gw1], [ovn-gw-1], [-neei eth1], [gw1_eth1], [ovn-gw-1], [-neei eth2 -Q out], [gw1_out])
+    check_loss_after_flap "gw1" $max_expected_loss1
+
+    # gw1 died => gw2 should generate garps.
+    check_garps "false" "true" "false"
+    check_packets "false" "true" "false" "true"
+
+    echo "$(date +%H:%M:%S.%03N) Restarting gw1 ovs-vswitchd after rebooting gw1"
+    m_as ovn-gw-1 /usr/share/openvswitch/scripts/ovs-ctl start --system-id=ovn-gw-1
+
+    # Wait a long time before restarting ovn-controller
+    sleep 10
+
+    # gw2 should still be handling packets as OVN not restarted on gw1
+    check_packets "false" "true" "false" "true"
+
+    echo "$(date +%H:%M:%S.%03N) Restarting gw1 ovn-controller after rebooting gw1"
+    m_as ovn-gw-1 /usr/share/ovn/scripts/ovn-ctl start_controller ${CONTROLLER_SSL_ARGS}
+
+    # The network is now restored => packets should go through gw1 and reach chassis-1.
+    check_packets "true" "false" "false" "true"
+    final_check "reboot_gw1" $lost $max_expected_loss2
+}
+
+check_compute_restart() {
+    AS_BOX([$(date +%H:%M:%S.%03N) Killing ovn-chassis-1 ovn-controller and ovs-vswitchd])
+    max_expected_loss=$1
+    prepare
+
+    # Kill ovn-chassis-1
+    echo "$(date +%H:%M:%S.%03N) Killing chassis-1"
+    on_exit 'm_as ovn-chassis-1 /usr/share/openvswitch/scripts/ovs-ctl status ||
+             m_as ovn-chassis-1 /usr/share/openvswitch/scripts/ovs-ctl start --system-id=ovn-chassis-1'
+    on_exit 'm_as ovn-chassis-1 /usr/share/ovn/scripts/ovn-ctl status_controller ||
+             m_as ovn-chassis-1 /usr/share/ovn/scripts/ovn-ctl start_controller ${CONTROLLER_SSL_ARGS}'
+
+    m_as ovn-chassis-1 kill -9 $(m_as ovn-chassis-1 cat /run/ovn/ovn-controller.pid)
+    m_as ovn-chassis-1 kill -9 $(m_as ovn-chassis-1 cat /run/openvswitch/ovs-vswitchd.pid)
+    m_as ovn-chassis-1 kill -9 $(m_as ovn-chassis-1 cat /run/openvswitch/ovsdb-server.pid)
+
+    # Now restart chassis-1
+    flap_count=$(m_as ovn-gw-1 ovs-vsctl get interface $from_gw1_to_ch1 bfd_status | sed 's/.*flap_count=\"\([[0-9]]*\).*/\1/g')
+    echo "$(date +%H:%M:%S.%03N) Restarting ovn-chassis-1 ovs-vswitchd."
+    m_as ovn-chassis-1 /usr/share/openvswitch/scripts/ovs-ctl start --system-id=ovn-chassis-1
+
+    echo "$(date +%H:%M:%S.%03N) Waiting for flap count between gw-1 and chassis-1 to increase"
+    OVS_WAIT_UNTIL([
+        new_flap_count=$(m_as ovn-gw-1 ovs-vsctl get interface $from_gw1_to_ch1 bfd_status | sed 's/.*flap_count=\"\([[0-9]]*\).*/\1/g')
+        echo "Comparing $new_flap_count versus $flap_count"
+        test "$new_flap_count" -gt "$((flap_count))"
+    ])
+
+    wait_bfd_up ovn-chassis-1 $from_ch1_to_gw1
+
+    echo "$(date +%H:%M:%S.%03N) Restarting ovn-chassis-1 ovn-controller."
+    m_as ovn-chassis-1 /usr/share/ovn/scripts/ovn-ctl start_controller ${CONTROLLER_SSL_ARGS}
+
+    # Wait a long time to catch losses
+    sleep 5
+    final_check "compute" 0 $max_expected_loss
 }

 start_tcpdump

-AS_BOX([$(date +%H:%M:%S.%03N) Sending packet from hv1-vif1(inside1) to ext1])
+echo "$(date +%H:%M:%S.%03N) Sending packet from hv1-vif1(inside1) to ext1"
 M_NS_CHECK_EXEC([ovn-chassis-1], [hv1-vif1], [ping -c3 -q -i 0.1 192.168.0.1 | FORMAT_PING], [0], [dnl
 3 packets transmitted, 3 received, 0% packet loss, time 0ms
@@ -3433,7 +3629,7 @@ M_NS_CHECK_EXEC([ovn-chassis-1], [hv1-vif1], [ping -c3 -q -i 0.1 192.168.0.1 | F
 stop_tcpdump

 # It should have gone through gw1 and not gw2
-AS_BOX([$(date +%H:%M:%S.%03N) Checking it went through gw1 and not gw2])
+echo "$(date +%H:%M:%S.%03N) Checking it went through gw1 and not gw2"
 AT_CHECK([cat gw2.tcpdump | grep "ICMP echo"], [1], [dnl
 ])

@@ -3446,17 +3642,29 @@ f0:00:c0:a8:00:fe > 00:00:c0:a8:00:01, ethertype IPv4 (0x0800), length 98: 192.1
 00:00:c0:a8:00:01 > f0:00:c0:a8:00:fe, ethertype IPv4 (0x0800), length 98: 192.168.0.1 > 192.168.1.1: ICMP echo reply,
 ])

-# We stop bfd between gw1 & gw2, but keep gw1 & gw2 running.
-check_migration_between_gw1_and_gw2 "stop_bfd"
+# We stop bfd between gw1 & gw2, but keep gw1 & gw2 running. We should not lose packets.
+check_migration_between_gw1_and_gw2_bfd_stop 1 1

 # We simulate death of gw2. It should not have any effect.
-check_migration_between_gw1_and_gw2 "kill_gw2"
+check_migration_between_gw1_and_gw2_kill_gw2 1 1
+
+# We simulate an ovs update on gw1. While ovs is stopped, flows should still be handled by the kernel datapath.
+# When OVS is restarted, BFD should go down immediately, and gw2 will start handling packets.
+# There will be packet losses as gw2 will usually see BFD from gw1 up (and hence release the port) before
+# gw1 sees BFD up (and claims the port).
+check_migration_between_gw1_and_gw2_update_ovs 20 1
+
+# We simulate a restart of both OVS & OVN on gw1. gw2 should take over.
+check_migration_between_gw1_and_gw2_kill_gw1 40 20

 # We simulate death of gw1. gw2 should take over.
-check_migration_between_gw1_and_gw2 "kill_gw1"
+check_migration_between_gw1_and_gw2_reboot_gw1 40 20
+
+# We simulate a restart of ovn-chassis-1. We expect ~2 sec of losses as we wait for bfd up before
+# starting ovn-controller.
+check_compute_restart 30

 AT_CLEANUP
-])

 AT_SETUP([ovn multinode bgp L2 EVPN])
 check_fake_multinode_setup
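A side note for reviewers on the nft rules used by `check_migration_between_gw1_and_gw2_bfd_stop`: the `@th` selectors count bits from the start of the outer transport (UDP) header. A minimal sketch of the offset arithmetic, assuming tunnel BFD control packets carry a Geneve header with no options (the variable names below are illustrative, not from the patch):

```shell
# How the '@th' bit offsets in the BFD-dropping nft rules are derived.
# Header sizes in bytes: outer UDP, Geneve (base header, no options),
# inner Ethernet, inner IPv4, inner UDP.
outer_udp=8; geneve=8; inner_eth=14; inner_ip=20; inner_udp=8

# Inner UDP destination port sits 2 bytes into the inner UDP header;
# BFD single-hop control uses port 3784 = 0x0ec8.
dport_off=$(( (outer_udp + geneve + inner_eth + inner_ip + 2) * 8 ))

# Second byte of the BFD control packet holds Sta (top 2 bits) + flags:
# Sta Up=3 -> 0xc0, Init=2 -> 0x80, Down=1 -> 0x40.
state_off=$(( (outer_udp + geneve + inner_eth + inner_ip + inner_udp + 1) * 8 ))

echo "@th,$dport_off,16 and @th,$state_off,8"   # -> @th,416,16 and @th,472,8
```

This matches the `@th,416,16 == 0x0ec8` and `@th,472,8` matches in the rules above.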
If a server unexpectedly reboots, OVS, once restarted, sets BFD UP on
bfd-enabled geneve tunnels. However, if OVN takes time to restart, an
HA gw chassis would attract the traffic while being unable to handle
it (as it has no flows), resulting in traffic loss.

This is fixed by re-using the ovs flow-restore-wait mechanism. While
set, OVS waits (prevents upcalls, ignores bfd, ...) until it is reset.
Once OVS receives the notification of flow-restore-wait becoming false,
it resumes handling upcalls, bfd, etc., and ignores any further change
to flow-restore-wait.

Hence, on chassis hosting HA gateways, OVN toggles flow-restore-wait:
it sets it to false, waits for the ack from OVS and then sets it back
to true. If the server reboots, OVS will come up with flow-restore-wait
set to true.

OVN also sets external_ids->ovn-managed-flow-restore-wait when setting
flow-restore-wait. When set, it indicates that OVN has set
flow-restore-wait at some point.

"ovs-ctl restart" also uses flow-restore-wait: when called, it saves the
flows, stops "ovs-vswitchd", sets "flow-restore-wait" to true, restarts
"ovs-vswitchd", restores the flows and finally removes "flow-restore-wait".
So OVS will wait either for "ovs-ctl restart" to remove "flow-restore-wait"
or for OVN to set "flow-restore-wait" to false.

Reported-at: https://issues.redhat.com/browse/FDP-3075
Signed-off-by: Xavier Simonart <xsimonar@redhat.com>

---
-v2 : - Updated based on Mark's feedback (commit message, comments).
      - Avoid setting flow-restore-wait for computes.
      - Add external_ids->ovn-managed-flow-restore-wait.
      - Updated test: add test for compute update + nits (variable name changes)
---
 controller/bfd.c                |   5 +-
 controller/bfd.h                |   4 +-
 controller/ovn-controller.8.xml |  11 +
 controller/ovn-controller.c     | 171 +++++++++-
 tests/multinode-macros.at       |  22 ++
 tests/multinode.at              | 546 ++++++++++++++++++++++----------
 6 files changed, 584 insertions(+), 175 deletions(-)
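For illustration only (this is a hedged sketch, not the actual ovn-controller code), the toggle described in the commit message corresponds roughly to the following ovs-vsctl sequence; `other_config:flow-restore-wait` is the standard Open_vSwitch knob, and `external_ids:ovn-managed-flow-restore-wait` is the marker this patch introduces:

```shell
# 1. Clear the flag: ovs-vswitchd resumes upcalls/BFD and, from then on,
#    ignores further changes to flow-restore-wait.
ovs-vsctl set Open_vSwitch . other_config:flow-restore-wait=false

# 2. Once OVS has acked the change, arm the flag again and record that OVN
#    manages it. After an unexpected reboot, ovs-vswitchd then starts with
#    flow-restore-wait=true and keeps BFD down until flows are reinstalled.
ovs-vsctl set Open_vSwitch . other_config:flow-restore-wait=true \
      -- set Open_vSwitch . external_ids:ovn-managed-flow-restore-wait=true
```

These commands require a live ovsdb-server/ovs-vswitchd; they are shown only to make the handshake concrete.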