Message ID | 20230608141555.1339905-1-odivlad@gmail.com |
---|---|
State | New |
Headers | show |
Series | [ovs-dev] controller: disable OpenFlow inactivity probing | expand |
Context | Check | Description |
---|---|---|
ovsrobot/apply-robot | success | apply and check: success |
ovsrobot/github-robot-_Build_and_Test | fail | github build: failed |
ovsrobot/github-robot-_ovn-kubernetes | fail | github build: failed |
Hi, Thanks for linking the email thread, since I had the exact same concern that Numan did about detecting if ovs-vswitchd goes down. It sounds like because of the nature of unix sockets, this disconnection is still detected by OVN and failover can occur as expected. But what about if ovs-vswitchd is still running but in a "compromised" state. For instance, what if a coding error in ovs-vswitchd results in it running an infinite loop? In such a case, the unix connection would still be alive, but OVN would not be able to communicate with the ovs-vswitchd process. Does this situation eventually lead to OVN disconnecting anyway after it fills up the unix socket's buffer? If so, how long does that take? Or does it lead to OVN slowly growing in memory usage while it waits forever to be able to write to the socket? I think removing the hard-coded 5 second probe intervals from pinctrl.c and features.c is a good idea. But I think we should leave the configurable probe interval used by ofctrl.c for the case I described before. Since Han's patch to OVS should disable probe intervals from the OVS side by default, we would not trigger bad behavior simply because OVN has a lot of work to do during a recompute. What do you think? On 6/8/23 10:15, Vladislav Odintsov wrote: > OpenFlow connection is established over unix socket, which is a reliable > connection and doesn't require additional probing. > > With this patch openflow probing in ovn-controller -> ovs-vswitchd > direction is disabled for all three connections: > - OF flows management connection, > - OF features negotiation connection, > - pinctrl connection. > > ovn-controller external_ids:ovn-openflow-probe-interval is removed as > non-needed anymore. > > Disablement for ovs-vswitchd -> ovn-controller OF inacivity probing will > be done in the next patch. > > Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2023-May/404625.html > Signed-off-by: Vladislav Odintsov <odivlad@gmail.com> > --- > NEWS | 5 +++++ > controller/ofctrl.c | 14 ++------------ > controller/ofctrl.h | 4 +--- > controller/ovn-controller.8.xml | 14 -------------- > controller/ovn-controller.c | 21 +-------------------- > controller/pinctrl.c | 2 +- > lib/features.c | 7 ++----- > 7 files changed, 12 insertions(+), 55 deletions(-) > > diff --git a/NEWS b/NEWS > index 645acea1f..bd63b187b 100644 > --- a/NEWS > +++ b/NEWS > @@ -1,5 +1,10 @@ > Post v23.06.0 > ------------- > + - Disable OpenFlow inactivity probing between ovn-controller and OVS. > + OF connection is established over unix socket, which is a reliable > + connection method and doesn't require additional probing. > + external_ids:ovn-openflow-probe-interval configuration option for > + ovn-controller is no longer matter. > > OVN v23.06.0 - 01 Jun 2023 > -------------------------- > diff --git a/controller/ofctrl.c b/controller/ofctrl.c > index 64a444ff6..788634494 100644 > --- a/controller/ofctrl.c > +++ b/controller/ofctrl.c > @@ -415,11 +415,9 @@ static void ofctrl_recv(const struct ofp_header *, enum ofptype); > > void > ofctrl_init(struct ovn_extend_table *group_table, > - struct ovn_extend_table *meter_table, > - int inactivity_probe_interval) > + struct ovn_extend_table *meter_table) > { > - swconn = rconn_create(inactivity_probe_interval, 0, > - DSCP_DEFAULT, 1 << OFP15_VERSION); > + swconn = rconn_create(0, 0, DSCP_DEFAULT, 1 << OFP15_VERSION); > tx_counter = rconn_packet_counter_create(); > hmap_init(&installed_lflows); > hmap_init(&installed_pflows); > @@ -2986,14 +2984,6 @@ ofctrl_is_connected(void) > return rconn_is_connected(swconn); > } > > -void > -ofctrl_set_probe_interval(int probe_interval) > -{ > - if (swconn) { > - rconn_set_probe_interval(swconn, probe_interval); > - } > -} > - > void > ofctrl_get_memory_usage(struct simap *usage) > { > diff --git a/controller/ofctrl.h b/controller/ofctrl.h > index 105f9370b..46bfccd85 100644 > --- a/controller/ofctrl.h > +++ b/controller/ofctrl.h > @@ -49,8 +49,7 @@ struct ovn_desired_flow_table { > > /* Interface for OVN main loop. */ > void ofctrl_init(struct ovn_extend_table *group_table, > - struct ovn_extend_table *meter_table, > - int inactivity_probe_interval); > + struct ovn_extend_table *meter_table); > bool ofctrl_run(const struct ovsrec_bridge *br_int, > const struct ovsrec_open_vswitch_table *, > struct shash *pending_ct_zones); > @@ -142,7 +141,6 @@ void ofctrl_check_and_add_flow_metered(struct ovn_desired_flow_table *, > > > bool ofctrl_is_connected(void); > -void ofctrl_set_probe_interval(int probe_interval); > void ofctrl_get_memory_usage(struct simap *usage); > > #endif /* controller/ofctrl.h */ > diff --git a/controller/ovn-controller.8.xml b/controller/ovn-controller.8.xml > index f61f43008..52eb137d3 100644 > --- a/controller/ovn-controller.8.xml > +++ b/controller/ovn-controller.8.xml > @@ -147,20 +147,6 @@ > </p> > </dd> > > - <dt><code>external_ids:ovn-openflow-probe-interval</code></dt> > - <dd> > - <p> > - The inactivity probe interval of the OpenFlow connection to the > - OpenvSwitch integration bridge, in seconds. > - If the value is zero, it disables the connection keepalive feature. > - </p> > - > - <p> > - If the value is nonzero, then it will be forced to a value of > - at least 5s. > - </p> > - </dd> > - > <dt><code>external_ids:ovn-encap-type</code></dt> > <dd> > <p> > diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c > index 3a81a13fb..732e7a690 100644 > --- a/controller/ovn-controller.c > +++ b/controller/ovn-controller.c > @@ -105,7 +105,6 @@ static unixctl_cb_func debug_ignore_startup_delay; > > #define DEFAULT_BRIDGE_NAME "br-int" > #define DEFAULT_DATAPATH "system" > -#define OFCTRL_DEFAULT_PROBE_INTERVAL_SEC 0 > > #define CONTROLLER_LOOP_STOPWATCH_NAME "flow-generation" > #define OFCTRL_PUT_STOPWATCH_NAME "flow-installation" > @@ -556,22 +555,6 @@ update_ssl_config(const struct ovsrec_ssl_table *ssl_table) > } > } > > -static int > -get_ofctrl_probe_interval(struct ovsdb_idl *ovs_idl) > -{ > - const struct ovsrec_open_vswitch *cfg = ovsrec_open_vswitch_first(ovs_idl); > - if (!cfg) { > - return OFCTRL_DEFAULT_PROBE_INTERVAL_SEC; > - } > - > - const struct ovsrec_open_vswitch_table *ovs_table = > - ovsrec_open_vswitch_table_get(ovs_idl); > - const char *chassis_id = get_ovs_chassis_id(ovs_table); > - return get_chassis_external_id_value_int( > - &cfg->external_ids, chassis_id, > - "ovn-openflow-probe-interval", OFCTRL_DEFAULT_PROBE_INTERVAL_SEC); > -} > - > /* Retrieves the pointer to the OVN Southbound database from 'ovs_idl' and > * updates 'sbdb_idl' with that pointer. */ > static void > @@ -4975,8 +4958,7 @@ main(int argc, char *argv[]) > engine_get_internal_data(&en_lb_data); > > ofctrl_init(&lflow_output_data->group_table, > - &lflow_output_data->meter_table, > - get_ofctrl_probe_interval(ovs_idl_loop.idl)); > + &lflow_output_data->meter_table); > ofctrl_seqno_init(); > > unixctl_command_register("group-table-list", "", 0, 0, > @@ -5104,7 +5086,6 @@ main(int argc, char *argv[]) > &reset_ovnsb_idl_min_index, > &ctrl_engine_ctx, &ovnsb_expected_cond_seqno); > update_ssl_config(ovsrec_ssl_table_get(ovs_idl_loop.idl)); > - ofctrl_set_probe_interval(get_ofctrl_probe_interval(ovs_idl_loop.idl)); > > struct ovsdb_idl_txn *ovnsb_idl_txn > = ovsdb_idl_loop_run(&ovnsb_idl_loop); > diff --git a/controller/pinctrl.c b/controller/pinctrl.c > index c396ad4c2..a86db3f32 100644 > --- a/controller/pinctrl.c > +++ b/controller/pinctrl.c > @@ -3362,7 +3362,7 @@ pinctrl_handler(void *arg_) > static long long int svc_monitors_next_run_time = LLONG_MAX; > static long long int send_prefixd_time = LLONG_MAX; > > - swconn = rconn_create(5, 0, DSCP_DEFAULT, 1 << OFP15_VERSION); > + swconn = rconn_create(0, 0, DSCP_DEFAULT, 1 << OFP15_VERSION); > > while (!latch_is_set(&pctrl->pinctrl_thread_exit)) { > long long int bfd_time = LLONG_MAX; > diff --git a/lib/features.c b/lib/features.c > index 97c9c86f0..ad3357570 100644 > --- a/lib/features.c > +++ b/lib/features.c > @@ -28,11 +28,10 @@ > #include "openvswitch/ofp-meter.h" > #include "openvswitch/ofp-util.h" > #include "ovn/features.h" > +#include "controller/ofctrl.h" > > VLOG_DEFINE_THIS_MODULE(features); > > -#define FEATURES_DEFAULT_PROBE_INTERVAL_SEC 5 > - > struct ovs_feature { > enum ovs_feature_value value; > const char *name; > @@ -82,8 +81,7 @@ static void > ovs_feature_rconn_setup(const char *br_name) > { > if (!swconn) { > - swconn = rconn_create(FEATURES_DEFAULT_PROBE_INTERVAL_SEC, 0, > - DSCP_DEFAULT, 1 << OFP15_VERSION); > + swconn = rconn_create(0, 0, DSCP_DEFAULT, 1 << OFP15_VERSION); > } > > if (!rconn_is_connected(swconn)) { > @@ -94,7 +92,6 @@ ovs_feature_rconn_setup(const char *br_name) > } > free(target); > } > - rconn_set_probe_interval(swconn, FEATURES_DEFAULT_PROBE_INTERVAL_SEC); > } > > static bool
Hi Mark, thanks for taking time to look on this! Your point with a hung OVS is really an interesting case. From one hand it’s a possible situation, and from another I guess it’s much higher probability for OVS to be busy by some other work rather than to hang in a loop. In my installation the first happening sometimes (OVS business by real work). The reasons can be different but I can’t say that it’s better to drop OF connection in such a moment (possibly triggering ha chassis-redirect ports failover). At the same time if we’re talking about L3GW node, which has a gateway chassis redirect port, the hung OVS should be detected by other nodes with BFD probes (if it’s a ha chassis-redirect port). And if it’s a normal hypervisor (with just a vif ports), so what can we do with it? I guess if we drop OF connection, it won’t make any positive changes. Maybe just release memory needed for handling this connection and not allocate buffers…? Unfortunately I can’t evaluate this. Moreover, the OVN default is a disabled probing. What can be a possible recommended values to set probes if to leave it as is? How should users find out that they have to configure this? Currently in docs there is only information that this option configures probe interval. But it’s not obvious when to configure it and which value to set. I hope this makes sense. Also, which Han’s patch to OVS you’re talking about? I looked through open OVS patches and haven’t seen any. Please point me out. regards, Vladislav Odintsov > On 9 Jun 2023, at 18:28, Mark Michelson <mmichels@redhat.com> wrote: > > Hi, > > Thanks for linking the email thread, since I had the exact same concern that Numan did about detecting if ovs-vswitchd goes down. It sounds like because of the nature of unix sockets, this disconnection is still detected by OVN and failover can occur as expected. > > But what about if ovs-vswitchd is still running but in a "compromised" state. For instance, what if a coding error in ovs-vswitchd results in it running an infinite loop? In such a case, the unix connection would still be alive, but OVN would not be able to communicate with the ovs-vswitchd process. > > Does this situation eventually lead to OVN disconnecting anyway after it fills up the unix socket's buffer? If so, how long does that take? Or does it lead to OVN slowly growing in memory usage while it waits forever to be able to write to the socket? > > I think removing the hard-coded 5 second probe intervals from pinctrl.c and features.c is a good idea. But I think we should leave the configurable probe interval used by ofctrl.c for the case I described before. Since Han's patch to OVS should disable probe intervals from the OVS side by default, we would not trigger bad behavior simply because OVN has a lot of work to do during a recompute. What do you think? >> On 6/8/23 10:15, Vladislav Odintsov wrote: >> OpenFlow connection is established over unix socket, which is a reliable >> connection and doesn't require additional probing. >> With this patch openflow probing in ovn-controller -> ovs-vswitchd >> direction is disabled for all three connections: >> - OF flows management connection, >> - OF features negotiation connection, >> - pinctrl connection. >> ovn-controller external_ids:ovn-openflow-probe-interval is removed as >> non-needed anymore. >> Disablement for ovs-vswitchd -> ovn-controller OF inacivity probing will >> be done in the next patch. >> Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2023-May/404625.html >> Signed-off-by: Vladislav Odintsov <odivlad@gmail.com> >> --- >> NEWS | 5 +++++ >> controller/ofctrl.c | 14 ++------------ >> controller/ofctrl.h | 4 +--- >> controller/ovn-controller.8.xml | 14 -------------- >> controller/ovn-controller.c | 21 +-------------------- >> controller/pinctrl.c | 2 +- >> lib/features.c | 7 ++----- >> 7 files changed, 12 insertions(+), 55 deletions(-) >> diff --git a/NEWS b/NEWS >> index 645acea1f..bd63b187b 100644 >> --- a/NEWS >> +++ b/NEWS >> @@ -1,5 +1,10 @@ >> Post v23.06.0 >> ------------- >> + - Disable OpenFlow inactivity probing between ovn-controller and OVS. >> + OF connection is established over unix socket, which is a reliable >> + connection method and doesn't require additional probing. >> + external_ids:ovn-openflow-probe-interval configuration option for >> + ovn-controller is no longer matter. >> OVN v23.06.0 - 01 Jun 2023 >> -------------------------- >> diff --git a/controller/ofctrl.c b/controller/ofctrl.c >> index 64a444ff6..788634494 100644 >> --- a/controller/ofctrl.c >> +++ b/controller/ofctrl.c >> @@ -415,11 +415,9 @@ static void ofctrl_recv(const struct ofp_header *, enum ofptype); >> void >> ofctrl_init(struct ovn_extend_table *group_table, >> - struct ovn_extend_table *meter_table, >> - int inactivity_probe_interval) >> + struct ovn_extend_table *meter_table) >> { >> - swconn = rconn_create(inactivity_probe_interval, 0, >> - DSCP_DEFAULT, 1 << OFP15_VERSION); >> + swconn = rconn_create(0, 0, DSCP_DEFAULT, 1 << OFP15_VERSION); >> tx_counter = rconn_packet_counter_create(); >> hmap_init(&installed_lflows); >> hmap_init(&installed_pflows); >> @@ -2986,14 +2984,6 @@ ofctrl_is_connected(void) >> return rconn_is_connected(swconn); >> } >> -void >> -ofctrl_set_probe_interval(int probe_interval) >> -{ >> - if (swconn) { >> - rconn_set_probe_interval(swconn, probe_interval); >> - } >> -} >> - >> void >> ofctrl_get_memory_usage(struct simap *usage) >> { >> diff --git a/controller/ofctrl.h b/controller/ofctrl.h >> index 105f9370b..46bfccd85 100644 >> --- a/controller/ofctrl.h >> +++ b/controller/ofctrl.h >> @@ -49,8 +49,7 @@ struct ovn_desired_flow_table { >> /* Interface for OVN main loop. */ >> void ofctrl_init(struct ovn_extend_table *group_table, >> - struct ovn_extend_table *meter_table, >> - int inactivity_probe_interval); >> + struct ovn_extend_table *meter_table); >> bool ofctrl_run(const struct ovsrec_bridge *br_int, >> const struct ovsrec_open_vswitch_table *, >> struct shash *pending_ct_zones); >> @@ -142,7 +141,6 @@ void ofctrl_check_and_add_flow_metered(struct ovn_desired_flow_table *, >> bool ofctrl_is_connected(void); >> -void ofctrl_set_probe_interval(int probe_interval); >> void ofctrl_get_memory_usage(struct simap *usage); >> #endif /* controller/ofctrl.h */ >> diff --git a/controller/ovn-controller.8.xml b/controller/ovn-controller.8.xml >> index f61f43008..52eb137d3 100644 >> --- a/controller/ovn-controller.8.xml >> +++ b/controller/ovn-controller.8.xml >> @@ -147,20 +147,6 @@ >> </p> >> </dd> >> - <dt><code>external_ids:ovn-openflow-probe-interval</code></dt> >> - <dd> >> - <p> >> - The inactivity probe interval of the OpenFlow connection to the >> - OpenvSwitch integration bridge, in seconds. >> - If the value is zero, it disables the connection keepalive feature. >> - </p> >> - >> - <p> >> - If the value is nonzero, then it will be forced to a value of >> - at least 5s. >> - </p> >> - </dd> >> - >> <dt><code>external_ids:ovn-encap-type</code></dt> >> <dd> >> <p> >> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c >> index 3a81a13fb..732e7a690 100644 >> --- a/controller/ovn-controller.c >> +++ b/controller/ovn-controller.c >> @@ -105,7 +105,6 @@ static unixctl_cb_func debug_ignore_startup_delay; >> #define DEFAULT_BRIDGE_NAME "br-int" >> #define DEFAULT_DATAPATH "system" >> -#define OFCTRL_DEFAULT_PROBE_INTERVAL_SEC 0 >> #define CONTROLLER_LOOP_STOPWATCH_NAME "flow-generation" >> #define OFCTRL_PUT_STOPWATCH_NAME "flow-installation" >> @@ -556,22 +555,6 @@ update_ssl_config(const struct ovsrec_ssl_table *ssl_table) >> } >> } >> -static int >> -get_ofctrl_probe_interval(struct ovsdb_idl *ovs_idl) >> -{ >> - const struct ovsrec_open_vswitch *cfg = ovsrec_open_vswitch_first(ovs_idl); >> - if (!cfg) { >> - return OFCTRL_DEFAULT_PROBE_INTERVAL_SEC; >> - } >> - >> - const struct ovsrec_open_vswitch_table *ovs_table = >> - ovsrec_open_vswitch_table_get(ovs_idl); >> - const char *chassis_id = get_ovs_chassis_id(ovs_table); >> - return get_chassis_external_id_value_int( >> - &cfg->external_ids, chassis_id, >> - "ovn-openflow-probe-interval", OFCTRL_DEFAULT_PROBE_INTERVAL_SEC); >> -} >> - >> /* Retrieves the pointer to the OVN Southbound database from 'ovs_idl' and >> * updates 'sbdb_idl' with that pointer. */ >> static void >> @@ -4975,8 +4958,7 @@ main(int argc, char *argv[]) >> engine_get_internal_data(&en_lb_data); >> ofctrl_init(&lflow_output_data->group_table, >> - &lflow_output_data->meter_table, >> - get_ofctrl_probe_interval(ovs_idl_loop.idl)); >> + &lflow_output_data->meter_table); >> ofctrl_seqno_init(); >> unixctl_command_register("group-table-list", "", 0, 0, >> @@ -5104,7 +5086,6 @@ main(int argc, char *argv[]) >> &reset_ovnsb_idl_min_index, >> &ctrl_engine_ctx, &ovnsb_expected_cond_seqno); >> update_ssl_config(ovsrec_ssl_table_get(ovs_idl_loop.idl)); >> - ofctrl_set_probe_interval(get_ofctrl_probe_interval(ovs_idl_loop.idl)); >> struct ovsdb_idl_txn *ovnsb_idl_txn >> = ovsdb_idl_loop_run(&ovnsb_idl_loop); >> diff --git a/controller/pinctrl.c b/controller/pinctrl.c >> index c396ad4c2..a86db3f32 100644 >> --- a/controller/pinctrl.c >> +++ b/controller/pinctrl.c >> @@ -3362,7 +3362,7 @@ pinctrl_handler(void *arg_) >> static long long int svc_monitors_next_run_time = LLONG_MAX; >> static long long int send_prefixd_time = LLONG_MAX; >> - swconn = rconn_create(5, 0, DSCP_DEFAULT, 1 << OFP15_VERSION); >> + swconn = rconn_create(0, 0, DSCP_DEFAULT, 1 << OFP15_VERSION); >> while (!latch_is_set(&pctrl->pinctrl_thread_exit)) { >> long long int bfd_time = LLONG_MAX; >> diff --git a/lib/features.c b/lib/features.c >> index 97c9c86f0..ad3357570 100644 >> --- a/lib/features.c >> +++ b/lib/features.c >> @@ -28,11 +28,10 @@ >> #include "openvswitch/ofp-meter.h" >> #include "openvswitch/ofp-util.h" >> #include "ovn/features.h" >> +#include "controller/ofctrl.h" >> VLOG_DEFINE_THIS_MODULE(features); >> -#define FEATURES_DEFAULT_PROBE_INTERVAL_SEC 5 >> - >> struct ovs_feature { >> enum ovs_feature_value value; >> const char *name; >> @@ -82,8 +81,7 @@ static void >> ovs_feature_rconn_setup(const char *br_name) >> { >> if (!swconn) { >> - swconn = rconn_create(FEATURES_DEFAULT_PROBE_INTERVAL_SEC, 0, >> - DSCP_DEFAULT, 1 << OFP15_VERSION); >> + swconn = rconn_create(0, 0, DSCP_DEFAULT, 1 << OFP15_VERSION); >> } >> if (!rconn_is_connected(swconn)) { >> @@ -94,7 +92,6 @@ ovs_feature_rconn_setup(const char *br_name) >> } >> free(target); >> } >> - rconn_set_probe_interval(swconn, FEATURES_DEFAULT_PROBE_INTERVAL_SEC); >> } >> static bool >
On 6/14/23 20:02, Vladislav Odintsov wrote: > Hi Mark, > > thanks for taking time to look on this! > > Your point with a hung OVS is really an interesting case. > > From one hand it’s a possible situation, and from another I guess it’s much higher probability for OVS to be busy by some other work rather than to hang in a loop. In my installation the first happening sometimes (OVS business by real work). The reasons can be different but I can’t say that it’s better to drop OF connection in such a moment (possibly triggering ha chassis-redirect ports failover). > > At the same time if we’re talking about L3GW node, which has a gateway chassis redirect port, the hung OVS should be detected by other nodes with BFD probes (if it’s a ha chassis-redirect port). > And if it’s a normal hypervisor (with just a vif ports), so what can we do with it? I guess if we drop OF connection, it won’t make any positive changes. Maybe just release memory needed for handling this connection and not allocate buffers…? Unfortunately I can’t evaluate this. > > Moreover, the OVN default is a disabled probing. What can be a possible recommended values to set probes if to leave it as is? How should users find out that they have to configure this? Currently in docs there is only information that this option configures probe interval. But it’s not obvious when to configure it and which value to set. > In support of this, quoting the original commit that added the probe config knob: "The main intention of this patch is to fix this recompuation issue for older versions (there by requesting backport), it still would be beneficial with the incremental processing engine." That was back in 2019: https://github.com/ovn-org/ovn/commit/c99069c8934c9ea55d310a8b6d48fb66aa477589 Since then lots of performance improvements landed in OVN (ovn-controller too), we're probably fine. Checking ovn-org/ovn-kubernetes code I see they were also setting ovn-openflow-probe-interval "just in case" so they probably won't be affected by it getting removed: https://github.com/ovn-org/ovn-kubernetes/commit/be1eb843852cd4c490514cec60553cca4feaf18e > I hope this makes sense. > The patch in its current form looks OK to me, I'm OK to merge it if Mark doesn't have anything against it; therefore: Acked-by: Dumitru Ceara <dceara@redhat.com> Regards, Dumitru > Also, which Han’s patch to OVS you’re talking about? I looked through open OVS patches and haven’t seen any. Please point me out. > > regards, > Vladislav Odintsov > >> On 9 Jun 2023, at 18:28, Mark Michelson <mmichels@redhat.com> wrote: >> >> Hi, >> >> Thanks for linking the email thread, since I had the exact same concern that Numan did about detecting if ovs-vswitchd goes down. It sounds like because of the nature of unix sockets, this disconnection is still detected by OVN and failover can occur as expected. >> >> But what about if ovs-vswitchd is still running but in a "compromised" state. For instance, what if a coding error in ovs-vswitchd results in it running an infinite loop? In such a case, the unix connection would still be alive, but OVN would not be able to communicate with the ovs-vswitchd process. >> >> Does this situation eventually lead to OVN disconnecting anyway after it fills up the unix socket's buffer? If so, how long does that take? Or does it lead to OVN slowly growing in memory usage while it waits forever to be able to write to the socket? >> >> I think removing the hard-coded 5 second probe intervals from pinctrl.c and features.c is a good idea. But I think we should leave the configurable probe interval used by ofctrl.c for the case I described before. Since Han's patch to OVS should disable probe intervals from the OVS side by default, we would not trigger bad behavior simply because OVN has a lot of work to do during a recompute. What do you think? >>> On 6/8/23 10:15, Vladislav Odintsov wrote: >>> OpenFlow connection is established over unix socket, which is a reliable >>> connection and doesn't require additional probing. >>> With this patch openflow probing in ovn-controller -> ovs-vswitchd >>> direction is disabled for all three connections: >>> - OF flows management connection, >>> - OF features negotiation connection, >>> - pinctrl connection. >>> ovn-controller external_ids:ovn-openflow-probe-interval is removed as >>> non-needed anymore. >>> Disablement for ovs-vswitchd -> ovn-controller OF inacivity probing will >>> be done in the next patch. >>> Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2023-May/404625.html >>> Signed-off-by: Vladislav Odintsov <odivlad@gmail.com> >>> --- >>> NEWS | 5 +++++ >>> controller/ofctrl.c | 14 ++------------ >>> controller/ofctrl.h | 4 +--- >>> controller/ovn-controller.8.xml | 14 -------------- >>> controller/ovn-controller.c | 21 +-------------------- >>> controller/pinctrl.c | 2 +- >>> lib/features.c | 7 ++----- >>> 7 files changed, 12 insertions(+), 55 deletions(-) >>> diff --git a/NEWS b/NEWS >>> index 645acea1f..bd63b187b 100644 >>> --- a/NEWS >>> +++ b/NEWS >>> @@ -1,5 +1,10 @@ >>> Post v23.06.0 >>> ------------- >>> + - Disable OpenFlow inactivity probing between ovn-controller and OVS. >>> + OF connection is established over unix socket, which is a reliable >>> + connection method and doesn't require additional probing. >>> + external_ids:ovn-openflow-probe-interval configuration option for >>> + ovn-controller is no longer matter. >>> OVN v23.06.0 - 01 Jun 2023 >>> -------------------------- >>> diff --git a/controller/ofctrl.c b/controller/ofctrl.c >>> index 64a444ff6..788634494 100644 >>> --- a/controller/ofctrl.c >>> +++ b/controller/ofctrl.c >>> @@ -415,11 +415,9 @@ static void ofctrl_recv(const struct ofp_header *, enum ofptype); >>> void >>> ofctrl_init(struct ovn_extend_table *group_table, >>> - struct ovn_extend_table *meter_table, >>> - int inactivity_probe_interval) >>> + struct ovn_extend_table *meter_table) >>> { >>> - swconn = rconn_create(inactivity_probe_interval, 0, >>> - DSCP_DEFAULT, 1 << OFP15_VERSION); >>> + swconn = rconn_create(0, 0, DSCP_DEFAULT, 1 << OFP15_VERSION); >>> tx_counter = rconn_packet_counter_create(); >>> hmap_init(&installed_lflows); >>> hmap_init(&installed_pflows); >>> @@ -2986,14 +2984,6 @@ ofctrl_is_connected(void) >>> return rconn_is_connected(swconn); >>> } >>> -void >>> -ofctrl_set_probe_interval(int probe_interval) >>> -{ >>> - if (swconn) { >>> - rconn_set_probe_interval(swconn, probe_interval); >>> - } >>> -} >>> - >>> void >>> ofctrl_get_memory_usage(struct simap *usage) >>> { >>> diff --git a/controller/ofctrl.h b/controller/ofctrl.h >>> index 105f9370b..46bfccd85 100644 >>> --- a/controller/ofctrl.h >>> +++ b/controller/ofctrl.h >>> @@ -49,8 +49,7 @@ struct ovn_desired_flow_table { >>> /* Interface for OVN main loop. */ >>> void ofctrl_init(struct ovn_extend_table *group_table, >>> - struct ovn_extend_table *meter_table, >>> - int inactivity_probe_interval); >>> + struct ovn_extend_table *meter_table); >>> bool ofctrl_run(const struct ovsrec_bridge *br_int, >>> const struct ovsrec_open_vswitch_table *, >>> struct shash *pending_ct_zones); >>> @@ -142,7 +141,6 @@ void ofctrl_check_and_add_flow_metered(struct ovn_desired_flow_table *, >>> bool ofctrl_is_connected(void); >>> -void ofctrl_set_probe_interval(int probe_interval); >>> void ofctrl_get_memory_usage(struct simap *usage); >>> #endif /* controller/ofctrl.h */ >>> diff --git a/controller/ovn-controller.8.xml b/controller/ovn-controller.8.xml >>> index f61f43008..52eb137d3 100644 >>> --- a/controller/ovn-controller.8.xml >>> +++ b/controller/ovn-controller.8.xml >>> @@ -147,20 +147,6 @@ >>> </p> >>> </dd> >>> - <dt><code>external_ids:ovn-openflow-probe-interval</code></dt> >>> - <dd> >>> - <p> >>> - The inactivity probe interval of the OpenFlow connection to the >>> - OpenvSwitch integration bridge, in seconds. >>> - If the value is zero, it disables the connection keepalive feature. >>> - </p> >>> - >>> - <p> >>> - If the value is nonzero, then it will be forced to a value of >>> - at least 5s. >>> - </p> >>> - </dd> >>> - >>> <dt><code>external_ids:ovn-encap-type</code></dt> >>> <dd> >>> <p> >>> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c >>> index 3a81a13fb..732e7a690 100644 >>> --- a/controller/ovn-controller.c >>> +++ b/controller/ovn-controller.c >>> @@ -105,7 +105,6 @@ static unixctl_cb_func debug_ignore_startup_delay; >>> #define DEFAULT_BRIDGE_NAME "br-int" >>> #define DEFAULT_DATAPATH "system" >>> -#define OFCTRL_DEFAULT_PROBE_INTERVAL_SEC 0 >>> #define CONTROLLER_LOOP_STOPWATCH_NAME "flow-generation" >>> #define OFCTRL_PUT_STOPWATCH_NAME "flow-installation" >>> @@ -556,22 +555,6 @@ update_ssl_config(const struct ovsrec_ssl_table *ssl_table) >>> } >>> } >>> -static int >>> -get_ofctrl_probe_interval(struct ovsdb_idl *ovs_idl) >>> -{ >>> - const struct ovsrec_open_vswitch *cfg = ovsrec_open_vswitch_first(ovs_idl); >>> - if (!cfg) { >>> - return OFCTRL_DEFAULT_PROBE_INTERVAL_SEC; >>> - } >>> - >>> - const struct ovsrec_open_vswitch_table *ovs_table = >>> - ovsrec_open_vswitch_table_get(ovs_idl); >>> - const char *chassis_id = get_ovs_chassis_id(ovs_table); >>> - return get_chassis_external_id_value_int( >>> - &cfg->external_ids, chassis_id, >>> - "ovn-openflow-probe-interval", OFCTRL_DEFAULT_PROBE_INTERVAL_SEC); >>> -} >>> - >>> /* Retrieves the pointer to the OVN Southbound database from 'ovs_idl' and >>> * updates 'sbdb_idl' with that pointer. */ >>> static void >>> @@ -4975,8 +4958,7 @@ main(int argc, char *argv[]) >>> engine_get_internal_data(&en_lb_data); >>> ofctrl_init(&lflow_output_data->group_table, >>> - &lflow_output_data->meter_table, >>> - get_ofctrl_probe_interval(ovs_idl_loop.idl)); >>> + &lflow_output_data->meter_table); >>> ofctrl_seqno_init(); >>> unixctl_command_register("group-table-list", "", 0, 0, >>> @@ -5104,7 +5086,6 @@ main(int argc, char *argv[]) >>> &reset_ovnsb_idl_min_index, >>> &ctrl_engine_ctx, &ovnsb_expected_cond_seqno); >>> update_ssl_config(ovsrec_ssl_table_get(ovs_idl_loop.idl)); >>> - ofctrl_set_probe_interval(get_ofctrl_probe_interval(ovs_idl_loop.idl)); >>> struct ovsdb_idl_txn *ovnsb_idl_txn >>> = ovsdb_idl_loop_run(&ovnsb_idl_loop); >>> diff --git a/controller/pinctrl.c b/controller/pinctrl.c >>> index c396ad4c2..a86db3f32 100644 >>> --- a/controller/pinctrl.c >>> +++ b/controller/pinctrl.c >>> @@ -3362,7 +3362,7 @@ pinctrl_handler(void *arg_) >>> static long long int svc_monitors_next_run_time = LLONG_MAX; >>> static long long int send_prefixd_time = LLONG_MAX; >>> - swconn = rconn_create(5, 0, DSCP_DEFAULT, 1 << OFP15_VERSION); >>> + swconn = rconn_create(0, 0, DSCP_DEFAULT, 1 << OFP15_VERSION); >>> while (!latch_is_set(&pctrl->pinctrl_thread_exit)) { >>> long long int bfd_time = LLONG_MAX; >>> diff --git a/lib/features.c b/lib/features.c >>> index 97c9c86f0..ad3357570 100644 >>> --- a/lib/features.c >>> +++ b/lib/features.c >>> @@ -28,11 +28,10 @@ >>> #include "openvswitch/ofp-meter.h" >>> #include "openvswitch/ofp-util.h" >>> #include "ovn/features.h" >>> +#include "controller/ofctrl.h" >>> VLOG_DEFINE_THIS_MODULE(features); >>> -#define FEATURES_DEFAULT_PROBE_INTERVAL_SEC 5 >>> - >>> struct ovs_feature { >>> enum ovs_feature_value value; >>> const char *name; >>> @@ -82,8 +81,7 @@ static void >>> ovs_feature_rconn_setup(const char *br_name) >>> { >>> if (!swconn) { >>> - swconn = rconn_create(FEATURES_DEFAULT_PROBE_INTERVAL_SEC, 0, >>> - DSCP_DEFAULT, 1 << OFP15_VERSION); >>> + swconn = rconn_create(0, 0, DSCP_DEFAULT, 1 << OFP15_VERSION); >>> } >>> if (!rconn_is_connected(swconn)) { >>> @@ -94,7 +92,6 @@ ovs_feature_rconn_setup(const char *br_name) >>> } >>> free(target); >>> } >>> - rconn_set_probe_interval(swconn, FEATURES_DEFAULT_PROBE_INTERVAL_SEC); >>> } >>> static bool >> > _______________________________________________ > dev mailing list > dev@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
On 9/11/23 14:54, Dumitru Ceara wrote: > On 6/14/23 20:02, Vladislav Odintsov wrote: >> Hi Mark, >> >> thanks for taking time to look on this! >> >> Your point with a hung OVS is really an interesting case. >> >> From one hand it’s a possible situation, and from another I guess it’s much higher probability for OVS to be busy by some other work rather than to hang in a loop. In my installation the first happening sometimes (OVS business by real work). The reasons can be different but I can’t say that it’s better to drop OF connection in such a moment (possibly triggering ha chassis-redirect ports failover). >> >> At the same time if we’re talking about L3GW node, which has a gateway chassis redirect port, the hung OVS should be detected by other nodes with BFD probes (if it’s a ha chassis-redirect port). >> And if it’s a normal hypervisor (with just a vif ports), so what can we do with it? I guess if we drop OF connection, it won’t make any positive changes. Maybe just release memory needed for handling this connection and not allocate buffers…? Unfortunately I can’t evaluate this. >> >> Moreover, the OVN default is a disabled probing. What can be a possible recommended values to set probes if to leave it as is? How should users find out that they have to configure this? Currently in docs there is only information that this option configures probe interval. But it’s not obvious when to configure it and which value to set. >> > > In support of this, quoting the original commit that added the probe > config knob: > > "The main intention of this patch is to fix this recompuation issue for > older versions (there by requesting backport), it still would be > beneficial with the incremental processing engine." > > That was back in 2019: > > https://github.com/ovn-org/ovn/commit/c99069c8934c9ea55d310a8b6d48fb66aa477589 > > Since then lots of performance improvements landed in OVN > (ovn-controller too), we're probably fine. Checking > ovn-org/ovn-kubernetes code I see they were also setting > ovn-openflow-probe-interval "just in case" so they probably won't be > affected by it getting removed: > > https://github.com/ovn-org/ovn-kubernetes/commit/be1eb843852cd4c490514cec60553cca4feaf18e > >> I hope this makes sense. >> > > The patch in its current form looks OK to me, I'm OK to merge it if Mark > doesn't have anything against it; therefore: > > Acked-by: Dumitru Ceara <dceara@redhat.com> > I forgot to mention this earlier but, since this patch was posted way before soft freeze and since the changes seem contained, it's probably fair to apply this to the 23.09 branch too. What do you think, Mark? Thanks! > Regards, > Dumitru > >> Also, which Han’s patch to OVS you’re talking about? I looked through open OVS patches and haven’t seen any. Please point me out. >> >> regards, >> Vladislav Odintsov >> >>> On 9 Jun 2023, at 18:28, Mark Michelson <mmichels@redhat.com> wrote: >>> >>> Hi, >>> >>> Thanks for linking the email thread, since I had the exact same concern that Numan did about detecting if ovs-vswitchd goes down. It sounds like because of the nature of unix sockets, this disconnection is still detected by OVN and failover can occur as expected. >>> >>> But what about if ovs-vswitchd is still running but in a "compromised" state. For instance, what if a coding error in ovs-vswitchd results in it running an infinite loop? In such a case, the unix connection would still be alive, but OVN would not be able to communicate with the ovs-vswitchd process. >>> >>> Does this situation eventually lead to OVN disconnecting anyway after it fills up the unix socket's buffer? If so, how long does that take? Or does it lead to OVN slowly growing in memory usage while it waits forever to be able to write to the socket? >>> >>> I think removing the hard-coded 5 second probe intervals from pinctrl.c and features.c is a good idea. But I think we should leave the configurable probe interval used by ofctrl.c for the case I described before. Since Han's patch to OVS should disable probe intervals from the OVS side by default, we would not trigger bad behavior simply because OVN has a lot of work to do during a recompute. What do you think? >>>> On 6/8/23 10:15, Vladislav Odintsov wrote: >>>> OpenFlow connection is established over unix socket, which is a reliable >>>> connection and doesn't require additional probing. >>>> With this patch openflow probing in ovn-controller -> ovs-vswitchd >>>> direction is disabled for all three connections: >>>> - OF flows management connection, >>>> - OF features negotiation connection, >>>> - pinctrl connection. >>>> ovn-controller external_ids:ovn-openflow-probe-interval is removed as >>>> non-needed anymore. >>>> Disablement for ovs-vswitchd -> ovn-controller OF inacivity probing will >>>> be done in the next patch. >>>> Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2023-May/404625.html >>>> Signed-off-by: Vladislav Odintsov <odivlad@gmail.com> >>>> --- >>>> NEWS | 5 +++++ >>>> controller/ofctrl.c | 14 ++------------ >>>> controller/ofctrl.h | 4 +--- >>>> controller/ovn-controller.8.xml | 14 -------------- >>>> controller/ovn-controller.c | 21 +-------------------- >>>> controller/pinctrl.c | 2 +- >>>> lib/features.c | 7 ++----- >>>> 7 files changed, 12 insertions(+), 55 deletions(-) >>>> diff --git a/NEWS b/NEWS >>>> index 645acea1f..bd63b187b 100644 >>>> --- a/NEWS >>>> +++ b/NEWS >>>> @@ -1,5 +1,10 @@ >>>> Post v23.06.0 >>>> ------------- >>>> + - Disable OpenFlow inactivity probing between ovn-controller and OVS. >>>> + OF connection is established over unix socket, which is a reliable >>>> + connection method and doesn't require additional probing. >>>> + external_ids:ovn-openflow-probe-interval configuration option for >>>> + ovn-controller is no longer matter. >>>> OVN v23.06.0 - 01 Jun 2023 >>>> -------------------------- >>>> diff --git a/controller/ofctrl.c b/controller/ofctrl.c >>>> index 64a444ff6..788634494 100644 >>>> --- a/controller/ofctrl.c >>>> +++ b/controller/ofctrl.c >>>> @@ -415,11 +415,9 @@ static void ofctrl_recv(const struct ofp_header *, enum ofptype); >>>> void >>>> ofctrl_init(struct ovn_extend_table *group_table, >>>> - struct ovn_extend_table *meter_table, >>>> - int inactivity_probe_interval) >>>> + struct ovn_extend_table *meter_table) >>>> { >>>> - swconn = rconn_create(inactivity_probe_interval, 0, >>>> - DSCP_DEFAULT, 1 << OFP15_VERSION); >>>> + swconn = rconn_create(0, 0, DSCP_DEFAULT, 1 << OFP15_VERSION); >>>> tx_counter = rconn_packet_counter_create(); >>>> hmap_init(&installed_lflows); >>>> hmap_init(&installed_pflows); >>>> @@ -2986,14 +2984,6 @@ ofctrl_is_connected(void) >>>> return rconn_is_connected(swconn); >>>> } >>>> -void >>>> -ofctrl_set_probe_interval(int probe_interval) >>>> -{ >>>> - if (swconn) { >>>> - rconn_set_probe_interval(swconn, probe_interval); >>>> - } >>>> -} >>>> - >>>> void >>>> ofctrl_get_memory_usage(struct simap *usage) >>>> { >>>> diff --git a/controller/ofctrl.h b/controller/ofctrl.h >>>> index 105f9370b..46bfccd85 100644 >>>> --- a/controller/ofctrl.h >>>> +++ b/controller/ofctrl.h >>>> @@ -49,8 +49,7 @@ struct ovn_desired_flow_table { >>>> /* Interface for OVN main loop. */ >>>> void ofctrl_init(struct ovn_extend_table *group_table, >>>> - struct ovn_extend_table *meter_table, >>>> - int inactivity_probe_interval); >>>> + struct ovn_extend_table *meter_table); >>>> bool ofctrl_run(const struct ovsrec_bridge *br_int, >>>> const struct ovsrec_open_vswitch_table *, >>>> struct shash *pending_ct_zones); >>>> @@ -142,7 +141,6 @@ void ofctrl_check_and_add_flow_metered(struct ovn_desired_flow_table *, >>>> bool ofctrl_is_connected(void); >>>> -void ofctrl_set_probe_interval(int probe_interval); >>>> void ofctrl_get_memory_usage(struct simap *usage); >>>> #endif /* controller/ofctrl.h */ >>>> diff --git a/controller/ovn-controller.8.xml b/controller/ovn-controller.8.xml >>>> index f61f43008..52eb137d3 100644 >>>> --- a/controller/ovn-controller.8.xml >>>> +++ b/controller/ovn-controller.8.xml >>>> @@ -147,20 +147,6 @@ >>>> </p> >>>> </dd> >>>> - <dt><code>external_ids:ovn-openflow-probe-interval</code></dt> >>>> - <dd> >>>> - <p> >>>> - The inactivity probe interval of the OpenFlow connection to the >>>> - OpenvSwitch integration bridge, in seconds. >>>> - If the value is zero, it disables the connection keepalive feature. >>>> - </p> >>>> - >>>> - <p> >>>> - If the value is nonzero, then it will be forced to a value of >>>> - at least 5s. >>>> - </p> >>>> - </dd> >>>> - >>>> <dt><code>external_ids:ovn-encap-type</code></dt> >>>> <dd> >>>> <p> >>>> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c >>>> index 3a81a13fb..732e7a690 100644 >>>> --- a/controller/ovn-controller.c >>>> +++ b/controller/ovn-controller.c >>>> @@ -105,7 +105,6 @@ static unixctl_cb_func debug_ignore_startup_delay; >>>> #define DEFAULT_BRIDGE_NAME "br-int" >>>> #define DEFAULT_DATAPATH "system" >>>> -#define OFCTRL_DEFAULT_PROBE_INTERVAL_SEC 0 >>>> #define CONTROLLER_LOOP_STOPWATCH_NAME "flow-generation" >>>> #define OFCTRL_PUT_STOPWATCH_NAME "flow-installation" >>>> @@ -556,22 +555,6 @@ update_ssl_config(const struct ovsrec_ssl_table *ssl_table) >>>> } >>>> } >>>> -static int >>>> -get_ofctrl_probe_interval(struct ovsdb_idl *ovs_idl) >>>> -{ >>>> - const struct ovsrec_open_vswitch *cfg = ovsrec_open_vswitch_first(ovs_idl); >>>> - if (!cfg) { >>>> - return OFCTRL_DEFAULT_PROBE_INTERVAL_SEC; >>>> - } >>>> - >>>> - const struct ovsrec_open_vswitch_table *ovs_table = >>>> - ovsrec_open_vswitch_table_get(ovs_idl); >>>> - const char *chassis_id = get_ovs_chassis_id(ovs_table); >>>> - return get_chassis_external_id_value_int( >>>> - &cfg->external_ids, chassis_id, >>>> - "ovn-openflow-probe-interval", OFCTRL_DEFAULT_PROBE_INTERVAL_SEC); >>>> -} >>>> - >>>> /* Retrieves the pointer to the OVN Southbound database from 'ovs_idl' and >>>> * updates 'sbdb_idl' with that pointer. */ >>>> static void >>>> @@ -4975,8 +4958,7 @@ main(int argc, char *argv[]) >>>> engine_get_internal_data(&en_lb_data); >>>> ofctrl_init(&lflow_output_data->group_table, >>>> - &lflow_output_data->meter_table, >>>> - get_ofctrl_probe_interval(ovs_idl_loop.idl)); >>>> + &lflow_output_data->meter_table); >>>> ofctrl_seqno_init(); >>>> unixctl_command_register("group-table-list", "", 0, 0, >>>> @@ -5104,7 +5086,6 @@ main(int argc, char *argv[]) >>>> &reset_ovnsb_idl_min_index, >>>> &ctrl_engine_ctx, &ovnsb_expected_cond_seqno); >>>> update_ssl_config(ovsrec_ssl_table_get(ovs_idl_loop.idl)); >>>> - ofctrl_set_probe_interval(get_ofctrl_probe_interval(ovs_idl_loop.idl)); >>>> struct ovsdb_idl_txn *ovnsb_idl_txn >>>> = ovsdb_idl_loop_run(&ovnsb_idl_loop); >>>> diff --git a/controller/pinctrl.c b/controller/pinctrl.c >>>> index c396ad4c2..a86db3f32 100644 >>>> --- a/controller/pinctrl.c >>>> +++ b/controller/pinctrl.c >>>> @@ -3362,7 +3362,7 @@ pinctrl_handler(void *arg_) >>>> static long long int svc_monitors_next_run_time = LLONG_MAX; >>>> static long long int send_prefixd_time = LLONG_MAX; >>>> - swconn = rconn_create(5, 0, DSCP_DEFAULT, 1 << OFP15_VERSION); >>>> + swconn = rconn_create(0, 0, DSCP_DEFAULT, 1 << OFP15_VERSION); >>>> while (!latch_is_set(&pctrl->pinctrl_thread_exit)) { >>>> long long int bfd_time = LLONG_MAX; >>>> diff --git a/lib/features.c b/lib/features.c >>>> index 97c9c86f0..ad3357570 100644 >>>> --- a/lib/features.c >>>> +++ b/lib/features.c >>>> @@ -28,11 +28,10 @@ >>>> #include "openvswitch/ofp-meter.h" >>>> #include "openvswitch/ofp-util.h" >>>> #include "ovn/features.h" >>>> +#include "controller/ofctrl.h" >>>> VLOG_DEFINE_THIS_MODULE(features); >>>> -#define FEATURES_DEFAULT_PROBE_INTERVAL_SEC 5 >>>> - >>>> struct ovs_feature { >>>> enum ovs_feature_value value; >>>> const char *name; >>>> @@ -82,8 +81,7 @@ static void >>>> ovs_feature_rconn_setup(const char *br_name) >>>> { >>>> if (!swconn) { >>>> - swconn = rconn_create(FEATURES_DEFAULT_PROBE_INTERVAL_SEC, 0, >>>> - DSCP_DEFAULT, 1 << OFP15_VERSION); >>>> + swconn = rconn_create(0, 0, DSCP_DEFAULT, 1 << OFP15_VERSION); >>>> } >>>> if (!rconn_is_connected(swconn)) { >>>> @@ -94,7 +92,6 @@ ovs_feature_rconn_setup(const char *br_name) >>>> } >>>> free(target); >>>> } >>>> - rconn_set_probe_interval(swconn, FEATURES_DEFAULT_PROBE_INTERVAL_SEC); >>>> } >>>> static bool >>> >> _______________________________________________ >> dev mailing list >> dev@openvswitch.org >> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Hi Dumitru, if eventually this patch got merged, please remove next lines from its commit message: "Disablement for ovs-vswitchd -> ovn-controller OF inacivity probing will be done in the next patch." Unfortunately I was unsuccessful in removing these probing direction (default 60s). Maybe you can give any hint how to solve this [1]. 1: https://mail.openvswitch.org/pipermail/ovs-dev/2023-June/405445.html > On 11 Sep 2023, at 15:57, Dumitru Ceara <dceara@redhat.com> wrote: > > On 9/11/23 14:54, Dumitru Ceara wrote: >> On 6/14/23 20:02, Vladislav Odintsov wrote: >>> Hi Mark, >>> >>> thanks for taking time to look on this! >>> >>> Your point with a hung OVS is really an interesting case. >>> >>> From one hand it’s a possible situation, and from another I guess it’s much higher probability for OVS to be busy by some other work rather than to hang in a loop. In my installation the first happening sometimes (OVS business by real work). The reasons can be different but I can’t say that it’s better to drop OF connection in such a moment (possibly triggering ha chassis-redirect ports failover). >>> >>> At the same time if we’re talking about L3GW node, which has a gateway chassis redirect port, the hung OVS should be detected by other nodes with BFD probes (if it’s a ha chassis-redirect port). >>> And if it’s a normal hypervisor (with just a vif ports), so what can we do with it? I guess if we drop OF connection, it won’t make any positive changes. Maybe just release memory needed for handling this connection and not allocate buffers…? Unfortunately I can’t evaluate this. >>> >>> Moreover, the OVN default is a disabled probing. What can be a possible recommended values to set probes if to leave it as is? How should users find out that they have to configure this? Currently in docs there is only information that this option configures probe interval. But it’s not obvious when to configure it and which value to set. >>> >> >> In support of this, quoting the original commit that added the probe >> config knob: >> >> "The main intention of this patch is to fix this recompuation issue for >> older versions (there by requesting backport), it still would be >> beneficial with the incremental processing engine." >> >> That was back in 2019: >> >> https://github.com/ovn-org/ovn/commit/c99069c8934c9ea55d310a8b6d48fb66aa477589 >> >> Since then lots of performance improvements landed in OVN >> (ovn-controller too), we're probably fine. Checking >> ovn-org/ovn-kubernetes code I see they were also setting >> ovn-openflow-probe-interval "just in case" so they probably won't be >> affected by it getting removed: >> >> https://github.com/ovn-org/ovn-kubernetes/commit/be1eb843852cd4c490514cec60553cca4feaf18e >> >>> I hope this makes sense. >>> >> >> The patch in its current form looks OK to me, I'm OK to merge it if Mark >> doesn't have anything against it; therefore: >> >> Acked-by: Dumitru Ceara <dceara@redhat.com> >> > > I forgot to mention this earlier but, since this patch was posted way > before soft freeze and since the changes seem contained, it's probably > fair to apply this to the 23.09 branch too. What do you think, Mark? > > Thanks! > >> Regards, >> Dumitru >> >>> Also, which Han’s patch to OVS you’re talking about? I looked through open OVS patches and haven’t seen any. Please point me out. >>> >>> regards, >>> Vladislav Odintsov >>> >>>> On 9 Jun 2023, at 18:28, Mark Michelson <mmichels@redhat.com> wrote: >>>> >>>> Hi, >>>> >>>> Thanks for linking the email thread, since I had the exact same concern that Numan did about detecting if ovs-vswitchd goes down. It sounds like because of the nature of unix sockets, this disconnection is still detected by OVN and failover can occur as expected. >>>> >>>> But what about if ovs-vswitchd is still running but in a "compromised" state. For instance, what if a coding error in ovs-vswitchd results in it running an infinite loop? In such a case, the unix connection would still be alive, but OVN would not be able to communicate with the ovs-vswitchd process. >>>> >>>> Does this situation eventually lead to OVN disconnecting anyway after it fills up the unix socket's buffer? If so, how long does that take? Or does it lead to OVN slowly growing in memory usage while it waits forever to be able to write to the socket? >>>> >>>> I think removing the hard-coded 5 second probe intervals from pinctrl.c and features.c is a good idea. But I think we should leave the configurable probe interval used by ofctrl.c for the case I described before. Since Han's patch to OVS should disable probe intervals from the OVS side by default, we would not trigger bad behavior simply because OVN has a lot of work to do during a recompute. What do you think? >>>>> On 6/8/23 10:15, Vladislav Odintsov wrote: >>>>> OpenFlow connection is established over unix socket, which is a reliable >>>>> connection and doesn't require additional probing. >>>>> With this patch openflow probing in ovn-controller -> ovs-vswitchd >>>>> direction is disabled for all three connections: >>>>> - OF flows management connection, >>>>> - OF features negotiation connection, >>>>> - pinctrl connection. >>>>> ovn-controller external_ids:ovn-openflow-probe-interval is removed as >>>>> non-needed anymore. >>>>> Disablement for ovs-vswitchd -> ovn-controller OF inacivity probing will >>>>> be done in the next patch. >>>>> Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2023-May/404625.html >>>>> Signed-off-by: Vladislav Odintsov <odivlad@gmail.com> >>>>> --- >>>>> NEWS | 5 +++++ >>>>> controller/ofctrl.c | 14 ++------------ >>>>> controller/ofctrl.h | 4 +--- >>>>> controller/ovn-controller.8.xml | 14 -------------- >>>>> controller/ovn-controller.c | 21 +-------------------- >>>>> controller/pinctrl.c | 2 +- >>>>> lib/features.c | 7 ++----- >>>>> 7 files changed, 12 insertions(+), 55 deletions(-) >>>>> diff --git a/NEWS b/NEWS >>>>> index 645acea1f..bd63b187b 100644 >>>>> --- a/NEWS >>>>> +++ b/NEWS >>>>> @@ -1,5 +1,10 @@ >>>>> Post v23.06.0 >>>>> ------------- >>>>> + - Disable OpenFlow inactivity probing between ovn-controller and OVS. >>>>> + OF connection is established over unix socket, which is a reliable >>>>> + connection method and doesn't require additional probing. >>>>> + external_ids:ovn-openflow-probe-interval configuration option for >>>>> + ovn-controller is no longer matter. >>>>> OVN v23.06.0 - 01 Jun 2023 >>>>> -------------------------- >>>>> diff --git a/controller/ofctrl.c b/controller/ofctrl.c >>>>> index 64a444ff6..788634494 100644 >>>>> --- a/controller/ofctrl.c >>>>> +++ b/controller/ofctrl.c >>>>> @@ -415,11 +415,9 @@ static void ofctrl_recv(const struct ofp_header *, enum ofptype); >>>>> void >>>>> ofctrl_init(struct ovn_extend_table *group_table, >>>>> - struct ovn_extend_table *meter_table, >>>>> - int inactivity_probe_interval) >>>>> + struct ovn_extend_table *meter_table) >>>>> { >>>>> - swconn = rconn_create(inactivity_probe_interval, 0, >>>>> - DSCP_DEFAULT, 1 << OFP15_VERSION); >>>>> + swconn = rconn_create(0, 0, DSCP_DEFAULT, 1 << OFP15_VERSION); >>>>> tx_counter = rconn_packet_counter_create(); >>>>> hmap_init(&installed_lflows); >>>>> hmap_init(&installed_pflows); >>>>> @@ -2986,14 +2984,6 @@ ofctrl_is_connected(void) >>>>> return rconn_is_connected(swconn); >>>>> } >>>>> -void >>>>> -ofctrl_set_probe_interval(int probe_interval) >>>>> -{ >>>>> - if (swconn) { >>>>> - rconn_set_probe_interval(swconn, probe_interval); >>>>> - } >>>>> -} >>>>> - >>>>> void >>>>> ofctrl_get_memory_usage(struct simap *usage) >>>>> { >>>>> diff --git a/controller/ofctrl.h b/controller/ofctrl.h >>>>> index 105f9370b..46bfccd85 100644 >>>>> --- a/controller/ofctrl.h >>>>> +++ b/controller/ofctrl.h >>>>> @@ -49,8 +49,7 @@ struct ovn_desired_flow_table { >>>>> /* Interface for OVN main loop. */ >>>>> void ofctrl_init(struct ovn_extend_table *group_table, >>>>> - struct ovn_extend_table *meter_table, >>>>> - int inactivity_probe_interval); >>>>> + struct ovn_extend_table *meter_table); >>>>> bool ofctrl_run(const struct ovsrec_bridge *br_int, >>>>> const struct ovsrec_open_vswitch_table *, >>>>> struct shash *pending_ct_zones); >>>>> @@ -142,7 +141,6 @@ void ofctrl_check_and_add_flow_metered(struct ovn_desired_flow_table *, >>>>> bool ofctrl_is_connected(void); >>>>> -void ofctrl_set_probe_interval(int probe_interval); >>>>> void ofctrl_get_memory_usage(struct simap *usage); >>>>> #endif /* controller/ofctrl.h */ >>>>> diff --git a/controller/ovn-controller.8.xml b/controller/ovn-controller.8.xml >>>>> index f61f43008..52eb137d3 100644 >>>>> --- a/controller/ovn-controller.8.xml >>>>> +++ b/controller/ovn-controller.8.xml >>>>> @@ -147,20 +147,6 @@ >>>>> </p> >>>>> </dd> >>>>> - <dt><code>external_ids:ovn-openflow-probe-interval</code></dt> >>>>> - <dd> >>>>> - <p> >>>>> - The inactivity probe interval of the OpenFlow connection to the >>>>> - OpenvSwitch integration bridge, in seconds. >>>>> - If the value is zero, it disables the connection keepalive feature. >>>>> - </p> >>>>> - >>>>> - <p> >>>>> - If the value is nonzero, then it will be forced to a value of >>>>> - at least 5s. >>>>> - </p> >>>>> - </dd> >>>>> - >>>>> <dt><code>external_ids:ovn-encap-type</code></dt> >>>>> <dd> >>>>> <p> >>>>> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c >>>>> index 3a81a13fb..732e7a690 100644 >>>>> --- a/controller/ovn-controller.c >>>>> +++ b/controller/ovn-controller.c >>>>> @@ -105,7 +105,6 @@ static unixctl_cb_func debug_ignore_startup_delay; >>>>> #define DEFAULT_BRIDGE_NAME "br-int" >>>>> #define DEFAULT_DATAPATH "system" >>>>> -#define OFCTRL_DEFAULT_PROBE_INTERVAL_SEC 0 >>>>> #define CONTROLLER_LOOP_STOPWATCH_NAME "flow-generation" >>>>> #define OFCTRL_PUT_STOPWATCH_NAME "flow-installation" >>>>> @@ -556,22 +555,6 @@ update_ssl_config(const struct ovsrec_ssl_table *ssl_table) >>>>> } >>>>> } >>>>> -static int >>>>> -get_ofctrl_probe_interval(struct ovsdb_idl *ovs_idl) >>>>> -{ >>>>> - const struct ovsrec_open_vswitch *cfg = ovsrec_open_vswitch_first(ovs_idl); >>>>> - if (!cfg) { >>>>> - return OFCTRL_DEFAULT_PROBE_INTERVAL_SEC; >>>>> - } >>>>> - >>>>> - const struct ovsrec_open_vswitch_table *ovs_table = >>>>> - ovsrec_open_vswitch_table_get(ovs_idl); >>>>> - const char *chassis_id = get_ovs_chassis_id(ovs_table); >>>>> - return get_chassis_external_id_value_int( >>>>> - &cfg->external_ids, chassis_id, >>>>> - "ovn-openflow-probe-interval", OFCTRL_DEFAULT_PROBE_INTERVAL_SEC); >>>>> -} >>>>> - >>>>> /* Retrieves the pointer to the OVN Southbound database from 'ovs_idl' and >>>>> * updates 'sbdb_idl' with that pointer. */ >>>>> static void >>>>> @@ -4975,8 +4958,7 @@ main(int argc, char *argv[]) >>>>> engine_get_internal_data(&en_lb_data); >>>>> ofctrl_init(&lflow_output_data->group_table, >>>>> - &lflow_output_data->meter_table, >>>>> - get_ofctrl_probe_interval(ovs_idl_loop.idl)); >>>>> + &lflow_output_data->meter_table); >>>>> ofctrl_seqno_init(); >>>>> unixctl_command_register("group-table-list", "", 0, 0, >>>>> @@ -5104,7 +5086,6 @@ main(int argc, char *argv[]) >>>>> &reset_ovnsb_idl_min_index, >>>>> &ctrl_engine_ctx, &ovnsb_expected_cond_seqno); >>>>> update_ssl_config(ovsrec_ssl_table_get(ovs_idl_loop.idl)); >>>>> - ofctrl_set_probe_interval(get_ofctrl_probe_interval(ovs_idl_loop.idl)); >>>>> struct ovsdb_idl_txn *ovnsb_idl_txn >>>>> = ovsdb_idl_loop_run(&ovnsb_idl_loop); >>>>> diff --git a/controller/pinctrl.c b/controller/pinctrl.c >>>>> index c396ad4c2..a86db3f32 100644 >>>>> --- a/controller/pinctrl.c >>>>> +++ b/controller/pinctrl.c >>>>> @@ -3362,7 +3362,7 @@ pinctrl_handler(void *arg_) >>>>> static long long int svc_monitors_next_run_time = LLONG_MAX; >>>>> static long long int send_prefixd_time = LLONG_MAX; >>>>> - swconn = rconn_create(5, 0, DSCP_DEFAULT, 1 << OFP15_VERSION); >>>>> + swconn = rconn_create(0, 0, DSCP_DEFAULT, 1 << OFP15_VERSION); >>>>> while (!latch_is_set(&pctrl->pinctrl_thread_exit)) { >>>>> long long int bfd_time = LLONG_MAX; >>>>> diff --git a/lib/features.c b/lib/features.c >>>>> index 97c9c86f0..ad3357570 100644 >>>>> --- a/lib/features.c >>>>> +++ b/lib/features.c >>>>> @@ -28,11 +28,10 @@ >>>>> #include "openvswitch/ofp-meter.h" >>>>> #include "openvswitch/ofp-util.h" >>>>> #include "ovn/features.h" >>>>> +#include "controller/ofctrl.h" >>>>> VLOG_DEFINE_THIS_MODULE(features); >>>>> -#define FEATURES_DEFAULT_PROBE_INTERVAL_SEC 5 >>>>> - >>>>> struct ovs_feature { >>>>> enum ovs_feature_value value; >>>>> const char *name; >>>>> @@ -82,8 +81,7 @@ static void >>>>> ovs_feature_rconn_setup(const char *br_name) >>>>> { >>>>> if (!swconn) { >>>>> - swconn = rconn_create(FEATURES_DEFAULT_PROBE_INTERVAL_SEC, 0, >>>>> - DSCP_DEFAULT, 1 << OFP15_VERSION); >>>>> + swconn = rconn_create(0, 0, DSCP_DEFAULT, 1 << OFP15_VERSION); >>>>> } >>>>> if (!rconn_is_connected(swconn)) { >>>>> @@ -94,7 +92,6 @@ ovs_feature_rconn_setup(const char *br_name) >>>>> } >>>>> free(target); >>>>> } >>>>> - rconn_set_probe_interval(swconn, FEATURES_DEFAULT_PROBE_INTERVAL_SEC); >>>>> } >>>>> static bool >>>> >>> _______________________________________________ >>> dev mailing list >>> dev@openvswitch.org >>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev Regards, Vladislav Odintsov
On Mon, Sep 11, 2023 at 7:31 AM Vladislav Odintsov <odivlad@gmail.com> wrote: > > Hi Dumitru, > > if eventually this patch got merged, please remove next lines from its commit message: > > "Disablement for ovs-vswitchd -> ovn-controller OF inacivity probing will > be done in the next patch." > > Unfortunately I was unsuccessful in removing these probing direction (default 60s). > Maybe you can give any hint how to solve this [1]. > > 1: https://mail.openvswitch.org/pipermail/ovs-dev/2023-June/405445.html > > > On 11 Sep 2023, at 15:57, Dumitru Ceara <dceara@redhat.com> wrote: > > > > On 9/11/23 14:54, Dumitru Ceara wrote: > >> On 6/14/23 20:02, Vladislav Odintsov wrote: > >>> Hi Mark, > >>> > >>> thanks for taking time to look on this! > >>> > >>> Your point with a hung OVS is really an interesting case. > >>> > >>> From one hand it’s a possible situation, and from another I guess it’s much higher probability for OVS to be busy by some other work rather than to hang in a loop. In my installation the first happening sometimes (OVS business by real work). The reasons can be different but I can’t say that it’s better to drop OF connection in such a moment (possibly triggering ha chassis-redirect ports failover). > >>> > >>> At the same time if we’re talking about L3GW node, which has a gateway chassis redirect port, the hung OVS should be detected by other nodes with BFD probes (if it’s a ha chassis-redirect port). > >>> And if it’s a normal hypervisor (with just a vif ports), so what can we do with it? I guess if we drop OF connection, it won’t make any positive changes. Maybe just release memory needed for handling this connection and not allocate buffers…? Unfortunately I can’t evaluate this. > >>> > >>> Moreover, the OVN default is a disabled probing. What can be a possible recommended values to set probes if to leave it as is? How should users find out that they have to configure this? Currently in docs there is only information that this option configures probe interval. But it’s not obvious when to configure it and which value to set. > >>> > >> > >> In support of this, quoting the original commit that added the probe > >> config knob: > >> > >> "The main intention of this patch is to fix this recompuation issue for > >> older versions (there by requesting backport), it still would be > >> beneficial with the incremental processing engine." > >> > >> That was back in 2019: > >> > >> https://github.com/ovn-org/ovn/commit/c99069c8934c9ea55d310a8b6d48fb66aa477589 To clarify a little more, the effect of the above commit in fact changed the default behavior of probe from disabled (which is the default behavior of all unix/punix rconn) to 5s, and was changed back to default disabled by: https://github.com/ovn-org/ovn/commit/b8af8549396e62d6523be18e104352e334825783 So I am totally fine to remove this config if no one would ever need to enable the probe. > >> > >> Since then lots of performance improvements landed in OVN > >> (ovn-controller too), we're probably fine. Checking > >> ovn-org/ovn-kubernetes code I see they were also setting > >> ovn-openflow-probe-interval "just in case" so they probably won't be > >> affected by it getting removed: > >> > >> https://github.com/ovn-org/ovn-kubernetes/commit/be1eb843852cd4c490514cec60553cca4feaf18e > >> > >>> I hope this makes sense. > >>> > >> > >> The patch in its current form looks OK to me, I'm OK to merge it if Mark > >> doesn't have anything against it; therefore: > >> > >> Acked-by: Dumitru Ceara <dceara@redhat.com> > >> > > > > I forgot to mention this earlier but, since this patch was posted way > > before soft freeze and since the changes seem contained, it's probably > > fair to apply this to the 23.09 branch too. What do you think, Mark? > > > > Thanks! > > > >> Regards, > >> Dumitru > >> > >>> Also, which Han’s patch to OVS you’re talking about? I looked through open OVS patches and haven’t seen any. Please point me out. > >>> I don't remember that I posted any patch related to this. We have a patch downstream that changed the hardcoded 60s to 0 in OVS for the mgmt controller. > >>> regards, > >>> Vladislav Odintsov > >>> > >>>> On 9 Jun 2023, at 18:28, Mark Michelson <mmichels@redhat.com> wrote: > >>>> > >>>> Hi, > >>>> > >>>> Thanks for linking the email thread, since I had the exact same concern that Numan did about detecting if ovs-vswitchd goes down. It sounds like because of the nature of unix sockets, this disconnection is still detected by OVN and failover can occur as expected. > >>>> > >>>> But what about if ovs-vswitchd is still running but in a "compromised" state. For instance, what if a coding error in ovs-vswitchd results in it running an infinite loop? In such a case, the unix connection would still be alive, but OVN would not be able to communicate with the ovs-vswitchd process. > >>>> > >>>> Does this situation eventually lead to OVN disconnecting anyway after it fills up the unix socket's buffer? If so, how long does that take? Or does it lead to OVN slowly growing in memory usage while it waits forever to be able to write to the socket? Hi Mark, if OVS is in such infinite loop, disconnect it from ovn-controller and reconnect would not help anyway, right? > >>>> > >>>> I think removing the hard-coded 5 second probe intervals from pinctrl.c and features.c is a good idea. But I think we should leave the configurable probe interval used by ofctrl.c for the case I described before. Since Han's patch to OVS should disable probe intervals from the OVS side by default, we would not trigger bad behavior simply because OVN has a lot of work to do during a recompute. What do you think? > >>>>> On 6/8/23 10:15, Vladislav Odintsov wrote: > >>>>> OpenFlow connection is established over unix socket, which is a reliable > >>>>> connection and doesn't require additional probing. > >>>>> With this patch openflow probing in ovn-controller -> ovs-vswitchd > >>>>> direction is disabled for all three connections: > >>>>> - OF flows management connection, > >>>>> - OF features negotiation connection, > >>>>> - pinctrl connection. > >>>>> ovn-controller external_ids:ovn-openflow-probe-interval is removed as > >>>>> non-needed anymore. > >>>>> Disablement for ovs-vswitchd -> ovn-controller OF inacivity probing will > >>>>> be done in the next patch. > >>>>> Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2023-May/404625.html > >>>>> Signed-off-by: Vladislav Odintsov <odivlad@gmail.com> > >>>>> --- > >>>>> NEWS | 5 +++++ > >>>>> controller/ofctrl.c | 14 ++------------ > >>>>> controller/ofctrl.h | 4 +--- > >>>>> controller/ovn-controller.8.xml | 14 -------------- > >>>>> controller/ovn-controller.c | 21 +-------------------- > >>>>> controller/pinctrl.c | 2 +- > >>>>> lib/features.c | 7 ++----- > >>>>> 7 files changed, 12 insertions(+), 55 deletions(-) > >>>>> diff --git a/NEWS b/NEWS > >>>>> index 645acea1f..bd63b187b 100644 > >>>>> --- a/NEWS > >>>>> +++ b/NEWS > >>>>> @@ -1,5 +1,10 @@ > >>>>> Post v23.06.0 > >>>>> ------------- > >>>>> + - Disable OpenFlow inactivity probing between ovn-controller and OVS. > >>>>> + OF connection is established over unix socket, which is a reliable > >>>>> + connection method and doesn't require additional probing. > >>>>> + external_ids:ovn-openflow-probe-interval configuration option for > >>>>> + ovn-controller is no longer matter. nit: s/is no longer matter/is removed Acked-by: Han Zhou <hzhou@ovn.org> (waiting for Mark's confirm about this concerns) > >>>>> OVN v23.06.0 - 01 Jun 2023 > >>>>> -------------------------- > >>>>> diff --git a/controller/ofctrl.c b/controller/ofctrl.c > >>>>> index 64a444ff6..788634494 100644 > >>>>> --- a/controller/ofctrl.c > >>>>> +++ b/controller/ofctrl.c > >>>>> @@ -415,11 +415,9 @@ static void ofctrl_recv(const struct ofp_header *, enum ofptype); > >>>>> void > >>>>> ofctrl_init(struct ovn_extend_table *group_table, > >>>>> - struct ovn_extend_table *meter_table, > >>>>> - int inactivity_probe_interval) > >>>>> + struct ovn_extend_table *meter_table) > >>>>> { > >>>>> - swconn = rconn_create(inactivity_probe_interval, 0, > >>>>> - DSCP_DEFAULT, 1 << OFP15_VERSION); > >>>>> + swconn = rconn_create(0, 0, DSCP_DEFAULT, 1 << OFP15_VERSION); > >>>>> tx_counter = rconn_packet_counter_create(); > >>>>> hmap_init(&installed_lflows); > >>>>> hmap_init(&installed_pflows); > >>>>> @@ -2986,14 +2984,6 @@ ofctrl_is_connected(void) > >>>>> return rconn_is_connected(swconn); > >>>>> } > >>>>> -void > >>>>> -ofctrl_set_probe_interval(int probe_interval) > >>>>> -{ > >>>>> - if (swconn) { > >>>>> - rconn_set_probe_interval(swconn, probe_interval); > >>>>> - } > >>>>> -} > >>>>> - > >>>>> void > >>>>> ofctrl_get_memory_usage(struct simap *usage) > >>>>> { > >>>>> diff --git a/controller/ofctrl.h b/controller/ofctrl.h > >>>>> index 105f9370b..46bfccd85 100644 > >>>>> --- a/controller/ofctrl.h > >>>>> +++ b/controller/ofctrl.h > >>>>> @@ -49,8 +49,7 @@ struct ovn_desired_flow_table { > >>>>> /* Interface for OVN main loop. */ > >>>>> void ofctrl_init(struct ovn_extend_table *group_table, > >>>>> - struct ovn_extend_table *meter_table, > >>>>> - int inactivity_probe_interval); > >>>>> + struct ovn_extend_table *meter_table); > >>>>> bool ofctrl_run(const struct ovsrec_bridge *br_int, > >>>>> const struct ovsrec_open_vswitch_table *, > >>>>> struct shash *pending_ct_zones); > >>>>> @@ -142,7 +141,6 @@ void ofctrl_check_and_add_flow_metered(struct ovn_desired_flow_table *, > >>>>> bool ofctrl_is_connected(void); > >>>>> -void ofctrl_set_probe_interval(int probe_interval); > >>>>> void ofctrl_get_memory_usage(struct simap *usage); > >>>>> #endif /* controller/ofctrl.h */ > >>>>> diff --git a/controller/ovn-controller.8.xml b/controller/ovn-controller.8.xml > >>>>> index f61f43008..52eb137d3 100644 > >>>>> --- a/controller/ovn-controller.8.xml > >>>>> +++ b/controller/ovn-controller.8.xml > >>>>> @@ -147,20 +147,6 @@ > >>>>> </p> > >>>>> </dd> > >>>>> - <dt><code>external_ids:ovn-openflow-probe-interval</code></dt> > >>>>> - <dd> > >>>>> - <p> > >>>>> - The inactivity probe interval of the OpenFlow connection to the > >>>>> - OpenvSwitch integration bridge, in seconds. > >>>>> - If the value is zero, it disables the connection keepalive feature. > >>>>> - </p> > >>>>> - > >>>>> - <p> > >>>>> - If the value is nonzero, then it will be forced to a value of > >>>>> - at least 5s. > >>>>> - </p> > >>>>> - </dd> > >>>>> - > >>>>> <dt><code>external_ids:ovn-encap-type</code></dt> > >>>>> <dd> > >>>>> <p> > >>>>> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c > >>>>> index 3a81a13fb..732e7a690 100644 > >>>>> --- a/controller/ovn-controller.c > >>>>> +++ b/controller/ovn-controller.c > >>>>> @@ -105,7 +105,6 @@ static unixctl_cb_func debug_ignore_startup_delay; > >>>>> #define DEFAULT_BRIDGE_NAME "br-int" > >>>>> #define DEFAULT_DATAPATH "system" > >>>>> -#define OFCTRL_DEFAULT_PROBE_INTERVAL_SEC 0 > >>>>> #define CONTROLLER_LOOP_STOPWATCH_NAME "flow-generation" > >>>>> #define OFCTRL_PUT_STOPWATCH_NAME "flow-installation" > >>>>> @@ -556,22 +555,6 @@ update_ssl_config(const struct ovsrec_ssl_table *ssl_table) > >>>>> } > >>>>> } > >>>>> -static int > >>>>> -get_ofctrl_probe_interval(struct ovsdb_idl *ovs_idl) > >>>>> -{ > >>>>> - const struct ovsrec_open_vswitch *cfg = ovsrec_open_vswitch_first(ovs_idl); > >>>>> - if (!cfg) { > >>>>> - return OFCTRL_DEFAULT_PROBE_INTERVAL_SEC; > >>>>> - } > >>>>> - > >>>>> - const struct ovsrec_open_vswitch_table *ovs_table = > >>>>> - ovsrec_open_vswitch_table_get(ovs_idl); > >>>>> - const char *chassis_id = get_ovs_chassis_id(ovs_table); > >>>>> - return get_chassis_external_id_value_int( > >>>>> - &cfg->external_ids, chassis_id, > >>>>> - "ovn-openflow-probe-interval", OFCTRL_DEFAULT_PROBE_INTERVAL_SEC); > >>>>> -} > >>>>> - > >>>>> /* Retrieves the pointer to the OVN Southbound database from 'ovs_idl' and > >>>>> * updates 'sbdb_idl' with that pointer. */ > >>>>> static void > >>>>> @@ -4975,8 +4958,7 @@ main(int argc, char *argv[]) > >>>>> engine_get_internal_data(&en_lb_data); > >>>>> ofctrl_init(&lflow_output_data->group_table, > >>>>> - &lflow_output_data->meter_table, > >>>>> - get_ofctrl_probe_interval(ovs_idl_loop.idl)); > >>>>> + &lflow_output_data->meter_table); > >>>>> ofctrl_seqno_init(); > >>>>> unixctl_command_register("group-table-list", "", 0, 0, > >>>>> @@ -5104,7 +5086,6 @@ main(int argc, char *argv[]) > >>>>> &reset_ovnsb_idl_min_index, > >>>>> &ctrl_engine_ctx, &ovnsb_expected_cond_seqno); > >>>>> update_ssl_config(ovsrec_ssl_table_get(ovs_idl_loop.idl)); > >>>>> - ofctrl_set_probe_interval(get_ofctrl_probe_interval(ovs_idl_loop.idl)); > >>>>> struct ovsdb_idl_txn *ovnsb_idl_txn > >>>>> = ovsdb_idl_loop_run(&ovnsb_idl_loop); > >>>>> diff --git a/controller/pinctrl.c b/controller/pinctrl.c > >>>>> index c396ad4c2..a86db3f32 100644 > >>>>> --- a/controller/pinctrl.c > >>>>> +++ b/controller/pinctrl.c > >>>>> @@ -3362,7 +3362,7 @@ pinctrl_handler(void *arg_) > >>>>> static long long int svc_monitors_next_run_time = LLONG_MAX; > >>>>> static long long int send_prefixd_time = LLONG_MAX; > >>>>> - swconn = rconn_create(5, 0, DSCP_DEFAULT, 1 << OFP15_VERSION); > >>>>> + swconn = rconn_create(0, 0, DSCP_DEFAULT, 1 << OFP15_VERSION); > >>>>> while (!latch_is_set(&pctrl->pinctrl_thread_exit)) { > >>>>> long long int bfd_time = LLONG_MAX; > >>>>> diff --git a/lib/features.c b/lib/features.c > >>>>> index 97c9c86f0..ad3357570 100644 > >>>>> --- a/lib/features.c > >>>>> +++ b/lib/features.c > >>>>> @@ -28,11 +28,10 @@ > >>>>> #include "openvswitch/ofp-meter.h" > >>>>> #include "openvswitch/ofp-util.h" > >>>>> #include "ovn/features.h" > >>>>> +#include "controller/ofctrl.h" > >>>>> VLOG_DEFINE_THIS_MODULE(features); > >>>>> -#define FEATURES_DEFAULT_PROBE_INTERVAL_SEC 5 > >>>>> - > >>>>> struct ovs_feature { > >>>>> enum ovs_feature_value value; > >>>>> const char *name; > >>>>> @@ -82,8 +81,7 @@ static void > >>>>> ovs_feature_rconn_setup(const char *br_name) > >>>>> { > >>>>> if (!swconn) { > >>>>> - swconn = rconn_create(FEATURES_DEFAULT_PROBE_INTERVAL_SEC, 0, > >>>>> - DSCP_DEFAULT, 1 << OFP15_VERSION); > >>>>> + swconn = rconn_create(0, 0, DSCP_DEFAULT, 1 << OFP15_VERSION); > >>>>> } > >>>>> if (!rconn_is_connected(swconn)) { > >>>>> @@ -94,7 +92,6 @@ ovs_feature_rconn_setup(const char *br_name) > >>>>> } > >>>>> free(target); > >>>>> } > >>>>> - rconn_set_probe_interval(swconn, FEATURES_DEFAULT_PROBE_INTERVAL_SEC); > >>>>> } > >>>>> static bool > >>>> > >>> _______________________________________________ > >>> dev mailing list > >>> dev@openvswitch.org > >>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev > > > Regards, > Vladislav Odintsov > > _______________________________________________ > dev mailing list > dev@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
diff --git a/NEWS b/NEWS index 645acea1f..bd63b187b 100644 --- a/NEWS +++ b/NEWS @@ -1,5 +1,10 @@ Post v23.06.0 ------------- + - Disable OpenFlow inactivity probing between ovn-controller and OVS. + OF connection is established over unix socket, which is a reliable + connection method and doesn't require additional probing. + external_ids:ovn-openflow-probe-interval configuration option for + ovn-controller is no longer matter. OVN v23.06.0 - 01 Jun 2023 -------------------------- diff --git a/controller/ofctrl.c b/controller/ofctrl.c index 64a444ff6..788634494 100644 --- a/controller/ofctrl.c +++ b/controller/ofctrl.c @@ -415,11 +415,9 @@ static void ofctrl_recv(const struct ofp_header *, enum ofptype); void ofctrl_init(struct ovn_extend_table *group_table, - struct ovn_extend_table *meter_table, - int inactivity_probe_interval) + struct ovn_extend_table *meter_table) { - swconn = rconn_create(inactivity_probe_interval, 0, - DSCP_DEFAULT, 1 << OFP15_VERSION); + swconn = rconn_create(0, 0, DSCP_DEFAULT, 1 << OFP15_VERSION); tx_counter = rconn_packet_counter_create(); hmap_init(&installed_lflows); hmap_init(&installed_pflows); @@ -2986,14 +2984,6 @@ ofctrl_is_connected(void) return rconn_is_connected(swconn); } -void -ofctrl_set_probe_interval(int probe_interval) -{ - if (swconn) { - rconn_set_probe_interval(swconn, probe_interval); - } -} - void ofctrl_get_memory_usage(struct simap *usage) { diff --git a/controller/ofctrl.h b/controller/ofctrl.h index 105f9370b..46bfccd85 100644 --- a/controller/ofctrl.h +++ b/controller/ofctrl.h @@ -49,8 +49,7 @@ struct ovn_desired_flow_table { /* Interface for OVN main loop. */ void ofctrl_init(struct ovn_extend_table *group_table, - struct ovn_extend_table *meter_table, - int inactivity_probe_interval); + struct ovn_extend_table *meter_table); bool ofctrl_run(const struct ovsrec_bridge *br_int, const struct ovsrec_open_vswitch_table *, struct shash *pending_ct_zones); @@ -142,7 +141,6 @@ void ofctrl_check_and_add_flow_metered(struct ovn_desired_flow_table *, bool ofctrl_is_connected(void); -void ofctrl_set_probe_interval(int probe_interval); void ofctrl_get_memory_usage(struct simap *usage); #endif /* controller/ofctrl.h */ diff --git a/controller/ovn-controller.8.xml b/controller/ovn-controller.8.xml index f61f43008..52eb137d3 100644 --- a/controller/ovn-controller.8.xml +++ b/controller/ovn-controller.8.xml @@ -147,20 +147,6 @@ </p> </dd> - <dt><code>external_ids:ovn-openflow-probe-interval</code></dt> - <dd> - <p> - The inactivity probe interval of the OpenFlow connection to the - OpenvSwitch integration bridge, in seconds. - If the value is zero, it disables the connection keepalive feature. - </p> - - <p> - If the value is nonzero, then it will be forced to a value of - at least 5s. - </p> - </dd> - <dt><code>external_ids:ovn-encap-type</code></dt> <dd> <p> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c index 3a81a13fb..732e7a690 100644 --- a/controller/ovn-controller.c +++ b/controller/ovn-controller.c @@ -105,7 +105,6 @@ static unixctl_cb_func debug_ignore_startup_delay; #define DEFAULT_BRIDGE_NAME "br-int" #define DEFAULT_DATAPATH "system" -#define OFCTRL_DEFAULT_PROBE_INTERVAL_SEC 0 #define CONTROLLER_LOOP_STOPWATCH_NAME "flow-generation" #define OFCTRL_PUT_STOPWATCH_NAME "flow-installation" @@ -556,22 +555,6 @@ update_ssl_config(const struct ovsrec_ssl_table *ssl_table) } } -static int -get_ofctrl_probe_interval(struct ovsdb_idl *ovs_idl) -{ - const struct ovsrec_open_vswitch *cfg = ovsrec_open_vswitch_first(ovs_idl); - if (!cfg) { - return OFCTRL_DEFAULT_PROBE_INTERVAL_SEC; - } - - const struct ovsrec_open_vswitch_table *ovs_table = - ovsrec_open_vswitch_table_get(ovs_idl); - const char *chassis_id = get_ovs_chassis_id(ovs_table); - return get_chassis_external_id_value_int( - &cfg->external_ids, chassis_id, - "ovn-openflow-probe-interval", OFCTRL_DEFAULT_PROBE_INTERVAL_SEC); -} - /* Retrieves the pointer to the OVN Southbound database from 'ovs_idl' and * updates 'sbdb_idl' with that pointer. */ static void @@ -4975,8 +4958,7 @@ main(int argc, char *argv[]) engine_get_internal_data(&en_lb_data); ofctrl_init(&lflow_output_data->group_table, - &lflow_output_data->meter_table, - get_ofctrl_probe_interval(ovs_idl_loop.idl)); + &lflow_output_data->meter_table); ofctrl_seqno_init(); unixctl_command_register("group-table-list", "", 0, 0, @@ -5104,7 +5086,6 @@ main(int argc, char *argv[]) &reset_ovnsb_idl_min_index, &ctrl_engine_ctx, &ovnsb_expected_cond_seqno); update_ssl_config(ovsrec_ssl_table_get(ovs_idl_loop.idl)); - ofctrl_set_probe_interval(get_ofctrl_probe_interval(ovs_idl_loop.idl)); struct ovsdb_idl_txn *ovnsb_idl_txn = ovsdb_idl_loop_run(&ovnsb_idl_loop); diff --git a/controller/pinctrl.c b/controller/pinctrl.c index c396ad4c2..a86db3f32 100644 --- a/controller/pinctrl.c +++ b/controller/pinctrl.c @@ -3362,7 +3362,7 @@ pinctrl_handler(void *arg_) static long long int svc_monitors_next_run_time = LLONG_MAX; static long long int send_prefixd_time = LLONG_MAX; - swconn = rconn_create(5, 0, DSCP_DEFAULT, 1 << OFP15_VERSION); + swconn = rconn_create(0, 0, DSCP_DEFAULT, 1 << OFP15_VERSION); while (!latch_is_set(&pctrl->pinctrl_thread_exit)) { long long int bfd_time = LLONG_MAX; diff --git a/lib/features.c b/lib/features.c index 97c9c86f0..ad3357570 100644 --- a/lib/features.c +++ b/lib/features.c @@ -28,11 +28,10 @@ #include "openvswitch/ofp-meter.h" #include "openvswitch/ofp-util.h" #include "ovn/features.h" +#include "controller/ofctrl.h" VLOG_DEFINE_THIS_MODULE(features); -#define FEATURES_DEFAULT_PROBE_INTERVAL_SEC 5 - struct ovs_feature { enum ovs_feature_value value; const char *name; @@ -82,8 +81,7 @@ static void ovs_feature_rconn_setup(const char *br_name) { if (!swconn) { - swconn = rconn_create(FEATURES_DEFAULT_PROBE_INTERVAL_SEC, 0, - DSCP_DEFAULT, 1 << OFP15_VERSION); + swconn = rconn_create(0, 0, DSCP_DEFAULT, 1 << OFP15_VERSION); } if (!rconn_is_connected(swconn)) { @@ -94,7 +92,6 @@ ovs_feature_rconn_setup(const char *br_name) } free(target); } - rconn_set_probe_interval(swconn, FEATURES_DEFAULT_PROBE_INTERVAL_SEC); } static bool
OpenFlow connection is established over unix socket, which is a reliable connection and doesn't require additional probing. With this patch openflow probing in ovn-controller -> ovs-vswitchd direction is disabled for all three connections: - OF flows management connection, - OF features negotiation connection, - pinctrl connection. ovn-controller external_ids:ovn-openflow-probe-interval is removed as non-needed anymore. Disablement for ovs-vswitchd -> ovn-controller OF inacivity probing will be done in the next patch. Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2023-May/404625.html Signed-off-by: Vladislav Odintsov <odivlad@gmail.com> --- NEWS | 5 +++++ controller/ofctrl.c | 14 ++------------ controller/ofctrl.h | 4 +--- controller/ovn-controller.8.xml | 14 -------------- controller/ovn-controller.c | 21 +-------------------- controller/pinctrl.c | 2 +- lib/features.c | 7 ++----- 7 files changed, 12 insertions(+), 55 deletions(-)