diff mbox series

[ovs-dev] controller: disable OpenFlow inactivity probing

Message ID 20230608141555.1339905-1-odivlad@gmail.com
State Accepted
Headers show
Series [ovs-dev] controller: disable OpenFlow inactivity probing | expand

Checks

Context Check Description
ovsrobot/apply-robot success apply and check: success
ovsrobot/github-robot-_Build_and_Test fail github build: failed
ovsrobot/github-robot-_ovn-kubernetes fail github build: failed

Commit Message

Vladislav Odintsov June 8, 2023, 2:15 p.m. UTC
OpenFlow connection is established over unix socket, which is a reliable
connection and doesn't require additional probing.

With this patch openflow probing in ovn-controller -> ovs-vswitchd
direction is disabled for all three connections:
  - OF flows management connection,
  - OF features negotiation connection,
  - pinctrl connection.

ovn-controller external_ids:ovn-openflow-probe-interval is removed as
non-needed anymore.

Disablement for ovs-vswitchd -> ovn-controller OF inacivity probing will
be done in the next patch.

Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2023-May/404625.html
Signed-off-by: Vladislav Odintsov <odivlad@gmail.com>
---
 NEWS                            |  5 +++++
 controller/ofctrl.c             | 14 ++------------
 controller/ofctrl.h             |  4 +---
 controller/ovn-controller.8.xml | 14 --------------
 controller/ovn-controller.c     | 21 +--------------------
 controller/pinctrl.c            |  2 +-
 lib/features.c                  |  7 ++-----
 7 files changed, 12 insertions(+), 55 deletions(-)

Comments

Mark Michelson June 9, 2023, 3:28 p.m. UTC | #1
Hi,

Thanks for linking the email thread, since I had the exact same concern 
that Numan did about detecting if ovs-vswitchd goes down. It sounds like 
because of the nature of unix sockets, this disconnection is still 
detected by OVN and failover can occur as expected.

But what about if ovs-vswitchd is still running but in a "compromised" 
state. For instance, what if a coding error in ovs-vswitchd results in 
it running an infinite loop? In such a case, the unix connection would 
still be alive, but OVN would not be able to communicate with the 
ovs-vswitchd process.

Does this situation eventually lead to OVN disconnecting anyway after it 
fills up the unix socket's buffer? If so, how long does that take? Or 
does it lead to OVN slowly growing in memory usage while it waits 
forever to be able to write to the socket?

I think removing the hard-coded 5 second probe intervals from pinctrl.c 
and features.c is a good idea. But I think we should leave the 
configurable probe interval used by ofctrl.c for the case I described 
before. Since Han's patch to OVS should disable probe intervals from the 
OVS side by default, we would not trigger bad behavior simply because 
OVN has a lot of work to do during a recompute. What do you think?

On 6/8/23 10:15, Vladislav Odintsov wrote:
> OpenFlow connection is established over unix socket, which is a reliable
> connection and doesn't require additional probing.
> 
> With this patch openflow probing in ovn-controller -> ovs-vswitchd
> direction is disabled for all three connections:
>    - OF flows management connection,
>    - OF features negotiation connection,
>    - pinctrl connection.
> 
> ovn-controller external_ids:ovn-openflow-probe-interval is removed as
> non-needed anymore.
> 
> Disablement for ovs-vswitchd -> ovn-controller OF inacivity probing will
> be done in the next patch.
> 
> Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2023-May/404625.html
> Signed-off-by: Vladislav Odintsov <odivlad@gmail.com>
> ---
>   NEWS                            |  5 +++++
>   controller/ofctrl.c             | 14 ++------------
>   controller/ofctrl.h             |  4 +---
>   controller/ovn-controller.8.xml | 14 --------------
>   controller/ovn-controller.c     | 21 +--------------------
>   controller/pinctrl.c            |  2 +-
>   lib/features.c                  |  7 ++-----
>   7 files changed, 12 insertions(+), 55 deletions(-)
> 
> diff --git a/NEWS b/NEWS
> index 645acea1f..bd63b187b 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -1,5 +1,10 @@
>   Post v23.06.0
>   -------------
> +  - Disable OpenFlow inactivity probing between ovn-controller and OVS.
> +    OF connection is established over unix socket, which is a reliable
> +    connection method and doesn't require additional probing.
> +    external_ids:ovn-openflow-probe-interval configuration option for
> +    ovn-controller is no longer matter.
>   
>   OVN v23.06.0 - 01 Jun 2023
>   --------------------------
> diff --git a/controller/ofctrl.c b/controller/ofctrl.c
> index 64a444ff6..788634494 100644
> --- a/controller/ofctrl.c
> +++ b/controller/ofctrl.c
> @@ -415,11 +415,9 @@ static void ofctrl_recv(const struct ofp_header *, enum ofptype);
>   
>   void
>   ofctrl_init(struct ovn_extend_table *group_table,
> -            struct ovn_extend_table *meter_table,
> -            int inactivity_probe_interval)
> +            struct ovn_extend_table *meter_table)
>   {
> -    swconn = rconn_create(inactivity_probe_interval, 0,
> -                          DSCP_DEFAULT, 1 << OFP15_VERSION);
> +    swconn = rconn_create(0, 0, DSCP_DEFAULT, 1 << OFP15_VERSION);
>       tx_counter = rconn_packet_counter_create();
>       hmap_init(&installed_lflows);
>       hmap_init(&installed_pflows);
> @@ -2986,14 +2984,6 @@ ofctrl_is_connected(void)
>       return rconn_is_connected(swconn);
>   }
>   
> -void
> -ofctrl_set_probe_interval(int probe_interval)
> -{
> -    if (swconn) {
> -        rconn_set_probe_interval(swconn, probe_interval);
> -    }
> -}
> -
>   void
>   ofctrl_get_memory_usage(struct simap *usage)
>   {
> diff --git a/controller/ofctrl.h b/controller/ofctrl.h
> index 105f9370b..46bfccd85 100644
> --- a/controller/ofctrl.h
> +++ b/controller/ofctrl.h
> @@ -49,8 +49,7 @@ struct ovn_desired_flow_table {
>   
>   /* Interface for OVN main loop. */
>   void ofctrl_init(struct ovn_extend_table *group_table,
> -                 struct ovn_extend_table *meter_table,
> -                 int inactivity_probe_interval);
> +                 struct ovn_extend_table *meter_table);
>   bool ofctrl_run(const struct ovsrec_bridge *br_int,
>                   const struct ovsrec_open_vswitch_table *,
>                   struct shash *pending_ct_zones);
> @@ -142,7 +141,6 @@ void ofctrl_check_and_add_flow_metered(struct ovn_desired_flow_table *,
>   
>   
>   bool ofctrl_is_connected(void);
> -void ofctrl_set_probe_interval(int probe_interval);
>   void ofctrl_get_memory_usage(struct simap *usage);
>   
>   #endif /* controller/ofctrl.h */
> diff --git a/controller/ovn-controller.8.xml b/controller/ovn-controller.8.xml
> index f61f43008..52eb137d3 100644
> --- a/controller/ovn-controller.8.xml
> +++ b/controller/ovn-controller.8.xml
> @@ -147,20 +147,6 @@
>           </p>
>         </dd>
>   
> -      <dt><code>external_ids:ovn-openflow-probe-interval</code></dt>
> -      <dd>
> -        <p>
> -          The inactivity probe interval of the OpenFlow connection to the
> -          OpenvSwitch integration bridge, in seconds.
> -          If the value is zero, it disables the connection keepalive feature.
> -        </p>
> -
> -        <p>
> -          If the value is nonzero, then it will be forced to a value of
> -          at least 5s.
> -        </p>
> -      </dd>
> -
>         <dt><code>external_ids:ovn-encap-type</code></dt>
>         <dd>
>           <p>
> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
> index 3a81a13fb..732e7a690 100644
> --- a/controller/ovn-controller.c
> +++ b/controller/ovn-controller.c
> @@ -105,7 +105,6 @@ static unixctl_cb_func debug_ignore_startup_delay;
>   
>   #define DEFAULT_BRIDGE_NAME "br-int"
>   #define DEFAULT_DATAPATH "system"
> -#define OFCTRL_DEFAULT_PROBE_INTERVAL_SEC 0
>   
>   #define CONTROLLER_LOOP_STOPWATCH_NAME "flow-generation"
>   #define OFCTRL_PUT_STOPWATCH_NAME "flow-installation"
> @@ -556,22 +555,6 @@ update_ssl_config(const struct ovsrec_ssl_table *ssl_table)
>       }
>   }
>   
> -static int
> -get_ofctrl_probe_interval(struct ovsdb_idl *ovs_idl)
> -{
> -    const struct ovsrec_open_vswitch *cfg = ovsrec_open_vswitch_first(ovs_idl);
> -    if (!cfg) {
> -        return OFCTRL_DEFAULT_PROBE_INTERVAL_SEC;
> -    }
> -
> -    const struct ovsrec_open_vswitch_table *ovs_table =
> -        ovsrec_open_vswitch_table_get(ovs_idl);
> -    const char *chassis_id = get_ovs_chassis_id(ovs_table);
> -    return get_chassis_external_id_value_int(
> -        &cfg->external_ids, chassis_id,
> -        "ovn-openflow-probe-interval", OFCTRL_DEFAULT_PROBE_INTERVAL_SEC);
> -}
> -
>   /* Retrieves the pointer to the OVN Southbound database from 'ovs_idl' and
>    * updates 'sbdb_idl' with that pointer. */
>   static void
> @@ -4975,8 +4958,7 @@ main(int argc, char *argv[])
>           engine_get_internal_data(&en_lb_data);
>   
>       ofctrl_init(&lflow_output_data->group_table,
> -                &lflow_output_data->meter_table,
> -                get_ofctrl_probe_interval(ovs_idl_loop.idl));
> +                &lflow_output_data->meter_table);
>       ofctrl_seqno_init();
>   
>       unixctl_command_register("group-table-list", "", 0, 0,
> @@ -5104,7 +5086,6 @@ main(int argc, char *argv[])
>                        &reset_ovnsb_idl_min_index,
>                        &ctrl_engine_ctx, &ovnsb_expected_cond_seqno);
>           update_ssl_config(ovsrec_ssl_table_get(ovs_idl_loop.idl));
> -        ofctrl_set_probe_interval(get_ofctrl_probe_interval(ovs_idl_loop.idl));
>   
>           struct ovsdb_idl_txn *ovnsb_idl_txn
>               = ovsdb_idl_loop_run(&ovnsb_idl_loop);
> diff --git a/controller/pinctrl.c b/controller/pinctrl.c
> index c396ad4c2..a86db3f32 100644
> --- a/controller/pinctrl.c
> +++ b/controller/pinctrl.c
> @@ -3362,7 +3362,7 @@ pinctrl_handler(void *arg_)
>       static long long int svc_monitors_next_run_time = LLONG_MAX;
>       static long long int send_prefixd_time = LLONG_MAX;
>   
> -    swconn = rconn_create(5, 0, DSCP_DEFAULT, 1 << OFP15_VERSION);
> +    swconn = rconn_create(0, 0, DSCP_DEFAULT, 1 << OFP15_VERSION);
>   
>       while (!latch_is_set(&pctrl->pinctrl_thread_exit)) {
>           long long int bfd_time = LLONG_MAX;
> diff --git a/lib/features.c b/lib/features.c
> index 97c9c86f0..ad3357570 100644
> --- a/lib/features.c
> +++ b/lib/features.c
> @@ -28,11 +28,10 @@
>   #include "openvswitch/ofp-meter.h"
>   #include "openvswitch/ofp-util.h"
>   #include "ovn/features.h"
> +#include "controller/ofctrl.h"
>   
>   VLOG_DEFINE_THIS_MODULE(features);
>   
> -#define FEATURES_DEFAULT_PROBE_INTERVAL_SEC 5
> -
>   struct ovs_feature {
>       enum ovs_feature_value value;
>       const char *name;
> @@ -82,8 +81,7 @@ static void
>   ovs_feature_rconn_setup(const char *br_name)
>   {
>       if (!swconn) {
> -        swconn = rconn_create(FEATURES_DEFAULT_PROBE_INTERVAL_SEC, 0,
> -                              DSCP_DEFAULT, 1 << OFP15_VERSION);
> +        swconn = rconn_create(0, 0, DSCP_DEFAULT, 1 << OFP15_VERSION);
>       }
>   
>       if (!rconn_is_connected(swconn)) {
> @@ -94,7 +92,6 @@ ovs_feature_rconn_setup(const char *br_name)
>           }
>           free(target);
>       }
> -    rconn_set_probe_interval(swconn, FEATURES_DEFAULT_PROBE_INTERVAL_SEC);
>   }
>   
>   static bool
Vladislav Odintsov June 14, 2023, 6:02 p.m. UTC | #2
Hi Mark,

thanks for taking time to look on this!

Your point with a hung OVS is really an interesting case.

From one hand it’s a possible situation, and from another I guess it’s much higher probability for OVS to be busy by some other work rather than to hang in a loop. In my installation the first happening sometimes (OVS business by real work). The reasons can be different but I can’t say that it’s better to drop OF connection in such a moment (possibly triggering ha chassis-redirect ports failover).

At the same time if we’re talking about L3GW node, which has a gateway chassis redirect port, the hung OVS should be detected by other nodes with BFD probes (if it’s a ha chassis-redirect port).
And if it’s a normal hypervisor (with just a vif ports), so what can we do with it? I guess if we drop OF connection, it won’t make any positive changes. Maybe just release memory needed for handling this connection and not allocate buffers…? Unfortunately I can’t evaluate this.

Moreover, the OVN default is a disabled probing. What can be a possible recommended values to set probes if to leave it as is? How should users find out that they have to configure this? Currently in docs there is only information that this option configures probe interval. But it’s not obvious when to configure it and which value to set.

I hope this makes sense.

Also, which Han’s patch to OVS you’re talking about? I looked through open OVS patches and haven’t seen any. Please point me out.

regards,
Vladislav Odintsov

> On 9 Jun 2023, at 18:28, Mark Michelson <mmichels@redhat.com> wrote:
> 
> Hi,
> 
> Thanks for linking the email thread, since I had the exact same concern that Numan did about detecting if ovs-vswitchd goes down. It sounds like because of the nature of unix sockets, this disconnection is still detected by OVN and failover can occur as expected.
> 
> But what about if ovs-vswitchd is still running but in a "compromised" state. For instance, what if a coding error in ovs-vswitchd results in it running an infinite loop? In such a case, the unix connection would still be alive, but OVN would not be able to communicate with the ovs-vswitchd process.
> 
> Does this situation eventually lead to OVN disconnecting anyway after it fills up the unix socket's buffer? If so, how long does that take? Or does it lead to OVN slowly growing in memory usage while it waits forever to be able to write to the socket?
> 
> I think removing the hard-coded 5 second probe intervals from pinctrl.c and features.c is a good idea. But I think we should leave the configurable probe interval used by ofctrl.c for the case I described before. Since Han's patch to OVS should disable probe intervals from the OVS side by default, we would not trigger bad behavior simply because OVN has a lot of work to do during a recompute. What do you think?
>> On 6/8/23 10:15, Vladislav Odintsov wrote:
>> OpenFlow connection is established over unix socket, which is a reliable
>> connection and doesn't require additional probing.
>> With this patch openflow probing in ovn-controller -> ovs-vswitchd
>> direction is disabled for all three connections:
>>   - OF flows management connection,
>>   - OF features negotiation connection,
>>   - pinctrl connection.
>> ovn-controller external_ids:ovn-openflow-probe-interval is removed as
>> non-needed anymore.
>> Disablement for ovs-vswitchd -> ovn-controller OF inacivity probing will
>> be done in the next patch.
>> Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2023-May/404625.html
>> Signed-off-by: Vladislav Odintsov <odivlad@gmail.com>
>> ---
>>  NEWS                            |  5 +++++
>>  controller/ofctrl.c             | 14 ++------------
>>  controller/ofctrl.h             |  4 +---
>>  controller/ovn-controller.8.xml | 14 --------------
>>  controller/ovn-controller.c     | 21 +--------------------
>>  controller/pinctrl.c            |  2 +-
>>  lib/features.c                  |  7 ++-----
>>  7 files changed, 12 insertions(+), 55 deletions(-)
>> diff --git a/NEWS b/NEWS
>> index 645acea1f..bd63b187b 100644
>> --- a/NEWS
>> +++ b/NEWS
>> @@ -1,5 +1,10 @@
>>  Post v23.06.0
>>  -------------
>> +  - Disable OpenFlow inactivity probing between ovn-controller and OVS.
>> +    OF connection is established over unix socket, which is a reliable
>> +    connection method and doesn't require additional probing.
>> +    external_ids:ovn-openflow-probe-interval configuration option for
>> +    ovn-controller is no longer matter.
>>    OVN v23.06.0 - 01 Jun 2023
>>  --------------------------
>> diff --git a/controller/ofctrl.c b/controller/ofctrl.c
>> index 64a444ff6..788634494 100644
>> --- a/controller/ofctrl.c
>> +++ b/controller/ofctrl.c
>> @@ -415,11 +415,9 @@ static void ofctrl_recv(const struct ofp_header *, enum ofptype);
>>    void
>>  ofctrl_init(struct ovn_extend_table *group_table,
>> -            struct ovn_extend_table *meter_table,
>> -            int inactivity_probe_interval)
>> +            struct ovn_extend_table *meter_table)
>>  {
>> -    swconn = rconn_create(inactivity_probe_interval, 0,
>> -                          DSCP_DEFAULT, 1 << OFP15_VERSION);
>> +    swconn = rconn_create(0, 0, DSCP_DEFAULT, 1 << OFP15_VERSION);
>>      tx_counter = rconn_packet_counter_create();
>>      hmap_init(&installed_lflows);
>>      hmap_init(&installed_pflows);
>> @@ -2986,14 +2984,6 @@ ofctrl_is_connected(void)
>>      return rconn_is_connected(swconn);
>>  }
>>  -void
>> -ofctrl_set_probe_interval(int probe_interval)
>> -{
>> -    if (swconn) {
>> -        rconn_set_probe_interval(swconn, probe_interval);
>> -    }
>> -}
>> -
>>  void
>>  ofctrl_get_memory_usage(struct simap *usage)
>>  {
>> diff --git a/controller/ofctrl.h b/controller/ofctrl.h
>> index 105f9370b..46bfccd85 100644
>> --- a/controller/ofctrl.h
>> +++ b/controller/ofctrl.h
>> @@ -49,8 +49,7 @@ struct ovn_desired_flow_table {
>>    /* Interface for OVN main loop. */
>>  void ofctrl_init(struct ovn_extend_table *group_table,
>> -                 struct ovn_extend_table *meter_table,
>> -                 int inactivity_probe_interval);
>> +                 struct ovn_extend_table *meter_table);
>>  bool ofctrl_run(const struct ovsrec_bridge *br_int,
>>                  const struct ovsrec_open_vswitch_table *,
>>                  struct shash *pending_ct_zones);
>> @@ -142,7 +141,6 @@ void ofctrl_check_and_add_flow_metered(struct ovn_desired_flow_table *,
>>      bool ofctrl_is_connected(void);
>> -void ofctrl_set_probe_interval(int probe_interval);
>>  void ofctrl_get_memory_usage(struct simap *usage);
>>    #endif /* controller/ofctrl.h */
>> diff --git a/controller/ovn-controller.8.xml b/controller/ovn-controller.8.xml
>> index f61f43008..52eb137d3 100644
>> --- a/controller/ovn-controller.8.xml
>> +++ b/controller/ovn-controller.8.xml
>> @@ -147,20 +147,6 @@
>>          </p>
>>        </dd>
>>  -      <dt><code>external_ids:ovn-openflow-probe-interval</code></dt>
>> -      <dd>
>> -        <p>
>> -          The inactivity probe interval of the OpenFlow connection to the
>> -          OpenvSwitch integration bridge, in seconds.
>> -          If the value is zero, it disables the connection keepalive feature.
>> -        </p>
>> -
>> -        <p>
>> -          If the value is nonzero, then it will be forced to a value of
>> -          at least 5s.
>> -        </p>
>> -      </dd>
>> -
>>        <dt><code>external_ids:ovn-encap-type</code></dt>
>>        <dd>
>>          <p>
>> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
>> index 3a81a13fb..732e7a690 100644
>> --- a/controller/ovn-controller.c
>> +++ b/controller/ovn-controller.c
>> @@ -105,7 +105,6 @@ static unixctl_cb_func debug_ignore_startup_delay;
>>    #define DEFAULT_BRIDGE_NAME "br-int"
>>  #define DEFAULT_DATAPATH "system"
>> -#define OFCTRL_DEFAULT_PROBE_INTERVAL_SEC 0
>>    #define CONTROLLER_LOOP_STOPWATCH_NAME "flow-generation"
>>  #define OFCTRL_PUT_STOPWATCH_NAME "flow-installation"
>> @@ -556,22 +555,6 @@ update_ssl_config(const struct ovsrec_ssl_table *ssl_table)
>>      }
>>  }
>>  -static int
>> -get_ofctrl_probe_interval(struct ovsdb_idl *ovs_idl)
>> -{
>> -    const struct ovsrec_open_vswitch *cfg = ovsrec_open_vswitch_first(ovs_idl);
>> -    if (!cfg) {
>> -        return OFCTRL_DEFAULT_PROBE_INTERVAL_SEC;
>> -    }
>> -
>> -    const struct ovsrec_open_vswitch_table *ovs_table =
>> -        ovsrec_open_vswitch_table_get(ovs_idl);
>> -    const char *chassis_id = get_ovs_chassis_id(ovs_table);
>> -    return get_chassis_external_id_value_int(
>> -        &cfg->external_ids, chassis_id,
>> -        "ovn-openflow-probe-interval", OFCTRL_DEFAULT_PROBE_INTERVAL_SEC);
>> -}
>> -
>>  /* Retrieves the pointer to the OVN Southbound database from 'ovs_idl' and
>>   * updates 'sbdb_idl' with that pointer. */
>>  static void
>> @@ -4975,8 +4958,7 @@ main(int argc, char *argv[])
>>          engine_get_internal_data(&en_lb_data);
>>        ofctrl_init(&lflow_output_data->group_table,
>> -                &lflow_output_data->meter_table,
>> -                get_ofctrl_probe_interval(ovs_idl_loop.idl));
>> +                &lflow_output_data->meter_table);
>>      ofctrl_seqno_init();
>>        unixctl_command_register("group-table-list", "", 0, 0,
>> @@ -5104,7 +5086,6 @@ main(int argc, char *argv[])
>>                       &reset_ovnsb_idl_min_index,
>>                       &ctrl_engine_ctx, &ovnsb_expected_cond_seqno);
>>          update_ssl_config(ovsrec_ssl_table_get(ovs_idl_loop.idl));
>> -        ofctrl_set_probe_interval(get_ofctrl_probe_interval(ovs_idl_loop.idl));
>>            struct ovsdb_idl_txn *ovnsb_idl_txn
>>              = ovsdb_idl_loop_run(&ovnsb_idl_loop);
>> diff --git a/controller/pinctrl.c b/controller/pinctrl.c
>> index c396ad4c2..a86db3f32 100644
>> --- a/controller/pinctrl.c
>> +++ b/controller/pinctrl.c
>> @@ -3362,7 +3362,7 @@ pinctrl_handler(void *arg_)
>>      static long long int svc_monitors_next_run_time = LLONG_MAX;
>>      static long long int send_prefixd_time = LLONG_MAX;
>>  -    swconn = rconn_create(5, 0, DSCP_DEFAULT, 1 << OFP15_VERSION);
>> +    swconn = rconn_create(0, 0, DSCP_DEFAULT, 1 << OFP15_VERSION);
>>        while (!latch_is_set(&pctrl->pinctrl_thread_exit)) {
>>          long long int bfd_time = LLONG_MAX;
>> diff --git a/lib/features.c b/lib/features.c
>> index 97c9c86f0..ad3357570 100644
>> --- a/lib/features.c
>> +++ b/lib/features.c
>> @@ -28,11 +28,10 @@
>>  #include "openvswitch/ofp-meter.h"
>>  #include "openvswitch/ofp-util.h"
>>  #include "ovn/features.h"
>> +#include "controller/ofctrl.h"
>>    VLOG_DEFINE_THIS_MODULE(features);
>>  -#define FEATURES_DEFAULT_PROBE_INTERVAL_SEC 5
>> -
>>  struct ovs_feature {
>>      enum ovs_feature_value value;
>>      const char *name;
>> @@ -82,8 +81,7 @@ static void
>>  ovs_feature_rconn_setup(const char *br_name)
>>  {
>>      if (!swconn) {
>> -        swconn = rconn_create(FEATURES_DEFAULT_PROBE_INTERVAL_SEC, 0,
>> -                              DSCP_DEFAULT, 1 << OFP15_VERSION);
>> +        swconn = rconn_create(0, 0, DSCP_DEFAULT, 1 << OFP15_VERSION);
>>      }
>>        if (!rconn_is_connected(swconn)) {
>> @@ -94,7 +92,6 @@ ovs_feature_rconn_setup(const char *br_name)
>>          }
>>          free(target);
>>      }
>> -    rconn_set_probe_interval(swconn, FEATURES_DEFAULT_PROBE_INTERVAL_SEC);
>>  }
>>    static bool
>
Dumitru Ceara Sept. 11, 2023, 12:54 p.m. UTC | #3
On 6/14/23 20:02, Vladislav Odintsov wrote:
> Hi Mark,
> 
> thanks for taking time to look on this!
> 
> Your point with a hung OVS is really an interesting case.
> 
> From one hand it’s a possible situation, and from another I guess it’s much higher probability for OVS to be busy by some other work rather than to hang in a loop. In my installation the first happening sometimes (OVS business by real work). The reasons can be different but I can’t say that it’s better to drop OF connection in such a moment (possibly triggering ha chassis-redirect ports failover).
> 
> At the same time if we’re talking about L3GW node, which has a gateway chassis redirect port, the hung OVS should be detected by other nodes with BFD probes (if it’s a ha chassis-redirect port).
> And if it’s a normal hypervisor (with just a vif ports), so what can we do with it? I guess if we drop OF connection, it won’t make any positive changes. Maybe just release memory needed for handling this connection and not allocate buffers…? Unfortunately I can’t evaluate this.
> 
> Moreover, the OVN default is a disabled probing. What can be a possible recommended values to set probes if to leave it as is? How should users find out that they have to configure this? Currently in docs there is only information that this option configures probe interval. But it’s not obvious when to configure it and which value to set.
> 

In support of this, quoting the original commit that added the probe
config knob:

"The main intention of this patch is to fix this recompuation issue for
older versions (there by requesting backport), it still would be
beneficial with the incremental processing engine."

That was back in 2019:

https://github.com/ovn-org/ovn/commit/c99069c8934c9ea55d310a8b6d48fb66aa477589

Since then lots of performance improvements landed in OVN
(ovn-controller too), we're probably fine.  Checking
ovn-org/ovn-kubernetes code I see they were also setting
ovn-openflow-probe-interval "just in case" so they probably won't be
affected by it getting removed:

https://github.com/ovn-org/ovn-kubernetes/commit/be1eb843852cd4c490514cec60553cca4feaf18e

> I hope this makes sense.
> 

The patch in its current form looks OK to me, I'm OK to merge it if Mark
doesn't have anything against it; therefore:

Acked-by: Dumitru Ceara <dceara@redhat.com>

Regards,
Dumitru

> Also, which Han’s patch to OVS you’re talking about? I looked through open OVS patches and haven’t seen any. Please point me out.
> 
> regards,
> Vladislav Odintsov
> 
>> On 9 Jun 2023, at 18:28, Mark Michelson <mmichels@redhat.com> wrote:
>>
>> Hi,
>>
>> Thanks for linking the email thread, since I had the exact same concern that Numan did about detecting if ovs-vswitchd goes down. It sounds like because of the nature of unix sockets, this disconnection is still detected by OVN and failover can occur as expected.
>>
>> But what about if ovs-vswitchd is still running but in a "compromised" state. For instance, what if a coding error in ovs-vswitchd results in it running an infinite loop? In such a case, the unix connection would still be alive, but OVN would not be able to communicate with the ovs-vswitchd process.
>>
>> Does this situation eventually lead to OVN disconnecting anyway after it fills up the unix socket's buffer? If so, how long does that take? Or does it lead to OVN slowly growing in memory usage while it waits forever to be able to write to the socket?
>>
>> I think removing the hard-coded 5 second probe intervals from pinctrl.c and features.c is a good idea. But I think we should leave the configurable probe interval used by ofctrl.c for the case I described before. Since Han's patch to OVS should disable probe intervals from the OVS side by default, we would not trigger bad behavior simply because OVN has a lot of work to do during a recompute. What do you think?
>>> On 6/8/23 10:15, Vladislav Odintsov wrote:
>>> OpenFlow connection is established over unix socket, which is a reliable
>>> connection and doesn't require additional probing.
>>> With this patch openflow probing in ovn-controller -> ovs-vswitchd
>>> direction is disabled for all three connections:
>>>   - OF flows management connection,
>>>   - OF features negotiation connection,
>>>   - pinctrl connection.
>>> ovn-controller external_ids:ovn-openflow-probe-interval is removed as
>>> non-needed anymore.
>>> Disablement for ovs-vswitchd -> ovn-controller OF inacivity probing will
>>> be done in the next patch.
>>> Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2023-May/404625.html
>>> Signed-off-by: Vladislav Odintsov <odivlad@gmail.com>
>>> ---
>>>  NEWS                            |  5 +++++
>>>  controller/ofctrl.c             | 14 ++------------
>>>  controller/ofctrl.h             |  4 +---
>>>  controller/ovn-controller.8.xml | 14 --------------
>>>  controller/ovn-controller.c     | 21 +--------------------
>>>  controller/pinctrl.c            |  2 +-
>>>  lib/features.c                  |  7 ++-----
>>>  7 files changed, 12 insertions(+), 55 deletions(-)
>>> diff --git a/NEWS b/NEWS
>>> index 645acea1f..bd63b187b 100644
>>> --- a/NEWS
>>> +++ b/NEWS
>>> @@ -1,5 +1,10 @@
>>>  Post v23.06.0
>>>  -------------
>>> +  - Disable OpenFlow inactivity probing between ovn-controller and OVS.
>>> +    OF connection is established over unix socket, which is a reliable
>>> +    connection method and doesn't require additional probing.
>>> +    external_ids:ovn-openflow-probe-interval configuration option for
>>> +    ovn-controller is no longer matter.
>>>    OVN v23.06.0 - 01 Jun 2023
>>>  --------------------------
>>> diff --git a/controller/ofctrl.c b/controller/ofctrl.c
>>> index 64a444ff6..788634494 100644
>>> --- a/controller/ofctrl.c
>>> +++ b/controller/ofctrl.c
>>> @@ -415,11 +415,9 @@ static void ofctrl_recv(const struct ofp_header *, enum ofptype);
>>>    void
>>>  ofctrl_init(struct ovn_extend_table *group_table,
>>> -            struct ovn_extend_table *meter_table,
>>> -            int inactivity_probe_interval)
>>> +            struct ovn_extend_table *meter_table)
>>>  {
>>> -    swconn = rconn_create(inactivity_probe_interval, 0,
>>> -                          DSCP_DEFAULT, 1 << OFP15_VERSION);
>>> +    swconn = rconn_create(0, 0, DSCP_DEFAULT, 1 << OFP15_VERSION);
>>>      tx_counter = rconn_packet_counter_create();
>>>      hmap_init(&installed_lflows);
>>>      hmap_init(&installed_pflows);
>>> @@ -2986,14 +2984,6 @@ ofctrl_is_connected(void)
>>>      return rconn_is_connected(swconn);
>>>  }
>>>  -void
>>> -ofctrl_set_probe_interval(int probe_interval)
>>> -{
>>> -    if (swconn) {
>>> -        rconn_set_probe_interval(swconn, probe_interval);
>>> -    }
>>> -}
>>> -
>>>  void
>>>  ofctrl_get_memory_usage(struct simap *usage)
>>>  {
>>> diff --git a/controller/ofctrl.h b/controller/ofctrl.h
>>> index 105f9370b..46bfccd85 100644
>>> --- a/controller/ofctrl.h
>>> +++ b/controller/ofctrl.h
>>> @@ -49,8 +49,7 @@ struct ovn_desired_flow_table {
>>>    /* Interface for OVN main loop. */
>>>  void ofctrl_init(struct ovn_extend_table *group_table,
>>> -                 struct ovn_extend_table *meter_table,
>>> -                 int inactivity_probe_interval);
>>> +                 struct ovn_extend_table *meter_table);
>>>  bool ofctrl_run(const struct ovsrec_bridge *br_int,
>>>                  const struct ovsrec_open_vswitch_table *,
>>>                  struct shash *pending_ct_zones);
>>> @@ -142,7 +141,6 @@ void ofctrl_check_and_add_flow_metered(struct ovn_desired_flow_table *,
>>>      bool ofctrl_is_connected(void);
>>> -void ofctrl_set_probe_interval(int probe_interval);
>>>  void ofctrl_get_memory_usage(struct simap *usage);
>>>    #endif /* controller/ofctrl.h */
>>> diff --git a/controller/ovn-controller.8.xml b/controller/ovn-controller.8.xml
>>> index f61f43008..52eb137d3 100644
>>> --- a/controller/ovn-controller.8.xml
>>> +++ b/controller/ovn-controller.8.xml
>>> @@ -147,20 +147,6 @@
>>>          </p>
>>>        </dd>
>>>  -      <dt><code>external_ids:ovn-openflow-probe-interval</code></dt>
>>> -      <dd>
>>> -        <p>
>>> -          The inactivity probe interval of the OpenFlow connection to the
>>> -          OpenvSwitch integration bridge, in seconds.
>>> -          If the value is zero, it disables the connection keepalive feature.
>>> -        </p>
>>> -
>>> -        <p>
>>> -          If the value is nonzero, then it will be forced to a value of
>>> -          at least 5s.
>>> -        </p>
>>> -      </dd>
>>> -
>>>        <dt><code>external_ids:ovn-encap-type</code></dt>
>>>        <dd>
>>>          <p>
>>> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
>>> index 3a81a13fb..732e7a690 100644
>>> --- a/controller/ovn-controller.c
>>> +++ b/controller/ovn-controller.c
>>> @@ -105,7 +105,6 @@ static unixctl_cb_func debug_ignore_startup_delay;
>>>    #define DEFAULT_BRIDGE_NAME "br-int"
>>>  #define DEFAULT_DATAPATH "system"
>>> -#define OFCTRL_DEFAULT_PROBE_INTERVAL_SEC 0
>>>    #define CONTROLLER_LOOP_STOPWATCH_NAME "flow-generation"
>>>  #define OFCTRL_PUT_STOPWATCH_NAME "flow-installation"
>>> @@ -556,22 +555,6 @@ update_ssl_config(const struct ovsrec_ssl_table *ssl_table)
>>>      }
>>>  }
>>>  -static int
>>> -get_ofctrl_probe_interval(struct ovsdb_idl *ovs_idl)
>>> -{
>>> -    const struct ovsrec_open_vswitch *cfg = ovsrec_open_vswitch_first(ovs_idl);
>>> -    if (!cfg) {
>>> -        return OFCTRL_DEFAULT_PROBE_INTERVAL_SEC;
>>> -    }
>>> -
>>> -    const struct ovsrec_open_vswitch_table *ovs_table =
>>> -        ovsrec_open_vswitch_table_get(ovs_idl);
>>> -    const char *chassis_id = get_ovs_chassis_id(ovs_table);
>>> -    return get_chassis_external_id_value_int(
>>> -        &cfg->external_ids, chassis_id,
>>> -        "ovn-openflow-probe-interval", OFCTRL_DEFAULT_PROBE_INTERVAL_SEC);
>>> -}
>>> -
>>>  /* Retrieves the pointer to the OVN Southbound database from 'ovs_idl' and
>>>   * updates 'sbdb_idl' with that pointer. */
>>>  static void
>>> @@ -4975,8 +4958,7 @@ main(int argc, char *argv[])
>>>          engine_get_internal_data(&en_lb_data);
>>>        ofctrl_init(&lflow_output_data->group_table,
>>> -                &lflow_output_data->meter_table,
>>> -                get_ofctrl_probe_interval(ovs_idl_loop.idl));
>>> +                &lflow_output_data->meter_table);
>>>      ofctrl_seqno_init();
>>>        unixctl_command_register("group-table-list", "", 0, 0,
>>> @@ -5104,7 +5086,6 @@ main(int argc, char *argv[])
>>>                       &reset_ovnsb_idl_min_index,
>>>                       &ctrl_engine_ctx, &ovnsb_expected_cond_seqno);
>>>          update_ssl_config(ovsrec_ssl_table_get(ovs_idl_loop.idl));
>>> -        ofctrl_set_probe_interval(get_ofctrl_probe_interval(ovs_idl_loop.idl));
>>>            struct ovsdb_idl_txn *ovnsb_idl_txn
>>>              = ovsdb_idl_loop_run(&ovnsb_idl_loop);
>>> diff --git a/controller/pinctrl.c b/controller/pinctrl.c
>>> index c396ad4c2..a86db3f32 100644
>>> --- a/controller/pinctrl.c
>>> +++ b/controller/pinctrl.c
>>> @@ -3362,7 +3362,7 @@ pinctrl_handler(void *arg_)
>>>      static long long int svc_monitors_next_run_time = LLONG_MAX;
>>>      static long long int send_prefixd_time = LLONG_MAX;
>>>  -    swconn = rconn_create(5, 0, DSCP_DEFAULT, 1 << OFP15_VERSION);
>>> +    swconn = rconn_create(0, 0, DSCP_DEFAULT, 1 << OFP15_VERSION);
>>>        while (!latch_is_set(&pctrl->pinctrl_thread_exit)) {
>>>          long long int bfd_time = LLONG_MAX;
>>> diff --git a/lib/features.c b/lib/features.c
>>> index 97c9c86f0..ad3357570 100644
>>> --- a/lib/features.c
>>> +++ b/lib/features.c
>>> @@ -28,11 +28,10 @@
>>>  #include "openvswitch/ofp-meter.h"
>>>  #include "openvswitch/ofp-util.h"
>>>  #include "ovn/features.h"
>>> +#include "controller/ofctrl.h"
>>>    VLOG_DEFINE_THIS_MODULE(features);
>>>  -#define FEATURES_DEFAULT_PROBE_INTERVAL_SEC 5
>>> -
>>>  struct ovs_feature {
>>>      enum ovs_feature_value value;
>>>      const char *name;
>>> @@ -82,8 +81,7 @@ static void
>>>  ovs_feature_rconn_setup(const char *br_name)
>>>  {
>>>      if (!swconn) {
>>> -        swconn = rconn_create(FEATURES_DEFAULT_PROBE_INTERVAL_SEC, 0,
>>> -                              DSCP_DEFAULT, 1 << OFP15_VERSION);
>>> +        swconn = rconn_create(0, 0, DSCP_DEFAULT, 1 << OFP15_VERSION);
>>>      }
>>>        if (!rconn_is_connected(swconn)) {
>>> @@ -94,7 +92,6 @@ ovs_feature_rconn_setup(const char *br_name)
>>>          }
>>>          free(target);
>>>      }
>>> -    rconn_set_probe_interval(swconn, FEATURES_DEFAULT_PROBE_INTERVAL_SEC);
>>>  }
>>>    static bool
>>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Dumitru Ceara Sept. 11, 2023, 12:57 p.m. UTC | #4
On 9/11/23 14:54, Dumitru Ceara wrote:
> On 6/14/23 20:02, Vladislav Odintsov wrote:
>> Hi Mark,
>>
>> thanks for taking time to look on this!
>>
>> Your point with a hung OVS is really an interesting case.
>>
>> From one hand it’s a possible situation, and from another I guess it’s much higher probability for OVS to be busy by some other work rather than to hang in a loop. In my installation the first happening sometimes (OVS business by real work). The reasons can be different but I can’t say that it’s better to drop OF connection in such a moment (possibly triggering ha chassis-redirect ports failover).
>>
>> At the same time if we’re talking about L3GW node, which has a gateway chassis redirect port, the hung OVS should be detected by other nodes with BFD probes (if it’s a ha chassis-redirect port).
>> And if it’s a normal hypervisor (with just a vif ports), so what can we do with it? I guess if we drop OF connection, it won’t make any positive changes. Maybe just release memory needed for handling this connection and not allocate buffers…? Unfortunately I can’t evaluate this.
>>
>> Moreover, the OVN default is a disabled probing. What can be a possible recommended values to set probes if to leave it as is? How should users find out that they have to configure this? Currently in docs there is only information that this option configures probe interval. But it’s not obvious when to configure it and which value to set.
>>
> 
> In support of this, quoting the original commit that added the probe
> config knob:
> 
> "The main intention of this patch is to fix this recompuation issue for
> older versions (there by requesting backport), it still would be
> beneficial with the incremental processing engine."
> 
> That was back in 2019:
> 
> https://github.com/ovn-org/ovn/commit/c99069c8934c9ea55d310a8b6d48fb66aa477589
> 
> Since then lots of performance improvements landed in OVN
> (ovn-controller too), we're probably fine.  Checking
> ovn-org/ovn-kubernetes code I see they were also setting
> ovn-openflow-probe-interval "just in case" so they probably won't be
> affected by it getting removed:
> 
> https://github.com/ovn-org/ovn-kubernetes/commit/be1eb843852cd4c490514cec60553cca4feaf18e
> 
>> I hope this makes sense.
>>
> 
> The patch in its current form looks OK to me, I'm OK to merge it if Mark
> doesn't have anything against it; therefore:
> 
> Acked-by: Dumitru Ceara <dceara@redhat.com>
> 

I forgot to mention this earlier but, since this patch was posted way
before soft freeze and since the changes seem contained, it's probably
fair to apply this to the 23.09 branch too.  What do you think, Mark?

Thanks!

> Regards,
> Dumitru
> 
>> Also, which Han’s patch to OVS you’re talking about? I looked through open OVS patches and haven’t seen any. Please point me out.
>>
>> regards,
>> Vladislav Odintsov
>>
>>> On 9 Jun 2023, at 18:28, Mark Michelson <mmichels@redhat.com> wrote:
>>>
>>> Hi,
>>>
>>> Thanks for linking the email thread, since I had the exact same concern that Numan did about detecting if ovs-vswitchd goes down. It sounds like because of the nature of unix sockets, this disconnection is still detected by OVN and failover can occur as expected.
>>>
>>> But what about if ovs-vswitchd is still running but in a "compromised" state. For instance, what if a coding error in ovs-vswitchd results in it running an infinite loop? In such a case, the unix connection would still be alive, but OVN would not be able to communicate with the ovs-vswitchd process.
>>>
>>> Does this situation eventually lead to OVN disconnecting anyway after it fills up the unix socket's buffer? If so, how long does that take? Or does it lead to OVN slowly growing in memory usage while it waits forever to be able to write to the socket?
>>>
>>> I think removing the hard-coded 5 second probe intervals from pinctrl.c and features.c is a good idea. But I think we should leave the configurable probe interval used by ofctrl.c for the case I described before. Since Han's patch to OVS should disable probe intervals from the OVS side by default, we would not trigger bad behavior simply because OVN has a lot of work to do during a recompute. What do you think?
>>>> On 6/8/23 10:15, Vladislav Odintsov wrote:
>>>> OpenFlow connection is established over unix socket, which is a reliable
>>>> connection and doesn't require additional probing.
>>>> With this patch openflow probing in ovn-controller -> ovs-vswitchd
>>>> direction is disabled for all three connections:
>>>>   - OF flows management connection,
>>>>   - OF features negotiation connection,
>>>>   - pinctrl connection.
>>>> ovn-controller external_ids:ovn-openflow-probe-interval is removed as
>>>> non-needed anymore.
>>>> Disablement for ovs-vswitchd -> ovn-controller OF inacivity probing will
>>>> be done in the next patch.
>>>> Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2023-May/404625.html
>>>> Signed-off-by: Vladislav Odintsov <odivlad@gmail.com>
>>>> ---
>>>>  NEWS                            |  5 +++++
>>>>  controller/ofctrl.c             | 14 ++------------
>>>>  controller/ofctrl.h             |  4 +---
>>>>  controller/ovn-controller.8.xml | 14 --------------
>>>>  controller/ovn-controller.c     | 21 +--------------------
>>>>  controller/pinctrl.c            |  2 +-
>>>>  lib/features.c                  |  7 ++-----
>>>>  7 files changed, 12 insertions(+), 55 deletions(-)
>>>> diff --git a/NEWS b/NEWS
>>>> index 645acea1f..bd63b187b 100644
>>>> --- a/NEWS
>>>> +++ b/NEWS
>>>> @@ -1,5 +1,10 @@
>>>>  Post v23.06.0
>>>>  -------------
>>>> +  - Disable OpenFlow inactivity probing between ovn-controller and OVS.
>>>> +    OF connection is established over unix socket, which is a reliable
>>>> +    connection method and doesn't require additional probing.
>>>> +    external_ids:ovn-openflow-probe-interval configuration option for
>>>> +    ovn-controller is no longer matter.
>>>>    OVN v23.06.0 - 01 Jun 2023
>>>>  --------------------------
>>>> diff --git a/controller/ofctrl.c b/controller/ofctrl.c
>>>> index 64a444ff6..788634494 100644
>>>> --- a/controller/ofctrl.c
>>>> +++ b/controller/ofctrl.c
>>>> @@ -415,11 +415,9 @@ static void ofctrl_recv(const struct ofp_header *, enum ofptype);
>>>>    void
>>>>  ofctrl_init(struct ovn_extend_table *group_table,
>>>> -            struct ovn_extend_table *meter_table,
>>>> -            int inactivity_probe_interval)
>>>> +            struct ovn_extend_table *meter_table)
>>>>  {
>>>> -    swconn = rconn_create(inactivity_probe_interval, 0,
>>>> -                          DSCP_DEFAULT, 1 << OFP15_VERSION);
>>>> +    swconn = rconn_create(0, 0, DSCP_DEFAULT, 1 << OFP15_VERSION);
>>>>      tx_counter = rconn_packet_counter_create();
>>>>      hmap_init(&installed_lflows);
>>>>      hmap_init(&installed_pflows);
>>>> @@ -2986,14 +2984,6 @@ ofctrl_is_connected(void)
>>>>      return rconn_is_connected(swconn);
>>>>  }
>>>>  -void
>>>> -ofctrl_set_probe_interval(int probe_interval)
>>>> -{
>>>> -    if (swconn) {
>>>> -        rconn_set_probe_interval(swconn, probe_interval);
>>>> -    }
>>>> -}
>>>> -
>>>>  void
>>>>  ofctrl_get_memory_usage(struct simap *usage)
>>>>  {
>>>> diff --git a/controller/ofctrl.h b/controller/ofctrl.h
>>>> index 105f9370b..46bfccd85 100644
>>>> --- a/controller/ofctrl.h
>>>> +++ b/controller/ofctrl.h
>>>> @@ -49,8 +49,7 @@ struct ovn_desired_flow_table {
>>>>    /* Interface for OVN main loop. */
>>>>  void ofctrl_init(struct ovn_extend_table *group_table,
>>>> -                 struct ovn_extend_table *meter_table,
>>>> -                 int inactivity_probe_interval);
>>>> +                 struct ovn_extend_table *meter_table);
>>>>  bool ofctrl_run(const struct ovsrec_bridge *br_int,
>>>>                  const struct ovsrec_open_vswitch_table *,
>>>>                  struct shash *pending_ct_zones);
>>>> @@ -142,7 +141,6 @@ void ofctrl_check_and_add_flow_metered(struct ovn_desired_flow_table *,
>>>>      bool ofctrl_is_connected(void);
>>>> -void ofctrl_set_probe_interval(int probe_interval);
>>>>  void ofctrl_get_memory_usage(struct simap *usage);
>>>>    #endif /* controller/ofctrl.h */
>>>> diff --git a/controller/ovn-controller.8.xml b/controller/ovn-controller.8.xml
>>>> index f61f43008..52eb137d3 100644
>>>> --- a/controller/ovn-controller.8.xml
>>>> +++ b/controller/ovn-controller.8.xml
>>>> @@ -147,20 +147,6 @@
>>>>          </p>
>>>>        </dd>
>>>>  -      <dt><code>external_ids:ovn-openflow-probe-interval</code></dt>
>>>> -      <dd>
>>>> -        <p>
>>>> -          The inactivity probe interval of the OpenFlow connection to the
>>>> -          OpenvSwitch integration bridge, in seconds.
>>>> -          If the value is zero, it disables the connection keepalive feature.
>>>> -        </p>
>>>> -
>>>> -        <p>
>>>> -          If the value is nonzero, then it will be forced to a value of
>>>> -          at least 5s.
>>>> -        </p>
>>>> -      </dd>
>>>> -
>>>>        <dt><code>external_ids:ovn-encap-type</code></dt>
>>>>        <dd>
>>>>          <p>
>>>> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
>>>> index 3a81a13fb..732e7a690 100644
>>>> --- a/controller/ovn-controller.c
>>>> +++ b/controller/ovn-controller.c
>>>> @@ -105,7 +105,6 @@ static unixctl_cb_func debug_ignore_startup_delay;
>>>>    #define DEFAULT_BRIDGE_NAME "br-int"
>>>>  #define DEFAULT_DATAPATH "system"
>>>> -#define OFCTRL_DEFAULT_PROBE_INTERVAL_SEC 0
>>>>    #define CONTROLLER_LOOP_STOPWATCH_NAME "flow-generation"
>>>>  #define OFCTRL_PUT_STOPWATCH_NAME "flow-installation"
>>>> @@ -556,22 +555,6 @@ update_ssl_config(const struct ovsrec_ssl_table *ssl_table)
>>>>      }
>>>>  }
>>>>  -static int
>>>> -get_ofctrl_probe_interval(struct ovsdb_idl *ovs_idl)
>>>> -{
>>>> -    const struct ovsrec_open_vswitch *cfg = ovsrec_open_vswitch_first(ovs_idl);
>>>> -    if (!cfg) {
>>>> -        return OFCTRL_DEFAULT_PROBE_INTERVAL_SEC;
>>>> -    }
>>>> -
>>>> -    const struct ovsrec_open_vswitch_table *ovs_table =
>>>> -        ovsrec_open_vswitch_table_get(ovs_idl);
>>>> -    const char *chassis_id = get_ovs_chassis_id(ovs_table);
>>>> -    return get_chassis_external_id_value_int(
>>>> -        &cfg->external_ids, chassis_id,
>>>> -        "ovn-openflow-probe-interval", OFCTRL_DEFAULT_PROBE_INTERVAL_SEC);
>>>> -}
>>>> -
>>>>  /* Retrieves the pointer to the OVN Southbound database from 'ovs_idl' and
>>>>   * updates 'sbdb_idl' with that pointer. */
>>>>  static void
>>>> @@ -4975,8 +4958,7 @@ main(int argc, char *argv[])
>>>>          engine_get_internal_data(&en_lb_data);
>>>>        ofctrl_init(&lflow_output_data->group_table,
>>>> -                &lflow_output_data->meter_table,
>>>> -                get_ofctrl_probe_interval(ovs_idl_loop.idl));
>>>> +                &lflow_output_data->meter_table);
>>>>      ofctrl_seqno_init();
>>>>        unixctl_command_register("group-table-list", "", 0, 0,
>>>> @@ -5104,7 +5086,6 @@ main(int argc, char *argv[])
>>>>                       &reset_ovnsb_idl_min_index,
>>>>                       &ctrl_engine_ctx, &ovnsb_expected_cond_seqno);
>>>>          update_ssl_config(ovsrec_ssl_table_get(ovs_idl_loop.idl));
>>>> -        ofctrl_set_probe_interval(get_ofctrl_probe_interval(ovs_idl_loop.idl));
>>>>            struct ovsdb_idl_txn *ovnsb_idl_txn
>>>>              = ovsdb_idl_loop_run(&ovnsb_idl_loop);
>>>> diff --git a/controller/pinctrl.c b/controller/pinctrl.c
>>>> index c396ad4c2..a86db3f32 100644
>>>> --- a/controller/pinctrl.c
>>>> +++ b/controller/pinctrl.c
>>>> @@ -3362,7 +3362,7 @@ pinctrl_handler(void *arg_)
>>>>      static long long int svc_monitors_next_run_time = LLONG_MAX;
>>>>      static long long int send_prefixd_time = LLONG_MAX;
>>>>  -    swconn = rconn_create(5, 0, DSCP_DEFAULT, 1 << OFP15_VERSION);
>>>> +    swconn = rconn_create(0, 0, DSCP_DEFAULT, 1 << OFP15_VERSION);
>>>>        while (!latch_is_set(&pctrl->pinctrl_thread_exit)) {
>>>>          long long int bfd_time = LLONG_MAX;
>>>> diff --git a/lib/features.c b/lib/features.c
>>>> index 97c9c86f0..ad3357570 100644
>>>> --- a/lib/features.c
>>>> +++ b/lib/features.c
>>>> @@ -28,11 +28,10 @@
>>>>  #include "openvswitch/ofp-meter.h"
>>>>  #include "openvswitch/ofp-util.h"
>>>>  #include "ovn/features.h"
>>>> +#include "controller/ofctrl.h"
>>>>    VLOG_DEFINE_THIS_MODULE(features);
>>>>  -#define FEATURES_DEFAULT_PROBE_INTERVAL_SEC 5
>>>> -
>>>>  struct ovs_feature {
>>>>      enum ovs_feature_value value;
>>>>      const char *name;
>>>> @@ -82,8 +81,7 @@ static void
>>>>  ovs_feature_rconn_setup(const char *br_name)
>>>>  {
>>>>      if (!swconn) {
>>>> -        swconn = rconn_create(FEATURES_DEFAULT_PROBE_INTERVAL_SEC, 0,
>>>> -                              DSCP_DEFAULT, 1 << OFP15_VERSION);
>>>> +        swconn = rconn_create(0, 0, DSCP_DEFAULT, 1 << OFP15_VERSION);
>>>>      }
>>>>        if (!rconn_is_connected(swconn)) {
>>>> @@ -94,7 +92,6 @@ ovs_feature_rconn_setup(const char *br_name)
>>>>          }
>>>>          free(target);
>>>>      }
>>>> -    rconn_set_probe_interval(swconn, FEATURES_DEFAULT_PROBE_INTERVAL_SEC);
>>>>  }
>>>>    static bool
>>>
>> _______________________________________________
>> dev mailing list
>> dev@openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Vladislav Odintsov Sept. 11, 2023, 2:31 p.m. UTC | #5
Hi Dumitru,

if eventually this patch got merged, please remove next lines from its commit message:

"Disablement for ovs-vswitchd -> ovn-controller OF inacivity probing will
be done in the next patch."

Unfortunately I was unsuccessful in removing these probing direction (default 60s).
Maybe you can give any hint how to solve this [1].

1: https://mail.openvswitch.org/pipermail/ovs-dev/2023-June/405445.html

> On 11 Sep 2023, at 15:57, Dumitru Ceara <dceara@redhat.com> wrote:
> 
> On 9/11/23 14:54, Dumitru Ceara wrote:
>> On 6/14/23 20:02, Vladislav Odintsov wrote:
>>> Hi Mark,
>>> 
>>> thanks for taking time to look on this!
>>> 
>>> Your point with a hung OVS is really an interesting case.
>>> 
>>> From one hand it’s a possible situation, and from another I guess it’s much higher probability for OVS to be busy by some other work rather than to hang in a loop. In my installation the first happening sometimes (OVS business by real work). The reasons can be different but I can’t say that it’s better to drop OF connection in such a moment (possibly triggering ha chassis-redirect ports failover).
>>> 
>>> At the same time if we’re talking about L3GW node, which has a gateway chassis redirect port, the hung OVS should be detected by other nodes with BFD probes (if it’s a ha chassis-redirect port).
>>> And if it’s a normal hypervisor (with just a vif ports), so what can we do with it? I guess if we drop OF connection, it won’t make any positive changes. Maybe just release memory needed for handling this connection and not allocate buffers…? Unfortunately I can’t evaluate this.
>>> 
>>> Moreover, the OVN default is a disabled probing. What can be a possible recommended values to set probes if to leave it as is? How should users find out that they have to configure this? Currently in docs there is only information that this option configures probe interval. But it’s not obvious when to configure it and which value to set.
>>> 
>> 
>> In support of this, quoting the original commit that added the probe
>> config knob:
>> 
>> "The main intention of this patch is to fix this recompuation issue for
>> older versions (there by requesting backport), it still would be
>> beneficial with the incremental processing engine."
>> 
>> That was back in 2019:
>> 
>> https://github.com/ovn-org/ovn/commit/c99069c8934c9ea55d310a8b6d48fb66aa477589
>> 
>> Since then lots of performance improvements landed in OVN
>> (ovn-controller too), we're probably fine.  Checking
>> ovn-org/ovn-kubernetes code I see they were also setting
>> ovn-openflow-probe-interval "just in case" so they probably won't be
>> affected by it getting removed:
>> 
>> https://github.com/ovn-org/ovn-kubernetes/commit/be1eb843852cd4c490514cec60553cca4feaf18e
>> 
>>> I hope this makes sense.
>>> 
>> 
>> The patch in its current form looks OK to me, I'm OK to merge it if Mark
>> doesn't have anything against it; therefore:
>> 
>> Acked-by: Dumitru Ceara <dceara@redhat.com>
>> 
> 
> I forgot to mention this earlier but, since this patch was posted way
> before soft freeze and since the changes seem contained, it's probably
> fair to apply this to the 23.09 branch too.  What do you think, Mark?
> 
> Thanks!
> 
>> Regards,
>> Dumitru
>> 
>>> Also, which Han’s patch to OVS you’re talking about? I looked through open OVS patches and haven’t seen any. Please point me out.
>>> 
>>> regards,
>>> Vladislav Odintsov
>>> 
>>>> On 9 Jun 2023, at 18:28, Mark Michelson <mmichels@redhat.com> wrote:
>>>> 
>>>> Hi,
>>>> 
>>>> Thanks for linking the email thread, since I had the exact same concern that Numan did about detecting if ovs-vswitchd goes down. It sounds like because of the nature of unix sockets, this disconnection is still detected by OVN and failover can occur as expected.
>>>> 
>>>> But what about if ovs-vswitchd is still running but in a "compromised" state. For instance, what if a coding error in ovs-vswitchd results in it running an infinite loop? In such a case, the unix connection would still be alive, but OVN would not be able to communicate with the ovs-vswitchd process.
>>>> 
>>>> Does this situation eventually lead to OVN disconnecting anyway after it fills up the unix socket's buffer? If so, how long does that take? Or does it lead to OVN slowly growing in memory usage while it waits forever to be able to write to the socket?
>>>> 
>>>> I think removing the hard-coded 5 second probe intervals from pinctrl.c and features.c is a good idea. But I think we should leave the configurable probe interval used by ofctrl.c for the case I described before. Since Han's patch to OVS should disable probe intervals from the OVS side by default, we would not trigger bad behavior simply because OVN has a lot of work to do during a recompute. What do you think?
>>>>> On 6/8/23 10:15, Vladislav Odintsov wrote:
>>>>> OpenFlow connection is established over unix socket, which is a reliable
>>>>> connection and doesn't require additional probing.
>>>>> With this patch openflow probing in ovn-controller -> ovs-vswitchd
>>>>> direction is disabled for all three connections:
>>>>>  - OF flows management connection,
>>>>>  - OF features negotiation connection,
>>>>>  - pinctrl connection.
>>>>> ovn-controller external_ids:ovn-openflow-probe-interval is removed as
>>>>> non-needed anymore.
>>>>> Disablement for ovs-vswitchd -> ovn-controller OF inacivity probing will
>>>>> be done in the next patch.
>>>>> Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2023-May/404625.html
>>>>> Signed-off-by: Vladislav Odintsov <odivlad@gmail.com>
>>>>> ---
>>>>> NEWS                            |  5 +++++
>>>>> controller/ofctrl.c             | 14 ++------------
>>>>> controller/ofctrl.h             |  4 +---
>>>>> controller/ovn-controller.8.xml | 14 --------------
>>>>> controller/ovn-controller.c     | 21 +--------------------
>>>>> controller/pinctrl.c            |  2 +-
>>>>> lib/features.c                  |  7 ++-----
>>>>> 7 files changed, 12 insertions(+), 55 deletions(-)
>>>>> diff --git a/NEWS b/NEWS
>>>>> index 645acea1f..bd63b187b 100644
>>>>> --- a/NEWS
>>>>> +++ b/NEWS
>>>>> @@ -1,5 +1,10 @@
>>>>> Post v23.06.0
>>>>> -------------
>>>>> +  - Disable OpenFlow inactivity probing between ovn-controller and OVS.
>>>>> +    OF connection is established over unix socket, which is a reliable
>>>>> +    connection method and doesn't require additional probing.
>>>>> +    external_ids:ovn-openflow-probe-interval configuration option for
>>>>> +    ovn-controller is no longer matter.
>>>>>   OVN v23.06.0 - 01 Jun 2023
>>>>> --------------------------
>>>>> diff --git a/controller/ofctrl.c b/controller/ofctrl.c
>>>>> index 64a444ff6..788634494 100644
>>>>> --- a/controller/ofctrl.c
>>>>> +++ b/controller/ofctrl.c
>>>>> @@ -415,11 +415,9 @@ static void ofctrl_recv(const struct ofp_header *, enum ofptype);
>>>>>   void
>>>>> ofctrl_init(struct ovn_extend_table *group_table,
>>>>> -            struct ovn_extend_table *meter_table,
>>>>> -            int inactivity_probe_interval)
>>>>> +            struct ovn_extend_table *meter_table)
>>>>> {
>>>>> -    swconn = rconn_create(inactivity_probe_interval, 0,
>>>>> -                          DSCP_DEFAULT, 1 << OFP15_VERSION);
>>>>> +    swconn = rconn_create(0, 0, DSCP_DEFAULT, 1 << OFP15_VERSION);
>>>>>     tx_counter = rconn_packet_counter_create();
>>>>>     hmap_init(&installed_lflows);
>>>>>     hmap_init(&installed_pflows);
>>>>> @@ -2986,14 +2984,6 @@ ofctrl_is_connected(void)
>>>>>     return rconn_is_connected(swconn);
>>>>> }
>>>>> -void
>>>>> -ofctrl_set_probe_interval(int probe_interval)
>>>>> -{
>>>>> -    if (swconn) {
>>>>> -        rconn_set_probe_interval(swconn, probe_interval);
>>>>> -    }
>>>>> -}
>>>>> -
>>>>> void
>>>>> ofctrl_get_memory_usage(struct simap *usage)
>>>>> {
>>>>> diff --git a/controller/ofctrl.h b/controller/ofctrl.h
>>>>> index 105f9370b..46bfccd85 100644
>>>>> --- a/controller/ofctrl.h
>>>>> +++ b/controller/ofctrl.h
>>>>> @@ -49,8 +49,7 @@ struct ovn_desired_flow_table {
>>>>>   /* Interface for OVN main loop. */
>>>>> void ofctrl_init(struct ovn_extend_table *group_table,
>>>>> -                 struct ovn_extend_table *meter_table,
>>>>> -                 int inactivity_probe_interval);
>>>>> +                 struct ovn_extend_table *meter_table);
>>>>> bool ofctrl_run(const struct ovsrec_bridge *br_int,
>>>>>                 const struct ovsrec_open_vswitch_table *,
>>>>>                 struct shash *pending_ct_zones);
>>>>> @@ -142,7 +141,6 @@ void ofctrl_check_and_add_flow_metered(struct ovn_desired_flow_table *,
>>>>>     bool ofctrl_is_connected(void);
>>>>> -void ofctrl_set_probe_interval(int probe_interval);
>>>>> void ofctrl_get_memory_usage(struct simap *usage);
>>>>>   #endif /* controller/ofctrl.h */
>>>>> diff --git a/controller/ovn-controller.8.xml b/controller/ovn-controller.8.xml
>>>>> index f61f43008..52eb137d3 100644
>>>>> --- a/controller/ovn-controller.8.xml
>>>>> +++ b/controller/ovn-controller.8.xml
>>>>> @@ -147,20 +147,6 @@
>>>>>         </p>
>>>>>       </dd>
>>>>> -      <dt><code>external_ids:ovn-openflow-probe-interval</code></dt>
>>>>> -      <dd>
>>>>> -        <p>
>>>>> -          The inactivity probe interval of the OpenFlow connection to the
>>>>> -          OpenvSwitch integration bridge, in seconds.
>>>>> -          If the value is zero, it disables the connection keepalive feature.
>>>>> -        </p>
>>>>> -
>>>>> -        <p>
>>>>> -          If the value is nonzero, then it will be forced to a value of
>>>>> -          at least 5s.
>>>>> -        </p>
>>>>> -      </dd>
>>>>> -
>>>>>       <dt><code>external_ids:ovn-encap-type</code></dt>
>>>>>       <dd>
>>>>>         <p>
>>>>> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
>>>>> index 3a81a13fb..732e7a690 100644
>>>>> --- a/controller/ovn-controller.c
>>>>> +++ b/controller/ovn-controller.c
>>>>> @@ -105,7 +105,6 @@ static unixctl_cb_func debug_ignore_startup_delay;
>>>>>   #define DEFAULT_BRIDGE_NAME "br-int"
>>>>> #define DEFAULT_DATAPATH "system"
>>>>> -#define OFCTRL_DEFAULT_PROBE_INTERVAL_SEC 0
>>>>>   #define CONTROLLER_LOOP_STOPWATCH_NAME "flow-generation"
>>>>> #define OFCTRL_PUT_STOPWATCH_NAME "flow-installation"
>>>>> @@ -556,22 +555,6 @@ update_ssl_config(const struct ovsrec_ssl_table *ssl_table)
>>>>>     }
>>>>> }
>>>>> -static int
>>>>> -get_ofctrl_probe_interval(struct ovsdb_idl *ovs_idl)
>>>>> -{
>>>>> -    const struct ovsrec_open_vswitch *cfg = ovsrec_open_vswitch_first(ovs_idl);
>>>>> -    if (!cfg) {
>>>>> -        return OFCTRL_DEFAULT_PROBE_INTERVAL_SEC;
>>>>> -    }
>>>>> -
>>>>> -    const struct ovsrec_open_vswitch_table *ovs_table =
>>>>> -        ovsrec_open_vswitch_table_get(ovs_idl);
>>>>> -    const char *chassis_id = get_ovs_chassis_id(ovs_table);
>>>>> -    return get_chassis_external_id_value_int(
>>>>> -        &cfg->external_ids, chassis_id,
>>>>> -        "ovn-openflow-probe-interval", OFCTRL_DEFAULT_PROBE_INTERVAL_SEC);
>>>>> -}
>>>>> -
>>>>> /* Retrieves the pointer to the OVN Southbound database from 'ovs_idl' and
>>>>>  * updates 'sbdb_idl' with that pointer. */
>>>>> static void
>>>>> @@ -4975,8 +4958,7 @@ main(int argc, char *argv[])
>>>>>         engine_get_internal_data(&en_lb_data);
>>>>>       ofctrl_init(&lflow_output_data->group_table,
>>>>> -                &lflow_output_data->meter_table,
>>>>> -                get_ofctrl_probe_interval(ovs_idl_loop.idl));
>>>>> +                &lflow_output_data->meter_table);
>>>>>     ofctrl_seqno_init();
>>>>>       unixctl_command_register("group-table-list", "", 0, 0,
>>>>> @@ -5104,7 +5086,6 @@ main(int argc, char *argv[])
>>>>>                      &reset_ovnsb_idl_min_index,
>>>>>                      &ctrl_engine_ctx, &ovnsb_expected_cond_seqno);
>>>>>         update_ssl_config(ovsrec_ssl_table_get(ovs_idl_loop.idl));
>>>>> -        ofctrl_set_probe_interval(get_ofctrl_probe_interval(ovs_idl_loop.idl));
>>>>>           struct ovsdb_idl_txn *ovnsb_idl_txn
>>>>>             = ovsdb_idl_loop_run(&ovnsb_idl_loop);
>>>>> diff --git a/controller/pinctrl.c b/controller/pinctrl.c
>>>>> index c396ad4c2..a86db3f32 100644
>>>>> --- a/controller/pinctrl.c
>>>>> +++ b/controller/pinctrl.c
>>>>> @@ -3362,7 +3362,7 @@ pinctrl_handler(void *arg_)
>>>>>     static long long int svc_monitors_next_run_time = LLONG_MAX;
>>>>>     static long long int send_prefixd_time = LLONG_MAX;
>>>>> -    swconn = rconn_create(5, 0, DSCP_DEFAULT, 1 << OFP15_VERSION);
>>>>> +    swconn = rconn_create(0, 0, DSCP_DEFAULT, 1 << OFP15_VERSION);
>>>>>       while (!latch_is_set(&pctrl->pinctrl_thread_exit)) {
>>>>>         long long int bfd_time = LLONG_MAX;
>>>>> diff --git a/lib/features.c b/lib/features.c
>>>>> index 97c9c86f0..ad3357570 100644
>>>>> --- a/lib/features.c
>>>>> +++ b/lib/features.c
>>>>> @@ -28,11 +28,10 @@
>>>>> #include "openvswitch/ofp-meter.h"
>>>>> #include "openvswitch/ofp-util.h"
>>>>> #include "ovn/features.h"
>>>>> +#include "controller/ofctrl.h"
>>>>>   VLOG_DEFINE_THIS_MODULE(features);
>>>>> -#define FEATURES_DEFAULT_PROBE_INTERVAL_SEC 5
>>>>> -
>>>>> struct ovs_feature {
>>>>>     enum ovs_feature_value value;
>>>>>     const char *name;
>>>>> @@ -82,8 +81,7 @@ static void
>>>>> ovs_feature_rconn_setup(const char *br_name)
>>>>> {
>>>>>     if (!swconn) {
>>>>> -        swconn = rconn_create(FEATURES_DEFAULT_PROBE_INTERVAL_SEC, 0,
>>>>> -                              DSCP_DEFAULT, 1 << OFP15_VERSION);
>>>>> +        swconn = rconn_create(0, 0, DSCP_DEFAULT, 1 << OFP15_VERSION);
>>>>>     }
>>>>>       if (!rconn_is_connected(swconn)) {
>>>>> @@ -94,7 +92,6 @@ ovs_feature_rconn_setup(const char *br_name)
>>>>>         }
>>>>>         free(target);
>>>>>     }
>>>>> -    rconn_set_probe_interval(swconn, FEATURES_DEFAULT_PROBE_INTERVAL_SEC);
>>>>> }
>>>>>   static bool
>>>> 
>>> _______________________________________________
>>> dev mailing list
>>> dev@openvswitch.org
>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Regards,
Vladislav Odintsov
Han Zhou Sept. 22, 2023, 6:39 p.m. UTC | #6
On Mon, Sep 11, 2023 at 7:31 AM Vladislav Odintsov <odivlad@gmail.com>
wrote:
>
> Hi Dumitru,
>
> if eventually this patch got merged, please remove next lines from its
commit message:
>
> "Disablement for ovs-vswitchd -> ovn-controller OF inacivity probing will
> be done in the next patch."
>
> Unfortunately I was unsuccessful in removing these probing direction
(default 60s).
> Maybe you can give any hint how to solve this [1].
>
> 1: https://mail.openvswitch.org/pipermail/ovs-dev/2023-June/405445.html
>
> > On 11 Sep 2023, at 15:57, Dumitru Ceara <dceara@redhat.com> wrote:
> >
> > On 9/11/23 14:54, Dumitru Ceara wrote:
> >> On 6/14/23 20:02, Vladislav Odintsov wrote:
> >>> Hi Mark,
> >>>
> >>> thanks for taking time to look on this!
> >>>
> >>> Your point with a hung OVS is really an interesting case.
> >>>
> >>> From one hand it’s a possible situation, and from another I guess
it’s much higher probability for OVS to be busy by some other work rather
than to hang in a loop. In my installation the first happening sometimes
(OVS business by real work). The reasons can be different but I can’t say
that it’s better to drop OF connection in such a moment (possibly
triggering ha chassis-redirect ports failover).
> >>>
> >>> At the same time if we’re talking about L3GW node, which has a
gateway chassis redirect port, the hung OVS should be detected by other
nodes with BFD probes (if it’s a ha chassis-redirect port).
> >>> And if it’s a normal hypervisor (with just a vif ports), so what can
we do with it? I guess if we drop OF connection, it won’t make any positive
changes. Maybe just release memory needed for handling this connection and
not allocate buffers…? Unfortunately I can’t evaluate this.
> >>>
> >>> Moreover, the OVN default is a disabled probing. What can be a
possible recommended values to set probes if to leave it as is? How should
users find out that they have to configure this? Currently in docs there is
only information that this option configures probe interval. But it’s not
obvious when to configure it and which value to set.
> >>>
> >>
> >> In support of this, quoting the original commit that added the probe
> >> config knob:
> >>
> >> "The main intention of this patch is to fix this recompuation issue for
> >> older versions (there by requesting backport), it still would be
> >> beneficial with the incremental processing engine."
> >>
> >> That was back in 2019:
> >>
> >>
https://github.com/ovn-org/ovn/commit/c99069c8934c9ea55d310a8b6d48fb66aa477589

To clarify a little more, the effect of the above commit in fact changed
the default behavior of probe from disabled (which is the default behavior
of all unix/punix rconn) to 5s, and was changed back to default disabled by:
https://github.com/ovn-org/ovn/commit/b8af8549396e62d6523be18e104352e334825783

So I am totally fine to remove this config if no one would ever need to
enable the probe.

> >>
> >> Since then lots of performance improvements landed in OVN
> >> (ovn-controller too), we're probably fine.  Checking
> >> ovn-org/ovn-kubernetes code I see they were also setting
> >> ovn-openflow-probe-interval "just in case" so they probably won't be
> >> affected by it getting removed:
> >>
> >>
https://github.com/ovn-org/ovn-kubernetes/commit/be1eb843852cd4c490514cec60553cca4feaf18e
> >>
> >>> I hope this makes sense.
> >>>
> >>
> >> The patch in its current form looks OK to me, I'm OK to merge it if
Mark
> >> doesn't have anything against it; therefore:
> >>
> >> Acked-by: Dumitru Ceara <dceara@redhat.com>
> >>
> >
> > I forgot to mention this earlier but, since this patch was posted way
> > before soft freeze and since the changes seem contained, it's probably
> > fair to apply this to the 23.09 branch too.  What do you think, Mark?
> >
> > Thanks!
> >
> >> Regards,
> >> Dumitru
> >>
> >>> Also, which Han’s patch to OVS you’re talking about? I looked through
open OVS patches and haven’t seen any. Please point me out.
> >>>

I don't remember that I posted any patch related to this. We have a patch
downstream that changed the hardcoded 60s to 0 in OVS for the mgmt
controller.

> >>> regards,
> >>> Vladislav Odintsov
> >>>
> >>>> On 9 Jun 2023, at 18:28, Mark Michelson <mmichels@redhat.com> wrote:
> >>>>
> >>>> Hi,
> >>>>
> >>>> Thanks for linking the email thread, since I had the exact same
concern that Numan did about detecting if ovs-vswitchd goes down. It sounds
like because of the nature of unix sockets, this disconnection is still
detected by OVN and failover can occur as expected.
> >>>>
> >>>> But what about if ovs-vswitchd is still running but in a
"compromised" state. For instance, what if a coding error in ovs-vswitchd
results in it running an infinite loop? In such a case, the unix connection
would still be alive, but OVN would not be able to communicate with the
ovs-vswitchd process.
> >>>>
> >>>> Does this situation eventually lead to OVN disconnecting anyway
after it fills up the unix socket's buffer? If so, how long does that take?
Or does it lead to OVN slowly growing in memory usage while it waits
forever to be able to write to the socket?

Hi Mark, if OVS is in such infinite loop, disconnect it from ovn-controller
and reconnect would not help anyway, right?

> >>>>
> >>>> I think removing the hard-coded 5 second probe intervals from
pinctrl.c and features.c is a good idea. But I think we should leave the
configurable probe interval used by ofctrl.c for the case I described
before. Since Han's patch to OVS should disable probe intervals from the
OVS side by default, we would not trigger bad behavior simply because OVN
has a lot of work to do during a recompute. What do you think?
> >>>>> On 6/8/23 10:15, Vladislav Odintsov wrote:
> >>>>> OpenFlow connection is established over unix socket, which is a
reliable
> >>>>> connection and doesn't require additional probing.
> >>>>> With this patch openflow probing in ovn-controller -> ovs-vswitchd
> >>>>> direction is disabled for all three connections:
> >>>>>  - OF flows management connection,
> >>>>>  - OF features negotiation connection,
> >>>>>  - pinctrl connection.
> >>>>> ovn-controller external_ids:ovn-openflow-probe-interval is removed
as
> >>>>> non-needed anymore.
> >>>>> Disablement for ovs-vswitchd -> ovn-controller OF inacivity probing
will
> >>>>> be done in the next patch.
> >>>>> Reported-at:
https://mail.openvswitch.org/pipermail/ovs-dev/2023-May/404625.html
> >>>>> Signed-off-by: Vladislav Odintsov <odivlad@gmail.com>
> >>>>> ---
> >>>>> NEWS                            |  5 +++++
> >>>>> controller/ofctrl.c             | 14 ++------------
> >>>>> controller/ofctrl.h             |  4 +---
> >>>>> controller/ovn-controller.8.xml | 14 --------------
> >>>>> controller/ovn-controller.c     | 21 +--------------------
> >>>>> controller/pinctrl.c            |  2 +-
> >>>>> lib/features.c                  |  7 ++-----
> >>>>> 7 files changed, 12 insertions(+), 55 deletions(-)
> >>>>> diff --git a/NEWS b/NEWS
> >>>>> index 645acea1f..bd63b187b 100644
> >>>>> --- a/NEWS
> >>>>> +++ b/NEWS
> >>>>> @@ -1,5 +1,10 @@
> >>>>> Post v23.06.0
> >>>>> -------------
> >>>>> +  - Disable OpenFlow inactivity probing between ovn-controller and
OVS.
> >>>>> +    OF connection is established over unix socket, which is a
reliable
> >>>>> +    connection method and doesn't require additional probing.
> >>>>> +    external_ids:ovn-openflow-probe-interval configuration option
for
> >>>>> +    ovn-controller is no longer matter.

nit: s/is no longer matter/is removed

Acked-by: Han Zhou <hzhou@ovn.org>

(waiting for Mark's confirm about this concerns)

> >>>>>   OVN v23.06.0 - 01 Jun 2023
> >>>>> --------------------------
> >>>>> diff --git a/controller/ofctrl.c b/controller/ofctrl.c
> >>>>> index 64a444ff6..788634494 100644
> >>>>> --- a/controller/ofctrl.c
> >>>>> +++ b/controller/ofctrl.c
> >>>>> @@ -415,11 +415,9 @@ static void ofctrl_recv(const struct
ofp_header *, enum ofptype);
> >>>>>   void
> >>>>> ofctrl_init(struct ovn_extend_table *group_table,
> >>>>> -            struct ovn_extend_table *meter_table,
> >>>>> -            int inactivity_probe_interval)
> >>>>> +            struct ovn_extend_table *meter_table)
> >>>>> {
> >>>>> -    swconn = rconn_create(inactivity_probe_interval, 0,
> >>>>> -                          DSCP_DEFAULT, 1 << OFP15_VERSION);
> >>>>> +    swconn = rconn_create(0, 0, DSCP_DEFAULT, 1 << OFP15_VERSION);
> >>>>>     tx_counter = rconn_packet_counter_create();
> >>>>>     hmap_init(&installed_lflows);
> >>>>>     hmap_init(&installed_pflows);
> >>>>> @@ -2986,14 +2984,6 @@ ofctrl_is_connected(void)
> >>>>>     return rconn_is_connected(swconn);
> >>>>> }
> >>>>> -void
> >>>>> -ofctrl_set_probe_interval(int probe_interval)
> >>>>> -{
> >>>>> -    if (swconn) {
> >>>>> -        rconn_set_probe_interval(swconn, probe_interval);
> >>>>> -    }
> >>>>> -}
> >>>>> -
> >>>>> void
> >>>>> ofctrl_get_memory_usage(struct simap *usage)
> >>>>> {
> >>>>> diff --git a/controller/ofctrl.h b/controller/ofctrl.h
> >>>>> index 105f9370b..46bfccd85 100644
> >>>>> --- a/controller/ofctrl.h
> >>>>> +++ b/controller/ofctrl.h
> >>>>> @@ -49,8 +49,7 @@ struct ovn_desired_flow_table {
> >>>>>   /* Interface for OVN main loop. */
> >>>>> void ofctrl_init(struct ovn_extend_table *group_table,
> >>>>> -                 struct ovn_extend_table *meter_table,
> >>>>> -                 int inactivity_probe_interval);
> >>>>> +                 struct ovn_extend_table *meter_table);
> >>>>> bool ofctrl_run(const struct ovsrec_bridge *br_int,
> >>>>>                 const struct ovsrec_open_vswitch_table *,
> >>>>>                 struct shash *pending_ct_zones);
> >>>>> @@ -142,7 +141,6 @@ void ofctrl_check_and_add_flow_metered(struct
ovn_desired_flow_table *,
> >>>>>     bool ofctrl_is_connected(void);
> >>>>> -void ofctrl_set_probe_interval(int probe_interval);
> >>>>> void ofctrl_get_memory_usage(struct simap *usage);
> >>>>>   #endif /* controller/ofctrl.h */
> >>>>> diff --git a/controller/ovn-controller.8.xml
b/controller/ovn-controller.8.xml
> >>>>> index f61f43008..52eb137d3 100644
> >>>>> --- a/controller/ovn-controller.8.xml
> >>>>> +++ b/controller/ovn-controller.8.xml
> >>>>> @@ -147,20 +147,6 @@
> >>>>>         </p>
> >>>>>       </dd>
> >>>>> -
 <dt><code>external_ids:ovn-openflow-probe-interval</code></dt>
> >>>>> -      <dd>
> >>>>> -        <p>
> >>>>> -          The inactivity probe interval of the OpenFlow connection
to the
> >>>>> -          OpenvSwitch integration bridge, in seconds.
> >>>>> -          If the value is zero, it disables the connection
keepalive feature.
> >>>>> -        </p>
> >>>>> -
> >>>>> -        <p>
> >>>>> -          If the value is nonzero, then it will be forced to a
value of
> >>>>> -          at least 5s.
> >>>>> -        </p>
> >>>>> -      </dd>
> >>>>> -
> >>>>>       <dt><code>external_ids:ovn-encap-type</code></dt>
> >>>>>       <dd>
> >>>>>         <p>
> >>>>> diff --git a/controller/ovn-controller.c
b/controller/ovn-controller.c
> >>>>> index 3a81a13fb..732e7a690 100644
> >>>>> --- a/controller/ovn-controller.c
> >>>>> +++ b/controller/ovn-controller.c
> >>>>> @@ -105,7 +105,6 @@ static unixctl_cb_func
debug_ignore_startup_delay;
> >>>>>   #define DEFAULT_BRIDGE_NAME "br-int"
> >>>>> #define DEFAULT_DATAPATH "system"
> >>>>> -#define OFCTRL_DEFAULT_PROBE_INTERVAL_SEC 0
> >>>>>   #define CONTROLLER_LOOP_STOPWATCH_NAME "flow-generation"
> >>>>> #define OFCTRL_PUT_STOPWATCH_NAME "flow-installation"
> >>>>> @@ -556,22 +555,6 @@ update_ssl_config(const struct
ovsrec_ssl_table *ssl_table)
> >>>>>     }
> >>>>> }
> >>>>> -static int
> >>>>> -get_ofctrl_probe_interval(struct ovsdb_idl *ovs_idl)
> >>>>> -{
> >>>>> -    const struct ovsrec_open_vswitch *cfg =
ovsrec_open_vswitch_first(ovs_idl);
> >>>>> -    if (!cfg) {
> >>>>> -        return OFCTRL_DEFAULT_PROBE_INTERVAL_SEC;
> >>>>> -    }
> >>>>> -
> >>>>> -    const struct ovsrec_open_vswitch_table *ovs_table =
> >>>>> -        ovsrec_open_vswitch_table_get(ovs_idl);
> >>>>> -    const char *chassis_id = get_ovs_chassis_id(ovs_table);
> >>>>> -    return get_chassis_external_id_value_int(
> >>>>> -        &cfg->external_ids, chassis_id,
> >>>>> -        "ovn-openflow-probe-interval",
OFCTRL_DEFAULT_PROBE_INTERVAL_SEC);
> >>>>> -}
> >>>>> -
> >>>>> /* Retrieves the pointer to the OVN Southbound database from
'ovs_idl' and
> >>>>>  * updates 'sbdb_idl' with that pointer. */
> >>>>> static void
> >>>>> @@ -4975,8 +4958,7 @@ main(int argc, char *argv[])
> >>>>>         engine_get_internal_data(&en_lb_data);
> >>>>>       ofctrl_init(&lflow_output_data->group_table,
> >>>>> -                &lflow_output_data->meter_table,
> >>>>> -                get_ofctrl_probe_interval(ovs_idl_loop.idl));
> >>>>> +                &lflow_output_data->meter_table);
> >>>>>     ofctrl_seqno_init();
> >>>>>       unixctl_command_register("group-table-list", "", 0, 0,
> >>>>> @@ -5104,7 +5086,6 @@ main(int argc, char *argv[])
> >>>>>                      &reset_ovnsb_idl_min_index,
> >>>>>                      &ctrl_engine_ctx, &ovnsb_expected_cond_seqno);
> >>>>>         update_ssl_config(ovsrec_ssl_table_get(ovs_idl_loop.idl));
> >>>>> -
 ofctrl_set_probe_interval(get_ofctrl_probe_interval(ovs_idl_loop.idl));
> >>>>>           struct ovsdb_idl_txn *ovnsb_idl_txn
> >>>>>             = ovsdb_idl_loop_run(&ovnsb_idl_loop);
> >>>>> diff --git a/controller/pinctrl.c b/controller/pinctrl.c
> >>>>> index c396ad4c2..a86db3f32 100644
> >>>>> --- a/controller/pinctrl.c
> >>>>> +++ b/controller/pinctrl.c
> >>>>> @@ -3362,7 +3362,7 @@ pinctrl_handler(void *arg_)
> >>>>>     static long long int svc_monitors_next_run_time = LLONG_MAX;
> >>>>>     static long long int send_prefixd_time = LLONG_MAX;
> >>>>> -    swconn = rconn_create(5, 0, DSCP_DEFAULT, 1 << OFP15_VERSION);
> >>>>> +    swconn = rconn_create(0, 0, DSCP_DEFAULT, 1 << OFP15_VERSION);
> >>>>>       while (!latch_is_set(&pctrl->pinctrl_thread_exit)) {
> >>>>>         long long int bfd_time = LLONG_MAX;
> >>>>> diff --git a/lib/features.c b/lib/features.c
> >>>>> index 97c9c86f0..ad3357570 100644
> >>>>> --- a/lib/features.c
> >>>>> +++ b/lib/features.c
> >>>>> @@ -28,11 +28,10 @@
> >>>>> #include "openvswitch/ofp-meter.h"
> >>>>> #include "openvswitch/ofp-util.h"
> >>>>> #include "ovn/features.h"
> >>>>> +#include "controller/ofctrl.h"
> >>>>>   VLOG_DEFINE_THIS_MODULE(features);
> >>>>> -#define FEATURES_DEFAULT_PROBE_INTERVAL_SEC 5
> >>>>> -
> >>>>> struct ovs_feature {
> >>>>>     enum ovs_feature_value value;
> >>>>>     const char *name;
> >>>>> @@ -82,8 +81,7 @@ static void
> >>>>> ovs_feature_rconn_setup(const char *br_name)
> >>>>> {
> >>>>>     if (!swconn) {
> >>>>> -        swconn = rconn_create(FEATURES_DEFAULT_PROBE_INTERVAL_SEC,
0,
> >>>>> -                              DSCP_DEFAULT, 1 << OFP15_VERSION);
> >>>>> +        swconn = rconn_create(0, 0, DSCP_DEFAULT, 1 <<
OFP15_VERSION);
> >>>>>     }
> >>>>>       if (!rconn_is_connected(swconn)) {
> >>>>> @@ -94,7 +92,6 @@ ovs_feature_rconn_setup(const char *br_name)
> >>>>>         }
> >>>>>         free(target);
> >>>>>     }
> >>>>> -    rconn_set_probe_interval(swconn,
FEATURES_DEFAULT_PROBE_INTERVAL_SEC);
> >>>>> }
> >>>>>   static bool
> >>>>
> >>> _______________________________________________
> >>> dev mailing list
> >>> dev@openvswitch.org
> >>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
>
> Regards,
> Vladislav Odintsov
>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Numan Siddique Nov. 2, 2023, 8:44 p.m. UTC | #7
On Fri, Sep 22, 2023 at 2:40 PM Han Zhou <hzhou@ovn.org> wrote:
>
> On Mon, Sep 11, 2023 at 7:31 AM Vladislav Odintsov <odivlad@gmail.com>
> wrote:
> >
> > Hi Dumitru,
> >
> > if eventually this patch got merged, please remove next lines from its
> commit message:
> >
> > "Disablement for ovs-vswitchd -> ovn-controller OF inacivity probing will
> > be done in the next patch."
> >
> > Unfortunately I was unsuccessful in removing these probing direction
> (default 60s).
> > Maybe you can give any hint how to solve this [1].
> >
> > 1: https://mail.openvswitch.org/pipermail/ovs-dev/2023-June/405445.html
> >
> > > On 11 Sep 2023, at 15:57, Dumitru Ceara <dceara@redhat.com> wrote:
> > >
> > > On 9/11/23 14:54, Dumitru Ceara wrote:
> > >> On 6/14/23 20:02, Vladislav Odintsov wrote:
> > >>> Hi Mark,
> > >>>
> > >>> thanks for taking time to look on this!
> > >>>
> > >>> Your point with a hung OVS is really an interesting case.
> > >>>
> > >>> From one hand it’s a possible situation, and from another I guess
> it’s much higher probability for OVS to be busy by some other work rather
> than to hang in a loop. In my installation the first happening sometimes
> (OVS business by real work). The reasons can be different but I can’t say
> that it’s better to drop OF connection in such a moment (possibly
> triggering ha chassis-redirect ports failover).
> > >>>
> > >>> At the same time if we’re talking about L3GW node, which has a
> gateway chassis redirect port, the hung OVS should be detected by other
> nodes with BFD probes (if it’s a ha chassis-redirect port).
> > >>> And if it’s a normal hypervisor (with just a vif ports), so what can
> we do with it? I guess if we drop OF connection, it won’t make any positive
> changes. Maybe just release memory needed for handling this connection and
> not allocate buffers…? Unfortunately I can’t evaluate this.
> > >>>
> > >>> Moreover, the OVN default is a disabled probing. What can be a
> possible recommended values to set probes if to leave it as is? How should
> users find out that they have to configure this? Currently in docs there is
> only information that this option configures probe interval. But it’s not
> obvious when to configure it and which value to set.
> > >>>
> > >>
> > >> In support of this, quoting the original commit that added the probe
> > >> config knob:
> > >>
> > >> "The main intention of this patch is to fix this recompuation issue for
> > >> older versions (there by requesting backport), it still would be
> > >> beneficial with the incremental processing engine."
> > >>
> > >> That was back in 2019:
> > >>
> > >>
> https://github.com/ovn-org/ovn/commit/c99069c8934c9ea55d310a8b6d48fb66aa477589
>
> To clarify a little more, the effect of the above commit in fact changed
> the default behavior of probe from disabled (which is the default behavior
> of all unix/punix rconn) to 5s, and was changed back to default disabled by:
> https://github.com/ovn-org/ovn/commit/b8af8549396e62d6523be18e104352e334825783
>
> So I am totally fine to remove this config if no one would ever need to
> enable the probe.
>
> > >>
> > >> Since then lots of performance improvements landed in OVN
> > >> (ovn-controller too), we're probably fine.  Checking
> > >> ovn-org/ovn-kubernetes code I see they were also setting
> > >> ovn-openflow-probe-interval "just in case" so they probably won't be
> > >> affected by it getting removed:
> > >>
> > >>
> https://github.com/ovn-org/ovn-kubernetes/commit/be1eb843852cd4c490514cec60553cca4feaf18e
> > >>
> > >>> I hope this makes sense.
> > >>>
> > >>
> > >> The patch in its current form looks OK to me, I'm OK to merge it if
> Mark
> > >> doesn't have anything against it; therefore:
> > >>
> > >> Acked-by: Dumitru Ceara <dceara@redhat.com>
> > >>
> > >
> > > I forgot to mention this earlier but, since this patch was posted way
> > > before soft freeze and since the changes seem contained, it's probably
> > > fair to apply this to the 23.09 branch too.  What do you think, Mark?
> > >
> > > Thanks!
> > >
> > >> Regards,
> > >> Dumitru
> > >>
> > >>> Also, which Han’s patch to OVS you’re talking about? I looked through
> open OVS patches and haven’t seen any. Please point me out.
> > >>>
>
> I don't remember that I posted any patch related to this. We have a patch
> downstream that changed the hardcoded 60s to 0 in OVS for the mgmt
> controller.
>
> > >>> regards,
> > >>> Vladislav Odintsov
> > >>>
> > >>>> On 9 Jun 2023, at 18:28, Mark Michelson <mmichels@redhat.com> wrote:
> > >>>>
> > >>>> Hi,
> > >>>>
> > >>>> Thanks for linking the email thread, since I had the exact same
> concern that Numan did about detecting if ovs-vswitchd goes down. It sounds
> like because of the nature of unix sockets, this disconnection is still
> detected by OVN and failover can occur as expected.
> > >>>>
> > >>>> But what about if ovs-vswitchd is still running but in a
> "compromised" state. For instance, what if a coding error in ovs-vswitchd
> results in it running an infinite loop? In such a case, the unix connection
> would still be alive, but OVN would not be able to communicate with the
> ovs-vswitchd process.
> > >>>>
> > >>>> Does this situation eventually lead to OVN disconnecting anyway
> after it fills up the unix socket's buffer? If so, how long does that take?
> Or does it lead to OVN slowly growing in memory usage while it waits
> forever to be able to write to the socket?
>
> Hi Mark, if OVS is in such infinite loop, disconnect it from ovn-controller
> and reconnect would not help anyway, right?
>
> > >>>>
> > >>>> I think removing the hard-coded 5 second probe intervals from
> pinctrl.c and features.c is a good idea. But I think we should leave the
> configurable probe interval used by ofctrl.c for the case I described
> before. Since Han's patch to OVS should disable probe intervals from the
> OVS side by default, we would not trigger bad behavior simply because OVN
> has a lot of work to do during a recompute. What do you think?
> > >>>>> On 6/8/23 10:15, Vladislav Odintsov wrote:
> > >>>>> OpenFlow connection is established over unix socket, which is a
> reliable
> > >>>>> connection and doesn't require additional probing.
> > >>>>> With this patch openflow probing in ovn-controller -> ovs-vswitchd
> > >>>>> direction is disabled for all three connections:
> > >>>>>  - OF flows management connection,
> > >>>>>  - OF features negotiation connection,
> > >>>>>  - pinctrl connection.
> > >>>>> ovn-controller external_ids:ovn-openflow-probe-interval is removed
> as
> > >>>>> non-needed anymore.
> > >>>>> Disablement for ovs-vswitchd -> ovn-controller OF inacivity probing
> will
> > >>>>> be done in the next patch.
> > >>>>> Reported-at:
> https://mail.openvswitch.org/pipermail/ovs-dev/2023-May/404625.html
> > >>>>> Signed-off-by: Vladislav Odintsov <odivlad@gmail.com>
> > >>>>> ---
> > >>>>> NEWS                            |  5 +++++
> > >>>>> controller/ofctrl.c             | 14 ++------------
> > >>>>> controller/ofctrl.h             |  4 +---
> > >>>>> controller/ovn-controller.8.xml | 14 --------------
> > >>>>> controller/ovn-controller.c     | 21 +--------------------
> > >>>>> controller/pinctrl.c            |  2 +-
> > >>>>> lib/features.c                  |  7 ++-----
> > >>>>> 7 files changed, 12 insertions(+), 55 deletions(-)
> > >>>>> diff --git a/NEWS b/NEWS
> > >>>>> index 645acea1f..bd63b187b 100644
> > >>>>> --- a/NEWS
> > >>>>> +++ b/NEWS
> > >>>>> @@ -1,5 +1,10 @@
> > >>>>> Post v23.06.0
> > >>>>> -------------
> > >>>>> +  - Disable OpenFlow inactivity probing between ovn-controller and
> OVS.
> > >>>>> +    OF connection is established over unix socket, which is a
> reliable
> > >>>>> +    connection method and doesn't require additional probing.
> > >>>>> +    external_ids:ovn-openflow-probe-interval configuration option
> for
> > >>>>> +    ovn-controller is no longer matter.
>
> nit: s/is no longer matter/is removed
>
> Acked-by: Han Zhou <hzhou@ovn.org>
>
> (waiting for Mark's confirm about this concerns)

OK.  I went ahead and applied this patch to the main.  Since there
were no comments from Mark I assume he is fine with the patch.

Numan

>
> > >>>>>   OVN v23.06.0 - 01 Jun 2023
> > >>>>> --------------------------
> > >>>>> diff --git a/controller/ofctrl.c b/controller/ofctrl.c
> > >>>>> index 64a444ff6..788634494 100644
> > >>>>> --- a/controller/ofctrl.c
> > >>>>> +++ b/controller/ofctrl.c
> > >>>>> @@ -415,11 +415,9 @@ static void ofctrl_recv(const struct
> ofp_header *, enum ofptype);
> > >>>>>   void
> > >>>>> ofctrl_init(struct ovn_extend_table *group_table,
> > >>>>> -            struct ovn_extend_table *meter_table,
> > >>>>> -            int inactivity_probe_interval)
> > >>>>> +            struct ovn_extend_table *meter_table)
> > >>>>> {
> > >>>>> -    swconn = rconn_create(inactivity_probe_interval, 0,
> > >>>>> -                          DSCP_DEFAULT, 1 << OFP15_VERSION);
> > >>>>> +    swconn = rconn_create(0, 0, DSCP_DEFAULT, 1 << OFP15_VERSION);
> > >>>>>     tx_counter = rconn_packet_counter_create();
> > >>>>>     hmap_init(&installed_lflows);
> > >>>>>     hmap_init(&installed_pflows);
> > >>>>> @@ -2986,14 +2984,6 @@ ofctrl_is_connected(void)
> > >>>>>     return rconn_is_connected(swconn);
> > >>>>> }
> > >>>>> -void
> > >>>>> -ofctrl_set_probe_interval(int probe_interval)
> > >>>>> -{
> > >>>>> -    if (swconn) {
> > >>>>> -        rconn_set_probe_interval(swconn, probe_interval);
> > >>>>> -    }
> > >>>>> -}
> > >>>>> -
> > >>>>> void
> > >>>>> ofctrl_get_memory_usage(struct simap *usage)
> > >>>>> {
> > >>>>> diff --git a/controller/ofctrl.h b/controller/ofctrl.h
> > >>>>> index 105f9370b..46bfccd85 100644
> > >>>>> --- a/controller/ofctrl.h
> > >>>>> +++ b/controller/ofctrl.h
> > >>>>> @@ -49,8 +49,7 @@ struct ovn_desired_flow_table {
> > >>>>>   /* Interface for OVN main loop. */
> > >>>>> void ofctrl_init(struct ovn_extend_table *group_table,
> > >>>>> -                 struct ovn_extend_table *meter_table,
> > >>>>> -                 int inactivity_probe_interval);
> > >>>>> +                 struct ovn_extend_table *meter_table);
> > >>>>> bool ofctrl_run(const struct ovsrec_bridge *br_int,
> > >>>>>                 const struct ovsrec_open_vswitch_table *,
> > >>>>>                 struct shash *pending_ct_zones);
> > >>>>> @@ -142,7 +141,6 @@ void ofctrl_check_and_add_flow_metered(struct
> ovn_desired_flow_table *,
> > >>>>>     bool ofctrl_is_connected(void);
> > >>>>> -void ofctrl_set_probe_interval(int probe_interval);
> > >>>>> void ofctrl_get_memory_usage(struct simap *usage);
> > >>>>>   #endif /* controller/ofctrl.h */
> > >>>>> diff --git a/controller/ovn-controller.8.xml
> b/controller/ovn-controller.8.xml
> > >>>>> index f61f43008..52eb137d3 100644
> > >>>>> --- a/controller/ovn-controller.8.xml
> > >>>>> +++ b/controller/ovn-controller.8.xml
> > >>>>> @@ -147,20 +147,6 @@
> > >>>>>         </p>
> > >>>>>       </dd>
> > >>>>> -
>  <dt><code>external_ids:ovn-openflow-probe-interval</code></dt>
> > >>>>> -      <dd>
> > >>>>> -        <p>
> > >>>>> -          The inactivity probe interval of the OpenFlow connection
> to the
> > >>>>> -          OpenvSwitch integration bridge, in seconds.
> > >>>>> -          If the value is zero, it disables the connection
> keepalive feature.
> > >>>>> -        </p>
> > >>>>> -
> > >>>>> -        <p>
> > >>>>> -          If the value is nonzero, then it will be forced to a
> value of
> > >>>>> -          at least 5s.
> > >>>>> -        </p>
> > >>>>> -      </dd>
> > >>>>> -
> > >>>>>       <dt><code>external_ids:ovn-encap-type</code></dt>
> > >>>>>       <dd>
> > >>>>>         <p>
> > >>>>> diff --git a/controller/ovn-controller.c
> b/controller/ovn-controller.c
> > >>>>> index 3a81a13fb..732e7a690 100644
> > >>>>> --- a/controller/ovn-controller.c
> > >>>>> +++ b/controller/ovn-controller.c
> > >>>>> @@ -105,7 +105,6 @@ static unixctl_cb_func
> debug_ignore_startup_delay;
> > >>>>>   #define DEFAULT_BRIDGE_NAME "br-int"
> > >>>>> #define DEFAULT_DATAPATH "system"
> > >>>>> -#define OFCTRL_DEFAULT_PROBE_INTERVAL_SEC 0
> > >>>>>   #define CONTROLLER_LOOP_STOPWATCH_NAME "flow-generation"
> > >>>>> #define OFCTRL_PUT_STOPWATCH_NAME "flow-installation"
> > >>>>> @@ -556,22 +555,6 @@ update_ssl_config(const struct
> ovsrec_ssl_table *ssl_table)
> > >>>>>     }
> > >>>>> }
> > >>>>> -static int
> > >>>>> -get_ofctrl_probe_interval(struct ovsdb_idl *ovs_idl)
> > >>>>> -{
> > >>>>> -    const struct ovsrec_open_vswitch *cfg =
> ovsrec_open_vswitch_first(ovs_idl);
> > >>>>> -    if (!cfg) {
> > >>>>> -        return OFCTRL_DEFAULT_PROBE_INTERVAL_SEC;
> > >>>>> -    }
> > >>>>> -
> > >>>>> -    const struct ovsrec_open_vswitch_table *ovs_table =
> > >>>>> -        ovsrec_open_vswitch_table_get(ovs_idl);
> > >>>>> -    const char *chassis_id = get_ovs_chassis_id(ovs_table);
> > >>>>> -    return get_chassis_external_id_value_int(
> > >>>>> -        &cfg->external_ids, chassis_id,
> > >>>>> -        "ovn-openflow-probe-interval",
> OFCTRL_DEFAULT_PROBE_INTERVAL_SEC);
> > >>>>> -}
> > >>>>> -
> > >>>>> /* Retrieves the pointer to the OVN Southbound database from
> 'ovs_idl' and
> > >>>>>  * updates 'sbdb_idl' with that pointer. */
> > >>>>> static void
> > >>>>> @@ -4975,8 +4958,7 @@ main(int argc, char *argv[])
> > >>>>>         engine_get_internal_data(&en_lb_data);
> > >>>>>       ofctrl_init(&lflow_output_data->group_table,
> > >>>>> -                &lflow_output_data->meter_table,
> > >>>>> -                get_ofctrl_probe_interval(ovs_idl_loop.idl));
> > >>>>> +                &lflow_output_data->meter_table);
> > >>>>>     ofctrl_seqno_init();
> > >>>>>       unixctl_command_register("group-table-list", "", 0, 0,
> > >>>>> @@ -5104,7 +5086,6 @@ main(int argc, char *argv[])
> > >>>>>                      &reset_ovnsb_idl_min_index,
> > >>>>>                      &ctrl_engine_ctx, &ovnsb_expected_cond_seqno);
> > >>>>>         update_ssl_config(ovsrec_ssl_table_get(ovs_idl_loop.idl));
> > >>>>> -
>  ofctrl_set_probe_interval(get_ofctrl_probe_interval(ovs_idl_loop.idl));
> > >>>>>           struct ovsdb_idl_txn *ovnsb_idl_txn
> > >>>>>             = ovsdb_idl_loop_run(&ovnsb_idl_loop);
> > >>>>> diff --git a/controller/pinctrl.c b/controller/pinctrl.c
> > >>>>> index c396ad4c2..a86db3f32 100644
> > >>>>> --- a/controller/pinctrl.c
> > >>>>> +++ b/controller/pinctrl.c
> > >>>>> @@ -3362,7 +3362,7 @@ pinctrl_handler(void *arg_)
> > >>>>>     static long long int svc_monitors_next_run_time = LLONG_MAX;
> > >>>>>     static long long int send_prefixd_time = LLONG_MAX;
> > >>>>> -    swconn = rconn_create(5, 0, DSCP_DEFAULT, 1 << OFP15_VERSION);
> > >>>>> +    swconn = rconn_create(0, 0, DSCP_DEFAULT, 1 << OFP15_VERSION);
> > >>>>>       while (!latch_is_set(&pctrl->pinctrl_thread_exit)) {
> > >>>>>         long long int bfd_time = LLONG_MAX;
> > >>>>> diff --git a/lib/features.c b/lib/features.c
> > >>>>> index 97c9c86f0..ad3357570 100644
> > >>>>> --- a/lib/features.c
> > >>>>> +++ b/lib/features.c
> > >>>>> @@ -28,11 +28,10 @@
> > >>>>> #include "openvswitch/ofp-meter.h"
> > >>>>> #include "openvswitch/ofp-util.h"
> > >>>>> #include "ovn/features.h"
> > >>>>> +#include "controller/ofctrl.h"
> > >>>>>   VLOG_DEFINE_THIS_MODULE(features);
> > >>>>> -#define FEATURES_DEFAULT_PROBE_INTERVAL_SEC 5
> > >>>>> -
> > >>>>> struct ovs_feature {
> > >>>>>     enum ovs_feature_value value;
> > >>>>>     const char *name;
> > >>>>> @@ -82,8 +81,7 @@ static void
> > >>>>> ovs_feature_rconn_setup(const char *br_name)
> > >>>>> {
> > >>>>>     if (!swconn) {
> > >>>>> -        swconn = rconn_create(FEATURES_DEFAULT_PROBE_INTERVAL_SEC,
> 0,
> > >>>>> -                              DSCP_DEFAULT, 1 << OFP15_VERSION);
> > >>>>> +        swconn = rconn_create(0, 0, DSCP_DEFAULT, 1 <<
> OFP15_VERSION);
> > >>>>>     }
> > >>>>>       if (!rconn_is_connected(swconn)) {
> > >>>>> @@ -94,7 +92,6 @@ ovs_feature_rconn_setup(const char *br_name)
> > >>>>>         }
> > >>>>>         free(target);
> > >>>>>     }
> > >>>>> -    rconn_set_probe_interval(swconn,
> FEATURES_DEFAULT_PROBE_INTERVAL_SEC);
> > >>>>> }
> > >>>>>   static bool
> > >>>>
> > >>> _______________________________________________
> > >>> dev mailing list
> > >>> dev@openvswitch.org
> > >>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >
> >
> > Regards,
> > Vladislav Odintsov
> >
> > _______________________________________________
> > dev mailing list
> > dev@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Vladislav Odintsov Nov. 8, 2023, 12:16 p.m. UTC | #8
Thanks Numan!

> On 2 Nov 2023, at 23:44, Numan Siddique <numans@ovn.org> wrote:
> 
> On Fri, Sep 22, 2023 at 2:40 PM Han Zhou <hzhou@ovn.org <mailto:hzhou@ovn.org>> wrote:
>> 
>> On Mon, Sep 11, 2023 at 7:31 AM Vladislav Odintsov <odivlad@gmail.com>
>> wrote:
>>> 
>>> Hi Dumitru,
>>> 
>>> if eventually this patch got merged, please remove next lines from its
>> commit message:
>>> 
>>> "Disablement for ovs-vswitchd -> ovn-controller OF inacivity probing will
>>> be done in the next patch."
>>> 
>>> Unfortunately I was unsuccessful in removing these probing direction
>> (default 60s).
>>> Maybe you can give any hint how to solve this [1].
>>> 
>>> 1: https://mail.openvswitch.org/pipermail/ovs-dev/2023-June/405445.html
>>> 
>>>> On 11 Sep 2023, at 15:57, Dumitru Ceara <dceara@redhat.com> wrote:
>>>> 
>>>> On 9/11/23 14:54, Dumitru Ceara wrote:
>>>>> On 6/14/23 20:02, Vladislav Odintsov wrote:
>>>>>> Hi Mark,
>>>>>> 
>>>>>> thanks for taking time to look on this!
>>>>>> 
>>>>>> Your point with a hung OVS is really an interesting case.
>>>>>> 
>>>>>> From one hand it’s a possible situation, and from another I guess
>> it’s much higher probability for OVS to be busy by some other work rather
>> than to hang in a loop. In my installation the first happening sometimes
>> (OVS business by real work). The reasons can be different but I can’t say
>> that it’s better to drop OF connection in such a moment (possibly
>> triggering ha chassis-redirect ports failover).
>>>>>> 
>>>>>> At the same time if we’re talking about L3GW node, which has a
>> gateway chassis redirect port, the hung OVS should be detected by other
>> nodes with BFD probes (if it’s a ha chassis-redirect port).
>>>>>> And if it’s a normal hypervisor (with just a vif ports), so what can
>> we do with it? I guess if we drop OF connection, it won’t make any positive
>> changes. Maybe just release memory needed for handling this connection and
>> not allocate buffers…? Unfortunately I can’t evaluate this.
>>>>>> 
>>>>>> Moreover, the OVN default is a disabled probing. What can be a
>> possible recommended values to set probes if to leave it as is? How should
>> users find out that they have to configure this? Currently in docs there is
>> only information that this option configures probe interval. But it’s not
>> obvious when to configure it and which value to set.
>>>>>> 
>>>>> 
>>>>> In support of this, quoting the original commit that added the probe
>>>>> config knob:
>>>>> 
>>>>> "The main intention of this patch is to fix this recompuation issue for
>>>>> older versions (there by requesting backport), it still would be
>>>>> beneficial with the incremental processing engine."
>>>>> 
>>>>> That was back in 2019:
>>>>> 
>>>>> 
>> https://github.com/ovn-org/ovn/commit/c99069c8934c9ea55d310a8b6d48fb66aa477589
>> 
>> To clarify a little more, the effect of the above commit in fact changed
>> the default behavior of probe from disabled (which is the default behavior
>> of all unix/punix rconn) to 5s, and was changed back to default disabled by:
>> https://github.com/ovn-org/ovn/commit/b8af8549396e62d6523be18e104352e334825783
>> 
>> So I am totally fine to remove this config if no one would ever need to
>> enable the probe.
>> 
>>>>> 
>>>>> Since then lots of performance improvements landed in OVN
>>>>> (ovn-controller too), we're probably fine.  Checking
>>>>> ovn-org/ovn-kubernetes code I see they were also setting
>>>>> ovn-openflow-probe-interval "just in case" so they probably won't be
>>>>> affected by it getting removed:
>>>>> 
>>>>> 
>> https://github.com/ovn-org/ovn-kubernetes/commit/be1eb843852cd4c490514cec60553cca4feaf18e
>>>>> 
>>>>>> I hope this makes sense.
>>>>>> 
>>>>> 
>>>>> The patch in its current form looks OK to me, I'm OK to merge it if
>> Mark
>>>>> doesn't have anything against it; therefore:
>>>>> 
>>>>> Acked-by: Dumitru Ceara <dceara@redhat.com>
>>>>> 
>>>> 
>>>> I forgot to mention this earlier but, since this patch was posted way
>>>> before soft freeze and since the changes seem contained, it's probably
>>>> fair to apply this to the 23.09 branch too.  What do you think, Mark?
>>>> 
>>>> Thanks!
>>>> 
>>>>> Regards,
>>>>> Dumitru
>>>>> 
>>>>>> Also, which Han’s patch to OVS you’re talking about? I looked through
>> open OVS patches and haven’t seen any. Please point me out.
>>>>>> 
>> 
>> I don't remember that I posted any patch related to this. We have a patch
>> downstream that changed the hardcoded 60s to 0 in OVS for the mgmt
>> controller.
>> 
>>>>>> regards,
>>>>>> Vladislav Odintsov
>>>>>> 
>>>>>>> On 9 Jun 2023, at 18:28, Mark Michelson <mmichels@redhat.com> wrote:
>>>>>>> 
>>>>>>> Hi,
>>>>>>> 
>>>>>>> Thanks for linking the email thread, since I had the exact same
>> concern that Numan did about detecting if ovs-vswitchd goes down. It sounds
>> like because of the nature of unix sockets, this disconnection is still
>> detected by OVN and failover can occur as expected.
>>>>>>> 
>>>>>>> But what about if ovs-vswitchd is still running but in a
>> "compromised" state. For instance, what if a coding error in ovs-vswitchd
>> results in it running an infinite loop? In such a case, the unix connection
>> would still be alive, but OVN would not be able to communicate with the
>> ovs-vswitchd process.
>>>>>>> 
>>>>>>> Does this situation eventually lead to OVN disconnecting anyway
>> after it fills up the unix socket's buffer? If so, how long does that take?
>> Or does it lead to OVN slowly growing in memory usage while it waits
>> forever to be able to write to the socket?
>> 
>> Hi Mark, if OVS is in such infinite loop, disconnect it from ovn-controller
>> and reconnect would not help anyway, right?
>> 
>>>>>>> 
>>>>>>> I think removing the hard-coded 5 second probe intervals from
>> pinctrl.c and features.c is a good idea. But I think we should leave the
>> configurable probe interval used by ofctrl.c for the case I described
>> before. Since Han's patch to OVS should disable probe intervals from the
>> OVS side by default, we would not trigger bad behavior simply because OVN
>> has a lot of work to do during a recompute. What do you think?
>>>>>>>> On 6/8/23 10:15, Vladislav Odintsov wrote:
>>>>>>>> OpenFlow connection is established over unix socket, which is a
>> reliable
>>>>>>>> connection and doesn't require additional probing.
>>>>>>>> With this patch openflow probing in ovn-controller -> ovs-vswitchd
>>>>>>>> direction is disabled for all three connections:
>>>>>>>> - OF flows management connection,
>>>>>>>> - OF features negotiation connection,
>>>>>>>> - pinctrl connection.
>>>>>>>> ovn-controller external_ids:ovn-openflow-probe-interval is removed
>> as
>>>>>>>> non-needed anymore.
>>>>>>>> Disablement for ovs-vswitchd -> ovn-controller OF inacivity probing
>> will
>>>>>>>> be done in the next patch.
>>>>>>>> Reported-at:
>> https://mail.openvswitch.org/pipermail/ovs-dev/2023-May/404625.html
>>>>>>>> Signed-off-by: Vladislav Odintsov <odivlad@gmail.com>
>>>>>>>> ---
>>>>>>>> NEWS                            |  5 +++++
>>>>>>>> controller/ofctrl.c             | 14 ++------------
>>>>>>>> controller/ofctrl.h             |  4 +---
>>>>>>>> controller/ovn-controller.8.xml | 14 --------------
>>>>>>>> controller/ovn-controller.c     | 21 +--------------------
>>>>>>>> controller/pinctrl.c            |  2 +-
>>>>>>>> lib/features.c                  |  7 ++-----
>>>>>>>> 7 files changed, 12 insertions(+), 55 deletions(-)
>>>>>>>> diff --git a/NEWS b/NEWS
>>>>>>>> index 645acea1f..bd63b187b 100644
>>>>>>>> --- a/NEWS
>>>>>>>> +++ b/NEWS
>>>>>>>> @@ -1,5 +1,10 @@
>>>>>>>> Post v23.06.0
>>>>>>>> -------------
>>>>>>>> +  - Disable OpenFlow inactivity probing between ovn-controller and
>> OVS.
>>>>>>>> +    OF connection is established over unix socket, which is a
>> reliable
>>>>>>>> +    connection method and doesn't require additional probing.
>>>>>>>> +    external_ids:ovn-openflow-probe-interval configuration option
>> for
>>>>>>>> +    ovn-controller is no longer matter.
>> 
>> nit: s/is no longer matter/is removed
>> 
>> Acked-by: Han Zhou <hzhou@ovn.org>
>> 
>> (waiting for Mark's confirm about this concerns)
> 
> OK.  I went ahead and applied this patch to the main.  Since there
> were no comments from Mark I assume he is fine with the patch.
> 
> Numan
> 
>> 
>>>>>>>>  OVN v23.06.0 - 01 Jun 2023
>>>>>>>> --------------------------
>>>>>>>> diff --git a/controller/ofctrl.c b/controller/ofctrl.c
>>>>>>>> index 64a444ff6..788634494 100644
>>>>>>>> --- a/controller/ofctrl.c
>>>>>>>> +++ b/controller/ofctrl.c
>>>>>>>> @@ -415,11 +415,9 @@ static void ofctrl_recv(const struct
>> ofp_header *, enum ofptype);
>>>>>>>>  void
>>>>>>>> ofctrl_init(struct ovn_extend_table *group_table,
>>>>>>>> -            struct ovn_extend_table *meter_table,
>>>>>>>> -            int inactivity_probe_interval)
>>>>>>>> +            struct ovn_extend_table *meter_table)
>>>>>>>> {
>>>>>>>> -    swconn = rconn_create(inactivity_probe_interval, 0,
>>>>>>>> -                          DSCP_DEFAULT, 1 << OFP15_VERSION);
>>>>>>>> +    swconn = rconn_create(0, 0, DSCP_DEFAULT, 1 << OFP15_VERSION);
>>>>>>>>    tx_counter = rconn_packet_counter_create();
>>>>>>>>    hmap_init(&installed_lflows);
>>>>>>>>    hmap_init(&installed_pflows);
>>>>>>>> @@ -2986,14 +2984,6 @@ ofctrl_is_connected(void)
>>>>>>>>    return rconn_is_connected(swconn);
>>>>>>>> }
>>>>>>>> -void
>>>>>>>> -ofctrl_set_probe_interval(int probe_interval)
>>>>>>>> -{
>>>>>>>> -    if (swconn) {
>>>>>>>> -        rconn_set_probe_interval(swconn, probe_interval);
>>>>>>>> -    }
>>>>>>>> -}
>>>>>>>> -
>>>>>>>> void
>>>>>>>> ofctrl_get_memory_usage(struct simap *usage)
>>>>>>>> {
>>>>>>>> diff --git a/controller/ofctrl.h b/controller/ofctrl.h
>>>>>>>> index 105f9370b..46bfccd85 100644
>>>>>>>> --- a/controller/ofctrl.h
>>>>>>>> +++ b/controller/ofctrl.h
>>>>>>>> @@ -49,8 +49,7 @@ struct ovn_desired_flow_table {
>>>>>>>>  /* Interface for OVN main loop. */
>>>>>>>> void ofctrl_init(struct ovn_extend_table *group_table,
>>>>>>>> -                 struct ovn_extend_table *meter_table,
>>>>>>>> -                 int inactivity_probe_interval);
>>>>>>>> +                 struct ovn_extend_table *meter_table);
>>>>>>>> bool ofctrl_run(const struct ovsrec_bridge *br_int,
>>>>>>>>                const struct ovsrec_open_vswitch_table *,
>>>>>>>>                struct shash *pending_ct_zones);
>>>>>>>> @@ -142,7 +141,6 @@ void ofctrl_check_and_add_flow_metered(struct
>> ovn_desired_flow_table *,
>>>>>>>>    bool ofctrl_is_connected(void);
>>>>>>>> -void ofctrl_set_probe_interval(int probe_interval);
>>>>>>>> void ofctrl_get_memory_usage(struct simap *usage);
>>>>>>>>  #endif /* controller/ofctrl.h */
>>>>>>>> diff --git a/controller/ovn-controller.8.xml
>> b/controller/ovn-controller.8.xml
>>>>>>>> index f61f43008..52eb137d3 100644
>>>>>>>> --- a/controller/ovn-controller.8.xml
>>>>>>>> +++ b/controller/ovn-controller.8.xml
>>>>>>>> @@ -147,20 +147,6 @@
>>>>>>>>        </p>
>>>>>>>>      </dd>
>>>>>>>> -
>> <dt><code>external_ids:ovn-openflow-probe-interval</code></dt>
>>>>>>>> -      <dd>
>>>>>>>> -        <p>
>>>>>>>> -          The inactivity probe interval of the OpenFlow connection
>> to the
>>>>>>>> -          OpenvSwitch integration bridge, in seconds.
>>>>>>>> -          If the value is zero, it disables the connection
>> keepalive feature.
>>>>>>>> -        </p>
>>>>>>>> -
>>>>>>>> -        <p>
>>>>>>>> -          If the value is nonzero, then it will be forced to a
>> value of
>>>>>>>> -          at least 5s.
>>>>>>>> -        </p>
>>>>>>>> -      </dd>
>>>>>>>> -
>>>>>>>>      <dt><code>external_ids:ovn-encap-type</code></dt>
>>>>>>>>      <dd>
>>>>>>>>        <p>
>>>>>>>> diff --git a/controller/ovn-controller.c
>> b/controller/ovn-controller.c
>>>>>>>> index 3a81a13fb..732e7a690 100644
>>>>>>>> --- a/controller/ovn-controller.c
>>>>>>>> +++ b/controller/ovn-controller.c
>>>>>>>> @@ -105,7 +105,6 @@ static unixctl_cb_func
>> debug_ignore_startup_delay;
>>>>>>>>  #define DEFAULT_BRIDGE_NAME "br-int"
>>>>>>>> #define DEFAULT_DATAPATH "system"
>>>>>>>> -#define OFCTRL_DEFAULT_PROBE_INTERVAL_SEC 0
>>>>>>>>  #define CONTROLLER_LOOP_STOPWATCH_NAME "flow-generation"
>>>>>>>> #define OFCTRL_PUT_STOPWATCH_NAME "flow-installation"
>>>>>>>> @@ -556,22 +555,6 @@ update_ssl_config(const struct
>> ovsrec_ssl_table *ssl_table)
>>>>>>>>    }
>>>>>>>> }
>>>>>>>> -static int
>>>>>>>> -get_ofctrl_probe_interval(struct ovsdb_idl *ovs_idl)
>>>>>>>> -{
>>>>>>>> -    const struct ovsrec_open_vswitch *cfg =
>> ovsrec_open_vswitch_first(ovs_idl);
>>>>>>>> -    if (!cfg) {
>>>>>>>> -        return OFCTRL_DEFAULT_PROBE_INTERVAL_SEC;
>>>>>>>> -    }
>>>>>>>> -
>>>>>>>> -    const struct ovsrec_open_vswitch_table *ovs_table =
>>>>>>>> -        ovsrec_open_vswitch_table_get(ovs_idl);
>>>>>>>> -    const char *chassis_id = get_ovs_chassis_id(ovs_table);
>>>>>>>> -    return get_chassis_external_id_value_int(
>>>>>>>> -        &cfg->external_ids, chassis_id,
>>>>>>>> -        "ovn-openflow-probe-interval",
>> OFCTRL_DEFAULT_PROBE_INTERVAL_SEC);
>>>>>>>> -}
>>>>>>>> -
>>>>>>>> /* Retrieves the pointer to the OVN Southbound database from
>> 'ovs_idl' and
>>>>>>>> * updates 'sbdb_idl' with that pointer. */
>>>>>>>> static void
>>>>>>>> @@ -4975,8 +4958,7 @@ main(int argc, char *argv[])
>>>>>>>>        engine_get_internal_data(&en_lb_data);
>>>>>>>>      ofctrl_init(&lflow_output_data->group_table,
>>>>>>>> -                &lflow_output_data->meter_table,
>>>>>>>> -                get_ofctrl_probe_interval(ovs_idl_loop.idl));
>>>>>>>> +                &lflow_output_data->meter_table);
>>>>>>>>    ofctrl_seqno_init();
>>>>>>>>      unixctl_command_register("group-table-list", "", 0, 0,
>>>>>>>> @@ -5104,7 +5086,6 @@ main(int argc, char *argv[])
>>>>>>>>                     &reset_ovnsb_idl_min_index,
>>>>>>>>                     &ctrl_engine_ctx, &ovnsb_expected_cond_seqno);
>>>>>>>>        update_ssl_config(ovsrec_ssl_table_get(ovs_idl_loop.idl));
>>>>>>>> -
>> ofctrl_set_probe_interval(get_ofctrl_probe_interval(ovs_idl_loop.idl));
>>>>>>>>          struct ovsdb_idl_txn *ovnsb_idl_txn
>>>>>>>>            = ovsdb_idl_loop_run(&ovnsb_idl_loop);
>>>>>>>> diff --git a/controller/pinctrl.c b/controller/pinctrl.c
>>>>>>>> index c396ad4c2..a86db3f32 100644
>>>>>>>> --- a/controller/pinctrl.c
>>>>>>>> +++ b/controller/pinctrl.c
>>>>>>>> @@ -3362,7 +3362,7 @@ pinctrl_handler(void *arg_)
>>>>>>>>    static long long int svc_monitors_next_run_time = LLONG_MAX;
>>>>>>>>    static long long int send_prefixd_time = LLONG_MAX;
>>>>>>>> -    swconn = rconn_create(5, 0, DSCP_DEFAULT, 1 << OFP15_VERSION);
>>>>>>>> +    swconn = rconn_create(0, 0, DSCP_DEFAULT, 1 << OFP15_VERSION);
>>>>>>>>      while (!latch_is_set(&pctrl->pinctrl_thread_exit)) {
>>>>>>>>        long long int bfd_time = LLONG_MAX;
>>>>>>>> diff --git a/lib/features.c b/lib/features.c
>>>>>>>> index 97c9c86f0..ad3357570 100644
>>>>>>>> --- a/lib/features.c
>>>>>>>> +++ b/lib/features.c
>>>>>>>> @@ -28,11 +28,10 @@
>>>>>>>> #include "openvswitch/ofp-meter.h"
>>>>>>>> #include "openvswitch/ofp-util.h"
>>>>>>>> #include "ovn/features.h"
>>>>>>>> +#include "controller/ofctrl.h"
>>>>>>>>  VLOG_DEFINE_THIS_MODULE(features);
>>>>>>>> -#define FEATURES_DEFAULT_PROBE_INTERVAL_SEC 5
>>>>>>>> -
>>>>>>>> struct ovs_feature {
>>>>>>>>    enum ovs_feature_value value;
>>>>>>>>    const char *name;
>>>>>>>> @@ -82,8 +81,7 @@ static void
>>>>>>>> ovs_feature_rconn_setup(const char *br_name)
>>>>>>>> {
>>>>>>>>    if (!swconn) {
>>>>>>>> -        swconn = rconn_create(FEATURES_DEFAULT_PROBE_INTERVAL_SEC,
>> 0,
>>>>>>>> -                              DSCP_DEFAULT, 1 << OFP15_VERSION);
>>>>>>>> +        swconn = rconn_create(0, 0, DSCP_DEFAULT, 1 <<
>> OFP15_VERSION);
>>>>>>>>    }
>>>>>>>>      if (!rconn_is_connected(swconn)) {
>>>>>>>> @@ -94,7 +92,6 @@ ovs_feature_rconn_setup(const char *br_name)
>>>>>>>>        }
>>>>>>>>        free(target);
>>>>>>>>    }
>>>>>>>> -    rconn_set_probe_interval(swconn,
>> FEATURES_DEFAULT_PROBE_INTERVAL_SEC);
>>>>>>>> }
>>>>>>>>  static bool
>>>>>>> 
>>>>>> _______________________________________________
>>>>>> dev mailing list
>>>>>> dev@openvswitch.org
>>>>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>>> 
>>> 
>>> Regards,
>>> Vladislav Odintsov
>>> 
>>> _______________________________________________
>>> dev mailing list
>>> dev@openvswitch.org
>>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>> _______________________________________________
>> dev mailing list
>> dev@openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> _______________________________________________
> dev mailing list
> dev@openvswitch.org <mailto:dev@openvswitch.org>
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev


Regards,
Vladislav Odintsov
diff mbox series

Patch

diff --git a/NEWS b/NEWS
index 645acea1f..bd63b187b 100644
--- a/NEWS
+++ b/NEWS
@@ -1,5 +1,10 @@ 
 Post v23.06.0
 -------------
+  - Disable OpenFlow inactivity probing between ovn-controller and OVS.
+    OF connection is established over unix socket, which is a reliable
+    connection method and doesn't require additional probing.
+    external_ids:ovn-openflow-probe-interval configuration option for
+    ovn-controller is no longer matter.
 
 OVN v23.06.0 - 01 Jun 2023
 --------------------------
diff --git a/controller/ofctrl.c b/controller/ofctrl.c
index 64a444ff6..788634494 100644
--- a/controller/ofctrl.c
+++ b/controller/ofctrl.c
@@ -415,11 +415,9 @@  static void ofctrl_recv(const struct ofp_header *, enum ofptype);
 
 void
 ofctrl_init(struct ovn_extend_table *group_table,
-            struct ovn_extend_table *meter_table,
-            int inactivity_probe_interval)
+            struct ovn_extend_table *meter_table)
 {
-    swconn = rconn_create(inactivity_probe_interval, 0,
-                          DSCP_DEFAULT, 1 << OFP15_VERSION);
+    swconn = rconn_create(0, 0, DSCP_DEFAULT, 1 << OFP15_VERSION);
     tx_counter = rconn_packet_counter_create();
     hmap_init(&installed_lflows);
     hmap_init(&installed_pflows);
@@ -2986,14 +2984,6 @@  ofctrl_is_connected(void)
     return rconn_is_connected(swconn);
 }
 
-void
-ofctrl_set_probe_interval(int probe_interval)
-{
-    if (swconn) {
-        rconn_set_probe_interval(swconn, probe_interval);
-    }
-}
-
 void
 ofctrl_get_memory_usage(struct simap *usage)
 {
diff --git a/controller/ofctrl.h b/controller/ofctrl.h
index 105f9370b..46bfccd85 100644
--- a/controller/ofctrl.h
+++ b/controller/ofctrl.h
@@ -49,8 +49,7 @@  struct ovn_desired_flow_table {
 
 /* Interface for OVN main loop. */
 void ofctrl_init(struct ovn_extend_table *group_table,
-                 struct ovn_extend_table *meter_table,
-                 int inactivity_probe_interval);
+                 struct ovn_extend_table *meter_table);
 bool ofctrl_run(const struct ovsrec_bridge *br_int,
                 const struct ovsrec_open_vswitch_table *,
                 struct shash *pending_ct_zones);
@@ -142,7 +141,6 @@  void ofctrl_check_and_add_flow_metered(struct ovn_desired_flow_table *,
 
 
 bool ofctrl_is_connected(void);
-void ofctrl_set_probe_interval(int probe_interval);
 void ofctrl_get_memory_usage(struct simap *usage);
 
 #endif /* controller/ofctrl.h */
diff --git a/controller/ovn-controller.8.xml b/controller/ovn-controller.8.xml
index f61f43008..52eb137d3 100644
--- a/controller/ovn-controller.8.xml
+++ b/controller/ovn-controller.8.xml
@@ -147,20 +147,6 @@ 
         </p>
       </dd>
 
-      <dt><code>external_ids:ovn-openflow-probe-interval</code></dt>
-      <dd>
-        <p>
-          The inactivity probe interval of the OpenFlow connection to the
-          OpenvSwitch integration bridge, in seconds.
-          If the value is zero, it disables the connection keepalive feature.
-        </p>
-
-        <p>
-          If the value is nonzero, then it will be forced to a value of
-          at least 5s.
-        </p>
-      </dd>
-
       <dt><code>external_ids:ovn-encap-type</code></dt>
       <dd>
         <p>
diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
index 3a81a13fb..732e7a690 100644
--- a/controller/ovn-controller.c
+++ b/controller/ovn-controller.c
@@ -105,7 +105,6 @@  static unixctl_cb_func debug_ignore_startup_delay;
 
 #define DEFAULT_BRIDGE_NAME "br-int"
 #define DEFAULT_DATAPATH "system"
-#define OFCTRL_DEFAULT_PROBE_INTERVAL_SEC 0
 
 #define CONTROLLER_LOOP_STOPWATCH_NAME "flow-generation"
 #define OFCTRL_PUT_STOPWATCH_NAME "flow-installation"
@@ -556,22 +555,6 @@  update_ssl_config(const struct ovsrec_ssl_table *ssl_table)
     }
 }
 
-static int
-get_ofctrl_probe_interval(struct ovsdb_idl *ovs_idl)
-{
-    const struct ovsrec_open_vswitch *cfg = ovsrec_open_vswitch_first(ovs_idl);
-    if (!cfg) {
-        return OFCTRL_DEFAULT_PROBE_INTERVAL_SEC;
-    }
-
-    const struct ovsrec_open_vswitch_table *ovs_table =
-        ovsrec_open_vswitch_table_get(ovs_idl);
-    const char *chassis_id = get_ovs_chassis_id(ovs_table);
-    return get_chassis_external_id_value_int(
-        &cfg->external_ids, chassis_id,
-        "ovn-openflow-probe-interval", OFCTRL_DEFAULT_PROBE_INTERVAL_SEC);
-}
-
 /* Retrieves the pointer to the OVN Southbound database from 'ovs_idl' and
  * updates 'sbdb_idl' with that pointer. */
 static void
@@ -4975,8 +4958,7 @@  main(int argc, char *argv[])
         engine_get_internal_data(&en_lb_data);
 
     ofctrl_init(&lflow_output_data->group_table,
-                &lflow_output_data->meter_table,
-                get_ofctrl_probe_interval(ovs_idl_loop.idl));
+                &lflow_output_data->meter_table);
     ofctrl_seqno_init();
 
     unixctl_command_register("group-table-list", "", 0, 0,
@@ -5104,7 +5086,6 @@  main(int argc, char *argv[])
                      &reset_ovnsb_idl_min_index,
                      &ctrl_engine_ctx, &ovnsb_expected_cond_seqno);
         update_ssl_config(ovsrec_ssl_table_get(ovs_idl_loop.idl));
-        ofctrl_set_probe_interval(get_ofctrl_probe_interval(ovs_idl_loop.idl));
 
         struct ovsdb_idl_txn *ovnsb_idl_txn
             = ovsdb_idl_loop_run(&ovnsb_idl_loop);
diff --git a/controller/pinctrl.c b/controller/pinctrl.c
index c396ad4c2..a86db3f32 100644
--- a/controller/pinctrl.c
+++ b/controller/pinctrl.c
@@ -3362,7 +3362,7 @@  pinctrl_handler(void *arg_)
     static long long int svc_monitors_next_run_time = LLONG_MAX;
     static long long int send_prefixd_time = LLONG_MAX;
 
-    swconn = rconn_create(5, 0, DSCP_DEFAULT, 1 << OFP15_VERSION);
+    swconn = rconn_create(0, 0, DSCP_DEFAULT, 1 << OFP15_VERSION);
 
     while (!latch_is_set(&pctrl->pinctrl_thread_exit)) {
         long long int bfd_time = LLONG_MAX;
diff --git a/lib/features.c b/lib/features.c
index 97c9c86f0..ad3357570 100644
--- a/lib/features.c
+++ b/lib/features.c
@@ -28,11 +28,10 @@ 
 #include "openvswitch/ofp-meter.h"
 #include "openvswitch/ofp-util.h"
 #include "ovn/features.h"
+#include "controller/ofctrl.h"
 
 VLOG_DEFINE_THIS_MODULE(features);
 
-#define FEATURES_DEFAULT_PROBE_INTERVAL_SEC 5
-
 struct ovs_feature {
     enum ovs_feature_value value;
     const char *name;
@@ -82,8 +81,7 @@  static void
 ovs_feature_rconn_setup(const char *br_name)
 {
     if (!swconn) {
-        swconn = rconn_create(FEATURES_DEFAULT_PROBE_INTERVAL_SEC, 0,
-                              DSCP_DEFAULT, 1 << OFP15_VERSION);
+        swconn = rconn_create(0, 0, DSCP_DEFAULT, 1 << OFP15_VERSION);
     }
 
     if (!rconn_is_connected(swconn)) {
@@ -94,7 +92,6 @@  ovs_feature_rconn_setup(const char *br_name)
         }
         free(target);
     }
-    rconn_set_probe_interval(swconn, FEATURES_DEFAULT_PROBE_INTERVAL_SEC);
 }
 
 static bool