diff mbox series

[ovs-dev,v2] ovn-controller: Reduce size of the SB monitor condition.

Message ID 20230621094753.150072-1-dceara@redhat.com
State Accepted
Headers show
Series [ovs-dev,v2] ovn-controller: Reduce size of the SB monitor condition. | expand

Checks

Context Check Description
ovsrobot/apply-robot success apply and check: success
ovsrobot/github-robot-_Build_and_Test success github build: passed
ovsrobot/github-robot-_ovn-kubernetes fail github build: failed

Commit Message

Dumitru Ceara June 21, 2023, 9:47 a.m. UTC
We don't need to explicitly add port bindings that were already bound
locally.  We implicitly get those because we monitor the datapaths
they're attached to.

When performing an ovn-heater 500-node density-heavy scale test [0], with
conditional monitoring enabled, the unreasonably long poll intervals on
the Southbound database (the ones that took more than one second) are
measured as:

  ---------------------------------------------------------------------
       Count      Min        Max       Median      Mean   95 percentile
  ---------------------------------------------------------------------
        56.0     1010.0     2664.0     1455.5     1544.9     2163.0
        77.0     1015.0     3460.0     1896.0     1858.2     2907.8
        69.0     1010.0     3118.0     1618.0     1688.1     2582.4
  ---------------------------------------------------------------------
       202.0     1010.0     3460.0     1610.0     1713.3     2711.5

Compared to the baseline results (also with conditional monitoring
enabled):
  ---------------------------------------------------------------------
       Count      Min        Max       Median      Mean   95 percentile
  ---------------------------------------------------------------------
       141.0     1003.0    18338.0     1721.0     2625.4     7626.0
       151.0     1019.0    80297.0     1808.0     3410.7     9089.0
       165.0     1007.0    50736.0     2354.0     3067.7     7309.8
  ---------------------------------------------------------------------
       457.0     1003.0    80297.0     2131.0     3044.6     7799.6

We see significant improvement on the server side.  This is explained
by the fact that now the Southbound database server doesn't need to
apply as many conditions as before when filtering individual monitor
contents.

Note: Sub-ports - OVN switch ports with parent_port set - have to be
monitored unconditionally as we cannot efficiently determine their local
datapaths without including all local OVS interfaces in the monitor.

[0] https://github.com/ovn-org/ovn-heater/blob/main/test-scenarios/ocp-500-density-heavy.yml

Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2139194
Reported-by: Ilya Maximets <i.maximets@ovn.org>
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
---
V2:
- Addressed Han's comments:
  - changed commit log: removed assumption about sub-ports.
  - added ovn-nb documentation note about sub-ports monitoring.
  - added ovn-controller documentation note about sub-ports monitoring.
  - added TODO item.
- Added NEWS item.
---
 NEWS                            |  3 +++
 TODO.rst                        |  6 ++++++
 controller/binding.c            |  2 +-
 controller/binding.h            |  2 +-
 controller/ovn-controller.8.xml |  6 ++++++
 controller/ovn-controller.c     | 19 ++++++++++++++++---
 ovn-nb.xml                      |  6 +++++-
 7 files changed, 38 insertions(+), 6 deletions(-)

Comments

Han Zhou June 21, 2023, 2:50 p.m. UTC | #1
On Wed, Jun 21, 2023 at 2:48 AM Dumitru Ceara <dceara@redhat.com> wrote:
>
> We don't need to explicitly add port bindings that were already bound
> locally.  We implicitly get those because we monitor the datapaths
> they're attached to.
>
> When performing an ovn-heater 500-node density-heavy scale test [0], with
> conditional monitoring enabled, the unreasonably long poll intervals on
> the Southbound database (the ones that took more than one second) are
> measured as:
>
>   ---------------------------------------------------------------------
>        Count      Min        Max       Median      Mean   95 percentile
>   ---------------------------------------------------------------------
>         56.0     1010.0     2664.0     1455.5     1544.9     2163.0
>         77.0     1015.0     3460.0     1896.0     1858.2     2907.8
>         69.0     1010.0     3118.0     1618.0     1688.1     2582.4
>   ---------------------------------------------------------------------
>        202.0     1010.0     3460.0     1610.0     1713.3     2711.5
>
> Compared to the baseline results (also with conditional monitoring
> enabled):
>   ---------------------------------------------------------------------
>        Count      Min        Max       Median      Mean   95 percentile
>   ---------------------------------------------------------------------
>        141.0     1003.0    18338.0     1721.0     2625.4     7626.0
>        151.0     1019.0    80297.0     1808.0     3410.7     9089.0
>        165.0     1007.0    50736.0     2354.0     3067.7     7309.8
>   ---------------------------------------------------------------------
>        457.0     1003.0    80297.0     2131.0     3044.6     7799.6
>
> We see significant improvement on the server side.  This is explained
> by the fact that now the Southbound database server doesn't need to
> apply as many conditions as before when filtering individual monitor
> contents.
>
> Note: Sub-ports - OVN switch ports with parent_port set - have to be
> monitored unconditionally as we cannot efficiently determine their local
> datapaths without including all local OVS interfaces in the monitor.
>
> [0]
https://github.com/ovn-org/ovn-heater/blob/main/test-scenarios/ocp-500-density-heavy.yml
>
> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2139194
> Reported-by: Ilya Maximets <i.maximets@ovn.org>
> Signed-off-by: Dumitru Ceara <dceara@redhat.com>
> ---
> V2:
> - Addressed Han's comments:
>   - changed commit log: removed assumption about sub-ports.
>   - added ovn-nb documentation note about sub-ports monitoring.
>   - added ovn-controller documentation note about sub-ports monitoring.
>   - added TODO item.
> - Added NEWS item.
> ---
>  NEWS                            |  3 +++
>  TODO.rst                        |  6 ++++++
>  controller/binding.c            |  2 +-
>  controller/binding.h            |  2 +-
>  controller/ovn-controller.8.xml |  6 ++++++
>  controller/ovn-controller.c     | 19 ++++++++++++++++---
>  ovn-nb.xml                      |  6 +++++-
>  7 files changed, 38 insertions(+), 6 deletions(-)
>
> diff --git a/NEWS b/NEWS
> index 482970eeeb..8275877f99 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -7,6 +7,9 @@ Post v23.06.0
>      DHCPv4 "hostname" (12) option.
>    - Support to create/update MAC_Binding when GARP received from VTEP
(RAMP)
>      switch on l3dgw port.
> +  - To allow optimizing ovn-controller's monitor conditions for the
regular
> +    VIF case, ovn-controller now unconditionally monitors all sub-ports
> +    (ports with parent_port set).
>
>  OVN v23.06.0 - 01 Jun 2023
>  --------------------------
> diff --git a/TODO.rst b/TODO.rst
> index dff163e70b..b790a9fadf 100644
> --- a/TODO.rst
> +++ b/TODO.rst
> @@ -124,3 +124,9 @@ OVN To-do List
>  * Load Balancer templates
>
>    * Support combining the VIP IP and port into a single template
variable.
> +
> +* ovn-controller conditional monitoring
> +
> +  * Improve sub-ports (with parent_port set) conditional monitoring;
these
> +    are currently unconditionally monitored, even if ovn-monitor-all is
> +    set to false.
> diff --git a/controller/binding.c b/controller/binding.c
> index 486095ca79..359ad6d660 100644
> --- a/controller/binding.c
> +++ b/controller/binding.c
> @@ -785,7 +785,7 @@ local_binding_data_destroy(struct local_binding_data
*lbinding_data)
>  }
>
>  const struct sbrec_port_binding *
> -local_binding_get_primary_pb(struct shash *local_bindings,
> +local_binding_get_primary_pb(const struct shash *local_bindings,
>                               const char *pb_name)
>  {
>      struct local_binding *lbinding =
> diff --git a/controller/binding.h b/controller/binding.h
> index 0e57f02ee5..319cbd263c 100644
> --- a/controller/binding.h
> +++ b/controller/binding.h
> @@ -150,7 +150,7 @@ void local_binding_data_init(struct
local_binding_data *);
>  void local_binding_data_destroy(struct local_binding_data *);
>
>  const struct sbrec_port_binding *local_binding_get_primary_pb(
> -    struct shash *local_bindings, const char *pb_name);
> +    const struct shash *local_bindings, const char *pb_name);
>  ofp_port_t local_binding_get_lport_ofport(const struct shash
*local_bindings,
>                                            const char *pb_name);
>
> diff --git a/controller/ovn-controller.8.xml
b/controller/ovn-controller.8.xml
> index f61f430084..0883d8da9b 100644
> --- a/controller/ovn-controller.8.xml
> +++ b/controller/ovn-controller.8.xml
> @@ -128,6 +128,12 @@
>            to <code>true</code> for environments that all workloads need
to be
>            reachable from each other.
>          </p>
> +        <p>
> +          NOTE: for efficiency and scalability in common scenarios
> +          <code>ovn-controller</code> unconditionally monitors all
sub-ports
> +          (ports with <code>parent_port</code> set) regardless of the
> +          <code>ovn-monitor-all</code> value.
> +        </p>
>          <p>
>            Default value is <var>false</var>.
>          </p>
> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
> index a474069798..96d01219e1 100644
> --- a/controller/ovn-controller.c
> +++ b/controller/ovn-controller.c
> @@ -199,6 +199,7 @@ static unsigned int
>  update_sb_monitors(struct ovsdb_idl *ovnsb_idl,
>                     const struct sbrec_chassis *chassis,
>                     const struct sset *local_ifaces,
> +                   const struct shash *local_bindings,
>                     struct hmap *local_datapaths,
>                     bool monitor_all)
>  {
> @@ -297,10 +298,21 @@ update_sb_monitors(struct ovsdb_idl *ovnsb_idl,
>
>      if (local_ifaces) {
>          const char *name;
> +
> +        ovs_assert(local_bindings);
>          SSET_FOR_EACH (name, local_ifaces) {
> +            /* Skip the VIFs we bound already, we should have a local
datapath
> +             * for those. */
> +            const struct sbrec_port_binding *local_pb
> +                = local_binding_get_primary_pb(local_bindings, name);
> +            if (local_pb && get_lport_type(local_pb) == LP_VIF) {
> +                continue;
> +            }
>              sbrec_port_binding_add_clause_logical_port(&pb, OVSDB_F_EQ,
name);
> -            sbrec_port_binding_add_clause_parent_port(&pb, OVSDB_F_EQ,
name);
>          }
> +        /* Monitor all sub-ports unconditionally; we don't expect a lot
of
> +         * them in the SB database. */
> +        sbrec_port_binding_add_clause_parent_port(&pb, OVSDB_F_NE, NULL);
>      }
>      if (local_datapaths) {
>          const struct local_datapath *ld;
> @@ -609,7 +621,7 @@ update_sb_db(struct ovsdb_idl *ovs_idl, struct
ovsdb_idl *ovnsb_idl,
>           * extra cost. Instead, it is called after the engine execution
only
>           * when it is necessary. */
>          unsigned int next_cond_seqno =
> -            update_sb_monitors(ovnsb_idl, NULL, NULL, NULL, true);
> +            update_sb_monitors(ovnsb_idl, NULL, NULL, NULL, NULL, true);
>          if (sb_cond_seqno) {
>              *sb_cond_seqno = next_cond_seqno;
>          }
> @@ -4712,7 +4724,7 @@ main(int argc, char *argv[])
>      ovsdb_idl_omit(ovnsb_idl_loop.idl,
>                     &sbrec_chassis_private_col_external_ids);
>
> -    update_sb_monitors(ovnsb_idl_loop.idl, NULL, NULL, NULL, false);
> +    update_sb_monitors(ovnsb_idl_loop.idl, NULL, NULL, NULL, NULL,
false);
>
>      stopwatch_create(CONTROLLER_LOOP_STOPWATCH_NAME, SW_MS);
>      stopwatch_create(OFCTRL_PUT_STOPWATCH_NAME, SW_MS);
> @@ -5355,6 +5367,7 @@ main(int argc, char *argv[])
>                                  update_sb_monitors(
>                                      ovnsb_idl_loop.idl, chassis,
>                                      &runtime_data->local_lports,
> +
 &runtime_data->lbinding_data.bindings,
>                                      &runtime_data->local_datapaths,
>                                      sb_monitor_all);
>                          }
> diff --git a/ovn-nb.xml b/ovn-nb.xml
> index cf9c92009d..1fb9fa10ee 100644
> --- a/ovn-nb.xml
> +++ b/ovn-nb.xml
> @@ -1302,7 +1302,11 @@
>        <column name="parent_name">
>          The VM interface through which the nested container sends its
network
>          traffic.  This must match the <ref column="name"/> column for
some
> -        other <ref table="Logical_Switch_Port"/>.
> +        other <ref table="Logical_Switch_Port"/>.  Note: for performance
> +        reasons, unlike for regular VIFs, <code>ovn-controller</code>
will

It may be better to say "for performance of OVN Southbound database
conditional monitoring ...", otherwise the user might be confused.

> +        register to get updates about all OVN Southbound database
> +        <ref db="OVN_Southbound" table="Port_Binding"/> table records
> +        that correspond to nested container ports.

Better to add: "..., even if 'ovn-monitor-all' is set to false. See
ovn-controlller (8)." or something similar.

Please feel free to adjust the words and merge.

Acked-by: Han Zhou <hzhou@ovn.org>

>        </column>
>
>        <column name="tag_request">
> --
> 2.31.1
>
Dumitru Ceara June 22, 2023, 8:12 a.m. UTC | #2
On 6/21/23 16:50, Han Zhou wrote:
> On Wed, Jun 21, 2023 at 2:48 AM Dumitru Ceara <dceara@redhat.com> wrote:
>>
>> We don't need to explicitly add port bindings that were already bound
>> locally.  We implicitly get those because we monitor the datapaths
>> they're attached to.
>>
>> When performing an ovn-heater 500-node density-heavy scale test [0], with
>> conditional monitoring enabled, the unreasonably long poll intervals on
>> the Southbound database (the ones that took more than one second) are
>> measured as:
>>
>>   ---------------------------------------------------------------------
>>        Count      Min        Max       Median      Mean   95 percentile
>>   ---------------------------------------------------------------------
>>         56.0     1010.0     2664.0     1455.5     1544.9     2163.0
>>         77.0     1015.0     3460.0     1896.0     1858.2     2907.8
>>         69.0     1010.0     3118.0     1618.0     1688.1     2582.4
>>   ---------------------------------------------------------------------
>>        202.0     1010.0     3460.0     1610.0     1713.3     2711.5
>>
>> Compared to the baseline results (also with conditional monitoring
>> enabled):
>>   ---------------------------------------------------------------------
>>        Count      Min        Max       Median      Mean   95 percentile
>>   ---------------------------------------------------------------------
>>        141.0     1003.0    18338.0     1721.0     2625.4     7626.0
>>        151.0     1019.0    80297.0     1808.0     3410.7     9089.0
>>        165.0     1007.0    50736.0     2354.0     3067.7     7309.8
>>   ---------------------------------------------------------------------
>>        457.0     1003.0    80297.0     2131.0     3044.6     7799.6
>>
>> We see significant improvement on the server side.  This is explained
>> by the fact that now the Southbound database server doesn't need to
>> apply as many conditions as before when filtering individual monitor
>> contents.
>>
>> Note: Sub-ports - OVN switch ports with parent_port set - have to be
>> monitored unconditionally as we cannot efficiently determine their local
>> datapaths without including all local OVS interfaces in the monitor.
>>
>> [0]
> https://github.com/ovn-org/ovn-heater/blob/main/test-scenarios/ocp-500-density-heavy.yml
>>
>> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2139194
>> Reported-by: Ilya Maximets <i.maximets@ovn.org>
>> Signed-off-by: Dumitru Ceara <dceara@redhat.com>
>> ---
>> V2:
>> - Addressed Han's comments:
>>   - changed commit log: removed assumption about sub-ports.
>>   - added ovn-nb documentation note about sub-ports monitoring.
>>   - added ovn-controller documentation note about sub-ports monitoring.
>>   - added TODO item.
>> - Added NEWS item.
>> ---
>>  NEWS                            |  3 +++
>>  TODO.rst                        |  6 ++++++
>>  controller/binding.c            |  2 +-
>>  controller/binding.h            |  2 +-
>>  controller/ovn-controller.8.xml |  6 ++++++
>>  controller/ovn-controller.c     | 19 ++++++++++++++++---
>>  ovn-nb.xml                      |  6 +++++-
>>  7 files changed, 38 insertions(+), 6 deletions(-)
>>
>> diff --git a/NEWS b/NEWS
>> index 482970eeeb..8275877f99 100644
>> --- a/NEWS
>> +++ b/NEWS
>> @@ -7,6 +7,9 @@ Post v23.06.0
>>      DHCPv4 "hostname" (12) option.
>>    - Support to create/update MAC_Binding when GARP received from VTEP
> (RAMP)
>>      switch on l3dgw port.
>> +  - To allow optimizing ovn-controller's monitor conditions for the
> regular
>> +    VIF case, ovn-controller now unconditionally monitors all sub-ports
>> +    (ports with parent_port set).
>>
>>  OVN v23.06.0 - 01 Jun 2023
>>  --------------------------
>> diff --git a/TODO.rst b/TODO.rst
>> index dff163e70b..b790a9fadf 100644
>> --- a/TODO.rst
>> +++ b/TODO.rst
>> @@ -124,3 +124,9 @@ OVN To-do List
>>  * Load Balancer templates
>>
>>    * Support combining the VIP IP and port into a single template
> variable.
>> +
>> +* ovn-controller conditional monitoring
>> +
>> +  * Improve sub-ports (with parent_port set) conditional monitoring;
> these
>> +    are currently unconditionally monitored, even if ovn-monitor-all is
>> +    set to false.
>> diff --git a/controller/binding.c b/controller/binding.c
>> index 486095ca79..359ad6d660 100644
>> --- a/controller/binding.c
>> +++ b/controller/binding.c
>> @@ -785,7 +785,7 @@ local_binding_data_destroy(struct local_binding_data
> *lbinding_data)
>>  }
>>
>>  const struct sbrec_port_binding *
>> -local_binding_get_primary_pb(struct shash *local_bindings,
>> +local_binding_get_primary_pb(const struct shash *local_bindings,
>>                               const char *pb_name)
>>  {
>>      struct local_binding *lbinding =
>> diff --git a/controller/binding.h b/controller/binding.h
>> index 0e57f02ee5..319cbd263c 100644
>> --- a/controller/binding.h
>> +++ b/controller/binding.h
>> @@ -150,7 +150,7 @@ void local_binding_data_init(struct
> local_binding_data *);
>>  void local_binding_data_destroy(struct local_binding_data *);
>>
>>  const struct sbrec_port_binding *local_binding_get_primary_pb(
>> -    struct shash *local_bindings, const char *pb_name);
>> +    const struct shash *local_bindings, const char *pb_name);
>>  ofp_port_t local_binding_get_lport_ofport(const struct shash
> *local_bindings,
>>                                            const char *pb_name);
>>
>> diff --git a/controller/ovn-controller.8.xml
> b/controller/ovn-controller.8.xml
>> index f61f430084..0883d8da9b 100644
>> --- a/controller/ovn-controller.8.xml
>> +++ b/controller/ovn-controller.8.xml
>> @@ -128,6 +128,12 @@
>>            to <code>true</code> for environments that all workloads need
> to be
>>            reachable from each other.
>>          </p>
>> +        <p>
>> +          NOTE: for efficiency and scalability in common scenarios
>> +          <code>ovn-controller</code> unconditionally monitors all
> sub-ports
>> +          (ports with <code>parent_port</code> set) regardless of the
>> +          <code>ovn-monitor-all</code> value.
>> +        </p>
>>          <p>
>>            Default value is <var>false</var>.
>>          </p>
>> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
>> index a474069798..96d01219e1 100644
>> --- a/controller/ovn-controller.c
>> +++ b/controller/ovn-controller.c
>> @@ -199,6 +199,7 @@ static unsigned int
>>  update_sb_monitors(struct ovsdb_idl *ovnsb_idl,
>>                     const struct sbrec_chassis *chassis,
>>                     const struct sset *local_ifaces,
>> +                   const struct shash *local_bindings,
>>                     struct hmap *local_datapaths,
>>                     bool monitor_all)
>>  {
>> @@ -297,10 +298,21 @@ update_sb_monitors(struct ovsdb_idl *ovnsb_idl,
>>
>>      if (local_ifaces) {
>>          const char *name;
>> +
>> +        ovs_assert(local_bindings);
>>          SSET_FOR_EACH (name, local_ifaces) {
>> +            /* Skip the VIFs we bound already, we should have a local
> datapath
>> +             * for those. */
>> +            const struct sbrec_port_binding *local_pb
>> +                = local_binding_get_primary_pb(local_bindings, name);
>> +            if (local_pb && get_lport_type(local_pb) == LP_VIF) {
>> +                continue;
>> +            }
>>              sbrec_port_binding_add_clause_logical_port(&pb, OVSDB_F_EQ,
> name);
>> -            sbrec_port_binding_add_clause_parent_port(&pb, OVSDB_F_EQ,
> name);
>>          }
>> +        /* Monitor all sub-ports unconditionally; we don't expect a lot
> of
>> +         * them in the SB database. */
>> +        sbrec_port_binding_add_clause_parent_port(&pb, OVSDB_F_NE, NULL);
>>      }
>>      if (local_datapaths) {
>>          const struct local_datapath *ld;
>> @@ -609,7 +621,7 @@ update_sb_db(struct ovsdb_idl *ovs_idl, struct
> ovsdb_idl *ovnsb_idl,
>>           * extra cost. Instead, it is called after the engine execution
> only
>>           * when it is necessary. */
>>          unsigned int next_cond_seqno =
>> -            update_sb_monitors(ovnsb_idl, NULL, NULL, NULL, true);
>> +            update_sb_monitors(ovnsb_idl, NULL, NULL, NULL, NULL, true);
>>          if (sb_cond_seqno) {
>>              *sb_cond_seqno = next_cond_seqno;
>>          }
>> @@ -4712,7 +4724,7 @@ main(int argc, char *argv[])
>>      ovsdb_idl_omit(ovnsb_idl_loop.idl,
>>                     &sbrec_chassis_private_col_external_ids);
>>
>> -    update_sb_monitors(ovnsb_idl_loop.idl, NULL, NULL, NULL, false);
>> +    update_sb_monitors(ovnsb_idl_loop.idl, NULL, NULL, NULL, NULL,
> false);
>>
>>      stopwatch_create(CONTROLLER_LOOP_STOPWATCH_NAME, SW_MS);
>>      stopwatch_create(OFCTRL_PUT_STOPWATCH_NAME, SW_MS);
>> @@ -5355,6 +5367,7 @@ main(int argc, char *argv[])
>>                                  update_sb_monitors(
>>                                      ovnsb_idl_loop.idl, chassis,
>>                                      &runtime_data->local_lports,
>> +
>  &runtime_data->lbinding_data.bindings,
>>                                      &runtime_data->local_datapaths,
>>                                      sb_monitor_all);
>>                          }
>> diff --git a/ovn-nb.xml b/ovn-nb.xml
>> index cf9c92009d..1fb9fa10ee 100644
>> --- a/ovn-nb.xml
>> +++ b/ovn-nb.xml
>> @@ -1302,7 +1302,11 @@
>>        <column name="parent_name">
>>          The VM interface through which the nested container sends its
> network
>>          traffic.  This must match the <ref column="name"/> column for
> some
>> -        other <ref table="Logical_Switch_Port"/>.
>> +        other <ref table="Logical_Switch_Port"/>.  Note: for performance
>> +        reasons, unlike for regular VIFs, <code>ovn-controller</code>
> will
> 
> It may be better to say "for performance of OVN Southbound database
> conditional monitoring ...", otherwise the user might be confused.
> 
>> +        register to get updates about all OVN Southbound database
>> +        <ref db="OVN_Southbound" table="Port_Binding"/> table records
>> +        that correspond to nested container ports.
> 
> Better to add: "..., even if 'ovn-monitor-all' is set to false. See
> ovn-controlller (8)." or something similar.
> 
> Please feel free to adjust the words and merge.
> 
> Acked-by: Han Zhou <hzhou@ovn.org>
> 

Done, thanks again for the review!

Regards,
Dumitru
Han Zhou June 28, 2023, 2:54 a.m. UTC | #3
On Thu, Jun 22, 2023 at 1:12 AM Dumitru Ceara <dceara@redhat.com> wrote:
>
> On 6/21/23 16:50, Han Zhou wrote:
> > On Wed, Jun 21, 2023 at 2:48 AM Dumitru Ceara <dceara@redhat.com> wrote:
> >>
> >> We don't need to explicitly add port bindings that were already bound
> >> locally.  We implicitly get those because we monitor the datapaths
> >> they're attached to.
> >>
> >> When performing an ovn-heater 500-node density-heavy scale test [0],
with
> >> conditional monitoring enabled, the unreasonably long poll intervals on
> >> the Southbound database (the ones that took more than one second) are
> >> measured as:
> >>
> >>   ---------------------------------------------------------------------
> >>        Count      Min        Max       Median      Mean   95 percentile
> >>   ---------------------------------------------------------------------
> >>         56.0     1010.0     2664.0     1455.5     1544.9     2163.0
> >>         77.0     1015.0     3460.0     1896.0     1858.2     2907.8
> >>         69.0     1010.0     3118.0     1618.0     1688.1     2582.4
> >>   ---------------------------------------------------------------------
> >>        202.0     1010.0     3460.0     1610.0     1713.3     2711.5
> >>
> >> Compared to the baseline results (also with conditional monitoring
> >> enabled):
> >>   ---------------------------------------------------------------------
> >>        Count      Min        Max       Median      Mean   95 percentile
> >>   ---------------------------------------------------------------------
> >>        141.0     1003.0    18338.0     1721.0     2625.4     7626.0
> >>        151.0     1019.0    80297.0     1808.0     3410.7     9089.0
> >>        165.0     1007.0    50736.0     2354.0     3067.7     7309.8
> >>   ---------------------------------------------------------------------
> >>        457.0     1003.0    80297.0     2131.0     3044.6     7799.6
> >>
> >> We see significant improvement on the server side.  This is explained
> >> by the fact that now the Southbound database server doesn't need to
> >> apply as many conditions as before when filtering individual monitor
> >> contents.
> >>
> >> Note: Sub-ports - OVN switch ports with parent_port set - have to be
> >> monitored unconditionally as we cannot efficiently determine their
local
> >> datapaths without including all local OVS interfaces in the monitor.
> >>
> >> [0]
> >
https://github.com/ovn-org/ovn-heater/blob/main/test-scenarios/ocp-500-density-heavy.yml
> >>
> >> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2139194
> >> Reported-by: Ilya Maximets <i.maximets@ovn.org>
> >> Signed-off-by: Dumitru Ceara <dceara@redhat.com>
> >> ---
> >> V2:
> >> - Addressed Han's comments:
> >>   - changed commit log: removed assumption about sub-ports.
> >>   - added ovn-nb documentation note about sub-ports monitoring.
> >>   - added ovn-controller documentation note about sub-ports monitoring.
> >>   - added TODO item.
> >> - Added NEWS item.
> >> ---
> >>  NEWS                            |  3 +++
> >>  TODO.rst                        |  6 ++++++
> >>  controller/binding.c            |  2 +-
> >>  controller/binding.h            |  2 +-
> >>  controller/ovn-controller.8.xml |  6 ++++++
> >>  controller/ovn-controller.c     | 19 ++++++++++++++++---
> >>  ovn-nb.xml                      |  6 +++++-
> >>  7 files changed, 38 insertions(+), 6 deletions(-)
> >>
> >> diff --git a/NEWS b/NEWS
> >> index 482970eeeb..8275877f99 100644
> >> --- a/NEWS
> >> +++ b/NEWS
> >> @@ -7,6 +7,9 @@ Post v23.06.0
> >>      DHCPv4 "hostname" (12) option.
> >>    - Support to create/update MAC_Binding when GARP received from VTEP
> > (RAMP)
> >>      switch on l3dgw port.
> >> +  - To allow optimizing ovn-controller's monitor conditions for the
> > regular
> >> +    VIF case, ovn-controller now unconditionally monitors all
sub-ports
> >> +    (ports with parent_port set).
> >>
> >>  OVN v23.06.0 - 01 Jun 2023
> >>  --------------------------
> >> diff --git a/TODO.rst b/TODO.rst
> >> index dff163e70b..b790a9fadf 100644
> >> --- a/TODO.rst
> >> +++ b/TODO.rst
> >> @@ -124,3 +124,9 @@ OVN To-do List
> >>  * Load Balancer templates
> >>
> >>    * Support combining the VIP IP and port into a single template
> > variable.
> >> +
> >> +* ovn-controller conditional monitoring
> >> +
> >> +  * Improve sub-ports (with parent_port set) conditional monitoring;
> > these
> >> +    are currently unconditionally monitored, even if ovn-monitor-all
is
> >> +    set to false.
> >> diff --git a/controller/binding.c b/controller/binding.c
> >> index 486095ca79..359ad6d660 100644
> >> --- a/controller/binding.c
> >> +++ b/controller/binding.c
> >> @@ -785,7 +785,7 @@ local_binding_data_destroy(struct
local_binding_data
> > *lbinding_data)
> >>  }
> >>
> >>  const struct sbrec_port_binding *
> >> -local_binding_get_primary_pb(struct shash *local_bindings,
> >> +local_binding_get_primary_pb(const struct shash *local_bindings,
> >>                               const char *pb_name)
> >>  {
> >>      struct local_binding *lbinding =
> >> diff --git a/controller/binding.h b/controller/binding.h
> >> index 0e57f02ee5..319cbd263c 100644
> >> --- a/controller/binding.h
> >> +++ b/controller/binding.h
> >> @@ -150,7 +150,7 @@ void local_binding_data_init(struct
> > local_binding_data *);
> >>  void local_binding_data_destroy(struct local_binding_data *);
> >>
> >>  const struct sbrec_port_binding *local_binding_get_primary_pb(
> >> -    struct shash *local_bindings, const char *pb_name);
> >> +    const struct shash *local_bindings, const char *pb_name);
> >>  ofp_port_t local_binding_get_lport_ofport(const struct shash
> > *local_bindings,
> >>                                            const char *pb_name);
> >>
> >> diff --git a/controller/ovn-controller.8.xml
> > b/controller/ovn-controller.8.xml
> >> index f61f430084..0883d8da9b 100644
> >> --- a/controller/ovn-controller.8.xml
> >> +++ b/controller/ovn-controller.8.xml
> >> @@ -128,6 +128,12 @@
> >>            to <code>true</code> for environments that all workloads
need
> > to be
> >>            reachable from each other.
> >>          </p>
> >> +        <p>
> >> +          NOTE: for efficiency and scalability in common scenarios
> >> +          <code>ovn-controller</code> unconditionally monitors all
> > sub-ports
> >> +          (ports with <code>parent_port</code> set) regardless of the
> >> +          <code>ovn-monitor-all</code> value.
> >> +        </p>
> >>          <p>
> >>            Default value is <var>false</var>.
> >>          </p>
> >> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
> >> index a474069798..96d01219e1 100644
> >> --- a/controller/ovn-controller.c
> >> +++ b/controller/ovn-controller.c
> >> @@ -199,6 +199,7 @@ static unsigned int
> >>  update_sb_monitors(struct ovsdb_idl *ovnsb_idl,
> >>                     const struct sbrec_chassis *chassis,
> >>                     const struct sset *local_ifaces,
> >> +                   const struct shash *local_bindings,
> >>                     struct hmap *local_datapaths,
> >>                     bool monitor_all)
> >>  {
> >> @@ -297,10 +298,21 @@ update_sb_monitors(struct ovsdb_idl *ovnsb_idl,
> >>
> >>      if (local_ifaces) {
> >>          const char *name;
> >> +
> >> +        ovs_assert(local_bindings);
> >>          SSET_FOR_EACH (name, local_ifaces) {
> >> +            /* Skip the VIFs we bound already, we should have a local
> > datapath
> >> +             * for those. */
> >> +            const struct sbrec_port_binding *local_pb
> >> +                = local_binding_get_primary_pb(local_bindings, name);
> >> +            if (local_pb && get_lport_type(local_pb) == LP_VIF) {
> >> +                continue;
> >> +            }
> >>              sbrec_port_binding_add_clause_logical_port(&pb,
OVSDB_F_EQ,
> > name);
> >> -            sbrec_port_binding_add_clause_parent_port(&pb, OVSDB_F_EQ,
> > name);
> >>          }
> >> +        /* Monitor all sub-ports unconditionally; we don't expect a
lot
> > of
> >> +         * them in the SB database. */
> >> +        sbrec_port_binding_add_clause_parent_port(&pb, OVSDB_F_NE,
NULL);
> >>      }
> >>      if (local_datapaths) {
> >>          const struct local_datapath *ld;
> >> @@ -609,7 +621,7 @@ update_sb_db(struct ovsdb_idl *ovs_idl, struct
> > ovsdb_idl *ovnsb_idl,
> >>           * extra cost. Instead, it is called after the engine
execution
> > only
> >>           * when it is necessary. */
> >>          unsigned int next_cond_seqno =
> >> -            update_sb_monitors(ovnsb_idl, NULL, NULL, NULL, true);
> >> +            update_sb_monitors(ovnsb_idl, NULL, NULL, NULL, NULL,
true);
> >>          if (sb_cond_seqno) {
> >>              *sb_cond_seqno = next_cond_seqno;
> >>          }
> >> @@ -4712,7 +4724,7 @@ main(int argc, char *argv[])
> >>      ovsdb_idl_omit(ovnsb_idl_loop.idl,
> >>                     &sbrec_chassis_private_col_external_ids);
> >>
> >> -    update_sb_monitors(ovnsb_idl_loop.idl, NULL, NULL, NULL, false);
> >> +    update_sb_monitors(ovnsb_idl_loop.idl, NULL, NULL, NULL, NULL,
> > false);
> >>
> >>      stopwatch_create(CONTROLLER_LOOP_STOPWATCH_NAME, SW_MS);
> >>      stopwatch_create(OFCTRL_PUT_STOPWATCH_NAME, SW_MS);
> >> @@ -5355,6 +5367,7 @@ main(int argc, char *argv[])
> >>                                  update_sb_monitors(
> >>                                      ovnsb_idl_loop.idl, chassis,
> >>                                      &runtime_data->local_lports,
> >> +
> >  &runtime_data->lbinding_data.bindings,
> >>                                      &runtime_data->local_datapaths,
> >>                                      sb_monitor_all);
> >>                          }
> >> diff --git a/ovn-nb.xml b/ovn-nb.xml
> >> index cf9c92009d..1fb9fa10ee 100644
> >> --- a/ovn-nb.xml
> >> +++ b/ovn-nb.xml
> >> @@ -1302,7 +1302,11 @@
> >>        <column name="parent_name">
> >>          The VM interface through which the nested container sends its
> > network
> >>          traffic.  This must match the <ref column="name"/> column for
> > some
> >> -        other <ref table="Logical_Switch_Port"/>.
> >> +        other <ref table="Logical_Switch_Port"/>.  Note: for
performance
> >> +        reasons, unlike for regular VIFs, <code>ovn-controller</code>
> > will
> >
> > It may be better to say "for performance of OVN Southbound database
> > conditional monitoring ...", otherwise the user might be confused.
> >
> >> +        register to get updates about all OVN Southbound database
> >> +        <ref db="OVN_Southbound" table="Port_Binding"/> table records
> >> +        that correspond to nested container ports.
> >
> > Better to add: "..., even if 'ovn-monitor-all' is set to false. See
> > ovn-controlller (8)." or something similar.
> >
> > Please feel free to adjust the words and merge.
> >
> > Acked-by: Han Zhou <hzhou@ovn.org>
> >
>
> Done, thanks again for the review!
>
> Regards,
> Dumitru
>

Hi Dumitru,

I noticed that with this patch, the ovn-controller port-binding latency
increased a lot. With the simulated k8s scale test [0] of 500 chassis x 50
LSPs, the latency of a single LSP add/delete is almost tripled:

Command:
ovn-nbctl --wait=hv --print-wait-time lsp-add ls_1_0001 lsp_1_0001_01 --
lsp-set-addresses lsp_1_0001_01 "ff:f1:bb:00:01:01 1.0.1.101" --
lsp-set-port-security lsp_1_0001_01 "ff:f1:bb:00:01:01 1.0.1.101"

Before:
Time spent on processing nb_cfg 3:

        ovn-northd delay before processing:     2ms

        ovn-northd completion:                  42ms


        ovn-controller(s) completion:           68ms

(68 - 42 = 26ms)

After:
Time spent on processing nb_cfg 3:




        ovn-northd delay before processing:     1ms


        ovn-northd completion:                  38ms




        ovn-controller(s) completion:           112m

(112 - 38 = 74ms)

I think it may be because each port bind/unbind now introduces a SB DB
monitor condition change, which may have caused some delay. But I didn't
expect the impact to be that high because this test has only one single
ovn-controller (all the other 499 chassis are fake). I haven't spent time
to dig deeper yet.
Even though, it is still <100 ms, so I am not too worried about it. But
just to share the observation and raise the awareness.

[0] https://github.com/hzhou8/ovn-test-script

Thanks,
Han
Dumitru Ceara June 28, 2023, 8:30 a.m. UTC | #4
On 6/28/23 04:54, Han Zhou wrote:
> On Thu, Jun 22, 2023 at 1:12 AM Dumitru Ceara <dceara@redhat.com> wrote:
>>
>> On 6/21/23 16:50, Han Zhou wrote:
>>> On Wed, Jun 21, 2023 at 2:48 AM Dumitru Ceara <dceara@redhat.com> wrote:
>>>>
>>>> We don't need to explicitly add port bindings that were already bound
>>>> locally.  We implicitly get those because we monitor the datapaths
>>>> they're attached to.
>>>>
>>>> When performing an ovn-heater 500-node density-heavy scale test [0],
> with
>>>> conditional monitoring enabled, the unreasonably long poll intervals on
>>>> the Southbound database (the ones that took more than one second) are
>>>> measured as:
>>>>
>>>>   ---------------------------------------------------------------------
>>>>        Count      Min        Max       Median      Mean   95 percentile
>>>>   ---------------------------------------------------------------------
>>>>         56.0     1010.0     2664.0     1455.5     1544.9     2163.0
>>>>         77.0     1015.0     3460.0     1896.0     1858.2     2907.8
>>>>         69.0     1010.0     3118.0     1618.0     1688.1     2582.4
>>>>   ---------------------------------------------------------------------
>>>>        202.0     1010.0     3460.0     1610.0     1713.3     2711.5
>>>>
>>>> Compared to the baseline results (also with conditional monitoring
>>>> enabled):
>>>>   ---------------------------------------------------------------------
>>>>        Count      Min        Max       Median      Mean   95 percentile
>>>>   ---------------------------------------------------------------------
>>>>        141.0     1003.0    18338.0     1721.0     2625.4     7626.0
>>>>        151.0     1019.0    80297.0     1808.0     3410.7     9089.0
>>>>        165.0     1007.0    50736.0     2354.0     3067.7     7309.8
>>>>   ---------------------------------------------------------------------
>>>>        457.0     1003.0    80297.0     2131.0     3044.6     7799.6
>>>>
>>>> We see significant improvement on the server side.  This is explained
>>>> by the fact that now the Southbound database server doesn't need to
>>>> apply as many conditions as before when filtering individual monitor
>>>> contents.
>>>>
>>>> Note: Sub-ports - OVN switch ports with parent_port set - have to be
>>>> monitored unconditionally as we cannot efficiently determine their
> local
>>>> datapaths without including all local OVS interfaces in the monitor.
>>>>
>>>> [0]
>>>
> https://github.com/ovn-org/ovn-heater/blob/main/test-scenarios/ocp-500-density-heavy.yml
>>>>
>>>> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2139194
>>>> Reported-by: Ilya Maximets <i.maximets@ovn.org>
>>>> Signed-off-by: Dumitru Ceara <dceara@redhat.com>
>>>> ---
>>>> V2:
>>>> - Addressed Han's comments:
>>>>   - changed commit log: removed assumption about sub-ports.
>>>>   - added ovn-nb documentation note about sub-ports monitoring.
>>>>   - added ovn-controller documentation note about sub-ports monitoring.
>>>>   - added TODO item.
>>>> - Added NEWS item.
>>>> ---
>>>>  NEWS                            |  3 +++
>>>>  TODO.rst                        |  6 ++++++
>>>>  controller/binding.c            |  2 +-
>>>>  controller/binding.h            |  2 +-
>>>>  controller/ovn-controller.8.xml |  6 ++++++
>>>>  controller/ovn-controller.c     | 19 ++++++++++++++++---
>>>>  ovn-nb.xml                      |  6 +++++-
>>>>  7 files changed, 38 insertions(+), 6 deletions(-)
>>>>
>>>> diff --git a/NEWS b/NEWS
>>>> index 482970eeeb..8275877f99 100644
>>>> --- a/NEWS
>>>> +++ b/NEWS
>>>> @@ -7,6 +7,9 @@ Post v23.06.0
>>>>      DHCPv4 "hostname" (12) option.
>>>>    - Support to create/update MAC_Binding when GARP received from VTEP
>>> (RAMP)
>>>>      switch on l3dgw port.
>>>> +  - To allow optimizing ovn-controller's monitor conditions for the
>>> regular
>>>> +    VIF case, ovn-controller now unconditionally monitors all
> sub-ports
>>>> +    (ports with parent_port set).
>>>>
>>>>  OVN v23.06.0 - 01 Jun 2023
>>>>  --------------------------
>>>> diff --git a/TODO.rst b/TODO.rst
>>>> index dff163e70b..b790a9fadf 100644
>>>> --- a/TODO.rst
>>>> +++ b/TODO.rst
>>>> @@ -124,3 +124,9 @@ OVN To-do List
>>>>  * Load Balancer templates
>>>>
>>>>    * Support combining the VIP IP and port into a single template
>>> variable.
>>>> +
>>>> +* ovn-controller conditional monitoring
>>>> +
>>>> +  * Improve sub-ports (with parent_port set) conditional monitoring;
>>> these
>>>> +    are currently unconditionally monitored, even if ovn-monitor-all
> is
>>>> +    set to false.
>>>> diff --git a/controller/binding.c b/controller/binding.c
>>>> index 486095ca79..359ad6d660 100644
>>>> --- a/controller/binding.c
>>>> +++ b/controller/binding.c
>>>> @@ -785,7 +785,7 @@ local_binding_data_destroy(struct
> local_binding_data
>>> *lbinding_data)
>>>>  }
>>>>
>>>>  const struct sbrec_port_binding *
>>>> -local_binding_get_primary_pb(struct shash *local_bindings,
>>>> +local_binding_get_primary_pb(const struct shash *local_bindings,
>>>>                               const char *pb_name)
>>>>  {
>>>>      struct local_binding *lbinding =
>>>> diff --git a/controller/binding.h b/controller/binding.h
>>>> index 0e57f02ee5..319cbd263c 100644
>>>> --- a/controller/binding.h
>>>> +++ b/controller/binding.h
>>>> @@ -150,7 +150,7 @@ void local_binding_data_init(struct
>>> local_binding_data *);
>>>>  void local_binding_data_destroy(struct local_binding_data *);
>>>>
>>>>  const struct sbrec_port_binding *local_binding_get_primary_pb(
>>>> -    struct shash *local_bindings, const char *pb_name);
>>>> +    const struct shash *local_bindings, const char *pb_name);
>>>>  ofp_port_t local_binding_get_lport_ofport(const struct shash
>>> *local_bindings,
>>>>                                            const char *pb_name);
>>>>
>>>> diff --git a/controller/ovn-controller.8.xml
>>> b/controller/ovn-controller.8.xml
>>>> index f61f430084..0883d8da9b 100644
>>>> --- a/controller/ovn-controller.8.xml
>>>> +++ b/controller/ovn-controller.8.xml
>>>> @@ -128,6 +128,12 @@
>>>>            to <code>true</code> for environments that all workloads
> need
>>> to be
>>>>            reachable from each other.
>>>>          </p>
>>>> +        <p>
>>>> +          NOTE: for efficiency and scalability in common scenarios
>>>> +          <code>ovn-controller</code> unconditionally monitors all
>>> sub-ports
>>>> +          (ports with <code>parent_port</code> set) regardless of the
>>>> +          <code>ovn-monitor-all</code> value.
>>>> +        </p>
>>>>          <p>
>>>>            Default value is <var>false</var>.
>>>>          </p>
>>>> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
>>>> index a474069798..96d01219e1 100644
>>>> --- a/controller/ovn-controller.c
>>>> +++ b/controller/ovn-controller.c
>>>> @@ -199,6 +199,7 @@ static unsigned int
>>>>  update_sb_monitors(struct ovsdb_idl *ovnsb_idl,
>>>>                     const struct sbrec_chassis *chassis,
>>>>                     const struct sset *local_ifaces,
>>>> +                   const struct shash *local_bindings,
>>>>                     struct hmap *local_datapaths,
>>>>                     bool monitor_all)
>>>>  {
>>>> @@ -297,10 +298,21 @@ update_sb_monitors(struct ovsdb_idl *ovnsb_idl,
>>>>
>>>>      if (local_ifaces) {
>>>>          const char *name;
>>>> +
>>>> +        ovs_assert(local_bindings);
>>>>          SSET_FOR_EACH (name, local_ifaces) {
>>>> +            /* Skip the VIFs we bound already, we should have a local
>>> datapath
>>>> +             * for those. */
>>>> +            const struct sbrec_port_binding *local_pb
>>>> +                = local_binding_get_primary_pb(local_bindings, name);
>>>> +            if (local_pb && get_lport_type(local_pb) == LP_VIF) {
>>>> +                continue;
>>>> +            }
>>>>              sbrec_port_binding_add_clause_logical_port(&pb,
> OVSDB_F_EQ,
>>> name);
>>>> -            sbrec_port_binding_add_clause_parent_port(&pb, OVSDB_F_EQ,
>>> name);
>>>>          }
>>>> +        /* Monitor all sub-ports unconditionally; we don't expect a
> lot
>>> of
>>>> +         * them in the SB database. */
>>>> +        sbrec_port_binding_add_clause_parent_port(&pb, OVSDB_F_NE,
> NULL);
>>>>      }
>>>>      if (local_datapaths) {
>>>>          const struct local_datapath *ld;
>>>> @@ -609,7 +621,7 @@ update_sb_db(struct ovsdb_idl *ovs_idl, struct
>>> ovsdb_idl *ovnsb_idl,
>>>>           * extra cost. Instead, it is called after the engine
> execution
>>> only
>>>>           * when it is necessary. */
>>>>          unsigned int next_cond_seqno =
>>>> -            update_sb_monitors(ovnsb_idl, NULL, NULL, NULL, true);
>>>> +            update_sb_monitors(ovnsb_idl, NULL, NULL, NULL, NULL,
> true);
>>>>          if (sb_cond_seqno) {
>>>>              *sb_cond_seqno = next_cond_seqno;
>>>>          }
>>>> @@ -4712,7 +4724,7 @@ main(int argc, char *argv[])
>>>>      ovsdb_idl_omit(ovnsb_idl_loop.idl,
>>>>                     &sbrec_chassis_private_col_external_ids);
>>>>
>>>> -    update_sb_monitors(ovnsb_idl_loop.idl, NULL, NULL, NULL, false);
>>>> +    update_sb_monitors(ovnsb_idl_loop.idl, NULL, NULL, NULL, NULL,
>>> false);
>>>>
>>>>      stopwatch_create(CONTROLLER_LOOP_STOPWATCH_NAME, SW_MS);
>>>>      stopwatch_create(OFCTRL_PUT_STOPWATCH_NAME, SW_MS);
>>>> @@ -5355,6 +5367,7 @@ main(int argc, char *argv[])
>>>>                                  update_sb_monitors(
>>>>                                      ovnsb_idl_loop.idl, chassis,
>>>>                                      &runtime_data->local_lports,
>>>> +
>>>  &runtime_data->lbinding_data.bindings,
>>>>                                      &runtime_data->local_datapaths,
>>>>                                      sb_monitor_all);
>>>>                          }
>>>> diff --git a/ovn-nb.xml b/ovn-nb.xml
>>>> index cf9c92009d..1fb9fa10ee 100644
>>>> --- a/ovn-nb.xml
>>>> +++ b/ovn-nb.xml
>>>> @@ -1302,7 +1302,11 @@
>>>>        <column name="parent_name">
>>>>          The VM interface through which the nested container sends its
>>> network
>>>>          traffic.  This must match the <ref column="name"/> column for
>>> some
>>>> -        other <ref table="Logical_Switch_Port"/>.
>>>> +        other <ref table="Logical_Switch_Port"/>.  Note: for
> performance
>>>> +        reasons, unlike for regular VIFs, <code>ovn-controller</code>
>>> will
>>>
>>> It may be better to say "for performance of OVN Southbound database
>>> conditional monitoring ...", otherwise the user might be confused.
>>>
>>>> +        register to get updates about all OVN Southbound database
>>>> +        <ref db="OVN_Southbound" table="Port_Binding"/> table records
>>>> +        that correspond to nested container ports.
>>>
>>> Better to add: "..., even if 'ovn-monitor-all' is set to false. See
>>> ovn-controlller (8)." or something similar.
>>>
>>> Please feel free to adjust the words and merge.
>>>
>>> Acked-by: Han Zhou <hzhou@ovn.org>
>>>
>>
>> Done, thanks again for the review!
>>
>> Regards,
>> Dumitru
>>
> 
> Hi Dumitru,
> 

Hi Han,

> I noticed that with this patch, the ovn-controller port-binding latency
> increased a lot. With the simulated k8s scale test [0] of 500 chassis x 50
> LSPs, the latency of a single LSP add/delete is almost tripled:
> 
> Command:
> ovn-nbctl --wait=hv --print-wait-time lsp-add ls_1_0001 lsp_1_0001_01 --
> lsp-set-addresses lsp_1_0001_01 "ff:f1:bb:00:01:01 1.0.1.101" --
> lsp-set-port-security lsp_1_0001_01 "ff:f1:bb:00:01:01 1.0.1.101"
> 
> Before:
> Time spent on processing nb_cfg 3:
> 
>         ovn-northd delay before processing:     2ms
> 
>         ovn-northd completion:                  42ms
> 
> 
>         ovn-controller(s) completion:           68ms
> 
> (68 - 42 = 26ms)
> 
> After:
> Time spent on processing nb_cfg 3:
> 
> 
> 
> 
>         ovn-northd delay before processing:     1ms
> 
> 
>         ovn-northd completion:                  38ms
> 
> 
> 
> 
>         ovn-controller(s) completion:           112m
> 
> (112 - 38 = 74ms)
> 
> I think it may be because each port bind/unbind now introduces a SB DB
> monitor condition change, which may have caused some delay. But I didn't
> expect the impact to be that high because this test has only one single
> ovn-controller (all the other 499 chassis are fake). I haven't spent time
> to dig deeper yet.

The monitor condition change should happen only for the first port of a
new logical switch.  From that point on the datapath is "local" to
ovn-controller and all new port additions to that logical switch should
not cause monitor conditions to change.

If there are monitor condition changes for *every* port bind/unbind (if
the set of local datapaths doesn't change) that means we have a bug.

> Even though, it is still <100 ms, so I am not too worried about it. But
> just to share the observation and raise the awareness.
> 
> [0] https://github.com/hzhou8/ovn-test-script
> 
> Thanks,
> Han
> 

Regards,
Dumitru
Ilya Maximets June 28, 2023, 10:20 a.m. UTC | #5
On 6/28/23 10:30, Dumitru Ceara wrote:
> On 6/28/23 04:54, Han Zhou wrote:
>> On Thu, Jun 22, 2023 at 1:12 AM Dumitru Ceara <dceara@redhat.com> wrote:
>>>
>>> On 6/21/23 16:50, Han Zhou wrote:
>>>> On Wed, Jun 21, 2023 at 2:48 AM Dumitru Ceara <dceara@redhat.com> wrote:
>>>>>
>>>>> We don't need to explicitly add port bindings that were already bound
>>>>> locally.  We implicitly get those because we monitor the datapaths
>>>>> they're attached to.
>>>>>
>>>>> When performing an ovn-heater 500-node density-heavy scale test [0],
>> with
>>>>> conditional monitoring enabled, the unreasonably long poll intervals on
>>>>> the Southbound database (the ones that took more than one second) are
>>>>> measured as:
>>>>>
>>>>>   ---------------------------------------------------------------------
>>>>>        Count      Min        Max       Median      Mean   95 percentile
>>>>>   ---------------------------------------------------------------------
>>>>>         56.0     1010.0     2664.0     1455.5     1544.9     2163.0
>>>>>         77.0     1015.0     3460.0     1896.0     1858.2     2907.8
>>>>>         69.0     1010.0     3118.0     1618.0     1688.1     2582.4
>>>>>   ---------------------------------------------------------------------
>>>>>        202.0     1010.0     3460.0     1610.0     1713.3     2711.5
>>>>>
>>>>> Compared to the baseline results (also with conditional monitoring
>>>>> enabled):
>>>>>   ---------------------------------------------------------------------
>>>>>        Count      Min        Max       Median      Mean   95 percentile
>>>>>   ---------------------------------------------------------------------
>>>>>        141.0     1003.0    18338.0     1721.0     2625.4     7626.0
>>>>>        151.0     1019.0    80297.0     1808.0     3410.7     9089.0
>>>>>        165.0     1007.0    50736.0     2354.0     3067.7     7309.8
>>>>>   ---------------------------------------------------------------------
>>>>>        457.0     1003.0    80297.0     2131.0     3044.6     7799.6
>>>>>
>>>>> We see significant improvement on the server side.  This is explained
>>>>> by the fact that now the Southbound database server doesn't need to
>>>>> apply as many conditions as before when filtering individual monitor
>>>>> contents.
>>>>>
>>>>> Note: Sub-ports - OVN switch ports with parent_port set - have to be
>>>>> monitored unconditionally as we cannot efficiently determine their
>> local
>>>>> datapaths without including all local OVS interfaces in the monitor.
>>>>>
>>>>> [0]
>>>>
>> https://github.com/ovn-org/ovn-heater/blob/main/test-scenarios/ocp-500-density-heavy.yml
>>>>>
>>>>> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2139194
>>>>> Reported-by: Ilya Maximets <i.maximets@ovn.org>
>>>>> Signed-off-by: Dumitru Ceara <dceara@redhat.com>
>>>>> ---
>>>>> V2:
>>>>> - Addressed Han's comments:
>>>>>   - changed commit log: removed assumption about sub-ports.
>>>>>   - added ovn-nb documentation note about sub-ports monitoring.
>>>>>   - added ovn-controller documentation note about sub-ports monitoring.
>>>>>   - added TODO item.
>>>>> - Added NEWS item.
>>>>> ---
>>>>>  NEWS                            |  3 +++
>>>>>  TODO.rst                        |  6 ++++++
>>>>>  controller/binding.c            |  2 +-
>>>>>  controller/binding.h            |  2 +-
>>>>>  controller/ovn-controller.8.xml |  6 ++++++
>>>>>  controller/ovn-controller.c     | 19 ++++++++++++++++---
>>>>>  ovn-nb.xml                      |  6 +++++-
>>>>>  7 files changed, 38 insertions(+), 6 deletions(-)
>>>>>
>>>>> diff --git a/NEWS b/NEWS
>>>>> index 482970eeeb..8275877f99 100644
>>>>> --- a/NEWS
>>>>> +++ b/NEWS
>>>>> @@ -7,6 +7,9 @@ Post v23.06.0
>>>>>      DHCPv4 "hostname" (12) option.
>>>>>    - Support to create/update MAC_Binding when GARP received from VTEP
>>>> (RAMP)
>>>>>      switch on l3dgw port.
>>>>> +  - To allow optimizing ovn-controller's monitor conditions for the
>>>> regular
>>>>> +    VIF case, ovn-controller now unconditionally monitors all
>> sub-ports
>>>>> +    (ports with parent_port set).
>>>>>
>>>>>  OVN v23.06.0 - 01 Jun 2023
>>>>>  --------------------------
>>>>> diff --git a/TODO.rst b/TODO.rst
>>>>> index dff163e70b..b790a9fadf 100644
>>>>> --- a/TODO.rst
>>>>> +++ b/TODO.rst
>>>>> @@ -124,3 +124,9 @@ OVN To-do List
>>>>>  * Load Balancer templates
>>>>>
>>>>>    * Support combining the VIP IP and port into a single template
>>>> variable.
>>>>> +
>>>>> +* ovn-controller conditional monitoring
>>>>> +
>>>>> +  * Improve sub-ports (with parent_port set) conditional monitoring;
>>>> these
>>>>> +    are currently unconditionally monitored, even if ovn-monitor-all
>> is
>>>>> +    set to false.
>>>>> diff --git a/controller/binding.c b/controller/binding.c
>>>>> index 486095ca79..359ad6d660 100644
>>>>> --- a/controller/binding.c
>>>>> +++ b/controller/binding.c
>>>>> @@ -785,7 +785,7 @@ local_binding_data_destroy(struct
>> local_binding_data
>>>> *lbinding_data)
>>>>>  }
>>>>>
>>>>>  const struct sbrec_port_binding *
>>>>> -local_binding_get_primary_pb(struct shash *local_bindings,
>>>>> +local_binding_get_primary_pb(const struct shash *local_bindings,
>>>>>                               const char *pb_name)
>>>>>  {
>>>>>      struct local_binding *lbinding =
>>>>> diff --git a/controller/binding.h b/controller/binding.h
>>>>> index 0e57f02ee5..319cbd263c 100644
>>>>> --- a/controller/binding.h
>>>>> +++ b/controller/binding.h
>>>>> @@ -150,7 +150,7 @@ void local_binding_data_init(struct
>>>> local_binding_data *);
>>>>>  void local_binding_data_destroy(struct local_binding_data *);
>>>>>
>>>>>  const struct sbrec_port_binding *local_binding_get_primary_pb(
>>>>> -    struct shash *local_bindings, const char *pb_name);
>>>>> +    const struct shash *local_bindings, const char *pb_name);
>>>>>  ofp_port_t local_binding_get_lport_ofport(const struct shash
>>>> *local_bindings,
>>>>>                                            const char *pb_name);
>>>>>
>>>>> diff --git a/controller/ovn-controller.8.xml
>>>> b/controller/ovn-controller.8.xml
>>>>> index f61f430084..0883d8da9b 100644
>>>>> --- a/controller/ovn-controller.8.xml
>>>>> +++ b/controller/ovn-controller.8.xml
>>>>> @@ -128,6 +128,12 @@
>>>>>            to <code>true</code> for environments that all workloads
>> need
>>>> to be
>>>>>            reachable from each other.
>>>>>          </p>
>>>>> +        <p>
>>>>> +          NOTE: for efficiency and scalability in common scenarios
>>>>> +          <code>ovn-controller</code> unconditionally monitors all
>>>> sub-ports
>>>>> +          (ports with <code>parent_port</code> set) regardless of the
>>>>> +          <code>ovn-monitor-all</code> value.
>>>>> +        </p>
>>>>>          <p>
>>>>>            Default value is <var>false</var>.
>>>>>          </p>
>>>>> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
>>>>> index a474069798..96d01219e1 100644
>>>>> --- a/controller/ovn-controller.c
>>>>> +++ b/controller/ovn-controller.c
>>>>> @@ -199,6 +199,7 @@ static unsigned int
>>>>>  update_sb_monitors(struct ovsdb_idl *ovnsb_idl,
>>>>>                     const struct sbrec_chassis *chassis,
>>>>>                     const struct sset *local_ifaces,
>>>>> +                   const struct shash *local_bindings,
>>>>>                     struct hmap *local_datapaths,
>>>>>                     bool monitor_all)
>>>>>  {
>>>>> @@ -297,10 +298,21 @@ update_sb_monitors(struct ovsdb_idl *ovnsb_idl,
>>>>>
>>>>>      if (local_ifaces) {
>>>>>          const char *name;
>>>>> +
>>>>> +        ovs_assert(local_bindings);
>>>>>          SSET_FOR_EACH (name, local_ifaces) {
>>>>> +            /* Skip the VIFs we bound already, we should have a local
>>>> datapath
>>>>> +             * for those. */
>>>>> +            const struct sbrec_port_binding *local_pb
>>>>> +                = local_binding_get_primary_pb(local_bindings, name);
>>>>> +            if (local_pb && get_lport_type(local_pb) == LP_VIF) {
>>>>> +                continue;
>>>>> +            }
>>>>>              sbrec_port_binding_add_clause_logical_port(&pb,
>> OVSDB_F_EQ,
>>>> name);
>>>>> -            sbrec_port_binding_add_clause_parent_port(&pb, OVSDB_F_EQ,
>>>> name);
>>>>>          }
>>>>> +        /* Monitor all sub-ports unconditionally; we don't expect a
>> lot
>>>> of
>>>>> +         * them in the SB database. */
>>>>> +        sbrec_port_binding_add_clause_parent_port(&pb, OVSDB_F_NE,
>> NULL);
>>>>>      }
>>>>>      if (local_datapaths) {
>>>>>          const struct local_datapath *ld;
>>>>> @@ -609,7 +621,7 @@ update_sb_db(struct ovsdb_idl *ovs_idl, struct
>>>> ovsdb_idl *ovnsb_idl,
>>>>>           * extra cost. Instead, it is called after the engine
>> execution
>>>> only
>>>>>           * when it is necessary. */
>>>>>          unsigned int next_cond_seqno =
>>>>> -            update_sb_monitors(ovnsb_idl, NULL, NULL, NULL, true);
>>>>> +            update_sb_monitors(ovnsb_idl, NULL, NULL, NULL, NULL,
>> true);
>>>>>          if (sb_cond_seqno) {
>>>>>              *sb_cond_seqno = next_cond_seqno;
>>>>>          }
>>>>> @@ -4712,7 +4724,7 @@ main(int argc, char *argv[])
>>>>>      ovsdb_idl_omit(ovnsb_idl_loop.idl,
>>>>>                     &sbrec_chassis_private_col_external_ids);
>>>>>
>>>>> -    update_sb_monitors(ovnsb_idl_loop.idl, NULL, NULL, NULL, false);
>>>>> +    update_sb_monitors(ovnsb_idl_loop.idl, NULL, NULL, NULL, NULL,
>>>> false);
>>>>>
>>>>>      stopwatch_create(CONTROLLER_LOOP_STOPWATCH_NAME, SW_MS);
>>>>>      stopwatch_create(OFCTRL_PUT_STOPWATCH_NAME, SW_MS);
>>>>> @@ -5355,6 +5367,7 @@ main(int argc, char *argv[])
>>>>>                                  update_sb_monitors(
>>>>>                                      ovnsb_idl_loop.idl, chassis,
>>>>>                                      &runtime_data->local_lports,
>>>>> +
>>>>  &runtime_data->lbinding_data.bindings,
>>>>>                                      &runtime_data->local_datapaths,
>>>>>                                      sb_monitor_all);
>>>>>                          }
>>>>> diff --git a/ovn-nb.xml b/ovn-nb.xml
>>>>> index cf9c92009d..1fb9fa10ee 100644
>>>>> --- a/ovn-nb.xml
>>>>> +++ b/ovn-nb.xml
>>>>> @@ -1302,7 +1302,11 @@
>>>>>        <column name="parent_name">
>>>>>          The VM interface through which the nested container sends its
>>>> network
>>>>>          traffic.  This must match the <ref column="name"/> column for
>>>> some
>>>>> -        other <ref table="Logical_Switch_Port"/>.
>>>>> +        other <ref table="Logical_Switch_Port"/>.  Note: for
>> performance
>>>>> +        reasons, unlike for regular VIFs, <code>ovn-controller</code>
>>>> will
>>>>
>>>> It may be better to say "for performance of OVN Southbound database
>>>> conditional monitoring ...", otherwise the user might be confused.
>>>>
>>>>> +        register to get updates about all OVN Southbound database
>>>>> +        <ref db="OVN_Southbound" table="Port_Binding"/> table records
>>>>> +        that correspond to nested container ports.
>>>>
>>>> Better to add: "..., even if 'ovn-monitor-all' is set to false. See
>>>> ovn-controlller (8)." or something similar.
>>>>
>>>> Please feel free to adjust the words and merge.
>>>>
>>>> Acked-by: Han Zhou <hzhou@ovn.org>
>>>>
>>>
>>> Done, thanks again for the review!
>>>
>>> Regards,
>>> Dumitru
>>>
>>
>> Hi Dumitru,
>>
> 
> Hi Han,
> 
>> I noticed that with this patch, the ovn-controller port-binding latency
>> increased a lot. With the simulated k8s scale test [0] of 500 chassis x 50
>> LSPs, the latency of a single LSP add/delete is almost tripled:
>>
>> Command:
>> ovn-nbctl --wait=hv --print-wait-time lsp-add ls_1_0001 lsp_1_0001_01 --
>> lsp-set-addresses lsp_1_0001_01 "ff:f1:bb:00:01:01 1.0.1.101" --
>> lsp-set-port-security lsp_1_0001_01 "ff:f1:bb:00:01:01 1.0.1.101"
>>
>> Before:
>> Time spent on processing nb_cfg 3:
>>
>>         ovn-northd delay before processing:     2ms
>>
>>         ovn-northd completion:                  42ms
>>
>>
>>         ovn-controller(s) completion:           68ms
>>
>> (68 - 42 = 26ms)
>>
>> After:
>> Time spent on processing nb_cfg 3:
>>
>>
>>
>>
>>         ovn-northd delay before processing:     1ms
>>
>>
>>         ovn-northd completion:                  38ms
>>
>>
>>
>>
>>         ovn-controller(s) completion:           112m
>>
>> (112 - 38 = 74ms)
>>
>> I think it may be because each port bind/unbind now introduces a SB DB
>> monitor condition change, which may have caused some delay. But I didn't
>> expect the impact to be that high because this test has only one single
>> ovn-controller (all the other 499 chassis are fake). I haven't spent time
>> to dig deeper yet.
> 
> The monitor condition change should happen only for the first port of a
> new logical switch.  From that point on the datapath is "local" to
> ovn-controller and all new port additions to that logical switch should
> not cause monitor conditions to change.
> 
> If there are monitor condition changes for *every* port bind/unbind (if
> the set of local datapaths doesn't change) that means we have a bug.

But if you add OVS port first, ovn-controller will add it to the condition.
And it will remove it once Sb port binding is created for it.

If you create Nb --> Sb records first, then add OVS port, if this case
there should be no condition updates.

Or am I missing something?

How do real CMSes work?  Do they add OVS ports before or after creating
Nb records?  There might be a race anyway since we do not control when
the Sb port binding is created, unless we --wait=sb.

Best regards, Ilya Maximets.

> 
>> Even though, it is still <100 ms, so I am not too worried about it. But
>> just to share the observation and raise the awareness.
>>
>> [0] https://github.com/hzhou8/ovn-test-script
>>
>> Thanks,
>> Han
>>
> 
> Regards,
> Dumitru
>
Dumitru Ceara June 28, 2023, 11:06 a.m. UTC | #6
On 6/28/23 12:20, Ilya Maximets wrote:
> On 6/28/23 10:30, Dumitru Ceara wrote:
>> On 6/28/23 04:54, Han Zhou wrote:
>>> On Thu, Jun 22, 2023 at 1:12 AM Dumitru Ceara <dceara@redhat.com> wrote:
>>>>
>>>> On 6/21/23 16:50, Han Zhou wrote:
>>>>> On Wed, Jun 21, 2023 at 2:48 AM Dumitru Ceara <dceara@redhat.com> wrote:
>>>>>>
>>>>>> We don't need to explicitly add port bindings that were already bound
>>>>>> locally.  We implicitly get those because we monitor the datapaths
>>>>>> they're attached to.
>>>>>>
>>>>>> When performing an ovn-heater 500-node density-heavy scale test [0],
>>> with
>>>>>> conditional monitoring enabled, the unreasonably long poll intervals on
>>>>>> the Southbound database (the ones that took more than one second) are
>>>>>> measured as:
>>>>>>
>>>>>>   ---------------------------------------------------------------------
>>>>>>        Count      Min        Max       Median      Mean   95 percentile
>>>>>>   ---------------------------------------------------------------------
>>>>>>         56.0     1010.0     2664.0     1455.5     1544.9     2163.0
>>>>>>         77.0     1015.0     3460.0     1896.0     1858.2     2907.8
>>>>>>         69.0     1010.0     3118.0     1618.0     1688.1     2582.4
>>>>>>   ---------------------------------------------------------------------
>>>>>>        202.0     1010.0     3460.0     1610.0     1713.3     2711.5
>>>>>>
>>>>>> Compared to the baseline results (also with conditional monitoring
>>>>>> enabled):
>>>>>>   ---------------------------------------------------------------------
>>>>>>        Count      Min        Max       Median      Mean   95 percentile
>>>>>>   ---------------------------------------------------------------------
>>>>>>        141.0     1003.0    18338.0     1721.0     2625.4     7626.0
>>>>>>        151.0     1019.0    80297.0     1808.0     3410.7     9089.0
>>>>>>        165.0     1007.0    50736.0     2354.0     3067.7     7309.8
>>>>>>   ---------------------------------------------------------------------
>>>>>>        457.0     1003.0    80297.0     2131.0     3044.6     7799.6
>>>>>>
>>>>>> We see significant improvement on the server side.  This is explained
>>>>>> by the fact that now the Southbound database server doesn't need to
>>>>>> apply as many conditions as before when filtering individual monitor
>>>>>> contents.
>>>>>>
>>>>>> Note: Sub-ports - OVN switch ports with parent_port set - have to be
>>>>>> monitored unconditionally as we cannot efficiently determine their
>>> local
>>>>>> datapaths without including all local OVS interfaces in the monitor.
>>>>>>
>>>>>> [0]
>>>>>
>>> https://github.com/ovn-org/ovn-heater/blob/main/test-scenarios/ocp-500-density-heavy.yml
>>>>>>
>>>>>> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2139194
>>>>>> Reported-by: Ilya Maximets <i.maximets@ovn.org>
>>>>>> Signed-off-by: Dumitru Ceara <dceara@redhat.com>
>>>>>> ---
>>>>>> V2:
>>>>>> - Addressed Han's comments:
>>>>>>   - changed commit log: removed assumption about sub-ports.
>>>>>>   - added ovn-nb documentation note about sub-ports monitoring.
>>>>>>   - added ovn-controller documentation note about sub-ports monitoring.
>>>>>>   - added TODO item.
>>>>>> - Added NEWS item.
>>>>>> ---
>>>>>>  NEWS                            |  3 +++
>>>>>>  TODO.rst                        |  6 ++++++
>>>>>>  controller/binding.c            |  2 +-
>>>>>>  controller/binding.h            |  2 +-
>>>>>>  controller/ovn-controller.8.xml |  6 ++++++
>>>>>>  controller/ovn-controller.c     | 19 ++++++++++++++++---
>>>>>>  ovn-nb.xml                      |  6 +++++-
>>>>>>  7 files changed, 38 insertions(+), 6 deletions(-)
>>>>>>
>>>>>> diff --git a/NEWS b/NEWS
>>>>>> index 482970eeeb..8275877f99 100644
>>>>>> --- a/NEWS
>>>>>> +++ b/NEWS
>>>>>> @@ -7,6 +7,9 @@ Post v23.06.0
>>>>>>      DHCPv4 "hostname" (12) option.
>>>>>>    - Support to create/update MAC_Binding when GARP received from VTEP
>>>>> (RAMP)
>>>>>>      switch on l3dgw port.
>>>>>> +  - To allow optimizing ovn-controller's monitor conditions for the
>>>>> regular
>>>>>> +    VIF case, ovn-controller now unconditionally monitors all
>>> sub-ports
>>>>>> +    (ports with parent_port set).
>>>>>>
>>>>>>  OVN v23.06.0 - 01 Jun 2023
>>>>>>  --------------------------
>>>>>> diff --git a/TODO.rst b/TODO.rst
>>>>>> index dff163e70b..b790a9fadf 100644
>>>>>> --- a/TODO.rst
>>>>>> +++ b/TODO.rst
>>>>>> @@ -124,3 +124,9 @@ OVN To-do List
>>>>>>  * Load Balancer templates
>>>>>>
>>>>>>    * Support combining the VIP IP and port into a single template
>>>>> variable.
>>>>>> +
>>>>>> +* ovn-controller conditional monitoring
>>>>>> +
>>>>>> +  * Improve sub-ports (with parent_port set) conditional monitoring;
>>>>> these
>>>>>> +    are currently unconditionally monitored, even if ovn-monitor-all
>>> is
>>>>>> +    set to false.
>>>>>> diff --git a/controller/binding.c b/controller/binding.c
>>>>>> index 486095ca79..359ad6d660 100644
>>>>>> --- a/controller/binding.c
>>>>>> +++ b/controller/binding.c
>>>>>> @@ -785,7 +785,7 @@ local_binding_data_destroy(struct
>>> local_binding_data
>>>>> *lbinding_data)
>>>>>>  }
>>>>>>
>>>>>>  const struct sbrec_port_binding *
>>>>>> -local_binding_get_primary_pb(struct shash *local_bindings,
>>>>>> +local_binding_get_primary_pb(const struct shash *local_bindings,
>>>>>>                               const char *pb_name)
>>>>>>  {
>>>>>>      struct local_binding *lbinding =
>>>>>> diff --git a/controller/binding.h b/controller/binding.h
>>>>>> index 0e57f02ee5..319cbd263c 100644
>>>>>> --- a/controller/binding.h
>>>>>> +++ b/controller/binding.h
>>>>>> @@ -150,7 +150,7 @@ void local_binding_data_init(struct
>>>>> local_binding_data *);
>>>>>>  void local_binding_data_destroy(struct local_binding_data *);
>>>>>>
>>>>>>  const struct sbrec_port_binding *local_binding_get_primary_pb(
>>>>>> -    struct shash *local_bindings, const char *pb_name);
>>>>>> +    const struct shash *local_bindings, const char *pb_name);
>>>>>>  ofp_port_t local_binding_get_lport_ofport(const struct shash
>>>>> *local_bindings,
>>>>>>                                            const char *pb_name);
>>>>>>
>>>>>> diff --git a/controller/ovn-controller.8.xml
>>>>> b/controller/ovn-controller.8.xml
>>>>>> index f61f430084..0883d8da9b 100644
>>>>>> --- a/controller/ovn-controller.8.xml
>>>>>> +++ b/controller/ovn-controller.8.xml
>>>>>> @@ -128,6 +128,12 @@
>>>>>>            to <code>true</code> for environments that all workloads
>>> need
>>>>> to be
>>>>>>            reachable from each other.
>>>>>>          </p>
>>>>>> +        <p>
>>>>>> +          NOTE: for efficiency and scalability in common scenarios
>>>>>> +          <code>ovn-controller</code> unconditionally monitors all
>>>>> sub-ports
>>>>>> +          (ports with <code>parent_port</code> set) regardless of the
>>>>>> +          <code>ovn-monitor-all</code> value.
>>>>>> +        </p>
>>>>>>          <p>
>>>>>>            Default value is <var>false</var>.
>>>>>>          </p>
>>>>>> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
>>>>>> index a474069798..96d01219e1 100644
>>>>>> --- a/controller/ovn-controller.c
>>>>>> +++ b/controller/ovn-controller.c
>>>>>> @@ -199,6 +199,7 @@ static unsigned int
>>>>>>  update_sb_monitors(struct ovsdb_idl *ovnsb_idl,
>>>>>>                     const struct sbrec_chassis *chassis,
>>>>>>                     const struct sset *local_ifaces,
>>>>>> +                   const struct shash *local_bindings,
>>>>>>                     struct hmap *local_datapaths,
>>>>>>                     bool monitor_all)
>>>>>>  {
>>>>>> @@ -297,10 +298,21 @@ update_sb_monitors(struct ovsdb_idl *ovnsb_idl,
>>>>>>
>>>>>>      if (local_ifaces) {
>>>>>>          const char *name;
>>>>>> +
>>>>>> +        ovs_assert(local_bindings);
>>>>>>          SSET_FOR_EACH (name, local_ifaces) {
>>>>>> +            /* Skip the VIFs we bound already, we should have a local
>>>>> datapath
>>>>>> +             * for those. */
>>>>>> +            const struct sbrec_port_binding *local_pb
>>>>>> +                = local_binding_get_primary_pb(local_bindings, name);
>>>>>> +            if (local_pb && get_lport_type(local_pb) == LP_VIF) {
>>>>>> +                continue;
>>>>>> +            }
>>>>>>              sbrec_port_binding_add_clause_logical_port(&pb,
>>> OVSDB_F_EQ,
>>>>> name);
>>>>>> -            sbrec_port_binding_add_clause_parent_port(&pb, OVSDB_F_EQ,
>>>>> name);
>>>>>>          }
>>>>>> +        /* Monitor all sub-ports unconditionally; we don't expect a
>>> lot
>>>>> of
>>>>>> +         * them in the SB database. */
>>>>>> +        sbrec_port_binding_add_clause_parent_port(&pb, OVSDB_F_NE,
>>> NULL);
>>>>>>      }
>>>>>>      if (local_datapaths) {
>>>>>>          const struct local_datapath *ld;
>>>>>> @@ -609,7 +621,7 @@ update_sb_db(struct ovsdb_idl *ovs_idl, struct
>>>>> ovsdb_idl *ovnsb_idl,
>>>>>>           * extra cost. Instead, it is called after the engine
>>> execution
>>>>> only
>>>>>>           * when it is necessary. */
>>>>>>          unsigned int next_cond_seqno =
>>>>>> -            update_sb_monitors(ovnsb_idl, NULL, NULL, NULL, true);
>>>>>> +            update_sb_monitors(ovnsb_idl, NULL, NULL, NULL, NULL,
>>> true);
>>>>>>          if (sb_cond_seqno) {
>>>>>>              *sb_cond_seqno = next_cond_seqno;
>>>>>>          }
>>>>>> @@ -4712,7 +4724,7 @@ main(int argc, char *argv[])
>>>>>>      ovsdb_idl_omit(ovnsb_idl_loop.idl,
>>>>>>                     &sbrec_chassis_private_col_external_ids);
>>>>>>
>>>>>> -    update_sb_monitors(ovnsb_idl_loop.idl, NULL, NULL, NULL, false);
>>>>>> +    update_sb_monitors(ovnsb_idl_loop.idl, NULL, NULL, NULL, NULL,
>>>>> false);
>>>>>>
>>>>>>      stopwatch_create(CONTROLLER_LOOP_STOPWATCH_NAME, SW_MS);
>>>>>>      stopwatch_create(OFCTRL_PUT_STOPWATCH_NAME, SW_MS);
>>>>>> @@ -5355,6 +5367,7 @@ main(int argc, char *argv[])
>>>>>>                                  update_sb_monitors(
>>>>>>                                      ovnsb_idl_loop.idl, chassis,
>>>>>>                                      &runtime_data->local_lports,
>>>>>> +
>>>>>  &runtime_data->lbinding_data.bindings,
>>>>>>                                      &runtime_data->local_datapaths,
>>>>>>                                      sb_monitor_all);
>>>>>>                          }
>>>>>> diff --git a/ovn-nb.xml b/ovn-nb.xml
>>>>>> index cf9c92009d..1fb9fa10ee 100644
>>>>>> --- a/ovn-nb.xml
>>>>>> +++ b/ovn-nb.xml
>>>>>> @@ -1302,7 +1302,11 @@
>>>>>>        <column name="parent_name">
>>>>>>          The VM interface through which the nested container sends its
>>>>> network
>>>>>>          traffic.  This must match the <ref column="name"/> column for
>>>>> some
>>>>>> -        other <ref table="Logical_Switch_Port"/>.
>>>>>> +        other <ref table="Logical_Switch_Port"/>.  Note: for
>>> performance
>>>>>> +        reasons, unlike for regular VIFs, <code>ovn-controller</code>
>>>>> will
>>>>>
>>>>> It may be better to say "for performance of OVN Southbound database
>>>>> conditional monitoring ...", otherwise the user might be confused.
>>>>>
>>>>>> +        register to get updates about all OVN Southbound database
>>>>>> +        <ref db="OVN_Southbound" table="Port_Binding"/> table records
>>>>>> +        that correspond to nested container ports.
>>>>>
>>>>> Better to add: "..., even if 'ovn-monitor-all' is set to false. See
>>>>> ovn-controlller (8)." or something similar.
>>>>>
>>>>> Please feel free to adjust the words and merge.
>>>>>
>>>>> Acked-by: Han Zhou <hzhou@ovn.org>
>>>>>
>>>>
>>>> Done, thanks again for the review!
>>>>
>>>> Regards,
>>>> Dumitru
>>>>
>>>
>>> Hi Dumitru,
>>>
>>
>> Hi Han,
>>
>>> I noticed that with this patch, the ovn-controller port-binding latency
>>> increased a lot. With the simulated k8s scale test [0] of 500 chassis x 50
>>> LSPs, the latency of a single LSP add/delete is almost tripled:
>>>
>>> Command:
>>> ovn-nbctl --wait=hv --print-wait-time lsp-add ls_1_0001 lsp_1_0001_01 --
>>> lsp-set-addresses lsp_1_0001_01 "ff:f1:bb:00:01:01 1.0.1.101" --
>>> lsp-set-port-security lsp_1_0001_01 "ff:f1:bb:00:01:01 1.0.1.101"
>>>
>>> Before:
>>> Time spent on processing nb_cfg 3:
>>>
>>>         ovn-northd delay before processing:     2ms
>>>
>>>         ovn-northd completion:                  42ms
>>>
>>>
>>>         ovn-controller(s) completion:           68ms
>>>
>>> (68 - 42 = 26ms)
>>>
>>> After:
>>> Time spent on processing nb_cfg 3:
>>>
>>>
>>>
>>>
>>>         ovn-northd delay before processing:     1ms
>>>
>>>
>>>         ovn-northd completion:                  38ms
>>>
>>>
>>>
>>>
>>>         ovn-controller(s) completion:           112m
>>>
>>> (112 - 38 = 74ms)
>>>
>>> I think it may be because each port bind/unbind now introduces a SB DB
>>> monitor condition change, which may have caused some delay. But I didn't
>>> expect the impact to be that high because this test has only one single
>>> ovn-controller (all the other 499 chassis are fake). I haven't spent time
>>> to dig deeper yet.
>>
>> The monitor condition change should happen only for the first port of a
>> new logical switch.  From that point on the datapath is "local" to
>> ovn-controller and all new port additions to that logical switch should
>> not cause monitor conditions to change.
>>
>> If there are monitor condition changes for *every* port bind/unbind (if
>> the set of local datapaths doesn't change) that means we have a bug.
> 
> But if you add OVS port first, ovn-controller will add it to the condition.
> And it will remove it once Sb port binding is created for it.
> 
> If you create Nb --> Sb records first, then add OVS port, if this case
> there should be no condition updates.
> 
> Or am I missing something?
> 

You're right, that's what's going on here too.  I forgot about this
case, thanks for following up!

> How do real CMSes work?  Do they add OVS ports before or after creating
> Nb records?  There might be a race anyway since we do not control when
> the Sb port binding is created, unless we --wait=sb.
> 

Yes, there's a race anyway.  I don't think we can do much about that
unless we add extra hints for ovn-controller, e.g., only monitor ports
that are tagged in a specific way.

Thanks,
Dumitru

> Best regards, Ilya Maximets.
> 
>>
>>> Even though, it is still <100 ms, so I am not too worried about it. But
>>> just to share the observation and raise the awareness.
>>>
>>> [0] https://github.com/hzhou8/ovn-test-script
>>>
>>> Thanks,
>>> Han
>>>
>>
>> Regards,
>> Dumitru
>>
>
diff mbox series

Patch

diff --git a/NEWS b/NEWS
index 482970eeeb..8275877f99 100644
--- a/NEWS
+++ b/NEWS
@@ -7,6 +7,9 @@  Post v23.06.0
     DHCPv4 "hostname" (12) option.
   - Support to create/update MAC_Binding when GARP received from VTEP (RAMP)
     switch on l3dgw port.
+  - To allow optimizing ovn-controller's monitor conditions for the regular
+    VIF case, ovn-controller now unconditionally monitors all sub-ports
+    (ports with parent_port set).
 
 OVN v23.06.0 - 01 Jun 2023
 --------------------------
diff --git a/TODO.rst b/TODO.rst
index dff163e70b..b790a9fadf 100644
--- a/TODO.rst
+++ b/TODO.rst
@@ -124,3 +124,9 @@  OVN To-do List
 * Load Balancer templates
 
   * Support combining the VIP IP and port into a single template variable.
+
+* ovn-controller conditional monitoring
+
+  * Improve sub-ports (with parent_port set) conditional monitoring; these
+    are currently unconditionally monitored, even if ovn-monitor-all is
+    set to false.
diff --git a/controller/binding.c b/controller/binding.c
index 486095ca79..359ad6d660 100644
--- a/controller/binding.c
+++ b/controller/binding.c
@@ -785,7 +785,7 @@  local_binding_data_destroy(struct local_binding_data *lbinding_data)
 }
 
 const struct sbrec_port_binding *
-local_binding_get_primary_pb(struct shash *local_bindings,
+local_binding_get_primary_pb(const struct shash *local_bindings,
                              const char *pb_name)
 {
     struct local_binding *lbinding =
diff --git a/controller/binding.h b/controller/binding.h
index 0e57f02ee5..319cbd263c 100644
--- a/controller/binding.h
+++ b/controller/binding.h
@@ -150,7 +150,7 @@  void local_binding_data_init(struct local_binding_data *);
 void local_binding_data_destroy(struct local_binding_data *);
 
 const struct sbrec_port_binding *local_binding_get_primary_pb(
-    struct shash *local_bindings, const char *pb_name);
+    const struct shash *local_bindings, const char *pb_name);
 ofp_port_t local_binding_get_lport_ofport(const struct shash *local_bindings,
                                           const char *pb_name);
 
diff --git a/controller/ovn-controller.8.xml b/controller/ovn-controller.8.xml
index f61f430084..0883d8da9b 100644
--- a/controller/ovn-controller.8.xml
+++ b/controller/ovn-controller.8.xml
@@ -128,6 +128,12 @@ 
           to <code>true</code> for environments that all workloads need to be
           reachable from each other.
         </p>
+        <p>
+          NOTE: for efficiency and scalability in common scenarios
+          <code>ovn-controller</code> unconditionally monitors all sub-ports
+          (ports with <code>parent_port</code> set) regardless of the
+          <code>ovn-monitor-all</code> value.
+        </p>
         <p>
           Default value is <var>false</var>.
         </p>
diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
index a474069798..96d01219e1 100644
--- a/controller/ovn-controller.c
+++ b/controller/ovn-controller.c
@@ -199,6 +199,7 @@  static unsigned int
 update_sb_monitors(struct ovsdb_idl *ovnsb_idl,
                    const struct sbrec_chassis *chassis,
                    const struct sset *local_ifaces,
+                   const struct shash *local_bindings,
                    struct hmap *local_datapaths,
                    bool monitor_all)
 {
@@ -297,10 +298,21 @@  update_sb_monitors(struct ovsdb_idl *ovnsb_idl,
 
     if (local_ifaces) {
         const char *name;
+
+        ovs_assert(local_bindings);
         SSET_FOR_EACH (name, local_ifaces) {
+            /* Skip the VIFs we bound already, we should have a local datapath
+             * for those. */
+            const struct sbrec_port_binding *local_pb
+                = local_binding_get_primary_pb(local_bindings, name);
+            if (local_pb && get_lport_type(local_pb) == LP_VIF) {
+                continue;
+            }
             sbrec_port_binding_add_clause_logical_port(&pb, OVSDB_F_EQ, name);
-            sbrec_port_binding_add_clause_parent_port(&pb, OVSDB_F_EQ, name);
         }
+        /* Monitor all sub-ports unconditionally; we don't expect a lot of
+         * them in the SB database. */
+        sbrec_port_binding_add_clause_parent_port(&pb, OVSDB_F_NE, NULL);
     }
     if (local_datapaths) {
         const struct local_datapath *ld;
@@ -609,7 +621,7 @@  update_sb_db(struct ovsdb_idl *ovs_idl, struct ovsdb_idl *ovnsb_idl,
          * extra cost. Instead, it is called after the engine execution only
          * when it is necessary. */
         unsigned int next_cond_seqno =
-            update_sb_monitors(ovnsb_idl, NULL, NULL, NULL, true);
+            update_sb_monitors(ovnsb_idl, NULL, NULL, NULL, NULL, true);
         if (sb_cond_seqno) {
             *sb_cond_seqno = next_cond_seqno;
         }
@@ -4712,7 +4724,7 @@  main(int argc, char *argv[])
     ovsdb_idl_omit(ovnsb_idl_loop.idl,
                    &sbrec_chassis_private_col_external_ids);
 
-    update_sb_monitors(ovnsb_idl_loop.idl, NULL, NULL, NULL, false);
+    update_sb_monitors(ovnsb_idl_loop.idl, NULL, NULL, NULL, NULL, false);
 
     stopwatch_create(CONTROLLER_LOOP_STOPWATCH_NAME, SW_MS);
     stopwatch_create(OFCTRL_PUT_STOPWATCH_NAME, SW_MS);
@@ -5355,6 +5367,7 @@  main(int argc, char *argv[])
                                 update_sb_monitors(
                                     ovnsb_idl_loop.idl, chassis,
                                     &runtime_data->local_lports,
+                                    &runtime_data->lbinding_data.bindings,
                                     &runtime_data->local_datapaths,
                                     sb_monitor_all);
                         }
diff --git a/ovn-nb.xml b/ovn-nb.xml
index cf9c92009d..1fb9fa10ee 100644
--- a/ovn-nb.xml
+++ b/ovn-nb.xml
@@ -1302,7 +1302,11 @@ 
       <column name="parent_name">
         The VM interface through which the nested container sends its network
         traffic.  This must match the <ref column="name"/> column for some
-        other <ref table="Logical_Switch_Port"/>.
+        other <ref table="Logical_Switch_Port"/>.  Note: for performance
+        reasons, unlike for regular VIFs, <code>ovn-controller</code> will
+        register to get updates about all OVN Southbound database
+        <ref db="OVN_Southbound" table="Port_Binding"/> table records
+        that correspond to nested container ports.
       </column>
 
       <column name="tag_request">