diff mbox series

[ovs-dev,ovn,v2] Avoid nb_cfg update notification flooding

Message ID 20200323151834.11073-1-lmartins@redhat.com
State Superseded
Headers show
Series [ovs-dev,ovn,v2] Avoid nb_cfg update notification flooding | expand

Commit Message

Lucas Martins March 23, 2020, 3:18 p.m. UTC
From: Lucas Alvares Gomes <lucasagomes@gmail.com>

nb_cfg as a mechanism to "ping" OVN control plane is very useful
in many ways. However, the current implementation will trigger
update notifications flooding in the whole control plane. Each
HV updates to SB the nb_cfg number and all these updates are
notified to all the other HVs, which is O(n^2). Although updates
are batched in fewers notifications than n^2, it still generates
significant load on SB DB and also on ovn-controllers.

To solve this problem and make the mechanism more useful in large
scale producation deployment, this patch separates the per HV
*private* data (write only by the owning chassis and not
interesting to any other HVs) from the Chassis table to a separate
table, so that each HV can conditionally monitor and get updates
only for its own record.

Test result shows great improvement:
In a test environment with 1K sandbox HVs, and 10K ports created
on 40 lswitches and 8 lrouters, do the sync test when the system
is idle, with command:

    time ovn-nbctl --wait=hv sync

Original result:
real    4m52.926s
user    0m0.328s
sys     0m0.004s

With this patch:
real    0m11.405s
user    0m0.316s
sys     0m0.016s

Also, regarding backwards compatibility note that the nb_cfg from the
Chassis table is no longer updated. If any system is relying on this
mechanism they should start using the nb_cfg from the Chassis_Private
table from now on.

Co-authored-by: Han Zhou <hzhou8@ebay.com>
Signed-off-by: Han Zhou <hzhou8@ebay.com>
Signed-off-by: Lucas Alvares Gomes <lucasagomes@gmail.com>
---

v1 - v2 -> Addressed Dumitru's comments: ovn-northd no longer monitoring
the external_ids column from Chassis and Chassis_Private tables, added an
item to the NEWS file regarding the backward compatibility of this patch
and added an item to the TODO file regarding conditionally monitoring
a set of columns.

 NEWS                        |  6 +++++-
 TODO.rst                    |  3 +++
 controller/chassis.c        | 19 +++++++++++++++++--
 controller/chassis.h        |  8 ++++++--
 controller/ovn-controller.c | 37 ++++++++++++++++++++++++++++++------
 lib/chassis-index.c         | 25 ++++++++++++++++++++++++
 lib/chassis-index.h         |  6 ++++++
 northd/ovn-northd.c         | 26 +++++++++++++++++++------
 ovn-sb.ovsschema            | 13 +++++++++++--
 ovn-sb.xml                  | 38 +++++++++++++++++++++++++++++++++----
 tests/ovn-controller.at     | 26 +++++++++++++++++++++++++
 11 files changed, 184 insertions(+), 23 deletions(-)

--
2.25.2

Comments

Numan Siddique March 24, 2020, 9:45 a.m. UTC | #1
On Mon, Mar 23, 2020 at 8:49 PM <lmartins@redhat.com> wrote:
>
> From: Lucas Alvares Gomes <lucasagomes@gmail.com>
>
> nb_cfg as a mechanism to "ping" OVN control plane is very useful
> in many ways. However, the current implementation will trigger
> update notifications flooding in the whole control plane. Each
> HV updates to SB the nb_cfg number and all these updates are
> notified to all the other HVs, which is O(n^2). Although updates
> are batched in fewers notifications than n^2, it still generates
> significant load on SB DB and also on ovn-controllers.
>
> To solve this problem and make the mechanism more useful in large
> scale producation deployment, this patch separates the per HV
> *private* data (write only by the owning chassis and not
> interesting to any other HVs) from the Chassis table to a separate
> table, so that each HV can conditionally monitor and get updates
> only for its own record.
>
> Test result shows great improvement:
> In a test environment with 1K sandbox HVs, and 10K ports created
> on 40 lswitches and 8 lrouters, do the sync test when the system
> is idle, with command:
>
>     time ovn-nbctl --wait=hv sync
>
> Original result:
> real    4m52.926s
> user    0m0.328s
> sys     0m0.004s
>
> With this patch:
> real    0m11.405s
> user    0m0.316s
> sys     0m0.016s
>
> Also, regarding backwards compatibility note that the nb_cfg from the
> Chassis table is no longer updated. If any system is relying on this
> mechanism they should start using the nb_cfg from the Chassis_Private
> table from now on.
>
> Co-authored-by: Han Zhou <hzhou8@ebay.com>
> Signed-off-by: Han Zhou <hzhou8@ebay.com>
> Signed-off-by: Lucas Alvares Gomes <lucasagomes@gmail.com>

Thanks Lucas for v2.

I did some testing and I'm seeing a couple of issues
  - when I start a sandbox, ovn-controller never creates a chassis row and I see
   below issues in the log
 ***
2020-03-24T09:38:53.376Z|65637|jsonrpc|DBG|ssl:127.0.0.1:6642:
received reply,
result=[{"uuid":["uuid","cf141b1a-f601-4035-b444-b856481a110f"]},{"uuid":["uuid","20c335d8-cb7f-4544-a23a-fb53547438e7"]},{"details":"RBAC
rules for client \"chassis-1\" role \"ovn-controller\" prohibit row
insertion into table \"Chassis_Private\".","error":"permission
error"},null], id=56156
2020-03-24T09:38:53.376Z|65638|jsonrpc|DBG|ssl:127.0.0.1:6642:
received reply, result={}, id=56157
2020-03-24T09:38:53.376Z|65639|chassis|DBG|Could not find Chassis,
will create it: stored (chassis-1) ovs (chassis-1)
****

  - When I run the test suite with - make check TESTSUITEFLAGS="-j5",
90% of the time I see 2 tests failing. I don't see this
    behavior with master.
    In one run I see below 2 tests failing and looks like the test
fails because it didn't receive the expected packet. If I run the same
    test again individually it passes. Seems like something is wrong.

   ****
    39: ovn -- 2 HVs, 2 LS, 1 lport/LS, 2 peer LRs      FAILED
(ovs-macros.at:220)
 73: ovn -- nested containers                        FAILED (ovs-macros.at:220)

   Below is the tail of testsuite.log of test case 39

   actions: output enqueue set_vlan_vid set_vlan_pcp strip_vlan
mod_dl_src mod_dl_dst mod_nw_src mod_nw_dst mod_nw_tos mod_tp_src
mod_tp_dst
 1(hv2-vif1): addr:aa:55:aa:55:00:1f
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
 2(ovn-hv1-0): addr:ee:bb:1f:6b:1e:ae
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
 LOCAL(br-int): addr:1e:36:c5:d5:84:41
     config:     0
     state:      0
     speed: 0 Mbps now, 0 Mbps max
OFPT_GET_CONFIG_REPLY (xid=0x4): frags=normal miss_send_len=0
NXST_FLOW reply (xid=0x4):

checking packets in hv2/vif1-tx.pcap against expected:
ovn.at:12: waiting until $PYTHON "$ovs_srcdir/utilities/ovs-pcap.in"
$rcv_pcap > $rcv_text
        rcv_n=`wc -l < "$rcv_text"`
        echo "rcv_n=$rcv_n exp_n=$exp_n"
        test $rcv_n -ge $exp_n...
rcv_n=0 exp_n=1
rcv_n=0 exp_n=1
rcv_n=0 exp_n=1
rcv_n=0 exp_n=1
rcv_n=0 exp_n=1
rcv_n=0 exp_n=1
rcv_n=0 exp_n=1
rcv_n=0 exp_n=1
rcv_n=0 exp_n=1
rcv_n=0 exp_n=1
rcv_n=0 exp_n=1
rcv_n=0 exp_n=1
ovn.at:12: wait failed after 10 seconds
../../tests/ovs-macros.at:220: hard failure
39. ovn.at:3781: 39. ovn -- 2 HVs, 2 LS, 1 lport/LS, 2 peer LRs
(ovn.at:3781): FAILED (ovs-macros.at:220)

  ******

Sometimes I see similar failures even with ovn master. Running those
tests individually always passes.
I'm not really sure if this patch is the cause of these failures. But
this patch is making these tests fail more often.

Thanks
Numan


> ---
>
> v1 - v2 -> Addressed Dumitru's comments: ovn-northd no longer monitoring
> the external_ids column from Chassis and Chassis_Private tables, added an
> item to the NEWS file regarding the backward compatibility of this patch
> and added an item to the TODO file regarding conditionally monitoring
> a set of columns.
>
>  NEWS                        |  6 +++++-
>  TODO.rst                    |  3 +++
>  controller/chassis.c        | 19 +++++++++++++++++--
>  controller/chassis.h        |  8 ++++++--
>  controller/ovn-controller.c | 37 ++++++++++++++++++++++++++++++------
>  lib/chassis-index.c         | 25 ++++++++++++++++++++++++
>  lib/chassis-index.h         |  6 ++++++
>  northd/ovn-northd.c         | 26 +++++++++++++++++++------
>  ovn-sb.ovsschema            | 13 +++++++++++--
>  ovn-sb.xml                  | 38 +++++++++++++++++++++++++++++++++----
>  tests/ovn-controller.at     | 26 +++++++++++++++++++++++++
>  11 files changed, 184 insertions(+), 23 deletions(-)
>
> diff --git a/NEWS b/NEWS
> index 393721d70..2192ea020 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -1,6 +1,10 @@
>  Post-v20.03.0
>  --------------------------
> -
> +   - The nb_cfg column from the Chassis table in the OVN Southbound
> +     database has been deprecated and is no longer updated. A new table
> +     called Chassis_Private now contains the nb_cfg column which is updated
> +     by incrementing the value in the NB_Global table, CMSes relying on
> +     this mechanism should update their code to use this new table.
>
>  OVN v20.03.0 - xx xxx xxxx
>  --------------------------
> diff --git a/TODO.rst b/TODO.rst
> index 809d1c91c..6c565a21a 100644
> --- a/TODO.rst
> +++ b/TODO.rst
> @@ -25,6 +25,9 @@
>  OVN To-do List
>  ==============
>
> +* Add OVSDB conditional monitoring to a set of columns. Once implemented
> +  we could get rid of the Chassis and Chassis_Private table separation.
> +
>  * Get incremental updates in ovn-controller and ovn-northd in some
>    sensible way.
>
> diff --git a/controller/chassis.c b/controller/chassis.c
> index 522893ead..99ea6b8fc 100644
> --- a/controller/chassis.c
> +++ b/controller/chassis.c
> @@ -585,14 +585,18 @@ chassis_update(const struct sbrec_chassis *chassis_rec,
>  const struct sbrec_chassis *
>  chassis_run(struct ovsdb_idl_txn *ovnsb_idl_txn,
>              struct ovsdb_idl_index *sbrec_chassis_by_name,
> +            struct ovsdb_idl_index *sbrec_chassis_private_by_name,
>              const struct ovsrec_open_vswitch_table *ovs_table,
>              const struct sbrec_chassis_table *chassis_table,
>              const char *chassis_id,
>              const struct ovsrec_bridge *br_int,
> -            const struct sset *transport_zones)
> +            const struct sset *transport_zones,
> +            const struct sbrec_chassis_private **chassis_private)
>  {
>      struct ovs_chassis_cfg ovs_cfg;
>
> +    *chassis_private = NULL;
> +
>      /* Get the chassis config from the ovs table. */
>      ovs_chassis_cfg_init(&ovs_cfg);
>      if (!chassis_parse_ovs_config(ovs_table, br_int, &ovs_cfg)) {
> @@ -616,6 +620,15 @@ chassis_run(struct ovsdb_idl_txn *ovnsb_idl_txn,
>                                    chassis_id);
>      }
>
> +    const struct sbrec_chassis_private *chassis_private_rec =
> +        chassis_private_lookup_by_name(sbrec_chassis_private_by_name,
> +                                       chassis_id);
> +    if (!chassis_private_rec && ovnsb_idl_txn) {
> +        chassis_private_rec = sbrec_chassis_private_insert(ovnsb_idl_txn);
> +        sbrec_chassis_private_set_name(chassis_private_rec, chassis_id);
> +    }
> +    *chassis_private = chassis_private_rec;
> +
>      ovs_chassis_cfg_destroy(&ovs_cfg);
>      return chassis_rec;
>  }
> @@ -669,7 +682,8 @@ chassis_get_mac(const struct sbrec_chassis *chassis_rec,
>   * required. */
>  bool
>  chassis_cleanup(struct ovsdb_idl_txn *ovnsb_idl_txn,
> -                const struct sbrec_chassis *chassis_rec)
> +                const struct sbrec_chassis *chassis_rec,
> +                const struct sbrec_chassis_private *chassis_private_rec)
>  {
>      if (!chassis_rec) {
>          return true;
> @@ -679,6 +693,7 @@ chassis_cleanup(struct ovsdb_idl_txn *ovnsb_idl_txn,
>                                    "ovn-controller: unregistering chassis '%s'",
>                                    chassis_rec->name);
>          sbrec_chassis_delete(chassis_rec);
> +        sbrec_chassis_private_delete(chassis_private_rec);
>      }
>      return false;
>  }
> diff --git a/controller/chassis.h b/controller/chassis.h
> index 178d2957e..81055b403 100644
> --- a/controller/chassis.h
> +++ b/controller/chassis.h
> @@ -17,6 +17,7 @@
>  #define OVN_CHASSIS_H 1
>
>  #include <stdbool.h>
> +#include "lib/ovn-sb-idl.h"
>
>  struct ovsdb_idl;
>  struct ovsdb_idl_index;
> @@ -33,12 +34,15 @@ void chassis_register_ovs_idl(struct ovsdb_idl *);
>  const struct sbrec_chassis *chassis_run(
>      struct ovsdb_idl_txn *ovnsb_idl_txn,
>      struct ovsdb_idl_index *sbrec_chassis_by_name,
> +    struct ovsdb_idl_index *sbrec_chassis_private_by_name,
>      const struct ovsrec_open_vswitch_table *,
>      const struct sbrec_chassis_table *,
>      const char *chassis_id, const struct ovsrec_bridge *br_int,
> -    const struct sset *transport_zones);
> +    const struct sset *transport_zones,
> +    const struct sbrec_chassis_private **chassis_private);
>  bool chassis_cleanup(struct ovsdb_idl_txn *ovnsb_idl_txn,
> -                     const struct sbrec_chassis *);
> +                     const struct sbrec_chassis *,
> +                     const struct sbrec_chassis_private *);
>  bool chassis_get_mac(const struct sbrec_chassis *chassis,
>                       const char *bridge_mapping,
>                       struct eth_addr *chassis_mac);
> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
> index 2893eaac1..1542b2ad7 100644
> --- a/controller/ovn-controller.c
> +++ b/controller/ovn-controller.c
> @@ -155,6 +155,7 @@ update_sb_monitors(struct ovsdb_idl *ovnsb_idl,
>      struct ovsdb_idl_condition ce =  OVSDB_IDL_CONDITION_INIT(&ce);
>      struct ovsdb_idl_condition ip_mcast = OVSDB_IDL_CONDITION_INIT(&ip_mcast);
>      struct ovsdb_idl_condition igmp = OVSDB_IDL_CONDITION_INIT(&igmp);
> +    struct ovsdb_idl_condition chprv = OVSDB_IDL_CONDITION_INIT(&chprv);
>
>      if (monitor_all) {
>          ovsdb_idl_condition_add_clause_true(&pb);
> @@ -165,6 +166,7 @@ update_sb_monitors(struct ovsdb_idl *ovnsb_idl,
>          ovsdb_idl_condition_add_clause_true(&ce);
>          ovsdb_idl_condition_add_clause_true(&ip_mcast);
>          ovsdb_idl_condition_add_clause_true(&igmp);
> +        ovsdb_idl_condition_add_clause_true(&chprv);
>          goto out;
>      }
>
> @@ -196,6 +198,10 @@ update_sb_monitors(struct ovsdb_idl *ovnsb_idl,
>                                                    &chassis->header_.uuid);
>          sbrec_igmp_group_add_clause_chassis(&igmp, OVSDB_F_EQ,
>                                              &chassis->header_.uuid);
> +
> +        /* Monitors Chassis_Private record for current chassis only */
> +        sbrec_chassis_private_add_clause_name(&chprv, OVSDB_F_EQ,
> +                                              chassis->name);
>      }
>      if (local_ifaces) {
>          const char *name;
> @@ -229,6 +235,7 @@ out:
>      sbrec_controller_event_set_condition(ovnsb_idl, &ce);
>      sbrec_ip_multicast_set_condition(ovnsb_idl, &ip_mcast);
>      sbrec_igmp_group_set_condition(ovnsb_idl, &igmp);
> +    sbrec_chassis_private_set_condition(ovnsb_idl, &chprv);
>      ovsdb_idl_condition_destroy(&pb);
>      ovsdb_idl_condition_destroy(&lf);
>      ovsdb_idl_condition_destroy(&mb);
> @@ -237,6 +244,7 @@ out:
>      ovsdb_idl_condition_destroy(&ce);
>      ovsdb_idl_condition_destroy(&ip_mcast);
>      ovsdb_idl_condition_destroy(&igmp);
> +    ovsdb_idl_condition_destroy(&chprv);
>  }
>
>  static const char *
> @@ -1759,6 +1767,8 @@ main(int argc, char *argv[])
>
>      struct ovsdb_idl_index *sbrec_chassis_by_name
>          = chassis_index_create(ovnsb_idl_loop.idl);
> +    struct ovsdb_idl_index *sbrec_chassis_private_by_name
> +        = chassis_private_index_create(ovnsb_idl_loop.idl);
>      struct ovsdb_idl_index *sbrec_multicast_group_by_name_datapath
>          = mcast_group_index_create(ovnsb_idl_loop.idl);
>      struct ovsdb_idl_index *sbrec_port_binding_by_name
> @@ -1784,7 +1794,8 @@ main(int argc, char *argv[])
>          = igmp_group_index_create(ovnsb_idl_loop.idl);
>
>      ovsdb_idl_track_add_all(ovnsb_idl_loop.idl);
> -    ovsdb_idl_omit_alert(ovnsb_idl_loop.idl, &sbrec_chassis_col_nb_cfg);
> +    ovsdb_idl_omit_alert(ovnsb_idl_loop.idl,
> +                         &sbrec_chassis_private_col_nb_cfg);
>
>      /* Omit the external_ids column of all the tables except for -
>       *  - DNS. pinctrl.c uses the external_ids column of DNS,
> @@ -1820,6 +1831,10 @@ main(int argc, char *argv[])
>      ovsdb_idl_omit(ovnsb_idl_loop.idl, &sbrec_connection_col_status);
>      ovsdb_idl_omit(ovnsb_idl_loop.idl, &sbrec_connection_col_target);
>
> +    /* Do not monitor Chassis_Private external_ids */
> +    ovsdb_idl_omit(ovnsb_idl_loop.idl,
> +                   &sbrec_chassis_private_col_external_ids);
> +
>      update_sb_monitors(ovnsb_idl_loop.idl, NULL, NULL, NULL, false);
>
>      stopwatch_create(CONTROLLER_LOOP_STOPWATCH_NAME, SW_MS);
> @@ -1997,10 +2012,13 @@ main(int argc, char *argv[])
>                  process_br_int(ovs_idl_txn, bridge_table, ovs_table);
>              const char *chassis_id = get_ovs_chassis_id(ovs_table);
>              const struct sbrec_chassis *chassis = NULL;
> +            const struct sbrec_chassis_private *chassis_private = NULL;
>              if (chassis_id) {
>                  chassis = chassis_run(ovnsb_idl_txn, sbrec_chassis_by_name,
> +                                      sbrec_chassis_private_by_name,
>                                        ovs_table, chassis_table, chassis_id,
> -                                      br_int, &transport_zones);
> +                                      br_int, &transport_zones,
> +                                      &chassis_private);
>              }
>
>              if (br_int) {
> @@ -2125,10 +2143,10 @@ main(int argc, char *argv[])
>                  engine_set_force_recompute(false);
>              }
>
> -            if (ovnsb_idl_txn && chassis) {
> +            if (ovnsb_idl_txn && chassis_private) {
>                  int64_t cur_cfg = ofctrl_get_cur_cfg();
> -                if (cur_cfg && cur_cfg != chassis->nb_cfg) {
> -                    sbrec_chassis_set_nb_cfg(chassis, cur_cfg);
> +                if (cur_cfg && cur_cfg != chassis_private->nb_cfg) {
> +                    sbrec_chassis_private_set_nb_cfg(chassis_private, cur_cfg);
>                  }
>              }
>
> @@ -2231,10 +2249,17 @@ main(int argc, char *argv[])
>                     ? chassis_lookup_by_name(sbrec_chassis_by_name, chassis_id)
>                     : NULL);
>
> +            const struct sbrec_chassis_private *chassis_private
> +                = (chassis_id
> +                   ? chassis_private_lookup_by_name(
> +                         sbrec_chassis_private_by_name, chassis_id)
> +                   : NULL);
> +
>              /* Run all of the cleanup functions, even if one of them returns
>               * false. We're done if all of them return true. */
>              done = binding_cleanup(ovnsb_idl_txn, port_binding_table, chassis);
> -            done = chassis_cleanup(ovnsb_idl_txn, chassis) && done;
> +            done = chassis_cleanup(ovnsb_idl_txn,
> +                                   chassis, chassis_private) && done;
>              done = encaps_cleanup(ovs_idl_txn, br_int) && done;
>              done = igmp_group_cleanup(ovnsb_idl_txn, sbrec_igmp_group) && done;
>              if (done) {
> diff --git a/lib/chassis-index.c b/lib/chassis-index.c
> index 39066f4cc..d760124b4 100644
> --- a/lib/chassis-index.c
> +++ b/lib/chassis-index.c
> @@ -40,6 +40,31 @@ chassis_lookup_by_name(struct ovsdb_idl_index *sbrec_chassis_by_name,
>      return retval;
>  }
>
> +struct ovsdb_idl_index *
> +chassis_private_index_create(struct ovsdb_idl *idl)
> +{
> +    return ovsdb_idl_index_create1(idl, &sbrec_chassis_private_col_name);
> +}
> +
> +/* Finds and returns the chassis with the given 'name', or NULL if no such
> + * chassis exists. */
> +const struct sbrec_chassis_private *
> +chassis_private_lookup_by_name(
> +    struct ovsdb_idl_index *sbrec_chassis_private_by_name,
> +    const char *name)
> +{
> +    struct sbrec_chassis_private *target =
> +        sbrec_chassis_private_index_init_row(sbrec_chassis_private_by_name);
> +    sbrec_chassis_private_index_set_name(target, name);
> +
> +    struct sbrec_chassis_private *retval = sbrec_chassis_private_index_find(
> +        sbrec_chassis_private_by_name, target);
> +
> +    sbrec_chassis_private_index_destroy_row(target);
> +
> +    return retval;
> +}
> +
>  struct ovsdb_idl_index *
>  ha_chassis_group_index_create(struct ovsdb_idl *idl)
>  {
> diff --git a/lib/chassis-index.h b/lib/chassis-index.h
> index 302e5f0fd..b9b331f34 100644
> --- a/lib/chassis-index.h
> +++ b/lib/chassis-index.h
> @@ -23,6 +23,12 @@ struct ovsdb_idl_index *chassis_index_create(struct ovsdb_idl *);
>  const struct sbrec_chassis *chassis_lookup_by_name(
>      struct ovsdb_idl_index *sbrec_chassis_by_name, const char *name);
>
> +struct ovsdb_idl_index *chassis_private_index_create(struct ovsdb_idl *);
> +
> +const struct sbrec_chassis_private *
> +chassis_private_lookup_by_name(
> +    struct ovsdb_idl_index *sbrec_chassis_private_by_name, const char *name);
> +
>  struct ovsdb_idl_index *ha_chassis_group_index_create(struct ovsdb_idl *idl);
>  const struct sbrec_ha_chassis_group *ha_chassis_group_lookup_by_name(
>      struct ovsdb_idl_index *sbrec_ha_chassis_grp_by_name, const char *name);
> diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
> index d42a9892a..8ad3842e8 100644
> --- a/northd/ovn-northd.c
> +++ b/northd/ovn-northd.c
> @@ -11174,6 +11174,11 @@ static const char *rbac_chassis_auth[] =
>  static const char *rbac_chassis_update[] =
>      {"nb_cfg", "external_ids", "encaps", "vtep_logical_switches"};
>
> +static const char *rbac_chassis_private_auth[] =
> +    {"chassis_name"};
> +static const char *rbac_chassis_private_update[] =
> +    {"nb_cfg"};
> +
>  static const char *rbac_encap_auth[] =
>      {"chassis_name"};
>  static const char *rbac_encap_update[] =
> @@ -11211,6 +11216,14 @@ static struct rbac_perm_cfg {
>          .update = rbac_chassis_update,
>          .n_update = ARRAY_SIZE(rbac_chassis_update),
>          .row = NULL
> +    },{
> +        .table = "Chassis_Private",
> +        .auth = rbac_chassis_private_auth,
> +        .n_auth = ARRAY_SIZE(rbac_chassis_private_auth),
> +        .insdel = true,
> +        .update = rbac_chassis_private_update,
> +        .n_update = ARRAY_SIZE(rbac_chassis_private_update),
> +        .row = NULL
>      },{
>          .table = "Encap",
>          .auth = rbac_encap_auth,
> @@ -11380,11 +11393,10 @@ update_northbound_cfg(struct northd_context *ctx,
>      /* Update northbound hv_cfg if appropriate. */
>      if (nbg) {
>          /* Find minimum nb_cfg among all chassis. */
> -        const struct sbrec_chassis *chassis;
> +        const struct sbrec_chassis_private *chassis;
>          int64_t hv_cfg = nbg->nb_cfg;
> -        SBREC_CHASSIS_FOR_EACH (chassis, ctx->ovnsb_idl) {
> -            if (!smap_get_bool(&chassis->external_ids, "is-remote", false) &&
> -                chassis->nb_cfg < hv_cfg) {
> +        SBREC_CHASSIS_PRIVATE_FOR_EACH (chassis, ctx->ovnsb_idl) {
> +            if (chassis->nb_cfg < hv_cfg) {
>                  hv_cfg = chassis->nb_cfg;
>              }
>          }
> @@ -11670,9 +11682,11 @@ main(int argc, char *argv[])
>      ovsdb_idl_add_column(ovnsb_idl_loop.idl, &sbrec_meter_band_col_burst_size);
>
>      ovsdb_idl_add_table(ovnsb_idl_loop.idl, &sbrec_table_chassis);
> -    ovsdb_idl_add_column(ovnsb_idl_loop.idl, &sbrec_chassis_col_nb_cfg);
>      ovsdb_idl_add_column(ovnsb_idl_loop.idl, &sbrec_chassis_col_name);
> -    ovsdb_idl_add_column(ovnsb_idl_loop.idl, &sbrec_chassis_col_external_ids);
> +
> +    ovsdb_idl_add_table(ovnsb_idl_loop.idl, &sbrec_table_chassis_private);
> +    ovsdb_idl_add_column(ovnsb_idl_loop.idl,
> +                         &sbrec_chassis_private_col_nb_cfg);
>
>      ovsdb_idl_add_table(ovnsb_idl_loop.idl, &sbrec_table_ha_chassis);
>      add_column_noalert(ovnsb_idl_loop.idl,
> diff --git a/ovn-sb.ovsschema b/ovn-sb.ovsschema
> index d89f8dbbb..fa53c6a4c 100644
> --- a/ovn-sb.ovsschema
> +++ b/ovn-sb.ovsschema
> @@ -1,7 +1,7 @@
>  {
>      "name": "OVN_Southbound",
> -    "version": "2.7.0",
> -    "cksum": "4286723485 21693",
> +    "version": "2.8.0",
> +    "cksum": "3504843097 22072",
>      "tables": {
>          "SB_Global": {
>              "columns": {
> @@ -43,6 +43,15 @@
>                                                "max": "unlimited"}}},
>              "isRoot": true,
>              "indexes": [["name"]]},
> +        "Chassis_Private": {
> +            "columns": {
> +                "name": {"type": "string"},
> +                "nb_cfg": {"type": {"key": "integer"}},
> +                "external_ids": {
> +                    "type": {"key": "string", "value": "string",
> +                             "min": 0, "max": "unlimited"}}},
> +            "isRoot": true,
> +            "indexes": [["name"]]},
>          "Encap": {
>              "columns": {
>                  "type": {"type": {"key": {
> diff --git a/ovn-sb.xml b/ovn-sb.xml
> index 3ae9d4f92..77c5b1f38 100644
> --- a/ovn-sb.xml
> +++ b/ovn-sb.xml
> @@ -256,10 +256,8 @@
>      </column>
>
>      <column name="nb_cfg">
> -      Sequence number for the configuration.  When <code>ovn-controller</code>
> -      updates the configuration of a chassis from the contents of the
> -      southbound database, it copies <ref table="SB_Global" column="nb_cfg"/>
> -      from the <ref table="SB_Global"/> table into this column.
> +      Deprecated. This column is replaced by the <ref table="Chassis_Private"
> +      column="nb_cfg"/> column of the <ref table="Chassis_Private"/> table.
>      </column>
>
>      <column name="external_ids" key="ovn-bridge-mappings">
> @@ -366,6 +364,38 @@
>      </group>
>    </table>
>
> +  <table name="Chassis_Private" title="Chassis Private">
> +    <p>
> +      Each row in this table maintains per chassis private data that are
> +      accessed only by the owning chassis (write only) and ovn-northd, not by
> +      any other chassis.  These data are stored in this separate table instead
> +      of the <ref table="Chassis"/> table for performance considerations:
> +      the rows in this table can be conditionally monitored by chassises so
> +      that each chassis only get update notifications for its own row, to avoid
> +      unnecessary chassis private data update flooding in a large scale
> +      deployment.  (Future: this separation can be avoided if ovsdb conditional
> +      monitoring is supported on a set of columns)
> +    </p>
> +
> +    <column name="name">
> +      The name of the chassis that owns these chassis-private data.
> +    </column>
> +
> +    <column name="nb_cfg">
> +      Sequence number for the configuration.  When <code>ovn-controller</code>
> +      updates the configuration of a chassis from the contents of the
> +      southbound database, it copies <ref table="SB_Global" column="nb_cfg"/>
> +      from the <ref table="SB_Global"/> table into this column.
> +    </column>
> +
> +    <group title="Common Columns">
> +      The overall purpose of these columns is described under <code>Common
> +      Columns</code> at the beginning of this document.
> +
> +      <column name="external_ids"/>
> +    </group>
> +  </table>
> +
>    <table name="Encap" title="Encapsulation Types">
>      <p>
>        The <ref column="encaps" table="Chassis"/> column in the <ref
> diff --git a/tests/ovn-controller.at b/tests/ovn-controller.at
> index 63b2581c0..8c6e03611 100644
> --- a/tests/ovn-controller.at
> +++ b/tests/ovn-controller.at
> @@ -321,3 +321,29 @@ as ovn-sb
>  OVS_APP_EXIT_AND_WAIT([ovsdb-server])
>
>  AT_CLEANUP
> +
> +# Checks that ovn-controller increments the nb_cfg value in the Chassis_Private table
> +AT_SETUP([ovn-controller - Bump Chassis_Private nb_cfg value])
> +AT_KEYWORDS([ovn])
> +ovn_start
> +
> +net_add n1
> +sim_add hv
> +as hv
> +ovs-vsctl add-br br-phys
> +ovn_attach n1 br-phys 192.168.0.1
> +
> +OVS_WAIT_UNTIL([test xhv = x`ovn-sbctl --columns name --bare find chassis`])
> +
> +# Bump the NB_Global nb_cfg value
> +nb_global_id=$(ovn-nbctl --columns _uuid --bare find nb_global)
> +ovn-nbctl set NB_Global ${nb_global_id} nb_cfg=999
> +
> +# ovn-controller should bump the nb_cfg in the chassis_private table
> +OVS_WAIT_UNTIL([test x999 = x`ovn-sbctl --columns nb_cfg --bare find chassis_private`])
> +
> +# Assert that the the nb_cfg from the Chassis table was not incremented
> +OVS_WAIT_UNTIL([test x0 = x`ovn-sbctl --columns nb_cfg --bare find chassis`])
> +
> +OVN_CLEANUP([hv])
> +AT_CLEANUP
> --
> 2.25.2
>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
Han Zhou March 28, 2020, 1:31 a.m. UTC | #2
Hi Lucas,

Thanks for v2. Please see my comments below.

On Mon, Mar 23, 2020 at 8:19 AM <lmartins@redhat.com> wrote:
>
> From: Lucas Alvares Gomes <lucasagomes@gmail.com>
>
> nb_cfg as a mechanism to "ping" OVN control plane is very useful
> in many ways. However, the current implementation will trigger
> update notifications flooding in the whole control plane. Each
> HV updates to SB the nb_cfg number and all these updates are
> notified to all the other HVs, which is O(n^2). Although updates
> are batched in fewers notifications than n^2, it still generates
> significant load on SB DB and also on ovn-controllers.
>
> To solve this problem and make the mechanism more useful in large
> scale producation deployment, this patch separates the per HV
> *private* data (write only by the owning chassis and not
> interesting to any other HVs) from the Chassis table to a separate
> table, so that each HV can conditionally monitor and get updates
> only for its own record.
>
> Test result shows great improvement:
> In a test environment with 1K sandbox HVs, and 10K ports created
> on 40 lswitches and 8 lrouters, do the sync test when the system
> is idle, with command:
>
>     time ovn-nbctl --wait=hv sync
>
> Original result:
> real    4m52.926s
> user    0m0.328s
> sys     0m0.004s
>
> With this patch:
> real    0m11.405s
> user    0m0.316s
> sys     0m0.016s
>
> Also, regarding backwards compatibility note that the nb_cfg from the
> Chassis table is no longer updated. If any system is relying on this
> mechanism they should start using the nb_cfg from the Chassis_Private
> table from now on.
>
> Co-authored-by: Han Zhou <hzhou8@ebay.com>
> Signed-off-by: Han Zhou <hzhou8@ebay.com>
> Signed-off-by: Lucas Alvares Gomes <lucasagomes@gmail.com>
> ---
>
> v1 - v2 -> Addressed Dumitru's comments: ovn-northd no longer monitoring
> the external_ids column from Chassis and Chassis_Private tables, added an
> item to the NEWS file regarding the backward compatibility of this patch
> and added an item to the TODO file regarding conditionally monitoring
> a set of columns.
>
>  NEWS                        |  6 +++++-
>  TODO.rst                    |  3 +++
>  controller/chassis.c        | 19 +++++++++++++++++--
>  controller/chassis.h        |  8 ++++++--
>  controller/ovn-controller.c | 37 ++++++++++++++++++++++++++++++------
>  lib/chassis-index.c         | 25 ++++++++++++++++++++++++
>  lib/chassis-index.h         |  6 ++++++
>  northd/ovn-northd.c         | 26 +++++++++++++++++++------
>  ovn-sb.ovsschema            | 13 +++++++++++--
>  ovn-sb.xml                  | 38 +++++++++++++++++++++++++++++++++----
>  tests/ovn-controller.at     | 26 +++++++++++++++++++++++++
>  11 files changed, 184 insertions(+), 23 deletions(-)
>
> diff --git a/NEWS b/NEWS
> index 393721d70..2192ea020 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -1,6 +1,10 @@
>  Post-v20.03.0
>  --------------------------
> -
> +   - The nb_cfg column from the Chassis table in the OVN Southbound
> +     database has been deprecated and is no longer updated. A new table
> +     called Chassis_Private now contains the nb_cfg column which is
updated
> +     by incrementing the value in the NB_Global table, CMSes relying on
> +     this mechanism should update their code to use this new table.
>
>  OVN v20.03.0 - xx xxx xxxx
>  --------------------------
> diff --git a/TODO.rst b/TODO.rst
> index 809d1c91c..6c565a21a 100644
> --- a/TODO.rst
> +++ b/TODO.rst
> @@ -25,6 +25,9 @@
>  OVN To-do List
>  ==============
>
> +* Add OVSDB conditional monitoring to a set of columns. Once implemented
> +  we could get rid of the Chassis and Chassis_Private table separation.
> +
>  * Get incremental updates in ovn-controller and ovn-northd in some
>    sensible way.
>
> diff --git a/controller/chassis.c b/controller/chassis.c
> index 522893ead..99ea6b8fc 100644
> --- a/controller/chassis.c
> +++ b/controller/chassis.c
> @@ -585,14 +585,18 @@ chassis_update(const struct sbrec_chassis
*chassis_rec,
>  const struct sbrec_chassis *
>  chassis_run(struct ovsdb_idl_txn *ovnsb_idl_txn,
>              struct ovsdb_idl_index *sbrec_chassis_by_name,
> +            struct ovsdb_idl_index *sbrec_chassis_private_by_name,
>              const struct ovsrec_open_vswitch_table *ovs_table,
>              const struct sbrec_chassis_table *chassis_table,
>              const char *chassis_id,
>              const struct ovsrec_bridge *br_int,
> -            const struct sset *transport_zones)
> +            const struct sset *transport_zones,
> +            const struct sbrec_chassis_private **chassis_private)
>  {
>      struct ovs_chassis_cfg ovs_cfg;
>
> +    *chassis_private = NULL;
> +
>      /* Get the chassis config from the ovs table. */
>      ovs_chassis_cfg_init(&ovs_cfg);
>      if (!chassis_parse_ovs_config(ovs_table, br_int, &ovs_cfg)) {
> @@ -616,6 +620,15 @@ chassis_run(struct ovsdb_idl_txn *ovnsb_idl_txn,
>                                    chassis_id);
>      }
>
> +    const struct sbrec_chassis_private *chassis_private_rec =
> +        chassis_private_lookup_by_name(sbrec_chassis_private_by_name,
> +                                       chassis_id);
> +    if (!chassis_private_rec && ovnsb_idl_txn) {
> +        chassis_private_rec =
sbrec_chassis_private_insert(ovnsb_idl_txn);
> +        sbrec_chassis_private_set_name(chassis_private_rec, chassis_id);
> +    }
> +    *chassis_private = chassis_private_rec;
> +
>      ovs_chassis_cfg_destroy(&ovs_cfg);
>      return chassis_rec;
>  }
> @@ -669,7 +682,8 @@ chassis_get_mac(const struct sbrec_chassis
*chassis_rec,
>   * required. */
>  bool
>  chassis_cleanup(struct ovsdb_idl_txn *ovnsb_idl_txn,
> -                const struct sbrec_chassis *chassis_rec)
> +                const struct sbrec_chassis *chassis_rec,
> +                const struct sbrec_chassis_private *chassis_private_rec)
>  {
>      if (!chassis_rec) {
>          return true;
> @@ -679,6 +693,7 @@ chassis_cleanup(struct ovsdb_idl_txn *ovnsb_idl_txn,
>                                    "ovn-controller: unregistering chassis
'%s'",
>                                    chassis_rec->name);
>          sbrec_chassis_delete(chassis_rec);
> +        sbrec_chassis_private_delete(chassis_private_rec);

Shall we check if chassis_private_rec is NULL? Or maybe assert if we are
sure when chassis_rec is not NULL then chassis_private_rec is never NULL?

>      }
>      return false;
>  }
> diff --git a/controller/chassis.h b/controller/chassis.h
> index 178d2957e..81055b403 100644
> --- a/controller/chassis.h
> +++ b/controller/chassis.h
> @@ -17,6 +17,7 @@
>  #define OVN_CHASSIS_H 1
>
>  #include <stdbool.h>
> +#include "lib/ovn-sb-idl.h"
>
>  struct ovsdb_idl;
>  struct ovsdb_idl_index;
> @@ -33,12 +34,15 @@ void chassis_register_ovs_idl(struct ovsdb_idl *);
>  const struct sbrec_chassis *chassis_run(
>      struct ovsdb_idl_txn *ovnsb_idl_txn,
>      struct ovsdb_idl_index *sbrec_chassis_by_name,
> +    struct ovsdb_idl_index *sbrec_chassis_private_by_name,
>      const struct ovsrec_open_vswitch_table *,
>      const struct sbrec_chassis_table *,
>      const char *chassis_id, const struct ovsrec_bridge *br_int,
> -    const struct sset *transport_zones);
> +    const struct sset *transport_zones,
> +    const struct sbrec_chassis_private **chassis_private);
>  bool chassis_cleanup(struct ovsdb_idl_txn *ovnsb_idl_txn,
> -                     const struct sbrec_chassis *);
> +                     const struct sbrec_chassis *,
> +                     const struct sbrec_chassis_private *);
>  bool chassis_get_mac(const struct sbrec_chassis *chassis,
>                       const char *bridge_mapping,
>                       struct eth_addr *chassis_mac);
> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
> index 2893eaac1..1542b2ad7 100644
> --- a/controller/ovn-controller.c
> +++ b/controller/ovn-controller.c
> @@ -155,6 +155,7 @@ update_sb_monitors(struct ovsdb_idl *ovnsb_idl,
>      struct ovsdb_idl_condition ce =  OVSDB_IDL_CONDITION_INIT(&ce);
>      struct ovsdb_idl_condition ip_mcast =
OVSDB_IDL_CONDITION_INIT(&ip_mcast);
>      struct ovsdb_idl_condition igmp = OVSDB_IDL_CONDITION_INIT(&igmp);
> +    struct ovsdb_idl_condition chprv = OVSDB_IDL_CONDITION_INIT(&chprv);
>
>      if (monitor_all) {
>          ovsdb_idl_condition_add_clause_true(&pb);
> @@ -165,6 +166,7 @@ update_sb_monitors(struct ovsdb_idl *ovnsb_idl,
>          ovsdb_idl_condition_add_clause_true(&ce);
>          ovsdb_idl_condition_add_clause_true(&ip_mcast);
>          ovsdb_idl_condition_add_clause_true(&igmp);
> +        ovsdb_idl_condition_add_clause_true(&chprv);
>          goto out;
>      }
>
> @@ -196,6 +198,10 @@ update_sb_monitors(struct ovsdb_idl *ovnsb_idl,
>
 &chassis->header_.uuid);
>          sbrec_igmp_group_add_clause_chassis(&igmp, OVSDB_F_EQ,
>                                              &chassis->header_.uuid);
> +
> +        /* Monitors Chassis_Private record for current chassis only */
> +        sbrec_chassis_private_add_clause_name(&chprv, OVSDB_F_EQ,
> +                                              chassis->name);
>      }
>      if (local_ifaces) {
>          const char *name;
> @@ -229,6 +235,7 @@ out:
>      sbrec_controller_event_set_condition(ovnsb_idl, &ce);
>      sbrec_ip_multicast_set_condition(ovnsb_idl, &ip_mcast);
>      sbrec_igmp_group_set_condition(ovnsb_idl, &igmp);
> +    sbrec_chassis_private_set_condition(ovnsb_idl, &chprv);
>      ovsdb_idl_condition_destroy(&pb);
>      ovsdb_idl_condition_destroy(&lf);
>      ovsdb_idl_condition_destroy(&mb);
> @@ -237,6 +244,7 @@ out:
>      ovsdb_idl_condition_destroy(&ce);
>      ovsdb_idl_condition_destroy(&ip_mcast);
>      ovsdb_idl_condition_destroy(&igmp);
> +    ovsdb_idl_condition_destroy(&chprv);
>  }
>
>  static const char *
> @@ -1759,6 +1767,8 @@ main(int argc, char *argv[])
>
>      struct ovsdb_idl_index *sbrec_chassis_by_name
>          = chassis_index_create(ovnsb_idl_loop.idl);
> +    struct ovsdb_idl_index *sbrec_chassis_private_by_name
> +        = chassis_private_index_create(ovnsb_idl_loop.idl);
>      struct ovsdb_idl_index *sbrec_multicast_group_by_name_datapath
>          = mcast_group_index_create(ovnsb_idl_loop.idl);
>      struct ovsdb_idl_index *sbrec_port_binding_by_name
> @@ -1784,7 +1794,8 @@ main(int argc, char *argv[])
>          = igmp_group_index_create(ovnsb_idl_loop.idl);
>
>      ovsdb_idl_track_add_all(ovnsb_idl_loop.idl);
> -    ovsdb_idl_omit_alert(ovnsb_idl_loop.idl, &sbrec_chassis_col_nb_cfg);
> +    ovsdb_idl_omit_alert(ovnsb_idl_loop.idl,
> +                         &sbrec_chassis_private_col_nb_cfg);
>
>      /* Omit the external_ids column of all the tables except for -
>       *  - DNS. pinctrl.c uses the external_ids column of DNS,
> @@ -1820,6 +1831,10 @@ main(int argc, char *argv[])
>      ovsdb_idl_omit(ovnsb_idl_loop.idl, &sbrec_connection_col_status);
>      ovsdb_idl_omit(ovnsb_idl_loop.idl, &sbrec_connection_col_target);
>
> +    /* Do not monitor Chassis_Private external_ids */
> +    ovsdb_idl_omit(ovnsb_idl_loop.idl,
> +                   &sbrec_chassis_private_col_external_ids);
> +
>      update_sb_monitors(ovnsb_idl_loop.idl, NULL, NULL, NULL, false);
>
>      stopwatch_create(CONTROLLER_LOOP_STOPWATCH_NAME, SW_MS);
> @@ -1997,10 +2012,13 @@ main(int argc, char *argv[])
>                  process_br_int(ovs_idl_txn, bridge_table, ovs_table);
>              const char *chassis_id = get_ovs_chassis_id(ovs_table);
>              const struct sbrec_chassis *chassis = NULL;
> +            const struct sbrec_chassis_private *chassis_private = NULL;
>              if (chassis_id) {
>                  chassis = chassis_run(ovnsb_idl_txn,
sbrec_chassis_by_name,
> +                                      sbrec_chassis_private_by_name,
>                                        ovs_table, chassis_table,
chassis_id,
> -                                      br_int, &transport_zones);
> +                                      br_int, &transport_zones,
> +                                      &chassis_private);
>              }
>
>              if (br_int) {
> @@ -2125,10 +2143,10 @@ main(int argc, char *argv[])
>                  engine_set_force_recompute(false);
>              }
>
> -            if (ovnsb_idl_txn && chassis) {
> +            if (ovnsb_idl_txn && chassis_private) {
>                  int64_t cur_cfg = ofctrl_get_cur_cfg();
> -                if (cur_cfg && cur_cfg != chassis->nb_cfg) {
> -                    sbrec_chassis_set_nb_cfg(chassis, cur_cfg);
> +                if (cur_cfg && cur_cfg != chassis_private->nb_cfg) {
> +                    sbrec_chassis_private_set_nb_cfg(chassis_private,
cur_cfg);
>                  }
>              }
>
> @@ -2231,10 +2249,17 @@ main(int argc, char *argv[])
>                     ? chassis_lookup_by_name(sbrec_chassis_by_name,
chassis_id)
>                     : NULL);
>
> +            const struct sbrec_chassis_private *chassis_private
> +                = (chassis_id
> +                   ? chassis_private_lookup_by_name(
> +                         sbrec_chassis_private_by_name, chassis_id)
> +                   : NULL);
> +
>              /* Run all of the cleanup functions, even if one of them
returns
>               * false. We're done if all of them return true. */
>              done = binding_cleanup(ovnsb_idl_txn, port_binding_table,
chassis);
> -            done = chassis_cleanup(ovnsb_idl_txn, chassis) && done;
> +            done = chassis_cleanup(ovnsb_idl_txn,
> +                                   chassis, chassis_private) && done;
>              done = encaps_cleanup(ovs_idl_txn, br_int) && done;
>              done = igmp_group_cleanup(ovnsb_idl_txn, sbrec_igmp_group)
&& done;
>              if (done) {
> diff --git a/lib/chassis-index.c b/lib/chassis-index.c
> index 39066f4cc..d760124b4 100644
> --- a/lib/chassis-index.c
> +++ b/lib/chassis-index.c
> @@ -40,6 +40,31 @@ chassis_lookup_by_name(struct ovsdb_idl_index
*sbrec_chassis_by_name,
>      return retval;
>  }
>
> +struct ovsdb_idl_index *
> +chassis_private_index_create(struct ovsdb_idl *idl)
> +{
> +    return ovsdb_idl_index_create1(idl, &sbrec_chassis_private_col_name);
> +}
> +
> +/* Finds and returns the chassis with the given 'name', or NULL if no
such
> + * chassis exists. */
> +const struct sbrec_chassis_private *
> +chassis_private_lookup_by_name(
> +    struct ovsdb_idl_index *sbrec_chassis_private_by_name,
> +    const char *name)
> +{
> +    struct sbrec_chassis_private *target =
> +
 sbrec_chassis_private_index_init_row(sbrec_chassis_private_by_name);
> +    sbrec_chassis_private_index_set_name(target, name);
> +
> +    struct sbrec_chassis_private *retval =
sbrec_chassis_private_index_find(
> +        sbrec_chassis_private_by_name, target);
> +
> +    sbrec_chassis_private_index_destroy_row(target);
> +
> +    return retval;
> +}
> +
>  struct ovsdb_idl_index *
>  ha_chassis_group_index_create(struct ovsdb_idl *idl)
>  {
> diff --git a/lib/chassis-index.h b/lib/chassis-index.h
> index 302e5f0fd..b9b331f34 100644
> --- a/lib/chassis-index.h
> +++ b/lib/chassis-index.h
> @@ -23,6 +23,12 @@ struct ovsdb_idl_index *chassis_index_create(struct
ovsdb_idl *);
>  const struct sbrec_chassis *chassis_lookup_by_name(
>      struct ovsdb_idl_index *sbrec_chassis_by_name, const char *name);
>
> +struct ovsdb_idl_index *chassis_private_index_create(struct ovsdb_idl *);
> +
> +const struct sbrec_chassis_private *
> +chassis_private_lookup_by_name(
> +    struct ovsdb_idl_index *sbrec_chassis_private_by_name, const char
*name);
> +
>  struct ovsdb_idl_index *ha_chassis_group_index_create(struct ovsdb_idl
*idl);
>  const struct sbrec_ha_chassis_group *ha_chassis_group_lookup_by_name(
>      struct ovsdb_idl_index *sbrec_ha_chassis_grp_by_name, const char
*name);
> diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
> index d42a9892a..8ad3842e8 100644
> --- a/northd/ovn-northd.c
> +++ b/northd/ovn-northd.c
> @@ -11174,6 +11174,11 @@ static const char *rbac_chassis_auth[] =
>  static const char *rbac_chassis_update[] =
>      {"nb_cfg", "external_ids", "encaps", "vtep_logical_switches"};
>
> +static const char *rbac_chassis_private_auth[] =
> +    {"chassis_name"};
> +static const char *rbac_chassis_private_update[] =
> +    {"nb_cfg"};
> +
>  static const char *rbac_encap_auth[] =
>      {"chassis_name"};
>  static const char *rbac_encap_update[] =
> @@ -11211,6 +11216,14 @@ static struct rbac_perm_cfg {
>          .update = rbac_chassis_update,
>          .n_update = ARRAY_SIZE(rbac_chassis_update),
>          .row = NULL
> +    },{
> +        .table = "Chassis_Private",
> +        .auth = rbac_chassis_private_auth,
> +        .n_auth = ARRAY_SIZE(rbac_chassis_private_auth),
> +        .insdel = true,
> +        .update = rbac_chassis_private_update,
> +        .n_update = ARRAY_SIZE(rbac_chassis_private_update),
> +        .row = NULL
>      },{
>          .table = "Encap",
>          .auth = rbac_encap_auth,
> @@ -11380,11 +11393,10 @@ update_northbound_cfg(struct northd_context
*ctx,
>      /* Update northbound hv_cfg if appropriate. */
>      if (nbg) {
>          /* Find minimum nb_cfg among all chassis. */
> -        const struct sbrec_chassis *chassis;
> +        const struct sbrec_chassis_private *chassis;
>          int64_t hv_cfg = nbg->nb_cfg;
> -        SBREC_CHASSIS_FOR_EACH (chassis, ctx->ovnsb_idl) {
> -            if (!smap_get_bool(&chassis->external_ids, "is-remote",
false) &&
> -                chassis->nb_cfg < hv_cfg) {
> +        SBREC_CHASSIS_PRIVATE_FOR_EACH (chassis, ctx->ovnsb_idl) {
> +            if (chassis->nb_cfg < hv_cfg) {

As mentioned by Dumitru in v1 review, we need to preserve the "is-remote"
check. Otherwise, when interconnection is used the ovn-nbctl --wait=hv sync
command will always fail because remote gateways' nb_cfg record will never
get updated.

>                  hv_cfg = chassis->nb_cfg;
>              }
>          }
> @@ -11670,9 +11682,11 @@ main(int argc, char *argv[])
>      ovsdb_idl_add_column(ovnsb_idl_loop.idl,
&sbrec_meter_band_col_burst_size);
>
>      ovsdb_idl_add_table(ovnsb_idl_loop.idl, &sbrec_table_chassis);
> -    ovsdb_idl_add_column(ovnsb_idl_loop.idl, &sbrec_chassis_col_nb_cfg);
>      ovsdb_idl_add_column(ovnsb_idl_loop.idl, &sbrec_chassis_col_name);
> -    ovsdb_idl_add_column(ovnsb_idl_loop.idl,
&sbrec_chassis_col_external_ids);

For the reason above we can't omit the external_ids monitoring. It is used
to check if a chassis is remote.

> +
> +    ovsdb_idl_add_table(ovnsb_idl_loop.idl,
&sbrec_table_chassis_private);
> +    ovsdb_idl_add_column(ovnsb_idl_loop.idl,
> +                         &sbrec_chassis_private_col_nb_cfg);
>
>      ovsdb_idl_add_table(ovnsb_idl_loop.idl, &sbrec_table_ha_chassis);
>      add_column_noalert(ovnsb_idl_loop.idl,
> diff --git a/ovn-sb.ovsschema b/ovn-sb.ovsschema
> index d89f8dbbb..fa53c6a4c 100644
> --- a/ovn-sb.ovsschema
> +++ b/ovn-sb.ovsschema
> @@ -1,7 +1,7 @@
>  {
>      "name": "OVN_Southbound",
> -    "version": "2.7.0",
> -    "cksum": "4286723485 21693",
> +    "version": "2.8.0",
> +    "cksum": "3504843097 22072",
>      "tables": {
>          "SB_Global": {
>              "columns": {
> @@ -43,6 +43,15 @@
>                                                "max": "unlimited"}}},
>              "isRoot": true,
>              "indexes": [["name"]]},
> +        "Chassis_Private": {
> +            "columns": {
> +                "name": {"type": "string"},
> +                "nb_cfg": {"type": {"key": "integer"}},
> +                "external_ids": {
> +                    "type": {"key": "string", "value": "string",
> +                             "min": 0, "max": "unlimited"}}},

I wonder if we really need external_ids column in this "private" table. It
is one-to-one mapping to the chassis table, so I assume whatever
external-ids should be able to be added in chassis table instead of here.

> +            "isRoot": true,
> +            "indexes": [["name"]]},
>          "Encap": {
>              "columns": {
>                  "type": {"type": {"key": {
> diff --git a/ovn-sb.xml b/ovn-sb.xml
> index 3ae9d4f92..77c5b1f38 100644
> --- a/ovn-sb.xml
> +++ b/ovn-sb.xml
> @@ -256,10 +256,8 @@
>      </column>
>
>      <column name="nb_cfg">
> -      Sequence number for the configuration.  When
<code>ovn-controller</code>
> -      updates the configuration of a chassis from the contents of the
> -      southbound database, it copies <ref table="SB_Global"
column="nb_cfg"/>
> -      from the <ref table="SB_Global"/> table into this column.
> +      Deprecated. This column is replaced by the <ref
table="Chassis_Private"
> +      column="nb_cfg"/> column of the <ref table="Chassis_Private"/>
table.
>      </column>
>
>      <column name="external_ids" key="ovn-bridge-mappings">
> @@ -366,6 +364,38 @@
>      </group>
>    </table>
>
> +  <table name="Chassis_Private" title="Chassis Private">
> +    <p>
> +      Each row in this table maintains per chassis private data that are
> +      accessed only by the owning chassis (write only) and ovn-northd,
not by
> +      any other chassis.  These data are stored in this separate table
instead
> +      of the <ref table="Chassis"/> table for performance considerations:
> +      the rows in this table can be conditionally monitored by chassises
so
> +      that each chassis only get update notifications for its own row,
to avoid
> +      unnecessary chassis private data update flooding in a large scale
> +      deployment.  (Future: this separation can be avoided if ovsdb
conditional
> +      monitoring is supported on a set of columns)
> +    </p>
> +
> +    <column name="name">
> +      The name of the chassis that owns these chassis-private data.
> +    </column>
> +
> +    <column name="nb_cfg">
> +      Sequence number for the configuration.  When
<code>ovn-controller</code>
> +      updates the configuration of a chassis from the contents of the
> +      southbound database, it copies <ref table="SB_Global"
column="nb_cfg"/>
> +      from the <ref table="SB_Global"/> table into this column.
> +    </column>
> +
> +    <group title="Common Columns">
> +      The overall purpose of these columns is described under
<code>Common
> +      Columns</code> at the beginning of this document.
> +
> +      <column name="external_ids"/>
> +    </group>
> +  </table>
> +
>    <table name="Encap" title="Encapsulation Types">
>      <p>
>        The <ref column="encaps" table="Chassis"/> column in the <ref
> diff --git a/tests/ovn-controller.at b/tests/ovn-controller.at
> index 63b2581c0..8c6e03611 100644
> --- a/tests/ovn-controller.at
> +++ b/tests/ovn-controller.at
> @@ -321,3 +321,29 @@ as ovn-sb
>  OVS_APP_EXIT_AND_WAIT([ovsdb-server])
>
>  AT_CLEANUP
> +
> +# Checks that ovn-controller increments the nb_cfg value in the
Chassis_Private table
> +AT_SETUP([ovn-controller - Bump Chassis_Private nb_cfg value])
> +AT_KEYWORDS([ovn])
> +ovn_start
> +
> +net_add n1
> +sim_add hv
> +as hv
> +ovs-vsctl add-br br-phys
> +ovn_attach n1 br-phys 192.168.0.1
> +
> +OVS_WAIT_UNTIL([test xhv = x`ovn-sbctl --columns name --bare find
chassis`])
> +
> +# Bump the NB_Global nb_cfg value
> +nb_global_id=$(ovn-nbctl --columns _uuid --bare find nb_global)
> +ovn-nbctl set NB_Global ${nb_global_id} nb_cfg=999
> +
> +# ovn-controller should bump the nb_cfg in the chassis_private table
> +OVS_WAIT_UNTIL([test x999 = x`ovn-sbctl --columns nb_cfg --bare find
chassis_private`])
> +
> +# Assert that the the nb_cfg from the Chassis table was not incremented
> +OVS_WAIT_UNTIL([test x0 = x`ovn-sbctl --columns nb_cfg --bare find
chassis`])
> +
> +OVN_CLEANUP([hv])
> +AT_CLEANUP
> --
> 2.25.2
>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Han Zhou July 29, 2020, 12:30 a.m. UTC | #3
On Fri, Mar 27, 2020 at 6:31 PM Han Zhou <hzhou@ovn.org> wrote:
>
> Hi Lucas,
>
> Thanks for v2. Please see my comments below.
>
> On Mon, Mar 23, 2020 at 8:19 AM <lmartins@redhat.com> wrote:
> >
> > From: Lucas Alvares Gomes <lucasagomes@gmail.com>
> >
> > nb_cfg as a mechanism to "ping" OVN control plane is very useful
> > in many ways. However, the current implementation will trigger
> > update notifications flooding in the whole control plane. Each
> > HV updates to SB the nb_cfg number and all these updates are
> > notified to all the other HVs, which is O(n^2). Although updates
> > are batched in fewers notifications than n^2, it still generates
> > significant load on SB DB and also on ovn-controllers.
> >
> > To solve this problem and make the mechanism more useful in large
> > scale producation deployment, this patch separates the per HV
> > *private* data (write only by the owning chassis and not
> > interesting to any other HVs) from the Chassis table to a separate
> > table, so that each HV can conditionally monitor and get updates
> > only for its own record.
> >
> > Test result shows great improvement:
> > In a test environment with 1K sandbox HVs, and 10K ports created
> > on 40 lswitches and 8 lrouters, do the sync test when the system
> > is idle, with command:
> >
> >     time ovn-nbctl --wait=hv sync
> >
> > Original result:
> > real    4m52.926s
> > user    0m0.328s
> > sys     0m0.004s
> >
> > With this patch:
> > real    0m11.405s
> > user    0m0.316s
> > sys     0m0.016s
> >
> > Also, regarding backwards compatibility note that the nb_cfg from the
> > Chassis table is no longer updated. If any system is relying on this
> > mechanism they should start using the nb_cfg from the Chassis_Private
> > table from now on.
> >
> > Co-authored-by: Han Zhou <hzhou8@ebay.com>
> > Signed-off-by: Han Zhou <hzhou8@ebay.com>
> > Signed-off-by: Lucas Alvares Gomes <lucasagomes@gmail.com>
> > ---
> >
> > v1 - v2 -> Addressed Dumitru's comments: ovn-northd no longer monitoring
> > the external_ids column from Chassis and Chassis_Private tables, added
an
> > item to the NEWS file regarding the backward compatibility of this patch
> > and added an item to the TODO file regarding conditionally monitoring
> > a set of columns.
> >
> >  NEWS                        |  6 +++++-
> >  TODO.rst                    |  3 +++
> >  controller/chassis.c        | 19 +++++++++++++++++--
> >  controller/chassis.h        |  8 ++++++--
> >  controller/ovn-controller.c | 37 ++++++++++++++++++++++++++++++------
> >  lib/chassis-index.c         | 25 ++++++++++++++++++++++++
> >  lib/chassis-index.h         |  6 ++++++
> >  northd/ovn-northd.c         | 26 +++++++++++++++++++------
> >  ovn-sb.ovsschema            | 13 +++++++++++--
> >  ovn-sb.xml                  | 38 +++++++++++++++++++++++++++++++++----
> >  tests/ovn-controller.at     | 26 +++++++++++++++++++++++++
> >  11 files changed, 184 insertions(+), 23 deletions(-)
> >
> > diff --git a/NEWS b/NEWS
> > index 393721d70..2192ea020 100644
> > --- a/NEWS
> > +++ b/NEWS
> > @@ -1,6 +1,10 @@
> >  Post-v20.03.0
> >  --------------------------
> > -
> > +   - The nb_cfg column from the Chassis table in the OVN Southbound
> > +     database has been deprecated and is no longer updated. A new table
> > +     called Chassis_Private now contains the nb_cfg column which is
updated
> > +     by incrementing the value in the NB_Global table, CMSes relying on
> > +     this mechanism should update their code to use this new table.
> >
> >  OVN v20.03.0 - xx xxx xxxx
> >  --------------------------
> > diff --git a/TODO.rst b/TODO.rst
> > index 809d1c91c..6c565a21a 100644
> > --- a/TODO.rst
> > +++ b/TODO.rst
> > @@ -25,6 +25,9 @@
> >  OVN To-do List
> >  ==============
> >
> > +* Add OVSDB conditional monitoring to a set of columns. Once
implemented
> > +  we could get rid of the Chassis and Chassis_Private table separation.
> > +
> >  * Get incremental updates in ovn-controller and ovn-northd in some
> >    sensible way.
> >
> > diff --git a/controller/chassis.c b/controller/chassis.c
> > index 522893ead..99ea6b8fc 100644
> > --- a/controller/chassis.c
> > +++ b/controller/chassis.c
> > @@ -585,14 +585,18 @@ chassis_update(const struct sbrec_chassis
*chassis_rec,
> >  const struct sbrec_chassis *
> >  chassis_run(struct ovsdb_idl_txn *ovnsb_idl_txn,
> >              struct ovsdb_idl_index *sbrec_chassis_by_name,
> > +            struct ovsdb_idl_index *sbrec_chassis_private_by_name,
> >              const struct ovsrec_open_vswitch_table *ovs_table,
> >              const struct sbrec_chassis_table *chassis_table,
> >              const char *chassis_id,
> >              const struct ovsrec_bridge *br_int,
> > -            const struct sset *transport_zones)
> > +            const struct sset *transport_zones,
> > +            const struct sbrec_chassis_private **chassis_private)
> >  {
> >      struct ovs_chassis_cfg ovs_cfg;
> >
> > +    *chassis_private = NULL;
> > +
> >      /* Get the chassis config from the ovs table. */
> >      ovs_chassis_cfg_init(&ovs_cfg);
> >      if (!chassis_parse_ovs_config(ovs_table, br_int, &ovs_cfg)) {
> > @@ -616,6 +620,15 @@ chassis_run(struct ovsdb_idl_txn *ovnsb_idl_txn,
> >                                    chassis_id);
> >      }
> >
> > +    const struct sbrec_chassis_private *chassis_private_rec =
> > +        chassis_private_lookup_by_name(sbrec_chassis_private_by_name,
> > +                                       chassis_id);
> > +    if (!chassis_private_rec && ovnsb_idl_txn) {
> > +        chassis_private_rec =
sbrec_chassis_private_insert(ovnsb_idl_txn);
> > +        sbrec_chassis_private_set_name(chassis_private_rec,
chassis_id);
> > +    }
> > +    *chassis_private = chassis_private_rec;
> > +
> >      ovs_chassis_cfg_destroy(&ovs_cfg);
> >      return chassis_rec;
> >  }
> > @@ -669,7 +682,8 @@ chassis_get_mac(const struct sbrec_chassis
*chassis_rec,
> >   * required. */
> >  bool
> >  chassis_cleanup(struct ovsdb_idl_txn *ovnsb_idl_txn,
> > -                const struct sbrec_chassis *chassis_rec)
> > +                const struct sbrec_chassis *chassis_rec,
> > +                const struct sbrec_chassis_private
*chassis_private_rec)
> >  {
> >      if (!chassis_rec) {
> >          return true;
> > @@ -679,6 +693,7 @@ chassis_cleanup(struct ovsdb_idl_txn *ovnsb_idl_txn,
> >                                    "ovn-controller: unregistering
chassis '%s'",
> >                                    chassis_rec->name);
> >          sbrec_chassis_delete(chassis_rec);
> > +        sbrec_chassis_private_delete(chassis_private_rec);
>
> Shall we check if chassis_private_rec is NULL? Or maybe assert if we are
sure when chassis_rec is not NULL then chassis_private_rec is never NULL?
>
> >      }
> >      return false;
> >  }
> > diff --git a/controller/chassis.h b/controller/chassis.h
> > index 178d2957e..81055b403 100644
> > --- a/controller/chassis.h
> > +++ b/controller/chassis.h
> > @@ -17,6 +17,7 @@
> >  #define OVN_CHASSIS_H 1
> >
> >  #include <stdbool.h>
> > +#include "lib/ovn-sb-idl.h"
> >
> >  struct ovsdb_idl;
> >  struct ovsdb_idl_index;
> > @@ -33,12 +34,15 @@ void chassis_register_ovs_idl(struct ovsdb_idl *);
> >  const struct sbrec_chassis *chassis_run(
> >      struct ovsdb_idl_txn *ovnsb_idl_txn,
> >      struct ovsdb_idl_index *sbrec_chassis_by_name,
> > +    struct ovsdb_idl_index *sbrec_chassis_private_by_name,
> >      const struct ovsrec_open_vswitch_table *,
> >      const struct sbrec_chassis_table *,
> >      const char *chassis_id, const struct ovsrec_bridge *br_int,
> > -    const struct sset *transport_zones);
> > +    const struct sset *transport_zones,
> > +    const struct sbrec_chassis_private **chassis_private);
> >  bool chassis_cleanup(struct ovsdb_idl_txn *ovnsb_idl_txn,
> > -                     const struct sbrec_chassis *);
> > +                     const struct sbrec_chassis *,
> > +                     const struct sbrec_chassis_private *);
> >  bool chassis_get_mac(const struct sbrec_chassis *chassis,
> >                       const char *bridge_mapping,
> >                       struct eth_addr *chassis_mac);
> > diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
> > index 2893eaac1..1542b2ad7 100644
> > --- a/controller/ovn-controller.c
> > +++ b/controller/ovn-controller.c
> > @@ -155,6 +155,7 @@ update_sb_monitors(struct ovsdb_idl *ovnsb_idl,
> >      struct ovsdb_idl_condition ce =  OVSDB_IDL_CONDITION_INIT(&ce);
> >      struct ovsdb_idl_condition ip_mcast =
OVSDB_IDL_CONDITION_INIT(&ip_mcast);
> >      struct ovsdb_idl_condition igmp = OVSDB_IDL_CONDITION_INIT(&igmp);
> > +    struct ovsdb_idl_condition chprv =
OVSDB_IDL_CONDITION_INIT(&chprv);
> >
> >      if (monitor_all) {
> >          ovsdb_idl_condition_add_clause_true(&pb);
> > @@ -165,6 +166,7 @@ update_sb_monitors(struct ovsdb_idl *ovnsb_idl,
> >          ovsdb_idl_condition_add_clause_true(&ce);
> >          ovsdb_idl_condition_add_clause_true(&ip_mcast);
> >          ovsdb_idl_condition_add_clause_true(&igmp);
> > +        ovsdb_idl_condition_add_clause_true(&chprv);
> >          goto out;
> >      }
> >
> > @@ -196,6 +198,10 @@ update_sb_monitors(struct ovsdb_idl *ovnsb_idl,
> >
 &chassis->header_.uuid);
> >          sbrec_igmp_group_add_clause_chassis(&igmp, OVSDB_F_EQ,
> >                                              &chassis->header_.uuid);
> > +
> > +        /* Monitors Chassis_Private record for current chassis only */
> > +        sbrec_chassis_private_add_clause_name(&chprv, OVSDB_F_EQ,
> > +                                              chassis->name);
> >      }
> >      if (local_ifaces) {
> >          const char *name;
> > @@ -229,6 +235,7 @@ out:
> >      sbrec_controller_event_set_condition(ovnsb_idl, &ce);
> >      sbrec_ip_multicast_set_condition(ovnsb_idl, &ip_mcast);
> >      sbrec_igmp_group_set_condition(ovnsb_idl, &igmp);
> > +    sbrec_chassis_private_set_condition(ovnsb_idl, &chprv);
> >      ovsdb_idl_condition_destroy(&pb);
> >      ovsdb_idl_condition_destroy(&lf);
> >      ovsdb_idl_condition_destroy(&mb);
> > @@ -237,6 +244,7 @@ out:
> >      ovsdb_idl_condition_destroy(&ce);
> >      ovsdb_idl_condition_destroy(&ip_mcast);
> >      ovsdb_idl_condition_destroy(&igmp);
> > +    ovsdb_idl_condition_destroy(&chprv);
> >  }
> >
> >  static const char *
> > @@ -1759,6 +1767,8 @@ main(int argc, char *argv[])
> >
> >      struct ovsdb_idl_index *sbrec_chassis_by_name
> >          = chassis_index_create(ovnsb_idl_loop.idl);
> > +    struct ovsdb_idl_index *sbrec_chassis_private_by_name
> > +        = chassis_private_index_create(ovnsb_idl_loop.idl);
> >      struct ovsdb_idl_index *sbrec_multicast_group_by_name_datapath
> >          = mcast_group_index_create(ovnsb_idl_loop.idl);
> >      struct ovsdb_idl_index *sbrec_port_binding_by_name
> > @@ -1784,7 +1794,8 @@ main(int argc, char *argv[])
> >          = igmp_group_index_create(ovnsb_idl_loop.idl);
> >
> >      ovsdb_idl_track_add_all(ovnsb_idl_loop.idl);
> > -    ovsdb_idl_omit_alert(ovnsb_idl_loop.idl,
&sbrec_chassis_col_nb_cfg);
> > +    ovsdb_idl_omit_alert(ovnsb_idl_loop.idl,
> > +                         &sbrec_chassis_private_col_nb_cfg);
> >
> >      /* Omit the external_ids column of all the tables except for -
> >       *  - DNS. pinctrl.c uses the external_ids column of DNS,
> > @@ -1820,6 +1831,10 @@ main(int argc, char *argv[])
> >      ovsdb_idl_omit(ovnsb_idl_loop.idl, &sbrec_connection_col_status);
> >      ovsdb_idl_omit(ovnsb_idl_loop.idl, &sbrec_connection_col_target);
> >
> > +    /* Do not monitor Chassis_Private external_ids */
> > +    ovsdb_idl_omit(ovnsb_idl_loop.idl,
> > +                   &sbrec_chassis_private_col_external_ids);
> > +
> >      update_sb_monitors(ovnsb_idl_loop.idl, NULL, NULL, NULL, false);
> >
> >      stopwatch_create(CONTROLLER_LOOP_STOPWATCH_NAME, SW_MS);
> > @@ -1997,10 +2012,13 @@ main(int argc, char *argv[])
> >                  process_br_int(ovs_idl_txn, bridge_table, ovs_table);
> >              const char *chassis_id = get_ovs_chassis_id(ovs_table);
> >              const struct sbrec_chassis *chassis = NULL;
> > +            const struct sbrec_chassis_private *chassis_private = NULL;
> >              if (chassis_id) {
> >                  chassis = chassis_run(ovnsb_idl_txn,
sbrec_chassis_by_name,
> > +                                      sbrec_chassis_private_by_name,
> >                                        ovs_table, chassis_table,
chassis_id,
> > -                                      br_int, &transport_zones);
> > +                                      br_int, &transport_zones,
> > +                                      &chassis_private);
> >              }
> >
> >              if (br_int) {
> > @@ -2125,10 +2143,10 @@ main(int argc, char *argv[])
> >                  engine_set_force_recompute(false);
> >              }
> >
> > -            if (ovnsb_idl_txn && chassis) {
> > +            if (ovnsb_idl_txn && chassis_private) {
> >                  int64_t cur_cfg = ofctrl_get_cur_cfg();
> > -                if (cur_cfg && cur_cfg != chassis->nb_cfg) {
> > -                    sbrec_chassis_set_nb_cfg(chassis, cur_cfg);
> > +                if (cur_cfg && cur_cfg != chassis_private->nb_cfg) {
> > +                    sbrec_chassis_private_set_nb_cfg(chassis_private,
cur_cfg);
> >                  }
> >              }
> >
> > @@ -2231,10 +2249,17 @@ main(int argc, char *argv[])
> >                     ? chassis_lookup_by_name(sbrec_chassis_by_name,
chassis_id)
> >                     : NULL);
> >
> > +            const struct sbrec_chassis_private *chassis_private
> > +                = (chassis_id
> > +                   ? chassis_private_lookup_by_name(
> > +                         sbrec_chassis_private_by_name, chassis_id)
> > +                   : NULL);
> > +
> >              /* Run all of the cleanup functions, even if one of them
returns
> >               * false. We're done if all of them return true. */
> >              done = binding_cleanup(ovnsb_idl_txn, port_binding_table,
chassis);
> > -            done = chassis_cleanup(ovnsb_idl_txn, chassis) && done;
> > +            done = chassis_cleanup(ovnsb_idl_txn,
> > +                                   chassis, chassis_private) && done;
> >              done = encaps_cleanup(ovs_idl_txn, br_int) && done;
> >              done = igmp_group_cleanup(ovnsb_idl_txn, sbrec_igmp_group)
&& done;
> >              if (done) {
> > diff --git a/lib/chassis-index.c b/lib/chassis-index.c
> > index 39066f4cc..d760124b4 100644
> > --- a/lib/chassis-index.c
> > +++ b/lib/chassis-index.c
> > @@ -40,6 +40,31 @@ chassis_lookup_by_name(struct ovsdb_idl_index
*sbrec_chassis_by_name,
> >      return retval;
> >  }
> >
> > +struct ovsdb_idl_index *
> > +chassis_private_index_create(struct ovsdb_idl *idl)
> > +{
> > +    return ovsdb_idl_index_create1(idl,
&sbrec_chassis_private_col_name);
> > +}
> > +
> > +/* Finds and returns the chassis with the given 'name', or NULL if no
such
> > + * chassis exists. */
> > +const struct sbrec_chassis_private *
> > +chassis_private_lookup_by_name(
> > +    struct ovsdb_idl_index *sbrec_chassis_private_by_name,
> > +    const char *name)
> > +{
> > +    struct sbrec_chassis_private *target =
> > +
 sbrec_chassis_private_index_init_row(sbrec_chassis_private_by_name);
> > +    sbrec_chassis_private_index_set_name(target, name);
> > +
> > +    struct sbrec_chassis_private *retval =
sbrec_chassis_private_index_find(
> > +        sbrec_chassis_private_by_name, target);
> > +
> > +    sbrec_chassis_private_index_destroy_row(target);
> > +
> > +    return retval;
> > +}
> > +
> >  struct ovsdb_idl_index *
> >  ha_chassis_group_index_create(struct ovsdb_idl *idl)
> >  {
> > diff --git a/lib/chassis-index.h b/lib/chassis-index.h
> > index 302e5f0fd..b9b331f34 100644
> > --- a/lib/chassis-index.h
> > +++ b/lib/chassis-index.h
> > @@ -23,6 +23,12 @@ struct ovsdb_idl_index *chassis_index_create(struct
ovsdb_idl *);
> >  const struct sbrec_chassis *chassis_lookup_by_name(
> >      struct ovsdb_idl_index *sbrec_chassis_by_name, const char *name);
> >
> > +struct ovsdb_idl_index *chassis_private_index_create(struct ovsdb_idl
*);
> > +
> > +const struct sbrec_chassis_private *
> > +chassis_private_lookup_by_name(
> > +    struct ovsdb_idl_index *sbrec_chassis_private_by_name, const char
*name);
> > +
> >  struct ovsdb_idl_index *ha_chassis_group_index_create(struct ovsdb_idl
*idl);
> >  const struct sbrec_ha_chassis_group *ha_chassis_group_lookup_by_name(
> >      struct ovsdb_idl_index *sbrec_ha_chassis_grp_by_name, const char
*name);
> > diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
> > index d42a9892a..8ad3842e8 100644
> > --- a/northd/ovn-northd.c
> > +++ b/northd/ovn-northd.c
> > @@ -11174,6 +11174,11 @@ static const char *rbac_chassis_auth[] =
> >  static const char *rbac_chassis_update[] =
> >      {"nb_cfg", "external_ids", "encaps", "vtep_logical_switches"};
> >
> > +static const char *rbac_chassis_private_auth[] =
> > +    {"chassis_name"};
> > +static const char *rbac_chassis_private_update[] =
> > +    {"nb_cfg"};
> > +
> >  static const char *rbac_encap_auth[] =
> >      {"chassis_name"};
> >  static const char *rbac_encap_update[] =
> > @@ -11211,6 +11216,14 @@ static struct rbac_perm_cfg {
> >          .update = rbac_chassis_update,
> >          .n_update = ARRAY_SIZE(rbac_chassis_update),
> >          .row = NULL
> > +    },{
> > +        .table = "Chassis_Private",
> > +        .auth = rbac_chassis_private_auth,
> > +        .n_auth = ARRAY_SIZE(rbac_chassis_private_auth),
> > +        .insdel = true,
> > +        .update = rbac_chassis_private_update,
> > +        .n_update = ARRAY_SIZE(rbac_chassis_private_update),
> > +        .row = NULL
> >      },{
> >          .table = "Encap",
> >          .auth = rbac_encap_auth,
> > @@ -11380,11 +11393,10 @@ update_northbound_cfg(struct northd_context
*ctx,
> >      /* Update northbound hv_cfg if appropriate. */
> >      if (nbg) {
> >          /* Find minimum nb_cfg among all chassis. */
> > -        const struct sbrec_chassis *chassis;
> > +        const struct sbrec_chassis_private *chassis;
> >          int64_t hv_cfg = nbg->nb_cfg;
> > -        SBREC_CHASSIS_FOR_EACH (chassis, ctx->ovnsb_idl) {
> > -            if (!smap_get_bool(&chassis->external_ids, "is-remote",
false) &&
> > -                chassis->nb_cfg < hv_cfg) {
> > +        SBREC_CHASSIS_PRIVATE_FOR_EACH (chassis, ctx->ovnsb_idl) {
> > +            if (chassis->nb_cfg < hv_cfg) {
>
> As mentioned by Dumitru in v1 review, we need to preserve the "is-remote"
check. Otherwise, when interconnection is used the ovn-nbctl --wait=hv sync
command will always fail because remote gateways' nb_cfg record will never
get updated.
>
> >                  hv_cfg = chassis->nb_cfg;
> >              }
> >          }
> > @@ -11670,9 +11682,11 @@ main(int argc, char *argv[])
> >      ovsdb_idl_add_column(ovnsb_idl_loop.idl,
&sbrec_meter_band_col_burst_size);
> >
> >      ovsdb_idl_add_table(ovnsb_idl_loop.idl, &sbrec_table_chassis);
> > -    ovsdb_idl_add_column(ovnsb_idl_loop.idl,
&sbrec_chassis_col_nb_cfg);
> >      ovsdb_idl_add_column(ovnsb_idl_loop.idl, &sbrec_chassis_col_name);
> > -    ovsdb_idl_add_column(ovnsb_idl_loop.idl,
&sbrec_chassis_col_external_ids);
>
> For the reason above we can't omit the external_ids monitoring. It is
used to check if a chassis is remote.
>
> > +
> > +    ovsdb_idl_add_table(ovnsb_idl_loop.idl,
&sbrec_table_chassis_private);
> > +    ovsdb_idl_add_column(ovnsb_idl_loop.idl,
> > +                         &sbrec_chassis_private_col_nb_cfg);
> >
> >      ovsdb_idl_add_table(ovnsb_idl_loop.idl, &sbrec_table_ha_chassis);
> >      add_column_noalert(ovnsb_idl_loop.idl,
> > diff --git a/ovn-sb.ovsschema b/ovn-sb.ovsschema
> > index d89f8dbbb..fa53c6a4c 100644
> > --- a/ovn-sb.ovsschema
> > +++ b/ovn-sb.ovsschema
> > @@ -1,7 +1,7 @@
> >  {
> >      "name": "OVN_Southbound",
> > -    "version": "2.7.0",
> > -    "cksum": "4286723485 21693",
> > +    "version": "2.8.0",
> > +    "cksum": "3504843097 22072",
> >      "tables": {
> >          "SB_Global": {
> >              "columns": {
> > @@ -43,6 +43,15 @@
> >                                                "max": "unlimited"}}},
> >              "isRoot": true,
> >              "indexes": [["name"]]},
> > +        "Chassis_Private": {
> > +            "columns": {
> > +                "name": {"type": "string"},
> > +                "nb_cfg": {"type": {"key": "integer"}},
> > +                "external_ids": {
> > +                    "type": {"key": "string", "value": "string",
> > +                             "min": 0, "max": "unlimited"}}},
>
> I wonder if we really need external_ids column in this "private" table.
It is one-to-one mapping to the chassis table, so I assume whatever
external-ids should be able to be added in chassis table instead of here.
>
> > +            "isRoot": true,
> > +            "indexes": [["name"]]},
> >          "Encap": {
> >              "columns": {
> >                  "type": {"type": {"key": {
> > diff --git a/ovn-sb.xml b/ovn-sb.xml
> > index 3ae9d4f92..77c5b1f38 100644
> > --- a/ovn-sb.xml
> > +++ b/ovn-sb.xml
> > @@ -256,10 +256,8 @@
> >      </column>
> >
> >      <column name="nb_cfg">
> > -      Sequence number for the configuration.  When
<code>ovn-controller</code>
> > -      updates the configuration of a chassis from the contents of the
> > -      southbound database, it copies <ref table="SB_Global"
column="nb_cfg"/>
> > -      from the <ref table="SB_Global"/> table into this column.
> > +      Deprecated. This column is replaced by the <ref
table="Chassis_Private"
> > +      column="nb_cfg"/> column of the <ref table="Chassis_Private"/>
table.
> >      </column>
> >
> >      <column name="external_ids" key="ovn-bridge-mappings">
> > @@ -366,6 +364,38 @@
> >      </group>
> >    </table>
> >
> > +  <table name="Chassis_Private" title="Chassis Private">
> > +    <p>
> > +      Each row in this table maintains per chassis private data that
are
> > +      accessed only by the owning chassis (write only) and ovn-northd,
not by
> > +      any other chassis.  These data are stored in this separate table
instead
> > +      of the <ref table="Chassis"/> table for performance
considerations:
> > +      the rows in this table can be conditionally monitored by
chassises so
> > +      that each chassis only get update notifications for its own row,
to avoid
> > +      unnecessary chassis private data update flooding in a large scale
> > +      deployment.  (Future: this separation can be avoided if ovsdb
conditional
> > +      monitoring is supported on a set of columns)
> > +    </p>
> > +
> > +    <column name="name">
> > +      The name of the chassis that owns these chassis-private data.
> > +    </column>
> > +
> > +    <column name="nb_cfg">
> > +      Sequence number for the configuration.  When
<code>ovn-controller</code>
> > +      updates the configuration of a chassis from the contents of the
> > +      southbound database, it copies <ref table="SB_Global"
column="nb_cfg"/>
> > +      from the <ref table="SB_Global"/> table into this column.
> > +    </column>
> > +
> > +    <group title="Common Columns">
> > +      The overall purpose of these columns is described under
<code>Common
> > +      Columns</code> at the beginning of this document.
> > +
> > +      <column name="external_ids"/>
> > +    </group>
> > +  </table>
> > +
> >    <table name="Encap" title="Encapsulation Types">
> >      <p>
> >        The <ref column="encaps" table="Chassis"/> column in the <ref
> > diff --git a/tests/ovn-controller.at b/tests/ovn-controller.at
> > index 63b2581c0..8c6e03611 100644
> > --- a/tests/ovn-controller.at
> > +++ b/tests/ovn-controller.at
> > @@ -321,3 +321,29 @@ as ovn-sb
> >  OVS_APP_EXIT_AND_WAIT([ovsdb-server])
> >
> >  AT_CLEANUP
> > +
> > +# Checks that ovn-controller increments the nb_cfg value in the
Chassis_Private table
> > +AT_SETUP([ovn-controller - Bump Chassis_Private nb_cfg value])
> > +AT_KEYWORDS([ovn])
> > +ovn_start
> > +
> > +net_add n1
> > +sim_add hv
> > +as hv
> > +ovs-vsctl add-br br-phys
> > +ovn_attach n1 br-phys 192.168.0.1
> > +
> > +OVS_WAIT_UNTIL([test xhv = x`ovn-sbctl --columns name --bare find
chassis`])
> > +
> > +# Bump the NB_Global nb_cfg value
> > +nb_global_id=$(ovn-nbctl --columns _uuid --bare find nb_global)
> > +ovn-nbctl set NB_Global ${nb_global_id} nb_cfg=999
> > +
> > +# ovn-controller should bump the nb_cfg in the chassis_private table
> > +OVS_WAIT_UNTIL([test x999 = x`ovn-sbctl --columns nb_cfg --bare find
chassis_private`])
> > +
> > +# Assert that the the nb_cfg from the Chassis table was not incremented
> > +OVS_WAIT_UNTIL([test x0 = x`ovn-sbctl --columns nb_cfg --bare find
chassis`])
> > +
> > +OVN_CLEANUP([hv])
> > +AT_CLEANUP
> > --
> > 2.25.2
> >
> > _______________________________________________
> > dev mailing list
> > dev@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>

Hi Lucas,

Just to follow up. I didn't find any new versions for this patch and it
doesn't seem to be merged. Did you send any new versions or do you still
plan to work on this?

Thanks,
Han
Lucas Alvares Gomes July 29, 2020, 9:46 a.m. UTC | #4
Hi Han,

>
> Just to follow up. I didn't find any new versions for this patch and it
> doesn't seem to be merged. Did you send any new versions or do you still
> plan to work on this?
>

Thanks for this follow up.

So I am not currently working on it but the work is still in our
backlog. Many of the problems that we had with the mechanism
incrementing the "nb_cfg" has been mitigated by some code in the
OpenStack side as well as this commit for core OVN [0] that disabled
flow recomputation upon updates to the Chassis' external_ids column
(we had a few updates to that column every time the nb_cfg was
bumped).

But, I can try to check if I can get some prioritization and start
working again on this patch. Is this something that you need as well ?

[0] https://github.com/ovn-org/ovn/commit/74d90c2223d0a8c123823fb849b4c2de58c296e4

Cheers,
Lucas
Han Zhou July 29, 2020, 7:02 p.m. UTC | #5
On Wed, Jul 29, 2020 at 2:46 AM Lucas Alvares Gomes <lucasagomes@gmail.com>
wrote:
>
> Hi Han,
>
> >
> > Just to follow up. I didn't find any new versions for this patch and it
> > doesn't seem to be merged. Did you send any new versions or do you still
> > plan to work on this?
> >
>
> Thanks for this follow up.
>
> So I am not currently working on it but the work is still in our
> backlog. Many of the problems that we had with the mechanism
> incrementing the "nb_cfg" has been mitigated by some code in the
> OpenStack side as well as this commit for core OVN [0] that disabled
> flow recomputation upon updates to the Chassis' external_ids column
> (we had a few updates to that column every time the nb_cfg was
> bumped).
>
> But, I can try to check if I can get some prioritization and start
> working again on this patch. Is this something that you need as well ?
>
> [0]
https://github.com/ovn-org/ovn/commit/74d90c2223d0a8c123823fb849b4c2de58c296e4
>
> Cheers,
> Lucas

Thanks Lucas for the update.
Yes, I need it for latency analysis in scale test. I can pick it up. I
rebased on master and also fixed the RBAC issue and some minor updates. I
will post it later this week.

Thanks,
Han
Lucas Martins July 30, 2020, 9:05 a.m. UTC | #6
On Wed, Jul 29, 2020 at 8:03 PM Han Zhou <hzhou@ovn.org> wrote:
>
>
>
> On Wed, Jul 29, 2020 at 2:46 AM Lucas Alvares Gomes <lucasagomes@gmail.com> wrote:
> >
> > Hi Han,
> >
> > >
> > > Just to follow up. I didn't find any new versions for this patch and it
> > > doesn't seem to be merged. Did you send any new versions or do you still
> > > plan to work on this?
> > >
> >
> > Thanks for this follow up.
> >
> > So I am not currently working on it but the work is still in our
> > backlog. Many of the problems that we had with the mechanism
> > incrementing the "nb_cfg" has been mitigated by some code in the
> > OpenStack side as well as this commit for core OVN [0] that disabled
> > flow recomputation upon updates to the Chassis' external_ids column
> > (we had a few updates to that column every time the nb_cfg was
> > bumped).
> >
> > But, I can try to check if I can get some prioritization and start
> > working again on this patch. Is this something that you need as well ?
> >
> > [0] https://github.com/ovn-org/ovn/commit/74d90c2223d0a8c123823fb849b4c2de58c296e4
> >
> > Cheers,
> > Lucas
>
> Thanks Lucas for the update.
> Yes, I need it for latency analysis in scale test. I can pick it up. I rebased on master and also fixed the RBAC issue and some minor updates. I will post it later this week.
>

Thank you very much Han!
diff mbox series

Patch

diff --git a/NEWS b/NEWS
index 393721d70..2192ea020 100644
--- a/NEWS
+++ b/NEWS
@@ -1,6 +1,10 @@ 
 Post-v20.03.0
 --------------------------
-
+   - The nb_cfg column from the Chassis table in the OVN Southbound
+     database has been deprecated and is no longer updated. A new table
+     called Chassis_Private now contains the nb_cfg column which is updated
+     by incrementing the value in the NB_Global table, CMSes relying on
+     this mechanism should update their code to use this new table.

 OVN v20.03.0 - xx xxx xxxx
 --------------------------
diff --git a/TODO.rst b/TODO.rst
index 809d1c91c..6c565a21a 100644
--- a/TODO.rst
+++ b/TODO.rst
@@ -25,6 +25,9 @@ 
 OVN To-do List
 ==============

+* Add OVSDB conditional monitoring to a set of columns. Once implemented
+  we could get rid of the Chassis and Chassis_Private table separation.
+
 * Get incremental updates in ovn-controller and ovn-northd in some
   sensible way.

diff --git a/controller/chassis.c b/controller/chassis.c
index 522893ead..99ea6b8fc 100644
--- a/controller/chassis.c
+++ b/controller/chassis.c
@@ -585,14 +585,18 @@  chassis_update(const struct sbrec_chassis *chassis_rec,
 const struct sbrec_chassis *
 chassis_run(struct ovsdb_idl_txn *ovnsb_idl_txn,
             struct ovsdb_idl_index *sbrec_chassis_by_name,
+            struct ovsdb_idl_index *sbrec_chassis_private_by_name,
             const struct ovsrec_open_vswitch_table *ovs_table,
             const struct sbrec_chassis_table *chassis_table,
             const char *chassis_id,
             const struct ovsrec_bridge *br_int,
-            const struct sset *transport_zones)
+            const struct sset *transport_zones,
+            const struct sbrec_chassis_private **chassis_private)
 {
     struct ovs_chassis_cfg ovs_cfg;

+    *chassis_private = NULL;
+
     /* Get the chassis config from the ovs table. */
     ovs_chassis_cfg_init(&ovs_cfg);
     if (!chassis_parse_ovs_config(ovs_table, br_int, &ovs_cfg)) {
@@ -616,6 +620,15 @@  chassis_run(struct ovsdb_idl_txn *ovnsb_idl_txn,
                                   chassis_id);
     }

+    const struct sbrec_chassis_private *chassis_private_rec =
+        chassis_private_lookup_by_name(sbrec_chassis_private_by_name,
+                                       chassis_id);
+    if (!chassis_private_rec && ovnsb_idl_txn) {
+        chassis_private_rec = sbrec_chassis_private_insert(ovnsb_idl_txn);
+        sbrec_chassis_private_set_name(chassis_private_rec, chassis_id);
+    }
+    *chassis_private = chassis_private_rec;
+
     ovs_chassis_cfg_destroy(&ovs_cfg);
     return chassis_rec;
 }
@@ -669,7 +682,8 @@  chassis_get_mac(const struct sbrec_chassis *chassis_rec,
  * required. */
 bool
 chassis_cleanup(struct ovsdb_idl_txn *ovnsb_idl_txn,
-                const struct sbrec_chassis *chassis_rec)
+                const struct sbrec_chassis *chassis_rec,
+                const struct sbrec_chassis_private *chassis_private_rec)
 {
     if (!chassis_rec) {
         return true;
@@ -679,6 +693,7 @@  chassis_cleanup(struct ovsdb_idl_txn *ovnsb_idl_txn,
                                   "ovn-controller: unregistering chassis '%s'",
                                   chassis_rec->name);
         sbrec_chassis_delete(chassis_rec);
+        sbrec_chassis_private_delete(chassis_private_rec);
     }
     return false;
 }
diff --git a/controller/chassis.h b/controller/chassis.h
index 178d2957e..81055b403 100644
--- a/controller/chassis.h
+++ b/controller/chassis.h
@@ -17,6 +17,7 @@ 
 #define OVN_CHASSIS_H 1

 #include <stdbool.h>
+#include "lib/ovn-sb-idl.h"

 struct ovsdb_idl;
 struct ovsdb_idl_index;
@@ -33,12 +34,15 @@  void chassis_register_ovs_idl(struct ovsdb_idl *);
 const struct sbrec_chassis *chassis_run(
     struct ovsdb_idl_txn *ovnsb_idl_txn,
     struct ovsdb_idl_index *sbrec_chassis_by_name,
+    struct ovsdb_idl_index *sbrec_chassis_private_by_name,
     const struct ovsrec_open_vswitch_table *,
     const struct sbrec_chassis_table *,
     const char *chassis_id, const struct ovsrec_bridge *br_int,
-    const struct sset *transport_zones);
+    const struct sset *transport_zones,
+    const struct sbrec_chassis_private **chassis_private);
 bool chassis_cleanup(struct ovsdb_idl_txn *ovnsb_idl_txn,
-                     const struct sbrec_chassis *);
+                     const struct sbrec_chassis *,
+                     const struct sbrec_chassis_private *);
 bool chassis_get_mac(const struct sbrec_chassis *chassis,
                      const char *bridge_mapping,
                      struct eth_addr *chassis_mac);
diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
index 2893eaac1..1542b2ad7 100644
--- a/controller/ovn-controller.c
+++ b/controller/ovn-controller.c
@@ -155,6 +155,7 @@  update_sb_monitors(struct ovsdb_idl *ovnsb_idl,
     struct ovsdb_idl_condition ce =  OVSDB_IDL_CONDITION_INIT(&ce);
     struct ovsdb_idl_condition ip_mcast = OVSDB_IDL_CONDITION_INIT(&ip_mcast);
     struct ovsdb_idl_condition igmp = OVSDB_IDL_CONDITION_INIT(&igmp);
+    struct ovsdb_idl_condition chprv = OVSDB_IDL_CONDITION_INIT(&chprv);

     if (monitor_all) {
         ovsdb_idl_condition_add_clause_true(&pb);
@@ -165,6 +166,7 @@  update_sb_monitors(struct ovsdb_idl *ovnsb_idl,
         ovsdb_idl_condition_add_clause_true(&ce);
         ovsdb_idl_condition_add_clause_true(&ip_mcast);
         ovsdb_idl_condition_add_clause_true(&igmp);
+        ovsdb_idl_condition_add_clause_true(&chprv);
         goto out;
     }

@@ -196,6 +198,10 @@  update_sb_monitors(struct ovsdb_idl *ovnsb_idl,
                                                   &chassis->header_.uuid);
         sbrec_igmp_group_add_clause_chassis(&igmp, OVSDB_F_EQ,
                                             &chassis->header_.uuid);
+
+        /* Monitors Chassis_Private record for current chassis only */
+        sbrec_chassis_private_add_clause_name(&chprv, OVSDB_F_EQ,
+                                              chassis->name);
     }
     if (local_ifaces) {
         const char *name;
@@ -229,6 +235,7 @@  out:
     sbrec_controller_event_set_condition(ovnsb_idl, &ce);
     sbrec_ip_multicast_set_condition(ovnsb_idl, &ip_mcast);
     sbrec_igmp_group_set_condition(ovnsb_idl, &igmp);
+    sbrec_chassis_private_set_condition(ovnsb_idl, &chprv);
     ovsdb_idl_condition_destroy(&pb);
     ovsdb_idl_condition_destroy(&lf);
     ovsdb_idl_condition_destroy(&mb);
@@ -237,6 +244,7 @@  out:
     ovsdb_idl_condition_destroy(&ce);
     ovsdb_idl_condition_destroy(&ip_mcast);
     ovsdb_idl_condition_destroy(&igmp);
+    ovsdb_idl_condition_destroy(&chprv);
 }

 static const char *
@@ -1759,6 +1767,8 @@  main(int argc, char *argv[])

     struct ovsdb_idl_index *sbrec_chassis_by_name
         = chassis_index_create(ovnsb_idl_loop.idl);
+    struct ovsdb_idl_index *sbrec_chassis_private_by_name
+        = chassis_private_index_create(ovnsb_idl_loop.idl);
     struct ovsdb_idl_index *sbrec_multicast_group_by_name_datapath
         = mcast_group_index_create(ovnsb_idl_loop.idl);
     struct ovsdb_idl_index *sbrec_port_binding_by_name
@@ -1784,7 +1794,8 @@  main(int argc, char *argv[])
         = igmp_group_index_create(ovnsb_idl_loop.idl);

     ovsdb_idl_track_add_all(ovnsb_idl_loop.idl);
-    ovsdb_idl_omit_alert(ovnsb_idl_loop.idl, &sbrec_chassis_col_nb_cfg);
+    ovsdb_idl_omit_alert(ovnsb_idl_loop.idl,
+                         &sbrec_chassis_private_col_nb_cfg);

     /* Omit the external_ids column of all the tables except for -
      *  - DNS. pinctrl.c uses the external_ids column of DNS,
@@ -1820,6 +1831,10 @@  main(int argc, char *argv[])
     ovsdb_idl_omit(ovnsb_idl_loop.idl, &sbrec_connection_col_status);
     ovsdb_idl_omit(ovnsb_idl_loop.idl, &sbrec_connection_col_target);

+    /* Do not monitor Chassis_Private external_ids */
+    ovsdb_idl_omit(ovnsb_idl_loop.idl,
+                   &sbrec_chassis_private_col_external_ids);
+
     update_sb_monitors(ovnsb_idl_loop.idl, NULL, NULL, NULL, false);

     stopwatch_create(CONTROLLER_LOOP_STOPWATCH_NAME, SW_MS);
@@ -1997,10 +2012,13 @@  main(int argc, char *argv[])
                 process_br_int(ovs_idl_txn, bridge_table, ovs_table);
             const char *chassis_id = get_ovs_chassis_id(ovs_table);
             const struct sbrec_chassis *chassis = NULL;
+            const struct sbrec_chassis_private *chassis_private = NULL;
             if (chassis_id) {
                 chassis = chassis_run(ovnsb_idl_txn, sbrec_chassis_by_name,
+                                      sbrec_chassis_private_by_name,
                                       ovs_table, chassis_table, chassis_id,
-                                      br_int, &transport_zones);
+                                      br_int, &transport_zones,
+                                      &chassis_private);
             }

             if (br_int) {
@@ -2125,10 +2143,10 @@  main(int argc, char *argv[])
                 engine_set_force_recompute(false);
             }

-            if (ovnsb_idl_txn && chassis) {
+            if (ovnsb_idl_txn && chassis_private) {
                 int64_t cur_cfg = ofctrl_get_cur_cfg();
-                if (cur_cfg && cur_cfg != chassis->nb_cfg) {
-                    sbrec_chassis_set_nb_cfg(chassis, cur_cfg);
+                if (cur_cfg && cur_cfg != chassis_private->nb_cfg) {
+                    sbrec_chassis_private_set_nb_cfg(chassis_private, cur_cfg);
                 }
             }

@@ -2231,10 +2249,17 @@  main(int argc, char *argv[])
                    ? chassis_lookup_by_name(sbrec_chassis_by_name, chassis_id)
                    : NULL);

+            const struct sbrec_chassis_private *chassis_private
+                = (chassis_id
+                   ? chassis_private_lookup_by_name(
+                         sbrec_chassis_private_by_name, chassis_id)
+                   : NULL);
+
             /* Run all of the cleanup functions, even if one of them returns
              * false. We're done if all of them return true. */
             done = binding_cleanup(ovnsb_idl_txn, port_binding_table, chassis);
-            done = chassis_cleanup(ovnsb_idl_txn, chassis) && done;
+            done = chassis_cleanup(ovnsb_idl_txn,
+                                   chassis, chassis_private) && done;
             done = encaps_cleanup(ovs_idl_txn, br_int) && done;
             done = igmp_group_cleanup(ovnsb_idl_txn, sbrec_igmp_group) && done;
             if (done) {
diff --git a/lib/chassis-index.c b/lib/chassis-index.c
index 39066f4cc..d760124b4 100644
--- a/lib/chassis-index.c
+++ b/lib/chassis-index.c
@@ -40,6 +40,31 @@  chassis_lookup_by_name(struct ovsdb_idl_index *sbrec_chassis_by_name,
     return retval;
 }

+struct ovsdb_idl_index *
+chassis_private_index_create(struct ovsdb_idl *idl)
+{
+    return ovsdb_idl_index_create1(idl, &sbrec_chassis_private_col_name);
+}
+
+/* Finds and returns the chassis with the given 'name', or NULL if no such
+ * chassis exists. */
+const struct sbrec_chassis_private *
+chassis_private_lookup_by_name(
+    struct ovsdb_idl_index *sbrec_chassis_private_by_name,
+    const char *name)
+{
+    struct sbrec_chassis_private *target =
+        sbrec_chassis_private_index_init_row(sbrec_chassis_private_by_name);
+    sbrec_chassis_private_index_set_name(target, name);
+
+    struct sbrec_chassis_private *retval = sbrec_chassis_private_index_find(
+        sbrec_chassis_private_by_name, target);
+
+    sbrec_chassis_private_index_destroy_row(target);
+
+    return retval;
+}
+
 struct ovsdb_idl_index *
 ha_chassis_group_index_create(struct ovsdb_idl *idl)
 {
diff --git a/lib/chassis-index.h b/lib/chassis-index.h
index 302e5f0fd..b9b331f34 100644
--- a/lib/chassis-index.h
+++ b/lib/chassis-index.h
@@ -23,6 +23,12 @@  struct ovsdb_idl_index *chassis_index_create(struct ovsdb_idl *);
 const struct sbrec_chassis *chassis_lookup_by_name(
     struct ovsdb_idl_index *sbrec_chassis_by_name, const char *name);

+struct ovsdb_idl_index *chassis_private_index_create(struct ovsdb_idl *);
+
+const struct sbrec_chassis_private *
+chassis_private_lookup_by_name(
+    struct ovsdb_idl_index *sbrec_chassis_private_by_name, const char *name);
+
 struct ovsdb_idl_index *ha_chassis_group_index_create(struct ovsdb_idl *idl);
 const struct sbrec_ha_chassis_group *ha_chassis_group_lookup_by_name(
     struct ovsdb_idl_index *sbrec_ha_chassis_grp_by_name, const char *name);
diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
index d42a9892a..8ad3842e8 100644
--- a/northd/ovn-northd.c
+++ b/northd/ovn-northd.c
@@ -11174,6 +11174,11 @@  static const char *rbac_chassis_auth[] =
 static const char *rbac_chassis_update[] =
     {"nb_cfg", "external_ids", "encaps", "vtep_logical_switches"};

+static const char *rbac_chassis_private_auth[] =
+    {"chassis_name"};
+static const char *rbac_chassis_private_update[] =
+    {"nb_cfg"};
+
 static const char *rbac_encap_auth[] =
     {"chassis_name"};
 static const char *rbac_encap_update[] =
@@ -11211,6 +11216,14 @@  static struct rbac_perm_cfg {
         .update = rbac_chassis_update,
         .n_update = ARRAY_SIZE(rbac_chassis_update),
         .row = NULL
+    },{
+        .table = "Chassis_Private",
+        .auth = rbac_chassis_private_auth,
+        .n_auth = ARRAY_SIZE(rbac_chassis_private_auth),
+        .insdel = true,
+        .update = rbac_chassis_private_update,
+        .n_update = ARRAY_SIZE(rbac_chassis_private_update),
+        .row = NULL
     },{
         .table = "Encap",
         .auth = rbac_encap_auth,
@@ -11380,11 +11393,10 @@  update_northbound_cfg(struct northd_context *ctx,
     /* Update northbound hv_cfg if appropriate. */
     if (nbg) {
         /* Find minimum nb_cfg among all chassis. */
-        const struct sbrec_chassis *chassis;
+        const struct sbrec_chassis_private *chassis;
         int64_t hv_cfg = nbg->nb_cfg;
-        SBREC_CHASSIS_FOR_EACH (chassis, ctx->ovnsb_idl) {
-            if (!smap_get_bool(&chassis->external_ids, "is-remote", false) &&
-                chassis->nb_cfg < hv_cfg) {
+        SBREC_CHASSIS_PRIVATE_FOR_EACH (chassis, ctx->ovnsb_idl) {
+            if (chassis->nb_cfg < hv_cfg) {
                 hv_cfg = chassis->nb_cfg;
             }
         }
@@ -11670,9 +11682,11 @@  main(int argc, char *argv[])
     ovsdb_idl_add_column(ovnsb_idl_loop.idl, &sbrec_meter_band_col_burst_size);

     ovsdb_idl_add_table(ovnsb_idl_loop.idl, &sbrec_table_chassis);
-    ovsdb_idl_add_column(ovnsb_idl_loop.idl, &sbrec_chassis_col_nb_cfg);
     ovsdb_idl_add_column(ovnsb_idl_loop.idl, &sbrec_chassis_col_name);
-    ovsdb_idl_add_column(ovnsb_idl_loop.idl, &sbrec_chassis_col_external_ids);
+
+    ovsdb_idl_add_table(ovnsb_idl_loop.idl, &sbrec_table_chassis_private);
+    ovsdb_idl_add_column(ovnsb_idl_loop.idl,
+                         &sbrec_chassis_private_col_nb_cfg);

     ovsdb_idl_add_table(ovnsb_idl_loop.idl, &sbrec_table_ha_chassis);
     add_column_noalert(ovnsb_idl_loop.idl,
diff --git a/ovn-sb.ovsschema b/ovn-sb.ovsschema
index d89f8dbbb..fa53c6a4c 100644
--- a/ovn-sb.ovsschema
+++ b/ovn-sb.ovsschema
@@ -1,7 +1,7 @@ 
 {
     "name": "OVN_Southbound",
-    "version": "2.7.0",
-    "cksum": "4286723485 21693",
+    "version": "2.8.0",
+    "cksum": "3504843097 22072",
     "tables": {
         "SB_Global": {
             "columns": {
@@ -43,6 +43,15 @@ 
                                               "max": "unlimited"}}},
             "isRoot": true,
             "indexes": [["name"]]},
+        "Chassis_Private": {
+            "columns": {
+                "name": {"type": "string"},
+                "nb_cfg": {"type": {"key": "integer"}},
+                "external_ids": {
+                    "type": {"key": "string", "value": "string",
+                             "min": 0, "max": "unlimited"}}},
+            "isRoot": true,
+            "indexes": [["name"]]},
         "Encap": {
             "columns": {
                 "type": {"type": {"key": {
diff --git a/ovn-sb.xml b/ovn-sb.xml
index 3ae9d4f92..77c5b1f38 100644
--- a/ovn-sb.xml
+++ b/ovn-sb.xml
@@ -256,10 +256,8 @@ 
     </column>

     <column name="nb_cfg">
-      Sequence number for the configuration.  When <code>ovn-controller</code>
-      updates the configuration of a chassis from the contents of the
-      southbound database, it copies <ref table="SB_Global" column="nb_cfg"/>
-      from the <ref table="SB_Global"/> table into this column.
+      Deprecated. This column is replaced by the <ref table="Chassis_Private"
+      column="nb_cfg"/> column of the <ref table="Chassis_Private"/> table.
     </column>

     <column name="external_ids" key="ovn-bridge-mappings">
@@ -366,6 +364,38 @@ 
     </group>
   </table>

+  <table name="Chassis_Private" title="Chassis Private">
+    <p>
+      Each row in this table maintains per chassis private data that are
+      accessed only by the owning chassis (write only) and ovn-northd, not by
+      any other chassis.  These data are stored in this separate table instead
+      of the <ref table="Chassis"/> table for performance considerations:
+      the rows in this table can be conditionally monitored by chassises so
+      that each chassis only get update notifications for its own row, to avoid
+      unnecessary chassis private data update flooding in a large scale
+      deployment.  (Future: this separation can be avoided if ovsdb conditional
+      monitoring is supported on a set of columns)
+    </p>
+
+    <column name="name">
+      The name of the chassis that owns these chassis-private data.
+    </column>
+
+    <column name="nb_cfg">
+      Sequence number for the configuration.  When <code>ovn-controller</code>
+      updates the configuration of a chassis from the contents of the
+      southbound database, it copies <ref table="SB_Global" column="nb_cfg"/>
+      from the <ref table="SB_Global"/> table into this column.
+    </column>
+
+    <group title="Common Columns">
+      The overall purpose of these columns is described under <code>Common
+      Columns</code> at the beginning of this document.
+
+      <column name="external_ids"/>
+    </group>
+  </table>
+
   <table name="Encap" title="Encapsulation Types">
     <p>
       The <ref column="encaps" table="Chassis"/> column in the <ref
diff --git a/tests/ovn-controller.at b/tests/ovn-controller.at
index 63b2581c0..8c6e03611 100644
--- a/tests/ovn-controller.at
+++ b/tests/ovn-controller.at
@@ -321,3 +321,29 @@  as ovn-sb
 OVS_APP_EXIT_AND_WAIT([ovsdb-server])

 AT_CLEANUP
+
+# Checks that ovn-controller increments the nb_cfg value in the Chassis_Private table
+AT_SETUP([ovn-controller - Bump Chassis_Private nb_cfg value])
+AT_KEYWORDS([ovn])
+ovn_start
+
+net_add n1
+sim_add hv
+as hv
+ovs-vsctl add-br br-phys
+ovn_attach n1 br-phys 192.168.0.1
+
+OVS_WAIT_UNTIL([test xhv = x`ovn-sbctl --columns name --bare find chassis`])
+
+# Bump the NB_Global nb_cfg value
+nb_global_id=$(ovn-nbctl --columns _uuid --bare find nb_global)
+ovn-nbctl set NB_Global ${nb_global_id} nb_cfg=999
+
+# ovn-controller should bump the nb_cfg in the chassis_private table
+OVS_WAIT_UNTIL([test x999 = x`ovn-sbctl --columns nb_cfg --bare find chassis_private`])
+
+# Assert that the the nb_cfg from the Chassis table was not incremented
+OVS_WAIT_UNTIL([test x0 = x`ovn-sbctl --columns nb_cfg --bare find chassis`])
+
+OVN_CLEANUP([hv])
+AT_CLEANUP