diff mbox series

[ovs-dev,ovn,OVN] Avoid nb_cfg update notification flooding

Message ID 20200313131850.39496-1-lmartins@redhat.com
State Superseded
Headers show
Series [ovs-dev,ovn,OVN] Avoid nb_cfg update notification flooding | expand

Commit Message

Lucas Martins March 13, 2020, 1:18 p.m. UTC
From: Lucas Alvares Gomes <lucasagomes@gmail.com>

nb_cfg as a mechanism to "ping" OVN control plane is very useful
in many ways. However, the current implementation will trigger
update notifications flooding in the whole control plane. Each
HV updates to SB the nb_cfg number and all these updates are
notified to all the other HVs, which is O(n^2). Although updates
are batched in fewers notifications than n^2, it still generates
significant load on SB DB and also on ovn-controllers.

To solve this problem and make the mechanism more useful in large
scale producation deployment, this patch separates the per HV
*private* data (write only by the owning chassis and not
interesting to any other HVs) from the Chassis table to a separate
table, so that each HV can conditionally monitor and get updates
only for its own record.

Test result shows great improvement:
In a test environment with 1K sandbox HVs, and 10K ports created
on 40 lswitches and 8 lrouters, do the sync test when the system
is idle, with command:

    time ovn-nbctl --wait=hv sync

Original result:
real    4m52.926s
user    0m0.328s
sys     0m0.004s

With this patch:
real    0m11.405s
user    0m0.316s
sys     0m0.016s

Also, regarding backwards compatibility note that the nb_cfg from the
Chassis table is no longer updated. If any system is relying on this
mechanism they should start using the nb_cfg from the Chassis_Private
table from now on.

Co-authored-by: Han Zhou <hzhou8@ebay.com>
Signed-off-by: Han Zhou <hzhou8@ebay.com>
Signed-off-by: Lucas Alvares Gomes <lucasagomes@gmail.com>
---
 controller/chassis.c        | 19 +++++++++++++++++--
 controller/chassis.h        |  8 ++++++--
 controller/ovn-controller.c | 37 ++++++++++++++++++++++++++++++------
 lib/chassis-index.c         | 25 ++++++++++++++++++++++++
 lib/chassis-index.h         |  6 ++++++
 northd/ovn-northd.c         | 27 +++++++++++++++++++++-----
 ovn-sb.ovsschema            | 13 +++++++++++--
 ovn-sb.xml                  | 38 +++++++++++++++++++++++++++++++++----
 tests/ovn-controller.at     | 26 +++++++++++++++++++++++++
 9 files changed, 178 insertions(+), 21 deletions(-)

Comments

Dumitru Ceara March 20, 2020, 10:07 a.m. UTC | #1
On 3/13/20 2:18 PM, lmartins@redhat.com wrote:
> From: Lucas Alvares Gomes <lucasagomes@gmail.com>
> 
> nb_cfg as a mechanism to "ping" OVN control plane is very useful
> in many ways. However, the current implementation will trigger
> update notifications flooding in the whole control plane. Each
> HV updates to SB the nb_cfg number and all these updates are
> notified to all the other HVs, which is O(n^2). Although updates
> are batched in fewers notifications than n^2, it still generates
> significant load on SB DB and also on ovn-controllers.
> 
> To solve this problem and make the mechanism more useful in large
> scale producation deployment, this patch separates the per HV
> *private* data (write only by the owning chassis and not
> interesting to any other HVs) from the Chassis table to a separate
> table, so that each HV can conditionally monitor and get updates
> only for its own record.
> 

Hi Lucas,

Overall this looks good to me. I do have a few small comments/questions,
please see below.

> Test result shows great improvement:
> In a test environment with 1K sandbox HVs, and 10K ports created
> on 40 lswitches and 8 lrouters, do the sync test when the system
> is idle, with command:
> 
>     time ovn-nbctl --wait=hv sync
> 
> Original result:
> real    4m52.926s
> user    0m0.328s
> sys     0m0.004s
> 
> With this patch:
> real    0m11.405s
> user    0m0.316s
> sys     0m0.016s
> 
> Also, regarding backwards compatibility note that the nb_cfg from the
> Chassis table is no longer updated. If any system is relying on this
> mechanism they should start using the nb_cfg from the Chassis_Private
> table from now on.

I see we mention it in the man pages but should we also add an item to
NEWS to make it clear that CMSs must use this new mechanism?

> 
> Co-authored-by: Han Zhou <hzhou8@ebay.com>
> Signed-off-by: Han Zhou <hzhou8@ebay.com>
> Signed-off-by: Lucas Alvares Gomes <lucasagomes@gmail.com>
> ---
>  controller/chassis.c        | 19 +++++++++++++++++--
>  controller/chassis.h        |  8 ++++++--
>  controller/ovn-controller.c | 37 ++++++++++++++++++++++++++++++------
>  lib/chassis-index.c         | 25 ++++++++++++++++++++++++
>  lib/chassis-index.h         |  6 ++++++
>  northd/ovn-northd.c         | 27 +++++++++++++++++++++-----
>  ovn-sb.ovsschema            | 13 +++++++++++--
>  ovn-sb.xml                  | 38 +++++++++++++++++++++++++++++++++----
>  tests/ovn-controller.at     | 26 +++++++++++++++++++++++++
>  9 files changed, 178 insertions(+), 21 deletions(-)
> 
> diff --git a/controller/chassis.c b/controller/chassis.c
> index 522893ead..99ea6b8fc 100644
> --- a/controller/chassis.c
> +++ b/controller/chassis.c
> @@ -585,14 +585,18 @@ chassis_update(const struct sbrec_chassis *chassis_rec,
>  const struct sbrec_chassis *
>  chassis_run(struct ovsdb_idl_txn *ovnsb_idl_txn,
>              struct ovsdb_idl_index *sbrec_chassis_by_name,
> +            struct ovsdb_idl_index *sbrec_chassis_private_by_name,
>              const struct ovsrec_open_vswitch_table *ovs_table,
>              const struct sbrec_chassis_table *chassis_table,
>              const char *chassis_id,
>              const struct ovsrec_bridge *br_int,
> -            const struct sset *transport_zones)
> +            const struct sset *transport_zones,
> +            const struct sbrec_chassis_private **chassis_private)
>  {
>      struct ovs_chassis_cfg ovs_cfg;
>  
> +    *chassis_private = NULL;
> +
>      /* Get the chassis config from the ovs table. */
>      ovs_chassis_cfg_init(&ovs_cfg);
>      if (!chassis_parse_ovs_config(ovs_table, br_int, &ovs_cfg)) {
> @@ -616,6 +620,15 @@ chassis_run(struct ovsdb_idl_txn *ovnsb_idl_txn,
>                                    chassis_id);
>      }
>  
> +    const struct sbrec_chassis_private *chassis_private_rec =
> +        chassis_private_lookup_by_name(sbrec_chassis_private_by_name,
> +                                       chassis_id);
> +    if (!chassis_private_rec && ovnsb_idl_txn) {
> +        chassis_private_rec = sbrec_chassis_private_insert(ovnsb_idl_txn);
> +        sbrec_chassis_private_set_name(chassis_private_rec, chassis_id);
> +    }
> +    *chassis_private = chassis_private_rec;
> +
>      ovs_chassis_cfg_destroy(&ovs_cfg);
>      return chassis_rec;
>  }
> @@ -669,7 +682,8 @@ chassis_get_mac(const struct sbrec_chassis *chassis_rec,
>   * required. */
>  bool
>  chassis_cleanup(struct ovsdb_idl_txn *ovnsb_idl_txn,
> -                const struct sbrec_chassis *chassis_rec)
> +                const struct sbrec_chassis *chassis_rec,
> +                const struct sbrec_chassis_private *chassis_private_rec)
>  {
>      if (!chassis_rec) {
>          return true;
> @@ -679,6 +693,7 @@ chassis_cleanup(struct ovsdb_idl_txn *ovnsb_idl_txn,
>                                    "ovn-controller: unregistering chassis '%s'",
>                                    chassis_rec->name);
>          sbrec_chassis_delete(chassis_rec);
> +        sbrec_chassis_private_delete(chassis_private_rec);
>      }
>      return false;
>  }
> diff --git a/controller/chassis.h b/controller/chassis.h
> index 178d2957e..81055b403 100644
> --- a/controller/chassis.h
> +++ b/controller/chassis.h
> @@ -17,6 +17,7 @@
>  #define OVN_CHASSIS_H 1
>  
>  #include <stdbool.h>
> +#include "lib/ovn-sb-idl.h"
>  
>  struct ovsdb_idl;
>  struct ovsdb_idl_index;
> @@ -33,12 +34,15 @@ void chassis_register_ovs_idl(struct ovsdb_idl *);
>  const struct sbrec_chassis *chassis_run(
>      struct ovsdb_idl_txn *ovnsb_idl_txn,
>      struct ovsdb_idl_index *sbrec_chassis_by_name,
> +    struct ovsdb_idl_index *sbrec_chassis_private_by_name,
>      const struct ovsrec_open_vswitch_table *,
>      const struct sbrec_chassis_table *,
>      const char *chassis_id, const struct ovsrec_bridge *br_int,
> -    const struct sset *transport_zones);
> +    const struct sset *transport_zones,
> +    const struct sbrec_chassis_private **chassis_private);
>  bool chassis_cleanup(struct ovsdb_idl_txn *ovnsb_idl_txn,
> -                     const struct sbrec_chassis *);
> +                     const struct sbrec_chassis *,
> +                     const struct sbrec_chassis_private *);
>  bool chassis_get_mac(const struct sbrec_chassis *chassis,
>                       const char *bridge_mapping,
>                       struct eth_addr *chassis_mac);
> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
> index 2893eaac1..1542b2ad7 100644
> --- a/controller/ovn-controller.c
> +++ b/controller/ovn-controller.c
> @@ -155,6 +155,7 @@ update_sb_monitors(struct ovsdb_idl *ovnsb_idl,
>      struct ovsdb_idl_condition ce =  OVSDB_IDL_CONDITION_INIT(&ce);
>      struct ovsdb_idl_condition ip_mcast = OVSDB_IDL_CONDITION_INIT(&ip_mcast);
>      struct ovsdb_idl_condition igmp = OVSDB_IDL_CONDITION_INIT(&igmp);
> +    struct ovsdb_idl_condition chprv = OVSDB_IDL_CONDITION_INIT(&chprv);
>  
>      if (monitor_all) {
>          ovsdb_idl_condition_add_clause_true(&pb);
> @@ -165,6 +166,7 @@ update_sb_monitors(struct ovsdb_idl *ovnsb_idl,
>          ovsdb_idl_condition_add_clause_true(&ce);
>          ovsdb_idl_condition_add_clause_true(&ip_mcast);
>          ovsdb_idl_condition_add_clause_true(&igmp);
> +        ovsdb_idl_condition_add_clause_true(&chprv);
>          goto out;
>      }
>  
> @@ -196,6 +198,10 @@ update_sb_monitors(struct ovsdb_idl *ovnsb_idl,
>                                                    &chassis->header_.uuid);
>          sbrec_igmp_group_add_clause_chassis(&igmp, OVSDB_F_EQ,
>                                              &chassis->header_.uuid);
> +
> +        /* Monitors Chassis_Private record for current chassis only */
> +        sbrec_chassis_private_add_clause_name(&chprv, OVSDB_F_EQ,
> +                                              chassis->name);
>      }
>      if (local_ifaces) {
>          const char *name;
> @@ -229,6 +235,7 @@ out:
>      sbrec_controller_event_set_condition(ovnsb_idl, &ce);
>      sbrec_ip_multicast_set_condition(ovnsb_idl, &ip_mcast);
>      sbrec_igmp_group_set_condition(ovnsb_idl, &igmp);
> +    sbrec_chassis_private_set_condition(ovnsb_idl, &chprv);
>      ovsdb_idl_condition_destroy(&pb);
>      ovsdb_idl_condition_destroy(&lf);
>      ovsdb_idl_condition_destroy(&mb);
> @@ -237,6 +244,7 @@ out:
>      ovsdb_idl_condition_destroy(&ce);
>      ovsdb_idl_condition_destroy(&ip_mcast);
>      ovsdb_idl_condition_destroy(&igmp);
> +    ovsdb_idl_condition_destroy(&chprv);
>  }
>  
>  static const char *
> @@ -1759,6 +1767,8 @@ main(int argc, char *argv[])
>  
>      struct ovsdb_idl_index *sbrec_chassis_by_name
>          = chassis_index_create(ovnsb_idl_loop.idl);
> +    struct ovsdb_idl_index *sbrec_chassis_private_by_name
> +        = chassis_private_index_create(ovnsb_idl_loop.idl);
>      struct ovsdb_idl_index *sbrec_multicast_group_by_name_datapath
>          = mcast_group_index_create(ovnsb_idl_loop.idl);
>      struct ovsdb_idl_index *sbrec_port_binding_by_name
> @@ -1784,7 +1794,8 @@ main(int argc, char *argv[])
>          = igmp_group_index_create(ovnsb_idl_loop.idl);
>  
>      ovsdb_idl_track_add_all(ovnsb_idl_loop.idl);
> -    ovsdb_idl_omit_alert(ovnsb_idl_loop.idl, &sbrec_chassis_col_nb_cfg);
> +    ovsdb_idl_omit_alert(ovnsb_idl_loop.idl,
> +                         &sbrec_chassis_private_col_nb_cfg);
>  
>      /* Omit the external_ids column of all the tables except for -
>       *  - DNS. pinctrl.c uses the external_ids column of DNS,
> @@ -1820,6 +1831,10 @@ main(int argc, char *argv[])
>      ovsdb_idl_omit(ovnsb_idl_loop.idl, &sbrec_connection_col_status);
>      ovsdb_idl_omit(ovnsb_idl_loop.idl, &sbrec_connection_col_target);
>  
> +    /* Do not monitor Chassis_Private external_ids */
> +    ovsdb_idl_omit(ovnsb_idl_loop.idl,
> +                   &sbrec_chassis_private_col_external_ids);
> +
>      update_sb_monitors(ovnsb_idl_loop.idl, NULL, NULL, NULL, false);
>  
>      stopwatch_create(CONTROLLER_LOOP_STOPWATCH_NAME, SW_MS);
> @@ -1997,10 +2012,13 @@ main(int argc, char *argv[])
>                  process_br_int(ovs_idl_txn, bridge_table, ovs_table);
>              const char *chassis_id = get_ovs_chassis_id(ovs_table);
>              const struct sbrec_chassis *chassis = NULL;
> +            const struct sbrec_chassis_private *chassis_private = NULL;
>              if (chassis_id) {
>                  chassis = chassis_run(ovnsb_idl_txn, sbrec_chassis_by_name,
> +                                      sbrec_chassis_private_by_name,
>                                        ovs_table, chassis_table, chassis_id,
> -                                      br_int, &transport_zones);
> +                                      br_int, &transport_zones,
> +                                      &chassis_private);
>              }
>  
>              if (br_int) {
> @@ -2125,10 +2143,10 @@ main(int argc, char *argv[])
>                  engine_set_force_recompute(false);
>              }
>  
> -            if (ovnsb_idl_txn && chassis) {
> +            if (ovnsb_idl_txn && chassis_private) {
>                  int64_t cur_cfg = ofctrl_get_cur_cfg();
> -                if (cur_cfg && cur_cfg != chassis->nb_cfg) {
> -                    sbrec_chassis_set_nb_cfg(chassis, cur_cfg);
> +                if (cur_cfg && cur_cfg != chassis_private->nb_cfg) {
> +                    sbrec_chassis_private_set_nb_cfg(chassis_private, cur_cfg);
>                  }
>              }
>  
> @@ -2231,10 +2249,17 @@ main(int argc, char *argv[])
>                     ? chassis_lookup_by_name(sbrec_chassis_by_name, chassis_id)
>                     : NULL);
>  
> +            const struct sbrec_chassis_private *chassis_private
> +                = (chassis_id
> +                   ? chassis_private_lookup_by_name(
> +                         sbrec_chassis_private_by_name, chassis_id)
> +                   : NULL);
> +
>              /* Run all of the cleanup functions, even if one of them returns
>               * false. We're done if all of them return true. */
>              done = binding_cleanup(ovnsb_idl_txn, port_binding_table, chassis);
> -            done = chassis_cleanup(ovnsb_idl_txn, chassis) && done;
> +            done = chassis_cleanup(ovnsb_idl_txn,
> +                                   chassis, chassis_private) && done;
>              done = encaps_cleanup(ovs_idl_txn, br_int) && done;
>              done = igmp_group_cleanup(ovnsb_idl_txn, sbrec_igmp_group) && done;
>              if (done) {
> diff --git a/lib/chassis-index.c b/lib/chassis-index.c
> index 39066f4cc..d760124b4 100644
> --- a/lib/chassis-index.c
> +++ b/lib/chassis-index.c
> @@ -40,6 +40,31 @@ chassis_lookup_by_name(struct ovsdb_idl_index *sbrec_chassis_by_name,
>      return retval;
>  }
>  
> +struct ovsdb_idl_index *
> +chassis_private_index_create(struct ovsdb_idl *idl)
> +{
> +    return ovsdb_idl_index_create1(idl, &sbrec_chassis_private_col_name);
> +}
> +
> +/* Finds and returns the chassis with the given 'name', or NULL if no such
> + * chassis exists. */
> +const struct sbrec_chassis_private *
> +chassis_private_lookup_by_name(
> +    struct ovsdb_idl_index *sbrec_chassis_private_by_name,
> +    const char *name)
> +{
> +    struct sbrec_chassis_private *target =
> +        sbrec_chassis_private_index_init_row(sbrec_chassis_private_by_name);
> +    sbrec_chassis_private_index_set_name(target, name);
> +
> +    struct sbrec_chassis_private *retval = sbrec_chassis_private_index_find(
> +        sbrec_chassis_private_by_name, target);
> +
> +    sbrec_chassis_private_index_destroy_row(target);
> +
> +    return retval;
> +}
> +
>  struct ovsdb_idl_index *
>  ha_chassis_group_index_create(struct ovsdb_idl *idl)
>  {
> diff --git a/lib/chassis-index.h b/lib/chassis-index.h
> index 302e5f0fd..b9b331f34 100644
> --- a/lib/chassis-index.h
> +++ b/lib/chassis-index.h
> @@ -23,6 +23,12 @@ struct ovsdb_idl_index *chassis_index_create(struct ovsdb_idl *);
>  const struct sbrec_chassis *chassis_lookup_by_name(
>      struct ovsdb_idl_index *sbrec_chassis_by_name, const char *name);
>  
> +struct ovsdb_idl_index *chassis_private_index_create(struct ovsdb_idl *);
> +
> +const struct sbrec_chassis_private *
> +chassis_private_lookup_by_name(
> +    struct ovsdb_idl_index *sbrec_chassis_private_by_name, const char *name);
> +
>  struct ovsdb_idl_index *ha_chassis_group_index_create(struct ovsdb_idl *idl);
>  const struct sbrec_ha_chassis_group *ha_chassis_group_lookup_by_name(
>      struct ovsdb_idl_index *sbrec_ha_chassis_grp_by_name, const char *name);
> diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
> index d42a9892a..cb2a7e0d6 100644
> --- a/northd/ovn-northd.c
> +++ b/northd/ovn-northd.c
> @@ -11174,6 +11174,11 @@ static const char *rbac_chassis_auth[] =
>  static const char *rbac_chassis_update[] =
>      {"nb_cfg", "external_ids", "encaps", "vtep_logical_switches"};
>  
> +static const char *rbac_chassis_private_auth[] =
> +    {"chassis_name"};
> +static const char *rbac_chassis_private_update[] =
> +    {"nb_cfg"};
> +
>  static const char *rbac_encap_auth[] =
>      {"chassis_name"};
>  static const char *rbac_encap_update[] =
> @@ -11211,6 +11216,14 @@ static struct rbac_perm_cfg {
>          .update = rbac_chassis_update,
>          .n_update = ARRAY_SIZE(rbac_chassis_update),
>          .row = NULL
> +    },{
> +        .table = "Chassis_Private",
> +        .auth = rbac_chassis_private_auth,
> +        .n_auth = ARRAY_SIZE(rbac_chassis_private_auth),
> +        .insdel = true,
> +        .update = rbac_chassis_private_update,
> +        .n_update = ARRAY_SIZE(rbac_chassis_private_update),
> +        .row = NULL
>      },{
>          .table = "Encap",
>          .auth = rbac_encap_auth,
> @@ -11380,11 +11393,10 @@ update_northbound_cfg(struct northd_context *ctx,
>      /* Update northbound hv_cfg if appropriate. */
>      if (nbg) {
>          /* Find minimum nb_cfg among all chassis. */
> -        const struct sbrec_chassis *chassis;
> +        const struct sbrec_chassis_private *chassis;
>          int64_t hv_cfg = nbg->nb_cfg;
> -        SBREC_CHASSIS_FOR_EACH (chassis, ctx->ovnsb_idl) {
> -            if (!smap_get_bool(&chassis->external_ids, "is-remote", false) &&
> -                chassis->nb_cfg < hv_cfg) {

I think this is ok, but maybe Han can confirm that this won't break OVN-IC.

> +        SBREC_CHASSIS_PRIVATE_FOR_EACH (chassis, ctx->ovnsb_idl) {
> +            if (chassis->nb_cfg < hv_cfg) {
>                  hv_cfg = chassis->nb_cfg;
>              }
>          }
> @@ -11670,10 +11682,15 @@ main(int argc, char *argv[])
>      ovsdb_idl_add_column(ovnsb_idl_loop.idl, &sbrec_meter_band_col_burst_size);
>  
>      ovsdb_idl_add_table(ovnsb_idl_loop.idl, &sbrec_table_chassis);
> -    ovsdb_idl_add_column(ovnsb_idl_loop.idl, &sbrec_chassis_col_nb_cfg);
>      ovsdb_idl_add_column(ovnsb_idl_loop.idl, &sbrec_chassis_col_name);
>      ovsdb_idl_add_column(ovnsb_idl_loop.idl, &sbrec_chassis_col_external_ids);

Do we still need to monitor sbrec_chassis_col_external_ids? I don't see
it being used anywhere in ovn-northd anymore.

>  
> +    ovsdb_idl_add_table(ovnsb_idl_loop.idl, &sbrec_table_chassis_private);
> +    ovsdb_idl_add_column(ovnsb_idl_loop.idl,
> +                         &sbrec_chassis_private_col_nb_cfg);
> +    add_column_noalert(ovnsb_idl_loop.idl,
> +                       &sbrec_chassis_private_col_external_ids);
> +

Same here, do we need to monitor sbrec_chassis_private_col_external_ids?
We don't seem to use it in ovn-northd.

>      ovsdb_idl_add_table(ovnsb_idl_loop.idl, &sbrec_table_ha_chassis);
>      add_column_noalert(ovnsb_idl_loop.idl,
>                         &sbrec_ha_chassis_col_chassis);
> diff --git a/ovn-sb.ovsschema b/ovn-sb.ovsschema
> index d89f8dbbb..fa53c6a4c 100644
> --- a/ovn-sb.ovsschema
> +++ b/ovn-sb.ovsschema
> @@ -1,7 +1,7 @@
>  {
>      "name": "OVN_Southbound",
> -    "version": "2.7.0",
> -    "cksum": "4286723485 21693",
> +    "version": "2.8.0",
> +    "cksum": "3504843097 22072",
>      "tables": {
>          "SB_Global": {
>              "columns": {
> @@ -43,6 +43,15 @@
>                                                "max": "unlimited"}}},
>              "isRoot": true,
>              "indexes": [["name"]]},
> +        "Chassis_Private": {
> +            "columns": {
> +                "name": {"type": "string"},
> +                "nb_cfg": {"type": {"key": "integer"}},
> +                "external_ids": {
> +                    "type": {"key": "string", "value": "string",
> +                             "min": 0, "max": "unlimited"}}},
> +            "isRoot": true,
> +            "indexes": [["name"]]},
>          "Encap": {
>              "columns": {
>                  "type": {"type": {"key": {
> diff --git a/ovn-sb.xml b/ovn-sb.xml
> index 3ae9d4f92..77c5b1f38 100644
> --- a/ovn-sb.xml
> +++ b/ovn-sb.xml
> @@ -256,10 +256,8 @@
>      </column>
>  
>      <column name="nb_cfg">
> -      Sequence number for the configuration.  When <code>ovn-controller</code>
> -      updates the configuration of a chassis from the contents of the
> -      southbound database, it copies <ref table="SB_Global" column="nb_cfg"/>
> -      from the <ref table="SB_Global"/> table into this column.
> +      Deprecated. This column is replaced by the <ref table="Chassis_Private"
> +      column="nb_cfg"/> column of the <ref table="Chassis_Private"/> table.
>      </column>
>  
>      <column name="external_ids" key="ovn-bridge-mappings">
> @@ -366,6 +364,38 @@
>      </group>
>    </table>
>  
> +  <table name="Chassis_Private" title="Chassis Private">
> +    <p>
> +      Each row in this table maintains per chassis private data that are
> +      accessed only by the owning chassis (write only) and ovn-northd, not by
> +      any other chassis.  These data are stored in this separate table instead
> +      of the <ref table="Chassis"/> table for performance considerations:
> +      the rows in this table can be conditionally monitored by chassises so
> +      that each chassis only get update notifications for its own row, to avoid
> +      unnecessary chassis private data update flooding in a large scale
> +      deployment.  (Future: this separation can be avoided if ovsdb conditional
> +      monitoring is supported on a set of columns)

Maybe this note about ovsdb conditional monitoring on sets of columns is
better to appear in TODO.rst instead of here?

Thanks,
Dumitru

> +    </p>
> +
> +    <column name="name">
> +      The name of the chassis that owns these chassis-private data.
> +    </column>
> +
> +    <column name="nb_cfg">
> +      Sequence number for the configuration.  When <code>ovn-controller</code>
> +      updates the configuration of a chassis from the contents of the
> +      southbound database, it copies <ref table="SB_Global" column="nb_cfg"/>
> +      from the <ref table="SB_Global"/> table into this column.
> +    </column>
> +
> +    <group title="Common Columns">
> +      The overall purpose of these columns is described under <code>Common
> +      Columns</code> at the beginning of this document.
> +
> +      <column name="external_ids"/>
> +    </group>
> +  </table>
> +
>    <table name="Encap" title="Encapsulation Types">
>      <p>
>        The <ref column="encaps" table="Chassis"/> column in the <ref
> diff --git a/tests/ovn-controller.at b/tests/ovn-controller.at
> index 63b2581c0..8c6e03611 100644
> --- a/tests/ovn-controller.at
> +++ b/tests/ovn-controller.at
> @@ -321,3 +321,29 @@ as ovn-sb
>  OVS_APP_EXIT_AND_WAIT([ovsdb-server])
>  
>  AT_CLEANUP
> +
> +# Checks that ovn-controller increments the nb_cfg value in the Chassis_Private table
> +AT_SETUP([ovn-controller - Bump Chassis_Private nb_cfg value])
> +AT_KEYWORDS([ovn])
> +ovn_start
> +
> +net_add n1
> +sim_add hv
> +as hv
> +ovs-vsctl add-br br-phys
> +ovn_attach n1 br-phys 192.168.0.1
> +
> +OVS_WAIT_UNTIL([test xhv = x`ovn-sbctl --columns name --bare find chassis`])
> +
> +# Bump the NB_Global nb_cfg value
> +nb_global_id=$(ovn-nbctl --columns _uuid --bare find nb_global)
> +ovn-nbctl set NB_Global ${nb_global_id} nb_cfg=999
> +
> +# ovn-controller should bump the nb_cfg in the chassis_private table
> +OVS_WAIT_UNTIL([test x999 = x`ovn-sbctl --columns nb_cfg --bare find chassis_private`])
> +
> +# Assert that the the nb_cfg from the Chassis table was not incremented
> +OVS_WAIT_UNTIL([test x0 = x`ovn-sbctl --columns nb_cfg --bare find chassis`])
> +
> +OVN_CLEANUP([hv])
> +AT_CLEANUP
>
Dumitru Ceara March 20, 2020, 10:28 a.m. UTC | #2
On 3/20/20 11:07 AM, Dumitru Ceara wrote:
> On 3/13/20 2:18 PM, lmartins@redhat.com wrote:
>> From: Lucas Alvares Gomes <lucasagomes@gmail.com>
>>
>> nb_cfg as a mechanism to "ping" OVN control plane is very useful
>> in many ways. However, the current implementation will trigger
>> update notifications flooding in the whole control plane. Each
>> HV updates to SB the nb_cfg number and all these updates are
>> notified to all the other HVs, which is O(n^2). Although updates
>> are batched in fewers notifications than n^2, it still generates
>> significant load on SB DB and also on ovn-controllers.
>>
>> To solve this problem and make the mechanism more useful in large
>> scale producation deployment, this patch separates the per HV
>> *private* data (write only by the owning chassis and not
>> interesting to any other HVs) from the Chassis table to a separate
>> table, so that each HV can conditionally monitor and get updates
>> only for its own record.
>>
> 
> Hi Lucas,
> 
> Overall this looks good to me. I do have a few small comments/questions,
> please see below.
> 
>> Test result shows great improvement:
>> In a test environment with 1K sandbox HVs, and 10K ports created
>> on 40 lswitches and 8 lrouters, do the sync test when the system
>> is idle, with command:
>>
>>     time ovn-nbctl --wait=hv sync
>>
>> Original result:
>> real    4m52.926s
>> user    0m0.328s
>> sys     0m0.004s
>>
>> With this patch:
>> real    0m11.405s
>> user    0m0.316s
>> sys     0m0.016s
>>
>> Also, regarding backwards compatibility note that the nb_cfg from the
>> Chassis table is no longer updated. If any system is relying on this
>> mechanism they should start using the nb_cfg from the Chassis_Private
>> table from now on.
> 
> I see we mention it in the man pages but should we also add an item to
> NEWS to make it clear that CMSs must use this new mechanism?
> 
>>
>> Co-authored-by: Han Zhou <hzhou8@ebay.com>
>> Signed-off-by: Han Zhou <hzhou8@ebay.com>
>> Signed-off-by: Lucas Alvares Gomes <lucasagomes@gmail.com>
>> ---
>>  controller/chassis.c        | 19 +++++++++++++++++--
>>  controller/chassis.h        |  8 ++++++--
>>  controller/ovn-controller.c | 37 ++++++++++++++++++++++++++++++------
>>  lib/chassis-index.c         | 25 ++++++++++++++++++++++++
>>  lib/chassis-index.h         |  6 ++++++
>>  northd/ovn-northd.c         | 27 +++++++++++++++++++++-----
>>  ovn-sb.ovsschema            | 13 +++++++++++--
>>  ovn-sb.xml                  | 38 +++++++++++++++++++++++++++++++++----
>>  tests/ovn-controller.at     | 26 +++++++++++++++++++++++++
>>  9 files changed, 178 insertions(+), 21 deletions(-)
>>
>> diff --git a/controller/chassis.c b/controller/chassis.c
>> index 522893ead..99ea6b8fc 100644
>> --- a/controller/chassis.c
>> +++ b/controller/chassis.c
>> @@ -585,14 +585,18 @@ chassis_update(const struct sbrec_chassis *chassis_rec,
>>  const struct sbrec_chassis *
>>  chassis_run(struct ovsdb_idl_txn *ovnsb_idl_txn,
>>              struct ovsdb_idl_index *sbrec_chassis_by_name,
>> +            struct ovsdb_idl_index *sbrec_chassis_private_by_name,
>>              const struct ovsrec_open_vswitch_table *ovs_table,
>>              const struct sbrec_chassis_table *chassis_table,
>>              const char *chassis_id,
>>              const struct ovsrec_bridge *br_int,
>> -            const struct sset *transport_zones)
>> +            const struct sset *transport_zones,
>> +            const struct sbrec_chassis_private **chassis_private)
>>  {
>>      struct ovs_chassis_cfg ovs_cfg;
>>  
>> +    *chassis_private = NULL;
>> +
>>      /* Get the chassis config from the ovs table. */
>>      ovs_chassis_cfg_init(&ovs_cfg);
>>      if (!chassis_parse_ovs_config(ovs_table, br_int, &ovs_cfg)) {
>> @@ -616,6 +620,15 @@ chassis_run(struct ovsdb_idl_txn *ovnsb_idl_txn,
>>                                    chassis_id);
>>      }
>>  
>> +    const struct sbrec_chassis_private *chassis_private_rec =
>> +        chassis_private_lookup_by_name(sbrec_chassis_private_by_name,
>> +                                       chassis_id);
>> +    if (!chassis_private_rec && ovnsb_idl_txn) {
>> +        chassis_private_rec = sbrec_chassis_private_insert(ovnsb_idl_txn);
>> +        sbrec_chassis_private_set_name(chassis_private_rec, chassis_id);
>> +    }
>> +    *chassis_private = chassis_private_rec;
>> +
>>      ovs_chassis_cfg_destroy(&ovs_cfg);
>>      return chassis_rec;
>>  }
>> @@ -669,7 +682,8 @@ chassis_get_mac(const struct sbrec_chassis *chassis_rec,
>>   * required. */
>>  bool
>>  chassis_cleanup(struct ovsdb_idl_txn *ovnsb_idl_txn,
>> -                const struct sbrec_chassis *chassis_rec)
>> +                const struct sbrec_chassis *chassis_rec,
>> +                const struct sbrec_chassis_private *chassis_private_rec)
>>  {
>>      if (!chassis_rec) {
>>          return true;
>> @@ -679,6 +693,7 @@ chassis_cleanup(struct ovsdb_idl_txn *ovnsb_idl_txn,
>>                                    "ovn-controller: unregistering chassis '%s'",
>>                                    chassis_rec->name);
>>          sbrec_chassis_delete(chassis_rec);
>> +        sbrec_chassis_private_delete(chassis_private_rec);
>>      }
>>      return false;
>>  }
>> diff --git a/controller/chassis.h b/controller/chassis.h
>> index 178d2957e..81055b403 100644
>> --- a/controller/chassis.h
>> +++ b/controller/chassis.h
>> @@ -17,6 +17,7 @@
>>  #define OVN_CHASSIS_H 1
>>  
>>  #include <stdbool.h>
>> +#include "lib/ovn-sb-idl.h"
>>  
>>  struct ovsdb_idl;
>>  struct ovsdb_idl_index;
>> @@ -33,12 +34,15 @@ void chassis_register_ovs_idl(struct ovsdb_idl *);
>>  const struct sbrec_chassis *chassis_run(
>>      struct ovsdb_idl_txn *ovnsb_idl_txn,
>>      struct ovsdb_idl_index *sbrec_chassis_by_name,
>> +    struct ovsdb_idl_index *sbrec_chassis_private_by_name,
>>      const struct ovsrec_open_vswitch_table *,
>>      const struct sbrec_chassis_table *,
>>      const char *chassis_id, const struct ovsrec_bridge *br_int,
>> -    const struct sset *transport_zones);
>> +    const struct sset *transport_zones,
>> +    const struct sbrec_chassis_private **chassis_private);
>>  bool chassis_cleanup(struct ovsdb_idl_txn *ovnsb_idl_txn,
>> -                     const struct sbrec_chassis *);
>> +                     const struct sbrec_chassis *,
>> +                     const struct sbrec_chassis_private *);
>>  bool chassis_get_mac(const struct sbrec_chassis *chassis,
>>                       const char *bridge_mapping,
>>                       struct eth_addr *chassis_mac);
>> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
>> index 2893eaac1..1542b2ad7 100644
>> --- a/controller/ovn-controller.c
>> +++ b/controller/ovn-controller.c
>> @@ -155,6 +155,7 @@ update_sb_monitors(struct ovsdb_idl *ovnsb_idl,
>>      struct ovsdb_idl_condition ce =  OVSDB_IDL_CONDITION_INIT(&ce);
>>      struct ovsdb_idl_condition ip_mcast = OVSDB_IDL_CONDITION_INIT(&ip_mcast);
>>      struct ovsdb_idl_condition igmp = OVSDB_IDL_CONDITION_INIT(&igmp);
>> +    struct ovsdb_idl_condition chprv = OVSDB_IDL_CONDITION_INIT(&chprv);
>>  
>>      if (monitor_all) {
>>          ovsdb_idl_condition_add_clause_true(&pb);
>> @@ -165,6 +166,7 @@ update_sb_monitors(struct ovsdb_idl *ovnsb_idl,
>>          ovsdb_idl_condition_add_clause_true(&ce);
>>          ovsdb_idl_condition_add_clause_true(&ip_mcast);
>>          ovsdb_idl_condition_add_clause_true(&igmp);
>> +        ovsdb_idl_condition_add_clause_true(&chprv);
>>          goto out;
>>      }
>>  
>> @@ -196,6 +198,10 @@ update_sb_monitors(struct ovsdb_idl *ovnsb_idl,
>>                                                    &chassis->header_.uuid);
>>          sbrec_igmp_group_add_clause_chassis(&igmp, OVSDB_F_EQ,
>>                                              &chassis->header_.uuid);
>> +
>> +        /* Monitors Chassis_Private record for current chassis only */
>> +        sbrec_chassis_private_add_clause_name(&chprv, OVSDB_F_EQ,
>> +                                              chassis->name);
>>      }
>>      if (local_ifaces) {
>>          const char *name;
>> @@ -229,6 +235,7 @@ out:
>>      sbrec_controller_event_set_condition(ovnsb_idl, &ce);
>>      sbrec_ip_multicast_set_condition(ovnsb_idl, &ip_mcast);
>>      sbrec_igmp_group_set_condition(ovnsb_idl, &igmp);
>> +    sbrec_chassis_private_set_condition(ovnsb_idl, &chprv);
>>      ovsdb_idl_condition_destroy(&pb);
>>      ovsdb_idl_condition_destroy(&lf);
>>      ovsdb_idl_condition_destroy(&mb);
>> @@ -237,6 +244,7 @@ out:
>>      ovsdb_idl_condition_destroy(&ce);
>>      ovsdb_idl_condition_destroy(&ip_mcast);
>>      ovsdb_idl_condition_destroy(&igmp);
>> +    ovsdb_idl_condition_destroy(&chprv);
>>  }
>>  
>>  static const char *
>> @@ -1759,6 +1767,8 @@ main(int argc, char *argv[])
>>  
>>      struct ovsdb_idl_index *sbrec_chassis_by_name
>>          = chassis_index_create(ovnsb_idl_loop.idl);
>> +    struct ovsdb_idl_index *sbrec_chassis_private_by_name
>> +        = chassis_private_index_create(ovnsb_idl_loop.idl);
>>      struct ovsdb_idl_index *sbrec_multicast_group_by_name_datapath
>>          = mcast_group_index_create(ovnsb_idl_loop.idl);
>>      struct ovsdb_idl_index *sbrec_port_binding_by_name
>> @@ -1784,7 +1794,8 @@ main(int argc, char *argv[])
>>          = igmp_group_index_create(ovnsb_idl_loop.idl);
>>  
>>      ovsdb_idl_track_add_all(ovnsb_idl_loop.idl);
>> -    ovsdb_idl_omit_alert(ovnsb_idl_loop.idl, &sbrec_chassis_col_nb_cfg);
>> +    ovsdb_idl_omit_alert(ovnsb_idl_loop.idl,
>> +                         &sbrec_chassis_private_col_nb_cfg);
>>  
>>      /* Omit the external_ids column of all the tables except for -
>>       *  - DNS. pinctrl.c uses the external_ids column of DNS,
>> @@ -1820,6 +1831,10 @@ main(int argc, char *argv[])
>>      ovsdb_idl_omit(ovnsb_idl_loop.idl, &sbrec_connection_col_status);
>>      ovsdb_idl_omit(ovnsb_idl_loop.idl, &sbrec_connection_col_target);
>>  
>> +    /* Do not monitor Chassis_Private external_ids */
>> +    ovsdb_idl_omit(ovnsb_idl_loop.idl,
>> +                   &sbrec_chassis_private_col_external_ids);
>> +
>>      update_sb_monitors(ovnsb_idl_loop.idl, NULL, NULL, NULL, false);
>>  
>>      stopwatch_create(CONTROLLER_LOOP_STOPWATCH_NAME, SW_MS);
>> @@ -1997,10 +2012,13 @@ main(int argc, char *argv[])
>>                  process_br_int(ovs_idl_txn, bridge_table, ovs_table);
>>              const char *chassis_id = get_ovs_chassis_id(ovs_table);
>>              const struct sbrec_chassis *chassis = NULL;
>> +            const struct sbrec_chassis_private *chassis_private = NULL;
>>              if (chassis_id) {
>>                  chassis = chassis_run(ovnsb_idl_txn, sbrec_chassis_by_name,
>> +                                      sbrec_chassis_private_by_name,
>>                                        ovs_table, chassis_table, chassis_id,
>> -                                      br_int, &transport_zones);
>> +                                      br_int, &transport_zones,
>> +                                      &chassis_private);
>>              }
>>  
>>              if (br_int) {
>> @@ -2125,10 +2143,10 @@ main(int argc, char *argv[])
>>                  engine_set_force_recompute(false);
>>              }
>>  
>> -            if (ovnsb_idl_txn && chassis) {
>> +            if (ovnsb_idl_txn && chassis_private) {
>>                  int64_t cur_cfg = ofctrl_get_cur_cfg();
>> -                if (cur_cfg && cur_cfg != chassis->nb_cfg) {
>> -                    sbrec_chassis_set_nb_cfg(chassis, cur_cfg);
>> +                if (cur_cfg && cur_cfg != chassis_private->nb_cfg) {
>> +                    sbrec_chassis_private_set_nb_cfg(chassis_private, cur_cfg);
>>                  }
>>              }
>>  
>> @@ -2231,10 +2249,17 @@ main(int argc, char *argv[])
>>                     ? chassis_lookup_by_name(sbrec_chassis_by_name, chassis_id)
>>                     : NULL);
>>  
>> +            const struct sbrec_chassis_private *chassis_private
>> +                = (chassis_id
>> +                   ? chassis_private_lookup_by_name(
>> +                         sbrec_chassis_private_by_name, chassis_id)
>> +                   : NULL);
>> +
>>              /* Run all of the cleanup functions, even if one of them returns
>>               * false. We're done if all of them return true. */
>>              done = binding_cleanup(ovnsb_idl_txn, port_binding_table, chassis);
>> -            done = chassis_cleanup(ovnsb_idl_txn, chassis) && done;
>> +            done = chassis_cleanup(ovnsb_idl_txn,
>> +                                   chassis, chassis_private) && done;
>>              done = encaps_cleanup(ovs_idl_txn, br_int) && done;
>>              done = igmp_group_cleanup(ovnsb_idl_txn, sbrec_igmp_group) && done;
>>              if (done) {
>> diff --git a/lib/chassis-index.c b/lib/chassis-index.c
>> index 39066f4cc..d760124b4 100644
>> --- a/lib/chassis-index.c
>> +++ b/lib/chassis-index.c
>> @@ -40,6 +40,31 @@ chassis_lookup_by_name(struct ovsdb_idl_index *sbrec_chassis_by_name,
>>      return retval;
>>  }
>>  
>> +struct ovsdb_idl_index *
>> +chassis_private_index_create(struct ovsdb_idl *idl)
>> +{
>> +    return ovsdb_idl_index_create1(idl, &sbrec_chassis_private_col_name);
>> +}
>> +
>> +/* Finds and returns the chassis with the given 'name', or NULL if no such
>> + * chassis exists. */
>> +const struct sbrec_chassis_private *
>> +chassis_private_lookup_by_name(
>> +    struct ovsdb_idl_index *sbrec_chassis_private_by_name,
>> +    const char *name)
>> +{
>> +    struct sbrec_chassis_private *target =
>> +        sbrec_chassis_private_index_init_row(sbrec_chassis_private_by_name);
>> +    sbrec_chassis_private_index_set_name(target, name);
>> +
>> +    struct sbrec_chassis_private *retval = sbrec_chassis_private_index_find(
>> +        sbrec_chassis_private_by_name, target);
>> +
>> +    sbrec_chassis_private_index_destroy_row(target);
>> +
>> +    return retval;
>> +}
>> +
>>  struct ovsdb_idl_index *
>>  ha_chassis_group_index_create(struct ovsdb_idl *idl)
>>  {
>> diff --git a/lib/chassis-index.h b/lib/chassis-index.h
>> index 302e5f0fd..b9b331f34 100644
>> --- a/lib/chassis-index.h
>> +++ b/lib/chassis-index.h
>> @@ -23,6 +23,12 @@ struct ovsdb_idl_index *chassis_index_create(struct ovsdb_idl *);
>>  const struct sbrec_chassis *chassis_lookup_by_name(
>>      struct ovsdb_idl_index *sbrec_chassis_by_name, const char *name);
>>  
>> +struct ovsdb_idl_index *chassis_private_index_create(struct ovsdb_idl *);
>> +
>> +const struct sbrec_chassis_private *
>> +chassis_private_lookup_by_name(
>> +    struct ovsdb_idl_index *sbrec_chassis_private_by_name, const char *name);
>> +
>>  struct ovsdb_idl_index *ha_chassis_group_index_create(struct ovsdb_idl *idl);
>>  const struct sbrec_ha_chassis_group *ha_chassis_group_lookup_by_name(
>>      struct ovsdb_idl_index *sbrec_ha_chassis_grp_by_name, const char *name);
>> diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
>> index d42a9892a..cb2a7e0d6 100644
>> --- a/northd/ovn-northd.c
>> +++ b/northd/ovn-northd.c
>> @@ -11174,6 +11174,11 @@ static const char *rbac_chassis_auth[] =
>>  static const char *rbac_chassis_update[] =
>>      {"nb_cfg", "external_ids", "encaps", "vtep_logical_switches"};
>>  
>> +static const char *rbac_chassis_private_auth[] =
>> +    {"chassis_name"};
>> +static const char *rbac_chassis_private_update[] =
>> +    {"nb_cfg"};
>> +
>>  static const char *rbac_encap_auth[] =
>>      {"chassis_name"};
>>  static const char *rbac_encap_update[] =
>> @@ -11211,6 +11216,14 @@ static struct rbac_perm_cfg {
>>          .update = rbac_chassis_update,
>>          .n_update = ARRAY_SIZE(rbac_chassis_update),
>>          .row = NULL
>> +    },{
>> +        .table = "Chassis_Private",
>> +        .auth = rbac_chassis_private_auth,
>> +        .n_auth = ARRAY_SIZE(rbac_chassis_private_auth),
>> +        .insdel = true,
>> +        .update = rbac_chassis_private_update,
>> +        .n_update = ARRAY_SIZE(rbac_chassis_private_update),
>> +        .row = NULL
>>      },{
>>          .table = "Encap",
>>          .auth = rbac_encap_auth,
>> @@ -11380,11 +11393,10 @@ update_northbound_cfg(struct northd_context *ctx,
>>      /* Update northbound hv_cfg if appropriate. */
>>      if (nbg) {
>>          /* Find minimum nb_cfg among all chassis. */
>> -        const struct sbrec_chassis *chassis;
>> +        const struct sbrec_chassis_private *chassis;
>>          int64_t hv_cfg = nbg->nb_cfg;
>> -        SBREC_CHASSIS_FOR_EACH (chassis, ctx->ovnsb_idl) {
>> -            if (!smap_get_bool(&chassis->external_ids, "is-remote", false) &&
>> -                chassis->nb_cfg < hv_cfg) {
> 
> I think this is ok, but maybe Han can confirm that this won't break OVN-IC.
> 
>> +        SBREC_CHASSIS_PRIVATE_FOR_EACH (chassis, ctx->ovnsb_idl) {
>> +            if (chassis->nb_cfg < hv_cfg) {
>>                  hv_cfg = chassis->nb_cfg;
>>              }
>>          }
>> @@ -11670,10 +11682,15 @@ main(int argc, char *argv[])
>>      ovsdb_idl_add_column(ovnsb_idl_loop.idl, &sbrec_meter_band_col_burst_size);
>>  
>>      ovsdb_idl_add_table(ovnsb_idl_loop.idl, &sbrec_table_chassis);
>> -    ovsdb_idl_add_column(ovnsb_idl_loop.idl, &sbrec_chassis_col_nb_cfg);
>>      ovsdb_idl_add_column(ovnsb_idl_loop.idl, &sbrec_chassis_col_name);
>>      ovsdb_idl_add_column(ovnsb_idl_loop.idl, &sbrec_chassis_col_external_ids);
> 
> Do we still need to monitor sbrec_chassis_col_external_ids? I don't see
> it being used anywhere in ovn-northd anymore.
> 

I'm not sure if this is related but with your patch I see failures of
the ovn-ic unit tests every now and then. I didn't see them with OVN master.

I ran the tests in a loop like this (50 times in a row until one fails):

$ bash -c "for ((i=0; i<50; i++)); do make check TESTSUITEFLAGS='-k
"ovn-ic"' || exit ; done"
[...]
OVN Interconnection Controller

218: ovn-ic -- AZ register                           ok
219: ovn-ic -- transit switch handling               ok
220: ovn-ic -- gateway sync                          FAILED (ovn-ic.at:82)
221: ovn-ic -- port sync                             ok
222: ovn-ic -- route sync                            ok

Thanks,
Dumitru

>>  
>> +    ovsdb_idl_add_table(ovnsb_idl_loop.idl, &sbrec_table_chassis_private);
>> +    ovsdb_idl_add_column(ovnsb_idl_loop.idl,
>> +                         &sbrec_chassis_private_col_nb_cfg);
>> +    add_column_noalert(ovnsb_idl_loop.idl,
>> +                       &sbrec_chassis_private_col_external_ids);
>> +
> 
> Same here, do we need to monitor sbrec_chassis_private_col_external_ids?
> We don't seem to use it in ovn-northd.
> 
>>      ovsdb_idl_add_table(ovnsb_idl_loop.idl, &sbrec_table_ha_chassis);
>>      add_column_noalert(ovnsb_idl_loop.idl,
>>                         &sbrec_ha_chassis_col_chassis);
>> diff --git a/ovn-sb.ovsschema b/ovn-sb.ovsschema
>> index d89f8dbbb..fa53c6a4c 100644
>> --- a/ovn-sb.ovsschema
>> +++ b/ovn-sb.ovsschema
>> @@ -1,7 +1,7 @@
>>  {
>>      "name": "OVN_Southbound",
>> -    "version": "2.7.0",
>> -    "cksum": "4286723485 21693",
>> +    "version": "2.8.0",
>> +    "cksum": "3504843097 22072",
>>      "tables": {
>>          "SB_Global": {
>>              "columns": {
>> @@ -43,6 +43,15 @@
>>                                                "max": "unlimited"}}},
>>              "isRoot": true,
>>              "indexes": [["name"]]},
>> +        "Chassis_Private": {
>> +            "columns": {
>> +                "name": {"type": "string"},
>> +                "nb_cfg": {"type": {"key": "integer"}},
>> +                "external_ids": {
>> +                    "type": {"key": "string", "value": "string",
>> +                             "min": 0, "max": "unlimited"}}},
>> +            "isRoot": true,
>> +            "indexes": [["name"]]},
>>          "Encap": {
>>              "columns": {
>>                  "type": {"type": {"key": {
>> diff --git a/ovn-sb.xml b/ovn-sb.xml
>> index 3ae9d4f92..77c5b1f38 100644
>> --- a/ovn-sb.xml
>> +++ b/ovn-sb.xml
>> @@ -256,10 +256,8 @@
>>      </column>
>>  
>>      <column name="nb_cfg">
>> -      Sequence number for the configuration.  When <code>ovn-controller</code>
>> -      updates the configuration of a chassis from the contents of the
>> -      southbound database, it copies <ref table="SB_Global" column="nb_cfg"/>
>> -      from the <ref table="SB_Global"/> table into this column.
>> +      Deprecated. This column is replaced by the <ref table="Chassis_Private"
>> +      column="nb_cfg"/> column of the <ref table="Chassis_Private"/> table.
>>      </column>
>>  
>>      <column name="external_ids" key="ovn-bridge-mappings">
>> @@ -366,6 +364,38 @@
>>      </group>
>>    </table>
>>  
>> +  <table name="Chassis_Private" title="Chassis Private">
>> +    <p>
>> +      Each row in this table maintains per chassis private data that are
>> +      accessed only by the owning chassis (write only) and ovn-northd, not by
>> +      any other chassis.  These data are stored in this separate table instead
>> +      of the <ref table="Chassis"/> table for performance considerations:
>> +      the rows in this table can be conditionally monitored by chassises so
>> +      that each chassis only get update notifications for its own row, to avoid
>> +      unnecessary chassis private data update flooding in a large scale
>> +      deployment.  (Future: this separation can be avoided if ovsdb conditional
>> +      monitoring is supported on a set of columns)
> 
> Maybe this note about ovsdb conditional monitoring on sets of columns is
> better to appear in TODO.rst instead of here?
> 
> Thanks,
> Dumitru
> 
>> +    </p>
>> +
>> +    <column name="name">
>> +      The name of the chassis that owns these chassis-private data.
>> +    </column>
>> +
>> +    <column name="nb_cfg">
>> +      Sequence number for the configuration.  When <code>ovn-controller</code>
>> +      updates the configuration of a chassis from the contents of the
>> +      southbound database, it copies <ref table="SB_Global" column="nb_cfg"/>
>> +      from the <ref table="SB_Global"/> table into this column.
>> +    </column>
>> +
>> +    <group title="Common Columns">
>> +      The overall purpose of these columns is described under <code>Common
>> +      Columns</code> at the beginning of this document.
>> +
>> +      <column name="external_ids"/>
>> +    </group>
>> +  </table>
>> +
>>    <table name="Encap" title="Encapsulation Types">
>>      <p>
>>        The <ref column="encaps" table="Chassis"/> column in the <ref
>> diff --git a/tests/ovn-controller.at b/tests/ovn-controller.at
>> index 63b2581c0..8c6e03611 100644
>> --- a/tests/ovn-controller.at
>> +++ b/tests/ovn-controller.at
>> @@ -321,3 +321,29 @@ as ovn-sb
>>  OVS_APP_EXIT_AND_WAIT([ovsdb-server])
>>  
>>  AT_CLEANUP
>> +
>> +# Checks that ovn-controller increments the nb_cfg value in the Chassis_Private table
>> +AT_SETUP([ovn-controller - Bump Chassis_Private nb_cfg value])
>> +AT_KEYWORDS([ovn])
>> +ovn_start
>> +
>> +net_add n1
>> +sim_add hv
>> +as hv
>> +ovs-vsctl add-br br-phys
>> +ovn_attach n1 br-phys 192.168.0.1
>> +
>> +OVS_WAIT_UNTIL([test xhv = x`ovn-sbctl --columns name --bare find chassis`])
>> +
>> +# Bump the NB_Global nb_cfg value
>> +nb_global_id=$(ovn-nbctl --columns _uuid --bare find nb_global)
>> +ovn-nbctl set NB_Global ${nb_global_id} nb_cfg=999
>> +
>> +# ovn-controller should bump the nb_cfg in the chassis_private table
>> +OVS_WAIT_UNTIL([test x999 = x`ovn-sbctl --columns nb_cfg --bare find chassis_private`])
>> +
>> +# Assert that the the nb_cfg from the Chassis table was not incremented
>> +OVS_WAIT_UNTIL([test x0 = x`ovn-sbctl --columns nb_cfg --bare find chassis`])
>> +
>> +OVN_CLEANUP([hv])
>> +AT_CLEANUP
>>
>
Lucas Alvares Gomes March 23, 2020, 11:03 a.m. UTC | #3
Hi,

On Fri, Mar 20, 2020 at 10:07 AM Dumitru Ceara <dceara@redhat.com> wrote:
>
> On 3/13/20 2:18 PM, lmartins@redhat.com wrote:
> > From: Lucas Alvares Gomes <lucasagomes@gmail.com>
> >
> > nb_cfg as a mechanism to "ping" OVN control plane is very useful
> > in many ways. However, the current implementation will trigger
> > update notifications flooding in the whole control plane. Each
> > HV updates to SB the nb_cfg number and all these updates are
> > notified to all the other HVs, which is O(n^2). Although updates
> > are batched in fewers notifications than n^2, it still generates
> > significant load on SB DB and also on ovn-controllers.
> >
> > To solve this problem and make the mechanism more useful in large
> > scale producation deployment, this patch separates the per HV
> > *private* data (write only by the owning chassis and not
> > interesting to any other HVs) from the Chassis table to a separate
> > table, so that each HV can conditionally monitor and get updates
> > only for its own record.
> >
>
> Hi Lucas,
>
> Overall this looks good to me. I do have a few small comments/questions,
> please see below.
>

Thank you very much for taking a look into this

> > Test result shows great improvement:
> > In a test environment with 1K sandbox HVs, and 10K ports created
> > on 40 lswitches and 8 lrouters, do the sync test when the system
> > is idle, with command:
> >
> >     time ovn-nbctl --wait=hv sync
> >
> > Original result:
> > real    4m52.926s
> > user    0m0.328s
> > sys     0m0.004s
> >
> > With this patch:
> > real    0m11.405s
> > user    0m0.316s
> > sys     0m0.016s
> >
> > Also, regarding backwards compatibility note that the nb_cfg from the
> > Chassis table is no longer updated. If any system is relying on this
> > mechanism they should start using the nb_cfg from the Chassis_Private
> > table from now on.
>
> I see we mention it in the man pages but should we also add an item to
> NEWS to make it clear that CMSs must use this new mechanism?
>

Will do

> >
> > Co-authored-by: Han Zhou <hzhou8@ebay.com>
> > Signed-off-by: Han Zhou <hzhou8@ebay.com>
> > Signed-off-by: Lucas Alvares Gomes <lucasagomes@gmail.com>
> > ---
> >  controller/chassis.c        | 19 +++++++++++++++++--
> >  controller/chassis.h        |  8 ++++++--
> >  controller/ovn-controller.c | 37 ++++++++++++++++++++++++++++++------
> >  lib/chassis-index.c         | 25 ++++++++++++++++++++++++
> >  lib/chassis-index.h         |  6 ++++++
> >  northd/ovn-northd.c         | 27 +++++++++++++++++++++-----
> >  ovn-sb.ovsschema            | 13 +++++++++++--
> >  ovn-sb.xml                  | 38 +++++++++++++++++++++++++++++++++----
> >  tests/ovn-controller.at     | 26 +++++++++++++++++++++++++
> >  9 files changed, 178 insertions(+), 21 deletions(-)
> >
> > diff --git a/controller/chassis.c b/controller/chassis.c
> > index 522893ead..99ea6b8fc 100644
> > --- a/controller/chassis.c
> > +++ b/controller/chassis.c
> > @@ -585,14 +585,18 @@ chassis_update(const struct sbrec_chassis *chassis_rec,
> >  const struct sbrec_chassis *
> >  chassis_run(struct ovsdb_idl_txn *ovnsb_idl_txn,
> >              struct ovsdb_idl_index *sbrec_chassis_by_name,
> > +            struct ovsdb_idl_index *sbrec_chassis_private_by_name,
> >              const struct ovsrec_open_vswitch_table *ovs_table,
> >              const struct sbrec_chassis_table *chassis_table,
> >              const char *chassis_id,
> >              const struct ovsrec_bridge *br_int,
> > -            const struct sset *transport_zones)
> > +            const struct sset *transport_zones,
> > +            const struct sbrec_chassis_private **chassis_private)
> >  {
> >      struct ovs_chassis_cfg ovs_cfg;
> >
> > +    *chassis_private = NULL;
> > +
> >      /* Get the chassis config from the ovs table. */
> >      ovs_chassis_cfg_init(&ovs_cfg);
> >      if (!chassis_parse_ovs_config(ovs_table, br_int, &ovs_cfg)) {
> > @@ -616,6 +620,15 @@ chassis_run(struct ovsdb_idl_txn *ovnsb_idl_txn,
> >                                    chassis_id);
> >      }
> >
> > +    const struct sbrec_chassis_private *chassis_private_rec =
> > +        chassis_private_lookup_by_name(sbrec_chassis_private_by_name,
> > +                                       chassis_id);
> > +    if (!chassis_private_rec && ovnsb_idl_txn) {
> > +        chassis_private_rec = sbrec_chassis_private_insert(ovnsb_idl_txn);
> > +        sbrec_chassis_private_set_name(chassis_private_rec, chassis_id);
> > +    }
> > +    *chassis_private = chassis_private_rec;
> > +
> >      ovs_chassis_cfg_destroy(&ovs_cfg);
> >      return chassis_rec;
> >  }
> > @@ -669,7 +682,8 @@ chassis_get_mac(const struct sbrec_chassis *chassis_rec,
> >   * required. */
> >  bool
> >  chassis_cleanup(struct ovsdb_idl_txn *ovnsb_idl_txn,
> > -                const struct sbrec_chassis *chassis_rec)
> > +                const struct sbrec_chassis *chassis_rec,
> > +                const struct sbrec_chassis_private *chassis_private_rec)
> >  {
> >      if (!chassis_rec) {
> >          return true;
> > @@ -679,6 +693,7 @@ chassis_cleanup(struct ovsdb_idl_txn *ovnsb_idl_txn,
> >                                    "ovn-controller: unregistering chassis '%s'",
> >                                    chassis_rec->name);
> >          sbrec_chassis_delete(chassis_rec);
> > +        sbrec_chassis_private_delete(chassis_private_rec);
> >      }
> >      return false;
> >  }
> > diff --git a/controller/chassis.h b/controller/chassis.h
> > index 178d2957e..81055b403 100644
> > --- a/controller/chassis.h
> > +++ b/controller/chassis.h
> > @@ -17,6 +17,7 @@
> >  #define OVN_CHASSIS_H 1
> >
> >  #include <stdbool.h>
> > +#include "lib/ovn-sb-idl.h"
> >
> >  struct ovsdb_idl;
> >  struct ovsdb_idl_index;
> > @@ -33,12 +34,15 @@ void chassis_register_ovs_idl(struct ovsdb_idl *);
> >  const struct sbrec_chassis *chassis_run(
> >      struct ovsdb_idl_txn *ovnsb_idl_txn,
> >      struct ovsdb_idl_index *sbrec_chassis_by_name,
> > +    struct ovsdb_idl_index *sbrec_chassis_private_by_name,
> >      const struct ovsrec_open_vswitch_table *,
> >      const struct sbrec_chassis_table *,
> >      const char *chassis_id, const struct ovsrec_bridge *br_int,
> > -    const struct sset *transport_zones);
> > +    const struct sset *transport_zones,
> > +    const struct sbrec_chassis_private **chassis_private);
> >  bool chassis_cleanup(struct ovsdb_idl_txn *ovnsb_idl_txn,
> > -                     const struct sbrec_chassis *);
> > +                     const struct sbrec_chassis *,
> > +                     const struct sbrec_chassis_private *);
> >  bool chassis_get_mac(const struct sbrec_chassis *chassis,
> >                       const char *bridge_mapping,
> >                       struct eth_addr *chassis_mac);
> > diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
> > index 2893eaac1..1542b2ad7 100644
> > --- a/controller/ovn-controller.c
> > +++ b/controller/ovn-controller.c
> > @@ -155,6 +155,7 @@ update_sb_monitors(struct ovsdb_idl *ovnsb_idl,
> >      struct ovsdb_idl_condition ce =  OVSDB_IDL_CONDITION_INIT(&ce);
> >      struct ovsdb_idl_condition ip_mcast = OVSDB_IDL_CONDITION_INIT(&ip_mcast);
> >      struct ovsdb_idl_condition igmp = OVSDB_IDL_CONDITION_INIT(&igmp);
> > +    struct ovsdb_idl_condition chprv = OVSDB_IDL_CONDITION_INIT(&chprv);
> >
> >      if (monitor_all) {
> >          ovsdb_idl_condition_add_clause_true(&pb);
> > @@ -165,6 +166,7 @@ update_sb_monitors(struct ovsdb_idl *ovnsb_idl,
> >          ovsdb_idl_condition_add_clause_true(&ce);
> >          ovsdb_idl_condition_add_clause_true(&ip_mcast);
> >          ovsdb_idl_condition_add_clause_true(&igmp);
> > +        ovsdb_idl_condition_add_clause_true(&chprv);
> >          goto out;
> >      }
> >
> > @@ -196,6 +198,10 @@ update_sb_monitors(struct ovsdb_idl *ovnsb_idl,
> >                                                    &chassis->header_.uuid);
> >          sbrec_igmp_group_add_clause_chassis(&igmp, OVSDB_F_EQ,
> >                                              &chassis->header_.uuid);
> > +
> > +        /* Monitors Chassis_Private record for current chassis only */
> > +        sbrec_chassis_private_add_clause_name(&chprv, OVSDB_F_EQ,
> > +                                              chassis->name);
> >      }
> >      if (local_ifaces) {
> >          const char *name;
> > @@ -229,6 +235,7 @@ out:
> >      sbrec_controller_event_set_condition(ovnsb_idl, &ce);
> >      sbrec_ip_multicast_set_condition(ovnsb_idl, &ip_mcast);
> >      sbrec_igmp_group_set_condition(ovnsb_idl, &igmp);
> > +    sbrec_chassis_private_set_condition(ovnsb_idl, &chprv);
> >      ovsdb_idl_condition_destroy(&pb);
> >      ovsdb_idl_condition_destroy(&lf);
> >      ovsdb_idl_condition_destroy(&mb);
> > @@ -237,6 +244,7 @@ out:
> >      ovsdb_idl_condition_destroy(&ce);
> >      ovsdb_idl_condition_destroy(&ip_mcast);
> >      ovsdb_idl_condition_destroy(&igmp);
> > +    ovsdb_idl_condition_destroy(&chprv);
> >  }
> >
> >  static const char *
> > @@ -1759,6 +1767,8 @@ main(int argc, char *argv[])
> >
> >      struct ovsdb_idl_index *sbrec_chassis_by_name
> >          = chassis_index_create(ovnsb_idl_loop.idl);
> > +    struct ovsdb_idl_index *sbrec_chassis_private_by_name
> > +        = chassis_private_index_create(ovnsb_idl_loop.idl);
> >      struct ovsdb_idl_index *sbrec_multicast_group_by_name_datapath
> >          = mcast_group_index_create(ovnsb_idl_loop.idl);
> >      struct ovsdb_idl_index *sbrec_port_binding_by_name
> > @@ -1784,7 +1794,8 @@ main(int argc, char *argv[])
> >          = igmp_group_index_create(ovnsb_idl_loop.idl);
> >
> >      ovsdb_idl_track_add_all(ovnsb_idl_loop.idl);
> > -    ovsdb_idl_omit_alert(ovnsb_idl_loop.idl, &sbrec_chassis_col_nb_cfg);
> > +    ovsdb_idl_omit_alert(ovnsb_idl_loop.idl,
> > +                         &sbrec_chassis_private_col_nb_cfg);
> >
> >      /* Omit the external_ids column of all the tables except for -
> >       *  - DNS. pinctrl.c uses the external_ids column of DNS,
> > @@ -1820,6 +1831,10 @@ main(int argc, char *argv[])
> >      ovsdb_idl_omit(ovnsb_idl_loop.idl, &sbrec_connection_col_status);
> >      ovsdb_idl_omit(ovnsb_idl_loop.idl, &sbrec_connection_col_target);
> >
> > +    /* Do not monitor Chassis_Private external_ids */
> > +    ovsdb_idl_omit(ovnsb_idl_loop.idl,
> > +                   &sbrec_chassis_private_col_external_ids);
> > +
> >      update_sb_monitors(ovnsb_idl_loop.idl, NULL, NULL, NULL, false);
> >
> >      stopwatch_create(CONTROLLER_LOOP_STOPWATCH_NAME, SW_MS);
> > @@ -1997,10 +2012,13 @@ main(int argc, char *argv[])
> >                  process_br_int(ovs_idl_txn, bridge_table, ovs_table);
> >              const char *chassis_id = get_ovs_chassis_id(ovs_table);
> >              const struct sbrec_chassis *chassis = NULL;
> > +            const struct sbrec_chassis_private *chassis_private = NULL;
> >              if (chassis_id) {
> >                  chassis = chassis_run(ovnsb_idl_txn, sbrec_chassis_by_name,
> > +                                      sbrec_chassis_private_by_name,
> >                                        ovs_table, chassis_table, chassis_id,
> > -                                      br_int, &transport_zones);
> > +                                      br_int, &transport_zones,
> > +                                      &chassis_private);
> >              }
> >
> >              if (br_int) {
> > @@ -2125,10 +2143,10 @@ main(int argc, char *argv[])
> >                  engine_set_force_recompute(false);
> >              }
> >
> > -            if (ovnsb_idl_txn && chassis) {
> > +            if (ovnsb_idl_txn && chassis_private) {
> >                  int64_t cur_cfg = ofctrl_get_cur_cfg();
> > -                if (cur_cfg && cur_cfg != chassis->nb_cfg) {
> > -                    sbrec_chassis_set_nb_cfg(chassis, cur_cfg);
> > +                if (cur_cfg && cur_cfg != chassis_private->nb_cfg) {
> > +                    sbrec_chassis_private_set_nb_cfg(chassis_private, cur_cfg);
> >                  }
> >              }
> >
> > @@ -2231,10 +2249,17 @@ main(int argc, char *argv[])
> >                     ? chassis_lookup_by_name(sbrec_chassis_by_name, chassis_id)
> >                     : NULL);
> >
> > +            const struct sbrec_chassis_private *chassis_private
> > +                = (chassis_id
> > +                   ? chassis_private_lookup_by_name(
> > +                         sbrec_chassis_private_by_name, chassis_id)
> > +                   : NULL);
> > +
> >              /* Run all of the cleanup functions, even if one of them returns
> >               * false. We're done if all of them return true. */
> >              done = binding_cleanup(ovnsb_idl_txn, port_binding_table, chassis);
> > -            done = chassis_cleanup(ovnsb_idl_txn, chassis) && done;
> > +            done = chassis_cleanup(ovnsb_idl_txn,
> > +                                   chassis, chassis_private) && done;
> >              done = encaps_cleanup(ovs_idl_txn, br_int) && done;
> >              done = igmp_group_cleanup(ovnsb_idl_txn, sbrec_igmp_group) && done;
> >              if (done) {
> > diff --git a/lib/chassis-index.c b/lib/chassis-index.c
> > index 39066f4cc..d760124b4 100644
> > --- a/lib/chassis-index.c
> > +++ b/lib/chassis-index.c
> > @@ -40,6 +40,31 @@ chassis_lookup_by_name(struct ovsdb_idl_index *sbrec_chassis_by_name,
> >      return retval;
> >  }
> >
> > +struct ovsdb_idl_index *
> > +chassis_private_index_create(struct ovsdb_idl *idl)
> > +{
> > +    return ovsdb_idl_index_create1(idl, &sbrec_chassis_private_col_name);
> > +}
> > +
> > +/* Finds and returns the chassis with the given 'name', or NULL if no such
> > + * chassis exists. */
> > +const struct sbrec_chassis_private *
> > +chassis_private_lookup_by_name(
> > +    struct ovsdb_idl_index *sbrec_chassis_private_by_name,
> > +    const char *name)
> > +{
> > +    struct sbrec_chassis_private *target =
> > +        sbrec_chassis_private_index_init_row(sbrec_chassis_private_by_name);
> > +    sbrec_chassis_private_index_set_name(target, name);
> > +
> > +    struct sbrec_chassis_private *retval = sbrec_chassis_private_index_find(
> > +        sbrec_chassis_private_by_name, target);
> > +
> > +    sbrec_chassis_private_index_destroy_row(target);
> > +
> > +    return retval;
> > +}
> > +
> >  struct ovsdb_idl_index *
> >  ha_chassis_group_index_create(struct ovsdb_idl *idl)
> >  {
> > diff --git a/lib/chassis-index.h b/lib/chassis-index.h
> > index 302e5f0fd..b9b331f34 100644
> > --- a/lib/chassis-index.h
> > +++ b/lib/chassis-index.h
> > @@ -23,6 +23,12 @@ struct ovsdb_idl_index *chassis_index_create(struct ovsdb_idl *);
> >  const struct sbrec_chassis *chassis_lookup_by_name(
> >      struct ovsdb_idl_index *sbrec_chassis_by_name, const char *name);
> >
> > +struct ovsdb_idl_index *chassis_private_index_create(struct ovsdb_idl *);
> > +
> > +const struct sbrec_chassis_private *
> > +chassis_private_lookup_by_name(
> > +    struct ovsdb_idl_index *sbrec_chassis_private_by_name, const char *name);
> > +
> >  struct ovsdb_idl_index *ha_chassis_group_index_create(struct ovsdb_idl *idl);
> >  const struct sbrec_ha_chassis_group *ha_chassis_group_lookup_by_name(
> >      struct ovsdb_idl_index *sbrec_ha_chassis_grp_by_name, const char *name);
> > diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
> > index d42a9892a..cb2a7e0d6 100644
> > --- a/northd/ovn-northd.c
> > +++ b/northd/ovn-northd.c
> > @@ -11174,6 +11174,11 @@ static const char *rbac_chassis_auth[] =
> >  static const char *rbac_chassis_update[] =
> >      {"nb_cfg", "external_ids", "encaps", "vtep_logical_switches"};
> >
> > +static const char *rbac_chassis_private_auth[] =
> > +    {"chassis_name"};
> > +static const char *rbac_chassis_private_update[] =
> > +    {"nb_cfg"};
> > +
> >  static const char *rbac_encap_auth[] =
> >      {"chassis_name"};
> >  static const char *rbac_encap_update[] =
> > @@ -11211,6 +11216,14 @@ static struct rbac_perm_cfg {
> >          .update = rbac_chassis_update,
> >          .n_update = ARRAY_SIZE(rbac_chassis_update),
> >          .row = NULL
> > +    },{
> > +        .table = "Chassis_Private",
> > +        .auth = rbac_chassis_private_auth,
> > +        .n_auth = ARRAY_SIZE(rbac_chassis_private_auth),
> > +        .insdel = true,
> > +        .update = rbac_chassis_private_update,
> > +        .n_update = ARRAY_SIZE(rbac_chassis_private_update),
> > +        .row = NULL
> >      },{
> >          .table = "Encap",
> >          .auth = rbac_encap_auth,
> > @@ -11380,11 +11393,10 @@ update_northbound_cfg(struct northd_context *ctx,
> >      /* Update northbound hv_cfg if appropriate. */
> >      if (nbg) {
> >          /* Find minimum nb_cfg among all chassis. */
> > -        const struct sbrec_chassis *chassis;
> > +        const struct sbrec_chassis_private *chassis;
> >          int64_t hv_cfg = nbg->nb_cfg;
> > -        SBREC_CHASSIS_FOR_EACH (chassis, ctx->ovnsb_idl) {
> > -            if (!smap_get_bool(&chassis->external_ids, "is-remote", false) &&
> > -                chassis->nb_cfg < hv_cfg) {
>
> I think this is ok, but maybe Han can confirm that this won't break OVN-IC.
>
> > +        SBREC_CHASSIS_PRIVATE_FOR_EACH (chassis, ctx->ovnsb_idl) {
> > +            if (chassis->nb_cfg < hv_cfg) {
> >                  hv_cfg = chassis->nb_cfg;
> >              }
> >          }
> > @@ -11670,10 +11682,15 @@ main(int argc, char *argv[])
> >      ovsdb_idl_add_column(ovnsb_idl_loop.idl, &sbrec_meter_band_col_burst_size);
> >
> >      ovsdb_idl_add_table(ovnsb_idl_loop.idl, &sbrec_table_chassis);
> > -    ovsdb_idl_add_column(ovnsb_idl_loop.idl, &sbrec_chassis_col_nb_cfg);
> >      ovsdb_idl_add_column(ovnsb_idl_loop.idl, &sbrec_chassis_col_name);
> >      ovsdb_idl_add_column(ovnsb_idl_loop.idl, &sbrec_chassis_col_external_ids);
>
> Do we still need to monitor sbrec_chassis_col_external_ids? I don't see
> it being used anywhere in ovn-northd anymore.
>

Indeed, I think we can get rid of it. I will investigate

> >
> > +    ovsdb_idl_add_table(ovnsb_idl_loop.idl, &sbrec_table_chassis_private);
> > +    ovsdb_idl_add_column(ovnsb_idl_loop.idl,
> > +                         &sbrec_chassis_private_col_nb_cfg);
> > +    add_column_noalert(ovnsb_idl_loop.idl,
> > +                       &sbrec_chassis_private_col_external_ids);
> > +
>
> Same here, do we need to monitor sbrec_chassis_private_col_external_ids?
> We don't seem to use it in ovn-northd.
>

Will remove it

> >      ovsdb_idl_add_table(ovnsb_idl_loop.idl, &sbrec_table_ha_chassis);
> >      add_column_noalert(ovnsb_idl_loop.idl,
> >                         &sbrec_ha_chassis_col_chassis);
> > diff --git a/ovn-sb.ovsschema b/ovn-sb.ovsschema
> > index d89f8dbbb..fa53c6a4c 100644
> > --- a/ovn-sb.ovsschema
> > +++ b/ovn-sb.ovsschema
> > @@ -1,7 +1,7 @@
> >  {
> >      "name": "OVN_Southbound",
> > -    "version": "2.7.0",
> > -    "cksum": "4286723485 21693",
> > +    "version": "2.8.0",
> > +    "cksum": "3504843097 22072",
> >      "tables": {
> >          "SB_Global": {
> >              "columns": {
> > @@ -43,6 +43,15 @@
> >                                                "max": "unlimited"}}},
> >              "isRoot": true,
> >              "indexes": [["name"]]},
> > +        "Chassis_Private": {
> > +            "columns": {
> > +                "name": {"type": "string"},
> > +                "nb_cfg": {"type": {"key": "integer"}},
> > +                "external_ids": {
> > +                    "type": {"key": "string", "value": "string",
> > +                             "min": 0, "max": "unlimited"}}},
> > +            "isRoot": true,
> > +            "indexes": [["name"]]},
> >          "Encap": {
> >              "columns": {
> >                  "type": {"type": {"key": {
> > diff --git a/ovn-sb.xml b/ovn-sb.xml
> > index 3ae9d4f92..77c5b1f38 100644
> > --- a/ovn-sb.xml
> > +++ b/ovn-sb.xml
> > @@ -256,10 +256,8 @@
> >      </column>
> >
> >      <column name="nb_cfg">
> > -      Sequence number for the configuration.  When <code>ovn-controller</code>
> > -      updates the configuration of a chassis from the contents of the
> > -      southbound database, it copies <ref table="SB_Global" column="nb_cfg"/>
> > -      from the <ref table="SB_Global"/> table into this column.
> > +      Deprecated. This column is replaced by the <ref table="Chassis_Private"
> > +      column="nb_cfg"/> column of the <ref table="Chassis_Private"/> table.
> >      </column>
> >
> >      <column name="external_ids" key="ovn-bridge-mappings">
> > @@ -366,6 +364,38 @@
> >      </group>
> >    </table>
> >
> > +  <table name="Chassis_Private" title="Chassis Private">
> > +    <p>
> > +      Each row in this table maintains per chassis private data that are
> > +      accessed only by the owning chassis (write only) and ovn-northd, not by
> > +      any other chassis.  These data are stored in this separate table instead
> > +      of the <ref table="Chassis"/> table for performance considerations:
> > +      the rows in this table can be conditionally monitored by chassises so
> > +      that each chassis only get update notifications for its own row, to avoid
> > +      unnecessary chassis private data update flooding in a large scale
> > +      deployment.  (Future: this separation can be avoided if ovsdb conditional
> > +      monitoring is supported on a set of columns)
>
> Maybe this note about ovsdb conditional monitoring on sets of columns is
> better to appear in TODO.rst instead of here?
>

I will add an item on the TODO list regarding that potential feature.

> Thanks,
> Dumitru
>
> > +    </p>
> > +
> > +    <column name="name">
> > +      The name of the chassis that owns these chassis-private data.
> > +    </column>
> > +
> > +    <column name="nb_cfg">
> > +      Sequence number for the configuration.  When <code>ovn-controller</code>
> > +      updates the configuration of a chassis from the contents of the
> > +      southbound database, it copies <ref table="SB_Global" column="nb_cfg"/>
> > +      from the <ref table="SB_Global"/> table into this column.
> > +    </column>
> > +
> > +    <group title="Common Columns">
> > +      The overall purpose of these columns is described under <code>Common
> > +      Columns</code> at the beginning of this document.
> > +
> > +      <column name="external_ids"/>
> > +    </group>
> > +  </table>
> > +
> >    <table name="Encap" title="Encapsulation Types">
> >      <p>
> >        The <ref column="encaps" table="Chassis"/> column in the <ref
> > diff --git a/tests/ovn-controller.at b/tests/ovn-controller.at
> > index 63b2581c0..8c6e03611 100644
> > --- a/tests/ovn-controller.at
> > +++ b/tests/ovn-controller.at
> > @@ -321,3 +321,29 @@ as ovn-sb
> >  OVS_APP_EXIT_AND_WAIT([ovsdb-server])
> >
> >  AT_CLEANUP
> > +
> > +# Checks that ovn-controller increments the nb_cfg value in the Chassis_Private table
> > +AT_SETUP([ovn-controller - Bump Chassis_Private nb_cfg value])
> > +AT_KEYWORDS([ovn])
> > +ovn_start
> > +
> > +net_add n1
> > +sim_add hv
> > +as hv
> > +ovs-vsctl add-br br-phys
> > +ovn_attach n1 br-phys 192.168.0.1
> > +
> > +OVS_WAIT_UNTIL([test xhv = x`ovn-sbctl --columns name --bare find chassis`])
> > +
> > +# Bump the NB_Global nb_cfg value
> > +nb_global_id=$(ovn-nbctl --columns _uuid --bare find nb_global)
> > +ovn-nbctl set NB_Global ${nb_global_id} nb_cfg=999
> > +
> > +# ovn-controller should bump the nb_cfg in the chassis_private table
> > +OVS_WAIT_UNTIL([test x999 = x`ovn-sbctl --columns nb_cfg --bare find chassis_private`])
> > +
> > +# Assert that the the nb_cfg from the Chassis table was not incremented
> > +OVS_WAIT_UNTIL([test x0 = x`ovn-sbctl --columns nb_cfg --bare find chassis`])
> > +
> > +OVN_CLEANUP([hv])
> > +AT_CLEANUP
> >
>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
diff mbox series

Patch

diff --git a/controller/chassis.c b/controller/chassis.c
index 522893ead..99ea6b8fc 100644
--- a/controller/chassis.c
+++ b/controller/chassis.c
@@ -585,14 +585,18 @@  chassis_update(const struct sbrec_chassis *chassis_rec,
 const struct sbrec_chassis *
 chassis_run(struct ovsdb_idl_txn *ovnsb_idl_txn,
             struct ovsdb_idl_index *sbrec_chassis_by_name,
+            struct ovsdb_idl_index *sbrec_chassis_private_by_name,
             const struct ovsrec_open_vswitch_table *ovs_table,
             const struct sbrec_chassis_table *chassis_table,
             const char *chassis_id,
             const struct ovsrec_bridge *br_int,
-            const struct sset *transport_zones)
+            const struct sset *transport_zones,
+            const struct sbrec_chassis_private **chassis_private)
 {
     struct ovs_chassis_cfg ovs_cfg;
 
+    *chassis_private = NULL;
+
     /* Get the chassis config from the ovs table. */
     ovs_chassis_cfg_init(&ovs_cfg);
     if (!chassis_parse_ovs_config(ovs_table, br_int, &ovs_cfg)) {
@@ -616,6 +620,15 @@  chassis_run(struct ovsdb_idl_txn *ovnsb_idl_txn,
                                   chassis_id);
     }
 
+    const struct sbrec_chassis_private *chassis_private_rec =
+        chassis_private_lookup_by_name(sbrec_chassis_private_by_name,
+                                       chassis_id);
+    if (!chassis_private_rec && ovnsb_idl_txn) {
+        chassis_private_rec = sbrec_chassis_private_insert(ovnsb_idl_txn);
+        sbrec_chassis_private_set_name(chassis_private_rec, chassis_id);
+    }
+    *chassis_private = chassis_private_rec;
+
     ovs_chassis_cfg_destroy(&ovs_cfg);
     return chassis_rec;
 }
@@ -669,7 +682,8 @@  chassis_get_mac(const struct sbrec_chassis *chassis_rec,
  * required. */
 bool
 chassis_cleanup(struct ovsdb_idl_txn *ovnsb_idl_txn,
-                const struct sbrec_chassis *chassis_rec)
+                const struct sbrec_chassis *chassis_rec,
+                const struct sbrec_chassis_private *chassis_private_rec)
 {
     if (!chassis_rec) {
         return true;
@@ -679,6 +693,7 @@  chassis_cleanup(struct ovsdb_idl_txn *ovnsb_idl_txn,
                                   "ovn-controller: unregistering chassis '%s'",
                                   chassis_rec->name);
         sbrec_chassis_delete(chassis_rec);
+        sbrec_chassis_private_delete(chassis_private_rec);
     }
     return false;
 }
diff --git a/controller/chassis.h b/controller/chassis.h
index 178d2957e..81055b403 100644
--- a/controller/chassis.h
+++ b/controller/chassis.h
@@ -17,6 +17,7 @@ 
 #define OVN_CHASSIS_H 1
 
 #include <stdbool.h>
+#include "lib/ovn-sb-idl.h"
 
 struct ovsdb_idl;
 struct ovsdb_idl_index;
@@ -33,12 +34,15 @@  void chassis_register_ovs_idl(struct ovsdb_idl *);
 const struct sbrec_chassis *chassis_run(
     struct ovsdb_idl_txn *ovnsb_idl_txn,
     struct ovsdb_idl_index *sbrec_chassis_by_name,
+    struct ovsdb_idl_index *sbrec_chassis_private_by_name,
     const struct ovsrec_open_vswitch_table *,
     const struct sbrec_chassis_table *,
     const char *chassis_id, const struct ovsrec_bridge *br_int,
-    const struct sset *transport_zones);
+    const struct sset *transport_zones,
+    const struct sbrec_chassis_private **chassis_private);
 bool chassis_cleanup(struct ovsdb_idl_txn *ovnsb_idl_txn,
-                     const struct sbrec_chassis *);
+                     const struct sbrec_chassis *,
+                     const struct sbrec_chassis_private *);
 bool chassis_get_mac(const struct sbrec_chassis *chassis,
                      const char *bridge_mapping,
                      struct eth_addr *chassis_mac);
diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
index 2893eaac1..1542b2ad7 100644
--- a/controller/ovn-controller.c
+++ b/controller/ovn-controller.c
@@ -155,6 +155,7 @@  update_sb_monitors(struct ovsdb_idl *ovnsb_idl,
     struct ovsdb_idl_condition ce =  OVSDB_IDL_CONDITION_INIT(&ce);
     struct ovsdb_idl_condition ip_mcast = OVSDB_IDL_CONDITION_INIT(&ip_mcast);
     struct ovsdb_idl_condition igmp = OVSDB_IDL_CONDITION_INIT(&igmp);
+    struct ovsdb_idl_condition chprv = OVSDB_IDL_CONDITION_INIT(&chprv);
 
     if (monitor_all) {
         ovsdb_idl_condition_add_clause_true(&pb);
@@ -165,6 +166,7 @@  update_sb_monitors(struct ovsdb_idl *ovnsb_idl,
         ovsdb_idl_condition_add_clause_true(&ce);
         ovsdb_idl_condition_add_clause_true(&ip_mcast);
         ovsdb_idl_condition_add_clause_true(&igmp);
+        ovsdb_idl_condition_add_clause_true(&chprv);
         goto out;
     }
 
@@ -196,6 +198,10 @@  update_sb_monitors(struct ovsdb_idl *ovnsb_idl,
                                                   &chassis->header_.uuid);
         sbrec_igmp_group_add_clause_chassis(&igmp, OVSDB_F_EQ,
                                             &chassis->header_.uuid);
+
+        /* Monitors Chassis_Private record for current chassis only */
+        sbrec_chassis_private_add_clause_name(&chprv, OVSDB_F_EQ,
+                                              chassis->name);
     }
     if (local_ifaces) {
         const char *name;
@@ -229,6 +235,7 @@  out:
     sbrec_controller_event_set_condition(ovnsb_idl, &ce);
     sbrec_ip_multicast_set_condition(ovnsb_idl, &ip_mcast);
     sbrec_igmp_group_set_condition(ovnsb_idl, &igmp);
+    sbrec_chassis_private_set_condition(ovnsb_idl, &chprv);
     ovsdb_idl_condition_destroy(&pb);
     ovsdb_idl_condition_destroy(&lf);
     ovsdb_idl_condition_destroy(&mb);
@@ -237,6 +244,7 @@  out:
     ovsdb_idl_condition_destroy(&ce);
     ovsdb_idl_condition_destroy(&ip_mcast);
     ovsdb_idl_condition_destroy(&igmp);
+    ovsdb_idl_condition_destroy(&chprv);
 }
 
 static const char *
@@ -1759,6 +1767,8 @@  main(int argc, char *argv[])
 
     struct ovsdb_idl_index *sbrec_chassis_by_name
         = chassis_index_create(ovnsb_idl_loop.idl);
+    struct ovsdb_idl_index *sbrec_chassis_private_by_name
+        = chassis_private_index_create(ovnsb_idl_loop.idl);
     struct ovsdb_idl_index *sbrec_multicast_group_by_name_datapath
         = mcast_group_index_create(ovnsb_idl_loop.idl);
     struct ovsdb_idl_index *sbrec_port_binding_by_name
@@ -1784,7 +1794,8 @@  main(int argc, char *argv[])
         = igmp_group_index_create(ovnsb_idl_loop.idl);
 
     ovsdb_idl_track_add_all(ovnsb_idl_loop.idl);
-    ovsdb_idl_omit_alert(ovnsb_idl_loop.idl, &sbrec_chassis_col_nb_cfg);
+    ovsdb_idl_omit_alert(ovnsb_idl_loop.idl,
+                         &sbrec_chassis_private_col_nb_cfg);
 
     /* Omit the external_ids column of all the tables except for -
      *  - DNS. pinctrl.c uses the external_ids column of DNS,
@@ -1820,6 +1831,10 @@  main(int argc, char *argv[])
     ovsdb_idl_omit(ovnsb_idl_loop.idl, &sbrec_connection_col_status);
     ovsdb_idl_omit(ovnsb_idl_loop.idl, &sbrec_connection_col_target);
 
+    /* Do not monitor Chassis_Private external_ids */
+    ovsdb_idl_omit(ovnsb_idl_loop.idl,
+                   &sbrec_chassis_private_col_external_ids);
+
     update_sb_monitors(ovnsb_idl_loop.idl, NULL, NULL, NULL, false);
 
     stopwatch_create(CONTROLLER_LOOP_STOPWATCH_NAME, SW_MS);
@@ -1997,10 +2012,13 @@  main(int argc, char *argv[])
                 process_br_int(ovs_idl_txn, bridge_table, ovs_table);
             const char *chassis_id = get_ovs_chassis_id(ovs_table);
             const struct sbrec_chassis *chassis = NULL;
+            const struct sbrec_chassis_private *chassis_private = NULL;
             if (chassis_id) {
                 chassis = chassis_run(ovnsb_idl_txn, sbrec_chassis_by_name,
+                                      sbrec_chassis_private_by_name,
                                       ovs_table, chassis_table, chassis_id,
-                                      br_int, &transport_zones);
+                                      br_int, &transport_zones,
+                                      &chassis_private);
             }
 
             if (br_int) {
@@ -2125,10 +2143,10 @@  main(int argc, char *argv[])
                 engine_set_force_recompute(false);
             }
 
-            if (ovnsb_idl_txn && chassis) {
+            if (ovnsb_idl_txn && chassis_private) {
                 int64_t cur_cfg = ofctrl_get_cur_cfg();
-                if (cur_cfg && cur_cfg != chassis->nb_cfg) {
-                    sbrec_chassis_set_nb_cfg(chassis, cur_cfg);
+                if (cur_cfg && cur_cfg != chassis_private->nb_cfg) {
+                    sbrec_chassis_private_set_nb_cfg(chassis_private, cur_cfg);
                 }
             }
 
@@ -2231,10 +2249,17 @@  main(int argc, char *argv[])
                    ? chassis_lookup_by_name(sbrec_chassis_by_name, chassis_id)
                    : NULL);
 
+            const struct sbrec_chassis_private *chassis_private
+                = (chassis_id
+                   ? chassis_private_lookup_by_name(
+                         sbrec_chassis_private_by_name, chassis_id)
+                   : NULL);
+
             /* Run all of the cleanup functions, even if one of them returns
              * false. We're done if all of them return true. */
             done = binding_cleanup(ovnsb_idl_txn, port_binding_table, chassis);
-            done = chassis_cleanup(ovnsb_idl_txn, chassis) && done;
+            done = chassis_cleanup(ovnsb_idl_txn,
+                                   chassis, chassis_private) && done;
             done = encaps_cleanup(ovs_idl_txn, br_int) && done;
             done = igmp_group_cleanup(ovnsb_idl_txn, sbrec_igmp_group) && done;
             if (done) {
diff --git a/lib/chassis-index.c b/lib/chassis-index.c
index 39066f4cc..d760124b4 100644
--- a/lib/chassis-index.c
+++ b/lib/chassis-index.c
@@ -40,6 +40,31 @@  chassis_lookup_by_name(struct ovsdb_idl_index *sbrec_chassis_by_name,
     return retval;
 }
 
+struct ovsdb_idl_index *
+chassis_private_index_create(struct ovsdb_idl *idl)
+{
+    return ovsdb_idl_index_create1(idl, &sbrec_chassis_private_col_name);
+}
+
+/* Finds and returns the chassis with the given 'name', or NULL if no such
+ * chassis exists. */
+const struct sbrec_chassis_private *
+chassis_private_lookup_by_name(
+    struct ovsdb_idl_index *sbrec_chassis_private_by_name,
+    const char *name)
+{
+    struct sbrec_chassis_private *target =
+        sbrec_chassis_private_index_init_row(sbrec_chassis_private_by_name);
+    sbrec_chassis_private_index_set_name(target, name);
+
+    struct sbrec_chassis_private *retval = sbrec_chassis_private_index_find(
+        sbrec_chassis_private_by_name, target);
+
+    sbrec_chassis_private_index_destroy_row(target);
+
+    return retval;
+}
+
 struct ovsdb_idl_index *
 ha_chassis_group_index_create(struct ovsdb_idl *idl)
 {
diff --git a/lib/chassis-index.h b/lib/chassis-index.h
index 302e5f0fd..b9b331f34 100644
--- a/lib/chassis-index.h
+++ b/lib/chassis-index.h
@@ -23,6 +23,12 @@  struct ovsdb_idl_index *chassis_index_create(struct ovsdb_idl *);
 const struct sbrec_chassis *chassis_lookup_by_name(
     struct ovsdb_idl_index *sbrec_chassis_by_name, const char *name);
 
+struct ovsdb_idl_index *chassis_private_index_create(struct ovsdb_idl *);
+
+const struct sbrec_chassis_private *
+chassis_private_lookup_by_name(
+    struct ovsdb_idl_index *sbrec_chassis_private_by_name, const char *name);
+
 struct ovsdb_idl_index *ha_chassis_group_index_create(struct ovsdb_idl *idl);
 const struct sbrec_ha_chassis_group *ha_chassis_group_lookup_by_name(
     struct ovsdb_idl_index *sbrec_ha_chassis_grp_by_name, const char *name);
diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
index d42a9892a..cb2a7e0d6 100644
--- a/northd/ovn-northd.c
+++ b/northd/ovn-northd.c
@@ -11174,6 +11174,11 @@  static const char *rbac_chassis_auth[] =
 static const char *rbac_chassis_update[] =
     {"nb_cfg", "external_ids", "encaps", "vtep_logical_switches"};
 
+static const char *rbac_chassis_private_auth[] =
+    {"chassis_name"};
+static const char *rbac_chassis_private_update[] =
+    {"nb_cfg"};
+
 static const char *rbac_encap_auth[] =
     {"chassis_name"};
 static const char *rbac_encap_update[] =
@@ -11211,6 +11216,14 @@  static struct rbac_perm_cfg {
         .update = rbac_chassis_update,
         .n_update = ARRAY_SIZE(rbac_chassis_update),
         .row = NULL
+    },{
+        .table = "Chassis_Private",
+        .auth = rbac_chassis_private_auth,
+        .n_auth = ARRAY_SIZE(rbac_chassis_private_auth),
+        .insdel = true,
+        .update = rbac_chassis_private_update,
+        .n_update = ARRAY_SIZE(rbac_chassis_private_update),
+        .row = NULL
     },{
         .table = "Encap",
         .auth = rbac_encap_auth,
@@ -11380,11 +11393,10 @@  update_northbound_cfg(struct northd_context *ctx,
     /* Update northbound hv_cfg if appropriate. */
     if (nbg) {
         /* Find minimum nb_cfg among all chassis. */
-        const struct sbrec_chassis *chassis;
+        const struct sbrec_chassis_private *chassis;
         int64_t hv_cfg = nbg->nb_cfg;
-        SBREC_CHASSIS_FOR_EACH (chassis, ctx->ovnsb_idl) {
-            if (!smap_get_bool(&chassis->external_ids, "is-remote", false) &&
-                chassis->nb_cfg < hv_cfg) {
+        SBREC_CHASSIS_PRIVATE_FOR_EACH (chassis, ctx->ovnsb_idl) {
+            if (chassis->nb_cfg < hv_cfg) {
                 hv_cfg = chassis->nb_cfg;
             }
         }
@@ -11670,10 +11682,15 @@  main(int argc, char *argv[])
     ovsdb_idl_add_column(ovnsb_idl_loop.idl, &sbrec_meter_band_col_burst_size);
 
     ovsdb_idl_add_table(ovnsb_idl_loop.idl, &sbrec_table_chassis);
-    ovsdb_idl_add_column(ovnsb_idl_loop.idl, &sbrec_chassis_col_nb_cfg);
     ovsdb_idl_add_column(ovnsb_idl_loop.idl, &sbrec_chassis_col_name);
     ovsdb_idl_add_column(ovnsb_idl_loop.idl, &sbrec_chassis_col_external_ids);
 
+    ovsdb_idl_add_table(ovnsb_idl_loop.idl, &sbrec_table_chassis_private);
+    ovsdb_idl_add_column(ovnsb_idl_loop.idl,
+                         &sbrec_chassis_private_col_nb_cfg);
+    add_column_noalert(ovnsb_idl_loop.idl,
+                       &sbrec_chassis_private_col_external_ids);
+
     ovsdb_idl_add_table(ovnsb_idl_loop.idl, &sbrec_table_ha_chassis);
     add_column_noalert(ovnsb_idl_loop.idl,
                        &sbrec_ha_chassis_col_chassis);
diff --git a/ovn-sb.ovsschema b/ovn-sb.ovsschema
index d89f8dbbb..fa53c6a4c 100644
--- a/ovn-sb.ovsschema
+++ b/ovn-sb.ovsschema
@@ -1,7 +1,7 @@ 
 {
     "name": "OVN_Southbound",
-    "version": "2.7.0",
-    "cksum": "4286723485 21693",
+    "version": "2.8.0",
+    "cksum": "3504843097 22072",
     "tables": {
         "SB_Global": {
             "columns": {
@@ -43,6 +43,15 @@ 
                                               "max": "unlimited"}}},
             "isRoot": true,
             "indexes": [["name"]]},
+        "Chassis_Private": {
+            "columns": {
+                "name": {"type": "string"},
+                "nb_cfg": {"type": {"key": "integer"}},
+                "external_ids": {
+                    "type": {"key": "string", "value": "string",
+                             "min": 0, "max": "unlimited"}}},
+            "isRoot": true,
+            "indexes": [["name"]]},
         "Encap": {
             "columns": {
                 "type": {"type": {"key": {
diff --git a/ovn-sb.xml b/ovn-sb.xml
index 3ae9d4f92..77c5b1f38 100644
--- a/ovn-sb.xml
+++ b/ovn-sb.xml
@@ -256,10 +256,8 @@ 
     </column>
 
     <column name="nb_cfg">
-      Sequence number for the configuration.  When <code>ovn-controller</code>
-      updates the configuration of a chassis from the contents of the
-      southbound database, it copies <ref table="SB_Global" column="nb_cfg"/>
-      from the <ref table="SB_Global"/> table into this column.
+      Deprecated. This column is replaced by the <ref table="Chassis_Private"
+      column="nb_cfg"/> column of the <ref table="Chassis_Private"/> table.
     </column>
 
     <column name="external_ids" key="ovn-bridge-mappings">
@@ -366,6 +364,38 @@ 
     </group>
   </table>
 
+  <table name="Chassis_Private" title="Chassis Private">
+    <p>
+      Each row in this table maintains per chassis private data that are
+      accessed only by the owning chassis (write only) and ovn-northd, not by
+      any other chassis.  These data are stored in this separate table instead
+      of the <ref table="Chassis"/> table for performance considerations:
+      the rows in this table can be conditionally monitored by chassises so
+      that each chassis only get update notifications for its own row, to avoid
+      unnecessary chassis private data update flooding in a large scale
+      deployment.  (Future: this separation can be avoided if ovsdb conditional
+      monitoring is supported on a set of columns)
+    </p>
+
+    <column name="name">
+      The name of the chassis that owns these chassis-private data.
+    </column>
+
+    <column name="nb_cfg">
+      Sequence number for the configuration.  When <code>ovn-controller</code>
+      updates the configuration of a chassis from the contents of the
+      southbound database, it copies <ref table="SB_Global" column="nb_cfg"/>
+      from the <ref table="SB_Global"/> table into this column.
+    </column>
+
+    <group title="Common Columns">
+      The overall purpose of these columns is described under <code>Common
+      Columns</code> at the beginning of this document.
+
+      <column name="external_ids"/>
+    </group>
+  </table>
+
   <table name="Encap" title="Encapsulation Types">
     <p>
       The <ref column="encaps" table="Chassis"/> column in the <ref
diff --git a/tests/ovn-controller.at b/tests/ovn-controller.at
index 63b2581c0..8c6e03611 100644
--- a/tests/ovn-controller.at
+++ b/tests/ovn-controller.at
@@ -321,3 +321,29 @@  as ovn-sb
 OVS_APP_EXIT_AND_WAIT([ovsdb-server])
 
 AT_CLEANUP
+
+# Checks that ovn-controller increments the nb_cfg value in the Chassis_Private table
+AT_SETUP([ovn-controller - Bump Chassis_Private nb_cfg value])
+AT_KEYWORDS([ovn])
+ovn_start
+
+net_add n1
+sim_add hv
+as hv
+ovs-vsctl add-br br-phys
+ovn_attach n1 br-phys 192.168.0.1
+
+OVS_WAIT_UNTIL([test xhv = x`ovn-sbctl --columns name --bare find chassis`])
+
+# Bump the NB_Global nb_cfg value
+nb_global_id=$(ovn-nbctl --columns _uuid --bare find nb_global)
+ovn-nbctl set NB_Global ${nb_global_id} nb_cfg=999
+
+# ovn-controller should bump the nb_cfg in the chassis_private table
+OVS_WAIT_UNTIL([test x999 = x`ovn-sbctl --columns nb_cfg --bare find chassis_private`])
+
+# Assert that the the nb_cfg from the Chassis table was not incremented
+OVS_WAIT_UNTIL([test x0 = x`ovn-sbctl --columns nb_cfg --bare find chassis`])
+
+OVN_CLEANUP([hv])
+AT_CLEANUP