diff mbox series

[ovs-dev,ovn,RFC] Add chassis liveness monitoring mechanism

Message ID 20200219153753.23182-1-lmartins@redhat.com
State Superseded
Headers show
Series [ovs-dev,ovn,RFC] Add chassis liveness monitoring mechanism | expand

Commit Message

Lucas Martins Feb. 19, 2020, 3:37 p.m. UTC
From: Lucas Alvares Gomes <lucasagomes@gmail.com>

NOTE: SENDING THIS PATCH AS A RFC TO SEE WHAT OTHERS MIGHT THINK ABOUT
THE IDEA PRIOR TO WRITTING TESTS TO IT.

CMSes integrating with OVN often uses the nb_cfg mechanism as a way to
check the health status of the ovn-controller process but, the current
implementation isn't ideal for that purpose because it floods the control
plane with update notifications every time the nb_cfg value is
incremented.

This patch is merging two ideas:

1) Han Zhou proposed a patch [0] creating a table called Chassis_Private
   where each hypervisor *only* writes and monitors its own record to
   avoid this flooding problem.

2) Having ovn-controller to periodically log that it's alive instead of
   relying on the nb_cfg mechanism.

By using this mechanism, a CMS can more easily just read this new
Chassis_Private table and figure out the status of each Chassis
(ovn-controller) in the cluster.

Here's some reasons why I believe this approach is better than having
to bump the  nb_cfg value:

1) Simple integration. Before, the CMS had to increment the nb_cfg
   value in the NB_Global table and wait for it to be propagated to the
   Southbound database (ovn-northd and then ovn-controllers) Chassis
   table in order to know the status of the Chassis.

   Now, it can just read the Chassis_Private table and compare the
   alive_at column against the current time.

2) Less costy. Just one read from the db is needed, no writing. Code
   using the nb_cfg mechanism had to implement a few safe-guard code to
   make less error prone. See [1] and [2] for example.

3) Backwards compatibility. The patch [0] was moving the nb_cfg value
   to new table so, systems relying on it would need to update their
   code upon updating OVN.

To enable this new mechanism set an option called
chassis_liveness_interval=N to the "options" column in the NB_Global
table, where N is the time (in seconds) in which the ovn-controller
should updates the alive_at column. If not set or set to 0, the
mechanism is disabled. By default, it's disabled.

[0] https://patchwork.ozlabs.org/patch/899608/
[1] https://review.opendev.org/#/c/703612/
[2] https://review.opendev.org/#/c/704530/

Co-authored-by: Han Zhou <hzhou8@ebay.com>
Co-authored-by: Numan Siddique <numans@ovn.org>
Co-authored-by: Dumitru Ceara <dceara@redhat.com>
Signed-off-by: Lucas Alvares Gomes <lucasagomes@gmail.com>
---
 controller/chassis.c        | 56 +++++++++++++++++++++++++++++++++++--
 controller/chassis.h        | 11 ++++++--
 controller/ovn-controller.c | 44 +++++++++++++++++++++++++++--
 lib/chassis-index.c         | 26 +++++++++++++++++
 lib/chassis-index.h         |  5 ++++
 northd/ovn-northd.c         |  4 +++
 ovn-sb.ovsschema            | 13 +++++++--
 ovn-sb.xml                  | 35 +++++++++++++++++++++++
 8 files changed, 185 insertions(+), 9 deletions(-)

Comments

Dumitru Ceara Feb. 20, 2020, 9:18 a.m. UTC | #1
On 2/19/20 4:37 PM, lmartins@redhat.com wrote:
> From: Lucas Alvares Gomes <lucasagomes@gmail.com>
> 
> NOTE: SENDING THIS PATCH AS A RFC TO SEE WHAT OTHERS MIGHT THINK ABOUT
> THE IDEA PRIOR TO WRITTING TESTS TO IT.
> 
> CMSes integrating with OVN often uses the nb_cfg mechanism as a way to
> check the health status of the ovn-controller process but, the current
> implementation isn't ideal for that purpose because it floods the control
> plane with update notifications every time the nb_cfg value is
> incremented.
> 
> This patch is merging two ideas:
> 
> 1) Han Zhou proposed a patch [0] creating a table called Chassis_Private
>    where each hypervisor *only* writes and monitors its own record to
>    avoid this flooding problem.
> 
> 2) Having ovn-controller to periodically log that it's alive instead of
>    relying on the nb_cfg mechanism.
> 
> By using this mechanism, a CMS can more easily just read this new
> Chassis_Private table and figure out the status of each Chassis
> (ovn-controller) in the cluster.
> 

Hi Lucas,

Thanks a lot for working on this!

> Here's some reasons why I believe this approach is better than having
> to bump the  nb_cfg value:
> 
> 1) Simple integration. Before, the CMS had to increment the nb_cfg
>    value in the NB_Global table and wait for it to be propagated to the
>    Southbound database (ovn-northd and then ovn-controllers) Chassis
>    table in order to know the status of the Chassis.
> 
>    Now, it can just read the Chassis_Private table and compare the
>    alive_at column against the current time.

I guess the only comment I have about this approach is that it doesn't
feel completely OK to store a timestamp representation as string in
"alive_at". I'm afraid this might be too inflexible and force CMSes to
compare their local time with the value in this column.

I know we discussed offline that using a counter that is periodically
incremented by ovn-controller instead of a timestamp would complicate
the implementation in Openstack but maybe other people on this mailing
list have ideas on how to deal with this in a more generic way.

> 
> 2) Less costy. Just one read from the db is needed, no writing. Code
>    using the nb_cfg mechanism had to implement a few safe-guard code to
>    make less error prone. See [1] and [2] for example.
> 
> 3) Backwards compatibility. The patch [0] was moving the nb_cfg value
>    to new table so, systems relying on it would need to update their
>    code upon updating OVN.

Nice!

One more minor comment on the code, inline.

Thanks,
Dumitru

> 
> To enable this new mechanism set an option called
> chassis_liveness_interval=N to the "options" column in the NB_Global
> table, where N is the time (in seconds) in which the ovn-controller
> should updates the alive_at column. If not set or set to 0, the
> mechanism is disabled. By default, it's disabled.
> 
> [0] https://patchwork.ozlabs.org/patch/899608/
> [1] https://review.opendev.org/#/c/703612/
> [2] https://review.opendev.org/#/c/704530/
> 
> Co-authored-by: Han Zhou <hzhou8@ebay.com>
> Co-authored-by: Numan Siddique <numans@ovn.org>
> Co-authored-by: Dumitru Ceara <dceara@redhat.com>
> Signed-off-by: Lucas Alvares Gomes <lucasagomes@gmail.com>
> ---
>  controller/chassis.c        | 56 +++++++++++++++++++++++++++++++++++--
>  controller/chassis.h        | 11 ++++++--
>  controller/ovn-controller.c | 44 +++++++++++++++++++++++++++--
>  lib/chassis-index.c         | 26 +++++++++++++++++
>  lib/chassis-index.h         |  5 ++++
>  northd/ovn-northd.c         |  4 +++
>  ovn-sb.ovsschema            | 13 +++++++--
>  ovn-sb.xml                  | 35 +++++++++++++++++++++++
>  8 files changed, 185 insertions(+), 9 deletions(-)
> 
> diff --git a/controller/chassis.c b/controller/chassis.c
> index 522893ead..7892cb966 100644
> --- a/controller/chassis.c
> +++ b/controller/chassis.c
> @@ -24,6 +24,7 @@
>  #include "openvswitch/dynamic-string.h"
>  #include "openvswitch/vlog.h"
>  #include "openvswitch/ofp-parse.h"
> +#include "openvswitch/poll-loop.h"
>  #include "lib/chassis-index.h"
>  #include "lib/ovn-sb-idl.h"
>  #include "ovn-controller.h"
> @@ -47,11 +48,16 @@ struct chassis_info {
>  
>      /* True if Chassis SB record is initialized, false otherwise. */
>      uint32_t id_inited : 1;
> +
> +    /* Next livennes update time to update the 'alive_at' column of
> +     * Chassis_Private table. */
> +    long long int next_liveness_update;
>  };
>  
>  static struct chassis_info chassis_state = {
>      .id = DS_EMPTY_INITIALIZER,
>      .id_inited = false,
> +    .next_liveness_update = LLONG_MAX,
>  };
>  
>  static void
> @@ -581,18 +587,38 @@ chassis_update(const struct sbrec_chassis *chassis_rec,
>      free(encaps);
>  }
>  
> +static void
> +chassis_update_liveness_cfg(const struct sbrec_chassis_private *chassis_rec,
> +                            struct chassis_info *ch_info,
> +                            int chassis_liveness_interval)
> +{
> +    if (ch_info->next_liveness_update == LLONG_MAX ||
> +        time_msec() > ch_info->next_liveness_update) {
> +        sbrec_chassis_private_set_alive_at(
> +            chassis_rec,
> +            xastrftime_msec("%Y-%m-%d %H:%M:%S", time_wall_msec(), true));
> +        ch_info->next_liveness_update =
> +            time_msec() + chassis_liveness_interval * 1000;
> +    }
> +}
> +
>  /* Returns this chassis's Chassis record, if it is available. */
>  const struct sbrec_chassis *
>  chassis_run(struct ovsdb_idl_txn *ovnsb_idl_txn,
>              struct ovsdb_idl_index *sbrec_chassis_by_name,
> +            struct ovsdb_idl_index *sbrec_chassis_private_by_name,
>              const struct ovsrec_open_vswitch_table *ovs_table,
>              const struct sbrec_chassis_table *chassis_table,
>              const char *chassis_id,
>              const struct ovsrec_bridge *br_int,
> -            const struct sset *transport_zones)
> +            const struct sset *transport_zones,
> +            const struct sbrec_chassis_private **chassis_private,
> +            int chassis_liveness_interval)
>  {
>      struct ovs_chassis_cfg ovs_cfg;
>  
> +    *chassis_private = NULL;
> +
>      /* Get the chassis config from the ovs table. */
>      ovs_chassis_cfg_init(&ovs_cfg);
>      if (!chassis_parse_ovs_config(ovs_table, br_int, &ovs_cfg)) {
> @@ -616,6 +642,24 @@ chassis_run(struct ovsdb_idl_txn *ovnsb_idl_txn,
>                                    chassis_id);
>      }
>  
> +    /* Create the Chassis_Private entry if it's not there yet */
> +    const struct sbrec_chassis_private *chassis_private_rec =
> +        chassis_private_lookup_by_name(sbrec_chassis_private_by_name,
> +                                       chassis_id);
> +    if (!chassis_private_rec && ovnsb_idl_txn) {
> +        chassis_private_rec = sbrec_chassis_private_insert(ovnsb_idl_txn);
> +        sbrec_chassis_private_set_name(chassis_private_rec, chassis_id);
> +    }
> +
> +    /* Update the alive_at column from the Chassis_Private table if
> +     * the chassis liveness mechanism is enabled */
> +    if (chassis_private_rec && ovnsb_idl_txn && chassis_liveness_interval) {
> +        chassis_update_liveness_cfg(chassis_private_rec, &chassis_state,
> +                                    chassis_liveness_interval);
> +    }
> +
> +    *chassis_private = chassis_private_rec;
> +
>      ovs_chassis_cfg_destroy(&ovs_cfg);
>      return chassis_rec;
>  }
> @@ -669,7 +713,8 @@ chassis_get_mac(const struct sbrec_chassis *chassis_rec,
>   * required. */
>  bool
>  chassis_cleanup(struct ovsdb_idl_txn *ovnsb_idl_txn,
> -                const struct sbrec_chassis *chassis_rec)
> +                const struct sbrec_chassis *chassis_rec,
> +                const struct sbrec_chassis_private *chassis_private_rec)
>  {
>      if (!chassis_rec) {
>          return true;
> @@ -679,6 +724,7 @@ chassis_cleanup(struct ovsdb_idl_txn *ovnsb_idl_txn,
>                                    "ovn-controller: unregistering chassis '%s'",
>                                    chassis_rec->name);
>          sbrec_chassis_delete(chassis_rec);
> +        sbrec_chassis_private_delete(chassis_private_rec);
>      }
>      return false;
>  }
> @@ -695,3 +741,9 @@ chassis_get_id(void)
>  
>      return NULL;
>  }
> +
> +void chassis_wait(int chassis_liveness_interval) {
> +    if (chassis_liveness_interval) {
> +        poll_timer_wait_until(chassis_state.next_liveness_update);
> +    }
> +}
> diff --git a/controller/chassis.h b/controller/chassis.h
> index 178d2957e..38a549553 100644
> --- a/controller/chassis.h
> +++ b/controller/chassis.h
> @@ -17,6 +17,7 @@
>  #define OVN_CHASSIS_H 1
>  
>  #include <stdbool.h>
> +#include "lib/ovn-sb-idl.h"
>  
>  struct ovsdb_idl;
>  struct ovsdb_idl_index;
> @@ -33,17 +34,21 @@ void chassis_register_ovs_idl(struct ovsdb_idl *);
>  const struct sbrec_chassis *chassis_run(
>      struct ovsdb_idl_txn *ovnsb_idl_txn,
>      struct ovsdb_idl_index *sbrec_chassis_by_name,
> +    struct ovsdb_idl_index *sbrec_chassis_private_by_name,
>      const struct ovsrec_open_vswitch_table *,
>      const struct sbrec_chassis_table *,
>      const char *chassis_id, const struct ovsrec_bridge *br_int,
> -    const struct sset *transport_zones);
> +    const struct sset *transport_zones,
> +    const struct sbrec_chassis_private **chassis_private,
> +    int chassis_liveness_interval);
>  bool chassis_cleanup(struct ovsdb_idl_txn *ovnsb_idl_txn,
> -                     const struct sbrec_chassis *);
> +                     const struct sbrec_chassis *,
> +                     const struct sbrec_chassis_private *);
>  bool chassis_get_mac(const struct sbrec_chassis *chassis,
>                       const char *bridge_mapping,
>                       struct eth_addr *chassis_mac);
>  const char *chassis_get_id(void);
>  const char * get_chassis_mac_mappings(const struct smap *ext_ids);
> -
> +void chassis_wait(int chassis_liveness_interval);
>  
>  #endif /* controller/chassis.h */
> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
> index 4d245ca28..fd9d95da2 100644
> --- a/controller/ovn-controller.c
> +++ b/controller/ovn-controller.c
> @@ -155,6 +155,7 @@ update_sb_monitors(struct ovsdb_idl *ovnsb_idl,
>      struct ovsdb_idl_condition ce =  OVSDB_IDL_CONDITION_INIT(&ce);
>      struct ovsdb_idl_condition ip_mcast = OVSDB_IDL_CONDITION_INIT(&ip_mcast);
>      struct ovsdb_idl_condition igmp = OVSDB_IDL_CONDITION_INIT(&igmp);
> +    struct ovsdb_idl_condition chprv = OVSDB_IDL_CONDITION_INIT(&chprv);
>  
>      if (monitor_all) {
>          ovsdb_idl_condition_add_clause_true(&pb);
> @@ -165,6 +166,7 @@ update_sb_monitors(struct ovsdb_idl *ovnsb_idl,
>          ovsdb_idl_condition_add_clause_true(&ce);
>          ovsdb_idl_condition_add_clause_true(&ip_mcast);
>          ovsdb_idl_condition_add_clause_true(&igmp);
> +        ovsdb_idl_condition_add_clause_true(&chprv);
>          goto out;
>      }
>  
> @@ -196,6 +198,10 @@ update_sb_monitors(struct ovsdb_idl *ovnsb_idl,
>                                                    &chassis->header_.uuid);
>          sbrec_igmp_group_add_clause_chassis(&igmp, OVSDB_F_EQ,
>                                              &chassis->header_.uuid);
> +
> +        /* Monitors Chassis_Private record for current chassis only */
> +        sbrec_chassis_private_add_clause_name(&chprv, OVSDB_F_EQ,
> +                                              chassis->name);
>      }
>      if (local_ifaces) {
>          const char *name;
> @@ -229,6 +235,7 @@ out:
>      sbrec_controller_event_set_condition(ovnsb_idl, &ce);
>      sbrec_ip_multicast_set_condition(ovnsb_idl, &ip_mcast);
>      sbrec_igmp_group_set_condition(ovnsb_idl, &igmp);
> +    sbrec_chassis_private_set_condition(ovnsb_idl, &chprv);
>      ovsdb_idl_condition_destroy(&pb);
>      ovsdb_idl_condition_destroy(&lf);
>      ovsdb_idl_condition_destroy(&mb);
> @@ -237,6 +244,7 @@ out:
>      ovsdb_idl_condition_destroy(&ce);
>      ovsdb_idl_condition_destroy(&ip_mcast);
>      ovsdb_idl_condition_destroy(&igmp);
> +    ovsdb_idl_condition_destroy(&chprv);
>  }
>  
>  static const char *
> @@ -687,6 +695,15 @@ get_transport_zones(const struct ovsrec_open_vswitch_table *ovs_table)
>      return smap_get_def(&cfg->external_ids, "ovn-transport-zones", "");
>  }
>  
> +static uint16_t
> +get_chassis_liveness_interval(const struct sbrec_sb_global_table
> +                              *sb_global_table)
> +{
> +    const struct sbrec_sb_global *sb
> +        = sbrec_sb_global_table_first(sb_global_table);
> +    return atoi(smap_get_def(&sb->options, "chassis_liveness_interval", "0"));

We have smap_get_int in smap.h which does exactly this with a bit more
error checking.

> +}
> +
>  static void
>  ctrl_register_ovs_idl(struct ovsdb_idl *ovs_idl)
>  {
> @@ -1760,6 +1777,8 @@ main(int argc, char *argv[])
>  
>      struct ovsdb_idl_index *sbrec_chassis_by_name
>          = chassis_index_create(ovnsb_idl_loop.idl);
> +    struct ovsdb_idl_index *sbrec_chassis_private_by_name
> +        = chassis_private_index_create(ovnsb_idl_loop.idl);
>      struct ovsdb_idl_index *sbrec_multicast_group_by_name_datapath
>          = mcast_group_index_create(ovnsb_idl_loop.idl);
>      struct ovsdb_idl_index *sbrec_port_binding_by_name
> @@ -1821,6 +1840,10 @@ main(int argc, char *argv[])
>      ovsdb_idl_omit(ovnsb_idl_loop.idl, &sbrec_connection_col_status);
>      ovsdb_idl_omit(ovnsb_idl_loop.idl, &sbrec_connection_col_target);
>  
> +    /* Do not monitor the Chassis_Private external_ids column */
> +    ovsdb_idl_omit(ovnsb_idl_loop.idl,
> +                   &sbrec_chassis_private_col_external_ids);
> +
>      update_sb_monitors(ovnsb_idl_loop.idl, NULL, NULL, NULL, false);
>  
>      stopwatch_create(CONTROLLER_LOOP_STOPWATCH_NAME, SW_MS);
> @@ -1998,10 +2021,16 @@ main(int argc, char *argv[])
>                  process_br_int(ovs_idl_txn, bridge_table, ovs_table);
>              const char *chassis_id = get_ovs_chassis_id(ovs_table);
>              const struct sbrec_chassis *chassis = NULL;
> +            const struct sbrec_chassis_private *chassis_private = NULL;
> +            uint16_t chassis_liveness_interval = get_chassis_liveness_interval(
> +                sbrec_sb_global_table_get(ovnsb_idl_loop.idl));
>              if (chassis_id) {
>                  chassis = chassis_run(ovnsb_idl_txn, sbrec_chassis_by_name,
> +                                      sbrec_chassis_private_by_name,
>                                        ovs_table, chassis_table, chassis_id,
> -                                      br_int, &transport_zones);
> +                                      br_int, &transport_zones,
> +                                      &chassis_private,
> +                                      chassis_liveness_interval);
>              }
>  
>              if (br_int) {
> @@ -2164,6 +2193,10 @@ main(int argc, char *argv[])
>                  ofctrl_wait();
>                  pinctrl_wait(ovnsb_idl_txn);
>              }
> +
> +            if (chassis) {
> +                chassis_wait(chassis_liveness_interval);
> +            }
>          }
>  
>          unixctl_server_run(unixctl);
> @@ -2232,10 +2265,17 @@ main(int argc, char *argv[])
>                     ? chassis_lookup_by_name(sbrec_chassis_by_name, chassis_id)
>                     : NULL);
>  
> +            const struct sbrec_chassis_private *chassis_private
> +                = (chassis_id
> +                   ? chassis_private_lookup_by_name(
> +                       sbrec_chassis_private_by_name, chassis_id)
> +                   : NULL);
> +
>              /* Run all of the cleanup functions, even if one of them returns
>               * false. We're done if all of them return true. */
>              done = binding_cleanup(ovnsb_idl_txn, port_binding_table, chassis);
> -            done = chassis_cleanup(ovnsb_idl_txn, chassis) && done;
> +            done = chassis_cleanup(ovnsb_idl_txn, chassis,
> +                                   chassis_private) && done;
>              done = encaps_cleanup(ovs_idl_txn, br_int) && done;
>              done = igmp_group_cleanup(ovnsb_idl_txn, sbrec_igmp_group) && done;
>              if (done) {
> diff --git a/lib/chassis-index.c b/lib/chassis-index.c
> index 39066f4cc..d9bbba8e4 100644
> --- a/lib/chassis-index.c
> +++ b/lib/chassis-index.c
> @@ -40,6 +40,32 @@ chassis_lookup_by_name(struct ovsdb_idl_index *sbrec_chassis_by_name,
>      return retval;
>  }
>  
> +struct ovsdb_idl_index *
> +chassis_private_index_create(struct ovsdb_idl *idl)
> +{
> +    return ovsdb_idl_index_create1(
> +        idl, &sbrec_chassis_private_col_name);
> +}
> +
> +/* Finds and returns the chassis with the given 'name', or NULL if no such
> + * chassis exists. */
> +const struct sbrec_chassis_private *
> +chassis_private_lookup_by_name(
> +    struct ovsdb_idl_index *sbrec_chassis_private_by_name,
> +    const char *name)
> +{
> +    struct sbrec_chassis_private *target = \
> +        sbrec_chassis_private_index_init_row(sbrec_chassis_private_by_name);
> +    sbrec_chassis_private_index_set_name(target, name);
> +
> +    struct sbrec_chassis_private *retval = sbrec_chassis_private_index_find(
> +        sbrec_chassis_private_by_name, target);
> +
> +    sbrec_chassis_private_index_destroy_row(target);
> +
> +    return retval;
> +}
> +
>  struct ovsdb_idl_index *
>  ha_chassis_group_index_create(struct ovsdb_idl *idl)
>  {
> diff --git a/lib/chassis-index.h b/lib/chassis-index.h
> index 302e5f0fd..77d2e1f99 100644
> --- a/lib/chassis-index.h
> +++ b/lib/chassis-index.h
> @@ -23,6 +23,11 @@ struct ovsdb_idl_index *chassis_index_create(struct ovsdb_idl *);
>  const struct sbrec_chassis *chassis_lookup_by_name(
>      struct ovsdb_idl_index *sbrec_chassis_by_name, const char *name);
>  
> +struct ovsdb_idl_index *chassis_private_index_create(struct ovsdb_idl *);
> +
> +const struct sbrec_chassis_private *chassis_private_lookup_by_name(
> +    struct ovsdb_idl_index *sbrec_chassis_private_by_name, const char *name);
> +
>  struct ovsdb_idl_index *ha_chassis_group_index_create(struct ovsdb_idl *idl);
>  const struct sbrec_ha_chassis_group *ha_chassis_group_lookup_by_name(
>      struct ovsdb_idl_index *sbrec_ha_chassis_grp_by_name, const char *name);
> diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
> index 2580b4ec9..1b16f5ef6 100644
> --- a/northd/ovn-northd.c
> +++ b/northd/ovn-northd.c
> @@ -11555,6 +11555,10 @@ main(int argc, char *argv[])
>      ovsdb_idl_add_column(ovnsb_idl_loop.idl, &sbrec_chassis_col_name);
>      ovsdb_idl_add_column(ovnsb_idl_loop.idl, &sbrec_chassis_col_external_ids);
>  
> +    ovsdb_idl_add_table(ovnsb_idl_loop.idl, &sbrec_table_chassis_private);
> +    add_column_noalert(ovnsb_idl_loop.idl,
> +                       &sbrec_chassis_private_col_external_ids);
> +
>      ovsdb_idl_add_table(ovnsb_idl_loop.idl, &sbrec_table_ha_chassis);
>      add_column_noalert(ovnsb_idl_loop.idl,
>                         &sbrec_ha_chassis_col_chassis);
> diff --git a/ovn-sb.ovsschema b/ovn-sb.ovsschema
> index d89f8dbbb..88da05849 100644
> --- a/ovn-sb.ovsschema
> +++ b/ovn-sb.ovsschema
> @@ -1,7 +1,7 @@
>  {
>      "name": "OVN_Southbound",
> -    "version": "2.7.0",
> -    "cksum": "4286723485 21693",
> +    "version": "2.8.0",
> +    "cksum": "293994447 22064",
>      "tables": {
>          "SB_Global": {
>              "columns": {
> @@ -43,6 +43,15 @@
>                                                "max": "unlimited"}}},
>              "isRoot": true,
>              "indexes": [["name"]]},
> +        "Chassis_Private": {
> +            "columns": {
> +                "name": {"type": "string"},
> +                "alive_at": {"type": "string"},
> +                "external_ids": {
> +                    "type": {"key": "string", "value": "string",
> +                             "min": 0, "max": "unlimited"}}},
> +            "isRoot": true,
> +            "indexes": [["name"]]},
>          "Encap": {
>              "columns": {
>                  "type": {"type": {"key": {
> diff --git a/ovn-sb.xml b/ovn-sb.xml
> index 3ae9d4f92..8c9822fdb 100644
> --- a/ovn-sb.xml
> +++ b/ovn-sb.xml
> @@ -366,6 +366,41 @@
>      </group>
>    </table>
>  
> +  <table name="Chassis_Private" title="Chassis Private">
> +    <p>
> +      Each row in this table maintains per chassis private data that are
> +      accessed only by the owning chassis (write only) and ovn-northd, not by
> +      any other chassis.  These data are stored in this separate table instead
> +      of the <ref table="Chassis"/> table for performance considerations:
> +      the rows in this table can be conditionally monitored by chassises so
> +      that each chassis only get update notifications for its own row, to avoid
> +      unnecessary chassis private data update flooding in a large scale
> +      deployment.  (Future: this separation can be avoided if ovsdb conditional
> +      monitoring is supported on a set of columns)
> +    </p>
> +
> +    <column name="name">
> +      The name of the chassis that owns these chassis-private data.
> +    </column>
> +
> +    <column name="alive_at">
> +      A timestamp indicating the last time ovn-controller
> +      signalized it's alive. For monitoring purposes, setting the
> +      chassis_liveness_interval configuration to the OVN_Northhbound's
> +      <ref table="NB_Global" column="options"/> with an integer value (in
> +      seconds) causes the <code>ovn-controller</code> to update this
> +      column with a timestamp every N seconds. If the value is not set
> +      or 0, then <code>ovn-controller</code> doesn't update this column.
> +    </column>
> +
> +    <group title="Common Columns">
> +      The overall purpose of these columns is described under <code>Common
> +      Columns</code> at the beginning of this document.
> +
> +      <column name="external_ids"/>
> +    </group>
> +  </table>
> +
>    <table name="Encap" title="Encapsulation Types">
>      <p>
>        The <ref column="encaps" table="Chassis"/> column in the <ref
>
Lucas Alvares Gomes Feb. 20, 2020, 9:55 a.m. UTC | #2
Thanks for the review Dumitry!

On Thu, Feb 20, 2020 at 9:19 AM Dumitru Ceara <dceara@redhat.com> wrote:
>
> On 2/19/20 4:37 PM, lmartins@redhat.com wrote:
> > From: Lucas Alvares Gomes <lucasagomes@gmail.com>
> >
> > NOTE: SENDING THIS PATCH AS A RFC TO SEE WHAT OTHERS MIGHT THINK ABOUT
> > THE IDEA PRIOR TO WRITTING TESTS TO IT.
> >
> > CMSes integrating with OVN often uses the nb_cfg mechanism as a way to
> > check the health status of the ovn-controller process but, the current
> > implementation isn't ideal for that purpose because it floods the control
> > plane with update notifications every time the nb_cfg value is
> > incremented.
> >
> > This patch is merging two ideas:
> >
> > 1) Han Zhou proposed a patch [0] creating a table called Chassis_Private
> >    where each hypervisor *only* writes and monitors its own record to
> >    avoid this flooding problem.
> >
> > 2) Having ovn-controller to periodically log that it's alive instead of
> >    relying on the nb_cfg mechanism.
> >
> > By using this mechanism, a CMS can more easily just read this new
> > Chassis_Private table and figure out the status of each Chassis
> > (ovn-controller) in the cluster.
> >
>
> Hi Lucas,
>
> Thanks a lot for working on this!
>
> > Here's some reasons why I believe this approach is better than having
> > to bump the  nb_cfg value:
> >
> > 1) Simple integration. Before, the CMS had to increment the nb_cfg
> >    value in the NB_Global table and wait for it to be propagated to the
> >    Southbound database (ovn-northd and then ovn-controllers) Chassis
> >    table in order to know the status of the Chassis.
> >
> >    Now, it can just read the Chassis_Private table and compare the
> >    alive_at column against the current time.
>
> I guess the only comment I have about this approach is that it doesn't
> feel completely OK to store a timestamp representation as string in
> "alive_at". I'm afraid this might be too inflexible and force CMSes to
> compare their local time with the value in this column.
>
> I know we discussed offline that using a counter that is periodically
> incremented by ovn-controller instead of a timestamp would complicate
> the implementation in Openstack but maybe other people on this mailing
> list have ideas on how to deal with this in a more generic way.
>

Agreed, let's see if people have more ideas here.

Also, let me expand my explanation a little. The reason why I think
having a counter is not ideal is because services would need to track
this number to know what the value was before and then compare with
the current value to figure out whether it's being incremented or not.

In OpenStack, we have multiple API workers spread across the
controllers nodes and the API request to check for the services'
health status will land on any of those workers. Which means that I
can't keep the track of the counter in-memory. Mostly likely, I would
need to set an IDL event on the field being incremented and store the
status of those chassis somewhere else accessible by all API workers.

The advantage with the nb_cfg mechanism compared with the incremenal
counter is that those values are already in the OVSDB, so we can
compare what's in the NB DB with the SB DB to figure whether the
services are alive or not. But, the price is that we need more code to
deal with waiting for the synchronization of the OVSDBs and, if we
move it to the Chasiss_Private table it won't be backwards compatible.

Therefore I think a timestamp would be better. It's easy to understand
by either a service or a even a person. When u look at the alive_at
field u know the last time the service signilized it was alive. For
CMSes, the service can just read the Chassis_Private table and compare
the alive_at field with the current time to figure the status of the
service (right now it's UTC, but, we can change it to include the TZ
info if people think it's easier). No writes, sync issues and also
it's backwards compatible with the nb_cfg approach.

> >
> > 2) Less costy. Just one read from the db is needed, no writing. Code
> >    using the nb_cfg mechanism had to implement a few safe-guard code to
> >    make less error prone. See [1] and [2] for example.
> >
> > 3) Backwards compatibility. The patch [0] was moving the nb_cfg value
> >    to new table so, systems relying on it would need to update their
> >    code upon updating OVN.
>
> Nice!
>
> One more minor comment on the code, inline.
>
> Thanks,
> Dumitru
>
> >
> > To enable this new mechanism set an option called
> > chassis_liveness_interval=N to the "options" column in the NB_Global
> > table, where N is the time (in seconds) in which the ovn-controller
> > should updates the alive_at column. If not set or set to 0, the
> > mechanism is disabled. By default, it's disabled.
> >
> > [0] https://patchwork.ozlabs.org/patch/899608/
> > [1] https://review.opendev.org/#/c/703612/
> > [2] https://review.opendev.org/#/c/704530/
> >
> > Co-authored-by: Han Zhou <hzhou8@ebay.com>
> > Co-authored-by: Numan Siddique <numans@ovn.org>
> > Co-authored-by: Dumitru Ceara <dceara@redhat.com>
> > Signed-off-by: Lucas Alvares Gomes <lucasagomes@gmail.com>
> > ---
> >  controller/chassis.c        | 56 +++++++++++++++++++++++++++++++++++--
> >  controller/chassis.h        | 11 ++++++--
> >  controller/ovn-controller.c | 44 +++++++++++++++++++++++++++--
> >  lib/chassis-index.c         | 26 +++++++++++++++++
> >  lib/chassis-index.h         |  5 ++++
> >  northd/ovn-northd.c         |  4 +++
> >  ovn-sb.ovsschema            | 13 +++++++--
> >  ovn-sb.xml                  | 35 +++++++++++++++++++++++
> >  8 files changed, 185 insertions(+), 9 deletions(-)
> >
> > diff --git a/controller/chassis.c b/controller/chassis.c
> > index 522893ead..7892cb966 100644
> > --- a/controller/chassis.c
> > +++ b/controller/chassis.c
> > @@ -24,6 +24,7 @@
> >  #include "openvswitch/dynamic-string.h"
> >  #include "openvswitch/vlog.h"
> >  #include "openvswitch/ofp-parse.h"
> > +#include "openvswitch/poll-loop.h"
> >  #include "lib/chassis-index.h"
> >  #include "lib/ovn-sb-idl.h"
> >  #include "ovn-controller.h"
> > @@ -47,11 +48,16 @@ struct chassis_info {
> >
> >      /* True if Chassis SB record is initialized, false otherwise. */
> >      uint32_t id_inited : 1;
> > +
> > +    /* Next livennes update time to update the 'alive_at' column of
> > +     * Chassis_Private table. */
> > +    long long int next_liveness_update;
> >  };
> >
> >  static struct chassis_info chassis_state = {
> >      .id = DS_EMPTY_INITIALIZER,
> >      .id_inited = false,
> > +    .next_liveness_update = LLONG_MAX,
> >  };
> >
> >  static void
> > @@ -581,18 +587,38 @@ chassis_update(const struct sbrec_chassis *chassis_rec,
> >      free(encaps);
> >  }
> >
> > +static void
> > +chassis_update_liveness_cfg(const struct sbrec_chassis_private *chassis_rec,
> > +                            struct chassis_info *ch_info,
> > +                            int chassis_liveness_interval)
> > +{
> > +    if (ch_info->next_liveness_update == LLONG_MAX ||
> > +        time_msec() > ch_info->next_liveness_update) {
> > +        sbrec_chassis_private_set_alive_at(
> > +            chassis_rec,
> > +            xastrftime_msec("%Y-%m-%d %H:%M:%S", time_wall_msec(), true));
> > +        ch_info->next_liveness_update =
> > +            time_msec() + chassis_liveness_interval * 1000;
> > +    }
> > +}
> > +
> >  /* Returns this chassis's Chassis record, if it is available. */
> >  const struct sbrec_chassis *
> >  chassis_run(struct ovsdb_idl_txn *ovnsb_idl_txn,
> >              struct ovsdb_idl_index *sbrec_chassis_by_name,
> > +            struct ovsdb_idl_index *sbrec_chassis_private_by_name,
> >              const struct ovsrec_open_vswitch_table *ovs_table,
> >              const struct sbrec_chassis_table *chassis_table,
> >              const char *chassis_id,
> >              const struct ovsrec_bridge *br_int,
> > -            const struct sset *transport_zones)
> > +            const struct sset *transport_zones,
> > +            const struct sbrec_chassis_private **chassis_private,
> > +            int chassis_liveness_interval)
> >  {
> >      struct ovs_chassis_cfg ovs_cfg;
> >
> > +    *chassis_private = NULL;
> > +
> >      /* Get the chassis config from the ovs table. */
> >      ovs_chassis_cfg_init(&ovs_cfg);
> >      if (!chassis_parse_ovs_config(ovs_table, br_int, &ovs_cfg)) {
> > @@ -616,6 +642,24 @@ chassis_run(struct ovsdb_idl_txn *ovnsb_idl_txn,
> >                                    chassis_id);
> >      }
> >
> > +    /* Create the Chassis_Private entry if it's not there yet */
> > +    const struct sbrec_chassis_private *chassis_private_rec =
> > +        chassis_private_lookup_by_name(sbrec_chassis_private_by_name,
> > +                                       chassis_id);
> > +    if (!chassis_private_rec && ovnsb_idl_txn) {
> > +        chassis_private_rec = sbrec_chassis_private_insert(ovnsb_idl_txn);
> > +        sbrec_chassis_private_set_name(chassis_private_rec, chassis_id);
> > +    }
> > +
> > +    /* Update the alive_at column from the Chassis_Private table if
> > +     * the chassis liveness mechanism is enabled */
> > +    if (chassis_private_rec && ovnsb_idl_txn && chassis_liveness_interval) {
> > +        chassis_update_liveness_cfg(chassis_private_rec, &chassis_state,
> > +                                    chassis_liveness_interval);
> > +    }
> > +
> > +    *chassis_private = chassis_private_rec;
> > +
> >      ovs_chassis_cfg_destroy(&ovs_cfg);
> >      return chassis_rec;
> >  }
> > @@ -669,7 +713,8 @@ chassis_get_mac(const struct sbrec_chassis *chassis_rec,
> >   * required. */
> >  bool
> >  chassis_cleanup(struct ovsdb_idl_txn *ovnsb_idl_txn,
> > -                const struct sbrec_chassis *chassis_rec)
> > +                const struct sbrec_chassis *chassis_rec,
> > +                const struct sbrec_chassis_private *chassis_private_rec)
> >  {
> >      if (!chassis_rec) {
> >          return true;
> > @@ -679,6 +724,7 @@ chassis_cleanup(struct ovsdb_idl_txn *ovnsb_idl_txn,
> >                                    "ovn-controller: unregistering chassis '%s'",
> >                                    chassis_rec->name);
> >          sbrec_chassis_delete(chassis_rec);
> > +        sbrec_chassis_private_delete(chassis_private_rec);
> >      }
> >      return false;
> >  }
> > @@ -695,3 +741,9 @@ chassis_get_id(void)
> >
> >      return NULL;
> >  }
> > +
> > +void chassis_wait(int chassis_liveness_interval) {
> > +    if (chassis_liveness_interval) {
> > +        poll_timer_wait_until(chassis_state.next_liveness_update);
> > +    }
> > +}
> > diff --git a/controller/chassis.h b/controller/chassis.h
> > index 178d2957e..38a549553 100644
> > --- a/controller/chassis.h
> > +++ b/controller/chassis.h
> > @@ -17,6 +17,7 @@
> >  #define OVN_CHASSIS_H 1
> >
> >  #include <stdbool.h>
> > +#include "lib/ovn-sb-idl.h"
> >
> >  struct ovsdb_idl;
> >  struct ovsdb_idl_index;
> > @@ -33,17 +34,21 @@ void chassis_register_ovs_idl(struct ovsdb_idl *);
> >  const struct sbrec_chassis *chassis_run(
> >      struct ovsdb_idl_txn *ovnsb_idl_txn,
> >      struct ovsdb_idl_index *sbrec_chassis_by_name,
> > +    struct ovsdb_idl_index *sbrec_chassis_private_by_name,
> >      const struct ovsrec_open_vswitch_table *,
> >      const struct sbrec_chassis_table *,
> >      const char *chassis_id, const struct ovsrec_bridge *br_int,
> > -    const struct sset *transport_zones);
> > +    const struct sset *transport_zones,
> > +    const struct sbrec_chassis_private **chassis_private,
> > +    int chassis_liveness_interval);
> >  bool chassis_cleanup(struct ovsdb_idl_txn *ovnsb_idl_txn,
> > -                     const struct sbrec_chassis *);
> > +                     const struct sbrec_chassis *,
> > +                     const struct sbrec_chassis_private *);
> >  bool chassis_get_mac(const struct sbrec_chassis *chassis,
> >                       const char *bridge_mapping,
> >                       struct eth_addr *chassis_mac);
> >  const char *chassis_get_id(void);
> >  const char * get_chassis_mac_mappings(const struct smap *ext_ids);
> > -
> > +void chassis_wait(int chassis_liveness_interval);
> >
> >  #endif /* controller/chassis.h */
> > diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
> > index 4d245ca28..fd9d95da2 100644
> > --- a/controller/ovn-controller.c
> > +++ b/controller/ovn-controller.c
> > @@ -155,6 +155,7 @@ update_sb_monitors(struct ovsdb_idl *ovnsb_idl,
> >      struct ovsdb_idl_condition ce =  OVSDB_IDL_CONDITION_INIT(&ce);
> >      struct ovsdb_idl_condition ip_mcast = OVSDB_IDL_CONDITION_INIT(&ip_mcast);
> >      struct ovsdb_idl_condition igmp = OVSDB_IDL_CONDITION_INIT(&igmp);
> > +    struct ovsdb_idl_condition chprv = OVSDB_IDL_CONDITION_INIT(&chprv);
> >
> >      if (monitor_all) {
> >          ovsdb_idl_condition_add_clause_true(&pb);
> > @@ -165,6 +166,7 @@ update_sb_monitors(struct ovsdb_idl *ovnsb_idl,
> >          ovsdb_idl_condition_add_clause_true(&ce);
> >          ovsdb_idl_condition_add_clause_true(&ip_mcast);
> >          ovsdb_idl_condition_add_clause_true(&igmp);
> > +        ovsdb_idl_condition_add_clause_true(&chprv);
> >          goto out;
> >      }
> >
> > @@ -196,6 +198,10 @@ update_sb_monitors(struct ovsdb_idl *ovnsb_idl,
> >                                                    &chassis->header_.uuid);
> >          sbrec_igmp_group_add_clause_chassis(&igmp, OVSDB_F_EQ,
> >                                              &chassis->header_.uuid);
> > +
> > +        /* Monitors Chassis_Private record for current chassis only */
> > +        sbrec_chassis_private_add_clause_name(&chprv, OVSDB_F_EQ,
> > +                                              chassis->name);
> >      }
> >      if (local_ifaces) {
> >          const char *name;
> > @@ -229,6 +235,7 @@ out:
> >      sbrec_controller_event_set_condition(ovnsb_idl, &ce);
> >      sbrec_ip_multicast_set_condition(ovnsb_idl, &ip_mcast);
> >      sbrec_igmp_group_set_condition(ovnsb_idl, &igmp);
> > +    sbrec_chassis_private_set_condition(ovnsb_idl, &chprv);
> >      ovsdb_idl_condition_destroy(&pb);
> >      ovsdb_idl_condition_destroy(&lf);
> >      ovsdb_idl_condition_destroy(&mb);
> > @@ -237,6 +244,7 @@ out:
> >      ovsdb_idl_condition_destroy(&ce);
> >      ovsdb_idl_condition_destroy(&ip_mcast);
> >      ovsdb_idl_condition_destroy(&igmp);
> > +    ovsdb_idl_condition_destroy(&chprv);
> >  }
> >
> >  static const char *
> > @@ -687,6 +695,15 @@ get_transport_zones(const struct ovsrec_open_vswitch_table *ovs_table)
> >      return smap_get_def(&cfg->external_ids, "ovn-transport-zones", "");
> >  }
> >
> > +static uint16_t
> > +get_chassis_liveness_interval(const struct sbrec_sb_global_table
> > +                              *sb_global_table)
> > +{
> > +    const struct sbrec_sb_global *sb
> > +        = sbrec_sb_global_table_first(sb_global_table);
> > +    return atoi(smap_get_def(&sb->options, "chassis_liveness_interval", "0"));
>
> We have smap_get_int in smap.h which does exactly this with a bit more
> error checking.
>

A super! Thanks, I will change it.

> > +}
> > +
> >  static void
> >  ctrl_register_ovs_idl(struct ovsdb_idl *ovs_idl)
> >  {
> > @@ -1760,6 +1777,8 @@ main(int argc, char *argv[])
> >
> >      struct ovsdb_idl_index *sbrec_chassis_by_name
> >          = chassis_index_create(ovnsb_idl_loop.idl);
> > +    struct ovsdb_idl_index *sbrec_chassis_private_by_name
> > +        = chassis_private_index_create(ovnsb_idl_loop.idl);
> >      struct ovsdb_idl_index *sbrec_multicast_group_by_name_datapath
> >          = mcast_group_index_create(ovnsb_idl_loop.idl);
> >      struct ovsdb_idl_index *sbrec_port_binding_by_name
> > @@ -1821,6 +1840,10 @@ main(int argc, char *argv[])
> >      ovsdb_idl_omit(ovnsb_idl_loop.idl, &sbrec_connection_col_status);
> >      ovsdb_idl_omit(ovnsb_idl_loop.idl, &sbrec_connection_col_target);
> >
> > +    /* Do not monitor the Chassis_Private external_ids column */
> > +    ovsdb_idl_omit(ovnsb_idl_loop.idl,
> > +                   &sbrec_chassis_private_col_external_ids);
> > +
> >      update_sb_monitors(ovnsb_idl_loop.idl, NULL, NULL, NULL, false);
> >
> >      stopwatch_create(CONTROLLER_LOOP_STOPWATCH_NAME, SW_MS);
> > @@ -1998,10 +2021,16 @@ main(int argc, char *argv[])
> >                  process_br_int(ovs_idl_txn, bridge_table, ovs_table);
> >              const char *chassis_id = get_ovs_chassis_id(ovs_table);
> >              const struct sbrec_chassis *chassis = NULL;
> > +            const struct sbrec_chassis_private *chassis_private = NULL;
> > +            uint16_t chassis_liveness_interval = get_chassis_liveness_interval(
> > +                sbrec_sb_global_table_get(ovnsb_idl_loop.idl));
> >              if (chassis_id) {
> >                  chassis = chassis_run(ovnsb_idl_txn, sbrec_chassis_by_name,
> > +                                      sbrec_chassis_private_by_name,
> >                                        ovs_table, chassis_table, chassis_id,
> > -                                      br_int, &transport_zones);
> > +                                      br_int, &transport_zones,
> > +                                      &chassis_private,
> > +                                      chassis_liveness_interval);
> >              }
> >
> >              if (br_int) {
> > @@ -2164,6 +2193,10 @@ main(int argc, char *argv[])
> >                  ofctrl_wait();
> >                  pinctrl_wait(ovnsb_idl_txn);
> >              }
> > +
> > +            if (chassis) {
> > +                chassis_wait(chassis_liveness_interval);
> > +            }
> >          }
> >
> >          unixctl_server_run(unixctl);
> > @@ -2232,10 +2265,17 @@ main(int argc, char *argv[])
> >                     ? chassis_lookup_by_name(sbrec_chassis_by_name, chassis_id)
> >                     : NULL);
> >
> > +            const struct sbrec_chassis_private *chassis_private
> > +                = (chassis_id
> > +                   ? chassis_private_lookup_by_name(
> > +                       sbrec_chassis_private_by_name, chassis_id)
> > +                   : NULL);
> > +
> >              /* Run all of the cleanup functions, even if one of them returns
> >               * false. We're done if all of them return true. */
> >              done = binding_cleanup(ovnsb_idl_txn, port_binding_table, chassis);
> > -            done = chassis_cleanup(ovnsb_idl_txn, chassis) && done;
> > +            done = chassis_cleanup(ovnsb_idl_txn, chassis,
> > +                                   chassis_private) && done;
> >              done = encaps_cleanup(ovs_idl_txn, br_int) && done;
> >              done = igmp_group_cleanup(ovnsb_idl_txn, sbrec_igmp_group) && done;
> >              if (done) {
> > diff --git a/lib/chassis-index.c b/lib/chassis-index.c
> > index 39066f4cc..d9bbba8e4 100644
> > --- a/lib/chassis-index.c
> > +++ b/lib/chassis-index.c
> > @@ -40,6 +40,32 @@ chassis_lookup_by_name(struct ovsdb_idl_index *sbrec_chassis_by_name,
> >      return retval;
> >  }
> >
> > +struct ovsdb_idl_index *
> > +chassis_private_index_create(struct ovsdb_idl *idl)
> > +{
> > +    return ovsdb_idl_index_create1(
> > +        idl, &sbrec_chassis_private_col_name);
> > +}
> > +
> > +/* Finds and returns the chassis with the given 'name', or NULL if no such
> > + * chassis exists. */
> > +const struct sbrec_chassis_private *
> > +chassis_private_lookup_by_name(
> > +    struct ovsdb_idl_index *sbrec_chassis_private_by_name,
> > +    const char *name)
> > +{
> > +    struct sbrec_chassis_private *target = \
> > +        sbrec_chassis_private_index_init_row(sbrec_chassis_private_by_name);
> > +    sbrec_chassis_private_index_set_name(target, name);
> > +
> > +    struct sbrec_chassis_private *retval = sbrec_chassis_private_index_find(
> > +        sbrec_chassis_private_by_name, target);
> > +
> > +    sbrec_chassis_private_index_destroy_row(target);
> > +
> > +    return retval;
> > +}
> > +
> >  struct ovsdb_idl_index *
> >  ha_chassis_group_index_create(struct ovsdb_idl *idl)
> >  {
> > diff --git a/lib/chassis-index.h b/lib/chassis-index.h
> > index 302e5f0fd..77d2e1f99 100644
> > --- a/lib/chassis-index.h
> > +++ b/lib/chassis-index.h
> > @@ -23,6 +23,11 @@ struct ovsdb_idl_index *chassis_index_create(struct ovsdb_idl *);
> >  const struct sbrec_chassis *chassis_lookup_by_name(
> >      struct ovsdb_idl_index *sbrec_chassis_by_name, const char *name);
> >
> > +struct ovsdb_idl_index *chassis_private_index_create(struct ovsdb_idl *);
> > +
> > +const struct sbrec_chassis_private *chassis_private_lookup_by_name(
> > +    struct ovsdb_idl_index *sbrec_chassis_private_by_name, const char *name);
> > +
> >  struct ovsdb_idl_index *ha_chassis_group_index_create(struct ovsdb_idl *idl);
> >  const struct sbrec_ha_chassis_group *ha_chassis_group_lookup_by_name(
> >      struct ovsdb_idl_index *sbrec_ha_chassis_grp_by_name, const char *name);
> > diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
> > index 2580b4ec9..1b16f5ef6 100644
> > --- a/northd/ovn-northd.c
> > +++ b/northd/ovn-northd.c
> > @@ -11555,6 +11555,10 @@ main(int argc, char *argv[])
> >      ovsdb_idl_add_column(ovnsb_idl_loop.idl, &sbrec_chassis_col_name);
> >      ovsdb_idl_add_column(ovnsb_idl_loop.idl, &sbrec_chassis_col_external_ids);
> >
> > +    ovsdb_idl_add_table(ovnsb_idl_loop.idl, &sbrec_table_chassis_private);
> > +    add_column_noalert(ovnsb_idl_loop.idl,
> > +                       &sbrec_chassis_private_col_external_ids);
> > +
> >      ovsdb_idl_add_table(ovnsb_idl_loop.idl, &sbrec_table_ha_chassis);
> >      add_column_noalert(ovnsb_idl_loop.idl,
> >                         &sbrec_ha_chassis_col_chassis);
> > diff --git a/ovn-sb.ovsschema b/ovn-sb.ovsschema
> > index d89f8dbbb..88da05849 100644
> > --- a/ovn-sb.ovsschema
> > +++ b/ovn-sb.ovsschema
> > @@ -1,7 +1,7 @@
> >  {
> >      "name": "OVN_Southbound",
> > -    "version": "2.7.0",
> > -    "cksum": "4286723485 21693",
> > +    "version": "2.8.0",
> > +    "cksum": "293994447 22064",
> >      "tables": {
> >          "SB_Global": {
> >              "columns": {
> > @@ -43,6 +43,15 @@
> >                                                "max": "unlimited"}}},
> >              "isRoot": true,
> >              "indexes": [["name"]]},
> > +        "Chassis_Private": {
> > +            "columns": {
> > +                "name": {"type": "string"},
> > +                "alive_at": {"type": "string"},
> > +                "external_ids": {
> > +                    "type": {"key": "string", "value": "string",
> > +                             "min": 0, "max": "unlimited"}}},
> > +            "isRoot": true,
> > +            "indexes": [["name"]]},
> >          "Encap": {
> >              "columns": {
> >                  "type": {"type": {"key": {
> > diff --git a/ovn-sb.xml b/ovn-sb.xml
> > index 3ae9d4f92..8c9822fdb 100644
> > --- a/ovn-sb.xml
> > +++ b/ovn-sb.xml
> > @@ -366,6 +366,41 @@
> >      </group>
> >    </table>
> >
> > +  <table name="Chassis_Private" title="Chassis Private">
> > +    <p>
> > +      Each row in this table maintains per chassis private data that are
> > +      accessed only by the owning chassis (write only) and ovn-northd, not by
> > +      any other chassis.  These data are stored in this separate table instead
> > +      of the <ref table="Chassis"/> table for performance considerations:
> > +      the rows in this table can be conditionally monitored by chassises so
> > +      that each chassis only get update notifications for its own row, to avoid
> > +      unnecessary chassis private data update flooding in a large scale
> > +      deployment.  (Future: this separation can be avoided if ovsdb conditional
> > +      monitoring is supported on a set of columns)
> > +    </p>
> > +
> > +    <column name="name">
> > +      The name of the chassis that owns these chassis-private data.
> > +    </column>
> > +
> > +    <column name="alive_at">
> > +      A timestamp indicating the last time ovn-controller
> > +      signalized it's alive. For monitoring purposes, setting the
> > +      chassis_liveness_interval configuration to the OVN_Northhbound's
> > +      <ref table="NB_Global" column="options"/> with an integer value (in
> > +      seconds) causes the <code>ovn-controller</code> to update this
> > +      column with a timestamp every N seconds. If the value is not set
> > +      or 0, then <code>ovn-controller</code> doesn't update this column.
> > +    </column>
> > +
> > +    <group title="Common Columns">
> > +      The overall purpose of these columns is described under <code>Common
> > +      Columns</code> at the beginning of this document.
> > +
> > +      <column name="external_ids"/>
> > +    </group>
> > +  </table>
> > +
> >    <table name="Encap" title="Encapsulation Types">
> >      <p>
> >        The <ref column="encaps" table="Chassis"/> column in the <ref
> >
>
Han Zhou March 11, 2020, 7:05 a.m. UTC | #3
Thanks Lucas for working on this! Please see my comments below.

On Thu, Feb 20, 2020 at 1:56 AM Lucas Alvares Gomes <lucasagomes@gmail.com>
wrote:
>
> Thanks for the review Dumitry!
>
> On Thu, Feb 20, 2020 at 9:19 AM Dumitru Ceara <dceara@redhat.com> wrote:
> >
> > On 2/19/20 4:37 PM, lmartins@redhat.com wrote:
> > > From: Lucas Alvares Gomes <lucasagomes@gmail.com>
> > >
> > > NOTE: SENDING THIS PATCH AS A RFC TO SEE WHAT OTHERS MIGHT THINK ABOUT
> > > THE IDEA PRIOR TO WRITTING TESTS TO IT.
> > >
> > > CMSes integrating with OVN often uses the nb_cfg mechanism as a way to
> > > check the health status of the ovn-controller process but, the current
> > > implementation isn't ideal for that purpose because it floods the
control
> > > plane with update notifications every time the nb_cfg value is
> > > incremented.
> > >
> > > This patch is merging two ideas:
> > >
> > > 1) Han Zhou proposed a patch [0] creating a table called
Chassis_Private
> > >    where each hypervisor *only* writes and monitors its own record to
> > >    avoid this flooding problem.
> > >
> > > 2) Having ovn-controller to periodically log that it's alive instead
of
> > >    relying on the nb_cfg mechanism.
> > >
> > > By using this mechanism, a CMS can more easily just read this new
> > > Chassis_Private table and figure out the status of each Chassis
> > > (ovn-controller) in the cluster.
> > >
> >
> > Hi Lucas,
> >
> > Thanks a lot for working on this!
> >
> > > Here's some reasons why I believe this approach is better than having
> > > to bump the  nb_cfg value:
> > >
> > > 1) Simple integration. Before, the CMS had to increment the nb_cfg
> > >    value in the NB_Global table and wait for it to be propagated to
the
> > >    Southbound database (ovn-northd and then ovn-controllers) Chassis
> > >    table in order to know the status of the Chassis.
> > >
> > >    Now, it can just read the Chassis_Private table and compare the
> > >    alive_at column against the current time.

I think the nb_cfg mechanism doesn't have to wait. There can be one monitor
updating the counter periodically, and any other worker can compare the
desired value and each chassis's value and see which ones are stale, e.g.
if the gap is bigger than 3, meaning the chassis missed 3 round of
heartbeat.
But it is true that it is more costly because there are 2 directions of
communications. It may or may not be a concern, depending on the scale and
the frequency of heartbeat.

> >
> > I guess the only comment I have about this approach is that it doesn't
> > feel completely OK to store a timestamp representation as string in
> > "alive_at". I'm afraid this might be too inflexible and force CMSes to
> > compare their local time with the value in this column.
> >
> > I know we discussed offline that using a counter that is periodically
> > incremented by ovn-controller instead of a timestamp would complicate
> > the implementation in Openstack but maybe other people on this mailing
> > list have ideas on how to deal with this in a more generic way.
> >
>
> Agreed, let's see if people have more ideas here.
>
> Also, let me expand my explanation a little. The reason why I think
> having a counter is not ideal is because services would need to track
> this number to know what the value was before and then compare with
> the current value to figure out whether it's being incremented or not.
>
> In OpenStack, we have multiple API workers spread across the
> controllers nodes and the API request to check for the services'
> health status will land on any of those workers. Which means that I
> can't keep the track of the counter in-memory. Mostly likely, I would
> need to set an IDL event on the field being incremented and store the
> status of those chassis somewhere else accessible by all API workers.
>
> The advantage with the nb_cfg mechanism compared with the incremenal
> counter is that those values are already in the OVSDB, so we can
> compare what's in the NB DB with the SB DB to figure whether the
> services are alive or not. But, the price is that we need more code to
> deal with waiting for the synchronization of the OVSDBs and, if we
> move it to the Chasiss_Private table it won't be backwards compatible.
>
> Therefore I think a timestamp would be better. It's easy to understand
> by either a service or a even a person. When u look at the alive_at
> field u know the last time the service signilized it was alive. For
> CMSes, the service can just read the Chassis_Private table and compare
> the alive_at field with the current time to figure the status of the
> service (right now it's UTC, but, we can change it to include the TZ
> info if people think it's easier). No writes, sync issues and also
> it's backwards compatible with the nb_cfg approach.
>
> > >
> > > 2) Less costy. Just one read from the db is needed, no writing. Code
> > >    using the nb_cfg mechanism had to implement a few safe-guard code
to
> > >    make less error prone. See [1] and [2] for example.
> > >
> > > 3) Backwards compatibility. The patch [0] was moving the nb_cfg value
> > >    to new table so, systems relying on it would need to update their
> > >    code upon updating OVN.
> >
I have a different opinion on the backward compatibility. It is true that
moving the nb_cfg to a new table may cause compatibilty issue. However, I
think it shouldn't be a problem in reality, mainly because that the
flooding problem of nb_cfg today prevents it being used in live producation
environment.
In fact, the nb_cfg mechanism has it's unique capability of performing a
"control plane" ping, and it can be utilized to measure the end-to-end
latency of the whole control plane, i.e. when there is a change in NB, how
long it takes for all chassises to enforce that change. It is very useful
in scalability evaluation. Because of this, I strongly suggest to move the
nb_cfg mechanism to chassis_private table as well. I think it is not
harmful to have both mechanism co-exist (but both should be in
chassis_private table to avoid causing flooding problem), and each would
have its own advantages in different use cases. We could even combine both
approaches:

- Whenever nb_cfg is updated, each chassis should update its own record in
SB with the same value. When updating this value, it updates the timestamp
column as well, which could tell the latency of the specific chassis for
handling the update.

- If nb_cfg is not update, but chassis_liveness_interval is set, each
chassis should update the timestamp only (the nb_cfg value in its own
record should keep the same as the current NB nb_cfg value. If for some
reason they are different, then it should be updated taking this chance).

What do you think?

Thanks,
Han

> > Nice!
> >
> > One more minor comment on the code, inline.
> >
> > Thanks,
> > Dumitru
> >
> > >
> > > To enable this new mechanism set an option called
> > > chassis_liveness_interval=N to the "options" column in the NB_Global
> > > table, where N is the time (in seconds) in which the ovn-controller
> > > should updates the alive_at column. If not set or set to 0, the
> > > mechanism is disabled. By default, it's disabled.
> > >
> > > [0] https://patchwork.ozlabs.org/patch/899608/
> > > [1] https://review.opendev.org/#/c/703612/
> > > [2] https://review.opendev.org/#/c/704530/
> > >
> > > Co-authored-by: Han Zhou <hzhou8@ebay.com>
> > > Co-authored-by: Numan Siddique <numans@ovn.org>
> > > Co-authored-by: Dumitru Ceara <dceara@redhat.com>
> > > Signed-off-by: Lucas Alvares Gomes <lucasagomes@gmail.com>
> > > ---
> > >  controller/chassis.c        | 56
+++++++++++++++++++++++++++++++++++--
> > >  controller/chassis.h        | 11 ++++++--
> > >  controller/ovn-controller.c | 44 +++++++++++++++++++++++++++--
> > >  lib/chassis-index.c         | 26 +++++++++++++++++
> > >  lib/chassis-index.h         |  5 ++++
> > >  northd/ovn-northd.c         |  4 +++
> > >  ovn-sb.ovsschema            | 13 +++++++--
> > >  ovn-sb.xml                  | 35 +++++++++++++++++++++++
> > >  8 files changed, 185 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/controller/chassis.c b/controller/chassis.c
> > > index 522893ead..7892cb966 100644
> > > --- a/controller/chassis.c
> > > +++ b/controller/chassis.c
> > > @@ -24,6 +24,7 @@
> > >  #include "openvswitch/dynamic-string.h"
> > >  #include "openvswitch/vlog.h"
> > >  #include "openvswitch/ofp-parse.h"
> > > +#include "openvswitch/poll-loop.h"
> > >  #include "lib/chassis-index.h"
> > >  #include "lib/ovn-sb-idl.h"
> > >  #include "ovn-controller.h"
> > > @@ -47,11 +48,16 @@ struct chassis_info {
> > >
> > >      /* True if Chassis SB record is initialized, false otherwise. */
> > >      uint32_t id_inited : 1;
> > > +
> > > +    /* Next livennes update time to update the 'alive_at' column of
> > > +     * Chassis_Private table. */
> > > +    long long int next_liveness_update;
> > >  };
> > >
> > >  static struct chassis_info chassis_state = {
> > >      .id = DS_EMPTY_INITIALIZER,
> > >      .id_inited = false,
> > > +    .next_liveness_update = LLONG_MAX,
> > >  };
> > >
> > >  static void
> > > @@ -581,18 +587,38 @@ chassis_update(const struct sbrec_chassis
*chassis_rec,
> > >      free(encaps);
> > >  }
> > >
> > > +static void
> > > +chassis_update_liveness_cfg(const struct sbrec_chassis_private
*chassis_rec,
> > > +                            struct chassis_info *ch_info,
> > > +                            int chassis_liveness_interval)
> > > +{
> > > +    if (ch_info->next_liveness_update == LLONG_MAX ||
> > > +        time_msec() > ch_info->next_liveness_update) {
> > > +        sbrec_chassis_private_set_alive_at(
> > > +            chassis_rec,
> > > +            xastrftime_msec("%Y-%m-%d %H:%M:%S", time_wall_msec(),
true));
> > > +        ch_info->next_liveness_update =
> > > +            time_msec() + chassis_liveness_interval * 1000;
> > > +    }
> > > +}
> > > +
> > >  /* Returns this chassis's Chassis record, if it is available. */
> > >  const struct sbrec_chassis *
> > >  chassis_run(struct ovsdb_idl_txn *ovnsb_idl_txn,
> > >              struct ovsdb_idl_index *sbrec_chassis_by_name,
> > > +            struct ovsdb_idl_index *sbrec_chassis_private_by_name,
> > >              const struct ovsrec_open_vswitch_table *ovs_table,
> > >              const struct sbrec_chassis_table *chassis_table,
> > >              const char *chassis_id,
> > >              const struct ovsrec_bridge *br_int,
> > > -            const struct sset *transport_zones)
> > > +            const struct sset *transport_zones,
> > > +            const struct sbrec_chassis_private **chassis_private,
> > > +            int chassis_liveness_interval)
> > >  {
> > >      struct ovs_chassis_cfg ovs_cfg;
> > >
> > > +    *chassis_private = NULL;
> > > +
> > >      /* Get the chassis config from the ovs table. */
> > >      ovs_chassis_cfg_init(&ovs_cfg);
> > >      if (!chassis_parse_ovs_config(ovs_table, br_int, &ovs_cfg)) {
> > > @@ -616,6 +642,24 @@ chassis_run(struct ovsdb_idl_txn *ovnsb_idl_txn,
> > >                                    chassis_id);
> > >      }
> > >
> > > +    /* Create the Chassis_Private entry if it's not there yet */
> > > +    const struct sbrec_chassis_private *chassis_private_rec =
> > > +        chassis_private_lookup_by_name(sbrec_chassis_private_by_name,
> > > +                                       chassis_id);
> > > +    if (!chassis_private_rec && ovnsb_idl_txn) {
> > > +        chassis_private_rec =
sbrec_chassis_private_insert(ovnsb_idl_txn);
> > > +        sbrec_chassis_private_set_name(chassis_private_rec,
chassis_id);
> > > +    }
> > > +
> > > +    /* Update the alive_at column from the Chassis_Private table if
> > > +     * the chassis liveness mechanism is enabled */
> > > +    if (chassis_private_rec && ovnsb_idl_txn &&
chassis_liveness_interval) {
> > > +        chassis_update_liveness_cfg(chassis_private_rec,
&chassis_state,
> > > +                                    chassis_liveness_interval);
> > > +    }
> > > +
> > > +    *chassis_private = chassis_private_rec;
> > > +
> > >      ovs_chassis_cfg_destroy(&ovs_cfg);
> > >      return chassis_rec;
> > >  }
> > > @@ -669,7 +713,8 @@ chassis_get_mac(const struct sbrec_chassis
*chassis_rec,
> > >   * required. */
> > >  bool
> > >  chassis_cleanup(struct ovsdb_idl_txn *ovnsb_idl_txn,
> > > -                const struct sbrec_chassis *chassis_rec)
> > > +                const struct sbrec_chassis *chassis_rec,
> > > +                const struct sbrec_chassis_private
*chassis_private_rec)
> > >  {
> > >      if (!chassis_rec) {
> > >          return true;
> > > @@ -679,6 +724,7 @@ chassis_cleanup(struct ovsdb_idl_txn
*ovnsb_idl_txn,
> > >                                    "ovn-controller: unregistering
chassis '%s'",
> > >                                    chassis_rec->name);
> > >          sbrec_chassis_delete(chassis_rec);
> > > +        sbrec_chassis_private_delete(chassis_private_rec);
> > >      }
> > >      return false;
> > >  }
> > > @@ -695,3 +741,9 @@ chassis_get_id(void)
> > >
> > >      return NULL;
> > >  }
> > > +
> > > +void chassis_wait(int chassis_liveness_interval) {
> > > +    if (chassis_liveness_interval) {
> > > +        poll_timer_wait_until(chassis_state.next_liveness_update);
> > > +    }
> > > +}
> > > diff --git a/controller/chassis.h b/controller/chassis.h
> > > index 178d2957e..38a549553 100644
> > > --- a/controller/chassis.h
> > > +++ b/controller/chassis.h
> > > @@ -17,6 +17,7 @@
> > >  #define OVN_CHASSIS_H 1
> > >
> > >  #include <stdbool.h>
> > > +#include "lib/ovn-sb-idl.h"
> > >
> > >  struct ovsdb_idl;
> > >  struct ovsdb_idl_index;
> > > @@ -33,17 +34,21 @@ void chassis_register_ovs_idl(struct ovsdb_idl *);
> > >  const struct sbrec_chassis *chassis_run(
> > >      struct ovsdb_idl_txn *ovnsb_idl_txn,
> > >      struct ovsdb_idl_index *sbrec_chassis_by_name,
> > > +    struct ovsdb_idl_index *sbrec_chassis_private_by_name,
> > >      const struct ovsrec_open_vswitch_table *,
> > >      const struct sbrec_chassis_table *,
> > >      const char *chassis_id, const struct ovsrec_bridge *br_int,
> > > -    const struct sset *transport_zones);
> > > +    const struct sset *transport_zones,
> > > +    const struct sbrec_chassis_private **chassis_private,
> > > +    int chassis_liveness_interval);
> > >  bool chassis_cleanup(struct ovsdb_idl_txn *ovnsb_idl_txn,
> > > -                     const struct sbrec_chassis *);
> > > +                     const struct sbrec_chassis *,
> > > +                     const struct sbrec_chassis_private *);
> > >  bool chassis_get_mac(const struct sbrec_chassis *chassis,
> > >                       const char *bridge_mapping,
> > >                       struct eth_addr *chassis_mac);
> > >  const char *chassis_get_id(void);
> > >  const char * get_chassis_mac_mappings(const struct smap *ext_ids);
> > > -
> > > +void chassis_wait(int chassis_liveness_interval);
> > >
> > >  #endif /* controller/chassis.h */
> > > diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
> > > index 4d245ca28..fd9d95da2 100644
> > > --- a/controller/ovn-controller.c
> > > +++ b/controller/ovn-controller.c
> > > @@ -155,6 +155,7 @@ update_sb_monitors(struct ovsdb_idl *ovnsb_idl,
> > >      struct ovsdb_idl_condition ce =  OVSDB_IDL_CONDITION_INIT(&ce);
> > >      struct ovsdb_idl_condition ip_mcast =
OVSDB_IDL_CONDITION_INIT(&ip_mcast);
> > >      struct ovsdb_idl_condition igmp =
OVSDB_IDL_CONDITION_INIT(&igmp);
> > > +    struct ovsdb_idl_condition chprv =
OVSDB_IDL_CONDITION_INIT(&chprv);
> > >
> > >      if (monitor_all) {
> > >          ovsdb_idl_condition_add_clause_true(&pb);
> > > @@ -165,6 +166,7 @@ update_sb_monitors(struct ovsdb_idl *ovnsb_idl,
> > >          ovsdb_idl_condition_add_clause_true(&ce);
> > >          ovsdb_idl_condition_add_clause_true(&ip_mcast);
> > >          ovsdb_idl_condition_add_clause_true(&igmp);
> > > +        ovsdb_idl_condition_add_clause_true(&chprv);
> > >          goto out;
> > >      }
> > >
> > > @@ -196,6 +198,10 @@ update_sb_monitors(struct ovsdb_idl *ovnsb_idl,
> > >
 &chassis->header_.uuid);
> > >          sbrec_igmp_group_add_clause_chassis(&igmp, OVSDB_F_EQ,
> > >                                              &chassis->header_.uuid);
> > > +
> > > +        /* Monitors Chassis_Private record for current chassis only
*/
> > > +        sbrec_chassis_private_add_clause_name(&chprv, OVSDB_F_EQ,
> > > +                                              chassis->name);
> > >      }
> > >      if (local_ifaces) {
> > >          const char *name;
> > > @@ -229,6 +235,7 @@ out:
> > >      sbrec_controller_event_set_condition(ovnsb_idl, &ce);
> > >      sbrec_ip_multicast_set_condition(ovnsb_idl, &ip_mcast);
> > >      sbrec_igmp_group_set_condition(ovnsb_idl, &igmp);
> > > +    sbrec_chassis_private_set_condition(ovnsb_idl, &chprv);
> > >      ovsdb_idl_condition_destroy(&pb);
> > >      ovsdb_idl_condition_destroy(&lf);
> > >      ovsdb_idl_condition_destroy(&mb);
> > > @@ -237,6 +244,7 @@ out:
> > >      ovsdb_idl_condition_destroy(&ce);
> > >      ovsdb_idl_condition_destroy(&ip_mcast);
> > >      ovsdb_idl_condition_destroy(&igmp);
> > > +    ovsdb_idl_condition_destroy(&chprv);
> > >  }
> > >
> > >  static const char *
> > > @@ -687,6 +695,15 @@ get_transport_zones(const struct
ovsrec_open_vswitch_table *ovs_table)
> > >      return smap_get_def(&cfg->external_ids, "ovn-transport-zones",
"");
> > >  }
> > >
> > > +static uint16_t
> > > +get_chassis_liveness_interval(const struct sbrec_sb_global_table
> > > +                              *sb_global_table)
> > > +{
> > > +    const struct sbrec_sb_global *sb
> > > +        = sbrec_sb_global_table_first(sb_global_table);
> > > +    return atoi(smap_get_def(&sb->options,
"chassis_liveness_interval", "0"));
> >
> > We have smap_get_int in smap.h which does exactly this with a bit more
> > error checking.
> >
>
> A super! Thanks, I will change it.
>
> > > +}
> > > +
> > >  static void
> > >  ctrl_register_ovs_idl(struct ovsdb_idl *ovs_idl)
> > >  {
> > > @@ -1760,6 +1777,8 @@ main(int argc, char *argv[])
> > >
> > >      struct ovsdb_idl_index *sbrec_chassis_by_name
> > >          = chassis_index_create(ovnsb_idl_loop.idl);
> > > +    struct ovsdb_idl_index *sbrec_chassis_private_by_name
> > > +        = chassis_private_index_create(ovnsb_idl_loop.idl);
> > >      struct ovsdb_idl_index *sbrec_multicast_group_by_name_datapath
> > >          = mcast_group_index_create(ovnsb_idl_loop.idl);
> > >      struct ovsdb_idl_index *sbrec_port_binding_by_name
> > > @@ -1821,6 +1840,10 @@ main(int argc, char *argv[])
> > >      ovsdb_idl_omit(ovnsb_idl_loop.idl, &sbrec_connection_col_status);
> > >      ovsdb_idl_omit(ovnsb_idl_loop.idl, &sbrec_connection_col_target);
> > >
> > > +    /* Do not monitor the Chassis_Private external_ids column */
> > > +    ovsdb_idl_omit(ovnsb_idl_loop.idl,
> > > +                   &sbrec_chassis_private_col_external_ids);
> > > +
> > >      update_sb_monitors(ovnsb_idl_loop.idl, NULL, NULL, NULL, false);
> > >
> > >      stopwatch_create(CONTROLLER_LOOP_STOPWATCH_NAME, SW_MS);
> > > @@ -1998,10 +2021,16 @@ main(int argc, char *argv[])
> > >                  process_br_int(ovs_idl_txn, bridge_table, ovs_table);
> > >              const char *chassis_id = get_ovs_chassis_id(ovs_table);
> > >              const struct sbrec_chassis *chassis = NULL;
> > > +            const struct sbrec_chassis_private *chassis_private =
NULL;
> > > +            uint16_t chassis_liveness_interval =
get_chassis_liveness_interval(
> > > +                sbrec_sb_global_table_get(ovnsb_idl_loop.idl));
> > >              if (chassis_id) {
> > >                  chassis = chassis_run(ovnsb_idl_txn,
sbrec_chassis_by_name,
> > > +                                      sbrec_chassis_private_by_name,
> > >                                        ovs_table, chassis_table,
chassis_id,
> > > -                                      br_int, &transport_zones);
> > > +                                      br_int, &transport_zones,
> > > +                                      &chassis_private,
> > > +                                      chassis_liveness_interval);
> > >              }
> > >
> > >              if (br_int) {
> > > @@ -2164,6 +2193,10 @@ main(int argc, char *argv[])
> > >                  ofctrl_wait();
> > >                  pinctrl_wait(ovnsb_idl_txn);
> > >              }
> > > +
> > > +            if (chassis) {
> > > +                chassis_wait(chassis_liveness_interval);
> > > +            }
> > >          }
> > >
> > >          unixctl_server_run(unixctl);
> > > @@ -2232,10 +2265,17 @@ main(int argc, char *argv[])
> > >                     ? chassis_lookup_by_name(sbrec_chassis_by_name,
chassis_id)
> > >                     : NULL);
> > >
> > > +            const struct sbrec_chassis_private *chassis_private
> > > +                = (chassis_id
> > > +                   ? chassis_private_lookup_by_name(
> > > +                       sbrec_chassis_private_by_name, chassis_id)
> > > +                   : NULL);
> > > +
> > >              /* Run all of the cleanup functions, even if one of them
returns
> > >               * false. We're done if all of them return true. */
> > >              done = binding_cleanup(ovnsb_idl_txn,
port_binding_table, chassis);
> > > -            done = chassis_cleanup(ovnsb_idl_txn, chassis) && done;
> > > +            done = chassis_cleanup(ovnsb_idl_txn, chassis,
> > > +                                   chassis_private) && done;
> > >              done = encaps_cleanup(ovs_idl_txn, br_int) && done;
> > >              done = igmp_group_cleanup(ovnsb_idl_txn,
sbrec_igmp_group) && done;
> > >              if (done) {
> > > diff --git a/lib/chassis-index.c b/lib/chassis-index.c
> > > index 39066f4cc..d9bbba8e4 100644
> > > --- a/lib/chassis-index.c
> > > +++ b/lib/chassis-index.c
> > > @@ -40,6 +40,32 @@ chassis_lookup_by_name(struct ovsdb_idl_index
*sbrec_chassis_by_name,
> > >      return retval;
> > >  }
> > >
> > > +struct ovsdb_idl_index *
> > > +chassis_private_index_create(struct ovsdb_idl *idl)
> > > +{
> > > +    return ovsdb_idl_index_create1(
> > > +        idl, &sbrec_chassis_private_col_name);
> > > +}
> > > +
> > > +/* Finds and returns the chassis with the given 'name', or NULL if
no such
> > > + * chassis exists. */
> > > +const struct sbrec_chassis_private *
> > > +chassis_private_lookup_by_name(
> > > +    struct ovsdb_idl_index *sbrec_chassis_private_by_name,
> > > +    const char *name)
> > > +{
> > > +    struct sbrec_chassis_private *target = \
> > > +
 sbrec_chassis_private_index_init_row(sbrec_chassis_private_by_name);
> > > +    sbrec_chassis_private_index_set_name(target, name);
> > > +
> > > +    struct sbrec_chassis_private *retval =
sbrec_chassis_private_index_find(
> > > +        sbrec_chassis_private_by_name, target);
> > > +
> > > +    sbrec_chassis_private_index_destroy_row(target);
> > > +
> > > +    return retval;
> > > +}
> > > +
> > >  struct ovsdb_idl_index *
> > >  ha_chassis_group_index_create(struct ovsdb_idl *idl)
> > >  {
> > > diff --git a/lib/chassis-index.h b/lib/chassis-index.h
> > > index 302e5f0fd..77d2e1f99 100644
> > > --- a/lib/chassis-index.h
> > > +++ b/lib/chassis-index.h
> > > @@ -23,6 +23,11 @@ struct ovsdb_idl_index
*chassis_index_create(struct ovsdb_idl *);
> > >  const struct sbrec_chassis *chassis_lookup_by_name(
> > >      struct ovsdb_idl_index *sbrec_chassis_by_name, const char *name);
> > >
> > > +struct ovsdb_idl_index *chassis_private_index_create(struct
ovsdb_idl *);
> > > +
> > > +const struct sbrec_chassis_private *chassis_private_lookup_by_name(
> > > +    struct ovsdb_idl_index *sbrec_chassis_private_by_name, const
char *name);
> > > +
> > >  struct ovsdb_idl_index *ha_chassis_group_index_create(struct
ovsdb_idl *idl);
> > >  const struct sbrec_ha_chassis_group *ha_chassis_group_lookup_by_name(
> > >      struct ovsdb_idl_index *sbrec_ha_chassis_grp_by_name, const char
*name);
> > > diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
> > > index 2580b4ec9..1b16f5ef6 100644
> > > --- a/northd/ovn-northd.c
> > > +++ b/northd/ovn-northd.c
> > > @@ -11555,6 +11555,10 @@ main(int argc, char *argv[])
> > >      ovsdb_idl_add_column(ovnsb_idl_loop.idl,
&sbrec_chassis_col_name);
> > >      ovsdb_idl_add_column(ovnsb_idl_loop.idl,
&sbrec_chassis_col_external_ids);
> > >
> > > +    ovsdb_idl_add_table(ovnsb_idl_loop.idl,
&sbrec_table_chassis_private);
> > > +    add_column_noalert(ovnsb_idl_loop.idl,
> > > +                       &sbrec_chassis_private_col_external_ids);
> > > +
> > >      ovsdb_idl_add_table(ovnsb_idl_loop.idl, &sbrec_table_ha_chassis);
> > >      add_column_noalert(ovnsb_idl_loop.idl,
> > >                         &sbrec_ha_chassis_col_chassis);
> > > diff --git a/ovn-sb.ovsschema b/ovn-sb.ovsschema
> > > index d89f8dbbb..88da05849 100644
> > > --- a/ovn-sb.ovsschema
> > > +++ b/ovn-sb.ovsschema
> > > @@ -1,7 +1,7 @@
> > >  {
> > >      "name": "OVN_Southbound",
> > > -    "version": "2.7.0",
> > > -    "cksum": "4286723485 21693",
> > > +    "version": "2.8.0",
> > > +    "cksum": "293994447 22064",
> > >      "tables": {
> > >          "SB_Global": {
> > >              "columns": {
> > > @@ -43,6 +43,15 @@
> > >                                                "max": "unlimited"}}},
> > >              "isRoot": true,
> > >              "indexes": [["name"]]},
> > > +        "Chassis_Private": {
> > > +            "columns": {
> > > +                "name": {"type": "string"},
> > > +                "alive_at": {"type": "string"},
> > > +                "external_ids": {
> > > +                    "type": {"key": "string", "value": "string",
> > > +                             "min": 0, "max": "unlimited"}}},
> > > +            "isRoot": true,
> > > +            "indexes": [["name"]]},
> > >          "Encap": {
> > >              "columns": {
> > >                  "type": {"type": {"key": {
> > > diff --git a/ovn-sb.xml b/ovn-sb.xml
> > > index 3ae9d4f92..8c9822fdb 100644
> > > --- a/ovn-sb.xml
> > > +++ b/ovn-sb.xml
> > > @@ -366,6 +366,41 @@
> > >      </group>
> > >    </table>
> > >
> > > +  <table name="Chassis_Private" title="Chassis Private">
> > > +    <p>
> > > +      Each row in this table maintains per chassis private data that
are
> > > +      accessed only by the owning chassis (write only) and
ovn-northd, not by
> > > +      any other chassis.  These data are stored in this separate
table instead
> > > +      of the <ref table="Chassis"/> table for performance
considerations:
> > > +      the rows in this table can be conditionally monitored by
chassises so
> > > +      that each chassis only get update notifications for its own
row, to avoid
> > > +      unnecessary chassis private data update flooding in a large
scale
> > > +      deployment.  (Future: this separation can be avoided if ovsdb
conditional
> > > +      monitoring is supported on a set of columns)
> > > +    </p>
> > > +
> > > +    <column name="name">
> > > +      The name of the chassis that owns these chassis-private data.
> > > +    </column>
> > > +
> > > +    <column name="alive_at">
> > > +      A timestamp indicating the last time ovn-controller
> > > +      signalized it's alive. For monitoring purposes, setting the
> > > +      chassis_liveness_interval configuration to the
OVN_Northhbound's
> > > +      <ref table="NB_Global" column="options"/> with an integer
value (in
> > > +      seconds) causes the <code>ovn-controller</code> to update this
> > > +      column with a timestamp every N seconds. If the value is not
set
> > > +      or 0, then <code>ovn-controller</code> doesn't update this
column.
> > > +    </column>
> > > +
> > > +    <group title="Common Columns">
> > > +      The overall purpose of these columns is described under
<code>Common
> > > +      Columns</code> at the beginning of this document.
> > > +
> > > +      <column name="external_ids"/>
> > > +    </group>
> > > +  </table>
> > > +
> > >    <table name="Encap" title="Encapsulation Types">
> > >      <p>
> > >        The <ref column="encaps" table="Chassis"/> column in the <ref
> > >
> >
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Lucas Alvares Gomes March 11, 2020, 1:41 p.m. UTC | #4
Hi Han,

Thank you very much for the feedback, much appreciated!

On Wed, Mar 11, 2020 at 7:06 AM Han Zhou <hzhou@ovn.org> wrote:
>
> Thanks Lucas for working on this! Please see my comments below.
>
> On Thu, Feb 20, 2020 at 1:56 AM Lucas Alvares Gomes <lucasagomes@gmail.com> wrote:
> >
> > Thanks for the review Dumitry!
> >
> > On Thu, Feb 20, 2020 at 9:19 AM Dumitru Ceara <dceara@redhat.com> wrote:
> > >
> > > On 2/19/20 4:37 PM, lmartins@redhat.com wrote:
> > > > From: Lucas Alvares Gomes <lucasagomes@gmail.com>
> > > >
> > > > NOTE: SENDING THIS PATCH AS A RFC TO SEE WHAT OTHERS MIGHT THINK ABOUT
> > > > THE IDEA PRIOR TO WRITTING TESTS TO IT.
> > > >
> > > > CMSes integrating with OVN often uses the nb_cfg mechanism as a way to
> > > > check the health status of the ovn-controller process but, the current
> > > > implementation isn't ideal for that purpose because it floods the control
> > > > plane with update notifications every time the nb_cfg value is
> > > > incremented.
> > > >
> > > > This patch is merging two ideas:
> > > >
> > > > 1) Han Zhou proposed a patch [0] creating a table called Chassis_Private
> > > >    where each hypervisor *only* writes and monitors its own record to
> > > >    avoid this flooding problem.
> > > >
> > > > 2) Having ovn-controller to periodically log that it's alive instead of
> > > >    relying on the nb_cfg mechanism.
> > > >
> > > > By using this mechanism, a CMS can more easily just read this new
> > > > Chassis_Private table and figure out the status of each Chassis
> > > > (ovn-controller) in the cluster.
> > > >
> > >
> > > Hi Lucas,
> > >
> > > Thanks a lot for working on this!
> > >
> > > > Here's some reasons why I believe this approach is better than having
> > > > to bump the  nb_cfg value:
> > > >
> > > > 1) Simple integration. Before, the CMS had to increment the nb_cfg
> > > >    value in the NB_Global table and wait for it to be propagated to the
> > > >    Southbound database (ovn-northd and then ovn-controllers) Chassis
> > > >    table in order to know the status of the Chassis.
> > > >
> > > >    Now, it can just read the Chassis_Private table and compare the
> > > >    alive_at column against the current time.
>
> I think the nb_cfg mechanism doesn't have to wait. There can be one monitor updating the counter periodically, and any other worker can compare the desired value and each chassis's value and see which ones are stale, e.g. if the gap is bigger than 3, meaning the chassis missed 3 round of heartbeat.
> But it is true that it is more costly because there are 2 directions of communications. It may or may not be a concern, depending on the scale and the frequency of heartbeat.
>

That's true, it can work as you suggested.

In the OVN neutron driver we relaxed the checks to account for some
miss synchronization [0] and added some code to guard against the
frequency of the checks as well [1].

[0] https://review.opendev.org/#/c/703612/
[1] https://review.opendev.org/#/c/704530/

> > >
> > > I guess the only comment I have about this approach is that it doesn't
> > > feel completely OK to store a timestamp representation as string in
> > > "alive_at". I'm afraid this might be too inflexible and force CMSes to
> > > compare their local time with the value in this column.
> > >
> > > I know we discussed offline that using a counter that is periodically
> > > incremented by ovn-controller instead of a timestamp would complicate
> > > the implementation in Openstack but maybe other people on this mailing
> > > list have ideas on how to deal with this in a more generic way.
> > >
> >
> > Agreed, let's see if people have more ideas here.
> >
> > Also, let me expand my explanation a little. The reason why I think
> > having a counter is not ideal is because services would need to track
> > this number to know what the value was before and then compare with
> > the current value to figure out whether it's being incremented or not.
> >
> > In OpenStack, we have multiple API workers spread across the
> > controllers nodes and the API request to check for the services'
> > health status will land on any of those workers. Which means that I
> > can't keep the track of the counter in-memory. Mostly likely, I would
> > need to set an IDL event on the field being incremented and store the
> > status of those chassis somewhere else accessible by all API workers.
> >
> > The advantage with the nb_cfg mechanism compared with the incremenal
> > counter is that those values are already in the OVSDB, so we can
> > compare what's in the NB DB with the SB DB to figure whether the
> > services are alive or not. But, the price is that we need more code to
> > deal with waiting for the synchronization of the OVSDBs and, if we
> > move it to the Chasiss_Private table it won't be backwards compatible.
> >
> > Therefore I think a timestamp would be better. It's easy to understand
> > by either a service or a even a person. When u look at the alive_at
> > field u know the last time the service signilized it was alive. For
> > CMSes, the service can just read the Chassis_Private table and compare
> > the alive_at field with the current time to figure the status of the
> > service (right now it's UTC, but, we can change it to include the TZ
> > info if people think it's easier). No writes, sync issues and also
> > it's backwards compatible with the nb_cfg approach.
> >
> > > >
> > > > 2) Less costy. Just one read from the db is needed, no writing. Code
> > > >    using the nb_cfg mechanism had to implement a few safe-guard code to
> > > >    make less error prone. See [1] and [2] for example.
> > > >
> > > > 3) Backwards compatibility. The patch [0] was moving the nb_cfg value
> > > >    to new table so, systems relying on it would need to update their
> > > >    code upon updating OVN.
> > >
> I have a different opinion on the backward compatibility. It is true that moving the nb_cfg to a new table may cause compatibilty issue. However, I think it shouldn't be a problem in reality, mainly because that the flooding problem of nb_cfg today prevents it being used in live producation environment.
> In fact, the nb_cfg mechanism has it's unique capability of performing a "control plane" ping, and it can be utilized to measure the end-to-end latency of the whole control plane, i.e. when there is a change in NB, how long it takes for all chassises to enforce that change. It is very useful in scalability evaluation. Because of this, I strongly suggest to move the nb_cfg mechanism to chassis_private table as well. I think it is not harmful to have both mechanism co-exist (but both should be in chassis_private table to avoid causing flooding problem), and each would have its own advantages in different use cases. We could even combine both approaches:
>
> - Whenever nb_cfg is updated, each chassis should update its own record in SB with the same value. When updating this value, it updates the timestamp column as well, which could tell the latency of the specific chassis for handling the update.
>
> - If nb_cfg is not update, but chassis_liveness_interval is set, each chassis should update the timestamp only (the nb_cfg value in its own record should keep the same as the current NB nb_cfg value. If for some reason they are different, then it should be updated taking this chance).
>
> What do you think?
>

I think you've made great points, I haven't thought about the latency
before and it seems pretty useful for that indeed.

Regarding the suggestions, I like it but I don't want to create an
over-engineered mechanism. One of the main reasons why I've thought
about using timestamps instead of the nb_cfg was the backward
compatibly issue, but as you've said the nb_cfg mechanism at the
moment is pretty much unusable in production so maybe we shouldn't
care too much about the backward compatibility and fix that mechanism
instead.

Moving the nb_cfg mechanism to the Chassis_Private table should be
enough to fix our problems and the timestamp idea can be discarded
(less knobs to configure OVN).

If you don't mind, I will rebase your Chassis_Private patch against
master and re-submit it.

Again, thank you for the feedback,
Lucas


> Thanks,
> Han
>
> > > Nice!
> > >
> > > One more minor comment on the code, inline.
> > >
> > > Thanks,
> > > Dumitru
> > >
> > > >
> > > > To enable this new mechanism set an option called
> > > > chassis_liveness_interval=N to the "options" column in the NB_Global
> > > > table, where N is the time (in seconds) in which the ovn-controller
> > > > should updates the alive_at column. If not set or set to 0, the
> > > > mechanism is disabled. By default, it's disabled.
> > > >
> > > > [0] https://patchwork.ozlabs.org/patch/899608/
> > > > [1] https://review.opendev.org/#/c/703612/
> > > > [2] https://review.opendev.org/#/c/704530/
> > > >
> > > > Co-authored-by: Han Zhou <hzhou8@ebay.com>
> > > > Co-authored-by: Numan Siddique <numans@ovn.org>
> > > > Co-authored-by: Dumitru Ceara <dceara@redhat.com>
> > > > Signed-off-by: Lucas Alvares Gomes <lucasagomes@gmail.com>
> > > > ---
> > > >  controller/chassis.c        | 56 +++++++++++++++++++++++++++++++++++--
> > > >  controller/chassis.h        | 11 ++++++--
> > > >  controller/ovn-controller.c | 44 +++++++++++++++++++++++++++--
> > > >  lib/chassis-index.c         | 26 +++++++++++++++++
> > > >  lib/chassis-index.h         |  5 ++++
> > > >  northd/ovn-northd.c         |  4 +++
> > > >  ovn-sb.ovsschema            | 13 +++++++--
> > > >  ovn-sb.xml                  | 35 +++++++++++++++++++++++
> > > >  8 files changed, 185 insertions(+), 9 deletions(-)
> > > >
> > > > diff --git a/controller/chassis.c b/controller/chassis.c
> > > > index 522893ead..7892cb966 100644
> > > > --- a/controller/chassis.c
> > > > +++ b/controller/chassis.c
> > > > @@ -24,6 +24,7 @@
> > > >  #include "openvswitch/dynamic-string.h"
> > > >  #include "openvswitch/vlog.h"
> > > >  #include "openvswitch/ofp-parse.h"
> > > > +#include "openvswitch/poll-loop.h"
> > > >  #include "lib/chassis-index.h"
> > > >  #include "lib/ovn-sb-idl.h"
> > > >  #include "ovn-controller.h"
> > > > @@ -47,11 +48,16 @@ struct chassis_info {
> > > >
> > > >      /* True if Chassis SB record is initialized, false otherwise. */
> > > >      uint32_t id_inited : 1;
> > > > +
> > > > +    /* Next livennes update time to update the 'alive_at' column of
> > > > +     * Chassis_Private table. */
> > > > +    long long int next_liveness_update;
> > > >  };
> > > >
> > > >  static struct chassis_info chassis_state = {
> > > >      .id = DS_EMPTY_INITIALIZER,
> > > >      .id_inited = false,
> > > > +    .next_liveness_update = LLONG_MAX,
> > > >  };
> > > >
> > > >  static void
> > > > @@ -581,18 +587,38 @@ chassis_update(const struct sbrec_chassis *chassis_rec,
> > > >      free(encaps);
> > > >  }
> > > >
> > > > +static void
> > > > +chassis_update_liveness_cfg(const struct sbrec_chassis_private *chassis_rec,
> > > > +                            struct chassis_info *ch_info,
> > > > +                            int chassis_liveness_interval)
> > > > +{
> > > > +    if (ch_info->next_liveness_update == LLONG_MAX ||
> > > > +        time_msec() > ch_info->next_liveness_update) {
> > > > +        sbrec_chassis_private_set_alive_at(
> > > > +            chassis_rec,
> > > > +            xastrftime_msec("%Y-%m-%d %H:%M:%S", time_wall_msec(), true));
> > > > +        ch_info->next_liveness_update =
> > > > +            time_msec() + chassis_liveness_interval * 1000;
> > > > +    }
> > > > +}
> > > > +
> > > >  /* Returns this chassis's Chassis record, if it is available. */
> > > >  const struct sbrec_chassis *
> > > >  chassis_run(struct ovsdb_idl_txn *ovnsb_idl_txn,
> > > >              struct ovsdb_idl_index *sbrec_chassis_by_name,
> > > > +            struct ovsdb_idl_index *sbrec_chassis_private_by_name,
> > > >              const struct ovsrec_open_vswitch_table *ovs_table,
> > > >              const struct sbrec_chassis_table *chassis_table,
> > > >              const char *chassis_id,
> > > >              const struct ovsrec_bridge *br_int,
> > > > -            const struct sset *transport_zones)
> > > > +            const struct sset *transport_zones,
> > > > +            const struct sbrec_chassis_private **chassis_private,
> > > > +            int chassis_liveness_interval)
> > > >  {
> > > >      struct ovs_chassis_cfg ovs_cfg;
> > > >
> > > > +    *chassis_private = NULL;
> > > > +
> > > >      /* Get the chassis config from the ovs table. */
> > > >      ovs_chassis_cfg_init(&ovs_cfg);
> > > >      if (!chassis_parse_ovs_config(ovs_table, br_int, &ovs_cfg)) {
> > > > @@ -616,6 +642,24 @@ chassis_run(struct ovsdb_idl_txn *ovnsb_idl_txn,
> > > >                                    chassis_id);
> > > >      }
> > > >
> > > > +    /* Create the Chassis_Private entry if it's not there yet */
> > > > +    const struct sbrec_chassis_private *chassis_private_rec =
> > > > +        chassis_private_lookup_by_name(sbrec_chassis_private_by_name,
> > > > +                                       chassis_id);
> > > > +    if (!chassis_private_rec && ovnsb_idl_txn) {
> > > > +        chassis_private_rec = sbrec_chassis_private_insert(ovnsb_idl_txn);
> > > > +        sbrec_chassis_private_set_name(chassis_private_rec, chassis_id);
> > > > +    }
> > > > +
> > > > +    /* Update the alive_at column from the Chassis_Private table if
> > > > +     * the chassis liveness mechanism is enabled */
> > > > +    if (chassis_private_rec && ovnsb_idl_txn && chassis_liveness_interval) {
> > > > +        chassis_update_liveness_cfg(chassis_private_rec, &chassis_state,
> > > > +                                    chassis_liveness_interval);
> > > > +    }
> > > > +
> > > > +    *chassis_private = chassis_private_rec;
> > > > +
> > > >      ovs_chassis_cfg_destroy(&ovs_cfg);
> > > >      return chassis_rec;
> > > >  }
> > > > @@ -669,7 +713,8 @@ chassis_get_mac(const struct sbrec_chassis *chassis_rec,
> > > >   * required. */
> > > >  bool
> > > >  chassis_cleanup(struct ovsdb_idl_txn *ovnsb_idl_txn,
> > > > -                const struct sbrec_chassis *chassis_rec)
> > > > +                const struct sbrec_chassis *chassis_rec,
> > > > +                const struct sbrec_chassis_private *chassis_private_rec)
> > > >  {
> > > >      if (!chassis_rec) {
> > > >          return true;
> > > > @@ -679,6 +724,7 @@ chassis_cleanup(struct ovsdb_idl_txn *ovnsb_idl_txn,
> > > >                                    "ovn-controller: unregistering chassis '%s'",
> > > >                                    chassis_rec->name);
> > > >          sbrec_chassis_delete(chassis_rec);
> > > > +        sbrec_chassis_private_delete(chassis_private_rec);
> > > >      }
> > > >      return false;
> > > >  }
> > > > @@ -695,3 +741,9 @@ chassis_get_id(void)
> > > >
> > > >      return NULL;
> > > >  }
> > > > +
> > > > +void chassis_wait(int chassis_liveness_interval) {
> > > > +    if (chassis_liveness_interval) {
> > > > +        poll_timer_wait_until(chassis_state.next_liveness_update);
> > > > +    }
> > > > +}
> > > > diff --git a/controller/chassis.h b/controller/chassis.h
> > > > index 178d2957e..38a549553 100644
> > > > --- a/controller/chassis.h
> > > > +++ b/controller/chassis.h
> > > > @@ -17,6 +17,7 @@
> > > >  #define OVN_CHASSIS_H 1
> > > >
> > > >  #include <stdbool.h>
> > > > +#include "lib/ovn-sb-idl.h"
> > > >
> > > >  struct ovsdb_idl;
> > > >  struct ovsdb_idl_index;
> > > > @@ -33,17 +34,21 @@ void chassis_register_ovs_idl(struct ovsdb_idl *);
> > > >  const struct sbrec_chassis *chassis_run(
> > > >      struct ovsdb_idl_txn *ovnsb_idl_txn,
> > > >      struct ovsdb_idl_index *sbrec_chassis_by_name,
> > > > +    struct ovsdb_idl_index *sbrec_chassis_private_by_name,
> > > >      const struct ovsrec_open_vswitch_table *,
> > > >      const struct sbrec_chassis_table *,
> > > >      const char *chassis_id, const struct ovsrec_bridge *br_int,
> > > > -    const struct sset *transport_zones);
> > > > +    const struct sset *transport_zones,
> > > > +    const struct sbrec_chassis_private **chassis_private,
> > > > +    int chassis_liveness_interval);
> > > >  bool chassis_cleanup(struct ovsdb_idl_txn *ovnsb_idl_txn,
> > > > -                     const struct sbrec_chassis *);
> > > > +                     const struct sbrec_chassis *,
> > > > +                     const struct sbrec_chassis_private *);
> > > >  bool chassis_get_mac(const struct sbrec_chassis *chassis,
> > > >                       const char *bridge_mapping,
> > > >                       struct eth_addr *chassis_mac);
> > > >  const char *chassis_get_id(void);
> > > >  const char * get_chassis_mac_mappings(const struct smap *ext_ids);
> > > > -
> > > > +void chassis_wait(int chassis_liveness_interval);
> > > >
> > > >  #endif /* controller/chassis.h */
> > > > diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
> > > > index 4d245ca28..fd9d95da2 100644
> > > > --- a/controller/ovn-controller.c
> > > > +++ b/controller/ovn-controller.c
> > > > @@ -155,6 +155,7 @@ update_sb_monitors(struct ovsdb_idl *ovnsb_idl,
> > > >      struct ovsdb_idl_condition ce =  OVSDB_IDL_CONDITION_INIT(&ce);
> > > >      struct ovsdb_idl_condition ip_mcast = OVSDB_IDL_CONDITION_INIT(&ip_mcast);
> > > >      struct ovsdb_idl_condition igmp = OVSDB_IDL_CONDITION_INIT(&igmp);
> > > > +    struct ovsdb_idl_condition chprv = OVSDB_IDL_CONDITION_INIT(&chprv);
> > > >
> > > >      if (monitor_all) {
> > > >          ovsdb_idl_condition_add_clause_true(&pb);
> > > > @@ -165,6 +166,7 @@ update_sb_monitors(struct ovsdb_idl *ovnsb_idl,
> > > >          ovsdb_idl_condition_add_clause_true(&ce);
> > > >          ovsdb_idl_condition_add_clause_true(&ip_mcast);
> > > >          ovsdb_idl_condition_add_clause_true(&igmp);
> > > > +        ovsdb_idl_condition_add_clause_true(&chprv);
> > > >          goto out;
> > > >      }
> > > >
> > > > @@ -196,6 +198,10 @@ update_sb_monitors(struct ovsdb_idl *ovnsb_idl,
> > > >                                                    &chassis->header_.uuid);
> > > >          sbrec_igmp_group_add_clause_chassis(&igmp, OVSDB_F_EQ,
> > > >                                              &chassis->header_.uuid);
> > > > +
> > > > +        /* Monitors Chassis_Private record for current chassis only */
> > > > +        sbrec_chassis_private_add_clause_name(&chprv, OVSDB_F_EQ,
> > > > +                                              chassis->name);
> > > >      }
> > > >      if (local_ifaces) {
> > > >          const char *name;
> > > > @@ -229,6 +235,7 @@ out:
> > > >      sbrec_controller_event_set_condition(ovnsb_idl, &ce);
> > > >      sbrec_ip_multicast_set_condition(ovnsb_idl, &ip_mcast);
> > > >      sbrec_igmp_group_set_condition(ovnsb_idl, &igmp);
> > > > +    sbrec_chassis_private_set_condition(ovnsb_idl, &chprv);
> > > >      ovsdb_idl_condition_destroy(&pb);
> > > >      ovsdb_idl_condition_destroy(&lf);
> > > >      ovsdb_idl_condition_destroy(&mb);
> > > > @@ -237,6 +244,7 @@ out:
> > > >      ovsdb_idl_condition_destroy(&ce);
> > > >      ovsdb_idl_condition_destroy(&ip_mcast);
> > > >      ovsdb_idl_condition_destroy(&igmp);
> > > > +    ovsdb_idl_condition_destroy(&chprv);
> > > >  }
> > > >
> > > >  static const char *
> > > > @@ -687,6 +695,15 @@ get_transport_zones(const struct ovsrec_open_vswitch_table *ovs_table)
> > > >      return smap_get_def(&cfg->external_ids, "ovn-transport-zones", "");
> > > >  }
> > > >
> > > > +static uint16_t
> > > > +get_chassis_liveness_interval(const struct sbrec_sb_global_table
> > > > +                              *sb_global_table)
> > > > +{
> > > > +    const struct sbrec_sb_global *sb
> > > > +        = sbrec_sb_global_table_first(sb_global_table);
> > > > +    return atoi(smap_get_def(&sb->options, "chassis_liveness_interval", "0"));
> > >
> > > We have smap_get_int in smap.h which does exactly this with a bit more
> > > error checking.
> > >
> >
> > A super! Thanks, I will change it.
> >
> > > > +}
> > > > +
> > > >  static void
> > > >  ctrl_register_ovs_idl(struct ovsdb_idl *ovs_idl)
> > > >  {
> > > > @@ -1760,6 +1777,8 @@ main(int argc, char *argv[])
> > > >
> > > >      struct ovsdb_idl_index *sbrec_chassis_by_name
> > > >          = chassis_index_create(ovnsb_idl_loop.idl);
> > > > +    struct ovsdb_idl_index *sbrec_chassis_private_by_name
> > > > +        = chassis_private_index_create(ovnsb_idl_loop.idl);
> > > >      struct ovsdb_idl_index *sbrec_multicast_group_by_name_datapath
> > > >          = mcast_group_index_create(ovnsb_idl_loop.idl);
> > > >      struct ovsdb_idl_index *sbrec_port_binding_by_name
> > > > @@ -1821,6 +1840,10 @@ main(int argc, char *argv[])
> > > >      ovsdb_idl_omit(ovnsb_idl_loop.idl, &sbrec_connection_col_status);
> > > >      ovsdb_idl_omit(ovnsb_idl_loop.idl, &sbrec_connection_col_target);
> > > >
> > > > +    /* Do not monitor the Chassis_Private external_ids column */
> > > > +    ovsdb_idl_omit(ovnsb_idl_loop.idl,
> > > > +                   &sbrec_chassis_private_col_external_ids);
> > > > +
> > > >      update_sb_monitors(ovnsb_idl_loop.idl, NULL, NULL, NULL, false);
> > > >
> > > >      stopwatch_create(CONTROLLER_LOOP_STOPWATCH_NAME, SW_MS);
> > > > @@ -1998,10 +2021,16 @@ main(int argc, char *argv[])
> > > >                  process_br_int(ovs_idl_txn, bridge_table, ovs_table);
> > > >              const char *chassis_id = get_ovs_chassis_id(ovs_table);
> > > >              const struct sbrec_chassis *chassis = NULL;
> > > > +            const struct sbrec_chassis_private *chassis_private = NULL;
> > > > +            uint16_t chassis_liveness_interval = get_chassis_liveness_interval(
> > > > +                sbrec_sb_global_table_get(ovnsb_idl_loop.idl));
> > > >              if (chassis_id) {
> > > >                  chassis = chassis_run(ovnsb_idl_txn, sbrec_chassis_by_name,
> > > > +                                      sbrec_chassis_private_by_name,
> > > >                                        ovs_table, chassis_table, chassis_id,
> > > > -                                      br_int, &transport_zones);
> > > > +                                      br_int, &transport_zones,
> > > > +                                      &chassis_private,
> > > > +                                      chassis_liveness_interval);
> > > >              }
> > > >
> > > >              if (br_int) {
> > > > @@ -2164,6 +2193,10 @@ main(int argc, char *argv[])
> > > >                  ofctrl_wait();
> > > >                  pinctrl_wait(ovnsb_idl_txn);
> > > >              }
> > > > +
> > > > +            if (chassis) {
> > > > +                chassis_wait(chassis_liveness_interval);
> > > > +            }
> > > >          }
> > > >
> > > >          unixctl_server_run(unixctl);
> > > > @@ -2232,10 +2265,17 @@ main(int argc, char *argv[])
> > > >                     ? chassis_lookup_by_name(sbrec_chassis_by_name, chassis_id)
> > > >                     : NULL);
> > > >
> > > > +            const struct sbrec_chassis_private *chassis_private
> > > > +                = (chassis_id
> > > > +                   ? chassis_private_lookup_by_name(
> > > > +                       sbrec_chassis_private_by_name, chassis_id)
> > > > +                   : NULL);
> > > > +
> > > >              /* Run all of the cleanup functions, even if one of them returns
> > > >               * false. We're done if all of them return true. */
> > > >              done = binding_cleanup(ovnsb_idl_txn, port_binding_table, chassis);
> > > > -            done = chassis_cleanup(ovnsb_idl_txn, chassis) && done;
> > > > +            done = chassis_cleanup(ovnsb_idl_txn, chassis,
> > > > +                                   chassis_private) && done;
> > > >              done = encaps_cleanup(ovs_idl_txn, br_int) && done;
> > > >              done = igmp_group_cleanup(ovnsb_idl_txn, sbrec_igmp_group) && done;
> > > >              if (done) {
> > > > diff --git a/lib/chassis-index.c b/lib/chassis-index.c
> > > > index 39066f4cc..d9bbba8e4 100644
> > > > --- a/lib/chassis-index.c
> > > > +++ b/lib/chassis-index.c
> > > > @@ -40,6 +40,32 @@ chassis_lookup_by_name(struct ovsdb_idl_index *sbrec_chassis_by_name,
> > > >      return retval;
> > > >  }
> > > >
> > > > +struct ovsdb_idl_index *
> > > > +chassis_private_index_create(struct ovsdb_idl *idl)
> > > > +{
> > > > +    return ovsdb_idl_index_create1(
> > > > +        idl, &sbrec_chassis_private_col_name);
> > > > +}
> > > > +
> > > > +/* Finds and returns the chassis with the given 'name', or NULL if no such
> > > > + * chassis exists. */
> > > > +const struct sbrec_chassis_private *
> > > > +chassis_private_lookup_by_name(
> > > > +    struct ovsdb_idl_index *sbrec_chassis_private_by_name,
> > > > +    const char *name)
> > > > +{
> > > > +    struct sbrec_chassis_private *target = \
> > > > +        sbrec_chassis_private_index_init_row(sbrec_chassis_private_by_name);
> > > > +    sbrec_chassis_private_index_set_name(target, name);
> > > > +
> > > > +    struct sbrec_chassis_private *retval = sbrec_chassis_private_index_find(
> > > > +        sbrec_chassis_private_by_name, target);
> > > > +
> > > > +    sbrec_chassis_private_index_destroy_row(target);
> > > > +
> > > > +    return retval;
> > > > +}
> > > > +
> > > >  struct ovsdb_idl_index *
> > > >  ha_chassis_group_index_create(struct ovsdb_idl *idl)
> > > >  {
> > > > diff --git a/lib/chassis-index.h b/lib/chassis-index.h
> > > > index 302e5f0fd..77d2e1f99 100644
> > > > --- a/lib/chassis-index.h
> > > > +++ b/lib/chassis-index.h
> > > > @@ -23,6 +23,11 @@ struct ovsdb_idl_index *chassis_index_create(struct ovsdb_idl *);
> > > >  const struct sbrec_chassis *chassis_lookup_by_name(
> > > >      struct ovsdb_idl_index *sbrec_chassis_by_name, const char *name);
> > > >
> > > > +struct ovsdb_idl_index *chassis_private_index_create(struct ovsdb_idl *);
> > > > +
> > > > +const struct sbrec_chassis_private *chassis_private_lookup_by_name(
> > > > +    struct ovsdb_idl_index *sbrec_chassis_private_by_name, const char *name);
> > > > +
> > > >  struct ovsdb_idl_index *ha_chassis_group_index_create(struct ovsdb_idl *idl);
> > > >  const struct sbrec_ha_chassis_group *ha_chassis_group_lookup_by_name(
> > > >      struct ovsdb_idl_index *sbrec_ha_chassis_grp_by_name, const char *name);
> > > > diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
> > > > index 2580b4ec9..1b16f5ef6 100644
> > > > --- a/northd/ovn-northd.c
> > > > +++ b/northd/ovn-northd.c
> > > > @@ -11555,6 +11555,10 @@ main(int argc, char *argv[])
> > > >      ovsdb_idl_add_column(ovnsb_idl_loop.idl, &sbrec_chassis_col_name);
> > > >      ovsdb_idl_add_column(ovnsb_idl_loop.idl, &sbrec_chassis_col_external_ids);
> > > >
> > > > +    ovsdb_idl_add_table(ovnsb_idl_loop.idl, &sbrec_table_chassis_private);
> > > > +    add_column_noalert(ovnsb_idl_loop.idl,
> > > > +                       &sbrec_chassis_private_col_external_ids);
> > > > +
> > > >      ovsdb_idl_add_table(ovnsb_idl_loop.idl, &sbrec_table_ha_chassis);
> > > >      add_column_noalert(ovnsb_idl_loop.idl,
> > > >                         &sbrec_ha_chassis_col_chassis);
> > > > diff --git a/ovn-sb.ovsschema b/ovn-sb.ovsschema
> > > > index d89f8dbbb..88da05849 100644
> > > > --- a/ovn-sb.ovsschema
> > > > +++ b/ovn-sb.ovsschema
> > > > @@ -1,7 +1,7 @@
> > > >  {
> > > >      "name": "OVN_Southbound",
> > > > -    "version": "2.7.0",
> > > > -    "cksum": "4286723485 21693",
> > > > +    "version": "2.8.0",
> > > > +    "cksum": "293994447 22064",
> > > >      "tables": {
> > > >          "SB_Global": {
> > > >              "columns": {
> > > > @@ -43,6 +43,15 @@
> > > >                                                "max": "unlimited"}}},
> > > >              "isRoot": true,
> > > >              "indexes": [["name"]]},
> > > > +        "Chassis_Private": {
> > > > +            "columns": {
> > > > +                "name": {"type": "string"},
> > > > +                "alive_at": {"type": "string"},
> > > > +                "external_ids": {
> > > > +                    "type": {"key": "string", "value": "string",
> > > > +                             "min": 0, "max": "unlimited"}}},
> > > > +            "isRoot": true,
> > > > +            "indexes": [["name"]]},
> > > >          "Encap": {
> > > >              "columns": {
> > > >                  "type": {"type": {"key": {
> > > > diff --git a/ovn-sb.xml b/ovn-sb.xml
> > > > index 3ae9d4f92..8c9822fdb 100644
> > > > --- a/ovn-sb.xml
> > > > +++ b/ovn-sb.xml
> > > > @@ -366,6 +366,41 @@
> > > >      </group>
> > > >    </table>
> > > >
> > > > +  <table name="Chassis_Private" title="Chassis Private">
> > > > +    <p>
> > > > +      Each row in this table maintains per chassis private data that are
> > > > +      accessed only by the owning chassis (write only) and ovn-northd, not by
> > > > +      any other chassis.  These data are stored in this separate table instead
> > > > +      of the <ref table="Chassis"/> table for performance considerations:
> > > > +      the rows in this table can be conditionally monitored by chassises so
> > > > +      that each chassis only get update notifications for its own row, to avoid
> > > > +      unnecessary chassis private data update flooding in a large scale
> > > > +      deployment.  (Future: this separation can be avoided if ovsdb conditional
> > > > +      monitoring is supported on a set of columns)
> > > > +    </p>
> > > > +
> > > > +    <column name="name">
> > > > +      The name of the chassis that owns these chassis-private data.
> > > > +    </column>
> > > > +
> > > > +    <column name="alive_at">
> > > > +      A timestamp indicating the last time ovn-controller
> > > > +      signalized it's alive. For monitoring purposes, setting the
> > > > +      chassis_liveness_interval configuration to the OVN_Northhbound's
> > > > +      <ref table="NB_Global" column="options"/> with an integer value (in
> > > > +      seconds) causes the <code>ovn-controller</code> to update this
> > > > +      column with a timestamp every N seconds. If the value is not set
> > > > +      or 0, then <code>ovn-controller</code> doesn't update this column.
> > > > +    </column>
> > > > +
> > > > +    <group title="Common Columns">
> > > > +      The overall purpose of these columns is described under <code>Common
> > > > +      Columns</code> at the beginning of this document.
> > > > +
> > > > +      <column name="external_ids"/>
> > > > +    </group>
> > > > +  </table>
> > > > +
> > > >    <table name="Encap" title="Encapsulation Types">
> > > >      <p>
> > > >        The <ref column="encaps" table="Chassis"/> column in the <ref
> > > >
> > >
> > _______________________________________________
> > dev mailing list
> > dev@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Han Zhou March 11, 2020, 5:31 p.m. UTC | #5
On Wed, Mar 11, 2020 at 6:41 AM Lucas Alvares Gomes <lucasagomes@gmail.com>
wrote:
>
> Hi Han,
>
> Thank you very much for the feedback, much appreciated!
>
> On Wed, Mar 11, 2020 at 7:06 AM Han Zhou <hzhou@ovn.org> wrote:
> >
> > Thanks Lucas for working on this! Please see my comments below.
> >
> > On Thu, Feb 20, 2020 at 1:56 AM Lucas Alvares Gomes <
lucasagomes@gmail.com> wrote:
> > >
> > > Thanks for the review Dumitry!
> > >
> > > On Thu, Feb 20, 2020 at 9:19 AM Dumitru Ceara <dceara@redhat.com>
wrote:
> > > >
> > > > On 2/19/20 4:37 PM, lmartins@redhat.com wrote:
> > > > > From: Lucas Alvares Gomes <lucasagomes@gmail.com>
> > > > >
> > > > > NOTE: SENDING THIS PATCH AS A RFC TO SEE WHAT OTHERS MIGHT THINK
ABOUT
> > > > > THE IDEA PRIOR TO WRITTING TESTS TO IT.
> > > > >
> > > > > CMSes integrating with OVN often uses the nb_cfg mechanism as a
way to
> > > > > check the health status of the ovn-controller process but, the
current
> > > > > implementation isn't ideal for that purpose because it floods the
control
> > > > > plane with update notifications every time the nb_cfg value is
> > > > > incremented.
> > > > >
> > > > > This patch is merging two ideas:
> > > > >
> > > > > 1) Han Zhou proposed a patch [0] creating a table called
Chassis_Private
> > > > >    where each hypervisor *only* writes and monitors its own
record to
> > > > >    avoid this flooding problem.
> > > > >
> > > > > 2) Having ovn-controller to periodically log that it's alive
instead of
> > > > >    relying on the nb_cfg mechanism.
> > > > >
> > > > > By using this mechanism, a CMS can more easily just read this new
> > > > > Chassis_Private table and figure out the status of each Chassis
> > > > > (ovn-controller) in the cluster.
> > > > >
> > > >
> > > > Hi Lucas,
> > > >
> > > > Thanks a lot for working on this!
> > > >
> > > > > Here's some reasons why I believe this approach is better than
having
> > > > > to bump the  nb_cfg value:
> > > > >
> > > > > 1) Simple integration. Before, the CMS had to increment the nb_cfg
> > > > >    value in the NB_Global table and wait for it to be propagated
to the
> > > > >    Southbound database (ovn-northd and then ovn-controllers)
Chassis
> > > > >    table in order to know the status of the Chassis.
> > > > >
> > > > >    Now, it can just read the Chassis_Private table and compare the
> > > > >    alive_at column against the current time.
> >
> > I think the nb_cfg mechanism doesn't have to wait. There can be one
monitor updating the counter periodically, and any other worker can compare
the desired value and each chassis's value and see which ones are stale,
e.g. if the gap is bigger than 3, meaning the chassis missed 3 round of
heartbeat.
> > But it is true that it is more costly because there are 2 directions of
communications. It may or may not be a concern, depending on the scale and
the frequency of heartbeat.
> >
>
> That's true, it can work as you suggested.
>
> In the OVN neutron driver we relaxed the checks to account for some
> miss synchronization [0] and added some code to guard against the
> frequency of the checks as well [1].
>
> [0] https://review.opendev.org/#/c/703612/
> [1] https://review.opendev.org/#/c/704530/
>
> > > >
> > > > I guess the only comment I have about this approach is that it
doesn't
> > > > feel completely OK to store a timestamp representation as string in
> > > > "alive_at". I'm afraid this might be too inflexible and force CMSes
to
> > > > compare their local time with the value in this column.
> > > >
> > > > I know we discussed offline that using a counter that is
periodically
> > > > incremented by ovn-controller instead of a timestamp would
complicate
> > > > the implementation in Openstack but maybe other people on this
mailing
> > > > list have ideas on how to deal with this in a more generic way.
> > > >
> > >
> > > Agreed, let's see if people have more ideas here.
> > >
> > > Also, let me expand my explanation a little. The reason why I think
> > > having a counter is not ideal is because services would need to track
> > > this number to know what the value was before and then compare with
> > > the current value to figure out whether it's being incremented or not.
> > >
> > > In OpenStack, we have multiple API workers spread across the
> > > controllers nodes and the API request to check for the services'
> > > health status will land on any of those workers. Which means that I
> > > can't keep the track of the counter in-memory. Mostly likely, I would
> > > need to set an IDL event on the field being incremented and store the
> > > status of those chassis somewhere else accessible by all API workers.
> > >
> > > The advantage with the nb_cfg mechanism compared with the incremenal
> > > counter is that those values are already in the OVSDB, so we can
> > > compare what's in the NB DB with the SB DB to figure whether the
> > > services are alive or not. But, the price is that we need more code to
> > > deal with waiting for the synchronization of the OVSDBs and, if we
> > > move it to the Chasiss_Private table it won't be backwards compatible.
> > >
> > > Therefore I think a timestamp would be better. It's easy to understand
> > > by either a service or a even a person. When u look at the alive_at
> > > field u know the last time the service signilized it was alive. For
> > > CMSes, the service can just read the Chassis_Private table and compare
> > > the alive_at field with the current time to figure the status of the
> > > service (right now it's UTC, but, we can change it to include the TZ
> > > info if people think it's easier). No writes, sync issues and also
> > > it's backwards compatible with the nb_cfg approach.
> > >
> > > > >
> > > > > 2) Less costy. Just one read from the db is needed, no writing.
Code
> > > > >    using the nb_cfg mechanism had to implement a few safe-guard
code to
> > > > >    make less error prone. See [1] and [2] for example.
> > > > >
> > > > > 3) Backwards compatibility. The patch [0] was moving the nb_cfg
value
> > > > >    to new table so, systems relying on it would need to update
their
> > > > >    code upon updating OVN.
> > > >
> > I have a different opinion on the backward compatibility. It is true
that moving the nb_cfg to a new table may cause compatibilty issue.
However, I think it shouldn't be a problem in reality, mainly because that
the flooding problem of nb_cfg today prevents it being used in live
producation environment.
> > In fact, the nb_cfg mechanism has it's unique capability of performing
a "control plane" ping, and it can be utilized to measure the end-to-end
latency of the whole control plane, i.e. when there is a change in NB, how
long it takes for all chassises to enforce that change. It is very useful
in scalability evaluation. Because of this, I strongly suggest to move the
nb_cfg mechanism to chassis_private table as well. I think it is not
harmful to have both mechanism co-exist (but both should be in
chassis_private table to avoid causing flooding problem), and each would
have its own advantages in different use cases. We could even combine both
approaches:
> >
> > - Whenever nb_cfg is updated, each chassis should update its own record
in SB with the same value. When updating this value, it updates the
timestamp column as well, which could tell the latency of the specific
chassis for handling the update.
> >
> > - If nb_cfg is not update, but chassis_liveness_interval is set, each
chassis should update the timestamp only (the nb_cfg value in its own
record should keep the same as the current NB nb_cfg value. If for some
reason they are different, then it should be updated taking this chance).
> >
> > What do you think?
> >
>
> I think you've made great points, I haven't thought about the latency
> before and it seems pretty useful for that indeed.
>
> Regarding the suggestions, I like it but I don't want to create an
> over-engineered mechanism. One of the main reasons why I've thought
> about using timestamps instead of the nb_cfg was the backward
> compatibly issue, but as you've said the nb_cfg mechanism at the
> moment is pretty much unusable in production so maybe we shouldn't
> care too much about the backward compatibility and fix that mechanism
> instead.
>
> Moving the nb_cfg mechanism to the Chassis_Private table should be
> enough to fix our problems and the timestamp idea can be discarded
> (less knobs to configure OVN).
>
> If you don't mind, I will rebase your Chassis_Private patch against
> master and re-submit it.
>
Sounds good to me.

Thanks,
Han
diff mbox series

Patch

diff --git a/controller/chassis.c b/controller/chassis.c
index 522893ead..7892cb966 100644
--- a/controller/chassis.c
+++ b/controller/chassis.c
@@ -24,6 +24,7 @@ 
 #include "openvswitch/dynamic-string.h"
 #include "openvswitch/vlog.h"
 #include "openvswitch/ofp-parse.h"
+#include "openvswitch/poll-loop.h"
 #include "lib/chassis-index.h"
 #include "lib/ovn-sb-idl.h"
 #include "ovn-controller.h"
@@ -47,11 +48,16 @@  struct chassis_info {
 
     /* True if Chassis SB record is initialized, false otherwise. */
     uint32_t id_inited : 1;
+
+    /* Next livennes update time to update the 'alive_at' column of
+     * Chassis_Private table. */
+    long long int next_liveness_update;
 };
 
 static struct chassis_info chassis_state = {
     .id = DS_EMPTY_INITIALIZER,
     .id_inited = false,
+    .next_liveness_update = LLONG_MAX,
 };
 
 static void
@@ -581,18 +587,38 @@  chassis_update(const struct sbrec_chassis *chassis_rec,
     free(encaps);
 }
 
+static void
+chassis_update_liveness_cfg(const struct sbrec_chassis_private *chassis_rec,
+                            struct chassis_info *ch_info,
+                            int chassis_liveness_interval)
+{
+    if (ch_info->next_liveness_update == LLONG_MAX ||
+        time_msec() > ch_info->next_liveness_update) {
+        sbrec_chassis_private_set_alive_at(
+            chassis_rec,
+            xastrftime_msec("%Y-%m-%d %H:%M:%S", time_wall_msec(), true));
+        ch_info->next_liveness_update =
+            time_msec() + chassis_liveness_interval * 1000;
+    }
+}
+
 /* Returns this chassis's Chassis record, if it is available. */
 const struct sbrec_chassis *
 chassis_run(struct ovsdb_idl_txn *ovnsb_idl_txn,
             struct ovsdb_idl_index *sbrec_chassis_by_name,
+            struct ovsdb_idl_index *sbrec_chassis_private_by_name,
             const struct ovsrec_open_vswitch_table *ovs_table,
             const struct sbrec_chassis_table *chassis_table,
             const char *chassis_id,
             const struct ovsrec_bridge *br_int,
-            const struct sset *transport_zones)
+            const struct sset *transport_zones,
+            const struct sbrec_chassis_private **chassis_private,
+            int chassis_liveness_interval)
 {
     struct ovs_chassis_cfg ovs_cfg;
 
+    *chassis_private = NULL;
+
     /* Get the chassis config from the ovs table. */
     ovs_chassis_cfg_init(&ovs_cfg);
     if (!chassis_parse_ovs_config(ovs_table, br_int, &ovs_cfg)) {
@@ -616,6 +642,24 @@  chassis_run(struct ovsdb_idl_txn *ovnsb_idl_txn,
                                   chassis_id);
     }
 
+    /* Create the Chassis_Private entry if it's not there yet */
+    const struct sbrec_chassis_private *chassis_private_rec =
+        chassis_private_lookup_by_name(sbrec_chassis_private_by_name,
+                                       chassis_id);
+    if (!chassis_private_rec && ovnsb_idl_txn) {
+        chassis_private_rec = sbrec_chassis_private_insert(ovnsb_idl_txn);
+        sbrec_chassis_private_set_name(chassis_private_rec, chassis_id);
+    }
+
+    /* Update the alive_at column from the Chassis_Private table if
+     * the chassis liveness mechanism is enabled */
+    if (chassis_private_rec && ovnsb_idl_txn && chassis_liveness_interval) {
+        chassis_update_liveness_cfg(chassis_private_rec, &chassis_state,
+                                    chassis_liveness_interval);
+    }
+
+    *chassis_private = chassis_private_rec;
+
     ovs_chassis_cfg_destroy(&ovs_cfg);
     return chassis_rec;
 }
@@ -669,7 +713,8 @@  chassis_get_mac(const struct sbrec_chassis *chassis_rec,
  * required. */
 bool
 chassis_cleanup(struct ovsdb_idl_txn *ovnsb_idl_txn,
-                const struct sbrec_chassis *chassis_rec)
+                const struct sbrec_chassis *chassis_rec,
+                const struct sbrec_chassis_private *chassis_private_rec)
 {
     if (!chassis_rec) {
         return true;
@@ -679,6 +724,7 @@  chassis_cleanup(struct ovsdb_idl_txn *ovnsb_idl_txn,
                                   "ovn-controller: unregistering chassis '%s'",
                                   chassis_rec->name);
         sbrec_chassis_delete(chassis_rec);
+        sbrec_chassis_private_delete(chassis_private_rec);
     }
     return false;
 }
@@ -695,3 +741,9 @@  chassis_get_id(void)
 
     return NULL;
 }
+
+void chassis_wait(int chassis_liveness_interval) {
+    if (chassis_liveness_interval) {
+        poll_timer_wait_until(chassis_state.next_liveness_update);
+    }
+}
diff --git a/controller/chassis.h b/controller/chassis.h
index 178d2957e..38a549553 100644
--- a/controller/chassis.h
+++ b/controller/chassis.h
@@ -17,6 +17,7 @@ 
 #define OVN_CHASSIS_H 1
 
 #include <stdbool.h>
+#include "lib/ovn-sb-idl.h"
 
 struct ovsdb_idl;
 struct ovsdb_idl_index;
@@ -33,17 +34,21 @@  void chassis_register_ovs_idl(struct ovsdb_idl *);
 const struct sbrec_chassis *chassis_run(
     struct ovsdb_idl_txn *ovnsb_idl_txn,
     struct ovsdb_idl_index *sbrec_chassis_by_name,
+    struct ovsdb_idl_index *sbrec_chassis_private_by_name,
     const struct ovsrec_open_vswitch_table *,
     const struct sbrec_chassis_table *,
     const char *chassis_id, const struct ovsrec_bridge *br_int,
-    const struct sset *transport_zones);
+    const struct sset *transport_zones,
+    const struct sbrec_chassis_private **chassis_private,
+    int chassis_liveness_interval);
 bool chassis_cleanup(struct ovsdb_idl_txn *ovnsb_idl_txn,
-                     const struct sbrec_chassis *);
+                     const struct sbrec_chassis *,
+                     const struct sbrec_chassis_private *);
 bool chassis_get_mac(const struct sbrec_chassis *chassis,
                      const char *bridge_mapping,
                      struct eth_addr *chassis_mac);
 const char *chassis_get_id(void);
 const char * get_chassis_mac_mappings(const struct smap *ext_ids);
-
+void chassis_wait(int chassis_liveness_interval);
 
 #endif /* controller/chassis.h */
diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
index 4d245ca28..fd9d95da2 100644
--- a/controller/ovn-controller.c
+++ b/controller/ovn-controller.c
@@ -155,6 +155,7 @@  update_sb_monitors(struct ovsdb_idl *ovnsb_idl,
     struct ovsdb_idl_condition ce =  OVSDB_IDL_CONDITION_INIT(&ce);
     struct ovsdb_idl_condition ip_mcast = OVSDB_IDL_CONDITION_INIT(&ip_mcast);
     struct ovsdb_idl_condition igmp = OVSDB_IDL_CONDITION_INIT(&igmp);
+    struct ovsdb_idl_condition chprv = OVSDB_IDL_CONDITION_INIT(&chprv);
 
     if (monitor_all) {
         ovsdb_idl_condition_add_clause_true(&pb);
@@ -165,6 +166,7 @@  update_sb_monitors(struct ovsdb_idl *ovnsb_idl,
         ovsdb_idl_condition_add_clause_true(&ce);
         ovsdb_idl_condition_add_clause_true(&ip_mcast);
         ovsdb_idl_condition_add_clause_true(&igmp);
+        ovsdb_idl_condition_add_clause_true(&chprv);
         goto out;
     }
 
@@ -196,6 +198,10 @@  update_sb_monitors(struct ovsdb_idl *ovnsb_idl,
                                                   &chassis->header_.uuid);
         sbrec_igmp_group_add_clause_chassis(&igmp, OVSDB_F_EQ,
                                             &chassis->header_.uuid);
+
+        /* Monitors Chassis_Private record for current chassis only */
+        sbrec_chassis_private_add_clause_name(&chprv, OVSDB_F_EQ,
+                                              chassis->name);
     }
     if (local_ifaces) {
         const char *name;
@@ -229,6 +235,7 @@  out:
     sbrec_controller_event_set_condition(ovnsb_idl, &ce);
     sbrec_ip_multicast_set_condition(ovnsb_idl, &ip_mcast);
     sbrec_igmp_group_set_condition(ovnsb_idl, &igmp);
+    sbrec_chassis_private_set_condition(ovnsb_idl, &chprv);
     ovsdb_idl_condition_destroy(&pb);
     ovsdb_idl_condition_destroy(&lf);
     ovsdb_idl_condition_destroy(&mb);
@@ -237,6 +244,7 @@  out:
     ovsdb_idl_condition_destroy(&ce);
     ovsdb_idl_condition_destroy(&ip_mcast);
     ovsdb_idl_condition_destroy(&igmp);
+    ovsdb_idl_condition_destroy(&chprv);
 }
 
 static const char *
@@ -687,6 +695,15 @@  get_transport_zones(const struct ovsrec_open_vswitch_table *ovs_table)
     return smap_get_def(&cfg->external_ids, "ovn-transport-zones", "");
 }
 
+static uint16_t
+get_chassis_liveness_interval(const struct sbrec_sb_global_table
+                              *sb_global_table)
+{
+    const struct sbrec_sb_global *sb
+        = sbrec_sb_global_table_first(sb_global_table);
+    return atoi(smap_get_def(&sb->options, "chassis_liveness_interval", "0"));
+}
+
 static void
 ctrl_register_ovs_idl(struct ovsdb_idl *ovs_idl)
 {
@@ -1760,6 +1777,8 @@  main(int argc, char *argv[])
 
     struct ovsdb_idl_index *sbrec_chassis_by_name
         = chassis_index_create(ovnsb_idl_loop.idl);
+    struct ovsdb_idl_index *sbrec_chassis_private_by_name
+        = chassis_private_index_create(ovnsb_idl_loop.idl);
     struct ovsdb_idl_index *sbrec_multicast_group_by_name_datapath
         = mcast_group_index_create(ovnsb_idl_loop.idl);
     struct ovsdb_idl_index *sbrec_port_binding_by_name
@@ -1821,6 +1840,10 @@  main(int argc, char *argv[])
     ovsdb_idl_omit(ovnsb_idl_loop.idl, &sbrec_connection_col_status);
     ovsdb_idl_omit(ovnsb_idl_loop.idl, &sbrec_connection_col_target);
 
+    /* Do not monitor the Chassis_Private external_ids column */
+    ovsdb_idl_omit(ovnsb_idl_loop.idl,
+                   &sbrec_chassis_private_col_external_ids);
+
     update_sb_monitors(ovnsb_idl_loop.idl, NULL, NULL, NULL, false);
 
     stopwatch_create(CONTROLLER_LOOP_STOPWATCH_NAME, SW_MS);
@@ -1998,10 +2021,16 @@  main(int argc, char *argv[])
                 process_br_int(ovs_idl_txn, bridge_table, ovs_table);
             const char *chassis_id = get_ovs_chassis_id(ovs_table);
             const struct sbrec_chassis *chassis = NULL;
+            const struct sbrec_chassis_private *chassis_private = NULL;
+            uint16_t chassis_liveness_interval = get_chassis_liveness_interval(
+                sbrec_sb_global_table_get(ovnsb_idl_loop.idl));
             if (chassis_id) {
                 chassis = chassis_run(ovnsb_idl_txn, sbrec_chassis_by_name,
+                                      sbrec_chassis_private_by_name,
                                       ovs_table, chassis_table, chassis_id,
-                                      br_int, &transport_zones);
+                                      br_int, &transport_zones,
+                                      &chassis_private,
+                                      chassis_liveness_interval);
             }
 
             if (br_int) {
@@ -2164,6 +2193,10 @@  main(int argc, char *argv[])
                 ofctrl_wait();
                 pinctrl_wait(ovnsb_idl_txn);
             }
+
+            if (chassis) {
+                chassis_wait(chassis_liveness_interval);
+            }
         }
 
         unixctl_server_run(unixctl);
@@ -2232,10 +2265,17 @@  main(int argc, char *argv[])
                    ? chassis_lookup_by_name(sbrec_chassis_by_name, chassis_id)
                    : NULL);
 
+            const struct sbrec_chassis_private *chassis_private
+                = (chassis_id
+                   ? chassis_private_lookup_by_name(
+                       sbrec_chassis_private_by_name, chassis_id)
+                   : NULL);
+
             /* Run all of the cleanup functions, even if one of them returns
              * false. We're done if all of them return true. */
             done = binding_cleanup(ovnsb_idl_txn, port_binding_table, chassis);
-            done = chassis_cleanup(ovnsb_idl_txn, chassis) && done;
+            done = chassis_cleanup(ovnsb_idl_txn, chassis,
+                                   chassis_private) && done;
             done = encaps_cleanup(ovs_idl_txn, br_int) && done;
             done = igmp_group_cleanup(ovnsb_idl_txn, sbrec_igmp_group) && done;
             if (done) {
diff --git a/lib/chassis-index.c b/lib/chassis-index.c
index 39066f4cc..d9bbba8e4 100644
--- a/lib/chassis-index.c
+++ b/lib/chassis-index.c
@@ -40,6 +40,32 @@  chassis_lookup_by_name(struct ovsdb_idl_index *sbrec_chassis_by_name,
     return retval;
 }
 
+struct ovsdb_idl_index *
+chassis_private_index_create(struct ovsdb_idl *idl)
+{
+    return ovsdb_idl_index_create1(
+        idl, &sbrec_chassis_private_col_name);
+}
+
+/* Finds and returns the chassis with the given 'name', or NULL if no such
+ * chassis exists. */
+const struct sbrec_chassis_private *
+chassis_private_lookup_by_name(
+    struct ovsdb_idl_index *sbrec_chassis_private_by_name,
+    const char *name)
+{
+    struct sbrec_chassis_private *target = \
+        sbrec_chassis_private_index_init_row(sbrec_chassis_private_by_name);
+    sbrec_chassis_private_index_set_name(target, name);
+
+    struct sbrec_chassis_private *retval = sbrec_chassis_private_index_find(
+        sbrec_chassis_private_by_name, target);
+
+    sbrec_chassis_private_index_destroy_row(target);
+
+    return retval;
+}
+
 struct ovsdb_idl_index *
 ha_chassis_group_index_create(struct ovsdb_idl *idl)
 {
diff --git a/lib/chassis-index.h b/lib/chassis-index.h
index 302e5f0fd..77d2e1f99 100644
--- a/lib/chassis-index.h
+++ b/lib/chassis-index.h
@@ -23,6 +23,11 @@  struct ovsdb_idl_index *chassis_index_create(struct ovsdb_idl *);
 const struct sbrec_chassis *chassis_lookup_by_name(
     struct ovsdb_idl_index *sbrec_chassis_by_name, const char *name);
 
+struct ovsdb_idl_index *chassis_private_index_create(struct ovsdb_idl *);
+
+const struct sbrec_chassis_private *chassis_private_lookup_by_name(
+    struct ovsdb_idl_index *sbrec_chassis_private_by_name, const char *name);
+
 struct ovsdb_idl_index *ha_chassis_group_index_create(struct ovsdb_idl *idl);
 const struct sbrec_ha_chassis_group *ha_chassis_group_lookup_by_name(
     struct ovsdb_idl_index *sbrec_ha_chassis_grp_by_name, const char *name);
diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
index 2580b4ec9..1b16f5ef6 100644
--- a/northd/ovn-northd.c
+++ b/northd/ovn-northd.c
@@ -11555,6 +11555,10 @@  main(int argc, char *argv[])
     ovsdb_idl_add_column(ovnsb_idl_loop.idl, &sbrec_chassis_col_name);
     ovsdb_idl_add_column(ovnsb_idl_loop.idl, &sbrec_chassis_col_external_ids);
 
+    ovsdb_idl_add_table(ovnsb_idl_loop.idl, &sbrec_table_chassis_private);
+    add_column_noalert(ovnsb_idl_loop.idl,
+                       &sbrec_chassis_private_col_external_ids);
+
     ovsdb_idl_add_table(ovnsb_idl_loop.idl, &sbrec_table_ha_chassis);
     add_column_noalert(ovnsb_idl_loop.idl,
                        &sbrec_ha_chassis_col_chassis);
diff --git a/ovn-sb.ovsschema b/ovn-sb.ovsschema
index d89f8dbbb..88da05849 100644
--- a/ovn-sb.ovsschema
+++ b/ovn-sb.ovsschema
@@ -1,7 +1,7 @@ 
 {
     "name": "OVN_Southbound",
-    "version": "2.7.0",
-    "cksum": "4286723485 21693",
+    "version": "2.8.0",
+    "cksum": "293994447 22064",
     "tables": {
         "SB_Global": {
             "columns": {
@@ -43,6 +43,15 @@ 
                                               "max": "unlimited"}}},
             "isRoot": true,
             "indexes": [["name"]]},
+        "Chassis_Private": {
+            "columns": {
+                "name": {"type": "string"},
+                "alive_at": {"type": "string"},
+                "external_ids": {
+                    "type": {"key": "string", "value": "string",
+                             "min": 0, "max": "unlimited"}}},
+            "isRoot": true,
+            "indexes": [["name"]]},
         "Encap": {
             "columns": {
                 "type": {"type": {"key": {
diff --git a/ovn-sb.xml b/ovn-sb.xml
index 3ae9d4f92..8c9822fdb 100644
--- a/ovn-sb.xml
+++ b/ovn-sb.xml
@@ -366,6 +366,41 @@ 
     </group>
   </table>
 
+  <table name="Chassis_Private" title="Chassis Private">
+    <p>
+      Each row in this table maintains per chassis private data that are
+      accessed only by the owning chassis (write only) and ovn-northd, not by
+      any other chassis.  These data are stored in this separate table instead
+      of the <ref table="Chassis"/> table for performance considerations:
+      the rows in this table can be conditionally monitored by chassises so
+      that each chassis only get update notifications for its own row, to avoid
+      unnecessary chassis private data update flooding in a large scale
+      deployment.  (Future: this separation can be avoided if ovsdb conditional
+      monitoring is supported on a set of columns)
+    </p>
+
+    <column name="name">
+      The name of the chassis that owns these chassis-private data.
+    </column>
+
+    <column name="alive_at">
+      A timestamp indicating the last time ovn-controller
+      signalized it's alive. For monitoring purposes, setting the
+      chassis_liveness_interval configuration to the OVN_Northhbound's
+      <ref table="NB_Global" column="options"/> with an integer value (in
+      seconds) causes the <code>ovn-controller</code> to update this
+      column with a timestamp every N seconds. If the value is not set
+      or 0, then <code>ovn-controller</code> doesn't update this column.
+    </column>
+
+    <group title="Common Columns">
+      The overall purpose of these columns is described under <code>Common
+      Columns</code> at the beginning of this document.
+
+      <column name="external_ids"/>
+    </group>
+  </table>
+
   <table name="Encap" title="Encapsulation Types">
     <p>
       The <ref column="encaps" table="Chassis"/> column in the <ref