[ovs-dev,v3] controller: avoid recomputes triggered by SBDB Port_Binding updates.

Message ID 20220622102351.455161-1-xsimonar@redhat.com
State Superseded

Checks

Context                                  Check    Description
ovsrobot/apply-robot                     success  apply and check: success
ovsrobot/github-robot-_Build_and_Test    fail     github build: failed
ovsrobot/github-robot-_ovn-kubernetes    success  github build: passed

Commit Message

Xavier Simonart June 22, 2022, 10:23 a.m. UTC
When VIF ports are claimed on a chassis, the SBDB Port_Binding table is updated.
If the SBDB IDL is still read-only ("in transaction") when such an update
is required, the update is not possible and a recompute is triggered through
I+P failure.

This situation can happen:
- after updating Port_Binding->chassis to SBDB for one port, in a following
  iteration, ovn-controller handles Interface:external_ids:ovn-installed
  (for the same port) while SBDB is still read-only.
- after updating Port_Binding->chassis to SBDB for one port, in a following
  iteration, ovn-controller updates Port_Binding->chassis for another port,
  while SBDB is still read-only.

This patch prevents the recompute by having the if-status module
update the Port_Binding chassis (if needed) when possible.
This does not delay the Port_Binding chassis update compared to before this
patch:
- With the patch, Port_Binding chassis will be updated as soon as SBDB is
  again writable, without recompute.
- Without the patch, Port_Binding chassis was updated as soon as SBDB was
  again writable, through a recompute.

As part of this patch, ovn-installed will not be updated for additional chassis;
it will only be updated when the migration is completed.
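
A minimal standalone sketch of the deferral described above. All names
(sketch_pb, sketch_iface, sketch_claim, sketch_handle_claims) are hypothetical
stand-ins rather than the ovn-controller API; the real logic is in
controller/if-status.c and controller/binding.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a Port_Binding row. */
struct sketch_pb {
    const char *logical_port;
    const char *chassis;            /* NULL while unclaimed. */
};

/* Hypothetical stand-in for an if-status interface entry. */
struct sketch_iface {
    struct sketch_pb *pb;
    bool chassis_update_required;   /* Claim was deferred: SB was read-only. */
};

/* Claim: write the chassis now if SB is writable, otherwise remember that
 * the write is still pending instead of failing incremental processing. */
static void
sketch_claim(struct sketch_iface *iface, const char *chassis,
             bool sb_readonly)
{
    assert(iface && iface->pb);
    if (!sb_readonly) {
        iface->pb->chassis = chassis;
        iface->chassis_update_required = false;
    } else {
        iface->chassis_update_required = true;
    }
}

/* Later iteration: flush the pending claims once SB is writable again.
 * Returns true if at least one Port_Binding was updated. */
static bool
sketch_handle_claims(struct sketch_iface *ifaces, size_t n,
                     const char *chassis, bool sb_readonly)
{
    if (sb_readonly) {
        return false;
    }

    bool updated = false;
    for (size_t i = 0; i < n; i++) {
        if (ifaces[i].chassis_update_required) {
            ifaces[i].pb->chassis = chassis;
            ifaces[i].chassis_update_required = false;
            updated = true;
        }
    }
    return updated;
}
```

The point of the sketch is that a read-only SB only postpones the write; no
failure is reported, so no recompute is needed.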

Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2059253
Signed-off-by: Xavier Simonart <xsimonar@redhat.com>

---
v2:  - handled Dumitru's comments.
     - handled Han's comments, mainly ensure we move out of CLAIMED state
       only after updating pb->chassis to guarantee physical flows are installed
       when ovn-installed is updated in OVS.
     - slightly reorganized the code to isolate 'notify_up = false' cases in
       claim_port (i.e. ports such as virtual ports), with the idea of making
       a future patch that prevents recomputes when virtual ports are claimed.
     - updated test case to cause more race conditions.
     - rebased on origin/main
     - note that "additional chassis" as now supported by
       "Support LSP:options:requested-chassis as a list" might still cause
       recomputes.
     - fixed missing flows when Port_Binding chassis was updated by mgr_update
       w/o any lflow recalculation.
v3:  - handled Dumitru's comments on v2, mainly have runtime_data handler
       handling pb_claims when sb becomes writable (instead of a lflow handler).
     - fixed test as it was not checking recomputes on all hv, as well as a flaky
       behavior.
     - rebased on origin/main.
---
 controller/binding.c        | 154 +++++++++++++++++++++----------
 controller/binding.h        |  15 +++-
 controller/if-status.c      | 174 ++++++++++++++++++++++++++++++++----
 controller/if-status.h      |  16 +++-
 controller/ovn-controller.c |  72 ++++++++++++++-
 tests/ovn-macros.at         |  12 +++
 tests/ovn.at                | 147 +++++++++++++++++++++++++++++-
 tests/perf-northd.at        |  17 ----
 8 files changed, 519 insertions(+), 88 deletions(-)

Comments

Dumitru Ceara June 28, 2022, 12:17 p.m. UTC | #1
On 6/22/22 12:23, Xavier Simonart wrote:
> When VIF ports are claimed on a chassis, the SBDB Port_Binding table is updated.
> If the SBDB IDL is still read-only ("in transaction") when such an update
> is required, the update is not possible and a recompute is triggered through
> I+P failure.
> 
> This situation can happen:
> - after updating Port_Binding->chassis to SBDB for one port, in a following
>   iteration, ovn-controller handles Interface:external_ids:ovn-installed
>   (for the same port) while SBDB is still read-only.
> - after updating Port_Binding->chassis to SBDB for one port, in a following
>   iteration, ovn-controller updates Port_Binding->chassis for another port,
>   while SBDB is still read-only.
> 
> This patch prevents the recompute by having the if-status module
> update the Port_Binding chassis (if needed) when possible.
> This does not delay the Port_Binding chassis update compared to before this
> patch:
> - With the patch, Port_Binding chassis will be updated as soon as SBDB is
>   again writable, without recompute.
> - Without the patch, Port_Binding chassis was updated as soon as SBDB was
>   again writable, through a recompute.
> 
> As part of this patch, ovn-installed will not be updated for additional chassis;
> it will only be updated when the migration is completed.
> 
> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2059253
> Signed-off-by: Xavier Simonart <xsimonar@redhat.com>
> 
> ---

Hi Xavier,

Thanks for the new revision!  It looks good to me.  I only have two very
minor comments below, and I guess the maintainer applying the patch can
address them when pushing.  So, unless you need a new revision for
other reasons:

Acked-by: Dumitru Ceara <dceara@redhat.com>

Thanks,
Dumitru

> v2:  - handled Dumitru's comments.
>      - handled Han's comments, mainly ensure we move out of CLAIMED state
>        only after updating pb->chassis to guarantee physical flows are installed
>        when ovn-installed is updated in OVS.
>      - slightly reorganized the code to isolate 'notify_up = false' cases in
>        claim_port (i.e. ports such as virtual ports), with the idea of making
>        a future patch that prevents recomputes when virtual ports are claimed.
>      - updated test case to cause more race conditions.
>      - rebased on origin/main
>      - note that "additional chassis" as now supported by
>        "Support LSP:options:requested-chassis as a list" might still cause
>        recomputes.
>      - fixed missing flows when Port_Binding chassis was updated by mgr_update
>        w/o any lflow recalculation.
> v3:  - handled Dumitru's comments on v2, mainly have runtime_data handler
>        handling pb_claims when sb becomes writable (instead of a lflow handler).
>      - fixed test as it was not checking recomputes on all hv, as well as a flaky
>        behavior.
>      - rebased on origin/main.
> ---
>  controller/binding.c        | 154 +++++++++++++++++++++----------
>  controller/binding.h        |  15 +++-
>  controller/if-status.c      | 174 ++++++++++++++++++++++++++++++++----
>  controller/if-status.h      |  16 +++-
>  controller/ovn-controller.c |  72 ++++++++++++++-
>  tests/ovn-macros.at         |  12 +++
>  tests/ovn.at                | 147 +++++++++++++++++++++++++++++-
>  tests/perf-northd.at        |  17 ----
>  8 files changed, 519 insertions(+), 88 deletions(-)
> 
> diff --git a/controller/binding.c b/controller/binding.c
> index 2279570f9..b21577f71 100644
> --- a/controller/binding.c
> +++ b/controller/binding.c
> @@ -644,11 +644,17 @@ local_binding_get_lport_ofport(const struct shash *local_bindings,
>  }
>  
>  bool
> -local_binding_is_up(struct shash *local_bindings, const char *pb_name)
> +local_binding_is_up(struct shash *local_bindings, const char *pb_name,
> +                    const struct sbrec_chassis *chassis_rec)
>  {
>      struct local_binding *lbinding =
>          local_binding_find(local_bindings, pb_name);
>      struct binding_lport *b_lport = local_binding_get_primary_lport(lbinding);
> +
> +    if (b_lport && b_lport->pb->chassis != chassis_rec) {
> +        return false;
> +    }
> +
>      if (lbinding && b_lport && lbinding->iface) {
>          if (b_lport->pb->n_up && !b_lport->pb->up[0]) {
>              return false;
> @@ -660,13 +666,23 @@ local_binding_is_up(struct shash *local_bindings, const char *pb_name)
>  }
>  
>  bool
> -local_binding_is_down(struct shash *local_bindings, const char *pb_name)
> +local_binding_is_down(struct shash *local_bindings, const char *pb_name,
> +                      const struct sbrec_chassis *chassis_rec)
>  {
>      struct local_binding *lbinding =
>          local_binding_find(local_bindings, pb_name);
>  
>      struct binding_lport *b_lport = local_binding_get_primary_lport(lbinding);
>  
> +    if (b_lport) {
> +        if (b_lport->pb->chassis == chassis_rec) {
> +            return false;
> +        } else if (b_lport->pb->chassis) {
> +            VLOG_DBG("lport %s already claimed by other chassis",
> +                     b_lport->pb->logical_port);
> +        }
> +    }
> +
>      if (!lbinding) {
>          return true;
>      }
> @@ -884,37 +900,80 @@ get_lport_type_str(enum en_lport_type lport_type)
>      OVS_NOT_REACHED();
>  }
>  
> -/* For newly claimed ports, if 'notify_up' is 'false':
> +void
> +set_pb_chassis_in_sbrec(const struct sbrec_port_binding *pb,
> +                        const struct sbrec_chassis *chassis_rec,
> +                        bool is_set)
> +{
> +    if (pb->chassis != chassis_rec) {
> +         if (is_set) {
> +            if (pb->chassis) {
> +                VLOG_INFO("Changing chassis for lport %s from %s to %s.",
> +                          pb->logical_port, pb->chassis->name,
> +                          chassis_rec->name);
> +            } else {
> +                VLOG_INFO("Claiming lport %s for this chassis.",
> +                          pb->logical_port);
> +            }
> +            for (int i = 0; i < pb->n_mac; i++) {
> +                VLOG_INFO("%s: Claiming %s", pb->logical_port, pb->mac[i]);
> +            }
> +            sbrec_port_binding_set_chassis(pb, chassis_rec);
> +        }
> +    } else if (!is_set) {
> +        sbrec_port_binding_set_chassis(pb, NULL);
> +    }
> +}
> +
> +void
> +local_binding_set_pb(struct shash *local_bindings, const char *pb_name,
> +                     const struct sbrec_chassis *chassis_rec,
> +                     struct hmap *tracked_datapaths, bool is_set)
> +{
> +    struct local_binding *lbinding =
> +        local_binding_find(local_bindings, pb_name);
> +    struct binding_lport *b_lport = local_binding_get_primary_lport(lbinding);
> +
> +    if (b_lport) {
> +        set_pb_chassis_in_sbrec(b_lport->pb, chassis_rec, is_set);
> +        if (tracked_datapaths) {
> +            update_lport_tracking(b_lport->pb, tracked_datapaths, true);
> +        }
> +    }
> +}
> +
> +/* For newly claimed ports:
>   * - set the 'pb.up' field to true if 'pb' has no 'parent_pb'.
>   * - set the 'pb.up' field to true if 'parent_pb.up' is 'true' (e.g., for
>   *   container and virtual ports).
> - * Otherwise request a notification to be sent when the OVS flows
> - * corresponding to 'pb' have been installed.
> + *
> + * Returns false if lport is not claimed due to 'sb_readonly'.
> + * Returns true otherwise.
>   *
>   * Note:
> - *   Updates (directly or through a notification) the 'pb->up' field only if
> - *   it's explicitly set to 'false'.
> + *   Updates the 'pb->up' field only if it's explicitly set to 'false'.
>   *   This is to ensure compatibility with older versions of ovn-northd.
>   */
> -static void
> +static bool
>  claimed_lport_set_up(const struct sbrec_port_binding *pb,
>                       const struct sbrec_port_binding *parent_pb,
> -                     const struct sbrec_chassis *chassis_rec,
> -                     bool notify_up, struct if_status_mgr *if_mgr)
> +                     bool sb_readonly)
>  {
> -    if (!notify_up) {
> -        bool up = true;
> -        if (!parent_pb || (parent_pb->n_up && parent_pb->up[0])) {
> +    /* When notify_up is false in claim_port(), no state is created
> +     * by if_status_mgr. In such cases, return false (i.e. trigger recompute)
> +     * if we can't update sb (because it is readonly).
> +     */
> +    bool up = true;
> +    if (!parent_pb || (parent_pb->n_up && parent_pb->up[0])) {
> +        if (!sb_readonly) {
>              if (pb->n_up) {
>                  sbrec_port_binding_set_up(pb, &up, 1);
>              }
> +        } else if (pb->n_up && !pb->up[0]) {
> +            return false;
>          }
> -        return;
> -    }
> -
> -    if (pb->chassis != chassis_rec || (pb->n_up && !pb->up[0])) {
> -        if_status_mgr_claim_iface(if_mgr, pb->logical_port);
>      }
> +    return true;
>  }
>  
>  typedef void (*set_func)(const struct sbrec_port_binding *pb,
> @@ -1057,37 +1116,35 @@ claim_lport(const struct sbrec_port_binding *pb,
>              struct hmap *tracked_datapaths,
>              struct if_status_mgr *if_mgr)
>  {
> -    if (!sb_readonly) {
> -        claimed_lport_set_up(pb, parent_pb, chassis_rec, notify_up, if_mgr);
> -    }
> -
>      enum can_bind can_bind = lport_can_bind_on_this_chassis(chassis_rec, pb);
>      bool update_tracked = false;
>  
>      if (can_bind == CAN_BIND_AS_MAIN) {
>          if (pb->chassis != chassis_rec) {
> -            if (sb_readonly) {
> -                return false;
> -            }
> -
> -            if (pb->chassis) {
> -                VLOG_INFO("Changing chassis for lport %s from %s to %s.",
> -                        pb->logical_port, pb->chassis->name,
> -                        chassis_rec->name);
> -            } else {
> -                VLOG_INFO("Claiming lport %s for this chassis.",
> -                          pb->logical_port);
> -            }
> -            for (size_t i = 0; i < pb->n_mac; i++) {
> -                VLOG_INFO("%s: Claiming %s", pb->logical_port, pb->mac[i]);
> -            }
> -
> -            sbrec_port_binding_set_chassis(pb, chassis_rec);
>              if (is_additional_chassis(pb, chassis_rec)) {
> +                if (sb_readonly) {
> +                    return false;
> +                }
>                  remove_additional_chassis(pb, chassis_rec);
>              }
>              update_tracked = true;
>          }
> +        if (!notify_up) {
> +            if (!claimed_lport_set_up(pb, parent_pb, sb_readonly)) {
> +                return false;
> +            }
> +            if (pb->chassis != chassis_rec) {
> +                if (sb_readonly) {
> +                    return false;
> +                }
> +                set_pb_chassis_in_sbrec(pb, chassis_rec, true);
> +            }
> +        } else {
> +            if ((pb->chassis != chassis_rec) || (pb->n_up && !pb->up[0])) {
> +                if_status_mgr_claim_iface(if_mgr, pb, chassis_rec,
> +                                          sb_readonly);
> +            }
> +        }
>      } else if (can_bind == CAN_BIND_AS_ADDITIONAL) {
>          if (!is_additional_chassis(pb, chassis_rec)) {
>              if (sb_readonly) {
> @@ -1132,7 +1189,8 @@ claim_lport(const struct sbrec_port_binding *pb,
>   */
>  static bool
>  release_lport_main_chassis(const struct sbrec_port_binding *pb,
> -                           bool sb_readonly)
> +                           bool sb_readonly,
> +                           struct if_status_mgr *if_mgr)
>  {
>      if (pb->encap) {
>          if (sb_readonly) {
> @@ -1141,11 +1199,13 @@ release_lport_main_chassis(const struct sbrec_port_binding *pb,
>          sbrec_port_binding_set_encap(pb, NULL);
>      }
>  
> +    /* If sb readonly, pb->chassis unset through if-status if present. */
>      if (pb->chassis) {
> -        if (sb_readonly) {
> +        if (!sb_readonly) {
> +            sbrec_port_binding_set_chassis(pb, NULL);
> +        } else if (!if_status_mgr_iface_is_present(if_mgr, pb->logical_port)) {
>              return false;
>          }
> -        sbrec_port_binding_set_chassis(pb, NULL);
>      }
>  
>      if (pb->virtual_parent) {
> @@ -1155,7 +1215,8 @@ release_lport_main_chassis(const struct sbrec_port_binding *pb,
>          sbrec_port_binding_set_virtual_parent(pb, NULL);
>      }
>  
> -    VLOG_INFO("Releasing lport %s from this chassis.", pb->logical_port);
> +    VLOG_INFO("Releasing lport %s from this chassis (sb_readonly=%d)",
> +              pb->logical_port, sb_readonly);
>      return true;
>  }
>  
> @@ -1189,7 +1250,7 @@ release_lport(const struct sbrec_port_binding *pb,
>                struct hmap *tracked_datapaths, struct if_status_mgr *if_mgr)
>  {
>      if (pb->chassis == chassis_rec) {
> -        if (!release_lport_main_chassis(pb, sb_readonly)) {
> +        if (!release_lport_main_chassis(pb, sb_readonly, if_mgr)) {
>              return false;
>          }
>      } else if (is_additional_chassis(pb, chassis_rec)) {
> @@ -1271,7 +1332,7 @@ consider_vif_lport_(const struct sbrec_port_binding *pb,
>                               b_lport->lbinding->iface,
>                               !b_ctx_in->ovnsb_idl_txn,
>                               !parent_pb, b_ctx_out->tracked_dp_bindings,
> -                             b_ctx_out->if_mgr)){
> +                             b_ctx_out->if_mgr)) {
>                  return false;
>              }
>  
> @@ -1527,7 +1588,8 @@ consider_localport(const struct sbrec_port_binding *pb,
>      enum can_bind can_bind = lport_can_bind_on_this_chassis(
>          b_ctx_in->chassis_rec, pb);
>      if (can_bind == CAN_BIND_AS_MAIN) {
> -        if (!release_lport_main_chassis(pb, !b_ctx_in->ovnsb_idl_txn)) {
> +        if (!release_lport_main_chassis(pb, !b_ctx_in->ovnsb_idl_txn,
> +            b_ctx_out->if_mgr)) {
>              return false;
>          }
>      } else if (can_bind == CAN_BIND_AS_ADDITIONAL) {
> diff --git a/controller/binding.h b/controller/binding.h
> index 1fed06674..d20659b0b 100644
> --- a/controller/binding.h
> +++ b/controller/binding.h
> @@ -151,8 +151,10 @@ const struct sbrec_port_binding *local_binding_get_primary_pb(
>  ofp_port_t local_binding_get_lport_ofport(const struct shash *local_bindings,
>                                            const char *pb_name);
>  
> -bool local_binding_is_up(struct shash *local_bindings, const char *pb_name);
> -bool local_binding_is_down(struct shash *local_bindings, const char *pb_name);
> +bool local_binding_is_up(struct shash *local_bindings, const char *pb_name,
> +                         const struct sbrec_chassis *);
> +bool local_binding_is_down(struct shash *local_bindings, const char *pb_name,
> +                           const struct sbrec_chassis *);
>  void local_binding_set_up(struct shash *local_bindings, const char *pb_name,
>                            const struct sbrec_chassis *chassis_rec,
>                            const char *ts_now_str, bool sb_readonly,
> @@ -160,7 +162,10 @@ void local_binding_set_up(struct shash *local_bindings, const char *pb_name,
>  void local_binding_set_down(struct shash *local_bindings, const char *pb_name,
>                              const struct sbrec_chassis *chassis_rec,
>                              bool sb_readonly, bool ovs_readonly);
> -
> +void local_binding_set_pb(struct shash *local_bindings, const char *pb_name,
> +                          const struct sbrec_chassis *chassis_rec,
> +                          struct hmap *tracked_datapaths,
> +                          bool is_set);
>  void binding_register_ovs_idl(struct ovsdb_idl *);
>  void binding_run(struct binding_ctx_in *, struct binding_ctx_out *);
>  bool binding_cleanup(struct ovsdb_idl_txn *ovnsb_idl_txn,
> @@ -178,6 +183,10 @@ void binding_dump_local_bindings(struct local_binding_data *, struct ds *);
>  bool is_additional_chassis(const struct sbrec_port_binding *pb,
>                             const struct sbrec_chassis *chassis_rec);
>  
> +void set_pb_chassis_in_sbrec(const struct sbrec_port_binding *pb,
> +                             const struct sbrec_chassis *chassis_rec,
> +                             bool is_set);
> +
>  /* Corresponds to each Port_Binding.type. */
>  enum en_lport_type {
>      LP_UNKNOWN,
> diff --git a/controller/if-status.c b/controller/if-status.c
> index ad61844d8..7693c289b 100644
> --- a/controller/if-status.c
> +++ b/controller/if-status.c
> @@ -24,6 +24,7 @@
>  #include "lib/util.h"
>  #include "timeval.h"
>  #include "openvswitch/vlog.h"
> +#include "lib/ovn-sb-idl.h"
>  
>  VLOG_DEFINE_THIS_MODULE(if_status);
>  
> @@ -53,9 +54,11 @@ VLOG_DEFINE_THIS_MODULE(if_status);
>   */
>  
>  enum if_state {
> -    OIF_CLAIMED,       /* Newly claimed interface. */
> -    OIF_INSTALL_FLOWS, /* Already claimed interface for which flows are still
> -                        * being installed.
> +    OIF_CLAIMED,       /* Newly claimed interface. pb->chassis not yet updated.
> +                        */
> +    OIF_INSTALL_FLOWS, /* Claimed interface with pb->chassis successfully
> +                        * updated in SB and for which flows are still being
> +                        * installed.
>                          */
>      OIF_MARK_UP,       /* Interface with flows successfully installed in OVS
>                          * but not yet marked "up" in the binding module (in
> @@ -78,6 +81,55 @@ static const char *if_state_names[] = {
>      [OIF_INSTALLED]     = "INSTALLED",
>  };
>  
> +/*
> + *       +----------------------+
> + * +---> |                      |
> + * | +-> |         NULL         | <--------------------------------------+++-+
> + * | |   +----------------------+                                            |
> + * | |     ^ release_iface   | claim_iface                                   |
> + * | |     |                 V - sbrec_update_chassis(if sb is rw)           |
> + * | |   +----------------------+                                            |
> + * | |   |                      | <----------------------------------------+ |
> + * | |   |       CLAIMED        | <--------------------------------------+ | |
> + * | |   +----------------------+                                        | | |
> + * | |                  | mgr_update(when sb is rw)                      | | |
> + * | | release_iface    |  - sbrec_update_chassis                        | | |
> + * | |                  |  - request seqno                               | | |
> + * | |                  V                                                | | |
> + * | |   +----------------------+                                        | | |
> + * | +-- |                      |  mgr_run(seqno not rcvd)               | | |
> + * |     |    INSTALL_FLOWS     |   - set port down in sb                | | |
> + * |     |                      |  mgr_update()                          | | |
> + * |     +----------------------+   - sbrec_update_chassis if needed     | | |
> + * |                    |                                                | | |
> + * |                    |  mgr_run(seqno rcvd)                           | | |
> + * |                    |  - set port up in sb                           | | |
> + * | release_iface      |  - set ovn-installed in ovs                    | | |
> + * |                    V                                                | | |
> + * |   +----------------------+                                          | | |
> + * |   |                      |  mgr_run()                               | | |
> + * +-- |       MARK_UP        |  - set port up in sb                     | | |
> + *     |                      |  - set ovn-installed in ovs              | | |
> + *     |                      |  mgr_update()                            | | |
> + *     +----------------------+  - sbrec_update_chassis if needed        | | |
> + *              |                                                        | | |
> + *              | mgr_update(rcvd port up / ovn_installed & chassis set) | | |
> + *              V                                                        | | |
> + *     +----------------------+                                          | | |
> + *     |      INSTALLED       | ------------> claim_iface ---------------+ | |
> + *     +----------------------+                                            | |
> + *              |                                                          | |
> + *              | release_iface                                            | |
> + *              V                                                          | |
> + *     +----------------------+                                            | |
> + *     |                      | ------------> claim_iface -----------------+ |
> + *     |      MARK_DOWN       | ------> mgr_update(rcvd port down) ----------+
> + *     |                      | mgr_run()
> + *     |                      | - set port down in sb
> + *     |                      | mgr_update()
> + *     +----------------------+ - sbrec_update_chassis(NULL)
> + */
> +
>  struct ovs_iface {
>      char *id;               /* Extracted from OVS external_ids.iface_id. */
>      enum if_state state;    /* State of the interface in the state machine. */
> @@ -85,6 +137,7 @@ struct ovs_iface {
>                               * be fully programmed in OVS.  Only used in state
>                               * OIF_INSTALL_FLOWS.
>                               */
> +    bool chassis_update_required;  /* If true, pb->chassis must be updated. */
>  };
>  
>  static uint64_t ifaces_usage;
> @@ -158,14 +211,23 @@ if_status_mgr_destroy(struct if_status_mgr *mgr)
>  }
>  
>  void
> -if_status_mgr_claim_iface(struct if_status_mgr *mgr, const char *iface_id)
> +if_status_mgr_claim_iface(struct if_status_mgr *mgr,
> +                          const struct sbrec_port_binding *pb,
> +                          const struct sbrec_chassis *chassis_rec,
> +                          bool sb_readonly)
>  {
> +    const char *iface_id = pb->logical_port;
>      struct ovs_iface *iface = shash_find_data(&mgr->ifaces, iface_id);
>  
>      if (!iface) {
>          iface = ovs_iface_create(mgr, iface_id, OIF_CLAIMED);
>      }
> -
> +    if (!sb_readonly) {
> +        set_pb_chassis_in_sbrec(pb, chassis_rec, true);
> +        iface->chassis_update_required = false;
> +    } else {
> +        iface->chassis_update_required = true;
> +    }
>      switch (iface->state) {
>      case OIF_CLAIMED:
>      case OIF_INSTALL_FLOWS:
> @@ -182,6 +244,12 @@ if_status_mgr_claim_iface(struct if_status_mgr *mgr, const char *iface_id)
>      }
>  }
>  
> +bool
> +if_status_mgr_iface_is_present(struct if_status_mgr *mgr, const char *iface_id)
> +{
> +    return !!shash_find_data(&mgr->ifaces, iface_id);
> +}
> +
>  void
>  if_status_mgr_release_iface(struct if_status_mgr *mgr, const char *iface_id)
>  {
> @@ -246,9 +314,39 @@ if_status_mgr_delete_iface(struct if_status_mgr *mgr, const char *iface_id)
>      }
>  }
>  
> +bool
> +if_status_handle_claims(struct if_status_mgr *mgr,
> +                        struct local_binding_data *binding_data,
> +                        const struct sbrec_chassis *chassis_rec,
> +                        struct hmap *tracked_datapath,
> +                        bool sb_readonly)
> +{
> +    if (!binding_data || sb_readonly) {
> +        return false;
> +    }
> +
> +    struct shash *bindings = &binding_data->bindings;
> +    struct hmapx_node *node;
> +
> +    bool rc = false;
> +    HMAPX_FOR_EACH (node, &mgr->ifaces_per_state[OIF_CLAIMED]) {
> +        struct ovs_iface *iface = node->data;
> +        if (iface->chassis_update_required) {
> +            VLOG_INFO("if_status_handle_claims for %s", iface->id);
> +            local_binding_set_pb(bindings, iface->id, chassis_rec,
> +                                 tracked_datapath, true);
> +            rc = true;
> +        }
> +        iface->chassis_update_required = false;
> +    }
> +    return rc;
> +}
> +
>  void
>  if_status_mgr_update(struct if_status_mgr *mgr,
> -                     struct local_binding_data *binding_data)
> +                     struct local_binding_data *binding_data,
> +                     const struct sbrec_chassis *chassis_rec,
> +                     bool sb_readonly)
>  {
>      if (!binding_data) {
>          return;
> @@ -257,13 +355,25 @@ if_status_mgr_update(struct if_status_mgr *mgr,
>      struct shash *bindings = &binding_data->bindings;
>      struct hmapx_node *node;
>  
> +    /* Interfaces in OIF_MARK_UP state have already set their pb->chassis.
> +     * However, it might have been reset by another hv.
> +     */
>      /* Move all interfaces that have been confirmed "up" by the binding module,
>       * from OIF_MARK_UP to OIF_INSTALLED.
>       */
>      HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_MARK_UP]) {
>          struct ovs_iface *iface = node->data;
>  
> -        if (local_binding_is_up(bindings, iface->id)) {
> +        if (iface->chassis_update_required) {
> +            if (!sb_readonly) {
> +                iface->chassis_update_required = false;
> +                local_binding_set_pb(bindings, iface->id, chassis_rec,
> +                                     NULL, true);
> +            } else {
> +                continue;
> +            }
> +        }
> +        if (local_binding_is_up(bindings, iface->id, chassis_rec)) {
>              ovs_iface_set_state(mgr, iface, OIF_INSTALLED);
>          }
>      }
> @@ -274,23 +384,53 @@ if_status_mgr_update(struct if_status_mgr *mgr,
>      HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_MARK_DOWN]) {
>          struct ovs_iface *iface = node->data;
>  
> -        if (local_binding_is_down(bindings, iface->id)) {
> +        if (!sb_readonly) {
> +            local_binding_set_pb(bindings, iface->id, chassis_rec,
> +                                 NULL, false);
> +        }
> +        if (local_binding_is_down(bindings, iface->id, chassis_rec)) {
>              ovs_iface_destroy(mgr, iface);
>          }
>      }
>  
> -    /* Register for a notification about flows being installed in OVS for all
> -     * newly claimed interfaces.
> +    if (!sb_readonly) {
> +        HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_INSTALL_FLOWS]) {

I think this can be HMAPX_FOR_EACH; the loop body doesn't remove any
node from the hmapx.
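
To illustrate the general point with a standalone sketch (hypothetical
SKETCH_* macros over a singly linked list, not the OVS hmapx implementation):
a "safe" iterator saves the successor before the body runs, which only
matters when the body removes the current node. A read-only or
flag-setting loop can use the plain form.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct sketch_node {
    struct sketch_node *next;
    int value;
};

/* Plain iteration: reads node->next after the body has run, so the body
 * must not free or unlink 'node'. */
#define SKETCH_FOR_EACH(NODE, HEAD) \
    for ((NODE) = (HEAD); (NODE); (NODE) = (NODE)->next)

/* Safe iteration: saves the successor in NEXT before the body runs, so
 * the body may free 'node'. */
#define SKETCH_FOR_EACH_SAFE(NODE, NEXT, HEAD)                     \
    for ((NODE) = (HEAD); (NODE) && ((NEXT) = (NODE)->next, 1);    \
         (NODE) = (NEXT))

/* Prepends a node holding 'value' and returns the new head. */
static struct sketch_node *
sketch_push(struct sketch_node *head, int value)
{
    struct sketch_node *node = malloc(sizeof *node);
    assert(node);
    node->value = value;
    node->next = head;
    return node;
}

/* Read-only walk: plain iteration is enough. */
static int
sketch_sum(const struct sketch_node *head)
{
    const struct sketch_node *node;
    int sum = 0;

    SKETCH_FOR_EACH (node, head) {
        sum += node->value;
    }
    return sum;
}

/* Removes nodes while walking: this is the case that needs the safe form. */
static struct sketch_node *
sketch_remove_negative(struct sketch_node *head)
{
    struct sketch_node *node, *next, *prev = NULL;

    SKETCH_FOR_EACH_SAFE (node, next, head) {
        if (node->value < 0) {
            if (prev) {
                prev->next = next;
            } else {
                head = next;
            }
            free(node);
        } else {
            prev = node;
        }
    }
    return head;
}
```

The OIF_INSTALL_FLOWS loop here is of the sketch_sum() kind: it only
clears a flag and calls local_binding_set_pb().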

> +            struct ovs_iface *iface = node->data;
> +
> +            if (iface->chassis_update_required) {
> +                iface->chassis_update_required = false;
> +                local_binding_set_pb(bindings, iface->id, chassis_rec,
> +                                     NULL, true);
> +            }
> +        }
> +    }
> +
> +    /* Update Port_Binding->chassis for newly claimed interfaces
> +     * Register for a notification about flows being installed in OVS for all
> +     * newly claimed interfaces for which we could update pb->chassis.
>       *
>       * Move them from OIF_CLAIMED to OIF_INSTALL_FLOWS.
>       */
> -    bool new_ifaces = false;
> -    HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_CLAIMED]) {
> -        struct ovs_iface *iface = node->data;
>  
> -        ovs_iface_set_state(mgr, iface, OIF_INSTALL_FLOWS);
> -        iface->install_seqno = mgr->iface_seqno + 1;
> -        new_ifaces = true;
> +    bool new_ifaces = false;
> +    if (!sb_readonly) {
> +        HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_CLAIMED]) {
> +            struct ovs_iface *iface = node->data;
> +            /* No need to check for chassis_update_required as already done
> +             * in if_status_handle_claims or if_status_mgr_claim_iface
> +             */
> +            ovs_iface_set_state(mgr, iface, OIF_INSTALL_FLOWS);
> +            iface->install_seqno = mgr->iface_seqno + 1;
> +            new_ifaces = true;
> +        }
> +    } else {
> +        HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_CLAIMED]) {

This can be HMAPX_FOR_EACH too; the loop body only logs.

> +            struct ovs_iface *iface = node->data;
> +            static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1);
> +            VLOG_INFO_RL(&rl,
> +                         "Not updating pb chassis for %s now as "
> +                         "sb is readonly", iface->id);
> +        }
>      }
>  
>      /* Request a seqno update when the flows for new interfaces have been
> @@ -403,7 +543,7 @@ if_status_mgr_update_bindings(struct if_status_mgr *mgr,
>      struct hmapx_node *node;
>  
>      /* Notify the binding module to set "down" all bindings that are still
> -     * in the process of being installed in OVS, i.e., are not yet instsalled.
> +     * in the process of being installed in OVS, i.e., are not yet installed.
>       */
>      HMAPX_FOR_EACH (node, &mgr->ifaces_per_state[OIF_INSTALL_FLOWS]) {
>          struct ovs_iface *iface = node->data;
> diff --git a/controller/if-status.h b/controller/if-status.h
> index bb8a3950d..f9b05d30d 100644
> --- a/controller/if-status.h
> +++ b/controller/if-status.h
> @@ -27,15 +27,27 @@ struct if_status_mgr *if_status_mgr_create(void);
>  void if_status_mgr_clear(struct if_status_mgr *);
>  void if_status_mgr_destroy(struct if_status_mgr *);
>  
> -void if_status_mgr_claim_iface(struct if_status_mgr *, const char *iface_id);
> +void if_status_mgr_claim_iface(struct if_status_mgr *,
> +                               const struct sbrec_port_binding *pb,
> +                               const struct sbrec_chassis *chassis_rec,
> +                               bool sb_readonly);
>  void if_status_mgr_release_iface(struct if_status_mgr *, const char *iface_id);
>  void if_status_mgr_delete_iface(struct if_status_mgr *, const char *iface_id);
>  
> -void if_status_mgr_update(struct if_status_mgr *, struct local_binding_data *);
> +void if_status_mgr_update(struct if_status_mgr *, struct local_binding_data *,
> +                          const struct sbrec_chassis *chassis,
> +                          bool sb_readonly);
>  void if_status_mgr_run(struct if_status_mgr *mgr, struct local_binding_data *,
>                         const struct sbrec_chassis *,
>                         bool sb_readonly, bool ovs_readonly);
>  void if_status_mgr_get_memory_usage(struct if_status_mgr *mgr,
>                                      struct simap *usage);
> +bool if_status_mgr_iface_is_present(struct if_status_mgr *mgr,
> +                                    const char *iface_id);
> +bool if_status_handle_claims(struct if_status_mgr *mgr,
> +                             struct local_binding_data *binding_data,
> +                             const struct sbrec_chassis *chassis_rec,
> +                             struct hmap *tracked_datapath,
> +                             bool sb_readonly);
>  
>  # endif /* controller/if-status.h */
> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
> index 69615308e..3947baf03 100644
> --- a/controller/ovn-controller.c
> +++ b/controller/ovn-controller.c
> @@ -1464,6 +1464,73 @@ en_runtime_data_run(struct engine_node *node, void *data)
>      engine_set_node_state(node, EN_UPDATED);
>  }
>  
> +struct ed_type_sb_ro {
> +    bool sb_readonly;
> +};
> +
> +static void *
> +en_sb_ro_init(struct engine_node *node OVS_UNUSED,
> +              struct engine_arg *arg OVS_UNUSED)
> +{
> +    struct ed_type_sb_ro *data = xzalloc(sizeof *data);
> +    return data;
> +}
> +
> +static void
> +en_sb_ro_run(struct engine_node *node, void *data)
> +{
> +    struct ed_type_sb_ro *sb_ro_data = data;
> +    bool sb_readonly = !engine_get_context()->ovnsb_idl_txn;
> +    if (sb_ro_data->sb_readonly != sb_readonly) {
> +        sb_ro_data->sb_readonly = sb_readonly;
> +        if (!sb_ro_data->sb_readonly) {
> +            engine_set_node_state(node, EN_UPDATED);
> +        }
> +    }
> +}
> +
> +static void
> +en_sb_ro_cleanup(void *data OVS_UNUSED)
> +{
> +}
> +
> +static bool
> +runtime_data_sb_ro_handler(struct engine_node *node, void *data)
> +{
> +    const struct sbrec_chassis *chassis = NULL;
> +
> +    struct ovsrec_open_vswitch_table *ovs_table =
> +        (struct ovsrec_open_vswitch_table *)EN_OVSDB_GET(
> +            engine_get_input("OVS_open_vswitch", node));
> +
> +    const char *chassis_id = get_ovs_chassis_id(ovs_table);
> +
> +    struct ovsdb_idl_index *sbrec_chassis_by_name =
> +        engine_ovsdb_node_get_index(
> +                engine_get_input("SB_chassis", node),
> +                "name");
> +
> +    if (chassis_id) {
> +        chassis = chassis_lookup_by_name(sbrec_chassis_by_name, chassis_id);
> +    }
> +    if (chassis) {
> +        struct ed_type_runtime_data *rt_data = data;
> +        bool sb_readonly = !engine_get_context()->ovnsb_idl_txn;
> +        struct controller_engine_ctx *ctrl_ctx =
> +            engine_get_context()->client_ctx;
> +
> +        if (if_status_handle_claims(ctrl_ctx->if_mgr,
> +                                    &rt_data->lbinding_data,
> +                                    chassis,
> +                                    &rt_data->tracked_dp_bindings,
> +                                    sb_readonly)) {
> +            engine_set_node_state(node, EN_UPDATED);
> +            rt_data->tracked = true;
> +        }
> +    }
> +    return true;
> +}
> +
>  static bool
>  runtime_data_ovs_interface_shadow_handler(struct engine_node *node, void *data)
>  {
> @@ -3528,6 +3595,7 @@ main(int argc, char *argv[])
>      stopwatch_create(VIF_PLUG_RUN_STOPWATCH_NAME, SW_MS);
>  
>      /* Define inc-proc-engine nodes. */
> +    ENGINE_NODE(sb_ro, "sb_ro");
>      ENGINE_NODE_WITH_CLEAR_TRACK_DATA_IS_VALID(ct_zones, "ct_zones");
>      ENGINE_NODE_WITH_CLEAR_TRACK_DATA(ovs_interface_shadow,
>                                        "ovs_interface_shadow");
> @@ -3664,6 +3732,7 @@ main(int argc, char *argv[])
>      engine_add_input(&en_ovs_interface_shadow, &en_ovs_interface,
>                       ovs_interface_shadow_ovs_interface_handler);
>  
> +    engine_add_input(&en_runtime_data, &en_sb_ro, runtime_data_sb_ro_handler);
>      engine_add_input(&en_runtime_data, &en_ofctrl_is_connected, NULL);
>  
>      engine_add_input(&en_runtime_data, &en_ovs_open_vswitch, NULL);
> @@ -4098,7 +4167,8 @@ main(int argc, char *argv[])
>                          runtime_data ? &runtime_data->lbinding_data : NULL;
>                      stopwatch_start(IF_STATUS_MGR_UPDATE_STOPWATCH_NAME,
>                                      time_msec());
> -                    if_status_mgr_update(if_mgr, binding_data);
> +                    if_status_mgr_update(if_mgr, binding_data, chassis,
> +                                         !ovnsb_idl_txn);
>                      stopwatch_stop(IF_STATUS_MGR_UPDATE_STOPWATCH_NAME,
>                                     time_msec());
>  
> diff --git a/tests/ovn-macros.at b/tests/ovn-macros.at
> index 335f9158c..8fd6ae6f7 100644
> --- a/tests/ovn-macros.at
> +++ b/tests/ovn-macros.at
> @@ -759,3 +759,15 @@ m4_define([OVN_FOR_EACH_NORTHD_WITHOUT_DP_GROUPS],
>       [m4_foreach([NORTHD_USE_DP_GROUPS], [no],
>         [m4_foreach([NORTHD_USE_PARALLELIZATION], [yes, no], [$1
>  ])])])])
> +
> +# OVN_NBCTL(NBCTL_COMMAND) adds NBCTL_COMMAND to list of commands to be run by RUN_OVN_NBCTL().
> +m4_define([OVN_NBCTL], [
> +    command="${command} -- $1"
> +])
> +
> +# RUN_OVN_NBCTL() executes list of commands built by the OVN_NBCTL() macro.
> +m4_define([RUN_OVN_NBCTL], [
> +    check ovn-nbctl ${command}
> +    unset command
> +])
> +
> diff --git a/tests/ovn.at b/tests/ovn.at
> index bfaa41962..94d16bac9 100644
> --- a/tests/ovn.at
> +++ b/tests/ovn.at
> @@ -102,6 +102,18 @@ m4_divert_text([PREPARE_TESTS],
>           test 1 -le $(as $hv1 ovs-ofctl dump-flows br-int | grep -c "output:$ofport")
>       ])
>     }
> +
> +   ovn_wait_remote_input_flows () {
> +     hv1=$1
> +     hv2=$2
> +     echo "$3: waiting for flows for remote input on $hv1"
> +     # Wait for a flow matching on the remote input (tunnel) port
> +     OVS_WAIT_UNTIL([
> +         ofport=$(as $hv1 ovs-vsctl --bare --columns ofport find Interface name=ovn-${hv2}-0)
> +         echo "tunnel port=$ofport"
> +         test 1 -le $(as $hv1 ovs-ofctl dump-flows br-int | grep -c "in_port=$ofport")
> +     ])
> +   }
>  ])
>  
>  m4_define([OVN_CHECK_PACKETS],
> @@ -127,6 +139,8 @@ m4_define([OVN_WAIT_PATCH_PORT_FLOWS],
>  m4_define([OVN_WAIT_REMOTE_OUTPUT_FLOWS],
>    [ovn_wait_remote_output_flows "$1" "$2" "__file__:__line__"])
>  
> +m4_define([OVN_WAIT_REMOTE_INPUT_FLOWS],
> +  [ovn_wait_remote_input_flows "$1" "$2" "__file__:__line__"])
>  
>  AT_BANNER([OVN components])
>  
> @@ -14056,6 +14070,11 @@ wait_column "$hv1_uuid" Port_Binding requested_chassis logical_port=lsp0
>  wait_column "$hv2_uuid" Port_Binding additional_chassis logical_port=lsp0
>  wait_column "$hv2_uuid" Port_Binding requested_additional_chassis logical_port=lsp0
>  
> +# Check ovn-installed updated for main chassis
> +wait_for_ports_up
> +OVS_WAIT_UNTIL([test `as hv1 ovs-vsctl get Interface lsp0 external_ids:ovn-installed` = '"true"'])
> +OVS_WAIT_UNTIL([test x`as hv2 ovs-vsctl get Interface lsp0 external_ids:ovn-installed` = x])
> +
>  # Check that setting iface:encap-ip populates Port_Binding:additional_encap
>  wait_row_count Encap 2 chassis_name=hv1
>  wait_row_count Encap 2 chassis_name=hv2
> @@ -14081,6 +14100,11 @@ wait_column "$hv2_uuid" Port_Binding requested_chassis logical_port=lsp0
>  wait_column "" Port_Binding additional_chassis logical_port=lsp0
>  wait_column "" Port_Binding requested_additional_chassis logical_port=lsp0
>  
> +# Check ovn-installed updated for main chassis and not for other chassis
> +wait_for_ports_up
> +OVS_WAIT_UNTIL([test `as hv2 ovs-vsctl get Interface lsp0 external_ids:ovn-installed` = '"true"'])
> +OVS_WAIT_UNTIL([test x`as hv1 ovs-vsctl get Interface lsp0 external_ids:ovn-installed` = x])
> +
>  # Check that additional_encap is cleared
>  wait_column "" Port_Binding additional_encap logical_port=lsp0
>  
> @@ -15370,7 +15394,10 @@ AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=65 | grep actions=output:1],
>  echo "verifying that lsp0 binding moves when requested-chassis is changed"
>  
>  ovn-nbctl lsp-set-options lsp0 requested-chassis=hv2
> -OVS_WAIT_UNTIL([test 1 = $(grep -c "Releasing lport lsp0 from this chassis" hv1/ovn-controller.log)])
> +
> +# We might see multiple "Releasing lport ...", when sb is read only
> +OVS_WAIT_UNTIL([test 1 -le $(grep -c "Releasing lport lsp0 from this chassis" hv1/ovn-controller.log)])
> +
>  wait_column "$hv2_uuid" Port_Binding chassis logical_port=lsp0
>  
>  # (6) Chassis hv2 should add flows and hv1 should not.
> @@ -15416,7 +15443,7 @@ AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=0 | grep in_port=1], [0], [ig
>  AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=65 | grep actions=output:1], [0], [ignore])
>  
>  check ovn-nbctl --wait=hv lsp-set-options lsp0 requested-chassis=non-existant-chassis
> -OVS_WAIT_UNTIL([test 1 = $(grep -c "Releasing lport lsp0 from this chassis" hv1/ovn-controller.log)])
> +OVS_WAIT_UNTIL([test 1 -le $(grep -c "Releasing lport lsp0 from this chassis" hv1/ovn-controller.log)])
>  check ovn-nbctl --wait=hv sync
>  wait_column '' Port_Binding chassis logical_port=lsp0
>  AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=0 | grep in_port=1], [1], [])
> @@ -32418,3 +32445,119 @@ AT_CHECK([test $(ovn-sbctl list fdb | grep -c "00:00:00:00:10:30") = 0])
>  OVN_CLEANUP([hv1])
>  AT_CLEANUP
>  ])
> +
> +OVN_FOR_EACH_NORTHD([
> +AT_SETUP([recomputes])
> +ovn_start
> +
> +n_hv=4
> +
> +# Add chassis
> +net_add n1
> +for i in $(seq 1 $n_hv); do
> +    sim_add hv$i
> +    as hv$i
> +    check ovs-vsctl add-br br-phys
> +    ovs-vsctl set open . external-ids:ovn-bridge-mappings=phys:br-phys
> +    ovn_attach n1 br-phys 192.168.0.$i 24 geneve
> +done
> +
> +add_switch_ports() {
> +    start_port=$1
> +    end_port=$2
> +    nb_hv=$3
> +    bulk_size=$4
> +    for ((i=start_port; i<end_port; )) do
> +        start_bulk=$i
> +        for hv in $(seq 1 $nb_hv); do
> +            end_bulk=$((start_bulk+bulk_size-1))
> +            for port in $(seq $start_bulk $end_bulk); do
> +                logical_switch_port=lsp${port}
> +                OVN_NBCTL(lsp-add ls1 $logical_switch_port)
> +                OVN_NBCTL(lsp-set-addresses $logical_switch_port dynamic)
> +            done
> +            start_bulk=$((end_bulk+1))
> +        done
> +        RUN_OVN_NBCTL()
> +
> +        start_bulk=$i
> +        for hv in $(seq 1 $nb_hv); do
> +            end_bulk=$((start_bulk+bulk_size-1))
> +            for port in $(seq $start_bulk $end_bulk); do
> +                logical_switch_port=lsp${port}
> +                as hv$hv ovs-vsctl \
> +                    --no-wait -- add-port br-int vif${port} \
> +                    -- set Interface vif${port} external_ids:iface-id=$logical_switch_port
> +            done
> +            start_bulk=$((end_bulk+1))
> +        done
> +        i=$((end_bulk+1))
> +    done
> +}
> +check ovn-nbctl ls-add ls1
> +check ovn-nbctl set Logical_Switch ls1 other_config:subnet=10.1.0.0/16
> +check ovn-nbctl set Logical_Switch ls1 other_config:exclude_ips=10.1.255.254
> +
> +check ovn-nbctl lr-add lr1
> +check ovn-nbctl lsp-add ls1 lsp0 -- set Logical_Switch_Port lsp0 type=router options:router-port=lrp0 addresses=dynamic
> +check ovn-nbctl lrp-add lr1 lrp0 "f0:00:00:01:00:01" 10.1.255.254/16
> +check ovn-nbctl lr-nat-add lr1 snat 10.2.0.1 10.1.0.0/16
> +
> +lflow_run=0
> +check ovn-nbctl --wait=hv sync
> +
> +# Tunnel ports might not be added (yet) at this point on slow systems.
> +# Wait for flows related to such ports to ensure those ports have been added
> +# before we measure recomputes. Otherwise, ovs_interface handler might be run
> +# afterwards for tunnel ports, causing recomputes.
> +for i in $(seq 1 $n_hv); do
> +    for j in $(seq 1 $n_hv); do
> +        if test $i != $j; then
> +            OVN_WAIT_REMOTE_INPUT_FLOWS(["hv$i"],["hv$j"])
> +        fi
> +    done
> +done
> +
> +for i in $(seq 1 $n_hv); do
> +    as hv$i
> +    lflow_run1=$(ovn-appctl -t ovn-controller coverage/read-counter lflow_run)
> +    lflow_run=`expr $lflow_run1 + $lflow_run`
> +done
> +
> +add_switch_ports 1 1000 $n_hv 5
> +
> +wait_for_ports_up
> +check ovn-nbctl --wait=hv sync
> +
> +for i in $(seq 1 $n_hv); do
> +    pid=$(cat hv${i}/ovn-controller.pid)
> +    u=$((u+$(cat "/proc/$pid/stat" | awk '{print $14}')))
> +    s=$((s+$(cat "/proc/$pid/stat" | awk '{print $15}')))
> +done
> +
> +n_pid=$(cat northd/ovn-northd.pid)
> +n_u=$(cat "/proc/$n_pid/stat" | awk '{print $14}')
> +n_s=$(cat "/proc/$n_pid/stat" | awk '{print $15}')
> +
> +echo "Total Northd User Time: $n_u"
> +echo "Total Northd System Time: $n_s"
> +echo "Total Controller User Time: $u"
> +echo "Total Controller System Time: $s"
> +
> +lflow_run_end=0
> +for i in $(seq 1 $n_hv); do
> +    as hv$i
> +    lflow_run1=$(ovn-appctl -t ovn-controller coverage/read-counter lflow_run)
> +    lflow_run_end=`expr $lflow_run1 + $lflow_run_end`
> +done
> +n_recomputes=`expr $lflow_run_end - $lflow_run`
> +echo "$n_recomputes recomputes"
> +
> +AT_CHECK([test $lflow_run_end = $lflow_run])
> +
> +for i in $(seq 2 $n_hv); do
> +    OVN_CLEANUP_SBOX([hv$i])
> +done
> +OVN_CLEANUP([hv1])
> +AT_CLEANUP
> +])
> diff --git a/tests/perf-northd.at b/tests/perf-northd.at
> index 74b69e9d4..6ec196b36 100644
> --- a/tests/perf-northd.at
> +++ b/tests/perf-northd.at
> @@ -76,23 +76,6 @@ m4_define([PERF_RECORD_STOP], [
>      PERF_RECORD_STOPWATCH(ovn-northd-loop, ["Short term average"], [Average (northd-loop in msec)])
>  ])
>  
> -# OVN_NBCTL([NBCTL_COMMAND])
> -#
> -# Add NBCTL_COMMAND to list of commands to be run by RUN_OVN_NBCTL().
> -#
> -m4_define([OVN_NBCTL], [
> -    command="${command} -- $1"
> -])
> -
> -# RUN_OVN_NBCTL()
> -#
> -# Execute list of commands built by the OVN_NBCTL() macro.
> -#
> -m4_define([RUN_OVN_NBCTL], [
> -    check ovn-nbctl ${command}
> -    unset command
> -])
> -
>  OVS_START_SHELL_HELPERS
>  generate_subnet () {
>      local a=$(printf %d $(expr $1 / 256 + 10))
Han Zhou July 7, 2022, 6:49 a.m. UTC | #2
On Wed, Jun 22, 2022 at 3:23 AM Xavier Simonart <xsimonar@redhat.com> wrote:
>
> When VIF ports are claimed on a chassis, the SBDB Port_Binding table is
> updated. If the SBDB IDL is still read-only ("in transaction") when such
> an update is required, the update is not possible and a recompute is
> triggered through I+P failure.
>
> This situation can happen:
> - after updating Port_Binding->chassis to SBDB for one port, in a following
>   iteration, ovn-controller handles Interface:external_ids:ovn-installed
>   (for the same port) while SBDB is still read-only.
> - after updating Port_Binding->chassis to SBDB for one port, in a following
>   iteration, ovn-controller updates Port_Binding->chassis for another port,
>   while SBDB is still read-only.
>
> This patch prevents the recompute by having the if-status module
> update the Port_Binding chassis (if needed) when possible.
> This does not delay the Port_Binding chassis update compared to before
> this patch.
> - With the patch, Port_Binding chassis will be updated as soon as SBDB is
> again writable, without recompute.
> - Without the patch, Port_Binding chassis was updated as soon as SBDB was
> again writable, through a recompute.
>
> As part of this patch, ovn-installed will not be updated for additional
> chassis; it will only be updated when the migration is completed.
>
> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2059253
> Signed-off-by: Xavier Simonart <xsimonar@redhat.com>
>
> ---
> v2:  - handled Dumitru's comments.
>      - handled Han's comments, mainly ensure we moved out of CLAIMED state
>        only after updating pb->chassis to guarantee physical flows are
>        installed when ovn-installed is updated in OVS.
>      - slightly reorganized the code to isolate 'notify_up = false' cases
>        in claim_port (i.e. ports such as virtual ports), with the idea of
>        making a future patch that prevents recomputes when virtual ports
>        are claimed.
>      - updated test case to cause more race conditions.
>      - rebased on origin/main
>      - note that "additional chassis" as now supported by
>        "Support LSP:options:requested-chassis as a list" might still cause
>        recomputes.
>      - fixed missing flows when Port_Binding chassis was updated by
>        mgr_update w/o any lflow recalculation.
> v3:  - handled Dumitru's comments on v2, mainly have runtime_data handler
>        handling pb_claims when sb becomes writable (instead of a lflow
>        handler).
>      - fixed test as it was not checking recomputes on all hv, as well as
>        a flaky behavior.
>      - rebased on origin/main.
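
The "when sb becomes writable" trigger mentioned in the v3 notes amounts to
edge detection on the read-only flag. A minimal sketch of that idea follows;
the toy_* names are hypothetical and the function returns a bool where the
real node would set an engine state, so treat it as an illustration rather
than the patch's code. As with a zero-initialized node, the stored flag
starts out as false (writable).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Remembers the read-only flag seen on the previous run. */
struct toy_sb_ro {
    bool sb_readonly;
};

/* Records the current flag and returns true only on a
 * read-only -> writable transition, i.e. the moment at which deferred
 * SB updates can be flushed. */
static bool
toy_sb_ro_run(struct toy_sb_ro *data, bool sb_readonly)
{
    assert(data != NULL);

    bool became_writable = data->sb_readonly && !sb_readonly;

    data->sb_readonly = sb_readonly;
    return became_writable;
}
```

Staying writable or staying read-only across two runs reports nothing;
only the transition back to writable does.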
> ---
>  controller/binding.c        | 154 +++++++++++++++++++++----------
>  controller/binding.h        |  15 +++-
>  controller/if-status.c      | 174 ++++++++++++++++++++++++++++++++----
>  controller/if-status.h      |  16 +++-
>  controller/ovn-controller.c |  72 ++++++++++++++-
>  tests/ovn-macros.at         |  12 +++
>  tests/ovn.at                | 147 +++++++++++++++++++++++++++++-
>  tests/perf-northd.at        |  17 ----
>  8 files changed, 519 insertions(+), 88 deletions(-)
>
> diff --git a/controller/binding.c b/controller/binding.c
> index 2279570f9..b21577f71 100644
> --- a/controller/binding.c
> +++ b/controller/binding.c
> @@ -644,11 +644,17 @@ local_binding_get_lport_ofport(const struct shash
*local_bindings,
>  }
>
>  bool
> -local_binding_is_up(struct shash *local_bindings, const char *pb_name)
> +local_binding_is_up(struct shash *local_bindings, const char *pb_name,
> +                    const struct sbrec_chassis *chassis_rec)
>  {
>      struct local_binding *lbinding =
>          local_binding_find(local_bindings, pb_name);
>      struct binding_lport *b_lport = local_binding_get_primary_lport(lbinding);
> +
> +    if (b_lport && b_lport->pb->chassis != chassis_rec) {
> +        return false;
> +    }
> +
>      if (lbinding && b_lport && lbinding->iface) {
>          if (b_lport->pb->n_up && !b_lport->pb->up[0]) {
>              return false;
> @@ -660,13 +666,23 @@ local_binding_is_up(struct shash *local_bindings,
const char *pb_name)
>  }
>
>  bool
> -local_binding_is_down(struct shash *local_bindings, const char *pb_name)
> +local_binding_is_down(struct shash *local_bindings, const char *pb_name,
> +                      const struct sbrec_chassis *chassis_rec)
>  {
>      struct local_binding *lbinding =
>          local_binding_find(local_bindings, pb_name);
>
>      struct binding_lport *b_lport = local_binding_get_primary_lport(lbinding);
>
> +    if (b_lport) {
> +        if (b_lport->pb->chassis == chassis_rec) {
> +            return false;
> +        } else if (b_lport->pb->chassis) {
> +            VLOG_DBG("lport %s already claimed by other chassis",
> +                     b_lport->pb->logical_port);
> +        }
> +    }
> +
>      if (!lbinding) {
>          return true;
>      }
> @@ -884,37 +900,80 @@ get_lport_type_str(enum en_lport_type lport_type)
>      OVS_NOT_REACHED();
>  }
>
> -/* For newly claimed ports, if 'notify_up' is 'false':
> +void
> +set_pb_chassis_in_sbrec(const struct sbrec_port_binding *pb,
> +                        const struct sbrec_chassis *chassis_rec,
> +                        bool is_set)
> +{
> +    if (pb->chassis != chassis_rec) {
> +         if (is_set) {
> +            if (pb->chassis) {
> +                VLOG_INFO("Changing chassis for lport %s from %s to %s.",
> +                          pb->logical_port, pb->chassis->name,
> +                          chassis_rec->name);
> +            } else {
> +                VLOG_INFO("Claiming lport %s for this chassis.",
> +                          pb->logical_port);
> +            }
> +            for (int i = 0; i < pb->n_mac; i++) {
> +                VLOG_INFO("%s: Claiming %s", pb->logical_port, pb->mac[i]);
> +            }
> +            sbrec_port_binding_set_chassis(pb, chassis_rec);
> +        }
> +    } else if (!is_set) {
> +        sbrec_port_binding_set_chassis(pb, NULL);
> +    }
> +}
> +
> +void
> +local_binding_set_pb(struct shash *local_bindings, const char *pb_name,
> +                     const struct sbrec_chassis *chassis_rec,
> +                     struct hmap *tracked_datapaths, bool is_set)
> +{
> +    struct local_binding *lbinding =
> +        local_binding_find(local_bindings, pb_name);
> +    struct binding_lport *b_lport = local_binding_get_primary_lport(lbinding);
> +
> +    if (b_lport) {
> +        set_pb_chassis_in_sbrec(b_lport->pb, chassis_rec, is_set);
> +        if (tracked_datapaths) {
> +            update_lport_tracking(b_lport->pb, tracked_datapaths, true);
> +        }
> +    }
> +}
> +
> +/* For newly claimed ports:
>   * - set the 'pb.up' field to true if 'pb' has no 'parent_pb'.
>   * - set the 'pb.up' field to true if 'parent_pb.up' is 'true' (e.g., for
>   *   container and virtual ports).
> - * Otherwise request a notification to be sent when the OVS flows
> - * corresponding to 'pb' have been installed.
> + *
> + * Returns false if lport is not claimed due to 'sb_readonly'.
> + * Returns true otherwise.
>   *
>   * Note:
> - *   Updates (directly or through a notification) the 'pb->up' field only if
> - *   it's explicitly set to 'false'.
> + *   Updates the 'pb->up' field only if it's explicitly set to 'false'.
>   *   This is to ensure compatibility with older versions of ovn-northd.
>   */
> -static void
> +static bool
>  claimed_lport_set_up(const struct sbrec_port_binding *pb,
>                       const struct sbrec_port_binding *parent_pb,
> -                     const struct sbrec_chassis *chassis_rec,
> -                     bool notify_up, struct if_status_mgr *if_mgr)
> +                     bool sb_readonly)
>  {
> -    if (!notify_up) {
> -        bool up = true;
> -        if (!parent_pb || (parent_pb->n_up && parent_pb->up[0])) {
> +    /* When notify_up is false in claim_port(), no state is created
> +     * by if_status_mgr. In such cases, return false (i.e. trigger recompute)
> +     * if we can't update sb (because it is readonly).
> +     */
> +    bool up = true;
> +    if (!parent_pb || (parent_pb->n_up && parent_pb->up[0])) {
> +        if (!sb_readonly) {
>              if (pb->n_up) {
>                  sbrec_port_binding_set_up(pb, &up, 1);
>              }
> +        } else if (pb->n_up && !pb->up[0]) {
> +            return false;
>          }
> -        return;
> -    }
> -
> -    if (pb->chassis != chassis_rec || (pb->n_up && !pb->up[0])) {
> -        if_status_mgr_claim_iface(if_mgr, pb->logical_port);
>      }
> +    return true;
>  }
>
>  typedef void (*set_func)(const struct sbrec_port_binding *pb,
> @@ -1057,37 +1116,35 @@ claim_lport(const struct sbrec_port_binding *pb,
>              struct hmap *tracked_datapaths,
>              struct if_status_mgr *if_mgr)
>  {
> -    if (!sb_readonly) {
> -        claimed_lport_set_up(pb, parent_pb, chassis_rec, notify_up, if_mgr);
> -    }
> -
>      enum can_bind can_bind = lport_can_bind_on_this_chassis(chassis_rec, pb);
>      bool update_tracked = false;
>
>      if (can_bind == CAN_BIND_AS_MAIN) {
>          if (pb->chassis != chassis_rec) {
> -            if (sb_readonly) {
> -                return false;
> -            }
> -
> -            if (pb->chassis) {
> -                VLOG_INFO("Changing chassis for lport %s from %s to %s.",
> -                        pb->logical_port, pb->chassis->name,
> -                        chassis_rec->name);
> -            } else {
> -                VLOG_INFO("Claiming lport %s for this chassis.",
> -                          pb->logical_port);
> -            }
> -            for (size_t i = 0; i < pb->n_mac; i++) {
> -                VLOG_INFO("%s: Claiming %s", pb->logical_port, pb->mac[i]);
> -            }
> -
> -            sbrec_port_binding_set_chassis(pb, chassis_rec);
>              if (is_additional_chassis(pb, chassis_rec)) {
> +                if (sb_readonly) {
> +                    return false;
> +                }
>                  remove_additional_chassis(pb, chassis_rec);
>              }
>              update_tracked = true;
>          }
> +        if (!notify_up) {
> +            if (!claimed_lport_set_up(pb, parent_pb, sb_readonly)) {
> +                return false;
> +            }
> +            if (pb->chassis != chassis_rec) {
> +                if (sb_readonly) {
> +                    return false;
> +                }
> +                set_pb_chassis_in_sbrec(pb, chassis_rec, true);
> +            }
> +        } else {
> +            if ((pb->chassis != chassis_rec) || (pb->n_up && !pb->up[0])) {
> +                if_status_mgr_claim_iface(if_mgr, pb, chassis_rec,
> +                                          sb_readonly);
> +            }
> +        }
>      } else if (can_bind == CAN_BIND_AS_ADDITIONAL) {
>          if (!is_additional_chassis(pb, chassis_rec)) {
>              if (sb_readonly) {
> @@ -1132,7 +1189,8 @@ claim_lport(const struct sbrec_port_binding *pb,
>   */
>  static bool
>  release_lport_main_chassis(const struct sbrec_port_binding *pb,
> -                           bool sb_readonly)
> +                           bool sb_readonly,
> +                           struct if_status_mgr *if_mgr)
>  {
>      if (pb->encap) {
>          if (sb_readonly) {
> @@ -1141,11 +1199,13 @@ release_lport_main_chassis(const struct
sbrec_port_binding *pb,
>          sbrec_port_binding_set_encap(pb, NULL);
>      }
>
> +    /* If sb readonly, pb->chassis unset through if-status if present. */
>      if (pb->chassis) {
> -        if (sb_readonly) {
> +        if (!sb_readonly) {
> +            sbrec_port_binding_set_chassis(pb, NULL);
> +        } else if (!if_status_mgr_iface_is_present(if_mgr, pb->logical_port)) {
>              return false;
>          }
> -        sbrec_port_binding_set_chassis(pb, NULL);
>      }
>
>      if (pb->virtual_parent) {
> @@ -1155,7 +1215,8 @@ release_lport_main_chassis(const struct
sbrec_port_binding *pb,
>          sbrec_port_binding_set_virtual_parent(pb, NULL);
>      }
>
> -    VLOG_INFO("Releasing lport %s from this chassis.", pb->logical_port);
> +    VLOG_INFO("Releasing lport %s from this chassis (sb_readonly=%d)",
> +              pb->logical_port, sb_readonly);
>      return true;
>  }
>
> @@ -1189,7 +1250,7 @@ release_lport(const struct sbrec_port_binding *pb,
>                struct hmap *tracked_datapaths, struct if_status_mgr
*if_mgr)
>  {
>      if (pb->chassis == chassis_rec) {
> -        if (!release_lport_main_chassis(pb, sb_readonly)) {
> +        if (!release_lport_main_chassis(pb, sb_readonly, if_mgr)) {
>              return false;
>          }
>      } else if (is_additional_chassis(pb, chassis_rec)) {
> @@ -1271,7 +1332,7 @@ consider_vif_lport_(const struct sbrec_port_binding
*pb,
>                               b_lport->lbinding->iface,
>                               !b_ctx_in->ovnsb_idl_txn,
>                               !parent_pb, b_ctx_out->tracked_dp_bindings,
> -                             b_ctx_out->if_mgr)){
> +                             b_ctx_out->if_mgr)) {
>                  return false;
>              }
>
> @@ -1527,7 +1588,8 @@ consider_localport(const struct sbrec_port_binding
*pb,
>      enum can_bind can_bind = lport_can_bind_on_this_chassis(
>          b_ctx_in->chassis_rec, pb);
>      if (can_bind == CAN_BIND_AS_MAIN) {
> -        if (!release_lport_main_chassis(pb, !b_ctx_in->ovnsb_idl_txn)) {
> +        if (!release_lport_main_chassis(pb, !b_ctx_in->ovnsb_idl_txn,
> +            b_ctx_out->if_mgr)) {
>              return false;
>          }
>      } else if (can_bind == CAN_BIND_AS_ADDITIONAL) {
> diff --git a/controller/binding.h b/controller/binding.h
> index 1fed06674..d20659b0b 100644
> --- a/controller/binding.h
> +++ b/controller/binding.h
> @@ -151,8 +151,10 @@ const struct sbrec_port_binding *local_binding_get_primary_pb(
>  ofp_port_t local_binding_get_lport_ofport(const struct shash *local_bindings,
>                                            const char *pb_name);
>
> -bool local_binding_is_up(struct shash *local_bindings, const char *pb_name);
> -bool local_binding_is_down(struct shash *local_bindings, const char *pb_name);
> +bool local_binding_is_up(struct shash *local_bindings, const char *pb_name,
> +                         const struct sbrec_chassis *);
> +bool local_binding_is_down(struct shash *local_bindings, const char *pb_name,
> +                           const struct sbrec_chassis *);
>  void local_binding_set_up(struct shash *local_bindings, const char *pb_name,
>                            const struct sbrec_chassis *chassis_rec,
>                            const char *ts_now_str, bool sb_readonly,
> @@ -160,7 +162,10 @@ void local_binding_set_up(struct shash *local_bindings, const char *pb_name,
>  void local_binding_set_down(struct shash *local_bindings, const char *pb_name,
>                              const struct sbrec_chassis *chassis_rec,
>                              bool sb_readonly, bool ovs_readonly);
> -
> +void local_binding_set_pb(struct shash *local_bindings, const char *pb_name,
> +                          const struct sbrec_chassis *chassis_rec,
> +                          struct hmap *tracked_datapaths,
> +                          bool is_set);
>  void binding_register_ovs_idl(struct ovsdb_idl *);
>  void binding_run(struct binding_ctx_in *, struct binding_ctx_out *);
>  bool binding_cleanup(struct ovsdb_idl_txn *ovnsb_idl_txn,
> @@ -178,6 +183,10 @@ void binding_dump_local_bindings(struct
local_binding_data *, struct ds *);
>  bool is_additional_chassis(const struct sbrec_port_binding *pb,
>                             const struct sbrec_chassis *chassis_rec);
>
> +void set_pb_chassis_in_sbrec(const struct sbrec_port_binding *pb,
> +                             const struct sbrec_chassis *chassis_rec,
> +                             bool is_set);
> +
>  /* Corresponds to each Port_Binding.type. */
>  enum en_lport_type {
>      LP_UNKNOWN,
> diff --git a/controller/if-status.c b/controller/if-status.c
> index ad61844d8..7693c289b 100644
> --- a/controller/if-status.c
> +++ b/controller/if-status.c
> @@ -24,6 +24,7 @@
>  #include "lib/util.h"
>  #include "timeval.h"
>  #include "openvswitch/vlog.h"
> +#include "lib/ovn-sb-idl.h"
>
>  VLOG_DEFINE_THIS_MODULE(if_status);
>
> @@ -53,9 +54,11 @@ VLOG_DEFINE_THIS_MODULE(if_status);
>   */
>
>  enum if_state {
> -    OIF_CLAIMED,       /* Newly claimed interface. */
> -    OIF_INSTALL_FLOWS, /* Already claimed interface for which flows are still
> -                        * being installed.
> +    OIF_CLAIMED,       /* Newly claimed interface. pb->chassis not yet updated.
> +                        */
> +    OIF_INSTALL_FLOWS, /* Claimed interface with pb->chassis successfully
> +                        * updated in SB and for which flows are still being
> +                        * installed.
>                          */
>      OIF_MARK_UP,       /* Interface with flows successfully installed in OVS
>                          * but not yet marked "up" in the binding module (in
> @@ -78,6 +81,55 @@ static const char *if_state_names[] = {
>      [OIF_INSTALLED]     = "INSTALLED",
>  };
>
> +/*
> + *       +----------------------+
> + * +---> |                      |
> + * | +-> |         NULL         | <--------------------------------------+++-+
> + * | |   +----------------------+                                            |
> + * | |     ^ release_iface   | claim_iface                                   |
> + * | |     |                 V - sbrec_update_chassis(if sb is rw)           |
> + * | |   +----------------------+                                            |
> + * | |   |                      | <----------------------------------------+ |
> + * | |   |       CLAIMED        | <--------------------------------------+ | |
> + * | |   +----------------------+                                        | | |
> + * | |                  | mgr_update(when sb is rw)                      | | |
> + * | | release_iface    |  - sbrec_update_chassis                        | | |
> + * | |                  |  - request seqno                               | | |
> + * | |                  V                                                | | |
> + * | |   +----------------------+                                        | | |
> + * | +-- |                      |  mgr_run(seqno not rcvd)               | | |
> + * |     |    INSTALL_FLOWS     |   - set port down in sb                | | |
> + * |     |                      |  mgr_update()                          | | |
> + * |     +----------------------+   - sbrec_update_chassis if needed     | | |
> + * |                    |                                                | | |
> + * |                    |  mgr_run(seqno rcvd)                           | | |
> + * |                    |  - set port up in sb                           | | |
> + * | release_iface      |  - set ovn-installed in ovs                    | | |
> + * |                    V                                                | | |
> + * |   +----------------------+                                          | | |
> + * |   |                      |  mgr_run()                               | | |
> + * +-- |       MARK_UP        |  - set port up in sb                     | | |
> + *     |                      |  - set ovn-installed in ovs              | | |
> + *     |                      |  mgr_update()                            | | |
> + *     +----------------------+  - sbrec_update_chassis if needed        | | |
> + *              |                                                        | | |
> + *              | mgr_update(rcvd port up / ovn_installed & chassis set) | | |
> + *              V                                                        | | |
> + *     +----------------------+                                          | | |
> + *     |      INSTALLED       | ------------> claim_iface ---------------+ | |
> + *     +----------------------+                                            | |
> + *              |                                                          | |
> + *              | release_iface                                            | |
> + *              V                                                          | |
> + *     +----------------------+                                            | |
> + *     |                      | ------------> claim_iface -----------------+ |
> + *     |      MARK_DOWN       | ------> mgr_update(rcvd port down) ----------+
> + *     |                      | mgr_run()
> + *     |                      | - set port down in sb
> + *     |                      | mgr_update()
> + *     +----------------------+ - sbrec_update_chassis(NULL)
> + */
> +
>  struct ovs_iface {
>      char *id;               /* Extracted from OVS external_ids.iface_id. */
>      enum if_state state;    /* State of the interface in the state machine. */
> @@ -85,6 +137,7 @@ struct ovs_iface {
>                               * be fully programmed in OVS.  Only used in state
>                               * OIF_INSTALL_FLOWS.
>                               */
> +    bool chassis_update_required;  /* If true, pb->chassis must be updated. */
>  };
>
>  static uint64_t ifaces_usage;
> @@ -158,14 +211,23 @@ if_status_mgr_destroy(struct if_status_mgr *mgr)
>  }
>
>  void
> -if_status_mgr_claim_iface(struct if_status_mgr *mgr, const char *iface_id)
> +if_status_mgr_claim_iface(struct if_status_mgr *mgr,
> +                          const struct sbrec_port_binding *pb,
> +                          const struct sbrec_chassis *chassis_rec,
> +                          bool sb_readonly)
>  {
> +    const char *iface_id = pb->logical_port;
>      struct ovs_iface *iface = shash_find_data(&mgr->ifaces, iface_id);
>
>      if (!iface) {
>          iface = ovs_iface_create(mgr, iface_id, OIF_CLAIMED);
>      }
> -
> +    if (!sb_readonly) {
> +        set_pb_chassis_in_sbrec(pb, chassis_rec, true);
> +        iface->chassis_update_required = false;
> +    } else {
> +        iface->chassis_update_required = true;
> +    }
>      switch (iface->state) {
>      case OIF_CLAIMED:
>      case OIF_INSTALL_FLOWS:
> @@ -182,6 +244,12 @@ if_status_mgr_claim_iface(struct if_status_mgr *mgr, const char *iface_id)
>      }
>  }
>
> +bool
> +if_status_mgr_iface_is_present(struct if_status_mgr *mgr, const char *iface_id)
> +{
> +    return !!shash_find_data(&mgr->ifaces, iface_id);
> +}
> +
>  void
>  if_status_mgr_release_iface(struct if_status_mgr *mgr, const char *iface_id)
>  {
> @@ -246,9 +314,39 @@ if_status_mgr_delete_iface(struct if_status_mgr *mgr, const char *iface_id)
>      }
>  }
>
> +bool
> +if_status_handle_claims(struct if_status_mgr *mgr,
> +                        struct local_binding_data *binding_data,
> +                        const struct sbrec_chassis *chassis_rec,
> +                        struct hmap *tracked_datapath,
> +                        bool sb_readonly)
> +{
> +    if (!binding_data || sb_readonly) {
> +        return false;
> +    }
> +
> +    struct shash *bindings = &binding_data->bindings;
> +    struct hmapx_node *node;
> +
> +    bool rc = false;
> +    HMAPX_FOR_EACH (node, &mgr->ifaces_per_state[OIF_CLAIMED]) {
> +        struct ovs_iface *iface = node->data;
> +        if (iface->chassis_update_required) {

Thanks Xavier for the revision. The state machine looks clearer now, but
I have a major concern about the use of chassis_update_required. This bool
flag is used to decide whether an update to SB is needed. Once an SB update
is requested, the flag is set to false, which assumes the SB update will
succeed immediately. However, that assumption may be wrong: the subsequent
SB update can fail or be delayed for various reasons, so this flag is not
reliable. Instead, the CLAIMED state should be responsible for making sure
the SB update is completed. While a transaction is in progress, sb_readonly
is true, so if sb_readonly is false, nothing is in progress. We can
therefore always check if (!sb_readonly && <SB chassis is not updated for
the port-binding>) and, if so, just send the update, regardless of whether
we have requested it before. Please also see another comment below for the
state transition.
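
A minimal sketch of the check suggested above, assuming simplified stand-ins
for the SB records: struct chassis_sketch, struct pb_sketch and
claim_needs_sb_update() are hypothetical names for illustration, not part of
the patch or of the OVN code base. The decision is recomputed from the
observed pb->chassis on every iteration instead of being remembered in a
one-shot flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for sbrec_chassis and
 * sbrec_port_binding; only the fields needed by the sketch are modeled. */
struct chassis_sketch {
    const char *name;
};

struct pb_sketch {
    const struct chassis_sketch *chassis;  /* As currently seen in the IDL. */
};

/* Returns true if a pb->chassis write should be queued in this iteration.
 * No "already requested" flag is kept: as long as SB does not show our
 * chassis and SB is writable, the write is (re)sent. */
bool
claim_needs_sb_update(const struct pb_sketch *pb,
                      const struct chassis_sketch *our_chassis,
                      bool sb_readonly)
{
    assert(pb && our_chassis);

    if (sb_readonly) {
        /* A transaction is still in flight; try again next iteration. */
        return false;
    }
    return pb->chassis != our_chassis;
}
```

With this shape, a failed or delayed commit is retried naturally, because
the next writable iteration still observes a pb->chassis that is not ours.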

> +            VLOG_INFO("if_status_handle_claims for %s", iface->id);
> +            local_binding_set_pb(bindings, iface->id, chassis_rec,
> +                                 tracked_datapath, true);
> +            rc = true;
> +        }
> +        iface->chassis_update_required = false;
> +    }
> +    return rc;
> +}
> +
>  void
>  if_status_mgr_update(struct if_status_mgr *mgr,
> -                     struct local_binding_data *binding_data)
> +                     struct local_binding_data *binding_data,
> +                     const struct sbrec_chassis *chassis_rec,
> +                     bool sb_readonly)
>  {
>      if (!binding_data) {
>          return;
> @@ -257,13 +355,25 @@ if_status_mgr_update(struct if_status_mgr *mgr,
>      struct shash *bindings = &binding_data->bindings;
>      struct hmapx_node *node;
>
> +    /* Interfaces in OIF_MARK_UP state have already set their pb->chassis.
> +     * However, it might have been reset by another hv.
> +     */
>      /* Move all interfaces that have been confirmed "up" by the binding module,
>       * from OIF_MARK_UP to OIF_INSTALLED.
>       */
>      HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_MARK_UP]) {
>          struct ovs_iface *iface = node->data;
>
> -        if (local_binding_is_up(bindings, iface->id)) {
> +        if (iface->chassis_update_required) {
> +            if (!sb_readonly) {
> +                iface->chassis_update_required = false;
> +                local_binding_set_pb(bindings, iface->id, chassis_rec,
> +                                     NULL, true);
> +            } else {
> +                continue;
> +            }
> +        }
> +        if (local_binding_is_up(bindings, iface->id, chassis_rec)) {
>              ovs_iface_set_state(mgr, iface, OIF_INSTALLED);
>          }
>      }
> @@ -274,23 +384,53 @@ if_status_mgr_update(struct if_status_mgr *mgr,
>      HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_MARK_DOWN]) {
>          struct ovs_iface *iface = node->data;
>
> -        if (local_binding_is_down(bindings, iface->id)) {
> +        if (!sb_readonly) {
> +            local_binding_set_pb(bindings, iface->id, chassis_rec,
> +                                 NULL, false);
> +        }
> +        if (local_binding_is_down(bindings, iface->id, chassis_rec)) {
>              ovs_iface_destroy(mgr, iface);
>          }
>      }
>
> -    /* Register for a notification about flows being installed in OVS for all
> -     * newly claimed interfaces.
> +    if (!sb_readonly) {
> +        HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_INSTALL_FLOWS]) {
> +            struct ovs_iface *iface = node->data;
> +
> +            if (iface->chassis_update_required) {
> +                iface->chassis_update_required = false;
> +                local_binding_set_pb(bindings, iface->id, chassis_rec,
> +                                     NULL, true);
> +            }
> +        }
> +    }
> +
> +    /* Update Port_Binding->chassis for newly claimed interfaces
> +     * Register for a notification about flows being installed in OVS for all
> +     * newly claimed interfaces for which we could update pb->chassis.
>       *
>       * Move them from OIF_CLAIMED to OIF_INSTALL_FLOWS.
>       */
> -    bool new_ifaces = false;
> -    HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_CLAIMED]) {
> -        struct ovs_iface *iface = node->data;
>
> -        ovs_iface_set_state(mgr, iface, OIF_INSTALL_FLOWS);
> -        iface->install_seqno = mgr->iface_seqno + 1;
> -        new_ifaces = true;
> +    bool new_ifaces = false;
> +    if (!sb_readonly) {
> +        HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_CLAIMED]) {
> +            struct ovs_iface *iface = node->data;
> +            /* No need to check for chassis_update_required as already done
> +             * in if_status_handle_claims or if_status_mgr_claim_iface
> +             */
> +            ovs_iface_set_state(mgr, iface, OIF_INSTALL_FLOWS);

We need to make sure the chassis in the SB port-binding is up-to-date (i.e.
the update notification from the SB DB has been received) before moving to
INSTALL_FLOWS. Otherwise, it is still possible that the state is moved too
early and we end up with incomplete flow installation for the lport when
the state is finally moved to INSTALLED.
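
The gating described above could look roughly like the following sketch. It
uses a reduced, hypothetical model of the state machine (enum sk_state,
struct sk_iface and claimed_try_advance() are illustration-only names); the
only point it makes is that the CLAIMED to INSTALL_FLOWS transition, and the
seqno request that goes with it, happen only once the local copy of
pb->chassis is confirmed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical two-state excerpt of the if-status state machine. */
enum sk_state {
    SK_CLAIMED,
    SK_INSTALL_FLOWS,
};

struct sk_iface {
    enum sk_state state;
    uint64_t install_seqno;  /* Meaningful only in SK_INSTALL_FLOWS. */
};

/* Advances 'iface' out of SK_CLAIMED only when the SB update notification
 * for pb->chassis has been received ('chassis_confirmed').  Returns true if
 * the transition happened, in which case a seqno must be requested. */
bool
claimed_try_advance(struct sk_iface *iface, bool chassis_confirmed,
                    uint64_t cur_seqno)
{
    assert(iface);

    if (iface->state != SK_CLAIMED || !chassis_confirmed) {
        return false;
    }
    iface->state = SK_INSTALL_FLOWS;
    iface->install_seqno = cur_seqno + 1;
    return true;
}
```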

Thanks,
Han

> +            iface->install_seqno = mgr->iface_seqno + 1;
> +            new_ifaces = true;
> +        }
> +    } else {
> +        HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_CLAIMED]) {
> +            struct ovs_iface *iface = node->data;
> +            static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1);
> +            VLOG_INFO_RL(&rl,
> +                         "Not updating pb chassis for %s now as "
> +                         "sb is readonly", iface->id);
> +        }
>      }
>
>      /* Request a seqno update when the flows for new interfaces have been
> @@ -403,7 +543,7 @@ if_status_mgr_update_bindings(struct if_status_mgr *mgr,
>      struct hmapx_node *node;
>
>      /* Notify the binding module to set "down" all bindings that are still
> -     * in the process of being installed in OVS, i.e., are not yet instsalled.
> +     * in the process of being installed in OVS, i.e., are not yet installed.
>       */
>      HMAPX_FOR_EACH (node, &mgr->ifaces_per_state[OIF_INSTALL_FLOWS]) {
>          struct ovs_iface *iface = node->data;
> diff --git a/controller/if-status.h b/controller/if-status.h
> index bb8a3950d..f9b05d30d 100644
> --- a/controller/if-status.h
> +++ b/controller/if-status.h
> @@ -27,15 +27,27 @@ struct if_status_mgr *if_status_mgr_create(void);
>  void if_status_mgr_clear(struct if_status_mgr *);
>  void if_status_mgr_destroy(struct if_status_mgr *);
>
> -void if_status_mgr_claim_iface(struct if_status_mgr *, const char *iface_id);
> +void if_status_mgr_claim_iface(struct if_status_mgr *,
> +                               const struct sbrec_port_binding *pb,
> +                               const struct sbrec_chassis *chassis_rec,
> +                               bool sb_readonly);
>  void if_status_mgr_release_iface(struct if_status_mgr *, const char *iface_id);
>  void if_status_mgr_delete_iface(struct if_status_mgr *, const char *iface_id);
>
> -void if_status_mgr_update(struct if_status_mgr *, struct local_binding_data *);
> +void if_status_mgr_update(struct if_status_mgr *, struct local_binding_data *,
> +                          const struct sbrec_chassis *chassis,
> +                          bool sb_readonly);
>  void if_status_mgr_run(struct if_status_mgr *mgr, struct local_binding_data *,
>                         const struct sbrec_chassis *,
>                         bool sb_readonly, bool ovs_readonly);
>  void if_status_mgr_get_memory_usage(struct if_status_mgr *mgr,
>                                      struct simap *usage);
> +bool if_status_mgr_iface_is_present(struct if_status_mgr *mgr,
> +                                    const char *iface_id);
> +bool if_status_handle_claims(struct if_status_mgr *mgr,
> +                             struct local_binding_data *binding_data,
> +                             const struct sbrec_chassis *chassis_rec,
> +                             struct hmap *tracked_datapath,
> +                             bool sb_readonly);
>
>  # endif /* controller/if-status.h */
> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
> index 69615308e..3947baf03 100644
> --- a/controller/ovn-controller.c
> +++ b/controller/ovn-controller.c
> @@ -1464,6 +1464,73 @@ en_runtime_data_run(struct engine_node *node, void
*data)
>      engine_set_node_state(node, EN_UPDATED);
>  }
>
> +struct ed_type_sb_ro {
> +    bool sb_readonly;
> +};
> +
> +static void *
> +en_sb_ro_init(struct engine_node *node OVS_UNUSED,
> +              struct engine_arg *arg OVS_UNUSED)
> +{
> +    struct ed_type_sb_ro *data = xzalloc(sizeof *data);
> +    return data;
> +}
> +
> +static void
> +en_sb_ro_run(struct engine_node *node, void *data)
> +{
> +    struct ed_type_sb_ro *sb_ro_data = data;
> +    bool sb_readonly = !engine_get_context()->ovnsb_idl_txn;
> +    if (sb_ro_data->sb_readonly != sb_readonly) {
> +        sb_ro_data->sb_readonly = sb_readonly;
> +        if (!sb_ro_data->sb_readonly) {
> +            engine_set_node_state(node, EN_UPDATED);
> +        }
> +    }
> +}
> +
> +static void
> +en_sb_ro_cleanup(void *data OVS_UNUSED)
> +{
> +}
> +
> +static bool
> +runtime_data_sb_ro_handler(struct engine_node *node, void *data)
> +{
> +    const struct sbrec_chassis *chassis = NULL;
> +
> +    struct ovsrec_open_vswitch_table *ovs_table =
> +        (struct ovsrec_open_vswitch_table *)EN_OVSDB_GET(
> +            engine_get_input("OVS_open_vswitch", node));
> +
> +    const char *chassis_id = get_ovs_chassis_id(ovs_table);
> +
> +    struct ovsdb_idl_index *sbrec_chassis_by_name =
> +        engine_ovsdb_node_get_index(
> +                engine_get_input("SB_chassis", node),
> +                "name");
> +
> +    if (chassis_id) {
> +        chassis = chassis_lookup_by_name(sbrec_chassis_by_name,
chassis_id);
> +    }
> +    if (chassis) {
> +        struct ed_type_runtime_data *rt_data = data;
> +        bool sb_readonly = !engine_get_context()->ovnsb_idl_txn;
> +        struct controller_engine_ctx *ctrl_ctx =
> +            engine_get_context()->client_ctx;
> +
> +        if (if_status_handle_claims(ctrl_ctx->if_mgr,
> +                                    &rt_data->lbinding_data,
> +                                    chassis,
> +                                    &rt_data->tracked_dp_bindings,
> +                                    sb_readonly)) {
> +            engine_set_node_state(node, EN_UPDATED);
> +            rt_data->tracked = true;
> +        }
> +    }
> +    return true;
> +}
> +
>  static bool
>  runtime_data_ovs_interface_shadow_handler(struct engine_node *node, void
*data)
>  {
> @@ -3528,6 +3595,7 @@ main(int argc, char *argv[])
>      stopwatch_create(VIF_PLUG_RUN_STOPWATCH_NAME, SW_MS);
>
>      /* Define inc-proc-engine nodes. */
> +    ENGINE_NODE(sb_ro, "sb_ro");
>      ENGINE_NODE_WITH_CLEAR_TRACK_DATA_IS_VALID(ct_zones, "ct_zones");
>      ENGINE_NODE_WITH_CLEAR_TRACK_DATA(ovs_interface_shadow,
>                                        "ovs_interface_shadow");
> @@ -3664,6 +3732,7 @@ main(int argc, char *argv[])
>      engine_add_input(&en_ovs_interface_shadow, &en_ovs_interface,
>                       ovs_interface_shadow_ovs_interface_handler);
>
> +    engine_add_input(&en_runtime_data, &en_sb_ro,
runtime_data_sb_ro_handler);
>      engine_add_input(&en_runtime_data, &en_ofctrl_is_connected, NULL);
>
>      engine_add_input(&en_runtime_data, &en_ovs_open_vswitch, NULL);
> @@ -4098,7 +4167,8 @@ main(int argc, char *argv[])
>                          runtime_data ? &runtime_data->lbinding_data :
NULL;
>                      stopwatch_start(IF_STATUS_MGR_UPDATE_STOPWATCH_NAME,
>                                      time_msec());
> -                    if_status_mgr_update(if_mgr, binding_data);
> +                    if_status_mgr_update(if_mgr, binding_data, chassis,
> +                                         !ovnsb_idl_txn);
>                      stopwatch_stop(IF_STATUS_MGR_UPDATE_STOPWATCH_NAME,
>                                     time_msec());
>
> diff --git a/tests/ovn-macros.at b/tests/ovn-macros.at
> index 335f9158c..8fd6ae6f7 100644
> --- a/tests/ovn-macros.at
> +++ b/tests/ovn-macros.at
> @@ -759,3 +759,15 @@ m4_define([OVN_FOR_EACH_NORTHD_WITHOUT_DP_GROUPS],
>       [m4_foreach([NORTHD_USE_DP_GROUPS], [no],
>         [m4_foreach([NORTHD_USE_PARALLELIZATION], [yes, no], [$1
>  ])])])])
> +
> +# OVN_NBCTL(NBCTL_COMMAND) adds NBCTL_COMMAND to list of commands to be
run by RUN_OVN_NBCTL().
> +m4_define([OVN_NBCTL], [
> +    command="${command} -- $1"
> +])
> +
> +# RUN_OVN_NBCTL() executes list of commands built by the OVN_NBCTL()
macro.
> +m4_define([RUN_OVN_NBCTL], [
> +    check ovn-nbctl ${command}
> +    unset command
> +])
> +
> diff --git a/tests/ovn.at b/tests/ovn.at
> index bfaa41962..94d16bac9 100644
> --- a/tests/ovn.at
> +++ b/tests/ovn.at
> @@ -102,6 +102,18 @@ m4_divert_text([PREPARE_TESTS],
>           test 1 -le $(as $hv1 ovs-ofctl dump-flows br-int | grep -c
"output:$ofport")
>       ])
>     }
> +
> +   ovn_wait_remote_input_flows () {
> +     hv1=$1
> +     hv2=$2
> +     echo "$3: waiting for flows for remote input on $hv1"
> +     # Wait for a flow outputing  to remote input
> +     OVS_WAIT_UNTIL([
> +         ofport=$(as $hv1 ovs-vsctl --bare --columns ofport find
Interface name=ovn-${hv2}-0)
> +         echo "tunnel port=$ofport"
> +         test 1 -le $(as $hv1 ovs-ofctl dump-flows br-int | grep -c
"in_port=$ofport")
> +     ])
> +   }
>  ])
>
>  m4_define([OVN_CHECK_PACKETS],
> @@ -127,6 +139,8 @@ m4_define([OVN_WAIT_PATCH_PORT_FLOWS],
>  m4_define([OVN_WAIT_REMOTE_OUTPUT_FLOWS],
>    [ovn_wait_remote_output_flows "$1" "$2" "__file__:__line__"])
>
> +m4_define([OVN_WAIT_REMOTE_INPUT_FLOWS],
> +  [ovn_wait_remote_input_flows "$1" "$2" "__file__:__line__"])
>
>  AT_BANNER([OVN components])
>
> @@ -14056,6 +14070,11 @@ wait_column "$hv1_uuid" Port_Binding
requested_chassis logical_port=lsp0
>  wait_column "$hv2_uuid" Port_Binding additional_chassis logical_port=lsp0
>  wait_column "$hv2_uuid" Port_Binding requested_additional_chassis
logical_port=lsp0
>
> +# Check ovn-installed updated for main chassis
> +wait_for_ports_up
> +OVS_WAIT_UNTIL([test `as hv1 ovs-vsctl get Interface lsp0
external_ids:ovn-installed` = '"true"'])
> +OVS_WAIT_UNTIL([test x`as hv2 ovs-vsctl get Interface lsp0
external_ids:ovn-installed` = x])
> +
>  # Check that setting iface:encap-ip populates
Port_Binding:additional_encap
>  wait_row_count Encap 2 chassis_name=hv1
>  wait_row_count Encap 2 chassis_name=hv2
> @@ -14081,6 +14100,11 @@ wait_column "$hv2_uuid" Port_Binding
requested_chassis logical_port=lsp0
>  wait_column "" Port_Binding additional_chassis logical_port=lsp0
>  wait_column "" Port_Binding requested_additional_chassis
logical_port=lsp0
>
> +# Check ovn-installed updated for main chassis and not for other chassis
> +wait_for_ports_up
> +OVS_WAIT_UNTIL([test `as hv2 ovs-vsctl get Interface lsp0
external_ids:ovn-installed` = '"true"'])
> +OVS_WAIT_UNTIL([test x`as hv1 ovs-vsctl get Interface lsp0
external_ids:ovn-installed` = x])
> +
>  # Check that additional_encap is cleared
>  wait_column "" Port_Binding additional_encap logical_port=lsp0
>
> @@ -15370,7 +15394,10 @@ AT_CHECK([as hv1 ovs-ofctl dump-flows br-int
table=65 | grep actions=output:1],
>  echo "verifying that lsp0 binding moves when requested-chassis is
changed"
>
>  ovn-nbctl lsp-set-options lsp0 requested-chassis=hv2
> -OVS_WAIT_UNTIL([test 1 = $(grep -c "Releasing lport lsp0 from this
chassis" hv1/ovn-controller.log)])
> +
> +# We might see multiple "Releasing lport ...", when sb is read only
> +OVS_WAIT_UNTIL([test 1 -le $(grep -c "Releasing lport lsp0 from this
chassis" hv1/ovn-controller.log)])
> +
>  wait_column "$hv2_uuid" Port_Binding chassis logical_port=lsp0
>
>  # (6) Chassis hv2 should add flows and hv1 should not.
> @@ -15416,7 +15443,7 @@ AT_CHECK([as hv1 ovs-ofctl dump-flows br-int
table=0 | grep in_port=1], [0], [ig
>  AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=65 | grep
actions=output:1], [0], [ignore])
>
>  check ovn-nbctl --wait=hv lsp-set-options lsp0
requested-chassis=non-existant-chassis
> -OVS_WAIT_UNTIL([test 1 = $(grep -c "Releasing lport lsp0 from this
chassis" hv1/ovn-controller.log)])
> +OVS_WAIT_UNTIL([test 1 -le $(grep -c "Releasing lport lsp0 from this
chassis" hv1/ovn-controller.log)])
>  check ovn-nbctl --wait=hv sync
>  wait_column '' Port_Binding chasssi logical_port=lsp0
>  AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=0 | grep in_port=1],
[1], [])
> @@ -32418,3 +32445,119 @@ AT_CHECK([test $(ovn-sbctl list fdb | grep -c
"00:00:00:00:10:30") = 0])
>  OVN_CLEANUP([hv1])
>  AT_CLEANUP
>  ])
> +
> +OVN_FOR_EACH_NORTHD([
> +AT_SETUP([recomputes])
> +ovn_start
> +
> +n_hv=4
> +
> +# Add chassis
> +net_add n1
> +for i in $(seq 1 $n_hv); do
> +    sim_add hv$i
> +    as hv$i
> +    check ovs-vsctl add-br br-phys
> +    ovs-vsctl set open . external-ids:ovn-bridge-mappings=phys:br-phys
> +    ovn_attach n1 br-phys 192.168.0.$i 24 geneve
> +done
> +
> +add_switch_ports() {
> +    start_port=$1
> +    end_port=$2
> +    nb_hv=$3
> +    bulk_size=$4
> +    for ((i=start_port; i<end_port; )) do
> +        start_bulk=$i
> +        for hv in $(seq 1 $nb_hv); do
> +            end_bulk=$((start_bulk+bulk_size-1))
> +            for port in $(seq $start_bulk $end_bulk); do
> +                logical_switch_port=lsp${port}
> +                OVN_NBCTL(lsp-add ls1 $logical_switch_port)
> +                OVN_NBCTL(lsp-set-addresses $logical_switch_port dynamic)
> +            done
> +            start_bulk=$((end_bulk+1))
> +        done
> +        RUN_OVN_NBCTL()
> +
> +        start_bulk=$i
> +        for hv in $(seq 1 $nb_hv); do
> +            end_bulk=$((start_bulk+bulk_size-1))
> +            for port in $(seq $start_bulk $end_bulk); do
> +                logical_switch_port=lsp${port}
> +                as hv$hv ovs-vsctl \
> +                    --no-wait -- add-port br-int vif${port} \
> +                    -- set Interface vif${port}
external_ids:iface-id=$logical_switch_port
> +            done
> +            start_bulk=$((end_bulk+1))
> +        done
> +        i=$((end_bulk+1))
> +    done
> +}
> +check ovn-nbctl ls-add ls1
> +check ovn-nbctl set Logical_Switch ls1 other_config:subnet=10.1.0.0/16
> +check ovn-nbctl set Logical_Switch ls1
other_config:exclude_ips=10.1.255.254
> +
> +check ovn-nbctl lr-add lr1
> +check ovn-nbctl lsp-add ls1 lsp0 -- set Logical_Switch_Port lsp0
type=router options:router-port=lrp0 addresses=dynamic
> +check ovn-nbctl lrp-add lr1 lrp0 "f0:00:00:01:00:01" 10.1.255.254/16
> +check ovn-nbctl lr-nat-add lr1 snat 10.2.0.1 10.1.0.0/16
> +
> +lflow_run=0
> +check ovn-nbctl --wait=hv sync
> +
> +# Tunnel ports might not be added (yet) at this point on slow system.
> +# Wait for flows related to such ports to ensure those ports have been
added
> +# before we measure recomputes. Otherwise, ovs_interface handler might
be run
> +# afterwards for tunnel ports, causing recomputes.
> +for i in $(seq 1 $n_hv); do
> +    for j in $(seq 1 $n_hv); do
> +        if test $i != $j; then
> +            OVN_WAIT_REMOTE_INPUT_FLOWS(["hv$i"],["hv$j"])
> +        fi
> +    done
> +done
> +
> +for i in $(seq 1 $n_hv); do
> +    as hv$i
> +    lflow_run1=$(ovn-appctl -t ovn-controller coverage/read-counter
lflow_run)
> +    lflow_run=`expr $lflow_run1 + $lflow_run`
> +done
> +
> +add_switch_ports 1 1000 $n_hv 5
> +
> +wait_for_ports_up
> +check ovn-nbctl --wait=hv sync
> +
> +for i in $(seq 1 $n_hv); do
> +    pid=$(cat hv${i}/ovn-controller.pid)
> +    u=$((u+$(cat "/proc/$pid/stat" | awk '{print $14}')))
> +    s=$((s+$(cat "/proc/$pid/stat" | awk '{print $15}')))
> +done
> +
> +n_pid=$(cat northd/ovn-northd.pid)
> +n_u=$(cat "/proc/$n_pid/stat" | awk '{print $14}')
> +n_s=$(cat "/proc/$n_pid/stat" | awk '{print $15}')
> +
> +echo "Total Northd User Time: $n_u"
> +echo "Total Northd System Time: $n_s"
> +echo "Total Controller User Time: $u"
> +echo "Total Controller System Time: $s"
> +
> +lflow_run_end=0
> +for i in $(seq 1 $n_hv); do
> +    as hv$i
> +    lflow_run1=$(ovn-appctl -t ovn-controller coverage/read-counter
lflow_run)
> +    lflow_run_end=`expr $lflow_run1 + $lflow_run_end`
> +done
> +n_recomputes=`expr $lflow_run_end - $lflow_run`
> +echo "$n_recomputes recomputes"
> +
> +AT_CHECK([test $lflow_run_end == $lflow_run])
> +
> +for i in $(seq 2 $n_hv); do
> +    OVN_CLEANUP_SBOX([hv$i])
> +done
> +OVN_CLEANUP([hv1])
> +AT_CLEANUP
> +])
> diff --git a/tests/perf-northd.at b/tests/perf-northd.at
> index 74b69e9d4..6ec196b36 100644
> --- a/tests/perf-northd.at
> +++ b/tests/perf-northd.at
> @@ -76,23 +76,6 @@ m4_define([PERF_RECORD_STOP], [
>      PERF_RECORD_STOPWATCH(ovn-northd-loop, ["Short term average"],
[Average (northd-loop in msec)])
>  ])
>
> -# OVN_NBCTL([NBCTL_COMMAND])
> -#
> -# Add NBCTL_COMMAND to list of commands to be run by RUN_OVN_NBCTL().
> -#
> -m4_define([OVN_NBCTL], [
> -    command="${command} -- $1"
> -])
> -
> -# RUN_OVN_NBCTL()
> -#
> -# Execute list of commands built by the OVN_NBCTL() macro.
> -#
> -m4_define([RUN_OVN_NBCTL], [
> -    check ovn-nbctl ${command}
> -    unset command
> -])
> -
>  OVS_START_SHELL_HELPERS
>  generate_subnet () {
>      local a=$(printf %d $(expr $1 / 256 + 10))
> --
> 2.31.1
>
Xavier Simonart July 11, 2022, 11:31 a.m. UTC | #3
Hi Han

Thanks for your review.

Let me try to understand your two main concerns and the proper way to fix
them.
1) We only try once to write pb->chassis. If the commit fails, pb->chassis
is not written. As the commit fails, we will recompute, but as the
update_required flag is no longer set, we might end up with no
pb->chassis.
=> I'll remove the flag and retry the update until it is confirmed.
2) The state machine, and when we move to INSTALL_FLOWS. Serializing the
state machine by waiting for the confirmation to be received before moving
to the INSTALL_FLOWS state will delay ovn-installed compared to today. So I
am (still) trying to see if there is any way to prevent this in some cases.
Would it be correct to do this serialization (wait for the pb->chassis
update confirmation) only when using conditional monitoring? When using
monitor-all, as soon as we have written pb->chassis (w/o confirmation), we
would move to INSTALL_FLOWS. In the loop where we wrote pb->chassis, (all)
the flows should be updated taking pb->chassis into account.
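
To make the question in 2) concrete, here is a small sketch of the proposed
rule. It is only an illustration of the idea being asked about, not an
agreed design: may_enter_install_flows() and its three boolean parameters
are hypothetical names, and whether the monitor-all shortcut is actually
safe is exactly what is still open.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical predicate for leaving the CLAIMED state.
 *
 * - With conditional monitoring, wait until the pb->chassis update has been
 *   confirmed by SB ('chassis_confirmed').
 * - With monitor-all, proceed as soon as the write has been issued
 *   ('chassis_written'), without waiting for the confirmation. */
bool
may_enter_install_flows(bool monitor_all, bool chassis_written,
                        bool chassis_confirmed)
{
    if (chassis_confirmed) {
        return true;
    }
    return monitor_all && chassis_written;
}
```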

Thanks again for your feedback

Xavier

On Thu, Jul 7, 2022 at 8:49 AM Han Zhou <hzhou@ovn.org> wrote:

>
>
> On Wed, Jun 22, 2022 at 3:23 AM Xavier Simonart <xsimonar@redhat.com>
> wrote:
> >
> > When VIF ports are claimed on a chassis, the SBDB Port_Binding table is
> > updated. If the SBDB IDL is still read-only ("in transaction") when such
> > an update is required, the update is not possible and a recompute is
> > triggered through I+P failure.
> >
> > This situation can happen:
> > - after updating Port_Binding->chassis to SBDB for one port, in a
> >   following iteration, ovn-controller handles
> >   Interface:external_ids:ovn-installed (for the same port) while SBDB is
> >   still read-only.
> > - after updating Port_Binding->chassis to SBDB for one port, in a
> >   following iteration, ovn-controller updates Port_Binding->chassis for
> >   another port, while SBDB is still read-only.
> >
> > This patch prevents the recompute by having the if-status module
> > update the Port_Binding chassis (if needed) when possible.
> > This does not delay the Port_Binding chassis update compared to before
> > this patch.
> > - With the patch, Port_Binding chassis will be updated as soon as SBDB is
> >   again writable, without recompute.
> > - Without the patch, Port_Binding chassis was updated as soon as SBDB was
> >   again writable, through a recompute.
> >
> > As part of this patch, ovn-installed will not be updated for additional
> > chassis; it will only be updated when the migration is completed.
> >
> > Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2059253
> > Signed-off-by: Xavier Simonart <xsimonar@redhat.com>
> >
> > ---
> > v2:  - handled Dumitru's comments.
> >      - handled Han's comments, mainly ensure we moved out of CLAIMED state
> >        only after updating pb->chassis to guarantee physical flows are
> >        installed when ovn-installed is updated in OVS.
> >      - slightly reorganized the code to isolate 'notify_up = false' cases
> >        in claim_port (i.e. ports such as virtual ports), with the aim of
> >        simplifying a future patch preventing recomputes when virtual ports
> >        are claimed.
> >      - updated test case to cause more race conditions.
> >      - rebased on origin/main
> >      - note that "additional chassis" as now supported by
> >        "Support LSP:options:requested-chassis as a list" might still cause
> >        recomputes.
> >      - fixed missing flows when Port_Binding chassis was updated by
> >        mgr_update w/o any lflow recalculation.
> > v3:  - handled Dumitru's comments on v2, mainly have runtime_data handler
> >        handling pb_claims when sb becomes writable (instead of a lflow
> >        handler).
> >      - fixed test as it was not checking recomputes on all hv, as well as
> >        a flaky behavior.
> >      - rebased on origin/main.
> > ---
> >  controller/binding.c        | 154 +++++++++++++++++++++----------
> >  controller/binding.h        |  15 +++-
> >  controller/if-status.c      | 174 ++++++++++++++++++++++++++++++++----
> >  controller/if-status.h      |  16 +++-
> >  controller/ovn-controller.c |  72 ++++++++++++++-
> >  tests/ovn-macros.at         |  12 +++
> >  tests/ovn.at                | 147 +++++++++++++++++++++++++++++-
> >  tests/perf-northd.at        |  17 ----
> >  8 files changed, 519 insertions(+), 88 deletions(-)
> >
> > diff --git a/controller/binding.c b/controller/binding.c
> > index 2279570f9..b21577f71 100644
> > --- a/controller/binding.c
> > +++ b/controller/binding.c
> > @@ -644,11 +644,17 @@ local_binding_get_lport_ofport(const struct shash
> *local_bindings,
> >  }
> >
> >  bool
> > -local_binding_is_up(struct shash *local_bindings, const char *pb_name)
> > +local_binding_is_up(struct shash *local_bindings, const char *pb_name,
> > +                    const struct sbrec_chassis *chassis_rec)
> >  {
> >      struct local_binding *lbinding =
> >          local_binding_find(local_bindings, pb_name);
> >      struct binding_lport *b_lport =
> local_binding_get_primary_lport(lbinding);
> > +
> > +    if (b_lport && b_lport->pb->chassis != chassis_rec) {
> > +        return false;
> > +    }
> > +
> >      if (lbinding && b_lport && lbinding->iface) {
> >          if (b_lport->pb->n_up && !b_lport->pb->up[0]) {
> >              return false;
> > @@ -660,13 +666,23 @@ local_binding_is_up(struct shash *local_bindings,
> const char *pb_name)
> >  }
> >
> >  bool
> > -local_binding_is_down(struct shash *local_bindings, const char *pb_name)
> > +local_binding_is_down(struct shash *local_bindings, const char *pb_name,
> > +                      const struct sbrec_chassis *chassis_rec)
> >  {
> >      struct local_binding *lbinding =
> >          local_binding_find(local_bindings, pb_name);
> >
> >      struct binding_lport *b_lport =
> local_binding_get_primary_lport(lbinding);
> >
> > +    if (b_lport) {
> > +        if (b_lport->pb->chassis == chassis_rec) {
> > +            return false;
> > +        } else if (b_lport->pb->chassis) {
> > +            VLOG_DBG("lport %s already claimed by other chassis",
> > +                     b_lport->pb->logical_port);
> > +        }
> > +    }
> > +
> >      if (!lbinding) {
> >          return true;
> >      }
> > @@ -884,37 +900,80 @@ get_lport_type_str(enum en_lport_type lport_type)
> >      OVS_NOT_REACHED();
> >  }
> >
> > -/* For newly claimed ports, if 'notify_up' is 'false':
> > +void
> > +set_pb_chassis_in_sbrec(const struct sbrec_port_binding *pb,
> > +                        const struct sbrec_chassis *chassis_rec,
> > +                        bool is_set)
> > +{
> > +    if (pb->chassis != chassis_rec) {
> > +         if (is_set) {
> > +            if (pb->chassis) {
> > +                VLOG_INFO("Changing chassis for lport %s from %s to
> %s.",
> > +                          pb->logical_port, pb->chassis->name,
> > +                          chassis_rec->name);
> > +            } else {
> > +                VLOG_INFO("Claiming lport %s for this chassis.",
> > +                          pb->logical_port);
> > +            }
> > +            for (int i = 0; i < pb->n_mac; i++) {
> > +                VLOG_INFO("%s: Claiming %s", pb->logical_port,
> pb->mac[i]);
> > +            }
> > +            sbrec_port_binding_set_chassis(pb, chassis_rec);
> > +        }
> > +    } else if (!is_set) {
> > +        sbrec_port_binding_set_chassis(pb, NULL);
> > +    }
> > +}
> > +
> > +void
> > +local_binding_set_pb(struct shash *local_bindings, const char *pb_name,
> > +                     const struct sbrec_chassis *chassis_rec,
> > +                     struct hmap *tracked_datapaths, bool is_set)
> > +{
> > +    struct local_binding *lbinding =
> > +        local_binding_find(local_bindings, pb_name);
> > +    struct binding_lport *b_lport =
> local_binding_get_primary_lport(lbinding);
> > +
> > +    if (b_lport) {
> > +        set_pb_chassis_in_sbrec(b_lport->pb, chassis_rec, is_set);
> > +        if (tracked_datapaths) {
> > +            update_lport_tracking(b_lport->pb, tracked_datapaths, true);
> > +        }
> > +    }
> > +}
> > +
> > +/* For newly claimed ports:
> >   * - set the 'pb.up' field to true if 'pb' has no 'parent_pb'.
> >   * - set the 'pb.up' field to true if 'parent_pb.up' is 'true' (e.g.,
> for
> >   *   container and virtual ports).
> > - * Otherwise request a notification to be sent when the OVS flows
> > - * corresponding to 'pb' have been installed.
> > + *
> > + * Returns false if lport is not claimed due to 'sb_readonly'.
> > + * Returns true otherwise.
> >   *
> >   * Note:
> > - *   Updates (directly or through a notification) the 'pb->up' field
> only if
> > - *   it's explicitly set to 'false'.
> > + *   Updates the 'pb->up' field only if it's explicitly set to 'false'.
> >   *   This is to ensure compatibility with older versions of ovn-northd.
> >   */
> > -static void
> > +static bool
> >  claimed_lport_set_up(const struct sbrec_port_binding *pb,
> >                       const struct sbrec_port_binding *parent_pb,
> > -                     const struct sbrec_chassis *chassis_rec,
> > -                     bool notify_up, struct if_status_mgr *if_mgr)
> > +                     bool sb_readonly)
> >  {
> > -    if (!notify_up) {
> > -        bool up = true;
> > -        if (!parent_pb || (parent_pb->n_up && parent_pb->up[0])) {
> > +    /* When notify_up is false in claim_port(), no state is created
> > +     * by if_status_mgr. In such cases, return false (i.e. trigger
> recompute)
> > +     * if we can't update sb (because it is readonly).
> > +     */
> > +    bool up = true;
> > +    if (!parent_pb || (parent_pb->n_up && parent_pb->up[0])) {
> > +        if (!sb_readonly) {
> >              if (pb->n_up) {
> >                  sbrec_port_binding_set_up(pb, &up, 1);
> >              }
> > +        } else if (pb->n_up && !pb->up[0]) {
> > +            return false;
> >          }
> > -        return;
> > -    }
> > -
> > -    if (pb->chassis != chassis_rec || (pb->n_up && !pb->up[0])) {
> > -        if_status_mgr_claim_iface(if_mgr, pb->logical_port);
> >      }
> > +    return true;
> >  }
> >
> >  typedef void (*set_func)(const struct sbrec_port_binding *pb,
> > @@ -1057,37 +1116,35 @@ claim_lport(const struct sbrec_port_binding *pb,
> >              struct hmap *tracked_datapaths,
> >              struct if_status_mgr *if_mgr)
> >  {
> > -    if (!sb_readonly) {
> > -        claimed_lport_set_up(pb, parent_pb, chassis_rec, notify_up,
> if_mgr);
> > -    }
> > -
> >      enum can_bind can_bind =
> lport_can_bind_on_this_chassis(chassis_rec, pb);
> >      bool update_tracked = false;
> >
> >      if (can_bind == CAN_BIND_AS_MAIN) {
> >          if (pb->chassis != chassis_rec) {
> > -            if (sb_readonly) {
> > -                return false;
> > -            }
> > -
> > -            if (pb->chassis) {
> > -                VLOG_INFO("Changing chassis for lport %s from %s to
> %s.",
> > -                        pb->logical_port, pb->chassis->name,
> > -                        chassis_rec->name);
> > -            } else {
> > -                VLOG_INFO("Claiming lport %s for this chassis.",
> > -                          pb->logical_port);
> > -            }
> > -            for (size_t i = 0; i < pb->n_mac; i++) {
> > -                VLOG_INFO("%s: Claiming %s", pb->logical_port,
> pb->mac[i]);
> > -            }
> > -
> > -            sbrec_port_binding_set_chassis(pb, chassis_rec);
> >              if (is_additional_chassis(pb, chassis_rec)) {
> > +                if (sb_readonly) {
> > +                    return false;
> > +                }
> >                  remove_additional_chassis(pb, chassis_rec);
> >              }
> >              update_tracked = true;
> >          }
> > +        if (!notify_up) {
> > +            if (!claimed_lport_set_up(pb, parent_pb, sb_readonly)) {
> > +                return false;
> > +            }
> > +            if (pb->chassis != chassis_rec) {
> > +                if (sb_readonly) {
> > +                    return false;
> > +                }
> > +                set_pb_chassis_in_sbrec(pb, chassis_rec, true);
> > +            }
> > +        } else {
> > +            if ((pb->chassis != chassis_rec) || (pb->n_up &&
> !pb->up[0])) {
> > +                if_status_mgr_claim_iface(if_mgr, pb, chassis_rec,
> > +                                          sb_readonly);
> > +            }
> > +        }
> >      } else if (can_bind == CAN_BIND_AS_ADDITIONAL) {
> >          if (!is_additional_chassis(pb, chassis_rec)) {
> >              if (sb_readonly) {
> > @@ -1132,7 +1189,8 @@ claim_lport(const struct sbrec_port_binding *pb,
> >   */
> >  static bool
> >  release_lport_main_chassis(const struct sbrec_port_binding *pb,
> > -                           bool sb_readonly)
> > +                           bool sb_readonly,
> > +                           struct if_status_mgr *if_mgr)
> >  {
> >      if (pb->encap) {
> >          if (sb_readonly) {
> > @@ -1141,11 +1199,13 @@ release_lport_main_chassis(const struct
> sbrec_port_binding *pb,
> >          sbrec_port_binding_set_encap(pb, NULL);
> >      }
> >
> > +    /* If sb readonly, pb->chassis unset through if-status if present.
> */
> >      if (pb->chassis) {
> > -        if (sb_readonly) {
> > +        if (!sb_readonly) {
> > +            sbrec_port_binding_set_chassis(pb, NULL);
> > +        } else if (!if_status_mgr_iface_is_present(if_mgr,
> pb->logical_port)) {
> >              return false;
> >          }
> > -        sbrec_port_binding_set_chassis(pb, NULL);
> >      }
> >
> >      if (pb->virtual_parent) {
> > @@ -1155,7 +1215,8 @@ release_lport_main_chassis(const struct
> sbrec_port_binding *pb,
> >          sbrec_port_binding_set_virtual_parent(pb, NULL);
> >      }
> >
> > -    VLOG_INFO("Releasing lport %s from this chassis.",
> pb->logical_port);
> > +    VLOG_INFO("Releasing lport %s from this chassis (sb_readonly=%d)",
> > +              pb->logical_port, sb_readonly);
> >      return true;
> >  }
> >
> > @@ -1189,7 +1250,7 @@ release_lport(const struct sbrec_port_binding *pb,
> >                struct hmap *tracked_datapaths, struct if_status_mgr
> *if_mgr)
> >  {
> >      if (pb->chassis == chassis_rec) {
> > -        if (!release_lport_main_chassis(pb, sb_readonly)) {
> > +        if (!release_lport_main_chassis(pb, sb_readonly, if_mgr)) {
> >              return false;
> >          }
> >      } else if (is_additional_chassis(pb, chassis_rec)) {
> > @@ -1271,7 +1332,7 @@ consider_vif_lport_(const struct
> sbrec_port_binding *pb,
> >                               b_lport->lbinding->iface,
> >                               !b_ctx_in->ovnsb_idl_txn,
> >                               !parent_pb, b_ctx_out->tracked_dp_bindings,
> > -                             b_ctx_out->if_mgr)){
> > +                             b_ctx_out->if_mgr)) {
> >                  return false;
> >              }
> >
> > @@ -1527,7 +1588,8 @@ consider_localport(const struct sbrec_port_binding
> *pb,
> >      enum can_bind can_bind = lport_can_bind_on_this_chassis(
> >          b_ctx_in->chassis_rec, pb);
> >      if (can_bind == CAN_BIND_AS_MAIN) {
> > -        if (!release_lport_main_chassis(pb, !b_ctx_in->ovnsb_idl_txn)) {
> > +        if (!release_lport_main_chassis(pb, !b_ctx_in->ovnsb_idl_txn,
> > +            b_ctx_out->if_mgr)) {
> >              return false;
> >          }
> >      } else if (can_bind == CAN_BIND_AS_ADDITIONAL) {
> > diff --git a/controller/binding.h b/controller/binding.h
> > index 1fed06674..d20659b0b 100644
> > --- a/controller/binding.h
> > +++ b/controller/binding.h
> > @@ -151,8 +151,10 @@ const struct sbrec_port_binding
> *local_binding_get_primary_pb(
> >  ofp_port_t local_binding_get_lport_ofport(const struct shash
> *local_bindings,
> >                                            const char *pb_name);
> >
> > -bool local_binding_is_up(struct shash *local_bindings, const char
> *pb_name);
> > -bool local_binding_is_down(struct shash *local_bindings, const char
> *pb_name);
> > +bool local_binding_is_up(struct shash *local_bindings, const char
> *pb_name,
> > +                         const struct sbrec_chassis *);
> > +bool local_binding_is_down(struct shash *local_bindings, const char
> *pb_name,
> > +                           const struct sbrec_chassis *);
> >  void local_binding_set_up(struct shash *local_bindings, const char
> *pb_name,
> >                            const struct sbrec_chassis *chassis_rec,
> >                            const char *ts_now_str, bool sb_readonly,
> > @@ -160,7 +162,10 @@ void local_binding_set_up(struct shash
> *local_bindings, const char *pb_name,
> >  void local_binding_set_down(struct shash *local_bindings, const char
> *pb_name,
> >                              const struct sbrec_chassis *chassis_rec,
> >                              bool sb_readonly, bool ovs_readonly);
> > -
> > +void local_binding_set_pb(struct shash *local_bindings, const char
> *pb_name,
> > +                          const struct sbrec_chassis *chassis_rec,
> > +                          struct hmap *tracked_datapaths,
> > +                          bool is_set);
> >  void binding_register_ovs_idl(struct ovsdb_idl *);
> >  void binding_run(struct binding_ctx_in *, struct binding_ctx_out *);
> >  bool binding_cleanup(struct ovsdb_idl_txn *ovnsb_idl_txn,
> > @@ -178,6 +183,10 @@ void binding_dump_local_bindings(struct
> local_binding_data *, struct ds *);
> >  bool is_additional_chassis(const struct sbrec_port_binding *pb,
> >                             const struct sbrec_chassis *chassis_rec);
> >
> > +void set_pb_chassis_in_sbrec(const struct sbrec_port_binding *pb,
> > +                             const struct sbrec_chassis *chassis_rec,
> > +                             bool is_set);
> > +
> >  /* Corresponds to each Port_Binding.type. */
> >  enum en_lport_type {
> >      LP_UNKNOWN,
> > diff --git a/controller/if-status.c b/controller/if-status.c
> > index ad61844d8..7693c289b 100644
> > --- a/controller/if-status.c
> > +++ b/controller/if-status.c
> > @@ -24,6 +24,7 @@
> >  #include "lib/util.h"
> >  #include "timeval.h"
> >  #include "openvswitch/vlog.h"
> > +#include "lib/ovn-sb-idl.h"
> >
> >  VLOG_DEFINE_THIS_MODULE(if_status);
> >
> > @@ -53,9 +54,11 @@ VLOG_DEFINE_THIS_MODULE(if_status);
> >   */
> >
> >  enum if_state {
> > -    OIF_CLAIMED,       /* Newly claimed interface. */
> > -    OIF_INSTALL_FLOWS, /* Already claimed interface for which flows are
> still
> > -                        * being installed.
> > +    OIF_CLAIMED,       /* Newly claimed interface. pb->chassis not yet
> updated.
> > +                        */
> > +    OIF_INSTALL_FLOWS, /* Claimed interface with pb->chassis
> successfully
> > +                        * updated in SB and for which flows are still
> being
> > +                        * installed.
> >                          */
> >      OIF_MARK_UP,       /* Interface with flows successfully installed
> in OVS
> >                          * but not yet marked "up" in the binding module
> (in
> > @@ -78,6 +81,55 @@ static const char *if_state_names[] = {
> >      [OIF_INSTALLED]     = "INSTALLED",
> >  };
> >
> > +/*
> > + *       +----------------------+
> > + * +---> |                      |
> > + * | +-> |         NULL         | <--------------------------------------+++-+
> > + * | |   +----------------------+                                            |
> > + * | |     ^ release_iface   | claim_iface                                   |
> > + * | |     |                 V - sbrec_update_chassis(if sb is rw)           |
> > + * | |   +----------------------+                                            |
> > + * | |   |                      | <----------------------------------------+ |
> > + * | |   |       CLAIMED        | <--------------------------------------+ | |
> > + * | |   +----------------------+                                        | | |
> > + * | |                  | mgr_update(when sb is rw)                      | | |
> > + * | | release_iface    |  - sbrec_update_chassis                        | | |
> > + * | |                  |  - request seqno                               | | |
> > + * | |                  V                                                | | |
> > + * | |   +----------------------+                                        | | |
> > + * | +-- |                      |  mgr_run(seqno not rcvd)               | | |
> > + * |     |    INSTALL_FLOWS     |   - set port down in sb                | | |
> > + * |     |                      |  mgr_update()                          | | |
> > + * |     +----------------------+   - sbrec_update_chassis if needed     | | |
> > + * |                    |                                                | | |
> > + * |                    |  mgr_run(seqno rcvd)                           | | |
> > + * |                    |  - set port up in sb                           | | |
> > + * | release_iface      |  - set ovn-installed in ovs                    | | |
> > + * |                    V                                                | | |
> > + * |   +----------------------+                                          | | |
> > + * |   |                      |  mgr_run()                               | | |
> > + * +-- |       MARK_UP        |  - set port up in sb                     | | |
> > + *     |                      |  - set ovn-installed in ovs              | | |
> > + *     |                      |  mgr_update()                            | | |
> > + *     +----------------------+  - sbrec_update_chassis if needed        | | |
> > + *              |                                                        | | |
> > + *              | mgr_update(rcvd port up / ovn_installed & chassis set) | | |
> > + *              V                                                        | | |
> > + *     +----------------------+                                          | | |
> > + *     |      INSTALLED       | ------------> claim_iface ---------------+ | |
> > + *     +----------------------+                                            | |
> > + *              |                                                          | |
> > + *              | release_iface                                            | |
> > + *              V                                                          | |
> > + *     +----------------------+                                            | |
> > + *     |                      | ------------> claim_iface -----------------+ |
> > + *     |      MARK_DOWN       | ------> mgr_update(rcvd port down) ----------+
> > + *     |                      | mgr_run()
> > + *     |                      | - set port down in sb
> > + *     |                      | mgr_update()
> > + *     +----------------------+ - sbrec_update_chassis(NULL)
> > + */
> > +
> >  struct ovs_iface {
> >      char *id;               /* Extracted from OVS
> external_ids.iface_id. */
> >      enum if_state state;    /* State of the interface in the state
> machine. */
> > @@ -85,6 +137,7 @@ struct ovs_iface {
> >                               * be fully programmed in OVS.  Only used
> in state
> >                               * OIF_INSTALL_FLOWS.
> >                               */
> > +    bool chassis_update_required;  /* If true, pb->chassis must be
> updated. */
> >  };
> >
> >  static uint64_t ifaces_usage;
> > @@ -158,14 +211,23 @@ if_status_mgr_destroy(struct if_status_mgr *mgr)
> >  }
> >
> >  void
> > -if_status_mgr_claim_iface(struct if_status_mgr *mgr, const char
> *iface_id)
> > +if_status_mgr_claim_iface(struct if_status_mgr *mgr,
> > +                          const struct sbrec_port_binding *pb,
> > +                          const struct sbrec_chassis *chassis_rec,
> > +                          bool sb_readonly)
> >  {
> > +    const char *iface_id = pb->logical_port;
> >      struct ovs_iface *iface = shash_find_data(&mgr->ifaces, iface_id);
> >
> >      if (!iface) {
> >          iface = ovs_iface_create(mgr, iface_id, OIF_CLAIMED);
> >      }
> > -
> > +    if (!sb_readonly) {
> > +        set_pb_chassis_in_sbrec(pb, chassis_rec, true);
> > +        iface->chassis_update_required = false;
> > +    } else {
> > +        iface->chassis_update_required = true;
> > +    }
> >      switch (iface->state) {
> >      case OIF_CLAIMED:
> >      case OIF_INSTALL_FLOWS:
> > @@ -182,6 +244,12 @@ if_status_mgr_claim_iface(struct if_status_mgr
> *mgr, const char *iface_id)
> >      }
> >  }
> >
> > +bool
> > +if_status_mgr_iface_is_present(struct if_status_mgr *mgr, const char
> *iface_id)
> > +{
> > +    return !!shash_find_data(&mgr->ifaces, iface_id);
> > +}
> > +
> >  void
> >  if_status_mgr_release_iface(struct if_status_mgr *mgr, const char
> *iface_id)
> >  {
> > @@ -246,9 +314,39 @@ if_status_mgr_delete_iface(struct if_status_mgr
> *mgr, const char *iface_id)
> >      }
> >  }
> >
> > +bool
> > +if_status_handle_claims(struct if_status_mgr *mgr,
> > +                        struct local_binding_data *binding_data,
> > +                        const struct sbrec_chassis *chassis_rec,
> > +                        struct hmap *tracked_datapath,
> > +                        bool sb_readonly)
> > +{
> > +    if (!binding_data || sb_readonly) {
> > +        return false;
> > +    }
> > +
> > +    struct shash *bindings = &binding_data->bindings;
> > +    struct hmapx_node *node;
> > +
> > +    bool rc = false;
> > +    HMAPX_FOR_EACH (node, &mgr->ifaces_per_state[OIF_CLAIMED]) {
> > +        struct ovs_iface *iface = node->data;
> > +        if (iface->chassis_update_required) {
>
> Thanks Xavier for the revision. The state machine looks clearer now,
> but I have a major concern about the use of chassis_update_required. This
> bool flag is used to decide whether an update to SB is needed; once an SB
> update is requested, the flag is set to false, on the assumption that the
> SB update will succeed immediately. However, that assumption may be wrong:
> the subsequent SB update can fail or be delayed for various reasons, so
> this flag is not reliable. Instead, the CLAIMED state should be responsible
> for making sure the SB update is completed. While a transaction is in
> progress, sb_readonly is true, so if sb_readonly is false, nothing is in
> progress. We can therefore always check if (!sb_readonly && <SB chassis is
> not updated for the port-binding>) and, if so, just send the update,
> regardless of whether we have requested it before. Please also see another
> comment below for the state transition.
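
A minimal sketch of the stateless check suggested above, using simplified
stand-in types. The names sketch_chassis, sketch_port_binding and
sketch_claimed_needs_update are hypothetical and are not part of the patch or
of the sbrec_* API. The point is that the decision depends only on
sb_readonly and on the chassis currently seen in the row, not on a remembered
flag:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the SB IDL rows; not the real sbrec_* types. */
struct sketch_chassis {
    const char *name;
};

struct sketch_port_binding {
    const struct sketch_chassis *chassis;   /* NULL if unclaimed. */
};

/* Returns true if an update of pb->chassis should be (re)sent now.
 *
 * No "already requested" state is kept: as long as the SB is writable and
 * the row does not point to our chassis, the update is sent again.  This
 * covers the case where an earlier request failed or was delayed. */
static bool
sketch_claimed_needs_update(const struct sketch_port_binding *pb,
                            const struct sketch_chassis *chassis_rec,
                            bool sb_readonly)
{
    assert(pb && chassis_rec);
    return !sb_readonly && pb->chassis != chassis_rec;
}
```

With such a helper, an interface in CLAIMED state would be re-examined on
every iteration in which SB is writable, instead of only once.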
>
> > +            VLOG_INFO("if_status_handle_claims for %s", iface->id);
> > +            local_binding_set_pb(bindings, iface->id, chassis_rec,
> > +                                 tracked_datapath, true);
> > +            rc = true;
> > +        }
> > +        iface->chassis_update_required = false;
> > +    }
> > +    return rc;
> > +}
> > +
> >  void
> >  if_status_mgr_update(struct if_status_mgr *mgr,
> > -                     struct local_binding_data *binding_data)
> > +                     struct local_binding_data *binding_data,
> > +                     const struct sbrec_chassis *chassis_rec,
> > +                     bool sb_readonly)
> >  {
> >      if (!binding_data) {
> >          return;
> > @@ -257,13 +355,25 @@ if_status_mgr_update(struct if_status_mgr *mgr,
> >      struct shash *bindings = &binding_data->bindings;
> >      struct hmapx_node *node;
> >
> > +    /* Interfaces in OIF_MARK_UP state have already set their
> pb->chassis.
> > +     * However, it might have been reset by another hv.
> > +     */
> >      /* Move all interfaces that have been confirmed "up" by the binding
> module,
> >       * from OIF_MARK_UP to OIF_INSTALLED.
> >       */
> >      HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_MARK_UP]) {
> >          struct ovs_iface *iface = node->data;
> >
> > -        if (local_binding_is_up(bindings, iface->id)) {
> > +        if (iface->chassis_update_required) {
> > +            if (!sb_readonly) {
> > +                iface->chassis_update_required = false;
> > +                local_binding_set_pb(bindings, iface->id, chassis_rec,
> > +                                     NULL, true);
> > +            } else {
> > +                continue;
> > +            }
> > +        }
> > +        if (local_binding_is_up(bindings, iface->id, chassis_rec)) {
> >              ovs_iface_set_state(mgr, iface, OIF_INSTALLED);
> >          }
> >      }
> > @@ -274,23 +384,53 @@ if_status_mgr_update(struct if_status_mgr *mgr,
> >      HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_MARK_DOWN]) {
> >          struct ovs_iface *iface = node->data;
> >
> > -        if (local_binding_is_down(bindings, iface->id)) {
> > +        if (!sb_readonly) {
> > +            local_binding_set_pb(bindings, iface->id, chassis_rec,
> > +                                 NULL, false);
> > +        }
> > +        if (local_binding_is_down(bindings, iface->id, chassis_rec)) {
> >              ovs_iface_destroy(mgr, iface);
> >          }
> >      }
> >
> > -    /* Register for a notification about flows being installed in OVS
> for all
> > -     * newly claimed interfaces.
> > +    if (!sb_readonly) {
> > +        HMAPX_FOR_EACH_SAFE (node,
> &mgr->ifaces_per_state[OIF_INSTALL_FLOWS]) {
> > +            struct ovs_iface *iface = node->data;
> > +
> > +            if (iface->chassis_update_required) {
> > +                iface->chassis_update_required = false;
> > +                local_binding_set_pb(bindings, iface->id, chassis_rec,
> > +                                     NULL, true);
> > +            }
> > +        }
> > +    }
> > +
> > +    /* Update Port_Binding->chassis for newly claimed interfaces
> > +     * Register for a notification about flows being installed in OVS
> for all
> > +     * newly claimed interfaces for which we could update pb->chassis.
> >       *
> >       * Move them from OIF_CLAIMED to OIF_INSTALL_FLOWS.
> >       */
> > -    bool new_ifaces = false;
> > -    HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_CLAIMED]) {
> > -        struct ovs_iface *iface = node->data;
> >
> > -        ovs_iface_set_state(mgr, iface, OIF_INSTALL_FLOWS);
> > -        iface->install_seqno = mgr->iface_seqno + 1;
> > -        new_ifaces = true;
> > +    bool new_ifaces = false;
> > +    if (!sb_readonly) {
> > +        HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_CLAIMED])
> {
> > +            struct ovs_iface *iface = node->data;
> > +            /* No need to check for chassis_update_required as already
> done
> > +             * in if_status_handle_claims or if_status_mgr_claim_iface
> > +             */
> > +            ovs_iface_set_state(mgr, iface, OIF_INSTALL_FLOWS);
>
> We need to make sure the chassis in the SB port-binding is up to date (i.e.
> the update notification from the SB DB has been received) before moving to
> INSTALL_FLOWS. Otherwise, it is still possible that the state is moved too
> early, ending up with incomplete flow installation for the lport when the
> state is finally moved to INSTALLED.
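
The transition condition described above can be sketched as follows. This is
an illustration only: the sk_* types and sk_next_state() are hypothetical, and
the sketch assumes that the chassis field of the stand-in row reflects what
the SB DB has confirmed, which is an assumption about the data source rather
than something the patch guarantees:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced version of the if-status states. */
enum sk_state {
    SK_CLAIMED,
    SK_INSTALL_FLOWS,
};

/* Hypothetical stand-ins for the SB rows; not the real sbrec_* types. */
struct sk_chassis {
    const char *name;
};

struct sk_port_binding {
    /* Assumed to hold the chassis as confirmed by the SB DB. */
    const struct sk_chassis *confirmed_chassis;
};

/* Leaves CLAIMED only once the SB row is confirmed to point to our chassis.
 * Any other state is returned unchanged. */
static enum sk_state
sk_next_state(enum sk_state cur, const struct sk_port_binding *pb,
              const struct sk_chassis *chassis_rec)
{
    assert(pb && chassis_rec);
    if (cur == SK_CLAIMED && pb->confirmed_chassis == chassis_rec) {
        return SK_INSTALL_FLOWS;
    }
    return cur;
}
```

Under that assumption, an interface stays in CLAIMED across iterations until
the notification arrives, so the seqno for flow installation is requested
only after the chassis update is visible.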
>
> Thanks,
> Han
>
> > +            iface->install_seqno = mgr->iface_seqno + 1;
> > +            new_ifaces = true;
> > +        }
> > +    } else {
> > +        HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_CLAIMED])
> {
> > +            struct ovs_iface *iface = node->data;
> > +            static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5,
> 1);
> > +            VLOG_INFO_RL(&rl,
> > +                         "Not updating pb chassis for %s now as "
> > +                         "sb is readonly", iface->id);
> > +        }
> >      }
> >
> >      /* Request a seqno update when the flows for new interfaces have
> been
> > @@ -403,7 +543,7 @@ if_status_mgr_update_bindings(struct if_status_mgr
> *mgr,
> >      struct hmapx_node *node;
> >
> >      /* Notify the binding module to set "down" all bindings that are
> still
> > -     * in the process of being installed in OVS, i.e., are not yet
> instsalled.
> > +     * in the process of being installed in OVS, i.e., are not yet
> installed.
> >       */
> >      HMAPX_FOR_EACH (node, &mgr->ifaces_per_state[OIF_INSTALL_FLOWS]) {
> >          struct ovs_iface *iface = node->data;
> > diff --git a/controller/if-status.h b/controller/if-status.h
> > index bb8a3950d..f9b05d30d 100644
> > --- a/controller/if-status.h
> > +++ b/controller/if-status.h
> > @@ -27,15 +27,27 @@ struct if_status_mgr *if_status_mgr_create(void);
> >  void if_status_mgr_clear(struct if_status_mgr *);
> >  void if_status_mgr_destroy(struct if_status_mgr *);
> >
> > -void if_status_mgr_claim_iface(struct if_status_mgr *, const char
> *iface_id);
> > +void if_status_mgr_claim_iface(struct if_status_mgr *,
> > +                               const struct sbrec_port_binding *pb,
> > +                               const struct sbrec_chassis *chassis_rec,
> > +                               bool sb_readonly);
> >  void if_status_mgr_release_iface(struct if_status_mgr *, const char
> *iface_id);
> >  void if_status_mgr_delete_iface(struct if_status_mgr *, const char
> *iface_id);
> >
> > -void if_status_mgr_update(struct if_status_mgr *, struct
> local_binding_data *);
> > +void if_status_mgr_update(struct if_status_mgr *, struct
> local_binding_data *,
> > +                          const struct sbrec_chassis *chassis,
> > +                          bool sb_readonly);
> >  void if_status_mgr_run(struct if_status_mgr *mgr, struct
> local_binding_data *,
> >                         const struct sbrec_chassis *,
> >                         bool sb_readonly, bool ovs_readonly);
> >  void if_status_mgr_get_memory_usage(struct if_status_mgr *mgr,
> >                                      struct simap *usage);
> > +bool if_status_mgr_iface_is_present(struct if_status_mgr *mgr,
> > +                                    const char *iface_id);
> > +bool if_status_handle_claims(struct if_status_mgr *mgr,
> > +                             struct local_binding_data *binding_data,
> > +                             const struct sbrec_chassis *chassis_rec,
> > +                             struct hmap *tracked_datapath,
> > +                             bool sb_readonly);
> >
> >  # endif /* controller/if-status.h */
> > diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
> > index 69615308e..3947baf03 100644
> > --- a/controller/ovn-controller.c
> > +++ b/controller/ovn-controller.c
> > @@ -1464,6 +1464,73 @@ en_runtime_data_run(struct engine_node *node,
> void *data)
> >      engine_set_node_state(node, EN_UPDATED);
> >  }
> >
> > +struct ed_type_sb_ro {
> > +    bool sb_readonly;
> > +};
> > +
> > +static void *
> > +en_sb_ro_init(struct engine_node *node OVS_UNUSED,
> > +              struct engine_arg *arg OVS_UNUSED)
> > +{
> > +    struct ed_type_sb_ro *data = xzalloc(sizeof *data);
> > +    return data;
> > +}
> > +
> > +static void
> > +en_sb_ro_run(struct engine_node *node, void *data)
> > +{
> > +    struct ed_type_sb_ro *sb_ro_data = data;
> > +    bool sb_readonly = !engine_get_context()->ovnsb_idl_txn;
> > +    if (sb_ro_data->sb_readonly != sb_readonly) {
> > +        sb_ro_data->sb_readonly = sb_readonly;
> > +        if (!sb_ro_data->sb_readonly) {
> > +            engine_set_node_state(node, EN_UPDATED);
> > +        }
> > +    }
> > +}
> > +
> > +static void
> > +en_sb_ro_cleanup(void *data OVS_UNUSED)
> > +{
> > +}
> > +
> > +static bool
> > +runtime_data_sb_ro_handler(struct engine_node *node, void *data)
> > +{
> > +    const struct sbrec_chassis *chassis = NULL;
> > +
> > +    struct ovsrec_open_vswitch_table *ovs_table =
> > +        (struct ovsrec_open_vswitch_table *)EN_OVSDB_GET(
> > +            engine_get_input("OVS_open_vswitch", node));
> > +
> > +    const char *chassis_id = get_ovs_chassis_id(ovs_table);
> > +
> > +    struct ovsdb_idl_index *sbrec_chassis_by_name =
> > +        engine_ovsdb_node_get_index(
> > +                engine_get_input("SB_chassis", node),
> > +                "name");
> > +
> > +    if (chassis_id) {
> > +        chassis = chassis_lookup_by_name(sbrec_chassis_by_name,
> chassis_id);
> > +    }
> > +    if (chassis) {
> > +        struct ed_type_runtime_data *rt_data = data;
> > +        bool sb_readonly = !engine_get_context()->ovnsb_idl_txn;
> > +        struct controller_engine_ctx *ctrl_ctx =
> > +            engine_get_context()->client_ctx;
> > +
> > +        if (if_status_handle_claims(ctrl_ctx->if_mgr,
> > +                                    &rt_data->lbinding_data,
> > +                                    chassis,
> > +                                    &rt_data->tracked_dp_bindings,
> > +                                    sb_readonly)) {
> > +            engine_set_node_state(node, EN_UPDATED);
> > +            rt_data->tracked = true;
> > +        }
> > +    }
> > +    return true;
> > +}
> > +
> >  static bool
> >  runtime_data_ovs_interface_shadow_handler(struct engine_node *node,
> void *data)
> >  {
> > @@ -3528,6 +3595,7 @@ main(int argc, char *argv[])
> >      stopwatch_create(VIF_PLUG_RUN_STOPWATCH_NAME, SW_MS);
> >
> >      /* Define inc-proc-engine nodes. */
> > +    ENGINE_NODE(sb_ro, "sb_ro");
> >      ENGINE_NODE_WITH_CLEAR_TRACK_DATA_IS_VALID(ct_zones, "ct_zones");
> >      ENGINE_NODE_WITH_CLEAR_TRACK_DATA(ovs_interface_shadow,
> >                                        "ovs_interface_shadow");
> > @@ -3664,6 +3732,7 @@ main(int argc, char *argv[])
> >      engine_add_input(&en_ovs_interface_shadow, &en_ovs_interface,
> >                       ovs_interface_shadow_ovs_interface_handler);
> >
> > +    engine_add_input(&en_runtime_data, &en_sb_ro, runtime_data_sb_ro_handler);
> >      engine_add_input(&en_runtime_data, &en_ofctrl_is_connected, NULL);
> >
> >      engine_add_input(&en_runtime_data, &en_ovs_open_vswitch, NULL);
> > @@ -4098,7 +4167,8 @@ main(int argc, char *argv[])
> >                          runtime_data ? &runtime_data->lbinding_data : NULL;
> >                      stopwatch_start(IF_STATUS_MGR_UPDATE_STOPWATCH_NAME,
> >                                      time_msec());
> > -                    if_status_mgr_update(if_mgr, binding_data);
> > +                    if_status_mgr_update(if_mgr, binding_data, chassis,
> > +                                         !ovnsb_idl_txn);
> >                      stopwatch_stop(IF_STATUS_MGR_UPDATE_STOPWATCH_NAME,
> >                                     time_msec());
> >
> > diff --git a/tests/ovn-macros.at b/tests/ovn-macros.at
> > index 335f9158c..8fd6ae6f7 100644
> > --- a/tests/ovn-macros.at
> > +++ b/tests/ovn-macros.at
> > @@ -759,3 +759,15 @@ m4_define([OVN_FOR_EACH_NORTHD_WITHOUT_DP_GROUPS],
> >       [m4_foreach([NORTHD_USE_DP_GROUPS], [no],
> >         [m4_foreach([NORTHD_USE_PARALLELIZATION], [yes, no], [$1
> >  ])])])])
> > +
> > +# OVN_NBCTL(NBCTL_COMMAND) adds NBCTL_COMMAND to list of commands to be run by RUN_OVN_NBCTL().
> > +m4_define([OVN_NBCTL], [
> > +    command="${command} -- $1"
> > +])
> > +
> > +# RUN_OVN_NBCTL() executes list of commands built by the OVN_NBCTL() macro.
> > +m4_define([RUN_OVN_NBCTL], [
> > +    check ovn-nbctl ${command}
> > +    unset command
> > +])
> > +
> > diff --git a/tests/ovn.at b/tests/ovn.at
> > index bfaa41962..94d16bac9 100644
> > --- a/tests/ovn.at
> > +++ b/tests/ovn.at
> > @@ -102,6 +102,18 @@ m4_divert_text([PREPARE_TESTS],
> >           test 1 -le $(as $hv1 ovs-ofctl dump-flows br-int | grep -c "output:$ofport")
> >       ])
> >     }
> > +
> > +   ovn_wait_remote_input_flows () {
> > +     hv1=$1
> > +     hv2=$2
> > +     echo "$3: waiting for flows for remote input on $hv1"
> > +     # Wait for a flow outputting to remote input
> > +     OVS_WAIT_UNTIL([
> > +         ofport=$(as $hv1 ovs-vsctl --bare --columns ofport find Interface name=ovn-${hv2}-0)
> > +         echo "tunnel port=$ofport"
> > +         test 1 -le $(as $hv1 ovs-ofctl dump-flows br-int | grep -c "in_port=$ofport")
> > +     ])
> > +   }
> >  ])
> >
> >  m4_define([OVN_CHECK_PACKETS],
> > @@ -127,6 +139,8 @@ m4_define([OVN_WAIT_PATCH_PORT_FLOWS],
> >  m4_define([OVN_WAIT_REMOTE_OUTPUT_FLOWS],
> >    [ovn_wait_remote_output_flows "$1" "$2" "__file__:__line__"])
> >
> > +m4_define([OVN_WAIT_REMOTE_INPUT_FLOWS],
> > +  [ovn_wait_remote_input_flows "$1" "$2" "__file__:__line__"])
> >
> >  AT_BANNER([OVN components])
> >
> > @@ -14056,6 +14070,11 @@ wait_column "$hv1_uuid" Port_Binding requested_chassis logical_port=lsp0
> >  wait_column "$hv2_uuid" Port_Binding additional_chassis logical_port=lsp0
> >  wait_column "$hv2_uuid" Port_Binding requested_additional_chassis logical_port=lsp0
> >
> > +# Check ovn-installed updated for main chassis
> > +wait_for_ports_up
> > +OVS_WAIT_UNTIL([test `as hv1 ovs-vsctl get Interface lsp0 external_ids:ovn-installed` = '"true"'])
> > +OVS_WAIT_UNTIL([test x`as hv2 ovs-vsctl get Interface lsp0 external_ids:ovn-installed` = x])
> > +
> >  # Check that setting iface:encap-ip populates Port_Binding:additional_encap
> >  wait_row_count Encap 2 chassis_name=hv1
> >  wait_row_count Encap 2 chassis_name=hv2
> > @@ -14081,6 +14100,11 @@ wait_column "$hv2_uuid" Port_Binding requested_chassis logical_port=lsp0
> >  wait_column "" Port_Binding additional_chassis logical_port=lsp0
> >  wait_column "" Port_Binding requested_additional_chassis logical_port=lsp0
> >
> > +# Check ovn-installed updated for main chassis and not for other chassis
> > +wait_for_ports_up
> > +OVS_WAIT_UNTIL([test `as hv2 ovs-vsctl get Interface lsp0 external_ids:ovn-installed` = '"true"'])
> > +OVS_WAIT_UNTIL([test x`as hv1 ovs-vsctl get Interface lsp0 external_ids:ovn-installed` = x])
> > +
> >  # Check that additional_encap is cleared
> >  wait_column "" Port_Binding additional_encap logical_port=lsp0
> >
> > @@ -15370,7 +15394,10 @@ AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=65 | grep actions=output:1],
> >  echo "verifying that lsp0 binding moves when requested-chassis is changed"
> >
> >  ovn-nbctl lsp-set-options lsp0 requested-chassis=hv2
> > -OVS_WAIT_UNTIL([test 1 = $(grep -c "Releasing lport lsp0 from this chassis" hv1/ovn-controller.log)])
> > +
> > +# We might see multiple "Releasing lport ...", when sb is read only
> > +OVS_WAIT_UNTIL([test 1 -le $(grep -c "Releasing lport lsp0 from this chassis" hv1/ovn-controller.log)])
> > +
> >  wait_column "$hv2_uuid" Port_Binding chassis logical_port=lsp0
> >
> >  # (6) Chassis hv2 should add flows and hv1 should not.
> > @@ -15416,7 +15443,7 @@ AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=0 | grep in_port=1], [0], [ig
> >  AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=65 | grep actions=output:1], [0], [ignore])
> >
> >  check ovn-nbctl --wait=hv lsp-set-options lsp0 requested-chassis=non-existant-chassis
> > -OVS_WAIT_UNTIL([test 1 = $(grep -c "Releasing lport lsp0 from this chassis" hv1/ovn-controller.log)])
> > +OVS_WAIT_UNTIL([test 1 -le $(grep -c "Releasing lport lsp0 from this chassis" hv1/ovn-controller.log)])
> >  check ovn-nbctl --wait=hv sync
> >  wait_column '' Port_Binding chasssi logical_port=lsp0
> >  AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=0 | grep in_port=1], [1], [])
> > @@ -32418,3 +32445,119 @@ AT_CHECK([test $(ovn-sbctl list fdb | grep -c "00:00:00:00:10:30") = 0])
> >  OVN_CLEANUP([hv1])
> >  AT_CLEANUP
> >  ])
> > +
> > +OVN_FOR_EACH_NORTHD([
> > +AT_SETUP([recomputes])
> > +ovn_start
> > +
> > +n_hv=4
> > +
> > +# Add chassis
> > +net_add n1
> > +for i in $(seq 1 $n_hv); do
> > +    sim_add hv$i
> > +    as hv$i
> > +    check ovs-vsctl add-br br-phys
> > +    ovs-vsctl set open . external-ids:ovn-bridge-mappings=phys:br-phys
> > +    ovn_attach n1 br-phys 192.168.0.$i 24 geneve
> > +done
> > +
> > +add_switch_ports() {
> > +    start_port=$1
> > +    end_port=$2
> > +    nb_hv=$3
> > +    bulk_size=$4
> > +    for ((i=start_port; i<end_port; )) do
> > +        start_bulk=$i
> > +        for hv in $(seq 1 $nb_hv); do
> > +            end_bulk=$((start_bulk+bulk_size-1))
> > +            for port in $(seq $start_bulk $end_bulk); do
> > +                logical_switch_port=lsp${port}
> > +                OVN_NBCTL(lsp-add ls1 $logical_switch_port)
> > +                OVN_NBCTL(lsp-set-addresses $logical_switch_port dynamic)
> > +            done
> > +            start_bulk=$((end_bulk+1))
> > +        done
> > +        RUN_OVN_NBCTL()
> > +
> > +        start_bulk=$i
> > +        for hv in $(seq 1 $nb_hv); do
> > +            end_bulk=$((start_bulk+bulk_size-1))
> > +            for port in $(seq $start_bulk $end_bulk); do
> > +                logical_switch_port=lsp${port}
> > +                as hv$hv ovs-vsctl \
> > +                    --no-wait -- add-port br-int vif${port} \
> > +                    -- set Interface vif${port} external_ids:iface-id=$logical_switch_port
> > +            done
> > +            start_bulk=$((end_bulk+1))
> > +        done
> > +        i=$((end_bulk+1))
> > +    done
> > +}
> > +check ovn-nbctl ls-add ls1
> > +check ovn-nbctl set Logical_Switch ls1 other_config:subnet=10.1.0.0/16
> > +check ovn-nbctl set Logical_Switch ls1 other_config:exclude_ips=10.1.255.254
> > +
> > +check ovn-nbctl lr-add lr1
> > +check ovn-nbctl lsp-add ls1 lsp0 -- set Logical_Switch_Port lsp0 type=router options:router-port=lrp0 addresses=dynamic
> > +check ovn-nbctl lrp-add lr1 lrp0 "f0:00:00:01:00:01" 10.1.255.254/16
> > +check ovn-nbctl lr-nat-add lr1 snat 10.2.0.1 10.1.0.0/16
> > +
> > +lflow_run=0
> > +check ovn-nbctl --wait=hv sync
> > +
> > +# Tunnel ports might not be added (yet) at this point on slow system.
> > +# Wait for flows related to such ports to ensure those ports have been added
> > +# before we measure recomputes. Otherwise, ovs_interface handler might be run
> > +# afterwards for tunnel ports, causing recomputes.
> > +for i in $(seq 1 $n_hv); do
> > +    for j in $(seq 1 $n_hv); do
> > +        if test $i != $j; then
> > +            OVN_WAIT_REMOTE_INPUT_FLOWS(["hv$i"],["hv$j"])
> > +        fi
> > +    done
> > +done
> > +
> > +for i in $(seq 1 $n_hv); do
> > +    as hv$i
> > +    lflow_run1=$(ovn-appctl -t ovn-controller coverage/read-counter lflow_run)
> > +    lflow_run=`expr $lflow_run1 + $lflow_run`
> > +done
> > +
> > +add_switch_ports 1 1000 $n_hv 5
> > +
> > +wait_for_ports_up
> > +check ovn-nbctl --wait=hv sync
> > +
> > +for i in $(seq 1 $n_hv); do
> > +    pid=$(cat hv${i}/ovn-controller.pid)
> > +    u=$((u+$(cat "/proc/$pid/stat" | awk '{print $14}')))
> > +    s=$((s+$(cat "/proc/$pid/stat" | awk '{print $15}')))
> > +done
> > +
> > +n_pid=$(cat northd/ovn-northd.pid)
> > +n_u=$(cat "/proc/$n_pid/stat" | awk '{print $14}')
> > +n_s=$(cat "/proc/$n_pid/stat" | awk '{print $15}')
> > +
> > +echo "Total Northd User Time: $n_u"
> > +echo "Total Northd System Time: $n_s"
> > +echo "Total Controller User Time: $u"
> > +echo "Total Controller System Time: $s"
> > +
> > +lflow_run_end=0
> > +for i in $(seq 1 $n_hv); do
> > +    as hv$i
> > +    lflow_run1=$(ovn-appctl -t ovn-controller coverage/read-counter lflow_run)
> > +    lflow_run_end=`expr $lflow_run1 + $lflow_run_end`
> > +done
> > +n_recomputes=`expr $lflow_run_end - $lflow_run`
> > +echo "$n_recomputes recomputes"
> > +
> > +AT_CHECK([test $lflow_run_end == $lflow_run])
> > +
> > +for i in $(seq 2 $n_hv); do
> > +    OVN_CLEANUP_SBOX([hv$i])
> > +done
> > +OVN_CLEANUP([hv1])
> > +AT_CLEANUP
> > +])
> > diff --git a/tests/perf-northd.at b/tests/perf-northd.at
> > index 74b69e9d4..6ec196b36 100644
> > --- a/tests/perf-northd.at
> > +++ b/tests/perf-northd.at
> > @@ -76,23 +76,6 @@ m4_define([PERF_RECORD_STOP], [
> >      PERF_RECORD_STOPWATCH(ovn-northd-loop, ["Short term average"], [Average (northd-loop in msec)])
> >  ])
> >
> > -# OVN_NBCTL([NBCTL_COMMAND])
> > -#
> > -# Add NBCTL_COMMAND to list of commands to be run by RUN_OVN_NBCTL().
> > -#
> > -m4_define([OVN_NBCTL], [
> > -    command="${command} -- $1"
> > -])
> > -
> > -# RUN_OVN_NBCTL()
> > -#
> > -# Execute list of commands built by the OVN_NBCTL() macro.
> > -#
> > -m4_define([RUN_OVN_NBCTL], [
> > -    check ovn-nbctl ${command}
> > -    unset command
> > -])
> > -
> >  OVS_START_SHELL_HELPERS
> >  generate_subnet () {
> >      local a=$(printf %d $(expr $1 / 256 + 10))
> > --
> > 2.31.1
> >
>
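
The core idea of the patch quoted above can be sketched in isolation: a read-only SB defers the Port_Binding chassis write instead of failing incremental processing, and a later pass writes the deferred claims once SB is writable again. This is a simplified sketch, not the patch's code; `sketch_port`, `sketch_claim` and `sketch_handle_claims` are hypothetical names, and the real logic lives in if-status.c and runtime_data_sb_ro_handler().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one locally bound port. */
struct sketch_port {
    bool claim_pending;  /* Chassis update still has to be written to SB. */
    bool chassis_set;    /* Stand-in for pb->chassis == our chassis. */
};

/* Claims a port.  A read-only SB only defers the write; it never fails,
 * so no recompute is requested. */
void
sketch_claim(struct sketch_port *port, bool sb_readonly)
{
    assert(port);
    if (sb_readonly) {
        port->claim_pending = true;
    } else {
        port->chassis_set = true;
        port->claim_pending = false;
    }
}

/* Writes every deferred claim once SB is writable.  Returns the number of
 * ports written; a caller could mark its engine node as updated when the
 * result is non-zero. */
size_t
sketch_handle_claims(struct sketch_port *ports, size_t n, bool sb_readonly)
{
    size_t written = 0;

    if (sb_readonly) {
        return 0;
    }
    for (size_t i = 0; i < n; i++) {
        if (ports[i].claim_pending) {
            ports[i].chassis_set = true;
            ports[i].claim_pending = false;
            written++;
        }
    }
    return written;
}
```

In this sketch the second function plays the role that if_status_handle_claims() plays in the patch: it is called whenever the "sb_ro" input changes, and does nothing while SB is still read-only.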
Dumitru Ceara July 11, 2022, 11:55 a.m. UTC | #4
On 7/11/22 13:31, Xavier Simonart wrote:
> Hi Han
> 
> Thanks for your review.
> 
> Let me try to understand your two main concerns and the proper way to
> address them.
> 1) We only try once to write pb->chassis. If the commit fails, pb->chassis
> is not written. As commit fails, we will recompute, but as the
> update_required flag is not set anymore, we might end up with no
> pb->chassis.
> => I'll remove the flag and try to update until it's confirmed.
> 2) The state machine, and when we move to INSTALL_FLOWS. Serializing the
> state machine, by waiting for confirmation to be received before moving to
> INSTALL_FLOWS state will delay the ovn-installed compared to today. So I am
> (still) trying to see if there is any way to prevent this in some cases.
> Would it be correct to do this serialization (wait for pb->chassis update
> confirmation) only when using conditional monitoring? When using
> monitor-all, as soon as we have written (w/o confirmation) pb->chassis, we
> would move to INSTALL_FLOWS. In that loop where we wrote pb->chassis, (all)
> the flows should be updated taking into account pb->chassis.

I think this approach is OK.  ovn-controllers that don't use conditional
monitoring already know the complete SB contents and should be able to
install (mostly?) complete sets of OpenFlow flows that correspond to a
given Port_Binding.

It's also "CMS-friendly", at least for ovn-kubernetes, which uses
ovn-monitor-all=true and only waits for
OVS.Interface.external_ids:ovn-installed=true.  And it shouldn't impact
the other CMSs, which wait for SB.Port_Binding.up=true.
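
A minimal sketch of the rule being agreed on here, assuming simplified state: with monitor-all the port may leave CLAIMED as soon as pb->chassis has been written locally, while with conditional monitoring it waits until the SB update is confirmed, and the write is retried until then. All names below (`sketch_claim`, `sketch_step`, the `SK_*` states) are hypothetical and are not the identifiers used in the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced state set; the patch's real enum is 'enum if_state'. */
enum sketch_state {
    SK_CLAIMED,
    SK_INSTALL_FLOWS,
};

/* Hypothetical per-port view of a claim. */
struct sketch_claim {
    enum sketch_state state;
    bool chassis_written;    /* pb->chassis set in a local SB transaction. */
    bool chassis_confirmed;  /* SB update carrying our chassis was received. */
};

/* Returns true if the port may leave CLAIMED. */
bool
sketch_can_install_flows(const struct sketch_claim *c, bool monitor_all)
{
    assert(c);
    if (c->state != SK_CLAIMED) {
        return false;
    }
    /* monitor-all: a local write is enough.  Conditional monitoring: wait
     * for the confirmation from SB. */
    return monitor_all ? c->chassis_written : c->chassis_confirmed;
}

/* One iteration.  Returns true if pb->chassis was (re)written; the write is
 * retried on every iteration until the update is confirmed. */
bool
sketch_step(struct sketch_claim *c, bool monitor_all, bool sb_readonly)
{
    assert(c);
    bool need_write = !c->chassis_confirmed && !sb_readonly;

    if (need_write) {
        c->chassis_written = true;
    }
    if (sketch_can_install_flows(c, monitor_all)) {
        c->state = SK_INSTALL_FLOWS;
    }
    return need_write;
}
```

The sketch does not model a failed commit; it only shows why monitor-all keeps ovn-installed as early as today while conditional monitoring pays one extra round trip.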

> 
> Thanks again for your feedback
> 
> Xavier
> 
> On Thu, Jul 7, 2022 at 8:49 AM Han Zhou <hzhou@ovn.org> wrote:
> 
>>
>>
>> On Wed, Jun 22, 2022 at 3:23 AM Xavier Simonart <xsimonar@redhat.com>
>> wrote:
>>>
>>> When VIF ports are claimed on a chassis, SBDB Port_Binding table is updated.
>>> If the SBDB IDL is still read-only ("in transaction") when such an update
>>> is required, the update is not possible and recompute is triggered through
>>> I+P failure.
>>>
>>> This situation can happen:
>>> - after updating Port_Binding->chassis to SBDB for one port, in a following
>>>   iteration, ovn-controller handles Interface:external_ids:ovn-installed
>>>   (for the same port) while SBDB is still read-only.
>>> - after updating Port_Binding->chassis to SBDB for one port, in a following
>>>   iteration, ovn-controller updates Port_Binding->chassis for another port,
>>>   while SBDB is still read-only.
>>>
>>> This patch prevents the recompute, by having the if-status module
>>> update the Port_Binding chassis (if needed) when possible.
>>> This does not delay Port_Binding chassis update compared to before this patch.
>>> - With the patch, Port_Binding chassis will be updated as soon as SBDB is
>>> again writable, without recompute.
>>> - Without the patch, Port_Binding chassis was updated as soon as SBDB was
>>> again writable, through a recompute.
>>>
>>> As part of this patch, ovn-installed will not be updated for additional chassis;
>>> it will only be updated when the migration is completed.
>>>
>>> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2059253
>>> Signed-off-by: Xavier Simonart <xsimonar@redhat.com>
>>>
>>> ---
>>> v2:  - handled Dumitru's comments.
>>>      - handled Han's comments, mainly ensure we moved out of CLAIMED state
>>>        only after updating pb->chassis to guarantee physical flows are installed
>>>        when ovn-installed is updated in OVS.
>>>      - slightly reorganize the code to isolate 'notify_up = false' cases in
>>>        claim_port (i.e. ports such as virtual ports), in the idea of making
>>>        future patch preventing recomputes when virtual ports are claimed.
>>>      - updated test case to cause more race conditions.
>>>      - rebased on origin/main
>>>      - note that "additional chassis" as now supported by
>>>        "Support LSP:options:requested-chassis as a list" might still cause
>>>        recomputes.
>>>      - fixed missing flows when Port_Binding chassis was updated by mgr_update
>>>        w/o any lflow recalculation.
>>> v3:  - handled Dumitru's comments on v2, mainly have runtime_data handler
>>>        handling pb_claims when sb becomes writable (instead of a lflow handler).
>>>      - fixed test as it was not checking recomputes on all hv, as well as a flaky
>>>        behavior.
>>>      - rebased on origin/main.
>>>      - rebased on origin/main.
>>> ---
>>>  controller/binding.c        | 154 +++++++++++++++++++++----------
>>>  controller/binding.h        |  15 +++-
>>>  controller/if-status.c      | 174 ++++++++++++++++++++++++++++++++----
>>>  controller/if-status.h      |  16 +++-
>>>  controller/ovn-controller.c |  72 ++++++++++++++-
>>>  tests/ovn-macros.at         |  12 +++
>>>  tests/ovn.at                | 147 +++++++++++++++++++++++++++++-
>>>  tests/perf-northd.at        |  17 ----
>>>  8 files changed, 519 insertions(+), 88 deletions(-)
>>>
>>> diff --git a/controller/binding.c b/controller/binding.c
>>> index 2279570f9..b21577f71 100644
>>> --- a/controller/binding.c
>>> +++ b/controller/binding.c
>>> @@ -644,11 +644,17 @@ local_binding_get_lport_ofport(const struct shash *local_bindings,
>>>  }
>>>
>>>  bool
>>> -local_binding_is_up(struct shash *local_bindings, const char *pb_name)
>>> +local_binding_is_up(struct shash *local_bindings, const char *pb_name,
>>> +                    const struct sbrec_chassis *chassis_rec)
>>>  {
>>>      struct local_binding *lbinding =
>>>          local_binding_find(local_bindings, pb_name);
>>>      struct binding_lport *b_lport = local_binding_get_primary_lport(lbinding);
>>> +
>>> +    if (b_lport && b_lport->pb->chassis != chassis_rec) {
>>> +        return false;
>>> +    }
>>> +
>>>      if (lbinding && b_lport && lbinding->iface) {
>>>          if (b_lport->pb->n_up && !b_lport->pb->up[0]) {
>>>              return false;
>>> @@ -660,13 +666,23 @@ local_binding_is_up(struct shash *local_bindings, const char *pb_name)
>>>  }
>>>
>>>  bool
>>> -local_binding_is_down(struct shash *local_bindings, const char *pb_name)
>>> +local_binding_is_down(struct shash *local_bindings, const char *pb_name,
>>> +                      const struct sbrec_chassis *chassis_rec)
>>>  {
>>>      struct local_binding *lbinding =
>>>          local_binding_find(local_bindings, pb_name);
>>>
>>>      struct binding_lport *b_lport = local_binding_get_primary_lport(lbinding);
>>>
>>> +    if (b_lport) {
>>> +        if (b_lport->pb->chassis == chassis_rec) {
>>> +            return false;
>>> +        } else if (b_lport->pb->chassis) {
>>> +            VLOG_DBG("lport %s already claimed by other chassis",
>>> +                     b_lport->pb->logical_port);
>>> +        }
>>> +    }
>>> +
>>>      if (!lbinding) {
>>>          return true;
>>>      }
>>> @@ -884,37 +900,80 @@ get_lport_type_str(enum en_lport_type lport_type)
>>>      OVS_NOT_REACHED();
>>>  }
>>>
>>> -/* For newly claimed ports, if 'notify_up' is 'false':
>>> +void
>>> +set_pb_chassis_in_sbrec(const struct sbrec_port_binding *pb,
>>> +                        const struct sbrec_chassis *chassis_rec,
>>> +                        bool is_set)
>>> +{
>>> +    if (pb->chassis != chassis_rec) {
>>> +         if (is_set) {
>>> +            if (pb->chassis) {
>>> +                VLOG_INFO("Changing chassis for lport %s from %s to %s.",
>>> +                          pb->logical_port, pb->chassis->name,
>>> +                          chassis_rec->name);
>>> +            } else {
>>> +                VLOG_INFO("Claiming lport %s for this chassis.",
>>> +                          pb->logical_port);
>>> +            }
>>> +            for (int i = 0; i < pb->n_mac; i++) {
>>> +                VLOG_INFO("%s: Claiming %s", pb->logical_port, pb->mac[i]);
>>> +            }
>>> +            sbrec_port_binding_set_chassis(pb, chassis_rec);
>>> +        }
>>> +    } else if (!is_set) {
>>> +        sbrec_port_binding_set_chassis(pb, NULL);
>>> +    }
>>> +}
>>> +
>>> +void
>>> +local_binding_set_pb(struct shash *local_bindings, const char *pb_name,
>>> +                     const struct sbrec_chassis *chassis_rec,
>>> +                     struct hmap *tracked_datapaths, bool is_set)
>>> +{
>>> +    struct local_binding *lbinding =
>>> +        local_binding_find(local_bindings, pb_name);
>>> +    struct binding_lport *b_lport = local_binding_get_primary_lport(lbinding);
>>> +
>>> +    if (b_lport) {
>>> +        set_pb_chassis_in_sbrec(b_lport->pb, chassis_rec, is_set);
>>> +        if (tracked_datapaths) {
>>> +            update_lport_tracking(b_lport->pb, tracked_datapaths, true);
>>> +        }
>>> +    }
>>> +}
>>> +
>>> +/* For newly claimed ports:
>>>   * - set the 'pb.up' field to true if 'pb' has no 'parent_pb'.
>>>   * - set the 'pb.up' field to true if 'parent_pb.up' is 'true' (e.g., for
>>>   *   container and virtual ports).
>>> - * Otherwise request a notification to be sent when the OVS flows
>>> - * corresponding to 'pb' have been installed.
>>> + *
>>> + * Returns false if lport is not claimed due to 'sb_readonly'.
>>> + * Returns true otherwise.
>>>   *
>>>   * Note:
>>> - *   Updates (directly or through a notification) the 'pb->up' field only if
>>> - *   it's explicitly set to 'false'.
>>> + *   Updates the 'pb->up' field only if it's explicitly set to 'false'.
>>>   *   This is to ensure compatibility with older versions of ovn-northd.
>>>   */
>>> -static void
>>> +static bool
>>>  claimed_lport_set_up(const struct sbrec_port_binding *pb,
>>>                       const struct sbrec_port_binding *parent_pb,
>>> -                     const struct sbrec_chassis *chassis_rec,
>>> -                     bool notify_up, struct if_status_mgr *if_mgr)
>>> +                     bool sb_readonly)
>>>  {
>>> -    if (!notify_up) {
>>> -        bool up = true;
>>> -        if (!parent_pb || (parent_pb->n_up && parent_pb->up[0])) {
>>> +    /* When notify_up is false in claim_port(), no state is created
>>> +     * by if_status_mgr. In such cases, return false (i.e. trigger recompute)
>>> +     * if we can't update sb (because it is readonly).
>>> +     */
>>> +    bool up = true;
>>> +    if (!parent_pb || (parent_pb->n_up && parent_pb->up[0])) {
>>> +        if (!sb_readonly) {
>>>              if (pb->n_up) {
>>>                  sbrec_port_binding_set_up(pb, &up, 1);
>>>              }
>>> +        } else if (pb->n_up && !pb->up[0]) {
>>> +            return false;
>>>          }
>>> -        return;
>>> -    }
>>> -
>>> -    if (pb->chassis != chassis_rec || (pb->n_up && !pb->up[0])) {
>>> -        if_status_mgr_claim_iface(if_mgr, pb->logical_port);
>>>      }
>>> +    return true;
>>>  }
>>>
>>>  typedef void (*set_func)(const struct sbrec_port_binding *pb,
>>> @@ -1057,37 +1116,35 @@ claim_lport(const struct sbrec_port_binding *pb,
>>>              struct hmap *tracked_datapaths,
>>>              struct if_status_mgr *if_mgr)
>>>  {
>>> -    if (!sb_readonly) {
>>> -        claimed_lport_set_up(pb, parent_pb, chassis_rec, notify_up, if_mgr);
>>> -    }
>>> -
>>>      enum can_bind can_bind = lport_can_bind_on_this_chassis(chassis_rec, pb);
>>>      bool update_tracked = false;
>>>
>>>      if (can_bind == CAN_BIND_AS_MAIN) {
>>>          if (pb->chassis != chassis_rec) {
>>> -            if (sb_readonly) {
>>> -                return false;
>>> -            }
>>> -
>>> -            if (pb->chassis) {
>>> -                VLOG_INFO("Changing chassis for lport %s from %s to %s.",
>>> -                        pb->logical_port, pb->chassis->name,
>>> -                        chassis_rec->name);
>>> -            } else {
>>> -                VLOG_INFO("Claiming lport %s for this chassis.",
>>> -                          pb->logical_port);
>>> -            }
>>> -            for (size_t i = 0; i < pb->n_mac; i++) {
>>> -                VLOG_INFO("%s: Claiming %s", pb->logical_port, pb->mac[i]);
>>> -            }
>>> -
>>> -            sbrec_port_binding_set_chassis(pb, chassis_rec);
>>>              if (is_additional_chassis(pb, chassis_rec)) {
>>> +                if (sb_readonly) {
>>> +                    return false;
>>> +                }
>>>                  remove_additional_chassis(pb, chassis_rec);
>>>              }
>>>              update_tracked = true;
>>>          }
>>> +        if (!notify_up) {
>>> +            if (!claimed_lport_set_up(pb, parent_pb, sb_readonly)) {
>>> +                return false;
>>> +            }
>>> +            if (pb->chassis != chassis_rec) {
>>> +                if (sb_readonly) {
>>> +                    return false;
>>> +                }
>>> +                set_pb_chassis_in_sbrec(pb, chassis_rec, true);
>>> +            }
>>> +        } else {
>>> +            if ((pb->chassis != chassis_rec) || (pb->n_up && !pb->up[0])) {
>>> +                if_status_mgr_claim_iface(if_mgr, pb, chassis_rec,
>>> +                                          sb_readonly);
>>> +            }
>>> +        }
>>>      } else if (can_bind == CAN_BIND_AS_ADDITIONAL) {
>>>          if (!is_additional_chassis(pb, chassis_rec)) {
>>>              if (sb_readonly) {
>>> @@ -1132,7 +1189,8 @@ claim_lport(const struct sbrec_port_binding *pb,
>>>   */
>>>  static bool
>>>  release_lport_main_chassis(const struct sbrec_port_binding *pb,
>>> -                           bool sb_readonly)
>>> +                           bool sb_readonly,
>>> +                           struct if_status_mgr *if_mgr)
>>>  {
>>>      if (pb->encap) {
>>>          if (sb_readonly) {
>>> @@ -1141,11 +1199,13 @@ release_lport_main_chassis(const struct sbrec_port_binding *pb,
>>>          sbrec_port_binding_set_encap(pb, NULL);
>>>      }
>>>
>>> +    /* If sb readonly, pb->chassis unset through if-status if present. */
>>>      if (pb->chassis) {
>>> -        if (sb_readonly) {
>>> +        if (!sb_readonly) {
>>> +            sbrec_port_binding_set_chassis(pb, NULL);
>>> +        } else if (!if_status_mgr_iface_is_present(if_mgr, pb->logical_port)) {
>>>              return false;
>>>          }
>>> -        sbrec_port_binding_set_chassis(pb, NULL);
>>>      }
>>>
>>>      if (pb->virtual_parent) {
>>> @@ -1155,7 +1215,8 @@ release_lport_main_chassis(const struct sbrec_port_binding *pb,
>>>          sbrec_port_binding_set_virtual_parent(pb, NULL);
>>>      }
>>>
>>> -    VLOG_INFO("Releasing lport %s from this chassis.", pb->logical_port);
>>> +    VLOG_INFO("Releasing lport %s from this chassis (sb_readonly=%d)",
>>> +              pb->logical_port, sb_readonly);
>>>      return true;
>>>  }
>>>
>>> @@ -1189,7 +1250,7 @@ release_lport(const struct sbrec_port_binding *pb,
>>>                struct hmap *tracked_datapaths, struct if_status_mgr *if_mgr)
>>>  {
>>>      if (pb->chassis == chassis_rec) {
>>> -        if (!release_lport_main_chassis(pb, sb_readonly)) {
>>> +        if (!release_lport_main_chassis(pb, sb_readonly, if_mgr)) {
>>>              return false;
>>>          }
>>>      } else if (is_additional_chassis(pb, chassis_rec)) {
>>> @@ -1271,7 +1332,7 @@ consider_vif_lport_(const struct sbrec_port_binding *pb,
>>>                               b_lport->lbinding->iface,
>>>                               !b_ctx_in->ovnsb_idl_txn,
>>>                               !parent_pb, b_ctx_out->tracked_dp_bindings,
>>> -                             b_ctx_out->if_mgr)){
>>> +                             b_ctx_out->if_mgr)) {
>>>                  return false;
>>>              }
>>>
>>> @@ -1527,7 +1588,8 @@ consider_localport(const struct sbrec_port_binding *pb,
>>>      enum can_bind can_bind = lport_can_bind_on_this_chassis(
>>>          b_ctx_in->chassis_rec, pb);
>>>      if (can_bind == CAN_BIND_AS_MAIN) {
>>> -        if (!release_lport_main_chassis(pb, !b_ctx_in->ovnsb_idl_txn)) {
>>> +        if (!release_lport_main_chassis(pb, !b_ctx_in->ovnsb_idl_txn,
>>> +            b_ctx_out->if_mgr)) {
>>>              return false;
>>>          }
>>>      } else if (can_bind == CAN_BIND_AS_ADDITIONAL) {
>>> diff --git a/controller/binding.h b/controller/binding.h
>>> index 1fed06674..d20659b0b 100644
>>> --- a/controller/binding.h
>>> +++ b/controller/binding.h
>>> @@ -151,8 +151,10 @@ const struct sbrec_port_binding *local_binding_get_primary_pb(
>>>  ofp_port_t local_binding_get_lport_ofport(const struct shash *local_bindings,
>>>                                            const char *pb_name);
>>>
>>> -bool local_binding_is_up(struct shash *local_bindings, const char *pb_name);
>>> -bool local_binding_is_down(struct shash *local_bindings, const char *pb_name);
>>> +bool local_binding_is_up(struct shash *local_bindings, const char *pb_name,
>>> +                         const struct sbrec_chassis *);
>>> +bool local_binding_is_down(struct shash *local_bindings, const char *pb_name,
>>> +                           const struct sbrec_chassis *);
>>>  void local_binding_set_up(struct shash *local_bindings, const char *pb_name,
>>>                            const struct sbrec_chassis *chassis_rec,
>>>                            const char *ts_now_str, bool sb_readonly,
>>> @@ -160,7 +162,10 @@ void local_binding_set_up(struct shash *local_bindings, const char *pb_name,
>>>  void local_binding_set_down(struct shash *local_bindings, const char *pb_name,
>>>                              const struct sbrec_chassis *chassis_rec,
>>>                              bool sb_readonly, bool ovs_readonly);
>>> -
>>> +void local_binding_set_pb(struct shash *local_bindings, const char *pb_name,
>>> +                          const struct sbrec_chassis *chassis_rec,
>>> +                          struct hmap *tracked_datapaths,
>>> +                          bool is_set);
>>>  void binding_register_ovs_idl(struct ovsdb_idl *);
>>>  void binding_run(struct binding_ctx_in *, struct binding_ctx_out *);
>>>  bool binding_cleanup(struct ovsdb_idl_txn *ovnsb_idl_txn,
>>> @@ -178,6 +183,10 @@ void binding_dump_local_bindings(struct local_binding_data *, struct ds *);
>>>  bool is_additional_chassis(const struct sbrec_port_binding *pb,
>>>                             const struct sbrec_chassis *chassis_rec);
>>>
>>> +void set_pb_chassis_in_sbrec(const struct sbrec_port_binding *pb,
>>> +                             const struct sbrec_chassis *chassis_rec,
>>> +                             bool is_set);
>>> +
>>>  /* Corresponds to each Port_Binding.type. */
>>>  enum en_lport_type {
>>>      LP_UNKNOWN,
>>> diff --git a/controller/if-status.c b/controller/if-status.c
>>> index ad61844d8..7693c289b 100644
>>> --- a/controller/if-status.c
>>> +++ b/controller/if-status.c
>>> @@ -24,6 +24,7 @@
>>>  #include "lib/util.h"
>>>  #include "timeval.h"
>>>  #include "openvswitch/vlog.h"
>>> +#include "lib/ovn-sb-idl.h"
>>>
>>>  VLOG_DEFINE_THIS_MODULE(if_status);
>>>
>>> @@ -53,9 +54,11 @@ VLOG_DEFINE_THIS_MODULE(if_status);
>>>   */
>>>
>>>  enum if_state {
>>> -    OIF_CLAIMED,       /* Newly claimed interface. */
>>> -    OIF_INSTALL_FLOWS, /* Already claimed interface for which flows are still
>>> -                        * being installed.
>>> +    OIF_CLAIMED,       /* Newly claimed interface. pb->chassis not yet updated.
>>> +                        */
>>> +    OIF_INSTALL_FLOWS, /* Claimed interface with pb->chassis successfully
>>> +                        * updated in SB and for which flows are still being
>>> +                        * installed.
>>>                          */
>>>      OIF_MARK_UP,       /* Interface with flows successfully installed in OVS
>>>                          * but not yet marked "up" in the binding module (in
>>> @@ -78,6 +81,55 @@ static const char *if_state_names[] = {
>>>      [OIF_INSTALLED]     = "INSTALLED",
>>>  };
>>>
>>> +/*
>>> + *       +----------------------+
>>> + * +---> |                      |
>>> + * | +-> |         NULL         | <--------------------------------------+++-+
>>> + * | |   +----------------------+                                            |
>>> + * | |     ^ release_iface   | claim_iface                                   |
>>> + * | |     |                 V - sbrec_update_chassis(if sb is rw)           |
>>> + * | |   +----------------------+                                            |
>>> + * | |   |                      | <----------------------------------------+ |
>>> + * | |   |       CLAIMED        | <--------------------------------------+ | |
>>> + * | |   +----------------------+                                        | | |
>>> + * | |                  | mgr_update(when sb is rw)                      | | |
>>> + * | | release_iface    |  - sbrec_update_chassis                        | | |
>>> + * | |                  |  - request seqno                               | | |
>>> + * | |                  V                                                | | |
>>> + * | |   +----------------------+                                        | | |
>>> + * | +-- |                      |  mgr_run(seqno not rcvd)               | | |
>>> + * |     |    INSTALL_FLOWS     |   - set port down in sb                | | |
>>> + * |     |                      |  mgr_update()                          | | |
>>> + * |     +----------------------+   - sbrec_update_chassis if needed     | | |
>>> + * |                    |                                                | | |
>>> + * |                    |  mgr_run(seqno rcvd)                           | | |
>>> + * |                    |  - set port up in sb                           | | |
>>> + * | release_iface      |  - set ovn-installed in ovs                    | | |
>>> + * |                    V                                                | | |
>>> + * |   +----------------------+                                          | | |
>>> + * |   |                      |  mgr_run()                               | | |
>>> + * +-- |       MARK_UP        |  - set port up in sb                     | | |
>>> + *     |                      |  - set ovn-installed in ovs              | | |
>>> + *     |                      |  mgr_update()                            | | |
>>> + *     +----------------------+  - sbrec_update_chassis if needed        | | |
>>> + *              |                                                        | | |
>>> + *              | mgr_update(rcvd port up / ovn_installed & chassis set) | | |
>>> + *              V                                                        | | |
>>> + *     +----------------------+                                          | | |
>>> + *     |      INSTALLED       | ------------> claim_iface ---------------+ | |
>>> + *     +----------------------+                                            | |
>>> + *              |                                                          | |
>>> + *              | release_iface                                            | |
>>> + *              V                                                          | |
>>> + *     +----------------------+                                            | |
>>> + *     |                      | ------------> claim_iface -----------------+ |
>>> + *     |      MARK_DOWN       | ------> mgr_update(rcvd port down) ----------+
>>> + *     |                      | mgr_run()
>>> + *     |                      | - set port down in sb
>>> + *     |                      | mgr_update()
>>> + *     +----------------------+ - sbrec_update_chassis(NULL)
>>> + */
>>> +
>>>  struct ovs_iface {
>>>      char *id;               /* Extracted from OVS
>> external_ids.iface_id. */
>>>      enum if_state state;    /* State of the interface in the state
>> machine. */
>>> @@ -85,6 +137,7 @@ struct ovs_iface {
>>>                               * be fully programmed in OVS.  Only used
>> in state
>>>                               * OIF_INSTALL_FLOWS.
>>>                               */
>>> +    bool chassis_update_required;  /* If true, pb->chassis must be
>> updated. */
>>>  };
>>>
>>>  static uint64_t ifaces_usage;
>>> @@ -158,14 +211,23 @@ if_status_mgr_destroy(struct if_status_mgr *mgr)
>>>  }
>>>
>>>  void
>>> -if_status_mgr_claim_iface(struct if_status_mgr *mgr, const char
>> *iface_id)
>>> +if_status_mgr_claim_iface(struct if_status_mgr *mgr,
>>> +                          const struct sbrec_port_binding *pb,
>>> +                          const struct sbrec_chassis *chassis_rec,
>>> +                          bool sb_readonly)
>>>  {
>>> +    const char *iface_id = pb->logical_port;
>>>      struct ovs_iface *iface = shash_find_data(&mgr->ifaces, iface_id);
>>>
>>>      if (!iface) {
>>>          iface = ovs_iface_create(mgr, iface_id, OIF_CLAIMED);
>>>      }
>>> -
>>> +    if (!sb_readonly) {
>>> +        set_pb_chassis_in_sbrec(pb, chassis_rec, true);
>>> +        iface->chassis_update_required = false;
>>> +    } else {
>>> +        iface->chassis_update_required = true;
>>> +    }
>>>      switch (iface->state) {
>>>      case OIF_CLAIMED:
>>>      case OIF_INSTALL_FLOWS:
>>> @@ -182,6 +244,12 @@ if_status_mgr_claim_iface(struct if_status_mgr
>> *mgr, const char *iface_id)
>>>      }
>>>  }
>>>
>>> +bool
>>> +if_status_mgr_iface_is_present(struct if_status_mgr *mgr, const char
>> *iface_id)
>>> +{
>>> +    return !!shash_find_data(&mgr->ifaces, iface_id);
>>> +}
>>> +
>>>  void
>>>  if_status_mgr_release_iface(struct if_status_mgr *mgr, const char
>> *iface_id)
>>>  {
>>> @@ -246,9 +314,39 @@ if_status_mgr_delete_iface(struct if_status_mgr
>> *mgr, const char *iface_id)
>>>      }
>>>  }
>>>
>>> +bool
>>> +if_status_handle_claims(struct if_status_mgr *mgr,
>>> +                        struct local_binding_data *binding_data,
>>> +                        const struct sbrec_chassis *chassis_rec,
>>> +                        struct hmap *tracked_datapath,
>>> +                        bool sb_readonly)
>>> +{
>>> +    if (!binding_data || sb_readonly) {
>>> +        return false;
>>> +    }
>>> +
>>> +    struct shash *bindings = &binding_data->bindings;
>>> +    struct hmapx_node *node;
>>> +
>>> +    bool rc = false;
>>> +    HMAPX_FOR_EACH (node, &mgr->ifaces_per_state[OIF_CLAIMED]) {
>>> +        struct ovs_iface *iface = node->data;
>>> +        if (iface->chassis_update_required) {
>>
>> Thanks Xavier for the revision. The state machine looks clearer now,
>> but I have a major concern about the use of chassis_update_required. This
>> bool flag is used to decide whether an update to SB is needed; once an SB
>> update is requested, it is set to false, on the assumption that the SB
>> update will succeed immediately. However, that assumption may be wrong.
>> The subsequent SB update can fail or be delayed for various reasons,
>> so this flag is not reliable. Instead, the CLAIMED state should be
>> responsible for making sure the SB update is completed. If a transaction
>> is in progress, sb_readonly is true. So if sb_readonly is false, nothing
>> is in progress, and we can always check: if (!sb_readonly && <SB chassis
>> is not updated for the port-binding>), we should just send the update,
>> regardless of whether we have requested it before. Please also see
>> another comment below for the state transition.
>>
>>> +            VLOG_INFO("if_status_handle_claims for %s", iface->id);
>>> +            local_binding_set_pb(bindings, iface->id, chassis_rec,
>>> +                                 tracked_datapath, true);
>>> +            rc = true;
>>> +        }
>>> +        iface->chassis_update_required = false;
>>> +    }
>>> +    return rc;
>>> +}
>>> +
>>>  void
>>>  if_status_mgr_update(struct if_status_mgr *mgr,
>>> -                     struct local_binding_data *binding_data)
>>> +                     struct local_binding_data *binding_data,
>>> +                     const struct sbrec_chassis *chassis_rec,
>>> +                     bool sb_readonly)
>>>  {
>>>      if (!binding_data) {
>>>          return;
>>> @@ -257,13 +355,25 @@ if_status_mgr_update(struct if_status_mgr *mgr,
>>>      struct shash *bindings = &binding_data->bindings;
>>>      struct hmapx_node *node;
>>>
>>> +    /* Interfaces in OIF_MARK_UP state have already set their
>> pb->chassis.
>>> +     * However, it might have been reset by another hv.
>>> +     */
>>>      /* Move all interfaces that have been confirmed "up" by the binding
>> module,
>>>       * from OIF_MARK_UP to OIF_INSTALLED.
>>>       */
>>>      HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_MARK_UP]) {
>>>          struct ovs_iface *iface = node->data;
>>>
>>> -        if (local_binding_is_up(bindings, iface->id)) {
>>> +        if (iface->chassis_update_required) {
>>> +            if (!sb_readonly) {
>>> +                iface->chassis_update_required = false;
>>> +                local_binding_set_pb(bindings, iface->id, chassis_rec,
>>> +                                     NULL, true);
>>> +            } else {
>>> +                continue;
>>> +            }
>>> +        }
>>> +        if (local_binding_is_up(bindings, iface->id, chassis_rec)) {
>>>              ovs_iface_set_state(mgr, iface, OIF_INSTALLED);
>>>          }
>>>      }
>>> @@ -274,23 +384,53 @@ if_status_mgr_update(struct if_status_mgr *mgr,
>>>      HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_MARK_DOWN]) {
>>>          struct ovs_iface *iface = node->data;
>>>
>>> -        if (local_binding_is_down(bindings, iface->id)) {
>>> +        if (!sb_readonly) {
>>> +            local_binding_set_pb(bindings, iface->id, chassis_rec,
>>> +                                 NULL, false);
>>> +        }
>>> +        if (local_binding_is_down(bindings, iface->id, chassis_rec)) {
>>>              ovs_iface_destroy(mgr, iface);
>>>          }
>>>      }
>>>
>>> -    /* Register for a notification about flows being installed in OVS
>> for all
>>> -     * newly claimed interfaces.
>>> +    if (!sb_readonly) {
>>> +        HMAPX_FOR_EACH_SAFE (node,
>> &mgr->ifaces_per_state[OIF_INSTALL_FLOWS]) {
>>> +            struct ovs_iface *iface = node->data;
>>> +
>>> +            if (iface->chassis_update_required) {
>>> +                iface->chassis_update_required = false;
>>> +                local_binding_set_pb(bindings, iface->id, chassis_rec,
>>> +                                     NULL, true);
>>> +            }
>>> +        }
>>> +    }
>>> +
>>> +    /* Update Port_Binding->chassis for newly claimed interfaces
>>> +     * Register for a notification about flows being installed in OVS
>> for all
>>> +     * newly claimed interfaces for which we could update pb->chassis.
>>>       *
>>>       * Move them from OIF_CLAIMED to OIF_INSTALL_FLOWS.
>>>       */
>>> -    bool new_ifaces = false;
>>> -    HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_CLAIMED]) {
>>> -        struct ovs_iface *iface = node->data;
>>>
>>> -        ovs_iface_set_state(mgr, iface, OIF_INSTALL_FLOWS);
>>> -        iface->install_seqno = mgr->iface_seqno + 1;
>>> -        new_ifaces = true;
>>> +    bool new_ifaces = false;
>>> +    if (!sb_readonly) {
>>> +        HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_CLAIMED])
>> {
>>> +            struct ovs_iface *iface = node->data;
>>> +            /* No need to check for chassis_update_required as already
>> done
>>> +             * in if_status_handle_claims or if_status_mgr_claim_iface
>>> +             */
>>> +            ovs_iface_set_state(mgr, iface, OIF_INSTALL_FLOWS);
>>
>> We need to make sure the chassis in the SB port-binding is up-to-date (i.e.
>> the update notification from the SB DB has been received) before moving to
>> INSTALL_FLOWS. Otherwise, it is still possible that the state is moved too
>> early and we end up with incomplete flow installation for the lport when
>> the state is finally moved to INSTALLED.
>>
>> Thanks,
>> Han
>>
>>> +            iface->install_seqno = mgr->iface_seqno + 1;
>>> +            new_ifaces = true;
>>> +        }
>>> +    } else {
>>> +        HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_CLAIMED])
>> {
>>> +            struct ovs_iface *iface = node->data;
>>> +            static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5,
>> 1);
>>> +            VLOG_INFO_RL(&rl,
>>> +                         "Not updating pb chassis for %s now as "
>>> +                         "sb is readonly", iface->id);
>>> +        }
>>>      }
>>>
>>>      /* Request a seqno update when the flows for new interfaces have
>> been
>>> @@ -403,7 +543,7 @@ if_status_mgr_update_bindings(struct if_status_mgr
>> *mgr,
>>>      struct hmapx_node *node;
>>>
>>>      /* Notify the binding module to set "down" all bindings that are
>> still
>>> -     * in the process of being installed in OVS, i.e., are not yet
>> instsalled.
>>> +     * in the process of being installed in OVS, i.e., are not yet
>> installed.
>>>       */
>>>      HMAPX_FOR_EACH (node, &mgr->ifaces_per_state[OIF_INSTALL_FLOWS]) {
>>>          struct ovs_iface *iface = node->data;
>>> diff --git a/controller/if-status.h b/controller/if-status.h
>>> index bb8a3950d..f9b05d30d 100644
>>> --- a/controller/if-status.h
>>> +++ b/controller/if-status.h
>>> @@ -27,15 +27,27 @@ struct if_status_mgr *if_status_mgr_create(void);
>>>  void if_status_mgr_clear(struct if_status_mgr *);
>>>  void if_status_mgr_destroy(struct if_status_mgr *);
>>>
>>> -void if_status_mgr_claim_iface(struct if_status_mgr *, const char
>> *iface_id);
>>> +void if_status_mgr_claim_iface(struct if_status_mgr *,
>>> +                               const struct sbrec_port_binding *pb,
>>> +                               const struct sbrec_chassis *chassis_rec,
>>> +                               bool sb_readonly);
>>>  void if_status_mgr_release_iface(struct if_status_mgr *, const char
>> *iface_id);
>>>  void if_status_mgr_delete_iface(struct if_status_mgr *, const char
>> *iface_id);
>>>
>>> -void if_status_mgr_update(struct if_status_mgr *, struct
>> local_binding_data *);
>>> +void if_status_mgr_update(struct if_status_mgr *, struct
>> local_binding_data *,
>>> +                          const struct sbrec_chassis *chassis,
>>> +                          bool sb_readonly);
>>>  void if_status_mgr_run(struct if_status_mgr *mgr, struct
>> local_binding_data *,
>>>                         const struct sbrec_chassis *,
>>>                         bool sb_readonly, bool ovs_readonly);
>>>  void if_status_mgr_get_memory_usage(struct if_status_mgr *mgr,
>>>                                      struct simap *usage);
>>> +bool if_status_mgr_iface_is_present(struct if_status_mgr *mgr,
>>> +                                    const char *iface_id);
>>> +bool if_status_handle_claims(struct if_status_mgr *mgr,
>>> +                             struct local_binding_data *binding_data,
>>> +                             const struct sbrec_chassis *chassis_rec,
>>> +                             struct hmap *tracked_datapath,
>>> +                             bool sb_readonly);
>>>
>>>  # endif /* controller/if-status.h */
>>> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
>>> index 69615308e..3947baf03 100644
>>> --- a/controller/ovn-controller.c
>>> +++ b/controller/ovn-controller.c
>>> @@ -1464,6 +1464,73 @@ en_runtime_data_run(struct engine_node *node,
>> void *data)
>>>      engine_set_node_state(node, EN_UPDATED);
>>>  }
>>>
>>> +struct ed_type_sb_ro {
>>> +    bool sb_readonly;
>>> +};
>>> +
>>> +static void *
>>> +en_sb_ro_init(struct engine_node *node OVS_UNUSED,
>>> +              struct engine_arg *arg OVS_UNUSED)
>>> +{
>>> +    struct ed_type_sb_ro *data = xzalloc(sizeof *data);
>>> +    return data;
>>> +}
>>> +
>>> +static void
>>> +en_sb_ro_run(struct engine_node *node, void *data)
>>> +{
>>> +    struct ed_type_sb_ro *sb_ro_data = data;
>>> +    bool sb_readonly = !engine_get_context()->ovnsb_idl_txn;
>>> +    if (sb_ro_data->sb_readonly != sb_readonly) {
>>> +        sb_ro_data->sb_readonly = sb_readonly;
>>> +        if (!sb_ro_data->sb_readonly) {
>>> +            engine_set_node_state(node, EN_UPDATED);
>>> +        }
>>> +    }
>>> +}
>>> +
>>> +static void
>>> +en_sb_ro_cleanup(void *data OVS_UNUSED)
>>> +{
>>> +}
>>> +
>>> +static bool
>>> +runtime_data_sb_ro_handler(struct engine_node *node, void *data)
>>> +{
>>> +    const struct sbrec_chassis *chassis = NULL;
>>> +
>>> +    struct ovsrec_open_vswitch_table *ovs_table =
>>> +        (struct ovsrec_open_vswitch_table *)EN_OVSDB_GET(
>>> +            engine_get_input("OVS_open_vswitch", node));
>>> +
>>> +    const char *chassis_id = get_ovs_chassis_id(ovs_table);
>>> +
>>> +    struct ovsdb_idl_index *sbrec_chassis_by_name =
>>> +        engine_ovsdb_node_get_index(
>>> +                engine_get_input("SB_chassis", node),
>>> +                "name");
>>> +
>>> +    if (chassis_id) {
>>> +        chassis = chassis_lookup_by_name(sbrec_chassis_by_name,
>> chassis_id);
>>> +    }
>>> +    if (chassis) {
>>> +        struct ed_type_runtime_data *rt_data = data;
>>> +        bool sb_readonly = !engine_get_context()->ovnsb_idl_txn;
>>> +        struct controller_engine_ctx *ctrl_ctx =
>>> +            engine_get_context()->client_ctx;
>>> +
>>> +        if (if_status_handle_claims(ctrl_ctx->if_mgr,
>>> +                                    &rt_data->lbinding_data,
>>> +                                    chassis,
>>> +                                    &rt_data->tracked_dp_bindings,
>>> +                                    sb_readonly)) {
>>> +            engine_set_node_state(node, EN_UPDATED);
>>> +            rt_data->tracked = true;
>>> +        }
>>> +    }
>>> +    return true;
>>> +}
>>> +
>>>  static bool
>>>  runtime_data_ovs_interface_shadow_handler(struct engine_node *node,
>> void *data)
>>>  {
>>> @@ -3528,6 +3595,7 @@ main(int argc, char *argv[])
>>>      stopwatch_create(VIF_PLUG_RUN_STOPWATCH_NAME, SW_MS);
>>>
>>>      /* Define inc-proc-engine nodes. */
>>> +    ENGINE_NODE(sb_ro, "sb_ro");
>>>      ENGINE_NODE_WITH_CLEAR_TRACK_DATA_IS_VALID(ct_zones, "ct_zones");
>>>      ENGINE_NODE_WITH_CLEAR_TRACK_DATA(ovs_interface_shadow,
>>>                                        "ovs_interface_shadow");
>>> @@ -3664,6 +3732,7 @@ main(int argc, char *argv[])
>>>      engine_add_input(&en_ovs_interface_shadow, &en_ovs_interface,
>>>                       ovs_interface_shadow_ovs_interface_handler);
>>>
>>> +    engine_add_input(&en_runtime_data, &en_sb_ro,
>> runtime_data_sb_ro_handler);
>>>      engine_add_input(&en_runtime_data, &en_ofctrl_is_connected, NULL);
>>>
>>>      engine_add_input(&en_runtime_data, &en_ovs_open_vswitch, NULL);
>>> @@ -4098,7 +4167,8 @@ main(int argc, char *argv[])
>>>                          runtime_data ? &runtime_data->lbinding_data :
>> NULL;
>>>                      stopwatch_start(IF_STATUS_MGR_UPDATE_STOPWATCH_NAME,
>>>                                      time_msec());
>>> -                    if_status_mgr_update(if_mgr, binding_data);
>>> +                    if_status_mgr_update(if_mgr, binding_data, chassis,
>>> +                                         !ovnsb_idl_txn);
>>>                      stopwatch_stop(IF_STATUS_MGR_UPDATE_STOPWATCH_NAME,
>>>                                     time_msec());
>>>
>>> diff --git a/tests/ovn-macros.at b/tests/ovn-macros.at
>>> index 335f9158c..8fd6ae6f7 100644
>>> --- a/tests/ovn-macros.at
>>> +++ b/tests/ovn-macros.at
>>> @@ -759,3 +759,15 @@ m4_define([OVN_FOR_EACH_NORTHD_WITHOUT_DP_GROUPS],
>>>       [m4_foreach([NORTHD_USE_DP_GROUPS], [no],
>>>         [m4_foreach([NORTHD_USE_PARALLELIZATION], [yes, no], [$1
>>>  ])])])])
>>> +
>>> +# OVN_NBCTL(NBCTL_COMMAND) adds NBCTL_COMMAND to list of commands to be
>> run by RUN_OVN_NBCTL().
>>> +m4_define([OVN_NBCTL], [
>>> +    command="${command} -- $1"
>>> +])
>>> +
>>> +# RUN_OVN_NBCTL() executes list of commands built by the OVN_NBCTL()
>> macro.
>>> +m4_define([RUN_OVN_NBCTL], [
>>> +    check ovn-nbctl ${command}
>>> +    unset command
>>> +])
>>> +
>>> diff --git a/tests/ovn.at b/tests/ovn.at
>>> index bfaa41962..94d16bac9 100644
>>> --- a/tests/ovn.at
>>> +++ b/tests/ovn.at
>>> @@ -102,6 +102,18 @@ m4_divert_text([PREPARE_TESTS],
>>>           test 1 -le $(as $hv1 ovs-ofctl dump-flows br-int | grep -c
>> "output:$ofport")
>>>       ])
>>>     }
>>> +
>>> +   ovn_wait_remote_input_flows () {
>>> +     hv1=$1
>>> +     hv2=$2
>>> +     echo "$3: waiting for flows for remote input on $hv1"
>>> +     # Wait for a flow outputing  to remote input
>>> +     OVS_WAIT_UNTIL([
>>> +         ofport=$(as $hv1 ovs-vsctl --bare --columns ofport find
>> Interface name=ovn-${hv2}-0)
>>> +         echo "tunnel port=$ofport"
>>> +         test 1 -le $(as $hv1 ovs-ofctl dump-flows br-int | grep -c
>> "in_port=$ofport")
>>> +     ])
>>> +   }
>>>  ])
>>>
>>>  m4_define([OVN_CHECK_PACKETS],
>>> @@ -127,6 +139,8 @@ m4_define([OVN_WAIT_PATCH_PORT_FLOWS],
>>>  m4_define([OVN_WAIT_REMOTE_OUTPUT_FLOWS],
>>>    [ovn_wait_remote_output_flows "$1" "$2" "__file__:__line__"])
>>>
>>> +m4_define([OVN_WAIT_REMOTE_INPUT_FLOWS],
>>> +  [ovn_wait_remote_input_flows "$1" "$2" "__file__:__line__"])
>>>
>>>  AT_BANNER([OVN components])
>>>
>>> @@ -14056,6 +14070,11 @@ wait_column "$hv1_uuid" Port_Binding
>> requested_chassis logical_port=lsp0
>>>  wait_column "$hv2_uuid" Port_Binding additional_chassis
>> logical_port=lsp0
>>>  wait_column "$hv2_uuid" Port_Binding requested_additional_chassis
>> logical_port=lsp0
>>>
>>> +# Check ovn-installed updated for main chassis
>>> +wait_for_ports_up
>>> +OVS_WAIT_UNTIL([test `as hv1 ovs-vsctl get Interface lsp0
>> external_ids:ovn-installed` = '"true"'])
>>> +OVS_WAIT_UNTIL([test x`as hv2 ovs-vsctl get Interface lsp0
>> external_ids:ovn-installed` = x])
>>> +
>>>  # Check that setting iface:encap-ip populates
>> Port_Binding:additional_encap
>>>  wait_row_count Encap 2 chassis_name=hv1
>>>  wait_row_count Encap 2 chassis_name=hv2
>>> @@ -14081,6 +14100,11 @@ wait_column "$hv2_uuid" Port_Binding
>> requested_chassis logical_port=lsp0
>>>  wait_column "" Port_Binding additional_chassis logical_port=lsp0
>>>  wait_column "" Port_Binding requested_additional_chassis
>> logical_port=lsp0
>>>
>>> +# Check ovn-installed updated for main chassis and not for other chassis
>>> +wait_for_ports_up
>>> +OVS_WAIT_UNTIL([test `as hv2 ovs-vsctl get Interface lsp0
>> external_ids:ovn-installed` = '"true"'])
>>> +OVS_WAIT_UNTIL([test x`as hv1 ovs-vsctl get Interface lsp0
>> external_ids:ovn-installed` = x])
>>> +
>>>  # Check that additional_encap is cleared
>>>  wait_column "" Port_Binding additional_encap logical_port=lsp0
>>>
>>> @@ -15370,7 +15394,10 @@ AT_CHECK([as hv1 ovs-ofctl dump-flows br-int
>> table=65 | grep actions=output:1],
>>>  echo "verifying that lsp0 binding moves when requested-chassis is
>> changed"
>>>
>>>  ovn-nbctl lsp-set-options lsp0 requested-chassis=hv2
>>> -OVS_WAIT_UNTIL([test 1 = $(grep -c "Releasing lport lsp0 from this
>> chassis" hv1/ovn-controller.log)])
>>> +
>>> +# We might see multiple "Releasing lport ...", when sb is read only
>>> +OVS_WAIT_UNTIL([test 1 -le $(grep -c "Releasing lport lsp0 from this
>> chassis" hv1/ovn-controller.log)])
>>> +
>>>  wait_column "$hv2_uuid" Port_Binding chassis logical_port=lsp0
>>>
>>>  # (6) Chassis hv2 should add flows and hv1 should not.
>>> @@ -15416,7 +15443,7 @@ AT_CHECK([as hv1 ovs-ofctl dump-flows br-int
>> table=0 | grep in_port=1], [0], [ig
>>>  AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=65 | grep
>> actions=output:1], [0], [ignore])
>>>
>>>  check ovn-nbctl --wait=hv lsp-set-options lsp0
>> requested-chassis=non-existant-chassis
>>> -OVS_WAIT_UNTIL([test 1 = $(grep -c "Releasing lport lsp0 from this
>> chassis" hv1/ovn-controller.log)])
>>> +OVS_WAIT_UNTIL([test 1 -le $(grep -c "Releasing lport lsp0 from this
>> chassis" hv1/ovn-controller.log)])
>>>  check ovn-nbctl --wait=hv sync
>>>  wait_column '' Port_Binding chasssi logical_port=lsp0
>>>  AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=0 | grep in_port=1],
>> [1], [])
>>> @@ -32418,3 +32445,119 @@ AT_CHECK([test $(ovn-sbctl list fdb | grep -c
>> "00:00:00:00:10:30") = 0])
>>>  OVN_CLEANUP([hv1])
>>>  AT_CLEANUP
>>>  ])
>>> +
>>> +OVN_FOR_EACH_NORTHD([
>>> +AT_SETUP([recomputes])
>>> +ovn_start
>>> +
>>> +n_hv=4
>>> +
>>> +# Add chassis
>>> +net_add n1
>>> +for i in $(seq 1 $n_hv); do
>>> +    sim_add hv$i
>>> +    as hv$i
>>> +    check ovs-vsctl add-br br-phys
>>> +    ovs-vsctl set open . external-ids:ovn-bridge-mappings=phys:br-phys
>>> +    ovn_attach n1 br-phys 192.168.0.$i 24 geneve
>>> +done
>>> +
>>> +add_switch_ports() {
>>> +    start_port=$1
>>> +    end_port=$2
>>> +    nb_hv=$3
>>> +    bulk_size=$4
>>> +    for ((i=start_port; i<end_port; )) do
>>> +        start_bulk=$i
>>> +        for hv in $(seq 1 $nb_hv); do
>>> +            end_bulk=$((start_bulk+bulk_size-1))
>>> +            for port in $(seq $start_bulk $end_bulk); do
>>> +                logical_switch_port=lsp${port}
>>> +                OVN_NBCTL(lsp-add ls1 $logical_switch_port)
>>> +                OVN_NBCTL(lsp-set-addresses $logical_switch_port
>> dynamic)
>>> +            done
>>> +            start_bulk=$((end_bulk+1))
>>> +        done
>>> +        RUN_OVN_NBCTL()
>>> +
>>> +        start_bulk=$i
>>> +        for hv in $(seq 1 $nb_hv); do
>>> +            end_bulk=$((start_bulk+bulk_size-1))
>>> +            for port in $(seq $start_bulk $end_bulk); do
>>> +                logical_switch_port=lsp${port}
>>> +                as hv$hv ovs-vsctl \
>>> +                    --no-wait -- add-port br-int vif${port} \
>>> +                    -- set Interface vif${port}
>> external_ids:iface-id=$logical_switch_port
>>> +            done
>>> +            start_bulk=$((end_bulk+1))
>>> +        done
>>> +        i=$((end_bulk+1))
>>> +    done
>>> +}
>>> +check ovn-nbctl ls-add ls1
>>> +check ovn-nbctl set Logical_Switch ls1 other_config:subnet=10.1.0.0/16
>>> +check ovn-nbctl set Logical_Switch ls1
>> other_config:exclude_ips=10.1.255.254
>>> +
>>> +check ovn-nbctl lr-add lr1
>>> +check ovn-nbctl lsp-add ls1 lsp0 -- set Logical_Switch_Port lsp0
>> type=router options:router-port=lrp0 addresses=dynamic
>>> +check ovn-nbctl lrp-add lr1 lrp0 "f0:00:00:01:00:01" 10.1.255.254/16
>>> +check ovn-nbctl lr-nat-add lr1 snat 10.2.0.1 10.1.0.0/16
>>> +
>>> +lflow_run=0
>>> +check ovn-nbctl --wait=hv sync
>>> +
>>> +# Tunnel ports might not be added (yet) at this point on slow system.
>>> +# Wait for flows related to such ports to ensure those ports have been
>> added
>>> +# before we measure recomputes. Otherwise, ovs_interface handler might
>> be run
>>> +# afterwards for tunnel ports, causing recomputes.
>>> +for i in $(seq 1 $n_hv); do
>>> +    for j in $(seq 1 $n_hv); do
>>> +        if test $i != $j; then
>>> +            OVN_WAIT_REMOTE_INPUT_FLOWS(["hv$i"],["hv$j"])
>>> +        fi
>>> +    done
>>> +done
>>> +
>>> +for i in $(seq 1 $n_hv); do
>>> +    as hv$i
>>> +    lflow_run1=$(ovn-appctl -t ovn-controller coverage/read-counter
>> lflow_run)
>>> +    lflow_run=`expr $lflow_run1 + $lflow_run`
>>> +done
>>> +
>>> +add_switch_ports 1 1000 $n_hv 5
>>> +
>>> +wait_for_ports_up
>>> +check ovn-nbctl --wait=hv sync
>>> +
>>> +for i in $(seq 1 $n_hv); do
>>> +    pid=$(cat hv${i}/ovn-controller.pid)
>>> +    u=$((u+$(cat "/proc/$pid/stat" | awk '{print $14}')))
>>> +    s=$((s+$(cat "/proc/$pid/stat" | awk '{print $15}')))
>>> +done
>>> +
>>> +n_pid=$(cat northd/ovn-northd.pid)
>>> +n_u=$(cat "/proc/$n_pid/stat" | awk '{print $14}')
>>> +n_s=$(cat "/proc/$n_pid/stat" | awk '{print $15}')
>>> +
>>> +echo "Total Northd User Time: $n_u"
>>> +echo "Total Northd System Time: $n_s"
>>> +echo "Total Controller User Time: $u"
>>> +echo "Total Controller System Time: $s"
>>> +
>>> +lflow_run_end=0
>>> +for i in $(seq 1 $n_hv); do
>>> +    as hv$i
>>> +    lflow_run1=$(ovn-appctl -t ovn-controller coverage/read-counter
>> lflow_run)
>>> +    lflow_run_end=`expr $lflow_run1 + $lflow_run_end`
>>> +done
>>> +n_recomputes=`expr $lflow_run_end - $lflow_run`
>>> +echo "$n_recomputes recomputes"
>>> +
>>> +AT_CHECK([test $lflow_run_end == $lflow_run])
>>> +
>>> +for i in $(seq 2 $n_hv); do
>>> +    OVN_CLEANUP_SBOX([hv$i])
>>> +done
>>> +OVN_CLEANUP([hv1])
>>> +AT_CLEANUP
>>> +])
>>> diff --git a/tests/perf-northd.at b/tests/perf-northd.at
>>> index 74b69e9d4..6ec196b36 100644
>>> --- a/tests/perf-northd.at
>>> +++ b/tests/perf-northd.at
>>> @@ -76,23 +76,6 @@ m4_define([PERF_RECORD_STOP], [
>>>      PERF_RECORD_STOPWATCH(ovn-northd-loop, ["Short term average"],
>> [Average (northd-loop in msec)])
>>>  ])
>>>
>>> -# OVN_NBCTL([NBCTL_COMMAND])
>>> -#
>>> -# Add NBCTL_COMMAND to list of commands to be run by RUN_OVN_NBCTL().
>>> -#
>>> -m4_define([OVN_NBCTL], [
>>> -    command="${command} -- $1"
>>> -])
>>> -
>>> -# RUN_OVN_NBCTL()
>>> -#
>>> -# Execute list of commands built by the OVN_NBCTL() macro.
>>> -#
>>> -m4_define([RUN_OVN_NBCTL], [
>>> -    check ovn-nbctl ${command}
>>> -    unset command
>>> -])
>>> -
>>>  OVS_START_SHELL_HELPERS
>>>  generate_subnet () {
>>>      local a=$(printf %d $(expr $1 / 256 + 10))
>>> --
>>> 2.31.1
>>>
>>
>
Han Zhou July 12, 2022, 6:52 a.m. UTC | #5
On Mon, Jul 11, 2022 at 4:55 AM Dumitru Ceara <dceara@redhat.com> wrote:
>
> On 7/11/22 13:31, Xavier Simonart wrote:
> > Hi Han
> >
> > Thanks for your review.
> >
> > Let me try to understand your two main concerns and the proper way to
> > fix it.
> > 1) We only try once to write pb->chassis. If the commit fails,
> > pb->chassis is not written. As commit fails, we will recompute, but as
> > the update_required flag is not set anymore, we might end up with no
> > pb->chassis.
> > => I'll remove the flag and try to update until it's confirmed.

Thank you!
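
A minimal sketch of that flag-less retry, for illustration only. The
struct and function names below (pb_sketch, claimed_try_update) are
hypothetical stand-ins, not the real OVN types or API; the real code would
compare pb->chassis against chassis_rec and queue the write through the SB
IDL transaction. The point is only that the decision is recomputed from the
SB contents on every iteration instead of being remembered in a bool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a Port_Binding row; not the OVN type. */
struct pb_sketch {
    const char *chassis;   /* Chassis currently seen in the SB row, or NULL. */
    int n_writes;          /* Number of update requests, for illustration. */
};

/* Returns true if the SB row already names 'our_chassis'. */
static bool
pb_chassis_is_ours(const struct pb_sketch *pb, const char *our_chassis)
{
    return pb->chassis && !strcmp(pb->chassis, our_chassis);
}

/* Called on every iteration for an interface in CLAIMED state.
 * There is no per-interface "update required" flag: if the SB is writable
 * and the row does not name us yet, the update is (re)requested.
 * Returns true if an update was requested in this iteration. */
static bool
claimed_try_update(struct pb_sketch *pb, const char *our_chassis,
                   bool sb_readonly)
{
    assert(pb && our_chassis);
    if (sb_readonly || pb_chassis_is_ours(pb, our_chassis)) {
        return false;
    }
    pb->n_writes++;   /* Stand-in for queuing the pb->chassis write. */
    return true;
}
```

With this shape, a failed or lost transaction is simply retried the next
time the SB is writable, and nothing is requested once the row is confirmed.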

> > 2) The state machine, and when we move to INSTALL_FLOWS. Serializing the
> > state machine, by waiting for confirmation to be received before moving
> > to INSTALL_FLOWS state will delay the ovn-installed compared to today.
> > So I am (still) trying to see if there is any way to prevent this in
> > some cases.
> > Would it be correct to do this serialization (wait for pb->chassis
> > update confirmation) only when using conditional monitoring? When using
> > monitor-all, as soon as we have written (w/o confirmation) pb->chassis,
> > we would move to INSTALL_FLOWS. In that loop where we wrote pb->chassis,
> > (all) the flows should be updated taking into account pb->chassis.
>
> I think this approach is OK.  ovn-controllers that don't use conditional
> monitoring already know the complete SB contents and should be able to
> install (mostly?) complete sets of openflows that correspond to a given
> Port_Binding.
>
> It's also "CMS-friendly", at least for ovn-kubernetes, which uses
> ovn-monitor-all=true and only waits for
> OVS.Interface.external_ids:ovn-installed=true.  And shouldn't impact the
> others which wait for SB.Port_Binding.up=true.
>

Xavier and Dumitru, I think we shouldn't introduce a dependency on the
"ovn-monitor-all" setting here.

First of all, ovn-installed is a flag that tells the CMS that all the
flows related to the port-binding are installed. If we set the flag before
installation is truly complete, it is a bug, and it is possible that the
flag is set but some traffic doesn't work.
I did a quick test, and at least one flow in (table_id=38, priority=100),
which is multicast-group related, is updated AFTER the SB notification is
received for the port-binding chassis update.

Secondly, if the change hasn't made it to the SB, none of the other nodes
would be able to reach the port, which means the workload (pod/VM) cannot
receive traffic yet at this phase.

So, I think our goal is not to set ovn-installed early, but to set it
accurately (sometimes it may be OK to be conservative).
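
A rough sketch of the transition rule argued for here: an interface leaves
CLAIMED only once the chassis received back from the SB matches the local
chassis, independently of any monitoring setting. All names (iface_sketch,
claimed_maybe_advance, the SK_ states) are assumptions made for
illustration, not the OVN implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical two-state subset of the if-status state machine. */
enum if_state_sketch { SK_CLAIMED, SK_INSTALL_FLOWS };

struct iface_sketch {
    enum if_state_sketch state;
    const char *sb_chassis;   /* Chassis as last received from SB, or NULL. */
};

/* Moves 'iface' from CLAIMED to INSTALL_FLOWS only when the SB has
 * confirmed that the port binding names 'our_chassis'.  A write that was
 * merely requested (but not yet reflected in the SB) does not count. */
static void
claimed_maybe_advance(struct iface_sketch *iface, const char *our_chassis)
{
    assert(iface && our_chassis);
    if (iface->state == SK_CLAIMED
        && iface->sb_chassis
        && !strcmp(iface->sb_chassis, our_chassis)) {
        iface->state = SK_INSTALL_FLOWS;
    }
}
```

The trade-off is the one discussed above: ovn-installed may be set a little
later, but never before the chassis update is visible in the SB.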

In addition, ovn-monitor-all is not always true even in ovn-k8s; it is
configurable there. (In our environment we set it to false, to save
memory and CPU on worker nodes, at the cost of a little more load on the
central SB DB.)

Thanks,
Han
> >
> > Thanks again for your feedback
> >
> > Xavier
> >
> > On Thu, Jul 7, 2022 at 8:49 AM Han Zhou <hzhou@ovn.org> wrote:
> >
> >>
> >>
> >> On Wed, Jun 22, 2022 at 3:23 AM Xavier Simonart <xsimonar@redhat.com>
> >> wrote:
> >>>
> >>> When VIF ports are claimed on a chassis, SBDB Port_Binding table is
> >>> updated.
> >>> If the SBDB IDL is still read-only ("in transaction") when such an
> >>> update is required, the update is not possible and recompute is
> >>> triggered through I+P failure.
> >>>
> >>> This situation can happen:
> >>> - after updating Port_Binding->chassis to SBDB for one port, in a
> >>>   following iteration, ovn-controller handles
> >>>   Interface:external_ids:ovn-installed (for the same port) while SBDB
> >>>   is still read-only.
> >>> - after updating Port_Binding->chassis to SBDB for one port, in a
> >>>   following iteration, ovn-controller updates Port_Binding->chassis
> >>>   for another port, while SBDB is still read-only.
> >>>
> >>> This patch prevents the recompute, by having the if-status module
> >>> updating the Port_Binding chassis (if needed) when possible.
> >>> This does not delay Port_Binding chassis update compared to before
> >>> this patch.
> >>> - With the patch, Port_Binding chassis will be updated as soon as
> >>>   SBDB is again writable, without recompute.
> >>> - Without the patch, Port_Binding chassis was updated as soon as SBDB
> >>>   was again writable, through a recompute.
> >>>
> >>> As part of this patch, ovn-installed will not be updated for
> >>> additional chassis; it will only be updated when the migration is
> >>> completed.
> >>>
> >>> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2059253
> >>> Signed-off-by: Xavier Simonart <xsimonar@redhat.com>
> >>>
> >>> ---
> >>> v2:  - handled Dumitru's comments.
> >>>      - handled Han's comments, mainly ensure we moved out of CLAIMED
> >>>        state only after updating pb->chassis to guarantee physical
> >>>        flows are installed when ovn-installed is updated in OVS.
> >>>      - slightly reorganized the code to isolate 'notify_up = false'
> >>>        cases in claim_port (i.e. ports such as virtual ports), with the
> >>>        idea of making a future patch preventing recomputes when virtual
> >>>        ports are claimed.
> >>>      - updated test case to cause more race conditions.
> >>>      - rebased on origin/main
> >>>      - note that "additional chassis" as now supported by
> >>>        "Support LSP:options:requested-chassis as a list" might still
> >>>        cause recomputes.
> >>>      - fixed missing flows when Port_Binding chassis was updated by
> >>>        mgr_update w/o any lflow recalculation.
> >>> v3:  - handled Dumitru's comments on v2, mainly have runtime_data
> >>>        handler handling pb_claims when sb becomes writable (instead of
> >>>        a lflow handler).
> >>>      - fixed test as it was not checking recomputes on all hv, as well
> >>>        as a flaky behavior.
> >>>      - rebased on origin/main.
> >>> ---
> >>>  controller/binding.c        | 154 +++++++++++++++++++++----------
> >>>  controller/binding.h        |  15 +++-
> >>>  controller/if-status.c      | 174 ++++++++++++++++++++++++++++++++----
> >>>  controller/if-status.h      |  16 +++-
> >>>  controller/ovn-controller.c |  72 ++++++++++++++-
> >>>  tests/ovn-macros.at         |  12 +++
> >>>  tests/ovn.at                | 147 +++++++++++++++++++++++++++++-
> >>>  tests/perf-northd.at        |  17 ----
> >>>  8 files changed, 519 insertions(+), 88 deletions(-)
> >>>
> >>> diff --git a/controller/binding.c b/controller/binding.c
> >>> index 2279570f9..b21577f71 100644
> >>> --- a/controller/binding.c
> >>> +++ b/controller/binding.c
> >>> @@ -644,11 +644,17 @@ local_binding_get_lport_ofport(const struct
shash
> >> *local_bindings,
> >>>  }
> >>>
> >>>  bool
> >>> -local_binding_is_up(struct shash *local_bindings, const char
*pb_name)
> >>> +local_binding_is_up(struct shash *local_bindings, const char
*pb_name,
> >>> +                    const struct sbrec_chassis *chassis_rec)
> >>>  {
> >>>      struct local_binding *lbinding =
> >>>          local_binding_find(local_bindings, pb_name);
> >>>      struct binding_lport *b_lport =
> >> local_binding_get_primary_lport(lbinding);
> >>> +
> >>> +    if (b_lport && b_lport->pb->chassis != chassis_rec) {
> >>> +        return false;
> >>> +    }
> >>> +
> >>>      if (lbinding && b_lport && lbinding->iface) {
> >>>          if (b_lport->pb->n_up && !b_lport->pb->up[0]) {
> >>>              return false;
> >>> @@ -660,13 +666,23 @@ local_binding_is_up(struct shash
*local_bindings,
> >> const char *pb_name)
> >>>  }
> >>>
> >>>  bool
> >>> -local_binding_is_down(struct shash *local_bindings, const char
*pb_name)
> >>> +local_binding_is_down(struct shash *local_bindings, const char
*pb_name,
> >>> +                      const struct sbrec_chassis *chassis_rec)
> >>>  {
> >>>      struct local_binding *lbinding =
> >>>          local_binding_find(local_bindings, pb_name);
> >>>
> >>>      struct binding_lport *b_lport =
> >> local_binding_get_primary_lport(lbinding);
> >>>
> >>> +    if (b_lport) {
> >>> +        if (b_lport->pb->chassis == chassis_rec) {
> >>> +            return false;
> >>> +        } else if (b_lport->pb->chassis) {
> >>> +            VLOG_DBG("lport %s already claimed by other chassis",
> >>> +                     b_lport->pb->logical_port);
> >>> +        }
> >>> +    }
> >>> +
> >>>      if (!lbinding) {
> >>>          return true;
> >>>      }
> >>> @@ -884,37 +900,80 @@ get_lport_type_str(enum en_lport_type
lport_type)
> >>>      OVS_NOT_REACHED();
> >>>  }
> >>>
> >>> -/* For newly claimed ports, if 'notify_up' is 'false':
> >>> +void
> >>> +set_pb_chassis_in_sbrec(const struct sbrec_port_binding *pb,
> >>> +                        const struct sbrec_chassis *chassis_rec,
> >>> +                        bool is_set)
> >>> +{
> >>> +    if (pb->chassis != chassis_rec) {
> >>> +         if (is_set) {
> >>> +            if (pb->chassis) {
> >>> +                VLOG_INFO("Changing chassis for lport %s from %s to
> >> %s.",
> >>> +                          pb->logical_port, pb->chassis->name,
> >>> +                          chassis_rec->name);
> >>> +            } else {
> >>> +                VLOG_INFO("Claiming lport %s for this chassis.",
> >>> +                          pb->logical_port);
> >>> +            }
> >>> +            for (int i = 0; i < pb->n_mac; i++) {
> >>> +                VLOG_INFO("%s: Claiming %s", pb->logical_port,
> >> pb->mac[i]);
> >>> +            }
> >>> +            sbrec_port_binding_set_chassis(pb, chassis_rec);
> >>> +        }
> >>> +    } else if (!is_set) {
> >>> +        sbrec_port_binding_set_chassis(pb, NULL);
> >>> +    }
> >>> +}
> >>> +
> >>> +void
> >>> +local_binding_set_pb(struct shash *local_bindings, const char
*pb_name,
> >>> +                     const struct sbrec_chassis *chassis_rec,
> >>> +                     struct hmap *tracked_datapaths, bool is_set)
> >>> +{
> >>> +    struct local_binding *lbinding =
> >>> +        local_binding_find(local_bindings, pb_name);
> >>> +    struct binding_lport *b_lport =
> >> local_binding_get_primary_lport(lbinding);
> >>> +
> >>> +    if (b_lport) {
> >>> +        set_pb_chassis_in_sbrec(b_lport->pb, chassis_rec, is_set);
> >>> +        if (tracked_datapaths) {
> >>> +            update_lport_tracking(b_lport->pb, tracked_datapaths,
true);
> >>> +        }
> >>> +    }
> >>> +}
> >>> +
> >>> +/* For newly claimed ports:
> >>>   * - set the 'pb.up' field to true if 'pb' has no 'parent_pb'.
> >>>   * - set the 'pb.up' field to true if 'parent_pb.up' is 'true' (e.g.,
> >> for
> >>>   *   container and virtual ports).
> >>> - * Otherwise request a notification to be sent when the OVS flows
> >>> - * corresponding to 'pb' have been installed.
> >>> + *
> >>> + * Returns false if lport is not claimed due to 'sb_readonly'.
> >>> + * Returns true otherwise.
> >>>   *
> >>>   * Note:
> >>> - *   Updates (directly or through a notification) the 'pb->up' field
> >> only if
> >>> - *   it's explicitly set to 'false'.
> >>> + *   Updates the 'pb->up' field only if it's explicitly set to
'false'.
> >>>   *   This is to ensure compatibility with older versions of
ovn-northd.
> >>>   */
> >>> -static void
> >>> +static bool
> >>>  claimed_lport_set_up(const struct sbrec_port_binding *pb,
> >>>                       const struct sbrec_port_binding *parent_pb,
> >>> -                     const struct sbrec_chassis *chassis_rec,
> >>> -                     bool notify_up, struct if_status_mgr *if_mgr)
> >>> +                     bool sb_readonly)
> >>>  {
> >>> -    if (!notify_up) {
> >>> -        bool up = true;
> >>> -        if (!parent_pb || (parent_pb->n_up && parent_pb->up[0])) {
> >>> +    /* When notify_up is false in claim_port(), no state is created
> >>> +     * by if_status_mgr. In such cases, return false (i.e. trigger
> >> recompute)
> >>> +     * if we can't update sb (because it is readonly).
> >>> +     */
> >>> +    bool up = true;
> >>> +    if (!parent_pb || (parent_pb->n_up && parent_pb->up[0])) {
> >>> +        if (!sb_readonly) {
> >>>              if (pb->n_up) {
> >>>                  sbrec_port_binding_set_up(pb, &up, 1);
> >>>              }
> >>> +        } else if (pb->n_up && !pb->up[0]) {
> >>> +            return false;
> >>>          }
> >>> -        return;
> >>> -    }
> >>> -
> >>> -    if (pb->chassis != chassis_rec || (pb->n_up && !pb->up[0])) {
> >>> -        if_status_mgr_claim_iface(if_mgr, pb->logical_port);
> >>>      }
> >>> +    return true;
> >>>  }
> >>>
> >>>  typedef void (*set_func)(const struct sbrec_port_binding *pb,
> >>> @@ -1057,37 +1116,35 @@ claim_lport(const struct sbrec_port_binding
*pb,
> >>>              struct hmap *tracked_datapaths,
> >>>              struct if_status_mgr *if_mgr)
> >>>  {
> >>> -    if (!sb_readonly) {
> >>> -        claimed_lport_set_up(pb, parent_pb, chassis_rec, notify_up,
> >> if_mgr);
> >>> -    }
> >>> -
> >>>      enum can_bind can_bind =
> >> lport_can_bind_on_this_chassis(chassis_rec, pb);
> >>>      bool update_tracked = false;
> >>>
> >>>      if (can_bind == CAN_BIND_AS_MAIN) {
> >>>          if (pb->chassis != chassis_rec) {
> >>> -            if (sb_readonly) {
> >>> -                return false;
> >>> -            }
> >>> -
> >>> -            if (pb->chassis) {
> >>> -                VLOG_INFO("Changing chassis for lport %s from %s to
> >> %s.",
> >>> -                        pb->logical_port, pb->chassis->name,
> >>> -                        chassis_rec->name);
> >>> -            } else {
> >>> -                VLOG_INFO("Claiming lport %s for this chassis.",
> >>> -                          pb->logical_port);
> >>> -            }
> >>> -            for (size_t i = 0; i < pb->n_mac; i++) {
> >>> -                VLOG_INFO("%s: Claiming %s", pb->logical_port,
> >> pb->mac[i]);
> >>> -            }
> >>> -
> >>> -            sbrec_port_binding_set_chassis(pb, chassis_rec);
> >>>              if (is_additional_chassis(pb, chassis_rec)) {
> >>> +                if (sb_readonly) {
> >>> +                    return false;
> >>> +                }
> >>>                  remove_additional_chassis(pb, chassis_rec);
> >>>              }
> >>>              update_tracked = true;
> >>>          }
> >>> +        if (!notify_up) {
> >>> +            if (!claimed_lport_set_up(pb, parent_pb, sb_readonly)) {
> >>> +                return false;
> >>> +            }
> >>> +            if (pb->chassis != chassis_rec) {
> >>> +                if (sb_readonly) {
> >>> +                    return false;
> >>> +                }
> >>> +                set_pb_chassis_in_sbrec(pb, chassis_rec, true);
> >>> +            }
> >>> +        } else {
> >>> +            if ((pb->chassis != chassis_rec) || (pb->n_up &&
> >> !pb->up[0])) {
> >>> +                if_status_mgr_claim_iface(if_mgr, pb, chassis_rec,
> >>> +                                          sb_readonly);
> >>> +            }
> >>> +        }
> >>>      } else if (can_bind == CAN_BIND_AS_ADDITIONAL) {
> >>>          if (!is_additional_chassis(pb, chassis_rec)) {
> >>>              if (sb_readonly) {
> >>> @@ -1132,7 +1189,8 @@ claim_lport(const struct sbrec_port_binding *pb,
> >>>   */
> >>>  static bool
> >>>  release_lport_main_chassis(const struct sbrec_port_binding *pb,
> >>> -                           bool sb_readonly)
> >>> +                           bool sb_readonly,
> >>> +                           struct if_status_mgr *if_mgr)
> >>>  {
> >>>      if (pb->encap) {
> >>>          if (sb_readonly) {
> >>> @@ -1141,11 +1199,13 @@ release_lport_main_chassis(const struct
> >> sbrec_port_binding *pb,
> >>>          sbrec_port_binding_set_encap(pb, NULL);
> >>>      }
> >>>
> >>> +    /* If sb readonly, pb->chassis unset through if-status if
present.
> >> */
> >>>      if (pb->chassis) {
> >>> -        if (sb_readonly) {
> >>> +        if (!sb_readonly) {
> >>> +            sbrec_port_binding_set_chassis(pb, NULL);
> >>> +        } else if (!if_status_mgr_iface_is_present(if_mgr,
> >> pb->logical_port)) {
> >>>              return false;
> >>>          }
> >>> -        sbrec_port_binding_set_chassis(pb, NULL);
> >>>      }
> >>>
> >>>      if (pb->virtual_parent) {
> >>> @@ -1155,7 +1215,8 @@ release_lport_main_chassis(const struct
> >> sbrec_port_binding *pb,
> >>>          sbrec_port_binding_set_virtual_parent(pb, NULL);
> >>>      }
> >>>
> >>> -    VLOG_INFO("Releasing lport %s from this chassis.",
> >> pb->logical_port);
> >>> +    VLOG_INFO("Releasing lport %s from this chassis
(sb_readonly=%d)",
> >>> +              pb->logical_port, sb_readonly);
> >>>      return true;
> >>>  }
> >>>
> >>> @@ -1189,7 +1250,7 @@ release_lport(const struct sbrec_port_binding
*pb,
> >>>                struct hmap *tracked_datapaths, struct if_status_mgr
> >> *if_mgr)
> >>>  {
> >>>      if (pb->chassis == chassis_rec) {
> >>> -        if (!release_lport_main_chassis(pb, sb_readonly)) {
> >>> +        if (!release_lport_main_chassis(pb, sb_readonly, if_mgr)) {
> >>>              return false;
> >>>          }
> >>>      } else if (is_additional_chassis(pb, chassis_rec)) {
> >>> @@ -1271,7 +1332,7 @@ consider_vif_lport_(const struct
> >> sbrec_port_binding *pb,
> >>>                               b_lport->lbinding->iface,
> >>>                               !b_ctx_in->ovnsb_idl_txn,
> >>>                               !parent_pb,
b_ctx_out->tracked_dp_bindings,
> >>> -                             b_ctx_out->if_mgr)){
> >>> +                             b_ctx_out->if_mgr)) {
> >>>                  return false;
> >>>              }
> >>>
> >>> @@ -1527,7 +1588,8 @@ consider_localport(const struct
sbrec_port_binding
> >> *pb,
> >>>      enum can_bind can_bind = lport_can_bind_on_this_chassis(
> >>>          b_ctx_in->chassis_rec, pb);
> >>>      if (can_bind == CAN_BIND_AS_MAIN) {
> >>> -        if (!release_lport_main_chassis(pb,
!b_ctx_in->ovnsb_idl_txn)) {
> >>> +        if (!release_lport_main_chassis(pb, !b_ctx_in->ovnsb_idl_txn,
> >>> +            b_ctx_out->if_mgr)) {
> >>>              return false;
> >>>          }
> >>>      } else if (can_bind == CAN_BIND_AS_ADDITIONAL) {
> >>> diff --git a/controller/binding.h b/controller/binding.h
> >>> index 1fed06674..d20659b0b 100644
> >>> --- a/controller/binding.h
> >>> +++ b/controller/binding.h
> >>> @@ -151,8 +151,10 @@ const struct sbrec_port_binding
> >> *local_binding_get_primary_pb(
> >>>  ofp_port_t local_binding_get_lport_ofport(const struct shash
> >> *local_bindings,
> >>>                                            const char *pb_name);
> >>>
> >>> -bool local_binding_is_up(struct shash *local_bindings, const char
> >> *pb_name);
> >>> -bool local_binding_is_down(struct shash *local_bindings, const char
> >> *pb_name);
> >>> +bool local_binding_is_up(struct shash *local_bindings, const char
> >> *pb_name,
> >>> +                         const struct sbrec_chassis *);
> >>> +bool local_binding_is_down(struct shash *local_bindings, const char
> >> *pb_name,
> >>> +                           const struct sbrec_chassis *);
> >>>  void local_binding_set_up(struct shash *local_bindings, const char
> >> *pb_name,
> >>>                            const struct sbrec_chassis *chassis_rec,
> >>>                            const char *ts_now_str, bool sb_readonly,
> >>> @@ -160,7 +162,10 @@ void local_binding_set_up(struct shash
> >> *local_bindings, const char *pb_name,
> >>>  void local_binding_set_down(struct shash *local_bindings, const char
> >> *pb_name,
> >>>                              const struct sbrec_chassis *chassis_rec,
> >>>                              bool sb_readonly, bool ovs_readonly);
> >>> -
> >>> +void local_binding_set_pb(struct shash *local_bindings, const char
> >> *pb_name,
> >>> +                          const struct sbrec_chassis *chassis_rec,
> >>> +                          struct hmap *tracked_datapaths,
> >>> +                          bool is_set);
> >>>  void binding_register_ovs_idl(struct ovsdb_idl *);
> >>>  void binding_run(struct binding_ctx_in *, struct binding_ctx_out *);
> >>>  bool binding_cleanup(struct ovsdb_idl_txn *ovnsb_idl_txn,
> >>> @@ -178,6 +183,10 @@ void binding_dump_local_bindings(struct
> >> local_binding_data *, struct ds *);
> >>>  bool is_additional_chassis(const struct sbrec_port_binding *pb,
> >>>                             const struct sbrec_chassis *chassis_rec);
> >>>
> >>> +void set_pb_chassis_in_sbrec(const struct sbrec_port_binding *pb,
> >>> +                             const struct sbrec_chassis *chassis_rec,
> >>> +                             bool is_set);
> >>> +
> >>>  /* Corresponds to each Port_Binding.type. */
> >>>  enum en_lport_type {
> >>>      LP_UNKNOWN,
> >>> diff --git a/controller/if-status.c b/controller/if-status.c
> >>> index ad61844d8..7693c289b 100644
> >>> --- a/controller/if-status.c
> >>> +++ b/controller/if-status.c
> >>> @@ -24,6 +24,7 @@
> >>>  #include "lib/util.h"
> >>>  #include "timeval.h"
> >>>  #include "openvswitch/vlog.h"
> >>> +#include "lib/ovn-sb-idl.h"
> >>>
> >>>  VLOG_DEFINE_THIS_MODULE(if_status);
> >>>
> >>> @@ -53,9 +54,11 @@ VLOG_DEFINE_THIS_MODULE(if_status);
> >>>   */
> >>>
> >>>  enum if_state {
> >>> -    OIF_CLAIMED,       /* Newly claimed interface. */
> >>> -    OIF_INSTALL_FLOWS, /* Already claimed interface for which flows
are
> >> still
> >>> -                        * being installed.
> >>> +    OIF_CLAIMED,       /* Newly claimed interface. pb->chassis not
yet
> >> updated.
> >>> +                        */
> >>> +    OIF_INSTALL_FLOWS, /* Claimed interface with pb->chassis
> >> successfully
> >>> +                        * updated in SB and for which flows are still
> >> being
> >>> +                        * installed.
> >>>                          */
> >>>      OIF_MARK_UP,       /* Interface with flows successfully installed
> >> in OVS
> >>>                          * but not yet marked "up" in the binding
module
> >> (in
> >>> @@ -78,6 +81,55 @@ static const char *if_state_names[] = {
> >>>      [OIF_INSTALLED]     = "INSTALLED",
> >>>  };
> >>>
> >>> +/*
> >>> + *       +----------------------+
> >>> + * +---> |                      |
> >>> + * | +-> |         NULL         | <--------------------------------------+++-+
> >>> + * | |   +----------------------+                                            |
> >>> + * | |     ^ release_iface   | claim_iface                                   |
> >>> + * | |     |                 V - sbrec_update_chassis(if sb is rw)           |
> >>> + * | |   +----------------------+                                            |
> >>> + * | |   |                      | <----------------------------------------+ |
> >>> + * | |   |       CLAIMED        | <--------------------------------------+ | |
> >>> + * | |   +----------------------+                                        | | |
> >>> + * | |                  | mgr_update(when sb is rw)                      | | |
> >>> + * | | release_iface    |  - sbrec_update_chassis                        | | |
> >>> + * | |                  |  - request seqno                               | | |
> >>> + * | |                  V                                                | | |
> >>> + * | |   +----------------------+                                        | | |
> >>> + * | +-- |                      |  mgr_run(seqno not rcvd)               | | |
> >>> + * |     |    INSTALL_FLOWS     |   - set port down in sb                | | |
> >>> + * |     |                      |  mgr_update()                          | | |
> >>> + * |     +----------------------+   - sbrec_update_chassis if needed     | | |
> >>> + * |                    |                                                | | |
> >>> + * |                    |  mgr_run(seqno rcvd)                           | | |
> >>> + * |                    |  - set port up in sb                           | | |
> >>> + * | release_iface      |  - set ovn-installed in ovs                    | | |
> >>> + * |                    V                                                | | |
> >>> + * |   +----------------------+                                          | | |
> >>> + * |   |                      |  mgr_run()                               | | |
> >>> + * +-- |       MARK_UP        |  - set port up in sb                     | | |
> >>> + *     |                      |  - set ovn-installed in ovs              | | |
> >>> + *     |                      |  mgr_update()                            | | |
> >>> + *     +----------------------+  - sbrec_update_chassis if needed        | | |
> >>> + *              |                                                        | | |
> >>> + *              | mgr_update(rcvd port up / ovn_installed & chassis set) | | |
> >>> + *              V                                                        | | |
> >>> + *     +----------------------+                                          | | |
> >>> + *     |      INSTALLED       | ------------> claim_iface ---------------+ | |
> >>> + *     +----------------------+                                            | |
> >>> + *              |                                                          | |
> >>> + *              | release_iface                                            | |
> >>> + *              V                                                          | |
> >>> + *     +----------------------+                                            | |
> >>> + *     |                      | ------------> claim_iface -----------------+ |
> >>> + *     |      MARK_DOWN       | ------> mgr_update(rcvd port down) ----------+
> >>> + *     |                      | mgr_run()
> >>> + *     |                      | - set port down in sb
> >>> + *     |                      | mgr_update()
> >>> + *     +----------------------+ - sbrec_update_chassis(NULL)
> >>> + */
> >>> +
> >>>  struct ovs_iface {
> >>>      char *id;               /* Extracted from OVS
> >> external_ids.iface_id. */
> >>>      enum if_state state;    /* State of the interface in the state
> >> machine. */
> >>> @@ -85,6 +137,7 @@ struct ovs_iface {
> >>>                               * be fully programmed in OVS.  Only used
> >> in state
> >>>                               * OIF_INSTALL_FLOWS.
> >>>                               */
> >>> +    bool chassis_update_required;  /* If true, pb->chassis must be
> >> updated. */
> >>>  };
> >>>
> >>>  static uint64_t ifaces_usage;
> >>> @@ -158,14 +211,23 @@ if_status_mgr_destroy(struct if_status_mgr *mgr)
> >>>  }
> >>>
> >>>  void
> >>> -if_status_mgr_claim_iface(struct if_status_mgr *mgr, const char
> >> *iface_id)
> >>> +if_status_mgr_claim_iface(struct if_status_mgr *mgr,
> >>> +                          const struct sbrec_port_binding *pb,
> >>> +                          const struct sbrec_chassis *chassis_rec,
> >>> +                          bool sb_readonly)
> >>>  {
> >>> +    const char *iface_id = pb->logical_port;
> >>>      struct ovs_iface *iface = shash_find_data(&mgr->ifaces,
iface_id);
> >>>
> >>>      if (!iface) {
> >>>          iface = ovs_iface_create(mgr, iface_id, OIF_CLAIMED);
> >>>      }
> >>> -
> >>> +    if (!sb_readonly) {
> >>> +        set_pb_chassis_in_sbrec(pb, chassis_rec, true);
> >>> +        iface->chassis_update_required = false;
> >>> +    } else {
> >>> +        iface->chassis_update_required = true;
> >>> +    }
> >>>      switch (iface->state) {
> >>>      case OIF_CLAIMED:
> >>>      case OIF_INSTALL_FLOWS:
> >>> @@ -182,6 +244,12 @@ if_status_mgr_claim_iface(struct if_status_mgr
> >> *mgr, const char *iface_id)
> >>>      }
> >>>  }
> >>>
> >>> +bool
> >>> +if_status_mgr_iface_is_present(struct if_status_mgr *mgr, const char
> >> *iface_id)
> >>> +{
> >>> +    return !!shash_find_data(&mgr->ifaces, iface_id);
> >>> +}
> >>> +
> >>>  void
> >>>  if_status_mgr_release_iface(struct if_status_mgr *mgr, const char
> >> *iface_id)
> >>>  {
> >>> @@ -246,9 +314,39 @@ if_status_mgr_delete_iface(struct if_status_mgr
> >> *mgr, const char *iface_id)
> >>>      }
> >>>  }
> >>>
> >>> +bool
> >>> +if_status_handle_claims(struct if_status_mgr *mgr,
> >>> +                        struct local_binding_data *binding_data,
> >>> +                        const struct sbrec_chassis *chassis_rec,
> >>> +                        struct hmap *tracked_datapath,
> >>> +                        bool sb_readonly)
> >>> +{
> >>> +    if (!binding_data || sb_readonly) {
> >>> +        return false;
> >>> +    }
> >>> +
> >>> +    struct shash *bindings = &binding_data->bindings;
> >>> +    struct hmapx_node *node;
> >>> +
> >>> +    bool rc = false;
> >>> +    HMAPX_FOR_EACH (node, &mgr->ifaces_per_state[OIF_CLAIMED]) {
> >>> +        struct ovs_iface *iface = node->data;
> >>> +        if (iface->chassis_update_required) {
> >>
> >> Thanks Xavier for the revision. The state machine looks clearer now,
> >> but I have a major concern about the use of chassis_update_required. This
> >> bool flag is used to decide whether an update to SB is needed; once an SB
> >> update is requested, it is set to false, on the assumption that the SB
> >> update will succeed immediately. However, that assumption may be wrong:
> >> the subsequent SB update can fail or be delayed for various reasons, so
> >> the flag is not reliable. Instead, the CLAIMED state should be
> >> responsible for making sure the SB update is completed. While a
> >> transaction is in progress, sb_readonly is true; so if sb_readonly is
> >> false, nothing is in progress, and whenever (!sb_readonly && <SB chassis
> >> is not updated for the port-binding>) holds we should just send the
> >> update, regardless of whether we have requested it before. Please also
> >> see another comment below for the state transition.
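The check suggested above can be sketched as a pure function of the current state, with no "already requested" flag. This is only an illustration under stated assumptions: `struct chassis`, `struct port_binding` and `claimed_should_send_update()` are hypothetical, simplified stand-ins, not the real `sbrec_*` types or any function from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for sbrec_chassis and
 * sbrec_port_binding; only the fields needed for the sketch. */
struct chassis {
    const char *name;
};

struct port_binding {
    const char *logical_port;
    const struct chassis *chassis;   /* Chassis as currently seen in SB. */
};

/* Returns true if the pb->chassis update should be (re)sent now.
 *
 * The decision depends only on what is observable in this iteration:
 * whether SB is writable and whether SB already shows this chassis.
 * Nothing records that an update was requested earlier, so an update
 * that failed or was delayed is simply sent again. */
static bool
claimed_should_send_update(const struct port_binding *pb,
                           const struct chassis *chassis_rec,
                           bool sb_readonly)
{
    assert(pb && chassis_rec);

    if (sb_readonly) {
        /* A transaction is still in flight; try again later. */
        return false;
    }
    return pb->chassis != chassis_rec;
}
```

Because the function keeps no memory of earlier requests, calling it on every iteration while the interface is in CLAIMED state would retry the update until SB reflects it.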
> >>
> >>> +            VLOG_INFO("if_status_handle_claims for %s", iface->id);
> >>> +            local_binding_set_pb(bindings, iface->id, chassis_rec,
> >>> +                                 tracked_datapath, true);
> >>> +            rc = true;
> >>> +        }
> >>> +        iface->chassis_update_required = false;
> >>> +    }
> >>> +    return rc;
> >>> +}
> >>> +
> >>>  void
> >>>  if_status_mgr_update(struct if_status_mgr *mgr,
> >>> -                     struct local_binding_data *binding_data)
> >>> +                     struct local_binding_data *binding_data,
> >>> +                     const struct sbrec_chassis *chassis_rec,
> >>> +                     bool sb_readonly)
> >>>  {
> >>>      if (!binding_data) {
> >>>          return;
> >>> @@ -257,13 +355,25 @@ if_status_mgr_update(struct if_status_mgr *mgr,
> >>>      struct shash *bindings = &binding_data->bindings;
> >>>      struct hmapx_node *node;
> >>>
> >>> +    /* Interfaces in OIF_MARK_UP state have already set their pb->chassis.
> >>> +     * However, it might have been reset by another hv.
> >>> +     */
> >>>      /* Move all interfaces that have been confirmed "up" by the binding module,
> >>>       * from OIF_MARK_UP to OIF_INSTALLED.
> >>>       */
> >>>      HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_MARK_UP]) {
> >>>          struct ovs_iface *iface = node->data;
> >>>
> >>> -        if (local_binding_is_up(bindings, iface->id)) {
> >>> +        if (iface->chassis_update_required) {
> >>> +            if (!sb_readonly) {
> >>> +                iface->chassis_update_required = false;
> >>> +                local_binding_set_pb(bindings, iface->id, chassis_rec,
> >>> +                                     NULL, true);
> >>> +            } else {
> >>> +                continue;
> >>> +            }
> >>> +        }
> >>> +        if (local_binding_is_up(bindings, iface->id, chassis_rec)) {
> >>>              ovs_iface_set_state(mgr, iface, OIF_INSTALLED);
> >>>          }
> >>>      }
> >>> @@ -274,23 +384,53 @@ if_status_mgr_update(struct if_status_mgr *mgr,
> >>>      HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_MARK_DOWN]) {
> >>>          struct ovs_iface *iface = node->data;
> >>>
> >>> -        if (local_binding_is_down(bindings, iface->id)) {
> >>> +        if (!sb_readonly) {
> >>> +            local_binding_set_pb(bindings, iface->id, chassis_rec,
> >>> +                                 NULL, false);
> >>> +        }
> >>> +        if (local_binding_is_down(bindings, iface->id, chassis_rec)) {
> >>>              ovs_iface_destroy(mgr, iface);
> >>>          }
> >>>      }
> >>>
> >>> -    /* Register for a notification about flows being installed in OVS for all
> >>> -     * newly claimed interfaces.
> >>> +    if (!sb_readonly) {
> >>> +        HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_INSTALL_FLOWS]) {
> >>> +            struct ovs_iface *iface = node->data;
> >>> +
> >>> +            if (iface->chassis_update_required) {
> >>> +                iface->chassis_update_required = false;
> >>> +                local_binding_set_pb(bindings, iface->id, chassis_rec,
> >>> +                                     NULL, true);
> >>> +            }
> >>> +        }
> >>> +    }
> >>> +
> >>> +    /* Update Port_Binding->chassis for newly claimed interfaces
> >>> +     * Register for a notification about flows being installed in OVS for all
> >>> +     * newly claimed interfaces for which we could update pb->chassis.
> >>>       *
> >>>       * Move them from OIF_CLAIMED to OIF_INSTALL_FLOWS.
> >>>       */
> >>> -    bool new_ifaces = false;
> >>> -    HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_CLAIMED]) {
> >>> -        struct ovs_iface *iface = node->data;
> >>>
> >>> -        ovs_iface_set_state(mgr, iface, OIF_INSTALL_FLOWS);
> >>> -        iface->install_seqno = mgr->iface_seqno + 1;
> >>> -        new_ifaces = true;
> >>> +    bool new_ifaces = false;
> >>> +    if (!sb_readonly) {
> >>> +        HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_CLAIMED]) {
> >>> +            struct ovs_iface *iface = node->data;
> >>> +            /* No need to check for chassis_update_required as already done
> >>> +             * in if_status_handle_claims or if_status_mgr_claim_iface
> >>> +             */
> >>> +            ovs_iface_set_state(mgr, iface, OIF_INSTALL_FLOWS);
> >>
> >> We need to make sure the chassis in the SB port-binding is up to date (i.e.
> >> the update notification from the SB DB has been received) before moving to
> >> INSTALL_FLOWS. Otherwise, it is still possible that the state is moved too
> >> early and we end up with incomplete flow installation for the lport when
> >> the state is finally moved to INSTALLED.
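One possible reading of this transition rule, as a minimal sketch: an interface leaves CLAIMED only when no SB transaction is in flight and the port binding already shows this chassis. The types and `claimed_next_state()` below are hypothetical and reduced for illustration; the enum keeps just two of the patch's states, and treating "SB writable and chassis matches" as "the update notification has been received" is an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Reduced copy of the state enum: only the two states involved here. */
enum if_state {
    OIF_CLAIMED,
    OIF_INSTALL_FLOWS,
};

/* Hypothetical, simplified stand-ins for the real SB IDL records. */
struct chassis {
    const char *name;
};

struct port_binding {
    const struct chassis *chassis;   /* Chassis as currently seen in SB. */
};

/* Returns the next state for an interface currently in OIF_CLAIMED.
 *
 * Assumption of this sketch: when SB is writable (no transaction in
 * flight) and pb->chassis equals chassis_rec, the chassis update has been
 * confirmed by SB, so it is safe to start waiting for flow installation. */
static enum if_state
claimed_next_state(const struct port_binding *pb,
                   const struct chassis *chassis_rec,
                   bool sb_readonly)
{
    assert(pb && chassis_rec);

    if (!sb_readonly && pb->chassis == chassis_rec) {
        return OIF_INSTALL_FLOWS;
    }
    /* Update not sent yet, or sent but not confirmed: stay in CLAIMED. */
    return OIF_CLAIMED;
}
```

With this shape, the seqno request tied to INSTALL_FLOWS would only be issued for ports whose chassis is already visible in SB.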
> >>
> >> Thanks,
> >> Han
> >>
> >>> +            iface->install_seqno = mgr->iface_seqno + 1;
> >>> +            new_ifaces = true;
> >>> +        }
> >>> +    } else {
> >>> +        HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_CLAIMED]) {
> >>> +            struct ovs_iface *iface = node->data;
> >>> +            static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1);
> >>> +            VLOG_INFO_RL(&rl,
> >>> +                         "Not updating pb chassis for %s now as "
> >>> +                         "sb is readonly", iface->id);
> >>> +        }
> >>>      }
> >>>
> >>>      /* Request a seqno update when the flows for new interfaces have
> >> been
> >>> @@ -403,7 +543,7 @@ if_status_mgr_update_bindings(struct if_status_mgr
> >> *mgr,
> >>>      struct hmapx_node *node;
> >>>
> >>>      /* Notify the binding module to set "down" all bindings that are
> >> still
> >>> -     * in the process of being installed in OVS, i.e., are not yet
> >> instsalled.
> >>> +     * in the process of being installed in OVS, i.e., are not yet
> >> installed.
> >>>       */
> >>>      HMAPX_FOR_EACH (node, &mgr->ifaces_per_state[OIF_INSTALL_FLOWS])
{
> >>>          struct ovs_iface *iface = node->data;
> >>> diff --git a/controller/if-status.h b/controller/if-status.h
> >>> index bb8a3950d..f9b05d30d 100644
> >>> --- a/controller/if-status.h
> >>> +++ b/controller/if-status.h
> >>> @@ -27,15 +27,27 @@ struct if_status_mgr *if_status_mgr_create(void);
> >>>  void if_status_mgr_clear(struct if_status_mgr *);
> >>>  void if_status_mgr_destroy(struct if_status_mgr *);
> >>>
> >>> -void if_status_mgr_claim_iface(struct if_status_mgr *, const char
> >> *iface_id);
> >>> +void if_status_mgr_claim_iface(struct if_status_mgr *,
> >>> +                               const struct sbrec_port_binding *pb,
> >>> +                               const struct sbrec_chassis
*chassis_rec,
> >>> +                               bool sb_readonly);
> >>>  void if_status_mgr_release_iface(struct if_status_mgr *, const char
> >> *iface_id);
> >>>  void if_status_mgr_delete_iface(struct if_status_mgr *, const char
> >> *iface_id);
> >>>
> >>> -void if_status_mgr_update(struct if_status_mgr *, struct
> >> local_binding_data *);
> >>> +void if_status_mgr_update(struct if_status_mgr *, struct
> >> local_binding_data *,
> >>> +                          const struct sbrec_chassis *chassis,
> >>> +                          bool sb_readonly);
> >>>  void if_status_mgr_run(struct if_status_mgr *mgr, struct
> >> local_binding_data *,
> >>>                         const struct sbrec_chassis *,
> >>>                         bool sb_readonly, bool ovs_readonly);
> >>>  void if_status_mgr_get_memory_usage(struct if_status_mgr *mgr,
> >>>                                      struct simap *usage);
> >>> +bool if_status_mgr_iface_is_present(struct if_status_mgr *mgr,
> >>> +                                    const char *iface_id);
> >>> +bool if_status_handle_claims(struct if_status_mgr *mgr,
> >>> +                             struct local_binding_data *binding_data,
> >>> +                             const struct sbrec_chassis *chassis_rec,
> >>> +                             struct hmap *tracked_datapath,
> >>> +                             bool sb_readonly);
> >>>
> >>>  # endif /* controller/if-status.h */
> >>> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
> >>> index 69615308e..3947baf03 100644
> >>> --- a/controller/ovn-controller.c
> >>> +++ b/controller/ovn-controller.c
> >>> @@ -1464,6 +1464,73 @@ en_runtime_data_run(struct engine_node *node,
> >> void *data)
> >>>      engine_set_node_state(node, EN_UPDATED);
> >>>  }
> >>>
> >>> +struct ed_type_sb_ro {
> >>> +    bool sb_readonly;
> >>> +};
> >>> +
> >>> +static void *
> >>> +en_sb_ro_init(struct engine_node *node OVS_UNUSED,
> >>> +              struct engine_arg *arg OVS_UNUSED)
> >>> +{
> >>> +    struct ed_type_sb_ro *data = xzalloc(sizeof *data);
> >>> +    return data;
> >>> +}
> >>> +
> >>> +static void
> >>> +en_sb_ro_run(struct engine_node *node, void *data)
> >>> +{
> >>> +    struct ed_type_sb_ro *sb_ro_data = data;
> >>> +    bool sb_readonly = !engine_get_context()->ovnsb_idl_txn;
> >>> +    if (sb_ro_data->sb_readonly != sb_readonly) {
> >>> +        sb_ro_data->sb_readonly = sb_readonly;
> >>> +        if (!sb_ro_data->sb_readonly) {
> >>> +            engine_set_node_state(node, EN_UPDATED);
> >>> +        }
> >>> +    }
> >>> +}
> >>> +
> >>> +static void
> >>> +en_sb_ro_cleanup(void *data OVS_UNUSED)
> >>> +{
> >>> +}
> >>> +
> >>> +static bool
> >>> +runtime_data_sb_ro_handler(struct engine_node *node, void *data)
> >>> +{
> >>> +    const struct sbrec_chassis *chassis = NULL;
> >>> +
> >>> +    struct ovsrec_open_vswitch_table *ovs_table =
> >>> +        (struct ovsrec_open_vswitch_table *)EN_OVSDB_GET(
> >>> +            engine_get_input("OVS_open_vswitch", node));
> >>> +
> >>> +    const char *chassis_id = get_ovs_chassis_id(ovs_table);
> >>> +
> >>> +    struct ovsdb_idl_index *sbrec_chassis_by_name =
> >>> +        engine_ovsdb_node_get_index(
> >>> +                engine_get_input("SB_chassis", node),
> >>> +                "name");
> >>> +
> >>> +    if (chassis_id) {
> >>> +        chassis = chassis_lookup_by_name(sbrec_chassis_by_name,
> >> chassis_id);
> >>> +    }
> >>> +    if (chassis) {
> >>> +        struct ed_type_runtime_data *rt_data = data;
> >>> +        bool sb_readonly = !engine_get_context()->ovnsb_idl_txn;
> >>> +        struct controller_engine_ctx *ctrl_ctx =
> >>> +            engine_get_context()->client_ctx;
> >>> +
> >>> +        if (if_status_handle_claims(ctrl_ctx->if_mgr,
> >>> +                                    &rt_data->lbinding_data,
> >>> +                                    chassis,
> >>> +                                    &rt_data->tracked_dp_bindings,
> >>> +                                    sb_readonly)) {
> >>> +            engine_set_node_state(node, EN_UPDATED);
> >>> +            rt_data->tracked = true;
> >>> +        }
> >>> +    }
> >>> +    return true;
> >>> +}
> >>> +
> >>>  static bool
> >>>  runtime_data_ovs_interface_shadow_handler(struct engine_node *node,
> >> void *data)
> >>>  {
> >>> @@ -3528,6 +3595,7 @@ main(int argc, char *argv[])
> >>>      stopwatch_create(VIF_PLUG_RUN_STOPWATCH_NAME, SW_MS);
> >>>
> >>>      /* Define inc-proc-engine nodes. */
> >>> +    ENGINE_NODE(sb_ro, "sb_ro");
> >>>      ENGINE_NODE_WITH_CLEAR_TRACK_DATA_IS_VALID(ct_zones, "ct_zones");
> >>>      ENGINE_NODE_WITH_CLEAR_TRACK_DATA(ovs_interface_shadow,
> >>>                                        "ovs_interface_shadow");
> >>> @@ -3664,6 +3732,7 @@ main(int argc, char *argv[])
> >>>      engine_add_input(&en_ovs_interface_shadow, &en_ovs_interface,
> >>>                       ovs_interface_shadow_ovs_interface_handler);
> >>>
> >>> +    engine_add_input(&en_runtime_data, &en_sb_ro,
> >> runtime_data_sb_ro_handler);
> >>>      engine_add_input(&en_runtime_data, &en_ofctrl_is_connected, NULL);
> >>>
> >>>      engine_add_input(&en_runtime_data, &en_ovs_open_vswitch, NULL);
> >>> @@ -4098,7 +4167,8 @@ main(int argc, char *argv[])
> >>>                          runtime_data ? &runtime_data->lbinding_data :
> >> NULL;
> >>>                      stopwatch_start(IF_STATUS_MGR_UPDATE_STOPWATCH_NAME,
> >>>                                      time_msec());
> >>> -                    if_status_mgr_update(if_mgr, binding_data);
> >>> +                    if_status_mgr_update(if_mgr, binding_data, chassis,
> >>> +                                         !ovnsb_idl_txn);
> >>>                      stopwatch_stop(IF_STATUS_MGR_UPDATE_STOPWATCH_NAME,
> >>>                                     time_msec());
> >>>
> >>> diff --git a/tests/ovn-macros.at b/tests/ovn-macros.at
> >>> index 335f9158c..8fd6ae6f7 100644
> >>> --- a/tests/ovn-macros.at
> >>> +++ b/tests/ovn-macros.at
> >>> @@ -759,3 +759,15 @@ m4_define([OVN_FOR_EACH_NORTHD_WITHOUT_DP_GROUPS],
> >>>       [m4_foreach([NORTHD_USE_DP_GROUPS], [no],
> >>>         [m4_foreach([NORTHD_USE_PARALLELIZATION], [yes, no], [$1
> >>>  ])])])])
> >>> +
> >>> +# OVN_NBCTL(NBCTL_COMMAND) adds NBCTL_COMMAND to list of commands to be run by RUN_OVN_NBCTL().
> >>> +m4_define([OVN_NBCTL], [
> >>> +    command="${command} -- $1"
> >>> +])
> >>> +
> >>> +# RUN_OVN_NBCTL() executes list of commands built by the OVN_NBCTL()
> >> macro.
> >>> +m4_define([RUN_OVN_NBCTL], [
> >>> +    check ovn-nbctl ${command}
> >>> +    unset command
> >>> +])
> >>> +
> >>> diff --git a/tests/ovn.at b/tests/ovn.at
> >>> index bfaa41962..94d16bac9 100644
> >>> --- a/tests/ovn.at
> >>> +++ b/tests/ovn.at
> >>> @@ -102,6 +102,18 @@ m4_divert_text([PREPARE_TESTS],
> >>>           test 1 -le $(as $hv1 ovs-ofctl dump-flows br-int | grep -c
> >> "output:$ofport")
> >>>       ])
> >>>     }
> >>> +
> >>> +   ovn_wait_remote_input_flows () {
> >>> +     hv1=$1
> >>> +     hv2=$2
> >>> +     echo "$3: waiting for flows for remote input on $hv1"
> >>> +     # Wait for a flow outputing  to remote input
> >>> +     OVS_WAIT_UNTIL([
> >>> +         ofport=$(as $hv1 ovs-vsctl --bare --columns ofport find
> >> Interface name=ovn-${hv2}-0)
> >>> +         echo "tunnel port=$ofport"
> >>> +         test 1 -le $(as $hv1 ovs-ofctl dump-flows br-int | grep -c
> >> "in_port=$ofport")
> >>> +     ])
> >>> +   }
> >>>  ])
> >>>
> >>>  m4_define([OVN_CHECK_PACKETS],
> >>> @@ -127,6 +139,8 @@ m4_define([OVN_WAIT_PATCH_PORT_FLOWS],
> >>>  m4_define([OVN_WAIT_REMOTE_OUTPUT_FLOWS],
> >>>    [ovn_wait_remote_output_flows "$1" "$2" "__file__:__line__"])
> >>>
> >>> +m4_define([OVN_WAIT_REMOTE_INPUT_FLOWS],
> >>> +  [ovn_wait_remote_input_flows "$1" "$2" "__file__:__line__"])
> >>>
> >>>  AT_BANNER([OVN components])
> >>>
> >>> @@ -14056,6 +14070,11 @@ wait_column "$hv1_uuid" Port_Binding
> >> requested_chassis logical_port=lsp0
> >>>  wait_column "$hv2_uuid" Port_Binding additional_chassis
> >> logical_port=lsp0
> >>>  wait_column "$hv2_uuid" Port_Binding requested_additional_chassis
> >> logical_port=lsp0
> >>>
> >>> +# Check ovn-installed updated for main chassis
> >>> +wait_for_ports_up
> >>> +OVS_WAIT_UNTIL([test `as hv1 ovs-vsctl get Interface lsp0
> >> external_ids:ovn-installed` = '"true"'])
> >>> +OVS_WAIT_UNTIL([test x`as hv2 ovs-vsctl get Interface lsp0
> >> external_ids:ovn-installed` = x])
> >>> +
> >>>  # Check that setting iface:encap-ip populates
> >> Port_Binding:additional_encap
> >>>  wait_row_count Encap 2 chassis_name=hv1
> >>>  wait_row_count Encap 2 chassis_name=hv2
> >>> @@ -14081,6 +14100,11 @@ wait_column "$hv2_uuid" Port_Binding
> >> requested_chassis logical_port=lsp0
> >>>  wait_column "" Port_Binding additional_chassis logical_port=lsp0
> >>>  wait_column "" Port_Binding requested_additional_chassis
> >> logical_port=lsp0
> >>>
> >>> +# Check ovn-installed updated for main chassis and not for other chassis
> >>> +wait_for_ports_up
> >>> +OVS_WAIT_UNTIL([test `as hv2 ovs-vsctl get Interface lsp0
> >> external_ids:ovn-installed` = '"true"'])
> >>> +OVS_WAIT_UNTIL([test x`as hv1 ovs-vsctl get Interface lsp0
> >> external_ids:ovn-installed` = x])
> >>> +
> >>>  # Check that additional_encap is cleared
> >>>  wait_column "" Port_Binding additional_encap logical_port=lsp0
> >>>
> >>> @@ -15370,7 +15394,10 @@ AT_CHECK([as hv1 ovs-ofctl dump-flows br-int
> >> table=65 | grep actions=output:1],
> >>>  echo "verifying that lsp0 binding moves when requested-chassis is
> >> changed"
> >>>
> >>>  ovn-nbctl lsp-set-options lsp0 requested-chassis=hv2
> >>> -OVS_WAIT_UNTIL([test 1 = $(grep -c "Releasing lport lsp0 from this
> >> chassis" hv1/ovn-controller.log)])
> >>> +
> >>> +# We might see multiple "Releasing lport ...", when sb is read only
> >>> +OVS_WAIT_UNTIL([test 1 -le $(grep -c "Releasing lport lsp0 from this
> >> chassis" hv1/ovn-controller.log)])
> >>> +
> >>>  wait_column "$hv2_uuid" Port_Binding chassis logical_port=lsp0
> >>>
> >>>  # (6) Chassis hv2 should add flows and hv1 should not.
> >>> @@ -15416,7 +15443,7 @@ AT_CHECK([as hv1 ovs-ofctl dump-flows br-int
> >> table=0 | grep in_port=1], [0], [ig
> >>>  AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=65 | grep
> >> actions=output:1], [0], [ignore])
> >>>
> >>>  check ovn-nbctl --wait=hv lsp-set-options lsp0
> >> requested-chassis=non-existant-chassis
> >>> -OVS_WAIT_UNTIL([test 1 = $(grep -c "Releasing lport lsp0 from this
> >> chassis" hv1/ovn-controller.log)])
> >>> +OVS_WAIT_UNTIL([test 1 -le $(grep -c "Releasing lport lsp0 from this
> >> chassis" hv1/ovn-controller.log)])
> >>>  check ovn-nbctl --wait=hv sync
> >>>  wait_column '' Port_Binding chasssi logical_port=lsp0
> >>>  AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=0 | grep in_port=1], [1], [])
> >>> @@ -32418,3 +32445,119 @@ AT_CHECK([test $(ovn-sbctl list fdb | grep -c "00:00:00:00:10:30") = 0])
> >>>  OVN_CLEANUP([hv1])
> >>>  AT_CLEANUP
> >>>  ])
> >>> +
> >>> +OVN_FOR_EACH_NORTHD([
> >>> +AT_SETUP([recomputes])
> >>> +ovn_start
> >>> +
> >>> +n_hv=4
> >>> +
> >>> +# Add chassis
> >>> +net_add n1
> >>> +for i in $(seq 1 $n_hv); do
> >>> +    sim_add hv$i
> >>> +    as hv$i
> >>> +    check ovs-vsctl add-br br-phys
> >>> +    ovs-vsctl set open . external-ids:ovn-bridge-mappings=phys:br-phys
> >>> +    ovn_attach n1 br-phys 192.168.0.$i 24 geneve
> >>> +done
> >>> +
> >>> +add_switch_ports() {
> >>> +    start_port=$1
> >>> +    end_port=$2
> >>> +    nb_hv=$3
> >>> +    bulk_size=$4
> >>> +    for ((i=start_port; i<end_port; )) do
> >>> +        start_bulk=$i
> >>> +        for hv in $(seq 1 $nb_hv); do
> >>> +            end_bulk=$((start_bulk+bulk_size-1))
> >>> +            for port in $(seq $start_bulk $end_bulk); do
> >>> +                logical_switch_port=lsp${port}
> >>> +                OVN_NBCTL(lsp-add ls1 $logical_switch_port)
> >>> +                OVN_NBCTL(lsp-set-addresses $logical_switch_port
> >> dynamic)
> >>> +            done
> >>> +            start_bulk=$((end_bulk+1))
> >>> +        done
> >>> +        RUN_OVN_NBCTL()
> >>> +
> >>> +        start_bulk=$i
> >>> +        for hv in $(seq 1 $nb_hv); do
> >>> +            end_bulk=$((start_bulk+bulk_size-1))
> >>> +            for port in $(seq $start_bulk $end_bulk); do
> >>> +                logical_switch_port=lsp${port}
> >>> +                as hv$hv ovs-vsctl \
> >>> +                    --no-wait -- add-port br-int vif${port} \
> >>> +                    -- set Interface vif${port}
> >> external_ids:iface-id=$logical_switch_port
> >>> +            done
> >>> +            start_bulk=$((end_bulk+1))
> >>> +        done
> >>> +        i=$((end_bulk+1))
> >>> +    done
> >>> +}
> >>> +check ovn-nbctl ls-add ls1
> >>> +check ovn-nbctl set Logical_Switch ls1 other_config:subnet=10.1.0.0/16
> >>> +check ovn-nbctl set Logical_Switch ls1
> >> other_config:exclude_ips=10.1.255.254
> >>> +
> >>> +check ovn-nbctl lr-add lr1
> >>> +check ovn-nbctl lsp-add ls1 lsp0 -- set Logical_Switch_Port lsp0
> >> type=router options:router-port=lrp0 addresses=dynamic
> >>> +check ovn-nbctl lrp-add lr1 lrp0 "f0:00:00:01:00:01" 10.1.255.254/16
> >>> +check ovn-nbctl lr-nat-add lr1 snat 10.2.0.1 10.1.0.0/16
> >>> +
> >>> +lflow_run=0
> >>> +check ovn-nbctl --wait=hv sync
> >>> +
> >>> +# Tunnel ports might not be added (yet) at this point on slow system.
> >>> +# Wait for flows related to such ports to ensure those ports have been added
> >>> +# before we measure recomputes. Otherwise, ovs_interface handler might be run
> >>> +# afterwards for tunnel ports, causing recomputes.
> >>> +for i in $(seq 1 $n_hv); do
> >>> +    for j in $(seq 1 $n_hv); do
> >>> +        if test $i != $j; then
> >>> +            OVN_WAIT_REMOTE_INPUT_FLOWS(["hv$i"],["hv$j"])
> >>> +        fi
> >>> +    done
> >>> +done
> >>> +
> >>> +for i in $(seq 1 $n_hv); do
> >>> +    as hv$i
> >>> +    lflow_run1=$(ovn-appctl -t ovn-controller coverage/read-counter
> >> lflow_run)
> >>> +    lflow_run=`expr $lflow_run1 + $lflow_run`
> >>> +done
> >>> +
> >>> +add_switch_ports 1 1000 $n_hv 5
> >>> +
> >>> +wait_for_ports_up
> >>> +check ovn-nbctl --wait=hv sync
> >>> +
> >>> +for i in $(seq 1 $n_hv); do
> >>> +    pid=$(cat hv${i}/ovn-controller.pid)
> >>> +    u=$((u+$(cat "/proc/$pid/stat" | awk '{print $14}')))
> >>> +    s=$((s+$(cat "/proc/$pid/stat" | awk '{print $15}')))
> >>> +done
> >>> +
> >>> +n_pid=$(cat northd/ovn-northd.pid)
> >>> +n_u=$(cat "/proc/$pid/stat" | awk '{print $14}')
> >>> +n_s=$(cat "/proc/$pid/stat" | awk '{print $15}')
> >>> +
> >>> +echo "Total Northd User Time: $n_u"
> >>> +echo "Total Northd System Time: $n_s"
> >>> +echo "Total Controller User Time: $u"
> >>> +echo "Total Controller System Time: $s"
> >>> +
> >>> +lflow_run_end=0
> >>> +for i in $(seq 1 $n_hv); do
> >>> +    as hv$i
> >>> +    lflow_run1=$(ovn-appctl -t ovn-controller coverage/read-counter
> >> lflow_run)
> >>> +    lflow_run_end=`expr $lflow_run1 + $lflow_run_end`
> >>> +done
> >>> +n_recomputes=`expr $lflow_run_end - $lflow_run`
> >>> +echo "$n_recomputes recomputes"
> >>> +
> >>> +AT_CHECK([test $lflow_run_end == $lflow_run])
> >>> +
> >>> +for i in $(seq 2 $n_hv); do
> >>> +    OVN_CLEANUP_SBOX([hv$i])
> >>> +done
> >>> +OVN_CLEANUP([hv1])
> >>> +AT_CLEANUP
> >>> +])
> >>> diff --git a/tests/perf-northd.at b/tests/perf-northd.at
> >>> index 74b69e9d4..6ec196b36 100644
> >>> --- a/tests/perf-northd.at
> >>> +++ b/tests/perf-northd.at
> >>> @@ -76,23 +76,6 @@ m4_define([PERF_RECORD_STOP], [
> >>>      PERF_RECORD_STOPWATCH(ovn-northd-loop, ["Short term average"],
> >> [Average (northd-loop in msec)])
> >>>  ])
> >>>
> >>> -# OVN_NBCTL([NBCTL_COMMAND])
> >>> -#
> >>> -# Add NBCTL_COMMAND to list of commands to be run by RUN_OVN_NBCTL().
> >>> -#
> >>> -m4_define([OVN_NBCTL], [
> >>> -    command="${command} -- $1"
> >>> -])
> >>> -
> >>> -# RUN_OVN_NBCTL()
> >>> -#
> >>> -# Execute list of commands built by the OVN_NBCTL() macro.
> >>> -#
> >>> -m4_define([RUN_OVN_NBCTL], [
> >>> -    check ovn-nbctl ${command}
> >>> -    unset command
> >>> -])
> >>> -
> >>>  OVS_START_SHELL_HELPERS
> >>>  generate_subnet () {
> >>>      local a=$(printf %d $(expr $1 / 256 + 10))
> >>> --
> >>> 2.31.1
> >>>
> >>
> >
>
Dumitru Ceara July 12, 2022, 7:35 a.m. UTC | #6
On 7/12/22 08:52, Han Zhou wrote:
> On Mon, Jul 11, 2022 at 4:55 AM Dumitru Ceara <dceara@redhat.com> wrote:
>>
>> On 7/11/22 13:31, Xavier Simonart wrote:
>>> Hi Han
>>>
>>> Thanks for your review.
>>>
>>> Let me try to understand your two main concerns and the proper way to
> fix
>>> it.
>>> 1) We only try once to write pb->chassis. If the commit fails,
> pb->chassis
>>> is not written. As commit fails, we will recompute, but as the
>>> update_required flag is not set anymore, we might end up with no
>>> pb->chassis.
>>> => I'll remove the flag and try to update until it's confirmed.
> 
> Thank you!
> 
>>> 2) The state machine, and when we move to INSTALL_FLOWS. Serializing the
>>> state machine, by waiting for confirmation to be received before moving
> to
>>> INSTALL_FLOWS state will delay the ovn-installed compared to today. So
> I am
>>> (still) trying to see if there is any way to prevent this in some cases.
>>> Would it be correct to do this serialization (wait for pb->chassis
> update
>>> confirmation) only when using conditional monitoring? When using
>>> monitor-all, as soon as we have written (w/o confirmation) pb->chassis,
> we
>>> would move to INSTALL_FLOWS. In that loop where we wrote pb->chassis,
> (all)
>>> the flows should be updated taking into account pb->chassis.
>>
>> I think this approach is OK.  ovn-controllers that don't use conditional
>> monitoring already know the complete SB contents and should be able to
>> install (mostly?) complete sets of openflows that correspond to a given
>> Port_Binding.
>>
>> It's also "CMS-friendly", at least for ovn-kubernetes, which uses
>> ovn-monitor-all=true and only waits for
>> OVS.Interface.external_ids:ovn-installed=true.  And shouldn't impact the
>> others which wait for SB.Port_Binding.up=true.
>>
> 
> Xavier and Dumitru, I think we shouldn't introduce the dependency of
> "ovn-monitor-all" setting here.
> 
> First of all, ovn-installed is a flag for CMS to understand that all the
> flows related to the port-binding is installed. If we set the flag before
> it is truly completed, it is a bug, and it is possible that the flag is set
> but some traffic doesn't work.

I think it's a matter of semantics.  The way I see "ovn-installed=true"
is: all flows that are relevant to the port binding on the local chassis
have been installed.  When we added it, it was for the ovn-k8s case;
ovn-k8s used to explicitly check if some openflow tables on the node
where the pod is being brought up contained flows that seemed to
correspond to the pod (e.g., matching on pod MAC and IP addresses).

> I did a quick test, and at least a flow in (table_id=38, priority=100)
> which is multicast-group related is updated AFTER the SB notification is
> received for the port-binding chassis update.
> 

This sounds like something we should fix, I think.  I don't see any
multicast-group changes conditioned by the port_binding being up=true in
northd.  I might be wrong though.

> Secondly, if the change hasn't made it to the SB, all the other nodes would
> not be able to reach the port, which means the workload (pod/VM) cannot
> receive traffic yet at this phase.
> 

Even if the change made it to the SB, we have no way of knowing that all
other nodes processed it, so we cannot know for sure that traffic can
flow properly end-to-end.  But, like I said above, this doesn't matter
if the semantics of ovn-installed=true are "all locally relevant flows
are installed".

> So, I think our goal is not to set ovn-installed early, but to set it
> accurately (sometime may be ok to be conservative).
> 

Sure, but waiting for the SB port_binding.chassis update might introduce
significant spikes in latency if the SB is compacting (or just busy) at
that moment.

This might become an issue in large scale deployments as pods will take
longer to be declared "ready".

> In addition, ovn-monitor-all is not always true even in ovn-k8s. It is
> configurable in ovn-k8s. (in our environment we set it to false, to save
> MEM and CPU for worker nodes, while sacrifice a little for the central SB
> DB)
> 

Ack.  But for this case specifically, as SB is already busier with
conditional monitoring, I think serializing events in ovn-controller
will create even more visible delays in pod bringup times.
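
For reference, the two gating options discussed in this thread can be
sketched in C as below.  This is only an illustration under stated
assumptions: every name in it (claim_gate_policy, claim_progress,
policy_for_monitor_mode, may_install_flows) is hypothetical and does not
exist in ovn-controller or in the patch.  It models Xavier's proposal of
waiting for the SB confirmation of pb->chassis only when conditional
monitoring is used.

```c
#include <stdbool.h>

/* Hypothetical illustration only: none of these names exist in
 * ovn-controller or in the patch under review. */

/* Which event must have happened before a claimed interface may leave
 * the CLAIMED state and move on to INSTALL_FLOWS. */
enum claim_gate_policy {
    GATE_ON_LOCAL_WRITE,      /* pb->chassis written in an SB transaction. */
    GATE_ON_SB_CONFIRMATION,  /* SB notification for that write received. */
};

struct claim_progress {
    bool chassis_written;     /* Update was queued towards the SB. */
    bool chassis_confirmed;   /* SB reported the update back to us. */
};

/* Proposal as read from the thread: serialize on the SB confirmation
 * only when conditional monitoring (monitor_all == false) is in use. */
static enum claim_gate_policy
policy_for_monitor_mode(bool monitor_all)
{
    return monitor_all ? GATE_ON_LOCAL_WRITE : GATE_ON_SB_CONFIRMATION;
}

/* Returns true if the interface may proceed to installing flows. */
static bool
may_install_flows(enum claim_gate_policy policy,
                  const struct claim_progress *p)
{
    switch (policy) {
    case GATE_ON_LOCAL_WRITE:
        return p->chassis_written;
    case GATE_ON_SB_CONFIRMATION:
        return p->chassis_written && p->chassis_confirmed;
    }
    return false;
}
```

Han's objection amounts to always using GATE_ON_SB_CONFIRMATION,
regardless of the monitor mode; the latency concern above is about the
time spent between the write and its confirmation under that policy.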

Thanks,
Dumitru

> Thanks,
> Han
>>>
>>> Thanks again for your feedback
>>>
>>> Xavier
>>>
>>> On Thu, Jul 7, 2022 at 8:49 AM Han Zhou <hzhou@ovn.org> wrote:
>>>
>>>>
>>>>
>>>> On Wed, Jun 22, 2022 at 3:23 AM Xavier Simonart <xsimonar@redhat.com>
>>>> wrote:
>>>>>
>>>>> When VIF ports are claimed on a chassis, SBDB Port_Binding table is
>>>> updated.
>>>>> If the SBDB IDL is still is read-only ("in transaction") when such a
>>>> update
>>>>> is required, the update is not possible and recompute is triggered
>>>> through
>>>>> I+P failure.
>>>>>
>>>>> This situation can happen:
>>>>> - after updating Port_Binding->chassis to SBDB for one port, in a
>>>> following
>>>>>   iteration, ovn-controller handles
> Interface:external_ids:ovn-installed
>>>>>   (for the same port) while SBDB is still read-only.
>>>>> - after updating Port_Binding->chassis to SBDB for one port, in a
>>>> following
>>>>>   iteration, ovn-controller updates Port_Binding->chassis for another
>>>> port,
>>>>>   while SBDB is still read-only.
>>>>>
>>>>> This patch prevent the recompute, by having the if-status module
>>>>> updating the Port_Binding chassis (if needed) when possible.
>>>>> This does not delay Port_Binding chassis update compared to before
> this
>>>> patch.
>>>>> - With the patch, Port_Binding chassis will be updated as soon as
> SBDB is
>>>>> again writable, without recompute.
>>>>> - Without the patch, Port_Binding chassis was updated as soon as SBDB
> was
>>>>> again writable, through a recompute.
>>>>>
>>>>> As part of this patch, ovn-installed will not be updated for
> additional
>>>> chassis;
>>>>> it will only be updated when the migration is completed.
>>>>>
>>>>> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2059253
>>>>> Signed-off-by: Xavier Simonart <xsimonar@redhat.com>
>>>>>
>>>>> ---
>>>>> v2:  - handled Dumitru's comments.
>>>>>      - handled Han's comments, mainly ensure we moved out of CLAIMED
>>>> state
>>>>>        only after updating pb->chassis to guarentee physical flows are
>>>> installed
>>>>>        when ovn-installed is updated in OVS.
>>>>>      - slighly reorganize the code to isolate 'notify_up = false'
> cases
>>>> in
>>>>>        claim_port (i.e. ports such as virtual ports), in the idea of
>>>> making
>>>>>        future patch preventing recomputes when virtual ports are
> claimed.
>>>>>      - updated test case to cause more race conditions.
>>>>>      - rebased on origin/main
>>>>>      - note that "additional chassis" as now supported by
>>>>>        "Support LSP:options:requested-chassis as a list" might still
>>>> cause
>>>>>        recomputes.
>>>>>      - fixed missing flows when Port_Binding chassis was updated by
>>>> mgr_update
>>>>>        w/o any lflow recalculation.
>>>>> v3:  - handled Dumitru's comments on v2, mainly have runtime_data
> handler
>>>>>        handling pb_claims when sb becomes writable (instead of a lflow
>>>> handler).
>>>>>      - fixed test as it was not checking recomputes on all hv, as well
>>>> as a flaky
>>>>>        behavior.
>>>>>      - rebased on origin/main.
>>>>> ---
>>>>>  controller/binding.c        | 154 +++++++++++++++++++++----------
>>>>>  controller/binding.h        |  15 +++-
>>>>>  controller/if-status.c      | 174
> ++++++++++++++++++++++++++++++++----
>>>>>  controller/if-status.h      |  16 +++-
>>>>>  controller/ovn-controller.c |  72 ++++++++++++++-
>>>>>  tests/ovn-macros.at         |  12 +++
>>>>>  tests/ovn.at                | 147 +++++++++++++++++++++++++++++-
>>>>>  tests/perf-northd.at        |  17 ----
>>>>>  8 files changed, 519 insertions(+), 88 deletions(-)
>>>>>
>>>>> diff --git a/controller/binding.c b/controller/binding.c
>>>>> index 2279570f9..b21577f71 100644
>>>>> --- a/controller/binding.c
>>>>> +++ b/controller/binding.c
>>>>> @@ -644,11 +644,17 @@ local_binding_get_lport_ofport(const struct
> shash
>>>> *local_bindings,
>>>>>  }
>>>>>
>>>>>  bool
>>>>> -local_binding_is_up(struct shash *local_bindings, const char
> *pb_name)
>>>>> +local_binding_is_up(struct shash *local_bindings, const char
> *pb_name,
>>>>> +                    const struct sbrec_chassis *chassis_rec)
>>>>>  {
>>>>>      struct local_binding *lbinding =
>>>>>          local_binding_find(local_bindings, pb_name);
>>>>>      struct binding_lport *b_lport =
>>>> local_binding_get_primary_lport(lbinding);
>>>>> +
>>>>> +    if (b_lport && b_lport->pb->chassis != chassis_rec) {
>>>>> +        return false;
>>>>> +    }
>>>>> +
>>>>>      if (lbinding && b_lport && lbinding->iface) {
>>>>>          if (b_lport->pb->n_up && !b_lport->pb->up[0]) {
>>>>>              return false;
>>>>> @@ -660,13 +666,23 @@ local_binding_is_up(struct shash
> *local_bindings,
>>>> const char *pb_name)
>>>>>  }
>>>>>
>>>>>  bool
>>>>> -local_binding_is_down(struct shash *local_bindings, const char
> *pb_name)
>>>>> +local_binding_is_down(struct shash *local_bindings, const char
> *pb_name,
>>>>> +                      const struct sbrec_chassis *chassis_rec)
>>>>>  {
>>>>>      struct local_binding *lbinding =
>>>>>          local_binding_find(local_bindings, pb_name);
>>>>>
>>>>>      struct binding_lport *b_lport =
>>>> local_binding_get_primary_lport(lbinding);
>>>>>
>>>>> +    if (b_lport) {
>>>>> +        if (b_lport->pb->chassis == chassis_rec) {
>>>>> +            return false;
>>>>> +        } else if (b_lport->pb->chassis) {
>>>>> +            VLOG_DBG("lport %s already claimed by other chassis",
>>>>> +                     b_lport->pb->logical_port);
>>>>> +        }
>>>>> +    }
>>>>> +
>>>>>      if (!lbinding) {
>>>>>          return true;
>>>>>      }
>>>>> @@ -884,37 +900,80 @@ get_lport_type_str(enum en_lport_type
> lport_type)
>>>>>      OVS_NOT_REACHED();
>>>>>  }
>>>>>
>>>>> -/* For newly claimed ports, if 'notify_up' is 'false':
>>>>> +void
>>>>> +set_pb_chassis_in_sbrec(const struct sbrec_port_binding *pb,
>>>>> +                        const struct sbrec_chassis *chassis_rec,
>>>>> +                        bool is_set)
>>>>> +{
>>>>> +    if (pb->chassis != chassis_rec) {
>>>>> +         if (is_set) {
>>>>> +            if (pb->chassis) {
>>>>> +                VLOG_INFO("Changing chassis for lport %s from %s to
>>>> %s.",
>>>>> +                          pb->logical_port, pb->chassis->name,
>>>>> +                          chassis_rec->name);
>>>>> +            } else {
>>>>> +                VLOG_INFO("Claiming lport %s for this chassis.",
>>>>> +                          pb->logical_port);
>>>>> +            }
>>>>> +            for (int i = 0; i < pb->n_mac; i++) {
>>>>> +                VLOG_INFO("%s: Claiming %s", pb->logical_port,
>>>> pb->mac[i]);
>>>>> +            }
>>>>> +            sbrec_port_binding_set_chassis(pb, chassis_rec);
>>>>> +        }
>>>>> +    } else if (!is_set) {
>>>>> +        sbrec_port_binding_set_chassis(pb, NULL);
>>>>> +    }
>>>>> +}
>>>>> +
>>>>> +void
>>>>> +local_binding_set_pb(struct shash *local_bindings, const char
> *pb_name,
>>>>> +                     const struct sbrec_chassis *chassis_rec,
>>>>> +                     struct hmap *tracked_datapaths, bool is_set)
>>>>> +{
>>>>> +    struct local_binding *lbinding =
>>>>> +        local_binding_find(local_bindings, pb_name);
>>>>> +    struct binding_lport *b_lport =
>>>> local_binding_get_primary_lport(lbinding);
>>>>> +
>>>>> +    if (b_lport) {
>>>>> +        set_pb_chassis_in_sbrec(b_lport->pb, chassis_rec, is_set);
>>>>> +        if (tracked_datapaths) {
>>>>> +            update_lport_tracking(b_lport->pb, tracked_datapaths,
> true);
>>>>> +        }
>>>>> +    }
>>>>> +}
>>>>> +
>>>>> +/* For newly claimed ports:
>>>>>   * - set the 'pb.up' field to true if 'pb' has no 'parent_pb'.
>>>>>   * - set the 'pb.up' field to true if 'parent_pb.up' is 'true' (e.g.,
>>>> for
>>>>>   *   container and virtual ports).
>>>>> - * Otherwise request a notification to be sent when the OVS flows
>>>>> - * corresponding to 'pb' have been installed.
>>>>> + *
>>>>> + * Returns false if lport is not claimed due to 'sb_readonly'.
>>>>> + * Returns true otherwise.
>>>>>   *
>>>>>   * Note:
>>>>> - *   Updates (directly or through a notification) the 'pb->up' field
>>>> only if
>>>>> - *   it's explicitly set to 'false'.
>>>>> + *   Updates the 'pb->up' field only if it's explicitly set to
> 'false'.
>>>>>   *   This is to ensure compatibility with older versions of
> ovn-northd.
>>>>>   */
>>>>> -static void
>>>>> +static bool
>>>>>  claimed_lport_set_up(const struct sbrec_port_binding *pb,
>>>>>                       const struct sbrec_port_binding *parent_pb,
>>>>> -                     const struct sbrec_chassis *chassis_rec,
>>>>> -                     bool notify_up, struct if_status_mgr *if_mgr)
>>>>> +                     bool sb_readonly)
>>>>>  {
>>>>> -    if (!notify_up) {
>>>>> -        bool up = true;
>>>>> -        if (!parent_pb || (parent_pb->n_up && parent_pb->up[0])) {
>>>>> +    /* When notify_up is false in claim_port(), no state is created
>>>>> +     * by if_status_mgr. In such cases, return false (i.e. trigger
>>>> recompute)
>>>>> +     * if we can't update sb (because it is readonly).
>>>>> +     */
>>>>> +    bool up = true;
>>>>> +    if (!parent_pb || (parent_pb->n_up && parent_pb->up[0])) {
>>>>> +        if (!sb_readonly) {
>>>>>              if (pb->n_up) {
>>>>>                  sbrec_port_binding_set_up(pb, &up, 1);
>>>>>              }
>>>>> +        } else if (pb->n_up && !pb->up[0]) {
>>>>> +            return false;
>>>>>          }
>>>>> -        return;
>>>>> -    }
>>>>> -
>>>>> -    if (pb->chassis != chassis_rec || (pb->n_up && !pb->up[0])) {
>>>>> -        if_status_mgr_claim_iface(if_mgr, pb->logical_port);
>>>>>      }
>>>>> +    return true;
>>>>>  }
>>>>>
>>>>>  typedef void (*set_func)(const struct sbrec_port_binding *pb,
>>>>> @@ -1057,37 +1116,35 @@ claim_lport(const struct sbrec_port_binding
> *pb,
>>>>>              struct hmap *tracked_datapaths,
>>>>>              struct if_status_mgr *if_mgr)
>>>>>  {
>>>>> -    if (!sb_readonly) {
>>>>> -        claimed_lport_set_up(pb, parent_pb, chassis_rec, notify_up,
>>>> if_mgr);
>>>>> -    }
>>>>> -
>>>>>      enum can_bind can_bind =
>>>> lport_can_bind_on_this_chassis(chassis_rec, pb);
>>>>>      bool update_tracked = false;
>>>>>
>>>>>      if (can_bind == CAN_BIND_AS_MAIN) {
>>>>>          if (pb->chassis != chassis_rec) {
>>>>> -            if (sb_readonly) {
>>>>> -                return false;
>>>>> -            }
>>>>> -
>>>>> -            if (pb->chassis) {
>>>>> -                VLOG_INFO("Changing chassis for lport %s from %s to
>>>> %s.",
>>>>> -                        pb->logical_port, pb->chassis->name,
>>>>> -                        chassis_rec->name);
>>>>> -            } else {
>>>>> -                VLOG_INFO("Claiming lport %s for this chassis.",
>>>>> -                          pb->logical_port);
>>>>> -            }
>>>>> -            for (size_t i = 0; i < pb->n_mac; i++) {
>>>>> -                VLOG_INFO("%s: Claiming %s", pb->logical_port,
>>>> pb->mac[i]);
>>>>> -            }
>>>>> -
>>>>> -            sbrec_port_binding_set_chassis(pb, chassis_rec);
>>>>>              if (is_additional_chassis(pb, chassis_rec)) {
>>>>> +                if (sb_readonly) {
>>>>> +                    return false;
>>>>> +                }
>>>>>                  remove_additional_chassis(pb, chassis_rec);
>>>>>              }
>>>>>              update_tracked = true;
>>>>>          }
>>>>> +        if (!notify_up) {
>>>>> +            if (!claimed_lport_set_up(pb, parent_pb, sb_readonly)) {
>>>>> +                return false;
>>>>> +            }
>>>>> +            if (pb->chassis != chassis_rec) {
>>>>> +                if (sb_readonly) {
>>>>> +                    return false;
>>>>> +                }
>>>>> +                set_pb_chassis_in_sbrec(pb, chassis_rec, true);
>>>>> +            }
>>>>> +        } else {
>>>>> +            if ((pb->chassis != chassis_rec) || (pb->n_up &&
>>>> !pb->up[0])) {
>>>>> +                if_status_mgr_claim_iface(if_mgr, pb, chassis_rec,
>>>>> +                                          sb_readonly);
>>>>> +            }
>>>>> +        }
>>>>>      } else if (can_bind == CAN_BIND_AS_ADDITIONAL) {
>>>>>          if (!is_additional_chassis(pb, chassis_rec)) {
>>>>>              if (sb_readonly) {
>>>>> @@ -1132,7 +1189,8 @@ claim_lport(const struct sbrec_port_binding *pb,
>>>>>   */
>>>>>  static bool
>>>>>  release_lport_main_chassis(const struct sbrec_port_binding *pb,
>>>>> -                           bool sb_readonly)
>>>>> +                           bool sb_readonly,
>>>>> +                           struct if_status_mgr *if_mgr)
>>>>>  {
>>>>>      if (pb->encap) {
>>>>>          if (sb_readonly) {
>>>>> @@ -1141,11 +1199,13 @@ release_lport_main_chassis(const struct
>>>> sbrec_port_binding *pb,
>>>>>          sbrec_port_binding_set_encap(pb, NULL);
>>>>>      }
>>>>>
>>>>> +    /* If sb readonly, pb->chassis unset through if-status if present. */
>>>>>      if (pb->chassis) {
>>>>> -        if (sb_readonly) {
>>>>> +        if (!sb_readonly) {
>>>>> +            sbrec_port_binding_set_chassis(pb, NULL);
>>>>> +        } else if (!if_status_mgr_iface_is_present(if_mgr, pb->logical_port)) {
>>>>>              return false;
>>>>>          }
>>>>> -        sbrec_port_binding_set_chassis(pb, NULL);
>>>>>      }
>>>>>
>>>>>      if (pb->virtual_parent) {
>>>>> @@ -1155,7 +1215,8 @@ release_lport_main_chassis(const struct
>>>> sbrec_port_binding *pb,
>>>>>          sbrec_port_binding_set_virtual_parent(pb, NULL);
>>>>>      }
>>>>>
>>>>> -    VLOG_INFO("Releasing lport %s from this chassis.", pb->logical_port);
>>>>> +    VLOG_INFO("Releasing lport %s from this chassis (sb_readonly=%d)",
>>>>> +              pb->logical_port, sb_readonly);
>>>>>      return true;
>>>>>  }
>>>>>
>>>>> @@ -1189,7 +1250,7 @@ release_lport(const struct sbrec_port_binding
> *pb,
>>>>>                struct hmap *tracked_datapaths, struct if_status_mgr
>>>> *if_mgr)
>>>>>  {
>>>>>      if (pb->chassis == chassis_rec) {
>>>>> -        if (!release_lport_main_chassis(pb, sb_readonly)) {
>>>>> +        if (!release_lport_main_chassis(pb, sb_readonly, if_mgr)) {
>>>>>              return false;
>>>>>          }
>>>>>      } else if (is_additional_chassis(pb, chassis_rec)) {
>>>>> @@ -1271,7 +1332,7 @@ consider_vif_lport_(const struct
>>>> sbrec_port_binding *pb,
>>>>>                               b_lport->lbinding->iface,
>>>>>                               !b_ctx_in->ovnsb_idl_txn,
>>>>>                               !parent_pb,
> b_ctx_out->tracked_dp_bindings,
>>>>> -                             b_ctx_out->if_mgr)){
>>>>> +                             b_ctx_out->if_mgr)) {
>>>>>                  return false;
>>>>>              }
>>>>>
>>>>> @@ -1527,7 +1588,8 @@ consider_localport(const struct
> sbrec_port_binding
>>>> *pb,
>>>>>      enum can_bind can_bind = lport_can_bind_on_this_chassis(
>>>>>          b_ctx_in->chassis_rec, pb);
>>>>>      if (can_bind == CAN_BIND_AS_MAIN) {
>>>>> -        if (!release_lport_main_chassis(pb,
> !b_ctx_in->ovnsb_idl_txn)) {
>>>>> +        if (!release_lport_main_chassis(pb, !b_ctx_in->ovnsb_idl_txn,
>>>>> +            b_ctx_out->if_mgr)) {
>>>>>              return false;
>>>>>          }
>>>>>      } else if (can_bind == CAN_BIND_AS_ADDITIONAL) {
>>>>> diff --git a/controller/binding.h b/controller/binding.h
>>>>> index 1fed06674..d20659b0b 100644
>>>>> --- a/controller/binding.h
>>>>> +++ b/controller/binding.h
>>>>> @@ -151,8 +151,10 @@ const struct sbrec_port_binding
>>>> *local_binding_get_primary_pb(
>>>>>  ofp_port_t local_binding_get_lport_ofport(const struct shash
>>>> *local_bindings,
>>>>>                                            const char *pb_name);
>>>>>
>>>>> -bool local_binding_is_up(struct shash *local_bindings, const char
>>>> *pb_name);
>>>>> -bool local_binding_is_down(struct shash *local_bindings, const char
>>>> *pb_name);
>>>>> +bool local_binding_is_up(struct shash *local_bindings, const char
>>>> *pb_name,
>>>>> +                         const struct sbrec_chassis *);
>>>>> +bool local_binding_is_down(struct shash *local_bindings, const char
>>>> *pb_name,
>>>>> +                           const struct sbrec_chassis *);
>>>>>  void local_binding_set_up(struct shash *local_bindings, const char
>>>> *pb_name,
>>>>>                            const struct sbrec_chassis *chassis_rec,
>>>>>                            const char *ts_now_str, bool sb_readonly,
>>>>> @@ -160,7 +162,10 @@ void local_binding_set_up(struct shash
>>>> *local_bindings, const char *pb_name,
>>>>>  void local_binding_set_down(struct shash *local_bindings, const char
>>>> *pb_name,
>>>>>                              const struct sbrec_chassis *chassis_rec,
>>>>>                              bool sb_readonly, bool ovs_readonly);
>>>>> -
>>>>> +void local_binding_set_pb(struct shash *local_bindings, const char
>>>> *pb_name,
>>>>> +                          const struct sbrec_chassis *chassis_rec,
>>>>> +                          struct hmap *tracked_datapaths,
>>>>> +                          bool is_set);
>>>>>  void binding_register_ovs_idl(struct ovsdb_idl *);
>>>>>  void binding_run(struct binding_ctx_in *, struct binding_ctx_out *);
>>>>>  bool binding_cleanup(struct ovsdb_idl_txn *ovnsb_idl_txn,
>>>>> @@ -178,6 +183,10 @@ void binding_dump_local_bindings(struct
>>>> local_binding_data *, struct ds *);
>>>>>  bool is_additional_chassis(const struct sbrec_port_binding *pb,
>>>>>                             const struct sbrec_chassis *chassis_rec);
>>>>>
>>>>> +void set_pb_chassis_in_sbrec(const struct sbrec_port_binding *pb,
>>>>> +                             const struct sbrec_chassis *chassis_rec,
>>>>> +                             bool is_set);
>>>>> +
>>>>>  /* Corresponds to each Port_Binding.type. */
>>>>>  enum en_lport_type {
>>>>>      LP_UNKNOWN,
>>>>> diff --git a/controller/if-status.c b/controller/if-status.c
>>>>> index ad61844d8..7693c289b 100644
>>>>> --- a/controller/if-status.c
>>>>> +++ b/controller/if-status.c
>>>>> @@ -24,6 +24,7 @@
>>>>>  #include "lib/util.h"
>>>>>  #include "timeval.h"
>>>>>  #include "openvswitch/vlog.h"
>>>>> +#include "lib/ovn-sb-idl.h"
>>>>>
>>>>>  VLOG_DEFINE_THIS_MODULE(if_status);
>>>>>
>>>>> @@ -53,9 +54,11 @@ VLOG_DEFINE_THIS_MODULE(if_status);
>>>>>   */
>>>>>
>>>>>  enum if_state {
>>>>> -    OIF_CLAIMED,       /* Newly claimed interface. */
>>>>> -    OIF_INSTALL_FLOWS, /* Already claimed interface for which flows are still
>>>>> -                        * being installed.
>>>>> +    OIF_CLAIMED,       /* Newly claimed interface. pb->chassis not yet updated.
>>>>> +                        */
>>>>> +    OIF_INSTALL_FLOWS, /* Claimed interface with pb->chassis successfully
>>>>> +                        * updated in SB and for which flows are still being
>>>>> +                        * installed.
>>>>>                          */
>>>>>      OIF_MARK_UP,       /* Interface with flows successfully installed in OVS
>>>>>                          * but not yet marked "up" in the binding module (in
>>>>> @@ -78,6 +81,55 @@ static const char *if_state_names[] = {
>>>>>      [OIF_INSTALLED]     = "INSTALLED",
>>>>>  };
>>>>>
>>>>> +/*
>>>>> + *       +----------------------+
>>>>> + * +---> |                      |
>>>>> + * | +-> |         NULL         | <--------------------------------------+++-+
>>>>> + * | |   +----------------------+                                            |
>>>>> + * | |     ^ release_iface   | claim_iface                                   |
>>>>> + * | |     |                 V - sbrec_update_chassis(if sb is rw)           |
>>>>> + * | |   +----------------------+                                            |
>>>>> + * | |   |                      | <----------------------------------------+ |
>>>>> + * | |   |       CLAIMED        | <--------------------------------------+ | |
>>>>> + * | |   +----------------------+                                        | | |
>>>>> + * | |                  | mgr_update(when sb is rw)                      | | |
>>>>> + * | | release_iface    |  - sbrec_update_chassis                        | | |
>>>>> + * | |                  |  - request seqno                               | | |
>>>>> + * | |                  V                                                | | |
>>>>> + * | |   +----------------------+                                        | | |
>>>>> + * | +-- |                      |  mgr_run(seqno not rcvd)               | | |
>>>>> + * |     |    INSTALL_FLOWS     |   - set port down in sb                | | |
>>>>> + * |     |                      |  mgr_update()                          | | |
>>>>> + * |     +----------------------+   - sbrec_update_chassis if needed     | | |
>>>>> + * |                    |                                                | | |
>>>>> + * |                    |  mgr_run(seqno rcvd)                           | | |
>>>>> + * |                    |  - set port up in sb                           | | |
>>>>> + * | release_iface      |  - set ovn-installed in ovs                    | | |
>>>>> + * |                    V                                                | | |
>>>>> + * |   +----------------------+                                          | | |
>>>>> + * |   |                      |  mgr_run()                               | | |
>>>>> + * +-- |       MARK_UP        |  - set port up in sb                     | | |
>>>>> + *     |                      |  - set ovn-installed in ovs              | | |
>>>>> + *     |                      |  mgr_update()                            | | |
>>>>> + *     +----------------------+  - sbrec_update_chassis if needed        | | |
>>>>> + *              |                                                        | | |
>>>>> + *              | mgr_update(rcvd port up / ovn_installed & chassis set) | | |
>>>>> + *              V                                                        | | |
>>>>> + *     +----------------------+                                          | | |
>>>>> + *     |      INSTALLED       | ------------> claim_iface ---------------+ | |
>>>>> + *     +----------------------+                                            | |
>>>>> + *              |                                                          | |
>>>>> + *              | release_iface                                            | |
>>>>> + *              V                                                          | |
>>>>> + *     +----------------------+                                            | |
>>>>> + *     |                      | ------------> claim_iface -----------------+ |
>>>>> + *     |      MARK_DOWN       | ------> mgr_update(rcvd port down) ----------+
>>>>> + *     |                      | mgr_run()
>>>>> + *     |                      | - set port down in sb
>>>>> + *     |                      | mgr_update()
>>>>> + *     +----------------------+ - sbrec_update_chassis(NULL)
>>>>> + */
>>>>> +
>>>>>  struct ovs_iface {
>>>>>      char *id;               /* Extracted from OVS
>>>> external_ids.iface_id. */
>>>>>      enum if_state state;    /* State of the interface in the state
>>>> machine. */
>>>>> @@ -85,6 +137,7 @@ struct ovs_iface {
>>>>>                               * be fully programmed in OVS.  Only used
>>>> in state
>>>>>                               * OIF_INSTALL_FLOWS.
>>>>>                               */
>>>>> +    bool chassis_update_required;  /* If true, pb->chassis must be
>>>> updated. */
>>>>>  };
>>>>>
>>>>>  static uint64_t ifaces_usage;
>>>>> @@ -158,14 +211,23 @@ if_status_mgr_destroy(struct if_status_mgr *mgr)
>>>>>  }
>>>>>
>>>>>  void
>>>>> -if_status_mgr_claim_iface(struct if_status_mgr *mgr, const char
>>>> *iface_id)
>>>>> +if_status_mgr_claim_iface(struct if_status_mgr *mgr,
>>>>> +                          const struct sbrec_port_binding *pb,
>>>>> +                          const struct sbrec_chassis *chassis_rec,
>>>>> +                          bool sb_readonly)
>>>>>  {
>>>>> +    const char *iface_id = pb->logical_port;
>>>>>      struct ovs_iface *iface = shash_find_data(&mgr->ifaces,
> iface_id);
>>>>>
>>>>>      if (!iface) {
>>>>>          iface = ovs_iface_create(mgr, iface_id, OIF_CLAIMED);
>>>>>      }
>>>>> -
>>>>> +    if (!sb_readonly) {
>>>>> +        set_pb_chassis_in_sbrec(pb, chassis_rec, true);
>>>>> +        iface->chassis_update_required = false;
>>>>> +    } else {
>>>>> +        iface->chassis_update_required = true;
>>>>> +    }
>>>>>      switch (iface->state) {
>>>>>      case OIF_CLAIMED:
>>>>>      case OIF_INSTALL_FLOWS:
>>>>> @@ -182,6 +244,12 @@ if_status_mgr_claim_iface(struct if_status_mgr
>>>> *mgr, const char *iface_id)
>>>>>      }
>>>>>  }
>>>>>
>>>>> +bool
>>>>> +if_status_mgr_iface_is_present(struct if_status_mgr *mgr, const char
>>>> *iface_id)
>>>>> +{
>>>>> +    return !!shash_find_data(&mgr->ifaces, iface_id);
>>>>> +}
>>>>> +
>>>>>  void
>>>>>  if_status_mgr_release_iface(struct if_status_mgr *mgr, const char
>>>> *iface_id)
>>>>>  {
>>>>> @@ -246,9 +314,39 @@ if_status_mgr_delete_iface(struct if_status_mgr
>>>> *mgr, const char *iface_id)
>>>>>      }
>>>>>  }
>>>>>
>>>>> +bool
>>>>> +if_status_handle_claims(struct if_status_mgr *mgr,
>>>>> +                        struct local_binding_data *binding_data,
>>>>> +                        const struct sbrec_chassis *chassis_rec,
>>>>> +                        struct hmap *tracked_datapath,
>>>>> +                        bool sb_readonly)
>>>>> +{
>>>>> +    if (!binding_data || sb_readonly) {
>>>>> +        return false;
>>>>> +    }
>>>>> +
>>>>> +    struct shash *bindings = &binding_data->bindings;
>>>>> +    struct hmapx_node *node;
>>>>> +
>>>>> +    bool rc = false;
>>>>> +    HMAPX_FOR_EACH (node, &mgr->ifaces_per_state[OIF_CLAIMED]) {
>>>>> +        struct ovs_iface *iface = node->data;
>>>>> +        if (iface->chassis_update_required) {
>>>>
>>>> Thanks Xavier for the revision. The state machine is clearer now, but I
>>>> have a major concern about the use of chassis_update_required. This bool
>>>> flag decides whether an update to the SB is needed; once an SB update is
>>>> requested, the flag is set to false on the assumption that the update will
>>>> succeed immediately. That assumption may be wrong: the subsequent SB update
>>>> can fail or be delayed for various reasons, so the flag is not reliable.
>>>> Instead, the CLAIMED state should be responsible for making sure the SB
>>>> update is completed. While a transaction is in progress, sb_readonly is
>>>> true; if sb_readonly is false, nothing is in progress. So we can always
>>>> check: if (!sb_readonly && <SB chassis is not updated for the
>>>> port-binding>), just send the update, regardless of whether we have
>>>> requested it before. Please also see my other comment below about the state
>>>> transition.
>>>>
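To make the check suggested above concrete, here is a minimal, self-contained sketch. It does not use the real OVN types: struct chassis, struct port_binding, send_chassis_update() and claimed_check_chassis() are hypothetical stand-ins, and the model assumes the local view of the SB row only changes once a notification arrives. No "already requested" flag is kept; the decision is recomputed from the SB contents on every call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the sbrec types; not the real OVN structs. */
struct chassis { const char *name; };
struct port_binding { const struct chassis *chassis; };

/* Counts update requests so that callers can observe the behaviour. */
static int n_updates_sent;

/* Hypothetical helper: pretend to queue an SB update.  It deliberately does
 * not touch pb->chassis; in this model the row only changes when the SB
 * notification is processed. */
static void
send_chassis_update(struct port_binding *pb, const struct chassis *self)
{
    (void) pb;
    (void) self;
    n_updates_sent++;
}

/* Returns true once the SB row is seen to carry our chassis.  Otherwise,
 * (re)sends the update whenever the SB is writable, whether or not an
 * update was requested in an earlier iteration. */
static bool
claimed_check_chassis(struct port_binding *pb, const struct chassis *self,
                      bool sb_readonly)
{
    assert(pb && self);
    if (pb->chassis == self) {
        return true;
    }
    if (!sb_readonly) {
        send_chassis_update(pb, self);
    }
    return false;
}
```

Calling this on every iteration while the interface is CLAIMED would resend the update after a failed or delayed transaction, which is the case a one-shot flag can miss.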
>>>>> +            VLOG_INFO("if_status_handle_claims for %s", iface->id);
>>>>> +            local_binding_set_pb(bindings, iface->id, chassis_rec,
>>>>> +                                 tracked_datapath, true);
>>>>> +            rc = true;
>>>>> +        }
>>>>> +        iface->chassis_update_required = false;
>>>>> +    }
>>>>> +    return rc;
>>>>> +}
>>>>> +
>>>>>  void
>>>>>  if_status_mgr_update(struct if_status_mgr *mgr,
>>>>> -                     struct local_binding_data *binding_data)
>>>>> +                     struct local_binding_data *binding_data,
>>>>> +                     const struct sbrec_chassis *chassis_rec,
>>>>> +                     bool sb_readonly)
>>>>>  {
>>>>>      if (!binding_data) {
>>>>>          return;
>>>>> @@ -257,13 +355,25 @@ if_status_mgr_update(struct if_status_mgr *mgr,
>>>>>      struct shash *bindings = &binding_data->bindings;
>>>>>      struct hmapx_node *node;
>>>>>
>>>>> +    /* Interfaces in OIF_MARK_UP state have already set their pb->chassis.
>>>>> +     * However, it might have been reset by another hv.
>>>>> +     */
>>>>>      /* Move all interfaces that have been confirmed "up" by the binding module,
>>>>>       * from OIF_MARK_UP to OIF_INSTALLED.
>>>>>       */
>>>>>      HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_MARK_UP]) {
>>>>>          struct ovs_iface *iface = node->data;
>>>>>
>>>>> -        if (local_binding_is_up(bindings, iface->id)) {
>>>>> +        if (iface->chassis_update_required) {
>>>>> +            if (!sb_readonly) {
>>>>> +                iface->chassis_update_required = false;
>>>>> +                local_binding_set_pb(bindings, iface->id,
> chassis_rec,
>>>>> +                                     NULL, true);
>>>>> +            } else {
>>>>> +                continue;
>>>>> +            }
>>>>> +        }
>>>>> +        if (local_binding_is_up(bindings, iface->id, chassis_rec)) {
>>>>>              ovs_iface_set_state(mgr, iface, OIF_INSTALLED);
>>>>>          }
>>>>>      }
>>>>> @@ -274,23 +384,53 @@ if_status_mgr_update(struct if_status_mgr *mgr,
>>>>>      HMAPX_FOR_EACH_SAFE (node,
> &mgr->ifaces_per_state[OIF_MARK_DOWN]) {
>>>>>          struct ovs_iface *iface = node->data;
>>>>>
>>>>> -        if (local_binding_is_down(bindings, iface->id)) {
>>>>> +        if (!sb_readonly) {
>>>>> +            local_binding_set_pb(bindings, iface->id, chassis_rec,
>>>>> +                                 NULL, false);
>>>>> +        }
>>>>> +        if (local_binding_is_down(bindings, iface->id, chassis_rec))
> {
>>>>>              ovs_iface_destroy(mgr, iface);
>>>>>          }
>>>>>      }
>>>>>
>>>>> -    /* Register for a notification about flows being installed in OVS
>>>> for all
>>>>> -     * newly claimed interfaces.
>>>>> +    if (!sb_readonly) {
>>>>> +        HMAPX_FOR_EACH_SAFE (node,
>>>> &mgr->ifaces_per_state[OIF_INSTALL_FLOWS]) {
>>>>> +            struct ovs_iface *iface = node->data;
>>>>> +
>>>>> +            if (iface->chassis_update_required) {
>>>>> +                iface->chassis_update_required = false;
>>>>> +                local_binding_set_pb(bindings, iface->id,
> chassis_rec,
>>>>> +                                     NULL, true);
>>>>> +            }
>>>>> +        }
>>>>> +    }
>>>>> +
>>>>> +    /* Update Port_Binding->chassis for newly claimed interfaces
>>>>> +     * Register for a notification about flows being installed in OVS for all
>>>>> +     * newly claimed interfaces for which we could update pb->chassis.
>>>>>       *
>>>>>       * Move them from OIF_CLAIMED to OIF_INSTALL_FLOWS.
>>>>>       */
>>>>> -    bool new_ifaces = false;
>>>>> -    HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_CLAIMED]) {
>>>>> -        struct ovs_iface *iface = node->data;
>>>>>
>>>>> -        ovs_iface_set_state(mgr, iface, OIF_INSTALL_FLOWS);
>>>>> -        iface->install_seqno = mgr->iface_seqno + 1;
>>>>> -        new_ifaces = true;
>>>>> +    bool new_ifaces = false;
>>>>> +    if (!sb_readonly) {
>>>>> +        HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_CLAIMED]) {
>>>>> +            struct ovs_iface *iface = node->data;
>>>>> +            /* No need to check for chassis_update_required as already done
>>>>> +             * in if_status_handle_claims or if_status_mgr_claim_iface
>>>>> +             */
>>>>> +            ovs_iface_set_state(mgr, iface, OIF_INSTALL_FLOWS);
>>>>
>>>> We need to make sure the chassis in the SB port-binding is up to date
>>>> (i.e. the update notification from the SB DB has been received) before
>>>> moving to INSTALL_FLOWS. Otherwise, the state can still move too early, and
>>>> we can end up with incomplete flow installation for the lport when the
>>>> state finally moves to INSTALLED.
>>>>
>>>> Thanks,
>>>> Han
>>>>
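The transition rule requested above can be sketched as follows. This is only an illustration under stated assumptions: enum if_state_sketch, struct iface_sketch and try_leave_claimed() are hypothetical names, and sb_chassis_confirmed stands for "the SB notification showing our chassis on the port binding has been received". The seqno assignment mirrors the iface->install_seqno = mgr->iface_seqno + 1 line in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced version of the if-status state machine. */
enum if_state_sketch { SK_CLAIMED, SK_INSTALL_FLOWS };

struct iface_sketch {
    enum if_state_sketch state;
    unsigned long install_seqno;
};

/* Moves a CLAIMED interface to INSTALL_FLOWS only once the SB is known to
 * carry our chassis.  Returns true if the transition happened, so that the
 * caller knows a seqno update has to be requested. */
static bool
try_leave_claimed(struct iface_sketch *iface, bool sb_chassis_confirmed,
                  unsigned long cur_seqno)
{
    assert(iface);
    if (iface->state != SK_CLAIMED || !sb_chassis_confirmed) {
        return false;
    }
    iface->state = SK_INSTALL_FLOWS;
    iface->install_seqno = cur_seqno + 1;
    return true;
}
```

With this gate, an interface whose SB update is still in flight stays in CLAIMED, so the flow-installation seqno is only requested after the chassis change is visible.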
>>>>> +            iface->install_seqno = mgr->iface_seqno + 1;
>>>>> +            new_ifaces = true;
>>>>> +        }
>>>>> +    } else {
>>>>> +        HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_CLAIMED]) {
>>>>> +            struct ovs_iface *iface = node->data;
>>>>> +            static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1);
>>>>> +            VLOG_INFO_RL(&rl,
>>>>> +                         "Not updating pb chassis for %s now as "
>>>>> +                         "sb is readonly", iface->id);
>>>>> +        }
>>>>>      }
>>>>>
>>>>>      /* Request a seqno update when the flows for new interfaces have
>>>> been
>>>>> @@ -403,7 +543,7 @@ if_status_mgr_update_bindings(struct if_status_mgr
>>>> *mgr,
>>>>>      struct hmapx_node *node;
>>>>>
>>>>>      /* Notify the binding module to set "down" all bindings that are still
>>>>> -     * in the process of being installed in OVS, i.e., are not yet instsalled.
>>>>> +     * in the process of being installed in OVS, i.e., are not yet installed.
>>>>>       */
>>>>>      HMAPX_FOR_EACH (node, &mgr->ifaces_per_state[OIF_INSTALL_FLOWS])
> {
>>>>>          struct ovs_iface *iface = node->data;
>>>>> diff --git a/controller/if-status.h b/controller/if-status.h
>>>>> index bb8a3950d..f9b05d30d 100644
>>>>> --- a/controller/if-status.h
>>>>> +++ b/controller/if-status.h
>>>>> @@ -27,15 +27,27 @@ struct if_status_mgr *if_status_mgr_create(void);
>>>>>  void if_status_mgr_clear(struct if_status_mgr *);
>>>>>  void if_status_mgr_destroy(struct if_status_mgr *);
>>>>>
>>>>> -void if_status_mgr_claim_iface(struct if_status_mgr *, const char
>>>> *iface_id);
>>>>> +void if_status_mgr_claim_iface(struct if_status_mgr *,
>>>>> +                               const struct sbrec_port_binding *pb,
>>>>> +                               const struct sbrec_chassis
> *chassis_rec,
>>>>> +                               bool sb_readonly);
>>>>>  void if_status_mgr_release_iface(struct if_status_mgr *, const char
>>>> *iface_id);
>>>>>  void if_status_mgr_delete_iface(struct if_status_mgr *, const char
>>>> *iface_id);
>>>>>
>>>>> -void if_status_mgr_update(struct if_status_mgr *, struct
>>>> local_binding_data *);
>>>>> +void if_status_mgr_update(struct if_status_mgr *, struct
>>>> local_binding_data *,
>>>>> +                          const struct sbrec_chassis *chassis,
>>>>> +                          bool sb_readonly);
>>>>>  void if_status_mgr_run(struct if_status_mgr *mgr, struct
>>>> local_binding_data *,
>>>>>                         const struct sbrec_chassis *,
>>>>>                         bool sb_readonly, bool ovs_readonly);
>>>>>  void if_status_mgr_get_memory_usage(struct if_status_mgr *mgr,
>>>>>                                      struct simap *usage);
>>>>> +bool if_status_mgr_iface_is_present(struct if_status_mgr *mgr,
>>>>> +                                    const char *iface_id);
>>>>> +bool if_status_handle_claims(struct if_status_mgr *mgr,
>>>>> +                             struct local_binding_data *binding_data,
>>>>> +                             const struct sbrec_chassis *chassis_rec,
>>>>> +                             struct hmap *tracked_datapath,
>>>>> +                             bool sb_readonly);
>>>>>
>>>>>  # endif /* controller/if-status.h */
>>>>> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
>>>>> index 69615308e..3947baf03 100644
>>>>> --- a/controller/ovn-controller.c
>>>>> +++ b/controller/ovn-controller.c
>>>>> @@ -1464,6 +1464,73 @@ en_runtime_data_run(struct engine_node *node,
>>>> void *data)
>>>>>      engine_set_node_state(node, EN_UPDATED);
>>>>>  }
>>>>>
>>>>> +struct ed_type_sb_ro {
>>>>> +    bool sb_readonly;
>>>>> +};
>>>>> +
>>>>> +static void *
>>>>> +en_sb_ro_init(struct engine_node *node OVS_UNUSED,
>>>>> +              struct engine_arg *arg OVS_UNUSED)
>>>>> +{
>>>>> +    struct ed_type_sb_ro *data = xzalloc(sizeof *data);
>>>>> +    return data;
>>>>> +}
>>>>> +
>>>>> +static void
>>>>> +en_sb_ro_run(struct engine_node *node, void *data)
>>>>> +{
>>>>> +    struct ed_type_sb_ro *sb_ro_data = data;
>>>>> +    bool sb_readonly = !engine_get_context()->ovnsb_idl_txn;
>>>>> +    if (sb_ro_data->sb_readonly != sb_readonly) {
>>>>> +        sb_ro_data->sb_readonly = sb_readonly;
>>>>> +        if (!sb_ro_data->sb_readonly) {
>>>>> +            engine_set_node_state(node, EN_UPDATED);
>>>>> +        }
>>>>> +    }
>>>>> +}
>>>>> +
>>>>> +static void
>>>>> +en_sb_ro_cleanup(void *data OVS_UNUSED)
>>>>> +{
>>>>> +}
>>>>> +
>>>>> +static bool
>>>>> +runtime_data_sb_ro_handler(struct engine_node *node, void *data)
>>>>> +{
>>>>> +    const struct sbrec_chassis *chassis = NULL;
>>>>> +
>>>>> +    struct ovsrec_open_vswitch_table *ovs_table =
>>>>> +        (struct ovsrec_open_vswitch_table *)EN_OVSDB_GET(
>>>>> +            engine_get_input("OVS_open_vswitch", node));
>>>>> +
>>>>> +    const char *chassis_id = get_ovs_chassis_id(ovs_table);
>>>>> +
>>>>> +    struct ovsdb_idl_index *sbrec_chassis_by_name =
>>>>> +        engine_ovsdb_node_get_index(
>>>>> +                engine_get_input("SB_chassis", node),
>>>>> +                "name");
>>>>> +
>>>>> +    if (chassis_id) {
>>>>> +        chassis = chassis_lookup_by_name(sbrec_chassis_by_name,
>>>> chassis_id);
>>>>> +    }
>>>>> +    if (chassis) {
>>>>> +        struct ed_type_runtime_data *rt_data = data;
>>>>> +        bool sb_readonly = !engine_get_context()->ovnsb_idl_txn;
>>>>> +        struct controller_engine_ctx *ctrl_ctx =
>>>>> +            engine_get_context()->client_ctx;
>>>>> +
>>>>> +        if (if_status_handle_claims(ctrl_ctx->if_mgr,
>>>>> +                                    &rt_data->lbinding_data,
>>>>> +                                    chassis,
>>>>> +                                    &rt_data->tracked_dp_bindings,
>>>>> +                                    sb_readonly)) {
>>>>> +            engine_set_node_state(node, EN_UPDATED);
>>>>> +            rt_data->tracked = true;
>>>>> +        }
>>>>> +    }
>>>>> +    return true;
>>>>> +}
>>>>> +
>>>>>  static bool
>>>>>  runtime_data_ovs_interface_shadow_handler(struct engine_node *node,
>>>> void *data)
>>>>>  {
>>>>> @@ -3528,6 +3595,7 @@ main(int argc, char *argv[])
>>>>>      stopwatch_create(VIF_PLUG_RUN_STOPWATCH_NAME, SW_MS);
>>>>>
>>>>>      /* Define inc-proc-engine nodes. */
>>>>> +    ENGINE_NODE(sb_ro, "sb_ro");
>>>>>      ENGINE_NODE_WITH_CLEAR_TRACK_DATA_IS_VALID(ct_zones, "ct_zones");
>>>>>      ENGINE_NODE_WITH_CLEAR_TRACK_DATA(ovs_interface_shadow,
>>>>>                                        "ovs_interface_shadow");
>>>>> @@ -3664,6 +3732,7 @@ main(int argc, char *argv[])
>>>>>      engine_add_input(&en_ovs_interface_shadow, &en_ovs_interface,
>>>>>                       ovs_interface_shadow_ovs_interface_handler);
>>>>>
>>>>> +    engine_add_input(&en_runtime_data, &en_sb_ro,
>>>> runtime_data_sb_ro_handler);
>>>>>      engine_add_input(&en_runtime_data, &en_ofctrl_is_connected,
> NULL);
>>>>>
>>>>>      engine_add_input(&en_runtime_data, &en_ovs_open_vswitch, NULL);
>>>>> @@ -4098,7 +4167,8 @@ main(int argc, char *argv[])
>>>>>                          runtime_data ? &runtime_data->lbinding_data : NULL;
>>>>>                      stopwatch_start(IF_STATUS_MGR_UPDATE_STOPWATCH_NAME,
>>>>>                                      time_msec());
>>>>> -                    if_status_mgr_update(if_mgr, binding_data);
>>>>> +                    if_status_mgr_update(if_mgr, binding_data, chassis,
>>>>> +                                         !ovnsb_idl_txn);
>>>>>                      stopwatch_stop(IF_STATUS_MGR_UPDATE_STOPWATCH_NAME,
>>>>>                                     time_msec());
>>>>>
>>>>> diff --git a/tests/ovn-macros.at b/tests/ovn-macros.at
>>>>> index 335f9158c..8fd6ae6f7 100644
>>>>> --- a/tests/ovn-macros.at
>>>>> +++ b/tests/ovn-macros.at
>>>>> @@ -759,3 +759,15 @@
> m4_define([OVN_FOR_EACH_NORTHD_WITHOUT_DP_GROUPS],
>>>>>       [m4_foreach([NORTHD_USE_DP_GROUPS], [no],
>>>>>         [m4_foreach([NORTHD_USE_PARALLELIZATION], [yes, no], [$1
>>>>>  ])])])])
>>>>> +
>>>>> +# OVN_NBCTL(NBCTL_COMMAND) adds NBCTL_COMMAND to list of commands to
> be
>>>> run by RUN_OVN_NBCTL().
>>>>> +m4_define([OVN_NBCTL], [
>>>>> +    command="${command} -- $1"
>>>>> +])
>>>>> +
>>>>> +# RUN_OVN_NBCTL() executes list of commands built by the OVN_NBCTL()
>>>> macro.
>>>>> +m4_define([RUN_OVN_NBCTL], [
>>>>> +    check ovn-nbctl ${command}
>>>>> +    unset command
>>>>> +])
>>>>> +
>>>>> diff --git a/tests/ovn.at b/tests/ovn.at
>>>>> index bfaa41962..94d16bac9 100644
>>>>> --- a/tests/ovn.at
>>>>> +++ b/tests/ovn.at
>>>>> @@ -102,6 +102,18 @@ m4_divert_text([PREPARE_TESTS],
>>>>>           test 1 -le $(as $hv1 ovs-ofctl dump-flows br-int | grep -c
>>>> "output:$ofport")
>>>>>       ])
>>>>>     }
>>>>> +
>>>>> +   ovn_wait_remote_input_flows () {
>>>>> +     hv1=$1
>>>>> +     hv2=$2
>>>>> +     echo "$3: waiting for flows for remote input on $hv1"
>>>>> +     # Wait for a flow outputing  to remote input
>>>>> +     OVS_WAIT_UNTIL([
>>>>> +         ofport=$(as $hv1 ovs-vsctl --bare --columns ofport find
>>>> Interface name=ovn-${hv2}-0)
>>>>> +         echo "tunnel port=$ofport"
>>>>> +         test 1 -le $(as $hv1 ovs-ofctl dump-flows br-int | grep -c
>>>> "in_port=$ofport")
>>>>> +     ])
>>>>> +   }
>>>>>  ])
>>>>>
>>>>>  m4_define([OVN_CHECK_PACKETS],
>>>>> @@ -127,6 +139,8 @@ m4_define([OVN_WAIT_PATCH_PORT_FLOWS],
>>>>>  m4_define([OVN_WAIT_REMOTE_OUTPUT_FLOWS],
>>>>>    [ovn_wait_remote_output_flows "$1" "$2" "__file__:__line__"])
>>>>>
>>>>> +m4_define([OVN_WAIT_REMOTE_INPUT_FLOWS],
>>>>> +  [ovn_wait_remote_input_flows "$1" "$2" "__file__:__line__"])
>>>>>
>>>>>  AT_BANNER([OVN components])
>>>>>
>>>>> @@ -14056,6 +14070,11 @@ wait_column "$hv1_uuid" Port_Binding
>>>> requested_chassis logical_port=lsp0
>>>>>  wait_column "$hv2_uuid" Port_Binding additional_chassis
>>>> logical_port=lsp0
>>>>>  wait_column "$hv2_uuid" Port_Binding requested_additional_chassis
>>>> logical_port=lsp0
>>>>>
>>>>> +# Check ovn-installed updated for main chassis
>>>>> +wait_for_ports_up
>>>>> +OVS_WAIT_UNTIL([test `as hv1 ovs-vsctl get Interface lsp0
>>>> external_ids:ovn-installed` = '"true"'])
>>>>> +OVS_WAIT_UNTIL([test x`as hv2 ovs-vsctl get Interface lsp0
>>>> external_ids:ovn-installed` = x])
>>>>> +
>>>>>  # Check that setting iface:encap-ip populates
>>>> Port_Binding:additional_encap
>>>>>  wait_row_count Encap 2 chassis_name=hv1
>>>>>  wait_row_count Encap 2 chassis_name=hv2
>>>>> @@ -14081,6 +14100,11 @@ wait_column "$hv2_uuid" Port_Binding
>>>> requested_chassis logical_port=lsp0
>>>>>  wait_column "" Port_Binding additional_chassis logical_port=lsp0
>>>>>  wait_column "" Port_Binding requested_additional_chassis
>>>> logical_port=lsp0
>>>>>
>>>>> +# Check ovn-installed updated for main chassis and not for other
> chassis
>>>>> +wait_for_ports_up
>>>>> +OVS_WAIT_UNTIL([test `as hv2 ovs-vsctl get Interface lsp0
>>>> external_ids:ovn-installed` = '"true"'])
>>>>> +OVS_WAIT_UNTIL([test x`as hv1 ovs-vsctl get Interface lsp0
>>>> external_ids:ovn-installed` = x])
>>>>> +
>>>>>  # Check that additional_encap is cleared
>>>>>  wait_column "" Port_Binding additional_encap logical_port=lsp0
>>>>>
>>>>> @@ -15370,7 +15394,10 @@ AT_CHECK([as hv1 ovs-ofctl dump-flows br-int
>>>> table=65 | grep actions=output:1],
>>>>>  echo "verifying that lsp0 binding moves when requested-chassis is
>>>> changed"
>>>>>
>>>>>  ovn-nbctl lsp-set-options lsp0 requested-chassis=hv2
>>>>> -OVS_WAIT_UNTIL([test 1 = $(grep -c "Releasing lport lsp0 from this
>>>> chassis" hv1/ovn-controller.log)])
>>>>> +
>>>>> +# We might see multiple "Releasing lport ...", when sb is read only
>>>>> +OVS_WAIT_UNTIL([test 1 -le $(grep -c "Releasing lport lsp0 from this
>>>> chassis" hv1/ovn-controller.log)])
>>>>> +
>>>>>  wait_column "$hv2_uuid" Port_Binding chassis logical_port=lsp0
>>>>>
>>>>>  # (6) Chassis hv2 should add flows and hv1 should not.
>>>>> @@ -15416,7 +15443,7 @@ AT_CHECK([as hv1 ovs-ofctl dump-flows br-int
>>>> table=0 | grep in_port=1], [0], [ig
>>>>>  AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=65 | grep
>>>> actions=output:1], [0], [ignore])
>>>>>
>>>>>  check ovn-nbctl --wait=hv lsp-set-options lsp0
>>>> requested-chassis=non-existant-chassis
>>>>> -OVS_WAIT_UNTIL([test 1 = $(grep -c "Releasing lport lsp0 from this
>>>> chassis" hv1/ovn-controller.log)])
>>>>> +OVS_WAIT_UNTIL([test 1 -le $(grep -c "Releasing lport lsp0 from this
>>>> chassis" hv1/ovn-controller.log)])
>>>>>  check ovn-nbctl --wait=hv sync
>>>>>  wait_column '' Port_Binding chasssi logical_port=lsp0
>>>>>  AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=0 | grep
> in_port=1],
>>>> [1], [])
>>>>> @@ -32418,3 +32445,119 @@ AT_CHECK([test $(ovn-sbctl list fdb | grep
> -c
>>>> "00:00:00:00:10:30") = 0])
>>>>>  OVN_CLEANUP([hv1])
>>>>>  AT_CLEANUP
>>>>>  ])
>>>>> +
>>>>> +OVN_FOR_EACH_NORTHD([
>>>>> +AT_SETUP([recomputes])
>>>>> +ovn_start
>>>>> +
>>>>> +n_hv=4
>>>>> +
>>>>> +# Add chassis
>>>>> +net_add n1
>>>>> +for i in $(seq 1 $n_hv); do
>>>>> +    sim_add hv$i
>>>>> +    as hv$i
>>>>> +    check ovs-vsctl add-br br-phys
>>>>> +    ovs-vsctl set open .
> external-ids:ovn-bridge-mappings=phys:br-phys
>>>>> +    ovn_attach n1 br-phys 192.168.0.$i 24 geneve
>>>>> +done
>>>>> +
>>>>> +add_switch_ports() {
>>>>> +    start_port=$1
>>>>> +    end_port=$2
>>>>> +    nb_hv=$3
>>>>> +    bulk_size=$4
>>>>> +    for ((i=start_port; i<end_port; )) do
>>>>> +        start_bulk=$i
>>>>> +        for hv in $(seq 1 $nb_hv); do
>>>>> +            end_bulk=$((start_bulk+bulk_size-1))
>>>>> +            for port in $(seq $start_bulk $end_bulk); do
>>>>> +                logical_switch_port=lsp${port}
>>>>> +                OVN_NBCTL(lsp-add ls1 $logical_switch_port)
>>>>> +                OVN_NBCTL(lsp-set-addresses $logical_switch_port
>>>> dynamic)
>>>>> +            done
>>>>> +            start_bulk=$((end_bulk+1))
>>>>> +        done
>>>>> +        RUN_OVN_NBCTL()
>>>>> +
>>>>> +        start_bulk=$i
>>>>> +        for hv in $(seq 1 $nb_hv); do
>>>>> +            end_bulk=$((start_bulk+bulk_size-1))
>>>>> +            for port in $(seq $start_bulk $end_bulk); do
>>>>> +                logical_switch_port=lsp${port}
>>>>> +                as hv$hv ovs-vsctl \
>>>>> +                    --no-wait -- add-port br-int vif${port} \
>>>>> +                    -- set Interface vif${port}
>>>> external_ids:iface-id=$logical_switch_port
>>>>> +            done
>>>>> +            start_bulk=$((end_bulk+1))
>>>>> +        done
>>>>> +        i=$((end_bulk+1))
>>>>> +    done
>>>>> +}
>>>>> +check ovn-nbctl ls-add ls1
>>>>> +check ovn-nbctl set Logical_Switch ls1 other_config:subnet=
> 10.1.0.0/16
>>>>> +check ovn-nbctl set Logical_Switch ls1
>>>> other_config:exclude_ips=10.1.255.254
>>>>> +
>>>>> +check ovn-nbctl lr-add lr1
>>>>> +check ovn-nbctl lsp-add ls1 lsp0 -- set Logical_Switch_Port lsp0
>>>> type=router options:router-port=lrp0 addresses=dynamic
>>>>> +check ovn-nbctl lrp-add lr1 lrp0 "f0:00:00:01:00:01" 10.1.255.254/16
>>>>> +check ovn-nbctl lr-nat-add lr1 snat 10.2.0.1 10.1.0.0/16
>>>>> +
>>>>> +lflow_run=0
>>>>> +check ovn-nbctl --wait=hv sync
>>>>> +
>>>>> +# Tunnel ports might not be added (yet) at this point on slow system.
>>>>> +# Wait for flows related to such ports to ensure those ports have
> been
>>>> added
>>>>> +# before we measure recomputes. Otherwise, ovs_interface handler
> might
>>>> be run
>>>>> +# afterwards for tunnel ports, causing recomputes.
>>>>> +for i in $(seq 1 $n_hv); do
>>>>> +    for j in $(seq 1 $n_hv); do
>>>>> +        if test $i != $j; then
>>>>> +            OVN_WAIT_REMOTE_INPUT_FLOWS(["hv$i"],["hv$j"])
>>>>> +        fi
>>>>> +    done
>>>>> +done
>>>>> +
>>>>> +for i in $(seq 1 $n_hv); do
>>>>> +    as hv$i
>>>>> +    lflow_run1=$(ovn-appctl -t ovn-controller coverage/read-counter
>>>> lflow_run)
>>>>> +    lflow_run=`expr $lflow_run1 + $lflow_run`
>>>>> +done
>>>>> +
>>>>> +add_switch_ports 1 1000 $n_hv 5
>>>>> +
>>>>> +wait_for_ports_up
>>>>> +check ovn-nbctl --wait=hv sync
>>>>> +
>>>>> +for i in $(seq 1 $n_hv); do
>>>>> +    pid=$(cat hv${i}/ovn-controller.pid)
>>>>> +    u=$((u+$(cat "/proc/$pid/stat" | awk '{print $14}')))
>>>>> +    s=$((s+$(cat "/proc/$pid/stat" | awk '{print $15}')))
>>>>> +done
>>>>> +
>>>>> +n_pid=$(cat northd/ovn-northd.pid)
>>>>> +n_u=$(cat "/proc/$n_pid/stat" | awk '{print $14}')
>>>>> +n_s=$(cat "/proc/$n_pid/stat" | awk '{print $15}')
>>>>> +
>>>>> +echo "Total Northd User Time: $n_u"
>>>>> +echo "Total Northd System Time: $n_s"
>>>>> +echo "Total Controller User Time: $u"
>>>>> +echo "Total Controller System Time: $s"
>>>>> +
>>>>> +lflow_run_end=0
>>>>> +for i in $(seq 1 $n_hv); do
>>>>> +    as hv$i
>>>>> +    lflow_run1=$(ovn-appctl -t ovn-controller coverage/read-counter
>>>> lflow_run)
>>>>> +    lflow_run_end=`expr $lflow_run1 + $lflow_run_end`
>>>>> +done
>>>>> +n_recomputes=`expr $lflow_run_end - $lflow_run`
>>>>> +echo "$n_recomputes recomputes"
>>>>> +
>>>>> +AT_CHECK([test $lflow_run_end == $lflow_run])
>>>>> +
>>>>> +for i in $(seq 2 $n_hv); do
>>>>> +    OVN_CLEANUP_SBOX([hv$i])
>>>>> +done
>>>>> +OVN_CLEANUP([hv1])
>>>>> +AT_CLEANUP
>>>>> +])
>>>>> diff --git a/tests/perf-northd.at b/tests/perf-northd.at
>>>>> index 74b69e9d4..6ec196b36 100644
>>>>> --- a/tests/perf-northd.at
>>>>> +++ b/tests/perf-northd.at
>>>>> @@ -76,23 +76,6 @@ m4_define([PERF_RECORD_STOP], [
>>>>>      PERF_RECORD_STOPWATCH(ovn-northd-loop, ["Short term average"],
>>>> [Average (northd-loop in msec)])
>>>>>  ])
>>>>>
>>>>> -# OVN_NBCTL([NBCTL_COMMAND])
>>>>> -#
>>>>> -# Add NBCTL_COMMAND to list of commands to be run by RUN_OVN_NBCTL().
>>>>> -#
>>>>> -m4_define([OVN_NBCTL], [
>>>>> -    command="${command} -- $1"
>>>>> -])
>>>>> -
>>>>> -# RUN_OVN_NBCTL()
>>>>> -#
>>>>> -# Execute list of commands built by the OVN_NBCTL() macro.
>>>>> -#
>>>>> -m4_define([RUN_OVN_NBCTL], [
>>>>> -    check ovn-nbctl ${command}
>>>>> -    unset command
>>>>> -])
>>>>> -
>>>>>  OVS_START_SHELL_HELPERS
>>>>>  generate_subnet () {
>>>>>      local a=$(printf %d $(expr $1 / 256 + 10))
>>>>> --
>>>>> 2.31.1
>>>>>
>>>>
>>>
>>
>
Han Zhou July 13, 2022, 6:27 a.m. UTC | #7
On Tue, Jul 12, 2022 at 12:35 AM Dumitru Ceara <dceara@redhat.com> wrote:
>
> On 7/12/22 08:52, Han Zhou wrote:
> > On Mon, Jul 11, 2022 at 4:55 AM Dumitru Ceara <dceara@redhat.com> wrote:
> >>
> >> On 7/11/22 13:31, Xavier Simonart wrote:
> >>> Hi Han
> >>>
> >>> Thanks for your review.
> >>>
> >>> Let me try to understand your two main concerns and the proper way to
> >>> fix them.
> >>> 1) We only try once to write pb->chassis. If the commit fails,
> >>> pb->chassis is not written. As the commit fails, we will recompute, but
> >>> as the update_required flag is not set anymore, we might end up with no
> >>> pb->chassis.
> >>> => I'll remove the flag and try to update until it's confirmed.
> >
> > Thank you!
> >
> >>> 2) The state machine, and when we move to INSTALL_FLOWS. Serializing
> >>> the state machine, by waiting for confirmation to be received before
> >>> moving to the INSTALL_FLOWS state, will delay ovn-installed compared to
> >>> today. So I am (still) trying to see if there is any way to prevent
> >>> this in some cases.
> >>> Would it be correct to do this serialization (wait for pb->chassis
> >>> update confirmation) only when using conditional monitoring? When
> >>> using monitor-all, as soon as we have written (w/o confirmation)
> >>> pb->chassis, we would move to INSTALL_FLOWS. In that loop where we
> >>> wrote pb->chassis, (all) the flows should be updated taking into
> >>> account pb->chassis.
> >>
> >> I think this approach is OK.  ovn-controllers that don't use
> >> conditional monitoring already know the complete SB contents and should
> >> be able to install (mostly?) complete sets of openflows that correspond
> >> to a given Port_Binding.
> >>
> >> It's also "CMS-friendly", at least for ovn-kubernetes, which uses
> >> ovn-monitor-all=true and only waits for
> >> OVS.Interface.external_ids:ovn-installed=true.  And shouldn't impact
> >> the others which wait for SB.Port_Binding.up=true.
> >>
> >
> > Xavier and Dumitru, I think we shouldn't introduce the dependency of
> > "ovn-monitor-all" setting here.
> >
> > First of all, ovn-installed is a flag for CMS to understand that all the
> > flows related to the port-binding are installed. If we set the flag
> > before it is truly completed, it is a bug, and it is possible that the
> > flag is set but some traffic doesn't work.
>
> I think it's a matter of semantics.  The way I see "ovn-installed=true"
> is: all flows that are relevant to the port binding on the local chassis
> have been installed.  When we added it it was for the ovn-k8s case;
> ovn-k8s used to explicitly check if some openflow tables on the node
> where the pod is being brought up contained flows that seemed to
> correspond to the pod (e.g., matching on pod mac and IP addresses).
>
But the purpose of checking the flows (when ovn-installed wasn't available)
was to make sure the pod is ready to send/receive traffic. If ovn-installed
can provide more accuracy, why not?

> > I did a quick test, and at least a flow in (table_id=38, priority=100)
> > which is multicast-group related is updated AFTER the SB notification is
> > received for the port-binding chassis update.
> >
>
> This sounds like something we should fix, I think.  I don't see any
> multicast-group changes conditioned by the port_binding being up=true in
> northd.  I might be wrong though.
>

It is not about "up=true". It is triggered by the port-binding->chassis
update. Since the multicast-group has a reference to the port-binding, a
port-binding update triggers multicast-group change handling, which is
required because physical flows related to the MC group need to be updated
when port-binding->chassis is updated. You may argue that the IDL could be
optimized so that the MC group change is triggered and handled before SB is
updated, but I am not sure the benefit is worth the complexity.
Given how OVSDB IDL transactions are designed, I always consider a DB record
to be *formally* updated only after the update notification is received from
the server, which seems safe and clear.
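
To make that ordering concrete, here is a minimal sketch, assuming
simplified stand-in types. Everything prefixed sketch_ is hypothetical; it
is not code from this patch or from the OVSDB IDL. It only models the rule
"a local write is not a confirmed update", so an interface leaves CLAIMED
only once the server has notified us that pb->chassis is this chassis.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins; not the real OVN structures. */
enum sketch_if_state {
    SKETCH_CLAIMED,        /* pb->chassis at most written locally. */
    SKETCH_INSTALL_FLOWS,  /* pb->chassis confirmed by the SB server. */
};

struct sketch_port {
    const char *confirmed_chassis; /* Value last notified by the server. */
    const char *pending_chassis;   /* Value written but not yet confirmed. */
    enum sketch_if_state state;
};

/* Records a local write.  The confirmed value is left untouched. */
static void
sketch_write_chassis(struct sketch_port *p, const char *chassis)
{
    assert(p && chassis);
    p->pending_chassis = chassis;
}

/* Models the update notification coming back from the server. */
static void
sketch_on_sb_notification(struct sketch_port *p, const char *chassis)
{
    p->confirmed_chassis = chassis;
    p->pending_chassis = NULL;
}

/* Moves CLAIMED -> INSTALL_FLOWS only if the confirmed chassis is ours. */
static void
sketch_update_state(struct sketch_port *p, const char *this_chassis)
{
    if (p->state == SKETCH_CLAIMED && p->confirmed_chassis
        && !strcmp(p->confirmed_chassis, this_chassis)) {
        p->state = SKETCH_INSTALL_FLOWS;
    }
}
```

In this model, calling sketch_write_chassis() and then
sketch_update_state() leaves the port in CLAIMED; only a matching
sketch_on_sb_notification() lets the next sketch_update_state() advance it.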

> > Secondly, if the change hasn't made it to the SB, all the other nodes
> > would not be able to reach the port, which means the workload (pod/VM)
> > cannot receive traffic yet at this phase.
> >
>
> Even if the change made it to the SB we have no way of knowing that all
> other nodes processed it so we cannot know for sure that traffic can
> flow properly end-to-end.  But, like I said above, this doesn't matter
> if the semantics of ovn-installed=true are "all locally relevant flows
> are installed".
>
It's true that even if SB is updated, that doesn't ensure all the nodes have
processed it, but I view it this way: at least from the current node's point
of view, its job is done and the other nodes are beyond its control. On the
other hand, if the SB update failed, its job is not done yet. I am not saying
this is the only *correct* way, but just the way I am seeing it :).
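
As a rough illustration of "the job is not done until SB confirms it", here
is a minimal sketch with hypothetical names (sketch_claim and
sketch_claim_run are made up for this example and do not exist in
ovn-controller). The write is attempted on every iteration in which SB is
writable, with no one-shot flag, until the confirmed value matches.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-port claim record; not a real OVN structure. */
struct sketch_claim {
    const char *wanted_chassis;    /* Chassis that should own the port. */
    const char *confirmed_chassis; /* Value last notified by the server. */
    unsigned int n_writes;         /* Number of write attempts so far. */
};

/* Called once per main-loop iteration.
 * Returns true only when the server has confirmed the wanted chassis.
 * Otherwise it (re)attempts the write whenever SB is writable. */
static bool
sketch_claim_run(struct sketch_claim *c, bool sb_readonly)
{
    assert(c && c->wanted_chassis);

    if (c->confirmed_chassis
        && !strcmp(c->confirmed_chassis, c->wanted_chassis)) {
        return true;
    }
    if (!sb_readonly) {
        /* Stands in for the real pb->chassis write in the SB transaction. */
        c->n_writes++;
    }
    return false;
}
```

If a commit fails, no confirmation arrives, so the next writable iteration
simply writes again; nothing has to remember that an update is still
required.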

> > So, I think our goal is not to set ovn-installed early, but to set it
> > accurately (sometimes it may be OK to be conservative).
> >
>
> Sure, but waiting for the SB port_binding.chassis update might introduce
> significant spikes in latency if the SB is compacting (or just busy) at
> that moment.
>
> This might become an issue in large scale deployments as pods will take
> longer to be declared "ready".
>
I understand your concern, but if you think about it, how the pods are
*declared* ready doesn't change whether they are actually ready or not. It
doesn't make the real flow setup faster or slower.
If the CMS really wants to declare a pod ready earlier, it can just ignore
the ovn-installed flag check or flow check. What's the real benefit except
for metrics?

> > In addition, ovn-monitor-all is not always true even in ovn-k8s. It is
> > configurable in ovn-k8s. (In our environment we set it to false, to save
> > memory and CPU on worker nodes, while sacrificing a little on the
> > central SB DB.)
> >
>
> Ack.  But for this case specifically, as SB is already busier with
> conditional monitoring, I think serializing events in ovn-controller
> will create even more visible delays in pod bringup times.
>
> Thanks,
> Dumitru
>
> > Thanks,
> > Han
> >>>
> >>> Thanks again for your feedback
> >>>
> >>> Xavier
> >>>
> >>> On Thu, Jul 7, 2022 at 8:49 AM Han Zhou <hzhou@ovn.org> wrote:
> >>>
> >>>>
> >>>>
> >>>> On Wed, Jun 22, 2022 at 3:23 AM Xavier Simonart <xsimonar@redhat.com>
> >>>> wrote:
> >>>>>
> >>>>> When VIF ports are claimed on a chassis, the SBDB Port_Binding table
> >>>>> is updated. If the SBDB IDL is still read-only ("in transaction")
> >>>>> when such an update is required, the update is not possible and a
> >>>>> recompute is triggered through I+P failure.
> >>>>>
> >>>>> This situation can happen:
> >>>>> - after updating Port_Binding->chassis to SBDB for one port, in a
> >>>>>   following iteration, ovn-controller handles
> >>>>>   Interface:external_ids:ovn-installed (for the same port) while SBDB
> >>>>>   is still read-only.
> >>>>> - after updating Port_Binding->chassis to SBDB for one port, in a
> >>>>>   following iteration, ovn-controller updates Port_Binding->chassis
> >>>>>   for another port, while SBDB is still read-only.
> >>>>>
> >>>>> This patch prevents the recompute by having the if-status module
> >>>>> update the Port_Binding chassis (if needed) when possible.
> >>>>> This does not delay the Port_Binding chassis update compared to
> >>>>> before this patch.
> >>>>> - With the patch, Port_Binding chassis will be updated as soon as
> >>>>> SBDB is again writable, without recompute.
> >>>>> - Without the patch, Port_Binding chassis was updated as soon as
> >>>>> SBDB was again writable, through a recompute.
> >>>>>
> >>>>> As part of this patch, ovn-installed will not be updated for
> >>>>> additional chassis; it will only be updated when the migration is
> >>>>> completed.
> >>>>>
> >>>>> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2059253
> >>>>> Signed-off-by: Xavier Simonart <xsimonar@redhat.com>
> >>>>>
> >>>>> ---
> >>>>> v2:  - handled Dumitru's comments.
> >>>>>      - handled Han's comments, mainly ensure we moved out of CLAIMED
> >>>>>        state only after updating pb->chassis to guarantee physical
> >>>>>        flows are installed when ovn-installed is updated in OVS.
> >>>>>      - slightly reorganize the code to isolate 'notify_up = false'
> >>>>>        cases in claim_port (i.e. ports such as virtual ports), with
> >>>>>        the idea of enabling a future patch that prevents recomputes
> >>>>>        when virtual ports are claimed.
> >>>>>      - updated test case to cause more race conditions.
> >>>>>      - rebased on origin/main
> >>>>>      - note that "additional chassis" as now supported by
> >>>>>        "Support LSP:options:requested-chassis as a list" might still
> >>>>>        cause recomputes.
> >>>>>      - fixed missing flows when Port_Binding chassis was updated by
> >>>>>        mgr_update w/o any lflow recalculation.
> >>>>> v3:  - handled Dumitru's comments on v2, mainly have runtime_data
> >>>>>        handler handling pb_claims when sb becomes writable (instead
> >>>>>        of a lflow handler).
> >>>>>      - fixed test as it was not checking recomputes on all hv, as
> >>>>>        well as a flaky behavior.
> >>>>>      - rebased on origin/main.
> >>>>> ---
> >>>>>  controller/binding.c        | 154 +++++++++++++++++++++----------
> >>>>>  controller/binding.h        |  15 +++-
> >>>>>  controller/if-status.c      | 174 ++++++++++++++++++++++++++++++++----
> >>>>>  controller/if-status.h      |  16 +++-
> >>>>>  controller/ovn-controller.c |  72 ++++++++++++++-
> >>>>>  tests/ovn-macros.at         |  12 +++
> >>>>>  tests/ovn.at                | 147 +++++++++++++++++++++++++++++-
> >>>>>  tests/perf-northd.at        |  17 ----
> >>>>>  8 files changed, 519 insertions(+), 88 deletions(-)
> >>>>>
> >>>>> diff --git a/controller/binding.c b/controller/binding.c
> >>>>> index 2279570f9..b21577f71 100644
> >>>>> --- a/controller/binding.c
> >>>>> +++ b/controller/binding.c
> >>>>> @@ -644,11 +644,17 @@ local_binding_get_lport_ofport(const struct
> > shash
> >>>> *local_bindings,
> >>>>>  }
> >>>>>
> >>>>>  bool
> >>>>> -local_binding_is_up(struct shash *local_bindings, const char
> > *pb_name)
> >>>>> +local_binding_is_up(struct shash *local_bindings, const char
> > *pb_name,
> >>>>> +                    const struct sbrec_chassis *chassis_rec)
> >>>>>  {
> >>>>>      struct local_binding *lbinding =
> >>>>>          local_binding_find(local_bindings, pb_name);
> >>>>>      struct binding_lport *b_lport =
> >>>> local_binding_get_primary_lport(lbinding);
> >>>>> +
> >>>>> +    if (b_lport && b_lport->pb->chassis != chassis_rec) {
> >>>>> +        return false;
> >>>>> +    }
> >>>>> +
> >>>>>      if (lbinding && b_lport && lbinding->iface) {
> >>>>>          if (b_lport->pb->n_up && !b_lport->pb->up[0]) {
> >>>>>              return false;
> >>>>> @@ -660,13 +666,23 @@ local_binding_is_up(struct shash
> > *local_bindings,
> >>>> const char *pb_name)
> >>>>>  }
> >>>>>
> >>>>>  bool
> >>>>> -local_binding_is_down(struct shash *local_bindings, const char
> > *pb_name)
> >>>>> +local_binding_is_down(struct shash *local_bindings, const char
> > *pb_name,
> >>>>> +                      const struct sbrec_chassis *chassis_rec)
> >>>>>  {
> >>>>>      struct local_binding *lbinding =
> >>>>>          local_binding_find(local_bindings, pb_name);
> >>>>>
> >>>>>      struct binding_lport *b_lport =
> >>>> local_binding_get_primary_lport(lbinding);
> >>>>>
> >>>>> +    if (b_lport) {
> >>>>> +        if (b_lport->pb->chassis == chassis_rec) {
> >>>>> +            return false;
> >>>>> +        } else if (b_lport->pb->chassis) {
> >>>>> +            VLOG_DBG("lport %s already claimed by other chassis",
> >>>>> +                     b_lport->pb->logical_port);
> >>>>> +        }
> >>>>> +    }
> >>>>> +
> >>>>>      if (!lbinding) {
> >>>>>          return true;
> >>>>>      }
> >>>>> @@ -884,37 +900,80 @@ get_lport_type_str(enum en_lport_type
> > lport_type)
> >>>>>      OVS_NOT_REACHED();
> >>>>>  }
> >>>>>
> >>>>> -/* For newly claimed ports, if 'notify_up' is 'false':
> >>>>> +void
> >>>>> +set_pb_chassis_in_sbrec(const struct sbrec_port_binding *pb,
> >>>>> +                        const struct sbrec_chassis *chassis_rec,
> >>>>> +                        bool is_set)
> >>>>> +{
> >>>>> +    if (pb->chassis != chassis_rec) {
> >>>>> +         if (is_set) {
> >>>>> +            if (pb->chassis) {
> >>>>> +                VLOG_INFO("Changing chassis for lport %s from %s to
> >>>> %s.",
> >>>>> +                          pb->logical_port, pb->chassis->name,
> >>>>> +                          chassis_rec->name);
> >>>>> +            } else {
> >>>>> +                VLOG_INFO("Claiming lport %s for this chassis.",
> >>>>> +                          pb->logical_port);
> >>>>> +            }
> >>>>> +            for (int i = 0; i < pb->n_mac; i++) {
> >>>>> +                VLOG_INFO("%s: Claiming %s", pb->logical_port,
> >>>> pb->mac[i]);
> >>>>> +            }
> >>>>> +            sbrec_port_binding_set_chassis(pb, chassis_rec);
> >>>>> +        }
> >>>>> +    } else if (!is_set) {
> >>>>> +        sbrec_port_binding_set_chassis(pb, NULL);
> >>>>> +    }
> >>>>> +}
> >>>>> +
> >>>>> +void
> >>>>> +local_binding_set_pb(struct shash *local_bindings, const char
> > *pb_name,
> >>>>> +                     const struct sbrec_chassis *chassis_rec,
> >>>>> +                     struct hmap *tracked_datapaths, bool is_set)
> >>>>> +{
> >>>>> +    struct local_binding *lbinding =
> >>>>> +        local_binding_find(local_bindings, pb_name);
> >>>>> +    struct binding_lport *b_lport =
> >>>> local_binding_get_primary_lport(lbinding);
> >>>>> +
> >>>>> +    if (b_lport) {
> >>>>> +        set_pb_chassis_in_sbrec(b_lport->pb, chassis_rec, is_set);
> >>>>> +        if (tracked_datapaths) {
> >>>>> +            update_lport_tracking(b_lport->pb, tracked_datapaths,
> > true);
> >>>>> +        }
> >>>>> +    }
> >>>>> +}
> >>>>> +
> >>>>> +/* For newly claimed ports:
> >>>>>   * - set the 'pb.up' field to true if 'pb' has no 'parent_pb'.
> >>>>>   * - set the 'pb.up' field to true if 'parent_pb.up' is 'true'
(e.g.,
> >>>> for
> >>>>>   *   container and virtual ports).
> >>>>> - * Otherwise request a notification to be sent when the OVS flows
> >>>>> - * corresponding to 'pb' have been installed.
> >>>>> + *
> >>>>> + * Returns false if lport is not claimed due to 'sb_readonly'.
> >>>>> + * Returns true otherwise.
> >>>>>   *
> >>>>>   * Note:
> >>>>> - *   Updates (directly or through a notification) the 'pb->up'
field
> >>>> only if
> >>>>> - *   it's explicitly set to 'false'.
> >>>>> + *   Updates the 'pb->up' field only if it's explicitly set to
> > 'false'.
> >>>>>   *   This is to ensure compatibility with older versions of
> > ovn-northd.
> >>>>>   */
> >>>>> -static void
> >>>>> +static bool
> >>>>>  claimed_lport_set_up(const struct sbrec_port_binding *pb,
> >>>>>                       const struct sbrec_port_binding *parent_pb,
> >>>>> -                     const struct sbrec_chassis *chassis_rec,
> >>>>> -                     bool notify_up, struct if_status_mgr *if_mgr)
> >>>>> +                     bool sb_readonly)
> >>>>>  {
> >>>>> -    if (!notify_up) {
> >>>>> -        bool up = true;
> >>>>> -        if (!parent_pb || (parent_pb->n_up && parent_pb->up[0])) {
> >>>>> +    /* When notify_up is false in claim_port(), no state is created
> >>>>> +     * by if_status_mgr. In such cases, return false (i.e. trigger
> >>>> recompute)
> >>>>> +     * if we can't update sb (because it is readonly).
> >>>>> +     */
> >>>>> +    bool up = true;
> >>>>> +    if (!parent_pb || (parent_pb->n_up && parent_pb->up[0])) {
> >>>>> +        if (!sb_readonly) {
> >>>>>              if (pb->n_up) {
> >>>>>                  sbrec_port_binding_set_up(pb, &up, 1);
> >>>>>              }
> >>>>> +        } else if (pb->n_up && !pb->up[0]) {
> >>>>> +            return false;
> >>>>>          }
> >>>>> -        return;
> >>>>> -    }
> >>>>> -
> >>>>> -    if (pb->chassis != chassis_rec || (pb->n_up && !pb->up[0])) {
> >>>>> -        if_status_mgr_claim_iface(if_mgr, pb->logical_port);
> >>>>>      }
> >>>>> +    return true;
> >>>>>  }
> >>>>>
> >>>>>  typedef void (*set_func)(const struct sbrec_port_binding *pb,
> >>>>> @@ -1057,37 +1116,35 @@ claim_lport(const struct sbrec_port_binding
> > *pb,
> >>>>>              struct hmap *tracked_datapaths,
> >>>>>              struct if_status_mgr *if_mgr)
> >>>>>  {
> >>>>> -    if (!sb_readonly) {
> >>>>> -        claimed_lport_set_up(pb, parent_pb, chassis_rec, notify_up,
> >>>> if_mgr);
> >>>>> -    }
> >>>>> -
> >>>>>      enum can_bind can_bind =
> >>>> lport_can_bind_on_this_chassis(chassis_rec, pb);
> >>>>>      bool update_tracked = false;
> >>>>>
> >>>>>      if (can_bind == CAN_BIND_AS_MAIN) {
> >>>>>          if (pb->chassis != chassis_rec) {
> >>>>> -            if (sb_readonly) {
> >>>>> -                return false;
> >>>>> -            }
> >>>>> -
> >>>>> -            if (pb->chassis) {
> >>>>> -                VLOG_INFO("Changing chassis for lport %s from %s to
> >>>> %s.",
> >>>>> -                        pb->logical_port, pb->chassis->name,
> >>>>> -                        chassis_rec->name);
> >>>>> -            } else {
> >>>>> -                VLOG_INFO("Claiming lport %s for this chassis.",
> >>>>> -                          pb->logical_port);
> >>>>> -            }
> >>>>> -            for (size_t i = 0; i < pb->n_mac; i++) {
> >>>>> -                VLOG_INFO("%s: Claiming %s", pb->logical_port,
> >>>> pb->mac[i]);
> >>>>> -            }
> >>>>> -
> >>>>> -            sbrec_port_binding_set_chassis(pb, chassis_rec);
> >>>>>              if (is_additional_chassis(pb, chassis_rec)) {
> >>>>> +                if (sb_readonly) {
> >>>>> +                    return false;
> >>>>> +                }
> >>>>>                  remove_additional_chassis(pb, chassis_rec);
> >>>>>              }
> >>>>>              update_tracked = true;
> >>>>>          }
> >>>>> +        if (!notify_up) {
> >>>>> +            if (!claimed_lport_set_up(pb, parent_pb, sb_readonly))
{
> >>>>> +                return false;
> >>>>> +            }
> >>>>> +            if (pb->chassis != chassis_rec) {
> >>>>> +                if (sb_readonly) {
> >>>>> +                    return false;
> >>>>> +                }
> >>>>> +                set_pb_chassis_in_sbrec(pb, chassis_rec, true);
> >>>>> +            }
> >>>>> +        } else {
> >>>>> +            if ((pb->chassis != chassis_rec) || (pb->n_up &&
> >>>> !pb->up[0])) {
> >>>>> +                if_status_mgr_claim_iface(if_mgr, pb, chassis_rec,
> >>>>> +                                          sb_readonly);
> >>>>> +            }
> >>>>> +        }
> >>>>>      } else if (can_bind == CAN_BIND_AS_ADDITIONAL) {
> >>>>>          if (!is_additional_chassis(pb, chassis_rec)) {
> >>>>>              if (sb_readonly) {
> >>>>> @@ -1132,7 +1189,8 @@ claim_lport(const struct sbrec_port_binding
*pb,
> >>>>>   */
> >>>>>  static bool
> >>>>>  release_lport_main_chassis(const struct sbrec_port_binding *pb,
> >>>>> -                           bool sb_readonly)
> >>>>> +                           bool sb_readonly,
> >>>>> +                           struct if_status_mgr *if_mgr)
> >>>>>  {
> >>>>>      if (pb->encap) {
> >>>>>          if (sb_readonly) {
> >>>>> @@ -1141,11 +1199,13 @@ release_lport_main_chassis(const struct
> >>>> sbrec_port_binding *pb,
> >>>>>          sbrec_port_binding_set_encap(pb, NULL);
> >>>>>      }
> >>>>>
> >>>>> +    /* If sb readonly, pb->chassis unset through if-status if
> > present.
> >>>> */
> >>>>>      if (pb->chassis) {
> >>>>> -        if (sb_readonly) {
> >>>>> +        if (!sb_readonly) {
> >>>>> +            sbrec_port_binding_set_chassis(pb, NULL);
> >>>>> +        } else if (!if_status_mgr_iface_is_present(if_mgr,
> >>>> pb->logical_port)) {
> >>>>>              return false;
> >>>>>          }
> >>>>> -        sbrec_port_binding_set_chassis(pb, NULL);
> >>>>>      }
> >>>>>
> >>>>>      if (pb->virtual_parent) {
> >>>>> @@ -1155,7 +1215,8 @@ release_lport_main_chassis(const struct
> >>>> sbrec_port_binding *pb,
> >>>>>          sbrec_port_binding_set_virtual_parent(pb, NULL);
> >>>>>      }
> >>>>>
> >>>>> -    VLOG_INFO("Releasing lport %s from this chassis.",
> >>>> pb->logical_port);
> >>>>> +    VLOG_INFO("Releasing lport %s from this chassis
> > (sb_readonly=%d)",
> >>>>> +              pb->logical_port, sb_readonly);
> >>>>>      return true;
> >>>>>  }
> >>>>>
> >>>>> @@ -1189,7 +1250,7 @@ release_lport(const struct sbrec_port_binding
> > *pb,
> >>>>>                struct hmap *tracked_datapaths, struct if_status_mgr
> >>>> *if_mgr)
> >>>>>  {
> >>>>>      if (pb->chassis == chassis_rec) {
> >>>>> -        if (!release_lport_main_chassis(pb, sb_readonly)) {
> >>>>> +        if (!release_lport_main_chassis(pb, sb_readonly, if_mgr)) {
> >>>>>              return false;
> >>>>>          }
> >>>>>      } else if (is_additional_chassis(pb, chassis_rec)) {
> >>>>> @@ -1271,7 +1332,7 @@ consider_vif_lport_(const struct
> >>>> sbrec_port_binding *pb,
> >>>>>                               b_lport->lbinding->iface,
> >>>>>                               !b_ctx_in->ovnsb_idl_txn,
> >>>>>                               !parent_pb,
> > b_ctx_out->tracked_dp_bindings,
> >>>>> -                             b_ctx_out->if_mgr)){
> >>>>> +                             b_ctx_out->if_mgr)) {
> >>>>>                  return false;
> >>>>>              }
> >>>>>
> >>>>> @@ -1527,7 +1588,8 @@ consider_localport(const struct
> > sbrec_port_binding
> >>>> *pb,
> >>>>>      enum can_bind can_bind = lport_can_bind_on_this_chassis(
> >>>>>          b_ctx_in->chassis_rec, pb);
> >>>>>      if (can_bind == CAN_BIND_AS_MAIN) {
> >>>>> -        if (!release_lport_main_chassis(pb,
> > !b_ctx_in->ovnsb_idl_txn)) {
> >>>>> +        if (!release_lport_main_chassis(pb,
!b_ctx_in->ovnsb_idl_txn,
> >>>>> +            b_ctx_out->if_mgr)) {
> >>>>>              return false;
> >>>>>          }
> >>>>>      } else if (can_bind == CAN_BIND_AS_ADDITIONAL) {
> >>>>> diff --git a/controller/binding.h b/controller/binding.h
> >>>>> index 1fed06674..d20659b0b 100644
> >>>>> --- a/controller/binding.h
> >>>>> +++ b/controller/binding.h
> >>>>> @@ -151,8 +151,10 @@ const struct sbrec_port_binding
> >>>> *local_binding_get_primary_pb(
> >>>>>  ofp_port_t local_binding_get_lport_ofport(const struct shash
> >>>> *local_bindings,
> >>>>>                                            const char *pb_name);
> >>>>>
> >>>>> -bool local_binding_is_up(struct shash *local_bindings, const char
> >>>> *pb_name);
> >>>>> -bool local_binding_is_down(struct shash *local_bindings, const char
> >>>> *pb_name);
> >>>>> +bool local_binding_is_up(struct shash *local_bindings, const char
> >>>> *pb_name,
> >>>>> +                         const struct sbrec_chassis *);
> >>>>> +bool local_binding_is_down(struct shash *local_bindings, const char
> >>>> *pb_name,
> >>>>> +                           const struct sbrec_chassis *);
> >>>>>  void local_binding_set_up(struct shash *local_bindings, const char
> >>>> *pb_name,
> >>>>>                            const struct sbrec_chassis *chassis_rec,
> >>>>>                            const char *ts_now_str, bool sb_readonly,
> >>>>> @@ -160,7 +162,10 @@ void local_binding_set_up(struct shash
> >>>> *local_bindings, const char *pb_name,
> >>>>>  void local_binding_set_down(struct shash *local_bindings, const
char
> >>>> *pb_name,
> >>>>>                              const struct sbrec_chassis
*chassis_rec,
> >>>>>                              bool sb_readonly, bool ovs_readonly);
> >>>>> -
> >>>>> +void local_binding_set_pb(struct shash *local_bindings, const char
> >>>> *pb_name,
> >>>>> +                          const struct sbrec_chassis *chassis_rec,
> >>>>> +                          struct hmap *tracked_datapaths,
> >>>>> +                          bool is_set);
> >>>>>  void binding_register_ovs_idl(struct ovsdb_idl *);
> >>>>>  void binding_run(struct binding_ctx_in *, struct binding_ctx_out
*);
> >>>>>  bool binding_cleanup(struct ovsdb_idl_txn *ovnsb_idl_txn,
> >>>>> @@ -178,6 +183,10 @@ void binding_dump_local_bindings(struct
> >>>> local_binding_data *, struct ds *);
> >>>>>  bool is_additional_chassis(const struct sbrec_port_binding *pb,
> >>>>>                             const struct sbrec_chassis
*chassis_rec);
> >>>>>
> >>>>> +void set_pb_chassis_in_sbrec(const struct sbrec_port_binding *pb,
> >>>>> +                             const struct sbrec_chassis
*chassis_rec,
> >>>>> +                             bool is_set);
> >>>>> +
> >>>>>  /* Corresponds to each Port_Binding.type. */
> >>>>>  enum en_lport_type {
> >>>>>      LP_UNKNOWN,
> >>>>> diff --git a/controller/if-status.c b/controller/if-status.c
> >>>>> index ad61844d8..7693c289b 100644
> >>>>> --- a/controller/if-status.c
> >>>>> +++ b/controller/if-status.c
> >>>>> @@ -24,6 +24,7 @@
> >>>>>  #include "lib/util.h"
> >>>>>  #include "timeval.h"
> >>>>>  #include "openvswitch/vlog.h"
> >>>>> +#include "lib/ovn-sb-idl.h"
> >>>>>
> >>>>>  VLOG_DEFINE_THIS_MODULE(if_status);
> >>>>>
> >>>>> @@ -53,9 +54,11 @@ VLOG_DEFINE_THIS_MODULE(if_status);
> >>>>>   */
> >>>>>
> >>>>>  enum if_state {
> >>>>> -    OIF_CLAIMED,       /* Newly claimed interface. */
> >>>>> -    OIF_INSTALL_FLOWS, /* Already claimed interface for which flows
> > are
> >>>> still
> >>>>> -                        * being installed.
> >>>>> +    OIF_CLAIMED,       /* Newly claimed interface. pb->chassis not
> > yet
> >>>> updated.
> >>>>> +                        */
> >>>>> +    OIF_INSTALL_FLOWS, /* Claimed interface with pb->chassis
> >>>> successfully
> >>>>> +                        * updated in SB and for which flows are
still
> >>>> being
> >>>>> +                        * installed.
> >>>>>                          */
> >>>>>      OIF_MARK_UP,       /* Interface with flows successfully
installed
> >>>> in OVS
> >>>>>                          * but not yet marked "up" in the binding
> > module
> >>>> (in
> >>>>> @@ -78,6 +81,55 @@ static const char *if_state_names[] = {
> >>>>>      [OIF_INSTALLED]     = "INSTALLED",
> >>>>>  };
> >>>>>
> >>>>> +/*
> >>>>> + *       +----------------------+
> >>>>> + * +---> |                      |
> >>>>> + * | +-> |         NULL         | <--------------------------------------+++-+
> >>>>> + * | |   +----------------------+                                            |
> >>>>> + * | |     ^ release_iface   | claim_iface                                   |
> >>>>> + * | |     |                 V - sbrec_update_chassis(if sb is rw)           |
> >>>>> + * | |   +----------------------+                                            |
> >>>>> + * | |   |                      | <----------------------------------------+ |
> >>>>> + * | |   |       CLAIMED        | <--------------------------------------+ | |
> >>>>> + * | |   +----------------------+                                        | | |
> >>>>> + * | |                  | mgr_update(when sb is rw)                      | | |
> >>>>> + * | | release_iface    |  - sbrec_update_chassis                        | | |
> >>>>> + * | |                  |  - request seqno                               | | |
> >>>>> + * | |                  V                                                | | |
> >>>>> + * | |   +----------------------+                                        | | |
> >>>>> + * | +-- |                      |  mgr_run(seqno not rcvd)               | | |
> >>>>> + * |     |    INSTALL_FLOWS     |   - set port down in sb                | | |
> >>>>> + * |     |                      |  mgr_update()                          | | |
> >>>>> + * |     +----------------------+   - sbrec_update_chassis if needed     | | |
> >>>>> + * |                    |                                                | | |
> >>>>> + * |                    |  mgr_run(seqno rcvd)                           | | |
> >>>>> + * |                    |  - set port up in sb                           | | |
> >>>>> + * | release_iface      |  - set ovn-installed in ovs                    | | |
> >>>>> + * |                    V                                                | | |
> >>>>> + * |   +----------------------+                                          | | |
> >>>>> + * |   |                      |  mgr_run()                               | | |
> >>>>> + * +-- |       MARK_UP        |  - set port up in sb                     | | |
> >>>>> + *     |                      |  - set ovn-installed in ovs              | | |
> >>>>> + *     |                      |  mgr_update()                            | | |
> >>>>> + *     +----------------------+  - sbrec_update_chassis if needed        | | |
> >>>>> + *              |                                                        | | |
> >>>>> + *              | mgr_update(rcvd port up / ovn_installed & chassis set) | | |
> >>>>> + *              V                                                        | | |
> >>>>> + *     +----------------------+                                          | | |
> >>>>> + *     |      INSTALLED       | ------------> claim_iface ---------------+ | |
> >>>>> + *     +----------------------+                                            | |
> >>>>> + *              |                                                          | |
> >>>>> + *              | release_iface                                            | |
> >>>>> + *              V                                                          | |
> >>>>> + *     +----------------------+                                            | |
> >>>>> + *     |                      | ------------> claim_iface -----------------+ |
> >>>>> + *     |      MARK_DOWN       | ------> mgr_update(rcvd port down) ----------+
> >>>>> + *     |                      | mgr_run()
> >>>>> + *     |                      | - set port down in sb
> >>>>> + *     |                      | mgr_update()
> >>>>> + *     +----------------------+ - sbrec_update_chassis(NULL)
> >>>>> + */
> >>>>> +
> >>>>>  struct ovs_iface {
> >>>>>      char *id;               /* Extracted from OVS
> >>>> external_ids.iface_id. */
> >>>>>      enum if_state state;    /* State of the interface in the state
> >>>> machine. */
> >>>>> @@ -85,6 +137,7 @@ struct ovs_iface {
> >>>>>                               * be fully programmed in OVS.  Only
used
> >>>> in state
> >>>>>                               * OIF_INSTALL_FLOWS.
> >>>>>                               */
> >>>>> +    bool chassis_update_required;  /* If true, pb->chassis must be
> >>>> updated. */
> >>>>>  };
> >>>>>
> >>>>>  static uint64_t ifaces_usage;
> >>>>> @@ -158,14 +211,23 @@ if_status_mgr_destroy(struct if_status_mgr
*mgr)
> >>>>>  }
> >>>>>
> >>>>>  void
> >>>>> -if_status_mgr_claim_iface(struct if_status_mgr *mgr, const char
> >>>> *iface_id)
> >>>>> +if_status_mgr_claim_iface(struct if_status_mgr *mgr,
> >>>>> +                          const struct sbrec_port_binding *pb,
> >>>>> +                          const struct sbrec_chassis *chassis_rec,
> >>>>> +                          bool sb_readonly)
> >>>>>  {
> >>>>> +    const char *iface_id = pb->logical_port;
> >>>>>      struct ovs_iface *iface = shash_find_data(&mgr->ifaces,
> > iface_id);
> >>>>>
> >>>>>      if (!iface) {
> >>>>>          iface = ovs_iface_create(mgr, iface_id, OIF_CLAIMED);
> >>>>>      }
> >>>>> -
> >>>>> +    if (!sb_readonly) {
> >>>>> +        set_pb_chassis_in_sbrec(pb, chassis_rec, true);
> >>>>> +        iface->chassis_update_required = false;
> >>>>> +    } else {
> >>>>> +        iface->chassis_update_required = true;
> >>>>> +    }
> >>>>>      switch (iface->state) {
> >>>>>      case OIF_CLAIMED:
> >>>>>      case OIF_INSTALL_FLOWS:
> >>>>> @@ -182,6 +244,12 @@ if_status_mgr_claim_iface(struct if_status_mgr
> >>>> *mgr, const char *iface_id)
> >>>>>      }
> >>>>>  }
> >>>>>
> >>>>> +bool
> >>>>> +if_status_mgr_iface_is_present(struct if_status_mgr *mgr, const
char
> >>>> *iface_id)
> >>>>> +{
> >>>>> +    return !!shash_find_data(&mgr->ifaces, iface_id);
> >>>>> +}
> >>>>> +
> >>>>>  void
> >>>>>  if_status_mgr_release_iface(struct if_status_mgr *mgr, const char
> >>>> *iface_id)
> >>>>>  {
> >>>>> @@ -246,9 +314,39 @@ if_status_mgr_delete_iface(struct if_status_mgr
> >>>> *mgr, const char *iface_id)
> >>>>>      }
> >>>>>  }
> >>>>>
> >>>>> +bool
> >>>>> +if_status_handle_claims(struct if_status_mgr *mgr,
> >>>>> +                        struct local_binding_data *binding_data,
> >>>>> +                        const struct sbrec_chassis *chassis_rec,
> >>>>> +                        struct hmap *tracked_datapath,
> >>>>> +                        bool sb_readonly)
> >>>>> +{
> >>>>> +    if (!binding_data || sb_readonly) {
> >>>>> +        return false;
> >>>>> +    }
> >>>>> +
> >>>>> +    struct shash *bindings = &binding_data->bindings;
> >>>>> +    struct hmapx_node *node;
> >>>>> +
> >>>>> +    bool rc = false;
> >>>>> +    HMAPX_FOR_EACH (node, &mgr->ifaces_per_state[OIF_CLAIMED]) {
> >>>>> +        struct ovs_iface *iface = node->data;
> >>>>> +        if (iface->chassis_update_required) {
> >>>>
> >>>> Thanks Xavier for the revision. The state machine looks clearer now,
> >>>> but I have a major concern about the use of chassis_update_required.
> >>>> This bool flag is used to decide whether an update to SB is needed.
> >>>> Once an SB update is requested, the flag is set to false, which
> >>>> assumes the SB update will succeed immediately. However, that
> >>>> assumption may be wrong: the subsequent SB update can fail or be
> >>>> delayed for various reasons, so this flag is not reliable. Instead,
> >>>> the CLAIMED state should be responsible for making sure the SB update
> >>>> is completed. If a transaction is in progress, sb_readonly is true.
> >>>> So if sb_readonly is false, nothing is in progress, and we can always
> >>>> check: if (!sb_readonly && <SB chassis is not updated for the
> >>>> port-binding>), we should just send the update, regardless of whether
> >>>> we have requested it before. Please also see another comment below
> >>>> about the state transition.
> >>>>
> >>>>> +            VLOG_INFO("if_status_handle_claims for %s", iface->id);
> >>>>> +            local_binding_set_pb(bindings, iface->id, chassis_rec,
> >>>>> +                                 tracked_datapath, true);
> >>>>> +            rc = true;
> >>>>> +        }
> >>>>> +        iface->chassis_update_required = false;
> >>>>> +    }
> >>>>> +    return rc;
> >>>>> +}
> >>>>> +
> >>>>>  void
> >>>>>  if_status_mgr_update(struct if_status_mgr *mgr,
> >>>>> -                     struct local_binding_data *binding_data)
> >>>>> +                     struct local_binding_data *binding_data,
> >>>>> +                     const struct sbrec_chassis *chassis_rec,
> >>>>> +                     bool sb_readonly)
> >>>>>  {
> >>>>>      if (!binding_data) {
> >>>>>          return;
> >>>>> @@ -257,13 +355,25 @@ if_status_mgr_update(struct if_status_mgr
*mgr,
> >>>>>      struct shash *bindings = &binding_data->bindings;
> >>>>>      struct hmapx_node *node;
> >>>>>
> >>>>> +    /* Interfaces in OIF_MARK_UP state have already set their
> >>>> pb->chassis.
> >>>>> +     * However, it might have been reset by another hv.
> >>>>> +     */
> >>>>>      /* Move all interfaces that have been confirmed "up" by the
> > binding
> >>>> module,
> >>>>>       * from OIF_MARK_UP to OIF_INSTALLED.
> >>>>>       */
> >>>>>      HMAPX_FOR_EACH_SAFE (node,
&mgr->ifaces_per_state[OIF_MARK_UP]) {
> >>>>>          struct ovs_iface *iface = node->data;
> >>>>>
> >>>>> -        if (local_binding_is_up(bindings, iface->id)) {
> >>>>> +        if (iface->chassis_update_required) {
> >>>>> +            if (!sb_readonly) {
> >>>>> +                iface->chassis_update_required = false;
> >>>>> +                local_binding_set_pb(bindings, iface->id,
> > chassis_rec,
> >>>>> +                                     NULL, true);
> >>>>> +            } else {
> >>>>> +                continue;
> >>>>> +            }
> >>>>> +        }
> >>>>> +        if (local_binding_is_up(bindings, iface->id, chassis_rec))
{
> >>>>>              ovs_iface_set_state(mgr, iface, OIF_INSTALLED);
> >>>>>          }
> >>>>>      }
> >>>>> @@ -274,23 +384,53 @@ if_status_mgr_update(struct if_status_mgr
*mgr,
> >>>>>      HMAPX_FOR_EACH_SAFE (node,
> > &mgr->ifaces_per_state[OIF_MARK_DOWN]) {
> >>>>>          struct ovs_iface *iface = node->data;
> >>>>>
> >>>>> -        if (local_binding_is_down(bindings, iface->id)) {
> >>>>> +        if (!sb_readonly) {
> >>>>> +            local_binding_set_pb(bindings, iface->id, chassis_rec,
> >>>>> +                                 NULL, false);
> >>>>> +        }
> >>>>> +        if (local_binding_is_down(bindings, iface->id,
chassis_rec))
> > {
> >>>>>              ovs_iface_destroy(mgr, iface);
> >>>>>          }
> >>>>>      }
> >>>>>
> >>>>> -    /* Register for a notification about flows being installed in
OVS
> >>>> for all
> >>>>> -     * newly claimed interfaces.
> >>>>> +    if (!sb_readonly) {
> >>>>> +        HMAPX_FOR_EACH_SAFE (node,
> >>>> &mgr->ifaces_per_state[OIF_INSTALL_FLOWS]) {
> >>>>> +            struct ovs_iface *iface = node->data;
> >>>>> +
> >>>>> +            if (iface->chassis_update_required) {
> >>>>> +                iface->chassis_update_required = false;
> >>>>> +                local_binding_set_pb(bindings, iface->id,
> > chassis_rec,
> >>>>> +                                     NULL, true);
> >>>>> +            }
> >>>>> +        }
> >>>>> +    }
> >>>>> +
> >>>>> +    /* Update Port_Binding->chassis for newly claimed interfaces
> >>>>> +     * Register for a notification about flows being installed in
OVS
> >>>> for all
> >>>>> +     * newly claimed interfaces for which we could update
> > pb->chassis.
> >>>>>       *
> >>>>>       * Move them from OIF_CLAIMED to OIF_INSTALL_FLOWS.
> >>>>>       */
> >>>>> -    bool new_ifaces = false;
> >>>>> -    HMAPX_FOR_EACH_SAFE (node,
&mgr->ifaces_per_state[OIF_CLAIMED]) {
> >>>>> -        struct ovs_iface *iface = node->data;
> >>>>>
> >>>>> -        ovs_iface_set_state(mgr, iface, OIF_INSTALL_FLOWS);
> >>>>> -        iface->install_seqno = mgr->iface_seqno + 1;
> >>>>> -        new_ifaces = true;
> >>>>> +    bool new_ifaces = false;
> >>>>> +    if (!sb_readonly) {
> >>>>> +        HMAPX_FOR_EACH_SAFE (node,
> > &mgr->ifaces_per_state[OIF_CLAIMED])
> >>>> {
> >>>>> +            struct ovs_iface *iface = node->data;
> >>>>> +            /* No need to check for chassis_update_required as
> > already
> >>>> done
> >>>>> +             * in if_status_handle_claims or
> > if_status_mgr_claim_iface
> >>>>> +             */
> >>>>> +            ovs_iface_set_state(mgr, iface, OIF_INSTALL_FLOWS);
> >>>>
> >>>> We need to make sure the chassis in the SB port-binding is up to date
> >>>> (i.e. the update notification from the SB DB has been received) before
> >>>> moving to INSTALL_FLOWS. Otherwise, the state can still be moved too
> >>>> early, and we end up with incomplete flow installation for the lport
> >>>> when the state is finally moved to INSTALLED.
> >>>>
> >>>> Thanks,
> >>>> Han
> >>>>
> >>>>> +            iface->install_seqno = mgr->iface_seqno + 1;
> >>>>> +            new_ifaces = true;
> >>>>> +        }
> >>>>> +    } else {
> >>>>> +        HMAPX_FOR_EACH_SAFE (node,
> > &mgr->ifaces_per_state[OIF_CLAIMED])
> >>>> {
> >>>>> +            struct ovs_iface *iface = node->data;
> >>>>> +            static struct vlog_rate_limit rl =
> > VLOG_RATE_LIMIT_INIT(5,
> >>>> 1);
> >>>>> +            VLOG_INFO_RL(&rl,
> >>>>> +                         "Not updating pb chassis for %s now as "
> >>>>> +                         "sb is readonly", iface->id);
> >>>>> +        }
> >>>>>      }
> >>>>>
> >>>>>      /* Request a seqno update when the flows for new interfaces
have
> >>>> been
> >>>>> @@ -403,7 +543,7 @@ if_status_mgr_update_bindings(struct
if_status_mgr
> >>>> *mgr,
> >>>>>      struct hmapx_node *node;
> >>>>>
> >>>>>      /* Notify the binding module to set "down" all bindings that
are
> >>>> still
> >>>>> -     * in the process of being installed in OVS, i.e., are not yet
> >>>> instsalled.
> >>>>> +     * in the process of being installed in OVS, i.e., are not yet
> >>>> installed.
> >>>>>       */
> >>>>>      HMAPX_FOR_EACH (node,
&mgr->ifaces_per_state[OIF_INSTALL_FLOWS])
> > {
> >>>>>          struct ovs_iface *iface = node->data;
> >>>>> diff --git a/controller/if-status.h b/controller/if-status.h
> >>>>> index bb8a3950d..f9b05d30d 100644
> >>>>> --- a/controller/if-status.h
> >>>>> +++ b/controller/if-status.h
> >>>>> @@ -27,15 +27,27 @@ struct if_status_mgr
*if_status_mgr_create(void);
> >>>>>  void if_status_mgr_clear(struct if_status_mgr *);
> >>>>>  void if_status_mgr_destroy(struct if_status_mgr *);
> >>>>>
> >>>>> -void if_status_mgr_claim_iface(struct if_status_mgr *, const char
> >>>> *iface_id);
> >>>>> +void if_status_mgr_claim_iface(struct if_status_mgr *,
> >>>>> +                               const struct sbrec_port_binding *pb,
> >>>>> +                               const struct sbrec_chassis
> > *chassis_rec,
> >>>>> +                               bool sb_readonly);
> >>>>>  void if_status_mgr_release_iface(struct if_status_mgr *, const char
> >>>> *iface_id);
> >>>>>  void if_status_mgr_delete_iface(struct if_status_mgr *, const char
> >>>> *iface_id);
> >>>>>
> >>>>> -void if_status_mgr_update(struct if_status_mgr *, struct
> >>>> local_binding_data *);
> >>>>> +void if_status_mgr_update(struct if_status_mgr *, struct
> >>>> local_binding_data *,
> >>>>> +                          const struct sbrec_chassis *chassis,
> >>>>> +                          bool sb_readonly);
> >>>>>  void if_status_mgr_run(struct if_status_mgr *mgr, struct
> >>>> local_binding_data *,
> >>>>>                         const struct sbrec_chassis *,
> >>>>>                         bool sb_readonly, bool ovs_readonly);
> >>>>>  void if_status_mgr_get_memory_usage(struct if_status_mgr *mgr,
> >>>>>                                      struct simap *usage);
> >>>>> +bool if_status_mgr_iface_is_present(struct if_status_mgr *mgr,
> >>>>> +                                    const char *iface_id);
> >>>>> +bool if_status_handle_claims(struct if_status_mgr *mgr,
> >>>>> +                             struct local_binding_data
*binding_data,
> >>>>> +                             const struct sbrec_chassis
*chassis_rec,
> >>>>> +                             struct hmap *tracked_datapath,
> >>>>> +                             bool sb_readonly);
> >>>>>
> >>>>>  # endif /* controller/if-status.h */
> >>>>> diff --git a/controller/ovn-controller.c
b/controller/ovn-controller.c
> >>>>> index 69615308e..3947baf03 100644
> >>>>> --- a/controller/ovn-controller.c
> >>>>> +++ b/controller/ovn-controller.c
> >>>>> @@ -1464,6 +1464,73 @@ en_runtime_data_run(struct engine_node *node,
> >>>> void *data)
> >>>>>      engine_set_node_state(node, EN_UPDATED);
> >>>>>  }
> >>>>>
> >>>>> +struct ed_type_sb_ro {
> >>>>> +    bool sb_readonly;
> >>>>> +};
> >>>>> +
> >>>>> +static void *
> >>>>> +en_sb_ro_init(struct engine_node *node OVS_UNUSED,
> >>>>> +              struct engine_arg *arg OVS_UNUSED)
> >>>>> +{
> >>>>> +    struct ed_type_sb_ro *data = xzalloc(sizeof *data);
> >>>>> +    return data;
> >>>>> +}
> >>>>> +
> >>>>> +static void
> >>>>> +en_sb_ro_run(struct engine_node *node, void *data)
> >>>>> +{
> >>>>> +    struct ed_type_sb_ro *sb_ro_data = data;
> >>>>> +    bool sb_readonly = !engine_get_context()->ovnsb_idl_txn;
> >>>>> +    if (sb_ro_data->sb_readonly != sb_readonly) {
> >>>>> +        sb_ro_data->sb_readonly = sb_readonly;
> >>>>> +        if (!sb_ro_data->sb_readonly) {
> >>>>> +            engine_set_node_state(node, EN_UPDATED);
> >>>>> +        }
> >>>>> +    }
> >>>>> +}
> >>>>> +
> >>>>> +static void
> >>>>> +en_sb_ro_cleanup(void *data OVS_UNUSED)
> >>>>> +{
> >>>>> +}
> >>>>> +
> >>>>> +static bool
> >>>>> +runtime_data_sb_ro_handler(struct engine_node *node, void *data)
> >>>>> +{
> >>>>> +    const struct sbrec_chassis *chassis = NULL;
> >>>>> +
> >>>>> +    struct ovsrec_open_vswitch_table *ovs_table =
> >>>>> +        (struct ovsrec_open_vswitch_table *)EN_OVSDB_GET(
> >>>>> +            engine_get_input("OVS_open_vswitch", node));
> >>>>> +
> >>>>> +    const char *chassis_id = get_ovs_chassis_id(ovs_table);
> >>>>> +
> >>>>> +    struct ovsdb_idl_index *sbrec_chassis_by_name =
> >>>>> +        engine_ovsdb_node_get_index(
> >>>>> +                engine_get_input("SB_chassis", node),
> >>>>> +                "name");
> >>>>> +
> >>>>> +    if (chassis_id) {
> >>>>> +        chassis = chassis_lookup_by_name(sbrec_chassis_by_name,
> >>>> chassis_id);
> >>>>> +    }
> >>>>> +    if (chassis) {
> >>>>> +        struct ed_type_runtime_data *rt_data = data;
> >>>>> +        bool sb_readonly = !engine_get_context()->ovnsb_idl_txn;
> >>>>> +        struct controller_engine_ctx *ctrl_ctx =
> >>>>> +            engine_get_context()->client_ctx;
> >>>>> +
> >>>>> +        if (if_status_handle_claims(ctrl_ctx->if_mgr,
> >>>>> +                                    &rt_data->lbinding_data,
> >>>>> +                                    chassis,
> >>>>> +                                    &rt_data->tracked_dp_bindings,
> >>>>> +                                    sb_readonly)) {
> >>>>> +            engine_set_node_state(node, EN_UPDATED);
> >>>>> +            rt_data->tracked = true;
> >>>>> +        }
> >>>>> +    }
> >>>>> +    return true;
> >>>>> +}
> >>>>> +
> >>>>>  static bool
> >>>>>  runtime_data_ovs_interface_shadow_handler(struct engine_node *node,
> >>>> void *data)
> >>>>>  {
> >>>>> @@ -3528,6 +3595,7 @@ main(int argc, char *argv[])
> >>>>>      stopwatch_create(VIF_PLUG_RUN_STOPWATCH_NAME, SW_MS);
> >>>>>
> >>>>>      /* Define inc-proc-engine nodes. */
> >>>>> +    ENGINE_NODE(sb_ro, "sb_ro");
> >>>>>      ENGINE_NODE_WITH_CLEAR_TRACK_DATA_IS_VALID(ct_zones,
"ct_zones");
> >>>>>      ENGINE_NODE_WITH_CLEAR_TRACK_DATA(ovs_interface_shadow,
> >>>>>                                        "ovs_interface_shadow");
> >>>>> @@ -3664,6 +3732,7 @@ main(int argc, char *argv[])
> >>>>>      engine_add_input(&en_ovs_interface_shadow, &en_ovs_interface,
> >>>>>                       ovs_interface_shadow_ovs_interface_handler);
> >>>>>
> >>>>> +    engine_add_input(&en_runtime_data, &en_sb_ro,
> >>>> runtime_data_sb_ro_handler);
> >>>>>      engine_add_input(&en_runtime_data, &en_ofctrl_is_connected,
> > NULL);
> >>>>>
> >>>>>      engine_add_input(&en_runtime_data, &en_ovs_open_vswitch, NULL);
> >>>>> @@ -4098,7 +4167,8 @@ main(int argc, char *argv[])
> >>>>>                          runtime_data ?
&runtime_data->lbinding_data :
> >>>> NULL;
> >>>>>
> >  stopwatch_start(IF_STATUS_MGR_UPDATE_STOPWATCH_NAME,
> >>>>>                                      time_msec());
> >>>>> -                    if_status_mgr_update(if_mgr, binding_data);
> >>>>> +                    if_status_mgr_update(if_mgr, binding_data,
> > chassis,
> >>>>> +                                         !ovnsb_idl_txn);
> >>>>>
> >  stopwatch_stop(IF_STATUS_MGR_UPDATE_STOPWATCH_NAME,
> >>>>>                                     time_msec());
> >>>>>
> >>>>> diff --git a/tests/ovn-macros.at b/tests/ovn-macros.at
> >>>>> index 335f9158c..8fd6ae6f7 100644
> >>>>> --- a/tests/ovn-macros.at
> >>>>> +++ b/tests/ovn-macros.at
> >>>>> @@ -759,3 +759,15 @@
> > m4_define([OVN_FOR_EACH_NORTHD_WITHOUT_DP_GROUPS],
> >>>>>       [m4_foreach([NORTHD_USE_DP_GROUPS], [no],
> >>>>>         [m4_foreach([NORTHD_USE_PARALLELIZATION], [yes, no], [$1
> >>>>>  ])])])])
> >>>>> +
> >>>>> +# OVN_NBCTL(NBCTL_COMMAND) adds NBCTL_COMMAND to list of commands
to
> > be
> >>>> run by RUN_OVN_NBCTL().
> >>>>> +m4_define([OVN_NBCTL], [
> >>>>> +    command="${command} -- $1"
> >>>>> +])
> >>>>> +
> >>>>> +# RUN_OVN_NBCTL() executes list of commands built by the
OVN_NBCTL()
> >>>> macro.
> >>>>> +m4_define([RUN_OVN_NBCTL], [
> >>>>> +    check ovn-nbctl ${command}
> >>>>> +    unset command
> >>>>> +])
> >>>>> +
> >>>>> diff --git a/tests/ovn.at b/tests/ovn.at
> >>>>> index bfaa41962..94d16bac9 100644
> >>>>> --- a/tests/ovn.at
> >>>>> +++ b/tests/ovn.at
> >>>>> @@ -102,6 +102,18 @@ m4_divert_text([PREPARE_TESTS],
> >>>>>           test 1 -le $(as $hv1 ovs-ofctl dump-flows br-int | grep -c
> >>>> "output:$ofport")
> >>>>>       ])
> >>>>>     }
> >>>>> +
> >>>>> +   ovn_wait_remote_input_flows () {
> >>>>> +     hv1=$1
> >>>>> +     hv2=$2
> >>>>> +     echo "$3: waiting for flows for remote input on $hv1"
> >>>>> +     # Wait for a flow outputting to remote input
> >>>>> +     OVS_WAIT_UNTIL([
> >>>>> +         ofport=$(as $hv1 ovs-vsctl --bare --columns ofport find
> >>>> Interface name=ovn-${hv2}-0)
> >>>>> +         echo "tunnel port=$ofport"
> >>>>> +         test 1 -le $(as $hv1 ovs-ofctl dump-flows br-int | grep -c
> >>>> "in_port=$ofport")
> >>>>> +     ])
> >>>>> +   }
> >>>>>  ])
> >>>>>
> >>>>>  m4_define([OVN_CHECK_PACKETS],
> >>>>> @@ -127,6 +139,8 @@ m4_define([OVN_WAIT_PATCH_PORT_FLOWS],
> >>>>>  m4_define([OVN_WAIT_REMOTE_OUTPUT_FLOWS],
> >>>>>    [ovn_wait_remote_output_flows "$1" "$2" "__file__:__line__"])
> >>>>>
> >>>>> +m4_define([OVN_WAIT_REMOTE_INPUT_FLOWS],
> >>>>> +  [ovn_wait_remote_input_flows "$1" "$2" "__file__:__line__"])
> >>>>>
> >>>>>  AT_BANNER([OVN components])
> >>>>>
> >>>>> @@ -14056,6 +14070,11 @@ wait_column "$hv1_uuid" Port_Binding
> >>>> requested_chassis logical_port=lsp0
> >>>>>  wait_column "$hv2_uuid" Port_Binding additional_chassis
> >>>> logical_port=lsp0
> >>>>>  wait_column "$hv2_uuid" Port_Binding requested_additional_chassis
> >>>> logical_port=lsp0
> >>>>>
> >>>>> +# Check ovn-installed updated for main chassis
> >>>>> +wait_for_ports_up
> >>>>> +OVS_WAIT_UNTIL([test `as hv1 ovs-vsctl get Interface lsp0
> >>>> external_ids:ovn-installed` = '"true"'])
> >>>>> +OVS_WAIT_UNTIL([test x`as hv2 ovs-vsctl get Interface lsp0
> >>>> external_ids:ovn-installed` = x])
> >>>>> +
> >>>>>  # Check that setting iface:encap-ip populates
> >>>> Port_Binding:additional_encap
> >>>>>  wait_row_count Encap 2 chassis_name=hv1
> >>>>>  wait_row_count Encap 2 chassis_name=hv2
> >>>>> @@ -14081,6 +14100,11 @@ wait_column "$hv2_uuid" Port_Binding
> >>>> requested_chassis logical_port=lsp0
> >>>>>  wait_column "" Port_Binding additional_chassis logical_port=lsp0
> >>>>>  wait_column "" Port_Binding requested_additional_chassis
> >>>> logical_port=lsp0
> >>>>>
> >>>>> +# Check ovn-installed updated for main chassis and not for other
> > chassis
> >>>>> +wait_for_ports_up
> >>>>> +OVS_WAIT_UNTIL([test `as hv2 ovs-vsctl get Interface lsp0
> >>>> external_ids:ovn-installed` = '"true"'])
> >>>>> +OVS_WAIT_UNTIL([test x`as hv1 ovs-vsctl get Interface lsp0
> >>>> external_ids:ovn-installed` = x])
> >>>>> +
> >>>>>  # Check that additional_encap is cleared
> >>>>>  wait_column "" Port_Binding additional_encap logical_port=lsp0
> >>>>>
> >>>>> @@ -15370,7 +15394,10 @@ AT_CHECK([as hv1 ovs-ofctl dump-flows
br-int
> >>>> table=65 | grep actions=output:1],
> >>>>>  echo "verifying that lsp0 binding moves when requested-chassis is
> >>>> changed"
> >>>>>
> >>>>>  ovn-nbctl lsp-set-options lsp0 requested-chassis=hv2
> >>>>> -OVS_WAIT_UNTIL([test 1 = $(grep -c "Releasing lport lsp0 from this
> >>>> chassis" hv1/ovn-controller.log)])
> >>>>> +
> >>>>> +# We might see multiple "Releasing lport ...", when sb is read only
> >>>>> +OVS_WAIT_UNTIL([test 1 -le $(grep -c "Releasing lport lsp0 from
this
> >>>> chassis" hv1/ovn-controller.log)])
> >>>>> +
> >>>>>  wait_column "$hv2_uuid" Port_Binding chassis logical_port=lsp0
> >>>>>
> >>>>>  # (6) Chassis hv2 should add flows and hv1 should not.
> >>>>> @@ -15416,7 +15443,7 @@ AT_CHECK([as hv1 ovs-ofctl dump-flows br-int
> >>>> table=0 | grep in_port=1], [0], [ig
> >>>>>  AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=65 | grep
> >>>> actions=output:1], [0], [ignore])
> >>>>>
> >>>>>  check ovn-nbctl --wait=hv lsp-set-options lsp0
> >>>> requested-chassis=non-existant-chassis
> >>>>> -OVS_WAIT_UNTIL([test 1 = $(grep -c "Releasing lport lsp0 from this
> >>>> chassis" hv1/ovn-controller.log)])
> >>>>> +OVS_WAIT_UNTIL([test 1 -le $(grep -c "Releasing lport lsp0 from
this
> >>>> chassis" hv1/ovn-controller.log)])
> >>>>>  check ovn-nbctl --wait=hv sync
> >>>>>  wait_column '' Port_Binding chasssi logical_port=lsp0
> >>>>>  AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=0 | grep
> > in_port=1],
> >>>> [1], [])
> >>>>> @@ -32418,3 +32445,119 @@ AT_CHECK([test $(ovn-sbctl list fdb | grep
> > -c
> >>>> "00:00:00:00:10:30") = 0])
> >>>>>  OVN_CLEANUP([hv1])
> >>>>>  AT_CLEANUP
> >>>>>  ])
> >>>>> +
> >>>>> +OVN_FOR_EACH_NORTHD([
> >>>>> +AT_SETUP([recomputes])
> >>>>> +ovn_start
> >>>>> +
> >>>>> +n_hv=4
> >>>>> +
> >>>>> +# Add chassis
> >>>>> +net_add n1
> >>>>> +for i in $(seq 1 $n_hv); do
> >>>>> +    sim_add hv$i
> >>>>> +    as hv$i
> >>>>> +    check ovs-vsctl add-br br-phys
> >>>>> +    ovs-vsctl set open .
> > external-ids:ovn-bridge-mappings=phys:br-phys
> >>>>> +    ovn_attach n1 br-phys 192.168.0.$i 24 geneve
> >>>>> +done
> >>>>> +
> >>>>> +add_switch_ports() {
> >>>>> +    start_port=$1
> >>>>> +    end_port=$2
> >>>>> +    nb_hv=$3
> >>>>> +    bulk_size=$4
> >>>>> +    for ((i=start_port; i<end_port; )) do
> >>>>> +        start_bulk=$i
> >>>>> +        for hv in $(seq 1 $nb_hv); do
> >>>>> +            end_bulk=$((start_bulk+bulk_size-1))
> >>>>> +            for port in $(seq $start_bulk $end_bulk); do
> >>>>> +                logical_switch_port=lsp${port}
> >>>>> +                OVN_NBCTL(lsp-add ls1 $logical_switch_port)
> >>>>> +                OVN_NBCTL(lsp-set-addresses $logical_switch_port
> >>>> dynamic)
> >>>>> +            done
> >>>>> +            start_bulk=$((end_bulk+1))
> >>>>> +        done
> >>>>> +        RUN_OVN_NBCTL()
> >>>>> +
> >>>>> +        start_bulk=$i
> >>>>> +        for hv in $(seq 1 $nb_hv); do
> >>>>> +            end_bulk=$((start_bulk+bulk_size-1))
> >>>>> +            for port in $(seq $start_bulk $end_bulk); do
> >>>>> +                logical_switch_port=lsp${port}
> >>>>> +                as hv$hv ovs-vsctl \
> >>>>> +                    --no-wait -- add-port br-int vif${port} \
> >>>>> +                    -- set Interface vif${port}
> >>>> external_ids:iface-id=$logical_switch_port
> >>>>> +            done
> >>>>> +            start_bulk=$((end_bulk+1))
> >>>>> +        done
> >>>>> +        i=$((end_bulk+1))
> >>>>> +    done
> >>>>> +}
> >>>>> +check ovn-nbctl ls-add ls1
> >>>>> +check ovn-nbctl set Logical_Switch ls1 other_config:subnet=
> > 10.1.0.0/16
> >>>>> +check ovn-nbctl set Logical_Switch ls1
> >>>> other_config:exclude_ips=10.1.255.254
> >>>>> +
> >>>>> +check ovn-nbctl lr-add lr1
> >>>>> +check ovn-nbctl lsp-add ls1 lsp0 -- set Logical_Switch_Port lsp0
> >>>> type=router options:router-port=lrp0 addresses=dynamic
> >>>>> +check ovn-nbctl lrp-add lr1 lrp0 "f0:00:00:01:00:01"
10.1.255.254/16
> >>>>> +check ovn-nbctl lr-nat-add lr1 snat 10.2.0.1 10.1.0.0/16
> >>>>> +
> >>>>> +lflow_run=0
> >>>>> +check ovn-nbctl --wait=hv sync
> >>>>> +
> >>>>> +# Tunnel ports might not be added (yet) at this point on slow
system.
> >>>>> +# Wait for flows related to such ports to ensure those ports have
> > been
> >>>> added
> >>>>> +# before we measure recomputes. Otherwise, ovs_interface handler
> > might
> >>>> be run
> >>>>> +# afterwards for tunnel ports, causing recomputes.
> >>>>> +for i in $(seq 1 $n_hv); do
> >>>>> +    for j in $(seq 1 $n_hv); do
> >>>>> +        if test $i != $j; then
> >>>>> +            OVN_WAIT_REMOTE_INPUT_FLOWS(["hv$i"],["hv$j"])
> >>>>> +        fi
> >>>>> +    done
> >>>>> +done
> >>>>> +
> >>>>> +for i in $(seq 1 $n_hv); do
> >>>>> +    as hv$i
> >>>>> +    lflow_run1=$(ovn-appctl -t ovn-controller coverage/read-counter
> >>>> lflow_run)
> >>>>> +    lflow_run=`expr $lflow_run1 + $lflow_run`
> >>>>> +done
> >>>>> +
> >>>>> +add_switch_ports 1 1000 $n_hv 5
> >>>>> +
> >>>>> +wait_for_ports_up
> >>>>> +check ovn-nbctl --wait=hv sync
> >>>>> +
> >>>>> +for i in $(seq 1 $n_hv); do
> >>>>> +    pid=$(cat hv${i}/ovn-controller.pid)
> >>>>> +    u=$((u+$(cat "/proc/$pid/stat" | awk '{print $14}')))
> >>>>> +    s=$((s+$(cat "/proc/$pid/stat" | awk '{print $15}')))
> >>>>> +done
> >>>>> +
> >>>>> +n_pid=$(cat northd/ovn-northd.pid)
> >>>>> +n_u=$(cat "/proc/$n_pid/stat" | awk '{print $14}')
> >>>>> +n_s=$(cat "/proc/$n_pid/stat" | awk '{print $15}')
> >>>>> +
> >>>>> +echo "Total Northd User Time: $n_u"
> >>>>> +echo "Total Northd System Time: $n_s"
> >>>>> +echo "Total Controller User Time: $u"
> >>>>> +echo "Total Controller System Time: $s"
> >>>>> +
> >>>>> +lflow_run_end=0
> >>>>> +for i in $(seq 1 $n_hv); do
> >>>>> +    as hv$i
> >>>>> +    lflow_run1=$(ovn-appctl -t ovn-controller coverage/read-counter
> >>>> lflow_run)
> >>>>> +    lflow_run_end=`expr $lflow_run1 + $lflow_run_end`
> >>>>> +done
> >>>>> +n_recomputes=`expr $lflow_run_end - $lflow_run`
> >>>>> +echo "$n_recomputes recomputes"
> >>>>> +
> >>>>> +AT_CHECK([test $lflow_run_end == $lflow_run])
> >>>>> +
> >>>>> +for i in $(seq 2 $n_hv); do
> >>>>> +    OVN_CLEANUP_SBOX([hv$i])
> >>>>> +done
> >>>>> +OVN_CLEANUP([hv1])
> >>>>> +AT_CLEANUP
> >>>>> +])
> >>>>> diff --git a/tests/perf-northd.at b/tests/perf-northd.at
> >>>>> index 74b69e9d4..6ec196b36 100644
> >>>>> --- a/tests/perf-northd.at
> >>>>> +++ b/tests/perf-northd.at
> >>>>> @@ -76,23 +76,6 @@ m4_define([PERF_RECORD_STOP], [
> >>>>>      PERF_RECORD_STOPWATCH(ovn-northd-loop, ["Short term average"],
> >>>> [Average (northd-loop in msec)])
> >>>>>  ])
> >>>>>
> >>>>> -# OVN_NBCTL([NBCTL_COMMAND])
> >>>>> -#
> >>>>> -# Add NBCTL_COMMAND to list of commands to be run by
RUN_OVN_NBCTL().
> >>>>> -#
> >>>>> -m4_define([OVN_NBCTL], [
> >>>>> -    command="${command} -- $1"
> >>>>> -])
> >>>>> -
> >>>>> -# RUN_OVN_NBCTL()
> >>>>> -#
> >>>>> -# Execute list of commands built by the OVN_NBCTL() macro.
> >>>>> -#
> >>>>> -m4_define([RUN_OVN_NBCTL], [
> >>>>> -    check ovn-nbctl ${command}
> >>>>> -    unset command
> >>>>> -])
> >>>>> -
> >>>>>  OVS_START_SHELL_HELPERS
> >>>>>  generate_subnet () {
> >>>>>      local a=$(printf %d $(expr $1 / 256 + 10))
> >>>>> --
> >>>>> 2.31.1
> >>>>>
> >>>>
> >>>
> >>
> >
>
Xavier Simonart July 13, 2022, 7:40 a.m. UTC | #8
Hi Han, Dumitru

I think that we should, as much as possible, try to achieve both goals:
- have an accurate ovn-installed
- do not increase latency in large scale deployments

The fact that ovn-installed is sent too early for mc flows is already an
issue today, independent of this patch.
Fixing ovn-installed for mc flows by delaying the state change (in all
cases, including when there are no mc groups) might be seen as a
performance regression.

I agree that we should fix this ovn-installed issue, but it is not a
regression introduced by this patch. We should file a BZ for it.
As I understand it, the mc flows are updated when ovn-controller sees
SB_multicast_group as updated, because of its references to port binding.
Other flows related to port binding are installed earlier, i.e. when
ovn-controller writes port_binding->chassis (before it receives SB
confirmation). So, while sending the mc flows earlier than we do today
might be more complex, I think it makes sense (we would send all those
flows within the same loop).
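
The latency trade-off can be made concrete with a small sketch. This is
not OVN code: sketch_iface, sketch_pb, sketch_handle_claimed() and
sketch_iterations_until_install_flows() are hypothetical names, and the
model assumes that a pending SB transaction simply shows up as
sb_readonly == true until the confirmation arrives. It re-requests the
pb->chassis update without any "update required" flag (as discussed in
the quoted thread below), and compares leaving CLAIMED right after the
write with waiting for the SB confirmation.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins; these are NOT the OVN structures. */
enum sketch_state { SK_CLAIMED, SK_INSTALL_FLOWS };

struct sketch_pb {
    const char *chassis;    /* Chassis as seen in the local SB copy. */
};

struct sketch_iface {
    enum sketch_state state;
    int n_update_requests;  /* How many times the update was requested. */
};

/* Handles one main-loop iteration for an interface in CLAIMED state.
 * Returns true if a pb->chassis update was requested in this iteration. */
static bool
sketch_handle_claimed(struct sketch_iface *iface, const struct sketch_pb *pb,
                      const char *our_chassis, bool sb_readonly,
                      bool wait_for_confirmation)
{
    bool chassis_set = pb->chassis && !strcmp(pb->chassis, our_chassis);
    bool requested = false;

    /* No flag: request the update whenever SB is writable and the
     * chassis seen in SB is not ours, even if it was requested before. */
    if (!chassis_set && !sb_readonly) {
        iface->n_update_requests++;
        requested = true;
    }

    /* Policy choice: leave CLAIMED on write, or only on confirmation. */
    if (chassis_set || (requested && !wait_for_confirmation)) {
        iface->state = SK_INSTALL_FLOWS;
    }
    return requested;
}

/* Counts iterations until the interface reaches INSTALL_FLOWS, assuming
 * the SB confirmation becomes visible at iteration 'confirm_at'. */
static int
sketch_iterations_until_install_flows(bool wait_for_confirmation,
                                      int confirm_at)
{
    struct sketch_iface iface = { SK_CLAIMED, 0 };
    struct sketch_pb pb = { NULL };
    bool in_flight = false;     /* Modeled as sb_readonly == true. */

    for (int iter = 1; ; iter++) {
        if (in_flight && iter >= confirm_at) {
            pb.chassis = "hv1"; /* SB update notification received. */
            in_flight = false;
        }
        if (sketch_handle_claimed(&iface, &pb, "hv1", in_flight,
                                  wait_for_confirmation)) {
            in_flight = true;
        }
        if (iface.state == SK_INSTALL_FLOWS) {
            return iter;
        }
    }
}
```

Under these assumptions, with the confirmation arriving on the third
iteration, sketch_iterations_until_install_flows(false, 3) returns 1
while sketch_iterations_until_install_flows(true, 3) returns 3; that gap
is the extra delay before ovn-installed that serializing on the
confirmation would add.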

Thanks
Xavier



On Wed, Jul 13, 2022 at 8:28 AM Han Zhou <hzhou@ovn.org> wrote:

>
>
> On Tue, Jul 12, 2022 at 12:35 AM Dumitru Ceara <dceara@redhat.com> wrote:
> >
> > On 7/12/22 08:52, Han Zhou wrote:
> > > On Mon, Jul 11, 2022 at 4:55 AM Dumitru Ceara <dceara@redhat.com> wrote:
> > >>
> > >> On 7/11/22 13:31, Xavier Simonart wrote:
> > >>> Hi Han
> > >>>
> > >>> Thanks for your review.
> > >>>
> > >>> Let me try to understand your two main concerns and the proper way to fix
> > >>> it.
> > >>> 1) We only try once to write pb->chassis. If the commit fails, pb->chassis
> > >>> is not written. As commit fails, we will recompute, but as the
> > >>> update_required flag is not set anymore, we might end up with no
> > >>> pb->chassis.
> > >>> => I'll remove the flag and try to update until it's confirmed.
> > >
> > > Thank you!
> > >
> > >>> 2) The state machine, and when we move to INSTALL_FLOWS. Serializing the
> > >>> state machine, by waiting for confirmation to be received before moving to
> > >>> INSTALL_FLOWS state will delay the ovn-installed compared to today. So I am
> > >>> (still) trying to see if there is any way to prevent this in some cases.
> > >>> Would it be correct to do this serialization (wait for pb->chassis update
> > >>> confirmation) only when using conditional monitoring? When using
> > >>> monitor-all, as soon as we have written (w/o confirmation) pb->chassis, we
> > >>> would move to INSTALL_FLOWS. In that loop where we wrote pb->chassis, (all)
> > >>> the flows should be updated taking into account pb->chassis.
> > >>
> > >> I think this approach is OK.  ovn-controllers that don't use conditional
> > >> monitoring already know the complete SB contents and should be able to
> > >> install (mostly?) complete sets of openflows that correspond to a given
> > >> Port_Binding.
> > >>
> > >> It's also "CMS-friendly", at least for ovn-kubernetes, which uses
> > >> ovn-monitor-all=true and only waits for
> > >> OVS.Interface.external_ids:ovn-installed=true.  And shouldn't impact the
> > >> others which wait for SB.Port_Binding.up=true.
> > >>
> > >
> > > Xavier and Dumitru, I think we shouldn't introduce the dependency of
> > > "ovn-monitor-all" setting here.
> > >
> > > First of all, ovn-installed is a flag for CMS to understand that all the
> > > flows related to the port-binding are installed. If we set the flag before
> > > it is truly completed, it is a bug, and it is possible that the flag is set
> > > but some traffic doesn't work.
> >
> > I think it's a matter of semantics.  The way I see "ovn-installed=true"
> > is: all flows that are relevant to the port binding on the local chassis
> > have been installed.  When we added it it was for the ovn-k8s case;
> > ovn-k8s used to explicitly check if some openflow tables on the node
> > where the pod is being brought up contained flows that seemed to
> > correspond to the pod (e.g., matching on pod mac and IP addresses).
> >
> But the purpose of checking the flows (when ovn-installed wasn't
> available) was to make sure the pod is ready to send/receive traffic. If
> ovn-installed can provide more accuracy, why not?
>
> > > I did a quick test, and at least a flow in (table_id=38, priority=100)
> > > which is multicast-group related is updated AFTER the SB notification is
> > > received for the port-binding chassis update.
> > >
> >
> > This sounds like something we should fix, I think.  I don't see any
> > multicast-group changes conditioned by the port_binding being up=true in
> > northd.  I might be wrong though.
> >
>
> It is not about "up=true". It is triggered by the port-binding->chassis
> update. Since multicast-group has a reference to port-binding, a
> port-binding update triggers multicast-group change handling, which is
> required because physical flows related to the MC group need to be updated
> when port-binding->chassis is updated. You may argue that the IDL may be
> optimized so that the MC group change can be triggered and handled before
> SB is updated, but I am not sure if the benefit is worth the complexity.
> Given how OVSDB IDL transaction is designed, I'd always think a DB record
> is *formally* updated only after the update notification is received from
> the server, which seems to be safe and clear.
>
> > > Secondly, if the change hasn't made it to the SB, all the other nodes would
> > > not be able to reach the port, which means the workload (pod/VM) cannot
> > > receive traffic yet at this phase.
> > >
> >
> > Even if the change made it to the SB we have no way of knowing that all
> > other nodes processed it so we cannot know for sure that traffic can
> > flow properly end-to-end.  But, like I said above, this doesn't matter
> > if the semantics of ovn-installed=true are "all locally relevant flows
> > are installed".
> >
> It's true that even if SB is updated it doesn't ensure all the nodes
> processed it, but I view it this way: at least from the current node's
> point of view, its job is done and the other nodes are beyond its control.
> On the other hand, if SB update failed, its job is not done yet. I am not
> saying this is the only *correct* way, but just the way I am seeing it :).
>
> > > So, I think our goal is not to set ovn-installed early, but to set it
> > > accurately (sometimes it may be ok to be conservative).
> > >
> >
> > Sure, but waiting for the SB port_binding.chassis update might introduce
> > significant spikes in latency if the SB is compacting (or just busy) at
> > that moment.
> >
> > This might become an issue in large scale deployments as pods will take
> > longer to be declared "ready".
> >
> I understand your concern, but if you think about it, how the pods are
> *declared* ready doesn't change whether they are actually ready or not. It
> doesn't make the real flow setup faster or slower.
> If the CMS really wants to declare it ready earlier, it can just ignore
> the ovn-installed flag check or flow check. What's the real benefit except
> for metrics?
>
> > > In addition, ovn-monitor-all is not always true even in ovn-k8s. It is
> > > configurable in ovn-k8s. (in our environment we set it to false, to save
> > > MEM and CPU for worker nodes, while sacrificing a little for the central SB
> > > DB)
> > >
> >
> > Ack.  But for this case specifically, as SB is already busier with
> > conditional monitoring, I think serializing events in ovn-controller
> > will create even more visible delays in pod bringup times.
> >
> > Thanks,
> > Dumitru
> >
> > > Thanks,
> > > Han
> > >>>
> > >>> Thanks again for your feedback
> > >>>
> > >>> Xavier
> > >>>
> > >>> On Thu, Jul 7, 2022 at 8:49 AM Han Zhou <hzhou@ovn.org> wrote:
> > >>>
> > >>>>
> > >>>>
> > >>>> On Wed, Jun 22, 2022 at 3:23 AM Xavier Simonart <xsimonar@redhat.com>
> > >>>> wrote:
> > >>>>>
> > >>>>> When VIF ports are claimed on a chassis, SBDB Port_Binding table is updated.
> > >>>>> If the SBDB IDL is still read-only ("in transaction") when such an update
> > >>>>> is required, the update is not possible and recompute is triggered through
> > >>>>> I+P failure.
> > >>>>>
> > >>>>> This situation can happen:
> > >>>>> - after updating Port_Binding->chassis to SBDB for one port, in a following
> > >>>>>   iteration, ovn-controller handles Interface:external_ids:ovn-installed
> > >>>>>   (for the same port) while SBDB is still read-only.
> > >>>>> - after updating Port_Binding->chassis to SBDB for one port, in a following
> > >>>>>   iteration, ovn-controller updates Port_Binding->chassis for another port,
> > >>>>>   while SBDB is still read-only.
> > >>>>>
> > >>>>> This patch prevents the recompute, by having the if-status module
> > >>>>> updating the Port_Binding chassis (if needed) when possible.
> > >>>>> This does not delay Port_Binding chassis update compared to before this patch.
> > >>>>> - With the patch, Port_Binding chassis will be updated as soon as SBDB is
> > >>>>> again writable, without recompute.
> > >>>>> - Without the patch, Port_Binding chassis was updated as soon as SBDB was
> > >>>>> again writable, through a recompute.
> > >>>>>
> > >>>>> As part of this patch, ovn-installed will not be updated for additional chassis;
> > >>>>> it will only be updated when the migration is completed.
> > >>>>>
> > >>>>> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2059253
> > >>>>> Signed-off-by: Xavier Simonart <xsimonar@redhat.com>
> > >>>>>
> > >>>>> ---
> > >>>>> v2:  - handled Dumitru's comments.
> > >>>>>      - handled Han's comments, mainly ensure we moved out of CLAIMED state
> > >>>>>        only after updating pb->chassis to guarantee physical flows are installed
> > >>>>>        when ovn-installed is updated in OVS.
> > >>>>>      - slightly reorganize the code to isolate 'notify_up = false' cases in
> > >>>>>        claim_port (i.e. ports such as virtual ports), in the idea of making
> > >>>>>        future patch preventing recomputes when virtual ports are claimed.
> > >>>>>      - updated test case to cause more race conditions.
> > >>>>>      - rebased on origin/main
> > >>>>>      - note that "additional chassis" as now supported by
> > >>>>>        "Support LSP:options:requested-chassis as a list" might still cause
> > >>>>>        recomputes.
> > >>>>>      - fixed missing flows when Port_Binding chassis was updated by mgr_update
> > >>>>>        w/o any lflow recalculation.
> > >>>>> v3:  - handled Dumitru's comments on v2, mainly have runtime_data handler
> > >>>>>        handling pb_claims when sb becomes writable (instead of a lflow handler).
> > >>>>>      - fixed test as it was not checking recomputes on all hv, as well as a flaky
> > >>>>>        behavior.
> > >>>>>      - rebased on origin/main.
> > >>>>> ---
> > >>>>>  controller/binding.c        | 154 +++++++++++++++++++++----------
> > >>>>>  controller/binding.h        |  15 +++-
> > >>>>>  controller/if-status.c      | 174 ++++++++++++++++++++++++++++++++----
> > >>>>>  controller/if-status.h      |  16 +++-
> > >>>>>  controller/ovn-controller.c |  72 ++++++++++++++-
> > >>>>>  tests/ovn-macros.at         |  12 +++
> > >>>>>  tests/ovn.at                | 147 +++++++++++++++++++++++++++++-
> > >>>>>  tests/perf-northd.at        |  17 ----
> > >>>>>  8 files changed, 519 insertions(+), 88 deletions(-)
> > >>>>>
> > >>>>> diff --git a/controller/binding.c b/controller/binding.c
> > >>>>> index 2279570f9..b21577f71 100644
> > >>>>> --- a/controller/binding.c
> > >>>>> +++ b/controller/binding.c
> > >>>>> @@ -644,11 +644,17 @@ local_binding_get_lport_ofport(const struct shash *local_bindings,
> > >>>>>  }
> > >>>>>
> > >>>>>  bool
> > >>>>> -local_binding_is_up(struct shash *local_bindings, const char *pb_name)
> > >>>>> +local_binding_is_up(struct shash *local_bindings, const char *pb_name,
> > >>>>> +                    const struct sbrec_chassis *chassis_rec)
> > >>>>>  {
> > >>>>>      struct local_binding *lbinding =
> > >>>>>          local_binding_find(local_bindings, pb_name);
> > >>>>>      struct binding_lport *b_lport = local_binding_get_primary_lport(lbinding);
> > >>>>> +
> > >>>>> +    if (b_lport && b_lport->pb->chassis != chassis_rec) {
> > >>>>> +        return false;
> > >>>>> +    }
> > >>>>> +
> > >>>>>      if (lbinding && b_lport && lbinding->iface) {
> > >>>>>          if (b_lport->pb->n_up && !b_lport->pb->up[0]) {
> > >>>>>              return false;
> > >>>>> @@ -660,13 +666,23 @@ local_binding_is_up(struct shash *local_bindings, const char *pb_name)
> > >>>>>  }
> > >>>>>
> > >>>>>  bool
> > >>>>> -local_binding_is_down(struct shash *local_bindings, const char *pb_name)
> > >>>>> +local_binding_is_down(struct shash *local_bindings, const char *pb_name,
> > >>>>> +                      const struct sbrec_chassis *chassis_rec)
> > >>>>>  {
> > >>>>>      struct local_binding *lbinding =
> > >>>>>          local_binding_find(local_bindings, pb_name);
> > >>>>>
> > >>>>>      struct binding_lport *b_lport = local_binding_get_primary_lport(lbinding);
> > >>>>>
> > >>>>> +    if (b_lport) {
> > >>>>> +        if (b_lport->pb->chassis == chassis_rec) {
> > >>>>> +            return false;
> > >>>>> +        } else if (b_lport->pb->chassis) {
> > >>>>> +            VLOG_DBG("lport %s already claimed by other chassis",
> > >>>>> +                     b_lport->pb->logical_port);
> > >>>>> +        }
> > >>>>> +    }
> > >>>>> +
> > >>>>>      if (!lbinding) {
> > >>>>>          return true;
> > >>>>>      }
> > >>>>> @@ -884,37 +900,80 @@ get_lport_type_str(enum en_lport_type lport_type)
> > >>>>>      OVS_NOT_REACHED();
> > >>>>>  }
> > >>>>>
> > >>>>> -/* For newly claimed ports, if 'notify_up' is 'false':
> > >>>>> +void
> > >>>>> +set_pb_chassis_in_sbrec(const struct sbrec_port_binding *pb,
> > >>>>> +                        const struct sbrec_chassis *chassis_rec,
> > >>>>> +                        bool is_set)
> > >>>>> +{
> > >>>>> +    if (pb->chassis != chassis_rec) {
> > >>>>> +         if (is_set) {
> > >>>>> +            if (pb->chassis) {
> > >>>>> +                VLOG_INFO("Changing chassis for lport %s from %s to %s.",
> > >>>>> +                          pb->logical_port, pb->chassis->name,
> > >>>>> +                          chassis_rec->name);
> > >>>>> +            } else {
> > >>>>> +                VLOG_INFO("Claiming lport %s for this chassis.",
> > >>>>> +                          pb->logical_port);
> > >>>>> +            }
> > >>>>> +            for (int i = 0; i < pb->n_mac; i++) {
> > >>>>> +                VLOG_INFO("%s: Claiming %s", pb->logical_port, pb->mac[i]);
> > >>>>> +            }
> > >>>>> +            sbrec_port_binding_set_chassis(pb, chassis_rec);
> > >>>>> +        }
> > >>>>> +    } else if (!is_set) {
> > >>>>> +        sbrec_port_binding_set_chassis(pb, NULL);
> > >>>>> +    }
> > >>>>> +}
> > >>>>> +
> > >>>>> +void
> > >>>>> +local_binding_set_pb(struct shash *local_bindings, const char *pb_name,
> > >>>>> +                     const struct sbrec_chassis *chassis_rec,
> > >>>>> +                     struct hmap *tracked_datapaths, bool is_set)
> > >>>>> +{
> > >>>>> +    struct local_binding *lbinding =
> > >>>>> +        local_binding_find(local_bindings, pb_name);
> > >>>>> +    struct binding_lport *b_lport = local_binding_get_primary_lport(lbinding);
> > >>>>> +
> > >>>>> +    if (b_lport) {
> > >>>>> +        set_pb_chassis_in_sbrec(b_lport->pb, chassis_rec, is_set);
> > >>>>> +        if (tracked_datapaths) {
> > >>>>> +            update_lport_tracking(b_lport->pb, tracked_datapaths, true);
> > >>>>> +        }
> > >>>>> +    }
> > >>>>> +}
> > >>>>> +
> > >>>>> +/* For newly claimed ports:
> > >>>>>   * - set the 'pb.up' field to true if 'pb' has no 'parent_pb'.
> > >>>>>   * - set the 'pb.up' field to true if 'parent_pb.up' is 'true'
> (e.g.,
> > >>>> for
> > >>>>>   *   container and virtual ports).
> > >>>>> - * Otherwise request a notification to be sent when the OVS flows
> > >>>>> - * corresponding to 'pb' have been installed.
> > >>>>> + *
> > >>>>> + * Returns false if lport is not claimed due to 'sb_readonly'.
> > >>>>> + * Returns true otherwise.
> > >>>>>   *
> > >>>>>   * Note:
> > >>>>> - *   Updates (directly or through a notification) the 'pb->up'
> field
> > >>>> only if
> > >>>>> - *   it's explicitly set to 'false'.
> > >>>>> + *   Updates the 'pb->up' field only if it's explicitly set to
> > > 'false'.
> > >>>>>   *   This is to ensure compatibility with older versions of
> > > ovn-northd.
> > >>>>>   */
> > >>>>> -static void
> > >>>>> +static bool
> > >>>>>  claimed_lport_set_up(const struct sbrec_port_binding *pb,
> > >>>>>                       const struct sbrec_port_binding *parent_pb,
> > >>>>> -                     const struct sbrec_chassis *chassis_rec,
> > >>>>> -                     bool notify_up, struct if_status_mgr *if_mgr)
> > >>>>> +                     bool sb_readonly)
> > >>>>>  {
> > >>>>> -    if (!notify_up) {
> > >>>>> -        bool up = true;
> > >>>>> -        if (!parent_pb || (parent_pb->n_up && parent_pb->up[0])) {
> > >>>>> +    /* When notify_up is false in claim_port(), no state is
> created
> > >>>>> +     * by if_status_mgr. In such cases, return false (i.e. trigger
> > >>>> recompute)
> > >>>>> +     * if we can't update sb (because it is readonly).
> > >>>>> +     */
> > >>>>> +    bool up = true;
> > >>>>> +    if (!parent_pb || (parent_pb->n_up && parent_pb->up[0])) {
> > >>>>> +        if (!sb_readonly) {
> > >>>>>              if (pb->n_up) {
> > >>>>>                  sbrec_port_binding_set_up(pb, &up, 1);
> > >>>>>              }
> > >>>>> +        } else if (pb->n_up && !pb->up[0]) {
> > >>>>> +            return false;
> > >>>>>          }
> > >>>>> -        return;
> > >>>>> -    }
> > >>>>> -
> > >>>>> -    if (pb->chassis != chassis_rec || (pb->n_up && !pb->up[0])) {
> > >>>>> -        if_status_mgr_claim_iface(if_mgr, pb->logical_port);
> > >>>>>      }
> > >>>>> +    return true;
> > >>>>>  }
> > >>>>>
> > >>>>>  typedef void (*set_func)(const struct sbrec_port_binding *pb,
> > >>>>> @@ -1057,37 +1116,35 @@ claim_lport(const struct sbrec_port_binding
> > > *pb,
> > >>>>>              struct hmap *tracked_datapaths,
> > >>>>>              struct if_status_mgr *if_mgr)
> > >>>>>  {
> > >>>>> -    if (!sb_readonly) {
> > >>>>> -        claimed_lport_set_up(pb, parent_pb, chassis_rec,
> notify_up,
> > >>>> if_mgr);
> > >>>>> -    }
> > >>>>> -
> > >>>>>      enum can_bind can_bind =
> > >>>> lport_can_bind_on_this_chassis(chassis_rec, pb);
> > >>>>>      bool update_tracked = false;
> > >>>>>
> > >>>>>      if (can_bind == CAN_BIND_AS_MAIN) {
> > >>>>>          if (pb->chassis != chassis_rec) {
> > >>>>> -            if (sb_readonly) {
> > >>>>> -                return false;
> > >>>>> -            }
> > >>>>> -
> > >>>>> -            if (pb->chassis) {
> > >>>>> -                VLOG_INFO("Changing chassis for lport %s from %s
> to
> > >>>> %s.",
> > >>>>> -                        pb->logical_port, pb->chassis->name,
> > >>>>> -                        chassis_rec->name);
> > >>>>> -            } else {
> > >>>>> -                VLOG_INFO("Claiming lport %s for this chassis.",
> > >>>>> -                          pb->logical_port);
> > >>>>> -            }
> > >>>>> -            for (size_t i = 0; i < pb->n_mac; i++) {
> > >>>>> -                VLOG_INFO("%s: Claiming %s", pb->logical_port,
> > >>>> pb->mac[i]);
> > >>>>> -            }
> > >>>>> -
> > >>>>> -            sbrec_port_binding_set_chassis(pb, chassis_rec);
> > >>>>>              if (is_additional_chassis(pb, chassis_rec)) {
> > >>>>> +                if (sb_readonly) {
> > >>>>> +                    return false;
> > >>>>> +                }
> > >>>>>                  remove_additional_chassis(pb, chassis_rec);
> > >>>>>              }
> > >>>>>              update_tracked = true;
> > >>>>>          }
> > >>>>> +        if (!notify_up) {
> > >>>>> +            if (!claimed_lport_set_up(pb, parent_pb,
> sb_readonly)) {
> > >>>>> +                return false;
> > >>>>> +            }
> > >>>>> +            if (pb->chassis != chassis_rec) {
> > >>>>> +                if (sb_readonly) {
> > >>>>> +                    return false;
> > >>>>> +                }
> > >>>>> +                set_pb_chassis_in_sbrec(pb, chassis_rec, true);
> > >>>>> +            }
> > >>>>> +        } else {
> > >>>>> +            if ((pb->chassis != chassis_rec) || (pb->n_up &&
> > >>>> !pb->up[0])) {
> > >>>>> +                if_status_mgr_claim_iface(if_mgr, pb, chassis_rec,
> > >>>>> +                                          sb_readonly);
> > >>>>> +            }
> > >>>>> +        }
> > >>>>>      } else if (can_bind == CAN_BIND_AS_ADDITIONAL) {
> > >>>>>          if (!is_additional_chassis(pb, chassis_rec)) {
> > >>>>>              if (sb_readonly) {
> > >>>>> @@ -1132,7 +1189,8 @@ claim_lport(const struct sbrec_port_binding
> *pb,
> > >>>>>   */
> > >>>>>  static bool
> > >>>>>  release_lport_main_chassis(const struct sbrec_port_binding *pb,
> > >>>>> -                           bool sb_readonly)
> > >>>>> +                           bool sb_readonly,
> > >>>>> +                           struct if_status_mgr *if_mgr)
> > >>>>>  {
> > >>>>>      if (pb->encap) {
> > >>>>>          if (sb_readonly) {
> > >>>>> @@ -1141,11 +1199,13 @@ release_lport_main_chassis(const struct
> > >>>> sbrec_port_binding *pb,
> > >>>>>          sbrec_port_binding_set_encap(pb, NULL);
> > >>>>>      }
> > >>>>>
> > >>>>> +    /* If sb readonly, pb->chassis unset through if-status if
> > > present.
> > >>>> */
> > >>>>>      if (pb->chassis) {
> > >>>>> -        if (sb_readonly) {
> > >>>>> +        if (!sb_readonly) {
> > >>>>> +            sbrec_port_binding_set_chassis(pb, NULL);
> > >>>>> +        } else if (!if_status_mgr_iface_is_present(if_mgr,
> > >>>> pb->logical_port)) {
> > >>>>>              return false;
> > >>>>>          }
> > >>>>> -        sbrec_port_binding_set_chassis(pb, NULL);
> > >>>>>      }
> > >>>>>
> > >>>>>      if (pb->virtual_parent) {
> > >>>>> @@ -1155,7 +1215,8 @@ release_lport_main_chassis(const struct
> > >>>> sbrec_port_binding *pb,
> > >>>>>          sbrec_port_binding_set_virtual_parent(pb, NULL);
> > >>>>>      }
> > >>>>>
> > >>>>> -    VLOG_INFO("Releasing lport %s from this chassis.",
> > >>>> pb->logical_port);
> > >>>>> +    VLOG_INFO("Releasing lport %s from this chassis
> > > (sb_readonly=%d)",
> > >>>>> +              pb->logical_port, sb_readonly);
> > >>>>>      return true;
> > >>>>>  }
> > >>>>>
> > >>>>> @@ -1189,7 +1250,7 @@ release_lport(const struct sbrec_port_binding
> > > *pb,
> > >>>>>                struct hmap *tracked_datapaths, struct if_status_mgr
> > >>>> *if_mgr)
> > >>>>>  {
> > >>>>>      if (pb->chassis == chassis_rec) {
> > >>>>> -        if (!release_lport_main_chassis(pb, sb_readonly)) {
> > >>>>> +        if (!release_lport_main_chassis(pb, sb_readonly, if_mgr))
> {
> > >>>>>              return false;
> > >>>>>          }
> > >>>>>      } else if (is_additional_chassis(pb, chassis_rec)) {
> > >>>>> @@ -1271,7 +1332,7 @@ consider_vif_lport_(const struct
> > >>>> sbrec_port_binding *pb,
> > >>>>>                               b_lport->lbinding->iface,
> > >>>>>                               !b_ctx_in->ovnsb_idl_txn,
> > >>>>>                               !parent_pb,
> > > b_ctx_out->tracked_dp_bindings,
> > >>>>> -                             b_ctx_out->if_mgr)){
> > >>>>> +                             b_ctx_out->if_mgr)) {
> > >>>>>                  return false;
> > >>>>>              }
> > >>>>>
> > >>>>> @@ -1527,7 +1588,8 @@ consider_localport(const struct
> > > sbrec_port_binding
> > >>>> *pb,
> > >>>>>      enum can_bind can_bind = lport_can_bind_on_this_chassis(
> > >>>>>          b_ctx_in->chassis_rec, pb);
> > >>>>>      if (can_bind == CAN_BIND_AS_MAIN) {
> > >>>>> -        if (!release_lport_main_chassis(pb,
> > > !b_ctx_in->ovnsb_idl_txn)) {
> > >>>>> +        if (!release_lport_main_chassis(pb,
> !b_ctx_in->ovnsb_idl_txn,
> > >>>>> +            b_ctx_out->if_mgr)) {
> > >>>>>              return false;
> > >>>>>          }
> > >>>>>      } else if (can_bind == CAN_BIND_AS_ADDITIONAL) {
> > >>>>> diff --git a/controller/binding.h b/controller/binding.h
> > >>>>> index 1fed06674..d20659b0b 100644
> > >>>>> --- a/controller/binding.h
> > >>>>> +++ b/controller/binding.h
> > >>>>> @@ -151,8 +151,10 @@ const struct sbrec_port_binding
> > >>>> *local_binding_get_primary_pb(
> > >>>>>  ofp_port_t local_binding_get_lport_ofport(const struct shash
> > >>>> *local_bindings,
> > >>>>>                                            const char *pb_name);
> > >>>>>
> > >>>>> -bool local_binding_is_up(struct shash *local_bindings, const char
> > >>>> *pb_name);
> > >>>>> -bool local_binding_is_down(struct shash *local_bindings, const
> char
> > >>>> *pb_name);
> > >>>>> +bool local_binding_is_up(struct shash *local_bindings, const char
> > >>>> *pb_name,
> > >>>>> +                         const struct sbrec_chassis *);
> > >>>>> +bool local_binding_is_down(struct shash *local_bindings, const
> char
> > >>>> *pb_name,
> > >>>>> +                           const struct sbrec_chassis *);
> > >>>>>  void local_binding_set_up(struct shash *local_bindings, const char
> > >>>> *pb_name,
> > >>>>>                            const struct sbrec_chassis *chassis_rec,
> > >>>>>                            const char *ts_now_str, bool
> sb_readonly,
> > >>>>> @@ -160,7 +162,10 @@ void local_binding_set_up(struct shash
> > >>>> *local_bindings, const char *pb_name,
> > >>>>>  void local_binding_set_down(struct shash *local_bindings, const
> char
> > >>>> *pb_name,
> > >>>>>                              const struct sbrec_chassis
> *chassis_rec,
> > >>>>>                              bool sb_readonly, bool ovs_readonly);
> > >>>>> -
> > >>>>> +void local_binding_set_pb(struct shash *local_bindings, const char
> > >>>> *pb_name,
> > >>>>> +                          const struct sbrec_chassis *chassis_rec,
> > >>>>> +                          struct hmap *tracked_datapaths,
> > >>>>> +                          bool is_set);
> > >>>>>  void binding_register_ovs_idl(struct ovsdb_idl *);
> > >>>>>  void binding_run(struct binding_ctx_in *, struct binding_ctx_out
> *);
> > >>>>>  bool binding_cleanup(struct ovsdb_idl_txn *ovnsb_idl_txn,
> > >>>>> @@ -178,6 +183,10 @@ void binding_dump_local_bindings(struct
> > >>>> local_binding_data *, struct ds *);
> > >>>>>  bool is_additional_chassis(const struct sbrec_port_binding *pb,
> > >>>>>                             const struct sbrec_chassis
> *chassis_rec);
> > >>>>>
> > >>>>> +void set_pb_chassis_in_sbrec(const struct sbrec_port_binding *pb,
> > >>>>> +                             const struct sbrec_chassis
> *chassis_rec,
> > >>>>> +                             bool is_set);
> > >>>>> +
> > >>>>>  /* Corresponds to each Port_Binding.type. */
> > >>>>>  enum en_lport_type {
> > >>>>>      LP_UNKNOWN,
> > >>>>> diff --git a/controller/if-status.c b/controller/if-status.c
> > >>>>> index ad61844d8..7693c289b 100644
> > >>>>> --- a/controller/if-status.c
> > >>>>> +++ b/controller/if-status.c
> > >>>>> @@ -24,6 +24,7 @@
> > >>>>>  #include "lib/util.h"
> > >>>>>  #include "timeval.h"
> > >>>>>  #include "openvswitch/vlog.h"
> > >>>>> +#include "lib/ovn-sb-idl.h"
> > >>>>>
> > >>>>>  VLOG_DEFINE_THIS_MODULE(if_status);
> > >>>>>
> > >>>>> @@ -53,9 +54,11 @@ VLOG_DEFINE_THIS_MODULE(if_status);
> > >>>>>   */
> > >>>>>
> > >>>>>  enum if_state {
> > >>>>> -    OIF_CLAIMED,       /* Newly claimed interface. */
> > >>>>> -    OIF_INSTALL_FLOWS, /* Already claimed interface for which flows are still
> > >>>>> -                        * being installed.
> > >>>>> +    OIF_CLAIMED,       /* Newly claimed interface. pb->chassis not yet updated.
> > >>>>> +                        */
> > >>>>> +    OIF_INSTALL_FLOWS, /* Claimed interface with pb->chassis successfully
> > >>>>> +                        * updated in SB and for which flows are still being
> > >>>>> +                        * installed.
> > >>>>>                          */
> > >>>>>      OIF_MARK_UP,       /* Interface with flows successfully installed in OVS
> > >>>>>                          * but not yet marked "up" in the binding module (in
> > >>>>> @@ -78,6 +81,55 @@ static const char *if_state_names[] = {
> > >>>>>      [OIF_INSTALLED]     = "INSTALLED",
> > >>>>>  };
> > >>>>>
> > >>>>> +/*
> > >>>>> + *       +----------------------+
> > >>>>> + * +---> |                      |
> > >>>>> + * | +-> |         NULL         | <--------------------------------------+++-+
> > >>>>> + * | |   +----------------------+                                            |
> > >>>>> + * | |     ^ release_iface   | claim_iface                                   |
> > >>>>> + * | |     |                 V - sbrec_update_chassis(if sb is rw)           |
> > >>>>> + * | |   +----------------------+                                            |
> > >>>>> + * | |   |                      | <----------------------------------------+ |
> > >>>>> + * | |   |       CLAIMED        | <--------------------------------------+ | |
> > >>>>> + * | |   +----------------------+                                        | | |
> > >>>>> + * | |                  | mgr_update(when sb is rw)                      | | |
> > >>>>> + * | | release_iface    |  - sbrec_update_chassis                        | | |
> > >>>>> + * | |                  |  - request seqno                               | | |
> > >>>>> + * | |                  V                                                | | |
> > >>>>> + * | |   +----------------------+                                        | | |
> > >>>>> + * | +-- |                      |  mgr_run(seqno not rcvd)               | | |
> > >>>>> + * |     |    INSTALL_FLOWS     |   - set port down in sb                | | |
> > >>>>> + * |     |                      |  mgr_update()                          | | |
> > >>>>> + * |     +----------------------+   - sbrec_update_chassis if needed     | | |
> > >>>>> + * |                    |                                                | | |
> > >>>>> + * |                    |  mgr_run(seqno rcvd)                           | | |
> > >>>>> + * |                    |  - set port up in sb                           | | |
> > >>>>> + * | release_iface      |  - set ovn-installed in ovs                    | | |
> > >>>>> + * |                    V                                                | | |
> > >>>>> + * |   +----------------------+                                          | | |
> > >>>>> + * |   |                      |  mgr_run()                               | | |
> > >>>>> + * +-- |       MARK_UP        |  - set port up in sb                     | | |
> > >>>>> + *     |                      |  - set ovn-installed in ovs              | | |
> > >>>>> + *     |                      |  mgr_update()                            | | |
> > >>>>> + *     +----------------------+  - sbrec_update_chassis if needed        | | |
> > >>>>> + *              |                                                        | | |
> > >>>>> + *              | mgr_update(rcvd port up / ovn_installed & chassis set) | | |
> > >>>>> + *              V                                                        | | |
> > >>>>> + *     +----------------------+                                          | | |
> > >>>>> + *     |      INSTALLED       | ------------> claim_iface ---------------+ | |
> > >>>>> + *     +----------------------+                                            | |
> > >>>>> + *              |                                                          | |
> > >>>>> + *              | release_iface                                            | |
> > >>>>> + *              V                                                          | |
> > >>>>> + *     +----------------------+                                            | |
> > >>>>> + *     |                      | ------------> claim_iface -----------------+ |
> > >>>>> + *     |      MARK_DOWN       | ------> mgr_update(rcvd port down) ----------+
> > >>>>> + *     |                      | mgr_run()
> > >>>>> + *     |                      | - set port down in sb
> > >>>>> + *     |                      | mgr_update()
> > >>>>> + *     +----------------------+ - sbrec_update_chassis(NULL)
> > >>>>> + */
> > >>>>> +
> > >>>>>  struct ovs_iface {
> > >>>>>      char *id;               /* Extracted from OVS
> > >>>> external_ids.iface_id. */
> > >>>>>      enum if_state state;    /* State of the interface in the state
> > >>>> machine. */
> > >>>>> @@ -85,6 +137,7 @@ struct ovs_iface {
> > >>>>>                               * be fully programmed in OVS.  Only
> used
> > >>>> in state
> > >>>>>                               * OIF_INSTALL_FLOWS.
> > >>>>>                               */
> > >>>>> +    bool chassis_update_required;  /* If true, pb->chassis must be
> > >>>> updated. */
> > >>>>>  };
> > >>>>>
> > >>>>>  static uint64_t ifaces_usage;
> > >>>>> @@ -158,14 +211,23 @@ if_status_mgr_destroy(struct if_status_mgr *mgr)
> > >>>>>  }
> > >>>>>
> > >>>>>  void
> > >>>>> -if_status_mgr_claim_iface(struct if_status_mgr *mgr, const char *iface_id)
> > >>>>> +if_status_mgr_claim_iface(struct if_status_mgr *mgr,
> > >>>>> +                          const struct sbrec_port_binding *pb,
> > >>>>> +                          const struct sbrec_chassis *chassis_rec,
> > >>>>> +                          bool sb_readonly)
> > >>>>>  {
> > >>>>> +    const char *iface_id = pb->logical_port;
> > >>>>>      struct ovs_iface *iface = shash_find_data(&mgr->ifaces, iface_id);
> > >>>>>
> > >>>>>      if (!iface) {
> > >>>>>          iface = ovs_iface_create(mgr, iface_id, OIF_CLAIMED);
> > >>>>>      }
> > >>>>> -
> > >>>>> +    if (!sb_readonly) {
> > >>>>> +        set_pb_chassis_in_sbrec(pb, chassis_rec, true);
> > >>>>> +        iface->chassis_update_required = false;
> > >>>>> +    } else {
> > >>>>> +        iface->chassis_update_required = true;
> > >>>>> +    }
> > >>>>>      switch (iface->state) {
> > >>>>>      case OIF_CLAIMED:
> > >>>>>      case OIF_INSTALL_FLOWS:
> > >>>>> @@ -182,6 +244,12 @@ if_status_mgr_claim_iface(struct if_status_mgr
> > >>>> *mgr, const char *iface_id)
> > >>>>>      }
> > >>>>>  }
> > >>>>>
> > >>>>> +bool
> > >>>>> +if_status_mgr_iface_is_present(struct if_status_mgr *mgr, const
> char
> > >>>> *iface_id)
> > >>>>> +{
> > >>>>> +    return !!shash_find_data(&mgr->ifaces, iface_id);
> > >>>>> +}
> > >>>>> +
> > >>>>>  void
> > >>>>>  if_status_mgr_release_iface(struct if_status_mgr *mgr, const char
> > >>>> *iface_id)
> > >>>>>  {
> > >>>>> @@ -246,9 +314,39 @@ if_status_mgr_delete_iface(struct
> if_status_mgr
> > >>>> *mgr, const char *iface_id)
> > >>>>>      }
> > >>>>>  }
> > >>>>>
> > >>>>> +bool
> > >>>>> +if_status_handle_claims(struct if_status_mgr *mgr,
> > >>>>> +                        struct local_binding_data *binding_data,
> > >>>>> +                        const struct sbrec_chassis *chassis_rec,
> > >>>>> +                        struct hmap *tracked_datapath,
> > >>>>> +                        bool sb_readonly)
> > >>>>> +{
> > >>>>> +    if (!binding_data || sb_readonly) {
> > >>>>> +        return false;
> > >>>>> +    }
> > >>>>> +
> > >>>>> +    struct shash *bindings = &binding_data->bindings;
> > >>>>> +    struct hmapx_node *node;
> > >>>>> +
> > >>>>> +    bool rc = false;
> > >>>>> +    HMAPX_FOR_EACH (node, &mgr->ifaces_per_state[OIF_CLAIMED]) {
> > >>>>> +        struct ovs_iface *iface = node->data;
> > >>>>> +        if (iface->chassis_update_required) {
> > >>>>
> > >>>> Thanks Xavier for the revision. The state machine looks more clear
> now,
> > >>>> but I have a major concern for the use of chassis_update_required.
> This
> > >>>> bool flag is used to decide if an update to SB is needed, and once
> a SB
> > >>>> update is requested, it is set to false, and assumes the SB update
> will
> > >>>> succeed immediately. However, the assumption may be wrong. There
> can be
> > >>>> different kinds of reasons that the subsequent SB update fails, or
> > > delayed,
> > >>>> so this flag is not reliable. Instead, in CLAIMED state, the
> > > responsibility
> > >>>> to make sure the SB update is completed. If the transaction is
> > > in-progress,
> > >>>> the sb_readonly is true. So if sb_readonly is false, it means
> nothing
> > > is
> > >>>> in-progress, so we can always check if (!sb_readonly && <SB chassis
> is
> > > not
> > >>>> updated for the port-binding>) we should just send the update,
> > > regardless
> > >>>> of whether we have requested it before. Please also see another
> comment
> > >>>> below for the state transition.
> > >>>>
> > >>>>> +            VLOG_INFO("if_status_handle_claims for %s",
> iface->id);
> > >>>>> +            local_binding_set_pb(bindings, iface->id, chassis_rec,
> > >>>>> +                                 tracked_datapath, true);
> > >>>>> +            rc = true;
> > >>>>> +        }
> > >>>>> +        iface->chassis_update_required = false;
> > >>>>> +    }
> > >>>>> +    return rc;
> > >>>>> +}
> > >>>>> +
> > >>>>>  void
> > >>>>>  if_status_mgr_update(struct if_status_mgr *mgr,
> > >>>>> -                     struct local_binding_data *binding_data)
> > >>>>> +                     struct local_binding_data *binding_data,
> > >>>>> +                     const struct sbrec_chassis *chassis_rec,
> > >>>>> +                     bool sb_readonly)
> > >>>>>  {
> > >>>>>      if (!binding_data) {
> > >>>>>          return;
> > >>>>> @@ -257,13 +355,25 @@ if_status_mgr_update(struct if_status_mgr
> *mgr,
> > >>>>>      struct shash *bindings = &binding_data->bindings;
> > >>>>>      struct hmapx_node *node;
> > >>>>>
> > >>>>> +    /* Interfaces in OIF_MARK_UP state have already set their
> > >>>> pb->chassis.
> > >>>>> +     * However, it might have been reset by another hv.
> > >>>>> +     */
> > >>>>>      /* Move all interfaces that have been confirmed "up" by the
> > > binding
> > >>>> module,
> > >>>>>       * from OIF_MARK_UP to OIF_INSTALLED.
> > >>>>>       */
> > >>>>>      HMAPX_FOR_EACH_SAFE (node,
> &mgr->ifaces_per_state[OIF_MARK_UP]) {
> > >>>>>          struct ovs_iface *iface = node->data;
> > >>>>>
> > >>>>> -        if (local_binding_is_up(bindings, iface->id)) {
> > >>>>> +        if (iface->chassis_update_required) {
> > >>>>> +            if (!sb_readonly) {
> > >>>>> +                iface->chassis_update_required = false;
> > >>>>> +                local_binding_set_pb(bindings, iface->id,
> > > chassis_rec,
> > >>>>> +                                     NULL, true);
> > >>>>> +            } else {
> > >>>>> +                continue;
> > >>>>> +            }
> > >>>>> +        }
> > >>>>> +        if (local_binding_is_up(bindings, iface->id,
> chassis_rec)) {
> > >>>>>              ovs_iface_set_state(mgr, iface, OIF_INSTALLED);
> > >>>>>          }
> > >>>>>      }
> > >>>>> @@ -274,23 +384,53 @@ if_status_mgr_update(struct if_status_mgr
> *mgr,
> > >>>>>      HMAPX_FOR_EACH_SAFE (node,
> > > &mgr->ifaces_per_state[OIF_MARK_DOWN]) {
> > >>>>>          struct ovs_iface *iface = node->data;
> > >>>>>
> > >>>>> -        if (local_binding_is_down(bindings, iface->id)) {
> > >>>>> +        if (!sb_readonly) {
> > >>>>> +            local_binding_set_pb(bindings, iface->id, chassis_rec,
> > >>>>> +                                 NULL, false);
> > >>>>> +        }
> > >>>>> +        if (local_binding_is_down(bindings, iface->id,
> chassis_rec))
> > > {
> > >>>>>              ovs_iface_destroy(mgr, iface);
> > >>>>>          }
> > >>>>>      }
> > >>>>>
> > >>>>> -    /* Register for a notification about flows being installed in
> OVS
> > >>>> for all
> > >>>>> -     * newly claimed interfaces.
> > >>>>> +    if (!sb_readonly) {
> > >>>>> +        HMAPX_FOR_EACH_SAFE (node,
> > >>>> &mgr->ifaces_per_state[OIF_INSTALL_FLOWS]) {
> > >>>>> +            struct ovs_iface *iface = node->data;
> > >>>>> +
> > >>>>> +            if (iface->chassis_update_required) {
> > >>>>> +                iface->chassis_update_required = false;
> > >>>>> +                local_binding_set_pb(bindings, iface->id,
> > > chassis_rec,
> > >>>>> +                                     NULL, true);
> > >>>>> +            }
> > >>>>> +        }
> > >>>>> +    }
> > >>>>> +
> > >>>>> +    /* Update Port_Binding->chassis for newly claimed interfaces
> > >>>>> +     * Register for a notification about flows being installed in
> OVS
> > >>>> for all
> > >>>>> +     * newly claimed interfaces for which we could update
> > > pb->chassis.
> > >>>>>       *
> > >>>>>       * Move them from OIF_CLAIMED to OIF_INSTALL_FLOWS.
> > >>>>>       */
> > >>>>> -    bool new_ifaces = false;
> > >>>>> -    HMAPX_FOR_EACH_SAFE (node,
> &mgr->ifaces_per_state[OIF_CLAIMED]) {
> > >>>>> -        struct ovs_iface *iface = node->data;
> > >>>>>
> > >>>>> -        ovs_iface_set_state(mgr, iface, OIF_INSTALL_FLOWS);
> > >>>>> -        iface->install_seqno = mgr->iface_seqno + 1;
> > >>>>> -        new_ifaces = true;
> > >>>>> +    bool new_ifaces = false;
> > >>>>> +    if (!sb_readonly) {
> > >>>>> +        HMAPX_FOR_EACH_SAFE (node,
> > > &mgr->ifaces_per_state[OIF_CLAIMED])
> > >>>> {
> > >>>>> +            struct ovs_iface *iface = node->data;
> > >>>>> +            /* No need to check for chassis_update_required as
> > > already
> > >>>> done
> > >>>>> +             * in if_status_handle_claims or
> > > if_status_mgr_claim_iface
> > >>>>> +             */
> > >>>>> +            ovs_iface_set_state(mgr, iface, OIF_INSTALL_FLOWS);
> > >>>>
> > >>>> We need to make sure the chassis in SB port-binding is up-to-date
> (i.e.
> > >>>> the update notification from SB DB has been received) before moving
> to
> > >>>> INSTALL_FLOWS. Otherwise, it is still possible that the state is
> moved
> > > too
> > >>>> early and end up with incomplete flow installation for the lport
> when
> > > the
> > >>>> state is finally moved to INSTALLED.
> > >>>>
> > >>>> Thanks,
> > >>>> Han
> > >>>>
> > >>>>> +            iface->install_seqno = mgr->iface_seqno + 1;
> > >>>>> +            new_ifaces = true;
> > >>>>> +        }
> > >>>>> +    } else {
> > >>>>> +        HMAPX_FOR_EACH_SAFE (node,
> > > &mgr->ifaces_per_state[OIF_CLAIMED])
> > >>>> {
> > >>>>> +            struct ovs_iface *iface = node->data;
> > >>>>> +            static struct vlog_rate_limit rl =
> > > VLOG_RATE_LIMIT_INIT(5,
> > >>>> 1);
> > >>>>> +            VLOG_INFO_RL(&rl,
> > >>>>> +                         "Not updating pb chassis for %s now as "
> > >>>>> +                         "sb is readonly", iface->id);
> > >>>>> +        }
> > >>>>>      }
> > >>>>>
> > >>>>>      /* Request a seqno update when the flows for new interfaces
> have
> > >>>> been
> > >>>>> @@ -403,7 +543,7 @@ if_status_mgr_update_bindings(struct
> if_status_mgr
> > >>>> *mgr,
> > >>>>>      struct hmapx_node *node;
> > >>>>>
> > >>>>>      /* Notify the binding module to set "down" all bindings that
> are
> > >>>> still
> > >>>>> -     * in the process of being installed in OVS, i.e., are not yet
> > >>>> instsalled.
> > >>>>> +     * in the process of being installed in OVS, i.e., are not yet
> > >>>> installed.
> > >>>>>       */
> > >>>>>      HMAPX_FOR_EACH (node,
> &mgr->ifaces_per_state[OIF_INSTALL_FLOWS])
> > > {
> > >>>>>          struct ovs_iface *iface = node->data;
> > >>>>> diff --git a/controller/if-status.h b/controller/if-status.h
> > >>>>> index bb8a3950d..f9b05d30d 100644
> > >>>>> --- a/controller/if-status.h
> > >>>>> +++ b/controller/if-status.h
> > >>>>> @@ -27,15 +27,27 @@ struct if_status_mgr
> *if_status_mgr_create(void);
> > >>>>>  void if_status_mgr_clear(struct if_status_mgr *);
> > >>>>>  void if_status_mgr_destroy(struct if_status_mgr *);
> > >>>>>
> > >>>>> -void if_status_mgr_claim_iface(struct if_status_mgr *, const char
> > >>>> *iface_id);
> > >>>>> +void if_status_mgr_claim_iface(struct if_status_mgr *,
> > >>>>> +                               const struct sbrec_port_binding
> *pb,
> > >>>>> +                               const struct sbrec_chassis
> > > *chassis_rec,
> > >>>>> +                               bool sb_readonly);
> > >>>>>  void if_status_mgr_release_iface(struct if_status_mgr *, const
> char
> > >>>> *iface_id);
> > >>>>>  void if_status_mgr_delete_iface(struct if_status_mgr *, const char
> > >>>> *iface_id);
> > >>>>>
> > >>>>> -void if_status_mgr_update(struct if_status_mgr *, struct
> > >>>> local_binding_data *);
> > >>>>> +void if_status_mgr_update(struct if_status_mgr *, struct
> > >>>> local_binding_data *,
> > >>>>> +                          const struct sbrec_chassis *chassis,
> > >>>>> +                          bool sb_readonly);
> > >>>>>  void if_status_mgr_run(struct if_status_mgr *mgr, struct
> > >>>> local_binding_data *,
> > >>>>>                         const struct sbrec_chassis *,
> > >>>>>                         bool sb_readonly, bool ovs_readonly);
> > >>>>>  void if_status_mgr_get_memory_usage(struct if_status_mgr *mgr,
> > >>>>>                                      struct simap *usage);
> > >>>>> +bool if_status_mgr_iface_is_present(struct if_status_mgr *mgr,
> > >>>>> +                                    const char *iface_id);
> > >>>>> +bool if_status_handle_claims(struct if_status_mgr *mgr,
> > >>>>> +                             struct local_binding_data
> *binding_data,
> > >>>>> +                             const struct sbrec_chassis
> *chassis_rec,
> > >>>>> +                             struct hmap *tracked_datapath,
> > >>>>> +                             bool sb_readonly);
> > >>>>>
> > >>>>>  # endif /* controller/if-status.h */
> > >>>>> diff --git a/controller/ovn-controller.c
> b/controller/ovn-controller.c
> > >>>>> index 69615308e..3947baf03 100644
> > >>>>> --- a/controller/ovn-controller.c
> > >>>>> +++ b/controller/ovn-controller.c
> > >>>>> @@ -1464,6 +1464,73 @@ en_runtime_data_run(struct engine_node
> *node,
> > >>>> void *data)
> > >>>>>      engine_set_node_state(node, EN_UPDATED);
> > >>>>>  }
> > >>>>>
> > >>>>> +struct ed_type_sb_ro {
> > >>>>> +    bool sb_readonly;
> > >>>>> +};
> > >>>>> +
> > >>>>> +static void *
> > >>>>> +en_sb_ro_init(struct engine_node *node OVS_UNUSED,
> > >>>>> +              struct engine_arg *arg OVS_UNUSED)
> > >>>>> +{
> > >>>>> +    struct ed_type_sb_ro *data = xzalloc(sizeof *data);
> > >>>>> +    return data;
> > >>>>> +}
> > >>>>> +
> > >>>>> +static void
> > >>>>> +en_sb_ro_run(struct engine_node *node, void *data)
> > >>>>> +{
> > >>>>> +    struct ed_type_sb_ro *sb_ro_data = data;
> > >>>>> +    bool sb_readonly = !engine_get_context()->ovnsb_idl_txn;
> > >>>>> +    if (sb_ro_data->sb_readonly != sb_readonly) {
> > >>>>> +        sb_ro_data->sb_readonly = sb_readonly;
> > >>>>> +        if (!sb_ro_data->sb_readonly) {
> > >>>>> +            engine_set_node_state(node, EN_UPDATED);
> > >>>>> +        }
> > >>>>> +    }
> > >>>>> +}
> > >>>>> +
> > >>>>> +static void
> > >>>>> +en_sb_ro_cleanup(void *data OVS_UNUSED)
> > >>>>> +{
> > >>>>> +}
> > >>>>> +
> > >>>>> +static bool
> > >>>>> +runtime_data_sb_ro_handler(struct engine_node *node, void *data)
> > >>>>> +{
> > >>>>> +    const struct sbrec_chassis *chassis = NULL;
> > >>>>> +
> > >>>>> +    struct ovsrec_open_vswitch_table *ovs_table =
> > >>>>> +        (struct ovsrec_open_vswitch_table *)EN_OVSDB_GET(
> > >>>>> +            engine_get_input("OVS_open_vswitch", node));
> > >>>>> +
> > >>>>> +    const char *chassis_id = get_ovs_chassis_id(ovs_table);
> > >>>>> +
> > >>>>> +    struct ovsdb_idl_index *sbrec_chassis_by_name =
> > >>>>> +        engine_ovsdb_node_get_index(
> > >>>>> +                engine_get_input("SB_chassis", node),
> > >>>>> +                "name");
> > >>>>> +
> > >>>>> +    if (chassis_id) {
> > >>>>> +        chassis = chassis_lookup_by_name(sbrec_chassis_by_name,
> > >>>> chassis_id);
> > >>>>> +    }
> > >>>>> +    if (chassis) {
> > >>>>> +        struct ed_type_runtime_data *rt_data = data;
> > >>>>> +        bool sb_readonly = !engine_get_context()->ovnsb_idl_txn;
> > >>>>> +        struct controller_engine_ctx *ctrl_ctx =
> > >>>>> +            engine_get_context()->client_ctx;
> > >>>>> +
> > >>>>> +        if (if_status_handle_claims(ctrl_ctx->if_mgr,
> > >>>>> +                                    &rt_data->lbinding_data,
> > >>>>> +                                    chassis,
> > >>>>> +                                    &rt_data->tracked_dp_bindings,
> > >>>>> +                                    sb_readonly)) {
> > >>>>> +            engine_set_node_state(node, EN_UPDATED);
> > >>>>> +            rt_data->tracked = true;
> > >>>>> +        }
> > >>>>> +    }
> > >>>>> +    return true;
> > >>>>> +}
> > >>>>> +
> > >>>>>  static bool
> > >>>>>  runtime_data_ovs_interface_shadow_handler(struct engine_node
> *node,
> > >>>> void *data)
> > >>>>>  {
> > >>>>> @@ -3528,6 +3595,7 @@ main(int argc, char *argv[])
> > >>>>>      stopwatch_create(VIF_PLUG_RUN_STOPWATCH_NAME, SW_MS);
> > >>>>>
> > >>>>>      /* Define inc-proc-engine nodes. */
> > >>>>> +    ENGINE_NODE(sb_ro, "sb_ro");
> > >>>>>      ENGINE_NODE_WITH_CLEAR_TRACK_DATA_IS_VALID(ct_zones,
> "ct_zones");
> > >>>>>      ENGINE_NODE_WITH_CLEAR_TRACK_DATA(ovs_interface_shadow,
> > >>>>>                                        "ovs_interface_shadow");
> > >>>>> @@ -3664,6 +3732,7 @@ main(int argc, char *argv[])
> > >>>>>      engine_add_input(&en_ovs_interface_shadow, &en_ovs_interface,
> > >>>>>                       ovs_interface_shadow_ovs_interface_handler);
> > >>>>>
> > >>>>> +    engine_add_input(&en_runtime_data, &en_sb_ro,
> > >>>> runtime_data_sb_ro_handler);
> > >>>>>      engine_add_input(&en_runtime_data, &en_ofctrl_is_connected,
> > > NULL);
> > >>>>>
> > >>>>>      engine_add_input(&en_runtime_data, &en_ovs_open_vswitch,
> NULL);
> > >>>>> @@ -4098,7 +4167,8 @@ main(int argc, char *argv[])
> > >>>>>                          runtime_data ?
> &runtime_data->lbinding_data :
> > >>>> NULL;
> > >>>>>
> > >  stopwatch_start(IF_STATUS_MGR_UPDATE_STOPWATCH_NAME,
> > >>>>>                                      time_msec());
> > >>>>> -                    if_status_mgr_update(if_mgr, binding_data);
> > >>>>> +                    if_status_mgr_update(if_mgr, binding_data,
> > > chassis,
> > >>>>> +                                         !ovnsb_idl_txn);
> > >>>>>
> > >  stopwatch_stop(IF_STATUS_MGR_UPDATE_STOPWATCH_NAME,
> > >>>>>                                     time_msec());
> > >>>>>
> > >>>>> diff --git a/tests/ovn-macros.at b/tests/ovn-macros.at
> > >>>>> index 335f9158c..8fd6ae6f7 100644
> > >>>>> --- a/tests/ovn-macros.at
> > >>>>> +++ b/tests/ovn-macros.at
> > >>>>> @@ -759,3 +759,15 @@
> > > m4_define([OVN_FOR_EACH_NORTHD_WITHOUT_DP_GROUPS],
> > >>>>>       [m4_foreach([NORTHD_USE_DP_GROUPS], [no],
> > >>>>>         [m4_foreach([NORTHD_USE_PARALLELIZATION], [yes, no], [$1
> > >>>>>  ])])])])
> > >>>>> +
> > >>>>> +# OVN_NBCTL(NBCTL_COMMAND) adds NBCTL_COMMAND to list of commands
> to
> > > be
> > >>>> run by RUN_OVN_NBCTL().
> > >>>>> +m4_define([OVN_NBCTL], [
> > >>>>> +    command="${command} -- $1"
> > >>>>> +])
> > >>>>> +
> > >>>>> +# RUN_OVN_NBCTL() executes list of commands built by the
> OVN_NBCTL()
> > >>>> macro.
> > >>>>> +m4_define([RUN_OVN_NBCTL], [
> > >>>>> +    check ovn-nbctl ${command}
> > >>>>> +    unset command
> > >>>>> +])
> > >>>>> +
> > >>>>> diff --git a/tests/ovn.at b/tests/ovn.at
> > >>>>> index bfaa41962..94d16bac9 100644
> > >>>>> --- a/tests/ovn.at
> > >>>>> +++ b/tests/ovn.at
> > >>>>> @@ -102,6 +102,18 @@ m4_divert_text([PREPARE_TESTS],
> > >>>>>           test 1 -le $(as $hv1 ovs-ofctl dump-flows br-int | grep
> -c
> > >>>> "output:$ofport")
> > >>>>>       ])
> > >>>>>     }
> > >>>>> +
> > >>>>> +   ovn_wait_remote_input_flows () {
> > >>>>> +     hv1=$1
> > >>>>> +     hv2=$2
> > >>>>> +     echo "$3: waiting for flows for remote input on $hv1"
> > >>>>> +     # Wait for a flow outputing  to remote input
> > >>>>> +     OVS_WAIT_UNTIL([
> > >>>>> +         ofport=$(as $hv1 ovs-vsctl --bare --columns ofport find
> > >>>> Interface name=ovn-${hv2}-0)
> > >>>>> +         echo "tunnel port=$ofport"
> > >>>>> +         test 1 -le $(as $hv1 ovs-ofctl dump-flows br-int | grep
> -c
> > >>>> "in_port=$ofport")
> > >>>>> +     ])
> > >>>>> +   }
> > >>>>>  ])
> > >>>>>
> > >>>>>  m4_define([OVN_CHECK_PACKETS],
> > >>>>> @@ -127,6 +139,8 @@ m4_define([OVN_WAIT_PATCH_PORT_FLOWS],
> > >>>>>  m4_define([OVN_WAIT_REMOTE_OUTPUT_FLOWS],
> > >>>>>    [ovn_wait_remote_output_flows "$1" "$2" "__file__:__line__"])
> > >>>>>
> > >>>>> +m4_define([OVN_WAIT_REMOTE_INPUT_FLOWS],
> > >>>>> +  [ovn_wait_remote_input_flows "$1" "$2" "__file__:__line__"])
> > >>>>>
> > >>>>>  AT_BANNER([OVN components])
> > >>>>>
> > >>>>> @@ -14056,6 +14070,11 @@ wait_column "$hv1_uuid" Port_Binding
> > >>>> requested_chassis logical_port=lsp0
> > >>>>>  wait_column "$hv2_uuid" Port_Binding additional_chassis
> > >>>> logical_port=lsp0
> > >>>>>  wait_column "$hv2_uuid" Port_Binding requested_additional_chassis
> > >>>> logical_port=lsp0
> > >>>>>
> > >>>>> +# Check ovn-installed updated for main chassis
> > >>>>> +wait_for_ports_up
> > >>>>> +OVS_WAIT_UNTIL([test `as hv1 ovs-vsctl get Interface lsp0
> > >>>> external_ids:ovn-installed` = '"true"'])
> > >>>>> +OVS_WAIT_UNTIL([test x`as hv2 ovs-vsctl get Interface lsp0
> > >>>> external_ids:ovn-installed` = x])
> > >>>>> +
> > >>>>>  # Check that setting iface:encap-ip populates
> > >>>> Port_Binding:additional_encap
> > >>>>>  wait_row_count Encap 2 chassis_name=hv1
> > >>>>>  wait_row_count Encap 2 chassis_name=hv2
> > >>>>> @@ -14081,6 +14100,11 @@ wait_column "$hv2_uuid" Port_Binding
> > >>>> requested_chassis logical_port=lsp0
> > >>>>>  wait_column "" Port_Binding additional_chassis logical_port=lsp0
> > >>>>>  wait_column "" Port_Binding requested_additional_chassis
> > >>>> logical_port=lsp0
> > >>>>>
> > >>>>> +# Check ovn-installed updated for main chassis and not for other
> > > chassis
> > >>>>> +wait_for_ports_up
> > >>>>> +OVS_WAIT_UNTIL([test `as hv2 ovs-vsctl get Interface lsp0
> > >>>> external_ids:ovn-installed` = '"true"'])
> > >>>>> +OVS_WAIT_UNTIL([test x`as hv1 ovs-vsctl get Interface lsp0
> > >>>> external_ids:ovn-installed` = x])
> > >>>>> +
> > >>>>>  # Check that additional_encap is cleared
> > >>>>>  wait_column "" Port_Binding additional_encap logical_port=lsp0
> > >>>>>
> > >>>>> @@ -15370,7 +15394,10 @@ AT_CHECK([as hv1 ovs-ofctl dump-flows
> br-int
> > >>>> table=65 | grep actions=output:1],
> > >>>>>  echo "verifying that lsp0 binding moves when requested-chassis is
> > >>>> changed"
> > >>>>>
> > >>>>>  ovn-nbctl lsp-set-options lsp0 requested-chassis=hv2
> > >>>>> -OVS_WAIT_UNTIL([test 1 = $(grep -c "Releasing lport lsp0 from this
> > >>>> chassis" hv1/ovn-controller.log)])
> > >>>>> +
> > >>>>> +# We might see multiple "Releasing lport ...", when sb is read
> only
> > >>>>> +OVS_WAIT_UNTIL([test 1 -le $(grep -c "Releasing lport lsp0 from
> this
> > >>>> chassis" hv1/ovn-controller.log)])
> > >>>>> +
> > >>>>>  wait_column "$hv2_uuid" Port_Binding chassis logical_port=lsp0
> > >>>>>
> > >>>>>  # (6) Chassis hv2 should add flows and hv1 should not.
> > >>>>> @@ -15416,7 +15443,7 @@ AT_CHECK([as hv1 ovs-ofctl dump-flows
> br-int
> > >>>> table=0 | grep in_port=1], [0], [ig
> > >>>>>  AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=65 | grep
> > >>>> actions=output:1], [0], [ignore])
> > >>>>>
> > >>>>>  check ovn-nbctl --wait=hv lsp-set-options lsp0
> > >>>> requested-chassis=non-existant-chassis
> > >>>>> -OVS_WAIT_UNTIL([test 1 = $(grep -c "Releasing lport lsp0 from this
> > >>>> chassis" hv1/ovn-controller.log)])
> > >>>>> +OVS_WAIT_UNTIL([test 1 -le $(grep -c "Releasing lport lsp0 from
> this
> > >>>> chassis" hv1/ovn-controller.log)])
> > >>>>>  check ovn-nbctl --wait=hv sync
> > >>>>>  wait_column '' Port_Binding chasssi logical_port=lsp0
> > >>>>>  AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=0 | grep
> > > in_port=1],
> > >>>> [1], [])
> > >>>>> @@ -32418,3 +32445,119 @@ AT_CHECK([test $(ovn-sbctl list fdb |
> grep
> > > -c
> > >>>> "00:00:00:00:10:30") = 0])
> > >>>>>  OVN_CLEANUP([hv1])
> > >>>>>  AT_CLEANUP
> > >>>>>  ])
> > >>>>> +
> > >>>>> +OVN_FOR_EACH_NORTHD([
> > >>>>> +AT_SETUP([recomputes])
> > >>>>> +ovn_start
> > >>>>> +
> > >>>>> +n_hv=4
> > >>>>> +
> > >>>>> +# Add chassis
> > >>>>> +net_add n1
> > >>>>> +for i in $(seq 1 $n_hv); do
> > >>>>> +    sim_add hv$i
> > >>>>> +    as hv$i
> > >>>>> +    check ovs-vsctl add-br br-phys
> > >>>>> +    ovs-vsctl set open .
> > > external-ids:ovn-bridge-mappings=phys:br-phys
> > >>>>> +    ovn_attach n1 br-phys 192.168.0.$i 24 geneve
> > >>>>> +done
> > >>>>> +
> > >>>>> +add_switch_ports() {
> > >>>>> +    start_port=$1
> > >>>>> +    end_port=$2
> > >>>>> +    nb_hv=$3
> > >>>>> +    bulk_size=$4
> > >>>>> +    for ((i=start_port; i<end_port; )) do
> > >>>>> +        start_bulk=$i
> > >>>>> +        for hv in $(seq 1 $nb_hv); do
> > >>>>> +            end_bulk=$((start_bulk+bulk_size-1))
> > >>>>> +            for port in $(seq $start_bulk $end_bulk); do
> > >>>>> +                logical_switch_port=lsp${port}
> > >>>>> +                OVN_NBCTL(lsp-add ls1 $logical_switch_port)
> > >>>>> +                OVN_NBCTL(lsp-set-addresses $logical_switch_port
> > >>>> dynamic)
> > >>>>> +            done
> > >>>>> +            start_bulk=$((end_bulk+1))
> > >>>>> +        done
> > >>>>> +        RUN_OVN_NBCTL()
> > >>>>> +
> > >>>>> +        start_bulk=$i
> > >>>>> +        for hv in $(seq 1 $nb_hv); do
> > >>>>> +            end_bulk=$((start_bulk+bulk_size-1))
> > >>>>> +            for port in $(seq $start_bulk $end_bulk); do
> > >>>>> +                logical_switch_port=lsp${port}
> > >>>>> +                as hv$hv ovs-vsctl \
> > >>>>> +                    --no-wait -- add-port br-int vif${port} \
> > >>>>> +                    -- set Interface vif${port}
> > >>>> external_ids:iface-id=$logical_switch_port
> > >>>>> +            done
> > >>>>> +            start_bulk=$((end_bulk+1))
> > >>>>> +        done
> > >>>>> +        i=$((end_bulk+1))
> > >>>>> +    done
> > >>>>> +}
> > >>>>> +check ovn-nbctl ls-add ls1
> > >>>>> +check ovn-nbctl set Logical_Switch ls1 other_config:subnet=
> > > 10.1.0.0/16
> > >>>>> +check ovn-nbctl set Logical_Switch ls1
> > >>>> other_config:exclude_ips=10.1.255.254
> > >>>>> +
> > >>>>> +check ovn-nbctl lr-add lr1
> > >>>>> +check ovn-nbctl lsp-add ls1 lsp0 -- set Logical_Switch_Port lsp0
> > >>>> type=router options:router-port=lrp0 addresses=dynamic
> > >>>>> +check ovn-nbctl lrp-add lr1 lrp0 "f0:00:00:01:00:01"
> 10.1.255.254/16
> > >>>>> +check ovn-nbctl lr-nat-add lr1 snat 10.2.0.1 10.1.0.0/16
> > >>>>> +
> > >>>>> +lflow_run=0
> > >>>>> +check ovn-nbctl --wait=hv sync
> > >>>>> +
> > >>>>> +# Tunnel ports might not be added (yet) at this point on slow
> system.
> > >>>>> +# Wait for flows related to such ports to ensure those ports have
> > > been
> > >>>> added
> > >>>>> +# before we measure recomputes. Otherwise, ovs_interface handler
> > > might
> > >>>> be run
> > >>>>> +# afterwards for tunnel ports, causing recomputes.
> > >>>>> +for i in $(seq 1 $n_hv); do
> > >>>>> +    for j in $(seq 1 $n_hv); do
> > >>>>> +        if test $i != $j; then
> > >>>>> +            OVN_WAIT_REMOTE_INPUT_FLOWS(["hv$i"],["hv$j"])
> > >>>>> +        fi
> > >>>>> +    done
> > >>>>> +done
> > >>>>> +
> > >>>>> +for i in $(seq 1 $n_hv); do
> > >>>>> +    as hv$i
> > >>>>> +    lflow_run1=$(ovn-appctl -t ovn-controller
> coverage/read-counter
> > >>>> lflow_run)
> > >>>>> +    lflow_run=`expr $lflow_run1 + $lflow_run`
> > >>>>> +done
> > >>>>> +
> > >>>>> +add_switch_ports 1 1000 $n_hv 5
> > >>>>> +
> > >>>>> +wait_for_ports_up
> > >>>>> +check ovn-nbctl --wait=hv sync
> > >>>>> +
> > >>>>> +for i in $(seq 1 $n_hv); do
> > >>>>> +    pid=$(cat hv${i}/ovn-controller.pid)
> > >>>>> +    u=$((u+$(cat "/proc/$pid/stat" | awk '{print $14}')))
> > >>>>> +    s=$((s+$(cat "/proc/$pid/stat" | awk '{print $15}')))
> > >>>>> +done
> > >>>>> +
> > >>>>> +n_pid=$(cat northd/ovn-northd.pid)
> > >>>>> +n_u=$(cat "/proc/$pid/stat" | awk '{print $14}')
> > >>>>> +n_s=$(cat "/proc/$pid/stat" | awk '{print $15}')
> > >>>>> +
> > >>>>> +echo "Total Northd User Time: $n_u"
> > >>>>> +echo "Total Northd System Time: $n_s"
> > >>>>> +echo "Total Controller User Time: $u"
> > >>>>> +echo "Total Controller System Time: $s"
> > >>>>> +
> > >>>>> +lflow_run_end=0
> > >>>>> +for i in $(seq 1 $n_hv); do
> > >>>>> +    as hv$i
> > >>>>> +    lflow_run1=$(ovn-appctl -t ovn-controller
> coverage/read-counter
> > >>>> lflow_run)
> > >>>>> +    lflow_run_end=`expr $lflow_run1 + $lflow_run_end`
> > >>>>> +done
> > >>>>> +n_recomputes=`expr $lflow_run_end - $lflow_run`
> > >>>>> +echo "$n_recomputes recomputes"
> > >>>>> +
> > >>>>> +AT_CHECK([test $lflow_run_end == $lflow_run])
> > >>>>> +
> > >>>>> +for i in $(seq 2 $n_hv); do
> > >>>>> +    OVN_CLEANUP_SBOX([hv$i])
> > >>>>> +done
> > >>>>> +OVN_CLEANUP([hv1])
> > >>>>> +AT_CLEANUP
> > >>>>> +])
> > >>>>> diff --git a/tests/perf-northd.at b/tests/perf-northd.at
> > >>>>> index 74b69e9d4..6ec196b36 100644
> > >>>>> --- a/tests/perf-northd.at
> > >>>>> +++ b/tests/perf-northd.at
> > >>>>> @@ -76,23 +76,6 @@ m4_define([PERF_RECORD_STOP], [
> > >>>>>      PERF_RECORD_STOPWATCH(ovn-northd-loop, ["Short term average"],
> > >>>> [Average (northd-loop in msec)])
> > >>>>>  ])
> > >>>>>
> > >>>>> -# OVN_NBCTL([NBCTL_COMMAND])
> > >>>>> -#
> > >>>>> -# Add NBCTL_COMMAND to list of commands to be run by
> RUN_OVN_NBCTL().
> > >>>>> -#
> > >>>>> -m4_define([OVN_NBCTL], [
> > >>>>> -    command="${command} -- $1"
> > >>>>> -])
> > >>>>> -
> > >>>>> -# RUN_OVN_NBCTL()
> > >>>>> -#
> > >>>>> -# Execute list of commands built by the OVN_NBCTL() macro.
> > >>>>> -#
> > >>>>> -m4_define([RUN_OVN_NBCTL], [
> > >>>>> -    check ovn-nbctl ${command}
> > >>>>> -    unset command
> > >>>>> -])
> > >>>>> -
> > >>>>>  OVS_START_SHELL_HELPERS
> > >>>>>  generate_subnet () {
> > >>>>>      local a=$(printf %d $(expr $1 / 256 + 10))
> > >>>>> --
> > >>>>> 2.31.1
> > >>>>>
> > >>>>
> > >>>
> > >>
> > >
> >
>
Dumitru Ceara July 13, 2022, 9:01 a.m. UTC | #9
On 7/13/22 08:27, Han Zhou wrote:
> On Tue, Jul 12, 2022 at 12:35 AM Dumitru Ceara <dceara@redhat.com> wrote:
>>
>> On 7/12/22 08:52, Han Zhou wrote:
>>> On Mon, Jul 11, 2022 at 4:55 AM Dumitru Ceara <dceara@redhat.com> wrote:
>>>>
>>>> On 7/11/22 13:31, Xavier Simonart wrote:
>>>>> Hi Han
>>>>>
>>>>> Thanks for your review.
>>>>>
>>>>> Let me try to understand your two main concerns and the proper way to
>>> fix
>>>>> it.
>>>>> 1) We only try once to write pb->chassis. If the commit fails,
>>> pb->chassis
>>>>> is not written. As commit fails, we will recompute, but as the
>>>>> update_required flag is not set anymore, we might end up with no
>>>>> pb->chassis.
>>>>> => I'll remove the flag and try to update until it's confirmed.
>>>
>>> Thank you!
>>>
>>>>> 2) The state machine, and when we move to INSTALL_FLOWS. Serializing
> the
>>>>> state machine, by waiting for confirmation to be received before
> moving
>>> to
>>>>> INSTALL_FLOWS state will delay the ovn-installed compared to today. So
>>> I am
>>>>> (still) trying to see if there is any way to prevent this in some
> cases.
>>>>> Would it be correct to do this serialization (wait for pb->chassis
>>> update
>>>>> confirmation) only when using conditional monitoring? When using
>>>>> monitor-all, as soon as we have written (w/o confirmation)
> pb->chassis,
>>> we
>>>>> would move to INSTALL_FLOWS. In that loop where we wrote pb->chassis,
>>> (all)
>>>>> the flows should be updated taking into account pb->chassis.
>>>>
>>>> I think this approach is OK.  ovn-controllers that don't use
> conditional
>>>> monitoring already know the complete SB contents and should be able to
>>>> install (mostly?) complete sets of openflows that correspond to a given
>>>> Port_Binding.
>>>>
>>>> It's also "CMS-friendly", at least for ovn-kubernetes, which uses
>>>> ovn-monitor-all=true and only waits for
>>>> OVS.Interface.external_ids:ovn-installed=true.  And shouldn't impact
> the
>>>> others which wait for SB.Port_Binding.up=true.
>>>>
>>>
>>> Xavier and Dumitru, I think we shouldn't introduce the dependency of
>>> "ovn-monitor-all" setting here.
>>>
>>> First of all, ovn-installed is a flag for CMS to understand that all the
>>> flows related to the port-binding are installed. If we set the flag
> before
>>> it is truly completed, it is a bug, and it is possible that the flag is
> set
>>> but some traffic doesn't work.
>>
>> I think it's a matter of semantics.  The way I see "ovn-installed=true"
>> is: all flows that are relevant to the port binding on the local chassis
>> have been installed.  When we added it it was for the ovn-k8s case;
>> ovn-k8s used to explicitly check if some openflow tables on the node
>> where the pod is being brought up contained flows that seemed to
>> correspond to the pod (e.g., matching on pod mac and IP addresses).
>>
> But the purpose of checking the flows (when ovn-installed wasn't available)
> was to make sure the pod is ready to send/receive traffic. If ovn-installed
> can provide more accuracy, why not?
> 

That's fine, but with the condition of not affecting performance
(directly or indirectly).

>>> I did a quick test, and at least a flow in (table_id=38, priority=100)
>>> which is multicast-group related is updated AFTER the SB notification is
>>> received for the port-binding chassis update.
>>>
>>
>> This sounds like something we should fix, I think.  I don't see any
>> multicast-group changes conditioned by the port_binding being up=true in
>> northd.  I might be wrong though.
>>
> 
> It is not about "up=true". It is triggered by the port-binding->chassis
> update. Since a multicast-group has a reference to the port-binding, a
> port-binding update triggers multicast-group change handling, which is
> required because physical flows related to the MC group need to be updated
> when port-binding->chassis is updated. You may argue that the IDL may be

Ah, I see, thanks for the explanation!

> optimized so that the MC group change can be triggered and handled before
> SB is updated, but I am not sure if the benefit is worth the complexity.
> Given how OVSDB IDL transaction is designed, I'd always think a DB record
> is *formally* updated only after the update notification is received from
> the server, which seems to be safe and clear.
> 

I agree with the latter.  A DB record is to be considered updated only
after the "update" message was received from the server.  Otherwise we
need to be able to rollback changes and that makes it too complex IMO.
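
To make that rule concrete, here is a minimal sketch, assuming a
hypothetical, simplified model of a single SB column as seen by a client.
None of the names below (struct sketch_column, sketch_write, ...) exist in
OVS or OVN; they only illustrate that a local write is an intent, and that
the value counts as updated only once the server reports it back.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of one SB column as seen by a client. */
struct sketch_column {
    const char *confirmed;  /* Last value reported by the server, or NULL. */
    const char *pending;    /* Value written locally, not yet confirmed. */
};

/* Local write: only records the intent, 'confirmed' is left untouched. */
static void
sketch_write(struct sketch_column *col, const char *value)
{
    assert(col && value);
    col->pending = value;
}

/* Server "update" notification: the only place 'confirmed' changes. */
static void
sketch_on_update(struct sketch_column *col, const char *value)
{
    assert(col);
    col->confirmed = value;
    if (col->pending && value && !strcmp(col->pending, value)) {
        col->pending = NULL;
    }
}

/* Transaction failure: drop the intent.  Nothing was derived from the
 * pending value, so there is nothing to roll back. */
static void
sketch_on_failure(struct sketch_column *col)
{
    assert(col);
    col->pending = NULL;
}

/* A value is "formally" updated only if the server confirmed it. */
static bool
sketch_is_updated(const struct sketch_column *col, const char *value)
{
    return col->confirmed && !strcmp(col->confirmed, value);
}
```

With this model, anything that depends on the column (flows, state
transitions) keys off sketch_is_updated() and never off the pending value.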

>>> Secondly, if the change hasn't made it to the SB, all the other nodes would
>>> not be able to reach the port, which means the workload (pod/VM) cannot
>>> receive traffic yet at this phase.
>>>
>>
>> Even if the change made it to the SB we have no way of knowing that all
>> other nodes processed it so we cannot know for sure that traffic can
>> flow properly end-to-end.  But, like I said above, this doesn't matter
>> if the semantics of ovn-installed=true are "all locally relevant flows
>> are installed".
>>
> It's true that even if SB is updated it doesn't ensure all the nodes processed
> it, but I view it this way: at least from the current node's point of view,
> its job is done and the other nodes are beyond its control. On the other
> hand, if SB update failed, its job is not done yet. I am not saying this is
> the only *correct* way, but just the way I am seeing it :).
> 

Ok, but we need to find a "correct" (or rather good-enough) way of doing
this without incurring a performance penalty.

>>> So, I think our goal is not to set ovn-installed early, but to set it
>>> accurately (sometimes it may be ok to be conservative).
>>>
>>
>> Sure, but waiting for the SB port_binding.chassis update might introduce
>> significant spikes in latency if the SB is compacting (or just busy) at
>> that moment.
>>
>> This might become an issue in large scale deployments as pods will take
>> longer to be declared "ready".
>>
> I understand your concern, but if you think about it, how the pods are
> *declared* ready doesn't change whether they are actually ready or not. It
> doesn't make the real flow setup faster or slower.
> If the CMS really wants to declare it ready earlier, it can just ignore the
> ovn-installed flag check or flow check. What's the real benefit except for
> metrics?
> 

There's more to it than just metrics, I think.  For example, a pod is
added as a service backend only when it's declared ready.  If the
ovn-installed annotation is set later than today and thus delays the pod
"ready" state, that also affects the time it takes to bring up a service
potentially resulting in bad user experience.

>>> In addition, ovn-monitor-all is not always true even in ovn-k8s. It is
>>> configurable in ovn-k8s. (in our environment we set it to false, to save
>>> MEM and CPU for worker nodes, while sacrificing a little for the central SB
>>> DB)
>>>
>>
>> Ack.  But for this case specifically, as SB is already busier with
>> conditional monitoring, I think serializing events in ovn-controller
>> will create even more visible delays in pod bringup times.
>>
>> Thanks,
>> Dumitru
>>
>>> Thanks,
>>> Han
>>>>>
>>>>> Thanks again for your feedback
>>>>>
>>>>> Xavier
>>>>>
>>>>> On Thu, Jul 7, 2022 at 8:49 AM Han Zhou <hzhou@ovn.org> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Jun 22, 2022 at 3:23 AM Xavier Simonart <xsimonar@redhat.com>
>>>>>> wrote:
>>>>>>>
>>>>>>> When VIF ports are claimed on a chassis, SBDB Port_Binding table is updated.
>>>>>>> If the SBDB IDL is still read-only ("in transaction") when such an update
>>>>>>> is required, the update is not possible and recompute is triggered through
>>>>>>> I+P failure.
>>>>>>>
>>>>>>> This situation can happen:
>>>>>>> - after updating Port_Binding->chassis to SBDB for one port, in a following
>>>>>>>   iteration, ovn-controller handles Interface:external_ids:ovn-installed
>>>>>>>   (for the same port) while SBDB is still read-only.
>>>>>>> - after updating Port_Binding->chassis to SBDB for one port, in a following
>>>>>>>   iteration, ovn-controller updates Port_Binding->chassis for another port,
>>>>>>>   while SBDB is still read-only.
>>>>>>>
>>>>>>> This patch prevents the recompute, by having the if-status module
>>>>>>> update the Port_Binding chassis (if needed) when possible.
>>>>>>> This does not delay Port_Binding chassis update compared to before this patch.
>>>>>>> - With the patch, Port_Binding chassis will be updated as soon as SBDB is
>>>>>>> again writable, without recompute.
>>>>>>> - Without the patch, Port_Binding chassis was updated as soon as SBDB was
>>>>>>> again writable, through a recompute.
>>>>>>>
>>>>>>> As part of this patch, ovn-installed will not be updated for additional chassis;
>>>>>>> it will only be updated when the migration is completed.
>>>>>>>
>>>>>>> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2059253
>>>>>>> Signed-off-by: Xavier Simonart <xsimonar@redhat.com>
>>>>>>>
>>>>>>> ---
>>>>>>> v2:  - handled Dumitru's comments.
>>>>>>>      - handled Han's comments, mainly ensure we moved out of CLAIMED state
>>>>>>>        only after updating pb->chassis to guarantee physical flows are installed
>>>>>>>        when ovn-installed is updated in OVS.
>>>>>>>      - slightly reorganize the code to isolate 'notify_up = false' cases in
>>>>>>>        claim_port (i.e. ports such as virtual ports), with the idea of enabling
>>>>>>>        a future patch that prevents recomputes when virtual ports are claimed.
>>>>>>>      - updated test case to cause more race conditions.
>>>>>>>      - rebased on origin/main
>>>>>>>      - note that "additional chassis" as now supported by
>>>>>>>        "Support LSP:options:requested-chassis as a list" might still cause
>>>>>>>        recomputes.
>>>>>>>      - fixed missing flows when Port_Binding chassis was updated by mgr_update
>>>>>>>        w/o any lflow recalculation.
>>>>>>> v3:  - handled Dumitru's comments on v2, mainly have runtime_data handler
>>>>>>>        handling pb_claims when sb becomes writable (instead of a lflow handler).
>>>>>>>      - fixed test as it was not checking recomputes on all hv, as well as a flaky
>>>>>>>        behavior.
>>>>>>>      - rebased on origin/main.
>>>>>>> ---
>>>>>>>  controller/binding.c        | 154 +++++++++++++++++++++----------
>>>>>>>  controller/binding.h        |  15 +++-
>>>>>>>  controller/if-status.c      | 174 ++++++++++++++++++++++++++++++++----
>>>>>>>  controller/if-status.h      |  16 +++-
>>>>>>>  controller/ovn-controller.c |  72 ++++++++++++++-
>>>>>>>  tests/ovn-macros.at         |  12 +++
>>>>>>>  tests/ovn.at                | 147 +++++++++++++++++++++++++++++-
>>>>>>>  tests/perf-northd.at        |  17 ----
>>>>>>>  8 files changed, 519 insertions(+), 88 deletions(-)
>>>>>>>
>>>>>>> diff --git a/controller/binding.c b/controller/binding.c
>>>>>>> index 2279570f9..b21577f71 100644
>>>>>>> --- a/controller/binding.c
>>>>>>> +++ b/controller/binding.c
>>>>>>> @@ -644,11 +644,17 @@ local_binding_get_lport_ofport(const struct shash *local_bindings,
>>>>>>>  }
>>>>>>>
>>>>>>>  bool
>>>>>>> -local_binding_is_up(struct shash *local_bindings, const char *pb_name)
>>>>>>> +local_binding_is_up(struct shash *local_bindings, const char *pb_name,
>>>>>>> +                    const struct sbrec_chassis *chassis_rec)
>>>>>>>  {
>>>>>>>      struct local_binding *lbinding =
>>>>>>>          local_binding_find(local_bindings, pb_name);
>>>>>>>      struct binding_lport *b_lport = local_binding_get_primary_lport(lbinding);
>>>>>>> +
>>>>>>> +    if (b_lport && b_lport->pb->chassis != chassis_rec) {
>>>>>>> +        return false;
>>>>>>> +    }
>>>>>>> +
>>>>>>>      if (lbinding && b_lport && lbinding->iface) {
>>>>>>>          if (b_lport->pb->n_up && !b_lport->pb->up[0]) {
>>>>>>>              return false;
>>>>>>> @@ -660,13 +666,23 @@ local_binding_is_up(struct shash *local_bindings, const char *pb_name)
>>>>>>>  }
>>>>>>>
>>>>>>>  bool
>>>>>>> -local_binding_is_down(struct shash *local_bindings, const char *pb_name)
>>>>>>> +local_binding_is_down(struct shash *local_bindings, const char *pb_name,
>>>>>>> +                      const struct sbrec_chassis *chassis_rec)
>>>>>>>  {
>>>>>>>      struct local_binding *lbinding =
>>>>>>>          local_binding_find(local_bindings, pb_name);
>>>>>>>
>>>>>>>      struct binding_lport *b_lport = local_binding_get_primary_lport(lbinding);
>>>>>>>
>>>>>>> +    if (b_lport) {
>>>>>>> +        if (b_lport->pb->chassis == chassis_rec) {
>>>>>>> +            return false;
>>>>>>> +        } else if (b_lport->pb->chassis) {
>>>>>>> +            VLOG_DBG("lport %s already claimed by other chassis",
>>>>>>> +                     b_lport->pb->logical_port);
>>>>>>> +        }
>>>>>>> +    }
>>>>>>> +
>>>>>>>      if (!lbinding) {
>>>>>>>          return true;
>>>>>>>      }
>>>>>>> @@ -884,37 +900,80 @@ get_lport_type_str(enum en_lport_type lport_type)
>>>>>>>      OVS_NOT_REACHED();
>>>>>>>  }
>>>>>>>
>>>>>>> -/* For newly claimed ports, if 'notify_up' is 'false':
>>>>>>> +void
>>>>>>> +set_pb_chassis_in_sbrec(const struct sbrec_port_binding *pb,
>>>>>>> +                        const struct sbrec_chassis *chassis_rec,
>>>>>>> +                        bool is_set)
>>>>>>> +{
>>>>>>> +    if (pb->chassis != chassis_rec) {
>>>>>>> +         if (is_set) {
>>>>>>> +            if (pb->chassis) {
>>>>>>> +                VLOG_INFO("Changing chassis for lport %s from %s to %s.",
>>>>>>> +                          pb->logical_port, pb->chassis->name,
>>>>>>> +                          chassis_rec->name);
>>>>>>> +            } else {
>>>>>>> +                VLOG_INFO("Claiming lport %s for this chassis.",
>>>>>>> +                          pb->logical_port);
>>>>>>> +            }
>>>>>>> +            for (int i = 0; i < pb->n_mac; i++) {
>>>>>>> +                VLOG_INFO("%s: Claiming %s", pb->logical_port, pb->mac[i]);
>>>>>>> +            }
>>>>>>> +            sbrec_port_binding_set_chassis(pb, chassis_rec);
>>>>>>> +        }
>>>>>>> +    } else if (!is_set) {
>>>>>>> +        sbrec_port_binding_set_chassis(pb, NULL);
>>>>>>> +    }
>>>>>>> +}
>>>>>>> +
>>>>>>> +void
>>>>>>> +local_binding_set_pb(struct shash *local_bindings, const char *pb_name,
>>>>>>> +                     const struct sbrec_chassis *chassis_rec,
>>>>>>> +                     struct hmap *tracked_datapaths, bool is_set)
>>>>>>> +{
>>>>>>> +    struct local_binding *lbinding =
>>>>>>> +        local_binding_find(local_bindings, pb_name);
>>>>>>> +    struct binding_lport *b_lport = local_binding_get_primary_lport(lbinding);
>>>>>>> +
>>>>>>> +    if (b_lport) {
>>>>>>> +        set_pb_chassis_in_sbrec(b_lport->pb, chassis_rec, is_set);
>>>>>>> +        if (tracked_datapaths) {
>>>>>>> +            update_lport_tracking(b_lport->pb, tracked_datapaths, true);
>>>>>>> +        }
>>>>>>> +    }
>>>>>>> +}
>>>>>>> +
>>>>>>> +/* For newly claimed ports:
>>>>>>>   * - set the 'pb.up' field to true if 'pb' has no 'parent_pb'.
>>>>>>>   * - set the 'pb.up' field to true if 'parent_pb.up' is 'true' (e.g., for
>>>>>>>   *   container and virtual ports).
>>>>>>> - * Otherwise request a notification to be sent when the OVS flows
>>>>>>> - * corresponding to 'pb' have been installed.
>>>>>>> + *
>>>>>>> + * Returns false if lport is not claimed due to 'sb_readonly'.
>>>>>>> + * Returns true otherwise.
>>>>>>>   *
>>>>>>>   * Note:
>>>>>>> - *   Updates (directly or through a notification) the 'pb->up' field only if
>>>>>>> - *   it's explicitly set to 'false'.
>>>>>>> + *   Updates the 'pb->up' field only if it's explicitly set to 'false'.
>>>>>>>   *   This is to ensure compatibility with older versions of ovn-northd.
>>>>>>>   */
>>>>>>> -static void
>>>>>>> +static bool
>>>>>>>  claimed_lport_set_up(const struct sbrec_port_binding *pb,
>>>>>>>                       const struct sbrec_port_binding *parent_pb,
>>>>>>> -                     const struct sbrec_chassis *chassis_rec,
>>>>>>> -                     bool notify_up, struct if_status_mgr *if_mgr)
>>>>>>> +                     bool sb_readonly)
>>>>>>>  {
>>>>>>> -    if (!notify_up) {
>>>>>>> -        bool up = true;
>>>>>>> -        if (!parent_pb || (parent_pb->n_up && parent_pb->up[0])) {
>>>>>>> +    /* When notify_up is false in claim_port(), no state is created
>>>>>>> +     * by if_status_mgr. In such cases, return false (i.e. trigger recompute)
>>>>>>> +     * if we can't update sb (because it is readonly).
>>>>>>> +     */
>>>>>>> +    bool up = true;
>>>>>>> +    if (!parent_pb || (parent_pb->n_up && parent_pb->up[0])) {
>>>>>>> +        if (!sb_readonly) {
>>>>>>>              if (pb->n_up) {
>>>>>>>                  sbrec_port_binding_set_up(pb, &up, 1);
>>>>>>>              }
>>>>>>> +        } else if (pb->n_up && !pb->up[0]) {
>>>>>>> +            return false;
>>>>>>>          }
>>>>>>> -        return;
>>>>>>> -    }
>>>>>>> -
>>>>>>> -    if (pb->chassis != chassis_rec || (pb->n_up && !pb->up[0])) {
>>>>>>> -        if_status_mgr_claim_iface(if_mgr, pb->logical_port);
>>>>>>>      }
>>>>>>> +    return true;
>>>>>>>  }
>>>>>>>
>>>>>>>  typedef void (*set_func)(const struct sbrec_port_binding *pb,
>>>>>>> @@ -1057,37 +1116,35 @@ claim_lport(const struct sbrec_port_binding *pb,
>>>>>>>              struct hmap *tracked_datapaths,
>>>>>>>              struct if_status_mgr *if_mgr)
>>>>>>>  {
>>>>>>> -    if (!sb_readonly) {
>>>>>>> -        claimed_lport_set_up(pb, parent_pb, chassis_rec, notify_up, if_mgr);
>>>>>>> -    }
>>>>>>> -
>>>>>>>      enum can_bind can_bind = lport_can_bind_on_this_chassis(chassis_rec, pb);
>>>>>>>      bool update_tracked = false;
>>>>>>>
>>>>>>>      if (can_bind == CAN_BIND_AS_MAIN) {
>>>>>>>          if (pb->chassis != chassis_rec) {
>>>>>>> -            if (sb_readonly) {
>>>>>>> -                return false;
>>>>>>> -            }
>>>>>>> -
>>>>>>> -            if (pb->chassis) {
>>>>>>> -                VLOG_INFO("Changing chassis for lport %s from %s to %s.",
>>>>>>> -                        pb->logical_port, pb->chassis->name,
>>>>>>> -                        chassis_rec->name);
>>>>>>> -            } else {
>>>>>>> -                VLOG_INFO("Claiming lport %s for this chassis.",
>>>>>>> -                          pb->logical_port);
>>>>>>> -            }
>>>>>>> -            for (size_t i = 0; i < pb->n_mac; i++) {
>>>>>>> -                VLOG_INFO("%s: Claiming %s", pb->logical_port, pb->mac[i]);
>>>>>>> -            }
>>>>>>> -
>>>>>>> -            sbrec_port_binding_set_chassis(pb, chassis_rec);
>>>>>>>              if (is_additional_chassis(pb, chassis_rec)) {
>>>>>>> +                if (sb_readonly) {
>>>>>>> +                    return false;
>>>>>>> +                }
>>>>>>>                  remove_additional_chassis(pb, chassis_rec);
>>>>>>>              }
>>>>>>>              update_tracked = true;
>>>>>>>          }
>>>>>>> +        if (!notify_up) {
>>>>>>> +            if (!claimed_lport_set_up(pb, parent_pb, sb_readonly)) {
>>>>>>> +                return false;
>>>>>>> +            }
>>>>>>> +            if (pb->chassis != chassis_rec) {
>>>>>>> +                if (sb_readonly) {
>>>>>>> +                    return false;
>>>>>>> +                }
>>>>>>> +                set_pb_chassis_in_sbrec(pb, chassis_rec, true);
>>>>>>> +            }
>>>>>>> +        } else {
>>>>>>> +            if ((pb->chassis != chassis_rec) || (pb->n_up && !pb->up[0])) {
>>>>>>> +                if_status_mgr_claim_iface(if_mgr, pb, chassis_rec,
>>>>>>> +                                          sb_readonly);
>>>>>>> +            }
>>>>>>> +        }
>>>>>>>      } else if (can_bind == CAN_BIND_AS_ADDITIONAL) {
>>>>>>>          if (!is_additional_chassis(pb, chassis_rec)) {
>>>>>>>              if (sb_readonly) {
>>>>>>> @@ -1132,7 +1189,8 @@ claim_lport(const struct sbrec_port_binding *pb,
>>>>>>>   */
>>>>>>>  static bool
>>>>>>>  release_lport_main_chassis(const struct sbrec_port_binding *pb,
>>>>>>> -                           bool sb_readonly)
>>>>>>> +                           bool sb_readonly,
>>>>>>> +                           struct if_status_mgr *if_mgr)
>>>>>>>  {
>>>>>>>      if (pb->encap) {
>>>>>>>          if (sb_readonly) {
>>>>>>> @@ -1141,11 +1199,13 @@ release_lport_main_chassis(const struct sbrec_port_binding *pb,
>>>>>>>          sbrec_port_binding_set_encap(pb, NULL);
>>>>>>>      }
>>>>>>>
>>>>>>> +    /* If sb readonly, pb->chassis unset through if-status if present. */
>>>>>>>      if (pb->chassis) {
>>>>>>> -        if (sb_readonly) {
>>>>>>> +        if (!sb_readonly) {
>>>>>>> +            sbrec_port_binding_set_chassis(pb, NULL);
>>>>>>> +        } else if (!if_status_mgr_iface_is_present(if_mgr, pb->logical_port)) {
>>>>>>>              return false;
>>>>>>>          }
>>>>>>> -        sbrec_port_binding_set_chassis(pb, NULL);
>>>>>>>      }
>>>>>>>
>>>>>>>      if (pb->virtual_parent) {
>>>>>>> @@ -1155,7 +1215,8 @@ release_lport_main_chassis(const struct sbrec_port_binding *pb,
>>>>>>>          sbrec_port_binding_set_virtual_parent(pb, NULL);
>>>>>>>      }
>>>>>>>
>>>>>>> -    VLOG_INFO("Releasing lport %s from this chassis.", pb->logical_port);
>>>>>>> +    VLOG_INFO("Releasing lport %s from this chassis (sb_readonly=%d)",
>>>>>>> +              pb->logical_port, sb_readonly);
>>>>>>>      return true;
>>>>>>>  }
>>>>>>>
>>>>>>> @@ -1189,7 +1250,7 @@ release_lport(const struct sbrec_port_binding *pb,
>>>>>>>                struct hmap *tracked_datapaths, struct if_status_mgr *if_mgr)
>>>>>>>  {
>>>>>>>      if (pb->chassis == chassis_rec) {
>>>>>>> -        if (!release_lport_main_chassis(pb, sb_readonly)) {
>>>>>>> +        if (!release_lport_main_chassis(pb, sb_readonly, if_mgr)) {
>>>>>>>              return false;
>>>>>>>          }
>>>>>>>      } else if (is_additional_chassis(pb, chassis_rec)) {
>>>>>>> @@ -1271,7 +1332,7 @@ consider_vif_lport_(const struct sbrec_port_binding *pb,
>>>>>>>                               b_lport->lbinding->iface,
>>>>>>>                               !b_ctx_in->ovnsb_idl_txn,
>>>>>>>                               !parent_pb, b_ctx_out->tracked_dp_bindings,
>>>>>>> -                             b_ctx_out->if_mgr)){
>>>>>>> +                             b_ctx_out->if_mgr)) {
>>>>>>>                  return false;
>>>>>>>              }
>>>>>>>
>>>>>>> @@ -1527,7 +1588,8 @@ consider_localport(const struct sbrec_port_binding *pb,
>>>>>>>      enum can_bind can_bind = lport_can_bind_on_this_chassis(
>>>>>>>          b_ctx_in->chassis_rec, pb);
>>>>>>>      if (can_bind == CAN_BIND_AS_MAIN) {
>>>>>>> -        if (!release_lport_main_chassis(pb, !b_ctx_in->ovnsb_idl_txn)) {
>>>>>>> +        if (!release_lport_main_chassis(pb, !b_ctx_in->ovnsb_idl_txn,
>>>>>>> +            b_ctx_out->if_mgr)) {
>>>>>>>              return false;
>>>>>>>          }
>>>>>>>      } else if (can_bind == CAN_BIND_AS_ADDITIONAL) {
>>>>>>> diff --git a/controller/binding.h b/controller/binding.h
>>>>>>> index 1fed06674..d20659b0b 100644
>>>>>>> --- a/controller/binding.h
>>>>>>> +++ b/controller/binding.h
>>>>>>> @@ -151,8 +151,10 @@ const struct sbrec_port_binding *local_binding_get_primary_pb(
>>>>>>>  ofp_port_t local_binding_get_lport_ofport(const struct shash *local_bindings,
>>>>>>>                                            const char *pb_name);
>>>>>>>
>>>>>>> -bool local_binding_is_up(struct shash *local_bindings, const char *pb_name);
>>>>>>> -bool local_binding_is_down(struct shash *local_bindings, const char *pb_name);
>>>>>>> +bool local_binding_is_up(struct shash *local_bindings, const char *pb_name,
>>>>>>> +                         const struct sbrec_chassis *);
>>>>>>> +bool local_binding_is_down(struct shash *local_bindings, const char *pb_name,
>>>>>>> +                           const struct sbrec_chassis *);
>>>>>>>  void local_binding_set_up(struct shash *local_bindings, const char *pb_name,
>>>>>>>                            const struct sbrec_chassis *chassis_rec,
>>>>>>>                            const char *ts_now_str, bool sb_readonly,
>>>>>>> @@ -160,7 +162,10 @@ void local_binding_set_up(struct shash *local_bindings, const char *pb_name,
>>>>>>>  void local_binding_set_down(struct shash *local_bindings, const char *pb_name,
>>>>>>>                              const struct sbrec_chassis *chassis_rec,
>>>>>>>                              bool sb_readonly, bool ovs_readonly);
>>>>>>> -
>>>>>>> +void local_binding_set_pb(struct shash *local_bindings, const char *pb_name,
>>>>>>> +                          const struct sbrec_chassis *chassis_rec,
>>>>>>> +                          struct hmap *tracked_datapaths,
>>>>>>> +                          bool is_set);
>>>>>>>  void binding_register_ovs_idl(struct ovsdb_idl *);
>>>>>>>  void binding_run(struct binding_ctx_in *, struct binding_ctx_out *);
>>>>>>>  bool binding_cleanup(struct ovsdb_idl_txn *ovnsb_idl_txn,
>>>>>>> @@ -178,6 +183,10 @@ void binding_dump_local_bindings(struct local_binding_data *, struct ds *);
>>>>>>>  bool is_additional_chassis(const struct sbrec_port_binding *pb,
>>>>>>>                             const struct sbrec_chassis *chassis_rec);
>>>>>>>
>>>>>>> +void set_pb_chassis_in_sbrec(const struct sbrec_port_binding *pb,
>>>>>>> +                             const struct sbrec_chassis *chassis_rec,
>>>>>>> +                             bool is_set);
>>>>>>> +
>>>>>>>  /* Corresponds to each Port_Binding.type. */
>>>>>>>  enum en_lport_type {
>>>>>>>      LP_UNKNOWN,
>>>>>>> diff --git a/controller/if-status.c b/controller/if-status.c
>>>>>>> index ad61844d8..7693c289b 100644
>>>>>>> --- a/controller/if-status.c
>>>>>>> +++ b/controller/if-status.c
>>>>>>> @@ -24,6 +24,7 @@
>>>>>>>  #include "lib/util.h"
>>>>>>>  #include "timeval.h"
>>>>>>>  #include "openvswitch/vlog.h"
>>>>>>> +#include "lib/ovn-sb-idl.h"
>>>>>>>
>>>>>>>  VLOG_DEFINE_THIS_MODULE(if_status);
>>>>>>>
>>>>>>> @@ -53,9 +54,11 @@ VLOG_DEFINE_THIS_MODULE(if_status);
>>>>>>>   */
>>>>>>>
>>>>>>>  enum if_state {
>>>>>>> -    OIF_CLAIMED,       /* Newly claimed interface. */
>>>>>>> -    OIF_INSTALL_FLOWS, /* Already claimed interface for which flows are still
>>>>>>> -                        * being installed.
>>>>>>> +    OIF_CLAIMED,       /* Newly claimed interface. pb->chassis not yet updated.
>>>>>>> +                        */
>>>>>>> +    OIF_INSTALL_FLOWS, /* Claimed interface with pb->chassis successfully
>>>>>>> +                        * updated in SB and for which flows are still being
>>>>>>> +                        * installed.
>>>>>>>                          */
>>>>>>>      OIF_MARK_UP,       /* Interface with flows successfully installed in OVS
>>>>>>>                          * but not yet marked "up" in the binding module (in
>>>>>>> @@ -78,6 +81,55 @@ static const char *if_state_names[] = {
>>>>>>>      [OIF_INSTALLED]     = "INSTALLED",
>>>>>>>  };
>>>>>>>
>>>>>>> +/*
>>>>>>> + *       +----------------------+
>>>>>>> + * +---> |                      |
>>>>>>> + * | +-> |         NULL         | <--------------------------------------+++-+
>>>>>>> + * | |   +----------------------+                                            |
>>>>>>> + * | |     ^ release_iface   | claim_iface                                   |
>>>>>>> + * | |     |                 V - sbrec_update_chassis(if sb is rw)           |
>>>>>>> + * | |   +----------------------+                                            |
>>>>>>> + * | |   |                      | <----------------------------------------+ |
>>>>>>> + * | |   |       CLAIMED        | <--------------------------------------+ | |
>>>>>>> + * | |   +----------------------+                                        | | |
>>>>>>> + * | |                  | mgr_update(when sb is rw)                      | | |
>>>>>>> + * | | release_iface    |  - sbrec_update_chassis                        | | |
>>>>>>> + * | |                  |  - request seqno                               | | |
>>>>>>> + * | |                  V                                                | | |
>>>>>>> + * | |   +----------------------+                                        | | |
>>>>>>> + * | +-- |                      |  mgr_run(seqno not rcvd)               | | |
>>>>>>> + * |     |    INSTALL_FLOWS     |   - set port down in sb                | | |
>>>>>>> + * |     |                      |  mgr_update()                          | | |
>>>>>>> + * |     +----------------------+   - sbrec_update_chassis if needed     | | |
>>>>>>> + * |                    |                                                | | |
>>>>>>> + * |                    |  mgr_run(seqno rcvd)                           | | |
>>>>>>> + * |                    |  - set port up in sb                           | | |
>>>>>>> + * | release_iface      |  - set ovn-installed in ovs                    | | |
>>>>>>> + * |                    V                                                | | |
>>>>>>> + * |   +----------------------+                                          | | |
>>>>>>> + * |   |                      |  mgr_run()                               | | |
>>>>>>> + * +-- |       MARK_UP        |  - set port up in sb                     | | |
>>>>>>> + *     |                      |  - set ovn-installed in ovs              | | |
>>>>>>> + *     |                      |  mgr_update()                            | | |
>>>>>>> + *     +----------------------+  - sbrec_update_chassis if needed        | | |
>>>>>>> + *             |                                                         | | |
>>>>>>> + *             | mgr_update(rcvd port up / ovn_installed & chassis set) | | |
>>>>>>> + *             V                                                         | | |
>>>>>>> + *     +----------------------+                                          | | |
>>>>>>> + *     |      INSTALLED       | ------------> claim_iface ---------------+ | |
>>>>>>> + *     +----------------------+                                            | |
>>>>>>> + *             |                                                           | |
>>>>>>> + *             | release_iface                                             | |
>>>>>>> + *             V                                                           | |
>>>>>>> + *     +----------------------+                                            | |
>>>>>>> + *     |                      | ------------> claim_iface -----------------+ |
>>>>>>> + *     |      MARK_DOWN       | ------> mgr_update(rcvd port down) ----------+
>>>>>>> + *     |                      | mgr_run()
>>>>>>> + *     |                      | - set port down in sb
>>>>>>> + *     |                      | mgr_update()
>>>>>>> + *     +----------------------+ - sbrec_update_chassis(NULL)
>>>>>>> + */
>>>>>>> +
>>>>>>>  struct ovs_iface {
>>>>>>>      char *id;               /* Extracted from OVS external_ids.iface_id. */
>>>>>>>      enum if_state state;    /* State of the interface in the state machine. */
>>>>>>> @@ -85,6 +137,7 @@ struct ovs_iface {
>>>>>>>                               * be fully programmed in OVS.  Only used in state
>>>>>>>                               * OIF_INSTALL_FLOWS.
>>>>>>>                               */
>>>>>>> +    bool chassis_update_required;  /* If true, pb->chassis must be updated. */
>>>>>>>  };
>>>>>>>
>>>>>>>  static uint64_t ifaces_usage;
>>>>>>> @@ -158,14 +211,23 @@ if_status_mgr_destroy(struct if_status_mgr *mgr)
>>>>>>>  }
>>>>>>>
>>>>>>>  void
>>>>>>> -if_status_mgr_claim_iface(struct if_status_mgr *mgr, const char *iface_id)
>>>>>>> +if_status_mgr_claim_iface(struct if_status_mgr *mgr,
>>>>>>> +                          const struct sbrec_port_binding *pb,
>>>>>>> +                          const struct sbrec_chassis *chassis_rec,
>>>>>>> +                          bool sb_readonly)
>>>>>>>  {
>>>>>>> +    const char *iface_id = pb->logical_port;
>>>>>>>      struct ovs_iface *iface = shash_find_data(&mgr->ifaces, iface_id);
>>>>>>>
>>>>>>>      if (!iface) {
>>>>>>>          iface = ovs_iface_create(mgr, iface_id, OIF_CLAIMED);
>>>>>>>      }
>>>>>>> -
>>>>>>> +    if (!sb_readonly) {
>>>>>>> +        set_pb_chassis_in_sbrec(pb, chassis_rec, true);
>>>>>>> +        iface->chassis_update_required = false;
>>>>>>> +    } else {
>>>>>>> +        iface->chassis_update_required = true;
>>>>>>> +    }
>>>>>>>      switch (iface->state) {
>>>>>>>      case OIF_CLAIMED:
>>>>>>>      case OIF_INSTALL_FLOWS:
>>>>>>> @@ -182,6 +244,12 @@ if_status_mgr_claim_iface(struct if_status_mgr *mgr, const char *iface_id)
>>>>>>>      }
>>>>>>>  }
>>>>>>>
>>>>>>> +bool
>>>>>>> +if_status_mgr_iface_is_present(struct if_status_mgr *mgr, const char *iface_id)
>>>>>>> +{
>>>>>>> +    return !!shash_find_data(&mgr->ifaces, iface_id);
>>>>>>> +}
>>>>>>> +
>>>>>>>  void
>>>>>>>  if_status_mgr_release_iface(struct if_status_mgr *mgr, const char *iface_id)
>>>>>>>  {
>>>>>>> @@ -246,9 +314,39 @@ if_status_mgr_delete_iface(struct if_status_mgr *mgr, const char *iface_id)
>>>>>>>      }
>>>>>>>  }
>>>>>>>
>>>>>>> +bool
>>>>>>> +if_status_handle_claims(struct if_status_mgr *mgr,
>>>>>>> +                        struct local_binding_data *binding_data,
>>>>>>> +                        const struct sbrec_chassis *chassis_rec,
>>>>>>> +                        struct hmap *tracked_datapath,
>>>>>>> +                        bool sb_readonly)
>>>>>>> +{
>>>>>>> +    if (!binding_data || sb_readonly) {
>>>>>>> +        return false;
>>>>>>> +    }
>>>>>>> +
>>>>>>> +    struct shash *bindings = &binding_data->bindings;
>>>>>>> +    struct hmapx_node *node;
>>>>>>> +
>>>>>>> +    bool rc = false;
>>>>>>> +    HMAPX_FOR_EACH (node, &mgr->ifaces_per_state[OIF_CLAIMED]) {
>>>>>>> +        struct ovs_iface *iface = node->data;
>>>>>>> +        if (iface->chassis_update_required) {
>>>>>>
>>>>>> Thanks Xavier for the revision. The state machine looks clearer now,
>>>>>> but I have a major concern about the use of chassis_update_required. This
>>>>>> bool flag is used to decide if an update to SB is needed, and once a SB
>>>>>> update is requested, it is set to false, and assumes the SB update will
>>>>>> succeed immediately. However, the assumption may be wrong. There can be
>>>>>> different kinds of reasons that the subsequent SB update fails, or is
>>>>>> delayed, so this flag is not reliable. Instead, the CLAIMED state should
>>>>>> have the responsibility to make sure the SB update is completed. If the
>>>>>> transaction is in-progress, sb_readonly is true. So if sb_readonly is
>>>>>> false, it means nothing is in-progress, so we can always check if
>>>>>> (!sb_readonly && <SB chassis is not updated for the port-binding>) and
>>>>>> just send the update, regardless of whether we have requested it before.
>>>>>> Please also see another comment below for the state transition.
>>>>>>
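
One possible reading of the review comment above, as a minimal sketch only.
The types and function names here are hypothetical, simplified stand-ins (not
the patch's code and not OVN's API): an interface in CLAIMED state keeps no
one-shot flag; on every iteration it re-checks the SB value and re-sends the
update whenever SB is writable and the chassis is still not set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the real ovn-controller types. */
enum sketch_state { SK_CLAIMED, SK_INSTALL_FLOWS };

struct sketch_pb {
    const char *chassis;    /* Chassis currently recorded in SB, or NULL. */
};

struct sketch_iface {
    enum sketch_state state;
    int updates_sent;       /* Number of SB chassis updates requested. */
};

static bool
sketch_chassis_matches(const struct sketch_pb *pb, const char *chassis)
{
    return pb->chassis && !strcmp(pb->chassis, chassis);
}

/* One main-loop iteration for an interface in CLAIMED state.
 * - If SB already shows this chassis, the claim is confirmed: move on.
 * - Otherwise, if SB is writable, (re)send the update, regardless of
 *   whether one was requested before.
 * - Otherwise (SB read-only), wait for the next iteration. */
static void
sketch_claimed_iterate(struct sketch_iface *iface, const struct sketch_pb *pb,
                       const char *this_chassis, bool sb_readonly)
{
    assert(iface && pb && this_chassis);
    if (iface->state != SK_CLAIMED) {
        return;
    }
    if (sketch_chassis_matches(pb, this_chassis)) {
        iface->state = SK_INSTALL_FLOWS;
    } else if (!sb_readonly) {
        iface->updates_sent++;  /* Stands in for the real SB write. */
    }
}
```

In this sketch a lost or failed update is harmless: the next iteration with a
writable SB simply sends it again, and the state only advances once the SB
contents confirm the claim.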
>>>>>>> +            VLOG_INFO("if_status_handle_claims for %s", iface->id);
>>>>>>> +            local_binding_set_pb(bindings, iface->id, chassis_rec,
>>>>>>> +                                 tracked_datapath, true);
>>>>>>> +            rc = true;
>>>>>>> +        }
>>>>>>> +        iface->chassis_update_required = false;
>>>>>>> +    }
>>>>>>> +    return rc;
>>>>>>> +}
>>>>>>> +
>>>>>>>  void
>>>>>>>  if_status_mgr_update(struct if_status_mgr *mgr,
>>>>>>> -                     struct local_binding_data *binding_data)
>>>>>>> +                     struct local_binding_data *binding_data,
>>>>>>> +                     const struct sbrec_chassis *chassis_rec,
>>>>>>> +                     bool sb_readonly)
>>>>>>>  {
>>>>>>>      if (!binding_data) {
>>>>>>>          return;
>>>>>>> @@ -257,13 +355,25 @@ if_status_mgr_update(struct if_status_mgr
> *mgr,
>>>>>>>      struct shash *bindings = &binding_data->bindings;
>>>>>>>      struct hmapx_node *node;
>>>>>>>
>>>>>>> +    /* Interfaces in OIF_MARK_UP state have already set their
>>>>>> pb->chassis.
>>>>>>> +     * However, it might have been reset by another hv.
>>>>>>> +     */
>>>>>>>      /* Move all interfaces that have been confirmed "up" by the
>>> binding
>>>>>> module,
>>>>>>>       * from OIF_MARK_UP to OIF_INSTALLED.
>>>>>>>       */
>>>>>>>      HMAPX_FOR_EACH_SAFE (node,
> &mgr->ifaces_per_state[OIF_MARK_UP]) {
>>>>>>>          struct ovs_iface *iface = node->data;
>>>>>>>
>>>>>>> -        if (local_binding_is_up(bindings, iface->id)) {
>>>>>>> +        if (iface->chassis_update_required) {
>>>>>>> +            if (!sb_readonly) {
>>>>>>> +                iface->chassis_update_required = false;
>>>>>>> +                local_binding_set_pb(bindings, iface->id,
>>> chassis_rec,
>>>>>>> +                                     NULL, true);
>>>>>>> +            } else {
>>>>>>> +                continue;
>>>>>>> +            }
>>>>>>> +        }
>>>>>>> +        if (local_binding_is_up(bindings, iface->id, chassis_rec))
> {
>>>>>>>              ovs_iface_set_state(mgr, iface, OIF_INSTALLED);
>>>>>>>          }
>>>>>>>      }
>>>>>>> @@ -274,23 +384,53 @@ if_status_mgr_update(struct if_status_mgr
> *mgr,
>>>>>>>      HMAPX_FOR_EACH_SAFE (node,
>>> &mgr->ifaces_per_state[OIF_MARK_DOWN]) {
>>>>>>>          struct ovs_iface *iface = node->data;
>>>>>>>
>>>>>>> -        if (local_binding_is_down(bindings, iface->id)) {
>>>>>>> +        if (!sb_readonly) {
>>>>>>> +            local_binding_set_pb(bindings, iface->id, chassis_rec,
>>>>>>> +                                 NULL, false);
>>>>>>> +        }
>>>>>>> +        if (local_binding_is_down(bindings, iface->id,
> chassis_rec))
>>> {
>>>>>>>              ovs_iface_destroy(mgr, iface);
>>>>>>>          }
>>>>>>>      }
>>>>>>>
>>>>>>> -    /* Register for a notification about flows being installed in
> OVS
>>>>>> for all
>>>>>>> -     * newly claimed interfaces.
>>>>>>> +    if (!sb_readonly) {
>>>>>>> +        HMAPX_FOR_EACH_SAFE (node,
>>>>>> &mgr->ifaces_per_state[OIF_INSTALL_FLOWS]) {
>>>>>>> +            struct ovs_iface *iface = node->data;
>>>>>>> +
>>>>>>> +            if (iface->chassis_update_required) {
>>>>>>> +                iface->chassis_update_required = false;
>>>>>>> +                local_binding_set_pb(bindings, iface->id,
>>> chassis_rec,
>>>>>>> +                                     NULL, true);
>>>>>>> +            }
>>>>>>> +        }
>>>>>>> +    }
>>>>>>> +
>>>>>>> +    /* Update Port_Binding->chassis for newly claimed interfaces
>>>>>>> +     * Register for a notification about flows being installed in
> OVS
>>>>>> for all
>>>>>>> +     * newly claimed interfaces for which we could update
>>> pb->chassis.
>>>>>>>       *
>>>>>>>       * Move them from OIF_CLAIMED to OIF_INSTALL_FLOWS.
>>>>>>>       */
>>>>>>> -    bool new_ifaces = false;
>>>>>>> -    HMAPX_FOR_EACH_SAFE (node,
> &mgr->ifaces_per_state[OIF_CLAIMED]) {
>>>>>>> -        struct ovs_iface *iface = node->data;
>>>>>>>
>>>>>>> -        ovs_iface_set_state(mgr, iface, OIF_INSTALL_FLOWS);
>>>>>>> -        iface->install_seqno = mgr->iface_seqno + 1;
>>>>>>> -        new_ifaces = true;
>>>>>>> +    bool new_ifaces = false;
>>>>>>> +    if (!sb_readonly) {
>>>>>>> +        HMAPX_FOR_EACH_SAFE (node,
>>> &mgr->ifaces_per_state[OIF_CLAIMED])
>>>>>> {
>>>>>>> +            struct ovs_iface *iface = node->data;
>>>>>>> +            /* No need to check for chassis_update_required as
>>> already
>>>>>> done
>>>>>>> +             * in if_status_handle_claims or
>>> if_status_mgr_claim_iface
>>>>>>> +             */
>>>>>>> +            ovs_iface_set_state(mgr, iface, OIF_INSTALL_FLOWS);
>>>>>>
>>>>>> We need to make sure the chassis in the SB port-binding is up-to-date
>>>>>> (i.e. the update notification from the SB DB has been received) before
>>>>>> moving to INSTALL_FLOWS. Otherwise, it is still possible that the state is
>>>>>> moved too early and we end up with incomplete flow installation for the
>>>>>> lport when the state is finally moved to INSTALLED.
>>>>>>
>>>>>> Thanks,
>>>>>> Han
>>>>>>
>>>>>>> +            iface->install_seqno = mgr->iface_seqno + 1;
>>>>>>> +            new_ifaces = true;
>>>>>>> +        }
>>>>>>> +    } else {
>>>>>>> +        HMAPX_FOR_EACH_SAFE (node,
>>> &mgr->ifaces_per_state[OIF_CLAIMED])
>>>>>> {
>>>>>>> +            struct ovs_iface *iface = node->data;
>>>>>>> +            static struct vlog_rate_limit rl =
>>> VLOG_RATE_LIMIT_INIT(5,
>>>>>> 1);
>>>>>>> +            VLOG_INFO_RL(&rl,
>>>>>>> +                         "Not updating pb chassis for %s now as "
>>>>>>> +                         "sb is readonly", iface->id);
>>>>>>> +        }
>>>>>>>      }
>>>>>>>
>>>>>>>      /* Request a seqno update when the flows for new interfaces
> have
>>>>>> been
>>>>>>> @@ -403,7 +543,7 @@ if_status_mgr_update_bindings(struct
> if_status_mgr
>>>>>> *mgr,
>>>>>>>      struct hmapx_node *node;
>>>>>>>
>>>>>>>      /* Notify the binding module to set "down" all bindings that
> are
>>>>>> still
>>>>>>> -     * in the process of being installed in OVS, i.e., are not yet
>>>>>> instsalled.
>>>>>>> +     * in the process of being installed in OVS, i.e., are not yet
>>>>>> installed.
>>>>>>>       */
>>>>>>>      HMAPX_FOR_EACH (node,
> &mgr->ifaces_per_state[OIF_INSTALL_FLOWS])
>>> {
>>>>>>>          struct ovs_iface *iface = node->data;
>>>>>>> diff --git a/controller/if-status.h b/controller/if-status.h
>>>>>>> index bb8a3950d..f9b05d30d 100644
>>>>>>> --- a/controller/if-status.h
>>>>>>> +++ b/controller/if-status.h
>>>>>>> @@ -27,15 +27,27 @@ struct if_status_mgr
> *if_status_mgr_create(void);
>>>>>>>  void if_status_mgr_clear(struct if_status_mgr *);
>>>>>>>  void if_status_mgr_destroy(struct if_status_mgr *);
>>>>>>>
>>>>>>> -void if_status_mgr_claim_iface(struct if_status_mgr *, const char
>>>>>> *iface_id);
>>>>>>> +void if_status_mgr_claim_iface(struct if_status_mgr *,
>>>>>>> +                               const struct sbrec_port_binding *pb,
>>>>>>> +                               const struct sbrec_chassis
>>> *chassis_rec,
>>>>>>> +                               bool sb_readonly);
>>>>>>>  void if_status_mgr_release_iface(struct if_status_mgr *, const char
>>>>>> *iface_id);
>>>>>>>  void if_status_mgr_delete_iface(struct if_status_mgr *, const char
>>>>>> *iface_id);
>>>>>>>
>>>>>>> -void if_status_mgr_update(struct if_status_mgr *, struct
>>>>>> local_binding_data *);
>>>>>>> +void if_status_mgr_update(struct if_status_mgr *, struct
>>>>>> local_binding_data *,
>>>>>>> +                          const struct sbrec_chassis *chassis,
>>>>>>> +                          bool sb_readonly);
>>>>>>>  void if_status_mgr_run(struct if_status_mgr *mgr, struct
>>>>>> local_binding_data *,
>>>>>>>                         const struct sbrec_chassis *,
>>>>>>>                         bool sb_readonly, bool ovs_readonly);
>>>>>>>  void if_status_mgr_get_memory_usage(struct if_status_mgr *mgr,
>>>>>>>                                      struct simap *usage);
>>>>>>> +bool if_status_mgr_iface_is_present(struct if_status_mgr *mgr,
>>>>>>> +                                    const char *iface_id);
>>>>>>> +bool if_status_handle_claims(struct if_status_mgr *mgr,
>>>>>>> +                             struct local_binding_data
> *binding_data,
>>>>>>> +                             const struct sbrec_chassis
> *chassis_rec,
>>>>>>> +                             struct hmap *tracked_datapath,
>>>>>>> +                             bool sb_readonly);
>>>>>>>
>>>>>>>  # endif /* controller/if-status.h */
>>>>>>> diff --git a/controller/ovn-controller.c
> b/controller/ovn-controller.c
>>>>>>> index 69615308e..3947baf03 100644
>>>>>>> --- a/controller/ovn-controller.c
>>>>>>> +++ b/controller/ovn-controller.c
>>>>>>> @@ -1464,6 +1464,73 @@ en_runtime_data_run(struct engine_node *node,
>>>>>> void *data)
>>>>>>>      engine_set_node_state(node, EN_UPDATED);
>>>>>>>  }
>>>>>>>
>>>>>>> +struct ed_type_sb_ro {
>>>>>>> +    bool sb_readonly;
>>>>>>> +};
>>>>>>> +
>>>>>>> +static void *
>>>>>>> +en_sb_ro_init(struct engine_node *node OVS_UNUSED,
>>>>>>> +              struct engine_arg *arg OVS_UNUSED)
>>>>>>> +{
>>>>>>> +    struct ed_type_sb_ro *data = xzalloc(sizeof *data);
>>>>>>> +    return data;
>>>>>>> +}
>>>>>>> +
>>>>>>> +static void
>>>>>>> +en_sb_ro_run(struct engine_node *node, void *data)
>>>>>>> +{
>>>>>>> +    struct ed_type_sb_ro *sb_ro_data = data;
>>>>>>> +    bool sb_readonly = !engine_get_context()->ovnsb_idl_txn;
>>>>>>> +    if (sb_ro_data->sb_readonly != sb_readonly) {
>>>>>>> +        sb_ro_data->sb_readonly = sb_readonly;
>>>>>>> +        if (!sb_ro_data->sb_readonly) {
>>>>>>> +            engine_set_node_state(node, EN_UPDATED);
>>>>>>> +        }
>>>>>>> +    }
>>>>>>> +}
>>>>>>> +
>>>>>>> +static void
>>>>>>> +en_sb_ro_cleanup(void *data OVS_UNUSED)
>>>>>>> +{
>>>>>>> +}
>>>>>>> +
>>>>>>> +static bool
>>>>>>> +runtime_data_sb_ro_handler(struct engine_node *node, void *data)
>>>>>>> +{
>>>>>>> +    const struct sbrec_chassis *chassis = NULL;
>>>>>>> +
>>>>>>> +    struct ovsrec_open_vswitch_table *ovs_table =
>>>>>>> +        (struct ovsrec_open_vswitch_table *)EN_OVSDB_GET(
>>>>>>> +            engine_get_input("OVS_open_vswitch", node));
>>>>>>> +
>>>>>>> +    const char *chassis_id = get_ovs_chassis_id(ovs_table);
>>>>>>> +
>>>>>>> +    struct ovsdb_idl_index *sbrec_chassis_by_name =
>>>>>>> +        engine_ovsdb_node_get_index(
>>>>>>> +                engine_get_input("SB_chassis", node),
>>>>>>> +                "name");
>>>>>>> +
>>>>>>> +    if (chassis_id) {
>>>>>>> +        chassis = chassis_lookup_by_name(sbrec_chassis_by_name,
>>>>>> chassis_id);
>>>>>>> +    }
>>>>>>> +    if (chassis) {
>>>>>>> +        struct ed_type_runtime_data *rt_data = data;
>>>>>>> +        bool sb_readonly = !engine_get_context()->ovnsb_idl_txn;
>>>>>>> +        struct controller_engine_ctx *ctrl_ctx =
>>>>>>> +            engine_get_context()->client_ctx;
>>>>>>> +
>>>>>>> +        if (if_status_handle_claims(ctrl_ctx->if_mgr,
>>>>>>> +                                    &rt_data->lbinding_data,
>>>>>>> +                                    chassis,
>>>>>>> +                                    &rt_data->tracked_dp_bindings,
>>>>>>> +                                    sb_readonly)) {
>>>>>>> +            engine_set_node_state(node, EN_UPDATED);
>>>>>>> +            rt_data->tracked = true;
>>>>>>> +        }
>>>>>>> +    }
>>>>>>> +    return true;
>>>>>>> +}
>>>>>>> +
>>>>>>>  static bool
>>>>>>>  runtime_data_ovs_interface_shadow_handler(struct engine_node *node,
>>>>>> void *data)
>>>>>>>  {
>>>>>>> @@ -3528,6 +3595,7 @@ main(int argc, char *argv[])
>>>>>>>      stopwatch_create(VIF_PLUG_RUN_STOPWATCH_NAME, SW_MS);
>>>>>>>
>>>>>>>      /* Define inc-proc-engine nodes. */
>>>>>>> +    ENGINE_NODE(sb_ro, "sb_ro");
>>>>>>>      ENGINE_NODE_WITH_CLEAR_TRACK_DATA_IS_VALID(ct_zones,
> "ct_zones");
>>>>>>>      ENGINE_NODE_WITH_CLEAR_TRACK_DATA(ovs_interface_shadow,
>>>>>>>                                        "ovs_interface_shadow");
>>>>>>> @@ -3664,6 +3732,7 @@ main(int argc, char *argv[])
>>>>>>>      engine_add_input(&en_ovs_interface_shadow, &en_ovs_interface,
>>>>>>>                       ovs_interface_shadow_ovs_interface_handler);
>>>>>>>
>>>>>>> +    engine_add_input(&en_runtime_data, &en_sb_ro,
>>>>>> runtime_data_sb_ro_handler);
>>>>>>>      engine_add_input(&en_runtime_data, &en_ofctrl_is_connected,
>>> NULL);
>>>>>>>
>>>>>>>      engine_add_input(&en_runtime_data, &en_ovs_open_vswitch, NULL);
>>>>>>> @@ -4098,7 +4167,8 @@ main(int argc, char *argv[])
>>>>>>>                          runtime_data ?
> &runtime_data->lbinding_data :
>>>>>> NULL;
>>>>>>>
>>>  stopwatch_start(IF_STATUS_MGR_UPDATE_STOPWATCH_NAME,
>>>>>>>                                      time_msec());
>>>>>>> -                    if_status_mgr_update(if_mgr, binding_data);
>>>>>>> +                    if_status_mgr_update(if_mgr, binding_data,
>>> chassis,
>>>>>>> +                                         !ovnsb_idl_txn);
>>>>>>>
>>>  stopwatch_stop(IF_STATUS_MGR_UPDATE_STOPWATCH_NAME,
>>>>>>>                                     time_msec());
>>>>>>>
>>>>>>> diff --git a/tests/ovn-macros.at b/tests/ovn-macros.at
>>>>>>> index 335f9158c..8fd6ae6f7 100644
>>>>>>> --- a/tests/ovn-macros.at
>>>>>>> +++ b/tests/ovn-macros.at
>>>>>>> @@ -759,3 +759,15 @@
>>> m4_define([OVN_FOR_EACH_NORTHD_WITHOUT_DP_GROUPS],
>>>>>>>       [m4_foreach([NORTHD_USE_DP_GROUPS], [no],
>>>>>>>         [m4_foreach([NORTHD_USE_PARALLELIZATION], [yes, no], [$1
>>>>>>>  ])])])])
>>>>>>> +
>>>>>>> +# OVN_NBCTL(NBCTL_COMMAND) adds NBCTL_COMMAND to list of commands
> to
>>> be
>>>>>> run by RUN_OVN_NBCTL().
>>>>>>> +m4_define([OVN_NBCTL], [
>>>>>>> +    command="${command} -- $1"
>>>>>>> +])
>>>>>>> +
>>>>>>> +# RUN_OVN_NBCTL() executes list of commands built by the
> OVN_NBCTL()
>>>>>> macro.
>>>>>>> +m4_define([RUN_OVN_NBCTL], [
>>>>>>> +    check ovn-nbctl ${command}
>>>>>>> +    unset command
>>>>>>> +])
>>>>>>> +
>>>>>>> diff --git a/tests/ovn.at b/tests/ovn.at
>>>>>>> index bfaa41962..94d16bac9 100644
>>>>>>> --- a/tests/ovn.at
>>>>>>> +++ b/tests/ovn.at
>>>>>>> @@ -102,6 +102,18 @@ m4_divert_text([PREPARE_TESTS],
>>>>>>>           test 1 -le $(as $hv1 ovs-ofctl dump-flows br-int | grep -c
>>>>>> "output:$ofport")
>>>>>>>       ])
>>>>>>>     }
>>>>>>> +
>>>>>>> +   ovn_wait_remote_input_flows () {
>>>>>>> +     hv1=$1
>>>>>>> +     hv2=$2
>>>>>>> +     echo "$3: waiting for flows for remote input on $hv1"
>>>>>>> +     # Wait for a flow outputing  to remote input
>>>>>>> +     OVS_WAIT_UNTIL([
>>>>>>> +         ofport=$(as $hv1 ovs-vsctl --bare --columns ofport find
>>>>>> Interface name=ovn-${hv2}-0)
>>>>>>> +         echo "tunnel port=$ofport"
>>>>>>> +         test 1 -le $(as $hv1 ovs-ofctl dump-flows br-int | grep -c
>>>>>> "in_port=$ofport")
>>>>>>> +     ])
>>>>>>> +   }
>>>>>>>  ])
>>>>>>>
>>>>>>>  m4_define([OVN_CHECK_PACKETS],
>>>>>>> @@ -127,6 +139,8 @@ m4_define([OVN_WAIT_PATCH_PORT_FLOWS],
>>>>>>>  m4_define([OVN_WAIT_REMOTE_OUTPUT_FLOWS],
>>>>>>>    [ovn_wait_remote_output_flows "$1" "$2" "__file__:__line__"])
>>>>>>>
>>>>>>> +m4_define([OVN_WAIT_REMOTE_INPUT_FLOWS],
>>>>>>> +  [ovn_wait_remote_input_flows "$1" "$2" "__file__:__line__"])
>>>>>>>
>>>>>>>  AT_BANNER([OVN components])
>>>>>>>
>>>>>>> @@ -14056,6 +14070,11 @@ wait_column "$hv1_uuid" Port_Binding
>>>>>> requested_chassis logical_port=lsp0
>>>>>>>  wait_column "$hv2_uuid" Port_Binding additional_chassis
>>>>>> logical_port=lsp0
>>>>>>>  wait_column "$hv2_uuid" Port_Binding requested_additional_chassis
>>>>>> logical_port=lsp0
>>>>>>>
>>>>>>> +# Check ovn-installed updated for main chassis
>>>>>>> +wait_for_ports_up
>>>>>>> +OVS_WAIT_UNTIL([test `as hv1 ovs-vsctl get Interface lsp0
>>>>>> external_ids:ovn-installed` = '"true"'])
>>>>>>> +OVS_WAIT_UNTIL([test x`as hv2 ovs-vsctl get Interface lsp0
>>>>>> external_ids:ovn-installed` = x])
>>>>>>> +
>>>>>>>  # Check that setting iface:encap-ip populates
>>>>>> Port_Binding:additional_encap
>>>>>>>  wait_row_count Encap 2 chassis_name=hv1
>>>>>>>  wait_row_count Encap 2 chassis_name=hv2
>>>>>>> @@ -14081,6 +14100,11 @@ wait_column "$hv2_uuid" Port_Binding
>>>>>> requested_chassis logical_port=lsp0
>>>>>>>  wait_column "" Port_Binding additional_chassis logical_port=lsp0
>>>>>>>  wait_column "" Port_Binding requested_additional_chassis
>>>>>> logical_port=lsp0
>>>>>>>
>>>>>>> +# Check ovn-installed updated for main chassis and not for other
>>> chassis
>>>>>>> +wait_for_ports_up
>>>>>>> +OVS_WAIT_UNTIL([test `as hv2 ovs-vsctl get Interface lsp0
>>>>>> external_ids:ovn-installed` = '"true"'])
>>>>>>> +OVS_WAIT_UNTIL([test x`as hv1 ovs-vsctl get Interface lsp0
>>>>>> external_ids:ovn-installed` = x])
>>>>>>> +
>>>>>>>  # Check that additional_encap is cleared
>>>>>>>  wait_column "" Port_Binding additional_encap logical_port=lsp0
>>>>>>>
>>>>>>> @@ -15370,7 +15394,10 @@ AT_CHECK([as hv1 ovs-ofctl dump-flows
> br-int
>>>>>> table=65 | grep actions=output:1],
>>>>>>>  echo "verifying that lsp0 binding moves when requested-chassis is
>>>>>> changed"
>>>>>>>
>>>>>>>  ovn-nbctl lsp-set-options lsp0 requested-chassis=hv2
>>>>>>> -OVS_WAIT_UNTIL([test 1 = $(grep -c "Releasing lport lsp0 from this
>>>>>> chassis" hv1/ovn-controller.log)])
>>>>>>> +
>>>>>>> +# We might see multiple "Releasing lport ...", when sb is read only
>>>>>>> +OVS_WAIT_UNTIL([test 1 -le $(grep -c "Releasing lport lsp0 from
> this
>>>>>> chassis" hv1/ovn-controller.log)])
>>>>>>> +
>>>>>>>  wait_column "$hv2_uuid" Port_Binding chassis logical_port=lsp0
>>>>>>>
>>>>>>>  # (6) Chassis hv2 should add flows and hv1 should not.
>>>>>>> @@ -15416,7 +15443,7 @@ AT_CHECK([as hv1 ovs-ofctl dump-flows br-int
>>>>>> table=0 | grep in_port=1], [0], [ig
>>>>>>>  AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=65 | grep
>>>>>> actions=output:1], [0], [ignore])
>>>>>>>
>>>>>>>  check ovn-nbctl --wait=hv lsp-set-options lsp0
>>>>>> requested-chassis=non-existant-chassis
>>>>>>> -OVS_WAIT_UNTIL([test 1 = $(grep -c "Releasing lport lsp0 from this
>>>>>> chassis" hv1/ovn-controller.log)])
>>>>>>> +OVS_WAIT_UNTIL([test 1 -le $(grep -c "Releasing lport lsp0 from
> this
>>>>>> chassis" hv1/ovn-controller.log)])
>>>>>>>  check ovn-nbctl --wait=hv sync
>>>>>>>  wait_column '' Port_Binding chasssi logical_port=lsp0
>>>>>>>  AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=0 | grep
>>> in_port=1],
>>>>>> [1], [])
>>>>>>> @@ -32418,3 +32445,119 @@ AT_CHECK([test $(ovn-sbctl list fdb | grep
>>> -c
>>>>>> "00:00:00:00:10:30") = 0])
>>>>>>>  OVN_CLEANUP([hv1])
>>>>>>>  AT_CLEANUP
>>>>>>>  ])
>>>>>>> +
>>>>>>> +OVN_FOR_EACH_NORTHD([
>>>>>>> +AT_SETUP([recomputes])
>>>>>>> +ovn_start
>>>>>>> +
>>>>>>> +n_hv=4
>>>>>>> +
>>>>>>> +# Add chassis
>>>>>>> +net_add n1
>>>>>>> +for i in $(seq 1 $n_hv); do
>>>>>>> +    sim_add hv$i
>>>>>>> +    as hv$i
>>>>>>> +    check ovs-vsctl add-br br-phys
>>>>>>> +    ovs-vsctl set open .
>>> external-ids:ovn-bridge-mappings=phys:br-phys
>>>>>>> +    ovn_attach n1 br-phys 192.168.0.$i 24 geneve
>>>>>>> +done
>>>>>>> +
>>>>>>> +add_switch_ports() {
>>>>>>> +    start_port=$1
>>>>>>> +    end_port=$2
>>>>>>> +    nb_hv=$3
>>>>>>> +    bulk_size=$4
>>>>>>> +    for ((i=start_port; i<end_port; )) do
>>>>>>> +        start_bulk=$i
>>>>>>> +        for hv in $(seq 1 $nb_hv); do
>>>>>>> +            end_bulk=$((start_bulk+bulk_size-1))
>>>>>>> +            for port in $(seq $start_bulk $end_bulk); do
>>>>>>> +                logical_switch_port=lsp${port}
>>>>>>> +                OVN_NBCTL(lsp-add ls1 $logical_switch_port)
>>>>>>> +                OVN_NBCTL(lsp-set-addresses $logical_switch_port
>>>>>> dynamic)
>>>>>>> +            done
>>>>>>> +            start_bulk=$((end_bulk+1))
>>>>>>> +        done
>>>>>>> +        RUN_OVN_NBCTL()
>>>>>>> +
>>>>>>> +        start_bulk=$i
>>>>>>> +        for hv in $(seq 1 $nb_hv); do
>>>>>>> +            end_bulk=$((start_bulk+bulk_size-1))
>>>>>>> +            for port in $(seq $start_bulk $end_bulk); do
>>>>>>> +                logical_switch_port=lsp${port}
>>>>>>> +                as hv$hv ovs-vsctl \
>>>>>>> +                    --no-wait -- add-port br-int vif${port} \
>>>>>>> +                    -- set Interface vif${port}
>>>>>> external_ids:iface-id=$logical_switch_port
>>>>>>> +            done
>>>>>>> +            start_bulk=$((end_bulk+1))
>>>>>>> +        done
>>>>>>> +        i=$((end_bulk+1))
>>>>>>> +    done
>>>>>>> +}
>>>>>>> +check ovn-nbctl ls-add ls1
>>>>>>> +check ovn-nbctl set Logical_Switch ls1 other_config:subnet=
>>> 10.1.0.0/16
>>>>>>> +check ovn-nbctl set Logical_Switch ls1
>>>>>> other_config:exclude_ips=10.1.255.254
>>>>>>> +
>>>>>>> +check ovn-nbctl lr-add lr1
>>>>>>> +check ovn-nbctl lsp-add ls1 lsp0 -- set Logical_Switch_Port lsp0
>>>>>> type=router options:router-port=lrp0 addresses=dynamic
>>>>>>> +check ovn-nbctl lrp-add lr1 lrp0 "f0:00:00:01:00:01"
> 10.1.255.254/16
>>>>>>> +check ovn-nbctl lr-nat-add lr1 snat 10.2.0.1 10.1.0.0/16
>>>>>>> +
>>>>>>> +lflow_run=0
>>>>>>> +check ovn-nbctl --wait=hv sync
>>>>>>> +
>>>>>>> +# Tunnel ports might not be added (yet) at this point on slow
> system.
>>>>>>> +# Wait for flows related to such ports to ensure those ports have
>>> been
>>>>>> added
>>>>>>> +# before we measure recomputes. Otherwise, ovs_interface handler
>>> might
>>>>>> be run
>>>>>>> +# afterwards for tunnel ports, causing recomputes.
>>>>>>> +for i in $(seq 1 $n_hv); do
>>>>>>> +    for j in $(seq 1 $n_hv); do
>>>>>>> +        if test $i != $j; then
>>>>>>> +            OVN_WAIT_REMOTE_INPUT_FLOWS(["hv$i"],["hv$j"])
>>>>>>> +        fi
>>>>>>> +    done
>>>>>>> +done
>>>>>>> +
>>>>>>> +for i in $(seq 1 $n_hv); do
>>>>>>> +    as hv$i
>>>>>>> +    lflow_run1=$(ovn-appctl -t ovn-controller coverage/read-counter
>>>>>> lflow_run)
>>>>>>> +    lflow_run=`expr $lflow_run1 + $lflow_run`
>>>>>>> +done
>>>>>>> +
>>>>>>> +add_switch_ports 1 1000 $n_hv 5
>>>>>>> +
>>>>>>> +wait_for_ports_up
>>>>>>> +check ovn-nbctl --wait=hv sync
>>>>>>> +
>>>>>>> +for i in $(seq 1 $n_hv); do
>>>>>>> +    pid=$(cat hv${i}/ovn-controller.pid)
>>>>>>> +    u=$((u+$(cat "/proc/$pid/stat" | awk '{print $14}')))
>>>>>>> +    s=$((s+$(cat "/proc/$pid/stat" | awk '{print $15}')))
>>>>>>> +done
>>>>>>> +
>>>>>>> +n_pid=$(cat northd/ovn-northd.pid)
>>>>>>> +n_u=$(cat "/proc/$pid/stat" | awk '{print $14}')
>>>>>>> +n_s=$(cat "/proc/$pid/stat" | awk '{print $15}')
>>>>>>> +
>>>>>>> +echo "Total Northd User Time: $n_u"
>>>>>>> +echo "Total Northd System Time: $n_s"
>>>>>>> +echo "Total Controller User Time: $u"
>>>>>>> +echo "Total Controller System Time: $s"
>>>>>>> +
>>>>>>> +lflow_run_end=0
>>>>>>> +for i in $(seq 1 $n_hv); do
>>>>>>> +    as hv$i
>>>>>>> +    lflow_run1=$(ovn-appctl -t ovn-controller coverage/read-counter
>>>>>> lflow_run)
>>>>>>> +    lflow_run_end=`expr $lflow_run1 + $lflow_run_end`
>>>>>>> +done
>>>>>>> +n_recomputes=`expr $lflow_run_end - $lflow_run`
>>>>>>> +echo "$n_recomputes recomputes"
>>>>>>> +
>>>>>>> +AT_CHECK([test $lflow_run_end == $lflow_run])
>>>>>>> +
>>>>>>> +for i in $(seq 2 $n_hv); do
>>>>>>> +    OVN_CLEANUP_SBOX([hv$i])
>>>>>>> +done
>>>>>>> +OVN_CLEANUP([hv1])
>>>>>>> +AT_CLEANUP
>>>>>>> +])
>>>>>>> diff --git a/tests/perf-northd.at b/tests/perf-northd.at
>>>>>>> index 74b69e9d4..6ec196b36 100644
>>>>>>> --- a/tests/perf-northd.at
>>>>>>> +++ b/tests/perf-northd.at
>>>>>>> @@ -76,23 +76,6 @@ m4_define([PERF_RECORD_STOP], [
>>>>>>>      PERF_RECORD_STOPWATCH(ovn-northd-loop, ["Short term average"],
>>>>>> [Average (northd-loop in msec)])
>>>>>>>  ])
>>>>>>>
>>>>>>> -# OVN_NBCTL([NBCTL_COMMAND])
>>>>>>> -#
>>>>>>> -# Add NBCTL_COMMAND to list of commands to be run by
> RUN_OVN_NBCTL().
>>>>>>> -#
>>>>>>> -m4_define([OVN_NBCTL], [
>>>>>>> -    command="${command} -- $1"
>>>>>>> -])
>>>>>>> -
>>>>>>> -# RUN_OVN_NBCTL()
>>>>>>> -#
>>>>>>> -# Execute list of commands built by the OVN_NBCTL() macro.
>>>>>>> -#
>>>>>>> -m4_define([RUN_OVN_NBCTL], [
>>>>>>> -    check ovn-nbctl ${command}
>>>>>>> -    unset command
>>>>>>> -])
>>>>>>> -
>>>>>>>  OVS_START_SHELL_HELPERS
>>>>>>>  generate_subnet () {
>>>>>>>      local a=$(printf %d $(expr $1 / 256 + 10))
>>>>>>> --
>>>>>>> 2.31.1
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
Dumitru Ceara July 13, 2022, 9:08 a.m. UTC | #10
On 7/13/22 09:40, Xavier Simonart wrote:
> Hi Han, Dumitru
> 

Hi Han, Xavier,

Sorry, I had already replied to the previous message and only then
noticed this one.

> I think that we should, as much as possible, try to achieve both goals:
> - have an accurate ovn-installed
> - do not increase latency in large scale deployments
> 

+1

> The fact that ovn-installed is sent too early for mc flows is already an
> issue today, independent of this patch.
> Fixing ovn-installed related to mc flows by delaying the state change (for
> all cases, including when there are no mc groups) might be seen as a performance
> regression.
> 

I think it will, and I'm not sure we can convince the CMS that this is
"just metrics".

> I agree that we should fix this ovn-installed issue, but it is not a
> regression added by this patch. We should enter a BZ for it.
> Per my understanding, the mc flows are updated when the SB_multicast_group
> is seen as updated by ovn-controller, due to its references to port binding.
> Other flows related to port binding are installed earlier, i.e. when
> ovn-controller writes port_binding->chassis (i.e. before it receives SB
> confirmation). So, while sending the mc flows earlier than what we do today
> might be more complex, I think it makes some kind of sense (we would send
> all those flows within the same loop).

I'm inclined to leave it as it is today if this is the only flow
we're missing.  It's a guess without testing things out, but I think
it's for the MC_FLOOD_L2 multicast group which is used only for
forwarding ARP packets originated by OVN routers or destined to a
specific OVN router.  Losing some of those packets is not a big deal.

But it might be good to confirm that this is the MC group we install the
flow for.
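
To make the trade-off in this thread concrete, here is a minimal sketch in
plain C. It is not OVN code: the toy_* names, the two-state enum and the
wait_for_confirmation switch are all made up for illustration. It models one
interface in CLAIMED. The pb->chassis update is (re)sent whenever SB is
writable and SB does not yet show our chassis (no one-shot flag), and the move
to INSTALL_FLOWS happens either as soon as the update has been requested or
only once SB confirms it, depending on the policy.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified states; the real if-status module has more. */
enum toy_state { TOY_CLAIMED, TOY_INSTALL_FLOWS };

struct toy_iface {
    enum toy_state state;
    bool sb_chassis_is_ours;  /* SB notification seen: pb->chassis is us. */
    bool write_requested;     /* An update was sent at least once. */
    int n_writes;             /* Number of times the update was sent. */
};

/* One main-loop iteration for an interface in TOY_CLAIMED.
 *
 * 'sb_readonly' is true while a previous SB transaction is in progress.
 * 'wait_for_confirmation' selects between the two policies under discussion:
 * true  = move on only after SB shows our chassis,
 * false = move on as soon as the update has been requested. */
void
toy_iface_run(struct toy_iface *iface, bool sb_readonly,
              bool wait_for_confirmation)
{
    assert(iface);
    if (iface->state != TOY_CLAIMED) {
        return;
    }

    /* No one-shot flag: as long as SB does not show our chassis and SB is
     * writable, (re)send the update.  This covers failed or lost commits. */
    if (!iface->sb_chassis_is_ours && !sb_readonly) {
        iface->write_requested = true;
        iface->n_writes++;
    }

    if (iface->sb_chassis_is_ours
        || (!wait_for_confirmation && iface->write_requested)) {
        iface->state = TOY_INSTALL_FLOWS;
    }
}
```

With wait_for_confirmation=true the interface stays in CLAIMED across a failed
or delayed transaction and keeps resending the update. With false it moves on
in the same iteration in which the update was requested, which is the case the
latency argument is about.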

Thanks,
Dumitru

> 
> Thanks
> Xavier
> 
> 
> 
> On Wed, Jul 13, 2022 at 8:28 AM Han Zhou <hzhou@ovn.org> wrote:
> 
>>
>>
>> On Tue, Jul 12, 2022 at 12:35 AM Dumitru Ceara <dceara@redhat.com> wrote:
>>>
>>> On 7/12/22 08:52, Han Zhou wrote:
>>>> On Mon, Jul 11, 2022 at 4:55 AM Dumitru Ceara <dceara@redhat.com>
>> wrote:
>>>>>
>>>>> On 7/11/22 13:31, Xavier Simonart wrote:
>>>>>> Hi Han
>>>>>>
>>>>>> Thanks for your review.
>>>>>>
>>>>>> Let me try to understand your two main concerns and the proper way to
>>>>>> fix them.
>>>>>> 1) We only try once to write pb->chassis. If the commit fails, pb->chassis
>>>>>> is not written. As the commit fails, we will recompute, but as the
>>>>>> update_required flag is not set anymore, we might end up with no
>>>>>> pb->chassis.
>>>>>> => I'll remove the flag and try to update until it's confirmed.
>>>>
>>>> Thank you!
>>>>
>>>>>> 2) The state machine, and when we move to INSTALL_FLOWS. Serializing the
>>>>>> state machine, by waiting for confirmation to be received before moving to
>>>>>> the INSTALL_FLOWS state, will delay ovn-installed compared to today. So I
>>>>>> am (still) trying to see if there is any way to prevent this in some cases.
>>>>>> Would it be correct to do this serialization (wait for pb->chassis update
>>>>>> confirmation) only when using conditional monitoring? When using
>>>>>> monitor-all, as soon as we have written pb->chassis (w/o confirmation), we
>>>>>> would move to INSTALL_FLOWS. In the loop where we wrote pb->chassis, (all)
>>>>>> the flows should be updated taking pb->chassis into account.
>>>>>
>>>>> I think this approach is OK.  ovn-controllers that don't use
>> conditional
>>>>> monitoring already know the complete SB contents and should be able to
>>>>> install (mostly?) complete sets of openflows that correspond to a
>> given
>>>>> Port_Binding.
>>>>>
>>>>> It's also "CMS-friendly", at least for ovn-kubernetes, which uses
>>>>> ovn-monitor-all=true and only waits for
>>>>> OVS.Interface.external_ids:ovn-installed=true.  And shouldn't impact
>> the
>>>>> others which wait for SB.Port_Binding.up=true.
>>>>>
>>>>
>>>> Xavier and Dumitru, I think we shouldn't introduce the dependency of
>>>> "ovn-monitor-all" setting here.
>>>>
>>>> First of all, ovn-installed is a flag for CMS to understand that all the
>>>> flows related to the port-binding are installed. If we set the flag before
>>>> it is truly completed, it is a bug, and it is possible that the flag is set
>>>> but some traffic doesn't work.
>>>
>>> I think it's a matter of semantics.  The way I see "ovn-installed=true"
>>> is: all flows that are relevant to the port binding on the local chassis
>>> have been installed.  When we added it it was for the ovn-k8s case;
>>> ovn-k8s used to explicitly check if some openflow tables on the node
>>> where the pod is being brought up contained flows that seemed to
>>> correspond to the pod (e.g., matching on pod mac and IP addresses).
>>>
>> But the purpose of checking the flows (when ovn-installed wasn't
>> available) was to make sure the pod is ready to send/receive traffic. If
>> ovn-installed can provide more accuracy, why not?
>>
>>>> I did a quick test, and at least a flow in (table_id=38, priority=100)
>>>> which is multicast-group related is updated AFTER the SB notification
>> is
>>>> received for the port-binding chassis update.
>>>>
>>>
>>> This sounds like something we should fix, I think.  I don't see any
>>> multicast-group changes conditioned by the port_binding being up=true in
>>> northd.  I might be wrong though.
>>>
>>
>> It is not about "up=true". It is triggered by the port-binding->chassis
>> update. Since multicast-group has a reference to port-binding, a
>> port-binding update triggers multicast-group change handling, which is
>> required because physical flows related to the MC group need to be updated
>> when port-binding->chassis is updated. You may argue that the IDL may be
>> optimized so that the MC group change can be triggered and handled before
>> SB is updated, but I am not sure if the benefit is worth the complexity.
>> Given how OVSDB IDL transaction is designed, I'd always think a DB record
>> is *formally* updated only after the update notification is received from
>> the server, which seems to be safe and clear.
>>
>>>> Secondly, if the change hasn't made it to the SB, all the other nodes
>> would
>>>> not be able to reach the port, which means the workload (pod/VM) cannot
>>>> receive traffic yet at this phase.
>>>>
>>>
>>> Even if the change made it to the SB we have no way of knowing that all
>>> other nodes processed it so we cannot know for sure that traffic can
>>> flow properly end-to-end.  But, like I said above, this doesn't matter
>>> if the semantics of ovn-installed=true are "all locally relevant flows
>>> are installed".
>>>
>> It's true that even if SB is updated, it doesn't ensure all the nodes
>> processed it, but I view it this way: at least from the current node's
>> point of view, its job is done and the other nodes are beyond its control.
>> On the other hand, if SB update failed, its job is not done yet. I am not
>> saying this is the only *correct* way, but just the way I am seeing it :).
>>
>>>> So, I think our goal is not to set ovn-installed early, but to set it
>>>> accurately (sometimes it may be OK to be conservative).
>>>>
>>>
>>> Sure, but waiting for the SB port_binding.chassis update might introduce
>>> significant spikes in latency if the SB is compacting (or just busy) at
>>> that moment.
>>>
>>> This might become an issue in large scale deployments as pods will take
>>> longer to be declared "ready".
>>>
>> I understand your concern, but if you think about it, how the pods are
>> *declared* ready doesn't change whether they are actually ready. It
>> doesn't make the real flow setup faster or slower.
>> If the CMS really wants to declare it ready earlier, it can just ignore
>> the ovn-installed flag check or flow check. What's the real benefit except
>> for metrics?
>>
>>>> In addition, ovn-monitor-all is not always true even in ovn-k8s. It is
>>>> configurable in ovn-k8s. (In our environment we set it to false, to save
>>>> MEM and CPU on worker nodes, while sacrificing a little on the central SB
>>>> DB.)
>>>>
>>>
>>> Ack.  But for this case specifically, as SB is already busier with
>>> conditional monitoring, I think serializing events in ovn-controller
>>> will create even more visible delays in pod bringup times.
>>>
>>> Thanks,
>>> Dumitru
>>>
>>>> Thanks,
>>>> Han
>>>>>>
>>>>>> Thanks again for your feedback
>>>>>>
>>>>>> Xavier
>>>>>>
>>>>>> On Thu, Jul 7, 2022 at 8:49 AM Han Zhou <hzhou@ovn.org> wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Jun 22, 2022 at 3:23 AM Xavier Simonart <xsimonar@redhat.com>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>> When VIF ports are claimed on a chassis, SBDB Port_Binding table is updated.
>>>>>>>> If the SBDB IDL is still read-only ("in transaction") when such an update
>>>>>>>> is required, the update is not possible and recompute is triggered through
>>>>>>>> I+P failure.
>>>>>>>>
>>>>>>>> This situation can happen:
>>>>>>>> - after updating Port_Binding->chassis to SBDB for one port, in a following
>>>>>>>>   iteration, ovn-controller handles Interface:external_ids:ovn-installed
>>>>>>>>   (for the same port) while SBDB is still read-only.
>>>>>>>> - after updating Port_Binding->chassis to SBDB for one port, in a following
>>>>>>>>   iteration, ovn-controller updates Port_Binding->chassis for another port,
>>>>>>>>   while SBDB is still read-only.
>>>>>>>>
>>>>>>>> This patch prevents the recompute, by having the if-status module
>>>>>>>> update the Port_Binding chassis (if needed) when possible.
>>>>>>>> This does not delay Port_Binding chassis update compared to before this patch.
>>>>>>>> - With the patch, Port_Binding chassis will be updated as soon as SBDB is
>>>>>>>> again writable, without recompute.
>>>>>>>> - Without the patch, Port_Binding chassis was updated as soon as SBDB was
>>>>>>>> again writable, through a recompute.
>>>>>>>>
>>>>>>>> As part of this patch, ovn-installed will not be updated for additional chassis;
>>>>>>>> it will only be updated when the migration is completed.
>>>>>>>>
>>>>>>>> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2059253
>>>>>>>> Signed-off-by: Xavier Simonart <xsimonar@redhat.com>
>>>>>>>>
>>>>>>>> ---
>>>>>>>> v2:  - handled Dumitru's comments.
>>>>>>>>      - handled Han's comments, mainly ensure we moved out of CLAIMED state
>>>>>>>>        only after updating pb->chassis to guarantee physical flows are installed
>>>>>>>>        when ovn-installed is updated in OVS.
>>>>>>>>      - slightly reorganize the code to isolate 'notify_up = false' cases in
>>>>>>>>        claim_port (i.e. ports such as virtual ports), in preparation for a
>>>>>>>>        future patch preventing recomputes when virtual ports are claimed.
>>>>>>>>      - updated test case to cause more race conditions.
>>>>>>>>      - rebased on origin/main
>>>>>>>>      - note that "additional chassis" as now supported by
>>>>>>>>        "Support LSP:options:requested-chassis as a list" might still cause
>>>>>>>>        recomputes.
>>>>>>>>      - fixed missing flows when Port_Binding chassis was updated by mgr_update
>>>>>>>>        w/o any lflow recalculation.
>>>>>>>> v3:  - handled Dumitru's comments on v2, mainly have runtime_data handler
>>>>>>>>        handling pb_claims when sb becomes writable (instead of a lflow handler).
>>>>>>>>      - fixed test as it was not checking recomputes on all hv, as well as a flaky
>>>>>>>>        behavior.
>>>>>>>>      - rebased on origin/main.
>>>>>>>> ---
>>>>>>>>  controller/binding.c        | 154 +++++++++++++++++++++----------
>>>>>>>>  controller/binding.h        |  15 +++-
>>>>>>>>  controller/if-status.c      | 174 ++++++++++++++++++++++++++++++++----
>>>>>>>>  controller/if-status.h      |  16 +++-
>>>>>>>>  controller/ovn-controller.c |  72 ++++++++++++++-
>>>>>>>>  tests/ovn-macros.at         |  12 +++
>>>>>>>>  tests/ovn.at                | 147 +++++++++++++++++++++++++++++-
>>>>>>>>  tests/perf-northd.at        |  17 ----
>>>>>>>>  8 files changed, 519 insertions(+), 88 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/controller/binding.c b/controller/binding.c
>>>>>>>> index 2279570f9..b21577f71 100644
>>>>>>>> --- a/controller/binding.c
>>>>>>>> +++ b/controller/binding.c
>>>>>>>> @@ -644,11 +644,17 @@ local_binding_get_lport_ofport(const struct
>>>> shash
>>>>>>> *local_bindings,
>>>>>>>>  }
>>>>>>>>
>>>>>>>>  bool
>>>>>>>> -local_binding_is_up(struct shash *local_bindings, const char
>>>> *pb_name)
>>>>>>>> +local_binding_is_up(struct shash *local_bindings, const char
>>>> *pb_name,
>>>>>>>> +                    const struct sbrec_chassis *chassis_rec)
>>>>>>>>  {
>>>>>>>>      struct local_binding *lbinding =
>>>>>>>>          local_binding_find(local_bindings, pb_name);
>>>>>>>>      struct binding_lport *b_lport =
>>>>>>> local_binding_get_primary_lport(lbinding);
>>>>>>>> +
>>>>>>>> +    if (b_lport && b_lport->pb->chassis != chassis_rec) {
>>>>>>>> +        return false;
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>>      if (lbinding && b_lport && lbinding->iface) {
>>>>>>>>          if (b_lport->pb->n_up && !b_lport->pb->up[0]) {
>>>>>>>>              return false;
>>>>>>>> @@ -660,13 +666,23 @@ local_binding_is_up(struct shash
>>>> *local_bindings,
>>>>>>> const char *pb_name)
>>>>>>>>  }
>>>>>>>>
>>>>>>>>  bool
>>>>>>>> -local_binding_is_down(struct shash *local_bindings, const char
>>>> *pb_name)
>>>>>>>> +local_binding_is_down(struct shash *local_bindings, const char
>>>> *pb_name,
>>>>>>>> +                      const struct sbrec_chassis *chassis_rec)
>>>>>>>>  {
>>>>>>>>      struct local_binding *lbinding =
>>>>>>>>          local_binding_find(local_bindings, pb_name);
>>>>>>>>
>>>>>>>>      struct binding_lport *b_lport =
>>>>>>> local_binding_get_primary_lport(lbinding);
>>>>>>>>
>>>>>>>> +    if (b_lport) {
>>>>>>>> +        if (b_lport->pb->chassis == chassis_rec) {
>>>>>>>> +            return false;
>>>>>>>> +        } else if (b_lport->pb->chassis) {
>>>>>>>> +            VLOG_DBG("lport %s already claimed by other chassis",
>>>>>>>> +                     b_lport->pb->logical_port);
>>>>>>>> +        }
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>>      if (!lbinding) {
>>>>>>>>          return true;
>>>>>>>>      }
>>>>>>>> @@ -884,37 +900,80 @@ get_lport_type_str(enum en_lport_type
>>>> lport_type)
>>>>>>>>      OVS_NOT_REACHED();
>>>>>>>>  }
>>>>>>>>
>>>>>>>> -/* For newly claimed ports, if 'notify_up' is 'false':
>>>>>>>> +void
>>>>>>>> +set_pb_chassis_in_sbrec(const struct sbrec_port_binding *pb,
>>>>>>>> +                        const struct sbrec_chassis *chassis_rec,
>>>>>>>> +                        bool is_set)
>>>>>>>> +{
>>>>>>>> +    if (pb->chassis != chassis_rec) {
>>>>>>>> +         if (is_set) {
>>>>>>>> +            if (pb->chassis) {
>>>>>>>> +                VLOG_INFO("Changing chassis for lport %s from %s to %s.",
>>>>>>>> +                          pb->logical_port, pb->chassis->name,
>>>>>>>> +                          chassis_rec->name);
>>>>>>>> +            } else {
>>>>>>>> +                VLOG_INFO("Claiming lport %s for this chassis.",
>>>>>>>> +                          pb->logical_port);
>>>>>>>> +            }
>>>>>>>> +            for (int i = 0; i < pb->n_mac; i++) {
>>>>>>>> +                VLOG_INFO("%s: Claiming %s", pb->logical_port, pb->mac[i]);
>>>>>>>> +            }
>>>>>>>> +            sbrec_port_binding_set_chassis(pb, chassis_rec);
>>>>>>>> +        }
>>>>>>>> +    } else if (!is_set) {
>>>>>>>> +        sbrec_port_binding_set_chassis(pb, NULL);
>>>>>>>> +    }
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +void
>>>>>>>> +local_binding_set_pb(struct shash *local_bindings, const char *pb_name,
>>>>>>>> +                     const struct sbrec_chassis *chassis_rec,
>>>>>>>> +                     struct hmap *tracked_datapaths, bool is_set)
>>>>>>>> +{
>>>>>>>> +    struct local_binding *lbinding =
>>>>>>>> +        local_binding_find(local_bindings, pb_name);
>>>>>>>> +    struct binding_lport *b_lport = local_binding_get_primary_lport(lbinding);
>>>>>>>> +
>>>>>>>> +    if (b_lport) {
>>>>>>>> +        set_pb_chassis_in_sbrec(b_lport->pb, chassis_rec, is_set);
>>>>>>>> +        if (tracked_datapaths) {
>>>>>>>> +            update_lport_tracking(b_lport->pb, tracked_datapaths, true);
>>>>>>>> +        }
>>>>>>>> +    }
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +/* For newly claimed ports:
>>>>>>>>   * - set the 'pb.up' field to true if 'pb' has no 'parent_pb'.
>>>>>>>>   * - set the 'pb.up' field to true if 'parent_pb.up' is 'true'
>> (e.g.,
>>>>>>> for
>>>>>>>>   *   container and virtual ports).
>>>>>>>> - * Otherwise request a notification to be sent when the OVS flows
>>>>>>>> - * corresponding to 'pb' have been installed.
>>>>>>>> + *
>>>>>>>> + * Returns false if lport is not claimed due to 'sb_readonly'.
>>>>>>>> + * Returns true otherwise.
>>>>>>>>   *
>>>>>>>>   * Note:
>>>>>>>> - *   Updates (directly or through a notification) the 'pb->up'
>> field
>>>>>>> only if
>>>>>>>> - *   it's explicitly set to 'false'.
>>>>>>>> + *   Updates the 'pb->up' field only if it's explicitly set to
>>>> 'false'.
>>>>>>>>   *   This is to ensure compatibility with older versions of
>>>> ovn-northd.
>>>>>>>>   */
>>>>>>>> -static void
>>>>>>>> +static bool
>>>>>>>>  claimed_lport_set_up(const struct sbrec_port_binding *pb,
>>>>>>>>                       const struct sbrec_port_binding *parent_pb,
>>>>>>>> -                     const struct sbrec_chassis *chassis_rec,
>>>>>>>> -                     bool notify_up, struct if_status_mgr *if_mgr)
>>>>>>>> +                     bool sb_readonly)
>>>>>>>>  {
>>>>>>>> -    if (!notify_up) {
>>>>>>>> -        bool up = true;
>>>>>>>> -        if (!parent_pb || (parent_pb->n_up && parent_pb->up[0])) {
>>>>>>>> +    /* When notify_up is false in claim_port(), no state is
>> created
>>>>>>>> +     * by if_status_mgr. In such cases, return false (i.e. trigger
>>>>>>> recompute)
>>>>>>>> +     * if we can't update sb (because it is readonly).
>>>>>>>> +     */
>>>>>>>> +    bool up = true;
>>>>>>>> +    if (!parent_pb || (parent_pb->n_up && parent_pb->up[0])) {
>>>>>>>> +        if (!sb_readonly) {
>>>>>>>>              if (pb->n_up) {
>>>>>>>>                  sbrec_port_binding_set_up(pb, &up, 1);
>>>>>>>>              }
>>>>>>>> +        } else if (pb->n_up && !pb->up[0]) {
>>>>>>>> +            return false;
>>>>>>>>          }
>>>>>>>> -        return;
>>>>>>>> -    }
>>>>>>>> -
>>>>>>>> -    if (pb->chassis != chassis_rec || (pb->n_up && !pb->up[0])) {
>>>>>>>> -        if_status_mgr_claim_iface(if_mgr, pb->logical_port);
>>>>>>>>      }
>>>>>>>> +    return true;
>>>>>>>>  }
>>>>>>>>
>>>>>>>>  typedef void (*set_func)(const struct sbrec_port_binding *pb,
>>>>>>>> @@ -1057,37 +1116,35 @@ claim_lport(const struct sbrec_port_binding
>>>> *pb,
>>>>>>>>              struct hmap *tracked_datapaths,
>>>>>>>>              struct if_status_mgr *if_mgr)
>>>>>>>>  {
>>>>>>>> -    if (!sb_readonly) {
>>>>>>>> -        claimed_lport_set_up(pb, parent_pb, chassis_rec,
>> notify_up,
>>>>>>> if_mgr);
>>>>>>>> -    }
>>>>>>>> -
>>>>>>>>      enum can_bind can_bind =
>>>>>>> lport_can_bind_on_this_chassis(chassis_rec, pb);
>>>>>>>>      bool update_tracked = false;
>>>>>>>>
>>>>>>>>      if (can_bind == CAN_BIND_AS_MAIN) {
>>>>>>>>          if (pb->chassis != chassis_rec) {
>>>>>>>> -            if (sb_readonly) {
>>>>>>>> -                return false;
>>>>>>>> -            }
>>>>>>>> -
>>>>>>>> -            if (pb->chassis) {
>>>>>>>> -                VLOG_INFO("Changing chassis for lport %s from %s
>> to
>>>>>>> %s.",
>>>>>>>> -                        pb->logical_port, pb->chassis->name,
>>>>>>>> -                        chassis_rec->name);
>>>>>>>> -            } else {
>>>>>>>> -                VLOG_INFO("Claiming lport %s for this chassis.",
>>>>>>>> -                          pb->logical_port);
>>>>>>>> -            }
>>>>>>>> -            for (size_t i = 0; i < pb->n_mac; i++) {
>>>>>>>> -                VLOG_INFO("%s: Claiming %s", pb->logical_port,
>>>>>>> pb->mac[i]);
>>>>>>>> -            }
>>>>>>>> -
>>>>>>>> -            sbrec_port_binding_set_chassis(pb, chassis_rec);
>>>>>>>>              if (is_additional_chassis(pb, chassis_rec)) {
>>>>>>>> +                if (sb_readonly) {
>>>>>>>> +                    return false;
>>>>>>>> +                }
>>>>>>>>                  remove_additional_chassis(pb, chassis_rec);
>>>>>>>>              }
>>>>>>>>              update_tracked = true;
>>>>>>>>          }
>>>>>>>> +        if (!notify_up) {
>>>>>>>> +            if (!claimed_lport_set_up(pb, parent_pb,
>> sb_readonly)) {
>>>>>>>> +                return false;
>>>>>>>> +            }
>>>>>>>> +            if (pb->chassis != chassis_rec) {
>>>>>>>> +                if (sb_readonly) {
>>>>>>>> +                    return false;
>>>>>>>> +                }
>>>>>>>> +                set_pb_chassis_in_sbrec(pb, chassis_rec, true);
>>>>>>>> +            }
>>>>>>>> +        } else {
>>>>>>>> +            if ((pb->chassis != chassis_rec) || (pb->n_up &&
>>>>>>> !pb->up[0])) {
>>>>>>>> +                if_status_mgr_claim_iface(if_mgr, pb, chassis_rec,
>>>>>>>> +                                          sb_readonly);
>>>>>>>> +            }
>>>>>>>> +        }
>>>>>>>>      } else if (can_bind == CAN_BIND_AS_ADDITIONAL) {
>>>>>>>>          if (!is_additional_chassis(pb, chassis_rec)) {
>>>>>>>>              if (sb_readonly) {
>>>>>>>> @@ -1132,7 +1189,8 @@ claim_lport(const struct sbrec_port_binding
>> *pb,
>>>>>>>>   */
>>>>>>>>  static bool
>>>>>>>>  release_lport_main_chassis(const struct sbrec_port_binding *pb,
>>>>>>>> -                           bool sb_readonly)
>>>>>>>> +                           bool sb_readonly,
>>>>>>>> +                           struct if_status_mgr *if_mgr)
>>>>>>>>  {
>>>>>>>>      if (pb->encap) {
>>>>>>>>          if (sb_readonly) {
>>>>>>>> @@ -1141,11 +1199,13 @@ release_lport_main_chassis(const struct
>>>>>>> sbrec_port_binding *pb,
>>>>>>>>          sbrec_port_binding_set_encap(pb, NULL);
>>>>>>>>      }
>>>>>>>>
>>>>>>>> +    /* If sb readonly, pb->chassis unset through if-status if
>>>> present.
>>>>>>> */
>>>>>>>>      if (pb->chassis) {
>>>>>>>> -        if (sb_readonly) {
>>>>>>>> +        if (!sb_readonly) {
>>>>>>>> +            sbrec_port_binding_set_chassis(pb, NULL);
>>>>>>>> +        } else if (!if_status_mgr_iface_is_present(if_mgr,
>>>>>>> pb->logical_port)) {
>>>>>>>>              return false;
>>>>>>>>          }
>>>>>>>> -        sbrec_port_binding_set_chassis(pb, NULL);
>>>>>>>>      }
>>>>>>>>
>>>>>>>>      if (pb->virtual_parent) {
>>>>>>>> @@ -1155,7 +1215,8 @@ release_lport_main_chassis(const struct
>>>>>>> sbrec_port_binding *pb,
>>>>>>>>          sbrec_port_binding_set_virtual_parent(pb, NULL);
>>>>>>>>      }
>>>>>>>>
>>>>>>>> -    VLOG_INFO("Releasing lport %s from this chassis.",
>>>>>>> pb->logical_port);
>>>>>>>> +    VLOG_INFO("Releasing lport %s from this chassis
>>>> (sb_readonly=%d)",
>>>>>>>> +              pb->logical_port, sb_readonly);
>>>>>>>>      return true;
>>>>>>>>  }
>>>>>>>>
>>>>>>>> @@ -1189,7 +1250,7 @@ release_lport(const struct sbrec_port_binding
>>>> *pb,
>>>>>>>>                struct hmap *tracked_datapaths, struct if_status_mgr
>>>>>>> *if_mgr)
>>>>>>>>  {
>>>>>>>>      if (pb->chassis == chassis_rec) {
>>>>>>>> -        if (!release_lport_main_chassis(pb, sb_readonly)) {
>>>>>>>> +        if (!release_lport_main_chassis(pb, sb_readonly, if_mgr))
>> {
>>>>>>>>              return false;
>>>>>>>>          }
>>>>>>>>      } else if (is_additional_chassis(pb, chassis_rec)) {
>>>>>>>> @@ -1271,7 +1332,7 @@ consider_vif_lport_(const struct
>>>>>>> sbrec_port_binding *pb,
>>>>>>>>                               b_lport->lbinding->iface,
>>>>>>>>                               !b_ctx_in->ovnsb_idl_txn,
>>>>>>>>                               !parent_pb,
>>>> b_ctx_out->tracked_dp_bindings,
>>>>>>>> -                             b_ctx_out->if_mgr)){
>>>>>>>> +                             b_ctx_out->if_mgr)) {
>>>>>>>>                  return false;
>>>>>>>>              }
>>>>>>>>
>>>>>>>> @@ -1527,7 +1588,8 @@ consider_localport(const struct
>>>> sbrec_port_binding
>>>>>>> *pb,
>>>>>>>>      enum can_bind can_bind = lport_can_bind_on_this_chassis(
>>>>>>>>          b_ctx_in->chassis_rec, pb);
>>>>>>>>      if (can_bind == CAN_BIND_AS_MAIN) {
>>>>>>>> -        if (!release_lport_main_chassis(pb,
>>>> !b_ctx_in->ovnsb_idl_txn)) {
>>>>>>>> +        if (!release_lport_main_chassis(pb,
>> !b_ctx_in->ovnsb_idl_txn,
>>>>>>>> +            b_ctx_out->if_mgr)) {
>>>>>>>>              return false;
>>>>>>>>          }
>>>>>>>>      } else if (can_bind == CAN_BIND_AS_ADDITIONAL) {
>>>>>>>> diff --git a/controller/binding.h b/controller/binding.h
>>>>>>>> index 1fed06674..d20659b0b 100644
>>>>>>>> --- a/controller/binding.h
>>>>>>>> +++ b/controller/binding.h
>>>>>>>> @@ -151,8 +151,10 @@ const struct sbrec_port_binding
>>>>>>> *local_binding_get_primary_pb(
>>>>>>>>  ofp_port_t local_binding_get_lport_ofport(const struct shash
>>>>>>> *local_bindings,
>>>>>>>>                                            const char *pb_name);
>>>>>>>>
>>>>>>>> -bool local_binding_is_up(struct shash *local_bindings, const char
>>>>>>> *pb_name);
>>>>>>>> -bool local_binding_is_down(struct shash *local_bindings, const
>> char
>>>>>>> *pb_name);
>>>>>>>> +bool local_binding_is_up(struct shash *local_bindings, const char
>>>>>>> *pb_name,
>>>>>>>> +                         const struct sbrec_chassis *);
>>>>>>>> +bool local_binding_is_down(struct shash *local_bindings, const
>> char
>>>>>>> *pb_name,
>>>>>>>> +                           const struct sbrec_chassis *);
>>>>>>>>  void local_binding_set_up(struct shash *local_bindings, const char
>>>>>>> *pb_name,
>>>>>>>>                            const struct sbrec_chassis *chassis_rec,
>>>>>>>>                            const char *ts_now_str, bool
>> sb_readonly,
>>>>>>>> @@ -160,7 +162,10 @@ void local_binding_set_up(struct shash
>>>>>>> *local_bindings, const char *pb_name,
>>>>>>>>  void local_binding_set_down(struct shash *local_bindings, const
>> char
>>>>>>> *pb_name,
>>>>>>>>                              const struct sbrec_chassis
>> *chassis_rec,
>>>>>>>>                              bool sb_readonly, bool ovs_readonly);
>>>>>>>> -
>>>>>>>> +void local_binding_set_pb(struct shash *local_bindings, const char
>>>>>>> *pb_name,
>>>>>>>> +                          const struct sbrec_chassis *chassis_rec,
>>>>>>>> +                          struct hmap *tracked_datapaths,
>>>>>>>> +                          bool is_set);
>>>>>>>>  void binding_register_ovs_idl(struct ovsdb_idl *);
>>>>>>>>  void binding_run(struct binding_ctx_in *, struct binding_ctx_out
>> *);
>>>>>>>>  bool binding_cleanup(struct ovsdb_idl_txn *ovnsb_idl_txn,
>>>>>>>> @@ -178,6 +183,10 @@ void binding_dump_local_bindings(struct
>>>>>>> local_binding_data *, struct ds *);
>>>>>>>>  bool is_additional_chassis(const struct sbrec_port_binding *pb,
>>>>>>>>                             const struct sbrec_chassis
>> *chassis_rec);
>>>>>>>>
>>>>>>>> +void set_pb_chassis_in_sbrec(const struct sbrec_port_binding *pb,
>>>>>>>> +                             const struct sbrec_chassis
>> *chassis_rec,
>>>>>>>> +                             bool is_set);
>>>>>>>> +
>>>>>>>>  /* Corresponds to each Port_Binding.type. */
>>>>>>>>  enum en_lport_type {
>>>>>>>>      LP_UNKNOWN,
>>>>>>>> diff --git a/controller/if-status.c b/controller/if-status.c
>>>>>>>> index ad61844d8..7693c289b 100644
>>>>>>>> --- a/controller/if-status.c
>>>>>>>> +++ b/controller/if-status.c
>>>>>>>> @@ -24,6 +24,7 @@
>>>>>>>>  #include "lib/util.h"
>>>>>>>>  #include "timeval.h"
>>>>>>>>  #include "openvswitch/vlog.h"
>>>>>>>> +#include "lib/ovn-sb-idl.h"
>>>>>>>>
>>>>>>>>  VLOG_DEFINE_THIS_MODULE(if_status);
>>>>>>>>
>>>>>>>> @@ -53,9 +54,11 @@ VLOG_DEFINE_THIS_MODULE(if_status);
>>>>>>>>   */
>>>>>>>>
>>>>>>>>  enum if_state {
>>>>>>>> -    OIF_CLAIMED,       /* Newly claimed interface. */
>>>>>>>> -    OIF_INSTALL_FLOWS, /* Already claimed interface for which
>> flows
>>>> are
>>>>>>> still
>>>>>>>> -                        * being installed.
>>>>>>>> +    OIF_CLAIMED,       /* Newly claimed interface. pb->chassis not
>>>> yet
>>>>>>> updated.
>>>>>>>> +                        */
>>>>>>>> +    OIF_INSTALL_FLOWS, /* Claimed interface with pb->chassis
>>>>>>> successfully
>>>>>>>> +                        * updated in SB and for which flows are
>> still
>>>>>>> being
>>>>>>>> +                        * installed.
>>>>>>>>                          */
>>>>>>>>      OIF_MARK_UP,       /* Interface with flows successfully
>> installed
>>>>>>> in OVS
>>>>>>>>                          * but not yet marked "up" in the binding
>>>> module
>>>>>>> (in
>>>>>>>> @@ -78,6 +81,55 @@ static const char *if_state_names[] = {
>>>>>>>>      [OIF_INSTALLED]     = "INSTALLED",
>>>>>>>>  };
>>>>>>>>
>>>>>>>> +/*
>>>>>>>> + *       +----------------------+
>>>>>>>> + * +---> |                      |
>>>>>>>> + * | +-> |         NULL         | <--------------------------------------+++-+
>>>>>>>> + * | |   +----------------------+                                            |
>>>>>>>> + * | |     ^ release_iface   | claim_iface                                   |
>>>>>>>> + * | |     |                 V - sbrec_update_chassis(if sb is rw)           |
>>>>>>>> + * | |   +----------------------+                                            |
>>>>>>>> + * | |   |                      | <----------------------------------------+ |
>>>>>>>> + * | |   |       CLAIMED        | <--------------------------------------+ | |
>>>>>>>> + * | |   +----------------------+                                        | | |
>>>>>>>> + * | |                  | mgr_update(when sb is rw)                      | | |
>>>>>>>> + * | | release_iface    |  - sbrec_update_chassis                        | | |
>>>>>>>> + * | |                  |  - request seqno                               | | |
>>>>>>>> + * | |                  V                                                | | |
>>>>>>>> + * | |   +----------------------+                                        | | |
>>>>>>>> + * | +-- |                      |  mgr_run(seqno not rcvd)               | | |
>>>>>>>> + * |     |    INSTALL_FLOWS     |   - set port down in sb                | | |
>>>>>>>> + * |     |                      |  mgr_update()                          | | |
>>>>>>>> + * |     +----------------------+   - sbrec_update_chassis if needed     | | |
>>>>>>>> + * |                    |                                                | | |
>>>>>>>> + * |                    |  mgr_run(seqno rcvd)                           | | |
>>>>>>>> + * |                    |  - set port up in sb                           | | |
>>>>>>>> + * | release_iface      |  - set ovn-installed in ovs                    | | |
>>>>>>>> + * |                    V                                                | | |
>>>>>>>> + * |   +----------------------+                                          | | |
>>>>>>>> + * |   |                      |  mgr_run()                               | | |
>>>>>>>> + * +-- |       MARK_UP        |  - set port up in sb                     | | |
>>>>>>>> + *     |                      |  - set ovn-installed in ovs              | | |
>>>>>>>> + *     |                      |  mgr_update()                            | | |
>>>>>>>> + *     +----------------------+  - sbrec_update_chassis if needed        | | |
>>>>>>>> + *              |                                                        | | |
>>>>>>>> + *              | mgr_update(rcvd port up / ovn_installed & chassis set) | | |
>>>>>>>> + *              V                                                        | | |
>>>>>>>> + *     +----------------------+                                          | | |
>>>>>>>> + *     |      INSTALLED       | ------------> claim_iface ---------------+ | |
>>>>>>>> + *     +----------------------+                                            | |
>>>>>>>> + *              |                                                          | |
>>>>>>>> + *              | release_iface                                            | |
>>>>>>>> + *              V                                                          | |
>>>>>>>> + *     +----------------------+                                            | |
>>>>>>>> + *     |                      | ------------> claim_iface -----------------+ |
>>>>>>>> + *     |      MARK_DOWN       | ------> mgr_update(rcvd port down) ----------+
>>>>>>>> + *     |                      | mgr_run()
>>>>>>>> + *     |                      | - set port down in sb
>>>>>>>> + *     |                      | mgr_update()
>>>>>>>> + *     +----------------------+ - sbrec_update_chassis(NULL)
>>>>>>>> + */
>>>>>>>> +
>>>>>>>>  struct ovs_iface {
>>>>>>>>      char *id;               /* Extracted from OVS
>>>>>>> external_ids.iface_id. */
>>>>>>>>      enum if_state state;    /* State of the interface in the state
>>>>>>> machine. */
>>>>>>>> @@ -85,6 +137,7 @@ struct ovs_iface {
>>>>>>>>                               * be fully programmed in OVS.  Only
>> used
>>>>>>> in state
>>>>>>>>                               * OIF_INSTALL_FLOWS.
>>>>>>>>                               */
>>>>>>>> +    bool chassis_update_required;  /* If true, pb->chassis must be
>>>>>>> updated. */
>>>>>>>>  };
>>>>>>>>
>>>>>>>>  static uint64_t ifaces_usage;
>>>>>>>> @@ -158,14 +211,23 @@ if_status_mgr_destroy(struct if_status_mgr
>> *mgr)
>>>>>>>>  }
>>>>>>>>
>>>>>>>>  void
>>>>>>>> -if_status_mgr_claim_iface(struct if_status_mgr *mgr, const char
>>>>>>> *iface_id)
>>>>>>>> +if_status_mgr_claim_iface(struct if_status_mgr *mgr,
>>>>>>>> +                          const struct sbrec_port_binding *pb,
>>>>>>>> +                          const struct sbrec_chassis *chassis_rec,
>>>>>>>> +                          bool sb_readonly)
>>>>>>>>  {
>>>>>>>> +    const char *iface_id = pb->logical_port;
>>>>>>>>      struct ovs_iface *iface = shash_find_data(&mgr->ifaces,
>>>> iface_id);
>>>>>>>>
>>>>>>>>      if (!iface) {
>>>>>>>>          iface = ovs_iface_create(mgr, iface_id, OIF_CLAIMED);
>>>>>>>>      }
>>>>>>>> -
>>>>>>>> +    if (!sb_readonly) {
>>>>>>>> +        set_pb_chassis_in_sbrec(pb, chassis_rec, true);
>>>>>>>> +        iface->chassis_update_required = false;
>>>>>>>> +    } else {
>>>>>>>> +        iface->chassis_update_required = true;
>>>>>>>> +    }
>>>>>>>>      switch (iface->state) {
>>>>>>>>      case OIF_CLAIMED:
>>>>>>>>      case OIF_INSTALL_FLOWS:
>>>>>>>> @@ -182,6 +244,12 @@ if_status_mgr_claim_iface(struct if_status_mgr
>>>>>>> *mgr, const char *iface_id)
>>>>>>>>      }
>>>>>>>>  }
>>>>>>>>
>>>>>>>> +bool
>>>>>>>> +if_status_mgr_iface_is_present(struct if_status_mgr *mgr, const
>> char
>>>>>>> *iface_id)
>>>>>>>> +{
>>>>>>>> +    return !!shash_find_data(&mgr->ifaces, iface_id);
>>>>>>>> +}
>>>>>>>> +
>>>>>>>>  void
>>>>>>>>  if_status_mgr_release_iface(struct if_status_mgr *mgr, const char
>>>>>>> *iface_id)
>>>>>>>>  {
>>>>>>>> @@ -246,9 +314,39 @@ if_status_mgr_delete_iface(struct
>> if_status_mgr
>>>>>>> *mgr, const char *iface_id)
>>>>>>>>      }
>>>>>>>>  }
>>>>>>>>
>>>>>>>> +bool
>>>>>>>> +if_status_handle_claims(struct if_status_mgr *mgr,
>>>>>>>> +                        struct local_binding_data *binding_data,
>>>>>>>> +                        const struct sbrec_chassis *chassis_rec,
>>>>>>>> +                        struct hmap *tracked_datapath,
>>>>>>>> +                        bool sb_readonly)
>>>>>>>> +{
>>>>>>>> +    if (!binding_data || sb_readonly) {
>>>>>>>> +        return false;
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>> +    struct shash *bindings = &binding_data->bindings;
>>>>>>>> +    struct hmapx_node *node;
>>>>>>>> +
>>>>>>>> +    bool rc = false;
>>>>>>>> +    HMAPX_FOR_EACH (node, &mgr->ifaces_per_state[OIF_CLAIMED]) {
>>>>>>>> +        struct ovs_iface *iface = node->data;
>>>>>>>> +        if (iface->chassis_update_required) {
>>>>>>>
>>>>>>> Thanks Xavier for the revision. The state machine looks clearer now,
>>>>>>> but I have a major concern about the use of chassis_update_required.
>>>>>>> This bool flag is used to decide if an update to SB is needed, and once
>>>>>>> an SB update is requested, it is set to false, on the assumption that
>>>>>>> the SB update will succeed immediately. However, the assumption may be
>>>>>>> wrong. There can be different reasons why the subsequent SB update
>>>>>>> fails or is delayed, so this flag is not reliable. Instead, the CLAIMED
>>>>>>> state should carry the responsibility of making sure the SB update is
>>>>>>> completed. If a transaction is in progress, sb_readonly is true. So if
>>>>>>> sb_readonly is false, nothing is in progress, and we can always check:
>>>>>>> if (!sb_readonly && <SB chassis is not updated for the port-binding>)
>>>>>>> we should just send the update, regardless of whether we have requested
>>>>>>> it before. Please also see another comment below for the state
>>>>>>> transition.
>>>>>>>
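
To make the check Han describes concrete, here is a minimal sketch. It assumes
simplified stand-in types (`struct chassis`, `struct port_binding`) and a
hypothetical helper name, `claim_needs_sb_update()`; none of these come from the
patch or the OVN tree. The point is only that the decision can be derived from
the current SB contents and `sb_readonly`, with no remembered flag.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for sbrec_chassis / sbrec_port_binding; only the
 * field needed for the illustration is modelled. */
struct chassis {
    const char *name;
};

struct port_binding {
    const struct chassis *chassis;   /* Chassis currently recorded in SB. */
};

/* Returns true if the claim for 'pb' should be (re)sent to the SB now.
 *
 * No "already requested" flag is kept: a writable SB is taken to mean that
 * no transaction is in flight, so a chassis mismatch means the update is
 * still needed, whether or not it was requested before. */
static bool
claim_needs_sb_update(const struct port_binding *pb,
                      const struct chassis *chassis_rec,
                      bool sb_readonly)
{
    return !sb_readonly && pb->chassis != chassis_rec;
}
```

With this shape, a failed or delayed transaction is retried on the next
iteration in which the SB is writable, because the mismatch is still visible.
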
>>>>>>>> +            VLOG_INFO("if_status_handle_claims for %s",
>> iface->id);
>>>>>>>> +            local_binding_set_pb(bindings, iface->id, chassis_rec,
>>>>>>>> +                                 tracked_datapath, true);
>>>>>>>> +            rc = true;
>>>>>>>> +        }
>>>>>>>> +        iface->chassis_update_required = false;
>>>>>>>> +    }
>>>>>>>> +    return rc;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>>  void
>>>>>>>>  if_status_mgr_update(struct if_status_mgr *mgr,
>>>>>>>> -                     struct local_binding_data *binding_data)
>>>>>>>> +                     struct local_binding_data *binding_data,
>>>>>>>> +                     const struct sbrec_chassis *chassis_rec,
>>>>>>>> +                     bool sb_readonly)
>>>>>>>>  {
>>>>>>>>      if (!binding_data) {
>>>>>>>>          return;
>>>>>>>> @@ -257,13 +355,25 @@ if_status_mgr_update(struct if_status_mgr
>> *mgr,
>>>>>>>>      struct shash *bindings = &binding_data->bindings;
>>>>>>>>      struct hmapx_node *node;
>>>>>>>>
>>>>>>>> +    /* Interfaces in OIF_MARK_UP state have already set their
>>>>>>> pb->chassis.
>>>>>>>> +     * However, it might have been reset by another hv.
>>>>>>>> +     */
>>>>>>>>      /* Move all interfaces that have been confirmed "up" by the
>>>> binding
>>>>>>> module,
>>>>>>>>       * from OIF_MARK_UP to OIF_INSTALLED.
>>>>>>>>       */
>>>>>>>>      HMAPX_FOR_EACH_SAFE (node,
>> &mgr->ifaces_per_state[OIF_MARK_UP]) {
>>>>>>>>          struct ovs_iface *iface = node->data;
>>>>>>>>
>>>>>>>> -        if (local_binding_is_up(bindings, iface->id)) {
>>>>>>>> +        if (iface->chassis_update_required) {
>>>>>>>> +            if (!sb_readonly) {
>>>>>>>> +                iface->chassis_update_required = false;
>>>>>>>> +                local_binding_set_pb(bindings, iface->id,
>>>> chassis_rec,
>>>>>>>> +                                     NULL, true);
>>>>>>>> +            } else {
>>>>>>>> +                continue;
>>>>>>>> +            }
>>>>>>>> +        }
>>>>>>>> +        if (local_binding_is_up(bindings, iface->id,
>> chassis_rec)) {
>>>>>>>>              ovs_iface_set_state(mgr, iface, OIF_INSTALLED);
>>>>>>>>          }
>>>>>>>>      }
>>>>>>>> @@ -274,23 +384,53 @@ if_status_mgr_update(struct if_status_mgr
>> *mgr,
>>>>>>>>      HMAPX_FOR_EACH_SAFE (node,
>>>> &mgr->ifaces_per_state[OIF_MARK_DOWN]) {
>>>>>>>>          struct ovs_iface *iface = node->data;
>>>>>>>>
>>>>>>>> -        if (local_binding_is_down(bindings, iface->id)) {
>>>>>>>> +        if (!sb_readonly) {
>>>>>>>> +            local_binding_set_pb(bindings, iface->id, chassis_rec,
>>>>>>>> +                                 NULL, false);
>>>>>>>> +        }
>>>>>>>> +        if (local_binding_is_down(bindings, iface->id,
>> chassis_rec))
>>>> {
>>>>>>>>              ovs_iface_destroy(mgr, iface);
>>>>>>>>          }
>>>>>>>>      }
>>>>>>>>
>>>>>>>> -    /* Register for a notification about flows being installed in
>> OVS
>>>>>>> for all
>>>>>>>> -     * newly claimed interfaces.
>>>>>>>> +    if (!sb_readonly) {
>>>>>>>> +        HMAPX_FOR_EACH_SAFE (node,
>>>>>>> &mgr->ifaces_per_state[OIF_INSTALL_FLOWS]) {
>>>>>>>> +            struct ovs_iface *iface = node->data;
>>>>>>>> +
>>>>>>>> +            if (iface->chassis_update_required) {
>>>>>>>> +                iface->chassis_update_required = false;
>>>>>>>> +                local_binding_set_pb(bindings, iface->id,
>>>> chassis_rec,
>>>>>>>> +                                     NULL, true);
>>>>>>>> +            }
>>>>>>>> +        }
>>>>>>>> +    }
>>>>>>>> +
>>>>>>>> +    /* Update Port_Binding->chassis for newly claimed interfaces
>>>>>>>> +     * Register for a notification about flows being installed in
>> OVS
>>>>>>> for all
>>>>>>>> +     * newly claimed interfaces for which we could update
>>>> pb->chassis.
>>>>>>>>       *
>>>>>>>>       * Move them from OIF_CLAIMED to OIF_INSTALL_FLOWS.
>>>>>>>>       */
>>>>>>>> -    bool new_ifaces = false;
>>>>>>>> -    HMAPX_FOR_EACH_SAFE (node,
>> &mgr->ifaces_per_state[OIF_CLAIMED]) {
>>>>>>>> -        struct ovs_iface *iface = node->data;
>>>>>>>>
>>>>>>>> -        ovs_iface_set_state(mgr, iface, OIF_INSTALL_FLOWS);
>>>>>>>> -        iface->install_seqno = mgr->iface_seqno + 1;
>>>>>>>> -        new_ifaces = true;
>>>>>>>> +    bool new_ifaces = false;
>>>>>>>> +    if (!sb_readonly) {
>>>>>>>> +        HMAPX_FOR_EACH_SAFE (node,
>>>> &mgr->ifaces_per_state[OIF_CLAIMED])
>>>>>>> {
>>>>>>>> +            struct ovs_iface *iface = node->data;
>>>>>>>> +            /* No need to check for chassis_update_required as
>>>> already
>>>>>>> done
>>>>>>>> +             * in if_status_handle_claims or
>>>> if_status_mgr_claim_iface
>>>>>>>> +             */
>>>>>>>> +            ovs_iface_set_state(mgr, iface, OIF_INSTALL_FLOWS);
>>>>>>>
>>>>>>> We need to make sure the chassis in SB port-binding is up-to-date
>> (i.e.
>>>>>>> the update notification from SB DB has been received) before moving
>> to
>>>>>>> INSTALL_FLOWS. Otherwise, it is still possible that the state is
>> moved
>>>> too
>>>>>>> early and end up with incomplete flow installation for the lport
>> when
>>>> the
>>>>>>> state is finally moved to INSTALLED.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Han
>>>>>>>
>>>>>>>> +            iface->install_seqno = mgr->iface_seqno + 1;
>>>>>>>> +            new_ifaces = true;
>>>>>>>> +        }
>>>>>>>> +    } else {
>>>>>>>> +        HMAPX_FOR_EACH_SAFE (node,
>>>> &mgr->ifaces_per_state[OIF_CLAIMED])
>>>>>>> {
>>>>>>>> +            struct ovs_iface *iface = node->data;
>>>>>>>> +            static struct vlog_rate_limit rl =
>>>> VLOG_RATE_LIMIT_INIT(5,
>>>>>>> 1);
>>>>>>>> +            VLOG_INFO_RL(&rl,
>>>>>>>> +                         "Not updating pb chassis for %s now as "
>>>>>>>> +                         "sb is readonly", iface->id);
>>>>>>>> +        }
>>>>>>>>      }
>>>>>>>>
>>>>>>>>      /* Request a seqno update when the flows for new interfaces
>> have
>>>>>>> been
>>>>>>>> @@ -403,7 +543,7 @@ if_status_mgr_update_bindings(struct
>> if_status_mgr
>>>>>>> *mgr,
>>>>>>>>      struct hmapx_node *node;
>>>>>>>>
>>>>>>>>      /* Notify the binding module to set "down" all bindings that
>> are
>>>>>>> still
>>>>>>>> -     * in the process of being installed in OVS, i.e., are not yet
>>>>>>> instsalled.
>>>>>>>> +     * in the process of being installed in OVS, i.e., are not yet
>>>>>>> installed.
>>>>>>>>       */
>>>>>>>>      HMAPX_FOR_EACH (node,
>> &mgr->ifaces_per_state[OIF_INSTALL_FLOWS])
>>>> {
>>>>>>>>          struct ovs_iface *iface = node->data;
>>>>>>>> diff --git a/controller/if-status.h b/controller/if-status.h
>>>>>>>> index bb8a3950d..f9b05d30d 100644
>>>>>>>> --- a/controller/if-status.h
>>>>>>>> +++ b/controller/if-status.h
>>>>>>>> @@ -27,15 +27,27 @@ struct if_status_mgr
>> *if_status_mgr_create(void);
>>>>>>>>  void if_status_mgr_clear(struct if_status_mgr *);
>>>>>>>>  void if_status_mgr_destroy(struct if_status_mgr *);
>>>>>>>>
>>>>>>>> -void if_status_mgr_claim_iface(struct if_status_mgr *, const char
>>>>>>> *iface_id);
>>>>>>>> +void if_status_mgr_claim_iface(struct if_status_mgr *,
>>>>>>>> +                               const struct sbrec_port_binding
>> *pb,
>>>>>>>> +                               const struct sbrec_chassis
>>>> *chassis_rec,
>>>>>>>> +                               bool sb_readonly);
>>>>>>>>  void if_status_mgr_release_iface(struct if_status_mgr *, const
>> char
>>>>>>> *iface_id);
>>>>>>>>  void if_status_mgr_delete_iface(struct if_status_mgr *, const char
>>>>>>> *iface_id);
>>>>>>>>
>>>>>>>> -void if_status_mgr_update(struct if_status_mgr *, struct
>>>>>>> local_binding_data *);
>>>>>>>> +void if_status_mgr_update(struct if_status_mgr *, struct
>>>>>>> local_binding_data *,
>>>>>>>> +                          const struct sbrec_chassis *chassis,
>>>>>>>> +                          bool sb_readonly);
>>>>>>>>  void if_status_mgr_run(struct if_status_mgr *mgr, struct
>>>>>>> local_binding_data *,
>>>>>>>>                         const struct sbrec_chassis *,
>>>>>>>>                         bool sb_readonly, bool ovs_readonly);
>>>>>>>>  void if_status_mgr_get_memory_usage(struct if_status_mgr *mgr,
>>>>>>>>                                      struct simap *usage);
>>>>>>>> +bool if_status_mgr_iface_is_present(struct if_status_mgr *mgr,
>>>>>>>> +                                    const char *iface_id);
>>>>>>>> +bool if_status_handle_claims(struct if_status_mgr *mgr,
>>>>>>>> +                             struct local_binding_data
>> *binding_data,
>>>>>>>> +                             const struct sbrec_chassis
>> *chassis_rec,
>>>>>>>> +                             struct hmap *tracked_datapath,
>>>>>>>> +                             bool sb_readonly);
>>>>>>>>
>>>>>>>>  # endif /* controller/if-status.h */
>>>>>>>> diff --git a/controller/ovn-controller.c
>> b/controller/ovn-controller.c
>>>>>>>> index 69615308e..3947baf03 100644
>>>>>>>> --- a/controller/ovn-controller.c
>>>>>>>> +++ b/controller/ovn-controller.c
>>>>>>>> @@ -1464,6 +1464,73 @@ en_runtime_data_run(struct engine_node
>> *node,
>>>>>>> void *data)
>>>>>>>>      engine_set_node_state(node, EN_UPDATED);
>>>>>>>>  }
>>>>>>>>
>>>>>>>> +struct ed_type_sb_ro {
>>>>>>>> +    bool sb_readonly;
>>>>>>>> +};
>>>>>>>> +
>>>>>>>> +static void *
>>>>>>>> +en_sb_ro_init(struct engine_node *node OVS_UNUSED,
>>>>>>>> +              struct engine_arg *arg OVS_UNUSED)
>>>>>>>> +{
>>>>>>>> +    struct ed_type_sb_ro *data = xzalloc(sizeof *data);
>>>>>>>> +    return data;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +static void
>>>>>>>> +en_sb_ro_run(struct engine_node *node, void *data)
>>>>>>>> +{
>>>>>>>> +    struct ed_type_sb_ro *sb_ro_data = data;
>>>>>>>> +    bool sb_readonly = !engine_get_context()->ovnsb_idl_txn;
>>>>>>>> +    if (sb_ro_data->sb_readonly != sb_readonly) {
>>>>>>>> +        sb_ro_data->sb_readonly = sb_readonly;
>>>>>>>> +        if (!sb_ro_data->sb_readonly) {
>>>>>>>> +            engine_set_node_state(node, EN_UPDATED);
>>>>>>>> +        }
>>>>>>>> +    }
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +static void
>>>>>>>> +en_sb_ro_cleanup(void *data OVS_UNUSED)
>>>>>>>> +{
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +static bool
>>>>>>>> +runtime_data_sb_ro_handler(struct engine_node *node, void *data)
>>>>>>>> +{
>>>>>>>> +    const struct sbrec_chassis *chassis = NULL;
>>>>>>>> +
>>>>>>>> +    struct ovsrec_open_vswitch_table *ovs_table =
>>>>>>>> +        (struct ovsrec_open_vswitch_table *)EN_OVSDB_GET(
>>>>>>>> +            engine_get_input("OVS_open_vswitch", node));
>>>>>>>> +
>>>>>>>> +    const char *chassis_id = get_ovs_chassis_id(ovs_table);
>>>>>>>> +
>>>>>>>> +    struct ovsdb_idl_index *sbrec_chassis_by_name =
>>>>>>>> +        engine_ovsdb_node_get_index(
>>>>>>>> +                engine_get_input("SB_chassis", node),
>>>>>>>> +                "name");
>>>>>>>> +
>>>>>>>> +    if (chassis_id) {
>>>>>>>> +        chassis = chassis_lookup_by_name(sbrec_chassis_by_name,
>>>>>>> chassis_id);
>>>>>>>> +    }
>>>>>>>> +    if (chassis) {
>>>>>>>> +        struct ed_type_runtime_data *rt_data = data;
>>>>>>>> +        bool sb_readonly = !engine_get_context()->ovnsb_idl_txn;
>>>>>>>> +        struct controller_engine_ctx *ctrl_ctx =
>>>>>>>> +            engine_get_context()->client_ctx;
>>>>>>>> +
>>>>>>>> +        if (if_status_handle_claims(ctrl_ctx->if_mgr,
>>>>>>>> +                                    &rt_data->lbinding_data,
>>>>>>>> +                                    chassis,
>>>>>>>> +                                    &rt_data->tracked_dp_bindings,
>>>>>>>> +                                    sb_readonly)) {
>>>>>>>> +            engine_set_node_state(node, EN_UPDATED);
>>>>>>>> +            rt_data->tracked = true;
>>>>>>>> +        }
>>>>>>>> +    }
>>>>>>>> +    return true;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>>  static bool
>>>>>>>>  runtime_data_ovs_interface_shadow_handler(struct engine_node
>> *node,
>>>>>>> void *data)
>>>>>>>>  {
>>>>>>>> @@ -3528,6 +3595,7 @@ main(int argc, char *argv[])
>>>>>>>>      stopwatch_create(VIF_PLUG_RUN_STOPWATCH_NAME, SW_MS);
>>>>>>>>
>>>>>>>>      /* Define inc-proc-engine nodes. */
>>>>>>>> +    ENGINE_NODE(sb_ro, "sb_ro");
>>>>>>>>      ENGINE_NODE_WITH_CLEAR_TRACK_DATA_IS_VALID(ct_zones,
>> "ct_zones");
>>>>>>>>      ENGINE_NODE_WITH_CLEAR_TRACK_DATA(ovs_interface_shadow,
>>>>>>>>                                        "ovs_interface_shadow");
>>>>>>>> @@ -3664,6 +3732,7 @@ main(int argc, char *argv[])
>>>>>>>>      engine_add_input(&en_ovs_interface_shadow, &en_ovs_interface,
>>>>>>>>                       ovs_interface_shadow_ovs_interface_handler);
>>>>>>>>
>>>>>>>> +    engine_add_input(&en_runtime_data, &en_sb_ro,
>>>>>>> runtime_data_sb_ro_handler);
>>>>>>>>      engine_add_input(&en_runtime_data, &en_ofctrl_is_connected,
>>>> NULL);
>>>>>>>>
>>>>>>>>      engine_add_input(&en_runtime_data, &en_ovs_open_vswitch,
>> NULL);
>>>>>>>> @@ -4098,7 +4167,8 @@ main(int argc, char *argv[])
>>>>>>>>                          runtime_data ?
>> &runtime_data->lbinding_data :
>>>>>>> NULL;
>>>>>>>>
>>>>  stopwatch_start(IF_STATUS_MGR_UPDATE_STOPWATCH_NAME,
>>>>>>>>                                      time_msec());
>>>>>>>> -                    if_status_mgr_update(if_mgr, binding_data);
>>>>>>>> +                    if_status_mgr_update(if_mgr, binding_data,
>>>> chassis,
>>>>>>>> +                                         !ovnsb_idl_txn);
>>>>>>>>
>>>>  stopwatch_stop(IF_STATUS_MGR_UPDATE_STOPWATCH_NAME,
>>>>>>>>                                     time_msec());
>>>>>>>>
>>>>>>>> diff --git a/tests/ovn-macros.at b/tests/ovn-macros.at
>>>>>>>> index 335f9158c..8fd6ae6f7 100644
>>>>>>>> --- a/tests/ovn-macros.at
>>>>>>>> +++ b/tests/ovn-macros.at
>>>>>>>> @@ -759,3 +759,15 @@
>>>> m4_define([OVN_FOR_EACH_NORTHD_WITHOUT_DP_GROUPS],
>>>>>>>>       [m4_foreach([NORTHD_USE_DP_GROUPS], [no],
>>>>>>>>         [m4_foreach([NORTHD_USE_PARALLELIZATION], [yes, no], [$1
>>>>>>>>  ])])])])
>>>>>>>> +
>>>>>>>> +# OVN_NBCTL(NBCTL_COMMAND) adds NBCTL_COMMAND to list of commands
>> to
>>>> be
>>>>>>> run by RUN_OVN_NBCTL().
>>>>>>>> +m4_define([OVN_NBCTL], [
>>>>>>>> +    command="${command} -- $1"
>>>>>>>> +])
>>>>>>>> +
>>>>>>>> +# RUN_OVN_NBCTL() executes list of commands built by the
>> OVN_NBCTL()
>>>>>>> macro.
>>>>>>>> +m4_define([RUN_OVN_NBCTL], [
>>>>>>>> +    check ovn-nbctl ${command}
>>>>>>>> +    unset command
>>>>>>>> +])
>>>>>>>> +
>>>>>>>> diff --git a/tests/ovn.at b/tests/ovn.at
>>>>>>>> index bfaa41962..94d16bac9 100644
>>>>>>>> --- a/tests/ovn.at
>>>>>>>> +++ b/tests/ovn.at
>>>>>>>> @@ -102,6 +102,18 @@ m4_divert_text([PREPARE_TESTS],
>>>>>>>>           test 1 -le $(as $hv1 ovs-ofctl dump-flows br-int | grep
>> -c
>>>>>>> "output:$ofport")
>>>>>>>>       ])
>>>>>>>>     }
>>>>>>>> +
>>>>>>>> +   ovn_wait_remote_input_flows () {
>>>>>>>> +     hv1=$1
>>>>>>>> +     hv2=$2
>>>>>>>> +     echo "$3: waiting for flows for remote input on $hv1"
>>>>>>>> +     # Wait for a flow outputting to remote input
>>>>>>>> +     OVS_WAIT_UNTIL([
>>>>>>>> +         ofport=$(as $hv1 ovs-vsctl --bare --columns ofport find
>>>>>>> Interface name=ovn-${hv2}-0)
>>>>>>>> +         echo "tunnel port=$ofport"
>>>>>>>> +         test 1 -le $(as $hv1 ovs-ofctl dump-flows br-int | grep
>> -c
>>>>>>> "in_port=$ofport")
>>>>>>>> +     ])
>>>>>>>> +   }
>>>>>>>>  ])
>>>>>>>>
>>>>>>>>  m4_define([OVN_CHECK_PACKETS],
>>>>>>>> @@ -127,6 +139,8 @@ m4_define([OVN_WAIT_PATCH_PORT_FLOWS],
>>>>>>>>  m4_define([OVN_WAIT_REMOTE_OUTPUT_FLOWS],
>>>>>>>>    [ovn_wait_remote_output_flows "$1" "$2" "__file__:__line__"])
>>>>>>>>
>>>>>>>> +m4_define([OVN_WAIT_REMOTE_INPUT_FLOWS],
>>>>>>>> +  [ovn_wait_remote_input_flows "$1" "$2" "__file__:__line__"])
>>>>>>>>
>>>>>>>>  AT_BANNER([OVN components])
>>>>>>>>
>>>>>>>> @@ -14056,6 +14070,11 @@ wait_column "$hv1_uuid" Port_Binding
>>>>>>> requested_chassis logical_port=lsp0
>>>>>>>>  wait_column "$hv2_uuid" Port_Binding additional_chassis
>>>>>>> logical_port=lsp0
>>>>>>>>  wait_column "$hv2_uuid" Port_Binding requested_additional_chassis
>>>>>>> logical_port=lsp0
>>>>>>>>
>>>>>>>> +# Check ovn-installed updated for main chassis
>>>>>>>> +wait_for_ports_up
>>>>>>>> +OVS_WAIT_UNTIL([test `as hv1 ovs-vsctl get Interface lsp0
>>>>>>> external_ids:ovn-installed` = '"true"'])
>>>>>>>> +OVS_WAIT_UNTIL([test x`as hv2 ovs-vsctl get Interface lsp0
>>>>>>> external_ids:ovn-installed` = x])
>>>>>>>> +
>>>>>>>>  # Check that setting iface:encap-ip populates
>>>>>>> Port_Binding:additional_encap
>>>>>>>>  wait_row_count Encap 2 chassis_name=hv1
>>>>>>>>  wait_row_count Encap 2 chassis_name=hv2
>>>>>>>> @@ -14081,6 +14100,11 @@ wait_column "$hv2_uuid" Port_Binding
>>>>>>> requested_chassis logical_port=lsp0
>>>>>>>>  wait_column "" Port_Binding additional_chassis logical_port=lsp0
>>>>>>>>  wait_column "" Port_Binding requested_additional_chassis
>>>>>>> logical_port=lsp0
>>>>>>>>
>>>>>>>> +# Check ovn-installed updated for main chassis and not for other
>>>> chassis
>>>>>>>> +wait_for_ports_up
>>>>>>>> +OVS_WAIT_UNTIL([test `as hv2 ovs-vsctl get Interface lsp0
>>>>>>> external_ids:ovn-installed` = '"true"'])
>>>>>>>> +OVS_WAIT_UNTIL([test x`as hv1 ovs-vsctl get Interface lsp0
>>>>>>> external_ids:ovn-installed` = x])
>>>>>>>> +
>>>>>>>>  # Check that additional_encap is cleared
>>>>>>>>  wait_column "" Port_Binding additional_encap logical_port=lsp0
>>>>>>>>
>>>>>>>> @@ -15370,7 +15394,10 @@ AT_CHECK([as hv1 ovs-ofctl dump-flows
>> br-int
>>>>>>> table=65 | grep actions=output:1],
>>>>>>>>  echo "verifying that lsp0 binding moves when requested-chassis is
>>>>>>> changed"
>>>>>>>>
>>>>>>>>  ovn-nbctl lsp-set-options lsp0 requested-chassis=hv2
>>>>>>>> -OVS_WAIT_UNTIL([test 1 = $(grep -c "Releasing lport lsp0 from this
>>>>>>> chassis" hv1/ovn-controller.log)])
>>>>>>>> +
>>>>>>>> +# We might see multiple "Releasing lport ...", when sb is read
>> only
>>>>>>>> +OVS_WAIT_UNTIL([test 1 -le $(grep -c "Releasing lport lsp0 from
>> this
>>>>>>> chassis" hv1/ovn-controller.log)])
>>>>>>>> +
>>>>>>>>  wait_column "$hv2_uuid" Port_Binding chassis logical_port=lsp0
>>>>>>>>
>>>>>>>>  # (6) Chassis hv2 should add flows and hv1 should not.
>>>>>>>> @@ -15416,7 +15443,7 @@ AT_CHECK([as hv1 ovs-ofctl dump-flows
>> br-int
>>>>>>> table=0 | grep in_port=1], [0], [ig
>>>>>>>>  AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=65 | grep
>>>>>>> actions=output:1], [0], [ignore])
>>>>>>>>
>>>>>>>>  check ovn-nbctl --wait=hv lsp-set-options lsp0
>>>>>>> requested-chassis=non-existant-chassis
>>>>>>>> -OVS_WAIT_UNTIL([test 1 = $(grep -c "Releasing lport lsp0 from this
>>>>>>> chassis" hv1/ovn-controller.log)])
>>>>>>>> +OVS_WAIT_UNTIL([test 1 -le $(grep -c "Releasing lport lsp0 from
>> this
>>>>>>> chassis" hv1/ovn-controller.log)])
>>>>>>>>  check ovn-nbctl --wait=hv sync
>>>>>>>>  wait_column '' Port_Binding chasssi logical_port=lsp0
>>>>>>>>  AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=0 | grep
>>>> in_port=1],
>>>>>>> [1], [])
>>>>>>>> @@ -32418,3 +32445,119 @@ AT_CHECK([test $(ovn-sbctl list fdb |
>> grep
>>>> -c
>>>>>>> "00:00:00:00:10:30") = 0])
>>>>>>>>  OVN_CLEANUP([hv1])
>>>>>>>>  AT_CLEANUP
>>>>>>>>  ])
>>>>>>>> +
>>>>>>>> +OVN_FOR_EACH_NORTHD([
>>>>>>>> +AT_SETUP([recomputes])
>>>>>>>> +ovn_start
>>>>>>>> +
>>>>>>>> +n_hv=4
>>>>>>>> +
>>>>>>>> +# Add chassis
>>>>>>>> +net_add n1
>>>>>>>> +for i in $(seq 1 $n_hv); do
>>>>>>>> +    sim_add hv$i
>>>>>>>> +    as hv$i
>>>>>>>> +    check ovs-vsctl add-br br-phys
>>>>>>>> +    ovs-vsctl set open .
>>>> external-ids:ovn-bridge-mappings=phys:br-phys
>>>>>>>> +    ovn_attach n1 br-phys 192.168.0.$i 24 geneve
>>>>>>>> +done
>>>>>>>> +
>>>>>>>> +add_switch_ports() {
>>>>>>>> +    start_port=$1
>>>>>>>> +    end_port=$2
>>>>>>>> +    nb_hv=$3
>>>>>>>> +    bulk_size=$4
>>>>>>>> +    for ((i=start_port; i<end_port; )) do
>>>>>>>> +        start_bulk=$i
>>>>>>>> +        for hv in $(seq 1 $nb_hv); do
>>>>>>>> +            end_bulk=$((start_bulk+bulk_size-1))
>>>>>>>> +            for port in $(seq $start_bulk $end_bulk); do
>>>>>>>> +                logical_switch_port=lsp${port}
>>>>>>>> +                OVN_NBCTL(lsp-add ls1 $logical_switch_port)
>>>>>>>> +                OVN_NBCTL(lsp-set-addresses $logical_switch_port
>>>>>>> dynamic)
>>>>>>>> +            done
>>>>>>>> +            start_bulk=$((end_bulk+1))
>>>>>>>> +        done
>>>>>>>> +        RUN_OVN_NBCTL()
>>>>>>>> +
>>>>>>>> +        start_bulk=$i
>>>>>>>> +        for hv in $(seq 1 $nb_hv); do
>>>>>>>> +            end_bulk=$((start_bulk+bulk_size-1))
>>>>>>>> +            for port in $(seq $start_bulk $end_bulk); do
>>>>>>>> +                logical_switch_port=lsp${port}
>>>>>>>> +                as hv$hv ovs-vsctl \
>>>>>>>> +                    --no-wait -- add-port br-int vif${port} \
>>>>>>>> +                    -- set Interface vif${port}
>>>>>>> external_ids:iface-id=$logical_switch_port
>>>>>>>> +            done
>>>>>>>> +            start_bulk=$((end_bulk+1))
>>>>>>>> +        done
>>>>>>>> +        i=$((end_bulk+1))
>>>>>>>> +    done
>>>>>>>> +}
>>>>>>>> +check ovn-nbctl ls-add ls1
>>>>>>>> +check ovn-nbctl set Logical_Switch ls1 other_config:subnet=
>>>> 10.1.0.0/16
>>>>>>>> +check ovn-nbctl set Logical_Switch ls1
>>>>>>> other_config:exclude_ips=10.1.255.254
>>>>>>>> +
>>>>>>>> +check ovn-nbctl lr-add lr1
>>>>>>>> +check ovn-nbctl lsp-add ls1 lsp0 -- set Logical_Switch_Port lsp0
>>>>>>> type=router options:router-port=lrp0 addresses=dynamic
>>>>>>>> +check ovn-nbctl lrp-add lr1 lrp0 "f0:00:00:01:00:01"
>> 10.1.255.254/16
>>>>>>>> +check ovn-nbctl lr-nat-add lr1 snat 10.2.0.1 10.1.0.0/16
>>>>>>>> +
>>>>>>>> +lflow_run=0
>>>>>>>> +check ovn-nbctl --wait=hv sync
>>>>>>>> +
>>>>>>>> +# Tunnel ports might not be added (yet) at this point on slow
>> system.
>>>>>>>> +# Wait for flows related to such ports to ensure those ports have
>>>> been
>>>>>>> added
>>>>>>>> +# before we measure recomputes. Otherwise, ovs_interface handler
>>>> might
>>>>>>> be run
>>>>>>>> +# afterwards for tunnel ports, causing recomputes.
>>>>>>>> +for i in $(seq 1 $n_hv); do
>>>>>>>> +    for j in $(seq 1 $n_hv); do
>>>>>>>> +        if test $i != $j; then
>>>>>>>> +            OVN_WAIT_REMOTE_INPUT_FLOWS(["hv$i"],["hv$j"])
>>>>>>>> +        fi
>>>>>>>> +    done
>>>>>>>> +done
>>>>>>>> +
>>>>>>>> +for i in $(seq 1 $n_hv); do
>>>>>>>> +    as hv$i
>>>>>>>> +    lflow_run1=$(ovn-appctl -t ovn-controller
>> coverage/read-counter
>>>>>>> lflow_run)
>>>>>>>> +    lflow_run=`expr $lflow_run1 + $lflow_run`
>>>>>>>> +done
>>>>>>>> +
>>>>>>>> +add_switch_ports 1 1000 $n_hv 5
>>>>>>>> +
>>>>>>>> +wait_for_ports_up
>>>>>>>> +check ovn-nbctl --wait=hv sync
>>>>>>>> +
>>>>>>>> +for i in $(seq 1 $n_hv); do
>>>>>>>> +    pid=$(cat hv${i}/ovn-controller.pid)
>>>>>>>> +    u=$((u+$(cat "/proc/$pid/stat" | awk '{print $14}')))
>>>>>>>> +    s=$((s+$(cat "/proc/$pid/stat" | awk '{print $15}')))
>>>>>>>> +done
>>>>>>>> +
>>>>>>>> +n_pid=$(cat northd/ovn-northd.pid)
>>>>>>>> +n_u=$(cat "/proc/$n_pid/stat" | awk '{print $14}')
>>>>>>>> +n_s=$(cat "/proc/$n_pid/stat" | awk '{print $15}')
>>>>>>>> +
>>>>>>>> +echo "Total Northd User Time: $n_u"
>>>>>>>> +echo "Total Northd System Time: $n_s"
>>>>>>>> +echo "Total Controller User Time: $u"
>>>>>>>> +echo "Total Controller System Time: $s"
>>>>>>>> +
>>>>>>>> +lflow_run_end=0
>>>>>>>> +for i in $(seq 1 $n_hv); do
>>>>>>>> +    as hv$i
>>>>>>>> +    lflow_run1=$(ovn-appctl -t ovn-controller
>> coverage/read-counter
>>>>>>> lflow_run)
>>>>>>>> +    lflow_run_end=`expr $lflow_run1 + $lflow_run_end`
>>>>>>>> +done
>>>>>>>> +n_recomputes=`expr $lflow_run_end - $lflow_run`
>>>>>>>> +echo "$n_recomputes recomputes"
>>>>>>>> +
>>>>>>>> +AT_CHECK([test $lflow_run_end == $lflow_run])
>>>>>>>> +
>>>>>>>> +for i in $(seq 2 $n_hv); do
>>>>>>>> +    OVN_CLEANUP_SBOX([hv$i])
>>>>>>>> +done
>>>>>>>> +OVN_CLEANUP([hv1])
>>>>>>>> +AT_CLEANUP
>>>>>>>> +])
>>>>>>>> diff --git a/tests/perf-northd.at b/tests/perf-northd.at
>>>>>>>> index 74b69e9d4..6ec196b36 100644
>>>>>>>> --- a/tests/perf-northd.at
>>>>>>>> +++ b/tests/perf-northd.at
>>>>>>>> @@ -76,23 +76,6 @@ m4_define([PERF_RECORD_STOP], [
>>>>>>>>      PERF_RECORD_STOPWATCH(ovn-northd-loop, ["Short term average"],
>>>>>>> [Average (northd-loop in msec)])
>>>>>>>>  ])
>>>>>>>>
>>>>>>>> -# OVN_NBCTL([NBCTL_COMMAND])
>>>>>>>> -#
>>>>>>>> -# Add NBCTL_COMMAND to list of commands to be run by
>> RUN_OVN_NBCTL().
>>>>>>>> -#
>>>>>>>> -m4_define([OVN_NBCTL], [
>>>>>>>> -    command="${command} -- $1"
>>>>>>>> -])
>>>>>>>> -
>>>>>>>> -# RUN_OVN_NBCTL()
>>>>>>>> -#
>>>>>>>> -# Execute list of commands built by the OVN_NBCTL() macro.
>>>>>>>> -#
>>>>>>>> -m4_define([RUN_OVN_NBCTL], [
>>>>>>>> -    check ovn-nbctl ${command}
>>>>>>>> -    unset command
>>>>>>>> -])
>>>>>>>> -
>>>>>>>>  OVS_START_SHELL_HELPERS
>>>>>>>>  generate_subnet () {
>>>>>>>>      local a=$(printf %d $(expr $1 / 256 + 10))
>>>>>>>> --
>>>>>>>> 2.31.1
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
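
For readers following the quoted patch: the new "sb_ro" engine node only
reports a change when the SB IDL goes from read-only back to writable. A
minimal, self-contained sketch of that edge-triggered behaviour is below. The
names sb_ro_data, node_state and sb_ro_observe are hypothetical stand-ins for
illustration, not OVN's real engine types; only the transition logic follows
en_sb_ro_run() as quoted above.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the engine node state; not OVN's real enum. */
enum node_state { NODE_UNCHANGED, NODE_UPDATED };

/* Hypothetical stand-in for struct ed_type_sb_ro. */
struct sb_ro_data {
    bool sb_readonly;   /* Last observed read-only status of the SB IDL. */
};

/* Records the current status and returns NODE_UPDATED only when the status
 * changes from read-only to writable.  Going writable -> read-only, or
 * seeing the same status twice, reports NODE_UNCHANGED. */
enum node_state
sb_ro_observe(struct sb_ro_data *data, bool sb_readonly)
{
    enum node_state state = NODE_UNCHANGED;

    if (data->sb_readonly != sb_readonly) {
        data->sb_readonly = sb_readonly;
        if (!data->sb_readonly) {
            state = NODE_UPDATED;
        }
    }
    return state;
}
```

Under this model, dependent handlers such as runtime_data_sb_ro_handler run
once per "SB became writable" event rather than on every iteration.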
Dumitru Ceara July 13, 2022, 9:28 a.m. UTC | #11
On 7/13/22 11:08, Dumitru Ceara wrote:
> On 7/13/22 09:40, Xavier Simonart wrote:
>> Hi Han, Dumitru
>>
> 
> Hi Han, Xavier,
> 
> Sorry, I had already replied to the previous message and only then
> noticed this one.
> 
>> I think that we should, as much as possible, try to achieve both goals:
>> - have an accurate ovn-installed
>> - do not increase latency in large scale deployments
>>
> 
> +1
> 
>> The fact that ovn-installed is sent too early for mc flows is already an
>> issue today, independent of this patch.
>> Fixing ovn-installed related to mc flows by delaying the state change (for
>> all cases, including when there are no mc groups) might be seen as a
>> performance regression.
>>
> 
> I think it will, and I'm not sure we can convince the CMS that this is
> "just metrics".
> 
>> I agree that we should fix this ovn-installed issue, but it is not a
>> regression added by this patch. We should enter a BZ for it.
>> Per my understanding, the mc flows are updated when the SB_multicast_group
>> is seen as updated by ovn-controller, due to its references to port binding.
>> Other flows related to port binding are installed earlier, i.e. when
>> ovn-controller writes port_binding->chassis (i.e. before it receives SB
>> confirmation). So, while sending the mc flows earlier than what we do today
>> might be more complex, I think it makes some kind of sense (we would send
>> all those flows within the same loop).
> 
> I'm inclined to leave it as it is today if this is the only flow
> we're missing.  It's a guess without testing things out, but I think
> it's for the MC_FLOOD_L2 multicast group which is used only for
> forwarding ARP packets originated by OVN routers or destined to a
> specific OVN router.  Losing some of those packets is not a big deal.
> 
> But it might be good to confirm that this is the MC group we install the
> flow for.
> 

Oh, well, the port is also part of the MC_FLOOD group.  This is however
only used for BUM traffic.  So losing some packets here is also not
terrible, I think.
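
To make the trade-off discussed in this thread concrete, here is a rough
sketch of the two policies for leaving OIF_CLAIMED: move on as soon as
pb->chassis has been written locally, or wait until the SB update
notification has been received. This is only an illustration under stated
assumptions. struct iface_model, its fields, iface_step() and the wait_for_sb
switch are hypothetical; only the state names OIF_CLAIMED and
OIF_INSTALL_FLOWS come from the patch.

```c
#include <stdbool.h>

enum oif_state { OIF_CLAIMED, OIF_INSTALL_FLOWS };

/* Hypothetical model of one interface; not the real struct ovs_iface. */
struct iface_model {
    enum oif_state state;
    bool chassis_written;    /* pb->chassis written in a local SB txn. */
    bool chassis_confirmed;  /* SB update notification received. */
};

/* One main-loop iteration for a claimed interface.
 *
 * While the update is unconfirmed and SB is writable, the write is
 * (re)attempted, so a failed commit is retried on a later iteration.
 * 'wait_for_sb' selects the conservative policy: stay in OIF_CLAIMED until
 * the update is confirmed by the server. */
void
iface_step(struct iface_model *iface, bool sb_readonly, bool wait_for_sb)
{
    if (iface->state != OIF_CLAIMED) {
        return;
    }
    if (!iface->chassis_confirmed && !sb_readonly) {
        iface->chassis_written = true;
    }

    bool ready = wait_for_sb ? iface->chassis_confirmed
                             : iface->chassis_written;
    if (ready) {
        iface->state = OIF_INSTALL_FLOWS;
    }
}
```

With wait_for_sb set to false the interface can reach OIF_INSTALL_FLOWS in
the same iteration as the write; with it set to true the transition costs at
least one extra round trip to the SB, which is the latency concern raised
above.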

> Thanks,
> Dumitru
> 
>>
>> Thanks
>> Xavier
>>
>>
>>
>> On Wed, Jul 13, 2022 at 8:28 AM Han Zhou <hzhou@ovn.org> wrote:
>>
>>>
>>>
>>> On Tue, Jul 12, 2022 at 12:35 AM Dumitru Ceara <dceara@redhat.com> wrote:
>>>>
>>>> On 7/12/22 08:52, Han Zhou wrote:
>>>>> On Mon, Jul 11, 2022 at 4:55 AM Dumitru Ceara <dceara@redhat.com>
>>> wrote:
>>>>>>
>>>>>> On 7/11/22 13:31, Xavier Simonart wrote:
>>>>>>> Hi Han
>>>>>>>
>>>>>>> Thanks for your review.
>>>>>>>
>>>>>>> Let me try to understand your two main concerns and the proper way to
>>>>> fix
>>>>>>> it.
>>>>>>> 1) We only try once to write pb->chassis. If the commit fails,
>>>>> pb->chassis
>>>>>>> is not written. As commit fails, we will recompute, but as the
>>>>>>> update_required flag is not set anymore, we might end up with no
>>>>>>> pb->chassis.
>>>>>>> => I'll remove the flag and try to update until it's confirmed.
>>>>>
>>>>> Thank you!
>>>>>
>>>>>>> 2) The state machine, and when we move to INSTALL_FLOWS. Serializing
>>> the
>>>>>>> state machine, by waiting for confirmation to be received before
>>> moving
>>>>> to
>>>>>>> INSTALL_FLOWS state will delay the ovn-installed compared to today.
>>> So
>>>>> I am
>>>>>>> (still) trying to see if there is any way to prevent this in some
>>> cases.
>>>>>>> Would it be correct to do this serialization (wait for pb->chassis
>>>>> update
>>>>>>> confirmation) only when using conditional monitoring? When using
>>>>>>> monitor-all, as soon as we have written (w/o confirmation)
>>> pb->chassis,
>>>>> we
>>>>>>> would move to INSTALL_FLOWS. In that loop where we wrote pb->chassis,
>>>>> (all)
>>>>>>> the flows should be updated taking into account pb->chassis.
>>>>>>
>>>>>> I think this approach is OK.  ovn-controllers that don't use
>>> conditional
>>>>>> monitoring already know the complete SB contents and should be able to
>>>>>> install (mostly?) complete sets of openflows that correspond to a
>>> given
>>>>>> Port_Binding.
>>>>>>
>>>>>> It's also "CMS-friendly", at least for ovn-kubernetes, which uses
>>>>>> ovn-monitor-all=true and only waits for
>>>>>> OVS.Interface.external_ids:ovn-installed=true.  And shouldn't impact
>>> the
>>>>>> others which wait for SB.Port_Binding.up=true.
>>>>>>
>>>>>
>>>>> Xavier and Dumitru, I think we shouldn't introduce the dependency of
>>>>> "ovn-monitor-all" setting here.
>>>>>
>>>>> First of all, ovn-installed is a flag for CMS to understand that all
>>> the
>>>>> flows related to the port-binding are installed. If we set the flag
>>> before
>>>>> it is truly completed, it is a bug, and it is possible that the flag
>>> is set
>>>>> but some traffic doesn't work.
>>>>
>>>> I think it's a matter of semantics.  The way I see "ovn-installed=true"
>>>> is: all flows that are relevant to the port binding on the local chassis
>>>> have been installed.  When we added it it was for the ovn-k8s case;
>>>> ovn-k8s used to explicitly check if some openflow tables on the node
>>>> where the pod is being brought up contained flows that seemed to
>>>> correspond to the pod (e.g., matching on pod mac and IP addresses).
>>>>
>>> But the purpose of checking the flows (when ovn-installed wasn't
>>> available) was to make sure the pod is ready to send/receive traffic. If
>>> ovn-installed can provide more accuracy, why not?
>>>
>>>>> I did a quick test, and at least a flow in (table_id=38, priority=100)
>>>>> which is multicast-group related is updated AFTER the SB notification
>>> is
>>>>> received for the port-binding chassis update.
>>>>>
>>>>
>>>> This sounds like something we should fix, I think.  I don't see any
>>>> multicast-group changes conditioned by the port_binding being up=true in
>>>> northd.  I might be wrong though.
>>>>
>>>
>>> It is not about "up=true". It is triggered by the port-binding->chassis
>>> update. Since multicast-group has a reference to port-binding, a
>>> port-binding update triggers multicast-group change handling, which is
>>> required because physical flows related to the MC group need to be updated
>>> when port-binding->chassis is updated. You may argue that the IDL may be
>>> optimized so that the MC group change can be triggered and handled before
>>> SB is updated, but I am not sure if the benefit is worth the complexity.
>>> Given how OVSDB IDL transaction is designed, I'd always think a DB record
>>> is *formally* updated only after the update notification is received from
>>> the server, which seems to be safe and clear.
>>>
>>>>> Secondly, if the change hasn't made it to the SB, all the other nodes
>>> would
>>>>> not be able to reach the port, which means the workload (pod/VM) cannot
>>>>> receive traffic yet at this phase.
>>>>>
>>>>
>>>> Even if the change made it to the SB we have no way of knowing that all
>>>> other nodes processed it so we cannot know for sure that traffic can
>>>> flow properly end-to-end.  But, like I said above, this doesn't matter
>>>> if the semantics of ovn-installed=true are "all locally relevant flows
>>>> are installed".
>>>>
>>> It's true that even if SB is updated, it doesn't ensure all the nodes
>>> processed it, but I view it this way: at least from the current node's
>>> point of view, its job is done and the other nodes are beyond its control.
>>> On the other hand, if SB update failed, its job is not done yet. I am not
>>> saying this is the only *correct* way, but just the way I am seeing it :).
>>>
>>>>> So, I think our goal is not to set ovn-installed early, but to set it
>>>>> accurately (sometimes it may be OK to be conservative).
>>>>>
>>>>
>>>> Sure, but waiting for the SB port_binding.chassis update might introduce
>>>> significant spikes in latency if the SB is compacting (or just busy) at
>>>> that moment.
>>>>
>>>> This might become an issue in large scale deployments as pods will take
>>>> longer to be declared "ready".
>>>>
>>> I understand your concern, but if you think about it, how the pods are
>>> *declared* ready doesn't change whether they are actually ready. It
>>> doesn't make the real flow setup faster or slower.
>>> If the CMS really wants to declare it ready earlier, it can just ignore
>>> the ovn-installed flag check or flow check. What's the real benefit except
>>> for metrics?
>>>
>>>>> In addition, ovn-monitor-all is not always true even in ovn-k8s. It is
>>>>> configurable in ovn-k8s. (in our environment we set it to false, to
>>> save
>>>>> MEM and CPU for worker nodes, while sacrificing a little for the central
>>> SB
>>>>> DB)
>>>>>
>>>>
>>>> Ack.  But for this case specifically, as SB is already busier with
>>>> conditional monitoring, I think serializing events in ovn-controller
>>>> will create even more visible delays in pod bringup times.
>>>>
>>>> Thanks,
>>>> Dumitru
>>>>
>>>>> Thanks,
>>>>> Han
>>>>>>>
>>>>>>> Thanks again for your feedback
>>>>>>>
>>>>>>> Xavier
>>>>>>>
>>>>>>> On Thu, Jul 7, 2022 at 8:49 AM Han Zhou <hzhou@ovn.org> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Wed, Jun 22, 2022 at 3:23 AM Xavier Simonart <
>>> xsimonar@redhat.com>
>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> When VIF ports are claimed on a chassis, SBDB Port_Binding table is
>>>>>>>> updated.
>>>>>>>>> If the SBDB IDL is still read-only ("in transaction") when such
>>> an
>>>>>>>> update
>>>>>>>>> is required, the update is not possible and recompute is triggered
>>>>>>>> through
>>>>>>>>> I+P failure.
>>>>>>>>>
>>>>>>>>> This situation can happen:
>>>>>>>>> - after updating Port_Binding->chassis to SBDB for one port, in a
>>>>>>>> following
>>>>>>>>>   iteration, ovn-controller handles
>>>>> Interface:external_ids:ovn-installed
>>>>>>>>>   (for the same port) while SBDB is still read-only.
>>>>>>>>> - after updating Port_Binding->chassis to SBDB for one port, in a
>>>>>>>> following
>>>>>>>>>   iteration, ovn-controller updates Port_Binding->chassis for
>>> another
>>>>>>>> port,
>>>>>>>>>   while SBDB is still read-only.
>>>>>>>>>
>>>>>>>>> This patch prevents the recompute, by having the if-status module
>>>>>>>>> updating the Port_Binding chassis (if needed) when possible.
>>>>>>>>> This does not delay Port_Binding chassis update compared to before
>>>>> this
>>>>>>>> patch.
>>>>>>>>> - With the patch, Port_Binding chassis will be updated as soon as
>>>>> SBDB is
>>>>>>>>> again writable, without recompute.
>>>>>>>>> - Without the patch, Port_Binding chassis was updated as soon as
>>> SBDB
>>>>> was
>>>>>>>>> again writable, through a recompute.
>>>>>>>>>
>>>>>>>>> As part of this patch, ovn-installed will not be updated for additional chassis;
>>>>>>>>> it will only be updated when the migration is completed.
>>>>>>>>>
>>>>>>>>> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2059253
>>>>>>>>> Signed-off-by: Xavier Simonart <xsimonar@redhat.com>
>>>>>>>>>
>>>>>>>>> ---
>>>>>>>>> v2:  - handled Dumitru's comments.
>>>>>>>>>      - handled Han's comments, mainly ensure we move out of CLAIMED state
>>>>>>>>>        only after updating pb->chassis, to guarantee physical flows are
>>>>>>>>>        installed when ovn-installed is updated in OVS.
>>>>>>>>>      - slightly reorganized the code to isolate 'notify_up = false' cases
>>>>>>>>>        in claim_port (i.e. ports such as virtual ports), with the idea of
>>>>>>>>>        enabling a future patch that prevents recomputes when virtual ports
>>>>>>>>>        are claimed.
>>>>>>>>>      - updated test case to cause more race conditions.
>>>>>>>>>      - rebased on origin/main
>>>>>>>>>      - note that "additional chassis" as now supported by
>>>>>>>>>        "Support LSP:options:requested-chassis as a list" might still cause
>>>>>>>>>        recomputes.
>>>>>>>>>      - fixed missing flows when Port_Binding chassis was updated by
>>>>>>>>>        mgr_update w/o any lflow recalculation.
>>>>>>>>> v3:  - handled Dumitru's comments on v2, mainly have the runtime_data handler
>>>>>>>>>        handle pb_claims when sb becomes writable (instead of a lflow
>>>>>>>>>        handler).
>>>>>>>>>      - fixed test as it was not checking recomputes on all hv, as well as
>>>>>>>>>        a flaky behavior.
>>>>>>>>>      - rebased on origin/main.
>>>>>>>>> ---
>>>>>>>>>  controller/binding.c        | 154 +++++++++++++++++++++----------
>>>>>>>>>  controller/binding.h        |  15 +++-
>>>>>>>>>  controller/if-status.c      | 174 ++++++++++++++++++++++++++++++++----
>>>>>>>>>  controller/if-status.h      |  16 +++-
>>>>>>>>>  controller/ovn-controller.c |  72 ++++++++++++++-
>>>>>>>>>  tests/ovn-macros.at         |  12 +++
>>>>>>>>>  tests/ovn.at                | 147 +++++++++++++++++++++++++++++-
>>>>>>>>>  tests/perf-northd.at        |  17 ----
>>>>>>>>>  8 files changed, 519 insertions(+), 88 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/controller/binding.c b/controller/binding.c
>>>>>>>>> index 2279570f9..b21577f71 100644
>>>>>>>>> --- a/controller/binding.c
>>>>>>>>> +++ b/controller/binding.c
>>>>>>>>> @@ -644,11 +644,17 @@ local_binding_get_lport_ofport(const struct
>>>>> shash
>>>>>>>> *local_bindings,
>>>>>>>>>  }
>>>>>>>>>
>>>>>>>>>  bool
>>>>>>>>> -local_binding_is_up(struct shash *local_bindings, const char
>>>>> *pb_name)
>>>>>>>>> +local_binding_is_up(struct shash *local_bindings, const char
>>>>> *pb_name,
>>>>>>>>> +                    const struct sbrec_chassis *chassis_rec)
>>>>>>>>>  {
>>>>>>>>>      struct local_binding *lbinding =
>>>>>>>>>          local_binding_find(local_bindings, pb_name);
>>>>>>>>>      struct binding_lport *b_lport =
>>>>>>>> local_binding_get_primary_lport(lbinding);
>>>>>>>>> +
>>>>>>>>> +    if (b_lport && b_lport->pb->chassis != chassis_rec) {
>>>>>>>>> +        return false;
>>>>>>>>> +    }
>>>>>>>>> +
>>>>>>>>>      if (lbinding && b_lport && lbinding->iface) {
>>>>>>>>>          if (b_lport->pb->n_up && !b_lport->pb->up[0]) {
>>>>>>>>>              return false;
>>>>>>>>> @@ -660,13 +666,23 @@ local_binding_is_up(struct shash
>>>>> *local_bindings,
>>>>>>>> const char *pb_name)
>>>>>>>>>  }
>>>>>>>>>
>>>>>>>>>  bool
>>>>>>>>> -local_binding_is_down(struct shash *local_bindings, const char
>>>>> *pb_name)
>>>>>>>>> +local_binding_is_down(struct shash *local_bindings, const char
>>>>> *pb_name,
>>>>>>>>> +                      const struct sbrec_chassis *chassis_rec)
>>>>>>>>>  {
>>>>>>>>>      struct local_binding *lbinding =
>>>>>>>>>          local_binding_find(local_bindings, pb_name);
>>>>>>>>>
>>>>>>>>>      struct binding_lport *b_lport =
>>>>>>>> local_binding_get_primary_lport(lbinding);
>>>>>>>>>
>>>>>>>>> +    if (b_lport) {
>>>>>>>>> +        if (b_lport->pb->chassis == chassis_rec) {
>>>>>>>>> +            return false;
>>>>>>>>> +        } else if (b_lport->pb->chassis) {
>>>>>>>>> +            VLOG_DBG("lport %s already claimed by other chassis",
>>>>>>>>> +                     b_lport->pb->logical_port);
>>>>>>>>> +        }
>>>>>>>>> +    }
>>>>>>>>> +
>>>>>>>>>      if (!lbinding) {
>>>>>>>>>          return true;
>>>>>>>>>      }
>>>>>>>>> @@ -884,37 +900,80 @@ get_lport_type_str(enum en_lport_type
>>>>> lport_type)
>>>>>>>>>      OVS_NOT_REACHED();
>>>>>>>>>  }
>>>>>>>>>
>>>>>>>>> -/* For newly claimed ports, if 'notify_up' is 'false':
>>>>>>>>> +void
>>>>>>>>> +set_pb_chassis_in_sbrec(const struct sbrec_port_binding *pb,
>>>>>>>>> +                        const struct sbrec_chassis *chassis_rec,
>>>>>>>>> +                        bool is_set)
>>>>>>>>> +{
>>>>>>>>> +    if (pb->chassis != chassis_rec) {
>>>>>>>>> +         if (is_set) {
>>>>>>>>> +            if (pb->chassis) {
>>>>>>>>> +                VLOG_INFO("Changing chassis for lport %s from %s
>>> to
>>>>>>>> %s.",
>>>>>>>>> +                          pb->logical_port, pb->chassis->name,
>>>>>>>>> +                          chassis_rec->name);
>>>>>>>>> +            } else {
>>>>>>>>> +                VLOG_INFO("Claiming lport %s for this chassis.",
>>>>>>>>> +                          pb->logical_port);
>>>>>>>>> +            }
>>>>>>>>> +            for (int i = 0; i < pb->n_mac; i++) {
>>>>>>>>> +                VLOG_INFO("%s: Claiming %s", pb->logical_port,
>>>>>>>> pb->mac[i]);
>>>>>>>>> +            }
>>>>>>>>> +            sbrec_port_binding_set_chassis(pb, chassis_rec);
>>>>>>>>> +        }
>>>>>>>>> +    } else if (!is_set) {
>>>>>>>>> +        sbrec_port_binding_set_chassis(pb, NULL);
>>>>>>>>> +    }
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +void
>>>>>>>>> +local_binding_set_pb(struct shash *local_bindings, const char
>>>>> *pb_name,
>>>>>>>>> +                     const struct sbrec_chassis *chassis_rec,
>>>>>>>>> +                     struct hmap *tracked_datapaths, bool is_set)
>>>>>>>>> +{
>>>>>>>>> +    struct local_binding *lbinding =
>>>>>>>>> +        local_binding_find(local_bindings, pb_name);
>>>>>>>>> +    struct binding_lport *b_lport =
>>>>>>>> local_binding_get_primary_lport(lbinding);
>>>>>>>>> +
>>>>>>>>> +    if (b_lport) {
>>>>>>>>> +        set_pb_chassis_in_sbrec(b_lport->pb, chassis_rec, is_set);
>>>>>>>>> +        if (tracked_datapaths) {
>>>>>>>>> +            update_lport_tracking(b_lport->pb, tracked_datapaths,
>>>>> true);
>>>>>>>>> +        }
>>>>>>>>> +    }
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +/* For newly claimed ports:
>>>>>>>>>   * - set the 'pb.up' field to true if 'pb' has no 'parent_pb'.
>>>>>>>>>   * - set the 'pb.up' field to true if 'parent_pb.up' is 'true'
>>> (e.g.,
>>>>>>>> for
>>>>>>>>>   *   container and virtual ports).
>>>>>>>>> - * Otherwise request a notification to be sent when the OVS flows
>>>>>>>>> - * corresponding to 'pb' have been installed.
>>>>>>>>> + *
>>>>>>>>> + * Returns false if lport is not claimed due to 'sb_readonly'.
>>>>>>>>> + * Returns true otherwise.
>>>>>>>>>   *
>>>>>>>>>   * Note:
>>>>>>>>> - *   Updates (directly or through a notification) the 'pb->up'
>>> field
>>>>>>>> only if
>>>>>>>>> - *   it's explicitly set to 'false'.
>>>>>>>>> + *   Updates the 'pb->up' field only if it's explicitly set to
>>>>> 'false'.
>>>>>>>>>   *   This is to ensure compatibility with older versions of
>>>>> ovn-northd.
>>>>>>>>>   */
>>>>>>>>> -static void
>>>>>>>>> +static bool
>>>>>>>>>  claimed_lport_set_up(const struct sbrec_port_binding *pb,
>>>>>>>>>                       const struct sbrec_port_binding *parent_pb,
>>>>>>>>> -                     const struct sbrec_chassis *chassis_rec,
>>>>>>>>> -                     bool notify_up, struct if_status_mgr *if_mgr)
>>>>>>>>> +                     bool sb_readonly)
>>>>>>>>>  {
>>>>>>>>> -    if (!notify_up) {
>>>>>>>>> -        bool up = true;
>>>>>>>>> -        if (!parent_pb || (parent_pb->n_up && parent_pb->up[0])) {
>>>>>>>>> +    /* When notify_up is false in claim_port(), no state is
>>> created
>>>>>>>>> +     * by if_status_mgr. In such cases, return false (i.e. trigger
>>>>>>>> recompute)
>>>>>>>>> +     * if we can't update sb (because it is readonly).
>>>>>>>>> +     */
>>>>>>>>> +    bool up = true;
>>>>>>>>> +    if (!parent_pb || (parent_pb->n_up && parent_pb->up[0])) {
>>>>>>>>> +        if (!sb_readonly) {
>>>>>>>>>              if (pb->n_up) {
>>>>>>>>>                  sbrec_port_binding_set_up(pb, &up, 1);
>>>>>>>>>              }
>>>>>>>>> +        } else if (pb->n_up && !pb->up[0]) {
>>>>>>>>> +            return false;
>>>>>>>>>          }
>>>>>>>>> -        return;
>>>>>>>>> -    }
>>>>>>>>> -
>>>>>>>>> -    if (pb->chassis != chassis_rec || (pb->n_up && !pb->up[0])) {
>>>>>>>>> -        if_status_mgr_claim_iface(if_mgr, pb->logical_port);
>>>>>>>>>      }
>>>>>>>>> +    return true;
>>>>>>>>>  }
>>>>>>>>>
>>>>>>>>>  typedef void (*set_func)(const struct sbrec_port_binding *pb,
>>>>>>>>> @@ -1057,37 +1116,35 @@ claim_lport(const struct sbrec_port_binding
>>>>> *pb,
>>>>>>>>>              struct hmap *tracked_datapaths,
>>>>>>>>>              struct if_status_mgr *if_mgr)
>>>>>>>>>  {
>>>>>>>>> -    if (!sb_readonly) {
>>>>>>>>> -        claimed_lport_set_up(pb, parent_pb, chassis_rec,
>>> notify_up,
>>>>>>>> if_mgr);
>>>>>>>>> -    }
>>>>>>>>> -
>>>>>>>>>      enum can_bind can_bind =
>>>>>>>> lport_can_bind_on_this_chassis(chassis_rec, pb);
>>>>>>>>>      bool update_tracked = false;
>>>>>>>>>
>>>>>>>>>      if (can_bind == CAN_BIND_AS_MAIN) {
>>>>>>>>>          if (pb->chassis != chassis_rec) {
>>>>>>>>> -            if (sb_readonly) {
>>>>>>>>> -                return false;
>>>>>>>>> -            }
>>>>>>>>> -
>>>>>>>>> -            if (pb->chassis) {
>>>>>>>>> -                VLOG_INFO("Changing chassis for lport %s from %s
>>> to
>>>>>>>> %s.",
>>>>>>>>> -                        pb->logical_port, pb->chassis->name,
>>>>>>>>> -                        chassis_rec->name);
>>>>>>>>> -            } else {
>>>>>>>>> -                VLOG_INFO("Claiming lport %s for this chassis.",
>>>>>>>>> -                          pb->logical_port);
>>>>>>>>> -            }
>>>>>>>>> -            for (size_t i = 0; i < pb->n_mac; i++) {
>>>>>>>>> -                VLOG_INFO("%s: Claiming %s", pb->logical_port,
>>>>>>>> pb->mac[i]);
>>>>>>>>> -            }
>>>>>>>>> -
>>>>>>>>> -            sbrec_port_binding_set_chassis(pb, chassis_rec);
>>>>>>>>>              if (is_additional_chassis(pb, chassis_rec)) {
>>>>>>>>> +                if (sb_readonly) {
>>>>>>>>> +                    return false;
>>>>>>>>> +                }
>>>>>>>>>                  remove_additional_chassis(pb, chassis_rec);
>>>>>>>>>              }
>>>>>>>>>              update_tracked = true;
>>>>>>>>>          }
>>>>>>>>> +        if (!notify_up) {
>>>>>>>>> +            if (!claimed_lport_set_up(pb, parent_pb,
>>> sb_readonly)) {
>>>>>>>>> +                return false;
>>>>>>>>> +            }
>>>>>>>>> +            if (pb->chassis != chassis_rec) {
>>>>>>>>> +                if (sb_readonly) {
>>>>>>>>> +                    return false;
>>>>>>>>> +                }
>>>>>>>>> +                set_pb_chassis_in_sbrec(pb, chassis_rec, true);
>>>>>>>>> +            }
>>>>>>>>> +        } else {
>>>>>>>>> +            if ((pb->chassis != chassis_rec) || (pb->n_up &&
>>>>>>>> !pb->up[0])) {
>>>>>>>>> +                if_status_mgr_claim_iface(if_mgr, pb, chassis_rec,
>>>>>>>>> +                                          sb_readonly);
>>>>>>>>> +            }
>>>>>>>>> +        }
>>>>>>>>>      } else if (can_bind == CAN_BIND_AS_ADDITIONAL) {
>>>>>>>>>          if (!is_additional_chassis(pb, chassis_rec)) {
>>>>>>>>>              if (sb_readonly) {
>>>>>>>>> @@ -1132,7 +1189,8 @@ claim_lport(const struct sbrec_port_binding
>>> *pb,
>>>>>>>>>   */
>>>>>>>>>  static bool
>>>>>>>>>  release_lport_main_chassis(const struct sbrec_port_binding *pb,
>>>>>>>>> -                           bool sb_readonly)
>>>>>>>>> +                           bool sb_readonly,
>>>>>>>>> +                           struct if_status_mgr *if_mgr)
>>>>>>>>>  {
>>>>>>>>>      if (pb->encap) {
>>>>>>>>>          if (sb_readonly) {
>>>>>>>>> @@ -1141,11 +1199,13 @@ release_lport_main_chassis(const struct
>>>>>>>> sbrec_port_binding *pb,
>>>>>>>>>          sbrec_port_binding_set_encap(pb, NULL);
>>>>>>>>>      }
>>>>>>>>>
>>>>>>>>> +    /* If sb readonly, pb->chassis unset through if-status if
>>>>> present.
>>>>>>>> */
>>>>>>>>>      if (pb->chassis) {
>>>>>>>>> -        if (sb_readonly) {
>>>>>>>>> +        if (!sb_readonly) {
>>>>>>>>> +            sbrec_port_binding_set_chassis(pb, NULL);
>>>>>>>>> +        } else if (!if_status_mgr_iface_is_present(if_mgr,
>>>>>>>> pb->logical_port)) {
>>>>>>>>>              return false;
>>>>>>>>>          }
>>>>>>>>> -        sbrec_port_binding_set_chassis(pb, NULL);
>>>>>>>>>      }
>>>>>>>>>
>>>>>>>>>      if (pb->virtual_parent) {
>>>>>>>>> @@ -1155,7 +1215,8 @@ release_lport_main_chassis(const struct
>>>>>>>> sbrec_port_binding *pb,
>>>>>>>>>          sbrec_port_binding_set_virtual_parent(pb, NULL);
>>>>>>>>>      }
>>>>>>>>>
>>>>>>>>> -    VLOG_INFO("Releasing lport %s from this chassis.",
>>>>>>>> pb->logical_port);
>>>>>>>>> +    VLOG_INFO("Releasing lport %s from this chassis
>>>>> (sb_readonly=%d)",
>>>>>>>>> +              pb->logical_port, sb_readonly);
>>>>>>>>>      return true;
>>>>>>>>>  }
>>>>>>>>>
>>>>>>>>> @@ -1189,7 +1250,7 @@ release_lport(const struct sbrec_port_binding
>>>>> *pb,
>>>>>>>>>                struct hmap *tracked_datapaths, struct if_status_mgr
>>>>>>>> *if_mgr)
>>>>>>>>>  {
>>>>>>>>>      if (pb->chassis == chassis_rec) {
>>>>>>>>> -        if (!release_lport_main_chassis(pb, sb_readonly)) {
>>>>>>>>> +        if (!release_lport_main_chassis(pb, sb_readonly, if_mgr))
>>> {
>>>>>>>>>              return false;
>>>>>>>>>          }
>>>>>>>>>      } else if (is_additional_chassis(pb, chassis_rec)) {
>>>>>>>>> @@ -1271,7 +1332,7 @@ consider_vif_lport_(const struct
>>>>>>>> sbrec_port_binding *pb,
>>>>>>>>>                               b_lport->lbinding->iface,
>>>>>>>>>                               !b_ctx_in->ovnsb_idl_txn,
>>>>>>>>>                               !parent_pb,
>>>>> b_ctx_out->tracked_dp_bindings,
>>>>>>>>> -                             b_ctx_out->if_mgr)){
>>>>>>>>> +                             b_ctx_out->if_mgr)) {
>>>>>>>>>                  return false;
>>>>>>>>>              }
>>>>>>>>>
>>>>>>>>> @@ -1527,7 +1588,8 @@ consider_localport(const struct
>>>>> sbrec_port_binding
>>>>>>>> *pb,
>>>>>>>>>      enum can_bind can_bind = lport_can_bind_on_this_chassis(
>>>>>>>>>          b_ctx_in->chassis_rec, pb);
>>>>>>>>>      if (can_bind == CAN_BIND_AS_MAIN) {
>>>>>>>>> -        if (!release_lport_main_chassis(pb,
>>>>> !b_ctx_in->ovnsb_idl_txn)) {
>>>>>>>>> +        if (!release_lport_main_chassis(pb,
>>> !b_ctx_in->ovnsb_idl_txn,
>>>>>>>>> +            b_ctx_out->if_mgr)) {
>>>>>>>>>              return false;
>>>>>>>>>          }
>>>>>>>>>      } else if (can_bind == CAN_BIND_AS_ADDITIONAL) {
>>>>>>>>> diff --git a/controller/binding.h b/controller/binding.h
>>>>>>>>> index 1fed06674..d20659b0b 100644
>>>>>>>>> --- a/controller/binding.h
>>>>>>>>> +++ b/controller/binding.h
>>>>>>>>> @@ -151,8 +151,10 @@ const struct sbrec_port_binding
>>>>>>>> *local_binding_get_primary_pb(
>>>>>>>>>  ofp_port_t local_binding_get_lport_ofport(const struct shash
>>>>>>>> *local_bindings,
>>>>>>>>>                                            const char *pb_name);
>>>>>>>>>
>>>>>>>>> -bool local_binding_is_up(struct shash *local_bindings, const char
>>>>>>>> *pb_name);
>>>>>>>>> -bool local_binding_is_down(struct shash *local_bindings, const
>>> char
>>>>>>>> *pb_name);
>>>>>>>>> +bool local_binding_is_up(struct shash *local_bindings, const char
>>>>>>>> *pb_name,
>>>>>>>>> +                         const struct sbrec_chassis *);
>>>>>>>>> +bool local_binding_is_down(struct shash *local_bindings, const
>>> char
>>>>>>>> *pb_name,
>>>>>>>>> +                           const struct sbrec_chassis *);
>>>>>>>>>  void local_binding_set_up(struct shash *local_bindings, const char
>>>>>>>> *pb_name,
>>>>>>>>>                            const struct sbrec_chassis *chassis_rec,
>>>>>>>>>                            const char *ts_now_str, bool
>>> sb_readonly,
>>>>>>>>> @@ -160,7 +162,10 @@ void local_binding_set_up(struct shash
>>>>>>>> *local_bindings, const char *pb_name,
>>>>>>>>>  void local_binding_set_down(struct shash *local_bindings, const
>>> char
>>>>>>>> *pb_name,
>>>>>>>>>                              const struct sbrec_chassis
>>> *chassis_rec,
>>>>>>>>>                              bool sb_readonly, bool ovs_readonly);
>>>>>>>>> -
>>>>>>>>> +void local_binding_set_pb(struct shash *local_bindings, const char
>>>>>>>> *pb_name,
>>>>>>>>> +                          const struct sbrec_chassis *chassis_rec,
>>>>>>>>> +                          struct hmap *tracked_datapaths,
>>>>>>>>> +                          bool is_set);
>>>>>>>>>  void binding_register_ovs_idl(struct ovsdb_idl *);
>>>>>>>>>  void binding_run(struct binding_ctx_in *, struct binding_ctx_out
>>> *);
>>>>>>>>>  bool binding_cleanup(struct ovsdb_idl_txn *ovnsb_idl_txn,
>>>>>>>>> @@ -178,6 +183,10 @@ void binding_dump_local_bindings(struct
>>>>>>>> local_binding_data *, struct ds *);
>>>>>>>>>  bool is_additional_chassis(const struct sbrec_port_binding *pb,
>>>>>>>>>                             const struct sbrec_chassis
>>> *chassis_rec);
>>>>>>>>>
>>>>>>>>> +void set_pb_chassis_in_sbrec(const struct sbrec_port_binding *pb,
>>>>>>>>> +                             const struct sbrec_chassis
>>> *chassis_rec,
>>>>>>>>> +                             bool is_set);
>>>>>>>>> +
>>>>>>>>>  /* Corresponds to each Port_Binding.type. */
>>>>>>>>>  enum en_lport_type {
>>>>>>>>>      LP_UNKNOWN,
>>>>>>>>> diff --git a/controller/if-status.c b/controller/if-status.c
>>>>>>>>> index ad61844d8..7693c289b 100644
>>>>>>>>> --- a/controller/if-status.c
>>>>>>>>> +++ b/controller/if-status.c
>>>>>>>>> @@ -24,6 +24,7 @@
>>>>>>>>>  #include "lib/util.h"
>>>>>>>>>  #include "timeval.h"
>>>>>>>>>  #include "openvswitch/vlog.h"
>>>>>>>>> +#include "lib/ovn-sb-idl.h"
>>>>>>>>>
>>>>>>>>>  VLOG_DEFINE_THIS_MODULE(if_status);
>>>>>>>>>
>>>>>>>>> @@ -53,9 +54,11 @@ VLOG_DEFINE_THIS_MODULE(if_status);
>>>>>>>>>   */
>>>>>>>>>
>>>>>>>>>  enum if_state {
>>>>>>>>> -    OIF_CLAIMED,       /* Newly claimed interface. */
>>>>>>>>> -    OIF_INSTALL_FLOWS, /* Already claimed interface for which
>>> flows
>>>>> are
>>>>>>>> still
>>>>>>>>> -                        * being installed.
>>>>>>>>> +    OIF_CLAIMED,       /* Newly claimed interface. pb->chassis not
>>>>> yet
>>>>>>>> updated.
>>>>>>>>> +                        */
>>>>>>>>> +    OIF_INSTALL_FLOWS, /* Claimed interface with pb->chassis
>>>>>>>> successfully
>>>>>>>>> +                        * updated in SB and for which flows are
>>> still
>>>>>>>> being
>>>>>>>>> +                        * installed.
>>>>>>>>>                          */
>>>>>>>>>      OIF_MARK_UP,       /* Interface with flows successfully
>>> installed
>>>>>>>> in OVS
>>>>>>>>>                          * but not yet marked "up" in the binding
>>>>> module
>>>>>>>> (in
>>>>>>>>> @@ -78,6 +81,55 @@ static const char *if_state_names[] = {
>>>>>>>>>      [OIF_INSTALLED]     = "INSTALLED",
>>>>>>>>>  };
>>>>>>>>>
>>>>>>>>> +/*
>>>>>>>>> + *       +----------------------+
>>>>>>>>> + * +---> |                      |
>>>>>>>>> + * | +-> |         NULL         | <--------------------------------------+++-+
>>>>>>>>> + * | |   +----------------------+                                            |
>>>>>>>>> + * | |     ^ release_iface   | claim_iface                                   |
>>>>>>>>> + * | |     |                 V - sbrec_update_chassis(if sb is rw)           |
>>>>>>>>> + * | |   +----------------------+                                            |
>>>>>>>>> + * | |   |                      | <----------------------------------------+ |
>>>>>>>>> + * | |   |       CLAIMED        | <--------------------------------------+ | |
>>>>>>>>> + * | |   +----------------------+                                        | | |
>>>>>>>>> + * | |                  | mgr_update(when sb is rw)                      | | |
>>>>>>>>> + * | | release_iface    |  - sbrec_update_chassis                        | | |
>>>>>>>>> + * | |                  |  - request seqno                               | | |
>>>>>>>>> + * | |                  V                                                | | |
>>>>>>>>> + * | |   +----------------------+                                        | | |
>>>>>>>>> + * | +-- |                      |  mgr_run(seqno not rcvd)               | | |
>>>>>>>>> + * |     |    INSTALL_FLOWS     |   - set port down in sb                | | |
>>>>>>>>> + * |     |                      |  mgr_update()                          | | |
>>>>>>>>> + * |     +----------------------+   - sbrec_update_chassis if needed     | | |
>>>>>>>>> + * |                    |                                                | | |
>>>>>>>>> + * |                    |  mgr_run(seqno rcvd)                           | | |
>>>>>>>>> + * |                    |  - set port up in sb                           | | |
>>>>>>>>> + * | release_iface      |  - set ovn-installed in ovs                    | | |
>>>>>>>>> + * |                    V                                                | | |
>>>>>>>>> + * |   +----------------------+                                          | | |
>>>>>>>>> + * |   |                      |  mgr_run()                               | | |
>>>>>>>>> + * +-- |       MARK_UP        |  - set port up in sb                     | | |
>>>>>>>>> + *     |                      |  - set ovn-installed in ovs              | | |
>>>>>>>>> + *     |                      |  mgr_update()                            | | |
>>>>>>>>> + *     +----------------------+  - sbrec_update_chassis if needed        | | |
>>>>>>>>> + *              |                                                        | | |
>>>>>>>>> + *              | mgr_update(rcvd port up / ovn_installed & chassis set) | | |
>>>>>>>>> + *              V                                                        | | |
>>>>>>>>> + *     +----------------------+                                          | | |
>>>>>>>>> + *     |      INSTALLED       | ------------> claim_iface ---------------+ | |
>>>>>>>>> + *     +----------------------+                                            | |
>>>>>>>>> + *              |                                                          | |
>>>>>>>>> + *              | release_iface                                            | |
>>>>>>>>> + *              V                                                          | |
>>>>>>>>> + *     +----------------------+                                            | |
>>>>>>>>> + *     |                      | ------------> claim_iface -----------------+ |
>>>>>>>>> + *     |      MARK_DOWN       | ------> mgr_update(rcvd port down) ----------+
>>>>>>>>> + *     |                      | mgr_run()
>>>>>>>>> + *     |                      | - set port down in sb
>>>>>>>>> + *     |                      | mgr_update()
>>>>>>>>> + *     +----------------------+ - sbrec_update_chassis(NULL)
>>>>>>>>> + */
>>>>>>>>> +
>>>>>>>>>  struct ovs_iface {
>>>>>>>>>      char *id;               /* Extracted from OVS
>>>>>>>> external_ids.iface_id. */
>>>>>>>>>      enum if_state state;    /* State of the interface in the state
>>>>>>>> machine. */
>>>>>>>>> @@ -85,6 +137,7 @@ struct ovs_iface {
>>>>>>>>>                               * be fully programmed in OVS.  Only
>>> used
>>>>>>>> in state
>>>>>>>>>                               * OIF_INSTALL_FLOWS.
>>>>>>>>>                               */
>>>>>>>>> +    bool chassis_update_required;  /* If true, pb->chassis must be
>>>>>>>> updated. */
>>>>>>>>>  };
>>>>>>>>>
>>>>>>>>>  static uint64_t ifaces_usage;
>>>>>>>>> @@ -158,14 +211,23 @@ if_status_mgr_destroy(struct if_status_mgr
>>> *mgr)
>>>>>>>>>  }
>>>>>>>>>
>>>>>>>>>  void
>>>>>>>>> -if_status_mgr_claim_iface(struct if_status_mgr *mgr, const char
>>>>>>>> *iface_id)
>>>>>>>>> +if_status_mgr_claim_iface(struct if_status_mgr *mgr,
>>>>>>>>> +                          const struct sbrec_port_binding *pb,
>>>>>>>>> +                          const struct sbrec_chassis *chassis_rec,
>>>>>>>>> +                          bool sb_readonly)
>>>>>>>>>  {
>>>>>>>>> +    const char *iface_id = pb->logical_port;
>>>>>>>>>      struct ovs_iface *iface = shash_find_data(&mgr->ifaces,
>>>>> iface_id);
>>>>>>>>>
>>>>>>>>>      if (!iface) {
>>>>>>>>>          iface = ovs_iface_create(mgr, iface_id, OIF_CLAIMED);
>>>>>>>>>      }
>>>>>>>>> -
>>>>>>>>> +    if (!sb_readonly) {
>>>>>>>>> +        set_pb_chassis_in_sbrec(pb, chassis_rec, true);
>>>>>>>>> +        iface->chassis_update_required = false;
>>>>>>>>> +    } else {
>>>>>>>>> +        iface->chassis_update_required = true;
>>>>>>>>> +    }
>>>>>>>>>      switch (iface->state) {
>>>>>>>>>      case OIF_CLAIMED:
>>>>>>>>>      case OIF_INSTALL_FLOWS:
>>>>>>>>> @@ -182,6 +244,12 @@ if_status_mgr_claim_iface(struct if_status_mgr
>>>>>>>> *mgr, const char *iface_id)
>>>>>>>>>      }
>>>>>>>>>  }
>>>>>>>>>
>>>>>>>>> +bool
>>>>>>>>> +if_status_mgr_iface_is_present(struct if_status_mgr *mgr, const
>>> char
>>>>>>>> *iface_id)
>>>>>>>>> +{
>>>>>>>>> +    return !!shash_find_data(&mgr->ifaces, iface_id);
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>>  void
>>>>>>>>>  if_status_mgr_release_iface(struct if_status_mgr *mgr, const char
>>>>>>>> *iface_id)
>>>>>>>>>  {
>>>>>>>>> @@ -246,9 +314,39 @@ if_status_mgr_delete_iface(struct
>>> if_status_mgr
>>>>>>>> *mgr, const char *iface_id)
>>>>>>>>>      }
>>>>>>>>>  }
>>>>>>>>>
>>>>>>>>> +bool
>>>>>>>>> +if_status_handle_claims(struct if_status_mgr *mgr,
>>>>>>>>> +                        struct local_binding_data *binding_data,
>>>>>>>>> +                        const struct sbrec_chassis *chassis_rec,
>>>>>>>>> +                        struct hmap *tracked_datapath,
>>>>>>>>> +                        bool sb_readonly)
>>>>>>>>> +{
>>>>>>>>> +    if (!binding_data || sb_readonly) {
>>>>>>>>> +        return false;
>>>>>>>>> +    }
>>>>>>>>> +
>>>>>>>>> +    struct shash *bindings = &binding_data->bindings;
>>>>>>>>> +    struct hmapx_node *node;
>>>>>>>>> +
>>>>>>>>> +    bool rc = false;
>>>>>>>>> +    HMAPX_FOR_EACH (node, &mgr->ifaces_per_state[OIF_CLAIMED]) {
>>>>>>>>> +        struct ovs_iface *iface = node->data;
>>>>>>>>> +        if (iface->chassis_update_required) {
>>>>>>>>
>>>>>>>> Thanks Xavier for the revision. The state machine looks clearer now,
>>>>>>>> but I have a major concern about the use of chassis_update_required.
>>>>>>>> This bool flag is used to decide whether an update to SB is needed.
>>>>>>>> Once an SB update is requested, the flag is set to false, which
>>>>>>>> assumes the SB update will succeed immediately. However, that
>>>>>>>> assumption may be wrong. There can be various reasons why the
>>>>>>>> subsequent SB update fails or is delayed, so this flag is not
>>>>>>>> reliable. Instead, the CLAIMED state should carry the responsibility
>>>>>>>> of making sure the SB update is completed. If a transaction is in
>>>>>>>> progress, sb_readonly is true. So if sb_readonly is false, nothing is
>>>>>>>> in progress, and we can always check: if (!sb_readonly && <SB chassis
>>>>>>>> is not updated for the port-binding>), we should just send the update,
>>>>>>>> regardless of whether we have requested it before. Please also see
>>>>>>>> another comment below for the state transition.
>>>>>>>>
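For illustration only (this is not part of the patch): a minimal sketch of
the check suggested above, where the decision to send the update is derived
from what SB currently shows rather than from a remembered flag. The types
chassis_sketch and pb_sketch and the function claimed_needs_chassis_update()
are hypothetical stand-ins, not OVN APIs.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct sbrec_chassis and
 * struct sbrec_port_binding; only the fields used here. */
struct chassis_sketch {
    const char *name;
};

struct pb_sketch {
    const char *logical_port;
    const struct chassis_sketch *chassis;   /* Value currently seen in SB. */
};

/* Returns true if an interface in CLAIMED state should (re)send the
 * pb->chassis update in this iteration.  The decision depends only on
 * the current SB contents and on whether SB is writable, so a failed or
 * delayed transaction is retried on a later iteration. */
bool
claimed_needs_chassis_update(const struct pb_sketch *pb,
                             const struct chassis_sketch *chassis_rec,
                             bool sb_readonly)
{
    if (sb_readonly) {
        return false;       /* A transaction is in flight; try later. */
    }
    return pb->chassis != chassis_rec;
}
```

With such a predicate no per-interface flag has to be kept in sync with the
outcome of the transaction; the check is simply repeated until SB shows the
expected chassis.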
>>>>>>>>> +            VLOG_INFO("if_status_handle_claims for %s",
>>> iface->id);
>>>>>>>>> +            local_binding_set_pb(bindings, iface->id, chassis_rec,
>>>>>>>>> +                                 tracked_datapath, true);
>>>>>>>>> +            rc = true;
>>>>>>>>> +        }
>>>>>>>>> +        iface->chassis_update_required = false;
>>>>>>>>> +    }
>>>>>>>>> +    return rc;
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>>  void
>>>>>>>>>  if_status_mgr_update(struct if_status_mgr *mgr,
>>>>>>>>> -                     struct local_binding_data *binding_data)
>>>>>>>>> +                     struct local_binding_data *binding_data,
>>>>>>>>> +                     const struct sbrec_chassis *chassis_rec,
>>>>>>>>> +                     bool sb_readonly)
>>>>>>>>>  {
>>>>>>>>>      if (!binding_data) {
>>>>>>>>>          return;
>>>>>>>>> @@ -257,13 +355,25 @@ if_status_mgr_update(struct if_status_mgr
>>> *mgr,
>>>>>>>>>      struct shash *bindings = &binding_data->bindings;
>>>>>>>>>      struct hmapx_node *node;
>>>>>>>>>
>>>>>>>>> +    /* Interfaces in OIF_MARK_UP state have already set their
>>>>>>>> pb->chassis.
>>>>>>>>> +     * However, it might have been reset by another hv.
>>>>>>>>> +     */
>>>>>>>>>      /* Move all interfaces that have been confirmed "up" by the
>>>>> binding
>>>>>>>> module,
>>>>>>>>>       * from OIF_MARK_UP to OIF_INSTALLED.
>>>>>>>>>       */
>>>>>>>>>      HMAPX_FOR_EACH_SAFE (node,
>>> &mgr->ifaces_per_state[OIF_MARK_UP]) {
>>>>>>>>>          struct ovs_iface *iface = node->data;
>>>>>>>>>
>>>>>>>>> -        if (local_binding_is_up(bindings, iface->id)) {
>>>>>>>>> +        if (iface->chassis_update_required) {
>>>>>>>>> +            if (!sb_readonly) {
>>>>>>>>> +                iface->chassis_update_required = false;
>>>>>>>>> +                local_binding_set_pb(bindings, iface->id,
>>>>> chassis_rec,
>>>>>>>>> +                                     NULL, true);
>>>>>>>>> +            } else {
>>>>>>>>> +                continue;
>>>>>>>>> +            }
>>>>>>>>> +        }
>>>>>>>>> +        if (local_binding_is_up(bindings, iface->id,
>>> chassis_rec)) {
>>>>>>>>>              ovs_iface_set_state(mgr, iface, OIF_INSTALLED);
>>>>>>>>>          }
>>>>>>>>>      }
>>>>>>>>> @@ -274,23 +384,53 @@ if_status_mgr_update(struct if_status_mgr
>>> *mgr,
>>>>>>>>>      HMAPX_FOR_EACH_SAFE (node,
>>>>> &mgr->ifaces_per_state[OIF_MARK_DOWN]) {
>>>>>>>>>          struct ovs_iface *iface = node->data;
>>>>>>>>>
>>>>>>>>> -        if (local_binding_is_down(bindings, iface->id)) {
>>>>>>>>> +        if (!sb_readonly) {
>>>>>>>>> +            local_binding_set_pb(bindings, iface->id, chassis_rec,
>>>>>>>>> +                                 NULL, false);
>>>>>>>>> +        }
>>>>>>>>> +        if (local_binding_is_down(bindings, iface->id,
>>> chassis_rec))
>>>>> {
>>>>>>>>>              ovs_iface_destroy(mgr, iface);
>>>>>>>>>          }
>>>>>>>>>      }
>>>>>>>>>
>>>>>>>>> -    /* Register for a notification about flows being installed in
>>> OVS
>>>>>>>> for all
>>>>>>>>> -     * newly claimed interfaces.
>>>>>>>>> +    if (!sb_readonly) {
>>>>>>>>> +        HMAPX_FOR_EACH_SAFE (node,
>>>>>>>> &mgr->ifaces_per_state[OIF_INSTALL_FLOWS]) {
>>>>>>>>> +            struct ovs_iface *iface = node->data;
>>>>>>>>> +
>>>>>>>>> +            if (iface->chassis_update_required) {
>>>>>>>>> +                iface->chassis_update_required = false;
>>>>>>>>> +                local_binding_set_pb(bindings, iface->id,
>>>>> chassis_rec,
>>>>>>>>> +                                     NULL, true);
>>>>>>>>> +            }
>>>>>>>>> +        }
>>>>>>>>> +    }
>>>>>>>>> +
>>>>>>>>> +    /* Update Port_Binding->chassis for newly claimed interfaces
>>>>>>>>> +     * Register for a notification about flows being installed in
>>> OVS
>>>>>>>> for all
>>>>>>>>> +     * newly claimed interfaces for which we could update
>>>>> pb->chassis.
>>>>>>>>>       *
>>>>>>>>>       * Move them from OIF_CLAIMED to OIF_INSTALL_FLOWS.
>>>>>>>>>       */
>>>>>>>>> -    bool new_ifaces = false;
>>>>>>>>> -    HMAPX_FOR_EACH_SAFE (node,
>>> &mgr->ifaces_per_state[OIF_CLAIMED]) {
>>>>>>>>> -        struct ovs_iface *iface = node->data;
>>>>>>>>>
>>>>>>>>> -        ovs_iface_set_state(mgr, iface, OIF_INSTALL_FLOWS);
>>>>>>>>> -        iface->install_seqno = mgr->iface_seqno + 1;
>>>>>>>>> -        new_ifaces = true;
>>>>>>>>> +    bool new_ifaces = false;
>>>>>>>>> +    if (!sb_readonly) {
>>>>>>>>> +        HMAPX_FOR_EACH_SAFE (node,
>>>>> &mgr->ifaces_per_state[OIF_CLAIMED])
>>>>>>>> {
>>>>>>>>> +            struct ovs_iface *iface = node->data;
>>>>>>>>> +            /* No need to check for chassis_update_required as
>>>>> already
>>>>>>>> done
>>>>>>>>> +             * in if_status_handle_claims or
>>>>> if_status_mgr_claim_iface
>>>>>>>>> +             */
>>>>>>>>> +            ovs_iface_set_state(mgr, iface, OIF_INSTALL_FLOWS);
>>>>>>>>
>>>>>>>> We need to make sure the chassis in SB port-binding is up-to-date
>>> (i.e.
>>>>>>>> the update notification from SB DB has been received) before moving
>>> to
>>>>>>>> INSTALL_FLOWS. Otherwise, it is still possible that the state is
>>> moved
>>>>> too
>>>>>>>> early and end up with incomplete flow installation for the lport
>>> when
>>>>> the
>>>>>>>> state is finally moved to INSTALLED.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Han
>>>>>>>>
>>>>>>>>> +            iface->install_seqno = mgr->iface_seqno + 1;
>>>>>>>>> +            new_ifaces = true;
>>>>>>>>> +        }
>>>>>>>>> +    } else {
>>>>>>>>> +        HMAPX_FOR_EACH_SAFE (node,
>>>>> &mgr->ifaces_per_state[OIF_CLAIMED])
>>>>>>>> {
>>>>>>>>> +            struct ovs_iface *iface = node->data;
>>>>>>>>> +            static struct vlog_rate_limit rl =
>>>>> VLOG_RATE_LIMIT_INIT(5,
>>>>>>>> 1);
>>>>>>>>> +            VLOG_INFO_RL(&rl,
>>>>>>>>> +                         "Not updating pb chassis for %s now as "
>>>>>>>>> +                         "sb is readonly", iface->id);
>>>>>>>>> +        }
>>>>>>>>>      }
>>>>>>>>>
>>>>>>>>>      /* Request a seqno update when the flows for new interfaces
>>> have
>>>>>>>> been
>>>>>>>>> @@ -403,7 +543,7 @@ if_status_mgr_update_bindings(struct
>>> if_status_mgr
>>>>>>>> *mgr,
>>>>>>>>>      struct hmapx_node *node;
>>>>>>>>>
>>>>>>>>>      /* Notify the binding module to set "down" all bindings that
>>> are
>>>>>>>> still
>>>>>>>>> -     * in the process of being installed in OVS, i.e., are not yet
>>>>>>>> instsalled.
>>>>>>>>> +     * in the process of being installed in OVS, i.e., are not yet
>>>>>>>> installed.
>>>>>>>>>       */
>>>>>>>>>      HMAPX_FOR_EACH (node,
>>> &mgr->ifaces_per_state[OIF_INSTALL_FLOWS])
>>>>> {
>>>>>>>>>          struct ovs_iface *iface = node->data;
>>>>>>>>> diff --git a/controller/if-status.h b/controller/if-status.h
>>>>>>>>> index bb8a3950d..f9b05d30d 100644
>>>>>>>>> --- a/controller/if-status.h
>>>>>>>>> +++ b/controller/if-status.h
>>>>>>>>> @@ -27,15 +27,27 @@ struct if_status_mgr
>>> *if_status_mgr_create(void);
>>>>>>>>>  void if_status_mgr_clear(struct if_status_mgr *);
>>>>>>>>>  void if_status_mgr_destroy(struct if_status_mgr *);
>>>>>>>>>
>>>>>>>>> -void if_status_mgr_claim_iface(struct if_status_mgr *, const char
>>>>>>>> *iface_id);
>>>>>>>>> +void if_status_mgr_claim_iface(struct if_status_mgr *,
>>>>>>>>> +                               const struct sbrec_port_binding
>>> *pb,
>>>>>>>>> +                               const struct sbrec_chassis
>>>>> *chassis_rec,
>>>>>>>>> +                               bool sb_readonly);
>>>>>>>>>  void if_status_mgr_release_iface(struct if_status_mgr *, const
>>> char
>>>>>>>> *iface_id);
>>>>>>>>>  void if_status_mgr_delete_iface(struct if_status_mgr *, const char
>>>>>>>> *iface_id);
>>>>>>>>>
>>>>>>>>> -void if_status_mgr_update(struct if_status_mgr *, struct
>>>>>>>> local_binding_data *);
>>>>>>>>> +void if_status_mgr_update(struct if_status_mgr *, struct
>>>>>>>> local_binding_data *,
>>>>>>>>> +                          const struct sbrec_chassis *chassis,
>>>>>>>>> +                          bool sb_readonly);
>>>>>>>>>  void if_status_mgr_run(struct if_status_mgr *mgr, struct
>>>>>>>> local_binding_data *,
>>>>>>>>>                         const struct sbrec_chassis *,
>>>>>>>>>                         bool sb_readonly, bool ovs_readonly);
>>>>>>>>>  void if_status_mgr_get_memory_usage(struct if_status_mgr *mgr,
>>>>>>>>>                                      struct simap *usage);
>>>>>>>>> +bool if_status_mgr_iface_is_present(struct if_status_mgr *mgr,
>>>>>>>>> +                                    const char *iface_id);
>>>>>>>>> +bool if_status_handle_claims(struct if_status_mgr *mgr,
>>>>>>>>> +                             struct local_binding_data
>>> *binding_data,
>>>>>>>>> +                             const struct sbrec_chassis
>>> *chassis_rec,
>>>>>>>>> +                             struct hmap *tracked_datapath,
>>>>>>>>> +                             bool sb_readonly);
>>>>>>>>>
>>>>>>>>>  # endif /* controller/if-status.h */
>>>>>>>>> diff --git a/controller/ovn-controller.c
>>> b/controller/ovn-controller.c
>>>>>>>>> index 69615308e..3947baf03 100644
>>>>>>>>> --- a/controller/ovn-controller.c
>>>>>>>>> +++ b/controller/ovn-controller.c
>>>>>>>>> @@ -1464,6 +1464,73 @@ en_runtime_data_run(struct engine_node
>>> *node,
>>>>>>>> void *data)
>>>>>>>>>      engine_set_node_state(node, EN_UPDATED);
>>>>>>>>>  }
>>>>>>>>>
>>>>>>>>> +struct ed_type_sb_ro {
>>>>>>>>> +    bool sb_readonly;
>>>>>>>>> +};
>>>>>>>>> +
>>>>>>>>> +static void *
>>>>>>>>> +en_sb_ro_init(struct engine_node *node OVS_UNUSED,
>>>>>>>>> +              struct engine_arg *arg OVS_UNUSED)
>>>>>>>>> +{
>>>>>>>>> +    struct ed_type_sb_ro *data = xzalloc(sizeof *data);
>>>>>>>>> +    return data;
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +static void
>>>>>>>>> +en_sb_ro_run(struct engine_node *node, void *data)
>>>>>>>>> +{
>>>>>>>>> +    struct ed_type_sb_ro *sb_ro_data = data;
>>>>>>>>> +    bool sb_readonly = !engine_get_context()->ovnsb_idl_txn;
>>>>>>>>> +    if (sb_ro_data->sb_readonly != sb_readonly) {
>>>>>>>>> +        sb_ro_data->sb_readonly = sb_readonly;
>>>>>>>>> +        if (!sb_ro_data->sb_readonly) {
>>>>>>>>> +            engine_set_node_state(node, EN_UPDATED);
>>>>>>>>> +        }
>>>>>>>>> +    }
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +static void
>>>>>>>>> +en_sb_ro_cleanup(void *data OVS_UNUSED)
>>>>>>>>> +{
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>> +static bool
>>>>>>>>> +runtime_data_sb_ro_handler(struct engine_node *node, void *data)
>>>>>>>>> +{
>>>>>>>>> +    const struct sbrec_chassis *chassis = NULL;
>>>>>>>>> +
>>>>>>>>> +    struct ovsrec_open_vswitch_table *ovs_table =
>>>>>>>>> +        (struct ovsrec_open_vswitch_table *)EN_OVSDB_GET(
>>>>>>>>> +            engine_get_input("OVS_open_vswitch", node));
>>>>>>>>> +
>>>>>>>>> +    const char *chassis_id = get_ovs_chassis_id(ovs_table);
>>>>>>>>> +
>>>>>>>>> +    struct ovsdb_idl_index *sbrec_chassis_by_name =
>>>>>>>>> +        engine_ovsdb_node_get_index(
>>>>>>>>> +                engine_get_input("SB_chassis", node),
>>>>>>>>> +                "name");
>>>>>>>>> +
>>>>>>>>> +    if (chassis_id) {
>>>>>>>>> +        chassis = chassis_lookup_by_name(sbrec_chassis_by_name,
>>>>>>>> chassis_id);
>>>>>>>>> +    }
>>>>>>>>> +    if (chassis) {
>>>>>>>>> +        struct ed_type_runtime_data *rt_data = data;
>>>>>>>>> +        bool sb_readonly = !engine_get_context()->ovnsb_idl_txn;
>>>>>>>>> +        struct controller_engine_ctx *ctrl_ctx =
>>>>>>>>> +            engine_get_context()->client_ctx;
>>>>>>>>> +
>>>>>>>>> +        if (if_status_handle_claims(ctrl_ctx->if_mgr,
>>>>>>>>> +                                    &rt_data->lbinding_data,
>>>>>>>>> +                                    chassis,
>>>>>>>>> +                                    &rt_data->tracked_dp_bindings,
>>>>>>>>> +                                    sb_readonly)) {
>>>>>>>>> +            engine_set_node_state(node, EN_UPDATED);
>>>>>>>>> +            rt_data->tracked = true;
>>>>>>>>> +        }
>>>>>>>>> +    }
>>>>>>>>> +    return true;
>>>>>>>>> +}
>>>>>>>>> +
>>>>>>>>>  static bool
>>>>>>>>>  runtime_data_ovs_interface_shadow_handler(struct engine_node
>>> *node,
>>>>>>>> void *data)
>>>>>>>>>  {
>>>>>>>>> @@ -3528,6 +3595,7 @@ main(int argc, char *argv[])
>>>>>>>>>      stopwatch_create(VIF_PLUG_RUN_STOPWATCH_NAME, SW_MS);
>>>>>>>>>
>>>>>>>>>      /* Define inc-proc-engine nodes. */
>>>>>>>>> +    ENGINE_NODE(sb_ro, "sb_ro");
>>>>>>>>>      ENGINE_NODE_WITH_CLEAR_TRACK_DATA_IS_VALID(ct_zones,
>>> "ct_zones");
>>>>>>>>>      ENGINE_NODE_WITH_CLEAR_TRACK_DATA(ovs_interface_shadow,
>>>>>>>>>                                        "ovs_interface_shadow");
>>>>>>>>> @@ -3664,6 +3732,7 @@ main(int argc, char *argv[])
>>>>>>>>>      engine_add_input(&en_ovs_interface_shadow, &en_ovs_interface,
>>>>>>>>>                       ovs_interface_shadow_ovs_interface_handler);
>>>>>>>>>
>>>>>>>>> +    engine_add_input(&en_runtime_data, &en_sb_ro,
>>>>>>>> runtime_data_sb_ro_handler);
>>>>>>>>>      engine_add_input(&en_runtime_data, &en_ofctrl_is_connected,
>>>>> NULL);
>>>>>>>>>
>>>>>>>>>      engine_add_input(&en_runtime_data, &en_ovs_open_vswitch,
>>> NULL);
>>>>>>>>> @@ -4098,7 +4167,8 @@ main(int argc, char *argv[])
>>>>>>>>>                          runtime_data ?
>>> &runtime_data->lbinding_data :
>>>>>>>> NULL;
>>>>>>>>>
>>>>>  stopwatch_start(IF_STATUS_MGR_UPDATE_STOPWATCH_NAME,
>>>>>>>>>                                      time_msec());
>>>>>>>>> -                    if_status_mgr_update(if_mgr, binding_data);
>>>>>>>>> +                    if_status_mgr_update(if_mgr, binding_data,
>>>>> chassis,
>>>>>>>>> +                                         !ovnsb_idl_txn);
>>>>>>>>>
>>>>>  stopwatch_stop(IF_STATUS_MGR_UPDATE_STOPWATCH_NAME,
>>>>>>>>>                                     time_msec());
>>>>>>>>>
>>>>>>>>> diff --git a/tests/ovn-macros.at b/tests/ovn-macros.at
>>>>>>>>> index 335f9158c..8fd6ae6f7 100644
>>>>>>>>> --- a/tests/ovn-macros.at
>>>>>>>>> +++ b/tests/ovn-macros.at
>>>>>>>>> @@ -759,3 +759,15 @@
>>>>> m4_define([OVN_FOR_EACH_NORTHD_WITHOUT_DP_GROUPS],
>>>>>>>>>       [m4_foreach([NORTHD_USE_DP_GROUPS], [no],
>>>>>>>>>         [m4_foreach([NORTHD_USE_PARALLELIZATION], [yes, no], [$1
>>>>>>>>>  ])])])])
>>>>>>>>> +
>>>>>>>>> +# OVN_NBCTL(NBCTL_COMMAND) adds NBCTL_COMMAND to list of commands
>>> to
>>>>> be
>>>>>>>> run by RUN_OVN_NBCTL().
>>>>>>>>> +m4_define([OVN_NBCTL], [
>>>>>>>>> +    command="${command} -- $1"
>>>>>>>>> +])
>>>>>>>>> +
>>>>>>>>> +# RUN_OVN_NBCTL() executes list of commands built by the
>>> OVN_NBCTL()
>>>>>>>> macro.
>>>>>>>>> +m4_define([RUN_OVN_NBCTL], [
>>>>>>>>> +    check ovn-nbctl ${command}
>>>>>>>>> +    unset command
>>>>>>>>> +])
>>>>>>>>> +
>>>>>>>>> diff --git a/tests/ovn.at b/tests/ovn.at
>>>>>>>>> index bfaa41962..94d16bac9 100644
>>>>>>>>> --- a/tests/ovn.at
>>>>>>>>> +++ b/tests/ovn.at
>>>>>>>>> @@ -102,6 +102,18 @@ m4_divert_text([PREPARE_TESTS],
>>>>>>>>>           test 1 -le $(as $hv1 ovs-ofctl dump-flows br-int | grep
>>> -c
>>>>>>>> "output:$ofport")
>>>>>>>>>       ])
>>>>>>>>>     }
>>>>>>>>> +
>>>>>>>>> +   ovn_wait_remote_input_flows () {
>>>>>>>>> +     hv1=$1
>>>>>>>>> +     hv2=$2
>>>>>>>>> +     echo "$3: waiting for flows for remote input on $hv1"
>>>>>>>>> +     # Wait for a flow outputing  to remote input
>>>>>>>>> +     OVS_WAIT_UNTIL([
>>>>>>>>> +         ofport=$(as $hv1 ovs-vsctl --bare --columns ofport find
>>>>>>>> Interface name=ovn-${hv2}-0)
>>>>>>>>> +         echo "tunnel port=$ofport"
>>>>>>>>> +         test 1 -le $(as $hv1 ovs-ofctl dump-flows br-int | grep
>>> -c
>>>>>>>> "in_port=$ofport")
>>>>>>>>> +     ])
>>>>>>>>> +   }
>>>>>>>>>  ])
>>>>>>>>>
>>>>>>>>>  m4_define([OVN_CHECK_PACKETS],
>>>>>>>>> @@ -127,6 +139,8 @@ m4_define([OVN_WAIT_PATCH_PORT_FLOWS],
>>>>>>>>>  m4_define([OVN_WAIT_REMOTE_OUTPUT_FLOWS],
>>>>>>>>>    [ovn_wait_remote_output_flows "$1" "$2" "__file__:__line__"])
>>>>>>>>>
>>>>>>>>> +m4_define([OVN_WAIT_REMOTE_INPUT_FLOWS],
>>>>>>>>> +  [ovn_wait_remote_input_flows "$1" "$2" "__file__:__line__"])
>>>>>>>>>
>>>>>>>>>  AT_BANNER([OVN components])
>>>>>>>>>
>>>>>>>>> @@ -14056,6 +14070,11 @@ wait_column "$hv1_uuid" Port_Binding
>>>>>>>> requested_chassis logical_port=lsp0
>>>>>>>>>  wait_column "$hv2_uuid" Port_Binding additional_chassis
>>>>>>>> logical_port=lsp0
>>>>>>>>>  wait_column "$hv2_uuid" Port_Binding requested_additional_chassis
>>>>>>>> logical_port=lsp0
>>>>>>>>>
>>>>>>>>> +# Check ovn-installed updated for main chassis
>>>>>>>>> +wait_for_ports_up
>>>>>>>>> +OVS_WAIT_UNTIL([test `as hv1 ovs-vsctl get Interface lsp0
>>>>>>>> external_ids:ovn-installed` = '"true"'])
>>>>>>>>> +OVS_WAIT_UNTIL([test x`as hv2 ovs-vsctl get Interface lsp0
>>>>>>>> external_ids:ovn-installed` = x])
>>>>>>>>> +
>>>>>>>>>  # Check that setting iface:encap-ip populates
>>>>>>>> Port_Binding:additional_encap
>>>>>>>>>  wait_row_count Encap 2 chassis_name=hv1
>>>>>>>>>  wait_row_count Encap 2 chassis_name=hv2
>>>>>>>>> @@ -14081,6 +14100,11 @@ wait_column "$hv2_uuid" Port_Binding
>>>>>>>> requested_chassis logical_port=lsp0
>>>>>>>>>  wait_column "" Port_Binding additional_chassis logical_port=lsp0
>>>>>>>>>  wait_column "" Port_Binding requested_additional_chassis
>>>>>>>> logical_port=lsp0
>>>>>>>>>
>>>>>>>>> +# Check ovn-installed updated for main chassis and not for other
>>>>> chassis
>>>>>>>>> +wait_for_ports_up
>>>>>>>>> +OVS_WAIT_UNTIL([test `as hv2 ovs-vsctl get Interface lsp0
>>>>>>>> external_ids:ovn-installed` = '"true"'])
>>>>>>>>> +OVS_WAIT_UNTIL([test x`as hv1 ovs-vsctl get Interface lsp0
>>>>>>>> external_ids:ovn-installed` = x])
>>>>>>>>> +
>>>>>>>>>  # Check that additional_encap is cleared
>>>>>>>>>  wait_column "" Port_Binding additional_encap logical_port=lsp0
>>>>>>>>>
>>>>>>>>> @@ -15370,7 +15394,10 @@ AT_CHECK([as hv1 ovs-ofctl dump-flows
>>> br-int
>>>>>>>> table=65 | grep actions=output:1],
>>>>>>>>>  echo "verifying that lsp0 binding moves when requested-chassis is
>>>>>>>> changed"
>>>>>>>>>
>>>>>>>>>  ovn-nbctl lsp-set-options lsp0 requested-chassis=hv2
>>>>>>>>> -OVS_WAIT_UNTIL([test 1 = $(grep -c "Releasing lport lsp0 from this
>>>>>>>> chassis" hv1/ovn-controller.log)])
>>>>>>>>> +
>>>>>>>>> +# We might see multiple "Releasing lport ...", when sb is read
>>> only
>>>>>>>>> +OVS_WAIT_UNTIL([test 1 -le $(grep -c "Releasing lport lsp0 from
>>> this
>>>>>>>> chassis" hv1/ovn-controller.log)])
>>>>>>>>> +
>>>>>>>>>  wait_column "$hv2_uuid" Port_Binding chassis logical_port=lsp0
>>>>>>>>>
>>>>>>>>>  # (6) Chassis hv2 should add flows and hv1 should not.
>>>>>>>>> @@ -15416,7 +15443,7 @@ AT_CHECK([as hv1 ovs-ofctl dump-flows
>>> br-int
>>>>>>>> table=0 | grep in_port=1], [0], [ig
>>>>>>>>>  AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=65 | grep
>>>>>>>> actions=output:1], [0], [ignore])
>>>>>>>>>
>>>>>>>>>  check ovn-nbctl --wait=hv lsp-set-options lsp0
>>>>>>>> requested-chassis=non-existant-chassis
>>>>>>>>> -OVS_WAIT_UNTIL([test 1 = $(grep -c "Releasing lport lsp0 from this
>>>>>>>> chassis" hv1/ovn-controller.log)])
>>>>>>>>> +OVS_WAIT_UNTIL([test 1 -le $(grep -c "Releasing lport lsp0 from
>>> this
>>>>>>>> chassis" hv1/ovn-controller.log)])
>>>>>>>>>  check ovn-nbctl --wait=hv sync
>>>>>>>>>  wait_column '' Port_Binding chasssi logical_port=lsp0
>>>>>>>>>  AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=0 | grep
>>>>> in_port=1],
>>>>>>>> [1], [])
>>>>>>>>> @@ -32418,3 +32445,119 @@ AT_CHECK([test $(ovn-sbctl list fdb |
>>> grep
>>>>> -c
>>>>>>>> "00:00:00:00:10:30") = 0])
>>>>>>>>>  OVN_CLEANUP([hv1])
>>>>>>>>>  AT_CLEANUP
>>>>>>>>>  ])
>>>>>>>>> +
>>>>>>>>> +OVN_FOR_EACH_NORTHD([
>>>>>>>>> +AT_SETUP([recomputes])
>>>>>>>>> +ovn_start
>>>>>>>>> +
>>>>>>>>> +n_hv=4
>>>>>>>>> +
>>>>>>>>> +# Add chassis
>>>>>>>>> +net_add n1
>>>>>>>>> +for i in $(seq 1 $n_hv); do
>>>>>>>>> +    sim_add hv$i
>>>>>>>>> +    as hv$i
>>>>>>>>> +    check ovs-vsctl add-br br-phys
>>>>>>>>> +    ovs-vsctl set open .
>>>>> external-ids:ovn-bridge-mappings=phys:br-phys
>>>>>>>>> +    ovn_attach n1 br-phys 192.168.0.$i 24 geneve
>>>>>>>>> +done
>>>>>>>>> +
>>>>>>>>> +add_switch_ports() {
>>>>>>>>> +    start_port=$1
>>>>>>>>> +    end_port=$2
>>>>>>>>> +    nb_hv=$3
>>>>>>>>> +    bulk_size=$4
>>>>>>>>> +    for ((i=start_port; i<end_port; )) do
>>>>>>>>> +        start_bulk=$i
>>>>>>>>> +        for hv in $(seq 1 $nb_hv); do
>>>>>>>>> +            end_bulk=$((start_bulk+bulk_size-1))
>>>>>>>>> +            for port in $(seq $start_bulk $end_bulk); do
>>>>>>>>> +                logical_switch_port=lsp${port}
>>>>>>>>> +                OVN_NBCTL(lsp-add ls1 $logical_switch_port)
>>>>>>>>> +                OVN_NBCTL(lsp-set-addresses $logical_switch_port
>>>>>>>> dynamic)
>>>>>>>>> +            done
>>>>>>>>> +            start_bulk=$((end_bulk+1))
>>>>>>>>> +        done
>>>>>>>>> +        RUN_OVN_NBCTL()
>>>>>>>>> +
>>>>>>>>> +        start_bulk=$i
>>>>>>>>> +        for hv in $(seq 1 $nb_hv); do
>>>>>>>>> +            end_bulk=$((start_bulk+bulk_size-1))
>>>>>>>>> +            for port in $(seq $start_bulk $end_bulk); do
>>>>>>>>> +                logical_switch_port=lsp${port}
>>>>>>>>> +                as hv$hv ovs-vsctl \
>>>>>>>>> +                    --no-wait -- add-port br-int vif${port} \
>>>>>>>>> +                    -- set Interface vif${port}
>>>>>>>> external_ids:iface-id=$logical_switch_port
>>>>>>>>> +            done
>>>>>>>>> +            start_bulk=$((end_bulk+1))
>>>>>>>>> +        done
>>>>>>>>> +        i=$((end_bulk+1))
>>>>>>>>> +    done
>>>>>>>>> +}
>>>>>>>>> +check ovn-nbctl ls-add ls1
>>>>>>>>> +check ovn-nbctl set Logical_Switch ls1 other_config:subnet=
>>>>> 10.1.0.0/16
>>>>>>>>> +check ovn-nbctl set Logical_Switch ls1
>>>>>>>> other_config:exclude_ips=10.1.255.254
>>>>>>>>> +
>>>>>>>>> +check ovn-nbctl lr-add lr1
>>>>>>>>> +check ovn-nbctl lsp-add ls1 lsp0 -- set Logical_Switch_Port lsp0
>>>>>>>> type=router options:router-port=lrp0 addresses=dynamic
>>>>>>>>> +check ovn-nbctl lrp-add lr1 lrp0 "f0:00:00:01:00:01"
>>> 10.1.255.254/16
>>>>>>>>> +check ovn-nbctl lr-nat-add lr1 snat 10.2.0.1 10.1.0.0/16
>>>>>>>>> +
>>>>>>>>> +lflow_run=0
>>>>>>>>> +check ovn-nbctl --wait=hv sync
>>>>>>>>> +
>>>>>>>>> +# Tunnel ports might not be added (yet) at this point on slow
>>> system.
>>>>>>>>> +# Wait for flows related to such ports to ensure those ports have
>>>>> been
>>>>>>>> added
>>>>>>>>> +# before we measure recomputes. Otherwise, ovs_interface handler
>>>>> might
>>>>>>>> be run
>>>>>>>>> +# afterwards for tunnel ports, causing recomputes.
>>>>>>>>> +for i in $(seq 1 $n_hv); do
>>>>>>>>> +    for j in $(seq 1 $n_hv); do
>>>>>>>>> +        if test $i != $j; then
>>>>>>>>> +            OVN_WAIT_REMOTE_INPUT_FLOWS(["hv$i"],["hv$j"])
>>>>>>>>> +        fi
>>>>>>>>> +    done
>>>>>>>>> +done
>>>>>>>>> +
>>>>>>>>> +for i in $(seq 1 $n_hv); do
>>>>>>>>> +    as hv$i
>>>>>>>>> +    lflow_run1=$(ovn-appctl -t ovn-controller
>>> coverage/read-counter
>>>>>>>> lflow_run)
>>>>>>>>> +    lflow_run=`expr $lflow_run1 + $lflow_run`
>>>>>>>>> +done
>>>>>>>>> +
>>>>>>>>> +add_switch_ports 1 1000 $n_hv 5
>>>>>>>>> +
>>>>>>>>> +wait_for_ports_up
>>>>>>>>> +check ovn-nbctl --wait=hv sync
>>>>>>>>> +
>>>>>>>>> +for i in $(seq 1 $n_hv); do
>>>>>>>>> +    pid=$(cat hv${i}/ovn-controller.pid)
>>>>>>>>> +    u=$((u+$(cat "/proc/$pid/stat" | awk '{print $14}')))
>>>>>>>>> +    s=$((s+$(cat "/proc/$pid/stat" | awk '{print $15}')))
>>>>>>>>> +done
>>>>>>>>> +
>>>>>>>>> +n_pid=$(cat northd/ovn-northd.pid)
>>>>>>>>> +n_u=$(cat "/proc/$pid/stat" | awk '{print $14}')
>>>>>>>>> +n_s=$(cat "/proc/$pid/stat" | awk '{print $15}')
>>>>>>>>> +
>>>>>>>>> +echo "Total Northd User Time: $n_u"
>>>>>>>>> +echo "Total Northd System Time: $n_s"
>>>>>>>>> +echo "Total Controller User Time: $u"
>>>>>>>>> +echo "Total Controller System Time: $s"
>>>>>>>>> +
>>>>>>>>> +lflow_run_end=0
>>>>>>>>> +for i in $(seq 1 $n_hv); do
>>>>>>>>> +    as hv$i
>>>>>>>>> +    lflow_run1=$(ovn-appctl -t ovn-controller
>>> coverage/read-counter
>>>>>>>> lflow_run)
>>>>>>>>> +    lflow_run_end=`expr $lflow_run1 + $lflow_run_end`
>>>>>>>>> +done
>>>>>>>>> +n_recomputes=`expr $lflow_run_end - $lflow_run`
>>>>>>>>> +echo "$n_recomputes recomputes"
>>>>>>>>> +
>>>>>>>>> +AT_CHECK([test $lflow_run_end == $lflow_run])
>>>>>>>>> +
>>>>>>>>> +for i in $(seq 2 $n_hv); do
>>>>>>>>> +    OVN_CLEANUP_SBOX([hv$i])
>>>>>>>>> +done
>>>>>>>>> +OVN_CLEANUP([hv1])
>>>>>>>>> +AT_CLEANUP
>>>>>>>>> +])
>>>>>>>>> diff --git a/tests/perf-northd.at b/tests/perf-northd.at
>>>>>>>>> index 74b69e9d4..6ec196b36 100644
>>>>>>>>> --- a/tests/perf-northd.at
>>>>>>>>> +++ b/tests/perf-northd.at
>>>>>>>>> @@ -76,23 +76,6 @@ m4_define([PERF_RECORD_STOP], [
>>>>>>>>>      PERF_RECORD_STOPWATCH(ovn-northd-loop, ["Short term average"],
>>>>>>>> [Average (northd-loop in msec)])
>>>>>>>>>  ])
>>>>>>>>>
>>>>>>>>> -# OVN_NBCTL([NBCTL_COMMAND])
>>>>>>>>> -#
>>>>>>>>> -# Add NBCTL_COMMAND to list of commands to be run by
>>> RUN_OVN_NBCTL().
>>>>>>>>> -#
>>>>>>>>> -m4_define([OVN_NBCTL], [
>>>>>>>>> -    command="${command} -- $1"
>>>>>>>>> -])
>>>>>>>>> -
>>>>>>>>> -# RUN_OVN_NBCTL()
>>>>>>>>> -#
>>>>>>>>> -# Execute list of commands built by the OVN_NBCTL() macro.
>>>>>>>>> -#
>>>>>>>>> -m4_define([RUN_OVN_NBCTL], [
>>>>>>>>> -    check ovn-nbctl ${command}
>>>>>>>>> -    unset command
>>>>>>>>> -])
>>>>>>>>> -
>>>>>>>>>  OVS_START_SHELL_HELPERS
>>>>>>>>>  generate_subnet () {
>>>>>>>>>      local a=$(printf %d $(expr $1 / 256 + 10))
>>>>>>>>> --
>>>>>>>>> 2.31.1
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
Numan Siddique July 20, 2022, 5:29 p.m. UTC | #12
On Wed, Jul 13, 2022 at 4:29 AM Dumitru Ceara <dceara@redhat.com> wrote:
>
> On 7/13/22 11:08, Dumitru Ceara wrote:
> > On 7/13/22 09:40, Xavier Simonart wrote:
> >> Hi Han, Dumitru
> >>
> >
> > Hi Han, Xavier,
> >
> > Sorry, I had already replied to the previous message and only then
> > noticed this one.
> >
> >> I think that we should, as much as possible, try to achieve both goals:
> >> - have an accurate ovn-installed
> >> - do not increase latency in large scale deployments
> >>
> >
> > +1
> >
> >> The fact that ovn-installed is sent too early for mc flows is already an
> >> issue today, independent of this patch.
> >> Fixing ovn-installed related to mc flows by delaying the state change (for
> >> all cases, included when no mc groups) might be seen as a performance
> >> regression.
> >>
> >
> > I think it will, and I'm not sure we can convince the CMS that this is
> > "just metrics".
> >
> >> I agree that we should fix this ovn-installed issue, but it is not a
> >> regression added by this patch. We should enter a BZ for it.
> >> Per my understanding, the mc flows are updated when the SB_multicast_group
> >> is seen as updated by ovn-controller, due to its references to port binding.
> >> Other flows related to port binding are installed earlier, i.e. when
> >> ovn-controller writes port_binding->chassis (i.e. before it receives SB
> >> confirmation). So, while sending the mc flows earlier than what we do today
> >> might be more complex, I think it makes some kind of sense (we would send
> >> all those flows within the same loop).
> >
> > I'm inclining towards leaving it as it is today if this is the only flow
> > we're missing.  It's a guess without testing things out, but I think
> > it's for the MC_FLOOD_L2 multicast group which is used only for
> > forwarding ARP packets originated by OVN routers or destined to a
> > specific OVN router.  Losing some of those packets is not a big deal.
> >
> > But it might be good to confirm that this is the MC group we install the
> > flow for.
> >
>
> Oh, well, the port is also part of the MC_FLOOD group.  This is however
> only used for BUM traffic.  So losing some packets here is also not
> terrible, I think.
>

When a logical port is claimed, we process the logical flows related
to the logical port (i.e., with inport == <lport> or outport == <lport>)
and install the corresponding openflows. All the generic logical flows
(i.e., without an inport or outport match) would already have been
programmed (if the datapath is already part of local_datapaths).
These processed logical flows (lflow_handle_flows_for_lport() in
lflow.c) will most likely be part of the same openflow bundle. Once
the sequence number for this bundle is acknowledged, we set
"ovn-installed=true". When the CMS notices "ovn-installed=true", I
think it can fairly assume that the flows for the lport are
programmed.
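
The sequence described above can be sketched in C as follows. This is
only an illustration under stated assumptions, not the if-status.c
code: the sketch_* names and the three-state enum are hypothetical
simplifications (the real module has more states), and sequence
number wrap-around is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified states for one claimed interface. */
enum sketch_state {
    SK_CLAIMED,        /* lport claimed, flow install not yet requested. */
    SK_INSTALL_FLOWS,  /* Waiting for OVS to acknowledge the flow bundle. */
    SK_INSTALLED,      /* Acknowledged: ovn-installed=true may be set. */
};

struct sketch_iface {
    enum sketch_state state;
    uint32_t install_seqno;   /* Seqno this interface is waiting for. */
};

/* Moves a claimed interface to SK_INSTALL_FLOWS and records the seqno
 * that must be acknowledged before it counts as installed. */
static void
sketch_request_install(struct sketch_iface *iface, uint32_t cur_seqno)
{
    assert(iface->state == SK_CLAIMED);
    iface->state = SK_INSTALL_FLOWS;
    iface->install_seqno = cur_seqno + 1;
}

/* Called when OVS acknowledges 'acked_seqno'.  Returns true if the
 * interface just became installed, i.e. ovn-installed can now be set. */
static bool
sketch_seqno_acked(struct sketch_iface *iface, uint32_t acked_seqno)
{
    if (iface->state == SK_INSTALL_FLOWS
        && acked_seqno >= iface->install_seqno) {
        iface->state = SK_INSTALLED;
        return true;
    }
    return false;
}
```

Note that nothing in this sketch waits for flows that are produced
later (multicast or arp responder flows); only the bundle tied to the
recorded seqno is covered.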

I think the only flows pertaining to the logical port that we would
be missing are the multicast-related flows and the logical flows that
ovn-northd generates after the logical port is claimed (presently,
the arp responder flows). I don't think we can wait for these logical
flows to be programmed by ovn-northd before setting
"ovn-installed=true".

Delaying setting ovn-installed=true would definitely add latency. It
would not be easy for ovn-controller to keep track of the openflows
already programmed for a logical port. In other words, I don't think
ovn-controller can accurately track whether all the openflows related
to a logical port are programmed unless all these flows are grouped
in one bundle.

Also, since the present ovn main already sets ovn-installed=true a bit
early, i.e., even before the multicast and arp responder flows are
programmed, I think addressing it is out of this patch's scope.

So I think the patch is fine with me once Xavier addresses (1), i.e.,
removes the 'chassis_update_required' flag.
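
One possible reading of (1), sketched in C: instead of a one-shot
flag that is cleared on the first write attempt, decide on every
iteration from what SB currently reports. This is an assumption about
how the retry could look, not the code of a later patch version;
struct sketch_pb, sketch_claim_step() and the SK_* values are
hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a Port_Binding row as last reported by SB. */
struct sketch_pb {
    const char *chassis;   /* NULL if SB records no chassis. */
};

enum sketch_action {
    SK_WAIT,       /* SB is read-only: try again next iteration. */
    SK_WRITE,      /* SB is writable: (re)write pb->chassis now. */
    SK_CONFIRMED,  /* SB already reports our chassis: nothing to do. */
};

/* Decides what to do for one claimed port in one main-loop iteration.
 * No state is kept between calls, so a failed commit is retried
 * automatically as long as SB does not report 'local_chassis'. */
static enum sketch_action
sketch_claim_step(const struct sketch_pb *pb, const char *local_chassis,
                  bool sb_readonly)
{
    assert(local_chassis);
    if (pb->chassis && !strcmp(pb->chassis, local_chassis)) {
        return SK_CONFIRMED;
    }
    return sb_readonly ? SK_WAIT : SK_WRITE;
}
```

Because the decision depends only on the row contents and on whether
SB is writable, a commit failure cannot leave the port without a
pb->chassis write pending.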

Thanks
Numan


> > Thanks,
> > Dumitru
> >
> >>
> >> Thanks
> >> Xavier
> >>
> >>
> >>
> >> On Wed, Jul 13, 2022 at 8:28 AM Han Zhou <hzhou@ovn.org> wrote:
> >>
> >>>
> >>>
> >>> On Tue, Jul 12, 2022 at 12:35 AM Dumitru Ceara <dceara@redhat.com> wrote:
> >>>>
> >>>> On 7/12/22 08:52, Han Zhou wrote:
> >>>>> On Mon, Jul 11, 2022 at 4:55 AM Dumitru Ceara <dceara@redhat.com>
> >>> wrote:
> >>>>>>
> >>>>>> On 7/11/22 13:31, Xavier Simonart wrote:
> >>>>>>> Hi Han
> >>>>>>>
> >>>>>>> Thanks for your review.
> >>>>>>>
> >>>>>>> Let me try to understand your two main concerns and the proper way to
> >>>>> fix
> >>>>>>> it.
> >>>>>>> 1) We only try once to write pb->chassis. If the commit fails,
> >>>>> pb->chassis
> >>>>>>> is not written. As commit fails, we will recompute, but as the
> >>>>>>> update_required flag is not set anymore, we might end up with no
> >>>>>>> pb->chassis.
> >>>>>>> => I'll remove the flag and try to update until it's confirmed.
> >>>>>
> >>>>> Thank you!
> >>>>>
> >>>>>>> 2) The state machine, and when we move to INSTALL_FLOWS. Serializing
> >>> the
> >>>>>>> state machine, by waiting for confirmation to be received before
> >>> moving
> >>>>> to
> >>>>>>> INSTALL_FLOWS state will delay the ovn-installed compared to today.
> >>> So
> >>>>> I am
> >>>>>>> (still) trying to see if there is any way to prevent this in some
> >>> cases.
> >>>>>>> Would it be correct to do this serialization (wait for pb->chassis
> >>>>> update
> >>>>>>> confirmation) only when using conditional monitoring? When using
> >>>>>>> monitor-all, as soon as we have written (w/o confirmation)
> >>> pb->chassis,
> >>>>> we
> >>>>>>> would move to INSTALL_FLOWS. In that loop where we wrote pb->chassis,
> >>>>> (all)
> >>>>>>> the flows should be updated taking into account pb->chassis.
> >>>>>>
> >>>>>> I think this approach is OK.  ovn-controllers that don't use
> >>> conditional
> >>>>>> monitoring already know the complete SB contents and should be able to
> >>>>>> install (mostly?) complete sets of openflows that correspond to a
> >>> given
> >>>>>> Port_Binding.
> >>>>>>
> >>>>>> It's also "CMS-friendly", at least for ovn-kubernetes, which uses
> >>>>>> ovn-monitor-all=true and only waits for
> >>>>>> OVS.Interface.external_ids:ovn-installed=true.  And shouldn't impact
> >>> the
> >>>>>> others which wait for SB.Port_Binding.up=true.
> >>>>>>
> >>>>>
> >>>>> Xavier and Dumitru, I think we shouldn't introduce the dependency of
> >>>>> "ovn-monitor-all" setting here.
> >>>>>
> >>>>> First of all, ovn-installed is a flag for CMS to understand that all
> >>> the
> >>>>> flows related to the port-binding is installed. If we set the flag
> >>> before
> >>>>> it is truly completed, it is a bug, and it is possible that the flag
> >>> is set
> >>>>> but some traffic doesn't work.
> >>>>
> >>>> I think it's a matter of semantics.  The way I see "ovn-installed=true"
> >>>> is: all flows that are relevant to the port binding on the local chassis
> >>>> have been installed.  When we added it it was for the ovn-k8s case;
> >>>> ovn-k8s used to explicitly check if some openflow tables on the node
> >>>> where the pod is being brought up contained flows that seemed to
> >>>> correspond to the pod (e.g., matching on pod mac and IP addresses).
> >>>>
> >>> But the purpose of checking the flows (when ovn-installed wasn't
> >>> available) was to make sure the pod is ready to send/receive traffic. If
> >>> ovn-installed can provide more accuracy, why not?
> >>>
> >>>>> I did a quick test, and at least a flow in (table_id=38, priority=100)
> >>>>> which is multicast-group related is updated AFTER the SB notification
> >>> is
> >>>>> received for the port-binding chassis update.
> >>>>>
> >>>>
> >>>> This sounds like something we should fix, I think.  I don't see any
> >>>> multicast-group changes conditioned by the port_binding being up=true in
> >>>> northd.  I might be wrong though.
> >>>>
> >>>
> >>> It is not about "up=true". It is triggered by the port-binding->chassis
> >>> update. Since multicast-group has reference to port-binding, so a
> >>> port-binding update triggers multicast-group change handling, which is
> >>> required because physical flows related to the MC group need to be updated
> >>> when port-binding->chassis is updated. You may argue that the IDL may be
> >>> optimized so that the MC group change can be triggered and handled before
> >>> SB is updated, but I am not sure if the benefit is worth the complexity.
> >>> Given how OVSDB IDL transaction is designed, I'd always think a DB record
> >>> is *formally* updated only after the update notification is received from
> >>> the server, which seems to be safe and clear.
> >>>
> >>>>> Secondly, if the change hasn't made it to the SB, all the other nodes
> >>> would
> >>>>> not be able to reach the port, which means the workload (pod/VM) cannot
> >>>>> receive traffic yet at this phase.
> >>>>>
> >>>>
> >>>> Even if the change made it to the SB we have no way of knowing that all
> >>>> other nodes processed it so we cannot know for sure that traffic can
> >>>> flow properly end-to-end.  But, like I said above, this doesn't matter
> >>>> if the semantics of ovn-installed=true are "all locally relevant flows
> >>>> are installed".
> >>>>
> >>> It's true that even SB is updated it doesn't ensure all the nodes
> >>> processed it, but I view it this way: at least from the current node's
> >>> point of view, its job is done and the other nodes are beyond its control.
> >>> On the other hand, if SB update failed, its job is not done yet. I am not
> >>> saying this is the only *correct* way, but just the way I am seeing it :).
> >>>
> >>>>> So, I think our goal is not to set ovn-installed early, but to set it
> >>>>> accurately (sometime may be ok to be conservative).
> >>>>>
> >>>>
> >>>> Sure, but waiting for the SB port_binding.chassis update might introduce
> >>>> significant spikes in latency if the SB is compacting (or just busy) at
> >>>> that moment.
> >>>>
> >>>> This might become an issue in large scale deployments as pods will take
> >>>> longer to be declared "ready".
> >>>>
> >>> I understand your concern, but if you think about it, no matter how the
> >>> pods are *declared* ready doesn't change the fact it is ready or not. It
> >>> doesn't make the real flow setup faster or slower.
> >>> If the CMS really wants to declare it ready earlier, it can just ignore
> >>> the ovn-installed flag check or flow check. What's the real benefit except
> >>> for metrics?
> >>>
> >>>>> In addition, ovn-monitor-all is not always true even in ovn-k8s. It is
> >>>>> configurable in ovn-k8s. (in our environment we set it to false, to
> >>> save
> >>>>> MEM and CPU for worker nodes, while sacrifice a little for the central
> >>> SB
> >>>>> DB)
> >>>>>
> >>>>
> >>>> Ack.  But for this case specifically, as SB is already busier with
> >>>> conditional monitoring, I think serializing events in ovn-controller
> >>>> will create even more visible delays in pod bringup times.
> >>>>
> >>>> Thanks,
> >>>> Dumitru
> >>>>
> >>>>> Thanks,
> >>>>> Han
> >>>>>>>
> >>>>>>> Thanks again for your feedback
> >>>>>>>
> >>>>>>> Xavier
> >>>>>>>
> >>>>>>> On Thu, Jul 7, 2022 at 8:49 AM Han Zhou <hzhou@ovn.org> wrote:
> >>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On Wed, Jun 22, 2022 at 3:23 AM Xavier Simonart <xsimonar@redhat.com> wrote:
> >>>>>>>>>
> >>>>>>>>> When VIF ports are claimed on a chassis, the SBDB Port_Binding table is
> >>>>>>>>> updated. If the SBDB IDL is still read-only ("in transaction") when such
> >>>>>>>>> an update is required, the update is not possible and a recompute is
> >>>>>>>>> triggered through I+P failure.
> >>>>>>>>>
> >>>>>>>>> This situation can happen:
> >>>>>>>>> - after updating Port_Binding->chassis to SBDB for one port, in a
> >>>>>>>>>   following iteration, ovn-controller handles
> >>>>>>>>>   Interface:external_ids:ovn-installed (for the same port) while SBDB
> >>>>>>>>>   is still read-only.
> >>>>>>>>> - after updating Port_Binding->chassis to SBDB for one port, in a
> >>>>>>>>>   following iteration, ovn-controller updates Port_Binding->chassis
> >>>>>>>>>   for another port, while SBDB is still read-only.
> >>>>>>>>>
> >>>>>>>>> This patch prevents the recompute by having the if-status module
> >>>>>>>>> update the Port_Binding chassis (if needed) when possible.
> >>>>>>>>> This does not delay the Port_Binding chassis update compared to before
> >>>>>>>>> this patch.
> >>>>>>>>> - With the patch, Port_Binding chassis will be updated as soon as SBDB
> >>>>>>>>>   is again writable, without recompute.
> >>>>>>>>> - Without the patch, Port_Binding chassis was updated as soon as SBDB
> >>>>>>>>>   was again writable, through a recompute.
> >>>>>>>>>
> >>>>>>>>> As part of this patch, ovn-installed will not be updated for additional
> >>>>>>>>> chassis; it will only be updated when the migration is completed.
> >>>>>>>>>
> >>>>>>>>> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2059253
> >>>>>>>>> Signed-off-by: Xavier Simonart <xsimonar@redhat.com>
> >>>>>>>>>
> >>>>>>>>> ---
> >>>>>>>>> v2:  - handled Dumitru's comments.
> >>>>>>>>>      - handled Han's comments, mainly ensure we moved out of CLAIMED
> >>>>>>>>>        state only after updating pb->chassis to guarantee physical flows
> >>>>>>>>>        are installed when ovn-installed is updated in OVS.
> >>>>>>>>>      - slightly reorganized the code to isolate 'notify_up = false'
> >>>>>>>>>        cases in claim_port (i.e. ports such as virtual ports), with the
> >>>>>>>>>        idea of making a future patch prevent recomputes when virtual
> >>>>>>>>>        ports are claimed.
> >>>>>>>>>      - updated test case to cause more race conditions.
> >>>>>>>>>      - rebased on origin/main
> >>>>>>>>>      - note that "additional chassis" as now supported by
> >>>>>>>>>        "Support LSP:options:requested-chassis as a list" might still
> >>>>>>>>>        cause recomputes.
> >>>>>>>>>      - fixed missing flows when Port_Binding chassis was updated by
> >>>>>>>>>        mgr_update w/o any lflow recalculation.
> >>>>>>>>> v3:  - handled Dumitru's comments on v2, mainly have runtime_data handler
> >>>>>>>>>        handling pb_claims when sb becomes writable (instead of a lflow
> >>>>>>>>>        handler).
> >>>>>>>>>      - fixed test as it was not checking recomputes on all hv, as well
> >>>>>>>>>        as a flaky behavior.
> >>>>>>>>>      - rebased on origin/main.
> >>>>>>>>> ---
> >>>>>>>>>  controller/binding.c        | 154 +++++++++++++++++++++----------
> >>>>>>>>>  controller/binding.h        |  15 +++-
> >>>>>>>>>  controller/if-status.c      | 174
> >>>>> ++++++++++++++++++++++++++++++++----
> >>>>>>>>>  controller/if-status.h      |  16 +++-
> >>>>>>>>>  controller/ovn-controller.c |  72 ++++++++++++++-
> >>>>>>>>>  tests/ovn-macros.at         |  12 +++
> >>>>>>>>>  tests/ovn.at                | 147 +++++++++++++++++++++++++++++-
> >>>>>>>>>  tests/perf-northd.at        |  17 ----
> >>>>>>>>>  8 files changed, 519 insertions(+), 88 deletions(-)
> >>>>>>>>>
> >>>>>>>>> diff --git a/controller/binding.c b/controller/binding.c
> >>>>>>>>> index 2279570f9..b21577f71 100644
> >>>>>>>>> --- a/controller/binding.c
> >>>>>>>>> +++ b/controller/binding.c
> >>>>>>>>> @@ -644,11 +644,17 @@ local_binding_get_lport_ofport(const struct
> >>>>> shash
> >>>>>>>> *local_bindings,
> >>>>>>>>>  }
> >>>>>>>>>
> >>>>>>>>>  bool
> >>>>>>>>> -local_binding_is_up(struct shash *local_bindings, const char
> >>>>> *pb_name)
> >>>>>>>>> +local_binding_is_up(struct shash *local_bindings, const char
> >>>>> *pb_name,
> >>>>>>>>> +                    const struct sbrec_chassis *chassis_rec)
> >>>>>>>>>  {
> >>>>>>>>>      struct local_binding *lbinding =
> >>>>>>>>>          local_binding_find(local_bindings, pb_name);
> >>>>>>>>>      struct binding_lport *b_lport =
> >>>>>>>> local_binding_get_primary_lport(lbinding);
> >>>>>>>>> +
> >>>>>>>>> +    if (b_lport && b_lport->pb->chassis != chassis_rec) {
> >>>>>>>>> +        return false;
> >>>>>>>>> +    }
> >>>>>>>>> +
> >>>>>>>>>      if (lbinding && b_lport && lbinding->iface) {
> >>>>>>>>>          if (b_lport->pb->n_up && !b_lport->pb->up[0]) {
> >>>>>>>>>              return false;
> >>>>>>>>> @@ -660,13 +666,23 @@ local_binding_is_up(struct shash
> >>>>> *local_bindings,
> >>>>>>>> const char *pb_name)
> >>>>>>>>>  }
> >>>>>>>>>
> >>>>>>>>>  bool
> >>>>>>>>> -local_binding_is_down(struct shash *local_bindings, const char
> >>>>> *pb_name)
> >>>>>>>>> +local_binding_is_down(struct shash *local_bindings, const char
> >>>>> *pb_name,
> >>>>>>>>> +                      const struct sbrec_chassis *chassis_rec)
> >>>>>>>>>  {
> >>>>>>>>>      struct local_binding *lbinding =
> >>>>>>>>>          local_binding_find(local_bindings, pb_name);
> >>>>>>>>>
> >>>>>>>>>      struct binding_lport *b_lport =
> >>>>>>>> local_binding_get_primary_lport(lbinding);
> >>>>>>>>>
> >>>>>>>>> +    if (b_lport) {
> >>>>>>>>> +        if (b_lport->pb->chassis == chassis_rec) {
> >>>>>>>>> +            return false;
> >>>>>>>>> +        } else if (b_lport->pb->chassis) {
> >>>>>>>>> +            VLOG_DBG("lport %s already claimed by other chassis",
> >>>>>>>>> +                     b_lport->pb->logical_port);
> >>>>>>>>> +        }
> >>>>>>>>> +    }
> >>>>>>>>> +
> >>>>>>>>>      if (!lbinding) {
> >>>>>>>>>          return true;
> >>>>>>>>>      }
> >>>>>>>>> @@ -884,37 +900,80 @@ get_lport_type_str(enum en_lport_type
> >>>>> lport_type)
> >>>>>>>>>      OVS_NOT_REACHED();
> >>>>>>>>>  }
> >>>>>>>>>
> >>>>>>>>> -/* For newly claimed ports, if 'notify_up' is 'false':
> >>>>>>>>> +void
> >>>>>>>>> +set_pb_chassis_in_sbrec(const struct sbrec_port_binding *pb,
> >>>>>>>>> +                        const struct sbrec_chassis *chassis_rec,
> >>>>>>>>> +                        bool is_set)
> >>>>>>>>> +{
> >>>>>>>>> +    if (pb->chassis != chassis_rec) {
> >>>>>>>>> +         if (is_set) {
> >>>>>>>>> +            if (pb->chassis) {
> >>>>>>>>> +                VLOG_INFO("Changing chassis for lport %s from %s
> >>> to
> >>>>>>>> %s.",
> >>>>>>>>> +                          pb->logical_port, pb->chassis->name,
> >>>>>>>>> +                          chassis_rec->name);
> >>>>>>>>> +            } else {
> >>>>>>>>> +                VLOG_INFO("Claiming lport %s for this chassis.",
> >>>>>>>>> +                          pb->logical_port);
> >>>>>>>>> +            }
> >>>>>>>>> +            for (int i = 0; i < pb->n_mac; i++) {
> >>>>>>>>> +                VLOG_INFO("%s: Claiming %s", pb->logical_port,
> >>>>>>>> pb->mac[i]);
> >>>>>>>>> +            }
> >>>>>>>>> +            sbrec_port_binding_set_chassis(pb, chassis_rec);
> >>>>>>>>> +        }
> >>>>>>>>> +    } else if (!is_set) {
> >>>>>>>>> +        sbrec_port_binding_set_chassis(pb, NULL);
> >>>>>>>>> +    }
> >>>>>>>>> +}
> >>>>>>>>> +
> >>>>>>>>> +void
> >>>>>>>>> +local_binding_set_pb(struct shash *local_bindings, const char
> >>>>> *pb_name,
> >>>>>>>>> +                     const struct sbrec_chassis *chassis_rec,
> >>>>>>>>> +                     struct hmap *tracked_datapaths, bool is_set)
> >>>>>>>>> +{
> >>>>>>>>> +    struct local_binding *lbinding =
> >>>>>>>>> +        local_binding_find(local_bindings, pb_name);
> >>>>>>>>> +    struct binding_lport *b_lport =
> >>>>>>>> local_binding_get_primary_lport(lbinding);
> >>>>>>>>> +
> >>>>>>>>> +    if (b_lport) {
> >>>>>>>>> +        set_pb_chassis_in_sbrec(b_lport->pb, chassis_rec, is_set);
> >>>>>>>>> +        if (tracked_datapaths) {
> >>>>>>>>> +            update_lport_tracking(b_lport->pb, tracked_datapaths,
> >>>>> true);
> >>>>>>>>> +        }
> >>>>>>>>> +    }
> >>>>>>>>> +}
> >>>>>>>>> +
> >>>>>>>>> +/* For newly claimed ports:
> >>>>>>>>>   * - set the 'pb.up' field to true if 'pb' has no 'parent_pb'.
> >>>>>>>>>   * - set the 'pb.up' field to true if 'parent_pb.up' is 'true'
> >>> (e.g.,
> >>>>>>>> for
> >>>>>>>>>   *   container and virtual ports).
> >>>>>>>>> - * Otherwise request a notification to be sent when the OVS flows
> >>>>>>>>> - * corresponding to 'pb' have been installed.
> >>>>>>>>> + *
> >>>>>>>>> + * Returns false if lport is not claimed due to 'sb_readonly'.
> >>>>>>>>> + * Returns true otherwise.
> >>>>>>>>>   *
> >>>>>>>>>   * Note:
> >>>>>>>>> - *   Updates (directly or through a notification) the 'pb->up'
> >>> field
> >>>>>>>> only if
> >>>>>>>>> - *   it's explicitly set to 'false'.
> >>>>>>>>> + *   Updates the 'pb->up' field only if it's explicitly set to
> >>>>> 'false'.
> >>>>>>>>>   *   This is to ensure compatibility with older versions of
> >>>>> ovn-northd.
> >>>>>>>>>   */
> >>>>>>>>> -static void
> >>>>>>>>> +static bool
> >>>>>>>>>  claimed_lport_set_up(const struct sbrec_port_binding *pb,
> >>>>>>>>>                       const struct sbrec_port_binding *parent_pb,
> >>>>>>>>> -                     const struct sbrec_chassis *chassis_rec,
> >>>>>>>>> -                     bool notify_up, struct if_status_mgr *if_mgr)
> >>>>>>>>> +                     bool sb_readonly)
> >>>>>>>>>  {
> >>>>>>>>> -    if (!notify_up) {
> >>>>>>>>> -        bool up = true;
> >>>>>>>>> -        if (!parent_pb || (parent_pb->n_up && parent_pb->up[0])) {
> >>>>>>>>> +    /* When notify_up is false in claim_port(), no state is
> >>> created
> >>>>>>>>> +     * by if_status_mgr. In such cases, return false (i.e. trigger
> >>>>>>>> recompute)
> >>>>>>>>> +     * if we can't update sb (because it is readonly).
> >>>>>>>>> +     */
> >>>>>>>>> +    bool up = true;
> >>>>>>>>> +    if (!parent_pb || (parent_pb->n_up && parent_pb->up[0])) {
> >>>>>>>>> +        if (!sb_readonly) {
> >>>>>>>>>              if (pb->n_up) {
> >>>>>>>>>                  sbrec_port_binding_set_up(pb, &up, 1);
> >>>>>>>>>              }
> >>>>>>>>> +        } else if (pb->n_up && !pb->up[0]) {
> >>>>>>>>> +            return false;
> >>>>>>>>>          }
> >>>>>>>>> -        return;
> >>>>>>>>> -    }
> >>>>>>>>> -
> >>>>>>>>> -    if (pb->chassis != chassis_rec || (pb->n_up && !pb->up[0])) {
> >>>>>>>>> -        if_status_mgr_claim_iface(if_mgr, pb->logical_port);
> >>>>>>>>>      }
> >>>>>>>>> +    return true;
> >>>>>>>>>  }
> >>>>>>>>>
> >>>>>>>>>  typedef void (*set_func)(const struct sbrec_port_binding *pb,
> >>>>>>>>> @@ -1057,37 +1116,35 @@ claim_lport(const struct sbrec_port_binding
> >>>>> *pb,
> >>>>>>>>>              struct hmap *tracked_datapaths,
> >>>>>>>>>              struct if_status_mgr *if_mgr)
> >>>>>>>>>  {
> >>>>>>>>> -    if (!sb_readonly) {
> >>>>>>>>> -        claimed_lport_set_up(pb, parent_pb, chassis_rec,
> >>> notify_up,
> >>>>>>>> if_mgr);
> >>>>>>>>> -    }
> >>>>>>>>> -
> >>>>>>>>>      enum can_bind can_bind =
> >>>>>>>> lport_can_bind_on_this_chassis(chassis_rec, pb);
> >>>>>>>>>      bool update_tracked = false;
> >>>>>>>>>
> >>>>>>>>>      if (can_bind == CAN_BIND_AS_MAIN) {
> >>>>>>>>>          if (pb->chassis != chassis_rec) {
> >>>>>>>>> -            if (sb_readonly) {
> >>>>>>>>> -                return false;
> >>>>>>>>> -            }
> >>>>>>>>> -
> >>>>>>>>> -            if (pb->chassis) {
> >>>>>>>>> -                VLOG_INFO("Changing chassis for lport %s from %s
> >>> to
> >>>>>>>> %s.",
> >>>>>>>>> -                        pb->logical_port, pb->chassis->name,
> >>>>>>>>> -                        chassis_rec->name);
> >>>>>>>>> -            } else {
> >>>>>>>>> -                VLOG_INFO("Claiming lport %s for this chassis.",
> >>>>>>>>> -                          pb->logical_port);
> >>>>>>>>> -            }
> >>>>>>>>> -            for (size_t i = 0; i < pb->n_mac; i++) {
> >>>>>>>>> -                VLOG_INFO("%s: Claiming %s", pb->logical_port,
> >>>>>>>> pb->mac[i]);
> >>>>>>>>> -            }
> >>>>>>>>> -
> >>>>>>>>> -            sbrec_port_binding_set_chassis(pb, chassis_rec);
> >>>>>>>>>              if (is_additional_chassis(pb, chassis_rec)) {
> >>>>>>>>> +                if (sb_readonly) {
> >>>>>>>>> +                    return false;
> >>>>>>>>> +                }
> >>>>>>>>>                  remove_additional_chassis(pb, chassis_rec);
> >>>>>>>>>              }
> >>>>>>>>>              update_tracked = true;
> >>>>>>>>>          }
> >>>>>>>>> +        if (!notify_up) {
> >>>>>>>>> +            if (!claimed_lport_set_up(pb, parent_pb,
> >>> sb_readonly)) {
> >>>>>>>>> +                return false;
> >>>>>>>>> +            }
> >>>>>>>>> +            if (pb->chassis != chassis_rec) {
> >>>>>>>>> +                if (sb_readonly) {
> >>>>>>>>> +                    return false;
> >>>>>>>>> +                }
> >>>>>>>>> +                set_pb_chassis_in_sbrec(pb, chassis_rec, true);
> >>>>>>>>> +            }
> >>>>>>>>> +        } else {
> >>>>>>>>> +            if ((pb->chassis != chassis_rec) || (pb->n_up &&
> >>>>>>>> !pb->up[0])) {
> >>>>>>>>> +                if_status_mgr_claim_iface(if_mgr, pb, chassis_rec,
> >>>>>>>>> +                                          sb_readonly);
> >>>>>>>>> +            }
> >>>>>>>>> +        }
> >>>>>>>>>      } else if (can_bind == CAN_BIND_AS_ADDITIONAL) {
> >>>>>>>>>          if (!is_additional_chassis(pb, chassis_rec)) {
> >>>>>>>>>              if (sb_readonly) {
> >>>>>>>>> @@ -1132,7 +1189,8 @@ claim_lport(const struct sbrec_port_binding
> >>> *pb,
> >>>>>>>>>   */
> >>>>>>>>>  static bool
> >>>>>>>>>  release_lport_main_chassis(const struct sbrec_port_binding *pb,
> >>>>>>>>> -                           bool sb_readonly)
> >>>>>>>>> +                           bool sb_readonly,
> >>>>>>>>> +                           struct if_status_mgr *if_mgr)
> >>>>>>>>>  {
> >>>>>>>>>      if (pb->encap) {
> >>>>>>>>>          if (sb_readonly) {
> >>>>>>>>> @@ -1141,11 +1199,13 @@ release_lport_main_chassis(const struct
> >>>>>>>> sbrec_port_binding *pb,
> >>>>>>>>>          sbrec_port_binding_set_encap(pb, NULL);
> >>>>>>>>>      }
> >>>>>>>>>
> >>>>>>>>> +    /* If sb readonly, pb->chassis unset through if-status if
> >>>>> present.
> >>>>>>>> */
> >>>>>>>>>      if (pb->chassis) {
> >>>>>>>>> -        if (sb_readonly) {
> >>>>>>>>> +        if (!sb_readonly) {
> >>>>>>>>> +            sbrec_port_binding_set_chassis(pb, NULL);
> >>>>>>>>> +        } else if (!if_status_mgr_iface_is_present(if_mgr,
> >>>>>>>> pb->logical_port)) {
> >>>>>>>>>              return false;
> >>>>>>>>>          }
> >>>>>>>>> -        sbrec_port_binding_set_chassis(pb, NULL);
> >>>>>>>>>      }
> >>>>>>>>>
> >>>>>>>>>      if (pb->virtual_parent) {
> >>>>>>>>> @@ -1155,7 +1215,8 @@ release_lport_main_chassis(const struct
> >>>>>>>> sbrec_port_binding *pb,
> >>>>>>>>>          sbrec_port_binding_set_virtual_parent(pb, NULL);
> >>>>>>>>>      }
> >>>>>>>>>
> >>>>>>>>> -    VLOG_INFO("Releasing lport %s from this chassis.",
> >>>>>>>> pb->logical_port);
> >>>>>>>>> +    VLOG_INFO("Releasing lport %s from this chassis
> >>>>> (sb_readonly=%d)",
> >>>>>>>>> +              pb->logical_port, sb_readonly);
> >>>>>>>>>      return true;
> >>>>>>>>>  }
> >>>>>>>>>
> >>>>>>>>> @@ -1189,7 +1250,7 @@ release_lport(const struct sbrec_port_binding
> >>>>> *pb,
> >>>>>>>>>                struct hmap *tracked_datapaths, struct if_status_mgr
> >>>>>>>> *if_mgr)
> >>>>>>>>>  {
> >>>>>>>>>      if (pb->chassis == chassis_rec) {
> >>>>>>>>> -        if (!release_lport_main_chassis(pb, sb_readonly)) {
> >>>>>>>>> +        if (!release_lport_main_chassis(pb, sb_readonly, if_mgr))
> >>> {
> >>>>>>>>>              return false;
> >>>>>>>>>          }
> >>>>>>>>>      } else if (is_additional_chassis(pb, chassis_rec)) {
> >>>>>>>>> @@ -1271,7 +1332,7 @@ consider_vif_lport_(const struct
> >>>>>>>> sbrec_port_binding *pb,
> >>>>>>>>>                               b_lport->lbinding->iface,
> >>>>>>>>>                               !b_ctx_in->ovnsb_idl_txn,
> >>>>>>>>>                               !parent_pb,
> >>>>> b_ctx_out->tracked_dp_bindings,
> >>>>>>>>> -                             b_ctx_out->if_mgr)){
> >>>>>>>>> +                             b_ctx_out->if_mgr)) {
> >>>>>>>>>                  return false;
> >>>>>>>>>              }
> >>>>>>>>>
> >>>>>>>>> @@ -1527,7 +1588,8 @@ consider_localport(const struct
> >>>>> sbrec_port_binding
> >>>>>>>> *pb,
> >>>>>>>>>      enum can_bind can_bind = lport_can_bind_on_this_chassis(
> >>>>>>>>>          b_ctx_in->chassis_rec, pb);
> >>>>>>>>>      if (can_bind == CAN_BIND_AS_MAIN) {
> >>>>>>>>> -        if (!release_lport_main_chassis(pb,
> >>>>> !b_ctx_in->ovnsb_idl_txn)) {
> >>>>>>>>> +        if (!release_lport_main_chassis(pb,
> >>> !b_ctx_in->ovnsb_idl_txn,
> >>>>>>>>> +            b_ctx_out->if_mgr)) {
> >>>>>>>>>              return false;
> >>>>>>>>>          }
> >>>>>>>>>      } else if (can_bind == CAN_BIND_AS_ADDITIONAL) {
> >>>>>>>>> diff --git a/controller/binding.h b/controller/binding.h
> >>>>>>>>> index 1fed06674..d20659b0b 100644
> >>>>>>>>> --- a/controller/binding.h
> >>>>>>>>> +++ b/controller/binding.h
> >>>>>>>>> @@ -151,8 +151,10 @@ const struct sbrec_port_binding
> >>>>>>>> *local_binding_get_primary_pb(
> >>>>>>>>>  ofp_port_t local_binding_get_lport_ofport(const struct shash
> >>>>>>>> *local_bindings,
> >>>>>>>>>                                            const char *pb_name);
> >>>>>>>>>
> >>>>>>>>> -bool local_binding_is_up(struct shash *local_bindings, const char
> >>>>>>>> *pb_name);
> >>>>>>>>> -bool local_binding_is_down(struct shash *local_bindings, const
> >>> char
> >>>>>>>> *pb_name);
> >>>>>>>>> +bool local_binding_is_up(struct shash *local_bindings, const char
> >>>>>>>> *pb_name,
> >>>>>>>>> +                         const struct sbrec_chassis *);
> >>>>>>>>> +bool local_binding_is_down(struct shash *local_bindings, const
> >>> char
> >>>>>>>> *pb_name,
> >>>>>>>>> +                           const struct sbrec_chassis *);
> >>>>>>>>>  void local_binding_set_up(struct shash *local_bindings, const char
> >>>>>>>> *pb_name,
> >>>>>>>>>                            const struct sbrec_chassis *chassis_rec,
> >>>>>>>>>                            const char *ts_now_str, bool
> >>> sb_readonly,
> >>>>>>>>> @@ -160,7 +162,10 @@ void local_binding_set_up(struct shash
> >>>>>>>> *local_bindings, const char *pb_name,
> >>>>>>>>>  void local_binding_set_down(struct shash *local_bindings, const
> >>> char
> >>>>>>>> *pb_name,
> >>>>>>>>>                              const struct sbrec_chassis
> >>> *chassis_rec,
> >>>>>>>>>                              bool sb_readonly, bool ovs_readonly);
> >>>>>>>>> -
> >>>>>>>>> +void local_binding_set_pb(struct shash *local_bindings, const char
> >>>>>>>> *pb_name,
> >>>>>>>>> +                          const struct sbrec_chassis *chassis_rec,
> >>>>>>>>> +                          struct hmap *tracked_datapaths,
> >>>>>>>>> +                          bool is_set);
> >>>>>>>>>  void binding_register_ovs_idl(struct ovsdb_idl *);
> >>>>>>>>>  void binding_run(struct binding_ctx_in *, struct binding_ctx_out
> >>> *);
> >>>>>>>>>  bool binding_cleanup(struct ovsdb_idl_txn *ovnsb_idl_txn,
> >>>>>>>>> @@ -178,6 +183,10 @@ void binding_dump_local_bindings(struct
> >>>>>>>> local_binding_data *, struct ds *);
> >>>>>>>>>  bool is_additional_chassis(const struct sbrec_port_binding *pb,
> >>>>>>>>>                             const struct sbrec_chassis
> >>> *chassis_rec);
> >>>>>>>>>
> >>>>>>>>> +void set_pb_chassis_in_sbrec(const struct sbrec_port_binding *pb,
> >>>>>>>>> +                             const struct sbrec_chassis
> >>> *chassis_rec,
> >>>>>>>>> +                             bool is_set);
> >>>>>>>>> +
> >>>>>>>>>  /* Corresponds to each Port_Binding.type. */
> >>>>>>>>>  enum en_lport_type {
> >>>>>>>>>      LP_UNKNOWN,
> >>>>>>>>> diff --git a/controller/if-status.c b/controller/if-status.c
> >>>>>>>>> index ad61844d8..7693c289b 100644
> >>>>>>>>> --- a/controller/if-status.c
> >>>>>>>>> +++ b/controller/if-status.c
> >>>>>>>>> @@ -24,6 +24,7 @@
> >>>>>>>>>  #include "lib/util.h"
> >>>>>>>>>  #include "timeval.h"
> >>>>>>>>>  #include "openvswitch/vlog.h"
> >>>>>>>>> +#include "lib/ovn-sb-idl.h"
> >>>>>>>>>
> >>>>>>>>>  VLOG_DEFINE_THIS_MODULE(if_status);
> >>>>>>>>>
> >>>>>>>>> @@ -53,9 +54,11 @@ VLOG_DEFINE_THIS_MODULE(if_status);
> >>>>>>>>>   */
> >>>>>>>>>
> >>>>>>>>>  enum if_state {
> >>>>>>>>> -    OIF_CLAIMED,       /* Newly claimed interface. */
> >>>>>>>>> -    OIF_INSTALL_FLOWS, /* Already claimed interface for which
> >>> flows
> >>>>> are
> >>>>>>>> still
> >>>>>>>>> -                        * being installed.
> >>>>>>>>> +    OIF_CLAIMED,       /* Newly claimed interface. pb->chassis not
> >>>>> yet
> >>>>>>>> updated.
> >>>>>>>>> +                        */
> >>>>>>>>> +    OIF_INSTALL_FLOWS, /* Claimed interface with pb->chassis
> >>>>>>>> successfully
> >>>>>>>>> +                        * updated in SB and for which flows are
> >>> still
> >>>>>>>> being
> >>>>>>>>> +                        * installed.
> >>>>>>>>>                          */
> >>>>>>>>>      OIF_MARK_UP,       /* Interface with flows successfully
> >>> installed
> >>>>>>>> in OVS
> >>>>>>>>>                          * but not yet marked "up" in the binding
> >>>>> module
> >>>>>>>> (in
> >>>>>>>>> @@ -78,6 +81,55 @@ static const char *if_state_names[] = {
> >>>>>>>>>      [OIF_INSTALLED]     = "INSTALLED",
> >>>>>>>>>  };
> >>>>>>>>>
> >>>>>>>>> +/*
> >>>>>>>>> + *       +----------------------+
> >>>>>>>>> + * +---> |                      |
> >>>>>>>>> + * | +-> |         NULL         | <--------------------------------------+++-+
> >>>>>>>>> + * | |   +----------------------+                                            |
> >>>>>>>>> + * | |     ^ release_iface   | claim_iface                                   |
> >>>>>>>>> + * | |     |                 V - sbrec_update_chassis(if sb is rw)           |
> >>>>>>>>> + * | |   +----------------------+                                            |
> >>>>>>>>> + * | |   |                      | <----------------------------------------+ |
> >>>>>>>>> + * | |   |       CLAIMED        | <--------------------------------------+ | |
> >>>>>>>>> + * | |   +----------------------+                                        | | |
> >>>>>>>>> + * | |                  | mgr_update(when sb is rw)                      | | |
> >>>>>>>>> + * | | release_iface    |  - sbrec_update_chassis                        | | |
> >>>>>>>>> + * | |                  |  - request seqno                               | | |
> >>>>>>>>> + * | |                  V                                                | | |
> >>>>>>>>> + * | |   +----------------------+                                        | | |
> >>>>>>>>> + * | +-- |                      |  mgr_run(seqno not rcvd)               | | |
> >>>>>>>>> + * |     |    INSTALL_FLOWS     |   - set port down in sb                | | |
> >>>>>>>>> + * |     |                      |  mgr_update()                          | | |
> >>>>>>>>> + * |     +----------------------+   - sbrec_update_chassis if needed     | | |
> >>>>>>>>> + * |                    |                                                | | |
> >>>>>>>>> + * |                    |  mgr_run(seqno rcvd)                           | | |
> >>>>>>>>> + * |                    |  - set port up in sb                           | | |
> >>>>>>>>> + * | release_iface      |  - set ovn-installed in ovs                    | | |
> >>>>>>>>> + * |                    V                                                | | |
> >>>>>>>>> + * |   +----------------------+                                          | | |
> >>>>>>>>> + * |   |                      |  mgr_run()                               | | |
> >>>>>>>>> + * +-- |       MARK_UP        |  - set port up in sb                     | | |
> >>>>>>>>> + *     |                      |  - set ovn-installed in ovs              | | |
> >>>>>>>>> + *     |                      |  mgr_update()                            | | |
> >>>>>>>>> + *     +----------------------+  - sbrec_update_chassis if needed        | | |
> >>>>>>>>> + *              |                                                        | | |
> >>>>>>>>> + *              | mgr_update(rcvd port up / ovn_installed & chassis set) | | |
> >>>>>>>>> + *              V                                                        | | |
> >>>>>>>>> + *     +----------------------+                                          | | |
> >>>>>>>>> + *     |      INSTALLED       | ------------> claim_iface ---------------+ | |
> >>>>>>>>> + *     +----------------------+                                            | |
> >>>>>>>>> + *              |                                                          | |
> >>>>>>>>> + *              | release_iface                                            | |
> >>>>>>>>> + *              V                                                          | |
> >>>>>>>>> + *     +----------------------+                                            | |
> >>>>>>>>> + *     |                      | ------------> claim_iface -----------------+ |
> >>>>>>>>> + *     |      MARK_DOWN       | ------> mgr_update(rcvd port down) ----------+
> >>>>>>>>> + *     |                      | mgr_run()
> >>>>>>>>> + *     |                      | - set port down in sb
> >>>>>>>>> + *     |                      | mgr_update()
> >>>>>>>>> + *     +----------------------+ - sbrec_update_chassis(NULL)
> >>>>>>>>> + */
> >>>>>>>>> +
> >>>>>>>>>  struct ovs_iface {
> >>>>>>>>>      char *id;               /* Extracted from OVS
> >>>>>>>> external_ids.iface_id. */
> >>>>>>>>>      enum if_state state;    /* State of the interface in the state
> >>>>>>>> machine. */
> >>>>>>>>> @@ -85,6 +137,7 @@ struct ovs_iface {
> >>>>>>>>>                               * be fully programmed in OVS.  Only
> >>> used
> >>>>>>>> in state
> >>>>>>>>>                               * OIF_INSTALL_FLOWS.
> >>>>>>>>>                               */
> >>>>>>>>> +    bool chassis_update_required;  /* If true, pb->chassis must be
> >>>>>>>> updated. */
> >>>>>>>>>  };
> >>>>>>>>>
> >>>>>>>>>  static uint64_t ifaces_usage;
> >>>>>>>>> @@ -158,14 +211,23 @@ if_status_mgr_destroy(struct if_status_mgr
> >>> *mgr)
> >>>>>>>>>  }
> >>>>>>>>>
> >>>>>>>>>  void
> >>>>>>>>> -if_status_mgr_claim_iface(struct if_status_mgr *mgr, const char
> >>>>>>>> *iface_id)
> >>>>>>>>> +if_status_mgr_claim_iface(struct if_status_mgr *mgr,
> >>>>>>>>> +                          const struct sbrec_port_binding *pb,
> >>>>>>>>> +                          const struct sbrec_chassis *chassis_rec,
> >>>>>>>>> +                          bool sb_readonly)
> >>>>>>>>>  {
> >>>>>>>>> +    const char *iface_id = pb->logical_port;
> >>>>>>>>>      struct ovs_iface *iface = shash_find_data(&mgr->ifaces,
> >>>>> iface_id);
> >>>>>>>>>
> >>>>>>>>>      if (!iface) {
> >>>>>>>>>          iface = ovs_iface_create(mgr, iface_id, OIF_CLAIMED);
> >>>>>>>>>      }
> >>>>>>>>> -
> >>>>>>>>> +    if (!sb_readonly) {
> >>>>>>>>> +        set_pb_chassis_in_sbrec(pb, chassis_rec, true);
> >>>>>>>>> +        iface->chassis_update_required = false;
> >>>>>>>>> +    } else {
> >>>>>>>>> +        iface->chassis_update_required = true;
> >>>>>>>>> +    }
> >>>>>>>>>      switch (iface->state) {
> >>>>>>>>>      case OIF_CLAIMED:
> >>>>>>>>>      case OIF_INSTALL_FLOWS:
> >>>>>>>>> @@ -182,6 +244,12 @@ if_status_mgr_claim_iface(struct if_status_mgr
> >>>>>>>> *mgr, const char *iface_id)
> >>>>>>>>>      }
> >>>>>>>>>  }
> >>>>>>>>>
> >>>>>>>>> +bool
> >>>>>>>>> +if_status_mgr_iface_is_present(struct if_status_mgr *mgr, const
> >>> char
> >>>>>>>> *iface_id)
> >>>>>>>>> +{
> >>>>>>>>> +    return !!shash_find_data(&mgr->ifaces, iface_id);
> >>>>>>>>> +}
> >>>>>>>>> +
> >>>>>>>>>  void
> >>>>>>>>>  if_status_mgr_release_iface(struct if_status_mgr *mgr, const char
> >>>>>>>> *iface_id)
> >>>>>>>>>  {
> >>>>>>>>> @@ -246,9 +314,39 @@ if_status_mgr_delete_iface(struct
> >>> if_status_mgr
> >>>>>>>> *mgr, const char *iface_id)
> >>>>>>>>>      }
> >>>>>>>>>  }
> >>>>>>>>>
> >>>>>>>>> +bool
> >>>>>>>>> +if_status_handle_claims(struct if_status_mgr *mgr,
> >>>>>>>>> +                        struct local_binding_data *binding_data,
> >>>>>>>>> +                        const struct sbrec_chassis *chassis_rec,
> >>>>>>>>> +                        struct hmap *tracked_datapath,
> >>>>>>>>> +                        bool sb_readonly)
> >>>>>>>>> +{
> >>>>>>>>> +    if (!binding_data || sb_readonly) {
> >>>>>>>>> +        return false;
> >>>>>>>>> +    }
> >>>>>>>>> +
> >>>>>>>>> +    struct shash *bindings = &binding_data->bindings;
> >>>>>>>>> +    struct hmapx_node *node;
> >>>>>>>>> +
> >>>>>>>>> +    bool rc = false;
> >>>>>>>>> +    HMAPX_FOR_EACH (node, &mgr->ifaces_per_state[OIF_CLAIMED]) {
> >>>>>>>>> +        struct ovs_iface *iface = node->data;
> >>>>>>>>> +        if (iface->chassis_update_required) {
> >>>>>>>>
> >>>>>>>> Thanks Xavier for the revision. The state machine looks clearer now,
> >>>>>>>> but I have a major concern about the use of chassis_update_required.
> >>>>>>>> This bool flag is used to decide if an update to SB is needed, and once
> >>>>>>>> an SB update is requested, it is set to false, on the assumption that
> >>>>>>>> the SB update will succeed immediately. However, the assumption may be
> >>>>>>>> wrong. There can be different reasons for the subsequent SB update to
> >>>>>>>> fail or be delayed, so this flag is not reliable. Instead, the CLAIMED
> >>>>>>>> state should carry the responsibility to make sure the SB update is
> >>>>>>>> completed. If a transaction is in progress, sb_readonly is true. So if
> >>>>>>>> sb_readonly is false, nothing is in progress, and we can always check:
> >>>>>>>> if (!sb_readonly && <SB chassis is not updated for the port-binding>)
> >>>>>>>> we should just send the update, regardless of whether we have
> >>>>>>>> requested it before. Please also see another comment below for the
> >>>>>>>> state transition.
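
For readers following the thread, the flag-free check suggested in the comment above could look roughly like the sketch below. It is only an illustration under stated assumptions: the `sketch_*` types and functions are hypothetical stand-ins, not the real `sbrec_*` structures or any function from the patch, and the "can advance" helper is one possible reading of the v2 note about leaving CLAIMED only after pb->chassis is updated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the IDL row types; not the real sbrec_* structs. */
struct sketch_chassis {
    const char *name;
};

struct sketch_port_binding {
    const char *logical_port;
    const struct sketch_chassis *chassis;  /* What the local SB replica says. */
};

/* Returns true when an interface in CLAIMED state should (re)send the
 * pb->chassis update: the SB IDL is writable, so no transaction is in
 * flight, and the SB row does not yet point at this chassis.  No
 * per-interface "update already requested" flag is consulted. */
static bool
sketch_claimed_needs_sb_update(const struct sketch_port_binding *pb,
                               const struct sketch_chassis *this_chassis,
                               bool sb_readonly)
{
    assert(pb && this_chassis);
    return !sb_readonly && pb->chassis != this_chassis;
}

/* Returns true when the interface may leave CLAIMED: the SB row already
 * reflects this chassis, however many attempts that took. */
static bool
sketch_claimed_can_advance(const struct sketch_port_binding *pb,
                           const struct sketch_chassis *this_chassis)
{
    assert(pb && this_chassis);
    return pb->chassis == this_chassis;
}
```

Because the decision is derived from the observed SB row on every iteration, a failed or delayed transaction simply leads to another attempt the next time the IDL is writable, rather than being hidden behind a flag that was cleared too early.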
> >>>>>>>>
> >>>>>>>>> +            VLOG_INFO("if_status_handle_claims for %s",
> >>> iface->id);
> >>>>>>>>> +            local_binding_set_pb(bindings, iface->id, chassis_rec,
> >>>>>>>>> +                                 tracked_datapath, true);
> >>>>>>>>> +            rc = true;
> >>>>>>>>> +        }
> >>>>>>>>> +        iface->chassis_update_required = false;
> >>>>>>>>> +    }
> >>>>>>>>> +    return rc;
> >>>>>>>>> +}
> >>>>>>>>> +
> >>>>>>>>>  void
> >>>>>>>>>  if_status_mgr_update(struct if_status_mgr *mgr,
> >>>>>>>>> -                     struct local_binding_data *binding_data)
> >>>>>>>>> +                     struct local_binding_data *binding_data,
> >>>>>>>>> +                     const struct sbrec_chassis *chassis_rec,
> >>>>>>>>> +                     bool sb_readonly)
> >>>>>>>>>  {
> >>>>>>>>>      if (!binding_data) {
> >>>>>>>>>          return;
> >>>>>>>>> @@ -257,13 +355,25 @@ if_status_mgr_update(struct if_status_mgr
> >>> *mgr,
> >>>>>>>>>      struct shash *bindings = &binding_data->bindings;
> >>>>>>>>>      struct hmapx_node *node;
> >>>>>>>>>
> >>>>>>>>> +    /* Interfaces in OIF_MARK_UP state have already set their
> >>>>>>>> pb->chassis.
> >>>>>>>>> +     * However, it might have been reset by another hv.
> >>>>>>>>> +     */
> >>>>>>>>>      /* Move all interfaces that have been confirmed "up" by the
> >>>>> binding
> >>>>>>>> module,
> >>>>>>>>>       * from OIF_MARK_UP to OIF_INSTALLED.
> >>>>>>>>>       */
> >>>>>>>>>      HMAPX_FOR_EACH_SAFE (node,
> >>> &mgr->ifaces_per_state[OIF_MARK_UP]) {
> >>>>>>>>>          struct ovs_iface *iface = node->data;
> >>>>>>>>>
> >>>>>>>>> -        if (local_binding_is_up(bindings, iface->id)) {
> >>>>>>>>> +        if (iface->chassis_update_required) {
> >>>>>>>>> +            if (!sb_readonly) {
> >>>>>>>>> +                iface->chassis_update_required = false;
> >>>>>>>>> +                local_binding_set_pb(bindings, iface->id,
> >>>>> chassis_rec,
> >>>>>>>>> +                                     NULL, true);
> >>>>>>>>> +            } else {
> >>>>>>>>> +                continue;
> >>>>>>>>> +            }
> >>>>>>>>> +        }
> >>>>>>>>> +        if (local_binding_is_up(bindings, iface->id,
> >>> chassis_rec)) {
> >>>>>>>>>              ovs_iface_set_state(mgr, iface, OIF_INSTALLED);
> >>>>>>>>>          }
> >>>>>>>>>      }
> >>>>>>>>> @@ -274,23 +384,53 @@ if_status_mgr_update(struct if_status_mgr
> >>> *mgr,
> >>>>>>>>>      HMAPX_FOR_EACH_SAFE (node,
> >>>>> &mgr->ifaces_per_state[OIF_MARK_DOWN]) {
> >>>>>>>>>          struct ovs_iface *iface = node->data;
> >>>>>>>>>
> >>>>>>>>> -        if (local_binding_is_down(bindings, iface->id)) {
> >>>>>>>>> +        if (!sb_readonly) {
> >>>>>>>>> +            local_binding_set_pb(bindings, iface->id, chassis_rec,
> >>>>>>>>> +                                 NULL, false);
> >>>>>>>>> +        }
> >>>>>>>>> +        if (local_binding_is_down(bindings, iface->id,
> >>> chassis_rec))
> >>>>> {
> >>>>>>>>>              ovs_iface_destroy(mgr, iface);
> >>>>>>>>>          }
> >>>>>>>>>      }
> >>>>>>>>>
> >>>>>>>>> -    /* Register for a notification about flows being installed in
> >>> OVS
> >>>>>>>> for all
> >>>>>>>>> -     * newly claimed interfaces.
> >>>>>>>>> +    if (!sb_readonly) {
> >>>>>>>>> +        HMAPX_FOR_EACH_SAFE (node,
> >>>>>>>> &mgr->ifaces_per_state[OIF_INSTALL_FLOWS]) {
> >>>>>>>>> +            struct ovs_iface *iface = node->data;
> >>>>>>>>> +
> >>>>>>>>> +            if (iface->chassis_update_required) {
> >>>>>>>>> +                iface->chassis_update_required = false;
> >>>>>>>>> +                local_binding_set_pb(bindings, iface->id,
> >>>>> chassis_rec,
> >>>>>>>>> +                                     NULL, true);
> >>>>>>>>> +            }
> >>>>>>>>> +        }
> >>>>>>>>> +    }
> >>>>>>>>> +
> >>>>>>>>> +    /* Update Port_Binding->chassis for newly claimed interfaces
> >>>>>>>>> +     * Register for a notification about flows being installed in
> >>> OVS
> >>>>>>>> for all
> >>>>>>>>> +     * newly claimed interfaces for which we could update
> >>>>> pb->chassis.
> >>>>>>>>>       *
> >>>>>>>>>       * Move them from OIF_CLAIMED to OIF_INSTALL_FLOWS.
> >>>>>>>>>       */
> >>>>>>>>> -    bool new_ifaces = false;
> >>>>>>>>> -    HMAPX_FOR_EACH_SAFE (node,
> >>> &mgr->ifaces_per_state[OIF_CLAIMED]) {
> >>>>>>>>> -        struct ovs_iface *iface = node->data;
> >>>>>>>>>
> >>>>>>>>> -        ovs_iface_set_state(mgr, iface, OIF_INSTALL_FLOWS);
> >>>>>>>>> -        iface->install_seqno = mgr->iface_seqno + 1;
> >>>>>>>>> -        new_ifaces = true;
> >>>>>>>>> +    bool new_ifaces = false;
> >>>>>>>>> +    if (!sb_readonly) {
> >>>>>>>>> +        HMAPX_FOR_EACH_SAFE (node,
> >>>>> &mgr->ifaces_per_state[OIF_CLAIMED])
> >>>>>>>> {
> >>>>>>>>> +            struct ovs_iface *iface = node->data;
> >>>>>>>>> +            /* No need to check for chassis_update_required as
> >>>>> already
> >>>>>>>> done
> >>>>>>>>> +             * in if_status_handle_claims or
> >>>>> if_status_mgr_claim_iface
> >>>>>>>>> +             */
> >>>>>>>>> +            ovs_iface_set_state(mgr, iface, OIF_INSTALL_FLOWS);
> >>>>>>>>
> >>>>>>>> We need to make sure the chassis in SB port-binding is up-to-date
> >>> (i.e.
> >>>>>>>> the update notification from SB DB has been received) before moving
> >>> to
> >>>>>>>> INSTALL_FLOWS. Otherwise, it is still possible that the state is
> >>> moved
> >>>>> too
> >>>>>>>> early and end up with incomplete flow installation for the lport
> >>> when
> >>>>> the
> >>>>>>>> state is finally moved to INSTALLED.
> >>>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>> Han
> >>>>>>>>
> >>>>>>>>> +            iface->install_seqno = mgr->iface_seqno + 1;
> >>>>>>>>> +            new_ifaces = true;
> >>>>>>>>> +        }
> >>>>>>>>> +    } else {
> >>>>>>>>> +        HMAPX_FOR_EACH_SAFE (node,
> >>>>> &mgr->ifaces_per_state[OIF_CLAIMED])
> >>>>>>>> {
> >>>>>>>>> +            struct ovs_iface *iface = node->data;
> >>>>>>>>> +            static struct vlog_rate_limit rl =
> >>>>> VLOG_RATE_LIMIT_INIT(5,
> >>>>>>>> 1);
> >>>>>>>>> +            VLOG_INFO_RL(&rl,
> >>>>>>>>> +                         "Not updating pb chassis for %s now as "
> >>>>>>>>> +                         "sb is readonly", iface->id);
> >>>>>>>>> +        }
> >>>>>>>>>      }
> >>>>>>>>>
> >>>>>>>>>      /* Request a seqno update when the flows for new interfaces
> >>> have
> >>>>>>>> been
> >>>>>>>>> @@ -403,7 +543,7 @@ if_status_mgr_update_bindings(struct
> >>> if_status_mgr
> >>>>>>>> *mgr,
> >>>>>>>>>      struct hmapx_node *node;
> >>>>>>>>>
> >>>>>>>>>      /* Notify the binding module to set "down" all bindings that
> >>> are
> >>>>>>>> still
> >>>>>>>>> -     * in the process of being installed in OVS, i.e., are not yet
> >>>>>>>> instsalled.
> >>>>>>>>> +     * in the process of being installed in OVS, i.e., are not yet
> >>>>>>>> installed.
> >>>>>>>>>       */
> >>>>>>>>>      HMAPX_FOR_EACH (node,
> >>> &mgr->ifaces_per_state[OIF_INSTALL_FLOWS])
> >>>>> {
> >>>>>>>>>          struct ovs_iface *iface = node->data;
> >>>>>>>>> diff --git a/controller/if-status.h b/controller/if-status.h
> >>>>>>>>> index bb8a3950d..f9b05d30d 100644
> >>>>>>>>> --- a/controller/if-status.h
> >>>>>>>>> +++ b/controller/if-status.h
> >>>>>>>>> @@ -27,15 +27,27 @@ struct if_status_mgr
> >>> *if_status_mgr_create(void);
> >>>>>>>>>  void if_status_mgr_clear(struct if_status_mgr *);
> >>>>>>>>>  void if_status_mgr_destroy(struct if_status_mgr *);
> >>>>>>>>>
> >>>>>>>>> -void if_status_mgr_claim_iface(struct if_status_mgr *, const char
> >>>>>>>> *iface_id);
> >>>>>>>>> +void if_status_mgr_claim_iface(struct if_status_mgr *,
> >>>>>>>>> +                               const struct sbrec_port_binding
> >>> *pb,
> >>>>>>>>> +                               const struct sbrec_chassis
> >>>>> *chassis_rec,
> >>>>>>>>> +                               bool sb_readonly);
> >>>>>>>>>  void if_status_mgr_release_iface(struct if_status_mgr *, const
> >>> char
> >>>>>>>> *iface_id);
> >>>>>>>>>  void if_status_mgr_delete_iface(struct if_status_mgr *, const char
> >>>>>>>> *iface_id);
> >>>>>>>>>
> >>>>>>>>> -void if_status_mgr_update(struct if_status_mgr *, struct
> >>>>>>>> local_binding_data *);
> >>>>>>>>> +void if_status_mgr_update(struct if_status_mgr *, struct
> >>>>>>>> local_binding_data *,
> >>>>>>>>> +                          const struct sbrec_chassis *chassis,
> >>>>>>>>> +                          bool sb_readonly);
> >>>>>>>>>  void if_status_mgr_run(struct if_status_mgr *mgr, struct
> >>>>>>>> local_binding_data *,
> >>>>>>>>>                         const struct sbrec_chassis *,
> >>>>>>>>>                         bool sb_readonly, bool ovs_readonly);
> >>>>>>>>>  void if_status_mgr_get_memory_usage(struct if_status_mgr *mgr,
> >>>>>>>>>                                      struct simap *usage);
> >>>>>>>>> +bool if_status_mgr_iface_is_present(struct if_status_mgr *mgr,
> >>>>>>>>> +                                    const char *iface_id);
> >>>>>>>>> +bool if_status_handle_claims(struct if_status_mgr *mgr,
> >>>>>>>>> +                             struct local_binding_data
> >>> *binding_data,
> >>>>>>>>> +                             const struct sbrec_chassis
> >>> *chassis_rec,
> >>>>>>>>> +                             struct hmap *tracked_datapath,
> >>>>>>>>> +                             bool sb_readonly);
> >>>>>>>>>
> >>>>>>>>>  # endif /* controller/if-status.h */
> >>>>>>>>> diff --git a/controller/ovn-controller.c
> >>> b/controller/ovn-controller.c
> >>>>>>>>> index 69615308e..3947baf03 100644
> >>>>>>>>> --- a/controller/ovn-controller.c
> >>>>>>>>> +++ b/controller/ovn-controller.c
> >>>>>>>>> @@ -1464,6 +1464,73 @@ en_runtime_data_run(struct engine_node
> >>> *node,
> >>>>>>>> void *data)
> >>>>>>>>>      engine_set_node_state(node, EN_UPDATED);
> >>>>>>>>>  }
> >>>>>>>>>
> >>>>>>>>> +struct ed_type_sb_ro {
> >>>>>>>>> +    bool sb_readonly;
> >>>>>>>>> +};
> >>>>>>>>> +
> >>>>>>>>> +static void *
> >>>>>>>>> +en_sb_ro_init(struct engine_node *node OVS_UNUSED,
> >>>>>>>>> +              struct engine_arg *arg OVS_UNUSED)
> >>>>>>>>> +{
> >>>>>>>>> +    struct ed_type_sb_ro *data = xzalloc(sizeof *data);
> >>>>>>>>> +    return data;
> >>>>>>>>> +}
> >>>>>>>>> +
> >>>>>>>>> +static void
> >>>>>>>>> +en_sb_ro_run(struct engine_node *node, void *data)
> >>>>>>>>> +{
> >>>>>>>>> +    struct ed_type_sb_ro *sb_ro_data = data;
> >>>>>>>>> +    bool sb_readonly = !engine_get_context()->ovnsb_idl_txn;
> >>>>>>>>> +    if (sb_ro_data->sb_readonly != sb_readonly) {
> >>>>>>>>> +        sb_ro_data->sb_readonly = sb_readonly;
> >>>>>>>>> +        if (!sb_ro_data->sb_readonly) {
> >>>>>>>>> +            engine_set_node_state(node, EN_UPDATED);
> >>>>>>>>> +        }
> >>>>>>>>> +    }
> >>>>>>>>> +}
> >>>>>>>>> +
> >>>>>>>>> +static void
> >>>>>>>>> +en_sb_ro_cleanup(void *data OVS_UNUSED)
> >>>>>>>>> +{
> >>>>>>>>> +}
> >>>>>>>>> +
> >>>>>>>>> +static bool
> >>>>>>>>> +runtime_data_sb_ro_handler(struct engine_node *node, void *data)
> >>>>>>>>> +{
> >>>>>>>>> +    const struct sbrec_chassis *chassis = NULL;
> >>>>>>>>> +
> >>>>>>>>> +    struct ovsrec_open_vswitch_table *ovs_table =
> >>>>>>>>> +        (struct ovsrec_open_vswitch_table *)EN_OVSDB_GET(
> >>>>>>>>> +            engine_get_input("OVS_open_vswitch", node));
> >>>>>>>>> +
> >>>>>>>>> +    const char *chassis_id = get_ovs_chassis_id(ovs_table);
> >>>>>>>>> +
> >>>>>>>>> +    struct ovsdb_idl_index *sbrec_chassis_by_name =
> >>>>>>>>> +        engine_ovsdb_node_get_index(
> >>>>>>>>> +                engine_get_input("SB_chassis", node),
> >>>>>>>>> +                "name");
> >>>>>>>>> +
> >>>>>>>>> +    if (chassis_id) {
> >>>>>>>>> +        chassis = chassis_lookup_by_name(sbrec_chassis_by_name,
> >>>>>>>> chassis_id);
> >>>>>>>>> +    }
> >>>>>>>>> +    if (chassis) {
> >>>>>>>>> +        struct ed_type_runtime_data *rt_data = data;
> >>>>>>>>> +        bool sb_readonly = !engine_get_context()->ovnsb_idl_txn;
> >>>>>>>>> +        struct controller_engine_ctx *ctrl_ctx =
> >>>>>>>>> +            engine_get_context()->client_ctx;
> >>>>>>>>> +
> >>>>>>>>> +        if (if_status_handle_claims(ctrl_ctx->if_mgr,
> >>>>>>>>> +                                    &rt_data->lbinding_data,
> >>>>>>>>> +                                    chassis,
> >>>>>>>>> +                                    &rt_data->tracked_dp_bindings,
> >>>>>>>>> +                                    sb_readonly)) {
> >>>>>>>>> +            engine_set_node_state(node, EN_UPDATED);
> >>>>>>>>> +            rt_data->tracked = true;
> >>>>>>>>> +        }
> >>>>>>>>> +    }
> >>>>>>>>> +    return true;
> >>>>>>>>> +}
> >>>>>>>>> +
> >>>>>>>>>  static bool
> >>>>>>>>>  runtime_data_ovs_interface_shadow_handler(struct engine_node
> >>> *node,
> >>>>>>>> void *data)
> >>>>>>>>>  {
> >>>>>>>>> @@ -3528,6 +3595,7 @@ main(int argc, char *argv[])
> >>>>>>>>>      stopwatch_create(VIF_PLUG_RUN_STOPWATCH_NAME, SW_MS);
> >>>>>>>>>
> >>>>>>>>>      /* Define inc-proc-engine nodes. */
> >>>>>>>>> +    ENGINE_NODE(sb_ro, "sb_ro");
> >>>>>>>>>      ENGINE_NODE_WITH_CLEAR_TRACK_DATA_IS_VALID(ct_zones,
> >>> "ct_zones");
> >>>>>>>>>      ENGINE_NODE_WITH_CLEAR_TRACK_DATA(ovs_interface_shadow,
> >>>>>>>>>                                        "ovs_interface_shadow");
> >>>>>>>>> @@ -3664,6 +3732,7 @@ main(int argc, char *argv[])
> >>>>>>>>>      engine_add_input(&en_ovs_interface_shadow, &en_ovs_interface,
> >>>>>>>>>                       ovs_interface_shadow_ovs_interface_handler);
> >>>>>>>>>
> >>>>>>>>> +    engine_add_input(&en_runtime_data, &en_sb_ro,
> >>>>>>>> runtime_data_sb_ro_handler);
> >>>>>>>>>      engine_add_input(&en_runtime_data, &en_ofctrl_is_connected,
> >>>>> NULL);
> >>>>>>>>>
> >>>>>>>>>      engine_add_input(&en_runtime_data, &en_ovs_open_vswitch,
> >>> NULL);
> >>>>>>>>> @@ -4098,7 +4167,8 @@ main(int argc, char *argv[])
> >>>>>>>>>                          runtime_data ?
> >>> &runtime_data->lbinding_data :
> >>>>>>>> NULL;
> >>>>>>>>>
> >>>>>  stopwatch_start(IF_STATUS_MGR_UPDATE_STOPWATCH_NAME,
> >>>>>>>>>                                      time_msec());
> >>>>>>>>> -                    if_status_mgr_update(if_mgr, binding_data);
> >>>>>>>>> +                    if_status_mgr_update(if_mgr, binding_data,
> >>>>> chassis,
> >>>>>>>>> +                                         !ovnsb_idl_txn);
> >>>>>>>>>
> >>>>>  stopwatch_stop(IF_STATUS_MGR_UPDATE_STOPWATCH_NAME,
> >>>>>>>>>                                     time_msec());
> >>>>>>>>>
> >>>>>>>>> diff --git a/tests/ovn-macros.at b/tests/ovn-macros.at
> >>>>>>>>> index 335f9158c..8fd6ae6f7 100644
> >>>>>>>>> --- a/tests/ovn-macros.at
> >>>>>>>>> +++ b/tests/ovn-macros.at
> >>>>>>>>> @@ -759,3 +759,15 @@
> >>>>> m4_define([OVN_FOR_EACH_NORTHD_WITHOUT_DP_GROUPS],
> >>>>>>>>>       [m4_foreach([NORTHD_USE_DP_GROUPS], [no],
> >>>>>>>>>         [m4_foreach([NORTHD_USE_PARALLELIZATION], [yes, no], [$1
> >>>>>>>>>  ])])])])
> >>>>>>>>> +
> >>>>>>>>> +# OVN_NBCTL(NBCTL_COMMAND) adds NBCTL_COMMAND to list of commands
> >>> to
> >>>>> be
> >>>>>>>> run by RUN_OVN_NBCTL().
> >>>>>>>>> +m4_define([OVN_NBCTL], [
> >>>>>>>>> +    command="${command} -- $1"
> >>>>>>>>> +])
> >>>>>>>>> +
> >>>>>>>>> +# RUN_OVN_NBCTL() executes list of commands built by the
> >>> OVN_NBCTL()
> >>>>>>>> macro.
> >>>>>>>>> +m4_define([RUN_OVN_NBCTL], [
> >>>>>>>>> +    check ovn-nbctl ${command}
> >>>>>>>>> +    unset command
> >>>>>>>>> +])
> >>>>>>>>> +
> >>>>>>>>> diff --git a/tests/ovn.at b/tests/ovn.at
> >>>>>>>>> index bfaa41962..94d16bac9 100644
> >>>>>>>>> --- a/tests/ovn.at
> >>>>>>>>> +++ b/tests/ovn.at
> >>>>>>>>> @@ -102,6 +102,18 @@ m4_divert_text([PREPARE_TESTS],
> >>>>>>>>>           test 1 -le $(as $hv1 ovs-ofctl dump-flows br-int | grep
> >>> -c
> >>>>>>>> "output:$ofport")
> >>>>>>>>>       ])
> >>>>>>>>>     }
> >>>>>>>>> +
> >>>>>>>>> +   ovn_wait_remote_input_flows () {
> >>>>>>>>> +     hv1=$1
> >>>>>>>>> +     hv2=$2
> >>>>>>>>> +     echo "$3: waiting for flows for remote input on $hv1"
> >>>>>>>>> +     # Wait for a flow outputing  to remote input
> >>>>>>>>> +     OVS_WAIT_UNTIL([
> >>>>>>>>> +         ofport=$(as $hv1 ovs-vsctl --bare --columns ofport find
> >>>>>>>> Interface name=ovn-${hv2}-0)
> >>>>>>>>> +         echo "tunnel port=$ofport"
> >>>>>>>>> +         test 1 -le $(as $hv1 ovs-ofctl dump-flows br-int | grep
> >>> -c
> >>>>>>>> "in_port=$ofport")
> >>>>>>>>> +     ])
> >>>>>>>>> +   }
> >>>>>>>>>  ])
> >>>>>>>>>
> >>>>>>>>>  m4_define([OVN_CHECK_PACKETS],
> >>>>>>>>> @@ -127,6 +139,8 @@ m4_define([OVN_WAIT_PATCH_PORT_FLOWS],
> >>>>>>>>>  m4_define([OVN_WAIT_REMOTE_OUTPUT_FLOWS],
> >>>>>>>>>    [ovn_wait_remote_output_flows "$1" "$2" "__file__:__line__"])
> >>>>>>>>>
> >>>>>>>>> +m4_define([OVN_WAIT_REMOTE_INPUT_FLOWS],
> >>>>>>>>> +  [ovn_wait_remote_input_flows "$1" "$2" "__file__:__line__"])
> >>>>>>>>>
> >>>>>>>>>  AT_BANNER([OVN components])
> >>>>>>>>>
> >>>>>>>>> @@ -14056,6 +14070,11 @@ wait_column "$hv1_uuid" Port_Binding
> >>>>>>>> requested_chassis logical_port=lsp0
> >>>>>>>>>  wait_column "$hv2_uuid" Port_Binding additional_chassis
> >>>>>>>> logical_port=lsp0
> >>>>>>>>>  wait_column "$hv2_uuid" Port_Binding requested_additional_chassis
> >>>>>>>> logical_port=lsp0
> >>>>>>>>>
> >>>>>>>>> +# Check ovn-installed updated for main chassis
> >>>>>>>>> +wait_for_ports_up
> >>>>>>>>> +OVS_WAIT_UNTIL([test `as hv1 ovs-vsctl get Interface lsp0
> >>>>>>>> external_ids:ovn-installed` = '"true"'])
> >>>>>>>>> +OVS_WAIT_UNTIL([test x`as hv2 ovs-vsctl get Interface lsp0
> >>>>>>>> external_ids:ovn-installed` = x])
> >>>>>>>>> +
> >>>>>>>>>  # Check that setting iface:encap-ip populates
> >>>>>>>> Port_Binding:additional_encap
> >>>>>>>>>  wait_row_count Encap 2 chassis_name=hv1
> >>>>>>>>>  wait_row_count Encap 2 chassis_name=hv2
> >>>>>>>>> @@ -14081,6 +14100,11 @@ wait_column "$hv2_uuid" Port_Binding
> >>>>>>>> requested_chassis logical_port=lsp0
> >>>>>>>>>  wait_column "" Port_Binding additional_chassis logical_port=lsp0
> >>>>>>>>>  wait_column "" Port_Binding requested_additional_chassis
> >>>>>>>> logical_port=lsp0
> >>>>>>>>>
> >>>>>>>>> +# Check ovn-installed updated for main chassis and not for other
> >>>>> chassis
> >>>>>>>>> +wait_for_ports_up
> >>>>>>>>> +OVS_WAIT_UNTIL([test `as hv2 ovs-vsctl get Interface lsp0
> >>>>>>>> external_ids:ovn-installed` = '"true"'])
> >>>>>>>>> +OVS_WAIT_UNTIL([test x`as hv1 ovs-vsctl get Interface lsp0
> >>>>>>>> external_ids:ovn-installed` = x])
> >>>>>>>>> +
> >>>>>>>>>  # Check that additional_encap is cleared
> >>>>>>>>>  wait_column "" Port_Binding additional_encap logical_port=lsp0
> >>>>>>>>>
> >>>>>>>>> @@ -15370,7 +15394,10 @@ AT_CHECK([as hv1 ovs-ofctl dump-flows
> >>> br-int
> >>>>>>>> table=65 | grep actions=output:1],
> >>>>>>>>>  echo "verifying that lsp0 binding moves when requested-chassis is
> >>>>>>>> changed"
> >>>>>>>>>
> >>>>>>>>>  ovn-nbctl lsp-set-options lsp0 requested-chassis=hv2
> >>>>>>>>> -OVS_WAIT_UNTIL([test 1 = $(grep -c "Releasing lport lsp0 from this
> >>>>>>>> chassis" hv1/ovn-controller.log)])
> >>>>>>>>> +
> >>>>>>>>> +# We might see multiple "Releasing lport ...", when sb is read
> >>> only
> >>>>>>>>> +OVS_WAIT_UNTIL([test 1 -le $(grep -c "Releasing lport lsp0 from
> >>> this
> >>>>>>>> chassis" hv1/ovn-controller.log)])
> >>>>>>>>> +
> >>>>>>>>>  wait_column "$hv2_uuid" Port_Binding chassis logical_port=lsp0
> >>>>>>>>>
> >>>>>>>>>  # (6) Chassis hv2 should add flows and hv1 should not.
> >>>>>>>>> @@ -15416,7 +15443,7 @@ AT_CHECK([as hv1 ovs-ofctl dump-flows
> >>> br-int
> >>>>>>>> table=0 | grep in_port=1], [0], [ig
> >>>>>>>>>  AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=65 | grep
> >>>>>>>> actions=output:1], [0], [ignore])
> >>>>>>>>>
> >>>>>>>>>  check ovn-nbctl --wait=hv lsp-set-options lsp0
> >>>>>>>> requested-chassis=non-existant-chassis
> >>>>>>>>> -OVS_WAIT_UNTIL([test 1 = $(grep -c "Releasing lport lsp0 from this
> >>>>>>>> chassis" hv1/ovn-controller.log)])
> >>>>>>>>> +OVS_WAIT_UNTIL([test 1 -le $(grep -c "Releasing lport lsp0 from
> >>> this
> >>>>>>>> chassis" hv1/ovn-controller.log)])
> >>>>>>>>>  check ovn-nbctl --wait=hv sync
> >>>>>>>>>  wait_column '' Port_Binding chasssi logical_port=lsp0
> >>>>>>>>>  AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=0 | grep
> >>>>> in_port=1],
> >>>>>>>> [1], [])
> >>>>>>>>> @@ -32418,3 +32445,119 @@ AT_CHECK([test $(ovn-sbctl list fdb |
> >>> grep
> >>>>> -c
> >>>>>>>> "00:00:00:00:10:30") = 0])
> >>>>>>>>>  OVN_CLEANUP([hv1])
> >>>>>>>>>  AT_CLEANUP
> >>>>>>>>>  ])
> >>>>>>>>> +
> >>>>>>>>> +OVN_FOR_EACH_NORTHD([
> >>>>>>>>> +AT_SETUP([recomputes])
> >>>>>>>>> +ovn_start
> >>>>>>>>> +
> >>>>>>>>> +n_hv=4
> >>>>>>>>> +
> >>>>>>>>> +# Add chassis
> >>>>>>>>> +net_add n1
> >>>>>>>>> +for i in $(seq 1 $n_hv); do
> >>>>>>>>> +    sim_add hv$i
> >>>>>>>>> +    as hv$i
> >>>>>>>>> +    check ovs-vsctl add-br br-phys
> >>>>>>>>> +    ovs-vsctl set open .
> >>>>> external-ids:ovn-bridge-mappings=phys:br-phys
> >>>>>>>>> +    ovn_attach n1 br-phys 192.168.0.$i 24 geneve
> >>>>>>>>> +done
> >>>>>>>>> +
> >>>>>>>>> +add_switch_ports() {
> >>>>>>>>> +    start_port=$1
> >>>>>>>>> +    end_port=$2
> >>>>>>>>> +    nb_hv=$3
> >>>>>>>>> +    bulk_size=$4
> >>>>>>>>> +    for ((i=start_port; i<end_port; )) do
> >>>>>>>>> +        start_bulk=$i
> >>>>>>>>> +        for hv in $(seq 1 $nb_hv); do
> >>>>>>>>> +            end_bulk=$((start_bulk+bulk_size-1))
> >>>>>>>>> +            for port in $(seq $start_bulk $end_bulk); do
> >>>>>>>>> +                logical_switch_port=lsp${port}
> >>>>>>>>> +                OVN_NBCTL(lsp-add ls1 $logical_switch_port)
> >>>>>>>>> +                OVN_NBCTL(lsp-set-addresses $logical_switch_port
> >>>>>>>> dynamic)
> >>>>>>>>> +            done
> >>>>>>>>> +            start_bulk=$((end_bulk+1))
> >>>>>>>>> +        done
> >>>>>>>>> +        RUN_OVN_NBCTL()
> >>>>>>>>> +
> >>>>>>>>> +        start_bulk=$i
> >>>>>>>>> +        for hv in $(seq 1 $nb_hv); do
> >>>>>>>>> +            end_bulk=$((start_bulk+bulk_size-1))
> >>>>>>>>> +            for port in $(seq $start_bulk $end_bulk); do
> >>>>>>>>> +                logical_switch_port=lsp${port}
> >>>>>>>>> +                as hv$hv ovs-vsctl \
> >>>>>>>>> +                    --no-wait -- add-port br-int vif${port} \
> >>>>>>>>> +                    -- set Interface vif${port}
> >>>>>>>> external_ids:iface-id=$logical_switch_port
> >>>>>>>>> +            done
> >>>>>>>>> +            start_bulk=$((end_bulk+1))
> >>>>>>>>> +        done
> >>>>>>>>> +        i=$((end_bulk+1))
> >>>>>>>>> +    done
> >>>>>>>>> +}
> >>>>>>>>> +check ovn-nbctl ls-add ls1
> >>>>>>>>> +check ovn-nbctl set Logical_Switch ls1 other_config:subnet=
> >>>>> 10.1.0.0/16
> >>>>>>>>> +check ovn-nbctl set Logical_Switch ls1
> >>>>>>>> other_config:exclude_ips=10.1.255.254
> >>>>>>>>> +
> >>>>>>>>> +check ovn-nbctl lr-add lr1
> >>>>>>>>> +check ovn-nbctl lsp-add ls1 lsp0 -- set Logical_Switch_Port lsp0
> >>>>>>>> type=router options:router-port=lrp0 addresses=dynamic
> >>>>>>>>> +check ovn-nbctl lrp-add lr1 lrp0 "f0:00:00:01:00:01"
> >>> 10.1.255.254/16
> >>>>>>>>> +check ovn-nbctl lr-nat-add lr1 snat 10.2.0.1 10.1.0.0/16
> >>>>>>>>> +
> >>>>>>>>> +lflow_run=0
> >>>>>>>>> +check ovn-nbctl --wait=hv sync
> >>>>>>>>> +
> >>>>>>>>> +# Tunnel ports might not be added (yet) at this point on slow
> >>> system.
> >>>>>>>>> +# Wait for flows related to such ports to ensure those ports have
> >>>>> been
> >>>>>>>> added
> >>>>>>>>> +# before we measure recomputes. Otherwise, ovs_interface handler
> >>>>> might
> >>>>>>>> be run
> >>>>>>>>> +# afterwards for tunnel ports, causing recomputes.
> >>>>>>>>> +for i in $(seq 1 $n_hv); do
> >>>>>>>>> +    for j in $(seq 1 $n_hv); do
> >>>>>>>>> +        if test $i != $j; then
> >>>>>>>>> +            OVN_WAIT_REMOTE_INPUT_FLOWS(["hv$i"],["hv$j"])
> >>>>>>>>> +        fi
> >>>>>>>>> +    done
> >>>>>>>>> +done
> >>>>>>>>> +
> >>>>>>>>> +for i in $(seq 1 $n_hv); do
> >>>>>>>>> +    as hv$i
> >>>>>>>>> +    lflow_run1=$(ovn-appctl -t ovn-controller
> >>> coverage/read-counter
> >>>>>>>> lflow_run)
> >>>>>>>>> +    lflow_run=`expr $lflow_run1 + $lflow_run`
> >>>>>>>>> +done
> >>>>>>>>> +
> >>>>>>>>> +add_switch_ports 1 1000 $n_hv 5
> >>>>>>>>> +
> >>>>>>>>> +wait_for_ports_up
> >>>>>>>>> +check ovn-nbctl --wait=hv sync
> >>>>>>>>> +
> >>>>>>>>> +for i in $(seq 1 $n_hv); do
> >>>>>>>>> +    pid=$(cat hv${i}/ovn-controller.pid)
> >>>>>>>>> +    u=$((u+$(cat "/proc/$pid/stat" | awk '{print $14}')))
> >>>>>>>>> +    s=$((s+$(cat "/proc/$pid/stat" | awk '{print $15}')))
> >>>>>>>>> +done
> >>>>>>>>> +
> >>>>>>>>> +n_pid=$(cat northd/ovn-northd.pid)
> >>>>>>>>> +n_u=$(cat "/proc/$pid/stat" | awk '{print $14}')
> >>>>>>>>> +n_s=$(cat "/proc/$pid/stat" | awk '{print $15}')
> >>>>>>>>> +
> >>>>>>>>> +echo "Total Northd User Time: $n_u"
> >>>>>>>>> +echo "Total Northd System Time: $n_s"
> >>>>>>>>> +echo "Total Controller User Time: $u"
> >>>>>>>>> +echo "Total Controller System Time: $s"
> >>>>>>>>> +
> >>>>>>>>> +lflow_run_end=0
> >>>>>>>>> +for i in $(seq 1 $n_hv); do
> >>>>>>>>> +    as hv$i
> >>>>>>>>> +    lflow_run1=$(ovn-appctl -t ovn-controller
> >>> coverage/read-counter
> >>>>>>>> lflow_run)
> >>>>>>>>> +    lflow_run_end=`expr $lflow_run1 + $lflow_run_end`
> >>>>>>>>> +done
> >>>>>>>>> +n_recomputes=`expr $lflow_run_end - $lflow_run`
> >>>>>>>>> +echo "$n_recomputes recomputes"
> >>>>>>>>> +
> >>>>>>>>> +AT_CHECK([test $lflow_run_end == $lflow_run])
> >>>>>>>>> +
> >>>>>>>>> +for i in $(seq 2 $n_hv); do
> >>>>>>>>> +    OVN_CLEANUP_SBOX([hv$i])
> >>>>>>>>> +done
> >>>>>>>>> +OVN_CLEANUP([hv1])
> >>>>>>>>> +AT_CLEANUP
> >>>>>>>>> +])
> >>>>>>>>> diff --git a/tests/perf-northd.at b/tests/perf-northd.at
> >>>>>>>>> index 74b69e9d4..6ec196b36 100644
> >>>>>>>>> --- a/tests/perf-northd.at
> >>>>>>>>> +++ b/tests/perf-northd.at
> >>>>>>>>> @@ -76,23 +76,6 @@ m4_define([PERF_RECORD_STOP], [
> >>>>>>>>>      PERF_RECORD_STOPWATCH(ovn-northd-loop, ["Short term average"],
> >>>>>>>> [Average (northd-loop in msec)])
> >>>>>>>>>  ])
> >>>>>>>>>
> >>>>>>>>> -# OVN_NBCTL([NBCTL_COMMAND])
> >>>>>>>>> -#
> >>>>>>>>> -# Add NBCTL_COMMAND to list of commands to be run by
> >>> RUN_OVN_NBCTL().
> >>>>>>>>> -#
> >>>>>>>>> -m4_define([OVN_NBCTL], [
> >>>>>>>>> -    command="${command} -- $1"
> >>>>>>>>> -])
> >>>>>>>>> -
> >>>>>>>>> -# RUN_OVN_NBCTL()
> >>>>>>>>> -#
> >>>>>>>>> -# Execute list of commands built by the OVN_NBCTL() macro.
> >>>>>>>>> -#
> >>>>>>>>> -m4_define([RUN_OVN_NBCTL], [
> >>>>>>>>> -    check ovn-nbctl ${command}
> >>>>>>>>> -    unset command
> >>>>>>>>> -])
> >>>>>>>>> -
> >>>>>>>>>  OVS_START_SHELL_HELPERS
> >>>>>>>>>  generate_subnet () {
> >>>>>>>>>      local a=$(printf %d $(expr $1 / 256 + 10))
> >>>>>>>>> --
> >>>>>>>>> 2.31.1
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
>
Han Zhou July 21, 2022, 6:29 a.m. UTC | #13
On Wed, Jul 20, 2022 at 10:29 AM Numan Siddique <numans@ovn.org> wrote:
>
> On Wed, Jul 13, 2022 at 4:29 AM Dumitru Ceara <dceara@redhat.com> wrote:
> >
> > On 7/13/22 11:08, Dumitru Ceara wrote:
> > > On 7/13/22 09:40, Xavier Simonart wrote:
> > >> Hi Han, Dumitru
> > >>
> > >
> > > Hi Han, Xavier,
> > >
> > > Sorry, I had already replied to the previous message and only then
> > > noticed this one.
> > >
> > >> I think that we should, as much as possible, try to achieve both
> > >> goals:
> > >> - have an accurate ovn-installed
> > >> - do not increase latency in large scale deployments
> > >>
> > >
> > > +1
> > >
> > >> The fact that ovn-installed is sent too early for mc flows is already
> > >> an issue today, independent of this patch.
> > >> Fixing ovn-installed related to mc flows by delaying the state change
> > >> (for all cases, included when no mc groups) might be seen as a
> > >> performance regression.
> > >>
> > >
> > > I think it will, and I'm not sure we can convince the CMS that this is
> > > "just metrics".
> > >
> > >> I agree that we should fix this ovn-installed issue, but it is not a
> > >> regression added by this patch. We should enter a BZ for it.
> > >> Per my understanding, the mc flows are updated when the
> > >> SB_multicast_group is seen as updated by ovn-controller, due to its
> > >> references to port binding.
> > >> Other flows related to port binding are installed earlier, i.e. when
> > >> ovn-controller writes port_binding->chassis (i.e. before it receives
> > >> SB confirmation). So, while sending the mc flows earlier than what we
> > >> do today might be more complex, I think it makes some kind of sense
> > >> (we would send all those flows within the same loop).
> > >
> > > I'm inclining towards leaving it as it is today if this is the only
> > > flow
> > > we're missing.  It's a guess without testing things out, but I think
> > > it's for the MC_FLOOD_L2 multicast group which is used only for
> > > forwarding ARP packets originated by OVN routers or destined to a
> > > specific OVN router.  Losing some of those packets is not a big deal.
> > >
> > > But it might be good to confirm that this is the MC group we install
> > > the flow for.
> > >
> >
> > Oh, well, the port is also part of the MC_FLOOD group.  This is however
> > only used for BUM traffic.  So losing some packets here is also not
> > terrible, I think.
> >
>
> When a logical port is claimed,  we process the logical flows related
> to the logical port (i.e with inport == <lport> or outport == <lport>)
> and install
> the corresponding openflows. All the generic logical flows (i.e
> without inport or outport match) would have already been programmed
> (if the datapath already part of local_datapaths).
> These processed logical flows (lflow_handle_flows_for_lport() in
> lflow.c) will be most likely part of the same openflow bundle. And
> once the sequence number
> for this bundle is acknowledged we set "ovn-installed=true".  When CMS
> notices "ovn-installed=true" , I think it can fairly assume that the
> flows for the lport are
> programmed.
>
> I think the only flows pertaining to the logical port  which we would
> be missing are the multicast related flows and the logical flows which
> ovn-northd would generate after the logical port
> is claimed (presently it is the arp responder flows) and I don't think
> we can wait for these logical flows to be programmed by ovn-northd
> before setting "ovn-installed=true".

The missing flows are just a side effect. I am more concerned with the
clarity of the state machine.
To my understanding it would be very clear to define the "CLAIMED" state's
job as claiming the port in SB-DB. If the SB commit fails, the retry should
happen in this state. If we see the update notification (i.e. we see that
PB.chassis matches the desired chassis), we move to the next state,
"INSTALL_FLOWS". Now if we move the state forward without confirming that
PB.chassis is updated in SB, we would need to perform that task (retrying
the SB update) in all the following states. The only benefit we get from
this is that ovn-installed can be set to true a little bit earlier (to save
an SB round trip), at the cost of more complexity (even more so if
ovn-monitor-all needs to be considered in this logic) and less clarity of
the state machine.
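
To make the proposal concrete, here is a minimal sketch of the CLAIMED step
described above. It is not code from the patch: the sketch_* names, the
simplified two-state enum, and the n_updates_sent counter are all
hypothetical, and the real code would call local_binding_set_pb() where the
sketch only counts a request.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified states; the real ones are in if-status.c. */
enum sketch_state { SK_CLAIMED, SK_INSTALL_FLOWS };

struct sketch_iface {
    enum sketch_state state;
    const char *sb_chassis;  /* PB.chassis as last seen from SB, or NULL. */
    int n_updates_sent;      /* SB updates requested so far (illustrative). */
};

/* One main-loop iteration for an interface that is in CLAIMED.
 *
 * - If SB already shows our chassis, the claim is confirmed: move on.
 * - Else, if SB is writable, no transaction is in flight, so (re)send the
 *   update regardless of whether it was requested before.
 * - Else (SB read-only), stay in CLAIMED and wait for the next iteration. */
static void
sketch_claimed_step(struct sketch_iface *iface, const char *our_chassis,
                    bool sb_readonly)
{
    if (iface->state != SK_CLAIMED) {
        return;
    }
    if (iface->sb_chassis && !strcmp(iface->sb_chassis, our_chassis)) {
        iface->state = SK_INSTALL_FLOWS;
    } else if (!sb_readonly) {
        /* Stand-in for local_binding_set_pb() in the real code. */
        iface->n_updates_sent++;
    }
}
```

With this shape, the retry after a failed commit happens only in CLAIMED,
and the later states do not need to know about PB.chassis at all.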

Is it possible to address it with the simpler/clearer approach first and see
if it really causes an obvious performance regression? Then we can consider
the "short-cuts". I am not sure if it is some kind of premature
optimization at this point.

Thanks,
Han
>
> Delaying setting the ovn-installed=true would definitely result in
> latency.  It would not be easy for ovn-controller to keep track of
> openflows already programmed for a logical port.
> In other words,  I don't think ovn-controller can accurately keep
> track of whether all the openflows related to a logical port are
> programmed or not, unless all these flows are grouped in one bundle.
>
> Also since the present ovn main already sets ovn-installed=true a bit
> early i.e. even before the multicast and arp responder flows are
> programmed, I think it is out of this patch's scope to address it.
>
> So I think the patch is fine with me once Xavier addresses (1) i.e
> remove the 'chassis_update_required'.
>
> Thanks
> Numan
>
>
> > > Thanks,
> > > Dumitru
> > >
> > >>
> > >> Thanks
> > >> Xavier
> > >>
> > >>
> > >>
> > >> On Wed, Jul 13, 2022 at 8:28 AM Han Zhou <hzhou@ovn.org> wrote:
> > >>
> > >>>
> > >>>
> > >>> On Tue, Jul 12, 2022 at 12:35 AM Dumitru Ceara <dceara@redhat.com>
wrote:
> > >>>>
> > >>>> On 7/12/22 08:52, Han Zhou wrote:
> > >>>>> On Mon, Jul 11, 2022 at 4:55 AM Dumitru Ceara <dceara@redhat.com>
> > >>> wrote:
> > >>>>>>
> > >>>>>> On 7/11/22 13:31, Xavier Simonart wrote:
> > >>>>>>> Hi Han
> > >>>>>>>
> > >>>>>>> Thanks for your review.
> > >>>>>>>
> > >>>>>>> Let me try to understand your two main concerns and the proper
way to
> > >>>>> fix
> > >>>>>>> it.
> > >>>>>>> 1) We only try once to write pb->chassis. If the commit fails,
> > >>>>> pb->chassis
> > >>>>>>> is not written. As commit fails, we will recompute, but as the
> > >>>>>>> update_required flag is not set anymore, we might end up with no
> > >>>>>>> pb->chassis.
> > >>>>>>> => I'll remove the flag and try to update until it's confirmed.
> > >>>>>
> > >>>>> Thank you!
> > >>>>>
> > >>>>>>> 2) The state machine, and when we move to INSTALL_FLOWS. Serializing the
> > >>>>>>> state machine, by waiting for confirmation to be received before moving to
> > >>>>>>> INSTALL_FLOWS state will delay the ovn-installed compared to today. So I am
> > >>>>>>> (still) trying to see if there is any way to prevent this in some cases.
> > >>>>>>> Would it be correct to do this serialization (wait for pb->chassis update
> > >>>>>>> confirmation) only when using conditional monitoring? When using
> > >>>>>>> monitor-all, as soon as we have written (w/o confirmation) pb->chassis, we
> > >>>>>>> would move to INSTALL_FLOWS. In that loop where we wrote pb->chassis, (all)
> > >>>>>>> the flows should be updated taking into account pb->chassis.
> > >>>>>>
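
The rule proposed in point 2 can be written as a single predicate. This is a
rough sketch of the proposal as worded above, not of what the patch does;
may_enter_install_flows() and its parameters are hypothetical names.

```c
#include <stdbool.h>

/* Hypothetical predicate (illustration only): may a claimed port leave
 * CLAIMED and enter INSTALL_FLOWS under the proposal in point 2?
 *
 * - chassis_confirmed: the SB has notified us that pb->chassis is set.
 * - chassis_written:   we queued the write but have no confirmation yet.
 * - monitor_all:       ovn-monitor-all=true (no conditional monitoring). */
static bool
may_enter_install_flows(bool monitor_all, bool chassis_written,
                        bool chassis_confirmed)
{
    if (chassis_confirmed) {
        return true;       /* Always safe once the SB has confirmed. */
    }
    /* Without confirmation, only the monitor-all case moves on early. */
    return monitor_all && chassis_written;
}
```

With conditional monitoring the predicate stays false until the SB
confirmation arrives, which is the serialization described above.
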
> > >>>>>> I think this approach is OK.  ovn-controllers that don't use conditional
> > >>>>>> monitoring already know the complete SB contents and should be able to
> > >>>>>> install (mostly?) complete sets of openflows that correspond to a given
> > >>>>>> Port_Binding.
> > >>>>>>
> > >>>>>> It's also "CMS-friendly", at least for ovn-kubernetes, which uses
> > >>>>>> ovn-monitor-all=true and only waits for
> > >>>>>> OVS.Interface.external_ids:ovn-installed=true.  And shouldn't impact the
> > >>>>>> others which wait for SB.Port_Binding.up=true.
> > >>>>>>
> > >>>>>
> > >>>>> Xavier and Dumitru, I think we shouldn't introduce the dependency of
> > >>>>> "ovn-monitor-all" setting here.
> > >>>>>
> > >>>>> First of all, ovn-installed is a flag for CMS to understand that all the
> > >>>>> flows related to the port-binding are installed. If we set the flag before
> > >>>>> it is truly completed, it is a bug, and it is possible that the flag is set
> > >>>>> but some traffic doesn't work.
> > >>>>
> > >>>> I think it's a matter of semantics.  The way I see "ovn-installed=true"
> > >>>> is: all flows that are relevant to the port binding on the local chassis
> > >>>> have been installed.  When we added it it was for the ovn-k8s case;
> > >>>> ovn-k8s used to explicitly check if some openflow tables on the node
> > >>>> where the pod is being brought up contained flows that seemed to
> > >>>> correspond to the pod (e.g., matching on pod mac and IP addresses).
> > >>>>
> > >>> But the purpose of checking the flows (when ovn-installed wasn't
> > >>> available) was to make sure the pod is ready to send/receive traffic. If
> > >>> ovn-installed can provide more accuracy, why not?
> > >>>
> > >>>>> I did a quick test, and at least a flow in (table_id=38, priority=100)
> > >>>>> which is multicast-group related is updated AFTER the SB notification is
> > >>>>> received for the port-binding chassis update.
> > >>>>>
> > >>>>
> > >>>> This sounds like something we should fix, I think.  I don't see any
> > >>>> multicast-group changes conditioned by the port_binding being up=true in
> > >>>> northd.  I might be wrong though.
> > >>>>
> > >>>
> > >>> It is not about "up=true". It is triggered by the port-binding->chassis
> > >>> update. Since multicast-group has a reference to port-binding, a
> > >>> port-binding update triggers multicast-group change handling, which is
> > >>> required because physical flows related to the MC group need to be updated
> > >>> when port-binding->chassis is updated. You may argue that the IDL may be
> > >>> optimized so that the MC group change can be triggered and handled before
> > >>> SB is updated, but I am not sure if the benefit is worth the complexity.
> > >>> Given how OVSDB IDL transaction is designed, I'd always think a DB record
> > >>> is *formally* updated only after the update notification is received from
> > >>> the server, which seems to be safe and clear.
> > >>>
> > >>>>> Secondly, if the change hasn't made it to the SB, all the other nodes would
> > >>>>> not be able to reach the port, which means the workload (pod/VM) cannot
> > >>>>> receive traffic yet at this phase.
> > >>>>>
> > >>>>
> > >>>> Even if the change made it to the SB we have no way of knowing that all
> > >>>> other nodes processed it so we cannot know for sure that traffic can
> > >>>> flow properly end-to-end.  But, like I said above, this doesn't matter
> > >>>> if the semantics of ovn-installed=true are "all locally relevant flows
> > >>>> are installed".
> > >>>>
> > >>> It's true that even if SB is updated it doesn't ensure all the nodes
> > >>> processed it, but I view it this way: at least from the current node's
> > >>> point of view, its job is done and the other nodes are beyond its control.
> > >>> On the other hand, if SB update failed, its job is not done yet. I am not
> > >>> saying this is the only *correct* way, but just the way I am seeing it :).
> > >>>
> > >>>>> So, I think our goal is not to set ovn-installed early, but to set it
> > >>>>> accurately (sometimes it may be ok to be conservative).
> > >>>>>
> > >>>>
> > >>>> Sure, but waiting for the SB port_binding.chassis update might introduce
> > >>>> significant spikes in latency if the SB is compacting (or just busy) at
> > >>>> that moment.
> > >>>>
> > >>>> This might become an issue in large scale deployments as pods will take
> > >>>> longer to be declared "ready".
> > >>>>
> > >>> I understand your concern, but if you think about it, how the pods are
> > >>> *declared* ready doesn't change whether they are actually ready or not. It
> > >>> doesn't make the real flow setup faster or slower.
> > >>> If the CMS really wants to declare it ready earlier, it can just ignore
> > >>> the ovn-installed flag check or flow check. What's the real benefit except
> > >>> for metrics?
> > >>>
> > >>>>> In addition, ovn-monitor-all is not always true even in ovn-k8s. It is
> > >>>>> configurable in ovn-k8s. (In our environment we set it to false, to save
> > >>>>> MEM and CPU for worker nodes, while sacrificing a little for the central SB
> > >>>>> DB.)
> > >>>>>
> > >>>>
> > >>>> Ack.  But for this case specifically, as SB is already busier with
> > >>>> conditional monitoring, I think serializing events in ovn-controller
> > >>>> will create even more visible delays in pod bringup times.
> > >>>>
> > >>>> Thanks,
> > >>>> Dumitru
> > >>>>
> > >>>>> Thanks,
> > >>>>> Han
> > >>>>>>>
> > >>>>>>> Thanks again for your feedback
> > >>>>>>>
> > >>>>>>> Xavier
> > >>>>>>>
> > >>>>>>> On Thu, Jul 7, 2022 at 8:49 AM Han Zhou <hzhou@ovn.org> wrote:
> > >>>>>>>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> On Wed, Jun 22, 2022 at 3:23 AM Xavier Simonart <xsimonar@redhat.com> wrote:
> > >>>>>>>>>
> > >>>>>>>>> When VIF ports are claimed on a chassis, SBDB Port_Binding table is
> > >>>>>>>>> updated.
> > >>>>>>>>> If the SBDB IDL is still read-only ("in transaction") when such an
> > >>>>>>>>> update is required, the update is not possible and recompute is
> > >>>>>>>>> triggered through I+P failure.
> > >>>>>>>>>
> > >>>>>>>>> This situation can happen:
> > >>>>>>>>> - after updating Port_Binding->chassis to SBDB for one port, in a
> > >>>>>>>>>   following iteration, ovn-controller handles
> > >>>>>>>>>   Interface:external_ids:ovn-installed
> > >>>>>>>>>   (for the same port) while SBDB is still read-only.
> > >>>>>>>>> - after updating Port_Binding->chassis to SBDB for one port, in a
> > >>>>>>>>>   following iteration, ovn-controller updates Port_Binding->chassis
> > >>>>>>>>>   for another port, while SBDB is still read-only.
> > >>>>>>>>>
> > >>>>>>>>> This patch prevents the recompute, by having the if-status module
> > >>>>>>>>> update the Port_Binding chassis (if needed) when possible.
> > >>>>>>>>> This does not delay Port_Binding chassis update compared to before
> > >>>>>>>>> this patch.
> > >>>>>>>>> - With the patch, Port_Binding chassis will be updated as soon as
> > >>>>>>>>>   SBDB is again writable, without recompute.
> > >>>>>>>>> - Without the patch, Port_Binding chassis was updated as soon as
> > >>>>>>>>>   SBDB was again writable, through a recompute.
> > >>>>>>>>>
> > >>>>>>>>> As part of this patch, ovn-installed will not be updated for
> > >>>>>>>>> additional chassis;
> > >>>>>>>>> it will only be updated when the migration is completed.
> > >>>>>>>>>
> > >>>>>>>>> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2059253
> > >>>>>>>>> Signed-off-by: Xavier Simonart <xsimonar@redhat.com>
> > >>>>>>>>>
> > >>>>>>>>> ---
> > >>>>>>>>> v2:  - handled Dumitru's comments.
> > >>>>>>>>>      - handled Han's comments, mainly ensure we moved out of CLAIMED
> > >>>>>>>>>        state only after updating pb->chassis to guarantee physical
> > >>>>>>>>>        flows are installed when ovn-installed is updated in OVS.
> > >>>>>>>>>      - slightly reorganize the code to isolate 'notify_up = false'
> > >>>>>>>>>        cases in claim_port (i.e. ports such as virtual ports), with
> > >>>>>>>>>        the idea of making a future patch prevent recomputes when
> > >>>>>>>>>        virtual ports are claimed.
> > >>>>>>>>>      - updated test case to cause more race conditions.
> > >>>>>>>>>      - rebased on origin/main
> > >>>>>>>>>      - note that "additional chassis" as now supported by
> > >>>>>>>>>        "Support LSP:options:requested-chassis as a list" might still
> > >>>>>>>>>        cause recomputes.
> > >>>>>>>>>      - fixed missing flows when Port_Binding chassis was updated by
> > >>>>>>>>>        mgr_update w/o any lflow recalculation.
> > >>>>>>>>> v3:  - handled Dumitru's comments on v2, mainly have runtime_data
> > >>>>>>>>>        handler handling pb_claims when sb becomes writable (instead
> > >>>>>>>>>        of a lflow handler).
> > >>>>>>>>>      - fixed test as it was not checking recomputes on all hv, as
> > >>>>>>>>>        well as a flaky behavior.
> > >>>>>>>>>      - rebased on origin/main.
> > >>>>>>>>> ---
> > >>>>>>>>>  controller/binding.c        | 154 +++++++++++++++++++++----------
> > >>>>>>>>>  controller/binding.h        |  15 +++-
> > >>>>>>>>>  controller/if-status.c      | 174 ++++++++++++++++++++++++++++++++----
> > >>>>>>>>>  controller/if-status.h      |  16 +++-
> > >>>>>>>>>  controller/ovn-controller.c |  72 ++++++++++++++-
> > >>>>>>>>>  tests/ovn-macros.at         |  12 +++
> > >>>>>>>>>  tests/ovn.at                | 147 +++++++++++++++++++++++++++++-
> > >>>>>>>>>  tests/perf-northd.at        |  17 ----
> > >>>>>>>>>  8 files changed, 519 insertions(+), 88 deletions(-)
> > >>>>>>>>>
> > >>>>>>>>> diff --git a/controller/binding.c b/controller/binding.c
> > >>>>>>>>> index 2279570f9..b21577f71 100644
> > >>>>>>>>> --- a/controller/binding.c
> > >>>>>>>>> +++ b/controller/binding.c
> > >>>>>>>>> @@ -644,11 +644,17 @@ local_binding_get_lport_ofport(const
struct
> > >>>>> shash
> > >>>>>>>> *local_bindings,
> > >>>>>>>>>  }
> > >>>>>>>>>
> > >>>>>>>>>  bool
> > >>>>>>>>> -local_binding_is_up(struct shash *local_bindings, const char
> > >>>>> *pb_name)
> > >>>>>>>>> +local_binding_is_up(struct shash *local_bindings, const char
> > >>>>> *pb_name,
> > >>>>>>>>> +                    const struct sbrec_chassis *chassis_rec)
> > >>>>>>>>>  {
> > >>>>>>>>>      struct local_binding *lbinding =
> > >>>>>>>>>          local_binding_find(local_bindings, pb_name);
> > >>>>>>>>>      struct binding_lport *b_lport =
> > >>>>>>>> local_binding_get_primary_lport(lbinding);
> > >>>>>>>>> +
> > >>>>>>>>> +    if (b_lport && b_lport->pb->chassis != chassis_rec) {
> > >>>>>>>>> +        return false;
> > >>>>>>>>> +    }
> > >>>>>>>>> +
> > >>>>>>>>>      if (lbinding && b_lport && lbinding->iface) {
> > >>>>>>>>>          if (b_lport->pb->n_up && !b_lport->pb->up[0]) {
> > >>>>>>>>>              return false;
> > >>>>>>>>> @@ -660,13 +666,23 @@ local_binding_is_up(struct shash
> > >>>>> *local_bindings,
> > >>>>>>>> const char *pb_name)
> > >>>>>>>>>  }
> > >>>>>>>>>
> > >>>>>>>>>  bool
> > >>>>>>>>> -local_binding_is_down(struct shash *local_bindings, const
char
> > >>>>> *pb_name)
> > >>>>>>>>> +local_binding_is_down(struct shash *local_bindings, const
char
> > >>>>> *pb_name,
> > >>>>>>>>> +                      const struct sbrec_chassis
*chassis_rec)
> > >>>>>>>>>  {
> > >>>>>>>>>      struct local_binding *lbinding =
> > >>>>>>>>>          local_binding_find(local_bindings, pb_name);
> > >>>>>>>>>
> > >>>>>>>>>      struct binding_lport *b_lport =
> > >>>>>>>> local_binding_get_primary_lport(lbinding);
> > >>>>>>>>>
> > >>>>>>>>> +    if (b_lport) {
> > >>>>>>>>> +        if (b_lport->pb->chassis == chassis_rec) {
> > >>>>>>>>> +            return false;
> > >>>>>>>>> +        } else if (b_lport->pb->chassis) {
> > >>>>>>>>> +            VLOG_DBG("lport %s already claimed by other
chassis",
> > >>>>>>>>> +                     b_lport->pb->logical_port);
> > >>>>>>>>> +        }
> > >>>>>>>>> +    }
> > >>>>>>>>> +
> > >>>>>>>>>      if (!lbinding) {
> > >>>>>>>>>          return true;
> > >>>>>>>>>      }
> > >>>>>>>>> @@ -884,37 +900,80 @@ get_lport_type_str(enum en_lport_type
> > >>>>> lport_type)
> > >>>>>>>>>      OVS_NOT_REACHED();
> > >>>>>>>>>  }
> > >>>>>>>>>
> > >>>>>>>>> -/* For newly claimed ports, if 'notify_up' is 'false':
> > >>>>>>>>> +void
> > >>>>>>>>> +set_pb_chassis_in_sbrec(const struct sbrec_port_binding *pb,
> > >>>>>>>>> +                        const struct sbrec_chassis
*chassis_rec,
> > >>>>>>>>> +                        bool is_set)
> > >>>>>>>>> +{
> > >>>>>>>>> +    if (pb->chassis != chassis_rec) {
> > >>>>>>>>> +         if (is_set) {
> > >>>>>>>>> +            if (pb->chassis) {
> > >>>>>>>>> +                VLOG_INFO("Changing chassis for lport %s
from %s
> > >>> to
> > >>>>>>>> %s.",
> > >>>>>>>>> +                          pb->logical_port,
pb->chassis->name,
> > >>>>>>>>> +                          chassis_rec->name);
> > >>>>>>>>> +            } else {
> > >>>>>>>>> +                VLOG_INFO("Claiming lport %s for this
chassis.",
> > >>>>>>>>> +                          pb->logical_port);
> > >>>>>>>>> +            }
> > >>>>>>>>> +            for (int i = 0; i < pb->n_mac; i++) {
> > >>>>>>>>> +                VLOG_INFO("%s: Claiming %s",
pb->logical_port,
> > >>>>>>>> pb->mac[i]);
> > >>>>>>>>> +            }
> > >>>>>>>>> +            sbrec_port_binding_set_chassis(pb, chassis_rec);
> > >>>>>>>>> +        }
> > >>>>>>>>> +    } else if (!is_set) {
> > >>>>>>>>> +        sbrec_port_binding_set_chassis(pb, NULL);
> > >>>>>>>>> +    }
> > >>>>>>>>> +}
> > >>>>>>>>> +
> > >>>>>>>>> +void
> > >>>>>>>>> +local_binding_set_pb(struct shash *local_bindings, const char
> > >>>>> *pb_name,
> > >>>>>>>>> +                     const struct sbrec_chassis *chassis_rec,
> > >>>>>>>>> +                     struct hmap *tracked_datapaths, bool
is_set)
> > >>>>>>>>> +{
> > >>>>>>>>> +    struct local_binding *lbinding =
> > >>>>>>>>> +        local_binding_find(local_bindings, pb_name);
> > >>>>>>>>> +    struct binding_lport *b_lport =
> > >>>>>>>> local_binding_get_primary_lport(lbinding);
> > >>>>>>>>> +
> > >>>>>>>>> +    if (b_lport) {
> > >>>>>>>>> +        set_pb_chassis_in_sbrec(b_lport->pb, chassis_rec,
is_set);
> > >>>>>>>>> +        if (tracked_datapaths) {
> > >>>>>>>>> +            update_lport_tracking(b_lport->pb,
tracked_datapaths,
> > >>>>> true);
> > >>>>>>>>> +        }
> > >>>>>>>>> +    }
> > >>>>>>>>> +}
> > >>>>>>>>> +
> > >>>>>>>>> +/* For newly claimed ports:
> > >>>>>>>>>   * - set the 'pb.up' field to true if 'pb' has no
'parent_pb'.
> > >>>>>>>>>   * - set the 'pb.up' field to true if 'parent_pb.up' is
'true'
> > >>> (e.g.,
> > >>>>>>>> for
> > >>>>>>>>>   *   container and virtual ports).
> > >>>>>>>>> - * Otherwise request a notification to be sent when the OVS
flows
> > >>>>>>>>> - * corresponding to 'pb' have been installed.
> > >>>>>>>>> + *
> > >>>>>>>>> + * Returns false if lport is not claimed due to
'sb_readonly'.
> > >>>>>>>>> + * Returns true otherwise.
> > >>>>>>>>>   *
> > >>>>>>>>>   * Note:
> > >>>>>>>>> - *   Updates (directly or through a notification) the
'pb->up'
> > >>> field
> > >>>>>>>> only if
> > >>>>>>>>> - *   it's explicitly set to 'false'.
> > >>>>>>>>> + *   Updates the 'pb->up' field only if it's explicitly set
to
> > >>>>> 'false'.
> > >>>>>>>>>   *   This is to ensure compatibility with older versions of
> > >>>>> ovn-northd.
> > >>>>>>>>>   */
> > >>>>>>>>> -static void
> > >>>>>>>>> +static bool
> > >>>>>>>>>  claimed_lport_set_up(const struct sbrec_port_binding *pb,
> > >>>>>>>>>                       const struct sbrec_port_binding
*parent_pb,
> > >>>>>>>>> -                     const struct sbrec_chassis *chassis_rec,
> > >>>>>>>>> -                     bool notify_up, struct if_status_mgr
*if_mgr)
> > >>>>>>>>> +                     bool sb_readonly)
> > >>>>>>>>>  {
> > >>>>>>>>> -    if (!notify_up) {
> > >>>>>>>>> -        bool up = true;
> > >>>>>>>>> -        if (!parent_pb || (parent_pb->n_up &&
parent_pb->up[0])) {
> > >>>>>>>>> +    /* When notify_up is false in claim_port(), no state is
> > >>> created
> > >>>>>>>>> +     * by if_status_mgr. In such cases, return false (i.e.
trigger
> > >>>>>>>> recompute)
> > >>>>>>>>> +     * if we can't update sb (because it is readonly).
> > >>>>>>>>> +     */
> > >>>>>>>>> +    bool up = true;
> > >>>>>>>>> +    if (!parent_pb || (parent_pb->n_up && parent_pb->up[0]))
{
> > >>>>>>>>> +        if (!sb_readonly) {
> > >>>>>>>>>              if (pb->n_up) {
> > >>>>>>>>>                  sbrec_port_binding_set_up(pb, &up, 1);
> > >>>>>>>>>              }
> > >>>>>>>>> +        } else if (pb->n_up && !pb->up[0]) {
> > >>>>>>>>> +            return false;
> > >>>>>>>>>          }
> > >>>>>>>>> -        return;
> > >>>>>>>>> -    }
> > >>>>>>>>> -
> > >>>>>>>>> -    if (pb->chassis != chassis_rec || (pb->n_up &&
!pb->up[0])) {
> > >>>>>>>>> -        if_status_mgr_claim_iface(if_mgr, pb->logical_port);
> > >>>>>>>>>      }
> > >>>>>>>>> +    return true;
> > >>>>>>>>>  }
> > >>>>>>>>>
> > >>>>>>>>>  typedef void (*set_func)(const struct sbrec_port_binding *pb,
> > >>>>>>>>> @@ -1057,37 +1116,35 @@ claim_lport(const struct
sbrec_port_binding
> > >>>>> *pb,
> > >>>>>>>>>              struct hmap *tracked_datapaths,
> > >>>>>>>>>              struct if_status_mgr *if_mgr)
> > >>>>>>>>>  {
> > >>>>>>>>> -    if (!sb_readonly) {
> > >>>>>>>>> -        claimed_lport_set_up(pb, parent_pb, chassis_rec,
> > >>> notify_up,
> > >>>>>>>> if_mgr);
> > >>>>>>>>> -    }
> > >>>>>>>>> -
> > >>>>>>>>>      enum can_bind can_bind =
> > >>>>>>>> lport_can_bind_on_this_chassis(chassis_rec, pb);
> > >>>>>>>>>      bool update_tracked = false;
> > >>>>>>>>>
> > >>>>>>>>>      if (can_bind == CAN_BIND_AS_MAIN) {
> > >>>>>>>>>          if (pb->chassis != chassis_rec) {
> > >>>>>>>>> -            if (sb_readonly) {
> > >>>>>>>>> -                return false;
> > >>>>>>>>> -            }
> > >>>>>>>>> -
> > >>>>>>>>> -            if (pb->chassis) {
> > >>>>>>>>> -                VLOG_INFO("Changing chassis for lport %s
from %s
> > >>> to
> > >>>>>>>> %s.",
> > >>>>>>>>> -                        pb->logical_port, pb->chassis->name,
> > >>>>>>>>> -                        chassis_rec->name);
> > >>>>>>>>> -            } else {
> > >>>>>>>>> -                VLOG_INFO("Claiming lport %s for this
chassis.",
> > >>>>>>>>> -                          pb->logical_port);
> > >>>>>>>>> -            }
> > >>>>>>>>> -            for (size_t i = 0; i < pb->n_mac; i++) {
> > >>>>>>>>> -                VLOG_INFO("%s: Claiming %s",
pb->logical_port,
> > >>>>>>>> pb->mac[i]);
> > >>>>>>>>> -            }
> > >>>>>>>>> -
> > >>>>>>>>> -            sbrec_port_binding_set_chassis(pb, chassis_rec);
> > >>>>>>>>>              if (is_additional_chassis(pb, chassis_rec)) {
> > >>>>>>>>> +                if (sb_readonly) {
> > >>>>>>>>> +                    return false;
> > >>>>>>>>> +                }
> > >>>>>>>>>                  remove_additional_chassis(pb, chassis_rec);
> > >>>>>>>>>              }
> > >>>>>>>>>              update_tracked = true;
> > >>>>>>>>>          }
> > >>>>>>>>> +        if (!notify_up) {
> > >>>>>>>>> +            if (!claimed_lport_set_up(pb, parent_pb,
> > >>> sb_readonly)) {
> > >>>>>>>>> +                return false;
> > >>>>>>>>> +            }
> > >>>>>>>>> +            if (pb->chassis != chassis_rec) {
> > >>>>>>>>> +                if (sb_readonly) {
> > >>>>>>>>> +                    return false;
> > >>>>>>>>> +                }
> > >>>>>>>>> +                set_pb_chassis_in_sbrec(pb, chassis_rec,
true);
> > >>>>>>>>> +            }
> > >>>>>>>>> +        } else {
> > >>>>>>>>> +            if ((pb->chassis != chassis_rec) || (pb->n_up &&
> > >>>>>>>> !pb->up[0])) {
> > >>>>>>>>> +                if_status_mgr_claim_iface(if_mgr, pb,
chassis_rec,
> > >>>>>>>>> +                                          sb_readonly);
> > >>>>>>>>> +            }
> > >>>>>>>>> +        }
> > >>>>>>>>>      } else if (can_bind == CAN_BIND_AS_ADDITIONAL) {
> > >>>>>>>>>          if (!is_additional_chassis(pb, chassis_rec)) {
> > >>>>>>>>>              if (sb_readonly) {
> > >>>>>>>>> @@ -1132,7 +1189,8 @@ claim_lport(const struct
sbrec_port_binding
> > >>> *pb,
> > >>>>>>>>>   */
> > >>>>>>>>>  static bool
> > >>>>>>>>>  release_lport_main_chassis(const struct sbrec_port_binding
*pb,
> > >>>>>>>>> -                           bool sb_readonly)
> > >>>>>>>>> +                           bool sb_readonly,
> > >>>>>>>>> +                           struct if_status_mgr *if_mgr)
> > >>>>>>>>>  {
> > >>>>>>>>>      if (pb->encap) {
> > >>>>>>>>>          if (sb_readonly) {
> > >>>>>>>>> @@ -1141,11 +1199,13 @@ release_lport_main_chassis(const
struct
> > >>>>>>>> sbrec_port_binding *pb,
> > >>>>>>>>>          sbrec_port_binding_set_encap(pb, NULL);
> > >>>>>>>>>      }
> > >>>>>>>>>
> > >>>>>>>>> +    /* If sb readonly, pb->chassis unset through if-status if
> > >>>>> present.
> > >>>>>>>> */
> > >>>>>>>>>      if (pb->chassis) {
> > >>>>>>>>> -        if (sb_readonly) {
> > >>>>>>>>> +        if (!sb_readonly) {
> > >>>>>>>>> +            sbrec_port_binding_set_chassis(pb, NULL);
> > >>>>>>>>> +        } else if (!if_status_mgr_iface_is_present(if_mgr,
> > >>>>>>>> pb->logical_port)) {
> > >>>>>>>>>              return false;
> > >>>>>>>>>          }
> > >>>>>>>>> -        sbrec_port_binding_set_chassis(pb, NULL);
> > >>>>>>>>>      }
> > >>>>>>>>>
> > >>>>>>>>>      if (pb->virtual_parent) {
> > >>>>>>>>> @@ -1155,7 +1215,8 @@ release_lport_main_chassis(const struct
> > >>>>>>>> sbrec_port_binding *pb,
> > >>>>>>>>>          sbrec_port_binding_set_virtual_parent(pb, NULL);
> > >>>>>>>>>      }
> > >>>>>>>>>
> > >>>>>>>>> -    VLOG_INFO("Releasing lport %s from this chassis.",
> > >>>>>>>> pb->logical_port);
> > >>>>>>>>> +    VLOG_INFO("Releasing lport %s from this chassis
> > >>>>> (sb_readonly=%d)",
> > >>>>>>>>> +              pb->logical_port, sb_readonly);
> > >>>>>>>>>      return true;
> > >>>>>>>>>  }
> > >>>>>>>>>
> > >>>>>>>>> @@ -1189,7 +1250,7 @@ release_lport(const struct
sbrec_port_binding
> > >>>>> *pb,
> > >>>>>>>>>                struct hmap *tracked_datapaths, struct
if_status_mgr
> > >>>>>>>> *if_mgr)
> > >>>>>>>>>  {
> > >>>>>>>>>      if (pb->chassis == chassis_rec) {
> > >>>>>>>>> -        if (!release_lport_main_chassis(pb, sb_readonly)) {
> > >>>>>>>>> +        if (!release_lport_main_chassis(pb, sb_readonly,
if_mgr))
> > >>> {
> > >>>>>>>>>              return false;
> > >>>>>>>>>          }
> > >>>>>>>>>      } else if (is_additional_chassis(pb, chassis_rec)) {
> > >>>>>>>>> @@ -1271,7 +1332,7 @@ consider_vif_lport_(const struct
> > >>>>>>>> sbrec_port_binding *pb,
> > >>>>>>>>>                               b_lport->lbinding->iface,
> > >>>>>>>>>                               !b_ctx_in->ovnsb_idl_txn,
> > >>>>>>>>>                               !parent_pb,
> > >>>>> b_ctx_out->tracked_dp_bindings,
> > >>>>>>>>> -                             b_ctx_out->if_mgr)){
> > >>>>>>>>> +                             b_ctx_out->if_mgr)) {
> > >>>>>>>>>                  return false;
> > >>>>>>>>>              }
> > >>>>>>>>>
> > >>>>>>>>> @@ -1527,7 +1588,8 @@ consider_localport(const struct
> > >>>>> sbrec_port_binding
> > >>>>>>>> *pb,
> > >>>>>>>>>      enum can_bind can_bind = lport_can_bind_on_this_chassis(
> > >>>>>>>>>          b_ctx_in->chassis_rec, pb);
> > >>>>>>>>>      if (can_bind == CAN_BIND_AS_MAIN) {
> > >>>>>>>>> -        if (!release_lport_main_chassis(pb,
> > >>>>> !b_ctx_in->ovnsb_idl_txn)) {
> > >>>>>>>>> +        if (!release_lport_main_chassis(pb,
> > >>> !b_ctx_in->ovnsb_idl_txn,
> > >>>>>>>>> +            b_ctx_out->if_mgr)) {
> > >>>>>>>>>              return false;
> > >>>>>>>>>          }
> > >>>>>>>>>      } else if (can_bind == CAN_BIND_AS_ADDITIONAL) {
> > >>>>>>>>> diff --git a/controller/binding.h b/controller/binding.h
> > >>>>>>>>> index 1fed06674..d20659b0b 100644
> > >>>>>>>>> --- a/controller/binding.h
> > >>>>>>>>> +++ b/controller/binding.h
> > >>>>>>>>> @@ -151,8 +151,10 @@ const struct sbrec_port_binding
> > >>>>>>>> *local_binding_get_primary_pb(
> > >>>>>>>>>  ofp_port_t local_binding_get_lport_ofport(const struct shash
> > >>>>>>>> *local_bindings,
> > >>>>>>>>>                                            const char
*pb_name);
> > >>>>>>>>>
> > >>>>>>>>> -bool local_binding_is_up(struct shash *local_bindings, const
char
> > >>>>>>>> *pb_name);
> > >>>>>>>>> -bool local_binding_is_down(struct shash *local_bindings,
const
> > >>> char
> > >>>>>>>> *pb_name);
> > >>>>>>>>> +bool local_binding_is_up(struct shash *local_bindings, const
char
> > >>>>>>>> *pb_name,
> > >>>>>>>>> +                         const struct sbrec_chassis *);
> > >>>>>>>>> +bool local_binding_is_down(struct shash *local_bindings,
const
> > >>> char
> > >>>>>>>> *pb_name,
> > >>>>>>>>> +                           const struct sbrec_chassis *);
> > >>>>>>>>>  void local_binding_set_up(struct shash *local_bindings,
const char
> > >>>>>>>> *pb_name,
> > >>>>>>>>>                            const struct sbrec_chassis
*chassis_rec,
> > >>>>>>>>>                            const char *ts_now_str, bool
> > >>> sb_readonly,
> > >>>>>>>>> @@ -160,7 +162,10 @@ void local_binding_set_up(struct shash
> > >>>>>>>> *local_bindings, const char *pb_name,
> > >>>>>>>>>  void local_binding_set_down(struct shash *local_bindings,
const
> > >>> char
> > >>>>>>>> *pb_name,
> > >>>>>>>>>                              const struct sbrec_chassis
> > >>> *chassis_rec,
> > >>>>>>>>>                              bool sb_readonly, bool
ovs_readonly);
> > >>>>>>>>> -
> > >>>>>>>>> +void local_binding_set_pb(struct shash *local_bindings,
const char
> > >>>>>>>> *pb_name,
> > >>>>>>>>> +                          const struct sbrec_chassis
*chassis_rec,
> > >>>>>>>>> +                          struct hmap *tracked_datapaths,
> > >>>>>>>>> +                          bool is_set);
> > >>>>>>>>>  void binding_register_ovs_idl(struct ovsdb_idl *);
> > >>>>>>>>>  void binding_run(struct binding_ctx_in *, struct
binding_ctx_out
> > >>> *);
> > >>>>>>>>>  bool binding_cleanup(struct ovsdb_idl_txn *ovnsb_idl_txn,
> > >>>>>>>>> @@ -178,6 +183,10 @@ void binding_dump_local_bindings(struct
> > >>>>>>>> local_binding_data *, struct ds *);
> > >>>>>>>>>  bool is_additional_chassis(const struct sbrec_port_binding
*pb,
> > >>>>>>>>>                             const struct sbrec_chassis
> > >>> *chassis_rec);
> > >>>>>>>>>
> > >>>>>>>>> +void set_pb_chassis_in_sbrec(const struct sbrec_port_binding
*pb,
> > >>>>>>>>> +                             const struct sbrec_chassis
> > >>> *chassis_rec,
> > >>>>>>>>> +                             bool is_set);
> > >>>>>>>>> +
> > >>>>>>>>>  /* Corresponds to each Port_Binding.type. */
> > >>>>>>>>>  enum en_lport_type {
> > >>>>>>>>>      LP_UNKNOWN,
> > >>>>>>>>> diff --git a/controller/if-status.c b/controller/if-status.c
> > >>>>>>>>> index ad61844d8..7693c289b 100644
> > >>>>>>>>> --- a/controller/if-status.c
> > >>>>>>>>> +++ b/controller/if-status.c
> > >>>>>>>>> @@ -24,6 +24,7 @@
> > >>>>>>>>>  #include "lib/util.h"
> > >>>>>>>>>  #include "timeval.h"
> > >>>>>>>>>  #include "openvswitch/vlog.h"
> > >>>>>>>>> +#include "lib/ovn-sb-idl.h"
> > >>>>>>>>>
> > >>>>>>>>>  VLOG_DEFINE_THIS_MODULE(if_status);
> > >>>>>>>>>
> > >>>>>>>>> @@ -53,9 +54,11 @@ VLOG_DEFINE_THIS_MODULE(if_status);
> > >>>>>>>>>   */
> > >>>>>>>>>
> > >>>>>>>>>  enum if_state {
> > >>>>>>>>> -    OIF_CLAIMED,       /* Newly claimed interface. */
> > >>>>>>>>> -    OIF_INSTALL_FLOWS, /* Already claimed interface for which flows are still
> > >>>>>>>>> -                        * being installed.
> > >>>>>>>>> +    OIF_CLAIMED,       /* Newly claimed interface. pb->chassis not yet updated.
> > >>>>>>>>> +                        */
> > >>>>>>>>> +    OIF_INSTALL_FLOWS, /* Claimed interface with pb->chassis successfully
> > >>>>>>>>> +                        * updated in SB and for which flows are still being
> > >>>>>>>>> +                        * installed.
> > >>>>>>>>>                          */
> > >>>>>>>>>      OIF_MARK_UP,       /* Interface with flows successfully installed in OVS
> > >>>>>>>>>                          * but not yet marked "up" in the binding module (in
> > >>>>>>>>> @@ -78,6 +81,55 @@ static const char *if_state_names[] = {
> > >>>>>>>>>      [OIF_INSTALLED]     = "INSTALLED",
> > >>>>>>>>>  };
> > >>>>>>>>>
> > >>>>>>>>> +/*
> > >>>>>>>>> + *       +----------------------+
> > >>>>>>>>> + * +---> |                      |
> > >>>>>>>>> + * | +-> |         NULL         | <--------------------------------------+++-+
> > >>>>>>>>> + * | |   +----------------------+                                            |
> > >>>>>>>>> + * | |     ^ release_iface   | claim_iface                                   |
> > >>>>>>>>> + * | |     |                 V - sbrec_update_chassis(if sb is rw)           |
> > >>>>>>>>> + * | |   +----------------------+                                            |
> > >>>>>>>>> + * | |   |                      | <----------------------------------------+ |
> > >>>>>>>>> + * | |   |       CLAIMED        | <--------------------------------------+ | |
> > >>>>>>>>> + * | |   +----------------------+                                        | | |
> > >>>>>>>>> + * | |                  | mgr_update(when sb is rw)                      | | |
> > >>>>>>>>> + * | | release_iface    |  - sbrec_update_chassis                        | | |
> > >>>>>>>>> + * | |                  |  - request seqno                               | | |
> > >>>>>>>>> + * | |                  V                                                | | |
> > >>>>>>>>> + * | |   +----------------------+                                        | | |
> > >>>>>>>>> + * | +-- |                      |  mgr_run(seqno not rcvd)               | | |
> > >>>>>>>>> + * |     |    INSTALL_FLOWS     |   - set port down in sb                | | |
> > >>>>>>>>> + * |     |                      |  mgr_update()                          | | |
> > >>>>>>>>> + * |     +----------------------+   - sbrec_update_chassis if needed     | | |
> > >>>>>>>>> + * |                    |                                                | | |
> > >>>>>>>>> + * |                    |  mgr_run(seqno rcvd)                           | | |
> > >>>>>>>>> + * |                    |  - set port up in sb                           | | |
> > >>>>>>>>> + * | release_iface      |  - set ovn-installed in ovs                    | | |
> > >>>>>>>>> + * |                    V                                                | | |
> > >>>>>>>>> + * |   +----------------------+                                          | | |
> > >>>>>>>>> + * |   |                      |  mgr_run()                               | | |
> > >>>>>>>>> + * +-- |       MARK_UP        |  - set port up in sb                     | | |
> > >>>>>>>>> + *     |                      |  - set ovn-installed in ovs              | | |
> > >>>>>>>>> + *     |                      |  mgr_update()                            | | |
> > >>>>>>>>> + *     +----------------------+  - sbrec_update_chassis if needed        | | |
> > >>>>>>>>> + *              |                                                        | | |
> > >>>>>>>>> + *              | mgr_update(rcvd port up / ovn_installed & chassis set) | | |
> > >>>>>>>>> + *              V                                                        | | |
> > >>>>>>>>> + *     +----------------------+                                          | | |
> > >>>>>>>>> + *     |      INSTALLED       | ------------> claim_iface ---------------+ | |
> > >>>>>>>>> + *     +----------------------+                                            | |
> > >>>>>>>>> + *              |                                                          | |
> > >>>>>>>>> + *              | release_iface                                            | |
> > >>>>>>>>> + *              V                                                          | |
> > >>>>>>>>> + *     +----------------------+                                            | |
> > >>>>>>>>> + *     |                      | ------------> claim_iface -----------------+ |
> > >>>>>>>>> + *     |      MARK_DOWN       | ------> mgr_update(rcvd port down) ----------+
> > >>>>>>>>> + *     |                      | mgr_run()
> > >>>>>>>>> + *     |                      | - set port down in sb
> > >>>>>>>>> + *     |                      | mgr_update()
> > >>>>>>>>> + *     +----------------------+ - sbrec_update_chassis(NULL)
> > >>>>>>>>> + */
> > >>>>>>>>> +
> > >>>>>>>>>  struct ovs_iface {
> > >>>>>>>>>      char *id;               /* Extracted from OVS
> > >>>>>>>> external_ids.iface_id. */
> > >>>>>>>>>      enum if_state state;    /* State of the interface in the
state
> > >>>>>>>> machine. */
> > >>>>>>>>> @@ -85,6 +137,7 @@ struct ovs_iface {
> > >>>>>>>>>                               * be fully programmed in OVS.
Only
> > >>> used
> > >>>>>>>> in state
> > >>>>>>>>>                               * OIF_INSTALL_FLOWS.
> > >>>>>>>>>                               */
> > >>>>>>>>> +    bool chassis_update_required;  /* If true, pb->chassis
must be
> > >>>>>>>> updated. */
> > >>>>>>>>>  };
> > >>>>>>>>>
> > >>>>>>>>>  static uint64_t ifaces_usage;
> > >>>>>>>>> @@ -158,14 +211,23 @@ if_status_mgr_destroy(struct
if_status_mgr
> > >>> *mgr)
> > >>>>>>>>>  }
> > >>>>>>>>>
> > >>>>>>>>>  void
> > >>>>>>>>> -if_status_mgr_claim_iface(struct if_status_mgr *mgr, const
char
> > >>>>>>>> *iface_id)
> > >>>>>>>>> +if_status_mgr_claim_iface(struct if_status_mgr *mgr,
> > >>>>>>>>> +                          const struct sbrec_port_binding
*pb,
> > >>>>>>>>> +                          const struct sbrec_chassis
*chassis_rec,
> > >>>>>>>>> +                          bool sb_readonly)
> > >>>>>>>>>  {
> > >>>>>>>>> +    const char *iface_id = pb->logical_port;
> > >>>>>>>>>      struct ovs_iface *iface = shash_find_data(&mgr->ifaces,
> > >>>>> iface_id);
> > >>>>>>>>>
> > >>>>>>>>>      if (!iface) {
> > >>>>>>>>>          iface = ovs_iface_create(mgr, iface_id, OIF_CLAIMED);
> > >>>>>>>>>      }
> > >>>>>>>>> -
> > >>>>>>>>> +    if (!sb_readonly) {
> > >>>>>>>>> +        set_pb_chassis_in_sbrec(pb, chassis_rec, true);
> > >>>>>>>>> +        iface->chassis_update_required = false;
> > >>>>>>>>> +    } else {
> > >>>>>>>>> +        iface->chassis_update_required = true;
> > >>>>>>>>> +    }
> > >>>>>>>>>      switch (iface->state) {
> > >>>>>>>>>      case OIF_CLAIMED:
> > >>>>>>>>>      case OIF_INSTALL_FLOWS:
> > >>>>>>>>> @@ -182,6 +244,12 @@ if_status_mgr_claim_iface(struct
if_status_mgr
> > >>>>>>>> *mgr, const char *iface_id)
> > >>>>>>>>>      }
> > >>>>>>>>>  }
> > >>>>>>>>>
> > >>>>>>>>> +bool
> > >>>>>>>>> +if_status_mgr_iface_is_present(struct if_status_mgr *mgr,
const
> > >>> char
> > >>>>>>>> *iface_id)
> > >>>>>>>>> +{
> > >>>>>>>>> +    return !!shash_find_data(&mgr->ifaces, iface_id);
> > >>>>>>>>> +}
> > >>>>>>>>> +
> > >>>>>>>>>  void
> > >>>>>>>>>  if_status_mgr_release_iface(struct if_status_mgr *mgr, const
char
> > >>>>>>>> *iface_id)
> > >>>>>>>>>  {
> > >>>>>>>>> @@ -246,9 +314,39 @@ if_status_mgr_delete_iface(struct
> > >>> if_status_mgr
> > >>>>>>>> *mgr, const char *iface_id)
> > >>>>>>>>>      }
> > >>>>>>>>>  }
> > >>>>>>>>>
> > >>>>>>>>> +bool
> > >>>>>>>>> +if_status_handle_claims(struct if_status_mgr *mgr,
> > >>>>>>>>> +                        struct local_binding_data
*binding_data,
> > >>>>>>>>> +                        const struct sbrec_chassis
*chassis_rec,
> > >>>>>>>>> +                        struct hmap *tracked_datapath,
> > >>>>>>>>> +                        bool sb_readonly)
> > >>>>>>>>> +{
> > >>>>>>>>> +    if (!binding_data || sb_readonly) {
> > >>>>>>>>> +        return false;
> > >>>>>>>>> +    }
> > >>>>>>>>> +
> > >>>>>>>>> +    struct shash *bindings = &binding_data->bindings;
> > >>>>>>>>> +    struct hmapx_node *node;
> > >>>>>>>>> +
> > >>>>>>>>> +    bool rc = false;
> > >>>>>>>>> +    HMAPX_FOR_EACH (node,
&mgr->ifaces_per_state[OIF_CLAIMED]) {
> > >>>>>>>>> +        struct ovs_iface *iface = node->data;
> > >>>>>>>>> +        if (iface->chassis_update_required) {
> > >>>>>>>>
> > >>>>>>>> Thanks Xavier for the revision. The state machine looks clearer now,
> > >>>>>>>> but I have a major concern about the use of chassis_update_required.
> > >>>>>>>> This bool flag is used to decide whether an update to SB is needed.
> > >>>>>>>> Once an SB update is requested, the flag is set to false, on the
> > >>>>>>>> assumption that the SB update will succeed immediately. However, that
> > >>>>>>>> assumption may be wrong: the subsequent SB update can fail or be
> > >>>>>>>> delayed for various reasons, so this flag is not reliable. Instead,
> > >>>>>>>> the CLAIMED state should be responsible for making sure the SB update
> > >>>>>>>> is completed. If a transaction is in progress, sb_readonly is true.
> > >>>>>>>> So if sb_readonly is false, nothing is in progress, and we can always
> > >>>>>>>> check: if (!sb_readonly && <SB chassis is not updated for the
> > >>>>>>>> port-binding>), we should just send the update, regardless of whether
> > >>>>>>>> we have requested it before. Please also see another comment below
> > >>>>>>>> for the state transition.
> > >>>>>>>>
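To make the check suggested above concrete, here is a minimal standalone sketch. It is not code from the patch or from ovn-controller: struct demo_pb, struct demo_txn and demo_claimed_step() are hypothetical stand-ins for the Port_Binding row, the SB transaction and the per-iteration handling of an interface in CLAIMED state. The only point it tries to show is that the decision to send the update is derived from what the SB currently reports, not from a one-shot flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a Port_Binding row as last seen from the SB. */
struct demo_pb {
    const char *chassis;            /* NULL when no chassis is set. */
};

/* Hypothetical stand-in for a pending SB transaction. */
struct demo_txn {
    const char *requested_chassis;  /* NULL when nothing was requested. */
};

/* Returns true if the SB row already names 'our_chassis'. */
static bool
demo_pb_chassis_is_ours(const struct demo_pb *pb, const char *our_chassis)
{
    return pb->chassis && !strcmp(pb->chassis, our_chassis);
}

/* One iteration for an interface in CLAIMED state.  The update is requested
 * whenever the SB is writable and the row does not yet name our chassis,
 * whether or not it was requested in an earlier iteration.
 * Returns true if an update was requested in this iteration. */
static bool
demo_claimed_step(const struct demo_pb *pb, const char *our_chassis,
                  bool sb_readonly, struct demo_txn *txn)
{
    assert(pb && our_chassis && txn);

    if (sb_readonly || demo_pb_chassis_is_ours(pb, our_chassis)) {
        return false;
    }
    txn->requested_chassis = our_chassis;
    return true;
}
```

With this shape, a failed or delayed transaction needs no extra bookkeeping: the row still does not name the local chassis on the next writable iteration, so the update is simply requested again.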
> > >>>>>>>>> +            VLOG_INFO("if_status_handle_claims for %s",
> > >>> iface->id);
> > >>>>>>>>> +            local_binding_set_pb(bindings, iface->id,
chassis_rec,
> > >>>>>>>>> +                                 tracked_datapath, true);
> > >>>>>>>>> +            rc = true;
> > >>>>>>>>> +        }
> > >>>>>>>>> +        iface->chassis_update_required = false;
> > >>>>>>>>> +    }
> > >>>>>>>>> +    return rc;
> > >>>>>>>>> +}
> > >>>>>>>>> +
> > >>>>>>>>>  void
> > >>>>>>>>>  if_status_mgr_update(struct if_status_mgr *mgr,
> > >>>>>>>>> -                     struct local_binding_data *binding_data)
> > >>>>>>>>> +                     struct local_binding_data *binding_data,
> > >>>>>>>>> +                     const struct sbrec_chassis *chassis_rec,
> > >>>>>>>>> +                     bool sb_readonly)
> > >>>>>>>>>  {
> > >>>>>>>>>      if (!binding_data) {
> > >>>>>>>>>          return;
> > >>>>>>>>> @@ -257,13 +355,25 @@ if_status_mgr_update(struct
if_status_mgr
> > >>> *mgr,
> > >>>>>>>>>      struct shash *bindings = &binding_data->bindings;
> > >>>>>>>>>      struct hmapx_node *node;
> > >>>>>>>>>
> > >>>>>>>>> +    /* Interfaces in OIF_MARK_UP state have already set their
> > >>>>>>>> pb->chassis.
> > >>>>>>>>> +     * However, it might have been reset by another hv.
> > >>>>>>>>> +     */
> > >>>>>>>>>      /* Move all interfaces that have been confirmed "up" by
the
> > >>>>> binding
> > >>>>>>>> module,
> > >>>>>>>>>       * from OIF_MARK_UP to OIF_INSTALLED.
> > >>>>>>>>>       */
> > >>>>>>>>>      HMAPX_FOR_EACH_SAFE (node,
> > >>> &mgr->ifaces_per_state[OIF_MARK_UP]) {
> > >>>>>>>>>          struct ovs_iface *iface = node->data;
> > >>>>>>>>>
> > >>>>>>>>> -        if (local_binding_is_up(bindings, iface->id)) {
> > >>>>>>>>> +        if (iface->chassis_update_required) {
> > >>>>>>>>> +            if (!sb_readonly) {
> > >>>>>>>>> +                iface->chassis_update_required = false;
> > >>>>>>>>> +                local_binding_set_pb(bindings, iface->id,
> > >>>>> chassis_rec,
> > >>>>>>>>> +                                     NULL, true);
> > >>>>>>>>> +            } else {
> > >>>>>>>>> +                continue;
> > >>>>>>>>> +            }
> > >>>>>>>>> +        }
> > >>>>>>>>> +        if (local_binding_is_up(bindings, iface->id,
> > >>> chassis_rec)) {
> > >>>>>>>>>              ovs_iface_set_state(mgr, iface, OIF_INSTALLED);
> > >>>>>>>>>          }
> > >>>>>>>>>      }
> > >>>>>>>>> @@ -274,23 +384,53 @@ if_status_mgr_update(struct
if_status_mgr
> > >>> *mgr,
> > >>>>>>>>>      HMAPX_FOR_EACH_SAFE (node,
> > >>>>> &mgr->ifaces_per_state[OIF_MARK_DOWN]) {
> > >>>>>>>>>          struct ovs_iface *iface = node->data;
> > >>>>>>>>>
> > >>>>>>>>> -        if (local_binding_is_down(bindings, iface->id)) {
> > >>>>>>>>> +        if (!sb_readonly) {
> > >>>>>>>>> +            local_binding_set_pb(bindings, iface->id,
chassis_rec,
> > >>>>>>>>> +                                 NULL, false);
> > >>>>>>>>> +        }
> > >>>>>>>>> +        if (local_binding_is_down(bindings, iface->id,
> > >>> chassis_rec))
> > >>>>> {
> > >>>>>>>>>              ovs_iface_destroy(mgr, iface);
> > >>>>>>>>>          }
> > >>>>>>>>>      }
> > >>>>>>>>>
> > >>>>>>>>> -    /* Register for a notification about flows being
installed in
> > >>> OVS
> > >>>>>>>> for all
> > >>>>>>>>> -     * newly claimed interfaces.
> > >>>>>>>>> +    if (!sb_readonly) {
> > >>>>>>>>> +        HMAPX_FOR_EACH_SAFE (node,
> > >>>>>>>> &mgr->ifaces_per_state[OIF_INSTALL_FLOWS]) {
> > >>>>>>>>> +            struct ovs_iface *iface = node->data;
> > >>>>>>>>> +
> > >>>>>>>>> +            if (iface->chassis_update_required) {
> > >>>>>>>>> +                iface->chassis_update_required = false;
> > >>>>>>>>> +                local_binding_set_pb(bindings, iface->id,
> > >>>>> chassis_rec,
> > >>>>>>>>> +                                     NULL, true);
> > >>>>>>>>> +            }
> > >>>>>>>>> +        }
> > >>>>>>>>> +    }
> > >>>>>>>>> +
> > >>>>>>>>> +    /* Update Port_Binding->chassis for newly claimed
interfaces
> > >>>>>>>>> +     * Register for a notification about flows being
installed in
> > >>> OVS
> > >>>>>>>> for all
> > >>>>>>>>> +     * newly claimed interfaces for which we could update
> > >>>>> pb->chassis.
> > >>>>>>>>>       *
> > >>>>>>>>>       * Move them from OIF_CLAIMED to OIF_INSTALL_FLOWS.
> > >>>>>>>>>       */
> > >>>>>>>>> -    bool new_ifaces = false;
> > >>>>>>>>> -    HMAPX_FOR_EACH_SAFE (node,
> > >>> &mgr->ifaces_per_state[OIF_CLAIMED]) {
> > >>>>>>>>> -        struct ovs_iface *iface = node->data;
> > >>>>>>>>>
> > >>>>>>>>> -        ovs_iface_set_state(mgr, iface, OIF_INSTALL_FLOWS);
> > >>>>>>>>> -        iface->install_seqno = mgr->iface_seqno + 1;
> > >>>>>>>>> -        new_ifaces = true;
> > >>>>>>>>> +    bool new_ifaces = false;
> > >>>>>>>>> +    if (!sb_readonly) {
> > >>>>>>>>> +        HMAPX_FOR_EACH_SAFE (node,
> > >>>>> &mgr->ifaces_per_state[OIF_CLAIMED])
> > >>>>>>>> {
> > >>>>>>>>> +            struct ovs_iface *iface = node->data;
> > >>>>>>>>> +            /* No need to check for chassis_update_required
as
> > >>>>> already
> > >>>>>>>> done
> > >>>>>>>>> +             * in if_status_handle_claims or
> > >>>>> if_status_mgr_claim_iface
> > >>>>>>>>> +             */
> > >>>>>>>>> +            ovs_iface_set_state(mgr, iface,
OIF_INSTALL_FLOWS);
> > >>>>>>>>
> > >>>>>>>> We need to make sure the chassis in the SB port-binding is up-to-date
> > >>>>>>>> (i.e. the update notification from SB DB has been received) before
> > >>>>>>>> moving to INSTALL_FLOWS. Otherwise, it is still possible that the
> > >>>>>>>> state is moved too early and we end up with incomplete flow
> > >>>>>>>> installation for the lport when the state is finally moved to
> > >>>>>>>> INSTALLED.
> > >>>>>>>>
> > >>>>>>>> Thanks,
> > >>>>>>>> Han
> > >>>>>>>>
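As an illustration of the transition rule described in the comment above, the following standalone sketch gates the move out of CLAIMED on the chassis value observed in the SB. The names (enum demo_if_state, demo_next_state()) are hypothetical and do not exist in if-status.c; the sketch only assumes that the observed chassis can be compared by name with the local one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical reduced state set; the real state machine has more states. */
enum demo_if_state {
    DEMO_CLAIMED,
    DEMO_INSTALL_FLOWS,
};

/* Returns the next state for an interface.  'sb_chassis' is the chassis name
 * currently reported by the SB for the port binding (NULL if unset).  The
 * interface leaves CLAIMED only once the SB confirms the local chassis. */
static enum demo_if_state
demo_next_state(enum demo_if_state state, const char *sb_chassis,
                const char *our_chassis)
{
    assert(our_chassis);

    bool confirmed = sb_chassis && !strcmp(sb_chassis, our_chassis);

    if (state == DEMO_CLAIMED && confirmed) {
        return DEMO_INSTALL_FLOWS;
    }
    return state;
}
```

An interface whose update is still in flight, or whose binding was taken by another chassis, stays in CLAIMED, so the seqno for flow installation would only be requested after the binding is visible locally.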
> > >>>>>>>>> +            iface->install_seqno = mgr->iface_seqno + 1;
> > >>>>>>>>> +            new_ifaces = true;
> > >>>>>>>>> +        }
> > >>>>>>>>> +    } else {
> > >>>>>>>>> +        HMAPX_FOR_EACH_SAFE (node,
> > >>>>> &mgr->ifaces_per_state[OIF_CLAIMED])
> > >>>>>>>> {
> > >>>>>>>>> +            struct ovs_iface *iface = node->data;
> > >>>>>>>>> +            static struct vlog_rate_limit rl =
> > >>>>> VLOG_RATE_LIMIT_INIT(5,
> > >>>>>>>> 1);
> > >>>>>>>>> +            VLOG_INFO_RL(&rl,
> > >>>>>>>>> +                         "Not updating pb chassis for %s now
as "
> > >>>>>>>>> +                         "sb is readonly", iface->id);
> > >>>>>>>>> +        }
> > >>>>>>>>>      }
> > >>>>>>>>>
> > >>>>>>>>>      /* Request a seqno update when the flows for new
interfaces
> > >>> have
> > >>>>>>>> been
> > >>>>>>>>> @@ -403,7 +543,7 @@ if_status_mgr_update_bindings(struct
> > >>> if_status_mgr
> > >>>>>>>> *mgr,
> > >>>>>>>>>      struct hmapx_node *node;
> > >>>>>>>>>
> > >>>>>>>>>      /* Notify the binding module to set "down" all bindings
that
> > >>> are
> > >>>>>>>> still
> > >>>>>>>>> -     * in the process of being installed in OVS, i.e., are
not yet
> > >>>>>>>> instsalled.
> > >>>>>>>>> +     * in the process of being installed in OVS, i.e., are
not yet
> > >>>>>>>> installed.
> > >>>>>>>>>       */
> > >>>>>>>>>      HMAPX_FOR_EACH (node,
> > >>> &mgr->ifaces_per_state[OIF_INSTALL_FLOWS])
> > >>>>> {
> > >>>>>>>>>          struct ovs_iface *iface = node->data;
> > >>>>>>>>> diff --git a/controller/if-status.h b/controller/if-status.h
> > >>>>>>>>> index bb8a3950d..f9b05d30d 100644
> > >>>>>>>>> --- a/controller/if-status.h
> > >>>>>>>>> +++ b/controller/if-status.h
> > >>>>>>>>> @@ -27,15 +27,27 @@ struct if_status_mgr
> > >>> *if_status_mgr_create(void);
> > >>>>>>>>>  void if_status_mgr_clear(struct if_status_mgr *);
> > >>>>>>>>>  void if_status_mgr_destroy(struct if_status_mgr *);
> > >>>>>>>>>
> > >>>>>>>>> -void if_status_mgr_claim_iface(struct if_status_mgr *, const
char
> > >>>>>>>> *iface_id);
> > >>>>>>>>> +void if_status_mgr_claim_iface(struct if_status_mgr *,
> > >>>>>>>>> +                               const struct
sbrec_port_binding
> > >>> *pb,
> > >>>>>>>>> +                               const struct sbrec_chassis
> > >>>>> *chassis_rec,
> > >>>>>>>>> +                               bool sb_readonly);
> > >>>>>>>>>  void if_status_mgr_release_iface(struct if_status_mgr *,
const
> > >>> char
> > >>>>>>>> *iface_id);
> > >>>>>>>>>  void if_status_mgr_delete_iface(struct if_status_mgr *,
const char
> > >>>>>>>> *iface_id);
> > >>>>>>>>>
> > >>>>>>>>> -void if_status_mgr_update(struct if_status_mgr *, struct
> > >>>>>>>> local_binding_data *);
> > >>>>>>>>> +void if_status_mgr_update(struct if_status_mgr *, struct
> > >>>>>>>> local_binding_data *,
> > >>>>>>>>> +                          const struct sbrec_chassis
*chassis,
> > >>>>>>>>> +                          bool sb_readonly);
> > >>>>>>>>>  void if_status_mgr_run(struct if_status_mgr *mgr, struct
> > >>>>>>>> local_binding_data *,
> > >>>>>>>>>                         const struct sbrec_chassis *,
> > >>>>>>>>>                         bool sb_readonly, bool ovs_readonly);
> > >>>>>>>>>  void if_status_mgr_get_memory_usage(struct if_status_mgr
*mgr,
> > >>>>>>>>>                                      struct simap *usage);
> > >>>>>>>>> +bool if_status_mgr_iface_is_present(struct if_status_mgr
*mgr,
> > >>>>>>>>> +                                    const char *iface_id);
> > >>>>>>>>> +bool if_status_handle_claims(struct if_status_mgr *mgr,
> > >>>>>>>>> +                             struct local_binding_data
> > >>> *binding_data,
> > >>>>>>>>> +                             const struct sbrec_chassis
> > >>> *chassis_rec,
> > >>>>>>>>> +                             struct hmap *tracked_datapath,
> > >>>>>>>>> +                             bool sb_readonly);
> > >>>>>>>>>
> > >>>>>>>>>  # endif /* controller/if-status.h */
> > >>>>>>>>> diff --git a/controller/ovn-controller.c
> > >>> b/controller/ovn-controller.c
> > >>>>>>>>> index 69615308e..3947baf03 100644
> > >>>>>>>>> --- a/controller/ovn-controller.c
> > >>>>>>>>> +++ b/controller/ovn-controller.c
> > >>>>>>>>> @@ -1464,6 +1464,73 @@ en_runtime_data_run(struct engine_node
> > >>> *node,
> > >>>>>>>> void *data)
> > >>>>>>>>>      engine_set_node_state(node, EN_UPDATED);
> > >>>>>>>>>  }
> > >>>>>>>>>
> > >>>>>>>>> +struct ed_type_sb_ro {
> > >>>>>>>>> +    bool sb_readonly;
> > >>>>>>>>> +};
> > >>>>>>>>> +
> > >>>>>>>>> +static void *
> > >>>>>>>>> +en_sb_ro_init(struct engine_node *node OVS_UNUSED,
> > >>>>>>>>> +              struct engine_arg *arg OVS_UNUSED)
> > >>>>>>>>> +{
> > >>>>>>>>> +    struct ed_type_sb_ro *data = xzalloc(sizeof *data);
> > >>>>>>>>> +    return data;
> > >>>>>>>>> +}
> > >>>>>>>>> +
> > >>>>>>>>> +static void
> > >>>>>>>>> +en_sb_ro_run(struct engine_node *node, void *data)
> > >>>>>>>>> +{
> > >>>>>>>>> +    struct ed_type_sb_ro *sb_ro_data = data;
> > >>>>>>>>> +    bool sb_readonly = !engine_get_context()->ovnsb_idl_txn;
> > >>>>>>>>> +    if (sb_ro_data->sb_readonly != sb_readonly) {
> > >>>>>>>>> +        sb_ro_data->sb_readonly = sb_readonly;
> > >>>>>>>>> +        if (!sb_ro_data->sb_readonly) {
> > >>>>>>>>> +            engine_set_node_state(node, EN_UPDATED);
> > >>>>>>>>> +        }
> > >>>>>>>>> +    }
> > >>>>>>>>> +}
> > >>>>>>>>> +
> > >>>>>>>>> +static void
> > >>>>>>>>> +en_sb_ro_cleanup(void *data OVS_UNUSED)
> > >>>>>>>>> +{
> > >>>>>>>>> +}
> > >>>>>>>>> +
> > >>>>>>>>> +static bool
> > >>>>>>>>> +runtime_data_sb_ro_handler(struct engine_node *node, void
*data)
> > >>>>>>>>> +{
> > >>>>>>>>> +    const struct sbrec_chassis *chassis = NULL;
> > >>>>>>>>> +
> > >>>>>>>>> +    struct ovsrec_open_vswitch_table *ovs_table =
> > >>>>>>>>> +        (struct ovsrec_open_vswitch_table *)EN_OVSDB_GET(
> > >>>>>>>>> +            engine_get_input("OVS_open_vswitch", node));
> > >>>>>>>>> +
> > >>>>>>>>> +    const char *chassis_id = get_ovs_chassis_id(ovs_table);
> > >>>>>>>>> +
> > >>>>>>>>> +    struct ovsdb_idl_index *sbrec_chassis_by_name =
> > >>>>>>>>> +        engine_ovsdb_node_get_index(
> > >>>>>>>>> +                engine_get_input("SB_chassis", node),
> > >>>>>>>>> +                "name");
> > >>>>>>>>> +
> > >>>>>>>>> +    if (chassis_id) {
> > >>>>>>>>> +        chassis =
chassis_lookup_by_name(sbrec_chassis_by_name,
> > >>>>>>>> chassis_id);
> > >>>>>>>>> +    }
> > >>>>>>>>> +    if (chassis) {
> > >>>>>>>>> +        struct ed_type_runtime_data *rt_data = data;
> > >>>>>>>>> +        bool sb_readonly =
!engine_get_context()->ovnsb_idl_txn;
> > >>>>>>>>> +        struct controller_engine_ctx *ctrl_ctx =
> > >>>>>>>>> +            engine_get_context()->client_ctx;
> > >>>>>>>>> +
> > >>>>>>>>> +        if (if_status_handle_claims(ctrl_ctx->if_mgr,
> > >>>>>>>>> +                                    &rt_data->lbinding_data,
> > >>>>>>>>> +                                    chassis,
> > >>>>>>>>> +
 &rt_data->tracked_dp_bindings,
> > >>>>>>>>> +                                    sb_readonly)) {
> > >>>>>>>>> +            engine_set_node_state(node, EN_UPDATED);
> > >>>>>>>>> +            rt_data->tracked = true;
> > >>>>>>>>> +        }
> > >>>>>>>>> +    }
> > >>>>>>>>> +    return true;
> > >>>>>>>>> +}
> > >>>>>>>>> +
> > >>>>>>>>>  static bool
> > >>>>>>>>>  runtime_data_ovs_interface_shadow_handler(struct engine_node
> > >>> *node,
> > >>>>>>>> void *data)
> > >>>>>>>>>  {
> > >>>>>>>>> @@ -3528,6 +3595,7 @@ main(int argc, char *argv[])
> > >>>>>>>>>      stopwatch_create(VIF_PLUG_RUN_STOPWATCH_NAME, SW_MS);
> > >>>>>>>>>
> > >>>>>>>>>      /* Define inc-proc-engine nodes. */
> > >>>>>>>>> +    ENGINE_NODE(sb_ro, "sb_ro");
> > >>>>>>>>>      ENGINE_NODE_WITH_CLEAR_TRACK_DATA_IS_VALID(ct_zones,
> > >>> "ct_zones");
> > >>>>>>>>>      ENGINE_NODE_WITH_CLEAR_TRACK_DATA(ovs_interface_shadow,
> > >>>>>>>>>
 "ovs_interface_shadow");
> > >>>>>>>>> @@ -3664,6 +3732,7 @@ main(int argc, char *argv[])
> > >>>>>>>>>      engine_add_input(&en_ovs_interface_shadow,
&en_ovs_interface,
> > >>>>>>>>>
ovs_interface_shadow_ovs_interface_handler);
> > >>>>>>>>>
> > >>>>>>>>> +    engine_add_input(&en_runtime_data, &en_sb_ro,
> > >>>>>>>> runtime_data_sb_ro_handler);
> > >>>>>>>>>      engine_add_input(&en_runtime_data,
&en_ofctrl_is_connected,
> > >>>>> NULL);
> > >>>>>>>>>
> > >>>>>>>>>      engine_add_input(&en_runtime_data, &en_ovs_open_vswitch,
> > >>> NULL);
> > >>>>>>>>> @@ -4098,7 +4167,8 @@ main(int argc, char *argv[])
> > >>>>>>>>>                          runtime_data ?
> > >>> &runtime_data->lbinding_data :
> > >>>>>>>> NULL;
> > >>>>>>>>>
> > >>>>>  stopwatch_start(IF_STATUS_MGR_UPDATE_STOPWATCH_NAME,
> > >>>>>>>>>                                      time_msec());
> > >>>>>>>>> -                    if_status_mgr_update(if_mgr,
binding_data);
> > >>>>>>>>> +                    if_status_mgr_update(if_mgr,
binding_data,
> > >>>>> chassis,
> > >>>>>>>>> +                                         !ovnsb_idl_txn);
> > >>>>>>>>>
> > >>>>>  stopwatch_stop(IF_STATUS_MGR_UPDATE_STOPWATCH_NAME,
> > >>>>>>>>>                                     time_msec());
> > >>>>>>>>>
> > >>>>>>>>> diff --git a/tests/ovn-macros.at b/tests/ovn-macros.at
> > >>>>>>>>> index 335f9158c..8fd6ae6f7 100644
> > >>>>>>>>> --- a/tests/ovn-macros.at
> > >>>>>>>>> +++ b/tests/ovn-macros.at
> > >>>>>>>>> @@ -759,3 +759,15 @@
> > >>>>> m4_define([OVN_FOR_EACH_NORTHD_WITHOUT_DP_GROUPS],
> > >>>>>>>>>       [m4_foreach([NORTHD_USE_DP_GROUPS], [no],
> > >>>>>>>>>         [m4_foreach([NORTHD_USE_PARALLELIZATION], [yes, no],
[$1
> > >>>>>>>>>  ])])])])
> > >>>>>>>>> +
> > >>>>>>>>> +# OVN_NBCTL(NBCTL_COMMAND) adds NBCTL_COMMAND to list of
commands
> > >>> to
> > >>>>> be
> > >>>>>>>> run by RUN_OVN_NBCTL().
> > >>>>>>>>> +m4_define([OVN_NBCTL], [
> > >>>>>>>>> +    command="${command} -- $1"
> > >>>>>>>>> +])
> > >>>>>>>>> +
> > >>>>>>>>> +# RUN_OVN_NBCTL() executes list of commands built by the
> > >>> OVN_NBCTL()
> > >>>>>>>> macro.
> > >>>>>>>>> +m4_define([RUN_OVN_NBCTL], [
> > >>>>>>>>> +    check ovn-nbctl ${command}
> > >>>>>>>>> +    unset command
> > >>>>>>>>> +])
> > >>>>>>>>> +
> > >>>>>>>>> diff --git a/tests/ovn.at b/tests/ovn.at
> > >>>>>>>>> index bfaa41962..94d16bac9 100644
> > >>>>>>>>> --- a/tests/ovn.at
> > >>>>>>>>> +++ b/tests/ovn.at
> > >>>>>>>>> @@ -102,6 +102,18 @@ m4_divert_text([PREPARE_TESTS],
> > >>>>>>>>>           test 1 -le $(as $hv1 ovs-ofctl dump-flows br-int |
grep
> > >>> -c
> > >>>>>>>> "output:$ofport")
> > >>>>>>>>>       ])
> > >>>>>>>>>     }
> > >>>>>>>>> +
> > >>>>>>>>> +   ovn_wait_remote_input_flows () {
> > >>>>>>>>> +     hv1=$1
> > >>>>>>>>> +     hv2=$2
> > >>>>>>>>> +     echo "$3: waiting for flows for remote input on $hv1"
> > >>>>>>>>> +     # Wait for a flow outputting to remote input
> > >>>>>>>>> +     OVS_WAIT_UNTIL([
> > >>>>>>>>> +         ofport=$(as $hv1 ovs-vsctl --bare --columns ofport
find
> > >>>>>>>> Interface name=ovn-${hv2}-0)
> > >>>>>>>>> +         echo "tunnel port=$ofport"
> > >>>>>>>>> +         test 1 -le $(as $hv1 ovs-ofctl dump-flows br-int |
grep
> > >>> -c
> > >>>>>>>> "in_port=$ofport")
> > >>>>>>>>> +     ])
> > >>>>>>>>> +   }
> > >>>>>>>>>  ])
> > >>>>>>>>>
> > >>>>>>>>>  m4_define([OVN_CHECK_PACKETS],
> > >>>>>>>>> @@ -127,6 +139,8 @@ m4_define([OVN_WAIT_PATCH_PORT_FLOWS],
> > >>>>>>>>>  m4_define([OVN_WAIT_REMOTE_OUTPUT_FLOWS],
> > >>>>>>>>>    [ovn_wait_remote_output_flows "$1" "$2"
"__file__:__line__"])
> > >>>>>>>>>
> > >>>>>>>>> +m4_define([OVN_WAIT_REMOTE_INPUT_FLOWS],
> > >>>>>>>>> +  [ovn_wait_remote_input_flows "$1" "$2"
"__file__:__line__"])
> > >>>>>>>>>
> > >>>>>>>>>  AT_BANNER([OVN components])
> > >>>>>>>>>
> > >>>>>>>>> @@ -14056,6 +14070,11 @@ wait_column "$hv1_uuid" Port_Binding
> > >>>>>>>> requested_chassis logical_port=lsp0
> > >>>>>>>>>  wait_column "$hv2_uuid" Port_Binding additional_chassis
> > >>>>>>>> logical_port=lsp0
> > >>>>>>>>>  wait_column "$hv2_uuid" Port_Binding
requested_additional_chassis
> > >>>>>>>> logical_port=lsp0
> > >>>>>>>>>
> > >>>>>>>>> +# Check ovn-installed updated for main chassis
> > >>>>>>>>> +wait_for_ports_up
> > >>>>>>>>> +OVS_WAIT_UNTIL([test `as hv1 ovs-vsctl get Interface lsp0
> > >>>>>>>> external_ids:ovn-installed` = '"true"'])
> > >>>>>>>>> +OVS_WAIT_UNTIL([test x`as hv2 ovs-vsctl get Interface lsp0
> > >>>>>>>> external_ids:ovn-installed` = x])
> > >>>>>>>>> +
> > >>>>>>>>>  # Check that setting iface:encap-ip populates
> > >>>>>>>> Port_Binding:additional_encap
> > >>>>>>>>>  wait_row_count Encap 2 chassis_name=hv1
> > >>>>>>>>>  wait_row_count Encap 2 chassis_name=hv2
> > >>>>>>>>> @@ -14081,6 +14100,11 @@ wait_column "$hv2_uuid" Port_Binding
> > >>>>>>>> requested_chassis logical_port=lsp0
> > >>>>>>>>>  wait_column "" Port_Binding additional_chassis
logical_port=lsp0
> > >>>>>>>>>  wait_column "" Port_Binding requested_additional_chassis
> > >>>>>>>> logical_port=lsp0
> > >>>>>>>>>
> > >>>>>>>>> +# Check ovn-installed updated for main chassis and not for
other
> > >>>>> chassis
> > >>>>>>>>> +wait_for_ports_up
> > >>>>>>>>> +OVS_WAIT_UNTIL([test `as hv2 ovs-vsctl get Interface lsp0
> > >>>>>>>> external_ids:ovn-installed` = '"true"'])
> > >>>>>>>>> +OVS_WAIT_UNTIL([test x`as hv1 ovs-vsctl get Interface lsp0
> > >>>>>>>> external_ids:ovn-installed` = x])
> > >>>>>>>>> +
> > >>>>>>>>>  # Check that additional_encap is cleared
> > >>>>>>>>>  wait_column "" Port_Binding additional_encap
logical_port=lsp0
> > >>>>>>>>>
> > >>>>>>>>> @@ -15370,7 +15394,10 @@ AT_CHECK([as hv1 ovs-ofctl dump-flows
> > >>> br-int
> > >>>>>>>> table=65 | grep actions=output:1],
> > >>>>>>>>>  echo "verifying that lsp0 binding moves when
requested-chassis is
> > >>>>>>>> changed"
> > >>>>>>>>>
> > >>>>>>>>>  ovn-nbctl lsp-set-options lsp0 requested-chassis=hv2
> > >>>>>>>>> -OVS_WAIT_UNTIL([test 1 = $(grep -c "Releasing lport lsp0
from this
> > >>>>>>>> chassis" hv1/ovn-controller.log)])
> > >>>>>>>>> +
> > >>>>>>>>> +# We might see multiple "Releasing lport ...", when sb is
read
> > >>> only
> > >>>>>>>>> +OVS_WAIT_UNTIL([test 1 -le $(grep -c "Releasing lport lsp0
from
> > >>> this
> > >>>>>>>> chassis" hv1/ovn-controller.log)])
> > >>>>>>>>> +
> > >>>>>>>>>  wait_column "$hv2_uuid" Port_Binding chassis
logical_port=lsp0
> > >>>>>>>>>
> > >>>>>>>>>  # (6) Chassis hv2 should add flows and hv1 should not.
> > >>>>>>>>> @@ -15416,7 +15443,7 @@ AT_CHECK([as hv1 ovs-ofctl dump-flows
> > >>> br-int
> > >>>>>>>> table=0 | grep in_port=1], [0], [ig
> > >>>>>>>>>  AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=65 | grep
> > >>>>>>>> actions=output:1], [0], [ignore])
> > >>>>>>>>>
> > >>>>>>>>>  check ovn-nbctl --wait=hv lsp-set-options lsp0
> > >>>>>>>> requested-chassis=non-existant-chassis
> > >>>>>>>>> -OVS_WAIT_UNTIL([test 1 = $(grep -c "Releasing lport lsp0
from this
> > >>>>>>>> chassis" hv1/ovn-controller.log)])
> > >>>>>>>>> +OVS_WAIT_UNTIL([test 1 -le $(grep -c "Releasing lport lsp0
from
> > >>> this
> > >>>>>>>> chassis" hv1/ovn-controller.log)])
> > >>>>>>>>>  check ovn-nbctl --wait=hv sync
> > >>>>>>>>>  wait_column '' Port_Binding chasssi logical_port=lsp0
> > >>>>>>>>>  AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=0 | grep
> > >>>>> in_port=1],
> > >>>>>>>> [1], [])
> > >>>>>>>>> @@ -32418,3 +32445,119 @@ AT_CHECK([test $(ovn-sbctl list fdb
|
> > >>> grep
> > >>>>> -c
> > >>>>>>>> "00:00:00:00:10:30") = 0])
> > >>>>>>>>>  OVN_CLEANUP([hv1])
> > >>>>>>>>>  AT_CLEANUP
> > >>>>>>>>>  ])
> > >>>>>>>>> +
> > >>>>>>>>> +OVN_FOR_EACH_NORTHD([
> > >>>>>>>>> +AT_SETUP([recomputes])
> > >>>>>>>>> +ovn_start
> > >>>>>>>>> +
> > >>>>>>>>> +n_hv=4
> > >>>>>>>>> +
> > >>>>>>>>> +# Add chassis
> > >>>>>>>>> +net_add n1
> > >>>>>>>>> +for i in $(seq 1 $n_hv); do
> > >>>>>>>>> +    sim_add hv$i
> > >>>>>>>>> +    as hv$i
> > >>>>>>>>> +    check ovs-vsctl add-br br-phys
> > >>>>>>>>> +    ovs-vsctl set open .
> > >>>>> external-ids:ovn-bridge-mappings=phys:br-phys
> > >>>>>>>>> +    ovn_attach n1 br-phys 192.168.0.$i 24 geneve
> > >>>>>>>>> +done
> > >>>>>>>>> +
> > >>>>>>>>> +add_switch_ports() {
> > >>>>>>>>> +    start_port=$1
> > >>>>>>>>> +    end_port=$2
> > >>>>>>>>> +    nb_hv=$3
> > >>>>>>>>> +    bulk_size=$4
> > >>>>>>>>> +    for ((i=start_port; i<end_port; )) do
> > >>>>>>>>> +        start_bulk=$i
> > >>>>>>>>> +        for hv in $(seq 1 $nb_hv); do
> > >>>>>>>>> +            end_bulk=$((start_bulk+bulk_size-1))
> > >>>>>>>>> +            for port in $(seq $start_bulk $end_bulk); do
> > >>>>>>>>> +                logical_switch_port=lsp${port}
> > >>>>>>>>> +                OVN_NBCTL(lsp-add ls1 $logical_switch_port)
> > >>>>>>>>> +                OVN_NBCTL(lsp-set-addresses
$logical_switch_port
> > >>>>>>>> dynamic)
> > >>>>>>>>> +            done
> > >>>>>>>>> +            start_bulk=$((end_bulk+1))
> > >>>>>>>>> +        done
> > >>>>>>>>> +        RUN_OVN_NBCTL()
> > >>>>>>>>> +
> > >>>>>>>>> +        start_bulk=$i
> > >>>>>>>>> +        for hv in $(seq 1 $nb_hv); do
> > >>>>>>>>> +            end_bulk=$((start_bulk+bulk_size-1))
> > >>>>>>>>> +            for port in $(seq $start_bulk $end_bulk); do
> > >>>>>>>>> +                logical_switch_port=lsp${port}
> > >>>>>>>>> +                as hv$hv ovs-vsctl \
> > >>>>>>>>> +                    --no-wait -- add-port br-int vif${port} \
> > >>>>>>>>> +                    -- set Interface vif${port}
> > >>>>>>>> external_ids:iface-id=$logical_switch_port
> > >>>>>>>>> +            done
> > >>>>>>>>> +            start_bulk=$((end_bulk+1))
> > >>>>>>>>> +        done
> > >>>>>>>>> +        i=$((end_bulk+1))
> > >>>>>>>>> +    done
> > >>>>>>>>> +}
> > >>>>>>>>> +check ovn-nbctl ls-add ls1
> > >>>>>>>>> +check ovn-nbctl set Logical_Switch ls1 other_config:subnet=
> > >>>>> 10.1.0.0/16
> > >>>>>>>>> +check ovn-nbctl set Logical_Switch ls1
> > >>>>>>>> other_config:exclude_ips=10.1.255.254
> > >>>>>>>>> +
> > >>>>>>>>> +check ovn-nbctl lr-add lr1
> > >>>>>>>>> +check ovn-nbctl lsp-add ls1 lsp0 -- set Logical_Switch_Port
lsp0
> > >>>>>>>> type=router options:router-port=lrp0 addresses=dynamic
> > >>>>>>>>> +check ovn-nbctl lrp-add lr1 lrp0 "f0:00:00:01:00:01"
> > >>> 10.1.255.254/16
> > >>>>>>>>> +check ovn-nbctl lr-nat-add lr1 snat 10.2.0.1 10.1.0.0/16
> > >>>>>>>>> +
> > >>>>>>>>> +lflow_run=0
> > >>>>>>>>> +check ovn-nbctl --wait=hv sync
> > >>>>>>>>> +
> > >>>>>>>>> +# Tunnel ports might not be added (yet) at this point on slow
> > >>> system.
> > >>>>>>>>> +# Wait for flows related to such ports to ensure those ports
have
> > >>>>> been
> > >>>>>>>> added
> > >>>>>>>>> +# before we measure recomputes. Otherwise, ovs_interface
handler
> > >>>>> might
> > >>>>>>>> be run
> > >>>>>>>>> +# afterwards for tunnel ports, causing recomputes.
> > >>>>>>>>> +for i in $(seq 1 $n_hv); do
> > >>>>>>>>> +    for j in $(seq 1 $n_hv); do
> > >>>>>>>>> +        if test $i != $j; then
> > >>>>>>>>> +            OVN_WAIT_REMOTE_INPUT_FLOWS(["hv$i"],["hv$j"])
> > >>>>>>>>> +        fi
> > >>>>>>>>> +    done
> > >>>>>>>>> +done
> > >>>>>>>>> +
> > >>>>>>>>> +for i in $(seq 1 $n_hv); do
> > >>>>>>>>> +    as hv$i
> > >>>>>>>>> +    lflow_run1=$(ovn-appctl -t ovn-controller
> > >>> coverage/read-counter
> > >>>>>>>> lflow_run)
> > >>>>>>>>> +    lflow_run=`expr $lflow_run1 + $lflow_run`
> > >>>>>>>>> +done
> > >>>>>>>>> +
> > >>>>>>>>> +add_switch_ports 1 1000 $n_hv 5
> > >>>>>>>>> +
> > >>>>>>>>> +wait_for_ports_up
> > >>>>>>>>> +check ovn-nbctl --wait=hv sync
> > >>>>>>>>> +
> > >>>>>>>>> +for i in $(seq 1 $n_hv); do
> > >>>>>>>>> +    pid=$(cat hv${i}/ovn-controller.pid)
> > >>>>>>>>> +    u=$((u+$(cat "/proc/$pid/stat" | awk '{print $14}')))
> > >>>>>>>>> +    s=$((s+$(cat "/proc/$pid/stat" | awk '{print $15}')))
> > >>>>>>>>> +done
> > >>>>>>>>> +
> > >>>>>>>>> +n_pid=$(cat northd/ovn-northd.pid)
> > >>>>>>>>> +n_u=$(cat "/proc/$n_pid/stat" | awk '{print $14}')
> > >>>>>>>>> +n_s=$(cat "/proc/$n_pid/stat" | awk '{print $15}')
> > >>>>>>>>> +
> > >>>>>>>>> +echo "Total Northd User Time: $n_u"
> > >>>>>>>>> +echo "Total Northd System Time: $n_s"
> > >>>>>>>>> +echo "Total Controller User Time: $u"
> > >>>>>>>>> +echo "Total Controller System Time: $s"
> > >>>>>>>>> +
> > >>>>>>>>> +lflow_run_end=0
> > >>>>>>>>> +for i in $(seq 1 $n_hv); do
> > >>>>>>>>> +    as hv$i
> > >>>>>>>>> +    lflow_run1=$(ovn-appctl -t ovn-controller
> > >>> coverage/read-counter
> > >>>>>>>> lflow_run)
> > >>>>>>>>> +    lflow_run_end=`expr $lflow_run1 + $lflow_run_end`
> > >>>>>>>>> +done
> > >>>>>>>>> +n_recomputes=`expr $lflow_run_end - $lflow_run`
> > >>>>>>>>> +echo "$n_recomputes recomputes"
> > >>>>>>>>> +
> > >>>>>>>>> +AT_CHECK([test $lflow_run_end == $lflow_run])
> > >>>>>>>>> +
> > >>>>>>>>> +for i in $(seq 2 $n_hv); do
> > >>>>>>>>> +    OVN_CLEANUP_SBOX([hv$i])
> > >>>>>>>>> +done
> > >>>>>>>>> +OVN_CLEANUP([hv1])
> > >>>>>>>>> +AT_CLEANUP
> > >>>>>>>>> +])
> > >>>>>>>>> diff --git a/tests/perf-northd.at b/tests/perf-northd.at
> > >>>>>>>>> index 74b69e9d4..6ec196b36 100644
> > >>>>>>>>> --- a/tests/perf-northd.at
> > >>>>>>>>> +++ b/tests/perf-northd.at
> > >>>>>>>>> @@ -76,23 +76,6 @@ m4_define([PERF_RECORD_STOP], [
> > >>>>>>>>>      PERF_RECORD_STOPWATCH(ovn-northd-loop, ["Short term
average"],
> > >>>>>>>> [Average (northd-loop in msec)])
> > >>>>>>>>>  ])
> > >>>>>>>>>
> > >>>>>>>>> -# OVN_NBCTL([NBCTL_COMMAND])
> > >>>>>>>>> -#
> > >>>>>>>>> -# Add NBCTL_COMMAND to list of commands to be run by
> > >>> RUN_OVN_NBCTL().
> > >>>>>>>>> -#
> > >>>>>>>>> -m4_define([OVN_NBCTL], [
> > >>>>>>>>> -    command="${command} -- $1"
> > >>>>>>>>> -])
> > >>>>>>>>> -
> > >>>>>>>>> -# RUN_OVN_NBCTL()
> > >>>>>>>>> -#
> > >>>>>>>>> -# Execute list of commands built by the OVN_NBCTL() macro.
> > >>>>>>>>> -#
> > >>>>>>>>> -m4_define([RUN_OVN_NBCTL], [
> > >>>>>>>>> -    check ovn-nbctl ${command}
> > >>>>>>>>> -    unset command
> > >>>>>>>>> -])
> > >>>>>>>>> -
> > >>>>>>>>>  OVS_START_SHELL_HELPERS
> > >>>>>>>>>  generate_subnet () {
> > >>>>>>>>>      local a=$(printf %d $(expr $1 / 256 + 10))
> > >>>>>>>>> --
> > >>>>>>>>> 2.31.1
> > >>>>>>>>>
> > >>>>>>>>
> > >>>>>>>
> > >>>>>>
> > >>>>>
> > >>>>
> > >>>
> > >>
> >
> >
Dumitru Ceara July 21, 2022, 10:48 a.m. UTC | #14
On 7/21/22 08:29, Han Zhou wrote:
> On Wed, Jul 20, 2022 at 10:29 AM Numan Siddique <numans@ovn.org> wrote:
>>
>> On Wed, Jul 13, 2022 at 4:29 AM Dumitru Ceara <dceara@redhat.com> wrote:
>>>
>>> On 7/13/22 11:08, Dumitru Ceara wrote:
>>>> On 7/13/22 09:40, Xavier Simonart wrote:
>>>>> Hi Han, Dumitru
>>>>>
>>>>
>>>> Hi Han, Xavier,
>>>>
>>>> Sorry, I had already replied to the previous message and only then
>>>> noticed this one.
>>>>
>>>>> I think that we should, as much as possible, try to achieve both
> goals:
>>>>> - have an accurate ovn-installed
>>>>> - do not increase latency in large scale deployments
>>>>>
>>>>
>>>> +1
>>>>
>>>>> The fact that ovn-installed is sent too early for mc flows is
> already an
>>>>> issue today, independent of this patch.
>>>>> Fixing ovn-installed related to mc flows by delaying the state
> change (for
>>>>> all cases, included when no mc groups) might be seen as a performance
>>>>> regression.
>>>>>
>>>>
>>>> I think it will, and I'm not sure we can convince the CMS that this is
>>>> "just metrics".
>>>>
>>>>> I agree that we should fix this ovn-installed issue, but it is not a
>>>>> regression added by this patch. We should enter a BZ for it.
>>>>> Per my understanding, the mc flows are updated when the
> SB_multicast_group
>>>>> is seen as updated by ovn-controller, due to its references to port
> binding.
>>>>> Other flows related to port binding are installed earlier, i.e. when
>>>>> ovn-controller writes port_binding->chassis (i.e. before it receives
> SB
>>>>> confirmation). So, while sending the mc flows earlier than what we
> do today
>>>>> might be more complex, I think it makes some kind of sense (we would
> send
>>>>> all those flows within the same loop).
>>>>
>>>> I'm inclining towards leaving it as it is today if this is the only
> flow
>>>> we're missing.  It's a guess without testing things out, but I think
>>>> it's for the MC_FLOOD_L2 multicast group which is used only for
>>>> forwarding ARP packets originated by OVN routers or destined to a
>>>> specific OVN router.  Losing some of those packets is not a big deal.
>>>>
>>>> But it might be good to confirm that this is the MC group we install
> the
>>>> flow for.
>>>>
>>>
>>> Oh, well, the port is also part of the MC_FLOOD group.  This is however
>>> only used for BUM traffic.  So losing some packets here is also not
>>> terrible, I think.
>>>
>>
>> When a logical port is claimed,  we process the logical flows related
>> to the logical port (i.e with inport == <lport> or outport == <lport>)
>> and install
>> the corresponding openflows. All the generic logical flows (i.e
>> without inport or outport match) would have already been programmed
>> (if the datapath is already part of local_datapaths).
>> These processed logical flows (lflow_handle_flows_for_lport() in
>> lflow.c) will be most likely part of the same openflow bundle. And
>> once the sequence number
>> for this bundle is acknowledged we set "ovn-installed=true".  When CMS
>> notices "ovn-installed=true" , I think it can fairly assume that the
>> flows for the lport are
>> programmed.
>>
>> I think the only flows pertaining to the logical port  which we would
>> be missing are the multicast related flows and the logical flows which
>> ovn-northd would generate after the logical port
>> is claimed (presently it is the arp responder flows) and I don't think
>> we can wait for these logical flows to be programmed by ovn-northd
>> before setting "ovn-installed=true".
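A minimal sketch of the acknowledgement step described above, assuming
each batch of flows sent to the switch is tagged with a monotonically
increasing sequence number.  The struct and function below are
hypothetical and do not reflect the actual ofctrl or if-status code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-port record; not the real ovn-controller structure. */
struct lport_install {
    uint64_t req_seqno;   /* Seqno of the bundle carrying this port's flows. */
    bool installed;       /* Mirrors ovn-installed=true once set. */
};

/* Called when the switch has acknowledged every bundle up to and including
 * 'acked_seqno'.  Returns true once the port can be reported as installed.
 * The flag is sticky: a later, lower seqno does not clear it. */
static bool
lport_handle_ack(struct lport_install *lp, uint64_t acked_seqno)
{
    assert(lp);
    if (!lp->installed && acked_seqno >= lp->req_seqno) {
        lp->installed = true;
    }
    return lp->installed;
}
```

Flows that are generated later (the multicast and arp responder flows
mentioned above) would go out with a higher sequence number, so this
check alone cannot cover them.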
> 
> The missing flows are just a side-effect. I am more concerned with the
> clarity of the state-machine.
> To my understanding it would be very clear to define the "CLAIMED" state's
> job as claiming the port in SB-DB. If SB commit fails, the retry should
> happen at this state. If we see the update notification (i.e. we see the
> PB.chassis matches the desired chassis), we move to the next state
> "INSTALL_FLOWS". Now if we move the state forward without confirming the
> PB.chassis is updated in SB, we would need to perform the task in all the
> following states. The only benefit we get from this is that ovn-installed
> can be set to true a little bit earlier (to save a SB round trip), at the
> cost of more complexity (even more so if ovn-monitor-all needs to be
> considered in this logic) and less clarity of the state machine.
> 
> Is it possible to address it with the simpler/clear approach first and see
> if it really causes obvious performance regression, then we can consider
> the "short-cuts"? I am not sure if it is some kind of premature
> optimization at this point.
> 
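As a rough illustration of the serialized state machine Han describes
above (not the code in if-status.c), the sketch below keeps an interface
in a CLAIMED state, retries the SB write whenever the SB is writable, and
only moves to INSTALL_FLOWS once PB.chassis matches the desired chassis.
The enum, struct and function names are hypothetical, and chassis are
compared by name purely for simplicity.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified states; the names echo the thread, not the
 * real if-status.c enum. */
enum if_state { IF_CLAIMED, IF_INSTALL_FLOWS };

struct iface {
    enum if_state state;
    const char *desired_chassis;   /* Chassis we want in PB.chassis. */
    unsigned int n_writes;         /* SB write attempts so far. */
};

/* One main-loop iteration for one interface.
 * 'sb_chassis' is the PB.chassis value currently seen from the SB (may be
 * NULL).  'sb_readonly' is true while a previous SB transaction is still
 * in flight. */
static enum if_state
iface_run(struct iface *iface, const char *sb_chassis, bool sb_readonly)
{
    assert(iface && iface->desired_chassis);

    if (iface->state == IF_CLAIMED) {
        if (sb_chassis && !strcmp(sb_chassis, iface->desired_chassis)) {
            /* The SB confirmed the claim: the job of CLAIMED is done. */
            iface->state = IF_INSTALL_FLOWS;
        } else if (!sb_readonly) {
            /* Not confirmed yet: (re)try the write; stay in CLAIMED. */
            iface->n_writes++;
        }
        /* SB read-only and not confirmed: nothing to do but wait. */
    }
    return iface->state;
}
```

With this shape, a failed SB commit simply leaves the interface in
CLAIMED and the next writable iteration retries.  The cost discussed in
this thread is the extra SB round trip before ovn-installed can be set.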

This wasn't measured with exactly the same code, but I think it is still
relevant.  A while ago we tried to measure the impact of delaying
ovn-installed for as long as the SB is read-only (last transaction still
in progress).  The impact was measurable: up to 500 msec on the 120-node
cluster we tested on:

https://mail.openvswitch.org/pipermail/ovs-dev/2022-April/393319.html

This was with a real 120-node OpenShift cluster running a scale test
used to measure scalability and to detect performance regressions.

Now, we should maybe retest with your suggested change, but unless I'm
missing something, I'd expect the impact to be similar.

Thanks,
Dumitru

> Thanks,
> Han
>>
>> Delaying setting the ovn-installed=true would definitely result in
>> latency.  It would not be easy for ovn-controller to keep track of
>> openflows already programmed for a logical port.
>> In other words, I don't think ovn-controller can accurately keep
>> track of whether all the openflows related to a logical port are
>> programmed unless all these flows are grouped in one bundle.
>>
>> Also since the present ovn main already sets ovn-installed=true a bit
>> early i.e. even before the multicast and arp responder flows are
>> programmed, I think it is out of this patch's scope to address it.
>>
>> So I think the patch is fine with me once Xavier addresses (1) i.e
>> remove the 'chassis_update_required'.
>>
>> Thanks
>> Numan
>>
>>
>>>> Thanks,
>>>> Dumitru
>>>>
>>>>>
>>>>> Thanks
>>>>> Xavier
>>>>>
>>>>>
>>>>>
>>>>> On Wed, Jul 13, 2022 at 8:28 AM Han Zhou <hzhou@ovn.org> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, Jul 12, 2022 at 12:35 AM Dumitru Ceara <dceara@redhat.com>
> wrote:
>>>>>>>
>>>>>>> On 7/12/22 08:52, Han Zhou wrote:
>>>>>>>> On Mon, Jul 11, 2022 at 4:55 AM Dumitru Ceara <dceara@redhat.com>
>>>>>> wrote:
>>>>>>>>>
>>>>>>>>> On 7/11/22 13:31, Xavier Simonart wrote:
>>>>>>>>>> Hi Han
>>>>>>>>>>
>>>>>>>>>> Thanks for your review.
>>>>>>>>>>
>>>>>>>>>> Let me try to understand your two main concerns and the proper
> way to
>>>>>>>> fix
>>>>>>>>>> it.
>>>>>>>>>> 1) We only try once to write pb->chassis. If the commit fails,
>>>>>>>> pb->chassis
>>>>>>>>>> is not written. As commit fails, we will recompute, but as the
>>>>>>>>>> update_required flag is not set anymore, we might end up with no
>>>>>>>>>> pb->chassis.
>>>>>>>>>> => I'll remove the flag and try to update until it's confirmed.
>>>>>>>>
>>>>>>>> Thank you!
>>>>>>>>
>>>>>>>>>> 2) The state machine, and when we move to INSTALL_FLOWS.
> Serializing
>>>>>> the
>>>>>>>>>> state machine, by waiting for confirmation to be received before
>>>>>> moving
>>>>>>>> to
>>>>>>>>>> INSTALL_FLOWS state will delay the ovn-installed compared to
> today.
>>>>>> So
>>>>>>>> I am
>>>>>>>>>> (still) trying to see if there is any way to prevent this in
> some
>>>>>> cases.
>>>>>>>>>> Would it be correct to do this serialization (wait for
> pb->chassis
>>>>>>>> update
>>>>>>>>>> confirmation) only when using conditional monitoring? When using
>>>>>>>>>> monitor-all, as soon as we have written (w/o confirmation)
>>>>>> pb->chassis,
>>>>>>>> we
>>>>>>>>>> would move to INSTALL_FLOWS. In that loop where we wrote
> pb->chassis,
>>>>>>>> (all)
>>>>>>>>>> the flows should be updated taking into account pb->chassis.
>>>>>>>>>
>>>>>>>>> I think this approach is OK.  ovn-controllers that don't use
>>>>>> conditional
>>>>>>>>> monitoring already know the complete SB contents and should be
> able to
>>>>>>>>> install (mostly?) complete sets of openflows that correspond to a
>>>>>> given
>>>>>>>>> Port_Binding.
>>>>>>>>>
>>>>>>>>> It's also "CMS-friendly", at least for ovn-kubernetes, which uses
>>>>>>>>> ovn-monitor-all=true and only waits for
>>>>>>>>> OVS.Interface.external_ids:ovn-installed=true.  And shouldn't
> impact
>>>>>> the
>>>>>>>>> others which wait for SB.Port_Binding.up=true.
>>>>>>>>>
>>>>>>>>
>>>>>>>> Xavier and Dumitru, I think we shouldn't introduce the dependency
> of
>>>>>>>> "ovn-monitor-all" setting here.
>>>>>>>>
>>>>>>>> First of all, ovn-installed is a flag for CMS to understand that
> all
>>>>>> the
>>>>>>>> flows related to the port-binding are installed. If we set the flag
>>>>>> before
>>>>>>>> it is truly completed, it is a bug, and it is possible that the
> flag
>>>>>> is set
>>>>>>>> but some traffic doesn't work.
>>>>>>>
>>>>>>> I think it's a matter of semantics.  The way I see
> "ovn-installed=true"
>>>>>>> is: all flows that are relevant to the port binding on the local
> chassis
>>>>>>> have been installed.  When we added it it was for the ovn-k8s case;
>>>>>>> ovn-k8s used to explicitly check if some openflow tables on the
> node
>>>>>>> where the pod is being brought up contained flows that seemed to
>>>>>>> correspond to the pod (e.g., matching on pod mac and IP addresses).
>>>>>>>
>>>>>> But the purpose of checking the flows (when ovn-installed wasn't
>>>>>> available) was to make sure the pod is ready to send/receive
> traffic. If
>>>>>> ovn-installed can provide more accuracy, why not?
>>>>>>
>>>>>>>> I did a quick test, and at least a flow in (table_id=38,
> priority=100)
>>>>>>>> which is multicast-group related is updated AFTER the SB
> notification
>>>>>> is
>>>>>>>> received for the port-binding chassis update.
>>>>>>>>
>>>>>>>
>>>>>>> This sounds like something we should fix, I think.  I don't see any
>>>>>>> multicast-group changes conditioned by the port_binding being
> up=true in
>>>>>>> northd.  I might be wrong though.
>>>>>>>
>>>>>>
>>>>>> It is not about "up=true". It is triggered by the
> port-binding->chassis
>>>>>> update. Since multicast-group has reference to port-binding, so a
>>>>>> port-binding update triggers multicast-group change handling, which
> is
>>>>>> required because physical flows related to the MC group need to be
> updated
>>>>>> when port-binding->chassis is updated. You may argue that the IDL
> may be
>>>>>> optimized so that the MC group change can be triggered and handled
> before
>>>>>> SB is updated, but I am not sure if the benefit is worth the
> complexity.
>>>>>> Given how OVSDB IDL transaction is designed, I'd always think a DB
> record
>>>>>> is *formally* updated only after the update notification is
> received from
>>>>>> the server, which seems to be safe and clear.
>>>>>>
>>>>>>>> Secondly, if the change hasn't made it to the SB, all the other
> nodes
>>>>>> would
>>>>>>>> not be able to reach the port, which means the workload (pod/VM)
> cannot
>>>>>>>> receive traffic yet at this phase.
>>>>>>>>
>>>>>>>
>>>>>>> Even if the change made it to the SB we have no way of knowing
> that all
>>>>>>> other nodes processed it so we cannot know for sure that traffic
> can
>>>>>>> flow properly end-to-end.  But, like I said above, this doesn't
> matter
>>>>>>> if the semantics of ovn-installed=true are "all locally relevant
> flows
>>>>>>> are installed".
>>>>>>>
>>>>>> It's true that even SB is updated it doesn't ensure all the nodes
>>>>>> processed it, but I view it this way: at least from the current
> node's
>>>>>> point of view, its job is done and the other nodes are beyond its
> control.
>>>>>> On the other hand, if SB update failed, its job is not done yet. I
> am not
>>>>>> saying this is the only *correct* way, but just the way I am seeing
> it :).
>>>>>>
>>>>>>>> So, I think our goal is not to set ovn-installed early, but to
> set it
>>>>>>>> accurately (sometime may be ok to be conservative).
>>>>>>>>
>>>>>>>
>>>>>>> Sure, but waiting for the SB port_binding.chassis update might
> introduce
>>>>>>> significant spikes in latency if the SB is compacting (or just
> busy) at
>>>>>>> that moment.
>>>>>>>
>>>>>>> This might become an issue in large scale deployments as pods will
> take
>>>>>>> longer to be declared "ready".
>>>>>>>
>>>>>> I understand your concern, but if you think about it, no matter how
> the
>>>>>> pods are *declared* ready doesn't change the fact it is ready or
> not. It
>>>>>> doesn't make the real flow setup faster or slower.
>>>>>> If the CMS really wants to declare it ready earlier, it can just
> ignore
>>>>>> the ovn-installed flag check or flow check. What's the real benefit
> except
>>>>>> for metrics?
>>>>>>
>>>>>>>> In addition, ovn-monitor-all is not always true even in ovn-k8s.
> It is
>>>>>>>> configurable in ovn-k8s. (in our environment we set it to false,
> to
>>>>>> save
>>>>>>>> MEM and CPU for worker nodes, while sacrifice a little for the
> central
>>>>>> SB
>>>>>>>> DB)
>>>>>>>>
>>>>>>>
>>>>>>> Ack.  But for this case specifically, as SB is already busier with
>>>>>>> conditional monitoring, I think serializing events in
> ovn-controller
>>>>>>> will create even more visible delays in pod bringup times.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Dumitru
>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Han
>>>>>>>>>>
>>>>>>>>>> Thanks again for your feedback
>>>>>>>>>>
>>>>>>>>>> Xavier
>>>>>>>>>>
>>>>>>>>>> On Thu, Jul 7, 2022 at 8:49 AM Han Zhou <hzhou@ovn.org> wrote:
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Jun 22, 2022 at 3:23 AM Xavier Simonart <
>>>>>> xsimonar@redhat.com>
>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> When VIF ports are claimed on a chassis, SBDB Port_Binding
> table is
>>>>>>>>>>> updated.
>>>>>>>>>>>> If the SBDB IDL is still read-only ("in transaction") when
> such
>>>>>> an
>>>>>>>>>>> update
>>>>>>>>>>>> is required, the update is not possible and recompute is
> triggered
>>>>>>>>>>> through
>>>>>>>>>>>> I+P failure.
>>>>>>>>>>>>
>>>>>>>>>>>> This situation can happen:
>>>>>>>>>>>> - after updating Port_Binding->chassis to SBDB for one port,
> in a
>>>>>>>>>>> following
>>>>>>>>>>>>   iteration, ovn-controller handles
>>>>>>>> Interface:external_ids:ovn-installed
>>>>>>>>>>>>   (for the same port) while SBDB is still read-only.
>>>>>>>>>>>> - after updating Port_Binding->chassis to SBDB for one port,
> in a
>>>>>>>>>>> following
>>>>>>>>>>>>   iteration, ovn-controller updates Port_Binding->chassis for
>>>>>> another
>>>>>>>>>>> port,
>>>>>>>>>>>>   while SBDB is still read-only.
>>>>>>>>>>>>
>>>>>>>>>>>> This patch prevents the recompute by having the if-status
> module
>>>>>>>>>>>> updating the Port_Binding chassis (if needed) when possible.
>>>>>>>>>>>> This does not delay Port_Binding chassis update compared to
> before
>>>>>>>> this
>>>>>>>>>>> patch.
>>>>>>>>>>>> - With the patch, Port_Binding chassis will be updated as
> soon as
>>>>>>>> SBDB is
>>>>>>>>>>>> again writable, without recompute.
>>>>>>>>>>>> - Without the patch, Port_Binding chassis was updated as soon
> as
>>>>>> SBDB
>>>>>>>> was
>>>>>>>>>>>> again writable, through a recompute.
>>>>>>>>>>>>
>>>>>>>>>>>> As part of this patch, ovn-installed will not be updated for
>>>>>>>> additional
>>>>>>>>>>> chassis;
>>>>>>>>>>>> it will only be updated when the migration is completed.
>>>>>>>>>>>>
>>>>>>>>>>>> Reported-at:
> https://bugzilla.redhat.com/show_bug.cgi?id=2059253
>>>>>>>>>>>> Signed-off-by: Xavier Simonart <xsimonar@redhat.com>
>>>>>>>>>>>>
>>>>>>>>>>>> ---
>>>>>>>>>>>> v2:  - handled Dumitru's comments.
>>>>>>>>>>>>      - handled Han's comments, mainly ensure we moved out of
>>>>>> CLAIMED
>>>>>>>>>>> state
>>>>>>>>>>>>        only after updating pb->chassis to guarantee physical
> flows
>>>>>> are
>>>>>>>>>>> installed
>>>>>>>>>>>>        when ovn-installed is updated in OVS.
>>>>>>>>>>>>      - slightly reorganized the code to isolate 'notify_up =
> false'
>>>>>>>> cases
>>>>>>>>>>> in
>>>>>>>>>>>>        claim_port (i.e. ports such as virtual ports), in the
> idea
>>>>>> of
>>>>>>>>>>> making
>>>>>>>>>>>>        future patch preventing recomputes when virtual ports
> are
>>>>>>>> claimed.
>>>>>>>>>>>>      - updated test case to cause more race conditions.
>>>>>>>>>>>>      - rebased on origin/main
>>>>>>>>>>>>      - note that "additional chassis" as now supported by
>>>>>>>>>>>>        "Support LSP:options:requested-chassis as a list" might
>>>>>> still
>>>>>>>>>>> cause
>>>>>>>>>>>>        recomputes.
>>>>>>>>>>>>      - fixed missing flows when Port_Binding chassis was
> updated by
>>>>>>>>>>> mgr_update
>>>>>>>>>>>>        w/o any lflow recalculation.
>>>>>>>>>>>> v3:  - handled Dumitru's comments on v2, mainly have
> runtime_data
>>>>>>>> handler
>>>>>>>>>>>>        handling pb_claims when sb becomes writable (instead
> of a
>>>>>> lflow
>>>>>>>>>>> handler).
>>>>>>>>>>>>      - fixed test as it was not checking recomputes on all
> hv, as
>>>>>> well
>>>>>>>>>>> as a flaky
>>>>>>>>>>>>        behavior.
>>>>>>>>>>>>      - rebased on origin/main.
>>>>>>>>>>>> ---
>>>>>>>>>>>>  controller/binding.c        | 154
> +++++++++++++++++++++----------
>>>>>>>>>>>>  controller/binding.h        |  15 +++-
>>>>>>>>>>>>  controller/if-status.c      | 174
>>>>>>>> ++++++++++++++++++++++++++++++++----
>>>>>>>>>>>>  controller/if-status.h      |  16 +++-
>>>>>>>>>>>>  controller/ovn-controller.c |  72 ++++++++++++++-
>>>>>>>>>>>>  tests/ovn-macros.at         |  12 +++
>>>>>>>>>>>>  tests/ovn.at                | 147
> +++++++++++++++++++++++++++++-
>>>>>>>>>>>>  tests/perf-northd.at        |  17 ----
>>>>>>>>>>>>  8 files changed, 519 insertions(+), 88 deletions(-)
>>>>>>>>>>>>
>>>>>>>>>>>> diff --git a/controller/binding.c b/controller/binding.c
>>>>>>>>>>>> index 2279570f9..b21577f71 100644
>>>>>>>>>>>> --- a/controller/binding.c
>>>>>>>>>>>> +++ b/controller/binding.c
>>>>>>>>>>>> @@ -644,11 +644,17 @@ local_binding_get_lport_ofport(const
> struct
>>>>>>>> shash
>>>>>>>>>>> *local_bindings,
>>>>>>>>>>>>  }
>>>>>>>>>>>>
>>>>>>>>>>>>  bool
>>>>>>>>>>>> -local_binding_is_up(struct shash *local_bindings, const char
>>>>>>>> *pb_name)
>>>>>>>>>>>> +local_binding_is_up(struct shash *local_bindings, const char
>>>>>>>> *pb_name,
>>>>>>>>>>>> +                    const struct sbrec_chassis *chassis_rec)
>>>>>>>>>>>>  {
>>>>>>>>>>>>      struct local_binding *lbinding =
>>>>>>>>>>>>          local_binding_find(local_bindings, pb_name);
>>>>>>>>>>>>      struct binding_lport *b_lport =
>>>>>>>>>>> local_binding_get_primary_lport(lbinding);
>>>>>>>>>>>> +
>>>>>>>>>>>> +    if (b_lport && b_lport->pb->chassis != chassis_rec) {
>>>>>>>>>>>> +        return false;
>>>>>>>>>>>> +    }
>>>>>>>>>>>> +
>>>>>>>>>>>>      if (lbinding && b_lport && lbinding->iface) {
>>>>>>>>>>>>          if (b_lport->pb->n_up && !b_lport->pb->up[0]) {
>>>>>>>>>>>>              return false;
>>>>>>>>>>>> @@ -660,13 +666,23 @@ local_binding_is_up(struct shash
>>>>>>>> *local_bindings,
>>>>>>>>>>> const char *pb_name)
>>>>>>>>>>>>  }
>>>>>>>>>>>>
>>>>>>>>>>>>  bool
>>>>>>>>>>>> -local_binding_is_down(struct shash *local_bindings, const
> char
>>>>>>>> *pb_name)
>>>>>>>>>>>> +local_binding_is_down(struct shash *local_bindings, const
> char
>>>>>>>> *pb_name,
>>>>>>>>>>>> +                      const struct sbrec_chassis
> *chassis_rec)
>>>>>>>>>>>>  {
>>>>>>>>>>>>      struct local_binding *lbinding =
>>>>>>>>>>>>          local_binding_find(local_bindings, pb_name);
>>>>>>>>>>>>
>>>>>>>>>>>>      struct binding_lport *b_lport =
>>>>>>>>>>> local_binding_get_primary_lport(lbinding);
>>>>>>>>>>>>
>>>>>>>>>>>> +    if (b_lport) {
>>>>>>>>>>>> +        if (b_lport->pb->chassis == chassis_rec) {
>>>>>>>>>>>> +            return false;
>>>>>>>>>>>> +        } else if (b_lport->pb->chassis) {
>>>>>>>>>>>> +            VLOG_DBG("lport %s already claimed by other
> chassis",
>>>>>>>>>>>> +                     b_lport->pb->logical_port);
>>>>>>>>>>>> +        }
>>>>>>>>>>>> +    }
>>>>>>>>>>>> +
>>>>>>>>>>>>      if (!lbinding) {
>>>>>>>>>>>>          return true;
>>>>>>>>>>>>      }
>>>>>>>>>>>> @@ -884,37 +900,80 @@ get_lport_type_str(enum en_lport_type
>>>>>>>> lport_type)
>>>>>>>>>>>>      OVS_NOT_REACHED();
>>>>>>>>>>>>  }
>>>>>>>>>>>>
>>>>>>>>>>>> -/* For newly claimed ports, if 'notify_up' is 'false':
>>>>>>>>>>>> +void
>>>>>>>>>>>> +set_pb_chassis_in_sbrec(const struct sbrec_port_binding *pb,
>>>>>>>>>>>> +                        const struct sbrec_chassis
> *chassis_rec,
>>>>>>>>>>>> +                        bool is_set)
>>>>>>>>>>>> +{
>>>>>>>>>>>> +    if (pb->chassis != chassis_rec) {
>>>>>>>>>>>> +         if (is_set) {
>>>>>>>>>>>> +            if (pb->chassis) {
>>>>>>>>>>>> +                VLOG_INFO("Changing chassis for lport %s
> from %s
>>>>>> to
>>>>>>>>>>> %s.",
>>>>>>>>>>>> +                          pb->logical_port,
> pb->chassis->name,
>>>>>>>>>>>> +                          chassis_rec->name);
>>>>>>>>>>>> +            } else {
>>>>>>>>>>>> +                VLOG_INFO("Claiming lport %s for this
> chassis.",
>>>>>>>>>>>> +                          pb->logical_port);
>>>>>>>>>>>> +            }
>>>>>>>>>>>> +            for (int i = 0; i < pb->n_mac; i++) {
>>>>>>>>>>>> +                VLOG_INFO("%s: Claiming %s",
> pb->logical_port,
>>>>>>>>>>> pb->mac[i]);
>>>>>>>>>>>> +            }
>>>>>>>>>>>> +            sbrec_port_binding_set_chassis(pb, chassis_rec);
>>>>>>>>>>>> +        }
>>>>>>>>>>>> +    } else if (!is_set) {
>>>>>>>>>>>> +        sbrec_port_binding_set_chassis(pb, NULL);
>>>>>>>>>>>> +    }
>>>>>>>>>>>> +}
>>>>>>>>>>>> +
>>>>>>>>>>>> +void
>>>>>>>>>>>> +local_binding_set_pb(struct shash *local_bindings, const char
>>>>>>>> *pb_name,
>>>>>>>>>>>> +                     const struct sbrec_chassis *chassis_rec,
>>>>>>>>>>>> +                     struct hmap *tracked_datapaths, bool
> is_set)
>>>>>>>>>>>> +{
>>>>>>>>>>>> +    struct local_binding *lbinding =
>>>>>>>>>>>> +        local_binding_find(local_bindings, pb_name);
>>>>>>>>>>>> +    struct binding_lport *b_lport =
>>>>>>>>>>> local_binding_get_primary_lport(lbinding);
>>>>>>>>>>>> +
>>>>>>>>>>>> +    if (b_lport) {
>>>>>>>>>>>> +        set_pb_chassis_in_sbrec(b_lport->pb, chassis_rec,
> is_set);
>>>>>>>>>>>> +        if (tracked_datapaths) {
>>>>>>>>>>>> +            update_lport_tracking(b_lport->pb,
> tracked_datapaths,
>>>>>>>> true);
>>>>>>>>>>>> +        }
>>>>>>>>>>>> +    }
>>>>>>>>>>>> +}
>>>>>>>>>>>> +
>>>>>>>>>>>> +/* For newly claimed ports:
>>>>>>>>>>>>   * - set the 'pb.up' field to true if 'pb' has no
> 'parent_pb'.
>>>>>>>>>>>>   * - set the 'pb.up' field to true if 'parent_pb.up' is
> 'true'
>>>>>> (e.g.,
>>>>>>>>>>> for
>>>>>>>>>>>>   *   container and virtual ports).
>>>>>>>>>>>> - * Otherwise request a notification to be sent when the OVS
> flows
>>>>>>>>>>>> - * corresponding to 'pb' have been installed.
>>>>>>>>>>>> + *
>>>>>>>>>>>> + * Returns false if lport is not claimed due to
> 'sb_readonly'.
>>>>>>>>>>>> + * Returns true otherwise.
>>>>>>>>>>>>   *
>>>>>>>>>>>>   * Note:
>>>>>>>>>>>> - *   Updates (directly or through a notification) the
> 'pb->up'
>>>>>> field
>>>>>>>>>>> only if
>>>>>>>>>>>> - *   it's explicitly set to 'false'.
>>>>>>>>>>>> + *   Updates the 'pb->up' field only if it's explicitly set
> to
>>>>>>>> 'false'.
>>>>>>>>>>>>   *   This is to ensure compatibility with older versions of
>>>>>>>> ovn-northd.
>>>>>>>>>>>>   */
>>>>>>>>>>>> -static void
>>>>>>>>>>>> +static bool
>>>>>>>>>>>>  claimed_lport_set_up(const struct sbrec_port_binding *pb,
>>>>>>>>>>>>                       const struct sbrec_port_binding
> *parent_pb,
>>>>>>>>>>>> -                     const struct sbrec_chassis *chassis_rec,
>>>>>>>>>>>> -                     bool notify_up, struct if_status_mgr
> *if_mgr)
>>>>>>>>>>>> +                     bool sb_readonly)
>>>>>>>>>>>>  {
>>>>>>>>>>>> -    if (!notify_up) {
>>>>>>>>>>>> -        bool up = true;
>>>>>>>>>>>> -        if (!parent_pb || (parent_pb->n_up &&
> parent_pb->up[0])) {
>>>>>>>>>>>> +    /* When notify_up is false in claim_port(), no state is
>>>>>> created
>>>>>>>>>>>> +     * by if_status_mgr. In such cases, return false (i.e.
> trigger
>>>>>>>>>>> recompute)
>>>>>>>>>>>> +     * if we can't update sb (because it is readonly).
>>>>>>>>>>>> +     */
>>>>>>>>>>>> +    bool up = true;
>>>>>>>>>>>> +    if (!parent_pb || (parent_pb->n_up && parent_pb->up[0]))
> {
>>>>>>>>>>>> +        if (!sb_readonly) {
>>>>>>>>>>>>              if (pb->n_up) {
>>>>>>>>>>>>                  sbrec_port_binding_set_up(pb, &up, 1);
>>>>>>>>>>>>              }
>>>>>>>>>>>> +        } else if (pb->n_up && !pb->up[0]) {
>>>>>>>>>>>> +            return false;
>>>>>>>>>>>>          }
>>>>>>>>>>>> -        return;
>>>>>>>>>>>> -    }
>>>>>>>>>>>> -
>>>>>>>>>>>> -    if (pb->chassis != chassis_rec || (pb->n_up &&
> !pb->up[0])) {
>>>>>>>>>>>> -        if_status_mgr_claim_iface(if_mgr, pb->logical_port);
>>>>>>>>>>>>      }
>>>>>>>>>>>> +    return true;
>>>>>>>>>>>>  }
>>>>>>>>>>>>
>>>>>>>>>>>>  typedef void (*set_func)(const struct sbrec_port_binding *pb,
>>>>>>>>>>>> @@ -1057,37 +1116,35 @@ claim_lport(const struct
> sbrec_port_binding
>>>>>>>> *pb,
>>>>>>>>>>>>              struct hmap *tracked_datapaths,
>>>>>>>>>>>>              struct if_status_mgr *if_mgr)
>>>>>>>>>>>>  {
>>>>>>>>>>>> -    if (!sb_readonly) {
>>>>>>>>>>>> -        claimed_lport_set_up(pb, parent_pb, chassis_rec,
>>>>>> notify_up,
>>>>>>>>>>> if_mgr);
>>>>>>>>>>>> -    }
>>>>>>>>>>>> -
>>>>>>>>>>>>      enum can_bind can_bind =
>>>>>>>>>>> lport_can_bind_on_this_chassis(chassis_rec, pb);
>>>>>>>>>>>>      bool update_tracked = false;
>>>>>>>>>>>>
>>>>>>>>>>>>      if (can_bind == CAN_BIND_AS_MAIN) {
>>>>>>>>>>>>          if (pb->chassis != chassis_rec) {
>>>>>>>>>>>> -            if (sb_readonly) {
>>>>>>>>>>>> -                return false;
>>>>>>>>>>>> -            }
>>>>>>>>>>>> -
>>>>>>>>>>>> -            if (pb->chassis) {
>>>>>>>>>>>> -                VLOG_INFO("Changing chassis for lport %s
> from %s
>>>>>> to
>>>>>>>>>>> %s.",
>>>>>>>>>>>> -                        pb->logical_port, pb->chassis->name,
>>>>>>>>>>>> -                        chassis_rec->name);
>>>>>>>>>>>> -            } else {
>>>>>>>>>>>> -                VLOG_INFO("Claiming lport %s for this
> chassis.",
>>>>>>>>>>>> -                          pb->logical_port);
>>>>>>>>>>>> -            }
>>>>>>>>>>>> -            for (size_t i = 0; i < pb->n_mac; i++) {
>>>>>>>>>>>> -                VLOG_INFO("%s: Claiming %s",
> pb->logical_port,
>>>>>>>>>>> pb->mac[i]);
>>>>>>>>>>>> -            }
>>>>>>>>>>>> -
>>>>>>>>>>>> -            sbrec_port_binding_set_chassis(pb, chassis_rec);
>>>>>>>>>>>>              if (is_additional_chassis(pb, chassis_rec)) {
>>>>>>>>>>>> +                if (sb_readonly) {
>>>>>>>>>>>> +                    return false;
>>>>>>>>>>>> +                }
>>>>>>>>>>>>                  remove_additional_chassis(pb, chassis_rec);
>>>>>>>>>>>>              }
>>>>>>>>>>>>              update_tracked = true;
>>>>>>>>>>>>          }
>>>>>>>>>>>> +        if (!notify_up) {
>>>>>>>>>>>> +            if (!claimed_lport_set_up(pb, parent_pb,
>>>>>> sb_readonly)) {
>>>>>>>>>>>> +                return false;
>>>>>>>>>>>> +            }
>>>>>>>>>>>> +            if (pb->chassis != chassis_rec) {
>>>>>>>>>>>> +                if (sb_readonly) {
>>>>>>>>>>>> +                    return false;
>>>>>>>>>>>> +                }
>>>>>>>>>>>> +                set_pb_chassis_in_sbrec(pb, chassis_rec,
> true);
>>>>>>>>>>>> +            }
>>>>>>>>>>>> +        } else {
>>>>>>>>>>>> +            if ((pb->chassis != chassis_rec) || (pb->n_up &&
>>>>>>>>>>> !pb->up[0])) {
>>>>>>>>>>>> +                if_status_mgr_claim_iface(if_mgr, pb,
> chassis_rec,
>>>>>>>>>>>> +                                          sb_readonly);
>>>>>>>>>>>> +            }
>>>>>>>>>>>> +        }
>>>>>>>>>>>>      } else if (can_bind == CAN_BIND_AS_ADDITIONAL) {
>>>>>>>>>>>>          if (!is_additional_chassis(pb, chassis_rec)) {
>>>>>>>>>>>>              if (sb_readonly) {
>>>>>>>>>>>> @@ -1132,7 +1189,8 @@ claim_lport(const struct
> sbrec_port_binding
>>>>>> *pb,
>>>>>>>>>>>>   */
>>>>>>>>>>>>  static bool
>>>>>>>>>>>>  release_lport_main_chassis(const struct sbrec_port_binding
> *pb,
>>>>>>>>>>>> -                           bool sb_readonly)
>>>>>>>>>>>> +                           bool sb_readonly,
>>>>>>>>>>>> +                           struct if_status_mgr *if_mgr)
>>>>>>>>>>>>  {
>>>>>>>>>>>>      if (pb->encap) {
>>>>>>>>>>>>          if (sb_readonly) {
>>>>>>>>>>>> @@ -1141,11 +1199,13 @@ release_lport_main_chassis(const
> struct
>>>>>>>>>>> sbrec_port_binding *pb,
>>>>>>>>>>>>          sbrec_port_binding_set_encap(pb, NULL);
>>>>>>>>>>>>      }
>>>>>>>>>>>>
>>>>>>>>>>>> +    /* If sb readonly, pb->chassis unset through if-status if
>>>>>>>> present.
>>>>>>>>>>> */
>>>>>>>>>>>>      if (pb->chassis) {
>>>>>>>>>>>> -        if (sb_readonly) {
>>>>>>>>>>>> +        if (!sb_readonly) {
>>>>>>>>>>>> +            sbrec_port_binding_set_chassis(pb, NULL);
>>>>>>>>>>>> +        } else if (!if_status_mgr_iface_is_present(if_mgr,
>>>>>>>>>>> pb->logical_port)) {
>>>>>>>>>>>>              return false;
>>>>>>>>>>>>          }
>>>>>>>>>>>> -        sbrec_port_binding_set_chassis(pb, NULL);
>>>>>>>>>>>>      }
>>>>>>>>>>>>
>>>>>>>>>>>>      if (pb->virtual_parent) {
>>>>>>>>>>>> @@ -1155,7 +1215,8 @@ release_lport_main_chassis(const struct
>>>>>>>>>>> sbrec_port_binding *pb,
>>>>>>>>>>>>          sbrec_port_binding_set_virtual_parent(pb, NULL);
>>>>>>>>>>>>      }
>>>>>>>>>>>>
>>>>>>>>>>>> -    VLOG_INFO("Releasing lport %s from this chassis.",
>>>>>>>>>>> pb->logical_port);
>>>>>>>>>>>> +    VLOG_INFO("Releasing lport %s from this chassis
>>>>>>>> (sb_readonly=%d)",
>>>>>>>>>>>> +              pb->logical_port, sb_readonly);
>>>>>>>>>>>>      return true;
>>>>>>>>>>>>  }
>>>>>>>>>>>>
>>>>>>>>>>>> @@ -1189,7 +1250,7 @@ release_lport(const struct
> sbrec_port_binding
>>>>>>>> *pb,
>>>>>>>>>>>>                struct hmap *tracked_datapaths, struct
> if_status_mgr
>>>>>>>>>>> *if_mgr)
>>>>>>>>>>>>  {
>>>>>>>>>>>>      if (pb->chassis == chassis_rec) {
>>>>>>>>>>>> -        if (!release_lport_main_chassis(pb, sb_readonly)) {
>>>>>>>>>>>> +        if (!release_lport_main_chassis(pb, sb_readonly,
> if_mgr))
>>>>>> {
>>>>>>>>>>>>              return false;
>>>>>>>>>>>>          }
>>>>>>>>>>>>      } else if (is_additional_chassis(pb, chassis_rec)) {
>>>>>>>>>>>> @@ -1271,7 +1332,7 @@ consider_vif_lport_(const struct
>>>>>>>>>>> sbrec_port_binding *pb,
>>>>>>>>>>>>                               b_lport->lbinding->iface,
>>>>>>>>>>>>                               !b_ctx_in->ovnsb_idl_txn,
>>>>>>>>>>>>                               !parent_pb,
>>>>>>>> b_ctx_out->tracked_dp_bindings,
>>>>>>>>>>>> -                             b_ctx_out->if_mgr)){
>>>>>>>>>>>> +                             b_ctx_out->if_mgr)) {
>>>>>>>>>>>>                  return false;
>>>>>>>>>>>>              }
>>>>>>>>>>>>
>>>>>>>>>>>> @@ -1527,7 +1588,8 @@ consider_localport(const struct
>>>>>>>> sbrec_port_binding
>>>>>>>>>>> *pb,
>>>>>>>>>>>>      enum can_bind can_bind = lport_can_bind_on_this_chassis(
>>>>>>>>>>>>          b_ctx_in->chassis_rec, pb);
>>>>>>>>>>>>      if (can_bind == CAN_BIND_AS_MAIN) {
>>>>>>>>>>>> -        if (!release_lport_main_chassis(pb,
>>>>>>>> !b_ctx_in->ovnsb_idl_txn)) {
>>>>>>>>>>>> +        if (!release_lport_main_chassis(pb,
>>>>>> !b_ctx_in->ovnsb_idl_txn,
>>>>>>>>>>>> +            b_ctx_out->if_mgr)) {
>>>>>>>>>>>>              return false;
>>>>>>>>>>>>          }
>>>>>>>>>>>>      } else if (can_bind == CAN_BIND_AS_ADDITIONAL) {
>>>>>>>>>>>> diff --git a/controller/binding.h b/controller/binding.h
>>>>>>>>>>>> index 1fed06674..d20659b0b 100644
>>>>>>>>>>>> --- a/controller/binding.h
>>>>>>>>>>>> +++ b/controller/binding.h
>>>>>>>>>>>> @@ -151,8 +151,10 @@ const struct sbrec_port_binding
>>>>>>>>>>> *local_binding_get_primary_pb(
>>>>>>>>>>>>  ofp_port_t local_binding_get_lport_ofport(const struct shash
>>>>>>>>>>> *local_bindings,
>>>>>>>>>>>>                                            const char
> *pb_name);
>>>>>>>>>>>>
>>>>>>>>>>>> -bool local_binding_is_up(struct shash *local_bindings, const
> char
>>>>>>>>>>> *pb_name);
>>>>>>>>>>>> -bool local_binding_is_down(struct shash *local_bindings,
> const
>>>>>> char
>>>>>>>>>>> *pb_name);
>>>>>>>>>>>> +bool local_binding_is_up(struct shash *local_bindings, const
> char
>>>>>>>>>>> *pb_name,
>>>>>>>>>>>> +                         const struct sbrec_chassis *);
>>>>>>>>>>>> +bool local_binding_is_down(struct shash *local_bindings,
> const
>>>>>> char
>>>>>>>>>>> *pb_name,
>>>>>>>>>>>> +                           const struct sbrec_chassis *);
>>>>>>>>>>>>  void local_binding_set_up(struct shash *local_bindings,
> const char
>>>>>>>>>>> *pb_name,
>>>>>>>>>>>>                            const struct sbrec_chassis
> *chassis_rec,
>>>>>>>>>>>>                            const char *ts_now_str, bool
>>>>>> sb_readonly,
>>>>>>>>>>>> @@ -160,7 +162,10 @@ void local_binding_set_up(struct shash
>>>>>>>>>>> *local_bindings, const char *pb_name,
>>>>>>>>>>>>  void local_binding_set_down(struct shash *local_bindings,
> const
>>>>>> char
>>>>>>>>>>> *pb_name,
>>>>>>>>>>>>                              const struct sbrec_chassis
>>>>>> *chassis_rec,
>>>>>>>>>>>>                              bool sb_readonly, bool
> ovs_readonly);
>>>>>>>>>>>> -
>>>>>>>>>>>> +void local_binding_set_pb(struct shash *local_bindings,
> const char
>>>>>>>>>>> *pb_name,
>>>>>>>>>>>> +                          const struct sbrec_chassis
> *chassis_rec,
>>>>>>>>>>>> +                          struct hmap *tracked_datapaths,
>>>>>>>>>>>> +                          bool is_set);
>>>>>>>>>>>>  void binding_register_ovs_idl(struct ovsdb_idl *);
>>>>>>>>>>>>  void binding_run(struct binding_ctx_in *, struct
> binding_ctx_out
>>>>>> *);
>>>>>>>>>>>>  bool binding_cleanup(struct ovsdb_idl_txn *ovnsb_idl_txn,
>>>>>>>>>>>> @@ -178,6 +183,10 @@ void binding_dump_local_bindings(struct
>>>>>>>>>>> local_binding_data *, struct ds *);
>>>>>>>>>>>>  bool is_additional_chassis(const struct sbrec_port_binding
> *pb,
>>>>>>>>>>>>                             const struct sbrec_chassis
>>>>>> *chassis_rec);
>>>>>>>>>>>>
>>>>>>>>>>>> +void set_pb_chassis_in_sbrec(const struct sbrec_port_binding
> *pb,
>>>>>>>>>>>> +                             const struct sbrec_chassis
>>>>>> *chassis_rec,
>>>>>>>>>>>> +                             bool is_set);
>>>>>>>>>>>> +
>>>>>>>>>>>>  /* Corresponds to each Port_Binding.type. */
>>>>>>>>>>>>  enum en_lport_type {
>>>>>>>>>>>>      LP_UNKNOWN,
>>>>>>>>>>>> diff --git a/controller/if-status.c b/controller/if-status.c
>>>>>>>>>>>> index ad61844d8..7693c289b 100644
>>>>>>>>>>>> --- a/controller/if-status.c
>>>>>>>>>>>> +++ b/controller/if-status.c
>>>>>>>>>>>> @@ -24,6 +24,7 @@
>>>>>>>>>>>>  #include "lib/util.h"
>>>>>>>>>>>>  #include "timeval.h"
>>>>>>>>>>>>  #include "openvswitch/vlog.h"
>>>>>>>>>>>> +#include "lib/ovn-sb-idl.h"
>>>>>>>>>>>>
>>>>>>>>>>>>  VLOG_DEFINE_THIS_MODULE(if_status);
>>>>>>>>>>>>
>>>>>>>>>>>> @@ -53,9 +54,11 @@ VLOG_DEFINE_THIS_MODULE(if_status);
>>>>>>>>>>>>   */
>>>>>>>>>>>>
>>>>>>>>>>>>  enum if_state {
>>>>>>>>>>>> -    OIF_CLAIMED,       /* Newly claimed interface. */
>>>>>>>>>>>> -    OIF_INSTALL_FLOWS, /* Already claimed interface for which flows are still
>>>>>>>>>>>> -                        * being installed.
>>>>>>>>>>>> +    OIF_CLAIMED,       /* Newly claimed interface. pb->chassis not yet updated.
>>>>>>>>>>>> +                        */
>>>>>>>>>>>> +    OIF_INSTALL_FLOWS, /* Claimed interface with pb->chassis successfully
>>>>>>>>>>>> +                        * updated in SB and for which flows are still being
>>>>>>>>>>>> +                        * installed.
>>>>>>>>>>>>                          */
>>>>>>>>>>>>      OIF_MARK_UP,       /* Interface with flows successfully installed in OVS
>>>>>>>>>>>>                          * but not yet marked "up" in the binding module (in
>>>>>>>>>>>> @@ -78,6 +81,55 @@ static const char *if_state_names[] = {
>>>>>>>>>>>>      [OIF_INSTALLED]     = "INSTALLED",
>>>>>>>>>>>>  };
>>>>>>>>>>>>
>>>>>>>>>>>> +/*
>>>>>>>>>>>> + *       +----------------------+
>>>>>>>>>>>> + * +---> |                      |
>>>>>>>>>>>> + * | +-> |         NULL         | <--------------------------------------+++-+
>>>>>>>>>>>> + * | |   +----------------------+                                            |
>>>>>>>>>>>> + * | |     ^ release_iface   | claim_iface                                   |
>>>>>>>>>>>> + * | |     |                 V - sbrec_update_chassis(if sb is rw)           |
>>>>>>>>>>>> + * | |   +----------------------+                                            |
>>>>>>>>>>>> + * | |   |                      | <----------------------------------------+ |
>>>>>>>>>>>> + * | |   |       CLAIMED        | <--------------------------------------+ | |
>>>>>>>>>>>> + * | |   +----------------------+                                        | | |
>>>>>>>>>>>> + * | |                  | mgr_update(when sb is rw)                      | | |
>>>>>>>>>>>> + * | | release_iface    |  - sbrec_update_chassis                        | | |
>>>>>>>>>>>> + * | |                  |  - request seqno                               | | |
>>>>>>>>>>>> + * | |                  V                                                | | |
>>>>>>>>>>>> + * | |   +----------------------+                                        | | |
>>>>>>>>>>>> + * | +-- |                      |  mgr_run(seqno not rcvd)               | | |
>>>>>>>>>>>> + * |     |    INSTALL_FLOWS     |   - set port down in sb                | | |
>>>>>>>>>>>> + * |     |                      |  mgr_update()                          | | |
>>>>>>>>>>>> + * |     +----------------------+   - sbrec_update_chassis if needed     | | |
>>>>>>>>>>>> + * |                    |                                                | | |
>>>>>>>>>>>> + * |                    |  mgr_run(seqno rcvd)                           | | |
>>>>>>>>>>>> + * |                    |  - set port up in sb                           | | |
>>>>>>>>>>>> + * | release_iface      |  - set ovn-installed in ovs                    | | |
>>>>>>>>>>>> + * |                    V                                                | | |
>>>>>>>>>>>> + * |   +----------------------+                                          | | |
>>>>>>>>>>>> + * |   |                      |  mgr_run()                               | | |
>>>>>>>>>>>> + * +-- |       MARK_UP        |  - set port up in sb                     | | |
>>>>>>>>>>>> + *     |                      |  - set ovn-installed in ovs              | | |
>>>>>>>>>>>> + *     |                      |  mgr_update()                            | | |
>>>>>>>>>>>> + *     +----------------------+  - sbrec_update_chassis if needed        | | |
>>>>>>>>>>>> + *              |                                                        | | |
>>>>>>>>>>>> + *              | mgr_update(rcvd port up / ovn_installed & chassis set) | | |
>>>>>>>>>>>> + *              V                                                        | | |
>>>>>>>>>>>> + *     +----------------------+                                          | | |
>>>>>>>>>>>> + *     |      INSTALLED       | ------------> claim_iface ---------------+ | |
>>>>>>>>>>>> + *     +----------------------+                                            | |
>>>>>>>>>>>> + *              |                                                          | |
>>>>>>>>>>>> + *              | release_iface                                            | |
>>>>>>>>>>>> + *              V                                                          | |
>>>>>>>>>>>> + *     +----------------------+                                            | |
>>>>>>>>>>>> + *     |                      | ------------> claim_iface -----------------+ |
>>>>>>>>>>>> + *     |      MARK_DOWN       | ------> mgr_update(rcvd port down) ----------+
>>>>>>>>>>>> + *     |                      | mgr_run()
>>>>>>>>>>>> + *     |                      | - set port down in sb
>>>>>>>>>>>> + *     |                      | mgr_update()
>>>>>>>>>>>> + *     +----------------------+ - sbrec_update_chassis(NULL)
>>>>>>>>>>>> + */
>>>>>>>>>>>> +
>>>>>>>>>>>>  struct ovs_iface {
>>>>>>>>>>>>      char *id;               /* Extracted from OVS
>>>>>>>>>>> external_ids.iface_id. */
>>>>>>>>>>>>      enum if_state state;    /* State of the interface in the
> state
>>>>>>>>>>> machine. */
>>>>>>>>>>>> @@ -85,6 +137,7 @@ struct ovs_iface {
>>>>>>>>>>>>                               * be fully programmed in OVS.
> Only
>>>>>> used
>>>>>>>>>>> in state
>>>>>>>>>>>>                               * OIF_INSTALL_FLOWS.
>>>>>>>>>>>>                               */
>>>>>>>>>>>> +    bool chassis_update_required;  /* If true, pb->chassis
> must be
>>>>>>>>>>> updated. */
>>>>>>>>>>>>  };
>>>>>>>>>>>>
>>>>>>>>>>>>  static uint64_t ifaces_usage;
>>>>>>>>>>>> @@ -158,14 +211,23 @@ if_status_mgr_destroy(struct
> if_status_mgr
>>>>>> *mgr)
>>>>>>>>>>>>  }
>>>>>>>>>>>>
>>>>>>>>>>>>  void
>>>>>>>>>>>> -if_status_mgr_claim_iface(struct if_status_mgr *mgr, const
> char
>>>>>>>>>>> *iface_id)
>>>>>>>>>>>> +if_status_mgr_claim_iface(struct if_status_mgr *mgr,
>>>>>>>>>>>> +                          const struct sbrec_port_binding
> *pb,
>>>>>>>>>>>> +                          const struct sbrec_chassis
> *chassis_rec,
>>>>>>>>>>>> +                          bool sb_readonly)
>>>>>>>>>>>>  {
>>>>>>>>>>>> +    const char *iface_id = pb->logical_port;
>>>>>>>>>>>>      struct ovs_iface *iface = shash_find_data(&mgr->ifaces,
>>>>>>>> iface_id);
>>>>>>>>>>>>
>>>>>>>>>>>>      if (!iface) {
>>>>>>>>>>>>          iface = ovs_iface_create(mgr, iface_id, OIF_CLAIMED);
>>>>>>>>>>>>      }
>>>>>>>>>>>> -
>>>>>>>>>>>> +    if (!sb_readonly) {
>>>>>>>>>>>> +        set_pb_chassis_in_sbrec(pb, chassis_rec, true);
>>>>>>>>>>>> +        iface->chassis_update_required = false;
>>>>>>>>>>>> +    } else {
>>>>>>>>>>>> +        iface->chassis_update_required = true;
>>>>>>>>>>>> +    }
>>>>>>>>>>>>      switch (iface->state) {
>>>>>>>>>>>>      case OIF_CLAIMED:
>>>>>>>>>>>>      case OIF_INSTALL_FLOWS:
>>>>>>>>>>>> @@ -182,6 +244,12 @@ if_status_mgr_claim_iface(struct
> if_status_mgr
>>>>>>>>>>> *mgr, const char *iface_id)
>>>>>>>>>>>>      }
>>>>>>>>>>>>  }
>>>>>>>>>>>>
>>>>>>>>>>>> +bool
>>>>>>>>>>>> +if_status_mgr_iface_is_present(struct if_status_mgr *mgr,
> const
>>>>>> char
>>>>>>>>>>> *iface_id)
>>>>>>>>>>>> +{
>>>>>>>>>>>> +    return !!shash_find_data(&mgr->ifaces, iface_id);
>>>>>>>>>>>> +}
>>>>>>>>>>>> +
>>>>>>>>>>>>  void
>>>>>>>>>>>>  if_status_mgr_release_iface(struct if_status_mgr *mgr, const
> char
>>>>>>>>>>> *iface_id)
>>>>>>>>>>>>  {
>>>>>>>>>>>> @@ -246,9 +314,39 @@ if_status_mgr_delete_iface(struct
>>>>>> if_status_mgr
>>>>>>>>>>> *mgr, const char *iface_id)
>>>>>>>>>>>>      }
>>>>>>>>>>>>  }
>>>>>>>>>>>>
>>>>>>>>>>>> +bool
>>>>>>>>>>>> +if_status_handle_claims(struct if_status_mgr *mgr,
>>>>>>>>>>>> +                        struct local_binding_data
> *binding_data,
>>>>>>>>>>>> +                        const struct sbrec_chassis
> *chassis_rec,
>>>>>>>>>>>> +                        struct hmap *tracked_datapath,
>>>>>>>>>>>> +                        bool sb_readonly)
>>>>>>>>>>>> +{
>>>>>>>>>>>> +    if (!binding_data || sb_readonly) {
>>>>>>>>>>>> +        return false;
>>>>>>>>>>>> +    }
>>>>>>>>>>>> +
>>>>>>>>>>>> +    struct shash *bindings = &binding_data->bindings;
>>>>>>>>>>>> +    struct hmapx_node *node;
>>>>>>>>>>>> +
>>>>>>>>>>>> +    bool rc = false;
>>>>>>>>>>>> +    HMAPX_FOR_EACH (node,
> &mgr->ifaces_per_state[OIF_CLAIMED]) {
>>>>>>>>>>>> +        struct ovs_iface *iface = node->data;
>>>>>>>>>>>> +        if (iface->chassis_update_required) {
>>>>>>>>>>>
>>>>>>>>>>> Thanks Xavier for the revision. The state machine looks clearer now,
>>>>>>>>>>> but I have a major concern about the use of chassis_update_required.
>>>>>>>>>>> This bool flag is used to decide whether an update to SB is needed.
>>>>>>>>>>> Once a SB update is requested, the flag is set to false, on the
>>>>>>>>>>> assumption that the SB update will succeed immediately. However,
>>>>>>>>>>> that assumption may be wrong: the subsequent SB update can fail or
>>>>>>>>>>> be delayed for various reasons, so this flag is not reliable.
>>>>>>>>>>> Instead, the CLAIMED state should be responsible for making sure
>>>>>>>>>>> the SB update is completed. If a transaction is in progress,
>>>>>>>>>>> sb_readonly is true. So if sb_readonly is false, nothing is in
>>>>>>>>>>> progress, and we can always check: if (!sb_readonly && <SB chassis
>>>>>>>>>>> is not updated for the port-binding>), we should just send the
>>>>>>>>>>> update, regardless of whether we have requested it before. Please
>>>>>>>>>>> also see another comment below for the state transition.
>>>>>>>>>>>
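To illustrate the suggestion above, here is a minimal sketch, not the actual
ovn-controller code. The names chassis, port_binding, n_update_requests and
claimed_try_update_chassis are hypothetical stand-ins for the sbrec_* types and
calls. The only point is that the decision depends on sb_readonly and on the
chassis currently seen in SB, not on a remembered flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for sbrec_chassis and
 * sbrec_port_binding; not the real IDL types. */
struct chassis {
    const char *name;
};

struct port_binding {
    const struct chassis *chassis; /* Chassis as currently seen in SB. */
    int n_update_requests;         /* Counts updates queued by this sketch. */
};

/* Called on every iteration while the interface is in CLAIMED state.
 * Queues a chassis update whenever SB is writable and SB does not yet
 * show our chassis, whether or not an update was requested before.
 * Returns true if an update was queued. */
static bool
claimed_try_update_chassis(struct port_binding *pb,
                           const struct chassis *our_chassis,
                           bool sb_readonly)
{
    assert(pb && our_chassis);

    if (sb_readonly) {
        /* A transaction is in progress; retry on a later iteration. */
        return false;
    }
    if (pb->chassis == our_chassis) {
        /* SB already reflects the claim; nothing to send. */
        return false;
    }
    /* The real code would write the chassis column here (assumption). */
    pb->n_update_requests++;
    return true;
}
```

Because no flag is cleared when the update is queued, a failed or delayed
transaction simply leads to another attempt the next time SB is writable.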
>>>>>>>>>>>> +            VLOG_INFO("if_status_handle_claims for %s", iface->id);
>>>>>>>>>>>> +            local_binding_set_pb(bindings, iface->id, chassis_rec,
>>>>>>>>>>>> +                                 tracked_datapath, true);
>>>>>>>>>>>> +            rc = true;
>>>>>>>>>>>> +        }
>>>>>>>>>>>> +        iface->chassis_update_required = false;
>>>>>>>>>>>> +    }
>>>>>>>>>>>> +    return rc;
>>>>>>>>>>>> +}
>>>>>>>>>>>> +
>>>>>>>>>>>>  void
>>>>>>>>>>>>  if_status_mgr_update(struct if_status_mgr *mgr,
>>>>>>>>>>>> -                     struct local_binding_data *binding_data)
>>>>>>>>>>>> +                     struct local_binding_data *binding_data,
>>>>>>>>>>>> +                     const struct sbrec_chassis *chassis_rec,
>>>>>>>>>>>> +                     bool sb_readonly)
>>>>>>>>>>>>  {
>>>>>>>>>>>>      if (!binding_data) {
>>>>>>>>>>>>          return;
>>>>>>>>>>>> @@ -257,13 +355,25 @@ if_status_mgr_update(struct
> if_status_mgr
>>>>>> *mgr,
>>>>>>>>>>>>      struct shash *bindings = &binding_data->bindings;
>>>>>>>>>>>>      struct hmapx_node *node;
>>>>>>>>>>>>
>>>>>>>>>>>> +    /* Interfaces in OIF_MARK_UP state have already set their
>>>>>>>>>>> pb->chassis.
>>>>>>>>>>>> +     * However, it might have been reset by another hv.
>>>>>>>>>>>> +     */
>>>>>>>>>>>>      /* Move all interfaces that have been confirmed "up" by
> the
>>>>>>>> binding
>>>>>>>>>>> module,
>>>>>>>>>>>>       * from OIF_MARK_UP to OIF_INSTALLED.
>>>>>>>>>>>>       */
>>>>>>>>>>>>      HMAPX_FOR_EACH_SAFE (node,
>>>>>> &mgr->ifaces_per_state[OIF_MARK_UP]) {
>>>>>>>>>>>>          struct ovs_iface *iface = node->data;
>>>>>>>>>>>>
>>>>>>>>>>>> -        if (local_binding_is_up(bindings, iface->id)) {
>>>>>>>>>>>> +        if (iface->chassis_update_required) {
>>>>>>>>>>>> +            if (!sb_readonly) {
>>>>>>>>>>>> +                iface->chassis_update_required = false;
>>>>>>>>>>>> +                local_binding_set_pb(bindings, iface->id,
>>>>>>>> chassis_rec,
>>>>>>>>>>>> +                                     NULL, true);
>>>>>>>>>>>> +            } else {
>>>>>>>>>>>> +                continue;
>>>>>>>>>>>> +            }
>>>>>>>>>>>> +        }
>>>>>>>>>>>> +        if (local_binding_is_up(bindings, iface->id,
>>>>>> chassis_rec)) {
>>>>>>>>>>>>              ovs_iface_set_state(mgr, iface, OIF_INSTALLED);
>>>>>>>>>>>>          }
>>>>>>>>>>>>      }
>>>>>>>>>>>> @@ -274,23 +384,53 @@ if_status_mgr_update(struct
> if_status_mgr
>>>>>> *mgr,
>>>>>>>>>>>>      HMAPX_FOR_EACH_SAFE (node,
>>>>>>>> &mgr->ifaces_per_state[OIF_MARK_DOWN]) {
>>>>>>>>>>>>          struct ovs_iface *iface = node->data;
>>>>>>>>>>>>
>>>>>>>>>>>> -        if (local_binding_is_down(bindings, iface->id)) {
>>>>>>>>>>>> +        if (!sb_readonly) {
>>>>>>>>>>>> +            local_binding_set_pb(bindings, iface->id,
> chassis_rec,
>>>>>>>>>>>> +                                 NULL, false);
>>>>>>>>>>>> +        }
>>>>>>>>>>>> +        if (local_binding_is_down(bindings, iface->id,
>>>>>> chassis_rec))
>>>>>>>> {
>>>>>>>>>>>>              ovs_iface_destroy(mgr, iface);
>>>>>>>>>>>>          }
>>>>>>>>>>>>      }
>>>>>>>>>>>>
>>>>>>>>>>>> -    /* Register for a notification about flows being
> installed in
>>>>>> OVS
>>>>>>>>>>> for all
>>>>>>>>>>>> -     * newly claimed interfaces.
>>>>>>>>>>>> +    if (!sb_readonly) {
>>>>>>>>>>>> +        HMAPX_FOR_EACH_SAFE (node,
>>>>>>>>>>> &mgr->ifaces_per_state[OIF_INSTALL_FLOWS]) {
>>>>>>>>>>>> +            struct ovs_iface *iface = node->data;
>>>>>>>>>>>> +
>>>>>>>>>>>> +            if (iface->chassis_update_required) {
>>>>>>>>>>>> +                iface->chassis_update_required = false;
>>>>>>>>>>>> +                local_binding_set_pb(bindings, iface->id,
>>>>>>>> chassis_rec,
>>>>>>>>>>>> +                                     NULL, true);
>>>>>>>>>>>> +            }
>>>>>>>>>>>> +        }
>>>>>>>>>>>> +    }
>>>>>>>>>>>> +
>>>>>>>>>>>> +    /* Update Port_Binding->chassis for newly claimed
> interfaces
>>>>>>>>>>>> +     * Register for a notification about flows being
> installed in
>>>>>> OVS
>>>>>>>>>>> for all
>>>>>>>>>>>> +     * newly claimed interfaces for which we could update
>>>>>>>> pb->chassis.
>>>>>>>>>>>>       *
>>>>>>>>>>>>       * Move them from OIF_CLAIMED to OIF_INSTALL_FLOWS.
>>>>>>>>>>>>       */
>>>>>>>>>>>> -    bool new_ifaces = false;
>>>>>>>>>>>> -    HMAPX_FOR_EACH_SAFE (node,
>>>>>> &mgr->ifaces_per_state[OIF_CLAIMED]) {
>>>>>>>>>>>> -        struct ovs_iface *iface = node->data;
>>>>>>>>>>>>
>>>>>>>>>>>> -        ovs_iface_set_state(mgr, iface, OIF_INSTALL_FLOWS);
>>>>>>>>>>>> -        iface->install_seqno = mgr->iface_seqno + 1;
>>>>>>>>>>>> -        new_ifaces = true;
>>>>>>>>>>>> +    bool new_ifaces = false;
>>>>>>>>>>>> +    if (!sb_readonly) {
>>>>>>>>>>>> +        HMAPX_FOR_EACH_SAFE (node,
>>>>>>>> &mgr->ifaces_per_state[OIF_CLAIMED])
>>>>>>>>>>> {
>>>>>>>>>>>> +            struct ovs_iface *iface = node->data;
>>>>>>>>>>>> +            /* No need to check for chassis_update_required as already done
>>>>>>>>>>>> +             * in if_status_handle_claims or if_status_mgr_claim_iface
>>>>>>>>>>>> +             */
>>>>>>>>>>>> +            ovs_iface_set_state(mgr, iface, OIF_INSTALL_FLOWS);
>>>>>>>>>>>
>>>>>>>>>>> We need to make sure the chassis in the SB port-binding is up to date
>>>>>>>>>>> (i.e. the update notification from the SB DB has been received)
>>>>>>>>>>> before moving to INSTALL_FLOWS. Otherwise, it is still possible that
>>>>>>>>>>> the state is moved too early and we end up with incomplete flow
>>>>>>>>>>> installation for the lport when the state is finally moved to
>>>>>>>>>>> INSTALLED.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Han
>>>>>>>>>>>
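The transition rule requested above could look roughly like the following
sketch. It is a simplified model under stated assumptions: only the state names
OIF_CLAIMED and OIF_INSTALL_FLOWS come from the patch, while struct iface, its
sb_chassis field and claimed_maybe_advance are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Only these two state names come from the patch; the rest is a
 * hypothetical, simplified model. */
enum if_state {
    OIF_CLAIMED,
    OIF_INSTALL_FLOWS,
};

struct chassis {
    const char *name;
};

struct iface {
    enum if_state state;
    const struct chassis *sb_chassis; /* pb->chassis as last reported by SB. */
};

/* Moves an interface from CLAIMED to INSTALL_FLOWS only once the SB
 * notification shows our chassis on the port binding.
 * Returns true if the state changed. */
static bool
claimed_maybe_advance(struct iface *iface, const struct chassis *our_chassis)
{
    assert(iface && our_chassis);

    if (iface->state != OIF_CLAIMED) {
        return false;
    }
    if (iface->sb_chassis != our_chassis) {
        /* The update has not been confirmed yet; stay in CLAIMED. */
        return false;
    }
    iface->state = OIF_INSTALL_FLOWS;
    return true;
}
```

With this rule the seqno request would only be issued for interfaces whose
chassis update has been confirmed, so INSTALLED is not reached with a stale
pb->chassis.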
>>>>>>>>>>>> +            iface->install_seqno = mgr->iface_seqno + 1;
>>>>>>>>>>>> +            new_ifaces = true;
>>>>>>>>>>>> +        }
>>>>>>>>>>>> +    } else {
>>>>>>>>>>>> +        HMAPX_FOR_EACH_SAFE (node,
>>>>>>>> &mgr->ifaces_per_state[OIF_CLAIMED])
>>>>>>>>>>> {
>>>>>>>>>>>> +            struct ovs_iface *iface = node->data;
>>>>>>>>>>>> +            static struct vlog_rate_limit rl =
>>>>>>>> VLOG_RATE_LIMIT_INIT(5,
>>>>>>>>>>> 1);
>>>>>>>>>>>> +            VLOG_INFO_RL(&rl,
>>>>>>>>>>>> +                         "Not updating pb chassis for %s now
> as "
>>>>>>>>>>>> +                         "sb is readonly", iface->id);
>>>>>>>>>>>> +        }
>>>>>>>>>>>>      }
>>>>>>>>>>>>
>>>>>>>>>>>>      /* Request a seqno update when the flows for new
> interfaces
>>>>>> have
>>>>>>>>>>> been
>>>>>>>>>>>> @@ -403,7 +543,7 @@ if_status_mgr_update_bindings(struct
>>>>>> if_status_mgr
>>>>>>>>>>> *mgr,
>>>>>>>>>>>>      struct hmapx_node *node;
>>>>>>>>>>>>
>>>>>>>>>>>>      /* Notify the binding module to set "down" all bindings
> that
>>>>>> are
>>>>>>>>>>> still
>>>>>>>>>>>> -     * in the process of being installed in OVS, i.e., are
> not yet
>>>>>>>>>>> instsalled.
>>>>>>>>>>>> +     * in the process of being installed in OVS, i.e., are
> not yet
>>>>>>>>>>> installed.
>>>>>>>>>>>>       */
>>>>>>>>>>>>      HMAPX_FOR_EACH (node,
>>>>>> &mgr->ifaces_per_state[OIF_INSTALL_FLOWS])
>>>>>>>> {
>>>>>>>>>>>>          struct ovs_iface *iface = node->data;
>>>>>>>>>>>> diff --git a/controller/if-status.h b/controller/if-status.h
>>>>>>>>>>>> index bb8a3950d..f9b05d30d 100644
>>>>>>>>>>>> --- a/controller/if-status.h
>>>>>>>>>>>> +++ b/controller/if-status.h
>>>>>>>>>>>> @@ -27,15 +27,27 @@ struct if_status_mgr
>>>>>> *if_status_mgr_create(void);
>>>>>>>>>>>>  void if_status_mgr_clear(struct if_status_mgr *);
>>>>>>>>>>>>  void if_status_mgr_destroy(struct if_status_mgr *);
>>>>>>>>>>>>
>>>>>>>>>>>> -void if_status_mgr_claim_iface(struct if_status_mgr *, const
> char
>>>>>>>>>>> *iface_id);
>>>>>>>>>>>> +void if_status_mgr_claim_iface(struct if_status_mgr *,
>>>>>>>>>>>> +                               const struct
> sbrec_port_binding
>>>>>> *pb,
>>>>>>>>>>>> +                               const struct sbrec_chassis
>>>>>>>> *chassis_rec,
>>>>>>>>>>>> +                               bool sb_readonly);
>>>>>>>>>>>>  void if_status_mgr_release_iface(struct if_status_mgr *,
> const
>>>>>> char
>>>>>>>>>>> *iface_id);
>>>>>>>>>>>>  void if_status_mgr_delete_iface(struct if_status_mgr *,
> const char
>>>>>>>>>>> *iface_id);
>>>>>>>>>>>>
>>>>>>>>>>>> -void if_status_mgr_update(struct if_status_mgr *, struct
>>>>>>>>>>> local_binding_data *);
>>>>>>>>>>>> +void if_status_mgr_update(struct if_status_mgr *, struct
>>>>>>>>>>> local_binding_data *,
>>>>>>>>>>>> +                          const struct sbrec_chassis
> *chassis,
>>>>>>>>>>>> +                          bool sb_readonly);
>>>>>>>>>>>>  void if_status_mgr_run(struct if_status_mgr *mgr, struct
>>>>>>>>>>> local_binding_data *,
>>>>>>>>>>>>                         const struct sbrec_chassis *,
>>>>>>>>>>>>                         bool sb_readonly, bool ovs_readonly);
>>>>>>>>>>>>  void if_status_mgr_get_memory_usage(struct if_status_mgr
> *mgr,
>>>>>>>>>>>>                                      struct simap *usage);
>>>>>>>>>>>> +bool if_status_mgr_iface_is_present(struct if_status_mgr
> *mgr,
>>>>>>>>>>>> +                                    const char *iface_id);
>>>>>>>>>>>> +bool if_status_handle_claims(struct if_status_mgr *mgr,
>>>>>>>>>>>> +                             struct local_binding_data
>>>>>> *binding_data,
>>>>>>>>>>>> +                             const struct sbrec_chassis
>>>>>> *chassis_rec,
>>>>>>>>>>>> +                             struct hmap *tracked_datapath,
>>>>>>>>>>>> +                             bool sb_readonly);
>>>>>>>>>>>>
>>>>>>>>>>>>  # endif /* controller/if-status.h */
>>>>>>>>>>>> diff --git a/controller/ovn-controller.c
>>>>>> b/controller/ovn-controller.c
>>>>>>>>>>>> index 69615308e..3947baf03 100644
>>>>>>>>>>>> --- a/controller/ovn-controller.c
>>>>>>>>>>>> +++ b/controller/ovn-controller.c
>>>>>>>>>>>> @@ -1464,6 +1464,73 @@ en_runtime_data_run(struct engine_node
>>>>>> *node,
>>>>>>>>>>> void *data)
>>>>>>>>>>>>      engine_set_node_state(node, EN_UPDATED);
>>>>>>>>>>>>  }
>>>>>>>>>>>>
>>>>>>>>>>>> +struct ed_type_sb_ro {
>>>>>>>>>>>> +    bool sb_readonly;
>>>>>>>>>>>> +};
>>>>>>>>>>>> +
>>>>>>>>>>>> +static void *
>>>>>>>>>>>> +en_sb_ro_init(struct engine_node *node OVS_UNUSED,
>>>>>>>>>>>> +              struct engine_arg *arg OVS_UNUSED)
>>>>>>>>>>>> +{
>>>>>>>>>>>> +    struct ed_type_sb_ro *data = xzalloc(sizeof *data);
>>>>>>>>>>>> +    return data;
>>>>>>>>>>>> +}
>>>>>>>>>>>> +
>>>>>>>>>>>> +static void
>>>>>>>>>>>> +en_sb_ro_run(struct engine_node *node, void *data)
>>>>>>>>>>>> +{
>>>>>>>>>>>> +    struct ed_type_sb_ro *sb_ro_data = data;
>>>>>>>>>>>> +    bool sb_readonly = !engine_get_context()->ovnsb_idl_txn;
>>>>>>>>>>>> +    if (sb_ro_data->sb_readonly != sb_readonly) {
>>>>>>>>>>>> +        sb_ro_data->sb_readonly = sb_readonly;
>>>>>>>>>>>> +        if (!sb_ro_data->sb_readonly) {
>>>>>>>>>>>> +            engine_set_node_state(node, EN_UPDATED);
>>>>>>>>>>>> +        }
>>>>>>>>>>>> +    }
>>>>>>>>>>>> +}
>>>>>>>>>>>> +
>>>>>>>>>>>> +static void
>>>>>>>>>>>> +en_sb_ro_cleanup(void *data OVS_UNUSED)
>>>>>>>>>>>> +{
>>>>>>>>>>>> +}
>>>>>>>>>>>> +
>>>>>>>>>>>> +static bool
>>>>>>>>>>>> +runtime_data_sb_ro_handler(struct engine_node *node, void
> *data)
>>>>>>>>>>>> +{
>>>>>>>>>>>> +    const struct sbrec_chassis *chassis = NULL;
>>>>>>>>>>>> +
>>>>>>>>>>>> +    struct ovsrec_open_vswitch_table *ovs_table =
>>>>>>>>>>>> +        (struct ovsrec_open_vswitch_table *)EN_OVSDB_GET(
>>>>>>>>>>>> +            engine_get_input("OVS_open_vswitch", node));
>>>>>>>>>>>> +
>>>>>>>>>>>> +    const char *chassis_id = get_ovs_chassis_id(ovs_table);
>>>>>>>>>>>> +
>>>>>>>>>>>> +    struct ovsdb_idl_index *sbrec_chassis_by_name =
>>>>>>>>>>>> +        engine_ovsdb_node_get_index(
>>>>>>>>>>>> +                engine_get_input("SB_chassis", node),
>>>>>>>>>>>> +                "name");
>>>>>>>>>>>> +
>>>>>>>>>>>> +    if (chassis_id) {
>>>>>>>>>>>> +        chassis =
> chassis_lookup_by_name(sbrec_chassis_by_name,
>>>>>>>>>>> chassis_id);
>>>>>>>>>>>> +    }
>>>>>>>>>>>> +    if (chassis) {
>>>>>>>>>>>> +        struct ed_type_runtime_data *rt_data = data;
>>>>>>>>>>>> +        bool sb_readonly =
> !engine_get_context()->ovnsb_idl_txn;
>>>>>>>>>>>> +        struct controller_engine_ctx *ctrl_ctx =
>>>>>>>>>>>> +            engine_get_context()->client_ctx;
>>>>>>>>>>>> +
>>>>>>>>>>>> +        if (if_status_handle_claims(ctrl_ctx->if_mgr,
>>>>>>>>>>>> +                                    &rt_data->lbinding_data,
>>>>>>>>>>>> +                                    chassis,
>>>>>>>>>>>> +
>  &rt_data->tracked_dp_bindings,
>>>>>>>>>>>> +                                    sb_readonly)) {
>>>>>>>>>>>> +            engine_set_node_state(node, EN_UPDATED);
>>>>>>>>>>>> +            rt_data->tracked = true;
>>>>>>>>>>>> +        }
>>>>>>>>>>>> +    }
>>>>>>>>>>>> +    return true;
>>>>>>>>>>>> +}
>>>>>>>>>>>> +
>>>>>>>>>>>>  static bool
>>>>>>>>>>>>  runtime_data_ovs_interface_shadow_handler(struct engine_node
>>>>>> *node,
>>>>>>>>>>> void *data)
>>>>>>>>>>>>  {
>>>>>>>>>>>> @@ -3528,6 +3595,7 @@ main(int argc, char *argv[])
>>>>>>>>>>>>      stopwatch_create(VIF_PLUG_RUN_STOPWATCH_NAME, SW_MS);
>>>>>>>>>>>>
>>>>>>>>>>>>      /* Define inc-proc-engine nodes. */
>>>>>>>>>>>> +    ENGINE_NODE(sb_ro, "sb_ro");
>>>>>>>>>>>>      ENGINE_NODE_WITH_CLEAR_TRACK_DATA_IS_VALID(ct_zones,
>>>>>> "ct_zones");
>>>>>>>>>>>>      ENGINE_NODE_WITH_CLEAR_TRACK_DATA(ovs_interface_shadow,
>>>>>>>>>>>>
>  "ovs_interface_shadow");
>>>>>>>>>>>> @@ -3664,6 +3732,7 @@ main(int argc, char *argv[])
>>>>>>>>>>>>      engine_add_input(&en_ovs_interface_shadow,
> &en_ovs_interface,
>>>>>>>>>>>>
> ovs_interface_shadow_ovs_interface_handler);
>>>>>>>>>>>>
>>>>>>>>>>>> +    engine_add_input(&en_runtime_data, &en_sb_ro,
>>>>>>>>>>> runtime_data_sb_ro_handler);
>>>>>>>>>>>>      engine_add_input(&en_runtime_data,
> &en_ofctrl_is_connected,
>>>>>>>> NULL);
>>>>>>>>>>>>
>>>>>>>>>>>>      engine_add_input(&en_runtime_data, &en_ovs_open_vswitch,
>>>>>> NULL);
>>>>>>>>>>>> @@ -4098,7 +4167,8 @@ main(int argc, char *argv[])
>>>>>>>>>>>>                          runtime_data ?
>>>>>> &runtime_data->lbinding_data :
>>>>>>>>>>> NULL;
>>>>>>>>>>>>
>>>>>>>>  stopwatch_start(IF_STATUS_MGR_UPDATE_STOPWATCH_NAME,
>>>>>>>>>>>>                                      time_msec());
>>>>>>>>>>>> -                    if_status_mgr_update(if_mgr,
> binding_data);
>>>>>>>>>>>> +                    if_status_mgr_update(if_mgr,
> binding_data,
>>>>>>>> chassis,
>>>>>>>>>>>> +                                         !ovnsb_idl_txn);
>>>>>>>>>>>>
>>>>>>>>  stopwatch_stop(IF_STATUS_MGR_UPDATE_STOPWATCH_NAME,
>>>>>>>>>>>>                                     time_msec());
>>>>>>>>>>>>
>>>>>>>>>>>> diff --git a/tests/ovn-macros.at b/tests/ovn-macros.at
>>>>>>>>>>>> index 335f9158c..8fd6ae6f7 100644
>>>>>>>>>>>> --- a/tests/ovn-macros.at
>>>>>>>>>>>> +++ b/tests/ovn-macros.at
>>>>>>>>>>>> @@ -759,3 +759,15 @@
>>>>>>>> m4_define([OVN_FOR_EACH_NORTHD_WITHOUT_DP_GROUPS],
>>>>>>>>>>>>       [m4_foreach([NORTHD_USE_DP_GROUPS], [no],
>>>>>>>>>>>>         [m4_foreach([NORTHD_USE_PARALLELIZATION], [yes, no],
> [$1
>>>>>>>>>>>>  ])])])])
>>>>>>>>>>>> +
>>>>>>>>>>>> +# OVN_NBCTL(NBCTL_COMMAND) adds NBCTL_COMMAND to list of
> commands
>>>>>> to
>>>>>>>> be
>>>>>>>>>>> run by RUN_OVN_NBCTL().
>>>>>>>>>>>> +m4_define([OVN_NBCTL], [
>>>>>>>>>>>> +    command="${command} -- $1"
>>>>>>>>>>>> +])
>>>>>>>>>>>> +
>>>>>>>>>>>> +# RUN_OVN_NBCTL() executes list of commands built by the
>>>>>> OVN_NBCTL()
>>>>>>>>>>> macro.
>>>>>>>>>>>> +m4_define([RUN_OVN_NBCTL], [
>>>>>>>>>>>> +    check ovn-nbctl ${command}
>>>>>>>>>>>> +    unset command
>>>>>>>>>>>> +])
>>>>>>>>>>>> +
>>>>>>>>>>>> diff --git a/tests/ovn.at b/tests/ovn.at
>>>>>>>>>>>> index bfaa41962..94d16bac9 100644
>>>>>>>>>>>> --- a/tests/ovn.at
>>>>>>>>>>>> +++ b/tests/ovn.at
>>>>>>>>>>>> @@ -102,6 +102,18 @@ m4_divert_text([PREPARE_TESTS],
>>>>>>>>>>>>           test 1 -le $(as $hv1 ovs-ofctl dump-flows br-int |
> grep
>>>>>> -c
>>>>>>>>>>> "output:$ofport")
>>>>>>>>>>>>       ])
>>>>>>>>>>>>     }
>>>>>>>>>>>> +
>>>>>>>>>>>> +   ovn_wait_remote_input_flows () {
>>>>>>>>>>>> +     hv1=$1
>>>>>>>>>>>> +     hv2=$2
>>>>>>>>>>>> +     echo "$3: waiting for flows for remote input on $hv1"
>>>>>>>>>>>> +     # Wait for a flow outputting to remote input
>>>>>>>>>>>> +     OVS_WAIT_UNTIL([
>>>>>>>>>>>> +         ofport=$(as $hv1 ovs-vsctl --bare --columns ofport find Interface name=ovn-${hv2}-0)
>>>>>>>>>>>> +         echo "tunnel port=$ofport"
>>>>>>>>>>>> +         test 1 -le $(as $hv1 ovs-ofctl dump-flows br-int | grep -c "in_port=$ofport")
>>>>>>>>>>>> +     ])
>>>>>>>>>>>> +   }
>>>>>>>>>>>>  ])
>>>>>>>>>>>>
>>>>>>>>>>>>  m4_define([OVN_CHECK_PACKETS],
>>>>>>>>>>>> @@ -127,6 +139,8 @@ m4_define([OVN_WAIT_PATCH_PORT_FLOWS],
>>>>>>>>>>>>  m4_define([OVN_WAIT_REMOTE_OUTPUT_FLOWS],
>>>>>>>>>>>>    [ovn_wait_remote_output_flows "$1" "$2" "__file__:__line__"])
>>>>>>>>>>>>
>>>>>>>>>>>> +m4_define([OVN_WAIT_REMOTE_INPUT_FLOWS],
>>>>>>>>>>>> +  [ovn_wait_remote_input_flows "$1" "$2" "__file__:__line__"])
>>>>>>>>>>>>
>>>>>>>>>>>>  AT_BANNER([OVN components])
>>>>>>>>>>>>
>>>>>>>>>>>> @@ -14056,6 +14070,11 @@ wait_column "$hv1_uuid" Port_Binding
>>>>>>>>>>> requested_chassis logical_port=lsp0
>>>>>>>>>>>>  wait_column "$hv2_uuid" Port_Binding additional_chassis
>>>>>>>>>>> logical_port=lsp0
>>>>>>>>>>>>  wait_column "$hv2_uuid" Port_Binding
> requested_additional_chassis
>>>>>>>>>>> logical_port=lsp0
>>>>>>>>>>>>
>>>>>>>>>>>> +# Check ovn-installed updated for main chassis
>>>>>>>>>>>> +wait_for_ports_up
>>>>>>>>>>>> +OVS_WAIT_UNTIL([test `as hv1 ovs-vsctl get Interface lsp0
>>>>>>>>>>> external_ids:ovn-installed` = '"true"'])
>>>>>>>>>>>> +OVS_WAIT_UNTIL([test x`as hv2 ovs-vsctl get Interface lsp0
>>>>>>>>>>> external_ids:ovn-installed` = x])
>>>>>>>>>>>> +
>>>>>>>>>>>>  # Check that setting iface:encap-ip populates
>>>>>>>>>>> Port_Binding:additional_encap
>>>>>>>>>>>>  wait_row_count Encap 2 chassis_name=hv1
>>>>>>>>>>>>  wait_row_count Encap 2 chassis_name=hv2
>>>>>>>>>>>> @@ -14081,6 +14100,11 @@ wait_column "$hv2_uuid" Port_Binding
>>>>>>>>>>> requested_chassis logical_port=lsp0
>>>>>>>>>>>>  wait_column "" Port_Binding additional_chassis
> logical_port=lsp0
>>>>>>>>>>>>  wait_column "" Port_Binding requested_additional_chassis
>>>>>>>>>>> logical_port=lsp0
>>>>>>>>>>>>
>>>>>>>>>>>> +# Check ovn-installed updated for main chassis and not for
> other
>>>>>>>> chassis
>>>>>>>>>>>> +wait_for_ports_up
>>>>>>>>>>>> +OVS_WAIT_UNTIL([test `as hv2 ovs-vsctl get Interface lsp0
>>>>>>>>>>> external_ids:ovn-installed` = '"true"'])
>>>>>>>>>>>> +OVS_WAIT_UNTIL([test x`as hv1 ovs-vsctl get Interface lsp0
>>>>>>>>>>> external_ids:ovn-installed` = x])
>>>>>>>>>>>> +
>>>>>>>>>>>>  # Check that additional_encap is cleared
>>>>>>>>>>>>  wait_column "" Port_Binding additional_encap
> logical_port=lsp0
>>>>>>>>>>>>
>>>>>>>>>>>> @@ -15370,7 +15394,10 @@ AT_CHECK([as hv1 ovs-ofctl dump-flows
>>>>>> br-int
>>>>>>>>>>> table=65 | grep actions=output:1],
>>>>>>>>>>>>  echo "verifying that lsp0 binding moves when
> requested-chassis is
>>>>>>>>>>> changed"
>>>>>>>>>>>>
>>>>>>>>>>>>  ovn-nbctl lsp-set-options lsp0 requested-chassis=hv2
>>>>>>>>>>>> -OVS_WAIT_UNTIL([test 1 = $(grep -c "Releasing lport lsp0
> from this
>>>>>>>>>>> chassis" hv1/ovn-controller.log)])
>>>>>>>>>>>> +
>>>>>>>>>>>> +# We might see multiple "Releasing lport ...", when sb is
> read
>>>>>> only
>>>>>>>>>>>> +OVS_WAIT_UNTIL([test 1 -le $(grep -c "Releasing lport lsp0
> from
>>>>>> this
>>>>>>>>>>> chassis" hv1/ovn-controller.log)])
>>>>>>>>>>>> +
>>>>>>>>>>>>  wait_column "$hv2_uuid" Port_Binding chassis
> logical_port=lsp0
>>>>>>>>>>>>
>>>>>>>>>>>>  # (6) Chassis hv2 should add flows and hv1 should not.
>>>>>>>>>>>> @@ -15416,7 +15443,7 @@ AT_CHECK([as hv1 ovs-ofctl dump-flows
>>>>>> br-int
>>>>>>>>>>> table=0 | grep in_port=1], [0], [ig
>>>>>>>>>>>>  AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=65 | grep
>>>>>>>>>>> actions=output:1], [0], [ignore])
>>>>>>>>>>>>
>>>>>>>>>>>>  check ovn-nbctl --wait=hv lsp-set-options lsp0
>>>>>>>>>>> requested-chassis=non-existant-chassis
>>>>>>>>>>>> -OVS_WAIT_UNTIL([test 1 = $(grep -c "Releasing lport lsp0
> from this
>>>>>>>>>>> chassis" hv1/ovn-controller.log)])
>>>>>>>>>>>> +OVS_WAIT_UNTIL([test 1 -le $(grep -c "Releasing lport lsp0
> from
>>>>>> this
>>>>>>>>>>> chassis" hv1/ovn-controller.log)])
>>>>>>>>>>>>  check ovn-nbctl --wait=hv sync
>>>>>>>>>>>>  wait_column '' Port_Binding chassis logical_port=lsp0
>>>>>>>>>>>>  AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=0 | grep in_port=1], [1], [])
>>>>>>>>>>>> @@ -32418,3 +32445,119 @@ AT_CHECK([test $(ovn-sbctl list fdb
> |
>>>>>> grep
>>>>>>>> -c
>>>>>>>>>>> "00:00:00:00:10:30") = 0])
>>>>>>>>>>>>  OVN_CLEANUP([hv1])
>>>>>>>>>>>>  AT_CLEANUP
>>>>>>>>>>>>  ])
>>>>>>>>>>>> +
>>>>>>>>>>>> +OVN_FOR_EACH_NORTHD([
>>>>>>>>>>>> +AT_SETUP([recomputes])
>>>>>>>>>>>> +ovn_start
>>>>>>>>>>>> +
>>>>>>>>>>>> +n_hv=4
>>>>>>>>>>>> +
>>>>>>>>>>>> +# Add chassis
>>>>>>>>>>>> +net_add n1
>>>>>>>>>>>> +for i in $(seq 1 $n_hv); do
>>>>>>>>>>>> +    sim_add hv$i
>>>>>>>>>>>> +    as hv$i
>>>>>>>>>>>> +    check ovs-vsctl add-br br-phys
>>>>>>>>>>>> +    ovs-vsctl set open .
>>>>>>>> external-ids:ovn-bridge-mappings=phys:br-phys
>>>>>>>>>>>> +    ovn_attach n1 br-phys 192.168.0.$i 24 geneve
>>>>>>>>>>>> +done
>>>>>>>>>>>> +
>>>>>>>>>>>> +add_switch_ports() {
>>>>>>>>>>>> +    start_port=$1
>>>>>>>>>>>> +    end_port=$2
>>>>>>>>>>>> +    nb_hv=$3
>>>>>>>>>>>> +    bulk_size=$4
>>>>>>>>>>>> +    for ((i=start_port; i<end_port; )) do
>>>>>>>>>>>> +        start_bulk=$i
>>>>>>>>>>>> +        for hv in $(seq 1 $nb_hv); do
>>>>>>>>>>>> +            end_bulk=$((start_bulk+bulk_size-1))
>>>>>>>>>>>> +            for port in $(seq $start_bulk $end_bulk); do
>>>>>>>>>>>> +                logical_switch_port=lsp${port}
>>>>>>>>>>>> +                OVN_NBCTL(lsp-add ls1 $logical_switch_port)
>>>>>>>>>>>> +                OVN_NBCTL(lsp-set-addresses
> $logical_switch_port
>>>>>>>>>>> dynamic)
>>>>>>>>>>>> +            done
>>>>>>>>>>>> +            start_bulk=$((end_bulk+1))
>>>>>>>>>>>> +        done
>>>>>>>>>>>> +        RUN_OVN_NBCTL()
>>>>>>>>>>>> +
>>>>>>>>>>>> +        start_bulk=$i
>>>>>>>>>>>> +        for hv in $(seq 1 $nb_hv); do
>>>>>>>>>>>> +            end_bulk=$((start_bulk+bulk_size-1))
>>>>>>>>>>>> +            for port in $(seq $start_bulk $end_bulk); do
>>>>>>>>>>>> +                logical_switch_port=lsp${port}
>>>>>>>>>>>> +                as hv$hv ovs-vsctl \
>>>>>>>>>>>> +                    --no-wait -- add-port br-int vif${port} \
>>>>>>>>>>>> +                    -- set Interface vif${port}
>>>>>>>>>>> external_ids:iface-id=$logical_switch_port
>>>>>>>>>>>> +            done
>>>>>>>>>>>> +            start_bulk=$((end_bulk+1))
>>>>>>>>>>>> +        done
>>>>>>>>>>>> +        i=$((end_bulk+1))
>>>>>>>>>>>> +    done
>>>>>>>>>>>> +}
>>>>>>>>>>>> +check ovn-nbctl ls-add ls1
>>>>>>>>>>>> +check ovn-nbctl set Logical_Switch ls1 other_config:subnet=
>>>>>>>> 10.1.0.0/16
>>>>>>>>>>>> +check ovn-nbctl set Logical_Switch ls1
>>>>>>>>>>> other_config:exclude_ips=10.1.255.254
>>>>>>>>>>>> +
>>>>>>>>>>>> +check ovn-nbctl lr-add lr1
>>>>>>>>>>>> +check ovn-nbctl lsp-add ls1 lsp0 -- set Logical_Switch_Port
> lsp0
>>>>>>>>>>> type=router options:router-port=lrp0 addresses=dynamic
>>>>>>>>>>>> +check ovn-nbctl lrp-add lr1 lrp0 "f0:00:00:01:00:01"
>>>>>> 10.1.255.254/16
>>>>>>>>>>>> +check ovn-nbctl lr-nat-add lr1 snat 10.2.0.1 10.1.0.0/16
>>>>>>>>>>>> +
>>>>>>>>>>>> +lflow_run=0
>>>>>>>>>>>> +check ovn-nbctl --wait=hv sync
>>>>>>>>>>>> +
>>>>>>>>>>>> +# Tunnel ports might not be added (yet) at this point on slow
>>>>>> system.
>>>>>>>>>>>> +# Wait for flows related to such ports to ensure those ports
> have
>>>>>>>> been
>>>>>>>>>>> added
>>>>>>>>>>>> +# before we measure recomputes. Otherwise, ovs_interface
> handler
>>>>>>>> might
>>>>>>>>>>> be run
>>>>>>>>>>>> +# afterwards for tunnel ports, causing recomputes.
>>>>>>>>>>>> +for i in $(seq 1 $n_hv); do
>>>>>>>>>>>> +    for j in $(seq 1 $n_hv); do
>>>>>>>>>>>> +        if test $i != $j; then
>>>>>>>>>>>> +            OVN_WAIT_REMOTE_INPUT_FLOWS(["hv$i"],["hv$j"])
>>>>>>>>>>>> +        fi
>>>>>>>>>>>> +    done
>>>>>>>>>>>> +done
>>>>>>>>>>>> +
>>>>>>>>>>>> +for i in $(seq 1 $n_hv); do
>>>>>>>>>>>> +    as hv$i
>>>>>>>>>>>> +    lflow_run1=$(ovn-appctl -t ovn-controller
>>>>>> coverage/read-counter
>>>>>>>>>>> lflow_run)
>>>>>>>>>>>> +    lflow_run=`expr $lflow_run1 + $lflow_run`
>>>>>>>>>>>> +done
>>>>>>>>>>>> +
>>>>>>>>>>>> +add_switch_ports 1 1000 $n_hv 5
>>>>>>>>>>>> +
>>>>>>>>>>>> +wait_for_ports_up
>>>>>>>>>>>> +check ovn-nbctl --wait=hv sync
>>>>>>>>>>>> +
>>>>>>>>>>>> +for i in $(seq 1 $n_hv); do
>>>>>>>>>>>> +    pid=$(cat hv${i}/ovn-controller.pid)
>>>>>>>>>>>> +    u=$((u+$(cat "/proc/$pid/stat" | awk '{print $14}')))
>>>>>>>>>>>> +    s=$((s+$(cat "/proc/$pid/stat" | awk '{print $15}')))
>>>>>>>>>>>> +done
>>>>>>>>>>>> +
>>>>>>>>>>>> +n_pid=$(cat northd/ovn-northd.pid)
>>>>>>>>>>>> +n_u=$(cat "/proc/$pid/stat" | awk '{print $14}')
>>>>>>>>>>>> +n_s=$(cat "/proc/$pid/stat" | awk '{print $15}')
>>>>>>>>>>>> +
>>>>>>>>>>>> +echo "Total Northd User Time: $n_u"
>>>>>>>>>>>> +echo "Total Northd System Time: $n_s"
>>>>>>>>>>>> +echo "Total Controller User Time: $u"
>>>>>>>>>>>> +echo "Total Controller System Time: $s"
>>>>>>>>>>>> +
>>>>>>>>>>>> +lflow_run_end=0
>>>>>>>>>>>> +for i in $(seq 1 $n_hv); do
>>>>>>>>>>>> +    as hv$i
>>>>>>>>>>>> +    lflow_run1=$(ovn-appctl -t ovn-controller
>>>>>> coverage/read-counter
>>>>>>>>>>> lflow_run)
>>>>>>>>>>>> +    lflow_run_end=`expr $lflow_run1 + $lflow_run_end`
>>>>>>>>>>>> +done
>>>>>>>>>>>> +n_recomputes=`expr $lflow_run_end - $lflow_run`
>>>>>>>>>>>> +echo "$n_recomputes recomputes"
>>>>>>>>>>>> +
>>>>>>>>>>>> +AT_CHECK([test $lflow_run_end == $lflow_run])
>>>>>>>>>>>> +
>>>>>>>>>>>> +for i in $(seq 2 $n_hv); do
>>>>>>>>>>>> +    OVN_CLEANUP_SBOX([hv$i])
>>>>>>>>>>>> +done
>>>>>>>>>>>> +OVN_CLEANUP([hv1])
>>>>>>>>>>>> +AT_CLEANUP
>>>>>>>>>>>> +])
>>>>>>>>>>>> diff --git a/tests/perf-northd.at b/tests/perf-northd.at
>>>>>>>>>>>> index 74b69e9d4..6ec196b36 100644
>>>>>>>>>>>> --- a/tests/perf-northd.at
>>>>>>>>>>>> +++ b/tests/perf-northd.at
>>>>>>>>>>>> @@ -76,23 +76,6 @@ m4_define([PERF_RECORD_STOP], [
>>>>>>>>>>>>      PERF_RECORD_STOPWATCH(ovn-northd-loop, ["Short term
> average"],
>>>>>>>>>>> [Average (northd-loop in msec)])
>>>>>>>>>>>>  ])
>>>>>>>>>>>>
>>>>>>>>>>>> -# OVN_NBCTL([NBCTL_COMMAND])
>>>>>>>>>>>> -#
>>>>>>>>>>>> -# Add NBCTL_COMMAND to list of commands to be run by
>>>>>> RUN_OVN_NBCTL().
>>>>>>>>>>>> -#
>>>>>>>>>>>> -m4_define([OVN_NBCTL], [
>>>>>>>>>>>> -    command="${command} -- $1"
>>>>>>>>>>>> -])
>>>>>>>>>>>> -
>>>>>>>>>>>> -# RUN_OVN_NBCTL()
>>>>>>>>>>>> -#
>>>>>>>>>>>> -# Execute list of commands built by the OVN_NBCTL() macro.
>>>>>>>>>>>> -#
>>>>>>>>>>>> -m4_define([RUN_OVN_NBCTL], [
>>>>>>>>>>>> -    check ovn-nbctl ${command}
>>>>>>>>>>>> -    unset command
>>>>>>>>>>>> -])
>>>>>>>>>>>> -
>>>>>>>>>>>>  OVS_START_SHELL_HELPERS
>>>>>>>>>>>>  generate_subnet () {
>>>>>>>>>>>>      local a=$(printf %d $(expr $1 / 256 + 10))
>>>>>>>>>>>> --
>>>>>>>>>>>> 2.31.1
>>>>>>>>>>>>
>
Han Zhou July 21, 2022, 5:28 p.m. UTC | #15
On Thu, Jul 21, 2022 at 3:48 AM Dumitru Ceara <dceara@redhat.com> wrote:
>
> On 7/21/22 08:29, Han Zhou wrote:
> > On Wed, Jul 20, 2022 at 10:29 AM Numan Siddique <numans@ovn.org> wrote:
> >>
> >> On Wed, Jul 13, 2022 at 4:29 AM Dumitru Ceara <dceara@redhat.com> wrote:
> >>>
> >>> On 7/13/22 11:08, Dumitru Ceara wrote:
> >>>> On 7/13/22 09:40, Xavier Simonart wrote:
> >>>>> Hi Han, Dumitru
> >>>>>
> >>>>
> >>>> Hi Han, Xavier,
> >>>>
> >>>> Sorry, I had already replied to the previous message and only then
> >>>> noticed this one.
> >>>>
> >>>>> I think that we should, as much as possible, try to achieve both goals:
> >>>>> - have an accurate ovn-installed
> >>>>> - do not increase latency in large scale deployments
> >>>>>
> >>>>
> >>>> +1
> >>>>
> >>>>> The fact that ovn-installed is sent too early for mc flows is already an
> >>>>> issue today, independent of this patch.
> >>>>> Fixing ovn-installed related to mc flows by delaying the state change (for
> >>>>> all cases, included when no mc groups) might be seen as a performance
> >>>>> regression.
> >>>>>
> >>>>
> >>>> I think it will, and I'm not sure we can convince the CMS that this is
> >>>> "just metrics".
> >>>>
> >>>>> I agree that we should fix this ovn-installed issue, but it is not a
> >>>>> regression added by this patch. We should enter a BZ for it.
> >>>>> Per my understanding, the mc flows are updated when the SB_multicast_group
> >>>>> is seen as updated by ovn-controller, due to its references to port binding.
> >>>>> Other flows related to port binding are installed earlier, i.e. when
> >>>>> ovn-controller writes port_binding->chassis (i.e. before it receives SB
> >>>>> confirmation). So, while sending the mc flows earlier than what we do today
> >>>>> might be more complex, I think it makes some kind of sense (we would send
> >>>>> all those flows within the same loop).
> >>>>
> >>>> I'm inclining towards leaving it as it is today if this is the only flow
> >>>> we're missing.  It's a guess without testing things out, but I think
> >>>> it's for the MC_FLOOD_L2 multicast group which is used only for
> >>>> forwarding ARP packets originated by OVN routers or destined to a
> >>>> specific OVN router.  Losing some of those packets is not a big deal.
> >>>>
> >>>> But it might be good to confirm that this is the MC group we install the
> >>>> flow for.
> >>>>
> >>>
> >>> Oh, well, the port is also part of the MC_FLOOD group.  This is however
> >>> only used for BUM traffic.  So losing some packets here is also not
> >>> terrible, I think.
> >>>
> >>
> >> When a logical port is claimed,  we process the logical flows related
> >> to the logical port (i.e with inport == <lport> or outport == <lport>)
> >> and install
> >> the corresponding openflows. All the generic logical flows (i.e
> >> without inport or outport match) would have already been programmed
> >> (if the datapath already part of local_datapaths).
> >> These processed logical flows (lflow_handle_flows_for_lport() in
> >> lflow.c) will be most likely part of the same openflow bundle. And
> >> once the sequence number
> >> for this bundle is acknowledged we set "ovn-installed=true".  When CMS
> >> notices "ovn-installed=true" , I think it can fairly assume that the
> >> flows for the lport are
> >> programmed.
> >>
> >> I think the only flows pertaining to the logical port  which we would
> >> be missing are the multicast related flows and the logical flows which
> >> ovn-northd would generate after the logical port
> >> is claimed (presently it is the arp responder flows) and I don't think
> >> we can wait for these logical flows to be programmed by ovn-northd
> >> before setting "ovn-installed=true".
> >
> > The missing flows is just a side-effect. I am more concerned with the
> > clearness of the state-machine.
> > To my understanding it would be very clear to define the "CLAIMED" state's
> > job as claiming the port in SB-DB. If SB commit fails, the retry should
> > happen at this state. If we see the update notification (i.e. we see the
> > PB.chassis matches the desired chassis), we move to the next state
> > "INSTALL_FLOWS". Now if we move the state forward without confirming the
> > PB.chassis is updated in SB, we would need to perform the task in all the
> > following states. The only benefit we get from this is that ovn-installed
> > can be set to true a little bit earlier (to save a SB round trip), at the
> > cost of more complexity (even more so if ovn-monitor-all needs to be
> > considered in this logic) and less clarity of the state machine.
> >
> > Is it possible to address it with the simpler/clear approach first and see
> > if it really causes obvious performance regression, then we can consider
> > the "short-cuts"? I am not sure if it is some kind of premature
> > optimization at this point.
> >
>
> It's not exactly measured with the same code but I think this is still
> relevant.  A while ago we had tried to see the impact of delaying
> ovn-installed as long as the SB is readonly (last transaction still in
> progress).  The impact was measurable, up to 500msec on the 120 node
> cluster we tested on:
>
> https://mail.openvswitch.org/pipermail/ovs-dev/2022-April/393319.html
>
> This was with a real OpenShift 120 node cluster running a scale test
> used for measuring scalability and to detect performance regressions.
>
> Now, we should maybe retest with your suggested change but, unless I'm
> missing something, the impact will be similar.
>

OK, it sounds like it is critical to have ovn-installed updated earlier. I
am fine with moving the state forward before PB.chassis is updated, but
please remember to update the state machine diagram/doc to reflect that the
PB.chassis update can happen in any state, and that the CLAIMED state is
only for the initial attempt. (I didn't confirm what happens when a commit
fails and the IDL change is reverted, in particular whether that causes the
installed flows to be removed and reinstalled. I guess not, but it is
better to confirm.)
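
To make the intended behaviour concrete, here is a minimal sketch in C of a
state machine where the PB.chassis write is retried in any state after
CLAIMED. Everything in it is an assumption made for illustration: the
names (sketch_state, sketch_iface, sketch_write_chassis, sketch_iface_run)
are hypothetical and are not the actual if-status.c code, and the SB
transaction is reduced to a counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified states; not the real if-status.c definitions. */
enum sketch_state {
    SK_CLAIMED,        /* Initial attempt to write PB.chassis. */
    SK_INSTALL_FLOWS,  /* Waiting for the flows to be installed. */
    SK_INSTALLED       /* ovn-installed may be set. */
};

struct sketch_iface {
    enum sketch_state state;
    bool chassis_confirmed;         /* SB notification seen for PB.chassis. */
    bool flows_installed;           /* Flow installation acknowledged. */
    unsigned int n_chassis_writes;  /* Number of write attempts so far. */
};

/* Stand-in for queueing the PB.chassis write in the SB transaction. */
static void
sketch_write_chassis(struct sketch_iface *ifs)
{
    ifs->n_chassis_writes++;
}

/* One main-loop iteration for one interface. */
void
sketch_iface_run(struct sketch_iface *ifs, bool sb_readonly)
{
    assert(ifs != NULL);

    if (ifs->state == SK_CLAIMED) {
        /* Initial attempt: leave CLAIMED as soon as the write could be
         * queued, without waiting for the SB confirmation. */
        if (!sb_readonly) {
            sketch_write_chassis(ifs);
            ifs->state = SK_INSTALL_FLOWS;
        }
        return;
    }

    /* Any later state: retry while the update is unconfirmed and the SB
     * is writable again (for example after a failed commit). */
    if (!ifs->chassis_confirmed && !sb_readonly) {
        sketch_write_chassis(ifs);
    }

    if (ifs->state == SK_INSTALL_FLOWS && ifs->flows_installed) {
        ifs->state = SK_INSTALLED;
    }
}
```

In this sketch the retry is not tied to a particular state, which is the
point the diagram/doc would need to capture; whether the real code should
re-queue the write on every writable iteration is left open here.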

In addition, regarding ovn-monitor-all, I'd rather not add a dependency on
it in this state machine. I don't see a direct dependency here, unless the
PB.chassis update would change the monitor condition. Please correct me if
I am wrong.

Thanks,
Han

> Thanks,
> Dumitru
>
> > Thanks,
> > Han
> >>
> >> Delaying setting the ovn-installed=true would definitely result in
> >> latency.  It would not be easy for ovn-controller to keep track of
> >> openflows already programmed for a logical port
> >> In other words,  I don't think ovn-controller can accurately keep
> >> track of all the openflows related to a logical port are programmed or
> >> not  unless all these flows are grouped in one bundle.
> >>
> >> Also since the present ovn main already sets ovn-installed=true a bit
> >> early i.e. even before the multicast and arp responder flows are
> >> programmed, I think it is out of this patch's scope to address it.
> >>
> >> So I think the patch is fine with me once Xavier addresses (1) i.e
> >> remove the 'chassis_update_required'.
> >>
> >> Thanks
> >> Numan
> >>
> >>
> >>>> Thanks,
> >>>> Dumitru
> >>>>
> >>>>>
> >>>>> Thanks
> >>>>> Xavier
> >>>>>
> >>>>>
> >>>>>
> >>>>> On Wed, Jul 13, 2022 at 8:28 AM Han Zhou <hzhou@ovn.org> wrote:
> >>>>>
> >>>>>>
> >>>>>>
> >>>>>> On Tue, Jul 12, 2022 at 12:35 AM Dumitru Ceara <dceara@redhat.com>
> > wrote:
> >>>>>>>
> >>>>>>> On 7/12/22 08:52, Han Zhou wrote:
> >>>>>>>> On Mon, Jul 11, 2022 at 4:55 AM Dumitru Ceara <dceara@redhat.com>
> >>>>>> wrote:
> >>>>>>>>>
> >>>>>>>>> On 7/11/22 13:31, Xavier Simonart wrote:
> >>>>>>>>>> Hi Han
> >>>>>>>>>>
> >>>>>>>>>> Thanks for your review.
> >>>>>>>>>>
> >>>>>>>>>> Let me try to understand your two main concerns and the proper
> > way to
> >>>>>>>> fix
> >>>>>>>>>> it.
> >>>>>>>>>> 1) We only try once to write pb->chassis. If the commit fails,
> >>>>>>>> pb->chassis
> >>>>>>>>>> is not written. As commit fails, we will recompute, but as the
> >>>>>>>>>> update_required flag is not set anymore, we might end up with
no
> >>>>>>>>>> pb->chassis.
> >>>>>>>>>> => I'll remove the flag and try to update until it's confirmed.
> >>>>>>>>
> >>>>>>>> Thank you!
> >>>>>>>>
> >>>>>>>>>> 2) The state machine, and when we move to INSTALL_FLOWS.
> > Serializing
> >>>>>> the
> >>>>>>>>>> state machine, by waiting for confirmation to be received
before
> >>>>>> moving
> >>>>>>>> to
> >>>>>>>>>> INSTALL_FLOWS state will delay the ovn-installed compared to
> > today.
> >>>>>> So
> >>>>>>>> I am
> >>>>>>>>>> (still) trying to see if there is any way to prevent this in
> > some
> >>>>>> cases.
> >>>>>>>>>> Would it be correct to do this serialization (wait for
> > pb->chassis
> >>>>>>>> update
> >>>>>>>>>> confirmation) only when using conditional monitoring? When
using
> >>>>>>>>>> monitor-all, as soon as we have written (w/o confirmation)
> >>>>>> pb->chassis,
> >>>>>>>> we
> >>>>>>>>>> would move to INSTALL_FLOWS. In that loop where we wrote
> > pb->chassis,
> >>>>>>>> (all)
> >>>>>>>>>> the flows should be updated taking into account pb->chassis.
> >>>>>>>>>
> >>>>>>>>> I think this approach is OK.  ovn-controllers that don't use
> >>>>>> conditional
> >>>>>>>>> monitoring already know the complete SB contents and should be
> > able to
> >>>>>>>>> install (mostly?) complete sets of openflows that correspond to
a
> >>>>>> given
> >>>>>>>>> Port_Binding.
> >>>>>>>>>
> >>>>>>>>> It's also "CMS-friendly", at least for ovn-kubernetes, which
uses
> >>>>>>>>> ovn-monitor-all=true and only waits for
> >>>>>>>>> OVS.Interface.external_ids:ovn-installed=true.  And shouldn't
> > impact
> >>>>>> the
> >>>>>>>>> others which wait for SB.Port_Binding.up=true.
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> Xavier and Dumitru, I think we shouldn't introduce the dependency
> > of
> >>>>>>>> "ovn-monitor-all" setting here.
> >>>>>>>>
> >>>>>>>> First of all, ovn-installed is a flag for CMS to understand that
> > all
> >>>>>> the
> >>>>>>>> flows related to the port-binding is installed. If we set the
flag
> >>>>>> before
> >>>>>>>> it is truly completed, it is a bug, and it is possible that the
> > flag
> >>>>>> is set
> >>>>>>>> but some traffic doesn't work.
> >>>>>>>
> >>>>>>> I think it's a matter of semantics.  The way I see "ovn-installed=true"
> >>>>>>> is: all flows that are relevant to the port binding on the local chassis
> >>>>>>> have been installed.  When we added it it was for the ovn-k8s case;
> >>>>>>> ovn-k8s used to explicitly check if some openflow tables on the node
> >>>>>>> where the pod is being brought up contained flows that seemed to
> >>>>>>> correspond to the pod (e.g., matching on pod mac and IP addresses).
> >>>>>>>
> >>>>>> But the purpose of checking the flows (when ovn-installed wasn't
> >>>>>> available) was to make sure the pod is ready to send/receive
> > traffic. If
> >>>>>> ovn-installed can provide more accuracy, why not?
> >>>>>>
> >>>>>>>> I did a quick test, and at least a flow in (table_id=38,
> > priority=100)
> >>>>>>>> which is multicast-group related is updated AFTER the SB
> > notification
> >>>>>> is
> >>>>>>>> received for the port-binding chassis update.
> >>>>>>>>
> >>>>>>>
> >>>>>>> This sounds like something we should fix, I think.  I don't see
any
> >>>>>>> multicast-group changes conditioned by the port_binding being
> > up=true in
> >>>>>>> northd.  I might be wrong though.
> >>>>>>>
> >>>>>>
> >>>>>> It is not about "up=true". It is triggered by the
> > port-binding->chassis
> >>>>>> update. Since multicast-group has reference to port-binding, so a
> >>>>>> port-binding update triggers multicast-group change handling, which
> > is
> >>>>>> required because physical flows related to the MC group need to be
> > updated
> >>>>>> when port-binding->chassis is updated. You may argue that the IDL
> > may be
> >>>>>> optimized so that the MC group change can be triggered and handled
> > before
> >>>>>> SB is updated, but I am not sure if the benefit is worth the
> > complexity.
> >>>>>> Given how OVSDB IDL transaction is designed, I'd always think a DB
> > record
> >>>>>> is *formally* updated only after the update notification is
> > received from
> >>>>>> the server, which seems to be safe and clear.
> >>>>>>
> >>>>>>>> Secondly, if the change hasn't made it to the SB, all the other
> > nodes
> >>>>>> would
> >>>>>>>> not be able to reach the port, which means the workload (pod/VM)
> > cannot
> >>>>>>>> receive traffic yet at this phase.
> >>>>>>>>
> >>>>>>>
> >>>>>>> Even if the change made it to the SB we have no way of knowing
> > that all
> >>>>>>> other nodes processed it so we cannot know for sure that traffic
> > can
> >>>>>>> flow properly end-to-end.  But, like I said above, this doesn't
> > matter
> >>>>>>> if the semantics of ovn-installed=true are "all locally relevant
> > flows
> >>>>>>> are installed".
> >>>>>>>
> >>>>>> It's true that even SB is updated it doesn't ensure all the nodes
> >>>>>> processed it, but I view it this way: at least from the current
> > node's
> >>>>>> point of view, its job is done and the other nodes are beyond its
> > control.
> >>>>>> On the other hand, if SB update failed, its job is not done yet. I
> > am not
> >>>>>> saying this is the only *correct* way, but just the way I am seeing
> > it :).
> >>>>>>
> >>>>>>>> So, I think our goal is not to set ovn-installed early, but to
> > set it
> >>>>>>>> accurately (sometime may be ok to be conservative).
> >>>>>>>>
> >>>>>>>
> >>>>>>> Sure, but waiting for the SB port_binding.chassis update might
> > introduce
> >>>>>>> significant spikes in latency if the SB is compacting (or just
> > busy) at
> >>>>>>> that moment.
> >>>>>>>
> >>>>>>> This might become an issue in large scale deployments as pods will
> > take
> >>>>>>> longer to be declared "ready".
> >>>>>>>
> >>>>>> I understand your concern, but if you think about it, no matter how
> > the
> >>>>>> pods are *declared* ready doesn't change the fact it is ready or
> > not. It
> >>>>>> doesn't make the real flow setup faster or slower.
> >>>>>> If the CMS really wants to declare it ready earlier, it can just
> > ignore
> >>>>>> the ovn-installed flag check or flow check. What's the real benefit
> > except
> >>>>>> for metrics?
> >>>>>>
> >>>>>>>> In addition, ovn-monitor-all is not always true even in ovn-k8s.
> > It is
> >>>>>>>> configurable in ovn-k8s. (in our environment we set it to false,
> > to
> >>>>>> save
> >>>>>>>> MEM and CPU for worker nodes, while sacrifice a little for the
> > central
> >>>>>> SB
> >>>>>>>> DB)
> >>>>>>>>
> >>>>>>>
> >>>>>>> Ack.  But for this case specifically, as SB is already busier with
> >>>>>>> conditional monitoring, I think serializing events in
> > ovn-controller
> >>>>>>> will create even more visible delays in pod bringup times.
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Dumitru
> >>>>>>>
> >>>>>>>> Thanks,
> >>>>>>>> Han
> >>>>>>>>>>
> >>>>>>>>>> Thanks again for your feedback
> >>>>>>>>>>
> >>>>>>>>>> Xavier
> >>>>>>>>>>
> >>>>>>>>>> On Thu, Jul 7, 2022 at 8:49 AM Han Zhou <hzhou@ovn.org> wrote:
> >>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> On Wed, Jun 22, 2022 at 3:23 AM Xavier Simonart <
> >>>>>> xsimonar@redhat.com>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> When VIF ports are claimed on a chassis, SBDB Port_Binding table is updated.
> >>>>>>>>>>>> If the SBDB IDL is still read-only ("in transaction") when such an update
> >>>>>>>>>>>> is required, the update is not possible and recompute is triggered through
> >>>>>>>>>>>> I+P failure.
> >>>>>>>>>>>>
> >>>>>>>>>>>> This situation can happen:
> >>>>>>>>>>>> - after updating Port_Binding->chassis to SBDB for one port, in a following
> >>>>>>>>>>>>   iteration, ovn-controller handles Interface:external_ids:ovn-installed
> >>>>>>>>>>>>   (for the same port) while SBDB is still read-only.
> >>>>>>>>>>>> - after updating Port_Binding->chassis to SBDB for one port, in a following
> >>>>>>>>>>>>   iteration, ovn-controller updates Port_Binding->chassis for another port,
> >>>>>>>>>>>>   while SBDB is still read-only.
> >>>>>>>>>>>>
> >>>>>>>>>>>> This patch prevents the recompute, by having the if-status module
> >>>>>>>>>>>> updating the Port_Binding chassis (if needed) when possible.
> >>>>>>>>>>>> This does not delay Port_Binding chassis update compared to before this patch.
> >>>>>>>>>>>> - With the patch, Port_Binding chassis will be updated as soon as SBDB is
> >>>>>>>>>>>> again writable, without recompute.
> >>>>>>>>>>>> - Without the patch, Port_Binding chassis was updated as soon as SBDB was
> >>>>>>>>>>>> again writable, through a recompute.
> >>>>>>>>>>>>
> >>>>>>>>>>>> As part of this patch, ovn-installed will not be updated for additional chassis;
> >>>>>>>>>>>> it will only be updated when the migration is completed.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=2059253
> >>>>>>>>>>>> Signed-off-by: Xavier Simonart <xsimonar@redhat.com>
> >>>>>>>>>>>>
> >>>>>>>>>>>> ---
> >>>>>>>>>>>> v2:  - handled Dumitru's comments.
> >>>>>>>>>>>>      - handled Han's comments, mainly ensure we moved out of CLAIMED state
> >>>>>>>>>>>>        only after updating pb->chassis to guarantee physical flows are installed
> >>>>>>>>>>>>        when ovn-installed is updated in OVS.
> >>>>>>>>>>>>      - slightly reorganize the code to isolate 'notify_up = false' cases in
> >>>>>>>>>>>>        claim_port (i.e. ports such as virtual ports), in the idea of making
> >>>>>>>>>>>>        future patch preventing recomputes when virtual ports are claimed.
> >>>>>>>>>>>>      - updated test case to cause more race conditions.
> >>>>>>>>>>>>      - rebased on origin/main
> >>>>>>>>>>>>      - note that "additional chassis" as now supported by
> >>>>>>>>>>>>        "Support LSP:options:requested-chassis as a list" might still cause
> >>>>>>>>>>>>        recomputes.
> >>>>>>>>>>>>      - fixed missing flows when Port_Binding chassis was updated by mgr_update
> >>>>>>>>>>>>        w/o any lflow recalculation.
> >>>>>>>>>>>> v3:  - handled Dumitru's comments on v2, mainly have runtime_data handler
> >>>>>>>>>>>>        handling pb_claims when sb becomes writable (instead of a lflow handler).
> >>>>>>>>>>>>      - fixed test as it was not checking recomputes on all hv, as well as a flaky
> >>>>>>>>>>>>        behavior.
> >>>>>>>>>>>>      - rebased on origin/main.
> >>>>>>>>>>>> ---
> >>>>>>>>>>>>  controller/binding.c        | 154 +++++++++++++++++++++----------
> >>>>>>>>>>>>  controller/binding.h        |  15 +++-
> >>>>>>>>>>>>  controller/if-status.c      | 174 ++++++++++++++++++++++++++++++++----
> >>>>>>>>>>>>  controller/if-status.h      |  16 +++-
> >>>>>>>>>>>>  controller/ovn-controller.c |  72 ++++++++++++++-
> >>>>>>>>>>>>  tests/ovn-macros.at         |  12 +++
> >>>>>>>>>>>>  tests/ovn.at                | 147 +++++++++++++++++++++++++++++-
> >>>>>>>>>>>>  tests/perf-northd.at        |  17 ----
> >>>>>>>>>>>>  8 files changed, 519 insertions(+), 88 deletions(-)
> >>>>>>>>>>>>
> >>>>>>>>>>>> diff --git a/controller/binding.c b/controller/binding.c
> >>>>>>>>>>>> index 2279570f9..b21577f71 100644
> >>>>>>>>>>>> --- a/controller/binding.c
> >>>>>>>>>>>> +++ b/controller/binding.c
> >>>>>>>>>>>> @@ -644,11 +644,17 @@ local_binding_get_lport_ofport(const struct shash *local_bindings,
> >>>>>>>>>>>>  }
> >>>>>>>>>>>>
> >>>>>>>>>>>>  bool
> >>>>>>>>>>>> -local_binding_is_up(struct shash *local_bindings, const char *pb_name)
> >>>>>>>>>>>> +local_binding_is_up(struct shash *local_bindings, const char *pb_name,
> >>>>>>>>>>>> +                    const struct sbrec_chassis *chassis_rec)
> >>>>>>>>>>>>  {
> >>>>>>>>>>>>      struct local_binding *lbinding =
> >>>>>>>>>>>>          local_binding_find(local_bindings, pb_name);
> >>>>>>>>>>>>      struct binding_lport *b_lport = local_binding_get_primary_lport(lbinding);
> >>>>>>>>>>>> +
> >>>>>>>>>>>> +    if (b_lport && b_lport->pb->chassis != chassis_rec) {
> >>>>>>>>>>>> +        return false;
> >>>>>>>>>>>> +    }
> >>>>>>>>>>>> +
> >>>>>>>>>>>>      if (lbinding && b_lport && lbinding->iface) {
> >>>>>>>>>>>>          if (b_lport->pb->n_up && !b_lport->pb->up[0]) {
> >>>>>>>>>>>>              return false;
> >>>>>>>>>>>> @@ -660,13 +666,23 @@ local_binding_is_up(struct shash *local_bindings, const char *pb_name)
> >>>>>>>>>>>>  }
> >>>>>>>>>>>>
> >>>>>>>>>>>>  bool
> >>>>>>>>>>>> -local_binding_is_down(struct shash *local_bindings, const char *pb_name)
> >>>>>>>>>>>> +local_binding_is_down(struct shash *local_bindings, const char *pb_name,
> >>>>>>>>>>>> +                      const struct sbrec_chassis *chassis_rec)
> >>>>>>>>>>>>  {
> >>>>>>>>>>>>      struct local_binding *lbinding =
> >>>>>>>>>>>>          local_binding_find(local_bindings, pb_name);
> >>>>>>>>>>>>
> >>>>>>>>>>>>      struct binding_lport *b_lport = local_binding_get_primary_lport(lbinding);
> >>>>>>>>>>>>
> >>>>>>>>>>>> +    if (b_lport) {
> >>>>>>>>>>>> +        if (b_lport->pb->chassis == chassis_rec) {
> >>>>>>>>>>>> +            return false;
> >>>>>>>>>>>> +        } else if (b_lport->pb->chassis) {
> >>>>>>>>>>>> +            VLOG_DBG("lport %s already claimed by other
> > chassis",
> >>>>>>>>>>>> +                     b_lport->pb->logical_port);
> >>>>>>>>>>>> +        }
> >>>>>>>>>>>> +    }
> >>>>>>>>>>>> +
> >>>>>>>>>>>>      if (!lbinding) {
> >>>>>>>>>>>>          return true;
> >>>>>>>>>>>>      }
> >>>>>>>>>>>> @@ -884,37 +900,80 @@ get_lport_type_str(enum en_lport_type
> >>>>>>>> lport_type)
> >>>>>>>>>>>>      OVS_NOT_REACHED();
> >>>>>>>>>>>>  }
> >>>>>>>>>>>>
> >>>>>>>>>>>> -/* For newly claimed ports, if 'notify_up' is 'false':
> >>>>>>>>>>>> +void
> >>>>>>>>>>>> +set_pb_chassis_in_sbrec(const struct sbrec_port_binding *pb,
> >>>>>>>>>>>> +                        const struct sbrec_chassis
> > *chassis_rec,
> >>>>>>>>>>>> +                        bool is_set)
> >>>>>>>>>>>> +{
> >>>>>>>>>>>> +    if (pb->chassis != chassis_rec) {
> >>>>>>>>>>>> +         if (is_set) {
> >>>>>>>>>>>> +            if (pb->chassis) {
> >>>>>>>>>>>> +                VLOG_INFO("Changing chassis for lport %s
> > from %s
> >>>>>> to
> >>>>>>>>>>> %s.",
> >>>>>>>>>>>> +                          pb->logical_port,
> > pb->chassis->name,
> >>>>>>>>>>>> +                          chassis_rec->name);
> >>>>>>>>>>>> +            } else {
> >>>>>>>>>>>> +                VLOG_INFO("Claiming lport %s for this
> > chassis.",
> >>>>>>>>>>>> +                          pb->logical_port);
> >>>>>>>>>>>> +            }
> >>>>>>>>>>>> +            for (int i = 0; i < pb->n_mac; i++) {
> >>>>>>>>>>>> +                VLOG_INFO("%s: Claiming %s",
> > pb->logical_port,
> >>>>>>>>>>> pb->mac[i]);
> >>>>>>>>>>>> +            }
> >>>>>>>>>>>> +            sbrec_port_binding_set_chassis(pb, chassis_rec);
> >>>>>>>>>>>> +        }
> >>>>>>>>>>>> +    } else if (!is_set) {
> >>>>>>>>>>>> +        sbrec_port_binding_set_chassis(pb, NULL);
> >>>>>>>>>>>> +    }
> >>>>>>>>>>>> +}
> >>>>>>>>>>>> +
> >>>>>>>>>>>> +void
> >>>>>>>>>>>> +local_binding_set_pb(struct shash *local_bindings, const char *pb_name,
> >>>>>>>>>>>> +                     const struct sbrec_chassis *chassis_rec,
> >>>>>>>>>>>> +                     struct hmap *tracked_datapaths, bool is_set)
> >>>>>>>>>>>> +{
> >>>>>>>>>>>> +    struct local_binding *lbinding =
> >>>>>>>>>>>> +        local_binding_find(local_bindings, pb_name);
> >>>>>>>>>>>> +    struct binding_lport *b_lport =
> >>>>>>>>>>> local_binding_get_primary_lport(lbinding);
> >>>>>>>>>>>> +
> >>>>>>>>>>>> +    if (b_lport) {
> >>>>>>>>>>>> +        set_pb_chassis_in_sbrec(b_lport->pb, chassis_rec,
> > is_set);
> >>>>>>>>>>>> +        if (tracked_datapaths) {
> >>>>>>>>>>>> +            update_lport_tracking(b_lport->pb,
> > tracked_datapaths,
> >>>>>>>> true);
> >>>>>>>>>>>> +        }
> >>>>>>>>>>>> +    }
> >>>>>>>>>>>> +}
> >>>>>>>>>>>> +
> >>>>>>>>>>>> +/* For newly claimed ports:
> >>>>>>>>>>>>   * - set the 'pb.up' field to true if 'pb' has no
> > 'parent_pb'.
> >>>>>>>>>>>>   * - set the 'pb.up' field to true if 'parent_pb.up' is
> > 'true'
> >>>>>> (e.g.,
> >>>>>>>>>>> for
> >>>>>>>>>>>>   *   container and virtual ports).
> >>>>>>>>>>>> - * Otherwise request a notification to be sent when the OVS
> > flows
> >>>>>>>>>>>> - * corresponding to 'pb' have been installed.
> >>>>>>>>>>>> + *
> >>>>>>>>>>>> + * Returns false if lport is not claimed due to
> > 'sb_readonly'.
> >>>>>>>>>>>> + * Returns true otherwise.
> >>>>>>>>>>>>   *
> >>>>>>>>>>>>   * Note:
> >>>>>>>>>>>> - *   Updates (directly or through a notification) the
> > 'pb->up'
> >>>>>> field
> >>>>>>>>>>> only if
> >>>>>>>>>>>> - *   it's explicitly set to 'false'.
> >>>>>>>>>>>> + *   Updates the 'pb->up' field only if it's explicitly set
> > to
> >>>>>>>> 'false'.
> >>>>>>>>>>>>   *   This is to ensure compatibility with older versions of
> >>>>>>>> ovn-northd.
> >>>>>>>>>>>>   */
> >>>>>>>>>>>> -static void
> >>>>>>>>>>>> +static bool
> >>>>>>>>>>>>  claimed_lport_set_up(const struct sbrec_port_binding *pb,
> >>>>>>>>>>>>                       const struct sbrec_port_binding *parent_pb,
> >>>>>>>>>>>> -                     const struct sbrec_chassis *chassis_rec,
> >>>>>>>>>>>> -                     bool notify_up, struct if_status_mgr *if_mgr)
> >>>>>>>>>>>> +                     bool sb_readonly)
> >>>>>>>>>>>>  {
> >>>>>>>>>>>> -    if (!notify_up) {
> >>>>>>>>>>>> -        bool up = true;
> >>>>>>>>>>>> -        if (!parent_pb || (parent_pb->n_up &&
> > parent_pb->up[0])) {
> >>>>>>>>>>>> +    /* When notify_up is false in claim_port(), no state is
> >>>>>> created
> >>>>>>>>>>>> +     * by if_status_mgr. In such cases, return false (i.e.
> > trigger
> >>>>>>>>>>> recompute)
> >>>>>>>>>>>> +     * if we can't update sb (because it is readonly).
> >>>>>>>>>>>> +     */
> >>>>>>>>>>>> +    bool up = true;
> >>>>>>>>>>>> +    if (!parent_pb || (parent_pb->n_up && parent_pb->up[0]))
> > {
> >>>>>>>>>>>> +        if (!sb_readonly) {
> >>>>>>>>>>>>              if (pb->n_up) {
> >>>>>>>>>>>>                  sbrec_port_binding_set_up(pb, &up, 1);
> >>>>>>>>>>>>              }
> >>>>>>>>>>>> +        } else if (pb->n_up && !pb->up[0]) {
> >>>>>>>>>>>> +            return false;
> >>>>>>>>>>>>          }
> >>>>>>>>>>>> -        return;
> >>>>>>>>>>>> -    }
> >>>>>>>>>>>> -
> >>>>>>>>>>>> -    if (pb->chassis != chassis_rec || (pb->n_up &&
> > !pb->up[0])) {
> >>>>>>>>>>>> -        if_status_mgr_claim_iface(if_mgr, pb->logical_port);
> >>>>>>>>>>>>      }
> >>>>>>>>>>>> +    return true;
> >>>>>>>>>>>>  }
> >>>>>>>>>>>>
> >>>>>>>>>>>>  typedef void (*set_func)(const struct sbrec_port_binding *pb,
> >>>>>>>>>>>> @@ -1057,37 +1116,35 @@ claim_lport(const struct
> > sbrec_port_binding
> >>>>>>>> *pb,
> >>>>>>>>>>>>              struct hmap *tracked_datapaths,
> >>>>>>>>>>>>              struct if_status_mgr *if_mgr)
> >>>>>>>>>>>>  {
> >>>>>>>>>>>> -    if (!sb_readonly) {
> >>>>>>>>>>>> -        claimed_lport_set_up(pb, parent_pb, chassis_rec,
> >>>>>> notify_up,
> >>>>>>>>>>> if_mgr);
> >>>>>>>>>>>> -    }
> >>>>>>>>>>>> -
> >>>>>>>>>>>>      enum can_bind can_bind =
> >>>>>>>>>>> lport_can_bind_on_this_chassis(chassis_rec, pb);
> >>>>>>>>>>>>      bool update_tracked = false;
> >>>>>>>>>>>>
> >>>>>>>>>>>>      if (can_bind == CAN_BIND_AS_MAIN) {
> >>>>>>>>>>>>          if (pb->chassis != chassis_rec) {
> >>>>>>>>>>>> -            if (sb_readonly) {
> >>>>>>>>>>>> -                return false;
> >>>>>>>>>>>> -            }
> >>>>>>>>>>>> -
> >>>>>>>>>>>> -            if (pb->chassis) {
> >>>>>>>>>>>> -                VLOG_INFO("Changing chassis for lport %s
> > from %s
> >>>>>> to
> >>>>>>>>>>> %s.",
> >>>>>>>>>>>> -                        pb->logical_port, pb->chassis->name,
> >>>>>>>>>>>> -                        chassis_rec->name);
> >>>>>>>>>>>> -            } else {
> >>>>>>>>>>>> -                VLOG_INFO("Claiming lport %s for this
> > chassis.",
> >>>>>>>>>>>> -                          pb->logical_port);
> >>>>>>>>>>>> -            }
> >>>>>>>>>>>> -            for (size_t i = 0; i < pb->n_mac; i++) {
> >>>>>>>>>>>> -                VLOG_INFO("%s: Claiming %s",
> > pb->logical_port,
> >>>>>>>>>>> pb->mac[i]);
> >>>>>>>>>>>> -            }
> >>>>>>>>>>>> -
> >>>>>>>>>>>> -            sbrec_port_binding_set_chassis(pb, chassis_rec);
> >>>>>>>>>>>>              if (is_additional_chassis(pb, chassis_rec)) {
> >>>>>>>>>>>> +                if (sb_readonly) {
> >>>>>>>>>>>> +                    return false;
> >>>>>>>>>>>> +                }
> >>>>>>>>>>>>                  remove_additional_chassis(pb, chassis_rec);
> >>>>>>>>>>>>              }
> >>>>>>>>>>>>              update_tracked = true;
> >>>>>>>>>>>>          }
> >>>>>>>>>>>> +        if (!notify_up) {
> >>>>>>>>>>>> +            if (!claimed_lport_set_up(pb, parent_pb,
> >>>>>> sb_readonly)) {
> >>>>>>>>>>>> +                return false;
> >>>>>>>>>>>> +            }
> >>>>>>>>>>>> +            if (pb->chassis != chassis_rec) {
> >>>>>>>>>>>> +                if (sb_readonly) {
> >>>>>>>>>>>> +                    return false;
> >>>>>>>>>>>> +                }
> >>>>>>>>>>>> +                set_pb_chassis_in_sbrec(pb, chassis_rec,
> > true);
> >>>>>>>>>>>> +            }
> >>>>>>>>>>>> +        } else {
> >>>>>>>>>>>> +            if ((pb->chassis != chassis_rec) || (pb->n_up &&
> >>>>>>>>>>> !pb->up[0])) {
> >>>>>>>>>>>> +                if_status_mgr_claim_iface(if_mgr, pb,
> > chassis_rec,
> >>>>>>>>>>>> +                                          sb_readonly);
> >>>>>>>>>>>> +            }
> >>>>>>>>>>>> +        }
> >>>>>>>>>>>>      } else if (can_bind == CAN_BIND_AS_ADDITIONAL) {
> >>>>>>>>>>>>          if (!is_additional_chassis(pb, chassis_rec)) {
> >>>>>>>>>>>>              if (sb_readonly) {
> >>>>>>>>>>>> @@ -1132,7 +1189,8 @@ claim_lport(const struct
> > sbrec_port_binding
> >>>>>> *pb,
> >>>>>>>>>>>>   */
> >>>>>>>>>>>>  static bool
> >>>>>>>>>>>>  release_lport_main_chassis(const struct sbrec_port_binding
> > *pb,
> >>>>>>>>>>>> -                           bool sb_readonly)
> >>>>>>>>>>>> +                           bool sb_readonly,
> >>>>>>>>>>>> +                           struct if_status_mgr *if_mgr)
> >>>>>>>>>>>>  {
> >>>>>>>>>>>>      if (pb->encap) {
> >>>>>>>>>>>>          if (sb_readonly) {
> >>>>>>>>>>>> @@ -1141,11 +1199,13 @@ release_lport_main_chassis(const
> > struct
> >>>>>>>>>>> sbrec_port_binding *pb,
> >>>>>>>>>>>>          sbrec_port_binding_set_encap(pb, NULL);
> >>>>>>>>>>>>      }
> >>>>>>>>>>>>
> >>>>>>>>>>>> +    /* If sb readonly, pb->chassis unset through if-status if present. */
> >>>>>>>>>>>>      if (pb->chassis) {
> >>>>>>>>>>>> -        if (sb_readonly) {
> >>>>>>>>>>>> +        if (!sb_readonly) {
> >>>>>>>>>>>> +            sbrec_port_binding_set_chassis(pb, NULL);
> >>>>>>>>>>>> +        } else if (!if_status_mgr_iface_is_present(if_mgr,
> >>>>>>>>>>> pb->logical_port)) {
> >>>>>>>>>>>>              return false;
> >>>>>>>>>>>>          }
> >>>>>>>>>>>> -        sbrec_port_binding_set_chassis(pb, NULL);
> >>>>>>>>>>>>      }
> >>>>>>>>>>>>
> >>>>>>>>>>>>      if (pb->virtual_parent) {
> >>>>>>>>>>>> @@ -1155,7 +1215,8 @@ release_lport_main_chassis(const struct
> >>>>>>>>>>> sbrec_port_binding *pb,
> >>>>>>>>>>>>          sbrec_port_binding_set_virtual_parent(pb, NULL);
> >>>>>>>>>>>>      }
> >>>>>>>>>>>>
> >>>>>>>>>>>> -    VLOG_INFO("Releasing lport %s from this chassis.",
> >>>>>>>>>>> pb->logical_port);
> >>>>>>>>>>>> +    VLOG_INFO("Releasing lport %s from this chassis
> >>>>>>>> (sb_readonly=%d)",
> >>>>>>>>>>>> +              pb->logical_port, sb_readonly);
> >>>>>>>>>>>>      return true;
> >>>>>>>>>>>>  }
> >>>>>>>>>>>>
> >>>>>>>>>>>> @@ -1189,7 +1250,7 @@ release_lport(const struct
> > sbrec_port_binding
> >>>>>>>> *pb,
> >>>>>>>>>>>>                struct hmap *tracked_datapaths, struct
> > if_status_mgr
> >>>>>>>>>>> *if_mgr)
> >>>>>>>>>>>>  {
> >>>>>>>>>>>>      if (pb->chassis == chassis_rec) {
> >>>>>>>>>>>> -        if (!release_lport_main_chassis(pb, sb_readonly)) {
> >>>>>>>>>>>> +        if (!release_lport_main_chassis(pb, sb_readonly,
> > if_mgr))
> >>>>>> {
> >>>>>>>>>>>>              return false;
> >>>>>>>>>>>>          }
> >>>>>>>>>>>>      } else if (is_additional_chassis(pb, chassis_rec)) {
> >>>>>>>>>>>> @@ -1271,7 +1332,7 @@ consider_vif_lport_(const struct
> >>>>>>>>>>> sbrec_port_binding *pb,
> >>>>>>>>>>>>                               b_lport->lbinding->iface,
> >>>>>>>>>>>>                               !b_ctx_in->ovnsb_idl_txn,
> >>>>>>>>>>>>                               !parent_pb,
> >>>>>>>> b_ctx_out->tracked_dp_bindings,
> >>>>>>>>>>>> -                             b_ctx_out->if_mgr)){
> >>>>>>>>>>>> +                             b_ctx_out->if_mgr)) {
> >>>>>>>>>>>>                  return false;
> >>>>>>>>>>>>              }
> >>>>>>>>>>>>
> >>>>>>>>>>>> @@ -1527,7 +1588,8 @@ consider_localport(const struct
> >>>>>>>> sbrec_port_binding
> >>>>>>>>>>> *pb,
> >>>>>>>>>>>>      enum can_bind can_bind = lport_can_bind_on_this_chassis(
> >>>>>>>>>>>>          b_ctx_in->chassis_rec, pb);
> >>>>>>>>>>>>      if (can_bind == CAN_BIND_AS_MAIN) {
> >>>>>>>>>>>> -        if (!release_lport_main_chassis(pb,
> >>>>>>>> !b_ctx_in->ovnsb_idl_txn)) {
> >>>>>>>>>>>> +        if (!release_lport_main_chassis(pb,
> >>>>>> !b_ctx_in->ovnsb_idl_txn,
> >>>>>>>>>>>> +            b_ctx_out->if_mgr)) {
> >>>>>>>>>>>>              return false;
> >>>>>>>>>>>>          }
> >>>>>>>>>>>>      } else if (can_bind == CAN_BIND_AS_ADDITIONAL) {
> >>>>>>>>>>>> diff --git a/controller/binding.h b/controller/binding.h
> >>>>>>>>>>>> index 1fed06674..d20659b0b 100644
> >>>>>>>>>>>> --- a/controller/binding.h
> >>>>>>>>>>>> +++ b/controller/binding.h
> >>>>>>>>>>>> @@ -151,8 +151,10 @@ const struct sbrec_port_binding
> >>>>>>>>>>> *local_binding_get_primary_pb(
> >>>>>>>>>>>>  ofp_port_t local_binding_get_lport_ofport(const struct shash
> >>>>>>>>>>> *local_bindings,
> >>>>>>>>>>>>                                            const char
> > *pb_name);
> >>>>>>>>>>>>
> >>>>>>>>>>>> -bool local_binding_is_up(struct shash *local_bindings, const
> > char
> >>>>>>>>>>> *pb_name);
> >>>>>>>>>>>> -bool local_binding_is_down(struct shash *local_bindings,
> > const
> >>>>>> char
> >>>>>>>>>>> *pb_name);
> >>>>>>>>>>>> +bool local_binding_is_up(struct shash *local_bindings, const
> > char
> >>>>>>>>>>> *pb_name,
> >>>>>>>>>>>> +                         const struct sbrec_chassis *);
> >>>>>>>>>>>> +bool local_binding_is_down(struct shash *local_bindings,
> > const
> >>>>>> char
> >>>>>>>>>>> *pb_name,
> >>>>>>>>>>>> +                           const struct sbrec_chassis *);
> >>>>>>>>>>>>  void local_binding_set_up(struct shash *local_bindings,
> > const char
> >>>>>>>>>>> *pb_name,
> >>>>>>>>>>>>                            const struct sbrec_chassis
> > *chassis_rec,
> >>>>>>>>>>>>                            const char *ts_now_str, bool
> >>>>>> sb_readonly,
> >>>>>>>>>>>> @@ -160,7 +162,10 @@ void local_binding_set_up(struct shash
> >>>>>>>>>>> *local_bindings, const char *pb_name,
> >>>>>>>>>>>>  void local_binding_set_down(struct shash *local_bindings,
> > const
> >>>>>> char
> >>>>>>>>>>> *pb_name,
> >>>>>>>>>>>>                              const struct sbrec_chassis
> >>>>>> *chassis_rec,
> >>>>>>>>>>>>                              bool sb_readonly, bool
> > ovs_readonly);
> >>>>>>>>>>>> -
> >>>>>>>>>>>> +void local_binding_set_pb(struct shash *local_bindings,
> > const char
> >>>>>>>>>>> *pb_name,
> >>>>>>>>>>>> +                          const struct sbrec_chassis
> > *chassis_rec,
> >>>>>>>>>>>> +                          struct hmap *tracked_datapaths,
> >>>>>>>>>>>> +                          bool is_set);
> >>>>>>>>>>>>  void binding_register_ovs_idl(struct ovsdb_idl *);
> >>>>>>>>>>>>  void binding_run(struct binding_ctx_in *, struct
> > binding_ctx_out
> >>>>>> *);
> >>>>>>>>>>>>  bool binding_cleanup(struct ovsdb_idl_txn *ovnsb_idl_txn,
> >>>>>>>>>>>> @@ -178,6 +183,10 @@ void binding_dump_local_bindings(struct
> >>>>>>>>>>> local_binding_data *, struct ds *);
> >>>>>>>>>>>>  bool is_additional_chassis(const struct sbrec_port_binding
> > *pb,
> >>>>>>>>>>>>                             const struct sbrec_chassis
> >>>>>> *chassis_rec);
> >>>>>>>>>>>>
> >>>>>>>>>>>> +void set_pb_chassis_in_sbrec(const struct sbrec_port_binding
> > *pb,
> >>>>>>>>>>>> +                             const struct sbrec_chassis
> >>>>>> *chassis_rec,
> >>>>>>>>>>>> +                             bool is_set);
> >>>>>>>>>>>> +
> >>>>>>>>>>>>  /* Corresponds to each Port_Binding.type. */
> >>>>>>>>>>>>  enum en_lport_type {
> >>>>>>>>>>>>      LP_UNKNOWN,
> >>>>>>>>>>>> diff --git a/controller/if-status.c b/controller/if-status.c
> >>>>>>>>>>>> index ad61844d8..7693c289b 100644
> >>>>>>>>>>>> --- a/controller/if-status.c
> >>>>>>>>>>>> +++ b/controller/if-status.c
> >>>>>>>>>>>> @@ -24,6 +24,7 @@
> >>>>>>>>>>>>  #include "lib/util.h"
> >>>>>>>>>>>>  #include "timeval.h"
> >>>>>>>>>>>>  #include "openvswitch/vlog.h"
> >>>>>>>>>>>> +#include "lib/ovn-sb-idl.h"
> >>>>>>>>>>>>
> >>>>>>>>>>>>  VLOG_DEFINE_THIS_MODULE(if_status);
> >>>>>>>>>>>>
> >>>>>>>>>>>> @@ -53,9 +54,11 @@ VLOG_DEFINE_THIS_MODULE(if_status);
> >>>>>>>>>>>>   */
> >>>>>>>>>>>>
> >>>>>>>>>>>>  enum if_state {
> >>>>>>>>>>>> -    OIF_CLAIMED,       /* Newly claimed interface. */
> >>>>>>>>>>>> -    OIF_INSTALL_FLOWS, /* Already claimed interface for which flows are still
> >>>>>>>>>>>> -                        * being installed.
> >>>>>>>>>>>> +    OIF_CLAIMED,       /* Newly claimed interface. pb->chassis not yet updated.
> >>>>>>>>>>>> +                        */
> >>>>>>>>>>>> +    OIF_INSTALL_FLOWS, /* Claimed interface with pb->chassis successfully
> >>>>>>>>>>>> +                        * updated in SB and for which flows are still being
> >>>>>>>>>>>> +                        * installed.
> >>>>>>>>>>>>                          */
> >>>>>>>>>>>>      OIF_MARK_UP,       /* Interface with flows successfully
> >>>>>> installed
> >>>>>>>>>>> in OVS
> >>>>>>>>>>>>                          * but not yet marked "up" in the
> > binding
> >>>>>>>> module
> >>>>>>>>>>> (in
> >>>>>>>>>>>> @@ -78,6 +81,55 @@ static const char *if_state_names[] = {
> >>>>>>>>>>>>      [OIF_INSTALLED]     = "INSTALLED",
> >>>>>>>>>>>>  };
> >>>>>>>>>>>>
> >>>>>>>>>>>> +/*
> >>>>>>>>>>>> + *       +----------------------+
> >>>>>>>>>>>> + * +---> |                      |
> >>>>>>>>>>>> + * | +-> |         NULL         | <--------------------------------------+++-+
> >>>>>>>>>>>> + * | |   +----------------------+                                            |
> >>>>>>>>>>>> + * | |     ^ release_iface   | claim_iface                                   |
> >>>>>>>>>>>> + * | |     |                 V - sbrec_update_chassis(if sb is rw)           |
> >>>>>>>>>>>> + * | |   +----------------------+                                            |
> >>>>>>>>>>>> + * | |   |                      | <----------------------------------------+ |
> >>>>>>>>>>>> + * | |   |       CLAIMED        | <--------------------------------------+ | |
> >>>>>>>>>>>> + * | |   +----------------------+                                        | | |
> >>>>>>>>>>>> + * | |                  | mgr_update(when sb is rw)                      | | |
> >>>>>>>>>>>> + * | | release_iface    |  - sbrec_update_chassis                        | | |
> >>>>>>>>>>>> + * | |                  |  - request seqno                               | | |
> >>>>>>>>>>>> + * | |                  V                                                | | |
> >>>>>>>>>>>> + * | |   +----------------------+                                        | | |
> >>>>>>>>>>>> + * | +-- |                      |  mgr_run(seqno not rcvd)               | | |
> >>>>>>>>>>>> + * |     |    INSTALL_FLOWS     |   - set port down in sb                | | |
> >>>>>>>>>>>> + * |     |                      |  mgr_update()                          | | |
> >>>>>>>>>>>> + * |     +----------------------+   - sbrec_update_chassis if needed     | | |
> >>>>>>>>>>>> + * |                    |                                                | | |
> >>>>>>>>>>>> + * |                    |  mgr_run(seqno rcvd)                           | | |
> >>>>>>>>>>>> + * |                    |  - set port up in sb                           | | |
> >>>>>>>>>>>> + * | release_iface      |  - set ovn-installed in ovs                    | | |
> >>>>>>>>>>>> + * |                    V                                                | | |
> >>>>>>>>>>>> + * |   +----------------------+                                          | | |
> >>>>>>>>>>>> + * |   |                      |  mgr_run()                               | | |
> >>>>>>>>>>>> + * +-- |       MARK_UP        |  - set port up in sb                     | | |
> >>>>>>>>>>>> + *     |                      |  - set ovn-installed in ovs              | | |
> >>>>>>>>>>>> + *     |                      |  mgr_update()                            | | |
> >>>>>>>>>>>> + *     +----------------------+  - sbrec_update_chassis if needed        | | |
> >>>>>>>>>>>> + *              |                                                        | | |
> >>>>>>>>>>>> + *              | mgr_update(rcvd port up / ovn_installed & chassis set) | | |
> >>>>>>>>>>>> + *              V                                                        | | |
> >>>>>>>>>>>> + *     +----------------------+                                          | | |
> >>>>>>>>>>>> + *     |      INSTALLED       | ------------> claim_iface ---------------+ | |
> >>>>>>>>>>>> + *     +----------------------+                                            | |
> >>>>>>>>>>>> + *              |                                                          | |
> >>>>>>>>>>>> + *              | release_iface                                            | |
> >>>>>>>>>>>> + *              V                                                          | |
> >>>>>>>>>>>> + *     +----------------------+                                            | |
> >>>>>>>>>>>> + *     |                      | ------------> claim_iface -----------------+ |
> >>>>>>>>>>>> + *     |      MARK_DOWN       | ------> mgr_update(rcvd port down) ----------+
> >>>>>>>>>>>> + *     |                      | mgr_run()
> >>>>>>>>>>>> + *     |                      | - set port down in sb
> >>>>>>>>>>>> + *     |                      | mgr_update()
> >>>>>>>>>>>> + *     +----------------------+ - sbrec_update_chassis(NULL)
> >>>>>>>>>>>> + */
> >>>>>>>>>>>> +
> >>>>>>>>>>>>  struct ovs_iface {
> >>>>>>>>>>>>      char *id;               /* Extracted from OVS
> >>>>>>>>>>> external_ids.iface_id. */
> >>>>>>>>>>>>      enum if_state state;    /* State of the interface in the
> > state
> >>>>>>>>>>> machine. */
> >>>>>>>>>>>> @@ -85,6 +137,7 @@ struct ovs_iface {
> >>>>>>>>>>>>                               * be fully programmed in OVS.
> > Only
> >>>>>> used
> >>>>>>>>>>> in state
> >>>>>>>>>>>>                               * OIF_INSTALL_FLOWS.
> >>>>>>>>>>>>                               */
> >>>>>>>>>>>> +    bool chassis_update_required;  /* If true, pb->chassis
> > must be
> >>>>>>>>>>> updated. */
> >>>>>>>>>>>>  };
> >>>>>>>>>>>>
> >>>>>>>>>>>>  static uint64_t ifaces_usage;
> >>>>>>>>>>>> @@ -158,14 +211,23 @@ if_status_mgr_destroy(struct
> > if_status_mgr
> >>>>>> *mgr)
> >>>>>>>>>>>>  }
> >>>>>>>>>>>>
> >>>>>>>>>>>>  void
> >>>>>>>>>>>> -if_status_mgr_claim_iface(struct if_status_mgr *mgr, const
> > char
> >>>>>>>>>>> *iface_id)
> >>>>>>>>>>>> +if_status_mgr_claim_iface(struct if_status_mgr *mgr,
> >>>>>>>>>>>> +                          const struct sbrec_port_binding
> > *pb,
> >>>>>>>>>>>> +                          const struct sbrec_chassis
> > *chassis_rec,
> >>>>>>>>>>>> +                          bool sb_readonly)
> >>>>>>>>>>>>  {
> >>>>>>>>>>>> +    const char *iface_id = pb->logical_port;
> >>>>>>>>>>>>      struct ovs_iface *iface = shash_find_data(&mgr->ifaces,
> >>>>>>>> iface_id);
> >>>>>>>>>>>>
> >>>>>>>>>>>>      if (!iface) {
> >>>>>>>>>>>>          iface = ovs_iface_create(mgr, iface_id, OIF_CLAIMED);
> >>>>>>>>>>>>      }
> >>>>>>>>>>>> -
> >>>>>>>>>>>> +    if (!sb_readonly) {
> >>>>>>>>>>>> +        set_pb_chassis_in_sbrec(pb, chassis_rec, true);
> >>>>>>>>>>>> +        iface->chassis_update_required = false;
> >>>>>>>>>>>> +    } else {
> >>>>>>>>>>>> +        iface->chassis_update_required = true;
> >>>>>>>>>>>> +    }
> >>>>>>>>>>>>      switch (iface->state) {
> >>>>>>>>>>>>      case OIF_CLAIMED:
> >>>>>>>>>>>>      case OIF_INSTALL_FLOWS:
> >>>>>>>>>>>> @@ -182,6 +244,12 @@ if_status_mgr_claim_iface(struct
> > if_status_mgr
> >>>>>>>>>>> *mgr, const char *iface_id)
> >>>>>>>>>>>>      }
> >>>>>>>>>>>>  }
> >>>>>>>>>>>>
> >>>>>>>>>>>> +bool
> >>>>>>>>>>>> +if_status_mgr_iface_is_present(struct if_status_mgr *mgr,
> > const
> >>>>>> char
> >>>>>>>>>>> *iface_id)
> >>>>>>>>>>>> +{
> >>>>>>>>>>>> +    return !!shash_find_data(&mgr->ifaces, iface_id);
> >>>>>>>>>>>> +}
> >>>>>>>>>>>> +
> >>>>>>>>>>>>  void
> >>>>>>>>>>>>  if_status_mgr_release_iface(struct if_status_mgr *mgr, const
> > char
> >>>>>>>>>>> *iface_id)
> >>>>>>>>>>>>  {
> >>>>>>>>>>>> @@ -246,9 +314,39 @@ if_status_mgr_delete_iface(struct
> >>>>>> if_status_mgr
> >>>>>>>>>>> *mgr, const char *iface_id)
> >>>>>>>>>>>>      }
> >>>>>>>>>>>>  }
> >>>>>>>>>>>>
> >>>>>>>>>>>> +bool
> >>>>>>>>>>>> +if_status_handle_claims(struct if_status_mgr *mgr,
> >>>>>>>>>>>> +                        struct local_binding_data
> > *binding_data,
> >>>>>>>>>>>> +                        const struct sbrec_chassis
> > *chassis_rec,
> >>>>>>>>>>>> +                        struct hmap *tracked_datapath,
> >>>>>>>>>>>> +                        bool sb_readonly)
> >>>>>>>>>>>> +{
> >>>>>>>>>>>> +    if (!binding_data || sb_readonly) {
> >>>>>>>>>>>> +        return false;
> >>>>>>>>>>>> +    }
> >>>>>>>>>>>> +
> >>>>>>>>>>>> +    struct shash *bindings = &binding_data->bindings;
> >>>>>>>>>>>> +    struct hmapx_node *node;
> >>>>>>>>>>>> +
> >>>>>>>>>>>> +    bool rc = false;
> >>>>>>>>>>>> +    HMAPX_FOR_EACH (node,
> > &mgr->ifaces_per_state[OIF_CLAIMED]) {
> >>>>>>>>>>>> +        struct ovs_iface *iface = node->data;
> >>>>>>>>>>>> +        if (iface->chassis_update_required) {
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks Xavier for the revision. The state machine looks clearer now,
> >>>>>>>>>>> but I have a major concern about the use of chassis_update_required.
> >>>>>>>>>>> This bool flag is used to decide whether an update to SB is needed.
> >>>>>>>>>>> Once an SB update is requested, the flag is set to false, which assumes
> >>>>>>>>>>> that the SB update will succeed immediately. That assumption may be
> >>>>>>>>>>> wrong: the subsequent SB update can fail or be delayed for various
> >>>>>>>>>>> reasons, so this flag is not reliable. Instead, the CLAIMED state
> >>>>>>>>>>> should be responsible for making sure the SB update is completed. If a
> >>>>>>>>>>> transaction is in progress, sb_readonly is true. So if sb_readonly is
> >>>>>>>>>>> false, nothing is in progress, and we can always check
> >>>>>>>>>>> if (!sb_readonly && <SB chassis is not updated for the port-binding>);
> >>>>>>>>>>> when that holds we should just send the update, regardless of whether
> >>>>>>>>>>> we have requested it before. Please also see the other comment below
> >>>>>>>>>>> about the state transition.
> >>>>>>>>>>>
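
The check suggested above can be sketched as follows. This is only an
illustration under stated assumptions: struct chassis and struct port_binding
are hypothetical stand-ins for the real sbrec_* rows, and
claimed_needs_chassis_update() is a made-up helper name that does not appear
in the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the SB IDL rows; not the real sbrec_* types. */
struct chassis { const char *name; };
struct port_binding { const struct chassis *chassis; };

/* Returns true when the chassis update should be (re)sent for an interface
 * in CLAIMED state: the SB IDL is writable and the row as currently seen
 * does not point at our chassis.  No "already requested" flag is consulted,
 * so a failed or delayed update is simply retried on a later iteration. */
static bool
claimed_needs_chassis_update(const struct port_binding *pb,
                             const struct chassis *our_chassis,
                             bool sb_readonly)
{
    return !sb_readonly && pb->chassis != our_chassis;
}
```

Because the decision is derived from the row itself rather than from a
remembered flag, it stays correct even if an earlier transaction did not
commit.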
> >>>>>>>>>>>> +            VLOG_INFO("if_status_handle_claims for %s",
> >>>>>> iface->id);
> >>>>>>>>>>>> +            local_binding_set_pb(bindings, iface->id,
> > chassis_rec,
> >>>>>>>>>>>> +                                 tracked_datapath, true);
> >>>>>>>>>>>> +            rc = true;
> >>>>>>>>>>>> +        }
> >>>>>>>>>>>> +        iface->chassis_update_required = false;
> >>>>>>>>>>>> +    }
> >>>>>>>>>>>> +    return rc;
> >>>>>>>>>>>> +}
> >>>>>>>>>>>> +
> >>>>>>>>>>>>  void
> >>>>>>>>>>>>  if_status_mgr_update(struct if_status_mgr *mgr,
> >>>>>>>>>>>> -                     struct local_binding_data *binding_data)
> >>>>>>>>>>>> +                     struct local_binding_data *binding_data,
> >>>>>>>>>>>> +                     const struct sbrec_chassis *chassis_rec,
> >>>>>>>>>>>> +                     bool sb_readonly)
> >>>>>>>>>>>>  {
> >>>>>>>>>>>>      if (!binding_data) {
> >>>>>>>>>>>>          return;
> >>>>>>>>>>>> @@ -257,13 +355,25 @@ if_status_mgr_update(struct
> > if_status_mgr
> >>>>>> *mgr,
> >>>>>>>>>>>>      struct shash *bindings = &binding_data->bindings;
> >>>>>>>>>>>>      struct hmapx_node *node;
> >>>>>>>>>>>>
> >>>>>>>>>>>> +    /* Interfaces in OIF_MARK_UP state have already set their pb->chassis.
> >>>>>>>>>>>> +     * However, it might have been reset by another hv.
> >>>>>>>>>>>> +     */
> >>>>>>>>>>>>      /* Move all interfaces that have been confirmed "up" by
> > the
> >>>>>>>> binding
> >>>>>>>>>>> module,
> >>>>>>>>>>>>       * from OIF_MARK_UP to OIF_INSTALLED.
> >>>>>>>>>>>>       */
> >>>>>>>>>>>>      HMAPX_FOR_EACH_SAFE (node,
> >>>>>> &mgr->ifaces_per_state[OIF_MARK_UP]) {
> >>>>>>>>>>>>          struct ovs_iface *iface = node->data;
> >>>>>>>>>>>>
> >>>>>>>>>>>> -        if (local_binding_is_up(bindings, iface->id)) {
> >>>>>>>>>>>> +        if (iface->chassis_update_required) {
> >>>>>>>>>>>> +            if (!sb_readonly) {
> >>>>>>>>>>>> +                iface->chassis_update_required = false;
> >>>>>>>>>>>> +                local_binding_set_pb(bindings, iface->id,
> >>>>>>>> chassis_rec,
> >>>>>>>>>>>> +                                     NULL, true);
> >>>>>>>>>>>> +            } else {
> >>>>>>>>>>>> +                continue;
> >>>>>>>>>>>> +            }
> >>>>>>>>>>>> +        }
> >>>>>>>>>>>> +        if (local_binding_is_up(bindings, iface->id,
> >>>>>> chassis_rec)) {
> >>>>>>>>>>>>              ovs_iface_set_state(mgr, iface, OIF_INSTALLED);
> >>>>>>>>>>>>          }
> >>>>>>>>>>>>      }
> >>>>>>>>>>>> @@ -274,23 +384,53 @@ if_status_mgr_update(struct
> > if_status_mgr
> >>>>>> *mgr,
> >>>>>>>>>>>>      HMAPX_FOR_EACH_SAFE (node,
> >>>>>>>> &mgr->ifaces_per_state[OIF_MARK_DOWN]) {
> >>>>>>>>>>>>          struct ovs_iface *iface = node->data;
> >>>>>>>>>>>>
> >>>>>>>>>>>> -        if (local_binding_is_down(bindings, iface->id)) {
> >>>>>>>>>>>> +        if (!sb_readonly) {
> >>>>>>>>>>>> +            local_binding_set_pb(bindings, iface->id,
> > chassis_rec,
> >>>>>>>>>>>> +                                 NULL, false);
> >>>>>>>>>>>> +        }
> >>>>>>>>>>>> +        if (local_binding_is_down(bindings, iface->id,
> >>>>>> chassis_rec))
> >>>>>>>> {
> >>>>>>>>>>>>              ovs_iface_destroy(mgr, iface);
> >>>>>>>>>>>>          }
> >>>>>>>>>>>>      }
> >>>>>>>>>>>>
> >>>>>>>>>>>> -    /* Register for a notification about flows being
> > installed in
> >>>>>> OVS
> >>>>>>>>>>> for all
> >>>>>>>>>>>> -     * newly claimed interfaces.
> >>>>>>>>>>>> +    if (!sb_readonly) {
> >>>>>>>>>>>> +        HMAPX_FOR_EACH_SAFE (node,
> >>>>>>>>>>> &mgr->ifaces_per_state[OIF_INSTALL_FLOWS]) {
> >>>>>>>>>>>> +            struct ovs_iface *iface = node->data;
> >>>>>>>>>>>> +
> >>>>>>>>>>>> +            if (iface->chassis_update_required) {
> >>>>>>>>>>>> +                iface->chassis_update_required = false;
> >>>>>>>>>>>> +                local_binding_set_pb(bindings, iface->id,
> >>>>>>>> chassis_rec,
> >>>>>>>>>>>> +                                     NULL, true);
> >>>>>>>>>>>> +            }
> >>>>>>>>>>>> +        }
> >>>>>>>>>>>> +    }
> >>>>>>>>>>>> +
> >>>>>>>>>>>> +    /* Update Port_Binding->chassis for newly claimed
> > interfaces
> >>>>>>>>>>>> +     * Register for a notification about flows being
> > installed in
> >>>>>> OVS
> >>>>>>>>>>> for all
> >>>>>>>>>>>> +     * newly claimed interfaces for which we could update
> >>>>>>>> pb->chassis.
> >>>>>>>>>>>>       *
> >>>>>>>>>>>>       * Move them from OIF_CLAIMED to OIF_INSTALL_FLOWS.
> >>>>>>>>>>>>       */
> >>>>>>>>>>>> -    bool new_ifaces = false;
> >>>>>>>>>>>> -    HMAPX_FOR_EACH_SAFE (node,
> >>>>>> &mgr->ifaces_per_state[OIF_CLAIMED]) {
> >>>>>>>>>>>> -        struct ovs_iface *iface = node->data;
> >>>>>>>>>>>>
> >>>>>>>>>>>> -        ovs_iface_set_state(mgr, iface, OIF_INSTALL_FLOWS);
> >>>>>>>>>>>> -        iface->install_seqno = mgr->iface_seqno + 1;
> >>>>>>>>>>>> -        new_ifaces = true;
> >>>>>>>>>>>> +    bool new_ifaces = false;
> >>>>>>>>>>>> +    if (!sb_readonly) {
> >>>>>>>>>>>> +        HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_CLAIMED]) {
> >>>>>>>>>>>> +            struct ovs_iface *iface = node->data;
> >>>>>>>>>>>> +            /* No need to check for chassis_update_required as already done
> >>>>>>>>>>>> +             * in if_status_handle_claims or if_status_mgr_claim_iface
> >>>>>>>>>>>> +             */
> >>>>>>>>>>>> +            ovs_iface_set_state(mgr, iface, OIF_INSTALL_FLOWS);
> >>>>>>>>>>>
> >>>>>>>>>>> We need to make sure the chassis in the SB port-binding is up-to-date
> >>>>>>>>>>> (i.e. the update notification from the SB DB has been received) before
> >>>>>>>>>>> moving to INSTALL_FLOWS. Otherwise, it is still possible that the state
> >>>>>>>>>>> is moved too early and we end up with incomplete flow installation for
> >>>>>>>>>>> the lport when the state is finally moved to INSTALLED.
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks,
> >>>>>>>>>>> Han
> >>>>>>>>>>>
> >>>>>>>>>>>> +            iface->install_seqno = mgr->iface_seqno + 1;
> >>>>>>>>>>>> +            new_ifaces = true;
> >>>>>>>>>>>> +        }
> >>>>>>>>>>>> +    } else {
> >>>>>>>>>>>> +        HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_CLAIMED]) {
> >>>>>>>>>>>> +            struct ovs_iface *iface = node->data;
> >>>>>>>>>>>> +            static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1);
> >>>>>>>>>>>> +            VLOG_INFO_RL(&rl,
> >>>>>>>>>>>> +                         "Not updating pb chassis for %s now as "
> >>>>>>>>>>>> +                         "sb is readonly", iface->id);
> >>>>>>>>>>>> +        }
> >>>>>>>>>>>>      }
> >>>>>>>>>>>>
> >>>>>>>>>>>>      /* Request a seqno update when the flows for new interfaces have been
> >>>>>>>>>>>> @@ -403,7 +543,7 @@ if_status_mgr_update_bindings(struct if_status_mgr *mgr,
> >>>>>>>>>>>>      struct hmapx_node *node;
> >>>>>>>>>>>>
> >>>>>>>>>>>>      /* Notify the binding module to set "down" all bindings that are still
> >>>>>>>>>>>> -     * in the process of being installed in OVS, i.e., are not yet instsalled.
> >>>>>>>>>>>> +     * in the process of being installed in OVS, i.e., are not yet installed.
> >>>>>>>>>>>>       */
> >>>>>>>>>>>>      HMAPX_FOR_EACH (node, &mgr->ifaces_per_state[OIF_INSTALL_FLOWS]) {
> >>>>>>>>>>>>          struct ovs_iface *iface = node->data;
> >>>>>>>>>>>> diff --git a/controller/if-status.h b/controller/if-status.h
> >>>>>>>>>>>> index bb8a3950d..f9b05d30d 100644
> >>>>>>>>>>>> --- a/controller/if-status.h
> >>>>>>>>>>>> +++ b/controller/if-status.h
> >>>>>>>>>>>> @@ -27,15 +27,27 @@ struct if_status_mgr *if_status_mgr_create(void);
> >>>>>>>>>>>>  void if_status_mgr_clear(struct if_status_mgr *);
> >>>>>>>>>>>>  void if_status_mgr_destroy(struct if_status_mgr *);
> >>>>>>>>>>>>
> >>>>>>>>>>>> -void if_status_mgr_claim_iface(struct if_status_mgr *, const char *iface_id);
> >>>>>>>>>>>> +void if_status_mgr_claim_iface(struct if_status_mgr *,
> >>>>>>>>>>>> +                               const struct sbrec_port_binding *pb,
> >>>>>>>>>>>> +                               const struct sbrec_chassis *chassis_rec,
> >>>>>>>>>>>> +                               bool sb_readonly);
> >>>>>>>>>>>>  void if_status_mgr_release_iface(struct if_status_mgr *, const char *iface_id);
> >>>>>>>>>>>>  void if_status_mgr_delete_iface(struct if_status_mgr *, const char *iface_id);
> >>>>>>>>>>>>
> >>>>>>>>>>>> -void if_status_mgr_update(struct if_status_mgr *, struct local_binding_data *);
> >>>>>>>>>>>> +void if_status_mgr_update(struct if_status_mgr *, struct local_binding_data *,
> >>>>>>>>>>>> +                          const struct sbrec_chassis *chassis,
> >>>>>>>>>>>> +                          bool sb_readonly);
> >>>>>>>>>>>>  void if_status_mgr_run(struct if_status_mgr *mgr, struct local_binding_data *,
> >>>>>>>>>>>>                         const struct sbrec_chassis *,
> >>>>>>>>>>>>                         bool sb_readonly, bool ovs_readonly);
> >>>>>>>>>>>>  void if_status_mgr_get_memory_usage(struct if_status_mgr *mgr,
> >>>>>>>>>>>>                                      struct simap *usage);
> >>>>>>>>>>>> +bool if_status_mgr_iface_is_present(struct if_status_mgr *mgr,
> >>>>>>>>>>>> +                                    const char *iface_id);
> >>>>>>>>>>>> +bool if_status_handle_claims(struct if_status_mgr *mgr,
> >>>>>>>>>>>> +                             struct local_binding_data *binding_data,
> >>>>>>>>>>>> +                             const struct sbrec_chassis *chassis_rec,
> >>>>>>>>>>>> +                             struct hmap *tracked_datapath,
> >>>>>>>>>>>> +                             bool sb_readonly);
> >>>>>>>>>>>>
> >>>>>>>>>>>>  # endif /* controller/if-status.h */
> >>>>>>>>>>>> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
> >>>>>>>>>>>> index 69615308e..3947baf03 100644
> >>>>>>>>>>>> --- a/controller/ovn-controller.c
> >>>>>>>>>>>> +++ b/controller/ovn-controller.c
> >>>>>>>>>>>> @@ -1464,6 +1464,73 @@ en_runtime_data_run(struct engine_node *node, void *data)
> >>>>>>>>>>>>      engine_set_node_state(node, EN_UPDATED);
> >>>>>>>>>>>>  }
> >>>>>>>>>>>>
> >>>>>>>>>>>> +struct ed_type_sb_ro {
> >>>>>>>>>>>> +    bool sb_readonly;
> >>>>>>>>>>>> +};
> >>>>>>>>>>>> +
> >>>>>>>>>>>> +static void *
> >>>>>>>>>>>> +en_sb_ro_init(struct engine_node *node OVS_UNUSED,
> >>>>>>>>>>>> +              struct engine_arg *arg OVS_UNUSED)
> >>>>>>>>>>>> +{
> >>>>>>>>>>>> +    struct ed_type_sb_ro *data = xzalloc(sizeof *data);
> >>>>>>>>>>>> +    return data;
> >>>>>>>>>>>> +}
> >>>>>>>>>>>> +
> >>>>>>>>>>>> +static void
> >>>>>>>>>>>> +en_sb_ro_run(struct engine_node *node, void *data)
> >>>>>>>>>>>> +{
> >>>>>>>>>>>> +    struct ed_type_sb_ro *sb_ro_data = data;
> >>>>>>>>>>>> +    bool sb_readonly = !engine_get_context()->ovnsb_idl_txn;
> >>>>>>>>>>>> +    if (sb_ro_data->sb_readonly != sb_readonly) {
> >>>>>>>>>>>> +        sb_ro_data->sb_readonly = sb_readonly;
> >>>>>>>>>>>> +        if (!sb_ro_data->sb_readonly) {
> >>>>>>>>>>>> +            engine_set_node_state(node, EN_UPDATED);
> >>>>>>>>>>>> +        }
> >>>>>>>>>>>> +    }
> >>>>>>>>>>>> +}
> >>>>>>>>>>>> +
> >>>>>>>>>>>> +static void
> >>>>>>>>>>>> +en_sb_ro_cleanup(void *data OVS_UNUSED)
> >>>>>>>>>>>> +{
> >>>>>>>>>>>> +}
> >>>>>>>>>>>> +
> >>>>>>>>>>>> +static bool
> >>>>>>>>>>>> +runtime_data_sb_ro_handler(struct engine_node *node, void *data)
> >>>>>>>>>>>> +{
> >>>>>>>>>>>> +    const struct sbrec_chassis *chassis = NULL;
> >>>>>>>>>>>> +
> >>>>>>>>>>>> +    struct ovsrec_open_vswitch_table *ovs_table =
> >>>>>>>>>>>> +        (struct ovsrec_open_vswitch_table *)EN_OVSDB_GET(
> >>>>>>>>>>>> +            engine_get_input("OVS_open_vswitch", node));
> >>>>>>>>>>>> +
> >>>>>>>>>>>> +    const char *chassis_id = get_ovs_chassis_id(ovs_table);
> >>>>>>>>>>>> +
> >>>>>>>>>>>> +    struct ovsdb_idl_index *sbrec_chassis_by_name =
> >>>>>>>>>>>> +        engine_ovsdb_node_get_index(
> >>>>>>>>>>>> +                engine_get_input("SB_chassis", node),
> >>>>>>>>>>>> +                "name");
> >>>>>>>>>>>> +
> >>>>>>>>>>>> +    if (chassis_id) {
> >>>>>>>>>>>> +        chassis = chassis_lookup_by_name(sbrec_chassis_by_name, chassis_id);
> >>>>>>>>>>>> +    }
> >>>>>>>>>>>> +    if (chassis) {
> >>>>>>>>>>>> +        struct ed_type_runtime_data *rt_data = data;
> >>>>>>>>>>>> +        bool sb_readonly = !engine_get_context()->ovnsb_idl_txn;
> >>>>>>>>>>>> +        struct controller_engine_ctx *ctrl_ctx =
> >>>>>>>>>>>> +            engine_get_context()->client_ctx;
> >>>>>>>>>>>> +
> >>>>>>>>>>>> +        if (if_status_handle_claims(ctrl_ctx->if_mgr,
> >>>>>>>>>>>> +                                    &rt_data->lbinding_data,
> >>>>>>>>>>>> +                                    chassis,
> >>>>>>>>>>>> +                                    &rt_data->tracked_dp_bindings,
> >>>>>>>>>>>> +                                    sb_readonly)) {
> >>>>>>>>>>>> +            engine_set_node_state(node, EN_UPDATED);
> >>>>>>>>>>>> +            rt_data->tracked = true;
> >>>>>>>>>>>> +        }
> >>>>>>>>>>>> +    }
> >>>>>>>>>>>> +    return true;
> >>>>>>>>>>>> +}
> >>>>>>>>>>>> +
> >>>>>>>>>>>>  static bool
> >>>>>>>>>>>>  runtime_data_ovs_interface_shadow_handler(struct engine_node *node, void *data)
> >>>>>>>>>>>>  {
> >>>>>>>>>>>> @@ -3528,6 +3595,7 @@ main(int argc, char *argv[])
> >>>>>>>>>>>>      stopwatch_create(VIF_PLUG_RUN_STOPWATCH_NAME, SW_MS);
> >>>>>>>>>>>>
> >>>>>>>>>>>>      /* Define inc-proc-engine nodes. */
> >>>>>>>>>>>> +    ENGINE_NODE(sb_ro, "sb_ro");
> >>>>>>>>>>>>      ENGINE_NODE_WITH_CLEAR_TRACK_DATA_IS_VALID(ct_zones, "ct_zones");
> >>>>>>>>>>>>      ENGINE_NODE_WITH_CLEAR_TRACK_DATA(ovs_interface_shadow,
> >>>>>>>>>>>>                                        "ovs_interface_shadow");
> >>>>>>>>>>>> @@ -3664,6 +3732,7 @@ main(int argc, char *argv[])
> >>>>>>>>>>>>      engine_add_input(&en_ovs_interface_shadow, &en_ovs_interface,
> >>>>>>>>>>>>                       ovs_interface_shadow_ovs_interface_handler);
> >>>>>>>>>>>>
> >>>>>>>>>>>> +    engine_add_input(&en_runtime_data, &en_sb_ro, runtime_data_sb_ro_handler);
> >>>>>>>>>>>>      engine_add_input(&en_runtime_data, &en_ofctrl_is_connected, NULL);
> >>>>>>>>>>>>
> >>>>>>>>>>>>      engine_add_input(&en_runtime_data, &en_ovs_open_vswitch, NULL);
> >>>>>>>>>>>> @@ -4098,7 +4167,8 @@ main(int argc, char *argv[])
> >>>>>>>>>>>>                          runtime_data ? &runtime_data->lbinding_data : NULL;
> >>>>>>>>>>>>                      stopwatch_start(IF_STATUS_MGR_UPDATE_STOPWATCH_NAME,
> >>>>>>>>>>>>                                      time_msec());
> >>>>>>>>>>>> -                    if_status_mgr_update(if_mgr, binding_data);
> >>>>>>>>>>>> +                    if_status_mgr_update(if_mgr, binding_data, chassis,
> >>>>>>>>>>>> +                                         !ovnsb_idl_txn);
> >>>>>>>>>>>>                      stopwatch_stop(IF_STATUS_MGR_UPDATE_STOPWATCH_NAME,
> >>>>>>>>>>>>                                     time_msec());
> >>>>>>>>>>>>
> >>>>>>>>>>>> diff --git a/tests/ovn-macros.at b/tests/ovn-macros.at
> >>>>>>>>>>>> index 335f9158c..8fd6ae6f7 100644
> >>>>>>>>>>>> --- a/tests/ovn-macros.at
> >>>>>>>>>>>> +++ b/tests/ovn-macros.at
> >>>>>>>>>>>> @@ -759,3 +759,15 @@ m4_define([OVN_FOR_EACH_NORTHD_WITHOUT_DP_GROUPS],
> >>>>>>>>>>>>       [m4_foreach([NORTHD_USE_DP_GROUPS], [no],
> >>>>>>>>>>>>         [m4_foreach([NORTHD_USE_PARALLELIZATION], [yes, no], [$1
> >>>>>>>>>>>>  ])])])])
> >>>>>>>>>>>> +
> >>>>>>>>>>>> +# OVN_NBCTL(NBCTL_COMMAND) adds NBCTL_COMMAND to list of commands to be run by RUN_OVN_NBCTL().
> >>>>>>>>>>>> +m4_define([OVN_NBCTL], [
> >>>>>>>>>>>> +    command="${command} -- $1"
> >>>>>>>>>>>> +])
> >>>>>>>>>>>> +
> >>>>>>>>>>>> +# RUN_OVN_NBCTL() executes list of commands built by the OVN_NBCTL() macro.
> >>>>>>>>>>>> +m4_define([RUN_OVN_NBCTL], [
> >>>>>>>>>>>> +    check ovn-nbctl ${command}
> >>>>>>>>>>>> +    unset command
> >>>>>>>>>>>> +])
> >>>>>>>>>>>> +
> >>>>>>>>>>>> diff --git a/tests/ovn.at b/tests/ovn.at
> >>>>>>>>>>>> index bfaa41962..94d16bac9 100644
> >>>>>>>>>>>> --- a/tests/ovn.at
> >>>>>>>>>>>> +++ b/tests/ovn.at
> >>>>>>>>>>>> @@ -102,6 +102,18 @@ m4_divert_text([PREPARE_TESTS],
> >>>>>>>>>>>>           test 1 -le $(as $hv1 ovs-ofctl dump-flows br-int | grep -c "output:$ofport")
> >>>>>>>>>>>>       ])
> >>>>>>>>>>>>     }
> >>>>>>>>>>>> +
> >>>>>>>>>>>> +   ovn_wait_remote_input_flows () {
> >>>>>>>>>>>> +     hv1=$1
> >>>>>>>>>>>> +     hv2=$2
> >>>>>>>>>>>> +     echo "$3: waiting for flows for remote input on $hv1"
> >>>>>>>>>>>> +     # Wait for a flow outputing  to remote input
> >>>>>>>>>>>> +     OVS_WAIT_UNTIL([
> >>>>>>>>>>>> +         ofport=$(as $hv1 ovs-vsctl --bare --columns ofport find Interface name=ovn-${hv2}-0)
> >>>>>>>>>>>> +         echo "tunnel port=$ofport"
> >>>>>>>>>>>> +         test 1 -le $(as $hv1 ovs-ofctl dump-flows br-int | grep -c "in_port=$ofport")
> >>>>>>>>>>>> +     ])
> >>>>>>>>>>>> +   }
> >>>>>>>>>>>>  ])
> >>>>>>>>>>>>
> >>>>>>>>>>>>  m4_define([OVN_CHECK_PACKETS],
> >>>>>>>>>>>> @@ -127,6 +139,8 @@ m4_define([OVN_WAIT_PATCH_PORT_FLOWS],
> >>>>>>>>>>>>  m4_define([OVN_WAIT_REMOTE_OUTPUT_FLOWS],
> >>>>>>>>>>>>    [ovn_wait_remote_output_flows "$1" "$2" "__file__:__line__"])
> >>>>>>>>>>>>
> >>>>>>>>>>>> +m4_define([OVN_WAIT_REMOTE_INPUT_FLOWS],
> >>>>>>>>>>>> +  [ovn_wait_remote_input_flows "$1" "$2" "__file__:__line__"])
> >>>>>>>>>>>>
> >>>>>>>>>>>>  AT_BANNER([OVN components])
> >>>>>>>>>>>>
> >>>>>>>>>>>> @@ -14056,6 +14070,11 @@ wait_column "$hv1_uuid" Port_Binding requested_chassis logical_port=lsp0
> >>>>>>>>>>>>  wait_column "$hv2_uuid" Port_Binding additional_chassis logical_port=lsp0
> >>>>>>>>>>>>  wait_column "$hv2_uuid" Port_Binding requested_additional_chassis logical_port=lsp0
> >>>>>>>>>>>>
> >>>>>>>>>>>> +# Check ovn-installed updated for main chassis
> >>>>>>>>>>>> +wait_for_ports_up
> >>>>>>>>>>>> +OVS_WAIT_UNTIL([test `as hv1 ovs-vsctl get Interface lsp0 external_ids:ovn-installed` = '"true"'])
> >>>>>>>>>>>> +OVS_WAIT_UNTIL([test x`as hv2 ovs-vsctl get Interface lsp0 external_ids:ovn-installed` = x])
> >>>>>>>>>>>> +
> >>>>>>>>>>>>  # Check that setting iface:encap-ip populates Port_Binding:additional_encap
> >>>>>>>>>>>>  wait_row_count Encap 2 chassis_name=hv1
> >>>>>>>>>>>>  wait_row_count Encap 2 chassis_name=hv2
> >>>>>>>>>>>> @@ -14081,6 +14100,11 @@ wait_column "$hv2_uuid" Port_Binding requested_chassis logical_port=lsp0
> >>>>>>>>>>>>  wait_column "" Port_Binding additional_chassis logical_port=lsp0
> >>>>>>>>>>>>  wait_column "" Port_Binding requested_additional_chassis logical_port=lsp0
> >>>>>>>>>>>>
> >>>>>>>>>>>> +# Check ovn-installed updated for main chassis and not for other chassis
> >>>>>>>>>>>> +wait_for_ports_up
> >>>>>>>>>>>> +OVS_WAIT_UNTIL([test `as hv2 ovs-vsctl get Interface lsp0 external_ids:ovn-installed` = '"true"'])
> >>>>>>>>>>>> +OVS_WAIT_UNTIL([test x`as hv1 ovs-vsctl get Interface lsp0 external_ids:ovn-installed` = x])
> >>>>>>>>>>>> +
> >>>>>>>>>>>>  # Check that additional_encap is cleared
> >>>>>>>>>>>>  wait_column "" Port_Binding additional_encap logical_port=lsp0
> >>>>>>>>>>>>
> >>>>>>>>>>>> @@ -15370,7 +15394,10 @@ AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=65 | grep actions=output:1],
> >>>>>>>>>>>>  echo "verifying that lsp0 binding moves when requested-chassis is changed"
> >>>>>>>>>>>>
> >>>>>>>>>>>>  ovn-nbctl lsp-set-options lsp0 requested-chassis=hv2
> >>>>>>>>>>>> -OVS_WAIT_UNTIL([test 1 = $(grep -c "Releasing lport lsp0 from this chassis" hv1/ovn-controller.log)])
> >>>>>>>>>>>> +
> >>>>>>>>>>>> +# We might see multiple "Releasing lport ...", when sb is read only
> >>>>>>>>>>>> +OVS_WAIT_UNTIL([test 1 -le $(grep -c "Releasing lport lsp0 from this chassis" hv1/ovn-controller.log)])
> >>>>>>>>>>>> +
> >>>>>>>>>>>>  wait_column "$hv2_uuid" Port_Binding chassis logical_port=lsp0
> >>>>>>>>>>>>
> >>>>>>>>>>>>  # (6) Chassis hv2 should add flows and hv1 should not.
> >>>>>>>>>>>> @@ -15416,7 +15443,7 @@ AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=0 | grep in_port=1], [0], [ig
> >>>>>>>>>>>>  AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=65 | grep actions=output:1], [0], [ignore])
> >>>>>>>>>>>>
> >>>>>>>>>>>>  check ovn-nbctl --wait=hv lsp-set-options lsp0 requested-chassis=non-existant-chassis
> >>>>>>>>>>>> -OVS_WAIT_UNTIL([test 1 = $(grep -c "Releasing lport lsp0 from this chassis" hv1/ovn-controller.log)])
> >>>>>>>>>>>> +OVS_WAIT_UNTIL([test 1 -le $(grep -c "Releasing lport lsp0 from this chassis" hv1/ovn-controller.log)])
> >>>>>>>>>>>>  check ovn-nbctl --wait=hv sync
> >>>>>>>>>>>>  wait_column '' Port_Binding chasssi logical_port=lsp0
> >>>>>>>>>>>>  AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=0 | grep in_port=1], [1], [])
> >>>>>>>>>>>> @@ -32418,3 +32445,119 @@ AT_CHECK([test $(ovn-sbctl list fdb | grep -c "00:00:00:00:10:30") = 0])
> >>>>>>>>>>>>  OVN_CLEANUP([hv1])
> >>>>>>>>>>>>  AT_CLEANUP
> >>>>>>>>>>>>  ])
> >>>>>>>>>>>> +
> >>>>>>>>>>>> +OVN_FOR_EACH_NORTHD([
> >>>>>>>>>>>> +AT_SETUP([recomputes])
> >>>>>>>>>>>> +ovn_start
> >>>>>>>>>>>> +
> >>>>>>>>>>>> +n_hv=4
> >>>>>>>>>>>> +
> >>>>>>>>>>>> +# Add chassis
> >>>>>>>>>>>> +net_add n1
> >>>>>>>>>>>> +for i in $(seq 1 $n_hv); do
> >>>>>>>>>>>> +    sim_add hv$i
> >>>>>>>>>>>> +    as hv$i
> >>>>>>>>>>>> +    check ovs-vsctl add-br br-phys
> >>>>>>>>>>>> +    ovs-vsctl set open . external-ids:ovn-bridge-mappings=phys:br-phys
> >>>>>>>>>>>> +    ovn_attach n1 br-phys 192.168.0.$i 24 geneve
> >>>>>>>>>>>> +done
> >>>>>>>>>>>> +
> >>>>>>>>>>>> +add_switch_ports() {
> >>>>>>>>>>>> +    start_port=$1
> >>>>>>>>>>>> +    end_port=$2
> >>>>>>>>>>>> +    nb_hv=$3
> >>>>>>>>>>>> +    bulk_size=$4
> >>>>>>>>>>>> +    for ((i=start_port; i<end_port; )) do
> >>>>>>>>>>>> +        start_bulk=$i
> >>>>>>>>>>>> +        for hv in $(seq 1 $nb_hv); do
> >>>>>>>>>>>> +            end_bulk=$((start_bulk+bulk_size-1))
> >>>>>>>>>>>> +            for port in $(seq $start_bulk $end_bulk); do
> >>>>>>>>>>>> +                logical_switch_port=lsp${port}
> >>>>>>>>>>>> +                OVN_NBCTL(lsp-add ls1 $logical_switch_port)
> >>>>>>>>>>>> +                OVN_NBCTL(lsp-set-addresses $logical_switch_port dynamic)
> >>>>>>>>>>>> +            done
> >>>>>>>>>>>> +            start_bulk=$((end_bulk+1))
> >>>>>>>>>>>> +        done
> >>>>>>>>>>>> +        RUN_OVN_NBCTL()
> >>>>>>>>>>>> +
> >>>>>>>>>>>> +        start_bulk=$i
> >>>>>>>>>>>> +        for hv in $(seq 1 $nb_hv); do
> >>>>>>>>>>>> +            end_bulk=$((start_bulk+bulk_size-1))
> >>>>>>>>>>>> +            for port in $(seq $start_bulk $end_bulk); do
> >>>>>>>>>>>> +                logical_switch_port=lsp${port}
> >>>>>>>>>>>> +                as hv$hv ovs-vsctl \
> >>>>>>>>>>>> +                    --no-wait -- add-port br-int vif${port} \
> >>>>>>>>>>>> +                    -- set Interface vif${port} external_ids:iface-id=$logical_switch_port
> >>>>>>>>>>>> +            done
> >>>>>>>>>>>> +            start_bulk=$((end_bulk+1))
> >>>>>>>>>>>> +        done
> >>>>>>>>>>>> +        i=$((end_bulk+1))
> >>>>>>>>>>>> +    done
> >>>>>>>>>>>> +}
> >>>>>>>>>>>> +check ovn-nbctl ls-add ls1
> >>>>>>>>>>>> +check ovn-nbctl set Logical_Switch ls1 other_config:subnet=10.1.0.0/16
> >>>>>>>>>>>> +check ovn-nbctl set Logical_Switch ls1 other_config:exclude_ips=10.1.255.254
> >>>>>>>>>>>> +
> >>>>>>>>>>>> +check ovn-nbctl lr-add lr1
> >>>>>>>>>>>> +check ovn-nbctl lsp-add ls1 lsp0 -- set Logical_Switch_Port lsp0 type=router options:router-port=lrp0 addresses=dynamic
> >>>>>>>>>>>> +check ovn-nbctl lrp-add lr1 lrp0 "f0:00:00:01:00:01" 10.1.255.254/16
> >>>>>>>>>>>> +check ovn-nbctl lr-nat-add lr1 snat 10.2.0.1 10.1.0.0/16
> >>>>>>>>>>>> +
> >>>>>>>>>>>> +lflow_run=0
> >>>>>>>>>>>> +check ovn-nbctl --wait=hv sync
> >>>>>>>>>>>> +
> >>>>>>>>>>>> +# Tunnel ports might not be added (yet) at this point on slow system.
> >>>>>>>>>>>> +# Wait for flows related to such ports to ensure those ports have been added
> >>>>>>>>>>>> +# before we measure recomputes. Otherwise, ovs_interface handler might be run
> >>>>>>>>>>>> +# afterwards for tunnel ports, causing recomputes.
> >>>>>>>>>>>> +for i in $(seq 1 $n_hv); do
> >>>>>>>>>>>> +    for j in $(seq 1 $n_hv); do
> >>>>>>>>>>>> +        if test $i != $j; then
> >>>>>>>>>>>> +            OVN_WAIT_REMOTE_INPUT_FLOWS(["hv$i"],["hv$j"])
> >>>>>>>>>>>> +        fi
> >>>>>>>>>>>> +    done
> >>>>>>>>>>>> +done
> >>>>>>>>>>>> +
> >>>>>>>>>>>> +for i in $(seq 1 $n_hv); do
> >>>>>>>>>>>> +    as hv$i
> >>>>>>>>>>>> +    lflow_run1=$(ovn-appctl -t ovn-controller coverage/read-counter lflow_run)
> >>>>>>>>>>>> +    lflow_run=`expr $lflow_run1 + $lflow_run`
> >>>>>>>>>>>> +done
> >>>>>>>>>>>> +
> >>>>>>>>>>>> +add_switch_ports 1 1000 $n_hv 5
> >>>>>>>>>>>> +
> >>>>>>>>>>>> +wait_for_ports_up
> >>>>>>>>>>>> +check ovn-nbctl --wait=hv sync
> >>>>>>>>>>>> +
> >>>>>>>>>>>> +for i in $(seq 1 $n_hv); do
> >>>>>>>>>>>> +    pid=$(cat hv${i}/ovn-controller.pid)
> >>>>>>>>>>>> +    u=$((u+$(cat "/proc/$pid/stat" | awk '{print $14}')))
> >>>>>>>>>>>> +    s=$((s+$(cat "/proc/$pid/stat" | awk '{print $15}')))
> >>>>>>>>>>>> +done
> >>>>>>>>>>>> +
> >>>>>>>>>>>> +n_pid=$(cat northd/ovn-northd.pid)
> >>>>>>>>>>>> +n_u=$(cat "/proc/$pid/stat" | awk '{print $14}')
> >>>>>>>>>>>> +n_s=$(cat "/proc/$pid/stat" | awk '{print $15}')
> >>>>>>>>>>>> +
> >>>>>>>>>>>> +echo "Total Northd User Time: $n_u"
> >>>>>>>>>>>> +echo "Total Northd System Time: $n_s"
> >>>>>>>>>>>> +echo "Total Controller User Time: $u"
> >>>>>>>>>>>> +echo "Total Controller System Time: $s"
> >>>>>>>>>>>> +
> >>>>>>>>>>>> +lflow_run_end=0
> >>>>>>>>>>>> +for i in $(seq 1 $n_hv); do
> >>>>>>>>>>>> +    as hv$i
> >>>>>>>>>>>> +    lflow_run1=$(ovn-appctl -t ovn-controller coverage/read-counter lflow_run)
> >>>>>>>>>>>> +    lflow_run_end=`expr $lflow_run1 + $lflow_run_end`
> >>>>>>>>>>>> +done
> >>>>>>>>>>>> +n_recomputes=`expr $lflow_run_end - $lflow_run`
> >>>>>>>>>>>> +echo "$n_recomputes recomputes"
> >>>>>>>>>>>> +
> >>>>>>>>>>>> +AT_CHECK([test $lflow_run_end == $lflow_run])
> >>>>>>>>>>>> +
> >>>>>>>>>>>> +for i in $(seq 2 $n_hv); do
> >>>>>>>>>>>> +    OVN_CLEANUP_SBOX([hv$i])
> >>>>>>>>>>>> +done
> >>>>>>>>>>>> +OVN_CLEANUP([hv1])
> >>>>>>>>>>>> +AT_CLEANUP
> >>>>>>>>>>>> +])
> >>>>>>>>>>>> diff --git a/tests/perf-northd.at b/tests/perf-northd.at
> >>>>>>>>>>>> index 74b69e9d4..6ec196b36 100644
> >>>>>>>>>>>> --- a/tests/perf-northd.at
> >>>>>>>>>>>> +++ b/tests/perf-northd.at
> >>>>>>>>>>>> @@ -76,23 +76,6 @@ m4_define([PERF_RECORD_STOP], [
> >>>>>>>>>>>>      PERF_RECORD_STOPWATCH(ovn-northd-loop, ["Short term average"], [Average (northd-loop in msec)])
> >>>>>>>>>>>>  ])
> >>>>>>>>>>>>
> >>>>>>>>>>>> -# OVN_NBCTL([NBCTL_COMMAND])
> >>>>>>>>>>>> -#
> >>>>>>>>>>>> -# Add NBCTL_COMMAND to list of commands to be run by RUN_OVN_NBCTL().
> >>>>>>>>>>>> -#
> >>>>>>>>>>>> -m4_define([OVN_NBCTL], [
> >>>>>>>>>>>> -    command="${command} -- $1"
> >>>>>>>>>>>> -])
> >>>>>>>>>>>> -
> >>>>>>>>>>>> -# RUN_OVN_NBCTL()
> >>>>>>>>>>>> -#
> >>>>>>>>>>>> -# Execute list of commands built by the OVN_NBCTL() macro.
> >>>>>>>>>>>> -#
> >>>>>>>>>>>> -m4_define([RUN_OVN_NBCTL], [
> >>>>>>>>>>>> -    check ovn-nbctl ${command}
> >>>>>>>>>>>> -    unset command
> >>>>>>>>>>>> -])
> >>>>>>>>>>>> -
> >>>>>>>>>>>>  OVS_START_SHELL_HELPERS
> >>>>>>>>>>>>  generate_subnet () {
> >>>>>>>>>>>>      local a=$(printf %d $(expr $1 / 256 + 10))
> >>>>>>>>>>>> --
> >>>>>>>>>>>> 2.31.1
> >>>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>
> >>>>>>
> >>>>>
> >>>
> >>>
> >
>
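
The thread above discusses deferring the Port_Binding->chassis write while
the SB IDL is read-only, instead of failing I+P and triggering a recompute.
The following is only a minimal, self-contained sketch of that retry
pattern. It does not use the OVN code: toy_state, toy_iface and toy_update
are hypothetical names invented for illustration, and "chassis_set" merely
models pb->chassis pointing at the local chassis.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the if-status states. */
enum toy_state { TOY_CLAIMED, TOY_INSTALL_FLOWS };

struct toy_iface {
    enum toy_state state;
    bool chassis_set;     /* Models Port_Binding->chassis pointing at us. */
};

/* Tries to advance every claimed interface.  When the SB is read-only,
 * nothing is written and the interfaces stay in TOY_CLAIMED so that a
 * later call can retry.  Returns the number of interfaces advanced. */
static size_t
toy_update(struct toy_iface *ifaces, size_t n, bool sb_readonly)
{
    size_t advanced = 0;

    assert(ifaces || !n);
    if (sb_readonly) {
        return 0;
    }
    for (size_t i = 0; i < n; i++) {
        if (ifaces[i].state == TOY_CLAIMED) {
            ifaces[i].chassis_set = true;
            ifaces[i].state = TOY_INSTALL_FLOWS;
            advanced++;
        }
    }
    return advanced;
}
```

Calling toy_update() with sb_readonly set leaves every interface claimed;
calling it again once the SB is writable advances them without any global
recomputation. Note that the sketch advances the state as soon as the write
is issued. It does not model waiting for the SB update notification, which
is what Han's comment asks for before moving to INSTALL_FLOWS.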

Patch

diff --git a/controller/binding.c b/controller/binding.c
index 2279570f9..b21577f71 100644
--- a/controller/binding.c
+++ b/controller/binding.c
@@ -644,11 +644,17 @@  local_binding_get_lport_ofport(const struct shash *local_bindings,
 }
 
 bool
-local_binding_is_up(struct shash *local_bindings, const char *pb_name)
+local_binding_is_up(struct shash *local_bindings, const char *pb_name,
+                    const struct sbrec_chassis *chassis_rec)
 {
     struct local_binding *lbinding =
         local_binding_find(local_bindings, pb_name);
     struct binding_lport *b_lport = local_binding_get_primary_lport(lbinding);
+
+    if (b_lport && b_lport->pb->chassis != chassis_rec) {
+        return false;
+    }
+
     if (lbinding && b_lport && lbinding->iface) {
         if (b_lport->pb->n_up && !b_lport->pb->up[0]) {
             return false;
@@ -660,13 +666,23 @@  local_binding_is_up(struct shash *local_bindings, const char *pb_name)
 }
 
 bool
-local_binding_is_down(struct shash *local_bindings, const char *pb_name)
+local_binding_is_down(struct shash *local_bindings, const char *pb_name,
+                      const struct sbrec_chassis *chassis_rec)
 {
     struct local_binding *lbinding =
         local_binding_find(local_bindings, pb_name);
 
     struct binding_lport *b_lport = local_binding_get_primary_lport(lbinding);
 
+    if (b_lport) {
+        if (b_lport->pb->chassis == chassis_rec) {
+            return false;
+        } else if (b_lport->pb->chassis) {
+            VLOG_DBG("lport %s already claimed by other chassis",
+                     b_lport->pb->logical_port);
+        }
+    }
+
     if (!lbinding) {
         return true;
     }
@@ -884,37 +900,80 @@  get_lport_type_str(enum en_lport_type lport_type)
     OVS_NOT_REACHED();
 }
 
-/* For newly claimed ports, if 'notify_up' is 'false':
+void
+set_pb_chassis_in_sbrec(const struct sbrec_port_binding *pb,
+                        const struct sbrec_chassis *chassis_rec,
+                        bool is_set)
+{
+    if (pb->chassis != chassis_rec) {
+        if (is_set) {
+            if (pb->chassis) {
+                VLOG_INFO("Changing chassis for lport %s from %s to %s.",
+                          pb->logical_port, pb->chassis->name,
+                          chassis_rec->name);
+            } else {
+                VLOG_INFO("Claiming lport %s for this chassis.",
+                          pb->logical_port);
+            }
+            for (size_t i = 0; i < pb->n_mac; i++) {
+                VLOG_INFO("%s: Claiming %s", pb->logical_port, pb->mac[i]);
+            }
+            sbrec_port_binding_set_chassis(pb, chassis_rec);
+        }
+    } else if (!is_set) {
+        sbrec_port_binding_set_chassis(pb, NULL);
+    }
+}
+
+void
+local_binding_set_pb(struct shash *local_bindings, const char *pb_name,
+                     const struct sbrec_chassis *chassis_rec,
+                     struct hmap *tracked_datapaths, bool is_set)
+{
+    struct local_binding *lbinding =
+        local_binding_find(local_bindings, pb_name);
+    struct binding_lport *b_lport = local_binding_get_primary_lport(lbinding);
+
+    if (b_lport) {
+        set_pb_chassis_in_sbrec(b_lport->pb, chassis_rec, is_set);
+        if (tracked_datapaths) {
+            update_lport_tracking(b_lport->pb, tracked_datapaths, true);
+        }
+    }
+}
+
+/* For newly claimed ports:
  * - set the 'pb.up' field to true if 'pb' has no 'parent_pb'.
  * - set the 'pb.up' field to true if 'parent_pb.up' is 'true' (e.g., for
  *   container and virtual ports).
- * Otherwise request a notification to be sent when the OVS flows
- * corresponding to 'pb' have been installed.
+ *
+ * Returns false if lport is not claimed due to 'sb_readonly'.
+ * Returns true otherwise.
  *
  * Note:
- *   Updates (directly or through a notification) the 'pb->up' field only if
- *   it's explicitly set to 'false'.
+ *   Updates the 'pb->up' field only if it's explicitly set to 'false'.
  *   This is to ensure compatibility with older versions of ovn-northd.
  */
-static void
+static bool
 claimed_lport_set_up(const struct sbrec_port_binding *pb,
                      const struct sbrec_port_binding *parent_pb,
-                     const struct sbrec_chassis *chassis_rec,
-                     bool notify_up, struct if_status_mgr *if_mgr)
+                     bool sb_readonly)
 {
-    if (!notify_up) {
-        bool up = true;
-        if (!parent_pb || (parent_pb->n_up && parent_pb->up[0])) {
+    /* When notify_up is false in claim_port(), no state is created
+     * by if_status_mgr. In such cases, return false (i.e. trigger recompute)
+     * if we can't update sb (because it is readonly).
+     */
+    bool up = true;
+    if (!parent_pb || (parent_pb->n_up && parent_pb->up[0])) {
+        if (!sb_readonly) {
             if (pb->n_up) {
                 sbrec_port_binding_set_up(pb, &up, 1);
             }
+        } else if (pb->n_up && !pb->up[0]) {
+            return false;
         }
-        return;
-    }
-
-    if (pb->chassis != chassis_rec || (pb->n_up && !pb->up[0])) {
-        if_status_mgr_claim_iface(if_mgr, pb->logical_port);
     }
+    return true;
 }
 
 typedef void (*set_func)(const struct sbrec_port_binding *pb,
@@ -1057,37 +1116,35 @@  claim_lport(const struct sbrec_port_binding *pb,
             struct hmap *tracked_datapaths,
             struct if_status_mgr *if_mgr)
 {
-    if (!sb_readonly) {
-        claimed_lport_set_up(pb, parent_pb, chassis_rec, notify_up, if_mgr);
-    }
-
     enum can_bind can_bind = lport_can_bind_on_this_chassis(chassis_rec, pb);
     bool update_tracked = false;
 
     if (can_bind == CAN_BIND_AS_MAIN) {
         if (pb->chassis != chassis_rec) {
-            if (sb_readonly) {
-                return false;
-            }
-
-            if (pb->chassis) {
-                VLOG_INFO("Changing chassis for lport %s from %s to %s.",
-                        pb->logical_port, pb->chassis->name,
-                        chassis_rec->name);
-            } else {
-                VLOG_INFO("Claiming lport %s for this chassis.",
-                          pb->logical_port);
-            }
-            for (size_t i = 0; i < pb->n_mac; i++) {
-                VLOG_INFO("%s: Claiming %s", pb->logical_port, pb->mac[i]);
-            }
-
-            sbrec_port_binding_set_chassis(pb, chassis_rec);
             if (is_additional_chassis(pb, chassis_rec)) {
+                if (sb_readonly) {
+                    return false;
+                }
                 remove_additional_chassis(pb, chassis_rec);
             }
             update_tracked = true;
         }
+        if (!notify_up) {
+            if (!claimed_lport_set_up(pb, parent_pb, sb_readonly)) {
+                return false;
+            }
+            if (pb->chassis != chassis_rec) {
+                if (sb_readonly) {
+                    return false;
+                }
+                set_pb_chassis_in_sbrec(pb, chassis_rec, true);
+            }
+        } else {
+            if ((pb->chassis != chassis_rec) || (pb->n_up && !pb->up[0])) {
+                if_status_mgr_claim_iface(if_mgr, pb, chassis_rec,
+                                          sb_readonly);
+            }
+        }
     } else if (can_bind == CAN_BIND_AS_ADDITIONAL) {
         if (!is_additional_chassis(pb, chassis_rec)) {
             if (sb_readonly) {
@@ -1132,7 +1189,8 @@  claim_lport(const struct sbrec_port_binding *pb,
  */
 static bool
 release_lport_main_chassis(const struct sbrec_port_binding *pb,
-                           bool sb_readonly)
+                           bool sb_readonly,
+                           struct if_status_mgr *if_mgr)
 {
     if (pb->encap) {
         if (sb_readonly) {
@@ -1141,11 +1199,13 @@  release_lport_main_chassis(const struct sbrec_port_binding *pb,
         sbrec_port_binding_set_encap(pb, NULL);
     }
 
+    /* If SB is read-only, if-status unsets pb->chassis (if iface present). */
     if (pb->chassis) {
-        if (sb_readonly) {
+        if (!sb_readonly) {
+            sbrec_port_binding_set_chassis(pb, NULL);
+        } else if (!if_status_mgr_iface_is_present(if_mgr, pb->logical_port)) {
             return false;
         }
-        sbrec_port_binding_set_chassis(pb, NULL);
     }
 
     if (pb->virtual_parent) {
@@ -1155,7 +1215,8 @@  release_lport_main_chassis(const struct sbrec_port_binding *pb,
         sbrec_port_binding_set_virtual_parent(pb, NULL);
     }
 
-    VLOG_INFO("Releasing lport %s from this chassis.", pb->logical_port);
+    VLOG_INFO("Releasing lport %s from this chassis (sb_readonly=%d)",
+              pb->logical_port, sb_readonly);
     return true;
 }
 
@@ -1189,7 +1250,7 @@  release_lport(const struct sbrec_port_binding *pb,
               struct hmap *tracked_datapaths, struct if_status_mgr *if_mgr)
 {
     if (pb->chassis == chassis_rec) {
-        if (!release_lport_main_chassis(pb, sb_readonly)) {
+        if (!release_lport_main_chassis(pb, sb_readonly, if_mgr)) {
             return false;
         }
     } else if (is_additional_chassis(pb, chassis_rec)) {
@@ -1271,7 +1332,7 @@  consider_vif_lport_(const struct sbrec_port_binding *pb,
                              b_lport->lbinding->iface,
                              !b_ctx_in->ovnsb_idl_txn,
                              !parent_pb, b_ctx_out->tracked_dp_bindings,
-                             b_ctx_out->if_mgr)){
+                             b_ctx_out->if_mgr)) {
                 return false;
             }
 
@@ -1527,7 +1588,8 @@  consider_localport(const struct sbrec_port_binding *pb,
     enum can_bind can_bind = lport_can_bind_on_this_chassis(
         b_ctx_in->chassis_rec, pb);
     if (can_bind == CAN_BIND_AS_MAIN) {
-        if (!release_lport_main_chassis(pb, !b_ctx_in->ovnsb_idl_txn)) {
+        if (!release_lport_main_chassis(pb, !b_ctx_in->ovnsb_idl_txn,
+                                        b_ctx_out->if_mgr)) {
             return false;
         }
     } else if (can_bind == CAN_BIND_AS_ADDITIONAL) {
diff --git a/controller/binding.h b/controller/binding.h
index 1fed06674..d20659b0b 100644
--- a/controller/binding.h
+++ b/controller/binding.h
@@ -151,8 +151,10 @@  const struct sbrec_port_binding *local_binding_get_primary_pb(
 ofp_port_t local_binding_get_lport_ofport(const struct shash *local_bindings,
                                           const char *pb_name);
 
-bool local_binding_is_up(struct shash *local_bindings, const char *pb_name);
-bool local_binding_is_down(struct shash *local_bindings, const char *pb_name);
+bool local_binding_is_up(struct shash *local_bindings, const char *pb_name,
+                         const struct sbrec_chassis *);
+bool local_binding_is_down(struct shash *local_bindings, const char *pb_name,
+                           const struct sbrec_chassis *);
 void local_binding_set_up(struct shash *local_bindings, const char *pb_name,
                           const struct sbrec_chassis *chassis_rec,
                           const char *ts_now_str, bool sb_readonly,
@@ -160,7 +162,10 @@  void local_binding_set_up(struct shash *local_bindings, const char *pb_name,
 void local_binding_set_down(struct shash *local_bindings, const char *pb_name,
                             const struct sbrec_chassis *chassis_rec,
                             bool sb_readonly, bool ovs_readonly);
-
+void local_binding_set_pb(struct shash *local_bindings, const char *pb_name,
+                          const struct sbrec_chassis *chassis_rec,
+                          struct hmap *tracked_datapaths,
+                          bool is_set);
 void binding_register_ovs_idl(struct ovsdb_idl *);
 void binding_run(struct binding_ctx_in *, struct binding_ctx_out *);
 bool binding_cleanup(struct ovsdb_idl_txn *ovnsb_idl_txn,
@@ -178,6 +183,10 @@  void binding_dump_local_bindings(struct local_binding_data *, struct ds *);
 bool is_additional_chassis(const struct sbrec_port_binding *pb,
                            const struct sbrec_chassis *chassis_rec);
 
+void set_pb_chassis_in_sbrec(const struct sbrec_port_binding *pb,
+                             const struct sbrec_chassis *chassis_rec,
+                             bool is_set);
+
 /* Corresponds to each Port_Binding.type. */
 enum en_lport_type {
     LP_UNKNOWN,
diff --git a/controller/if-status.c b/controller/if-status.c
index ad61844d8..7693c289b 100644
--- a/controller/if-status.c
+++ b/controller/if-status.c
@@ -24,6 +24,7 @@ 
 #include "lib/util.h"
 #include "timeval.h"
 #include "openvswitch/vlog.h"
+#include "lib/ovn-sb-idl.h"
 
 VLOG_DEFINE_THIS_MODULE(if_status);
 
@@ -53,9 +54,11 @@  VLOG_DEFINE_THIS_MODULE(if_status);
  */
 
 enum if_state {
-    OIF_CLAIMED,       /* Newly claimed interface. */
-    OIF_INSTALL_FLOWS, /* Already claimed interface for which flows are still
-                        * being installed.
+    OIF_CLAIMED,       /* Newly claimed interface. pb->chassis not yet updated.
+                        */
+    OIF_INSTALL_FLOWS, /* Claimed interface with pb->chassis successfully
+                        * updated in SB and for which flows are still being
+                        * installed.
                         */
     OIF_MARK_UP,       /* Interface with flows successfully installed in OVS
                         * but not yet marked "up" in the binding module (in
@@ -78,6 +81,55 @@  static const char *if_state_names[] = {
     [OIF_INSTALLED]     = "INSTALLED",
 };
 
+/*
+ *       +----------------------+
+ * +---> |                      |
+ * | +-> |         NULL         | <--------------------------------------+++-+
+ * | |   +----------------------+                                            |
+ * | |     ^ release_iface   | claim_iface                                   |
+ * | |     |                 V - sbrec_update_chassis(if sb is rw)           |
+ * | |   +----------------------+                                            |
+ * | |   |                      | <----------------------------------------+ |
+ * | |   |       CLAIMED        | <--------------------------------------+ | |
+ * | |   +----------------------+                                        | | |
+ * | |                  | mgr_update(when sb is rw)                      | | |
+ * | | release_iface    |  - sbrec_update_chassis                        | | |
+ * | |                  |  - request seqno                               | | |
+ * | |                  V                                                | | |
+ * | |   +----------------------+                                        | | |
+ * | +-- |                      |  mgr_run(seqno not rcvd)               | | |
+ * |     |    INSTALL_FLOWS     |   - set port down in sb                | | |
+ * |     |                      |  mgr_update()                          | | |
+ * |     +----------------------+   - sbrec_update_chassis if needed     | | |
+ * |                    |                                                | | |
+ * |                    |  mgr_run(seqno rcvd)                           | | |
+ * |                    |  - set port up in sb                           | | |
+ * | release_iface      |  - set ovn-installed in ovs                    | | |
+ * |                    V                                                | | |
+ * |   +----------------------+                                          | | |
+ * |   |                      |  mgr_run()                               | | |
+ * +-- |       MARK_UP        |  - set port up in sb                     | | |
+ *     |                      |  - set ovn-installed in ovs              | | |
+ *     |                      |  mgr_update()                            | | |
+ *     +----------------------+  - sbrec_update_chassis if needed        | | |
+ *              |                                                        | | |
+ *              | mgr_update(rcvd port up / ovn_installed & chassis set) | | |
+ *              V                                                        | | |
+ *     +----------------------+                                          | | |
+ *     |      INSTALLED       | ------------> claim_iface ---------------+ | |
+ *     +----------------------+                                            | |
+ *              |                                                          | |
+ *              | release_iface                                            | |
+ *              V                                                          | |
+ *     +----------------------+                                            | |
+ *     |                      | ------------> claim_iface -----------------+ |
+ *     |      MARK_DOWN       | ------> mgr_update(rcvd port down) ----------+
+ *     |                      | mgr_run()
+ *     |                      | - set port down in sb
+ *     |                      | mgr_update()
+ *     +----------------------+ - sbrec_update_chassis(NULL)
+ */
+
 struct ovs_iface {
     char *id;               /* Extracted from OVS external_ids.iface_id. */
     enum if_state state;    /* State of the interface in the state machine. */
@@ -85,6 +137,7 @@  struct ovs_iface {
                              * be fully programmed in OVS.  Only used in state
                              * OIF_INSTALL_FLOWS.
                              */
+    bool chassis_update_required;  /* If true, pb->chassis must be updated. */
 };
 
 static uint64_t ifaces_usage;
@@ -158,14 +211,23 @@  if_status_mgr_destroy(struct if_status_mgr *mgr)
 }
 
 void
-if_status_mgr_claim_iface(struct if_status_mgr *mgr, const char *iface_id)
+if_status_mgr_claim_iface(struct if_status_mgr *mgr,
+                          const struct sbrec_port_binding *pb,
+                          const struct sbrec_chassis *chassis_rec,
+                          bool sb_readonly)
 {
+    const char *iface_id = pb->logical_port;
     struct ovs_iface *iface = shash_find_data(&mgr->ifaces, iface_id);
 
     if (!iface) {
         iface = ovs_iface_create(mgr, iface_id, OIF_CLAIMED);
     }
-
+    if (!sb_readonly) {
+        set_pb_chassis_in_sbrec(pb, chassis_rec, true);
+        iface->chassis_update_required = false;
+    } else {
+        iface->chassis_update_required = true;
+    }
     switch (iface->state) {
     case OIF_CLAIMED:
     case OIF_INSTALL_FLOWS:
@@ -182,6 +244,12 @@  if_status_mgr_claim_iface(struct if_status_mgr *mgr, const char *iface_id)
     }
 }
 
+bool
+if_status_mgr_iface_is_present(struct if_status_mgr *mgr, const char *iface_id)
+{
+    return !!shash_find_data(&mgr->ifaces, iface_id);
+}
+
 void
 if_status_mgr_release_iface(struct if_status_mgr *mgr, const char *iface_id)
 {
@@ -246,9 +314,39 @@  if_status_mgr_delete_iface(struct if_status_mgr *mgr, const char *iface_id)
     }
 }
 
+bool
+if_status_handle_claims(struct if_status_mgr *mgr,
+                        struct local_binding_data *binding_data,
+                        const struct sbrec_chassis *chassis_rec,
+                        struct hmap *tracked_datapath,
+                        bool sb_readonly)
+{
+    if (!binding_data || sb_readonly) {
+        return false;
+    }
+
+    struct shash *bindings = &binding_data->bindings;
+    struct hmapx_node *node;
+
+    bool rc = false;
+    HMAPX_FOR_EACH (node, &mgr->ifaces_per_state[OIF_CLAIMED]) {
+        struct ovs_iface *iface = node->data;
+        if (iface->chassis_update_required) {
+            VLOG_INFO("if_status_handle_claims for %s", iface->id);
+            local_binding_set_pb(bindings, iface->id, chassis_rec,
+                                 tracked_datapath, true);
+            rc = true;
+        }
+        iface->chassis_update_required = false;
+    }
+    return rc;
+}
+
 void
 if_status_mgr_update(struct if_status_mgr *mgr,
-                     struct local_binding_data *binding_data)
+                     struct local_binding_data *binding_data,
+                     const struct sbrec_chassis *chassis_rec,
+                     bool sb_readonly)
 {
     if (!binding_data) {
         return;
@@ -257,13 +355,25 @@  if_status_mgr_update(struct if_status_mgr *mgr,
     struct shash *bindings = &binding_data->bindings;
     struct hmapx_node *node;
 
+    /* Interfaces in OIF_MARK_UP state have already set their pb->chassis.
+     * However, it might have been reset by another hv.
+     */
     /* Move all interfaces that have been confirmed "up" by the binding module,
      * from OIF_MARK_UP to OIF_INSTALLED.
      */
     HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_MARK_UP]) {
         struct ovs_iface *iface = node->data;
 
-        if (local_binding_is_up(bindings, iface->id)) {
+        if (iface->chassis_update_required) {
+            if (!sb_readonly) {
+                iface->chassis_update_required = false;
+                local_binding_set_pb(bindings, iface->id, chassis_rec,
+                                     NULL, true);
+            } else {
+                continue;
+            }
+        }
+        if (local_binding_is_up(bindings, iface->id, chassis_rec)) {
             ovs_iface_set_state(mgr, iface, OIF_INSTALLED);
         }
     }
@@ -274,23 +384,53 @@  if_status_mgr_update(struct if_status_mgr *mgr,
     HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_MARK_DOWN]) {
         struct ovs_iface *iface = node->data;
 
-        if (local_binding_is_down(bindings, iface->id)) {
+        if (!sb_readonly) {
+            local_binding_set_pb(bindings, iface->id, chassis_rec,
+                                 NULL, false);
+        }
+        if (local_binding_is_down(bindings, iface->id, chassis_rec)) {
             ovs_iface_destroy(mgr, iface);
         }
     }
 
-    /* Register for a notification about flows being installed in OVS for all
-     * newly claimed interfaces.
+    if (!sb_readonly) {
+        HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_INSTALL_FLOWS]) {
+            struct ovs_iface *iface = node->data;
+
+            if (iface->chassis_update_required) {
+                iface->chassis_update_required = false;
+                local_binding_set_pb(bindings, iface->id, chassis_rec,
+                                     NULL, true);
+            }
+        }
+    }
+
+    /* Update Port_Binding->chassis for newly claimed interfaces.
+     * Register for a notification about flows being installed in OVS for all
+     * newly claimed interfaces for which we could update pb->chassis.
      *
      * Move them from OIF_CLAIMED to OIF_INSTALL_FLOWS.
      */
-    bool new_ifaces = false;
-    HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_CLAIMED]) {
-        struct ovs_iface *iface = node->data;
 
-        ovs_iface_set_state(mgr, iface, OIF_INSTALL_FLOWS);
-        iface->install_seqno = mgr->iface_seqno + 1;
-        new_ifaces = true;
+    bool new_ifaces = false;
+    if (!sb_readonly) {
+        HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_CLAIMED]) {
+            struct ovs_iface *iface = node->data;
+            /* No need to check chassis_update_required: already handled in
+             * if_status_handle_claims or if_status_mgr_claim_iface.
+             */
+            ovs_iface_set_state(mgr, iface, OIF_INSTALL_FLOWS);
+            iface->install_seqno = mgr->iface_seqno + 1;
+            new_ifaces = true;
+        }
+    } else {
+        HMAPX_FOR_EACH_SAFE (node, &mgr->ifaces_per_state[OIF_CLAIMED]) {
+            struct ovs_iface *iface = node->data;
+            static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 1);
+            VLOG_INFO_RL(&rl,
+                         "Not updating pb chassis for %s now as "
+                         "sb is readonly", iface->id);
+        }
     }
 
     /* Request a seqno update when the flows for new interfaces have been
@@ -403,7 +543,7 @@  if_status_mgr_update_bindings(struct if_status_mgr *mgr,
     struct hmapx_node *node;
 
     /* Notify the binding module to set "down" all bindings that are still
-     * in the process of being installed in OVS, i.e., are not yet instsalled.
+     * in the process of being installed in OVS, i.e., are not yet installed.
      */
     HMAPX_FOR_EACH (node, &mgr->ifaces_per_state[OIF_INSTALL_FLOWS]) {
         struct ovs_iface *iface = node->data;
diff --git a/controller/if-status.h b/controller/if-status.h
index bb8a3950d..f9b05d30d 100644
--- a/controller/if-status.h
+++ b/controller/if-status.h
@@ -27,15 +27,27 @@  struct if_status_mgr *if_status_mgr_create(void);
 void if_status_mgr_clear(struct if_status_mgr *);
 void if_status_mgr_destroy(struct if_status_mgr *);
 
-void if_status_mgr_claim_iface(struct if_status_mgr *, const char *iface_id);
+void if_status_mgr_claim_iface(struct if_status_mgr *,
+                               const struct sbrec_port_binding *pb,
+                               const struct sbrec_chassis *chassis_rec,
+                               bool sb_readonly);
 void if_status_mgr_release_iface(struct if_status_mgr *, const char *iface_id);
 void if_status_mgr_delete_iface(struct if_status_mgr *, const char *iface_id);
 
-void if_status_mgr_update(struct if_status_mgr *, struct local_binding_data *);
+void if_status_mgr_update(struct if_status_mgr *, struct local_binding_data *,
+                          const struct sbrec_chassis *chassis,
+                          bool sb_readonly);
 void if_status_mgr_run(struct if_status_mgr *mgr, struct local_binding_data *,
                        const struct sbrec_chassis *,
                        bool sb_readonly, bool ovs_readonly);
 void if_status_mgr_get_memory_usage(struct if_status_mgr *mgr,
                                     struct simap *usage);
+bool if_status_mgr_iface_is_present(struct if_status_mgr *mgr,
+                                    const char *iface_id);
+bool if_status_handle_claims(struct if_status_mgr *mgr,
+                             struct local_binding_data *binding_data,
+                             const struct sbrec_chassis *chassis_rec,
+                             struct hmap *tracked_datapath,
+                             bool sb_readonly);
 
 # endif /* controller/if-status.h */
diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
index 69615308e..3947baf03 100644
--- a/controller/ovn-controller.c
+++ b/controller/ovn-controller.c
@@ -1464,6 +1464,73 @@  en_runtime_data_run(struct engine_node *node, void *data)
     engine_set_node_state(node, EN_UPDATED);
 }
 
+struct ed_type_sb_ro {
+    bool sb_readonly;
+};
+
+static void *
+en_sb_ro_init(struct engine_node *node OVS_UNUSED,
+              struct engine_arg *arg OVS_UNUSED)
+{
+    struct ed_type_sb_ro *data = xzalloc(sizeof *data);
+    return data;
+}
+
+static void
+en_sb_ro_run(struct engine_node *node, void *data)
+{
+    struct ed_type_sb_ro *sb_ro_data = data;
+    bool sb_readonly = !engine_get_context()->ovnsb_idl_txn;
+    if (sb_ro_data->sb_readonly != sb_readonly) {
+        sb_ro_data->sb_readonly = sb_readonly;
+        if (!sb_ro_data->sb_readonly) {
+            engine_set_node_state(node, EN_UPDATED);
+        }
+    }
+}
+
+static void
+en_sb_ro_cleanup(void *data OVS_UNUSED)
+{
+}
+
+static bool
+runtime_data_sb_ro_handler(struct engine_node *node, void *data)
+{
+    const struct sbrec_chassis *chassis = NULL;
+
+    struct ovsrec_open_vswitch_table *ovs_table =
+        (struct ovsrec_open_vswitch_table *)EN_OVSDB_GET(
+            engine_get_input("OVS_open_vswitch", node));
+
+    const char *chassis_id = get_ovs_chassis_id(ovs_table);
+
+    struct ovsdb_idl_index *sbrec_chassis_by_name =
+        engine_ovsdb_node_get_index(
+                engine_get_input("SB_chassis", node),
+                "name");
+
+    if (chassis_id) {
+        chassis = chassis_lookup_by_name(sbrec_chassis_by_name, chassis_id);
+    }
+    if (chassis) {
+        struct ed_type_runtime_data *rt_data = data;
+        bool sb_readonly = !engine_get_context()->ovnsb_idl_txn;
+        struct controller_engine_ctx *ctrl_ctx =
+            engine_get_context()->client_ctx;
+
+        if (if_status_handle_claims(ctrl_ctx->if_mgr,
+                                    &rt_data->lbinding_data,
+                                    chassis,
+                                    &rt_data->tracked_dp_bindings,
+                                    sb_readonly)) {
+            engine_set_node_state(node, EN_UPDATED);
+            rt_data->tracked = true;
+        }
+    }
+    return true;
+}
+
 static bool
 runtime_data_ovs_interface_shadow_handler(struct engine_node *node, void *data)
 {
@@ -3528,6 +3595,7 @@  main(int argc, char *argv[])
     stopwatch_create(VIF_PLUG_RUN_STOPWATCH_NAME, SW_MS);
 
     /* Define inc-proc-engine nodes. */
+    ENGINE_NODE(sb_ro, "sb_ro");
     ENGINE_NODE_WITH_CLEAR_TRACK_DATA_IS_VALID(ct_zones, "ct_zones");
     ENGINE_NODE_WITH_CLEAR_TRACK_DATA(ovs_interface_shadow,
                                       "ovs_interface_shadow");
@@ -3664,6 +3732,7 @@  main(int argc, char *argv[])
     engine_add_input(&en_ovs_interface_shadow, &en_ovs_interface,
                      ovs_interface_shadow_ovs_interface_handler);
 
+    engine_add_input(&en_runtime_data, &en_sb_ro, runtime_data_sb_ro_handler);
     engine_add_input(&en_runtime_data, &en_ofctrl_is_connected, NULL);
 
     engine_add_input(&en_runtime_data, &en_ovs_open_vswitch, NULL);
@@ -4098,7 +4167,8 @@  main(int argc, char *argv[])
                         runtime_data ? &runtime_data->lbinding_data : NULL;
                     stopwatch_start(IF_STATUS_MGR_UPDATE_STOPWATCH_NAME,
                                     time_msec());
-                    if_status_mgr_update(if_mgr, binding_data);
+                    if_status_mgr_update(if_mgr, binding_data, chassis,
+                                         !ovnsb_idl_txn);
                     stopwatch_stop(IF_STATUS_MGR_UPDATE_STOPWATCH_NAME,
                                    time_msec());
 
diff --git a/tests/ovn-macros.at b/tests/ovn-macros.at
index 335f9158c..8fd6ae6f7 100644
--- a/tests/ovn-macros.at
+++ b/tests/ovn-macros.at
@@ -759,3 +759,15 @@  m4_define([OVN_FOR_EACH_NORTHD_WITHOUT_DP_GROUPS],
      [m4_foreach([NORTHD_USE_DP_GROUPS], [no],
        [m4_foreach([NORTHD_USE_PARALLELIZATION], [yes, no], [$1
 ])])])])
+
+# OVN_NBCTL(NBCTL_COMMAND) adds NBCTL_COMMAND to list of commands to be run by RUN_OVN_NBCTL().
+m4_define([OVN_NBCTL], [
+    command="${command} -- $1"
+])
+
+# RUN_OVN_NBCTL() executes list of commands built by the OVN_NBCTL() macro.
+m4_define([RUN_OVN_NBCTL], [
+    check ovn-nbctl ${command}
+    unset command
+])
+
diff --git a/tests/ovn.at b/tests/ovn.at
index bfaa41962..94d16bac9 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -102,6 +102,18 @@  m4_divert_text([PREPARE_TESTS],
          test 1 -le $(as $hv1 ovs-ofctl dump-flows br-int | grep -c "output:$ofport")
      ])
    }
+
+   ovn_wait_remote_input_flows () {
+     hv1=$1
+     hv2=$2
+     echo "$3: waiting for flows for remote input on $hv1"
+     # Wait for a flow matching packets received from the remote tunnel port
+     OVS_WAIT_UNTIL([
+         ofport=$(as $hv1 ovs-vsctl --bare --columns ofport find Interface name=ovn-${hv2}-0)
+         echo "tunnel port=$ofport"
+         test 1 -le $(as $hv1 ovs-ofctl dump-flows br-int | grep -c "in_port=$ofport")
+     ])
+   }
 ])
 
 m4_define([OVN_CHECK_PACKETS],
@@ -127,6 +139,8 @@  m4_define([OVN_WAIT_PATCH_PORT_FLOWS],
 m4_define([OVN_WAIT_REMOTE_OUTPUT_FLOWS],
   [ovn_wait_remote_output_flows "$1" "$2" "__file__:__line__"])
 
+m4_define([OVN_WAIT_REMOTE_INPUT_FLOWS],
+  [ovn_wait_remote_input_flows "$1" "$2" "__file__:__line__"])
 
 AT_BANNER([OVN components])
 
@@ -14056,6 +14070,11 @@  wait_column "$hv1_uuid" Port_Binding requested_chassis logical_port=lsp0
 wait_column "$hv2_uuid" Port_Binding additional_chassis logical_port=lsp0
 wait_column "$hv2_uuid" Port_Binding requested_additional_chassis logical_port=lsp0
 
+# Check ovn-installed updated for main chassis
+wait_for_ports_up
+OVS_WAIT_UNTIL([test `as hv1 ovs-vsctl get Interface lsp0 external_ids:ovn-installed` = '"true"'])
+OVS_WAIT_UNTIL([test x`as hv2 ovs-vsctl get Interface lsp0 external_ids:ovn-installed` = x])
+
 # Check that setting iface:encap-ip populates Port_Binding:additional_encap
 wait_row_count Encap 2 chassis_name=hv1
 wait_row_count Encap 2 chassis_name=hv2
@@ -14081,6 +14100,11 @@  wait_column "$hv2_uuid" Port_Binding requested_chassis logical_port=lsp0
 wait_column "" Port_Binding additional_chassis logical_port=lsp0
 wait_column "" Port_Binding requested_additional_chassis logical_port=lsp0
 
+# Check ovn-installed updated for main chassis and not for other chassis
+wait_for_ports_up
+OVS_WAIT_UNTIL([test `as hv2 ovs-vsctl get Interface lsp0 external_ids:ovn-installed` = '"true"'])
+OVS_WAIT_UNTIL([test x`as hv1 ovs-vsctl get Interface lsp0 external_ids:ovn-installed` = x])
+
 # Check that additional_encap is cleared
 wait_column "" Port_Binding additional_encap logical_port=lsp0
 
@@ -15370,7 +15394,10 @@  AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=65 | grep actions=output:1],
 echo "verifying that lsp0 binding moves when requested-chassis is changed"
 
 ovn-nbctl lsp-set-options lsp0 requested-chassis=hv2
-OVS_WAIT_UNTIL([test 1 = $(grep -c "Releasing lport lsp0 from this chassis" hv1/ovn-controller.log)])
+
+# We might see multiple "Releasing lport ..." messages when sb is read-only
+OVS_WAIT_UNTIL([test 1 -le $(grep -c "Releasing lport lsp0 from this chassis" hv1/ovn-controller.log)])
+
 wait_column "$hv2_uuid" Port_Binding chassis logical_port=lsp0
 
 # (6) Chassis hv2 should add flows and hv1 should not.
@@ -15416,7 +15443,7 @@  AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=0 | grep in_port=1], [0], [ig
 AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=65 | grep actions=output:1], [0], [ignore])
 
 check ovn-nbctl --wait=hv lsp-set-options lsp0 requested-chassis=non-existant-chassis
-OVS_WAIT_UNTIL([test 1 = $(grep -c "Releasing lport lsp0 from this chassis" hv1/ovn-controller.log)])
+OVS_WAIT_UNTIL([test 1 -le $(grep -c "Releasing lport lsp0 from this chassis" hv1/ovn-controller.log)])
 check ovn-nbctl --wait=hv sync
 wait_column '' Port_Binding chasssi logical_port=lsp0
 AT_CHECK([as hv1 ovs-ofctl dump-flows br-int table=0 | grep in_port=1], [1], [])
@@ -32418,3 +32445,119 @@  AT_CHECK([test $(ovn-sbctl list fdb | grep -c "00:00:00:00:10:30") = 0])
 OVN_CLEANUP([hv1])
 AT_CLEANUP
 ])
+
+OVN_FOR_EACH_NORTHD([
+AT_SETUP([recomputes])
+ovn_start
+
+n_hv=4
+
+# Add chassis
+net_add n1
+for i in $(seq 1 $n_hv); do
+    sim_add hv$i
+    as hv$i
+    check ovs-vsctl add-br br-phys
+    ovs-vsctl set open . external-ids:ovn-bridge-mappings=phys:br-phys
+    ovn_attach n1 br-phys 192.168.0.$i 24 geneve
+done
+
+add_switch_ports() {
+    start_port=$1
+    end_port=$2
+    nb_hv=$3
+    bulk_size=$4
+    for ((i=start_port; i<end_port; )) do
+        start_bulk=$i
+        for hv in $(seq 1 $nb_hv); do
+            end_bulk=$((start_bulk+bulk_size-1))
+            for port in $(seq $start_bulk $end_bulk); do
+                logical_switch_port=lsp${port}
+                OVN_NBCTL(lsp-add ls1 $logical_switch_port)
+                OVN_NBCTL(lsp-set-addresses $logical_switch_port dynamic)
+            done
+            start_bulk=$((end_bulk+1))
+        done
+        RUN_OVN_NBCTL()
+
+        start_bulk=$i
+        for hv in $(seq 1 $nb_hv); do
+            end_bulk=$((start_bulk+bulk_size-1))
+            for port in $(seq $start_bulk $end_bulk); do
+                logical_switch_port=lsp${port}
+                as hv$hv ovs-vsctl \
+                    --no-wait -- add-port br-int vif${port} \
+                    -- set Interface vif${port} external_ids:iface-id=$logical_switch_port
+            done
+            start_bulk=$((end_bulk+1))
+        done
+        i=$((end_bulk+1))
+    done
+}
+check ovn-nbctl ls-add ls1
+check ovn-nbctl set Logical_Switch ls1 other_config:subnet=10.1.0.0/16
+check ovn-nbctl set Logical_Switch ls1 other_config:exclude_ips=10.1.255.254
+
+check ovn-nbctl lr-add lr1
+check ovn-nbctl lsp-add ls1 lsp0 -- set Logical_Switch_Port lsp0 type=router options:router-port=lrp0 addresses=dynamic
+check ovn-nbctl lrp-add lr1 lrp0 "f0:00:00:01:00:01" 10.1.255.254/16
+check ovn-nbctl lr-nat-add lr1 snat 10.2.0.1 10.1.0.0/16
+
+lflow_run=0
+check ovn-nbctl --wait=hv sync
+
+# Tunnel ports might not be added (yet) at this point on a slow system.
+# Wait for flows related to such ports to ensure those ports have been added
+# before we measure recomputes. Otherwise, ovs_interface handler might be run
+# afterwards for tunnel ports, causing recomputes.
+for i in $(seq 1 $n_hv); do
+    for j in $(seq 1 $n_hv); do
+        if test $i != $j; then
+            OVN_WAIT_REMOTE_INPUT_FLOWS(["hv$i"],["hv$j"])
+        fi
+    done
+done
+
+for i in $(seq 1 $n_hv); do
+    as hv$i
+    lflow_run1=$(ovn-appctl -t ovn-controller coverage/read-counter lflow_run)
+    lflow_run=`expr $lflow_run1 + $lflow_run`
+done
+
+add_switch_ports 1 1000 $n_hv 5
+
+wait_for_ports_up
+check ovn-nbctl --wait=hv sync
+
+for i in $(seq 1 $n_hv); do
+    pid=$(cat hv${i}/ovn-controller.pid)
+    u=$((u+$(cat "/proc/$pid/stat" | awk '{print $14}')))
+    s=$((s+$(cat "/proc/$pid/stat" | awk '{print $15}')))
+done
+
+n_pid=$(cat northd/ovn-northd.pid)
+n_u=$(cat "/proc/$n_pid/stat" | awk '{print $14}')
+n_s=$(cat "/proc/$n_pid/stat" | awk '{print $15}')
+
+echo "Total Northd User Time: $n_u"
+echo "Total Northd System Time: $n_s"
+echo "Total Controller User Time: $u"
+echo "Total Controller System Time: $s"
+
+lflow_run_end=0
+for i in $(seq 1 $n_hv); do
+    as hv$i
+    lflow_run1=$(ovn-appctl -t ovn-controller coverage/read-counter lflow_run)
+    lflow_run_end=`expr $lflow_run1 + $lflow_run_end`
+done
+n_recomputes=`expr $lflow_run_end - $lflow_run`
+echo "$n_recomputes recomputes"
+
+AT_CHECK([test $lflow_run_end = $lflow_run])
+
+for i in $(seq 2 $n_hv); do
+    OVN_CLEANUP_SBOX([hv$i])
+done
+OVN_CLEANUP([hv1])
+AT_CLEANUP
+])
diff --git a/tests/perf-northd.at b/tests/perf-northd.at
index 74b69e9d4..6ec196b36 100644
--- a/tests/perf-northd.at
+++ b/tests/perf-northd.at
@@ -76,23 +76,6 @@  m4_define([PERF_RECORD_STOP], [
     PERF_RECORD_STOPWATCH(ovn-northd-loop, ["Short term average"], [Average (northd-loop in msec)])
 ])
 
-# OVN_NBCTL([NBCTL_COMMAND])
-#
-# Add NBCTL_COMMAND to list of commands to be run by RUN_OVN_NBCTL().
-#
-m4_define([OVN_NBCTL], [
-    command="${command} -- $1"
-])
-
-# RUN_OVN_NBCTL()
-#
-# Execute list of commands built by the OVN_NBCTL() macro.
-#
-m4_define([RUN_OVN_NBCTL], [
-    check ovn-nbctl ${command}
-    unset command
-])
-
 OVS_START_SHELL_HELPERS
 generate_subnet () {
     local a=$(printf %d $(expr $1 / 256 + 10))