[ovs-dev,patch_v10] ovn: Add datapaths of interest filtering.

Message ID 1480896824-62755-1-git-send-email-dlu998@gmail.com
State Superseded

Commit Message

Darrell Ball Dec. 5, 2016, 12:13 a.m. UTC
This patch adds datapaths-of-interest support, where only datapaths of
local interest are monitored by the ovn-controller ovsdb client.  The
idea is to do a flood fill in ovn-controller of datapath associations
calculated by northd.  A new column, related_datapaths, is added to the
SB database Datapath_Binding table to facilitate this, so that all
datapath associations are known quickly in ovn-controller.  This allows
monitoring to adapt quickly, with a single new monitor setting for all
datapaths of local interest.
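The flood fill itself is a breadth-first walk over the new
related_datapaths column.  A condensed sketch of the idea follows; the
real implementation is do_datapath_flood_fill() further below, and the
boolean-returning add_datapath_of_interest() used here is a hypothetical
simplification (the actual code tracks visited datapaths in a separate
hmap and also filters out remote l3gateway datapaths):

    /* Sketch only: walk related_datapaths breadth-first from a datapath
     * that has a local port, requesting monitoring for everything
     * reachable.  Assumes add_datapath_of_interest() returns true only
     * the first time a datapath is added. */
    struct dp_queue_elem {
        struct ovs_list list_node;
        const struct sbrec_datapath_binding *dp;
    };

    static void
    flood_fill_sketch(const struct controller_ctx *ctx,
                      const struct sbrec_datapath_binding *dp_start)
    {
        struct ovs_list queue;
        ovs_list_init(&queue);

        struct dp_queue_elem *elem = xzalloc(sizeof *elem);
        elem->dp = dp_start;
        ovs_list_push_back(&queue, &elem->list_node);

        while (!ovs_list_is_empty(&queue)) {
            elem = CONTAINER_OF(ovs_list_pop_front(&queue),
                                struct dp_queue_elem, list_node);
            const struct sbrec_datapath_binding *dp = elem->dp;
            free(elem);

            if (!add_datapath_of_interest(ctx, dp)) {
                continue;   /* Already monitored; stop expanding here. */
            }
            for (size_t i = 0; i < dp->n_related_datapaths; i++) {
                elem = xzalloc(sizeof *elem);
                elem->dp = dp->related_datapaths[i];
                ovs_list_push_back(&queue, &elem->list_node);
            }
        }
    }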

We monitor both logical flows and logical ports.  The mac bindings
will likely go away and multicast groups are relatively few.
To reduce risk and overall latency we only suppress logical flows as
a starting point.  Logical ports are not suppressed as a starting
point but are then monitor-suppressed as appropriate from that point
on.  The initial suppression has little practical gain, and not doing
it allows us to be flexible in terms of how different port types are
supported.
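Concretely, the suppression is expressed through OVSDB conditional
monitoring clauses on the ovn-controller IDL.  A minimal sketch of the
pattern, using the clause calls that appear at various points in the
diff below (ctx and dp as in the patch; bookkeeping omitted):

    /* At startup, suppress all logical flows until we know what we need. */
    sbrec_logical_flow_add_clause_false(ovnsb_idl_loop.idl);

    /* When a datapath becomes locally interesting, request its flows and
     * port bindings explicitly. */
    sbrec_logical_flow_add_clause_logical_datapath(ctx->ovnsb_idl, OVSDB_F_EQ,
                                                   &dp->header_.uuid);
    sbrec_port_binding_add_clause_datapath(ctx->ovnsb_idl, OVSDB_F_EQ,
                                           &dp->header_.uuid);

    /* When the datapath goes stale, drop the clauses again. */
    sbrec_logical_flow_remove_clause_logical_datapath(ctx->ovnsb_idl,
                                                      OVSDB_F_EQ,
                                                      &dp->header_.uuid);
    sbrec_port_binding_remove_clause_datapath(ctx->ovnsb_idl, OVSDB_F_EQ,
                                              &dp->header_.uuid);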

To validate the gain, we take the approach of verifying the l3gateway
router flow suppression from the non-l3gateway HVs' POV and also add
a separate test to verify the flow and port handling scale advantages.
The separate test verifies the logical port and logical flow
suppression advantage.  It measures the difference in flows and ports
sent to the HV clients, where the vast majority of the overall work is.
The fewer ports and flows that are unnecessarily processed on the HVs,
the greater the benefit, excluding latency skew.  The separate test is
simple to interpret and shows an order of magnitude advantage;
the test uses only logical switches.  The advantage would increase with
logical routers.  The test is conservative in estimating the numerical
advantage to increase the probability of the check passing the first
time, since there are some timing considerations that affect the numbers.

Liran Schour contributed enhancements to the conditional
monitoring granularity in ovs and also submitted patches
for ovn usage of conditional monitoring, which aided this patch
through sharing of concepts during code review work.

Ben Pfaff suggested that northd could be used to pre-populate
related datapaths for ovn-controller to use.  That idea is
used as part of this patch.

CC: Ben Pfaff <blp@ovn.org>
CC: Liran Schour <LIRANS@il.ibm.com>
Signed-off-by: Darrell Ball <dlu998@gmail.com>
---

v9->v10: Fix ordering issue on northd as pointed out by Mickey.

         Remove patch flood fill triggering optimization to
         make the code easier to verify and harder to break.

         Fix a couple issues on ovn-controller side, including
         stale checking.

         Add lots more focused testing - now 2 separate extensive
         tests and a beefed-up l3gateway test.

         Remove dependency on port filtering for l3gateway filtering
         and associate the l3gateway chassis name with the datapath,
         similarly to logical ports.  There was some agreement on adding
         this extra field to the Datapath_Binding table.
         
v8->v9: Reinstate logical port monitoring from patch V3
        to handle some topologies heavily influenced by logical ports,
        especially given the lack of incremental processing right now.
        The original plan was to forgo monitoring logical ports
        in the first commit, but without incremental processing,
        port processing gains importance.

        Factor out some functions.

        Add a separate test for conditional monitoring to
        guesstimate the reduction in logical port and flow events
        seen at the HVs.

v7->v8: Start with logical flow conditional monitoring off.
        Remove deprecated code.
        Recover SB DB version number change.
        Change test to be more definitive. 

v6->v7: Rebase

v5->v6: Rebase; fix stale handling.

v4->v5: Correct cleanup of monitors.
        Fix warning.

v3->v4: Refactor after incremental processing backout.
        Limit filtering to logical flows to limit risk.

v2->v3: Line length violation fixups :/

v1->v2: Added logical port removal monitoring handling, factoring
        in recent changes for incremental processing.  Added some
        comments in the code regarding initial conditions for
        database conditional monitoring.

 ovn/controller/binding.c        |  32 +++-
 ovn/controller/lflow.c          |   5 +
 ovn/controller/ovn-controller.c | 188 ++++++++++++++++++++
 ovn/controller/ovn-controller.h |  10 ++
 ovn/controller/patch.c          |  24 +++
 ovn/northd/ovn-northd.c         | 121 +++++++++++++
 ovn/ovn-sb.ovsschema            |  16 +-
 ovn/ovn-sb.xml                  |  13 ++
 tests/ovn.at                    | 371 +++++++++++++++++++++++++++++++++++++++-
 9 files changed, 770 insertions(+), 10 deletions(-)

Comments

Mickey Spiegel Dec. 5, 2016, 5:48 a.m. UTC | #1
On Sun, Dec 4, 2016 at 4:13 PM, Darrell Ball <dlu998@gmail.com> wrote:

> This patch adds datapaths-of-interest support, where only datapaths of
> local interest are monitored by the ovn-controller ovsdb client.  The
> idea is to do a flood fill in ovn-controller of datapath associations
> calculated by northd.  A new column, related_datapaths, is added to the
> SB database Datapath_Binding table to facilitate this, so that all
> datapath associations are known quickly in ovn-controller.  This allows
> monitoring to adapt quickly, with a single new monitor setting for all
> datapaths of local interest.
>

This is getting close. I have a few questions and comments inline.

>
> We monitor both logical flows and logical ports.  The mac bindings
> will likely go away and multicast groups are relatively few.
> To reduce risk and overall latency we only suppress logical flows as
> a starting point.  Logical ports are not suppressed as a starting
> point but are then monitor-suppressed as appropriate from that point
> on.  The initial suppression has little practical gain, and not doing
> it allows us to be flexible in terms of how different port types are
> supported.
>

I lack familiarity with conditional monitoring behavior when you do
not explicitly suppress as a starting point. A new hv comes up, and
starts downloading stuff from SB, including port_bindings. At some
point the first VIF on that hv comes up, as a result of which the hv's
controller calls sbrec_port_binding_add_clause_logical_port.
Do all other port_bindings get suppressed at that point?
If so, won't this lead to different behavior between the first time a
VIF comes up on a hv, and subsequent additions of VIFs on
unrelated and previously unseen datapaths?
Wouldn't the behavior be more consistent if you suppress from the
beginning?
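(For reference, "suppress from the beginning" would presumably just mirror
what the patch already does for logical flows in main(), assuming the IDL
generates the same add_clause_false helper for Port_Binding; sketch only:)

    /* As the patch already does for logical flows: */
    sbrec_logical_flow_add_clause_false(ovnsb_idl_loop.idl);
    /* Hypothetical Port_Binding equivalent, so that nothing is replicated
     * until a clause for a local logical port or datapath is added: */
    sbrec_port_binding_add_clause_false(ovnsb_idl_loop.idl);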

>
> To validate the gain, we take the approach of verifying the l3gateway
> router flow suppression from the non-l3gateway HVs' POV and also add
> a separate test to verify the flow and port handling scale advantages.
> The separate test verifies the logical port and logical flow
> suppression advantage.  It measures the difference in flows and ports
> sent to the HV clients, where the vast majority of the overall work is.
> The fewer ports and flows that are unnecessarily processed on the HVs,
> the greater the benefit, excluding latency skew.  The separate test is
> simple to interpret and shows an order of magnitude advantage;
> the test uses only logical switches.  The advantage would increase with
> logical routers.  The test is conservative in estimating the numerical
> advantage to increase the probability of the check passing the first
> time, since there are some timing considerations that affect the numbers.
>

This description should be updated to mention the distributed
logical router tests that you added.

>
> Liran Schour contributed enhancements to the conditional
> monitoring granularity in ovs and also submitted patches
> for ovn usage of conditional monitoring, which aided this patch
> through sharing of concepts during code review work.
>
> Ben Pfaff suggested that northd could be used to pre-populate
> related datapaths for ovn-controller to use.  That idea is
> used as part of this patch.
>
> CC: Ben Pfaff <blp@ovn.org>
> CC: Liran Schour <LIRANS@il.ibm.com>
> Signed-off-by: Darrell Ball <dlu998@gmail.com>
> ---
>
> v9->v10: Fix ordering issue on northd as pointed out by Mickey.
>
>          Remove patch flood fill triggering optimization to
>          make the code easier to verify and harder to break.
>
>          Fix a couple issues on ovn-controller side, including
>          stale checking.
>
>          Add lots more focused testing - now 2 separate extensive
>          tests and a beefed-up l3gateway test.
>
>          Remove dependency on port filtering for l3gateway filtering
>          and associate the l3gateway chassis name with the datapath,
>          similarly to logical ports.  There was some agreement on adding
>          this extra field to the Datapath_Binding table.
>

I certainly agree. It is good to have the filtering depend only
on the contents of datapath_binding, with no dependencies
on port_bindings. This way there is no issue with ordering
dependencies when filtering port_bindings.

>
> v8->v9: Reinstate logical port monitoring from patch V3
>         to handle some topologies heavily influenced by logical ports,
>         especially given the lack of incremental processing right now.
>         The original plan was to forgo monitoring logical ports
>         in the first commit, but without incremental processing,
>         port processing gains importance.
>
>         Factor out some functions.
>
>         Add a separate test for conditional monitoring to
>         guesstimate the reduction in logical port and flow events
>         seen at the HVs.
>
> v7->v8: Start with logical flow conditional monitoring off.
>         Remove deprecated code.
>         Recover SB DB version number change.
>         Change test to be more definitive.
>
> v6->v7: Rebase
>
> v5->v6: Rebase; fix stale handling.
>
> v4->v5: Correct cleanup of monitors.
>         Fix warning.
>
> v3->v4: Refactor after incremental processing backout.
>         Limit filtering to logical flows to limit risk.
>
> v2->v3: Line length violation fixups :/
>
> v1->v2: Added logical port removal monitoring handling, factoring
>         in recent changes for incremental processing.  Added some
>         comments in the code regarding initial conditions for
>         database conditional monitoring.
>
>  ovn/controller/binding.c        |  32 +++-
>  ovn/controller/lflow.c          |   5 +
>  ovn/controller/ovn-controller.c | 188 ++++++++++++++++++++
>  ovn/controller/ovn-controller.h |  10 ++
>  ovn/controller/patch.c          |  24 +++
>  ovn/northd/ovn-northd.c         | 121 +++++++++++++
>  ovn/ovn-sb.ovsschema            |  16 +-
>  ovn/ovn-sb.xml                  |  13 ++
>  tests/ovn.at                    | 371 ++++++++++++++++++++++++++++++
> +++++++++-
>  9 files changed, 770 insertions(+), 10 deletions(-)
>
> diff --git a/ovn/controller/binding.c b/ovn/controller/binding.c
> index fb76032..f24e2e6 100644
> --- a/ovn/controller/binding.c
> +++ b/ovn/controller/binding.c
> @@ -68,7 +68,8 @@ static void
>  get_local_iface_ids(const struct ovsrec_bridge *br_int,
>                      struct shash *lport_to_iface,
>                      struct sset *all_lports,
> -                    struct sset *egress_ifaces)
> +                    struct sset *egress_ifaces,
> +                    const struct controller_ctx *ctx)
>  {
>      int i;
>
> @@ -88,6 +89,9 @@ get_local_iface_ids(const struct ovsrec_bridge *br_int,
>              iface_id = smap_get(&iface_rec->external_ids, "iface-id");
>
>              if (iface_id) {
> +                sbrec_port_binding_add_clause_
> logical_port(ctx->ovnsb_idl,
> +                                                           OVSDB_F_EQ,
> +                                                           iface_id);
>                  shash_add(lport_to_iface, iface_id, iface_rec);
>                  sset_add(all_lports, iface_id);
>              }
> @@ -106,8 +110,13 @@ get_local_iface_ids(const struct ovsrec_bridge
> *br_int,
>
>  static void
>  add_local_datapath(struct hmap *local_datapaths,
> -        const struct sbrec_port_binding *binding_rec)
> +        const struct sbrec_port_binding *binding_rec,
> +        const struct controller_ctx *ctx,
> +        const struct sbrec_chassis *chassis_rec)
>  {
> +
> +    do_datapath_flood_fill(ctx, binding_rec->datapath, chassis_rec);
> +
>

In previous versions, this was at the end of add_local_datapath.
In this version you moved it to the beginning.
Is there an advantage in running flood fill when the second, third,
fourth, etc ports are added for the same datapath?
It does add processing.


>      if (get_local_datapath(local_datapaths,
>                             binding_rec->datapath->tunnel_key)) {
>          return;
> @@ -293,7 +302,7 @@ consider_local_datapath(struct controller_ctx *ctx,
>              /* Add child logical port to the set of all local ports. */
>              sset_add(all_lports, binding_rec->logical_port);
>          }
> -        add_local_datapath(local_datapaths, binding_rec);
> +        add_local_datapath(local_datapaths, binding_rec, ctx,
> chassis_rec);
>          if (iface_rec && qos_map && ctx->ovs_idl_txn) {
>              get_qos_params(binding_rec, qos_map);
>          }
> @@ -328,7 +337,7 @@ consider_local_datapath(struct controller_ctx *ctx,
>          }
>
>          sset_add(all_lports, binding_rec->logical_port);
> -        add_local_datapath(local_datapaths, binding_rec);
> +        add_local_datapath(local_datapaths, binding_rec, ctx,
> chassis_rec);
>          if (binding_rec->chassis == chassis_rec) {
>              return;
>          }
> @@ -342,7 +351,7 @@ consider_local_datapath(struct controller_ctx *ctx,
>          const char *chassis = smap_get(&binding_rec->options,
>                                         "l3gateway-chassis");
>          if (!strcmp(chassis, chassis_rec->name) && ctx->ovnsb_idl_txn) {
> -            add_local_datapath(local_datapaths, binding_rec);
> +            add_local_datapath(local_datapaths, binding_rec, ctx,
> chassis_rec);
>          }
>      } else if (chassis_rec && binding_rec->chassis == chassis_rec) {
>          if (ctx->ovnsb_idl_txn) {
> @@ -351,6 +360,9 @@ consider_local_datapath(struct controller_ctx *ctx,
>              for (int i = 0; i < binding_rec->n_mac; i++) {
>                  VLOG_INFO("Releasing %s", binding_rec->mac[i]);
>              }
> +            sbrec_port_binding_remove_clause_logical_port(
> +                ctx->ovnsb_idl, OVSDB_F_EQ, binding_rec->logical_port);
> +
>              sbrec_port_binding_set_chassis(binding_rec, NULL);
>              sset_find_and_delete(all_lports, binding_rec->logical_port);
>          }
> @@ -381,13 +393,21 @@ binding_run(struct controller_ctx *ctx, const struct
> ovsrec_bridge *br_int,
>      hmap_init(&qos_map);
>      if (br_int) {
>          get_local_iface_ids(br_int, &lport_to_iface, all_lports,
> -                            &egress_ifaces);
> +                            &egress_ifaces, ctx);
>      }
>
> +    set_all_datapaths_of_interest_stale();
> +
>      /* Run through each binding record to see if it is resident on this
>       * chassis and update the binding accordingly.  This includes both
>       * directly connected logical ports and children of those ports. */
>      SBREC_PORT_BINDING_FOR_EACH(binding_rec, ctx->ovnsb_idl) {
> +
> +        /* Set any datapath, used by any port and that we are monitoring
> as used,
>

This line is too long.


> +         * so we don't remove the monitoring.
> +         */
> +        set_datapath_of_interest_used(binding_rec->datapath);
> +
>

This seems too broad. This would set the datapath as used if
you have received any SB port_bindings for that datapath, even
if those port_bindings are not relevant to this chassis.
How would datapaths ever be removed?

Why isn't the stale flag processing in add_datapath_of_interest
enough?
Was there some corner case that was missed?

         consider_local_datapath(ctx, chassis_rec, binding_rec,
>                                  sset_is_empty(&egress_ifaces) ? NULL :
>                                  &qos_map, local_datapaths,
> &lport_to_iface,
> diff --git a/ovn/controller/lflow.c b/ovn/controller/lflow.c
> index 4e67365..765eae4 100644
> --- a/ovn/controller/lflow.c
> +++ b/ovn/controller/lflow.c
> @@ -167,6 +167,11 @@ consider_logical_flow(const struct lport_index
> *lports,
>      if (!ldp) {
>          return;
>      }
> +
> +    VLOG_DBG("consider logical flow: tunnel_key %lu "
> +             "\"match\" \"%s\", \"actions\" \"%s\"",
> +             ldp->tunnel_key, lflow->match, lflow->actions);
> +
>      if (is_switch(ldp)) {
>          /* For a logical switch datapath, local_datapaths tells us if
> there
>           * are any local ports for this datapath.  If not, we can skip
> diff --git a/ovn/controller/ovn-controller.c b/ovn/controller/ovn-
> controller.c
> index fea7841..c06bd81 100644
> --- a/ovn/controller/ovn-controller.c
> +++ b/ovn/controller/ovn-controller.c
> @@ -69,6 +69,14 @@ OVS_NO_RETURN static void usage(void);
>
>  static char *ovs_remote;
>
> +struct dp_of_interest_node {
> +    struct hmap_node hmap_node;
> +    const struct sbrec_datapath_binding *sb_db;
> +    bool stale;
> +};
> +
> +static struct hmap dps_of_interest = HMAP_INITIALIZER(&dps_of_interest);
> +
>  struct local_datapath *
>  get_local_datapath(const struct hmap *local_datapaths, uint32_t
> tunnel_key)
>  {
> @@ -128,6 +136,181 @@ get_bridge(struct ovsdb_idl *ovs_idl, const char
> *br_name)
>      return NULL;
>  }
>
> +static bool
> +remote_l3gat_lrd_port(const struct sbrec_datapath_binding *dp,
> +                             const struct sbrec_chassis *chassis_rec)
> +{
> +    const char *l3gateway_chassis = smap_get(&dp->options,
> +                                             "l3gateway-chassis");
> +    if(l3gateway_chassis){
> +        if(strcmp(l3gateway_chassis, chassis_rec->name)){
> +            return true;
> +        }
> +    }
> +    return false;
> +}
> +
> +void
> +set_all_datapaths_of_interest_stale(void)
> +{
> +    struct dp_of_interest_node *node;
> +    HMAP_FOR_EACH(node, hmap_node, &dps_of_interest) {
> +            node->stale = true;
> +    }
> +}
> +
> +static void
> +remove_datapath_of_interest(struct controller_ctx *ctx,
> +                            const struct sbrec_datapath_binding *dp)
>

The only call to remove_datapath_of_interest is from
clean_stale_datapaths_of_interest.
Why not pass in dp_cur_node rather than dp, and avoid
the hash lookup?
Or just put all the logic in
clean_stale_datapaths_of_interest?
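(Something along these lines, i.e. folding the removal into the caller so
that no second hash lookup is needed; sketch only, reusing the structures
and clause calls from the patch:)

    static void
    clean_stale_datapaths_of_interest(struct controller_ctx *ctx)
    {
        struct dp_of_interest_node *node, *next;
        HMAP_FOR_EACH_SAFE (node, next, hmap_node, &dps_of_interest) {
            if (node->stale) {
                const struct uuid *uuid = &node->sb_db->header_.uuid;
                sbrec_port_binding_remove_clause_datapath(ctx->ovnsb_idl,
                                                          OVSDB_F_EQ, uuid);
                sbrec_logical_flow_remove_clause_logical_datapath(
                    ctx->ovnsb_idl, OVSDB_F_EQ, uuid);
                hmap_remove(&dps_of_interest, &node->hmap_node);
                free(node);
            }
        }
    }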


> +{
> +
> +    struct dp_of_interest_node *node;
> +    HMAP_FOR_EACH_WITH_HASH (node, hmap_node,
> uuid_hash(&dp->header_.uuid),
> +                             &dps_of_interest) {
> +        if (ctx && uuid_equals(&node->sb_db->header_.uuid,
> &dp->header_.uuid)) {
> +
> +            sbrec_port_binding_remove_clause_datapath(ctx->ovnsb_idl,
> +                                                      OVSDB_F_EQ,
> +                                                      &dp->header_.uuid);
> +            sbrec_logical_flow_remove_clause_logical_datapath(
> +                ctx->ovnsb_idl, OVSDB_F_EQ, &dp->header_.uuid);
> +            hmap_remove(&dps_of_interest, &node->hmap_node);
> +            free(node);
> +            return;
> +        }
> +    }
> +}
> +
> +static void
> +clean_stale_datapaths_of_interest(struct controller_ctx *ctx)
> +{
> +    struct dp_of_interest_node *dp_cur_node, *dp_next_node;
> +    HMAP_FOR_EACH_SAFE (dp_cur_node, dp_next_node, hmap_node,
> +                        &dps_of_interest) {
> +        if(dp_cur_node->stale == true) {
> +            remove_datapath_of_interest(ctx, dp_cur_node->sb_db);
> +        }
> +    }
> +}
> +
> +void
> +set_datapath_of_interest_used(const struct sbrec_datapath_binding *dp)
> +{
> +
> +    struct dp_of_interest_node *node;
> +    HMAP_FOR_EACH_WITH_HASH (node, hmap_node,
> uuid_hash(&dp->header_.uuid),
> +                             &dps_of_interest) {
> +        if (uuid_equals(&node->sb_db->header_.uuid, &dp->header_.uuid)) {
> +            node->stale = false;
> +        }
> +    }
> +}
> +
> +static void
> +add_datapath_of_interest(const struct controller_ctx *ctx,
> +                         const struct sbrec_datapath_binding *dp)
> +{
> +    struct dp_of_interest_node *node;
> +
> +    HMAP_FOR_EACH_WITH_HASH (node, hmap_node,
> uuid_hash(&dp->header_.uuid),
> +                             &dps_of_interest) {
> +        if (uuid_equals(&node->sb_db->header_.uuid, &dp->header_.uuid)) {
> +            node->stale = false;
> +            return;
> +        }
> +    }
> +
> +    VLOG_DBG("monitor datapath with tunnel_key %lu", dp->tunnel_key);
> +
> +    sbrec_logical_flow_add_clause_logical_datapath(ctx->ovnsb_idl,
> +                                                   OVSDB_F_EQ,
> +                                                   &dp->header_.uuid);
> +    sbrec_port_binding_add_clause_datapath(ctx->ovnsb_idl, OVSDB_F_EQ,
> +                                           &dp->header_.uuid);
> +
> +    node = xzalloc(sizeof *node);
> +    hmap_insert(&dps_of_interest, &node->hmap_node,
> +                uuid_hash(&dp->header_.uuid));
> +    node->sb_db = dp;
> +    return;
> +}
> +
> +void
> +do_datapath_flood_fill(const struct controller_ctx *ctx,
> +                              const struct sbrec_datapath_binding
> *dp_start,
> +                                          const struct sbrec_chassis
> *chassis_rec)
> +{
> +
> +    if(remote_l3gat_lrd_port(dp_start, chassis_rec)) {
> +        return;
> +    }
> +
> +    struct dp_checked {
> +        struct hmap_node hmap_node;
> +        const struct sbrec_datapath_binding *sb_db;
> +    };
> +    struct dp_checked *dp_node;
> +    struct hmap related_dp_processed = HMAP_INITIALIZER(&related_dp_
> processed);
>

This third level of lists is not easy to follow, having
dps_of_interest
related_dp_processed
interest_datapaths

It looks like this is similar to the flood_fill placement in
add_local_datapath.
Should each datapath be considered once in each run
through the main ovn-controller.c loop, such that
different flood_fill runs going through the same
datapaths do not investigate any further (as in v9)?
Or should each flood_fill run to completion,
recalculating the entire set of related datapaths each
time that flood_fill is called, even if many of those
datapaths are already present in dps_of_interest
(as in this v10)?

+
> +    struct interest_dps_list_elem {
> +        struct ovs_list list_node;
> +        const struct sbrec_datapath_binding * dp;
> +    };
> +    struct ovs_list interest_datapaths;
> +    ovs_list_init(&interest_datapaths);
> +
> +    struct interest_dps_list_elem *dp_list_elem =
> +        xzalloc(sizeof *dp_list_elem);
> +    dp_list_elem->dp = dp_start;
> +    ovs_list_push_back(&interest_datapaths, &dp_list_elem->list_node);
> +
> +    while (!ovs_list_is_empty(&interest_datapaths)) {
> +
> +        struct ovs_list *list_node_ptr =
> +            ovs_list_pop_front(&interest_datapaths);
> +        dp_list_elem = CONTAINER_OF(list_node_ptr,
> +            struct interest_dps_list_elem, list_node);
> +           size_t num_related_datapaths =
> +            dp_list_elem->dp->n_related_datapaths;
> +           struct sbrec_datapath_binding **related_datapaths =
> +            dp_list_elem->dp->related_datapaths;
> +
> +        dp_node = xzalloc(sizeof *dp_node);
> +        dp_node->sb_db = dp_list_elem->dp;
> +        hmap_insert(&related_dp_processed, &dp_node->hmap_node,
> +                     uuid_hash(&dp_node->sb_db->header_.uuid));
> +
> +        add_datapath_of_interest(ctx, dp_list_elem->dp);
> +        free(dp_list_elem);
> +        for (int i = 0; i < num_related_datapaths; i++) {
> +            bool found = false;
> +            HMAP_FOR_EACH_WITH_HASH (
> +                    dp_node, hmap_node,
> +                    uuid_hash(&related_datapaths[i]->header_.uuid),
> +                    &related_dp_processed) {
> +
> +                if (uuid_equals(&dp_node->sb_db->header_.uuid,
> +                                &related_datapaths[i]->header_.uuid)) {
> +                    found = true;
> +                    break;
> +                }
> +            }
> +            if(!found &&
> +               !remote_l3gat_lrd_port(related_datapaths[i],
> chassis_rec)) {
> +                dp_list_elem = xzalloc(sizeof *dp_list_elem);
> +                dp_list_elem->dp = related_datapaths[i];
> +                ovs_list_push_back(&interest_datapaths,
> &dp_list_elem->list_node);
> +            }
> +        }
> +    }
> +    struct dp_checked *cur_node, *next_node;
> +    HMAP_FOR_EACH_SAFE (cur_node, next_node, hmap_node,
> &related_dp_processed) {
> +        hmap_remove(&related_dp_processed, &cur_node->hmap_node);
> +        free(cur_node);
> +    }
> +    hmap_destroy(&related_dp_processed);
> +}
> +
>  static const struct ovsrec_bridge *
>  create_br_int(struct controller_ctx *ctx,
>                const struct ovsrec_open_vswitch *cfg,
> @@ -465,6 +648,10 @@ main(int argc, char *argv[])
>          ovsdb_idl_create(ovnsb_remote, &sbrec_idl_class, true, true));
>      ovsdb_idl_omit_alert(ovnsb_idl_loop.idl, &sbrec_chassis_col_nb_cfg);
>
> +    /* Start with suppressing all logical flows and then later request
> those
> +     * that are specifically needed. */
> +    sbrec_logical_flow_add_clause_false(ovnsb_idl_loop.idl);
> +
>      /* Track the southbound idl. */
>      ovsdb_idl_track_add_all(ovnsb_idl_loop.idl);
>
> @@ -561,6 +748,7 @@ main(int argc, char *argv[])
>              lport_index_destroy(&lports);
>          }
>
> +        clean_stale_datapaths_of_interest(&ctx);
>          sset_destroy(&all_lports);
>
>          struct local_datapath *cur_node, *next_node;
> diff --git a/ovn/controller/ovn-controller.h b/ovn/controller/ovn-
> controller.h
> index 8a3e855..e35b5b0 100644
> --- a/ovn/controller/ovn-controller.h
> +++ b/ovn/controller/ovn-controller.h
> @@ -76,6 +76,16 @@ const struct ovsrec_bridge *get_bridge(struct ovsdb_idl
> *,
>  const struct sbrec_chassis *get_chassis(struct ovsdb_idl *,
>                                          const char *chassis_id);
>
> +void do_datapath_flood_fill(const struct controller_ctx *,
> +                            const struct sbrec_datapath_binding *,
> +                            const struct sbrec_chassis *);
> +
> +void
> +set_datapath_of_interest_used(const struct sbrec_datapath_binding *dp);
> +
> +void
> +set_all_datapaths_of_interest_stale(void);
> +
>  /* Must be a bit-field ordered from most-preferred (higher number) to
>   * least-preferred (lower number). */
>  enum chassis_tunnel_type {
> diff --git a/ovn/controller/patch.c b/ovn/controller/patch.c
> index 79a7d81..9e8e856 100644
> --- a/ovn/controller/patch.c
> +++ b/ovn/controller/patch.c
> @@ -330,6 +330,9 @@ add_logical_patch_ports(struct controller_ctx *ctx,
>              free(dst_name);
>              free(src_name);
>              add_patched_datapath(patched_datapaths, binding, local_port);
> +
> +            sbrec_port_binding_add_clause_logical_port(ctx->ovnsb_idl,
> +                                                       OVSDB_F_EQ, peer);
>

Why is this necessary?
This could add logical ports that are unrelated to this chassis.

The northd logic adds related_datapaths for any patch port.
The flood fill based on related_datapaths should take care of
datapaths across patch ports, starting from datapaths that
this chassis does care about.

The add_local_datapath logic works for starting points
including vifs, l2gateway, and l3gateway.
What do you think is missing?

             if (local_port) {
>                  if (binding->chassis != chassis_rec &&
> ctx->ovnsb_idl_txn) {
>                      sbrec_port_binding_set_chassis(binding, chassis_rec);
> @@ -339,6 +342,26 @@ add_logical_patch_ports(struct controller_ctx *ctx,
>      }
>  }
>
> +/* Non-logical patch ports should be unaffected. */
> +static void
> +remove_patch_port_monitors(struct controller_ctx *ctx,
> +                            const struct ovsrec_port *port)
> +{
> +    for (size_t i = 0; i < port->n_interfaces; i++) {
> +        struct ovsrec_interface *iface = port->interfaces[i];
> +        if (strcmp(iface->type, "patch")) {
> +            continue;
>

I believe your code adds clauses for l3gateway ports above,
but does not remove those monitors here.

+        }
> +        const char *iface_id = smap_get(&iface->external_ids,
> +                                        "iface-id");
> +        if (iface_id) {


Patch interfaces don't have iface_id. They have options "peer",
and the ports (not interfaces) have external_ids with key
"ovn-logical-patch-port" or "ovn-l3gateway-port".

>

+            sbrec_port_binding_remove_clause_logical_port(ctx->ovnsb_idl,
> +                                                          OVSDB_F_EQ,
> +                                                          iface_id);
> +        }
> +    }
> +}
> +
>  void
>  patch_run(struct controller_ctx *ctx, const struct ovsrec_bridge *br_int,
>            const char *chassis_id, struct hmap *local_datapaths,
> @@ -374,6 +397,7 @@ patch_run(struct controller_ctx *ctx, const struct
> ovsrec_bridge *br_int,
>      SHASH_FOR_EACH_SAFE (port_node, port_next_node, &existing_ports) {
>          struct ovsrec_port *port = port_node->data;
>          shash_delete(&existing_ports, port_node);
> +        remove_patch_port_monitors(ctx, port);
>          remove_port(ctx, port);
>      }
>      shash_destroy(&existing_ports);
> diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
> index f2dc353..7ea5dc9 100644
> --- a/ovn/northd/ovn-northd.c
> +++ b/ovn/northd/ovn-northd.c
> @@ -231,6 +231,42 @@ struct tnlid_node {
>      uint32_t tnlid;
>  };
>
> +struct related_datapath_node {
> +    struct hmap_node hmap_node;
> +    const struct sbrec_datapath_binding *sb_db;
> +};
> +
> +static void
> +add_related_datapath(struct hmap *related_datapaths,
> +                     const struct sbrec_datapath_binding *sb,
> +                     size_t *n_related_datapaths)
> +{
> +    struct related_datapath_node *node;
> +    HMAP_FOR_EACH_WITH_HASH (node, hmap_node,
> +                             uuid_hash(&sb->header_.uuid),
> +                             related_datapaths) {
> +        if (uuid_equals(&sb->header_.uuid, &node->sb_db->header_.uuid)) {
> +            return;
> +        }
> +    }
> +
> +    node = xzalloc(sizeof *node);
> +    hmap_insert(related_datapaths, &node->hmap_node,
> +                uuid_hash(&sb->header_.uuid));
> +    node->sb_db = sb;
> +    (*n_related_datapaths)++;
> +}
> +
> +static void
> +destroy_related_datapaths(struct hmap *related_datapaths)
> +{
> +    struct related_datapath_node *node;
> +    HMAP_FOR_EACH_POP (node, hmap_node, related_datapaths) {
> +        free(node);
> +    }
> +    hmap_destroy(related_datapaths);
> +}
> +
>  static void
>  destroy_tnlids(struct hmap *tnlids)
>  {
> @@ -376,6 +412,8 @@ struct ovn_datapath {
>      size_t n_router_ports;
>
>      struct hmap port_tnlids;
> +    struct hmap related_datapaths;
> +    size_t n_related_datapaths;
>      uint32_t port_key_hint;
>
>      bool has_unknown;
> @@ -426,6 +464,7 @@ ovn_datapath_create(struct hmap *datapaths, const
> struct uuid *key,
>      od->nbr = nbr;
>      hmap_init(&od->port_tnlids);
>      hmap_init(&od->ipam);
> +    hmap_init(&od->related_datapaths);
>      od->port_key_hint = 0;
>      hmap_insert(datapaths, &od->key_node, uuid_hash(&od->key));
>      return od;
> @@ -441,6 +480,7 @@ ovn_datapath_destroy(struct hmap *datapaths, struct
> ovn_datapath *od)
>          hmap_remove(datapaths, &od->key_node);
>          destroy_tnlids(&od->port_tnlids);
>          destroy_ipam(&od->ipam);
> +        destroy_related_datapaths(&od->related_datapaths);
>          free(od->router_ports);
>          free(od);
>      }
> @@ -671,6 +711,69 @@ struct ovn_port {
>      struct ovs_list list;       /* In list of similar records. */
>  };
>
> +static void
> +populate_sb_related_datapaths(struct ovn_datapath *od)
> +{
> +    struct sbrec_datapath_binding **sb_related_datapaths
> +        = xmalloc(sizeof(*sb_related_datapaths) *
> od->n_related_datapaths);
> +    int rdi = 0;
> +    struct related_datapath_node *related_datapath;
> +    HMAP_FOR_EACH (related_datapath, hmap_node,
> +                   &od->related_datapaths) {
> +        if (rdi >= od->n_related_datapaths) {
> +            static struct vlog_rate_limit rl
> +                = VLOG_RATE_LIMIT_INIT(5, 1);
> +            VLOG_ERR_RL(&rl, "related datapaths accounting error"
> +                        UUID_FMT, UUID_ARGS(&od->key));
> +            break;
> +        }
> +
> +        sb_related_datapaths[rdi] = CONST_CAST(
> +            struct sbrec_datapath_binding *, related_datapath->sb_db);
> +        rdi++;
> +    }
> +    sbrec_datapath_binding_set_related_datapaths(od->sb,
> +        sb_related_datapaths, od->n_related_datapaths);
> +    free(sb_related_datapaths);
> +}
> +
> +static void
> +generate_sb_related_datapaths_from_ports(struct ovs_list *both)
> +{
> +    struct dp_checked {
> +           struct hmap_node hmap_node;
> +           const struct sbrec_datapath_binding *sb_db;
> +       };
> +       struct dp_checked *dp_node;
> +       struct hmap dp_processed = HMAP_INITIALIZER(&dp_processed);
> +    struct ovn_port *op, *next;
> +       LIST_FOR_EACH_SAFE (op, next, list, both) {
> +           bool found = false;
> +           HMAP_FOR_EACH_WITH_HASH (dp_node, hmap_node,
> +                                    uuid_hash(&op->od->sb->header_.uuid),
> +                                    &dp_processed) {
> +               if (uuid_equals(&dp_node->sb_db->header_.uuid,
> +                               &op->od->sb->header_.uuid)) {
> +                   found = true;
> +                   break;
> +               }
> +           }
> +           if(!found){
> +            dp_node = xzalloc(sizeof *dp_node);
> +               dp_node->sb_db = op->od->sb;
> +               hmap_insert(&dp_processed, &dp_node->hmap_node,
> +                           uuid_hash(&dp_node->sb_db->header_.uuid));
> +            populate_sb_related_datapaths(op->od);
> +           }
> +       }
> +       struct dp_checked *cur_node, *next_node;
> +       HMAP_FOR_EACH_SAFE (cur_node, next_node, hmap_node, &dp_processed)
> {
> +           hmap_remove(&dp_processed, &cur_node->hmap_node);
> +           free(cur_node);
> +       }
> +       hmap_destroy(&dp_processed);
> +}
> +
>  static struct ovn_port *
>  ovn_port_create(struct hmap *ports, const char *key,
>                  const struct nbrec_logical_switch_port *nbsp,
> @@ -1359,12 +1462,19 @@ ovn_port_update_sbrec(const struct ovn_port *op,
>              sbrec_port_binding_set_type(op->sb, "patch");
>          }
>
> +        if (op->peer && op->peer->od && op->peer->od->sb) {
> +            add_related_datapath(&op->od->related_datapaths,
> +                                 op->peer->od->sb,
> +                                 &op->od->n_related_datapaths);
> +        }
> +
>          const char *peer = op->peer ? op->peer->key : "<error>";
>          struct smap new;
>          smap_init(&new);
>          smap_add(&new, "peer", peer);
>          if (chassis) {
>              smap_add(&new, "l3gateway-chassis", chassis);
> +            sbrec_datapath_binding_set_options(op->sb->datapath, &new);
>

It would be cleaner to modify the datapath_binding in
build_datapaths, though that would require addition of a run
through the both list to update or remove "chassis" as
necessary, as well as modifying the existing run through
the nb_only list.

I don't see any logic to remove "chassis" if that is removed
from the nbr entry.

With this code order, you are also adding "peer" which does
not belong on datapath_binding. If this code stays here, then
the smap_add "peer" line should be moved below this.
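(Roughly this ordering, so that the datapath binding only picks up the
l3gateway-chassis key while the port binding keeps both options;
sketch only:)

    struct smap new;
    smap_init(&new);
    if (chassis) {
        smap_add(&new, "l3gateway-chassis", chassis);
        /* Only l3gateway-chassis belongs on the datapath binding. */
        sbrec_datapath_binding_set_options(op->sb->datapath, &new);
    }
    /* "peer" is a port option only, so add it after the datapath options
     * have been set. */
    smap_add(&new, "peer", peer);
    sbrec_port_binding_set_options(op->sb, &new);
    smap_destroy(&new);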

         }
>          sbrec_port_binding_set_options(op->sb, &new);
>          smap_destroy(&new);
> @@ -1411,6 +1521,12 @@ ovn_port_update_sbrec(const struct ovn_port *op,
>                  sbrec_port_binding_set_type(op->sb, "patch");
>              }
>
> +            if (op->peer && op->peer->od && op->peer->od->sb) {
> +                add_related_datapath(&op->od->related_datapaths,
> +                                     op->peer->od->sb,
> +                                     &op->od->n_related_datapaths);
> +            }
> +
>              const char *router_port = smap_get_def(&op->nbsp->options,
>                                                     "router-port",
> "<error>");
>              struct smap new;
> @@ -1519,6 +1635,7 @@ build_ports(struct northd_context *ctx, struct hmap
> *datapaths,
>          cleanup_mac_bindings(ctx, ports);
>      }
>
> +    generate_sb_related_datapaths_from_ports(&both);
>      tag_alloc_destroy(&tag_alloc_table);
>      destroy_chassis_queues(&chassis_qdisc_queues);
>  }
> @@ -4952,6 +5069,10 @@ main(int argc, char *argv[])
>      add_column_noalert(ovnsb_idl_loop.idl,
>                         &sbrec_datapath_binding_col_tunnel_key);
>      add_column_noalert(ovnsb_idl_loop.idl,
> +                       &sbrec_datapath_binding_col_related_datapaths);
> +    add_column_noalert(ovnsb_idl_loop.idl,
> +                       &sbrec_datapath_binding_col_options);
> +    add_column_noalert(ovnsb_idl_loop.idl,
>                         &sbrec_datapath_binding_col_external_ids);
>
>      ovsdb_idl_add_table(ovnsb_idl_loop.idl, &sbrec_table_port_binding);
> diff --git a/ovn/ovn-sb.ovsschema b/ovn/ovn-sb.ovsschema
> index 89342fe..8dfb2f4 100644
> --- a/ovn/ovn-sb.ovsschema
> +++ b/ovn/ovn-sb.ovsschema
> @@ -1,7 +1,7 @@
>  {
>      "name": "OVN_Southbound",
> -    "version": "1.9.0",
> -    "cksum": "239060528 9012",
> +    "version": "1.10.0",
> +    "cksum": "3562877871 9537",
>      "tables": {
>          "SB_Global": {
>              "columns": {
> @@ -93,7 +93,17 @@
>                                        "maxInteger": 16777215}}},
>                  "external_ids": {
>                      "type": {"key": "string", "value": "string",
> -                             "min": 0, "max": "unlimited"}}},
> +                             "min": 0, "max": "unlimited"}},
> +                "options": {
> +                     "type": {"key": "string",
> +                              "value": "string",
> +                              "min": 0,
> +                              "max": "unlimited"}},
> +                "related_datapaths": {"type": {"key": {"type": "uuid",
> +                                      "refTable": "Datapath_Binding",
> +                                      "refType": "weak"},
> +                                      "min": 0,
> +                                      "max": "unlimited"}}},
>              "indexes": [["tunnel_key"]],
>              "isRoot": true},
>          "Port_Binding": {
> diff --git a/ovn/ovn-sb.xml b/ovn/ovn-sb.xml
> index 65191ed..88de25d 100644
> --- a/ovn/ovn-sb.xml
> +++ b/ovn/ovn-sb.xml
> @@ -1556,6 +1556,19 @@ tcp.flags = RST;
>        constructed for each supported encapsulation.
>      </column>
>
> +    <column name="options" key="l3gateway-chassis">
> +      The <code>chassis</code> in which the router datapath resides.
> +    </column>
> +
> +    <column name="related_datapaths">
> +      The related_datapaths column is used to keep track of which
> datapaths
> +      are connected to each other.  This is leveraged in ovn-controller to
> +      do a flood fill from each local logical switch to determine the full
> +      set of datapaths needed on a given ovn-controller.   This set of
> +      datapaths can be used to determine which logical ports and logical
> +      flows an ovn-controller should monitor.
> +    </column>
> +
>      <group title="OVN_Northbound Relationship">
>        <p>
>          Each row in <ref table="Datapath_Binding"/> is associated with
> some
> diff --git a/tests/ovn.at b/tests/ovn.at
> index a226849..af544e1 100644
> --- a/tests/ovn.at
> +++ b/tests/ovn.at
> @@ -1673,6 +1673,141 @@ OVN_CLEANUP([hv1],[hv2])
>
>  AT_CLEANUP
>
> +# 2 hypervisors, each with one logical switch with 2 logical ports.
> +AT_SETUP([ovn -- conditional monitoring: 2 HVs each with 1 LS and 2 LPs])
> +AT_SKIP_IF([test $HAVE_PYTHON = no])
> +ovn_start
> +
> +# In this test cases we create 2 switches, all connected to same
> +# physical network (through br-phys on each HV). Each switch has
> +# VIF ports across 1 HV. Each HV has 2 VIF ports. The first digit
> +# of VIF port name indicates the hypervisor it is bound to, e.g.
> +# lp12 means VIF 2 on hv1.
> +
> +for i in 1 2; do
> +    ls_name=ls$i
> +    ovn-nbctl ls-add $ls_name
> +done
> +
> +net_add n1
> +for i in 1 2; do
> +    sim_add hv$i
> +    as hv$i
> +    ovs-vsctl add-br br-phys
> +    ovn_attach n1 br-phys 192.168.0.$i
> +    ovs-appctl -t ovn-controller vlog/set ANY:file:DBG
> +
> +    for j in 1 2; do
> +        ovs-vsctl add-port br-int vif$i$j -- \
> +            set Interface vif$i$j external-ids:iface-id=lp$i$j \
> +                                  options:tx_pcap=hv$i/vif$i$j-tx.pcap \
> +                                  options:rxq_pcap=hv$i/vif$i$j-rx.pcap \
> +                                  ofport-request=$i$j
> +
> +        lsp_name=lp$i$j
> +        if test $i -eq 1; then
> +            ls_name=ls1
> +        else
> +            ls_name=ls2
> +        fi
> +
> +        ovn-nbctl lsp-add $ls_name $lsp_name
> +        ovn-nbctl lsp-set-addresses $lsp_name f0:00:00:00:00:$i$j
> +        ovn-nbctl lsp-set-port-security $lsp_name f0:00:00:00:00:$i$j
> +
> +        OVS_WAIT_UNTIL([test x`ovn-nbctl lsp-get-up $lsp_name` = xup])
> +    done
> +done
> +
> +ovn_populate_arp
> +
> +# XXX This is now the 3rd copy of these functions in this file ...
> +
> +# Given the name of a logical port, prints the name of the hypervisor
> +# on which it is located.
> +vif_to_hv() {
> +    echo hv${1%?}
> +}
> +#
> +# test_packet INPORT DST SRC ETHTYPE OUTPORT...
> +#
> +# This shell function causes a packet to be received on INPORT.  The
> packet's
> +# content has Ethernet destination DST and source SRC (each exactly 12 hex
> +# digits) and Ethernet type ETHTYPE (4 hex digits).  The OUTPORTs (zero or
> +# more) list the VIFs on which the packet should be received.  INPORT and
> the
> +# OUTPORTs are specified as logical switch port numbers, e.g. 11 for
> vif11.
> +for i in 1 2; do
> +    for j in 1 2; do
> +        : > $i$j.expected
> +    done
> +done
> +test_packet() {
> +    local inport=$1 src=$2 dst=$3 eth=$4; shift; shift; shift; shift
> +    local packet=${src}${dst}${eth}
> +    hv=`vif_to_hv $inport`
> +    vif=vif$inport
> +    as $hv ovs-appctl netdev-dummy/receive $vif $packet
> +    for outport; do
> +        echo $packet >> $outport.expected
> +    done
> +}
> +
> +# lp11 and lp12 are on the same network and on
> +# same hypervisors
> +test_packet 11 f00000000012 f00000000011 1112 12
> +test_packet 12 f00000000011 f00000000012 1211 11
> +
> +# lp21 and lp22 are on the same network and on
> +# the same hypervisor
> +test_packet 21 f00000000022 f00000000021 2122 22
> +test_packet 22 f00000000021 f00000000022 2221 21
> +
> +# lp11 and lp22 are on different networks so should
> +# not be able to communicate.
> +test_packet 11 f00000000022 f00000000011 1122
> +test_packet 22 f00000000011 f00000000022 2211
> +
> +echo "------ OVN dump ------"
> +ovn-nbctl show
> +ovn-sbctl show
> +
> +echo "------ hv1 dump ------"
> +as hv1 ovs-vsctl show
> +as hv1 ovs-ofctl -O OpenFlow13 dump-flows br-int
> +
> +echo "------ hv2 dump ------"
> +as hv2 ovs-vsctl show
> +as hv2 ovs-ofctl -O OpenFlow13 dump-flows br-int
> +
> +# Check that all datapath components are monitored as expected.
> +AT_CHECK([cat hv1/ovn-controller.log | grep 'tunnel_key 1'], [0],
> [ignore-nolog])
> +AT_CHECK([cat hv1/ovn-controller.log | grep 'tunnel_key 2'], [1],
> [ignore-nolog])
> +AT_CHECK([cat hv2/ovn-controller.log | grep 'tunnel_key 2'], [0],
> [ignore-nolog])
> +AT_CHECK([cat hv2/ovn-controller.log | grep 'tunnel_key 1'], [1],
> [ignore-nolog])
> +
> +dbg_instances1=`cat hv1/ovn-controller.log | grep 'tunnel_key 1' | wc -l`
> +echo ls1 monitor traces on hv1 $dbg_instances1
> +dbg_instances2=`cat hv1/ovn-controller.log | grep 'tunnel_key 2' | wc -l`
> +echo ls2 monitor traces on hv1 $dbg_instances2
> +AT_CHECK([test $dbg_instances1 -gt $dbg_instances2], [0], [ignore-nolog])
> +
> +dbg_instances1=`cat hv2/ovn-controller.log | grep 'tunnel_key 2' | wc -l`
> +echo ls2 monitor traces on hv2 $dbg_instances1
> +dbg_instances2=`cat hv2/ovn-controller.log | grep 'tunnel_key 1' | wc -l`
> +echo ls1 monitor traces on hv2 $dbg_instances2
> +AT_CHECK([test $dbg_instances1 -gt $dbg_instances2], [0], [ignore-nolog])
> +
> +# Now check the packets actually received against the ones expected.
> +for i in 1 2; do
> +    for j in 1 2; do
> +        OVN_CHECK_PACKETS([hv$i/vif$i$j-tx.pcap], [$i$j.expected])
> +    done
> +done
> +
> +OVN_CLEANUP([hv1],[hv2])
> +
> +AT_CLEANUP
> +
>  AT_SETUP([ovn -- vtep: 3 HVs, 1 VIFs/HV, 1 GW, 1 LS])
>  AT_KEYWORDS([vtep])
>  AT_SKIP_IF([test $HAVE_PYTHON = no])
> @@ -2439,6 +2574,223 @@ OVN_CLEANUP([hv1], [hv2], [hv3])
>
>  AT_CLEANUP
>
> +# 3 hypervisors, 3 logical switches with 3 logical ports each, 1 logical
> router
> +AT_SETUP([ovn -- conditional Monitoring: 3 HVs, 3 LS, 3 lports/LS, 1 LR])
> +AT_SKIP_IF([test $HAVE_PYTHON = no])
> +ovn_start
> +
> +# Logical network:
> +#
> +# Three logical switches ls1, ls2, ls3.
> +# One logical router lr0 connected to ls[123],
> +# with nine subnets, three per logical switch:
> +#
> +#    lrp11 on ls1 for subnet 192.168.11.0/24
> +#    lrp12 on ls1 for subnet 192.168.12.0/24
> +#    lrp13 on ls1 for subnet 192.168.13.0/24


This is confusing since the code below does not add
lrp11, lrp12, lrp13, just the subnets in ls1 which is
not attached to lr0.

I am having trouble understanding the motivation for
3 subnets per ls. It will work; I am just wondering why you
thought it necessary or helpful.

>
> +#    ...
> +#    lrp33 on ls3 for subnet 192.168.33.0/24
> +#
> +# 27 VIFs, 9 per LS, 3 per subnet: lp[123][123][123], where the first two
> +# digits are the subnet and the last digit distinguishes the VIF.
> +for i in 1 2 3; do
> +    ovn-nbctl ls-add ls$i
> +    for j in 1 2 3; do
> +        for k in 1 2 3; do
> +            ovn-nbctl \
> +                -- lsp-add ls$i lp$i$j$k \
> +                -- lsp-set-addresses lp$i$j$k "f0:00:00:00:0$i:$j$k \
> +                   192.168.$i$j.$k"
> +        done
> +    done
> +done
> +
> +# Only ls2 and ls3 are connected by the logical router.
> +ovn-nbctl lr-add lr0
> +for i in 2 3; do
> +    for j in 1 2 3; do
> +        ovn-nbctl lrp-add lr0 lrp$i$j 00:00:00:00:ff:$i$j
> 192.168.$i$j.254/24
> +        ovn-nbctl \
> +            -- lsp-add ls$i lrp$i$j-attachment \
> +            -- set Logical_Switch_Port lrp$i$j-attachment type=router \
> +                             options:router-port=lrp$i$j \
> +                             addresses='"00:00:00:00:ff:'$i$j'"'
> +    done
> +done
> +
> +# Physical network:
> +#
> +# Three hypervisors hv[123].
> +# lp[1??] on hv1
> +# lp[2??] on hv2.
> +# lp[3??] on hv3.
>

I like this. So ls3 is only present on hv3?
If so, this will test the related_datapaths part of the code, making
sure that hv2 grabs flows for datapaths lr0 and ls3 even though
there are no local VIFs in ls3 on hv2. This would catch the
code order northd issue from v9.

Mickey

+
> +# Given the name of a logical port, prints the name of the hypervisor
> +# on which it is located.
> +vif_to_hv() {
> +    case $1 in dnl (
> +        1??) echo 1 ;; dnl (
> +        2??) echo 2 ;; dnl (
> +        3??) echo 3 ;;
> +    esac
> +}
> +
> +# Given the name of a logical port, prints the name of its logical router
> +# port, e.g. "vif_to_lrp 123" yields 12.
> +vif_to_lrp() {
> +    echo ${1%?}
> +}
> +
> +# Given the name of a logical port, prints the name of its logical
> +# switch, e.g. "vif_to_ls 123" yields 1.
> +vif_to_ls() {
> +    echo ${1%??}
> +}
> +
> +net_add n1
> +for i in 1 2 3; do
> +    sim_add hv$i
> +    as hv$i
> +    ovs-vsctl add-br br-phys
> +    ovn_attach n1 br-phys 192.168.0.$i
> +    ovs-appctl -t ovn-controller vlog/set ANY:file:DBG
> +done
> +for i in 1 2 3; do
> +    for j in 1 2 3; do
> +        for k in 1 2 3; do
> +            hv=`vif_to_hv $i$j$k`
> +                as hv$hv ovs-vsctl \
> +                -- add-port br-int vif$i$j$k \
> +                -- set Interface vif$i$j$k \
> +                    external-ids:iface-id=lp$i$j$k \
> +                    options:tx_pcap=hv$hv/vif$i$j$k-tx.pcap \
> +                    options:rxq_pcap=hv$hv/vif$i$j$k-rx.pcap \
> +                    ofport-request=$i$j$k
> +        done
> +    done
> +done
> +
> +# Pre-populate the hypervisors' ARP tables so that we don't lose any
> +# packets for ARP resolution (native tunneling doesn't queue packets
> +# for ARP resolution).
> +ovn_populate_arp
> +
> +# Allow some time for ovn-northd and ovn-controller to catch up.
> +# XXX This should be more systematic.
> +sleep 1
> +
> +# test_ip INPORT SRC_MAC DST_MAC SRC_IP DST_IP OUTPORT...
> +#
> +# This shell function causes a packet to be received on INPORT.  The
> packet's
> +# content has Ethernet destination DST and source SRC (each exactly 12 hex
> +# digits) and Ethernet type ETHTYPE (4 hex digits).  The OUTPORTs (zero or
> +# more) list the VIFs on which the packet should be received.  INPORT and
> the
> +# OUTPORTs are specified as logical switch port numbers, e.g. 123 for
> vif123.
> +for i in 1 2 3; do
> +    for j in 1 2 3; do
> +        for k in 1 2 3; do
> +            : > $i$j$k.expected
> +        done
> +    done
> +done
> +test_ip() {
> +    # This packet has bad checksums but logical L3 routing doesn't check.
> +    local inport=$1 src_mac=$2 dst_mac=$3 src_ip=$4 dst_ip=$5
> +    local packet=${dst_mac}${src_mac}08004500001c0000000040110000${
> src_ip}${dst_ip}0035111100080000
> +    shift; shift; shift; shift; shift
> +    hv=hv`vif_to_hv $inport`
> +    as $hv ovs-appctl netdev-dummy/receive vif$inport $packet
> +    in_ls=`vif_to_ls $inport`
> +    in_lrp=`vif_to_lrp $inport`
> +    for outport; do
> +        out_ls=`vif_to_ls $outport`
> +        if test $in_ls = $out_ls; then
> +            # Ports on the same logical switch receive exactly the same
> packet.
> +            echo $packet
> +        else
> +            # Routing decrements TTL and updates source and dest MAC
> +            # (and checksum).
> +            out_lrp=`vif_to_lrp $outport`
> +            echo f00000000${outport}00000000ff$
> {out_lrp}08004500001c00000000"3f1101"00${src_ip}${dst_ip}0035111100080000
> +        fi >> $outport.expected
> +    done
> +}
> +
> +as hv1 ovn-sbctl list datapath_binding
> +echo "----------------------------"
> +as hv1 ovn-sbctl list port_binding
> +echo "----------------------------"
> +as hv1 ovn-sbctl dump-flows
> +#as hv1 ovs-vsctl --columns=name,ofport list interface
> +#as hv1 ovs-ofctl dump-flows br-int
> +
> +# Send IP packets between all pairs of source and destination ports:
> +#
> +# 1. Unicast IP packets are delivered to exactly one logical switch port
> +#    (except that packets destined to their input ports are dropped).
> +#
> +ip_to_hex() {
> +    printf "%02x%02x%02x%02x" "$@"
> +}
> +for is in 1 2 3; do
> +  for js in 1 2 3; do
> +    for ks in 1 2 3; do
> +      s=$is$js$ks
> +      smac=f00000000$s
> +      sip=`ip_to_hex 192 168 $is$js $ks`
> +      for id in 1 2 3; do
> +          for jd in 1 2 3; do
> +              for kd in 1 2 3; do
> +                d=$id$jd$kd
> +                dip=`ip_to_hex 192 168 $id$jd $kd`
> +                if test $is = $id; then dmac=f00000000$d; else
> dmac=00000000ff$is$js; fi
> +                if test $d != $s; then unicast=$d; else unicast=; fi
> +                if test $is = 1; then if test $is != $id; then unicast=;
> fi; fi
> +                if test $id = 1; then if test $is != $id; then unicast=;
> fi; fi
> +                test_ip $s $smac $dmac $sip $dip $unicast
> +              done
> +          done
> +        done
> +      done
> +  done
> +done
> +
> +# Allow some time for packet forwarding.
> +# XXX This can be improved.
> +sleep 1
> +
> +ovn-sbctl -f csv -d bare --no-heading \
> +    -- --columns=logical_port,ip,mac list mac_binding > mac_bindings
> +
> +# Now check the packets actually received against the ones expected.
> +for i in 1 2 3; do
> +    for j in 1 2 3; do
> +        for k in 1 2 3; do
> +            OVN_CHECK_PACKETS([hv`vif_to_hv $i$j$k`/vif$i$j$k-tx.pcap],
> +                              [$i$j$k.expected])
> +        done
> +    done
> +done
> +
> +# Check that all datapath components are monitored as expected.
> +AT_CHECK([cat hv1/ovn-controller.log | grep 'tunnel_key 1'], [0],
> [ignore-nolog])
> +AT_CHECK([cat hv1/ovn-controller.log | grep 'tunnel_key 2'], [1],
> [ignore-nolog])
> +AT_CHECK([cat hv1/ovn-controller.log | grep 'tunnel_key 3'], [1],
> [ignore-nolog])
> +AT_CHECK([cat hv1/ovn-controller.log | grep 'tunnel_key 4'], [1],
> [ignore-nolog])
> +AT_CHECK([cat hv2/ovn-controller.log | grep 'tunnel_key 1'], [1],
> [ignore-nolog])
> +AT_CHECK([cat hv2/ovn-controller.log | grep 'tunnel_key 2'], [0],
> [ignore-nolog])
> +AT_CHECK([cat hv2/ovn-controller.log | grep 'tunnel_key 3'], [0],
> [ignore-nolog])
> +AT_CHECK([cat hv2/ovn-controller.log | grep 'tunnel_key 4'], [0],
> [ignore-nolog])
> +AT_CHECK([cat hv3/ovn-controller.log | grep 'tunnel_key 1'], [1],
> [ignore-nolog])
> +AT_CHECK([cat hv3/ovn-controller.log | grep 'tunnel_key 2'], [0],
> [ignore-nolog])
> +AT_CHECK([cat hv3/ovn-controller.log | grep 'tunnel_key 3'], [0],
> [ignore-nolog])
> +AT_CHECK([cat hv3/ovn-controller.log | grep 'tunnel_key 4'], [0],
> [ignore-nolog])
> +
> +# Gracefully terminate daemons
> +OVN_CLEANUP([hv1], [hv2], [hv3])
> +
> +AT_CLEANUP
> +
>  # 3 hypervisors, one logical switch, 3 logical ports per hypervisor
>  AT_SETUP([ovn -- portsecurity : 3 HVs, 1 LS, 3 lports/HV])
>  AT_SKIP_IF([test $HAVE_PYTHON = no])
> @@ -2983,6 +3335,7 @@ ovs-vsctl -- add-port br-int vif2 -- \
>      options:tx_pcap=hv1/vif2-tx.pcap \
>      options:rxq_pcap=hv1/vif2-rx.pcap \
>      ofport-request=1
> +ovs-appctl -t ovn-controller vlog/set ANY:file:DBG
>
>
>  # Allow some time for ovn-northd and ovn-controller to catch up.
> @@ -3014,10 +3367,13 @@ echo "---------------------"
>  echo "---------SB dump-----"
>  ovn-sbctl list datapath_binding
>  echo "---------------------"
> +ovn-sbctl list port_binding
> +echo "---------------------"
>  ovn-sbctl list logical_flow
>  echo "---------------------"
>
>  echo "------ hv1 dump ----------"
> +as hv1 ovs-ofctl show br-int
>  as hv1 ovs-ofctl dump-flows br-int
>
>
> @@ -3031,10 +3387,13 @@ sleep 1
>  echo "---------SB dump-----"
>  ovn-sbctl list datapath_binding
>  echo "---------------------"
> +ovn-sbctl list port_binding
> +echo "---------------------"
>  ovn-sbctl list logical_flow
>  echo "---------------------"
>
>  echo "------ hv1 dump ----------"
> +as hv1 ovs-ofctl show br-int
>  as hv1 ovs-ofctl dump-flows br-int
>
>  as hv1 ovs-appctl netdev-dummy/receive vif1 $packet
> @@ -4109,6 +4468,7 @@ ovs-vsctl -- add-port br-int hv1-vif1 -- \
>      options:tx_pcap=hv1/vif1-tx.pcap \
>      options:rxq_pcap=hv1/vif1-rx.pcap \
>      ofport-request=1
> +ovs-appctl -t ovn-controller vlog/set ANY:file:DBG
>
>
>  sim_add hv2
> @@ -4120,6 +4480,7 @@ ovs-vsctl -- add-port br-int hv2-vif1 -- \
>      options:tx_pcap=hv2/vif1-tx.pcap \
>      options:rxq_pcap=hv2/vif1-rx.pcap \
>      ofport-request=1
> +ovs-appctl -t ovn-controller vlog/set ANY:file:DBG
>
>  # Pre-populate the hypervisors' ARP tables so that we don't lose any
>  # packets for ARP resolution (native tunneling doesn't queue packets
> @@ -4213,7 +4574,6 @@ src_ip=`ip_to_hex 192 168 1 2`
>  dst_ip=`ip_to_hex 172 16 1 2`
>  expected=${dst_mac}${src_mac}08004500001c000000003e110200${
> src_ip}${dst_ip}0035111100080000
>
> -
>  as hv1 ovs-appctl netdev-dummy/receive hv1-vif1 $packet
>  as hv1 ovs-appctl ofproto/trace br-int in_port=1 $packet
>
> @@ -4225,6 +4585,15 @@ as hv2 ovs-ofctl show br-int
>  as hv2 ovs-ofctl dump-flows br-int
>  echo "----------------------------"
>
> +# tunnel key 2 represents the gateway router, hence hv1 does not need
> +# to know about its ports and flows, since the transit switch acts as proxy.
> +AT_CHECK([cat hv2/ovn-controller.log | grep 'tunnel_key 2'], [0], [ignore-nolog])
> +AT_CHECK([cat hv1/ovn-controller.log | grep 'tunnel_key 2'], [1], [ignore-nolog])
> +# tunnel key 4 represents a datapath (alice) on the far side of the gateway from
> +# the perspective of hv1, hence hv1 does not need to know about its ports and flows.
> +AT_CHECK([cat hv2/ovn-controller.log | grep 'tunnel_key 4'], [0], [ignore-nolog])
> +AT_CHECK([cat hv1/ovn-controller.log | grep 'tunnel_key 4'], [1], [ignore-nolog])
> +
>  echo $expected > expected
>  OVN_CHECK_PACKETS([hv2/vif1-tx.pcap], [expected])
>
> --
> 1.9.1
>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
Ben Pfaff Dec. 5, 2016, 7:22 a.m. UTC | #2
On Sun, Dec 04, 2016 at 04:13:44PM -0800, Darrell Ball wrote:
> This patch adds datapaths of interest support where only datapaths of
> local interest are monitored by the ovn-controller ovsdb client.  The
> idea is to do a flood fill in ovn-controller of datapath associations
> calculated by northd. A new column is added to the SB database
> datapath_binding table - related_datapaths to facilitate this so all
> datapaths associations are known quickly in ovn-controller.  This
> allows monitoring to adapt quickly with a single new monitor setting
> for all datapaths of interest locally.

Hi Darrell, the series I just sent out has some relevance here.  It
makes ovn-controller only implement the datapaths and ports flows that
are relevant to a given hypervisor, even though it does not affect what
part of the database is replicated.  The particularly relevant patch is
this:
        https://patchwork.ozlabs.org/patch/702608/

I suggest having a look at the series.

Thanks,

Ben.
Liran Schour Dec. 5, 2016, 11:10 a.m. UTC | #3
Darrell Ball <dlu998@gmail.com> wrote on 05/12/2016 02:13:44 AM:

> This patch adds datapaths of interest support where only datapaths of
> local interest are monitored by the ovn-controller ovsdb client.  The
> idea is to do a flood fill in ovn-controller of datapath associations
> calculated by northd. A new column is added to the SB database
> datapath_binding table - related_datapaths to facilitate this so all
> datapaths associations are known quickly in ovn-controller.  This
> allows monitoring to adapt quickly with a single new monitor setting
> for all datapaths of interest locally.
> 
> We monitor both logical flows and logical ports. The mac bindings
> will likely go away and multicast groups are relatively few.
> To reduce risk and overall latency we only suppress logical flows as
> a starting point. Logical ports are not suppressed as a starting
> point and but then are monitor suppressed as appropriate from that
> point on. The initial suppression has little practical gain and not
> doing it allows for us to be flexible in terms of how different port
> types are supported.
> 

I checked this patch against the original patch that conditionally monitors
all 4 tables (Port_Binding, Logical_Flow, Multicast_Group, and MAC_Binding),
and found that this V10 patch outperforms the original patch by
85% in my environment (V10: 2,645,656,251,389 cycles, original patch:
1,431,070,248,059 cycles).
Otherwise it looks like things are moving in the right direction.

- Liran
Russell Bryant Dec. 6, 2016, 7:45 p.m. UTC | #4
On Mon, Dec 5, 2016 at 2:22 AM, Ben Pfaff <blp@ovn.org> wrote:

> On Sun, Dec 04, 2016 at 04:13:44PM -0800, Darrell Ball wrote:
> > This patch adds datapaths of interest support where only datapaths of
> > local interest are monitored by the ovn-controller ovsdb client.  The
> > idea is to do a flood fill in ovn-controller of datapath associations
> > calculated by northd. A new column is added to the SB database
> > datapath_binding table - related_datapaths to facilitate this so all
> > datapaths associations are known quickly in ovn-controller.  This
> > allows monitoring to adapt quickly with a single new monitor setting
> > for all datapaths of interest locally.
>
> Hi Darrell, the series I just sent out has some relevance here.  It
> makes ovn-controller only implement the datapaths and ports flows that
> are relevant to a given hypervisor, even though it does not affect what
> part of the database is replicated.  The particularly relevant patch is
> this:
>         https://patchwork.ozlabs.org/patch/702608/
>
> I suggest having a look at the series.
>

I'm in the middle of doing some control plane performance testing using
OpenStack and OVN.  I may have time to do another run with some additional
patches applied to see how they affect performance.  Ideally I'd be able to
do that this week, though.

If I'm able to fit this in, what do you guys suggest if I've only got one
shot?  This patch or Ben's series?
Ben Pfaff Dec. 6, 2016, 8:35 p.m. UTC | #5
On Tue, Dec 06, 2016 at 02:45:19PM -0500, Russell Bryant wrote:
> On Mon, Dec 5, 2016 at 2:22 AM, Ben Pfaff <blp@ovn.org> wrote:
> 
> > On Sun, Dec 04, 2016 at 04:13:44PM -0800, Darrell Ball wrote:
> > > This patch adds datapaths of interest support where only datapaths of
> > > local interest are monitored by the ovn-controller ovsdb client.  The
> > > idea is to do a flood fill in ovn-controller of datapath associations
> > > calculated by northd. A new column is added to the SB database
> > > datapath_binding table - related_datapaths to facilitate this so all
> > > datapaths associations are known quickly in ovn-controller.  This
> > > allows monitoring to adapt quickly with a single new monitor setting
> > > for all datapaths of interest locally.
> >
> > Hi Darrell, the series I just sent out has some relevance here.  It
> > makes ovn-controller only implement the datapaths and ports flows that
> > are relevant to a given hypervisor, even though it does not affect what
> > part of the database is replicated.  The particularly relevant patch is
> > this:
> >         https://patchwork.ozlabs.org/patch/702608/
> >
> > I suggest having a look at the series.
> >
> 
> I'm in the middle of doing some control plane performance testing using
> OpenStack and OVN.  I may have time to do another run with some additional
> patches applied to see how they affect performance.  Ideally I'd be able to
> do that this week, though.
> 
> If I'm able to fit this in, what do you guys suggest if I've only got one
> shot?  This patch or Ben's series?

This series is aimed more at technical debt and code clarity than at
performance.  It might help with performance, but that's not the main
goal.

Darrell/Liran's series is motivated by performance.  If it doesn't help
performance, then probably it shouldn't be applied because it makes the
code harder to understand.  (Disclaimer: I haven't read the recent
versions.)
Ben Pfaff Dec. 6, 2016, 8:36 p.m. UTC | #6
On Tue, Dec 06, 2016 at 12:35:16PM -0800, Ben Pfaff wrote:
> On Tue, Dec 06, 2016 at 02:45:19PM -0500, Russell Bryant wrote:
> > On Mon, Dec 5, 2016 at 2:22 AM, Ben Pfaff <blp@ovn.org> wrote:
> > 
> > > On Sun, Dec 04, 2016 at 04:13:44PM -0800, Darrell Ball wrote:
> > > > This patch adds datapaths of interest support where only datapaths of
> > > > local interest are monitored by the ovn-controller ovsdb client.  The
> > > > idea is to do a flood fill in ovn-controller of datapath associations
> > > > calculated by northd. A new column is added to the SB database
> > > > datapath_binding table - related_datapaths to facilitate this so all
> > > > datapaths associations are known quickly in ovn-controller.  This
> > > > allows monitoring to adapt quickly with a single new monitor setting
> > > > for all datapaths of interest locally.
> > >
> > > Hi Darrell, the series I just sent out has some relevance here.  It
> > > makes ovn-controller only implement the datapaths and ports flows that
> > > are relevant to a given hypervisor, even though it does not affect what
> > > part of the database is replicated.  The particularly relevant patch is
> > > this:
> > >         https://patchwork.ozlabs.org/patch/702608/
> > >
> > > I suggest having a look at the series.
> > >
> > 
> > I'm in the middle of doing some control plane performance testing using
> > OpenStack and OVN.  I may have time to do another run with some additional
> > patches applied to see how they affect performance.  Ideally I'd be able to
> > do that this week, though.
> > 
> > If I'm able to fit this in, what do you guys suggest if I've only got one
> > shot?  This patch or Ben's series?
> 
> This series is aimed more at technical debt and code clarity than at
  ^^^^^^^^^^^
I meant "my series", not this one.

> performance.  It might help with performance, but that's not the main
> goal.
> 
> Darrell/Liran's series is motivated by performance.  If it doesn't help
> performance, then probably it shouldn't be applied because it makes the
> code harder to understand.  (Disclaimer: I haven't read the recent
> versions.)
Russell Bryant Dec. 6, 2016, 9:06 p.m. UTC | #7
On Tue, Dec 6, 2016 at 3:36 PM, Ben Pfaff <blp@ovn.org> wrote:

> On Tue, Dec 06, 2016 at 12:35:16PM -0800, Ben Pfaff wrote:
> > On Tue, Dec 06, 2016 at 02:45:19PM -0500, Russell Bryant wrote:
> > > On Mon, Dec 5, 2016 at 2:22 AM, Ben Pfaff <blp@ovn.org> wrote:
> > >
> > > > On Sun, Dec 04, 2016 at 04:13:44PM -0800, Darrell Ball wrote:
> > > > > This patch adds datapaths of interest support where only datapaths
> of
> > > > > local interest are monitored by the ovn-controller ovsdb client.
> The
> > > > > idea is to do a flood fill in ovn-controller of datapath
> associations
> > > > > calculated by northd. A new column is added to the SB database
> > > > > datapath_binding table - related_datapaths to facilitate this so
> all
> > > > > datapaths associations are known quickly in ovn-controller.  This
> > > > > allows monitoring to adapt quickly with a single new monitor
> setting
> > > > > for all datapaths of interest locally.
> > > >
> > > > Hi Darrell, the series I just sent out has some relevance here.  It
> > > > makes ovn-controller only implement the datapaths and ports flows
> that
> > > > are relevant to a given hypervisor, even though it does not affect
> what
> > > > part of the database is replicated.  The particularly relevant patch
> is
> > > > this:
> > > >         https://patchwork.ozlabs.org/patch/702608/
> > > >
> > > > I suggest having a look at the series.
> > > >
> > >
> > > I'm in the middle of doing some control plane performance testing using
> > > OpenStack and OVN.  I may have time to do another run with some
> additional
> > > patches applied to see how they affect performance.  Ideally I'd be
> able to
> > > do that this week, though.
> > >
> > > If I'm able to fit this in, what do you guys suggest if I've only got
> one
> > > shot?  This patch or Ben's series?
> >
> > This series is aimed more at technical debt and code clarity than at
>   ^^^^^^^^^^^
> I meant "my series", not this one.
>
> > performance.  It might help with performance, but that's not the main
> > goal.
> >
> > Darrell/Liran's series is motivated by performance.  If it doesn't help
> > performance, then probably it shouldn't be applied because it makes the
> > code harder to understand.  (Disclaimer: I haven't read the recent
> > versions.)
>

OK, thanks for confirming.

I'll do a run with this included if we have time, since this seems to be
the main pending change related to OVN control plane performance.  Let me
know if I missed something, though.
Russell Bryant Dec. 6, 2016, 10:03 p.m. UTC | #8
On Sun, Dec 4, 2016 at 7:13 PM, Darrell Ball <dlu998@gmail.com> wrote:

> This patch adds datapaths of interest support where only datapaths of
> local interest are monitored by the ovn-controller ovsdb client.  The
> idea is to do a flood fill in ovn-controller of datapath associations
> calculated by northd. A new column is added to the SB database
> datapath_binding table - related_datapaths to facilitate this so all
> datapaths associations are known quickly in ovn-controller.  This
> allows monitoring to adapt quickly with a single new monitor setting
> for all datapaths of interest locally.
>
> We monitor both logical flows and logical ports. The mac bindings
> will likely go away and multicast groups are relatively few.
> To reduce risk and overall latency we only suppress logical flows as
> a starting point. Logical ports are not suppressed as a starting
> point and but then are monitor suppressed as appropriate from that
> point on. The initial suppression has little practical gain and not
> doing it allows for us to be flexible in terms of how different port
> types are supported.
>
> To validate the gain, we take the approach to verify the l3gateway
> router flow suppression from the non-l3gateway HVs POV and also add
> a separate test to verify the flow and port handling scale advantages.
> The separate test added verifies the logical port and logical flow
> suppression advantage. This measures the difference in flows and ports
> sent to the HV clients where the far majority of the overall work is.
> The less ports and flows that are unnecessarily processed on the HVs,
> the more benefit, excluding latency skew. The separate test is
> simple for interpretation and shows an order of magnitude advantage;
> the test uses only logical switches. The advantage would increase with
> logical routers. The test is conservative in estimating the numerical
> advantage to increase the probability of the check passing the first
> time, since there are some timing conssiderations that affect the numbers.
>
> Liran Schour contributed enhancements to the conditional
> monitoring granularity in ovs and also submitted patches
> for ovn usage of conditional monitoring which aided this patch
> though sharing of concepts through code review work.
>
> Ben Pfaff suggested that northd could be used to pre-populate
> related datapaths for ovn-controller to use.  That idea is
> used as part of this patch.
>
> CC: Ben Pfaff <blp@ovn.org>
> CC: Liran Schour <LIRANS@il.ibm.com>
> Signed-off-by: Darrell Ball <dlu998@gmail.com>
> ---
>

When I apply this patch to master, one of the tests is failing for me.
This test passes if I drop the patch.

2296: ovn-controller - ovn-bridge-mappings            FAILED (ovn-controller.at:45)
Darrell Ball Dec. 6, 2016, 10:13 p.m. UTC | #9
On 12/6/16, 2:03 PM, "ovs-dev-bounces@openvswitch.org on behalf of Russell Bryant" <ovs-dev-bounces@openvswitch.org on behalf of russell@ovn.org> wrote:

    On Sun, Dec 4, 2016 at 7:13 PM, Darrell Ball <dlu998@gmail.com> wrote:
    
    > This patch adds datapaths of interest support where only datapaths of
    > local interest are monitored by the ovn-controller ovsdb client.  The
    > idea is to do a flood fill in ovn-controller of datapath associations
    > calculated by northd. A new column is added to the SB database
    > datapath_binding table - related_datapaths to facilitate this so all
    > datapaths associations are known quickly in ovn-controller.  This
    > allows monitoring to adapt quickly with a single new monitor setting
    > for all datapaths of interest locally.
    >
    > We monitor both logical flows and logical ports. The mac bindings
    > will likely go away and multicast groups are relatively few.
    > To reduce risk and overall latency we only suppress logical flows as
    > a starting point. Logical ports are not suppressed as a starting
    > point and but then are monitor suppressed as appropriate from that
    > point on. The initial suppression has little practical gain and not
    > doing it allows for us to be flexible in terms of how different port
    > types are supported.
    >
    > To validate the gain, we take the approach to verify the l3gateway
    > router flow suppression from the non-l3gateway HVs POV and also add
    > a separate test to verify the flow and port handling scale advantages.
    > The separate test added verifies the logical port and logical flow
    > suppression advantage. This measures the difference in flows and ports
    > sent to the HV clients where the far majority of the overall work is.
    > The less ports and flows that are unnecessarily processed on the HVs,
    > the more benefit, excluding latency skew. The separate test is
    > simple for interpretation and shows an order of magnitude advantage;
    > the test uses only logical switches. The advantage would increase with
    > logical routers. The test is conservative in estimating the numerical
    > advantage to increase the probability of the check passing the first
    > time, since there are some timing conssiderations that affect the numbers.
    >
    > Liran Schour contributed enhancements to the conditional
    > monitoring granularity in ovs and also submitted patches
    > for ovn usage of conditional monitoring which aided this patch
    > though sharing of concepts through code review work.
    >
    > Ben Pfaff suggested that northd could be used to pre-populate
    > related datapaths for ovn-controller to use.  That idea is
    > used as part of this patch.
    >
    > CC: Ben Pfaff <blp@ovn.org>
    > CC: Liran Schour <LIRANS@il.ibm.com>
    > Signed-off-by: Darrell Ball <dlu998@gmail.com>
    > ---
    >
    
    When I apply this patch to master, one of the tests is failing for me.
    This test passes if I drop the patch.
    
    2296: ovn-controller - ovn-bridge-mappings            FAILED (ovn-controller.at:45)

I am looking at it.
This test checks that patch ports are created unconditionally, which changes
with conditional monitoring of ports.

    
    -- 
    Russell Bryant
Darrell Ball Dec. 6, 2016, 10:49 p.m. UTC | #10
On 12/6/16, 2:13 PM, "ovs-dev-bounces@openvswitch.org on behalf of Darrell Ball" <ovs-dev-bounces@openvswitch.org on behalf of dball@vmware.com> wrote:

    
    
    On 12/6/16, 2:03 PM, "ovs-dev-bounces@openvswitch.org on behalf of Russell Bryant" <ovs-dev-bounces@openvswitch.org on behalf of russell@ovn.org> wrote:
    
        On Sun, Dec 4, 2016 at 7:13 PM, Darrell Ball <dlu998@gmail.com> wrote:
        
        > This patch adds datapaths of interest support where only datapaths of
        > local interest are monitored by the ovn-controller ovsdb client.  The
        > idea is to do a flood fill in ovn-controller of datapath associations
        > calculated by northd. A new column is added to the SB database
        > datapath_binding table - related_datapaths to facilitate this so all
        > datapaths associations are known quickly in ovn-controller.  This
        > allows monitoring to adapt quickly with a single new monitor setting
        > for all datapaths of interest locally.
        >
        > We monitor both logical flows and logical ports. The mac bindings
        > will likely go away and multicast groups are relatively few.
        > To reduce risk and overall latency we only suppress logical flows as
        > a starting point. Logical ports are not suppressed as a starting
        > point and but then are monitor suppressed as appropriate from that
        > point on. The initial suppression has little practical gain and not
        > doing it allows for us to be flexible in terms of how different port
        > types are supported.
        >
        > To validate the gain, we take the approach to verify the l3gateway
        > router flow suppression from the non-l3gateway HVs POV and also add
        > a separate test to verify the flow and port handling scale advantages.
        > The separate test added verifies the logical port and logical flow
        > suppression advantage. This measures the difference in flows and ports
        > sent to the HV clients where the far majority of the overall work is.
        > The less ports and flows that are unnecessarily processed on the HVs,
        > the more benefit, excluding latency skew. The separate test is
        > simple for interpretation and shows an order of magnitude advantage;
        > the test uses only logical switches. The advantage would increase with
        > logical routers. The test is conservative in estimating the numerical
        > advantage to increase the probability of the check passing the first
        > time, since there are some timing conssiderations that affect the numbers.
        >
        > Liran Schour contributed enhancements to the conditional
        > monitoring granularity in ovs and also submitted patches
        > for ovn usage of conditional monitoring which aided this patch
        > though sharing of concepts through code review work.
        >
        > Ben Pfaff suggested that northd could be used to pre-populate
        > related datapaths for ovn-controller to use.  That idea is
        > used as part of this patch.
        >
        > CC: Ben Pfaff <blp@ovn.org>
        > CC: Liran Schour <LIRANS@il.ibm.com>
        > Signed-off-by: Darrell Ball <dlu998@gmail.com>
        > ---
        >
        
        When I apply this patch to master, one of the tests is failing for me.
        This test passes if I drop the patch.
        
        2296: ovn-controller - ovn-bridge-mappings            FAILED (
        ovn-controller.at:45)
    
    I am looking at it
    This test checks for patch ports to be created unconditionally, which changes
    with conditional monitoring of ports.


I have updated the test accordingly.
Sorry for the confusion; I did not expect a basic ovn-controller test to be exercising
patch port creation business logic.

I will send out a fully verified patch later today.
    
        
        -- 
        Russell Bryant
Russell Bryant Dec. 7, 2016, 3:08 a.m. UTC | #11
On Tue, Dec 6, 2016 at 3:35 PM, Ben Pfaff <blp@ovn.org> wrote:

> On Tue, Dec 06, 2016 at 02:45:19PM -0500, Russell Bryant wrote:
> > On Mon, Dec 5, 2016 at 2:22 AM, Ben Pfaff <blp@ovn.org> wrote:
> >
> > > On Sun, Dec 04, 2016 at 04:13:44PM -0800, Darrell Ball wrote:
> > > > This patch adds datapaths of interest support where only datapaths of
> > > > local interest are monitored by the ovn-controller ovsdb client.  The
> > > > idea is to do a flood fill in ovn-controller of datapath associations
> > > > calculated by northd. A new column is added to the SB database
> > > > datapath_binding table - related_datapaths to facilitate this so all
> > > > datapaths associations are known quickly in ovn-controller.  This
> > > > allows monitoring to adapt quickly with a single new monitor setting
> > > > for all datapaths of interest locally.
> > >
> > > Hi Darrell, the series I just sent out has some relevance here.  It
> > > makes ovn-controller only implement the datapaths and ports flows that
> > > are relevant to a given hypervisor, even though it does not affect what
> > > part of the database is replicated.  The particularly relevant patch is
> > > this:
> > >         https://patchwork.ozlabs.org/patch/702608/
> > >
> > > I suggest having a look at the series.
> > >
> >
> > I'm in the middle of doing some control plane performance testing using
> > OpenStack and OVN.  I may have time to do another run with some
> additional
> > patches applied to see how they affect performance.  Ideally I'd be able
> to
> > do that this week, though.
> >
> > If I'm able to fit this in, what do you guys suggest if I've only got one
> > shot?  This patch or Ben's series?
>
> This series is aimed more at technical debt and code clarity than at
> performance.  It might help with performance, but that's not the main
> goal.
>
> Darrell/Liran's series is motivated by performance.  If it doesn't help
> performance, then probably it shouldn't be applied because it makes the
> code harder to understand.  (Disclaimer: I haven't read the recent
> versions.)
>

I just deployed this change in my environment and was not able to observe a
noticeable difference in CPU consumption of ovn-controller.  (ovn-northd
was updated, as well.)

The scenario was creating 500 VMs through OpenStack at a concurrency of 64
(up to 64 VMs at a time).  Each VM also received its own network.  The 500
VMs were distributed among 9 hypervisors.  Once all 500 VMs were up, they
were all deleted.

The CPU consumption looked roughly the same both before and after the patch
and time to complete the test scenario was also similar.

ovn-controller hits 100% CPU about 30% of the way into creating VMs and
stays there until we finish creating them and start deleting them instead.
Darrell Ball Dec. 7, 2016, 3:53 a.m. UTC | #12
On 12/4/16, 9:48 PM, "ovs-dev-bounces@openvswitch.org on behalf of Mickey Spiegel" <ovs-dev-bounces@openvswitch.org on behalf of mickeys.dev@gmail.com> wrote:

    On Sun, Dec 4, 2016 at 4:13 PM, Darrell Ball <dlu998@gmail.com> wrote:
    
    > This patch adds datapaths of interest support where only datapaths of

    > local interest are monitored by the ovn-controller ovsdb client.  The

    > idea is to do a flood fill in ovn-controller of datapath associations

    > calculated by northd. A new column is added to the SB database

    > datapath_binding table - related_datapaths to facilitate this so all

    > datapaths associations are known quickly in ovn-controller.  This

    > allows monitoring to adapt quickly with a single new monitor setting

    > for all datapaths of interest locally.

    >

    
    This is getting close. I have a few questions and comments inline.
    
    >

    > We monitor both logical flows and logical ports. The mac bindings

    > will likely go away and multicast groups are relatively few.

    > To reduce risk and overall latency we only suppress logical flows as

    > a starting point. Logical ports are not suppressed as a starting

    > point and but then are monitor suppressed as appropriate from that

    > point on. The initial suppression has little practical gain and not

    > doing it allows for us to be flexible in terms of how different port

    > types are supported.

    >

    
    I lack familiarity with conditional monitoring behavior when you do
    not explicitly suppress as a starting point. A new hv comes up, and
    starts downloading stuff from SB, including port_bindings. At some
    point the first VIF on that hv comes up, as a result of which the hv's
    controller calls sbrec_port_binding_add_clause_logical_port.
    Do all other port_bindings get suppressed at that point?
    If so, won't this lead to different behavior between the first time a
    VIF comes up on a hv, and subsequent additions of VIFs on
    unrelated and previously unseen datapaths?
    Wouldn't the behavior be more consistent if you suppress from the
    beginning?

I commented on this before; I wanted to push this part until later, but fine, let us address it now.
The issue is l2gateway logical ports, which may legitimately exist as the only port in a LS datapath
on a given HV.
L2gateway LPs are only known on an HV once the LP l2gateway-chassis option is downloaded to that HV.
If we want initial suppression of LPs, then we need to do something special for l2gateway ports.
For now, monitor all l2gateway ports unconditionally.
An l2gateway is tied to a logical port and a chassis, not to a datapath as in the case of l3gateway.
(Possibly later, a physical HV-level external_id could be used to indicate the presence of the l2gateway
logical port, similarly to VIFs.)
The number of l2gateway ports could be on the order of the number of tenants, at least.

    
    >

    > To validate the gain, we take the approach to verify the l3gateway

    > router flow suppression from the non-l3gateway HVs POV and also add

    > a separate test to verify the flow and port handling scale advantages.

    > The separate test added verifies the logical port and logical flow

    > suppression advantage. This measures the difference in flows and ports

    > sent to the HV clients where the far majority of the overall work is.

    > The less ports and flows that are unnecessarily processed on the HVs,

    > the more benefit, excluding latency skew. The separate test is

    > simple for interpretation and shows an order of magnitude advantage;

    > the test uses only logical switches. The advantage would increase with

    > logical routers. The test is conservative in estimating the numerical

    > advantage to increase the probability of the check passing the first

    > time, since there are some timing conssiderations that affect the numbers.

    >

    
    This description should be updated to mention the distributed
    logical router tests that you added.


sure
    
    >

    > Liran Schour contributed enhancements to the conditional

    > monitoring granularity in ovs and also submitted patches

    > for ovn usage of conditional monitoring which aided this patch

    > though sharing of concepts through code review work.

    >

    > Ben Pfaff suggested that northd could be used to pre-populate

    > related datapaths for ovn-controller to use.  That idea is

    > used as part of this patch.

    >

    > CC: Ben Pfaff <blp@ovn.org>

    > CC: Liran Schour <LIRANS@il.ibm.com>

    > Signed-off-by: Darrell Ball <dlu998@gmail.com>

    > ---

    >

    > v9->v10: Fix ordering issue on northd as pointed out by Mickey.

    >

    >          Remove patch flood fill triggering optimization to

    >          make the code easier to verify and harder to break.

    >

    >          Fix a couple issues on ovn-controller side, including

    >          stale checking.

    >

    >          Add lots more focused testing - now 2 separate extensive

    >          tests and beefed up l3gateway test.

    >

    >          Remove dependency on port filtering for l3gateways filtering

    >          and associate l3gateway chassis name as part of datapath,

    >          similarly to logical ports. Some agreement on adding this

    >          extra field to the datapath binding table.

    >

    
    I certainly agree. It is good to have the filtering depend only
    on the contents of datapath_binding, with no dependencies
    on port_bindings. This way there is no issue with ordering
    dependencies when filtering port_bindings.
    
    >

    > v8->v9: Reinstate logical port monitoring from patch V3

    >         now to handle some LP high influence topologies, especially

    >         in lieu of a lack of incremental processing right now.

    >         The original plan was to forgo monitoring logical ports

    >         in the first commit, but without incremental processing,

    >         port processing gains importance.

    >

    >         Parse out some functions

    >

    >         Add a separate test for conditional monitoring to

    >         guesstimate the reduction in logical port and flow events

    >         seen at the HVs.

    >

    > v7->v8: Start wth logical flow conditional monitoring off.

    >         Remove deprecated code.

    >         Recover SB DB version number change.

    >         Change test to be more definitive.

    >

    > v6->v7: Rebase

    >

    > v5->v6: Rebase; fix stale handling.

    >

    > v4->v5: Correct cleanup of monitors.

    >         Fix warning.

    >

    > v3->v4: Refactor after incremental processing backout.

    >         Limit filtering to logical flows to limit risk.

    >

    > v2->v3: Line length violation fixups :/

    >

    > v1->v2: Added logical port removal monitoring handling, factoring

    >         in recent changes for incremental processing.  Added some

    >         comments in the code regarding initial conditions for

    >         database conditional monitoring.

    >

    >  ovn/controller/binding.c        |  32 +++-

    >  ovn/controller/lflow.c          |   5 +

    >  ovn/controller/ovn-controller.c | 188 ++++++++++++++++++++

    >  ovn/controller/ovn-controller.h |  10 ++

    >  ovn/controller/patch.c          |  24 +++

    >  ovn/northd/ovn-northd.c         | 121 +++++++++++++

    >  ovn/ovn-sb.ovsschema            |  16 +-

    >  ovn/ovn-sb.xml                  |  13 ++

    >  tests/ovn.at                    | 371 ++++++++++++++++++++++++++++++

    > +++++++++-

    >  9 files changed, 770 insertions(+), 10 deletions(-)

    >

    > diff --git a/ovn/controller/binding.c b/ovn/controller/binding.c

    > index fb76032..f24e2e6 100644

    > --- a/ovn/controller/binding.c

    > +++ b/ovn/controller/binding.c

    > @@ -68,7 +68,8 @@ static void

    >  get_local_iface_ids(const struct ovsrec_bridge *br_int,

    >                      struct shash *lport_to_iface,

    >                      struct sset *all_lports,

    > -                    struct sset *egress_ifaces)

    > +                    struct sset *egress_ifaces,

    > +                    const struct controller_ctx *ctx)

    >  {

    >      int i;

    >

    > @@ -88,6 +89,9 @@ get_local_iface_ids(const struct ovsrec_bridge *br_int,

    >              iface_id = smap_get(&iface_rec->external_ids, "iface-id");

    >

    >              if (iface_id) {

    > +                sbrec_port_binding_add_clause_logical_port(ctx->ovnsb_idl,

    > +                                                           OVSDB_F_EQ,

    > +                                                           iface_id);

    >                  shash_add(lport_to_iface, iface_id, iface_rec);

    >                  sset_add(all_lports, iface_id);

    >              }

    > @@ -106,8 +110,13 @@ get_local_iface_ids(const struct ovsrec_bridge

    > *br_int,

    >

    >  static void

    >  add_local_datapath(struct hmap *local_datapaths,

    > -        const struct sbrec_port_binding *binding_rec)

    > +        const struct sbrec_port_binding *binding_rec,

    > +        const struct controller_ctx *ctx,

    > +        const struct sbrec_chassis *chassis_rec)

    >  {

    > +

    > +    do_datapath_flood_fill(ctx, binding_rec->datapath, chassis_rec);

    > +

    >

    
    In previous versions, this was at the end of add_local_datapath.
    In this version you moved it to the beginning.
    Is there an advantage in running flood fill when the second, third,
    fourth, etc ports are added for the same datapath?
    It does add processing.
    
All tests pass either way.

I moved it here because, conceptually, I don’t want the local_datapaths check stopping
the datapaths-of-interest checking;
i.e., I don’t want one feature short-circuiting the other.

    
    >      if (get_local_datapath(local_datapaths,

    >                             binding_rec->datapath->tunnel_key)) {

    >          return;

    > @@ -293,7 +302,7 @@ consider_local_datapath(struct controller_ctx *ctx,

    >              /* Add child logical port to the set of all local ports. */

    >              sset_add(all_lports, binding_rec->logical_port);

    >          }

    > -        add_local_datapath(local_datapaths, binding_rec);

    > +        add_local_datapath(local_datapaths, binding_rec, ctx,

    > chassis_rec);

    >          if (iface_rec && qos_map && ctx->ovs_idl_txn) {

    >              get_qos_params(binding_rec, qos_map);

    >          }

    > @@ -328,7 +337,7 @@ consider_local_datapath(struct controller_ctx *ctx,

    >          }

    >

    >          sset_add(all_lports, binding_rec->logical_port);

    > -        add_local_datapath(local_datapaths, binding_rec);

    > +        add_local_datapath(local_datapaths, binding_rec, ctx,

    > chassis_rec);

    >          if (binding_rec->chassis == chassis_rec) {

    >              return;

    >          }

    > @@ -342,7 +351,7 @@ consider_local_datapath(struct controller_ctx *ctx,

    >          const char *chassis = smap_get(&binding_rec->options,

    >                                         "l3gateway-chassis");

    >          if (!strcmp(chassis, chassis_rec->name) && ctx->ovnsb_idl_txn) {

    > -            add_local_datapath(local_datapaths, binding_rec);

    > +            add_local_datapath(local_datapaths, binding_rec, ctx,

    > chassis_rec);

    >          }

    >      } else if (chassis_rec && binding_rec->chassis == chassis_rec) {

    >          if (ctx->ovnsb_idl_txn) {

    > @@ -351,6 +360,9 @@ consider_local_datapath(struct controller_ctx *ctx,

    >              for (int i = 0; i < binding_rec->n_mac; i++) {

    >                  VLOG_INFO("Releasing %s", binding_rec->mac[i]);

    >              }

    > +            sbrec_port_binding_remove_clause_logical_port(

    > +                ctx->ovnsb_idl, OVSDB_F_EQ, binding_rec->logical_port);

    > +

    >              sbrec_port_binding_set_chassis(binding_rec, NULL);

    >              sset_find_and_delete(all_lports, binding_rec->logical_port);

    >          }

    > @@ -381,13 +393,21 @@ binding_run(struct controller_ctx *ctx, const struct

    > ovsrec_bridge *br_int,

    >      hmap_init(&qos_map);

    >      if (br_int) {

    >          get_local_iface_ids(br_int, &lport_to_iface, all_lports,

    > -                            &egress_ifaces);

    > +                            &egress_ifaces, ctx);

    >      }

    >

    > +    set_all_datapaths_of_interest_stale();

    > +

    >      /* Run through each binding record to see if it is resident on this

    >       * chassis and update the binding accordingly.  This includes both

    >       * directly connected logical ports and children of those ports. */

    >      SBREC_PORT_BINDING_FOR_EACH(binding_rec, ctx->ovnsb_idl) {

    > +

    > +        /* Set any datapath, used by any port and that we are monitoring

    > as used,

    >

    
    This line is too long.

fixed
    
    
    > +         * so we don't remove the monitoring.

    > +         */

    > +        set_datapath_of_interest_used(binding_rec->datapath);

    > +

    >

    
    This seems too broad. This would set the datapath as used if
    you have received any SB port_bindings for that datapath, even
    if those port_bindings are not relevant to this chassis.

It is not broad; it checks the datapaths-of-interest map, and if the datapath is already monitored
it marks it as not stale; otherwise it changes nothing.

    How would datapaths ever be removed?

Datapaths get removed when no logical port refers to them. This is the definition of a stale datapath – 
so I know it has no dependencies.

    Why isn't the stale flag processing in add_datapath_of_interest
    enough?
    Was there some corner case that was missed?

I think the code in consider_local_datapath() could filter out calls to
add_local_datapath(); one example is when we don’t have a transaction:

        if (!strcmp(chassis, chassis_rec->name) && ctx->ovnsb_idl_txn) {

I don’t want to have a dependency on something that should be irrelevant.
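
For reference, a minimal standalone sketch of the mark-and-sweep lifecycle described
above: mark every tracked datapath stale at the start of a binding pass, mark entries
used (or start tracking them) as port bindings are seen, and sweep whatever is still
stale at the end. This is illustrative only and not the patch code; the names are
invented for the sketch, and the real code keys the map by datapath UUID and drops the
logical_flow/port_binding monitor clauses when an entry is swept.

    /* sketch.c: mark-and-sweep over tracked "datapaths of interest". */
    #include <stdbool.h>
    #include <stdio.h>
    #include <stdlib.h>

    struct dp_entry {
        unsigned int key;           /* Stands in for the datapath UUID. */
        bool stale;
        struct dp_entry *next;
    };

    static struct dp_entry *tracked;

    /* Corresponds to set_all_datapaths_of_interest_stale(). */
    static void
    mark_all_stale(void)
    {
        for (struct dp_entry *e = tracked; e; e = e->next) {
            e->stale = true;
        }
    }

    /* Corresponds roughly to set_datapath_of_interest_used() plus
     * add_datapath_of_interest(): refresh an entry we already track,
     * or start tracking a new one. */
    static void
    mark_used(unsigned int key)
    {
        for (struct dp_entry *e = tracked; e; e = e->next) {
            if (e->key == key) {
                e->stale = false;
                return;
            }
        }
        struct dp_entry *e = malloc(sizeof *e);
        e->key = key;               /* Real code also adds monitor clauses here. */
        e->stale = false;
        e->next = tracked;
        tracked = e;
    }

    /* Corresponds to clean_stale_datapaths_of_interest(). */
    static void
    sweep_stale(void)
    {
        struct dp_entry **p = &tracked;
        while (*p) {
            if ((*p)->stale) {
                struct dp_entry *dead = *p;
                *p = dead->next;    /* Real code also removes monitor clauses. */
                printf("dropping datapath %u\n", dead->key);
                free(dead);
            } else {
                p = &(*p)->next;
            }
        }
    }

    int
    main(void)
    {
        mark_used(1);
        mark_used(2);

        mark_all_stale();
        mark_used(1);               /* Only datapath 1 is still referenced. */
        sweep_stale();              /* Prints "dropping datapath 2". */
        return 0;
    }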

    
             consider_local_datapath(ctx, chassis_rec, binding_rec,
    >                                  sset_is_empty(&egress_ifaces) ? NULL :

    >                                  &qos_map, local_datapaths,

    > &lport_to_iface,

    > diff --git a/ovn/controller/lflow.c b/ovn/controller/lflow.c

    > index 4e67365..765eae4 100644

    > --- a/ovn/controller/lflow.c

    > +++ b/ovn/controller/lflow.c

    > @@ -167,6 +167,11 @@ consider_logical_flow(const struct lport_index

    > *lports,

    >      if (!ldp) {

    >          return;

    >      }

    > +

    > +    VLOG_DBG("consider logical flow: tunnel_key %lu "

    > +             "\"match\" \"%s\", \"actions\" \"%s\"",

    > +             ldp->tunnel_key, lflow->match, lflow->actions);

    > +

    >      if (is_switch(ldp)) {

    >          /* For a logical switch datapath, local_datapaths tells us if

    > there

    >           * are any local ports for this datapath.  If not, we can skip

    > diff --git a/ovn/controller/ovn-controller.c b/ovn/controller/ovn-

    > controller.c

    > index fea7841..c06bd81 100644

    > --- a/ovn/controller/ovn-controller.c

    > +++ b/ovn/controller/ovn-controller.c

    > @@ -69,6 +69,14 @@ OVS_NO_RETURN static void usage(void);

    >

    >  static char *ovs_remote;

    >

    > +struct dp_of_interest_node {

    > +    struct hmap_node hmap_node;

    > +    const struct sbrec_datapath_binding *sb_db;

    > +    bool stale;

    > +};

    > +

    > +static struct hmap dps_of_interest = HMAP_INITIALIZER(&dps_of_interest);

    > +

    >  struct local_datapath *

    >  get_local_datapath(const struct hmap *local_datapaths, uint32_t

    > tunnel_key)

    >  {

    > @@ -128,6 +136,181 @@ get_bridge(struct ovsdb_idl *ovs_idl, const char

    > *br_name)

    >      return NULL;

    >  }

    >

    > +static bool

    > +remote_l3gat_lrd_port(const struct sbrec_datapath_binding *dp,

    > +                             const struct sbrec_chassis *chassis_rec)

    > +{

    > +    const char *l3gateway_chassis = smap_get(&dp->options,

    > +                                             "l3gateway-chassis");

    > +    if(l3gateway_chassis){

    > +        if(strcmp(l3gateway_chassis, chassis_rec->name)){

    > +            return true;

    > +        }

    > +    }

    > +    return false;

    > +}

    > +

    > +void

    > +set_all_datapaths_of_interest_stale(void)

    > +{

    > +    struct dp_of_interest_node *node;

    > +    HMAP_FOR_EACH(node, hmap_node, &dps_of_interest) {

    > +            node->stale = true;

    > +    }

    > +}

    > +

    > +static void

    > +remove_datapath_of_interest(struct controller_ctx *ctx,

    > +                            const struct sbrec_datapath_binding *dp)

    >

    
    The only call to remove_datapath_of_interest is from
    clean_stale_datapaths_of_interest.
    Why not pass in dp_cur_node rather than dp, and avoid
    the hash lookup?
    Or just put all the logic in
    clean_stale_datapaths_of_interest?

Actually, remove_datapath_of_interest() is left over from an earlier revision of the code,
when it also had an external caller.
It is no longer needed, so it is gone now.

    
    > +{

    > +

    > +    struct dp_of_interest_node *node;

    > +    HMAP_FOR_EACH_WITH_HASH (node, hmap_node,

    > uuid_hash(&dp->header_.uuid),

    > +                             &dps_of_interest) {

    > +        if (ctx && uuid_equals(&node->sb_db->header_.uuid,

    > &dp->header_.uuid)) {

    > +

    > +            sbrec_port_binding_remove_clause_datapath(ctx->ovnsb_idl,

    > +                                                      OVSDB_F_EQ,

    > +                                                      &dp->header_.uuid);

    > +            sbrec_logical_flow_remove_clause_logical_datapath(

    > +                ctx->ovnsb_idl, OVSDB_F_EQ, &dp->header_.uuid);

    > +            hmap_remove(&dps_of_interest, &node->hmap_node);

    > +            free(node);

    > +            return;

    > +        }

    > +    }

    > +}

    > +

    > +static void

    > +clean_stale_datapaths_of_interest(struct controller_ctx *ctx)

    > +{

    > +    struct dp_of_interest_node *dp_cur_node, *dp_next_node;

    > +    HMAP_FOR_EACH_SAFE (dp_cur_node, dp_next_node, hmap_node,

    > +                        &dps_of_interest) {

    > +        if(dp_cur_node->stale == true) {

    > +            remove_datapath_of_interest(ctx, dp_cur_node->sb_db);

    > +        }

    > +    }

    > +}

    > +

    > +void

    > +set_datapath_of_interest_used(const struct sbrec_datapath_binding *dp)

    > +{

    > +

    > +    struct dp_of_interest_node *node;

    > +    HMAP_FOR_EACH_WITH_HASH (node, hmap_node,

    > uuid_hash(&dp->header_.uuid),

    > +                             &dps_of_interest) {

    > +        if (uuid_equals(&node->sb_db->header_.uuid, &dp->header_.uuid)) {

    > +            node->stale = false;

    > +        }

    > +    }

    > +}

    > +

    > +static void

    > +add_datapath_of_interest(const struct controller_ctx *ctx,

    > +                         const struct sbrec_datapath_binding *dp)

    > +{

    > +    struct dp_of_interest_node *node;

    > +

    > +    HMAP_FOR_EACH_WITH_HASH (node, hmap_node,

    > uuid_hash(&dp->header_.uuid),

    > +                             &dps_of_interest) {

    > +        if (uuid_equals(&node->sb_db->header_.uuid, &dp->header_.uuid)) {

    > +            node->stale = false;

    > +            return;

    > +        }

    > +    }

    > +

    > +    VLOG_DBG("monitor datapath with tunnel_key %lu", dp->tunnel_key);

    > +

    > +    sbrec_logical_flow_add_clause_logical_datapath(ctx->ovnsb_idl,

    > +                                                   OVSDB_F_EQ,

    > +                                                   &dp->header_.uuid);

    > +    sbrec_port_binding_add_clause_datapath(ctx->ovnsb_idl, OVSDB_F_EQ,

    > +                                           &dp->header_.uuid);

    > +

    > +    node = xzalloc(sizeof *node);

    > +    hmap_insert(&dps_of_interest, &node->hmap_node,

    > +                uuid_hash(&dp->header_.uuid));

    > +    node->sb_db = dp;

    > +    return;

    > +}

    > +

    > +void

    > +do_datapath_flood_fill(const struct controller_ctx *ctx,

    > +                              const struct sbrec_datapath_binding *dp_start,
    > +                                          const struct sbrec_chassis *chassis_rec)

    > +{

    > +

    > +    if(remote_l3gat_lrd_port(dp_start, chassis_rec)) {

    > +        return;

    > +    }

    > +

    > +    struct dp_checked {

    > +        struct hmap_node hmap_node;

    > +        const struct sbrec_datapath_binding *sb_db;

    > +    };

    > +    struct dp_checked *dp_node;

    > +    struct hmap related_dp_processed = HMAP_INITIALIZER(&related_dp_processed);

    >

    
    This third level of lists is not easy to follow, having
    dps_of_interest

    related_dp_processed
This map remembers which datapaths we have already seen, so we do not follow them again
while processing one flood fill.

    interest_datapaths


    It looks like this is similar to the flood_fill placement in
    add_local_datapath.
    Should each datapath be considered once in each run
    through the main ovn-controller.c loop, such that
    different flood_fill runs going through the same
    datapaths do not investigate any further (as in v9)?
    Or should each flood_fill run to completion,
    recalculating the entire set of related datapaths each
    time that flood_fill is called, even if many of those
    datapaths are already present in dps_of_interest
    (as in this v10)?

Yes, I changed it from v9 to v10.
The number of datapaths is small relative to the number of ports, so it is
only slightly more expensive. 
The issue is premature cleanup due to partial port bindings at startup - I’ll investigate this more later.
Also, in general, I don’t want to make any assumptions about input data that I don’t have to.
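
For reference, a minimal standalone sketch of the flood fill being discussed: start
from one datapath, follow the related_datapaths edges breadth-first, and keep a
visited set so each datapath is handled at most once per run. The adjacency table and
all names below are invented for the sketch; the real code walks the SB
related_datapaths column and calls add_datapath_of_interest() on every datapath it
reaches, skipping datapaths bound to a remote l3gateway chassis.

    /* sketch.c: breadth-first flood fill over related datapaths. */
    #include <stdbool.h>
    #include <stdio.h>

    #define N_DP 5

    /* adjacency[i][j] is true if datapath i lists datapath j as related
     * (e.g. a logical switch and the logical router it is attached to). */
    static const bool adjacency[N_DP][N_DP] = {
        /* dp 0, switch */     { false, true,  false, false, false },
        /* dp 1, router */     { true,  false, true,  true,  false },
        /* dp 2, switch */     { false, true,  false, false, false },
        /* dp 3, switch */     { false, true,  false, false, false },
        /* dp 4, unrelated */  { false, false, false, false, false },
    };

    static void
    flood_fill(int start)
    {
        bool visited[N_DP] = { false };
        int queue[N_DP];
        int head = 0, tail = 0;

        queue[tail++] = start;
        visited[start] = true;

        while (head < tail) {
            int dp = queue[head++];
            printf("datapath %d is of interest\n", dp);  /* Add monitor clauses. */

            for (int next = 0; next < N_DP; next++) {
                if (adjacency[dp][next] && !visited[next]) {
                    visited[next] = true;
                    queue[tail++] = next;
                }
            }
        }
    }

    int
    main(void)
    {
        /* A VIF on datapath 0 pulls in datapaths 0, 1, 2 and 3, but not 4. */
        flood_fill(0);
        return 0;
    }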

    +
    > +    struct interest_dps_list_elem {

    > +        struct ovs_list list_node;

    > +        const struct sbrec_datapath_binding * dp;

    > +    };

    > +    struct ovs_list interest_datapaths;

    > +    ovs_list_init(&interest_datapaths);

    > +

    > +    struct interest_dps_list_elem *dp_list_elem =

    > +        xzalloc(sizeof *dp_list_elem);

    > +    dp_list_elem->dp = dp_start;

    > +    ovs_list_push_back(&interest_datapaths, &dp_list_elem->list_node);

    > +

    > +    while (!ovs_list_is_empty(&interest_datapaths)) {

    > +

    > +        struct ovs_list *list_node_ptr =

    > +            ovs_list_pop_front(&interest_datapaths);

    > +        dp_list_elem = CONTAINER_OF(list_node_ptr,

    > +            struct interest_dps_list_elem, list_node);

    > +           size_t num_related_datapaths =

    > +            dp_list_elem->dp->n_related_datapaths;

    > +           struct sbrec_datapath_binding **related_datapaths =

    > +            dp_list_elem->dp->related_datapaths;

    > +

    > +        dp_node = xzalloc(sizeof *dp_node);

    > +        dp_node->sb_db = dp_list_elem->dp;

    > +        hmap_insert(&related_dp_processed, &dp_node->hmap_node,

    > +                     uuid_hash(&dp_node->sb_db->header_.uuid));

    > +

    > +        add_datapath_of_interest(ctx, dp_list_elem->dp);

    > +        free(dp_list_elem);

    > +        for (int i = 0; i < num_related_datapaths; i++) {

    > +            bool found = false;

    > +            HMAP_FOR_EACH_WITH_HASH (

    > +                    dp_node, hmap_node,

    > +                    uuid_hash(&related_datapaths[i]->header_.uuid),

    > +                    &related_dp_processed) {

    > +

    > +                if (uuid_equals(&dp_node->sb_db->header_.uuid,

    > +                                &related_datapaths[i]->header_.uuid)) {

    > +                    found = true;

    > +                    break;

    > +                }

    > +            }

    > +            if(!found &&

    > +               !remote_l3gat_lrd_port(related_datapaths[i],

    > chassis_rec)) {

    > +                dp_list_elem = xzalloc(sizeof *dp_list_elem);

    > +                dp_list_elem->dp = related_datapaths[i];

    > +                ovs_list_push_back(&interest_datapaths,

    > &dp_list_elem->list_node);

    > +            }

    > +        }

    > +    }

    > +    struct dp_checked *cur_node, *next_node;

    > +    HMAP_FOR_EACH_SAFE (cur_node, next_node, hmap_node,

    > &related_dp_processed) {

    > +        hmap_remove(&related_dp_processed, &cur_node->hmap_node);

    > +        free(cur_node);

    > +    }

    > +    hmap_destroy(&related_dp_processed);

    > +}

    > +

    >  static const struct ovsrec_bridge *

    >  create_br_int(struct controller_ctx *ctx,

    >                const struct ovsrec_open_vswitch *cfg,

    > @@ -465,6 +648,10 @@ main(int argc, char *argv[])

    >          ovsdb_idl_create(ovnsb_remote, &sbrec_idl_class, true, true));

    >      ovsdb_idl_omit_alert(ovnsb_idl_loop.idl, &sbrec_chassis_col_nb_cfg);

    >

    > +    /* Start with suppressing all logical flows and then later request

    > those

    > +     * that are specifically needed. */

    > +    sbrec_logical_flow_add_clause_false(ovnsb_idl_loop.idl);

    > +

    >      /* Track the southbound idl. */

    >      ovsdb_idl_track_add_all(ovnsb_idl_loop.idl);

    >

    > @@ -561,6 +748,7 @@ main(int argc, char *argv[])

    >              lport_index_destroy(&lports);

    >          }

    >

    > +        clean_stale_datapaths_of_interest(&ctx);

    >          sset_destroy(&all_lports);

    >

    >          struct local_datapath *cur_node, *next_node;

    > diff --git a/ovn/controller/ovn-controller.h b/ovn/controller/ovn-

    > controller.h

    > index 8a3e855..e35b5b0 100644

    > --- a/ovn/controller/ovn-controller.h

    > +++ b/ovn/controller/ovn-controller.h

    > @@ -76,6 +76,16 @@ const struct ovsrec_bridge *get_bridge(struct ovsdb_idl

    > *,

    >  const struct sbrec_chassis *get_chassis(struct ovsdb_idl *,

    >                                          const char *chassis_id);

    >

    > +void do_datapath_flood_fill(const struct controller_ctx *,

    > +                            const struct sbrec_datapath_binding *,

    > +                            const struct sbrec_chassis *);

    > +

    > +void

    > +set_datapath_of_interest_used(const struct sbrec_datapath_binding *dp);

    > +

    > +void

    > +set_all_datapaths_of_interest_stale(void);

    > +

    >  /* Must be a bit-field ordered from most-preferred (higher number) to

    >   * least-preferred (lower number). */

    >  enum chassis_tunnel_type {

    > diff --git a/ovn/controller/patch.c b/ovn/controller/patch.c

    > index 79a7d81..9e8e856 100644

    > --- a/ovn/controller/patch.c

    > +++ b/ovn/controller/patch.c

    > @@ -330,6 +330,9 @@ add_logical_patch_ports(struct controller_ctx *ctx,

    >              free(dst_name);

    >              free(src_name);

    >              add_patched_datapath(patched_datapaths, binding, local_port);

    > +

    > +            sbrec_port_binding_add_clause_logical_port(ctx->ovnsb_idl,

    > +                                                       OVSDB_F_EQ, peer);

    >

    
    Why is this necessary?
    This could add logical ports that are unrelated to this chassis.

This is for peer patch ports.
It has no benefit anymore, and it would monitor some patch ports unnecessarily going forward.
I removed it and the corresponding cleanup;
there are now no changes to patch.c.

    The northd logic adds related_datapaths for any patch port.
    The flood fill based on related_datapaths should take care of
    datapaths across patch ports, starting from datapaths that
    this chassis does care about.
    
    The add_local_datapath logic works for starting points
    including vifs, l2gateway, and l3gateway.
    What do you think is missing?


                 if (local_port) {
    >                  if (binding->chassis != chassis_rec &&

    > ctx->ovnsb_idl_txn) {

    >                      sbrec_port_binding_set_chassis(binding, chassis_rec);

    > @@ -339,6 +342,26 @@ add_logical_patch_ports(struct controller_ctx *ctx,

    >      }

    >  }

    >

    > +/* Non-logical patch ports should be unaffected. */

    > +static void

    > +remove_patch_port_monitors(struct controller_ctx *ctx,

    > +                            const struct ovsrec_port *port)

    > +{

    > +    for (size_t i = 0; i < port->n_interfaces; i++) {

    > +        struct ovsrec_interface *iface = port->interfaces[i];

    > +        if (strcmp(iface->type, "patch")) {

    > +            continue;

    >

    
    I believe your code adds clauses for l3gateway ports above,
    but does not remove those monitors here.

    
    +        }
    > +        const char *iface_id = smap_get(&iface->external_ids,

    > +                                        "iface-id");

    > +        if (iface_id) {

    
    
    Patch interfaces don't have iface_id. They have options "peer",
    and the ports (not interfaces) have external_ids with key
    "ovn-logical-patch-port" or "ovn-l3gateway-port".

That was a sloppy copy/paste from binding.c to check for external ids that I never finished afterwards.
I intended to check for these:
            smap_get(&port->external_ids, "ovn-l3gateway-port")
            smap_get(&port->external_ids, "ovn-logical-patch-port")

The iface_id argument below also did not get updated.
This code is gone now anyway.


    >

    
    +            sbrec_port_binding_remove_clause_logical_port(ctx->ovnsb_idl,
    > +                                                          OVSDB_F_EQ,

    > +                                                          iface_id);

    > +        }

    > +    }

    > +}

    > +

    >  void

    >  patch_run(struct controller_ctx *ctx, const struct ovsrec_bridge *br_int,

    >            const char *chassis_id, struct hmap *local_datapaths,

    > @@ -374,6 +397,7 @@ patch_run(struct controller_ctx *ctx, const struct

    > ovsrec_bridge *br_int,

    >      SHASH_FOR_EACH_SAFE (port_node, port_next_node, &existing_ports) {

    >          struct ovsrec_port *port = port_node->data;

    >          shash_delete(&existing_ports, port_node);

    > +        remove_patch_port_monitors(ctx, port);

    >          remove_port(ctx, port);

    >      }

    >      shash_destroy(&existing_ports);

    > diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c

    > index f2dc353..7ea5dc9 100644

    > --- a/ovn/northd/ovn-northd.c

    > +++ b/ovn/northd/ovn-northd.c

    > @@ -231,6 +231,42 @@ struct tnlid_node {

    >      uint32_t tnlid;

    >  };

    >

    > +struct related_datapath_node {

    > +    struct hmap_node hmap_node;

    > +    const struct sbrec_datapath_binding *sb_db;

    > +};

    > +

    > +static void

    > +add_related_datapath(struct hmap *related_datapaths,

    > +                     const struct sbrec_datapath_binding *sb,

    > +                     size_t *n_related_datapaths)

    > +{

    > +    struct related_datapath_node *node;

    > +    HMAP_FOR_EACH_WITH_HASH (node, hmap_node,

    > +                             uuid_hash(&sb->header_.uuid),

    > +                             related_datapaths) {

    > +        if (uuid_equals(&sb->header_.uuid, &node->sb_db->header_.uuid)) {

    > +            return;

    > +        }

    > +    }

    > +

    > +    node = xzalloc(sizeof *node);

    > +    hmap_insert(related_datapaths, &node->hmap_node,

    > +                uuid_hash(&sb->header_.uuid));

    > +    node->sb_db = sb;

    > +    (*n_related_datapaths)++;

    > +}

    > +

    > +static void

    > +destroy_related_datapaths(struct hmap *related_datapaths)

    > +{

    > +    struct related_datapath_node *node;

    > +    HMAP_FOR_EACH_POP (node, hmap_node, related_datapaths) {

    > +        free(node);

    > +    }

    > +    hmap_destroy(related_datapaths);

    > +}

    > +

    >  static void

    >  destroy_tnlids(struct hmap *tnlids)

    >  {

    > @@ -376,6 +412,8 @@ struct ovn_datapath {

    >      size_t n_router_ports;

    >

    >      struct hmap port_tnlids;

    > +    struct hmap related_datapaths;

    > +    size_t n_related_datapaths;

    >      uint32_t port_key_hint;

    >

    >      bool has_unknown;

    > @@ -426,6 +464,7 @@ ovn_datapath_create(struct hmap *datapaths, const

    > struct uuid *key,

    >      od->nbr = nbr;

    >      hmap_init(&od->port_tnlids);

    >      hmap_init(&od->ipam);

    > +    hmap_init(&od->related_datapaths);

    >      od->port_key_hint = 0;

    >      hmap_insert(datapaths, &od->key_node, uuid_hash(&od->key));

    >      return od;

    > @@ -441,6 +480,7 @@ ovn_datapath_destroy(struct hmap *datapaths, struct

    > ovn_datapath *od)

    >          hmap_remove(datapaths, &od->key_node);

    >          destroy_tnlids(&od->port_tnlids);

    >          destroy_ipam(&od->ipam);

    > +        destroy_related_datapaths(&od->related_datapaths);

    >          free(od->router_ports);

    >          free(od);

    >      }

    > @@ -671,6 +711,69 @@ struct ovn_port {

    >      struct ovs_list list;       /* In list of similar records. */

    >  };

    >

    > +static void

    > +populate_sb_related_datapaths(struct ovn_datapath *od)

    > +{

    > +    struct sbrec_datapath_binding **sb_related_datapaths

    > +        = xmalloc(sizeof(*sb_related_datapaths) *

    > od->n_related_datapaths);

    > +    int rdi = 0;

    > +    struct related_datapath_node *related_datapath;

    > +    HMAP_FOR_EACH (related_datapath, hmap_node,

    > +                   &od->related_datapaths) {

    > +        if (rdi >= od->n_related_datapaths) {

    > +            static struct vlog_rate_limit rl

    > +                = VLOG_RATE_LIMIT_INIT(5, 1);

    > +            VLOG_ERR_RL(&rl, "related datapaths accounting error"

    > +                        UUID_FMT, UUID_ARGS(&od->key));

    > +            break;

    > +        }

    > +

    > +        sb_related_datapaths[rdi] = CONST_CAST(

    > +            struct sbrec_datapath_binding *, related_datapath->sb_db);

    > +        rdi++;

    > +    }

    > +    sbrec_datapath_binding_set_related_datapaths(od->sb,

    > +        sb_related_datapaths, od->n_related_datapaths);

    > +    free(sb_related_datapaths);

    > +}

    > +

    > +static void

    > +generate_sb_related_datapaths_from_ports(struct ovs_list *both)

    > +{

    > +    struct dp_checked {

    > +           struct hmap_node hmap_node;

    > +           const struct sbrec_datapath_binding *sb_db;

    > +       };

    > +       struct dp_checked *dp_node;

    > +       struct hmap dp_processed = HMAP_INITIALIZER(&dp_processed);

    > +    struct ovn_port *op, *next;

    > +       LIST_FOR_EACH_SAFE (op, next, list, both) {

    > +           bool found = false;

    > +           HMAP_FOR_EACH_WITH_HASH (dp_node, hmap_node,

    > +                                    uuid_hash(&op->od->sb->header_.uuid),

    > +                                    &dp_processed) {

    > +               if (uuid_equals(&dp_node->sb_db->header_.uuid,

    > +                               &op->od->sb->header_.uuid)) {

    > +                   found = true;

    > +                   break;

    > +               }

    > +           }

    > +           if(!found){

    > +            dp_node = xzalloc(sizeof *dp_node);

    > +               dp_node->sb_db = op->od->sb;

    > +               hmap_insert(&dp_processed, &dp_node->hmap_node,

    > +                           uuid_hash(&dp_node->sb_db->header_.uuid));

    > +            populate_sb_related_datapaths(op->od);

    > +           }

    > +       }

    > +       struct dp_checked *cur_node, *next_node;

    > +       HMAP_FOR_EACH_SAFE (cur_node, next_node, hmap_node, &dp_processed)

    > {

    > +           hmap_remove(&dp_processed, &cur_node->hmap_node);

    > +           free(cur_node);

    > +       }

    > +       hmap_destroy(&dp_processed);

    > +}

    > +

    >  static struct ovn_port *

    >  ovn_port_create(struct hmap *ports, const char *key,

    >                  const struct nbrec_logical_switch_port *nbsp,

    > @@ -1359,12 +1462,19 @@ ovn_port_update_sbrec(const struct ovn_port *op,

    >              sbrec_port_binding_set_type(op->sb, "patch");

    >          }

    >

    > +        if (op->peer && op->peer->od && op->peer->od->sb) {

    > +            add_related_datapath(&op->od->related_datapaths,

    > +                                 op->peer->od->sb,

    > +                                 &op->od->n_related_datapaths);

    > +        }

    > +

    >          const char *peer = op->peer ? op->peer->key : "<error>";

    >          struct smap new;

    >          smap_init(&new);

    >          smap_add(&new, "peer", peer);

    >          if (chassis) {

    >              smap_add(&new, "l3gateway-chassis", chassis);

    > +            sbrec_datapath_binding_set_options(op->sb->datapath, &new);

    >

    
    It would be cleaner to modify the datapath_binding in
    build_datapaths, though that would require addition of a run
    through the both list to update or remove "chassis" as
    necessary, as well as modifying the existing run through
    the nb_only list.
    
    I don't see any logic to remove "chassis" if that is removed
    from the nbr entry.

    With this code order, you are also adding "peer" which does
    not belong on datapath_binding. If this code stays here, then
    the smap_add "peer" line should be moved below this.

Yeah, that was sloppy.
Fortunately, a stray "peer" key in the datapath_binding options map does no harm.
But the l3gateway-chassis option may conceivably need to be cleared from the same
datapath_binding entry at some point in its lifetime, since a datapath may cease
to be an l3gateway and become something else.
I moved the line.
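The reordering amounts to something like this sketch of the hunk above with
the line moved (not necessarily the exact code of the next revision):

    /* Sketch only: set the datapath_binding option before "peer" is added
     * to the smap, so "peer" never ends up in the datapath's options. */
    struct smap new;
    smap_init(&new);
    if (chassis) {
        smap_add(&new, "l3gateway-chassis", chassis);
        sbrec_datapath_binding_set_options(op->sb->datapath, &new);
    }
    smap_add(&new, "peer", peer);
    sbrec_port_binding_set_options(op->sb, &new);
    smap_destroy(&new);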

    
    >          }
    >          sbrec_port_binding_set_options(op->sb, &new);

    >          smap_destroy(&new);

    > @@ -1411,6 +1521,12 @@ ovn_port_update_sbrec(const struct ovn_port *op,

    >                  sbrec_port_binding_set_type(op->sb, "patch");

    >              }

    >

    > +            if (op->peer && op->peer->od && op->peer->od->sb) {

    > +                add_related_datapath(&op->od->related_datapaths,

    > +                                     op->peer->od->sb,

    > +                                     &op->od->n_related_datapaths);

    > +            }

    > +

    >              const char *router_port = smap_get_def(&op->nbsp->options,

    >                                                     "router-port",

    > "<error>");

    >              struct smap new;

    > @@ -1519,6 +1635,7 @@ build_ports(struct northd_context *ctx, struct hmap

    > *datapaths,

    >          cleanup_mac_bindings(ctx, ports);

    >      }

    >

    > +    generate_sb_related_datapaths_from_ports(&both);

    >      tag_alloc_destroy(&tag_alloc_table);

    >      destroy_chassis_queues(&chassis_qdisc_queues);

    >  }

    > @@ -4952,6 +5069,10 @@ main(int argc, char *argv[])

    >      add_column_noalert(ovnsb_idl_loop.idl,

    >                         &sbrec_datapath_binding_col_tunnel_key);

    >      add_column_noalert(ovnsb_idl_loop.idl,

    > +                       &sbrec_datapath_binding_col_related_datapaths);

    > +    add_column_noalert(ovnsb_idl_loop.idl,

    > +                       &sbrec_datapath_binding_col_options);

    > +    add_column_noalert(ovnsb_idl_loop.idl,

    >                         &sbrec_datapath_binding_col_external_ids);

    >

    >      ovsdb_idl_add_table(ovnsb_idl_loop.idl, &sbrec_table_port_binding);

    > diff --git a/ovn/ovn-sb.ovsschema b/ovn/ovn-sb.ovsschema

    > index 89342fe..8dfb2f4 100644

    > --- a/ovn/ovn-sb.ovsschema

    > +++ b/ovn/ovn-sb.ovsschema

    > @@ -1,7 +1,7 @@

    >  {

    >      "name": "OVN_Southbound",

    > -    "version": "1.9.0",

    > -    "cksum": "239060528 9012",

    > +    "version": "1.10.0",

    > +    "cksum": "3562877871 9537",

    >      "tables": {

    >          "SB_Global": {

    >              "columns": {

    > @@ -93,7 +93,17 @@

    >                                        "maxInteger": 16777215}}},

    >                  "external_ids": {

    >                      "type": {"key": "string", "value": "string",

    > -                             "min": 0, "max": "unlimited"}}},

    > +                             "min": 0, "max": "unlimited"}},

    > +                "options": {

    > +                     "type": {"key": "string",

    > +                              "value": "string",

    > +                              "min": 0,

    > +                              "max": "unlimited"}},

    > +                "related_datapaths": {"type": {"key": {"type": "uuid",

    > +                                      "refTable": "Datapath_Binding",

    > +                                      "refType": "weak"},

    > +                                      "min": 0,

    > +                                      "max": "unlimited"}}},

    >              "indexes": [["tunnel_key"]],

    >              "isRoot": true},

    >          "Port_Binding": {

    > diff --git a/ovn/ovn-sb.xml b/ovn/ovn-sb.xml

    > index 65191ed..88de25d 100644

    > --- a/ovn/ovn-sb.xml

    > +++ b/ovn/ovn-sb.xml

    > @@ -1556,6 +1556,19 @@ tcp.flags = RST;

    >        constructed for each supported encapsulation.

    >      </column>

    >

    > +    <column name="options" key="l3gateway-chassis">

    > +      The <code>chassis</code> in which the router datapath resides.

    > +    </column>

    > +

    > +    <column name="related_datapaths">

    > +      The related_datapaths column is used to keep track of which

    > datapaths

    > +      are connected to each other.  This is leveraged in ovn-controller to

    > +      do a flood fill from each local logical switch to determine the full

    > +      set of datapaths needed on a given ovn-controller.   This set of

    > +      datapaths can be used to determine which logical ports and logical

    > +      flows an ovn-controller should monitor.

    > +    </column>

    > +

    >      <group title="OVN_Northbound Relationship">

    >        <p>

    >          Each row in <ref table="Datapath_Binding"/> is associated with

    > some

    > diff --git a/tests/ovn.at b/tests/ovn.at

    > index a226849..af544e1 100644

    > --- a/tests/ovn.at

    > +++ b/tests/ovn.at

    > @@ -1673,6 +1673,141 @@ OVN_CLEANUP([hv1],[hv2])

    >

    >  AT_CLEANUP

    >

    > +# 2 hypervisors, each with one logical switch with 2 logical ports.

    > +AT_SETUP([ovn -- conditional monitoring: 2 HVs each with 1 LS and 2 LPs])

    > +AT_SKIP_IF([test $HAVE_PYTHON = no])

    > +ovn_start

    > +

    > +# In this test cases we create 2 switches, all connected to same

    > +# physical network (through br-phys on each HV). Each switch has

    > +# VIF ports across 1 HV. Each HV has 2 VIF ports. The first digit

    > +# of VIF port name indicates the hypervisor it is bound to, e.g.

    > +# lp12 means VIF 2 on hv1.

    > +

    > +for i in 1 2; do

    > +    ls_name=ls$i

    > +    ovn-nbctl ls-add $ls_name

    > +done

    > +

    > +net_add n1

    > +for i in 1 2; do

    > +    sim_add hv$i

    > +    as hv$i

    > +    ovs-vsctl add-br br-phys

    > +    ovn_attach n1 br-phys 192.168.0.$i

    > +    ovs-appctl -t ovn-controller vlog/set ANY:file:DBG

    > +

    > +    for j in 1 2; do

    > +        ovs-vsctl add-port br-int vif$i$j -- \

    > +            set Interface vif$i$j external-ids:iface-id=lp$i$j \

    > +                                  options:tx_pcap=hv$i/vif$i$j-tx.pcap \

    > +                                  options:rxq_pcap=hv$i/vif$i$j-rx.pcap \

    > +                                  ofport-request=$i$j

    > +

    > +        lsp_name=lp$i$j

    > +        if test $i -eq 1; then

    > +            ls_name=ls1

    > +        else

    > +            ls_name=ls2

    > +        fi

    > +

    > +        ovn-nbctl lsp-add $ls_name $lsp_name

    > +        ovn-nbctl lsp-set-addresses $lsp_name f0:00:00:00:00:$i$j

    > +        ovn-nbctl lsp-set-port-security $lsp_name f0:00:00:00:00:$i$j

    > +

    > +        OVS_WAIT_UNTIL([test x`ovn-nbctl lsp-get-up $lsp_name` = xup])

    > +    done

    > +done

    > +

    > +ovn_populate_arp

    > +

    > +# XXX This is now the 3rd copy of these functions in this file ...

    > +

    > +# Given the name of a logical port, prints the name of the hypervisor

    > +# on which it is located.

    > +vif_to_hv() {

    > +    echo hv${1%?}

    > +}

    > +#

    > +# test_packet INPORT DST SRC ETHTYPE OUTPORT...

    > +#

    > +# This shell function causes a packet to be received on INPORT.  The

    > packet's

    > +# content has Ethernet destination DST and source SRC (each exactly 12 hex

    > +# digits) and Ethernet type ETHTYPE (4 hex digits).  The OUTPORTs (zero or

    > +# more) list the VIFs on which the packet should be received.  INPORT and

    > the

    > +# OUTPORTs are specified as logical switch port numbers, e.g. 11 for

    > vif11.

    > +for i in 1 2; do

    > +    for j in 1 2; do

    > +        : > $i$j.expected

    > +    done

    > +done

    > +test_packet() {

    > +    local inport=$1 src=$2 dst=$3 eth=$4; shift; shift; shift; shift

    > +    local packet=${src}${dst}${eth}

    > +    hv=`vif_to_hv $inport`

    > +    vif=vif$inport

    > +    as $hv ovs-appctl netdev-dummy/receive $vif $packet

    > +    for outport; do

    > +        echo $packet >> $outport.expected

    > +    done

    > +}

    > +

    > +# lp11 and lp12 are on the same network and on

    > +# same hypervisors

    > +test_packet 11 f00000000012 f00000000011 1112 12

    > +test_packet 12 f00000000011 f00000000012 1211 11

    > +

    > +# lp21 and lp22 are on the same network and on

    > +# the same hypervisor

    > +test_packet 21 f00000000022 f00000000021 2122 22

    > +test_packet 22 f00000000021 f00000000022 2221 21

    > +

    > +# lp11 and lp22 are on different networks so should

    > +# not be able to communicate.

    > +test_packet 11 f00000000022 f00000000011 1122

    > +test_packet 22 f00000000011 f00000000022 2211

    > +

    > +echo "------ OVN dump ------"

    > +ovn-nbctl show

    > +ovn-sbctl show

    > +

    > +echo "------ hv1 dump ------"

    > +as hv1 ovs-vsctl show

    > +as hv1 ovs-ofctl -O OpenFlow13 dump-flows br-int

    > +

    > +echo "------ hv2 dump ------"

    > +as hv2 ovs-vsctl show

    > +as hv2 ovs-ofctl -O OpenFlow13 dump-flows br-int

    > +

    > +# Check that all datapath components are monitored as expected.

    > +AT_CHECK([cat hv1/ovn-controller.log | grep 'tunnel_key 1'], [0],

    > [ignore-nolog])

    > +AT_CHECK([cat hv1/ovn-controller.log | grep 'tunnel_key 2'], [1],

    > [ignore-nolog])

    > +AT_CHECK([cat hv2/ovn-controller.log | grep 'tunnel_key 2'], [0],

    > [ignore-nolog])

    > +AT_CHECK([cat hv2/ovn-controller.log | grep 'tunnel_key 1'], [1],

    > [ignore-nolog])

    > +

    > +dbg_instances1=`cat hv1/ovn-controller.log | grep 'tunnel_key 1' | wc -l`

    > +echo ls1 monitor traces on hv1 $dbg_instances1

    > +dbg_instances2=`cat hv1/ovn-controller.log | grep 'tunnel_key 2' | wc -l`

    > +echo ls2 monitor traces on hv1 $dbg_instances2

    > +AT_CHECK([test $dbg_instances1 -gt $dbg_instances2], [0], [ignore-nolog])

    > +

    > +dbg_instances1=`cat hv2/ovn-controller.log | grep 'tunnel_key 2' | wc -l`

    > +echo ls2 monitor traces on hv2 $dbg_instances1

    > +dbg_instances2=`cat hv2/ovn-controller.log | grep 'tunnel_key 1' | wc -l`

    > +echo ls1 monitor traces on hv2 $dbg_instances2

    > +AT_CHECK([test $dbg_instances1 -gt $dbg_instances2], [0], [ignore-nolog])

    > +

    > +# Now check the packets actually received against the ones expected.

    > +for i in 1 2; do

    > +    for j in 1 2; do

    > +        OVN_CHECK_PACKETS([hv$i/vif$i$j-tx.pcap], [$i$j.expected])

    > +    done

    > +done

    > +

    > +OVN_CLEANUP([hv1],[hv2])

    > +

    > +AT_CLEANUP

    > +

    >  AT_SETUP([ovn -- vtep: 3 HVs, 1 VIFs/HV, 1 GW, 1 LS])

    >  AT_KEYWORDS([vtep])

    >  AT_SKIP_IF([test $HAVE_PYTHON = no])

    > @@ -2439,6 +2574,223 @@ OVN_CLEANUP([hv1], [hv2], [hv3])

    >

    >  AT_CLEANUP

    >

    > +# 3 hypervisors, 3 logical switches with 3 logical ports each, 1 logical

    > router

    > +AT_SETUP([ovn -- conditional Monitoring: 3 HVs, 3 LS, 3 lports/LS, 1 LR])

    > +AT_SKIP_IF([test $HAVE_PYTHON = no])

    > +ovn_start

    > +

    > +# Logical network:

    > +#

    > +# Three logical switches ls1, ls2, ls3.

    > +# One logical router lr0 connected to ls[123],

    > +# with nine subnets, three per logical switch:

    > +#

    > +#    lrp11 on ls1 for subnet 192.168.11.0/24

    > +#    lrp12 on ls1 for subnet 192.168.12.0/24

    > +#    lrp13 on ls1 for subnet 192.168.13.0/24

    
    
    This is confusing since the code below does not add
    lrp11, lrp12, lrp13, just the subnets in ls1 which is
    not attached to lr0.

That description was not cleaned up when the test was updated.
I fixed it.
    
    I am having trouble understanding the motivation for
    3 subnets per ls. It will work, just wondering why you
    thought that it is necessary or helpful?

I just needed multiple ports. Having multiple subnets makes it easier to derive
a different test case from this one in the future.
    
    >

    > +#    ...

    > +#    lrp33 on ls3 for subnet 192.168.33.0/24

    > +#

    > +# 27 VIFs, 9 per LS, 3 per subnet: lp[123][123][123], where the first two

    > +# digits are the subnet and the last digit distinguishes the VIF.

    > +for i in 1 2 3; do

    > +    ovn-nbctl ls-add ls$i

    > +    for j in 1 2 3; do

    > +        for k in 1 2 3; do

    > +            ovn-nbctl \

    > +                -- lsp-add ls$i lp$i$j$k \

    > +                -- lsp-set-addresses lp$i$j$k "f0:00:00:00:0$i:$j$k \

    > +                   192.168.$i$j.$k"

    > +        done

    > +    done

    > +done

    > +

    > +# Only ls2 and ls3 are connected by the logical router.

    > +ovn-nbctl lr-add lr0

    > +for i in 2 3; do

    > +    for j in 1 2 3; do

    > +        ovn-nbctl lrp-add lr0 lrp$i$j 00:00:00:00:ff:$i$j

    > 192.168.$i$j.254/24

    > +        ovn-nbctl \

    > +            -- lsp-add ls$i lrp$i$j-attachment \

    > +            -- set Logical_Switch_Port lrp$i$j-attachment type=router \

    > +                             options:router-port=lrp$i$j \

    > +                             addresses='"00:00:00:00:ff:'$i$j'"'

    > +    done

    > +done

    > +

    > +# Physical network:

    > +#

    > +# Three hypervisors hv[123].

    > +# lp[1??] on hv1

    > +# lp[2??] on hv2.

    > +# lp[3??] on hv3.

    >

    
    I like this. So ls3 is only present on hv3?
    If so, this will test the related_datapaths part of the code, making
    sure that hv2 grabs flows for datapaths lr0 and ls3 even though
    there are no local VIFs in ls3 on hv2. This would catch the
    code order northd issue from v9.
    
    Mickey
    
    > +
    > +# Given the name of a logical port, prints the name of the hypervisor

    > +# on which it is located.

    > +vif_to_hv() {

    > +    case $1 in dnl (

    > +        1??) echo 1 ;; dnl (

    > +        2??) echo 2 ;; dnl (

    > +        3??) echo 3 ;;

    > +    esac

    > +}

    > +

    > +# Given the name of a logical port, prints the name of its logical router

    > +# port, e.g. "vif_to_lrp 123" yields 12.

    > +vif_to_lrp() {

    > +    echo ${1%?}

    > +}

    > +

    > +# Given the name of a logical port, prints the name of its logical

    > +# switch, e.g. "vif_to_ls 123" yields 1.

    > +vif_to_ls() {

    > +    echo ${1%??}

    > +}

    > +

    > +net_add n1

    > +for i in 1 2 3; do

    > +    sim_add hv$i

    > +    as hv$i

    > +    ovs-vsctl add-br br-phys

    > +    ovn_attach n1 br-phys 192.168.0.$i

    > +    ovs-appctl -t ovn-controller vlog/set ANY:file:DBG

    > +done

    > +for i in 1 2 3; do

    > +    for j in 1 2 3; do

    > +        for k in 1 2 3; do

    > +            hv=`vif_to_hv $i$j$k`

    > +                as hv$hv ovs-vsctl \

    > +                -- add-port br-int vif$i$j$k \

    > +                -- set Interface vif$i$j$k \

    > +                    external-ids:iface-id=lp$i$j$k \

    > +                    options:tx_pcap=hv$hv/vif$i$j$k-tx.pcap \

    > +                    options:rxq_pcap=hv$hv/vif$i$j$k-rx.pcap \

    > +                    ofport-request=$i$j$k

    > +        done

    > +    done

    > +done

    > +

    > +# Pre-populate the hypervisors' ARP tables so that we don't lose any

    > +# packets for ARP resolution (native tunneling doesn't queue packets

    > +# for ARP resolution).

    > +ovn_populate_arp

    > +

    > +# Allow some time for ovn-northd and ovn-controller to catch up.

    > +# XXX This should be more systematic.

    > +sleep 1

    > +

    > +# test_ip INPORT SRC_MAC DST_MAC SRC_IP DST_IP OUTPORT...

    > +#

    > +# This shell function causes a packet to be received on INPORT.  The

    > packet's

    > +# content has Ethernet destination DST and source SRC (each exactly 12 hex

    > +# digits) and Ethernet type ETHTYPE (4 hex digits).  The OUTPORTs (zero or

    > +# more) list the VIFs on which the packet should be received.  INPORT and

    > the

    > +# OUTPORTs are specified as logical switch port numbers, e.g. 123 for

    > vif123.

    > +for i in 1 2 3; do

    > +    for j in 1 2 3; do

    > +        for k in 1 2 3; do

    > +            : > $i$j$k.expected

    > +        done

    > +    done

    > +done

    > +test_ip() {

    > +    # This packet has bad checksums but logical L3 routing doesn't check.

    > +    local inport=$1 src_mac=$2 dst_mac=$3 src_ip=$4 dst_ip=$5

    > +    local packet=${dst_mac}${src_mac}08004500001c0000000040110000${

    > src_ip}${dst_ip}0035111100080000

    > +    shift; shift; shift; shift; shift

    > +    hv=hv`vif_to_hv $inport`

    > +    as $hv ovs-appctl netdev-dummy/receive vif$inport $packet

    > +    in_ls=`vif_to_ls $inport`

    > +    in_lrp=`vif_to_lrp $inport`

    > +    for outport; do

    > +        out_ls=`vif_to_ls $outport`

    > +        if test $in_ls = $out_ls; then

    > +            # Ports on the same logical switch receive exactly the same

    > packet.

    > +            echo $packet

    > +        else

    > +            # Routing decrements TTL and updates source and dest MAC

    > +            # (and checksum).

    > +            out_lrp=`vif_to_lrp $outport`

    > +            echo f00000000${outport}00000000ff$

    > {out_lrp}08004500001c00000000"3f1101"00${src_ip}${dst_ip}0035111100080000

    > +        fi >> $outport.expected

    > +    done

    > +}

    > +

    > +as hv1 ovn-sbctl list datapath_binding

    > +echo "----------------------------"

    > +as hv1 ovn-sbctl list port_binding

    > +echo "----------------------------"

    > +as hv1 ovn-sbctl dump-flows

    > +#as hv1 ovs-vsctl --columns=name,ofport list interface

    > +#as hv1 ovs-ofctl dump-flows br-int

    > +

    > +# Send IP packets between all pairs of source and destination ports:

    > +#

    > +# 1. Unicast IP packets are delivered to exactly one logical switch port

    > +#    (except that packets destined to their input ports are dropped).

    > +#

    > +ip_to_hex() {

    > +    printf "%02x%02x%02x%02x" "$@"

    > +}

    > +for is in 1 2 3; do

    > +  for js in 1 2 3; do

    > +    for ks in 1 2 3; do

    > +      s=$is$js$ks

    > +      smac=f00000000$s

    > +      sip=`ip_to_hex 192 168 $is$js $ks`

    > +      for id in 1 2 3; do

    > +          for jd in 1 2 3; do

    > +              for kd in 1 2 3; do

    > +                d=$id$jd$kd

    > +                dip=`ip_to_hex 192 168 $id$jd $kd`

    > +                if test $is = $id; then dmac=f00000000$d; else

    > dmac=00000000ff$is$js; fi

    > +                if test $d != $s; then unicast=$d; else unicast=; fi

    > +                if test $is = 1; then if test $is != $id; then unicast=;

    > fi; fi

    > +                if test $id = 1; then if test $is != $id; then unicast=;

    > fi; fi

    > +                test_ip $s $smac $dmac $sip $dip $unicast

    > +              done

    > +          done

    > +        done

    > +      done

    > +  done

    > +done

    > +

    > +# Allow some time for packet forwarding.

    > +# XXX This can be improved.

    > +sleep 1

    > +

    > +ovn-sbctl -f csv -d bare --no-heading \

    > +    -- --columns=logical_port,ip,mac list mac_binding > mac_bindings

    > +

    > +# Now check the packets actually received against the ones expected.

    > +for i in 1 2 3; do

    > +    for j in 1 2 3; do

    > +        for k in 1 2 3; do

    > +            OVN_CHECK_PACKETS([hv`vif_to_hv $i$j$k`/vif$i$j$k-tx.pcap],

    > +                              [$i$j$k.expected])

    > +        done

    > +    done

    > +done

    > +

    > +# Check that all datapath components are monitored as expected.

    > +AT_CHECK([cat hv1/ovn-controller.log | grep 'tunnel_key 1'], [0],

    > [ignore-nolog])

    > +AT_CHECK([cat hv1/ovn-controller.log | grep 'tunnel_key 2'], [1],

    > [ignore-nolog])

    > +AT_CHECK([cat hv1/ovn-controller.log | grep 'tunnel_key 3'], [1],

    > [ignore-nolog])

    > +AT_CHECK([cat hv1/ovn-controller.log | grep 'tunnel_key 4'], [1],

    > [ignore-nolog])

    > +AT_CHECK([cat hv2/ovn-controller.log | grep 'tunnel_key 1'], [1],

    > [ignore-nolog])

    > +AT_CHECK([cat hv2/ovn-controller.log | grep 'tunnel_key 2'], [0],

    > [ignore-nolog])

    > +AT_CHECK([cat hv2/ovn-controller.log | grep 'tunnel_key 3'], [0],

    > [ignore-nolog])

    > +AT_CHECK([cat hv2/ovn-controller.log | grep 'tunnel_key 4'], [0],

    > [ignore-nolog])

    > +AT_CHECK([cat hv3/ovn-controller.log | grep 'tunnel_key 1'], [1],

    > [ignore-nolog])

    > +AT_CHECK([cat hv3/ovn-controller.log | grep 'tunnel_key 2'], [0],

    > [ignore-nolog])

    > +AT_CHECK([cat hv3/ovn-controller.log | grep 'tunnel_key 3'], [0],

    > [ignore-nolog])

    > +AT_CHECK([cat hv3/ovn-controller.log | grep 'tunnel_key 4'], [0],

    > [ignore-nolog])

    > +

    > +# Gracefully terminate daemons

    > +OVN_CLEANUP([hv1], [hv2], [hv3])

    > +

    > +AT_CLEANUP

    > +

    >  # 3 hypervisors, one logical switch, 3 logical ports per hypervisor

    >  AT_SETUP([ovn -- portsecurity : 3 HVs, 1 LS, 3 lports/HV])

    >  AT_SKIP_IF([test $HAVE_PYTHON = no])

    > @@ -2983,6 +3335,7 @@ ovs-vsctl -- add-port br-int vif2 -- \

    >      options:tx_pcap=hv1/vif2-tx.pcap \

    >      options:rxq_pcap=hv1/vif2-rx.pcap \

    >      ofport-request=1

    > +ovs-appctl -t ovn-controller vlog/set ANY:file:DBG

    >

    >

    >  # Allow some time for ovn-northd and ovn-controller to catch up.

    > @@ -3014,10 +3367,13 @@ echo "---------------------"

    >  echo "---------SB dump-----"

    >  ovn-sbctl list datapath_binding

    >  echo "---------------------"

    > +ovn-sbctl list port_binding

    > +echo "---------------------"

    >  ovn-sbctl list logical_flow

    >  echo "---------------------"

    >

    >  echo "------ hv1 dump ----------"

    > +as hv1 ovs-ofctl show br-int

    >  as hv1 ovs-ofctl dump-flows br-int

    >

    >

    > @@ -3031,10 +3387,13 @@ sleep 1

    >  echo "---------SB dump-----"

    >  ovn-sbctl list datapath_binding

    >  echo "---------------------"

    > +ovn-sbctl list port_binding

    > +echo "---------------------"

    >  ovn-sbctl list logical_flow

    >  echo "---------------------"

    >

    >  echo "------ hv1 dump ----------"

    > +as hv1 ovs-ofctl show br-int

    >  as hv1 ovs-ofctl dump-flows br-int

    >

    >  as hv1 ovs-appctl netdev-dummy/receive vif1 $packet

    > @@ -4109,6 +4468,7 @@ ovs-vsctl -- add-port br-int hv1-vif1 -- \

    >      options:tx_pcap=hv1/vif1-tx.pcap \

    >      options:rxq_pcap=hv1/vif1-rx.pcap \

    >      ofport-request=1

    > +ovs-appctl -t ovn-controller vlog/set ANY:file:DBG

    >

    >

    >  sim_add hv2

    > @@ -4120,6 +4480,7 @@ ovs-vsctl -- add-port br-int hv2-vif1 -- \

    >      options:tx_pcap=hv2/vif1-tx.pcap \

    >      options:rxq_pcap=hv2/vif1-rx.pcap \

    >      ofport-request=1

    > +ovs-appctl -t ovn-controller vlog/set ANY:file:DBG

    >

    >  # Pre-populate the hypervisors' ARP tables so that we don't lose any

    >  # packets for ARP resolution (native tunneling doesn't queue packets

    > @@ -4213,7 +4574,6 @@ src_ip=`ip_to_hex 192 168 1 2`

    >  dst_ip=`ip_to_hex 172 16 1 2`

    >  expected=${dst_mac}${src_mac}08004500001c000000003e110200${

    > src_ip}${dst_ip}0035111100080000

    >

    > -

    >  as hv1 ovs-appctl netdev-dummy/receive hv1-vif1 $packet

    >  as hv1 ovs-appctl ofproto/trace br-int in_port=1 $packet

    >

    > @@ -4225,6 +4585,15 @@ as hv2 ovs-ofctl show br-int

    >  as hv2 ovs-ofctl dump-flows br-int

    >  echo "----------------------------"

    >

    > +# tunnel key 2 represents the gateway router, hence hv1 does not need

    > +# to know about its ports and flows, since the transit switch acts as

    > proxy.

    > +AT_CHECK([cat hv2/ovn-controller.log | grep 'tunnel_key 2'], [0],

    > [ignore-nolog])

    > +AT_CHECK([cat hv1/ovn-controller.log | grep 'tunnel_key 2'], [1],

    > [ignore-nolog])

    > +# tunnel key 4 represents a datapath (alice) on the far side of the

    > gateway from

    > +# the perspective of hv1, hence hv1 does not need to know about its ports

    > and flows.

    > +AT_CHECK([cat hv2/ovn-controller.log | grep 'tunnel_key 4'], [0],

    > [ignore-nolog])

    > +AT_CHECK([cat hv1/ovn-controller.log | grep 'tunnel_key 4'], [1],

    > [ignore-nolog])

    > +

    >  echo $expected > expected

    >  OVN_CHECK_PACKETS([hv2/vif1-tx.pcap], [expected])

    >

    > --

    > 1.9.1

    >

Darrell Ball Dec. 7, 2016, 4:03 a.m. UTC | #13
On 12/6/16, 7:08 PM, "ovs-dev-bounces@openvswitch.org on behalf of Russell Bryant" <ovs-dev-bounces@openvswitch.org on behalf of russell@ovn.org> wrote:

    On Tue, Dec 6, 2016 at 3:35 PM, Ben Pfaff <blp@ovn.org> wrote:
    
    > On Tue, Dec 06, 2016 at 02:45:19PM -0500, Russell Bryant wrote:

    > > On Mon, Dec 5, 2016 at 2:22 AM, Ben Pfaff <blp@ovn.org> wrote:

    > >

    > > > On Sun, Dec 04, 2016 at 04:13:44PM -0800, Darrell Ball wrote:

    > > > > This patch adds datapaths of interest support where only datapaths of

    > > > > local interest are monitored by the ovn-controller ovsdb client.  The

    > > > > idea is to do a flood fill in ovn-controller of datapath associations

    > > > > calculated by northd. A new column is added to the SB database

    > > > > datapath_binding table - related_datapaths to facilitate this so all

    > > > > datapaths associations are known quickly in ovn-controller.  This

    > > > > allows monitoring to adapt quickly with a single new monitor setting

    > > > > for all datapaths of interest locally.

    > > >

    > > > Hi Darrell, the series I just sent out has some relevance here.  It

    > > > makes ovn-controller only implement the datapaths and ports flows that

    > > > are relevant to a given hypervisor, even though it does not affect what

    > > > part of the database is replicated.  The particularly relevant patch is

    > > > this:

    > > >         https://patchwork.ozlabs.org/patch/702608/

    > > >

    > > > I suggest having a look at the series.

    > > >

    > >

    > > I'm in the middle of doing some control plane performance testing using

    > > OpenStack and OVN.  I may have time to do another run with some

    > additional

    > > patches applied to see how they affect performance.  Ideally I'd be able

    > to

    > > do that this week, though.

    > >

    > > If I'm able to fit this in, what do you guys suggest if I've only got one

    > > shot?  This patch or Ben's series?

    >

    > This series is aimed more at technical debt and code clarity than at

    > performance.  It might help with performance, but that's not the main

    > goal.

    >

    > Darrell/Liran's series is motivated by performance.  If it doesn't help

    > performance, then probably it shouldn't be applied because it makes the

    > code harder to understand.  (Disclaimer: I haven't read the recent

    > versions.)

    >

    
    I just deployed this change in my environment and was not able to observe a
    noticeable difference in CPU consumption of ovn-controller.  (ovn-northd
    was updated, as well.)
    
    The scenario was creating 500 VMs through OpenStack at a concurrency of 64
    (up to 64 VMs at a time).  Each VM also received its own network.  The 500
    VMs were distributed among 9 hypervisors.  Once all 500 VMs were up, they
    were all deleted.

Your results seem to be quite different from Liran's. What is the difference?

How many:
1) logical routers
2) gateway routers
do you have?

It sounds like this test boots up 500 VMs, waits for all of them to be created,
then deletes them, and then ends?

    The CPU consumption looked roughly the same both before and after the patch
    and time to complete the test scenario was also similar.
    
    ovn-controller hits 100% CPU about 30% of the way into creating VMs and
    stays there until we finish creating them and start deleting them instead.
    
    -- 
    Russell Bryant
Mickey Spiegel Dec. 7, 2016, 11:50 p.m. UTC | #14
On Tue, Dec 6, 2016 at 7:53 PM, Darrell Ball <dball@vmware.com> wrote:

>
>
> On 12/4/16, 9:48 PM, "ovs-dev-bounces@openvswitch.org on behalf of Mickey
> Spiegel" <ovs-dev-bounces@openvswitch.org on behalf of
> mickeys.dev@gmail.com> wrote:
>
>     On Sun, Dec 4, 2016 at 4:13 PM, Darrell Ball <dlu998@gmail.com> wrote:
>
>     > This patch adds datapaths of interest support where only datapaths of
>     > local interest are monitored by the ovn-controller ovsdb client.  The
>     > idea is to do a flood fill in ovn-controller of datapath associations
>     > calculated by northd. A new column is added to the SB database
>     > datapath_binding table - related_datapaths to facilitate this so all
>     > datapaths associations are known quickly in ovn-controller.  This
>     > allows monitoring to adapt quickly with a single new monitor setting
>     > for all datapaths of interest locally.
>     >
>
>     This is getting close. I have a few questions and comments inline.
>
>     >
>     > We monitor both logical flows and logical ports. The mac bindings
>     > will likely go away and multicast groups are relatively few.
>     > To reduce risk and overall latency we only suppress logical flows as
>     > a starting point. Logical ports are not suppressed as a starting
>     > point and but then are monitor suppressed as appropriate from that
>     > point on. The initial suppression has little practical gain and not
>     > doing it allows for us to be flexible in terms of how different port
>     > types are supported.
>     >
>
>     I lack familiarity with conditional monitoring behavior when you do
>     not explicitly suppress as a starting point. A new hv comes up, and
>     starts downloading stuff from SB, including port_bindings. At some
>     point the first VIF on that hv comes up, as a result of which the hv's
>     controller calls sbrec_port_binding_add_clause_logical_port.
>     Do all other port_bindings get suppressed at that point?
>     If so, won't this lead to different behavior between the first time a
>     VIF comes up on a hv, and subsequent additions of VIFs on
>     unrelated and previously unseen datapaths?
>     Wouldn't the behavior be more consistent if you suppress from the
>     beginning?
>
> I commented on this before; I wanted to push this part until later; but
> fine, let us push it now.
> The issue is l2gateway logical ports which may legitimately exist as the
> only port in a LS datapath
> on a given HV.
> L2gateway LPs are only known on a HV when the LP l2gateway-chassis option
> is downloaded to a HV.
> If we want initial suppression of LPs, then we need to do something
> special for l2gateway ports.
>

This is a valid point that I missed.


> For now, monitor all l2gateway ports unconditionally.
> L2gateways are both a logical port and a chassis; not a datapath as in the
> case of l3gateway
>  (Possibly later, a physical HV level external_id could be used to specify
> the presence of the l2gateway logical port
> similarly to VIFs.).
> The number of l2gateway ports could be on the order of tenants, at least.
>

The code that you added for l3gateway looks good.

If you do not take care of optimizing l2gateway, I will take a look
later. In my distributed NAT patch set, I am adding another
port_binding type on distributed routers that will be bound to a
chassis. Similar to l2gateway, this new port_binding type should
trigger a clause based on that datapath, and unlike l3gateway it
does not affect the flood fill calculation. To handle both of these,
a new datapath option "chassis-of-interest" will need to be
added.
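Purely as illustration of that direction (nothing below exists in the patch,
and the option name is tentative), the check in ovn-controller might look like:

    /* Sketch only: treat a datapath as locally interesting when it is
     * pinned to this chassis via a hypothetical "chassis-of-interest"
     * datapath option, without affecting the flood fill calculation. */
    static bool
    datapath_pinned_to_chassis(const struct sbrec_datapath_binding *dp,
                               const struct sbrec_chassis *chassis_rec)
    {
        const char *chassis = smap_get(&dp->options, "chassis-of-interest");
        return chassis && !strcmp(chassis, chassis_rec->name);
    }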

>
>
>     >
>     > To validate the gain, we take the approach to verify the l3gateway
>     > router flow suppression from the non-l3gateway HVs POV and also add
>     > a separate test to verify the flow and port handling scale
> advantages.
>     > The separate test added verifies the logical port and logical flow
>     > suppression advantage. This measures the difference in flows and
> ports
>     > sent to the HV clients where the far majority of the overall work is.
>     > The less ports and flows that are unnecessarily processed on the HVs,
>     > the more benefit, excluding latency skew. The separate test is
>     > simple for interpretation and shows an order of magnitude advantage;
>     > the test uses only logical switches. The advantage would increase
> with
>     > logical routers. The test is conservative in estimating the numerical
>     > advantage to increase the probability of the check passing the first
>     > time, since there are some timing conssiderations that affect the
> numbers.
>     >
>
>     This description should be updated to mention the distributed
>     logical router tests that you added.
>
>
> sure
>
>     >
>     > Liran Schour contributed enhancements to the conditional
>     > monitoring granularity in ovs and also submitted patches
>     > for ovn usage of conditional monitoring which aided this patch
>     > though sharing of concepts through code review work.
>     >
>     > Ben Pfaff suggested that northd could be used to pre-populate
>     > related datapaths for ovn-controller to use.  That idea is
>     > used as part of this patch.
>     >
>     > CC: Ben Pfaff <blp@ovn.org>
>     > CC: Liran Schour <LIRANS@il.ibm.com>
>     > Signed-off-by: Darrell Ball <dlu998@gmail.com>
>     > ---
>     >
>     > v9->v10: Fix ordering issue on northd as pointed out by Mickey.
>     >
>     >          Remove patch flood fill triggering optimization to
>     >          make the code easier to verify and harder to break.
>     >
>     >          Fix a couple issues on ovn-controller side, including
>     >          stale checking.
>     >
>     >          Add lots more focused testing - now 2 separate extensive
>     >          tests and beefed up l3gateway test.
>     >
>     >          Remove dependency on port filtering for l3gateways filtering
>     >          and associate l3gateway chassis name as part of datapath,
>     >          similarly to logical ports. Some agreement on adding this
>     >          extra field to the datapath binding table.
>     >
>
>     I certainly agree. It is good to have the filtering depend only
>     on the contents of datapath_binding, with no dependencies
>     on port_bindings. This way there is no issue with ordering
>     dependencies when filtering port_bindings.
>
>     >
>     > v8->v9: Reinstate logical port monitoring from patch V3
>     >         now to handle some LP high influence topologies, especially
>     >         in lieu of a lack of incremental processing right now.
>     >         The original plan was to forgo monitoring logical ports
>     >         in the first commit, but without incremental processing,
>     >         port processing gains importance.
>     >
>     >         Parse out some functions
>     >
>     >         Add a separate test for conditional monitoring to
>     >         guesstimate the reduction in logical port and flow events
>     >         seen at the HVs.
>     >
>     > v7->v8: Start wth logical flow conditional monitoring off.
>     >         Remove deprecated code.
>     >         Recover SB DB version number change.
>     >         Change test to be more definitive.
>     >
>     > v6->v7: Rebase
>     >
>     > v5->v6: Rebase; fix stale handling.
>     >
>     > v4->v5: Correct cleanup of monitors.
>     >         Fix warning.
>     >
>     > v3->v4: Refactor after incremental processing backout.
>     >         Limit filtering to logical flows to limit risk.
>     >
>     > v2->v3: Line length violation fixups :/
>     >
>     > v1->v2: Added logical port removal monitoring handling, factoring
>     >         in recent changes for incremental processing.  Added some
>     >         comments in the code regarding initial conditions for
>     >         database conditional monitoring.
>     >
>     >  ovn/controller/binding.c        |  32 +++-
>     >  ovn/controller/lflow.c          |   5 +
>     >  ovn/controller/ovn-controller.c | 188 ++++++++++++++++++++
>     >  ovn/controller/ovn-controller.h |  10 ++
>     >  ovn/controller/patch.c          |  24 +++
>     >  ovn/northd/ovn-northd.c         | 121 +++++++++++++
>     >  ovn/ovn-sb.ovsschema            |  16 +-
>     >  ovn/ovn-sb.xml                  |  13 ++
>     >  tests/ovn.at                    | 371
> ++++++++++++++++++++++++++++++
>     > +++++++++-
>     >  9 files changed, 770 insertions(+), 10 deletions(-)
>     >
>     > diff --git a/ovn/controller/binding.c b/ovn/controller/binding.c
>     > index fb76032..f24e2e6 100644
>     > --- a/ovn/controller/binding.c
>     > +++ b/ovn/controller/binding.c
>     > @@ -68,7 +68,8 @@ static void
>     >  get_local_iface_ids(const struct ovsrec_bridge *br_int,
>     >                      struct shash *lport_to_iface,
>     >                      struct sset *all_lports,
>     > -                    struct sset *egress_ifaces)
>     > +                    struct sset *egress_ifaces,
>     > +                    const struct controller_ctx *ctx)
>     >  {
>     >      int i;
>     >
>     > @@ -88,6 +89,9 @@ get_local_iface_ids(const struct ovsrec_bridge
> *br_int,
>     >              iface_id = smap_get(&iface_rec->external_ids,
> "iface-id");
>     >
>     >              if (iface_id) {
>     > +                sbrec_port_binding_add_clause_
>     > logical_port(ctx->ovnsb_idl,
>     > +
>  OVSDB_F_EQ,
>     > +
>  iface_id);
>     >                  shash_add(lport_to_iface, iface_id, iface_rec);
>     >                  sset_add(all_lports, iface_id);
>     >              }
>     > @@ -106,8 +110,13 @@ get_local_iface_ids(const struct ovsrec_bridge
>     > *br_int,
>     >
>     >  static void
>     >  add_local_datapath(struct hmap *local_datapaths,
>     > -        const struct sbrec_port_binding *binding_rec)
>     > +        const struct sbrec_port_binding *binding_rec,
>     > +        const struct controller_ctx *ctx,
>     > +        const struct sbrec_chassis *chassis_rec)
>     >  {
>     > +
>     > +    do_datapath_flood_fill(ctx, binding_rec->datapath, chassis_rec);
>     > +
>     >
>
>     In previous versions, this was at the end of add_local_datapath.
>     In this version you moved it to the beginning.
>     Is there an advantage in running flood fill when the second, third,
>     fourth, etc ports are added for the same datapath?
>     It does add processing.
>
> All tests pass either way.
>
> I moved it here because conceptually, I don’t want the local_datapaths
> check stopping
> datapaths of interest checking.
> i.e. I don’t want any shortcircuiting of one feature by another.
>

If you take a look at Ben's patch set, specifically patch 6, his
expanded notion of local_datapaths is very close to your
dps_of_interest. One difference is that local_datapaths is
recalculated with every pass through the ovn-controller.c
main loop, whereas dps_of_interest is persistent in order to
track what conditional monitoring datapath clauses have
been added.

From my comments on Ben's patch 6:

Both this and datapaths of interest trigger a computation from
add_local_datapath, to find other datapaths that can be reached on
the local hypervisor. At a high level, the difference is that this
computation is based on patch ports from a datapath, while the
datapaths of interest computation is based on a list of neighbor
datapaths that is populated by northd into each datapath_binding.
Note that northd populates related_datapaths based on patch
ports, so in the end it is the same source of information, but
different parts of the computation are done in different places at
different times.

Moving on to the specifics of this issue:
local_datapaths is reinitialized with each pass through the
ovn-controller.c main loop. For any one pass through the
main loop, i.e. one pass through binding_run, the first time
add_local_datapath is called for any port on a specific
datapath, the if statement will be evaluated as false.
Regardless of where the call to do_datapath_flood_fill is
placed, it will be triggered. The flood fill algorithm will go
over all reachable datapaths, considering all
related_datapaths from each reachable datapath.

Within the same pass through binding_run, assume that
add_local_datapath is called for a second port on the
same datapath. If you call do_datapath_flood_fill again,
the only way that you will get a different answer is if there
has been a change that added or removed a reachable
datapath_binding, or modified the related_datapaths list
on a reachable datapath_binding.

Trying to catch those kinds of changes seems to me to
be different than the way OVN normally behaves. For
example, when a new datapath, ports, and corresponding
flows are added, if the ports are not seen by the time
binding_run completes, then the corresponding logical
flows will not be processed until the next pass through the
ovn-controller.c main loop, since they will fail the
local_datapaths check at the beginning of
consider_logical_flow.
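To make the two placements concrete, here is a sketch (signatures taken from
the hunks above, body abbreviated; not the exact patch code):

    static void
    add_local_datapath(struct hmap *local_datapaths,
                       const struct sbrec_port_binding *binding_rec,
                       const struct controller_ctx *ctx,
                       const struct sbrec_chassis *chassis_rec)
    {
        if (get_local_datapath(local_datapaths,
                               binding_rec->datapath->tunnel_key)) {
            return;   /* Datapath already seen in this pass. */
        }
        /* Calling the flood fill here, after the check, runs it once per
         * datapath per pass through binding_run; calling it before the
         * check, as in v10, runs it once per port on the datapath. */
        do_datapath_flood_fill(ctx, binding_rec->datapath, chassis_rec);
        /* ... rest of add_local_datapath unchanged ... */
    }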

>
>
>     >      if (get_local_datapath(local_datapaths,
>     >                             binding_rec->datapath->tunnel_key)) {
>     >          return;
>     > @@ -293,7 +302,7 @@ consider_local_datapath(struct controller_ctx
> *ctx,
>     >              /* Add child logical port to the set of all local
> ports. */
>     >              sset_add(all_lports, binding_rec->logical_port);
>     >          }
>     > -        add_local_datapath(local_datapaths, binding_rec);
>     > +        add_local_datapath(local_datapaths, binding_rec, ctx,
>     > chassis_rec);
>     >          if (iface_rec && qos_map && ctx->ovs_idl_txn) {
>     >              get_qos_params(binding_rec, qos_map);
>     >          }
>     > @@ -328,7 +337,7 @@ consider_local_datapath(struct controller_ctx
> *ctx,
>     >          }
>     >
>     >          sset_add(all_lports, binding_rec->logical_port);
>     > -        add_local_datapath(local_datapaths, binding_rec);
>     > +        add_local_datapath(local_datapaths, binding_rec, ctx,
>     > chassis_rec);
>     >          if (binding_rec->chassis == chassis_rec) {
>     >              return;
>     >          }
>     > @@ -342,7 +351,7 @@ consider_local_datapath(struct controller_ctx
> *ctx,
>     >          const char *chassis = smap_get(&binding_rec->options,
>     >                                         "l3gateway-chassis");
>     >          if (!strcmp(chassis, chassis_rec->name) &&
> ctx->ovnsb_idl_txn) {
>     > -            add_local_datapath(local_datapaths, binding_rec);
>     > +            add_local_datapath(local_datapaths, binding_rec, ctx,
>     > chassis_rec);
>     >          }
>     >      } else if (chassis_rec && binding_rec->chassis == chassis_rec) {
>     >          if (ctx->ovnsb_idl_txn) {
>     > @@ -351,6 +360,9 @@ consider_local_datapath(struct controller_ctx
> *ctx,
>     >              for (int i = 0; i < binding_rec->n_mac; i++) {
>     >                  VLOG_INFO("Releasing %s", binding_rec->mac[i]);
>     >              }
>     > +            sbrec_port_binding_remove_clause_logical_port(
>     > +                ctx->ovnsb_idl, OVSDB_F_EQ,
> binding_rec->logical_port);
>     > +
>     >              sbrec_port_binding_set_chassis(binding_rec, NULL);
>     >              sset_find_and_delete(all_lports,
> binding_rec->logical_port);
>     >          }
>     > @@ -381,13 +393,21 @@ binding_run(struct controller_ctx *ctx, const
> struct
>     > ovsrec_bridge *br_int,
>     >      hmap_init(&qos_map);
>     >      if (br_int) {
>     >          get_local_iface_ids(br_int, &lport_to_iface, all_lports,
>     > -                            &egress_ifaces);
>     > +                            &egress_ifaces, ctx);
>     >      }
>     >
>     > +    set_all_datapaths_of_interest_stale();
>     > +
>     >      /* Run through each binding record to see if it is resident on
> this
>     >       * chassis and update the binding accordingly.  This includes
> both
>     >       * directly connected logical ports and children of those
> ports. */
>     >      SBREC_PORT_BINDING_FOR_EACH(binding_rec, ctx->ovnsb_idl) {
>     > +
>     > +        /* Set any datapath, used by any port and that we are
> monitoring
>     > as used,
>     >
>
>     This line is too long.
>
> fixed
>
>
>     > +         * so we don't remove the monitoring.
>     > +         */
>     > +        set_datapath_of_interest_used(binding_rec->datapath);
>     > +
>     >
>
>     This seems too broad. This would set the datapath as used if
>     you have received any SB port_bindings for that datapath, even
>     if those port_bindings are not relevant to this chassis.
>
> It is not broad; it checks the datapath of interest DB and if a datapath
> is monitored
> it sets it to NOT STALE; otherwise it changes nothing.
>
>     How would datapaths ever be removed?
>
> Datapaths get removed when no logical port refers to them. This is the
> definition of a stale datapath –
> so I know it has no dependencies.
>

Assume there is a datapath 1 and logical ports A and B on that
datapath, neither one on this chassis. Now add vif C on this
chassis, in datapath 1. As a result of the addition of the vif,
add_local_datapaths is eventually called, and datapath 1 is
added to dps_of_interest. This results in SB port_bindings
A, B, and C being received.

Now remove vif C. You still have SB port_bindings for A and B.
Won't those trigger this call to set_datapath_of_interest_used?
Datapath 1 will remain in dps_of_interest even though there
is no reachable path from anything resident on this chassis to
datapath 1.

>
>     Why isn't the stale flag processing in add_datapath_of_interest
>     enough?
>     Was there some corner case that was missed?
>
> I think the code in consider_local_datapath() could filter calls to
> add_local_datapath(); one of the examples is if we don’t have a
> transaction ?
>
>         if (!strcmp(chassis, chassis_rec->name) && ctx->ovnsb_idl_txn) {
>

I can think of no good reason for the inclusion of ctx->ovnsb_idl_txn in
this if statement. There is no corresponding check of ctx->ovnsb_idl_txn
for vifs or l2gateway ports, only for l3gateway ports. I think you found a
small bug.
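If that is the case, the fix would simply drop the transaction check so that
l3gateway ports are handled like vifs and l2gateway ports (sketch only, not a
confirmed change):

    /* Sketch: same as the quoted hunk, minus the ctx->ovnsb_idl_txn test. */
    if (!strcmp(chassis, chassis_rec->name)) {
        add_local_datapath(local_datapaths, binding_rec, ctx, chassis_rec);
    }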

>
> I don’t want to have a dependency on something that should be irrelevant.
>

The calls to add_local_datapath are what trigger addition of
datapaths to dps_of_interest through do_datapath_flood_fill.
It seems a lot cleaner to depend on the same code path for
removal of datapaths from dps_of_interest. If there are
problems in that code path that leave things
non-deterministic, isn't it better to try and fix those?
I feel that the addition of a separate way to set stale=false
raises the possibility of lots of different code paths and
combinations, making it difficult to debug in a deterministic
manner.

>
>
>              consider_local_datapath(ctx, chassis_rec, binding_rec,
>     >                                  sset_is_empty(&egress_ifaces) ?
> NULL :
>     >                                  &qos_map, local_datapaths,
>     > &lport_to_iface,
>     > diff --git a/ovn/controller/lflow.c b/ovn/controller/lflow.c
>     > index 4e67365..765eae4 100644
>     > --- a/ovn/controller/lflow.c
>     > +++ b/ovn/controller/lflow.c
>     > @@ -167,6 +167,11 @@ consider_logical_flow(const struct lport_index
>     > *lports,
>     >      if (!ldp) {
>     >          return;
>     >      }
>     > +
>     > +    VLOG_DBG("consider logical flow: tunnel_key %lu "
>     > +             "\"match\" \"%s\", \"actions\" \"%s\"",
>     > +             ldp->tunnel_key, lflow->match, lflow->actions);
>     > +
>     >      if (is_switch(ldp)) {
>     >          /* For a logical switch datapath, local_datapaths tells us
> if
>     > there
>     >           * are any local ports for this datapath.  If not, we can
> skip
>     > diff --git a/ovn/controller/ovn-controller.c b/ovn/controller/ovn-
>     > controller.c
>     > index fea7841..c06bd81 100644
>     > --- a/ovn/controller/ovn-controller.c
>     > +++ b/ovn/controller/ovn-controller.c
>     > @@ -69,6 +69,14 @@ OVS_NO_RETURN static void usage(void);
>     >
>     >  static char *ovs_remote;
>     >
>     > +struct dp_of_interest_node {
>     > +    struct hmap_node hmap_node;
>     > +    const struct sbrec_datapath_binding *sb_db;
>     > +    bool stale;
>     > +};
>     > +
>     > +static struct hmap dps_of_interest = HMAP_INITIALIZER(&dps_of_interest);
>     > +
>     >  struct local_datapath *
>     >  get_local_datapath(const struct hmap *local_datapaths, uint32_t
>     > tunnel_key)
>     >  {
>     > @@ -128,6 +136,181 @@ get_bridge(struct ovsdb_idl *ovs_idl, const char *br_name)
>     >      return NULL;
>     >  }
>     >
>     > +static bool
>     > +remote_l3gat_lrd_port(const struct sbrec_datapath_binding *dp,
>     > +                             const struct sbrec_chassis *chassis_rec)
>     > +{
>     > +    const char *l3gateway_chassis = smap_get(&dp->options,
>     > +                                             "l3gateway-chassis");
>     > +    if(l3gateway_chassis){
>     > +        if(strcmp(l3gateway_chassis, chassis_rec->name)){
>     > +            return true;
>     > +        }
>     > +    }
>     > +    return false;
>     > +}
>     > +
>     > +void
>     > +set_all_datapaths_of_interest_stale(void)
>     > +{
>     > +    struct dp_of_interest_node *node;
>     > +    HMAP_FOR_EACH(node, hmap_node, &dps_of_interest) {
>     > +            node->stale = true;
>     > +    }
>     > +}
>     > +
>     > +static void
>     > +remove_datapath_of_interest(struct controller_ctx *ctx,
>     > +                            const struct sbrec_datapath_binding *dp)
>     >
>
>     The only call to remove_datapath_of_interest is from
>     clean_stale_datapaths_of_interest.
>     Why not pass in dp_cur_node rather than dp, and avoid
>     the hash lookup?
>     Or just put all the logic in
>     clean_stale_datapaths_of_interest?
>
> Actually, remove_datapath_of_interest() is left over from a transition in
> coding
> when it had an external caller as well.
> Hence, remove_datapath_of_interest() is deprecated now; it is gone.
>

The new code looks good.

>
>
>     > +{
>     > +
>     > +    struct dp_of_interest_node *node;
>     > +    HMAP_FOR_EACH_WITH_HASH (node, hmap_node,
>     > uuid_hash(&dp->header_.uuid),
>     > +                             &dps_of_interest) {
>     > +        if (ctx && uuid_equals(&node->sb_db->header_.uuid,
>     > &dp->header_.uuid)) {
>     > +
>     > +            sbrec_port_binding_remove_clause_datapath(ctx->ovnsb_idl,
>     > +                                                      OVSDB_F_EQ,
>     > +                                                      &dp->header_.uuid);
>     > +            sbrec_logical_flow_remove_clause_logical_datapath(
>     > +                ctx->ovnsb_idl, OVSDB_F_EQ, &dp->header_.uuid);
>     > +            hmap_remove(&dps_of_interest, &node->hmap_node);
>     > +            free(node);
>     > +            return;
>     > +        }
>     > +    }
>     > +}
>     > +
>     > +static void
>     > +clean_stale_datapaths_of_interest(struct controller_ctx *ctx)
>     > +{
>     > +    struct dp_of_interest_node *dp_cur_node, *dp_next_node;
>     > +    HMAP_FOR_EACH_SAFE (dp_cur_node, dp_next_node, hmap_node,
>     > +                        &dps_of_interest) {
>     > +        if(dp_cur_node->stale == true) {
>     > +            remove_datapath_of_interest(ctx, dp_cur_node->sb_db);
>     > +        }
>     > +    }
>     > +}
>     > +
>     > +void
>     > +set_datapath_of_interest_used(const struct sbrec_datapath_binding *dp)
>     > +{
>     > +
>     > +    struct dp_of_interest_node *node;
>     > +    HMAP_FOR_EACH_WITH_HASH (node, hmap_node,
>     > uuid_hash(&dp->header_.uuid),
>     > +                             &dps_of_interest) {
>     > +        if (uuid_equals(&node->sb_db->header_.uuid,
> &dp->header_.uuid)) {
>     > +            node->stale = false;
>     > +        }
>     > +    }
>     > +}
>     > +
>     > +static void
>     > +add_datapath_of_interest(const struct controller_ctx *ctx,
>     > +                         const struct sbrec_datapath_binding *dp)
>     > +{
>     > +    struct dp_of_interest_node *node;
>     > +
>     > +    HMAP_FOR_EACH_WITH_HASH (node, hmap_node,
>     > uuid_hash(&dp->header_.uuid),
>     > +                             &dps_of_interest) {
>     > +        if (uuid_equals(&node->sb_db->header_.uuid,
> &dp->header_.uuid)) {
>     > +            node->stale = false;
>     > +            return;
>     > +        }
>     > +    }
>     > +
>     > +    VLOG_DBG("monitor datapath with tunnel_key %lu",
> dp->tunnel_key);
>     > +
>     > +    sbrec_logical_flow_add_clause_logical_datapath(ctx->ovnsb_idl,
>     > +                                                   OVSDB_F_EQ,
>     > +                                                   &dp->header_.uuid);
>     > +    sbrec_port_binding_add_clause_datapath(ctx->ovnsb_idl, OVSDB_F_EQ,
>     > +                                           &dp->header_.uuid);
>
    > +
>     > +    node = xzalloc(sizeof *node);
>     > +    hmap_insert(&dps_of_interest, &node->hmap_node,
>     > +                uuid_hash(&dp->header_.uuid));
>     > +    node->sb_db = dp;
>     > +    return;
>     > +}
>     > +
>     > +void
>     > +do_datapath_flood_fill(const struct controller_ctx *ctx,
>     > +                              const struct sbrec_datapath_binding *dp_start,
>     > +                              const struct sbrec_chassis *chassis_rec)
>     > +{
>     > +
>     > +    if(remote_l3gat_lrd_port(dp_start, chassis_rec)) {
>     > +        return;
>     > +    }
>     > +
>     > +    struct dp_checked {
>     > +        struct hmap_node hmap_node;
>     > +        const struct sbrec_datapath_binding *sb_db;
>     > +    };
>     > +    struct dp_checked *dp_node;
>     > +    struct hmap related_dp_processed = HMAP_INITIALIZER(&related_dp_processed);
>     >
>
>     This third level of lists is not easy to follow, having
>     dps_of_interest
>
>     related_dp_processed
> This map is to remember what we saw and not follow it again
> during processing one list.
>

I got that. I still question whether it is worth the effort,
versus just checking dps_of_interest.

>
>     interest_datapaths
>

Ben's patch 6 used recursion to implement a similar
flood fill algorithm, only over port_bindings rather than
related_datapaths. This made it clear that a depth first
recursion will result in much more compact code that
is easier to follow, compared to the breadth first
approach taken here. With recursion,
interest_datapaths goes away.
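
Roughly, with recursion the whole thing collapses to something like
this. The sketch reuses remote_l3gat_lrd_port() and
add_datapath_of_interest() from this patch, but assumes
add_datapath_of_interest() is changed to return true only when it
inserts a new entry (today it returns void), so the dps_of_interest
entry doubles as the "visited" marker:

    static void
    flood_fill_recursive(const struct controller_ctx *ctx,
                         const struct sbrec_datapath_binding *dp,
                         const struct sbrec_chassis *chassis_rec)
    {
        if (remote_l3gat_lrd_port(dp, chassis_rec)) {
            return;
        }
        /* Stop if this datapath was already of interest; otherwise add
         * it and walk its related datapaths.  No interest_datapaths
         * work list and no related_dp_processed map are needed. */
        if (!add_datapath_of_interest(ctx, dp)) {
            return;
        }
        for (size_t i = 0; i < dp->n_related_datapaths; i++) {
            flood_fill_recursive(ctx, dp->related_datapaths[i], chassis_rec);
        }
    }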

IMO the right thing to do is to combine this patch with
Ben's patch 6. Do a recursive flood fill computation on
local_datapaths, based on related_datapaths. At the
end compare local_datapaths with dps_of_interest
and adjust conditional monitoring clauses based on
datapath as appropriate.
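
For illustration, the end-of-loop comparison could look roughly like
this. Only get_local_datapath() and the remove_clause helpers are taken
from this patch; the function itself and its name are made up, and it
assumes local_datapaths holds the full flood-filled set as described
above:

    static void
    sync_dps_of_interest(struct controller_ctx *ctx,
                         const struct hmap *local_datapaths)
    {
        struct dp_of_interest_node *node, *next;
        HMAP_FOR_EACH_SAFE (node, next, hmap_node, &dps_of_interest) {
            /* Drop clauses for datapaths no longer reachable locally. */
            if (!get_local_datapath(local_datapaths,
                                    node->sb_db->tunnel_key)) {
                sbrec_logical_flow_remove_clause_logical_datapath(
                    ctx->ovnsb_idl, OVSDB_F_EQ, &node->sb_db->header_.uuid);
                sbrec_port_binding_remove_clause_datapath(
                    ctx->ovnsb_idl, OVSDB_F_EQ, &node->sb_db->header_.uuid);
                hmap_remove(&dps_of_interest, &node->hmap_node);
                free(node);
            }
        }
    }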

>
>
>     It looks like this is similar to the flood_fill placement in
>     add_local_datapath.
>     Should each datapath be considered once in each run
>     through the main ovn-controller.c loop, such that
>     different flood_fill runs going through the same
>     datapaths do not investigate any further (as in v9)?
>     Or should each flood_fill run to completion,
>     recalculating the entire set of related datapaths each
>     time that flood_fill is called, even if many of those
>     datapaths are already present in dps_of_interest
>     (as in this v10)?
>
> Yes, I changed it from v9 to v10.
> The number of datapaths is small relative to the number of ports, so it is
> only slightly more expensive.
> The issue is premature cleanup due to partial port bindings at startup -
> I’ll investigate this more later.
> Also, in general, I don’t want to make any assumptions about input data
> that I don’t have to.
>

As I mentioned above with respect to the placement of
do_datapath_flood_fill in add_local_datapath, there are
much more significant cases of possible timing windows
that ovn does not attempt to address. I don't see this as
being more important than those other cases. IMO the
benefits do not justify the additional complexity.

>
>     +
>     > +    struct interest_dps_list_elem {
>     > +        struct ovs_list list_node;
>     > +        const struct sbrec_datapath_binding * dp;
>     > +    };
>     > +    struct ovs_list interest_datapaths;
>     > +    ovs_list_init(&interest_datapaths);
>     > +
>     > +    struct interest_dps_list_elem *dp_list_elem =
>     > +        xzalloc(sizeof *dp_list_elem);
>     > +    dp_list_elem->dp = dp_start;
>     > +    ovs_list_push_back(&interest_datapaths,
> &dp_list_elem->list_node);
>     > +
>     > +    while (!ovs_list_is_empty(&interest_datapaths)) {
>     > +
>     > +        struct ovs_list *list_node_ptr =
>     > +            ovs_list_pop_front(&interest_datapaths);
>     > +        dp_list_elem = CONTAINER_OF(list_node_ptr,
>     > +            struct interest_dps_list_elem, list_node);
>     > +           size_t num_related_datapaths =
>     > +            dp_list_elem->dp->n_related_datapaths;
>     > +           struct sbrec_datapath_binding **related_datapaths =
>     > +            dp_list_elem->dp->related_datapaths;
>     > +
>     > +        dp_node = xzalloc(sizeof *dp_node);
>     > +        dp_node->sb_db = dp_list_elem->dp;
>     > +        hmap_insert(&related_dp_processed, &dp_node->hmap_node,
>     > +                     uuid_hash(&dp_node->sb_db->header_.uuid));
>     > +
>     > +        add_datapath_of_interest(ctx, dp_list_elem->dp);
>     > +        free(dp_list_elem);
>     > +        for (int i = 0; i < num_related_datapaths; i++) {
>     > +            bool found = false;
>     > +            HMAP_FOR_EACH_WITH_HASH (
>     > +                    dp_node, hmap_node,
>     > +                    uuid_hash(&related_datapaths[i]->header_.uuid),
>     > +                    &related_dp_processed) {
>     > +
>     > +                if (uuid_equals(&dp_node->sb_db->header_.uuid,
>     > +                                &related_datapaths[i]->header_.uuid))
> {
>     > +                    found = true;
>     > +                    break;
>     > +                }
>     > +            }
>     > +            if(!found &&
>     > +               !remote_l3gat_lrd_port(related_datapaths[i],
>     > chassis_rec)) {
>     > +                dp_list_elem = xzalloc(sizeof *dp_list_elem);
>     > +                dp_list_elem->dp = related_datapaths[i];
>     > +                ovs_list_push_back(&interest_datapaths,
>     > &dp_list_elem->list_node);
>     > +            }
>     > +        }
>     > +    }
>     > +    struct dp_checked *cur_node, *next_node;
>     > +    HMAP_FOR_EACH_SAFE (cur_node, next_node, hmap_node,
>     > &related_dp_processed) {
>     > +        hmap_remove(&related_dp_processed, &cur_node->hmap_node);
>     > +        free(cur_node);
>     > +    }
>     > +    hmap_destroy(&related_dp_processed);
>     > +}
>     > +
>     >  static const struct ovsrec_bridge *
>     >  create_br_int(struct controller_ctx *ctx,
>     >                const struct ovsrec_open_vswitch *cfg,
>     > @@ -465,6 +648,10 @@ main(int argc, char *argv[])
>     >          ovsdb_idl_create(ovnsb_remote, &sbrec_idl_class, true,
> true));
>     >      ovsdb_idl_omit_alert(ovnsb_idl_loop.idl,
> &sbrec_chassis_col_nb_cfg);
>     >
>     > +    /* Start with suppressing all logical flows and then later
> request
>     > those
>     > +     * that are specifically needed. */
>     > +    sbrec_logical_flow_add_clause_false(ovnsb_idl_loop.idl);
>     > +
>     >      /* Track the southbound idl. */
>     >      ovsdb_idl_track_add_all(ovnsb_idl_loop.idl);
>     >
>     > @@ -561,6 +748,7 @@ main(int argc, char *argv[])
>     >              lport_index_destroy(&lports);
>     >          }
>     >
>     > +        clean_stale_datapaths_of_interest(&ctx);
>     >          sset_destroy(&all_lports);
>     >
>     >          struct local_datapath *cur_node, *next_node;
>     > diff --git a/ovn/controller/ovn-controller.h b/ovn/controller/ovn-
>     > controller.h
>     > index 8a3e855..e35b5b0 100644
>     > --- a/ovn/controller/ovn-controller.h
>     > +++ b/ovn/controller/ovn-controller.h
>     > @@ -76,6 +76,16 @@ const struct ovsrec_bridge *get_bridge(struct
> ovsdb_idl
>     > *,
>     >  const struct sbrec_chassis *get_chassis(struct ovsdb_idl *,
>     >                                          const char *chassis_id);
>     >
>     > +void do_datapath_flood_fill(const struct controller_ctx *,
>     > +                            const struct sbrec_datapath_binding *,
>     > +                            const struct sbrec_chassis *);
>     > +
>     > +void
>     > +set_datapath_of_interest_used(const struct sbrec_datapath_binding *dp);
>     > +
>     > +void
>     > +set_all_datapaths_of_interest_stale(void);
>     > +
>     >  /* Must be a bit-field ordered from most-preferred (higher number)
> to
>     >   * least-preferred (lower number). */
>     >  enum chassis_tunnel_type {
>     > diff --git a/ovn/controller/patch.c b/ovn/controller/patch.c
>     > index 79a7d81..9e8e856 100644
>     > --- a/ovn/controller/patch.c
>     > +++ b/ovn/controller/patch.c
>     > @@ -330,6 +330,9 @@ add_logical_patch_ports(struct controller_ctx
> *ctx,
>     >              free(dst_name);
>     >              free(src_name);
>     >              add_patched_datapath(patched_datapaths, binding,
> local_port);
>     > +
>     > +            sbrec_port_binding_add_clause_logical_port(ctx->ovnsb_idl,
>     > +                                                       OVSDB_F_EQ, peer);
>     >
>
>     Why is this necessary?
>     This could add logical ports that are unrelated to this chassis.
>
> This is for peer patch ports.
> This has no benefit anymore and it will monitor some patch ports
> unnecessarily going forward.
> I removed it and the corresponding cleanup.
> patch.c changes are now zero.
>

I like having no changes to patch.c

>
>     The northd logic adds related_datapaths for any patch port.
>     The flood fill based on related_datapaths should take care of
>     datapaths across patch ports, starting from datapaths that
>     this chassis does care about.
>
>     The add_local_datapath logic works for starting points
>     including vifs, l2gateway, and l3gateway.
>     What do you think is missing?
>
>
>                  if (local_port) {
>     >                  if (binding->chassis != chassis_rec &&
>     > ctx->ovnsb_idl_txn) {
>     >                      sbrec_port_binding_set_chassis(binding,
> chassis_rec);
>     > @@ -339,6 +342,26 @@ add_logical_patch_ports(struct controller_ctx
> *ctx,
>     >      }
>     >  }
>     >
>     > +/* Non-logical patch ports should be unaffected. */
>     > +static void
>     > +remove_patch_port_monitors(struct controller_ctx *ctx,
>     > +                            const struct ovsrec_port *port)
>     > +{
>     > +    for (size_t i = 0; i < port->n_interfaces; i++) {
>     > +        struct ovsrec_interface *iface = port->interfaces[i];
>     > +        if (strcmp(iface->type, "patch")) {
>     > +            continue;
>     >
>
>     I believe your code adds clauses for l3gateway ports above,
>     but does not remove those monitors here.
>
>
>     +        }
>     > +        const char *iface_id = smap_get(&iface->external_ids,
>     > +                                        "iface-id");
>     > +        if (iface_id) {
>
>
>     Patch interfaces don't have iface_id. They have options "peer",
>     and the ports (not interfaces) have external_ids with key
>     "ovn-logical-patch-port" or "ovn-l3gateway-port".
>
> That was a sloppy copy/paste from binding.c to check for external ids that
> I never finished afterwards.
> I intended to check for these:
>             smap_get(&port->external_ids, "ovn-l3gateway-port")
>             smap_get(&port->external_ids, "ovn-logical-patch-port")
>
> The iface_id argument below also did not get updated.
> This code is deprecated now anyways.
>

Agreed.

>
>
>     >
>
>     > +            sbrec_port_binding_remove_clause_logical_port(ctx->ovnsb_idl,
>     > +                                                          OVSDB_F_EQ,
>     > +                                                          iface_id);
>     > +        }
>     > +    }
>     > +}
>     > +
>     >  void
>     >  patch_run(struct controller_ctx *ctx, const struct ovsrec_bridge
> *br_int,
>     >            const char *chassis_id, struct hmap *local_datapaths,
>     > @@ -374,6 +397,7 @@ patch_run(struct controller_ctx *ctx, const
> struct
>     > ovsrec_bridge *br_int,
>     >      SHASH_FOR_EACH_SAFE (port_node, port_next_node,
> &existing_ports) {
>     >          struct ovsrec_port *port = port_node->data;
>     >          shash_delete(&existing_ports, port_node);
>     > +        remove_patch_port_monitors(ctx, port);
>     >          remove_port(ctx, port);
>     >      }
>     >      shash_destroy(&existing_ports);
>     > diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
>     > index f2dc353..7ea5dc9 100644
>     > --- a/ovn/northd/ovn-northd.c
>     > +++ b/ovn/northd/ovn-northd.c
>     > @@ -231,6 +231,42 @@ struct tnlid_node {
>     >      uint32_t tnlid;
>     >  };
>     >
>     > +struct related_datapath_node {
>     > +    struct hmap_node hmap_node;
>     > +    const struct sbrec_datapath_binding *sb_db;
>     > +};
>     > +
>     > +static void
>     > +add_related_datapath(struct hmap *related_datapaths,
>     > +                     const struct sbrec_datapath_binding *sb,
>     > +                     size_t *n_related_datapaths)
>     > +{
>     > +    struct related_datapath_node *node;
>     > +    HMAP_FOR_EACH_WITH_HASH (node, hmap_node,
>     > +                             uuid_hash(&sb->header_.uuid),
>     > +                             related_datapaths) {
>     > +        if (uuid_equals(&sb->header_.uuid,
> &node->sb_db->header_.uuid)) {
>     > +            return;
>     > +        }
>     > +    }
>     > +
>     > +    node = xzalloc(sizeof *node);
>     > +    hmap_insert(related_datapaths, &node->hmap_node,
>     > +                uuid_hash(&sb->header_.uuid));
>     > +    node->sb_db = sb;
>     > +    (*n_related_datapaths)++;
>     > +}
>     > +
>     > +static void
>     > +destroy_related_datapaths(struct hmap *related_datapaths)
>     > +{
>     > +    struct related_datapath_node *node;
>     > +    HMAP_FOR_EACH_POP (node, hmap_node, related_datapaths) {
>     > +        free(node);
>     > +    }
>     > +    hmap_destroy(related_datapaths);
>     > +}
>     > +
>     >  static void
>     >  destroy_tnlids(struct hmap *tnlids)
>     >  {
>     > @@ -376,6 +412,8 @@ struct ovn_datapath {
>     >      size_t n_router_ports;
>     >
>     >      struct hmap port_tnlids;
>     > +    struct hmap related_datapaths;
>     > +    size_t n_related_datapaths;
>     >      uint32_t port_key_hint;
>     >
>     >      bool has_unknown;
>     > @@ -426,6 +464,7 @@ ovn_datapath_create(struct hmap *datapaths, const
>     > struct uuid *key,
>     >      od->nbr = nbr;
>     >      hmap_init(&od->port_tnlids);
>     >      hmap_init(&od->ipam);
>     > +    hmap_init(&od->related_datapaths);
>     >      od->port_key_hint = 0;
>     >      hmap_insert(datapaths, &od->key_node, uuid_hash(&od->key));
>     >      return od;
>     > @@ -441,6 +480,7 @@ ovn_datapath_destroy(struct hmap *datapaths,
> struct
>     > ovn_datapath *od)
>     >          hmap_remove(datapaths, &od->key_node);
>     >          destroy_tnlids(&od->port_tnlids);
>     >          destroy_ipam(&od->ipam);
>     > +        destroy_related_datapaths(&od->related_datapaths);
>     >          free(od->router_ports);
>     >          free(od);
>     >      }
>     > @@ -671,6 +711,69 @@ struct ovn_port {
>     >      struct ovs_list list;       /* In list of similar records. */
>     >  };
>     >
>     > +static void
>     > +populate_sb_related_datapaths(struct ovn_datapath *od)
>     > +{
>     > +    struct sbrec_datapath_binding **sb_related_datapaths
>     > +        = xmalloc(sizeof(*sb_related_datapaths) *
>     > od->n_related_datapaths);
>     > +    int rdi = 0;
>     > +    struct related_datapath_node *related_datapath;
>     > +    HMAP_FOR_EACH (related_datapath, hmap_node,
>     > +                   &od->related_datapaths) {
>     > +        if (rdi >= od->n_related_datapaths) {
>     > +            static struct vlog_rate_limit rl
>     > +                = VLOG_RATE_LIMIT_INIT(5, 1);
>     > +            VLOG_ERR_RL(&rl, "related datapaths accounting error"
>     > +                        UUID_FMT, UUID_ARGS(&od->key));
>     > +            break;
>     > +        }
>     > +
>     > +        sb_related_datapaths[rdi] = CONST_CAST(
>     > +            struct sbrec_datapath_binding *,
> related_datapath->sb_db);
>     > +        rdi++;
>     > +    }
>     > +    sbrec_datapath_binding_set_related_datapaths(od->sb,
>     > +        sb_related_datapaths, od->n_related_datapaths);
>     > +    free(sb_related_datapaths);
>     > +}
>     > +
>     > +static void
>     > +generate_sb_related_datapaths_from_ports(struct ovs_list *both)
>     > +{
>     > +    struct dp_checked {
>     > +           struct hmap_node hmap_node;
>     > +           const struct sbrec_datapath_binding *sb_db;
>     > +       };
>     > +       struct dp_checked *dp_node;
>     > +       struct hmap dp_processed = HMAP_INITIALIZER(&dp_processed);
>     > +    struct ovn_port *op, *next;
>     > +       LIST_FOR_EACH_SAFE (op, next, list, both) {
>     > +           bool found = false;
>     > +           HMAP_FOR_EACH_WITH_HASH (dp_node, hmap_node,
>     > +                                    uuid_hash(&op->od->sb->header_.uuid),
>     > +                                    &dp_processed) {
>     > +               if (uuid_equals(&dp_node->sb_db->header_.uuid,
>     > +                               &op->od->sb->header_.uuid)) {
>     > +                   found = true;
>     > +                   break;
>     > +               }
>     > +           }
>     > +           if(!found){
>     > +            dp_node = xzalloc(sizeof *dp_node);
>     > +               dp_node->sb_db = op->od->sb;
>     > +               hmap_insert(&dp_processed, &dp_node->hmap_node,
>     > +                           uuid_hash(&dp_node->sb_db->header_.uuid));
>     > +            populate_sb_related_datapaths(op->od);
>     > +           }
>     > +       }
>     > +       struct dp_checked *cur_node, *next_node;
>     > +       HMAP_FOR_EACH_SAFE (cur_node, next_node, hmap_node,
> &dp_processed)
>     > {
>     > +           hmap_remove(&dp_processed, &cur_node->hmap_node);
>     > +           free(cur_node);
>     > +       }
>     > +       hmap_destroy(&dp_processed);
>     > +}
>     > +
>     >  static struct ovn_port *
>     >  ovn_port_create(struct hmap *ports, const char *key,
>     >                  const struct nbrec_logical_switch_port *nbsp,
>     > @@ -1359,12 +1462,19 @@ ovn_port_update_sbrec(const struct ovn_port
> *op,
>     >              sbrec_port_binding_set_type(op->sb, "patch");
>     >          }
>     >
>     > +        if (op->peer && op->peer->od && op->peer->od->sb) {
>     > +            add_related_datapath(&op->od->related_datapaths,
>     > +                                 op->peer->od->sb,
>     > +                                 &op->od->n_related_datapaths);
>     > +        }
>     > +
>     >          const char *peer = op->peer ? op->peer->key : "<error>";
>     >          struct smap new;
>     >          smap_init(&new);
>     >          smap_add(&new, "peer", peer);
>     >          if (chassis) {
>     >              smap_add(&new, "l3gateway-chassis", chassis);
>     > +            sbrec_datapath_binding_set_options(op->sb->datapath,
> &new);
>     >
>
>     It would be cleaner to modify the datapath_binding in
>     build_datapaths, though that would require addition of a run
>     through the both list to update or remove "chassis" as
>     necessary, as well as modifying the existing run through
>     the nb_only list.
>
>     I don't see any logic to remove "chassis" if that is removed
>     from the nbr entry.
>
>     With this code order, you are also adding "peer" which does
>     not belong on datapath_binding. If this code stays here, then
>     the smap_add "peer" line should be moved below this.
>
> Yeah, that was sloppy.
> Fortunately, “peer” in the datapath_binding options map will do no harm.
> But the l3gateway-chassis option conceivably may need to be cleared in the
> same datapath_binding entry at some point in its lifetime - a datapath may
> cease to be an l3gateway and become something else.
> I moved the line.
>

The change looks good.

Mickey

>
>
>              }
>     >          sbrec_port_binding_set_options(op->sb, &new);
>     >          smap_destroy(&new);
>     > @@ -1411,6 +1521,12 @@ ovn_port_update_sbrec(const struct ovn_port
> *op,
>     >                  sbrec_port_binding_set_type(op->sb, "patch");
>     >              }
>     >
>     > +            if (op->peer && op->peer->od && op->peer->od->sb) {
>     > +                add_related_datapath(&op->od->related_datapaths,
>     > +                                     op->peer->od->sb,
>     > +                                     &op->od->n_related_datapaths);
>     > +            }
>     > +
>     >              const char *router_port = smap_get_def(&op->nbsp->
> options,
>     >                                                     "router-port",
>     > "<error>");
>     >              struct smap new;
>     > @@ -1519,6 +1635,7 @@ build_ports(struct northd_context *ctx, struct
> hmap
>     > *datapaths,
>     >          cleanup_mac_bindings(ctx, ports);
>     >      }
>     >
>     > +    generate_sb_related_datapaths_from_ports(&both);
>     >      tag_alloc_destroy(&tag_alloc_table);
>     >      destroy_chassis_queues(&chassis_qdisc_queues);
>     >  }
>     > @@ -4952,6 +5069,10 @@ main(int argc, char *argv[])
>     >      add_column_noalert(ovnsb_idl_loop.idl,
>     >                         &sbrec_datapath_binding_col_tunnel_key);
>     >      add_column_noalert(ovnsb_idl_loop.idl,
>     > +                       &sbrec_datapath_binding_col_related_datapaths);
>     > +    add_column_noalert(ovnsb_idl_loop.idl,
>     > +                       &sbrec_datapath_binding_col_options);
>     > +    add_column_noalert(ovnsb_idl_loop.idl,
>     >                         &sbrec_datapath_binding_col_external_ids);
>     >
>     >      ovsdb_idl_add_table(ovnsb_idl_loop.idl,
> &sbrec_table_port_binding);
>     > diff --git a/ovn/ovn-sb.ovsschema b/ovn/ovn-sb.ovsschema
>     > index 89342fe..8dfb2f4 100644
>     > --- a/ovn/ovn-sb.ovsschema
>     > +++ b/ovn/ovn-sb.ovsschema
>     > @@ -1,7 +1,7 @@
>     >  {
>     >      "name": "OVN_Southbound",
>     > -    "version": "1.9.0",
>     > -    "cksum": "239060528 9012",
>     > +    "version": "1.10.0",
>     > +    "cksum": "3562877871 9537",
>     >      "tables": {
>     >          "SB_Global": {
>     >              "columns": {
>     > @@ -93,7 +93,17 @@
>     >                                        "maxInteger": 16777215}}},
>     >                  "external_ids": {
>     >                      "type": {"key": "string", "value": "string",
>     > -                             "min": 0, "max": "unlimited"}}},
>     > +                             "min": 0, "max": "unlimited"}},
>     > +                "options": {
>     > +                     "type": {"key": "string",
>     > +                              "value": "string",
>     > +                              "min": 0,
>     > +                              "max": "unlimited"}},
>     > +                "related_datapaths": {"type": {"key": {"type":
> "uuid",
>     > +                                      "refTable":
> "Datapath_Binding",
>     > +                                      "refType": "weak"},
>     > +                                      "min": 0,
>     > +                                      "max": "unlimited"}}},
>     >              "indexes": [["tunnel_key"]],
>     >              "isRoot": true},
>     >          "Port_Binding": {
>     > diff --git a/ovn/ovn-sb.xml b/ovn/ovn-sb.xml
>     > index 65191ed..88de25d 100644
>     > --- a/ovn/ovn-sb.xml
>     > +++ b/ovn/ovn-sb.xml
>     > @@ -1556,6 +1556,19 @@ tcp.flags = RST;
>     >        constructed for each supported encapsulation.
>     >      </column>
>     >
>     > +    <column name="options" key="l3gateway-chassis">
>     > +      The <code>chassis</code> in which the router datapath resides.
>     > +    </column>
>     > +
>     > +    <column name="related_datapaths">
>     > +      The related_datapaths column is used to keep track of which
>     > datapaths
>     > +      are connected to each other.  This is leveraged in
> ovn-controller to
>     > +      do a flood fill from each local logical switch to determine
> the full
>     > +      set of datapaths needed on a given ovn-controller.   This set
> of
>     > +      datapaths can be used to determine which logical ports and
> logical
>     > +      flows an ovn-controller should monitor.
>     > +    </column>
>     > +
>     >      <group title="OVN_Northbound Relationship">
>     >        <p>
>     >          Each row in <ref table="Datapath_Binding"/> is associated
> with
>     > some
>     > diff --git a/tests/ovn.at b/tests/ovn.at
>     > index a226849..af544e1 100644
>     > --- a/tests/ovn.at
>     > +++ b/tests/ovn.at
>     > @@ -1673,6 +1673,141 @@ OVN_CLEANUP([hv1],[hv2])
>     >
>     >  AT_CLEANUP
>     >
>     > +# 2 hypervisors, each with one logical switch with 2 logical ports.
>     > +AT_SETUP([ovn -- conditional monitoring: 2 HVs each with 1 LS and 2
> LPs])
>     > +AT_SKIP_IF([test $HAVE_PYTHON = no])
>     > +ovn_start
>     > +
>     > +# In this test cases we create 2 switches, all connected to same
>     > +# physical network (through br-phys on each HV). Each switch has
>     > +# VIF ports across 1 HV. Each HV has 2 VIF ports. The first digit
>     > +# of VIF port name indicates the hypervisor it is bound to, e.g.
>     > +# lp12 means VIF 2 on hv1.
>     > +
>     > +for i in 1 2; do
>     > +    ls_name=ls$i
>     > +    ovn-nbctl ls-add $ls_name
>     > +done
>     > +
>     > +net_add n1
>     > +for i in 1 2; do
>     > +    sim_add hv$i
>     > +    as hv$i
>     > +    ovs-vsctl add-br br-phys
>     > +    ovn_attach n1 br-phys 192.168.0.$i
>     > +    ovs-appctl -t ovn-controller vlog/set ANY:file:DBG
>     > +
>     > +    for j in 1 2; do
>     > +        ovs-vsctl add-port br-int vif$i$j -- \
>     > +            set Interface vif$i$j external-ids:iface-id=lp$i$j \
>     > +                                  options:tx_pcap=hv$i/vif$i$j-tx.pcap \
>     > +                                  options:rxq_pcap=hv$i/vif$i$j-rx.pcap \
>     > +                                  ofport-request=$i$j
>     > +
>     > +        lsp_name=lp$i$j
>     > +        if test $i -eq 1; then
>     > +            ls_name=ls1
>     > +        else
>     > +            ls_name=ls2
>     > +        fi
>     > +
>     > +        ovn-nbctl lsp-add $ls_name $lsp_name
>     > +        ovn-nbctl lsp-set-addresses $lsp_name f0:00:00:00:00:$i$j
>     > +        ovn-nbctl lsp-set-port-security $lsp_name
> f0:00:00:00:00:$i$j
>     > +
>     > +        OVS_WAIT_UNTIL([test x`ovn-nbctl lsp-get-up $lsp_name` = xup])
>     > +    done
>     > +done
>     > +
>     > +ovn_populate_arp
>     > +
>     > +# XXX This is now the 3rd copy of these functions in this file ...
>     > +
>     > +# Given the name of a logical port, prints the name of the
> hypervisor
>     > +# on which it is located.
>     > +vif_to_hv() {
>     > +    echo hv${1%?}
>     > +}
>     > +#
>     > +# test_packet INPORT DST SRC ETHTYPE OUTPORT...
>     > +#
>     > +# This shell function causes a packet to be received on INPORT.  The
>     > packet's
>     > +# content has Ethernet destination DST and source SRC (each exactly
> 12 hex
>     > +# digits) and Ethernet type ETHTYPE (4 hex digits).  The OUTPORTs
> (zero or
>     > +# more) list the VIFs on which the packet should be received.
> INPORT and
>     > the
>     > +# OUTPORTs are specified as logical switch port numbers, e.g. 11 for
>     > vif11.
>     > +for i in 1 2; do
>     > +    for j in 1 2; do
>     > +        : > $i$j.expected
>     > +    done
>     > +done
>     > +test_packet() {
>     > +    local inport=$1 src=$2 dst=$3 eth=$4; shift; shift; shift; shift
>     > +    local packet=${src}${dst}${eth}
>     > +    hv=`vif_to_hv $inport`
>     > +    vif=vif$inport
>     > +    as $hv ovs-appctl netdev-dummy/receive $vif $packet
>     > +    for outport; do
>     > +        echo $packet >> $outport.expected
>     > +    done
>     > +}
>     > +
>     > +# lp11 and lp12 are on the same network and on
>     > +# same hypervisors
>     > +test_packet 11 f00000000012 f00000000011 1112 12
>     > +test_packet 12 f00000000011 f00000000012 1211 11
>     > +
>     > +# lp21 and lp22 are on the same network and on
>     > +# the same hypervisor
>     > +test_packet 21 f00000000022 f00000000021 2122 22
>     > +test_packet 22 f00000000021 f00000000022 2221 21
>     > +
>     > +# lp11 and lp22 are on different networks so should
>     > +# not be able to communicate.
>     > +test_packet 11 f00000000022 f00000000011 1122
>     > +test_packet 22 f00000000011 f00000000022 2211
>     > +
>     > +echo "------ OVN dump ------"
>     > +ovn-nbctl show
>     > +ovn-sbctl show
>     > +
>     > +echo "------ hv1 dump ------"
>     > +as hv1 ovs-vsctl show
>     > +as hv1 ovs-ofctl -O OpenFlow13 dump-flows br-int
>     > +
>     > +echo "------ hv2 dump ------"
>     > +as hv2 ovs-vsctl show
>     > +as hv2 ovs-ofctl -O OpenFlow13 dump-flows br-int
>     > +
>     > +# Check that all datapath components are monitored as expected.
>     > +AT_CHECK([cat hv1/ovn-controller.log | grep 'tunnel_key 1'], [0],
>     > [ignore-nolog])
>     > +AT_CHECK([cat hv1/ovn-controller.log | grep 'tunnel_key 2'], [1],
>     > [ignore-nolog])
>     > +AT_CHECK([cat hv2/ovn-controller.log | grep 'tunnel_key 2'], [0],
>     > [ignore-nolog])
>     > +AT_CHECK([cat hv2/ovn-controller.log | grep 'tunnel_key 1'], [1],
>     > [ignore-nolog])
>     > +
>     > +dbg_instances1=`cat hv1/ovn-controller.log | grep 'tunnel_key 1' |
> wc -l`
>     > +echo ls1 monitor traces on hv1 $dbg_instances1
>     > +dbg_instances2=`cat hv1/ovn-controller.log | grep 'tunnel_key 2' |
> wc -l`
>     > +echo ls2 monitor traces on hv1 $dbg_instances2
>     > +AT_CHECK([test $dbg_instances1 -gt $dbg_instances2], [0],
> [ignore-nolog])
>     > +
>     > +dbg_instances1=`cat hv2/ovn-controller.log | grep 'tunnel_key 2' |
> wc -l`
>     > +echo ls2 monitor traces on hv2 $dbg_instances1
>     > +dbg_instances2=`cat hv2/ovn-controller.log | grep 'tunnel_key 1' |
> wc -l`
>     > +echo ls1 monitor traces on hv2 $dbg_instances2
>     > +AT_CHECK([test $dbg_instances1 -gt $dbg_instances2], [0],
> [ignore-nolog])
>     > +
>     > +# Now check the packets actually received against the ones expected.
>     > +for i in 1 2; do
>     > +    for j in 1 2; do
>     > +        OVN_CHECK_PACKETS([hv$i/vif$i$j-tx.pcap], [$i$j.expected])
>     > +    done
>     > +done
>     > +
>     > +OVN_CLEANUP([hv1],[hv2])
>     > +
>     > +AT_CLEANUP
>     > +
>     >  AT_SETUP([ovn -- vtep: 3 HVs, 1 VIFs/HV, 1 GW, 1 LS])
>     >  AT_KEYWORDS([vtep])
>     >  AT_SKIP_IF([test $HAVE_PYTHON = no])
>     > @@ -2439,6 +2574,223 @@ OVN_CLEANUP([hv1], [hv2], [hv3])
>     >
>     >  AT_CLEANUP
>     >
>     > +# 3 hypervisors, 3 logical switches with 3 logical ports each, 1
> logical
>     > router
>     > +AT_SETUP([ovn -- conditional Monitoring: 3 HVs, 3 LS, 3 lports/LS,
> 1 LR])
>     > +AT_SKIP_IF([test $HAVE_PYTHON = no])
>     > +ovn_start
>     > +
>     > +# Logical network:
>     > +#
>     > +# Three logical switches ls1, ls2, ls3.
>     > +# One logical router lr0 connected to ls[123],
>     > +# with nine subnets, three per logical switch:
>     > +#
>     > +#    lrp11 on ls1 for subnet 192.168.11.0/24
>     > +#    lrp12 on ls1 for subnet 192.168.12.0/24
>     > +#    lrp13 on ls1 for subnet 192.168.13.0/24
>
>
>     This is confusing since the code below does not add
>     lrp11, lrp12, lrp13, just the subnets in ls1 which is
>     not attached to lr0.
>
> That description was not cleaned up when the test was updated.
> I fixed it.
>
>     I am having trouble understanding the motivation for
>     3 subnets per ls. It will work, just wondering why you
>     thought that it is necessary or helpful?
>
> I just needed multiple ports. Having multiple subnets allows easier porting
> to create a different test case in future based on this one.
>
>     >
>     > +#    ...
>     > +#    lrp33 on ls3 for subnet 192.168.33.0/24
>     > +#
>     > +# 27 VIFs, 9 per LS, 3 per subnet: lp[123][123][123], where the
> first two
>     > +# digits are the subnet and the last digit distinguishes the VIF.
>     > +for i in 1 2 3; do
>     > +    ovn-nbctl ls-add ls$i
>     > +    for j in 1 2 3; do
>     > +        for k in 1 2 3; do
>     > +            ovn-nbctl \
>     > +                -- lsp-add ls$i lp$i$j$k \
>     > +                -- lsp-set-addresses lp$i$j$k "f0:00:00:00:0$i:$j$k \
>     > +                   192.168.$i$j.$k"
>     > +        done
>     > +    done
>     > +done
>     > +
>     > +# Only ls2 and ls3 are connected by the logical router.
>     > +ovn-nbctl lr-add lr0
>     > +for i in 2 3; do
>     > +    for j in 1 2 3; do
>     > +        ovn-nbctl lrp-add lr0 lrp$i$j 00:00:00:00:ff:$i$j
>     > 192.168.$i$j.254/24
>     > +        ovn-nbctl \
>     > +            -- lsp-add ls$i lrp$i$j-attachment \
>     > +            -- set Logical_Switch_Port lrp$i$j-attachment
> type=router \
>     > +                             options:router-port=lrp$i$j \
>     > +                             addresses='"00:00:00:00:ff:'$i$j'"'
>     > +    done
>     > +done
>     > +
>     > +# Physical network:
>     > +#
>     > +# Three hypervisors hv[123].
>     > +# lp[1??] on hv1
>     > +# lp[2??] on hv2.
>     > +# lp[3??] on hv3.
>     >
>
>     I like this. So ls3 is only present on hv3?
>     If so, this will test the related_datapaths part of the code, making
>     sure that hv2 grabs flows for datapaths lr0 and ls3 even though
>     there are no local VIFs in ls3 on hv2. This would catch the
>     code order northd issue from v9.
>
>     Mickey
>
>     +
>     > +# Given the name of a logical port, prints the name of the
> hypervisor
>     > +# on which it is located.
>     > +vif_to_hv() {
>     > +    case $1 in dnl (
>     > +        1??) echo 1 ;; dnl (
>     > +        2??) echo 2 ;; dnl (
>     > +        3??) echo 3 ;;
>     > +    esac
>     > +}
>     > +
>     > +# Given the name of a logical port, prints the name of its logical
> router
>     > +# port, e.g. "vif_to_lrp 123" yields 12.
>     > +vif_to_lrp() {
>     > +    echo ${1%?}
>     > +}
>     > +
>     > +# Given the name of a logical port, prints the name of its logical
>     > +# switch, e.g. "vif_to_ls 123" yields 1.
>     > +vif_to_ls() {
>     > +    echo ${1%??}
>     > +}
>     > +
>     > +net_add n1
>     > +for i in 1 2 3; do
>     > +    sim_add hv$i
>     > +    as hv$i
>     > +    ovs-vsctl add-br br-phys
>     > +    ovn_attach n1 br-phys 192.168.0.$i
>     > +    ovs-appctl -t ovn-controller vlog/set ANY:file:DBG
>     > +done
>     > +for i in 1 2 3; do
>     > +    for j in 1 2 3; do
>     > +        for k in 1 2 3; do
>     > +            hv=`vif_to_hv $i$j$k`
>     > +                as hv$hv ovs-vsctl \
>     > +                -- add-port br-int vif$i$j$k \
>     > +                -- set Interface vif$i$j$k \
>     > +                    external-ids:iface-id=lp$i$j$k \
>     > +                    options:tx_pcap=hv$hv/vif$i$j$k-tx.pcap \
>     > +                    options:rxq_pcap=hv$hv/vif$i$j$k-rx.pcap \
>     > +                    ofport-request=$i$j$k
>     > +        done
>     > +    done
>     > +done
>     > +
>     > +# Pre-populate the hypervisors' ARP tables so that we don't lose any
>     > +# packets for ARP resolution (native tunneling doesn't queue packets
>     > +# for ARP resolution).
>     > +ovn_populate_arp
>     > +
>     > +# Allow some time for ovn-northd and ovn-controller to catch up.
>     > +# XXX This should be more systematic.
>     > +sleep 1
>     > +
>     > +# test_ip INPORT SRC_MAC DST_MAC SRC_IP DST_IP OUTPORT...
>     > +#
>     > +# This shell function causes a packet to be received on INPORT.  The
>     > packet's
>     > +# content has Ethernet destination DST and source SRC (each exactly
> 12 hex
>     > +# digits) and Ethernet type ETHTYPE (4 hex digits).  The OUTPORTs
> (zero or
>     > +# more) list the VIFs on which the packet should be received.
> INPORT and
>     > the
>     > +# OUTPORTs are specified as logical switch port numbers, e.g. 123
> for
>     > vif123.
>     > +for i in 1 2 3; do
>     > +    for j in 1 2 3; do
>     > +        for k in 1 2 3; do
>     > +            : > $i$j$k.expected
>     > +        done
>     > +    done
>     > +done
>     > +test_ip() {
>     > +    # This packet has bad checksums but logical L3 routing doesn't
> check.
>     > +    local inport=$1 src_mac=$2 dst_mac=$3 src_ip=$4 dst_ip=$5
>     > +    local packet=${dst_mac}${src_mac}08004500001c0000000040110000${src_ip}${dst_ip}0035111100080000
>     > +    shift; shift; shift; shift; shift
>     > +    hv=hv`vif_to_hv $inport`
>     > +    as $hv ovs-appctl netdev-dummy/receive vif$inport $packet
>     > +    in_ls=`vif_to_ls $inport`
>     > +    in_lrp=`vif_to_lrp $inport`
>     > +    for outport; do
>     > +        out_ls=`vif_to_ls $outport`
>     > +        if test $in_ls = $out_ls; then
>     > +            # Ports on the same logical switch receive exactly the
> same
>     > packet.
>     > +            echo $packet
>     > +        else
>     > +            # Routing decrements TTL and updates source and dest MAC
>     > +            # (and checksum).
>     > +            out_lrp=`vif_to_lrp $outport`
>     > +            echo f00000000${outport}00000000ff${out_lrp}08004500001c00000000"3f1101"00${src_ip}${dst_ip}0035111100080000
>     > +        fi >> $outport.expected
>     > +    done
>     > +}
>     > +
>     > +as hv1 ovn-sbctl list datapath_binding
>     > +echo "----------------------------"
>     > +as hv1 ovn-sbctl list port_binding
>     > +echo "----------------------------"
>     > +as hv1 ovn-sbctl dump-flows
>     > +#as hv1 ovs-vsctl --columns=name,ofport list interface
>     > +#as hv1 ovs-ofctl dump-flows br-int
>     > +
>     > +# Send IP packets between all pairs of source and destination ports:
>     > +#
>     > +# 1. Unicast IP packets are delivered to exactly one logical switch
> port
>     > +#    (except that packets destined to their input ports are
> dropped).
>     > +#
>     > +ip_to_hex() {
>     > +    printf "%02x%02x%02x%02x" "$@"
>     > +}
>     > +for is in 1 2 3; do
>     > +  for js in 1 2 3; do
>     > +    for ks in 1 2 3; do
>     > +      s=$is$js$ks
>     > +      smac=f00000000$s
>     > +      sip=`ip_to_hex 192 168 $is$js $ks`
>     > +      for id in 1 2 3; do
>     > +          for jd in 1 2 3; do
>     > +              for kd in 1 2 3; do
>     > +                d=$id$jd$kd
>     > +                dip=`ip_to_hex 192 168 $id$jd $kd`
>     > +                if test $is = $id; then dmac=f00000000$d; else
>     > dmac=00000000ff$is$js; fi
>     > +                if test $d != $s; then unicast=$d; else unicast=; fi
>     > +                if test $is = 1; then if test $is != $id; then
> unicast=;
>     > fi; fi
>     > +                if test $id = 1; then if test $is != $id; then
> unicast=;
>     > fi; fi
>     > +                test_ip $s $smac $dmac $sip $dip $unicast
>     > +              done
>     > +          done
>     > +        done
>     > +      done
>     > +  done
>     > +done
>     > +
>     > +# Allow some time for packet forwarding.
>     > +# XXX This can be improved.
>     > +sleep 1
>     > +
>     > +ovn-sbctl -f csv -d bare --no-heading \
>     > +    -- --columns=logical_port,ip,mac list mac_binding > mac_bindings
>     > +
>     > +# Now check the packets actually received against the ones expected.
>     > +for i in 1 2 3; do
>     > +    for j in 1 2 3; do
>     > +        for k in 1 2 3; do
>     > +            OVN_CHECK_PACKETS([hv`vif_to_hv $i$j$k`/vif$i$j$k-tx.pcap],
>     > +                              [$i$j$k.expected])
>     > +        done
>     > +    done
>     > +done
>     > +
>     > +# Check that all datapath components are monitored as expected.
>     > +AT_CHECK([cat hv1/ovn-controller.log | grep 'tunnel_key 1'], [0],
>     > [ignore-nolog])
>     > +AT_CHECK([cat hv1/ovn-controller.log | grep 'tunnel_key 2'], [1],
>     > [ignore-nolog])
>     > +AT_CHECK([cat hv1/ovn-controller.log | grep 'tunnel_key 3'], [1],
>     > [ignore-nolog])
>     > +AT_CHECK([cat hv1/ovn-controller.log | grep 'tunnel_key 4'], [1],
>     > [ignore-nolog])
>     > +AT_CHECK([cat hv2/ovn-controller.log | grep 'tunnel_key 1'], [1],
>     > [ignore-nolog])
>     > +AT_CHECK([cat hv2/ovn-controller.log | grep 'tunnel_key 2'], [0],
>     > [ignore-nolog])
>     > +AT_CHECK([cat hv2/ovn-controller.log | grep 'tunnel_key 3'], [0],
>     > [ignore-nolog])
>     > +AT_CHECK([cat hv2/ovn-controller.log | grep 'tunnel_key 4'], [0],
>     > [ignore-nolog])
>     > +AT_CHECK([cat hv3/ovn-controller.log | grep 'tunnel_key 1'], [1],
>     > [ignore-nolog])
>     > +AT_CHECK([cat hv3/ovn-controller.log | grep 'tunnel_key 2'], [0],
>     > [ignore-nolog])
>     > +AT_CHECK([cat hv3/ovn-controller.log | grep 'tunnel_key 3'], [0],
>     > [ignore-nolog])
>     > +AT_CHECK([cat hv3/ovn-controller.log | grep 'tunnel_key 4'], [0],
>     > [ignore-nolog])
>     > +
>     > +# Gracefully terminate daemons
>     > +OVN_CLEANUP([hv1], [hv2], [hv3])
>     > +
>     > +AT_CLEANUP
>     > +
>     >  # 3 hypervisors, one logical switch, 3 logical ports per hypervisor
>     >  AT_SETUP([ovn -- portsecurity : 3 HVs, 1 LS, 3 lports/HV])
>     >  AT_SKIP_IF([test $HAVE_PYTHON = no])
>     > @@ -2983,6 +3335,7 @@ ovs-vsctl -- add-port br-int vif2 -- \
>     >      options:tx_pcap=hv1/vif2-tx.pcap \
>     >      options:rxq_pcap=hv1/vif2-rx.pcap \
>     >      ofport-request=1
>     > +ovs-appctl -t ovn-controller vlog/set ANY:file:DBG
>     >
>     >
>     >  # Allow some time for ovn-northd and ovn-controller to catch up.
>     > @@ -3014,10 +3367,13 @@ echo "---------------------"
>     >  echo "---------SB dump-----"
>     >  ovn-sbctl list datapath_binding
>     >  echo "---------------------"
>     > +ovn-sbctl list port_binding
>     > +echo "---------------------"
>     >  ovn-sbctl list logical_flow
>     >  echo "---------------------"
>     >
>     >  echo "------ hv1 dump ----------"
>     > +as hv1 ovs-ofctl show br-int
>     >  as hv1 ovs-ofctl dump-flows br-int
>     >
>     >
>     > @@ -3031,10 +3387,13 @@ sleep 1
>     >  echo "---------SB dump-----"
>     >  ovn-sbctl list datapath_binding
>     >  echo "---------------------"
>     > +ovn-sbctl list port_binding
>     > +echo "---------------------"
>     >  ovn-sbctl list logical_flow
>     >  echo "---------------------"
>     >
>     >  echo "------ hv1 dump ----------"
>     > +as hv1 ovs-ofctl show br-int
>     >  as hv1 ovs-ofctl dump-flows br-int
>     >
>     >  as hv1 ovs-appctl netdev-dummy/receive vif1 $packet
>     > @@ -4109,6 +4468,7 @@ ovs-vsctl -- add-port br-int hv1-vif1 -- \
>     >      options:tx_pcap=hv1/vif1-tx.pcap \
>     >      options:rxq_pcap=hv1/vif1-rx.pcap \
>     >      ofport-request=1
>     > +ovs-appctl -t ovn-controller vlog/set ANY:file:DBG
>     >
>     >
>     >  sim_add hv2
>     > @@ -4120,6 +4480,7 @@ ovs-vsctl -- add-port br-int hv2-vif1 -- \
>     >      options:tx_pcap=hv2/vif1-tx.pcap \
>     >      options:rxq_pcap=hv2/vif1-rx.pcap \
>     >      ofport-request=1
>     > +ovs-appctl -t ovn-controller vlog/set ANY:file:DBG
>     >
>     >  # Pre-populate the hypervisors' ARP tables so that we don't lose any
>     >  # packets for ARP resolution (native tunneling doesn't queue packets
>     > @@ -4213,7 +4574,6 @@ src_ip=`ip_to_hex 192 168 1 2`
>     >  dst_ip=`ip_to_hex 172 16 1 2`
>     >  expected=${dst_mac}${src_mac}08004500001c000000003e110200${src_ip}${dst_ip}0035111100080000
>     >
>     > -
>     >  as hv1 ovs-appctl netdev-dummy/receive hv1-vif1 $packet
>     >  as hv1 ovs-appctl ofproto/trace br-int in_port=1 $packet
>     >
>     > @@ -4225,6 +4585,15 @@ as hv2 ovs-ofctl show br-int
>     >  as hv2 ovs-ofctl dump-flows br-int
>     >  echo "----------------------------"
>     >
>     > +# tunnel key 2 represents the gateway router, hence hv1 does not
> need
>     > +# to know about its ports and flows, since the transit switch acts
> as
>     > proxy.
>     > +AT_CHECK([cat hv2/ovn-controller.log | grep 'tunnel_key 2'], [0],
>     > [ignore-nolog])
>     > +AT_CHECK([cat hv1/ovn-controller.log | grep 'tunnel_key 2'], [1],
>     > [ignore-nolog])
>     > +# tunnel key 4 represents a datapath (alice) on the far side of the
>     > gateway from
>     > +# the perspective of hv1, hence hv1 does not need to know about its
> ports
>     > and flows.
>     > +AT_CHECK([cat hv2/ovn-controller.log | grep 'tunnel_key 4'], [0],
>     > [ignore-nolog])
>     > +AT_CHECK([cat hv1/ovn-controller.log | grep 'tunnel_key 4'], [1],
>     > [ignore-nolog])
>     > +
>     >  echo $expected > expected
>     >  OVN_CHECK_PACKETS([hv2/vif1-tx.pcap], [expected])
>     >
>     > --
>     > 1.9.1
>     >
>     > _______________________________________________
>     > dev mailing list
>     > dev@openvswitch.org
>     > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>     >
>     _______________________________________________
>     dev mailing list
>     dev@openvswitch.org
>     https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
Darrell Ball Dec. 9, 2016, 5:13 p.m. UTC | #15
From: Mickey Spiegel <mickeys.dev@gmail.com>

Date: Wednesday, December 7, 2016 at 3:50 PM
To: Darrell Ball <dball@vmware.com>
Cc: Darrell Ball <dlu998@gmail.com>, ovs dev <dev@openvswitch.org>
Subject: Re: [ovs-dev] [patch_v10] ovn: Add datapaths of interest filtering.


On Tue, Dec 6, 2016 at 7:53 PM, Darrell Ball <dball@vmware.com> wrote:


On 12/4/16, 9:48 PM, "ovs-dev-bounces@openvswitch.org on behalf of Mickey Spiegel" <ovs-dev-bounces@openvswitch.org on behalf of mickeys.dev@gmail.com> wrote:

    On Sun, Dec 4, 2016 at 4:13 PM, Darrell Ball <dlu998@gmail.com> wrote:

    > This patch adds datapaths of interest support where only datapaths of

    > local interest are monitored by the ovn-controller ovsdb client.  The

    > idea is to do a flood fill in ovn-controller of datapath associations

    > calculated by northd. A new column is added to the SB database

    > datapath_binding table - related_datapaths to facilitate this so all

    > datapaths associations are known quickly in ovn-controller.  This

    > allows monitoring to adapt quickly with a single new monitor setting

    > for all datapaths of interest locally.

    >


    This is getting close. I have a few questions and comments inline.

    >

    > We monitor both logical flows and logical ports. The mac bindings

    > will likely go away and multicast groups are relatively few.

    > To reduce risk and overall latency we only suppress logical flows as

    > a starting point. Logical ports are not suppressed as a starting

    > point and but then are monitor suppressed as appropriate from that

    > point on. The initial suppression has little practical gain and not

    > doing it allows for us to be flexible in terms of how different port

    > types are supported.

    >


    I lack familiarity with conditional monitoring behavior when you do
    not explicitly suppress as a starting point. A new hv comes up, and
    starts downloading stuff from SB, including port_bindings. At some
    point the first VIF on that hv comes up, as a result of which the hv's
    controller calls sbrec_port_binding_add_clause_logical_port.
    Do all other port_bindings get suppressed at that point?
    If so, won't this lead to different behavior between the first time a
    VIF comes up on a hv, and subsequent additions of VIFs on
    unrelated and previously unseen datapaths?
    Wouldn't the behavior be more consistent if you suppress from the
    beginning?
I commented on this before; I wanted to defer this part until later, but fine, let us address it now.
The issue is l2gateway logical ports, which may legitimately exist as the only port in an LS datapath
on a given HV.
L2gateway LPs are only known on an HV once the LP's l2gateway-chassis option is downloaded to that HV.
If we want initial suppression of LPs, then we need to do something special for l2gateway ports.

This is a valid point that I missed.

For now, monitor all l2gateway ports unconditionally.
An l2gateway is identified by a logical port and a chassis, not by a datapath as in the l3gateway case.
(Possibly later, a physical HV-level external_id could be used to indicate the presence of the l2gateway
logical port, similarly to VIFs.)
The number of l2gateway ports could be on the order of the number of tenants, at least.
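
If initial suppression of port_bindings is added along these lines, the
startup clauses could look roughly like the following. This is only a
sketch: sbrec_port_binding_add_clause_false() and
sbrec_port_binding_add_clause_type() are assumed to exist by analogy
with the generated per-table/per-column helpers already used in this
patch; neither appears in the patch itself.

    /* Suppress everything at startup, mirroring the logical_flow
     * handling, but always monitor l2gateway ports since their
     * l2gateway-chassis option is only learned from the port_binding
     * row itself. */
    sbrec_logical_flow_add_clause_false(ovnsb_idl_loop.idl);
    sbrec_port_binding_add_clause_false(ovnsb_idl_loop.idl);
    sbrec_port_binding_add_clause_type(ovnsb_idl_loop.idl, OVSDB_F_EQ,
                                       "l2gateway");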

The code that you added for l3gateway looks good.

If you do not take care of optimizing l2gateway, I will take a look
later. In my distributed NAT patch set, I am adding another
port_binding type on distributed routers that will be bound to a
chassis. Similar to l2gateway, this new port_binding type should
trigger a clause based on that datapath, and unlike l3gateway it
does not affect the flood fill calculation. To handle both of these,
a new datapath option "chassis-of-interest" will need to be
added.
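
For illustration, the datapath-side check could mirror
remote_l3gat_lrd_port() from this patch. The option name and the
function below are only my proposal, not existing code:

    static bool
    chassis_of_interest_matches(const struct sbrec_datapath_binding *dp,
                                const struct sbrec_chassis *chassis_rec)
    {
        /* A datapath can ask to be monitored on a specific chassis
         * (e.g. for l2gateway ports or a chassis-bound port on a
         * distributed router) without influencing the flood fill the
         * way l3gateway-chassis does. */
        const char *chassis = smap_get(&dp->options, "chassis-of-interest");
        return chassis && !strcmp(chassis, chassis_rec->name);
    }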

I will be vetting the proposal for l2gateways I mentioned.


    >

    > To validate the gain, we take the approach to verify the l3gateway

    > router flow suppression from the non-l3gateway HVs POV and also add

    > a separate test to verify the flow and port handling scale advantages.

    > The separate test added verifies the logical port and logical flow

    > suppression advantage. This measures the difference in flows and ports

    > sent to the HV clients where the far majority of the overall work is.

    > The less ports and flows that are unnecessarily processed on the HVs,

    > the more benefit, excluding latency skew. The separate test is

    > simple for interpretation and shows an order of magnitude advantage;

    > the test uses only logical switches. The advantage would increase with

    > logical routers. The test is conservative in estimating the numerical

    > advantage to increase the probability of the check passing the first

    > time, since there are some timing conssiderations that affect the numbers.

    >


    This description should be updated to mention the distributed
    logical router tests that you added.


sure

    >

    > Liran Schour contributed enhancements to the conditional

    > monitoring granularity in ovs and also submitted patches

    > for ovn usage of conditional monitoring which aided this patch

    > though sharing of concepts through code review work.

    >

    > Ben Pfaff suggested that northd could be used to pre-populate

    > related datapaths for ovn-controller to use.  That idea is

    > used as part of this patch.

    >

    > CC: Ben Pfaff <blp@ovn.org<mailto:blp@ovn.org>>

    > CC: Liran Schour <LIRANS@il.ibm.com<mailto:LIRANS@il.ibm.com>>

    > Signed-off-by: Darrell Ball <dlu998@gmail.com<mailto:dlu998@gmail.com>>

    > ---

    >

    > v9->v10: Fix ordering issue on northd as pointed out by Mickey.

    >

    >          Remove patch flood fill triggering optimization to

    >          make the code easier to verify and harder to break.

    >

    >          Fix a couple issues on ovn-controller side, including

    >          stale checking.

    >

    >          Add lots more focused testing - now 2 separate extensive

    >          tests and beefed up l3gateway test.

    >

    >          Remove dependency on port filtering for l3gateways filtering

    >          and associate l3gateway chassis name as part of datapath,

    >          similarly to logical ports. Some agreement on adding this

    >          extra field to the datapath binding table.

    >


    I certainly agree. It is good to have the filtering depend only
    on the contents of datapath_binding, with no dependencies
    on port_bindings. This way there is no issue with ordering
    dependencies when filtering port_bindings.

    >

    > v8->v9: Reinstate logical port monitoring from patch V3

    >         now to handle some LP high influence topologies, especially

    >         in lieu of a lack of incremental processing right now.

    >         The original plan was to forgo monitoring logical ports

    >         in the first commit, but without incremental processing,

    >         port processing gains importance.

    >

    >         Parse out some functions

    >

    >         Add a separate test for conditional monitoring to

    >         guesstimate the reduction in logical port and flow events

    >         seen at the HVs.

    >

    > v7->v8: Start with logical flow conditional monitoring off.

    >         Remove deprecated code.

    >         Recover SB DB version number change.

    >         Change test to be more definitive.

    >

    > v6->v7: Rebase

    >

    > v5->v6: Rebase; fix stale handling.

    >

    > v4->v5: Correct cleanup of monitors.

    >         Fix warning.

    >

    > v3->v4: Refactor after incremental processing backout.

    >         Limit filtering to logical flows to limit risk.

    >

    > v2->v3: Line length violation fixups :/

    >

    > v1->v2: Added logical port removal monitoring handling, factoring

    >         in recent changes for incremental processing.  Added some

    >         comments in the code regarding initial conditions for

    >         database conditional monitoring.

    >

    >  ovn/controller/binding.c        |  32 +++-

    >  ovn/controller/lflow.c          |   5 +

    >  ovn/controller/ovn-controller.c | 188 ++++++++++++++++++++

    >  ovn/controller/ovn-controller.h |  10 ++

    >  ovn/controller/patch.c          |  24 +++

    >  ovn/northd/ovn-northd.c         | 121 +++++++++++++

    >  ovn/ovn-sb.ovsschema            |  16 +-

    >  ovn/ovn-sb.xml                  |  13 ++

    >  tests/ovn.at                    | 371 ++++++++++++++++++++++++++++++

    > +++++++++-

    >  9 files changed, 770 insertions(+), 10 deletions(-)

    >

    > diff --git a/ovn/controller/binding.c b/ovn/controller/binding.c

    > index fb76032..f24e2e6 100644

    > --- a/ovn/controller/binding.c

    > +++ b/ovn/controller/binding.c

    > @@ -68,7 +68,8 @@ static void

    >  get_local_iface_ids(const struct ovsrec_bridge *br_int,

    >                      struct shash *lport_to_iface,

    >                      struct sset *all_lports,

    > -                    struct sset *egress_ifaces)

    > +                    struct sset *egress_ifaces,

    > +                    const struct controller_ctx *ctx)

    >  {

    >      int i;

    >

    > @@ -88,6 +89,9 @@ get_local_iface_ids(const struct ovsrec_bridge *br_int,

    >              iface_id = smap_get(&iface_rec->external_ids, "iface-id");

    >

    >              if (iface_id) {

    > +                sbrec_port_binding_add_clause_

    > logical_port(ctx->ovnsb_idl,

    > +                                                           OVSDB_F_EQ,

    > +                                                           iface_id);

    >                  shash_add(lport_to_iface, iface_id, iface_rec);

    >                  sset_add(all_lports, iface_id);

    >              }

    > @@ -106,8 +110,13 @@ get_local_iface_ids(const struct ovsrec_bridge

    > *br_int,

    >

    >  static void

    >  add_local_datapath(struct hmap *local_datapaths,

    > -        const struct sbrec_port_binding *binding_rec)

    > +        const struct sbrec_port_binding *binding_rec,

    > +        const struct controller_ctx *ctx,

    > +        const struct sbrec_chassis *chassis_rec)

    >  {

    > +

    > +    do_datapath_flood_fill(ctx, binding_rec->datapath, chassis_rec);

    > +

    >


    In previous versions, this was at the end of add_local_datapath.
    In this version you moved it to the beginning.
    Is there an advantage in running flood fill when the second, third,
    fourth, etc ports are added for the same datapath?
    It does add processing.
All tests pass either way.

I moved it here because, conceptually, I don't want the local_datapaths check preventing
the datapaths-of-interest check from running;
i.e. I don't want one feature short-circuiting another.

If you take a look at Ben's patch set, specifically patch 6, his
expanded notion of local_datapaths is very close to your
dps_of_interest. One difference is that local_datapaths is
recalculated with every pass through the ovn-controller.c
main loop, whereas dps_of_interest is persistent in order to
track what conditional monitoring datapath clauses have
been added.

From my comments on Ben's patch 6:

Both this and datapaths of interest trigger a computation from
add_local_datapath, to find other datapaths that can be reached on
the local hypervisor. At a high level, the difference is that this
computation is based on patch ports from a datapath, while the
datapaths of interest computation is based on a list of neighbor
datapaths that is populated by northd into each datapath_binding.
Note that northd populates related_datapaths based on patch
ports, so in the end it is the same source of information, but
different parts of the computation are done in different places at
different times.

related_datapaths allows a HV to request all the ports and flows it needs
in one pass, but only those ports and flows.

Moving on to the specifics of this issue:
local_datapaths is reinitialized with each pass through the
ovn-controller.c main loop. For any one pass through the
main loop, i.e. one pass through binding_run, the first time
add_local_datapath is called for any port on a specific
datapath, the if statement will be evaluated as false.
Regardless of where the call to do_datapath_flood_fill is
placed, it will be triggered. The flood fill algorithm will go
over all reachable datapaths, considering all
related_datapaths from each reachable datapath.

The call to do_datapath_flood_fill() was placed behind the call to
get_local_datapath() in the earlier patches strictly for performance
reasons, because do_datapath_flood_fill() would then only see each
starting local datapath once per binding_run().
The existing usage of local_datapaths fit that purpose, and it would
have looked strange to create my own map when local_datapaths was already there.

I am not sure what the next version of incremental processing will look
like, so I wanted to eliminate that and other future dependencies. That is
the reason for the move. For the purposes of this patch,
I can leave the call where I had it before. There is no disadvantage in the
short term and there is the performance advantage I explained above.
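
For reference, a rough sketch of the two placements being discussed, using the names from this
patch; the body of add_local_datapath() is simplified and this is not the exact patch code:

    static void
    add_local_datapath_sketch(struct hmap *local_datapaths,
                              const struct sbrec_port_binding *binding_rec,
                              const struct controller_ctx *ctx,
                              const struct sbrec_chassis *chassis_rec)
    {
        /* Placement (a), as in v10: flood fill runs on every call, so the
         * local_datapaths check cannot short-circuit it. */
        do_datapath_flood_fill(ctx, binding_rec->datapath, chassis_rec);

        if (get_local_datapath(local_datapaths,
                               binding_rec->datapath->tunnel_key)) {
            return;
        }

        /* Placement (b), as in earlier versions: the call would sit here,
         * so it runs only the first time a datapath is seen per
         * binding_run() pass. */

        /* ... rest of add_local_datapath() ... */
    }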


Within the same pass through binding_run, assume that
add_local_datapath is called for a second port on the
same datapath. If you call do_datapath_flood_fill again,
the only way that you will get a different answer is if there
has been a change that added or removed a reachable
datapath_binding, or modified the related_datapaths list
on a reachable datapath_binding.


Trying to catch those kinds of changes seems to me to
be different than the way OVN normally behaves. For
example, when a new datapath, ports, and corresponding
flows are added, if the ports are not seen by the time
binding_run completes, then the corresponding logical
flows will not be processed until the next pass through the
ovn-controller.c main loop, since they will fail the
local_datapaths check at the beginning of
consider_logical_flow.

See answer above



    >      if (get_local_datapath(local_datapaths,

    >                             binding_rec->datapath->tunnel_key)) {

    >          return;

    > @@ -293,7 +302,7 @@ consider_local_datapath(struct controller_ctx *ctx,

    >              /* Add child logical port to the set of all local ports. */

    >              sset_add(all_lports, binding_rec->logical_port);

    >          }

    > -        add_local_datapath(local_datapaths, binding_rec);

    > +        add_local_datapath(local_datapaths, binding_rec, ctx,

    > chassis_rec);

    >          if (iface_rec && qos_map && ctx->ovs_idl_txn) {

    >              get_qos_params(binding_rec, qos_map);

    >          }

    > @@ -328,7 +337,7 @@ consider_local_datapath(struct controller_ctx *ctx,

    >          }

    >

    >          sset_add(all_lports, binding_rec->logical_port);

    > -        add_local_datapath(local_datapaths, binding_rec);

    > +        add_local_datapath(local_datapaths, binding_rec, ctx,

    > chassis_rec);

    >          if (binding_rec->chassis == chassis_rec) {

    >              return;

    >          }

    > @@ -342,7 +351,7 @@ consider_local_datapath(struct controller_ctx *ctx,

    >          const char *chassis = smap_get(&binding_rec->options,

    >                                         "l3gateway-chassis");

    >          if (!strcmp(chassis, chassis_rec->name) && ctx->ovnsb_idl_txn) {

    > -            add_local_datapath(local_datapaths, binding_rec);

    > +            add_local_datapath(local_datapaths, binding_rec, ctx,

    > chassis_rec);

    >          }

    >      } else if (chassis_rec && binding_rec->chassis == chassis_rec) {

    >          if (ctx->ovnsb_idl_txn) {

    > @@ -351,6 +360,9 @@ consider_local_datapath(struct controller_ctx *ctx,

    >              for (int i = 0; i < binding_rec->n_mac; i++) {

    >                  VLOG_INFO("Releasing %s", binding_rec->mac[i]);

    >              }

    > +            sbrec_port_binding_remove_clause_logical_port(

    > +                ctx->ovnsb_idl, OVSDB_F_EQ, binding_rec->logical_port);

    > +

    >              sbrec_port_binding_set_chassis(binding_rec, NULL);

    >              sset_find_and_delete(all_lports, binding_rec->logical_port);

    >          }

    > @@ -381,13 +393,21 @@ binding_run(struct controller_ctx *ctx, const struct

    > ovsrec_bridge *br_int,

    >      hmap_init(&qos_map);

    >      if (br_int) {

    >          get_local_iface_ids(br_int, &lport_to_iface, all_lports,

    > -                            &egress_ifaces);

    > +                            &egress_ifaces, ctx);

    >      }

    >

    > +    set_all_datapaths_of_interest_stale();

    > +

    >      /* Run through each binding record to see if it is resident on this

    >       * chassis and update the binding accordingly.  This includes both

    >       * directly connected logical ports and children of those ports. */

    >      SBREC_PORT_BINDING_FOR_EACH(binding_rec, ctx->ovnsb_idl) {

    > +

    > +        /* Set any datapath, used by any port and that we are monitoring

    > as used,

    >


    This line is too long.
fixed


    > +         * so we don't remove the monitoring.

    > +         */

    > +        set_datapath_of_interest_used(binding_rec->datapath);

    > +

    >


    This seems too broad. This would set the datapath as used if
    you have received any SB port_bindings for that datapath, even
    if those port_bindings are not relevant to this chassis.

It is not broad; it checks the datapaths-of-interest map and, if a datapath is already monitored,
marks it as not stale; otherwise it changes nothing.

    How would datapaths ever be removed?

Datapaths get removed when no logical port refers to them. That is the definition of a stale
datapath, so I know it has no dependencies.

Assume there is a datapath 1 and logical ports A and B on that
datapath, neither one on this chassis. Now add vif C on this
chassis, in datapath 1. As a result of the addition of the vif,
add_local_datapaths is eventually called, and datapath 1 is
added to dps_of_interest. This results in SB port_bindings
A, B, and C being received.

Now remove vif C. You still have SB port_bindings for A and B.
Won't those trigger this call to set_datapath_of_interest_used?
Datapath 1 will remain in dps_of_interest even though there
is no reachable path from anything resident on this chassis to
datapath 1.

Here is the real reason.
set_datapath_of_interest_used() was added in V10 because there was an
outstanding database interaction issue that I was working on and did not yet understand;
in the interim I wanted to err on the side of caution for monitor cleanup so the patch
remained viable.
Now that the issue is understood, set_datapath_of_interest_used() is no longer needed.



    Why isn't the stale flag processing in add_datapath_of_interest
    enough?
    Was there some corner case that was missed?

I think the code in consider_local_datapath() can filter out calls to
add_local_datapath(); one example is when we don't have a transaction:

        if (!strcmp(chassis, chassis_rec->name) && ctx->ovnsb_idl_txn) {

I can think of no good reason for the inclusion of ctx->ovnsb_idl_txn in
this if statement. There is no corresponding check of ctx->ovnsb_idl_txn
for vifs or l2gateway ports, only for l3gateway ports. I think you found a
small bug.

I agree, but my point was a little more general; there are other such cases,
and more will come in the future. But see the answer above.




I don’t want to have a dependency on something that should be irrelevant.

The calls to add_local_datapath are what trigger addition of
datapaths to dps_of_interest through do_datapath_flood_fill.
It seems a lot cleaner to depend on the same code path for
removal of datapaths from dps_of_interest. If there are
problems in that code path that leave things
non-deterministic, isn't it better to try and fix those?
I feel that the addition of a separate way to set stale=false
raises the possibility of lots of different code paths and
combinations, making it difficult to debug in a deterministic
manner.

See the response above.
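
To summarize the stale-flag lifecycle that this exchange converges on, here is a sketch of the
intended per-iteration sequence in the ovn-controller main loop, using the function names from
this patch with set_datapath_of_interest_used() removed (the surrounding main-loop code is elided):

    /* Sketch of one pass through the ovn-controller main loop. */
    static void
    datapath_monitor_pass_sketch(struct controller_ctx *ctx)
    {
        /* 1. Assume every currently monitored datapath is unused. */
        set_all_datapaths_of_interest_stale();

        /* 2. binding_run() -> add_local_datapath() -> do_datapath_flood_fill()
         *    -> add_datapath_of_interest(): every datapath reachable from a
         *    port resident on this chassis either gets its stale flag cleared
         *    or is added along with its logical_flow and port_binding clauses. */

        /* 3. Whatever is still stale is unreachable from this chassis;
         *    remove its monitor clauses and drop it from dps_of_interest. */
        clean_stale_datapaths_of_interest(ctx);
    }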




             consider_local_datapath(ctx, chassis_rec, binding_rec,
    >                                  sset_is_empty(&egress_ifaces) ? NULL :

    >                                  &qos_map, local_datapaths,

    > &lport_to_iface,

    > diff --git a/ovn/controller/lflow.c b/ovn/controller/lflow.c

    > index 4e67365..765eae4 100644

    > --- a/ovn/controller/lflow.c

    > +++ b/ovn/controller/lflow.c

    > @@ -167,6 +167,11 @@ consider_logical_flow(const struct lport_index

    > *lports,

    >      if (!ldp) {

    >          return;

    >      }

    > +

    > +    VLOG_DBG("consider logical flow: tunnel_key %lu "

    > +             "\"match\" \"%s\", \"actions\" \"%s\"",

    > +             ldp->tunnel_key, lflow->match, lflow->actions);

    > +

    >      if (is_switch(ldp)) {

    >          /* For a logical switch datapath, local_datapaths tells us if

    > there

    >           * are any local ports for this datapath.  If not, we can skip

    > diff --git a/ovn/controller/ovn-controller.c b/ovn/controller/ovn-

    > controller.c

    > index fea7841..c06bd81 100644

    > --- a/ovn/controller/ovn-controller.c

    > +++ b/ovn/controller/ovn-controller.c

    > @@ -69,6 +69,14 @@ OVS_NO_RETURN static void usage(void);

    >

    >  static char *ovs_remote;

    >

    > +struct dp_of_interest_node {

    > +    struct hmap_node hmap_node;

    > +    const struct sbrec_datapath_binding *sb_db;

    > +    bool stale;

    > +};

    > +

    > +static struct hmap dps_of_interest = HMAP_INITIALIZER(&dps_of_interest);

    > +

    >  struct local_datapath *

    >  get_local_datapath(const struct hmap *local_datapaths, uint32_t

    > tunnel_key)

    >  {

    > @@ -128,6 +136,181 @@ get_bridge(struct ovsdb_idl *ovs_idl, const char

    > *br_name)

    >      return NULL;

    >  }

    >

    > +static bool

    > +remote_l3gat_lrd_port(const struct sbrec_datapath_binding *dp,

    > +                             const struct sbrec_chassis *chassis_rec)

    > +{

    > +    const char *l3gateway_chassis = smap_get(&dp->options,

    > +                                             "l3gateway-chassis");

    > +    if(l3gateway_chassis){

    > +        if(strcmp(l3gateway_chassis, chassis_rec->name)){

    > +            return true;

    > +        }

    > +    }

    > +    return false;

    > +}

    > +

    > +void

    > +set_all_datapaths_of_interest_stale(void)

    > +{

    > +    struct dp_of_interest_node *node;

    > +    HMAP_FOR_EACH(node, hmap_node, &dps_of_interest) {

    > +            node->stale = true;

    > +    }

    > +}

    > +

    > +static void

    > +remove_datapath_of_interest(struct controller_ctx *ctx,

    > +                            const struct sbrec_datapath_binding *dp)

    >


    The only call to remove_datapath_of_interest is from
    clean_stale_datapaths_of_interest.
    Why not pass in dp_cur_node rather than dp, and avoid
    the hash lookup?
    Or just put all the logic in
    clean_stale_datapaths_of_interest?
Actually, remove_datapath_of_interest() is left over from an earlier stage of the code,
when it had an external caller as well.
It is no longer needed; it is gone now.

The new code looks good.


    > +{

    > +

    > +    struct dp_of_interest_node *node;

    > +    HMAP_FOR_EACH_WITH_HASH (node, hmap_node,

    > uuid_hash(&dp->header_.uuid),

    > +                             &dps_of_interest) {

    > +        if (ctx && uuid_equals(&node->sb_db->header_.uuid,

    > &dp->header_.uuid)) {

    > +

    > +            sbrec_port_binding_remove_clause_datapath(ctx->ovnsb_idl,

    > +                                                      OVSDB_F_EQ,

    > +                                                      &dp->header_.uuid);

    > +            sbrec_logical_flow_remove_clause_logical_datapath(

    > +                ctx->ovnsb_idl, OVSDB_F_EQ, &dp->header_.uuid);

    > +            hmap_remove(&dps_of_interest, &node->hmap_node);

    > +            free(node);

    > +            return;

    > +        }

    > +    }

    > +}

    > +

    > +static void

    > +clean_stale_datapaths_of_interest(struct controller_ctx *ctx)

    > +{

    > +    struct dp_of_interest_node *dp_cur_node, *dp_next_node;

    > +    HMAP_FOR_EACH_SAFE (dp_cur_node, dp_next_node, hmap_node,

    > +                        &dps_of_interest) {

    > +        if(dp_cur_node->stale == true) {

    > +            remove_datapath_of_interest(ctx, dp_cur_node->sb_db);

    > +        }

    > +    }

    > +}

    > +

    > +void

    > +set_datapath_of_interest_used(const struct sbrec_datapath_binding *dp)

    > +{

    > +

    > +    struct dp_of_interest_node *node;

    > +    HMAP_FOR_EACH_WITH_HASH (node, hmap_node,

    > uuid_hash(&dp->header_.uuid),

    > +                             &dps_of_interest) {

    > +        if (uuid_equals(&node->sb_db->header_.uuid, &dp->header_.uuid)) {

    > +            node->stale = false;

    > +        }

    > +    }

    > +}

    > +

    > +static void

    > +add_datapath_of_interest(const struct controller_ctx *ctx,

    > +                         const struct sbrec_datapath_binding *dp)

    > +{

    > +    struct dp_of_interest_node *node;

    > +

    > +    HMAP_FOR_EACH_WITH_HASH (node, hmap_node,

    > uuid_hash(&dp->header_.uuid),

    > +                             &dps_of_interest) {

    > +        if (uuid_equals(&node->sb_db->header_.uuid, &dp->header_.uuid)) {

    > +            node->stale = false;

    > +            return;

    > +        }

    > +    }

    > +

    > +    VLOG_DBG("monitor datapath with tunnel_key %lu", dp->tunnel_key);

    > +

    > +    sbrec_logical_flow_add_clause_logical_datapath(ctx->ovnsb_idl,

    > +                                                   OVSDB_F_EQ,

    > +                                                   &dp->header_.uuid);

    > +    sbrec_port_binding_add_clause_datapath(ctx->ovnsb_idl, OVSDB_F_EQ,

    > +                                           &dp->header_.uuid);

    > +

    > +    node = xzalloc(sizeof *node);

    > +    hmap_insert(&dps_of_interest, &node->hmap_node,

    > +                uuid_hash(&dp->header_.uuid));

    > +    node->sb_db = dp;

    > +    return;

    > +}

    > +

    > +void

    > +do_datapath_flood_fill(const struct controller_ctx *ctx,

    > +                              const struct sbrec_datapath_binding

    > *dp_start,

    > +                                          const struct sbrec_chassis

    > *chassis_rec)

    > +{

    > +

    > +    if(remote_l3gat_lrd_port(dp_start, chassis_rec)) {

    > +        return;

    > +    }

    > +

    > +    struct dp_checked {

    > +        struct hmap_node hmap_node;

    > +        const struct sbrec_datapath_binding *sb_db;

    > +    };

    > +    struct dp_checked *dp_node;

    > +    struct hmap related_dp_processed = HMAP_INITIALIZER(&related_dp_

    > processed);

    >


    This third level of lists is not easy to follow, having
    dps_of_interest

    related_dp_processed
This map remembers which datapaths we have already seen, so that we do not follow them again
while processing one list.

I got that. I still question whether it is worth the effort,
versus just checking dps_of_interest.

    interest_datapaths

Ben's patch 6 used recursion to implement a similar
flood fill algorithm, only over port_bindings rather than
related_datapaths. This made it clear that a depth first
recursion will result in much more compact code that
is easier to follow, compared to the breadth first
approach taken here. With recursion,
interest_datapaths goes away.

Recursion is always less code because it uses the call stack.

IMO the right thing to do is to combine this patch with
Ben's patch 6. Do a recursive flood fill computation on
local_datapaths, based on related_datapaths. At the
end compare local_datapaths with dps_of_interest
and adjust conditional monitoring clauses based on
datapath as appropriate.

We have thought about it.
If the performance aspect is not a big factor, I would be fine with it.
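
For comparison, a depth-first recursive flood fill over related_datapaths might look roughly like
the sketch below. This is neither Ben's patch 6 nor the code in this patch; it reuses
remote_l3gat_lrd_port() and add_datapath_of_interest() from this patch and uses dps_of_interest
itself as the visited set, assuming add_datapath_of_interest() is changed to return false when the
datapath is already present:

    /* Depth-first sketch: recurse over related_datapaths, using the
     * dps_of_interest map itself as the "already visited" check instead of
     * a separate related_dp_processed map.  Assumes a modified
     * add_datapath_of_interest() that returns false if the datapath was
     * already being monitored. */
    static void
    flood_fill_recursive(const struct controller_ctx *ctx,
                         const struct sbrec_datapath_binding *dp,
                         const struct sbrec_chassis *chassis_rec)
    {
        if (remote_l3gat_lrd_port(dp, chassis_rec)) {
            return;                 /* l3gateway pinned to another chassis. */
        }
        if (!add_datapath_of_interest(ctx, dp)) {
            return;                 /* Already monitored: stop recursing. */
        }
        for (size_t i = 0; i < dp->n_related_datapaths; i++) {
            flood_fill_recursive(ctx, dp->related_datapaths[i], chassis_rec);
        }
    }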



    It looks like this is similar to the flood_fill placement in
    add_local_datapath.
    Should each datapath be considered once in each run
    through the main ovn-controller.c loop, such that
    different flood_fill runs going through the same
    datapaths do not investigate any further (as in v9)?
    Or should each flood_fill run to completion,
    recalculating the entire set of related datapaths each
    time that flood_fill is called, even if many of those
    datapaths are already present in dps_of_interest
    (as in this v10)?

Yes, I changed it from v9 to v10.
The number of datapaths is small relative to the number of ports, so it is
only slightly more expensive.
The issue is premature cleanup due to partial port bindings at startup - I’ll investigate this more later.
Also, in general, I don’t want to make any assumptions about input data that I don’t have to.

As I mentioned above with respect to the placement of
do_datapath_flood_fill in add_local_datapath, there are
much more significant cases of possible timing windows
that ovn does not attempt to address. I don't see this as
being more important than those other cases. IMO the
benefits do not justify the additional complexity.

The change remains because I don't want to make do_datapath_flood_fill()
aware of a binding_run() loop context, or to think about what happens if I run it twice
spanning two binding_run() loops.

It becomes a moot point if I have a starting-point map per binding_run().
That is what I was using local_datapaths for earlier. See above.
This allowed me to call do_datapath_flood_fill() only once for each local_datapath.


    +
    > +    struct interest_dps_list_elem {

    > +        struct ovs_list list_node;

    > +        const struct sbrec_datapath_binding * dp;

    > +    };

    > +    struct ovs_list interest_datapaths;

    > +    ovs_list_init(&interest_datapaths);

    > +

    > +    struct interest_dps_list_elem *dp_list_elem =

    > +        xzalloc(sizeof *dp_list_elem);

    > +    dp_list_elem->dp = dp_start;

    > +    ovs_list_push_back(&interest_datapaths, &dp_list_elem->list_node);

    > +

    > +    while (!ovs_list_is_empty(&interest_datapaths)) {

    > +

    > +        struct ovs_list *list_node_ptr =

    > +            ovs_list_pop_front(&interest_datapaths);

    > +        dp_list_elem = CONTAINER_OF(list_node_ptr,

    > +            struct interest_dps_list_elem, list_node);

    > +           size_t num_related_datapaths =

    > +            dp_list_elem->dp->n_related_datapaths;

    > +           struct sbrec_datapath_binding **related_datapaths =

    > +            dp_list_elem->dp->related_datapaths;

    > +

    > +        dp_node = xzalloc(sizeof *dp_node);

    > +        dp_node->sb_db = dp_list_elem->dp;

    > +        hmap_insert(&related_dp_processed, &dp_node->hmap_node,

    > +                     uuid_hash(&dp_node->sb_db->header_.uuid));

    > +

    > +        add_datapath_of_interest(ctx, dp_list_elem->dp);

    > +        free(dp_list_elem);

    > +        for (int i = 0; i < num_related_datapaths; i++) {

    > +            bool found = false;

    > +            HMAP_FOR_EACH_WITH_HASH (

    > +                    dp_node, hmap_node,

    > +                    uuid_hash(&related_datapaths[i]->header_.uuid),

    > +                    &related_dp_processed) {

    > +

    > +                if (uuid_equals(&dp_node->sb_db->header_.uuid,

    > +                                &related_datapaths[i]->header_.uuid)) {

    > +                    found = true;

    > +                    break;

    > +                }

    > +            }

    > +            if(!found &&

    > +               !remote_l3gat_lrd_port(related_datapaths[i],

    > chassis_rec)) {

    > +                dp_list_elem = xzalloc(sizeof *dp_list_elem);

    > +                dp_list_elem->dp = related_datapaths[i];

    > +                ovs_list_push_back(&interest_datapaths,

    > &dp_list_elem->list_node);

    > +            }

    > +        }

    > +    }

    > +    struct dp_checked *cur_node, *next_node;

    > +    HMAP_FOR_EACH_SAFE (cur_node, next_node, hmap_node,

    > &related_dp_processed) {

    > +        hmap_remove(&related_dp_processed, &cur_node->hmap_node);

    > +        free(cur_node);

    > +    }

    > +    hmap_destroy(&related_dp_processed);

    > +}

    > +

    >  static const struct ovsrec_bridge *

    >  create_br_int(struct controller_ctx *ctx,

    >                const struct ovsrec_open_vswitch *cfg,

    > @@ -465,6 +648,10 @@ main(int argc, char *argv[])

    >          ovsdb_idl_create(ovnsb_remote, &sbrec_idl_class, true, true));

    >      ovsdb_idl_omit_alert(ovnsb_idl_loop.idl, &sbrec_chassis_col_nb_cfg);

    >

    > +    /* Start with suppressing all logical flows and then later request

    > those

    > +     * that are specifically needed. */

    > +    sbrec_logical_flow_add_clause_false(ovnsb_idl_loop.idl);

    > +

    >      /* Track the southbound idl. */

    >      ovsdb_idl_track_add_all(ovnsb_idl_loop.idl);

    >

    > @@ -561,6 +748,7 @@ main(int argc, char *argv[])

    >              lport_index_destroy(&lports);

    >          }

    >

    > +        clean_stale_datapaths_of_interest(&ctx);

    >          sset_destroy(&all_lports);

    >

    >          struct local_datapath *cur_node, *next_node;

    > diff --git a/ovn/controller/ovn-controller.h b/ovn/controller/ovn-

    > controller.h

    > index 8a3e855..e35b5b0 100644

    > --- a/ovn/controller/ovn-controller.h

    > +++ b/ovn/controller/ovn-controller.h

    > @@ -76,6 +76,16 @@ const struct ovsrec_bridge *get_bridge(struct ovsdb_idl

    > *,

    >  const struct sbrec_chassis *get_chassis(struct ovsdb_idl *,

    >                                          const char *chassis_id);

    >

    > +void do_datapath_flood_fill(const struct controller_ctx *,

    > +                            const struct sbrec_datapath_binding *,

    > +                            const struct sbrec_chassis *);

    > +

    > +void

    > +set_datapath_of_interest_used(const struct sbrec_datapath_binding *dp);

    > +

    > +void

    > +set_all_datapaths_of_interest_stale(void);

    > +

    >  /* Must be a bit-field ordered from most-preferred (higher number) to

    >   * least-preferred (lower number). */

    >  enum chassis_tunnel_type {

    > diff --git a/ovn/controller/patch.c b/ovn/controller/patch.c

    > index 79a7d81..9e8e856 100644

    > --- a/ovn/controller/patch.c

    > +++ b/ovn/controller/patch.c

    > @@ -330,6 +330,9 @@ add_logical_patch_ports(struct controller_ctx *ctx,

    >              free(dst_name);

    >              free(src_name);

    >              add_patched_datapath(patched_datapaths, binding, local_port);

    > +

    > +            sbrec_port_binding_add_clause_logical_port(ctx->ovnsb_idl,

    > +                                                       OVSDB_F_EQ, peer);

    >


    Why is this necessary?
    This could add logical ports that are unrelated to this chassis.
This is for peer patch ports.
It no longer has any benefit, and it would monitor some patch ports unnecessarily going forward.
I removed it and the corresponding cleanup.
The patch.c changes are now zero.

I like having no changes to patch.c

    The northd logic adds related_datapaths for any patch port.
    The flood fill based on related_datapaths should take care of
    datapaths across patch ports, starting from datapaths that
    this chassis does care about.

    The add_local_datapath logic works for starting points
    including vifs, l2gateway, and l3gateway.
    What do you think is missing?


                 if (local_port) {
    >                  if (binding->chassis != chassis_rec &&

    > ctx->ovnsb_idl_txn) {

    >                      sbrec_port_binding_set_chassis(binding, chassis_rec);

    > @@ -339,6 +342,26 @@ add_logical_patch_ports(struct controller_ctx *ctx,

    >      }

    >  }

    >

    > +/* Non-logical patch ports should be unaffected. */

    > +static void

    > +remove_patch_port_monitors(struct controller_ctx *ctx,

    > +                            const struct ovsrec_port *port)

    > +{

    > +    for (size_t i = 0; i < port->n_interfaces; i++) {

    > +        struct ovsrec_interface *iface = port->interfaces[i];

    > +        if (strcmp(iface->type, "patch")) {

    > +            continue;

    >


    I believe your code adds clauses for l3gateway ports above,
    but does not remove those monitors here.


    +        }
    > +        const char *iface_id = smap_get(&iface->external_ids,

    > +                                        "iface-id");

    > +        if (iface_id) {



    Patch interfaces don't have iface_id. They have options "peer",
    and the ports (not interfaces) have external_ids with key
    "ovn-logical-patch-port" or "ovn-l3gateway-port".
That was a sloppy copy/paste from binding.c to check for external_ids that I never finished afterwards.
I intended to check for these:
            smap_get(&port->external_ids, "ovn-l3gateway-port")
            smap_get(&port->external_ids, "ovn-logical-patch-port")

The iface_id argument below also did not get updated.
This code is deprecated now anyway.

Agreed.


    >


    +            sbrec_port_binding_remove_clause_logical_port(ctx->ovnsb_idl,
    > +                                                          OVSDB_F_EQ,

    > +                                                          iface_id);

    > +        }

    > +    }

    > +}

    > +

    >  void

    >  patch_run(struct controller_ctx *ctx, const struct ovsrec_bridge *br_int,

    >            const char *chassis_id, struct hmap *local_datapaths,

    > @@ -374,6 +397,7 @@ patch_run(struct controller_ctx *ctx, const struct

    > ovsrec_bridge *br_int,

    >      SHASH_FOR_EACH_SAFE (port_node, port_next_node, &existing_ports) {

    >          struct ovsrec_port *port = port_node->data;

    >          shash_delete(&existing_ports, port_node);

    > +        remove_patch_port_monitors(ctx, port);

    >          remove_port(ctx, port);

    >      }

    >      shash_destroy(&existing_ports);

    > diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c

    > index f2dc353..7ea5dc9 100644

    > --- a/ovn/northd/ovn-northd.c

    > +++ b/ovn/northd/ovn-northd.c

    > @@ -231,6 +231,42 @@ struct tnlid_node {

    >      uint32_t tnlid;

    >  };

    >

    > +struct related_datapath_node {

    > +    struct hmap_node hmap_node;

    > +    const struct sbrec_datapath_binding *sb_db;

    > +};

    > +

    > +static void

    > +add_related_datapath(struct hmap *related_datapaths,

    > +                     const struct sbrec_datapath_binding *sb,

    > +                     size_t *n_related_datapaths)

    > +{

    > +    struct related_datapath_node *node;

    > +    HMAP_FOR_EACH_WITH_HASH (node, hmap_node,

    > +                             uuid_hash(&sb->header_.uuid),

    > +                             related_datapaths) {

    > +        if (uuid_equals(&sb->header_.uuid, &node->sb_db->header_.uuid)) {

    > +            return;

    > +        }

    > +    }

    > +

    > +    node = xzalloc(sizeof *node);

    > +    hmap_insert(related_datapaths, &node->hmap_node,

    > +                uuid_hash(&sb->header_.uuid));

    > +    node->sb_db = sb;

    > +    (*n_related_datapaths)++;

    > +}

    > +

    > +static void

    > +destroy_related_datapaths(struct hmap *related_datapaths)

    > +{

    > +    struct related_datapath_node *node;

    > +    HMAP_FOR_EACH_POP (node, hmap_node, related_datapaths) {

    > +        free(node);

    > +    }

    > +    hmap_destroy(related_datapaths);

    > +}

    > +

    >  static void

    >  destroy_tnlids(struct hmap *tnlids)

    >  {

    > @@ -376,6 +412,8 @@ struct ovn_datapath {

    >      size_t n_router_ports;

    >

    >      struct hmap port_tnlids;

    > +    struct hmap related_datapaths;

    > +    size_t n_related_datapaths;

    >      uint32_t port_key_hint;

    >

    >      bool has_unknown;

    > @@ -426,6 +464,7 @@ ovn_datapath_create(struct hmap *datapaths, const

    > struct uuid *key,

    >      od->nbr = nbr;

    >      hmap_init(&od->port_tnlids);

    >      hmap_init(&od->ipam);

    > +    hmap_init(&od->related_datapaths);

    >      od->port_key_hint = 0;

    >      hmap_insert(datapaths, &od->key_node, uuid_hash(&od->key));

    >      return od;

    > @@ -441,6 +480,7 @@ ovn_datapath_destroy(struct hmap *datapaths, struct

    > ovn_datapath *od)

    >          hmap_remove(datapaths, &od->key_node);

    >          destroy_tnlids(&od->port_tnlids);

    >          destroy_ipam(&od->ipam);

    > +        destroy_related_datapaths(&od->related_datapaths);

    >          free(od->router_ports);

    >          free(od);

    >      }

    > @@ -671,6 +711,69 @@ struct ovn_port {

    >      struct ovs_list list;       /* In list of similar records. */

    >  };

    >

    > +static void

    > +populate_sb_related_datapaths(struct ovn_datapath *od)

    > +{

    > +    struct sbrec_datapath_binding **sb_related_datapaths

    > +        = xmalloc(sizeof(*sb_related_datapaths) *

    > od->n_related_datapaths);

    > +    int rdi = 0;

    > +    struct related_datapath_node *related_datapath;

    > +    HMAP_FOR_EACH (related_datapath, hmap_node,

    > +                   &od->related_datapaths) {

    > +        if (rdi >= od->n_related_datapaths) {

    > +            static struct vlog_rate_limit rl

    > +                = VLOG_RATE_LIMIT_INIT(5, 1);

    > +            VLOG_ERR_RL(&rl, "related datapaths accounting error"

    > +                        UUID_FMT, UUID_ARGS(&od->key));

    > +            break;

    > +        }

    > +

    > +        sb_related_datapaths[rdi] = CONST_CAST(

    > +            struct sbrec_datapath_binding *, related_datapath->sb_db);

    > +        rdi++;

    > +    }

    > +    sbrec_datapath_binding_set_related_datapaths(od->sb,

    > +        sb_related_datapaths, od->n_related_datapaths);

    > +    free(sb_related_datapaths);

    > +}

    > +

    > +static void

    > +generate_sb_related_datapaths_from_ports(struct ovs_list *both)

    > +{

    > +    struct dp_checked {

    > +           struct hmap_node hmap_node;

    > +           const struct sbrec_datapath_binding *sb_db;

    > +       };

    > +       struct dp_checked *dp_node;

    > +       struct hmap dp_processed = HMAP_INITIALIZER(&dp_processed);

    > +    struct ovn_port *op, *next;

    > +       LIST_FOR_EACH_SAFE (op, next, list, both) {

    > +           bool found = false;

    > +           HMAP_FOR_EACH_WITH_HASH (dp_node, hmap_node,

    > +                                    uuid_hash(&op->od->sb->header_.uuid),

    > +                                    &dp_processed) {

    > +               if (uuid_equals(&dp_node->sb_db->header_.uuid,

    > +                               &op->od->sb->header_.uuid)) {

    > +                   found = true;

    > +                   break;

    > +               }

    > +           }

    > +           if(!found){

    > +            dp_node = xzalloc(sizeof *dp_node);

    > +               dp_node->sb_db = op->od->sb;

    > +               hmap_insert(&dp_processed, &dp_node->hmap_node,

    > +                           uuid_hash(&dp_node->sb_db->header_.uuid));

    > +            populate_sb_related_datapaths(op->od);

    > +           }

    > +       }

    > +       struct dp_checked *cur_node, *next_node;

    > +       HMAP_FOR_EACH_SAFE (cur_node, next_node, hmap_node, &dp_processed)

    > {

    > +           hmap_remove(&dp_processed, &cur_node->hmap_node);

    > +           free(cur_node);

    > +       }

    > +       hmap_destroy(&dp_processed);

    > +}

    > +

    >  static struct ovn_port *

    >  ovn_port_create(struct hmap *ports, const char *key,

    >                  const struct nbrec_logical_switch_port *nbsp,

    > @@ -1359,12 +1462,19 @@ ovn_port_update_sbrec(const struct ovn_port *op,

    >              sbrec_port_binding_set_type(op->sb, "patch");

    >          }

    >

    > +        if (op->peer && op->peer->od && op->peer->od->sb) {

    > +            add_related_datapath(&op->od->related_datapaths,

    > +                                 op->peer->od->sb,

    > +                                 &op->od->n_related_datapaths);

    > +        }

    > +

    >          const char *peer = op->peer ? op->peer->key : "<error>";

    >          struct smap new;

    >          smap_init(&new);

    >          smap_add(&new, "peer", peer);

    >          if (chassis) {

    >              smap_add(&new, "l3gateway-chassis", chassis);

    > +            sbrec_datapath_binding_set_options(op->sb->datapath, &new);

    >


    It would be cleaner to modify the datapath_binding in
    build_datapaths, though that would require addition of a run
    through the both list to update or remove "chassis" as
    necessary, as well as modifying the existing run through
    the nb_only list.

    I don't see any logic to remove "chassis" if that is removed
    from the nbr entry.

    With this code order, you are also adding "peer" which does
    not belong on datapath_binding. If this code stays here, then
    the smap_add "peer" line should be moved below this.
Yeah, that was sloppy.
Fortunately, "peer" in the datapath_binding options map does no harm.
But the l3gateway-chassis option may conceivably need to be cleared in the same
datapath_binding entry at some point in its lifetime; a datapath may cease to be an
l3gateway and become something else.
I moved the line.

The change looks good.

Mickey


             }
    >          sbrec_port_binding_set_options(op->sb, &new);

    >          smap_destroy(&new);

    > @@ -1411,6 +1521,12 @@ ovn_port_update_sbrec(const struct ovn_port *op,

    >                  sbrec_port_binding_set_type(op->sb, "patch");

    >              }

    >

    > +            if (op->peer && op->peer->od && op->peer->od->sb) {

    > +                add_related_datapath(&op->od->related_datapaths,

    > +                                     op->peer->od->sb,

    > +                                     &op->od->n_related_datapaths);

    > +            }

    > +

    >              const char *router_port = smap_get_def(&op->nbsp->options,

    >                                                     "router-port",

    > "<error>");

    >              struct smap new;

    > @@ -1519,6 +1635,7 @@ build_ports(struct northd_context *ctx, struct hmap

    > *datapaths,

    >          cleanup_mac_bindings(ctx, ports);

    >      }

    >

    > +    generate_sb_related_datapaths_from_ports(&both);

    >      tag_alloc_destroy(&tag_alloc_table);

    >      destroy_chassis_queues(&chassis_qdisc_queues);

    >  }

    > @@ -4952,6 +5069,10 @@ main(int argc, char *argv[])

    >      add_column_noalert(ovnsb_idl_loop.idl,

    >                         &sbrec_datapath_binding_col_tunnel_key);

    >      add_column_noalert(ovnsb_idl_loop.idl,

    > +                       &sbrec_datapath_binding_col_related_datapaths);

    > +    add_column_noalert(ovnsb_idl_loop.idl,

    > +                       &sbrec_datapath_binding_col_options);

    > +    add_column_noalert(ovnsb_idl_loop.idl,

    >                         &sbrec_datapath_binding_col_external_ids);

    >

    >      ovsdb_idl_add_table(ovnsb_idl_loop.idl, &sbrec_table_port_binding);

    > diff --git a/ovn/ovn-sb.ovsschema b/ovn/ovn-sb.ovsschema

    > index 89342fe..8dfb2f4 100644

    > --- a/ovn/ovn-sb.ovsschema

    > +++ b/ovn/ovn-sb.ovsschema

    > @@ -1,7 +1,7 @@

    >  {

    >      "name": "OVN_Southbound",

    > -    "version": "1.9.0",

    > -    "cksum": "239060528 9012",

    > +    "version": "1.10.0",

    > +    "cksum": "3562877871 9537",

    >      "tables": {

    >          "SB_Global": {

    >              "columns": {

    > @@ -93,7 +93,17 @@

    >                                        "maxInteger": 16777215}}},

    >                  "external_ids": {

    >                      "type": {"key": "string", "value": "string",

    > -                             "min": 0, "max": "unlimited"}}},

    > +                             "min": 0, "max": "unlimited"}},

    > +                "options": {

    > +                     "type": {"key": "string",

    > +                              "value": "string",

    > +                              "min": 0,

    > +                              "max": "unlimited"}},

    > +                "related_datapaths": {"type": {"key": {"type": "uuid",

    > +                                      "refTable": "Datapath_Binding",

    > +                                      "refType": "weak"},

    > +                                      "min": 0,

    > +                                      "max": "unlimited"}}},

    >              "indexes": [["tunnel_key"]],

    >              "isRoot": true},

    >          "Port_Binding": {

    > diff --git a/ovn/ovn-sb.xml b/ovn/ovn-sb.xml

    > index 65191ed..88de25d 100644

    > --- a/ovn/ovn-sb.xml

    > +++ b/ovn/ovn-sb.xml

    > @@ -1556,6 +1556,19 @@ tcp.flags = RST;

    >        constructed for each supported encapsulation.

    >      </column>

    >

    > +    <column name="options" key="l3gateway-chassis">

    > +      The <code>chassis</code> in which the router datapath resides.

    > +    </column>

    > +

    > +    <column name="related_datapaths">

    > +      The related_datapaths column is used to keep track of which

    > datapaths

    > +      are connected to each other.  This is leveraged in ovn-controller to

    > +      do a flood fill from each local logical switch to determine the full

    > +      set of datapaths needed on a given ovn-controller.   This set of

    > +      datapaths can be used to determine which logical ports and logical

    > +      flows an ovn-controller should monitor.

    > +    </column>

    > +

    >      <group title="OVN_Northbound Relationship">

    >        <p>

    >          Each row in <ref table="Datapath_Binding"/> is associated with

    > some

    > diff --git a/tests/ovn.at b/tests/ovn.at

    > index a226849..af544e1 100644

    > --- a/tests/ovn.at

    > +++ b/tests/ovn.at

    > @@ -1673,6 +1673,141 @@ OVN_CLEANUP([hv1],[hv2])

    >

    >  AT_CLEANUP

    >

    > +# 2 hypervisors, each with one logical switch with 2 logical ports.

    > +AT_SETUP([ovn -- conditional monitoring: 2 HVs each with 1 LS and 2 LPs])

    > +AT_SKIP_IF([test $HAVE_PYTHON = no])

    > +ovn_start

    > +

    > +# In this test cases we create 2 switches, all connected to same

    > +# physical network (through br-phys on each HV). Each switch has

    > +# VIF ports across 1 HV. Each HV has 2 VIF ports. The first digit

    > +# of VIF port name indicates the hypervisor it is bound to, e.g.

    > +# lp12 means VIF 2 on hv1.

    > +

    > +for i in 1 2; do

    > +    ls_name=ls$i

    > +    ovn-nbctl ls-add $ls_name

    > +done

    > +

    > +net_add n1

    > +for i in 1 2; do

    > +    sim_add hv$i

    > +    as hv$i

    > +    ovs-vsctl add-br br-phys

    > +    ovn_attach n1 br-phys 192.168.0.$i

    > +    ovs-appctl -t ovn-controller vlog/set ANY:file:DBG

    > +

    > +    for j in 1 2; do

    > +        ovs-vsctl add-port br-int vif$i$j -- \

    > +            set Interface vif$i$j external-ids:iface-id=lp$i$j \

    > +                                  options:tx_pcap=hv$i/vif$i$j-tx.pcap \

    > +                                  options:rxq_pcap=hv$i/vif$i$j-rx.pcap \

    > +                                  ofport-request=$i$j

    > +

    > +        lsp_name=lp$i$j

    > +        if test $i -eq 1; then

    > +            ls_name=ls1

    > +        else

    > +            ls_name=ls2

    > +        fi

    > +

    > +        ovn-nbctl lsp-add $ls_name $lsp_name

    > +        ovn-nbctl lsp-set-addresses $lsp_name f0:00:00:00:00:$i$j

    > +        ovn-nbctl lsp-set-port-security $lsp_name f0:00:00:00:00:$i$j

    > +

    > +        OVS_WAIT_UNTIL([test x`ovn-nbctl lsp-get-up $lsp_name` = xup])

    > +    done

    > +done

    > +

    > +ovn_populate_arp

    > +

    > +# XXX This is now the 3rd copy of these functions in this file ...

    > +

    > +# Given the name of a logical port, prints the name of the hypervisor

    > +# on which it is located.

    > +vif_to_hv() {

    > +    echo hv${1%?}

    > +}

    > +#

    > +# test_packet INPORT DST SRC ETHTYPE OUTPORT...

    > +#

    > +# This shell function causes a packet to be received on INPORT.  The

    > packet's

    > +# content has Ethernet destination DST and source SRC (each exactly 12 hex

    > +# digits) and Ethernet type ETHTYPE (4 hex digits).  The OUTPORTs (zero or

    > +# more) list the VIFs on which the packet should be received.  INPORT and

    > the

    > +# OUTPORTs are specified as logical switch port numbers, e.g. 11 for

    > vif11.

    > +for i in 1 2; do

    > +    for j in 1 2; do

    > +        : > $i$j.expected

    > +    done

    > +done

    > +test_packet() {

    > +    local inport=$1 src=$2 dst=$3 eth=$4; shift; shift; shift; shift

    > +    local packet=${src}${dst}${eth}

    > +    hv=`vif_to_hv $inport`

    > +    vif=vif$inport

    > +    as $hv ovs-appctl netdev-dummy/receive $vif $packet

    > +    for outport; do

    > +        echo $packet >> $outport.expected

    > +    done

    > +}

    > +

    > +# lp11 and lp12 are on the same network and on

    > +# same hypervisors

    > +test_packet 11 f00000000012 f00000000011 1112 12

    > +test_packet 12 f00000000011 f00000000012 1211 11

    > +

    > +# lp21 and lp22 are on the same network and on

    > +# the same hypervisor

    > +test_packet 21 f00000000022 f00000000021 2122 22

    > +test_packet 22 f00000000021 f00000000022 2221 21

    > +

    > +# lp11 and lp22 are on different networks so should

    > +# not be able to communicate.

    > +test_packet 11 f00000000022 f00000000011 1122

    > +test_packet 22 f00000000011 f00000000022 2211

    > +

    > +echo "------ OVN dump ------"

    > +ovn-nbctl show

    > +ovn-sbctl show

    > +

    > +echo "------ hv1 dump ------"

    > +as hv1 ovs-vsctl show

    > +as hv1 ovs-ofctl -O OpenFlow13 dump-flows br-int

    > +

    > +echo "------ hv2 dump ------"

    > +as hv2 ovs-vsctl show

    > +as hv2 ovs-ofctl -O OpenFlow13 dump-flows br-int

    > +

    > +# Check that all datapath components are monitored as expected.

    > +AT_CHECK([cat hv1/ovn-controller.log | grep 'tunnel_key 1'], [0],

    > [ignore-nolog])

    > +AT_CHECK([cat hv1/ovn-controller.log | grep 'tunnel_key 2'], [1],

    > [ignore-nolog])

    > +AT_CHECK([cat hv2/ovn-controller.log | grep 'tunnel_key 2'], [0],

    > [ignore-nolog])

    > +AT_CHECK([cat hv2/ovn-controller.log | grep 'tunnel_key 1'], [1],

    > [ignore-nolog])

    > +

    > +dbg_instances1=`cat hv1/ovn-controller.log | grep 'tunnel_key 1' | wc -l`

    > +echo ls1 monitor traces on hv1 $dbg_instances1

    > +dbg_instances2=`cat hv1/ovn-controller.log | grep 'tunnel_key 2' | wc -l`

    > +echo ls2 monitor traces on hv1 $dbg_instances2

    > +AT_CHECK([test $dbg_instances1 -gt $dbg_instances2], [0], [ignore-nolog])

    > +

    > +dbg_instances1=`cat hv2/ovn-controller.log | grep 'tunnel_key 2' | wc -l`

    > +echo ls2 monitor traces on hv2 $dbg_instances1

    > +dbg_instances2=`cat hv2/ovn-controller.log | grep 'tunnel_key 1' | wc -l`

    > +echo ls1 monitor traces on hv2 $dbg_instances2

    > +AT_CHECK([test $dbg_instances1 -gt $dbg_instances2], [0], [ignore-nolog])

    > +

    > +# Now check the packets actually received against the ones expected.

    > +for i in 1 2; do

    > +    for j in 1 2; do

    > +        OVN_CHECK_PACKETS([hv$i/vif$i$j-tx.pcap], [$i$j.expected])

    > +    done

    > +done

    > +

    > +OVN_CLEANUP([hv1],[hv2])

    > +

    > +AT_CLEANUP

    > +

    >  AT_SETUP([ovn -- vtep: 3 HVs, 1 VIFs/HV, 1 GW, 1 LS])

    >  AT_KEYWORDS([vtep])

    >  AT_SKIP_IF([test $HAVE_PYTHON = no])

    > @@ -2439,6 +2574,223 @@ OVN_CLEANUP([hv1], [hv2], [hv3])

    >

    >  AT_CLEANUP

    >

    > +# 3 hypervisors, 3 logical switches with 3 logical ports each, 1 logical

    > router

    > +AT_SETUP([ovn -- conditional Monitoring: 3 HVs, 3 LS, 3 lports/LS, 1 LR])

    > +AT_SKIP_IF([test $HAVE_PYTHON = no])

    > +ovn_start

    > +

    > +# Logical network:

    > +#

    > +# Three logical switches ls1, ls2, ls3.

    > +# One logical router lr0 connected to ls[123],

    > +# with nine subnets, three per logical switch:

    > +#

    > +#    lrp11 on ls1 for subnet 192.168.11.0/24

    > +#    lrp12 on ls1 for subnet 192.168.12.0/24

    > +#    lrp13 on ls1 for subnet 192.168.13.0/24



    This is confusing since the code below does not add
    lrp11, lrp12, lrp13, just the subnets in ls1 which is
    not attached to lr0.
That description was not cleaned up when the test was updated.
I fixed it.

    I am having trouble understanding the motivation for
    3 subnets per ls. It will work, just wondering why you
    thought that it is necessary or helpful?

I just needed multiple ports. Having multiple subnets makes it easier to derive a different
test case from this one in the future.

    >

    > +#    ...

    > +#    lrp33 on ls3 for subnet 192.168.33.0/24

    > +#

    > +# 27 VIFs, 9 per LS, 3 per subnet: lp[123][123][123], where the first two

    > +# digits are the subnet and the last digit distinguishes the VIF.

    > +for i in 1 2 3; do

    > +    ovn-nbctl ls-add ls$i

    > +    for j in 1 2 3; do

    > +        for k in 1 2 3; do

    > +            ovn-nbctl \

    > +                -- lsp-add ls$i lp$i$j$k \

    > +                -- lsp-set-addresses lp$i$j$k "f0:00:00:00:0$i:$j$k \

    > +                   192.168.$i$j.$k"

    > +        done

    > +    done

    > +done

    > +

    > +# Only ls2 and ls3 are connected by the logical router.

    > +ovn-nbctl lr-add lr0

    > +for i in 2 3; do

    > +    for j in 1 2 3; do

    > +        ovn-nbctl lrp-add lr0 lrp$i$j 00:00:00:00:ff:$i$j

    > 192.168.$i$j.254/24

    > +        ovn-nbctl \

    > +            -- lsp-add ls$i lrp$i$j-attachment \

    > +            -- set Logical_Switch_Port lrp$i$j-attachment type=router \

    > +                             options:router-port=lrp$i$j \

    > +                             addresses='"00:00:00:00:ff:'$i$j'"'

    > +    done

    > +done

    > +

    > +# Physical network:

    > +#

    > +# Three hypervisors hv[123].

    > +# lp[1??] on hv1

    > +# lp[2??] on hv2.

    > +# lp[3??] on hv3.

    >


    I like this. So ls3 is only present on hv3?
    If so, this will test the related_datapaths part of the code, making
    sure that hv2 grabs flows for datapaths lr0 and ls3 even though
    there are no local VIFs in ls3 on hv2. This would catch the
    code order northd issue from v9.

    Mickey

diff mbox

Patch

diff --git a/ovn/controller/binding.c b/ovn/controller/binding.c
index fb76032..f24e2e6 100644
--- a/ovn/controller/binding.c
+++ b/ovn/controller/binding.c
@@ -68,7 +68,8 @@  static void
 get_local_iface_ids(const struct ovsrec_bridge *br_int,
                     struct shash *lport_to_iface,
                     struct sset *all_lports,
-                    struct sset *egress_ifaces)
+                    struct sset *egress_ifaces,
+                    const struct controller_ctx *ctx)
 {
     int i;
 
@@ -88,6 +89,9 @@  get_local_iface_ids(const struct ovsrec_bridge *br_int,
             iface_id = smap_get(&iface_rec->external_ids, "iface-id");
 
             if (iface_id) {
+                sbrec_port_binding_add_clause_logical_port(ctx->ovnsb_idl,
+                                                           OVSDB_F_EQ,
+                                                           iface_id);
                 shash_add(lport_to_iface, iface_id, iface_rec);
                 sset_add(all_lports, iface_id);
             }
@@ -106,8 +110,13 @@  get_local_iface_ids(const struct ovsrec_bridge *br_int,
 
 static void
 add_local_datapath(struct hmap *local_datapaths,
-        const struct sbrec_port_binding *binding_rec)
+        const struct sbrec_port_binding *binding_rec,
+        const struct controller_ctx *ctx,
+        const struct sbrec_chassis *chassis_rec)
 {
+
+    do_datapath_flood_fill(ctx, binding_rec->datapath, chassis_rec);
+
     if (get_local_datapath(local_datapaths,
                            binding_rec->datapath->tunnel_key)) {
         return;
@@ -293,7 +302,7 @@  consider_local_datapath(struct controller_ctx *ctx,
             /* Add child logical port to the set of all local ports. */
             sset_add(all_lports, binding_rec->logical_port);
         }
-        add_local_datapath(local_datapaths, binding_rec);
+        add_local_datapath(local_datapaths, binding_rec, ctx, chassis_rec);
         if (iface_rec && qos_map && ctx->ovs_idl_txn) {
             get_qos_params(binding_rec, qos_map);
         }
@@ -328,7 +337,7 @@  consider_local_datapath(struct controller_ctx *ctx,
         }
 
         sset_add(all_lports, binding_rec->logical_port);
-        add_local_datapath(local_datapaths, binding_rec);
+        add_local_datapath(local_datapaths, binding_rec, ctx, chassis_rec);
         if (binding_rec->chassis == chassis_rec) {
             return;
         }
@@ -342,7 +351,7 @@  consider_local_datapath(struct controller_ctx *ctx,
         const char *chassis = smap_get(&binding_rec->options,
                                        "l3gateway-chassis");
         if (!strcmp(chassis, chassis_rec->name) && ctx->ovnsb_idl_txn) {
-            add_local_datapath(local_datapaths, binding_rec);
+            add_local_datapath(local_datapaths, binding_rec, ctx, chassis_rec);
         }
     } else if (chassis_rec && binding_rec->chassis == chassis_rec) {
         if (ctx->ovnsb_idl_txn) {
@@ -351,6 +360,9 @@  consider_local_datapath(struct controller_ctx *ctx,
             for (int i = 0; i < binding_rec->n_mac; i++) {
                 VLOG_INFO("Releasing %s", binding_rec->mac[i]);
             }
+            sbrec_port_binding_remove_clause_logical_port(
+                ctx->ovnsb_idl, OVSDB_F_EQ, binding_rec->logical_port);
+
             sbrec_port_binding_set_chassis(binding_rec, NULL);
             sset_find_and_delete(all_lports, binding_rec->logical_port);
         }
@@ -381,13 +393,21 @@  binding_run(struct controller_ctx *ctx, const struct ovsrec_bridge *br_int,
     hmap_init(&qos_map);
     if (br_int) {
         get_local_iface_ids(br_int, &lport_to_iface, all_lports,
-                            &egress_ifaces);
+                            &egress_ifaces, ctx);
     }
 
+    set_all_datapaths_of_interest_stale();
+
     /* Run through each binding record to see if it is resident on this
      * chassis and update the binding accordingly.  This includes both
      * directly connected logical ports and children of those ports. */
     SBREC_PORT_BINDING_FOR_EACH(binding_rec, ctx->ovnsb_idl) {
+
+        /* Mark any datapath that is still used by some port and that we are
+         * monitoring as used, so that we do not remove its monitoring.
+         */
+        set_datapath_of_interest_used(binding_rec->datapath);
+
         consider_local_datapath(ctx, chassis_rec, binding_rec,
                                 sset_is_empty(&egress_ifaces) ? NULL :
                                 &qos_map, local_datapaths, &lport_to_iface,
diff --git a/ovn/controller/lflow.c b/ovn/controller/lflow.c
index 4e67365..765eae4 100644
--- a/ovn/controller/lflow.c
+++ b/ovn/controller/lflow.c
@@ -167,6 +167,11 @@  consider_logical_flow(const struct lport_index *lports,
     if (!ldp) {
         return;
     }
+
+    VLOG_DBG("consider logical flow: tunnel_key %lu "
+             "\"match\" \"%s\", \"actions\" \"%s\"",
+             ldp->tunnel_key, lflow->match, lflow->actions);
+
     if (is_switch(ldp)) {
         /* For a logical switch datapath, local_datapaths tells us if there
          * are any local ports for this datapath.  If not, we can skip
diff --git a/ovn/controller/ovn-controller.c b/ovn/controller/ovn-controller.c
index fea7841..c06bd81 100644
--- a/ovn/controller/ovn-controller.c
+++ b/ovn/controller/ovn-controller.c
@@ -69,6 +69,14 @@  OVS_NO_RETURN static void usage(void);
 
 static char *ovs_remote;
 
+struct dp_of_interest_node {
+    struct hmap_node hmap_node;
+    const struct sbrec_datapath_binding *sb_db;
+    bool stale;
+};
+
+static struct hmap dps_of_interest = HMAP_INITIALIZER(&dps_of_interest);
+
 struct local_datapath *
 get_local_datapath(const struct hmap *local_datapaths, uint32_t tunnel_key)
 {
@@ -128,6 +136,181 @@  get_bridge(struct ovsdb_idl *ovs_idl, const char *br_name)
     return NULL;
 }
 
+static bool
+remote_l3gat_lrd_port(const struct sbrec_datapath_binding *dp,
+                      const struct sbrec_chassis *chassis_rec)
+{
+    const char *l3gateway_chassis = smap_get(&dp->options,
+                                             "l3gateway-chassis");
+    if (l3gateway_chassis) {
+        if (strcmp(l3gateway_chassis, chassis_rec->name)) {
+            return true;
+        }
+    }
+    return false;
+}
+
+void
+set_all_datapaths_of_interest_stale(void)
+{
+    struct dp_of_interest_node *node;
+    HMAP_FOR_EACH(node, hmap_node, &dps_of_interest) {
+            node->stale = true;
+    }
+}
+
+static void
+remove_datapath_of_interest(struct controller_ctx *ctx,
+                            const struct sbrec_datapath_binding *dp)
+{
+
+    struct dp_of_interest_node *node;
+    HMAP_FOR_EACH_WITH_HASH (node, hmap_node, uuid_hash(&dp->header_.uuid),
+                             &dps_of_interest) {
+        if (ctx && uuid_equals(&node->sb_db->header_.uuid, &dp->header_.uuid)) {
+
+            sbrec_port_binding_remove_clause_datapath(ctx->ovnsb_idl,
+                                                      OVSDB_F_EQ,
+                                                      &dp->header_.uuid);
+            sbrec_logical_flow_remove_clause_logical_datapath(
+                ctx->ovnsb_idl, OVSDB_F_EQ, &dp->header_.uuid);
+            hmap_remove(&dps_of_interest, &node->hmap_node);
+            free(node);
+            return;
+        }
+    }
+}
+
+static void
+clean_stale_datapaths_of_interest(struct controller_ctx *ctx)
+{
+    struct dp_of_interest_node *dp_cur_node, *dp_next_node;
+    HMAP_FOR_EACH_SAFE (dp_cur_node, dp_next_node, hmap_node,
+                        &dps_of_interest) {
+        if (dp_cur_node->stale) {
+            remove_datapath_of_interest(ctx, dp_cur_node->sb_db);
+        }
+    }
+}
+
+void
+set_datapath_of_interest_used(const struct sbrec_datapath_binding *dp)
+{
+
+    struct dp_of_interest_node *node;
+    HMAP_FOR_EACH_WITH_HASH (node, hmap_node, uuid_hash(&dp->header_.uuid),
+                             &dps_of_interest) {
+        if (uuid_equals(&node->sb_db->header_.uuid, &dp->header_.uuid)) {
+            node->stale = false;
+        }
+    }
+}
+
+static void
+add_datapath_of_interest(const struct controller_ctx *ctx,
+                         const struct sbrec_datapath_binding *dp)
+{
+    struct dp_of_interest_node *node;
+
+    HMAP_FOR_EACH_WITH_HASH (node, hmap_node, uuid_hash(&dp->header_.uuid),
+                             &dps_of_interest) {
+        if (uuid_equals(&node->sb_db->header_.uuid, &dp->header_.uuid)) {
+            node->stale = false;
+            return;
+        }
+    }
+
+    VLOG_DBG("monitor datapath with tunnel_key %lu", dp->tunnel_key);
+
+    sbrec_logical_flow_add_clause_logical_datapath(ctx->ovnsb_idl,
+                                                   OVSDB_F_EQ,
+                                                   &dp->header_.uuid);
+    sbrec_port_binding_add_clause_datapath(ctx->ovnsb_idl, OVSDB_F_EQ,
+                                           &dp->header_.uuid);
+
+    node = xzalloc(sizeof *node);
+    hmap_insert(&dps_of_interest, &node->hmap_node,
+                uuid_hash(&dp->header_.uuid));
+    node->sb_db = dp;
+    return;
+}
+
+void
+do_datapath_flood_fill(const struct controller_ctx *ctx,
+                       const struct sbrec_datapath_binding *dp_start,
+                       const struct sbrec_chassis *chassis_rec)
+{
+
+    if (remote_l3gat_lrd_port(dp_start, chassis_rec)) {
+        return;
+    }
+
+    struct dp_checked {
+        struct hmap_node hmap_node;
+        const struct sbrec_datapath_binding *sb_db;
+    };
+    struct dp_checked *dp_node;
+    struct hmap related_dp_processed = HMAP_INITIALIZER(&related_dp_processed);
+
+    struct interest_dps_list_elem {
+        struct ovs_list list_node;
+        const struct sbrec_datapath_binding * dp;
+    };
+    struct ovs_list interest_datapaths;
+    ovs_list_init(&interest_datapaths);
+
+    struct interest_dps_list_elem *dp_list_elem =
+        xzalloc(sizeof *dp_list_elem);
+    dp_list_elem->dp = dp_start;
+    ovs_list_push_back(&interest_datapaths, &dp_list_elem->list_node);
+
+    while (!ovs_list_is_empty(&interest_datapaths)) {
+
+        struct ovs_list *list_node_ptr =
+            ovs_list_pop_front(&interest_datapaths);
+        dp_list_elem = CONTAINER_OF(list_node_ptr,
+            struct interest_dps_list_elem, list_node);
+        size_t num_related_datapaths =
+            dp_list_elem->dp->n_related_datapaths;
+        struct sbrec_datapath_binding **related_datapaths =
+            dp_list_elem->dp->related_datapaths;
+
+        dp_node = xzalloc(sizeof *dp_node);
+        dp_node->sb_db = dp_list_elem->dp;
+        hmap_insert(&related_dp_processed, &dp_node->hmap_node,
+                     uuid_hash(&dp_node->sb_db->header_.uuid));
+
+        add_datapath_of_interest(ctx, dp_list_elem->dp);
+        free(dp_list_elem);
+        for (int i = 0; i < num_related_datapaths; i++) {
+            bool found = false;
+            HMAP_FOR_EACH_WITH_HASH (
+                    dp_node, hmap_node,
+                    uuid_hash(&related_datapaths[i]->header_.uuid),
+                    &related_dp_processed) {
+
+                if (uuid_equals(&dp_node->sb_db->header_.uuid,
+                                &related_datapaths[i]->header_.uuid)) {
+                    found = true;
+                    break;
+                }
+            }
+            if (!found &&
+               !remote_l3gat_lrd_port(related_datapaths[i], chassis_rec)) {
+                dp_list_elem = xzalloc(sizeof *dp_list_elem);
+                dp_list_elem->dp = related_datapaths[i];
+                ovs_list_push_back(&interest_datapaths, &dp_list_elem->list_node);
+            }
+        }
+    }
+    struct dp_checked *cur_node, *next_node;
+    HMAP_FOR_EACH_SAFE (cur_node, next_node, hmap_node, &related_dp_processed) {
+        hmap_remove(&related_dp_processed, &cur_node->hmap_node);
+        free(cur_node);
+    }
+    hmap_destroy(&related_dp_processed);
+}
+
 static const struct ovsrec_bridge *
 create_br_int(struct controller_ctx *ctx,
               const struct ovsrec_open_vswitch *cfg,
@@ -465,6 +648,10 @@  main(int argc, char *argv[])
         ovsdb_idl_create(ovnsb_remote, &sbrec_idl_class, true, true));
     ovsdb_idl_omit_alert(ovnsb_idl_loop.idl, &sbrec_chassis_col_nb_cfg);
 
+    /* Start with suppressing all logical flows and then later request those
+     * that are specifically needed. */
+    sbrec_logical_flow_add_clause_false(ovnsb_idl_loop.idl);
+
     /* Track the southbound idl. */
     ovsdb_idl_track_add_all(ovnsb_idl_loop.idl);
 
@@ -561,6 +748,7 @@  main(int argc, char *argv[])
             lport_index_destroy(&lports);
         }
 
+        clean_stale_datapaths_of_interest(&ctx);
         sset_destroy(&all_lports);
 
         struct local_datapath *cur_node, *next_node;
diff --git a/ovn/controller/ovn-controller.h b/ovn/controller/ovn-controller.h
index 8a3e855..e35b5b0 100644
--- a/ovn/controller/ovn-controller.h
+++ b/ovn/controller/ovn-controller.h
@@ -76,6 +76,16 @@  const struct ovsrec_bridge *get_bridge(struct ovsdb_idl *,
 const struct sbrec_chassis *get_chassis(struct ovsdb_idl *,
                                         const char *chassis_id);
 
+void do_datapath_flood_fill(const struct controller_ctx *,
+                            const struct sbrec_datapath_binding *,
+                            const struct sbrec_chassis *);
+
+void
+set_datapath_of_interest_used(const struct sbrec_datapath_binding *dp);
+
+void
+set_all_datapaths_of_interest_stale(void);
+
 /* Must be a bit-field ordered from most-preferred (higher number) to
  * least-preferred (lower number). */
 enum chassis_tunnel_type {
diff --git a/ovn/controller/patch.c b/ovn/controller/patch.c
index 79a7d81..9e8e856 100644
--- a/ovn/controller/patch.c
+++ b/ovn/controller/patch.c
@@ -330,6 +330,9 @@  add_logical_patch_ports(struct controller_ctx *ctx,
             free(dst_name);
             free(src_name);
             add_patched_datapath(patched_datapaths, binding, local_port);
+
+            sbrec_port_binding_add_clause_logical_port(ctx->ovnsb_idl,
+                                                       OVSDB_F_EQ, peer);
             if (local_port) {
                 if (binding->chassis != chassis_rec && ctx->ovnsb_idl_txn) {
                     sbrec_port_binding_set_chassis(binding, chassis_rec);
@@ -339,6 +342,26 @@  add_logical_patch_ports(struct controller_ctx *ctx,
     }
 }
 
+/* Non-logical patch ports should be unaffected. */
+static void
+remove_patch_port_monitors(struct controller_ctx *ctx,
+                            const struct ovsrec_port *port)
+{
+    for (size_t i = 0; i < port->n_interfaces; i++) {
+        struct ovsrec_interface *iface = port->interfaces[i];
+        if (strcmp(iface->type, "patch")) {
+            continue;
+        }
+        const char *iface_id = smap_get(&iface->external_ids,
+                                        "iface-id");
+        if (iface_id) {
+            sbrec_port_binding_remove_clause_logical_port(ctx->ovnsb_idl,
+                                                          OVSDB_F_EQ,
+                                                          iface_id);
+        }
+    }
+}
+
 void
 patch_run(struct controller_ctx *ctx, const struct ovsrec_bridge *br_int,
           const char *chassis_id, struct hmap *local_datapaths,
@@ -374,6 +397,7 @@  patch_run(struct controller_ctx *ctx, const struct ovsrec_bridge *br_int,
     SHASH_FOR_EACH_SAFE (port_node, port_next_node, &existing_ports) {
         struct ovsrec_port *port = port_node->data;
         shash_delete(&existing_ports, port_node);
+        remove_patch_port_monitors(ctx, port);
         remove_port(ctx, port);
     }
     shash_destroy(&existing_ports);
diff --git a/ovn/northd/ovn-northd.c b/ovn/northd/ovn-northd.c
index f2dc353..7ea5dc9 100644
--- a/ovn/northd/ovn-northd.c
+++ b/ovn/northd/ovn-northd.c
@@ -231,6 +231,42 @@  struct tnlid_node {
     uint32_t tnlid;
 };
 
+struct related_datapath_node {
+    struct hmap_node hmap_node;
+    const struct sbrec_datapath_binding *sb_db;
+};
+
+static void
+add_related_datapath(struct hmap *related_datapaths,
+                     const struct sbrec_datapath_binding *sb,
+                     size_t *n_related_datapaths)
+{
+    struct related_datapath_node *node;
+    HMAP_FOR_EACH_WITH_HASH (node, hmap_node,
+                             uuid_hash(&sb->header_.uuid),
+                             related_datapaths) {
+        if (uuid_equals(&sb->header_.uuid, &node->sb_db->header_.uuid)) {
+            return;
+        }
+    }
+
+    node = xzalloc(sizeof *node);
+    hmap_insert(related_datapaths, &node->hmap_node,
+                uuid_hash(&sb->header_.uuid));
+    node->sb_db = sb;
+    (*n_related_datapaths)++;
+}
+
+static void
+destroy_related_datapaths(struct hmap *related_datapaths)
+{
+    struct related_datapath_node *node;
+    HMAP_FOR_EACH_POP (node, hmap_node, related_datapaths) {
+        free(node);
+    }
+    hmap_destroy(related_datapaths);
+}
+
 static void
 destroy_tnlids(struct hmap *tnlids)
 {
@@ -376,6 +412,8 @@  struct ovn_datapath {
     size_t n_router_ports;
 
     struct hmap port_tnlids;
+    struct hmap related_datapaths;
+    size_t n_related_datapaths;
     uint32_t port_key_hint;
 
     bool has_unknown;
@@ -426,6 +464,7 @@  ovn_datapath_create(struct hmap *datapaths, const struct uuid *key,
     od->nbr = nbr;
     hmap_init(&od->port_tnlids);
     hmap_init(&od->ipam);
+    hmap_init(&od->related_datapaths);
     od->port_key_hint = 0;
     hmap_insert(datapaths, &od->key_node, uuid_hash(&od->key));
     return od;
@@ -441,6 +480,7 @@  ovn_datapath_destroy(struct hmap *datapaths, struct ovn_datapath *od)
         hmap_remove(datapaths, &od->key_node);
         destroy_tnlids(&od->port_tnlids);
         destroy_ipam(&od->ipam);
+        destroy_related_datapaths(&od->related_datapaths);
         free(od->router_ports);
         free(od);
     }
@@ -671,6 +711,69 @@  struct ovn_port {
     struct ovs_list list;       /* In list of similar records. */
 };
 
+static void
+populate_sb_related_datapaths(struct ovn_datapath *od)
+{
+    struct sbrec_datapath_binding **sb_related_datapaths
+        = xmalloc(sizeof(*sb_related_datapaths) * od->n_related_datapaths);
+    int rdi = 0;
+    struct related_datapath_node *related_datapath;
+    HMAP_FOR_EACH (related_datapath, hmap_node,
+                   &od->related_datapaths) {
+        if (rdi >= od->n_related_datapaths) {
+            static struct vlog_rate_limit rl
+                = VLOG_RATE_LIMIT_INIT(5, 1);
+            VLOG_ERR_RL(&rl, "related datapaths accounting error "
+                        UUID_FMT, UUID_ARGS(&od->key));
+            break;
+        }
+
+        sb_related_datapaths[rdi] = CONST_CAST(
+            struct sbrec_datapath_binding *, related_datapath->sb_db);
+        rdi++;
+    }
+    sbrec_datapath_binding_set_related_datapaths(od->sb,
+        sb_related_datapaths, od->n_related_datapaths);
+    free(sb_related_datapaths);
+}
+
+static void
+generate_sb_related_datapaths_from_ports(struct ovs_list *both)
+{
+    struct dp_checked {
+        struct hmap_node hmap_node;
+        const struct sbrec_datapath_binding *sb_db;
+    };
+    struct dp_checked *dp_node;
+    struct hmap dp_processed = HMAP_INITIALIZER(&dp_processed);
+    struct ovn_port *op, *next;
+    LIST_FOR_EACH_SAFE (op, next, list, both) {
+        bool found = false;
+        HMAP_FOR_EACH_WITH_HASH (dp_node, hmap_node,
+                                 uuid_hash(&op->od->sb->header_.uuid),
+                                 &dp_processed) {
+            if (uuid_equals(&dp_node->sb_db->header_.uuid,
+                            &op->od->sb->header_.uuid)) {
+                found = true;
+                break;
+            }
+        }
+        if (!found) {
+            dp_node = xzalloc(sizeof *dp_node);
+            dp_node->sb_db = op->od->sb;
+            hmap_insert(&dp_processed, &dp_node->hmap_node,
+                        uuid_hash(&dp_node->sb_db->header_.uuid));
+            populate_sb_related_datapaths(op->od);
+        }
+    }
+    struct dp_checked *cur_node, *next_node;
+    HMAP_FOR_EACH_SAFE (cur_node, next_node, hmap_node, &dp_processed) {
+        hmap_remove(&dp_processed, &cur_node->hmap_node);
+        free(cur_node);
+    }
+    hmap_destroy(&dp_processed);
+}
+
 static struct ovn_port *
 ovn_port_create(struct hmap *ports, const char *key,
                 const struct nbrec_logical_switch_port *nbsp,
@@ -1359,12 +1462,19 @@  ovn_port_update_sbrec(const struct ovn_port *op,
             sbrec_port_binding_set_type(op->sb, "patch");
         }
 
+        if (op->peer && op->peer->od && op->peer->od->sb) {
+            add_related_datapath(&op->od->related_datapaths,
+                                 op->peer->od->sb,
+                                 &op->od->n_related_datapaths);
+        }
+
         const char *peer = op->peer ? op->peer->key : "<error>";
         struct smap new;
         smap_init(&new);
         smap_add(&new, "peer", peer);
         if (chassis) {
             smap_add(&new, "l3gateway-chassis", chassis);
+            sbrec_datapath_binding_set_options(op->sb->datapath, &new);
         }
         sbrec_port_binding_set_options(op->sb, &new);
         smap_destroy(&new);
@@ -1411,6 +1521,12 @@  ovn_port_update_sbrec(const struct ovn_port *op,
                 sbrec_port_binding_set_type(op->sb, "patch");
             }
 
+            if (op->peer && op->peer->od && op->peer->od->sb) {
+                add_related_datapath(&op->od->related_datapaths,
+                                     op->peer->od->sb,
+                                     &op->od->n_related_datapaths);
+            }
+
             const char *router_port = smap_get_def(&op->nbsp->options,
                                                    "router-port", "<error>");
             struct smap new;
@@ -1519,6 +1635,7 @@  build_ports(struct northd_context *ctx, struct hmap *datapaths,
         cleanup_mac_bindings(ctx, ports);
     }
 
+    generate_sb_related_datapaths_from_ports(&both);
     tag_alloc_destroy(&tag_alloc_table);
     destroy_chassis_queues(&chassis_qdisc_queues);
 }
@@ -4952,6 +5069,10 @@  main(int argc, char *argv[])
     add_column_noalert(ovnsb_idl_loop.idl,
                        &sbrec_datapath_binding_col_tunnel_key);
     add_column_noalert(ovnsb_idl_loop.idl,
+                       &sbrec_datapath_binding_col_related_datapaths);
+    add_column_noalert(ovnsb_idl_loop.idl,
+                       &sbrec_datapath_binding_col_options);
+    add_column_noalert(ovnsb_idl_loop.idl,
                        &sbrec_datapath_binding_col_external_ids);
 
     ovsdb_idl_add_table(ovnsb_idl_loop.idl, &sbrec_table_port_binding);
diff --git a/ovn/ovn-sb.ovsschema b/ovn/ovn-sb.ovsschema
index 89342fe..8dfb2f4 100644
--- a/ovn/ovn-sb.ovsschema
+++ b/ovn/ovn-sb.ovsschema
@@ -1,7 +1,7 @@ 
 {
     "name": "OVN_Southbound",
-    "version": "1.9.0",
-    "cksum": "239060528 9012",
+    "version": "1.10.0",
+    "cksum": "3562877871 9537",
     "tables": {
         "SB_Global": {
             "columns": {
@@ -93,7 +93,17 @@ 
                                       "maxInteger": 16777215}}},
                 "external_ids": {
                     "type": {"key": "string", "value": "string",
-                             "min": 0, "max": "unlimited"}}},
+                             "min": 0, "max": "unlimited"}},
+                "options": {
+                     "type": {"key": "string",
+                              "value": "string",
+                              "min": 0,
+                              "max": "unlimited"}},
+                "related_datapaths": {"type": {"key": {"type": "uuid",
+                                      "refTable": "Datapath_Binding",
+                                      "refType": "weak"},
+                                      "min": 0,
+                                      "max": "unlimited"}}},
             "indexes": [["tunnel_key"]],
             "isRoot": true},
         "Port_Binding": {
diff --git a/ovn/ovn-sb.xml b/ovn/ovn-sb.xml
index 65191ed..88de25d 100644
--- a/ovn/ovn-sb.xml
+++ b/ovn/ovn-sb.xml
@@ -1556,6 +1556,19 @@  tcp.flags = RST;
       constructed for each supported encapsulation.
     </column>
 
+    <column name="options" key="l3gateway-chassis">
+      The <code>chassis</code> in which the router datapath resides.
+    </column>
+
+    <column name="related_datapaths">
+      The related_datapaths column keeps track of which datapaths are
+      connected to each other.  ovn-controller uses it to do a flood fill
+      from each local logical switch to determine the full set of
+      datapaths needed by a given ovn-controller.  This set of datapaths
+      determines which logical ports and logical flows an ovn-controller
+      should monitor.
+    </column>
+
     <group title="OVN_Northbound Relationship">
       <p>
         Each row in <ref table="Datapath_Binding"/> is associated with some
diff --git a/tests/ovn.at b/tests/ovn.at
index a226849..af544e1 100644
--- a/tests/ovn.at
+++ b/tests/ovn.at
@@ -1673,6 +1673,141 @@  OVN_CLEANUP([hv1],[hv2])
 
 AT_CLEANUP
 
+# 2 hypervisors, each with one logical switch with 2 logical ports.
+AT_SETUP([ovn -- conditional monitoring: 2 HVs each with 1 LS and 2 LPs])
+AT_SKIP_IF([test $HAVE_PYTHON = no])
+ovn_start
+
+# In this test case we create 2 switches, both connected to the same
+# physical network (through br-phys on each HV). Each switch has its
+# VIF ports on a single HV, and each HV has 2 VIF ports. The first
+# digit of a VIF port name indicates the hypervisor it is bound to,
+# e.g. lp12 means VIF 2 on hv1.
+
+for i in 1 2; do
+    ls_name=ls$i
+    ovn-nbctl ls-add $ls_name
+done
+
+net_add n1
+for i in 1 2; do
+    sim_add hv$i
+    as hv$i
+    ovs-vsctl add-br br-phys
+    ovn_attach n1 br-phys 192.168.0.$i
+    ovs-appctl -t ovn-controller vlog/set ANY:file:DBG
+
+    for j in 1 2; do
+        ovs-vsctl add-port br-int vif$i$j -- \
+            set Interface vif$i$j external-ids:iface-id=lp$i$j \
+                                  options:tx_pcap=hv$i/vif$i$j-tx.pcap \
+                                  options:rxq_pcap=hv$i/vif$i$j-rx.pcap \
+                                  ofport-request=$i$j
+
+        lsp_name=lp$i$j
+        if test $i -eq 1; then
+            ls_name=ls1
+        else
+            ls_name=ls2
+        fi
+
+        ovn-nbctl lsp-add $ls_name $lsp_name
+        ovn-nbctl lsp-set-addresses $lsp_name f0:00:00:00:00:$i$j
+        ovn-nbctl lsp-set-port-security $lsp_name f0:00:00:00:00:$i$j
+
+        OVS_WAIT_UNTIL([test x`ovn-nbctl lsp-get-up $lsp_name` = xup])
+    done
+done
+
+ovn_populate_arp
+
+# XXX This is now the 3rd copy of these functions in this file ...
+
+# Given the name of a logical port, prints the name of the hypervisor
+# on which it is located.
+vif_to_hv() {
+    echo hv${1%?}
+}
+#
+# test_packet INPORT DST SRC ETHTYPE OUTPORT...
+#
+# This shell function causes a packet to be received on INPORT.  The packet's
+# content has Ethernet destination DST and source SRC (each exactly 12 hex
+# digits) and Ethernet type ETHTYPE (4 hex digits).  The OUTPORTs (zero or
+# more) list the VIFs on which the packet should be received.  INPORT and the
+# OUTPORTs are specified as logical switch port numbers, e.g. 11 for vif11.
+for i in 1 2; do
+    for j in 1 2; do
+        : > $i$j.expected
+    done
+done
+test_packet() {
+    local inport=$1 src=$2 dst=$3 eth=$4; shift; shift; shift; shift
+    local packet=${src}${dst}${eth}
+    hv=`vif_to_hv $inport`
+    vif=vif$inport
+    as $hv ovs-appctl netdev-dummy/receive $vif $packet
+    for outport; do
+        echo $packet >> $outport.expected
+    done
+}
+
+# lp11 and lp12 are on the same network and on
+# same hypervisors
+test_packet 11 f00000000012 f00000000011 1112 12
+test_packet 12 f00000000011 f00000000012 1211 11
+
+# lp21 and lp22 are on the same network and on
+# the same hypervisor
+test_packet 21 f00000000022 f00000000021 2122 22
+test_packet 22 f00000000021 f00000000022 2221 21
+
+# lp11 and lp22 are on different networks so should
+# not be able to communicate.
+test_packet 11 f00000000022 f00000000011 1122
+test_packet 22 f00000000011 f00000000022 2211
+
+echo "------ OVN dump ------"
+ovn-nbctl show
+ovn-sbctl show
+
+echo "------ hv1 dump ------"
+as hv1 ovs-vsctl show
+as hv1 ovs-ofctl -O OpenFlow13 dump-flows br-int
+
+echo "------ hv2 dump ------"
+as hv2 ovs-vsctl show
+as hv2 ovs-ofctl -O OpenFlow13 dump-flows br-int
+
+# Check that all datapath components are monitored as expected.
+AT_CHECK([cat hv1/ovn-controller.log | grep 'tunnel_key 1'], [0], [ignore-nolog])
+AT_CHECK([cat hv1/ovn-controller.log | grep 'tunnel_key 2'], [1], [ignore-nolog])
+AT_CHECK([cat hv2/ovn-controller.log | grep 'tunnel_key 2'], [0], [ignore-nolog])
+AT_CHECK([cat hv2/ovn-controller.log | grep 'tunnel_key 1'], [1], [ignore-nolog])
+
+dbg_instances1=`cat hv1/ovn-controller.log | grep 'tunnel_key 1' | wc -l`
+echo ls1 monitor traces on hv1 $dbg_instances1
+dbg_instances2=`cat hv1/ovn-controller.log | grep 'tunnel_key 2' | wc -l`
+echo ls2 monitor traces on hv1 $dbg_instances2
+AT_CHECK([test $dbg_instances1 -gt $dbg_instances2], [0], [ignore-nolog])
+
+dbg_instances1=`cat hv2/ovn-controller.log | grep 'tunnel_key 2' | wc -l`
+echo ls2 monitor traces on hv2 $dbg_instances1
+dbg_instances2=`cat hv2/ovn-controller.log | grep 'tunnel_key 1' | wc -l`
+echo ls1 monitor traces on hv2 $dbg_instances2
+AT_CHECK([test $dbg_instances1 -gt $dbg_instances2], [0], [ignore-nolog])
+
+# Now check the packets actually received against the ones expected.
+for i in 1 2; do
+    for j in 1 2; do
+        OVN_CHECK_PACKETS([hv$i/vif$i$j-tx.pcap], [$i$j.expected])
+    done
+done
+
+OVN_CLEANUP([hv1],[hv2])
+
+AT_CLEANUP
+
 AT_SETUP([ovn -- vtep: 3 HVs, 1 VIFs/HV, 1 GW, 1 LS])
 AT_KEYWORDS([vtep])
 AT_SKIP_IF([test $HAVE_PYTHON = no])
@@ -2439,6 +2574,223 @@  OVN_CLEANUP([hv1], [hv2], [hv3])
 
 AT_CLEANUP
 
+# 3 hypervisors, 3 logical switches with 3 logical ports each, 1 logical router
+AT_SETUP([ovn -- conditional monitoring: 3 HVs, 3 LS, 3 lports/LS, 1 LR])
+AT_SKIP_IF([test $HAVE_PYTHON = no])
+ovn_start
+
+# Logical network:
+#
+# Three logical switches ls1, ls2, ls3.
+# One logical router lr0 connected to ls2 and ls3 only,
+# with six subnets, three per connected logical switch:
+#
+#    lrp21 on ls2 for subnet 192.168.21.0/24
+#    lrp22 on ls2 for subnet 192.168.22.0/24
+#    lrp23 on ls2 for subnet 192.168.23.0/24
+#    ...
+#    lrp33 on ls3 for subnet 192.168.33.0/24
+#
+# 27 VIFs, 9 per LS, 3 per subnet: lp[123][123][123], where the first two
+# digits are the subnet and the last digit distinguishes the VIF.
+for i in 1 2 3; do
+    ovn-nbctl ls-add ls$i
+    for j in 1 2 3; do
+        for k in 1 2 3; do
+            ovn-nbctl \
+                -- lsp-add ls$i lp$i$j$k \
+                -- lsp-set-addresses lp$i$j$k "f0:00:00:00:0$i:$j$k \
+                   192.168.$i$j.$k"
+        done
+    done
+done
+
+# Only ls2 and ls3 are connected by the logical router.
+ovn-nbctl lr-add lr0
+for i in 2 3; do
+    for j in 1 2 3; do
+        ovn-nbctl lrp-add lr0 lrp$i$j 00:00:00:00:ff:$i$j 192.168.$i$j.254/24
+        ovn-nbctl \
+            -- lsp-add ls$i lrp$i$j-attachment \
+            -- set Logical_Switch_Port lrp$i$j-attachment type=router \
+                             options:router-port=lrp$i$j \
+                             addresses='"00:00:00:00:ff:'$i$j'"'
+    done
+done
+
+# Physical network:
+#
+# Three hypervisors hv[123].
+# lp[1??] on hv1
+# lp[2??] on hv2.
+# lp[3??] on hv3.
+
+# Given the name of a logical port, prints the name of the hypervisor
+# on which it is located.
+vif_to_hv() {
+    case $1 in dnl (
+        1??) echo 1 ;; dnl (
+        2??) echo 2 ;; dnl (
+        3??) echo 3 ;;
+    esac
+}
+
+# Given the name of a logical port, prints the name of its logical router
+# port, e.g. "vif_to_lrp 123" yields 12.
+vif_to_lrp() {
+    echo ${1%?}
+}
+
+# Given the name of a logical port, prints the name of its logical
+# switch, e.g. "vif_to_ls 123" yields 1.
+vif_to_ls() {
+    echo ${1%??}
+}
+
+net_add n1
+for i in 1 2 3; do
+    sim_add hv$i
+    as hv$i
+    ovs-vsctl add-br br-phys
+    ovn_attach n1 br-phys 192.168.0.$i
+    ovs-appctl -t ovn-controller vlog/set ANY:file:DBG
+done
+for i in 1 2 3; do
+    for j in 1 2 3; do
+        for k in 1 2 3; do
+            hv=`vif_to_hv $i$j$k`
+                as hv$hv ovs-vsctl \
+                -- add-port br-int vif$i$j$k \
+                -- set Interface vif$i$j$k \
+                    external-ids:iface-id=lp$i$j$k \
+                    options:tx_pcap=hv$hv/vif$i$j$k-tx.pcap \
+                    options:rxq_pcap=hv$hv/vif$i$j$k-rx.pcap \
+                    ofport-request=$i$j$k
+        done
+    done
+done
+
+# Pre-populate the hypervisors' ARP tables so that we don't lose any
+# packets for ARP resolution (native tunneling doesn't queue packets
+# for ARP resolution).
+ovn_populate_arp
+
+# Allow some time for ovn-northd and ovn-controller to catch up.
+# XXX This should be more systematic.
+sleep 1
+
+# test_ip INPORT SRC_MAC DST_MAC SRC_IP DST_IP OUTPORT...
+#
+# This shell function causes a packet to be received on INPORT.  The packet's
+# content has Ethernet destination DST and source SRC (each exactly 12 hex
+# digits) and Ethernet type ETHTYPE (4 hex digits).  The OUTPORTs (zero or
+# more) list the VIFs on which the packet should be received.  INPORT and the
+# OUTPORTs are specified as logical switch port numbers, e.g. 123 for vif123.
+for i in 1 2 3; do
+    for j in 1 2 3; do
+        for k in 1 2 3; do
+            : > $i$j$k.expected
+        done
+    done
+done
+test_ip() {
+    # This packet has bad checksums but logical L3 routing doesn't check.
+    local inport=$1 src_mac=$2 dst_mac=$3 src_ip=$4 dst_ip=$5
+    local packet=${dst_mac}${src_mac}08004500001c0000000040110000${src_ip}${dst_ip}0035111100080000
+    shift; shift; shift; shift; shift
+    hv=hv`vif_to_hv $inport`
+    as $hv ovs-appctl netdev-dummy/receive vif$inport $packet
+    in_ls=`vif_to_ls $inport`
+    in_lrp=`vif_to_lrp $inport`
+    for outport; do
+        out_ls=`vif_to_ls $outport`
+        if test $in_ls = $out_ls; then
+            # Ports on the same logical switch receive exactly the same packet.
+            echo $packet
+        else
+            # Routing decrements TTL and updates source and dest MAC
+            # (and checksum).
+            out_lrp=`vif_to_lrp $outport`
+            echo f00000000${outport}00000000ff${out_lrp}08004500001c00000000"3f1101"00${src_ip}${dst_ip}0035111100080000
+        fi >> $outport.expected
+    done
+}
+
+as hv1 ovn-sbctl list datapath_binding
+echo "----------------------------"
+as hv1 ovn-sbctl list port_binding
+echo "----------------------------"
+as hv1 ovn-sbctl dump-flows
+#as hv1 ovs-vsctl --columns=name,ofport list interface
+#as hv1 ovs-ofctl dump-flows br-int
+
+# Send IP packets between all pairs of source and destination ports:
+#
+# 1. Unicast IP packets are delivered to exactly one logical switch port
+#    (except that packets destined to their input ports are dropped).
+#
+ip_to_hex() {
+    printf "%02x%02x%02x%02x" "$@"
+}
+for is in 1 2 3; do
+  for js in 1 2 3; do
+    for ks in 1 2 3; do
+      s=$is$js$ks
+      smac=f00000000$s
+      sip=`ip_to_hex 192 168 $is$js $ks`
+      for id in 1 2 3; do
+          for jd in 1 2 3; do
+              for kd in 1 2 3; do
+                d=$id$jd$kd
+                dip=`ip_to_hex 192 168 $id$jd $kd`
+                if test $is = $id; then dmac=f00000000$d; else dmac=00000000ff$is$js; fi
+                if test $d != $s; then unicast=$d; else unicast=; fi
+                if test $is = 1; then if test $is != $id; then unicast=; fi; fi
+                if test $id = 1; then if test $is != $id; then unicast=; fi; fi
+                test_ip $s $smac $dmac $sip $dip $unicast
+              done
+          done
+        done
+      done
+  done
+done
+
+# Allow some time for packet forwarding.
+# XXX This can be improved.
+sleep 1
+
+ovn-sbctl -f csv -d bare --no-heading \
+    -- --columns=logical_port,ip,mac list mac_binding > mac_bindings
+
+# Now check the packets actually received against the ones expected.
+for i in 1 2 3; do
+    for j in 1 2 3; do
+        for k in 1 2 3; do
+            OVN_CHECK_PACKETS([hv`vif_to_hv $i$j$k`/vif$i$j$k-tx.pcap],
+                              [$i$j$k.expected])
+        done
+    done
+done
+
+# Check that all datapath components are monitored as expected.
+AT_CHECK([cat hv1/ovn-controller.log | grep 'tunnel_key 1'], [0], [ignore-nolog])
+AT_CHECK([cat hv1/ovn-controller.log | grep 'tunnel_key 2'], [1], [ignore-nolog])
+AT_CHECK([cat hv1/ovn-controller.log | grep 'tunnel_key 3'], [1], [ignore-nolog])
+AT_CHECK([cat hv1/ovn-controller.log | grep 'tunnel_key 4'], [1], [ignore-nolog])
+AT_CHECK([cat hv2/ovn-controller.log | grep 'tunnel_key 1'], [1], [ignore-nolog])
+AT_CHECK([cat hv2/ovn-controller.log | grep 'tunnel_key 2'], [0], [ignore-nolog])
+AT_CHECK([cat hv2/ovn-controller.log | grep 'tunnel_key 3'], [0], [ignore-nolog])
+AT_CHECK([cat hv2/ovn-controller.log | grep 'tunnel_key 4'], [0], [ignore-nolog])
+AT_CHECK([cat hv3/ovn-controller.log | grep 'tunnel_key 1'], [1], [ignore-nolog])
+AT_CHECK([cat hv3/ovn-controller.log | grep 'tunnel_key 2'], [0], [ignore-nolog])
+AT_CHECK([cat hv3/ovn-controller.log | grep 'tunnel_key 3'], [0], [ignore-nolog])
+AT_CHECK([cat hv3/ovn-controller.log | grep 'tunnel_key 4'], [0], [ignore-nolog])
+
+# Gracefully terminate daemons
+OVN_CLEANUP([hv1], [hv2], [hv3])
+
+AT_CLEANUP
+
 # 3 hypervisors, one logical switch, 3 logical ports per hypervisor
 AT_SETUP([ovn -- portsecurity : 3 HVs, 1 LS, 3 lports/HV])
 AT_SKIP_IF([test $HAVE_PYTHON = no])
@@ -2983,6 +3335,7 @@  ovs-vsctl -- add-port br-int vif2 -- \
     options:tx_pcap=hv1/vif2-tx.pcap \
     options:rxq_pcap=hv1/vif2-rx.pcap \
     ofport-request=1
+ovs-appctl -t ovn-controller vlog/set ANY:file:DBG
 
 
 # Allow some time for ovn-northd and ovn-controller to catch up.
@@ -3014,10 +3367,13 @@  echo "---------------------"
 echo "---------SB dump-----"
 ovn-sbctl list datapath_binding
 echo "---------------------"
+ovn-sbctl list port_binding
+echo "---------------------"
 ovn-sbctl list logical_flow
 echo "---------------------"
 
 echo "------ hv1 dump ----------"
+as hv1 ovs-ofctl show br-int
 as hv1 ovs-ofctl dump-flows br-int
 
 
@@ -3031,10 +3387,13 @@  sleep 1
 echo "---------SB dump-----"
 ovn-sbctl list datapath_binding
 echo "---------------------"
+ovn-sbctl list port_binding
+echo "---------------------"
 ovn-sbctl list logical_flow
 echo "---------------------"
 
 echo "------ hv1 dump ----------"
+as hv1 ovs-ofctl show br-int
 as hv1 ovs-ofctl dump-flows br-int
 
 as hv1 ovs-appctl netdev-dummy/receive vif1 $packet
@@ -4109,6 +4468,7 @@  ovs-vsctl -- add-port br-int hv1-vif1 -- \
     options:tx_pcap=hv1/vif1-tx.pcap \
     options:rxq_pcap=hv1/vif1-rx.pcap \
     ofport-request=1
+ovs-appctl -t ovn-controller vlog/set ANY:file:DBG
 
 
 sim_add hv2
@@ -4120,6 +4480,7 @@  ovs-vsctl -- add-port br-int hv2-vif1 -- \
     options:tx_pcap=hv2/vif1-tx.pcap \
     options:rxq_pcap=hv2/vif1-rx.pcap \
     ofport-request=1
+ovs-appctl -t ovn-controller vlog/set ANY:file:DBG
 
 # Pre-populate the hypervisors' ARP tables so that we don't lose any
 # packets for ARP resolution (native tunneling doesn't queue packets
@@ -4213,7 +4574,6 @@  src_ip=`ip_to_hex 192 168 1 2`
 dst_ip=`ip_to_hex 172 16 1 2`
 expected=${dst_mac}${src_mac}08004500001c000000003e110200${src_ip}${dst_ip}0035111100080000
 
-
 as hv1 ovs-appctl netdev-dummy/receive hv1-vif1 $packet
 as hv1 ovs-appctl ofproto/trace br-int in_port=1 $packet
 
@@ -4225,6 +4585,15 @@  as hv2 ovs-ofctl show br-int
 as hv2 ovs-ofctl dump-flows br-int
 echo "----------------------------"
 
+# tunnel key 2 represents the gateway router, hence hv1 does not need
+# to know about its ports and flows, since the transit switch acts as proxy.
+AT_CHECK([cat hv2/ovn-controller.log | grep 'tunnel_key 2'], [0], [ignore-nolog])
+AT_CHECK([cat hv1/ovn-controller.log | grep 'tunnel_key 2'], [1], [ignore-nolog])
+# tunnel key 4 represents a datapath (alice) on the far side of the gateway from
+# the perspective of hv1, hence hv1 does not need to know about its ports and flows.
+AT_CHECK([cat hv2/ovn-controller.log | grep 'tunnel_key 4'], [0], [ignore-nolog])
+AT_CHECK([cat hv1/ovn-controller.log | grep 'tunnel_key 4'], [1], [ignore-nolog])
+
 echo $expected > expected
 OVN_CHECK_PACKETS([hv2/vif1-tx.pcap], [expected])