
[ovs-dev] controller: Avoid unnecessary load balancer flow processing.

Message ID 20210707145627.23990-1-dceara@redhat.com
State Changes Requested
Headers show
Series [ovs-dev] controller: Avoid unnecessary load balancer flow processing.

Checks

Context Check Description
ovsrobot/apply-robot success apply and check: success
ovsrobot/github-robot-_Build_and_Test success github build: passed
ovsrobot/github-robot-_ovn-kubernetes success github build: passed

Commit Message

Dumitru Ceara July 7, 2021, 2:56 p.m. UTC
Whenever a Load_Balancer is updated, e.g., a VIP is added, the following
sequence of events happens:

1. The Southbound Load_Balancer record is updated.
2. The Southbound Datapath_Binding records on which the Load_Balancer is
   applied are updated.
3. Southbound ovsdb-server sends updates about the Load_Balancer and
   Datapath_Binding records to ovn-controller.
4. The IDL layer in ovn-controller processes the updates at #3, but
   because of the SB schema references between tables [0] all logical
   flows referencing the updated Datapath_Binding are marked as
   "updated".  The same is true for Logical_DP_Group records
   referencing the Datapath_Binding, and also for all logical flows
   pointing to the new "updated" datapath groups.
5. ovn-controller ends up recomputing (removing/readding) all flows for
   all these tracked updates.

From the SB Schema:
        "Datapath_Binding": {
            "columns": {
                [...]
                "load_balancers": {"type": {"key": {"type": "uuid",
                                                   "refTable": "Load_Balancer",
                                                   "refType": "weak"},
                                            "min": 0,
                                            "max": "unlimited"}},
        [...]
        "Load_Balancer": {
            "columns": {
                "datapaths": {
                [...]
                    "type": {"key": {"type": "uuid",
                                     "refTable": "Datapath_Binding"},
                             "min": 0, "max": "unlimited"}},
        [...]
        "Logical_DP_Group": {
            "columns": {
                "datapaths":
                    {"type": {"key": {"type": "uuid",
                                      "refTable": "Datapath_Binding",
                                      "refType": "weak"},
                              "min": 0, "max": "unlimited"}}},
        [...]
        "Logical_Flow": {
            "columns": {
                "logical_datapath":
                    {"type": {"key": {"type": "uuid",
                                      "refTable": "Datapath_Binding"},
                              "min": 0, "max": 1}},
                "logical_dp_group":
                    {"type": {"key": {"type": "uuid",
                                      "refTable": "Logical_DP_Group"},

In order to avoid this unnecessary Logical_Flow notification storm we
now remove the explicit reference from Datapath_Binding to
Load_Balancer and instead store raw UUIDs.

This means that on the ovn-controller side we need to perform a
Load_Balancer table lookup by UUID whenever a new datapath is added,
but that doesn't happen too often and the cost of the lookup is
negligible compared to the huge cost of processing the unnecessary
logical flow updates.

This change is backwards compatible because the contents stored in the
database are not changed; the schema constraints are just relaxed a
bit.

Some performance measurements: on a scale test deployment simulating an
ovn-kubernetes deployment with 120 nodes and a large load balancer
with 16K VIPs associated to each node's logical switch, the event
processing loop time in ovn-controller, when adding a new VIP, is
reduced from ~39 seconds to ~8 seconds.

There's no need to change the northd DDlog implementation.

Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1978605
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
---
 controller/lflow.c          |  6 ++++--
 controller/lflow.h          |  2 ++
 controller/ovn-controller.c |  2 ++
 lib/inc-proc-eng.h          |  1 +
 northd/ovn-northd.c         | 14 ++++++--------
 ovn-sb.ovsschema            |  6 ++----
 6 files changed, 17 insertions(+), 14 deletions(-)

Comments

Mark Michelson July 8, 2021, 8:10 p.m. UTC | #1
Acked-by: Mark Michelson <mmichels@redhat.com>


If I'm to extrapolate this a bit more, it sounds like any time a 
Datapath_Binding is updated, it results in the potential for many 
Logical_Flows to be re-calculated. In this particular case, it was 
triggered based on modifying a wide-reaching load balancer. I suppose 
the same thing would happen if Datapath_Binding external_ids were updated.

I wonder if it's worth investigating creating a new reference type for 
the purposes of the IDL. The idea would be that if a column in 
table A references rows in table B, and the rows in table B are 
updated, then any relevant rows in table A will be reported as updated as 
well. However, any columns in tables C, D, E, etc. that reference table 
A will not result in the rows in tables C, D, E, etc. being reported as 
updated.

Essentially, it's the same as what you've done here, except we can have 
the IDL fill in the sbrec_load_balancer in the Datapath_Binding record 
instead of having to look it up when needed.

Alternatively, would it be worth listing the relevant columns that a 
particular type "cares" about? For instance, Logical_Datapaths are only 
affected if the tunnel_key is updated on a Datapath_Binding. The 
load_balancers and external_ids are irrelevant. Would it be worthwhile 
to be able to note in the reference the columns that should elicit an 
update?

Is it worth it? Or is this such a niche use case that it's not worth 
implementing?

On 7/7/21 10:56 AM, Dumitru Ceara wrote:
> Whenever a Load_Balancer is updated, e.g., a VIP is added, the following
> sequence of events happens:
> 
> 1. The Southbound Load_Balancer record is updated.
> 2. The Southbound Datapath_Binding records on which the Load_Balancer is
>     applied are updated.
> 3. Southbound ovsdb-server sends updates about the Load_Balancer and
>     Datapath_Binding records to ovn-controller.
> 4. The IDL layer in ovn-controller processes the updates at #3, but
>     because of the SB schema references between tables [0] all logical
>     flows referencing the updated Datapath_Binding are marked as
>     "updated".  The same is true for Logical_DP_Group records
>     referencing the Datapath_Binding, and also for all logical flows
>     pointing to the new "updated" datapath groups.
> 5. ovn-controller ends up recomputing (removing/readding) all flows for
>     all these tracked updates.
> 
>  From the SB Schema:
>          "Datapath_Binding": {
>              "columns": {
>                  [...]
>                  "load_balancers": {"type": {"key": {"type": "uuid",
>                                                     "refTable": "Load_Balancer",
>                                                     "refType": "weak"},
>                                              "min": 0,
>                                              "max": "unlimited"}},
>          [...]
>          "Load_Balancer": {
>              "columns": {
>                  "datapaths": {
>                  [...]
>                      "type": {"key": {"type": "uuid",
>                                       "refTable": "Datapath_Binding"},
>                               "min": 0, "max": "unlimited"}},
>          [...]
>          "Logical_DP_Group": {
>              "columns": {
>                  "datapaths":
>                      {"type": {"key": {"type": "uuid",
>                                        "refTable": "Datapath_Binding",
>                                        "refType": "weak"},
>                                "min": 0, "max": "unlimited"}}},
>          [...]
>          "Logical_Flow": {
>              "columns": {
>                  "logical_datapath":
>                      {"type": {"key": {"type": "uuid",
>                                        "refTable": "Datapath_Binding"},
>                                "min": 0, "max": 1}},
>                  "logical_dp_group":
>                      {"type": {"key": {"type": "uuid",
>                                        "refTable": "Logical_DP_Group"},
> 
> In order to avoid this unnecessary Logical_Flow notification storm we
> now remove the explicit reference from Datapath_Binding to
> Load_Balancer and instead store raw UUIDs.
> 
> This means that on the ovn-controller side we need to perform a
> Load_Balancer table lookup by UUID whenever a new datapath is added,
> but that doesn't happen too often and the cost of the lookup is
> negligible compared to the huge cost of processing the unnecessary
> logical flow updates.
> 
> This change is backwards compatible because the contents stored in the
> database are not changed, just that the schema constraints are relaxed a
> bit.
> 
> Some performance measurements, on a scale test deployment simulating an
> ovn-kubernetes deployment with 120 nodes and a large load balancer
> with 16K VIPs associated to each node's logical switch, the event
> processing loop time in ovn-controller, when adding a new VIP, is
> reduced from ~39 seconds to ~8 seconds.
> 
> There's no need to change the northd DDlog implementation.
> 
> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1978605
> Signed-off-by: Dumitru Ceara <dceara@redhat.com>
> ---
>   controller/lflow.c          |  6 ++++--
>   controller/lflow.h          |  2 ++
>   controller/ovn-controller.c |  2 ++
>   lib/inc-proc-eng.h          |  1 +
>   northd/ovn-northd.c         | 14 ++++++--------
>   ovn-sb.ovsschema            |  6 ++----
>   6 files changed, 17 insertions(+), 14 deletions(-)
> 
> diff --git a/controller/lflow.c b/controller/lflow.c
> index 60aa011ff..877bd49b0 100644
> --- a/controller/lflow.c
> +++ b/controller/lflow.c
> @@ -1744,8 +1744,10 @@ lflow_processing_end:
>       /* Add load balancer hairpin flows if the datapath has any load balancers
>        * associated. */
>       for (size_t i = 0; i < dp->n_load_balancers; i++) {
> -        consider_lb_hairpin_flows(dp->load_balancers[i],
> -                                  l_ctx_in->local_datapaths,
> +        const struct sbrec_load_balancer *lb =
> +            sbrec_load_balancer_get_for_uuid(l_ctx_in->sb_idl,
> +                                             &dp->load_balancers[i]);
> +        consider_lb_hairpin_flows(lb, l_ctx_in->local_datapaths,
>                                     l_ctx_out->flow_table);
>       }
>   
> diff --git a/controller/lflow.h b/controller/lflow.h
> index c17ff6dd4..5ba7af894 100644
> --- a/controller/lflow.h
> +++ b/controller/lflow.h
> @@ -40,6 +40,7 @@
>   #include "openvswitch/list.h"
>   
>   struct ovn_extend_table;
> +struct ovsdb_idl;
>   struct ovsdb_idl_index;
>   struct ovn_desired_flow_table;
>   struct hmap;
> @@ -126,6 +127,7 @@ void lflow_resource_destroy(struct lflow_resource_ref *);
>   void lflow_resource_clear(struct lflow_resource_ref *);
>   
>   struct lflow_ctx_in {
> +    struct ovsdb_idl *sb_idl;
>       struct ovsdb_idl_index *sbrec_multicast_group_by_name_datapath;
>       struct ovsdb_idl_index *sbrec_logical_flow_by_logical_datapath;
>       struct ovsdb_idl_index *sbrec_logical_flow_by_logical_dp_group;
> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
> index 9050380f3..24443bad1 100644
> --- a/controller/ovn-controller.c
> +++ b/controller/ovn-controller.c
> @@ -2016,6 +2016,7 @@ init_lflow_ctx(struct engine_node *node,
>           engine_get_input_data("port_groups", node);
>       struct shash *port_groups = &pg_data->port_groups_cs_local;
>   
> +    l_ctx_in->sb_idl = engine_get_context()->ovnsb_idl;
>       l_ctx_in->sbrec_multicast_group_by_name_datapath =
>           sbrec_mc_group_by_name_dp;
>       l_ctx_in->sbrec_logical_flow_by_logical_datapath =
> @@ -3124,6 +3125,7 @@ main(int argc, char *argv[])
>           }
>   
>           struct engine_context eng_ctx = {
> +            .ovnsb_idl = ovnsb_idl_loop.idl,
>               .ovs_idl_txn = ovs_idl_txn,
>               .ovnsb_idl_txn = ovnsb_idl_txn,
>               .client_ctx = &ctrl_engine_ctx
> diff --git a/lib/inc-proc-eng.h b/lib/inc-proc-eng.h
> index 7e9f5bb70..506da51a1 100644
> --- a/lib/inc-proc-eng.h
> +++ b/lib/inc-proc-eng.h
> @@ -64,6 +64,7 @@
>   #define ENGINE_MAX_OVSDB_INDEX 256
>   
>   struct engine_context {
> +    struct ovsdb_idl *ovnsb_idl;
>       struct ovsdb_idl_txn *ovs_idl_txn;
>       struct ovsdb_idl_txn *ovnsb_idl_txn;
>       void *client_ctx;
> diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
> index 570c6a3ef..8649a590b 100644
> --- a/northd/ovn-northd.c
> +++ b/northd/ovn-northd.c
> @@ -3529,19 +3529,17 @@ build_ovn_lbs(struct northd_context *ctx, struct hmap *datapaths,
>               continue;
>           }
>   
> -        const struct sbrec_load_balancer **sbrec_lbs =
> -            xmalloc(od->nbs->n_load_balancer * sizeof *sbrec_lbs);
> +        struct uuid *lb_uuids =
> +            xmalloc(od->nbs->n_load_balancer * sizeof *lb_uuids);
>           for (size_t i = 0; i < od->nbs->n_load_balancer; i++) {
>               const struct uuid *lb_uuid =
>                   &od->nbs->load_balancer[i]->header_.uuid;
>               lb = ovn_northd_lb_find(lbs, lb_uuid);
> -            sbrec_lbs[i] = lb->slb;
> +            lb_uuids[i] = lb->slb->header_.uuid;
>           }
> -
> -        sbrec_datapath_binding_set_load_balancers(
> -            od->sb, (struct sbrec_load_balancer **)sbrec_lbs,
> -            od->nbs->n_load_balancer);
> -        free(sbrec_lbs);
> +        sbrec_datapath_binding_set_load_balancers(od->sb, lb_uuids,
> +                                                  od->nbs->n_load_balancer);
> +        free(lb_uuids);
>       }
>   }
>   
> diff --git a/ovn-sb.ovsschema b/ovn-sb.ovsschema
> index bbf60781d..57fbc3ca4 100644
> --- a/ovn-sb.ovsschema
> +++ b/ovn-sb.ovsschema
> @@ -1,7 +1,7 @@
>   {
>       "name": "OVN_Southbound",
>       "version": "20.18.0",
> -    "cksum": "1816525029 26536",
> +    "cksum": "779623696 26386",
>       "tables": {
>           "SB_Global": {
>               "columns": {
> @@ -166,9 +166,7 @@
>                        "type": {"key": {"type": "integer",
>                                         "minInteger": 1,
>                                         "maxInteger": 16777215}}},
> -                "load_balancers": {"type": {"key": {"type": "uuid",
> -                                                   "refTable": "Load_Balancer",
> -                                                   "refType": "weak"},
> +                "load_balancers": {"type": {"key": {"type": "uuid"},
>                                               "min": 0,
>                                               "max": "unlimited"}},
>                   "external_ids": {
>
Han Zhou July 9, 2021, 1:46 a.m. UTC | #2
On Thu, Jul 8, 2021 at 1:10 PM Mark Michelson <mmichels@redhat.com> wrote:
>
> Acked-by: Mark Michelson <mmichels@redhat.com>
>
Thanks Dumitru and Mark!

>
> If I'm to extrapolate this a bit more, it sounds like any time a
> Datapath_Binding is updated, it results in the potential for many
> Logical_Flows to be re-calculated. In this particular case, it was
> triggered based on modifying a wide-reaching load balancer. I suppose
> the same thing would happen if Datapath_Binding external_ids were updated.
>
Yes. Luckily it seems external_ids of datapath_binding shouldn't change
very often. But I agree this is a general problem of the I-P.

> I wonder if it's worth investigating trying to create a new reference
> type for the purposes of the IDL. The idea would be that if a column in
> table A references rows in table B, then if the rows in table B are
> updated, then any relevant rows in table A will reported as updated as
> well. However, any columns in tables C, D, E, etc. that reference table
> A will not result in the rows in tables C, D, E, etc. being reported as
> updated.
>
> Essentially, it's the same as what you've done here, except we can have
> the IDL fill in the sbrec_load_balancer in the Datapath_Binding record
> instead of having to look it up when needed.
>
It may be helpful to have a new reference type, but deciding when and where
to use it would be very tricky. If we are not careful enough, we might end
up creating a dependency in the code without getting proper change
notifications.

> Alternatively, would it be worth listing the relevant columns that a
> particular type "cares" about? For instance, Logical_Datapaths are only
> affected if the tunnel_key is updated on a Datapath_Binding. The
> load_balancers and external_ids are irrelevant. Would it be worthwhile
> to be able to note in the reference the columns that should elicit an
> update?

This sounds more reliable. Ideally, if the referenced columns are noted,
the generated IDL structure for A's reference to B should only contain
the columns that A is interested in. This way we can make sure it is
impossible to secretly use a column without being notified of updates
that may need to be handled by I-P. I am not sure how hard this would be.
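
As a purely hypothetical sketch (nothing like this exists in the IDL
today): if the schema could note that a reference only cares about
Datapath_Binding's tunnel_key, the generated code could expose a pruned
view instead of the full row:

    /* Hypothetical generated structure, for illustration only. */
    struct sbrec_datapath_binding_tkey_view {
        struct uuid row_uuid;  /* Identity of the referenced row. */
        int64_t tunnel_key;    /* The only column this reference uses. */
    };

Changes to columns outside the view (e.g., load_balancers or
external_ids) would then not mark the referencing rows as updated.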

>
> Is it worth it? Or is this such a niche use case that it's not worth
> implementing?

I think we can see if there are more such scenarios that justify the IDL
change. I am ok with changing the schema for the current use case to
replace the reference with a raw uuid. However, we should never directly
use the IDL from the ctx to access a table; this would break the I-P
dependency tracking. Please see more comments inlined.

>
> On 7/7/21 10:56 AM, Dumitru Ceara wrote:
> > Whenever a Load_Balancer is updated, e.g., a VIP is added, the following
> > sequence of events happens:
> >
> > 1. The Southbound Load_Balancer record is updated.
> > 2. The Southbound Datapath_Binding records on which the Load_Balancer is
> >     applied are updated.
> > 3. Southbound ovsdb-server sends updates about the Load_Balancer and
> >     Datapath_Binding records to ovn-controller.
> > 4. The IDL layer in ovn-controller processes the updates at #3, but
> >     because of the SB schema references between tables [0] all logical
> >     flows referencing the updated Datapath_Binding are marked as
> >     "updated".  The same is true for Logical_DP_Group records
> >     referencing the Datapath_Binding, and also for all logical flows
> >     pointing to the new "updated" datapath groups.
> > 5. ovn-controller ends up recomputing (removing/readding) all flows for
> >     all these tracked updates.
> >
> >  From the SB Schema:
> >          "Datapath_Binding": {
> >              "columns": {
> >                  [...]
> >                  "load_balancers": {"type": {"key": {"type": "uuid",
> >                                                     "refTable": "Load_Balancer",
> >                                                     "refType": "weak"},
> >                                              "min": 0,
> >                                              "max": "unlimited"}},
> >          [...]
> >          "Load_Balancer": {
> >              "columns": {
> >                  "datapaths": {
> >                  [...]
> >                      "type": {"key": {"type": "uuid",
> >                                       "refTable": "Datapath_Binding"},
> >                               "min": 0, "max": "unlimited"}},
> >          [...]
> >          "Logical_DP_Group": {
> >              "columns": {
> >                  "datapaths":
> >                      {"type": {"key": {"type": "uuid",
> >                                        "refTable": "Datapath_Binding",
> >                                        "refType": "weak"},
> >                                "min": 0, "max": "unlimited"}}},
> >          [...]
> >          "Logical_Flow": {
> >              "columns": {
> >                  "logical_datapath":
> >                      {"type": {"key": {"type": "uuid",
> >                                        "refTable": "Datapath_Binding"},
> >                                "min": 0, "max": 1}},
> >                  "logical_dp_group":
> >                      {"type": {"key": {"type": "uuid",
> >                                        "refTable": "Logical_DP_Group"},
> >
> > In order to avoid this unnecessary Logical_Flow notification storm we
> > now remove the explicit reference from Datapath_Binding to
> > Load_Balancer and instead store raw UUIDs.
> >
> > This means that on the ovn-controller side we need to perform a
> > Load_Balancer table lookup by UUID whenever a new datapath is added,
> > but that doesn't happen too often and the cost of the lookup is
> > negligible compared to the huge cost of processing the unnecessary
> > logical flow updates.
> >
> > This change is backwards compatible because the contents stored in the
> > database are not changed, just that the schema constraints are relaxed a
> > bit.
> >
> > Some performance measurements, on a scale test deployment simulating an
> > ovn-kubernetes deployment with 120 nodes and a large load balancer
> > with 16K VIPs associated to each node's logical switch, the event
> > processing loop time in ovn-controller, when adding a new VIP, is
> > reduced from ~39 seconds to ~8 seconds.
> >
> > There's no need to change the northd DDlog implementation.
> >
> > Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1978605
> > Signed-off-by: Dumitru Ceara <dceara@redhat.com>
> > ---
> >   controller/lflow.c          |  6 ++++--
> >   controller/lflow.h          |  2 ++
> >   controller/ovn-controller.c |  2 ++
> >   lib/inc-proc-eng.h          |  1 +
> >   northd/ovn-northd.c         | 14 ++++++--------
> >   ovn-sb.ovsschema            |  6 ++----
> >   6 files changed, 17 insertions(+), 14 deletions(-)
> >
> > diff --git a/controller/lflow.c b/controller/lflow.c
> > index 60aa011ff..877bd49b0 100644
> > --- a/controller/lflow.c
> > +++ b/controller/lflow.c
> > @@ -1744,8 +1744,10 @@ lflow_processing_end:
> >       /* Add load balancer hairpin flows if the datapath has any load balancers
> >        * associated. */
> >       for (size_t i = 0; i < dp->n_load_balancers; i++) {
> > -        consider_lb_hairpin_flows(dp->load_balancers[i],
> > -                                  l_ctx_in->local_datapaths,
> > +        const struct sbrec_load_balancer *lb =
> > +            sbrec_load_balancer_get_for_uuid(l_ctx_in->sb_idl,
> > +                                             &dp->load_balancers[i]);

We should never directly access DB tables this way within the I-P engine.
We will lose track of the dependencies. Now that we removed the reference,
we should add the Load_Balancer table as a new dependency, get the table
data through EN_OVSDB_GET, and explicitly add a change handler to avoid a
recompute.
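
As a sketch of the suggested wiring, following the usual inc-proc-eng
conventions (the node and handler names here are illustrative):

    /* Declare the dependency once, at engine setup time: */
    engine_add_input(&en_lflow_output, &en_sb_load_balancer,
                     lflow_output_sb_load_balancer_handler);

    /* ...and fetch the table from the engine, not from the raw IDL, so
     * that dependency tracking stays intact: */
    struct sbrec_load_balancer_table *lb_table =
        (struct sbrec_load_balancer_table *)EN_OVSDB_GET(
            engine_get_input("SB_load_balancer", node));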

> > +        consider_lb_hairpin_flows(lb, l_ctx_in->local_datapaths,
> >                                     l_ctx_out->flow_table);
> >       }
> >
> > diff --git a/controller/lflow.h b/controller/lflow.h
> > index c17ff6dd4..5ba7af894 100644
> > --- a/controller/lflow.h
> > +++ b/controller/lflow.h
> > @@ -40,6 +40,7 @@
> >   #include "openvswitch/list.h"
> >
> >   struct ovn_extend_table;
> > +struct ovsdb_idl;
> >   struct ovsdb_idl_index;
> >   struct ovn_desired_flow_table;
> >   struct hmap;
> > @@ -126,6 +127,7 @@ void lflow_resource_destroy(struct lflow_resource_ref *);
> >   void lflow_resource_clear(struct lflow_resource_ref *);
> >
> >   struct lflow_ctx_in {
> > +    struct ovsdb_idl *sb_idl;
> >       struct ovsdb_idl_index *sbrec_multicast_group_by_name_datapath;
> >       struct ovsdb_idl_index *sbrec_logical_flow_by_logical_datapath;
> >       struct ovsdb_idl_index *sbrec_logical_flow_by_logical_dp_group;
> > diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
> > index 9050380f3..24443bad1 100644
> > --- a/controller/ovn-controller.c
> > +++ b/controller/ovn-controller.c
> > @@ -2016,6 +2016,7 @@ init_lflow_ctx(struct engine_node *node,
> >           engine_get_input_data("port_groups", node);
> >       struct shash *port_groups = &pg_data->port_groups_cs_local;
> >
> > +    l_ctx_in->sb_idl = engine_get_context()->ovnsb_idl;
> >       l_ctx_in->sbrec_multicast_group_by_name_datapath =
> >           sbrec_mc_group_by_name_dp;
> >       l_ctx_in->sbrec_logical_flow_by_logical_datapath =
> > @@ -3124,6 +3125,7 @@ main(int argc, char *argv[])
> >           }
> >
> >           struct engine_context eng_ctx = {
> > +            .ovnsb_idl = ovnsb_idl_loop.idl,
> >               .ovs_idl_txn = ovs_idl_txn,
> >               .ovnsb_idl_txn = ovnsb_idl_txn,
> >               .client_ctx = &ctrl_engine_ctx
> > diff --git a/lib/inc-proc-eng.h b/lib/inc-proc-eng.h
> > index 7e9f5bb70..506da51a1 100644
> > --- a/lib/inc-proc-eng.h
> > +++ b/lib/inc-proc-eng.h
> > @@ -64,6 +64,7 @@
> >   #define ENGINE_MAX_OVSDB_INDEX 256
> >
> >   struct engine_context {
> > +    struct ovsdb_idl *ovnsb_idl;
> >       struct ovsdb_idl_txn *ovs_idl_txn;
> >       struct ovsdb_idl_txn *ovnsb_idl_txn;
> >       void *client_ctx;
> > diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
> > index 570c6a3ef..8649a590b 100644
> > --- a/northd/ovn-northd.c
> > +++ b/northd/ovn-northd.c
> > @@ -3529,19 +3529,17 @@ build_ovn_lbs(struct northd_context *ctx, struct hmap *datapaths,
> >               continue;
> >           }
> >
> > -        const struct sbrec_load_balancer **sbrec_lbs =
> > -            xmalloc(od->nbs->n_load_balancer * sizeof *sbrec_lbs);
> > +        struct uuid *lb_uuids =
> > +            xmalloc(od->nbs->n_load_balancer * sizeof *lb_uuids);
> >           for (size_t i = 0; i < od->nbs->n_load_balancer; i++) {
> >               const struct uuid *lb_uuid =
> >                   &od->nbs->load_balancer[i]->header_.uuid;
> >               lb = ovn_northd_lb_find(lbs, lb_uuid);
> > -            sbrec_lbs[i] = lb->slb;
> > +            lb_uuids[i] = lb->slb->header_.uuid;
> >           }
> > -
> > -        sbrec_datapath_binding_set_load_balancers(
> > -            od->sb, (struct sbrec_load_balancer **)sbrec_lbs,
> > -            od->nbs->n_load_balancer);
> > -        free(sbrec_lbs);
> > +        sbrec_datapath_binding_set_load_balancers(od->sb, lb_uuids,
> > +                                                  od->nbs->n_load_balancer);
> > +        free(lb_uuids);

I believe we need to change the deletion part as well. Now that we don't
have the weak reference, we need to take care of LB deletions by removing
the reference explicitly from the Datapath_Binding table.

Thanks,
Han


> >       }
> >   }
> >
> > diff --git a/ovn-sb.ovsschema b/ovn-sb.ovsschema
> > index bbf60781d..57fbc3ca4 100644
> > --- a/ovn-sb.ovsschema
> > +++ b/ovn-sb.ovsschema
> > @@ -1,7 +1,7 @@
> >   {
> >       "name": "OVN_Southbound",
> >       "version": "20.18.0",
> > -    "cksum": "1816525029 26536",
> > +    "cksum": "779623696 26386",
> >       "tables": {
> >           "SB_Global": {
> >               "columns": {
> > @@ -166,9 +166,7 @@
> >                        "type": {"key": {"type": "integer",
> >                                         "minInteger": 1,
> >                                         "maxInteger": 16777215}}},
> > -                "load_balancers": {"type": {"key": {"type": "uuid",
> > -                                                   "refTable": "Load_Balancer",
> > -                                                   "refType": "weak"},
> > +                "load_balancers": {"type": {"key": {"type": "uuid"},
> >                                               "min": 0,
> >                                               "max": "unlimited"}},
> >                   "external_ids": {
> >
>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Dumitru Ceara July 9, 2021, 8:01 a.m. UTC | #3
On 7/9/21 3:46 AM, Han Zhou wrote:
> On Thu, Jul 8, 2021 at 1:10 PM Mark Michelson <mmichels@redhat.com> wrote:
>>
>> Acked-by: Mark Michelson <mmichels@redhat.com>
>>
> Thanks Dumitru and Mark!
> 

Hi Mark and Han,

Thanks for the reviews!

>>
>> If I'm to extrapolate this a bit more, it sounds like any time a
>> Datapath_Binding is updated, it results in the potential for many
>> Logical_Flows to be re-calculated. In this particular case, it was
>> triggered based on modifying a wide-reaching load balancer. I suppose
>> the same thing would happen if Datapath_Binding external_ids were updated.
>>
> Yes. Luckily it seems external_ids of datapath_binding shouldn't change
> very often. But I agree this is a general problem of the I-P.
> 

I see it the same way.  I-P doesn't give per-column change tracking
information.

>> I wonder if it's worth investigating trying to create a new reference
>> type for the purposes of the IDL. The idea would be that if a column in
>> table A references rows in table B, then if the rows in table B are
>> updated, then any relevant rows in table A will reported as updated as
>> well. However, any columns in tables C, D, E, etc. that reference table
>> A will not result in the rows in tables C, D, E, etc. being reported as
>> updated.
>>
>> Essentially, it's the same as what you've done here, except we can have
>> the IDL fill in the sbrec_load_balancer in the Datapath_Binding record
>> instead of having to look it up when needed.
>>
> It may be helpful to have a new reference type, but deciding when and where
> to use it would be very tricky. If not careful enough, we might be creating
> some dependency in the code without getting proper change notification.
> 
>> Alternatively, would it be worth listing the relevant columns that a
>> particular type "cares" about? For instance, Logical_Datapaths are only
>> affected if the tunnel_key is updated on a Datapath_Binding. The
>> load_balancers and external_ids are irrelevant. Would it be worthwhile
>> to be able to note in the reference the columns that should elicit an
>> update?
> 
> This sounds more reliable. Ideally, if referenced columns are noted, the
> generated IDL structure of A's field B should only contains the columns
> that are interested by A. This way we can make sure it is impossible to
> secretly use a column without being notified of updates that possibly need
> to be handled by I-P. I am not sure how hard this would be.
> 

I think this is part of a larger discussion (some were started a while
back) about refactoring ovn-controller in general.  As far as I've seen
there are multiple options:

- keep using the IDL and enhance it (and the schema), as suggested above
- stop using the IDL, use ovsdb-cs directly instead and implement a new
type of incremental processing in ovn-controller.  This can be either
ddlog on top of ovsdb-cs (like ovn-northd-ddlog does), or something else.

>>
>> Is it worth it? Or is this such a niche use case that it's not worth
>> implementing?
> 
> I think we can see if there are more such scenarios that justify the IDL
> change. I am ok with changing the schema for the current use case to
> replace the reference by just raw uuid. However, we should never directly
> use IDL from the ctx and access any table, this would break the I-P
> dependency tracking. Please see more comments inlined.
> 

In the general case I tend to agree, but in this specific case
Datapath_Binding doesn't really depend on Load_Balancer; it just uses
those UUIDs as hints.  Please see my replies inline.

>>
>> On 7/7/21 10:56 AM, Dumitru Ceara wrote:
>>> Whenever a Load_Balancer is updated, e.g., a VIP is added, the following
>>> sequence of events happens:
>>>
>>> 1. The Southbound Load_Balancer record is updated.
>>> 2. The Southbound Datapath_Binding records on which the Load_Balancer is
>>>     applied are updated.
>>> 3. Southbound ovsdb-server sends updates about the Load_Balancer and
>>>     Datapath_Binding records to ovn-controller.
>>> 4. The IDL layer in ovn-controller processes the updates at #3, but
>>>     because of the SB schema references between tables [0] all logical
>>>     flows referencing the updated Datapath_Binding are marked as
>>>     "updated".  The same is true for Logical_DP_Group records
>>>     referencing the Datapath_Binding, and also for all logical flows
>>>     pointing to the new "updated" datapath groups.
>>> 5. ovn-controller ends up recomputing (removing/readding) all flows for
>>>     all these tracked updates.
>>>
>>>  From the SB Schema:
>>>          "Datapath_Binding": {
>>>              "columns": {
>>>                  [...]
>>>                  "load_balancers": {"type": {"key": {"type": "uuid",
>>>                                                     "refTable": "Load_Balancer",
>>>                                                     "refType": "weak"},
>>>                                              "min": 0,
>>>                                              "max": "unlimited"}},
>>>          [...]
>>>          "Load_Balancer": {
>>>              "columns": {
>>>                  "datapaths": {
>>>                  [...]
>>>                      "type": {"key": {"type": "uuid",
>>>                                       "refTable": "Datapath_Binding"},
>>>                               "min": 0, "max": "unlimited"}},
>>>          [...]
>>>          "Logical_DP_Group": {
>>>              "columns": {
>>>                  "datapaths":
>>>                      {"type": {"key": {"type": "uuid",
>>>                                        "refTable": "Datapath_Binding",
>>>                                        "refType": "weak"},
>>>                                "min": 0, "max": "unlimited"}}},
>>>          [...]
>>>          "Logical_Flow": {
>>>              "columns": {
>>>                  "logical_datapath":
>>>                      {"type": {"key": {"type": "uuid",
>>>                                        "refTable": "Datapath_Binding"},
>>>                                "min": 0, "max": 1}},
>>>                  "logical_dp_group":
>>>                      {"type": {"key": {"type": "uuid",
>>>                                        "refTable": "Logical_DP_Group"},
>>>
>>> In order to avoid this unnecessary Logical_Flow notification storm we
>>> now remove the explicit reference from Datapath_Binding to
>>> Load_Balancer and instead store raw UUIDs.
>>>
>>> This means that on the ovn-controller side we need to perform a
>>> Load_Balancer table lookup by UUID whenever a new datapath is added,
>>> but that doesn't happen too often and the cost of the lookup is
>>> negligible compared to the huge cost of processing the unnecessary
>>> logical flow updates.
>>>
>>> This change is backwards compatible because the contents stored in the
>>> database are not changed, just that the schema constraints are relaxed a
>>> bit.
>>>
>>> Some performance measurements, on a scale test deployment simulating an
>>> ovn-kubernetes deployment with 120 nodes and a large load balancer
>>> with 16K VIPs associated to each node's logical switch, the event
>>> processing loop time in ovn-controller, when adding a new VIP, is
>>> reduced from ~39 seconds to ~8 seconds.
>>>
>>> There's no need to change the northd DDlog implementation.
>>>
>>> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1978605
>>> Signed-off-by: Dumitru Ceara <dceara@redhat.com>
>>> ---
>>>   controller/lflow.c          |  6 ++++--
>>>   controller/lflow.h          |  2 ++
>>>   controller/ovn-controller.c |  2 ++
>>>   lib/inc-proc-eng.h          |  1 +
>>>   northd/ovn-northd.c         | 14 ++++++--------
>>>   ovn-sb.ovsschema            |  6 ++----
>>>   6 files changed, 17 insertions(+), 14 deletions(-)
>>>
>>> diff --git a/controller/lflow.c b/controller/lflow.c
>>> index 60aa011ff..877bd49b0 100644
>>> --- a/controller/lflow.c
>>> +++ b/controller/lflow.c
>>> @@ -1744,8 +1744,10 @@ lflow_processing_end:
>>>       /* Add load balancer hairpin flows if the datapath has any load balancers
>>>        * associated. */
>>>       for (size_t i = 0; i < dp->n_load_balancers; i++) {
>>> -        consider_lb_hairpin_flows(dp->load_balancers[i],
>>> -                                  l_ctx_in->local_datapaths,
>>> +        const struct sbrec_load_balancer *lb =
>>> +            sbrec_load_balancer_get_for_uuid(l_ctx_in->sb_idl,
>>> +                                             &dp->load_balancers[i]);
> 
> We should never directly access DB tables this way within the I-P engine.
> We will lose track of the dependencies. Now that we removed the reference,
> we should add load_balancer table as a new dependency, get the table data
> through EN_OVSDB_GET, and explicitly add a change handler to avoid
> recompute.
> 

There's some confusion here.

Datapath doesn't really depend on Load_Balancer.  From the beginning
there shouldn't have been a reference from Datapath_Binding to
Load_Balancer.

As a matter of fact, NB.Load_Balancer used to be handled completely by
ovn-northd, which would generate the logical flows.  So the real
dependency is that "en_lflow_output" depends on SB load balancers.

We already have an I-P handler for that:
lflow_output_sb_load_balancer_handler()

We don't *really* need to store load_balancer references in
Datapath_Binding.  The only reason they were added there when
Load_Balancer flow generation was moved to ovn-controller was to avoid
walking all load balancers whenever a datapath was added.

However, when a load balancer is changed there's no change to be handled
for datapaths themselves.

As a matter of fact, an alternative implementation (that obviously
doesn't scale well) is:

const struct sbrec_load_balancer *lb;
SBREC_LOAD_BALANCER_TABLE_FOR_EACH (lb, l_ctx_in->lb_table) {
    for (size_t i = 0; i < lb->n_datapaths; i++) {
        if (lb->datapaths[i] == dp) {
            consider_lb_hairpin_flows(lb, ...);
            break;
        }
    }
}

To avoid this potentially expensive table walk, we use the load_balancer
uuids stored in the datapath record itself (it's probably best to see
those as hints I guess).

>>> +        consider_lb_hairpin_flows(lb, l_ctx_in->local_datapaths,
>>>                                     l_ctx_out->flow_table);
>>>       }
>>>
>>> diff --git a/controller/lflow.h b/controller/lflow.h
>>> index c17ff6dd4..5ba7af894 100644
>>> --- a/controller/lflow.h
>>> +++ b/controller/lflow.h
>>> @@ -40,6 +40,7 @@
>>>   #include "openvswitch/list.h"
>>>
>>>   struct ovn_extend_table;
>>> +struct ovsdb_idl;
>>>   struct ovsdb_idl_index;
>>>   struct ovn_desired_flow_table;
>>>   struct hmap;
>>> @@ -126,6 +127,7 @@ void lflow_resource_destroy(struct lflow_resource_ref *);
>>>   void lflow_resource_clear(struct lflow_resource_ref *);
>>>
>>>   struct lflow_ctx_in {
>>> +    struct ovsdb_idl *sb_idl;
>>>       struct ovsdb_idl_index *sbrec_multicast_group_by_name_datapath;
>>>       struct ovsdb_idl_index *sbrec_logical_flow_by_logical_datapath;
>>>       struct ovsdb_idl_index *sbrec_logical_flow_by_logical_dp_group;
>>> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
>>> index 9050380f3..24443bad1 100644
>>> --- a/controller/ovn-controller.c
>>> +++ b/controller/ovn-controller.c
>>> @@ -2016,6 +2016,7 @@ init_lflow_ctx(struct engine_node *node,
>>>           engine_get_input_data("port_groups", node);
>>>       struct shash *port_groups = &pg_data->port_groups_cs_local;
>>>
>>> +    l_ctx_in->sb_idl = engine_get_context()->ovnsb_idl;
>>>       l_ctx_in->sbrec_multicast_group_by_name_datapath =
>>>           sbrec_mc_group_by_name_dp;
>>>       l_ctx_in->sbrec_logical_flow_by_logical_datapath =
>>> @@ -3124,6 +3125,7 @@ main(int argc, char *argv[])
>>>           }
>>>
>>>           struct engine_context eng_ctx = {
>>> +            .ovnsb_idl = ovnsb_idl_loop.idl,
>>>               .ovs_idl_txn = ovs_idl_txn,
>>>               .ovnsb_idl_txn = ovnsb_idl_txn,
>>>               .client_ctx = &ctrl_engine_ctx
>>> diff --git a/lib/inc-proc-eng.h b/lib/inc-proc-eng.h
>>> index 7e9f5bb70..506da51a1 100644
>>> --- a/lib/inc-proc-eng.h
>>> +++ b/lib/inc-proc-eng.h
>>> @@ -64,6 +64,7 @@
>>>   #define ENGINE_MAX_OVSDB_INDEX 256
>>>
>>>   struct engine_context {
>>> +    struct ovsdb_idl *ovnsb_idl;
>>>       struct ovsdb_idl_txn *ovs_idl_txn;
>>>       struct ovsdb_idl_txn *ovnsb_idl_txn;
>>>       void *client_ctx;
>>> diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
>>> index 570c6a3ef..8649a590b 100644
>>> --- a/northd/ovn-northd.c
>>> +++ b/northd/ovn-northd.c
>>> @@ -3529,19 +3529,17 @@ build_ovn_lbs(struct northd_context *ctx, struct hmap *datapaths,
>>>               continue;
>>>           }
>>>
>>> -        const struct sbrec_load_balancer **sbrec_lbs =
>>> -            xmalloc(od->nbs->n_load_balancer * sizeof *sbrec_lbs);
>>> +        struct uuid *lb_uuids =
>>> +            xmalloc(od->nbs->n_load_balancer * sizeof *lb_uuids);
>>>           for (size_t i = 0; i < od->nbs->n_load_balancer; i++) {
>>>               const struct uuid *lb_uuid =
>>>                   &od->nbs->load_balancer[i]->header_.uuid;
>>>               lb = ovn_northd_lb_find(lbs, lb_uuid);
>>> -            sbrec_lbs[i] = lb->slb;
>>> +            lb_uuids[i] = lb->slb->header_.uuid;
>>>           }
>>> -
>>> -        sbrec_datapath_binding_set_load_balancers(
>>> -            od->sb, (struct sbrec_load_balancer **)sbrec_lbs,
>>> -            od->nbs->n_load_balancer);
>>> -        free(sbrec_lbs);
>>> +        sbrec_datapath_binding_set_load_balancers(od->sb, lb_uuids,
>>> +                                                  od->nbs->n_load_balancer);
>>> +        free(lb_uuids);
> 
> I believe we need to change the deletion part as well. Now that we don't
> have the weak reference, we need to take care of the LB deletions by
> removing the reference explicitly from table datapath_binding.

This handles both addition and deletion; it just stores the list of load
balancers that are currently associated with the logical datapath.

If a load balancer was deleted, it won't be in the list.

We already have tests validating this works fine:

https://github.com/ovn-org/ovn/blob/ecc941c70f1675b66f6ecb2cdf09df362f62c97a/tests/ovn-northd.at#L2432

> 
> Thanks,
> Han
> 
> 

Thanks,
Dumitru

>>>       }
>>>   }
>>>
>>> diff --git a/ovn-sb.ovsschema b/ovn-sb.ovsschema
>>> index bbf60781d..57fbc3ca4 100644
>>> --- a/ovn-sb.ovsschema
>>> +++ b/ovn-sb.ovsschema
>>> @@ -1,7 +1,7 @@
>>>   {
>>>       "name": "OVN_Southbound",
>>>       "version": "20.18.0",
>>> -    "cksum": "1816525029 26536",
>>> +    "cksum": "779623696 26386",
>>>       "tables": {
>>>           "SB_Global": {
>>>               "columns": {
>>> @@ -166,9 +166,7 @@
>>>                        "type": {"key": {"type": "integer",
>>>                                         "minInteger": 1,
>>>                                         "maxInteger": 16777215}}},
>>> -                "load_balancers": {"type": {"key": {"type": "uuid",
>>> -                                                   "refTable": "Load_Balancer",
>>> -                                                   "refType": "weak"},
>>> +                "load_balancers": {"type": {"key": {"type": "uuid"},
>>>                                               "min": 0,
>>>                                               "max": "unlimited"}},
>>>                   "external_ids": {
>>>
>>
>> _______________________________________________
>> dev mailing list
>> dev@openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
Han Zhou July 9, 2021, 4:11 p.m. UTC | #4
On Fri, Jul 9, 2021 at 1:01 AM Dumitru Ceara <dceara@redhat.com> wrote:
>
> On 7/9/21 3:46 AM, Han Zhou wrote:
> > On Thu, Jul 8, 2021 at 1:10 PM Mark Michelson <mmichels@redhat.com> wrote:
> >>
> >> Acked-by: Mark Michelson <mmichels@redhat.com>
> >>
> > Thanks Dumitru and Mark!
> >
>
> Hi Mark and Han,
>
> Thanks for the reviews!
>
> >>
> >> If I'm to extrapolate this a bit more, it sounds like any time a
> >> Datapath_Binding is updated, it results in the potential for many
> >> Logical_Flows to be re-calculated. In this particular case, it was
> >> triggered based on modifying a wide-reaching load balancer. I suppose
> >> the same thing would happen if Datapath_Binding external_ids were updated.
> >>
> > Yes. Luckily it seems external_ids of datapath_binding shouldn't change
> > very often. But I agree this is a general problem of the I-P.
> >
>
> I see it in the same way.  I-P doesn't give per column change tracking
> information.
>
> >> I wonder if it's worth investigating trying to create a new reference
> >> type for the purposes of the IDL. The idea would be that if a column in
> >> table A references rows in table B, then if the rows in table B are
> >> updated, then any relevant rows in table A will reported as updated as
> >> well. However, any columns in tables C, D, E, etc. that reference table
> >> A will not result in the rows in tables C, D, E, etc. being reported as
> >> updated.
> >>
> >> Essentially, it's the same as what you've done here, except we can have
> >> the IDL fill in the sbrec_load_balancer in the Datapath_Binding record
> >> instead of having to look it up when needed.
> >>
> > It may be helpful to have a new reference type, but deciding when and where
> > to use it would be very tricky. If not careful enough, we might be creating
> > some dependency in the code without getting proper change notification.
> >
> >> Alternatively, would it be worth listing the relevant columns that a
> >> particular type "cares" about? For instance, Logical_Datapaths are only
> >> affected if the tunnel_key is updated on a Datapath_Binding. The
> >> load_balancers and external_ids are irrelevant. Would it be worthwhile
> >> to be able to note in the reference the columns that should elicit an
> >> update?
> >
> > This sounds more reliable. Ideally, if referenced columns are noted, the
> > generated IDL structure of A's field B should only contains the columns
> > that are interested by A. This way we can make sure it is impossible to
> > secretly use a column without being notified of updates that possibly need
> > to be handled by I-P. I am not sure how hard this would be.
> >
>
> I think this is part of a larger discussion (there were some started a
> while back) about refactoring ovn-controller in general.  As far as I've
> seen there are multiple options:
>
> - keep using the IDL and enhance it (and the schema), suggested above
> - stop using IDL, use ovsdb-cs directly instead and implement a new type
> of incremental processing in ovn-controller.  This can be either ddlog
> on top of ovsdb-cs (like ovn-northd-ddlog does), or something else.
>
> >>
> >> Is it worth it? Or is this such a niche use case that it's not worth
> >> implementing?
> >
> > I think we can see if there are more such scenarios that justify the IDL
> > change. I am ok with changing the schema for the current use case to
> > replace the reference by just raw uuid. However, we should never directly
> > use IDL from the ctx and access any table, this would break the I-P
> > dependency tracking. Please see more comments inlined.
> >
>
> In the general case I tend to agree but in this specific case
> Datapath_Binding doesn't really depend on Load_Balancer, it just uses
> those uuids as hints.  Please see my replies inline.
>
> >>
> >> On 7/7/21 10:56 AM, Dumitru Ceara wrote:
> >>> Whenever a Load_Balancer is updated, e.g., a VIP is added, the following
> >>> sequence of events happens:
> >>>
> >>> 1. The Southbound Load_Balancer record is updated.
> >>> 2. The Southbound Datapath_Binding records on which the Load_Balancer is
> >>>     applied are updated.
> >>> 3. Southbound ovsdb-server sends updates about the Load_Balancer and
> >>>     Datapath_Binding records to ovn-controller.
> >>> 4. The IDL layer in ovn-controller processes the updates at #3, but
> >>>     because of the SB schema references between tables [0] all logical
> >>>     flows referencing the updated Datapath_Binding are marked as
> >>>     "updated".  The same is true for Logical_DP_Group records
> >>>     referencing the Datapath_Binding, and also for all logical flows
> >>>     pointing to the new "updated" datapath groups.
> >>> 5. ovn-controller ends up recomputing (removing/readding) all flows for
> >>>     all these tracked updates.
> >>>
> >>>  From the SB Schema:
> >>>          "Datapath_Binding": {
> >>>              "columns": {
> >>>                  [...]
> >>>                  "load_balancers": {"type": {"key": {"type": "uuid",
> >>>                                                     "refTable": "Load_Balancer",
> >>>                                                     "refType": "weak"},
> >>>                                              "min": 0,
> >>>                                              "max": "unlimited"}},
> >>>          [...]
> >>>          "Load_Balancer": {
> >>>              "columns": {
> >>>                  "datapaths": {
> >>>                  [...]
> >>>                      "type": {"key": {"type": "uuid",
> >>>                                       "refTable": "Datapath_Binding"},
> >>>                               "min": 0, "max": "unlimited"}},
> >>>          [...]
> >>>          "Logical_DP_Group": {
> >>>              "columns": {
> >>>                  "datapaths":
> >>>                      {"type": {"key": {"type": "uuid",
> >>>                                        "refTable": "Datapath_Binding",
> >>>                                        "refType": "weak"},
> >>>                                "min": 0, "max": "unlimited"}}},
> >>>          [...]
> >>>          "Logical_Flow": {
> >>>              "columns": {
> >>>                  "logical_datapath":
> >>>                      {"type": {"key": {"type": "uuid",
> >>>                                        "refTable": "Datapath_Binding"},
> >>>                                "min": 0, "max": 1}},
> >>>                  "logical_dp_group":
> >>>                      {"type": {"key": {"type": "uuid",
> >>>                                        "refTable": "Logical_DP_Group"},
> >>>
> >>> In order to avoid this unnecessary Logical_Flow notification storm we
> >>> now remove the explicit reference from Datapath_Binding to
> >>> Load_Balancer and instead store raw UUIDs.
> >>>
> >>> This means that on the ovn-controller side we need to perform a
> >>> Load_Balancer table lookup by UUID whenever a new datapath is added,
> >>> but that doesn't happen too often and the cost of the lookup is
> >>> negligible compared to the huge cost of processing the unnecessary
> >>> logical flow updates.
> >>>
> >>> This change is backwards compatible because the contents stored in the
> >>> database are not changed, just that the schema constraints are relaxed a
> >>> bit.
> >>>
> >>> Some performance measurements, on a scale test deployment simulating an
> >>> ovn-kubernetes deployment with 120 nodes and a large load balancer
> >>> with 16K VIPs associated to each node's logical switch, the event
> >>> processing loop time in ovn-controller, when adding a new VIP, is
> >>> reduced from ~39 seconds to ~8 seconds.
> >>>
> >>> There's no need to change the northd DDlog implementation.
> >>>
> >>> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1978605
> >>> Signed-off-by: Dumitru Ceara <dceara@redhat.com>
> >>> ---
> >>>   controller/lflow.c          |  6 ++++--
> >>>   controller/lflow.h          |  2 ++
> >>>   controller/ovn-controller.c |  2 ++
> >>>   lib/inc-proc-eng.h          |  1 +
> >>>   northd/ovn-northd.c         | 14 ++++++--------
> >>>   ovn-sb.ovsschema            |  6 ++----
> >>>   6 files changed, 17 insertions(+), 14 deletions(-)
> >>>
> >>> diff --git a/controller/lflow.c b/controller/lflow.c
> >>> index 60aa011ff..877bd49b0 100644
> >>> --- a/controller/lflow.c
> >>> +++ b/controller/lflow.c
> >>> @@ -1744,8 +1744,10 @@ lflow_processing_end:
> >>>       /* Add load balancer hairpin flows if the datapath has any load balancers
> >>>        * associated. */
> >>>       for (size_t i = 0; i < dp->n_load_balancers; i++) {
> >>> -        consider_lb_hairpin_flows(dp->load_balancers[i],
> >>> -                                  l_ctx_in->local_datapaths,
> >>> +        const struct sbrec_load_balancer *lb =
> >>> +            sbrec_load_balancer_get_for_uuid(l_ctx_in->sb_idl,
> >>> +                                             &dp->load_balancers[i]);
> >
> > We should never directly access DB tables this way within the I-P engine.
> > We will lose track of the dependencies. Now that we removed the reference,
> > we should add load_balancer table as a new dependency, get the table data
> > through EN_OVSDB_GET, and explicitly add a change handler to avoid
> > recompute.
> >
>
> There's some confusion here.
>
> Datapath doesn't really depend on Load_Balancer.  From the beginning
> there shouldn't have been a reference from Datapath_Binding to
> Load_Balancer.
>
> As a matter of fact NB.Load_Balancer used to be completely handled by
> ovn-northd and would generate logical flows.  So the real dependency is
> "en_lflow_output" depends on SB load balancers.
>
> We already have an I-P handler for that:
> lflow_output_sb_load_balancer_handler()
>
> We don't *really* need to store load_balancer references in
> Datapath_Binding.  The only reason they were added there when
> Load_Balancer flow generation was moved to ovn-controller was to avoid
> walking all load balancers whenever a datapath was added.
>
> However, when a load balancer is changed there's no change to be handled
> for datapaths themselves.
>
> As a matter of fact, an alternative implementation (that obviously
> doesn't scale well) is:
>
> SBREC_LOAD_BALANCER_TABLE_FOR_EACH (lb, l_ctx_in->lb_table) {
>    for (size_t i = 0; i < lb->n_datapaths; i++) {
>       if (lb->datapaths[i] == dp) {
>           consider_lb_hairpin_flows(lb, ...);
>           break;
>       }
>    }
> }
>
> To avoid this potentially expensive table walk, we use the load_balancer
> uuids stored in the datapath record itself (it's probably best to see
> those as hints I guess).
>
Thanks for the explanation. What you described is indeed a dependency
between lflow and sb_load_balancer, because sb_load_balancer data is
required in lflow's compute/change handlers (otherwise we would not need
to call sbrec_load_balancer_get_for_uuid()).

However, since this dependency is already captured in the I-P engine,
handling this use case is easy. We should simply use
sbrec_load_balancer_table_get_for_uuid() instead, which takes a struct
sbrec_load_balancer_table * as its argument; we already have that in
lflow_ctx_in.lb_table as an input to the lflow engine node.
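
For illustration, a minimal sketch of that suggestion (assuming
l_ctx_in->lb_table is the engine-provided SB Load_Balancer table; the
rest mirrors the loop in the patch):

    for (size_t i = 0; i < dp->n_load_balancers; i++) {
        const struct sbrec_load_balancer *lb =
            sbrec_load_balancer_table_get_for_uuid(l_ctx_in->lb_table,
                                                   &dp->load_balancers[i]);
        /* Without the weak reference, a raw UUID can briefly point at a
         * deleted load balancer, so tolerate lookup misses. */
        if (lb) {
            consider_lb_hairpin_flows(lb, l_ctx_in->local_datapaths,
                                      l_ctx_out->flow_table);
        }
    }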

> >>> +        consider_lb_hairpin_flows(lb, l_ctx_in->local_datapaths,
> >>>                                     l_ctx_out->flow_table);
> >>>       }
> >>>
> >>> diff --git a/controller/lflow.h b/controller/lflow.h
> >>> index c17ff6dd4..5ba7af894 100644
> >>> --- a/controller/lflow.h
> >>> +++ b/controller/lflow.h
> >>> @@ -40,6 +40,7 @@
> >>>   #include "openvswitch/list.h"
> >>>
> >>>   struct ovn_extend_table;
> >>> +struct ovsdb_idl;
> >>>   struct ovsdb_idl_index;
> >>>   struct ovn_desired_flow_table;
> >>>   struct hmap;
> >>> @@ -126,6 +127,7 @@ void lflow_resource_destroy(struct lflow_resource_ref *);
> >>>   void lflow_resource_clear(struct lflow_resource_ref *);
> >>>
> >>>   struct lflow_ctx_in {
> >>> +    struct ovsdb_idl *sb_idl;
> >>>       struct ovsdb_idl_index *sbrec_multicast_group_by_name_datapath;
> >>>       struct ovsdb_idl_index *sbrec_logical_flow_by_logical_datapath;
> >>>       struct ovsdb_idl_index *sbrec_logical_flow_by_logical_dp_group;
> >>> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
> >>> index 9050380f3..24443bad1 100644
> >>> --- a/controller/ovn-controller.c
> >>> +++ b/controller/ovn-controller.c
> >>> @@ -2016,6 +2016,7 @@ init_lflow_ctx(struct engine_node *node,
> >>>           engine_get_input_data("port_groups", node);
> >>>       struct shash *port_groups = &pg_data->port_groups_cs_local;
> >>>
> >>> +    l_ctx_in->sb_idl = engine_get_context()->ovnsb_idl;
> >>>       l_ctx_in->sbrec_multicast_group_by_name_datapath =
> >>>           sbrec_mc_group_by_name_dp;
> >>>       l_ctx_in->sbrec_logical_flow_by_logical_datapath =
> >>> @@ -3124,6 +3125,7 @@ main(int argc, char *argv[])
> >>>           }
> >>>
> >>>           struct engine_context eng_ctx = {
> >>> +            .ovnsb_idl = ovnsb_idl_loop.idl,
> >>>               .ovs_idl_txn = ovs_idl_txn,
> >>>               .ovnsb_idl_txn = ovnsb_idl_txn,
> >>>               .client_ctx = &ctrl_engine_ctx
> >>> diff --git a/lib/inc-proc-eng.h b/lib/inc-proc-eng.h
> >>> index 7e9f5bb70..506da51a1 100644
> >>> --- a/lib/inc-proc-eng.h
> >>> +++ b/lib/inc-proc-eng.h
> >>> @@ -64,6 +64,7 @@
> >>>   #define ENGINE_MAX_OVSDB_INDEX 256
> >>>
> >>>   struct engine_context {
> >>> +    struct ovsdb_idl *ovnsb_idl;
> >>>       struct ovsdb_idl_txn *ovs_idl_txn;
> >>>       struct ovsdb_idl_txn *ovnsb_idl_txn;
> >>>       void *client_ctx;
> >>> diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
> >>> index 570c6a3ef..8649a590b 100644
> >>> --- a/northd/ovn-northd.c
> >>> +++ b/northd/ovn-northd.c
> >>> @@ -3529,19 +3529,17 @@ build_ovn_lbs(struct northd_context *ctx, struct hmap *datapaths,
> >>>               continue;
> >>>           }
> >>>
> >>> -        const struct sbrec_load_balancer **sbrec_lbs =
> >>> -            xmalloc(od->nbs->n_load_balancer * sizeof *sbrec_lbs);
> >>> +        struct uuid *lb_uuids =
> >>> +            xmalloc(od->nbs->n_load_balancer * sizeof *lb_uuids);
> >>>           for (size_t i = 0; i < od->nbs->n_load_balancer; i++) {
> >>>               const struct uuid *lb_uuid =
> >>>                   &od->nbs->load_balancer[i]->header_.uuid;
> >>>               lb = ovn_northd_lb_find(lbs, lb_uuid);
> >>> -            sbrec_lbs[i] = lb->slb;
> >>> +            lb_uuids[i] = lb->slb->header_.uuid;
> >>>           }
> >>> -
> >>> -        sbrec_datapath_binding_set_load_balancers(
> >>> -            od->sb, (struct sbrec_load_balancer **)sbrec_lbs,
> >>> -            od->nbs->n_load_balancer);
> >>> -        free(sbrec_lbs);
> >>> +        sbrec_datapath_binding_set_load_balancers(od->sb, lb_uuids,
> >>> +                                                  od->nbs->n_load_balancer);
> >>> +        free(lb_uuids);
> >
> > I believe we need to change the deletion part as well. Now that we don't
> > have the weak reference, we need to take care of the LB deletions by
> > removing the reference explicitly from table datapath_binding.
>
> This is handling both addition and deletion, it's just storing the list
> of load balancers that are currently associated to the logical datapath.
>
> If a load balancer was deleted it won't be in the list.
>
> We already have tests validating this works fine:
>
> https://github.com/ovn-org/ovn/blob/ecc941c70f1675b66f6ecb2cdf09df362f62c97a/tests/ovn-northd.at#L2432
>
Oh, you are right. Sorry, my brain was somehow wired for I-P when
reviewing this.

> >
> > Thanks,
> > Han
> >
> >
>
> Thanks,
> Dumitru
>
> >>>       }
> >>>   }
> >>>
> >>> diff --git a/ovn-sb.ovsschema b/ovn-sb.ovsschema
> >>> index bbf60781d..57fbc3ca4 100644
> >>> --- a/ovn-sb.ovsschema
> >>> +++ b/ovn-sb.ovsschema
> >>> @@ -1,7 +1,7 @@
> >>>   {
> >>>       "name": "OVN_Southbound",
> >>>       "version": "20.18.0",
> >>> -    "cksum": "1816525029 26536",
> >>> +    "cksum": "779623696 26386",
> >>>       "tables": {
> >>>           "SB_Global": {
> >>>               "columns": {
> >>> @@ -166,9 +166,7 @@
> >>>                        "type": {"key": {"type": "integer",
> >>>                                         "minInteger": 1,
> >>>                                         "maxInteger": 16777215}}},
> >>> -                "load_balancers": {"type": {"key": {"type": "uuid",
> >>> -                                                   "refTable": "Load_Balancer",
> >>> -                                                   "refType": "weak"},
> >>> +                "load_balancers": {"type": {"key": {"type": "uuid"},
> >>>                                               "min": 0,
> >>>                                               "max": "unlimited"}},
> >>>                   "external_ids": {
> >>>
> >>
> >
>
Dumitru Ceara July 12, 2021, 8:11 a.m. UTC | #5
On 7/9/21 6:11 PM, Han Zhou wrote:
>> To avoid this potentially expensive table walk, we use the load_balancer
>> uuids stored in the datapath record itself (it's probably best to see
>> those as hints I guess).
>>
> Thanks for the explanation. What you described is indeed a dependency
> between lflow and sb_load_balancer, because sb_load_balancer data is
> required in lflow's compute/change handlers (otherwise we would not need
> to call sbrec_load_balancer_get_for_uuid()).
> 
> However, since this dependency is already captured in the I-P engine, it
> is straightforward to handle for this use case. We should simply use
> sbrec_load_balancer_table_get_for_uuid() instead, which takes a struct
> sbrec_load_balancer_table * as an argument, and we already have that in
> lflow_ctx_in.lb_table as an input to the lflow engine node.
> 

You're right, it's simpler like this, thanks for pointing out the
sbrec_*table_get_for_uuid() variant.

I sent a v2:

http://patchwork.ozlabs.org/project/ovn/list/?series=253029

Regards,
Dumitru
Dumitru Ceara July 12, 2021, 2:22 p.m. UTC | #6
On 7/12/21 10:11 AM, Dumitru Ceara wrote:
> On 7/9/21 6:11 PM, Han Zhou wrote:
>>> To avoid this potentially expensive table walk, we use the load_balancer
>>> uuids stored in the datapath record itself (it's probably best to see
>>> those as hints I guess).
>>>
>> Thanks for the explanation. What you described is indeed a dependency
>> between lflow and sb_load_balancer, because sb_load_balancer data is
>> required in lflow's compute/change handlers (otherwise we would not need
>> to call sbrec_load_balancer_get_for_uuid()).
>>
>> However, since this dependency is already captured in the I-P engine, it
>> is straightforward to handle for this use case. We should simply use
>> sbrec_load_balancer_table_get_for_uuid() instead, which takes a struct
>> sbrec_load_balancer_table * as an argument, and we already have that in
>> lflow_ctx_in.lb_table as an input to the lflow engine node.
>>
> 
> You're right, it's simpler like this, thanks for pointing out the
> sbrec_*table_get_for_uuid() variant.
> 
> I sent a v2:
> 
> http://patchwork.ozlabs.org/project/ovn/list/?series=253029
> 

Sorry for the noise; Ilya mentioned offline that I forgot to update the
schema version number, so I sent a v3 taking care of that:

http://patchwork.ozlabs.org/project/ovn/list/?series=253094
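
For reference, the schema version bump is a one-line change in
ovn-sb.ovsschema along these lines (the new version string below is only
an illustration; see the v3 series for the actual value):

    -    "version": "20.18.0",
    +    "version": "20.19.0",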

Regards,
Dumitru
diff mbox series

Patch

diff --git a/controller/lflow.c b/controller/lflow.c
index 60aa011ff..877bd49b0 100644
--- a/controller/lflow.c
+++ b/controller/lflow.c
@@ -1744,8 +1744,10 @@  lflow_processing_end:
     /* Add load balancer hairpin flows if the datapath has any load balancers
      * associated. */
     for (size_t i = 0; i < dp->n_load_balancers; i++) {
-        consider_lb_hairpin_flows(dp->load_balancers[i],
-                                  l_ctx_in->local_datapaths,
+        const struct sbrec_load_balancer *lb =
+            sbrec_load_balancer_get_for_uuid(l_ctx_in->sb_idl,
+                                             &dp->load_balancers[i]);
+        consider_lb_hairpin_flows(lb, l_ctx_in->local_datapaths,
                                   l_ctx_out->flow_table);
     }
 
diff --git a/controller/lflow.h b/controller/lflow.h
index c17ff6dd4..5ba7af894 100644
--- a/controller/lflow.h
+++ b/controller/lflow.h
@@ -40,6 +40,7 @@ 
 #include "openvswitch/list.h"
 
 struct ovn_extend_table;
+struct ovsdb_idl;
 struct ovsdb_idl_index;
 struct ovn_desired_flow_table;
 struct hmap;
@@ -126,6 +127,7 @@  void lflow_resource_destroy(struct lflow_resource_ref *);
 void lflow_resource_clear(struct lflow_resource_ref *);
 
 struct lflow_ctx_in {
+    struct ovsdb_idl *sb_idl;
     struct ovsdb_idl_index *sbrec_multicast_group_by_name_datapath;
     struct ovsdb_idl_index *sbrec_logical_flow_by_logical_datapath;
     struct ovsdb_idl_index *sbrec_logical_flow_by_logical_dp_group;
diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
index 9050380f3..24443bad1 100644
--- a/controller/ovn-controller.c
+++ b/controller/ovn-controller.c
@@ -2016,6 +2016,7 @@  init_lflow_ctx(struct engine_node *node,
         engine_get_input_data("port_groups", node);
     struct shash *port_groups = &pg_data->port_groups_cs_local;
 
+    l_ctx_in->sb_idl = engine_get_context()->ovnsb_idl;
     l_ctx_in->sbrec_multicast_group_by_name_datapath =
         sbrec_mc_group_by_name_dp;
     l_ctx_in->sbrec_logical_flow_by_logical_datapath =
@@ -3124,6 +3125,7 @@  main(int argc, char *argv[])
         }
 
         struct engine_context eng_ctx = {
+            .ovnsb_idl = ovnsb_idl_loop.idl,
             .ovs_idl_txn = ovs_idl_txn,
             .ovnsb_idl_txn = ovnsb_idl_txn,
             .client_ctx = &ctrl_engine_ctx
diff --git a/lib/inc-proc-eng.h b/lib/inc-proc-eng.h
index 7e9f5bb70..506da51a1 100644
--- a/lib/inc-proc-eng.h
+++ b/lib/inc-proc-eng.h
@@ -64,6 +64,7 @@ 
 #define ENGINE_MAX_OVSDB_INDEX 256
 
 struct engine_context {
+    struct ovsdb_idl *ovnsb_idl;
     struct ovsdb_idl_txn *ovs_idl_txn;
     struct ovsdb_idl_txn *ovnsb_idl_txn;
     void *client_ctx;
diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
index 570c6a3ef..8649a590b 100644
--- a/northd/ovn-northd.c
+++ b/northd/ovn-northd.c
@@ -3529,19 +3529,17 @@  build_ovn_lbs(struct northd_context *ctx, struct hmap *datapaths,
             continue;
         }
 
-        const struct sbrec_load_balancer **sbrec_lbs =
-            xmalloc(od->nbs->n_load_balancer * sizeof *sbrec_lbs);
+        struct uuid *lb_uuids =
+            xmalloc(od->nbs->n_load_balancer * sizeof *lb_uuids);
         for (size_t i = 0; i < od->nbs->n_load_balancer; i++) {
             const struct uuid *lb_uuid =
                 &od->nbs->load_balancer[i]->header_.uuid;
             lb = ovn_northd_lb_find(lbs, lb_uuid);
-            sbrec_lbs[i] = lb->slb;
+            lb_uuids[i] = lb->slb->header_.uuid;
         }
-
-        sbrec_datapath_binding_set_load_balancers(
-            od->sb, (struct sbrec_load_balancer **)sbrec_lbs,
-            od->nbs->n_load_balancer);
-        free(sbrec_lbs);
+        sbrec_datapath_binding_set_load_balancers(od->sb, lb_uuids,
+                                                  od->nbs->n_load_balancer);
+        free(lb_uuids);
     }
 }
 
diff --git a/ovn-sb.ovsschema b/ovn-sb.ovsschema
index bbf60781d..57fbc3ca4 100644
--- a/ovn-sb.ovsschema
+++ b/ovn-sb.ovsschema
@@ -1,7 +1,7 @@ 
 {
     "name": "OVN_Southbound",
     "version": "20.18.0",
-    "cksum": "1816525029 26536",
+    "cksum": "779623696 26386",
     "tables": {
         "SB_Global": {
             "columns": {
@@ -166,9 +166,7 @@ 
                      "type": {"key": {"type": "integer",
                                       "minInteger": 1,
                                       "maxInteger": 16777215}}},
-                "load_balancers": {"type": {"key": {"type": "uuid",
-                                                   "refTable": "Load_Balancer",
-                                                   "refType": "weak"},
+                "load_balancers": {"type": {"key": {"type": "uuid"},
                                             "min": 0,
                                             "max": "unlimited"}},
                 "external_ids": {