Message ID: 20210707145627.23990-1-dceara@redhat.com
State:      Changes Requested
Series:     [ovs-dev] controller: Avoid unnecessary load balancer flow processing.
Checks:
  ovsrobot/apply-robot                    success   apply and check: success
  ovsrobot/github-robot-_Build_and_Test   success   github build: passed
  ovsrobot/github-robot-_ovn-kubernetes   success   github build: passed
Acked-by: Mark Michelson <mmichels@redhat.com>

If I'm to extrapolate this a bit more, it sounds like any time a
Datapath_Binding is updated, it results in the potential for many
Logical_Flows to be re-calculated. In this particular case, it was
triggered based on modifying a wide-reaching load balancer. I suppose the
same thing would happen if Datapath_Binding external_ids were updated.

I wonder if it's worth investigating trying to create a new reference type
for the purposes of the IDL. The idea would be that if a column in table A
references rows in table B, then if the rows in table B are updated, any
relevant rows in table A will be reported as updated as well. However,
columns in tables C, D, E, etc. that reference table A will not result in
the rows in tables C, D, E, etc. being reported as updated.

Essentially, it's the same as what you've done here, except we can have
the IDL fill in the sbrec_load_balancer in the Datapath_Binding record
instead of having to look it up when needed.

Alternatively, would it be worth listing the relevant columns that a
particular type "cares" about? For instance, Logical_Datapaths are only
affected if the tunnel_key is updated on a Datapath_Binding. The
load_balancers and external_ids are irrelevant. Would it be worthwhile to
be able to note in the reference the columns that should elicit an update?

Is it worth it? Or is this such a niche use case that it's not worth
implementing?

On 7/7/21 10:56 AM, Dumitru Ceara wrote:
> Whenever a Load_Balancer is updated, e.g., a VIP is added, the following
> sequence of events happens:
>
> 1. The Southbound Load_Balancer record is updated.
> 2. The Southbound Datapath_Binding records on which the Load_Balancer is
>    applied are updated.
> 3. Southbound ovsdb-server sends updates about the Load_Balancer and
>    Datapath_Binding records to ovn-controller.
> 4.
The IDL layer in ovn-controller processes the updates at #3, but > because of the SB schema references between tables [0] all logical > flows referencing the updated Datapath_Binding are marked as > "updated". The same is true for Logical_DP_Group records > referencing the Datapath_Binding, and also for all logical flows > pointing to the new "updated" datapath groups. > 5. ovn-controller ends up recomputing (removing/readding) all flows for > all these tracked updates. > > From the SB Schema: > "Datapath_Binding": { > "columns": { > [...] > "load_balancers": {"type": {"key": {"type": "uuid", > "refTable": "Load_Balancer", > "refType": "weak"}, > "min": 0, > "max": "unlimited"}}, > [...] > "Load_Balancer": { > "columns": { > "datapaths": { > [...] > "type": {"key": {"type": "uuid", > "refTable": "Datapath_Binding"}, > "min": 0, "max": "unlimited"}}, > [...] > "Logical_DP_Group": { > "columns": { > "datapaths": > {"type": {"key": {"type": "uuid", > "refTable": "Datapath_Binding", > "refType": "weak"}, > "min": 0, "max": "unlimited"}}}, > [...] > "Logical_Flow": { > "columns": { > "logical_datapath": > {"type": {"key": {"type": "uuid", > "refTable": "Datapath_Binding"}, > "min": 0, "max": 1}}, > "logical_dp_group": > {"type": {"key": {"type": "uuid", > "refTable": "Logical_DP_Group"}, > > In order to avoid this unnecessary Logical_Flow notification storm we > now remove the explicit reference from Datapath_Binding to > Load_Balancer and instead store raw UUIDs. > > This means that on the ovn-controller side we need to perform a > Load_Balancer table lookup by UUID whenever a new datapath is added, > but that doesn't happen too often and the cost of the lookup is > negligible compared to the huge cost of processing the unnecessary > logical flow updates. > > This change is backwards compatible because the contents stored in the > database are not changed, just that the schema constraints are relaxed a > bit. 
> > Some performance measurements, on a scale test deployment simulating an > ovn-kubernetes deployment with 120 nodes and a large load balancer > with 16K VIPs associated to each node's logical switch, the event > processing loop time in ovn-controller, when adding a new VIP, is > reduced from ~39 seconds to ~8 seconds. > > There's no need to change the northd DDlog implementation. > > Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1978605 > Signed-off-by: Dumitru Ceara <dceara@redhat.com> > --- > controller/lflow.c | 6 ++++-- > controller/lflow.h | 2 ++ > controller/ovn-controller.c | 2 ++ > lib/inc-proc-eng.h | 1 + > northd/ovn-northd.c | 14 ++++++-------- > ovn-sb.ovsschema | 6 ++---- > 6 files changed, 17 insertions(+), 14 deletions(-) > > diff --git a/controller/lflow.c b/controller/lflow.c > index 60aa011ff..877bd49b0 100644 > --- a/controller/lflow.c > +++ b/controller/lflow.c > @@ -1744,8 +1744,10 @@ lflow_processing_end: > /* Add load balancer hairpin flows if the datapath has any load balancers > * associated. 
*/ > for (size_t i = 0; i < dp->n_load_balancers; i++) { > - consider_lb_hairpin_flows(dp->load_balancers[i], > - l_ctx_in->local_datapaths, > + const struct sbrec_load_balancer *lb = > + sbrec_load_balancer_get_for_uuid(l_ctx_in->sb_idl, > + &dp->load_balancers[i]); > + consider_lb_hairpin_flows(lb, l_ctx_in->local_datapaths, > l_ctx_out->flow_table); > } > > diff --git a/controller/lflow.h b/controller/lflow.h > index c17ff6dd4..5ba7af894 100644 > --- a/controller/lflow.h > +++ b/controller/lflow.h > @@ -40,6 +40,7 @@ > #include "openvswitch/list.h" > > struct ovn_extend_table; > +struct ovsdb_idl; > struct ovsdb_idl_index; > struct ovn_desired_flow_table; > struct hmap; > @@ -126,6 +127,7 @@ void lflow_resource_destroy(struct lflow_resource_ref *); > void lflow_resource_clear(struct lflow_resource_ref *); > > struct lflow_ctx_in { > + struct ovsdb_idl *sb_idl; > struct ovsdb_idl_index *sbrec_multicast_group_by_name_datapath; > struct ovsdb_idl_index *sbrec_logical_flow_by_logical_datapath; > struct ovsdb_idl_index *sbrec_logical_flow_by_logical_dp_group; > diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c > index 9050380f3..24443bad1 100644 > --- a/controller/ovn-controller.c > +++ b/controller/ovn-controller.c > @@ -2016,6 +2016,7 @@ init_lflow_ctx(struct engine_node *node, > engine_get_input_data("port_groups", node); > struct shash *port_groups = &pg_data->port_groups_cs_local; > > + l_ctx_in->sb_idl = engine_get_context()->ovnsb_idl; > l_ctx_in->sbrec_multicast_group_by_name_datapath = > sbrec_mc_group_by_name_dp; > l_ctx_in->sbrec_logical_flow_by_logical_datapath = > @@ -3124,6 +3125,7 @@ main(int argc, char *argv[]) > } > > struct engine_context eng_ctx = { > + .ovnsb_idl = ovnsb_idl_loop.idl, > .ovs_idl_txn = ovs_idl_txn, > .ovnsb_idl_txn = ovnsb_idl_txn, > .client_ctx = &ctrl_engine_ctx > diff --git a/lib/inc-proc-eng.h b/lib/inc-proc-eng.h > index 7e9f5bb70..506da51a1 100644 > --- a/lib/inc-proc-eng.h > +++ b/lib/inc-proc-eng.h > @@ 
-64,6 +64,7 @@ > #define ENGINE_MAX_OVSDB_INDEX 256 > > struct engine_context { > + struct ovsdb_idl *ovnsb_idl; > struct ovsdb_idl_txn *ovs_idl_txn; > struct ovsdb_idl_txn *ovnsb_idl_txn; > void *client_ctx; > diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c > index 570c6a3ef..8649a590b 100644 > --- a/northd/ovn-northd.c > +++ b/northd/ovn-northd.c > @@ -3529,19 +3529,17 @@ build_ovn_lbs(struct northd_context *ctx, struct hmap *datapaths, > continue; > } > > - const struct sbrec_load_balancer **sbrec_lbs = > - xmalloc(od->nbs->n_load_balancer * sizeof *sbrec_lbs); > + struct uuid *lb_uuids = > + xmalloc(od->nbs->n_load_balancer * sizeof *lb_uuids); > for (size_t i = 0; i < od->nbs->n_load_balancer; i++) { > const struct uuid *lb_uuid = > &od->nbs->load_balancer[i]->header_.uuid; > lb = ovn_northd_lb_find(lbs, lb_uuid); > - sbrec_lbs[i] = lb->slb; > + lb_uuids[i] = lb->slb->header_.uuid; > } > - > - sbrec_datapath_binding_set_load_balancers( > - od->sb, (struct sbrec_load_balancer **)sbrec_lbs, > - od->nbs->n_load_balancer); > - free(sbrec_lbs); > + sbrec_datapath_binding_set_load_balancers(od->sb, lb_uuids, > + od->nbs->n_load_balancer); > + free(lb_uuids); > } > } > > diff --git a/ovn-sb.ovsschema b/ovn-sb.ovsschema > index bbf60781d..57fbc3ca4 100644 > --- a/ovn-sb.ovsschema > +++ b/ovn-sb.ovsschema > @@ -1,7 +1,7 @@ > { > "name": "OVN_Southbound", > "version": "20.18.0", > - "cksum": "1816525029 26536", > + "cksum": "779623696 26386", > "tables": { > "SB_Global": { > "columns": { > @@ -166,9 +166,7 @@ > "type": {"key": {"type": "integer", > "minInteger": 1, > "maxInteger": 16777215}}}, > - "load_balancers": {"type": {"key": {"type": "uuid", > - "refTable": "Load_Balancer", > - "refType": "weak"}, > + "load_balancers": {"type": {"key": {"type": "uuid"}, > "min": 0, > "max": "unlimited"}}, > "external_ids": { >
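The fan-out described in steps 1-5 of the commit message above can be made concrete with a deliberately naive Python model of reference-based change tracking (table names follow the SB schema; the tracking logic is a sketch, not the actual OVS IDL implementation): with the refTable edge in place, a single Load_Balancer write ends up reporting every logical flow of the datapath as updated, while with a raw UUID column the cascade stops at the row that was actually written.

```python
# Naive model of IDL change tracking: when a row is written, every row
# that references it is reported as updated too, transitively. Sketch
# only -- not the real OVS IDL implementation.

def cascade(refs, written):
    """Return all rows reported as updated after 'written' rows change.

    refs[dst] is the set of rows whose columns reference dst.
    """
    updated = set()
    stack = list(written)
    while stack:
        row = stack.pop()
        if row not in updated:
            updated.add(row)
            stack.extend(refs.get(row, ()))
    return updated

N_FLOWS = 1000

# Schema with references: Datapath_Binding.load_balancers -> Load_Balancer,
# Logical_Flow.logical_datapath -> Datapath_Binding.
with_ref = {
    ("Load_Balancer", "lb1"): {("Datapath_Binding", "dp1")},
    ("Datapath_Binding", "dp1"): {("Logical_Flow", "lf%d" % i)
                                  for i in range(N_FLOWS)},
}

# Adding a VIP writes only the Load_Balancer row, yet the cascade drags
# in the datapath and all of its logical flows.
storm = cascade(with_ref, [("Load_Balancer", "lb1")])
assert len(storm) == N_FLOWS + 2

# With load_balancers holding raw UUIDs there is no
# Datapath_Binding -> Load_Balancer edge, so the cascade stops.
without_ref = dict(with_ref)
del without_ref[("Load_Balancer", "lb1")]
quiet = cascade(without_ref, [("Load_Balancer", "lb1")])
assert quiet == {("Load_Balancer", "lb1")}
```

The gap in reported rows is the effect the patch removes; the real numbers additionally depend on datapath groups and per-datapath flow counts.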
On Thu, Jul 8, 2021 at 1:10 PM Mark Michelson <mmichels@redhat.com> wrote: > > Acked-by: Mark Michelson <mmichels@redhat.com> > Thanks Dumitru and Mark! > > If I'm to extrapolate this a bit more, it sounds like any time a > Datapath_Binding is updated, it results in the potential for many > Logical_Flows to be re-calculated. In this particular case, it was > triggered based on modifying a wide-reaching load balancer. I suppose > the same thing would happen if Datapath_Binding external_ids were updated. > Yes. Luckily it seems external_ids of datapath_binding shouldn't change very often. But I agree this is a general problem of the I-P. > I wonder if it's worth investigating trying to create a new reference > type for the purposes of the IDL. The idea would be that if a column in > table A references rows in table B, then if the rows in table B are > updated, then any relevant rows in table A will reported as updated as > well. However, any columns in tables C, D, E, etc. that reference table > A will not result in the rows in tables C, D, E, etc. being reported as > updated. > > Essentially, it's the same as what you've done here, except we can have > the IDL fill in the sbrec_load_balancer in the Datapath_Binding record > instead of having to look it up when needed. > It may be helpful to have a new reference type, but deciding when and where to use it would be very tricky. If not careful enough, we might be creating some dependency in the code without getting proper change notification. > Alternatively, would it be worth listing the relevant columns that a > particular type "cares" about? For instance, Logical_Datapaths are only > affected if the tunnel_key is updated on a Datapath_Binding. The > load_balancers and external_ids are irrelevant. Would it be worthwhile > to be able to note in the reference the columns that should elicit an > update? This sounds more reliable. 
Ideally, if referenced columns are noted, the generated IDL structure of
A's field B should only contain the columns that A is interested in. This
way we can make sure it is impossible to secretly use a column without
being notified of updates that possibly need to be handled by I-P. I am
not sure how hard this would be.

>
> Is it worth it? Or is this such a niche use case that it's not worth
> implementing?

I think we can see if there are more such scenarios that justify the IDL
change. I am ok with changing the schema for the current use case to
replace the reference by just raw uuid. However, we should never directly
use IDL from the ctx and access any table; this would break the I-P
dependency tracking. Please see more comments inlined.

>
> On 7/7/21 10:56 AM, Dumitru Ceara wrote:
> > Whenever a Load_Balancer is updated, e.g., a VIP is added, the following
> > sequence of events happens:
> >
> > 1. The Southbound Load_Balancer record is updated.
> > 2. The Southbound Datapath_Binding records on which the Load_Balancer is
> >    applied are updated.
> > 3. Southbound ovsdb-server sends updates about the Load_Balancer and
> >    Datapath_Binding records to ovn-controller.
> > 4. The IDL layer in ovn-controller processes the updates at #3, but
> >    because of the SB schema references between tables [0] all logical
> >    flows referencing the updated Datapath_Binding are marked as
> >    "updated". The same is true for Logical_DP_Group records
> >    referencing the Datapath_Binding, and also for all logical flows
> >    pointing to the new "updated" datapath groups.
> > 5. ovn-controller ends up recomputing (removing/readding) all flows for
> >    all these tracked updates.
> >
> > From the SB Schema:
> > "Datapath_Binding": {
> >     "columns": {
> >         [...]
> >         "load_balancers": {"type": {"key": {"type": "uuid",
> >                                             "refTable": "Load_Balancer",
> >                                             "refType": "weak"},
> >                                     "min": 0,
> >                                     "max": "unlimited"}},
> > [...]
> > "Load_Balancer": {
> >     "columns": {
> >         "datapaths": {
> > [...]
> > "type": {"key": {"type": "uuid", > > "refTable": "Datapath_Binding"}, > > "min": 0, "max": "unlimited"}}, > > [...] > > "Logical_DP_Group": { > > "columns": { > > "datapaths": > > {"type": {"key": {"type": "uuid", > > "refTable": "Datapath_Binding", > > "refType": "weak"}, > > "min": 0, "max": "unlimited"}}}, > > [...] > > "Logical_Flow": { > > "columns": { > > "logical_datapath": > > {"type": {"key": {"type": "uuid", > > "refTable": "Datapath_Binding"}, > > "min": 0, "max": 1}}, > > "logical_dp_group": > > {"type": {"key": {"type": "uuid", > > "refTable": "Logical_DP_Group"}, > > > > In order to avoid this unnecessary Logical_Flow notification storm we > > now remove the explicit reference from Datapath_Binding to > > Load_Balancer and instead store raw UUIDs. > > > > This means that on the ovn-controller side we need to perform a > > Load_Balancer table lookup by UUID whenever a new datapath is added, > > but that doesn't happen too often and the cost of the lookup is > > negligible compared to the huge cost of processing the unnecessary > > logical flow updates. > > > > This change is backwards compatible because the contents stored in the > > database are not changed, just that the schema constraints are relaxed a > > bit. > > > > Some performance measurements, on a scale test deployment simulating an > > ovn-kubernetes deployment with 120 nodes and a large load balancer > > with 16K VIPs associated to each node's logical switch, the event > > processing loop time in ovn-controller, when adding a new VIP, is > > reduced from ~39 seconds to ~8 seconds. > > > > There's no need to change the northd DDlog implementation. 
> > > > Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1978605 > > Signed-off-by: Dumitru Ceara <dceara@redhat.com> > > --- > > controller/lflow.c | 6 ++++-- > > controller/lflow.h | 2 ++ > > controller/ovn-controller.c | 2 ++ > > lib/inc-proc-eng.h | 1 + > > northd/ovn-northd.c | 14 ++++++-------- > > ovn-sb.ovsschema | 6 ++---- > > 6 files changed, 17 insertions(+), 14 deletions(-) > > > > diff --git a/controller/lflow.c b/controller/lflow.c > > index 60aa011ff..877bd49b0 100644 > > --- a/controller/lflow.c > > +++ b/controller/lflow.c > > @@ -1744,8 +1744,10 @@ lflow_processing_end: > > /* Add load balancer hairpin flows if the datapath has any load balancers > > * associated. */ > > for (size_t i = 0; i < dp->n_load_balancers; i++) { > > - consider_lb_hairpin_flows(dp->load_balancers[i], > > - l_ctx_in->local_datapaths, > > + const struct sbrec_load_balancer *lb = > > + sbrec_load_balancer_get_for_uuid(l_ctx_in->sb_idl, > > + &dp->load_balancers[i]); We should never directly access DB tables this way within the I-P engine. We will lose track of the dependencies. Now that we removed the reference, we should add load_balancer table as a new dependency, get the table data through EN_OVSDB_GET, and explicitly add a change handler to avoid recompute. 
> > + consider_lb_hairpin_flows(lb, l_ctx_in->local_datapaths, > > l_ctx_out->flow_table); > > } > > > > diff --git a/controller/lflow.h b/controller/lflow.h > > index c17ff6dd4..5ba7af894 100644 > > --- a/controller/lflow.h > > +++ b/controller/lflow.h > > @@ -40,6 +40,7 @@ > > #include "openvswitch/list.h" > > > > struct ovn_extend_table; > > +struct ovsdb_idl; > > struct ovsdb_idl_index; > > struct ovn_desired_flow_table; > > struct hmap; > > @@ -126,6 +127,7 @@ void lflow_resource_destroy(struct lflow_resource_ref *); > > void lflow_resource_clear(struct lflow_resource_ref *); > > > > struct lflow_ctx_in { > > + struct ovsdb_idl *sb_idl; > > struct ovsdb_idl_index *sbrec_multicast_group_by_name_datapath; > > struct ovsdb_idl_index *sbrec_logical_flow_by_logical_datapath; > > struct ovsdb_idl_index *sbrec_logical_flow_by_logical_dp_group; > > diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c > > index 9050380f3..24443bad1 100644 > > --- a/controller/ovn-controller.c > > +++ b/controller/ovn-controller.c > > @@ -2016,6 +2016,7 @@ init_lflow_ctx(struct engine_node *node, > > engine_get_input_data("port_groups", node); > > struct shash *port_groups = &pg_data->port_groups_cs_local; > > > > + l_ctx_in->sb_idl = engine_get_context()->ovnsb_idl; > > l_ctx_in->sbrec_multicast_group_by_name_datapath = > > sbrec_mc_group_by_name_dp; > > l_ctx_in->sbrec_logical_flow_by_logical_datapath = > > @@ -3124,6 +3125,7 @@ main(int argc, char *argv[]) > > } > > > > struct engine_context eng_ctx = { > > + .ovnsb_idl = ovnsb_idl_loop.idl, > > .ovs_idl_txn = ovs_idl_txn, > > .ovnsb_idl_txn = ovnsb_idl_txn, > > .client_ctx = &ctrl_engine_ctx > > diff --git a/lib/inc-proc-eng.h b/lib/inc-proc-eng.h > > index 7e9f5bb70..506da51a1 100644 > > --- a/lib/inc-proc-eng.h > > +++ b/lib/inc-proc-eng.h > > @@ -64,6 +64,7 @@ > > #define ENGINE_MAX_OVSDB_INDEX 256 > > > > struct engine_context { > > + struct ovsdb_idl *ovnsb_idl; > > struct ovsdb_idl_txn *ovs_idl_txn; > > 
struct ovsdb_idl_txn *ovnsb_idl_txn; > > void *client_ctx; > > diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c > > index 570c6a3ef..8649a590b 100644 > > --- a/northd/ovn-northd.c > > +++ b/northd/ovn-northd.c > > @@ -3529,19 +3529,17 @@ build_ovn_lbs(struct northd_context *ctx, struct hmap *datapaths, > > continue; > > } > > > > - const struct sbrec_load_balancer **sbrec_lbs = > > - xmalloc(od->nbs->n_load_balancer * sizeof *sbrec_lbs); > > + struct uuid *lb_uuids = > > + xmalloc(od->nbs->n_load_balancer * sizeof *lb_uuids); > > for (size_t i = 0; i < od->nbs->n_load_balancer; i++) { > > const struct uuid *lb_uuid = > > &od->nbs->load_balancer[i]->header_.uuid; > > lb = ovn_northd_lb_find(lbs, lb_uuid); > > - sbrec_lbs[i] = lb->slb; > > + lb_uuids[i] = lb->slb->header_.uuid; > > } > > - > > - sbrec_datapath_binding_set_load_balancers( > > - od->sb, (struct sbrec_load_balancer **)sbrec_lbs, > > - od->nbs->n_load_balancer); > > - free(sbrec_lbs); > > + sbrec_datapath_binding_set_load_balancers(od->sb, lb_uuids, > > + od->nbs->n_load_balancer); > > + free(lb_uuids); I believe we need to change the deletion part as well. Now that we don't have the weak reference, we need to take care of the LB deletions by removing the reference explicitly from table datapath_binding. 
Thanks, Han > > } > > } > > > > diff --git a/ovn-sb.ovsschema b/ovn-sb.ovsschema > > index bbf60781d..57fbc3ca4 100644 > > --- a/ovn-sb.ovsschema > > +++ b/ovn-sb.ovsschema > > @@ -1,7 +1,7 @@ > > { > > "name": "OVN_Southbound", > > "version": "20.18.0", > > - "cksum": "1816525029 26536", > > + "cksum": "779623696 26386", > > "tables": { > > "SB_Global": { > > "columns": { > > @@ -166,9 +166,7 @@ > > "type": {"key": {"type": "integer", > > "minInteger": 1, > > "maxInteger": 16777215}}}, > > - "load_balancers": {"type": {"key": {"type": "uuid", > > - "refTable": "Load_Balancer", > > - "refType": "weak"}, > > + "load_balancers": {"type": {"key": {"type": "uuid"}, > > "min": 0, > > "max": "unlimited"}}, > > "external_ids": { > > > > _______________________________________________ > dev mailing list > dev@openvswitch.org > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
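Han's warning about reading tables through the raw IDL from the ctx can be sketched with a toy incremental-processing engine (hypothetical names; the real engine lives in lib/inc-proc-eng.h and differs in detail): a node only recomputes when one of its declared inputs reports a change, so data fetched out-of-band can silently go stale.

```python
# Toy I-P engine: a node recomputes only when a *declared* input changed.
# Hypothetical names, not the real inc-proc-eng API.

class Node:
    def __init__(self, name, inputs, compute):
        self.name = name
        self.inputs = list(inputs)
        self.compute = compute
        self.output = None

    def run(self, changed, state):
        # Recompute on the first run, or when a declared input changed.
        if self.output is None or changed & set(self.inputs):
            self.output = self.compute(state)

state = {
    "datapath_binding": {"dp1": {"load_balancers": ["lb1"]}},
    "load_balancer": {"lb1": {"n_vips": 1}},
}

# BAD: the node peeks at load_balancer but declares only datapath_binding.
lflow = Node("lflow_output", inputs=["datapath_binding"],
             compute=lambda s: s["load_balancer"]["lb1"]["n_vips"])

lflow.run(changed=set(), state=state)            # initial full compute
state["load_balancer"]["lb1"]["n_vips"] = 2      # a VIP is added
lflow.run(changed={"load_balancer"}, state=state)
assert lflow.output == 1   # stale: the engine never saw the dependency

# FIX: declare the dependency so the change triggers a recompute (in real
# code: a tracked engine input plus an explicit change handler).
lflow.inputs.append("load_balancer")
lflow.run(changed={"load_balancer"}, state=state)
assert lflow.output == 2
```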
On 7/9/21 3:46 AM, Han Zhou wrote: > On Thu, Jul 8, 2021 at 1:10 PM Mark Michelson <mmichels@redhat.com> wrote: >> >> Acked-by: Mark Michelson <mmichels@redhat.com> >> > Thanks Dumitru and Mark! > Hi Mark and Han, Thanks for the reviews! >> >> If I'm to extrapolate this a bit more, it sounds like any time a >> Datapath_Binding is updated, it results in the potential for many >> Logical_Flows to be re-calculated. In this particular case, it was >> triggered based on modifying a wide-reaching load balancer. I suppose >> the same thing would happen if Datapath_Binding external_ids were updated. >> > Yes. Luckily it seems external_ids of datapath_binding shouldn't change > very often. But I agree this is a general problem of the I-P. > I see it in the same way. I-P doesn't give per column change tracking information. >> I wonder if it's worth investigating trying to create a new reference >> type for the purposes of the IDL. The idea would be that if a column in >> table A references rows in table B, then if the rows in table B are >> updated, then any relevant rows in table A will reported as updated as >> well. However, any columns in tables C, D, E, etc. that reference table >> A will not result in the rows in tables C, D, E, etc. being reported as >> updated. >> >> Essentially, it's the same as what you've done here, except we can have >> the IDL fill in the sbrec_load_balancer in the Datapath_Binding record >> instead of having to look it up when needed. >> > It may be helpful to have a new reference type, but deciding when and where > to use it would be very tricky. If not careful enough, we might be creating > some dependency in the code without getting proper change notification. > >> Alternatively, would it be worth listing the relevant columns that a >> particular type "cares" about? For instance, Logical_Datapaths are only >> affected if the tunnel_key is updated on a Datapath_Binding. The >> load_balancers and external_ids are irrelevant. 
Would it be worthwhile >> to be able to note in the reference the columns that should elicit an >> update? > > This sounds more reliable. Ideally, if referenced columns are noted, the > generated IDL structure of A's field B should only contains the columns > that are interested by A. This way we can make sure it is impossible to > secretly use a column without being notified of updates that possibly need > to be handled by I-P. I am not sure how hard this would be. > I think this is part of a larger discussion (there were some started a while back) about refactoring ovn-controller in general. As far as I've seen there are multiple options: - keep using the IDL and enhance it (and the schema), suggested above - stop using IDL, use ovsdb-cs directly instead and implement a new type of incremental processing in ovn-controller. This can be either ddlog on top of ovsdb-cs (like ovn-northd-ddlog does), or something else. >> >> Is it worth it? Or is this such a niche use case that it's not worth >> implementing? > > I think we can see if there are more such scenarios that justify the IDL > change. I am ok with changing the schema for the current use case to > replace the reference by just raw uuid. However, we should never directly > use IDL from the ctx and access any table, this would break the I-P > dependency tracking. Please see more comments inlined. > In the general case I tend to agree but in this specific case Datapath_Binding doesn't really depend on Load_Balancer, it just uses those uuids as hints. Please see my replies inline. >> >> On 7/7/21 10:56 AM, Dumitru Ceara wrote: >>> Whenever a Load_Balancer is updated, e.g., a VIP is added, the following >>> sequence of events happens: >>> >>> 1. The Southbound Load_Balancer record is updated. >>> 2. The Southbound Datapath_Binding records on which the Load_Balancer is >>> applied are updated. >>> 3. Southbound ovsdb-server sends updates about the Load_Balancer and >>> Datapath_Binding records to ovn-controller. 
>>> 4. The IDL layer in ovn-controller processes the updates at #3, but >>> because of the SB schema references between tables [0] all logical >>> flows referencing the updated Datapath_Binding are marked as >>> "updated". The same is true for Logical_DP_Group records >>> referencing the Datapath_Binding, and also for all logical flows >>> pointing to the new "updated" datapath groups. >>> 5. ovn-controller ends up recomputing (removing/readding) all flows for >>> all these tracked updates. >>> >>> From the SB Schema: >>> "Datapath_Binding": { >>> "columns": { >>> [...] >>> "load_balancers": {"type": {"key": {"type": "uuid", >>> "refTable": > "Load_Balancer", >>> "refType": "weak"}, >>> "min": 0, >>> "max": "unlimited"}}, >>> [...] >>> "Load_Balancer": { >>> "columns": { >>> "datapaths": { >>> [...] >>> "type": {"key": {"type": "uuid", >>> "refTable": "Datapath_Binding"}, >>> "min": 0, "max": "unlimited"}}, >>> [...] >>> "Logical_DP_Group": { >>> "columns": { >>> "datapaths": >>> {"type": {"key": {"type": "uuid", >>> "refTable": "Datapath_Binding", >>> "refType": "weak"}, >>> "min": 0, "max": "unlimited"}}}, >>> [...] >>> "Logical_Flow": { >>> "columns": { >>> "logical_datapath": >>> {"type": {"key": {"type": "uuid", >>> "refTable": "Datapath_Binding"}, >>> "min": 0, "max": 1}}, >>> "logical_dp_group": >>> {"type": {"key": {"type": "uuid", >>> "refTable": "Logical_DP_Group"}, >>> >>> In order to avoid this unnecessary Logical_Flow notification storm we >>> now remove the explicit reference from Datapath_Binding to >>> Load_Balancer and instead store raw UUIDs. >>> >>> This means that on the ovn-controller side we need to perform a >>> Load_Balancer table lookup by UUID whenever a new datapath is added, >>> but that doesn't happen too often and the cost of the lookup is >>> negligible compared to the huge cost of processing the unnecessary >>> logical flow updates. 
>>> >>> This change is backwards compatible because the contents stored in the >>> database are not changed, just that the schema constraints are relaxed a >>> bit. >>> >>> Some performance measurements, on a scale test deployment simulating an >>> ovn-kubernetes deployment with 120 nodes and a large load balancer >>> with 16K VIPs associated to each node's logical switch, the event >>> processing loop time in ovn-controller, when adding a new VIP, is >>> reduced from ~39 seconds to ~8 seconds. >>> >>> There's no need to change the northd DDlog implementation. >>> >>> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1978605 >>> Signed-off-by: Dumitru Ceara <dceara@redhat.com> >>> --- >>> controller/lflow.c | 6 ++++-- >>> controller/lflow.h | 2 ++ >>> controller/ovn-controller.c | 2 ++ >>> lib/inc-proc-eng.h | 1 + >>> northd/ovn-northd.c | 14 ++++++-------- >>> ovn-sb.ovsschema | 6 ++---- >>> 6 files changed, 17 insertions(+), 14 deletions(-) >>> >>> diff --git a/controller/lflow.c b/controller/lflow.c >>> index 60aa011ff..877bd49b0 100644 >>> --- a/controller/lflow.c >>> +++ b/controller/lflow.c >>> @@ -1744,8 +1744,10 @@ lflow_processing_end: >>> /* Add load balancer hairpin flows if the datapath has any load > balancers >>> * associated. */ >>> for (size_t i = 0; i < dp->n_load_balancers; i++) { >>> - consider_lb_hairpin_flows(dp->load_balancers[i], >>> - l_ctx_in->local_datapaths, >>> + const struct sbrec_load_balancer *lb = >>> + sbrec_load_balancer_get_for_uuid(l_ctx_in->sb_idl, >>> + &dp->load_balancers[i]); > > We should never directly access DB tables this way within the I-P engine. > We will lose track of the dependencies. Now that we removed the reference, > we should add load_balancer table as a new dependency, get the table data > through EN_OVSDB_GET, and explicitly add a change handler to avoid > recompute. > There's some confusion here. Datapath doesn't really depend on Load_Balancer. 
From the beginning there shouldn't have been a reference from
Datapath_Binding to Load_Balancer. As a matter of fact, NB.Load_Balancer
used to be completely handled by ovn-northd and would generate logical
flows.

So the real dependency is that "en_lflow_output" depends on SB load
balancers. We already have an I-P handler for that:
lflow_output_sb_load_balancer_handler().

We don't *really* need to store load_balancer references in
Datapath_Binding. The only reason they were added there when Load_Balancer
flow generation was moved to ovn-controller was to avoid walking all load
balancers whenever a datapath was added. However, when a load balancer is
changed there's no change to be handled for datapaths themselves. As a
matter of fact, an alternative implementation (that obviously doesn't
scale well) is:

    SBREC_LOAD_BALANCER_TABLE_FOR_EACH (lb, l_ctx_in->lb_table) {
        for (size_t i = 0; i < lb->n_datapaths; i++) {
            if (lb->datapaths[i] == dp) {
                consider_lb_hairpin_flows(lb, ...);
                break;
            }
        }
    }

To avoid this potentially expensive table walk, we use the load_balancer
uuids stored in the datapath record itself (it's probably best to see
those as hints I guess).
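The two lookup strategies described here can be put side by side in a small Python sketch (plain dicts stand in for the SB tables; all names are illustrative): the table walk visits every load balancer, the UUID hints stored on the datapath go straight to the relevant rows, and both return the same set for a given datapath.

```python
# Two ways to find the load balancers applied to a datapath.
# Plain dicts stand in for SB tables; illustrative only.

load_balancer_table = {
    "lb1": {"datapaths": ["dp1", "dp2"]},
    "lb2": {"datapaths": ["dp2"]},
    "lb3": {"datapaths": ["dp1"]},
}
datapath_binding = {
    "dp1": {"load_balancers": ["lb1", "lb3"]},   # raw-UUID "hints"
    "dp2": {"load_balancers": ["lb1", "lb2"]},
}

def lbs_by_table_walk(dp):
    """O(|Load_Balancer| x datapaths-per-LB): scan the whole table."""
    return {uuid for uuid, lb in load_balancer_table.items()
            if dp in lb["datapaths"]}

def lbs_by_uuid_hint(dp):
    """O(hints): look up only the UUIDs stored on the datapath row."""
    return {uuid for uuid in datapath_binding[dp]["load_balancers"]
            if uuid in load_balancer_table}   # tolerate stale hints

assert lbs_by_table_walk("dp1") == lbs_by_uuid_hint("dp1") == {"lb1", "lb3"}
assert lbs_by_table_walk("dp2") == lbs_by_uuid_hint("dp2") == {"lb1", "lb2"}
```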
>>> +        consider_lb_hairpin_flows(lb, l_ctx_in->local_datapaths,
>>>                                    l_ctx_out->flow_table);
>>>      }
>>>
>>> diff --git a/controller/lflow.h b/controller/lflow.h
>>> index c17ff6dd4..5ba7af894 100644
>>> --- a/controller/lflow.h
>>> +++ b/controller/lflow.h
>>> @@ -40,6 +40,7 @@
>>>  #include "openvswitch/list.h"
>>>
>>>  struct ovn_extend_table;
>>> +struct ovsdb_idl;
>>>  struct ovsdb_idl_index;
>>>  struct ovn_desired_flow_table;
>>>  struct hmap;
>>> @@ -126,6 +127,7 @@ void lflow_resource_destroy(struct lflow_resource_ref *);
>>>  void lflow_resource_clear(struct lflow_resource_ref *);
>>>
>>>  struct lflow_ctx_in {
>>> +    struct ovsdb_idl *sb_idl;
>>>      struct ovsdb_idl_index *sbrec_multicast_group_by_name_datapath;
>>>      struct ovsdb_idl_index *sbrec_logical_flow_by_logical_datapath;
>>>      struct ovsdb_idl_index *sbrec_logical_flow_by_logical_dp_group;
>>> diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
>>> index 9050380f3..24443bad1 100644
>>> --- a/controller/ovn-controller.c
>>> +++ b/controller/ovn-controller.c
>>> @@ -2016,6 +2016,7 @@ init_lflow_ctx(struct engine_node *node,
>>>          engine_get_input_data("port_groups", node);
>>>      struct shash *port_groups = &pg_data->port_groups_cs_local;
>>>
>>> +    l_ctx_in->sb_idl = engine_get_context()->ovnsb_idl;
>>>      l_ctx_in->sbrec_multicast_group_by_name_datapath =
>>>          sbrec_mc_group_by_name_dp;
>>>      l_ctx_in->sbrec_logical_flow_by_logical_datapath =
>>> @@ -3124,6 +3125,7 @@ main(int argc, char *argv[])
>>>      }
>>>
>>>      struct engine_context eng_ctx = {
>>> +        .ovnsb_idl = ovnsb_idl_loop.idl,
>>>          .ovs_idl_txn = ovs_idl_txn,
>>>          .ovnsb_idl_txn = ovnsb_idl_txn,
>>>          .client_ctx = &ctrl_engine_ctx
>>> diff --git a/lib/inc-proc-eng.h b/lib/inc-proc-eng.h
>>> index 7e9f5bb70..506da51a1 100644
>>> --- a/lib/inc-proc-eng.h
>>> +++ b/lib/inc-proc-eng.h
>>> @@ -64,6 +64,7 @@
>>>  #define ENGINE_MAX_OVSDB_INDEX 256
>>>
>>>  struct engine_context {
>>> +    struct ovsdb_idl *ovnsb_idl;
>>>      struct ovsdb_idl_txn *ovs_idl_txn;
>>>      struct ovsdb_idl_txn *ovnsb_idl_txn;
>>>      void *client_ctx;
>>> diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
>>> index 570c6a3ef..8649a590b 100644
>>> --- a/northd/ovn-northd.c
>>> +++ b/northd/ovn-northd.c
>>> @@ -3529,19 +3529,17 @@ build_ovn_lbs(struct northd_context *ctx, struct hmap *datapaths,
>>>              continue;
>>>          }
>>>
>>> -        const struct sbrec_load_balancer **sbrec_lbs =
>>> -            xmalloc(od->nbs->n_load_balancer * sizeof *sbrec_lbs);
>>> +        struct uuid *lb_uuids =
>>> +            xmalloc(od->nbs->n_load_balancer * sizeof *lb_uuids);
>>>          for (size_t i = 0; i < od->nbs->n_load_balancer; i++) {
>>>              const struct uuid *lb_uuid =
>>>                  &od->nbs->load_balancer[i]->header_.uuid;
>>>              lb = ovn_northd_lb_find(lbs, lb_uuid);
>>> -            sbrec_lbs[i] = lb->slb;
>>> +            lb_uuids[i] = lb->slb->header_.uuid;
>>>          }
>>> -
>>> -        sbrec_datapath_binding_set_load_balancers(
>>> -            od->sb, (struct sbrec_load_balancer **)sbrec_lbs,
>>> -            od->nbs->n_load_balancer);
>>> -        free(sbrec_lbs);
>>> +        sbrec_datapath_binding_set_load_balancers(od->sb, lb_uuids,
>>> +                                                  od->nbs->n_load_balancer);
>>> +        free(lb_uuids);
>
> I believe we need to change the deletion part as well. Now that we don't
> have the weak reference, we need to take care of the LB deletions by
> removing the reference explicitly from table datapath_binding.

This is handling both addition and deletion, it's just storing the list
of load balancers that are currently associated to the logical datapath.

If a load balancer was deleted it won't be in the list.

We already have tests validating this works fine:

https://github.com/ovn-org/ovn/blob/ecc941c70f1675b66f6ecb2cdf09df362f62c97a/tests/ovn-northd.at#L2432

> Thanks,
> Han

Thanks,
Dumitru

>>>          }
>>>      }
>>>
>>> diff --git a/ovn-sb.ovsschema b/ovn-sb.ovsschema
>>> index bbf60781d..57fbc3ca4 100644
>>> --- a/ovn-sb.ovsschema
>>> +++ b/ovn-sb.ovsschema
>>> @@ -1,7 +1,7 @@
>>>  {
>>>      "name": "OVN_Southbound",
>>>      "version": "20.18.0",
>>> -    "cksum": "1816525029 26536",
>>> +    "cksum": "779623696 26386",
>>>      "tables": {
>>>          "SB_Global": {
>>>              "columns": {
>>> @@ -166,9 +166,7 @@
>>>                  "type": {"key": {"type": "integer",
>>>                                   "minInteger": 1,
>>>                                   "maxInteger": 16777215}}},
>>> -                "load_balancers": {"type": {"key": {"type": "uuid",
>>> -                                                    "refTable": "Load_Balancer",
>>> -                                                    "refType": "weak"},
>>> +                "load_balancers": {"type": {"key": {"type": "uuid"},
>>>                                              "min": 0,
>>>                                              "max": "unlimited"}},
>>>                  "external_ids": {
>>
>> _______________________________________________
>> dev mailing list
>> dev@openvswitch.org
>> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
On Fri, Jul 9, 2021 at 1:01 AM Dumitru Ceara <dceara@redhat.com> wrote:
>
> On 7/9/21 3:46 AM, Han Zhou wrote:
> > On Thu, Jul 8, 2021 at 1:10 PM Mark Michelson <mmichels@redhat.com> wrote:
> >>
> >> Acked-by: Mark Michelson <mmichels@redhat.com>
> >>
> > Thanks Dumitru and Mark!
> >
>
> Hi Mark and Han,
>
> Thanks for the reviews!
>
> >>
> >> If I'm to extrapolate this a bit more, it sounds like any time a
> >> Datapath_Binding is updated, it results in the potential for many
> >> Logical_Flows to be re-calculated. In this particular case, it was
> >> triggered based on modifying a wide-reaching load balancer. I suppose
> >> the same thing would happen if Datapath_Binding external_ids were
> >> updated.
> >>
> > Yes. Luckily it seems external_ids of datapath_binding shouldn't change
> > very often. But I agree this is a general problem of the I-P.
> >
>
> I see it the same way. I-P doesn't give per-column change tracking
> information.
>
> >> I wonder if it's worth investigating trying to create a new reference
> >> type for the purposes of the IDL. The idea would be that if a column in
> >> table A references rows in table B, then if the rows in table B are
> >> updated, any relevant rows in table A will be reported as updated as
> >> well. However, any columns in tables C, D, E, etc. that reference table
> >> A will not result in the rows in tables C, D, E, etc. being reported as
> >> updated.
> >>
> >> Essentially, it's the same as what you've done here, except we can have
> >> the IDL fill in the sbrec_load_balancer in the Datapath_Binding record
> >> instead of having to look it up when needed.
> >>
> > It may be helpful to have a new reference type, but deciding when and
> > where to use it would be very tricky. If we are not careful enough, we
> > might create some dependency in the code without getting proper change
> > notification.
> >
> >> Alternatively, would it be worth listing the relevant columns that a
> >> particular type "cares" about? For instance, Logical_Datapaths are only
> >> affected if the tunnel_key is updated on a Datapath_Binding. The
> >> load_balancers and external_ids are irrelevant. Would it be worthwhile
> >> to be able to note in the reference the columns that should elicit an
> >> update?
> >
> > This sounds more reliable. Ideally, if referenced columns are noted, the
> > generated IDL structure of A's field B should only contain the columns
> > that A is interested in. This way we can make sure it is impossible to
> > secretly use a column without being notified of updates that possibly
> > need to be handled by I-P. I am not sure how hard this would be.
> >
>
> I think this is part of a larger discussion (there were some started a
> while back) about refactoring ovn-controller in general. As far as I've
> seen there are multiple options:
>
> - keep using the IDL and enhance it (and the schema), as suggested above
> - stop using the IDL, use ovsdb-cs directly instead and implement a new
>   type of incremental processing in ovn-controller. This can be either
>   ddlog on top of ovsdb-cs (like ovn-northd-ddlog does), or something
>   else.
>
> >>
> >> Is it worth it? Or is this such a niche use case that it's not worth
> >> implementing?
> >
> > I think we can see if there are more such scenarios that justify the IDL
> > change. I am OK with changing the schema for the current use case to
> > replace the reference by just a raw uuid. However, we should never
> > directly use the IDL from the ctx and access any table; this would break
> > the I-P dependency tracking. Please see more comments inline.
> >
>
> In the general case I tend to agree, but in this specific case
> Datapath_Binding doesn't really depend on Load_Balancer; it just uses
> those uuids as hints. Please see my replies inline.
>
> >>
> >> On 7/7/21 10:56 AM, Dumitru Ceara wrote:
> >>> Whenever a Load_Balancer is updated, e.g., a VIP is added, the
> >>> following sequence of events happens:
> >>>
> >>> 1.
> >>> The Southbound Load_Balancer record is updated.
> >>> [... full commit message quoted; trimmed ...]
> >>>
> >>> In order to avoid this unnecessary Logical_Flow notification storm we
> >>> now remove the explicit reference from Datapath_Binding to
> >>> Load_Balancer and instead store raw UUIDs.
> >>> [...]
> >>>
> >>> diff --git a/controller/lflow.c b/controller/lflow.c
> >>> index 60aa011ff..877bd49b0 100644
> >>> --- a/controller/lflow.c
> >>> +++ b/controller/lflow.c
> >>> @@ -1744,8 +1744,10 @@ lflow_processing_end:
> >>>      /* Add load balancer hairpin flows if the datapath has any load balancers
> >>>       * associated. */
> >>>      for (size_t i = 0; i < dp->n_load_balancers; i++) {
> >>> -        consider_lb_hairpin_flows(dp->load_balancers[i],
> >>> -                                  l_ctx_in->local_datapaths,
> >>> +        const struct sbrec_load_balancer *lb =
> >>> +            sbrec_load_balancer_get_for_uuid(l_ctx_in->sb_idl,
> >>> +                                             &dp->load_balancers[i]);
> >
> > We should never directly access DB tables this way within the I-P
> > engine. We will lose track of the dependencies. Now that we removed the
> > reference, we should add the load_balancer table as a new dependency,
> > get the table data through EN_OVSDB_GET, and explicitly add a change
> > handler to avoid recompute.
> >
>
> There's some confusion here.
>
> Datapath doesn't really depend on Load_Balancer. From the beginning
> there shouldn't have been a reference from Datapath_Binding to
> Load_Balancer.
>
> As a matter of fact, NB.Load_Balancer used to be completely handled by
> ovn-northd and would generate logical flows. So the real dependency is
> that "en_lflow_output" depends on SB load balancers.
>
> We already have an I-P handler for that:
> lflow_output_sb_load_balancer_handler()
>
> We don't *really* need to store load_balancer references in
> Datapath_Binding. The only reason they were added there when
> Load_Balancer flow generation was moved to ovn-controller was to avoid
> walking all load balancers whenever a datapath was added.
>
> However, when a load balancer is changed there's no change to be handled
> for datapaths themselves.
>
> As a matter of fact, an alternative implementation (that obviously
> doesn't scale well) is:
>
>     SBREC_LOAD_BALANCER_TABLE_FOR_EACH (lb, l_ctx_in->lb_table) {
>         for (size_t i = 0; i < lb->n_datapaths; i++) {
>             if (lb->datapaths[i] == dp) {
>                 consider_lb_hairpin_flows(lb, ...);
>                 break;
>             }
>         }
>     }
>
> To avoid this potentially expensive table walk, we use the load_balancer
> uuids stored in the datapath record itself (it's probably best to see
> those as hints I guess).
>
Thanks for the explanation. What you described is indeed a dependency
between lflow and sb_load_balancer, because in lflow's compute/change
handlers sb_load_balancer data is required (otherwise we would not need
to call sbrec_load_balancer_get_for_uuid()).

However, since this dependency is already captured in the I-P engine,
handling it is easy for this use case. We should simply use
sbrec_load_balancer_table_get_for_uuid() instead, which takes a struct
sbrec_load_balancer_table * as argument, and we already have that in
lflow_ctx_in.lb_table as an input to the lflow engine node.

> >>> +        consider_lb_hairpin_flows(lb, l_ctx_in->local_datapaths,
> >>>                                    l_ctx_out->flow_table);
> >>>      }
> >>>
> >>> [...]
> >>>
> >>> diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
> >>> index 570c6a3ef..8649a590b 100644
> >>> --- a/northd/ovn-northd.c
> >>> +++ b/northd/ovn-northd.c
> >>> @@ -3529,19 +3529,17 @@ build_ovn_lbs(struct northd_context *ctx, struct hmap *datapaths,
> >>>              continue;
> >>>          }
> >>>
> >>> -        const struct sbrec_load_balancer **sbrec_lbs =
> >>> -            xmalloc(od->nbs->n_load_balancer * sizeof *sbrec_lbs);
> >>> +        struct uuid *lb_uuids =
> >>> +            xmalloc(od->nbs->n_load_balancer * sizeof *lb_uuids);
> >>>          for (size_t i = 0; i < od->nbs->n_load_balancer; i++) {
> >>>              const struct uuid *lb_uuid =
> >>>                  &od->nbs->load_balancer[i]->header_.uuid;
> >>>              lb = ovn_northd_lb_find(lbs, lb_uuid);
> >>> -            sbrec_lbs[i] = lb->slb;
> >>> +            lb_uuids[i] = lb->slb->header_.uuid;
> >>>          }
> >>> -
> >>> -        sbrec_datapath_binding_set_load_balancers(
> >>> -            od->sb, (struct sbrec_load_balancer **)sbrec_lbs,
> >>> -            od->nbs->n_load_balancer);
> >>> -        free(sbrec_lbs);
> >>> +        sbrec_datapath_binding_set_load_balancers(od->sb, lb_uuids,
> >>> +                                                  od->nbs->n_load_balancer);
> >>> +        free(lb_uuids);
> >
> > I believe we need to change the deletion part as well. Now that we
> > don't have the weak reference, we need to take care of the LB deletions
> > by removing the reference explicitly from table datapath_binding.
>
> This is handling both addition and deletion, it's just storing the list
> of load balancers that are currently associated to the logical datapath.
>
> If a load balancer was deleted it won't be in the list.
>
> We already have tests validating this works fine:
>
> https://github.com/ovn-org/ovn/blob/ecc941c70f1675b66f6ecb2cdf09df362f62c97a/tests/ovn-northd.at#L2432
>
Oh, you are right. Sorry, my brain was somehow wired for I-P when
reviewing this.

> >
> > Thanks,
> > Han
>
> Thanks,
> Dumitru
>
> >>> [...]
>
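The reference-driven fan-out Mark and Han are debating can be sketched as a toy model. This is purely illustrative C, not the real OVSDB IDL code, and all `toy_*` names are invented: each row carries an `updated` flag, and a `refTable`-style edge means an update to a row is also reported on every row referencing it. That is why one Load_Balancer change cascades through Datapath_Binding into every Logical_Flow, and why the patch's raw-uuid column (no edge) cuts the cascade at the source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of IDL change reporting (hypothetical, not OVN's code).
 * toy_add_ref(target, referer) records that 'referer' has a
 * refTable-style column pointing at 'target'. */

#define TOY_MAX_REFS 8

struct toy_row {
    bool updated;                        /* reported as updated */
    struct toy_row *refs[TOY_MAX_REFS];  /* rows referencing this row */
    size_t n_refs;
};

static void
toy_add_ref(struct toy_row *target, struct toy_row *referer)
{
    assert(target->n_refs < TOY_MAX_REFS);
    target->refs[target->n_refs++] = referer;
}

/* Mark 'row' updated and propagate the update report to every row that
 * references it (assumes the reference graph is acyclic). */
static void
toy_touch(struct toy_row *row)
{
    if (row->updated) {
        return;
    }
    row->updated = true;
    for (size_t i = 0; i < row->n_refs; i++) {
        toy_touch(row->refs[i]);
    }
}
```

With a Load_Balancer→Datapath_Binding edge in place, touching the load balancer marks the datapath and all of its logical flows as updated; dropping the edge leaves them untouched.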
On 7/9/21 6:11 PM, Han Zhou wrote:
>> To avoid this potentially expensive table walk, we use the load_balancer
>> uuids stored in the datapath record itself (it's probably best to see
>> those as hints I guess).
>>
> Thanks for the explanation. What you described is indeed a dependency
> between lflow and sb_load_balancer, because in lflow's compute/change
> handlers sb_load_balancer data is required (otherwise we would not need
> to call sbrec_load_balancer_get_for_uuid()).
>
> However, since this dependency is already captured in the I-P engine,
> handling it is easy for this use case. We should simply use
> sbrec_load_balancer_table_get_for_uuid() instead, which takes a struct
> sbrec_load_balancer_table * as argument, and we already have that in
> lflow_ctx_in.lb_table as an input to the lflow engine node.
>

You're right, it's simpler like this; thanks for pointing out the
sbrec_*table_get_for_uuid() variant.

I sent a v2:

http://patchwork.ozlabs.org/project/ovn/list/?series=253029

Regards,
Dumitru
On 7/12/21 10:11 AM, Dumitru Ceara wrote:
> On 7/9/21 6:11 PM, Han Zhou wrote:
>>> To avoid this potentially expensive table walk, we use the load_balancer
>>> uuids stored in the datapath record itself (it's probably best to see
>>> those as hints I guess).
>>>
>> Thanks for the explanation. What you described is indeed a dependency
>> between lflow and sb_load_balancer, because in lflow's compute/change
>> handlers sb_load_balancer data is required (otherwise we would not need
>> to call sbrec_load_balancer_get_for_uuid()).
>>
>> However, since this dependency is already captured in the I-P engine,
>> handling it is easy for this use case. We should simply use
>> sbrec_load_balancer_table_get_for_uuid() instead, which takes a struct
>> sbrec_load_balancer_table * as argument, and we already have that in
>> lflow_ctx_in.lb_table as an input to the lflow engine node.
>>
>
> You're right, it's simpler like this; thanks for pointing out the
> sbrec_*table_get_for_uuid() variant.
>
> I sent a v2:
>
> http://patchwork.ozlabs.org/project/ovn/list/?series=253029
>

Sorry for the noise; Ilya mentioned offline that I forgot to update the
schema version number. I sent a v3 taking care of that:

http://patchwork.ozlabs.org/project/ovn/list/?series=253094

Regards,
Dumitru
diff --git a/controller/lflow.c b/controller/lflow.c
index 60aa011ff..877bd49b0 100644
--- a/controller/lflow.c
+++ b/controller/lflow.c
@@ -1744,8 +1744,10 @@ lflow_processing_end:
     /* Add load balancer hairpin flows if the datapath has any load balancers
      * associated. */
     for (size_t i = 0; i < dp->n_load_balancers; i++) {
-        consider_lb_hairpin_flows(dp->load_balancers[i],
-                                  l_ctx_in->local_datapaths,
+        const struct sbrec_load_balancer *lb =
+            sbrec_load_balancer_get_for_uuid(l_ctx_in->sb_idl,
+                                             &dp->load_balancers[i]);
+        consider_lb_hairpin_flows(lb, l_ctx_in->local_datapaths,
                                   l_ctx_out->flow_table);
     }
 
diff --git a/controller/lflow.h b/controller/lflow.h
index c17ff6dd4..5ba7af894 100644
--- a/controller/lflow.h
+++ b/controller/lflow.h
@@ -40,6 +40,7 @@
 #include "openvswitch/list.h"
 
 struct ovn_extend_table;
+struct ovsdb_idl;
 struct ovsdb_idl_index;
 struct ovn_desired_flow_table;
 struct hmap;
@@ -126,6 +127,7 @@ void lflow_resource_destroy(struct lflow_resource_ref *);
 void lflow_resource_clear(struct lflow_resource_ref *);
 
 struct lflow_ctx_in {
+    struct ovsdb_idl *sb_idl;
     struct ovsdb_idl_index *sbrec_multicast_group_by_name_datapath;
     struct ovsdb_idl_index *sbrec_logical_flow_by_logical_datapath;
     struct ovsdb_idl_index *sbrec_logical_flow_by_logical_dp_group;
diff --git a/controller/ovn-controller.c b/controller/ovn-controller.c
index 9050380f3..24443bad1 100644
--- a/controller/ovn-controller.c
+++ b/controller/ovn-controller.c
@@ -2016,6 +2016,7 @@ init_lflow_ctx(struct engine_node *node,
         engine_get_input_data("port_groups", node);
     struct shash *port_groups = &pg_data->port_groups_cs_local;
 
+    l_ctx_in->sb_idl = engine_get_context()->ovnsb_idl;
     l_ctx_in->sbrec_multicast_group_by_name_datapath =
         sbrec_mc_group_by_name_dp;
     l_ctx_in->sbrec_logical_flow_by_logical_datapath =
@@ -3124,6 +3125,7 @@ main(int argc, char *argv[])
     }
 
     struct engine_context eng_ctx = {
+        .ovnsb_idl = ovnsb_idl_loop.idl,
         .ovs_idl_txn = ovs_idl_txn,
         .ovnsb_idl_txn = ovnsb_idl_txn,
         .client_ctx = &ctrl_engine_ctx
diff --git a/lib/inc-proc-eng.h b/lib/inc-proc-eng.h
index 7e9f5bb70..506da51a1 100644
--- a/lib/inc-proc-eng.h
+++ b/lib/inc-proc-eng.h
@@ -64,6 +64,7 @@
 #define ENGINE_MAX_OVSDB_INDEX 256
 
 struct engine_context {
+    struct ovsdb_idl *ovnsb_idl;
     struct ovsdb_idl_txn *ovs_idl_txn;
     struct ovsdb_idl_txn *ovnsb_idl_txn;
     void *client_ctx;
diff --git a/northd/ovn-northd.c b/northd/ovn-northd.c
index 570c6a3ef..8649a590b 100644
--- a/northd/ovn-northd.c
+++ b/northd/ovn-northd.c
@@ -3529,19 +3529,17 @@ build_ovn_lbs(struct northd_context *ctx, struct hmap *datapaths,
             continue;
         }
 
-        const struct sbrec_load_balancer **sbrec_lbs =
-            xmalloc(od->nbs->n_load_balancer * sizeof *sbrec_lbs);
+        struct uuid *lb_uuids =
+            xmalloc(od->nbs->n_load_balancer * sizeof *lb_uuids);
         for (size_t i = 0; i < od->nbs->n_load_balancer; i++) {
             const struct uuid *lb_uuid =
                 &od->nbs->load_balancer[i]->header_.uuid;
             lb = ovn_northd_lb_find(lbs, lb_uuid);
-            sbrec_lbs[i] = lb->slb;
+            lb_uuids[i] = lb->slb->header_.uuid;
         }
-
-        sbrec_datapath_binding_set_load_balancers(
-            od->sb, (struct sbrec_load_balancer **)sbrec_lbs,
-            od->nbs->n_load_balancer);
-        free(sbrec_lbs);
+        sbrec_datapath_binding_set_load_balancers(od->sb, lb_uuids,
+                                                  od->nbs->n_load_balancer);
+        free(lb_uuids);
     }
 }
 
diff --git a/ovn-sb.ovsschema b/ovn-sb.ovsschema
index bbf60781d..57fbc3ca4 100644
--- a/ovn-sb.ovsschema
+++ b/ovn-sb.ovsschema
@@ -1,7 +1,7 @@
 {
     "name": "OVN_Southbound",
     "version": "20.18.0",
-    "cksum": "1816525029 26536",
+    "cksum": "779623696 26386",
     "tables": {
         "SB_Global": {
             "columns": {
@@ -166,9 +166,7 @@
                 "type": {"key": {"type": "integer",
                                  "minInteger": 1,
                                  "maxInteger": 16777215}}},
-                "load_balancers": {"type": {"key": {"type": "uuid",
-                                                    "refTable": "Load_Balancer",
-                                                    "refType": "weak"},
+                "load_balancers": {"type": {"key": {"type": "uuid"},
                                             "min": 0,
                                             "max": "unlimited"}},
                 "external_ids": {
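With the weak reference gone, `dp->load_balancers` becomes an array of plain `struct uuid` values that ovn-controller resolves against the Load_Balancer table it already has (in v2, via `sbrec_load_balancer_table_get_for_uuid()`). The shape of that resolution step can be sketched with invented `toy_*` types, using a linear scan where the generated IDL code does an indexed lookup:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the IDL types: a 128-bit UUID and a
 * table of load balancer rows keyed by it. */

struct toy_uuid {
    uint32_t parts[4];
};

struct toy_lb {
    struct toy_uuid uuid;
    const char *name;
};

/* Resolve a raw UUID hint against the table contents.  Linear scan for
 * brevity; the real sbrec_*_table_get_for_uuid() is an indexed lookup,
 * so the per-datapath cost stays negligible. */
static const struct toy_lb *
toy_lb_table_get_for_uuid(const struct toy_lb *table, size_t n,
                          const struct toy_uuid *uuid)
{
    for (size_t i = 0; i < n; i++) {
        if (!memcmp(&table[i].uuid, uuid, sizeof *uuid)) {
            return &table[i];
        }
    }
    return NULL;  /* hint is stale: the LB was deleted */
}
```

A NULL result corresponds to a stale hint (the load balancer was deleted after the uuid was stored), which callers have to tolerate now that the database no longer garbage-collects the column for them.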
Whenever a Load_Balancer is updated, e.g., a VIP is added, the following
sequence of events happens:

1. The Southbound Load_Balancer record is updated.
2. The Southbound Datapath_Binding records on which the Load_Balancer is
   applied are updated.
3. Southbound ovsdb-server sends updates about the Load_Balancer and
   Datapath_Binding records to ovn-controller.
4. The IDL layer in ovn-controller processes the updates at #3, but
   because of the SB schema references between tables [0] all logical
   flows referencing the updated Datapath_Binding are marked as
   "updated". The same is true for Logical_DP_Group records
   referencing the Datapath_Binding, and also for all logical flows
   pointing to the new "updated" datapath groups.
5. ovn-controller ends up recomputing (removing/re-adding) all flows for
   all these tracked updates.

From the SB Schema:
    "Datapath_Binding": {
        "columns": {
            [...]
            "load_balancers": {"type": {"key": {"type": "uuid",
                                                "refTable": "Load_Balancer",
                                                "refType": "weak"},
                                        "min": 0,
                                        "max": "unlimited"}},
    [...]
    "Load_Balancer": {
        "columns": {
            "datapaths": {
                [...]
                "type": {"key": {"type": "uuid",
                                 "refTable": "Datapath_Binding"},
                         "min": 0, "max": "unlimited"}},
    [...]
    "Logical_DP_Group": {
        "columns": {
            "datapaths":
                {"type": {"key": {"type": "uuid",
                                  "refTable": "Datapath_Binding",
                                  "refType": "weak"},
                          "min": 0, "max": "unlimited"}}},
    [...]
    "Logical_Flow": {
        "columns": {
            "logical_datapath":
                {"type": {"key": {"type": "uuid",
                                  "refTable": "Datapath_Binding"},
                          "min": 0, "max": 1}},
            "logical_dp_group":
                {"type": {"key": {"type": "uuid",
                                  "refTable": "Logical_DP_Group"},

In order to avoid this unnecessary Logical_Flow notification storm we
now remove the explicit reference from Datapath_Binding to
Load_Balancer and instead store raw UUIDs.

This means that on the ovn-controller side we need to perform a
Load_Balancer table lookup by UUID whenever a new datapath is added,
but that doesn't happen too often and the cost of the lookup is
negligible compared to the huge cost of processing the unnecessary
logical flow updates.

This change is backwards compatible because the contents stored in the
database are not changed; only the schema constraints are relaxed a
bit.

In performance measurements on a scale test deployment simulating an
ovn-kubernetes deployment with 120 nodes and a large load balancer
with 16K VIPs associated to each node's logical switch, the event
processing loop time in ovn-controller, when adding a new VIP, is
reduced from ~39 seconds to ~8 seconds.

There's no need to change the northd DDlog implementation.

Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1978605
Signed-off-by: Dumitru Ceara <dceara@redhat.com>
---
 controller/lflow.c          |  6 ++++--
 controller/lflow.h          |  2 ++
 controller/ovn-controller.c |  2 ++
 lib/inc-proc-eng.h          |  1 +
 northd/ovn-northd.c         | 14 ++++++--------
 ovn-sb.ovsschema            |  6 ++----
 6 files changed, 17 insertions(+), 14 deletions(-)
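On the backwards-compatibility point: a weak-reference column and a plain "uuid" column store the same UUID atom in the database; only the constraint checking differs, so existing rows remain valid under the relaxed schema. A hypothetical serializer (not OVS's lib/uuid.c, though the canonical 8-4-4-4-12 textual form is the same) illustrates that the stored value carries no notion of refType at all:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Illustrative 128-bit UUID, stored as four 32-bit words the way
 * OVS's struct uuid does. */
struct toy_uuid {
    uint32_t parts[4];
};

/* Format the UUID in the canonical 8-4-4-4-12 hex form.  This is the
 * value the database serializes for both the old weak-reference column
 * and the new plain uuid column, which is why the schema change needs
 * no data conversion. */
static void
toy_uuid_to_string(const struct toy_uuid *u, char out[37])
{
    snprintf(out, 37, "%08x-%04x-%04x-%04x-%04x%08x",
             (unsigned) u->parts[0],
             (unsigned) (u->parts[1] >> 16),
             (unsigned) (u->parts[1] & 0xffff),
             (unsigned) (u->parts[2] >> 16),
             (unsigned) (u->parts[2] & 0xffff),
             (unsigned) u->parts[3]);
}
```

Only the schema's "cksum" and constraints change; each stored `load_balancers` value round-trips byte-for-byte across the upgrade.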