Message ID | 20230823215705.1786348-1-i.maximets@ovn.org |
---|---|
State | Accepted |
Headers | show |
Series | [ovs-dev] northd: Fix incorrect memory allocation for router group datapaths. | expand |
Context | Check | Description |
---|---|---|
ovsrobot/apply-robot | success | apply and check: success |
ovsrobot/github-robot-_Build_and_Test | success | github build: passed |
ovsrobot/github-robot-_ovn-kubernetes | success | github build: passed |
Thanks Ilya, Acked-by: Mark Michelson <mmichels@redhat.com> On 8/23/23 17:57, Ilya Maximets wrote: > We need to allocate memory for 'n_router_dps' pointers, but the > current code allocates space for 'n_router_dps' datapaths, which > is 762 byte structure. Having 5000 routers in the OpenStack > setup, we allocate 762 * 5000 * 5000 ~= 17 GB of memory instead > of 190 MB: > > 98.17% (19,962,746,247B) (heap allocation functions) > ->90.67% (18,436,818,400B) xcalloc__ (util.c:124) > ->90.67% (18,436,818,400B) xcalloc (util.c:161) > ->90.67% (18,436,818,400B) build_lrouter_groups (northd.c:8707) > ->90.67% (18,436,818,400B) ovnnb_db_run (northd.c:17611) > ->90.67% (18,436,818,400B) en_northd_run (en-northd.c:137) > ->90.67% (18,436,818,400B) engine_recompute (inc-proc-eng.c:415) > ->90.67% (18,436,818,400B) engine_run_node (inc-proc-eng.c:477) > ->90.67% (18,436,818,400B) engine_run (inc-proc-eng.c:528) > ->90.67% (18,436,818,400B) inc_proc_northd_run (inc-proc-northd.c:316) > ->90.67% (18,436,818,400B) main (ovn-northd.c:937) > > Also, even though this doesn't affect the allocation size, the order > of arguments is incorrect. > > Fix that by using correct pointers and order. That saves a huge > amount of memory as well as time on allocating and zeroing it out. > In my testing northd node saved 10 seconds on these allocations. > > Fixes: b82d351976a1 ("ovn: Add generic HA chassis group") > Signed-off-by: Ilya Maximets <i.maximets@ovn.org> > --- > northd/northd.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/northd/northd.c b/northd/northd.c > index 0a749931e..c701d929e 100644 > --- a/northd/northd.c > +++ b/northd/northd.c > @@ -8704,7 +8704,8 @@ build_lrouter_groups(struct hmap *lr_ports, struct ovs_list *lr_list) > od->lr_group = xzalloc(sizeof *od->lr_group); > /* Each logical router group can have max > * 'n_router_dps'. So allocate enough memory. */ > - od->lr_group->router_dps = xcalloc(sizeof *od, n_router_dps); > + od->lr_group->router_dps = > + xcalloc(n_router_dps, sizeof *od->lr_group->router_dps); > od->lr_group->router_dps[0] = od; > od->lr_group->n_router_dps = 1; > sset_init(&od->lr_group->ha_chassis_groups);
I merged this change to main and all branches back to 22.03. Thanks Ilya! On 8/24/23 15:02, Mark Michelson wrote: > Thanks Ilya, > > Acked-by: Mark Michelson <mmichels@redhat.com> > > On 8/23/23 17:57, Ilya Maximets wrote: >> We need to allocate memory for 'n_router_dps' pointers, but the >> current code allocates space for 'n_router_dps' datapaths, which >> is 762 byte structure. Having 5000 routers in the OpenStack >> setup, we allocate 762 * 5000 * 5000 ~= 17 GB of memory instead >> of 190 MB: >> >> 98.17% (19,962,746,247B) (heap allocation functions) >> ->90.67% (18,436,818,400B) xcalloc__ (util.c:124) >> ->90.67% (18,436,818,400B) xcalloc (util.c:161) >> ->90.67% (18,436,818,400B) build_lrouter_groups (northd.c:8707) >> ->90.67% (18,436,818,400B) ovnnb_db_run (northd.c:17611) >> ->90.67% (18,436,818,400B) en_northd_run (en-northd.c:137) >> ->90.67% (18,436,818,400B) engine_recompute (inc-proc-eng.c:415) >> ->90.67% (18,436,818,400B) engine_run_node (inc-proc-eng.c:477) >> ->90.67% (18,436,818,400B) engine_run (inc-proc-eng.c:528) >> ->90.67% (18,436,818,400B) inc_proc_northd_run >> (inc-proc-northd.c:316) >> ->90.67% (18,436,818,400B) main (ovn-northd.c:937) >> >> Also, even though this doesn't affect the allocation size, the order >> of arguments is incorrect. >> >> Fix that by using correct pointers and order. That saves a huge >> amount of memory as well as time on allocating and zeroing it out. >> In my testing northd node saved 10 seconds on these allocations. >> >> Fixes: b82d351976a1 ("ovn: Add generic HA chassis group") >> Signed-off-by: Ilya Maximets <i.maximets@ovn.org> >> --- >> northd/northd.c | 3 ++- >> 1 file changed, 2 insertions(+), 1 deletion(-) >> >> diff --git a/northd/northd.c b/northd/northd.c >> index 0a749931e..c701d929e 100644 >> --- a/northd/northd.c >> +++ b/northd/northd.c >> @@ -8704,7 +8704,8 @@ build_lrouter_groups(struct hmap *lr_ports, >> struct ovs_list *lr_list) >> od->lr_group = xzalloc(sizeof *od->lr_group); >> /* Each logical router group can have max >> * 'n_router_dps'. So allocate enough memory. */ >> - od->lr_group->router_dps = xcalloc(sizeof *od, >> n_router_dps); >> + od->lr_group->router_dps = >> + xcalloc(n_router_dps, sizeof *od->lr_group->router_dps); >> od->lr_group->router_dps[0] = od; >> od->lr_group->n_router_dps = 1; >> sset_init(&od->lr_group->ha_chassis_groups); >
diff --git a/northd/northd.c b/northd/northd.c index 0a749931e..c701d929e 100644 --- a/northd/northd.c +++ b/northd/northd.c @@ -8704,7 +8704,8 @@ build_lrouter_groups(struct hmap *lr_ports, struct ovs_list *lr_list) od->lr_group = xzalloc(sizeof *od->lr_group); /* Each logical router group can have max * 'n_router_dps'. So allocate enough memory. */ - od->lr_group->router_dps = xcalloc(sizeof *od, n_router_dps); + od->lr_group->router_dps = + xcalloc(n_router_dps, sizeof *od->lr_group->router_dps); od->lr_group->router_dps[0] = od; od->lr_group->n_router_dps = 1; sset_init(&od->lr_group->ha_chassis_groups);
We need to allocate memory for 'n_router_dps' pointers, but the current code allocates space for 'n_router_dps' datapaths, which is 762 byte structure. Having 5000 routers in the OpenStack setup, we allocate 762 * 5000 * 5000 ~= 17 GB of memory instead of 190 MB: 98.17% (19,962,746,247B) (heap allocation functions) ->90.67% (18,436,818,400B) xcalloc__ (util.c:124) ->90.67% (18,436,818,400B) xcalloc (util.c:161) ->90.67% (18,436,818,400B) build_lrouter_groups (northd.c:8707) ->90.67% (18,436,818,400B) ovnnb_db_run (northd.c:17611) ->90.67% (18,436,818,400B) en_northd_run (en-northd.c:137) ->90.67% (18,436,818,400B) engine_recompute (inc-proc-eng.c:415) ->90.67% (18,436,818,400B) engine_run_node (inc-proc-eng.c:477) ->90.67% (18,436,818,400B) engine_run (inc-proc-eng.c:528) ->90.67% (18,436,818,400B) inc_proc_northd_run (inc-proc-northd.c:316) ->90.67% (18,436,818,400B) main (ovn-northd.c:937) Also, even though this doesn't affect the allocation size, the order of arguments is incorrect. Fix that by using correct pointers and order. That saves a huge amount of memory as well as time on allocating and zeroing it out. In my testing northd node saved 10 seconds on these allocations. Fixes: b82d351976a1 ("ovn: Add generic HA chassis group") Signed-off-by: Ilya Maximets <i.maximets@ovn.org> --- northd/northd.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-)