diff mbox series

[ovs-dev] northd: Initialize hmap size in lflow_mgr.

Message ID 20240214182616.3260992-1-xsimonar@redhat.com
State Accepted
Headers show
Series [ovs-dev] northd: Initialize hmap size in lflow_mgr. | expand

Checks

Context Check Description
ovsrobot/apply-robot success apply and check: success
ovsrobot/github-robot-_Build_and_Test success github build: passed
ovsrobot/github-robot-_ovn-kubernetes success github build: passed

Commit Message

Xavier Simonart Feb. 14, 2024, 6:26 p.m. UTC
When (re)starting ovn-northd with an existing big nbdb,
the first iteration of northd was very slow as trying to
push all flows in a single bucket.

Fixes: a623606052ea ("northd: Refactor lflow management into a separate module.")

Signed-off-by: Xavier Simonart <xsimonar@redhat.com>
---
 northd/lflow-mgr.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Numan Siddique Feb. 14, 2024, 9:57 p.m. UTC | #1
On Wed, Feb 14, 2024 at 1:26 PM Xavier Simonart <xsimonar@redhat.com> wrote:
>
> When (re)starting ovn-northd with an existing big nbdb,
> the first iteration of northd was very slow as trying to
> push all flows in a single bucket.
>
> Fixes: a623606052ea ("northd: Refactor lflow management into a separate module.")
>
> Signed-off-by: Xavier Simonart <xsimonar@redhat.com>

Hi Xavier,

Thanks for the patch.  Did the first iteration of north become very
slow after the lflow-mgr was added ?

If I compare the code before these patches were merged,  lflows_temp
hmap was not a fast hmap [1].
So I'm still confused how this regression was introduced ?

[1] - https://github.com/ovn-org/ovn/blob/branch-23.09/northd/northd.c#L16717

If you see the code I linked above,  lflows_temp is declared and initialized as

struct hmap lflows_temp = HMAP_INITIALIZER(&lflows_temp);

And later a lflow is removed from the "lflows" hmap and inserted into
it. And finally it is swapped with "lflows"

hmap_swap(lflows, &lflows_temp);
hmap_destroy(&lflows_temp);

The same thing is done in the lflow-mgr too.

I'm fine with this patch if it speeds up the first iteration.  I want
to understand if this behavior is a regression or not.

Thanks
Numan


> ---
>  northd/lflow-mgr.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/northd/lflow-mgr.c b/northd/lflow-mgr.c
> index 61729e903..af072bf70 100644
> --- a/northd/lflow-mgr.c
> +++ b/northd/lflow-mgr.c
> @@ -260,6 +260,9 @@ lflow_table_sync_to_sb(struct lflow_table *lflow_table,
>      struct hmap *lflows = &lflow_table->entries;
>      struct ovn_lflow *lflow;
>
> +    fast_hmap_size_for(&lflows_temp,
> +                       lflow_table->max_seen_lflow_size);
> +
>      /* Push changes to the Logical_Flow table to database. */
>      const struct sbrec_logical_flow *sbflow;
>      SBREC_LOGICAL_FLOW_TABLE_FOR_EACH_SAFE (sbflow, sb_flow_table) {
> --
> 2.41.0
>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
Xavier Simonart Feb. 15, 2024, 10:10 a.m. UTC | #2
Hi Numan

Thanks for the quick review.

On Wed, Feb 14, 2024 at 10:57 PM Numan Siddique <numans@ovn.org> wrote:

> On Wed, Feb 14, 2024 at 1:26 PM Xavier Simonart <xsimonar@redhat.com>
> wrote:
> >
> > When (re)starting ovn-northd with an existing big nbdb,
> > the first iteration of northd was very slow as trying to
> > push all flows in a single bucket.
> >
> > Fixes: a623606052ea ("northd: Refactor lflow management into a separate
> module.")
> >
> > Signed-off-by: Xavier Simonart <xsimonar@redhat.com>
>
> Hi Xavier,
>
> Thanks for the patch.  Did the first iteration of north become very
> slow after the lflow-mgr was added ?
>
Yes, it was introduced by "northd: Refactor lflow management into a
separate module"
We can test this easily using the check-perf tests. Using "check perf
TESTSUITEFLAGS="-v 2", it takes around 16 seconds (on my system) before the
patch and 213 seconds after.
The difference gets exponential if we increase the number of ports.

>
> If I compare the code before these patches were merged,  lflows_temp
> hmap was not a fast hmap [1].
> So I'm still confused how this regression was introduced ?
>
> [1] -
> https://github.com/ovn-org/ovn/blob/branch-23.09/northd/northd.c#L16717
>
> If you see the code I linked above,  lflows_temp is declared and
> initialized as
>
> struct hmap lflows_temp = HMAP_INITIALIZER(&lflows_temp);
>
> And later a lflow is removed from the "lflows" hmap and inserted into
> it. And finally it is swapped with "lflows"
>
> hmap_swap(lflows, &lflows_temp);
> hmap_destroy(&lflows_temp);
>
> The same thing is done in the lflow-mgr too.
>
> I'm fine with this patch if it speeds up the first iteration.  I want
> to understand if this behavior is a regression or not.
>
> Sorry, I should have added more comments in the commit message. The extra
time spent is not in lflow_table_sync_to_sb.
But the hmap_swap in lflow_table_sync_to_sb, when the hmap is empty, swaps
an empty hmap initialized with 128 buckets with an
empty hmap initialized with one bucket.
Then, when we build flows the first time, the "fast" hmap has only one
bucket and could get millions of flows in this poor bucket.
When initialized with 128 buckets, the initial flow build is not great, but
still much better ...
Another fix could have been to set the number of buckets in lflows_temp to
128, or to avoid the swap if both hmap are empty.

> Thanks
> Numan
>
> Thanks
Xavier

>
> > ---
> >  northd/lflow-mgr.c | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/northd/lflow-mgr.c b/northd/lflow-mgr.c
> > index 61729e903..af072bf70 100644
> > --- a/northd/lflow-mgr.c
> > +++ b/northd/lflow-mgr.c
> > @@ -260,6 +260,9 @@ lflow_table_sync_to_sb(struct lflow_table
> *lflow_table,
> >      struct hmap *lflows = &lflow_table->entries;
> >      struct ovn_lflow *lflow;
> >
> > +    fast_hmap_size_for(&lflows_temp,
> > +                       lflow_table->max_seen_lflow_size);
> > +
> >      /* Push changes to the Logical_Flow table to database. */
> >      const struct sbrec_logical_flow *sbflow;
> >      SBREC_LOGICAL_FLOW_TABLE_FOR_EACH_SAFE (sbflow, sb_flow_table) {
> > --
> > 2.41.0
> >
> > _______________________________________________
> > dev mailing list
> > dev@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >
>
>
Numan Siddique Feb. 15, 2024, 3:51 p.m. UTC | #3
On Thu, Feb 15, 2024 at 5:10 AM Xavier Simonart <xsimonar@redhat.com> wrote:
>
> Hi Numan
>
> Thanks for the quick review.
>
> On Wed, Feb 14, 2024 at 10:57 PM Numan Siddique <numans@ovn.org> wrote:
>
> > On Wed, Feb 14, 2024 at 1:26 PM Xavier Simonart <xsimonar@redhat.com>
> > wrote:
> > >
> > > When (re)starting ovn-northd with an existing big nbdb,
> > > the first iteration of northd was very slow as trying to
> > > push all flows in a single bucket.
> > >
> > > Fixes: a623606052ea ("northd: Refactor lflow management into a separate
> > module.")
> > >
> > > Signed-off-by: Xavier Simonart <xsimonar@redhat.com>
> >
> > Hi Xavier,
> >
> > Thanks for the patch.  Did the first iteration of north become very
> > slow after the lflow-mgr was added ?
> >
> Yes, it was introduced by "northd: Refactor lflow management into a
> separate module"
> We can test this easily using the check-perf tests. Using "check perf
> TESTSUITEFLAGS="-v 2", it takes around 16 seconds (on my system) before the
> patch and 213 seconds after.
> The difference gets exponential if we increase the number of ports.
>
> >
> > If I compare the code before these patches were merged,  lflows_temp
> > hmap was not a fast hmap [1].
> > So I'm still confused how this regression was introduced ?
> >
> > [1] -
> > https://github.com/ovn-org/ovn/blob/branch-23.09/northd/northd.c#L16717
> >
> > If you see the code I linked above,  lflows_temp is declared and
> > initialized as
> >
> > struct hmap lflows_temp = HMAP_INITIALIZER(&lflows_temp);
> >
> > And later a lflow is removed from the "lflows" hmap and inserted into
> > it. And finally it is swapped with "lflows"
> >
> > hmap_swap(lflows, &lflows_temp);
> > hmap_destroy(&lflows_temp);
> >
> > The same thing is done in the lflow-mgr too.
> >
> > I'm fine with this patch if it speeds up the first iteration.  I want
> > to understand if this behavior is a regression or not.
> >
> > Sorry, I should have added more comments in the commit message. The extra
> time spent is not in lflow_table_sync_to_sb.
> But the hmap_swap in lflow_table_sync_to_sb, when the hmap is empty, swaps
> an empty hmap initialized with 128 buckets with an
> empty hmap initialized with one bucket.
> Then, when we build flows the first time, the "fast" hmap has only one
> bucket and could get millions of flows in this poor bucket.
> When initialized with 128 buckets, the initial flow build is not great, but
> still much better ...
> Another fix could have been to set the number of buckets in lflows_temp to
> 128, or to avoid the swap if both hmap are empty.

Now I understand what's going on.  Thank you for the fix.

I applied to main and backported to branch-24.03.

Thanks
Numan

>
> > Thanks
> > Numan
> >
> > Thanks
> Xavier
>
> >
> > > ---
> > >  northd/lflow-mgr.c | 3 +++
> > >  1 file changed, 3 insertions(+)
> > >
> > > diff --git a/northd/lflow-mgr.c b/northd/lflow-mgr.c
> > > index 61729e903..af072bf70 100644
> > > --- a/northd/lflow-mgr.c
> > > +++ b/northd/lflow-mgr.c
> > > @@ -260,6 +260,9 @@ lflow_table_sync_to_sb(struct lflow_table
> > *lflow_table,
> > >      struct hmap *lflows = &lflow_table->entries;
> > >      struct ovn_lflow *lflow;
> > >
> > > +    fast_hmap_size_for(&lflows_temp,
> > > +                       lflow_table->max_seen_lflow_size);
> > > +
> > >      /* Push changes to the Logical_Flow table to database. */
> > >      const struct sbrec_logical_flow *sbflow;
> > >      SBREC_LOGICAL_FLOW_TABLE_FOR_EACH_SAFE (sbflow, sb_flow_table) {
> > > --
> > > 2.41.0
> > >
> > > _______________________________________________
> > > dev mailing list
> > > dev@openvswitch.org
> > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> > >
> >
> >
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
diff mbox series

Patch

diff --git a/northd/lflow-mgr.c b/northd/lflow-mgr.c
index 61729e903..af072bf70 100644
--- a/northd/lflow-mgr.c
+++ b/northd/lflow-mgr.c
@@ -260,6 +260,9 @@  lflow_table_sync_to_sb(struct lflow_table *lflow_table,
     struct hmap *lflows = &lflow_table->entries;
     struct ovn_lflow *lflow;
 
+    fast_hmap_size_for(&lflows_temp,
+                       lflow_table->max_seen_lflow_size);
+
     /* Push changes to the Logical_Flow table to database. */
     const struct sbrec_logical_flow *sbflow;
     SBREC_LOGICAL_FLOW_TABLE_FOR_EACH_SAFE (sbflow, sb_flow_table) {