
[ovs-dev,09/14] northd: Incremental processing of VIF additions in 'lflow' node.

Message ID 20230513000356.2475960-10-hzhou@ovn.org
State Changes Requested
Series ovn-northd incremental processing for VIF changes

Checks

Context Check Description
ovsrobot/apply-robot warning apply and check: warning
ovsrobot/github-robot-_Build_and_Test fail github build: failed
ovsrobot/github-robot-_ovn-kubernetes success github build: passed

Commit Message

Han Zhou May 13, 2023, 12:03 a.m. UTC
This patch introduces a change handler for 'northd' input within the
'lflow' node. It specifically handles cases when VIFs are created, which
is an easier start for lflow incremental processing. Support for
update/delete will be added later.
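
Condensed from the diff at the end of this message, the wiring amounts to
registering a change handler for the 'northd' input of the 'lflow' node;
the handler returns false (forcing a full recompute) for anything it
cannot yet process incrementally:

engine_add_input(&en_lflow, &en_northd, lflow_northd_handler);

bool
lflow_northd_handler(struct engine_node *node, void *data)
{
    struct northd_data *northd_data = engine_get_input_data("northd", node);
    if (!northd_data->change_tracked) {
        return false;   /* Not tracked: fall back to a full recompute. */
    }
    /* ... incrementally process northd_data->tracked_ls_changes ... */
    engine_set_node_state(node, EN_UPDATED);
    return true;
}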

Below are the performance test results simulating an ovn-k8s topology of 500
nodes x 50 lsp per node:

Before:
ovn-nbctl --wait=hv --print-wait-time lsp-add ls_1_0001 lsp_1_0001_01 -- lsp-set-addresses lsp_1_0001_01 "ff:f1:bb:00:01:01 1.0.1.101" -- lsp-set-port-security lsp_1_0001_01 "ff:f1:bb:00:01:01 1.0.1.101"
    ovn-northd completion:          773ms

After:
ovn-nbctl --wait=hv --print-wait-time lsp-add ls_1_0001 lsp_1_0001_01 -- lsp-set-addresses lsp_1_0001_01 "ff:f1:bb:00:01:01 1.0.1.101" -- lsp-set-port-security lsp_1_0001_01 "ff:f1:bb:00:01:01 1.0.1.101"
    ovn-northd completion:          30ms

That is a reduction of more than 95% (or 20x faster).

Signed-off-by: Han Zhou <hzhou@ovn.org>
---
 northd/en-lflow.c        |  82 ++++++++-----
 northd/en-lflow.h        |   1 +
 northd/inc-proc-northd.c |   2 +-
 northd/northd.c          | 245 +++++++++++++++++++++++++++++++++------
 northd/northd.h          |   6 +-
 tests/ovn-northd.at      |   4 +-
 6 files changed, 277 insertions(+), 63 deletions(-)

Comments

0-day Robot May 13, 2023, 12:20 a.m. UTC | #1
Bleep bloop.  Greetings Han Zhou, I am a robot and I have tried out your patch.
Thanks for your contribution.

I encountered some error that I wasn't expecting.  See the details below.


checkpatch:
WARNING: Comment with 'xxx' marker
#209 FILE: northd/northd.c:5697:
 * XXX: this mechanism is temporary and will be replaced when we add hash index

WARNING: Comment with 'xxx' marker
#386 FILE: northd/northd.c:16066:
            /* XXX: implement lflow index so that we can handle updated and

Lines checked: 563, Warnings: 2, Errors: 0


Please check this out.  If you feel there has been an error, please email aconole@redhat.com

Thanks,
0-day Robot
Ilya Maximets May 29, 2023, 4:24 p.m. UTC | #2
On 5/13/23 02:03, Han Zhou wrote:
>
> <snip>
>
> @@ -5685,7 +5687,18 @@ ovn_dp_group_add_with_reference(struct ovn_lflow *lflow_ref,
>  
>  /* Adds a row with the specified contents to the Logical_Flow table.
>   * Version to use when hash bucket locking is NOT required.
> + *
> + * Note: This function can add generated lflows to the global variable
> + * temp_lflow_list as its output, controlled by the global variable
> + * add_lflow_to_temp_list. The caller of the ovn_lflow_add_... macros can get
> + * the list of generated lflows by setting add_lflow_to_temp_list to true. The
> + * caller is responsible for initializing temp_lflow_list, and also for
> + * resetting add_lflow_to_temp_list to false when it is no longer needed.
> + * XXX: this mechanism is temporary and will be replaced when we add hash index
> + * to lflow_data and refactor related functions.
>   */
> +static bool add_lflow_to_temp_list = false;
> +static struct ovs_list temp_lflow_list;
>  static void
>  do_ovn_lflow_add(struct hmap *lflow_map, const struct ovn_datapath *od,
>                   const unsigned long *dp_bitmap, size_t dp_bitmap_len,
> @@ -5702,12 +5715,17 @@ do_ovn_lflow_add(struct hmap *lflow_map, const struct ovn_datapath *od,
>      size_t bitmap_len = od ? ods_size(od->datapaths) : dp_bitmap_len;
>      ovs_assert(bitmap_len);
>  
> -    old_lflow = ovn_lflow_find(lflow_map, NULL, stage, priority, match,
> -                               actions, ctrl_meter, hash);
> -    if (old_lflow) {
> -        ovn_dp_group_add_with_reference(old_lflow, od, dp_bitmap,
> -                                        bitmap_len);
> -        return;
> +    if (add_lflow_to_temp_list) {
> +        ovs_assert(od);
> +        ovs_assert(!dp_bitmap);
> +    } else {
> +        old_lflow = ovn_lflow_find(lflow_map, NULL, stage, priority, match,
> +                                   actions, ctrl_meter, hash);
> +        if (old_lflow) {
> +            ovn_dp_group_add_with_reference(old_lflow, od, dp_bitmap,
> +                                            bitmap_len);
> +            return;
> +        }
>      }

Hi, Han.  Not a full review, but I'm a bit concerned that this code doesn't
support datapath groups.  IIUC, if some flows can be aggregated with datapath
groups, I-P will just add them separately.  Later a full re-compute will
delete all these flows and replace them with a single grouped one.  Might
cause some unnecessary churn in the cluster.

Is my understanding correct?  And can it be a performance issue?  E.g. if the
database grows too much because of that.  Or how hard is it to add datapath
group support to I-P?

I suppose, DP-groups is one of the reasons you didn't implement deletions
here, right?

Best regards, Ilya Maximets.
Han Zhou May 29, 2023, 6:42 p.m. UTC | #3
On Mon, May 29, 2023 at 9:24 AM Ilya Maximets <i.maximets@ovn.org> wrote:
>
> <snip>
>
> Hi, Han.  Not a full review, but I'm a bit concerned that this code
> doesn't support datapath groups.  IIUC, if some flows can be aggregated
> with datapath groups, I-P will just add them separately.  Later a full
> re-compute will delete all these flows and replace them with a single
> grouped one.  Might cause some unnecessary churn in the cluster.
>

Hi Ilya, thanks for the feedback. You are right that the current I-P
doesn't support datapath groups, because:
1. The current I-P only supports LSP (regular VIF) related changes, which
are very unlikely to generate datapath groups. Even when that happens in
rare cases, the logic is still correct and the churn should be small.
2. There are two types of DP groups:
    a. DP groups that are generated directly, when northd knows up front
that a lflow should be applied to a group of DPs.
    b. DP groups that are generated passively, by combining identical
lflows generated for different DPs.
Supporting I-P for type-a DP groups is relatively straightforward: it
would be part of the I-P logic for the object that generates the DP
group, e.g. for LB related DP groups.
Supporting I-P for type-b DP groups is more complex, but I think for most
DP groups we can optimize northd to turn them into type-a DP groups. For
example, for ACLs we should know beforehand the group of DPs a lflow
should be applied to. This conversion/optimization can be done
incrementally: when handling a given type of change becomes a bottleneck,
we decide to implement I-P for it.
In the end I hope there will be no cases that need I-P for a type-b DP
group, which is why I am not worried about supporting DP groups for now.
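
As a toy illustration of the two flavors (this is not OVN code; it is a
minimal self-contained sketch with made-up names, modeling the merge
logic that ovn_lflow_find() + ovn_dp_group_add_with_reference() perform
in the quoted hunk):

#include <stdio.h>
#include <string.h>

#define N_DP 4

struct flow {
    const char *contents;          /* Stands in for stage/prio/match/actions. */
    unsigned char dp_bitmap[N_DP]; /* Datapaths the flow applies to. */
};

static struct flow table[16];      /* Capacity checks omitted for brevity. */
static int n_flows;

/* Type-b style: each DP adds its own copy; identical flows are merged
 * after the fact by extending the existing flow's DP group. */
static void
add_flow_for_dp(const char *contents, int dp)
{
    for (int i = 0; i < n_flows; i++) {
        if (!strcmp(table[i].contents, contents)) {
            table[i].dp_bitmap[dp] = 1;   /* Merge: extend the DP group. */
            return;
        }
    }
    table[n_flows].contents = contents;
    table[n_flows].dp_bitmap[dp] = 1;
    n_flows++;
}

/* Type-a style: the producer knows the whole group up front and emits a
 * single flow with the full bitmap in one call. */
static void
add_flow_for_group(const char *contents, const unsigned char bitmap[N_DP])
{
    table[n_flows].contents = contents;
    memcpy(table[n_flows].dp_bitmap, bitmap, N_DP);
    n_flows++;
}

int
main(void)
{
    /* Type-b: three DPs coincidentally generate an identical flow. */
    add_flow_for_dp("arp-responder", 0);
    add_flow_for_dp("arp-responder", 1);
    add_flow_for_dp("arp-responder", 2);

    /* Type-a: same end state, built with prior knowledge of the group. */
    add_flow_for_group("acl-drop", (const unsigned char[N_DP]) { 1, 1, 1, 0 });

    printf("%d flows\n", n_flows);  /* 2 flows, each covering 3 DPs. */
    return 0;
}

The I-P difficulty is only with the type-b style: an incremental handler
that adds or removes a single DP's flows has to be aware of the merged
rows, whereas in the type-a style the group is owned by whichever object
produced it.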

> Is my understanding correct?  And can it be a performance issue?  E.g. if the
> database grows too much because of that.  Or how hard is it to add datapath
> group support to I-P?
>
I wouldn't worry too much about DB growth for now, since only the LSP
(regular VIF) related lflows skip DP groups, and I believe DP groups are
not heavily involved in those.

> I suppose, DP-groups is one of the reasons you didn't implement deletions
> here, right?

Maybe partly related, but that is not the major reason; I just haven't
had enough time for that part yet. For the reason mentioned above, I
will skip DP groups for VIF related lflow generation so that I can build
a hash index for deletion, and I don't see any obvious negative impact
from not trying to combine such lflows with DP groups. Please let me
know if you think differently.
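
For reference, the temporary mechanism in question (the one that the
hash index will eventually replace) boils down to the following caller
contract, condensed from the patch at the end of this page; the consumer
loop at the bottom is only illustrative, since the posted diff is
truncated before the collected list is used:

/* In lflow_handle_northd_ls_changes(): */
ovs_list_init(&temp_lflow_list);
add_lflow_to_temp_list = true;

/* Every lflow generated from here on is also linked into
 * temp_lflow_list, e.g. by the per-LSP generator: */
build_lswitch_and_lrouter_iterate_by_lsp(op, lflow_input->ls_ports,
                                         lflow_input->lr_ports,
                                         lflow_input->meter_groups,
                                         &match, &actions, lflows);

/* Illustrative consumer: walk only the newly generated lflows. */
struct ovn_lflow *lflow;
LIST_FOR_EACH (lflow, list_node, &temp_lflow_list) {
    /* e.g. sync this lflow to the SB Logical_Flow table. */
}
add_lflow_to_temp_list = false;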

Thanks,
Han

Ilya Maximets May 30, 2023, 11:04 a.m. UTC | #4
On 5/29/23 20:42, Han Zhou wrote:
>
> <snip>
> 
>> Is my understanding correct?  And can it be a performance issue?  E.g. if the
>> database grows too much because of that.  Or how hard is it to add datapath
>> group support to I-P?
>>
> I wouldn't worry too much about DB growth for now, since only the LSP (regular VIF) related lflows skip DP groups, and I believe DP groups are not heavily involved in those.
> 
>> I suppose, DP-groups is one of the reasons you didn't implement deletions
>> here, right?
> 
> Maybe partly related, but that is not the major reason; I just haven't had enough time for that part yet. For the reason mentioned above, I will skip DP groups for VIF related lflow generation so that I can build a hash index for deletion, and I don't see any obvious negative impact from not trying to combine such lflows with DP groups. Please let me know if you think differently.

It's hard to tell for sure what kind of side effects we could see,
because it's hard to sustain incremental-only processing for a long
time with the current implementation.  It falls back to re-compute too
frequently to see a long-term impact.

But I would not be that dismissive of type-b flows.  For example, I
pulled the southbound database from our recent ovn-heater density-light
500-node test.  It doesn't have any load balancers involved.

By the end of the test, the database contains 556 K logical flows.
Only 826 of these flows have datapath groups; as expected, not a lot.
However, each of these flows is applied to a group of 500+ datapaths.
If we unfold them, we'll get an additional 826 * 500 = 413 K flows.
So, the number of logical flows will effectively double compared to
what we have today.  The total database size will also grow accordingly.
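
Spelling out the arithmetic with the numbers above:

    826 grouped flows x ~500 datapaths each  = ~413 K flows when unfolded
    556 K flows today + 413 K unfolded flows = ~969 K flows in total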

Best regards, Ilya Maximets.
Han Zhou May 30, 2023, 11:22 p.m. UTC | #5
On Tue, May 30, 2023 at 4:04 AM Ilya Maximets <i.maximets@ovn.org> wrote:
>
> On 5/29/23 20:42, Han Zhou wrote:
> >
> >
> > On Mon, May 29, 2023 at 9:24 AM Ilya Maximets <i.maximets@ovn.org
<mailto:i.maximets@ovn.org>> wrote:
> >>
> >> On 5/13/23 02:03, Han Zhou wrote:
> >> > This patch introduces a change handler for 'northd' input within the
> >> > 'lflow' node. It specifically handles cases when VIFs are created,
which
> >> > is an easier start for lflow incremental processing. Support for
> >> > update/delete will be added later.
> >> >
> >> > Below are the performance test results simulating an ovn-k8s
topology of 500
> >> > nodes x 50 lsp per node:
> >> >
> >> > Before:
> >> > ovn-nbctl --wait=hv --print-wait-time lsp-add ls_1_0001
lsp_1_0001_01 -- lsp-set-addresses lsp_1_0001_01 "ff:f1:bb:00:01:01
1.0.1.101" -- lsp-set-port-security lsp_1_0001_01 "ff:f1:bb:00:01:01
1.0.1.101"
> >> >     ovn-northd completion:          773ms
> >> >
> >> > After:
> >> > ovn-nbctl --wait=hv --print-wait-time lsp-add ls_1_0001
lsp_1_0001_01 -- lsp-set-addresses lsp_1_0001_01 "ff:f1:bb:00:01:01
1.0.1.101" -- lsp-set-port-security lsp_1_0001_01 "ff:f1:bb:00:01:01
1.0.1.101"
> >> >     ovn-northd completion:          30ms
> >> >
> >> > It is more than 95% reduction (or 20x faster).
> >> >
> >> > Signed-off-by: Han Zhou <hzhou@ovn.org <mailto:hzhou@ovn.org>>
> >> > ---
> >> >  northd/en-lflow.c        |  82 ++++++++-----
> >> >  northd/en-lflow.h        |   1 +
> >> >  northd/inc-proc-northd.c |   2 +-
> >> >  northd/northd.c          | 245
+++++++++++++++++++++++++++++++++------
> >> >  northd/northd.h          |   6 +-
> >> >  tests/ovn-northd.at <http://ovn-northd.at>      |   4 +-
> >> >  6 files changed, 277 insertions(+), 63 deletions(-)
> >> >
> >>
> >> <snip>
> >>
> >> > @@ -5685,7 +5687,18 @@ ovn_dp_group_add_with_reference(struct
ovn_lflow *lflow_ref,
> >> >
> >> >  /* Adds a row with the specified contents to the Logical_Flow table.
> >> >   * Version to use when hash bucket locking is NOT required.
> >> > + *
> >> > + * Note: This function can add generated lflows to the global
variable
> >> > + * temp_lflow_list as its output, controlled by the global variable
> >> > + * add_lflow_to_temp_list. The caller of the ovn_lflow_add_...
marcros can get
> >> > + * a list of lflows generated by setting add_lflow_to_temp_list to
true. The
> >> > + * caller is responsible for initializing the temp_lflow_list, and
also
> >> > + * reset the add_lflow_to_temp_list to false when it is no longer
needed.
> >> > + * XXX: this mechanism is temporary and will be replaced when we
add hash index
> >> > + * to lflow_data and refactor related functions.
> >> >   */
> >> > +static bool add_lflow_to_temp_list = false;
> >> > +static struct ovs_list temp_lflow_list;
> >> >  static void
> >> >  do_ovn_lflow_add(struct hmap *lflow_map, const struct ovn_datapath
*od,
> >> >                   const unsigned long *dp_bitmap, size_t
dp_bitmap_len,
> >> > @@ -5702,12 +5715,17 @@ do_ovn_lflow_add(struct hmap *lflow_map,
const struct ovn_datapath *od,
> >> >      size_t bitmap_len = od ? ods_size(od->datapaths) :
dp_bitmap_len;
> >> >      ovs_assert(bitmap_len);
> >> >
> >> > -    old_lflow = ovn_lflow_find(lflow_map, NULL, stage, priority,
match,
> >> > -                               actions, ctrl_meter, hash);
> >> > -    if (old_lflow) {
> >> > -        ovn_dp_group_add_with_reference(old_lflow, od, dp_bitmap,
> >> > -                                        bitmap_len);
> >> > -        return;
> >> > +    if (add_lflow_to_temp_list) {
> >> > +        ovs_assert(od);
> >> > +        ovs_assert(!dp_bitmap);
> >> > +    } else {
> >> > +        old_lflow = ovn_lflow_find(lflow_map, NULL, stage,
priority, match,
> >> > +                                   actions, ctrl_meter, hash);
> >> > +        if (old_lflow) {
> >> > +            ovn_dp_group_add_with_reference(old_lflow, od,
dp_bitmap,
> >> > +                                            bitmap_len);
> >> > +            return;
> >> > +        }
> >> >      }
> >>
> >> Hi, Han.  Not a full review, but I'm a bit concerned that this code
doesn't
> >> support datapath groups.  IIUC, if some flows can be aggregated with
datapath
> >> groups, I-P will just add them separately.  Later a full re-compute
will
> >> delete all these flows and replace them with a single grouped one.
Might
> >> cause some unnecessary churn in the cluster.
> >>
> >
> > Hi Ilya, thanks for the feedback. You are right that the current I-P
doesn't support datapath groups, because:
> > 1. The current I-P only supports LSP (regular VIFs) related changes,
which is very unlikely to generate datapath groups. Even when that happens
in rare cases, the logic is still correct and the churn should be small.
> > 2. There are two types of DP groups:
> >     a. DP groups that are directly generated when northd has the
knowledge that a lflow should be applied to a group of DPs.
> >     b. DP groups that are passively generated by combining lflows
generated for different DPs.
> > Supporting I-P for the type-a DP groups is relatively straightforward,
which would be part of the I-P logic for the object that generates the DP
group, e.g. for LB related DP groups.
> > Supporting I-P for the type-b DP groups is more complex, but I think
for most DP groups we can optimize northd to make them type-a DP groups,
for example, for ACLs we should know beforehand the group of DPs the lflow
should be applied. This conversion/optimization is an incremental process,
when we see a type of change handling is a bottleneck, and decide to
implement I-P for it.
> > In the end I hope there will be no cases that need I-P for a type-b DP
group, which is why I don't worry about supporting DP groups for now.
> >
> >> Is my understanding correct?  And can it be a performance issue? E.g.
if the
> >> database grows too much because of that.   Or how hard it is to add
datapath
> >> group support to I-P ?
> >>
> > I'd not worry too much about DB growth for now since it is only LSP
(regular VIFs) related lflows that's skipping DP group which I believe DP
group is not heavily involved in.
> >
> >> I suppose, DP-groups is one of the reasons you didn't implement
deletions
> >> here, right?
> >
> > Maybe partly related, but not the major reason. I just didn't get
enough time for that part yet. For the reason mentioned above, I will skip
the DP group for VIF related lflow generation so that I can build hash
index for deletion, and I don't see any obvious negative impact for not
trying to combine such lflows with DP groups. Please let me know if you
think differently.
>
> It's hard to tell for sure what kind of side effects we could see,
> because it's hard to sustain incremental-only processing for a long
> time with the current implementation.  It falls back to re-compute too
> frequently to see a long-term impact.
>
> But I would not be that dismissive of type-b flows.  For example, I
> pulled the southbound database from our recent ovn-heater density-light
> 500-node test.  It doesn't have any load balancers involved.
>
> By the end of the test, the database contains 556 K logical flows.
> Only 826 of these flows have datapath groups; as expected, not a lot.
> However, each of these flows is applied to a group of 500+ datapaths.
> If we unfold them, we'll get an additional 826 * 500 = 413 K flows.
> So, the number of logical flows will effectively double compared to
> what we have today.  The total database size will also grow accordingly.
>
> Best regards, Ilya Maximets.

Maybe I should clarify that the current implementation of this patch
series only performs I-P for VIF related changes. The VIF related lflows
are mostly distinct from each other, and should never be applied to a
group of 500+ datapaths.
In the test you mentioned, the flows applied to a group of 500+
datapaths shouldn't be related to any individual VIFs. With this patch,
they are still always recomputed and applied to DP groups instead of
individual DPs.
In my own ovn-k8s-simulation test environment, I have confirmed that
after adding and binding a new VIF (which is incrementally processed)
and then triggering a recompute from northd, there is no change in the
logical flows (no churn observed).

Also, I was not dismissive of type-b DP groups. I totally understand the
cost of "ungrouping" them and never meant to do so. I am just not
worried right now, because the current patch series (and the immediate
TODOs for deletions) does not touch the flows of type-b DP groups, so
they will keep being recomputed and work as is for now.

To implement I-P for the flows of type-b DP groups, I also made the
proposal above: convert them to type-a DP groups first. By that I mean
that, when generating flows that apply to a big group of DPs, northd
must know which DPs the flows should be applied to while constructing
them. One example is the DP groups for LBs that you optimized recently.
Another example is the flows for ACLs, which apply to the DPs of the
LSes that reference the ACLs either directly or through port groups.
When implementing I-P for these objects, we can construct the DP groups
and apply the flows to the group directly (type-a) instead of generating
flows for each individual DP first and then collecting them (type-b).
The goal is not to convert every type-b group to type-a, but when we are
trying to implement I-P for a certain input, this is what we need to do
for the DP groups related to that input.

I hope this addresses your concerns.

Finally, I don't want to suggest that DP groups will never be involved
in VIF related flows. In some uncommon scenarios, two VIFs on different
LSes may happen to have the same IP/MAC configured, so some of the
generated flows may look the same on different DPs (and thus get grouped
as type-b groups). But these are very rare cases; the DP groups
generated by such coincidentally identical flows should be small, and
the effect of such small-sized DP groups is unimportant (unlike the DP
groups for flows common to all LSes/LRs, or the ones for LBs/ACLs). So
any churn caused by this type of flow should be very small (if it exists
at all), and once I implement I-P for deletion (which will disable DP
groups for such coincidental small groups) there will be no churn at all
even in these rare cases.

Thanks,
Han
Ilya Maximets May 31, 2023, 1:35 p.m. UTC | #6
On 5/31/23 01:22, Han Zhou wrote:
>
> <snip>
>
> Maybe I should clarify that the current implementation of this patch series only performs I-P for VIF related changes. The VIF related lflows are mostly distinct from each other, and should never be applied to a group of 500+ datapaths.

I see.  Indeed, most of the flows in the case above are related to
router ports and don't really match on any port-specific information.
I missed that while looking through the database.

> In the test you mentioned, the flows applied to a group of 500+ datapaths shouldn't be related to any individual VIFs. With this patch, they are still always recomputed and applied to DP groups instead of individual DPs.
> In my own ovn-k8s-simulation test environment, I have confirmed that after adding and binding a new VIF (which is incrementally processed) and then triggering a recompute from northd, there is no change in the logical flows (no churn observed).
> 
> Also, I was not dismissive of type-b DP groups.

Sorry, it was a poor choice of words from my side.

> I totally understand the cost of "ungrouping" them and never meant to do so. I am just not worried right now, because the current patch series (and the immediate TODOs for deletions) does not touch the flows of type-b DP groups, so they will keep being recomputed and work as is for now.
> 
> To implement I-P for the flows of type-b DP groups, I also made the proposal above: convert them to type-a DP groups first. By that I mean that, when generating flows that apply to a big group of DPs, northd must know which DPs the flows should be applied to while constructing them. One example is the DP groups for LBs that you optimized recently. Another example is the flows for ACLs, which apply to the DPs of the LSes that reference the ACLs either directly or through port groups. When implementing I-P for these objects, we can construct the DP groups and apply the flows to the group directly (type-a) instead of generating flows for each individual DP first and then collecting them (type-b). The goal is not to convert every type-b group to type-a, but when we are trying to implement I-P for a certain input, this is what we need to do for the DP groups related to that input.

Ack.

> 
> I hope this addresses your concerns.
> 
> Finally, I don't want to suggest that DP groups will never be involved in VIF related flows. In some uncommon scenarios, two VIFs on different LSes may happen to have the same IP/MAC configured, so some of the generated flows may look the same on different DPs (and thus get grouped as type-b groups). But these are very rare cases; the DP groups generated by such coincidentally identical flows should be small, and the effect of such small-sized DP groups is unimportant (unlike the DP groups for flows common to all LSes/LRs, or the ones for LBs/ACLs). So any churn caused by this type of flow should be very small (if it exists at all), and once I implement I-P for deletion (which will disable DP groups for such coincidental small groups) there will be no churn at all even in these rare cases.


Thanks for the explanation.  I see that datapath groups are not really
a problem for this particular patch set.  We just need to be careful
while adding new I-P nodes for other types of resources.

Best regards, Ilya Maximets.


Patch

diff --git a/northd/en-lflow.c b/northd/en-lflow.c
index 081ec7c353ed..28ab1c67fb8f 100644
--- a/northd/en-lflow.c
+++ b/northd/en-lflow.c
@@ -30,43 +30,49 @@ 
 
 VLOG_DEFINE_THIS_MODULE(en_lflow);
 
-void en_lflow_run(struct engine_node *node, void *data)
+static void
+lflow_get_input_data(struct engine_node *node,
+                     struct lflow_input *lflow_input)
 {
-    const struct engine_context *eng_ctx = engine_get_context();
-
-    struct lflow_input lflow_input;
-
     struct northd_data *northd_data = engine_get_input_data("northd", node);
-
-    struct hmap bfd_connections = HMAP_INITIALIZER(&bfd_connections);
-
-    lflow_input.nbrec_bfd_table =
+    lflow_input->nbrec_bfd_table =
         EN_OVSDB_GET(engine_get_input("NB_bfd", node));
-    lflow_input.sbrec_bfd_table =
+    lflow_input->sbrec_bfd_table =
         EN_OVSDB_GET(engine_get_input("SB_bfd", node));
-    lflow_input.sbrec_logical_flow_table =
+    lflow_input->sbrec_logical_flow_table =
         EN_OVSDB_GET(engine_get_input("SB_logical_flow", node));
-    lflow_input.sbrec_multicast_group_table =
+    lflow_input->sbrec_multicast_group_table =
         EN_OVSDB_GET(engine_get_input("SB_multicast_group", node));
-    lflow_input.sbrec_igmp_group_table =
+    lflow_input->sbrec_igmp_group_table =
         EN_OVSDB_GET(engine_get_input("SB_igmp_group", node));
 
-    lflow_input.sbrec_mcast_group_by_name_dp =
+    lflow_input->sbrec_mcast_group_by_name_dp =
            engine_ovsdb_node_get_index(
                           engine_get_input("SB_multicast_group", node),
                          "sbrec_mcast_group_by_name");
 
-    lflow_input.ls_datapaths = &northd_data->ls_datapaths;
-    lflow_input.lr_datapaths = &northd_data->lr_datapaths;
-    lflow_input.ls_ports = &northd_data->ls_ports;
-    lflow_input.lr_ports = &northd_data->lr_ports;
-    lflow_input.port_groups = &northd_data->port_groups;
-    lflow_input.meter_groups = &northd_data->meter_groups;
-    lflow_input.lbs = &northd_data->lbs;
-    lflow_input.bfd_connections = &bfd_connections;
-    lflow_input.features = &northd_data->features;
-    lflow_input.ovn_internal_version_changed =
+    lflow_input->ls_datapaths = &northd_data->ls_datapaths;
+    lflow_input->lr_datapaths = &northd_data->lr_datapaths;
+    lflow_input->ls_ports = &northd_data->ls_ports;
+    lflow_input->lr_ports = &northd_data->lr_ports;
+    lflow_input->port_groups = &northd_data->port_groups;
+    lflow_input->meter_groups = &northd_data->meter_groups;
+    lflow_input->lbs = &northd_data->lbs;
+    lflow_input->features = &northd_data->features;
+    lflow_input->ovn_internal_version_changed =
                       northd_data->ovn_internal_version_changed;
+    lflow_input->bfd_connections = NULL;
+}
+
+void en_lflow_run(struct engine_node *node, void *data)
+{
+    const struct engine_context *eng_ctx = engine_get_context();
+
+    struct lflow_input lflow_input;
+    lflow_get_input_data(node, &lflow_input);
+
+    struct hmap bfd_connections = HMAP_INITIALIZER(&bfd_connections);
+    lflow_input.bfd_connections = &bfd_connections;
 
     struct lflow_data *lflow_data = data;
     lflow_data_destroy(lflow_data);
@@ -76,8 +82,8 @@  void en_lflow_run(struct engine_node *node, void *data)
     build_bfd_table(eng_ctx->ovnsb_idl_txn,
                     lflow_input.nbrec_bfd_table,
                     lflow_input.sbrec_bfd_table,
-                    &bfd_connections,
-                    &northd_data->lr_ports);
+                    lflow_input.lr_ports,
+                    &bfd_connections);
     build_lflows(eng_ctx->ovnsb_idl_txn, &lflow_input, &lflow_data->lflows);
     bfd_cleanup_connections(lflow_input.nbrec_bfd_table,
                             &bfd_connections);
@@ -87,6 +93,30 @@  void en_lflow_run(struct engine_node *node, void *data)
     engine_set_node_state(node, EN_UPDATED);
 }
 
+bool
+lflow_northd_handler(struct engine_node *node,
+                     void *data)
+{
+    struct northd_data *northd_data = engine_get_input_data("northd", node);
+    if (!northd_data->change_tracked) {
+        return false;
+    }
+    const struct engine_context *eng_ctx = engine_get_context();
+    struct lflow_data *lflow_data = data;
+
+    struct lflow_input lflow_input;
+    lflow_get_input_data(node, &lflow_input);
+
+    if (!lflow_handle_northd_ls_changes(eng_ctx->ovnsb_idl_txn,
+                                        &northd_data->tracked_ls_changes,
+                                        &lflow_input, &lflow_data->lflows)) {
+        return false;
+    }
+
+    engine_set_node_state(node, EN_UPDATED);
+    return true;
+}
+
 void *en_lflow_init(struct engine_node *node OVS_UNUSED,
                      struct engine_arg *arg OVS_UNUSED)
 {
diff --git a/northd/en-lflow.h b/northd/en-lflow.h
index 0e4d522ff3fe..5e3fbc25e3e0 100644
--- a/northd/en-lflow.h
+++ b/northd/en-lflow.h
@@ -12,5 +12,6 @@ 
 void en_lflow_run(struct engine_node *node, void *data);
 void *en_lflow_init(struct engine_node *node, struct engine_arg *arg);
 void en_lflow_cleanup(void *data);
+bool lflow_northd_handler(struct engine_node *, void *data);
 
 #endif /* EN_LFLOW_H */
diff --git a/northd/inc-proc-northd.c b/northd/inc-proc-northd.c
index f992a9ec8420..f6ceb8280624 100644
--- a/northd/inc-proc-northd.c
+++ b/northd/inc-proc-northd.c
@@ -182,7 +182,7 @@  void inc_proc_northd_init(struct ovsdb_idl_loop *nb,
     engine_add_input(&en_lflow, &en_sb_logical_flow, NULL);
     engine_add_input(&en_lflow, &en_sb_multicast_group, NULL);
     engine_add_input(&en_lflow, &en_sb_igmp_group, NULL);
-    engine_add_input(&en_lflow, &en_northd, NULL);
+    engine_add_input(&en_lflow, &en_northd, lflow_northd_handler);
 
     engine_add_input(&en_sync_to_sb_addr_set, &en_nb_address_set,
                      sync_to_sb_addr_set_nb_address_set_handler);
diff --git a/northd/northd.c b/northd/northd.c
index 2cb568b8e1dd..32c18ab3c932 100644
--- a/northd/northd.c
+++ b/northd/northd.c
@@ -5500,6 +5500,7 @@  ovn_igmp_group_destroy(struct hmap *igmp_groups,
 
 struct ovn_lflow {
     struct hmap_node hmap_node;
+    struct ovs_list list_node;
 
     struct ovn_datapath *od;     /* 'logical_datapath' in SB schema.  */
     unsigned long *dpg_bitmap;   /* Bitmap of all datapaths by their 'index'.*/
@@ -5557,6 +5558,7 @@  ovn_lflow_init(struct ovn_lflow *lflow, struct ovn_datapath *od,
                char *match, char *actions, char *io_port, char *ctrl_meter,
                char *stage_hint, const char *where)
 {
+    ovs_list_init(&lflow->list_node);
     lflow->dpg_bitmap = bitmap_allocate(dp_bitmap_len);
     lflow->od = od;
     lflow->stage = stage;
@@ -5685,7 +5687,18 @@  ovn_dp_group_add_with_reference(struct ovn_lflow *lflow_ref,
 
 /* Adds a row with the specified contents to the Logical_Flow table.
  * Version to use when hash bucket locking is NOT required.
+ *
+ * Note: This function can add generated lflows to the global variable
+ * temp_lflow_list as its output, controlled by the global variable
+ * add_lflow_to_temp_list. The caller of the ovn_lflow_add_... macros can get
+ * the list of generated lflows by setting add_lflow_to_temp_list to true. The
+ * caller is responsible for initializing temp_lflow_list, and also for
+ * resetting add_lflow_to_temp_list to false when it is no longer needed.
+ * XXX: this mechanism is temporary and will be replaced when we add hash index
+ * to lflow_data and refactor related functions.
  */
+static bool add_lflow_to_temp_list = false;
+static struct ovs_list temp_lflow_list;
 static void
 do_ovn_lflow_add(struct hmap *lflow_map, const struct ovn_datapath *od,
                  const unsigned long *dp_bitmap, size_t dp_bitmap_len,
@@ -5702,12 +5715,17 @@  do_ovn_lflow_add(struct hmap *lflow_map, const struct ovn_datapath *od,
     size_t bitmap_len = od ? ods_size(od->datapaths) : dp_bitmap_len;
     ovs_assert(bitmap_len);
 
-    old_lflow = ovn_lflow_find(lflow_map, NULL, stage, priority, match,
-                               actions, ctrl_meter, hash);
-    if (old_lflow) {
-        ovn_dp_group_add_with_reference(old_lflow, od, dp_bitmap,
-                                        bitmap_len);
-        return;
+    if (add_lflow_to_temp_list) {
+        ovs_assert(od);
+        ovs_assert(!dp_bitmap);
+    } else {
+        old_lflow = ovn_lflow_find(lflow_map, NULL, stage, priority, match,
+                                   actions, ctrl_meter, hash);
+        if (old_lflow) {
+            ovn_dp_group_add_with_reference(old_lflow, od, dp_bitmap,
+                                            bitmap_len);
+            return;
+        }
     }
 
     lflow = xmalloc(sizeof *lflow);
@@ -5728,6 +5746,10 @@  do_ovn_lflow_add(struct hmap *lflow_map, const struct ovn_datapath *od,
         hmap_insert_fast(lflow_map, &lflow->hmap_node, hash);
         thread_lflow_counter++;
     }
+
+    if (add_lflow_to_temp_list) {
+        ovs_list_insert(&temp_lflow_list, &lflow->list_node);
+    }
 }
 
 /* Adds a row with the specified contents to the Logical_Flow table. */
@@ -9867,7 +9889,7 @@  void
 build_bfd_table(struct ovsdb_idl_txn *ovnsb_txn,
                 const struct nbrec_bfd_table *nbrec_bfd_table,
                 const struct sbrec_bfd_table *sbrec_bfd_table,
-                struct hmap *bfd_connections, struct hmap *lr_ports)
+                const struct hmap *lr_ports, struct hmap *bfd_connections)
 {
     struct hmap sb_only = HMAP_INITIALIZER(&sb_only);
     const struct sbrec_bfd *sb_bt;
@@ -15219,32 +15241,28 @@  build_lswitch_and_lrouter_iterate_by_lr(struct ovn_datapath *od,
  */
 static void
 build_lswitch_and_lrouter_iterate_by_lsp(struct ovn_port *op,
-                                         struct lswitch_flow_build_info *lsi)
+                                         const struct hmap *ls_ports,
+                                         const struct hmap *lr_ports,
+                                         const struct shash *meter_groups,
+                                         struct ds *match,
+                                         struct ds *actions,
+                                         struct hmap *lflows)
 {
     ovs_assert(op->nbsp);
 
     /* Build Logical Switch Flows. */
-    build_lswitch_port_sec_op(op, lsi->lflows, &lsi->actions, &lsi->match);
-    build_lswitch_learn_fdb_op(op, lsi->lflows, &lsi->actions,
-                               &lsi->match);
-    build_lswitch_arp_nd_responder_skip_local(op, lsi->lflows,
-                                              &lsi->match);
-    build_lswitch_arp_nd_responder_known_ips(op, lsi->lflows,
-                                             lsi->ls_ports,
-                                             lsi->meter_groups,
-                                             &lsi->actions,
-                                             &lsi->match);
-    build_lswitch_dhcp_options_and_response(op, lsi->lflows,
-                                            lsi->meter_groups);
-    build_lswitch_external_port(op, lsi->lflows);
-    build_lswitch_ip_unicast_lookup(op, lsi->lflows, &lsi->actions,
-                                    &lsi->match);
+    build_lswitch_port_sec_op(op, lflows, actions, match);
+    build_lswitch_learn_fdb_op(op, lflows, actions, match);
+    build_lswitch_arp_nd_responder_skip_local(op, lflows, match);
+    build_lswitch_arp_nd_responder_known_ips(op, lflows, ls_ports,
+                                             meter_groups, actions, match);
+    build_lswitch_dhcp_options_and_response(op, lflows, meter_groups);
+    build_lswitch_external_port(op, lflows);
+    build_lswitch_ip_unicast_lookup(op, lflows, actions, match);
 
     /* Build Logical Router Flows. */
-    build_ip_routing_flows_for_router_type_lsp(op, lsi->lr_ports,
-                                               lsi->lflows);
-    build_arp_resolve_flows_for_lsp(op, lsi->lflows, lsi->lr_ports,
-                                    &lsi->match, &lsi->actions);
+    build_ip_routing_flows_for_router_type_lsp(op, lr_ports, lflows);
+    build_arp_resolve_flows_for_lsp(op, lflows, lr_ports, match, actions);
 }
 
 /* Helper function to combine all lflow generation which is iterated by logical
@@ -15330,7 +15348,12 @@  build_lflows_thread(void *arg)
                     if (stop_parallel_processing()) {
                         return NULL;
                     }
-                    build_lswitch_and_lrouter_iterate_by_lsp(op, lsi);
+                    build_lswitch_and_lrouter_iterate_by_lsp(op, lsi->ls_ports,
+                                                             lsi->lr_ports,
+                                                             lsi->meter_groups,
+                                                             &lsi->match,
+                                                             &lsi->actions,
+                                                             lsi->lflows);
                 }
             }
             for (bnum = control->id;
@@ -15515,7 +15538,11 @@  build_lswitch_and_lrouter_flows(const struct ovn_datapaths *ls_datapaths,
         stopwatch_stop(LFLOWS_DATAPATHS_STOPWATCH_NAME, time_msec());
         stopwatch_start(LFLOWS_PORTS_STOPWATCH_NAME, time_msec());
         HMAP_FOR_EACH (op, key_node, ls_ports) {
-            build_lswitch_and_lrouter_iterate_by_lsp(op, &lsi);
+            build_lswitch_and_lrouter_iterate_by_lsp(op, lsi.ls_ports,
+                                                     lsi.lr_ports,
+                                                     lsi.meter_groups,
+                                                     &lsi.match, &lsi.actions,
+                                                     lsi.lflows);
         }
         HMAP_FOR_EACH (op, key_node, lr_ports) {
             build_lswitch_and_lrouter_iterate_by_lrp(op, &lsi);
@@ -15629,6 +15656,20 @@  build_mcast_groups(const struct sbrec_igmp_group_table *sbrec_igmp_group_table,
                    struct hmap *mcast_groups,
                    struct hmap *igmp_groups);
 
+static struct sbrec_multicast_group *
+create_sb_multicast_group(struct ovsdb_idl_txn *ovnsb_txn,
+                          const struct sbrec_datapath_binding *dp,
+                          const char *name,
+                          int64_t tunnel_key)
+{
+    struct sbrec_multicast_group *sbmc =
+        sbrec_multicast_group_insert(ovnsb_txn);
+    sbrec_multicast_group_set_datapath(sbmc, dp);
+    sbrec_multicast_group_set_name(sbmc, name);
+    sbrec_multicast_group_set_tunnel_key(sbmc, tunnel_key);
+    return sbmc;
+}
+
 /* Updates the Logical_Flow and Multicast_Group tables in the OVN_SB database,
  * constructing their contents based on the OVN_NB database. */
 void build_lflows(struct ovsdb_idl_txn *ovnsb_txn,
@@ -15995,10 +16036,8 @@  void build_lflows(struct ovsdb_idl_txn *ovnsb_txn,
             ovn_multicast_destroy(&mcast_groups, mc);
             continue;
         }
-        sbmc = sbrec_multicast_group_insert(ovnsb_txn);
-        sbrec_multicast_group_set_datapath(sbmc, mc->datapath->sb);
-        sbrec_multicast_group_set_name(sbmc, mc->group->name);
-        sbrec_multicast_group_set_tunnel_key(sbmc, mc->group->key);
+        sbmc = create_sb_multicast_group(ovnsb_txn, mc->datapath->sb,
+                                         mc->group->name, mc->group->key);
         ovn_multicast_update_sbrec(mc, sbmc);
         ovn_multicast_destroy(&mcast_groups, mc);
     }
@@ -16013,6 +16052,146 @@  void build_lflows(struct ovsdb_idl_txn *ovnsb_txn,
     hmap_destroy(&mcast_groups);
 }
 
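+/* Incrementally handles tracked logical switch changes: builds logical flows
+ * for newly added LSPs, adds those LSPs to the datapath's multicast groups,
+ * and syncs the new flows to the SB DB.  Returns false when a change (an
+ * updated or deleted LSP) cannot be handled incrementally yet, so that the
+ * caller falls back to a full recompute. */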
+bool lflow_handle_northd_ls_changes(struct ovsdb_idl_txn *ovnsb_txn,
+                                    struct tracked_ls_changes *ls_changes,
+                                    struct lflow_input *lflow_input,
+                                    struct hmap *lflows)
+{
+    struct ls_change *ls_change;
+    LIST_FOR_EACH (ls_change, list_node, &ls_changes->updated) {
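+        /* Direct newly generated lflows into 'temp_lflow_list' so that only
+         * this delta needs to be synced to the SB DB afterwards. */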
+        ovs_list_init(&temp_lflow_list);
+        add_lflow_to_temp_list = true;
+        if (!ovs_list_is_empty(&ls_change->updated_ports) ||
+            !ovs_list_is_empty(&ls_change->deleted_ports)) {
+            /* XXX: implement lflow index so that we can handle updated and
+             * deleted LSPs incrementally. */
+            ovs_list_init(&temp_lflow_list);
+            add_lflow_to_temp_list = false;
+            return false;
+        }
+
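+        /* Look up this datapath's flood and unknown multicast groups; any
+         * missing ones are created on demand when a port needs them. */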
+        const struct sbrec_multicast_group *sbmc_flood =
+            mcast_group_lookup(lflow_input->sbrec_mcast_group_by_name_dp,
+                               MC_FLOOD, ls_change->od->sb);
+        const struct sbrec_multicast_group *sbmc_flood_l2 =
+            mcast_group_lookup(lflow_input->sbrec_mcast_group_by_name_dp,
+                               MC_FLOOD_L2, ls_change->od->sb);
+        const struct sbrec_multicast_group *sbmc_unknown =
+            mcast_group_lookup(lflow_input->sbrec_mcast_group_by_name_dp,
+                               MC_UNKNOWN, ls_change->od->sb);
+
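+        /* Build the per-LSP logical flows for each newly added port and add
+         * the port to the relevant multicast groups. */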
+        struct ovn_port *op;
+        LIST_FOR_EACH (op, list, &ls_change->added_ports) {
+            struct ds match = DS_EMPTY_INITIALIZER;
+            struct ds actions = DS_EMPTY_INITIALIZER;
+            build_lswitch_and_lrouter_iterate_by_lsp(op, lflow_input->ls_ports,
+                                                     lflow_input->lr_ports,
+                                                     lflow_input->meter_groups,
+                                                     &match, &actions,
+                                                     lflows);
+            ds_destroy(&match);
+            ds_destroy(&actions);
+            if (!sbmc_flood) {
+                sbmc_flood = create_sb_multicast_group(ovnsb_txn,
+                    ls_change->od->sb, MC_FLOOD, OVN_MCAST_FLOOD_TUNNEL_KEY);
+            }
+            sbrec_multicast_group_update_ports_addvalue(sbmc_flood, op->sb);
+
+            if (!sbmc_flood_l2) {
+                sbmc_flood_l2 = create_sb_multicast_group(ovnsb_txn,
+                    ls_change->od->sb, MC_FLOOD_L2,
+                    OVN_MCAST_FLOOD_L2_TUNNEL_KEY);
+            }
+            sbrec_multicast_group_update_ports_addvalue(sbmc_flood_l2, op->sb);
+
+            if (op->has_unknown) {
+                if (!sbmc_unknown) {
+                    sbmc_unknown = create_sb_multicast_group(ovnsb_txn,
+                        ls_change->od->sb, MC_UNKNOWN,
+                        OVN_MCAST_UNKNOWN_TUNNEL_KEY);
+                }
+                sbrec_multicast_group_update_ports_addvalue(sbmc_unknown,
+                                                            op->sb);
+            }
+        }
+        /* Sync the newly added flows to SB. */
+        struct ovn_lflow *lflow;
+        LIST_FOR_EACH (lflow, list_node, &temp_lflow_list) {
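+            /* Pick the datapath array that matches the pipeline (switch or
+             * router) this lflow's stage belongs to. */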
+            size_t n_datapaths;
+            struct ovn_datapath **datapaths_array;
+            if (ovn_stage_to_datapath_type(lflow->stage) == DP_SWITCH) {
+                n_datapaths = ods_size(lflow_input->ls_datapaths);
+                datapaths_array = lflow_input->ls_datapaths->array;
+            } else {
+                n_datapaths = ods_size(lflow_input->lr_datapaths);
+                datapaths_array = lflow_input->lr_datapaths->array;
+            }
+            uint32_t n_ods = bitmap_count1(lflow->dpg_bitmap, n_datapaths);
+            ovs_assert(n_ods == 1);
+            /* There is only one datapath, so move the lflow out of the
+             * datapath group and assign it directly to that single 'od'. */
+            size_t index = bitmap_scan(lflow->dpg_bitmap, true, 0,
+                                       n_datapaths);
+
+            bitmap_set0(lflow->dpg_bitmap, index);
+            lflow->od = datapaths_array[index];
+
+            /* Logical flow should be re-hashed to allow lookups. */
+            uint32_t hash = hmap_node_hash(&lflow->hmap_node);
+            /* Remove from lflows. */
+            hmap_remove(lflows, &lflow->hmap_node);
+            hash = ovn_logical_flow_hash_datapath(&lflow->od->sb->header_.uuid,
+                                                  hash);
+            /* Add back. */
+            hmap_insert(lflows, &lflow->hmap_node, hash);
+
+            /* Sync to SB. */
+            const struct sbrec_logical_flow *sbflow;
+            sbflow = sbrec_logical_flow_insert(ovnsb_txn);
+            const char *pipeline = ovn_stage_get_pipeline_name(lflow->stage);
+            uint8_t table = ovn_stage_get_table(lflow->stage);
+            sbrec_logical_flow_set_logical_datapath(sbflow, lflow->od->sb);
+            sbrec_logical_flow_set_logical_dp_group(sbflow, NULL);
+            sbrec_logical_flow_set_pipeline(sbflow, pipeline);
+            sbrec_logical_flow_set_table_id(sbflow, table);
+            sbrec_logical_flow_set_priority(sbflow, lflow->priority);
+            sbrec_logical_flow_set_match(sbflow, lflow->match);
+            sbrec_logical_flow_set_actions(sbflow, lflow->actions);
+            if (lflow->io_port) {
+                struct smap tags = SMAP_INITIALIZER(&tags);
+                smap_add(&tags, "in_out_port", lflow->io_port);
+                sbrec_logical_flow_set_tags(sbflow, &tags);
+                smap_destroy(&tags);
+            }
+            sbrec_logical_flow_set_controller_meter(sbflow, lflow->ctrl_meter);
+            /* Trim the source locator lflow->where, which looks something like
+             * "ovn/northd/northd.c:1234", down to just the part following the
+             * last slash, e.g. "northd.c:1234". */
+            const char *slash = strrchr(lflow->where, '/');
+#if _WIN32
+            const char *backslash = strrchr(lflow->where, '\\');
+            if (!slash || backslash > slash) {
+                slash = backslash;
+            }
+#endif
+            const char *where = slash ? slash + 1 : lflow->where;
+
+            struct smap ids = SMAP_INITIALIZER(&ids);
+            smap_add(&ids, "stage-name", ovn_stage_to_str(lflow->stage));
+            smap_add(&ids, "source", where);
+            if (lflow->stage_hint) {
+                smap_add(&ids, "stage-hint", lflow->stage_hint);
+            }
+            sbrec_logical_flow_set_external_ids(sbflow, &ids);
+            smap_destroy(&ids);
+        }
+    }
+    ovs_list_init(&temp_lflow_list);
+    add_lflow_to_temp_list = false;
+    return true;
+}
+
 /* Each port group in Port_Group table in OVN_Northbound has a corresponding
  * entry in Port_Group table in OVN_Southbound. In OVN_Northbound the entries
  * contains lport uuids, while in OVN_Southbound we store the lport names.
diff --git a/northd/northd.h b/northd/northd.h
index f549fee0b5f7..8be504811921 100644
--- a/northd/northd.h
+++ b/northd/northd.h
@@ -330,11 +330,15 @@  void northd_indices_create(struct northd_data *data,
 void build_lflows(struct ovsdb_idl_txn *ovnsb_txn,
                   struct lflow_input *input_data,
                   struct hmap *lflows);
+bool lflow_handle_northd_ls_changes(struct ovsdb_idl_txn *ovnsb_txn,
+                                    struct tracked_ls_changes *,
+                                    struct lflow_input *, struct hmap *lflows);
 
 void build_bfd_table(struct ovsdb_idl_txn *ovnsb_txn,
                      const struct nbrec_bfd_table *,
                      const struct sbrec_bfd_table *,
-                     struct hmap *bfd_connections, struct hmap *lr_ports);
+                     const struct hmap *lr_ports,
+                     struct hmap *bfd_connections);
 void bfd_cleanup_connections(const struct nbrec_bfd_table *,
                              struct hmap *bfd_map);
 void run_update_worker_pool(int n_threads);
diff --git a/tests/ovn-northd.at b/tests/ovn-northd.at
index 428237e432c2..deeb1d310925 100644
--- a/tests/ovn-northd.at
+++ b/tests/ovn-northd.at
@@ -9000,14 +9000,14 @@  check as northd ovn-appctl -t NORTHD_TYPE inc-engine/clear-stats
 check ovn-nbctl --wait=hv lsp-add ls0 lsp0-1 -- lsp-set-addresses lsp0-1 "aa:aa:aa:00:00:01 192.168.0.11"
 AT_CHECK([as northd ovn-appctl -t NORTHD_TYPE inc-engine/show-stats northd recompute], [0], [3
 ])
-AT_CHECK([as northd ovn-appctl -t NORTHD_TYPE inc-engine/show-stats lflow recompute], [0], [6
+AT_CHECK([as northd ovn-appctl -t NORTHD_TYPE inc-engine/show-stats lflow recompute], [0], [5
 ])
 
 check as northd ovn-appctl -t NORTHD_TYPE inc-engine/clear-stats
 check ovn-nbctl --wait=hv lsp-add ls0 lsp0-2 -- lsp-set-addresses lsp0-2 "aa:aa:aa:00:00:02 192.168.0.12"
 AT_CHECK([as northd ovn-appctl -t NORTHD_TYPE inc-engine/show-stats northd recompute], [0], [3
 ])
-AT_CHECK([as northd ovn-appctl -t NORTHD_TYPE inc-engine/show-stats lflow recompute], [0], [6
+AT_CHECK([as northd ovn-appctl -t NORTHD_TYPE inc-engine/show-stats lflow recompute], [0], [5
 ])
 
 check as northd ovn-appctl -t NORTHD_TYPE inc-engine/clear-stats