diff mbox series

[ovs-dev] ofctrl: Don't assert while merging tracked flows.

Message ID 20210426105300.994615-1-numans@ovn.org
State Not Applicable
Headers show
Series [ovs-dev] ofctrl: Don't assert while merging tracked flows. | expand

Commit Message

Numan Siddique April 26, 2021, 10:53 a.m. UTC
From: Numan Siddique <numans@ovn.org>

Presently we do ovs_assert(del_f->installed_flows) in
merged_tracked_flows() of ofctrl.c if 'del_f' and flow 'f' are equal
and if 'del_f' is not installed.  But there are a couple of scenarios
this can happen:

1. A flow 'F1' was added to the desired flows, but ofctrl_put() couldn't
install the flow (because of the rconn queue is full) and an SB update
caused 'F1' to be removed and re-added as 'F2'.

2. In a single engine run, a flow 'F1' was added to the desired flow,
removed from the desired flow and re-added as 'F2'.  This could
happen after the commit [1].

The likelihood of (2) seems to be higher with datapath groups enabled.

The assertion - ovs_assert(del_f->installed_flows) is observed in
few deployments after the commit [1].  This patch addressed this issue
by removing that assertion.  Removing this assertion should not have
any side effects as the flow 'F2' will be installed.

This patch is proposed based on the code inspection and the possibility
that the above mentioned scenarios can happen.  The patch doesn't have
any test cases to reproduce these scenarios.

[1] - c6c61b4e3462("ofctrl: Fix the assert seen when flood removing flows with conj actions.")

Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1951502
Signed-off-by: Numan Siddique <numans@ovn.org>
---
 controller/ofctrl.c | 26 ++++++++++++++++++++++----
 1 file changed, 22 insertions(+), 4 deletions(-)

Comments

Han Zhou April 28, 2021, 7:31 a.m. UTC | #1
Thanks Numan for working on this issue!

On Mon, Apr 26, 2021 at 3:53 AM <numans@ovn.org> wrote:
>
> From: Numan Siddique <numans@ovn.org>
>
> Presently we do ovs_assert(del_f->installed_flows) in
> merged_tracked_flows() of ofctrl.c if 'del_f' and flow 'f' are equal
> and if 'del_f' is not installed.  But there are a couple of scenarios
> this can happen:
>
> 1. A flow 'F1' was added to the desired flows, but ofctrl_put() couldn't
> install the flow (because of the rconn queue is full) and an SB update
> caused 'F1' to be removed and re-added as 'F2'.
>

This seems not possible, because del_f->installed_flow is set whenever a
"installed_flow" is added to the installed_flows table, regardless of
whether it is sent to OVS or not.

> 2. In a single engine run, a flow 'F1' was added to the desired flow,
> removed from the desired flow and re-added as 'F2'.  This could
> happen after the commit [1].

In theory this should not happen either, because if F1 was added and then
removed before it was installed, it would have been removed in
track_flow_del():

        if (!f->installed_flow) {
            /* If it is not installed yet, simply destroy it. */
            desired_flow_destroy(f);
            return;
        }

In fact this is what the comment is supposed to say, but the comment had a
typo (my bad):

            /* del_f must have been installed, otherwise it should have been
            * removed during track_flow_add_or_modify. */

s/track_flow_add_or_modify/track_flow_del

In practice, if commit [1] did trigger this bt, then I'd look deeper into
the logic, but I think removing the assert may just hide the problem.

>
> The likelihood of (2) seems to be higher with datapath groups enabled.
>
> The assertion - ovs_assert(del_f->installed_flows) is observed in
> few deployments after the commit [1].  This patch addressed this issue
> by removing that assertion.  Removing this assertion should not have
> any side effects as the flow 'F2' will be installed.
>
> This patch is proposed based on the code inspection and the possibility
> that the above mentioned scenarios can happen.  The patch doesn't have
> any test cases to reproduce these scenarios.
>
> [1] - c6c61b4e3462("ofctrl: Fix the assert seen when flood removing flows
with conj actions.")
>
> Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1951502
> Signed-off-by: Numan Siddique <numans@ovn.org>
> ---
>  controller/ofctrl.c | 26 ++++++++++++++++++++++----
>  1 file changed, 22 insertions(+), 4 deletions(-)
>
> diff --git a/controller/ofctrl.c b/controller/ofctrl.c
> index c29c3d180..9e960b034 100644
> --- a/controller/ofctrl.c
> +++ b/controller/ofctrl.c
> @@ -1931,11 +1931,29 @@ merge_tracked_flows(struct ovn_desired_flow_table
*flow_table)
>                  continue;
>              }
>
> -            /* del_f must have been installed, otherwise it should have
been
> -             * removed during track_flow_add_or_modify. */
> -            ovs_assert(del_f->installed_flow);
> +            if (!del_f->installed_flow) {
> +                /* del_f must have been installed, otherwise it should
have
> +                 * been removed during track_flow_add_or_modify.
> +                 *
> +                 * But however there are a couple of scenarios this may
not
> +                 * happen.
> +                 *
> +                 * Scenario 1:  A flow was added to the desired flows,
but
> +                 *              ofctrl_put() couldn't install the flow
and
> +                 *              an SB update caused the 'del_f' to be
removed
> +                 *              and re-added as 'f'.
> +                 *
> +                 * Scenario 2:  In a  single engine run, a flow 'del_f'
was
> +                 *              added to the desired flow, removed from
the
> +                 *              desired flow and re-added as 'f'.  This
could
> +                 *              happen after the commit c6c61b4e3462.
> +                 *
> +                 * Treat both the scenarios as valid scenarios and just
remove
> +                 * 'del_f' from the hmap - deleted_flows.
> +                 * update_installed_flows_by_track() will install 'f'.
> +                 */
>
> -            if (!f->installed_flow) {
> +            } else if (!f->installed_flow) {
>                  /* f is not installed yet. */
>                  replace_installed_to_desired(del_f->installed_flow,
del_f, f);
>              } else {
> --
> 2.29.2
>
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Numan Siddique April 28, 2021, 10:33 a.m. UTC | #2
On Wed, Apr 28, 2021 at 3:31 AM Han Zhou <hzhou@ovn.org> wrote:
>
> Thanks Numan for working on this issue!
>
> On Mon, Apr 26, 2021 at 3:53 AM <numans@ovn.org> wrote:
> >
> > From: Numan Siddique <numans@ovn.org>
> >
> > Presently we do ovs_assert(del_f->installed_flows) in
> > merged_tracked_flows() of ofctrl.c if 'del_f' and flow 'f' are equal
> > and if 'del_f' is not installed.  But there are a couple of scenarios
> > this can happen:
> >
> > 1. A flow 'F1' was added to the desired flows, but ofctrl_put() couldn't
> > install the flow (because of the rconn queue is full) and an SB update
> > caused 'F1' to be removed and re-added as 'F2'.
> >
>
> This seems not possible, because del_f->installed_flow is set whenever a
> "installed_flow" is added to the installed_flows table, regardless of
> whether it is sent to OVS or not.

This can happen if ofctrl_can_put() returns false in which case
ovn-controller will
not call ofctrl_put() at all. There is a theoretical  possibility here for sure.

Let me know if you think otherwise.

We observed this assertion during the upgrades of openshift to an ovn version
and there were a lot of "unreasonable timeout" warnings.

The updated OVN version had the commit [1] which led me to think that
commit [1] can also cause this.

>
> > 2. In a single engine run, a flow 'F1' was added to the desired flow,
> > removed from the desired flow and re-added as 'F2'.  This could
> > happen after the commit [1].
>
> In theory this should not happen either, because if F1 was added and then
> removed before it was installed, it would have been removed in
> track_flow_del():
>
>         if (!f->installed_flow) {
>             /* If it is not installed yet, simply destroy it. */
>             desired_flow_destroy(f);
>             return;
>         }
>
> In fact this is what the comment is supposed to say, but the comment had a
> typo (my bad):
>
>             /* del_f must have been installed, otherwise it should have been
>             * removed during track_flow_add_or_modify. */
>
> s/track_flow_add_or_modify/track_flow_del

This typo really confused me :). From your comments the scenario 2 seems
not possible.

But I still think scenario 1 can happen if ofctrl_can_put() returns FALSE.
What do you think ?

Thanks
Numan

>
> In practice, if commit [1] did trigger this bt, then I'd look deeper into
> the logic, but I think removing the assert may just hide the problem.
>
> >
> > The likelihood of (2) seems to be higher with datapath groups enabled.
> >
> > The assertion - ovs_assert(del_f->installed_flows) is observed in
> > few deployments after the commit [1].  This patch addressed this issue
> > by removing that assertion.  Removing this assertion should not have
> > any side effects as the flow 'F2' will be installed.
> >
> > This patch is proposed based on the code inspection and the possibility
> > that the above mentioned scenarios can happen.  The patch doesn't have
> > any test cases to reproduce these scenarios.
> >
> > [1] - c6c61b4e3462("ofctrl: Fix the assert seen when flood removing flows
> with conj actions.")
> >
> > Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1951502
> > Signed-off-by: Numan Siddique <numans@ovn.org>
> > ---
> >  controller/ofctrl.c | 26 ++++++++++++++++++++++----
> >  1 file changed, 22 insertions(+), 4 deletions(-)
> >
> > diff --git a/controller/ofctrl.c b/controller/ofctrl.c
> > index c29c3d180..9e960b034 100644
> > --- a/controller/ofctrl.c
> > +++ b/controller/ofctrl.c
> > @@ -1931,11 +1931,29 @@ merge_tracked_flows(struct ovn_desired_flow_table
> *flow_table)
> >                  continue;
> >              }
> >
> > -            /* del_f must have been installed, otherwise it should have
> been
> > -             * removed during track_flow_add_or_modify. */
> > -            ovs_assert(del_f->installed_flow);
> > +            if (!del_f->installed_flow) {
> > +                /* del_f must have been installed, otherwise it should
> have
> > +                 * been removed during track_flow_add_or_modify.
> > +                 *
> > +                 * But however there are a couple of scenarios this may
> not
> > +                 * happen.
> > +                 *
> > +                 * Scenario 1:  A flow was added to the desired flows,
> but
> > +                 *              ofctrl_put() couldn't install the flow
> and
> > +                 *              an SB update caused the 'del_f' to be
> removed
> > +                 *              and re-added as 'f'.
> > +                 *
> > +                 * Scenario 2:  In a  single engine run, a flow 'del_f'
> was
> > +                 *              added to the desired flow, removed from
> the
> > +                 *              desired flow and re-added as 'f'.  This
> could
> > +                 *              happen after the commit c6c61b4e3462.
> > +                 *
> > +                 * Treat both the scenarios as valid scenarios and just
> remove
> > +                 * 'del_f' from the hmap - deleted_flows.
> > +                 * update_installed_flows_by_track() will install 'f'.
> > +                 */
> >
> > -            if (!f->installed_flow) {
> > +            } else if (!f->installed_flow) {
> >                  /* f is not installed yet. */
> >                  replace_installed_to_desired(del_f->installed_flow,
> del_f, f);
> >              } else {
> > --
> > 2.29.2
> >
> > _______________________________________________
> > dev mailing list
> > dev@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
Han Zhou April 28, 2021, 7:09 p.m. UTC | #3
On Wed, Apr 28, 2021 at 3:34 AM Numan Siddique <numans@ovn.org> wrote:
>
> On Wed, Apr 28, 2021 at 3:31 AM Han Zhou <hzhou@ovn.org> wrote:
> >
> > Thanks Numan for working on this issue!
> >
> > On Mon, Apr 26, 2021 at 3:53 AM <numans@ovn.org> wrote:
> > >
> > > From: Numan Siddique <numans@ovn.org>
> > >
> > > Presently we do ovs_assert(del_f->installed_flows) in
> > > merged_tracked_flows() of ofctrl.c if 'del_f' and flow 'f' are equal
> > > and if 'del_f' is not installed.  But there are a couple of scenarios
> > > this can happen:
> > >
> > > 1. A flow 'F1' was added to the desired flows, but ofctrl_put()
couldn't
> > > install the flow (because of the rconn queue is full) and an SB update
> > > caused 'F1' to be removed and re-added as 'F2'.
> > >
> >
> > This seems not possible, because del_f->installed_flow is set whenever a
> > "installed_flow" is added to the installed_flows table, regardless of
> > whether it is sent to OVS or not.
>
> This can happen if ofctrl_can_put() returns false in which case
> ovn-controller will
> not call ofctrl_put() at all. There is a theoretical  possibility here
for sure.
>
> Let me know if you think otherwise.
>

Oh, sorry that I misunderstood. You are right that ofctrl_put() may return
immediately without installing the desired flow F1, but if that happens,
F1->installed_flow should be NULL, and then it is the same case as in
scenario 2) below: when F1 is deleted, it would be destroyed in
track_flow_del(). So we should never end up with a tracked deleted flow
with installed_flow being NULL, right? The logic is, in theory, when a
"desired but not yet installed" flow is being deleted, we should simply
destroy it and remove it from tracking. Maybe we should check if there are
other possibilities.

> We observed this assertion during the upgrades of openshift to an ovn
version
> and there were a lot of "unreasonable timeout" warnings.
>
> The updated OVN version had the commit [1] which led me to think that
> commit [1] can also cause this.
>
> >
> > > 2. In a single engine run, a flow 'F1' was added to the desired flow,
> > > removed from the desired flow and re-added as 'F2'.  This could
> > > happen after the commit [1].
> >
> > In theory this should not happen either, because if F1 was added and
then
> > removed before it was installed, it would have been removed in
> > track_flow_del():
> >
> >         if (!f->installed_flow) {
> >             /* If it is not installed yet, simply destroy it. */
> >             desired_flow_destroy(f);
> >             return;
> >         }
> >
> > In fact this is what the comment is supposed to say, but the comment
had a
> > typo (my bad):
> >
> >             /* del_f must have been installed, otherwise it should have
been
> >             * removed during track_flow_add_or_modify. */
> >
> > s/track_flow_add_or_modify/track_flow_del
>
> This typo really confused me :). From your comments the scenario 2 seems
> not possible.
>
> But I still think scenario 1 can happen if ofctrl_can_put() returns FALSE.
> What do you think ?
>
> Thanks
> Numan
>
> >
> > In practice, if commit [1] did trigger this bt, then I'd look deeper
into
> > the logic, but I think removing the assert may just hide the problem.
> >
> > >
> > > The likelihood of (2) seems to be higher with datapath groups enabled.
> > >
> > > The assertion - ovs_assert(del_f->installed_flows) is observed in
> > > few deployments after the commit [1].  This patch addressed this issue
> > > by removing that assertion.  Removing this assertion should not have
> > > any side effects as the flow 'F2' will be installed.
> > >
> > > This patch is proposed based on the code inspection and the
possibility
> > > that the above mentioned scenarios can happen.  The patch doesn't have
> > > any test cases to reproduce these scenarios.
> > >
> > > [1] - c6c61b4e3462("ofctrl: Fix the assert seen when flood removing
flows
> > with conj actions.")
> > >
> > > Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1951502
> > > Signed-off-by: Numan Siddique <numans@ovn.org>
> > > ---
> > >  controller/ofctrl.c | 26 ++++++++++++++++++++++----
> > >  1 file changed, 22 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/controller/ofctrl.c b/controller/ofctrl.c
> > > index c29c3d180..9e960b034 100644
> > > --- a/controller/ofctrl.c
> > > +++ b/controller/ofctrl.c
> > > @@ -1931,11 +1931,29 @@ merge_tracked_flows(struct
ovn_desired_flow_table
> > *flow_table)
> > >                  continue;
> > >              }
> > >
> > > -            /* del_f must have been installed, otherwise it should
have
> > been
> > > -             * removed during track_flow_add_or_modify. */
> > > -            ovs_assert(del_f->installed_flow);
> > > +            if (!del_f->installed_flow) {
> > > +                /* del_f must have been installed, otherwise it
should
> > have
> > > +                 * been removed during track_flow_add_or_modify.
> > > +                 *
> > > +                 * But however there are a couple of scenarios this
may
> > not
> > > +                 * happen.
> > > +                 *
> > > +                 * Scenario 1:  A flow was added to the desired
flows,
> > but
> > > +                 *              ofctrl_put() couldn't install the
flow
> > and
> > > +                 *              an SB update caused the 'del_f' to be
> > removed
> > > +                 *              and re-added as 'f'.
> > > +                 *
> > > +                 * Scenario 2:  In a  single engine run, a flow
'del_f'
> > was
> > > +                 *              added to the desired flow, removed
from
> > the
> > > +                 *              desired flow and re-added as 'f'.
This
> > could
> > > +                 *              happen after the commit c6c61b4e3462.
> > > +                 *
> > > +                 * Treat both the scenarios as valid scenarios and
just
> > remove
> > > +                 * 'del_f' from the hmap - deleted_flows.
> > > +                 * update_installed_flows_by_track() will install
'f'.
> > > +                 */
> > >
> > > -            if (!f->installed_flow) {
> > > +            } else if (!f->installed_flow) {
> > >                  /* f is not installed yet. */
> > >                  replace_installed_to_desired(del_f->installed_flow,
> > del_f, f);
> > >              } else {
> > > --
> > > 2.29.2
> > >
> > > _______________________________________________
> > > dev mailing list
> > > dev@openvswitch.org
> > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> > _______________________________________________
> > dev mailing list
> > dev@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >
Numan Siddique April 29, 2021, 4:25 p.m. UTC | #4
On Wed, Apr 28, 2021 at 3:10 PM Han Zhou <hzhou@ovn.org> wrote:
>
> On Wed, Apr 28, 2021 at 3:34 AM Numan Siddique <numans@ovn.org> wrote:
> >
> > On Wed, Apr 28, 2021 at 3:31 AM Han Zhou <hzhou@ovn.org> wrote:
> > >
> > > Thanks Numan for working on this issue!
> > >
> > > On Mon, Apr 26, 2021 at 3:53 AM <numans@ovn.org> wrote:
> > > >
> > > > From: Numan Siddique <numans@ovn.org>
> > > >
> > > > Presently we do ovs_assert(del_f->installed_flows) in
> > > > merged_tracked_flows() of ofctrl.c if 'del_f' and flow 'f' are equal
> > > > and if 'del_f' is not installed.  But there are a couple of scenarios
> > > > this can happen:
> > > >
> > > > 1. A flow 'F1' was added to the desired flows, but ofctrl_put()
> couldn't
> > > > install the flow (because of the rconn queue is full) and an SB update
> > > > caused 'F1' to be removed and re-added as 'F2'.
> > > >
> > >
> > > This seems not possible, because del_f->installed_flow is set whenever a
> > > "installed_flow" is added to the installed_flows table, regardless of
> > > whether it is sent to OVS or not.
> >
> > This can happen if ofctrl_can_put() returns false in which case
> > ovn-controller will
> > not call ofctrl_put() at all. There is a theoretical  possibility here
> for sure.
> >
> > Let me know if you think otherwise.
> >
>
> Oh, sorry that I misunderstood. You are right that ofctrl_put() may return
> immediately without installing the desired flow F1, but if that happens,
> F1->installed_flow should be NULL, and then it is the same case as in
> scenario 2) below: when F1 is deleted, it would be destroyed in
> track_flow_del(). So we should never end up with a tracked deleted flow
> with installed_flow being NULL, right? The logic is, in theory, when a
> "desired but not yet installed" flow is being deleted, we should simply
> destroy it and remove it from tracking. Maybe we should check if there are
> other possibilities.

You're right.  F1 should be destroyed in track_flow_del().

I got access to the core dump and I can see below details of "f" and "del_f".

Please let me know if this rings any bell

-----
(gdb) frame 6
#6  0x00005569459bfbaa in merge_tracked_flows
(flow_table=0x5569470c5b40) at
/usr/src/debug/ovn2.13-20.12.0-24.el8fdp.x86_64/openvswitch-2.14.90/include/openvswitch/hmap.h:283
283             hmap_expand_at(hmap, where);
(gdb) print del_f
$1 = (struct desired_flow *) 0x556948c670b0
(gdb) print *del_f
$2 = {flow = {table_id = 37 '%', priority = 100, match = {{{flow =
0x55694909a540, mask = 0x55694909a560}, flows = {0x55694909a540,
0x55694909a560}}, tun_md = 0x0}, hash = 2338012780, ofpacts =
0x556948d96460, ofpacts_len = 432, cookie = 1418516517},
  match_hmap_node = {hash = 2338012780, next = 0x0}, list_node = {prev
= 0x556948c67100, next = 0x556948c67100}, references = {prev =
0x556948c67110, next = 0x556948c67110}, installed_flow = 0x0,
installed_ref_list_node = {prev = 0x5569495bcd90,
    next = 0x5569495bcd90}, track_list_node = {prev = 0x5569470c5b88,
next = 0x556948d2bfc8}, is_deleted = true}


(gdb) print f
$7 = (struct desired_flow *) 0x556948d2bf40

(gdb) print *f
$6 = {flow = {table_id = 37 '%', priority = 100, match = {{{flow =
0x556948eaf1a0, mask = 0x556948eaf1c0}, flows = {0x556948eaf1a0,
0x556948eaf1c0}}, tun_md = 0x0}, hash = 2338012780, ofpacts =
0x5569472a15e0, ofpacts_len = 432, cookie = 1418516517},
  match_hmap_node = {hash = 2338012780, next = 0x556948da8070},
list_node = {prev = 0x556948d2bf90, next = 0x556948d2bf90}, references
= {prev = 0x556947b35980, next = 0x556947b35980}, installed_flow =
0x0, installed_ref_list_node = {prev = 0x556948d2bfb8,
    next = 0x556948d2bfb8}, track_list_node = {prev = 0x556948c67138,
next = 0x556948e68658}, is_deleted = false}

------

Looks like there are some issues with physical flow handling.  Seems
to me an installed flow F1 is deleted and in the same loop, F2 is
added
to the tracked flow which has the same match, actions and cookie.

Table id is 37 which is OFTABLE_REMOTE_OUTPUT.  The SB resource can be
either port binding or multicast group.

Thanks for the comments so far. I will continue debugging.  This patch
can be dropped.

Numan

>
> > We observed this assertion during the upgrades of openshift to an ovn
> version
> > and there were a lot of "unreasonable timeout" warnings.
> >
> > The updated OVN version had the commit [1] which led me to think that
> > commit [1] can also cause this.
> >
> > >
> > > > 2. In a single engine run, a flow 'F1' was added to the desired flow,
> > > > removed from the desired flow and re-added as 'F2'.  This could
> > > > happen after the commit [1].
> > >
> > > In theory this should not happen either, because if F1 was added and
> then
> > > removed before it was installed, it would have been removed in
> > > track_flow_del():
> > >
> > >         if (!f->installed_flow) {
> > >             /* If it is not installed yet, simply destroy it. */
> > >             desired_flow_destroy(f);
> > >             return;
> > >         }
> > >
> > > In fact this is what the comment is supposed to say, but the comment
> had a
> > > typo (my bad):
> > >
> > >             /* del_f must have been installed, otherwise it should have
> been
> > >             * removed during track_flow_add_or_modify. */
> > >
> > > s/track_flow_add_or_modify/track_flow_del
> >
> > This typo really confused me :). From your comments the scenario 2 seems
> > not possible.
> >
> > But I still think scenario 1 can happen if ofctrl_can_put() returns FALSE.
> > What do you think ?
> >
> > Thanks
> > Numan
> >
> > >
> > > In practice, if commit [1] did trigger this bt, then I'd look deeper
> into
> > > the logic, but I think removing the assert may just hide the problem.
> > >
> > > >
> > > > The likelihood of (2) seems to be higher with datapath groups enabled.
> > > >
> > > > The assertion - ovs_assert(del_f->installed_flows) is observed in
> > > > few deployments after the commit [1].  This patch addressed this issue
> > > > by removing that assertion.  Removing this assertion should not have
> > > > any side effects as the flow 'F2' will be installed.
> > > >
> > > > This patch is proposed based on the code inspection and the
> possibility
> > > > that the above mentioned scenarios can happen.  The patch doesn't have
> > > > any test cases to reproduce these scenarios.
> > > >
> > > > [1] - c6c61b4e3462("ofctrl: Fix the assert seen when flood removing
> flows
> > > with conj actions.")
> > > >
> > > > Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1951502
> > > > Signed-off-by: Numan Siddique <numans@ovn.org>
> > > > ---
> > > >  controller/ofctrl.c | 26 ++++++++++++++++++++++----
> > > >  1 file changed, 22 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/controller/ofctrl.c b/controller/ofctrl.c
> > > > index c29c3d180..9e960b034 100644
> > > > --- a/controller/ofctrl.c
> > > > +++ b/controller/ofctrl.c
> > > > @@ -1931,11 +1931,29 @@ merge_tracked_flows(struct
> ovn_desired_flow_table
> > > *flow_table)
> > > >                  continue;
> > > >              }
> > > >
> > > > -            /* del_f must have been installed, otherwise it should
> have
> > > been
> > > > -             * removed during track_flow_add_or_modify. */
> > > > -            ovs_assert(del_f->installed_flow);
> > > > +            if (!del_f->installed_flow) {
> > > > +                /* del_f must have been installed, otherwise it
> should
> > > have
> > > > +                 * been removed during track_flow_add_or_modify.
> > > > +                 *
> > > > +                 * But however there are a couple of scenarios this
> may
> > > not
> > > > +                 * happen.
> > > > +                 *
> > > > +                 * Scenario 1:  A flow was added to the desired
> flows,
> > > but
> > > > +                 *              ofctrl_put() couldn't install the
> flow
> > > and
> > > > +                 *              an SB update caused the 'del_f' to be
> > > removed
> > > > +                 *              and re-added as 'f'.
> > > > +                 *
> > > > +                 * Scenario 2:  In a  single engine run, a flow
> 'del_f'
> > > was
> > > > +                 *              added to the desired flow, removed
> from
> > > the
> > > > +                 *              desired flow and re-added as 'f'.
> This
> > > could
> > > > +                 *              happen after the commit c6c61b4e3462.
> > > > +                 *
> > > > +                 * Treat both the scenarios as valid scenarios and
> just
> > > remove
> > > > +                 * 'del_f' from the hmap - deleted_flows.
> > > > +                 * update_installed_flows_by_track() will install
> 'f'.
> > > > +                 */
> > > >
> > > > -            if (!f->installed_flow) {
> > > > +            } else if (!f->installed_flow) {
> > > >                  /* f is not installed yet. */
> > > >                  replace_installed_to_desired(del_f->installed_flow,
> > > del_f, f);
> > > >              } else {
> > > > --
> > > > 2.29.2
> > > >
> > > > _______________________________________________
> > > > dev mailing list
> > > > dev@openvswitch.org
> > > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> > > _______________________________________________
> > > dev mailing list
> > > dev@openvswitch.org
> > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> > >
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
Han Zhou May 5, 2021, 6:24 a.m. UTC | #5
On Thu, Apr 29, 2021 at 9:25 AM Numan Siddique <numans@ovn.org> wrote:
>
> On Wed, Apr 28, 2021 at 3:10 PM Han Zhou <hzhou@ovn.org> wrote:
> >
> > On Wed, Apr 28, 2021 at 3:34 AM Numan Siddique <numans@ovn.org> wrote:
> > >
> > > On Wed, Apr 28, 2021 at 3:31 AM Han Zhou <hzhou@ovn.org> wrote:
> > > >
> > > > Thanks Numan for working on this issue!
> > > >
> > > > On Mon, Apr 26, 2021 at 3:53 AM <numans@ovn.org> wrote:
> > > > >
> > > > > From: Numan Siddique <numans@ovn.org>
> > > > >
> > > > > Presently we do ovs_assert(del_f->installed_flows) in
> > > > > merged_tracked_flows() of ofctrl.c if 'del_f' and flow 'f' are
equal
> > > > > and if 'del_f' is not installed.  But there are a couple of
scenarios
> > > > > this can happen:
> > > > >
> > > > > 1. A flow 'F1' was added to the desired flows, but ofctrl_put()
> > couldn't
> > > > > install the flow (because of the rconn queue is full) and an SB
update
> > > > > caused 'F1' to be removed and re-added as 'F2'.
> > > > >
> > > >
> > > > This seems not possible, because del_f->installed_flow is set
whenever a
> > > > "installed_flow" is added to the installed_flows table, regardless
of
> > > > whether it is sent to OVS or not.
> > >
> > > This can happen if ofctrl_can_put() returns false in which case
> > > ovn-controller will
> > > not call ofctrl_put() at all. There is a theoretical  possibility here
> > for sure.
> > >
> > > Let me know if you think otherwise.
> > >
> >
> > Oh, sorry that I misunderstood. You are right that ofctrl_put() may
return
> > immediately without installing the desired flow F1, but if that happens,
> > F1->installed_flow should be NULL, and then it is the same case as in
> > scenario 2) below: when F1 is deleted, it would be destroyed in
> > track_flow_del(). So we should never end up with a tracked deleted flow
> > with installed_flow being NULL, right? The logic is, in theory, when a
> > "desired but not yet installed" flow is being deleted, we should simply
> > destroy it and remove it from tracking. Maybe we should check if there
are
> > other possibilities.
>
> You're right.  F1 should be destroyed in track_flow_del().
>
> I got access to the core dump and I can see below details of "f" and
"del_f".
>
> Please let me know if this rings any bell
>
> -----
> (gdb) frame 6
> #6  0x00005569459bfbaa in merge_tracked_flows
> (flow_table=0x5569470c5b40) at
>
/usr/src/debug/ovn2.13-20.12.0-24.el8fdp.x86_64/openvswitch-2.14.90/include/openvswitch/hmap.h:283
> 283             hmap_expand_at(hmap, where);
> (gdb) print del_f
> $1 = (struct desired_flow *) 0x556948c670b0
> (gdb) print *del_f
> $2 = {flow = {table_id = 37 '%', priority = 100, match = {{{flow =
> 0x55694909a540, mask = 0x55694909a560}, flows = {0x55694909a540,
> 0x55694909a560}}, tun_md = 0x0}, hash = 2338012780, ofpacts =
> 0x556948d96460, ofpacts_len = 432, cookie = 1418516517},
>   match_hmap_node = {hash = 2338012780, next = 0x0}, list_node = {prev
> = 0x556948c67100, next = 0x556948c67100}, references = {prev =
> 0x556948c67110, next = 0x556948c67110}, installed_flow = 0x0,
> installed_ref_list_node = {prev = 0x5569495bcd90,
>     next = 0x5569495bcd90}, track_list_node = {prev = 0x5569470c5b88,
> next = 0x556948d2bfc8}, is_deleted = true}
>
Looking at the content of the del_f, the installed_ref_list_node is not
pointing to the field of the flow itself, which means at least it was in
some installed_flow's desired_refs list, which means it was installed at
least at some point before. Now that the "installed_flow" is NULL, it is
possible that the desired flow was installed but then unlinked from the
installed flow. But by reviewing the code I didn't find any path that
"unlink_installed_to_desired()" or "replace_installed_to_desired()" could
happen to this flow before this point. If it is unlinked it should also be
removed from the track list and also from the deleted_flows hmap and
destroyed so it should never be found again. Of course it is also possible
that this desired_flow data structure is corrupted so the fields
nstalled_ref_list_node and installed_flow just have inconsistent data. I
will keep looking into the code. At the same time, is this reproducible? We
could add some more logs to help debugging if this happens again.

>
> (gdb) print f
> $7 = (struct desired_flow *) 0x556948d2bf40
>
> (gdb) print *f
> $6 = {flow = {table_id = 37 '%', priority = 100, match = {{{flow =
> 0x556948eaf1a0, mask = 0x556948eaf1c0}, flows = {0x556948eaf1a0,
> 0x556948eaf1c0}}, tun_md = 0x0}, hash = 2338012780, ofpacts =
> 0x5569472a15e0, ofpacts_len = 432, cookie = 1418516517},
>   match_hmap_node = {hash = 2338012780, next = 0x556948da8070},
> list_node = {prev = 0x556948d2bf90, next = 0x556948d2bf90}, references
> = {prev = 0x556947b35980, next = 0x556947b35980}, installed_flow =
> 0x0, installed_ref_list_node = {prev = 0x556948d2bfb8,
>     next = 0x556948d2bfb8}, track_list_node = {prev = 0x556948c67138,
> next = 0x556948e68658}, is_deleted = false}
>
> ------
>
> Looks like there are some issues with physical flow handling.  Seems
> to me an installed flow F1 is deleted and in the same loop, F2 is
> added
> to the tracked flow which has the same match, actions and cookie.
>
> Table id is 37 which is OFTABLE_REMOTE_OUTPUT.  The SB resource can be
> either port binding or multicast group.
>
> Thanks for the comments so far. I will continue debugging.  This patch
> can be dropped.
>
> Numan
>
> >
> > > We observed this assertion during the upgrades of openshift to an ovn
> > version
> > > and there were a lot of "unreasonable timeout" warnings.
> > >
> > > The updated OVN version had the commit [1] which led me to think that
> > > commit [1] can also cause this.
> > >
> > > >
> > > > > 2. In a single engine run, a flow 'F1' was added to the desired
flow,
> > > > > removed from the desired flow and re-added as 'F2'.  This could
> > > > > happen after the commit [1].
> > > >
> > > > In theory this should not happen either, because if F1 was added and
> > then
> > > > removed before it was installed, it would have been removed in
> > > > track_flow_del():
> > > >
> > > >         if (!f->installed_flow) {
> > > >             /* If it is not installed yet, simply destroy it. */
> > > >             desired_flow_destroy(f);
> > > >             return;
> > > >         }
> > > >
> > > > In fact this is what the comment is supposed to say, but the comment
> > had a
> > > > typo (my bad):
> > > >
> > > >             /* del_f must have been installed, otherwise it should
have
> > been
> > > >             * removed during track_flow_add_or_modify. */
> > > >
> > > > s/track_flow_add_or_modify/track_flow_del
> > >
> > > This typo really confused me :). From your comments the scenario 2
seems
> > > not possible.
> > >
> > > But I still think scenario 1 can happen if ofctrl_can_put() returns
FALSE.
> > > What do you think ?
> > >
> > > Thanks
> > > Numan
> > >
> > > >
> > > > In practice, if commit [1] did trigger this bt, then I'd look deeper
> > into
> > > > the logic, but I think removing the assert may just hide the
problem.
> > > >
> > > > >
> > > > > The likelihood of (2) seems to be higher with datapath groups
enabled.
> > > > >
> > > > > The assertion - ovs_assert(del_f->installed_flows) is observed in
> > > > > few deployments after the commit [1].  This patch addressed this
issue
> > > > > by removing that assertion.  Removing this assertion should not
have
> > > > > any side effects as the flow 'F2' will be installed.
> > > > >
> > > > > This patch is proposed based on the code inspection and the
> > possibility
> > > > > that the above mentioned scenarios can happen.  The patch doesn't
have
> > > > > any test cases to reproduce these scenarios.
> > > > >
> > > > > [1] - c6c61b4e3462("ofctrl: Fix the assert seen when flood
removing
> > flows
> > > > with conj actions.")
> > > > >
> > > > > Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1951502
> > > > > Signed-off-by: Numan Siddique <numans@ovn.org>
> > > > > ---
> > > > >  controller/ofctrl.c | 26 ++++++++++++++++++++++----
> > > > >  1 file changed, 22 insertions(+), 4 deletions(-)
> > > > >
> > > > > diff --git a/controller/ofctrl.c b/controller/ofctrl.c
> > > > > index c29c3d180..9e960b034 100644
> > > > > --- a/controller/ofctrl.c
> > > > > +++ b/controller/ofctrl.c
> > > > > @@ -1931,11 +1931,29 @@ merge_tracked_flows(struct
> > ovn_desired_flow_table
> > > > *flow_table)
> > > > >                  continue;
> > > > >              }
> > > > >
> > > > > -            /* del_f must have been installed, otherwise it
should
> > have
> > > > been
> > > > > -             * removed during track_flow_add_or_modify. */
> > > > > -            ovs_assert(del_f->installed_flow);
> > > > > +            if (!del_f->installed_flow) {
> > > > > +                /* del_f must have been installed, otherwise it
> > should
> > > > have
> > > > > +                 * been removed during track_flow_add_or_modify.
> > > > > +                 *
> > > > > +                 * But however there are a couple of scenarios
this
> > may
> > > > not
> > > > > +                 * happen.
> > > > > +                 *
> > > > > +                 * Scenario 1:  A flow was added to the desired
> > flows,
> > > > but
> > > > > +                 *              ofctrl_put() couldn't install the
> > flow
> > > > and
> > > > > +                 *              an SB update caused the 'del_f'
to be
> > > > removed
> > > > > +                 *              and re-added as 'f'.
> > > > > +                 *
> > > > > +                 * Scenario 2:  In a  single engine run, a flow
> > 'del_f'
> > > > was
> > > > > +                 *              added to the desired flow,
removed
> > from
> > > > the
> > > > > +                 *              desired flow and re-added as 'f'.
> > This
> > > > could
> > > > > +                 *              happen after the commit
c6c61b4e3462.
> > > > > +                 *
> > > > > +                 * Treat both the scenarios as valid scenarios
and
> > just
> > > > remove
> > > > > +                 * 'del_f' from the hmap - deleted_flows.
> > > > > +                 * update_installed_flows_by_track() will install
> > 'f'.
> > > > > +                 */
> > > > >
> > > > > -            if (!f->installed_flow) {
> > > > > +            } else if (!f->installed_flow) {
> > > > >                  /* f is not installed yet. */
> > > > >
 replace_installed_to_desired(del_f->installed_flow,
> > > > del_f, f);
> > > > >              } else {
> > > > > --
> > > > > 2.29.2
> > > > >
> > > > > _______________________________________________
> > > > > dev mailing list
> > > > > dev@openvswitch.org
> > > > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> > > > _______________________________________________
> > > > dev mailing list
> > > > dev@openvswitch.org
> > > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> > > >
> > _______________________________________________
> > dev mailing list
> > dev@openvswitch.org
> > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> >
Numan Siddique May 6, 2021, 5:19 p.m. UTC | #6
On Wed, May 5, 2021 at 2:25 AM Han Zhou <hzhou@ovn.org> wrote:
>
> On Thu, Apr 29, 2021 at 9:25 AM Numan Siddique <numans@ovn.org> wrote:
> >
> > On Wed, Apr 28, 2021 at 3:10 PM Han Zhou <hzhou@ovn.org> wrote:
> > >
> > > On Wed, Apr 28, 2021 at 3:34 AM Numan Siddique <numans@ovn.org> wrote:
> > > >
> > > > On Wed, Apr 28, 2021 at 3:31 AM Han Zhou <hzhou@ovn.org> wrote:
> > > > >
> > > > > Thanks Numan for working on this issue!
> > > > >
> > > > > On Mon, Apr 26, 2021 at 3:53 AM <numans@ovn.org> wrote:
> > > > > >
> > > > > > From: Numan Siddique <numans@ovn.org>
> > > > > >
> > > > > > Presently we do ovs_assert(del_f->installed_flows) in
> > > > > > merged_tracked_flows() of ofctrl.c if 'del_f' and flow 'f' are
> equal
> > > > > > and if 'del_f' is not installed.  But there are a couple of
> scenarios
> > > > > > this can happen:
> > > > > >
> > > > > > 1. A flow 'F1' was added to the desired flows, but ofctrl_put()
> > > couldn't
> > > > > > install the flow (because of the rconn queue is full) and an SB
> update
> > > > > > caused 'F1' to be removed and re-added as 'F2'.
> > > > > >
> > > > >
> > > > > This seems not possible, because del_f->installed_flow is set
> whenever a
> > > > > "installed_flow" is added to the installed_flows table, regardless
> of
> > > > > whether it is sent to OVS or not.
> > > >
> > > > This can happen if ofctrl_can_put() returns false in which case
> > > > ovn-controller will
> > > > not call ofctrl_put() at all. There is a theoretical  possibility here
> > > for sure.
> > > >
> > > > Let me know if you think otherwise.
> > > >
> > >
> > > Oh, sorry that I misunderstood. You are right that ofctrl_put() may
> return
> > > immediately without installing the desired flow F1, but if that happens,
> > > F1->installed_flow should be NULL, and then it is the same case as in
> > > scenario 2) below: when F1 is deleted, it would be destroyed in
> > > track_flow_del(). So we should never end up with a tracked deleted flow
> > > with installed_flow being NULL, right? The logic is, in theory, when a
> > > "desired but not yet installed" flow is being deleted, we should simply
> > > destroy it and remove it from tracking. Maybe we should check if there
> are
> > > other possibilities.
> >
> > You're right.  F1 should be destroyed in track_flow_del().
> >
> > I got access to the core dump and I can see below details of "f" and
> "del_f".
> >
> > Please let me know if this rings any bell
> >
> > -----
> > (gdb) frame 6
> > #6  0x00005569459bfbaa in merge_tracked_flows
> > (flow_table=0x5569470c5b40) at
> >
> /usr/src/debug/ovn2.13-20.12.0-24.el8fdp.x86_64/openvswitch-2.14.90/include/openvswitch/hmap.h:283
> > 283             hmap_expand_at(hmap, where);
> > (gdb) print del_f
> > $1 = (struct desired_flow *) 0x556948c670b0
> > (gdb) print *del_f
> > $2 = {flow = {table_id = 37 '%', priority = 100, match = {{{flow =
> > 0x55694909a540, mask = 0x55694909a560}, flows = {0x55694909a540,
> > 0x55694909a560}}, tun_md = 0x0}, hash = 2338012780, ofpacts =
> > 0x556948d96460, ofpacts_len = 432, cookie = 1418516517},
> >   match_hmap_node = {hash = 2338012780, next = 0x0}, list_node = {prev
> > = 0x556948c67100, next = 0x556948c67100}, references = {prev =
> > 0x556948c67110, next = 0x556948c67110}, installed_flow = 0x0,
> > installed_ref_list_node = {prev = 0x5569495bcd90,
> >     next = 0x5569495bcd90}, track_list_node = {prev = 0x5569470c5b88,
> > next = 0x556948d2bfc8}, is_deleted = true}
> >
> Looking at the content of the del_f, the installed_ref_list_node is not
> pointing to the field of the flow itself, which means at least it was in
> some installed_flow's desired_refs list, which means it was installed at
> least at some point before. Now that the "installed_flow" is NULL, it is
> possible that the desired flow was installed but then unlinked from the
> installed flow. But by reviewing the code I didn't find any path that
> "unlink_installed_to_desired()" or "replace_installed_to_desired()" could
> happen to this flow before this point. If it is unlinked it should also be
> removed from the track list and also from the deleted_flows hmap and
> destroyed so it should never be found again. Of course it is also possible
> that this desired_flow data structure is corrupted so the fields
> nstalled_ref_list_node and installed_flow just have inconsistent data. I
> will keep looking into the code. At the same time, is this reproducible? We
> could add some more logs to help debugging if this happens again.
>

In one of the environments, his issue was seen when upgrading an OCP cluster.
and there were some failures related to memory and OOM had kicked in.

I am wondering if the issue is related to OOM. In order to address the
OOM issues,
few kernel patches were required.  After addressing these issues, ovn-controller
assert is not seen after that.

Thanks for looking into the backtraces.

Numan



> >
> > (gdb) print f
> > $7 = (struct desired_flow *) 0x556948d2bf40
> >
> > (gdb) print *f
> > $6 = {flow = {table_id = 37 '%', priority = 100, match = {{{flow =
> > 0x556948eaf1a0, mask = 0x556948eaf1c0}, flows = {0x556948eaf1a0,
> > 0x556948eaf1c0}}, tun_md = 0x0}, hash = 2338012780, ofpacts =
> > 0x5569472a15e0, ofpacts_len = 432, cookie = 1418516517},
> >   match_hmap_node = {hash = 2338012780, next = 0x556948da8070},
> > list_node = {prev = 0x556948d2bf90, next = 0x556948d2bf90}, references
> > = {prev = 0x556947b35980, next = 0x556947b35980}, installed_flow =
> > 0x0, installed_ref_list_node = {prev = 0x556948d2bfb8,
> >     next = 0x556948d2bfb8}, track_list_node = {prev = 0x556948c67138,
> > next = 0x556948e68658}, is_deleted = false}
> >
> > ------
> >
> > Looks like there are some issues with physical flow handling.  Seems
> > to me an installed flow F1 is deleted and in the same loop, F2 is
> > added
> > to the tracked flow which has the same match, actions and cookie.
> >
> > Table id is 37 which is OFTABLE_REMOTE_OUTPUT.  The SB resource can be
> > either port binding or multicast group.
> >
> > Thanks for the comments so far. I will continue debugging.  This patch
> > can be dropped.
> >
> > Numan
> >
> > >
> > > > We observed this assertion during the upgrades of openshift to an ovn
> > > version
> > > > and there were a lot of "unreasonable timeout" warnings.
> > > >
> > > > The updated OVN version had the commit [1] which led me to think that
> > > > commit [1] can also cause this.
> > > >
> > > > >
> > > > > > 2. In a single engine run, a flow 'F1' was added to the desired
> flow,
> > > > > > removed from the desired flow and re-added as 'F2'.  This could
> > > > > > happen after the commit [1].
> > > > >
> > > > > In theory this should not happen either, because if F1 was added and
> > > then
> > > > > removed before it was installed, it would have been removed in
> > > > > track_flow_del():
> > > > >
> > > > >         if (!f->installed_flow) {
> > > > >             /* If it is not installed yet, simply destroy it. */
> > > > >             desired_flow_destroy(f);
> > > > >             return;
> > > > >         }
> > > > >
> > > > > In fact this is what the comment is supposed to say, but the comment
> > > had a
> > > > > typo (my bad):
> > > > >
> > > > >             /* del_f must have been installed, otherwise it should
> have
> > > been
> > > > >             * removed during track_flow_add_or_modify. */
> > > > >
> > > > > s/track_flow_add_or_modify/track_flow_del
> > > >
> > > > This typo really confused me :). From your comments the scenario 2
> seems
> > > > not possible.
> > > >
> > > > But I still think scenario 1 can happen if ofctrl_can_put() returns
> FALSE.
> > > > What do you think ?
> > > >
> > > > Thanks
> > > > Numan
> > > >
> > > > >
> > > > > In practice, if commit [1] did trigger this bt, then I'd look deeper
> > > into
> > > > > the logic, but I think removing the assert may just hide the
> problem.
> > > > >
> > > > > >
> > > > > > The likelihood of (2) seems to be higher with datapath groups
> enabled.
> > > > > >
> > > > > > The assertion - ovs_assert(del_f->installed_flows) is observed in
> > > > > > few deployments after the commit [1].  This patch addressed this
> issue
> > > > > > by removing that assertion.  Removing this assertion should not
> have
> > > > > > any side effects as the flow 'F2' will be installed.
> > > > > >
> > > > > > This patch is proposed based on the code inspection and the
> > > possibility
> > > > > > that the above mentioned scenarios can happen.  The patch doesn't
> have
> > > > > > any test cases to reproduce these scenarios.
> > > > > >
> > > > > > [1] - c6c61b4e3462("ofctrl: Fix the assert seen when flood
> removing
> > > flows
> > > > > with conj actions.")
> > > > > >
> > > > > > Reported-at: https://bugzilla.redhat.com/show_bug.cgi?id=1951502
> > > > > > Signed-off-by: Numan Siddique <numans@ovn.org>
> > > > > > ---
> > > > > >  controller/ofctrl.c | 26 ++++++++++++++++++++++----
> > > > > >  1 file changed, 22 insertions(+), 4 deletions(-)
> > > > > >
> > > > > > diff --git a/controller/ofctrl.c b/controller/ofctrl.c
> > > > > > index c29c3d180..9e960b034 100644
> > > > > > --- a/controller/ofctrl.c
> > > > > > +++ b/controller/ofctrl.c
> > > > > > @@ -1931,11 +1931,29 @@ merge_tracked_flows(struct
> > > ovn_desired_flow_table
> > > > > *flow_table)
> > > > > >                  continue;
> > > > > >              }
> > > > > >
> > > > > > -            /* del_f must have been installed, otherwise it
> should
> > > have
> > > > > been
> > > > > > -             * removed during track_flow_add_or_modify. */
> > > > > > -            ovs_assert(del_f->installed_flow);
> > > > > > +            if (!del_f->installed_flow) {
> > > > > > +                /* del_f must have been installed, otherwise it
> > > should
> > > > > have
> > > > > > +                 * been removed during track_flow_add_or_modify.
> > > > > > +                 *
> > > > > > +                 * But however there are a couple of scenarios
> this
> > > may
> > > > > not
> > > > > > +                 * happen.
> > > > > > +                 *
> > > > > > +                 * Scenario 1:  A flow was added to the desired
> > > flows,
> > > > > but
> > > > > > +                 *              ofctrl_put() couldn't install the
> > > flow
> > > > > and
> > > > > > +                 *              an SB update caused the 'del_f'
> to be
> > > > > removed
> > > > > > +                 *              and re-added as 'f'.
> > > > > > +                 *
> > > > > > +                 * Scenario 2:  In a  single engine run, a flow
> > > 'del_f'
> > > > > was
> > > > > > +                 *              added to the desired flow,
> removed
> > > from
> > > > > the
> > > > > > +                 *              desired flow and re-added as 'f'.
> > > This
> > > > > could
> > > > > > +                 *              happen after the commit
> c6c61b4e3462.
> > > > > > +                 *
> > > > > > +                 * Treat both the scenarios as valid scenarios
> and
> > > just
> > > > > remove
> > > > > > +                 * 'del_f' from the hmap - deleted_flows.
> > > > > > +                 * update_installed_flows_by_track() will install
> > > 'f'.
> > > > > > +                 */
> > > > > >
> > > > > > -            if (!f->installed_flow) {
> > > > > > +            } else if (!f->installed_flow) {
> > > > > >                  /* f is not installed yet. */
> > > > > >
>  replace_installed_to_desired(del_f->installed_flow,
> > > > > del_f, f);
> > > > > >              } else {
> > > > > > --
> > > > > > 2.29.2
> > > > > >
> > > > > > _______________________________________________
> > > > > > dev mailing list
> > > > > > dev@openvswitch.org
> > > > > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> > > > > _______________________________________________
> > > > > dev mailing list
> > > > > dev@openvswitch.org
> > > > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> > > > >
> > > _______________________________________________
> > > dev mailing list
> > > dev@openvswitch.org
> > > https://mail.openvswitch.org/mailman/listinfo/ovs-dev
> > >
> _______________________________________________
> dev mailing list
> dev@openvswitch.org
> https://mail.openvswitch.org/mailman/listinfo/ovs-dev
>
diff mbox series

Patch

diff --git a/controller/ofctrl.c b/controller/ofctrl.c
index c29c3d180..9e960b034 100644
--- a/controller/ofctrl.c
+++ b/controller/ofctrl.c
@@ -1931,11 +1931,29 @@  merge_tracked_flows(struct ovn_desired_flow_table *flow_table)
                 continue;
             }
 
-            /* del_f must have been installed, otherwise it should have been
-             * removed during track_flow_add_or_modify. */
-            ovs_assert(del_f->installed_flow);
+            if (!del_f->installed_flow) {
+                /* del_f must have been installed, otherwise it should have
+                 * been removed during track_flow_add_or_modify.
+                 *
+                 * But however there are a couple of scenarios this may not
+                 * happen.
+                 *
+                 * Scenario 1:  A flow was added to the desired flows, but
+                 *              ofctrl_put() couldn't install the flow and
+                 *              an SB update caused the 'del_f' to be removed
+                 *              and re-added as 'f'.
+                 *
+                 * Scenario 2:  In a  single engine run, a flow 'del_f' was
+                 *              added to the desired flow, removed from the
+                 *              desired flow and re-added as 'f'.  This could
+                 *              happen after the commit c6c61b4e3462.
+                 *
+                 * Treat both the scenarios as valid scenarios and just remove
+                 * 'del_f' from the hmap - deleted_flows.
+                 * update_installed_flows_by_track() will install 'f'.
+                 */
 
-            if (!f->installed_flow) {
+            } else if (!f->installed_flow) {
                 /* f is not installed yet. */
                 replace_installed_to_desired(del_f->installed_flow, del_f, f);
             } else {